Corpus
-
class sentivi.data.data_loader.Corpus(train_file: Optional[str] = None, test_file: Optional[str] = None, delimiter: Optional[str] = '\n', line_separator: Optional[str] = None, n_grams: Optional[int] = None, text_processor: Optional[sentivi.text_processor.TextProcessor] = None, max_length: Optional[int] = None, truncation: Optional[str] = 'head', mode: Optional[str] = 'sentivi')
Text corpus for sentiment analysis
-
__init__(train_file: Optional[str] = None, test_file: Optional[str] = None, delimiter: Optional[str] = '\n', line_separator: Optional[str] = None, n_grams: Optional[int] = None, text_processor: Optional[sentivi.text_processor.TextProcessor] = None, max_length: Optional[int] = None, truncation: Optional[str] = 'head', mode: Optional[str] = 'sentivi')
Initialize Corpus instance
- Parameters
train_file – Path to train text file
test_file – Path to test text file
delimiter – Separator between text and labels
line_separator – Separator between samples
n_grams – N-gram size
text_processor – sentivi.text_processor.TextProcessor instance
max_length – Maximum length of input text
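The roles of delimiter and line_separator can be illustrated with a minimal sketch. It does not use sentivi and is not the library's parser: parse_samples is a hypothetical helper, and both the blank-line sample separator and the "text first, label last" field order are assumptions made for illustration.

```python
from typing import List, Tuple


def parse_samples(
    raw: str,
    delimiter: str = "\n",
    line_separator: str = "\n\n",  # assumption: samples separated by a blank line
) -> Tuple[List[str], List[str]]:
    """Split raw corpus text into parallel lists of texts and labels."""
    texts: List[str] = []
    labels: List[str] = []
    for sample in raw.split(line_separator):
        sample = sample.strip()
        if not sample:
            continue  # ignore empty chunks, e.g. trailing separators
        # Assumption: the label is the last field of the sample.
        text, sep, label = sample.rpartition(delimiter)
        if not sep:
            continue  # no delimiter in this sample: skip it as malformed
        texts.append(text.strip())
        labels.append(label.strip())
    return texts, labels


raw = "great phone\npositive\n\nbattery died fast\nnegative\n"
texts, labels = parse_samples(raw)
print(texts)
print(labels)
```

Splitting on the last occurrence of the delimiter keeps multi-line texts intact under these assumptions; the actual file layout expected by Corpus should be checked against the library itself.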
-
build()
Build sentivi.data.data_loader.Corpus instance
- Returns
sentivi.data.data_loader.Corpus instance
- Return type
sentivi.data.data_loader.Corpus
-
get_test_set()
Get test samples
- Returns
Inputs and outputs (texts and labels) of the test samples
- Return type
Tuple[List, List]
-
get_train_set()
Get training samples
- Returns
Inputs and outputs (texts and labels) of the training samples
- Return type
Tuple[List, List]
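The documented contract (build() returns the instance, and the getters return a Tuple[List, List]) can be sketched with a stand-in class. TinyCorpus below is hypothetical and only mirrors the method names on this page. It takes in-memory samples instead of file paths, its text_transform is a placeholder, and raising an error before build() is this stub's own choice, not documented sentivi behavior.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, str]  # (text, label)


class TinyCorpus:
    """Hypothetical stand-in mirroring the documented method names only."""

    def __init__(self, train_samples: List[Sample], test_samples: List[Sample]):
        self._raw_train = train_samples
        self._raw_test = test_samples
        self._train: Optional[Tuple[List, List]] = None
        self._test: Optional[Tuple[List, List]] = None

    def text_transform(self, text: str) -> str:
        # Placeholder preprocessing: lowercase and collapse whitespace.
        return " ".join(text.lower().split())

    def _split(self, samples: List[Sample]) -> Tuple[List, List]:
        inputs = [self.text_transform(text) for text, _ in samples]
        outputs = [label for _, label in samples]
        return inputs, outputs

    def build(self) -> "TinyCorpus":
        self._train = self._split(self._raw_train)
        self._test = self._split(self._raw_test)
        return self  # returning the instance allows call chaining

    def get_train_set(self) -> Tuple[List, List]:
        if self._train is None:
            raise RuntimeError("call build() first")  # stub behavior only
        return self._train

    def get_test_set(self) -> Tuple[List, List]:
        if self._test is None:
            raise RuntimeError("call build() first")  # stub behavior only
        return self._test


corpus = TinyCorpus(
    train_samples=[("Great  phone", "positive"), ("Battery died FAST", "negative")],
    test_samples=[("Okay screen", "neutral")],
).build()
x_train, y_train = corpus.get_train_set()
x_test, y_test = corpus.get_test_set()
print(x_train, y_train)
print(x_test, y_test)
```

Because each getter returns a pair of parallel lists, the result unpacks directly into inputs and outputs, which is the shape most training loops expect.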
-
text_transform(text)
Preprocess raw text
- Parameters
text – Raw text
- Returns
Preprocessed text
- Return type
str
-