Corpus

class sentivi.data.data_loader.Corpus(train_file: Optional[str] = None, test_file: Optional[str] = None, delimiter: Optional[str] = '\n', line_separator: Optional[str] = None, n_grams: Optional[int] = None, text_processor: Optional[sentivi.text_processor.TextProcessor] = None, max_length: Optional[int] = None, truncation: Optional[str] = 'head', mode: Optional[str] = 'sentivi')

Text corpus for sentiment analysis

__init__(train_file: Optional[str] = None, test_file: Optional[str] = None, delimiter: Optional[str] = '\n', line_separator: Optional[str] = None, n_grams: Optional[int] = None, text_processor: Optional[sentivi.text_processor.TextProcessor] = None, max_length: Optional[int] = None, truncation: Optional[str] = 'head', mode: Optional[str] = 'sentivi')

Initialize Corpus instance

Parameters
  • train_file – Path to train text file

  • test_file – Path to test text file

  • delimiter – Separator between text and labels

  • line_separator – Separator between samples

  • n_grams – Size n of the n-grams

  • text_processor – sentivi.text_processor.TextProcessor instance

  • max_length – Maximum length of input text
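
To make the roles of delimiter, line_separator and n_grams concrete, here is a minimal sketch in plain Python. It is not sentivi's implementation: parse_samples and word_n_grams are hypothetical helpers, and the sample layout "<text><delimiter><label>" with a blank line between samples is an assumption made for illustration.

```python
from typing import List, Tuple


def parse_samples(raw: str, delimiter: str = "\n",
                  line_separator: str = "\n\n") -> List[Tuple[str, str]]:
    """Split raw corpus text into (text, label) pairs.

    Assumption: each sample is "<text><delimiter><label>". The real
    sentivi file layout may order or mark the fields differently.
    """
    pairs = []
    for sample in raw.split(line_separator):
        sample = sample.strip()
        if not sample:
            continue  # skip empty chunks, e.g. after a trailing separator
        # rpartition splits on the LAST delimiter, so the text part may
        # itself contain the delimiter.
        text, _, label = sample.rpartition(delimiter)
        pairs.append((text, label))
    return pairs


def word_n_grams(text: str, n: int) -> List[str]:
    """Return all contiguous word n-grams of `text`, joined by spaces."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


raw = "great phone\npositive\n\nbattery died fast\nnegative\n"
samples = parse_samples(raw)
bigrams = word_n_grams("battery died fast", 2)
```

Here samples holds one (text, label) pair per sample and bigrams holds the two-word windows of the sentence. A text shorter than n words yields an empty list.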

build()

Build sentivi.data.data_loader.Corpus instance

Returns

sentivi.data.data_loader.Corpus instance

Return type

sentivi.data.data_loader.Corpus
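
A possible end-to-end use of the class, based only on the signatures documented on this page. The wrapper load_corpus is hypothetical, and the values chosen for line_separator, n_grams and max_length are assumptions. This page does not state whether the constructor already builds the corpus, so the explicit build() call may be redundant in practice.

```python
from typing import List, Tuple


def load_corpus(train_path: str, test_path: str,
                text_processor) -> Tuple[Tuple[List, List], Tuple[List, List]]:
    """Build a Corpus and return its (train, test) splits.

    `text_processor` is expected to be a
    sentivi.text_processor.TextProcessor instance created by the caller.
    """
    # Imported lazily so this helper can be defined without sentivi installed.
    from sentivi.data.data_loader import Corpus

    corpus = Corpus(
        train_file=train_path,
        test_file=test_path,
        delimiter="\n",          # documented default
        line_separator="\n\n",   # assumed value, default is None
        n_grams=2,               # assumed value, default is None
        text_processor=text_processor,
        max_length=128,          # assumed value, default is None
    )
    # build() is documented to return the Corpus instance.
    corpus = corpus.build()

    # Both getters are documented to return Tuple[List, List]:
    # the inputs and the outputs of the respective split.
    return corpus.get_train_set(), corpus.get_test_set()
```

The caller can unpack the result as (x_train, y_train), (x_test, y_test) = load_corpus(...).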

get_test_set()

Get test samples

Returns

Input and output of test samples

Return type

Tuple[List, List]

get_train_set()

Get training samples

Returns

Input and output of training samples

Return type

Tuple[List, List]

text_transform(text)

Preprocess raw text

Parameters

text – Raw text

Returns

Preprocessed text

Return type

str