Pipeline

A Pipeline is a sequence of callable layers (DataLayer, ClassifierLayer). Given an input (a text file), the layers are executed sequentially, and the output of the pipeline is the output of the last executed layer.

A Pipeline can be created with the default constructor. Callable layers can either be passed all at once at initialization or added one by one with the append method.

For example:

from sentivi import Pipeline
from sentivi.data import DataLoader, TextEncoder
from sentivi.classifier import SVMClassifier
from sentivi.text_processor import TextProcessor

text_processor = TextProcessor(methods=['word_segmentation', 'remove_punctuation', 'lower'])

pipeline = Pipeline(DataLoader(text_processor=text_processor, n_grams=3),
                    TextEncoder(encode_type='one-hot'),
                    SVMClassifier(num_labels=3))

or

pipeline = Pipeline()
pipeline.append(DataLoader(text_processor=text_processor, n_grams=3))
pipeline.append(TextEncoder(encode_type='one-hot'))
pipeline.append(SVMClassifier(num_labels=3))
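
As a rough illustration of why the two construction styles behave the same, the following library-independent sketch models a pipeline as a list of callables. MiniPipeline, tokenize, and count_tokens are hypothetical names invented for this example; they are not part of sentivi, and sentivi's real layers may pass data and keyword arguments differently.

```python
from typing import Any, Callable, List


class MiniPipeline:
    """Hypothetical stand-in for a sequential pipeline (not sentivi's code)."""

    def __init__(self, *layers: Callable[..., Any]) -> None:
        # Layers given to the constructor are stored in call order.
        self.layers: List[Callable[..., Any]] = list(layers)

    def append(self, layer: Callable[..., Any]) -> None:
        # Adding a layer later has the same effect as passing it up front.
        if not callable(layer):
            raise TypeError("layer must be callable")
        self.layers.append(layer)

    def __call__(self, x: Any = None, **kwargs: Any) -> Any:
        # Each layer receives the previous layer's output; the same
        # keyword arguments are handed to every layer (an assumption).
        for layer in self.layers:
            x = layer(x, **kwargs)
        return x


def tokenize(text: str, **kwargs: Any) -> List[str]:
    # Toy "data layer": lowercase and split on whitespace.
    return text.lower().split()


def count_tokens(tokens: List[str], **kwargs: Any) -> int:
    # Toy "final layer": its return value is the pipeline's output.
    return len(tokens)


at_once = MiniPipeline(tokenize, count_tokens)

step_by_step = MiniPipeline()
step_by_step.append(tokenize)
step_by_step.append(count_tokens)

print(at_once("Hello pipeline world"))       # 3
print(step_by_step("Hello pipeline world"))  # 3
```

Both objects hold the same layers in the same order, so they produce the same output for the same input.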

Execute the pipeline on a given corpus (a text file). By default, the text file should follow our format, in which training samples are separated by a double newline character (\n\n):

#corpus.txt
polarity_01
sentence_01

polarity_02
sentence_02

The pipeline also accepts arbitrary keyword arguments when it is called; these arguments are passed through to the executed function of each layer. Training results are reported as text in the form of sklearn.metrics.classification_report.

results = pipeline(train='train.txt', test='test.txt')

#results

Training classifier...
Testing classifier...
Saved classifier model to ./weights/svm.sentivi

Training results:
              precision    recall  f1-score   support

           0       1.00      0.00      0.00         1
           1       0.75      1.00      0.86         3
           2       1.00      1.00      1.00         2

    accuracy                           0.83         6
   macro avg       0.92      0.67      0.62         6
weighted avg       0.88      0.83      0.76         6

Test results:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         1
           2       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

Predict the polarity of given texts:

predict_results = pipeline.predict(['hàng ok đầu tuýp có một số không vừa ốc siết. chỉ được một số đầu thôi .cần '
                                    'nhất đầu tuýp 14 mà không có. không đạt yêu cầu của mình sử dụng',
                                    'Son đẹpppp, mùi hương vali thơm nhưng hơi nồng, chất son mịn, màu lên chuẩn, '
                                    'đẹppppp'])
print(predict_results)
print(f'Decoded results: {pipeline.decode_polarity(predict_results)}')

[2 1]
Decoded results: ['#NEG', '#POS']

For persistence, the pipeline can be saved and loaded later:

pipeline.save('./weights/pipeline.sentivi')
_pipeline = Pipeline.load('./weights/pipeline.sentivi')

predict_results = _pipeline.predict(['hàng ok đầu tuýp có một số không vừa ốc siết. chỉ được một số đầu thôi .cần '
                                     'nhất đầu tuýp 14 mà không có. không đạt yêu cầu của mình sử dụng',
                                     'Son đẹpppp, mùi hương vali thơm nhưng hơi nồng, chất son mịn, màu lên chuẩn, '
                                     'đẹppppp'])
print(predict_results)
print(f'Decoded results: {_pipeline.decode_polarity(predict_results)}')

class sentivi.Pipeline(*args, **kwargs)

Pipeline instance

__init__(*args, **kwargs)

Initialize a Pipeline instance

Parameters

args – arbitrary positional arguments
kwargs – arbitrary keyword arguments

append(method)

Append a callable layer

Parameters: method – callable layer to append (DataLayer or ClassifierLayer)
Returns: None

decode_polarity(x: Optional[list])

Decode numeric polarities into label polarities

Parameters: x – List of numeric polarities (e.g. [0, 1, 2, 1, 0])
Returns: List of label polarities (e.g. ['neg', 'neu', 'pos', 'neu', 'neg'])
Return type: List
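
The decoding step can be pictured as a list lookup. The sketch below shows only the general idea and is not sentivi's implementation: decode_indices and the labels list are hypothetical, and in the real pipeline the label names depend on the training corpus.

```python
from typing import List, Sequence


def decode_indices(indices: Sequence[int], labels: Sequence[str]) -> List[str]:
    """Map each numeric polarity to the label stored at that position."""
    return [labels[i] for i in indices]


# Hypothetical label order, chosen to match the example in the description above.
labels = ["neg", "neu", "pos"]
decoded = decode_indices([0, 1, 2, 1, 0], labels)
print(decoded)
# ['neg', 'neu', 'pos', 'neu', 'neg']
```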

forward(*args, **kwargs)

Execute all callable layers in self.apply_layers

Parameters

args – arbitrary positional arguments
kwargs – arbitrary keyword arguments

Returns

get_labels_set()

Get labels set

Returns: List of labels
Return type: List

get_server()

Serving model

Returns

get_vocab()

Get vocabulary

Returns: Vocabulary as a list
Return type: List

keyword_arguments()

Return the pipeline's protected attributes and their values as a dictionary.

Returns: key-value pairs of protected attributes
Return type: Dictionary

static load(model_path: str)

Load model from disk

Parameters: model_path – path to pre-trained model
Returns

predict(x: Optional[list], *args, **kwargs)

Predict target polarities for a list of given input texts

Parameters

x – List of input texts
args – arbitrary positional arguments
kwargs – arbitrary keyword arguments

Returns

List of labels corresponding to given input texts

Return type

List

save(save_path: str)

Save model to disk

Parameters: save_path – path to saved model
Returns

to(device)

To device

Parameters: device –
Returns