Pipeline
A Pipeline is a sequence of callable layers (DataLayer, ClassifierLayer). The layers are executed sequentially on the given input (a text file), and the output of the pipeline is the output of the last executed layer.
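Conceptually, running the layers in order and returning the last output is a left fold over the layer list. The sketch below illustrates that idea in plain Python. It assumes each layer takes the previous layer's output as its only argument. run_layers, split_words and count_words are hypothetical stand-ins, not sentivi's DataLayer or ClassifierLayer.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence


def run_layers(layers: Sequence[Callable[[Any], Any]], x: Any) -> Any:
    """Feed x to the first layer, each output to the next; return the last output."""
    # Assumption: every layer consumes the previous layer's output
    return reduce(lambda acc, layer: layer(acc), layers, x)


# Hypothetical stand-in layers (not sentivi classes)
def split_words(text: str) -> List[str]:
    return text.lower().split()


def count_words(words: List[str]) -> int:
    return len(words)


print(run_layers([split_words, count_words], "Son dep, mui thom"))  # 4
```

With an empty layer list, reduce returns the input unchanged, so an empty pipeline acts as the identity.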
A Pipeline is created with its default constructor. Callable layers can either be passed to the constructor all at once or added one at a time with the append method.
For example:
from sentivi import Pipeline
from sentivi.data import DataLoader, TextEncoder
from sentivi.classifier import SVMClassifier
from sentivi.text_processor import TextProcessor
text_processor = TextProcessor(methods=['word_segmentation', 'remove_punctuation', 'lower'])
pipeline = Pipeline(DataLoader(text_processor=text_processor, n_grams=3),
                    TextEncoder(encode_type='one-hot'),
                    SVMClassifier(num_labels=3))
or
pipeline = Pipeline()
pipeline.append(DataLoader(text_processor=text_processor, n_grams=3))
pipeline.append(TextEncoder(encode_type='one-hot'))
pipeline.append(SVMClassifier(num_labels=3))
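Both construction styles end up with the same ordered list of layers. A minimal sketch of that equivalence uses a hypothetical ToyPipeline class, which is not sentivi's implementation. The attribute name apply_layers is borrowed from the forward method's docstring, and the chaining in __call__ is an assumption made for illustration.

```python
from typing import Any, Callable, List


class ToyPipeline:
    """Hypothetical stand-in for a layer container; not sentivi's Pipeline."""

    def __init__(self, *layers: Callable[..., Any]) -> None:
        # Layers passed at construction time are stored in the given order
        self.apply_layers: List[Callable[..., Any]] = list(layers)

    def append(self, layer: Callable[..., Any]) -> None:
        # An appended layer runs after all previously registered layers
        self.apply_layers.append(layer)

    def __call__(self, x: Any) -> Any:
        # Run every layer in order and return the last layer's output
        for layer in self.apply_layers:
            x = layer(x)
        return x


a = ToyPipeline(str.strip, str.upper)
b = ToyPipeline()
b.append(str.strip)
b.append(str.upper)
print(a.apply_layers == b.apply_layers)  # True
print(a("  ok  "), b("  ok  "))  # OK OK
```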
To execute the pipeline, call it with a corpus (text file). By default, the text file must follow sentivi's own format, in which training samples are separated by a double newline character (\n\n):
#corpus.txt
polarity_01
sentence_01

polarity_02
sentence_02
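A corpus in this format can be split into samples with a few lines of Python. The sketch assumes the first line of each sample holds the polarity and the remaining line(s) hold the sentence. parse_corpus is a hypothetical helper, not part of sentivi, and sentivi's DataLoader may handle edge cases differently.

```python
from typing import List, Tuple


def parse_corpus(text: str) -> List[Tuple[str, str]]:
    """Split a corpus string into (polarity, sentence) pairs."""
    samples = []
    # Samples are separated by a double newline
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue  # tolerate extra blank lines between samples
        # First line is the polarity; everything after it is the sentence
        polarity, _, sentence = block.partition("\n")
        samples.append((polarity.strip(), sentence.strip()))
    return samples


corpus = "polarity_01\nsentence_01\n\npolarity_02\nsentence_02\n"
print(parse_corpus(corpus))
# [('polarity_01', 'sentence_01'), ('polarity_02', 'sentence_02')]
```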
The pipeline also accepts arbitrary keyword arguments when it is called; these arguments are passed through to the execution function of each layer. Training results are reported as text in the format of sklearn.metrics.classification_report.
results = pipeline(train='train.txt', test='test.txt')
#results
Training classifier...
Testing classifier...
Saved classifier model to ./weights/svm.sentivi
Training results:
              precision    recall  f1-score   support

           0       1.00      0.00      0.00         1
           1       0.75      1.00      0.86         3
           2       1.00      1.00      1.00         2

    accuracy                           0.83         6
   macro avg       0.92      0.67      0.62         6
weighted avg       0.88      0.83      0.76         6

Test results:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         1
           2       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
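The keyword arguments in the call above (train, test) can be pictured as being handed to every layer, with each layer using only the keys it understands. A minimal sketch under two assumptions: every layer accepts and ignores unknown keyword arguments, and each layer also receives the previous layer's output. load_layer, report_layer and forward here are hypothetical and do not reproduce sentivi's internals.

```python
from typing import Any, Callable, Dict, List, Optional


def load_layer(x: Optional[Any], train: str = "", test: str = "", **kwargs: Any) -> Dict[str, str]:
    # Hypothetical first layer: reads only the file paths it understands
    return {"train": train, "test": test}


def report_layer(x: Dict[str, str], verbose: bool = False, **kwargs: Any) -> str:
    # Hypothetical last layer: train/test land in **kwargs and are ignored here
    text = f"trained on {x['train']}, tested on {x['test']}"
    return text.upper() if verbose else text


def forward(layers: List[Callable[..., Any]], **kwargs: Any) -> Any:
    x = None
    for layer in layers:
        # Each layer receives the previous output plus every keyword argument
        x = layer(x, **kwargs)
    return x


print(forward([load_layer, report_layer], train="train.txt", test="test.txt"))
# trained on train.txt, tested on test.txt
```

Accepting **kwargs in every layer is what lets a single call carry options meant for different layers without each layer having to know about the others.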
To predict the polarity of new texts:
predict_results = pipeline.predict(['hàng ok đầu tuýp có một số không vừa ốc siết. chỉ được một số đầu thôi .cần '
                                    'nhất đầu tuýp 14 mà không có. không đạt yêu cầu của mình sử dụng',
                                    'Son đẹpppp, mùi hương vali thơm nhưng hơi nồng, chất son mịn, màu lên chuẩn, '
                                    'đẹppppp'])
print(predict_results)
print(f'Decoded results: {pipeline.decode_polarity(predict_results)}')
[2 1]
Decoded results: ['#NEG', '#POS']
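decode_polarity maps the numeric predictions back to label strings. A minimal sketch of such a mapping assumes each numeric polarity is an index into the label set, which is what the decode_polarity docstring example ([0, 1, 2, 1, 0] to ['neg', 'neu', 'pos', 'neu', 'neg']) suggests. decode is a hypothetical helper, and the label order is illustrative only.

```python
from typing import List, Sequence


def decode(x: Sequence[int], labels: Sequence[str]) -> List[str]:
    """Map each numeric polarity to the label at that position."""
    # int() also accepts NumPy integer scalars, e.g. from an array like [2 1]
    return [labels[int(i)] for i in x]


labels = ["neg", "neu", "pos"]  # illustrative label order
print(decode([0, 1, 2, 1, 0], labels))
# ['neg', 'neu', 'pos', 'neu', 'neg']
```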
For persistence, a pipeline can be saved and loaded later:
pipeline.save('./weights/pipeline.sentivi')
_pipeline = Pipeline.load('./weights/pipeline.sentivi')
predict_results = _pipeline.predict(['hàng ok đầu tuýp có một số không vừa ốc siết. chỉ được một số đầu thôi .cần '
                                     'nhất đầu tuýp 14 mà không có. không đạt yêu cầu của mình sử dụng',
                                     'Son đẹpppp, mùi hương vali thơm nhưng hơi nồng, chất son mịn, màu lên chuẩn, '
                                     'đẹppppp'])
print(predict_results)
print(f'Decoded results: {_pipeline.decode_polarity(predict_results)}')
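The on-disk format of a saved pipeline is not described on this page. To illustrate only the save-then-load round trip, the sketch below persists a hypothetical TinyModel with Python's standard pickle module. Whether sentivi uses pickle internally is an assumption, not something stated here.

```python
import os
import pickle
import tempfile
from typing import List


class TinyModel:
    """Hypothetical object standing in for a trained pipeline."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = list(labels)

    def save(self, save_path: str) -> None:
        # Serialize only plain-data state, not the class itself
        with open(save_path, "wb") as f:
            pickle.dump({"labels": self.labels}, f)

    @staticmethod
    def load(model_path: str) -> "TinyModel":
        # Rebuild a fresh instance from the stored state
        with open(model_path, "rb") as f:
            state = pickle.load(f)
        return TinyModel(**state)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    TinyModel(["#NEG", "#POS"]).save(path)
    restored = TinyModel.load(path)
    print(restored.labels)  # ['#NEG', '#POS']
```

Storing a plain dictionary rather than the object keeps the file loadable even if the class is later moved to another module.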
class sentivi.Pipeline(*args, **kwargs)
Pipeline instance.
__init__(*args, **kwargs)
Initialize a Pipeline instance.
- Parameters
args – callable layers (DataLayer, ClassifierLayer) to add to the pipeline, in execution order
kwargs – arbitrary keyword arguments
append(method)
Append a callable layer to the end of the pipeline.
- Parameters
method – callable layer (DataLayer or ClassifierLayer)
- Returns
None
decode_polarity(x: Optional[list])
Decode numeric polarities into label polarities.
- Parameters
x – list of numeric polarities (e.g. [0, 1, 2, 1, 0])
- Returns
List of label polarities (e.g. ['neg', 'neu', 'pos', 'neu', 'neg'])
- Return type
List
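forward(*args, **kwargs)
Execute all callable layers in self.apply_layers sequentially.
- Parameters
args – arbitrary positional arguments
kwargs – arbitrary keyword arguments, passed through to each layer
- Returns
Output of the last executed layer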
get_labels_set()
Get the set of labels.
- Returns
List of labels
- Return type
List
get_server()
Get a server for serving the model.
get_vocab()
Get the vocabulary.
- Returns
Vocabulary as a list
- Return type
List
keyword_arguments()
Return the pipeline's protected attributes and their values as a dictionary.
- Returns
Key-value pairs of protected attributes
- Return type
Dictionary
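A method like keyword_arguments can be written as a filter over the instance's attribute dictionary. The sketch assumes "protected" follows the Python single-leading-underscore convention. ToyLayer and its attributes are hypothetical and are not taken from sentivi.

```python
from typing import Any, Dict


class ToyLayer:
    """Hypothetical object with a mix of protected and public attributes."""

    def __init__(self) -> None:
        self._n_grams = 3              # protected (leading underscore)
        self._encode_type = "one-hot"  # protected
        self.public_flag = True        # public, so excluded below

    def keyword_arguments(self) -> Dict[str, Any]:
        # Keep only instance attributes whose names start with an underscore
        return {k: v for k, v in vars(self).items() if k.startswith("_")}


print(ToyLayer().keyword_arguments())
# {'_n_grams': 3, '_encode_type': 'one-hot'}
```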
static load(model_path: str)
Load a saved pipeline from disk.
- Parameters
model_path – path to the pre-trained pipeline
- Returns
Loaded Pipeline instance
predict(x: Optional[list], *args, **kwargs)
Predict the target polarity of each given input text.
- Parameters
x – list of input texts
args – arbitrary positional arguments
kwargs – arbitrary keyword arguments
- Returns
Numeric polarity labels corresponding to the given input texts (use decode_polarity to map them to label names)
- Return type
List
save(save_path: str)
Save the pipeline to disk.
- Parameters
save_path – path where the pipeline is saved
to(device)
Move the pipeline to the given device.
- Parameters
device – target device