LanguageFlow¶
⚠️ WARNING¶
⚠️ This project is archived and no longer maintained ⚠️
⏩ As of underthesea v1.2.0, languageflow has been merged into underthesea, so you can enjoy a better underthesea without the languageflow dependency.
⚰️ RIP Languageflow
Old Readme¶
Data loaders and abstractions for text and NLP
Requirements¶
Install dependencies
$ pip install future tox
$ pip install python-crfsuite==0.9.5
$ pip install Cython
$ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
$ pip install xgboost==0.82
Installation¶
$ pip install languageflow
Components¶
- Transformers: NumberRemover, CountVectorizer, TfidfVectorizer
- Models: SGDClassifier, XGBoostClassifier, KimCNNClassifier, FastTextClassifier, CRF
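To give a feel for what a text transformer in this list does, here is a minimal standard-library sketch of number removal. It is only an illustration: `remove_numbers` and `transform` are hypothetical names, and the rule used (drop digit runs, including inner `.` or `,` separators) is an assumption rather than the documented behavior of languageflow's NumberRemover.

```python
import re

# Hypothetical stand-in, NOT languageflow's NumberRemover implementation.
# Assumed rule: a number is a run of digits, optionally with '.' or ','
# separators between digit groups (e.g. "11.000" or "3,5").
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def remove_numbers(text):
    """Replace each number with a space, then collapse repeated whitespace."""
    without_numbers = _NUMBER.sub(" ", text)
    return " ".join(without_numbers.split())


def transform(documents):
    """Apply remove_numbers to every document in a list."""
    return [remove_numbers(doc) for doc in documents]
```

For example, `remove_numbers("Phí 11.000 đồng mỗi tháng")` keeps the words and drops the amount.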
Data¶
Download a dataset using the download command
$ languageflow download DATASET
List all datasets
$ languageflow list
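If you prefer to drive these two commands from a script, the sketch below wraps them with `subprocess`. The helper names `languageflow_command` and `run_languageflow` are hypothetical and not part of the package; the only thing taken from this README is the command line itself (`languageflow download DATASET` and `languageflow list`).

```python
import shutil
import subprocess


def languageflow_command(*args):
    """Build the argv list for a CLI call, e.g. ('download', 'UTS2017_BANK')."""
    return ["languageflow", *args]


def run_languageflow(*args):
    """Run the CLI and return its stdout.

    Raises RuntimeError if the executable is not on PATH, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("languageflow") is None:
        raise RuntimeError("languageflow CLI not found on PATH")
    result = subprocess.run(
        languageflow_command(*args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With the package installed, `run_languageflow("list")` would return whatever the list command prints.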
Datasets¶
The datasets module currently contains:
- Tagged: VLSP2018-NER, VTB-CHUNK*, VLSP2016-NER*, VLSP2013-POS*, VLSP2013-WTK*
- Categorized: AIVIVN2019_SA*, VLSP2018_SA*, UTS2017_BANK, VLSP2016_SA*, VNTC
- Plaintext: VNESES, VNTQ_SMALL, VNTQ_BIG
Caution (*): For datasets with a closed license, you must provide the download URL yourself
Example¶
Download the UTS2017_BANK dataset
$ languageflow download UTS2017_BANK
Use the UTS2017_BANK dataset
>>> from languageflow.data_fetcher import DataFetcher, NLPData
>>> corpus = DataFetcher.load_corpus(NLPData.UTS2017_BANK_SA)
>>> print(corpus)
CategorizedCorpus: 1780 train + 197 dev + 494 test sentences
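As a quick sanity check on the printed summary, the split sizes can be turned into percentages. This sketch works on the summary string shown above rather than on the corpus object, since the corpus attributes are not documented here; the variable names are illustrative.

```python
import re

summary = "CategorizedCorpus: 1780 train + 197 dev + 494 test sentences"

# Pull "<count> <split>" pairs out of the printed summary line.
counts = {name: int(n) for n, name in re.findall(r"(\d+) (train|dev|test)", summary)}

total = sum(counts.values())
# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total, shares)
```

The corpus holds 2471 sentences in total, split roughly 72% train, 8% dev and 20% test.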