ParseHub
Beautiful Soup
Selenium
CORPRO (庫博中文獨立語料庫分析工具)
OpenRefine
Source: Web Scraper (2020). Data Cleaning with OpenRefine
CKIP Tagger
In recent years technology on Chinese classic text recognition has been more advanced. Few platforms have been developed for individual use and help to boost related research with Chinese classics.
古籍酷AI服務 (https://ocr.gj.cool/)
Developed by Master Xianchao (賢超法師) from Longchuan Temple (龍泉寺) in Beijing in providing free OCR service on Chinese classics images.
中研院文字辨識與校對平台 (https://ocr.ascdc.tw)
The platform is developed by Academia Sinica Center for Digital Cultures (ASCDC), Taiwan using image processing and deep learning technology. It includes image processing, textual detection and recognition with fine-tuning from user feedback.
Chinese Historical documents Automatic Transcription (CHAT) models (https://github.com/colibrisson/CHAT_models#chinese-historical-documents-automatic-transcription-chat-models)
This is part of an ongoing project by the Numerica Sinologica consortium in building open-source digital tools for pre-modern Chinese studies. Their repository contains segmentation and transcription models trained using the kraken OCR engine.
KNIME
Weka
Scikit-Learn
Tensorflow
Voyant Tools
Tableau
Matplotlib
Seaborn