The following text mining software is available at the Digital Scholarship Lab on the G/F of University Library. Designed to support digital scholarship research, the Lab is equipped with 10 workstations on which this software is installed for CUHK users. The complete list of specialist software is available here. CUHK users may also reserve the workstations for a longer period of time.
Python & Anaconda
Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Anaconda Individual Edition is the world’s most popular Python distribution platform for data science and machine learning.
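As a rough illustration of how Python can be used for a basic text mining task, the sketch below counts the most frequent words in a short passage using only the standard library. The function name `word_frequencies`, the sample sentence, and the rule of skipping words of one or two letters are assumptions made for this example; they are not taken from any Lab material. Note how indentation alone marks the body of the function, the loop, and the condition.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=3):
    """Return the top_n most common lowercase words in text (illustrative helper)."""
    # Lowercase the text, then extract runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        # The indented lines form the loop body; Python uses indentation
        # rather than braces to mark blocks.
        if len(word) > 2:  # assumption: ignore very short words such as "a" or "in"
            counts[word] += 1
    return counts.most_common(top_n)


# Hypothetical sample text, used only for demonstration.
sample = "Text mining turns text into data. Mining data reveals patterns in text."
print(word_frequencies(sample))
```

For real projects, the Anaconda distribution bundles data science libraries that would typically replace this hand-written tokenising step.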
Owing to the distinctive features of Chinese characters, a few tools have been developed specifically for analysing Chinese texts. Below are two examples:
Developed at National Taiwan University (NTU), this tool was originally created for research projects by Prof. Chueh Ho-chia of the Department of Bio-industry Communication and Development. It has been widely recommended to other researchers and students for its functionality in analysing Chinese texts with network analysis. The software is free to download and use.
MARKUS helps analyse Chinese and Korean texts by automatically tagging personal names, place names, temporal references, bureaucratic offices, etc. in texts uploaded by users. It was developed as part of the project “Communication and Empire: Chinese Empires in Comparative Perspective”, funded by the European Research Council.
OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
KNIME is a free and open-source data analytics platform. It integrates various components for machine learning and data mining through its modular data pipelining "Lego of Analytics" concept.
Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.
NVivo is a software package for qualitative analysis. It can work with different text formats and even multimedia materials. NVivo helps users organise, analyse and find insights in unstructured or qualitative data such as interviews, open-ended survey responses, articles, social media and web content.
Voyant Tools is a free web-based tool that lets users quickly visualise and analyse text data.
With its user-friendly interface and similarities to Excel, SPSS is a popular tool for statistical analysis in the social sciences.
(Source: Statistics How To, http://www.statisticshowto.com/how-to-make-a-bar-chart-in-spss/)
Digital Scholar Lab (CUHK subscribed database)
Developed by Gale, it is a cloud-based platform for applying digital humanities analysis tools to customised raw text datasets drawn from our subscribed Gale Primary Sources collections.