Text-mining and Analysis in Digital Scholarship Research: Software & Tools in CUHK Library

Text Mining Software Available at Digital Scholarship Lab, CUHK Library

The following text-mining software is available at the Digital Scholarship Lab on the G/F of the University Library. Designed to support digital scholarship research, the Lab is equipped with 10 workstations that have this software installed for CUHK users. The complete list of specialist software is available here. CUHK users may also reserve a workstation for a longer period of time.

Programming Language and Software

Python & Anaconda
Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Anaconda Individual Edition is the world’s most popular Python distribution platform for data science and machine learning.
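
As a rough illustration of the readability and significant indentation mentioned above, here is a minimal sketch of a common first step in text mining, counting word frequencies. It uses only the Python standard library; the function name `word_frequencies` and the sample sentence are made up for this example and are not part of any software installed in the Lab.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=3):
    """Return the top_n most common lowercase words in text."""
    # Lowercase the text, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical sample text, for illustration only.
sample = "The lab supports text mining. Text mining turns text into data."

# The indented line below is the body of the loop.
for word, count in word_frequencies(sample):
    print(word, count)
```

The regular expression here is deliberately simple and only suits plain English text; real projects usually rely on a dedicated tokenizer.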

R & RStudio 
Used by the majority of academic statisticians and social scientists, R is free and open-source software for statistical computing and graphics.

Chinese Text Analysing Tools

Because of the distinctive features of written Chinese, a few tools have been developed specifically for analysing Chinese texts. Below are two examples:

CORPRO 庫博

Developed at National Taiwan University (NTU), this tool was originally created for research projects led by Prof. Chueh Ho-chia of the Department of Bio-industry Communication and Development. It has since been widely recommended to other researchers and students for its ability to analyse Chinese texts with network analysis. The software is free to download and use.

MARKUS

MARKUS helps analyse Chinese and Korean texts by automatically tagging personal names, place names, temporal references, bureaucratic offices, etc. in the texts uploaded by users. It was developed as part of the project “Communication and Empire: Chinese Empires in Comparative Perspective”, funded by the European Research Council.
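
One reason Chinese texts call for dedicated tools is that words are not separated by spaces, so splitting on whitespace does not yield words. The sketch below shows this with overlapping character bigrams, a crude stand-in for proper word segmentation. It is not how CORPRO or MARKUS work internally; the function name `char_bigrams` and the sample sentence are assumptions made for this example.

```python
from collections import Counter


def char_bigrams(text):
    """Return overlapping two-character sequences of CJK ideographs."""
    # Keep characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF);
    # anything else (punctuation, Latin letters, spaces) becomes a break marker.
    chars = [c if "\u4e00" <= c <= "\u9fff" else None for c in text]
    # Pair each character with its successor, skipping pairs that cross a break.
    return [a + b for a, b in zip(chars, chars[1:]) if a and b]


# Hypothetical sample sentence, for illustration only.
sentence = "香港中文大學圖書館"

# Whitespace splitting leaves the whole sentence as a single token.
print(sentence.split())

# Character bigrams give smaller units that can be counted or searched.
print(Counter(char_bigrams(sentence)).most_common(3))
```

Many bigrams produced this way are not real words, which is why purpose-built segmentation and tagging tools are generally preferred for research use.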

Text Analyzing Tools

OpenRefine
OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

KNIME
KNIME is a free and open-source data analytics platform. It integrates various components for machine learning and data mining through its modular data pipelining "Lego of Analytics" concept.

NVivo (Older PC version available on 1/F, University Library)
NVivo is software for qualitative analysis. It can work with different text formats and even multimedia materials. NVivo helps organize, analyze and find insights in unstructured or qualitative data such as interviews, open-ended survey responses, articles, social media and web content.

Voyant Tools
Voyant Tools is a free web-based tool allowing users to quickly visualise and analyze text data.

SPSS (Institutional license provided by ITSC)
With its user-friendly interface and similarities with Excel, SPSS is a popular tool for statistical analysis in social science. 

(Source: Statistics How To, http://www.statisticshowto.com/how-to-make-a-bar-chart-in-spss/)

Digital Scholar Lab (CUHK subscribed database)
Developed by Gale, Digital Scholar Lab is a cloud-based platform for applying digital humanities tools to customised raw-text datasets drawn from the Library's subscriptions to Gale Primary Sources collections.