
Research Data Management: RDM in Practice​

Data Collection

1. File Organization

You may have many projects or files. Organizing your files well saves time when you need to find and manage them later. Creating subfolders in your storage is a simple way to organize files effectively.

There are many ways to organize the files in your storage. Some possible schemes are:

  • By project​
  • By researcher​
  • By date​
  • By research notebook number​
  • By sample number​
  • By experiment type/instrument​
  • By data type​
  • By any combination of the above
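
As an illustration of combining schemes, the sketch below nests folders by project, then researcher, then date, then data type. The function names, the nesting order, and the sample values are assumptions made for this example, not a prescribed layout.

```python
from pathlib import Path


def project_folder(root, project, researcher, date_yyyymmdd, data_type):
    """Build a nested path: root / project / researcher / date / data type."""
    return Path(root) / project / researcher / date_yyyymmdd / data_type


def create_project_folder(root, project, researcher, date_yyyymmdd, data_type):
    """Create the nested folders on disk (no error if they already exist)."""
    folder = project_folder(root, project, researcher, date_yyyymmdd, data_type)
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling create_project_folder("research_data", "soil_survey", "chan", "20200103", "raw") would create the folders under research_data in the current working directory. Any other combination of the schemes above works the same way by changing the order of the path parts.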

 

2. File Naming Conventions

When you name files, it is advised that the names are:

  • Unique​
  • Indicative of what the file contains​
  • In line with your research structure​
  • Scannable​
    • Commonly understandable​
    • No spaces
    • Use underscores, hyphens, or camel case instead
      • Examples:
        • Underscores: research_data_management
        • Hyphens: research-data-management
        • Camel case (the first character of each word capitalized, with no spaces): ResearchDataManagement
  • Dates
    • Use yyyymmdd so that the computer sorts the files automatically in chronological order (year-month-day)
    • Preferred: yyyymmdd, which sorts in chronological order
      • Examples: 20180809.txt, 20190302.txt, 20200103.txt
    • Not preferred: ddmmyyyy, which gives a messy date order
      • Examples: 02032019.txt, 03012020.txt, 09082018.txt

  • Not too lengthy, but not too short
  • Without special characters (e.g. $, %, @)
  • Including versioning when necessary
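
The naming advice above can be sketched in a few lines of Python. This is only a minimal example: the pattern date_description_version (for instance 20200103_soil-ph_v02.csv) and the function names are assumptions chosen for illustration, not a required convention.

```python
import re
from datetime import date

# Letters, digits, underscores, and hyphens only, followed by one extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def build_filename(day, description, version, extension):
    """Return a name such as 20200103_soil-ph_v02.csv (hypothetical pattern)."""
    # Keep only letters and digits from the description, joined by hyphens.
    words = re.findall(r"[A-Za-z0-9]+", description.lower())
    stem = f"{day:%Y%m%d}_{'-'.join(words)}_v{version:02d}"
    return f"{stem}.{extension.lstrip('.')}"


def is_safe_filename(name):
    """True if the name has no spaces or special characters such as $, %, @."""
    return SAFE_NAME.fullmatch(name) is not None


# yyyymmdd names sort chronologically with a plain alphabetical sort.
preferred = sorted(["20200103.txt", "20180809.txt", "20190302.txt"])
# ddmmyyyy names sort by day first, so the years end up out of order.
not_preferred = sorted(["09082018.txt", "02032019.txt", "03012020.txt"])
```

Here preferred ends up ordered 2018, 2019, 2020, while not_preferred ends up ordered 2019, 2020, 2018, which is the messy date order described above.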

Data Storage and Backup

Here are some golden rules for storage and backup:​

1. LOCKSS: Lots of Copies Keep Stuff Safe​

2. 3-2-1 backup

  • Keep at least 3 copies of a file​
  • On at least 2 different media​
  • With at least 1 offsite​

3. Good storage​

  • Computer​
  • External hard drives​
  • Local drives and servers​
  • Cloud storage with good security​

4. Poor storage​

  • Thumb drives: Thumb drives are easily stolen or lost
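
To make the 3-2-1 rule concrete, the sketch below checks a list of copies against its three conditions. The BackupCopy class, its field names, and the sample locations are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str  # where the copy lives, e.g. "office computer" (made-up label)
    medium: str    # kind of storage, e.g. "internal disk", "external drive", "cloud"
    offsite: bool  # True if the copy is kept away from the main workplace


def satisfies_3_2_1(copies):
    """Check: at least 3 copies, on at least 2 media, with at least 1 offsite."""
    media = {copy.medium for copy in copies}
    has_offsite = any(copy.offsite for copy in copies)
    return len(copies) >= 3 and len(media) >= 2 and has_offsite


copies = [
    BackupCopy("office computer", "internal disk", offsite=False),
    BackupCopy("lab cabinet", "external drive", offsite=False),
    BackupCopy("cloud account", "cloud", offsite=True),
]
```

With the three sample copies, satisfies_3_2_1(copies) is true. Dropping the cloud copy leaves two copies with none offsite, so the check fails.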

Data Preservation

When data are preserved in a data repository, some file formats are preferred because software and file formats can become obsolete. Here is a reference:

Data type | Preferred format | Description | Non-preferred format
Text | .txt, .rtf, .xml, .pdf | Plain text format, Rich text format, eXtensible Mark-up Language, PDF | Word
Tabular | .csv, .tsv | Comma-separated values, Tab-separated values | Excel
Image | .tif, .svg, .jp2 | TIFF, SVG, JPEG2000 | GIF, JPEG
Audio | .mp3, .wav | MP3, WAVE |
Video | .mp4, .avi | MPEG-4, AVI |
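
Before depositing, it can help to scan a folder listing for files that are not in a preferred format. The sketch below uses only the preferred extensions from the reference above; the function names are assumptions made for this example.

```python
from pathlib import Path

# Preferred extensions from the reference above, mapped to their data type.
PREFERRED_FORMATS = {
    ".txt": "Text", ".rtf": "Text", ".xml": "Text", ".pdf": "Text",
    ".csv": "Tabular", ".tsv": "Tabular",
    ".tif": "Image", ".svg": "Image", ".jp2": "Image",
    ".mp3": "Audio", ".wav": "Audio",
    ".mp4": "Video", ".avi": "Video",
}


def preferred_data_type(filename):
    """Return the data type if the extension is preferred, otherwise None."""
    return PREFERRED_FORMATS.get(Path(filename).suffix.lower())


def files_to_convert(filenames):
    """List the files whose extension is not in the preferred list."""
    return [name for name in filenames if preferred_data_type(name) is None]
```

For example, files_to_convert(["results.csv", "results.xlsx", "plot.tif"]) returns only results.xlsx, which could then be exported to .csv before deposit.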
 

Data Documentation

Throughout the research life cycle, researchers should document the provenance, content, and ideas behind the data in order to support the creation of metadata and readme files in later stages of the data life cycle.

 

1. Data Documentation: Metadata

To support the discovery of data, metadata should be put alongside the data.​

Metadata is data that describes other data. Metadata ensures data can be discovered, identified, managed, retrieved, and reused. It is vital to successful curation, although its creation takes time.

Metadata should be structured, as well as human- and machine-readable. It should be organized in a metadata standard. Dublin Core (DC) and Data Documentation Initiative (DDI) are two examples of common metadata standards.

When you choose a data repository to deposit your data, you will be asked to describe your data. The form you fill in and the descriptions shown in search results present this metadata in human-readable form. When the data repository processes the metadata, it transforms it into a machine-readable format, such as a markup language. A well-developed repository has its chosen metadata standard(s). Some metadata standards used by repositories around the world are listed at https://www.re3data.org/.
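
To illustrate the step from a human-readable description to machine-readable metadata, the sketch below writes a few Dublin Core elements as XML. The namespace is the standard Dublin Core element set; the wrapper element named metadata, the function name, and the sample record are assumptions for this example. A real repository defines its own schema and does this conversion for you.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_dublin_core_xml(record):
    """Serialize a dict of Dublin Core element names and values to XML text."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up description of a dataset, as a depositor might type it into a form.
record = {
    "title": "Soil pH survey",
    "creator": "Chan, Tai Man",
    "date": "2020-01-03",
    "format": "text/csv",
}
xml_text = to_dublin_core_xml(record)
```

Each key in the record becomes a tagged element such as dc:title, which software can locate and index without guessing which line of a description is the title.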

 

2. Data Documentation: readme.txt

A readme.txt file tells prospective users how to open the data files and how to understand and reuse their content. It is advisable to deposit it along with the data in a data repository. A readme file is usually in .txt format and contains:

  • Project name​
  • Project summary​
  • Previous work on the project and location of that information​
  • Funding information​
  • Primary contact information​
  • Your name and title, if you are not the primary contact
  • Other people working on the project​
  • Location of data and supporting information for the project​
  • Organization and naming conventions used for the data​
  • The relationship between the files that make up the dataset​
  • The format(s) of the files in the dataset​
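
The items above can be turned into a readme.txt skeleton with a short script. This is one possible sketch: the "field: value" layout, the TODO placeholder, and the function name are assumptions for this example, not a required readme format.

```python
# Field labels taken from the list above, in the same order.
README_FIELDS = [
    "Project name",
    "Project summary",
    "Previous work on the project and location of that information",
    "Funding information",
    "Primary contact information",
    "Your name and title, if you are not the primary contact",
    "Other people working on the project",
    "Location of data and supporting information for the project",
    "Organization and naming conventions used for the data",
    "The relationship between the files that make up the dataset",
    "The format(s) of the files in the dataset",
]


def build_readme(answers):
    """Return readme text with one 'field: value' line per item.

    Items missing from answers are marked TODO so gaps stay visible.
    """
    lines = [f"{field}: {answers.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Made-up answers for illustration.
readme_text = build_readme({
    "Project name": "Soil pH survey",
    "The format(s) of the files in the dataset": ".csv",
})
```

The resulting text can be saved as readme.txt next to the data files, and the remaining TODO lines filled in before deposit.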

 

3. Data Documentation: CUHK Research Data Repository

In the CUHK Research Data Repository, each dataset should be accompanied by some Citation Metadata details to facilitate the discovery of the research data. Depending on the nature of the research data, the data owner can provide more details using other metadata templates:

  • Astronomy and Astrophysics Metadata
  • Geospatial Metadata
  • Life Sciences Metadata
  • Social Science and Humanities Metadata

The data owner can also provide a readme.txt along with the deposited data to help potential users understand and reuse the data.

Data Publishing and Sharing

When research outputs undergo peer review, publishers sometimes request access to the data in order to validate the research results and prepare for data sharing. You can deposit your data in a data repository temporarily in private mode to protect your intellectual property before publishing your research outputs. You will be provided with a private URL to share with your publishers and trusted parties. For details on private data deposit at CUHK Research Data Repository, please refer to this page.

When your research outputs are ready to be published, you can publish the data for open access to support data sharing and reuse.