LibGuides: Research Data Management: RDM in Practice

Data Collection

1. File Organization

You may have many projects or files. Organizing your files well saves time when you need to find and manage them later. Creating subfolders in your storage is a simple way to keep files organized.

There are many ways to organize the files in your storage. Some possible schemes are:

By project
By researcher
By date
By research notebook number
By sample number
By experiment type/instrument
By data type
By any combination of the above
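
As an illustration of combining schemes, the following minimal sketch creates a folder tree organized by project, then data type, then date. The function name build_project_tree, the folder order, and the example labels ("soil_survey", "raw", "processed") are assumptions made for this example, not a prescribed layout.

```python
import tempfile
from pathlib import Path


def build_project_tree(root, project, data_types, dates):
    """Create <root>/<project>/<data_type>/<yyyymmdd>/ folders and return them."""
    created = []
    for data_type in data_types:
        for date in dates:
            folder = Path(root) / project / data_type / date
            # parents=True creates missing parent folders;
            # exist_ok=True makes the call safe to repeat.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for your real storage location.
    with tempfile.TemporaryDirectory() as tmp:
        folders = build_project_tree(
            tmp, "soil_survey", ["raw", "processed"], ["20200103"]
        )
        for folder in folders:
            print(folder.relative_to(tmp))
```

Any other combination (for example by researcher, then by instrument) can be produced by changing the order of the path components.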

2. File Naming Conventions

When you name a file, make sure the name is:

Unique
Indicative of what the file contains
In line with your research structure
Scannable
- Commonly understandable
- Free of spaces
- Written with underscores, hyphens, or camel case instead of spaces
  - Examples:
    - Underscores: research_data_management
    - Hyphens: research-data-management
    - Camel case (the first character of each word capitalized, with no spaces): ResearchDataManagement

Dates

Chronological order by yyyymmdd for the computer to sort the files automatically by year-month-day

	Preferred	Not preferred
Format	Chronological order by yyyymmdd	Naming files by ddmmyyyy gives a messy date order
Example	20180809.txt 20190302.txt 20200103.txt	02032019.txt 03012020.txt 09082018.txt

Not too lengthy, but not too short
- Without special characters (e.g. $, %, @)
- Including versioning when necessary
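
The naming advice above can be sketched in code. The pattern yyyymmdd_description_vNN.ext used here is one possible convention assumed for illustration, and the helper names make_file_name and is_valid_name are hypothetical. The last lines show why yyyymmdd names sort chronologically while ddmmyyyy names do not.

```python
import re
from datetime import date

# Assumed convention: yyyymmdd_description_vNN.ext, using only letters,
# digits, hyphens, and underscores (no spaces or special characters).
NAME_PATTERN = re.compile(
    r"\d{8}_[A-Za-z0-9-]+(?:_[A-Za-z0-9-]+)*_v\d{2}\.[a-z0-9]+"
)


def make_file_name(day, description, version, extension):
    """Build a file name that follows the assumed convention."""
    # Replace each run of spaces or special characters with one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", description).strip("-")
    return f"{day:%Y%m%d}_{slug}_v{version:02d}.{extension}"


def is_valid_name(name):
    """Return True if the whole name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    name = make_file_name(date(2018, 8, 9), "soil pH (site A)", 1, "csv")
    print(name, is_valid_name(name))
    print(is_valid_name("my data$.txt"))

    # Plain alphabetical sorting puts yyyymmdd names in date order ...
    print(sorted(["20200103.txt", "20180809.txt", "20190302.txt"]))
    # ... but orders ddmmyyyy names by day first: 2019, 2020, 2018.
    print(sorted(["09082018.txt", "02032019.txt", "03012020.txt"]))
```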

Data Storage and Backup

Here are some golden rules for storage and backup:

1. LOCKSS: Lots of Copies Keep Stuff Safe

2. 3-2-1 backup

Keep at least 3 copies of a file
On at least 2 different media
With at least 1 offsite
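
The 3-2-1 rule can be expressed as a simple check over an inventory of copies. This is a minimal sketch: the Copy record, its field names, and the example locations are assumptions for illustration, and the check only tests the three conditions listed above. It does not verify that the copies are intact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # free-text label, e.g. "office desktop"
    medium: str    # e.g. "internal disk", "external drive", "cloud"
    offsite: bool  # True if stored away from the primary site


def satisfies_3_2_1(copies):
    """Check: at least 3 copies, on at least 2 media, with at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    inventory = [
        Copy("office desktop", "internal disk", offsite=False),
        Copy("lab cabinet", "external drive", offsite=False),
        Copy("institutional cloud", "cloud", offsite=True),
    ]
    print(satisfies_3_2_1(inventory))
```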

3. Good storage

Computer
External hard drives
Local drives and servers
Cloud storage with good security

4. Poor storage

Thumb drives: they are easily lost or stolen

Data Preservation

When data are preserved in a data repository, some file formats are preferred because software and file formats can become obsolete. Here is a reference:

Data type	Preferred format	Description	Non-preferred format
Text	.txt, .rtf, .xml, .pdf	Plain text format, Rich text format, eXtensible Mark-up Language, PDF	Word
Tabular	.csv, .tsv	Comma-separated values, Tab-separated values	Excel
Image	.tif, .svg, .jp2	TIFF, SVG, JPEG2000	GIF, JPEG
Audio	.mp3, .wav	MP3, WAVE
Video	.mp4, .avi	MPEG-4, AVI
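
Before depositing, a file list can be screened against the table above. The sketch below is a hypothetical helper (classify) that looks only at file extensions. The preferred extensions come from the table. The extensions mapped to Word, Excel, GIF, and JPEG are assumptions, since the table names those formats without listing extensions.

```python
from pathlib import Path

# Preferred extensions, taken from the reference table.
PREFERRED = {
    "text": {".txt", ".rtf", ".xml", ".pdf"},
    "tabular": {".csv", ".tsv"},
    "image": {".tif", ".svg", ".jp2"},
    "audio": {".mp3", ".wav"},
    "video": {".mp4", ".avi"},
}

# Assumed extensions for the non-preferred formats named in the table.
NON_PREFERRED = {
    ".doc": "Word", ".docx": "Word",
    ".xls": "Excel", ".xlsx": "Excel",
    ".gif": "GIF",
    ".jpg": "JPEG", ".jpeg": "JPEG",
}


def classify(file_name):
    """Return (status, detail) for a file name, based on its extension only."""
    ext = Path(file_name).suffix.lower()
    for data_type, extensions in PREFERRED.items():
        if ext in extensions:
            return ("preferred", data_type)
    if ext in NON_PREFERRED:
        return ("non-preferred", NON_PREFERRED[ext])
    return ("unknown", ext)


if __name__ == "__main__":
    for name in ["results.XLSX", "scan.tif", "notes.md"]:
        print(name, classify(name))
```

An "unknown" result only means the extension is not covered by the table; it says nothing about whether the format is suitable for preservation.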

Data Documentation

Throughout the research life cycle, researchers should document the provenance, the content, and the ideas behind the data, in order to support the creation of metadata and a readme file in later stages of the data life cycle.

1. Data Documentation: Metadata

To support the discovery of data, metadata should be put alongside the data.

Metadata is data that describes other data. It ensures that data can be discovered, identified, managed, retrieved, and reused. Metadata is vital to successful curation, although creating it takes time.

Metadata should be structured, as well as human- and machine-readable. It should be organized in a metadata standard. Dublin Core (DC) and Data Documentation Initiative (DDI) are two examples of common metadata standards.

When you choose a data repository for your deposit, you will be asked to describe your data. The form you fill in, and the descriptions shown in search results, present the metadata in human-readable form. When the data repository processes this metadata, it transforms it into a machine-readable format, such as a markup language. A well-developed repository has its chosen metadata standard(s). Some metadata standards used by repositories around the world are listed at https://www.re3data.org/.
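
To make the human-readable versus machine-readable distinction concrete, the following sketch turns a few form-style descriptions into a small XML record that uses Dublin Core element names. It is an illustration only: the wrapper element "metadata", the function name to_dublin_core, and the sample values are assumptions, and a real repository will apply its own schema and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    # "metadata" is an assumed wrapper element, not part of Dublin Core.
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values, as a depositor might type them in a form.
    record = {
        "title": "Soil pH measurements, site A",
        "creator": "Chan, Tai Man",
        "date": "2020-01-03",
        "format": "text/csv",
    }
    print(to_dublin_core(record))
```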

2. Data Documentation: readme.txt

A readme.txt file tells prospective users how to open the data files and how to understand and reuse their content. It is advisable to deposit it along with the data in a data repository. A readme file is usually in .txt format and contains:

Project name
Project summary
Previous work on the project and location of that information
Funding information
Primary contact information
Your name and title, if you are the primary contact
Other people working on the project
Location of data and supporting information for the project
Organization and naming conventions used for the data
The relationship between the files that make up the dataset
The format(s) of the files in the dataset
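
The list above can serve as a template. The sketch below generates a plain-text readme skeleton with one line per item and marks anything not yet filled in as TODO. The function name render_readme, the "Field: value" layout, and the sample value are assumptions for illustration; repositories may expect a different layout.

```python
# Field labels follow the list of recommended readme contents.
README_FIELDS = [
    "Project name",
    "Project summary",
    "Previous work on the project and location of that information",
    "Funding information",
    "Primary contact information",
    "Your name and title, if you are the primary contact",
    "Other people working on the project",
    "Location of data and supporting information for the project",
    "Organization and naming conventions used for the data",
    "The relationship between the files that make up the dataset",
    "The format(s) of the files in the dataset",
]


def render_readme(values):
    """Return readme text with one 'Field: value' line per recommended field."""
    lines = []
    for field in README_FIELDS:
        # Missing fields are marked so they are easy to find later.
        lines.append(f"{field}: {values.get(field, 'TODO')}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_readme({"Project name": "Soil survey"})
    print(text)
    # To save it next to the data, uncomment the following lines:
    # with open("readme.txt", "w", encoding="utf-8") as handle:
    #     handle.write(text)
```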

3. Data Documentation: CUHK Research Data Repository

In the CUHK Research Data Repository, each dataset should be accompanied by some details under Citation Metadata to facilitate the discovery of the research data. Depending on the nature of the research data, the data owner can provide more details using other metadata templates:

Astronomy and Astrophysics Metadata
Geospatial Metadata
Life Sciences Metadata
Social Science and Humanities Metadata

The data owner can also provide a readme.txt along with the deposited data to help potential users understand and reuse the data.

Data Publishing and Sharing

When research outputs undergo peer review, publishers sometimes request access to the data to validate the research results and to prepare for data sharing. You can deposit your data in a data repository temporarily in private mode in order to protect your intellectual property before publishing your research outputs. You will be provided with a private URL to your data for sharing with your publishers and trusted parties. For details on private data deposit at CUHK Research Data Repository, please refer to this page.

When your research outputs are ready to be published, you can publish the data for open access to support data sharing and reuse.
