Research Data Management: Data Deposit

Features of Data Repository

A data repository is a storage space for researchers to:

  • curate research data;
  • deposit data for peer review of data-related manuscripts; and
  • share data with other researchers for reuse.

A well-established data repository should be managed according to the TRUST principles:

  • Transparency
    • All potential users should be able to easily find and access information on the scope, target user community, policies, and capabilities of a TRUSTworthy data repository.
  • Responsibility
    • A TRUSTworthy data repository takes responsibility for the stewardship of the data holdings and for serving its user community.
  • User Focus
    • A TRUSTworthy data repository needs to focus on serving its target user community.
  • Sustainability
    • A TRUSTworthy data repository has to ensure uninterrupted access to its valuable data holdings for current and future user communities.
  • Technology
    • A TRUSTworthy data repository supports secure, persistent, and reliable services by software, hardware, and technical services of appropriate standards.

When choosing a data repository, check whether your funders, publishers, and collaborators have any requirements for data deposit. For instance, Springer Nature has its Data Deposition Guidance. A good data repository should also adhere to the TRUST principles.

Institutional Data Repository

An institutional data repository is for members of an institution to deposit data.

The CUHK Research Data Repository serves as the institutional research data repository for CUHK members to deposit their research data. It supports the deposit and curation of data from different fields, in different formats and file sizes.

CUHK Research Data Repository

The CUHK Research Data Repository (Data Repository) serves as an institutional data repository for the CUHK community to deposit research data and for users worldwide to discover and reuse the data. It is built on the open-source software Dataverse.

CUHK members can log in to the Data Repository with their CUHK OnePass credentials. They can create and manage their personal folders under their corresponding departmental folders. Data should be deposited inside personal folders.

Researchers can publish data in the Data Repository for open access. Each published dataset is assigned a unique and permanent identifier, a Digital Object Identifier (DOI). If you would like to share your data with publishers for peer review of data-related manuscripts, you can also get a private URL without publishing the data.
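As a small illustration of how a DOI acts as a permanent identifier, the sketch below turns a DOI into a link through the public doi.org resolver. The function name `doi_to_url` and the sample DOI `10.1234/example` are hypothetical placeholders, not identifiers issued by the Data Repository.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return a resolver link for a DOI such as '10.1234/example'."""
    doi = doi.strip()
    # Accept the common ways a DOI is written and strip the prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only.
link = doi_to_url("doi:10.1234/example")
```

Citing the resolver link rather than the repository page address keeps the reference valid even if the dataset's landing page moves.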

If students would like to create a personal folder and deposit data in the Data Repository, they must be granted editing rights by their supervisors beforehand.

For further details, please refer to the CUHK Research Data Repository Guide.

Types of Data Repository

Different types of data repositories serve different purposes and user groups. They can be divided into the following:

  • Institutional Data Repository
  • General-purpose Data Repositories
  • Discipline-specific Data Repositories

General-purpose Data Repositories

A general-purpose data repository is a subject-independent repository where the public or large user communities can deposit data.

Mendeley Data and Zenodo are general-purpose research data repositories that can be used where no discipline-specific solution is available. Figshare is another general-purpose data repository, which allows researchers to deposit up to 20 GB of data (with a single file size limit of 5 GB) free of charge.
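Before uploading, it can help to check a dataset against limits like the ones quoted above. The following is a minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and using the 20 GB total and 5 GB single-file figures from this guide as defaults. The function names are hypothetical and nothing here calls a repository's API.

```python
from pathlib import Path

GB = 10**9             # assumption: decimal gigabytes
TOTAL_LIMIT = 20 * GB  # total deposit limit quoted in this guide
FILE_LIMIT = 5 * GB    # single-file limit quoted in this guide


def check_sizes(sizes, total_limit=TOTAL_LIMIT, file_limit=FILE_LIMIT):
    """Return a list of problems for a mapping of file name -> size in bytes."""
    problems = [
        f"{name} exceeds the single-file limit"
        for name, size in sizes.items()
        if size > file_limit
    ]
    if sum(sizes.values()) > total_limit:
        problems.append("total size exceeds the storage limit")
    return problems


def folder_sizes(folder):
    """Collect the size in bytes of every file under a local folder."""
    return {str(p): p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
```

For a local dataset folder, `check_sizes(folder_sizes("my_dataset"))` returns an empty list when everything fits. Limits change over time, so confirm the current figures with the repository before relying on these defaults.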

Discipline-specific Data Repositories

Discipline-specific data repositories store data from specific fields and accommodate the data-deposit needs of a specific research community. Re3data.org provides information on more than 2,000 repositories. Below are some selected data repositories for reference.

Repository | Research Area
Dryad | Biological sciences, basic and applied; a repository for datasets associated with published articles
GenBank | The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences
ICPSR | Social sciences data
NOAA National Centers for Environmental Information (NCEI) | Climatic, geophysical and oceanographic data
Open Context | Archeological data
Protein Data Bank | Worldwide repository of information about the 3D structures of large biological molecules
QDR: Qualitative Data Repository | Social sciences data
tDAR | The Digital Archeological Record
UK Data Service | Social, economic and population data

Grant agencies such as the National Science Foundation (NSF) have compulsory data deposit policies. Some funders, such as the ESRC, require data to be deposited in specific data centers. Consult your funder or grant agency about the accessibility of datasets to make sure you fulfill their specific requirements.

Deposit your Software

Some repository solutions include additional features for hosting computer code. These can assist you in managing the source code during the development process: