A data repository is a storage space for researchers to:
A well-established data repository should be managed with the TRUST principles:
When choosing a data repository, check whether your funders, publishers, or collaborators have any requirements for data deposit. For instance, Springer Nature has its own Data Deposition Guidance. A good data repository should also adhere to the TRUST principles for data repositories.
An institutional data repository is for members of an institution to deposit data.
The CUHK Research Data Repository serves as the institutional research data repository for CUHK members to deposit their research data. It supports the deposit and curation of data from different fields, in different formats and file sizes.
The CUHK Research Data Repository (Data Repository) serves as an institutional data repository for the CUHK community to deposit research data and for users worldwide to discover and reuse the data. It is built on the open-source software Dataverse.
CUHK members can log into this Data Repository with their CUHK OnePass credentials. They can create and manage their personal folders under their corresponding departmental folders. Data should be deposited inside personal folders.
Researchers can publish their data in the Data Repository for open access. Each published dataset is assigned a unique and permanent identifier, a Digital Object Identifier (DOI). If you would like to share your data with publishers for peer review of data-related manuscripts, you can also get a private URL without publishing the data.
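As a small illustration of how a DOI works as a permanent link, the sketch below turns a DOI string into a resolvable web address using the public doi.org resolver. The function name `doi_to_url` and the sample identifier `10.1234/ABC-123` are made up for illustration and do not refer to any real dataset in the Data Repository. The check is deliberately minimal and does not cover every rule of DOI syntax.

```python
def doi_to_url(doi: str) -> str:
    """Return an https://doi.org/ link for a DOI given in a common form.

    Accepts a bare DOI ("10.1234/ABC-123"), a "doi:" prefixed form,
    or a full doi.org URL. Hypothetical helper for illustration only.
    """
    cleaned = doi.strip()
    # Strip a known prefix, if present, so only the bare DOI remains.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Minimal sanity check: DOIs start with "10." and contain a "/"
    # separating the registrant prefix from the suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    # The identifier below is a made-up example, not a real dataset.
    print(doi_to_url("doi:10.1234/ABC-123"))
```

Citing a dataset with the resulting link keeps the reference stable even if the repository later changes the address of the landing page.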
If students would like to create a personal folder and deposit data in the Data Repository, they must be granted editing rights by their supervisors beforehand.
For further details, please refer to the CUHK Research Data Repository Guide.
Different types of data repositories serve different purposes and user groups. They can be divided into the following types:
A general-purpose data repository is a subject-independent repository where the public or large user communities can deposit data.
Mendeley Data and Zenodo are general-purpose research data repositories that can be used when no discipline-specific solution is available. Figshare is another general-purpose data repository, which allows researchers to deposit up to 20 GB of data for free (with a single file size limit of 5 GB).
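Before uploading, it can help to check a planned deposit against the limits quoted above. The following is a minimal sketch that uses only the two figures mentioned here (20 GB in total, 5 GB per file). The function name `check_deposit`, the file names, and the use of decimal gigabytes (1 GB = 1000^3 bytes) are assumptions for illustration; the repository's own documentation remains the authority on current limits and on how sizes are measured.

```python
GB = 1000 ** 3  # assumption: decimal gigabytes

TOTAL_LIMIT_BYTES = 20 * GB       # total free deposit quoted above
SINGLE_FILE_LIMIT_BYTES = 5 * GB  # single-file limit quoted above


def check_deposit(file_sizes):
    """Return a list of problems for a planned deposit.

    file_sizes maps a file name to its size in bytes.
    An empty list means the deposit fits within both limits.
    """
    problems = []
    # Check every file against the single-file limit.
    for name, size in file_sizes.items():
        if size > SINGLE_FILE_LIMIT_BYTES:
            problems.append(f"{name} exceeds the 5 GB single-file limit")
    # Check the combined size against the total limit.
    if sum(file_sizes.values()) > TOTAL_LIMIT_BYTES:
        problems.append("total size exceeds the 20 GB limit")
    return problems


if __name__ == "__main__":
    # Hypothetical file names and sizes.
    planned = {"survey.csv": 1 * GB, "images.zip": 6 * GB}
    for problem in check_deposit(planned):
        print(problem)
```

If a deposit does not fit, splitting large archives into smaller files or choosing a repository with higher limits are the usual options.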
Discipline-specific data repositories store data from specific fields and accommodate the data-deposit needs of a particular research community. Re3data.org provides information on more than 2,000 repositories. Below are some selected data repositories for reference.
Repository | Research Area |
---|---|
Dryad | Biological sciences, basic and applied; a repository for datasets associated with published articles |
GenBank | The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences |
ICPSR | Social sciences data |
NOAA National Centers for Environmental Information (NCEI) | Climatic, geophysical and oceanographic data |
Open Context | Archeological data |
Protein Data Bank | Worldwide repository of information about the 3D structures of large biological molecules |
QDR: Qualitative Data Repository | Social sciences data |
tDAR | The Digital Archaeological Record |
UK Data Service | Social, economic and population data |
Grant agencies such as the National Science Foundation (NSF) impose mandatory data deposit policies. Some funders, such as the ESRC, require data to be deposited in specific data centers. Seek advice from your funder or grant agency on how datasets must be made accessible in order to fulfill their specific requirements.
Some repository solutions include additional features for hosting computer code, which can help you manage source code during the development process: