Data Management and Sharing

Deciding What to Share

Stated very simply, data sharing is the release of data for use by others.

There are a variety of models for how data may be shared, including:

Fully open access – Data are made available through publicly-accessible repositories.
Controlled access – Data are made available to authorized users after a screening process.
Collaborative access among scientists – Data are made available to other researchers in a collaborative network.
Exclusive access for primary researchers – Data are only available to the research team involved in the data collection process.

When sharing, your data should not only be available but also usable. Making sure your data is well organized and documented as you are working on it will go a long way towards ensuring its usability when it comes time to share. One term you may encounter in this context is FAIR, which is defined below.

To be FAIR, your data should be:

F	Findable - The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers.
A	Accessible - Once a potential user finds the required data, they need to know how they can be accessed.
I	Interoperable - The data need to interoperate with applications or workflows for analysis, storage, and processing.
R	Reusable - Data should be well-described so that they can be replicated and/or combined in different settings.

Following the practices outlined in this guide will help you ensure that your data meets these guiding principles.

Deciding Where to Share

How and where you share your data will depend on the characteristics of your data. Datasets that are especially large or contain sensitive information cannot be shared in the same way as datasets that are comparatively small and carry less risk. A good rule of thumb when sharing data is to put it where other researchers will find it. For publicly shared data, this may involve choosing a particular repository. For data that is more restricted, this may involve giving precise instructions on how qualified researchers can gain access.

The figure below is designed to help you navigate the ever-changing data repository landscape. The registry of research repositories (re3data) is an extremely helpful resource for identifying repositories that are specialized for certain types of data.

[Figure: Choosing an appropriate repository for your data]

Why we don’t recommend sharing data “upon reasonable request”

Making data and other materials available “upon request” means that the requester must contact a member of the research team (often a corresponding author) and a team member must have the data on-hand (in a usable format) in order to respond to the request.

Over time, as contact information changes, team members move on, and data is archived, both requesting data and responding to requests become more difficult.

Why we don’t recommend sharing data as a supplementary material

Supplementary materials are an important part of the scholarly record, but it is not uncommon for links between them and the articles they are associated with to break down. Whenever feasible, we recommend uploading data into a repository that is designed to preserve data and make it accessible to others, and then linking to and/or citing that dataset in your manuscript.

Clinical Trial Data

Attention researchers conducting clinical trials!

Stanford University researchers can now deposit their data in Vivli free of charge.

Vivli is a data sharing and analytics platform focused on the sharing of individual participant-level data from clinical trials. Data deposited in Vivli are assigned a Digital Object Identifier (DOI) to enhance discoverability and are archived and backed up through the Microsoft Azure cloud. Access to data deposited in Vivli is controlled by a managed access process and provided only after approval.

Vivli
Vivli is a non-profit organization working to advance human health through the insights and discoveries gained by sharing and analyzing data. It is home to an independent global data-sharing and analytics platform which serves all elements of the international research community. The platform includes a data repository, in-depth search engine and cloud-based analytics, and harmonizes governance, policy and processes to make sharing data easier. Vivli acts as a neutral broker between data contributor and data user and the wider data sharing community. The source of this description is the metadata record on FAIRsharing.org, an educational and informative resource that describes and links databases, standards, and data policies. FAIRsharing also creates collections of these resources and recommendations of databases and standards based on 3rd party data policies.

Dryad

The Dryad Digital Repository is a curated resource that makes research data discoverable, reusable, and citable. Dryad provides a home for a wide range of data types and is free to use for all Stanford affiliated researchers.

Key features of Dryad:

Flexible about file format, meaning you can upload your datasets in whatever form they take.
Automatically assigns digital object identifiers, meaning researchers will be able to easily cite your datasets.
Curated by experts, meaning that somebody at Dryad will check that your files can be opened, that you haven't inadvertently shared sensitive data, and that you have included sufficient descriptive information for another researcher to find and use your datasets.
Contents are preserved for the long term, meaning your datasets will be accessible indefinitely.

See their FAQ page for additional information about Dryad's features.

There are a variety of models and potential platforms for sharing your datasets with other researchers. Lane Library recommends Dryad as a way to openly share datasets that do not fit into more specialized repositories. For more information about Dryad, contact your liaison librarian.

Dryad is free for Stanford Affiliated Users

Dryad uses ORCID iDs for login. The first time you log in, you will be asked if you are affiliated with a member institution. After selecting Stanford from the drop-down menu, you will be asked to sign in using your Stanford credentials. On every subsequent login, you will only need your ORCID iD.

Publish and Share your Data

Enter Metadata

Once you have logged into Dryad, you can begin the process of publishing and sharing your data. After clicking Start New Dataset, you will be prompted to begin entering metadata. Good metadata (also called data documentation) is vital for ensuring that your dataset can be discovered, understood, and used by other researchers.

Dryad only requires that you complete the title, authors, and abstract fields, but we strongly recommend that you complete every field and upload additional documentation (e.g., data dictionaries and readme files) alongside your dataset.
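One way to get started on a data dictionary is to draft it from the column headers of a tabular file. The sketch below is an illustration only, not a Dryad feature: the function name draft_data_dictionary and the sample columns are hypothetical, and it assumes your data is a CSV file with a header row. It lists each variable with an example value and leaves the description blank for you to fill in.

```python
import csv
import io


def draft_data_dictionary(csv_text):
    """Return one entry per column: its name, an example value, and a blank description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    first_row = next(reader, None) or {}  # empty dict if the file has no data rows
    return [
        {
            "variable": name,
            "example": first_row.get(name) or "",
            "description": "",  # to be written by the researcher
        }
        for name in (reader.fieldnames or [])
    ]


# Hypothetical dataset, used only for illustration.
sample = "participant_id,age_years,visit_date\nP001,34,2021-03-15\n"
for entry in draft_data_dictionary(sample):
    print(entry)
```

The descriptions still have to be written by someone who knows the data; the script only saves you from retyping the variable names.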

Upload Methods

Dryad has two different methods for uploading data. Both methods allow you to upload multiple files.

Upload directly from your computer: for uploads smaller than 10 GB.
Upload from a server or the cloud: for uploads up to 300 GB.
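If you are unsure which method applies, the size limits above can be expressed as a small check. This is a rough sketch, not part of Dryad: the function names are hypothetical, and it assumes decimal gigabytes (1 GB = 1,000,000,000 bytes), which this guide does not specify.

```python
from pathlib import Path

GB = 1000 ** 3  # assumption: decimal gigabytes


def folder_size_bytes(folder):
    """Add up the sizes of all files under a folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def suggest_upload_method(total_bytes):
    """Map a dataset size to the upload options described in this guide."""
    if total_bytes < 10 * GB:
        return "upload directly from your computer"
    if total_bytes <= 300 * GB:
        return "upload from a server or the cloud"
    return "larger than 300 GB: contact your liaison librarian"


# Example with a hypothetical 25 GB dataset.
print(suggest_upload_method(25 * GB))
```

To check a real dataset, pass the result of folder_size_bytes("path/to/your/data") to suggest_upload_method, replacing the placeholder path with your own folder.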

Curation

Once you've uploaded your files, you can decide to submit them to the curation process immediately or keep them temporarily private for peer review. During the curation process, expert curators perform basic checks to ensure that the title and abstract are meaningful, that there are sufficient methods and usage notes, that files can be opened, and that no sensitive information or material subject to copyright restrictions has been inadvertently included in the dataset. As an author, you can review the curation process for your dataset.

Describing Dryad

If you plan to use Dryad to publish and share your data, please feel free to use or adapt the following description when completing data management plans or other documents:

Stanford University is a Dryad member institution. Dryad is an open source tool for data publication and digital preservation. Datasets deposited into Dryad are permanently archived in a CoreTrustSeal-certified repository. Data files are regularly audited to ensure fixity and authenticity and are replicated with multiple copies in multiple geographic locations. Professional curators examine all Dryad deposits to ensure the validity of the data, apply robust metadata, and make certain that highly sensitive information has not been inadvertently included. Datasets deposited in Dryad are automatically assigned a Digital Object Identifier (DOI) and are indexed by Google Dataset Search and other tools to enhance discoverability.

For more information about Dryad's features, see this page. For additional assistance in describing Dryad or to discuss how it can be integrated into your research workflow, contact your liaison librarian.

Increasingly, there is an expectation that researchers will share their data. Data sharing can be a complex endeavor and, though we think very highly of Dryad, Lane Library recommends that you choose the method for sharing that is right for you and your data. Answering the questions below will help guide you through this process. For additional assistance, please see our upcoming classes and events page for workshops related to data management and sharing or contact your liaison librarian.

Do the groups that fund or publish your work specify where your data should be shared?

In some cases, your research funder or the journal publishing your work will specify that your data should be shared through a specific repository. For example, some projects funded by the National Institute of Mental Health are expected to share their data through NIMH Data Archive. In cases like this, we recommend that you share your data through the required repository.

Please note that some requirements state that data should be shared, but do not specify where. In such cases, refer to the next question.

Do researchers who work with similar data typically share it in a certain place?

If your research community typically shares the type of data you are looking to share through a specific repository, we generally recommend that you use that repository. To find repositories specialized for particular types of data, we recommend searching the Registry of Research Data Repositories (Re3Data).

If there is not a repository that is specific to the type of data you are working with, or if you have other concerns about sharing your data, see the next question.

Are there particular characteristics of your data that you think might affect how it can be shared?

Certain characteristics of your data may determine how and where it can be shared. For example, if you are working with big data (over 300 GB) or data that contains personally identifying information, we recommend scheduling a consultation with your liaison librarian so we can refer you to the appropriate group on campus to help you determine your options for making your data available.

However, if you are simply looking for a general-purpose data repository, we strongly recommend Dryad. Stanford Libraries also maintains the Stanford Digital Repository (SDR), which is recommended for Stanford University affiliates.

Giving and Receiving Credit

Data are generally considered to be citable products of research, meaning you should cite them when you use them and look for repositories that facilitate easy citation.

Data citations should facilitate identification of, access to, and verification of the specific data that support a claim.
A data citation should include a persistent method for identification that is both unique and widely used by a community.

Persistent Identifiers related to Data

Digital Object Identifier (DOI) - A unique alphanumeric string to identify content and provide a persistent link to its location on the internet. DOIs are commonly assigned to journal articles (e.g., https://doi.org/10.3897/rio.4.e26439) but can also be assigned to datasets. Getting a DOI for your dataset helps ensure that other researchers will be able to find and use it.
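DOIs show up in several forms (a bare string, a "doi:" prefix, or a full link), so it can help to normalize them before adding them to a citation or readme file. The sketch below is one possible approach under simple assumptions: the function name doi_to_url is hypothetical, and the only validation is that the identifier starts with "10.", as all DOIs do. It does not check that the DOI actually exists.

```python
def doi_to_url(doi):
    """Return the https://doi.org/ link for a DOI written in a common form."""
    cleaned = doi.strip()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not cleaned.startswith("10."):
        raise ValueError(f"Not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# The article DOI used as an example in this guide.
print(doi_to_url("doi:10.3897/rio.4.e26439"))
```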

Accession Number - A unique number (or alphanumeric string) assigned by a database as a means of locating a specific object. When citing or pointing to a dataset, you should at least provide its accession number.

Research Resource Identifier (RRID) - A unique ID number assigned to help researchers cite key resources (antibodies, model organisms, software projects, etc.) in the biomedical literature. RRIDs are not applied to datasets directly but should be included in related documentation.

ORCID iD - An alphanumeric code that uniquely identifies scientific and other academic authors and contributors. Lane Library recommends that every researcher claim their ORCID iD. For more on registering and using an ORCID iD, head over to our dedicated guide page.
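When collecting ORCID iDs from co-authors for dataset metadata, a quick format check can catch typos. An ORCID iD is 16 characters, usually written in four hyphen-separated groups, and its last character is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below is an assumption-laden illustration: the function names are hypothetical, and passing the check only means the iD is well formed, not that it is registered to a particular person.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid_id):
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid_id.strip().replace("-", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


# 0000-0002-1825-0097 is the sample iD ORCID uses for its fictional researcher.
print(is_well_formed_orcid("0000-0002-1825-0097"))
```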