Data Management and Sharing

What is this page?

This page is intended as an introduction to the management and sharing of research data. 

Properly managing data can:
  • Help prevent data from being lost.
  • Make the research process more efficient.
  • Fulfill requirements from the organizations that sponsor and publish your work.

Sharing data can:
  • Enable others to ask new questions of existing datasets.
  • Advance the state of research and innovation.
  • Strengthen the evidence base, by ensuring that research is verifiable, reproducible, and transparent.
  • Ensure that the results of publicly funded research are available to the public.

While proper data management and sharing benefit the research community as a whole, the researcher most likely to benefit from your data management and sharing practices is you.

The purpose of this guide is not to prescribe a specific way of managing and sharing your data, but to communicate some best practices so that your data can be understood, used, and built upon by your collaborators, by other researchers, and, most importantly, by you in the future.

Get Help

To request a consultation related to data management and/or data sharing, please follow this link or contact John Borghi - JBorghi(at)Stanford(dot)edu.

Managing and Sharing Data

Data management and sharing are iterative processes that occur throughout the course of a given research project. This means that decisions you make early, even while you are still planning your data collection and analysis procedures, can have substantial effects on how you will eventually be able to use and share your data.

As shown by the visualization below, data management and sharing also exist along a continuum of practice. You and your collaborators may excel in one area of data management but have less formal practices related to another area. 

Source: Borghi J.A., Abrams S., Lowenberg D., Simms B., Chodacki J. (2018) Support Your Data: A Research Data Management Guide for Researchers. Research Ideas and Outcomes, 4, e26439, https://doi.org/10.3897/rio.4.e26439

Defining our Terms

Data - According to the Stanford Research Handbook: Research data includes records that are necessary for the reconstruction and evaluation of reported results of research and the events and processes leading to those results, regardless of the form or the media on which they may be recorded.

Essentially, data includes the inputs or outputs, whether they consist of values entered into spreadsheets, images, or other formats, that are required to evaluate, reproduce, or build upon the analyses or conclusions of a given research project. For the purposes of this guide, this definition of data includes “raw” data, processed data, research-related code, and documentation pertaining to study parameters and procedures.

Data Management - Encompasses activities and behaviors related to the storage, organization, documentation, and dissemination of data. Effective data management is also crucial to establishing the accessibility of data after a project’s conclusion. Proper data management is characterized by standardization (every member of a research team is following the same set of practices) and documentation (details of how data is to be managed are recorded, communicated, and updated when necessary).

Data Sharing - The release of data for use by others. Data sharing is a continuum of practices. While the term can refer to sharing data through a public repository, it also includes sharing through more mediated or controlled mechanisms (e.g. data access committees or data use agreements).

The sharing of research data is a component of Open Science, which encompasses a variety of efforts focused on making scientific research more transparent and accessible. Though the term is frequently used to refer to efforts aimed at ensuring access to the products of the research process - journal articles, datasets, code, and other materials - open science also encompasses efforts to ensure that the scientific enterprise is inclusive and equitable.

An Example

Some topics related to data management and sharing are best illustrated through example. Let's say you have a file, data.csv, the contents of which are shown below. In its current form, could you say what this data is describing? If you received this data from a collaborator, what questions would you have? Would you say that this data is complete?

We'll return to this example throughout this guide to demonstrate specific data management and sharing-related practices.

The contents of a file named data.csv

     Variable 1   Variable 2   Variable 3
1    97.5         55           97
2    97.6         52           98
3    97.5         49           97
4    97.5         58           98
5    97.4         56           98
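
To make the questions above concrete, here is a minimal Python sketch that reads the example data and summarizes each column. It rests on several assumptions that are not stated in this guide: that data.csv is comma-separated, that its first column is an unnamed row number, and that all values are numeric. The function name summarize and the embedded RAW text are illustrative only.

```python
import csv
import io

# Assumed comma-separated rendering of the example table. The delimiter and
# the unnamed first (row number) column are assumptions, not facts about
# the real file.
RAW = """\
,Variable 1,Variable 2,Variable 3
1,97.5,55,97
2,97.6,52,98
3,97.5,49,97
4,97.5,58,98
5,97.4,56,98
"""


def summarize(text):
    """Return {column name: (count, minimum, maximum)} for each named column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {}
    for row in reader:
        for name, value in row.items():
            if not name:  # skip the unnamed row-number column
                continue
            columns.setdefault(name, []).append(float(value))
    return {name: (len(v), min(v), max(v)) for name, v in columns.items()}


summary = summarize(RAW)
for name, (count, low, high) in summary.items():
    print(f"{name}: n={count}, min={low}, max={high}")
```

The summary runs without error, but it cannot say what was measured, in which units, or whether five rows make up the whole dataset. Those answers have to come from documentation that travels with the file.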