Notre Dame 5 Star University
University Library




Research Data Management

Documenting research

By capturing and documenting useful, accurate and relevant facts about your data, you make it more useful to your future self and to others. Documentation adds valuable context and depth to your data and acts as a reminder of what you did. Later on, if you choose to publish your data, good documentation will improve the visibility and value of your research to others. All of this contributes significantly to the Findability and Reusability of your data.

When considering what to document, there is no universal approach - each research project will examine something new; each discipline will have things that are considered useful or critical. But despite differences from field to field, there is one general thing to remember about documenting and describing your data:

Maximise documentation - Your own research problem will guide what data you collect. However, by capturing other information at the same time, you can make your data far more useful to future researchers. It may be that the context in which you are making your observations will never exist again! It may also be that something you capture unexpectedly becomes critical to your own research later on. You'll need to balance your available time and resources against the possibilities, but if you can easily capture something that could inform future research, you should do so.

The following is a non-exhaustive list of elements to consider describing and documenting as you collect your data. Think through each section and consider whether each element is relevant to your research data, or whether it could be useful to future researchers.

 A name of the dataset or the name of the project.


 Names and contact details of the organisations or people who created the data.


 Unique and persistent identifiers used to:

  • Identify the project: RAiD
  • Identify the people: ORCID
  • Identify the data: DOI (digital data), IGSN (material samples)
  • Identify the organisation: ROR


 Any key dates associated with the data. This may include the project start and end dates, the time period covered, or any other important dates associated with the data.


 Information on how the data was generated, such as specific equipment or software used (including model and version numbers), formulae, algorithms or methodologies.

 You might include an electronic lab notebook to aid this element.


 Information on how the data has been transformed, altered or processed (eg. normalisation).


 Citations for any third-party data used in the research, whether obtained or derived from other sources. At a minimum, each citation should include the creator, the year, the title of the dataset and some form of access information.
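
As a rough illustration of the minimum citation details listed above, the sketch below stores the creator, year, title and access information for a dataset and joins them into one line. The class name, the "[Data set]" label, the ordering of the parts and all example values are assumptions made for illustration; check the citation style your discipline or publisher requires.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Minimum citation details for a third-party dataset (illustrative only)."""
    creator: str
    year: int
    title: str
    access: str  # e.g. a DOI, URL or repository name

    def as_text(self) -> str:
        # Assumed layout: Creator (Year). Title [Data set]. Access information
        return f"{self.creator} ({self.year}). {self.title} [Data set]. {self.access}"


# Hypothetical placeholder values, not a real dataset.
citation = DatasetCitation(
    creator="Example Research Group",
    year=2020,
    title="Coastal water temperature observations",
    access="https://doi.org/10.xxxx/example",
)
print(citation.as_text())
```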




Content description




 Keywords or phrases describing the subject or content of the data. This may also include Field of Research codes and Socio-Economic Objective codes (see the Australian Bureau of Statistics site for both code schedules).


 Descriptions of relevant geographic information. This could be city names, region names, countries or more precise geographic coordinates.

  Variable list

 A list of all variables in the data files, where applicable. This could also be captured in a codebook.

  Code list

 Explanation of codes or abbreviations used in either the file names or the variables in the data files (eg. '999 indicates a missing value in the data')
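
One way a variable list and code list might be kept together is as a small machine-readable codebook. The sketch below is only an assumption about how that could look: the variable names, region codes and data values are invented, and only the use of 999 as a missing-value code comes from the example above. It shows how a codebook lets anyone reading the data later turn coded values back into their meaning.

```python
import csv
import io

# Hypothetical codebook: one entry per variable in the data file.
CODEBOOK = {
    "participant_id": {"description": "Anonymised participant number", "missing": None},
    "age": {"description": "Age in years at interview", "missing": "999"},
    "region": {
        "description": "Region of residence",
        "missing": "999",
        "codes": {"1": "Metropolitan", "2": "Regional", "3": "Remote"},
    },
}

# Small made-up data file, for illustration only.
RAW = """participant_id,age,region
1,34,2
2,999,1
3,51,999
"""


def decode_row(row, codebook):
    """Replace missing-value codes with None and coded values with their labels."""
    decoded = {}
    for name, value in row.items():
        entry = codebook[name]
        if entry.get("missing") is not None and value == entry["missing"]:
            decoded[name] = None
        else:
            # Fall back to the raw value when the variable has no code list.
            decoded[name] = entry.get("codes", {}).get(value, value)
    return decoded


rows = [decode_row(r, CODEBOOK) for r in csv.DictReader(io.StringIO(RAW))]
for r in rows:
    print(r)
```

Keeping the codebook in a plain, open format next to the data files means it can be read even if the original analysis software is no longer available.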




Technical description

  File inventory

 A list of all the files that make up the dataset, including extensions (eg. ‘photo1023.jpeg’, ‘participant12.pdf’).

  File Formats

 File formats of the data (eg. HTML, PDF, GeoTIFF, JPEG).

  File structure

 Organisation of the data file(s) and layout of the variables, where applicable.


 Information on the different versions of the dataset that exist, if relevant.


Names and version numbers of any special-purpose software packages required to use, create, view, or analyse the data.
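
The file inventory and file format elements above can often be generated rather than typed by hand. The following is a minimal sketch, assuming your dataset sits in a single folder; the function names and the tab-separated output layout are illustrative choices, not a required format.

```python
from pathlib import Path


def build_inventory(root):
    """Return one record per file under root: relative path, extension, size."""
    root = Path(root)
    inventory = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            inventory.append(
                {
                    "file": path.relative_to(root).as_posix(),
                    # Extension without the dot, lower-cased; "(none)" if absent.
                    "format": path.suffix.lower().lstrip(".") or "(none)",
                    "bytes": path.stat().st_size,
                }
            )
    return inventory


def write_inventory(inventory, destination):
    """Write the inventory as a tab-separated text file."""
    lines = ["file\tformat\tbytes"]
    lines += [f"{r['file']}\t{r['format']}\t{r['bytes']}" for r in inventory]
    Path(destination).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `write_inventory(build_inventory("my_dataset"), "INVENTORY.txt")` would list every file in a hypothetical `my_dataset` folder. Note that the extension is only a hint about the format, so it is still worth checking the list by eye.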




 Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data.


 Where and how the data can be accessed.



One way that data documentation commonly appears is in structured descriptions of the data - this is known as metadata. Metadata is often defined literally as "data about data" and refers to the information used to describe the attributes of a resource in a standardised format. Metadata is often collected into one central location, so researchers can search in one place to find datasets that will help their research. By including useful and accurate information in your metadata and data description, other researchers will be more likely to find and reuse your dataset.

Many disciplines have their own specific ways of structuring metadata - these specific structures are called schemas. A schema lists what information you need to include about your data and how that information should be structured. Below are some examples of various schemas.


  Metadata standard


  Dublin Core (DC)

  Metadata Object Description Schema (MODS)

  Metadata Encoding and Transmission Standard (METS)

  Arts/Creative Work

  Categories for the Description of Works of Art (CDWA)

  Visual Resources Association (VRA Core)

  Cultural heritage

  MIDAS-Heritage


  Darwin Core


  Ecological Metadata Language (EML)


  Content Standard for Digital Geospatial Metadata (CSDGM)


  Protocol Data Element Definitions for clinical trials

  Genome Metadata

  Social sciences

  Data Documentation Initiative (DDI)
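
To give a feel for what a schema-structured record looks like, the sketch below describes a dataset using a handful of Dublin Core elements and writes them out as XML. The element names (title, creator, date, format, subject, rights) and the namespace are part of the Dublin Core element set; the `metadata` wrapper element, the function name and every value are assumptions made for illustration. A real repository will usually tell you which elements are required and how the record should be packaged.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical dataset description; all values are placeholders.
record = {
    "title": "Coastal water temperature observations",
    "creator": "Example Research Group",
    "date": "2020",
    "format": "text/csv",
    "subject": "oceanography",
    "rights": "CC BY 4.0",
}


def to_dublin_core_xml(fields):
    """Serialise a dict of Dublin Core element names to a simple XML record."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core_xml(record)
print(xml_text)
```

Because every record uses the same element names, a central catalogue can search across many datasets at once, which is what makes structured metadata useful for discovery.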