University Library: Research Data Management: Documentation and description

Documenting research

Capturing and documenting useful, accurate and relevant facts about your data makes it more useful to your future self and to others. Documentation adds valuable context and depth to your data and reminds you of what you did. If you later choose to publish your data, good documentation will improve the visibility and value of your research to others. All of this contributes significantly to the Findability and Reusability of your data.

There is no universal approach to deciding what to document. Each research project examines something new, and each discipline has its own view of what is useful or critical to record. Despite these differences, there is one general principle to remember when documenting and describing your data:

Maximise documentation - Your own research problem will guide what data you collect. However, capturing other information at the same time can make your data far more useful to future researchers. The context in which you make your observations may never exist again, and something you capture incidentally may later become critical to your own research. You will need to balance your available time and resources against these possibilities, but if you can easily capture something that could inform future research, you should do so.

The following is a non-exhaustive list of elements to consider describing and documenting as you collect your data. You should think through each section and consider if that element is relevant to your research data, or if it could be useful to future researchers.

Category	Element	Description
General overview	Title	The name of the dataset or of the project.
	Creator/s	Names and contact details of the organisations or people who created the data.
	Identifiers	Unique and persistent identifiers used to identify the project (RAiD), the people (ORCID), the data (DOI for digital data; IGSN for material samples) and the organisation (ROR).
	Date	Any key dates associated with the data. These may include the project start and end dates, the time period covered or any other important dates.
	Method	Information on how the data was generated, such as specific equipment or software used (including model and version numbers), formulae, algorithms or methodologies. An electronic lab notebook can help you capture this information.
	Processing	Information on how the data has been transformed, altered or processed (eg. normalisation).
	Source	Citations to any third-party data obtained or derived from other sources and used in the research. At a minimum, each citation should include the creator, the year, the title of the dataset and some form of access information.
Content description	Subjects	Keywords or phrases describing the subject or content of the data. These may also include Field of Research codes and Socio-Economic Objective codes (see the Australian Bureau of Statistics site for both code schedules).
	Location	Descriptions of relevant geographic information. This could be city names, region names, countries or more precise geographic coordinates.
	Variable list	A list of all variables in the data files, where applicable. This could also be captured in a codebook.
	Code list	Explanation of codes or abbreviations used in either the file names or the variables in the data files (eg. '999 indicates a missing value in the data')
Technical description	File inventory	A list of all the files that make up the dataset, including extensions (eg. 'photo1023.jpeg', 'participant12.pdf').
	File formats	File formats of the data (eg. HTML, PDF, GeoTIFF, JPEG etc).
	File structure	Organisation of the data file(s) and layout of the variables, where applicable.
	Version	Information on the different versions of the dataset that exist, if relevant.
	Software	Names and version numbers of any special-purpose software packages required to use, create, view, or analyse the data.
Access	Rights	Any known intellectual property rights, statutory rights, licences, or restrictions on use of the data.
	Access information	Where and how the data can be accessed.
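
The variable list and code list elements above are often kept together in a codebook. As a minimal sketch, assuming a hypothetical dataset with the variables site_id, temp_c and depth_m (the names, descriptions and units are invented for illustration), a codebook can also be kept in machine-readable form alongside the data, so that a convention such as '999 indicates a missing value' is applied consistently whenever the files are read:

```python
# Hypothetical codebook: variable names, descriptions and units are illustrative only.
CODEBOOK = {
    "site_id": {"description": "Sampling site identifier", "unit": None},
    "temp_c": {"description": "Water temperature", "unit": "degrees Celsius"},
    "depth_m": {"description": "Sample depth", "unit": "metres"},
}

# Code list entry: 999 is used in the data files to indicate a missing value.
MISSING_CODE = 999


def variable_list():
    """Return a human-readable variable list built from the codebook."""
    lines = []
    for name, info in CODEBOOK.items():
        unit = f" ({info['unit']})" if info["unit"] else ""
        lines.append(f"{name}: {info['description']}{unit}")
    return lines


def decode_record(record):
    """Return a copy of a record with the missing-value code replaced by None.

    Raises KeyError if the record contains a variable the codebook does not document.
    """
    decoded = {}
    for name, value in record.items():
        if name not in CODEBOOK:
            raise KeyError(f"Variable {name!r} is not documented in the codebook")
        decoded[name] = None if value == MISSING_CODE else value
    return decoded


if __name__ == "__main__":
    for line in variable_list():
        print(line)
    print(decode_record({"site_id": "S01", "temp_c": 18.5, "depth_m": 999}))
```

Raising an error for undocumented variables is one possible design choice: it flags gaps in the documentation as soon as the data is used, rather than letting them go unnoticed.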

Metadata

Data documentation commonly takes the form of structured descriptions of the data, known as metadata. Metadata is often defined literally as "data about data": information that describes the attributes of a resource in a standardised format. Metadata is often collected in one central location, such as a repository or catalogue, so researchers can search in one place to find datasets that will help their research. Including useful and accurate information in your metadata and data description makes it more likely that other researchers will find and reuse your dataset.

Many disciplines have their own conventions for structuring metadata, called schemas (or metadata standards). A schema lists what information you need to include about your data and how that information should be structured. Below are some examples of schemas, grouped by discipline.

Discipline	Metadata standard
General	Dublin Core (DC); Metadata Object Description Schema (MODS); Metadata Encoding and Transmission Standard (METS)
Arts/Creative Work	Categories for the Description of Works of Art (CDWA); Visual Resources Association (VRA Core)
Cultural heritage	MIDAS-Heritage
Biology	Darwin Core
Ecology	Ecological Metadata Language (EML)
Geography	Content Standard for Digital Geospatial Metadata (CSDGM)
Health	Protocol Data Element Definitions for clinical trials; Genome Metadata
Social sciences	Data Documentation Initiative (DDI)
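
To illustrate what a schema-structured record can look like, the following minimal sketch builds a simple record from the 15 elements of the Dublin Core Metadata Element Set (version 1.1) using Python's standard xml.etree.ElementTree module. The metadata wrapper element and all field values are hypothetical; repositories that accept Dublin Core usually specify their own record format, so check the requirements of the repository you deposit with.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements defined by Dublin Core 1.1; the schema limits what a record may contain.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def dublin_core_record(fields):
    """Build a Dublin Core XML record from (element, value) pairs.

    The 'metadata' root element is a hypothetical wrapper, not part of the standard.
    Raises ValueError for element names that Dublin Core 1.1 does not define.
    """
    root = ET.Element("metadata")
    for element, value in fields:
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core 1.1 element")
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Illustrative values only; replace them with the details of your own dataset.
record = dublin_core_record([
    ("title", "Example river temperature observations"),
    ("creator", "Example, Researcher"),
    ("date", "2023"),
    ("subject", "water temperature"),
    ("format", "text/csv"),
    ("rights", "CC BY 4.0"),
])

print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names mirrors what a schema does: it defines which pieces of information a record can hold, so that every record can be read and searched in the same way.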

ARDC - Metadata
This guide provides a simple, generic, working-level view of the needs, issues and processes involved in collecting and creating metadata for research data.
