Module 5 - Data Management
Research Data Management (RDM) is an important consideration when undertaking research. It should be one of the first things you think about when scoping your project, and you should continue to consider and evaluate it throughout your candidature.
This module will cover some of the main aspects of Research Data Management you should consider during your research, including:
- What is considered research data
- Your responsibilities for data as a researcher
- Data management planning
- Data management best practice
- Managing sensitive data
- Documenting metadata
- Data archiving and publishing
Upon completion of this module, you will be able to:
- Identify what constitutes research data
- Identify, explain and adhere to your responsibilities for data management, including for sensitive data
- Develop a research data management plan (RDMP) in consultation with your supervisors and the UNE Research Data Lead
- Employ data management best practice for file management, formats, naming, versioning, and architecture
- Store and back up your data using Cloud.UNE, and archive and publish it using Research UNE (RUNE)
5.1 Benefits of good data management
Adopting good data management practices at the beginning of your research has many benefits including:
- Keeping your data safe and secure
- Extending the usefulness of your data beyond your project, both for yourself and others
- Making your data identifiable, retrievable and available when you need it
- Ensuring you are compliant with university policy, funder policy, and wider frameworks such as the Australian Code for the Responsible Conduct of Research
You can ensure that you are engaging with RDM best practice by considering:
- The creation of a Research Data Management Plan at the start of your project, before you collect any data
- How your files will be named, organised, versioned and documented before you collect any data
- Where you will safely, securely and compliantly store your research data
- How you will archive your data and preserve it for future use by yourself or others
5.2 What is research data?
When talking about Research Data Management, defining what we mean by data is a good place to start. At UNE, the Management and Storage of Research Data and Materials Policy states that:
Research Data means data as facts, observations, computer results, measurements or experiences on which an argument, theory or test is based. Data may be numerical, descriptive or visual. Data may be raw or analysed, experimental or observational.
Some examples of research data include, but are not limited to:
- Physical samples and collections
- Software, models and visualisations
- Lab books, field notes and observations
- Interview transcripts and survey results
5.3 What are your responsibilities as a researcher?
At UNE, your Research Data Management responsibilities are outlined in the Management of Research Data and Materials Policy. Additionally, you may be subject to funder or publisher requirements (e.g. ARC and NHMRC policies), which you must take into account when planning how you will manage your research data.
Finally, every researcher in Australia must consider the requirements of The Australian Code for the Responsible Conduct of Research (The Code) and its supporting guide, Management of Data and Information in Research, when undertaking their work. Many of the principles in The Code are embedded in institutional policies like UNE’s ‘Management of Research Data and Materials Policy’, but it is useful to read The Code before you begin your research to get a strong understanding of how best to manage your data with transparency, reproducibility and integrity in mind.
5.4 Data Management Planning
5.4.1 What is a Research Data Management Plan?
A Research Data Management Plan (RDMP) is a structured document that provides researchers with an organised way to record information about their data management practices throughout the research project, including details about:
- What data will be created
- What policies will apply to the data
- Who will own and have access to the data
- What data management practices will be used
- What facilities and equipment will be required
- Who will be responsible for each of these activities
5.4.2 Why should you create an RDMP?
Creating and maintaining an RDMP helps ensure that your data is easy to interpret and understand, both during your project and afterwards, by you, your collaborators and anyone you choose to share it with. Maintaining an RDMP is also required by the Australian Code for the Responsible Conduct of Research.
Your RDMP should be created at the beginning of a research project, prior to collecting data.
5.4.3 Creating your RDMP
You can find the UNE RDMP Template on the RDM LibGuide, or you can use an external RDMP tool such as DMPonline. While working through your RDMP, you may find it useful to refer to the ‘Research Data Management Checklist’ on the UNE RDM LibGuide.
It is important to keep in mind that an RDMP is a living document. Once you have created yours, you should continue to update it as needed throughout your project. Your RDMP should always be accurate and up to date, and should be stored with your project files, research data and documentation in Cloud.UNE.
5.5 Data Management best practice
Now that you know your responsibilities as a researcher and you have your RDMP under way, it is important to think about how you will organise and manage your data. The information below will give you a starting point for putting Data Management best practice into action in your project.
5.5.1 File management
Spending some time planning how you will manage your files before collecting your data will help ensure that your data is always available, findable and identifiable, and easy to share with current or future collaborators.
5.5.2 File formats
Have you ever tried to open an old Word document in a new version of Microsoft Word, only to get an error message saying that there are compatibility issues? If so, you’ll know how important it is to make sure your files will remain usable for years to come, even if you lose access to a specialist, proprietary software program.
Where possible, you should use durable file formats when collecting your data. Durable file formats have the following qualities:
- The file format is recognised by standards agencies (e.g. Standards Australia; ISO)
- The file can be read by open, non-proprietary software
- The file format is widely used within your discipline
- The file format is self-documenting: metadata about the file is saved within the file itself, so this valuable metadata stays with the file even if it is moved
Figure 5.5.2 Some examples of durable file formats
5.5.3 File naming
You should also spend some time thinking about how you intend to name your data files so that they are easy to identify and organise. You should decide on a file naming convention before you begin collecting your data. You can come up with your own convention or adopt one used by your supervisor, faculty or discipline.
Regardless of what file naming convention you choose, it should be consistent and meaningful, and help you find the files you’re looking for at a glance.
Some best practice principles to keep in mind include:
- Keep file names short and meaningful
- Avoid unnecessary repetition
- Use CamelCase to delimit words; avoid spaces and underscores, as they can make files difficult to open in some software programs
- Use the YYYYMMDD format when adding dates to file names (something to consider: putting the date at the front of a file name makes it easy to sort files by date)
- Avoid using terms like ‘draft’ or ‘final’ in file names; rely on versioning (see below) instead
- Avoid using non-alphanumeric characters in file names as these can also make files difficult to open in some software programs
Figure 5.5.3 Example of best practice file naming
You can get further details about these principles here.
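As an illustration only, the short Python sketch below applies these principles to build a file name: the date in YYYYMMDD format goes first, and a free-text description is converted to CamelCase, with spaces, underscores and any other non-alphanumeric characters removed. The function names make_file_name and to_camel_case are hypothetical, and the exact convention shown is an assumption; adapt it to the one you, your supervisor or your discipline settle on.

```python
import re
from datetime import date


def to_camel_case(words: str) -> str:
    """Turn free text such as 'soil moisture site A' into 'SoilMoistureSiteA'.

    Any run of characters that is not a letter or digit is treated as a word
    break and dropped, so spaces, underscores and punctuation never reach the name.
    """
    parts = re.split(r"[^A-Za-z0-9]+", words)
    return "".join(part[:1].upper() + part[1:] for part in parts if part)


def make_file_name(description: str, when: date, extension: str) -> str:
    """Build a name of the form YYYYMMDDDescription.ext (hypothetical convention)."""
    stamp = when.strftime("%Y%m%d")  # e.g. 15 April 2020 -> '20200415'
    return f"{stamp}{to_camel_case(description)}.{extension.lstrip('.').lower()}"


print(make_file_name("soil moisture site A", date(2020, 4, 15), "csv"))
# 20200415SoilMoistureSiteA.csv
```

Because the date comes first, listing a folder alphabetically also lists its files in chronological order.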
5.5.4 File versioning
As you analyse your data, you will make many edits and changes to your files. Whenever you make changes to a file, you should save a new version and rename it to show which version of the original file it is. This will ensure that you can go back to earlier versions of your files if needed, and allow you to trace the evolution of your dataset as you perform different analysis tasks.
A clear and consistent versioning system will show at a glance which version of a file you are looking at, and will make it easy to find the oldest or most recent iteration of your work.
You should decide how you are going to manage versioning before you collect your data. Examples of best practice include:
- Using a version numbering system – this might look like: 20200415FileNameV2. In this example, 20200415 shows that the file was saved on 15 April 2020, and V2 indicates that it is the second version of the file.
- Identifying who has made the changes – this is particularly useful if you are working collaboratively with other researchers. This might look like: 20200429FileNameKBV2. In this example, KB are the initials of the researcher updating the file.
Figure 5.5.4 Example of best practice when tracking authorship and version of files.
The above are just simple examples; you can come up with a system that works best for you or your research group.
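For researchers who script parts of their workflow, the minimal Python sketch below shows how names following the two examples above (20200415FileNameV2 and 20200429FileNameKBV2) could be composed, and how the most recent version could be picked out of a folder listing. The pattern and the function names versioned_name and latest_version are assumptions made for this illustration, not a UNE tool.

```python
import re
from typing import List, Optional

# Assumed pattern: an 8-digit date, a CamelCase stem (which may end in the
# editor's initials), "V", a version number, then an optional file extension.
VERSIONED_NAME = re.compile(
    r"^(?P<date>\d{8})(?P<stem>[A-Za-z0-9]+?)V(?P<version>\d+)(?P<ext>\.\w+)?$"
)


def versioned_name(stamp: str, stem: str, version: int, initials: str = "", ext: str = "") -> str:
    """Compose a name such as 20200429FileNameKBV2."""
    return f"{stamp}{stem}{initials}V{version}{ext}"


def latest_version(file_names: List[str]) -> Optional[str]:
    """Return the file with the highest version number (ties broken by date).

    Version numbers are compared as integers, so version 10 ranks above
    version 9; compared as text, '10' would sort before '9'.
    Names that do not follow the pattern are ignored.
    """
    candidates = []
    for name in file_names:
        match = VERSIONED_NAME.match(name)
        if match:
            candidates.append((int(match["version"]), match["date"], name))
    return max(candidates)[2] if candidates else None


files = ["20200415FileNameV2", "20200420FileNameKBV9", "20200501FileNameV10", "notes.txt"]
print(latest_version(files))  # 20200501FileNameV10
```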
5.5.5 File architecture
File architecture refers to how you will organise your files. Whether you are working on your own or collaboratively in a team, deciding on a robust but usable file architecture will ensure your data is organised in a logical manner that you can explain to others. Deciding on a file architecture at the beginning of your research will allow you to:
- Organise and access your files quickly, as you will know where everything is
- Avoid file duplication, accidental overwriting or misplacement
- Back up your working data quickly and efficiently
How you construct your file architecture is up to you, but some examples of best practice are highlighted in Figure 5.5.5. Before you start setting up your folders, check with the lead researcher in your team to find out whether a file architecture is already in place and whether you need to follow existing procedures.
Figure 5.5.5 File architecture – tips for best practice
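Once you have agreed on a file architecture, creating it the same way for every project helps keep it consistent. The Python sketch below sets up a folder skeleton in one step. The folder names are hypothetical placeholders chosen for this example and are not taken from Figure 5.5.5; replace them with the structure you or your research group have agreed on.

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout (an assumption, not the contents of Figure 5.5.5).
# Numeric prefixes keep top-level folders in a fixed order when sorted by name,
# and CamelCase avoids spaces and underscores.
PROJECT_LAYOUT = (
    "01Admin/Ethics",
    "01Admin/RDMP",
    "02Data/Raw",
    "02Data/Processed",
    "03Analysis/Scripts",
    "03Analysis/Outputs",
    "04Writing/Thesis",
)


def create_project_folders(root: Path, layout=PROJECT_LAYOUT) -> list:
    """Create every folder in layout under root, including any missing parents.

    exist_ok=True means running this again leaves existing folders (and the
    files inside them) untouched.
    """
    created = []
    for relative in layout:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Usage: build the skeleton inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_project_folders(Path(tmp) / "SoilMoistureProject"):
        print(folder.relative_to(tmp))
```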
5.6 Data storage at UNE
Cloud.UNE is the University's cloud storage solution for research data. All UNE academic staff and HDR candidates have access to Cloud.UNE, with an initial quota of 500 GB and the option to apply for more storage if needed. This cloud storage service is accessible from anywhere with an internet connection via https://cloud.une.edu.au/ using your UNE credentials.
Other storage options, such as OneDrive, Dropbox, Google Drive and shared network drives, should not be used for your research data.
If you would like additional support in setting up and using Cloud.UNE, please contact the Research Data Lead.
5.7 Backing up your data
As a researcher, you are responsible for ensuring that you make regular backups of your research data. It is possible to automate this process using the Cloud.UNE Sync Client. The Cloud.UNE Sync Client allows you to nominate a folder on your computer that will regularly sync any changes made in the folder to Cloud.UNE. You can set the frequency of these uploads, manually start an upload or pause an upload through the Sync Client. The Sync Client does require an internet connection to work, and if you are working with large files, a fast and stable internet connection is recommended.
You should think about the frequency of your backups while working through your Research Data Management Plan.
5.8 Sensitive data
The Australian National Data Service (ANDS) defines sensitive data as “data that can be used to identify an individual, species, object, process or location that introduces risk of discrimination, harm, or unwanted attention.”
If you think it is possible that your data falls under this definition, you should speak with your supervisor and contact the Research Data Lead for advice.
You should consider de-identifying or anonymising sensitive data where possible and appropriate. There are several frameworks you can use for this purpose, including CSIRO Data61’s ‘Framework for Data De-identification’ and ANDS’ ‘De-identification Guide’. Best practice may vary between disciplines, so it is a good idea to speak with your supervisor about this.
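The Python sketch below illustrates one small, common step in this process, pseudonymisation: replacing a direct identifier such as a participant's name with a code like P001. It is a minimal example under assumed field names (name, age, postcode) and is not taken from either framework above. On its own it does not de-identify data, because indirect identifiers such as age and postcode can still point to individuals; the frameworks and your supervisor can guide the full process.

```python
from typing import Dict, List, Tuple


def pseudonymise(records: List[dict], id_field: str = "name") -> Tuple[List[dict], Dict[str, str]]:
    """Replace the value in id_field with a stable code (P001, P002, ...).

    The same original value always receives the same code, and the input
    records are not modified. The returned key links codes back to people,
    so it is itself sensitive and should be stored securely, apart from any
    data you share.
    """
    key: Dict[str, str] = {}
    output = []
    for record in records:
        original = record[id_field]
        if original not in key:
            key[original] = f"P{len(key) + 1:03d}"
        cleaned = dict(record)  # shallow copy so the caller's data is untouched
        cleaned[id_field] = key[original]
        output.append(cleaned)
    return output, key


# Hypothetical survey rows: "age" and "postcode" are left as they are, which
# is why this step alone does not amount to de-identification.
rows = [
    {"name": "Alice Smith", "age": 34, "postcode": "2351"},
    {"name": "Bob Jones", "age": 51, "postcode": "2350"},
    {"name": "Alice Smith", "age": 34, "postcode": "2351"},
]
coded, key = pseudonymise(rows)
print([row["name"] for row in coded])  # ['P001', 'P002', 'P001']
```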
Sensitive data should be stored securely on Cloud.UNE during the course of your project.
5.9 Documenting your metadata in a README.txt
A README.txt file provides a place for you to record information (metadata) about your data files. It helps ensure that the research data it is stored with can be readily understood by anyone you share the data with, or by yourself at a later date.
There are many README.txt templates you can use for this purpose, including those provided by the University of Sydney and Cornell University. You can add and remove elements of a template to suit your metadata needs. Once you have completed your README.txt, save it with your research data so that it is the first thing you, or anyone you share your data with, will see.
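If you start many datasets, a small script can write a README.txt skeleton so that no dataset goes without one. In the Python sketch below, the headings are illustrative assumptions and do not reproduce the University of Sydney or Cornell University templates; the function name write_readme is hypothetical. Fields you have not filled in yet are written as TODO so the gaps stay visible.

```python
import tempfile
from pathlib import Path

# Illustrative headings only (an assumption): the University of Sydney and
# Cornell University templates define their own, more detailed sections.
README_FIELDS = (
    "Title",
    "Creator(s) and contact details",
    "Date(s) of data collection",
    "Description of the data",
    "File list and file naming convention",
    "Versioning",
    "Methods and software used",
    "Access conditions and licence",
)


def write_readme(folder: Path, values: dict) -> Path:
    """Write folder/README.txt with one 'Heading: value' line per field.

    Any field missing from values is written as 'TODO'.
    """
    lines = [f"{field}: {values.get(field, 'TODO')}" for field in README_FIELDS]
    path = folder / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Usage: write a partially completed README.txt into a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    readme = write_readme(Path(tmp), {"Title": "Soil moisture, Site A"})
    print(readme.read_text(encoding="utf-8"))
```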
5.10 Data archiving and publishing
It is a requirement that all HDR students archive their research data in Research UNE (RUNE) prior to submitting their thesis for examination.
RUNE is UNE’s institutional repository, where researchers deposit their research outputs. It provides researchers with a means of safely archiving their research data in a way that makes it discoverable, which is a great way to increase engagement with your research.
If your research data is sensitive, you can choose to make your dataset mediated or closed in RUNE. The RUNE team will be able to advise you on the best option for your circumstances.
Submitting your research data to RUNE is a straightforward process. You will find some instructions and advice to get you started here.
You are now ready to complete the Module 5 activities in your workbook.
Progress to Module 6 - Research Software.