2 Data Acquisition and Management
Understand the importance of planning and what a Data Management Plan (DMP) is
Be aware of the importance of good record keeping for reproducibility
Beware of biases and be skeptical of your own results
Know the basics of Data Sharing and be familiar with the FAIR principles
OUTLINE:
Planning
Record keeping
Best practices and data sharing
Data modifications
1. PLANNING ============================================================
A Data Management Plan is a living document which should be made at the onset of each
project, and be updated during the project’s life cycle.
There are several online tools that can help you create your own Data Management Plan:
• Research Data Management Plan / Pla de Gestió de Dades de Recerca (csuc.cat) (Spain)
• https://dmponline.dcc.ac.uk (UK): helps you write a DMP that follows the rules of specific
funders or institutions
• https://dmptool.org (US)
Research institutions, organisations and researchers must ensure appropriate stewardship and
curation of all data and research materials including the following points:
---> Write what you do and do what you write! Record everything and keep all your data: what,
why, who, how, when, where, what happened, your interpretations, what’s next…
Exception: confidential information.
The PRBB provides registered notebooks free of charge to all personnel belonging to its constituent
centres. Some centres (like the CRG) have an electronic lab notebook (ELN) system in place for all
their researchers.
Neither the Principal Investigator (PI) nor the Head of Department, as individuals, legally
owns the data or data books they collect, or those collected by any students or other
scientists on a research project.
The perfect notebook should be legible, well organized, accurate and complete, detailed enough
to enable repetition, compliant with requirements, accessible to authorized persons, properly
stored, and appropriately backed up and secure, ensuring:
Confidentiality: the information is accessible only to authorised people
Integrity: accuracy and completeness of the data
Availability: authorised users have access to information systems when required
Security: Prevent theft, accidental loss or damage
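One common way to check the Integrity point above is to record a checksum of each raw data file in the notebook and to recompute it later. The following is only a minimal sketch: the function names sha256_of_file and is_unchanged are hypothetical, and the choice of SHA-256 is an assumption, not a requirement of any institution mentioned here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large data files do not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Return True if the file still matches the digest recorded earlier."""
    return sha256_of_file(path) == recorded_digest
```

In practice you would write the digest next to the file name and date in your notebook when the data are acquired, and call is_unchanged before each analysis. A mismatch only tells you that the file changed, not why.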
Data management will be very different for each research project/lab and it may be
discipline-specific (e.g. bioinformatics). But some challenges are common to most researchers
(file naming, version control, use of proprietary software…) and it’s worth keeping them in mind!
Use logical and consistent file names, according to agreed conventions (in the lab, with
collaborators, etc)
Keep file names short and relevant (descriptive and without cryptic labels!)
Do not use special characters or spaces (except _ and -)
Never change the file extension
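The naming rules above can be checked automatically before files are shared or archived. The sketch below is one possible interpretation, not an agreed standard: the function name filename_problems, the 40-character limit and the exact set of allowed characters are all assumptions that each lab should replace with its own convention.

```python
import re

# Assumed convention: the part before the extension may contain only
# letters, digits, "_" and "-".
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")


def filename_problems(name, max_length=40):
    """Return a list of rule violations for `name` (empty list if none)."""
    problems = []
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    # Split off the extension at the last dot, if there is one.
    if "." in name:
        stem, extension = name.rsplit(".", 1)
    else:
        stem, extension = name, ""
    if not extension:
        problems.append("no file extension")
    if not ALLOWED_STEM.fullmatch(stem):
        problems.append("special characters or spaces")
    return problems


for candidate in ["2024-01-15_western-blot_rep1.csv", "final data (new)!.xlsx"]:
    print(candidate, "->", filename_problems(candidate) or "OK")
```

Note that this strict version also flags extra dots in the name (for example sample.tar.gz), because a dot is treated as a special character. Relax the pattern if your convention allows them.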
The Open Science Framework, the University of Cambridge and the University of Southampton
give tips about file naming and organisation, creating a data management plan, etc. Also check out
the Software Carpentry website for tutorials on how to use version control and other tools.
Data sharing
In principle all data should be considered for sharing in the interests of:
• Free exchange of information.
• Spreading learning and avoiding waste through unnecessary repetition.
• In addition, new rules from EU grants (Horizon Europe) and other funders (e.g. NIH)
make open access to research data compulsory!
Ideally, the sharing process should be described in the data management plan, or in a specific
data sharing plan.
Whenever possible, make sure your data follow the FAIR data principles, so that they are:
Findable
Accessible
Interoperable (allows data exchange and reuse between researchers, e.g. through the use of
ontologies, standard formats, open software, etc.)
Reusable
SHARING EXCEPTIONS:
Data sharing restrictions apply in three circumstances:
• Data prone to dual use: data that may be misused to pose a biological threat to public
health and/or national security
• Data that can lead to patents or have commercial interest
• Data and information that can affect the privacy of human subjects
Balancing the obligation to protect and the need to share: be open when you can and closed when
you must.
WHAT? Including the sex and gender perspective in research content might mean including both men
and women in a study (and analysing the results by sex) or taking into account how different
genders might be affected by the research, if applicable. It applies to humans, but also to
animal models or even cultured cells.
WHY? Fairness and quality of the research. There are differences in the way male and female
bodies react to drugs, for example. That’s why since 1994, the US has required all clinical trials
funded by the National Institutes of Health (NIH) to include women.
Some funders have made the inclusion of this dimension compulsory, e.g. European Research
Council (ERC) Horizon Europe programme, Swedish Research Council or Canadian Institutes of
Health Research.
The sex and gender perspective in research content is different from gender equality in research
teams, but the latter is just as important!
Once you have collected your data, you set out to analyse it. This involves:
• Selecting which data you will use to draw conclusions, and which will be discarded if any
• Establishing significance and identifying potential weaknesses and limitations
• Choosing how to present it
Can the way you modify / transform your data affect its integrity?
Can the way you look at your data (and the result you are expecting to get) affect its integrity?
Cooking data…
“Scientific results can be distorted in several ways, which can often be very subtle and/or elude
researchers' conscious control. Data, for example,
- can be “cooked” (a process which mathematician Charles Babbage in 1830 defined as “an art
of various forms, the object of which is to give to ordinary observations the appearance and
character of those of the highest degree of accuracy”);
- it can be “mined” to find a statistically significant relationship that is then presented as the
original target of the study;
- it can be selectively published only when it supports one's expectations;
- it can conceal conflicts of interest, etc…”
Fanelli, D. How many scientists fabricate and falsify research? PLoS ONE 2009
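The "mining" problem in the quote above can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not a method from the cited paper: each simulated study compares two groups on 20 variables that have no real effect, using a two-sided z-test with known unit variance, and counts how often at least one comparison comes out "significant" at 0.05. The function names, sample sizes and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(group_a, group_b):
    """Two-sided z-test p-value for equal means, assuming unit variance."""
    standard_error = (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fraction_of_studies_with_a_hit(n_studies=2000, n_variables=20,
                                   n_per_group=30, alpha=0.05, seed=1):
    """Fraction of null studies with at least one p-value below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_variables):
            # Both groups come from the same distribution: no true effect.
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if z_test_p_value(group_a, group_b) < alpha:
                hits += 1
                break  # one "significant" variable is enough to report
    return hits / n_studies


print(fraction_of_studies_with_a_hit())
```

With independent tests, the chance of at least one false positive is 1 - 0.95^20, roughly 0.64, and the simulated fraction should land near that value. Presenting such a hit as the original target of the study is exactly the distortion the quote describes.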
Beware of unconscious biases: ‘if you know the number you want to get, you’ll get it’
“The first principle is that you must not fool yourself and you are the easiest person to fool. So
you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other
scientists. You just have to be honest in a conventional way after that.”
Richard Feynman