Better data means better science
Professor Susanna-Assunta Sansone is the University’s Academic Lead for Research Practice, part of the Research Culture programme of work; she is also the Associate Director of the Oxford e-Research Centre, where she leads the Data Readiness Group. In this one-minute video, she describes her Group's R&D work.
Susanna has worked since 2001 in the areas of data interoperability and reproducibility, research integrity, and the evolution of scholarly publishing, and she collaborates with researchers, service providers, journal publishers, library science experts, funders and learned societies in academic, commercial and government settings alike.
With her team of data engineers (research software and knowledge engineers), she researches and develops new methods and tools to make digital research objects (including data, software, models and workflows) Findable, Accessible, Interoperable and Reusable, in a word FAIR, for humans as well as for machines. Her team also builds interoperability standards and runs informative, educational registries to enable data quality and readiness, which are essential in Data Science.
Underpinning the work of other scientists
Thanks to the growing amount of data available in the public domain, we are starting to see a rise in scientific discoveries made using other people’s data. However, the vast majority of public-domain data is still not reusable, mainly because it is poorly described for third-party use.
Governments, funders and publishers expect greater transparency and reuse of research data, as well as greater access to and preservation of the data that supports research findings. The 2019 UKRI Research and Innovation Infrastructure report on “Opportunity to grow our capability” identifies the implementation of the FAIR Principles as an enabler in today’s data-driven era. It also highlights the need for a more detailed assessment of the requirements for implementing FAIR data within each discipline. The report further states that the conceptual design, R&D and prototyping needed to improve existing data infrastructures, or to create new ones, are significant research activities in their own right; and that, to meet the ambition of data-intensive science, the education and career development of research software engineers and research data professionals are critical.
I strive to enact the technical, cultural and policy changes necessary to motivate and reward researchers for sharing richly described, high-quality data, so as to maximize its reuse, and to ensure data quality for use by machines in all areas of data science, such as AI and machine learning, where decisions are made with minimal human intervention.
Most Recent Publications
She completed a Diploma (1997) and a PhD (2000) in Molecular Biology at the Faculty of Medicine, St Mary’s Hospital, Imperial College of Science, Technology and Medicine, London.
In 1999, she joined an Imperial spin-off (Microscience Ltd., now Emergent BioSolutions, Inc.) to work as a Senior Scientist on the molecular characterization of a vaccine strain. In 2001, she moved to the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI, Cambridge), where she worked as a Project and Team Coordinator and Principal Investigator in research data management.
Susanna moved to Oxford in October 2010 as a Principal Investigator at the Oxford e-Research Centre. In 2013 she was appointed to her current position of Associate Director of the Oxford e-Research Centre; she was conferred the title of Associate Professor in 2017 and of Full Professor of Data Readiness in 2021. Since 2022, she has also been the University’s Academic Lead for Research Practice, part of the Research Culture programme of work.
Since 2012, she has also been a consultant for the Nature Research Group at Springer Nature, and she is the founding editor of its journal Scientific Data.
Six most significant publications
- Rocca-Serra P, Sansone SA. Experiment design driven FAIRification of omics data matrices, an exemplar. Sci Data. 2019 Dec 12;6(1):271. doi:10.1038/s41597-019-0286-0. The first published step-by-step recipe on how to make data FAIR retrospectively, now successfully used by major pharmaceutical companies and service SMEs as ‘the guide’ for creating further recipes, pre-competitively and collaboratively. [altmetric.com/details/72900172]
- Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo A, Lister A, Thurston M, and the FAIRsharing Community. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol. 2019 Apr;37(4):358-367. doi:10.1038/s41587-019-0080-8. This article showcases the role and adoption of the FAIRsharing resource; the authors include the leadership of Wellcome Open Research and the US NIH National Library of Medicine, as well as major publishers such as Springer Nature, Wiley, Taylor & Francis and Elsevier. [altmetric.com/details/58361684]
- Wilkinson MD, Dumontier M, Sansone SA, Bonino da Silva Santos LO, Prieto M, Batista D, McQuilton P, Kuhn T, Rocca-Serra P, Crosas M, Schultes E. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Sci Data. 2019 Sep 20;6(1):174. doi:10.1038/s41597-019-0184-5. The first open source tool demonstrating how the FAIRness of data can be evaluated in an automated manner; this has paved the way for the collaborative development of common FAIR indicators worldwide. [altmetric.com/details/66880830]
- Ohno-Machado L, Sansone SA, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Soysal E, Zong N, Kim H, the bioCADDIE Consortium. DataMed: Finding useful data across multiple biomedical data repositories. Nat Genet. 2017 May 26;49(6):816-819. doi:10.1038/ng.3864. Developed as a prototype to understand challenges and users’ needs, this work has been pivotal to activities under the current NIH Common Fund Data Ecosystem programme. [altmetric.com/details/20567331]
- Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, 't Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. doi:10.1038/sdata.2016.18. A high-level group of 53 internationally recognized data leaders designed the FAIR Principles, now endorsed by publishers, funders, societies and infrastructure programmes worldwide; my work, namely BioSharing (the precursor of FAIRsharing, publication no. 2) and ISA (publication no. 6), features as an exemplar in this highly accessed and cited article. [altmetric.com/details/6193015]
- Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo CT, Forster MJ, Gaudet P, Gilbert J, Goble C, Griffin JL, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Ho Sui SJ, Laederach A, Liang S, Marshall S, McGrath A, Merrill E, Reilly D, Roux M, Shamu CE, Shang CA, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios I, Hide W. Toward interoperable bioscience data. Nat Genet. 2012 Jan 27;44(2). doi:10.1038/ng.1054. This article marks the start of the wider uptake and collaborative development of the ISA resource, currently used in funded research and infrastructure projects worldwide. [altmetric.com/details/575348]
- 2013-present: Scientific Data, Founding Academic Editor, Springer Nature.
- 2011-present: GigaScience, Editorial Board Member, Oxford University Press.
- 2009-present: Journal of Biomedical Semantics, Editorial Board Member, BioMed Central, Springer Nature.
- 2009-present: Reviewer for PLOS and several Springer Nature journals.
- 2009-2014: Standards in Genomic Sciences, Founding Member, BioMed Central, Springer Nature.
- 2006-2009: OMICS: A Journal of Integrative Biology, Editorial Board Member, Mary Ann Liebert.
- 2022-present: University-wide - Research Culture, Research Practice programme, Academic Lead.
- 2021-present: Department of Engineering - Senior User Group, Information Sciences Building.
- 2021-present: Department of Engineering - WiE Network, Governance Board Senior Faculty Advisor.
- 2020-present: Department of Engineering - Athena SWAN, Application Writing Team Member.
- 2019-2020: University-wide - Research Data Management, Review Governing Group Member.
- 2016-2017: University-wide - Research File Service Board Chair.
- 2015-2016: University-wide - Storage as a Service Board Chair.
- 2015-present: University-wide - Research Data Management, Delivery Group Member.
- 2014-present: University-wide - Research Data Oxford, Support Group Member.
- 2013-present: University-wide - IT Architecture Group Member.
- 2021-present: EOSC Association - FAIR Metrics and Data Quality Task Force Member.
- 2021-present: German National Infrastructure of Personal Health Data - Scientific Advisory Board Member.
- 2020-present: OxLOD Limited, not-for-profit University spin-off - Academic Board Member.
- 2019-present: USA NIH dkNET Information Hub - External Scientific Advisory Panel Member.
- 2019-present: GO-FAIR - Executive Board Member.
- 2018-2019: UKRI - Data Working Group Member.
- 2016-present: Board of Directors, Member – Massive Analysis and QC (MAQC) Society.
- 2016-present: Research Data Management, Advisory Board Member – Elsevier.
- 2015-present: Management Committee, Member - ELIXIR UK Node.
- 2015-2019: Advisory Board, Member – Force11 Community.
- 2015-present: Source Data Project Advisory Board, Member – EMBO Press.
- 2015-present: Data Processing & Integration TC276/WG5 Technical Committee, Member – ISO.
- 2014-2019: UK Open Research Data Forum, Member – multi-stakeholders, including RCUK, JISC, Wellcome Trust, Royal Society and Universities UK.
- 2013-2016: Technical Advisory Board, Member – Research Data Alliance.
- 2013-2014: Data Intensive Bioscience Expert Working Group, Member – UK BBSRC.
- 2012-2017: Board of Directors, Member and (elected in 2015 as) Vice-Chair – Dryad.
- 2010-2011: Data Sharing Policy Monitoring Group, Member – UK BBSRC.
- 2010-2011: Insect Pollinators Initiative Review Panel, Member – UK BBSRC.
- 2008-2014: Coordinating Committee, Member – OBO Foundry.
- 2007-present: Board of Directors, Member – Genomics Standards Consortium (GSC).
- 2007-2013: Bio-Ontology, Co-chair – ISMB Community of Special Interest Bio-Ontology.
- 2005-2010: Board of Directors, Member – Metabolomics Standards Initiative (MSI).
- 2004-2008: Post-Genomics and Proteomics Steering Committee, Data Management Chair – NERC.
- 2004-2010: Coordinator Committee, Member – Ontology for Biomedical Investigations (OBI) consortium.
- 2003-2012: Board of Directors, Member – FGED (previously MGED) Society.
I am currently looking for motivated DPhil/PhD students to join my group. If you have an interest in my areas of activity, please get in touch with your CV and an overview of a project proposal.
Check here for the latest application deadline for most Oxford scholarships.
More information on the topics
I am interested in research proposals in any discipline, and at the intersection of data and software engineering, that fit under the Departmental Information Engineering theme.
Research proposals should respond to the need to deliver step changes in researchers’ ability to use existing large and complex data types, and should offer much-needed learning opportunities in research data readiness. The FAIR Principles provide high-level guidance for improving data (re)use by machines; however, they do not elucidate the technical, social and policy implications of making data FAIR, or FAIRer.
Research proposals should be designed to deliver novel conceptual and methodological contributions that advance the practices and infrastructure for research data management needed to use data at scale in ways that are not possible now. For example, a proposal could define (and prototype) how to move from the current manual, time-consuming and error-prone operations to a streamlined, unambiguous and AI-ready framework, using objective metrics to drive the advancements and to demonstrate the project’s impact on researchers.
Beyond science, research proposals can also contribute to the nascent body of knowledge around ‘research on research’, rethinking how we discover, access and reuse extant data, or create, curate and share new scholarly knowledge; and how we enact the cultural changes that motivate, reward and credit researchers for disseminating high-quality, FAIR data.
- 2021-present: 'Reproducibility' course - for the FAIR component; EPSRC Centre for Doctoral Training in Sustainable Approaches to Biomedical Science: Responsible and Reproducible Research (SABS: R³).
- 2018-present: 'Introduction to Data Readiness', part of the Biomedical Engineering coursework module, 2nd year of the MEng in Engineering Science.
- 2014-present: 'Data Management, Analysis and Statistics' foundation module; Oxford BBSRC Interdisciplinary Bioscience Doctoral Training Programme, and the EPSRC/BBSRC Synthetic Biology Centre for Doctoral Training.
- 2014-2020: 'Things to do with data: research data management and publication' - Lectures for Research Data Oxford, IT Learning Programme, Open Access Week, Oxford/Berlin Summer School.
Since the early 2000s, I have delivered over 200 plenary lectures and keynotes (many of which are available here), and members of my group also deliver educational seminars, training and teaching material at events worldwide.