Presentation Type
Lightning Talk
Location
Zoom. Recording Coming Soon!
Start Date
15-4-2024 3:00 PM
End Date
15-4-2024 3:20 PM
Description
Institutional requirements and changing scholarly cultures have led to a welcome increase in the amount of research data that is openly available. However, academic institutions often know little about where their researchers share data and may not be able to provide them with support for doing so effectively. When research datasets are poorly described or lack key metadata, they are difficult to discover and therefore unlikely to be reused. Persistent identifiers such as digital object identifiers (DOIs) for research outputs, ROR IDs for institutions, and ORCID iDs for individuals are of particular importance for promoting findability and accessibility. For example, DataCite, a common minter of DOIs for datasets, provides public dashboards that aggregate datasets at the institutional level based on their attached ROR IDs; however, datasets shared without affiliated ROR IDs are not captured.
This lightning talk will report on an in-progress project exploring how the DataCite API can be used to capture more fully the datasets shared by researchers at a given institution. It will also cover how to use Python code to extract relevant metadata, including the repositories where researchers share their data and the completeness of the associated metadata. These findings can inform services by giving librarians a robust understanding of the repositories of interest at their institutions, as well as gaps in data sharing practices that they may seek to address so that research data can be more findable and accessible.
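As a rough illustration of the kind of analysis described above, the following is a minimal sketch and not the presenter's actual code. It assumes the public DataCite REST API endpoint at https://api.datacite.org/dois, a free-text query on the creator affiliation name, and the JSON field names shown in the comments (`publisher`, `creators`, `affiliation`, `nameIdentifiers`). The function names `build_query_url`, `fetch_page`, and `summarize` are hypothetical and chosen only for this example.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public DataCite REST API.
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_query_url(institution_name, page_size=100):
    """Build a DOI search URL for datasets whose creators list the
    given affiliation name (matched as text, so records without a
    ROR ID can still be found)."""
    params = {
        "query": f'creators.affiliation.name:"{institution_name}"',
        "resource-type-id": "dataset",
        "affiliation": "true",  # ask for affiliation identifiers, if any
        "page[size]": page_size,
    }
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def fetch_page(url):
    """Fetch one page of results (network call; not run on import)."""
    with urlopen(url) as response:
        return json.load(response)["data"]


def summarize(records):
    """Count datasets per repository (publisher) and how many carry
    a ROR-identified affiliation or an ORCID iD for any creator."""
    repositories = Counter()
    with_ror = 0
    with_orcid = 0
    for record in records:
        attrs = record.get("attributes", {})
        publisher = attrs.get("publisher")
        if isinstance(publisher, dict):  # publisher may be an object
            publisher = publisher.get("name")
        repositories[publisher or "unknown"] += 1

        creators = attrs.get("creators") or []
        affiliations = [
            aff
            for creator in creators
            for aff in (creator.get("affiliation") or [])
            if isinstance(aff, dict)
        ]
        if any(a.get("affiliationIdentifierScheme") == "ROR" for a in affiliations):
            with_ror += 1

        name_ids = [
            ni
            for creator in creators
            for ni in (creator.get("nameIdentifiers") or [])
        ]
        if any(ni.get("nameIdentifierScheme") == "ORCID" for ni in name_ids):
            with_orcid += 1

    return {
        "total": len(records),
        "repositories": dict(repositories),
        "with_ror": with_ror,
        "with_orcid": with_orcid,
    }
```

A call such as `summarize(fetch_page(build_query_url("Example University")))` would summarize the first page of results for a placeholder institution name. A fuller analysis would also need to page through all results and account for variant spellings of the institution's name.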
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Presentation Slides
Investigating Researcher Data Sharing Practices Using the DataCite API
Comments
I hope to be able to provide participants with access to a Google Colab notebook, still in development, that will allow them to recreate my analysis for their own institutions without having to change the underlying Python code. It would be similar to the analysis I provide here: https://github.com/igwink/DOI_Metadata_Analysis/blob/main/DataCite_DOI_Analysis.ipynb