Last updated
Join the conversation on data & knowledge management in biomedicine and discover new ideas and practical tools for analyzing, sharing, and publishing your research.
Taralyn Tan, Ph.D., Assistant Dean for Educational Innovation and Scholarship & Ella Batty, Ph.D., Assistant Dean for Educational Programs at Kempner Institute
0:00 Introduction | 2:44 Main Presentation | 49:00 Q&A with audience
Shaun Rawson, Ph.D., CryoEM Computational Specialist | Cryo-EM@Harvard Medical School
0:00 Introduction | 1:15 Main Presentation | 41:10 Q&A with audience
Advances in cryoEM technologies have resulted in an increasing rate of data acquisition. With new detectors, a single instrument can generate multiple TB of data in one evening, and this deluge has outstripped the field's ability to organise and handle the data. As a field, we are grappling with ways to handle this data at the user and institutional levels, and with how to organise and share information more widely. This presentation will cover data handling at the instrument, from the facility's perspective, before exploring the wider topic of downstream management of cryoEM data. We will discuss integration with HMS-IT storage solutions, along with metadata handling and the challenges that remain unsolved.
Stuart Levine, Ph.D., Director of the BioMicro Center | Massachusetts Institute of Technology
0:00 Introduction | 0:52 Main Presentation | 38:25 Q&A with audience
Data management is critical to improving the rigor and reproducibility of large projects. Adhering to Findable, Accessible, Interoperable, and Reusable (FAIR) standards provides a baseline for meeting these requirements. Although many existing repositories handle data in a FAIR-compliant manner, connecting these datasets coherently is a growing challenge in an increasingly multi-omic and multi-institutional environment. We have developed NExtSEEK as a data management platform for creating highly structured, warehoused metadata that is compatible with deposition in the public repository fairdomhub.org. This metadata management platform is currently used by the IMPAcTB program, the MIT Superfund Research Program, and the MIT Metastasis Network program.
Christopher D Harvey, Ph.D., Professor of Neurobiology | Harvard Medical School
Cindy Yuan, graduate student | Harvard Medical School
0:00 Introduction | 1:10 Main Presentation | 43:26 Q&A with audience
Caterina Strambio De Castillia, Ph.D. | CZI Imaging Scientist, Assistant Professor of Molecular Medicine, UMass Chan Medical School
0:00 Introduction | 2:00 Main Presentation | 47:12 Q&A with audience
Rigorous and quantitative cell science crucially depends on the generation of high-quality datasets in which all relevant information (i.e., metadata) about a microscopy experiment is reported using FAIR (Findable, Accessible, Interoperable, Reusable) principles. Significant advances in spatiotemporal resolution have led to ever-expanding microscopy datasets which, without agreed-upon community guidelines, are challenging to analyze quantitatively (including with AI-assisted strategies), reproduce, and re-use. To overcome this hurdle, it is essential to integrate community-specified image documentation and quality-control guidelines into easy-to-use Research Data Management (RDM) software tools and pipelines. These tools should support the streamlined execution, tracking, and documentation of the full life cycle of image data, from sample preparation, image acquisition, and analysis to publication and sharing (i.e., data provenance).
Adam Taylor, Ph.D. | Senior Research Scientist, Sage Bionetworks
0:00 Introduction | 2:30 Main presentation | 37:45 Q&A with audience
As biological research has grown increasingly data-intensive, collaboration among researchers with diverse expertise and resources has become essential. At Sage Bionetworks, we work with funders and researchers to coordinate data distribution under FAIR principles and to help “teams of teams” balance incentives and achieve research goals. Our interdisciplinary team of data curators, scientists, engineers, designers, and governance experts builds tools and systems to enable this, including our NIH-recognized data repository Synapse. Our flexible approach ensures secure and adaptive stewardship, curation, and sharing of data and metadata, meeting the unique needs of each research community. We aim to make biomedical data widely available and usable, directly engaging research communities and leveraging team science-based strategies to support collaborative science. In this seminar, we will share our approach to accelerating collaborative research; our work with large consortia such as the Human Tumor Atlas Network; how you can use Synapse today; and ways of working with us to implement and enhance your data management and sharing plans.
Paula Montero Llopis, Ph.D. | Director of MicRoN Core, Harvard Medical School
0:00 Introduction | 3:02 Main presentation | 47:40 Q&A with audience
Over the past decade, biomedical research has become more quantitative and interdisciplinary. The development and advancement of new tools in light microscopy and data analysis, especially open-source methods, have played a significant role in this shift, enabling breakthroughs in biomedicine. As a result, researchers can tackle more challenging questions and gain a deeper understanding of complex biological systems than ever before. However, this rapid development presents new challenges for researchers, as in-depth knowledge of each technology is needed to appreciate its impact on bias and reproducibility. In this seminar, we discuss the factors that affect microscopy data and the conclusions drawn from them, and we provide tools and resources for designing rigorous, reproducible microscopy experiments and for reporting microscopy methods appropriately.
Benjamin M. Gyori, Ph.D. | Assistant Professor, jointly appointed in the Khoury College of Computer Sciences & Bioengineering, Northeastern University
0:00 Introduction | 6:38 Main presentation | 50:41 Q&A with audience
Making novel scientific discoveries requires integrating biomedical data and knowledge from diverse sources. However, merging disparate sets of information is time-consuming and error-prone due to challenges such as inconsistent naming conventions and the use of incompatible identifier resources. To address this, we introduce the Biopragmatics project, a new set of community standards and software tools for annotating datasets and making them easier to integrate. We then discuss the INDRA software system, which automatically assembles data and knowledge from large-scale automated processing of the literature and pathway databases. We demonstrate how the knowledge assembled by INDRA can be used to generate mechanistic models and networks of biological systems, and to inform novel hypotheses that can advance the field of biomedicine.