UNIVERSITY PARK, Pa. — Computer science and biology have a symbiotic relationship: computer scientists can develop new ways to analyze biological data, leading to new discoveries in biology and biomedicine, and biology can inspire and inform computational approaches. In some cases, the two disciplines merge in a single type of researcher: computational biologists.
At Penn State, two researchers are at the forefront of this intersection. Mingfu Shao, associate professor of computer science and engineering, and David Koslicki, associate professor of computer science and engineering and of biology, recently presented three papers at RECOMB, one of the top conferences in computational biology, which took place April 26-29 in Seoul, South Korea.
In this Q&A, Shao and Koslicki, who are affiliated with the Center for Computational Biology and Bioinformatics and the Intercollege Graduate Degree Program in Bioinformatics and Genomics in the Huck Institutes of the Life Sciences, spoke about how computational tools are advancing molecular biology.
Q: What is computational biology, and how does your research fit into this category?
Koslicki: Computational biology is one of the most interdisciplinary fields in science: It combines research from areas as diverse as physics, chemistry, computer science, mathematics, biology and statistics, all under the unified theme of using computational tools to extract insight from biological data. I use both mathematical and machine learning tools to shed light on sequencing data, such as DNA and RNA sequences, which provide information about the genome and about which genes are expressed, and at what level, at a given time. I also work on organizing and learning from biomedical knowledge more generally. This includes extracting knowledge from previously published papers or repositories of information to connect concepts and shed light on how different drugs might treat a disease, or to better understand the mechanism behind a disease.
Shao: Computational biologists develop algorithms that answer questions from biological datasets. My research lab works on three topics in this field. First, we develop methods to analyze RNA-sequencing data, with a focus on improving how we understand gene expression at a high resolution. Second, we develop fast algorithms for genome rearrangements, to understand how genetic instructions may change under various evolutionary models. Finally, we design new algorithms capable of tolerating errors, which are instrumental for comparing and processing sequencing data that classic approaches often cannot accurately assess.
Q: How does your research allow for a better understanding of biology or biomedicine?
Koslicki: My work with the National Center for Advancing Translational Services’ Biomedical Data Translator program involves building advanced computational tools, called knowledge graphs, that integrate vast amounts of biological data to reveal hidden relationships between genes, diseases or potential treatments. These tools help scientists more rapidly identify genetic underpinnings of complex disorders, providing clearer pathways toward accurate diagnoses and personalized therapies. By connecting genetic variations to specific diseases, we gain deeper insights into how certain mutations affect health, and this helps doctors and researchers pinpoint promising treatments faster. Ultimately, this research accelerates medical discoveries and improves the treatment of human diseases.
Shao: My research develops computational tools that bridge RNA-sequencing data and biological discovery to enhance our ability to interpret genomic data in both basic and biomedical contexts. We have created highly accurate tools that assist in gene expression analysis and biomarker discovery. For instance, our tool was used to identify genetic material in studies of SARS-CoV-2, the virus that causes COVID-19. In comparative genomics, we design algorithms to trace genome rearrangements over evolution, offering insights into genome structure and function.