September 16, 2021 | Research
U of T’s Data Sciences Institute to help researchers find answers to their biggest questions
By Berton Woodward
Researchers working with the multi-university CHIME radio telescope in B.C. are collaborating with experts at U of T's Data Sciences Institute to solve computational and processing problems (photo courtesy of the CHIME Collaboration)
When University of Toronto astronomer Bryan Gaensler looks up at the night sky, he doesn’t just see stars – he sees data. Big data.
So big, in fact, that his current research tracking the baffling “fast radio bursts” (FRBs) that bombard Earth from across the universe requires the capture of more data per second than all of Canada’s internet traffic.
“This is probably the most exciting thing in astronomy right now, and it’s a complete mystery,” says Gaensler, director of U of T’s Dunlap Institute for Astronomy & Astrophysics and Canada Research Chair in Radio Astronomy. “Randomly, maybe once a minute, there’s this incredibly bright flash of radio waves – like a one-millisecond burst of static – from random directions all over the sky.
“We now know that they’re from very large distances, up to billions of light-years, so they must be incredibly powerful to be able to be seen this far away.”
U of T is a world leader in finding FRBs, using the multi-university CHIME radio telescope in British Columbia’s Okanagan region and a U of T supercomputer. Yet, despite the impressive technology, many daunting challenges remain.
“It’s a massive computational and processing problem that is holding us back,” he says. “We are recording more than the entire internet of Canada, every day, every second. And because there’s no hard drive big enough or fast enough to actually save that data, we end up throwing most of it away. We would obviously like to better handle the data, so that needs better equipment and better algorithms and just better ways of thinking about the data.”
With the creation of U of T’s Data Sciences Institute (DSI), Gaensler and his colleagues now have a new place to turn to for help. The institute, which is holding a launch event tomorrow, is designed to help the University’s wealth of academic experts in a variety of disciplines team up with statisticians, computer scientists, data engineers and other digital experts to create powerful research results that can solve a wide range of problems – from shedding light on interstellar mysteries to finding life-saving genetic therapies.
“The way forward is to bring together new teams of astronomers, computer scientists, artificial intelligence experts and statisticians who can come up with fresh approaches optimized to answer specific scientific questions that we currently don’t know how to address,” Gaensler says.
The Data Sciences Institute is just one of nearly two dozen Institutional Strategic Initiatives (ISI) launched by U of T to address complex, real-world challenges that cut across fields of expertise. Each initiative brings together a flexible, multidisciplinary team of researchers, students and partners from industry, government and the community to take on a “grand challenge.”
“We’re bringing together individuals at the intersection of traditional disciplinary fields and computational and data sciences,” says Lisa Strug, director of the Data Sciences Institute and a professor in the departments of statistical sciences and computer science in the Faculty of Arts & Science, and a senior scientist at the Hospital for Sick Children research institute.
She notes that U of T boasts world-leading experts in fields such as medicine, health, social sciences, astrophysics and the arts, and “some of the top departments in the world in the cognate areas of data science like statistics, mathematics, computer science and engineering.”
Data science techniques can be brought to bear on a near-infinite variety of academic questions – from climate change to transportation, planning to art history. In literature, Strug says, many works from previous centuries are now being digitized, allowing data-based analysis right down to, say, sentence structure.
“New fields of data science are emerging every day,” says Strug, who oversees data-intensive genomics research into complex diseases such as cystic fibrosis, work that has led to the promise of new drugs to treat the debilitating lung disease. “We have so much computational disciplinary strength we can leverage to define and advance these new fields.
“We want to make sure that faculty have access to the cutting-edge tools and methodology that enable them to push the frontiers of their field forward. They may be answering questions they wouldn’t have been able to ask before, without that data and without those tools.”
A key function of the DSI is the creation and funding of Collaborative Research Teams (CRTs) of professors and students from a variety of disciplines who can work together on important projects with stable support.
Gaensler, who already has statisticians on his team, says he’s looking to the CRTs to greatly expand the scope of his work.
“We have just done the low-hanging fruit,” he says. “There are many deeper problems that we haven’t even started on.”
Data helped support Ontario’s successful strategy of targeting COVID-19 'hotspots'
Similarly, Laura Rosella, an associate professor at the Dalla Lana School of Public Health, says the collaborative teams will be a major asset for the University.
“We’re going to dedicate funding to these multi-disciplinary trainees and post-docs so we can start building a critical mass of people that can actually translate between these disciplines,” she says. “To solve problems, you need this connecting expertise.”
Rosella played a key role in how Ontario dealt with COVID-19 in the early part of 2021. By analyzing anonymous cellphone data along with health information, she and her interdisciplinary team were able to see where people were moving and congregating, and then predict likely clusters of the disease up to two weeks before they appeared. Her work helped support the province’s highly successful strategy of targeting so-called “hotspots.”
“We’ve been able to work with diverse data sources in order to generate insights that are used for high-level pandemic preparedness and planning, in ways that weren’t possible before,” says Rosella, who sits on Ontario’s COVID-19 Modelling Consensus Table. “And we’ve also brought in new angles to the data around the social determinants of health that have shone a light on the policy measures that are needed to truly address disparities in COVID rates.”
Rosella’s population risk tools also include one for diabetes, which health systems can use to estimate the future burden of the disease and guide future planning. This includes inputs about the built environment. For example, if people can walk to a new transit stop, Rosella says, the increased exercise may have an impact on diabetes or other diseases. Potentially, even satellite imaging data could be brought into the prediction mix, she says.
The Institute seeks to tackle societal inequalities uncovered by data research
In addition to advancing research in a given field, the Data Sciences Institute is also seeking to advance equity.
That includes tackling societal inequalities uncovered by data research – including how socio-economic factors can determine who is more likely to get COVID-19 – and the way the research itself is being conducted.
For example, Strug says most genomics studies have focused on participants of European origin, even though the genetic risk factors for various diseases can differ among ethnic groups.
“We must make sure we develop and implement the models, tools and research designs – and bring diverse sources of data together – to ensure our understanding of disease risk is applicable to all,” Strug says.
Many algorithms, or the data they use to make predictions, contain unconscious bias that may skew results – which is why Strug says transparency is vital both to support equity and to ensure studies can be reproduced properly.
Gaensler says it’s critical to ensure diversity among researchers, too.
“My department looks very different from the faces that I see on the subway,” he says. “It’s not a random sampling of Canadian society – it’s very male, white and old, and that’s a problem we need to work on.”
Strug hopes the Data Sciences Institute will ultimately become a nucleus for researchers across the University – and beyond.
“There’s never been one entrance to the University to guide people, so it’s so important for us to be that front door,” she says.
“We will make every effort to stay abreast of the different fantastic things that are happening in data sciences and be able to direct people to the right place, as well as provide an inclusive, welcoming and inspiring academic home.”
This article is part of a series about U of T’s Institutional Strategic Initiatives program – which seeks to make life-changing advancements in everything from infectious diseases to social justice – and the research community that’s driving it.