
Jan 21, 2025

Taming Frontier Data Part 3: How to prevent data storage and computing costs from overwhelming research IT budgets

Genomics
Data Management
Devika Garg

Director, Life Sciences Product Marketing

PwC reports that 89% of pharma companies are increasing their overall technology budgets over the next year. But as life sciences organizations gather more data from biobanks, clinical trials and other sources, all of that frontier data must be stored and managed. As a result, an ever-larger share of growing life sciences IT budgets is going to data storage and computing. How can life sciences organizations manage their frontier data effectively without overwhelming their research and IT budgets?

In this third post of a four-part series on solving tough data challenges in life sciences, we explore how to control data storage and computing spend while still empowering research teams.

Finding solutions that make the most of frontier data in life sciences is not simple, and doing so cost-effectively is especially challenging. Relying on file-based approaches for life sciences data can slow collaboration, expose research teams to errors and redundant copies, and drive up costs. Organizations can instead turn to custom private clouds or public cloud providers like AWS to store their research data more efficiently, but that data is still expensive to maintain.

The ideal solution is a highly scalable database that uses computational resources efficiently and handles multi-dimensional data with agility. This is what Rady Children’s Institute for Genomic Medicine needed to diagnose critical illnesses within hours in the NICU. Its goal was to add new Variant Call Format (VCF) samples to existing populations in order to identify 388 additional genetic diseases.

However, this was computationally expensive with the institute’s existing file-based approach: storage grew superlinearly, with 1 TB of new genomic information added every day. Making matters more complex, refreshing the allele frequency of every variant involved an additional 10 PB of genomic data every night. This hindered the research team’s ability to interpret new data quickly and made its goal of scaling the solution tenfold seem nearly impossible.
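To make the allele-frequency refresh concrete: one generic way to avoid re-reading every stored sample each night is to keep running allele counts per site and fold in each new sample as it arrives. The Python sketch below shows that idea under stated assumptions (biallelic sites, and every sample reporting a genotype for each site of interest, with reference calls written as `(0, 0)`). `AlleleStats`, `add_sample` and `frequency` are hypothetical names, and this is not presented as Rady’s or TileDB’s implementation.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical key for a biallelic site: (contig, position, ref, alt).
VariantKey = Tuple[str, int, str, str]


class AlleleStats:
    """Running per-site allele counts (hypothetical helper, not a TileDB API).

    Keeping counts lets allele frequency be updated from each new sample
    alone, instead of being recomputed by re-reading every stored sample.
    """

    def __init__(self) -> None:
        self.alt_alleles: Dict[VariantKey, int] = defaultdict(int)
        self.called_alleles: Dict[VariantKey, int] = defaultdict(int)

    def add_sample(self, genotypes: Dict[VariantKey, Tuple[Optional[int], ...]]) -> None:
        """Fold one sample into the counts.

        genotypes maps each site to its allele indices, e.g. (0, 1) for a
        heterozygous call; 0 = ref, 1 = alt, None = missing (no-call).
        """
        for site, alleles in genotypes.items():
            for allele in alleles:
                if allele is None:
                    continue  # no-calls do not enter the denominator
                self.called_alleles[site] += 1
                if allele == 1:
                    self.alt_alleles[site] += 1

    def frequency(self, site: VariantKey) -> float:
        """Alt allele frequency among called alleles; 0.0 if never called."""
        called = self.called_alleles.get(site, 0)
        return self.alt_alleles.get(site, 0) / called if called else 0.0
```

For example, after a heterozygous `(0, 1)` sample and a homozygous-alternate `(1, 1)` sample at the same site, `frequency` returns 0.75 (3 alternate alleles out of 4 called). Adding a homozygous-reference `(0, 0)` sample brings it to 0.5, and that update touches only the new sample’s genotypes.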

Rady Children’s Institute for Genomic Medicine needed a cost-effective, scalable database solution that could deliver short diagnostic turnaround times and efficiently manage large volumes of genomic variant data. It chose TileDB, storing its VCF samples as a 3-dimensional array on Amazon S3 with the open-source TileDB-VCF library, which makes the data analysis-ready directly on cloud storage.
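As a rough illustration of why a multi-dimensional layout helps, the toy Python class below keeps genotype cells sorted by (contig, position, sample), so that a genomic region maps to one contiguous slice and ingesting a new sample never rewrites other samples’ cells. The choice of dimensions, the `SparseVariantArray` class and its methods are all assumptions for illustration. This is neither the TileDB-VCF schema nor its API, and a real array engine adds compression, tiling and cloud object storage that this in-memory sketch omits.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[str, int, str]  # (contig, position, sample)


class SparseVariantArray:
    """Toy, in-memory stand-in for a sparse 3-D array with dimensions
    (contig, position, sample). Illustrative only, not TileDB's engine."""

    def __init__(self) -> None:
        # Coordinates kept in sorted order, so one region on one contig
        # corresponds to a single contiguous slice of this list.
        self._coords: List[Coord] = []
        # One attribute per cell: the genotype string, e.g. "0/1".
        self._genotypes: Dict[Coord, str] = {}

    def ingest_sample(self, sample: str, records: List[Tuple[str, int, str]]) -> None:
        """Add one sample's (contig, position, genotype) records.
        Cells that belong to other samples are never rewritten."""
        for contig, position, genotype in records:
            coord = (contig, position, sample)
            if coord not in self._genotypes:
                insort(self._coords, coord)  # O(n) per insert; fine for a toy
            self._genotypes[coord] = genotype

    def query(self, contig: str, start: int, end: int,
              samples: Optional[Set[str]] = None) -> List[Tuple[str, int, str, str]]:
        """Return (contig, position, sample, genotype) for every cell with
        start <= position <= end, optionally limited to the given samples."""
        # "" compares <= every sample name, so these bounds bracket the region.
        lo = bisect_left(self._coords, (contig, start, ""))
        hi = bisect_left(self._coords, (contig, end + 1, ""))
        return [
            (c, p, s, self._genotypes[(c, p, s)])
            for c, p, s in self._coords[lo:hi]
            if samples is None or s in samples
        ]
```

For instance, ingesting a `proband` with calls at chr1:100 and chr1:250 and a `mother` with calls at chr1:100 and chr2:50, then querying chr1:90-200, returns only the two chr1:100 cells, ordered by sample name.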

The results exceeded the efficiency and cost-effectiveness Rady Children’s Institute for Genomic Medicine needed, with a striking 97% cost reduction compared to its legacy file-based approach. “TileDB is allowing us now to do things that were hitherto not possible,” said Dr. Stephen Kingsmore, President and CEO, Rady Children’s Institute for Genomic Medicine. “It's not just a matter of running complex queries, it's a matter of running hundreds of concurrent complex queries on dramatically expanding genomic data, which is key for diagnostics in the NICU to guide the right treatments now and into the future with gene therapies.”

Today, genomic analyses at Rady Children’s Institute for Genomic Medicine that once took days can be turned around clinically in seven hours, and the resulting findings are ready to share with 70 other children’s hospitals. In the next post on tackling tough life sciences data challenges, we will look at ways to make frontier data FAIR-compliant and usable by AI and ML applications and models.

Explore our case study with Rady Children’s Hospital here.


Catch up on the series

If you missed them, review part one on how to efficiently process complex data queries at scale and part two on how to simplify collaboration between research and bioinformatics teams.
