
Jan 22, 2025

Taming Frontier Data Part 4: How to make frontier data FAIR-compliant and usable by AI and ML applications and models

Genomics
Devika Garg


Director, Life Sciences Product Marketing

The breakthrough potential of generative AI and ML applications in life sciences is exhilarating. Some 66% of life sciences companies are currently experimenting with generative AI to test ideas and build use cases, and AI contributed to at least 19 drugs in clinical development in 2023. But before life sciences organizations can apply AI and ML models and applications to their frontier data, that data needs to be FAIR-compliant (Findable, Accessible, Interoperable and Reusable).

In this final post of our four-part series on solving tough data challenges in life sciences, we will look at ways to make frontier data FAIR-compliant and usable by AI and ML technology.

One of the most exciting life sciences applications of AI and ML tools is in clinical diagnostics. The American Hospital Association reports that more than 48% of hospital CEOs and strategy leaders believe health systems will have the infrastructure to use AI to augment clinical decision-making, such as diagnostics, by 2028. AI and machine learning also hold powerful promise to accelerate discovery and speed research breakthroughs, but organizations must first make their data FAIR so it’s ready for AI to process.

When frontier data is FAIR, AI applications can easily access and integrate large volumes of data from diverse sources, which leads to more accurate and powerful ML models. However, making frontier data FAIR can be a time-consuming and computationally complex endeavor. And before any ML project can begin, healthcare data needs a robust built-in consent-tracking system based on the Data Use Ontology (DUO), a genomics data standard from the Global Alliance for Genomics and Health (GA4GH).
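To make the consent-tracking idea concrete, here is a minimal sketch of how DUO-style consent codes could gate which samples reach an ML training set. It is an illustration only, not how TileDB or Quest implement it. The four DUO identifiers are terms from the GA4GH ontology, but the `SampleConsent` class, the `"general"`, `"biomedical"` and `"disease"` purpose labels, and the matching rules are simplifying assumptions made for this example; real DUO matching also covers modifiers and ontology hierarchies.

```python
from dataclasses import dataclass
from typing import List, Optional

# Permission terms from the GA4GH Data Use Ontology (DUO).
NO_RESTRICTION = "DUO:0000004"
GENERAL_RESEARCH_USE = "DUO:0000042"
HEALTH_MEDICAL_BIOMEDICAL = "DUO:0000006"
DISEASE_SPECIFIC = "DUO:0000007"


@dataclass(frozen=True)
class SampleConsent:
    """Hypothetical consent record attached to one sample."""
    sample_id: str
    duo_code: str
    disease: Optional[str] = None  # only meaningful for disease-specific consent


def is_use_permitted(consent: SampleConsent, purpose: str,
                     disease: Optional[str] = None) -> bool:
    """Simplified check (an assumption, not the official DUO matching logic).

    `purpose` is one of the hypothetical labels "general", "biomedical"
    or "disease"; all three are treated as research uses.
    """
    if consent.duo_code in (NO_RESTRICTION, GENERAL_RESEARCH_USE):
        return True
    if consent.duo_code == HEALTH_MEDICAL_BIOMEDICAL:
        return purpose in ("biomedical", "disease")
    if consent.duo_code == DISEASE_SPECIFIC:
        return (purpose == "disease"
                and disease is not None
                and disease == consent.disease)
    # Unknown or missing consent codes are denied by default.
    return False


def filter_training_set(consents: List[SampleConsent], purpose: str,
                        disease: Optional[str] = None) -> List[str]:
    """Return the IDs of samples whose consent allows the requested use."""
    return [c.sample_id for c in consents
            if is_use_permitted(c, purpose, disease)]
```

Denying unknown codes by default is a deliberate choice in this sketch: a sample with missing or unrecognized consent metadata never reaches a model.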

These were the machine learning challenges Quest Diagnostics faced in their efforts to build an enterprise-wide multi-omics data mesh. Because their data mesh needed to support hundreds of thousands of omics data files, Quest’s leadership knew traditional relational databases and cloud data warehouses would not deliver the performance they needed. Only an ML application could deliver the scale of analysis required by their bioinformatics goals. In addition to this enterprise-level ML platform, Quest needed to ensure a robust consent-tracking system could be built into the data mesh based on DUO standards.

Enter TileDB. Beyond delivering FAIR and ML-ready genomics data in a unified data mesh across Quest’s research and bioinformatics teams, TileDB enabled Quest to ingest, store and scale up to 6 million samples per year of analysis-ready variant data—all while meeting DUO standards. “TileDB is a rare find. They offer thought and execution partnership across all aspects of multi-omics, speak the language of our end-users and deliver a much simpler foundational data infrastructure at the scale we wish to operate,” said Ray Veeraghavan, Global Head of Bioinformatics and Software at Quest Diagnostics.

TileDB’s solution also helped Quest reduce storage costs by 26% by using TileDB arrays on Amazon S3 compared to compressed VCF files on S3. Today, Quest has the unified data platform they need to master their frontier data and make optimal use of AI applications. As your organization looks for new solutions to tackle your frontier data challenges, TileDB has the technology and industry expertise to help. Contact us to learn more.


Catch up on the series

If you missed the first three parts of this series on life sciences data challenges, you can review part one on how to efficiently process complex data queries at scale, part two on how to simplify collaboration between research and bioinformatics teams, and part three on better managing data storage and computing costs.
