Customer Case Study

How Quest Diagnostics® is building an enterprise-wide multi-omics data mesh with TileDB
Quest Diagnostics, a Fortune 500 healthcare company, is the world's leading provider of diagnostic information services, offering a wide range of tests, lab services and integrated health information solutions. Recently, Quest's advanced R&D organization led a company-wide initiative to extract more value from existing datasets through machine learning. Their challenge was to unify data silos — traditionally isolated by specific formats and workflows — while ensuring that patient consent is properly tracked as data is anonymized and securely shared between internal teams.

Domain: Health care

Datatypes: VCF, Imaging

Challenge

To enable the company's ML goals, the R&D team also had to establish a distributed governance model in the form of a data mesh. The underlying data mesh had to support hundreds of thousands of omics data files spanning whole-genome sequencing (WGS), whole-exome sequencing (WES) and clinical exome sequencing (CES), plus millions of digital pathology images (high-res scans of prepared microscope slides).


Managing these datasets as collections of files was already limiting the scale and scope of internal analysis projects. The Quest R&D team evaluated traditional relational databases and even modern cloud data warehouses, but they fell short of the performance demands of bioinformatics and required excessive data wrangling. Finally, before collaborative ML projects could even begin, it was critical to ensure that a robust consent-tracking system could be built into the data mesh, based on the Data Use Ontology (DUO), the genomics data-sharing standard from the Global Alliance for Genomics and Health (GA4GH).

Genomics data management requirements at Quest Diagnostics

  • Deliver ML-ready genomics data that aligns with Quest DUO data use categories relevant to different internal projects and teams.
  • Ingest and store variant data at a scale of up to 6 million samples a year, and make it analysis-ready for bioinformaticians.
  • Support specific bioinformatics queries, such as complex range intersection.
  • Empower expert teams to own domain data, enabling collaborative multi-omics analyses and image labeling for ML model development.
  • Overcome the N+1 problem: easily add new samples to existing VCF datasets without ballooning storage space or update times.
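
To make the range intersection requirement concrete, here is a minimal sketch in plain Python. It is not TileDB's API or Quest's implementation; the `Variant` record, the `overlaps` and `query_regions` helpers, and the sample data are all hypothetical and only illustrate what it means to select variants that overlap a set of genomic regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; coordinates are 1-based and inclusive."""
    sample: str
    chrom: str
    start: int
    end: int


def overlaps(variant, chrom, start, end):
    # Two closed intervals intersect when each one starts before the other ends.
    return variant.chrom == chrom and variant.start <= end and variant.end >= start


def query_regions(variants, regions):
    # Linear scan for clarity; a real system would use an index instead.
    return [v for v in variants if any(overlaps(v, *r) for r in regions)]


# Illustrative data only.
variants = [
    Variant("S1", "chr1", 100, 100),
    Variant("S1", "chr1", 250, 260),
    Variant("S2", "chr2", 100, 105),
    Variant("S3", "chr1", 195, 205),
]
regions = [("chr1", 200, 300)]
hits = query_regions(variants, regions)
```

At cohort scale a linear scan like this is impractical, which is why the requirement calls for a storage engine that supports such queries directly.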

Solution

The TileDB and Quest teams started working together to analyze a dataset of 114,000 whole-exome VCF files. Within 2 months, they proved that TileDB met the requirements for ingesting and querying variant data. They also proved that TileDB Cloud's data cataloging and access control features could be extended to support the Quest consent ontology standards necessary for their data mesh implementation.

Early results of Quest Diagnostics & TileDB

  • Demonstrated superiority of TileDB over traditional tools for domain-specific queries such as slicing genomic regions, identifying a single gene across a cohort, and querying a single SNP or a set of 10 SNPs across more than 100,000 samples.
  • Achieved a 26% reduction in storage costs by storing TileDB arrays on Amazon S3 instead of compressed VCF files on S3.
  • Demonstrated cost-efficient ingest at the rate of 24,000 samples per day, at 1 cent per sample.
  • Easily handled sample appends to VCF datasets, solving the N+1 problem.
  • Implemented a tagging structure on TileDB Cloud to handle consent requests across TileDB Cloud arrays, notebooks and groups.
  • Began ingestion of digital pathology images into data marts to prepare for larger-scale collaboration.
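
The consent-tagging result above can be pictured with a small sketch. It is an assumption-laden illustration, not TileDB Cloud's tagging API or Quest's ontology: the asset names, the `ASSET_TAGS` catalog and the `accessible_assets` helper are hypothetical, and the tags use simple exact matching on DUO-style abbreviations (GRU for general research use, HMB for health/medical/biomedical research) rather than the full DUO hierarchy.

```python
# Hypothetical catalog: asset name -> set of consent tags attached to it.
ASSET_TAGS = {
    "wes_variants_array": {"GRU"},
    "pathology_notebook": {"HMB"},
    "oncology_group": {"GRU", "HMB"},
}


def accessible_assets(asset_tags, project_use):
    # An asset is visible to a project only if it carries the project's
    # declared data-use tag (exact match; no ontology reasoning).
    return sorted(name for name, tags in asset_tags.items() if project_use in tags)


hmb_assets = accessible_assets(ASSET_TAGS, "HMB")
gru_assets = accessible_assets(ASSET_TAGS, "GRU")
```

Applying the same tag vocabulary to arrays, notebooks and groups lets a single check cover every asset type.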
"The data mesh vision and rollout of data products is a multi-year disruptive, and yet, extremely rigorous undertaking in Quest's data leadership journey. TileDB is uniquely positioned as a strategic player in our ecosystem. TileDB is a rare find: simply put, they offer thought and execution partnership across all aspects of multi-omics, speak the language of our end-users, and deliver a much simpler foundational data infrastructure, at the scale we wish to operate."
Ray Veeraghavan
Global Head of Bioinformatics & Software, Quest Diagnostics
