Customer Case Study

How Quest Diagnostics® is building an enterprise-wide multi-omics data mesh with TileDB

Quest Diagnostics, a Fortune 500 healthcare company, is the world's leading provider of diagnostic information services, offering a wide range of tests, lab services and integrated health information solutions. Recently, Quest's advanced R&D organization led a company-wide initiative to extract more value from existing datasets through machine learning. Their challenge was to unify data silos — traditionally isolated by specific formats and workflows — while ensuring that patient consent is properly tracked as data is anonymized and securely shared between internal teams.

Storage Cost Reduction

26% reduction in storage cost for TileDB arrays on Amazon S3 compared to compressed VCF files on S3

N+1 Problem Solved

Easily handled sample appends to VCF datasets

Cost Efficient Ingestion

$0.01 ingestion cost per sample at 24,000 samples/day

Partner

Quest Diagnostics

Domain

Health care

Datatypes

VCF, Imaging

Challenge

Genomics data management requirements at Quest Diagnostics

  • Deliver ML-ready genomics data that aligns with Quest DUO data use categories relevant to different internal projects and teams.

  • Ingest and store variant data at a scale of up to 6 million samples a year, and render it analysis-ready for bioinformaticians.

  • Support specific bioinformatics queries, such as complex range intersection.

  • Empower expert teams to own domain data, enabling collaborative multi-omics analyses and image labeling for ML model development.

  • Overcome the N+1 problem: for example, update existing VCF datasets easily, without ballooning storage space or update times.
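
To make the range intersection requirement above concrete, here is a minimal sketch in plain Python. It is an illustration only, not Quest's or TileDB's implementation: the `Region` type, the `overlaps` and `intersect` helpers, and the coordinates are all hypothetical, and a production system would use an indexed store rather than the linear scan shown here.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A genomic interval (hypothetical type; 1-based, inclusive coordinates assumed)."""
    contig: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    # Two intervals overlap when they share a contig and neither ends before the other starts.
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def intersect(variants: List[Region], queries: List[Region]) -> List[Region]:
    # Keep every variant that overlaps at least one query region (naive linear scan).
    return [v for v in variants if any(overlaps(v, q) for q in queries)]


# Made-up coordinates, for illustration only.
variants = [
    Region("chr1", 100, 100),
    Region("chr1", 500, 510),
    Region("chr2", 100, 100),
]
queries = [Region("chr1", 90, 120), Region("chr1", 505, 600)]
hits = intersect(variants, queries)
```

With these made-up inputs, `hits` contains the two `chr1` variants; the `chr2` variant is excluded because it shares no contig with any query region.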

Solution

Early results of Quest Diagnostics & TileDB

  • Demonstrated superiority of TileDB over traditional tools for domain-specific queries such as slicing genomic regions, identifying a single gene across a cohort, and querying a single SNP or a set of 10 SNPs across more than 100,000 samples.

  • Achieved a 26% reduction in storage costs of TileDB arrays on Amazon S3 compared to compressed VCF files on S3.

  • Demonstrated cost-efficient ingestion at a rate of 24,000 samples per day, at 1 cent per sample.

  • Easily handled sample appends to VCF datasets, solving the N+1 problem.

  • Began ingestion of digital pathology images into data marts to prepare for larger-scale collaboration.
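
The ingestion figures above can be put in context with some back-of-envelope arithmetic. The sketch below uses only the numbers quoted in this case study (24,000 samples per day, 1 cent per sample, and the 6 million samples a year requirement). It assumes the per-sample cost and daily rate hold constant at scale, which the case study does not state, and the variable names are illustrative.

```python
# Figures quoted in the case study.
SAMPLES_PER_DAY = 24_000
COST_PER_SAMPLE_CENTS = 1
TARGET_SAMPLES_PER_YEAR = 6_000_000

# Assumption: cost per sample and daily throughput stay constant at scale.
daily_cost_usd = SAMPLES_PER_DAY * COST_PER_SAMPLE_CENTS / 100
days_for_target = TARGET_SAMPLES_PER_YEAR / SAMPLES_PER_DAY
annual_cost_usd = TARGET_SAMPLES_PER_YEAR * COST_PER_SAMPLE_CENTS / 100

print(f"Daily ingestion cost: ${daily_cost_usd:,.0f}")
print(f"Days to ingest the yearly target: {days_for_target:,.0f}")
print(f"Ingestion cost for the yearly target: ${annual_cost_usd:,.0f}")
```

Under these assumptions, ingestion costs about $240 a day, and a year's worth of samples (6 million) would take 250 days of ingestion at a cost of roughly $60,000.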

The data mesh vision and rollout of data products is a multi-year disruptive, and yet, extremely rigorous undertaking in Quest's data leadership journey. TileDB is uniquely positioned as a strategic player in our ecosystem. TileDB is a rare find — simply put, they offer thought and execution partnership across all aspects of multi-omics, speak the language of our end-users, and deliver a much simpler foundational data infrastructure, at the scale we wish to operate.

Ray Veeraghavan

Global Head of Bioinformatics & Software, Quest Diagnostics
