Customer Case Study
Domain: Health care
Datatypes: VCF, Imaging
Challenge
To enable the company's ML goals, the R&D team had to establish a distributed governance model in the form of a data mesh. The data mesh had to support hundreds of thousands of omics data files spanning whole-genome sequencing (WGS), whole-exome sequencing (WES) and clinical exome sequencing (CES), plus millions of digital pathology images (high-resolution scans of prepared microscope slides).
Managing these datasets as collections of files was already limiting the scale and scope of internal analysis projects. The Quest R&D team evaluated traditional relational databases and modern cloud data warehouses, but both fell short of the performance demands of bioinformatics and required excessive data wrangling. Finally, before collaborative ML projects could begin, the data mesh needed a robust built-in consent-tracking system based on the Data Use Ontology (DUO), the genomics data-sharing standard from the Global Alliance for Genomics and Health (GA4GH).
Genomics data management requirements at Quest Diagnostics
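As a rough illustration of what DUO-based consent tracking involves, the sketch below tags each dataset with a DUO term and filters a catalog by the requested research purpose. The term IDs come from the public DUO vocabulary. Everything else (the `Dataset` class, the `PERMITS` rule table, the purpose names, and the sample URIs) is hypothetical and heavily simplified, and it does not represent how Quest or TileDB actually implemented consent tracking.

```python
from dataclasses import dataclass

# DUO permission terms (IDs from the public GA4GH Data Use Ontology).
NO_RESTRICTION = "DUO:0000004"    # no restriction
GENERAL_RESEARCH = "DUO:0000042"  # general research use
HEALTH_MEDICAL = "DUO:0000006"    # health or medical or biomedical research

# Hypothetical, simplified rule table: the research purposes each consent
# term is assumed to allow. Real DUO matching also has to handle modifiers
# and disease-specific terms.
PERMITS = {
    NO_RESTRICTION: {"general_research", "health_research"},
    GENERAL_RESEARCH: {"general_research", "health_research"},
    HEALTH_MEDICAL: {"health_research"},
}


@dataclass(frozen=True)
class Dataset:
    uri: str      # hypothetical catalog location
    consent: str  # DUO permission term attached to the dataset


def permitted(datasets, purpose):
    """Return the datasets whose consent term allows the given purpose.

    Terms missing from PERMITS are denied by default.
    """
    return [d for d in datasets if purpose in PERMITS.get(d.consent, set())]


catalog = [
    Dataset("mesh://wes/cohort-a", GENERAL_RESEARCH),
    Dataset("mesh://wes/cohort-b", HEALTH_MEDICAL),
    # Disease-specific research: not in the rule table, so it is excluded.
    Dataset("mesh://pathology/slides-c", "DUO:0000007"),
]

health_ok = [d.uri for d in permitted(catalog, "health_research")]
general_ok = [d.uri for d in permitted(catalog, "general_research")]
```

Denying unknown terms by default is a deliberate choice in this sketch: a dataset whose consent cannot be interpreted is never shared by accident.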
Solution
The TileDB and Quest teams started working together to analyze a dataset of 114,000 whole-exome VCF files. Within two months, they showed that TileDB met the requirements for ingesting and querying variant data. They also showed that TileDB Cloud's data cataloging and access control features could be extended to support the Quest consent ontology standards necessary for their data mesh implementation.
Early results of Quest Diagnostics & TileDB