Customer Case Study
Domain
Life Sciences
Datatypes
Single-cell
scRNA-seq
Challenge
Growing from 2 million to 30 million cells
At the core of Phenomic AI's target discovery platform are machine-learning models that integrate hundreds of curated scRNA-seq datasets. With roots in developing innovative ML models for analyzing microscopy images, the company began using scRNA-seq data to support target discovery several years ago. Seeing an opportunity to apply this data at scale to oncology target discovery, Phenomic has been amassing and curating single-cell data, growing from 2 million cells to approximately 30 million cells in the last year. While this increased scale enables more robust discovery of better-targeted medicines, the added data-processing demand slowed the ability of bioinformaticians and data scientists to iteratively query and analyze the new single-cell data.
Phenomic was storing flat files in the AnnData format on Amazon S3. When datasets were in the tens of gigabytes, a dataset could be downloaded, loaded into memory, and accessed quickly. However, once the combined datasets grew beyond the memory limits of even large instances, Phenomic's bioinformatics team realized that they needed a database solution to scale complex metadata queries and to support the specific single-cell access patterns of their accelerated implementations of key tools such as differential gene expression (DGE). As a result, Phenomic began to look for a better way to store and manage their single-cell data workflows, with a focus on identifying a platform that would also enable effective data sharing and collaboration between their software and wet-lab teams.
Solution
Enter TileDB-SOMA and TileDB Cloud for single-cell analysis
The machine learning team at Phenomic AI evaluated a range of cloud data management solutions, including SQL-based tools and TileDB-SOMA, which provides Python and R implementations of the open SOMA API specification for storing and analyzing large collections of single-cell experiments directly on cloud object stores. Impressed by how quickly TileDB could access the massive amounts of scRNA-seq data they had curated, they chose TileDB Cloud as a data management platform that checked all the boxes for their current single-cell and future multi-omics requirements: