Customer Case Study

Scale and speed target discovery by applying ML to large-scale single cell data on TileDB

Phenomic AI is a pioneering biotech company with a mission to improve patient outcomes by delivering new medicines against the tumor stroma, a barrier that surrounds cancer and stops today's medicines from working. The company's platform, scTx®, uses advanced machine learning tools to integrate curated scRNA data at scale and enable discovery of novel stromal targets. Armed with one of the world’s largest single-cell RNA (scRNA) datasets and unique deep-learning tools, Phenomic is positioned to identify game-changing new targets and transform cancer care for patients with challenging stromal-rich tumors occurring in pancreatic, colorectal, lung, ovarian, and breast cancers.

Security

Access controls aligned with datasets, not files

Storage & Compute

Fast data access in a single cloud solution

TileDB APIs

Efficient out-of-core computation

Partner

Phenomic AI

Domain

Life Sciences

Datatypes

Single-cell, scRNA-seq

Challenge

Growing from 2 million to 30 million cells

At the core of Phenomic AI's target discovery platform are machine-learning models that integrate hundreds of curated scRNA datasets. With roots in developing innovative ML models for analyzing microscopy images, the company began using scRNA data to support target discovery several years ago. Seeing an opportunity to apply scRNA data at scale to oncology target discovery, Phenomic has been amassing and curating single-cell data, growing from 2 million cells to approximately 30 million cells in the last year. While this increased scale enables more robust discovery of better-targeted medicines, the added data processing demand slowed the ability of bioinformaticians and data scientists to iteratively query and analyze the new single-cell data.

Phenomic was storing flat files in the AnnData format on Amazon S3. When datasets were in the tens of gigabytes, a dataset could be downloaded, loaded into memory and accessed quickly. However, when the combined datasets grew beyond the memory limits of even large instances, Phenomic's bioinformatics team realized that they needed a database solution to scale complex metadata queries and to support the specific single-cell access patterns of their accelerated implementations of key tools such as differential gene expression (DGE). As a result, Phenomic began looking for a better way to store and manage their single-cell data workflows, with a focus on identifying a platform that would also enable effective data sharing and collaboration between their software and wet-lab teams.
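To make the memory constraint concrete, here is a back-of-the-envelope sketch in Python. It estimates the in-memory size of an expression matrix stored densely and in compressed sparse row (CSR) form. Only the cell counts come from the case study; the gene count, the number of non-zero genes per cell and the byte widths are illustrative assumptions, not figures reported by Phenomic.

```python
def dense_bytes(n_cells, n_genes, bytes_per_value=4):
    """Size of a dense cells x genes matrix (float32 values by default)."""
    return n_cells * n_genes * bytes_per_value


def sparse_csr_bytes(n_cells, nonzero_per_cell, bytes_per_value=4, bytes_per_index=8):
    """Size of a CSR matrix: values + column indices + row pointers."""
    nnz = n_cells * nonzero_per_cell
    return nnz * (bytes_per_value + bytes_per_index) + (n_cells + 1) * bytes_per_index


# Assumed values for illustration only (not from the case study).
N_GENES = 20_000           # hypothetical number of genes
NONZERO_PER_CELL = 1_000   # hypothetical non-zero genes detected per cell

# Cell counts quoted in the case study: 2 million, then roughly 30 million.
for n_cells in (2_000_000, 30_000_000):
    dense_gb = dense_bytes(n_cells, N_GENES) / 1e9
    sparse_gb = sparse_csr_bytes(n_cells, NONZERO_PER_CELL) / 1e9
    print(f"{n_cells:,} cells: dense ~{dense_gb:,.0f} GB, CSR ~{sparse_gb:,.0f} GB")
```

Under these assumed values the footprint grows linearly with the number of cells, which is why a workflow that loads everything into memory stops scaling well before a query-based one does.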

Solution

Enter TileDB-SOMA and TileDB Cloud for single-cell analysis

The machine learning team at Phenomic AI evaluated a range of cloud data management solutions, including SQL-based tools and TileDB-SOMA, which provides Python and R implementations of the open SOMA API specification for storing and analyzing large collections of single-cell experiments directly on cloud object stores. Impressed by how quickly TileDB could access the massive amounts of scRNA data they had curated, they chose TileDB Cloud as a data management platform that met all of their current single-cell and future multi-omics requirements:

A unified system with cataloging capabilities for all single-cell datasets, application metadata and experimental metadata.
A single platform for multi-omics to support future plans spanning proteomics and spatial transcriptomic analysis.
Easy, fast extraction, filtering and downsampling of subsets of large datasets for analyses such as differential gene expression.
A collaborative environment to manage all data, metadata and custom algorithms in a single contextual spot for a growing research team.
Serverless cloud architecture, allowing experts to focus on scientific analysis and ML, not on data engineering and pipelines.
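The extract, filter and downsample pattern described in the list above can be sketched in Python with TileDB-SOMA. This is a minimal illustration, not Phenomic's actual pipeline: the metadata columns `tissue` and `cell_type`, the measurement name `RNA`, the X layer name `data` and the experiment URI are all assumptions that depend on how a given SOMA experiment was ingested.

```python
import random


def build_value_filter(tissue, cell_types):
    """Build a SOMA value_filter string over obs metadata.

    The column names `tissue` and `cell_type` are assumed for illustration.
    """
    quoted = ", ".join(f"'{c}'" for c in cell_types)
    return f"tissue == '{tissue}' and cell_type in [{quoted}]"


def downsample_ids(cell_ids, max_cells, seed=0):
    """Return at most max_cells ids, sampled reproducibly and sorted."""
    ids = list(cell_ids)
    if len(ids) <= max_cells:
        return sorted(ids)
    return sorted(random.Random(seed).sample(ids, max_cells))


def load_subset(experiment_uri, value_filter, max_cells,
                measurement_name="RNA", x_layer="data"):
    """Read a filtered, downsampled slice of a SOMA experiment as AnnData."""
    import tiledbsoma  # imported lazily; the helpers above need only the stdlib

    with tiledbsoma.Experiment.open(experiment_uri) as exp:
        # Step 1: query cell metadata only, fetching just the matching ids.
        obs = exp.obs.read(
            value_filter=value_filter, column_names=["soma_joinid"]
        ).concat()
        ids = downsample_ids(obs["soma_joinid"].to_pylist(), max_cells)

        # Step 2: read expression values for the sampled cells only.
        with exp.axis_query(
            measurement_name=measurement_name,
            obs_query=tiledbsoma.AxisQuery(coords=(ids,)),
        ) as query:
            return query.to_anndata(X_name=x_layer)
```

A call might look like `load_subset("s3://my-bucket/my-experiment", build_value_filter("pancreas", ["fibroblast", "T cell"]), max_cells=50_000)`, where the URI is hypothetical. Filtering on metadata first means only the selected cells are read from object storage, so the subset fits in memory even when the full collection does not.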

TileDB was the best database and platform out there for our cloud workflows and unique domain of single-cell research. Of course, TileDB delivered the analysis speed, scale, and usability throughout our evaluations. What sets TileDB apart is their single-cell biology team — they have walked in our shoes and are innovators in this field.

Sam Cooper
