Customer Case Study

How Cellarity powers next-generation drug discovery using single-cell data

Founded in 2019 in Somerville, MA, Cellarity is a therapeutics company that emerged from Flagship Pioneering. Cellarity operates at the intersection of biology, chemistry, high-dimensional data and machine learning. Their experts leverage cutting-edge biological and computational technologies to explore cellular and molecular components for drug discovery.

Catalog

FAIR platform for single-cell data to drive efficiency

Performance

< 1 hour vs days to build a single-cell atlas

Storage & Compute

< 1 hour to update single-cell catalogs to newest standards of data

Partner

Cellarity

Domain

Biotech

Datatypes

Single-cell, RNA-seq

Cellarity's cell-centric approach to drug discovery

The traditional approach in drug discovery has been to reduce disease biology into a single molecular target, then leverage high-throughput screening to identify molecules that bind to such targets. In contrast, Cellarity focuses on the whole cell, as disease often isn't driven by one mechanism or protein alone.

“We are a drug discovery organization using AI to find drugs that were never possible before,” says Parul Bordia Doshi, Chief Data Officer, Cellarity. “We have a unique way of approaching things. We don't start with a target and simplify biology like most others do. We instead embrace the complexity and do drug discovery from the point of view of the whole cell.”

Using single-cell technologies, scientists at Cellarity identify cellular drivers of the transition from health to disease, then apply deep learning models to create drugs that reverse disease at the cellular level.

Need for a better data management solution

The Cellarity data science and visualization team needed to analyze transcriptomic data from hundreds of millions of single cells to support deep learning models that inform drug development. They realized their file-based storage approach was limited in both scale and functionality, which resulted in inefficient data wrangling across their teams of engineers and scientists. Handling hundreds of millions of cells spread across datasets with a file-based approach posed many challenges:

Lack of a single source of truth: Single-cell data was not cataloged in a way that made it easy to find and access, and it did not comply with FAIR data principles.
Sluggish access to data: Single-cell data files had to be downloaded and loaded into main memory by each scientist, which was an inefficient process.
Slowed collaboration: Tedious downloads and uploads, as well as siloed analytics in Excel sheets, meant there were no standard processes for concurrent access and versioning, which prevented scientists from collaborating on the same dataset.
Inability to drive traceability and reproducibility: Lack of ability to analyze data sets, conditions and changes over time made it difficult to make informed decisions and avoid mistakes.
Inability to handle new data modalities: The file-based approach made it hard for teams to efficiently query across multiple datasets, preventing them from incorporating modalities like spatial transcriptomics into their research.

Why Cellarity chose TileDB

The Cellarity computational biologists and machine learning team learned about TileDB via the collaboration with the Chan Zuckerberg Initiative on SOMA and the TileDB-SOMA libraries. SOMA is a flexible, open-source API spec designed to enable access to any dataset that can be modeled as groups of annotated sparse 2D matrices, which is ideal for single-cell data.
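To make the data model concrete, here is a minimal sketch in plain Python of what "groups of annotated sparse 2D matrices" can look like. It is a toy illustration only and not the SOMA or TileDB-SOMA API: the class AnnotatedSparseMatrix, the function query_collection, and all cell, gene, and experiment names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSparseMatrix:
    """Toy stand-in for one annotated sparse cells-by-genes matrix (hypothetical)."""
    obs: dict  # cell id -> annotations for that cell (row labels)
    var: dict  # gene id -> annotations for that gene (column labels)
    x: dict = field(default_factory=dict)  # (cell id, gene id) -> nonzero value only

    def query(self, obs_key, obs_value):
        """Return the nonzero entries of cells whose annotation matches."""
        cells = {c for c, ann in self.obs.items() if ann.get(obs_key) == obs_value}
        return {k: v for k, v in self.x.items() if k[0] in cells}


def query_collection(collection, obs_key, obs_value):
    """Run the same annotation filter across every matrix in a named group."""
    return {name: m.query(obs_key, obs_value) for name, m in collection.items()}


# Two small made-up experiments that share the same annotation vocabulary.
exp_a = AnnotatedSparseMatrix(
    obs={"cell1": {"cell_type": "T cell"}, "cell2": {"cell_type": "B cell"}},
    var={"GENE_A": {}, "GENE_B": {}},
    x={("cell1", "GENE_A"): 3.0, ("cell2", "GENE_B"): 1.0},
)
exp_b = AnnotatedSparseMatrix(
    obs={"cell3": {"cell_type": "T cell"}},
    var={"GENE_A": {}, "GENE_B": {}},
    x={("cell3", "GENE_B"): 5.0},
)

result = query_collection({"exp_a": exp_a, "exp_b": exp_b}, "cell_type", "T cell")
print(result)
```

Storing only the nonzero entries is what makes the representation sparse, and keeping the annotations next to the matrix is what lets one filter run across many experiments at once.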

The team converted their AnnData single-cell datasets into a collection of SOMAs and adopted a database solution that enabled highly performant queries across datasets and experiments, as well as collaboration on data and code. TileDB Cloud fulfilled all their requirements for analysis-ready single-cell data on cloud storage:

Data flexibility: Ability to model AnnData and transition it into TileDB-SOMA objects.
Versatile support across all requirements: Only TileDB put forth experts across data management, single-cell and devops infrastructure.
Performance: Scales to single-cell datasets of unprecedented size, all stored on object stores, with distributed analysis.
Open and future-ready architecture: Easy integration with existing solutions and applications such as Saturn Cloud.

Reducing the data engineering burden for a promising future

“We believe that TileDB is a FAIR platform. We now have a catalog, a single source of truth, and we can always go back to it and update it at scale in parallel,” says James Gatter, software engineer at Cellarity. “In addition, TileDB Cloud's compute power, and the ability to slice through TileDB arrays is really great. We've gained the capacity to run queries and compute across our catalog.”

With single-cell data amassed from millions of cells, the Cellarity team could not easily interoperate across individual experiments using their old architecture. With TileDB, their ability to aggregate those datasets has drastically improved. It is also easier to build atlases and retrain models. “Before it would take us days to build an atlas. Now it's just a very programmatic solution for doing that in parallel and that takes about an hour,” says Gatter.

“In addition, before TileDB, reading across catalogs and trying to update them to conform to the newest standards of data would have taken the team a significant amount of time. With TileDB we are able to make those changes within an hour,” says Gatter.

“TileDB solved a very unique problem for us,” says Doshi. “TileDB has improved our computational performance and organizational efficiency. By reducing the data engineering burden, our ML and computational scientists can focus on the science.”

FAIR data catalog: for single-cell data to drive efficiency
< 1 hour: to build a single-cell atlas, which previously took days
< 1 hour: to update single-cell catalogs to conform to the newest standards of data
