Customer Case Study
Domain: Biotech
Datatypes: Single-cell, RNA-seq
Cellarity's cell-centric approach to drug discovery
The traditional approach in drug discovery has been to reduce disease biology to a single molecular target, then use high-throughput screening to identify molecules that bind to that target. In contrast, Cellarity focuses on the whole cell, because disease often isn't driven by one mechanism or protein alone.
“We are a drug discovery organization using AI to find drugs that were never possible before,” says Parul Bordia Doshi, Chief Data Officer, Cellarity. “We have a unique way of approaching things. We don't start with a target and simplify biology like most others do. We instead embrace the complexity and do drug discovery from the point of view of the whole cell.”
Using single-cell technologies, scientists at Cellarity identify cellular drivers of the transition from health to disease, then apply deep learning models to create drugs that reverse disease at the cellular level.
Need for a better data management solution
The Cellarity data science and visualization team needed to analyze transcriptomic data from hundreds of millions of single cells to support deep learning models that inform drug development. They realized their file-based storage approach was limiting in both scale and functionality, which led to inefficient data wrangling across their teams of engineers and scientists.
Handling hundreds of millions of cells spread across datasets with a file-based approach posed many challenges.
Why Cellarity chose TileDB
The Cellarity computational biologists and machine learning team learned about TileDB through TileDB's collaboration with the Chan Zuckerberg Initiative on SOMA and the TileDB-SOMA libraries. SOMA is a flexible, open-source API specification designed to enable access to any dataset that can be modeled as groups of annotated sparse 2D matrices, which makes it a good fit for single-cell data.
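To make the phrase "groups of annotated sparse 2D matrices" concrete, here is a minimal conceptual sketch in plain Python. It is not the SOMA or TileDB-SOMA API: the class AnnotatedSparseMatrix, the function query_collection, and all cell, gene, and tissue values are hypothetical and made up for illustration. It shows one cells-by-genes matrix that stores only nonzero values, carries per-cell and per-gene annotations, and can be filtered the same way across several experiments.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSparseMatrix:
    """Hypothetical toy model of one annotated sparse 2D matrix (cells x genes)."""
    obs: dict  # cell id -> annotations for that cell (row metadata)
    var: dict  # gene id -> annotations for that gene (column metadata)
    X: dict = field(default_factory=dict)  # (cell id, gene id) -> nonzero value only

    def select(self, obs_filter, var_ids=None):
        """Return the nonzero entries whose cell passes obs_filter and whose gene is in var_ids."""
        cells = {cell for cell, ann in self.obs.items() if obs_filter(ann)}
        genes = set(self.var) if var_ids is None else set(var_ids)
        return {
            (cell, gene): value
            for (cell, gene), value in self.X.items()
            if cell in cells and gene in genes
        }


def query_collection(collection, obs_filter, var_ids=None):
    """Apply the same selection to every dataset in a named collection."""
    return {
        name: matrix.select(obs_filter, var_ids)
        for name, matrix in collection.items()
    }


# Two made-up experiments; every identifier and count below is illustrative.
exp_a = AnnotatedSparseMatrix(
    obs={"cell1": {"tissue": "blood"}, "cell2": {"tissue": "liver"}},
    var={"GENE_A": {"biotype": "protein_coding"}, "GENE_B": {"biotype": "protein_coding"}},
    X={("cell1", "GENE_A"): 3.0, ("cell2", "GENE_B"): 1.0},
)
exp_b = AnnotatedSparseMatrix(
    obs={"cell3": {"tissue": "blood"}},
    var={"GENE_A": {"biotype": "protein_coding"}},
    X={("cell3", "GENE_A"): 5.0},
)
collection = {"exp_a": exp_a, "exp_b": exp_b}

# One query, expressed on the annotations, runs across both experiments.
blood = query_collection(
    collection,
    obs_filter=lambda ann: ann["tissue"] == "blood",
    var_ids=["GENE_A"],
)
print(blood)
```

The point of the sketch is the shape of the data, not the storage engine: because each dataset exposes the same annotated structure, a filter written once against the annotations can be applied to every experiment in the collection.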
The team converted their AnnData single-cell datasets into a collection of SOMAs and adopted a database solution that enabled highly performant queries across datasets and experiments, as well as collaboration on data and code. TileDB Cloud fulfilled all their requirements for analysis-ready single-cell data on cloud storage.
Reducing the data engineering burden for a promising future
“We believe that TileDB is a FAIR platform. We now have a catalog, a single source of truth, and we can always go back to it and update it at scale in parallel,” says James Gatter, software engineer at Cellarity. “In addition, TileDB Cloud's compute power, and the ability to slice through TileDB arrays is really great. We've gained the capacity to run queries and compute across our catalog.”
With single-cell data amassed from millions of cells, the Cellarity team could not easily work across individual experiments using their old architecture. With TileDB, their ability to aggregate those datasets has improved drastically. It is also easier to build atlases and retrain models. “Before it would take us days to build an atlas. Now it's just a very programmatic solution for doing that in parallel and that takes about an hour,” says Gatter.
“In addition, before TileDB, reading across catalogs and trying to update them to conform to the newest standards of data would have taken the team a significant amount of time. With TileDB we are able to make those changes within an hour,” says Gatter.
“TileDB solved a very unique problem for us,” says Doshi. “TileDB has improved our computational performance and organizational efficiency. By reducing the data engineering burden, our ML and computational scientists can focus on the science.”