Even the most aggressive cancer begins with a single malignant cell. This in mind, the better we can study and understand individual cell data, the better we can unlock breakthrough therapies to treat cancer. The large scale and high resolution data generated by single-cell sequencing have become critical to the discovery and development pipeline for cancer treatments.
But to unlock the potential of single-cell data for target discovery, you need technology that can manage and analyze this frontier data at scale. This is challenging for conventional tabular databases, which struggle to process the higher computing demands and complexity of single-cell data. To address this critical need, TileDB created Carrara, a powerful and flexible database solution architected around multi-dimensional arrays, as well as TileDB-SOMA, a special purpose database system for mastering single-cell data. Let’s walk through how TileDB empowers single-cell researchers to overcome their unique data challenges.
Like most areas of systems biology research, single-cell research lacks a universal standard for storing multimodal data. This leads many data toolkits to use their own format for single-cell data, making it difficult to share and aggregate data across teams or organizations. Adding to the complexity, these toolkit-specific formats typically require loading the entire dataset into memory, which is increasingly infeasible as datasets grow in size. Finally, these varying formats are not optimized for cloud object stores, which have become the preferred and most economical storage option for large-scale data. In short, the immense potential of single-cell data to drive oncology breakthroughs is often unrealized because of data technology shortfalls.
This is why TileDB partnered with the Chan Zuckerberg Initiative (CZI) to develop a scalable, efficient and user-friendly storage solution for single-cell genomics data. The collaboration aims to address the challenges posed by the rapidly growing volume and complexity of single-cell data, enabling researchers to focus more on scientific discovery and less on data management. The outcome of this collaboration is two projects:
Built for TileDB Carrara, the TileDB SOMA implementation is optimized for cloud object stores, interoperable with popular tools and languages and highly scalable for atlas-scale data. Here’s how TileDB’s solutions address the existing issues and challenges when working with single-cell genomics data:
These capabilities led therapeutics company Cellarity to choose TileDB as their FAIR platform to empower their cell-centric approach to drug discovery. To support their deep learning models, Cellarity’s data science and visualization team needed to analyze transcriptomic data from hundreds of millions of single cells. However, their file-based storage approach failed to deliver the scale and functionality they needed, leading to tedious data wrangling across teams of engineers and scientists.
To learn how Cellarity unlocked the potential of their single cell data with TileDB and became able to build a single-cell atlas in less than an hour, read the full case study.