Cambridge, MA, March 20, 2023: TileDB, the database for any complex data and compute, today announced the launch of TileDB-SOMA, the first collection of software libraries that implement the open-source SOMA API specification. SOMA and TileDB-SOMA are the result of a collaboration between the Chan Zuckerberg Initiative and TileDB to accelerate single-cell research by eliminating data silos and enabling large-scale computations that are otherwise too challenging to execute on commodity hardware.
New technologies and analysis tools have led to the exponential growth of single-cell RNA sequencing (scRNA-seq) data, requiring new solutions that can accommodate datasets at scale. Advancements in genomics technologies have also enabled researchers to combine multiple modalities of data collected from the same cell samples, increasing the complexity and impact of single-cell analysis.
"The unsaid assumption in single-cell research is that dataset size is bound by RAM, but instead of asking researchers to change their computational tools, we’re rethinking how the data model itself could do more heavy lifting for scientists," said Stavros Papadopoulos, Founder & CEO, TileDB, Inc. "With TileDB-SOMA for R and Python, computational biologists can work across programming languages and combine data that was previously formatted specifically for Seurat, Anndata/Scanpy or Bioconductor. This breaks down data silos, and allows scientists to collaborate without the hassle of converting or duplicating data. Everyone can access the dataset, stored locally or in the cloud, at any scale."
SOMA makes cloud-based, single-cell data readily available for analysis and rapid experimentation. SOMA is a flexible, open-source API spec designed to enable access to any dataset that can be modeled as groups of annotated sparse 2D matrices. Storage engines that implement the SOMA spec allow scientists to expand their research across a growing body of scRNA-seq data using existing computational tools. Initially developed to help researchers query large single-cell biology datasets directly in cloud storage without loading unneeded data into RAM, SOMA’s design requirements can be applied to a wide range of scientific data.
The first two SOMA API implementations, TileDB-SOMA for Python (version 1.0) and TileDB-SOMA for R (pre-release), are based on the open-source, cloud-optimized TileDB storage engine. They allow single-cell researchers using different tools (AnnData/Scanpy and Seurat, with Bioconductor coming soon) to access large cloud-based datasets quickly and conveniently from different programming languages.
"By streamlining access to enormous datasets, powerful new tools like TileDB-SOMA will accelerate the research efforts of single-cell biologists," said Ambrose Carr, a computational biologist and Director of Product Management for Single-Cell Biology at the Chan Zuckerberg Initiative. "Our engineering team collaborated closely with TileDB to build TileDB-SOMA as a readily accessible, cloud-based storage engine for single-cell datasets, so scientists have the ability to execute complex queries faster and more efficiently. Our team is excited for the launch of this new tool, which will solve some of the fundamental data accessibility challenges facing the single-cell community."
Learn more at github.com/single-cell-data/TileDB-SOMA.
About TileDB
TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 as a database built on an open array engine to structure complex data for optimized cloud compute and analytics. The company's flagship product, TileDB Cloud, streamlines data management and provides extreme performance at any degree of dimensionality, and at any scale. TileDB also develops a wide range of open-source tools for interoperability across the data science and scientific computing ecosystems. TileDB is backed by Two Bear Capital, Nexus Venture Partners, Uncorrelated Ventures, Intel Capital and Big Pi. Start for free by signing up at cloud.tiledb.com.