Jul 30, 2024

TileDB Newsletter - July 2024

Devika Garg

Director, Life Sciences Product Marketing

To discover breakthrough therapies, research teams need to dive deep into the complexities of biology.

Multi-modal omics data can show you the path to discovery – if you can make sense of this data’s unprecedented complexity and volume.

TileDB is designed for discovery: a purpose-built platform crafted to tackle these challenges. By making it easier to manage multi-modal omics data, TileDB simplifies collaboration and analysis to unlock new therapeutic possibilities. Whether you're working with single-cell data, population genomics or biomedical imaging, TileDB scales your life sciences workflows so you can find breakthroughs faster.

We want to be your partner in discovery. In this newsletter, we’ll share updates on new releases, features, new public datasets and learning resources available on TileDB Cloud.

The TileDB team

Single Cell Genomics

Efficiently query across multiple single cell experiments on TileDB Cloud

SOMA is an open data model for single-cell analysis, and TileDB-SOMA is our open-source API built on the TileDB array engine. It supports Python and R for easy integration with single-cell toolkits, allowing you to filter and select data without loading entire datasets into memory.

With the new SOMA Experiment Collection Mapper UDF, you can now query across multiple SOMA experiments simultaneously on TileDB Cloud’s serverless platform. Provide a list of experiment URIs and the query, and the UDF will execute it in parallel, returning AnnData objects for analysis. Check out the demo notebook for more details.
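
To illustrate the pattern of running one query over a list of experiments in parallel, here is a minimal standard-library sketch. It is not the TileDB Cloud UDF or the TileDB-SOMA API: the in-memory `EXPERIMENTS` dictionary, the `soma://demo/...` URIs, and the `query_experiment` and `map_query` helpers are all hypothetical stand-ins, and the results are plain Python lists rather than AnnData objects.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for stored experiments: URI -> list of cell records.
# Real SOMA experiments live in TileDB arrays, not in a Python dict.
EXPERIMENTS = {
    "soma://demo/exp_a": [
        {"cell_id": "a1", "cell_type": "T cell", "n_genes": 1200},
        {"cell_id": "a2", "cell_type": "B cell", "n_genes": 800},
    ],
    "soma://demo/exp_b": [
        {"cell_id": "b1", "cell_type": "T cell", "n_genes": 950},
        {"cell_id": "b2", "cell_type": "T cell", "n_genes": 400},
    ],
}


def query_experiment(uri, predicate):
    """Return only the cells of one experiment that satisfy the predicate."""
    return [cell for cell in EXPERIMENTS[uri] if predicate(cell)]


def map_query(uris, predicate, max_workers=4):
    """Run the same query against every URI in parallel; results keyed by URI."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda uri: query_experiment(uri, predicate), uris)
        return dict(zip(uris, results))


def is_large_t_cell(cell):
    """Example query: T cells with at least 900 detected genes."""
    return cell["cell_type"] == "T cell" and cell["n_genes"] >= 900


hits = map_query(list(EXPERIMENTS), is_large_t_cell)
```

Each experiment is filtered independently, so the per-experiment work can be spread across workers and only the matching cells are returned to the caller.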

Benchmarks of single-cell Census models

The Census is a collection of SOMA objects containing most of the non-spatial RNA data from CZ CELLxGENE. Large models trained on this data are known as Census models. In 2023, the Census team released new models to improve single-cell data access. These models were benchmarked on how well they capture biological information and correct data inconsistencies. The results are promising, but further testing is needed for specific tasks. For more details, visit the full article here.

Population Genomics

Why precision medicine demands a completely new type of database

Precision medicine is revolutionizing healthcare, requiring advanced databases to handle and analyze vast, complex data quickly. Traditional databases fall short, struggling with huge data volumes and the need for nimbleness and real-time processing. Effective precision medicine databases must integrate diverse data types, support rapid analysis, and facilitate secure, seamless sharing among healthcare stakeholders. TileDB addresses these challenges, enabling comprehensive insights and better patient outcomes. For more details, read the full article here.

Two new methods in the VCF Cloud Toolbox

The VCF Toolbox, a Python library in TileDB Cloud, now includes two new methods to enhance and speed up variant analyses.

  1. Distributed Transform Query: This method transforms VCF query results in a distributed fashion, allowing for filtering, creating new columns, modifying column names and order, and generating new arrays like summary statistics or ML model inputs without assembling the entire dataframe.
  2. Annotation Query: This method quickly joins allele-based annotations with existing VCF datasets. It supports custom ordering and subsetting, multiallelic splits, and zygosity in the result, with no extra code needed.
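
The ideas behind both methods can be sketched in plain Python. This is only an illustration under stated assumptions, not the VCF Toolbox API: the `PARTITIONS` and `ANNOTATIONS` data and the `zygosity`, `transform`, and `summarize` helpers are hypothetical, and the zygosity rule is simplified to treat every non-reference allele as the same alternate allele. Each partition is filtered, given new columns, and reduced to a small summary, so the full result set is never assembled in one place.

```python
from collections import Counter

# Hypothetical variant calls, already split into two partitions.
PARTITIONS = [
    [
        {"sample": "s1", "chrom": "chr1", "pos": 100, "ref": "A", "alt": "G", "gt": (0, 1)},
        {"sample": "s2", "chrom": "chr1", "pos": 100, "ref": "A", "alt": "G", "gt": (1, 1)},
    ],
    [
        {"sample": "s1", "chrom": "chr2", "pos": 500, "ref": "C", "alt": "T", "gt": (0, 0)},
        {"sample": "s2", "chrom": "chr2", "pos": 500, "ref": "C", "alt": "T", "gt": (0, 1)},
    ],
]

# Hypothetical allele-based annotations keyed by (chrom, pos, ref, alt).
ANNOTATIONS = {
    ("chr1", 100, "A", "G"): "missense",
    ("chr2", 500, "C", "T"): "synonymous",
}


def zygosity(gt):
    """Simplified zygosity: any allele index > 0 counts as non-reference."""
    alt_count = sum(1 for allele in gt if allele > 0)
    if alt_count == 0:
        return "HOM_REF"
    if alt_count == len(gt):
        return "HOM_ALT"
    return "HET"


def transform(partition):
    """Per-partition step: drop HOM_REF calls, add zygosity and effect columns."""
    out = []
    for rec in partition:
        zyg = zygosity(rec["gt"])
        if zyg == "HOM_REF":
            continue
        key = (rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
        out.append({**rec, "zygosity": zyg, "effect": ANNOTATIONS.get(key)})
    return out


def summarize(partition):
    """Reduce one transformed partition to a count of calls per effect."""
    return Counter(rec["effect"] for rec in transform(partition))


# Combine the small per-partition summaries instead of the full records.
totals = sum((summarize(p) for p in PARTITIONS), Counter())
```

In a distributed setting, `summarize` would run on separate workers, one per partition, and only the small counters would be sent back and merged.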

Go to this notebook for an example.

Meet us on the road

The TileDB team will be at the following events and conferences:
