CAMBRIDGE, MA - September 7, 2024 - As the field of single-cell RNA sequencing continues to evolve, researchers are increasingly interested in using these datasets to train foundation models for a wide range of applications. While training models on smaller datasets that fit into memory is relatively straightforward, scaling beyond a single machine presents significant technical challenges.
WHAT: TileDB, the database designed for scientific discovery, and the Chan Zuckerberg Initiative (CZI) will lead a workshop at the inaugural scverse Conference titled “Training Models on Atlas-Scale Single Cell Datasets.” Attendees will gain hands-on experience using Python to train models on CZI’s CELLxGENE Discover Census, a large dataset comprising 70 million cells, within laptop-sized memory, and will learn about the technologies and resources that make this possible.
WHO: This workshop will be co-led by Ryan Williams, SOMA Software Engineer at TileDB, who has extensive experience in analyzing bulk and single-cell genomics data; and Maximillan Lombardo, Senior Product Applications Scientist at the Chan Zuckerberg Initiative, who is responsible for collaborating with the CELLxGENE team to engage the single-cell community and enhance the adoption of CELLxGENE tools.
WHEN: Thursday, September 12, 2024, at 9 a.m. local time (60 minutes long).
WHERE: scverse Conference, Technical University of Munich, Munich, Germany; main conference room - MW 0350

STRUCTURE: The workshop will cover the following:
Section 1: TileDB. The open-source data format and storage engine that enables efficient indexing and retrieval of large datasets stored on remote object stores like AWS S3.
Section 2: SOMA. A language-agnostic data model and API designed specifically for storing and querying single-cell data using TileDB's format.
Section 3: CZ CELLxGENE Discover Census. The world's largest public resource providing standardized single-cell data to researchers worldwide.
Section 4: SOMA/Census PyTorch Loaders. Specialized data loaders for PyTorch, optimized for memory-efficient model training through TileDB-SOMA's support for out-of-core data access.
About TileDB
TileDB is the foundational database designed for scientific discovery. Powered by shape-shifting arrays, TileDB resolves the complexity of multimodal data so scientists and data teams can effectively glean meaningful insights from it. For more information about TileDB, visit: tiledb.com.
About the author

Devika Garg
Director of product marketing
Devika Garg leads product marketing for life sciences at TileDB. Prior to TileDB, she ran marketing engines at Pure Storage, Proteus Digital Health and Applied Materials. A scientist and a science journalist in her past life, she loves to geek out on the latest discoveries and inventions. She earned her PhD at the National University of Singapore, her MS in Science Communication at the University of California, Santa Cruz, and her B Tech at the Indian Institute of Technology Kanpur.