3 key takeaways from the tech talk webinar
How can biotech organizations scale their data storage and analysis for single-cell data from tens of millions of cells? Why are standardization and data aggregation key to managing widely variable data across studies and repositories? What is the best way to prepare large-scale single-cell data for machine learning applications?
Takeaway 1: Tabular databases cannot match the scale of single-cell datasets
As a biotech company developing new therapeutics for solid cancers, Phenomic AI’s approach centers on single-cell biology. “Single cell biology is the first sort of technology that’s allowed us to understand the full set of cell states that exist inside solid human tissues,” Cooper said. Since its founding in 2017, Phenomic AI has built a large atlas of tissue data based on single-cell RNA sequencing from thousands of patient samples, 1,600 mouse samples and 500 spatial samples.
When Phenomic AI scaled its dataset to nearly 100 million cells, the team knew that the sheer size of the data demanded a technology change. Traditional tabular databases, backed by flat files on Amazon S3, struggled to process single-cell data efficiently at this scale. What’s more, the combined datasets outgrew the memory of even large AWS instances, and the existing setup could not handle Phenomic AI’s complex metadata queries and single-cell access patterns.
Takeaway 2: How machine learning is driving discovery in single-cell research
Cooper unpacked Phenomic AI’s approach to driving discovery in single-cell research, which begins with using AI to power transcriptomic analysis of human tissue samples and building a massive atlas of tissue data from many different single-cell studies. By applying machine learning to integrate curated scRNA-seq data at scale, Phenomic AI is improving its discovery of novel stromal targets.
However, the growing atlas led to huge increases in data processing demand, slowing the ability of Phenomic AI’s bioinformaticians and data scientists to query and analyze the data effectively. To optimize its data infrastructure at the scale required for this massive single-cell dataset, Phenomic AI relies on TileDB’s platform and is transitioning to a dedicated data loader created by TileDB for added simplicity. “Optimizing our data infrastructure and increasing the amount of training data was the key thing in getting our models really accurate,” said Cooper. “We didn’t actually adjust the ML architectures that much.”
Takeaway 3: Why strong interoperability between data formats is key to effective research collaboration
To learn more about Phenomic AI’s scalable approach to single-cell data analysis with TileDB, watch the full webinar here.
Meet the authors
Devika Garg
Director of product marketing