A conversation with Chad Krilow: Building a more flexible data management ecosystem

Table of Contents:

Jumping the hurdles of processing massive multimodal datasets

The flexible approach behind TileDB’s technology

Why TileDB Carrara is an ecosystem for the future of life sciences data management

From single-cell data to population genomics, multimodal datasets in life sciences are both inherently complex and growing larger. And as new data modalities emerge, legacy data management solutions that rely on tabular databases struggle to scale. Chad Krilow ran into these data bottlenecks throughout his career as a bioinformatics engineer, and those challenges inspired him to help create a better way. As Director of Solutions Architecture at TileDB, Chad leads a team building a more flexible data management ecosystem to meet the challenges and potential of multimodal data. We interviewed Chad to learn more about his journey and what drives his work.

Jumping the hurdles of processing massive multimodal datasets

What were some of your proudest career accomplishments prior to joining TileDB?

Krilow: During my time at Quest Diagnostics, I was part of a team that helped develop a groundbreaking cancer panel. This was a forward-thinking human genomics project, and I saw it really help people while I was at Quest. Then, after moving to the NIH here in the DC area, I worked in Dr. Francis Collins’s lab at the NHGRI, where we did research in the single-cell space as well as in rare and genetic disease discovery. That work led to new genetic therapies and many impactful papers.

At NIH, I was facing challenges with data processing, including open questions about how to handle these massive amounts of data both intramurally and extramurally. I was drawn to TileDB’s forward-thinking approach to data management, specifically how it uses multi-dimensional arrays. I saw the potential of this technology to address the kinds of bottlenecks I had faced. And now, as a solutions architect and bioinformatics engineer at TileDB, I feel like I have the opportunity to help other life science firms get past these hurdles and accelerate their research.

You joined TileDB in January of 2022 as a Genomics Solutions Architect, and now you’re the Director of Solutions Architecture. Can you tell me more about that journey?

Krilow: In early 2022, we were a smaller company, and I was one of two solutions architects. This led to a lot of tight collaboration with my colleagues in sales and engineering, helping me learn and really get a sense of what people are doing with these solutions, and ask how we can improve what we have to make life easier for these folks.

Moving into the director role, I now get more of a chance to mentor and help other folks who are coming in as solutions architects like I was. It’s great to see them grow and make an impact, not only within TileDB, but also in the companies that we work with and among the people we interact with at conferences and trade shows.

The flexible approach behind TileDB’s technology

You’ve also spent a lot of time working directly with TileDB partners and customers. How do you help each customer get the most out of TileDB’s solutions?

Krilow: For each of our customers, we try to pinpoint and identify “What is the business value that we're trying to drive here? What’s their why?” If they need their data analyzed more quickly, why is that? If they need to figure out earlier on in a diagnostic process where a gene is coming from, why is that? For example, it might be because they want to intervene with a certain medicine.

Once we've identified these business needs, we can find the best, quickest way for our technology to meet them so they get the best return on investment. That's what I try to do with every customer: identify where we can make an impact and make things more efficient, then come up with a plan to execute. This is how we work together to identify drug targets earlier in the process, expedite rare genetic disease research, or find tumors more efficiently. With our team and theirs working together, we find a robust and cost-effective solution for them, with TileDB as the ecosystem that helps drive it all.

This flexibility in approach ties into one of TileDB’s big advantages. With tabular databases, you’re dealing with a rigid, row-based structure that struggles with the high dimensionality and sparsity of multimodal data like genomics, VCF, single-cell RNA sequencing, and spatial and biomedical imaging. This often leads to data being shoved into formats that are inefficient for both storage and analytics. That has created large data-wrangling issues that I’ve suffered through in my career as a bioinformatician, and it really hinders the pace of discovery. That’s why TileDB’s multi-dimensional database provides a flexible data model in which users define dimensions and attributes for specific data types, instead of forcing data into predefined schemas as a tabular database does.

Let’s talk more about that. What makes TileDB’s multi-dimensional arrays a better fit to drive discovery with the multimodal data types you mentioned?

Krilow: Multi-dimensional arrays are well suited for multimodal data such as genomics, single-cell, transcriptomics, and spatial data because these arrays naturally represent the inherent structure of these data types. Take Variant Call Format (VCF) files, for example. Their contents can be represented as sparse three-dimensional arrays, and similarly, single-cell data can be represented as large sparse matrices. So when you store that kind of data in a multi-dimensional array, it’s almost like a native format, and you can run highly efficient queries on it. This is how our customer Quest Diagnostics can efficiently integrate and query data from millions of cells to identify novel drug targets and understand disease mechanisms at the cellular level.
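To make the idea of a “sparse three-dimensional array” for variant data concrete, here is a minimal pure-Python sketch. It assumes the three dimensions are sample, contig, and position, with REF, ALT, and genotype as attributes. Those names, the `SparseVariantArray` class, and its `write`/`query` methods are hypothetical illustrations, not TileDB’s or TileDB-VCF’s actual schema or API. Only cells that actually hold a variant are stored, and a query slices by genomic region and, optionally, by sample.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical dimensions for this sketch: (sample, contig, position).
Coord = Tuple[str, str, int]


@dataclass(frozen=True)
class Variant:
    """Hypothetical attributes stored in each non-empty cell."""
    ref: str
    alt: str
    genotype: str  # e.g. "0/1" for heterozygous


class SparseVariantArray:
    """Toy sparse 3-D array: only cells that hold a variant are stored."""

    def __init__(self) -> None:
        self._cells: Dict[Coord, Variant] = {}

    def write(self, sample: str, contig: str, position: int, variant: Variant) -> None:
        # Writing records only the occupied coordinate; empty cells cost nothing.
        self._cells[(sample, contig, position)] = variant

    def query(
        self,
        contig: str,
        start: int,
        end: int,
        samples: Optional[List[str]] = None,
    ) -> List[Tuple[Coord, Variant]]:
        """Return cells on `contig` with start <= position <= end,
        optionally restricted to `samples`, ordered by coordinate."""
        hits = [
            (coord, variant)
            for coord, variant in self._cells.items()
            if coord[1] == contig
            and start <= coord[2] <= end
            and (samples is None or coord[0] in samples)
        ]
        return sorted(hits, key=lambda item: item[0])


# Usage: three variants across two samples and two contigs.
arr = SparseVariantArray()
arr.write("sampleA", "chr1", 1000, Variant("A", "G", "0/1"))
arr.write("sampleB", "chr1", 1500, Variant("C", "T", "1/1"))
arr.write("sampleA", "chr2", 1200, Variant("G", "A", "0/1"))

hits = arr.query("chr1", 900, 1600)
print([coord for coord, _ in hits])
# [('sampleA', 'chr1', 1000), ('sampleB', 'chr1', 1500)]
```

The linear scan in `query` is there only for clarity. A real array engine would organize cells into tiles and index them, so that a region query reads only the tiles that overlap the requested range.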

Plus, with TileDB's technology, the data sits in a cloud object store and you can operate on it directly from there. This avoids the substantial cost of older analysis pipelines that do things like move data out of object stores onto large EC2 instances so they can operate on the data files. In that setup, single-cell data has to be loaded into main memory, which is computationally expensive. With TileDB, the multi-dimensional arrays are more efficient to query, and you can operate directly in object storage to drive costs down. It’s a really elegant fit for this use case.

Can you give me a specific example of how TileDB’s technology is directly addressing a key problem you faced in your earlier career in bioinformatics?

Krilow: I was always dealing with ever-growing variant datasets. That’s an N+1 problem for all intents and purposes, and it was a constant struggle because each new sample you add requires reprocessing the entire cohort. With TileDB's ability to update multi-dimensional arrays on demand, the N+1 problem is elegantly solved: we can add data incrementally without rewriting and reindexing the entire dataset. That dramatically reduces computational cost and time.
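The difference between reprocessing the whole cohort for every N+1 sample and adding data incrementally can be sketched with a toy cost model. This is a minimal, hypothetical illustration, loosely modeled on append-only writes in which each batch becomes a new immutable fragment; the class names and the `cells_written` counter are assumptions made for this example, not TileDB’s API. The monolithic baseline rewrites the entire cohort on every new sample, while the fragmented store writes only the new sample and merges fragments at read time.

```python
from typing import Dict, List, Tuple

# Hypothetical cell coordinate: (sample, contig, position) -> genotype.
Coord = Tuple[str, str, int]


class MonolithicCohort:
    """Baseline: one merged dataset; adding a sample rewrites every cell."""

    def __init__(self) -> None:
        self.cells: Dict[Coord, str] = {}
        self.cells_written = 0

    def add_sample(self, new_cells: Dict[Coord, str]) -> None:
        merged = dict(self.cells)  # rebuild the full cohort...
        merged.update(new_cells)
        self.cells = merged
        self.cells_written += len(merged)  # ...so every existing cell is rewritten


class FragmentedCohort:
    """Append-only: each batch is stored as its own immutable fragment."""

    def __init__(self) -> None:
        self.fragments: List[Dict[Coord, str]] = []
        self.cells_written = 0

    def add_sample(self, new_cells: Dict[Coord, str]) -> None:
        self.fragments.append(dict(new_cells))
        self.cells_written += len(new_cells)  # only the new data is written

    def read(self) -> Dict[Coord, str]:
        merged: Dict[Coord, str] = {}
        for fragment in self.fragments:  # later fragments win on overlap
            merged.update(fragment)
        return merged


# Usage: 100 samples, 1,000 variant calls each.
mono, frag = MonolithicCohort(), FragmentedCohort()
for s in range(100):
    batch = {(f"sample{s}", "chr1", pos): "0/1" for pos in range(1000)}
    mono.add_sample(batch)
    frag.add_sample(batch)

print(mono.cells_written, frag.cells_written)  # 5050000 100000
```

In this toy setup the baseline writes 1,000 × (1 + 2 + … + 100) = 5,050,000 cells versus 100,000 for the append-only store, and both end up holding the same data. The trade-off is that reads must merge fragments, which is why fragment-based systems typically also offer some form of consolidation.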

For instance, Rady Children's Hospital leverages TileDB to manage the vast genomic variant data from the genome sequencing program it runs to quickly identify disease-causing variants in newborns. TileDB is the underlying, unifying data engine for their bioinformatics pipelines. Instead of passing flat files between tools, each step in these pipelines can read from and write to TileDB arrays directly. This eliminates intermediate format conversions and the need to bring data from S3 into main memory, which streamlines the entire workflow.

Why TileDB Carrara is an ecosystem for the future of life sciences data management

How do you see TileDB Carrara building on our technology to unlock more possibilities in multimodal data?

Krilow: The vision for Carrara is an ecosystem for the future of life science data management. We’re building Carrara to be an end-to-end platform not only for data organization, structuring, and collaboration but also for analysis. It's going to provide a unified, collaborative environment that breaks down many of the data silos that exist between research teams and organizations, as well as between databases and the applications that need access to that data.

One of the real value drivers of Carrara is that, regardless of where multimodal data is stored, those who need it can access it in a centralized and secure location. I really believe Carrara is going to foster a more collaborative and efficient research ecosystem, one that empowers life science organizations to work together at a global scale.
