Director, Life Sciences Product Marketing
Discovery. It’s more than a clever word chosen by a marketing department. For life sciences organizations searching multimodal data to develop new treatments for cancer and rare diseases, discovery is finding the right targets. This crucial step of identifying and validating the molecules or genetic entities altered by a disease shows researchers how these affected targets can be modulated by new drugs or treatments. Improve target discovery, and you accelerate how quickly a life sciences organization can bring life-saving drugs to patients.
At TileDB, we call our platform a “database designed for discovery.” From the start, we wanted to build more than a database that helps query data. Instead, our aim is to create a database that empowers our customers to discover actual insights that lead to true breakthroughs. In this blog post, we want to unpack what a database designed for discovery really means and why it is more important than ever to life sciences firms.
Organize: The discovery process can begin when life sciences teams are able to account for and locate all relevant data. However, this is difficult when data lives in different places, exists in varying formats and ranges from small files to very large datasets. To organize life sciences data effectively, research teams need to be able to create folder hierarchies, easily attach meaningful descriptions and metadata, and quickly locate specific data with powerful, easy-to-use search functionality.
Structure: After organizing their data, life sciences teams need to bring structure to any data not already in databases. This is especially vital for frontier data types that are inefficient to manipulate or analyze at scale in their original formats. To derive maximum value from these novel modalities, life sciences teams must structure their frontier data into an optimized data format while preserving its domain specificity and useful metadata.
Collaborate: Once life sciences researchers have organized and structured their data, they need to ensure all key stakeholders across the world can safely access and work on it together. This requires simple yet secure ways of democratizing data in defined data products that also have strong governance controls—in short, turning their object store into a trusted research environment. Otherwise, life sciences firms can easily develop silos that hinder collaboration and shared productivity.
Analyze: Now that all their data is properly organized, structured and shared, life sciences teams are ready to effectively analyze all their data—including genomics, transcriptomics, clinical spreadsheets and imaging data—to drive discovery. But because different researchers have different needs and goals, it’s essential they can analyze all their data types with their preferred tools and mechanisms. Having true technology flexibility in the Analyze step accelerates the time from sample to insight as well as the preparation of regulatory submissions.
This journey to discovery is paved by the complex and multimodal data that we call frontier data, which we believe holds the future of life sciences breakthroughs. But while frontier data is the cornerstone of drug and target discovery, many life sciences organizations are struggling to unlock this multimodal data’s potential. Why? Because they are trying to do so with traditional database technology designed for completely different purposes. Here are three key reasons life sciences firms cannot rely on tabular databases to drive discovery:
1. Tabular databases cannot handle the diversity of life sciences data. While traditional databases excel at handling data structured in tabular datasets, an estimated 80% of data is considered unstructured and comes from a diverse range of sources and formats. Traditional databases therefore cannot manage data types like images, PDFs and more complex frontier data until that data has been restructured into tabular format. In practice, non-tabular data is set aside until someone gets around to structuring it in a table, and until then it is largely useless to a life sciences research team relying on tabular databases. Considering how many life sciences data types and mechanisms do not naturally fit in a table, disregarding “unstructured data” means ignoring a huge amount of potential insight.
2. Tabular databases lack the secure collaboration that sensitive data requires. Life sciences data contains privileged and proprietary information subject to HIPAA and other data privacy statutes. This can make moving or sharing this data across organizations both difficult and expensive. The result is that life sciences firms often create sprawling database solutions for different data types with different access rules, making it tough to share information and collaborate across teams while also increasing complexity and costs. Instead of relying on these unwieldy tabular databases, organizations need a single centralized trusted research environment that enables simple and secure collaboration across teams while maintaining SOC 2 Type 2 and HIPAA compliance.
3. Tabular databases struggle to analyze frontier data at scale. The sheer scale of frontier data in life sciences is beyond what tabular databases were designed for. Take single-cell data as an example: with datasets that can encompass 1 million observations, single-cell data is difficult to even visualize using traditional database technology, much less analyze in a useful way. As a result, single-cell data is often stored in bespoke and complex file formats inside large data volumes. This simplistic cataloging of single-cell data means researchers cannot analyze it without a massive investment in computing power, and even then the analysis takes a long time to perform. Rather than throwing more and more time and computing resources at a problem tabular databases are ill-equipped to solve, life sciences firms need to adopt shape-shifting multi-dimensional arrays that can bring structure to complex frontier data and process it at scale.
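To make the array idea in the third point a little more concrete, here is a minimal sketch in plain Python. It is not the TileDB API. The function names (`to_sparse`, `slice_cells`) and the tiny cells-by-genes matrix are hypothetical, chosen only to illustrate the general technique of storing the nonzero cells of a two-dimensional array in sorted order and reading back a range of rows without scanning everything.

```python
from bisect import bisect_left

# Hypothetical illustration only: a toy sparse 2-D array, not TileDB's implementation.

def to_sparse(dense):
    """Keep only nonzero entries as (cell, gene, count) triples, sorted by cell then gene."""
    return [
        (i, j, value)
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]

def slice_cells(triples, cell_start, cell_stop):
    """Return the nonzero entries for cells in [cell_start, cell_stop).

    Because the triples are sorted by cell index, binary search finds the
    slice boundaries without touching entries outside the requested range.
    """
    lo = bisect_left(triples, (cell_start,))
    hi = bisect_left(triples, (cell_stop,))
    return triples[lo:hi]

# Toy expression matrix: 4 cells (rows) x 4 genes (columns), mostly zeros.
dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
    [0, 5, 0, 0],
]

triples = to_sparse(dense)
total_entries = len(dense) * len(dense[0])
print(f"stored {len(triples)} of {total_entries} entries")
print(slice_cells(triples, 2, 4))
```

Single-cell count matrices are typically dominated by zeros, so keeping only the nonzero entries and indexing them by dimension is one reason array-style storage tends to suit this kind of data better than a wide table with one column per gene.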
After observing these struggles with tabular databases, we knew a novel approach was necessary to empower life sciences researchers to organize, structure, collaborate on and analyze their data. By solving for all four steps of the discovery journey, this new solution would deliver faster time to insights, simpler infrastructure and greater economies of scale for life sciences organizations. We call this database designed for discovery TileDB Carrara, and invite you to learn more about how it accelerates discovery by supercharging life sciences data.