This blog post summarizes the recent webinar that Seth Shelnutt (our CTO) and I presented about TileDB Cloud, a revolutionary universal database that aims to redefine how organizations manage their data, and how scientists and analysts collaborate easily, efficiently and inexpensively at global scale.
Below you can find the full recording of the webinar. This presentation effectively captures all the features and capabilities of TileDB Cloud that our team has been working on for the past couple of years. We have a long list of exciting features coming up, which we will be unveiling in future webinars soon. Stay tuned!
For those who prefer a 4-minute read instead, here is the gist in separate sections, with the corresponding video clips for easier consumption.
What is TileDB Cloud and why use it? I covered at a very high level the need for unifying data management using a universal database that:
Seth started by explaining how all your assets are managed in the TileDB Cloud console. And by assets we don’t just mean data. We also mean code (as user-defined functions), notebooks, dashboards and ML models. Seth covered basic concepts around catalogs, descriptions, metadata and sharing, which all apply equally to all assets for a very simple reason: all assets are stored as arrays in the world of TileDB.
One of the most powerful features of TileDB Cloud is its totally serverless infrastructure. Any compute task, from simple slicing, to SQL, to user-defined functions (UDFs), to sophisticated task graphs, can be easily defined and invoked by the user, and TileDB Cloud automatically deploys and scales the tasks. Users never have to size or spin up a single cluster.
Seth described a very simple example of using task graphs to carry out embarrassingly parallel ingestion. This feature is enabled by the combination of the parallel reader / parallel writer model of the TileDB storage engine, and the ability of TileDB Cloud to serverlessly scale across thousands of workers without the user needing to spin up and monitor clusters. The sky's the limit when it comes to building task graphs for implementing any sophisticated distributed computing algorithm and pipeline.
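To make the shape of such a task graph concrete, here is a minimal local sketch of the fan-out / fan-in pattern behind embarrassingly parallel ingestion. It is not the TileDB Cloud API: it uses Python's standard `concurrent.futures` thread pool as a stand-in for serverless workers, and the names `ingest_partition`, `summarize` and `run_ingestion_graph` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_partition(partition_id, rows):
    """Hypothetical ingestion task: handles one partition, independent of all others."""
    # A real task would write `rows` into an array; this stand-in only reports a count.
    return {"partition": partition_id, "rows_written": len(rows)}


def summarize(results):
    """Hypothetical downstream node that depends on every ingestion task."""
    return sum(r["rows_written"] for r in results)


def run_ingestion_graph(partitions, max_workers=4):
    # Fan out: one task per partition; tasks never communicate with each other.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(ingest_partition, pid, rows)
            for pid, rows in partitions.items()
        ]
        results = [f.result() for f in futures]
    # Fan in: a single node that runs only after all ingestion tasks have finished.
    return summarize(results)


# Eight partitions of ten rows each, ingested in parallel.
partitions = {i: list(range(i * 10, i * 10 + 10)) for i in range(8)}
total = run_ingestion_graph(partitions)
print(total)
```

Because the ingestion tasks share no state, adding workers only changes how many partitions are processed at once, which is the property that lets this pattern scale out without coordination between tasks.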
Seth unveiled one of the new features of TileDB Cloud: dashboards. Specifically, any user can now create their own dashboards using Python widgets or R Shiny apps. Then those dashboards can be shared with any other TileDB Cloud user, and spun up on demand in a scalable way.
With every asset in TileDB Cloud being an array, we implemented a unified way to share data and code across multiple users, within and outside an organization, at unprecedented scale. You can define any access policies and take research and analysis collaboration to another level. Adding monetization capabilities to the mix (with an elegant Stripe integration), you can now join a massive marketplace of data and code with an important difference from anything else you have experienced so far: all data and code are analysis-ready. You no longer have to force your users/customers to download data or deploy code. Instead, they can operate directly and efficiently inside TileDB Cloud, reducing operational costs for all parties involved and drastically reducing time to insight.
Finally, TileDB Cloud offers a nice and easy way to manage your profile and view your billing details. Contrary to other cloud platforms that are notorious for unpredictable costs, billing in TileDB Cloud is ultra transparent, providing you with insights into your spend.
Here are the slides we used in the webinar.
A few final remarks:
Last but not least, a huge thank you to our awesome team for all the amazing work!