Hi there!
TileDB has grown since we last reached out. We have put our Series A funding to work, tripling the size of our team to accelerate development of TileDB Cloud and TileDB Embedded. Here is what we have been up to.
We are shipping rolling releases to TileDB Cloud more frequently. Read on to learn about our expanded code-sharing capabilities, newly added public datasets, and some upcoming functionality.
We introduced array sharing on TileDB Cloud more than a year ago. The same sharing and access controls now extend to Jupyter notebooks and user-defined functions (UDFs), so you can share code alongside the arrays it uses. Code and data travel together, self-contained and ready to run, which makes results easy to reproduce. All activity on shared resources is monitored and logged.
In addition to privately sharing arrays with other users, you can share datasets publicly on TileDB Cloud. To help with your next genomics or geospatial analysis, TileDB, Inc. is publicly sharing the following datasets:
TileDB-Inc/vcf-1kg-phase3-data – 70 GB sparse array that contains an analysis-ready version of the 1000 Genomes Project genomic variant data, created using the TileDB-VCF library.
capellaspace/SN6_CAPELLA_* – Collection of 85 dense arrays that comprise the Capella Space: SpaceNet 6 Expanded Dataset Release of SAR imagery.
We are working to add the mapboxgl-jupyter plugin to the many Python packages pre-installed in the geospatial notebook images on TileDB Cloud. This update will let TileDB Cloud users render Mapbox maps directly in Jupyter notebooks: the basemap is retrieved from Mapbox, and users can overlay array data as additional map layers. As part of this design, TileDB Cloud will also provide a vector tile server that extends your data beyond notebooks, serving data in TileDB Cloud arrays as map layers to a local vector tile client or to any web client that supports Mapbox Vector Tiles.
As an extension of array sharing, TileDB Cloud offers a marketplace where array owners can monetize their data by setting usage-based pricing. We will soon bring the same marketplace functionality to notebooks and UDFs, at no extra cost to sellers.
UDFs on TileDB Cloud currently support only Python. We will soon support serverless UDFs and array UDFs in R.
When you save a Jupyter notebook or register a UDF on TileDB Cloud, we store it as a TileDB array on the backend. Because data versioning and time traveling are built into the TileDB format, we will soon surface these capabilities on TileDB Cloud, so you can review the version history of your notebooks and UDFs.
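To make the idea of time traveling concrete, here is a minimal Python sketch. It is a hypothetical toy model, not TileDB's storage engine or API: each save appends an immutable, timestamped batch of writes (loosely mirroring how TileDB writes create timestamped fragments), and reading "as of" a timestamp replays only the batches written up to that point. The names `Fragment`, `VersionedObject`, `write`, `read`, and `versions` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Fragment:
    # Hypothetical model: one immutable batch of writes tagged with a timestamp.
    timestamp: int
    cells: Dict[str, str]


@dataclass
class VersionedObject:
    fragments: List[Fragment] = field(default_factory=list)

    def write(self, timestamp: int, cells: Dict[str, str]) -> None:
        # Writes never modify earlier fragments; they only append a new one.
        self.fragments.append(Fragment(timestamp, dict(cells)))

    def read(self, at: Optional[int] = None) -> Dict[str, str]:
        # Replay fragments in timestamp order, skipping any written after `at`;
        # a later fragment overwrites earlier values for the same cell.
        view: Dict[str, str] = {}
        for frag in sorted(self.fragments, key=lambda f: f.timestamp):
            if at is not None and frag.timestamp > at:
                break
            view.update(frag.cells)
        return view

    def versions(self) -> List[int]:
        # In this toy model, the version history is the sorted list of write timestamps.
        return sorted(f.timestamp for f in self.fragments)


# Usage: two saves of a notebook, then a read "as of" a time between them.
nb = VersionedObject()
nb.write(100, {"cell1": "import numpy"})
nb.write(200, {"cell1": "import numpy as np", "cell2": "np.zeros(3)"})
snapshot = nb.read(at=150)
```

Because old batches are never overwritten, reviewing history is just a matter of choosing which timestamp to read at.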
Over the past few months, we have focused on expanding support for cloud object storage services, adding more data types, and introducing a new Hilbert layout that orders cells along a Hilbert space-filling curve. We are also excited about our latest feature, which pushes attribute filtering down to the storage engine. It works much like a SQL WHERE clause, but within the NumPy-like slicing you already use with TileDB.
Visit our release notes for highlights, or dig deeper into our other GitHub releases. You can also submit a feature request on our feedback page.
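For readers curious about the Hilbert layout, the sketch below computes the classic Hilbert-curve index of a cell in a 2D grid (the well-known xy2d algorithm) and sorts cells by it, so that cells close in space tend to land close together in storage order. It assumes a square n x n grid with n a power of two and non-negative integer coordinates. It illustrates the general technique and is not TileDB's internal implementation, which may map coordinates to Hilbert values differently; the names `hilbert_index` and `_rotate` are hypothetical.

```python
from typing import Tuple


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip and/or transpose a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) in an n x n grid (n a power of two) to its distance along the curve."""
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current s x s block the cell falls in.
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Skip over all cells in the quadrants the curve visits first.
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


# Sort a few sparse cell coordinates into Hilbert order on a 4 x 4 grid.
cells = [(3, 0), (0, 0), (1, 1), (2, 3), (0, 3)]
ordered = sorted(cells, key=lambda c: hilbert_index(4, *c))
```

Sorting by this key yields the curve's visiting order; consecutive positions along the curve are always neighboring cells, which is what gives the layout its spatial locality.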
TileDB was recently featured on the Scene From Above podcast, hosted and produced by Earth data scientists Alastair Graham and Andrew Cutts. Stavros and Norman from TileDB discussed how a universal database built on dense and sparse multi-dimensional arrays can unify all geospatial data. We had an excellent time and encourage you to listen.
That’s all for now! Have a happy and healthy summer.
Thank you,
— The TileDB Team