Hello!
Summer is in full swing, and we have been busy publishing more learning content here on the blog. Check out the full summary further below, but now, on to the tech updates!
Research in life sciences, earth observation, and other domains regularly relies on schema design patterns that use TileDB arrays to optimize higher-level data types. These higher-level types are now automatically categorized in TileDB Cloud's left navigation bar. Users still have access to generic TileDB Cloud data types (arrays, files, notebooks, UDFs, etc.) as well as grouped datasets.
Raster data can now be previewed interactively in TileDB Cloud. Each record's preview tab lets you pan, zoom, blend color channels, and view tile caching information, so you can quickly explore imaging datasets under Geospatial / Rasters and Life Sciences / Biomedical Imaging.
We're working to make TileDB-VCF code on TileDB Cloud more convenient and less verbose by providing new one-liner functions for VCF ingestion and distributed queries. Here's a preview:
import tiledb.cloud.vcf as vcf
import tiledbvcf
# Initialize the TileDB config with AWS credentials (truncated)
config = {...}
# Set the URI of the VCF dataset
dataset_uri = "s3://bucket/prefix/vcf-dataset"
# One-line distributed VCF ingestion
dag, sample_uris = vcf.ingest(
    dataset_uri,
    config=config,
    search_uri="s3://1000genomes-dragen-v3.7.6/data/individuals/hg38-graph-based",
    pattern="*.vcf.gz",
    max_files=10,
)
# Wait for the ingestion to complete
dag.wait()
# Get a list of samples in the dataset
ds = tiledbvcf.Dataset(dataset_uri, tiledb_config=config)
samples = ds.samples()
The 2.16.0 release features performance improvements to the existing set of compressors and filters for TileDB arrays, and it adds a new option for delta compression. This compressor will be used as part of a compression pipeline (in TileDB APIs or through PDAL profiles) offering TileDB compression that is equal to or better than LAZ.
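To give a feel for why delta compression helps on slowly changing values, here is a minimal sketch of the general technique in plain Python. It is not TileDB's implementation and does not use the TileDB API; the sample data, the helper names, and the choice of zlib as the second stage are all assumptions made for illustration.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack_int64(values):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(values)}q", *values)


# Hypothetical data: slowly increasing integers, such as sorted timestamps.
values = [1_000_000 + 7 * i + (i % 3) for i in range(10_000)]

deltas = delta_encode(values)
raw_size = len(zlib.compress(pack_int64(values)))
delta_size = len(zlib.compress(pack_int64(deltas)))

print(f"zlib only:    {raw_size} bytes")
print(f"delta + zlib: {delta_size} bytes")
```

The deltas form a short repeating pattern, which a general-purpose compressor handles far better than the raw increasing values. This is the intuition behind placing a delta stage ahead of other compressors in a pipeline.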
Query conditions (QCs) on array attributes now support negation of the entire set of conditions via the NOT operator. Previously, QCs supported the not-equal-to operator (!=) on individual conditions; however, the new NOT operator negates the entire statement. Also, as part of general performance improvements to dense arrays and local file systems, QCs can now be applied to the dimensions of dense arrays.
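The difference between negating a whole statement and flipping individual conditions is easy to miss, so here is a small sketch in plain Python. It does not use the TileDB query condition API; the records, the attribute names "a" and "b", and the helper functions are hypothetical stand-ins for array cells and their attributes.

```python
# Hypothetical records standing in for array cells with attributes "a" and "b".
cells = [
    {"a": 3, "b": "x"},
    {"a": 8, "b": "x"},
    {"a": 8, "b": "y"},
    {"a": 3, "b": "y"},
]


def condition(cell):
    """The original compound condition: a > 5 AND b == 'x'."""
    return cell["a"] > 5 and cell["b"] == "x"


def negated(cell):
    """NOT applied to the entire statement."""
    return not condition(cell)


def per_condition_flip(cell):
    """Flipping each comparison but keeping AND, which is not a true negation."""
    return cell["a"] <= 5 and cell["b"] != "x"


def de_morgan(cell):
    """Equivalent of NOT (a > 5 AND b == 'x') by De Morgan's law."""
    return cell["a"] <= 5 or cell["b"] != "x"


matches_not = [c for c in cells if negated(c)]
matches_flip = [c for c in cells if per_condition_flip(c)]

print(len(matches_not), "cells match NOT(condition)")
print(len(matches_flip), "cells match the per-condition flip")
```

Negating the whole statement returns every cell that the original condition rejects, while flipping each comparison and keeping AND returns only a subset of them.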
Core to TileDB's commitment to maintaining an open, cloud-native array engine, we recently updated TileDB Embedded to support the latest SDKs for Azure and Google Cloud. In addition to taking advantage of the usual cloud provider security and bug fixes, the new Azure SDK also provides TileDB support for Azure's premium block blob storage for high-performance workloads.
TileDB-Vector-Search provides an open-source Python API for storage and search of vector embeddings, built on top of the TileDB array engine. As part of the TileDB open-source ecosystem, TileDB-Vector-Search is cloud-native, with support for all TileDB backends (AWS S3, Azure Blob Storage, Google Cloud Storage). With TileDB-Vector-Search, it is now possible to store, process, and query data for the entire lifecycle of a vector search project in one unified system — everything from the raw data used for training (as TileDB arrays), to training and fine-tuning with TileDB Cloud task graphs, to indexing and retrieval of embeddings!
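If vector search is new to you, the core idea fits in a few lines. The sketch below is an exhaustive cosine-similarity search in plain Python; it is not the TileDB-Vector-Search API and says nothing about the indexing algorithms that library uses. The embeddings, document keys, and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query, embeddings, k):
    """Score every stored vector against the query and return the k best keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical 3-dimensional embeddings keyed by document id.
embeddings = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}

result = top_k([0.8, 0.6, 0.0], embeddings, k=2)
print(result)
```

Scoring every vector scales linearly with the size of the collection, which is why dedicated vector search systems build indexes rather than scanning everything.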
To complement existing TileDB support for raster data via GDAL and point cloud data via PDAL, TileDB now supports geometry-based queries as a vector driver for GDAL. The TileDB-MariaDB integration also supports query pushdown for common spatial operations like ST_INTERSECTS and ST_CONTAINS.
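As a rough illustration of what these spatial predicates test, here is a simplified sketch on axis-aligned bounding boxes in plain Python. It is not how the MariaDB integration or the pushdown is implemented, and real geometries follow the stricter OGC definitions; the Box type and both functions are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """True when the two boxes share at least one point."""
    return (
        a.xmin <= b.xmax and b.xmin <= a.xmax
        and a.ymin <= b.ymax and b.ymin <= a.ymax
    )


def contains(outer, inner):
    """True when inner lies entirely within outer."""
    return (
        outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
        and inner.xmax <= outer.xmax and inner.ymax <= outer.ymax
    )


region = Box(0, 0, 10, 10)
print(intersects(region, Box(8, 8, 12, 12)))  # overlapping corner
print(contains(region, Box(2, 2, 4, 4)))      # fully inside
```

Bounding-box checks like these are cheap, so spatial systems commonly use them as a first filter before running exact geometry tests.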
Geometries are stored in TileDB using the well-known binary (WKB) format as defined by the Open Geospatial Consortium (OGC) and take advantage of novel indexing techniques that use TileDB's existing R-tree structures. Look for more examples on TileDB geometries coming soon on the blog!
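For readers who have not met WKB before, the sketch below encodes and decodes a 2D point with Python's standard struct module. It follows the OGC layout for a point (one byte-order byte, a 32-bit geometry type, then two 64-bit floats) and does not use TileDB or GDAL; the function names are hypothetical.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little-endian, 0 = big-endian
WKB_POINT = 1          # geometry type code for a 2D point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(blob):
    """Decode a WKB point, honoring the byte-order flag in the first byte."""
    fmt = "<Idd" if blob[0] == WKB_LITTLE_ENDIAN else ">Idd"
    geom_type, x, y = struct.unpack_from(fmt, blob, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a point (type 1), got type {geom_type}")
    return x, y


blob = point_to_wkb(1.0, 2.0)
print(blob.hex())
print(wkb_to_point(blob))
```

Because WKB is a compact binary encoding, geometries stored this way can be kept as variable-length byte values and decoded only when needed.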
You can now visualize microscopy images stored as TileDB arrays with the napari n-dimensional image viewer. The napari-tiledb-bioimg plugin supports reading TileDB-BioImaging multi-resolution arrays within napari.
The TileDB-SOMA R API is fast approaching its 1.0 release, featuring the ability to import and export Seurat objects to and from SOMA experiments. Please try out the integration and send us your feedback!
There's a range of new introductory content published to the TileDB Blog!
In our latest webinar, we rethink how a modern database system that uses arrays as its foundation can morph to support data mesh implementations that unify tabular and complex data, generative AI, and data products.
This newsletter packed quite the punch. Thank you for reading!
If you'd like to share product feedback, simply reply to this email, join our Slack community, or follow us on Twitter and LinkedIn. We'd love to hear about your TileDB experience and future requirements.
Thank you,
— The TileDB Team