Hello TileDB Community! Here are the latest product updates:
The self-hosted version of TileDB Cloud now supports Google Cloud Platform (GCP) integration. Companies that use GCP as their preferred storage platform can now easily register their code and data assets through TileDB Cloud. Please don’t hesitate to reach out if you’re interested in an exploratory call with our sales team!
Along with the existing default image and server size, you can now also set a default region for your Jupyter notebooks. Navigate to Assets > Code > Notebooks, choose a notebook for which you have admin rights, open Settings, and scroll down to find the default region option.
A new icon in the Cloud Credentials table of your namespace settings indicates, for each credential, whether it has the permissions to run tasks and code. This lets you see at a glance which credentials can be injected into a task graph environment. Note that only certain types of credentials allow this, as indicated when you create a credential in the UI.
Catalog has been renamed to Assets to communicate additional functionality on top of a typical catalog, such as previews, metadata editing, sharing, and more. Compute has been renamed to Monitor to reflect the functionality this menu actually exposes.
TileDB 2.22 is now available, with the following improvements:
You can now bulk ingest and automatically register your bioinformatics variant data from its original raw format on cloud storage to a TileDB-VCF Group, with a few simple clicks on TileDB Cloud!
From Assets, navigate to Add Asset > Data > Life sciences > VCF and choose the Ingest VCF dataset option. You’ll need an ARN credential with the relevant permissions, as well as cloud storage paths for both your original data and the newly created TileDB data. Finally, select Ingest, and keep an eye on the created task graph that will run your query end-to-end!
During VCF file ingestion, TileDB-VCF now automatically collects sample QC statistics, including depth, genotype quality, genotype calls, and variant type. These can be used to identify low-quality samples and batch effects.
To see an example of this in action, check out this notebook on the TileDB Marketplace that shows how the sample_stats array is created and stored upon ingestion!
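To illustrate how such statistics might be used downstream, here is a minimal pure-Python sketch that flags low-quality samples from per-sample QC records. The field names and thresholds are hypothetical and do not reflect the actual sample_stats array schema:

```python
# Hypothetical per-sample QC records; the real sample_stats schema may differ.
samples = [
    {"sample": "S1", "mean_depth": 32.5, "mean_gq": 55.0},
    {"sample": "S2", "mean_depth": 4.1,  "mean_gq": 48.0},   # low depth
    {"sample": "S3", "mean_depth": 28.0, "mean_gq": 12.0},   # low genotype quality
]

def flag_low_quality(records, min_depth=10.0, min_gq=20.0):
    """Return the names of samples that fail either QC threshold."""
    return [r["sample"] for r in records
            if r["mean_depth"] < min_depth or r["mean_gq"] < min_gq]

print(flag_low_quality(samples))  # ['S2', 'S3']
```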
TileDB has made public TileDB-VCF variant stores of dog genomes:
TileDB has also released 5.67 million SARS-CoV-2 sample sequence variants, collected as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines Tracking Resistance and Coronavirus Evolution (ACTIV TRACE) initiative. The dataset is a single publicly available TileDB-VCF store!
To work with these datasets, you can sign up to TileDB Cloud and launch your first notebook. We have a quickstart notebook for the basics of TileDB-VCF you can launch to get started even quicker!
For more advanced workflows, check out our SARS-CoV-2 notebook example, which uses a distributed query to search for common variants like D614G and Omicron.
Reach out to us if you are interested in publishing notebooks on these open datasets!
TileDB-SOMA now supports block-processing of data, empowering users to efficiently process large datasets without loading the entire dataset into memory. This new API provides a flexible method to iterate over your data in blocks, specifying your preferred format and dimensions. Read more in the SOMA documentation here!
It supports both row-wise and column-wise data access, so you can process your data exactly how you need it. Iterators can be created directly from any tiledbsoma.SparseNDArray, or via a query object to process a specific subset of data. Access the data in the format you need, including Apache Arrow tables or re-indexed sparse matrices.
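To convey the idea without TileDB Cloud access, here is a minimal pure-Python sketch of the blockwise pattern: iterating over sparse (row, col, value) data in fixed-size row blocks, analogous to row-wise iteration over a tiledbsoma.SparseNDArray. This is a conceptual illustration, not the tiledbsoma API itself:

```python
from collections import defaultdict

# Sparse data as (row, col, value) triples, standing in for a SparseNDArray.
coo = [(0, 1, 1.0), (0, 3, 2.0), (2, 0, 3.0), (5, 2, 4.0), (7, 1, 5.0)]

def blockwise_rows(triples, n_rows, block_size):
    """Yield ((start, stop), triples_in_range) for fixed-size row blocks,
    so each block can be processed without holding the whole dataset."""
    by_block = defaultdict(list)
    for r, c, v in triples:
        by_block[r // block_size].append((r, c, v))
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        yield (start, stop), by_block.get(start // block_size, [])

blocks = list(blockwise_rows(coo, n_rows=8, block_size=4))
# Two blocks: rows [0, 4) and rows [4, 8)
```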
Head over and sign up to TileDB Cloud, launch your first notebook on a Genomics image, and run this notebook example to get started.
TileDB Bioimaging now supports ingestion of NDPI images through the tiledb.bioimg.from_bioimg API in TileDB-Bioimaging v0.2.11.
To get started with your own NDPI images, launch a notebook on TileDB Cloud using our default Genomics image and this notebook example.
Users can now automatically ingest point cloud data (LAS/LAZ formats) programmatically to TileDB Cloud as arrays, with just a few lines of code! The function linked below is flexible: invoke it either by listing the files to be ingested or by pointing it at the object store you wish to ingest from. It also works for ingesting multiple files from a single bucket, all ingested in parallel! Check out our example notebook to see this in action. You can launch it on a Cloud server and try it out on your own data!
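The "list the files or point at a bucket, then ingest each in parallel" pattern can be sketched in pure Python. The helper and function names here are hypothetical, not the actual TileDB Cloud API:

```python
from concurrent.futures import ThreadPoolExecutor

def discover_files(bucket_listing, extensions=(".las", ".laz")):
    """Select point cloud files from an object-store listing (hypothetical helper)."""
    return [key for key in bucket_listing if key.lower().endswith(extensions)]

def ingest_one(uri):
    """Placeholder for a per-file ingestion task submitted to the cloud."""
    return f"ingested:{uri}"

listing = ["tiles/a.las", "tiles/b.laz", "tiles/notes.txt"]
files = discover_files(listing)

# Fan the per-file tasks out in parallel; map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(ingest_one, files))
# results -> ['ingested:tiles/a.las', 'ingested:tiles/b.laz']
```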
Along with point cloud data, users can also automatically ingest raster formats programmatically to TileDB Cloud. Again, this function can be invoked either by listing the files to be ingested or by pointing it at the object store you wish to ingest from. The raster ingestion handles overlapping raster datasets and supports specifying the NODATA value used to indicate either transparency or the absence of actual data.
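The role of a NODATA value can be illustrated with a small pure-Python sketch that skips such cells before computing a statistic. This is a conceptual illustration, not the TileDB raster API:

```python
NODATA = -9999.0  # sentinel marking transparent / missing cells (value is illustrative)

grid = [
    [1.0, 2.0, NODATA],
    [NODATA, 4.0, 6.0],
]

def mean_ignoring_nodata(rows, nodata):
    """Average only over cells that carry real data, skipping the sentinel."""
    vals = [v for row in rows for v in row if v != nodata]
    return sum(vals) / len(vals) if vals else None

print(mean_ignoring_nodata(grid, NODATA))  # 3.25
```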
To get started, head over to TileDB Cloud and launch our example notebook, and follow the steps to ingest your own raster data!
TileDB has open-sourced new Rust bindings covering approximately 45% of the TileDB C API, including the key Array Schema, Query, and Config APIs. We have a set of usage examples covering array creation and querying, along with extensive property-based tests for the wrapped APIs. Suggestions, bug reports, and contributions are very welcome. We’d love to hear from you on GitHub, or contact [email protected] if you have questions or interesting use cases for TileDB with Rust!
Thank you,
— The TileDB Team