This blog post summarizes a recent webinar that I hosted with Jason Brown (Remote Sensing Image Scientist and Community Enablement Engineer at Capella Space) and Norman Barker (VP of Geospatial at TileDB), where we shared our experience analyzing SAR data using TileDB Cloud, speeding up data exploration and maximizing collaboration among researchers and citizen scientists alike. SAR (synthetic aperture radar) is an exciting technology that produces high-quality Earth imagery even through cloud cover or at night, and Capella Space is an amazing company that produces high-quality SAR imagery products.
Here is the full webinar recording:
Special thanks go to Vicky Liau and Margriet Groenendijk from the TileDB team, who contributed immensely to the technical materials presented in the webinar. Also, big thanks to Hobu, Inc. for providing the open LAS dataset we used in the demos.
TileDB, Inc. is building a universal database, which can store all types of data in a canonical format (dense or sparse multi-dimensional array) and analyze them with a growing set of language APIs and open-source tools. It has two offerings:
Due to its universality, TileDB can efficiently manage both LiDAR (sparse point clouds) and SAR (dense images) data, in a single solution and with a unified API.
LiDAR data is a collection of 3D points with various fields, typically stored in file formats like LAS and LAZ. TileDB stores such data as 3D sparse arrays with native floating-point coordinates, and integrates with the popular PDAL library. It builds R-trees on top of the (X, Y, Z) coordinates for rapid slicing, and its highly optimized cloud-native data format performs very well on cloud object stores like AWS S3. In addition, it offers lock-free, parallel writes for fast and scalable ingestion, along with versioning and time traveling. TileDB is ideal if you are tired of storing thousands of separate LAS/LAZ files and having to handle metadata separately. You can find more details about TileDB’s LiDAR capabilities here.
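To make the idea of bounding-box indexing on (X, Y, Z) coordinates more concrete, here is a minimal pure-Python sketch. It is not TileDB's implementation: it keeps a flat list of bounding boxes rather than a hierarchical R-tree, and every name in it (`Leaf`, `build_leaves`, `query`) is hypothetical. It only illustrates why a box query can skip most of the stored points.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner)


@dataclass
class Leaf:
    """A group of points plus the minimum bounding box that encloses them."""
    points: List[Point]
    mbr: Box


def bounding_box(points: List[Point]) -> Box:
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def build_leaves(points: List[Point], capacity: int) -> List[Leaf]:
    # Sorting is a crude way to put nearby points into the same leaf.
    ordered = sorted(points)
    chunks = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    return [Leaf(chunk, bounding_box(chunk)) for chunk in chunks]


def overlaps(a: Box, b: Box) -> bool:
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d] for d in range(3))


def inside(p: Point, box: Box) -> bool:
    return all(box[0][d] <= p[d] <= box[1][d] for d in range(3))


def query(leaves: List[Leaf], box: Box) -> Tuple[List[Point], int]:
    """Return the points inside `box` and the number of leaves actually scanned."""
    hits: List[Point] = []
    scanned = 0
    for leaf in leaves:
        if not overlaps(leaf.mbr, box):
            continue  # the whole leaf is skipped without touching its points
        scanned += 1
        hits.extend(p for p in leaf.points if inside(p, box))
    return hits, scanned
```

With points grouped this way, a query only reads the leaves whose bounding boxes intersect the requested region; a real R-tree adds further levels of boxes on top so that whole groups of leaves can be skipped at once.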
SAR data is a collection of 2D images, typically stacked on a time dimension. TileDB models SAR data as either a 2D dense array, or a 3D dense array where the third dimension is time. TileDB’s dense array format and lightweight indexing lead to rapid slicing on the dimensions. As with LiDAR, its cloud-native format makes TileDB ideal for storing huge quantities of SAR data on inexpensive cloud object stores, and it offers fast versioning and time traveling. TileDB integrates with the popular GDAL library for fast ingestion from TIFF and other file formats that store SAR data.
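One way to picture why slicing a tiled dense array is fast: the tiles that a slice touches can be computed with integer arithmetic alone, so no index lookup or scan is needed. The sketch below is illustrative only and does not use the TileDB API. The function name `tiles_for_slice`, the (time, row, column) dimension order, and the tile sizes are assumptions, and the array domain is assumed to start at 0 on every dimension.

```python
from itertools import product
from typing import List, Sequence, Tuple


def tiles_for_slice(
    ranges: Sequence[Tuple[int, int]],
    tile_extents: Sequence[int],
) -> List[Tuple[int, ...]]:
    """Return the coordinates of every tile touched by an inclusive slice.

    ranges       -- per-dimension (start, end) cell indices, both inclusive
    tile_extents -- tile size along each dimension, in cells
    """
    per_dim = []
    for (start, end), extent in zip(ranges, tile_extents):
        if start < 0 or start > end:
            raise ValueError(f"invalid range: ({start}, {end})")
        # Integer division maps a cell index to the tile that contains it.
        per_dim.append(range(start // extent, end // extent + 1))
    return list(product(*per_dim))


# A stack of SAR images as a (time, row, column) array,
# tiled as one image at a time in 1024 x 1024 pixel blocks.
tiles = tiles_for_slice(
    ranges=[(3, 3), (1000, 2100), (0, 500)],
    tile_extents=[1, 1024, 1024],
)
print(tiles)  # [(3, 0, 0), (3, 1, 0), (3, 2, 0)]
```

Only those three tiles need to be fetched from object storage to answer the slice, regardless of how many images the stack contains.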
Before TileDB, SAR and LiDAR data seemed very different, so users would typically store SAR in TIFF format and LiDAR in LAS/LAZ. Yet there are many opportunities for extracting valuable insights by fusing SAR with LiDAR data, such as producing higher-quality point clouds colored with the help of SAR imaging, or creating training datasets by superimposing labeled LiDAR data on SAR images, for object detection and classification in future imagery acquisitions. To pursue them, data scientists had to manually wrangle the different file formats and data models, build a colossal infrastructure for scalable compute, and integrate with state-of-the-art machine learning tools and other analytics software for their analyses.
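As a rough illustration of what superimposing labeled LiDAR points on a SAR image involves, the sketch below maps point coordinates to pixel indices and keeps the most common label per pixel. It is not the method used in the webinar. It assumes that both datasets share the same projected coordinate system, that the image is north-up with square pixels, and that the image origin is its top-left corner. The function names and the sample labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LabeledPoint = Tuple[float, float, float, str]  # (x, y, z, label)


def point_to_pixel(
    x: float, y: float, origin_x: float, origin_y: float, pixel_size: float
) -> Tuple[int, int]:
    """Map map-space (x, y) to (row, col); rows grow southwards from the origin."""
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)
    return row, col


def rasterize_labels(
    points: Iterable[LabeledPoint],
    origin_x: float,
    origin_y: float,
    pixel_size: float,
    n_rows: int,
    n_cols: int,
) -> Dict[Tuple[int, int], str]:
    """Return the majority label for every pixel that contains at least one point."""
    votes: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for x, y, _z, label in points:
        row, col = point_to_pixel(x, y, origin_x, origin_y, pixel_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:  # drop points outside the image
            votes[(row, col)][label] += 1
    return {pixel: counts.most_common(1)[0][0] for pixel, counts in votes.items()}
```

The resulting pixel-to-label mapping could then serve as a label mask aligned with the SAR image. In practice, SAR imaging geometry (for example layover and foreshortening) means a simple top-down projection like this is only a first approximation.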
With TileDB, data scientists’ and analysts’ lives become significantly simpler. SAR and LiDAR data are not that different in the TileDB world: both can be modeled as multi-dimensional arrays. Granted, those arrays have different schemas (e.g., one may be 2D dense, another 3D sparse), but they are all arrays nonetheless, stored and managed with a single solution. Moreover, TileDB Cloud provides all the necessary functionality for advanced analytics and ML, in addition to traditional database management (such as authentication, access control and logging). Specifically, it provides a powerful computational framework based on serverless task graphs, and extreme interoperability between the TileDB format and popular languages and tools such as PDAL, GDAL, BabylonJS, Python, R, TensorFlow, MariaDB, and many more.
Watch the webinar video to get a taste of the power TileDB Cloud grants you in your SAR and LiDAR analysis.
The example notebooks are publicly available on TileDB Cloud and can be accessed here. You can freely download the notebooks, or simply run them directly inside the TileDB Cloud platform with a click of a button (you can sign up and contact us for free credits). Some of the notebooks make use of data from the Capella Space Open Data program, so you’ll need to request access directly from Capella Space (or contact us if you need help with that).
Here are the slides Jason Brown and I presented during the first part of the webinar.
A few final remarks:
Last but not least, a huge thank you to our awesome team for all the amazing work!