November 14, 2023

How to Process LiDAR Data with Snowflake

By John Emery

This blog was co-written by John Emery and Venkatesh Sekar.

Almost every person uses geospatial data and analysis in their daily life in one way or another. Often subconsciously, we analyze our surroundings to determine optimal routes, decide if an object will fit in a specific location, or calculate the distance of our morning run. Of course, these analyses are not limited to our personal lives but appear regularly in business. 

Virtually every industry can employ geospatial data in some form. Whether a simple mapping exercise, determining the optimal route between a series of points, or performing a site suitability analysis, geospatial problems run the gamut from simple to exceptionally complex.

In this blog, we will focus on a single type of geospatial analysis: processing point cloud data generated from LiDAR scans to assess changes in the landscape between two points in time. 

LiDAR point cloud data sets can be truly massive; the data set we will showcase here contains over 100 billion points. We will rely on the Snowflake Data Cloud, a powerful cloud data platform, to process data at that scale.

What is LiDAR?

LiDAR (Light Detection and Ranging) is a remote sensing technique used to detect minute changes in elevation that often go unnoticed by the unaided human eye. A LiDAR scan is produced by emitting light pulses from a device usually mounted on a drone, airplane, or other vehicle.

The pulses of light reflect off the ground and return to the device, at which point the transit time of the light is recorded. From this information, high-resolution contours of the scanned area can be calculated.
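
Since each pulse must travel to the target and back, the range is simply the speed of light multiplied by half the round-trip time; a return time of roughly 67 nanoseconds, for instance, corresponds to a distance of about 10 meters.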

LiDAR scans are used in a wide array of applications. Self-driving vehicles can employ LiDAR scanning for obstacle detection and avoidance. When you see a police officer pointing a device at you and then pulling you over for doing 75 in a 45, they are likely using a LiDAR device to determine your speed. 

The distance from the Earth to the Moon can be calculated to millimeter precision using LiDAR pulses bounced off mirrors left on the Moon during the Apollo missions.

In our example, we will use LiDAR point cloud data to analyze changes in the landscape after Category 5 Hurricane Michael blew through the Florida Panhandle in 2018.

Why Snowflake?

Snowflake’s capabilities in processing massive point cloud data sources are further enhanced by its specialized geospatial functions and Snowpark for Python scripting. These features offer a powerful toolkit for handling and analyzing complex geospatial datasets.

First, Snowflake’s support for geospatial data is evident in its suite of ST_ functions. These functions are designed to manage and query spatial data efficiently. They enable users to perform a variety of spatial operations, such as calculating distances, creating buffers, and determining spatial relationships between different geometries. 

This is particularly beneficial when working with LiDAR data, as it allows for the extraction of specific geospatial features and insights from the raw point cloud data. The ST_ functions in Snowflake make geospatial analysis more accessible and efficient, allowing analysts to focus on deriving insights rather than getting bogged down in data processing complexities.
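
To make this concrete, the sketch below shows the kind of query these functions enable, executed from a Snowpark for Python session. The table and column names (LIDAR_POINTS, LON, LAT, ELEVATION), the reference coordinates, and the connection parameters are hypothetical placeholders, not details from the actual pipeline.

```python
# A minimal sketch of Snowflake's ST_ functions executed from Python.
# Table and column names (LIDAR_POINTS, LON, LAT, ELEVATION) are hypothetical.
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Build a GEOGRAPHY point from longitude/latitude and keep only the points
# within 500 meters of a reference location.
nearby = session.sql("""
    SELECT LON, LAT, ELEVATION,
           ST_DISTANCE(ST_MAKEPOINT(LON, LAT), ST_MAKEPOINT(-84.28, 30.44)) AS DIST_METERS
    FROM LIDAR_POINTS
    WHERE ST_DISTANCE(ST_MAKEPOINT(LON, LAT), ST_MAKEPOINT(-84.28, 30.44)) <= 500
""")
nearby.show()
```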

In addition to the geospatial functions, Snowpark for Python is another key feature that enhances Snowflake’s capabilities. Snowpark is a developer framework that allows data scientists and engineers to write and execute Python code directly within Snowflake. This integration is crucial for handling LiDAR data, as it provides the flexibility to use familiar Python libraries and tools for data processing and analysis. 

With Snowpark, users can directly leverage Python’s extensive geospatial analysis ecosystem within the Snowflake environment, including libraries such as Pandas, NumPy, and others. This seamless integration significantly streamlines the workflow, as it eliminates the need for moving data between different environments for processing and analysis.
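
As a simple illustration, the sketch below reuses the session from the previous snippet to aggregate the same hypothetical LIDAR_POINTS table inside Snowflake and then pull the much smaller summary into pandas for local inspection; the SCAN_ID grouping key is likewise an assumption made for the example.

```python
# A minimal sketch of the Snowpark DataFrame API, reusing the `session` created above.
# The LIDAR_POINTS table and its SCAN_ID / ELEVATION columns are hypothetical.
import snowflake.snowpark.functions as F

points = session.table("LIDAR_POINTS")

# Aggregate inside Snowflake, then bring only the small summary back to pandas.
summary = (
    points.group_by("SCAN_ID")
          .agg(
              F.count(F.lit(1)).alias("POINT_COUNT"),
              F.avg("ELEVATION").alias("MEAN_ELEVATION"),
          )
)

summary_pdf = summary.to_pandas()  # a regular pandas DataFrame
print(summary_pdf.describe())
```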

Snowflake’s ST_ functions and Snowpark for Python scripting form a comprehensive toolset for processing and analyzing LiDAR point cloud data. These features not only enhance Snowflake’s native capabilities but also open up new possibilities for sophisticated geospatial analysis, making it an ideal platform for organizations dealing with large-scale spatial data.

An Example

In the wake of Hurricane Michael in 2018, a significant challenge faced by disaster response and recovery teams in Florida was assessing the extensive damage caused by the storm, particularly in terms of downed trees and power lines. 

Leveraging advanced data processing capabilities, we conducted a detailed analysis using LiDAR. This case study explores how Snowflake, a cloud data platform, was instrumental in handling and analyzing the vast amounts of LiDAR data. 

The ability to efficiently process and analyze such large datasets was key to gaining timely and actionable insights into the hurricane’s impact, demonstrating the potential of modern data platforms in disaster response and environmental analysis.

The following steps outline the technical approach in this LiDAR data analysis, providing a clear picture of the process from data importation to detailed elevation change analysis.

Step One: Getting the Data

The analysis commenced with the importation of LAZ files into Snowflake. These files, a compressed variant of the LAS format, are particularly suited for LiDAR data, balancing efficient storage with data integrity. This step was key in managing the extensive volume of LiDAR data. 

Utilizing Snowflake, renowned for its robust data handling and scalability, enabled seamless integration of these large datasets. The platform’s ability to manage and process large volumes of data effectively was crucial, ensuring that subsequent steps in the analysis, such as metadata extraction and detailed elevation change analysis, could be conducted efficiently and precisely.

The LiDAR point clouds were spread across 2.2 million individual files, for a total size of well over 600 gigabytes. As we will see later, those 2.2 million files contained over 100 billion records; this is Big Data.
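
As a rough sketch of what this ingestion can look like, the snippet below uses Snowpark's file API to upload LAZ files into an internal stage. The stage name, local path, and connection parameters are hypothetical, and a load of 2.2 million files would in practice be parallelized across many such calls.

```python
# A minimal sketch of staging LAZ files into a Snowflake internal stage with
# Snowpark for Python. The stage name (LIDAR_STAGE) and local path are hypothetical.
from snowflake.snowpark import Session

session = Session.builder.configs(connection_parameters).create()  # same parameters as earlier

session.sql("CREATE STAGE IF NOT EXISTS LIDAR_STAGE").collect()

# LAZ files are already compressed, so skip Snowflake's automatic gzip step.
session.file.put(
    "file:///data/lidar/*.laz",
    "@LIDAR_STAGE/raw/",
    auto_compress=False,
    parallel=8,
)
```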

Step Two: Extracting Metadata from LAZ Files

Following the importation of LAZ files into Snowflake, the next critical step was the extraction of metadata, which was accomplished using Snowpark Container Services alongside Python libraries such as PDAL and laspy.

These tools are integral for processing LiDAR data. PDAL (the Point Data Abstraction Library) is adept at handling point cloud data, while laspy specializes in reading and writing LAS and LAZ files in Python, making them well-suited for the task.

The metadata extracted from the LAZ files, such as bounding boxes and coordinate reference systems, provides crucial contextual information. Bounding boxes help define the spatial confines of the data points, outlining the area covered by the LiDAR scan. 

This is particularly important for narrowing down the analysis to specific regions affected by Hurricane Michael. Meanwhile, coordinate reference systems ensure that each data point can be accurately mapped to real-world geographical locations, which is vital for precise analysis and overlaying with other geographical data.

This metadata extraction process lays the groundwork for more targeted and efficient data processing. By understanding the spatial limits and geographical context of the data, subsequent steps, such as defining the study area with shapefiles and extracting relevant point data, become more streamlined and accurate.
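
A minimal sketch of this per-tile extraction with laspy is shown below; in the actual pipeline this logic ran inside Snowpark Container Services against staged files, and the tile name here is a hypothetical placeholder.

```python
# A minimal sketch of per-tile metadata extraction with laspy.
# Reading .laz requires a LAZ backend such as lazrs or laszip.
import laspy

with laspy.open("tile_0001.laz") as reader:       # hypothetical tile name
    header = reader.header
    metadata = {
        "point_count": header.point_count,
        "min_x": header.mins[0], "min_y": header.mins[1], "min_z": header.mins[2],
        "max_x": header.maxs[0], "max_y": header.maxs[1], "max_z": header.maxs[2],
        "crs": str(header.parse_crs()),            # coordinate reference system, if recorded
    }

print(metadata)   # the bounding box and CRS for this tile
```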

Step Three: Defining the Study Area Using Shapefiles

Defining the study area was a crucial step in this LiDAR data analysis, and it was achieved using shapefiles. This case study focused on Leon County, Florida, a region significantly impacted by Hurricane Michael. 

Shapefiles are a standard format in Geographic Information Systems (GIS) for storing the location, shape, and attributes of geographic features. They are particularly valuable in scenarios like this, where precise geographical delineation is required within a large dataset.

The use of shapefiles also facilitated the integration of LiDAR data with other geographic data layers. This integration is essential for comprehensive analysis, as it allows for the correlation of LiDAR data with additional geographical information, such as infrastructure maps, land use data, and historical records. 

In the context of Hurricane Michael, the LiDAR data could be effectively used to identify and assess the damage to specific areas, such as residential neighborhoods, roads, and power lines within Leon County.

Using the extent of Leon County, we were able to filter a highway shapefile to only those roads within the county. With this set of filtered highways, we could target only the LiDAR scans covering the areas of interest. By limiting the analysis to the smallest possible areas, we saw significant performance improvements compared to processing the entire set of LiDAR scans.
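
The sketch below shows one way to perform this filtering with GeoPandas; the shapefile names and the NAME attribute are hypothetical, and the same clip could equally be expressed with Snowflake's ST_ functions once the layers are loaded.

```python
# A minimal sketch of clipping a highway shapefile to the Leon County boundary.
# File names and the county NAME attribute are hypothetical.
import geopandas as gpd

counties = gpd.read_file("fl_counties.shp")
highways = gpd.read_file("fl_highways.shp")

leon = counties[counties["NAME"] == "Leon"]

# Reproject the roads to match the county layer, then keep only segments inside it.
highways = highways.to_crs(leon.crs)
leon_highways = gpd.clip(highways, leon)

# The bounding box of the clipped roads tells us which LiDAR tiles we actually need.
print(leon_highways.total_bounds)   # [min_x, min_y, max_x, max_y]
```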

Step Four: Extracting Point Data from LAZ Files

Extracting point cloud data from LAZ files was a crucial step in the analysis, utilizing Snowpark Container Services and Python libraries. Snowpark provided the computational environment necessary for this task, while Python libraries like PDAL and laspy facilitated the processing of LAZ files.

These tools were essential for isolating relevant point cloud data, which represents a detailed 3D view of the surveyed landscape. This efficient extraction of point data is vital as it transforms raw LiDAR data into a usable format for further analysis, setting the foundation for understanding the hurricane’s topographical impact.

In total, we extracted over 100 billion records from the 2.2 million LAZ files. The files were split approximately 50/50 between pre- and post-Hurricane Michael scans. By comparing the elevation figures for the same location at two different points in time, we can perform a comparative analysis to see which locations were altered by the hurricane.
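
A simplified per-tile version of this extraction is sketched below with laspy and pandas. The tile name is a placeholder, and filtering to ground returns is one reasonable choice for terrain comparison rather than a statement of the exact pipeline.

```python
# A minimal sketch of extracting point records from one LAZ tile with laspy.
# The tile name is hypothetical; class 2 is the standard ASPRS "ground" class.
import laspy
import numpy as np
import pandas as pd

las = laspy.read("tile_0001.laz")

points = pd.DataFrame({
    "x": np.asarray(las.x),                        # scaled, real-world coordinates
    "y": np.asarray(las.y),
    "z": np.asarray(las.z),                        # elevation
    "classification": np.asarray(las.classification),
})

# Keep only ground returns so elevation comparisons reflect the terrain itself.
ground = points[points["classification"] == 2]
print(f"{len(ground):,} ground points extracted from this tile")
```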

Step Five: Refining Point Cloud Data

The LAZ data underwent critical processing steps for cleaning, filtering, and preparing it for detailed analysis. A significant part of this process was the utilization of high-resolution H3 hexagons for spatial organization of the data.

H3, a spatial indexing system that partitions space into uniform hexagons, was instrumental in visualizing the impact of Hurricane Michael. By assigning LiDAR point data to these hexagons, it became easier to identify and analyze storm damage patterns, such as elevation changes or point density indicating downed trees or damaged structures.
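
The sketch below illustrates the idea with the h3 Python library: points are bucketed into fine-grained hexagons and the mean ground elevation is compared before and after the storm. The pre and post DataFrames (with lat, lon, and z columns) are hypothetical inputs standing in for the refined point data.

```python
# A minimal sketch of H3 binning and pre/post elevation comparison.
# `pre` and `post` are hypothetical pandas DataFrames with lat, lon, and z columns.
import h3          # h3 >= 4; older versions use h3.geo_to_h3 instead
import pandas as pd

RESOLUTION = 12    # fine-grained hexagons, roughly 300 square meters each

def mean_elevation_by_cell(df: pd.DataFrame) -> pd.Series:
    """Average elevation per H3 cell at the chosen resolution."""
    cells = [h3.latlng_to_cell(lat, lon, RESOLUTION)
             for lat, lon in zip(df["lat"], df["lon"])]
    return df.assign(h3=cells).groupby("h3")["z"].mean()

pre_elev = mean_elevation_by_cell(pre)     # scans captured before Hurricane Michael
post_elev = mean_elevation_by_cell(post)   # scans captured after

# Cells whose mean elevation dropped sharply are candidates for downed trees
# or damaged structures.
change = (post_elev - pre_elev).dropna().sort_values()
print(change.head(10))
```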

This hexagonal mapping allowed for a structured and granular analysis of the data, facilitating a clear visualization of the hurricane’s effects at a high resolution. Such detailed spatial analysis was key in assessing the extent of damage and aiding in effective response planning.

In essence, processing the LAZ data with H3 hexagons transformed the raw LiDAR data into a format amenable to in-depth analysis and enhanced visualization, which is crucial for understanding and responding to the hurricane’s impact.

Conclusion

The in-depth analysis of Hurricane Michael’s aftermath, using LiDAR data processed in Snowflake, serves as a compelling example of the platform’s prowess in handling advanced geospatial tasks. The initial step of importing LAZ files into Snowflake set the stage for a series of complex processing operations. 

The utilization of Snowpark Container Services and Python libraries for extracting and processing point data underscores Snowflake’s compatibility with cutting-edge data processing tools and techniques.

This case study highlights Snowflake’s ability to manage and analyze massive datasets efficiently. By integrating Snowflake with spatial data processing methods, such as the use of H3 hexagons for data visualization, we demonstrated how large-scale, intricate geospatial analyses can be conducted with precision and efficiency. 

The cloud-based platform’s scalability and robust processing capabilities were essential in transforming vast amounts of raw LiDAR data into a structured and analyzable format, enabling a nuanced understanding of the hurricane’s impact.

More than just a data storage solution, Snowflake proved invaluable for conducting sophisticated geospatial analyses. This example illustrates the platform’s potential to support complex environmental studies and disaster response planning. It is a testament to Snowflake’s role in empowering organizations to tackle large-scale, data-intensive challenges confidently and clearly.

As we continue to explore the possibilities of geospatial data analysis, Snowflake stands out as a vital resource for handling the ever-increasing data demands of this field, paving the way for more informed decisions and strategies in environmental and disaster management.

If you need additional help or are curious about leveraging LiDAR data with Snowflake, contact our team of Snowflake experts today for help, guidance, and best practices!
