Monday, August 3, 2020

ArcGIS Pro, Jupyter Notebook and Databricks

Yet another post in the continuing saga of using Apache Spark from a Jupyter notebook within ArcGIS Pro.

In the previous posts, the execution was always within the ArcGIS Pro environment on a single machine, albeit taking advantage of all the cores of that machine. Here, we take a different angle: the execution is performed on a remote cluster of machines in the cloud.

So, we author the notebook locally, but we execute it remotely.

In this notebook, we demonstrate the spatial binning of AIS broadcast points on a Databricks cluster on Azure. In addition, to colocate the data with the execution engine for better performance, we converted the local feature class of AIS broadcast points to a Parquet file and placed it in the Databricks distributed file system.
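For readers new to spatial binning, here is a minimal plain-Python sketch of the arithmetic involved. It is not the code from the notebook: the function names, the sample coordinates and the 1-degree cell size are all made up for illustration. The idea is to floor each coordinate by the cell size to get a cell index, then count the points per cell. On the cluster, the same floor-and-count would presumably be expressed as a Spark group-by over the Parquet data.

```python
from collections import Counter
from math import floor


def bin_points(points, cell_size):
    """Count points per square cell; returns a Counter keyed by (col, row)."""
    counts = Counter()
    for x, y in points:
        # Flooring the scaled coordinate gives the index of the enclosing cell.
        col = floor(x / cell_size)
        row = floor(y / cell_size)
        counts[(col, row)] += 1
    return counts


def cell_center(col, row, cell_size):
    """Return the (x, y) center of a cell, handy for mapping the bin counts."""
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


# Hypothetical lon/lat pairs standing in for AIS broadcast points.
points = [(-80.12, 25.71), (-80.18, 25.79), (-81.95, 24.05), (-79.40, 26.30)]

# Assumed cell size of 1 degree; a real analysis would pick this per use case.
bins = bin_points(points, cell_size=1.0)

for (col, row), count in sorted(bins.items()):
    print(cell_center(col, row, 1.0), count)
```

Each point is visited exactly once and only the small table of counts is kept, which is why this kind of aggregation spreads well across the machines of a cluster.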

More to come :-)
