Friday, July 31, 2020

On ArcGIS Pro, Jupyter Notebook and Apache Spark

Been a while since I posted something, and thank you, faithful reader, for coming back :-)

I intend to write a series of posts on how to use Apache Spark and Machine Learning in a Jupyter notebook within ArcGIS Pro.  Yes, you can now start a Jupyter notebook instance in ArcGIS Pro to create an amazing data science and data exploration experience. Check out this link to see how to get started with a Jupyter notebook in Pro.  But...my favorite hidden "GeoGem" is that Pro comes with built-in Apache Spark, and y'all know how much I love Spark. People think that Spark is intended only for Big Data analytics.  That is so far from the truth. What I love about it is the frictionless movement of data and analysis, locally or remotely, and the language fusion.  In my case, I'm using Python, SQL, and Scala.

The use of Apache Spark in Pro was demonstrated in the publicly shared COVID-19 Contact Tracing Application and the Proximity Tracing Application.

In this first notebook, we will start by loading selected features into a Spark dataframe from a local feature class, process the dataframe using Spark SQL, and write the result back to an ephemeral feature class that will be displayed on the map.
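To make that flow concrete before opening the notebook, here is a rough sketch of the three steps. It is not the code from the repository, just an illustration under stated assumptions: the layer is a point layer, the Spark session is started locally with the plain PySpark builder, the "processing" is a made-up grid count, and the output goes to Pro's `memory` workspace. The function names, the `N` field, and the `cell_size` parameter are all hypothetical.

```python
from typing import List, Tuple


def cell_center(cx: int, cy: int, cell_size: float) -> Tuple[float, float]:
    """Return the center coordinate of grid cell (cx, cy)."""
    return ((cx + 0.5) * cell_size, (cy + 0.5) * cell_size)


def load_selected_points(layer_name: str) -> List[Tuple[float, float]]:
    """Read the layer's features as (x, y) tuples.

    Assumes this runs inside ArcGIS Pro. A cursor opened on a map layer
    honors that layer's current selection.
    """
    import arcpy  # only available in the ArcGIS Pro Python environment

    with arcpy.da.SearchCursor(layer_name, ["SHAPE@X", "SHAPE@Y"]) as cursor:
        return [(x, y) for x, y in cursor]


def count_per_cell(rows, cell_size: float):
    """Bin points into square cells with Spark SQL; return (cx, cy, n) rows."""
    from pyspark.sql import SparkSession  # assumed importable in Pro's env

    # Assumption: a local session; the notebook may start Spark differently.
    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame(rows, ["x", "y"])
    df.createOrReplaceTempView("points")
    result = spark.sql(f"""
        SELECT cx, cy, COUNT(1) AS n
        FROM (
            SELECT CAST(FLOOR(x / {cell_size}) AS BIGINT) AS cx,
                   CAST(FLOOR(y / {cell_size}) AS BIGINT) AS cy
            FROM points
        ) t
        GROUP BY cx, cy
    """)
    return [(r["cx"], r["cy"], r["n"]) for r in result.collect()]


def write_cells(layer_name: str, cells, cell_size: float, out_name: str = "cells"):
    """Write one point per cell to an in-memory (ephemeral) feature class."""
    import arcpy

    sr = arcpy.Describe(layer_name).spatialReference
    fc = arcpy.management.CreateFeatureclass(
        "memory", out_name, "POINT", spatial_reference=sr
    )[0]
    arcpy.management.AddField(fc, "N", "LONG")  # hypothetical count field
    with arcpy.da.InsertCursor(fc, ["SHAPE@XY", "N"]) as cursor:
        for cx, cy, n in cells:
            cursor.insertRow((cell_center(cx, cy, cell_size), n))
    return fc
```

In a notebook cell, the pieces would be chained roughly as `write_cells(layer, count_per_cell(load_selected_points(layer), 100.0), 100.0)`, where `layer` is the name of a point layer in the active map. Collecting the result back to the driver is fine here because the aggregated output is small; for bigger outputs you would want a different write path.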



As usual, all the source code can be found here.