
Thunderhead Explorer
Tips and Tricks using GIS BigData, ArcGIS APIs and other fun stuff :-)
Monday, August 24, 2020
On Machine Learning in ArcGIS and Data Preparation using Spark

Monday, August 3, 2020
ArcGIS Pro, Jupyter Notebook and Databricks
Sunday, August 2, 2020
Virtual Gate Crossing
Note that the join is to a "small" spatial dataset that we can:
- Broadcast to all the spark workers.
- Brute-force traverse it on each worker, as it is cheaper and faster to do so than to spatially index it, as sketched below.
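Here is a minimal PySpark sketch of that pattern. It broadcasts a tiny list of virtual gates (named 2D line segments) to every worker and brute-force tests each track segment against all of them inside a UDF. The `gates` and `tracks` names, the schema, and the toy geometry are illustrative assumptions, not the post's actual code.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("VirtualGateCrossing").getOrCreate()

# Small "gate" dataset: a handful of virtual gates as named 2D line segments.
gates = [
    ("gate_1", 0.0, 0.0, 10.0, 0.0),
    ("gate_2", 5.0, -5.0, 5.0, 5.0),
]
bc_gates = spark.sparkContext.broadcast(gates)  # ship the small dataset to every worker

def _ccw(ax, ay, bx, by, cx, cy):
    return (cy - ay) * (bx - ax) > (by - ay) * (cx - ax)

def _intersects(p1, p2, p3, p4):
    # Standard orientation test for 2D segment intersection.
    return _ccw(*p1, *p3, *p4) != _ccw(*p2, *p3, *p4) and \
           _ccw(*p1, *p2, *p3) != _ccw(*p1, *p2, *p4)

@F.udf(StringType())
def crossed_gate(x1, y1, x2, y2):
    # Brute-force scan of the broadcast gates; cheap because the list is tiny.
    for name, gx1, gy1, gx2, gy2 in bc_gates.value:
        if _intersects((x1, y1), (x2, y2), (gx1, gy1), (gx2, gy2)):
            return name
    return None

# Hypothetical per-target track segments (id, x1, y1, x2, y2).
tracks = spark.createDataFrame(
    [("t1", 2.0, -1.0, 3.0, 1.0), ("t2", 4.0, 2.0, 6.0, 3.0)],
    ["id", "x1", "y1", "x2", "y2"],
)
tracks.withColumn("gate", crossed_gate("x1", "y1", "x2", "y2")) \
      .where(F.col("gate").isNotNull()) \
      .show()
```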
Saturday, August 1, 2020
MicroPath Reconstruction of AIS Broadcast Points
Micropathing is the construction of a target's path from a limited, consecutive sequence of target points. Typically, the sequence is time-based, and the collection is limited to 2 or 3 target points. The following is an illustration of 2 micropaths derived from 3 target points:
Micropathing differs from path reconstruction in that the latter produces one polyline for the entire path of a target. Path reconstruction loses insightful in-path behavior, as a large number of attributes cannot be associated with the path parts. Some may argue that the points along the path can be enriched with these attributes. However, with the current implementations of Point objects, we are limited to only an extra M and Z on top of the necessary X and Y. You can also join the PathID and M to a lookup table to gain back that insight, but that join is typically expensive, and it is difficult to derive from it the "expression" of the path using traditional mapping. A micropath overcomes these limitations and expresses the path insight better with today's traditional means.
So, a micropath is a line typically composed of only 2 points and is associated with a set of attributes that describe that line. These attributes are typically enrichment metrics derived from its two end points. An attribute can be, for example, the traveled distance, time, or speed.
In this notebook, we will construct "clean" micropaths using SparkSQL. What do I mean by clean? As we all know, emitted target points are notoriously affected by noise, so using SparkSQL, we will eliminate that noise during the micropath construction.
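To make the idea concrete, here is a minimal SparkSQL sketch of the construction, assuming a hypothetical `points` view with an `mmsi` identifier, an epoch-second `ts`, and planar `x`/`y` in meters; the 50 m/s speed cutoff used to drop noisy jumps is an illustrative threshold, not the notebook's actual value.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MicroPath").getOrCreate()

# Hypothetical AIS target points: (mmsi, epoch seconds, x meters, y meters).
points = spark.createDataFrame(
    [
        ("367001234", 1000, 0.0, 0.0),
        ("367001234", 1060, 300.0, 400.0),      # ~8.3 m/s, plausible
        ("367001234", 1120, 90000.0, 90000.0),  # noisy jump, filtered out
    ],
    ["mmsi", "ts", "x", "y"],
)
points.createOrReplaceTempView("points")

# Pair each point with its predecessor (LAG) to form 2-point micropaths,
# enrich them with travel time, distance and speed, and drop implausible speeds.
micropaths = spark.sql("""
  WITH seq AS (
    SELECT mmsi, ts, x, y,
           LAG(ts) OVER (PARTITION BY mmsi ORDER BY ts) AS ts0,
           LAG(x)  OVER (PARTITION BY mmsi ORDER BY ts) AS x0,
           LAG(y)  OVER (PARTITION BY mmsi ORDER BY ts) AS y0
    FROM points
  )
  SELECT mmsi, x0, y0, x, y,
         ts - ts0 AS travel_sec,
         SQRT(POW(x - x0, 2) + POW(y - y0, 2)) AS travel_m,
         SQRT(POW(x - x0, 2) + POW(y - y0, 2)) / (ts - ts0) AS speed_mps
  FROM seq
  WHERE ts0 IS NOT NULL
    AND ts > ts0
    AND SQRT(POW(x - x0, 2) + POW(y - y0, 2)) / (ts - ts0) < 50.0
""")
micropaths.show()
```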
Here is a result:
More to come...
Friday, July 31, 2020
On ArcGIS Pro, Jupyter Notebook and Apache Spark
Sunday, May 27, 2018
On Patterns Of Life: From MacroData to PicoData
Insight (or GeoInsight in our case) is lost in the deluge of data that we are acquiring today from everyday sensors, whether machine- or human-generated. This project is a set of heuristic, Spark-based implementations to reveal signals from the movement of ships in and out of the Port of Miami.
The idea is to extract small, clean data (PicoData) from the overlap of a massive amount of data (MacroData). The aggregation of "clean" PicoData derived from MacroData thrusts patterns of life into the forefront.
For example, given the following display of AIS broadcasts:

We can mutate the data to reveal the "clean" influx of ships into the harbor at high tide:

As usual, you can download all the source code from here.
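To give a rough feel for the MacroData-to-PicoData pattern without the project code, here is a hedged PySpark illustration: filter a large AIS DataFrame down to a small window around the harbor entrance and roll it up into a compact hourly summary of distinct vessels. The input path, schema, bounding box, and the hourly rollup are assumptions, not the project's actual pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("PicoData").getOrCreate()

# MacroData: a hypothetical parquet store of raw AIS broadcasts (mmsi, ts, lon, lat, ...).
ais = spark.read.parquet("/data/ais_broadcasts")

# PicoData: broadcasts near the harbor entrance, rolled up by hour of day.
pico = (
    ais.where(
        F.col("lon").between(-80.14, -80.10) &
        F.col("lat").between(25.75, 25.78)
    )
    .withColumn("hour", F.hour("ts"))
    .groupBy("hour")
    .agg(F.countDistinct("mmsi").alias("vessels"))
    .orderBy("hour")
)
pico.show(24)
```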
Monday, January 1, 2018
On ML and Elastic Principal Graphs
Happy 2018, all. It has been a while since my last post. Thank you for your patience, dear reader. As usual, the perpetual resolutions for every year, in addition to blogging more, are to eat well, exercise often, and climb Ventoux.
Onward.
I genuinely believe that 2018 will be the year of the ubiquity of Geo-AI. It will be the year when Machine Learning and Spatial Awareness will blossom inside and mostly outside the GIS community.
We at Esri have had Machine Learning based tools in our "shed" for a long time. Every time an ArcGIS user performs a geographically weighted regression, trains a random trees classifier, or detects an emerging hot spot, that user is using a form of Machine Learning without knowing it!
So one of my "missions" for 2018 is to make this knowledge more explicit to our users and to non-traditional GIS users, and also to start implementing new forms of Machine Learning.
Machine Learning (ML), a branch of Artificial Intelligence (AI), is a disruptive force that is changing how today's industries gain new insight from their data. ML uses math, statistics, and probability to find hidden patterns and make predictions from the data without being explicitly programmed. It is this last statement that is disruptive: "No explicit programming"! An ML algorithm iterates "intelligently" over the data, and the patterns emerge. Being iterative, the more data an ML algorithm is exposed to, the more refined the output becomes. Thus the coupling of BigData and ML is a perfect marriage, fueled by cheap storage, ever more powerful compute (think GPUs), and faster networking.
The reemergence of this "No Explicit Programming" paradigm, in forms such as Deep Learning, Reinforcement Learning, and Self-Organization, is propelling the likes of Google's AlphaGo Zero, Facebook, and Uber.
So, I am starting this launch with something I have been fascinated by for quite some time: "Elastic Principal Graphs."
It is a "deep" extension of PCA that I came across during my research into mapping noisy 2D data to a curve, and I was fascinated by its self-organization.
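To give a flavor of that self-organization, here is a toy NumPy sketch of fitting a chain-topology elastic principal curve to noisy 2D points. It keeps only the data term and the edge-stretching term of the elastic energy and alternates between assigning points to their nearest node and solving a small linear system for the node positions; it is my illustration under those simplifications, not the paper's full algorithm nor the Scala repo's code.

```python
import numpy as np

def elastic_principal_curve(X, n_nodes=10, lam=0.05, n_iter=50):
    # Initialize the nodes along the first principal component of X.
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    t = Xc @ vt[0]
    nodes = mean + np.linspace(t.min(), t.max(), n_nodes)[:, None] * vt[0]

    # Graph Laplacian of the chain 0-1-...-(K-1), used for the stretching term.
    K = n_nodes
    L = np.zeros((K, K))
    for i in range(K - 1):
        L[i, i] += 1; L[i + 1, i + 1] += 1
        L[i, i + 1] -= 1; L[i + 1, i] -= 1

    N = len(X)
    for _ in range(n_iter):
        # Assignment step: map every point to its nearest node.
        d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        counts = np.bincount(assign, minlength=K)
        sums = np.zeros((K, X.shape[1]))
        np.add.at(sums, assign, X)
        # Update step: minimize (1/N)*sum||x - v_k||^2 + lam*sum||v_i - v_j||^2,
        # whose optimality condition is (diag(counts)/N + lam*L) V = sums/N.
        A = np.diag(counts / N) + lam * L
        nodes = np.linalg.solve(A, sums / N)
    return nodes

# Usage: recover a curve from noisy points scattered around a sine wave.
rng = np.random.default_rng(42)
xs = np.linspace(0, 2 * np.pi, 500)
pts = np.column_stack([xs, np.sin(xs)]) + rng.normal(scale=0.15, size=(500, 2))
print(elastic_principal_curve(pts, n_nodes=12))
```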

After reading (and rereading for the nth time) this paper, I put together a minimalist implementation in Scala in this GitHub repo.
Happy New Year All.