Sunday, May 27, 2018

On Patterns Of Life: From MacroData to PicoData

Insight (or GeoInsight in our case) is lost in the deluge of data that we are acquiring today from everyday sensors that are machine or humanly generated. This project is a set of heuristic Spark based implementations to reveal signals from the movement of ships in and out of the Port of Miami.

The idea is to extract small clean data (PicoData) from the overlap of a massive amount of data (MacroData). The aggregation of "clean" PicoData derived from MacroData trust into the forefront patterns of life.

For example, given the following display of AIS broadcasts:


We can mutate the data to reveal the "clean" influx of ships into the harbor at high tide:


Like usual, you can download all the source code from here.

Monday, January 1, 2018

On ML and Elastic Principle Graphs

Happy 2018 all. It has been a while since my last post. Thank you for your patience dear reader. Like usual, the perpetual resolutions for every year in addition to blogging more are to eat well, often exercise and climb Ventoux.


I genuinely believe that 2018 will be the year of the ubiquity of Geo-AI. It will be the year when Machine Learning and Spatial Awareness will blossom inside and mostly outside the GIS community.

We at Esri have had Machine Learning based tools in our "shed" for a long time. Every time an ArcGIS user performs a graphically weighted regression, trains a random trees classifier or detects an emerging hot spot, that user is using a form of Machine Learning without knowing it!

So one of my "missions" for 2018, it to make this knowledge more explicit to our users and non-traditional GIS users. Also, start to implement new forms of Machine Learning.

Machine Learning (ML), a branch of Artificial Intelligence (AI), is a disruptive force that is changing how today's industries are gaining new insight from their data. ML uses math, statistics and probability to find hidden patterns and make predictions from the data without being explicitly programmed. It is this last statement that is disruptive, "No explicit programming"! An ML algorithm iterates "intelligently" over the data, and the patterns emerge. Being iterative, the more data an ML algorithm is exposed to, the more refined the output becomes. Thus the coupling of BigData and ML is a perfect marriage fueled by cheap storage, ever more powerful computational power (think GPU) and faster networking.

This reemergence of this "No Explicit Programming" paradigm such as Deep Learning, Reinforcement Learning, and Self Organization is skyrocketing the likes of Google's AlphaGo-Zero, Facebook, and Uber.

So, I am starting this launch with something I have been fascinated by for quite some time, and that is "Elastic Principle Graphs."

It is a "deep" extension of PCA that I came across it during my research of mapping noisy 2D data to a curve and was fascinated by its self-organization.


After reading, (and rereading for the nth time) this paper, this GitHub repo is a minimalist implementation in Scala.

Happy New Year All.

Sunday, April 16, 2017

On Machine Learning and General Path Recognition

This is part II of my journey back into SOMs. Actually this journey started with the below picture:


It is a map of AIS broadcast points around the harbor of Miami, FL. Now we, sentient creatures, when we look at this map we can quickly and clearly see patterns formed by the points. There is a sequence of points that start from the harbor and go northeast.  There is a sequence of points that starts at the harbor and go south southeast. And there are a plenty of north south paths, some are very close to the shore, others are on the "edge" to the east. And there are path in the "middle".

Wouldn't it be wonderful if the Machine can see these pattern and formulate the general paths? That is actually what started this journey. I needed an unsupervised way for the Machine to recognize the patterns and emit the paths. I'm sure there are multiple ways to solve this, but I remembered that a while back I used Self Organizing Maps due to their simplicity and crucially for belonging to a class of unsupervised machine learning algorithms. So, in Part I, I reacquainted myself with SOMs and in this part, I completed the journey by showing the paths.

However faithful reader, I have not been totally honest with you. Please forgive me. This is part III in this journey. Along the way, I diverted a bit, as I needed a way to assemble tracks from targets. There exist a hidden gem in this project. The PathFinder application is an important stop on this journey, as in addition to assembling tracks from targets, it quantizes the path into grid cells. The quantization of paths is the linchpin between the raw targets points and the unsupervised path detection.

The below picture will help in my explanation:


If a virtual grid is overlaid on the map (the grid cells in the above map are coarse on purpose for illustration purposes), then a linear vector that represents a path can be composes by scanning the grid cells from left to right and then from top to bottom.  The existence of targets in a cell is the binary value of the element in the vector.  So in the above case, the path will be represented by the vector [0,0,0,0,1,0,0,0,1,1,1,1,0,0,0,1]. Side note: In this implementation the vector is composed of binary values, however, a vector with real numbers can be composed where the element value is proportional to the number of targets. In addition, the cell values can "bleed" to neighboring cells in say a gaussian way for better path recognition. Will have to come back to this one day. A Master's thesis can be made out this.

Now that we can compose a set of vectors from a set of targets, we can train the SOM with these vectors. The below figure is the visual representation of a 3x3 SOM result. Each sub map is a visual representation of the settled weights of a SOM node where each node weight is a linear representation of a quantized grid as described above.


We can see the path patterns that have self organized, and they do reflect what we have implicitly seen in the first map as humans. I highlighted in red the cells in each map with the most target associations thus forming a path. Isn't it amazing ?

Like usual, you can download all the source code for this from here.

Sunday, April 2, 2017

On Machine Learning with Self Organizing Maps

Self Organizing Map (SOM) is a form of Artificial Neural Network (ANN) belonging to a class of Machine Learning. AI Junkie has a GREAT tutorial about it. What I like about SOMs is that they belong to a class of unsupervised learning models and they hold true to the first law of geography.

"Everything is related to everything else, but near things are more related than distant things." - Tobler

I encountered them and used them over 20 years ago, and since AI/ML is the hottest topic these days, I'm reacquainting myself with them. There are plenty of SOM libraries, but I learn (or in this case re-learn) by doing.  This project is my learning journey in implementing SOMs and "Sparkyfing" them.

The following is a sample output of the obligatory RGB classifier, where a million random RGB triples are organized by a Spark based SOM into a 10x10 square lattice:


And the following is a sample solution to a TSP using SOM:


Like usual, all the source code can be found here.

Monday, March 27, 2017

ArcGIS, Spark and Alluxio Integration

There exist a plethora of backend distributed data stores. I am always using S3 or Hadoop HDFS or OpenStack Swift with my GIS applications to read from these backends geospatial data or to save into these backends my data. Some of these distributed data stores are not natively supported by the ArcGIS platform. However, the platform can be extended with ArcPy to handle these situations. Depending on the data store, I will have to use a different API (mostly Python based) to read and write geospatial information. This is where Alluxio comes in very handy. It provides an abstract layer between the application and the data store and (here is the best part), it caches this information in memory in a distributed and resilient-to-failure manner. So, at the application level, the code to access the data is invariant. On the backend, I can configure Alluxio to use either S3, HDFS or SWIFT. Finally, the advent of a REST endpoint in Alluxio eases the integration with ArcGIS to write, read and visualize Geospatial data.


Like usual, all the source code for this integration can be found here.

Saturday, March 18, 2017

ArcGIS, Spark & MemSQL Integration

Just got back from the fantastic Strata + Hadoop 2017 conference where the topics ranged from BigData, Spark to lots of AI/ML and not so much on Hadoop explicitly, at least not in the sessions that I attended. I think that is why the conference is renamed Strata + Data from now on as there is more to Hadoop in BigData.

While strolling the exhibition hall, I walked into the booth of our friends at MemSQL and got a BIG hug from Gary. We reminisced about our co-presentations at various conferences regarding the integration of ArcGIS and MemSQL as they natively support geospatial types.

This post is a refresher on the integration with a "modern" twist, where we are using the Spark Connector to ETL geo spatial data into MemSQL in a Docker container. To view the bulk loaded data, ArcGIS Pro is extended with an ArcPy toolbox to query MemSQL, aggregate and view the result set of features on a map.


Like usual, all the source code can be found here

Monday, March 6, 2017

GeoBinning On IBM Bluemix Spark

This is a proof of concept project to enable ArcGIS Pro to invoke a Spark based geo analytics on IBM Bluemix and view the result of the analysis as features in a map.


Check out the source code here