Sunday, April 16, 2017

On Machine Learning and General Path Recognition

This is part II of my journey back into SOMs. Actually, this journey started with the picture below:

Map

It is a map of AIS broadcast points around the harbor of Miami, FL. Now we, sentient creatures, can look at this map and quickly and clearly see the patterns formed by the points. There is a sequence of points that starts at the harbor and goes northeast. There is a sequence of points that starts at the harbor and goes south-southeast. And there are plenty of north-south paths: some are very close to the shore, others are on the "edge" to the east, and there are paths in the "middle".

Wouldn't it be wonderful if the Machine could see these patterns and formulate the general paths? That is actually what started this journey. I needed an unsupervised way for the Machine to recognize the patterns and emit the paths. I'm sure there are multiple ways to solve this, but I remembered that a while back I used Self Organizing Maps, both for their simplicity and, crucially, because they belong to the class of unsupervised machine learning algorithms. So, in Part I, I reacquainted myself with SOMs, and in this part, I complete the journey by showing the paths.

However, faithful reader, I have not been totally honest with you. Please forgive me. This is really part III of this journey. Along the way, I diverted a bit, as I needed a way to assemble tracks from targets. There exists a hidden gem in this project. The PathFinder application is an important stop on this journey because, in addition to assembling tracks from targets, it quantizes each path into grid cells. The quantization of paths is the linchpin between the raw target points and the unsupervised path detection.

The picture below will help in my explanation:

Track

If a virtual grid is overlaid on the map (the grid cells in the above map are coarse on purpose, for illustration), then a linear vector that represents a path can be composed by scanning the grid cells from left to right and then from top to bottom. The existence of targets in a cell is the binary value of the corresponding element in the vector. So in the above case, the path is represented by the vector [0,0,0,0,1,0,0,0,1,1,1,1,0,0,0,1]. Side note: in this implementation the vector is composed of binary values; however, a vector of real numbers can be composed instead, where each element value is proportional to the number of targets in the cell. In addition, the cell values can "bleed" into neighboring cells, in say a Gaussian way, for better path recognition. I will have to come back to this one day. A Master's thesis can be made out of this.
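The quantization step described above can be sketched in a few lines of Python. This is only an illustration, not the PathFinder code: the function name `quantize_path`, the grid extent, and the sample track coordinates are all made up for this example, and the track was chosen so that it reproduces the vector shown above on a 4x4 grid.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def quantize_path(
    points: Iterable[Point],
    xmin: float,
    ymin: float,
    xmax: float,
    ymax: float,
    cols: int,
    rows: int,
) -> List[int]:
    """Return a row-major binary vector: 1 where a cell holds at least one target."""
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    vector = [0] * (cols * rows)
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # ignore targets outside the grid extent
        # Clamp so that a point on the right edge falls in the last column.
        col = min(int((x - xmin) / cell_w), cols - 1)
        # Row 0 is the top row, so measure down from ymax.
        row = min(int((ymax - y) / cell_h), rows - 1)
        # Left to right, then top to bottom.
        vector[row * cols + col] = 1
    return vector


# Hypothetical track on a 4x4 grid covering the extent (0, 0) to (4, 4).
track = [(0.5, 2.5), (0.4, 1.6), (1.5, 1.5), (2.5, 1.4), (3.5, 1.5), (3.6, 0.5)]
path_vector = quantize_path(track, 0.0, 0.0, 4.0, 4.0, cols=4, rows=4)
print(path_vector)
```

Replacing the assignment `vector[...] = 1` with an increment would give the count-based variant mentioned in the side note.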

Now that we can compose a set of vectors from a set of targets, we can train the SOM with these vectors. The below figure is the visual representation of a 3x3 SOM result. Each sub map is a visual representation of the settled weights of a SOM node where each node weight is a linear representation of a quantized grid as described above.
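For readers who want to see the training loop spelled out, here is a minimal single-machine sketch of a SOM fed with such vectors. It is not the Spark based implementation from this project; the function names, the exponential decay schedule, the default parameters, and the three sample path vectors are assumptions picked for illustration.

```python
import math
import random
from typing import List, Sequence


def best_matching_unit(weights: Sequence[Sequence[float]], sample: Sequence[float]) -> int:
    """Index of the node whose weights are closest (squared Euclidean) to sample."""
    def dist2(w: Sequence[float]) -> float:
        return sum((wk - sk) ** 2 for wk, sk in zip(w, sample))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))


def train_som(
    samples: Sequence[Sequence[float]],
    rows: int = 3,
    cols: int = 3,
    epochs: int = 200,
    alpha0: float = 0.5,
    seed: int = 42,
) -> List[List[float]]:
    """Train a rows x cols SOM and return the node weights in row-major order."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Each node starts with random weights, one weight per grid cell of the path vector.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Learning rate and neighborhood radius both shrink as training progresses.
        decay = math.exp(-epoch / epochs)
        alpha = alpha0 * decay
        sigma = sigma0 * decay
        for sample in rng.sample(list(samples), len(samples)):
            bmu_row, bmu_col = divmod(best_matching_unit(weights, sample), cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                d2 = (r - bmu_row) ** 2 + (c - bmu_col) ** 2
                # Gaussian neighborhood: nodes near the winner move the most.
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += alpha * h * (sample[k] - w[k])
    return weights


# Three hypothetical quantized paths on a 4x4 grid.
paths = [
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
]
som = train_som(paths)
for path in paths:
    node = best_matching_unit(som, path)
    print("path maps to SOM node (row, col):", divmod(node, 3))
```

Each entry of `som` is one node's settled weight vector, which can be reshaped back into a 4x4 grid and drawn as one of the sub maps.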

Fig2

We can see the path patterns that have self-organized, and they do reflect what we, as humans, implicitly saw in the first map. In each map, I highlighted in red the cells with the most target associations, thus forming a path. Isn't it amazing?

Like usual, you can download all the source code for this from here.

Sunday, April 2, 2017

On Machine Learning with Self Organizing Maps

A Self Organizing Map (SOM) is a form of Artificial Neural Network (ANN), which in turn belongs to the broader class of Machine Learning models. AI Junkie has a GREAT tutorial about it. What I like about SOMs is that they belong to the class of unsupervised learning models and they hold true to the first law of geography.

"Everything is related to everything else, but near things are more related than distant things." - Tobler

I encountered them and used them over 20 years ago, and since AI/ML is the hottest topic these days, I'm reacquainting myself with them. There are plenty of SOM libraries, but I learn (or in this case re-learn) by doing.  This project is my learning journey in implementing SOMs and "Sparkyfing" them.

The following is a sample output of the obligatory RGB classifier, where a million random RGB triples are organized by a Spark based SOM into a 10x10 square lattice:

Som2

And the following is a sample solution to a TSP using SOM:

Tsp

Like usual, all the source code can be found here.

Monday, March 27, 2017

ArcGIS, Spark and Alluxio Integration

There exists a plethora of backend distributed data stores. With my GIS applications, I am always using S3, Hadoop HDFS, or OpenStack Swift to read geospatial data from these backends or to save my data into them. Some of these distributed data stores are not natively supported by the ArcGIS platform. However, the platform can be extended with ArcPy to handle these situations. Depending on the data store, I have to use a different API (mostly Python based) to read and write geospatial information. This is where Alluxio comes in very handy. It provides an abstraction layer between the application and the data store and (here is the best part) it caches this information in memory in a distributed and resilient-to-failure manner. So, at the application level, the code to access the data is invariant. On the backend, I can configure Alluxio to use S3, HDFS, or Swift. Finally, the advent of a REST endpoint in Alluxio eases the integration with ArcGIS to write, read, and visualize geospatial data.

img-alternative-text
img-alternative-text

Like usual, all the source code for this integration can be found here.

Saturday, March 18, 2017

ArcGIS, Spark & MemSQL Integration

Just got back from the fantastic Strata + Hadoop 2017 conference, where the topics ranged from BigData and Spark to lots of AI/ML, and not so much on Hadoop explicitly, at least not in the sessions that I attended. I think that is why the conference is renamed Strata + Data from now on, as there is more to BigData than Hadoop.

While strolling the exhibition hall, I walked into the booth of our friends at MemSQL and got a BIG hug from Gary. We reminisced about our co-presentations at various conferences regarding the integration of ArcGIS and MemSQL as they natively support geospatial types.

This post is a refresher on the integration with a "modern" twist, where we are using the Spark Connector to ETL geospatial data into MemSQL running in a Docker container. To view the bulk loaded data, ArcGIS Pro is extended with an ArcPy toolbox to query MemSQL, aggregate, and view the result set of features on a map.

img-alternative-text
img-alternative-text

Like usual, all the source code can be found here.

Monday, March 6, 2017

GeoBinning On IBM Bluemix Spark

This is a proof of concept project to enable ArcGIS Pro to invoke Spark based geo analytics on IBM Bluemix and view the result of the analysis as features in a map.

img-alternative-text

Check out the source code here.

Wednesday, March 1, 2017

Space Time Ripples

Start by looking at this application and that one. Make sure to tilt the map by holding down the right mouse button and sliding the mouse up. Then, slide the bottom slider back and forth to see the data "ripple" through time.

img-alternative-text

This type of visualization is something I have wanted to do for a long time and is now possible with the advent of the new 4.2 ArcGIS API for JavaScript. The new API has "hooks" to enable a developer to invoke WebGL shaders directly, which can render a massive amount of data very efficiently and very quickly.

img-alternative-text

The authoring of the data for the above applications is based on ArcGIS Pro extended with a custom ArcPy based toolbox. The tool queries features from a user selected feature class, bins the features by space and time and emits a space-time "cube" in the form of a Dojo AMD module to be loaded by a JavaScript application. The source of the feature class can be a geodatabase, a relational data store, or the new SpatialTemporal BigData store.
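The binning step at the heart of that tool can be illustrated with a short Python sketch. This is not the ArcPy toolbox itself and it does not emit a Dojo AMD module; the function name `bin_space_time`, the cell size, the time step, and the sample events are assumptions made up for this example.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, float, float]  # x, y, time in seconds
CellKey = Tuple[int, int, int]      # column, row, time step


def bin_space_time(
    events: Iterable[Event], cell_size: float, time_step: float
) -> Dict[CellKey, int]:
    """Count events per (column, row, time step) cell of a space-time cube."""
    counts: Counter = Counter()
    for x, y, t in events:
        # floor() keeps negative coordinates in the correct cell.
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(t / time_step),
        )
        counts[key] += 1
    return dict(counts)


# Hypothetical events: 1-unit square cells, one-hour time steps.
events = [
    (10.2, 5.1, 0.0),
    (10.9, 5.8, 30.0),
    (10.5, 5.5, 3700.0),
    (-0.5, 0.5, 10.0),
]
cube = bin_space_time(events, cell_size=1.0, time_step=3600.0)
for key, count in sorted(cube.items()):
    print(key, count)
```

Only non-empty cells are stored, which keeps the cube sparse; the per-cell counts are what a client would then animate through time.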

Yes, I should have written a web service to do that, but this is my blog post and I am leaving that as an exercise for the reader :-)

I have to admit that I was a bit selfish in building the JavaScript application by "mixing" two languages: JavaScript and TypeScript. I wanted to try out the TypeScript extension to our JavaScript API, and I long for a type-safe language when building front end applications, like in my olde Flex/AS3 days. It turns out that TypeScript is very cool, especially when used within IntelliJ :-)

Like usual, all the source code can be found here, and I will be talking about it more next week at my presentation at DevSummit. See some of you in Palm Springs.

Wednesday, May 25, 2016

Snapping Points To Lines And ArcGIS Pro

Been wanting to post on this subject for quite some time (actually over a year), as associating a world coordinate with the proper nearby linear feature provides tremendous insight based on the fusion of their attributes. Moreover, doing that quickly and on a massive scale is even more imperative in today's BigData world, thus the usage of Apache Spark. I’ve posted a standalone implementation that relies on well-documented simple math and a published methodology to perform searches on massive datasets in batch mode. What excited me most in writing this post was viewing the snap results in ArcGIS Pro. My lack of knowledge in extending ArcGIS Pro with downloadable Python modules contributed to the delay (and a slight case of procrastination :-). However, with the help of a colleague, I was able to pip install modules that can be imported by my custom ArcPy based toolboxes without any errors.
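To give a feel for the simple math involved, here is a small Python sketch of snapping a point to the nearest segment of a polyline by projection. It is not the posted Spark implementation: it assumes planar coordinates, checks every segment by brute force, and the function names and the `max_dist` parameter are made up for this example.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def snap_to_segment(p: Point, a: Point, b: Point) -> Tuple[Point, float]:
    """Return the closest point to p on segment ab and its distance from p."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Projection parameter of p onto the line, clamped to stay on the segment.
        t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / len2
        t = max(0.0, min(1.0, t))
    snapped = (ax + t * dx, ay + t * dy)
    return snapped, math.hypot(p[0] - snapped[0], p[1] - snapped[1])


def snap_to_polyline(
    p: Point, vertices: Sequence[Point], max_dist: float
) -> Optional[Tuple[Point, float]]:
    """Snap p to the nearest segment within max_dist, or return None."""
    best: Optional[Tuple[Point, float]] = None
    for a, b in zip(vertices, vertices[1:]):
        candidate = snap_to_segment(p, a, b)
        if candidate[1] <= max_dist and (best is None or candidate[1] < best[1]):
            best = candidate
    return best


# Hypothetical L-shaped road and two points near it.
road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(snap_to_polyline((4.0, 3.0), road, max_dist=5.0))
print(snap_to_polyline((12.0, 5.0), road, max_dist=5.0))
print(snap_to_polyline((12.0, 5.0), road, max_dist=1.0))
```

On massive datasets, a brute force scan like this would be paired with some form of spatial partitioning so that each point is only compared against nearby lines.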

img-alternative-text

Also, since this is all based on BigData, well, it has to be tested in a BigData environment. The post describes the usage of Docker and the Cloudera QuickStart container to check the snapping and the visualization. The following illustrates my development environment.

img-alternative-text

Like usual, all the source code can be found here.