Friday, September 6, 2013

BigData GeoEnrichment

What is GeoEnrichment? An example describes it best. Given a big set of customer location records, I would like each location to be GeoEnriched with the average income of the zip code that the location falls into, and with the number of people between the ages of 25 and 30 who live in that zip code.

Before GeoEnrichment:
CustId,Lat,Lon

After GeoEnrichment:
CustId,Lat,Lon,AverageIncome,Age25To30

Of course, the key to this whole thing is the spatial reference data :-) There are also a lot of search options, such as Point-In-Polygon, Nearest Neighbor, and enrichment based on a Drive Time Polygon around each location.

I've implemented two search methods:
  • Point-In-Polygon method
  • Nearest Neighbor Weighted method
The Point-In-Polygon (PiP) method is fairly simple. Given a point, find the polygon it falls into, pull the selected attributes from that polygon feature, and add them to the original point.
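
To make the PiP idea concrete, here is a minimal sketch in plain Python. It is not the implementation from this post (that one uses HBase and the Geometry API for Java). It assumes each polygon is a single ring of (lon, lat) vertices with no holes, does a brute-force scan instead of using a spatial index, and uses a standard ray-casting test. The names point_in_polygon and enrich_pip are made up for illustration.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring.

    ring is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first. Points exactly on an edge may go either way.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def enrich_pip(points, polygons, fields):
    """Append the selected polygon attributes to each (cust_id, lat, lon).

    polygons is a list of (ring, attributes) pairs (hypothetical layout).
    Points that fall in no polygon get None for every field.
    """
    enriched = []
    for cust_id, lat, lon in points:
        values = [None] * len(fields)
        for ring, attrs in polygons:
            if point_in_polygon(lon, lat, ring):
                values = [attrs[f] for f in fields]
                break
        enriched.append((cust_id, lat, lon, *values))
    return enriched
```

With real data, the inner loop over all polygons is the part you would replace with a spatial index or a partitioned lookup.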

The Nearest Neighbor Weighted (NNW) method finds all the reference points within a specified distance and weights each point based on its distance. The GeoEnrichment value is the sum of the weighted attribute values.
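
A rough sketch of the NNW calculation in plain Python follows. The weighting function is an assumption on my part for illustration: a linear falloff from 1 at the location to 0 at the search distance. Other choices, such as inverse distance, fit the same description. The distance is a haversine great-circle distance in meters, and the names haversine_m and enrich_nnw are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def enrich_nnw(lat, lon, refs, radius_m):
    """Sum of distance-weighted attribute values within radius_m.

    refs is a list of (lat, lon, value) reference points.
    Assumed weight: 1 - d / radius_m (linear falloff), so a reference
    point at the location counts fully and one at the edge counts zero.
    """
    total = 0.0
    for rlat, rlon, value in refs:
        d = haversine_m(lat, lon, rlat, rlon)
        if d <= radius_m:
            total += (1.0 - d / radius_m) * value
    return total
```

As with the PiP sketch, this scans every reference point; a real run over big data would first narrow the candidates with a spatial index or grid.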

You can find more details about this here, where I've used HBase and the Geometry API for Java to perform the GeoEnrichment.

2 comments:

Priya said...

Thanks for the great post. I would like to extend this idea to a web-based application. I was able to run this as a standalone application. I'm not dealing with a lot of data: approximately 20 MB of polygon shapefile and 2 MB of point data (CSV). The polygon data is subject to change frequently. I understand that I need to post this shapefile to a shared file system on the server where the web application resides, and that ShapeFileDataStore can access it through a URL. Please advise if this approach would work. Thanks. Priya

thunderhead said...

You might be interested in http://thunderheadxpler.blogspot.com/2014/01/hadoop-and-shapefiles.html - but be careful of the non-splittability of the ShapeInputFormat.