Monday, October 8, 2012

Streaming Big Data For Heatmap Visualization using Storm

A record number of tweets was set during the 2012 Presidential debate. If you wondered how technologically this happened, then Event Stream Processing is your answer.
Actually, Twitter open sourced such an implementation called Storm. Pretty impressive piece of technology! So, I wanted to try it out with a "geo" twist.
To get started, I recommend that you read "Getting Started with Storm".  Here is the challenge at hand, simulate a stream of aircraft flight track targets, in such that a heatmap is generated based on the density of targets at "near" locations.
The topology is very simple in this challenge, a spout reading the targets from an external source (to be defined later) is linked to a bolt that overlays a virtual grid on top of an area of interest.  Each emitted target tuple is mapped to a cell in the grid and the cell weight is decreased or increased when the target leaves or enters a cell.
Storm is a distributed system, that means that spouts and bolts can run on different physical nodes. The grid in the bolt is implemented as a map and since the bolt can to be distributed, then the map implementation has to be distributed too. Enter Hazelcast, it is a "clustering and highly scalable data distribution platform for Java". In addition to maps, a distributed queue implementation is also available.  The spouts (note the plural) use the latter to poll the offered targets and emit them to bolts (plural again :-) to update the density map grid. Here is the source code of the spout and the bolt.
The continuously running topology in this challenge is fronted with a web application container that "feeds" the spout and periodically accumulates the set of non-zero weighted cells from the grid for web client visualization.
The Spring web container is used in this implementation due to the Flex Integration and many many other reasons that I will not go into this post.
The Spring application context, defines:

  •  A bean component scanner
  •  A task scheduler
  •  A Hazelcast instance
  •  A flex message broker
  •  A flex message destination
  •  A Spring message template

Flight tracks are simulated by time stepping the parameters of cubic hermite spline that is bound by starting and ending locations and vectors. A scheduled bean iterates over the set of track parameters and offers to the Hazelcast queue the current position along that track at the current timestamp in the form of a Target instance. When the target reaches the end of the track, the latter is removed from the simulation set.  In addition, that scheduled bean, gets from the Hazelcast grid map the set of non-zero weighted cells and accumulates them in the form of a list that is sent back to the client using the injected message template thought the a message broker.
Onto the client. Upon startup completion, the application subscribes to the message destination and adds a message listener to handle the list of weighed cells. This list is bound to a DensityMapLayer instance that converts the cells in the list into gradient fillings on a bitmap. This enables a very fast and fluid rendering of a large set of data. The overlay of gradient fillings is what generates a heatmap. In addition, a UI Button is added to the stage enabling the user to define random starting and ending locations and vectors that are sent to the Spring container as seed to the Scheduled bean track/target simulator.
I know that this is very technical and I am assuming that you have some background in the above mentioned technologies - I just wanted to show how they can be integrated together seamlessly for BigData streaming - like usual, all the source code is available here.