Sunday, August 2, 2020

Virtual Gate Crossing

Yet another post continuing the series on Pro, Notebook, and Spark :-). In this notebook, we demonstrate a parallel, distributed, shared-nothing spatial join between a relatively large dataset and a small one.

In this case, virtual gates are defined at various locations in a port, and the output is a count of how many times ships cross these gates, based on their AIS target positions.

Note that the join is against a "small" spatial dataset, so we can:

  • Broadcast it to all the Spark workers.
  • Brute-force traverse it on each worker, as that is cheaper and faster than spatially indexing it.
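
To make the brute-force idea concrete, here is a minimal sketch in plain Python of what each worker might do with its share of the data. It is not the notebook's actual code. It assumes that gates are simple two-point line segments in planar coordinates and that each ship's AIS positions are already sorted by time; the names `crosses`, `count_crossings`, `gates`, and `tracks` are hypothetical and chosen only for illustration. In Spark, the small `gates` dictionary is the piece that would be broadcast (for example with `SparkContext.broadcast`), while the ship tracks stay partitioned across the workers.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign (-1, 0, +1) of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def crosses(s1: Segment, s2: Segment) -> bool:
    """True when the two segments properly intersect.

    Touching at an endpoint or overlapping collinear segments are
    deliberately NOT counted as a crossing in this sketch.
    """
    p1, p2 = s1
    q1, q2 = s2
    return (
        _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
        and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
    )


def count_crossings(
    gates: Dict[str, Segment], tracks: Dict[str, List[Point]]
) -> Counter:
    """Count gate crossings by testing every track step against every gate.

    No spatial index is built: with only a handful of gates, the inner
    loop over `gates` is short, which is the assumption behind the
    brute-force traversal.
    """
    counts: Counter = Counter()
    for positions in tracks.values():
        # Consecutive positions form the steps of a ship's track.
        for step in zip(positions, positions[1:]):
            for gate_id, gate in gates.items():
                if crosses(step, gate):
                    counts[gate_id] += 1
    return counts


# Made-up sample data (hypothetical gates and ship tracks).
gates = {
    "north": ((0.0, 1.0), (2.0, 1.0)),
    "east": ((3.0, 0.0), (3.0, 2.0)),
}
tracks = {
    "ship-a": [(1.0, 0.0), (1.0, 2.0), (4.0, 1.0)],
    "ship-b": [(1.0, 2.0), (1.0, 0.0)],
}
counts = count_crossings(gates, tracks)
print(dict(counts))
```

With the sample data, the "north" gate is crossed twice (once by each ship) and the "east" gate once. The cost is the number of track steps times the number of gates, which stays cheap as long as the gate dataset really is small.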

The following are sample gates:


And the following is a sample processed output:

More to come...
