Friday, August 21, 2015

A Whale and a Python GeoSearching on a Photon Wave

In the last post, we walked through how to set up Elasticsearch in a Docker container and how to bulk load the content of an ArcGIS feature class into ES, so that it can be spatially searched from an ArcPy-based tool.

Something was nagging me about my Mac development environment: I was running docker in VirtualBox and ArcGIS Desktop on Windows in VMware Fusion. I wished I had one unified virtualized environment.

Well, while at MesosCon in Seattle, I stopped by the VMware booth and the folks there told me about a new project named Photon™. It is "a minimal Linux container host. It is designed to have a small footprint and boot extremely quickly on VMware platforms. Photon™ is intended to invite collaboration around running containerized applications in a virtualized environment." That was exactly what I needed, and docker is built into it!

What also got me excited is that in a couple of weeks I will be visiting a very forward-thinking client who is willing to bootstrap a cluster on an on-premise VMware-based cloud with Linux for a BigData project. His IT department is a Windows shop, and I was going to ask him to install CentOS, yum install docker and all that jazz. As you can imagine, that was going to raise some eyebrows. However, since Photon™ is made by VMware, it will (I hope) be trusted by the customer, so we can move forward and focus on the BigData aspect of the project without being dragged down by Linux installation issues.

What follows is a retrofit of the walkthrough, but using Photon™. And the best part is... there are no changes, thanks to docker’s universality.

I’m using VMware Fusion on a Mac, so I followed these instructions. However, I set up Photon™ with 4 CPUs and 4 GB of RAM.

Once the system was up, I logged in as root, and got the IP address that is bound to eth0 using the ifconfig command.
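
If you would rather script that lookup than read the address off the screen, something like the following could work. This is only a sketch and not part of the original walkthrough: the helper name extract_ipv4 and the sample output are made up for illustration, and it assumes you have Python 3 available wherever you paste the ifconfig eth0 output.

```python
import re

# Matches both the older net-tools layout ("inet addr:192.168.1.50")
# and the newer one ("inet 192.168.1.50"). The trailing space after
# "inet" keeps "inet6" lines from matching.
_INET_RE = re.compile(r"\binet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def extract_ipv4(ifconfig_output):
    """Return the first IPv4 address found in ifconfig output, or None."""
    match = _INET_RE.search(ifconfig_output)
    return match.group(1) if match else None


# Hypothetical sample of `ifconfig eth0` output; the address is illustrative only.
SAMPLE = """eth0      Link encap:Ethernet  HWaddr 00:0c:29:aa:bb:cc
          inet addr:192.168.1.50  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feaa:bbcc/64 Scope:Link
"""

print(extract_ipv4(SAMPLE))
```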

I created a folder named config, and populated it with the following Elasticsearch configuration files:
$ mkdir config

$ cat << EOF > config/elasticsearch.yml
cluster.name: elasticsearch
index.number_of_shards: 1
index.number_of_replicas: 0
network.bind_host: dev
network.publish_host: dev
cluster.routing.allocation.disk.threshold_enabled: false
action.disable_delete_all_indices: true
EOF

$ cat << 'EOF' > config/logging.yml
es.logger.level: INFO
rootLogger: ${es.logger.level}, console
logger:
  action: DEBUG
  com.amazonaws: WARN
appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
EOF

Next, I started Elasticsearch in docker:

docker run -d -p 9200:9200 -p 9300:9300 -h dev -v /root/config:/usr/share/elasticsearch/config elasticsearch
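
For anyone who wants to drive that command from a script, here is a rough Python 3 sketch that assembles the same arguments and annotates what each flag does. The function names docker_run_args and start_elasticsearch are my own for illustration; the flags and paths are the ones from the command above.

```python
import subprocess


def docker_run_args(config_dir="/root/config", hostname="dev", image="elasticsearch"):
    """Assemble the argument list for the docker run command shown above."""
    return [
        "docker", "run",
        "-d",                 # detach: run the container in the background
        "-p", "9200:9200",    # publish the Elasticsearch HTTP port
        "-p", "9300:9300",    # publish the Elasticsearch transport port
        "-h", hostname,       # container hostname; matches bind_host/publish_host in elasticsearch.yml
        "-v", config_dir + ":/usr/share/elasticsearch/config",  # mount the config folder into the container
        image,
    ]


def start_elasticsearch():
    """Start the container and return the container ID that docker prints."""
    output = subprocess.check_output(docker_run_args())
    return output.decode("utf-8").strip()
```

The -h dev flag is what makes the network.bind_host and network.publish_host values of dev resolvable inside the container, since docker registers the container hostname in its own hosts file.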

I validated that ES was up and running by opening a browser on my Mac and navigating to IP_ADDRESS:9200, which returned:

{
status: 200,
name:"Longshot",
cluster_name: "elasticsearch",
version: {
 number: "1.7.1",
 build_hash: "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
 build_timestamp: "2015-07-29T09:54:16Z",
 build_snapshot: false,
 lucene_version: "4.10.4"
},
tagline: "You Know, for Search"
}
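
The same check can be scripted instead of done in a browser. Below is a minimal Python 3 sketch, assuming the 1.x style root response shown above (including its status field); the function names fetch_root and summarize are hypothetical, and IP_ADDRESS remains a placeholder for the eth0 address of the VM.

```python
import json
from urllib.request import urlopen


def fetch_root(base_url, timeout=5):
    """GET the Elasticsearch root endpoint and return the parsed JSON body."""
    with urlopen(base_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(info):
    """Pick out the fields worth checking from the root response."""
    return {
        "ok": info.get("status") == 200,
        "cluster": info.get("cluster_name"),
        "version": info.get("version", {}).get("number"),
    }


# Usage, once IP_ADDRESS is replaced with the real address:
# print(summarize(fetch_root("http://IP_ADDRESS:9200")))
```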

Excellent! From here on, the walkthrough is as previously described, but now I have one unified environment, and it will be the same one I use when I am on-site in two weeks.

Final note: I set the value of PermitRootLogin to yes in the /etc/ssh/sshd_config file to enable remote login as root into the VM from iTerm on my Mac. I recommend that you check out the FAQs.
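
I made that change by hand in an editor, but if you want it repeatable, a sketch along these lines might do. It is Python 3, the function name set_permit_root_login is made up, and it only transforms the text of the file; reading and writing /etc/ssh/sshd_config and restarting sshd so the change takes effect are left to you.

```python
import re


def set_permit_root_login(config_text, value="yes"):
    """Return sshd_config text with PermitRootLogin set to the given value."""
    directive = "PermitRootLogin " + value
    # Match an active or commented-out PermitRootLogin line.
    pattern = re.compile(r"^[ \t]*#?[ \t]*PermitRootLogin\b.*$", re.MULTILINE)
    if pattern.search(config_text):
        # Rewrite only the first occurrence; sshd uses the first value it reads.
        return pattern.sub(directive, config_text, count=1)
    # No such line at all: append the directive.
    return config_text.rstrip("\n") + "\n" + directive + "\n"
```

Keep in mind that allowing root logins over SSH is a convenience for a development VM, not something I would carry over to the client's cluster.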

Resources: Update to Docker 1.6