Getting Started with the GCE Discovery Plugin on Google Cloud
Introduction
The discovery module in Elasticsearch is responsible for finding nodes that belong to a cluster and electing a master node. The Google Compute Engine (GCE) discovery plugin uses GCE APIs to perform automatic unicast discovery. In this post, we'll work through setting up a three node Elasticsearch cluster in Google Cloud using the GCE discovery plugin, and create a custom image that can be shared among different projects.
Before we start
To follow along with this blog, there are two prerequisites:
- A Google Cloud Platform project and its project ID (we use sherry-test-gce throughout this post).
- The Google Cloud SDK, which provides the gcloud and gsutil command line tools, installed on your local machine.
After setting up the project ID and installing the SDK, we have a few more administrative items to do. First, set the default project:
gcloud config set project sherry-test-gce
Then log in to Google Cloud:
gcloud auth login
You can also set the default region and zone at this time so you can omit the --zone and --region flags while using the gcloud tool. This is optional. Note that us-west1 is a region and us-west1-b is a zone within it. Here's how:
gcloud config set compute/region us-west1
gcloud config set compute/zone us-west1-b
To simplify our tasks ahead, we have set the default project and zone and will omit them from the rest of the commands. Also, all commands here that start with gcloud are executed from the local machine and not on the GCE instances.
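This is a good time to confirm the setup. Listing the active gcloud properties shows the project, region, and zone defaults we just set:
gcloud config list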
A GCE instance with Elasticsearch and GCE discovery plugin
Set Up Firewall Rules
We will begin by creating a firewall rule so that the nodes in our Elasticsearch cluster can communicate with each other. Elasticsearch uses the TCP transport module for internal communication; the default port range for the transport module is 9300-9400. In addition, we will open up port 9200 to expose the Elasticsearch APIs over HTTP.
gcloud compute firewall-rules create elasticsearch --direction=INGRESS \
    --priority=1000 --network=default --action=ALLOW \
    --rules=tcp:9200,tcp:9300-9400 --source-ranges=0.0.0.0/0
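To double-check the rule, describe it and review the allowed ports and source ranges:
gcloud compute firewall-rules describe elasticsearch
Note that --source-ranges=0.0.0.0/0 accepts traffic from any address; for anything beyond a test cluster, restrict it to trusted networks.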
Create an instance using gcloud SDK
Now we are ready to set up a GCE instance:
gcloud compute instances create sherry-instance-gce-discovery \
    --machine-type=n1-standard-2 --subnet=default \
    --min-cpu-platform=Automatic \
    --tags=elasticsearch,http-server,https-server \
    --image=ubuntu-1604-xenial-v20180306 \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=10GB --boot-disk-type=pd-standard \
    --boot-disk-device-name=sherry-gce-discovery --scopes=compute-ro
The tags argument includes the Elasticsearch firewall rule along with http and https, which are optional. Once this process is complete, you should see the name of the instance, the zone it was created in, its internal and external IP addresses, and the status of the instance.
A few things to note (each can be verified with the describe command shown after this list):
- The instance we created is called sherry-instance-gce-discovery.
- The instance is running Ubuntu 16.04.
- The instance is using --scopes=compute-ro, which enables the GCE discovery plugin to call the required APIs.
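To verify these details, describe the instance. The --format flag here is optional; it simply narrows the output to the fields we care about:
gcloud compute instances describe sherry-instance-gce-discovery \
    --format="yaml(name, status, tags, serviceAccounts)"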
Now that we've created the instance, we need to connect to it:
gcloud compute ssh sherry-instance-gce-discovery
If this is the first time you are using gcloud compute ssh to access a GCE instance, it will create an SSH key for you.
Besides using the gcloud command line tool, you can connect to the instance using the Google Cloud web console, which is directly accessible from the browser. Another option is to use the external IP address of the instance:
ssh -i ~/.ssh/google_compute_engine 12.34.56.789
Install Java
We will need to install JDK 8 on our new instance:
sudo apt-get update
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
# Setting Java 8 as the default (optional)
sudo apt-get install oracle-java8-set-default
# Verify Java version
java -version
Install Elasticsearch and GCE discovery plugin
Installing Elasticsearch is the next task. Please see install Elasticsearch with Debian Package and install Elasticsearch with RPM for installation details. Since we chose an Ubuntu-based image, we will use the Debian package:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.3.deb
sudo dpkg -i elasticsearch-6.2.3.deb
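If you would like to verify the package before running dpkg, Elastic publishes a SHA-512 checksum alongside each artifact; comparing it looks like this:
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.3.deb.sha512
shasum -a 512 -c elasticsearch-6.2.3.deb.sha512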
Once Elasticsearch is ready, install the GCE discovery plugin. By default, the elasticsearch-plugin command can be found at /usr/share/elasticsearch/bin for both deb and rpm distributions.
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install discovery-gce
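To confirm the plugin installed correctly, list the installed plugins; the output should include discovery-gce:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list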
Now, add the following configuration to the /etc/elasticsearch/elasticsearch.yml file:
# By default, the cluster name is elasticsearch
cluster.name: lannister
# Use ${HOSTNAME} to set the node name dynamically
node.name: ${HOSTNAME}
# Use the GCE internal hostname to bind the network services
network.host: _gce:hostname_
# Set the minimum number of master-eligible nodes to avoid the split-brain issue
discovery.zen.minimum_master_nodes: 2
# Specific GCE discovery plugin settings
cloud.gce.project_id: sherry-test-gce
cloud.gce.zone: us-west1-b
discovery.zen.hosts_provider: gce
All of the GCE discovery plugin settings here are required. Let's take a quick look at them:
- cloud.gce.project_id - the Google Cloud project ID that we are using
- cloud.gce.zone - the GCE zone where the Elasticsearch nodes will live
- discovery.zen.hosts_provider - set to gce to use the GCE discovery mechanism
In the near future, we are planning to make the GCE discovery setup even easier by auto-discovering some of these configurations.
Start Elasticsearch
We are ready to start Elasticsearch and verify everything is working.
sudo /etc/init.d/elasticsearch start
Verify the instance is working as expected by tailing the Elasticsearch cluster log to ensure Elasticsearch has started:
tail /var/log/elasticsearch/lannister.log
Then use curl to connect to the node and verify the service is running:
curl ${HOSTNAME}:9200
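For a slightly deeper check, the cluster health API reports the cluster status and node count; at this point number_of_nodes should be 1:
curl ${HOSTNAME}:9200/_cluster/health?pretty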
Note, the above curl commands can only be run from within Google Cloud Platform, since the internal hostname does not resolve over the internet. Log off from the GCE instance, as our work here is done.
Create a GCE custom image
We will use the instance that we built to make a custom image. We can then share the image with other projects to create additional GCE instances with Elasticsearch and the GCE discovery plugin already installed. Although you can create a custom image directly from a virtual disk with the gcloud compute images create command, we choose the manual method so we can clean up the files on the disk before packaging it.
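For reference, the one-step approach is a single command. The sketch below assumes the boot disk kept its default name (which matches the instance name) and that the instance has been stopped; the image name here is just an example:
gcloud compute images create sherry-gce-discovery-image-direct \
    --source-disk=sherry-instance-gce-discovery \
    --source-disk-zone=us-west1-b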
Using our local environment, create a storage bucket to hold the disk image from our instance:
gsutil mb -p sherry-test-gce -c regional -l us-west1 gs://sherry-test-bucket/
We will stop the instance sherry-instance-gce-discovery as well. This is an optional step, but it ensures the integrity of the disk content in the snapshot.
gcloud compute instances stop sherry-instance-gce-discovery
Take a snapshot of the instance/disk:
gcloud compute disks snapshot sherry-instance-gce-discovery \
    --snapshot-names=sherry-gce-discovery-snapshot
Use the snapshot to build a new GCE disk that we will use to create an image later.
gcloud compute disks create sherry-gce-discovery-disk \
    --source-snapshot sherry-gce-discovery-snapshot
Make a temporary disk that is at least 50% larger than the original image disk. It will hold both the raw image of the disk from the instance sherry-instance-gce-discovery and the compressed version that we will be creating. Since our original disk is 10 GB, our temporary one will be 15 GB.
gcloud compute disks create sherry-gce-discovery-temp-disk --size 15GB
We will create a temporary GCE instance with the storage-rw scope and attach both the disk from the snapshot we made earlier and the temporary disk.
gcloud compute instances create sherry-temp-instance --scopes storage-rw \
    --disk name=sherry-gce-discovery-disk,device-name=sherry-gce-discovery-disk \
    --disk name=sherry-gce-discovery-temp-disk,device-name=sherry-gce-discovery-temp-disk
Connect to the new instance:
gcloud compute ssh sherry-temp-instance
Once on the instance, list the disks and disk partitions available:
ls /dev/disk/by-id/
Now, format and mount the temporary disk:
sudo mkdir /mnt/tmp
sudo mkfs.ext4 -F /dev/disk/by-id/google-sherry-gce-discovery-temp-disk
sudo mount -o discard,defaults /dev/disk/by-id/google-sherry-gce-discovery-temp-disk /mnt/tmp
Mount the image disk so we can modify and remove files:
sudo mkdir /mnt/image-disk
sudo mount /dev/disk/by-id/google-sherry-gce-discovery-disk-part1 /mnt/image-disk
Delete the Elasticsearch data directory. By default, it is located at /mnt/image-disk/var/lib/elasticsearch/nodes. Each new instance will create its own Elasticsearch data directory when the Elasticsearch service starts for the first time.
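Assuming the default data path, this is a single command:
sudo rm -rf /mnt/image-disk/var/lib/elasticsearch/nodes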
Remove the SSH keys. The authorized_keys file is located in the /mnt/image-disk/home/[USER]/.ssh directory. This step is not required but is highly recommended.
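For example, with [USER] standing in for your login name:
sudo rm -f /mnt/image-disk/home/[USER]/.ssh/authorized_keys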
Unmount the image disk:
sudo umount /mnt/image-disk/
Create a disk.raw file from the image disk on the temporary disk. Please note, it must be named disk.raw.
sudo dd if=/dev/disk/by-id/google-sherry-gce-discovery-disk of=/mnt/tmp/disk.raw bs=4096
Create a tar file of the disk.raw file:
cd /mnt/tmp
sudo tar czvf sherry-gce-discovery-disk.tar.gz disk.raw
Copy the tar file directly from the GCE instance to the bucket we created earlier:
gsutil cp /mnt/tmp/sherry-gce-discovery-disk.tar.gz gs://sherry-test-bucket
Log off the temporary instance after the file transfer is done.
Finally, make a custom image that we can use to create new GCE instances:
gcloud compute images create sherry-gce-discovery-image \
    --source-uri=https://storage.googleapis.com/sherry-test-bucket/sherry-gce-discovery-disk.tar.gz
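Once the image exists, the intermediate resources are no longer needed. You can verify the image and then clean up; each delete command asks for confirmation before removing anything:
gcloud compute images describe sherry-gce-discovery-image
gcloud compute instances delete sherry-temp-instance
gcloud compute disks delete sherry-gce-discovery-disk sherry-gce-discovery-temp-disk
gcloud compute snapshots delete sherry-gce-discovery-snapshot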
Create instances based on our custom image
We are ready to create three new instances from the image we just built.
gcloud compute instances create sherry-instance-1 \
    --machine-type=n1-standard-2 --subnet=default \
    --min-cpu-platform=Automatic --tags=elasticsearch,http-server,https-server \
    --image=sherry-gce-discovery-image --image-project=sherry-test-gce \
    --boot-disk-size=20GB --boot-disk-type=pd-standard \
    --boot-disk-device-name=sherry-instance-1 --scopes=compute-ro
gcloud compute instances create sherry-instance-2 \
    --machine-type=n1-standard-2 --subnet=default \
    --min-cpu-platform=Automatic --tags=elasticsearch,http-server,https-server \
    --image=sherry-gce-discovery-image --image-project=sherry-test-gce \
    --boot-disk-size=20GB --boot-disk-type=pd-standard \
    --boot-disk-device-name=sherry-instance-2 --scopes=compute-ro
gcloud compute instances create sherry-instance-3 \
    --machine-type=n1-standard-2 --subnet=default \
    --min-cpu-platform=Automatic --tags=elasticsearch,http-server,https-server \
    --image=sherry-gce-discovery-image --image-project=sherry-test-gce \
    --boot-disk-size=20GB --boot-disk-type=pd-standard \
    --boot-disk-device-name=sherry-instance-3 --scopes=compute-ro
Once completed, log on to each instance and start Elasticsearch. Please see the Elasticsearch installation instructions to configure Elasticsearch to start automatically, as sketched below.
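On Ubuntu 16.04, which uses systemd, configuring Elasticsearch to start at boot typically looks like this:
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service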
From the instance sherry-instance-1, we can run curl sherry-instance-1:9200/_cat/nodes. We should see all three nodes in our cluster listed in the output.
10.138.0.8  14 29 6 0.10 0.11 0.07 mdi * sherry-instance-1
10.138.0.9  17 29 5 0.14 0.17 0.08 mdi - sherry-instance-2
10.138.0.10 13 29 5 0.13 0.22 0.10 mdi - sherry-instance-3
Conclusion
In a future blog, we will install and configure X-Pack to make our cluster secure and production ready, and set up the Google Cloud Storage repository plugin to provide snapshot and restore capability. If you prefer the convenience of a managed cluster, Elastic Cloud, Elastic's official hosted solution, is available on GCP. It comes with all of the X-Pack features and automatic snapshots.