Thursday, July 13, 2017

Deploying a MongoDB Sharded Cluster using Kubernetes StatefulSets on GKE

[Part 4 in a series of posts about running MongoDB on Kubernetes, with the Google Kubernetes Engine (GKE). For this post, a newer GitHub project gke-mongodb-shards-demo has been created to provide an example of a scripted deployment for a Sharded cluster specifically. This gke-mongodb-shards-demo project also incorporates the conclusions from the earlier posts in the series. Also see: http://k8smongodb.net/]


Introduction

In the previous posts of my blog series (1, 2, 3), I focused on deploying a MongoDB Replica Set in GKE's Kubernetes environment. A MongoDB Replica Set provides data redundancy and high availability, and is the basic building block for any mission-critical deployment of MongoDB. In this post, I focus on the deployment of a MongoDB Sharded Cluster within the GKE Kubernetes environment. A Sharded Cluster enables the database to be scaled out over time, to meet increasing throughput and data volume demands. The recommendations from my previous posts still apply to a Sharded Cluster, because each Shard is itself a Replica Set, ensuring that the deployment exhibits high availability in addition to scalability.


Deployment Process

My first blog post on MongoDB & Kubernetes observed that, although Kubernetes is a powerful tool for provisioning and orchestrating sets of related containers (both stateless and now stateful), it is not a solution that caters for every required type of orchestration task. These tasks are invariably technology-specific and need to operate below or above the "containers layer". The example I gave in my first post, concerning the correct management of a MongoDB Replica Set's configuration, clearly demonstrates this point.

In the modern Infrastructure as Code paradigm, before orchestrating containers with something like Kubernetes, other tools are first required to provision infrastructure/IaaS artefacts such as Compute, Storage and Networking. A clear example of this appears in the provisioning script from my first post, where non-Kubernetes commands ("gcloud"), specific to the Google Cloud Platform (GCP), are used first to provision storage disks. Once containers have been provisioned by a tool like Kubernetes, higher-level configuration tasks, such as data loading, system user identity provisioning, secure network modification for service exposure and many other "final bootstrap" tasks, will also often need to be scripted.
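As a rough illustration of that first, pre-Kubernetes step, provisioning the compute and storage with "gcloud" looks something like the following. This is only a sketch: the cluster name, disk name, size and zone are illustrative values, not necessarily those used by the project's scripts.

# Provision a 3-node GKE cluster for the containers to run on
gcloud container clusters create "gke-mongodb-demo-cluster" --zone "europe-west1-b" --num-nodes 3

# Pre-provision one of the persistent SSD disks for a mongod container to attach to later
gcloud compute disks create "pd-ssd-disk-1" --size 10GB --type pd-ssd --zone "europe-west1-b"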

With the requirement here to deploy a MongoDB Sharded Cluster, the distinction between container orchestration tasks, lower-level infrastructure/IaaS provisioning tasks and higher-level technology-specific orchestration tasks becomes even more apparent...

For a MongoDB Sharded Cluster on GKE, the following categories of tasks must be implemented:

    Infrastructure Level  (using Google's "gcloud" tool)
  1. Create 3 VM instances
  2. Create storage disks of various sizes, for containers to attach to
    Container Level  (using Kubernetes' "kubectl" tool)
  1. Provision 3 "startup-script" containers using a Kubernetes DaemonSet, to enable the XFS filesystem to be used and to disable Huge Pages
  2. Provision 3 "mongod" containers using a Kubernetes StatefulSet, ready to be used as members of the Config Server Replica Set to host the ConfigDB
  3. Provision 3 separate Kubernetes StatefulSets, one per Shard, where each StatefulSet is composed of 3 "mongod" containers ready to be used as members of the Shard's Replica Set
  4. Provision 2 "mongos" containers using a Kubernetes Deployment, ready to be used for managing and routing client access to the Sharded database
    Database Level  (using MongoDB's "mongo shell" tool)
  1. For the Config Servers, run the initialisation command to form a Replica Set
  2. For each of the 3 Shards (composed of 3 mongod processes), run the initialisation command to form a Replica Set
  3. Connecting to one of the Mongos instances, run the addShard command 3 times, once for each of the Shards, to enable the Sharded Cluster to be fully assembled
  4. Connecting to one of the Mongos instances, under localhost exception conditions, create a database administrator user to apply to the cluster as a whole

The quantities of resources highlighted above are specific to my example deployment, but the types and order of provisioning steps apply regardless of deployment size (the Database Level steps are sketched as mongo shell commands below).
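For illustration, the Database Level steps above correspond to mongo shell commands along the following lines. This is only a sketch: the Shard Service name ("mongodb-shard1-service"), the admin user name and the password are placeholders, and the project's scripts additionally wait for each Replica Set to reach a healthy state before moving on.

# 1. Initialise the Config Server Replica Set (the localhost exception applies inside the pod)
kubectl exec mongod-configdb-0 -c mongod-configdb-container -- mongo --eval '
  rs.initiate({_id: "ConfigDBRepSet", version: 1, members: [
    {_id: 0, host: "mongod-configdb-0.mongodb-configdb-service.default.svc.cluster.local:27017"},
    {_id: 1, host: "mongod-configdb-1.mongodb-configdb-service.default.svc.cluster.local:27017"},
    {_id: 2, host: "mongod-configdb-2.mongodb-configdb-service.default.svc.cluster.local:27017"}
  ]});'

# 2. Initialise each Shard Replica Set in the same way (shown here for Shard 1 only)
kubectl exec mongod-shard1-0 -c mongod-shard1-container -- mongo --eval '
  rs.initiate({_id: "Shard1RepSet", version: 1, members: [
    {_id: 0, host: "mongod-shard1-0.mongodb-shard1-service.default.svc.cluster.local:27017"},
    {_id: 1, host: "mongod-shard1-1.mongodb-shard1-service.default.svc.cluster.local:27017"},
    {_id: 2, host: "mongod-shard1-2.mongodb-shard1-service.default.svc.cluster.local:27017"}
  ]});'

# 3. From one of the mongos routers, register each Shard ($MONGOS_POD is the name of one
#    of the mongos Pods, as reported by "kubectl get pods"; repeat for Shards 2 and 3)
kubectl exec $MONGOS_POD -c mongos-container -- mongo --eval '
  sh.addShard("Shard1RepSet/mongod-shard1-0.mongodb-shard1-service.default.svc.cluster.local:27017");'

# 4. Still via the mongos, under the localhost exception, create the cluster-wide admin user
kubectl exec $MONGOS_POD -c mongos-container -- mongo --eval '
  db.getSiblingDB("admin").createUser({user: "<admin-user>", pwd: "<password>",
    roles: [{role: "root", db: "admin"}]});'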

In the accompanying example project, for brevity and clarity, I use a simple Bash shell script to wire together these three different categories of tasks. In reality, most organisations would use more specialised automation software, such as Ansible, Puppet or Chef, to glue all such steps together.


Kubernetes Controllers & Pods

The Kubernetes StatefulSet definitions for the MongoDB Config Server "mongod" containers and the Shard member "mongod" containers hardly differ from those described in my first blog post.

Below is an excerpt of the StatefulSet definition for each Config Server mongod container (the sharding-specific additions are the "--configsvr" and "--replSet ConfigDBRepSet" arguments):

containers:
  - name: mongod-configdb-container
    image: mongo
    command:
      - "mongod"
      - "--port"
      - "27017"
      - "--bind_ip"
      - "0.0.0.0"
      - "--wiredTigerCacheSizeGB"
      - "0.25"
      - "--configsvr"
      - "--replSet"
      - "ConfigDBRepSet"
      - "--auth"
      - "--clusterAuthMode"
      - "keyFile"
      - "--keyFile"
      - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
      - "--setParameter"
      - "authenticationMechanisms=SCRAM-SHA-1"

Below is an excerpt of the StatefulSet definition for each Shard member mongod container (the sharding-specific additions are the "--shardsvr" and "--replSet Shard1RepSet" arguments):

containers:
  - name: mongod-shard1-container
    image: mongo
    command:
      - "mongod"
      - "--port"
      - "27017"
      - "--bind_ip"
      - "0.0.0.0"
      - "--wiredTigerCacheSizeGB"
      - "0.25"
      - "--shardsvr"
      - "--replSet"
      - "Shard1RepSet"
      - "--auth"
      - "--clusterAuthMode"
      - "keyFile"
      - "--keyFile"
      - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
      - "--setParameter"
      - "authenticationMechanisms=SCRAM-SHA-1"

In the Shard's container definition, the name of that specific Shard's Replica Set is declared. This definition results in 3 mongod replica containers being created for the Shard's Replica Set, called "Shard1RepSet". Two further, near-identical StatefulSet resources also have to be defined, to represent the second Shard ("Shard2RepSet") and the third Shard ("Shard3RepSet").
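Not shown in the excerpts above is how each "mongod" container obtains its persistent storage. In a StatefulSet this is typically declared with a "volumeClaimTemplates" section, which causes a dedicated PersistentVolumeClaim to be created for each Pod. The sketch below is illustrative only (the claim name, storage class and size are placeholders; the exact definitions, including the matching pre-provisioned disks, are in the accompanying project):

  volumeClaimTemplates:
    - metadata:
        name: mongo-shard1-persistent-storage-claim
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "fast"
        resources:
          requests:
            storage: 10Gi

Each "mongod" container then mounts the claimed volume at its database path (/data/db, or /data/configdb for the Config Servers).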

To provision the Mongos Routers, a StatefulSet is not used, because neither persistent storage nor a fixed network hostname is required. Mongos Routers are stateless and, to a degree, ephemeral. Instead, a Kubernetes Deployment resource is defined, which is Kubernetes' preferred approach for stateless services. Below is the Deployment definition for the Router mongos container:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mongos
spec:
  replicas: 2
  template:
    spec:
      volumes:
        - name: secrets-volume
          secret:
            secretName: shared-bootstrap-data
            defaultMode: 256
      containers:
        - name: mongos-container
          image: mongo
          command:
            - "numactl"
            - "--interleave=all"
            - "mongos"
            - "--port"
            - "27017"
            - "--bind_ip"
            - "0.0.0.0"
            - "--configdb"
            - "ConfigDBRepSet/mongod-configdb-0.mongodb-configdb-service.default.svc.cluster.local:27017,mongod-configdb-1.mongodb-configdb-service.default.svc.cluster.local:27017,mongod-configdb-2.mongodb-configdb-service.default.svc.cluster.local:27017"
            - "--clusterAuthMode"
            - "keyFile"
            - "--keyFile"
            - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
            - "--setParameter"
            - "authenticationMechanisms=SCRAM-SHA-1"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: secrets-volume
              readOnly: true
              mountPath: /etc/secrets-volume

The actual structure of this resource definition is not a radical departure from the definition of a StatefulSet. A different command has been declared ("mongos", rather than "mongod"), but the same base container image has been referenced (the mongo image from Docker Hub). For the "mongos" container, it is still important to enable authentication and to reference the generated cluster key file. Specific to the "mongos" parameter list is "--configdb", which specifies the URL of the "ConfigDB" (the 3-node Config Server Replica Set). Each mongos router instance connects to this URL to discover the Shards in the cluster and to determine which Shard holds specific ranges of the stored data. Because the "mongod" containers used to host the "ConfigDB" are deployed as a StatefulSet, their hostnames remain constant, so a fixed URL can be defined in the "mongos" container resource definition, as shown above.
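The stable per-Pod DNS names embedded in that "--configdb" URL come from pairing the StatefulSet with a "headless" Kubernetes Service, i.e. one with no cluster IP of its own. A minimal sketch of such a Service for the Config Servers is shown below; the label selector is illustrative and the exact definition lives in the accompanying project:

apiVersion: v1
kind: Service
metadata:
  name: mongodb-configdb-service
spec:
  clusterIP: None   # "headless": each StatefulSet Pod gets its own stable DNS entry
  ports:
    - port: 27017
      targetPort: 27017
  selector:
    role: mongodb-configdb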

Once the "generate" shell script in the example project has been executed, and all the Infrastructure, Kubernetes and Database provisioning steps have successfully completed, 17 different Pods will be running, each hosting one container. Below is a screenshot showing the results of running the "kubectl" command to list all the running Kubernetes Pods:
This tallies with what Kubernetes was asked to deploy, namely:
  • 3x DaemonSet startup-script containers (one per host machine)
  • 3x Config Server mongod containers
  • 3x Shards, each composed of 3 replica mongod containers
  • 2x Router mongos containers
Only the "mongod" containers for the Config Servers and the Shard Replicas have fixed and deterministic names, because these were deployed as Kubernetes StatefulSets. The names of the other containers reflect the fact that they are regarded as stateless, disposable and trivial for Kubernetes to re-create on demand.

With the full deployment generated and running, it is a straightforward process to connect to the Sharded Cluster to check its status. The screenshot below shows the use of the "kubectl" command to open a Bash shell connected to the first "mongos" container. From the Bash shell, the "mongo shell" has been opened, connecting to the local "mongos" process running in the same container. Before running the command to view the status of the Sharded Cluster, the session is first authenticated using the "admin" user.
The output of the status command shows the URLs of all three Shards that have been defined (each is a MongoDB Replica Set). Again, these URLs remain fixed by virtue of the use of Kubernetes StatefulSets for the "mongod" containers that compose each Shard.
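For reference, the commands behind those screenshots are broadly as follows (the mongos Pod name suffix is generated by the Deployment, and the admin credentials are whatever was chosen when the cluster was bootstrapped):

# List all 17 running Pods
kubectl get pods

# Open a Bash shell inside the first mongos container (the Pod name suffix will differ)
kubectl exec -it mongos-<generated-suffix> -c mongos-container -- bash

# Inside the container: start the mongo shell, authenticate, then check the cluster
mongo
> use admin
> db.auth("<admin-user>", "<password>");
> sh.status();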

UPDATE 02-Jan-2018: Since writing this blog post, I realised that, because the mongos routers ideally require stable hostnames so that they can be easily referenced from the app tier, the mongos router containers should also be declared and deployed as a Kubernetes StatefulSet and Service, rather than as a Kubernetes Deployment. The GitHub project gke-mongodb-shards-demo, associated with this blog post, has been changed to reflect this.
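With that change in place, applications running inside the Kubernetes cluster can reference the routers by their stable names. A client connection string would then look roughly like the following (the user, password and database are placeholders):

mongodb://<user>:<password>@mongos-router-0.mongos-router-service.default.svc.cluster.local:27017,mongos-router-1.mongos-router-service.default.svc.cluster.local:27017/<database>?authSource=admin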


Summary

In this blog post I’ve shown how to deploy a MongoDB Sharded Cluster using Kubernetes with the Google Kubernetes Engine. I've mainly focused on the high-level considerations for such deployments, rather than listing every specific resource and step required (for that level of detail, please view the accompanying GitHub project). What this study does reinforce is that a single container orchestration framework (Kubernetes in this case) does not cater for every step required to provision and deploy a highly available and scalable database, such as MongoDB. I believe this would be true for any non-trivial, mission-critical distributed application or set of services. However, I don't see this as a bad thing. Personally, I value flexibility and choice, and having the ability to use the right tool for the right job. In my opinion, a container orchestration framework that tries to cater for things beyond its obvious remit would be the wrong framework. A framework that attempts to be all things to all people would end up diluting its own value, down to the lowest common denominator. At the very least, it would become far too prescriptive and restrictive. I feel that Kubernetes, in its current state, strikes a suitable and measured balance, and with its StatefulSets capability, provides a good home for MongoDB deployments.


Song for today: Three Days by Jane's Addiction

12 comments:

Unknown said...

Hi, this is great work! As a suggestion, I think it would be great to have an analysis of different failure scenarios and how to handle them in a sharded deployment on Kubernetes... For instance, what happens if a node in the cluster simply disappears? How do you recover from that? There are many other scenarios and I think it would be great to have such an analysis!

Unknown said...

Hi, thanks for sharing your knowledge and experience on both kubernetes and mongodb.
I'm currently trying to set up a MongoDB sharded cluster using a Kubernetes cluster (as you did), but on a bare-metal server. I tried to use a custom version of the script from your GitHub project but I'm stuck on a very annoying problem: the mongo services cannot reach each other ("connection refused"). The Kubernetes DNS services and pods are running, and the only log I get from kube-dns is "could not find endpoint for service 'mongodb-[service]'". Yet endpoints for those services are created.
Could you lend me a hand on this please ?

Unknown said...

Hi Paul,

Thanks for the amazing tutorials.
Please share the connection string for MongoDB.

Paul Done said...

@"mathob jehanno", if you are using MongoDB version 3.6+ then due to the change in default behaviour of mongod binding and listening on localhost only, you need to set the bind_ip explicitly in the mongod start-up params. See "--bind_ip" example in https://github.com/pkdone/gke-mongodb-shards-demo/blob/master/resources/mongodb-maindb-service.yaml

Also this repo may be of interest: https://github.com/pkdone/mongo-multi-svr-generator

Unknown said...

@"Paul Done" Thanks for your answer, it helped me moving forward, yet I've another problem, I'll try to figure out myself. The link you provided will probably help me.

Paul Done said...

@"Lathesh Karkera", I just updated the blog post. See comment "UPDATE 02-Jan-2018" added to the above post with a description. Using the updated github project the hostnames/ports for the mongos routers that can be referenced in the connection url will be: "mongos-router-0.mongos-router-service.default.svc.cluster.local:27017" and "mongos-router-1.mongos-router-service.default.svc.cluster.local:27017".

Unknown said...

Thank you so much Paul.
I want to deploy MongoDB and MySQL as containers in GKE (Google Kubernetes Engine).

I succeeded in deploying to a single cluster in the same region. But the requirement is to deploy to different regions and have data synchronisation.

Example: Assume there are 2 regions, us-central and us-east. If a customer performs any operation on the database, it has to be reflected in both regions at the same time.

How can I implement this using the Kubernetes container engine?

What are the pros and cons?

Unknown said...

Hi Lathesh, I've not looked at multi-dc / multi-region options for k8s yet. I'll be interested to find out what you discover. Paul

Anonymous said...

Hi Paul,

fantastic set of tutorials - so helpful to get a nicely distributed & scalable deployment of MongoDB.

It is probably my misunderstanding of Kubernetes, but I want to expose this cross-cluster and externally (ie: to connect and explore the data from something like Compass from my desktop) but seem to be having difficulties. From what I can understand, the entry point will be the mongos router service, but how is this exposed to the big wide world... :)

Again, thanks for a great set of articles - still learning and these were just incredible for me to understand how it all comes together.

cheers, Matt

MJY said...

Thanks for sharing this blog, it was very useful to get started with mongoDB in k8s.

On github repo : https://github.com/pkdone/gke-mongodb-shards-demo

one of the generate scripts might end up in an infinite loop:
kubectl exec mongod-shard1-0 -c mongod-shard1-container -- mongo --quiet --eval 'while (rs.status().hasOwnProperty("myState") && rs.status().myState != 1) { print("."); sleep(1000); };'

Here, when rs.initiate() is done, any shard or config server member can be primary (myState value = 1) or secondary (myState value = 2). So how can we be sure that mongod-shard1-0 reaches myState == 1 if another member becomes the primary?

Dov Amir said...

I saw that you also did a tutorial on running MongoDB Deployment Demo for Kubernetes on Azure ACS
https://github.com/pkdone/azure-acs-mongodb-demo

any chance of doing the same thing for running sharded MongoDB on AKS?

thanks

Unknown said...

Here, when rs.initiate() is done, any shard or config server member can be primary (myState value = 1) or secondary (myState value = 2). So how can we be sure that mongod-shard1-0 reaches myState == 1 if another member becomes the primary?

Has anybody solved this?