Introduction
In the previous posts of my blog series (1, 2, 3), I focused on deploying a MongoDB Replica Set in GKE's Kubernetes environment. A MongoDB Replica Set provides data redundancy and high availability, and is the basic building block for any mission-critical deployment of MongoDB. In this post, I focus on the deployment of a MongoDB Sharded Cluster within the GKE Kubernetes environment. A Sharded Cluster enables the database to be scaled out over time, to meet increasing throughput and data volume demands. The recommendations from my previous posts still apply to a Sharded Cluster, because each Shard is itself a Replica Set, ensuring that the deployment exhibits high availability in addition to scalability.

Deployment Process
My first blog post on MongoDB & Kubernetes observed that, although Kubernetes is a powerful tool for provisioning and orchestrating sets of related containers (both stateless and now stateful), it is not a solution that caters for every required type of orchestration task. These tasks are invariably technology-specific and need to operate below or above the "containers layer". The example I gave in my first post, concerning the correct management of a MongoDB Replica Set's configuration, clearly demonstrates this point.

In the modern Infrastructure as Code paradigm, before orchestrating containers using something like Kubernetes, other tools are first required to provision infrastructure/IaaS artefacts such as Compute, Storage and Networking. You can see a clear example of this in the provisioning script used in my first post, where non-Kubernetes commands ("gcloud"), specific to the Google Cloud Platform (GCP), are used first to provision storage disks. Once containers have been provisioned by a tool like Kubernetes, higher-level configuration tasks, such as data loading, system user identity provisioning, secure network modification for service exposure and many other "final bootstrap" tasks, will also often need to be scripted.
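That ordering shows up at the very start of the wrapper script; below is a minimal sketch of the disk provisioning step (the disk names, count, size, type and zone here are illustrative assumptions, not the project's exact values):

# Create a persistent disk per mongod, ahead of any Kubernetes provisioning
# (names, count, size, type and zone are illustrative assumptions)
for i in 1 2 3
do
    gcloud compute disks create "pd-mongodb-disk-${i}" \
        --size 4GB --type pd-ssd --zone europe-west1-b
done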
With the requirement here to deploy a MongoDB Sharded Cluster, the distinction between container orchestration tasks, lower level infrastructure/IaaS provisioning tasks and higher level technology-specific orchestration tasks, becomes even more apparent...
For a MongoDB Sharded Cluster on GKE, the following categories of tasks must be implemented:
Infrastructure Level (using Google's "gcloud" tool)
- Create 3 VM instances
- Create storage disks of various sizes, for containers to attach to
- Provision 3 "startup-script" containers using a Kubernetes DaemonSet, to enable the XFS filesystem to be used and to disable Huge Pages
- Provision 3 "mongod" containers using a Kubernetes StatefulSet, ready to be used as members of the Config Server Replica Set to host the ConfigDB
- Provision 3 separate Kubernetes StatefulSets, one per Shard, where each StatefulSet is composed of 3 "mongod" containers ready to be used as members of the Shard's Replica Set
- Provision 2 "mongos" containers using a Kubernetes Deployment, ready to be used for managing and routing client access to the Sharded database
Database Level (using the "mongo" shell)
- For the Config Servers, run the initialisation command to form a Replica Set
- For each of the 3 Shards (composed of 3 mongod processes), run the initialisation command to form a Replica Set
- Connecting to one of the Mongos instances, run the addShard command, 3 times, once for each of the Shards, to enable the Sharded Cluster to be fully assembled
- Connecting to one of the Mongos instances, under localhost exception conditions, create a database administrator user to apply to the cluster as a whole
The quantities of resources highlighted above are specific to my example deployment, but the types and order of provisioning steps apply regardless of deployment size.
In the accompanying example project, for brevity and clarity, I use a simple Bash shell script to wire together these three different categories of tasks. In reality, most organisations would use more specialised automation software, such as Ansible, Puppet or Chef, to glue all such steps together.
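As a minimal sketch of how such a script can drive the Database Level steps from outside the containers, using "kubectl exec" (the pod, container and service names follow this project's conventions, but the exact invocations in the repository differ slightly, and the credentials are placeholders):

# Initialise the Config Server Replica Set (members addressed via their stable service DNS names)
kubectl exec mongod-configdb-0 -c mongod-configdb-container -- mongo --eval \
  'rs.initiate({_id: "ConfigDBRepSet", version: 1, members: [
     {_id: 0, host: "mongod-configdb-0.mongodb-configdb-service.default.svc.cluster.local:27017"},
     {_id: 1, host: "mongod-configdb-1.mongodb-configdb-service.default.svc.cluster.local:27017"},
     {_id: 2, host: "mongod-configdb-2.mongodb-configdb-service.default.svc.cluster.local:27017"}]});'

# Register the first Shard's Replica Set with the cluster, via a mongos
# (repeat for Shard2RepSet and Shard3RepSet; the pod name assumes the updated StatefulSet naming)
kubectl exec mongos-router-0 -c mongos-container -- mongo --eval \
  'sh.addShard("Shard1RepSet/mongod-shard1-0.mongodb-shard1-service.default.svc.cluster.local:27017");'

# Create the cluster-wide database administrator under the localhost exception (placeholder credentials)
kubectl exec mongos-router-0 -c mongos-container -- mongo --eval \
  'db.getSiblingDB("admin").createUser({user: "main_admin", pwd: "SomePassword123", roles: [{role: "root", db: "admin"}]});'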
Kubernetes Controllers & Pods
The Kubernetes StatefulSet definitions for the MongoDB Config Server "mongod" containers and the Shard member "mongod" containers hardly differ from those described in my first blog post.

Below is an excerpt of the StatefulSet definition for each Config Server mongod container (the Config Server specific additions being the "--configsvr" and "--replSet ConfigDBRepSet" parameters):
containers:
  - name: mongod-configdb-container
    image: mongo
    command:
      - "mongod"
      - "--port"
      - "27017"
      - "--bind_ip"
      - "0.0.0.0"
      - "--wiredTigerCacheSizeGB"
      - "0.25"
      - "--configsvr"
      - "--replSet"
      - "ConfigDBRepSet"
      - "--auth"
      - "--clusterAuthMode"
      - "keyFile"
      - "--keyFile"
      - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
      - "--setParameter"
      - "authenticationMechanisms=SCRAM-SHA-1"
Below is an excerpt of the StatefulSet definition for each Shard member mongod container (the Shard specific additions being the "--shardsvr" and "--replSet Shard1RepSet" parameters):
containers:
  - name: mongod-shard1-container
    image: mongo
    command:
      - "mongod"
      - "--port"
      - "27017"
      - "--bind_ip"
      - "0.0.0.0"
      - "--wiredTigerCacheSizeGB"
      - "0.25"
      - "--shardsvr"
      - "--replSet"
      - "Shard1RepSet"
      - "--auth"
      - "--clusterAuthMode"
      - "keyFile"
      - "--keyFile"
      - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
      - "--setParameter"
      - "authenticationMechanisms=SCRAM-SHA-1"
For the Shard's container definition, the name of the specific Shard's Replica Set is declared. This Shard definition will result in 3 mongod replica containers being created for the Shard's Replica Set, called "Shard1RepSet". Two additional, similar StatefulSet resources also have to be defined, to represent the second Shard ("Shard2RepSet") and the third Shard ("Shard3RepSet").
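Rather than maintaining three near-identical files by hand, the wrapper script can stamp out each Shard's StatefulSet from a common template. A minimal sketch, where the template filename and placeholder tokens are illustrative assumptions rather than the project's exact ones:

# Generate and apply one StatefulSet per Shard from a common template,
# substituting the shard number (filename and tokens are assumptions)
for i in 1 2 3
do
    sed -e "s/shardX/shard${i}/g" -e "s/ShardXRepSet/Shard${i}RepSet/g" \
        mongodb-shard-template.yaml > "/tmp/mongodb-shard${i}.yaml"
    kubectl apply -f "/tmp/mongodb-shard${i}.yaml"
done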
To provision the Mongos Routers, a StatefulSet is not used, because neither persistent storage nor a fixed network hostname is required. Mongos Routers are stateless and, to a degree, ephemeral. Instead, a Kubernetes Deployment resource is defined, which is Kubernetes' preferred approach for stateless services. Below is the Deployment definition for the Router mongos container:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mongos
spec:
  replicas: 2
  template:
    spec:
      volumes:
        - name: secrets-volume
          secret:
            secretName: shared-bootstrap-data
            defaultMode: 256
      containers:
        - name: mongos-container
          image: mongo
          command:
            - "numactl"
            - "--interleave=all"
            - "mongos"
            - "--port"
            - "27017"
            - "--bind_ip"
            - "0.0.0.0"
            - "--configdb"
            - "ConfigDBRepSet/mongod-configdb-0.mongodb-configdb-service.default.svc.cluster.local:27017,mongod-configdb-1.mongodb-configdb-service.default.svc.cluster.local:27017,mongod-configdb-2.mongodb-configdb-service.default.svc.cluster.local:27017"
            - "--clusterAuthMode"
            - "keyFile"
            - "--keyFile"
            - "/etc/secrets-volume/internal-auth-mongodb-keyfile"
            - "--setParameter"
            - "authenticationMechanisms=SCRAM-SHA-1"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: secrets-volume
              readOnly: true
              mountPath: /etc/secrets-volume
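To roll the Router definition out, the wrapper script just needs to apply it with "kubectl" (the manifest filename here is an illustrative assumption):

# Apply the mongos Deployment definition to the cluster
kubectl apply -f mongos-deployment.yaml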
Once the "generate" shell script in the example project has been executed, and all the Infrastructure, Kubernetes and Database provisioning steps have successfully completed, 17 different Pods will be running, each hosting one container. Below is a screenshot showing the results from running the "kubectl" command, to list all the running Kubernetes Pods:
This tallies with what Kubernetes was asked to deploy, namely:
- 3x DaemonSet startup-script containers (one per host machine)
- 3x Config Server mongod containers
- 3x Shards, each composed of 3 replica mongod containers
- 2x Router mongos containers
With the full deployment generated and running, it is a straightforward process to connect to the sharded cluster and check its status: use the "kubectl" command to open a shell connected to the first "mongos" container, start the "mongo" shell against the local "mongos" process running in the same container, authenticate as the "admin" user, and then run the command to view the status of the Sharded Cluster.
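A minimal sketch of that check, collapsed into a single command (the pod name assumes the updated StatefulSet naming described in the update below, and the credentials are placeholders):

# Authenticate as the admin user against the first mongos and print the cluster's status
kubectl exec -it mongos-router-0 -c mongos-container -- \
    mongo -u main_admin -p SomePassword123 --authenticationDatabase admin \
          --eval 'sh.status();'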
UPDATE 02-Jan-2018: Since writing this blog post, I realised that because the mongos routers ideally require stable hostnames, to be easily referenceable from the app tier, the mongos router containers should also be declared and deployed as a Kubernetes StatefulSet and Service, rather than a Kubernetes Deployment. The GitHub project gke-mongodb-shards-demo, associated with this blog post, has been changed to reflect this.
Summary
In this blog post I've shown how to deploy a MongoDB Sharded Cluster using Kubernetes with the Google Kubernetes Engine. I've mainly focused on the high-level considerations for such deployments, rather than listing every specific resource and step required (for that level of detail, please view the accompanying GitHub project). What this study does reinforce is that a single container orchestration framework (Kubernetes in this case) does not cater for every step required to provision and deploy a highly available and scalable database, such as MongoDB. I believe that this would be true for any non-trivial and mission-critical distributed application or set of services. However, I don't see this as a bad thing. Personally, I value flexibility and choice, and having the ability to use the right tool for the right job. In my opinion, a container orchestration framework that tries to cater for things beyond its obvious remit would be the wrong framework. A framework that attempts to be all things to all people would end up diluting its own value, down to the lowest common denominator. At the very least, it would become far too prescriptive and restrictive. I feel that Kubernetes, in its current state, strikes a suitable and measured balance, and with its StatefulSets capability, provides a good home for MongoDB deployments.

Song for today: Three Days by Jane's Addiction
12 comments:
Hi, this is great work! As a suggestion, I think it would be great to have an analysis of different failure scenarios and how to handle them in a sharded deployment on Kubernetes... For instance, what happens if a node in the cluster simply disappears? How do you recover from that? There are many other scenarios, and I think it would be great to have such an analysis!
Hi, thanks for sharing your knowledge and experience on both kubernetes and mongodb.
I'm currently trying to set up a MongoDB sharded cluster using a Kubernetes cluster (as you did) but on a bare-metal server. I tried to use a custom version of the script from your GitHub project but I'm stuck on a very annoying problem: the mongo services cannot reach each other ("connection refused"). The Kubernetes DNS services and pods are running, and the only log I get from kube-dns is "could not find endpoint for service 'mongodb-[service]'". Yet the endpoints for those services are created.
Could you lend me a hand on this please ?
Hi Paul,
Thanks for the amazing tutorials.
Please could you share the connection string for MongoDB?
@"mathob jehanno", if you are using MongoDB version 3.6+ then due to the change in default behaviour of mongod binding and listening on localhost only, you need to set the bind_ip explicitly in the mongod start-up params. See "--bind_ip" example in https://github.com/pkdone/gke-mongodb-shards-demo/blob/master/resources/mongodb-maindb-service.yaml
Also this repo may be of interest: https://github.com/pkdone/mongo-multi-svr-generator
@"Paul Done" Thanks for your answer, it helped me moving forward, yet I've another problem, I'll try to figure out myself. The link you provided will probably help me.
@"Lathesh Karkera", I just updated the blog post. See comment "UPDATE 02-Jan-2018" added to the above post with a description. Using the updated github project the hostnames/ports for the mongos routers that can be referenced in the connection url will be: "mongos-router-0.mongos-router-service.default.svc.cluster.local:27017" and "mongos-router-1.mongos-router-service.default.svc.cluster.local:27017".
Thank you so much Paul.
I want to deploy MongoDB and MySQL as containers in GKE (Google Kubernetes Engine).
I succeeded in deploying to a single cluster in one region, but the requirement is to deploy to different regions and have data synchronisation.
Example: assume there are 2 regions, us-central and us-east. If a customer performs any operation on the database, it has to be reflected in both regions at the same time.
How can I implement this using Google Kubernetes Engine?
What are the pros and cons?
Hi Lathesh, I've not looked at multi-dc / multi-regions options for k8s yet. I'll be interested to find out what you discover. Paul
Hi Paul,
fantastic set of tutorials - so helpful to get a nicely distributed & scalable deployment of MongoDB.
It is probably my misunderstanding of Kubernetes, but I want to expose this cross-cluster and externally (i.e. to connect to and explore the data from something like Compass on my desktop) and seem to be having difficulties. From what I can understand, the entry point will be the mongos router service, but how is this exposed to the big wide world... :)
Again, thanks for a great set of articles - still learning and these were just incredible for me to understand how it all comes together.
cheers, Matt
Thanks for sharing this blog, it was very useful to get started with mongoDB in k8s.
On the GitHub repo https://github.com/pkdone/gke-mongodb-shards-demo, one of the generate scripts might end up in an infinite loop:
kubectl exec mongod-shard1-0 -c mongod-shard1-container -- mongo --quiet --eval 'while (rs.status().hasOwnProperty("myState") && rs.status().myState != 1) { print("."); sleep(1000); };'
Here, once rs.initiate() is done, any member of a shard or config server replica set can become primary (myState value = 1) or secondary (myState value = 2). So how can we be sure this loop ever exits, given that it waits for mongod-shard1-0 specifically to reach myState = 1, when a different member might become the primary?
I saw that you also did a tutorial on running a MongoDB Deployment Demo for Kubernetes on Azure ACS:
https://github.com/pkdone/azure-acs-mongodb-demo
Any chance of doing the same thing for running sharded MongoDB on AKS?
Thanks
Regarding the earlier question about the rs.initiate() wait loop (how can we be sure mongod-shard1-0 reaches myState = 1 if another member becomes primary?): has anybody solved it?