Although starting a journey with containers is easy, finding the best way to handle them can be much more challenging. This is especially true if you need to take care of scheduling, deploying and managing containers in an environment more complex than a single instance. At Apptension, we faced the same dilemma while implementing our Continuous Delivery stack from scratch.
In the end, we settled on two solutions that we use in our day-to-day work for clients around the globe. Below, we guide you through AWS Elastic Beanstalk and Google Kubernetes Engine, the two services we use for handling containers in production. Why use both? Using both makes us more flexible and keeps us from getting trapped in vendor lock-in. This article should help you decide which one best meets your needs.
Understanding AWS Elastic Beanstalk and Google Kubernetes Engine
Both solutions come with advantages stemming from their very structure. For someone just starting out, AWS Elastic Beanstalk (AWS EB) requires less knowledge and experience. Google Kubernetes Engine (referred to here as GKE) is a mature solution adopted by a growing number of container-focused companies. Before we describe some significant differences between these two services, let's discuss their general characteristics.
AWS Elastic Beanstalk
AWS EB was the obvious way to go when Apptension started talking about containers and left behind the VMs and all the Ansible work we did back then. (We appreciate how Ansible helped us structure our deployments, though.) We had been using AWS for a long time, so looking for a native container solution was a natural choice. We started with AWS Elastic Container Service (AWS ECS) and later moved to AWS Elastic Beanstalk.
We've been using the latter ever since. AWS EB is based on the aforementioned ECS: it adds a layer of abstraction and some convenient wrapping around Elastic Container Service's features. When you start working with ECS directly, you need to understand everything, starting with the VPC it will be placed in.
Since we use Terraform for every AWS resource we create, connecting all the pieces until the environment's URL returned a 200 status code was hard. We managed to make it happen, but Elastic Beanstalk tempted us with a better UI and a better way to manage environments. It also has a setup wizard for beginners.
Managing environments with Elastic Beanstalk
Elastic Beanstalk groups environments into applications, and every environment must belong to exactly one application.
There's no specific configuration for the application itself; it can simply be treated as a logical grouping of environments. The configuration takes place in the environment setup. Here we assume the only environment type under discussion is the Multi-container Docker setup (but there are others).
Every environment has to be placed in a VPC, in either public or private subnets. While setting up the environment, you choose the instance type to use, and that's the most important thing to notice: Elastic Beanstalk environments consist of whole instances.
A configuration file called Dockerrun.aws.json defines the containers and how they're connected. It's just a JSON file, but some specific fields are required. The file describes what gets deployed on every instance of the environment. So if your environment is scalable and has, let's say, 5 instances, the same configuration will be deployed on all 5 instances in the same manner.
As a result, a "monolith" is deployed on each of those 5 instances: if your configuration file contains Nginx and backend containers, the pair will be deployed on every instance. We figured it would be convenient to define such small monoliths. Splitting the configuration into separate environments connected over the network may complicate things, but it's the better way to go.
Following the previous example, the Nginx-plus-backend monolith could be divided. Nginx would act as a reverse proxy sitting behind the Load Balancer, and the backend, residing in private subnets, would handle the requests. The backend then connects to the RDS instance, and everything works fine.
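To make the Nginx-plus-backend "monolith" more concrete, here is a minimal sketch that generates a version 2 (multi-container) Dockerrun.aws.json in Python. The image names, memory limits and ports are hypothetical placeholders, not our production values; only the field names (AWSEBDockerrunVersion, containerDefinitions, portMappings, links) come from the Dockerrun format itself.

```python
import json


def dockerrun_config(backend_image: str, nginx_image: str = "nginx:stable") -> dict:
    """Build a minimal multi-container (version 2) Dockerrun.aws.json document.

    Image names, memory limits and ports are illustrative placeholders.
    """
    return {
        "AWSEBDockerrunVersion": 2,
        "containerDefinitions": [
            {
                "name": "backend",
                "image": backend_image,
                "essential": True,
                "memory": 256,  # hard memory limit for the container, in MiB
            },
            {
                "name": "nginx",
                "image": nginx_image,
                "essential": True,
                "memory": 128,
                # Publish Nginx on port 80 of every instance in the environment.
                "portMappings": [{"hostPort": 80, "containerPort": 80}],
                # The link lets Nginx reach the backend container under the hostname "backend".
                "links": ["backend"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(dockerrun_config("example/backend:latest"), indent=2))
```

Because both containers are listed in one file, Elastic Beanstalk runs this pair on every instance of the environment. The Nginx configuration that proxies requests to the backend container would still have to be supplied separately, for example baked into the Nginx image.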
Google Kubernetes Engine
GKE requires a more complex approach, but it brings specific advantages you notice when migrating from AWS EB. Google Kubernetes Engine is built on Kubernetes itself, "an open-source system for automating deployment, scaling, and management of containerized applications." It's not a framework or a library; it's a whole platform. It can be used almost anywhere, and it's entirely infrastructure-independent.
You can use it natively as Google Kubernetes Engine (formerly Google Container Engine), use kubeadm to set up a cluster on bare metal, or use kops, which is the solution for AWS at the moment. AWS is working on EKS, a native solution similar to GKE, but at the time of writing it is still in closed preview. You can even set up everything by hand.
It will work as long as there's a network connection between the instances in the cluster. "So why did you move to Google and GKE when there's kops for AWS?", you may ask. Well, that's what we were thinking back then, but things turned out to be a bit more complicated than we expected. The truth is, GKE is a no-hassle solution.
You can create everything with a setup wizard in the Google Cloud Platform console or with the gcloud command-line tool from the Google Cloud SDK. Then, getting information about the cluster is as simple as looking at the console. Since it's a native solution, there's not much to worry about when creating your first Kubernetes cluster.
If there's a need for advanced configuration, it's right there too. On the other hand, kops is a way of running Kubernetes on AWS. It's not a native solution (hence no official support), and master instances have to be created, deployed and managed along with the nodes. Since the master runs the API server that, together with other components, takes care of the whole cluster, you may not want to be responsible for it.
GKE takes care of the master by default. We had to choose, and we chose GKE because of its simplicity; for us, it was clearly the safer way to go. Oh, there's one more thing: we wanted to add another prominent cloud provider to our stack. We're still waiting to get our hands on AWS EKS and compare it with GKE.
Managing environments and configuration
Kubernetes doesn't work at the instance level the way AWS EB does. Instances form a cluster that Kubernetes operates on, but the smallest deployable unit is the Pod, representing one or more tightly coupled containers. Going up the object hierarchy, there's the ReplicaSet, which consists of Pods and is responsible for holding the Pod template and the desired number of replicas.
ReplicaSets can be standalone objects, but nowadays a Deployment object typically creates a new ReplicaSet with every configuration update. The Deployment then manages scaling, performs rolling updates and offers simple rollbacks. Everything is isolated within the cluster by default.
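The Pod, ReplicaSet and Deployment hierarchy is easiest to see in a manifest. Below is a minimal sketch that builds an apps/v1 Deployment as a Python dict; the name, image, replica count and port are hypothetical values. Kubernetes derives the ReplicaSet and the Pods from this single object.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2, port: int = 8000) -> dict:
    """Return an apps/v1 Deployment that keeps `replicas` identical Pods running."""
    labels = {"app": name}  # the selector must match the Pod template labels
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            # Replace Pods gradually and never drop below the desired replica count.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            # The Pod template: changing it makes the Deployment create a new ReplicaSet.
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. `python this_script.py | kubectl apply -f -`
    print(json.dumps(deployment_manifest("backend", "example/backend:1.0"), indent=2))
```

Changing the image and re-applying the manifest makes the Deployment roll out a new ReplicaSet, and `kubectl rollout undo deployment/backend` returns to the previous one.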
Anything that needs to be exposed is exposed with a Service object, which comes in three main types: ClusterIP (exposed internally, at the cluster level), NodePort (exposed on a specific port number on every node of the cluster) and, finally, LoadBalancer, which provisions a real load balancer if the cluster runs in the cloud.
It's worth noting that when Kubernetes is deployed on AWS with kops, exposing a Service as the LoadBalancer type is a bit more complicated than on GKE, and that was one of the advantages we considered when choosing the cloud provider for our Kubernetes stack.
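Here is a hedged sketch of the three Service types as manifests. The Service name, ports and the `app` label are hypothetical and assume Pods labelled the way a matching Deployment would label them; for NodePort, Kubernetes picks the node port itself (by default from the 30000-32767 range) when none is given.

```python
import json

SERVICE_TYPES = ("ClusterIP", "NodePort", "LoadBalancer")


def service_manifest(app: str, service_type: str = "ClusterIP",
                     port: int = 80, target_port: int = 8000) -> dict:
    """Return a v1 Service exposing Pods labelled app=<app> in one of three ways."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"unsupported Service type: {service_type}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": app},
        "spec": {
            # ClusterIP: reachable only inside the cluster; NodePort: also opened on
            # every node; LoadBalancer: additionally provisions a cloud load balancer.
            "type": service_type,
            "selector": {"app": app},  # must match the Pod labels
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    for kind in SERVICE_TYPES:
        print(json.dumps(service_manifest("backend", kind)))
```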
The different approaches these two services take to environment management are the source of all their advantages and disadvantages. Let's describe a few of them below.
Load Balancers and service exposure
An AWS Elastic Beanstalk environment can consist of just one directly accessible instance: that's the Single instance type. Still, the only kind we consider and use is the scalable, load-balanced one, where everything is placed behind a provisioned Load Balancer (either a Classic or an Application Load Balancer).
The Load Balancer is then responsible for proxying requests to the instances, for health checks and for supporting scaling activity, so instances can easily be created and terminated behind it. Thanks to the Load Balancer, the environment can also be updated with zero downtime.
Sounds okay, but when an application consists of 3 environments (one for QA, one for developers and one for staging), that means 3 different load balancers, each costing you ~$20 a month. And that's only if your application is a monolith. Depending on the chosen approach, there may also be a separate RDS instance for each environment, adding ~$15 a month per environment for the smallest instance type.
This quickly becomes costly. Separating resources is good, and isolation is always good, but it requires new resources each time an environment is added. With GKE, you can create a single Ingress object with a LoadBalancer type Service (which results in a Load Balancer being created on the Google Cloud Platform). All the services for different environments can then be proxied through this Ingress based on paths or domains.
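A single Ingress that routes several environments by domain could look like the sketch below. The host names and Service names are hypothetical, and the manifest uses the current networking.k8s.io/v1 Ingress schema. An Ingress can only reference Services in its own namespace, so this sketch assumes all environments' Services live in one namespace.

```python
import json


def ingress_manifest(name: str, hosts_to_services: dict, ingress_class: str = "nginx",
                     service_port: int = 80) -> dict:
    """Return a networking.k8s.io/v1 Ingress routing each host to its own Service."""
    rules = []
    for host, service in sorted(hosts_to_services.items()):
        rules.append({
            "host": host,
            "http": {
                "paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": service_port}}},
                }]
            },
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # Selects the controller (e.g. "nginx" or "gce") that handles this Ingress.
            "annotations": {"kubernetes.io/ingress.class": ingress_class},
        },
        "spec": {"rules": rules},
    }


if __name__ == "__main__":
    environments = {
        "qa.example.com": "backend-qa",
        "dev.example.com": "backend-dev",
        "staging.example.com": "backend-staging",
    }
    print(json.dumps(ingress_manifest("shared-ingress", environments), indent=2))
```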
The GKE load balancer created this way uses two forwarding rules (one for the default backend and one for the Ingress), and since Google prices load balancing per forwarding rule, it costs ~$35 a month. That's more than a single AWS load balancer, but it can be used for all the environments in your project. Furthermore, it can be used for all the projects in your cluster, so you pay for just one Load Balancer.
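The load balancer arithmetic from the last two paragraphs can be written out as a small sketch. It uses only the approximate figures quoted above (~$20 per AWS load balancer, ~$15 per smallest RDS instance, ~$35 for the shared GKE load balancer) and ignores everything else, including database costs on the GKE side.

```python
# Rough monthly figures quoted in the text (USD); real prices vary by region and over time.
EB_LOAD_BALANCER = 20.0   # one load balancer per Elastic Beanstalk environment
RDS_SMALLEST = 15.0       # smallest RDS instance, if each environment gets its own
GKE_SHARED_LB = 35.0      # one GKE load balancer (two forwarding rules) shared by all environments


def eb_monthly_cost(environments: int, separate_rds: bool = True) -> float:
    """On AWS EB, load balancer (and optional RDS) cost grows linearly with environments."""
    per_env = EB_LOAD_BALANCER + (RDS_SMALLEST if separate_rds else 0.0)
    return environments * per_env


def gke_monthly_lb_cost(environments: int) -> float:
    """On GKE, a single Ingress-backed load balancer serves every environment."""
    return GKE_SHARED_LB if environments > 0 else 0.0


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} env(s): EB load balancers ${eb_monthly_cost(n, separate_rds=False):.0f}, "
              f"GKE shared load balancer ${gke_monthly_lb_cost(n):.0f}")
```

For the three environments in the example, that is $60 for load balancers alone on AWS EB ($105 with a separate RDS instance each) versus $35 on GKE; with these figures, the shared load balancer is cheaper from two environments upward.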
Gone are the days when you had to update certificate files when they reached their expiration date. Nowadays, we all try to automate as much as possible, and that applies here too. AWS EB and GKE offer different ways of SSL termination. On Amazon, SSL certificates (from AWS Certificate Manager) can be free, but they aren't files you can install elsewhere. Each one is just an object you can refer to from CloudFront or, in our case, from a Load Balancer.
Simply add an HTTPS:443 listener, choose the SSL certificate from a drop-down, and you're done. HTTPS requests will be served, and all requests will reach your EB instances on port 80 (the same as for HTTP). Then check the X-Forwarded-Proto header and set up the proper redirects. That's how simple SSL termination is on AWS. And it's free.
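The redirect step can live in Nginx or in the application. As one hedged illustration, here is a small WSGI middleware in Python that redirects plain-HTTP requests based on X-Forwarded-Proto. The function name is our own, and it assumes the app sits behind a load balancer that sets that header.

```python
from urllib.parse import quote


def https_redirect(app):
    """WSGI middleware: redirect requests that reached the load balancer over plain HTTP.

    The load balancer terminates SSL and forwards everything to port 80, recording the
    original scheme in X-Forwarded-Proto (exposed to WSGI as HTTP_X_FORWARDED_PROTO).
    """
    def middleware(environ, start_response):
        if environ.get("HTTP_X_FORWARDED_PROTO", "").lower() == "http":
            host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
            # PATH_INFO is decoded by the server, so re-quote it for the Location header.
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
            location = f"https://{host}{path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += f"?{query}"
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        # HTTPS requests and requests without the header pass through unchanged.
        return app(environ, start_response)

    return middleware
```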
For GKE, it's a bit different. There's no equivalent built-in certificate service you can use. Instead, Google lets you define certificates by uploading your own certificate and key files.
But why would you do that when automation is the key? We dropped this option right after we discovered it. Thankfully, Kubernetes has some neat extensions that, with some effort, can give you free, self-renewing certificates. cert-manager is the way to go: it can be configured to use Let's Encrypt.
It works with Ingress objects handled by different Ingress Controllers; in our case, just Nginx and gce. I have described both Ingress Controllers on my blog, so here we assume you're familiar with them. Since an Ingress can be annotated as Nginx- or gce-based, cert-manager can create the paths needed for ACME challenges, so certification is fully automated.
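A hedged sketch of that setup: a cert-manager ClusterIssuer pointing at Let's Encrypt with an HTTP-01 solver, plus a helper that annotates an existing Ingress so cert-manager requests a certificate for it. The issuer, secret and email values are hypothetical, and the API group (cert-manager.io/v1) and annotation name are those of current cert-manager releases, which may differ from the version we originally used.

```python
LETSENCRYPT_PROD = "https://acme-v02.api.letsencrypt.org/directory"
LETSENCRYPT_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def letsencrypt_issuer(name: str, email: str, ingress_class: str = "nginx",
                       staging: bool = False) -> dict:
    """Return a cert-manager ClusterIssuer that solves ACME HTTP-01 challenges via an Ingress."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": LETSENCRYPT_STAGING if staging else LETSENCRYPT_PROD,
                "email": email,  # Let's Encrypt sends expiry notices here
                # Secret in which cert-manager stores the ACME account key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                # Challenge paths are served through Ingresses of this class (nginx or gce).
                "solvers": [{"http01": {"ingress": {"class": ingress_class}}}],
            }
        },
    }


def request_tls(ingress: dict, issuer: str, hosts: list, secret_name: str) -> dict:
    """Annotate an Ingress so cert-manager issues and renews a certificate for `hosts`."""
    annotations = ingress.setdefault("metadata", {}).setdefault("annotations", {})
    annotations["cert-manager.io/cluster-issuer"] = issuer
    ingress.setdefault("spec", {})["tls"] = [{"hosts": list(hosts), "secretName": secret_name}]
    return ingress
```

With the issuer applied, cert-manager creates a Certificate for the annotated Ingress, stores the issued certificate in the named secret and renews it before it expires.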
The domain you're using must have an A record pointing to the IP of the Load Balancer that exposes the Ingress to the Internet. With an Ingress annotated for gce, the certificate is created as an object in Google Cloud and attached to the load balancer's HTTPS frontend.
If an Ingress uses the NGINX Ingress Controller, it's still exposed through a Google load balancer, but one working at OSI layer 4, unlike the Layer 7 load balancer in the gce case. TCP traffic on ports 80 and 443 is therefore passed straight to the instances, where the controller does the routing and uses cert-manager's Certificate objects to terminate SSL when needed.
Multiple environments and resources utilization
Then there's resource utilization, which directly affects the bill you pay month by month. AWS Elastic Beanstalk operates at the instance level: when an environment scales up or down, new instances are created or terminated, and you pay for each instance. Google Kubernetes Engine (and Kubernetes itself, thanks to how it works) operates on Pods.
Kubernetes can schedule multiple Pods on one instance. If an instance has enough resources, Kubernetes can place Pods from multiple environments on it and keep them reasonably isolated. When it comes to costs, resource utilization is the key.
One instance in Kubernetes can be used for multiple environments or projects, while one instance in AWS EB stands for one "replica" of a single environment of a specific project. Most of the time, there's no need to reserve as many resources as EB does, so in terms of utilization, Kubernetes is clearly the way to go.
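To illustrate the utilization argument, here is a deliberately simplified model: Pods are packed onto nodes by CPU request using first-fit decreasing, whereas on AWS EB every replica of every environment occupies a whole instance. This is not how the Kubernetes scheduler actually places Pods (it also weighs memory, affinity rules and node overhead), and the request sizes are hypothetical.

```python
def nodes_needed(pod_cpu_requests_m, node_capacity_m):
    """Pack Pods onto nodes by CPU request (millicores) using first-fit decreasing.

    A simplified stand-in for the Kubernetes scheduler, which considers far more than CPU.
    """
    free = []  # remaining CPU on each node opened so far
    for request in sorted(pod_cpu_requests_m, reverse=True):
        if request > node_capacity_m:
            raise ValueError(f"a Pod requesting {request}m cannot fit on any node")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request  # place the Pod on the first node that fits
                break
        else:
            free.append(node_capacity_m - request)  # no node fits: open a new one
    return len(free)


def eb_instances_needed(replicas_per_environment):
    """On AWS EB, each replica of each environment occupies a whole instance."""
    return sum(replicas_per_environment)


if __name__ == "__main__":
    # Three environments (QA, dev, staging), two replicas each, 250m CPU per Pod,
    # on nodes with 1000m of allocatable CPU.
    print("Kubernetes nodes:", nodes_needed([250] * 6, 1000))
    print("EB instances:", eb_instances_needed([2, 2, 2]))
```

Under these assumptions, the six Pods fit on two nodes, while the equivalent EB setup needs six instances.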
Deployment
I believe everyone was waiting for this! Honestly, it's like heaven and hell, but I can't tell which one plays heaven's role here. AWS Elastic Beanstalk, being an AWS-specific solution, relies on what AWS's developers have provided. As we mentioned, it's based on a JSON configuration file: you upload the configuration as a zip file, which creates a so-called application version that can then be deployed to specific environments.
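As a sketch of that upload step, the Python below packages Dockerrun.aws.json together with an .ebextensions file into a zip source bundle, using only the standard library. The CloudWatch Logs settings shown (namespace aws:elasticbeanstalk:cloudwatch:logs) are an assumption about what such an .ebextensions file could contain, not our exact configuration, and the file name inside .ebextensions is hypothetical.

```python
import io
import json
import zipfile

# Hypothetical .ebextensions file enabling CloudWatch Logs streaming for the environment.
CLOUDWATCH_LOGS_CONFIG = """\
option_settings:
  aws:elasticbeanstalk:cloudwatch:logs:
    StreamLogs: true
    DeleteOnTerminate: false
    RetentionInDays: 7
"""


def build_source_bundle(dockerrun: dict) -> bytes:
    """Zip Dockerrun.aws.json plus .ebextensions into an application version source bundle."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Dockerrun.aws.json must sit at the root of the archive.
        bundle.writestr("Dockerrun.aws.json", json.dumps(dockerrun, indent=2))
        bundle.writestr(".ebextensions/cloudwatch-logs.config", CLOUDWATCH_LOGS_CONFIG)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_source_bundle({"AWSEBDockerrunVersion": 2, "containerDefinitions": []})
    print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

The resulting archive would then be uploaded to S3 and registered as an application version, for example with boto3's `create_application_version`, and deployed with `update_environment`; those calls are left out so the sketch runs without AWS credentials.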
AWS EB follows the configuration steps, updates the instances according to the configured deployment policy and reports the results of specific steps in the Events dashboard. Sometimes it's hard to tell what caused a broken deployment: sometimes it's just a missing environment variable, sometimes an invalid command. CloudWatch Logs integration helps here.
A script on each EB instance pushes logs to CloudWatch (with a little help from us via .ebextensions), so we can quickly check our aggregated logs and discover the problem. From the beginning, we've been deploying everything from Jenkins with a Python application we wrote ourselves. GKE, on the other hand, aggregates logs by default.
No additional configuration is needed there, which is good. Deployment, however, is a whole new thing. Kubernetes works with manifest files, usually written in YAML, with a particular structure. In theory, those files are similar to Dockerrun.aws.json from AWS EB, but there are many more options to use and much more configuration to take care of. That's not a bad thing, because it opens new doors.
Compared to GKE, AWS EB is quite limited. We started with a custom Python application for deployment, running on Jenkins (it supported Job and Deployment objects, rollback in case of failure and some neat features for printing logs after deployment). But that was not enough: maintaining a custom script is a hassle, and we also wanted to try something new.
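Our original script isn't reproduced here, but the core of a deploy-and-rollback step like the one described can be sketched in a few lines of Python around real kubectl commands (`apply`, `rollout status`, `rollout undo`). The function names are hypothetical, Job handling and log printing are omitted, and the command runner is injectable so the logic can be tried without a cluster.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_kubectl(args: List[str]) -> int:
    """Run a kubectl command and return its exit code."""
    return subprocess.run(["kubectl", *args]).returncode


def deploy_with_rollback(manifest: str, deployment: str,
                         run: Runner = run_kubectl, timeout: str = "120s") -> str:
    """Apply a manifest, wait for the rollout and undo it if it does not become ready."""
    if run(["apply", "-f", manifest]) != 0:
        return "apply-failed"
    # Blocks until the new ReplicaSet is fully rolled out or the timeout expires.
    if run(["rollout", "status", f"deployment/{deployment}", f"--timeout={timeout}"]) == 0:
        return "deployed"
    # Roll back to the previous ReplicaSet of this Deployment.
    run(["rollout", "undo", f"deployment/{deployment}"])
    return "rolled-back"
```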
Hence, Spinnaker came into play. It's a tool created by Netflix as the successor to Asgard; Google later joined the effort, and the two have been developing it together for a while now. It doesn't support just Kubernetes: it works well with AWS instances too (though it doesn't support AWS EB). Spinnaker supports any Kubernetes cluster regardless of how it was created, so clusters built with kops or kubeadm work well here.
At the moment, we simply build things on Jenkins, push the Docker images to the registries, and Spinnaker then starts its pipeline automatically. We like Spinnaker-based deployment. It's a new experience for us, and if I had to choose between the two deployment approaches today, I would go for Spinnaker rather than anything done on Jenkins.
AWS EB and GKE clearly differ. AWS Elastic Beanstalk is quite simple to start playing with. Kubernetes, however, when done correctly, helps your team reach a different level. It's a pleasure to work with, especially on Google Kubernetes Engine, where everything is meant to work 1:1 as in Kubernetes. Kubernetes does require more involvement at the very beginning and more knowledge in production use, but it pays off.
We currently work with both AWS EB and GKE, because such a stack gives us more flexibility and adapts quickly to different projects' needs. Of course, we can't wait to get access to AWS EKS (Elastic Container Service for Kubernetes): properly integrated with Spinnaker, it would let us deploy to both clouds with the same flow, the same software and the same confidence.
If you're looking for a simple container service to use in production, you can go with AWS EB. But when you have to cater to the whole infrastructure, different projects, changing requirements and overall variability, Kubernetes (and GKE for sure) is the way to go.