Seven Aspects to Successfully Implement a CI/CD Pipeline

Alvaro Castroman
March 29, 2022

Continuous Delivery is the practice of deploying (or being ready to deploy) to production any kind of change, including new features, bug fixes, configuration changes, and experiments.

This deployment is done in a frequent, automatic, repeatable, reliable, and auditable way. When such changes go directly into production, we are strictly doing Continuous Deployment; if there is a manual approval step before deploying, we are doing Continuous Delivery.

When practicing CI/CD, there are several aspects that we believe should be considered for a successful implementation:

#1 – Just do it

Practice makes perfect, so the harder something is for us, the more we should exercise it. It is often difficult to start implementing a CI/CD pipeline, especially for an existing system. The advice is to start with a greenfield project and build the pipeline incrementally. That is, we might add stages or change the pipeline later, but we should be able to deploy to a staging environment before writing a single line of application code. Deploy a hello world program through a hello pipeline, as sketched below.
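
As an illustration, here is a deliberately tiny "hello pipeline" sketched as a Python script. The stage commands are placeholders of our own invention; in practice each stage would live in your CI system's configuration rather than in a script like this:

```python
#!/usr/bin/env python3
"""A hedged sketch of a 'hello pipeline': one command per stage.

The commands below are placeholders, not a real build/test/deploy."""
import subprocess
import sys

STAGES = [
    ("build", ["python", "-c", "print('hello world built')"]),
    ("test", ["python", "-c", "assert 1 + 1 == 2"]),
    ("deploy", ["python", "-c", "print('deploying hello world to staging')"]),
]

for name, command in STAGES:
    print(f"--- {name} ---")
    # Stop the whole pipeline as soon as any stage fails.
    if subprocess.run(command).returncode != 0:
        sys.exit(f"stage '{name}' failed; stopping the pipeline")
```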

This can of course be done for existing systems, but other aspects need to be evaluated first: whether it makes sense to slice or refactor the code beforehand, how good the automatic tests are (if any exist), and so on. A topic for another post.

#2 – Make it simple

As the project grows and more features are implemented, the pipeline grows with it. Like code, we should keep it simple and stick to its purpose: build, test, and deploy to production. Simple. If we need to add stages that do not contribute to those objectives, complex build/test scripts (e.g., with lots of lines or branching), or extra approval steps, then we likely have a design problem in our system. This is an indicator that we should take a step back and analyze why such complexity is needed in the pipeline.

Also, try not to add logic that checks the state of external systems (including other pipelines). Our pipeline should be able to run at any time, independently of other systems. Again, if this is not the case, it might be a symptom of a bad architectural design.

#3 – Test automation

An automatic test suite is a must for successfully implementing a CI/CD pipeline. In other words, it is not possible to practice Continuous Delivery without automatic tests. Furthermore, high-quality, well-written tests are needed to ensure the correctness of every change that gets deployed to production and to gain confidence in the process. That is, if a change passes the tests, it is likely that the new version does not break anything.
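
As a minimal illustration, even a smoke test like the following can gate the deploy stage of the pipeline. The staging host and the /health endpoint are assumptions for the sketch, not something prescribed by this post:

```python
# test_smoke.py -- a hedged sketch; run with pytest (requires `requests`).
import requests

STAGING_URL = "https://staging.example.com"  # hypothetical staging host


def test_health_endpoint_is_up():
    # Fail the pipeline fast if the freshly deployed service does not answer.
    response = requests.get(f"{STAGING_URL}/health", timeout=5)
    assert response.status_code == 200
```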

How to write such tests is out of the scope of this post but here are a few tips to be mindful of:

#4 – Continuous Integration (CI) practices

It is obvious that to implement Continuous Delivery we need to successfully implement Continuous Integration first. And that implies:

#5 – Automate and source control everything

Everything is code. We apply Clean Code practices to our code, and we should do the same to our tests (see #3), config files, data, and infrastructure. Any configurable aspect of our system should be source controlled for every environment we deploy to. That includes our software and the underlying infrastructure required to run it. For data, we should source control all schema changes, as well as any data not generated by users (e.g., constant reference data), as in the migration sketch below.
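
For example, a schema change and its constant data can live in the repo as a versioned migration. This is a hedged sketch with hypothetical table and column names; tools like Alembic or Flyway provide the same idea with more safety:

```python
# migrations/0002_add_order_status.py -- a sketch of a source-controlled
# migration; the `orders` and `order_status` tables are hypothetical.


def upgrade(cursor) -> None:
    cursor.execute("ALTER TABLE orders ADD COLUMN status TEXT DEFAULT 'new'")
    # Constant, non-user-generated data is versioned alongside the schema.
    cursor.execute(
        "INSERT INTO order_status (code) VALUES ('new'), ('paid'), ('shipped')"
    )


def downgrade(cursor) -> None:
    cursor.execute("ALTER TABLE orders DROP COLUMN status")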

Infrastructure as Code (IaC) is highly recommended. No need to reinvent the wheel here: you can leverage existing IaC tooling such as Terraform, AWS CloudFormation, or the CDK.
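
As a taste of what this looks like, here is a minimal CDK v2 sketch in Python that defines a single S3 bucket; the stack and bucket names are hypothetical (requires `pip install aws-cdk-lib`, deployed with `cdk deploy`):

```python
# app.py -- a hedged CDK sketch; the infrastructure lives in the repo,
# versioned like any other code.
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct


class ArtifactsStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # A versioned bucket for build artifacts (hypothetical example).
        s3.Bucket(self, "BuildArtifacts", versioned=True)


app = App()
ArtifactsStack(app, "ArtifactsStack")
app.synth()
```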

Ideally, one repo should contain everything needed to run the system: the code, the tests, the configs, the data, and the infrastructure (including the pipeline itself). There might be exceptions where it makes sense to separate the IaC from the code, but make sure to have a good reason to do so.

#6 – Use blue/green deployments, canaries and feature toggles

If we use IaC on a cloud provider, the ability to do blue/green deployments essentially comes for free, since it can simply be configured. It is normally implemented as a change at the DNS level or at the load balancer, depending on what we are using. Either way, we can go as far as spinning up a new blue environment and destroying the current green one, or vice versa, seamlessly on every commit and within seconds, without any manual intervention and with confidence that the impact of any breakage will be minor. With a load balancer, for instance, the switch can be as small as re-weighting two target groups, as sketched below.
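
This boto3 sketch shifts all traffic from a blue target group to a green one in a single API call; the ARNs are placeholders you would replace with your own:

```python
# A hedged sketch of a blue-to-green cutover on an AWS load balancer.
import boto3

LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/..."      # placeholder
BLUE_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/blue"   # placeholder
GREEN_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/green" # placeholder

elbv2 = boto3.client("elbv2")
elbv2.modify_listener(
    ListenerArn=LISTENER_ARN,
    DefaultActions=[{
        "Type": "forward",
        "ForwardConfig": {
            # Send 100% of traffic to green; flip the weights to roll back.
            "TargetGroups": [
                {"TargetGroupArn": BLUE_TG_ARN, "Weight": 0},
                {"TargetGroupArn": GREEN_TG_ARN, "Weight": 100},
            ]
        },
    }],
)
```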

To gain even more confidence we can use canary releases and feature toggles, two complementary techniques that also play very well with blue/green deployments.

Going live with every commit does not mean releasing a new feature to users every time. The code implementing the feature (completely or partially) can be live, and therefore integrated, but not necessarily exposed. In other words, it can be hidden behind a toggle, accessible only to those who have toggled it on. The toggle can be implemented by hard-coding a simple if statement (remember that every commit can go live, so we can remove the hard-coded toggle in any future commit). It can also be implemented via configuration, or be as sophisticated as an external system with its own interface and pipeline. Again, favor simple solutions and avoid complexity when possible. A minimal sketch follows.
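
Here is the simple end of that spectrum: a config-driven toggle. The flag name and the two checkout flows are hypothetical; the hard-coded variant would just replace the environment lookup with a constant:

```python
import os


def legacy_checkout_flow(cart: list) -> str:
    return f"legacy checkout of {len(cart)} items"


def new_checkout_flow(cart: list) -> str:
    return f"new checkout of {len(cart)} items"


def new_checkout_enabled() -> bool:
    # Config-driven toggle: flip via an environment variable, no redeploy
    # of logic needed (or hard-code True/False for the simplest version).
    return os.environ.get("NEW_CHECKOUT", "off") == "on"


def checkout(cart: list) -> str:
    # The new code is live (integrated) but hidden until toggled on.
    if new_checkout_enabled():
        return new_checkout_flow(cart)
    return legacy_checkout_flow(cart)


print(checkout(["book", "pen"]))
```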

Canary releases imply releasing the feature to a reduced number of users. This is not compulsory for Continuous Delivery, but it can be very helpful to detect potential issues and minimize their impact. On the other hand, it is not convenient to keep several versions deployed to different users for long, as that can become a real management headache. So a canary is released and, before long, if no problems are detected, the new release receives all the traffic. Be technically prepared to direct traffic like that, even when not using canaries. One deterministic way to pick the canary audience is sketched below.
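
A common approach, sketched here under our own assumptions about user IDs, is to hash each user into a bucket so the same user consistently sees the same version:

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically bucket users: the same user always lands in the
    same bucket, and roughly `percent` percent land in the canary."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


# Example: route about 5% of users to the canary release.
print(in_canary("user-42", 5))
```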

#7 – Alerting and monitoring

Last but not least, another crucial aspect of implementing Continuous Delivery is how fast the team can react to problems and solve them. Issues will happen, and it is not about avoiding them (which is impossible) but about minimizing their impact and fixing them as soon as possible.

Small, frequent changes are good because they give us control and make it easy to isolate a problem. But we still need to detect problems, and this is where alerting and monitoring of our system come into play.

Furthermore, we want to be ahead of potential issues: instead of finding out that our system does not work because users complained, we want to be alerted when certain metrics breach their thresholds, for instance, when a queue surpasses a certain size, the average request time over the last 5-minute window is too high, or we just got some 500s. A sketch of such an alarm follows.
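
As one concrete example, this boto3 sketch creates a CloudWatch alarm on average request time over the 5-minute window mentioned above. The namespace, metric name, threshold, and SNS topic are hypothetical stand-ins for whatever your system actually emits:

```python
# A hedged sketch of a latency alarm on AWS CloudWatch.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="high-average-request-time",
    Namespace="MyApp",                # hypothetical custom namespace
    MetricName="RequestLatency",      # hypothetical metric your app emits
    Statistic="Average",
    Period=300,                       # the 5-minute window from the text
    EvaluationPeriods=1,
    Threshold=2.0,                    # seconds; tune to your own SLO
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:...:alerts"],  # placeholder SNS topic
)
```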

In any case, and for whatever makes sense in our project, we want to be alerted about issues or potential issues. The suggestion is to address this from the beginning of the system and keep evolving it over time. A good dashboard where we can see the health of our system at a glance, and good metrics that alert us to deviations, are a must.
