Continuous integration on GitLab


Overview

Continuous Integration (CI) is the practice of frequently integrating code changes into a shared repository. It is usually implemented with a mechanism in which every push to the project repository triggers a script that executes a series of user-defined jobs (e.g. building, testing, deploying).

This script is a simple configuration file that lists all the stages the project has to go through; together, these stages form a pipeline. The stages are usually building and testing in the CI part, and potentially review/release and deploy in a Continuous Deployment/Delivery (CD) part. The pipeline is executed on so-called Runners.

There is more to a CI-based workflow than the CI pipeline, but that is beyond the scope of this knowledge base.

A Runner is a piece of software, in our case provided by GitLab, that executes the jobs within a pipeline.
Depending on the GitLab instance, there may be shared Runners that are free to use but often slow. Alternatively, you can install the Runner software on a machine of your choice (even your own laptop) and then specify which projects can use this Runner.

When the script is run, each job either succeeds or fails (due to an error or a user-written test). If the whole pipeline of jobs succeeds, the project is in good condition. For example, the script can be run every time before a merge request (e.g. from a feature branch into the master branch) to check that the changes do not break tested behavior in the master branch.
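The pass/fail logic described above can be sketched in a few lines of Python. This is a simplified model, not GitLab's implementation: stages run in order, and by default, if any job in a stage fails, jobs in later stages are not started and the pipeline is marked as failed. GitLab can run the jobs of one stage in parallel; the sketch runs them one after another for simplicity. The stage and job names are hypothetical.

```python
from typing import Callable, Dict, List

# A job is modeled as a function that returns True on success (simplified model).
Job = Callable[[], bool]


def run_pipeline(stages: List[Dict[str, Job]]) -> str:
    """Run stages in order; return "passed" only if every job succeeds.

    All jobs of a stage are run; if any of them fails, the remaining
    stages are skipped and the pipeline is reported as failed.
    """
    for stage in stages:
        results = {name: job() for name, job in stage.items()}
        failed = [name for name, ok in results.items() if not ok]
        if failed:
            print(f"Failed jobs: {', '.join(failed)}; skipping remaining stages")
            return "failed"
    return "passed"


if __name__ == "__main__":
    build = {"compile": lambda: True}
    test = {"unit_tests": lambda: True, "regression_tests": lambda: False}
    deploy = {"publish": lambda: True}
    # Prints the "Failed jobs: regression_tests; ..." line, then "failed".
    print(run_pipeline([build, test, deploy]))
```

In a real pipeline, keywords such as `allow_failure` or `when` can change this default behavior for individual jobs.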

A last, optional step is to publish visual results automatically on a website. This is where GitLab Pages comes in: it provides free web hosting for simple static websites. This is described in Option b.

Option a: using artifacts

After a CI pipeline has been run by GitLab, the artifacts can be downloaded from the GitLab website under CI/CD > Pipelines. They are used to inspect the results of the pipeline. Artifacts are usually deleted after a certain amount of time (e.g. 30 days). If you want to keep them, for example because they belong to a project milestone, they need to be saved and tied to the corresponding commit.

If the code is archived in a repository, the artifacts should also be archived and cross-linked to it. The data items are cross-linked using the metadata schema of the respective repository.
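One way to keep artifacts beyond their expiry and tie them to a commit is to pack them into an archive named after the commit and store a small metadata file next to it. The sketch below assumes it runs inside a CI job, where GitLab provides the predefined variables CI_COMMIT_SHA and CI_PROJECT_URL; the file layout and the metadata fields are illustrative assumptions, not the metadata schema of any particular repository.

```python
import json
import os
import shutil
from pathlib import Path


def archive_artifacts(artifact_dir: str, out_dir: str, commit_sha: str) -> Path:
    """Zip artifact_dir into out_dir and tie the archive to a commit.

    The archive is named after the first 8 characters of the commit hash,
    and a JSON sidecar records the full hash. The metadata fields are
    illustrative assumptions, not a standard repository schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    base = out / f"artifacts-{commit_sha[:8]}"
    archive = Path(shutil.make_archive(str(base), "zip", root_dir=artifact_dir))
    metadata = {
        "commit": commit_sha,
        "archive": archive.name,
        # CI_PROJECT_URL is a GitLab predefined variable; empty outside CI jobs.
        "project_url": os.environ.get("CI_PROJECT_URL", ""),
    }
    (out / f"{base.name}.json").write_text(json.dumps(metadata, indent=2))
    return archive
```

Inside a job, a call such as `archive_artifacts("results", "archive", os.environ["CI_COMMIT_SHA"])` (the folder names are hypothetical) produces the archive and its metadata file, which can then be uploaded to the repository holding the code.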

Option b: creating a blog-post website

For this guide the RWTH Aachen private GitLab instance will be used.

For this setup, we need two repositories in GitLab:

  1. The first repository hosts the actual project (project repository).
  2. The second repository hosts the static website used to publish visualizations of the results (pages repository).

In every GitLab repository, CI pipelines are enabled by default. The only things needed are a file named .gitlab-ci.yml in the root directory of the project and, of course, an available Runner.

Whenever there is a new commit (a change in the code), the pipeline of the project repository automatically starts executing all the jobs defined in the .gitlab-ci.yml file, such as building binaries, executing Python code, testing and publishing results. This behavior can be changed, e.g. so that the pipeline runs only for commits to a specific branch or according to other rules, to avoid overloading the Runner.
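In .gitlab-ci.yml, such restrictions are normally expressed with the `rules:` keyword, which prevents the job from being created at all and therefore actually saves Runner time. As a complementary illustration, the Python sketch below shows a guard inside a job script: it reads GitLab's predefined variable CI_COMMIT_BRANCH and skips the expensive work unless the branch matches an allowed pattern. The branch patterns are hypothetical.

```python
import fnmatch
import os
from typing import Iterable, Optional

# Hypothetical branch patterns; adjust them to your project's branching model.
ALLOWED_BRANCHES = ("master", "release/*")


def should_run(branch: Optional[str], patterns: Iterable[str] = ALLOWED_BRANCHES) -> bool:
    """Return True if the branch matches one of the allowed glob patterns."""
    if branch is None:  # CI_COMMIT_BRANCH is unset, e.g. in tag pipelines
        return False
    return any(fnmatch.fnmatchcase(branch, p) for p in patterns)


def main() -> None:
    branch = os.environ.get("CI_COMMIT_BRANCH")
    if should_run(branch):
        print("Running heavy tests ...")  # placeholder for the actual test command
    else:
        print(f"Skipping heavy tests on branch {branch!r}")


if __name__ == "__main__":
    main()
```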

The results are published through the second repository (pages repository). To do so, this repository is pulled, and the new results are saved in specific folders (_posts and assets/images) under specific names (with the current date and commit hash appended). The updated repository is then pushed upstream, and its own pipeline builds the new version of the website and makes it available online.
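The file-placement part of this step can be sketched in Python as follows. The folder names follow the text above and the post name uses the Jekyll pattern YYYY-MM-DD-title.md; the front matter and the image link format are assumptions. On GitLab Pages a project site is usually served under a sub-path, so the image links may need the site's base URL. The surrounding git steps (pulling the pages repository, then `git add`, `git commit` and `git push`) are not shown.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Iterable, Optional


def add_post(pages_repo: str, commit_sha: str, images: Iterable[str],
             day: Optional[date] = None) -> Path:
    """Write a Jekyll-style post and copy result images into a pages repository.

    File names follow the pattern <date>-<short hash>, e.g.
    _posts/2024-01-31-1a2b3c4d.md and assets/images/2024-01-31-1a2b3c4d-plot.png.
    """
    day = day or date.today()
    stem = f"{day.isoformat()}-{commit_sha[:8]}"
    repo = Path(pages_repo)
    posts = repo / "_posts"
    img_dir = repo / "assets" / "images"
    posts.mkdir(parents=True, exist_ok=True)
    img_dir.mkdir(parents=True, exist_ok=True)

    # Minimal front matter (assumed); real themes may need e.g. a layout field.
    lines = ["---", f"title: Results for commit {commit_sha[:8]}", "---", ""]
    for img in map(Path, images):
        target = img_dir / f"{stem}-{img.name}"
        shutil.copy(img, target)
        # Link relative to the site root (assumes the site is served from /).
        lines.append(f"![{img.stem}](/assets/images/{target.name})")
    post = posts / f"{stem}.md"
    post.write_text("\n".join(lines) + "\n")
    return post
```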

Workflow

The following diagram depicts the stages of the pipeline in yellow:


As the diagram shows, the main steps are building the software, running tests (e.g. numerical calculations), and processing and visualizing the results. Afterwards, the results can be handled with Option a or Option b.

Because we also want to visualize the results of failed tests, we separate the tests into:
a. Data generation tests (numerical codes generate data)
b. Data agglomeration tests (Jupyter notebooks crunch numerical data and generate diagrams that tell us where and what went wrong)
c. Logical tests (Python tests that check the data from step b and return PASS / FAIL).
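A logical test of type c can be sketched as a small Python function. It assumes, hypothetically, that an earlier step wrote its results to a CSV file with the columns name and value, and compares them against reference values within a relative tolerance. Because the check only reads data that already exists, the data and diagrams from the earlier steps remain available even when this test reports FAIL.

```python
import csv
import math
from typing import Dict


def load_values(path: str) -> Dict[str, float]:
    """Read a CSV with the (assumed) columns name,value into a dict."""
    with open(path, newline="") as f:
        return {row["name"]: float(row["value"]) for row in csv.DictReader(f)}


def check_results(result_csv: str, reference: Dict[str, float],
                  rel_tol: float = 1e-6) -> bool:
    """Return True (PASS) if every reference quantity matches within rel_tol."""
    results = load_values(result_csv)
    ok = True
    for name, expected in reference.items():
        actual = results.get(name)
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            print(f"FAIL {name}: expected {expected}, got {actual}")
            ok = False
    print("PASS" if ok else "FAIL")
    return ok
```

In a job script, exiting with a non-zero status when `check_results` returns False is what makes GitLab mark the job as failed.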

Steps for creating a CI Pipeline

  1. Create a GitLab repository for your project on TU GitLab via New Project (project repository).

  2. If you chose Option b: create a GitLab Pages repository on TU GitLab for publishing the test results (pages repository). Follow the guide on how to fork and set up a Jekyll project, or on how to set up a Hugo project (which is what we do).

  3. If you chose Option b: enable automatic deployments from the project repository to the pages repository using an SSH key pair (How to set up SSH keys for automatic deployments).

  4. You need a GitLab Runner that runs your tests. It needs to be enabled and registered (How to Set up a GitLab Runner).
    If you chose Option b, enable it for both projects.

  5. The last step is to create a .gitlab-ci.yml file, which defines the jobs the pipeline actually executes.
    How to create a GitLab CI YAML configuration.
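For orientation, a minimal .gitlab-ci.yml for the build, test and publish steps of the workflow might look like the YAML embedded in the Python sketch below, which simply writes it to the repository root. The job names, the commands (make, pytest, publish_results.py) and the branch name are hypothetical placeholders; stages, stage, script, artifacts (with when, paths, expire_in) and rules are standard GitLab CI keywords. `when: always` is used so that results are kept and published even when tests fail.

```python
from pathlib import Path

# Minimal pipeline for the build -> test -> publish workflow.
# Job names, commands and the branch name are hypothetical placeholders.
GITLAB_CI_YML = """\
stages:
  - build
  - test
  - publish

build:
  stage: build
  script:
    - make

test:
  stage: test
  script:
    - python -m pytest tests/
  artifacts:
    when: always        # keep results of failed tests, too
    paths:
      - results/
    expire_in: 30 days

publish:
  stage: publish
  script:
    - python publish_results.py
  rules:
    - if: $CI_COMMIT_BRANCH == "master"
      when: always      # publish even if the test stage failed
"""


def write_ci_config(repo_root: str = ".") -> Path:
    """Write GITLAB_CI_YML to <repo_root>/.gitlab-ci.yml and return the path."""
    path = Path(repo_root) / ".gitlab-ci.yml"
    path.write_text(GITLAB_CI_YML)
    return path
```

Running `write_ci_config()` from the root of the project repository creates the file; committing and pushing it triggers the first pipeline.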

Where to start

If you want to read further, go to the literature.
To start working, go back to the previous section: Create your CI pipeline.

You can also get a list of our articles assigned to this chapter via the tags at the top of every page.

See also