Below is an overview of some terms and abbreviations.
An application programming interface (API) enables communication between a piece of software and the operating system or, more generally, between two pieces of software. An example is the command open.
Linux exists in various variants, called distributions. One such distribution is Arch, another is Ubuntu. Arch is very up to date but can require significant maintenance from the user, whereas Ubuntu prioritizes low maintenance over being bleeding edge.
In the GitLab context, artifacts are files and directories that are produced during a pipeline.
They can be downloaded as a zip archive. You have to declare in the .gitlab-ci.yml file which files and folders should become artifacts.
More generally, artifacts are the results of some activity, for example code or results from a project, or a document emerging from a discussion.
Bash is a programming language mainly used for scripts and has more functionality than the “normal” shell. Bash is an acronym for Bourne Again SHell.
A branch is a line of development in your git repository. There are different styles of using branches; please refer to our guide or the literature.
A branch typically consists of multiple commits that originate from one another and can thus be displayed as a line. The name of the branch is essentially the name of this line of commits and points to the latest commit.
In practice this means that if you want to develop a new feature, you can create a new branch, make your changes and add as many commits to your branch as you like. When you are satisfied with your work, you merge your branch into the master/main branch, so that your work becomes part of the master/main branch (and available to other users).
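The idea that a branch name is only a pointer to the latest commit of a line of commits can be sketched with a small toy model. This is not how git stores its data internally; the commit ids (`c1`, `c2`, `c3`), the branch names and the helper functions are made up for illustration.

```python
# Toy model (not git's real data structures): each commit knows its parent,
# and a branch name is only a pointer to the latest commit on that line.
commits = {}    # commit id -> parent commit id (None for the first commit)
branches = {}   # branch name -> id of the latest commit


def commit(branch, commit_id):
    """Add a commit on top of the branch and move the branch pointer to it."""
    commits[commit_id] = branches.get(branch)
    branches[branch] = commit_id


def create_branch(new_branch, from_branch):
    """A new branch starts as a second pointer to the same commit."""
    branches[new_branch] = branches[from_branch]


def history(branch):
    """Follow the parent links from the branch tip back to the first commit."""
    line, current = [], branches[branch]
    while current is not None:
        line.append(current)
        current = commits[current]
    return line


commit("main", "c1")
commit("main", "c2")
create_branch("feature", "main")
commit("feature", "c3")

print(history("main"))     # ['c2', 'c1']
print(history("feature"))  # ['c3', 'c2', 'c1']
```

Creating the feature branch copies nothing; only the new commit `c3` is added, and the main branch still points to `c2` until the branches are merged.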
Build testing checks whether the code compiles correctly on different platforms. It is the first test to run.
Building is the process of transferring one representation into another. This might be
A ‘build’ is a specific, compiled version of your program.
The term building is often used synonymously with compiling, but strictly speaking compiling is only one part of building. Building features:
A build system handles the platform-specific details of the compilation and passes them to the compiler. CMake is a popular build system.
A clone is a copy. If you clone a repository, you create a copy of it on your computer that keeps a link to the remote repository.
For more information please refer to the literature.
A commit is a snapshot of your repository, a specific version of your files. Each commit, except the very first one, has a previous commit, and it can have subsequent ones. Commits belong to one or more branches.
Every commit is assigned a unique identifier, a ‘hash’. This hash is a 40-character SHA-1 checksum; the algorithm behind it is beyond the scope of this Knowledge Base. For brevity, hashes are often abbreviated to their first few characters.
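To get a feeling for what such a hash looks like, the following minimal sketch computes the identifier git would assign to the content of a file (a so-called blob). Commit hashes are built in the same way, but over a commit object that also contains the author, the message and the parent commit, which is not reproduced here. The function name `git_blob_hash` is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 identifier git assigns to a file with this content."""
    # git prepends a short header ("blob <size>" and a zero byte) before hashing.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


full_hash = git_blob_hash(b"hello world\n")
print(full_hash)       # 40 hexadecimal characters
print(full_hash[:7])   # abbreviated form, as often shown by git tools
```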
Refer to the literature for more in-depth knowledge.
Compiling is the process of translating a human-readable representation into a machine-readable one. The terms compiling and building are often used interchangeably.
You can distinguish building and compiling as follows:
The compile time is the part of the build time in which your code is translated, typically into object files (.o). After the compile time, these files are further processed (linked) into libraries, executables, etc.
Popular open-source compilers are Clang and gcc. A popular closed-source compiler is the Intel C++ compiler.
A container encapsulates a program with all its dependencies. It can be run on different machines (with e.g. Singularity installed) and should produce the same output everywhere. Because it contains the software you need, you do not have to install anything else on the machine you want to run your code on (e.g. an HPC cluster). For the same reason, containers are great for reproducibility and archiving.
Continuous Integration (CI) describes the practice of frequently integrating code changes into a shared repository. It is usually implemented with a mechanism in which, every time someone pushes code to a project repository, a script is triggered that executes a series of jobs (e.g. building, testing, deploying) specified by the user.
This script is a simple file that contains all the stages the project has to go through. This is called a pipeline. The stages are usually building and testing in the CI part and potentially review/release and deploy in a Continuous Deployment/Delivery (CD) part. The pipeline is executed on so-called Runners.
There is more to a CI-based workflow than the CI-pipeline but that is beyond the scope of this knowledge base.
Docker is a container software. It is basically similar to Singularity.
There is also Docker Hub, a repository for container images. In the context of pulling an image for a container, it is sometimes simply referred to as Docker.
A DOI is a ‘Digital Object Identifier’ and is also a PID. It can consist of numbers and letters. A DOI can be resolved through a URL (https://doi.org/ followed by the DOI), which makes the object very easy to find.
“The Engauge Digitizer tool accepts image files (like PNG, JPEG and TIFF) containing graphs, and recovers the data points from those graphs.” Source
Basically, a forked project is a clone of a project that lives on the server instead of on your local computer. The fork and the original project remain linked.
For more information please refer to the literature.
A friend class has access to the private and protected members of the class it is a friend of.
A detailed explanation for C++ is given at cppreference.
gcc is the GNU C Compiler. It is part of GCC, the GNU Compiler Collection, which also contains compilers for other languages like C++, Objective-C, Fortran, Ada and Go.
Note that we distinguish gcc in lowercase letters from GCC in capital letters. This distinction is not always made, neither in our articles (which come from different authors) nor on the rest of the internet.
GNU is an operating system as well as a project.
git is a version control system, a program that lets you track versions of your code or of basically any other (text) files.
git is a distributed version control system with which you can keep track of all changes in your code base, alone or in collaboration with others. You can have multiple branches, which are different versions of your code that “live inside your folder simultaneously”. Please refer to the literature and our guide on version control.
For more information please refer to the literature.
Git tags can be used to mark important commits in your git history. They are often used to mark releases, e.g. ‘v2.3.5’.
GitLab is a collaboration platform for software development based on git. There is a public instance as well as private ones, like the one this website is running on: https://git.rwth-aachen.de. Other popular platforms are GitHub and BitBucket.
GitLab offers a GUI and features like issues, and it can deploy websites such as this one.
For more information please refer to the literature.
A GitLab Runner is the program that actually runs the code defined in the .gitlab-ci.yml file.
The .gitlab-ci.yml file sets up the CI pipeline.
The GitLab Runner executes this pipeline for every commit that is pushed to the GitLab server. It is a program provided by GitLab.
A Runner can be dedicated just for a certain project, or it can be shared between many projects. After it finishes running all the actions defined in the YAML file, it sends the results back to GitLab.
For more information please refer to the literature.
A header file typically contains declarations (they “introduce” a function and its type to the compiler) for the functions of a program that are to be included in the API. It is important to the compiler, which generates the library files.
Header files are written in the corresponding programming language. They are written by the programmer.
Sometimes the header file also contains the function definitions (the actual implementation of the functions).
You can have multiple header files.
HPC stands for High-Performance Computing. Sometimes it also refers to an HPC machine or cluster.
Typical HPC applications are simulations that run on an HPC cluster like the Lichtenberg Cluster.
HUGO is a framework for building static websites.
Like Jekyll, HUGO is a framework that translates articles written in Markdown into HTML code. This Knowledge Base is built with HUGO.
Refer to our literature for additional information.
The distinction between ‘image’ and ‘container’ can be a tough one and is not always made strictly.
Basically, a container is a running image (file). As you may have noticed, Singularity stores its container images as .sif files (Singularity Image Format). If you run an image, you create a container.
Since you can run the image multiple times, you can have multiple containers from the same image.
Admittedly, the distinction between the two terms is often not made, and image files are often called containers. This may be true for this Knowledge Base too. One possible reason for this loose usage is that virtual machines (VMs) also work with image files; calling a Singularity file a ‘container’ makes clear that it is not related to any VM.
In Jupyter notebooks, program code, program output and text can alternate. The text is usually explanatory.
Jupyter notebooks are like interactive scripts in which you can watch your code run live and step by step. Between the steps you can add text that explains what you are doing.
This is great for tutorials or result-/code-documentation.
Please refer to jupyter.org, where you can try it in your browser.
The Kanban system tracks tasks as ‘To Do’, ‘In Progress’ or ‘Done’. These tasks can be GitLab issues.
The Kanban board (where all the tasks are listed in their categories) can of course be virtual. More categories than the three ‘To Do’, ‘In Progress’ and ‘Done’ are possible.
LaTeX is used to render math from text files like .md.
\begin{equation}
c = \frac{\log( \frac{e_2}{e_1} )}{\log( \frac{h_2}{h_1} )}
\end{equation}
is rendered as $$ c = \frac{\log( \frac{e_2}{e_1} )}{\log( \frac{h_2}{h_1} )} $$
A library file contains executable machine code. The functions therein are usually declared in the header files.
The library file may contain more code than is declared in the (public) header file.
Lichtenberg is the HPC system at TU Darmstadt. You need to be granted access to it.
You can find more information on the website of the TU Darmstadt or of the Hessian Competence-center for HPC.
The local repository is the git repository on your machine. It stands in contrast to the remote repository, which is usually the GitLab repository.
Markdown is a simple markup language that lets you write formatted text, e.g. bold text or headings, relatively easily.
Markdown is used for the articles in this Knowledge Base and is “translated” by HUGO into HTML code. It is relatively easy to learn and to read, even “raw”.
It supports lists, checkboxes, headings, quotes, simple text formatting (bold, italic and strike-through) and code. For more advanced formatting see e.g. this link.
A merge request (MR) is a request by a GitLab user to an eligible member of the GitLab repository to merge two branches.
When it is marked as a ‘Draft’, the merge request cannot be merged accidentally. It is then a good ‘place’ to review your changes and discuss them with your colleagues. See our articles related to MRs.
Metadata are data about how your data emerged. This can be the method you used for processing, the date on which you accessed something on the internet, who the author is, etc. Almost all files contain metadata.
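As a small illustration, the file system itself records some metadata for every file, such as its size and the time of the last modification. The sketch below reads these two values with Python's standard library; the file content and the dictionary keys are placeholders chosen for this example.

```python
import os
import tempfile
from datetime import datetime

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("velocity = 1.5\n")
    path = handle.name

# os.stat returns the metadata the file system keeps about the file.
info = os.stat(path)
metadata = {
    "size_in_bytes": info.st_size,
    "last_modified": datetime.fromtimestamp(info.st_mtime).isoformat(),
}
print(metadata)

os.remove(path)  # clean up the throwaway file
```

Metadata that describe the scientific context (method, author, parameters) are not recorded automatically in this way and have to be documented by you.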
An overlay provides the possibility to store files created within the otherwise immutable (non-writable) Singularity container.
We describe how to set up an overlay in this article.
Pandas is an open source Python data analysis tool.
See also our articles
A parameter variation is called a study. In general, multiple studies are conducted. Each study can contain hundreds of simulations, so-called runs.
Refer also to the image found here.
The study runner is executed by a study notebook, which is a Jupyter notebook.
These study notebooks document their respective parameter study with graphical and mathematical descriptions, including equations, initial and boundary conditions and physical parameters. The study notebooks also process secondary data.
Refer also to the image found here.
A study runner varies the parameters in a parameter study and runs the simulations.
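What a study runner does can be sketched in a few lines: it builds every combination of the parameter values and starts one run per combination. The parameter names, their values and the `run_simulation` function below are hypothetical placeholders, not part of any actual study runner.

```python
from itertools import product

# Hypothetical parameters of a study; names and values are placeholders.
parameters = {
    "resolution": [16, 32, 64],
    "viscosity": [1e-3, 1e-2],
}


def run_simulation(resolution, viscosity):
    """Stand-in for a real simulation; returns a made-up result record."""
    return {"resolution": resolution, "viscosity": viscosity, "status": "done"}


# One run per combination of parameter values.
names = list(parameters)
runs = [
    run_simulation(**dict(zip(names, values)))
    for values in product(*parameters.values())
]

print(len(runs))  # 3 resolutions x 2 viscosities = 6 runs
```

In a real study, each run would typically be submitted as a job to the HPC cluster instead of being executed in a loop.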
A PID is a persistent/permanent identifier. DOIs and ISBNs (International Standard Book Numbers) are PIDs.
In Continuous Integration (CI) a pipeline contains all stages and jobs that are run when the code changes. There are other triggers for a pipeline as well, like merge requests, a schedule or manual activation.
The stages contained are usually building and testing in the CI-part and potentially review/release and deploy in a Continuous Deployment/Delivery (CD) part.
A pipeline is usually only successful if all jobs that belong to it run successfully.
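The relation between stages, jobs and the overall result can be illustrated with a toy model. It assumes the common default that stages run one after another and that later stages are skipped once a job has failed; the stage names, job names and outcomes are made up.

```python
# Toy model of a pipeline: stages run in order, each stage holds jobs.
# Job names and outcomes (True = job succeeded) are made up for illustration.
pipeline = [
    ("build", {"compile": True}),
    ("test", {"unit-tests": True, "smoke-tests": False}),
    ("deploy", {"publish": True}),
]


def run_pipeline(stages):
    """Return (success, executed_stage_names); stop after a failing stage."""
    executed = []
    for stage_name, jobs in stages:
        executed.append(stage_name)
        if not all(jobs.values()):
            return False, executed
    return True, executed


success, executed = run_pipeline(pipeline)
print(success, executed)  # False ['build', 'test']
```

Because one job in the test stage fails, the pipeline as a whole fails and the deploy stage is never reached.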
See also our guide on CI and the literature.
Post-processing is the treatment of the results (the primary data) of a computation. This includes calculating derived quantities from your simulation results, visualizing your results in pictures or movies and generating plots and graphs. Secondary data are always post-processed.
Primary data are the results that have not been post-processed. They tend to be large in file size.
Their counterpart is the secondary data.
Production (level) tests are the largest and last tests to run during code development, following the smoke tests. They most likely require HPC resources to run adequately.
For scientific software, production tests are often not additional tests, but simply smoke tests with the high input resolution that is used in production, as well as parameter studies.
A project notebook is a Jupyter notebook that links all the study notebooks.
After starting the Jupyter server remotely on the HPC server, it can be viewed live locally in a browser.
pull is a git command that is used to download changes from the remote repo.
See also our version-control guide or the literature.
The opposite command is push.
push is a git command that is used to upload your local changes (i.e. push them to the remote repo or another repo). See also our version-control guide or the literature.
The opposite command is pull.
The remote repository usually is the GitLab repository. It stands in contrast to the local repository, which is your local copy of the repository on your computer.
Rendering means calculating graphical output from ‘raw’ input, for example producing an image of the velocity field from simulation data, or a website from text files.
RSA is an asymmetric encryption technique. This means there is a public key and a private key. We use them to authenticate via SSH.
Other public-key algorithms used in the SSH context are e.g. DSA, ECDSA and EdDSA.
Ruby is a programming language. It is object-oriented.
Strictly speaking, runtime is the time span during which a program is running, from its start until it finishes. Runtime is often also used as a short form of runtime environment.
The runtime environment comprises all the processes and resources that are in use during the execution of a program. They are provided by the runtime system.
Secondary data are the (post-)processed results. This can be plots, tables, images or graphs. They are a vital part of papers and presentations. They are usually small in file size compared to their counterpart, the primary data.
Shell mainly refers to a programming language that can be used for scripts, but it also denotes the command-line interface.
A related language is Bash.
Singularity is a container software. It is basically similar to Docker.
For more information please refer to the literature.
SLURM is a workload manager used to run jobs on HPC systems. It is used e.g. on the Lichtenberg cluster.
Smoke tests check the full functionality of a complex numerical method, but with small input data: they are used to test production-level scientific codes with a very coarse input that enables fast execution on a workstation.
They follow unit testing and precede production testing.
Use the source command to load functions from files. You can use it inside the shell or in a shell script. It can reload the shell configuration, load any functions from a .sh file into your shell (so that you can execute them) or execute the commands in a script directly.
Usually you source your bashrc file, and the syntax for that looks similar to
source etc/bashrc
Structured Query Language (SQL) is a computer language for databases.
SSH keys are used to connect instances (optionally without a password). You use them for authentication, e.g. to set up a connection between your computer and your HPC cluster or a GitLab repository. For the latter, see git-setup.
To automate actions between your computer and GitLab or between different GitLab repositories, you should use SSH keys. If used correctly, they are secure and save you a lot of work.
SSH keys consist of a ‘public key’ and a ‘private key’. The latter should never be shared with others or transmitted via any insecure method. In a setting where you want to connect to GitLab, the private key stays on your computer and is used to sign an authentication challenge. You pass the public key to the GitLab server, which uses it to verify that the signature was created with your private key.
See our articles about setting up git or automatic deployments between two GitLab repositories.
Static checkers analyse your code before it is run. They check e.g. for syntax errors, arguments and function names.
In contrast, dynamic analysis detects errors while the program is running.
An example of a static checker is clang-tidy.
Submodules are basically projects within your git project. These are often themes or libraries. They are embedded within your normal folder structure.
For more information please refer to the literature.
svg files are vector graphics. This means you can “zoom in indefinitely” without seeing pixels.
Themes in HUGO or Jekyll are design and functionality modules. They determine how your articles are displayed on a website. They also create additional pages, such as tag pages.
A list of themes for HUGO can be found at https://themes.gohugo.io/.
In a Merge Request (MR) on GitLab you can discuss aspects of the content of your branch in threads, which work similarly to those on other platforms or forums: one user can comment on something and another can reply to the comment.
Once the discussion has ended, the thread is marked as resolved.
See our article on discussing MRs.
Apart from being resolved/unresolved, what distinguishes threads from comments is:
Linux exists in various variants, called distributions. One such distribution is Ubuntu, another is Arch. Ubuntu prioritizes low maintenance over being bleeding edge, whereas Arch is very up to date but can require significant maintenance from the user.
Unit tests are responsible for testing the smallest algorithmic units with the smallest inputs. This results in low input and output data volumes and a low runtime. You can usually run them on your PC/workstation during development.
They follow build testing and precede smoke testing.
For example, these could be tests that check whether a polynomial interpolation works, tests of root finders, or tests of solutions of small linear systems.
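A unit test for a root finder could look like the following minimal sketch. The bisection function `bisect`, its tolerance and the two test cases are assumptions made for this example; a real project would usually run such tests with a test framework rather than calling them by hand.

```python
def bisect(f, lower, upper, tolerance=1e-10):
    """Find a root of f in [lower, upper] by repeatedly halving the interval."""
    if f(lower) * f(upper) > 0:
        raise ValueError("f must change sign on the interval")
    while upper - lower > tolerance:
        middle = (lower + upper) / 2
        if f(lower) * f(middle) <= 0:
            upper = middle   # the sign change lies in the left half
        else:
            lower = middle   # the sign change lies in the right half
    return (lower + upper) / 2


def test_bisect_finds_square_root_of_two():
    root = bisect(lambda x: x * x - 2, 0.0, 2.0)
    assert abs(root - 2 ** 0.5) < 1e-8


def test_bisect_rejects_interval_without_sign_change():
    try:
        bisect(lambda x: x * x + 1, 0.0, 1.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")


test_bisect_finds_square_root_of_two()
test_bisect_rejects_interval_without_sign_change()
print("all unit tests passed")
```

Both tests use tiny inputs and finish almost instantly, which is what makes them suitable for running on a workstation during development.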
Version control helps to prevent the duplication of files by tracking the changes (differences) between them across commits. It also helps you keep an overview of different versions. A version control system (VCS) makes it possible to keep a single set of files (raw text, LaTeX, Fortran/C++/Python/Bash source code) and to switch between different versions (so-called branches). A popular example is git.
A VCS thus enables you to work in a single folder: when you switch branches, it removes, modifies or adds files according to the state of the branch you are checking out. This can also save a lot of storage on your computer, since you do not need dozens or even hundreds of folders and files that you have to sort manually.
We use vtk files for visualization. The Visualization Toolkit (VTK) is an open-source program for displaying 2D and 3D data from vtk files.
The Visualization Toolkit (VTK) is an open-source program for data visualization. We use vtk files for visualization.