← home

Computing knowledge base

Caveat utilitor.


Shells are programs that interpret commands. They act as your interface to the system by allowing you to run other programs. When you type on your computer’s command line, you are using a shell in interactive mode. You can also write shell scripts to be batch processed.

There are many different shell command languages and shells that understand them. Most operating systems have multiple options, and you can choose which ones to use for scripting and your interactive shell.

sh is the POSIX-specified shell command language. Nearly every operating system has a shell located at /bin/sh that supports it. Modern shells that interpret languages with more features and better syntax than sh often have compatibility modes to interpret sh scripts.

bash is a command language and corresponding shell implementation. It is derived from sh with a number of extensions that make it nicer to use. It is also extremely widespread, but less so than sh. FreeBSD, for example, does not ship with a bash shell by default. You can find a description of the language and shell in the Bash manual.

zsh is a command language and shell, also derived from sh, that is more modern and friendly than bash. It is present on fewer systems than sh or bash, but it is gaining popularity. It has the best interactive mode of the three. You can find a description of the language and shell in the Zsh manual.

Choosing a shell

I recommend using zsh for your interactive shell, where concerns about cross-platform support don’t apply. When you need to write a script, you should choose the language based on where the script needs to work.

If the script is non-trivial and only needs to work on a small set of machines that you control, I recommend using a real programming language. They are much nicer to use, and all major languages have libraries for doing things that shells normally do, like executing subprocesses and setting up pipelines. These libraries also often support invoking a shell directly. You can simply install an interpreter for your language of choice on all the machines that need to run your script, and you’re off to the races.

If the script needs to work absolutely everywhere, then use sh. Otherwise, bash’s improved syntax is likely worth the reduced compatibility.

In the following notes on shell scripting, I assume the use of bash. For correct information about a particular shell or shell command language, read the appropriate manual.

Environment and shell variables

Like all processes, shells operate with a set of environment variables that are key-value pairs. These environment variables are passed to child processes that the shell creates. Shells also have a set of key-value pairs, known as shell variables, that are not passed to child processes. The set of shell variables is a superset of the set of environment variables. You can control environment variables, shell variables, and how they are passed to child processes using the shell command language.

To set a shell variable that later commands and built-in shell functions will see but child processes will not, you can use the syntax varname=value.

To set an environment variable that will also be seen by child processes, you can use the export builtin: export varname=value.

You can also define variables that are passed to a child process’ environment but not set as shell or environment variables in the shell’s process by defining them and executing a child process on the same line: varname=value program.

Here is an example script that shows all of these variable-setting methods in action:

# creates a shell variable
# creates an environment variable
$ export MYENVVAR=goodbye
# prints "hello" because echo is a builtin
# and can see the shell variable
$ echo ${MYSHELLVAR}
# prints nothing because the shell variable
# is not exported to the bash child process
$ bash -c 'echo ${MYSHELLVAR}'
# prints "goodbye" because echo is a builtin
# and can see the environment variable
$ echo ${MYENVVAR}
# prints "goodbye" because the environment
# variable is exported to the bash child process
$ bash -c 'echo ${MYENVVAR}'
# prints "ciao" because MYCHILDVAR is passed
# to the child process' environment
$ MYCHILDVAR="ciao" bash -c 'echo ${MYCHILDVAR}'
# prints nothing because MYCHILDVAR was only set
# for the previous command's child process
$ echo ${MYCHILDVAR}

You can list all shell variables with the set builtin by typing set at the command line with no arguments. You can list all environment variables passed to a child process with the standalone env program by typing env at the command line with no arguments.

Shell variables are used to store information private to the shell, including options that configure shell behavior and user-defined variables. Environment variables are used by all kinds of programs. Here are some canonical ones that are useful to know:

Commands and paths

The first part of a shell command is typically the name of a program to run. When processing the command, the shell creates a child processes and uses it to execute the specified program with specified arguments, input file, and output files.

If the name of the program is given as an absolute or relative path, then the shell executes the program at the specified path. If program is given as a name alone, then the shell searches directories in the PATH environment variable, in order, for an executable file with a name that matches the provided one. It executes the first one it finds.

The which program, when given a program name as an argument, searches directories in the PATH variable and prints the absolute path of the first program it finds with a name that matches the provided one. In other words, it tells you which program the shell would execute for a command started with the given program name.

The env program allows you to modify environment variables before executing a given program. Like the shell, and which, it searches the PATH variable to determine which program to execute. It is often used in shebangs at the top of executable script files as /usr/bin/env <interpreter> so that the script will be executed by the appropriate interpreter without its author having to know the interpreter’s exact path.

Systems aren’t required by POSIX to have env located at /usr/bin/env, but it’s the most portable solution for script shebangs in almost all scenarios. If you have a sh script that you really want to run everywhere, then using /bin/sh might be better.

Input and output

All processes have access to three special files: standard input, standard output, and standard error. By default, when the shell executes a program, it sets up the standard input file to receive keyboard input from the terminal emulator hosting the shell, and it sets up the standard output and error files so that their output is displayed in the terminal emulator.

You can use redirections to change where the input and output for these special files comes from and goes. You can also link multiple processes together using pipes, which hook up the standard output of each program in a pipeline to the standard input of the following one. You can create more sophisticated inter-process communication hookups using the mkfifo command.

You can use braces to group a list of commands together so that their combined output can be redirected as a single unit:

# outfile contains hello and world
{ echo hello; echo world; } > /path/to/outfile

Tips and tricks

Shell utilities

change the working directory
  • to go back to the previous working directory:
    cd -
change the working directory and push the current directory onto the directory stack
push a directory off the directory stack and change the working directory to it
output arguments simply
output arguments with more control
change a user’s interactive shell
locate a program in the user’s path
print environment variables, or modify environment and execute a program
read whitespace-delimited strings from standard input and execute a program with those strings as arguments
copy standard input to standard output and a list of files

Terminal emulators

Terminfo and Termcap are libraries and corresponding databases that let programs use terminals and terminal emulators in a device-independent way. They look up the capabilities of the terminal they are running on, as described by the TERM environment variable, in the databases, and allow programs to alter their behavior accordingly. You can add or modify entries in the databases to control how programs behave with your terminal.

ANSI escape sequences are standardized in-band byte sequences that control terminal behavior.

UNIX tools

In the philosophy of UNIX-like operating systems, programs are meant to be simple tools that do a small set of things well. As the user, you can configure and combine them to achieve your goals. In this section I discuss some important kinds of software in detail and maintain categorized lists of other useful programs.

The system manual is your first port of call for figuring out how software works. You can search the manual for information about a particular tool, system call, or library function by typing man <name> at the command line. Type man man for more information about using the manual.

Some of the below tools are specific to particular operating systems. Check your system’s manual to see whether you have them. You can also use the UNIX Rosetta Stone to translate between OS-level tools across different systems.

Many of these tools aren’t included in default OS installations. You can install them from the command line using your system’s package manager. It’s apt on Ubuntu, brew on macOS, and pkg on FreeBSD.

File creation

On Linux distributions, many essential software tools for file manipulation, such as ls, rm, cat, and mkdir, are from the GNU core utilities project. On flavors of BSD they are maintained and distributed as part of the base system. This means that these tools have different behaviors on the different systems.

update access and modified times; create empty file if it doesn’t exist
change file permissions
change file owner
change file flags
create hard links and symbolic links
  • to create a symbolic link:
    ln -s </path/to/symlink/target> </path/to/new/symlink>
copy files to local or remote destination

File searching, viewing, and editing

terminal file manager
find lines in a file with contents that match a regular expression
  • to find matching lines:
    grep <regular expression> </file/to/search>
  • -c prints the number of matching lines
  • -C[=num] prints lines of leading and trailing context
  • -e is useful to specify multiple regular expressions
  • -E enables extended regular expressions (special meanings for characters like ? and +)
  • -n each match is preceded by its line number in the file
  • -r recursively search subdirectories of a directory
  • -v select lines that don’t match the given expressions
rg (ripgrep)
faster and more powerful version of grep
search for files in a file hierarchy
  • to search for files with a sh extension:
    find </path/to/directory> -name "*.sh"
  • -name search by name
  • -type search by file type
  • -mtime search by modification time
faster and simpler version of find
fast generic fuzzy finder, good integrations with vim and shell history
display the last part of a file
  • to wait for and display additional data as it is appended to a file:
    tail -f </path/to/file>
list open files
text editor
nvim (neovim)
more modern, mostly backwards-compatible version of vim
create a hex dump of a binary file, or create a binary file from a hex dump
view and edit binary files in hexadecimal and ASCII
open a file in the corresponding default application on macOS
open a file in the corresponding default application on Linux or FreeBSD

File processing

create and manipulate archive files
  • to create a tar archive with gzip compression:
    tar -czvf </path/to/output.tar.gz> </path/to/input/files>
  • to extract a tar archive with gzip compression, so that the archive contents will be placed into an existing output directory:
    tar -xzvf </path/to/input.tar.gz> -C </path/to/output/directory>
video and audio converter
magick (imagemagick)
convert and edit images
universal document converter
stream edit files
pattern-directed file processing
print selected portions of each line of a file
merge corresponding lines of input files
report or filter out repeated lines in a file
sort files by lines

System administration

change a user’s password
list user groups
add a user on Linux
modify a user on Linux
add a group on Linux
manage users and groups on FreeBSD
cleanly halt, shutdown, or reboot a machine
exit an interactive shell
send email from a machine
  • to send a simple message
    mail -s <subject> <someone@example.com>
execute commands on a schedule
control daemons on Linux and FreeBSD
control daemons on macOS


list processes
send signals to processes
interactive display about processes
improved top
trace system calls on Linux
trace system calls on macOS and FreeBSD
trace system calls on macOS and FreeBSD
kernel statistics about processes, virtual memory, traps, and CPU usage on FreeBSD


transfer data to or from a server; more flexible than wget
  • to download a single file:
    curl -o </path/to/output.file> <http://domain.com/remote.file>
download files from a network
  • to download a single file:
    wget -O </path/to/output.file> <http://domain.com/remote.file>
open TCP connections
send ICMP packets to check whether a host is online
print the route taken by packets to a host
look up DNS information
ncat (nmap’s version of netcat)
scriptable TCP and UDP toolbox
show network-related data structures
information about open sockets on FreeBSD
capture and print packet contents
list and configure network interfaces on FreeBSD
manipulate network routing tables on FreeBSD
manage network interfaces and routing tables on Linux
configure firewall rules on Linux
simple front end to nftables
pfctl (packet filter)
configure firewall rules on BSDs
framework that bypasses the kernel to enable fast packet I/O


show free space for mounted filesystems
show disk usage for directories
  • to show the size of a particular directory, where -h means human-readable size and -d is the depth of subdirectory sizes to display:
    du -h -d 0 </path/to/directory>
mount filesystems or list mounted filesystems
  • to mount a filesystem:
    mount </path/to/device> </path/to/mount/point>
unmount filesystems
partition disks on FreeBSD
partition disks on Linux
create UFS filesystems on FreeBSD
create ZFS filesystems on FreeBSD
create filesystems on Linux
create a file system image from files on FreeBSD
combine file system images into a partitioned disk image on FreeBSD
work with disk images on macOS
copy files
  • to write a disk image to a storage device:
    dd if=</path/to/disk.img> of=</path/to/device> bs=8M status=progress
statistics about disk use on FreeBSD
kernel interface that allows userspace programs to export a virtual filesystem


information about peripheral devices on FreeBSD
list PCI devices on FreeBSD
configure PCI devices on FreeBSD
analyze ACPI tables on FreeBSD
list block devices on Linux
dynamic peripheral management and naming on Linux
dynamic peripheral management and naming on FreeBSD
terminal emulator for communicating over serial connections
  • to open a terminal session using a serial device:
    picocom -b <baud rate> </path/to/serial/device>
bpf (eBPF, extended Berkeley Packet Filter)
write arbitrary programs that run on a virtual machine within the kernel

Security analysis

afl-fuzz (American Fuzzy Lop plus plus)
general-purpose fuzzer
kernel fuzzer
network scanner
network packet analyzer
web proxy
wifi security tools
intercepting web proxy
dynamic binary instrumentation toolkit
binary reverse engineering tool
binary reverse engineering tool with command-line interface
identify files and code in binary firmware images
john (John the Ripper)
password cracker
password cracker with good GPU support
event auditing for Unix-like operating systems


The Secure Shell Protocol (SSH) is the most common way to get secure remote shell access to a machine. It supports a wide range of use cases, including port forwarding, X display forwarding, and traffic tunneling. The most popular implementation is OpenSSH, which I describe here.

The primary components of OpenSSH are sshd, the SSH server daemon that runs on the machine you want to access remotely, and ssh, the client application that runs on your local machine. Global configuration files for ssh and sshd can be found in /etc/ssh. /etc/ssh/ssh-config is used to configure ssh and /etc/ssh/sshd-config is used to configure sshd. Per-user configuration is in the directory ~/.ssh, and configuration files there must have permissions 700 to be used. ~/.ssh/config overrides the global /etc/ssh/ssh-config.

Key-based authentication

SSH can use various forms of authentication, including the password for the user on the remote machine, public-private keypairs, and Kerberos. Using passwords exposes you to brute-force attacks. You should configure your servers only to accept key-based authentication by adding the lines PubkeyAuthentication yes and PasswordAuthentication no to sshd-config.

You can generate a public-private keypair using the interactive ssh-keygen command. It puts both a public and private key in the ~/.ssh directory.

The private key, called id_rsa by default, stays on your local machine and is used to prove your identity. It must have permissions 600 for the programs to work correctly. You should protect your private key with a passphrase. Otherwise, someone who obtains your private key or gains access to your local user account automatically gains access to all of the machines you can SSH into.

The public key, called id_rsa.pub by default, is placed onto machines that you want to access using your private key. More specifically, the contents of id_rsa.pub are appended as a line in the file ~/.ssh/authorizd_keys in the home directory of the user that you want to log in as on the remote machine that you want to access. The authorized_keys file must have permissions 600 for the programs to work correctly. You can use the ssh-copy-id program to automatically add your public key to the appropriate authorized_keys file on a remote machine.

With your keypair set up in this way, you can SSH into the remote machine and get an interactive shell without using the remote user’s password:

ssh -i <path/to/private/key> <username>@<remote host>

In your ~/.ssh/config file, you can specify usernames and keys to use with particular hosts:

Host <remote host>
  User <username>
  IdentityFile <path/to/private/key>
  IdentitiesOnly yes # don't try default private keys

Then you can ssh into the machine with the simple command:

ssh <remote host>

SSH agents

If you set a passphrase on your private key, you will be prompted for this passphrase each time you want to use the key. You can use the ssh-agent and ssh-add programs to remember the private key passphrase for a certain amount of time:

eval `ssh-agent`
ssh-add -t <lifetime> <private key>

You can check which keys have been added to the agent as follows:

ssh-add -l

You can configure the command involving ssh-agent to be run every time you log in to your machine, so you only have to run ssh-add to store your private key passphrase. ssh-agent can also be used to implement single-sign-on across multiple remote machines, so that the passphrase for your private key only has to be entered on your local machine. This requires the ForwardAgent option to be enabled in the ssh_config file on clients and the AllowAgentForwarding option to be enabled in the sshd_config file on servers. You can then forward your agent connection with the -A flag:

ssh -A <user>@<remote host>

Traffic tunnelling

SSH can be used to tunnel your network traffic through an encrypted SOCKS proxy. This has the same effect as dedicated virtual private network (VPN) software but is much easier to set up. The below command opens a SOCKS tunnel on the specified local port number. Traffic sent to that port is encrypted and forwarded to the remote host, where it is then sent to its final destination.

ssh -D <localhost port> <user>@<remote host>

You can configure your operating system to forward all network traffic over SOCKS tunnel by specifying the appropriate localhost port number. You can also configure web browsers to forward only web-based traffic.

Local port forwarding

SSH features local port forwarding, also known as tunneling. It allows you to specify a port on your local machine such that connections to that port are forwarded to a given host and port via the remote machine. Data you send to the local port are passed through the encrypted SSH tunnel to the remote machine and then sent by the remote machine to the destination you specify. This is useful for getting access to a service behind a firewall from your local machine:

ssh -L <localhost port>:<destination host>:<destination port> <user>@<remote host>

Remote port forwarding

SSH features remote port forwarding, also known as reverse tunneling. It allows you to specify a port on the remote machine such that connections to that port are forward to a given host and port via your local machine. Data sent to the remote port are passed through the encrypted SSH tunnel to your local machine and then sent by your local machine to the destination you specify. By default, the remote port is only accessible from the remote host itself. You can open it to the wider Internet by enabling the GatewayPorts option in the sshd_config file on the remote machine. This can be run on a machine that’s behind a firewall or NAT to enable other machines to access it:

ssh -R <remote port>:<destination host>:<destination port> <user>@<remote host>

X forwarding

SSH can also forward X graphical applications from a remote host to your local machine. The X11Forwarding option must be enabled in the sshd_config file on the remote machine. You must be running an X server on your local machine, and the DISPLAY environment variable must be set correctly in your local machine’s shell.

DISPLAY tells an X application where to send its graphical output. Its format is <hostname>:<display>.<screen>. A display is a collection of monitors that share a mouse and keyboard, and most contemporary computers only have one display. A screen is a particular monitor. The hostname is the name of the host running the X server. It can be omitted, in which case localhost will be used. The display number must always be included, and numbering starts from 0. The screen number can be omitted, in which case screen 0 will be used. For example, a DISPLAY value of :0 means that the X server is running on localhost, and graphical output should be rendered on the first screen of the first display.

You can then run ssh -X to enable X forwarding. SSH should set DISPLAY in your shell session on the remote host to localhost:10.0, and it will tunnel traffic sent there to the X server on your local machine.

File transfers

SSH can transfer files to and from remote machines with the scp and sftp commands. The program sshfs, which is not part of OpenSSH, can mount directories on a remote machine using SSH.


To debug issues with SSH, you can run ssh -vvv and sshd -ddd for verbose debugging output. If you are using SSH to access a server and your Internet connection is spotty, dropped connections can be frustrating. One way to address this is by running tmux on the remote machine, so that you can reattach to sessions if you get dropped. If you are mobile or have a truly terrible Internet connection, mosh is a less featureful alternative to SSH that provides a better experience.


Version control software keeps track of how files change over time and allows multiple people to work on the same set of files concurrently. git is now the most widely used program for version control. It’s a complicated and flexible tool: some ways of using it make working in teams easy, while others make it really painful. This section describes some key git concepts and suggests a good way to use it for working collaboratively.



The files for a git-controlled project exist in a directory on your filesystem known as the work tree. As version control software, git keeps track of how files change over time. But changes that you make in the work tree aren’t automatically reflected in git’s historical record.

git records changes in terms of commits. Technically, commits are snapshots of the work tree, but for this discussion we’ll say that they contain a set of modifications that affect one or more files. These modifications are also colloquially called changes or diffs, for differences.

To create a commit, first make some changes in your work tree. You can run git status to see which files have changed in your work tree and git diff to see exactly what the changes are.

Commits are prepared in the staging area. Run git add <file> to add changes affecting a file in the work tree to the staging area. You can check exactly what’s in staging with git diff --staged.

You can create a commit by running git commit and adding a message when prompted. Changes are then taken from the staging area and added to git’s historical record as a commit. Each commit is assigned a unique hash identifier. git show <commit> shows the changes associated with a particular commit.

To remove a file from the staging area, run git restore --staged <file>. To discard changes to a file in your working directory, run git restore <file>.


It’s common to work with multiple versions of the same project. For example, you may want to add a big button to your website, but you aren’t sure whether red or blue would be a better color. You decide to create different versions of the site to check.

git supports different project versions via branches. A branch is a named sequence of commits, where each commit except the first one points to a parent commit that happened before it. A ref is a human-readable name that references a particular commit. Each branch name is a ref that points to the most recent commit, or tip, of the corresponding branch.

With git, you are always working on some branch. Most projects start with a branch called main, which holds the main version of the project. When you make a commit, the current branch’s ref is updated to point to the new commit, and the new commit’s parent is set to the previous tip of the branch, i.e. the previous value of the branch’s ref.

You can list branches and determine your current branch with git branch. You can switch branches with git switch <branch-name>. HEAD is a special ref that always points to the tip of the current branch. You can run git log to see the sequence of commits that makes up the current branch, starting from the tip.

To create a new branch and switch to it, run git switch -c <new-branch-name>. The ref of the new branch then points to the same commit as the ref of the branch you switched from. When you make commits in this new branch, they only change the new branch’s ref. They do not modify the ref of the original branch or the commit sequence that is shared between the new and original branches. The new and original branches are said to diverge at the start of that shared commit sequence.

Returning to our button example, say you evaluate the different colors by creating new branches off of your website’s main branch. Call one red-button and the other blue-button. In each branch, you make a commit that adds the appropriately colored button.

You decide you like red best. Now, you want to get the changes you made in the red-button branch into the website’s main branch. There are a few common ways to do so.


One way to integrate changes from one branch into another is to perform a merge. You can merge changes from the red-button branch into main by switching to main and running git merge red-button. For this discussion, we’ll call the branch that the changes are coming from the source branch and the branch being merged into the target branch.

In a merge, git takes the commits that have been made in the source branch since it diverged with the target branch and applies them to the target.

If the target branch hasn’t experienced any commits since it diverged with the source branch, then the source branch is just the target branch with additional commits added. In this case, which is known as a fast-forward merge, the merge operation simply sets the target branch’s ref to the source branch’s ref.

If both target and source branches have experienced commits since they diverged, then the merge operation adds a new commit to the target branch, known as a merge commit, that contains all of the changes from the source branch since divergence. After the merge, the branch histories are joined: merge commits have two parents, which are the refs of the source and target branches from before the merge.

After a merge, commits that were made in the source branch are displayed in the git log of the target branch, interleaved with commits that were made in the target branch according to creation time. You can run git log --graph to see a graphical representation of previously merged branches. Running git log --first-parent shows only commits that were made in the target branch.

If a target and source branch have made different changes to the same part of a file since divergence, then the merge may not be able to happen automatically. This is known as a merge conflict. In this case, git will pause in the middle of applying the merge commit and allow you to manually edit conflicting files to decide which changes should be preserved. When you are done resolving the merge conflict, add the conflicting files to the staging area and run git merge --continue. Running git merge --abort cancels a paused merge.

When you run git show on a merge commit, it will only show changes in files that have been modified relative to both parents. This means that it typically only shows files in which you manually merged changes as part of a conflict resolution. It also doesn’t include the context for those conflict resolutions. You can run git show --first-parent to see all changes made by a merge commit relative to the target branch, and git show --remerge-diff to see the context for merge conflict resolutions.



GNU Privacy Guard (GPG) is a good way to encrypt and decrypt individual files. It supports symmetric (passphrase-based) and asymmetric (keypair-based) encryption.

For GPG commands that produce binary output, the -a flag encodes the binary output as ASCII text for ease of transmission.

Symmetric encryption

Use the following command for symmetric encryption:

gpg -o <encrypted output file> -c --cipher-algo AES256 <plaintext input file>

You will be prompted to choose a passphrase.

Asymmetric encryption

Asymmetric encryption requires working with GPG keypairs. GPG keypairs are distinct from SSH keypairs.

To create a GPG keypair, run gpg --generate-key, which will prompt you to provide a name and email address to associate with the created keypair and then to enter a passphrase for the private key. You can reference a keypair by its id, name, or email address in GPG commands. You can list keys in GPG’s keyring with gpg --list-keys and edit keys with gpg --edit-key <key>.

To export and import public keys, use gpg -o <output key file> --export <key> and gpg --import <input key file>. The --export-secret-key and --allow-secret-key-import flags do the same thing for private keys.

With asymmetric encryption, you encrypt a file for a given public key that is present in your GPG keyring, and the corresponding private key is required to decrypt it:

gpg -o <encrypted output file> -e -r <recipient key> <plaintext input file>


Use the following command to decrypt a file encrypted by GPG:

gpg -o <plaintext output file> -d <encrypted input file>

You will be prompted to enter the passphrase for either the symmetric encryption or the appropriate private key. You can configure gpg-agent to reduce the number of times you have to enter a private key’s passphrase.


rclone is an excellent way to perform encrypted cloud backups. In ~/.config/rclone/rclone.conf, set up a crypt backend over a backend for your cloud provider. You can then use rclone sync to make an encrypted cloud backup match the contents of a local folder:

rclone sync --links --fast-list <path/to/local/folder> <crypt-backend:>

You can use the rclone bisync command to make an encrypted cloud backup sync bidirectionally with multiple clients.

You can also mount an encrypted cloud drive as a local filesystem:

rclone mount --vfs-cache-mode full <crypt-backend:> <path/to/local/mount/point>

Virtual environments

In the course of developing software, you’ll run into a lot of virtual environments. Below I describe the main kinds, how they’re useful, and some tools that can help you use them effectively.


Virtualization is when a piece of software known as a hypervisor uses features of real computer hardware to create and manage a set of virtual machines that can run guest operating systems as if each guest OS were running on an isolated instance of the underlying hardware. Sometimes the guest OSes are aware of the fact that they’re running on a virtual machine – this is known as paravirtualization – and sometimes they aren’t.

Virtualization is useful for setting up isolated development environments without polluting your primary operating system. It’s also useful for testing software against a wide range of operating systems or getting access to software that isn’t available on your primary OS.

Virtual machines are represented as disk-image files, which contain the guest OS, and hypervisor configuration files. Using identical images and configuration files results in identical instantiated VMs. These files can be distributed to share development environments, to package pre-configured applications with their dependencies, and to guarantee that networked applications are tested and deployed in identical environments.

Because virtualization uses hardware directly, it’s fast relative to emulation. However, it comes with the limitation that each virtual machine has the same architecture as the underlying physical hardware.

Popular hypervisors are kvm, bhyve, Hyper-V, the macOS Virtualization Framework, and Xen.


Emulation is when a software emulator mimics the behavior of hardware. It’s broadly similar to virtualization, except that because an emulator works purely in software, it can emulate any kind of hardware. Operating purely in software also makes emulation slower than virtualization.

Emulation can be used for the same things as virtualization, but its worse performance makes it less likely to be used for distributing or running production applications. It is most useful for testing software across a wide range of hardware architectures, peripheral devices, and operating systems. You can also run nearly any piece of software, even ancient relics, using emulation.

The most popular general-purpose emulator is QEMU.


The distinction between emulators and simulators is subtle. While emulators emulate the behavior of an entity, simulators simulate the entity’s behavior as well as some aspect of its internal operation. For example, an emulator for a particular CPU could take a set of instructions and execute them in any way, as long as the externally observable results are the same as they would be for the CPU being emulated. On the other hand, a simulator might execute the set of instructions in the same way that the real CPU would, taking the same number of simulated cycles and using simulated versions of its microarchitectural components.

In some sense, whether a piece of software is an emulator or simulator depends on the level of detail you’re interested in. Generally, though, simulators are much slower than emulators. They are typically used to do high-fidelity modeling of hardware before investing the resources to produce a physical version. A popular hardware simulator is gem5.


Containers are lightweight virtual environments within a particular operating system. They isolate applications by limiting unnecessary access to system resources. If you’re deploying a networked application, you should always run it in some kind of container to limit damage in case it is compromised.

Some containerization systems represent containers as image files that can be instantiated by a container runtime. These kinds of containers are similar to virtual machines, but because they work within a particular operating system, they are both more efficient and less flexible. They are most often used to package pre-configured applications with their dependencies and to guarantee that networked applications are tested and deployed in identical environments.

A compelling reason to use image-based containers for networked services is the ability to use container orchestration frameworks. Orchestration frameworks, such as Kubernetes, facilitate the deployment and automatic scaling of containerized applications across clusters.

Popular containerization systems include FreeBSD’s jails and Docker on Linux. Using chroot on UNIX-like systems changes the apparent root directory for a process, but it is not a containerization system.

Compatibility layers

Compatibility layers are interfaces that allow programs designed for some target system to run on a different host system. There are many kinds of compatibility layers, but typical ones work by implementing target system library function calls in terms of functions that are available on the host system. Some compatibility layers require recompiling the program, and others work on unmodified target system binaries.

Notable compatibility layers include:


Vagrant is a tool for managing virtual machines. It facilitates declarative VM configuration and provides a single interface to control multiple underlying hypervisors.

Vagrant has the concept of a box, which is a file format for images that can be instantiated as virtual machines. A virtual machine is specified by a Vagrantfile, which references the box to instantiate the VM from. It also lets you specify how the VM should be accessible from the host – for example, you can configure ssh access and shared folders – and commands to be run when the VM starts. There are all kinds of boxes available on the Vagrant Cloud, and you can use these as a starting point to manually configure a VM and then package it as a distinct box.

Because a VM is specified by its Vagrantfile, you don’t need to keep instantiated virtual machine images around. You can quickly create the VM, use it to build and test some code, then delete it when you’re done. The next time you need to work on the same code, you can use the same Vagrantfile to instantiate an identical VM from the underlying box. Setting up another identical VM on a different host is as simple as sharing the Vagrantfile. You can keep box files locally or let Vagrant download them as necessary.

Vagrant supports many VM providers by default, including VirtualBox, Hyper-V, and VMWare. There is also a plugin for Vagrant that adds support for libvirt as a VM provider. libvirt is an API for managing a wide range of virtual environment technologies, including QEMU.

Useful commands

To be run in the directory containing a Vagrantfile;

vagrant up
Download the Vagrant box, instantiate the VM, then start or resume the VM, as necessary.
vagrant ssh
Open an ssh session to the running VM.
vagrant suspend
Save the VM’s running state.
vagrant halt
Shut down the VM and save its persistent state.
vagrant destroy
Remove the VM from your system. The Vagrant box will stick around.
vagrant package
Package the VM as a box.

Can be run from anywhere:

vagrant box remove <box name>
Remove the specified box from your system.

Vagrant box files are stored in ~/.vagrant.d by default. Where VM images are stored depends on the provider you’re using. For VirtualBox the default location is ~/VirtualBox.

Useful Vagrantfile options

The box to use.
Controls how files are shared between host and guest. Can do nfs, smb, a one-timersync push, or VirtualBox shared folders. By default, if you don’t have this line in the Vagrantfile, Vagrant shares the directory containing the Vagrantfile with the host. It uses the provider’s default method and enables symlinks if they are supported!
config.vm.network "forwarded_port", guest: <guest port>, host: <host port>, host_ip: ""
Forwards a port from the host to the guest, with access only allowed from the host.
config.vm.provision :shell
Shell commands that run when you use vagrant up or vagrant reload --provision. You can use a path to a shell script or inline commands.


Docker is a tool for creating and running containers. Containers are instantiated from images, and you can instantiate multiple disposable containers from a single image. You can specify how an image should be built, including which packages and files to include, in a Dockerfile. Base images can be pulled from Docker Hub or custom-made.

Docker creates images in the Open Container Initiative (OCI) format. It uses the containerd daemon to manage images and containers at a high level. To actually instantiate containers, containerd uses runc.

When working on a networked application, you should put every constituent service in a separate container. For example, a database should be in a distinct container from the implementation of an API that uses it. Docker compose is a useful way to manage multi-container applications that will run on a single host. Kubernetes is a powerful container orchestration framework for deploying containerized applications on a cluster.

Docker runs natively on Linux, but there is a Docker Desktop edition for macOS and Windows. It runs the Docker stack inside a Linux VM and wraps everything up inside an Electron GUI. I recommend avoiding it and running your own Linux VM.

Useful commands

docker build
Create a new image from a Dockerfile.
docker commit <container> <tag>
Create a new image from a container’s current state.
docker run <image>
Start a container.
docker stop <container>
Stop a container.
docker ps -a
Show containers.
docker images
Show downloaded images.
docker rm <container>
Remove a container.
docker rmi <image>
Remove an image.


QEMU is a full-system emulator. It’s useful for testing system-level software across a wide range of hardware platforms and architectures. It has device models that emulate real peripheral devices, and it also supports VirtIO. When QEMU is emulating a target architecture that matches the architecture of the host machine, it can use hypervisors such as kvm on the host machine to achieve performance on par with that of virtualization.

A notable feature of QEMU is user-mode emulation, which is supported on Linux and BSDs. It supports running binaries compiled for the same operating system but a different architecture, and it’s lighter weight than doing full-system emulation. It’s useful for testing and debugging cross-compiled applications.