Why do you need an init process inside your Docker container (PID 1)

When you run your application inside a Docker container, it usually runs as process identifier (PID) 1. This particular PID is special in the Unix world: it is assigned to the very first process that the kernel starts, so it plays a special role in the system.

Why is PID 1 so special?

  • Signals get no default action. The kernel delivers a signal to PID 1 only if the process has installed a handler for it, which means your process will not implicitly terminate on SIGINT or SIGTERM. For any other process, the default action of SIGINT or SIGTERM is to terminate it.

  • Any orphaned process is adopted by PID 1.

Let's discuss these more in-depth, especially in the Docker world.

Signals

Do you know what actually happens behind the scenes when you run docker stop?

The main process (PID 1) inside the container receives SIGTERM and, if it is still running after a grace period, SIGKILL.

By default, Docker waits 10 seconds after SIGTERM before killing it with SIGKILL.
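
The "SIGTERM, wait, then SIGKILL" sequence can be sketched in a few lines of Node.js. This is only an illustration of the idea, not Docker's actual implementation: stopWithGrace is a hypothetical helper name, and the timers parameter exists only so the logic can be exercised without real delays.

```javascript
// Sketch of "SIGTERM, wait, then SIGKILL". Not Docker's actual code.
// `target` is anything with kill(signal) and once('exit', fn), for example
// a ChildProcess returned by child_process.spawn().
function stopWithGrace(target, graceMs = 10000, timers = { setTimeout, clearTimeout }) {
  const sent = [];
  const send = (signal) => {
    sent.push(signal);
    target.kill(signal);
  };

  // Step 1: ask politely.
  send('SIGTERM');

  // Step 2: if the grace period elapses, force termination.
  const timer = timers.setTimeout(() => send('SIGKILL'), graceMs);

  // If the target exits in time, cancel the pending SIGKILL.
  target.once('exit', () => timers.clearTimeout(timer));

  // Live list of the signals sent so far, handy for inspection.
  return sent;
}
```

Called with a child process that ignores SIGTERM, the helper ends up sending both signals, which is exactly the slow path you see as a 10-second docker stop.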

Have you ever had a container that took a long time to stop? Did it take almost exactly 10 seconds? That most likely means your application doesn't handle signals explicitly!

Here is a simple command that spins up a Node.js container that runs forever:

docker run \
  -d \
  --rm \
  --name node-app \
  node:alpine \
  node -e "setInterval(() => {}, 1000);"

Now stop the container and measure the time:

time docker stop node-app

On my system, it took 10.644 seconds to stop the container.

real    0m10.644s
user    0m0.030s
sys     0m0.062s

This means the container couldn't be stopped gracefully; it had to be killed.

This is because your Node.js application runs as PID 1, so the default action for SIGTERM, which would normally terminate the process, is not applied.

Now let's handle the SIGTERM signal in the Node.js application:

docker run \
  -d \
  --rm \
  --name node-app \
  node:alpine \
  node -e "
      const interval = setInterval(() => {}, 1000); 
      process.on('SIGTERM', () => clearInterval(interval));
  "

When SIGTERM is received, the interval is cleared, so the process exits because there is nothing left to do.

Stop the container again:

time docker stop node-app

On my system, it took about half a second this time.

real    0m0.613s
user    0m0.000s
sys     0m0.060s

This means the Node.js application has been gracefully stopped instead of being killed after the 10 seconds timeout.
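
In a real service you usually have more to clean up than a single interval. Here is a minimal sketch of the same idea for an HTTP server. installGracefulShutdown is a hypothetical helper name (not a Node.js API), and the proc parameter is only there so the helper can be tried without sending real signals.

```javascript
// Hypothetical helper: installGracefulShutdown is not a Node.js API.
// `server` is anything with close(callback), for example an http.Server.
// `proc` defaults to the real process object.
function installGracefulShutdown(server, proc = process) {
  let shuttingDown = false;

  const shutdown = (signal) => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`Received ${signal}, closing server...`);
    // Stop accepting new connections, then exit once the server has closed.
    server.close(() => proc.exit(0));
  };

  // PID 1 gets no default action, so both signals need explicit handlers.
  for (const signal of ['SIGTERM', 'SIGINT']) {
    proc.on(signal, () => shutdown(signal));
  }
}

// Usage (assumed setup):
//   const server = require('http').createServer(handler).listen(3000);
//   installGracefulShutdown(server);
```

Handling SIGINT as well means Ctrl+C works when you run the container in the foreground.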

Orphaned processes

When a process dies, all of its children become orphans and are adopted by PID 1. The more interesting question is what happens when such a child finishes its execution or dies for whatever reason.

Let's run a command that showcases orphaned processes. The following command launches an Ubuntu container and runs sh -c "sleep 10 & exec sleep 1000". The shell starts sleep 10 in the background and then replaces itself, via exec, with sleep 1000. Therefore PID 1 is sh in the beginning, but sleep 1000 takes its place.

docker run -d --rm --name orph ubuntu sh -c "sleep 10 & exec sleep 1000"

Right after running the command, print out the processes in the container:

docker exec orph ps -eaf

If you run it within 10 seconds, you should see output similar to this:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:34 ?        00:00:00 sleep 1000
root         7     1  0 16:34 ?        00:00:00 sleep 10

As you can see, sh doesn't exist because sleep 1000 replaced it.

Now wait until sleep 10 finishes, and print out the processes again:

docker exec orph ps -eaf

This time you should see a different output:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:34 ?        00:00:00 sleep 1000
root         7     1  0 16:34 ?        00:00:00 [sleep] <defunct>

As you can see, sleep 10 finished, and it became defunct. Terminated processes are supposed to be cleaned up (reaped) by their parent processes. Here, the shell that started sleep 10 replaced itself with sleep 1000, so PID 1 is still the parent, but it doesn't take care of this process because it doesn't know about it: sleep doesn't handle child processes. Terminated processes that nobody has reaped are called zombie processes.

In normal circumstances, you should never see zombie processes in your process list. PID 1 should take care of removing zombie processes from the process table.
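
If you want to check for zombies programmatically, you can scan the ps output for the <defunct> marker. The following is a small sketch that assumes the ps -eaf column layout shown above; findZombies is a hypothetical helper name.

```javascript
// Returns the rows of `ps -eaf` output whose command is marked <defunct>.
// Assumes the column layout: UID PID PPID C STIME TTY TIME CMD...
function findZombies(psOutput) {
  return psOutput
    .split('\n')
    .slice(1) // drop the header row
    .filter((line) => line.includes('<defunct>'))
    .map((line) => {
      const [uid, pid, ppid] = line.trim().split(/\s+/);
      return { uid, pid: Number(pid), ppid: Number(ppid) };
    });
}

const sample = [
  'UID        PID  PPID  C STIME TTY          TIME CMD',
  'root         1     0  0 16:34 ?        00:00:00 sleep 1000',
  'root         7     1  0 16:34 ?        00:00:00 [sleep] <defunct>',
].join('\n');

// Logs a single entry: the zombie with PID 7 whose parent is PID 1.
console.log(findZombies(sample));
```

You could feed it the output of docker exec orph ps -eaf to spot zombies in a running container.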

Using init

When you boot up a Unix-based operating system, PID 1 is an init process. This process takes care of reaping zombie processes throughout your system's uptime.

Since orphaned processes are always adopted by PID 1, the init process, it can easily reap them once they become zombies.

In a Docker container, an init process should also take care of forwarding signals to your application.
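
To make the signal-forwarding half of that job concrete, here is a rough sketch of a wrapper written in Node.js. forwardSignals and runAsInit are hypothetical names, and this is not how a real init is implemented. It also covers only forwarding: as far as I know, Node.js core has no API for reaping adopted orphans, which is one reason to prefer a real init.

```javascript
const { spawn } = require('child_process');
const os = require('os');

// Hypothetical helper: relays signals to `child` and mirrors its exit status.
// `proc` defaults to the real process object.
function forwardSignals(child, proc = process, signals = ['SIGTERM', 'SIGINT']) {
  for (const signal of signals) {
    proc.on(signal, () => child.kill(signal));
  }
  child.on('exit', (code, signal) => {
    // `code` is null when the child was killed by a signal; use the
    // conventional 128 + signal number in that case.
    proc.exit(code ?? 128 + os.constants.signals[signal]);
  });
}

// Starts the real application as a child and wires up the forwarding.
function runAsInit(command, args) {
  const child = spawn(command, args, { stdio: 'inherit' });
  forwardSignals(child);
  return child;
}

// Usage (server.js is a placeholder for your application):
//   runAsInit('node', ['server.js']);
```

Because the application is then no longer PID 1, it gets the default signal actions back even if it installs no handlers.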

Let's rewrite the previous command a little bit:

docker run -d --rm --name orph ubuntu sh -c "sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000"

This command will produce the following process list:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:42 ?        00:00:00 sleep 1000
root         7     1  0 21:42 ?        00:00:00 sleep 1
root         8     7  0 21:42 ?        00:00:00 sleep 10

sleep 1000 becomes PID 1, sleep 1 is a child of sleep 1000 and sleep 10 is a child of sleep 1.

sleep 1 finishes after one second, and then you will see the following process list:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:42 ?        00:00:00 sleep 1000
root         7     1  0 21:42 ?        00:00:00 [sleep] <defunct>
root         8     1  0 21:42 ?        00:00:00 sleep 10

As you can see, sleep 10 was adopted by sleep 1000 because its parent, sleep 1, terminated. sleep 1 itself became a zombie process.

After 10 seconds, when sleep 10 finishes, it becomes a zombie process as well:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:59 ?        00:00:00 sleep 1000
root         7     1  0 18:59 ?        00:00:00 [sleep] <defunct>
root         8     1  0 18:59 ?        00:00:00 [sleep] <defunct>

This time, use the --init flag. It starts Tini, a lightweight init system, as PID 1 in the container.

docker run --init -d --rm --name orph ubuntu sh -c "sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000"

Shortly after starting it, you will see that sleep 1 has become a zombie process and sleep 10 has been adopted by PID 1:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:45 ?        00:00:00 /sbin/docker-init -- sh -c sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000
root         7     1  0 21:45 ?        00:00:00 sleep 1000
root         8     7  0 21:45 ?        00:00:00 [sleep] <defunct>
root         9     1  0 21:45 ?        00:00:00 sleep 10

As you can see, sleep 1 is a zombie process and stays that way until sleep 1000 finishes. Its parent is sleep 1000 (PID 7), not Tini, and sleep doesn't take care of cleaning up child processes.

After 10 seconds, sleep 10 terminates and disappears from the process list:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:45 ?        00:00:00 /sbin/docker-init -- sh -c sh -c 'sleep 10 & exec sleep 1' & exec sleep 1000
root         7     1  0 21:45 ?        00:00:00 sleep 1000
root         8     7  0 21:45 ?        00:00:00 [sleep] <defunct>

This proves that Tini takes care of reaping the zombie processes it adopts.

Now, to prove that Tini forwards signals, let's stop the Docker container:

docker stop orph

The container stopped immediately, instead of waiting 10 seconds. Therefore signal forwarding is proven as well.

Why is this important at all?

Zombie processes reserve their process ID until they are properly cleaned up. Your operating system has a finite number of process IDs. If enough zombies accumulate, they can fill up the process table, and the chaos begins!
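
To get a feeling for how close you are to that limit, you can count zombies and read the PID ceiling from /proc. This is a Linux-only sketch with hypothetical helper names (parseStat, zombieReport). Keep in mind that pid_max is the kernel-wide ceiling, and a container may hit a lower cgroup limit first.

```javascript
const fs = require('fs');

// Parse one /proc/<pid>/stat line: "pid (comm) state ...".
// comm may contain spaces or parentheses, so split around the LAST ')'.
function parseStat(line) {
  const open = line.indexOf('(');
  const close = line.lastIndexOf(')');
  return {
    pid: Number(line.slice(0, open).trim()),
    comm: line.slice(open + 1, close),
    state: line.slice(close + 2, close + 3), // 'Z' means zombie
  };
}

// Count zombies and read the PID ceiling. Returns null on non-Linux systems.
function zombieReport(procDir = '/proc') {
  const pidMaxPath = `${procDir}/sys/kernel/pid_max`;
  if (!fs.existsSync(pidMaxPath)) return null;
  const pidMax = Number(fs.readFileSync(pidMaxPath, 'utf8'));
  let zombies = 0;
  for (const entry of fs.readdirSync(procDir)) {
    if (!/^\d+$/.test(entry)) continue; // only numeric entries are processes
    try {
      const stat = fs.readFileSync(`${procDir}/${entry}/stat`, 'utf8');
      if (parseStat(stat).state === 'Z') zombies += 1;
    } catch {
      // The process exited between readdir and read; skip it.
    }
  }
  return { zombies, pidMax };
}

console.log(zombieReport());
```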

This isn't an issue if your application never starts new processes: without child processes, it cannot produce zombie processes. I still recommend using Tini, though, because it's available in Docker by default, and it's easy to switch on for both individual Docker commands (--init) and Docker Compose services (init: true).

Example Node.js project

A repository is available at GitHub/david-szabo97/node-docker-init-or-not-to-init, which showcases zombie processes in a single command.