Thursday, March 17, 2022

I got bit by the Docker pid 1 Zombie child reaping problem

Heh, how's that for a hook?

I was having the hardest time porting a collection of test suites to a new testing lab.

In particular, I was struggling with a shell script that does a strange thing:

  • The script is trying to shut down a server process.
  • After various other steps, the script does a kill -9 to kill the process.
  • Then the script enters a loop where it checks whether the process still exists,
    • And it doesn't exit that loop until the process is gone
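The steps above can be sketched roughly as follows. This is not the original script: the helper name wait_until_gone and the timeout are assumptions added for illustration. The original loop had no timeout, which is why it could hang forever.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): poll until PID $1 is
# gone, or give up after $2 seconds (default 30).
# Returns 0 if the process disappeared, 1 on timeout.
wait_until_gone() {
    pid=$1
    timeout=${2:-30}
    waited=0
    # kill -0 delivers no signal; it only asks whether the PID can be signalled.
    # It still succeeds for a zombie, because the process table entry exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible usage, with SERVER_PID standing in for the real server's PID:
#   kill -9 "$SERVER_PID"
#   wait_until_gone "$SERVER_PID" 60 || echo "PID $SERVER_PID still present" >&2
```

With a timeout in place, a process that never disappears shows up as a reported failure rather than a silent hang.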

And the script was hanging.

Now, I agree, that seems like a strange thing for a script to do, since as we all know, signal 9 can be neither blocked nor caught, and the signal action is to end the process. Why was the script bothering to verify that the signal 9 worked?

But, more curious, why wasn't the signal working?

At first, I went on a wild goose chase, convinced that I was struggling with some sort of security issue that was preventing one process from inquiring about the status of another process. We were using kill -0 [PID] to check whether the other process existed, so I tried messing about with that. I tried switching that code to use test -e /proc/[PID]. I tried using /bin/kill to see if we were somehow using a Bash with a built-in kill command that was malfunctioning. I tried using various incantations of the ps command to examine the target process, like ps -o pid= -p [PID] and ps -ef | grep \w[PID]\w | grep -v grep | wc -l.

All of these various approaches agreed: the process was still alive.

How could kill -9 not kill the process?

Finally I collected ps -ef output at the point where the termination script was hanging.

And, indeed, the process still existed!

It was in [defunct] state.
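A check along these lines would have pointed at the real cause much sooner than any of the existence checks did. It is a minimal sketch that assumes Linux with /proc mounted; the helper name is_zombie is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PID $1 exists and is a zombie.
# Linux-specific: reads the "State:" line of /proc/PID/status, whose second
# field is a one-letter state code (Z means zombie, shown by ps as <defunct>).
is_zombie() {
    state=$(awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null)
    [ "$state" = "Z" ]
}

# Possible usage inside a wait loop, with SERVER_PID as a placeholder:
#   if is_zombie "$SERVER_PID"; then
#       echo "PID $SERVER_PID is dead but has not been reaped" >&2
#   fi
```

A zombie has already exited; what remains is only its process table entry, waiting for a parent to collect the exit status.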

At this point, I finally knew what was really going on: the system wasn't reaping Zombie processes properly.

Reaping orphaned zombies is the job of the init process (PID 1), so I started searching the internet for things like Zombie defunct init Docker, and I very quickly found the answer:

  • Docker and the PID 1 zombie reaping problem
    When building Docker containers, you should be aware of the PID 1 zombie reaping problem. That problem can cause unexpected and obscure-looking issues when you least expect it. This article explains the PID 1 problem, explains how you can solve it, and presents a pre-built solution that you can use: Baseimage-docker.

At this point, one (properly) skeptical colleague of mine said: "Why are you pointing us at a 7-year-old blog post? Surely that's been fixed by now?"

But it hasn't! It's mostly fixed, but it's still a problem. The "fix" was to document the behavior:

  • Run multiple services in a container
    The container’s main process is responsible for managing all processes that it starts. In some cases, the main process isn’t well-designed, and doesn’t handle “reaping” (stopping) child processes gracefully when the container exits. If your process falls into this category, you can use the --init option when you run the container. The --init flag inserts a tiny init-process into the container as the main process, and handles reaping of all processes when the container exits.
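As a rough illustration of the --init option described in that excerpt, here is a small wrapper. The function name run_with_init and the image and command in the usage comment are placeholders, not anything from my actual test lab.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command in a container with Docker's built-in
# init as PID 1, so that orphaned children get reaped instead of lingering
# as zombies.
# Usage: run_with_init IMAGE [COMMAND [ARG...]]
run_with_init() {
    image=$1
    shift
    # --init inserts a small init process as PID 1 inside the container.
    # --rm removes the container once the command exits.
    docker run --init --rm "$image" "$@"
}

# Possible usage (image and script names are placeholders):
#   run_with_init my-test-image ./run-tests.sh
```

Without --init, whatever command the container starts becomes PID 1 itself and inherits the reaping duty, whether or not it was written to handle it.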

There's a long tortured backstory for people who want to learn the gory details:

  • Zombie Processes
    If like in standard docker container launching a command, there is no proper init process, nobody will care about orphaned processes and they will stay here as zombies also called defunct.
  • Introducing dumb-init, an init system for Docker containers
    Lightweight containers have made running a single process without normal init systems like systemd or sysvinit practical. However, omitting an init system often leads to incorrect handling of processes and signals, and can result in problems such as containers which can’t be gracefully stopped, or leaking containers which should have been destroyed.
  • Tini
    Tini is the simplest init you could think of. All Tini does is spawn a single child (Tini is meant to be run in a container), and wait for it to exit all the while reaping zombies and performing signal forwarding.

Note that tini is init spelled backward. We programmers like that sort of joke.

And buried in that backstory is the odd side-story that bash, if used as --entrypoint, does reap zombie children. That explains why my simple reproduction tests never reproduced the problem cleanly, and why I could only reproduce it with a complicated two-hour full test run (groan).

But the bottom line is: docker run --init is there for a reason.

And kill -9 doesn't always "work", if init isn't reaping processes.
