If you’re like most sysadmins, you know that one of the biggest challenges of running a modern server is keeping it up. You already watch CPU, memory, and disk usage to make sure your hosts don’t run out of resources. But what happens when the Docker daemon itself stops working? Your containers go down with it, and you have to get them back up as quickly as possible. In this article, we’ll show you how to keep Docker containers running when the daemon stops. We’ll also look at common reasons the daemon goes down and the caveats to keep in mind.
When the Docker daemon terminates, all your containers are stopped. The default installation doesn’t let containers run unless the daemon is also up. Here’s how to minimize workload downtime by keeping containers alive during a daemon outage.
Why Does This Matter?
Docker has proven to be a reliable system that’s capable of supporting solutions in production. That’s not to say it’s infallible. You could still encounter a crash that knocks the daemon out of action, taking your containers offline.
In another scenario, your operating system’s package manager might auto-update Docker, causing a daemon restart and a brief period of downtime. Ideally, these situations could be resolved without any impact on your workloads. The daemon’s job is to manage containers: it implements commands like docker run and docker rm. Nothing inherently requires it to stay up between those operations, while a container is simply running.
Container Live Restore
Docker supports a system called “live restore” which makes this possible. Instead of terminating containers during daemon shutdown, Docker will keep them running. It’ll pick up where it left off once restarted.
Live restore must be manually enabled. You can use it on a one-off basis by starting dockerd with the --live-restore flag.
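A minimal sketch of that one-off invocation. It assumes dockerd is on your PATH and that any service-managed daemon has already been stopped. The wrapper name start_dockerd_once is hypothetical and exists only so that nothing runs until you call it.

```shell
# start_dockerd_once is a hypothetical helper name, not part of Docker.
start_dockerd_once() {
  # Runs the daemon in the foreground; live restore applies to this run only.
  sudo dockerd --live-restore
}

# Call it from a terminal you can leave open:
#   start_dockerd_once
```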
To permanently enable live restore, add it to your Docker daemon configuration file. This is usually found at /etc/docker/daemon.json. You’ll need to create the file if it doesn’t already exist.
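The setting is a single key, live-restore, set to true. As a sketch, the hypothetical helper below writes a minimal file containing only that key. It refuses to touch a file that already exists, so other daemon settings are never overwritten. The path argument defaults to the usual location mentioned above.

```shell
# write_live_restore_config is a hypothetical helper name, not a Docker command.
write_live_restore_config() {
  config_path="${1:-/etc/docker/daemon.json}"
  if [ -e "$config_path" ]; then
    # Don't clobber existing settings; merge the key in by hand instead.
    echo "$config_path exists: add \"live-restore\": true to it manually" >&2
    return 1
  fi
  mkdir -p "$(dirname "$config_path")"
  printf '{\n  "live-restore": true\n}\n' > "$config_path"
}

# Run from a root shell to use the default path:
#   write_live_restore_config
```

If the file already exists, add the "live-restore": true entry alongside the existing keys rather than replacing the file.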
Next you need to instruct Docker to reload its configuration. A reload will not impact your containers, unlike a full daemon restart.
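On a systemd-managed host, the reload can be sketched as follows. The helper name reload_docker_config is hypothetical. The second command asks the daemon whether live restore is active and should print true once the setting has been picked up.

```shell
# reload_docker_config is a hypothetical helper name; assumes Docker runs as a systemd service.
reload_docker_config() {
  # Ask the daemon to re-read its configuration without stopping containers.
  sudo systemctl reload docker
  # Print whether live restore is currently enabled (true or false).
  docker info --format '{{.LiveRestoreEnabled}}'
}

# Usage:
#   reload_docker_config
```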
Live restore should now be activated. You can test it out by stopping the Docker daemon.
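A sketch of stopping the daemon, assuming a systemd-managed install that also ships a docker.socket unit (an assumption about your packaging). Stopping the socket as well prevents socket activation from starting the daemon again as soon as a docker command is run. The helper name is hypothetical.

```shell
# stop_docker_daemon is a hypothetical helper name; assumes systemd units docker.service and docker.socket.
stop_docker_daemon() {
  # Stop the daemon and its activation socket together.
  sudo systemctl stop docker.service docker.socket
}

# Usage:
#   stop_docker_daemon
```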
Any running containers should stay active, even though the daemon is shut down. You won’t be able to use docker commands, as the daemon connection will be gone, but the containers will keep running and will retain their network connections.
Docker will automatically detect the existing containers when it restarts. You’ll be able to continue where you left off, without having to suffer any downtime.
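To finish the test, bring the daemon back and list the running containers. This sketch again assumes systemd, and the helper name is hypothetical. Containers that were running before the outage should appear in the list without having been recreated.

```shell
# start_docker_daemon is a hypothetical helper name; assumes Docker runs as a systemd service.
start_docker_daemon() {
  # Start the daemon again after the test outage.
  sudo systemctl start docker
  # List running containers; the ones started earlier should still be there.
  docker ps
}

# Usage:
#   start_docker_daemon
```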
Handling Sustained Daemon-less Running
Running containers without an active daemon connection shouldn’t have any serious consequences, even over a sustained time period. However, you will find logs start to get lost during a prolonged daemon outage.
Docker containers pipe their logs into a first-in first-out (FIFO) buffer. The Docker daemon reads the buffer contents to create the persisted log files you view with docker logs.
The default buffer size is only 64 KB, so it can fill up if the daemon isn’t actively reading its contents. When the buffer is full, the container can’t write any more log output until the daemon is back and flushes the buffer. On Linux, you can increase the buffer size by editing the value of /proc/sys/fs/pipe-max-size.
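To inspect or change that kernel value, here is a sketch that assumes a Linux host with the sysctl utility installed. Both helper names are hypothetical, and the size in the usage comment is only an illustrative number, not a recommendation.

```shell
# Hypothetical helper names; Linux only.
show_pipe_max_size() {
  # Print the current value, in bytes.
  cat /proc/sys/fs/pipe-max-size
}

set_pipe_max_size() {
  # $1 = new size in bytes; applies to the running system and resets on reboot.
  sudo sysctl -w "fs.pipe-max-size=$1"
}

# Usage (illustrative value):
#   show_pipe_max_size
#   set_pipe_max_size 4194304
```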
Live Restore Caveats
Live Restore should cover most scenarios where the Docker daemon shuts down and later recovers. This includes Docker updates, but only between patch releases of the same version. If you upgrade to a new major Docker version (such as 19.03 to 20.10), Live Restore isn’t supported and you should plan for your containers to be restarted along with the daemon.
You should be wary of using Live Restore as a way to edit Docker daemon settings on the fly. Changing some options, such as bridge IP addresses, will prevent containers from restoring properly when the daemon restarts. If this happens, you’ll need to manually stop all the affected containers and replace them with new ones. This situation could also arise if your operating system assigns a different networking setup after a reboot.
Live Restore is intended for use during Docker updates and unplanned daemon outages. If you need to edit daemon settings, try to plan for downtime instead. You can also use systemctl reload docker to reload configuration files without completely restarting the daemon.
Live Restore isn’t available for Windows-based containers yet. You can still use it on Windows with Linux-based containers. It’s built into Docker Desktop and is enabled via Preferences > Daemon > Advanced.
Conclusion
Live Restore lets you minimize disruptive downtime by keeping containers running in the absence of the Docker daemon. If you need to install an urgent Docker update, or you hit a surprise crash, your workloads should stay operational while the daemon restarts.
Activating Live Restore is a best practice step when running Docker in production. Configuration analysis tools may flag installations which don’t have it enabled.
Beyond using Live Restore, you should ensure your containers have appropriate restart policies too. Setting restart: always in a Compose file, or passing --restart always to docker run, will make individual containers come back up after an OS restart, or any other daemon launch where Live Restore couldn’t be used.
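On the command line, the same policy can be set when a container is created or applied to one that already exists. A short sketch follows. The helper names, the container name web, and the nginx image in the usage comment are placeholders.

```shell
# Both helper names are hypothetical wrappers around standard docker commands.
run_with_restart_always() {
  # $1 = container name, $2 = image
  docker run -d --restart always --name "$1" "$2"
}

apply_restart_always() {
  # $1 = name or ID of an existing container
  docker update --restart always "$1"
}

# Usage (placeholder name and image):
#   run_with_restart_always web nginx
#   apply_restart_always web
```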