Docker has shipped a native health check mechanism since version 1.12. The simplest health check for a container is a process-level one, which only verifies that the process is alive. The Docker daemon automatically monitors the PID 1 process in the container, and if a restart policy is specified in the docker run command, it restarts a terminated container according to that policy. In many practical scenarios, however, a process-level check is not enough. For example, the container process may still be running but unable to respond to user requests because the application has deadlocked. Process monitoring cannot detect such problems.

After the container starts, its initial health status is starting. Docker Engine waits for the interval, runs the health check command, and then repeats it periodically. A single check is considered failed if its return value is not 0 or if it runs longer than the specified timeout. If the number of consecutive failures reaches retries, the status changes to unhealthy.
Note:
1. Once a health check succeeds, Docker puts the container back into the healthy state.
2. When the health status of the container changes, Docker Engine emits a health_status event.
There are several ways to configure a health check for a container:
▍1. Dockerfile method
An application's health check can be declared in its Dockerfile. The HEALTHCHECK instruction declares a command that determines whether the service provided by the container's main process is working normally, so that the reported status reflects the actual state of the container.
HEALTHCHECK instruction format:
- HEALTHCHECK [options] CMD <command>: Set the command to check the health of the container
- HEALTHCHECK NONE: If the base image defines a health check, use this line to disable it
Note: HEALTHCHECK can appear only once in a Dockerfile. If several are written, only the last one takes effect.
Containers created from an image built with a HEALTHCHECK instruction have health checking enabled; the checks start automatically once the container is running.
Parameter reference:
https://docs.docker.com/engine/reference/builder/#healthcheck
HEALTHCHECK supports the following options:
- --interval=<interval>: the interval between two health checks, the default is 30 seconds;
- --timeout=<interval>: The timeout time for the health check command to run. If this time is exceeded, the health check will be regarded as a failure. The default is 30 seconds;
- --retries=<number of times>: After the specified number of consecutive failures, the container state will be regarded as unhealthy, the default is 3 times.
- --start-period=<interval>: The initialization time of the application startup, the health check failure during the startup process will not be counted, the default is 0 seconds;
The parameters work as follows:
- The health check first runs interval seconds after the container starts, and then again interval seconds after each previous check completes.
- A single check is considered failed if it takes longer than timeout seconds.
- A container is considered unhealthy after retries consecutive health check failures.
- The start period gives containers that need time to bootstrap a window for initialization. Probe failures during this period do not count towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started, and all subsequent consecutive failures count towards the maximum number of retries.
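The rules above can be sketched as a small state machine. The following Python model is only an illustration under simplifying assumptions (each check finishes instantly, and the names HealthConfig and simulate are made up for this sketch); it is not Docker's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class HealthConfig:
    interval: float = 30.0     # seconds between checks
    retries: int = 3           # consecutive failures before "unhealthy"
    start_period: float = 0.0  # grace period after container start


def simulate(results, cfg):
    """Return the health status after each check.

    `results` is a list of booleans (True = check passed).
    Assumption: every check completes instantly, so check number i
    runs (i + 1) * interval seconds after the container started.
    """
    status = "starting"
    failing_streak = 0
    started = False  # becomes True after the first successful check
    statuses = []
    for i, ok in enumerate(results):
        elapsed = (i + 1) * cfg.interval
        if ok:
            failing_streak = 0
            started = True
            status = "healthy"
        elif started or elapsed > cfg.start_period:
            # Failures only count once the start period is over,
            # or once a check has already succeeded.
            failing_streak += 1
            if failing_streak >= cfg.retries:
                status = "unhealthy"
        statuses.append(status)
    return statuses


if __name__ == "__main__":
    cfg = HealthConfig(interval=5, retries=3, start_period=12)
    print(simulate([False, False, False, False, False], cfg))
```

With a 12-second start period and a 5-second interval, the first two failures in this model are ignored, so the container only turns unhealthy on the fifth failed check.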
The command after HEALTHCHECK [options] CMD uses the same format as ENTRYPOINT and can be written in either shell form or exec form. The return value of the command determines the result of the health check:
- 0: success;
- 1: failure;
- 2: reserved, do not use this exit code.
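To illustrate the exit-code convention, here is a minimal sketch of a probe written in Python that only ever returns 0 or 1. The function names probe and main are hypothetical, and the sketch assumes the service answers over HTTP; it is one possible shape for a check command, not a prescribed one.

```python
import urllib.error
import urllib.request


def probe(url, timeout=3.0):
    """Return 0 if the URL answers with a 2xx/3xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 400 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL:
        # all of them count as a failed check. Never return 2 (reserved).
        return 1


def main(argv):
    """Pick the target URL from the command line, defaulting to localhost."""
    target = argv[1] if len(argv) > 1 else "http://localhost/"
    return probe(target)


# When used as a script, the exit code would be passed on like this:
#   import sys; sys.exit(main(sys.argv))
```

Such a script would only be usable as a health check command in an image that actually contains a Python interpreter, which is an assumption of this sketch.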
Suppose an image provides a simple web service and we want to add a health check that determines whether the web service is working normally. We can use curl for this, and the HEALTHCHECK in its Dockerfile can be written as follows:
FROM nginx:1.23
HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
CMD curl -fs http://localhost/ || exit 1
Here the check runs every 5 seconds (a very short interval chosen for testing; in practice it should be considerably longer). A check that does not respond within 3 seconds counts as a failure, and after three consecutive failures the container is considered unhealthy. The health check command is curl -fs http://localhost/ || exit 1.
Use docker build to build this image:
docker build -t myweb:v1 .
Start the container after building:
docker run -d --name web myweb:v1
After running the image, you can see the initial status through docker container ls (health: starting):
docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7068d793c6e4 myweb:v1 "/docker-entrypoint...." 3 seconds ago Up 2 seconds (health: starting) 80/tcp web
After waiting a few seconds, run docker container ls again and the health status will have changed to (healthy):
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7068d793c6e4 myweb:v1 "/docker-entrypoint...." 18 seconds ago Up 16 seconds (healthy) 80/tcp web
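Instead of re-running docker container ls by hand, the wait can be scripted. The sketch below polls a status function until the container reports healthy. The helper names docker_health and wait_until_healthy are made up for this example, and docker_health assumes the docker CLI is on the PATH and that the container defines a health check.

```python
import subprocess
import time


def docker_health(name):
    """Ask Docker for the container's current health status string."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", name],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def wait_until_healthy(get_status, timeout=60.0, poll=1.0, sleep=time.sleep):
    """Poll get_status() until it returns 'healthy'.

    Returns True once the status is 'healthy', and False if the status
    becomes 'unhealthy' or the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False
        sleep(poll)  # still 'starting': wait and ask again
    return False
```

For the container started above, a call could look like wait_until_healthy(lambda: docker_health("web")). Passing the status function as a parameter keeps the polling logic testable without a running Docker daemon.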
If the health check fails retries times in a row, the status changes to (unhealthy). To aid troubleshooting, the output of the health check command (both stdout and stderr) is stored in the health state, which can be viewed with docker inspect.
$ docker inspect --format '{{json .State.Health}}' web | python -m json.tool
{
    "FailingStreak": 0,
    "Log": [
        {
            "End": "2022-08-20T14:02:38.19224648+08:00",
            "ExitCode": 0,
            "Output": "xxx",
            "Start": "2022-08-20T14:02:38.116041192+08:00"
        },
        {
            "End": "2022-08-20T14:02:43.271105619+08:00",
            "ExitCode": 0,
            "Output": "xxx",
            "Start": "2022-08-20T14:02:43.200932585+08:00"
        }
    ],
    "Status": "healthy"
}
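The JSON shown above is easy to process in a script. As a rough sketch, the following Python function condenses it into a few fields that matter for troubleshooting. The function name summarize_health is hypothetical, and the sample string is a shortened copy of the output above.

```python
import json


def summarize_health(raw):
    """Condense the JSON from `docker inspect ... .State.Health` into a dict."""
    health = json.loads(raw)
    log = health.get("Log") or []  # Log may be missing or null
    last = log[-1] if log else None
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "checks_logged": len(log),
        "last_exit_code": last["ExitCode"] if last else None,
        "last_output": last["Output"] if last else "",
    }


# Shortened copy of the docker inspect output shown in the article.
sample = """
{
    "FailingStreak": 0,
    "Log": [
        {"End": "2022-08-20T14:02:38.19224648+08:00", "ExitCode": 0,
         "Output": "xxx", "Start": "2022-08-20T14:02:38.116041192+08:00"},
        {"End": "2022-08-20T14:02:43.271105619+08:00", "ExitCode": 0,
         "Output": "xxx", "Start": "2022-08-20T14:02:43.200932585+08:00"}
    ],
    "Status": "healthy"
}
"""

if __name__ == "__main__":
    print(summarize_health(sample))
```

When a container is unhealthy, the last_exit_code and last_output fields are usually the first things worth looking at.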
▍2. docker run method
Another way is to specify the health check policy directly in the docker run command:
$ docker run -d \
--name=myweb \
--health-cmd="curl -fs http://localhost/ || exit 1" \
--health-interval=5s \
--health-retries=12 \
--health-timeout=2s \
nginx:1.23
The relevant flags and their descriptions can be listed with docker run --help | grep health:
- --health-cmd string: Command to run to check health
- --health-interval duration: Time between running the check (ms|s|m|h) (default 0s)
- --health-retries int: Consecutive failures needed to report unhealthy
- --health-start-period duration: Start period for the container to initialize before starting the health-retries countdown (ms|s|m|h) (default 0s)
- --health-timeout duration: Maximum time to allow one check to run (ms|s|m|h) (default 0s)
- --no-healthcheck: Disable any container-specified HEALTHCHECK, which also turns off a HEALTHCHECK defined in the image's Dockerfile.
For a container whose services are managed by supervisor, the check command can be supervisorctl status:
$ docker run --rm -d \
--name=myweb \
--health-cmd="supervisorctl status" \
--health-interval=5s \
--health-retries=3 \
--health-timeout=2s \
nginx:v1
With these settings, if supervisorctl status reports that any sub-service is not in the RUNNING state, the health status of the container changes from (healthy) to (unhealthy) after about 15 seconds (3 retries at a 5-second interval).
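The "about 15 seconds" figure can be bounded with a little arithmetic. The sketch below assumes that the next check is scheduled interval seconds after the previous one completes and that a failing check takes anywhere between zero and timeout seconds; the function name is made up for this illustration.

```python
def unhealthy_delay_bounds(interval, timeout, retries):
    """Rough bounds, in seconds, between a service breaking and the
    container being reported as unhealthy.

    Best case: the service breaks just before a check runs and every
    failing check returns immediately, so only (retries - 1) intervals pass.
    Worst case: the service breaks just after a passing check and every
    failing check runs until it hits the timeout.
    """
    best = (retries - 1) * interval
    worst = retries * (interval + timeout)
    return best, worst


if __name__ == "__main__":
    best, worst = unhealthy_delay_bounds(interval=5, timeout=2, retries=3)
    print(f"unhealthy after roughly {best}s to {worst}s")
    # prints "unhealthy after roughly 10s to 21s"
```

For the settings used here (interval 5s, timeout 2s, retries 3), the 15-second estimate sits inside this 10 to 21 second window.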
▍3. docker-compose method
In docker-compose, a health check can be configured as follows (again using a container managed by supervisor as the example):
version: '3'
services:
  web:
    image: nginx:v1
    container_name: web
    healthcheck:
      test: ["CMD", "supervisorctl", "status"]
      interval: 5s
      timeout: 2s
      retries: 3
Once the service has started, wait a few seconds and then query the status of the container:
$ docker-compose ps
Name Command State Ports
--------------------------------------------------------------------------------
web supervisord -c /etc/superv ... Up (healthy) 443/tcp, 80/tcp
If some sub-services are then stopped manually with supervisorctl stop, so that not all of them are in the RUNNING state, checking the container again shows its status as (unhealthy).
To turn off a health check defined by the image, set disable: true under healthcheck:
healthcheck:
  disable: true
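The docker run flags and the docker-compose healthcheck keys describe the same settings. As a rough illustration of that correspondence, the sketch below translates a compose-style healthcheck mapping into docker run flags. The function name is hypothetical, and the translation is approximate: --health-cmd is always run through a shell, whereas the "CMD" exec form in compose is not.

```python
import shlex


def healthcheck_to_run_flags(hc):
    """Translate a docker-compose style healthcheck mapping into docker run flags."""
    test = hc.get("test")
    if hc.get("disable") or test == ["NONE"]:
        return ["--no-healthcheck"]

    flags = []
    if isinstance(test, str):
        # A plain string is treated as a shell command.
        flags.append(f"--health-cmd={test}")
    elif test and test[0] == "CMD-SHELL":
        flags.append(f"--health-cmd={test[1]}")
    elif test and test[0] == "CMD":
        # Exec form: join the arguments into one quoted command line.
        flags.append(f"--health-cmd={shlex.join(test[1:])}")

    mapping = (
        ("interval", "--health-interval"),
        ("timeout", "--health-timeout"),
        ("retries", "--health-retries"),
        ("start_period", "--health-start-period"),
    )
    for key, flag in mapping:
        if key in hc:
            flags.append(f"{flag}={hc[key]}")
    return flags


if __name__ == "__main__":
    compose_hc = {
        "test": ["CMD", "supervisorctl", "status"],
        "interval": "5s",
        "timeout": "2s",
        "retries": 3,
    }
    print(" ".join(healthcheck_to_run_flags(compose_hc)))
```

Applied to the compose example above, this yields the same flags as the supervisorctl docker run example in the previous section.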
This article is reproduced from "seafog's blog". Original text: https://url.hi-linux.com/mgvKJ. The copyright belongs to the original author.