Kubernetes: Graceful shutdown of SpringBoot Pod

Yashwanth Kumar Nimmala
5 min read · Jul 2, 2022


Throughout the lifecycle of an application, running pods are terminated for multiple reasons. In some cases, Kubernetes terminates pods due to user input (when updating or deleting a deployment, for example). In others, Kubernetes terminates pods because it needs to free resources on a given node. Regardless of the scenario, Kubernetes allows the containers running in a pod to shut down gracefully within a configurable grace period.

Take a look at the chart below to better understand what happens when a pod is deleted.

Below are two pod shutdown scenarios we see in practice.

Graceful Shutdown

In this scenario, the containers within the pod shut down gracefully within the grace period. The “graceful shutdown” state for the containers consists of running an optional preStop hook and the pod responding to the SIGTERM signal. Once the containers exit successfully, the Kubelet deletes the pod from the API server.

Forceful Shutdown

In this scenario, the containers fail to shut down within the grace period. Failure to shut down can have multiple causes, including 1) the application ignoring the SIGTERM signal, 2) the preStop hook taking longer than the grace period, 3) the application taking longer than the grace period to clean up resources, or 4) a combination of the above.

When the application fails to shut down within the grace period, the Kubelet sends a SIGKILL signal to forcefully terminate the processes running in the pod. Depending on the application, this can result in data loss and user-facing errors.

In this article, we will focus on how to achieve the graceful shutdown scenario.

Identify the issues

In Kubernetes, every deployment means creating pods of a new version while removing old pods.

Two problems can arise if there is no graceful shutdown during the process:

  1. A pod that is in the middle of processing a request is removed; if the request is not idempotent, this leads to an inconsistent state (see the sketch after this list).
  2. Kubernetes keeps routing traffic to pods that have already been deleted, resulting in failed requests and a poor user experience.
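To make issue 1 concrete, here is a minimal, hypothetical Spring Boot handler (class, endpoint, and repository names are illustrative and not from the original article). It performs two sequential writes; if the process is killed between them, the first write is persisted but the second never happens, and because the request is not idempotent it cannot simply be retried:

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical, simplified handler used only to illustrate issue 1.
@RestController
public class TransferController {

    // Assumed persistence abstraction; any two sequential writes show the same problem.
    interface AccountRepository {
        void debit(String account, long amount);
        void credit(String account, long amount);
    }

    private final AccountRepository accounts;

    TransferController(AccountRepository accounts) {
        this.accounts = accounts;
    }

    @PostMapping("/transfer")
    public void transfer(@RequestParam String from,
                         @RequestParam String to,
                         @RequestParam long amount) {
        accounts.debit(from, amount);  // first write is persisted
        // If the pod is killed at this point, the credit below never runs
        // and the two accounts are left in an inconsistent state.
        accounts.credit(to, amount);   // second write may never happen
    }
}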

Analyze the issues

When a Kubernetes pod is deleted, two timelines run in parallel, as shown in the following diagram. One is the timeline of the network rules changing. The other is the deletion of the pod itself.

When the programmer or the deployment pipeline executes the kubectl delete pod command, two procedures begin:

Network rules coming into effect

  1. Kube-apiserver receives the pod deletion request and updates the state of the pod to Terminating in etcd;
  2. The Endpoint controller removes the IP of the pod from the Endpoints object;
  3. Kube-proxy updates the iptables rules according to the change in the Endpoints object and no longer routes traffic to the deleted pod.

Deleting a pod

  1. Kube-apiserver receives the pod deletion request and updates the state of the pod to Terminating in etcd.
  2. Kubelet cleans up container-related resources on the node, such as storage and network.
  3. Kubelet sends SIGTERM to the container; if the process inside the container does nothing to handle it, the container exits at once.
  4. If the container doesn’t exit within the default 30-second grace period, Kubelet sends SIGKILL to force it to exit.

By walking through the pod deletion procedure, we can see that if the process inside the container is not configured to handle SIGTERM, the container exits at once and drops any in-flight requests, which leads to issue 1.
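For a plain JVM process, “configuring” the process to handle SIGTERM usually means registering a shutdown hook: the JVM runs registered hooks when it receives SIGTERM, giving the process a chance to finish its work before exiting. Below is a minimal sketch (the class name is illustrative; Spring Boot registers an equivalent hook for you, as shown in the next section):

// Minimal sketch of a JVM shutdown hook. The JVM runs registered hooks
// when it receives SIGTERM, before the process exits.
public class ShutdownAware {

    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Finish in-flight work, flush buffers, close connections, etc.
            System.out.println("SIGTERM received, cleaning up before exit");
        }));

        // Simulate the application's main work; block until shutdown.
        Thread.currentThread().join();
    }
}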

Since updating the network rules and deleting the pod happen simultaneously, there is no guarantee that the network rules are updated before the pod is deleted. And this is what can lead to issue 2.

The Solution

The following configurations can solve these problems:

  1. Enable graceful shutdown for the process within the container.
  2. Add a preStop hook.
  3. Adjust terminationGracePeriodSeconds.

The following diagram shows the timeline after these changes are in place.

For Issue 1: Setting graceful shutdown for the process within the container

Using Spring Boot as an example, enabling graceful shutdown is as simple as adding the following settings to the Spring Boot configuration file:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

With this configuration, Spring Boot stops accepting new requests as soon as it receives SIGTERM and finishes processing the ongoing requests within the timeout. If it cannot finish in time, the remaining requests are logged and the application is then forced to quit.

The timeout value should be based on the maximum acceptable duration for processing a request. In our experience, except under unusual circumstances, all requests finish processing within 30s. Requests that do not finish within the defined timeout are captured in log monitoring and trigger alerts, so the root cause of the timeout can be investigated and addressed.
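One simple way to verify the behavior is a deliberately slow endpoint: start a request against it, delete the pod while the request is in flight, and check that the response still arrives before the application exits. The sketch below is a minimal, hypothetical controller (class name and path are illustrative, not part of the original article):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller used only to exercise graceful shutdown.
@RestController
public class SlowController {

    // Call GET /slow, then delete the pod; with server.shutdown=graceful
    // the in-flight request should still complete before the JVM exits.
    @GetMapping("/slow")
    public String slow() throws InterruptedException {
        Thread.sleep(10_000); // simulate a long-running request (10 seconds)
        return "finished";
    }
}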

This is how issue 1 can be solved. Other languages and frameworks should have similar configurations.

For Issue 2: Adding preStopHook

To handle issue 2, the pod should only begin shutting down after new traffic is no longer being routed to it. Hence, a preStop hook should be added to the Kubernetes YAML file to make Kubelet “take a break” upon receiving the pod deletion event, leaving Kube-proxy enough time to update the network rules before the container is told to shut down.

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"] # set preStop hook

The above configuration makes Kubelet run the 10-second sleep before it sends SIGTERM to the container, giving us the break we require.

Modifying terminationGracePeriodSeconds

Referring to the previous analysis of pod deletion, Kubernetes by default allows a maximum of 30 seconds before it force-kills the container. If the sum of the Spring graceful shutdown timeout and the preStop hook exceeds 30 seconds, Kubernetes may forcibly kill the container before Spring Boot has finished processing requests. Therefore, terminationGracePeriodSeconds should be set to more than the Spring graceful shutdown timeout plus the preStop duration. In our example, the 10-second preStop sleep plus the 30-second Spring timeout adds up to 40 seconds, so 45 seconds leaves a small buffer:

terminationGracePeriodSeconds: 45

Finally, the fully updated Kubernetes YAML file looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gracefulshutdown-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gracefulshutdown-app
  template:
    metadata:
      labels:
        app: gracefulshutdown-app
    spec:
      containers:
        - name: graceful-shutdown-test
          image: gracefulshutdown-app:latest
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"] # set preStop hook
      terminationGracePeriodSeconds: 45 # must exceed preStop sleep + Spring shutdown timeout

Enabling graceful shutdown in Spring Boot guarantees that ongoing requests are fully processed before the container is terminated. Adding the preStop hook ensures that the network rules are updated before the pod starts shutting down. Finally, terminationGracePeriodSeconds is raised so the process has enough time to handle all requests before it is force-killed. By following these three steps, we can adequately solve both issues.

Summary

This article describes a solution for ensuring that a service correctly handles all in-flight requests, which is required for zero-downtime deployments, that is, an environment where deployments happen frequently. Building this capability improves the user experience and reduces the impact of introducing defects into a service.
