1. Horizontal Pod Autoscaler (HPA)
Purpose/Background: HPA is for application-level scaling. It automatically adjusts the number of Pods (replicas) based on metrics like CPU usage to handle changing load efficiently, saving resources during quiet periods while scaling out during peak times.
Configuration File (HorizontalPodAutoscaler object):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-deployment  # Target the Deployment to be scaled
  minReplicas: 3              # Ensures high availability
  maxReplicas: 10             # Protects cluster resources and controls costs
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # Scale out when average CPU utilization hits 80% of the requested amount
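Once the manifest is applied, the controller's behavior can be observed with kubectl (a sketch; the filename `web-app-hpa.yaml` is an assumption):

```shell
kubectl apply -f web-app-hpa.yaml   # filename is hypothetical
kubectl get hpa web-app-hpa         # TARGETS column shows current vs. target CPU
kubectl describe hpa web-app-hpa    # shows scaling events and conditions
```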
Crucial Prerequisite: Resource Requests ⚠️
For HPA to work with resource utilization metrics, the target Deployment must define resource requests in the Pod spec:
# Inside Deployment spec.template.spec.containers[0]
resources:
  requests:
    cpu: "200m"      # HPA calculates 80% utilization against this value
    memory: "256Mi"
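The scaling decision itself follows the formula documented for HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick sketch with hypothetical numbers (4 replicas averaging 120% of their 200m CPU request, against the 80% target):

```shell
# Hypothetical observed values, not taken from the manifest above.
current_replicas=4
current_utilization=120   # average CPU as a percentage of the 200m request
target_utilization=80     # averageUtilization from the HPA spec

# Ceiling division: ceil(a / b) == (a + b - 1) / b in integer arithmetic
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "$desired"   # ceil(4 * 120 / 80) = 6 replicas
```

The result is then clamped to the [minReplicas, maxReplicas] range, so it can never fall below 3 or exceed 10 in this configuration.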
2. Cluster Autoscaler (CA)
Purpose/Background: CA is for infrastructure-level scaling. It watches for Pods stuck in a Pending state because existing nodes lack sufficient resources to schedule them. It then asks the cloud provider to scale the underlying Node Pool by adding new Worker Nodes, ensuring capacity is available when HPA-triggered scaling needs it.
Configuration Details (Cloud Provider Settings):
CA is not configured through a standard Kubernetes YAML manifest; it is set up on the underlying cloud platform (e.g., GKE, AKS, EKS).
| Configuration Parameter | Background/Reason |
| --- | --- |
| Node Pool Min Size | The minimum number of nodes to maintain. Prevents the cluster from scaling down to zero while essential services still need to run. |
| Node Pool Max Size | The maximum number of nodes allowed. Crucial for cost control and preventing runaway resource consumption. |
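On GKE, for example, these limits are set when enabling autoscaling on a node pool. A sketch using the standard gcloud CLI; the cluster name, pool name, and zone below are placeholders:

```shell
# Hypothetical cluster and pool names; adjust sizes and zone to your environment.
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool=default-pool \
  --min-nodes=3 --max-nodes=10 \
  --zone=us-central1-a
```

EKS and AKS expose equivalent min/max settings on their managed node groups and node pools, respectively.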
3. Manual Scaling
Purpose/Background: Manual scaling is the simplest method, used for initial setup, planned capacity increases, or static workloads where the desired number of replicas is known and constant.
Configuration File (Deployment object):
You set the desired replica count directly in the Deployment manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-service
spec:
  replicas: 5  # <--- The desired number of Pods is set manually here
  # ... rest of the spec
Command Line:
This command immediately overrides any existing spec.replicas value in the Deployment:
kubectl scale deployment static-service --replicas=5
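To confirm the change took effect, the replica count can be read back with jsonpath queries (a sketch; `static-service` is the Deployment from the manifest above):

```shell
# Desired replica count, straight from the Deployment spec
kubectl get deployment static-service -o jsonpath='{.spec.replicas}'

# Number of Pods actually ready, for comparison
kubectl get deployment static-service -o jsonpath='{.status.readyReplicas}'
```

Note that if the Deployment is managed declaratively (e.g., via `kubectl apply` or GitOps), the next sync will revert an imperative `kubectl scale`, so the manifest should be updated as well.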