1. Horizontal Pod Autoscaler (HPA)
Purpose/Background: HPA is for application-level scaling. It automatically adjusts the number of Pods (replicas) based on metrics like CPU usage to handle changing load efficiently, saving resources during quiet periods while scaling out during peak times.
Configuration File (HorizontalPodAutoscaler object):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-deployment  # Target the Deployment to be scaled
  minReplicas: 3              # Ensures high availability
  maxReplicas: 10             # Protects cluster resources and controls costs
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # Scale out when average CPU utilization hits 80% of the requested amount
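Once the manifest is applied, the controller's behavior can be observed with kubectl (a sketch; the filename `web-app-hpa.yaml` is an assumption):

```shell
kubectl apply -f web-app-hpa.yaml   # filename is hypothetical
kubectl get hpa web-app-hpa         # TARGETS column shows current vs. target CPU
kubectl describe hpa web-app-hpa    # shows scaling events and conditions
```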
Crucial Prerequisite: Resource Requests ⚠️
For HPA to work with resource utilization metrics, the target Deployment must define resource requests in the Pod spec:
# Inside Deployment spec.template.spec.containers[0]
resources:
  requests:
    cpu: "200m"      # HPA calculates 80% utilization against this value
    memory: "256Mi"
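The scaling decision itself follows the formula documented for HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick sketch with hypothetical numbers (4 replicas averaging 120% of their 200m CPU request, against the 80% target):

```shell
# Hypothetical observed values, not taken from the manifest above.
current_replicas=4
current_utilization=120   # average CPU as a percentage of the 200m request
target_utilization=80     # averageUtilization from the HPA spec

# Ceiling division: ceil(a / b) == (a + b - 1) / b in integer arithmetic
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "$desired"   # ceil(4 * 120 / 80) = 6 replicas
```

The result is then clamped to the [minReplicas, maxReplicas] range, so it can never fall below 3 or exceed 10 in this configuration.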
2. Cluster Autoscaler (CA)
Purpose/Background: CA is for infrastructure-level scaling. It watches for Pods stuck in a Pending state because existing nodes lack sufficient resources to schedule them. It then asks the cloud provider to scale the underlying Node Pool by adding new Worker Nodes, ensuring capacity is available when HPA-triggered scaling needs it.
Configuration Details (Cloud Provider Settings):
CA is not configured through a standard Kubernetes YAML manifest; it is set up on the underlying cloud platform (e.g., GKE, AKS, EKS).
| Configuration Parameter | Background/Reason |
| --- | --- |
| Node Pool Min Size | The minimum number of nodes to maintain. Prevents the cluster from scaling down to zero while essential services still need to run. |
| Node Pool Max Size | The maximum number of nodes allowed. Crucial for cost control and preventing runaway resource consumption. |
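On GKE, for example, these limits are set when enabling autoscaling on a node pool. A sketch using the standard gcloud CLI; the cluster name, pool name, and zone below are placeholders:

```shell
# Hypothetical cluster and pool names; adjust sizes and zone to your environment.
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool=default-pool \
  --min-nodes=3 --max-nodes=10 \
  --zone=us-central1-a
```

EKS and AKS expose equivalent min/max settings on their managed node groups and node pools, respectively.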
3. Manual Scaling
Purpose/Background: Manual scaling is the simplest method, used for initial setup, planned capacity increases, or static workloads where the desired number of replicas is known and constant.
Configuration File (Deployment object):
You set the desired replica count directly in the Deployment manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-service
spec:
  replicas: 5  # <--- The desired number of Pods is set manually here
  # ... rest of the spec
Command Line:
This command immediately overrides any existing spec.replicas value in the Deployment:
kubectl scale deployment static-service --replicas=5
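To confirm the change took effect, the replica count can be read back with jsonpath queries (a sketch; `static-service` is the Deployment from the manifest above):

```shell
# Desired replica count, straight from the Deployment spec
kubectl get deployment static-service -o jsonpath='{.spec.replicas}'

# Number of Pods actually ready, for comparison
kubectl get deployment static-service -o jsonpath='{.status.readyReplicas}'
```

Note that if the Deployment is managed declaratively (e.g., via `kubectl apply` or GitOps), the next sync will revert an imperative `kubectl scale`, so the manifest should be updated as well.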