Introduction
GitOps is a paradigm that uses Git as the single source of truth for infrastructure and application deployment. Instead of imperative commands, you declare desired state in Git, and automated systems ensure the actual state matches. This guide covers GitOps principles, implementation patterns, and best practices for production systems.
Key Statistics:
- GitOps reduces deployment time by 60-70%
- Incident recovery time improves by 50%
- Infrastructure changes are fully auditable
- 95% of enterprises adopting GitOps report improved reliability
Core Concepts & Terminology
1. GitOps
Operational model where Git is the single source of truth for infrastructure and applications.
2. Declarative Infrastructure
Describing desired state rather than imperative steps to achieve it.
3. Continuous Deployment
Automatically deploying changes once they are merged into the main branch.
4. Pull-Based Deployment
The deployment system pulls changes from Git, rather than an external pipeline pushing them into the cluster.
5. Infrastructure as Code (IaC)
Managing infrastructure through code files (Terraform, CloudFormation, etc.).
6. Reconciliation
Process of ensuring actual state matches desired state in Git.
7. Drift Detection
Identifying when actual infrastructure differs from Git-declared state.
8. Sealed Secrets
Encrypting secrets in Git while keeping them accessible to deployment systems.
9. Kustomize
Tool for customizing Kubernetes manifests without templating.
10. ArgoCD
Popular GitOps tool for Kubernetes that implements pull-based deployment.
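The reconciliation and drift-detection concepts above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how ArgoCD is implemented: `desired` and `actual` are hypothetical dictionaries mapping resource names to specs, standing in for manifests in Git and live cluster objects.

```python
from dataclasses import dataclass, field


@dataclass
class Drift:
    """Differences between the Git-declared state and the live state."""
    missing: list = field(default_factory=list)  # declared in Git, absent live
    extra: list = field(default_factory=list)    # live, but not declared in Git
    changed: list = field(default_factory=list)  # present in both, specs differ

    def in_sync(self) -> bool:
        return not (self.missing or self.extra or self.changed)


def detect_drift(desired: dict, actual: dict) -> Drift:
    """Compare two {resource_name: spec} mappings and report drift."""
    drift = Drift()
    for name, spec in desired.items():
        if name not in actual:
            drift.missing.append(name)
        elif actual[name] != spec:
            drift.changed.append(name)
    drift.extra = [name for name in actual if name not in desired]
    return drift


def reconcile(desired: dict, actual: dict, prune: bool = True) -> dict:
    """Return the state after one reconciliation pass.

    Missing and changed resources are (re)applied from `desired`;
    extra resources are removed only when `prune` is enabled.
    """
    result = dict(actual)
    result.update(desired)
    if prune:
        for name in detect_drift(desired, actual).extra:
            del result[name]
    return result
```

With pruning enabled, one pass makes the live state equal to the declared state; with `prune=False`, undeclared resources are left in place, which mirrors the choice a GitOps operator exposes as a pruning option.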
GitOps Architecture
┌─────────────────────────────────────────────────────────────┐
│                       Git Repository                        │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐ │
│  │ Infrastructure │  │  Applications  │  │    Secrets     │ │
│  │  (Terraform)   │  │  (Manifests)   │  │  (Encrypted)   │ │
│  └────────────────┘  └────────────────┘  └────────────────┘ │
└──────────────────────────────┬──────────────────────────────┘
                               │
                 ┌─────────────┴─────────────┐
                 │                           │
                 ▼                           ▼
       ┌────────────────────┐      ┌────────────────────┐
       │    CI Pipeline     │      │  GitOps Operator   │
       │  (GitHub Actions)  │      │      (ArgoCD)      │
       │  ┌──────────────┐  │      │  ┌──────────────┐  │
       │  │ Validate     │  │      │  │ Monitor      │  │
       │  │ Test         │  │      │  │ Reconcile    │  │
       │  │ Build        │  │      │  │ Sync         │  │
       │  └──────────────┘  │      │  └──────────────┘  │
       └────────────────────┘      └─────────┬──────────┘
                                             │
                            ┌────────────────▼────────────────┐
                            │       Kubernetes Cluster        │
                            │  ┌───────────────────────────┐  │
                            │  │ Deployments, Services,    │  │
                            │  │ ConfigMaps, Secrets       │  │
                            │  └───────────────────────────┘  │
                            └─────────────────────────────────┘
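The operator side of this architecture is a polling loop: it asks Git for the current revision and syncs only when that revision has changed. The sketch below illustrates one polling cycle. `get_head_revision` and `apply_revision` are hypothetical callables standing in for a Git client and a cluster sync; they are not ArgoCD APIs.

```python
from typing import Callable, Optional


def poll_once(
    get_head_revision: Callable[[], str],
    apply_revision: Callable[[str], None],
    last_synced: Optional[str],
) -> str:
    """Run one polling cycle and return the revision now deployed.

    The operator asks Git for the current head and applies it only
    when it differs from the revision that was last synced.
    """
    head = get_head_revision()
    if head != last_synced:
        apply_revision(head)
    return head


if __name__ == "__main__":
    applied = []
    revisions = iter(["abc123", "abc123", "def456"])
    synced = None
    for _ in range(3):
        synced = poll_once(lambda: next(revisions), applied.append, synced)
    print(applied)  # each new revision is applied exactly once
```

Because the cluster initiates the pull, the CI pipeline never needs cluster credentials, which is the main security argument for the pull-based model.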
Git Repository Structure
Recommended Layout
gitops-repo/
├── infrastructure/
│   ├── terraform/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── environments/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── production/
│   └── helm/
│       ├── values-dev.yaml
│       ├── values-staging.yaml
│       └── values-prod.yaml
├── applications/
│   ├── app-1/
│   │   ├── kustomization.yaml
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── overlays/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── production/
│   └── app-2/
├── secrets/
│   ├── sealed-secrets.yaml
│   └── .gitignore
├── docs/
│   ├── README.md
│   └── CONTRIBUTING.md
└── .github/
    └── workflows/
        ├── validate.yml
        ├── deploy.yml
        └── drift-detection.yml
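A predictable layout makes the repository easy to inspect with tooling. As a small sketch, assuming the `applications/<app>/overlays/<env>/` convention from the layout above, the hypothetical helper below lists which environments each application defines. It could be used in CI to flag an application that lacks a production overlay.

```python
from pathlib import Path


def list_overlays(repo_root: str) -> dict:
    """Map each application directory to its sorted overlay environments."""
    apps_dir = Path(repo_root) / "applications"
    overlays = {}
    if not apps_dir.is_dir():
        return overlays
    for app in sorted(p for p in apps_dir.iterdir() if p.is_dir()):
        overlay_dir = app / "overlays"
        envs = []
        if overlay_dir.is_dir():
            envs = sorted(p.name for p in overlay_dir.iterdir() if p.is_dir())
        overlays[app.name] = envs
    return overlays


if __name__ == "__main__":
    # Print the overlays found under the current working directory.
    for app_name, envs in list_overlays(".").items():
        print(app_name, envs)
```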
Terraform GitOps Implementation
Infrastructure as Code
# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      GitOps      = "true"
    }
  }
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = "${var.cluster_name}-${var.environment}"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  tags = {
    Name = "${var.cluster_name}-${var.environment}"
  }
}

# Node Group
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-nodes"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = var.desired_size
    max_size     = var.max_size
    min_size     = var.min_size
  }

  instance_types = var.instance_types

  tags = {
    Name = "${var.cluster_name}-nodes"
  }
}

# Output
output "cluster_endpoint" {
  value       = aws_eks_cluster.main.endpoint
  description = "EKS cluster endpoint"
}

output "cluster_name" {
  value       = aws_eks_cluster.main.name
  description = "EKS cluster name"
}
Environment-Specific Configuration
# environments/production/terraform.tfvars
aws_region = "us-east-1"
environment = "production"
cluster_name = "myapp"
kubernetes_version = "1.28"
desired_size = 5
max_size = 20
min_size = 3
instance_types = ["t3.large"]
# environments/staging/terraform.tfvars
aws_region = "us-east-1"
environment = "staging"
cluster_name = "myapp"
kubernetes_version = "1.28"
desired_size = 2
max_size = 5
min_size = 1
instance_types = ["t3.medium"]
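Per-environment variable files are easy to get subtly wrong, for example a `desired_size` outside the `min_size`/`max_size` range. The sketch below shows one way a CI step might check this before `terraform plan`. The parser is deliberately simplistic and hypothetical: it handles only flat `key = value` lines like those above, not full HCL.

```python
import re


def parse_simple_tfvars(text: str) -> dict:
    """Parse flat `key = value` lines; integers are converted, the rest kept as strings.

    Only the simple lines shown in the tfvars files above are supported.
    """
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.match(r"^(\w+)\s*=\s*(.+)$", line)
        if not match:
            continue
        key, raw = match.group(1), match.group(2).strip()
        values[key] = int(raw) if raw.isdigit() else raw.strip('"')
    return values


def scaling_errors(values: dict) -> list:
    """Return human-readable problems with the node group sizing."""
    errors = []
    lo, want, hi = values["min_size"], values["desired_size"], values["max_size"]
    if lo > hi:
        errors.append(f"min_size ({lo}) exceeds max_size ({hi})")
    if not lo <= want <= hi:
        errors.append(f"desired_size ({want}) is outside [{lo}, {hi}]")
    return errors
```

An empty list from `scaling_errors` means the sizing is internally consistent; a non-empty list can be printed and used to fail the pipeline.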
Kubernetes Manifests with Kustomize
Base Configuration
# applications/myapp/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
commonLabels:
  app: myapp
  managed-by: gitops
resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml
  - hpa.yaml
images:
  - name: myapp
    newTag: latest
replicas:
  - name: myapp
    count: 3
configMapGenerator:
  - name: app-config
    literals:
      - LOG_LEVEL=info
      - ENVIRONMENT=production
# The base creates this Secret, so no `behavior: merge` is set here;
# merging requires an existing generator with the same name.
# secrets.env holds plaintext values and must stay out of Git.
secretGenerator:
  - name: app-secrets
    envs:
      - secrets.env
Deployment Manifest
# applications/myapp/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      serviceAccountName: myapp
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: myapp
          image: myapp:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: LOG_LEVEL
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
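The `resources` block in this manifest uses Kubernetes quantity notation: `100m` is 100 millicores (0.1 CPU) and `128Mi` is 128 × 1024² bytes. The sketch below converts these quantities and checks that requests do not exceed limits. It is a simplified illustration that supports only the `m`, `Ki`, `Mi`, and `Gi` suffixes, not the full quantity grammar.

```python
_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes (Ki/Mi/Gi only)."""
    for suffix, multiplier in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # plain byte count


def requests_within_limits(resources: dict) -> bool:
    """Check that CPU and memory requests do not exceed their limits."""
    req, lim = resources["requests"], resources["limits"]
    return (
        cpu_to_millicores(req["cpu"]) <= cpu_to_millicores(lim["cpu"])
        and memory_to_bytes(req["memory"]) <= memory_to_bytes(lim["memory"])
    )
```

For the values in the manifest, the request is 100 millicores and 134,217,728 bytes against limits of 500 millicores and 536,870,912 bytes, so the check passes.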
Environment Overlays
# applications/myapp/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# `resources` replaces the deprecated `bases` field
resources:
  - ../../base
  - ingress.yaml
  - networkpolicy.yaml
namespace: production
namePrefix: prod-
commonLabels:
  environment: production
replicas:
  - name: myapp
    count: 5
images:
  - name: myapp
    newTag: v1.2.3
# `patches` replaces the deprecated `patchesStrategicMerge` field
patches:
  - path: deployment-patch.yaml
configMapGenerator:
  - name: app-config
    behavior: merge
    literals:
      - LOG_LEVEL=warn
      - ENVIRONMENT=production
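The overlay's `behavior: merge` on `app-config` means its literals are layered over the base's: keys present in both take the overlay's value, and keys only in the base are kept. The sketch below models that outcome with plain dictionaries. It illustrates the merge semantics only and is not Kustomize's implementation.

```python
def merge_literals(base: list, overlay: list) -> dict:
    """Merge KEY=VALUE literals; overlay entries win on key conflicts."""
    merged = {}
    for literal in base + overlay:
        key, _, value = literal.partition("=")
        merged[key] = value
    return merged


if __name__ == "__main__":
    base = ["LOG_LEVEL=info", "ENVIRONMENT=production"]
    overlay = ["LOG_LEVEL=warn", "ENVIRONMENT=production"]
    print(merge_literals(base, overlay))
```

For the base and production overlay shown here, the effective ConfigMap ends up with `LOG_LEVEL=warn` and `ENVIRONMENT=production`.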
ArgoCD GitOps Operator
ArgoCD Application
# argocd/applications/myapp.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
  annotations:
    # Notifications are subscribed to through annotations; the Application
    # spec has no `notifications` field. The trigger (on-sync-failed) and
    # the channel name (my-channel) are placeholders.
    notifications.argoproj.io/subscribe.on-sync-failed.slack: my-channel
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/gitops-repo
    targetRevision: main
    path: applications/myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - RespectIgnoreDifferences=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
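The retry settings above (`limit: 5`, `duration: 5s`, `factor: 2`, `maxDuration: 3m`) describe an exponential backoff. As a worked example, assuming the delay starts at `duration`, is multiplied by `factor` after each failed attempt, and is capped at `maxDuration`, the waits can be computed as follows.

```python
def backoff_schedule(duration_s: float, factor: float, max_duration_s: float, limit: int) -> list:
    """Delay before each retry: duration * factor**n, capped at max_duration."""
    return [min(duration_s * factor ** n, max_duration_s) for n in range(limit)]


if __name__ == "__main__":
    # duration=5s, factor=2, maxDuration=3m (180s), limit=5
    print(backoff_schedule(5, 2, 180, 5))
```

Under these assumptions the five retries wait 5, 10, 20, 40, and 80 seconds, so the 3-minute cap is never reached; it would only take effect from the seventh retry onward.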
ArgoCD Configuration
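# argocd/argocd-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Label key ArgoCD uses to track the resources it manages
  application.instanceLabelKey: argocd.argoproj.io/instance
  # External base URL of the ArgoCD instance
  url: https://argocd.example.com
  # Repository credentials (legacy format; recent ArgoCD releases expect
  # repositories to be declared as Secrets labeled
  # argocd.argoproj.io/secret-type: repository)
  repositories: |
    - url: https://github.com/myorg/gitops-repo
      type: git
      passwordSecret:
        name: github-credentials
        key: password
      usernameSecret:
        name: github-credentials
        key: username
  # Notification settings (illustrative only; ArgoCD Notifications reads its
  # service configuration from the argocd-notifications-cm ConfigMap)
  notificationSettings: |
    - name: slack
      enabled: true
      webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL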
Sealed Secrets for GitOps
Encrypting Secrets
# Install sealed-secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml
# Create a secret manifest locally (never applied or committed as-is)
kubectl create secret generic app-secrets \
  --from-literal=database-url=postgresql://user:pass@db:5432/myapp \
  --from-literal=api-key=sk_live_abc123 \
  -n production \
  --dry-run=client -o yaml > secret.yaml
# Seal the secret (kubeseal writes JSON unless a format is given)
kubeseal --format yaml -f secret.yaml -w sealed-secret.yaml
# Remove the plaintext manifest so it cannot be committed by mistake
rm secret.yaml
# Commit sealed-secret.yaml to Git
git add sealed-secret.yaml
git commit -m "Add sealed secrets"
Sealed Secret Manifest
# applications/myapp/overlays/production/sealed-secrets.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  encryptedData:
    database-url: AgBvF3x8K2x9... # Encrypted value
    api-key: AgCdE4y9L3m2... # Encrypted value
  template:
    metadata:
      name: app-secrets
      namespace: production
    type: Opaque
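Since only `SealedSecret` manifests should reach Git, a pre-commit or CI check can reject files that still contain a plain `kind: Secret`. The sketch below is one hypothetical way to do that with a regular expression over the file text. It assumes top-level, unquoted `kind:` lines and is not a substitute for a real YAML parser or a dedicated secret scanner.

```python
import re

# Matches a top-level `kind: <Name>` line in a YAML document.
_KIND_RE = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def find_plaintext_secrets(documents: dict) -> list:
    """Return the names of files that contain a `kind: Secret` manifest.

    `documents` maps a file name to its YAML text. SealedSecret manifests
    are not reported because their values are encrypted.
    """
    offenders = []
    for name, text in documents.items():
        if "Secret" in _KIND_RE.findall(text):
            offenders.append(name)
    return sorted(offenders)
```

In a pipeline, the mapping could be built from the changed files in a pull request, with a non-empty result failing the check.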
CI/CD Pipeline for GitOps
GitHub Actions Workflow
# .github/workflows/deploy.yml
name: GitOps Deploy
on:
  push:
    branches:
      - main
    paths:
      - 'applications/**'
      - 'infrastructure/**'
      - '.github/workflows/deploy.yml'
  # `schedule` is a workflow-level trigger and cannot be set on a job;
  # scheduled runs drive the drift-detection job below.
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate Terraform
        run: |
          cd infrastructure/terraform
          terraform init -backend=false
          terraform validate
      - name: Validate Kubernetes Manifests
        run: |
          kubectl apply --dry-run=client -f applications/ -R
      - name: Lint with kubeval
        run: |
          # Enable ** so nested manifests are matched
          shopt -s globstar
          kubeval applications/**/*.yaml
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: |
          # Run application tests
          make test
  deploy:
    needs: [validate, test]
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Deploy Infrastructure
        run: |
          # main.tf lives in infrastructure/terraform; the environment
          # directory only supplies the variable file.
          cd infrastructure/terraform
          terraform init
          terraform plan -var-file=environments/production/terraform.tfvars -out=tfplan
          terraform apply tfplan
      - name: Notify ArgoCD
        run: |
          # ArgoCD will automatically sync from Git
          echo "Infrastructure deployed. ArgoCD will sync applications."
  drift-detection:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Check Terraform Drift
        run: |
          cd infrastructure/terraform
          terraform init
          # A saved plan file is never empty, so test the exit code instead:
          # 0 = no changes, 1 = error, 2 = changes pending (drift)
          set +e
          terraform plan -detailed-exitcode -input=false \
            -var-file=environments/production/terraform.tfvars
          status=$?
          if [ "$status" -eq 2 ]; then
            echo "Drift detected!"
            exit 1
          fi
          exit "$status"
      - name: Notify on Drift
        if: failure()
        run: |
          # Send notification about drift
          echo "Infrastructure drift detected"
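For drift checks, `terraform plan -detailed-exitcode` gives a reliable signal: it exits with 0 when there are no changes, 1 on error, and 2 when the plan succeeded and changes are pending. The sketch below wraps that convention in Python. The function names and the `workdir` parameter are illustrative, and `check_drift` assumes `terraform` is installed and already initialized in that directory.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Interpret the exit status of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "in-sync"  # plan succeeded, no changes
    if code == 2:
        return "drift"    # plan succeeded, changes are pending
    return "error"        # the plan itself failed


def check_drift(workdir: str) -> str:
    """Run terraform plan in `workdir` and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return classify_plan_exit_code(result.returncode)
```

Separating "drift" from "error" matters for alerting: a failed plan (for example, expired credentials) should not be reported as infrastructure drift.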
Best Practices
- Single Source of Truth: Git is the only source of truth
- Declarative Configuration: Describe desired state, not steps
- Automated Reconciliation: System automatically fixes drift
- Pull-Based Deployment: The operator pulls from Git rather than CI pushing to the cluster
- Immutable Infrastructure: Rebuild instead of modify
- Sealed Secrets: Encrypt secrets in Git
- Code Review: All changes go through PR review
- Audit Trail: Full history of all changes
- Drift Detection: Regularly check for drift
- Gradual Rollout: Use canary/blue-green deployments
Common Pitfalls
- Mixing Push and Pull: Letting CI push changes to the cluster while an operator also syncs from Git
- Secrets in Git: Committing unencrypted secrets
- Manual Changes: Making changes outside Git
- No Drift Detection: Unaware of infrastructure drift
- Poor PR Process: Merging without review
- Ignoring Failures: Not handling sync failures
- No Rollback Plan: Unable to quickly revert
- Monolithic Repos: Single large repository
- No Testing: Deploying without validation
- Ignoring Monitoring: No visibility into deployments
Comparison: Deployment Approaches
| Approach | Speed | Auditability | Rollback | Learning Curve |
|---|---|---|---|---|
| Manual | Slow | Poor | Difficult | Low |
| CI/CD Push | Fast | Good | Easy | Medium |
| GitOps Pull | Fast | Excellent | Easy | High |
Conclusion
GitOps provides a powerful paradigm for managing infrastructure and applications at scale. By treating infrastructure as code and using Git as the single source of truth, you gain auditability, reproducibility, and automated reconciliation. The key is implementing proper processes, tooling, and monitoring to ensure your infrastructure stays in sync with Git.
Next Steps:
- Set up Git repository structure
- Migrate infrastructure to Terraform
- Implement ArgoCD or Flux
- Set up sealed secrets
- Create CI/CD pipeline
- Monitor and iterate