Introduction
MinIO powers production applications across diverse industries, from massive data lakes to backup systems to AI/ML pipelines. This article explores detailed production use cases with implementation patterns, architecture examples, and best practices learned from real deployments.
Data Lakes
Enterprise Data Lake
Building a centralized repository for all organizational data:
```text
# Data lake structure
s3://enterprise-datalake/
├── landing/
│   ├── crm/
│   ├── erp/
│   └── logs/
├── raw/
│   ├── processed/
│   └── curated/
├── analytics/
│   └── warehouse/
└── ml/
    ├── training/
    └── models/
```
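To make the layout concrete, a small helper can build date-partitioned object keys under the landing zone. The partitioning scheme and names below are illustrative assumptions, not anything MinIO prescribes:

```python
from datetime import date

def landing_key(source: str, dataset: str, filename: str, ingest_date: date) -> str:
    """Build a date-partitioned key under the landing zone.

    Mirrors the tree above: landing/<source>/<dataset>/YYYY/MM/DD/<file>.
    """
    return f"landing/{source}/{dataset}/{ingest_date:%Y/%m/%d}/{filename}"

print(landing_key("crm", "contacts", "export.csv", date(2024, 5, 1)))
# landing/crm/contacts/2024/05/01/export.csv
```

Date-partitioned keys keep prefixes selective, so downstream jobs can list or read a single day's data without scanning the whole zone.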
Implementation
```python
import boto3

# Point boto3 at the MinIO endpoint
s3_client = boto3.client(
    's3',
    endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin'
)

# Create data lake buckets
buckets = [
    'datalake-landing',
    'datalake-raw',
    'datalake-curated',
    'datalake-analytics',
    'datalake-ml'
]
for bucket_name in buckets:
    s3_client.create_bucket(Bucket=bucket_name)

# Set a lifecycle policy: transition landing data after one day
s3_client.put_bucket_lifecycle_configuration(
    Bucket='datalake-landing',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'MoveToRaw',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'Transitions': [{
                'Days': 1,
                'StorageClass': 'STANDARD_IA'
            }]
        }]
    }
)
```
Analytics Integration
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("DataLakeAnalytics") \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000") \
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin") \
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Query data lake
df = spark.read.parquet("s3a://datalake-curated/sales/")
result = df.groupBy("region").sum("revenue")
result.show()
```

Path-style access is required because MinIO does not resolve virtual-host-style bucket URLs by default.
Backup and Recovery
Enterprise Backup
MinIO as a backup target:
```bash
# Restic backup to MinIO
export RESTIC_PASSWORD="backup-password"
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"
export RESTIC_REPOSITORY="s3:http://minio:9000/backup"

# Initialize repository
restic init

# Backup files
restic backup /data

# List backups
restic snapshots

# Restore
restic restore latest --target /restore
```
Database Backups
```bash
#!/bin/bash
# PostgreSQL backup to MinIO
BACKUP_FILE="pg_backup_$(date +%Y%m%d_%H%M%S).sql.gz"
BUCKET="database-backups"

# Dump all databases
pg_dumpall | gzip > "/tmp/$BACKUP_FILE"

# Upload to MinIO (assumes the 'minio' alias is configured via mc alias set)
mc cp "/tmp/$BACKUP_FILE" "minio/$BUCKET/"

# Keep only 7 days of backups
mc rm --recursive --force --older-than 7d "minio/$BUCKET/"
```
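A restore script needs to find the newest dump. Because the filename timestamp (`pg_backup_YYYYMMDD_HHMMSS`) sorts lexicographically, picking the latest key is trivial; the sketch below would consume key names returned by `mc ls` or `list_objects_v2`:

```python
def latest_backup(keys):
    """Return the newest pg_backup_* key, or None if there are none.

    The embedded timestamp (YYYYMMDD_HHMMSS) sorts lexicographically,
    so max() over the names picks the most recent dump.
    """
    backups = [k for k in keys if k.startswith("pg_backup_")]
    return max(backups) if backups else None

keys = [
    "pg_backup_20240501_020000.sql.gz",
    "pg_backup_20240503_020000.sql.gz",
    "pg_backup_20240502_020000.sql.gz",
]
print(latest_backup(keys))  # pg_backup_20240503_020000.sql.gz
```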
Versioning and Retention
```bash
# Enable versioning for the backup bucket
mc version enable minio/backup-bucket

# Set a default retention rule (WORM); the bucket must have been
# created with object locking enabled (mc mb --with-lock)
mc retention set --default GOVERNANCE 90d minio/compliance-backups

# Legal hold on a single object
mc legalhold set minio/compliance-backups/important-file
```
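The 90-day governance rule translates into a concrete retain-until date for each object written today. A trivial helper makes the arithmetic explicit (purely illustrative):

```python
from datetime import date, timedelta

def retention_until(start: date, days: int) -> date:
    """Date before which a GOVERNANCE/COMPLIANCE-locked object
    cannot be deleted without bypassing the lock."""
    return start + timedelta(days=days)

print(retention_until(date(2024, 1, 1), 90))  # 2024-03-31
```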
Media and Content Storage
Video Platform
Storing and streaming video content:
```python
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin'
)

def upload_video(file_path, video_id):
    """Upload the original video plus pre-transcoded quality variants."""
    bucket = 'video-content'

    # Upload original
    s3.upload_file(
        f'{file_path}.mp4',
        bucket,
        f'videos/{video_id}/original.mp4',
        ExtraArgs={'ContentType': 'video/mp4'}
    )

    # Upload variants (assumed to exist as {file_path}_{quality}.mp4)
    for quality in ['1080p', '720p', '480p']:
        s3.upload_file(
            f'{file_path}_{quality}.mp4',
            bucket,
            f'videos/{video_id}/{quality}.mp4',
            ExtraArgs={
                'ContentType': 'video/mp4',
                'CacheControl': 'max-age=31536000'
            }
        )

    # Upload thumbnail
    s3.upload_file(
        'thumbnail.jpg',
        bucket,
        f'thumbnails/{video_id}.jpg',
        ExtraArgs={'ContentType': 'image/jpeg'}
    )
```
Image Storage
```python
# Image processing pipeline (process_image is an application-supplied
# resize/compress helper that writes /tmp/{size}.jpg variants)
def process_images():
    s3 = boto3.client('s3', endpoint_url='http://minio:9000')

    # List images; a paginator handles more than 1000 keys
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket='images-raw'):
        for obj in page.get('Contents', []):
            # Download image
            s3.download_file('images-raw', obj['Key'], '/tmp/image.jpg')

            # Process (resize, compress)
            process_image('/tmp/image.jpg')

            # Upload variants
            for size in ['thumb', 'medium', 'large']:
                s3.upload_file(
                    f'/tmp/{size}.jpg',
                    'images-processed',
                    f'{size}/{obj["Key"]}'
                )
```
IoT Data Storage
Time-Series IoT Data
```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client('s3', endpoint_url='http://minio:9000')

def store_iot_data(device_id, sensor_type, value):
    """Store a single sensor reading as its own JSON object.

    put_object replaces an existing key, so each reading gets a unique
    key (date prefix plus full timestamp) instead of trying to append
    to a shared daily file.
    """
    timestamp = datetime.now(timezone.utc).isoformat()
    data = {
        'device_id': device_id,
        'sensor_type': sensor_type,
        'value': value,
        'timestamp': timestamp
    }
    key = f"iot/{sensor_type}/{device_id}/{timestamp[:10]}/{timestamp}.json"
    s3.put_object(
        Bucket='iot-data',
        Key=key,
        Body=json.dumps(data),
        ContentType='application/json'
    )

# Store readings
store_iot_data('sensor-001', 'temperature', 23.5)
store_iot_data('sensor-001', 'humidity', 45.2)
```
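Writing one object per reading is simple but produces huge numbers of tiny objects, which is inefficient for both request overhead and later scans. A common refinement is to buffer readings and flush one JSON-lines payload per batch; a minimal serializer for such a batch:

```python
import json

def to_jsonl(readings):
    """Serialize a batch of readings as one JSON-lines payload,
    suitable for a single put_object call."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in readings)

batch = [
    {"device_id": "sensor-001", "sensor_type": "temperature", "value": 23.5},
    {"device_id": "sensor-001", "sensor_type": "humidity", "value": 45.2},
]
payload = to_jsonl(batch)
print(payload.count("\n"))  # 2
```

Batch size is a latency/efficiency trade-off: larger batches mean fewer, bigger objects but delay the data's availability for analytics.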
IoT Analytics
```python
# Analyze IoT data with Spark
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("IoTAnalytics") \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Read sensor data (keys follow iot/<sensor_type>/<device_id>/<date>/...)
df = spark.read.json("s3a://iot-data/iot/temperature/*/*/*.json")

# Daily aggregations per device (a Python dict cannot hold duplicate
# keys, so use explicit aggregate functions rather than {'value': ...})
daily_stats = df.groupBy(
    'device_id', F.to_date('timestamp').alias('date')
).agg(
    F.avg('value').alias('avg_value'),
    F.max('value').alias('max_value'),
    F.min('value').alias('min_value')
)

# Detect out-of-range readings
anomalies = df.filter((df.value > 100) | (df.value < -20))
```
Healthcare Imaging
DICOM Storage
```python
import json

import boto3
import pydicom

s3 = boto3.client('s3', endpoint_url='http://minio:9000')

def store_dicom(file_path, patient_id, study_date):
    """Store a medical image plus a JSON metadata sidecar."""
    bucket = 'medical-imaging'

    # Read DICOM and extract metadata
    ds = pydicom.dcmread(file_path)
    metadata = {
        'patient_id': patient_id,
        'study_date': study_date,
        'modality': ds.Modality,
        'series': ds.SeriesInstanceUID
    }

    # Upload DICOM file
    key = f"dicom/{patient_id}/{study_date}/{ds.SOPInstanceUID}.dcm"
    s3.upload_file(file_path, bucket, key)

    # Upload metadata as sidecar
    s3.put_object(
        Bucket=bucket,
        Key=f"{key}.json",
        Body=json.dumps(metadata)
    )

def get_study(patient_id, study_date):
    """Retrieve a study's object keys (first 1000 keys per call;
    use a paginator for larger studies)."""
    response = s3.list_objects_v2(
        Bucket='medical-imaging',
        Prefix=f"dicom/{patient_id}/{study_date}/"
    )
    return [obj['Key'] for obj in response.get('Contents', [])]
```
Compliance
```bash
# HIPAA-oriented storage: create the bucket with object locking enabled
mc mb --with-lock minio/medical-records

# Enable server-side encryption
mc encryption set sse-s3 minio/medical-records

# Default retention (roughly 7 years for medical records)
mc retention set --default COMPLIANCE 2555d minio/medical-records

# Audit access by tracing API calls against the deployment
mc admin trace minio
```
Log Aggregation
Centralized Logging
```text
# Fluentd (fluent-plugin-s3) configuration for MinIO
<match **>
  @type s3
  s3_endpoint http://minio:9000
  aws_key_id minioadmin
  aws_sec_key minioadmin
  s3_bucket logs
  force_path_style true
  s3_object_key_format %{tag}/%{time_slice}/%{hostname}_%{uuid_flush}.log
  time_slice_format %Y-%m-%d
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 3600
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>
```
Log Analysis
```python
# Ad-hoc log analysis with Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read application logs
logs = spark.read.text("s3a://logs/application/*.log")

# Fixed-offset parse of a common log layout; the offsets are
# format-specific, so adjust them (or use a regex) for your logs
parsed = logs.select(
    logs.value.substr(1, 26).alias('timestamp'),
    logs.value.substr(48, 100).alias('message')
)

# Error analysis
errors = parsed.filter(parsed.message.contains('ERROR'))
error_counts = errors.groupBy('timestamp').count()
```
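Fixed-offset substrings are brittle; a small regex parser is a more robust way to split log lines before loading them into Spark. The line format below is an assumption — adapt the pattern to your logs:

```python
import re

# Assumed line format: "<date> <time> [LEVEL] message"
LOG_RE = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<level>\w+)\] (?P<msg>.*)$")

def parse_line(line: str):
    """Return {'ts', 'level', 'msg'} for a matching line, else None."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

print(parse_line("2024-05-01 12:00:00 [ERROR] db timeout"))
# {'ts': '2024-05-01 12:00:00', 'level': 'ERROR', 'msg': 'db timeout'}
```

The same function can be wrapped as a Spark UDF, and lines that return None surface malformed records instead of silently mis-slicing them.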
Content Delivery
Static Website Hosting
MinIO does not implement the S3 static-website configuration API, so there is no website command to run. Instead, make the static content publicly readable and serve it through a reverse proxy or CDN that points at the bucket:

```bash
# Allow anonymous downloads of the site content
mc anonymous set download minio/www

# Or scope public access to the static assets prefix only
mc anonymous set download minio/www/assets
```
CDN Integration
```python
# Integrate with a CDN (Cloudflare, Fastly, etc.); assumes an s3 client
# configured against MinIO as in the earlier examples
import requests

def generate_presigned_url(key, expires=3600):
    """Generate a signed URL for the CDN origin."""
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'cdn-origin', 'Key': key},
        ExpiresIn=expires
    )

def invalidate_cdn(zone_id, api_token, urls):
    """Purge cached URLs via the Cloudflare purge_cache endpoint."""
    resp = requests.post(
        f'https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache',
        headers={'Authorization': f'Bearer {api_token}'},
        json={'files': urls}
    )
    resp.raise_for_status()
```
Best Practices Summary
Performance
- Use multiple drives (4+ for production)
- Use NVMe for high-IOPS workloads
- Separate networks for client and replication traffic
- Enable caching for hot data
Security
- Enable TLS
- Use IAM policies for least-privilege access
- Enable object locking for compliance
- Audit access regularly
Data Management
- Set lifecycle policies
- Enable versioning
- Use appropriate storage classes
- Clean up temporary files regularly
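The lifecycle and cleanup practices above can be codified as a bucket lifecycle configuration — for example, one that expires objects under a temporary prefix. The bucket and prefix names here are illustrative:

```python
# Illustrative rule: expire anything under tmp/ after 7 days
lifecycle = {
    "Rules": [{
        "ID": "ExpireTmp",
        "Status": "Enabled",
        "Filter": {"Prefix": "tmp/"},
        "Expiration": {"Days": 7},
    }]
}

# Applied with a configured boto3 client, e.g.:
# s3_client.put_bucket_lifecycle_configuration(
#     Bucket="scratch", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["ID"])  # ExpireTmp
```

Encoding cleanup as a lifecycle rule lets the server enforce it continuously, instead of relying on cron jobs that can silently stop running.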
Conclusion
MinIO powers production applications across virtually every industry. The use cases demonstrated here show common patterns: data lakes for analytics, backup targets for disaster recovery, media storage for content platforms, IoT data pipelines, healthcare imaging, and log aggregation. These patterns form the foundation for building robust object storage infrastructure.
With this article, we’ve completed the MinIO tutorial series covering basics, operations, internals, trends, AI applications, and production use cases. You now have comprehensive knowledge to design, implement, and operate MinIO in your own applications.