⚡ Calmops

MinIO Use Cases: Production Applications Across Industries

Introduction

MinIO powers production applications across diverse industries, from massive data lakes to backup systems to AI/ML pipelines. This article explores detailed production use cases with implementation patterns, architecture examples, and best practices learned from real deployments.

Data Lakes

Enterprise Data Lake

Building a centralized repository for all organizational data:

# Data lake structure
s3://enterprise-datalake/
├── landing/
│   ├── crm/
│   ├── erp/
│   └── logs/
├── raw/
│   ├── processed/
│   └── curated/
├── analytics/
│   └── warehouse/
└── ml/
    ├── training/
    └── models/

Implementation

import boto3

# Point the resource at MinIO (default credentials shown for illustration)
s3 = boto3.resource(
    's3',
    endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
)

# Create data lake buckets
buckets = [
    'datalake-landing',
    'datalake-raw',
    'datalake-curated',
    'datalake-analytics',
    'datalake-ml'
]

for bucket_name in buckets:
    s3.create_bucket(Bucket=bucket_name)

# Set lifecycle policies (note the .put() call and the required Filter;
# transitions change storage class, they do not move data between buckets)
s3.BucketLifecycleConfiguration('datalake-landing').put(
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'TierLandingToIA',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'Transitions': [{
                'Days': 1,
                'StorageClass': 'STANDARD_IA'
            }]
        }]
    }
)

Analytics Integration

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("DataLakeAnalytics") \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000") \
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin") \
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Query data lake
df = spark.read.parquet("s3a://datalake-curated/sales/")
result = df.groupBy("region").sum("revenue")
result.show()

Backup and Recovery

Enterprise Backup

MinIO as a backup target:

# Restic backup to MinIO
export RESTIC_PASSWORD="backup-password"
export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"
export RESTIC_REPOSITORY="s3:http://minio:9000/backup"

# Initialize repository
restic init

# Backup files
restic backup /data

# List backups
restic snapshots

# Restore
restic restore latest --target /restore

Database Backups

#!/bin/bash
# PostgreSQL backup to MinIO

BACKUP_FILE="pg_backup_$(date +%Y%m%d_%H%M%S).sql.gz"
BUCKET="database-backups"

# Dump database
pg_dumpall | gzip > "/tmp/$BACKUP_FILE"

# Upload to MinIO
mc cp "/tmp/$BACKUP_FILE" "minio/$BUCKET/"
rm "/tmp/$BACKUP_FILE"

# Keep only 7 days of backups
mc rm --recursive --force --older-than 7d "minio/$BUCKET/"
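The `--older-than` cleanup can also be done programmatically; a hedged sketch that selects expired keys using the `pg_backup_YYYYmmdd_HHMMSS.sql.gz` naming scheme from the script above (the actual deletion is left to `mc` or an S3 client):

```python
from datetime import datetime, timedelta

def expired_backups(keys, now, days=7):
    """Return backup keys whose embedded timestamp is older than `days`.

    Assumes the pg_backup_YYYYmmdd_HHMMSS.sql.gz naming scheme used
    by the backup script above.
    """
    cutoff = now - timedelta(days=days)
    expired = []
    for key in keys:
        stamp = key.removeprefix("pg_backup_").removesuffix(".sql.gz")
        taken = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        if taken < cutoff:
            expired.append(key)
    return expired
```

Parsing the timestamp out of the key (rather than trusting object mtime) keeps retention decisions stable even if backups are re-uploaded.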

Versioning and Retention

# Enable versioning for backup bucket
mc version enable minio/backup-bucket

# Set a default retention period (WORM); the bucket must have been
# created with object locking enabled (mc mb --with-lock)
mc retention set --default GOVERNANCE 90d minio/compliance-backups

# Legal hold on a specific object
mc legalhold set minio/compliance-backups/important-file
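The same default retention can be applied over the S3 API; a sketch that builds the payload boto3's `put_object_lock_configuration` expects (the bucket must have been created with object locking enabled, and the call itself is shown commented out since it needs a live endpoint):

```python
def governance_lock_config(days: int) -> dict:
    """Default bucket retention payload, matching the mc command above."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": days}},
    }

# Applying it (requires a bucket created with object locking enabled):
# s3.put_object_lock_configuration(
#     Bucket="compliance-backups",
#     ObjectLockConfiguration=governance_lock_config(90),
# )
```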

Media and Content Storage

Video Platform

Storing and streaming video content:

import boto3

s3 = boto3.client('s3',
    endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin'
)

# Upload video
def upload_video(file_path, video_id):
    """Upload a video in its original and transcoded qualities"""
    bucket = 'video-content'

    # Upload original
    s3.upload_file(
        f'{file_path}.mp4',
        bucket,
        f'videos/{video_id}/original.mp4',
        ExtraArgs={'ContentType': 'video/mp4'}
    )

    # Upload pre-generated variants
    for quality in ['1080p', '720p', '480p']:
        s3.upload_file(
            f'{file_path}_{quality}.mp4',
            bucket,
            f'videos/{video_id}/{quality}.mp4',
            ExtraArgs={
                'ContentType': 'video/mp4',
                'CacheControl': 'max-age=31536000'
            }
        )

# Upload thumbnail
def upload_thumbnail(video_id, thumbnail_path):
    """Upload the video's thumbnail image"""
    s3.upload_file(
        thumbnail_path,
        'video-content',
        f'thumbnails/{video_id}.jpg',
        ExtraArgs={'ContentType': 'image/jpeg'}
    )

Image Storage

# Image processing pipeline (process_image is an application-specific
# function that writes thumb/medium/large variants to /tmp)
def process_images():
    s3 = boto3.client('s3', endpoint_url='http://minio:9000')

    # List all images (a paginator handles buckets over 1000 objects)
    paginator = s3.get_paginator('list_objects_v2')

    for page in paginator.paginate(Bucket='images-raw'):
        for obj in page.get('Contents', []):
            # Download image
            s3.download_file('images-raw', obj['Key'], '/tmp/image.jpg')

            # Process (resize, compress)
            process_image('/tmp/image.jpg')

            # Upload variants
            for size in ['thumb', 'medium', 'large']:
                s3.upload_file(
                    f'/tmp/{size}.jpg',
                    'images-processed',
                    f'{size}/{obj["Key"]}'
                )

IoT Data Storage

Time-Series IoT Data

import json
from datetime import datetime

def store_iot_data(device_id, sensor_type, value):
    """Store a single sensor reading"""
    s3 = boto3.client('s3',
        endpoint_url='http://minio:9000'
    )

    timestamp = datetime.utcnow().isoformat()

    data = {
        'device_id': device_id,
        'sensor_type': sensor_type,
        'value': value,
        'timestamp': timestamp
    }

    # One object per reading: objects are immutable, so reusing a shared
    # daily key would overwrite it on every put. Batch readings
    # client-side if per-object overhead matters.
    key = f"iot/{sensor_type}/{device_id}/{timestamp[:10]}/{timestamp}.json"

    s3.put_object(
        Bucket='iot-data',
        Key=key,
        Body=json.dumps(data) + '\n',
        ContentType='application/json'
    )

# Store readings
store_iot_data('sensor-001', 'temperature', 23.5)
store_iot_data('sensor-001', 'humidity', 45.2)
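When per-object overhead matters, readings can be buffered client-side and flushed as a single JSON Lines payload per put; a minimal sketch of the serialization step:

```python
import json

def to_jsonl(readings):
    """Serialize a batch of readings into one JSON Lines payload,
    suitable as the Body of a single put_object call."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in readings)
```

One object per flush interval (e.g. per device per minute) keeps object counts manageable while remaining line-splittable for Spark.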

IoT Analytics

# Analyze IoT data with Spark
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("IoTAnalytics") \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Read all temperature readings under the prefix
df = spark.read.option("recursiveFileLookup", "true") \
    .json("s3a://iot-data/iot/temperature/")

# Daily aggregations per device (a dict literal cannot hold duplicate
# 'value' keys, so use explicit aggregate functions)
daily_stats = df.withColumn('date', F.to_date('timestamp')) \
    .groupBy('device_id', 'date') \
    .agg(
        F.avg('value').alias('avg_value'),
        F.max('value').alias('max_value'),
        F.min('value').alias('min_value')
    )

# Detect anomalies
anomalies = df.filter((df.value > 100) | (df.value < -20))

Healthcare Imaging

DICOM Storage

import json
import pydicom
import boto3

s3 = boto3.client('s3',
    endpoint_url='http://minio:9000'
)

# Upload DICOM file
def store_dicom(file_path, patient_id, study_date):
    """Store medical imaging"""
    bucket = 'medical-imaging'
    
    # Read DICOM
    ds = pydicom.dcmread(file_path)
    
    # Extract metadata
    metadata = {
        'patient_id': patient_id,
        'study_date': study_date,
        'modality': ds.Modality,
        'series': ds.SeriesInstanceUID
    }
    
    # Upload DICOM file
    key = f"dicom/{patient_id}/{study_date}/{ds.SOPInstanceUID}.dcm"
    s3.upload_file(file_path, bucket, key)
    
    # Upload metadata as sidecar
    s3.put_object(
        Bucket=bucket,
        Key=f"{key}.json",
        Body=json.dumps(metadata)
    )

# Retrieve for viewing
def get_study(patient_id, study_date):
    """Retrieve study for viewing"""
    response = s3.list_objects_v2(
        Bucket='medical-imaging',
        Prefix=f"dicom/{patient_id}/{study_date}/"
    )
    
    return [obj['Key'] for obj in response.get('Contents', [])]
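A helper that pairs each DICOM object with its metadata sidecar keeps the two key formats from drifting apart; a sketch using the same layout as `store_dicom` above:

```python
def dicom_keys(patient_id, study_date, sop_instance_uid):
    """Object keys for a DICOM file and its JSON metadata sidecar,
    matching the dicom/<patient>/<date>/<uid>.dcm layout above."""
    base = f"dicom/{patient_id}/{study_date}/{sop_instance_uid}.dcm"
    return base, base + ".json"
```

A viewer can then fetch metadata first (small sidecar) and lazily load the full image only when the study is opened.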

Compliance

# HIPAA-oriented storage: create the bucket with object locking enabled
mc mb --with-lock minio/medical-records

# Enable server-side encryption
mc encryption set sse-s3 minio/medical-records

# Set default retention (7 years for medical records)
mc retention set --default COMPLIANCE 2555d minio/medical-records

# Enable audit logging server-wide via a webhook target
# (the collector endpoint shown is illustrative)
mc admin config set minio audit_webhook:1 endpoint="http://audit-collector:8080/"

Log Aggregation

Centralized Logging

# Fluentd configuration for MinIO (fluent-plugin-s3)
<match **>
  @type s3
  s3_endpoint http://minio:9000
  aws_key_id minioadmin
  aws_sec_key minioadmin
  s3_bucket logs
  s3_region us-east-1
  force_path_style true
  path logs/
  s3_object_key_format %{path}%{time_slice}_%{hostname}_%{uuid_flush}.%{file_extension}
  time_slice_format %Y-%m-%d
  <buffer>
    @type file
    path /var/log/fluent/s3
    timekey 3600
    timekey_use_utc true
    chunk_limit_size 256m
  </buffer>
</match>

Log Analysis

# Analyze logs with Athena-like queries
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000") \
    .getOrCreate()

# Read application logs
logs = spark.read.text("s3a://logs/application/*.log")

# Crude fixed-width parse; adjust the offsets to your log format
parsed = logs.select(
    logs.value.substr(1, 26).alias('timestamp'),
    logs.value.substr(28, 100).alias('message')
)

# Error analysis
errors = parsed.filter(parsed.message.contains('ERROR'))
error_counts = errors.groupBy('timestamp').count()
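Fixed-width offsets are brittle across log formats; a regex-based alternative sketch, assuming a `timestamp LEVEL message` line shape (adjust the pattern to your actual format, e.g. to wrap in a Spark UDF):

```python
import re

# Assumed shape: "<timestamp> <LEVEL> <message...>"
LOG_RE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")

def parse_line(line):
    """Parse one log line into fields, or None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

Returning `None` for unparseable lines makes it easy to filter malformed records into a dead-letter path instead of failing the job.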

Content Delivery

Static Website Hosting

# MinIO has no S3 website-hosting API; make the bucket publicly
# readable and let a reverse proxy or CDN map / to /index.html
mc anonymous set download minio/www

# Or restrict public read to static assets only
mc anonymous set download minio/www/assets

CDN Integration

import boto3

s3 = boto3.client('s3',
    endpoint_url='http://minio:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin'
)

# Integrate with a CDN (Cloudflare, Fastly, etc.) using MinIO as origin
def generate_presigned_url(key, expires=3600):
    """Generate a signed URL for the CDN origin"""
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'cdn-origin', 'Key': key},
        ExpiresIn=expires
    )

# Cache invalidation (stub; call your CDN provider's purge API here)
def invalidate_cdn(paths):
    """Invalidate CDN cache for the given paths"""
    pass

Best Practices Summary

Performance

# Use multiple drives (4+ for production)
# Use NVMe for high IOPS
# Separate networks for client and replication
# Enable caching for hot data

Security

# Enable TLS
# Use IAM policies
# Enable object locking for compliance
# Regular access audits

Data Management

# Set lifecycle policies
# Enable versioning
# Use appropriate storage classes
# Regular cleanup of temporary files
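As one concrete instance of these practices, a lifecycle rule that tiers and then expires temporary files can be built once and reused across buckets; a hedged sketch producing the dict shape the S3 lifecycle API expects (rule names and prefixes are illustrative):

```python
def tiering_rule(rule_id, prefix, ia_days, expire_days=None):
    """Lifecycle rule: transition a prefix to STANDARD_IA after
    ia_days, optionally expiring objects after expire_days."""
    rule = {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": ia_days, "StorageClass": "STANDARD_IA"}],
    }
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}
    return rule

# Example: tier tmp/ after 7 days, delete after 30
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",
#     LifecycleConfiguration={"Rules": [tiering_rule("tmp-cleanup", "tmp/", 7, 30)]},
# )
```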

Conclusion

MinIO powers production applications across virtually every industry. The use cases demonstrated here show common patterns: data lakes for analytics, backup targets for disaster recovery, media storage for content platforms, IoT data pipelines, healthcare imaging, and log aggregation. These patterns form the foundation for building robust object storage infrastructure.

With this article, we’ve completed the MinIO tutorial series covering basics, operations, internals, trends, AI applications, and production use cases. You now have comprehensive knowledge to design, implement, and operate MinIO in your own applications.
