Data Engineering: Topic Index

Topic index generated on 2026-04-23 — grouped article list

Below is an index of articles grouped by topic. Click a heading to jump to the section.

Data Engineering

  • Apache Spark: Big Data Processing at Scale 2026
    • A comprehensive guide to Apache Spark for big data processing in 2026. Learn about RDDs, DataFrames, Spark SQL, optimization techniques, and building scalable data pipelines.
  • Data Lakehouse: Combining Data Lake and Data Warehouse
    • A comprehensive guide to Data Lakehouse architecture, combining the flexibility of data lakes with the management features of data warehouses. Learn about Delta Lake, Apache Iceberg, Apache Hudi, ACID transactions, and time travel.
  • Data Mesh: Decentralized Data Architecture 2026
    • A comprehensive guide to Data Mesh architecture in 2026, a decentralized approach to data management. Learn about its four principles: domain ownership, data as a product, the self-serve data platform, and federated governance.
  • Stream Processing with Kafka and Flink
    • A comprehensive guide to stream processing using Apache Kafka and Apache Flink. Learn about event streaming, exactly-once semantics, windowing, and building real-time data pipelines.

If you find missing articles or inaccurate groupings, run ./scripts/update_index.py with appropriate flags.

Introduction to Time Series Analysis

Learn time series analysis fundamentals including forecasting methods, decomposition, stationarity, and building predictive models for temporal data.

Data Catalog Implementation Guide

Build and implement a data catalog. Covers metadata management, discovery, governance, and the business glossary, along with tools, architectures, and best practices for 2026.

Data Lakehouse Architecture: Complete Guide

Master data lakehouse architecture in 2026. Learn how to combine data lake flexibility with data warehouse reliability. Covers Delta Lake, Apache Iceberg, implementation …

Data Pipeline Orchestration: Complete Guide

Master data pipeline orchestration with Airflow, Dagster, and Prefect. Learn to build scalable, reliable ETL pipelines, manage dependencies, and implement best practices for data …

ETL vs ELT: Modern Data Integration Patterns

Compare ETL and ELT approaches for modern data integration. Learn when to use each pattern, tool recommendations, and implementation strategies for cloud data warehouses.