Introduction
Analytics engineering bridges the gap between raw data and actionable insights. As the discipline matures, tools like dbt have revolutionized how analysts transform data. This guide covers building a modern analytics stack.
Key Statistics:
- dbt usage grew 300% in 2023
- Companies with analytics engineering have 2x faster time-to-insight
- 73% of data teams use some form of transformation tool
- Self-service analytics reduces analyst backlog by 60%
dbt (Data Build Tool)
Project Structure
# dbt_project.yml
name: analytics_warehouse
version: '1.0.0'
config-version: 2
vars:
source_database: raw
analytics_schema: analytics
models:
analytics:
+materialized: table
staging:
+materialized: view
intermediate:
+materialized: ephemeral
marts:
+materialized: table
+schema: marts
-- staging/stg_customers.sql
{{ config(materialized='view') }}
SELECT
customer_id,
email,
first_name,
last_name,
created_at,
status,
LOWER(email) AS email_normalized,
CONCAT(first_name, ' ', last_name) AS full_name,
DATE_TRUNC('month', created_at) AS cohort_month
FROM {{ source('ecom', 'customers') }}
WHERE deleted_at IS NULL
-- intermediate/int_customer_orders.sql
{{ config(materialized='table') }}
WITH orders AS (
SELECT * FROM {{ ref('stg_orders') }}
WHERE status = 'completed'
),
customer_orders AS (
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(amount) AS lifetime_value,
AVG(amount) AS avg_order_value,
MIN(created_at) AS first_order_date,
MAX(created_at) AS last_order_date,
DATEDIFF('day', MIN(created_at), MAX(created_at)) AS customer_tenure_days
FROM orders
GROUP BY customer_id
)
SELECT * FROM customer_orders
-- marts/fct_customer_summary.sql
{{ config(materialized='table') }}
WITH customer_base AS (
SELECT * FROM {{ ref('stg_customers') }}
),
orders AS (
SELECT * FROM {{ ref('int_customer_orders') }}
),
subscription AS (
SELECT * FROM {{ ref('stg_subscriptions') }}
WHERE status = 'active'
)
SELECT
c.customer_id,
c.email,
c.full_name,
c.cohort_month,
COALESCE(o.total_orders, 0) AS total_orders,
COALESCE(o.lifetime_value, 0) AS lifetime_value,
COALESCE(o.avg_order_value, 0) AS avg_order_value,
COALESCE(s.monthly_recurring_revenue, 0) AS mrr,
CASE
WHEN o.lifetime_value > 1000 THEN 'high_value'
WHEN o.lifetime_value > 100 THEN 'medium_value'
ELSE 'low_value'
END AS customer_tier
FROM customer_base c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN subscription s ON c.customer_id = s.customer_id
Testing in dbt
# schema.yml
version: 2
models:
- name: stg_customers
description: Staged customer data
columns:
- name: customer_id
description: Primary key
tests:
- unique
- not_null
- name: email
description: Customer email
tests:
- unique
- not_null
- dbt_utils.expression_is_true:
expression: "LIKE '%@%.%'"
- name: fct_customer_summary
description: Customer summary metrics
tests:
- dbt_utils.recency:
datepart: day
field: last_order_date
interval: 2
config:
where: "total_orders > 0"
Looker
LookML Development
# customer.model.lkml
connection: analytics_warehouse
explore: customer_orders {
join: customer {
type: left_outer
sql_on: ${customer_orders.customer_id} = ${customer.customer_id}
relationship: many_to_one
}
join: subscription {
type: left_outer
sql_on: ${customer_orders.customer_id} = ${subscription.customer_id}
relationship: many_to_one
}
fields: [ALL_FIELDS*,
- customer.email,
- customer.phone]
}
# customer.view.lkml
view: customer {
sql_table_name: {{ schema }}.dim_customers ;;
dimension: customer_id {
type: string
primary_key: yes
sql: ${TABLE}.customer_id ;;
}
dimension: full_name {
type: string
sql: ${TABLE}.full_name ;;
}
dimension: email {
type: string
sql: ${TABLE}.email ;;
html: {% if user.can_see_pii %}
{{ linked_value }}
{% else %}
****
{% endif %} ;;
}
dimension_group: created {
type: time
timeframes: [date, week, month, quarter, year]
sql: ${TABLE}.created_at ;;
}
measure: count {
type: count
drill_fields: [detail*]
}
measure: avg_lifetime_value {
type: average
sql: ${lifetime_value} ;;
value_format: "$#,##0.00"
}
}
Looker Dashboard
# dashboard.lkml
dashboard: customer_analytics {
title: "Customer Analytics"
elements:
- title: "Customer Count by Tier"
type: pie
model: customer
explore: customer_orders
dimensions: [customer.customer_tier]
measures: [customer.count]
sort: [customer.count: desc]
colors: {high_value: "#2ECC71", medium_value: "#F1C40F", low_value: "#E74C3C"}
- title: "Revenue Trend"
type: line
model: customer
explore: customer_orders
dimensions: [customer_orders.created_month]
measures: [customer_orders.total_revenue]
filters: [customer_orders.created_date: "24 months"]
- title: "Top Customers"
type: table
model: customer
explore: customer_orders
dimensions: [customer.full_name, customer.email]
measures: [customer.lifetime_value, customer.count]
sorts: [customer.lifetime_value: desc]
limit: 20
Tableau
Data Source Configuration
<!-- Tableau Data Source -->
<datasource hasconnection='false' inline='true' name='Analytics' caption='Customer Analytics'>
<connection class='postgres' dbname='analytics' schema='marts' host='warehouse.example.com'>
<named-connections>
<named-connection name='Analytics DB'>
<connection class='postgres' dbname='analytics' schema='marts' port='5432' username='tableau_user' />
</named-connections>
</named-connections>
</connection>
<columns>
<column datatype='string' name='customer_id' param-domain-type='string' role='measure' type='nominal' />
<column datatype='string' name='full_name' role='dimension' type='nominal' />
<column datatype='real' name='lifetime_value' role='measure' type='quantitative' />
<column datatype='datetime' name='created_at' role='dimension' type='ordinal' />
</columns>
<aliases enabled='yes'>
<alias key='customer_id' value='Customer ID' />
<alias key='lifetime_value' value='Lifetime Value' />
</aliases>
</datasource>
Tableau Dashboard Best Practices
-- Tableau Calculations
-- Customer Tier
IF [Lifetime Value] > 10000 THEN 'Platinum'
ELSEIF [Lifetime Value] > 5000 THEN 'Gold'
ELSEIF [Lifetime Value] > 1000 THEN 'Silver'
ELSE 'Bronze'
END
-- Month-over-Month Growth
(ZN(SUM([Revenue])) - LOOKUP(ZN(SUM([Revenue])), -1))
/ ABS(LOOKUP(ZN(SUM([Revenue])), -1))
-- Running Total
RUNNING_SUM(SUM([Revenue]))
-- Cohort Retention
COUNTD(IF [Acquisition Month] = [Current Month] THEN [Customer ID] END)
/ COUNTD([Customer ID])
Comparison
| Feature | dbt | Looker | Tableau |
|---|---|---|---|
| Primary Use | Transformation | Discovery & Reporting | Visualization |
| SQL Control | Full | Limited | None |
| Learning Curve | Medium | Low | Medium |
| Cost | $$ | $$$ | $$$$ |
| Embedded Analytics | No | Yes (Plus) | Yes |
Comments