Skip to main content
โšก Calmops

Analytics Engineering: dbt, Looker, Tableau

Introduction

Analytics engineering bridges the gap between raw data and actionable insights. As the discipline matures, tools like dbt have revolutionized how analysts transform data. This guide covers building a modern analytics stack.

Key Statistics:

  • dbt usage grew 300% in 2023
  • Companies with analytics engineering have 2x faster time-to-insight
  • 73% of data teams use some form of transformation tool
  • Self-service analytics reduces analyst backlog by 60%

dbt (Data Build Tool)

Project Structure

# dbt_project.yml
name: analytics_warehouse
version: '1.0.0'

config-version: 2

vars:
  source_database: raw
  analytics_schema: analytics

models:
  analytics:
    +materialized: table
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      +schema: marts
-- staging/stg_customers.sql
{{ config(materialized='view') }}

SELECT
    customer_id,
    email,
    first_name,
    last_name,
    created_at,
    status,
    LOWER(email) AS email_normalized,
    CONCAT(first_name, ' ', last_name) AS full_name,
    DATE_TRUNC('month', created_at) AS cohort_month
FROM {{ source('ecom', 'customers') }}
WHERE deleted_at IS NULL
-- intermediate/int_customer_orders.sql
{{ config(materialized='table') }}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    WHERE status = 'completed'
),

customer_orders AS (
    SELECT
        customer_id,
        COUNT(*) AS total_orders,
        SUM(amount) AS lifetime_value,
        AVG(amount) AS avg_order_value,
        MIN(created_at) AS first_order_date,
        MAX(created_at) AS last_order_date,
        DATEDIFF('day', MIN(created_at), MAX(created_at)) AS customer_tenure_days
    FROM orders
    GROUP BY customer_id
)

SELECT * FROM customer_orders
-- marts/fct_customer_summary.sql
{{ config(materialized='table') }}

WITH customer_base AS (
    SELECT * FROM {{ ref('stg_customers') }}
),

orders AS (
    SELECT * FROM {{ ref('int_customer_orders') }}
),

subscription AS (
    SELECT * FROM {{ ref('stg_subscriptions') }}
    WHERE status = 'active'
)

SELECT
    c.customer_id,
    c.email,
    c.full_name,
    c.cohort_month,
    COALESCE(o.total_orders, 0) AS total_orders,
    COALESCE(o.lifetime_value, 0) AS lifetime_value,
    COALESCE(o.avg_order_value, 0) AS avg_order_value,
    COALESCE(s.monthly_recurring_revenue, 0) AS mrr,
    CASE 
        WHEN o.lifetime_value > 1000 THEN 'high_value'
        WHEN o.lifetime_value > 100 THEN 'medium_value'
        ELSE 'low_value'
    END AS customer_tier
FROM customer_base c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN subscription s ON c.customer_id = s.customer_id

Testing in dbt

# schema.yml
version: 2

models:
  - name: stg_customers
    description: Staged customer data
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null
      
      - name: email
        description: Customer email
        tests:
          - unique
          - not_null
          - dbt_utils.expression_is_true:
              expression: "LIKE '%@%.%'"

  - name: fct_customer_summary
    description: Customer summary metrics
    tests:
      - dbt_utils.recency:
          datepart: day
          field: last_order_date
          interval: 2
          config:
            where: "total_orders > 0"

Looker

LookML Development

# customer.model.lkml
connection: analytics_warehouse

explore: customer_orders {
  join: customer {
    type: left_outer
    sql_on: ${customer_orders.customer_id} = ${customer.customer_id}
    relationship: many_to_one
  }
  
  join: subscription {
    type: left_outer
    sql_on: ${customer_orders.customer_id} = ${subscription.customer_id}
    relationship: many_to_one
  }
  
  fields: [ALL_FIELDS*,
    - customer.email,
    - customer.phone]
}

# customer.view.lkml
view: customer {
  sql_table_name: {{ schema }}.dim_customers ;;
  
  dimension: customer_id {
    type: string
    primary_key: yes
    sql: ${TABLE}.customer_id ;;
  }
  
  dimension: full_name {
    type: string
    sql: ${TABLE}.full_name ;;
  }
  
  dimension: email {
    type: string
    sql: ${TABLE}.email ;;
    html: {% if user.can_see_pii %}
      {{ linked_value }}
    {% else %}
      ****
    {% endif %} ;;
  }
  
  dimension_group: created {
    type: time
    timeframes: [date, week, month, quarter, year]
    sql: ${TABLE}.created_at ;;
  }
  
  measure: count {
    type: count
    drill_fields: [detail*]
  }
  
  measure: avg_lifetime_value {
    type: average
    sql: ${lifetime_value} ;;
    value_format: "$#,##0.00"
  }
}

Looker Dashboard

# dashboard.lkml
dashboard: customer_analytics {
  title: "Customer Analytics"
  
  elements:
    - title: "Customer Count by Tier"
      type: pie
      model: customer
      explore: customer_orders
      dimensions: [customer.customer_tier]
      measures: [customer.count]
      sort: [customer.count: desc]
      colors: {high_value: "#2ECC71", medium_value: "#F1C40F", low_value: "#E74C3C"}
    
    - title: "Revenue Trend"
      type: line
      model: customer
      explore: customer_orders
      dimensions: [customer_orders.created_month]
      measures: [customer_orders.total_revenue]
      filters: [customer_orders.created_date: "24 months"]
    
    - title: "Top Customers"
      type: table
      model: customer
      explore: customer_orders
      dimensions: [customer.full_name, customer.email]
      measures: [customer.lifetime_value, customer.count]
      sorts: [customer.lifetime_value: desc]
      limit: 20

Tableau

Data Source Configuration

<!-- Tableau Data Source -->
<datasource hasconnection='false' inline='true' name='Analytics' caption='Customer Analytics'>
  <connection class='postgres' dbname='analytics' schema='marts' host='warehouse.example.com'>
    <named-connections>
      <named-connection name='Analytics DB'>
        <connection class='postgres' dbname='analytics' schema='marts' port='5432' username='tableau_user' />
      </named-connections>
    </named-connections>
  </connection>
  
  <columns>
    <column datatype='string' name='customer_id' param-domain-type='string' role='measure' type='nominal' />
    <column datatype='string' name='full_name' role='dimension' type='nominal' />
    <column datatype='real' name='lifetime_value' role='measure' type='quantitative' />
    <column datatype='datetime' name='created_at' role='dimension' type='ordinal' />
  </columns>
  
  <aliases enabled='yes'>
    <alias key='customer_id' value='Customer ID' />
    <alias key='lifetime_value' value='Lifetime Value' />
  </aliases>
</datasource>

Tableau Dashboard Best Practices

-- Tableau Calculations

-- Customer Tier
IF [Lifetime Value] > 10000 THEN 'Platinum'
ELSEIF [Lifetime Value] > 5000 THEN 'Gold'
ELSEIF [Lifetime Value] > 1000 THEN 'Silver'
ELSE 'Bronze'
END

-- Month-over-Month Growth
(ZN(SUM([Revenue])) - LOOKUP(ZN(SUM([Revenue])), -1)) 
/ ABS(LOOKUP(ZN(SUM([Revenue])), -1))

-- Running Total
RUNNING_SUM(SUM([Revenue]))

-- Cohort Retention
COUNTD(IF [Acquisition Month] = [Current Month] THEN [Customer ID] END) 
/ COUNTD([Customer ID])

Comparison

Feature dbt Looker Tableau
Primary Use Transformation Discovery & Reporting Visualization
SQL Control Full Limited None
Learning Curve Medium Low Medium
Cost $$ $$$ $$$$
Embedded Analytics No Yes (Plus) Yes

External Resources


Comments