InfluxDB Basics: Getting Started with Time-Series Data

Introduction

Time-series data is the foundation of modern monitoring, IoT, and analytics applications. Whether you’re tracking server metrics, sensor readings, or financial prices, you need a database optimized for timestamped data. InfluxDB, built by InfluxData, is a widely used open-source time-series database that handles high write and query throughput and provides query languages designed around time.

InfluxDB differs from relational databases by natively understanding time. It provides specialized functions for time-series analysis, automatic data downsampling, and retention policies. In this article, we explore InfluxDB fundamentals: data model, line protocol, InfluxQL, and practical examples.

Understanding the Data Model

InfluxDB organizes data hierarchically: measurements, tags, fields, and timestamps.

Measurements

A measurement is like a SQL table containing time-series data:

measurement: cpu

This is similar to a table name in relational databases.

Tags and Fields

InfluxDB distinguishes between indexed and non-indexed data:

-- Tags: indexed, used for filtering (low cardinality)
-- Fields: not indexed, stored as values (high cardinality)

cpu,host=server01,region=us-west value=0.64 1700000000000000000

measurement: cpu
tags:        host=server01, region=us-west   (indexed, strings)
field:       value=0.64                      (not indexed)
timestamp:   1700000000000000000             (nanoseconds since the Unix epoch)

Example:

cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000
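To make the four parts of a point concrete, here is a minimal Python sketch that splits the example line into measurement, tags, fields, and timestamp. The `parse_line` helper is hypothetical (it is not part of any InfluxDB library) and assumes the simplest case: numeric fields only, with no escaped characters or quoted strings.

```python
def parse_line(line):
    """Split a simple line-protocol point into its four parts.

    Simplifying assumptions: exactly three space-separated sections,
    numeric fields only, no escaped characters or quoted strings.
    """
    head, field_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        fields[key] = float(raw)
    return measurement, tags, fields, int(timestamp)


measurement, tags, fields, timestamp = parse_line(
    "cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000"
)
print(measurement)  # the measurement name
print(tags)         # indexed key/value strings
print(fields)       # the measured values
print(timestamp)    # nanoseconds since the Unix epoch
```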

Line Protocol

Line protocol is the format for writing data to InfluxDB:

measurement,tag1=value1,tag2=value2 field1=value1,field2=value2 timestamp
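The format above can be produced programmatically. The following is a sketch, not the official client's serializer: `to_line` and `format_field` are hypothetical helpers that apply the main escaping rules as I understand them (commas and spaces in measurement names; commas, equals signs, and spaces in tag keys, tag values, and field keys; an `i` suffix on integers; double quotes around strings).

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` wherever it occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp=None):
    """Build one line-protocol point; tags are written in sorted key order."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags):
        line += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    if timestamp is not None:
        line += f" {timestamp}"
    return line


print(to_line(
    "cpu",
    {"region": "us-west", "host": "server01"},
    {"temperature": 72.5, "usage": 0.64},
    1700000000000000000,
))
```

In practice a client library handles this serialization for you; the sketch is only meant to show what ends up on the wire.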

Writing data using line protocol:

# Using the influx CLI (InfluxDB 2.x); my-bucket is a placeholder bucket name
influx write --bucket my-bucket "cpu,host=server01,region=us-west temperature=72.5,usage=0.64"

# Multiple points: one point per line inside a single argument
influx write --bucket my-bucket "cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000
cpu,host=server02,region=us-east temperature=68.0,usage=0.58 1700000000000000000"

Timestamp Precision

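Timestamps default to nanosecond precision. For any other unit, pass --precision so the value is not misread as nanoseconds:

# Nanoseconds (default)
influx write --bucket my-bucket "cpu,host=server01 value=0.5 1700000000000000000"

# Seconds
influx write --bucket my-bucket --precision s "cpu,host=server01 value=0.5 1700000000"

# Milliseconds
influx write --bucket my-bucket --precision ms "cpu,host=server01 value=0.5 1700000000000"

# Microseconds
influx write --bucket my-bucket --precision us "cpu,host=server01 value=0.5 1700000000000000"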
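The four example timestamps above are the same instant expressed in different units. A small sketch of the conversion, using only the Python standard library; `to_epoch` is a hypothetical helper name, and it expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch(dt, precision="ns"):
    """Convert a timezone-aware datetime to an integer Unix timestamp.

    Integer arithmetic avoids the rounding errors that float seconds
    would introduce at nanosecond scale.
    """
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    if precision == "ns":
        return micros * 1000
    if precision == "us":
        return micros
    if precision == "ms":
        return micros // 1000
    if precision == "s":
        return micros // 1_000_000
    raise ValueError(f"unknown precision: {precision}")


moment = datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc)
for unit in ("s", "ms", "us", "ns"):
    print(unit, to_epoch(moment, unit))
```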

Installing InfluxDB

docker run -p 8086:8086 \
  -v influxdb2-data:/var/lib/influxdb2 \
  -v influxdb2-config:/etc/influxdb2 \
  --name influxdb \
  influxdb:2.7

Access the UI at http://localhost:8086 and create your initial organization and bucket.

Linux Installation

# Add InfluxData repository
wget -qO- https://repos.influxdata.com/influxdata-archive_compat.key | gpg --dearmor > /etc/apt/trusted.gpg.d/influxdata.gpg
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list
apt update
apt install influxdb2

# Start service
systemctl start influxdb
systemctl enable influxdb

Configuration

Key configuration in /etc/influxdb2/config.yaml:

# HTTP bind address
http-bind-address: ":8086"

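# BoltDB file for metadata (users, buckets, tokens)
bolt-path: "/var/lib/influxdb2/influxd.bolt"

# Storage engine directory (time-series data and WAL)
engine-path: "/var/lib/influxdb2/engine"

# How often the retention service checks for expired data
storage-retention-check-interval: "30m0s"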

InfluxQL: InfluxDB Query Language

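InfluxQL is a SQL-like query language for InfluxDB. It is native to InfluxDB 1.x; InfluxDB 2.x supports it through the 1.x compatibility API once a bucket is mapped to a database and retention policy (a DBRP mapping).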

Basic Queries

-- Select all data from a measurement
SELECT * FROM cpu

-- Select specific fields
SELECT host, temperature FROM cpu

-- With time range
SELECT * FROM cpu WHERE time > now() - 1h

-- Limit results
SELECT * FROM cpu LIMIT 10

Time Functions

-- Group by time (5-minute buckets)
SELECT mean(value) FROM cpu 
WHERE time > now() - 1h 
GROUP BY time(5m)

-- Fill missing values
SELECT mean(value) FROM cpu 
WHERE time > now() - 1h 
GROUP BY time(5m) 
FILL(0)

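-- Time alignment: shift bucket boundaries with an offset
SELECT mean(value) FROM cpu 
WHERE time > now() - 1h 
GROUP BY time(5m, 1m)  -- 5-minute buckets starting 1 minute past each boundary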
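To see what GROUP BY time() with FILL() computes, the following sketch reproduces the idea in plain Python: each point is assigned to the bucket whose start is its timestamp rounded down to the interval, the mean is taken per bucket, and empty buckets receive the fill value. The `group_by_time` function is hypothetical and works on epoch seconds for readability; it does not model the offset argument.

```python
from collections import defaultdict


def group_by_time(points, interval_s, start_s, end_s, fill=None):
    """Mean of values per fixed-width time bucket.

    points: iterable of (epoch_seconds, value) pairs.
    Buckets are aligned to multiples of interval_s, and buckets with
    no points get the `fill` value.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for ts, value in points:
        if start_s <= ts < end_s:
            bucket = ts - ts % interval_s
            sums[bucket][0] += value
            sums[bucket][1] += 1

    rows = []
    first_bucket = start_s - start_s % interval_s
    for bucket in range(first_bucket, end_s, interval_s):
        if bucket in sums:
            total, count = sums[bucket]
            rows.append((bucket, total / count))
        else:
            rows.append((bucket, fill))
    return rows


samples = [(0, 1.0), (60, 3.0), (300, 5.0), (900, 7.0)]
for row in group_by_time(samples, interval_s=300, start_s=0, end_s=1200, fill=0):
    print(row)
```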

Aggregation Functions

-- Count
SELECT count(value) FROM cpu

-- Mean, median
SELECT mean(value), median(value) FROM cpu

-- Min, max, percentile
SELECT min(value), max(value), percentile(value, 95) FROM cpu

-- Difference, derivative
SELECT difference(value) FROM cpu
SELECT derivative(value, 1s) FROM cpu

-- Cumulative sum
SELECT cumulative_sum(value) FROM cpu
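Of the functions above, derivative() is the least obvious: it reports the rate of change between consecutive points, normalized to the given unit. A minimal sketch of that calculation, assuming points sorted by time and timestamps in seconds (the `derivative` function here is a hypothetical stand-in, not InfluxDB code):

```python
def derivative(points, unit_s=1.0):
    """Rate of change between consecutive points, per `unit_s` seconds.

    points: list of (epoch_seconds, value) pairs sorted by time.
    Returns (timestamp_of_later_point, rate) pairs.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        elapsed = t1 - t0
        if elapsed == 0:
            continue  # skip duplicate timestamps to avoid dividing by zero
        rates.append((t1, (v1 - v0) * unit_s / elapsed))
    return rates


readings = [(0, 10.0), (10, 30.0), (20, 25.0)]
print(derivative(readings))             # change per second
print(derivative(readings, unit_s=60))  # change per minute
```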

Filtering with Tags

-- Filter by tag
SELECT * FROM cpu WHERE host = 'server01'

-- Regex filter
SELECT * FROM cpu WHERE host =~ /server.*/

-- Multiple conditions
SELECT * FROM cpu WHERE host = 'server01' AND region = 'us-west'

Multiple Measurements

-- Query multiple measurements
SELECT * FROM /cpu|memory|disk/

-- InfluxQL has no JOIN; list several measurements in one FROM clause instead
SELECT mean(value) FROM cpu, memory 
WHERE time > now() - 1h 
GROUP BY time(5m), host

Data Modeling

Effective InfluxDB schemas follow specific patterns.

Tag vs Field Selection

Use tags for:

  • Dimensions used in WHERE clauses (host, region, service)
  • Low cardinality values
  • Frequently queried metadata

Use fields for:

  • Numeric values you aggregate
  • High cardinality data
  • Values you don’t filter on

-- Good: tags for filtering
cpu,host=server01,region=us-west,env=production temperature=72,usage=0.65

-- Avoid: high-cardinality tags (every distinct tag combination creates a new series)
cpu,host=server01,ip=192.168.1.1,process=nginx,... temperature=72
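The reason tag choice matters is series cardinality: each distinct combination of measurement and tag set is a separate indexed series. The sketch below counts series for two hypothetical schemas; `series_cardinality` and the `request_id` tag are illustrative names, not taken from InfluxDB.

```python
def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations.

    points: iterable of (measurement, tags_dict) pairs.
    """
    return len({(m, tuple(sorted(tags.items()))) for m, tags in points})


hosts = ["server01", "server02"]
regions = ["us-west", "us-east"]

# Bounded tags: 400 points, but only 2 hosts x 2 regions = 4 series.
bounded = [
    ("cpu", {"host": h, "region": r})
    for h in hosts
    for r in regions
    for _ in range(100)
]

# An unbounded tag (a per-request ID): every point becomes its own series.
unbounded = [
    ("cpu", {"host": "server01", "request_id": str(i)})
    for i in range(400)
]

print(series_cardinality(bounded))
print(series_cardinality(unbounded))
```

Both schemas store the same number of points; only the second one makes the index grow with every write, which is why per-request or per-user identifiers usually belong in fields.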

Designing Measurements

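-- Separate measurement per metric type (measurements are created implicitly on first write)
cpu,host=server01 usage=0.65
memory,host=server01 used_bytes=8589934592i
disk,host=server01 used_percent=41.5

-- vs. combined in one measurement, one field per metric
-- cpu,host=server01 cpu_temp=72,memory_used=8589934592i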

Retention Policies

-- Create retention policy (InfluxDB 1.x; in 2.x, retention is a setting on the bucket)
CREATE RETENTION POLICY "one_day" ON "mydb" 
  DURATION 1d 
  REPLICATION 1

-- Write with retention policy
INSERT INTO one_day cpu,host=server01 value=0.5

-- Query with retention policy
SELECT * FROM one_day.cpu

Continuous Queries

Automatically downsample data (InfluxDB 1.x; InfluxDB 2.x uses tasks for this instead):

-- Create continuous query
CREATE CONTINUOUS QUERY "cpu_1h" ON "mydb" 
BEGIN 
  SELECT mean(value) as value 
  INTO "cpu_1h" 
  FROM "cpu" 
  GROUP BY time(1h), host 
END

-- View continuous queries
SHOW CONTINUOUS QUERIES

Client Libraries

Python Client

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connect to InfluxDB
client = InfluxDBClient(
    url="http://localhost:8086",
    token="my-token",
    org="my-org"
)

# Write data
write_api = client.write_api(write_options=SYNCHRONOUS)

point = Point("cpu") \
    .tag("host", "server01") \
    .field("usage", 0.64) \
    .time("2026-01-01T00:00:00Z")

write_api.write(bucket="my-bucket", org="my-org", record=point)

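# Query data with Flux (query_data_frame requires pandas to be installed)
query_api = client.query_api()
query = 'from(bucket: "my-bucket") |> range(start: -1h) |> filter(fn: (r) => r._measurement == "cpu")'
result = query_api.query_data_frame(query)

# Release the underlying HTTP connections
client.close()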

InfluxDB CLI

# Interactive InfluxQL shell (influx CLI 2.x)
influx v1 shell

# Create bucket
influx bucket create --name metrics --org my-org

# Write line protocol
influx write --bucket metrics --precision s "cpu,host=server01 value=0.5"

# Query
influx query 'from(bucket: "metrics") |> range(start: -1h)'

Practical Examples

IoT Sensor Data

-- Write sensor data (line protocol; timestamps in seconds, so write with precision s)
temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.5 1700000000
temperature,sensor_id=sensor-02,location=warehouse-1 temperature=23.1 1700000000
temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.7 1700000100

-- Query average temperature per sensor
SELECT mean(temperature) FROM temperature 
WHERE time > now() - 24h 
GROUP BY sensor_id, location

Application Metrics

-- Request latency (line protocol; timestamp in seconds, so write with precision s)
http_requests,method=GET,status=200,endpoint=/api/users latency=45.2 1700000000

-- Query p95 latency
SELECT percentile(latency, 95) FROM http_requests 
WHERE time > now() - 1h 
GROUP BY endpoint, method
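For intuition about what a p95 query returns, here is a sketch of a nearest-rank percentile: sort the values and pick the one at the rank closest to N × p / 100. My understanding is that InfluxQL's percentile() returns an actual stored value rather than interpolating between two, but the exact rounding rule used here is an assumption, and `percentile` below is a hypothetical stand-in.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: returns one of the input values, or None.

    Assumption: rank = floor(N * p / 100 + 0.5), 1-based.
    """
    if not values or not 0 < p <= 100:
        return None
    ordered = sorted(values)
    index = math.floor(len(ordered) * p / 100.0 + 0.5) - 1
    if index < 0 or index >= len(ordered):
        return None
    return ordered[index]


latencies = [45.2, 12.0, 80.1, 33.3]
print(percentile(latencies, 50))
print(percentile(list(range(1, 101)), 95))
```

Because the result is always a real observation, a p95 over a handful of points can jump noticeably as new points arrive; wider time windows give steadier values.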

Conclusion

InfluxDB provides a powerful foundation for time-series workloads. Its data model with measurements, tags, and fields optimizes for the write-heavy, query-intensive nature of time-series data. InfluxQL provides SQL-like querying with specialized time functions, while line protocol enables efficient data ingestion.

Key concepts to remember:

  • Use tags for low-cardinality, queryable metadata
  • Use fields for high-cardinality values
  • Leverage retention policies and continuous queries for data management
  • Use client libraries for programmatic access

In the next article, we’ll explore InfluxDB operations: installation in production, configuration, backup strategies, and monitoring.
