Introduction
Time-series data is the foundation of modern monitoring, IoT, and analytics applications. Whether you’re tracking server metrics, sensor readings, or financial prices, you need a database optimized for timestamped data. InfluxDB, built by InfluxData, is a widely used open-source time-series database designed for high write and query throughput, with a query language built around time.
InfluxDB differs from relational databases by natively understanding time. It provides specialized functions for time-series analysis, automatic data downsampling, and retention policies. In this article, we explore InfluxDB fundamentals: data model, line protocol, InfluxQL, and practical examples.
Understanding the Data Model
InfluxDB organizes data hierarchically: measurements, tags, fields, and timestamps.
Measurements
A measurement is like a SQL table containing time-series data:
measurement: cpu
This is similar to a table name in relational databases.
Tags and Fields
InfluxDB distinguishes between indexed and non-indexed data:
-- Tags: indexed, always strings, used for filtering (low cardinality)
-- Fields: not indexed, hold the measured values as float, integer, string, or boolean (high cardinality)
cpu,host=server01,region=us-west value=0.64 1700000000000000000
- cpu: measurement
- host=server01,region=us-west: tags (indexed, string values)
- value=0.64: field key and field value (not indexed)
- 1700000000000000000: timestamp (nanoseconds)
Example:
cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000
Line Protocol
Line protocol is the format for writing data to InfluxDB:
measurement,tag1=value1,tag2=value2 field1=value1,field2=value2 timestamp
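The format above can also be assembled programmatically. The following is a minimal Python sketch, not part of any InfluxDB client library: the helper names `to_line`, `_escape`, and `_format_field` are hypothetical, and it assumes only the basic escaping rules (commas, spaces, and equals signs in keys and tag values; quotes and backslashes in string fields).

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a Python value as a line protocol field value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an `i` suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a double-quoted string field.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp=None):
    """Build one line protocol record (hypothetical helper for illustration)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # tags are emitted in sorted key order
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    if timestamp is not None:
        line += f" {timestamp}"
    return line


print(to_line(
    "cpu",
    {"host": "server01", "region": "us-west"},
    {"temperature": 72.5, "usage": 0.64},
    1700000000000000000,
))
```

In practice a client library handles this serialization; the sketch is only meant to make the three space-separated sections of a record concrete.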
Writing data using line protocol:
# Using the influx CLI (the bucket name "metrics" is an example)
influx write --bucket metrics "cpu,host=server01,region=us-west temperature=72.5,usage=0.64"
# Multiple points: one per line inside a single quoted argument
influx write --bucket metrics "
cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000
cpu,host=server02,region=us-east temperature=68.0,usage=0.58 1700000000000000000
"
Timestamp Precision
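Timestamps are read as nanoseconds unless you pass --precision (the bucket name "metrics" is an example):
# Nanoseconds (default)
influx write --bucket metrics "cpu,host=server01 value=0.5 1700000000000000000"
# Seconds
influx write --bucket metrics --precision s "cpu,host=server01 value=0.5 1700000000"
# Milliseconds
influx write --bucket metrics --precision ms "cpu,host=server01 value=0.5 1700000000000"
# Microseconds
influx write --bucket metrics --precision us "cpu,host=server01 value=0.5 1700000000000000"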
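When timestamps come from application code rather than being typed by hand, the precision has to match the integer you send. The sketch below shows one way to convert a timezone-aware `datetime` to an epoch integer at each precision using integer arithmetic only; `to_epoch` is a hypothetical helper name, not an InfluxDB API.

```python
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_MICROS_PER_UNIT = {"s": 1_000_000, "ms": 1_000, "us": 1}


def to_epoch(dt, precision="ns"):
    """Convert a timezone-aware datetime to an integer epoch timestamp."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt - _EPOCH
    # Work in whole microseconds to avoid floating-point rounding.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    if precision == "ns":
        return micros * 1_000
    return micros // _MICROS_PER_UNIT[precision]


moment = datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc)
for precision in ("s", "ms", "us", "ns"):
    print(precision, to_epoch(moment, precision))
```

Python datetimes only carry microsecond resolution, so the nanosecond value here always ends in three zeros.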
Installing InfluxDB
Docker Installation (Recommended for Development)
docker run -p 8086:8086 \
-v influxdb2-data:/var/lib/influxdb2 \
-v influxdb2-config:/etc/influxdb2 \
--name influxdb \
influxdb:2.7
Access the UI at http://localhost:8086 and create your initial organization and bucket.
Linux Installation
# Add InfluxData repository
wget -qO- https://repos.influxdata.com/influxdata-archive_compat.key | gpg --dearmor > /etc/apt/trusted.gpg.d/influxdata.gpg
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list
apt update
apt install influxdb2
# Start service
systemctl start influxdb
systemctl enable influxdb
Configuration
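Key settings in /etc/influxdb2/config.yaml (InfluxDB 2.x option names):
# HTTP bind address
http-bind-address: ":8086"
# Metadata store (BoltDB file)
bolt-path: "/var/lib/influxdb2/influxd.bolt"
# Storage engine directory (holds both data and WAL)
engine-path: "/var/lib/influxdb2/engine"
# How often the retention enforcement check runs
storage-retention-check-interval: 30m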
InfluxQL: InfluxDB Query Language
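InfluxQL is a SQL-like query language for InfluxDB. In InfluxDB 2.x, Flux is the native query language, and querying a bucket with InfluxQL requires a database and retention policy (DBRP) mapping.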
Basic Queries
-- Select all data from a measurement
SELECT * FROM cpu
-- Select specific fields
SELECT host, temperature FROM cpu
-- With time range
SELECT * FROM cpu WHERE time > now() - 1h
-- Limit results
SELECT * FROM cpu LIMIT 10
Time Functions
-- Group by time (5-minute buckets)
SELECT mean(value) FROM cpu
WHERE time > now() - 1h
GROUP BY time(5m)
-- Fill missing values
SELECT mean(value) FROM cpu
WHERE time > now() - 1h
GROUP BY time(5m)
FILL(0)
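-- Time alignment: shift the bucket boundaries with an offset
SELECT mean(value) FROM cpu
WHERE time > now() - 1h
GROUP BY time(5m, 2m) -- each 5-minute boundary moves forward by 2 minutes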
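To make the bucketing behaviour concrete, here is a small Python sketch of what a `GROUP BY time(...)` query with `mean()` and `FILL()` computes. It is an illustration under simplifying assumptions (timestamps in whole seconds, windows aligned to multiples of the window size since the epoch, no offset), not InfluxDB's implementation; `mean_by_window` is a hypothetical name.

```python
from collections import defaultdict


def mean_by_window(points, window_s, start_s, end_s, fill=None):
    """Average (timestamp_s, value) pairs per fixed window in [start_s, end_s).

    Windows start at multiples of `window_s`; empty windows get `fill`.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        if start_s <= ts < end_s:
            buckets[ts - ts % window_s].append(value)

    result = []
    first_window = start_s - start_s % window_s
    for window_start in range(first_window, end_s, window_s):
        values = buckets.get(window_start)
        mean = sum(values) / len(values) if values else fill
        result.append((window_start, mean))
    return result


points = [(0, 1.0), (60, 3.0), (600, 5.0)]
print(mean_by_window(points, window_s=300, start_s=0, end_s=900, fill=0))
# → [(0, 2.0), (300, 0), (600, 5.0)]
```

Passing `fill=None` mirrors a query without a fill value, where empty windows come back as null; `fill=0` corresponds to `FILL(0)`.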
Aggregation Functions
-- Count
SELECT count(value) FROM cpu
-- Mean, median
SELECT mean(value), median(value) FROM cpu
-- Min, max, percentile
SELECT min(value), max(value), percentile(value, 95) FROM cpu
-- Difference, derivative
SELECT difference(value) FROM cpu
SELECT derivative(value, 1s) FROM cpu
-- Cumulative sum
SELECT cumulative_sum(value) FROM cpu
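The transformation functions operate on consecutive points rather than on a whole group. As a rough sketch of the idea (not InfluxDB's code, and with hypothetical function names), a per-unit derivative and a cumulative sum over `(timestamp_s, value)` pairs could look like this in Python:

```python
from itertools import accumulate


def derivative(points, unit_s=1.0):
    """Rate of change between consecutive points, expressed per `unit_s` seconds."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((t1, (v1 - v0) / elapsed * unit_s))
    return rates


def cumulative_sum(points):
    """Running total of values, keeping each point's timestamp."""
    totals = accumulate(value for _, value in points)
    return [(ts, total) for (ts, _), total in zip(points, totals)]


points = [(0, 10.0), (10, 30.0), (20, 25.0)]
print(derivative(points))       # → [(10, 2.0), (20, -0.5)]
print(cumulative_sum(points))   # → [(0, 10.0), (10, 40.0), (20, 65.0)]
```

Note that the derivative yields one fewer point than its input, since the first point has no predecessor.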
Filtering with Tags
-- Filter by tag
SELECT * FROM cpu WHERE host = 'server01'
-- Regex filter
SELECT * FROM cpu WHERE host =~ /server.*/
-- Multiple conditions
SELECT * FROM cpu WHERE host = 'server01' AND region = 'us-west'
Multiple Measurements
-- Query multiple measurements
SELECT * FROM /cpu|memory|disk/
-- InfluxQL has no JOIN clause; select from several measurements
-- and align the resulting series by time and tag in the client
SELECT mean(value) FROM cpu, memory
WHERE time > now() - 1h
GROUP BY time(5m), host
Data Modeling
Effective InfluxDB schemas follow specific patterns.
Tag vs Field Selection
Use tags for:
- Dimensions used in WHERE clauses (host, region, service)
- Low cardinality values
- Frequently queried metadata
Use fields for:
- Numeric values you aggregate
- High cardinality data
- Values you don’t filter on
-- Good: tags for filtering
cpu,host=server01,region=us-west,env=production temperature=72,usage=0.65
-- Avoid: many or high-cardinality tags (each distinct tag combination creates a new series)
cpu,host=server01,ip=192.168.1.1,process=nginx,... temperature=72
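The cost of a tag choice shows up as series cardinality. The Python sketch below counts distinct measurement and tag-set combinations for a batch of points, which is a simplified stand-in for how the index grows; `series_cardinality` and the `request_id` tag are hypothetical, chosen only to contrast a bounded tag with an unbounded one.

```python
def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations.

    Each point is a (measurement, tags_dict) pair; field values are ignored
    because they do not contribute to the count in this simplified model.
    """
    series = {
        (measurement, tuple(sorted(tags.items())))
        for measurement, tags in points
    }
    return len(series)


# Bounded tag values: three hosts, one region.
bounded = [
    ("cpu", {"host": f"server{i % 3:02d}", "region": "us-west"})
    for i in range(1000)
]

# Unbounded tag value: a unique ID per point (hypothetical tag).
unbounded = [
    ("cpu", {"host": "server01", "request_id": str(i)})
    for i in range(1000)
]

print(series_cardinality(bounded))    # → 3
print(series_cardinality(unbounded))  # → 1000
```

The same 1,000 points produce 3 series in one schema and 1,000 in the other, which is why unique identifiers generally belong in fields.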
Designing Measurements
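-- Measurements are created implicitly on first write; InfluxQL has no CREATE MEASUREMENT statement
-- Separate measurement per metric type
cpu,host=server01 temperature=72
memory,host=server01 used_bytes=8589934592i
disk,host=server01 used_percent=41.5
-- vs. one combined measurement with a field per metric
-- system,host=server01 cpu_temp=72,memory_used_bytes=8589934592i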
Retention Policies
-- Create retention policy (InfluxDB 1.x InfluxQL; in 2.x, retention is a bucket setting)
CREATE RETENTION POLICY "one_day" ON "mydb"
DURATION 1d
REPLICATION 1
-- Write with retention policy
INSERT INTO one_day cpu,host=server01 value=0.5
-- Query with retention policy
SELECT * FROM one_day.cpu
Continuous Queries
Continuous queries automatically downsample data in InfluxDB 1.x; InfluxDB 2.x replaces them with tasks:
-- Create continuous query
CREATE CONTINUOUS QUERY "cpu_1h" ON "mydb"
BEGIN
SELECT mean(value) as value
INTO "cpu_1h"
FROM "cpu"
GROUP BY time(1h), host
END
-- View continuous queries
SHOW CONTINUOUS QUERIES
Client Libraries
Python Client
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS
# Connect to InfluxDB
client = InfluxDBClient(
url="http://localhost:8086",
token="my-token",
org="my-org"
)
# Write data
write_api = client.write_api(write_options=SYNCHRONOUS)
point = Point("cpu") \
.tag("host", "server01") \
.field("usage", 0.64) \
.time("2026-01-01T00:00:00Z")
write_api.write(bucket="my-bucket", org="my-org", record=point)
# Query data
query_api = client.query_api()
query = 'from(bucket: "my-bucket") |> range(start: -1h) |> filter(fn: r => r._measurement == "cpu")'
result = query_api.query_data_frame(query)
InfluxDB CLI
# Interactive InfluxQL shell
influx v1 shell
# Create bucket
influx bucket create --name metrics --org my-org
# Write line protocol
influx write --bucket metrics --precision s "cpu,host=server01 value=0.5"
# Query
influx query 'from(bucket: "metrics") |> range(start: -1h)'
Practical Examples
IoT Sensor Data
-- Write sensor data (timestamps are in seconds, so write with second precision)
temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.5 1700000000
temperature,sensor_id=sensor-02,location=warehouse-1 temperature=23.1 1700000000
temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.7 1700000100
-- Query average temperature per sensor
SELECT mean(temperature) FROM temperature
WHERE time > now() - 24h
GROUP BY sensor_id, location
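To check what the grouped query should return for the three sample points, the same aggregation can be reproduced in a few lines of Python. This is a sketch with a deliberately naive parser: `parse_line` and `mean_by_tags` are hypothetical helpers, and the parser assumes no escaped characters, no quoted strings, and float-valued fields only.

```python
from collections import defaultdict

LINES = [
    "temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.5 1700000000",
    "temperature,sensor_id=sensor-02,location=warehouse-1 temperature=23.1 1700000000",
    "temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.7 1700000100",
]


def parse_line(line):
    """Parse a simple record: no escapes, no quoted strings, float fields only."""
    head, field_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        fields[key] = float(raw)
    return measurement, tags, fields, int(timestamp)


def mean_by_tags(lines, field, tag_keys):
    """Average one field per combination of the given tag keys."""
    groups = defaultdict(list)
    for line in lines:
        _, tags, fields, _ = parse_line(line)
        groups[tuple(tags[key] for key in tag_keys)].append(fields[field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


for group, mean in mean_by_tags(LINES, "temperature", ("sensor_id", "location")).items():
    print(group, round(mean, 2))
```

Each distinct `(sensor_id, location)` pair becomes its own group, which matches the one-series-per-tag-combination shape of the query result.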
Application Metrics
-- Request latency (timestamp is in seconds, so write with second precision)
http_requests,method=GET,status=200,endpoint=/api/users latency=45.2 1700000000
-- Query p95 latency
SELECT percentile(latency, 95) FROM http_requests
WHERE time > now() - 1h
GROUP BY endpoint, method
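A percentile query of this kind returns an actual observed value rather than an interpolated one. The Python sketch below uses the textbook nearest-rank definition to illustrate that; InfluxDB's exact rank rounding may differ slightly, and `percentile_nearest_rank` is a hypothetical name.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the smallest observed value with at least p% of values at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies = [45.2, 12.0, 300.5, 20.1]
print(percentile_nearest_rank(latencies, 95))   # → 300.5
print(percentile_nearest_rank(latencies, 50))   # → 20.1
```

With only a handful of samples the 95th percentile is simply the maximum, which is worth remembering when reading p95 panels for low-traffic endpoints.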
Conclusion
InfluxDB provides a powerful foundation for time-series workloads. Its data model with measurements, tags, and fields optimizes for the write-heavy, query-intensive nature of time-series data. InfluxQL provides SQL-like querying with specialized time functions, while line protocol enables efficient data ingestion.
Key concepts to remember:
- Use tags for low-cardinality, queryable metadata
- Use fields for high-cardinality values
- Leverage retention policies and continuous queries for data management
- Use client libraries for programmatic access
In the next article, we’ll explore InfluxDB operations: installation in production, configuration, backup strategies, and monitoring.