InfluxDB Basics: Getting Started with Time-Series Data

Introduction

Time-series data underpins modern monitoring, IoT, and analytics applications. Whether you’re tracking server metrics, sensor readings, or financial prices, you need a database optimized for timestamped data. InfluxDB, built by InfluxData, is a widely used open-source time-series database designed for high write and query throughput, with query languages built around time.

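InfluxDB differs from relational databases by treating time as a first-class concept. It provides specialized functions for time-series analysis, automatic downsampling, and retention policies. In this article, we explore InfluxDB fundamentals: the data model, line protocol, InfluxQL, and practical examples. The installation and CLI examples target InfluxDB 2.x, while the retention policy and continuous query statements are InfluxDB 1.x InfluxQL.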

Understanding the Data Model

InfluxDB organizes data hierarchically: measurements, tags, fields, and timestamps.

Measurements

A measurement is like a SQL table containing time-series data:

measurement: cpu

This is similar to a table name in relational databases.

Tags and Fields

InfluxDB distinguishes between indexed and non-indexed data:

-- Tags: indexed, used for filtering (low cardinality)
-- Fields: not indexed, stored as values (high cardinality)

cpu,host=server01,region=us-west value=0.64 1700000000000000000

cpu                             measurement
host=server01,region=us-west    tag set (indexed, string values)
value=0.64                      field set (not indexed)
1700000000000000000             timestamp (nanoseconds since the Unix epoch)

Example:

cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000
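To make the four parts of a point concrete, here is a minimal Python sketch that splits a line like the one above into measurement, tags, fields, and timestamp. The `ParsedPoint` class and `parse_point` function are hypothetical names for illustration, not part of any InfluxDB library, and the sketch assumes no escaped characters and float-only field values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ParsedPoint:
    measurement: str
    tags: Dict[str, str]
    fields: Dict[str, float]
    timestamp: Optional[int]


def parse_point(line: str) -> ParsedPoint:
    """Split a simple line-protocol line into its four parts.

    Assumes no escaped spaces or commas and float-only field values.
    """
    parts = line.split(" ")
    if len(parts) not in (2, 3):
        raise ValueError("expected '<series> <fields> [timestamp]'")
    series, field_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) == 3 else None

    # The first comma-separated item is the measurement; the rest are tags.
    measurement, *tag_pairs = series.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)

    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)
    return ParsedPoint(measurement, tags, fields, timestamp)


point = parse_point(
    "cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000"
)
print(point.measurement, point.tags, point.fields, point.timestamp)
```

The measurement together with the tag set identifies a series; the fields and timestamp are the values recorded in that series.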

Line Protocol

Line protocol is the format for writing data to InfluxDB:

measurement,tag1=value1,tag2=value2 field1=value1,field2=value2 timestamp
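As a rough illustration of this format, the following Python sketch assembles a line from its parts. The helper names (`to_line_protocol`, `format_field_value`) are made up for this example; real client libraries handle this for you. It applies the common escaping rules (commas, spaces, and equals signs in keys and tag values) and the `i` suffix for integers, but it is a sketch rather than a complete implementation.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each of the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value) -> str:
    """Render a Python value as a line-protocol field value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable series key
        head += f",{_escape(key, ',= ')}={_escape(tags[key], ',= ')}"
    field_part = ",".join(
        f"{_escape(key, ',= ')}={format_field_value(value)}"
        for key, value in fields.items()
    )
    line = f"{head} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol(
    "cpu",
    {"region": "us-west", "host": "server01"},
    {"temperature": 72.5, "usage": 0.64},
    1700000000000000000,
))
```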

Writing data using line protocol:

# Using the influx CLI (InfluxDB 2.x); a target bucket is required
influx write --bucket my-bucket "cpu,host=server01,region=us-west temperature=72.5,usage=0.64"

# Multiple points: one point per line within a single argument
influx write --bucket my-bucket "cpu,host=server01,region=us-west temperature=72.5,usage=0.64 1700000000000000000
cpu,host=server02,region=us-east temperature=68.0,usage=0.58 1700000000000000000"

Timestamp Precision

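Timestamps default to nanosecond precision. For any other unit, pass --precision so InfluxDB interprets the value correctly; otherwise a seconds timestamp is read as nanoseconds and lands in 1970:

# Nanoseconds (default)
influx write --bucket my-bucket "cpu,host=server01 value=0.5 1700000000000000000"

# Seconds
influx write --bucket my-bucket --precision s "cpu,host=server01 value=0.5 1700000000"

# Milliseconds
influx write --bucket my-bucket --precision ms "cpu,host=server01 value=0.5 1700000000000"

# Microseconds
influx write --bucket my-bucket --precision us "cpu,host=server01 value=0.5 1700000000000000"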
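The relationship between the four precisions is plain arithmetic. The Python sketch below (with hypothetical helper names `to_nanoseconds` and `ns_to_datetime`) converts a timestamp in any precision to nanoseconds and shows what happens when a seconds value is mistakenly read as nanoseconds.

```python
from datetime import datetime, timedelta, timezone

# Nanoseconds per unit for each precision name used by the CLI
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_nanoseconds(timestamp: int, precision: str) -> int:
    """Convert a timestamp in the given precision to nanoseconds."""
    return timestamp * NS_PER_UNIT[precision]


def ns_to_datetime(timestamp_ns: int) -> datetime:
    """Convert nanoseconds since the epoch to a UTC datetime (microsecond resolution)."""
    return EPOCH + timedelta(microseconds=timestamp_ns // 1_000)


# All four examples above describe the same instant.
assert to_nanoseconds(1700000000, "s") == 1700000000000000000
assert to_nanoseconds(1700000000000, "ms") == 1700000000000000000
assert to_nanoseconds(1700000000000000, "us") == 1700000000000000000

print(ns_to_datetime(to_nanoseconds(1700000000, "s")).isoformat())
# A seconds value read as nanoseconds is only 1.7 seconds after the epoch.
print(ns_to_datetime(1700000000).isoformat())
```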

Installing InfluxDB

Docker Installation (Recommended for Development)

docker run -p 8086:8086 \
  -v influxdb2-data:/var/lib/influxdb2 \
  -v influxdb2-config:/etc/influxdb2 \
  --name influxdb \
  influxdb:2.7

Access the UI at http://localhost:8086 and create your initial organization and bucket.

Linux Installation

# Add InfluxData repository (run these commands as root)
wget -qO- https://repos.influxdata.com/influxdata-archive_compat.key | gpg --dearmor > /etc/apt/trusted.gpg.d/influxdata.gpg
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list
apt update
apt install influxdb2

# Start service
systemctl start influxdb
systemctl enable influxdb

Configuration

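Key settings in the influxd configuration file (for example /etc/influxdb2/config.yaml; the exact path depends on how InfluxDB was installed):

# HTTP bind address
http-bind-address: ":8086"

# Metadata store (BoltDB file)
bolt-path: "/var/lib/influxdb2/influxd.bolt"

# Storage engine directory (holds the data and WAL files)
engine-path: "/var/lib/influxdb2/engine"

# How often the retention service checks for expired data
storage-retention-check-interval: "30m"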

InfluxQL: InfluxDB Query Language

InfluxQL is a SQL-like query language for InfluxDB.

Basic Queries

-- Select all data from a measurement
SELECT * FROM cpu

-- Select specific fields
SELECT host, temperature FROM cpu

-- With time range
SELECT * FROM cpu WHERE time > now() - 1h

-- Limit results
SELECT * FROM cpu LIMIT 10

Time Functions

-- Group by time (5-minute buckets)
SELECT mean(value) FROM cpu 
WHERE time > now() - 1h 
GROUP BY time(5m)

-- Fill missing values
SELECT mean(value) FROM cpu 
WHERE time > now() - 1h 
GROUP BY time(5m) 
FILL(0)

-- Time alignment: shift bucket boundaries with an offset
SELECT mean(value) FROM cpu 
WHERE time > now() - 1h 
GROUP BY time(5m, 1m)  -- buckets start at :01, :06, :11, ...
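To see what GROUP BY time() and FILL() do, it can help to reproduce the bucketing by hand. The Python sketch below is a simplified model, not InfluxDB's implementation: `group_by_time_mean` is a hypothetical function that works on (timestamp in seconds, value) pairs, aligns buckets to the epoch plus an optional offset, and substitutes a fill value for empty buckets.

```python
from collections import defaultdict


def group_by_time_mean(points, interval_s, start_s, end_s, fill=None, offset_s=0):
    """Mean of values per time bucket over [start_s, end_s).

    points: iterable of (timestamp_s, value) pairs.
    Buckets are aligned to the epoch, shifted by offset_s.
    Empty buckets get the fill value (None models the default null fill).
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket start -> [sum, count]
    for ts, value in points:
        if start_s <= ts < end_s:
            bucket = (ts - offset_s) // interval_s * interval_s + offset_s
            totals[bucket][0] += value
            totals[bucket][1] += 1

    first_bucket = (start_s - offset_s) // interval_s * interval_s + offset_s
    result = []
    for bucket in range(first_bucket, end_s, interval_s):
        if bucket in totals:
            total, count = totals[bucket]
            result.append((bucket, total / count))
        else:
            result.append((bucket, fill))
    return result


samples = [(0, 1.0), (100, 3.0), (650, 10.0)]
# 5-minute buckets over 15 minutes, empty buckets filled with 0
print(group_by_time_mean(samples, interval_s=300, start_s=0, end_s=900, fill=0))
```

With these samples, the first bucket averages two values, the second is empty and takes the fill value, and the third holds a single value.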

Aggregation Functions

-- Count
SELECT count(value) FROM cpu

-- Mean, median
SELECT mean(value), median(value) FROM cpu

-- Min, max, percentile
SELECT min(value), max(value), percentile(value, 95) FROM cpu

-- Difference, derivative
SELECT difference(value) FROM cpu
SELECT derivative(value, 1s) FROM cpu

-- Cumulative sum
SELECT cumulative_sum(value) FROM cpu
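The transformation functions are easier to reason about with a small worked example. This Python sketch models `derivative` and `cumulative_sum` over a list of (timestamp in seconds, value) pairs. The function names mirror the InfluxQL ones for readability only; the code is an approximation of their behavior and assumes points are sorted with distinct timestamps.

```python
from itertools import accumulate


def derivative(points, unit_s=1.0):
    """Rate of change between consecutive points, per unit_s seconds."""
    return [
        (t1, (v1 - v0) / (t1 - t0) * unit_s)
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


def cumulative_sum(points):
    """Running total of the values, keeping each point's timestamp."""
    totals = accumulate(value for _, value in points)
    return [(ts, total) for (ts, _), total in zip(points, totals)]


samples = [(0, 10.0), (10, 30.0), (20, 25.0)]
print(derivative(samples, unit_s=1.0))  # change per second between neighbors
print(cumulative_sum(samples))
```

Note that the derivative has one fewer point than the input, because each result needs a preceding point to compare against.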

Filtering with Tags

-- Filter by tag
SELECT * FROM cpu WHERE host = 'server01'

-- Regex filter
SELECT * FROM cpu WHERE host =~ /server.*/

-- Multiple conditions
SELECT * FROM cpu WHERE host = 'server01' AND region = 'us-west'

Multiple Measurements

-- Query multiple measurements
SELECT * FROM /cpu|memory|disk/

-- InfluxQL has no JOIN. Query the measurements together and combine
-- the results in the client, or use Flux join() instead.
SELECT mean(value) FROM cpu, memory 
WHERE time > now() - 1h 
GROUP BY time(5m), host
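One way to correlate two measurements is to combine query results in client code. The Python sketch below assumes each result row has already been fetched as a dict with `host`, `time`, and `value` keys (a hypothetical shape chosen for this example) and pairs up rows that share the same host and timestamp.

```python
def join_on_host_and_time(cpu_rows, memory_rows):
    """Inner-join two result sets on (host, time)."""
    # Index one side by the join key for constant-time lookups.
    memory_index = {(row["host"], row["time"]): row["value"] for row in memory_rows}

    joined = []
    for row in cpu_rows:
        key = (row["host"], row["time"])
        if key in memory_index:
            joined.append({
                "host": row["host"],
                "time": row["time"],
                "cpu_value": row["value"],
                "memory_value": memory_index[key],
            })
    return joined


cpu_rows = [
    {"host": "server01", "time": 1700000000, "value": 0.64},
    {"host": "server02", "time": 1700000000, "value": 0.58},
]
memory_rows = [
    {"host": "server01", "time": 1700000000, "value": 0.71},
]
print(join_on_host_and_time(cpu_rows, memory_rows))
```

An exact-timestamp match only works when both measurements are sampled at the same instants; aggregating both into the same GROUP BY time() buckets first makes the timestamps line up.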

Data Modeling

Effective InfluxDB schemas follow specific patterns.

Tag vs Field Selection

Use tags for:

Dimensions used in WHERE clauses (host, region, service)
Low cardinality values
Frequently queried metadata

Use fields for:

Numeric values you aggregate
High cardinality data
Values you don’t filter on

-- Good: tags for filtering
cpu,host=server01,region=us-west,env=production temperature=72,usage=0.65

-- Avoid: high-cardinality tags (every distinct tag set creates a new series)
cpu,host=server01,ip=192.168.1.1,process=nginx,... temperature=72
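The cost of a tag comes from series cardinality: every distinct combination of measurement and tag values is a separate series. The Python sketch below counts series in a batch of points and estimates the worst case from per-tag value counts. The function names and the example counts are illustrative assumptions, not measurements.

```python
from math import prod


def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations.

    points: iterable of (measurement, tags_dict) pairs.
    """
    return len({(m, tuple(sorted(tags.items()))) for m, tags in points})


def worst_case_cardinality(tag_value_counts):
    """Upper bound: the product of distinct values per tag key."""
    return prod(tag_value_counts.values())


points = [
    ("cpu", {"host": "server01", "region": "us-west"}),
    ("cpu", {"host": "server01", "region": "us-west"}),  # same series as above
    ("cpu", {"host": "server02", "region": "us-east"}),
]
print(series_cardinality(points))

# Hypothetical fleet: 100 hosts, 4 regions, 3 environments
print(worst_case_cardinality({"host": 100, "region": 4, "env": 3}))
# Adding a tag with 10,000 distinct values multiplies the bound accordingly
print(worst_case_cardinality({"host": 100, "region": 4, "env": 3, "request_id": 10_000}))
```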

Designing Measurements

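-- Measurements are created implicitly on first write; there is no CREATE MEASUREMENT statement.
-- Separate by metric type
cpu,host=server01 usage=0.64
memory,host=server01 used=8589934592i
disk,host=server01 free=107374182400i

-- vs. one combined measurement with several fields
-- system,host=server01 cpu_temp=72,memory_used=8589934592i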

Retention Policies

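-- Retention policies are an InfluxDB 1.x concept; in 2.x, retention is set per bucket.
-- Create retention policy
CREATE RETENTION POLICY "one_day" ON "mydb" 
  DURATION 1d 
  REPLICATION 1

-- Write with retention policy (from the 1.x influx shell)
INSERT INTO one_day cpu,host=server01 value=0.5

-- Query with retention policy
SELECT * FROM "one_day"."cpu"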
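Conceptually, a retention policy is a cutoff: anything older than now minus the duration is eligible for removal. The Python sketch below models that rule on individual points. It is a deliberate simplification with a hypothetical function name; InfluxDB expires data in larger storage units (shard groups) rather than point by point.

```python
def apply_retention(points, now_s, duration_s):
    """Keep only points whose timestamp is within the retention duration.

    points: iterable of (timestamp_s, value) pairs.
    """
    cutoff = now_s - duration_s
    return [(ts, value) for ts, value in points if ts >= cutoff]


ONE_DAY_S = 24 * 60 * 60
now_s = 1700000000 + ONE_DAY_S  # pretend "now" is one day after the first point
points = [
    (1699999999, 0.1),  # just older than one day: expired
    (1700000000, 0.2),  # exactly at the cutoff: kept
    (1700050000, 0.3),  # recent: kept
]
print(apply_retention(points, now_s, ONE_DAY_S))
```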

Continuous Queries

In InfluxDB 1.x, continuous queries automatically downsample data on a schedule (InfluxDB 2.x uses tasks for this):

-- Create continuous query
CREATE CONTINUOUS QUERY "cpu_1h" ON "mydb" 
BEGIN 
  SELECT mean(value) as value 
  INTO "cpu_1h" 
  FROM "cpu" 
  GROUP BY time(1h), host 
END

-- View continuous queries
SHOW CONTINUOUS QUERIES
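The downsampling that such a query performs amounts to grouping raw points by hour and host and storing one mean per group. A minimal Python sketch of that computation follows; `downsample_hourly` is a hypothetical helper operating on in-memory tuples, not a model of how InfluxDB schedules or stores the results.

```python
from collections import defaultdict
from statistics import fmean


def downsample_hourly(points):
    """Mean value per (hour, host).

    points: iterable of (timestamp_s, host, value) tuples.
    Returns a list of (hour_start_s, host, mean) sorted by hour, then host.
    """
    groups = defaultdict(list)
    for ts, host, value in points:
        hour_start = ts // 3600 * 3600  # truncate to the start of the hour
        groups[(hour_start, host)].append(value)
    return [
        (hour, host, fmean(values))
        for (hour, host), values in sorted(groups.items())
    ]


raw = [
    (0, "server01", 1.0),
    (1800, "server01", 3.0),
    (3600, "server01", 5.0),
    (10, "server02", 4.0),
]
print(downsample_hourly(raw))
```

Four raw points collapse into three hourly rows here; on real data sampled every few seconds, the reduction is far larger, which is what makes long retention of downsampled data affordable.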

Client Libraries

Python Client

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connect to InfluxDB
client = InfluxDBClient(
    url="http://localhost:8086",
    token="my-token",
    org="my-org"
)

# Write data
write_api = client.write_api(write_options=SYNCHRONOUS)

# Build a point: measurement "cpu", one tag, one field, explicit timestamp
point = (
    Point("cpu")
    .tag("host", "server01")
    .field("usage", 0.64)
    .time("2026-01-01T00:00:00Z")
)

write_api.write(bucket="my-bucket", org="my-org", record=point)

# Query data with Flux (query_data_frame requires pandas)
query_api = client.query_api()
query = 'from(bucket: "my-bucket") |> range(start: -1h) |> filter(fn: (r) => r._measurement == "cpu")'
result = query_api.query_data_frame(query)

# Release the connection when done
client.close()

InfluxDB CLI

# Interactive InfluxQL shell (influx CLI 2.x)
influx v1 shell

# Create bucket
influx bucket create --name metrics --org my-org

# Write line protocol
influx write --bucket metrics --precision s "cpu,host=server01 value=0.5"

# Query
influx query 'from(bucket: "metrics") |> range(start: -1h)'

Practical Examples

IoT Sensor Data

-- Write sensor data (timestamps are in seconds; write with --precision s)
temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.5 1700000000
temperature,sensor_id=sensor-02,location=warehouse-1 temperature=23.1 1700000000
temperature,sensor_id=sensor-01,location=warehouse-1 temperature=22.7 1700000100

-- Query average temperature per sensor
SELECT mean(temperature) FROM temperature 
WHERE time > now() - 24h 
GROUP BY sensor_id, location

Application Metrics

-- Request latency (timestamp is in seconds; write with --precision s)
http_requests,method=GET,status=200,endpoint=/api/users latency=45.2 1700000000

-- Query p95 latency
SELECT percentile(latency, 95) FROM http_requests 
WHERE time > now() - 1h 
GROUP BY endpoint, method
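For intuition about what a p95 query returns, the Python sketch below computes a percentile per endpoint using the nearest-rank method, which always returns one of the observed values. The function names are hypothetical, and nearest-rank is an assumption chosen for simplicity; InfluxDB's exact selection rule may differ at the boundaries.

```python
from collections import defaultdict


def nearest_rank_percentile(values, p: int):
    """Return the p-th percentile (integer p, 1..100) by the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer form of ceil(p * n / 100)
    return ordered[rank - 1]


def percentile_by_endpoint(requests, p: int):
    """requests: iterable of (endpoint, latency) pairs."""
    groups = defaultdict(list)
    for endpoint, latency in requests:
        groups[endpoint].append(latency)
    return {endpoint: nearest_rank_percentile(vals, p) for endpoint, vals in groups.items()}


# Illustrative data: 20 requests with latencies 1.0 through 20.0
requests = [("/api/users", float(i)) for i in range(1, 21)]
requests += [("/api/orders", 5.0), ("/api/orders", 1.0), ("/api/orders", 3.0)]
print(percentile_by_endpoint(requests, 95))
```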

Conclusion

InfluxDB provides a powerful foundation for time-series workloads. Its data model with measurements, tags, and fields optimizes for the write-heavy, query-intensive nature of time-series data. InfluxQL provides SQL-like querying with specialized time functions, while line protocol enables efficient data ingestion.

Key concepts to remember:

Use tags for low-cardinality, queryable metadata
Use fields for high-cardinality values
Leverage retention policies and continuous queries for data management
Use client libraries for programmatic access

In the next article, we’ll explore InfluxDB operations: installation in production, configuration, backup strategies, and monitoring.