
Introduction to Time Series Analysis

Introduction

Time series data is everywhere: stock prices, sensor readings, website traffic. Analyzing temporal patterns and forecasting future values are essential skills across domains. This guide covers the fundamentals of time series analysis.

What Is a Time Series

Definition

A time series is a sequence of data points indexed in time order, typically recorded at successive, equally spaced intervals.

Types

  • Continuous: Measurements at every instant
  • Discrete: Measurements at specific intervals
  • Univariate: Single variable over time
  • Multivariate: Multiple variables
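To make the univariate/multivariate distinction concrete, here is a minimal sketch in pandas (the variable names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=5, freq='D')

# Univariate: a single variable over time
univariate = pd.Series([20.1, 21.3, 19.8, 22.0, 20.5],
                       index=idx, name='temperature')

# Multivariate: several variables sharing the same time index
multivariate = pd.DataFrame({
    'temperature': [20.1, 21.3, 19.8, 22.0, 20.5],
    'humidity': [55, 60, 58, 52, 57],
}, index=idx)

print(univariate.shape)    # one value per timestamp
print(multivariate.shape)  # one row per timestamp, one column per variable
```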

Components

Trend

Long-term movement in the data:

import pandas as pd
import numpy as np

# Generate data with trend
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=100, freq='D')
values = np.linspace(10, 50, 100) + np.random.randn(100) * 5

series = pd.Series(values, index=dates)

Seasonality

Repeating patterns at regular intervals:

# Cyclical pattern: two full sine cycles across the sample
seasonal = np.sin(np.linspace(0, 4*np.pi, 100)) * 10

Noise

Random fluctuations:

noise = np.random.randn(100) * 2

Decomposition

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the time series
# period = number of observations per seasonal cycle
decomposition = seasonal_decompose(series, model='additive', period=12)

# Access components
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

Stationarity

What Is Stationarity

A stationary time series has constant statistical properties over time:

  • Constant mean
  • Constant variance
  • Covariance doesn’t depend on time
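A classic way to see the contrast is white noise (stationary) versus a random walk (non-stationary). This sketch compares their variances; the exact numbers depend on the random seed:

```python
import numpy as np

rng = np.random.default_rng(0)

# White noise: constant mean and variance over time (stationary)
white_noise = rng.standard_normal(1000)

# Random walk: cumulative sum of noise; its variance grows with time
random_walk = np.cumsum(rng.standard_normal(1000))

print(white_noise.var())  # stays close to 1
print(random_walk.var())  # much larger, because the series drifts
```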

Testing Stationarity

ADF Test:

from statsmodels.tsa.stattools import adfuller

def adf_test(series):
    result = adfuller(series.dropna())
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print('Stationary' if result[1] < 0.05 else 'Non-stationary')

KPSS Test:

from statsmodels.tsa.stattools import kpss

# Note: KPSS reverses the null hypothesis relative to ADF.
# Here the null is stationarity, so a *high* p-value suggests stationary.
result = kpss(series, regression='c', nlags='auto')
print('Stationary' if result[1] > 0.05 else 'Non-stationary')

Making Series Stationary

Differencing:

# First difference
diff1 = series.diff()

# Second difference
diff2 = series.diff().diff()

# Seasonal differencing
seasonal_diff = series.diff(12)

Log Transform:

log_series = np.log(series)  # requires strictly positive values

Forecasting Methods

Simple Methods

Moving Average:

# Simple moving average
rolling_mean = series.rolling(window=12).mean()

# Exponential moving average
ewm = series.ewm(span=12).mean()

Exponential Smoothing:

from statsmodels.tsa.holtwinters import ExponentialSmoothing

model = ExponentialSmoothing(
    series,
    trend='add',
    seasonal='add',
    seasonal_periods=12
).fit()

forecast = model.forecast(24)

ARIMA Models

Components

  • AR (p): Autoregressive - past values
  • I (d): Integrated - differencing
  • MA (q): Moving average - past errors
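The AR component is easiest to see by hand. This sketch simulates an AR(1) process, y_t = phi * y_{t-1} + noise (phi = 0.8 is a made-up coefficient), estimates phi by least squares, and forms a one-step-ahead forecast from the last observation:

```python
import numpy as np

rng = np.random.default_rng(42)
phi = 0.8

# Simulate 200 steps of an AR(1) process
y = [0.0]
for _ in range(200):
    y.append(phi * y[-1] + rng.standard_normal())
ys = np.array(y)

# Least-squares estimate of phi: regress y_t on y_{t-1}
phi_hat = (ys[:-1] @ ys[1:]) / (ys[:-1] @ ys[:-1])

# One-step-ahead forecast uses only the last observed value
forecast = phi_hat * ys[-1]
print(phi_hat, forecast)
```

The estimate should land near the true 0.8, which is all the AR(p) term does at higher orders too: a linear regression on the last p values.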

Finding Parameters

from pmdarima import auto_arima

# Automatically find best parameters
model = auto_arima(
    series,
    start_p=0, max_p=3,
    start_q=0, max_q=3,
    d=None,  # Let it determine d
    seasonal=True,
    m=12,
    stepwise=True,
    trace=True
)

print(model.summary())

Manual ARIMA

from statsmodels.tsa.arima.model import ARIMA

# Fit model
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()

# Forecast
forecast = fitted.forecast(steps=24)

Prophet (Facebook)

Quick Start

from prophet import Prophet

# Prepare data (requires 'ds' and 'y' columns)
df = pd.DataFrame({
    'ds': dates,
    'y': values
})

# Create and fit model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False
)
model.fit(df)

# Make predictions
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# Plot
model.plot(forecast)
model.plot_components(forecast)

Machine Learning Approaches

Feature Engineering

def create_features(df):
    df = df.copy()
    df['hour'] = df.index.hour
    df['dayofweek'] = df.index.dayofweek
    df['month'] = df.index.month
    df['quarter'] = df.index.quarter
    df['lag_1'] = df['value'].shift(1)
    df['lag_7'] = df['value'].shift(7)
    df['rolling_mean_7'] = df['value'].rolling(7).mean()
    return df

Using XGBoost

from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Create features, then drop rows with NaNs introduced by lags/rolling windows
df = create_features(df).dropna()

# Split
train = df.iloc[:-30]
test = df.iloc[-30:]

# Train
X_train = train.drop('value', axis=1)
y_train = train['value']
model = XGBRegressor(n_estimators=100)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(test.drop('value', axis=1))

Evaluation Metrics

Common Metrics

from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))
mape = np.mean(np.abs((actual - predicted) / actual)) * 100  # undefined if actual contains zeros

Cross-Validation

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
# Each split trains on the past and tests on the subsequent window
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]  # assumes NumPy arrays
    y_train, y_test = y[train_index], y[test_index]
    # Train and evaluate
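Putting the pieces together, here is a self-contained sketch of the full loop on synthetic data, using a lag-1 feature and a linear model (both are placeholder choices for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic trend series with noise
rng = np.random.default_rng(0)
y_full = np.linspace(0, 10, 120) + rng.standard_normal(120) * 0.5

# Lag-1 feature: predict each value from the previous one
X = y_full[:-1].reshape(-1, 1)
y = y_full[1:]

tscv = TimeSeriesSplit(n_splits=5)
scores = []
for train_index, test_index in tscv.split(X):
    model = LinearRegression().fit(X[train_index], y[train_index])
    pred = model.predict(X[test_index])
    scores.append(mean_absolute_error(y[test_index], pred))

print(np.mean(scores))  # average MAE across the five expanding-window folds
```

Note that every test fold lies strictly after its training fold in time, which is the point of TimeSeriesSplit: an ordinary shuffled K-fold would leak future information into training.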

Practical Applications

Demand Forecasting

# E-commerce demand prediction
from prophet import Prophet

sales = pd.read_csv('sales.csv', parse_dates=['date'])
sales_prophet = sales.rename(columns={'date': 'ds', 'sales': 'y'})

model = Prophet()
model.fit(sales_prophet)
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)

Anomaly Detection

# Detect anomalies using z-scores
def detect_anomalies(series, threshold=3):
    mean = series.mean()
    std = series.std()
    anomalies = series[(series - mean).abs() > threshold * std]
    return anomalies

anomalies = detect_anomalies(series)

Conclusion

Time series analysis combines statistics, domain knowledge, and machine learning. Start with visualization and decomposition, then apply appropriate models. Remember: forecasting accuracy degrades with longer horizons.

