Introduction
Time series data is everywhere: from stock prices to sensor readings to website traffic. Analyzing temporal patterns and forecasting future values are essential skills. This guide covers the fundamentals of time series analysis.
What Is a Time Series
Definition
A time series is a sequence of data points indexed by time.
Types
- Continuous: Measurements at every instant
- Discrete: Measurements at specific intervals
- Univariate: Single variable over time
- Multivariate: Multiple variables
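In pandas, the univariate/multivariate distinction maps naturally onto `Series` and `DataFrame`. A minimal sketch with hypothetical weather readings:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=7, freq='D')

# Univariate: a single variable over time -> pandas Series
temperature = pd.Series([21.0, 22.5, 20.1, 19.8, 23.2, 24.0, 22.7], index=dates)

# Multivariate: multiple variables over the same index -> pandas DataFrame
weather = pd.DataFrame({
    'temperature': temperature,
    'humidity': [60, 58, 65, 70, 55, 52, 57],
}, index=dates)

print(weather.shape)
```

Both are discrete series: measurements at daily intervals, indexed by a `DatetimeIndex`.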
Components
Trend
Long-term movement in the data:
import pandas as pd
import numpy as np
# Generate data with trend
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=100, freq='D')
values = np.linspace(10, 50, 100) + np.random.randn(100) * 5
series = pd.Series(values, index=dates)
Seasonality
Repeating patterns at regular intervals:
# Seasonal pattern: two full sine cycles over the 100 points
seasonal = np.sin(np.linspace(0, 4*np.pi, 100)) * 10
Noise
Random fluctuations:
noise = np.random.randn(100) * 2
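Under an additive model, the observed series is simply the sum of these three components. A self-contained sketch that rebuilds the arrays from above and combines them:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=100, freq='D')

# Additive model: observed = trend + seasonality + noise
trend = np.linspace(10, 50, 100)
seasonal = np.sin(np.linspace(0, 4 * np.pi, 100)) * 10
noise = np.random.randn(100) * 2
observed = pd.Series(trend + seasonal + noise, index=dates)

print(observed.head())
```

A multiplicative model would use the product instead (`trend * seasonal * noise`), which fits series whose seasonal swings grow with the level.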
Decomposition
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose the time series; period must match the length of the
# seasonal cycle (e.g. 12 for monthly data with yearly seasonality)
decomposition = seasonal_decompose(series, model='additive', period=12)
# Access components
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
Stationarity
What Is Stationarity
A stationary time series has constant statistical properties over time:
- Constant mean
- Constant variance
- Autocovariance that depends only on the lag, not on time
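A quick informal check is to compare rolling statistics: for a stationary series the rolling mean stays roughly flat, while for a non-stationary one it drifts. A sketch contrasting white noise (stationary) with a random walk (non-stationary):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
white_noise = pd.Series(np.random.randn(500))  # stationary
random_walk = white_noise.cumsum()             # non-stationary

# The rolling mean of white noise hovers near zero;
# the rolling mean of a random walk wanders with the series
wn_rolling = white_noise.rolling(window=50).mean()
rw_rolling = random_walk.rolling(window=50).mean()

print(wn_rolling.dropna().std(), rw_rolling.dropna().std())
```

This is only a visual/heuristic diagnostic; the formal tests below give a yes/no answer.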
Testing Stationarity
ADF Test:
from statsmodels.tsa.stattools import adfuller
def adf_test(series):
    result = adfuller(series.dropna())
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print('Stationary' if result[1] < 0.05 else 'Non-stationary')
KPSS Test (note: its null hypothesis is stationarity, the reverse of ADF):
from statsmodels.tsa.stattools import kpss
result = kpss(series, regression='c', nlags='auto')
print('Stationary' if result[1] > 0.05 else 'Non-stationary')
Making Series Stationary
Differencing:
# First difference
diff1 = series.diff()
# Second difference
diff2 = series.diff().diff()
# Seasonal differencing (lag 12 for monthly data with yearly seasonality)
seasonal_diff = series.diff(12)
Log Transform (stabilizes growing variance; requires strictly positive values):
log_series = np.log(series)
Forecasting Methods
Simple Methods
Moving Average:
# Simple moving average
rolling_mean = series.rolling(window=12).mean()
# Exponential moving average
ewm = series.ewm(span=12).mean()
Exponential Smoothing:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
model = ExponentialSmoothing(
    series,
    trend='add',
    seasonal='add',
    seasonal_periods=12
).fit()
forecast = model.forecast(24)
ARIMA Models
Components
- AR (p): Autoregressive - regression on the series' own past values
- I (d): Integrated - number of differencing steps needed for stationarity
- MA (q): Moving average - regression on past forecast errors
Finding Parameters
from pmdarima import auto_arima
# Automatically find best parameters
model = auto_arima(
    series,
    start_p=0, max_p=3,
    start_q=0, max_q=3,
    d=None,  # Let it determine d
    seasonal=True,
    m=12,
    stepwise=True,
    trace=True
)
print(model.summary())
Manual ARIMA
from statsmodels.tsa.arima.model import ARIMA
# Fit model
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()
# Forecast
forecast = fitted.forecast(steps=24)
Prophet (Facebook)
Quick Start
from prophet import Prophet
# Prepare data (requires 'ds' and 'y' columns)
df = pd.DataFrame({
    'ds': dates,
    'y': values
})
# Create and fit model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False
)
model.fit(df)
# Make predictions
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
# Plot
model.plot(forecast)
model.plot_components(forecast)
Machine Learning Approaches
Feature Engineering
def create_features(df):
    # Assumes a DatetimeIndex and a 'value' column
    df = df.copy()
    df['hour'] = df.index.hour
    df['dayofweek'] = df.index.dayofweek
    df['month'] = df.index.month
    df['quarter'] = df.index.quarter
    df['lag_1'] = df['value'].shift(1)
    df['lag_7'] = df['value'].shift(7)
    df['rolling_mean_7'] = df['value'].rolling(7).mean()
    return df
Using XGBoost
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
# Create features first, then drop the NaN rows the lags and rolling window introduce
df = create_features(df).dropna()
# Split
train = df.iloc[:-30]
test = df.iloc[-30:]
# Train
X_train = train.drop('value', axis=1)
y_train = train['value']
model = XGBRegressor(n_estimators=100)
model.fit(X_train, y_train)
# Predict
predictions = model.predict(test.drop('value', axis=1))
Evaluation Metrics
Common Metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error
mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))
# MAPE is undefined when actual contains zeros
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
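A toy example with hand-checkable numbers makes the metrics concrete (hypothetical values, numpy only):

```python
import numpy as np

actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 190.0, 310.0])

# Every error is 10, so MAE and RMSE both come out to 10
mae = np.mean(np.abs(actual - predicted))
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# MAPE weights errors relative to the actual level:
# (10/100 + 10/200 + 10/300) / 3 * 100 ~= 6.11%
mape = np.mean(np.abs((actual - predicted) / actual)) * 100

print(mae, rmse, mape)
```

Note that RMSE exceeds MAE whenever errors vary in size (it penalizes large errors more); here they match only because all three errors are equal.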
Cross-Validation
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Train and evaluate on this fold
Practical Applications
Demand Forecasting
# E-commerce demand prediction
from prophet import Prophet
sales = pd.read_csv('sales.csv', parse_dates=['date'])
sales_prophet = sales.rename(columns={'date': 'ds', 'sales': 'y'})
model = Prophet()
model.fit(sales_prophet)
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)
Anomaly Detection
# Detect anomalies using z-scores
def detect_anomalies(series, threshold=3):
    mean = series.mean()
    std = series.std()
    anomalies = series[(series - mean).abs() > threshold * std]
    return anomalies
anomalies = detect_anomalies(series)
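A global z-score breaks down on trending data, because the trend inflates the overall mean and standard deviation. A common variant scores each point against a trailing window instead (a sketch; the function name and parameters are illustrative):

```python
import numpy as np
import pandas as pd

def detect_anomalies_rolling(series, window=30, threshold=3):
    # z-score relative to a trailing window, so a long-term
    # trend does not inflate the global mean and std
    rolling_mean = series.rolling(window).mean()
    rolling_std = series.rolling(window).std()
    z = (series - rolling_mean) / rolling_std
    return series[z.abs() > threshold]

# Hypothetical demo: inject a spike into a trending series
np.random.seed(42)
s = pd.Series(np.linspace(0, 50, 200) + np.random.randn(200))
s.iloc[150] += 10
print(detect_anomalies_rolling(s).index.tolist())
```

The window length trades off sensitivity against false alarms: short windows react fast but are noisy; long windows are stabler but slower to adapt.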
Conclusion
Time series analysis combines statistics, domain knowledge, and machine learning. Start with visualization and decomposition, then apply appropriate models. Remember: forecasting accuracy degrades with longer horizons.