Introduction
High-frequency trading (HFT) represents the cutting edge of financial technology, where microseconds translate to millions of dollars. Building HFT infrastructure requires understanding of hardware acceleration, network optimization, and ultra-low latency software design.
This guide covers the complete HFT stack: from network architecture and colocation to FPGA implementation and latency measurement. Whether you’re building a trading system or optimizing existing infrastructure, these patterns will help you achieve the lowest possible latency.
Understanding HFT Latency
Latency Hierarchy
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ HFT LATENCY HIERARCHY โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ TICK-TO-TRADE LATENCY BREAKDOWN โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ โ
โ โ Market Data Arrival โ โ
โ โ โ โ โ
โ โ โ Network Latency (10ฮผs - 1ms) โ โ
โ โ โ โ โ โ
โ โ โ โ NIC Processing (1-10ฮผs) โ โ
โ โ โ โ โ โ โ
โ โ โ โ โ OS/Driver Latency (1-5ฮผs) โ โ
โ โ โ โ โ โ โ โ
โ โ โ โ โ โ Application Processing (1-100ฮผs) โ โ
โ โ โ โ โ โ โ โ โ
โ โ โ โ โ โ | Order Entry (1-50ฮผs) โ โ
โ โ โ โ โ โ | โ โ โ
โ โ โ โ โ | | | Exchange Processing (10-100ฮผs)โ
โ โ โ โ โ โ โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ TARGET LATENCIES BY SYSTEM TYPE: โ
โ โโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ System Type โ Target Latency โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค โ
โ โ FPGA Trading โ < 1 microsecond โ โ
โ โ Direct Memory โ 1-10 microseconds โ โ
โ โ Kernel Bypass โ 10-100 microseconds โ โ
โ โ Standard Linux โ 100 microseconds - 1 millisecond โ โ
โ โ Cloud-based โ 1-10 milliseconds โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Latency Measurement
// Ultra-low latency timestamp implementation
#include <stdint.h>
#include <time.h>
#include <sys/time.h>
// Hardware timestamp from CPU
static inline uint64_t get_cycles() {
unsigned int lo, hi;
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return ((uint64_t)hi << 32) | lo;
}
// Convert cycles to nanoseconds (requires calibration)
static inline uint64_t cycles_to_ns(uint64_t cycles, double ns_per_cycle) {
return (uint64_t)(cycles * ns_per_cycle);
}
// Timestamp with system time
static inline uint64_t get_timestamp_ns() {
struct timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
// Monotonic clock for intervals
static inline uint64_t get_monotonic_ns() {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
// Lock-free latency recorder
typedef struct {
uint64_t timestamp;
uint32_t sequence;
int32_t data;
} __attribute__((aligned(64))) LatencyRecord;
_Static_assert(sizeof(LatencyRecord) == 64, "Cache line size");
class LatencyRecorder {
private:
LatencyRecord* records;
uint32_t current_index;
uint64_t total_latency;
uint32_t sample_count;
public:
void record(uint64_t start_cycle, uint64_t end_cycle, double ns_per_cycle) {
uint64_t latency_ns = cycles_to_ns(end_cycle - start_cycle, ns_per_cycle);
// Lock-free recording using atomic
uint32_t index = __sync_fetch_and_add(¤t_index, 1) & 1023;
records[index].timestamp = get_timestamp_ns();
records[index].sequence = index;
records[index].data = (int32_t)latency_ns;
__sync_fetch_and_add(&total_latency, latency_ns);
__sync_fetch_and_add(&sample_count, 1);
}
double get_average_latency_ns() {
uint32_t count = __sync_fetch_and_add(&sample_count, 0);
if (count == 0) return 0;
return (double)__sync_fetch_and_add(&total_latency, 0) / count;
}
};
Network Architecture
Colocation and Network Design
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ HFT NETWORK ARCHITECTURE โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ EXCHANGE DATA CENTER โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ TRADING SERVER RACK โ โ โ
โ โ โ โ โ โ
โ โ โ โโโโโโโโโโโ โโโโโโโโโโโ โโโโโโโโโโโ โ โ โ
โ โ โ โ Trading โ โ Trading โ โ Market โ โ โ โ
โ โ โ โ Server โ โ Server โ โ Data โ โ โ โ
โ โ โ โ (FPGA)โ โ (FPGA)โ โ Handler โ โ โ โ
โ โ โ โโโโโโฌโโโโโ โโโโโโฌโโโโโ โโโโโโฌโโโโโ โ โ โ
โ โ โ โ โ โ โ โ โ
โ โ โ โโโโโโโโโโโโโโผโโโโโโโโโโโโโ โ โ โ
โ โ โ โ โ โ โ
โ โ โ โโโโโโโดโโโโโโ โ โ โ
โ โ โ โ Switch โ โ โ โ
โ โ โ โ (Arista/ โ โ โ โ
โ โ โ โ Cisco) โ โ โ โ
โ โ โ โโโโโโโฌโโโโโโ โ โ โ
โ โ โ โ โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ โ โ โ โ
โ โ โ โโโโโโโโโโดโโโโโโโโโ โ โ โ
โ โ โ โ Exchange โ โ โ โ
โ โ โ โ Gateway โ โ โ โ
โ โ โ โ (Co-located) โ โ โ โ
โ โ โ โโโโโโโโโโฌโโโโโโโโ โ โ โ
โ โ โ โ โ โ โ
โ โ โ โโโโโโโดโโโโโโ โ โ โ
โ โ โ โ Exchange โ โ โ โ
โ โ โ โ Match โ โ โ โ
โ โ โ โ Engine โ โ โ โ
โ โ โ โโโโโโโโโโโโโ โ โ โ
โ โ โ โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ NETWORK LATENCY TARGETS: โ
โ โข Server to Switch: < 100 nanoseconds โ
โ โข Switch to Exchange: < 1 microsecond โ
โ โข Total Network: < 2 microseconds โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Kernel Bypass with DPDK
// DPDK-based ultra-low latency packet processing
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_cycles.h>
#define RX_DESC_DEFAULT 512
#define TX_DESC_DEFAULT 512
#define MBUF_CACHE_SIZE 250
#define BURST_SIZE 32
struct rte_mempool *mbuf_pool;
int port_init(uint16_t port, struct rte_eth_conf *port_conf) {
struct rte_eth_dev_info dev_info;
rte_eth_dev_info_get(port, &dev_info);
// Configure RX/TX queues
rte_eth_rx_queue_setup(
port, 0, RX_DESC_DEFAULT,
rte_eth_dev_socket_id(port),
NULL, mbuf_pool
);
rte_eth_tx_queue_setup(
port, 0, TX_DESC_DEFAULT,
rte_eth_dev_socket_id(port),
NULL
);
// Start port
rte_eth_dev_start(port);
rte_eth_promiscuous_enable(port);
return 0;
}
// Main packet processing loop
void packet_processing_loop(uint16_t port) {
struct rte_mbuf *rx_bufs[BURST_SIZE];
struct rte_mbuf *tx_bufs[BURST_SIZE];
while (1) {
// Receive packets
uint16_t nb_rx = rte_eth_rx_burst(
port, 0, rx_bufs, BURST_SIZE
);
if (nb_rx == 0) continue;
uint16_t nb_tx = 0;
// Process each packet
for (uint16_t i = 0; i < nb_rx; i++) {
struct rte_mbuf *pkt = rx_bufs[i];
// Parse and process market data
if (process_market_data(pkt) > 0) {
// Decision made - prepare order
tx_bufs[nb_tx++] = prepare_order(pkt);
}
}
// Send orders
if (nb_tx > 0) {
rte_eth_tx_burst(port, 0, tx_bufs, nb_tx);
}
// Free processed packets
for (uint16_t i = 0; i < nb_rx; i++) {
rte_pktmbuf_free(rx_bufs[i]);
}
}
}
Network Time Synchronization
# PTP (Precision Time Protocol) Configuration
import socket
import struct
import time
class PTPClient:
"""Precision Time Protocol client for nanosecond synchronization"""
def __init__(self, interface='eth0'):
self.interface = interface
self.socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_TIMESTAMPNS, 1)
def get_hardware_timestamp(self):
"""Get hardware timestamp from NIC"""
# SIOCGHWTSTAMP ioctl
pass
def synchronize(self, master_ip='192.168.1.100'):
"""Synchronize with PTP master"""
# Send sync request
sync_msg = self._create_sync_message()
send_time = time.time_ns()
self.socket.sendto(sync_msg, (master_ip, 319))
# Receive time
response, recv_time = self.socket.recvfrom(1024)
recv_time_ns = time.time_ns()
# Calculate offset
offset = self._calculate_offset(
send_time, recv_time_ns, response
)
return offset
def _create_sync_message(self):
# PTP message structure
pass
def _calculate_offset(self, send_time, recv_time, response):
# Calculate clock offset
pass
# Linux PTP configuration
"""
# /etc/linuxptp/ptp4l.conf
[global]
default_set 0
delay Mechanism E2E
network_transport L2
delay_request_interval 0
sync_interval -3
priority1 128
priority2 128
domainNumber 0
# Run ptp4l
# ptp4l -i eth0 -m -f /etc/linuxptp/ptp4l.conf
# Check synchronization status
# pmc -u -b 0 get STATUS
"""
FPGA Acceleration
FPGA Trading System Architecture
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ FPGA TRADING SYSTEM โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ FPGA LOGIC BLOCKS โ โ
โ โ โ โ
โ โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ 10GbE MAC โ โ Market Dataโ โ Order Entry โ โ โ
โ โ โ (Xilinx IP) โโโบโ Parser โโโบโ Generator โ โ โ
โ โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โโโโโโโโโโฌโโโโโโโโโ โ โ
โ โ โ โ โ
โ โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ โ โ โ
โ โ โ Strategy โ โ Risk โ โ โ โ
โ โ โ Engine โโโโ Check โโโโโโโโโโโโโ โ โ
โ โ โ (Custom) โ โ Limits โ โ โ
โ โ โโโโโโโโฌโโโโโโโ โโโโโโโโโโโโโโโ โ โ
โ โ โ โ โ
โ โ โโโโโโโโดโโโโโโโ โ โ
โ โ โ Order Book โ โ โ
โ โ โ Management โ โ โ
โ โ โโโโโโโโโโโโโโโ โ โ
โ โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ HOST SOFTWARE โ โ
โ โ โข Strategy management โ โ
โ โ โข Order and position management โ โ
โ โ โข Risk reporting โ โ
โ โ โข Logging and analytics โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ LATENCY BREAKDOWN: โ
โ โข NIC to FPGA: ~100ns โ
โ โข FPGA parsing: ~50ns โ
โ โข Strategy logic: ~100ns - 1ฮผs โ
โ โข Order generation: ~50ns โ
โ โข FPGA to NIC: ~100ns โ
โ โข TOTAL: < 1 microsecond โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Verilog Market Data Parser
// Market Data Parser - NASDAQ OUCH Protocol
module market_data_parser (
input wire clk,
input wire rst,
// 10GbE input
input wire [63:0] rx_data,
input wire rx_valid,
input wire rx_sop,
input wire rx_eop,
// Parsed output
output reg [31:0] symbol,
output reg [31:0] price,
output reg [31:0] size,
output reg [2:0] msg_type,
output reg data_valid,
output reg parse_error
);
// Protocol constants
localparam MSG_ADD_ORDER = 3'b001;
localparam MSG_EXECUTE = 3'b010;
localparam MSG_CANCEL = 3'b011;
localparam MSG_DELETE = 3'b100;
// State machine
reg [2:0] state;
localparam IDLE = 3'd0;
localparam PARSE_HEADER = 3'd1;
localparam PARSE_BODY = 3'd2;
localparam VALIDATE = 3'd3;
reg [15:0] msg_len;
reg [15:0] byte_count;
always @(posedge clk) begin
if (rst) begin
state <= IDLE;
data_valid <= 0;
parse_error <= 0;
end else begin
case (state)
IDLE: begin
if (rx_valid && rx_sop) begin
state <= PARSE_HEADER;
byte_count <= 0;
end
data_valid <= 0;
end
PARSE_HEADER: begin
if (rx_valid) begin
// First word: Message type + Length
msg_type <= rx_data[2:0];
msg_len <= rx_data[31:16];
byte_count <= byte_count + 8;
if (msg_len > 0) begin
state <= PARSE_BODY;
end
end
end
PARSE_BODY: begin
if (rx_valid) begin
byte_count <= byte_count + 8;
// Parse symbol (offset 8-11)
if (byte_count >= 8 && byte_count < 12) begin
symbol <= rx_data[31:0];
end
// Parse price (offset 16-19)
if (byte_count >= 16 && byte_count < 20) begin
price <= rx_data[31:0];
end
// Parse size (offset 20-23)
if (byte_count >= 20 && byte_count < 24) begin
size <= rx_data[31:0];
end
if (byte_count >= msg_len) begin
state <= VALIDATE;
end
end
end
VALIDATE: begin
// Basic validation
parse_error <= (price == 0) || (size == 0);
data_valid <= (price != 0) && (size != 0);
state <= IDLE;
end
endcase
end
end
endmodule
Order Management System
Low-Latency Order Router
// Java low-latency order management
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;
public class OrderRouter {
private final ByteBuffer sendBuffer;
private final long[] timestamps;
private final AtomicLong orderId;
// Pre-allocated order templates
private static final byte[] NEW_ORDER_TEMPLATE;
private static final byte[] CANCEL_TEMPLATE;
private static final byte[] MODIFY_TEMPLATE;
static {
NEW_ORDER_TEMPLATE = new byte[64];
CANCEL_TEMPLATE = new byte[32];
MODIFY_TEMPLATE = new byte[48];
}
public OrderRouter(int capacity) {
this.sendBuffer = ByteBuffer.allocateDirect(capacity);
this.timestamps = new long[10000];
this.orderId = new AtomicLong(0);
}
public long sendNewOrder(
String symbol,
Side side,
long price,
long size,
TimeInForce tif
) {
long startCycle = get_cycles();
// Generate order ID
long orderId = this.orderId.incrementAndGet();
// Build FIX/OUCH message
sendBuffer.clear();
// Message header
sendBuffer.put((byte) 'O'); // New Order
sendBuffer.putLong(orderId);
sendBuffer.putLong(System.nanoTime()); // Timestamp
// Symbol (6 bytes, space-padded)
byte[] symbolBytes = symbol.getBytes();
sendBuffer.put(symbolBytes, 0, Math.min(6, symbolBytes.length));
// Side
sendBuffer.put(side == Side.BUY ? (byte) 'B' : (byte) 'S');
// Price (8 bytes, long)
sendBuffer.putLong(price);
// Size (4 bytes)
sendBuffer.putInt((int) size);
// Time in force
sendBuffer.put((byte) tif.getCode());
// Send to exchange
int bytesSent = sendToExchange(sendBuffer);
// Record latency
long endCycle = get_cycles();
recordLatency(startCycle, endCycle, orderId);
return orderId;
}
public long sendCancel(long originalOrderId) {
sendBuffer.clear();
sendBuffer.put((byte) 'X'); // Cancel
sendBuffer.putLong(originalOrderId);
return sendToExchange(sendBuffer);
}
public long sendModify(long originalOrderId, long newPrice, long newSize) {
sendBuffer.clear();
sendBuffer.put((byte) 'U'); // Modify
sendBuffer.putLong(originalOrderId);
sendBuffer.putLong(newPrice);
sendBuffer.putInt((int) newSize);
return sendToExchange(sendBuffer);
}
private int sendToExchange(ByteBuffer buffer) {
// Direct NIO send or use kernel bypass
// Implementation depends on connectivity
return 0;
}
private native long get_cycles();
private void recordLatency(long start, long end, long orderId) {}
}
In-Memory Order Book
// Lock-free order book implementation
#include <stdatomic.h>
#include <stdbool.h>
#define MAX_PRICE_LEVELS 10000
#define MAX_ORDERS 100000
typedef struct {
atomic_uint_fast64_t price;
atomic_uint_fast32_t total_size;
atomic_uint_fast32_t order_count;
void* next;
} PriceLevel;
typedef struct {
atomic_uint_fast64_t order_id;
atomic_uint_fast64_t price;
atomic_uint_fast32_t size;
atomic_uint_fast32_t remaining;
atomic_uint_fast8_t side; // 0 = buy, 1 = sell
atomic_uint_fast8_t status;
atomic_uint_fast64_t timestamp;
} Order;
typedef struct {
PriceLevel* bid_levels[MAX_PRICE_LEVELS];
PriceLevel* ask_levels[MAX_PRICE_LEVELS];
Order orders[MAX_ORDERS];
atomic_uint_fast64_t best_bid;
atomic_uint_fast64_t best_ask;
atomic_uint_fast64_t last_trade_price;
atomic_uint_fast32_t total_bid_size;
atomic_uint_fast32_t total_ask_size;
} OrderBook;
void order_book_init(OrderBook* book) {
atomic_store(&book->best_bid, 0);
atomic_store(&book->best_ask, UINT64_MAX);
atomic_store(&book->total_bid_size, 0);
atomic_store(&book->total_ask_size, 0);
}
bool add_order(OrderBook* book, uint64_t order_id,
uint64_t price, uint32_t size, uint8_t side) {
// Find or create price level
PriceLevel* level = find_or_create_level(
side == 0 ? book->bid_levels : book->ask_levels,
price
);
if (level == NULL) return false;
// Create order
Order* order = &book->orders[order_id % MAX_ORDERS];
atomic_store(&order->order_id, order_id);
atomic_store(&order->price, price);
atomic_store(&order->size, size);
atomic_store(&order->remaining, size);
atomic_store(&order->side, side);
atomic_store(&order->status, 1); // New
// Update price level
atomic_fetch_add(&level->total_size, size);
atomic_fetch_add(&level->order_count, 1);
// Update book totals
if (side == 0) {
atomic_fetch_add(&book->total_bid_size, size);
// Update best bid if needed
uint64_t current_best = atomic_load(&book->best_bid);
while (price > current_best) {
if (atomic_compare_exchange_weak(&book->best_bid, ¤t_best, price)) {
break;
}
}
} else {
atomic_fetch_add(&book->total_ask_size, size);
uint64_t current_best = atomic_load(&book->best_ask);
while (price < current_best) {
if (atomic_compare_exchange_weak(&book->best_ask, ¤t_best, price)) {
break;
}
}
}
return true;
}
bool execute_order(OrderBook* book, uint64_t order_id, uint32_t executed) {
Order* order = &book->orders[order_id % MAX_ORDERS];
uint32_t remaining = atomic_fetch_sub(&order->remaining, executed);
// Update price level
PriceLevel* level = find_level(
order->side == 0 ? book->bid_levels : book->ask_levels,
atomic_load(&order->price)
);
if (level != NULL) {
atomic_fetch_sub(&level->total_size, executed);
}
// Check if order is fully filled
if (remaining <= executed) {
atomic_store(&order->status, 2); // Filled
}
return true;
}
Risk Management
Real-Time Risk Checks
# Real-time risk management system
from dataclasses import dataclass
from typing import Dict, List
from collections import deque
import threading
@dataclass
class Position:
symbol: str
long_qty: int
short_qty: int
avg_long_price: float
avg_short_price: float
@property
def net_qty(self):
return self.long_qty - self.short_qty
@property
def market_value(self):
return (self.long_qty * self.avg_long_price +
self.short_qty * self.avg_short_price)
@dataclass
class RiskLimits:
max_position_size: int
max_order_size: int
max_loss_per_minute: float
max_loss_per_day: float
max_orders_per_second: int
class RiskManager:
def __init__(self, limits: RiskLimits):
self.limits = limits
self.positions: Dict[str, Position] = {}
self.order_history = deque(maxlen=10000)
self.pnl_history = deque(maxlen=1440) # 24 hours of minutes
self.daily_pnl = 0.0
self.lock = threading.Lock()
def check_order(self, symbol: str, side: str,
quantity: int, price: float) -> bool:
"""Check if order passes risk checks"""
with self.lock:
position = self.positions.get(symbol, Position(
symbol, 0, 0, 0.0, 0.0
))
# Check order size
if quantity > self.limits.max_order_size:
return False
# Check position limit
new_qty = position.net_qty + (quantity if side == 'BUY' else -quantity)
if abs(new_qty) > self.limits.max_position_size:
return False
# Check minute loss
if self._get_minute_loss() > self.limits.max_loss_per_minute:
return False
# Check daily loss
if self.daily_pnl < -self.limits.max_loss_per_day:
return False
# Check order rate
if self._get_orders_per_second() > self.limits.max_orders_per_second:
return False
return True
def check_cancel(self, order_id: int) -> bool:
"""Allow all cancels"""
return True
def record_trade(self, symbol: str, side: str,
quantity: int, price: float, pnl: float):
"""Record executed trade"""
with self.lock:
# Update position
if symbol not in self.positions:
self.positions[symbol] = Position(symbol, 0, 0, 0.0, 0.0)
pos = self.positions[symbol]
if side == 'BUY':
total_cost = pos.long_qty * pos.avg_long_price + quantity * price
pos.long_qty += quantity
pos.avg_long_price = total_cost / pos.long_qty if pos.long_qty > 0 else 0
else:
total_cost = pos.short_qty * pos.avg_short_price + quantity * price
pos.short_qty += quantity
pos.avg_short_price = total_cost / pos.short_qty if pos.short_qty > 0 else 0
# Update P&L
self.daily_pnl += pnl
self.pnl_history.append((self._get_current_minute(), pnl))
def _get_minute_loss(self) -> float:
current_minute = self._get_current_minute()
return sum(pnl for time, pnl in self.pnl_history if time == current_minute and pnl < 0)
def _get_orders_per_second(self) -> int:
# Implementation to track orders per second
return 0
def _get_current_minute(self) -> int:
import time
return int(time.time() / 60)
Performance Testing
Latency Benchmarking
# Comprehensive latency testing
import numpy as np
import time
from dataclasses import dataclass
from typing import List
@dataclass
class LatencyStats:
min: float
max: float
mean: float
median: float
p99: float
p999: float
std: float
class LatencyBenchmark:
def __init__(self):
self.measurements: List[float] = []
def measure_round_trip(self, iterations=10000):
"""Measure round-trip latency"""
for _ in range(iterations):
start = time.perf_counter_ns()
# Simulate processing
self._simulate_workload()
end = time.perf_counter_ns()
self.measurements.append(end - start)
def _simulate_workload(self):
"""Simulate typical trading workload"""
# Market data parsing
# Strategy calculation
# Risk check
# Order generation
pass
def get_statistics(self) -> LatencyStats:
"""Calculate latency statistics"""
data = np.array(self.measurements)
return LatencyStats(
min=np.min(data),
max=np.max(data),
mean=np.mean(data),
median=np.median(data),
p99=np.percentile(data, 99),
p999=np.percentile(data, 99.9),
std=np.std(data)
)
def print_report(self):
stats = self.get_statistics()
print("=" * 60)
print("LATENCY BENCHMARK REPORT")
print("=" * 60)
print(f"Iterations: {len(self.measurements):,}")
print(f"Min: {stats.min:.0f} ns ({stats.min/1000:.2f} ฮผs)")
print(f"Max: {stats.max:.0f} ns ({stats.max/1000:.2f} ฮผs)")
print(f"Mean: {stats.mean:.0f} ns ({stats.mean/1000:.2f} ฮผs)")
print(f"Median: {stats.median:.0f} ns ({stats.median/1000:.2f} ฮผs)")
print(f"P99: {stats.p99:.0f} ns ({stats.p99/1000:.2f} ฮผs)")
print(f"P99.9: {stats.p999:.0f} ns ({stats.p999/1000:.2f} ฮผs)")
print(f"Std Dev: {stats.std:.0f} ns ({stats.std/1000:.2f} ฮผs)")
print("=" * 60)
Conclusion
Building high-frequency trading infrastructure requires attention to every microsecond. Key takeaways:
-
Network is Critical: Colocation, kernel bypass (DPDK), and PTP synchronization are essential.
-
FPGA for Ultimate Speed: For sub-microsecond latency, FPGA acceleration is required.
-
Lock-Free Data Structures: Avoid locks in the hot path; use atomic operations.
-
Comprehensive Testing: Benchmark everything and measure at the cycle level.
-
Risk Management: Real-time risk checks must be as fast as order execution.
The pursuit of lower latency is endless, but these patterns will help you build a competitive HFT system.
Comments