Cassandra Operations: Backup, Repair, and Cluster Management

Introduction

Running Cassandra in production requires careful attention to cluster management, backups, repairs, and monitoring. This guide covers essential DBA tasks for maintaining healthy Cassandra clusters.

Node Operations

Adding Nodes

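# On the new node: install the same Cassandra version as the rest of the cluster
# Configure cassandra.yaml (cluster_name, seeds, listen_address) to match the cluster

# Start Cassandra on the new node
sudo service cassandra start

# With auto_bootstrap enabled (the default), the node joins the ring
# and streams its data automatically; no extra command is needed.

# Only if the node was started with -Dcassandra.join_ring=false,
# run this on the new node to join the ring:
nodetool join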

# Check node status
nodetool status
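
A new node shows as UJ (Up/Joining) in nodetool status while it bootstraps and becomes UN (Up/Normal) once it has finished. As a minimal sketch, the helper below filters that output for nodes in any other state. The function name nodes_not_up_normal is hypothetical, and it assumes the usual layout where each node line starts with a two-letter state code followed by the address.

```shell
# List nodes whose state is anything other than UN (Up/Normal).
# Reads `nodetool status` output on stdin and prints "<state> <address>"
# for each such node. Header and separator lines are ignored because
# their first field is not a two-letter state code.
nodes_not_up_normal() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $1, $2 }'
}

# Usage (assumes nodetool is on PATH):
#   nodetool status | nodes_not_up_normal
```

Empty output means every listed node is Up/Normal.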

Removing Nodes

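# Decommission a live node gracefully (run on the node being removed)
nodetool decommission

# Remove a dead node (run from any live node; nodetool status shows the host ID)
nodetool removenode <host_id>

# Last resort if removenode cannot complete: force the node out of gossip.
# No data is re-streamed, so run a repair afterwards.
nodetool assassinate <node_ip>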

Node Repair

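# Repair a specific keyspace
nodetool repair mykeyspace

# Full repair (since Cassandra 2.2, repair is incremental by default)
nodetool repair -full mykeyspace

# Primary-range repair (-pr): repairs only the token ranges
# for which this node is the primary replica
nodetool repair -pr mykeyspace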
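
Because -pr limits a repair to the node's primary token ranges, it covers the whole ring only when it is run on every node in turn. The sketch below prints one command per node rather than executing anything. The function name repair_plan and the example addresses are hypothetical, and running nodetool with -h against another host assumes remote JMX access is enabled.

```shell
# Print a primary-range repair command for each node, one per line.
#   $1      keyspace to repair
#   $2...   node addresses (placeholders; use your own)
# Nothing is executed here; review the plan, then pipe it to `sh`.
repair_plan() {
    keyspace=$1
    shift
    for host in "$@"; do
        printf 'nodetool -h %s repair -pr %s\n' "$host" "$keyspace"
    done
}

# Usage:
#   repair_plan mykeyspace 10.0.0.1 10.0.0.2 10.0.0.3
#   repair_plan mykeyspace 10.0.0.1 10.0.0.2 10.0.0.3 | sh
```

Running the commands one after another, as the piped form does, keeps only one node repairing at a time.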

Backup and Restore

Taking Snapshots

# Take snapshot of a table
nodetool snapshot -t my_snapshot mykeyspace mytable

# Take snapshot of entire keyspace
nodetool snapshot -t full_backup mykeyspace

# Clear snapshot
nodetool clearsnapshot -t my_snapshot
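
A snapshot is stored as hard links inside each table's data directory, under snapshots/<tag>, so it has to be copied elsewhere before it counts as a backup. As a rough sketch, the helper below lists those directories for one tag. The function name find_snapshot_dirs is hypothetical; the data directory is passed in because its location depends on cassandra.yaml.

```shell
# Print every directory that holds files for a given snapshot tag.
#   $1  Cassandra data directory (for example /var/lib/cassandra/data)
#   $2  snapshot tag, as passed to `nodetool snapshot -t`
find_snapshot_dirs() {
    find "$1" -type d -path "*/snapshots/$2" | sort
}

# Usage:
#   find_snapshot_dirs /var/lib/cassandra/data my_snapshot
```

Each path printed can then be archived or copied to backup storage with whatever tool you already use.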

Restore from Snapshot

# 1. Truncate table
cqlsh -e "TRUNCATE mykeyspace.mytable;"

# 2. Stop Cassandra
sudo service cassandra stop

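# 3. Copy snapshot files into the table's data directory
#    (check that the glob matches exactly one directory, and that the
#    copied files are owned by the user Cassandra runs as)
cp /backup/mykeyspace/mytable/*/snapshots/my_snapshot/* \
   /var/lib/cassandra/data/mykeyspace/mytable-*/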

# 4. Start Cassandra
sudo service cassandra start

# 5. Run repair
nodetool repair mykeyspace mytable

Monitoring with nodetool

Cluster Status

# Check node status
nodetool status

# Table statistics
nodetool tablestats mykeyspace.mytable

# Same statistics via cfstats, the older, deprecated alias of tablestats
nodetool cfstats mykeyspace.mytable

# Compaction stats
nodetool compactionstats
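
A steadily growing count of pending compactions is a common sign that a node is falling behind on writes. The sketch below pulls that number out of nodetool compactionstats so it can be fed to an alert or a cron check. The function name pending_compactions is hypothetical, and it assumes the output contains a line beginning with "pending tasks" whose last field is the count; the exact layout differs between Cassandra versions.

```shell
# Print the pending compaction count from `nodetool compactionstats`
# output read on stdin. Takes the last field of the first line that
# starts with "pending tasks", which covers both "pending tasks: 7"
# and a column-aligned "pending tasks   7" layout.
pending_compactions() {
    awk '/^pending tasks/ { print $NF; exit }'
}

# Usage (assumes nodetool is on PATH):
#   nodetool compactionstats | pending_compactions
```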

Troubleshooting

# Get thread pool stats
nodetool tpstats

# Proxy histograms
nodetool proxyhistograms

# View log entries
tail -f /var/log/cassandra/system.log

Security

Authentication

-- Create superuser
CREATE ROLE admin WITH SUPERUSER = true AND LOGIN = true
AND PASSWORD = 'secure_password';

-- Create application user
CREATE ROLE appuser WITH LOGIN = true AND PASSWORD = 'app_password';

-- Grant permissions
GRANT ALL ON KEYSPACE myapp TO appuser;

Compaction Strategies

Size-Tiered Compaction (STCS)

-- The default strategy; well suited to write-heavy workloads
CREATE TABLE mykeyspace.mytable (
    id UUID PRIMARY KEY,
    data TEXT
) WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'min_threshold': 4,
    'max_threshold': 32
};

Leveled Compaction (LCS)

-- Better for read-heavy workloads
CREATE TABLE mykeyspace.mytable (
    id UUID PRIMARY KEY,
    data TEXT
) WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};

Time-Window Compaction (TWCS)

-- Best for time-series data
CREATE TABLE mykeyspace.timeseries (
    id UUID,
    timestamp TIMESTAMP,
    data TEXT,
    PRIMARY KEY (id, timestamp)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};

Conclusion

Cassandra operations require regular maintenance: repairs to ensure consistency, backups for disaster recovery, and monitoring for performance. With proper cluster management practices, your Cassandra deployment can scale reliably.

In the next article, we’ll explore Cassandra’s internal architecture: distributed design, storage engine, gossip protocol, and consistency mechanisms.