
Command-Line Tools for Computer Vision: Complete Guide 2026

Practical CLI Utilities for Audio and Visual Processing

Overview

This article introduces several command-line tools and utilities commonly used in computer vision and audio processing workflows. These tools help you interact with hardware devices, record and process audio, manage output settings efficiently, and handle image and video processing tasks essential for computer vision applications.

Command-line tools remain indispensable for computer vision work despite the proliferation of graphical interfaces and high-level libraries. CLI tools offer scriptability, automation capabilities, and often provide more direct access to system resources. They integrate seamlessly into CI/CD pipelines, enable batch processing of large datasets, and serve as the foundation for building more complex computer vision systems.

Whether you are building a computer vision pipeline for object detection, training a machine learning model on image datasets, or setting up a multi-camera system for industrial inspection, understanding these command-line utilities will make your workflow more efficient and your systems more robust.

Key CLI Tools for Computer Vision

FFmpeg: Video Processing Foundation

FFmpeg is the Swiss Army knife of media processing, providing capabilities essential for computer vision workflows. The tool can convert between virtually any video format, extract frames from video files, adjust video parameters, and even apply filters, all from the command line.

For computer vision applications, FFmpeg excels at preparing training data. You can extract frames from video datasets at specific intervals, resize and normalize images for model input, convert between color spaces, and batch process entire directories of media files.

# Extract frames from video at 1 frame per second
ffmpeg -i input_video.mp4 -vf "fps=1" frames/frame_%04d.png

# Extract a single frame at a specific timestamp (-ss before -i seeks quickly in the input)
ffmpeg -ss 00:01:30 -i video.mp4 -frames:v 1 snapshot.png

# Convert video to a sequence of raw RGB frames (one headerless file per frame)
ffmpeg -i input.mp4 -f image2 -c:v rawvideo -pix_fmt rgb24 output_%03d.raw

# Resize images for model input
ffmpeg -i input.jpg -vf "scale=224:224" output.jpg

# Convert to raw BGR pixels for OpenCV processing (-f rawvideo is needed because .raw is not a recognized extension)
ffmpeg -i input.jpg -f rawvideo -pix_fmt bgr24 output.raw

# Extract audio from video without re-encoding (assumes the audio track is AAC)
ffmpeg -i input.mp4 -vn -acodec copy output.aac

# Concatenate videos
ffmpeg -f concat -safe 0 -i filelist.txt -c copy output.mp4
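
The scale=224:224 example above stretches every frame to a square, which distorts anything that is not already square. A minimal sketch of one alternative, assuming letterboxing is acceptable for your model: build a filter string that samples frames, shrinks them to fit, and pads the remainder. The helper name frame_filter is hypothetical; the scale and pad options it emits are standard FFmpeg filter options.

```shell
# frame_filter FPS SIZE
# Prints an FFmpeg -vf filter string that samples FPS frames per second and
# letterboxes each frame into a SIZE x SIZE square without changing its
# aspect ratio. frame_filter is a hypothetical helper, not part of FFmpeg.
frame_filter() {
  ff_fps=$1
  ff_size=$2
  printf 'fps=%s,scale=%s:%s:force_original_aspect_ratio=decrease,pad=%s:%s:(ow-iw)/2:(oh-ih)/2\n' \
    "$ff_fps" "$ff_size" "$ff_size" "$ff_size" "$ff_size"
}

# Example usage (requires ffmpeg and an existing frames/ directory):
#   ffmpeg -i input_video.mp4 -vf "$(frame_filter 1 224)" frames/frame_%04d.png
```

Keeping the filter in a function makes it easy to reuse the same preprocessing for every video in a dataset.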

ImageMagick: Image Manipulation

ImageMagick provides comprehensive image processing capabilities through the convert command and its associated utilities (in ImageMagick 7 the main entry point is the magick command, and convert is kept for compatibility). For computer vision, ImageMagick handles image preprocessing, dataset augmentation, and format conversion tasks efficiently.

The tool supports over 200 image formats and provides operations including resizing, cropping, rotation, color space conversion, and various filters. Batch processing capabilities make it ideal for preparing large image datasets.

# Basic image conversions
convert input.png output.jpg
convert input.tiff -resize 256x256 output.png

# Batch resize all images in directory (overwrites the originals; each image is fitted within 224x224, keeping its aspect ratio)
mogrify -resize 224x224 *.jpg

# Create thumbnails
convert input.jpg -thumbnail 128x128 thumbnail.jpg

# Adjust brightness and contrast
convert input.jpg -brightness-contrast 20x30 output.jpg

# Convert to grayscale
convert input.jpg -colorspace Gray gray_output.jpg

# Apply edge detection
convert input.jpg -edge 1 edge_output.jpg

# Split image into a 2x2 grid of tiles (the tiles/ directory must exist; +repage resets each tile's canvas offset)
convert input.jpg -crop 2x2@ +repage tiles/tile_%d.jpg
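
Because mogrify edits files in place, a batch resize can destroy the only copy of a dataset. The sketch below, assuming JPEG inputs in a flat directory, uses mogrify's -path option to write resized copies elsewhere. The function name resize_copies and the DRY_RUN switch are illustrative conventions, not ImageMagick features.

```shell
# resize_copies SRC_DIR DST_DIR SIZE
# Writes SIZE x SIZE copies of SRC_DIR/*.jpg into DST_DIR and leaves the
# originals untouched. The "!" geometry flag forces the exact size and
# ignores the aspect ratio. With DRY_RUN=1 the command is only printed.
resize_copies() {
  rc_src=$1
  rc_dst=$2
  rc_size=$3
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "mogrify -path $rc_dst -resize ${rc_size}x${rc_size}! $rc_src/*.jpg"
    return 0
  fi
  mkdir -p "$rc_dst"
  mogrify -path "$rc_dst" -resize "${rc_size}x${rc_size}!" "$rc_src"/*.jpg
}

# Example usage:
#   DRY_RUN=1 resize_copies raw resized 224   # preview the command
#   resize_copies raw resized 224             # run it
```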

GStreamer: Multimedia Framework

GStreamer provides a powerful pipeline-based framework for media processing. Unlike simple command-line tools, GStreamer enables complex processing graphs that can handle real-time video streams, multi-camera setups, and hardware-accelerated processing.

For computer vision applications, GStreamer excels at camera capture, real-time preprocessing, and streaming. The framework integrates well with OpenCV and other computer vision libraries.

# Capture from webcam
gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! xvimagesink

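# Capture, encode to H.264, and save to file (mp4mux needs encoded video; -e sends EOS on Ctrl-C so the MP4 is finalized)
gst-launch-1.0 -e v4l2src device=/dev/video0 ! videoconvert ! x264enc ! h264parse ! mp4mux ! filesink location=capture.mp4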

# RTSP stream playback (assumes an H.264 stream)
gst-launch-1.0 rtspsrc location=rtsp://camera/stream ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! xvimagesink

# Video file to pipeline (assumes an H.264 video track)
gst-launch-1.0 filesrc location=video.mp4 ! qtdemux ! h264parse ! avdec_h264 ! videoconvert ! xvimagesink

OpenCV Command-Line Tools

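OpenCV is primarily a library, but a build with its apps enabled typically installs a few small command-line programs, such as opencv_version, opencv_annotation, opencv_visualisation, and opencv_interactive-calibration. The cvv module in opencv_contrib offers an interactive GUI for visual debugging, but it is called from C++ code rather than run as a standalone command. For quick checks from the shell, OpenCV's Python bindings and FFmpeg's ffprobe fill the gaps.

# Check OpenCV version (add -v for the full build configuration, including modules)
opencv_version

# Quick image inspection via the Python bindings (assumes the cv2 module is installed and image.jpg exists)
python3 -c "import cv2; img = cv2.imread('image.jpg'); print(img.shape, img.dtype)"

# Video file info (ffprobe ships with FFmpeg, not OpenCV)
ffprobe -v error -select_streams v:0 -show_entries stream=width,height,avg_frame_rate -of default=noprint_wrappers=1 video.mp4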
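
ffprobe reports avg_frame_rate as a fraction such as 30000/1001, which is awkward to use when computing frame counts or sampling intervals. A small sketch for converting it to a decimal value follows; ratio_to_fps is a hypothetical helper name.

```shell
# ratio_to_fps RATIO
# Converts a frame-rate fraction such as "30000/1001" into a decimal value
# with three digits after the point. A zero or missing denominator yields
# the numerator unchanged (ffprobe prints 0/0 when the rate is unknown).
ratio_to_fps() {
  printf '%s\n' "$1" | awk -F/ '{
    if ($2 == "" || $2 == 0) print $1 + 0
    else printf "%.3f\n", $1 / $2
  }'
}

# Example usage (requires ffprobe; nokey=1 prints only the value):
#   ratio_to_fps "$(ffprobe -v error -select_streams v:0 \
#     -show_entries stream=avg_frame_rate \
#     -of default=noprint_wrappers=1:nokey=1 video.mp4)"
```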

Audio Tools for Computer Vision

ALSA Utilities: Audio Recording and Playback

The Advanced Linux Sound Architecture (ALSA) utilities provide low-level access to audio hardware. For computer vision systems that include audio input (speech recognition, acoustic analysis, or audio-visual synchronization), these tools are essential.

The arecord command lists and records from audio devices, while aplay handles audio playback. These tools provide direct hardware access without the overhead of higher-level libraries.

# List available audio devices
arecord -l
aplay -l

# Record audio from specific device
arecord -D hw:2,0 -f S16_LE -c 1 -t wav -r 44100 test.wav

# Record with duration limit
arecord -d 60 -f cd test.wav

# Play audio file
aplay test.wav

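# List device capabilities (opens the device, prints its hardware parameters, and records at most 1 second to /dev/null)
arecord -D hw:0,0 --dump-hw-params -d 1 /dev/null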
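
Scripts usually need the hw:CARD,DEVICE identifiers rather than the human-readable listing. The sketch below extracts them from the output of arecord -l. It assumes the English output format ("card N: ..., device M: ..."), so running arecord with LC_ALL=C is advisable; alsa_hw_ids is a hypothetical helper name.

```shell
# alsa_hw_ids
# Reads `arecord -l` (or `aplay -l`) output on stdin and prints one
# hw:CARD,DEVICE identifier per listed device.
alsa_hw_ids() {
  sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Example usage (requires alsa-utils):
#   LC_ALL=C arecord -l | alsa_hw_ids
```

Each printed identifier can be passed directly to arecord -D.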

SoX: Sound eXchange

SoX (Sound eXchange) provides audio manipulation capabilities including format conversion, effects processing, and recording. The tool’s simplicity and scriptability make it valuable for preprocessing audio data for machine learning models.

For computer vision applications that process video with audio tracks, SoX handles normalization, trimming, and format conversion efficiently once the audio track has been extracted (for example with FFmpeg).

# Basic playback
play audio.wav

# Convert formats (MP3 output requires a SoX build with MP3 support)
sox input.wav output.mp3

# Normalize audio volume
sox input.wav output.wav norm

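# Apply effects (compand needs explicit parameters; these values follow the example in the SoX manual)
sox input.wav output.wav reverb compand 0.3,1 6:-70,-60,-20 -5 -90 0.2

# Trim audio (start at 10 s, keep 30 s)
sox input.wav output.wav trim 10 30

# Combine two mono files into one stereo file (-M merges channels; -m would mix them into one)
sox -M left.wav right.wav stereo.wav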

# Extract segment for dataset
sox input.wav output.wav trim 0 5 fade 0.1 5 0.1
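
For dataset work, a long recording is often cut into many fixed-length clips rather than a single segment. The following is a rough sketch under the assumption that whole-second offsets are sufficient and any trailing partial clip can be dropped. The function names clip_offsets and split_clips are hypothetical; soxi -D (which prints the duration in seconds) and the trim effect are standard SoX.

```shell
# clip_offsets TOTAL_SECONDS CLIP_SECONDS
# Prints the start offset of every full-length clip that fits.
clip_offsets() {
  co_total=$1
  co_clip=$2
  co_start=0
  [ "$co_clip" -gt 0 ] || return 1        # avoid an endless loop
  while [ $((co_start + co_clip)) -le "$co_total" ]; do
    echo "$co_start"
    co_start=$((co_start + co_clip))
  done
}

# split_clips INPUT.wav OUT_PREFIX CLIP_SECONDS
# Writes OUT_PREFIX_0000.wav, OUT_PREFIX_0005.wav, ... (requires sox/soxi).
split_clips() {
  sc_in=$1
  sc_prefix=$2
  sc_clip=$3
  sc_dur=$(soxi -D "$sc_in")              # e.g. 12.345000
  for sc_start in $(clip_offsets "${sc_dur%.*}" "$sc_clip"); do
    sox "$sc_in" "$(printf '%s_%04d.wav' "$sc_prefix" "$sc_start")" \
      trim "$sc_start" "$sc_clip"
  done
}

# Example usage:
#   split_clips input.wav clips/clip 5
```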

Device Management Scripts

Multi-Display and GPU Configuration

Computer vision workstations often require multiple displays or GPU configurations. Command-line tools and scripts enable flexible setup management.

# List connected displays (xrandr)
xrandr --listmonitors

# Set primary display
xrandr --output DP-1 --primary

# Clone display to external monitor
xrandr --output HDMI-1 --same-as eDP-1

# NVIDIA GPU info
nvidia-smi
nvidia-smi -L
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv
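
On multi-GPU workstations, the CSV output can drive device selection. The sketch below picks the GPU with the most free memory; it assumes the query is run with --format=csv,noheader,nounits and the fields index,memory.used,memory.total in that order. freest_gpu is a hypothetical helper name.

```shell
# freest_gpu
# Reads lines of "index, used_MiB, total_MiB" on stdin and prints the index
# of the GPU with the largest (total - used) value.
freest_gpu() {
  awk -F', *' '
    { free = $3 - $2
      if (NR == 1 || free > best) { best = free; idx = $1 } }
    END { if (NR > 0) print idx }'
}

# Example usage (requires the NVIDIA driver utilities):
#   export CUDA_VISIBLE_DEVICES=$(nvidia-smi \
#     --query-gpu=index,memory.used,memory.total \
#     --format=csv,noheader,nounits | freest_gpu)
```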

Camera Configuration

# List video devices
v4l2-ctl --list-devices

# Query camera capabilities
v4l2-ctl -d /dev/video0 --all

# Set camera parameters
v4l2-ctl -d /dev/video0 --set-fmt-video=width=1920,height=1080
v4l2-ctl -d /dev/video0 --set-ctrl=brightness=128

# Capture test frame
v4l2-ctl --device=/dev/video0 --set-fmt-video=width=1920,height=1080,pixelformat=MJPG --stream-mmap --stream-to=frame.jpg --stream-count=1

Practical Workflow Examples

Dataset Preparation Pipeline

A typical computer vision dataset preparation workflow combines multiple CLI tools:

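#!/bin/bash
# Dataset preparation pipeline
# Usage: pass the input directory and the output directory as arguments
set -euo pipefail

INPUT_DIR=${1:?usage: $0 INPUT_DIR OUTPUT_DIR}
OUTPUT_DIR=${2:?usage: $0 INPUT_DIR OUTPUT_DIR}

mkdir -p "$OUTPUT_DIR"

# Extract frames from all videos (1 frame per second, resized to 224x224)
for video in "$INPUT_DIR"/*.mp4; do
  [ -e "$video" ] || continue   # skip the literal pattern when no .mp4 files exist
  name=$(basename "$video" .mp4)
  mkdir -p "$OUTPUT_DIR/$name"
  # -nostdin keeps ffmpeg from reading the script's standard input
  ffmpeg -nostdin -i "$video" -vf "fps=1,scale=224:224" "$OUTPUT_DIR/$name/frame_%04d.jpg"
done

# Normalize images (contrast stretch, in place)
mogrify -normalize "$OUTPUT_DIR"/*/*.jpg

# Convert to training format: 8-bit sRGB (-colorspace RGB would convert to linear RGB and darken the images)
for img in "$OUTPUT_DIR"/*/*.jpg; do
  convert "$img" -colorspace sRGB -depth 8 "$img"
done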

Real-Time Processing Setup

Setting up real-time computer vision processing often involves combining camera capture, preprocessing, and model inference:

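# Test a camera capture pipeline from the shell (fakesink discards the frames)
gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,format=YUY2,width=640,height=480 ! videoconvert ! fakesink

# To read frames in OpenCV, pass the same pipeline description, ending in
# "videoconvert ! video/x-raw,format=BGR ! appsink", to cv2.VideoCapture with cv2.CAP_GSTREAMER.
# appsink only makes sense inside an application; under gst-launch-1.0 nothing consumes its frames.

# Simultaneous multi-camera capture (-e sends EOS on Ctrl-C so the AVI is finalized)
gst-launch-1.0 -e \
  v4l2src device=/dev/video0 ! video/x-raw,width=640,height=480 ! queue ! muxer. \
  v4l2src device=/dev/video1 ! video/x-raw,width=640,height=480 ! queue ! muxer. \
  avimux name=muxer ! filesink location=dual_camera.avi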

Practical Notes

Using command-line tools effectively requires understanding their capabilities and limitations. Here are key considerations for computer vision workflows:

Performance Considerations: Most CLI tools process one file per invocation, and a plain shell loop runs those invocations one after another. For large datasets, run several jobs at once with xargs -P or GNU parallel. Consider hardware acceleration options (GPU, Intel Quick Sync) when available.
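
As a minimal sketch of parallel batch processing, the function below fans a command out over matching files with find and xargs. It assumes a find and xargs that support -print0, -0, and -P (GNU and BSD versions do), and that COMMAND is an executable rather than a shell function. run_parallel is a hypothetical helper name.

```shell
# run_parallel JOBS DIR PATTERN COMMAND [ARGS...]
# Runs COMMAND [ARGS...] <file> once per file under DIR whose name matches
# PATTERN, with up to JOBS invocations running at the same time.
run_parallel() {
  rp_jobs=$1
  rp_dir=$2
  rp_pattern=$3
  shift 3
  find "$rp_dir" -type f -name "$rp_pattern" -print0 |
    xargs -0 -P "$rp_jobs" -n 1 "$@"
}

# Example usage (resizes every JPEG under frames/ in place, 4 at a time):
#   run_parallel 4 frames '*.jpg' mogrify -resize 224x224
```

The null-delimited hand-off keeps file names with spaces intact, which matters for datasets collected from many sources.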

Format Compatibility: Different tools support different format sets. Use FFmpeg to convert between formats that other tools cannot handle directly. Always verify format compatibility before processing large datasets.

Batch Processing: Take advantage of shell capabilities for batch operations. The find command combined with -exec or piped to while loops enables sophisticated batch processing workflows.
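
One way to sketch the find-plus-while-loop pattern is a helper that mirrors a directory tree while applying a command to each image. It assumes file names contain no newline characters and that inputs are JPEG files; mirror_apply and to_gray are hypothetical names.

```shell
# mirror_apply SRC DST COMMAND [ARGS...]
# For every *.jpg under SRC, runs COMMAND [ARGS...] <input> <output>, where
# <output> is the same relative path under DST.
mirror_apply() {
  ma_src=${1%/}
  ma_dst=${2%/}
  shift 2
  find "$ma_src" -type f -name '*.jpg' | while IFS= read -r ma_in; do
    ma_out=$ma_dst/${ma_in#"$ma_src"/}
    mkdir -p "$(dirname "$ma_out")"
    # Redirect stdin so COMMAND cannot swallow the remaining file list
    "$@" "$ma_in" "$ma_out" < /dev/null
  done
}

# Example usage (requires ImageMagick):
#   to_gray() { convert "$1" -colorspace Gray "$2"; }
#   mirror_apply raw gray to_gray
```

Redirecting standard input matters with tools such as ffmpeg, which otherwise read from the pipe that feeds the loop.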

Pipeline Integration: CLI tools integrate naturally into larger processing pipelines. Use named pipes (FIFO) for streaming data between tools without intermediate files.
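
The named-pipe approach can be sketched as a small wrapper that creates a FIFO, starts a producer in the background, and runs a consumer in the foreground. stream_through_fifo, produce, and consume are hypothetical names; the ffmpeg invocation in the usage comment is an assumption about how one might feed raw frames to a consumer.

```shell
# stream_through_fifo PRODUCER CONSUMER
# Creates a temporary FIFO, runs "PRODUCER <fifo>" in the background and
# "CONSUMER <fifo>" in the foreground, then cleans up. Returns the
# consumer's exit status.
stream_through_fifo() {
  sf_dir=$(mktemp -d) || return 1
  sf_fifo=$sf_dir/stream.fifo
  mkfifo "$sf_fifo" || return 1
  "$1" "$sf_fifo" &
  "$2" "$sf_fifo"
  sf_status=$?
  wait                      # let the producer finish before cleaning up
  rm -rf "$sf_dir"
  return "$sf_status"
}

# Example usage (requires ffmpeg; -y is needed because the FIFO already exists):
#   produce() { ffmpeg -v error -nostdin -y -i input.mp4 -f rawvideo -pix_fmt bgr24 "$1"; }
#   consume() { wc -c < "$1"; }
#   stream_through_fifo produce consume
```

Opening a FIFO blocks until both ends are connected, so the producer must start in the background before the consumer opens the pipe.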

Applications in Computer Vision

While these tools are primarily for audio and media processing, they are often used in computer vision projects for tasks such as:

  • Capturing training data: Recording and extracting frames from video datasets for model training
  • Preprocessing image datasets: Resizing, normalizing, and converting images to formats required by deep learning frameworks
  • Synchronizing sound and image inputs: Ensuring temporal alignment between audio and video streams in multimodal learning
  • Testing hardware setups: Verifying camera configurations, lighting, and capture parameters for industrial and consumer vision systems
  • Data augmentation: Applying transformations to expand training datasets
  • Model inference post-processing: Converting model outputs to video formats for visualization

For more advanced computer vision tasks, explore libraries like OpenCV, but mastering these CLI tools is essential for low-level hardware interaction, efficient batch processing, and building production computer vision pipelines.
