NumPy Array Masking and Filtering: A Complete Guide

Introduction

NumPy’s boolean masking (also called boolean indexing, a form of NumPy’s advanced indexing) is one of its most powerful features. It lets you select, filter, and modify array elements based on conditions, without explicit loops. This guide covers everything from basic boolean masks to advanced multi-dimensional filtering.

Boolean Masking Basics

A boolean mask is an array of True/False values whose shape matches the array (or the axis) you want to filter:

import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Create a mask
mask = np.array([True, False, True, False, True])

# Apply the mask: returns elements where mask is True
print(a[mask])  # => [10 30 50]
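
As a small follow-up to the example above, a mask can also be counted or inverted, and NumPy rejects a mask whose length does not match the indexed axis. This is an illustrative sketch; the names n_selected, rest, and mismatch_raises are assumptions introduced here.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
mask = np.array([True, False, True, False, True])

# True counts as 1, so summing a mask counts the selected elements
n_selected = int(mask.sum())
print(n_selected)  # => 3

# ~ flips the mask and selects the complement
rest = a[~mask]
print(rest)  # => [20 40]

# A mask of the wrong length is an error, not a silent truncation
try:
    a[np.array([True, False])]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True
print(mismatch_raises)  # => True
```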

Filtering with Conditions

The most common pattern: create a mask from a condition, then apply it:

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Elements greater than 5
mask = x > 5
print(mask)     # => [False False False False False  True  True  True  True  True]
print(x[mask])  # => [ 6  7  8  9 10]

# Shorthand: use the condition directly as the index
print(x[x > 5])  # => [ 6  7  8  9 10]
print(x[x % 2 == 0])  # => [ 2  4  6  8 10]  (even numbers)

Combining Conditions

Use & (and), | (or), and ~ (not), not Python’s and/or/not:

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Between 3 and 7 (inclusive)
print(x[(x >= 3) & (x <= 7)])  # => [3 4 5 6 7]

# Less than 3 OR greater than 8
print(x[(x < 3) | (x > 8)])    # => [ 1  2  9 10]

# NOT greater than 5
print(x[~(x > 5)])              # => [1 2 3 4 5]

Important: Always wrap each condition in parentheses when combining. The operators & and | bind more tightly than comparisons, so x >= 3 & x <= 7 is parsed as x >= (3 & x) <= 7, which is not what you meant.
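
To make the precedence pitfall concrete, the sketch below shows what happens when the parentheses are dropped or when Python's `and` is used. The variable names that capture the errors are assumptions added for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Without parentheses, & binds first: x >= (3 & x) <= 7.
# The chained comparison then calls bool() on an array, which raises.
try:
    x[x >= 3 & x <= 7]
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc

# Python's `and` fails for the same reason: it needs a single True/False
try:
    x[(x >= 3) and (x <= 7)]
    and_error = None
except ValueError as exc:
    and_error = exc

# The parenthesized element-wise form works
ok = x[(x >= 3) & (x <= 7)]
print(ok)  # => [3 4 5 6 7]
```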

2D Array Masking

Row/Column Selection with a 1D Mask

col_mask = np.array([True, True, True, False, False])
b = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])

# Select columns where the mask is True
filtered = b[:, col_mask]
print(filtered)
# => [[1 2 3]
#     [6 7 8]]

# Select rows with a row mask
row_mask = np.array([True, False])
print(b[row_mask])
# => [[1 2 3 4 5]]
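
In practice the 1D mask is usually computed rather than typed by hand. A minimal sketch, assuming we want the columns whose mean exceeds 5 (the threshold is arbitrary and chosen only for illustration):

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])

# One boolean per column: True where the column mean exceeds 5
col_mask = b.mean(axis=0) > 5
print(col_mask)        # => [False False  True  True  True]

selected_cols = b[:, col_mask]
print(selected_cols)
# => [[ 3  4  5]
#     [ 8  9 10]]
```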

Condition-Based 2D Filtering

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Get all elements greater than 5 (returns 1D array)
print(matrix[matrix > 5])  # => [6 7 8 9]

# Replace elements greater than 5 with 0
result = matrix.copy()
result[result > 5] = 0
print(result)
# => [[1 2 3]
#     [4 5 0]
#     [0 0 0]]
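
Because indexing with a 2D mask flattens the result, it can help to work with the mask itself when the row/column structure matters. The following sketch reuses the same matrix; per_row and kept are names introduced here.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

mask = matrix > 5

# Count matches per row without flattening
per_row = np.count_nonzero(mask, axis=1)
print(per_row)  # => [0 1 3]

# Keep the 2D shape by filling non-matches with a placeholder (0 here)
kept = np.where(mask, matrix, 0)
print(kept)
# => [[0 0 0]
#     [0 0 6]
#     [7 8 9]]
```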

np.where: Conditional Selection

np.where(condition, x, y) returns x where condition is True, y where False:

x = np.array([1, -2, 3, -4, 5])

# Replace negatives with 0
result = np.where(x > 0, x, 0)
print(result)  # => [1 0 3 0 5]

# Absolute value
abs_vals = np.where(x >= 0, x, -x)
print(abs_vals)  # => [1 2 3 4 5]

# Classify values
labels = np.where(x > 0, "positive", "negative")
print(labels)  # => ['positive' 'negative' 'positive' 'negative' 'positive']
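
One caveat worth illustrating: np.where receives both branches as already-computed arrays, so an expression such as np.sqrt(x) is evaluated for every element, including the ones that are later discarded. A hedged alternative, using the `where` and `out` arguments that NumPy ufuncs accept, is sketched below with made-up input values.

```python
import numpy as np

x = np.array([4.0, -9.0, 16.0])

# np.where(x >= 0, np.sqrt(x), 0.0) would compute sqrt(-9) first and emit
# a RuntimeWarning. The ufunc's own `where` and `out` arguments skip those
# elements instead, leaving the prefilled zeros in place.
out = np.zeros_like(x)
np.sqrt(x, out=out, where=x >= 0)
print(out)  # => [2. 0. 4.]
```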

np.where with Indices Only

Called with just a condition, np.where returns the indices where it’s True:

x = np.array([10, 20, 30, 40, 50])
indices = np.where(x > 25)
print(indices)       # => (array([2, 3, 4]),)
print(x[indices])    # => [30 40 50]

# 2D indices
matrix = np.array([[1, 5, 3],
                   [8, 2, 7]])
rows, cols = np.where(matrix > 4)
print(rows, cols)    # => [0 1 1] [1 0 2]
print(matrix[rows, cols])  # => [5 8 7]
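
Two related functions are worth knowing alongside the single-argument form. As a short sketch on the same matrix: np.nonzero returns the same tuple of index arrays, and np.argwhere returns the indices grouped as one pair per match.

```python
import numpy as np

matrix = np.array([[1, 5, 3],
                   [8, 2, 7]])

# np.nonzero on a boolean array gives the same tuple as np.where(cond)
rows, cols = np.nonzero(matrix > 4)
print(rows, cols)  # => [0 1 1] [1 0 2]

# np.argwhere returns one (row, col) pair per match
pairs = np.argwhere(matrix > 4)
print(pairs)
# => [[0 1]
#     [1 0]
#     [1 2]]
```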

np.select: Multiple Conditions

For more than two cases, use np.select:

scores = np.array([45, 72, 88, 55, 95, 61])

conditions = [
    scores >= 90,
    scores >= 80,
    scores >= 70,
    scores >= 60,
]
choices = ['A', 'B', 'C', 'D']

grades = np.select(conditions, choices, default='F')
print(grades)  # => ['F' 'C' 'B' 'F' 'A' 'D']
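
The order of the conditions matters: for each element, np.select uses the first condition that is True. The sketch below deliberately lists the loosest condition first to show the failure mode; wrong_order is a name introduced for illustration.

```python
import numpy as np

scores = np.array([45, 72, 88, 55, 95, 61])

# Loosest condition first: every score >= 60 matches it and becomes 'D'
wrong_order = np.select(
    [scores >= 60, scores >= 70, scores >= 80, scores >= 90],
    ['D', 'C', 'B', 'A'],
    default='F',
)
print(wrong_order)  # => ['F' 'D' 'D' 'F' 'D' 'D']
```

Listing the strictest threshold first, as in the example above, avoids this.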

Masked Arrays (np.ma)

For arrays with invalid or missing data, use np.ma.MaskedArray:

data = np.array([1.0, 2.0, -999.0, 4.0, -999.0, 6.0])

# Mask invalid values
masked = np.ma.masked_where(data == -999.0, data)
print(masked)         # => [1.0 2.0 -- 4.0 -- 6.0]
print(masked.mean())  # => 3.25  (ignores masked values)
print(masked.sum())   # => 13.0

# Mask NaN values
data_with_nan = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
masked_nan = np.ma.masked_invalid(data_with_nan)
print(masked_nan.mean())  # => 3.0
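
When the result has to go back into code that expects a plain ndarray, a masked array can be converted with filled() or compressed(). The sketch below also shows np.nanmean as a lighter option when NaN is the only kind of missing value; the fill value 0.0 is an arbitrary choice.

```python
import numpy as np

data = np.array([1.0, 2.0, -999.0, 4.0, -999.0, 6.0])
masked = np.ma.masked_where(data == -999.0, data)

# Replace masked entries with a fill value, giving a plain ndarray
filled = masked.filled(0.0)
print(filled)      # => [1. 2. 0. 4. 0. 6.]

# Drop masked entries entirely
compact = masked.compressed()
print(compact)     # => [1. 2. 4. 6.]

# For NaN-only data, the nan-aware reductions avoid np.ma altogether
data_with_nan = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
nan_mean = np.nanmean(data_with_nan)
print(nan_mean)    # => 3.0
```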

Practical Examples

Filter Rows of a 2D Array

# Student scores: [name_id, math, science, english]
students = np.array([
    [1, 85, 90, 78],
    [2, 60, 55, 70],
    [3, 92, 88, 95],
    [4, 45, 50, 60],
    [5, 78, 82, 80],
])

# Students who passed all subjects (score >= 70)
passed_all = np.all(students[:, 1:] >= 70, axis=1)
print(students[passed_all])
# => [[ 1 85 90 78]
#     [ 3 92 88 95]
#     [ 5 78 82 80]]

# Students who failed at least one subject
failed_any = np.any(students[:, 1:] < 70, axis=1)
print(students[failed_any, 0])  # => [2 4]  (student IDs)
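
The same row-mask pattern works with any per-row reduction, not only np.all and np.any. As a sketch, assuming an arbitrary cutoff of 82 on the average score:

```python
import numpy as np

# Student scores: [name_id, math, science, english]
students = np.array([
    [1, 85, 90, 78],
    [2, 60, 55, 70],
    [3, 92, 88, 95],
    [4, 45, 50, 60],
    [5, 78, 82, 80],
])

# Mean score per student across the three subject columns
averages = students[:, 1:].mean(axis=1)

# IDs of students whose average is above 82 (cutoff chosen for illustration)
top_ids = students[averages > 82, 0]
print(top_ids)  # => [1 3]
```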

Normalize Data (Clip Outliers)

data = np.array([1, 2, 100, 3, 4, -50, 5])

# Clip values to [0, 10] range
clipped = np.clip(data, 0, 10)
print(clipped)  # => [ 1  2 10  3  4  0  5]

# Or use masking to replace outliers with NaN
result = data.astype(float)
result[(data < 0) | (data > 10)] = np.nan
print(result)  # => [ 1.  2. nan  3.  4. nan  5.]
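
The [0, 10] range above is hard-coded. One common alternative, shown here only as a sketch, derives the bounds from the data using the 1.5 × IQR rule of thumb; that rule is an assumption of this example, not something the original example prescribes.

```python
import numpy as np

data = np.array([1, 2, 100, 3, 4, -50, 5])

# Quartiles and the conventional 1.5 * IQR fences
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only values inside the fences
inliers = data[(data >= lower) & (data <= upper)]
print(inliers)  # => [1 2 3 4 5]
```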

Image Processing (Threshold)

# Simulate a grayscale image (0-255)
image = np.random.randint(0, 256, size=(4, 4))
print(image)

# Binary threshold: pixels > 128 become 255, others become 0
binary = np.where(image > 128, 255, 0)
print(binary)

# Mask dark pixels
dark_mask = image < 50
image[dark_mask] = 0  # set dark pixels to black
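
Since the random image above differs on every run, a fixed input makes the thresholding easier to verify. The sketch below uses made-up pixel values and casts the result back to uint8, because np.where with plain Python integers returns NumPy's default integer type rather than an 8-bit image type.

```python
import numpy as np

# Small fixed "image" so the result is reproducible (values are made up)
image = np.array([[10, 200, 128],
                  [129, 49, 255]], dtype=np.uint8)

# Threshold, then cast back to uint8 for an image-friendly dtype
binary = np.where(image > 128, 255, 0).astype(np.uint8)
print(binary)
# => [[  0 255   0]
#     [255   0 255]]
```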

Performance Tips

arr = np.array([-5, 20, 150, -1, 60])  # sample data for the tips below

# Boolean indexing always returns a copy, never a view
selected = arr[arr > 0]  # new array

# Assigning through a mask modifies arr in place
# (the mask itself is still a temporary boolean array)
arr[arr < 0] = 0

# np.where builds a new array and leaves arr untouched; whether it beats
# masked assignment depends on array size and dtype, so time both
result = np.where(arr > 0, arr, 0)

# Naming a mask costs nothing extra: both forms below build the same
# temporary boolean arrays. Store the mask when you need to reuse it.
mask = (arr > 0) & (arr < 100)
result = arr[mask]
result = arr[(arr > 0) & (arr < 100)]  # same result, same work
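
The copy-versus-view distinction can be checked directly with np.shares_memory. A minimal sketch, using a small example array chosen for illustration:

```python
import numpy as np

arr = np.arange(-3, 4)  # [-3 -2 -1  0  1  2  3]

selected = arr[arr > 0]   # boolean indexing: independent copy
window = arr[4:]          # basic slicing: view onto the same buffer

copy_shares = np.shares_memory(arr, selected)
view_shares = np.shares_memory(arr, window)
print(copy_shares)  # => False
print(view_shares)  # => True

# Writing to the copy leaves arr alone; writing to the view does not
selected[0] = 99
window[0] = -1
print(arr)  # => [-3 -2 -1  0 -1  2  3]
```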

Summary

Method                       Use Case
arr[condition]               Filter elements by condition
arr[bool_array]              Filter by pre-computed boolean mask
np.where(cond, x, y)         Conditional element selection
np.where(cond)               Get indices of True elements
np.select(conds, choices)    Multiple condition branching
np.ma.masked_where           Mask invalid/missing values
np.clip(arr, min, max)       Clamp values to a range
