Introduction
NumPy’s boolean masking (also called boolean indexing) is one of its most powerful features. It lets you select, filter, and modify array elements based on conditions, without explicit loops. NumPy’s documentation groups it with integer-array indexing under "advanced indexing"; the informal term "fancy indexing" usually means the integer-array kind. This guide covers everything from basic boolean masks to advanced multi-dimensional filtering.
Boolean Masking Basics
A boolean mask is an array of True/False values, normally with the same shape as the array you want to filter (a 1D mask can also select along a single axis of a 2D array):
import numpy as np
a = np.array([10, 20, 30, 40, 50])
# Create a mask
mask = np.array([True, False, True, False, True])
# Apply the mask: returns elements where mask is True
print(a[mask]) # => [10 30 50]
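Two details are worth seeing in code. A plain Python list of booleans also works as a mask. A mask whose length does not match the indexed axis is rejected with an IndexError rather than silently truncated. Here is a minimal sketch reusing the array above; the short mask is a made-up example:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# A list of bools is treated as a boolean mask, just like a bool ndarray
print(a[[True, False, True, False, True]])  # [10 30 50]

# A mask shorter (or longer) than the axis raises IndexError
short_mask = np.array([True, False, True])
try:
    a[short_mask]
except IndexError as err:
    print("IndexError:", err)
```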
Filtering with Conditions
The most common pattern: create a mask from a condition, then apply it:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Elements greater than 5
mask = x > 5
print(mask) # => [False False False False False True True True True True]
print(x[mask]) # => [ 6 7 8 9 10]
# Shorthand: use the condition directly as the index
print(x[x > 5]) # => [ 6 7 8 9 10]
print(x[x % 2 == 0]) # => [ 2 4 6 8 10] (even numbers)
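A mask is also useful on its own. True counts as 1, so summing a mask counts the matching elements. The any() and all() methods answer yes/no questions about the mask. The filtered result is a new array, so changing it leaves the original untouched. Here is a short sketch on the same x:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = x > 5

# True counts as 1, so both of these count the matches
print(mask.sum())              # 5
print(np.count_nonzero(mask))  # 5

# Is there any match? Is every element a match?
print(mask.any(), mask.all())  # True False

# Boolean indexing returns a copy: editing it leaves x unchanged
big = x[mask]
big[0] = 100
print(x[5])  # 6
```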
Combining Conditions
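Use & (and), | (or), and ~ (not) instead of Python’s and/or/not keywords. The keywords need a single True or False and raise a ValueError on arrays with more than one element: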
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Between 3 and 7 (inclusive)
print(x[(x >= 3) & (x <= 7)]) # => [3 4 5 6 7]
# Less than 3 OR greater than 8
print(x[(x < 3) | (x > 8)]) # => [ 1 2 9 10]
# NOT greater than 5
print(x[~(x > 5)]) # => [1 2 3 4 5]
Important: Always wrap each condition in parentheses when combining them. The bitwise operators bind more tightly than comparisons, so x >= 3 & x <= 7 is parsed as x >= (3 & x) <= 7 and fails with a ValueError.
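To see why the parentheses matter, here is a sketch of the two common mistakes next to a working alternative. np.logical_and is the function form of &, so it avoids precedence questions altogether:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Missing parentheses: & binds tighter than >= and <=, so Python reads this as
# the chained comparison x >= (3 & x) <= 7, which needs bool() of an array
try:
    x[x >= 3 & x <= 7]
except ValueError as err:
    print("ValueError:", err)

# The `and` keyword has the same problem: it calls bool() on (x >= 3)
try:
    x[(x >= 3) and (x <= 7)]
except ValueError as err:
    print("ValueError:", err)

# np.logical_and is the function form of & and sidesteps precedence entirely
print(x[np.logical_and(x >= 3, x <= 7)])  # [3 4 5 6 7]
```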
2D Array Masking
Row/Column Selection with a 1D Mask
col_mask = np.array([True, True, True, False, False])
b = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
# Select columns where col_mask is True
filtered = b[:, col_mask]
print(filtered)
# => [[1 2 3]
#     [6 7 8]]
# Select rows with a row mask
row_mask = np.array([True, False])
print(b[row_mask])
# => [[1 2 3 4 5]]
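You may want to apply a row mask and a column mask at the same time. One option is np.ix_, which turns two 1D masks into an open mesh that selects the sub-block. In practice, the row mask is often computed from one column of the array itself. Here is a sketch using the same b:

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
row_mask = np.array([True, False])
col_mask = np.array([True, True, True, False, False])

# np.ix_ combines the two masks so rows AND columns are filtered together
print(b[np.ix_(row_mask, col_mask)])  # [[1 2 3]]

# Chaining two indexing steps gives the same block via an intermediate array
print(b[row_mask][:, col_mask])       # [[1 2 3]]

# Keep only the rows whose first column is greater than 3
print(b[b[:, 0] > 3])                 # [[ 6  7  8  9 10]]
```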
Condition-Based 2D Filtering
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
# Get all elements greater than 5 (returns 1D array)
print(matrix[matrix > 5]) # => [6 7 8 9]
# Replace elements greater than 5 with 0
result = matrix.copy()
result[result > 5] = 0
print(result)
# => [[1 2 3]
#     [4 5 0]
#     [0 0 0]]
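The right-hand side of a masked assignment does not have to be a scalar. It can be a 1D array with exactly one value per True entry, in row-major order. This makes it easy to transform only the selected elements. Compound operators such as -= work through the mask too. Here is a sketch on the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
mask = matrix > 5

# matrix[mask] yields mask.sum() values, so it fits back into the same slots
scaled = matrix.copy()
scaled[mask] = matrix[mask] * 10
print(scaled)
# [[ 1  2  3]
#  [ 4  5 60]
#  [70 80 90]]

# Compound assignment updates only the selected elements
shifted = matrix.copy()
shifted[mask] -= 5
print(shifted)
# [[1 2 3]
#  [4 5 1]
#  [2 3 4]]
```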
np.where: Conditional Selection
np.where(condition, x, y) returns x where condition is True, y where False:
x = np.array([1, -2, 3, -4, 5])
# Replace negatives with 0
result = np.where(x > 0, x, 0)
print(result) # => [1 0 3 0 5]
# Absolute value
abs_vals = np.where(x >= 0, x, -x)
print(abs_vals) # => [1 2 3 4 5]
# Classify values (0 would be labeled "negative" here, since 0 > 0 is False)
labels = np.where(x > 0, "positive", "negative")
print(labels) # => ['positive' 'negative' 'positive' 'negative' 'positive']
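In the np.where(condition, x, y) form, x and y are ordinary arrays computed before np.where runs. Every element of both branches is therefore evaluated, including the ones that are thrown away. Dividing by an array that contains zeros still triggers a divide-by-zero warning. A ufunc’s where= argument, shown here with np.divide, computes only the selected elements instead. This is a minimal sketch, and the array values are made up for illustration:

```python
import numpy as np

x = np.array([2.0, 0.0, 4.0])

# 1 / x is computed for all three elements before np.where chooses,
# so 1 / 0.0 happens anyway; errstate silences the resulting warning
with np.errstate(divide="ignore"):
    via_where = np.where(x != 0, 1 / x, 0.0)

# The ufunc form divides only where x != 0; other slots keep the value from out
via_ufunc = np.divide(1.0, x, out=np.zeros_like(x), where=x != 0)

print(via_where)
print(via_ufunc)
```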
np.where with Indices Only
Called with just a condition, np.where returns the indices where it’s True, as a tuple with one index array per dimension:
x = np.array([10, 20, 30, 40, 50])
indices = np.where(x > 25)
print(indices) # => (array([2, 3, 4]),)
print(x[indices]) # => [30 40 50]
# 2D indices
matrix = np.array([[1, 5, 3],
                   [8, 2, 7]])
rows, cols = np.where(matrix > 4)
print(rows, cols) # => [0 1 1] [1 0 2]
print(matrix[rows, cols]) # => [5 8 7]
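NumPy has a few related functions that return the same positions in different layouts. The single-argument np.where(cond) is equivalent to np.nonzero(cond). np.argwhere returns one coordinate row per match. np.flatnonzero returns positions in the flattened, row-major array. Here is a sketch on the same matrix:

```python
import numpy as np

matrix = np.array([[1, 5, 3],
                   [8, 2, 7]])
cond = matrix > 4

# Same tuple of index arrays as np.where(cond)
rows, cols = np.nonzero(cond)
print(rows, cols)  # [0 1 1] [1 0 2]

# One (row, col) pair per match, shape (n_matches, ndim)
print(np.argwhere(cond))
# [[0 1]
#  [1 0]
#  [1 2]]

# Positions in the flattened array
print(np.flatnonzero(cond))  # [1 3 5]
```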
np.select: Multiple Conditions
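For more than two cases, use np.select. Conditions are checked in order, and each element takes the choice of the first condition that is True. That is why the thresholds below run from highest to lowest: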
scores = np.array([45, 72, 88, 55, 95, 61])
conditions = [
scores >= 90,
scores >= 80,
scores >= 70,
scores >= 60,
]
choices = ['A', 'B', 'C', 'D']
grades = np.select(conditions, choices, default='F')
print(grades) # => ['F' 'C' 'B' 'F' 'A' 'D']
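The ordering rule is easy to get wrong, so the sketch below shows what happens when the loosest threshold comes first. For sorted numeric thresholds like these, np.digitize is an alternative that returns a bin index you can use to look up a label. Treat that as one possible swap-in, not a replacement for np.select in general:

```python
import numpy as np

scores = np.array([45, 72, 88, 55, 95, 61])

# Loosest threshold first: anyone >= 60 matches the first condition and gets a D
wrong_order = np.select(
    [scores >= 60, scores >= 70, scores >= 80, scores >= 90],
    ['D', 'C', 'B', 'A'],
    default='F',
)
print(wrong_order)  # ['F' 'D' 'D' 'F' 'D' 'D']

# np.digitize maps each score to the index of its bin (0 means below 60)
bins = [60, 70, 80, 90]
letters = np.array(['F', 'D', 'C', 'B', 'A'])
print(letters[np.digitize(scores, bins)])  # ['F' 'C' 'B' 'F' 'A' 'D']
```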
Masked Arrays (np.ma)
For arrays with invalid or missing data, use np.ma.MaskedArray:
data = np.array([1.0, 2.0, -999.0, 4.0, -999.0, 6.0])
# Mask invalid values
masked = np.ma.masked_where(data == -999.0, data)
print(masked) # => [1.0 2.0 -- 4.0 -- 6.0]
print(masked.mean()) # => 3.25 (ignores masked values)
print(masked.sum()) # => 13.0
# Mask NaN values
data_with_nan = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
masked_nan = np.ma.masked_invalid(data_with_nan)
print(masked_nan.mean()) # => 3.0
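A few companions to the masked-array example may be helpful. np.ma.masked_values masks a float sentinel such as -999.0 (with a small tolerance). compressed() and filled() convert back to plain arrays. For NaN-marked data, the nan-aware functions such as np.nanmean skip NaNs without building a masked array at all. Here is a sketch using the same sample data:

```python
import numpy as np

data = np.array([1.0, 2.0, -999.0, 4.0, -999.0, 6.0])

# Mask the sentinel value directly
masked = np.ma.masked_values(data, -999.0)
print(masked.compressed())  # [1. 2. 4. 6.]  (valid values only, plain ndarray)
print(masked.filled(0.0))   # [1. 2. 0. 4. 0. 6.]  (masked slots replaced)

# nan* functions ignore NaN entries
data_with_nan = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
print(np.nanmean(data_with_nan))  # 3.0
print(np.nansum(data_with_nan))   # 9.0
```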
Practical Examples
Filter Rows of a 2D Array
# Student scores: [student_id, math, science, english]
students = np.array([
    [1, 85, 90, 78],
    [2, 60, 55, 70],
    [3, 92, 88, 95],
    [4, 45, 50, 60],
    [5, 78, 82, 80],
])
# Students who passed all subjects (score >= 70)
passed_all = np.all(students[:, 1:] >= 70, axis=1)
print(students[passed_all])
# => [[ 1 85 90 78]
#     [ 3 92 88 95]
#     [ 5 78 82 80]]
# Students who failed at least one subject
failed_any = np.any(students[:, 1:] < 70, axis=1)
print(students[failed_any, 0]) # => [2 4] (student IDs)
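The same 2D mask also supports per-subject summaries. Summing it down axis 0 counts passing students per subject. Combining a row mask with a column slice restricts statistics to the selected students. Here is a sketch on the same table; the 70-point pass mark is the one used above:

```python
import numpy as np

# Same layout as above: [student_id, math, science, english]
students = np.array([
    [1, 85, 90, 78],
    [2, 60, 55, 70],
    [3, 92, 88, 95],
    [4, 45, 50, 60],
    [5, 78, 82, 80],
])
scores = students[:, 1:]
passing = scores >= 70  # one entry per student and subject

# Number of passing students in each subject
print(passing.sum(axis=0))  # [3 3 4]

# Per-subject averages among students who passed every subject
passed_all = passing.all(axis=1)
print(scores[passed_all].mean(axis=0))

# IDs of students who failed English (third score column)
print(students[scores[:, 2] < 70, 0])  # [4]
```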
Handle Outliers (Clip or Mask)
data = np.array([1, 2, 100, 3, 4, -50, 5])
# Clip values to [0, 10] range
clipped = np.clip(data, 0, 10)
print(clipped) # => [ 1 2 10 3 4 0 5]
# Or use masking to replace outliers with NaN
result = data.astype(float)
result[(data < 0) | (data > 10)] = np.nan
print(result) # => [ 1. 2. nan 3. 4. nan 5.]
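A third option is to keep the integer dtype and replace outliers with a typical in-range value. The sketch below uses the median of the in-range values. The [0, 10] range comes from the clip example, and using the median is only one reasonable choice:

```python
import numpy as np

data = np.array([1, 2, 100, 3, 4, -50, 5])

# Assumed valid range, matching the clip example
inlier = (data >= 0) & (data <= 10)

# Replace out-of-range values with the median of the in-range values
cleaned = data.copy()
cleaned[~inlier] = np.median(data[inlier])
print(cleaned)  # [1 2 3 3 4 3 5]
```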
Image Processing (Threshold)
# Simulate a grayscale image (0-255); seeding the generator makes runs reproducible
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4)).astype(np.uint8)
print(image)
# Binary threshold: pixels > 128 become 255, others become 0
binary = np.where(image > 128, 255, 0).astype(np.uint8)
print(binary)
# Mask dark pixels
dark_mask = image < 50
image[dark_mask] = 0  # set dark pixels to black (modifies image in place)
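Color images show how a mask that matches only the leading dimensions behaves. With an image of shape (height, width, 3) and a 2D mask of shape (height, width), each True selects a whole pixel. You get one RGB row per match, and you can assign a color to all of them at once. The tiny image below is hypothetical. Brightness is approximated here as the plain mean of the three channels, which is a simplifying assumption rather than a perceptual luminance formula:

```python
import numpy as np

# Hypothetical 2x2 RGB image, shape (height, width, channels)
rgb = np.array([[[10, 10, 10], [200, 200, 200]],
                [[250, 240, 230], [30, 40, 50]]], dtype=np.uint8)

# Simplified brightness: mean of the three channels, shape (2, 2)
brightness = rgb.mean(axis=2)
bright = brightness > 128

# A 2D mask on a 3D array selects whole pixels: shape (n_selected, 3)
print(rgb[bright].shape)  # (2, 3)

# Paint the selected pixels red; [255, 0, 0] broadcasts to every selected pixel
highlighted = rgb.copy()
highlighted[bright] = [255, 0, 0]
print(highlighted)
```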
Performance Tips
arr = np.array([-5, 3, 150, 42, -1, 99])
# Boolean indexing returns a copy, so use it for selection
selected = arr[arr > 0]  # new array; arr is unchanged
# Masked assignment writes into arr itself; only the temporary mask is allocated
arr[arr < 0] = 0  # modifies arr in place
# np.where always builds a new array and leaves arr untouched
result = np.where(arr > 0, arr, 0)
# For this particular replacement, a single ufunc does the same job
result = np.maximum(arr, 0)
# Naming a mask does not make it slower: both versions build the same temporaries
mask = (arr > 0) & (arr < 100)
result = arr[mask]
result = arr[(arr > 0) & (arr < 100)]  # same work as the two lines above
# A named mask pays off when you reuse it instead of recomputing the condition
n_in_range = mask.sum()
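Relative speed depends on array size, dtype, and hardware, so it is worth measuring on your own data. Here is a small timeit sketch comparing three ways to zero out non-positive values. The array size and repeat count are arbitrary choices, and no particular ranking is implied:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(-100, 100, size=1_000_000)


def with_where(a):
    # Allocates a new array; a is untouched
    return np.where(a > 0, a, 0)


def with_mask_assignment(a):
    # Copy first so the caller's array is not modified, then assign through a mask
    out = a.copy()
    out[out <= 0] = 0
    return out


def with_maximum(a):
    # Elementwise max against 0 gives the same result in one ufunc call
    return np.maximum(a, 0)


candidates = (with_where, with_mask_assignment, with_maximum)

# Sanity check: all three approaches agree
reference = with_where(arr)
for fn in candidates:
    assert np.array_equal(fn(arr), reference)

# Time each approach; the lambda is called immediately inside the loop
for fn in candidates:
    seconds = timeit.timeit(lambda: fn(arr), number=20)
    print(f"{fn.__name__:>22}: {seconds:.4f} s for 20 calls")
```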
Summary
| Method | Use Case |
|---|---|
| arr[condition] | Filter elements by condition |
| arr[bool_array] | Filter by pre-computed boolean mask |
| np.where(cond, x, y) | Conditional element selection |
| np.where(cond) | Get indices of True elements |
| np.select(conds, choices) | Multiple condition branching |
| np.ma.masked_where | Mask invalid/missing values |
| np.clip(arr, min, max) | Clamp values to a range |