NumPy is a powerful Python library for numerical computing, providing efficient array operations, linear algebra, and random number generation. It forms the foundation for most scientific computing in Python and is essential for machine learning, data science, and engineering applications. Below are practical examples for common tasks.
Introduction to NumPy
NumPy, short for Numerical Python, provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays. The library is the backbone of the Python scientific computing ecosystem: pandas and scikit-learn are built on NumPy arrays, and deep learning frameworks such as TensorFlow and PyTorch convert to and from them readily.
The key advantage of NumPy over native Python lists is performance. A NumPy array stores elements of a single data type in one contiguous block of memory, so vectorized operations run as compiled loops over raw values instead of handling Python objects one at a time. For large arrays this can be orders of magnitude faster than an equivalent Python loop, which makes NumPy essential for handling large datasets and performing intensive computations.
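As a rough sketch of that difference, the snippet below computes the same sum of squares with a pure-Python loop and with a vectorized NumPy expression. The variable names are illustrative, and the measured times depend on hardware and NumPy version, so no particular speedup is claimed here.

```python
import time

import numpy as np

# One million float values to square and sum
values = np.arange(1_000_000, dtype=np.float64)

# Pure-Python loop: each element is handled as a separate Python float
start = time.perf_counter()
loop_total = 0.0
for v in values.tolist():
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorized version: the loop runs inside NumPy's compiled code
start = time.perf_counter()
vector_total = float(np.sum(values * values))
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```

Both approaches produce the same total (up to floating-point rounding); only the place where the loop runs differs.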
NumPy provides functionality spanning linear algebra, Fourier transforms, random number generation, and more. Understanding NumPy’s capabilities and idioms is fundamental for anyone working with numerical data in Python.
Basic Array Creation
NumPy provides numerous ways to create arrays for different use cases. Understanding these creation methods forms the foundation for effective array manipulation.
import numpy as np
# Create a 4x3 array of zeros
zeros_array = np.zeros([4, 3])
print(zeros_array)
# Output:
# [[0. 0. 0.]
# [0. 0. 0.]
# [0. 0. 0.]
# [0. 0. 0.]]
# Create a 3x4 array of ones
ones_array = np.ones([3, 4])
print(ones_array)
# Create an uninitialized array (may contain arbitrary values)
empty_array = np.empty([2, 2])
# Create a 5x5 array filled with a specific value
full_array = np.full([5, 5], 7.0)
# Create a 1D array with range of values
range_array = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Create evenly spaced values
linspace_array = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
# Create a 3x2 array of random numbers (uniform [0, 1))
random_array = np.random.random([3, 2])
# Create identity matrix
identity = np.eye(4)
# Create diagonal matrix
diagonal = np.diag([1, 2, 3, 4])
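The creation functions above default to 64-bit floats. A short sketch of controlling the element type follows; `dtype`, `astype`, and `zeros_like` are standard NumPy, while the variable names are only for illustration.

```python
import numpy as np

# Most creation functions accept a dtype argument (default is float64)
int_zeros = np.zeros((2, 3), dtype=np.int32)
print(int_zeros.dtype)  # int32

# np.array infers the dtype from its input; one float makes the whole array float
ints = np.array([1, 2, 3])
floats = np.array([1.0, 2, 3])
print(floats.dtype)  # float64

# astype converts to another dtype and returns a new array
as_float = ints.astype(np.float64)

# *_like functions reuse the shape and dtype of an existing array
like = np.zeros_like(int_zeros)
print(like.shape, like.dtype)  # (2, 3) int32
```

The default integer width for `np.array([1, 2, 3])` is platform-dependent, which is why the sketch does not print it.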
Array Indexing and Slicing
NumPy offers powerful indexing and slicing capabilities that allow you to access and modify specific elements, rows, columns, or subarrays efficiently.
# Create a sample array
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])
# Access single element
element = arr[0, 0] # 1
# Access entire first row
first_row = arr[0, :] # [1, 2, 3, 4]
# Access entire first column
first_col = arr[:, 0] # [1, 5, 9]
# Slice: rows 0-1, columns 1-2
subarray = arr[0:2, 1:3]
# [[2, 3]
# [6, 7]]
# Negative indexing
last_element = arr[-1, -1] # 12
# Boolean indexing - select elements where condition is true
data = np.array([1, 2, 3, 4, 5, 6])
mask = data > 3
selected = data[mask] # [4, 5, 6]
# Or more concisely
selected = data[data > 3]
# Fancy indexing - use arrays as indices
indices = [0, 2, 4]
selected = data[indices] # [1, 3, 5]
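One detail worth knowing when modifying results: basic slices are views that share memory with the original array, while boolean and fancy indexing return copies. The sketch below demonstrates this with the same sample array; `np.shares_memory` is used only to confirm the sharing.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Basic slicing returns a view: writing to it changes arr as well
view = arr[0:2, 1:3]
view[0, 0] = 99
print(arr[0, 1])  # 99

# .copy() gives an independent array
independent = arr[0:2, 1:3].copy()
independent[0, 0] = -1
print(arr[0, 1])  # still 99

# Boolean (and fancy) indexing returns a copy, so arr is untouched
picked = arr[arr > 10]
picked[:] = 0
print(arr[2, 3])  # 12

# Assigning through a mask, however, modifies arr in place
arr[arr > 10] = 0
print(arr[2].tolist())  # [9, 10, 0, 0]
```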
Matrix Properties and Operations
Understanding matrix properties is essential for linear algebra operations and data manipulation. NumPy provides intuitive methods for accessing and manipulating matrix characteristics.
# 1D array
x = np.array([3, 5])
print(x.shape) # (2,)
# 2D array
y = np.array([[3, 5, 2], [2, 4, 2]])
print(y.shape) # (2, 3)
# Number of dimensions
print(y.ndim) # 2
# Total number of elements
print(y.size) # 6
# Transpose
transposed = y.transpose()
# [[3, 2]
# [5, 4]
# [2, 2]]
# Or use shorthand
transposed = y.T
# Reshape array
reshaped = y.reshape(3, 2)
# Flatten to 1D
flattened = y.flatten()
# Add a number to all elements
z = y + 3
# [[6, 8, 5]
# [5, 7, 5]]
# Element-wise multiplication
result = z * y
# [[18, 40, 10]
# [10, 28, 10]]
# Vector-matrix multiplication: x (shape (2,)) acts as a row vector
product = np.matmul(x, z) # [43, 59, 40]
# Equivalent to: x @ z
# Dot product
dot_product = np.dot(x, np.array([1, 1])) # 8
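# Cross product (defined for 3-element vectors; 2-element inputs are deprecated in NumPy 2.0)
cross_product = np.cross(np.array([3, 5, 0]), np.array([1, 0, 0])) # [0, 0, -5]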
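Because `*` and `@` look similar but follow different shape rules, a quick sketch contrasting them may help. It reuses the 2x3 array `y` from above; the names `gram`, `column`, and `matmul_error` are illustrative.

```python
import numpy as np

y = np.array([[3, 5, 2], [2, 4, 2]])  # shape (2, 3)

# * is element-wise, so the shapes must match (or broadcast)
elementwise = y * y  # shape (2, 3)

# @ needs the inner dimensions to agree: (2, 3) @ (3, 2) -> (2, 2)
gram = y @ y.T
print(gram.tolist())  # [[38, 30], [30, 24]]

# (2, 3) @ (2, 3) fails because the inner dimensions 3 and 2 differ
matmul_error = None
try:
    y @ y
except ValueError as err:
    matmul_error = err
    print("matmul failed:", err)

# reshape infers one dimension when it is given as -1
column = y.reshape(-1, 1)
print(column.shape)  # (6, 1)
```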
Basic Math Functions
NumPy provides comprehensive mathematical functions that operate element-wise on arrays, enabling efficient computation across entire datasets.
x = np.array([3, 5])
# Exponential functions
np.exp(x) # [20.08553692, 148.4131591]
np.exp2(x) # [8., 32.]
np.expm1(x) # [19.08553692, 147.4131591] - exp(x) - 1
# Logarithmic functions
np.log(x) # Natural log
np.log2(x) # Log base 2
np.log10(x) # Log base 10
# Trigonometric functions
np.sin(x) # Sine
np.cos(x) # Cosine
np.tan(x) # Tangent
# Inverse trigonometric
np.arcsin(np.array([0.5, -0.5]))
np.arccos(np.array([0.5, -0.5]))
np.arctan(x)
# Hyperbolic functions
np.sinh(x) # Hyperbolic sine
np.cosh(x) # Hyperbolic cosine
np.tanh(x) # Hyperbolic tangent
# Power and roots
np.power(x, 2) # [9, 25]
np.sqrt(x) # [1.73205081, 2.23606798]
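Functions such as `np.expm1` exist for numerical accuracy rather than convenience. The sketch below, which assumes a tiny input of 1e-10 chosen for illustration, compares `np.exp(x) - 1` with `np.expm1(x)` against the Taylor approximation x + x²/2.

```python
import numpy as np

small = 1e-10

# exp(small) is extremely close to 1, so subtracting 1 cancels most digits
naive = np.exp(small) - 1
accurate = np.expm1(small)

# Reference value from the Taylor series exp(x) - 1 = x + x**2/2 + ...
reference = small + small**2 / 2

print(abs(naive - reference) / reference)  # comparatively large relative error
print(abs(accurate - reference) / reference)  # close to machine precision

# np.log1p(x) plays the same role for log(1 + x)
print(np.log1p(small))
```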
Aggregation Functions
Aggregation functions compute summary statistics across arrays, essential for data analysis and exploration.
x = np.array([-10, 3, 5, 9, 21, 8])
# Maximum and minimum
np.max(x) # 21
np.min(x) # -10
x.max() # Alternative method syntax
x.min()
# Sum and product
np.sum(x) # 36
np.prod(x) # -226800 (negative because one factor, -10, is negative)
# Mean and median
x.mean() # 6.0
np.median(x) # 6.5 (average of the two middle values, 5 and 8)
# Standard deviation and variance (population statistics, ddof=0 by default)
np.std(x) # 9.165
np.var(x) # 84.0
# Percentiles (linear interpolation by default)
np.percentile(x, 25) # 3.5
np.percentile(x, 50) # 6.5
np.percentile(x, 75) # 8.75
# Euclidean norm (L2 norm)
np.linalg.norm(x) # 26.833
# Sum along axis
matrix = np.array([[1, 2], [3, 4]])
np.sum(matrix, axis=0) # [4, 6] - sum down each column
np.sum(matrix, axis=1) # [3, 7] - sum across each row
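Two related details are easy to miss: `np.var` and `np.std` divide by N unless `ddof` is set, and `argmax`/`argmin` return positions rather than values. A brief sketch using the same data:

```python
import numpy as np

x = np.array([-10, 3, 5, 9, 21, 8])

# Default ddof=0 divides by N (population variance)
population_var = np.var(x)  # 84.0
# ddof=1 divides by N - 1 (sample variance)
sample_var = np.var(x, ddof=1)  # 100.8
sample_std = np.std(x, ddof=1)

# argmax/argmin return the index of the extreme value
print(np.argmax(x))  # 4
print(np.argmin(x))  # 0

# The axis argument works the same way for other aggregations
matrix = np.array([[1, 2], [3, 4]])
print(matrix.mean(axis=0))  # [2. 3.]
print(matrix.max(axis=1))  # [2 4]
```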
Random Arrays and Numbers
Random number generation is crucial for simulations, testing, and machine learning. NumPy’s random module provides extensive capabilities.
# Random floats in [0, 1)
np.random.random([2, 1]) # 2x1 array
np.random.random([5, 6]) # 5x6 array
np.random.random() # Single random number
# Random integers
np.random.randint(0, 10) # Single int in [0, 10)
np.random.randint(0, 10, (3, 3)) # 3x3 matrix
# Random floats from normal distribution
np.random.randn(3, 3) # Mean=0, Std=1
# Random floats from custom normal distribution
np.random.normal(loc=0, scale=1, size=(2, 2))
# Random sample of elements from a list (with replacement by default)
np.random.choice([1, 2, 3, 4, 5], size=3)
# Random shuffle
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr) # In-place shuffle
# Set random seed for reproducibility
np.random.seed(42)
random_data = np.random.random(5)
# Generate random samples from various distributions
np.random.uniform(0, 1, 100) # Uniform
np.random.normal(0, 1, 100) # Normal/Gaussian
np.random.exponential(1, 100) # Exponential
np.random.poisson(5, 100) # Poisson
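The functions above belong to NumPy's legacy global random state. For new code, the NumPy documentation recommends creating a `Generator` with `np.random.default_rng`, which keeps its state in an explicit object. A sketch of equivalent calls, with illustrative seeds and variable names:

```python
import numpy as np

# A Generator carries its own state instead of using a global seed
rng = np.random.default_rng(42)

floats = rng.random((3, 2))  # uniform floats in [0, 1)
ints = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)
normals = rng.normal(loc=0, scale=1, size=(2, 2))
sample = rng.choice([1, 2, 3, 4, 5], size=3, replace=False)  # no repeats

arr = np.array([1, 2, 3, 4, 5])
rng.shuffle(arr)  # in-place shuffle

# Generators created with the same seed produce the same stream
a = np.random.default_rng(7).random(5)
b = np.random.default_rng(7).random(5)
print(np.array_equal(a, b))  # True
```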
Broadcasting
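Broadcasting enables NumPy to perform operations on arrays with different shapes. Shapes are compared starting from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1, and size-1 or missing dimensions are stretched to match without copying data.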
# Simple broadcasting
a = np.array([1, 2, 3])
b = 2
result = a * b # [2, 4, 6] - b is broadcast to match a's shape
# Broadcasting a column against a row
a = np.array([[1], [2], [3]]) # shape (3, 1)
b = np.array([10, 20, 30]) # shape (3,)
result = a + b
# [[11, 21, 31]
# [12, 22, 32]
# [13, 23, 33]]
# Practical example: center each row by subtracting its mean
data = np.random.random((4, 3))
mean = data.mean(axis=1, keepdims=True) # shape (4, 1), broadcasts across columns
normalized = data - mean
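To check the compatibility rule without building arrays, `np.broadcast_shapes` (available since NumPy 1.20) returns the resulting shape or raises an error. The sketch below also shows an incompatible pair and a column-wise standardization; the random data and names are illustrative.

```python
import numpy as np

# Resulting shapes under the broadcasting rules
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3)
print(np.broadcast_shapes((4, 3), (3,)))  # (4, 3)

# Trailing dimensions 3 and 4 differ and neither is 1, so this fails
broadcast_error = None
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    broadcast_error = err
    print("cannot broadcast:", err)

# Column statistics have shape (3,) and broadcast across the 4 rows
rng = np.random.default_rng(0)
data = rng.random((4, 3))
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)
standardized = (data - col_mean) / col_std
```

Reducing along `axis=0` needs no `keepdims`, because the leftover shape `(3,)` already lines up with the trailing dimension; reducing along `axis=1` does, which is why the row-centering example above uses it.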
Linear Algebra Operations
NumPy’s linear algebra module provides essential operations for scientific computing, including matrix decompositions and solving linear systems.
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.matmul(A, B) # or A @ B
# Matrix inverse
A_inv = np.linalg.inv(A)
# Determinant
det = np.linalg.det(A)
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
# Solve linear system Ax = b
A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])
x = np.linalg.solve(A, b)
# QR decomposition
Q, R = np.linalg.qr(A)
# Singular value decomposition
U, S, Vt = np.linalg.svd(A)
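A simple way to sanity-check these routines is to multiply the results back together and compare with `np.allclose`. The following sketch does this for the same 2x2 matrix; it only verifies the defining identities of each decomposition.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

# solve avoids forming inv(A) explicitly
x = np.linalg.solve(A, b)  # approximately [-4.0, 4.5]
print(np.allclose(A @ x, b))  # True

# SVD: A = U @ diag(S) @ Vt, singular values in descending order
U, S, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(S) @ Vt, A))  # True

# QR: Q has orthonormal columns, R is upper triangular
Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))  # True

# Eigenpairs: column j of eigenvectors satisfies A v = eigenvalues[j] * v
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))  # True
```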
Working with Missing Data
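NumPy numeric arrays have no dedicated missing-value marker; by convention, the floating-point value NaN (np.nan, "not a number") represents missing or invalid data, so arrays containing it must have a floating-point dtype.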
# Create array with NaN values
arr = np.array([1, 2, np.nan, 4, 5])
# Check for NaN
np.isnan(arr) # [False, False, True, False, False]
# Remove NaN values
clean = arr[~np.isnan(arr)]
# Replace NaN with value
filled = np.nan_to_num(arr, nan=0)
# Use nan-safe functions
np.nanmean(arr) # Mean ignoring NaN
np.nanstd(arr) # Std ignoring NaN
np.nansum(arr) # Sum ignoring NaN
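The nan-safe functions matter because NaN propagates through ordinary arithmetic. This sketch shows that behavior, why `np.isnan` is needed instead of `==`, and what happens with an integer array (the name `nan_error` is illustrative):

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])
print(arr.dtype)  # float64

# Ordinary aggregations propagate NaN
print(np.mean(arr))  # nan
print(np.nanmean(arr))  # 3.0 (mean of 1, 2, 4, 5)
print(np.nansum(arr))  # 12.0

# NaN never compares equal, even to itself
print(np.nan == np.nan)  # False
print(np.count_nonzero(np.isnan(arr)))  # 1

# Integer arrays cannot hold NaN, so this assignment raises ValueError
ints = np.array([1, 2, 3])
nan_error = None
try:
    ints[0] = np.nan
except ValueError as err:
    nan_error = err
    print("cannot store NaN in an integer array:", err)
```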
Conclusion
NumPy is the foundation of numerical computing in Python, providing efficient array operations that power scientific applications. The examples in this guide cover essential operations for data manipulation, mathematical computation, and linear algebra.
Mastering NumPy’s array operations enables efficient data processing and provides the foundation for more advanced libraries like pandas, scikit-learn, and deep learning frameworks. Practice these operations to build fluency in numerical Python programming.