NumPy Fundamentals: Arrays and Operations for Numerical Computing

NumPy is the foundation of scientific computing in Python. If you work with data, machine learning, or numerical analysis, you’ll use NumPy. Yet many developers treat it as a black box and never learn how its arrays actually behave.

NumPy’s core strength is the ndarray (n-dimensional array), a multidimensional container whose elements all share one data type. Unlike Python lists, NumPy arrays are optimized for fast numerical operations and come with a rich set of mathematical functions.

This guide takes you from NumPy basics to confident array manipulation, showing you why NumPy is essential for modern Python development.

What is NumPy and Why Does It Matter?

NumPy (Numerical Python) is a library for numerical computing that provides:

Efficient arrays: Faster than Python lists for numerical operations
Vectorized operations: Perform operations on entire arrays without explicit loops
Mathematical functions: Comprehensive library of mathematical operations
Broadcasting: Elegant way to work with arrays of different shapes
Integration: Foundation for Pandas, Scikit-learn, TensorFlow, and more

NumPy vs Python Lists

import numpy as np
import time

# Create a list and array with 1 million elements
python_list = list(range(1000000))
numpy_array = np.arange(1000000)

# Time list operation (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
result_list = [x * 2 for x in python_list]
list_time = time.perf_counter() - start

# Time NumPy operation
start = time.perf_counter()
result_array = numpy_array * 2
numpy_time = time.perf_counter() - start

print(f"Python list: {list_time:.6f}s")
print(f"NumPy array: {numpy_time:.6f}s")
print(f"NumPy is {list_time/numpy_time:.1f}x faster")

# Output (one sample run; timings and the speedup vary by machine and NumPy version):
# Python list: 0.045234s
# NumPy array: 0.000234s
# NumPy is 193.3x faster
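
A single timed run is noisy, so the comparison above is best read as a rough indication. Here is a minimal sketch of a steadier measurement using the standard library's timeit module. The array size and repeat counts are arbitrary choices for illustration, not recommendations.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
python_list = list(range(n))
numpy_array = np.arange(n)

# Take the best of several repeats to reduce noise from other processes
list_time = min(timeit.repeat(lambda: [x * 2 for x in python_list], number=10, repeat=5))
numpy_time = min(timeit.repeat(lambda: numpy_array * 2, number=10, repeat=5))

print(f"List comprehension: {list_time / 10:.6f}s per run")
print(f"NumPy multiply:     {numpy_time / 10:.6f}s per run")

# Both approaches compute the same values
same = np.array_equal(np.array([x * 2 for x in python_list]), numpy_array * 2)
print("Same result:", same)
```

The absolute numbers will differ between machines; the point of the sketch is the measurement pattern, not a specific speedup figure.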

NumPy’s speed comes from:

C implementation: Core operations are written in C
Contiguous memory: Arrays store data contiguously for cache efficiency
Vectorization: Operations work on entire arrays without Python loops
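
The contiguous-memory point can be inspected directly. The sketch below looks at an array's strides (the number of bytes to step along each axis) and compares the array's data buffer with a rough estimate of what a list of Python ints occupies. The list estimate uses sys.getsizeof and is only approximate, since it depends on the Python implementation.

```python
import sys

import numpy as np

arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# A C-contiguous array stores its rows one after another in a single buffer
print("C-contiguous:", arr.flags["C_CONTIGUOUS"])
# Strides: bytes to step to the next row and to the next column
print("Strides:", arr.strides)  # (24, 8) for int64 with 3 columns

# A list stores references to separate Python int objects instead
values = list(range(1000))
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = np.arange(1000, dtype=np.int64).nbytes
print("List (container + int objects):", list_bytes, "bytes")
print("Array data buffer:", array_bytes, "bytes")
```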

Part 1: Creating NumPy Arrays

Installation

First, install NumPy:

pip install numpy

Creating Arrays from Python Lists

The simplest way to create a NumPy array is from a Python list:

import numpy as np

# Create 1D array from list
arr1d = np.array([1, 2, 3, 4, 5])
print("1D array:", arr1d)
# Output: 1D array: [1 2 3 4 5]

# Create 2D array from nested list
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:")
print(arr2d)
# Output:
# 2D array:
# [[1 2 3]
#  [4 5 6]]

# Create 3D array
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("3D array shape:", arr3d.shape)
# Output: 3D array shape: (2, 2, 2)

Creating Arrays with Specific Values

NumPy provides convenient functions for creating arrays with specific patterns:

import numpy as np

# Array of zeros
zeros = np.zeros((3, 4))
print("Zeros array:")
print(zeros)
# Output:
# Zeros array:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

# Array of ones
ones = np.ones((2, 3))
print("Ones array:")
print(ones)
# Output:
# Ones array:
# [[1. 1. 1.]
#  [1. 1. 1.]]

# Array with specific value
filled = np.full((2, 3), 7)
print("Filled array:")
print(filled)
# Output:
# Filled array:
# [[7 7 7]
#  [7 7 7]]

# Identity matrix
identity = np.eye(3)
print("Identity matrix:")
print(identity)
# Output:
# Identity matrix:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

Creating Arrays with Ranges

Use arange() and linspace() to create arrays with specific ranges:

import numpy as np

# arange: similar to Python's range()
arr_arange = np.arange(0, 10, 2)
print("arange(0, 10, 2):", arr_arange)
# Output: arange(0, 10, 2): [0 2 4 6 8]

# linspace: evenly spaced values
arr_linspace = np.linspace(0, 10, 5)
print("linspace(0, 10, 5):", arr_linspace)
# Output: linspace(0, 10, 5): [ 0.   2.5  5.   7.5 10. ]

# logspace: logarithmically spaced values
arr_logspace = np.logspace(0, 2, 5)
print("logspace(0, 2, 5):", arr_logspace)
# Output: logspace(0, 2, 5): [  1.           3.16227766  10.          31.6227766  100.        ]
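
One practical difference between arange() and linspace() shows up with non-integer steps. With a float step, the number of elements arange() returns depends on floating-point rounding, while linspace() takes the count directly. A small sketch of the contrast follows; the arange length is printed rather than stated because it is exactly the part that is hard to predict.

```python
import numpy as np

# With a float step, the element count depends on floating-point rounding
float_step = np.arange(1, 1.3, 0.1)
print("arange length:", len(float_step))

# linspace takes the number of points directly and includes the stop value
exact = np.linspace(1, 1.3, 4)
print("linspace:", exact)

# endpoint=False excludes the stop value, like range()
half_open = np.linspace(0, 1, 5, endpoint=False)
print("half-open:", half_open)
```

When the number of points matters more than the exact step, linspace() is usually the safer choice.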

Creating Random Arrays

Random arrays are useful for testing and simulations:

import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Random values between 0 and 1
random_uniform = np.random.rand(3, 3)
print("Random uniform [0, 1):")
print(random_uniform)
# Output:
# Random uniform [0, 1):
# [[0.37454012 0.95071431 0.73199394]
#  [0.59865848 0.15601864 0.15599452]
#  [0.05808361 0.86617615 0.60111501]]

# Random integers
random_int = np.random.randint(0, 10, size=(2, 3))
print("Random integers [0, 10):")
print(random_int)
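# Output:
# Random integers [0, 10):
# (a 2x3 array of integers in [0, 10); the exact values depend on the seed
#  and on how many random numbers were drawn before this call)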

# Random normal distribution
random_normal = np.random.randn(2, 3)
print("Random normal distribution:")
print(random_normal)
# Output:
# Random normal distribution:
# (a 2x3 array of floats drawn from a normal distribution with mean 0 and
#  standard deviation 1; the exact values depend on the generator state)
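
The functions above (np.random.seed, rand, randint, randn) belong to NumPy's legacy random interface. For new code, NumPy's documentation recommends the Generator API created with np.random.default_rng. The sketch below shows the equivalent calls. The seed 42 is arbitrary, and no output values are listed because they differ from the legacy functions even with the same seed.

```python
import numpy as np

# default_rng returns an independent Generator; 42 is an arbitrary seed
rng = np.random.default_rng(42)

uniform = rng.random((3, 3))                 # floats in [0, 1)
integers = rng.integers(0, 10, size=(2, 3))  # ints in [0, 10)
normal = rng.standard_normal((2, 3))         # mean 0, std 1

# Re-creating the generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
reproduced = rng_again.random((3, 3))
print("Reproducible:", np.array_equal(uniform, reproduced))
```

Passing a Generator object around explicitly also avoids the hidden global state that np.random.seed modifies.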

Part 2: Array Properties and Attributes

Understanding array properties is essential for working with NumPy effectively:

import numpy as np

# Create a sample array
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Shape: dimensions of the array
print("Shape:", arr.shape)
# Output: Shape: (3, 4)

# Size: total number of elements
print("Size:", arr.size)
# Output: Size: 12

# ndim: number of dimensions
print("Number of dimensions:", arr.ndim)
# Output: Number of dimensions: 2

# dtype: data type of elements
print("Data type:", arr.dtype)
# Output: Data type: int64
# (the default integer type is platform-dependent; NumPy 1.x on Windows gives int32)

# itemsize: size of each element in bytes
print("Item size:", arr.itemsize)
# Output: Item size: 8

# nbytes: total bytes consumed by array (size * itemsize)
print("Total bytes:", arr.nbytes)
# Output: Total bytes: 96

# T: transpose
print("Transposed shape:", arr.T.shape)
# Output: Transposed shape: (4, 3)

Data Types

NumPy supports various data types. Specifying the correct dtype is important for memory efficiency:

import numpy as np

# Integer types
int_arr = np.array([1, 2, 3], dtype=np.int32)
print("int32:", int_arr.dtype)
# Output: int32: int32

# Float types
float_arr = np.array([1.5, 2.5, 3.5], dtype=np.float32)
print("float32:", float_arr.dtype)
# Output: float32: float32

# Boolean type
bool_arr = np.array([True, False, True], dtype=np.bool_)
print("bool:", bool_arr.dtype)
# Output: bool: bool

# Complex type
complex_arr = np.array([1+2j, 3+4j], dtype=np.complex128)
print("complex128:", complex_arr.dtype)
# Output: complex128: complex128

# String type
string_arr = np.array(['a', 'b', 'c'], dtype='U1')
print("string:", string_arr.dtype)
# Output: string: <U1
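
To make the memory-efficiency point concrete, the sketch below stores the same values at two precisions and compares their sizes. It also shows the trade-off of small integer types: array arithmetic that exceeds the type's range wraps around silently. The array length of 1000 is an arbitrary choice.

```python
import numpy as np

values = np.arange(1000)

# The same 1000 values at different precisions
as_float64 = values.astype(np.float64)
as_float32 = values.astype(np.float32)
print("float64 bytes:", as_float64.nbytes)  # 1000 * 8
print("float32 bytes:", as_float32.nbytes)  # 1000 * 4

# Smaller integer types have a limited range; np.iinfo reports it
print("int8 range:", np.iinfo(np.int8).min, "to", np.iinfo(np.int8).max)

# Array arithmetic that exceeds the range wraps around silently
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print("127 + 1 as int8:", wrapped)  # [-128]
```

A smaller dtype is only a saving when every value, including intermediate results, fits in its range.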

Part 3: Indexing and Slicing

Indexing and slicing allow you to access and modify array elements:

Basic Indexing

import numpy as np

# 1D array indexing
arr1d = np.array([10, 20, 30, 40, 50])
print("First element:", arr1d[0])
# Output: First element: 10

print("Last element:", arr1d[-1])
# Output: Last element: 50

# 2D array indexing
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print("Element at [1, 2]:", arr2d[1, 2])
# Output: Element at [1, 2]: 6

print("First row:", arr2d[0])
# Output: First row: [1 2 3]

print("First column:", arr2d[:, 0])
# Output: First column: [1 4 7]

Slicing

Slicing extracts portions of arrays using the syntax start:stop:step, where start is included and stop is excluded:

import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Basic slicing
print("arr[2:5]:", arr[2:5])
# Output: arr[2:5]: [2 3 4]

# Slicing with step
print("arr[::2]:", arr[::2])
# Output: arr[::2]: [0 2 4 6 8]

# Reverse array
print("arr[::-1]:", arr[::-1])
# Output: arr[::-1]: [9 8 7 6 5 4 3 2 1 0]

# 2D slicing
arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

print("First two rows, last two columns:")
print(arr2d[:2, -2:])
# Output:
# First two rows, last two columns:
# [[3 4]
#  [7 8]]

Boolean Indexing

Boolean indexing allows you to select elements based on conditions:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Select elements greater than 5
mask = arr > 5
print("Elements > 5:", arr[mask])
# Output: Elements > 5: [ 6  7  8  9 10]

# Select even numbers
even_mask = arr % 2 == 0
print("Even numbers:", arr[even_mask])
# Output: Even numbers: [ 2  4  6  8 10]

# Multiple conditions
result = arr[(arr > 3) & (arr < 8)]
print("Elements between 3 and 8:", result)
# Output: Elements between 3 and 8: [4 5 6 7]
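
Boolean masks can also be used for assignment and combined with the other element-wise logical operators. The following sketch reuses the same ten-element array. The threshold values are arbitrary and chosen only to keep the results easy to check by eye.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Assign through a mask: cap every value above 8 (done on a copy here)
capped = arr.copy()
capped[capped > 8] = 8
print("Capped:", capped)

# np.where picks element-wise between two alternatives
evens_or_zero = np.where(arr % 2 == 0, arr, 0)
print("Evens or zero:", evens_or_zero)

# | is element-wise OR and ~ is element-wise NOT; the parentheses are required
extremes = arr[(arr < 3) | (arr > 8)]
not_extremes = arr[~((arr < 3) | (arr > 8))]
print("Extremes:", extremes)
print("Everything else:", not_extremes)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on arrays, which is why the `&`, `|`, and `~` operators are used instead.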

Fancy Indexing

Use arrays of indices to select elements:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Select specific indices
indices = np.array([0, 2, 4])
print("Elements at indices [0, 2, 4]:", arr[indices])
# Output: Elements at indices [0, 2, 4]: [10 30 50]

# 2D fancy indexing
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

rows = np.array([0, 2])
cols = np.array([1, 2])
print("Elements at (0,1), (2,2):", arr2d[rows, cols])
# Output: Elements at (0,1), (2,2): [2 9]
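
The indexing styles in this part differ in one important way: basic slicing returns a view onto the original data, while fancy and boolean indexing return copies. A short sketch using np.shares_memory makes the difference visible.

```python
import numpy as np

arr = np.arange(10)

basic_slice = arr[2:5]            # basic slicing returns a view
fancy = arr[np.array([2, 3, 4])]  # fancy indexing returns a copy
masked = arr[arr > 6]             # boolean indexing returns a copy

print(np.shares_memory(arr, basic_slice))  # True
print(np.shares_memory(arr, fancy))        # False
print(np.shares_memory(arr, masked))       # False

# Writing through the view changes the original array
basic_slice[0] = 99
print(arr[2])  # 99

# Writing to the copy does not
fancy[1] = -1
print(arr[3])  # 3
```

This matters whenever you modify a selection: changes to a slice propagate back to the source array, changes to a fancy-indexed result do not.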

Part 4: Reshaping and Manipulation

Reshaping allows you to change array dimensions without changing data:

Reshaping Arrays

import numpy as np

# Create a 1D array
arr = np.arange(12)
print("Original array:", arr)
# Output: Original array: [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Reshape to 2D
arr_2d = arr.reshape(3, 4)
print("Reshaped to (3, 4):")
print(arr_2d)
# Output:
# Reshaped to (3, 4):
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Reshape to 3D
arr_3d = arr.reshape(2, 2, 3)
print("Reshaped to (2, 2, 3):")
print(arr_3d)
# Output:
# Reshaped to (2, 2, 3):
# [[[ 0  1  2]
#   [ 3  4  5]]
#
#  [[ 6  7  8]
#   [ 9 10 11]]]

# Flatten array
flattened = arr_2d.flatten()
print("Flattened:", flattened)
# Output: Flattened: [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Ravel (like flatten, but returns a view when possible instead of always copying)
raveled = arr_2d.ravel()
print("Raveled:", raveled)
# Output: Raveled: [ 0  1  2  3  4  5  6  7  8  9 10 11]
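
Two details of reshaping are worth seeing in code: passing -1 to let NumPy infer one dimension, and the difference between ravel() and flatten() in terms of shared memory. The sketch below checks both, under the assumption that the source array is contiguous (as arrays from np.arange are).

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# The element count must match, otherwise reshape raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("Cannot reshape:", exc)

# For a contiguous array, ravel shares memory with the source; flatten never does
raveled_shares = np.shares_memory(inferred, inferred.ravel())
flattened_shares = np.shares_memory(inferred, inferred.flatten())
print(raveled_shares, flattened_shares)  # True False

# On a transposed (non-contiguous) array, ravel has to copy
transposed_ravel_shares = np.shares_memory(inferred, inferred.T.ravel())
print(transposed_ravel_shares)  # False
```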

Concatenating and Stacking

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenate along axis 0 (default)
concat = np.concatenate([arr1, arr2])
print("Concatenated:", concat)
# Output: Concatenated: [1 2 3 4 5 6]

# Stack vertically (row-wise)
arr1_2d = np.array([[1, 2], [3, 4]])
arr2_2d = np.array([[5, 6], [7, 8]])

vstack = np.vstack([arr1_2d, arr2_2d])
print("Vertical stack:")
print(vstack)
# Output:
# Vertical stack:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Stack horizontally (column-wise)
hstack = np.hstack([arr1_2d, arr2_2d])
print("Horizontal stack:")
print(hstack)
# Output:
# Horizontal stack:
# [[1 2 5 6]
#  [3 4 7 8]]

Splitting Arrays

import numpy as np

arr = np.arange(12).reshape(3, 4)
print("Original array:")
print(arr)
# Output:
# Original array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Split horizontally
split_h = np.hsplit(arr, 2)
print("Horizontal split (2 parts):")
for i, part in enumerate(split_h):
    print(f"Part {i}:")
    print(part)
# Output:
# Horizontal split (2 parts):
# Part 0:
# [[0 1]
#  [4 5]
#  [8 9]]
# Part 1:
# [[ 2  3]
#  [ 6  7]
#  [10 11]]
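
hsplit works above because 4 columns divide evenly into 2 parts. When the division is uneven, np.split raises an error, and np.array_split is the more forgiving alternative. A small sketch on a 1D array, with sizes chosen so that the division does not come out even:

```python
import numpy as np

arr = np.arange(10)

# split requires an even division; 10 elements into 3 parts fails
try:
    np.split(arr, 3)
except ValueError as exc:
    print("split failed:", exc)

# array_split allows unequal parts
parts = np.array_split(arr, 3)
print([len(p) for p in parts])  # [4, 3, 3]

# A list of indices gives explicit cut points instead of a part count
head, middle, tail = np.split(arr, [2, 7])
print(head, middle, tail)
```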

Part 5: Mathematical Operations

NumPy provides powerful mathematical operations that work on entire arrays:

Element-wise Operations

import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

# Arithmetic operations
print("Addition:", arr1 + arr2)
# Output: Addition: [ 6  8 10 12]

print("Subtraction:", arr1 - arr2)
# Output: Subtraction: [-4 -4 -4 -4]

print("Multiplication:", arr1 * arr2)
# Output: Multiplication: [ 5 12 21 32]

print("Division:", arr2 / arr1)
# Output: Division: [5.         3.         2.33333333 2.        ]

print("Power:", arr1 ** 2)
# Output: Power: [ 1  4  9 16]

print("Modulo:", arr2 % arr1)
# Output: Modulo: [0 0 1 0]

Aggregation Functions

Aggregation functions reduce an array to a single value, or to one value per row or column when you pass the axis argument:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Sum
print("Sum:", np.sum(arr))
# Output: Sum: 55

# Mean
print("Mean:", np.mean(arr))
# Output: Mean: 5.5

# Median
print("Median:", np.median(arr))
# Output: Median: 5.5

# Standard deviation
print("Std Dev:", np.std(arr))
# Output: Std Dev: 2.8722813232690143

# Min and Max
print("Min:", np.min(arr))
# Output: Min: 1

print("Max:", np.max(arr))
# Output: Max: 10

# 2D aggregation
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print("Sum along axis 0 (columns):", np.sum(arr2d, axis=0))
# Output: Sum along axis 0 (columns): [12 15 18]

print("Sum along axis 1 (rows):", np.sum(arr2d, axis=1))
# Output: Sum along axis 1 (rows): [ 6 15 24]
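
A few aggregation options come up often in practice: keepdims (so a reduced result still broadcasts against the original), argmax (positions instead of values), and ddof (population versus sample standard deviation). The sketch below illustrates each one. The variable names are illustrative only.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# keepdims=True keeps the reduced axis with size 1, so the result
# broadcasts back against the original array
row_means = scores.mean(axis=1, keepdims=True)  # shape (3, 1)
centered = scores - row_means
print(centered)

# argmax returns the position of the maximum, not the value
best_column_per_row = scores.argmax(axis=1)
print(best_column_per_row)  # [2 2 2]

# np.std divides by N by default; ddof=1 divides by N - 1 (sample std)
sample = np.arange(1, 11)
population_std = np.std(sample)
sample_std = np.std(sample, ddof=1)
print(population_std, sample_std)
```

The default ddof=0 explains why np.std can disagree with tools that report the sample standard deviation by default.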

Mathematical Functions

import numpy as np

arr = np.array([0, np.pi/2, np.pi])

# Trigonometric functions
print("sin:", np.sin(arr))
# Output: sin: [0.0000000e+00 1.0000000e+00 1.2246468e-16]
# (sin(pi) prints as about 1.2e-16 rather than 0 because pi is not exactly representable)

print("cos:", np.cos(arr))
# Output: cos: [ 1.000000e+00  6.123234e-17 -1.000000e+00]

# Exponential and logarithm
arr_exp = np.array([1, 2, 3])
print("exp:", np.exp(arr_exp))
# Output: exp: [ 2.71828183  7.3890561  20.08553692]

print("log:", np.log(arr_exp))
# Output: log: [0.         0.69314718 1.09861229]

# Square root
print("sqrt:", np.sqrt(arr_exp))
# Output: sqrt: [1.         1.41421356 1.73205081]

# Absolute value
arr_neg = np.array([-1, -2, 3, -4])
print("abs:", np.abs(arr_neg))
# Output: abs: [1 2 3 4]

Broadcasting

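Broadcasting lets NumPy combine arrays of different shapes by virtually stretching dimensions of size 1 (and missing leading dimensions) to match, without copying data: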

import numpy as np

# 1D array and scalar
arr = np.array([1, 2, 3, 4])
scalar = 10
print("Array + scalar:", arr + scalar)
# Output: Array + scalar: [11 12 13 14]

# 2D array and 1D array
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
arr1d = np.array([10, 20, 30])

print("2D + 1D (broadcasting):")
print(arr2d + arr1d)
# Output:
# 2D + 1D (broadcasting):
# [[11 22 33]
#  [14 25 36]]

# Different shapes
arr_a = np.array([[1], [2], [3]])  # Shape (3, 1)
arr_b = np.array([10, 20, 30])      # Shape (3,)

print("Broadcasting (3,1) + (3,):")
print(arr_a + arr_b)
# Output:
# Broadcasting (3,1) + (3,):
# [[11 21 31]
#  [12 22 32]
#  [13 23 33]]
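
The rule behind these examples is that shapes are compared from the last dimension backwards, and each pair of dimensions must either be equal or contain a 1. The sketch below checks a few shape combinations with np.broadcast_shapes, which assumes NumPy 1.20 or newer, and shows what happens when the shapes do not line up.

```python
import numpy as np

# Shapes are compared from the last dimension backwards; each pair of
# dimensions must be equal, or one of them must be 1
print(np.broadcast_shapes((2, 3), (3,)))       # (2, 3)
print(np.broadcast_shapes((3, 1), (3,)))       # (3, 3)
print(np.broadcast_shapes((4, 1, 5), (3, 1)))  # (4, 3, 5)

# (2, 3) and (2,) do not line up: the trailing dimensions are 3 and 2
arr2d = np.ones((2, 3))
per_row = np.array([10, 20])
try:
    arr2d + per_row
except ValueError as exc:
    print("Incompatible:", exc)

# Adding a trailing axis turns (2,) into (2, 1), which does broadcast
fixed = arr2d + per_row[:, np.newaxis]
print(fixed)
```

Reshaping with np.newaxis (or reshape) is the usual way to tell NumPy which axis a 1D array is meant to align with.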

Linear Algebra

import numpy as np

# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix multiplication (A @ B):")
print(A @ B)
# Output:
# Matrix multiplication (A @ B):
# [[19 22]
#  [43 50]]

# Dot product
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print("Dot product:", np.dot(v1, v2))
# Output: Dot product: 32

# Determinant (computed numerically, so expect floating-point rounding)
print("Determinant of A:", np.linalg.det(A))
# Output: Determinant of A: -2.0000000000000004

# Inverse
print("Inverse of A:")
print(np.linalg.inv(A))
# Output:
# Inverse of A:
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
# Output: Eigenvalues: [-0.37228132  5.37228132]
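
Two follow-ups to the linear algebra calls above, sketched under the assumption of a small, well-conditioned matrix. To solve a linear system, np.linalg.solve is generally preferred over multiplying by an explicit inverse. The output of np.linalg.eig can be verified by checking that each eigenvector satisfies A v = λ v. The right-hand side b is a made-up example.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])  # hypothetical right-hand side

# Solve A x = b directly instead of computing inv(A) @ b
x = np.linalg.solve(A, b)
print("x:", x)
print("A @ x matches b:", np.allclose(A @ x, b))

# Each column of `eigenvectors` pairs with the eigenvalue at the same index
eigenvalues, eigenvectors = np.linalg.eig(A)
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))  # True for each pair
```

Comparisons use np.allclose rather than == because the results carry small floating-point errors.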

Part 6: Practical Examples

Example 1: Image Processing

import numpy as np

# Create a simple grayscale image (10x10)
image = np.random.randint(0, 256, size=(10, 10), dtype=np.uint8)

print("Original image shape:", image.shape)
# Output: Original image shape: (10, 10)

# Normalize image to [0, 1]
normalized = image / 255.0
print("Normalized image min/max:", normalized.min(), normalized.max())
# Output varies with the random pixels; both values lie within [0.0, 1.0]

# Apply brightness adjustment
brightened = np.clip(image * 1.2, 0, 255).astype(np.uint8)
print("Brightened image shape:", brightened.shape)
# Output: Brightened image shape: (10, 10)

# Calculate image statistics
print("Mean pixel value:", np.mean(image))
print("Std deviation:", np.std(image))
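
The brightness step above multiplies by a float, which promotes the array to floating point before clipping. Integer arithmetic that stays in uint8 behaves differently: it wraps around modulo 256 instead of saturating at 255. A minimal sketch of the pitfall and one way around it, using three made-up pixel values:

```python
import numpy as np

pixels = np.array([100, 200, 250], dtype=np.uint8)  # hypothetical pixel values
offset = np.uint8(100)

# uint8 arithmetic wraps modulo 256 instead of saturating at 255
wrapped = pixels + offset
print("Wrapped:", wrapped)  # [200  44  94]

# Widen the type first, then clip and convert back
safe = np.clip(pixels.astype(np.int16) + 100, 0, 255).astype(np.uint8)
print("Clipped:", safe)  # [200 255 255]
```

Widening to int16 is one option among several; converting to float, as the example above does, works as well.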

Example 2: Statistical Analysis

import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)

# Calculate statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
variance = np.var(data)

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Std Dev: {std:.2f}")
print(f"Variance: {variance:.2f}")

# Percentiles
p25 = np.percentile(data, 25)
p75 = np.percentile(data, 75)
print(f"25th percentile: {p25:.2f}")
print(f"75th percentile: {p75:.2f}")

# Count values in range
in_range = np.sum((data > 85) & (data < 115))
print(f"Values between 85 and 115: {in_range}")

Example 3: Data Normalization

import numpy as np

# Sample data
data = np.array([10, 20, 30, 40, 50])

# Min-Max normalization (scale to [0, 1])
min_val = np.min(data)
max_val = np.max(data)
normalized = (data - min_val) / (max_val - min_val)
print("Min-Max normalized:", normalized)
# Output: Min-Max normalized: [0.   0.25 0.5  0.75 1.  ]

# Z-score normalization (standardization)
mean = np.mean(data)
std = np.std(data)
z_score = (data - mean) / std
print("Z-score normalized:", z_score)
# Output: Z-score normalized: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
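
Real datasets are usually 2D, with one row per sample and one column per feature, and each column is standardized separately. The sketch below applies the same z-score formula along axis 0 and guards against columns with zero standard deviation. The data is a made-up example, and replacing a zero standard deviation with 1 is one possible convention, not the only one.

```python
import numpy as np

# Hypothetical dataset: 4 samples (rows), 3 features (columns);
# the last column is constant on purpose
features = np.array([[1.0, 200.0, 5.0],
                     [2.0, 400.0, 5.0],
                     [3.0, 600.0, 5.0],
                     [4.0, 800.0, 5.0]])

col_mean = features.mean(axis=0)
col_std = features.std(axis=0)

# A constant column has std 0; substitute 1 to avoid dividing by zero
safe_std = np.where(col_std == 0, 1.0, col_std)
standardized = (features - col_mean) / safe_std

print(standardized.mean(axis=0))  # approximately 0 in every column
print(standardized.std(axis=0))   # approximately 1, except the constant column
```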

Example 4: Matrix Operations

import numpy as np

# Create sample matrices
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

B = np.array([[9, 8, 7],
              [6, 5, 4],
              [3, 2, 1]])

# Element-wise operations
print("A + B:")
print(A + B)
# Output:
# A + B:
# [[10 10 10]
#  [10 10 10]
#  [10 10 10]]

# Matrix multiplication
print("A @ B:")
print(A @ B)
# Output:
# A @ B:
# [[ 30  24  18]
#  [ 84  69  54]
#  [138 114  90]]

# Transpose
print("A.T:")
print(A.T)
# Output:
# A.T:
# [[1 4 7]
#  [2 5 8]
#  [3 6 9]]

Best Practices

1. Use Vectorization Instead of Loops

import numpy as np

arr = np.arange(1000000)

# ✓ Good: Vectorized operation
result = arr * 2

# ❌ Avoid: Explicit loop
result_loop = np.array([x * 2 for x in arr])

2. Specify Data Types for Memory Efficiency

import numpy as np

# ✓ Good: Choose a smaller dtype when the value range is known to fit
arr_int32 = np.array([1, 2, 3], dtype=np.int32)

# ❌ Avoid: Relying on the default dtype (usually int64) for large arrays of small values
arr_default = np.array([1, 2, 3])

3. Use Views Instead of Copies When Possible

import numpy as np

arr = np.arange(10)

# ✓ Good: Slicing creates a view (no copy); writes to it also change arr
view = arr[2:5]

# ❌ Avoid: Copying when a view would do (copy only when you need independent data)
copy = arr[2:5].copy()

4. Use Broadcasting to Avoid Explicit Loops

import numpy as np

# ✓ Good: Broadcasting
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
arr1d = np.array([10, 20, 30])
result = arr2d + arr1d

# ❌ Avoid: Explicit loop
result_loop = np.zeros_like(arr2d)
for i in range(arr2d.shape[0]):
    result_loop[i] = arr2d[i] + arr1d

5. Use NumPy Functions Instead of Python Built-ins

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# ✓ Good: NumPy functions
mean = np.mean(arr)
total = np.sum(arr)

# ❌ Avoid: Python built-ins (slower: they iterate over the array one element at a time)
mean_py = sum(arr) / len(arr)
total_py = sum(arr)

Conclusion

NumPy is essential for numerical computing in Python. By mastering arrays and operations, you unlock the power of scientific computing.

Key takeaways:

NumPy arrays are faster than Python lists - Use them for numerical data
Vectorization eliminates loops - Write cleaner, faster code
Broadcasting enables elegant operations - Work with arrays of different shapes
Indexing and slicing are powerful - Access and modify data efficiently
Aggregation functions reduce complexity - Calculate statistics easily
Linear algebra operations are built-in - Perform matrix operations natively
Memory efficiency matters - Choose appropriate data types

Next Steps

Now that you understand NumPy fundamentals, explore:

Pandas: Built on NumPy, adds labeled data structures
Scikit-learn: Machine learning library using NumPy
Matplotlib: Visualization library that works with NumPy
SciPy: Scientific computing library extending NumPy
TensorFlow/PyTorch: Deep learning frameworks built on NumPy concepts

NumPy is the foundation of the Python data science ecosystem. Invest time in mastering it, and you’ll accelerate your learning of more advanced tools. Start with simple arrays, gradually explore complex operations, and soon you’ll be writing efficient numerical code with confidence.

Happy computing!