Introduction
Neural networks are the foundation of modern artificial intelligence. From image recognition to natural language processing, deep learning powers breakthrough applications. This guide covers neural network fundamentals and practical implementation.
What Are Neural Networks?
Biological Inspiration
Neural networks are inspired by the human brain. They consist of interconnected nodes (neurons) that process information.
The Perceptron
The simplest neural network:
import numpy as np

class Perceptron:
    def __init__(self, n_inputs):
        self.weights = np.random.randn(n_inputs)
        self.bias = 0.0

    def forward(self, x):
        # Weighted sum of inputs, followed by a step activation
        z = np.dot(x, self.weights) + self.bias
        return 1 if z > 0 else 0
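The perceptron above can only predict; to make it learn, the classic perceptron learning rule nudges the weights by the prediction error. A minimal sketch (the training loop and the AND-gate data are illustrative additions, not part of the original class):

```python
import numpy as np

class Perceptron:
    def __init__(self, n_inputs):
        self.weights = np.random.randn(n_inputs)
        self.bias = 0.0

    def forward(self, x):
        z = np.dot(x, self.weights) + self.bias
        return 1 if z > 0 else 0

    def train(self, X, y, lr=0.1, epochs=50):
        # Perceptron learning rule: move weights toward inputs
        # whose prediction was wrong, scaled by the error
        for _ in range(epochs):
            for xi, target in zip(X, y):
                error = target - self.forward(xi)
                self.weights += lr * error * xi
                self.bias += lr * error

# Learn the AND function (linearly separable, so convergence is guaranteed)
np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
p = Perceptron(2)
p.train(X, y)
print([p.forward(xi) for xi in X])  # [0, 0, 0, 1]
```

A single perceptron can only learn linearly separable functions, which is exactly why the multi-layer networks below were introduced.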
Network Architecture
Layers
- Input Layer: Receives data
- Hidden Layers: Process information
- Output Layer: Produces results
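The three layers above are easiest to see as shapes flowing through matrix multiplications. A small sketch with hypothetical sizes (784 input features, 128 hidden units, 10 output classes, batch of 32):

```python
import numpy as np

# Hypothetical layer sizes: 784 -> 128 -> 10
X = np.random.randn(32, 784)           # input layer: a batch of 32 examples
W1 = np.random.randn(784, 128) * 0.01  # input -> hidden weights
W2 = np.random.randn(128, 10) * 0.01   # hidden -> output weights

hidden = np.maximum(0, X @ W1)         # hidden layer: processes information
output = hidden @ W2                   # output layer: produces results
print(X.shape, hidden.shape, output.shape)  # (32, 784) (32, 128) (32, 10)
```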
Activation Functions
import numpy as np

# ReLU (most common)
def relu(x):
    return np.maximum(0, x)

# Sigmoid (squashes to (0, 1); useful for probabilities)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Softmax (for multi-class outputs); subtracting the row-wise max
# keeps the exponentials numerically stable
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / exp_x.sum(axis=-1, keepdims=True)

# Tanh (squashes to (-1, 1))
def tanh(x):
    return np.tanh(x)
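A quick sanity check makes the behavior of these functions concrete: ReLU clips negatives to zero, sigmoid maps 0 to 0.5, and each softmax row sums to 1.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / exp_x.sum(axis=-1, keepdims=True)

x = np.array([[-2.0, 0.0, 2.0]])
print(relu(x))                   # [[0. 0. 2.]] -- negatives clipped
print(sigmoid(np.array([0.0])))  # [0.5] -- midpoint at the origin
print(softmax(x).sum())          # 1.0 -- a valid probability distribution
```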
Forward Propagation
How It Works
import numpy as np

class NeuralNetwork:
    def __init__(self, layer_sizes):
        self.weights = []
        self.biases = []
        for i in range(len(layer_sizes) - 1):
            # Small random weights, zero biases
            w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * 0.01
            b = np.zeros((1, layer_sizes[i+1]))
            self.weights.append(w)
            self.biases.append(b)

    def forward(self, X):
        # Cache every layer's activations; backpropagation needs them
        self.activations = [X]
        for i in range(len(self.weights)):
            z = np.dot(self.activations[-1], self.weights[i]) + self.biases[i]
            # ReLU in the hidden layers, softmax at the output
            a = relu(z) if i < len(self.weights) - 1 else softmax(z)
            self.activations.append(a)
        return self.activations[-1]
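As a sanity check, one forward pass through a two-layer network with random weights should already produce valid probability rows. A standalone sketch mirroring the class above, with hypothetical sizes (3 features, 5 hidden units, 2 classes):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / exp_x.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))                  # batch of 4 examples, 3 features
W1, b1 = rng.standard_normal((3, 5)) * 0.01, np.zeros((1, 5))
W2, b2 = rng.standard_normal((5, 2)) * 0.01, np.zeros((1, 2))

a1 = relu(X @ W1 + b1)        # hidden layer
out = softmax(a1 @ W2 + b2)   # output layer: one probability row per example
print(out.shape, out.sum(axis=1))  # (4, 2), every row sums to 1
```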
Backpropagation
The Learning Algorithm
    # Method of NeuralNetwork (assumes softmax output + cross-entropy loss)
    def backward(self, X, y, learning_rate=0.01):
        m = X.shape[0]
        deltas = [None] * len(self.weights)

        # Output layer error: softmax with cross-entropy simplifies
        # to (prediction - target)
        deltas[-1] = self.activations[-1] - y

        # Propagate the error back through the hidden layers
        for i in range(len(self.weights) - 2, -1, -1):
            error = deltas[i+1].dot(self.weights[i+1].T)
            deltas[i] = error * (self.activations[i+1] > 0)  # ReLU derivative

        # Gradient descent step on weights and biases
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * self.activations[i].T.dot(deltas[i]) / m
            self.biases[i] -= learning_rate * np.sum(deltas[i], axis=0, keepdims=True) / m
Training Process
Full Training Loop
    # Method of NeuralNetwork
    def train(self, X, y, epochs=1000, learning_rate=0.01):
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)
            # Cross-entropy loss: sum over classes, average over the batch
            loss = -np.mean(np.sum(y * np.log(output + 1e-8), axis=1))
            # Backward pass
            self.backward(X, y, learning_rate)
            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.4f}")
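Putting the pieces together, a smoke test on a toy two-class problem should show the loss falling. The sketch below condenses the class from this section; the Gaussian-blob data, the seed, and the layer sizes [2, 8, 2] are illustrative choices:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / exp_x.sum(axis=-1, keepdims=True)

class NeuralNetwork:
    def __init__(self, layer_sizes):
        rng = np.random.default_rng(0)
        self.weights = [rng.standard_normal((a, b)) * 0.1
                        for a, b in zip(layer_sizes, layer_sizes[1:])]
        self.biases = [np.zeros((1, b)) for b in layer_sizes[1:]]

    def forward(self, X):
        self.activations = [X]
        for i, (w, b) in enumerate(zip(self.weights, self.biases)):
            z = self.activations[-1] @ w + b
            self.activations.append(relu(z) if i < len(self.weights) - 1 else softmax(z))
        return self.activations[-1]

    def backward(self, X, y, learning_rate=0.1):
        m = X.shape[0]
        deltas = [None] * len(self.weights)
        deltas[-1] = self.activations[-1] - y
        for i in range(len(self.weights) - 2, -1, -1):
            deltas[i] = deltas[i+1] @ self.weights[i+1].T * (self.activations[i+1] > 0)
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * self.activations[i].T @ deltas[i] / m
            self.biases[i] -= learning_rate * deltas[i].sum(axis=0, keepdims=True) / m

# Toy data: two well-separated Gaussian blobs with one-hot labels
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((50, 2)) + 2, rng.standard_normal((50, 2)) - 2])
y = np.zeros((100, 2)); y[:50, 0] = 1; y[50:, 1] = 1

net = NeuralNetwork([2, 8, 2])
losses = []
for epoch in range(200):
    out = net.forward(X)
    losses.append(-np.mean(np.sum(y * np.log(out + 1e-8), axis=1)))
    net.backward(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

If the loss does not decrease on a trivially separable problem like this one, the backpropagation code has a bug, which makes this a useful first test for any from-scratch implementation.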
Types of Neural Networks
Feedforward Neural Networks
Basic networks where data flows in one direction:
# Simple FNN with Keras
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Convolutional Neural Networks (CNNs)
For image data:
# CNN for image classification
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
Recurrent Neural Networks (RNNs)
For sequential data:
# LSTM for sequence prediction
from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(64, return_sequences=True, input_shape=(None, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1)
])
Transformers
The architecture behind modern NLP:
# BERT for text classification
from tensorflow import keras
from transformers import TFBertModel

bert = TFBertModel.from_pretrained('bert-base-uncased')
input_ids = keras.layers.Input(shape=(128,), dtype='int32')
attention_mask = keras.layers.Input(shape=(128,), dtype='int32')
output = bert(input_ids, attention_mask=attention_mask)
pooled_output = output.pooler_output
Practical Applications
Image Classification
# Using a pre-trained model
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights='imagenet')

# Load an image, apply the model's expected preprocessing, and predict
img = image.load_img('cat.jpg', target_size=(224, 224))
img_array = image.img_to_array(img)
img_array = preprocess_input(np.expand_dims(img_array, axis=0))
predictions = model.predict(img_array)
print(decode_predictions(predictions, top=3))
Text Classification
# Using Hugging Face
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("This product is amazing!")
Object Detection
# Using YOLO (requires the yolov3.weights and yolov3.cfg files on disk)
import cv2

model = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
Optimization Techniques
Gradient Descent Variants
- SGD: Stochastic gradient descent, the baseline
- Adam: Adaptive per-parameter learning rates (the most popular default)
- RMSprop: Root mean square propagation
model.compile(optimizer='adam', loss='categorical_crossentropy')
Regularization
# Dropout
keras.layers.Dropout(0.5)
# L2 Regularization
keras.layers.Dense(64, kernel_regularizer=keras.regularizers.l2(0.01))
Batch Normalization
keras.layers.BatchNormalization()
Common Challenges
Overfitting
When the model memorizes the training data instead of generalizing:
- Solution: Dropout, regularization, more data
Vanishing Gradients
When gradients shrink toward zero in the early layers, stalling learning:
- Solution: Better activations (ReLU), residual connections
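The residual-connection fix can be sketched in a few lines: adding the input back to a layer's output (y = f(x) + x) gives the signal, and during training the gradient, a path that bypasses the layer entirely. A toy illustration with deliberately tiny weights standing in for a shrinking deep stack:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16))
W = rng.standard_normal((16, 16)) * 0.01  # deliberately tiny weights

plain = relu(x @ W)          # stacking many such layers shrinks the signal toward 0
residual = relu(x @ W) + x   # the skip path carries the signal (and gradient) through

print(np.abs(plain).mean(), np.abs(residual).mean())
```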
Dying ReLU
When neurons always output zero:
- Solution: Leaky ReLU, proper initialization
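Leaky ReLU is a one-line change to the ReLU defined earlier: negative inputs keep a small slope instead of a flat zero, so a neuron stuck in the negative region still receives a gradient.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs are scaled by alpha rather than zeroed out,
    # so the gradient never dies completely
    return np.where(x > 0, x, alpha * x)

x = np.array([-10.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))  # [-0.1  -0.01  0.    2.  ]
```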
Conclusion
Neural networks have revolutionized AI. Start with simple architectures and progressively explore CNNs, RNNs, and transformers. The key is understanding the fundamentals before diving into complex architectures.