Introduction
In this chapter, we will walk through the process of
building a simple Feedforward Neural Network (FNN) from scratch using Python
and NumPy. Neural networks are the foundation of deep learning models,
and understanding how they work at the core level is essential for mastering
more complex models like Convolutional Neural Networks (CNNs) or Recurrent
Neural Networks (RNNs).
We will implement a basic neural network with one hidden
layer. The network will be trained using backpropagation and gradient
descent to minimize the error between predicted and actual outputs.
This tutorial will cover the following:
- Defining the network architecture
- Initializing weights and biases
- Implementing forward propagation
- Computing the cross-entropy cost
- Backpropagation and gradient descent
- Training the network over multiple epochs
By the end of this chapter, you will have a deep
understanding of how neural networks work and how they are trained.
1. Neural Network Architecture
A neural network consists of layers of neurons, each of
which performs a mathematical operation on the input data. The architecture of
a neural network typically consists of three types of layers:
- An input layer, which receives the raw features
- One or more hidden layers, which transform the data through weighted sums and activation functions
- An output layer, which produces the final prediction
In this example, we will create a network with:
- An input layer whose size matches the number of features
- A single hidden layer with a ReLU activation
- An output layer with one sigmoid neuron for binary classification
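As a quick illustration of what this architecture implies for the parameter shapes, here is a minimal sketch; the layer sizes (3 inputs, 4 hidden units, 1 output) and the batch of 5 examples are hypothetical, chosen only for this example.

import numpy as np

# Hypothetical sizes, used only for this illustration
input_size, hidden_size, output_size = 3, 4, 1
m = 5  # number of training examples

X = np.random.randn(input_size, m)              # inputs: one column per example
W1 = np.random.randn(hidden_size, input_size)   # hidden-layer weights, shape (4, 3)
b1 = np.zeros((hidden_size, 1))                 # hidden-layer bias, shape (4, 1)
W2 = np.random.randn(output_size, hidden_size)  # output-layer weights, shape (1, 4)
b2 = np.zeros((output_size, 1))                 # output-layer bias, shape (1, 1)

print((W1 @ X + b1).shape)                            # (4, 5): one hidden vector per example
print((W2 @ np.maximum(0, W1 @ X + b1) + b2).shape)   # (1, 5): one output per example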
2. Mathematical Representation of Neural Networks
The neural network operates through forward propagation and backpropagation. For the network described above, forward propagation computes:

Z1 = W1·X + b1
A1 = ReLU(Z1)
Z2 = W2·A1 + b2
A2 = sigmoid(Z2) = 1 / (1 + e^(-Z2))

Where:
- X is the input matrix, with one column per training example
- W1 and b1 are the weights and bias of the hidden layer
- W2 and b2 are the weights and bias of the output layer
- Z1 and Z2 are the pre-activation values, and A1 and A2 are the corresponding activations
During backpropagation, the gradients of the cost with respect to each of these parameters are computed and used to update them.
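To make the notation concrete, here is a minimal numeric walk-through of one forward pass; the weights and the single input below are made up purely for illustration.

import numpy as np

X  = np.array([[1.0], [2.0]])              # one example with 2 features
W1 = np.array([[0.1, -0.2], [0.4, 0.3]])   # hidden-layer weights (2 units)
b1 = np.zeros((2, 1))
W2 = np.array([[0.5, -0.5]])               # output-layer weights (1 unit)
b2 = np.zeros((1, 1))

Z1 = W1 @ X + b1            # [[-0.3], [1.0]]
A1 = np.maximum(0, Z1)      # ReLU: [[0.0], [1.0]]
Z2 = W2 @ A1 + b2           # [[-0.5]]
A2 = 1 / (1 + np.exp(-Z2))  # sigmoid: approximately 0.378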
3. Implementing Forward Propagation
3.1 Initializing the Weights and Biases
Before the forward pass, we need to initialize the weights
and biases. The weights are usually initialized randomly, and the biases are
initialized to zero or small values.
Code Sample:
import numpy as np

# Initialize the weights and biases
def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(hidden_size, input_size) * 0.01   # Weight matrix for hidden layer
    b1 = np.zeros((hidden_size, 1))                        # Bias for hidden layer
    W2 = np.random.randn(output_size, hidden_size) * 0.01  # Weight matrix for output layer
    b2 = np.zeros((output_size, 1))                        # Bias for output layer

    parameters = {
        'W1': W1,
        'b1': b1,
        'W2': W2,
        'b2': b2
    }
    return parameters
Explanation: The weights are drawn from a standard normal distribution and scaled by 0.01 so that the initial activations stay small, while the biases start at zero. Initializing the weights randomly (rather than to zero) breaks symmetry, so different hidden units can learn different features. The parameters are returned in a dictionary so that the later functions can look them up by name.
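A quick way to sanity-check the initialization is to print the parameter shapes; the sizes below are arbitrary.

parameters = initialize_parameters(input_size=3, hidden_size=4, output_size=1)
for name, value in parameters.items():
    print(name, value.shape)
# Expected: W1 (4, 3), b1 (4, 1), W2 (1, 4), b2 (1, 1)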
3.2 Forward Propagation
The forward propagation step computes the activations for each layer. For the hidden layer, we compute the weighted sum of the inputs and pass it through the ReLU activation function; for the output layer, we do the same but apply a sigmoid activation so the output can be interpreted as a probability for binary classification.
Code Sample:
def forward_propagation(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    # Compute activations for hidden layer
    Z1 = np.dot(W1, X) + b1
    A1 = np.maximum(0, Z1)      # ReLU activation function

    # Compute activations for output layer
    Z2 = np.dot(W2, A1) + b2
    A2 = 1 / (1 + np.exp(-Z2))  # Sigmoid activation function

    cache = (Z1, A1, Z2, A2)
    return A2, cache
Explanation: The function retrieves the parameters, computes the hidden-layer pre-activation Z1 and its ReLU activation A1, and then the output pre-activation Z2 and its sigmoid activation A2. The intermediate values are stored in a cache because backpropagation will need them to compute the gradients.
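A small usage check, with an arbitrary toy batch, confirms the output shape and that the sigmoid keeps every prediction strictly between 0 and 1.

X = np.random.randn(3, 5)                      # 3 features, 5 examples (arbitrary)
parameters = initialize_parameters(3, 4, 1)
A2, cache = forward_propagation(X, parameters)
print(A2.shape)                                # (1, 5): one prediction per example
print(A2.min() > 0 and A2.max() < 1)           # True: sigmoid outputs lie in (0, 1)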
4. Implementing Backpropagation
4.1 Calculating the Cost Function
The cost function measures the difference between the predicted outputs and the true labels. For binary classification, we use the cross-entropy loss:

J = -(1/m) * Σ_i [ y_i · log(a_i) + (1 - y_i) · log(1 - a_i) ]

Where:
- m is the number of training examples
- y_i is the true label of example i
- a_i is the predicted probability for example i (the network output A2)
Code Sample:
def compute_cost(A2, Y):
    m = Y.shape[1]  # Number of examples
    cost = -(1/m) * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    return cost
Explanation: The cost averages the cross-entropy over all m examples. It approaches 0 when the predicted probabilities match the labels and grows large when the network is confidently wrong.
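One practical caveat: if A2 ever reaches exactly 0 or 1, np.log produces -inf and the cost becomes undefined. A common safeguard, shown here as an optional variant rather than part of the chapter's code, is to clip the predictions first.

def compute_cost_stable(A2, Y, eps=1e-12):
    m = Y.shape[1]                  # Number of examples
    A2 = np.clip(A2, eps, 1 - eps)  # keep log() away from 0 and 1
    cost = -(1/m) * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    return cost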
4.2 Backpropagation to Compute Gradients
Backpropagation computes the gradients of the cost function
with respect to each parameter (weights and biases). We will use these
gradients to update the weights using gradient descent.
Code Sample:
def backpropagation(X, Y, parameters, cache):
    m = X.shape[1]  # Number of examples
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    # Retrieve values from the cache
    Z1, A1, Z2, A2 = cache

    # Compute gradients for output layer
    dZ2 = A2 - Y
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)

    # Compute gradients for hidden layer
    dZ1 = np.dot(W2.T, dZ2) * (A1 > 0)  # Derivative of ReLU
    dW1 = (1/m) * np.dot(dZ1, X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)

    gradients = {
        'dW1': dW1,
        'db1': db1,
        'dW2': dW2,
        'db2': db2
    }
    return gradients
Explanation: For the output layer, dZ2 = A2 - Y is the combined derivative of the sigmoid activation and the cross-entropy loss, and dW2 and db2 follow from the hidden activations. The error is then propagated back through W2 and multiplied by the ReLU derivative (A1 > 0), which zeroes the gradient for units that were inactive, giving dW1 and db1. Every gradient is averaged over the m examples.
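If you want more confidence in the analytic gradients, you can compare one of them against a finite-difference estimate. The sketch below checks a single entry of dW2; the toy sizes and epsilon are arbitrary choices for this check.

def numerical_grad_W2(X, Y, parameters, i=0, j=0, eps=1e-6):
    # Central difference on one entry of W2
    p_plus = {k: v.copy() for k, v in parameters.items()}
    p_minus = {k: v.copy() for k, v in parameters.items()}
    p_plus['W2'][i, j] += eps
    p_minus['W2'][i, j] -= eps
    cost_plus = compute_cost(forward_propagation(X, p_plus)[0], Y)
    cost_minus = compute_cost(forward_propagation(X, p_minus)[0], Y)
    return (cost_plus - cost_minus) / (2 * eps)

X = np.random.randn(3, 5)
Y = (np.random.rand(1, 5) > 0.5).astype(float)
parameters = initialize_parameters(3, 4, 1)
A2, cache = forward_propagation(X, parameters)
gradients = backpropagation(X, Y, parameters, cache)
print(gradients['dW2'][0, 0], numerical_grad_W2(X, Y, parameters))  # the two values should be very close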
4.3 Gradient Descent to Update Parameters
After calculating the gradients, we update the weights and biases using gradient descent:

W1 = W1 - α · dW1
b1 = b1 - α · db1
W2 = W2 - α · dW2
b2 = b2 - α · db2

Where α is the learning rate.
Code Sample:
def gradient_descent(parameters, gradients, learning_rate=0.01):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    # Update parameters
    W1 -= learning_rate * gradients['dW1']
    b1 -= learning_rate * gradients['db1']
    W2 -= learning_rate * gradients['dW2']
    b2 -= learning_rate * gradients['db2']

    updated_parameters = {
        'W1': W1,
        'b1': b1,
        'W2': W2,
        'b2': b2
    }
    return updated_parameters
Explanation: Each parameter is moved a small step in the direction that reduces the cost, scaled by the learning rate. Note that the -= operations modify the NumPy arrays in place, so updated_parameters holds references to the same, now updated, arrays; returning the dictionary simply keeps the training loop explicit.
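To see one update in action, you can compare the cost before and after a single gradient-descent step on a toy batch; the data and the learning rate here are arbitrary, and with a small enough step the cost should not increase.

X = np.random.randn(3, 5)
Y = (np.random.rand(1, 5) > 0.5).astype(float)
parameters = initialize_parameters(3, 4, 1)

A2, cache = forward_propagation(X, parameters)
cost_before = compute_cost(A2, Y)
gradients = backpropagation(X, Y, parameters, cache)
parameters = gradient_descent(parameters, gradients, learning_rate=0.1)

A2, _ = forward_propagation(X, parameters)
print(cost_before, compute_cost(A2, Y))  # the second value is typically slightly lower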
5. Putting It All Together: Training the Neural Network
We can now train the neural network by performing forward
propagation, calculating the cost, backpropagating the gradients, and updating
the parameters over multiple iterations (epochs).
Code Sample:
def train_neural_network(X_train, Y_train, input_size, hidden_size, output_size,
                         epochs=1000, learning_rate=0.01):
    # Initialize parameters
    parameters = initialize_parameters(input_size, hidden_size, output_size)

    for epoch in range(epochs):
        # Forward propagation
        A2, cache = forward_propagation(X_train, parameters)

        # Compute cost
        cost = compute_cost(A2, Y_train)

        # Backpropagation
        gradients = backpropagation(X_train, Y_train, parameters, cache)

        # Update parameters
        parameters = gradient_descent(parameters, gradients, learning_rate)

        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Cost: {cost}")

    return parameters
Explanation: Each epoch performs one complete pass: forward propagation to obtain predictions, the cost for monitoring, backpropagation to obtain the gradients, and a gradient-descent update of the parameters. Printing the cost every 100 epochs lets you confirm that training is converging.
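Here is a minimal end-to-end run on synthetic data. The dataset, the layer sizes, the learning rate, and the predict helper are not part of the chapter's code; they are assumptions added purely to demonstrate the training loop.

import numpy as np

# Synthetic binary-classification data: label is 1 when the feature sum is positive
np.random.seed(0)
X_train = np.random.randn(2, 200)                                 # 2 features, 200 examples
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(float)  # shape (1, 200)

parameters = train_neural_network(X_train, Y_train,
                                  input_size=2, hidden_size=4, output_size=1,
                                  epochs=1000, learning_rate=0.1)

def predict(X, parameters):
    # Hypothetical helper: threshold the sigmoid output at 0.5
    A2, _ = forward_propagation(X, parameters)
    return (A2 > 0.5).astype(float)

accuracy = np.mean(predict(X_train, parameters) == Y_train)
print(f"Training accuracy: {accuracy:.2f}")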
6. Conclusion
In this chapter, we built a Feedforward Neural Network (FNN) from scratch using Python and NumPy. We covered:
- Initializing the weights and biases
- Forward propagation with ReLU and sigmoid activations
- The cross-entropy cost function
- Backpropagation to compute the gradients
- Gradient descent to update the parameters
- A training loop that ties all of these steps together
This tutorial provided a hands-on approach to understanding
how neural networks work and how they are trained. While deep learning
frameworks like TensorFlow and PyTorch handle these steps
automatically, implementing a neural network from scratch gives you a deeper
understanding of the inner workings of neural networks.
7. Review Questions and Answers

Question: What is a neural network?
Answer: A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Each node performs a mathematical operation on the input and passes the output to the next layer. The network is trained using backpropagation and gradient descent to minimize the error between predicted and actual outputs.

Question: What is the difference between a CNN and an RNN?
Answer: A CNN is designed for image data and uses convolutional layers to extract features from images. It is effective for tasks like image classification and object detection. An RNN, on the other hand, is designed for sequential data and uses feedback connections to handle time-dependent data, such as text, speech, or time series.

Question: What is the vanishing gradient problem, and how do LSTMs address it?
Answer: The vanishing gradient problem occurs when gradients become too small during backpropagation in deep networks, making learning difficult. LSTM cells solve this by using gates to regulate the flow of information, allowing the network to capture long-term dependencies without the gradients vanishing.

Question: How does a GAN work?
Answer: In a GAN, the generator creates fake data that resembles real data, while the discriminator evaluates whether the data is real or fake. They are trained together in an adversarial manner, where the generator tries to fool the discriminator, and the discriminator tries to correctly identify real vs. fake data.

Question: What is overfitting, and how can it be prevented?
Answer: Overfitting occurs when a model learns the details of the training data too well, leading to poor generalization on new data. We can prevent overfitting using techniques like dropout, L2 regularization, and early stopping.

Question: Why do neural networks need activation functions?
Answer: Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. Common activation functions include ReLU, sigmoid, and tanh. Without activation functions, the network would essentially be a linear model.

Question: How do you choose the number of layers and neurons in a network?
Answer: The optimal number of layers and neurons depends on the complexity of the problem and the dataset. Generally, more complex tasks require deeper networks. Techniques like cross-validation and hyperparameter tuning can help find the best configuration.

Question: What is batch normalization, and why is it useful?
Answer: Batch normalization normalizes the inputs to each layer, which helps reduce internal covariate shift and accelerates training. It can also improve the model’s generalization and stability.

Question: What is dropout, and how does it help?
Answer: Dropout is a regularization technique where randomly selected neurons are ignored during training. This prevents overfitting by ensuring that the network does not rely too heavily on any single neuron, encouraging more robust learning.

Question: What is the difference between supervised and unsupervised learning?
Answer: Supervised learning involves training a model on labeled data to predict outputs for unseen inputs, such as image classification. Unsupervised learning, on the other hand, deals with data without labels and involves tasks like clustering or dimensionality reduction (e.g., k-means clustering, autoencoders).