Top 5 Machine Learning Interview Problems


Implementing Linear Regression from Scratch

Introduction

Linear regression is one of the simplest and most fundamental algorithms in machine learning. It forms the basis for understanding more complex models and techniques. In this tutorial, we will implement linear regression from scratch using Python and NumPy.

The goal of linear regression is to find the best-fitting line that predicts the target variable (y) based on the input features (X). The best-fitting line is computed by minimizing the mean squared error (MSE) between the predicted and actual values. To achieve this, we will use gradient descent, a method for optimizing the parameters (weights) of the linear regression model.

In this tutorial, we will break down the following steps:

  1. Understanding the basic theory behind linear regression.
  2. Implementing linear regression from scratch using gradient descent.
  3. Implementing the cost function and gradient calculation.
  4. Training the model and making predictions.
  5. Evaluating the model's performance.

By the end of this tutorial, you will have a clear understanding of how linear regression works and how to implement it from scratch in Python.


1. Theory of Linear Regression

Linear regression assumes a linear relationship between the dependent variable (y) and the independent variable(s) (X). The model predicts the value of y using a linear combination of the input features (X).

The general form of the linear regression equation is:

y = w1·x1 + w2·x2 + … + wn·xn + b

Where:

  • w1,w2, … , wn are the weights (coefficients) associated with each input feature.
  • x1,x2, … , xn are the input features.
  • b is the bias term (intercept).
  • y is the predicted output.

The goal of linear regression is to find the best weights (w) and bias (b) that minimize the difference between the predicted values and the actual values.
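In code, this linear combination is simply a dot product between the weight vector and the feature vector plus the bias. A minimal NumPy sketch (the numbers here are made up for illustration, not taken from the tutorial's dataset):

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # input features x1, x2, x3
w = np.array([0.5, -1.0, 2.0])  # weights w1, w2, w3
b = 0.1                         # bias term

y = np.dot(w, x) + b  # w1*x1 + w2*x2 + w3*x3 + b
print(y)              # 4.6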


2. Implementing Linear Regression from Scratch

2.1 Data Preparation

To train a linear regression model, we need a dataset. For simplicity, we will use a small synthetic dataset with one feature (X) and a target (y).

Here’s an example dataset:

Feature (X) | Target (y)
------------|-----------
1           | 2
2           | 4
3           | 6
4           | 8
5           | 10

This dataset represents a perfect linear relationship where the target is double the value of the feature.

We will start by implementing the necessary imports and setting up the dataset.

Code Sample:

import numpy as np
import matplotlib.pyplot as plt

# Sample dataset
X = np.array([1, 2, 3, 4, 5])   # Feature (independent variable)
y = np.array([2, 4, 6, 8, 10])  # Target (dependent variable)

# Reshape X into a column vector (one feature per sample)
X = X.reshape(-1, 1)

Here, X is the input feature and y is the target variable. We reshape X into a column vector of shape (n_samples, n_features), here (5, 1), because matrix operations in NumPy (and libraries such as scikit-learn) expect the feature matrix in this format.
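You can confirm the shapes after the reshape with a quick illustrative check:

print(X.shape)  # (5, 1): 5 samples, 1 feature
print(y.shape)  # (5,): 5 target values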


2.2 The Cost Function

To train the linear regression model, we need to define a cost function to measure the performance of the model. The most common cost function used in linear regression is the Mean Squared Error (MSE), which is calculated as:

MSE = (1/m) · Σ (Ŷ(i) − Y(i))²,  summed over i = 1, …, m

Where:

  • m is the number of data points.
  • Y(i) is the actual value.
  • Ŷ(i) is the predicted value.

The goal of linear regression is to minimize this error by adjusting the model parameters (weights and bias).

2.3 Implementing the Cost Function

Now, we’ll implement the cost function in Python.

Code Sample:

def compute_cost(X, y, w, b):
    m = len(X)
    # Predictions for all samples
    y_pred = np.dot(X, w) + b
    # Mean squared error between predictions and targets
    cost = (1 / m) * np.sum((y_pred - y) ** 2)
    return cost

This function calculates the mean squared error (MSE) between the predicted values and the actual values, returning the cost.
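As a quick sanity check (an illustrative usage, not part of the original tutorial code), the cost should be exactly zero for the true parameters of our synthetic data, w = 2 and b = 0, and larger for a poor guess:

print(compute_cost(X, y, np.array([2.0]), 0.0))  # 0.0 for the perfect fit y = 2x
print(compute_cost(X, y, np.array([0.0]), 0.0))  # 44.0 for the all-zero model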


2.4 Implementing Gradient Descent

Gradient descent is used to minimize the cost function by iteratively adjusting the weights (w) and bias (b). The gradient descent update rule for linear regression is as follows:

w := w − α · ∂J(w, b)/∂w
b := b − α · ∂J(w, b)/∂b

For the MSE cost defined above, these gradients are:

∂J(w, b)/∂w = (2/m) · Σ (Ŷ(i) − Y(i)) · X(i)
∂J(w, b)/∂b = (2/m) · Σ (Ŷ(i) − Y(i))

Where:

  • α is the learning rate.
  • ∂J(w,b)/∂w and ∂J(w,b)/∂b are the gradients of the cost function with respect to the weights and bias.

We will implement the gradient descent algorithm that updates the weights and bias until the cost function converges.

Code Sample:

def gradient_descent(X, y, w, b, learning_rate, epochs):
    m = len(X)
    cost_history = []

    for epoch in range(epochs):
        # Compute the predictions
        y_pred = np.dot(X, w) + b

        # Gradients of the MSE cost with respect to the weights and bias
        dw = (2 / m) * np.dot(X.T, (y_pred - y))
        db = (2 / m) * np.sum(y_pred - y)

        # Update weights and bias
        w -= learning_rate * dw
        b -= learning_rate * db

        # Record the cost at each step
        cost = compute_cost(X, y, w, b)
        cost_history.append(cost)

        # Print the cost every 100 epochs
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: Cost {cost}")

    return w, b, cost_history

Explanation:

  • dw and db are the gradients of the cost function with respect to the weights and bias.
  • The weights and bias are updated using the gradient descent update rule.
  • The cost is calculated at each epoch and stored in cost_history for later plotting.

2.5 Training the Model

Now that we’ve defined the cost function and gradient descent, we can train the linear regression model on the dataset.

Code Sample:

# Initialize parameters
w = np.random.randn(1)  # Initial weight
b = np.random.randn()   # Initial bias
learning_rate = 0.01
epochs = 1000

# Train the model
w, b, cost_history = gradient_descent(X, y, w, b, learning_rate, epochs)

# Print the final weight and bias
print(f"Final weight: {w}")
print(f"Final bias: {b}")


2.6 Visualizing the Cost Function

To understand how the cost function decreases over time, we can plot the cost history.

Code Sample:

plt.plot(cost_history)
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.title('Cost Function Convergence')
plt.show()

This plot will show how the model converges towards the minimum cost over the epochs.


2.7 Making Predictions

Once the model is trained, we can use it to make predictions on new data.

Code Sample:

def predict(X, w, b):
    return np.dot(X, w) + b

# Example prediction
X_new = np.array([6]).reshape(-1, 1)  # New data point
prediction = predict(X_new, w, b)
print(f"Prediction for input 6: {prediction}")


3. Evaluation of the Model

After training, it's important to evaluate the model’s performance. One common metric for regression tasks is R-squared (R²), which measures how well the model explains the variance in the data.

3.1 Implementing R² Score

The R² score is calculated as:

R² = 1 − Σ (ytrue − ypred)² / Σ (ytrue − ȳ)²

Where:

  • ytrue are the actual values.
  • ypred are the predicted values.
  • ȳ is the mean of the actual values.

Code Sample:

def r_squared(y_true, y_pred):
    ss_total = np.sum((y_true - np.mean(y_true))**2)
    ss_residual = np.sum((y_true - y_pred)**2)
    return 1 - (ss_residual / ss_total)

# Compute the R² score on the training data
y_pred = predict(X, w, b)
r2_score = r_squared(y, y_pred)
print(f"R² score: {r2_score}")


4. Conclusion

In this tutorial, we have implemented linear regression from scratch using Python and NumPy. We covered the following key steps:

  1. Cost function: We used Mean Squared Error (MSE) as the cost function.
  2. Gradient descent: We implemented the gradient descent algorithm to optimize the model parameters.
  3. Training the model: We trained the model using a synthetic dataset and visualized the convergence of the cost function.
  4. Making predictions: We demonstrated how to make predictions with the trained model.
  5. Evaluation: We introduced R² as a metric to evaluate the model’s performance.


By building the model from scratch, you gained a deeper understanding of how linear regression works and the importance of gradient descent in training machine learning models.


FAQs


1. What is the difference between supervised and unsupervised learning?

Answer: Supervised learning involves training a model on labeled data (input-output pairs), while unsupervised learning involves finding patterns or structures in data without labeled responses.

2. What is the purpose of cross-validation in machine learning?

Answer: Cross-validation is used to assess the model’s performance by training and testing it on different subsets of the data, helping to avoid overfitting and ensuring the model generalizes well to unseen data.
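For illustration, a minimal 5-fold cross-validation with scikit-learn might look like this (the estimator and dataset here are placeholders; any model and data could be substituted):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X_iris, y_iris = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_iris, y_iris, cv=5)
print(scores.mean())  # average accuracy across the 5 folds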

3. How does gradient descent work in machine learning?

Answer: Gradient descent is an optimization algorithm that iteratively adjusts the model’s parameters in the opposite direction of the gradient of the loss function, thereby minimizing the loss.

4. What is the "kernel trick" in SVM?

Answer: The kernel trick is a technique that allows SVMs to efficiently perform non-linear classification by mapping the input data into a higher-dimensional space where a linear hyperplane can be found.
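For example, switching a scikit-learn SVM from a linear to a non-linear (RBF) kernel is a one-line change; a small illustrative sketch on a toy non-linearly separable dataset:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X_moons, y_moons = make_moons(noise=0.1, random_state=0)  # non-linearly separable toy data
clf = SVC(kernel='rbf')  # RBF kernel implicitly maps the data to a higher-dimensional space
clf.fit(X_moons, y_moons)
print(clf.score(X_moons, y_moons))  # training accuracy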

5. How do decision trees handle overfitting?

Answer: Decision trees can overfit if they grow too deep, capturing noise in the data. This can be controlled by limiting the depth of the tree or by pruning the tree after it has been built.

6. What is the main advantage of using a Random Forest over a single Decision Tree?

Answer: A Random Forest aggregates the predictions of multiple decision trees, which reduces variance and overfitting compared to using a single decision tree.

7. What is the intuition behind KNN?

Answer: KNN classifies data points based on the majority class of their K nearest neighbors in the feature space, using a distance metric like Euclidean distance.

8. How do you select the value of K in KNN?

Answer: The value of K is selected through experimentation or by using cross-validation. A small K may lead to overfitting, while a large K may underfit the model.
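In practice, one common approach is to score a few candidate values of K with cross-validation and keep the best one (an illustrative sketch using scikit-learn, with the Iris dataset as a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9]:
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_iris, y_iris, cv=5).mean()
    print(f"K={k}: mean CV accuracy = {acc:.3f}")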

9. What are the advantages of SVM for classification?

Answer: SVMs are effective in high-dimensional spaces, handle non-linear data well using the kernel trick, and are less prone to overfitting compared to other classifiers like decision trees.

10. What is the difference between classification and regression problems?

Answer: Classification problems involve predicting discrete labels (e.g., classifying images as cats or dogs), while regression problems involve predicting continuous values (e.g., predicting house prices).