SGD

 

STOCHASTIC GRADIENT DESCENT ALGORITHM

Introduction:

Gradient Descent is an iterative optimization technique that seeks the optimal value (minimum or maximum) of an objective function. It is one of the most widely used techniques in machine learning for adjusting a model's parameters in order to lower a cost function.
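
As a purely illustrative sketch (the cost function, starting point, and learning rate below are arbitrary choices, not from the original post), each gradient descent update moves the parameter a small step against the gradient of the cost:

# Toy illustration: gradient descent on the one-dimensional cost f(w) = (w - 3)**2,
# whose minimum is at w = 3.
def f_grad(w):
    return 2 * (w - 3)          # derivative of (w - 3)**2

w = 0.0                         # arbitrary starting point
alpha = 0.1                     # learning rate
for step in range(50):
    w = w - alpha * f_grad(w)   # move a small step against the gradient
print(w)                        # approximately 3.0, the minimizer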

Typically, there are three types of Gradient Descent, which differ in how much of the training data each update uses (a short sketch after this list illustrates the difference):

1.     Batch Gradient Descent

2.     Stochastic Gradient Descent

3.     Mini-batch Gradient Descent
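
As a rough sketch (the linear model, the mse_gradient helper, and the batch size of 32 below are illustrative assumptions, not part of the original post), the three variants differ only in how many samples go into each gradient estimate:

import numpy as np

# Illustrative placeholder: a linear model y ~ X @ w with a mean squared error cost.
def mse_gradient(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)
w = np.zeros(3)

# Batch gradient descent: each update uses the entire dataset.
grad_batch = mse_gradient(w, X, y)

# Stochastic gradient descent: each update uses one randomly chosen example.
i = rng.integers(len(y))
grad_stochastic = mse_gradient(w, X[i:i+1], y[i:i+1])

# Mini-batch gradient descent: each update uses a small random subset.
idx = rng.choice(len(y), size=32, replace=False)
grad_minibatch = mse_gradient(w, X[idx], y[idx])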

What is Stochastic Gradient Descent?

Stochastic gradient descent is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs. It’s an inexact but powerful technique.

SGD has the benefit of being computationally efficient, particularly when working with large datasets. Compared to conventional Gradient Descent, which requires processing the complete dataset for every parameter update, using a single sample or a small batch considerably lowers the computational cost per iteration.

Here's how the stochastic gradient descent algorithm works (a minimal code sketch follows the steps below):

1.     Initialization: Randomly initialize the parameters of the model.

2.     Set Parameters: Determine the number of iterations and the learning rate (alpha) for updating the parameters.

3.     Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or reaches the maximum number of iterations: 

        a. Shuffle the training dataset to introduce randomness.

        b. Iterate over each training example (or a small batch) in the shuffled order.

        c. Compute the gradient of the cost function with respect to the model parameters using the current training example (or batch).

        d. Update the model parameters by taking a step in the direction of the negative gradient, scaled by the learning rate.

        e. Evaluate the convergence criteria, for example the change in the cost function between successive iterations.

4.     Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations is reached, return the optimized model parameters.
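
A minimal sketch of the loop above, assuming a linear model trained with a mean-squared-error cost; the learning rate, epoch count, and tolerance are arbitrary illustrative values:

import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, n_epochs=50, tol=1e-6, seed=0):
    """Fit y ~ X @ w with stochastic gradient descent (one example per update)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = rng.normal(size=n_features)            # step 1: random initialization
    prev_cost = np.inf
    for epoch in range(n_epochs):              # step 3: SGD loop
        order = rng.permutation(n_samples)     # step 3a: shuffle the data
        for i in order:                        # step 3b: one example at a time
            xi, yi = X[i], y[i]
            grad = 2 * xi * (xi @ w - yi)      # step 3c: gradient on this example
            w -= alpha * grad                  # step 3d: step against the gradient
        cost = np.mean((X @ w - y) ** 2)       # step 3e: convergence check
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return w                                   # step 4: optimized parameters

# Tiny usage example on synthetic data: the learned weights come out close
# to the true values [3.0, -1.5].
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -1.5]) + rng.normal(scale=0.1, size=500)
print(sgd_linear_regression(X, y))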

 

In SGD, since only one sample from the dataset is chosen at random for each iteration, the path the algorithm takes toward the minimum is usually noisier than in standard (batch) Gradient Descent. This usually does not matter: the exact path is irrelevant as long as the algorithm reaches the minimum, and it typically does so with a significantly shorter training time.
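
To make that noise concrete, here is a small self-contained illustration (the toy data and linear model are assumptions for demonstration only): single-example gradients scatter around the full-batch gradient, which is why the SGD path wanders.

import numpy as np

# Toy data: single-example gradients are noisy estimates of the full-batch gradient.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)
w = np.zeros(2)

full_grad = 2 * X.T @ (X @ w - y) / len(y)     # gradient over the whole dataset
print("full-batch gradient:", full_grad)

for i in rng.integers(len(y), size=3):          # three random single examples
    xi, yi = X[i], y[i]
    print("single-example gradient:", 2 * xi * (xi @ w - yi))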

Fig: stochastic gradient descent optimization path

 

Conclusion:

Large-scale machine learning models frequently use SGD for training, especially when the full dataset does not fit in memory. By working on small, randomly selected portions of the data, SGD offers efficient and scalable training. Because of the large number of parameters and data samples involved, stochastic gradient descent is frequently used in deep learning, for example when training neural networks.


 

Narsima Ahmed

@INTERNATIONAL SCHOOL OF MANAGEMENT EXCELLENCE

Intern @Hunnarvi Technologies under the guidance of Nanobi Data and Analytics Pvt Ltd.

Views are personal.

#gradientdescent #SGD #algorithm #nanobi #hunnarvi #ISME

 
