SGD
STOCHASTIC GRADIENT DESCENT ALGORITHM
Introduction:
Gradient Descent is an iterative optimization technique that seeks the optimal value (minimum or maximum) of an objective function. It is one of the most widely used techniques for adjusting a model's parameters in machine learning projects in order to minimize a cost function.
Typically, there are three types of Gradient Descent (their shared update step is sketched after this list):
1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Mini-batch Gradient Descent
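All three variants share the same core step: compute the gradient of the cost with respect to the parameters and move in the opposite direction, scaled by the learning rate; they differ only in how much data is used to compute each gradient. Below is a minimal Python sketch of that shared update; the quadratic example function and the specific learning rate are illustrative assumptions, not part of any particular library.

import numpy as np

def gradient_descent(gradient_fn, start, learning_rate=0.1, n_iterations=100):
    # Generic descent loop shared by batch, stochastic, and mini-batch variants;
    # the variants differ only in what data gradient_fn is evaluated on.
    params = np.asarray(start, dtype=float)
    for _ in range(n_iterations):
        params = params - learning_rate * gradient_fn(params)
    return params

# Illustrative use: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), start=[0.0]))  # approximately [3.]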
What is Stochastic Gradient Descent?
Stochastic gradient descent is an optimization
algorithm often used in machine learning applications to find the model
parameters that correspond to the best fit between predicted and actual
outputs. It’s an inexact but powerful technique.
SGD has the benefit of being computationally efficient, particularly when working with huge datasets. Compared to conventional (batch) Gradient Descent, which requires processing the complete dataset on every iteration, using a single sample or a small batch considerably lowers the computing cost per iteration.
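To make that cost difference concrete, the rough sketch below compares one batch-gradient computation with one stochastic-gradient computation for a least-squares linear model; the synthetic data, array sizes, and squared-error cost are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100_000, 10               # hypothetical dataset size
X = rng.normal(size=(n_samples, n_features))
y = X @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_samples)
w = np.zeros(n_features)

# Batch gradient of the mean squared error: touches all 100,000 rows per iteration.
batch_grad = 2 * X.T @ (X @ w - y) / n_samples

# Stochastic gradient: the same quantity estimated from one randomly chosen row.
i = rng.integers(n_samples)
stochastic_grad = 2 * (X[i] @ w - y[i]) * X[i]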
Here's how the stochastic gradient descent algorithm works (a minimal implementation sketch follows this list):
1. Initialization: Randomly initialize the parameters of the model.
2. Set Parameters: Determine the number of iterations and the learning rate (alpha) for updating the parameters.
3. Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or reaches the maximum number of iterations:
a. Shuffle the training dataset to introduce randomness.
b. Iterate over each training example (or a small batch) in the shuffled order.
c. Compute the gradient of the cost function with respect to the model parameters using the current training example (or batch).
d. Update the model parameters by taking a step in the direction of the negative gradient, scaled by the learning rate.
e. Evaluate the convergence criteria, such as the change in the cost function between iterations.
4. Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations is reached, return the optimized model parameters.
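Putting the steps together, here is a minimal Python sketch of the loop above, applied to least-squares linear regression. The squared-error cost, the per-epoch convergence check, and the name sgd_linear_regression are illustrative assumptions; the numbered comments refer to the steps in the list.

import numpy as np

def sgd_linear_regression(X, y, learning_rate=0.01, max_iterations=50, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # 1. Initialization: randomly initialize the model parameters.
    w = rng.normal(size=n_features)
    prev_cost = np.inf
    # 2. Set parameters: learning_rate and max_iterations are the function arguments.
    # 3. Loop until convergence or the maximum number of iterations (epochs).
    for _ in range(max_iterations):
        # 3a. Shuffle the training data to introduce randomness.
        order = rng.permutation(n_samples)
        # 3b. Iterate over each training example in the shuffled order.
        for i in order:
            # 3c. Gradient of the squared error on this single example.
            error = X[i] @ w - y[i]
            grad = 2 * error * X[i]
            # 3d. Step in the direction of the negative gradient, scaled by the learning rate.
            w -= learning_rate * grad
        # 3e. Convergence check: change in the overall cost between epochs.
        cost = np.mean((X @ w - y) ** 2)
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    # 4. Return the optimized parameters.
    return w

Usage is simply w = sgd_linear_regression(X, y) for any feature matrix X of shape (n_samples, n_features) and target vector y; in practice the learning rate and number of epochs usually matter more than the exact convergence tolerance.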
In SGD, since only one sample from the dataset is chosen at random for each iteration, the path taken by the algorithm to reach the minimum is usually noisier than that of a typical (batch) Gradient Descent algorithm. This usually does not matter much: the exact path is unimportant as long as we reach the minimum, and SGD typically gets there with a significantly shorter training time.
Fig: Stochastic gradient descent optimization path
Conclusion:
Large-scale machine learning models frequently use SGD for training, especially when the full dataset does not fit in memory. SGD offers effective and scalable training by randomly sampling small subsets of the data. Because such models involve numerous parameters and data samples, stochastic (and mini-batch) gradient descent is frequently used in deep learning, for example to train neural networks.
Narsima Ahmed
@INTERNATIONAL SCHOOL OF MANAGEMENT EXCELLENCE
Intern @Hunnarvi Technologies under the guidance of Nanobi data and analytics pvt ltd.
Views are personal.
#gradientdescent #SGD #algorithm #nanobi #hunnarvi #ISME