What is Gradient Descent in Machine Learning?
Gradient descent (GD) is an optimization algorithm in machine learning for finding the optimal values of the weights in neural networks. It works by iteratively updating these weights in the direction of steepest descent of the loss function, that is, against its gradient, until the loss reaches a minimum.
In this article we’re going to dive deeper into the concept of gradient descent and its importance in machine learning.
How does it work?
GD plays a crucial role in the training process of a neural network. But let’s take a step back first and see where it comes in.
Before we can start optimizing the weights, we need to initialize them. We usually do this by setting them to random values drawn from a Gaussian distribution. After initialization we can start processing the training data.
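As a minimal sketch (using NumPy, with layer sizes chosen purely for illustration), such an initialization might look like this:

```python
import numpy as np

# Hypothetical layer sizes, chosen only for illustration
n_inputs, n_outputs = 784, 128

# Small random weights drawn from a Gaussian distribution, biases set to zero
weights = 0.01 * np.random.randn(n_inputs, n_outputs)
biases = np.zeros(n_outputs)
```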
Each training sample, or batch of training samples, triggers the backpropagation algorithm, which uses GD to slightly adjust the weights and lower the loss function.
In other words, GD acts as a compass that guides the backpropagation algorithm to iteratively adjust the weight values. This process repeats until the loss function, which is a measure of error, reaches its minimum.
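To make the idea concrete, here is a rough sketch of the bare update rule on a toy linear model with a mean squared error loss. The data, the learning rate and the number of steps are assumptions made just to keep the example self-contained; this is not any particular library's API:

```python
import numpy as np

def gradient(w, X, y):
    # Gradient of the mean squared error of a linear model X @ w with respect to w
    return 2 * X.T @ (X @ w - y) / len(y)

# Toy data and learning rate, assumed purely for illustration
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5])
w = np.random.randn(3)          # random (Gaussian) initialization
learning_rate = 0.1

for step in range(100):
    # Move the weights a small step in the direction of steepest descent
    w -= learning_rate * gradient(w, X, y)
```

Every iteration nudges the weights a little further downhill, which is exactly the "compass" behavior described above.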
Types of GD algorithms
The types we're going to mention here differ only in the number of training examples each iteration processes.
Mini batch gradient descent
This is the most common type of GD algorithm we use today, because of its efficiency. In essence, it takes a batch of training examples and updates the weights based on the average gradient over that batch.
It is efficient because it takes fewer calculations to update the weights in each epoch, without hurting the end performance of the model.
In other words, we update the weights fewer times, because we do it once per batch of examples rather than once per individual example.
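Here is a minimal sketch of what a mini-batch update loop might look like on the same kind of toy linear model; the batch size, learning rate and data are arbitrary assumptions for illustration:

```python
import numpy as np

def gradient(w, X, y):
    # Average gradient of the mean squared error over the given examples
    return 2 * X.T @ (X @ w - y) / len(y)

# Toy data; batch size and learning rate are arbitrary assumptions
X = np.random.randn(1000, 3)
y = X @ np.array([1.0, -2.0, 0.5])
w = np.random.randn(3)
learning_rate, batch_size = 0.1, 16

for epoch in range(10):
    perm = np.random.permutation(len(y))        # shuffle each epoch
    for start in range(0, len(y), batch_size):
        batch = perm[start:start + batch_size]
        # One weight update per batch, based on the batch's average gradient
        w -= learning_rate * gradient(w, X[batch], y[batch])
```

Shuffling before each epoch keeps the batches varied, so the averaged gradients stay representative of the whole dataset.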
Stochastic gradient descent
This type of GD updates the weights after each individual training example. It therefore requires more calculations and more time per epoch. In addition, the updates it makes to the weights can be noisy due to the randomness of the individual examples.
However, it is simpler to understand when we're first getting familiar with the training process of neural networks. It can also be a better choice when we're limited by the computing power of our machine, for example with extremely complicated neural network architectures.
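For contrast, here is a sketch of the stochastic variant on the same kind of toy model, with one weight update per individual example; the smaller learning rate is an assumption, chosen because per-example updates are noisier:

```python
import numpy as np

def gradient(w, x, y):
    # Gradient of the squared error for a single example x
    return 2 * x * (x @ w - y)

# Same toy linear model as above, assumed purely for illustration
X = np.random.randn(1000, 3)
y = X @ np.array([1.0, -2.0, 0.5])
w = np.random.randn(3)
learning_rate = 0.01

for epoch in range(10):
    for i in np.random.permutation(len(y)):
        # One weight update per individual training example
        w -= learning_rate * gradient(w, X[i], y[i])
```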
Conclusion
To conclude, gradient descent is an essential optimization algorithm in machine learning that we use to find the best set of weight values in neural networks. It works by updating the weights iteratively, bit by bit, following the gradient of the loss function.
I hope this article helped you gain a better understanding of gradient descent algorithms and perhaps even inspired you to learn more.