Fisher’s Linear Discriminant

Training a classifier consists of finding the weight vector that best separates the data, and how you define this separation is what distinguishes classifiers from one another. In a least-squares classifier, we find the weight vector that minimizes the mean squared error, typically via an iterative optimization algorithm such as stochastic gradient descent. Fisher’s linear discriminant instead separates the data based on the class distributions rather than adapting the weight vector with each data point. It can be used as a supervised learning classifier: given labeled data, it finds a set of weights that defines a decision boundary for classifying the data. Specifically, Fisher’s linear discriminant looks for the vector that maximizes the separation between the classes of the projected data. Because maximizing “separation” on its own is ambiguous, Fisher’s criterion makes it precise: maximize the distance between the projected class means while minimizing the projected within-class variance.
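To state that criterion concretely (in standard notation the post leaves implicit): for a projection y = wᵀx, let m₁ and m₂ be the projected class means and s₁² and s₂² the projected within-class variances. Fisher’s criterion and its well-known closed-form maximizer are

```latex
J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2},
\qquad
\mathbf{w}^{\star} \propto \mathbf{S}_W^{-1}\,(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)
```

where μ₁ and μ₂ are the class means in the original space and S_W is the within-class scatter matrix. A large numerator pushes the projected means apart; a small denominator keeps each projected class tight.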

 

Our mission

Let’s set up a problem; if you can complete it, you’ll understand Fisher’s linear discriminant!

https://miro.medium.com/v2/resize:fit:414/1*Kn2Jf0QkNtmwgz253hdIIw.png

Here are two bivariate Gaussians with identical covariance matrices and distinct means. We want to find the vector that best separates the projections of the data. Let's draw a random vector and plot the projections.
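As a minimal sketch of this setup in Python (the means, covariance, and sample sizes are illustrative choices, not values taken from the figures), we can generate the two Gaussians and project them onto a random unit vector:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Two bivariate Gaussians with identical covariance matrices and distinct means
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=500)
class_b = rng.multivariate_normal([3.0, 0.0], cov, size=500)

# A projection is just a dot product: project every point onto a random unit vector
w = rng.normal(size=2)
w /= np.linalg.norm(w)
proj_a, proj_b = class_a @ w, class_b @ w

# Histogram of the 1-D projections of each class
plt.hist(proj_a, bins=30, alpha=0.5, label="class A")
plt.hist(proj_b, bins=30, alpha=0.5, label="class B")
plt.legend()
plt.show()
```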

https://miro.medium.com/v2/resize:fit:778/1*e2xzPGfWCQjGugupJgcEww.png

Remember, we are looking at projections of the data onto the vector (the dot product of the data matrix with the weight vector), not at a decision boundary. The projections of the data onto this random weight vector can be plotted as a histogram (image on the right). As you can see, when we project the data onto this vector and draw the histogram, the two classes aren’t well separated. The goal is to find the line that best separates the two distributions in the image on the right. As a first attempt, we could maximize the distance between the projected means, so that the distributions are, on average, as far from each other as possible. Let’s draw a line between the two means and plot the histogram of the projections onto that line.
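A sketch of that first attempt, repeating the same illustrative data so the snippet runs on its own:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=500)
class_b = rng.multivariate_normal([3.0, 0.0], cov, size=500)

# Point the weight vector from one sample mean to the other and project onto it
m_a, m_b = class_a.mean(axis=0), class_b.mean(axis=0)
w = (m_b - m_a) / np.linalg.norm(m_b - m_a)
proj_a, proj_b = class_a @ w, class_b @ w

# The projected means are now as far apart as a unit vector allows,
# but each class can still spread widely around its projected mean
print(proj_a.mean(), proj_b.mean(), proj_a.std(), proj_b.std())
```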

https://miro.medium.com/v2/resize:fit:770/1*1inGh5D86sPK5Dce_XcbVA.png

That’s quite a bit better, but the projections of the data are not fully separated yet. To improve the separation, Fisher’s linear discriminant minimizes the within-class variance of the projections at the same time as it maximizes the distance between the projected means. It pushes the projected means apart, as we discussed before, but it also makes each class’s projected distribution as tight as possible. This allows for better separation, as you’ll see below.
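Concretely, this is where the within-class scatter matrix S_W comes in: the maximizing direction is w ∝ S_W⁻¹(m₂ − m₁), as sketched below on the same illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=500)
class_b = rng.multivariate_normal([3.0, 0.0], cov, size=500)
m_a, m_b = class_a.mean(axis=0), class_b.mean(axis=0)

# Within-class scatter: the summed (unnormalized) covariance of each class
S_w = (np.cov(class_a.T) * (len(class_a) - 1)
       + np.cov(class_b.T) * (len(class_b) - 1))

# Fisher's direction: rescale the mean difference by the inverse within-class scatter
w = np.linalg.solve(S_w, m_b - m_a)
w /= np.linalg.norm(w)
proj_a, proj_b = class_a @ w, class_b @ w

# Spread is smaller along this direction than along the mean-difference direction
print(proj_a.std(), proj_b.std())
```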

https://miro.medium.com/v2/resize:fit:875/1*Zj8OBbYnyBZZib8jquWkIA.png

As you can see, the projections of the data are now well separated. We can take a vector orthogonal to the weight vector to create a decision boundary: points on either side of the boundary are predicted to belong to one class or the other. For multivariate Gaussian distributions with identical covariance matrices, this yields an optimal classifier.
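Putting the pieces together, here is a minimal end-to-end sketch of that classifier (same illustrative data; thresholding at the midpoint of the projected means assumes equal class priors):

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=500)
class_b = rng.multivariate_normal([3.0, 0.0], cov, size=500)

# Fisher's direction, as in the previous sketch
m_a, m_b = class_a.mean(axis=0), class_b.mean(axis=0)
S_w = (np.cov(class_a.T) * (len(class_a) - 1)
       + np.cov(class_b.T) * (len(class_b) - 1))
w = np.linalg.solve(S_w, m_b - m_a)

# The boundary {x : w @ x = threshold} is a line orthogonal to w;
# the midpoint of the projected means is optimal for equal class priors
threshold = ((class_a @ w).mean() + (class_b @ w).mean()) / 2

def predict(x):
    """Classify a 2-D point as 'A' or 'B' by which side of the boundary it falls on."""
    return "B" if x @ w > threshold else "A"

print(predict(np.array([0.5, 0.2])))   # near class A's mean -> 'A'
print(predict(np.array([2.8, 0.3])))   # near class B's mean -> 'B'
```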

 

Conclusion

Fisher's Linear Discriminant is a supervised learning classifier that separates data based on their underlying distributions. It finds the vector that maximizes the separation between classes by maximizing the distance between the projected class means and minimizing the projected within-class variance; minimizing that variance tightens the projected distributions, allowing for even better separation between the classes. Once the projections are well separated, a vector orthogonal to the weight vector can be used to create a decision boundary, which predicts the class of new data points according to which side of it they fall on. Fisher's Linear Discriminant is particularly effective for multivariate Gaussian distributions with identical covariance matrices, for which it yields an optimal classifier. By understanding and applying these principles, we can improve our ability to classify data based on their underlying distributions.

References: https://towardsdatascience.com/fishers-linear-discriminant-intuitively-explained-52a1ba79e1bb

ISME student doing an internship with Hunnarvi Technologies Pvt Ltd under the guidance of Nanobi Data and Analytics. Views are personal.

#FisherLinearDiscriminant #SupervisedLearning #DataClassification #SeparationOfDistributions #MaximizingMeans #MinimizingVariance #OptimalClassifier #DecisionBoundary #MultivariateGaussian #DataProjection #InternationalSchoolofManagementExcellence #NanobiDataandAnalytics #hunnarvi
