What is Classification in Machine Learning

Classification is a core task in machine learning where the goal is to predict which category or class a data point belongs to, based on labeled training data. Common examples include filtering spam email, detecting credit card fraud, and diagnosing diseases.

It’s considered a type of supervised learning, since the model learns from data that is already labeled with the correct classes. This is in contrast to unsupervised learning where the data has no pre-existing labels.

Key concepts in classification

To understand how classification works, there are a few important concepts to know.


Features

The input data used to make predictions is represented by features. Features are the measurable properties or attributes of the data points. For example, when classifying emails, the features might include the length of the email, the presence of certain keywords, and attributes of the sender.
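As a minimal sketch of the email example, the snippet below turns one message into a numeric feature vector. The function name `extract_features`, the keyword list, and the trusted-domain check are all hypothetical choices made for illustration, not part of any library.

```python
# Hypothetical keyword list and sender rule, chosen only for illustration.
SPAM_KEYWORDS = ("free", "winner", "urgent")
TRUSTED_DOMAIN = "@example.com"

def extract_features(subject: str, body: str, sender: str) -> list[float]:
    """Turn one email into a list of numeric features."""
    text = f"{subject} {body}".lower()
    return [
        float(len(body)),                                  # length of the email body
        float(sum(text.count(k) for k in SPAM_KEYWORDS)),  # number of keyword hits
        1.0 if sender.endswith(TRUSTED_DOMAIN) else 0.0,   # attribute of the sender
    ]

features = extract_features(
    subject="You are a WINNER",
    body="Claim your free prize now",
    sender="promo@unknown.test",
)
print(features)  # [25.0, 2.0, 0.0]
```

Each position in the list is one feature, and a classifier would see every email as a vector of this shape.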


Labels

Labels are the categories or classes we want to predict. In a binary classification problem, there are two possible labels, such as yes/no, true/false, or spam/not spam. In multi-class problems, there are three or more possible labels.
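Most libraries work with numeric labels internally, so text labels are often mapped to integers first. The sketch below shows one simple way to do that in plain Python; the example labels and variable names are assumptions made for illustration.

```python
# Example text labels for five emails (illustrative data).
labels = ["spam", "not spam", "spam", "not spam", "not spam"]

# Sort the distinct labels so the mapping is stable from run to run.
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}

# Encode each text label as its integer index.
y = [class_to_index[name] for name in labels]

print(classes)  # ['not spam', 'spam']
print(y)        # [1, 0, 1, 0, 0]
```

With two distinct classes this is a binary problem; a multi-class problem would simply produce more entries in `classes`.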

Training data

The model learns to make predictions using training data – a dataset with many examples where the features and correct label are already known. In other words, it finds patterns in how the features map to the labels.


Prediction

Once trained, the model can take in the features for a new, unlabeled data point and output a predicted label. The goal is to generalize from the training data, so that predictions are also correct on data the model has never seen.
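The train-then-predict cycle can be sketched in a few lines with scikit-learn. The four training emails and their two features (body length and keyword hits) are made-up values for illustration, and a decision tree is used only as a convenient stand-in for any classifier.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [body length, keyword hits] for four emails.
X_train = [[120, 0], [300, 1], [80, 5], [60, 7]]
y_train = ["not spam", "not spam", "spam", "spam"]

# Training: the model looks for patterns that map features to labels.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Prediction: two new emails whose labels are unknown to the model.
X_new = [[200, 0], [50, 6]]
predictions = model.predict(X_new)
print(list(predictions))
```

Neither new email appears in the training data, so a sensible answer here depends on the model having generalized from the four examples it saw.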

Types of classification algorithms

Many different machine learning algorithms can be used for classification, each with its own strengths and weaknesses.

Logistic regression

Logistic regression predicts the probability of a data point belonging to a certain class. It works well when the relationship between features and labels is roughly linear.
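To make the "predicts the probability" part concrete, here is a small sketch using scikit-learn's `LogisticRegression` on toy one-dimensional data. The data values are assumptions chosen so that the two classes are clearly separated.

```python
from sklearn.linear_model import LogisticRegression

# Toy data (illustrative): one feature, class 0 for small values, class 1 for large.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one row per input and one column per class.
new_points = [[0.5], [4.5]]
probabilities = model.predict_proba(new_points)
for point, row in zip(new_points, probabilities):
    print(f"x={point[0]}: P(class 0)={row[0]:.2f}, P(class 1)={row[1]:.2f}")
```

The predicted label is simply the class with the higher probability, which is what `model.predict` would return.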

Decision trees

Decision trees learn a series of if/else questions to ask about the features that lead to a predicted label. They can learn non-linear relationships but may overfit the training data.
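The if/else questions a tree learns can be printed directly. The sketch below fits a shallow tree on scikit-learn's built-in iris data and prints its rules with `export_text`; limiting `max_depth` to 2 is an assumption made here to keep the output short, and it is also one common way to reduce overfitting.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree: at most two questions are asked before predicting a label.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line in the output is one threshold test on a feature, and each leaf line names the predicted class.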

K-nearest neighbors

K-nearest neighbors makes a prediction by finding the most similar labeled data points to a new point and having them “vote” on the label. It’s simple but can be slow with a large dataset.
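The voting idea is simple enough to sketch without a library. The function `knn_predict` and the toy points below are illustrative assumptions; a real project would more likely use an existing implementation such as scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors."""
    # Distance from the query to every training point: this full scan is
    # why the method gets slow as the dataset grows.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Two small clusters of 2-D points (made-up data).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2.0, 2.0)))  # a
print(knn_predict(train_points, train_labels, (6.0, 5.0)))  # b
```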

Neural networks

Neural networks can learn very complex non-linear relationships between features and labels. They require a lot of training data and can be computationally expensive but achieve state-of-the-art performance on many problems.
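For a sense of what this looks like in code, the sketch below trains a small neural network with scikit-learn's `MLPClassifier` on the built-in iris data. The layer size, iteration limit, and feature scaling step are assumptions chosen for a quick demonstration, not tuned settings, and real neural network workloads usually rely on dedicated deep learning libraries.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale the features first; neural networks tend to train better on
# inputs with similar ranges. One hidden layer of 16 units is used here.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X, y)

# Accuracy on the training data itself, which overstates how well the
# model would do on new flowers.
train_accuracy = model.score(X, y)
print(f"Training accuracy: {train_accuracy:.3f}")
```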

Evaluating classification models

An important aspect of building classification models is evaluating their performance to see how well they generalize to new data.

Some standard evaluation metrics include:

  • Accuracy: what percent of predictions were correct?
  • Precision: what percent of positive predictions were actually correct?
  • Recall: what percent of actual positives were correctly predicted?
  • F1 score: the harmonic mean of precision and recall

It’s also useful to look at the confusion matrix, which shows a breakdown of correct and incorrect predictions for each class.
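To show how these metrics relate to the confusion matrix, here is a small sketch that computes them by hand for a binary problem. The function `binary_metrics` and the example labels are assumptions made for illustration; scikit-learn provides equivalent functions in `sklearn.metrics`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1 and the 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are actual classes, columns are predicted classes.
        "confusion": [[tn, fp], [fn, tp]],
    }

# Ten made-up examples: four actual positives, six actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```

Here the accuracy is 0.7, yet recall is only 0.5 because half of the actual positives were missed, which is why accuracy alone can be misleading.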

Improving classification models

If a classification model isn’t performing well, there are a number of things you can try to improve it:

  • Adding more training data, especially for classes that are underrepresented or have poor performance
  • Selecting more informative features or constructing new features
  • Choosing a different classification algorithm that may be a better fit for the data
  • Tuning the hyperparameters of the algorithm, like the k in k-nearest neighbors or the depth of a decision tree
  • Using ensemble methods that combine predictions from multiple models
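As one example from the list above, hyperparameter tuning can be sketched with scikit-learn's `GridSearchCV`, which tries each candidate value and scores it with cross-validation. The candidate values for k and the use of 5 folds are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for k in k-nearest neighbors (illustrative choices).
candidate_k = [1, 3, 5, 7, 9]

# Each candidate is evaluated with 5-fold cross-validation.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": candidate_k},
    cv=5,
)
search.fit(X, y)

print("Best k:", search.best_params_["n_neighbors"])
print(f"Mean cross-validated accuracy: {search.best_score_:.3f}")
```

The same pattern applies to other hyperparameters, such as `max_depth` for a decision tree, by changing the estimator and the parameter grid.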

Example: Classifying iris flowers

To illustrate these concepts, let’s walk through an example of building a classification model to identify species of iris flowers based on measurements of their petals and sepals. This is a classic dataset used for demonstrating machine learning methods.

The data

The iris dataset contains measurements in centimeters of the sepal length, sepal width, petal length, and petal width for 150 iris flowers, 50 each of three different species – setosa, versicolor, and virginica.

Here is a sample of the data:

Sepal Length | Sepal Width | Petal Length | Petal Width | Species

In this case, the sepal and petal measurements are the features, and the species is the label we want to predict.
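If you want to inspect the features and labels yourself, the following sketch loads the copy of the dataset bundled with scikit-learn and prints the first few rows. Printing three rows is an arbitrary choice.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Feature names (the columns) and the three species names (the labels).
print(iris.feature_names)
print(list(iris.target_names))

# First three flowers: four measurements in cm, followed by the species.
for measurements, species_index in zip(iris.data[:3], iris.target[:3]):
    print(measurements.tolist(), iris.target_names[species_index])
```

`iris.data` holds the features as a 150 by 4 array, and `iris.target` holds the labels as integers that index into `iris.target_names`.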

Choosing and training a model

For this example, let’s use a decision tree classifier. We’ll also use the scikit-learn library in Python to train the model.

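from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load the data: X holds the four measurements, y the species labels
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 20% of the flowers for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Initialize the classifier and train it on the training set
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)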

This code loads the iris data, splits it into training and testing sets, initializes a DecisionTreeClassifier, and trains it on the training data.

Making predictions

Now we can use the trained model to make predictions on the held-out test data:

y_pred = dt.predict(X_test)

The model takes in the sepal and petal measurements for the test set flowers and outputs its predicted species for each one.

Evaluating the model

Finally, we can evaluate how well the model performed.

from sklearn.metrics import accuracy_score, confusion_matrix

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')

cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix:')
print(cm)

This prints the accuracy of the model and the confusion matrix. In this run the accuracy was 0.967, meaning the model correctly predicted the species for 29 of the 30 test set flowers. Because train_test_split shuffles the data randomly, the exact number will vary from run to run.
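Accuracy is a single number, so it can also help to look at precision, recall, and F1 score for each species. The sketch below uses scikit-learn's `classification_report` for that. It is self-contained and fixes `random_state` and `stratify` so the split is repeatable; those settings are assumptions that are not part of the walkthrough above, so its numbers will not necessarily match the 0.967 quoted there.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Fixed seed and stratified split (assumptions) so every run uses the same 30 test flowers.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

dt = DecisionTreeClassifier(random_state=0)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)

# One row per species with precision, recall, F1 score and support.
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```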


Conclusion

Classification is a powerful application of machine learning for predicting categories based on patterns learned from labeled data. There are many classification algorithms to choose from, and the one that performs best will depend on the specific characteristics of your data and problem.

The most important things are to have informative features that relate to the labels, an appropriate amount of training data, and a way to properly evaluate your models. By iterating on these pieces, you can build classification models to tackle a wide variety of real-world problems, from detecting credit card fraud to diagnosing diseases.
