Introduction to unsupervised learning

Machine learning is a technique where an algorithm learns hidden patterns from labeled or unlabeled data. I am emphasizing two words here: labeled and unlabeled. Because they express the meaning of supervised and unsupervised learning. If a machine learns patterns from unlabeled data, it is called unsupervised learning. If it learns from labeled data, it is called supervised learning. In this post, we will be discussing unsupervised learning.

First, we have to understand what unlabeled data is. Let’s take an example here: a collection of people’s data including their height, weight, sleep hours, sugar level, and anxiety level. Here, we do not have any labels like “does this person have anxiety or not.” In this dataset, an unsupervised algorithm will find hidden patterns between these features.

Let’s learn through code. I will be using the kaggle dataset here link

We have this data from kaggle. Then we drop all the columns except two columns

newdata = df[['weekly_self_study_hours', 'math_score']]

Then we perform k means clustering on the above modified data

from scipy.cluster.vq import whiten, kmeans, kmeans2,vq


data = newdata.values 
# Set the number of clusters
data = whiten(data)
k = 5

# Calculate the centroids and cluster assignments
centroids, _ = kmeans(data, k)

# Calculate cluster assignments for each data point
cluster_assignments, _ = kmeans2(data, centroids)

cluster_labels, _ = vq(data, centroids)

# Add the cluster labels back to the dataframe
newdata['cluster'] = cluster_labels

print(newdata)

So we get the above clusters according to their distance

In this post I have discussed the aspect of unsupervised learning. This post is not about K means clustering algorithm

Introduction to unsupervised learning

Understanding the Foundations and Applications of unsupervised learning through small project