Python Tutorials: Hierarchical Clustering in Python

In this part of Learning Python we Cover Hierarchical Clustering Technique In Python
Written by Paayi Tech |17-Oct-2020 | 0 Comments | 1525 Views

Hierarchical clustering is another form unsupervised form learning. It is also called hierarchical cluster analysis. It is an algorithm that groups similar objects into groups called clusters. The endpoint is set of a cluster, where each cluster is distinct from each other cluster, the object within a cluster are similar to one another and have the minimum distance between them.

Hierarchical clustering builds a tree-based hierarchical taxonomy that is known as a dendrogram from the set of feature vector x.

Figure 1

A dendrogram is a tree that shows how clusters are merged/split hierarchically. In the dendrogram, each node on the tree is a cluster, and each leaf node is a singleton cluster.



There are two approaches to solve the hierarchical clustering:

  • Agglomerative Clustering Algorithm
  • Divisive Clustering Algorithm


Agglomerative Clustering Algorithm:

This is the bottom-up approach of a hierarchical clustering algorithm. Initially, all the data of feature vector x is a single cluster. Then it merges the two closest cluster by computing the distance matrix and repeat this procedure until only single clusters remain.


Divisive Clustering Algorithm:

It is the second approach and opposite to the first one, but the result of making a tree of a cluster is the same. It is a top-down approach. Initially, all the data is a single cluster and recursively split the data until only one feature as a cluster remains.

Figure 2


How Does It Work?

As mentioned above, each cluster is merged based on the distance matrix. However, there are many variations of finding the closest pair of clusters.

Closest Points between Clusters: The distance between two clusters is defined as the smallest distance between a pair of the data points within the clusters. This is also known as the single linkage.

Using the maximum similarity pair by the formula, we get:


59_Machine Learning (Hierarchical Clustering) 9.png

Figure 3

Single linkage clusters form thin and sleek clusters due to the chaining effect.


Farthest Point between Clusters: The distance between two clusters is defined as the most considerable distance between a pair of the data points within the clusters. This is also known as the complete linkage.

Figure 4

Using minimum similarity pairs by the formula :

paayi python tutorials formula

It makes the tighter and spherical clusters, which are preferable.



Average Pairwise distance Between Clusters: As its names indicate. The distance between two clusters is defined by the average distance between a pair of data points within the clusters.

  1. Ward's Method: The similarity of two clusters is based on the increase in squared error when two clusters are merged. It is similar to the group average if the distance between points is distance squared.

When to use When:

  • A single link can handle non-elliptical data but sensitive to noisy data.
  • The full link is less susceptible to noise and outliers but is biased toward globular structure and tends to break large clusters.
  • The group average is generalized for both single and complete. It is less susceptible to noise and outliers and is biased toward globular clusters.



Suppose we have a data set given as below:

x = {(0,0), (1,2), (2,1), (4,1), (5,0), (5,3)}



if we plot the data, it looks like this:

Figure 5

1st  we will make the distance matrix of all the given feature and will take with the minimum.

Suppose the distance of (1,2) and (2,1) will be (1.5, 1.5) and update the distance matrix.

Figure 6


Respectively we will find minimum value from the distance matrix again this time take (4,1) and (5,0)

The distance of both the points will be (4.5, 0.5), and a new cluster will be formed:

Figure 7

Again this procedure now we will take an average of (1,2),(2,1), and (0,0) the output will be (1, 1)

Figure 8


Figure 9

In the end, it will form one cluster, and the dendrogram will look like the following tree:

Figure 10

Use Cases:

  • Crime Location Identification
  • Document Classification
  • Search Optimization
  • Call Record Data analysis.


Advantage of Hierarchical Clustering:

  • Do not have to assume the number of clusters. Any desired number of a cluster can be obtained by cutting the dendrogram at a proper level.
  • It corresponds to better and meaningful taxonomies, which provide a better search function.


A disadvantage of Hierarchical Clustering:

  • High in time complexity
  • No objective function
  • Different schemes have different problems
    - sensitive to noise
    -biased toward globular cluster
    -breaking large clusters.
  • Once the decision is made to form two clusters, it cannot be undone. ​​​​​​



It can be improved by using it with other algorithms named below:

  • CURE (Clustering using Representative)
  • BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies )
  • ROCK (Robust Clustering using Links)
  • CHAMELEON Algorithm
  • Linkage Algorithm
  • Leaders-Subleaders
  • Bisecting K-Means



Hierarchical clustering is a method that seeks to build a hierarchy of clusters. It is majorly used in clustering like Google news, Amazon Search, etc. It is giving a high accuracy but with much more time complexity. It is a tradeoff between good accuracy to time complexity. The quality of a simple hierarchical clustering method suffers from its inability to perform an adjustment; once merge, it cannot be split again. However, one thing is that it can be improved by using the algorithms mentioned above. 


Practical Implementation of Hierarchical Clustering:

import matplotlib.pyplot as plt 

import NumPy as np 

import scipy.cluster.hierarchy as shc

from sklearn.cluster import AgglomerativeClustering



First of all, import all the modules. In this, we have imported Matplotlib to plot the data to know what clusters we are going to make. The NumPy is imported to convert the data into a NumPy array before feeding the data to the machine learning algorithm. Then we have imported hierarchy, and this module is actually to plot the dendrograms. These dendrograms will help us to know how many clusters we should make. We have already seen how dendrogram looks, but now we will use it on real data.

This module will make the model fit to hierarchical clustering.

import matplotlib.pyplot as plt

import pandas as pd 

import NumPy as np 

import scipy.cluster.hierarchy as shc

from sklearn.cluster import AgglomerativeClustering


 And now, we will initialize random data.

X = np.array([[5,3], 










The array is initialized as a NumPy array. We can see the data by plotting.




59_Machine Learning.png paayi tutorials

Figure 11

The data is too simple. We can see how many clusters can be made as the data is too much apart. But when the data is complex, we can't predict the number of clusters, so that we use the technique of dendrograms. The following is the method:

plt.figure(figsize=(10, 7)) 


d = shc.dendrogram(shc.linkage(X, method='ward'))


 The dendrogram looks like this:

59_Machine Learning2.png figure 12 python tutorials

Figure 12

59_Machine Learning graphs python tutorials

Figure 13

In Figure 13, we can see that the blue line is the most continuous line in all the clusters. From here, we can deduce that two clusters are the actual clusters.

Now to fit the model following is the method.

model = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward') 



The fir and predict first fit the model than predict the classes which in this case is as follows:

 array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

Login/Sign Up


Related Posts

© Copyright 2020, All Rights Reserved.

This site uses cookies. By continuing to use this site or clicking "I Agree", you agree to the use of cookies. Read our cookies policy and privacy statement for more information.