• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Wednesday, March 11, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

K-Means Cluster Evaluation with Silhouette Analysis

Josh by Josh
November 26, 2025
in Al, Analytics and Automation
0


In this article, you will learn how to evaluate k-means clustering results using silhouette analysis and interpret both average and per-cluster scores to guide model choices.

Topics we will cover include:

  • What the silhouette score measures and how to compute it
  • How to use silhouette analysis to pick a reasonable number of clusters
  • Visualizing per-sample silhouettes to diagnose cluster quality

Here’s how it works.

K-Means Cluster Evaluation Silhouette Analysis

K-Means Cluster Evaluation with Silhouette Analysis
Image by Editor

Introduction

Clustering models in machine learning must be assessed by how well they separate data into meaningful groups with distinctive characteristics. One of the key metrics for evaluating the internal cohesion and mutual separation of clusters produced by iterative algorithms like k-means is the silhouette score, which quantifies how similar an object — a data instance i — is to its own cluster compared to other clusters.

READ ALSO

A better method for planning complex visual tasks | MIT News

Google AI Introduces Gemini Embedding 2: A Multimodal Embedding Model that Lets Your Bring Text, Images, Video, Audio, and Docs into the Embedding Space

This article focuses on how to evaluate and interpret cluster quality through silhouette analysis, that is, an analysis of cluster structure and validity based on disciplined use of the silhouette metric. Silhouette analysis has practical implications in real-world segmentation tasks across marketing, pharmaceuticals, chemical engineering, and more.

Understanding the Silhouette Metric

Given a data point or instance i in a dataset that has been partitioned into k clusters, its silhouette score is defined as:

\[ s(i) = \frac{b(i) – a(i)}{\max\{a(i), b(i)\}} \]

In the formula, a(i) is the intra-cluster cohesion, that is, the average distance between i and the rest of the points in the cluster it belongs to. Meanwhile, b(i) is the inter-cluster separation, namely, the average distance between i and the points in the closest neighboring cluster.

The silhouette score ranges from −1 to 1. Lower a(i) and higher b(i) values contribute to a higher silhouette score, which is interpreted as higher-quality clustering, with points strongly tied to their cluster and well separated from other clusters. In sum, the higher the silhouette score, the better.

In practice, we typically compute the average silhouette score across all instances to summarize cluster quality for a given solution.

The silhouette score is widely used to evaluate cluster quality in diverse datasets and domains because it captures both cohesion and separation. It is also useful, as an alternative or a supplement to the Elbow Method, for selecting an appropriate number of clusters k — a necessary step when applying iterative methods like k-means and its variants.

Additionally, the silhouette score doubles as an insightful visual aid when you plot individual and cluster-level silhouettes, with bar widths reflecting cluster sizes. The following example shows silhouettes for every instance in a dataset partitioned into three clusters, grouping silhouettes by cluster to facilitate comparison with the overall average silhouette for that clustering solution.

Example visualization of silhouette scores

Example visualization of silhouette scores
Image by Author

On the downside, silhouette analysis may be less reliable for certain datasets and cluster shapes (e.g., non-convex or intricately shaped clusters) and can be challenging in very high-dimensional spaces.

Silhouette Analysis in Action: The Penguins Dataset

To illustrate cluster evaluation using silhouette analysis, we will use the well-known Palmer Archipelago penguins dataset, specifically the version freely available here.

We quickly walk through the preparatory steps (loading and preprocessing), which are explained in detail in this introductory cluster analysis tutorial. We will use pandas, scikit-learn, Matplotlib, and NumPy.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

import pandas as pd

from sklearn.preprocessing import StandardScaler

from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score, silhouette_samples

import matplotlib.pyplot as plt

import numpy as np

 

# Load dataset (replace with actual path or URL)

penguins = pd.read_csv(‘https://raw.githubusercontent.com/gakudo-ai/open-datasets/refs/heads/main/penguins.csv’)

penguins = penguins.dropna()

 

features = [‘bill_length_mm’, ‘bill_depth_mm’, ‘flipper_length_mm’, ‘body_mass_g’]

X = penguins[features]

 

# Scale numerical features for more effective clustering

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

Next, we apply k-means to find clusters in the dataset. We repeat this process for multiple values of the number of clusters k (the n_clusters parameter), ranging from 2 to 6. For each setting, we calculate the silhouette score.

range_n_clusters = list(range(2, 7))

silhouette_avgs = []

 

for n_clusters in range_n_clusters:

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)

    cluster_labels = kmeans.fit_predict(X_scaled)

    sil_avg = silhouette_score(X_scaled, cluster_labels)

    silhouette_avgs.append(sil_avg)

    print(f“For n_clusters = {n_clusters}, average silhouette_score = {sil_avg:.3f}”)

The resulting output is:

For n_clusters = 2, average silhouette_score = 0.531

For n_clusters = 3, average silhouette_score = 0.446

For n_clusters = 4, average silhouette_score = 0.419

For n_clusters = 5, average silhouette_score = 0.405

For n_clusters = 6, average silhouette_score = 0.392

This suggests that the highest silhouette score is obtained for k = 2. This usually indicates the most coherent grouping of the data points, although it does not always match biological or domain ground truth.

In the penguins dataset, although there are three species with distinct traits, repeated k-means clustering and silhouette analysis indicate that partitioning the data into two groups can be more consistent in the chosen feature space. This can happen because silhouette analysis reflects geometric separability in the selected features (here, four numeric attributes) rather than categorical labels; overlapping traits among species may lead k-means to favor fewer clusters than the actual number of species.

Let’s visualize the silhouette results for all five configurations:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

fig, axes = plt.subplots(1, len(range_n_clusters), figsize=(25, 5), sharey=False)

 

for i, n_clusters in enumerate(range_n_clusters):

    ax = axes[i]

 

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)

    labels = kmeans.fit_predict(X_scaled)

    sil_vals = silhouette_samples(X_scaled, labels)

    sil_avg = silhouette_score(X_scaled, labels)

 

    y_lower = 10

    for j in range(n_clusters):

        ith_sil_vals = sil_vals[labels == j]

        ith_sil_vals.sort()

        size_j = ith_sil_vals.shape[0]

        y_upper = y_lower + size_j

        color = plt.cm.nipy_spectral(float(j) / n_clusters)

        ax.fill_betweenx(np.arange(y_lower, y_upper),

                         0, ith_sil_vals,

                         facecolor=color, edgecolor=color, alpha=0.7)

        ax.text(–0.05, y_lower + 0.5 * size_j, str(j))

        y_lower = y_upper + 10  # separation between clusters

 

    ax.set_title(f“Silhouette Plot for k = {n_clusters}”)

    ax.axvline(x=sil_avg, color=“red”, linestyle=“–“)

    ax.set_xlabel(“Silhouette Coefficient”)

    if i == 0:

        ax.set_ylabel(“Cluster Label”)

    ax.set_xlim([–0.1, 1])

    ax.set_ylim([0, len(X_scaled) + (n_clusters + 1) * 10])

 

plt.tight_layout()

plt.show()

Silhouette plots for multiple k-means configurations on the Penguins dataset

Silhouette plots for multiple k-means configurations on the Penguins dataset
Image by Author

One clear observation is that for k ≥ 4 the average silhouette score drops to roughly 0.4, while it is higher for k = 2 or k = 3.

What if we consider a different (narrower) subset of attributes for clustering? For instance, consider only bill length and flipper length. This is as simple as replacing the feature selection statement near the start of the code with:

features = [‘bill_length_mm’, ‘flipper_length_mm’]

Then rerun the rest. Try different feature selections prior to clustering and check whether the silhouette analysis results remain similar or vary for some choices of the number of clusters.

Wrapping Up

This article provided a concise, practical understanding of a standard cluster-quality metric for clustering algorithms: the silhouette score, and showed how to use it to analyze clustering results critically.

K-Means Cluster Evaluation with Silhouette Analysis

K-means cluster evaluation with silhouette analysis in six easy steps (click to enlarge)



Source_link

Related Posts

A better method for planning complex visual tasks | MIT News
Al, Analytics and Automation

A better method for planning complex visual tasks | MIT News

March 11, 2026
Google AI Introduces Gemini Embedding 2: A Multimodal Embedding Model that Lets Your Bring Text, Images, Video, Audio, and Docs into the Embedding Space
Al, Analytics and Automation

Google AI Introduces Gemini Embedding 2: A Multimodal Embedding Model that Lets Your Bring Text, Images, Video, Audio, and Docs into the Embedding Space

March 11, 2026
AI Is Learning From the News. Now Publishers Want to Get Paid
Al, Analytics and Automation

AI Is Learning From the News. Now Publishers Want to Get Paid

March 11, 2026
3 Questions: Building predictive models to characterize tumor progression | MIT News
Al, Analytics and Automation

3 Questions: Building predictive models to characterize tumor progression | MIT News

March 10, 2026
Al, Analytics and Automation

How to Build a Risk-Aware AI Agent with Internal Critic, Self-Consistency Reasoning, and Uncertainty Estimation for Reliable Decision-Making

March 10, 2026
marvn.ai and the rise of vertical AI search engines
Al, Analytics and Automation

marvn.ai and the rise of vertical AI search engines

March 10, 2026
Next Post
13 Best AI Automation Tools to Increase Productivity & Efficiency

13 Best AI Automation Tools to Increase Productivity & Efficiency

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

Attribution Breakdowns vs. Compare Attribution Settings Explained

Attribution Breakdowns vs. Compare Attribution Settings Explained

January 19, 2026
Here Are My Retreat Reflections

Here Are My Retreat Reflections

June 7, 2025
Trump says he’s found a buyer for TikTok

Trump says he’s found a buyer for TikTok

June 29, 2025
Microsoft says it will no longer use engineers in China for Department of Defense work

Microsoft says it will no longer use engineers in China for Department of Defense work

July 20, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Precision Circuit Board Flux Removal with Dry Ice
  • Manufact raises $6.3M as MCP becomes the ‘USB-C for AI’ powering ChatGPT and Claude apps
  • How Your Campaign Name Serves as Your North Star
  • Web Development Cost in 2026: Complete Pricing Guide
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions