K Means Algorithm

11 Oct 2017

Introduction

K-Means++ is a common algorithm in clustering. It’s an essential algorithm in data mining and Machine Learning. K-Means++ is origin from K-Means algorithm. This post discuss the nature of K-Means and K-Means++, and the reason why K-Means++ is better.

K-Means Algorithm

K-Means or K-Means++ are algorithm for clustering. Given a number of candidates, and given a specific number of groups, K-Means will calculate the similarity of each points and clustering them into specific number of groups.

Random initialization
Clustering
Repeat the previous steps for many times and take the best result

Sample execution

The choice of initialization point differs

With different initialization point, K-Means algorithm can have very different execution result.

K-Means++ Algorithm

Initialization phase
- Choose the first center at random
- Choose next center with probability proportional on D^2(x)
Iterate phase
- iterate as in k-means algorithm until convergence

Code source

You can also refer my github code source for K-Means++ real data example.