MA-INF 1319 - Cluster Analysis


Date of the 2nd exam: The re-exams will take place on the 12th of March.

Things that are NOT relevant for the exam: Chapter 2.7 (outliers) will not be discussed in the oral exams. The same is true for proofs that were deferred to the tutorials, and tutorial questions. These are there to improve the general understanding of the material, but they will not be discussed in the oral exams.

Exams: The exams are oral exams. We offer times for oral exams on the 4th, 6th and 8th of February. After enrolling for the exam in BASIS, please contact Christiane Andrade to schedule the time of your exam.

Lecture Notes: The lecture notes have last been updated on the 23th of January and cover the whole lecture!

Note: If you attend the lecture and have not done so yet, please send me a short email to this email address. Thank you!

General Information

The content of this lecture is the theoretical analysis of approximation algorithms for cluster analysis, in particular covering the following areas:

  • Approximation algorithms for k-center, k-median and k-means
  • Different techniques for approximation algorithms, including ILP-based techniques and local search
  • Clustering of Big Data and in Data Streams
  • Analysis of common clustering heuristics
  • Practically efficient methods with theoretical guarantees


When Where Start Lecturer
Tuesday, 10:15-11:45 INF / Room 2.078 October 09th Schmidt


When Where Start Lecturer
Wednesday, 12:30-14:00 INF / Room 2.050 October 17th Rösner
Friday, 14:00-15:30 INF / Room 2.050 October 19th Rösner

Problem Sets

  • Problem Set 1 (hand in until October 16th, to be discussed October 17th and October 19th)
  • Problem Set 2 (hand in until October 23th, to be discussed October 24th and October 26th)
  • Problem Set 3 (hand in until October 30th, to be discussed October 31st and November 2nd)
  • Problem Set 4 (hand in until November 6th, to be discussed November 7th and November 9th)
  • November 14th and November 16th tutorials were for repetition and questions and without a new exercise sheet
  • Problem Set 5 (hand in until November 20th, to be discussed November 21st and November 23rd)
  • Problem Set 6 (hand in until November 27th, to be discussed November 28th and November 30th)
  • Problem Set 7 (hand in until December 4th, to be discussed December 5th and December 7th)
  • Problem Set 8 (hand in until December 11th, to be discussed December 12th and December 14th)
  • Problem Set 9 (hand in until December 18th, to be discussed December 19th and December 21th)
  • Problem Set 10 (hand in until January 8th, to be discussed January 9th and January 11th)
  • Problem Set 11 (hand in until January 15th, to be discussed January 16th and January 18th)
  • Problem Set 12 (hand in until January 22nd, to be discussed January 23th and January 25th)


October 09 1 Introduction
2 The happy world of k-center
2.1 Definition
2.2 A simple and elegant 2-approximation
October 16 2.3 A matching lower bound
2.4 Incremental and hierarchical clustering
October 23 2.4 continued: Proof
2.5 Another elegant 2-approximation
October 30 2.6 A streaming algorithm for k-center
November 6 –cancelled due to illness–
November 13 2.7 The k-center problem with outliers
2.8 Fair k-center
November 20 2.8 Fair k-center (ctd)
3 The exciting world of k-means
3.1 Definition
3.2 Lloyd's algorithm
November 27 3.3 The k-means++ algorithm
3.3.1 D2-sampling as a bicriteria approximation
December 4 3.3.2 A glimpse on the analysis of k-means++
3.4 Dimensionality reduction
December 11 3.4.1 The Johnson-Lindenstrauss Lemma
December 18 3.4.2 The Singular Value Decomposition
January 8 3.5 A Coreset Construction
January 15 3.5 A Coreset Construction (ctd)
3.6 Merge&Reduce
January 22 3.6 Merge&Reduce (ctd)


The lecture notes cover the content of the lecture.

Page Tools