Clustering and Big Data



Clustering is an unsupervised learning tool. Clustering has been studied for decades, with the first and in some cases still popular clustering algorithms emerging over fifty years ago. Nevertheless, new challenges have kept the area active and made it one of the topical and growing fields today. Most prominently, one of these challenges is the rise of big data applications. Standard clustering methods do usually not scale well to scenarios where we assume to have polylogarithmic or even constant amounts of space. But also new aspects as privacy and fairness lead to new and exciting research problems.

We thus study clustering problems under different new aspects, including:

  • clustering of big data sets
  • clustering under side constraints
  • clustering with hierarchies

Usually, we aim for approximation algorithms with constant or even arbitrarily good approximation guarantees. However, when developing or analyzing practical algorithms, we are also interested in relaxed statements, e.g. algorithms with a higher approximation guarantee than optimally possible, or algorithms which perform very good under data niceness assumptions.


  • Clemens Rösner, Melanie Schmidt: Privacy preserving clustering with constraints. CoRR abs/1802.02497, 2018.
  • Euiwoong Lee, Melanie Schmidt, John Wright: Improved and simplified inapproximability for k-means. Information Processing Letters 120: 40-43, 2017. Also see CoRR abs/1509.00916.
  • Anupam Gupta, Guru Guruganesh, Melanie Schmidt: Approximation Algorithms for Aversion k-Clustering via Local k-Median. ICALP 2016: 66:1-66:13, 2016.

Page Tools