The Fréchet distance mathematically formalizes a notion of similarity or distance for curves. It is similar to the well-known Hausdorff distance for sets, but it in contrast to the Hausdorff distance, it takes the ordering of the points along the curves into account. One can intuitively define the distance measure as follows. Imagine walking forwards along the two curves simultaneously with varying speeds. The maximum distance between the two paired points on the curves over the entire walk is called the cost of the walk. The Fréchet distance is the minimal cost of a walk that traverses both curves from beginning to end without backtracking along the curves.
Clustering is a fundamental task in data analysis. We study efficient algorithms for center-based clustering under the Fréchet distance with the additional requirement that the center curves should be of low complexity. We study the computational complexity of this problem. We show that the problem (with or without this restriction) is NP-hard to approximate within a constant factor. Forthermore, we study probabilistic (1+eps)-approximation algorithms and efficient heuristics in practice. An important problem variant is subtrajectory clustering, where we are interested in an optimal subdivision of the curves into a set of shorter subcurves that admits a clustering.
Data structures for proximity searching are well-studied in computational geometry. Given a set of input elements from a metric space, the task is to preprocess them into a data structure that supports (approximate) nearest neighbor queries. We study this problem for polygonal curves where the metric is given by the Fréchet distance. We are interested in lower bounds on the tradeoffs between space complexity, query time, and approximation factor of attainable data structures, as well as upper bounds that match these lower bounds in the important problem parameters, such as number of input curves, complexity of the input curves, and the complexity of supported query curves. An interesting and important problem variant is concerned with range searching. Here, a range is a subset of a metric space that is specified with the query. A range query may asks for certain statistics derived from the set of input curves contained in the query range, for example, the number of input curves contained in the range.