Commit 1aecef07 authored by Vialle Stephane's avatar Vialle Stephane
Browse files


parent 8ed1e076
Two parallel and optimized implementations of the k-means clustering algorithm:
1. k-means on CPU: thread parallelization using OpenMP, auto-vectorization using AVX units
2. k-means on GPU: using shared memory, dynamic parallelism, and multiple streams
In particular, for both implementations we use a two-step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids.
【Makefile Configuration】
- By commenting the "-DDP" option or not, our code supports computations either in single or double precision, respectively.
- The choice for the "--gpu-architecture" option should be updated according to your own GPU device.
- If necessary, update the CUDA path according to your own situation.
【"main.h" Configuration】
The configuration for benchmark dataset, block size, etc., are adjustable in the "main.h" file.
Our CUDA C code does not generate any synthetic data, so users should specify the path and filename of their benchmark dataset in the "INPUT_DATA" constant, and also give the NbPoints, NbDims, NbClusters. If users want to impose the initial centroids, they should provide a text file containing the coordinates of initial centroids and specifiy the corresponding path and filename in the "INPUT_INITIAL_CENTROIDS" constant.
The synthetic dataset used in our papers below is too large (about 1.8GB) to be loaded here. So we provide the instead. Since the generator uses the random function, the dataset generated each time will have different values but will always keep the same distribution.
Before execution, recompile the code by entering the "make" command if any change has been made to the code.
Then you can run the executable file "kmeans" with several arguments:
-t <GPU|CPU>: run computations on target GPU or on target CPU (default: GPU)
-cpu-nt <int>: number of OpenMP threads (default: 1)
-max-iters <int>: maximal number of iterations (default: 200)
-tol <float>: tolerance, i.e. convergence criterion (default: 1.0E-4)
k-means on CPU: ./kmeans -t CPU -cpu-nt 20
k-means on GPU: ./kmeans
【Corresponding papers】
The approaches and experiments are documented in the following two papers. The second paper is an extended version of the first paper.
He, G., Vialle, S., & Baboulin, M. (2021). Parallelization of the k-means algorithm in a spectral clustering chain on CPU-GPU platforms. In B. B. et al. (Ed.), Euro-par 2020: Parallel processing workshops (Vol. 12480, LNCS, pp. 135–147). Warsaw, Poland: Springer. Available from:
He, G., Vialle, S., & Baboulin, M. (2021, revised & resubmitted). Parallel and accurate k-means algorithm on CPU-GPU architectures for spectral clustering. Concurrency and Computation: Practice and Experience. Wiley.
If you find any part of this project useful for your scientific research, please cite the papers mentioned above.
The code is available under XXX license.
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment