## CPU-GPU-kmeans

Optimized parallel implementations of the k-means clustering algorithm:

1. on multi-core CPU with vector units: thread parallelization using OpenMP, auto-vectorization using AVX units;
2. on NVIDIA GPU: using shared memory, dynamic parallelism, and multiple streams.

In particular, both implementations use a two-step summation method with package processing to limit the rounding errors that can occur during the phase of updating cluster centroids (an illustrative sketch is given at the end of this README).

## Makefile Configuration

- Comment out the `-DDP` option for single-precision computations, or keep it for double precision.
- Update the `--gpu-architecture` option according to your own GPU device.
- If necessary, update the CUDA path according to your own installation.

## "main.h" Configuration

The settings for the benchmark dataset, block size, etc. are adjustable in the "main.h" file (an illustrative excerpt is given at the end of this README). Our CUDA C code does not generate any synthetic data, so you should specify the path and filename of your benchmark dataset in the `INPUT_DATA` constant, and also set `NbPoints`, `NbDims`, and `NbClusters`. If you want to impose the initial centroids, provide a text file containing the coordinates of the initial centroids and specify the corresponding path and filename in the `INPUT_INITIAL_CENTROIDS` constant.

The synthetic dataset used in our paper below is too large (about 1.8 GB) to be hosted here, so we provide the Synthetic_Data_Generator.py script instead. Since the generator relies on random number generation, each generated dataset has different values but always keeps the same distribution.

## Execution

If any change has been made to the code, recompile it with the `make` command before execution. Then run the executable "kmeans" with the following arguments:

- `-t <GPU|CPU>`: run computations on the target GPU or the target CPU (default: GPU)
- `-cpu-nt <n>`: number of OpenMP threads (default: 1)
- `-max-iters <n>`: maximal number of iterations (default: 200)
- `-tol <value>`: tolerance, i.e. convergence criterion (default: 1.0E-4)

Examples:

- k-means on CPU:
```
./kmeans -t CPU -cpu-nt 20
```
- k-means on GPU:
```
./kmeans
```

## Corresponding papers

The approaches and experiments are documented in the following paper:

He, G., Vialle, S., & Baboulin, M. (2021). Parallelization of the k-means algorithm in a spectral clustering chain on CPU-GPU platforms. In B. B. et al. (Ed.), Euro-Par 2020: Parallel Processing Workshops (Vol. 12480, LNCS, pp. 135–147). Warsaw, Poland: Springer. Available from: https://link.springer.com/chapter/10.1007/978-3-030-71593-9_11

If you find any part of this project useful for your scientific research, please cite the paper mentioned above.
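
## Example "main.h" settings

As a reference for the configuration described above, the excerpt below shows what the relevant constants in "main.h" may look like. The constant names are those mentioned in this README; the paths and values are only placeholders and must be adapted to your own dataset.

```
/* Illustrative "main.h" excerpt -- paths and values are placeholders. */
#define INPUT_DATA               "data/my_dataset.txt"         /* benchmark dataset file      */
#define INPUT_INITIAL_CENTROIDS  "data/initial_centroids.txt"  /* optional initial centroids  */
#define NbPoints    1000000   /* number of data instances */
#define NbDims      4         /* number of dimensions     */
#define NbClusters  4         /* number of clusters (k)   */
```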
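
## Two-step summation sketch

For reference, the code below is a minimal, sequential C sketch of the two-step summation with package processing mentioned in the introduction: points are first accumulated per package into a small local buffer, and only the package totals are added to the global centroid accumulators, which limits the growth of rounding errors. This is not the code of this repository (which parallelizes this step with OpenMP and CUDA); the function name `update_centroids` and the constant `PACKAGE_SIZE` are illustrative.

```
#include <stdlib.h>
#include <string.h>

#define PACKAGE_SIZE 1024  /* assumed package length, tunable */

/* Compute new centroids from labeled points using two-step summation. */
void update_centroids(const float *points,    /* NbPoints x NbDims, row major */
                      const int   *labels,    /* cluster index of each point  */
                      float       *centroids, /* NbClusters x NbDims, in/out  */
                      int nb_points, int nb_dims, int nb_clusters)
{
    float *sums   = calloc((size_t)nb_clusters * nb_dims, sizeof *sums);
    int   *counts = calloc((size_t)nb_clusters, sizeof *counts);
    float *local  = malloc((size_t)nb_clusters * nb_dims * sizeof *local);

    for (int start = 0; start < nb_points; start += PACKAGE_SIZE) {
        int end = start + PACKAGE_SIZE < nb_points ? start + PACKAGE_SIZE
                                                   : nb_points;

        /* Step 1: accumulate one package of points into a local buffer. */
        memset(local, 0, (size_t)nb_clusters * nb_dims * sizeof *local);
        for (int i = start; i < end; i++) {
            int c = labels[i];
            counts[c]++;
            for (int d = 0; d < nb_dims; d++)
                local[c * nb_dims + d] += points[(size_t)i * nb_dims + d];
        }

        /* Step 2: add only the package totals to the global accumulators. */
        for (int j = 0; j < nb_clusters * nb_dims; j++)
            sums[j] += local[j];
    }

    /* Divide by the cluster sizes to obtain the new centroids. */
    for (int c = 0; c < nb_clusters; c++)
        if (counts[c] > 0)
            for (int d = 0; d < nb_dims; d++)
                centroids[c * nb_dims + d] = sums[c * nb_dims + d] / counts[c];

    free(sums);
    free(counts);
    free(local);
}
```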