## CPU-GPU-kmeans

Optimized parallel implementations of the k-means clustering algorithm:
1. on multi-core CPUs with vector units: thread parallelization using OpenMP and auto-vectorization targeting AVX units
2. on NVIDIA GPU: using shared memory, dynamic parallelism, and multiple streams
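As a rough illustration of the CPU side, here is a minimal OpenMP sketch of the k-means label-assignment step. This is a hypothetical example, not the repository's actual code: the function name `assign_labels` and the row-major memory layout are assumptions.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical sketch of the k-means assignment step, thread-parallelized
   with OpenMP. data is row-major [n][dims], centroids is [k][dims]. */
void assign_labels(const float *data, const float *centroids,
                   int *labels, size_t n, size_t dims, size_t k)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        float best = FLT_MAX;
        int best_c = 0;
        for (size_t c = 0; c < k; c++) {
            float dist = 0.0f;   /* squared Euclidean distance */
            for (size_t d = 0; d < dims; d++) {
                float diff = data[i * dims + d] - centroids[c * dims + d];
                dist += diff * diff;
            }
            if (dist < best) { best = dist; best_c = (int)c; }
        }
        labels[i] = best_c;
    }
}
```

The innermost distance loop is a simple reduction over contiguous memory, which is the kind of loop compilers can auto-vectorize for AVX units.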

In particular, both implementations use a two-step summation method with package processing to mitigate the rounding errors that can accumulate during the cluster-centroid update phase.
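The two-step idea can be sketched as follows. This is a simplified, hypothetical host-side version for a single array of values, not the repository's implementation; `two_step_sum` and `package_size` are illustrative names.

```c
#include <stddef.h>

/* Illustrative two-step summation with package processing: each package
   of values is accumulated separately, then the package sums are combined.
   This keeps each running sum short, limiting the rounding error that a
   single long accumulation of many floats would suffer. */
float two_step_sum(const float *values, size_t n, size_t package_size)
{
    float total = 0.0f;
    for (size_t start = 0; start < n; start += package_size) {
        size_t end = start + package_size < n ? start + package_size : n;
        float package_sum = 0.0f;      /* step 1: local accumulation */
        for (size_t i = start; i < end; i++)
            package_sum += values[i];
        total += package_sum;          /* step 2: combine package sums */
    }
    return total;
}
```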

## Makefile Configuration
- Comment out the "-DDP" option to compute in single precision, or keep it to compute in double precision.
- Update the "--gpu-architecture" option to match your own GPU device.
- If necessary, update the CUDA path to match your installation.
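For orientation, the relevant lines might look like the following. This is a hypothetical excerpt; the variable names and values in the repository's actual Makefile may differ.

```makefile
# Hypothetical Makefile excerpt (names and values are placeholders)
CUDA_PATH ?= /usr/local/cuda                     # adjust to your CUDA installation
NVCC      := $(CUDA_PATH)/bin/nvcc
# Comment out -DDP to compute in single precision instead of double.
NVCCFLAGS := -O3 --gpu-architecture=sm_70 -DDP   # set sm_XX for your GPU
```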

## "main.h" Configuration
The configuration for the benchmark dataset, block size, etc., is adjustable in the "main.h" file.
Our CUDA C code does not generate any synthetic data, so users should specify the path and filename of their benchmark dataset in the "INPUT_DATA" constant and also set NbPoints, NbDims, and NbClusters. To impose the initial centroids, provide a text file containing their coordinates and specify the corresponding path and filename in the "INPUT_INITIAL_CENTROIDS" constant.
The synthetic dataset used in our papers below is too large (about 1.8 GB) to be hosted here, so we provide Synthetic_Data_Generator.py instead. Since the generator draws random values, the dataset generated each time will have different values but will always keep the same distribution.
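A configuration might look like the following. The constant names come from this README, but the values shown are placeholders, not defaults from the repository.

```c
/* Hypothetical excerpt of main.h (values are placeholders) */
#define INPUT_DATA               "data/my_dataset.txt"
#define INPUT_INITIAL_CENTROIDS  "data/my_initial_centroids.txt"
#define NbPoints    1000000   /* number of data points   */
#define NbDims      4         /* number of dimensions    */
#define NbClusters  10        /* number of clusters (k)  */
```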

## Execution
Before execution, recompile the code by entering the "make" command if any change has been made to the code.
Then you can run the executable file "kmeans" with several arguments:
- -t <GPU|CPU>: run computations on the GPU or on the CPU (default: GPU)
- -cpu-nt <int>: number of OpenMP threads (default: 1)
- -max-iters <int>: maximal number of iterations (default: 200)
- -tol <float>: tolerance, i.e. the convergence criterion (default: 1.0E-4)

Examples:
- k-means on CPU: ./kmeans -t CPU -cpu-nt 20
- k-means on GPU: ./kmeans

## Corresponding papers
The approaches and experiments are documented in the following two papers; the second is an extended version of the first.

He, G., Vialle, S., & Baboulin, M. (2021). Parallelization of the k-means algorithm in a spectral clustering chain on CPU-GPU platforms. In B. B. et al. (Ed.), Euro-Par 2020: Parallel Processing Workshops (Vol. 12480, LNCS, pp. 135–147). Warsaw, Poland: Springer. Available from: https://link.springer.com/chapter/10.1007/978-3-030-71593-9_11

He, G., Vialle, S., & Baboulin, M. (2021, revised & resubmitted). Parallel and accurate k-means algorithm on CPU-GPU architectures for spectral clustering. Concurrency and Computation: Practice and Experience. Wiley.

If you find any part of this project useful for your scientific research, please cite the papers mentioned above.