README.md 2.7 KB
Newer Older
He Guanlin's avatar
He Guanlin committed
1
## CPU-GPU-kmeans
He Guanlin's avatar
He Guanlin committed
2
3
Optimized parallel implementations of the k-means clustering algorithm:
1. on multi-core CPU with vector units: thread parallelization using OpenMP, auto-vectorization using AVX units
He Guanlin's avatar
He Guanlin committed
4
5
2. on NVIDIA GPU: using shared memory, dynamic parallelism, and multiple streams

Vialle Stephane's avatar
Vialle Stephane committed
6
7
In particular, for both implementations we use a two-step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids.

He Guanlin's avatar
He Guanlin committed
8
## Makefile Configuration
Vialle Stephane's avatar
Vialle Stephane committed
9
10
11
12
- By commenting the "-DDP" option or not, our code supports computations either in single or double precision, respectively.
- The choice for the "--gpu-architecture" option should be updated according to your own GPU device.
- If necessary, update the CUDA path according to your own situation.

He Guanlin's avatar
He Guanlin committed
13
## "main.h" Configuration
Vialle Stephane's avatar
Vialle Stephane committed
14
The configuration for benchmark dataset, block size, etc., are adjustable in the "main.h" file.
He Guanlin's avatar
He Guanlin committed
15

Vialle Stephane's avatar
Vialle Stephane committed
16
Our CUDA C code does not generate any synthetic data, so users should specify the path and filename of their benchmark dataset in the "INPUT_DATA" constant, and also give the NbPoints, NbDims, NbClusters. If users want to impose the initial centroids, they should provide a text file containing the coordinates of initial centroids and specifiy the corresponding path and filename in the "INPUT_INITIAL_CENTROIDS" constant.
He Guanlin's avatar
He Guanlin committed
17

Vialle Stephane's avatar
Vialle Stephane committed
18
19
The synthetic dataset used in our papers below is too large (about 1.8GB) to be loaded here. So we provide the Synthetic_Data_Generator.py instead. Since the generator uses the random function, the dataset generated each time will have different values but will always keep the same distribution.

He Guanlin's avatar
He Guanlin committed
20
## Execution
Vialle Stephane's avatar
Vialle Stephane committed
21
Before execution, recompile the code by entering the "make" command if any change has been made to the code.
He Guanlin's avatar
He Guanlin committed
22

Vialle Stephane's avatar
Vialle Stephane committed
23
Then you can run the executable file "kmeans" with several arguments:
He Guanlin's avatar
He Guanlin committed
24

He Guanlin's avatar
He Guanlin committed
25
26
27
28
- -t <GPU|CPU>: run computations on target GPU or on target CPU (default: GPU)
- -cpu-nt <int>: number of OpenMP threads (default: 1)
- -max-iters <int>: maximal number of iterations (default: 200)
- -tol <float>: tolerance, i.e. convergence criterion (default: 1.0E-4)
Vialle Stephane's avatar
Vialle Stephane committed
29

He Guanlin's avatar
He Guanlin committed
30
Example: <br>
He Guanlin's avatar
He Guanlin committed
31
32
- k-means on CPU: ./kmeans -t CPU -cpu-nt 20
- k-means on GPU: ./kmeans
Vialle Stephane's avatar
Vialle Stephane committed
33

He Guanlin's avatar
He Guanlin committed
34
## Corresponding papers
He Guanlin's avatar
He Guanlin committed
35
The approaches and experiments are documented in the following paper.
Vialle Stephane's avatar
Vialle Stephane committed
36
37
38

He, G., Vialle, S., & Baboulin, M. (2021). Parallelization of the k-means algorithm in a spectral clustering chain on CPU-GPU platforms. In B. B. et al. (Ed.), Euro-par 2020: Parallel processing workshops (Vol. 12480, LNCS, pp. 135–147). Warsaw, Poland: Springer. Available from: https://link.springer.com/chapter/10.1007/978-3-030-71593-9_11

He Guanlin's avatar
He Guanlin committed
39
If you find any part of this project useful for your scientific research, please cite the paper mentioned above.
Vialle Stephane's avatar
Vialle Stephane committed
40