README.md 4.32 KB
Newer Older
He Guanlin's avatar
He Guanlin committed
1
## CPU-GPU-kmeans
He Guanlin's avatar
He Guanlin committed
2
Optimized parallel implementations of the k-means clustering algorithm:
He Guanlin's avatar
He Guanlin committed
3 4
1. **on multi-core CPU with vector units**: thread parallelization using OpenMP, auto-vectorization using AVX units
2. **on NVIDIA GPU**: using shared memory, dynamic parallelism, and multiple streams
He Guanlin's avatar
He Guanlin committed
5

Vialle Stephane's avatar
Vialle Stephane committed
6 7
In particular, for both implementations we use a two-step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids.

He Guanlin's avatar
He Guanlin committed
8
## Makefile Configuration
He Guanlin's avatar
He Guanlin committed
9
- By commenting the `-DDP` option or not, our code supports computations either in single or double precision, respectively.
He Guanlin's avatar
He Guanlin committed
10
- The choices for the `-march` and `--gpu-architecture` options should be updated according to your own CPU and GPU devices, respectively.
Vialle Stephane's avatar
Vialle Stephane committed
11 12
- If necessary, update the CUDA path according to your own situation.

He Guanlin's avatar
He Guanlin committed
13
## "main.h" Configuration
He Guanlin's avatar
He Guanlin committed
14
The configuration for benchmark dataset, block size, etc., are adjustable in the _main.h_ file.
He Guanlin's avatar
He Guanlin committed
15

He Guanlin's avatar
He Guanlin committed
16 17 18
Our k-means code does NOT generate any synthetic data, so your need to give the path and filename of your benchmark dataset in the `INPUT_DATA` constant, and also specifiy the `NbPoints`, `NbDims`, `NbClusters`. 

Optionally, if you want to impose initial centroids, you need to provide a text file and specifiy the corresponding path and filename in the `INPUT_INITIAL_CENTROIDS` constant. Otherwise, the initial centroids will be selected uniformly at random.
He Guanlin's avatar
He Guanlin committed
19

He Guanlin's avatar
He Guanlin committed
20 21
## Benchmark Datasets
We tested our code on one synthetic dataset created by our own and two real-world datasets downloaded from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). Each of them contains millions of instances, hence is too large to be loaded here. Instead we provide the _Synthetic_Data_Generator.py_, and describe the filtering operations on real-world datasets.
Vialle Stephane's avatar
Vialle Stephane committed
22
- **Synthetic dataset** (our dataset). It contains 50 million instances uniformly distributed in 4 convex clusters. Each instance has 4 dimensions. Since the _Synthetic_Data_Generator.py_ uses the `random` function, the dataset generated each time will have different values but will always keep the same distribution.
He Guanlin's avatar
He Guanlin committed
23
- [**Household power consumption dataset**](https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption) (UCI Machine Learning Repository). It contains 2,075,259 measurements of electric power consumption in a household over a period
He Guanlin's avatar
He Guanlin committed
24 25 26
of nearly 4 years. Each measurement has 9 attributes. We remove the measurements containing missing values and also remove the first 2
attributes that record the date and time of measurements. The remaining set that we use for evaluation contains 2,049,280 measurements
with 7 numerical attributes.
He Guanlin's avatar
He Guanlin committed
27
- [**US census 1990 dataset**](https://archive.ics.uci.edu/ml/datasets/US+Census+Data+(1990)) (UCI Machine Learning Repository). It contains 2,458,285 instances with 68 categorical attributes. It is a simplified and discretized version of the USCensus1990raw dataset which contains one percent sample drawn from the full 1990 US census data.
Vialle Stephane's avatar
Vialle Stephane committed
28

He Guanlin's avatar
He Guanlin committed
29
## Execution
He Guanlin's avatar
He Guanlin committed
30
Before execution, recompile the code by entering the `make` command if any change has been made to the code.
He Guanlin's avatar
He Guanlin committed
31

He Guanlin's avatar
He Guanlin committed
32
Then you can run the executable file _kmeans_ with several arguments:
He Guanlin's avatar
He Guanlin committed
33

He Guanlin's avatar
He Guanlin committed
34 35 36 37
- `-t <GPU|CPU>`: run computations on target GPU or on target CPU (default: GPU)
- `-cpu-nt <int>`: number of OpenMP threads (default: 1)
- `-max-iters <int>`: maximal number of iterations (default: 200)
- `-tol <float>`: tolerance, i.e. convergence criterion (default: 1.0E-4)
Vialle Stephane's avatar
Vialle Stephane committed
38

He Guanlin's avatar
He Guanlin committed
39
Examples:
He Guanlin's avatar
He Guanlin committed
40 41 42 43 44 45 46 47
- k-means on CPU: 
```
./kmeans -t CPU -cpu-nt 20
```
- k-means on GPU: 
```
./kmeans
```
Vialle Stephane's avatar
Vialle Stephane committed
48

He Guanlin's avatar
He Guanlin committed
49 50
## Corresponding papers
The approaches and experiments are documented in the following papers.
Vialle Stephane's avatar
Vialle Stephane committed
51 52 53

He, G., Vialle, S., & Baboulin, M. (2021). Parallelization of the k-means algorithm in a spectral clustering chain on CPU-GPU platforms. In B. B. et al. (Ed.), Euro-par 2020: Parallel processing workshops (Vol. 12480, LNCS, pp. 135–147). Warsaw, Poland: Springer. Available from: https://link.springer.com/chapter/10.1007/978-3-030-71593-9_11

He Guanlin's avatar
He Guanlin committed
54
He, G, Vialle, S, Baboulin, M. Parallel and accurate k-means algorithm on CPU-GPU architectures for spectral clustering. Concurrency Computat Pract Exper. 2021;e6621. Available from: http://doi.org/10.1002/cpe.6621 
He Guanlin's avatar
He Guanlin committed
55 56

If you find any part of this project useful for your scientific research, please cite the papers mentioned above.
Vialle Stephane's avatar
Vialle Stephane committed
57