Commit 8912a677 authored by He Guanlin

Update README.md

parent 75757084
## CPU-GPU-kmeans
Optimized parallel implementations of the k-means clustering algorithm:
1. on multi-core CPU with vector units: thread parallelization using OpenMP, auto-vectorization using AVX units
2. on NVIDIA GPU: using shared memory, dynamic parallelism, and multiple streams
1. **on multi-core CPU with vector units**: thread parallelization using OpenMP, auto-vectorization using AVX units
2. **on NVIDIA GPU**: using shared memory, dynamic parallelism, and multiple streams
In particular, for both implementations we use a two-step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids.
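The README does not spell out the two-step summation, so the following is only a minimal C sketch of one plausible reading: values are first accumulated inside fixed-size packages, and the package sums are then accumulated into the total. The function name `two_step_sum` and the `package_size` parameter are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the repository): sums n floats in two steps.
 * Step 1: accumulate each package of `package_size` consecutive values
 *         into its own partial sum.
 * Step 2: accumulate the partial sums into the final total.
 * Keeping the partial sums small limits the magnitude gap between the
 * accumulator and the value being added, which reduces rounding error. */
float two_step_sum(const float *values, size_t n, size_t package_size)
{
    assert(package_size > 0);

    float total = 0.0f;
    for (size_t start = 0; start < n; start += package_size) {
        size_t end = start + package_size;
        if (end > n) {
            end = n;                 /* last package may be shorter */
        }

        float partial = 0.0f;        /* step 1: sum within one package */
        for (size_t i = start; i < end; ++i) {
            partial += values[i];
        }

        total += partial;            /* step 2: sum of package sums */
    }
    return total;
}
```

For instance, with the eight values `{16777216, 0, 0, 0, 1, 1, 1, 1}` and a package size of 4, the two package sums are 16777216 and 4, and their sum 16777220 is exactly representable in single precision. Under IEEE 754 single-precision arithmetic, a plain left-to-right loop would instead add each 1 to 16777216 individually, where it is rounded away.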
......@@ -11,7 +11,7 @@ In particular, for both implementations we use a two-step summation method with
- If necessary, update the CUDA path according to your own situation.
## "main.h" Configuration
The configuration of the benchmark dataset, block size, etc., is adjustable in the "main.h" file.
The configuration of the benchmark dataset, block size, etc., is adjustable in the _main.h_ file.
Our CUDA C code does not generate any synthetic data, so users should specify the path and filename of their benchmark dataset in the `INPUT_DATA` constant and also set `NbPoints`, `NbDims`, and `NbClusters`. To impose the initial centroids, users should provide a text file containing the coordinates of the initial centroids and specify the corresponding path and filename in the `INPUT_INITIAL_CENTROIDS` constant.
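As an illustration only, the constants named above might be set along these lines. This fragment assumes they are preprocessor macros, which the README does not confirm, and every path and value shown is a hypothetical placeholder.

```c
/* main.h (illustrative fragment; all paths and values are hypothetical) */

/* Benchmark dataset: path and filename of the input points */
#define INPUT_DATA              "data/points.txt"

/* Optional: text file with the coordinates of the initial centroids */
#define INPUT_INITIAL_CENTROIDS "data/initial_centroids.txt"

/* Problem size: must match the contents of the input files */
#define NbPoints   1000000   /* number of data points          */
#define NbDims     4         /* number of coordinates per point */
#define NbClusters 16        /* number of clusters (k)          */
```

Since the code does not generate data, these values would have to agree with the actual dataset file for the run to be meaningful.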
......