Commit 7ba486d0 authored by He Guanlin

Update README.md

parent d25ac316
@@ -6,14 +6,14 @@ Optimized parallel implementations of the k-means clustering algorithm:
In particular, both implementations use a two-step summation method with package processing to limit the rounding errors that can accumulate while the cluster centroids are being updated.
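The general idea of splitting a long sum into packages can be illustrated with a minimal sequential C sketch of blocked (chunked) summation. This is not the authors' CUDA implementation: the function names and the `package_size` parameter are hypothetical, and a GPU version would presumably compute the per-package sums in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Naive summation: one float running sum. Once the sum is large,
 * small addends are rounded away. */
float naive_sum(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Blocked two-step summation (sketch):
 * step 1 sums each package of package_size consecutive values,
 * step 2 adds the per-package partial sums together.
 * Each running sum stays smaller, so fewer low-order bits are lost. */
float two_step_sum(const float *x, size_t n, size_t package_size)
{
    assert(package_size > 0);
    float total = 0.0f;
    for (size_t start = 0; start < n; start += package_size) {
        size_t end = (start + package_size < n) ? start + package_size : n;
        float partial = 0.0f;            /* step 1: sum of one package */
        for (size_t i = start; i < end; i++)
            partial += x[i];
        total += partial;                /* step 2: combine partial sums */
    }
    return total;
}
```

For example, with `x = {16777216.0f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f}`, `naive_sum(x, 8)` returns 16777216 because each `+ 1.0f` is rounded away once the running sum reaches 2^24, whereas `two_step_sum(x, 8, 4)` first sums the four ones into 4 and returns the exact 16777220.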
## Makefile Configuration
- Commenting out the `-DDP` option makes our code compute in single precision; leaving it enabled makes it compute in double precision.
- Set the `--gpu-architecture` option to match your own GPU device.
- If necessary, update the CUDA path to match your installation.
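How a `-DDP` compiler flag can switch precision is sketched below. The `T_real` type, the `STR_TO_REAL` macro, and the `parse_real` helper are hypothetical names for illustration, not taken from the repository.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: passing -DDP to the compiler defines DP,
 * which selects the floating-point type used throughout the code. */
#ifdef DP
typedef double T_real;                          /* -DDP enabled: double precision */
#define STR_TO_REAL(s, end) strtod((s), (end))
#else
typedef float T_real;                           /* -DDP commented out: single precision */
#define STR_TO_REAL(s, end) strtof((s), (end))
#endif

/* Parses a decimal string into the selected precision. */
T_real parse_real(const char *s)
{
    assert(s != NULL);
    return STR_TO_REAL(s, NULL);
}
```

Compiled without `-DDP`, `T_real` is `float`; adding `-DDP` to the compiler flags makes it `double`, so the same source serves both precisions.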
## "main.h" Configuration
The benchmark dataset, block size, and other settings are configured in the `main.h` file.
Our CUDA C code does not generate synthetic data, so users must specify the path and filename of their benchmark dataset in the `INPUT_DATA` constant and also set `NbPoints`, `NbDims`, and `NbClusters`. To impose the initial centroids, users should provide a text file containing their coordinates and specify its path and filename in the `INPUT_INITIAL_CENTROIDS` constant.
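A hypothetical fragment illustrating these `main.h` settings is shown below. The values and file paths are placeholders, and several things are assumptions rather than the repository's actual code: that the constants are preprocessor macros, that the centroid file holds whitespace-separated coordinates (one centroid per line), and the `load_initial_centroids` helper itself.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder values: adapt them to your own dataset. */
#define INPUT_DATA              "data/points.txt"
#define INPUT_INITIAL_CENTROIDS "data/initial_centroids.txt"
#define NbPoints   1000
#define NbDims     2
#define NbClusters 4

/* Hypothetical helper: reads NbClusters * NbDims whitespace-separated
 * coordinates from `path` into `centroids` (row-major, one centroid per row).
 * Returns 0 on success, -1 if the file cannot be opened or is too short. */
int load_initial_centroids(const char *path, double *centroids)
{
    assert(path != NULL && centroids != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    for (int i = 0; i < NbClusters * NbDims; i++) {
        /* %lf skips any whitespace, including newlines between centroids */
        if (fscanf(f, "%lf", &centroids[i]) != 1) {
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return 0;
}
```

A call such as `load_initial_centroids(INPUT_INITIAL_CENTROIDS, centroids)` with a `double centroids[NbClusters * NbDims]` buffer would then fill the initial centroids before the first k-means iteration.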
The synthetic dataset used in our papers below is too large (about 1.8 GB) to be uploaded here, so we provide `Synthetic_Data_Generator.py` instead. Since the generator draws random values, each run produces a dataset with different values but always the same distribution.