Commit f436d14c authored by He Guanlin

Update README.md

parent 43f4baa8
......@@ -13,7 +13,9 @@ In particular, for both implementations we use a two-step summation method with
## "main.h" Configuration
The benchmark dataset, block size, and other settings are configurable in the _main.h_ file.
Our k-means code does NOT generate any synthetic data, so you need to give the path and filename of your benchmark dataset in the `INPUT_DATA` constant, and also specify `NbPoints`, `NbDims`, and `NbClusters`.
Optionally, if you want to impose initial centroids, provide a text file and specify its path and filename in the `INPUT_INITIAL_CENTROIDS` constant. Otherwise, the initial centroids will be selected uniformly at random.
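
As a rough illustration, the relevant part of _main.h_ might look like the fragment below. This is only a sketch: it assumes the settings are plain preprocessor macros, and the paths and numeric values are hypothetical placeholders. The actual _main.h_ may declare these names differently, so check the file itself before editing.

```c
/* Hypothetical excerpt of main.h: all paths and values are placeholders. */

/* Path and filename of the benchmark dataset (the code generates no data itself). */
#define INPUT_DATA "../data/my_dataset.txt"

/* Optional: text file holding the imposed initial centroids. */
#define INPUT_INITIAL_CENTROIDS "../data/my_initial_centroids.txt"

/* Shape of the dataset and number of clusters to compute. */
#define NbPoints   1000000   /* number of data instances */
#define NbDims     4         /* number of dimensions per instance */
#define NbClusters 4         /* number of clusters (k) */
```

The three size constants must match the contents of the file named in `INPUT_DATA`.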
## Benchmark Datasets
We tested our code on one synthetic dataset that we created ourselves and two real-world datasets downloaded from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php). Each of them contains millions of instances and is therefore too large to be uploaded here. Instead, we provide _Synthetic_Data_Generator.py_ and describe the filtering operations applied to the real-world datasets.
......