• Recently, the K-means algorithm came up in a project. Given the speed limitations of the CPU implementation, GPU acceleration was needed, so I looked into the libKMCUDA library.
• This post records some problems encountered during installation and use.

1. Introduction to KMCUDA

Project Address: KMCUDA

Large scale K-means and K-NN implementation on NVIDIA GPU/CUDA

See GitHub for a full description of the project.

Performance benchmarks are shown in the chart in the project README.

Technically, the project is a shared library that exports the two functions defined in kmcuda.h: kmeans_cuda and knn_cuda. It has built-in native extension support for Python 3 and R, so you can use "from libKMCUDA import kmeans_cuda" in Python or dyn.load("libkmcuda.so") in R.

2. Installation

Github provides the following installation commands:

git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release . && make

There are a few parameters to note:

  • -D DISABLE_PYTHON: if you do not want to build the Python support module, set it to y, i.e. add -D DISABLE_PYTHON=y

  • -D DISABLE_R: if you do not want to build the R support module, add -D DISABLE_R=y

  • -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 (change to your own path): add this if CUDA cannot be found automatically

  • -D CUDA_ARCH=52: specifies the compute capability of the current machine's GPU (52 means compute capability 5.2)

  • GCC: as the project notes, older versions of the GCC compiler are not supported. My current version is 5.4, which meets the requirement.

1. Query the GCC version
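The version can be checked directly from the shell:

gcc --version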

If the version is too old, you can install GCC 5.4. For details, see the following blog post:

Install GCC in Linux

2. Query the GPU compute capability

Look up the compute capability of your GPU on the NVIDIA website:

CUDA GPUs | NVIDIA Developer

The server I'm currently using has a GeForce RTX 2070, whose compute capability is 7.5, so CUDA_ARCH is set to 75: -D CUDA_ARCH=75.
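Alternatively, on machines with a recent driver the value can be queried locally; note that the compute_cap query field is only available in newer nvidia-smi versions, so treat this as an optional shortcut:

nvidia-smi --query-gpu=name,compute_cap --format=csv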

3. Configure the CUDA paths

So that the related libraries can be found automatically, add the CUDA paths to the shell configuration file. The shell on my system is zsh.

Add the following to ~/.zshrc:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDA_INCLUDE_DIRS=/usr/local/cuda/include

Apply the changes:

source ~/.zshrc
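A quick check that the paths took effect, assuming the toolkit lives under /usr/local/cuda:

which nvcc       # expected: /usr/local/cuda/bin/nvcc
nvcc --version   # prints the CUDA toolkit release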

3. Complete installation commands

Current device parameters:

  • GCC: version 5.4
  • GPU compute capability: 7.5
  • Only Python support is required

Complete installation commands:

git clone https://github.com/src-d/kmcuda
cd kmcuda/src
cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y -D CUDA_ARCH=75 . && make

Testing:
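A minimal smoke test, assuming the built libKMCUDA.so is importable (e.g. when run from the build directory):

python3 -c "from libKMCUDA import kmeans_cuda; print('ok')"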

4. Problems encountered during installation

1. Install using pip

The installation command is as follows:

CUDA_ARCH=75 pip install libKMCUDA

An error occurs:

2. The GPU compute capability is not specified or the default is used

  • Installing from source with pip

    Installation command:

    pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src

    The following error occurs:

    The default value of -D CUDA_ARCH is 61, which differs from the actual compute capability (75).

  • Compute capability not specified

    Installation command:

    cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y . && make

    The build output indicates that the installation succeeded.

When testing, the following error occurs:

The message indicates that the compute capability the library was compiled for does not match the device. In both cases the fix is to rebuild with the correct compute capability, as sketched below.
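Both failures share the same root cause: the binary was built for the default compute capability 6.1 rather than the device's 7.5. Here is a sketch of the two fixes; whether setup.py honors the CUDA_ARCH environment variable for a git-based source install is an assumption, mirroring the CUDA_ARCH=75 pip install libKMCUDA command shown earlier:

# pip source install, forcing the architecture (assumed to behave like the
# binary install command above)
CUDA_ARCH=75 pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src

# cmake build, passing the architecture explicitly
cmake -DCMAKE_BUILD_TYPE=Release -D DISABLE_R=y -D CUDA_ARCH=75 . && make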

5. Python test cases

1. K-means, L2 (Euclidean) distance

import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
arr[:2500] = numpy.random.rand(2500, 2) + [0, 2]
arr[2500:5000] = numpy.random.rand(2500, 2) - [0, 2]
arr[5000:7500] = numpy.random.rand(2500, 2) + [2, 0]
arr[7500:] = numpy.random.rand(2500, 2) - [2, 0]
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()

2. K-means, angular (cosine) distance + average

import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
angs = numpy.random.rand(10000) * 2 * numpy.pi
for i in range(10000):
    arr[i] = numpy.sin(angs[i]), numpy.cos(angs[i])
centroids, assignments, avg_distance = kmeans_cuda(
    arr, 4, metric="cos", verbosity=1, seed=3, average_distance=True)
print("Average distance between centroids and members:", avg_distance)
print(centroids)
pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
pyplot.show()

The results are as follows:

6. Python API

1. kmeans_cuda()

def kmeans_cuda(samples, clusters, tolerance=0.01, init="k-means++",
                yinyang_t=0.1, metric="L2", average_distance=False,
                seed=time(), device=0, verbosity=0)

Parameters:

  • samples: numpy array of shape [number of samples, number of features], or a tuple (raw device pointer (int), device index (int), shape)
    • Note: samples must be a 2D float32 or float16 numpy array
  • clusters: int, the number of clusters
    • Note: clusters must be greater than 1 and less than (1 << 32) - 1
  • tolerance: float, the algorithm stops if the relative number of reassignments drops below this value
  • init: string or numpy array, the centroid initialization method; one of k-means++, afk-mc2, random, or a numpy array of shape [clusters, number of features], which must be float32
  • yinyang_t: float, usually set to 0.1
  • metric: str, the name of the distance metric to use. The default is Euclidean (L2); it can be changed to cos. Note that in the latter case the samples must be normalized
  • average_distance: boolean, whether to compute the average distance between the elements of each cluster and the corresponding centroid, useful for finding the optimal K; it is returned as the third tuple element
  • seed: int, random generator seed, for reproducing results
  • device: int, bitwise OR-ed CUDA device indices; e.g., 1 means the first device, 2 means the second device, 3 means using the first and second devices. The special value 0 enables all available devices. The default is 0
  • verbosity: int, 0 means no output at all, 1 means only progress logging, 2 means lots of output

Return value: tuple (centroids, assignments, [average_distance]). If samples was a numpy array or a host pointer tuple, the types are numpy arrays; otherwise, raw pointers (integers) allocated on the same device. If samples is float16, the returned centroids are float16 too.
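As a small illustration of the init parameter, here is a sketch based on the parameter descriptions above (the random data and the choice of the first four samples as initial centroids are mine, not from the project):

import numpy
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.random.rand(1000, 2).astype(numpy.float32)

# default k-means++ initialization
centroids, assignments = kmeans_cuda(arr, 4, seed=3)

# explicit initial centroids: a float32 array of shape [clusters, features]
init_centroids = arr[:4].copy()
centroids, assignments = kmeans_cuda(arr, 4, init=init_centroids, seed=3)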

2. knn_cuda()

def knn_cuda(k, samples, centroids, assignments, metric="L2", device=0, verbosity=0)

Parameters:

  • k: integer, the number of neighbors to search for each sample. Must be ≤ 116.

  • samples: numpy array of shape [number of samples, number of features] or tuple(raw device pointer (int), device index (int), shape (tuple(number of samples, number of features[, fp16x2 marker]))). In the latter case, a negative device index means a host pointer. Optionally, the tuple can be 1 item longer with the preallocated device pointer for neighbors. dtype must be either float16 or convertible to float32.

  • centroids: numpy array with the precalculated clusters' centroids (e.g., from kmeans_cuda()). dtype must match samples. If samples is a tuple then centroids must be a length-2 tuple: the first element is the pointer and the second is the number of clusters. The shape is (number of clusters, number of features).

  • assignments: numpy array with sample-cluster associations. dtype is expected to be compatible with uint32. If samples is a tuple then assignments is a pointer. The shape is (number of samples,).

  • metric: str, the name of the distance metric to use. The default is Euclidean (L2); it can be changed to "cos" to switch the algorithm to Spherical K-means with the angular distance. Please note that samples must be normalized in the latter case.

  • device: integer, bitwise OR-ed CUDA device indices, e.g. 1 means first device, 2 means second device, 3 means using first and second device. The special value 0 enables all available devices. The default is 0.

  • verbosity: integer, 0 means complete silence, 1 means only progress logging, 2 means lots of output.

Return value: neighbor indices. If samples was a numpy array or a host pointer tuple, the return type is a numpy array; otherwise, a raw pointer (integer) allocated on the same device. The shape is (number of samples, k).
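For completeness, a minimal sketch that chains knn_cuda with kmeans_cuda based on the signature above (random data, 10 neighbors per sample; the data setup is mine):

import numpy
from libKMCUDA import kmeans_cuda, knn_cuda

numpy.random.seed(0)
arr = numpy.random.rand(10000, 2).astype(numpy.float32)
# cluster first; knn_cuda reuses the resulting centroids and assignments
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
neighbors = knn_cuda(10, arr, centroids, assignments, metric="L2", verbosity=1)
print(neighbors.shape)  # (10000, 10)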