KGraph
 All Classes Namespaces Functions Pages
KGraph: A Library for Approximate Nearest Neighbor Search

Introduction

KGraph is a library for k-nearest neighbor (k-NN) graph construction and online k-NN search using a k-NN Graph as index. KGraph implements heuristic algorithms that are extremely generic and fast:

For best generality, the C++ API should be used. A python wrapper is provided under the module name pykgraph, which supports Euclidean and Angular distances on rows of NumPy matrices.

Building and Installation

KGraph depends on a recent version of GCC with C++11 support, cmake and the Boost library. The package can be built and installed with ```sh cmake -DCMAKE_BUILD_TYPE=release . make sudo make install ```

A Makefile.plain is also provided in case cmake is not available.

The Python API can be installed with ``` python setup.py install ```

Python Quick Start

```python from numpy import random import pykgraph

dataset = random.rand(1000000, 16) query = random.rand(1000, 16)

index = pykgraph.KGraph(dataset, 'euclidean') index.build(reverse=-1) # index.save("index_file");

load with index.load("index_file");

knn = index.search(query, K=10) # this uses all CPU threads knn = index.search(query, K=10, threads=1) # one thread, slower knn = index.search(query, K=1000, P=100) # search for 1000-nn, no need to recompute index. ```

Both index.build and index.search supports a number of optional keywords arguments to fine tune the performance. The default values should work reasonably well for many datasets. One exception is that reverse=-1 should be added if the purpose of building index is to speedup search, which is the typical case, rather than to obtain the k-NN graph itself.

Two precautions should be taken:

For performance considerations, the Python API does not support user-defined similarity function, as the callback function is invoked in such a high frequency that, if written in Python, speedup will inevitably be brought down. For the full generality, the C++ API should be used.

C++ Quick Start

The KGraph C++ API is based on two central concepts: the index oracle and the search oracle. (Oracle is a fancy way of calling a user-defined callback function that behaves like a black box.) KGraph works solely with object IDs from 0 to N-1, and relies on the oracles to map the IDs to actual data objects and computes the similarity. To use KGraph, the user has to extend the following two abstract classes

```cpp class IndexOracle { public: // returns size of dataset virtual unsigned size () const = 0; // computes similarity of object i and j virtual float operator () (unsigned i, unsigned j) const = 0; };

class SearchOracle { public: /// Returns the size of the dataset. virtual unsigned size () const = 0; /// Computes similarity of query and object i. virtual float operator () (unsigned i) const = 0; }; ```

With the oracle classes defined, index construction and online search become straightfoward:

```cpp

KGraph *index = KGraph::create();

if (need_to_create_new_index) { MyIndexOracle oracle(...); // subclass of kgraph::IndexOracle KGraph::IndexParams params; params.reverse = -1; index->build(oracle, params); index->save("some_path"); } else { index->load("some_path"); }

MySearchOracle oracle(...); // subclass of kgraph::SearchOracle

KGraph::SearchParams params; params.K = K; vector<unsigned> knn(K); // to save K-NN ids. index->search(oracle, params, &knn[0]);

delete index; ``` Doxygen documentation

http://www.kgraph.org/