GitHub _1
发布时间:2025-07-26 09:59:26

Helper Utility for Generating Command Line Syntax

This project is a library for creating perfect hash tables from 32-bit key sets. It is based on the acyclic random 2-part hypergraph algorithm. Its primary goal is finding the fastest possible runtime solution.

It is geared toward offline table generation: a command line application is used to generate a small C library that implements an routine, which, given an input key, will return the order-preserved index of that key within the original key set, e.g.:

    uint32_t ix;
    uint32_t key = 0x20190903;

    ix = Index(key);

This allows the efficient implementation of key-value tables, e.g.:

    extern uint32_t Table[];

    uint32_t
    Lookup(uint32_t Key)
    {
        return Table[Index(Key)]
    };

    void
    Insert(uint32_t Key, uint32_t Value)
    {
        Table[Index(Key)] = Value;
    }

    void
    Delete(uint32_t Key)
    {
        Table[Index(Key)] = 0;
    }

The fastest routine is the MultiplyShiftRX routine, which clocks in at about 6 cycles in practice, and boils down to something like this:

extern uint32_t Assigned[];

uint32_t
Index(uint32_t Key)
{
    uint32_t Vertex1;
    uint32_t Vertex2;

    Vertex1 = ((Key * Seed1) >> Shift);
    Vertex2 = ((Key * Seed2) >> Shift);

    return ((Assigned[Vertex1] + Assigned[Vertex2]) & IndexMask);
}

N.B. , , , and will all be literal constants in the final source code, not variables.

This compiles down to:

lea     r8, ptr [rip+0x23fc0]
imul    edx, ecx, 0xe8d9cdf9
shr     rdx, 0x10
movzx   eax, word ptr [r8+rdx*2]
imul    edx, ecx, 0xc2e3c0b7
shr     rdx, 0x10
movzx   ecx, word ptr [r8+rdx*2]
add     eax, ecx
ret

The IACA profile reports 8 uops:

The "cost" behind the perfect hash table is the array. The size of this array will be the number of keys, rounded up to a power of two, and then doubled. E.g. has 31,016 keys. Rounded up to a power of two is 32,768, then doubled: 65,336.

The data type used by the array is the smallest C data type that can hold the number of keys rounded up to a power of two. Thus, a 16-bit can be used for the array:

unsigned short int Assigned[65336] = { ... };

Thus, will be 131,072, or 128KB.

The routine will perform two memory lookups into this array per call. No pointer chasing or indirection is required. The most frequent keys will have both locations in L1 cache; the worst-case scenario is two memory lookups for both locations for cold or infrequent keys.

Initially, all development on this project was done exclusively on Windows, with Visual Studio 2022. Linux support has recently been added, although it is still quite rough around the edges. For some context: about 1,700 hours have been spent on the Windows portion. The Linux support was hacked together in about two weeks of evening and weekend work.

The generated compiled perfect hash tables are cross-platform, and will work on Windows, Mac, Linux, x86, x64, and ARM64.

Windows

The file lives in . You can either build this directly via Visual Studio, use one of the files, or just use from a Visual Studio 2022 command prompt:

You can also download the latest binaries from the Releases page. The zip files refer to profile-guided optimization builds, and are generally faster than the builds by up to 30-40%.

Once built or downloaded, there are two main command line executables: , and . The former is for creating a single table, and it takes a single input key file. The latter can be pointed at a directory of keys, and it will create tables for all of them.

Linux

Prerequisites: C compiler (GCC 10 tested), CMake. Optional: Ninja.

For normal Makefile support:

Mac

I haven't tried to build it on Mac yet. It'll require a bit of hacking, I assume.

The usage options are almost identical for both programs. If you run either one without arguments, it will print detailed usage instructions, also available here.

The main usage follows:

Assuming you have built a version of the library, from a Visual Studio 2022 command prompt (i.e. so the compiler is available in your ):

On Linux this would look like:

This should result in some output that looks like this:

N.B. Console output isn't supported on Linux yet.

N.B. If you get an error like this, it means couldn't be found on your path; make sure to launch a Visual Studio 2022 command prompt:

If you look in , you'll see something along the following lines. Intermediate build files have been omitted for brevity.

The main library implementing the perfect hash table is the file. There are three helper utilities also compiled for benchmarking and testing. You can assess the performance of just the routine via: :

On Windows, the output will look like this:

The exit code, 7106 in this case, is the minimum number of cycles it took, out of 1000 attempts, to do 1000 calls to the routine. So, you can divide by 1000 to get the approximate number of cycles per call, in this case, about 7.

The executable returns the minimum number of cycles it took, out of 100 attempts, to do 10 iterations of the following:

  • For each key, call .
  • For each key, call .
  • For each key, call .

The files are compiled with special versions of each routine (i.e. ), so you can call on them to get an analysis of the generated code, e.g.:

Although note that this isn't an exact science, sometimes the compiler reorders the and markers such that you end up missing a couple of instructions in the analysis. In the example above, the actual assembly for the routine is as follows:

To compile the hash table on Linux (using WSL1 and GCC 9 as an example):

With clang (version 10):

Identical to Linux, except you don't need :

平台注册入口