Berkeley DB Reference Guide: Btree prefix comparison

Berkeley DB Reference Guide:
Access Methods

Btree prefix comparison

The Berkeley DB Btree implementation maximizes the number of keys that can be stored on an internal page by storing only as many bytes of each key as are necessary to distinguish it from adjacent keys. The prefix comparison routine is what determines this minimum number of bytes (that is, the length of the unique prefix), that must be stored. A prefix comparison function for the Btree can be specified by calling DB->set_bt_prefix.

The prefix comparison routine must be compatible with the overall comparison function of the Btree, since what distinguishes any two keys depends entirely on the function used to compare them. This means that if a prefix comparison routine is specified by the application, a compatible overall comparison routine must also have been specified.

Prefix comparison routines are passed pointers to keys as arguments. The keys are represented as DBT structures. The only fields the routines may examine in the DBT structures are data and size fields.

The prefix comparison function must return the number of bytes necessary to distinguish the two keys. If the keys are identical (equal and equal in length), the length should be returned. If the keys are equal up to the smaller of the two lengths, then the length of the smaller key plus 1 should be returned.

An example prefix comparison routine follows:

u_int32_t
compare_prefix(dbp, a, b)
	DB *dbp;
	const DBT *a, *b;
{
	size_t cnt, len;
	u_int8_t *p1, *p2;

	cnt = 1;
	len = a->size > b->size ? b->size : a->size;
	for (p1 =
		a->data, p2 = b->data; len--; ++p1, ++p2, ++cnt)
			if (*p1 != *p2)
				return (cnt);
	/*
	 * They match up to the smaller of the two sizes.
	 * Collate the longer after the shorter.
	 */
	if (a->size < b->size)
		return (a->size + 1);
	if (b->size < a->size)
		return (b->size + 1);
	return (b->size);
}

The usefulness of this functionality is data-dependent, but in some data sets can produce significantly reduced tree sizes and faster search times.

Berkeley DB Reference Guide:Access Methods

Btree prefix comparison

Berkeley DB Reference Guide:
Access Methods