Berkeley DB Reference Guide: Copying databases

Berkeley DB Reference Guide: Programmer Notes

Copying databases

Because file identification cookies (for example, filenames, device and inode numbers, or volume and file IDs) are not necessarily unique or preserved across system reboots, each Berkeley DB database file contains a 20-byte file identification bytestring that is stored in the first page of the database, starting with the 53rd byte on the page. When multiple processes or threads open the same database file in Berkeley DB, this bytestring is what ensures that the same underlying pages are updated in the shared memory buffer pool, no matter which Berkeley DB handle is used for the operation.
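
As a rough diagnostic, the bytestring can be read straight out of a database file. The sketch below assumes the layout described above: 20 bytes beginning at the 53rd byte (zero-based offset 52) of the first page, with the first page starting at file offset 0. The helpers `read_fileid` and `print_fileid` are hypothetical, not part of the Berkeley DB API, and the offset should be checked against the on-disk format of your Berkeley DB release.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed layout, per the description above: the 20-byte file ID starts
 * at the 53rd byte (zero-based offset 52) of the first page, and the
 * first page begins at offset 0 of the file. */
#define FILEID_OFFSET 52L
#define FILEID_LEN    20

/* Hypothetical helper: read the file ID from an open database file.
 * Returns 0 on success, -1 if the file is too short or unreadable. */
int read_fileid(FILE *fp, unsigned char out[FILEID_LEN])
{
    assert(fp != NULL && out != NULL);
    if (fseek(fp, FILEID_OFFSET, SEEK_SET) != 0)
        return -1;
    return fread(out, 1, FILEID_LEN, fp) == (size_t)FILEID_LEN ? 0 : -1;
}

/* Print the ID as 40 hex digits, e.g. to compare two files by eye. */
void print_fileid(const unsigned char id[FILEID_LEN])
{
    for (int i = 0; i < FILEID_LEN; i++)
        printf("%02x", id[i]);
    putchar('\n');
}
```

For example, open a database with fopen(path, "rb"), pass the handle to read_fileid, and print the result with print_fileid. Because a physical copy duplicates every byte, running this on a database and on a plain file copy of it prints the same 40 hex digits.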

It is usually a bad idea to physically copy a database to a new name. In the few cases in which copying is the best solution for your application, you must guarantee that there are never two different databases with the same file identification bytestring in the memory pool at the same time. Copying databases is further complicated because the shared memory buffer pool does not discard cached database pages when the database is closed by calling the DB->close method; cached pages are discarded only when the database is removed by calling the DB->remove method.
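
To see why a physical copy is dangerous, consider a toy model of the shared buffer pool in which a cached page is found by file ID and page number alone. This is a simplified illustration, not Berkeley DB's memory pool code; `struct cached_page`, `pool_put`, `pool_lookup`, and `pool_discard` are hypothetical names. In the model, closing a file evicts nothing, and only discarding the ID (the analogue of DB->remove) clears that file's pages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FILEID_LEN 20
#define POOL_SLOTS 16

/* Toy model (NOT Berkeley DB's implementation): a cached page is keyed
 * by (file ID, page number) only; the file name plays no part. */
struct cached_page {
    int in_use;
    unsigned char fileid[FILEID_LEN];
    unsigned pgno;
    char data[32];              /* stand-in for the page contents */
};

static struct cached_page pool[POOL_SLOTS];

/* Return the cached page for (fileid, pgno), or NULL if not cached. */
struct cached_page *pool_lookup(const unsigned char *fileid, unsigned pgno)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (pool[i].in_use && pool[i].pgno == pgno &&
            memcmp(pool[i].fileid, fileid, FILEID_LEN) == 0)
            return &pool[i];
    return NULL;
}

/* Cache (or overwrite) a page; returns 0 on success, -1 if the pool is full. */
int pool_put(const unsigned char *fileid, unsigned pgno, const char *data)
{
    assert(fileid != NULL && data != NULL);
    struct cached_page *p = pool_lookup(fileid, pgno);
    for (int i = 0; p == NULL && i < POOL_SLOTS; i++)
        if (!pool[i].in_use)
            p = &pool[i];
    if (p == NULL)
        return -1;
    p->in_use = 1;
    memcpy(p->fileid, fileid, FILEID_LEN);
    p->pgno = pgno;
    snprintf(p->data, sizeof p->data, "%s", data);
    return 0;
}

/* Models DB->remove: discard every cached page carrying this file ID.
 * Modelled DB->close has no counterpart here: closing leaves pages cached. */
void pool_discard(const unsigned char *fileid)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (pool[i].in_use && memcmp(pool[i].fileid, fileid, FILEID_LEN) == 0)
            pool[i].in_use = 0;
}
```

For example, after pool_put(id, 1, ...) on behalf of the original database, a pool_lookup(id, 1) issued on behalf of a byte-for-byte copy (which carries the same id) returns the original's cached page rather than the copy's own page; the lookup returns NULL only after pool_discard(id).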

Before copying a database, you must ensure that all modified pages have been written from the memory pool cache to the backing database file. This is done using the DB->sync or DB->close methods.

Before using a copy of a database with Berkeley DB, you must ensure that all pages from any database with the same bytestring have been removed from the memory pool cache. If the environment in which you intend to open the copy might already hold cached pages from a file with the same bytestring as the copied database (which is likely), there are a few possible solutions:

Remove the environment, either explicitly or by calling DB_ENV->remove. Note that this will not allow you to access both the original and the copy of the database at the same time.
Create a new file with a new bytestring. The simplest way to do this is to use the db_dump utility to dump out the contents of the database and then use the db_load utility to load the dumped output into a new file. This allows you to access both the original and the copy of the database at the same time.
If your database is too large to be dumped and reloaded, copy it by other means and then reset the bytestring in the copied database to a new bytestring. This allows you to access both the original and the copy of the database at the same time. You can reset the bytestring with the -r flag to the db_load utility.
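
Before opening a copy in an environment that may still cache the original's pages, it can help to confirm that the copy no longer carries the original's bytestring. The sketch below is a hypothetical check (`fileids_collide` is not a Berkeley DB function) that assumes the layout described at the top of this section: 20 bytes starting at the 53rd byte of the first page. Verify that offset against your release's on-disk format before relying on it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout from this section: the 20-byte file ID starts at
 * zero-based offset 52 of the first page, which begins at file offset 0. */
#define FILEID_OFFSET 52L
#define FILEID_LEN    20

/* Read the file ID of an open file; 0 on success, -1 on failure. */
static int read_id(FILE *fp, unsigned char out[FILEID_LEN])
{
    assert(fp != NULL);
    return (fseek(fp, FILEID_OFFSET, SEEK_SET) == 0 &&
            fread(out, 1, FILEID_LEN, fp) == (size_t)FILEID_LEN) ? 0 : -1;
}

/* Hypothetical pre-open check: returns 1 if the two files carry the same
 * file ID (so they must not be in the same memory pool at the same time),
 * 0 if the IDs differ, and -1 if either ID cannot be read. */
int fileids_collide(FILE *original, FILE *copy)
{
    unsigned char a[FILEID_LEN], b[FILEID_LEN];
    if (read_id(original, a) != 0 || read_id(copy, b) != 0)
        return -1;
    return memcmp(a, b, FILEID_LEN) == 0;
}
```

A plain file copy of a database should make this check return 1. After recreating the copy with `db_dump orig.db | db_load copy.db`, or after resetting its bytestring with the -r flag to db_load (spelled `db_load -r fileid copy.db` in releases that support that form), it should return 0.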
