How to Change the Number of Hashed Bins

When using hashed bins, configuration consists mostly of setting the desired number of bins. There is a trade-off between the size of each bin and the size of the snapshot metadata: more bins keep each bin smaller, but add entries to the snapshot (see ticket #195: Optimize Snapshots). However, predicting the optimal number of bins is difficult (see ticket #137: Simplify hashed bins config). It is generally best to start with a smaller number of bins, and then increase it as the volume of hosted packages grows. This is an important task in the ongoing maintenance of a repository over its lifecycle.
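
To make the trade-off concrete, the following Python sketch estimates both sides of it for a few bin counts. It is illustrative arithmetic only and not part of Rugged: the function name `bin_tradeoff` and the sample figure of 100,000 targets are hypothetical, and the two extra snapshot entries (for targets.json and bins.json) are an assumption about the metadata layout.

```python
import math


def bin_tradeoff(total_targets, number_of_bins):
    """Estimate average bin size and snapshot entry count (illustrative only)."""
    if number_of_bins < 1:
        raise ValueError("number_of_bins must be at least 1")
    return {
        "number_of_bins": number_of_bins,
        # Average number of targets that each bin's metadata file has to list.
        "avg_targets_per_bin": math.ceil(total_targets / number_of_bins),
        # Assumption: the snapshot lists one entry per bin, plus one each
        # for targets.json and bins.json.
        "snapshot_entries": number_of_bins + 2,
    }


# Hypothetical repository hosting 100,000 targets.
for bins in (16, 256, 4096):
    print(bin_tradeoff(100_000, bins))
```

Under these assumptions, raising the bin count shrinks each bin roughly in proportion while the snapshot grows by one entry per added bin.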

In outline, the process looks like this:

  1. Pause the automated functions in the repository (rugged pause-processing)
  2. Take a backup of existing repository metadata (e.g. cp -r metadata/ metadata.bak, or use git)
  3. Update config to reflect the desired number of bins (number_of_bins)
  4. Trigger the hashed bin update (rugged update-hashed-bins-count)
  5. Ensure that the repository is consistent (rugged validate)
  6. Resume the automated functions in the repository (rugged resume-processing)
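
The steps above could be scripted along the following lines. This is a minimal sketch rather than an official tool: the helper names `run_step` and `change_bin_count` are hypothetical, the commands are the ones named in the list, and step 3 is left to a caller-supplied `update_config` callable because the location and format of the config file are site-specific. The sketch defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def run_step(cmd, dry_run=True):
    """Print a command and, unless dry_run is set, execute it."""
    print("+", shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status,
        # which stops the sequence before any later step runs.
        subprocess.run(cmd, check=True)
    return cmd


def change_bin_count(update_config, dry_run=True):
    """Run steps 1 to 6 in order; update_config performs step 3."""
    done = []
    done.append(run_step(["rugged", "pause-processing"], dry_run))
    done.append(run_step(["cp", "-r", "metadata/", "metadata.bak"], dry_run))
    update_config()  # step 3: set number_of_bins in your config
    done.append(run_step(["rugged", "update-hashed-bins-count"], dry_run))
    done.append(run_step(["rugged", "validate"], dry_run))
    done.append(run_step(["rugged", "resume-processing"], dry_run))
    return done


# Dry run: prints each command without executing anything.
change_bin_count(update_config=lambda: print("(edit number_of_bins now)"))
```

Because a failing command raises before the next step, a failed validation leaves the repository paused in this sketch, so it can be inspected or restored from the backup before processing resumes.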

Backups

Do not skip Step 2!

The reason to take a backup is to ensure that you can roll back to a known good state.

Changing the number of hashed bins involves rewriting the entire repository. If the process is interrupted, the result may be corrupted data that is difficult to recover from (especially without a backup).
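
As one possible way to take and check such a backup, the sketch below copies the metadata directory to a timestamped location, verifies the copy by comparing SHA-256 digests, and restores it on request. All function names and the backup directory layout are hypothetical; a plain cp -r or a git commit serves the same purpose.

```python
import hashlib
import shutil
import time
from pathlib import Path


def backup_metadata(metadata_dir, backup_root):
    """Copy metadata_dir to a new timestamped directory under backup_root."""
    dest = Path(backup_root) / time.strftime("metadata-%Y%m%d-%H%M%S")
    shutil.copytree(metadata_dir, dest)  # raises if dest already exists
    return dest


def tree_digests(root):
    """Map each file's relative path to the SHA-256 digest of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def backup_matches(metadata_dir, backup_dir):
    """Return True if both trees hold the same files with the same contents."""
    return tree_digests(metadata_dir) == tree_digests(backup_dir)


def restore_metadata(backup_dir, metadata_dir):
    """Replace metadata_dir with the contents of backup_dir."""
    shutil.rmtree(metadata_dir)
    shutil.copytree(backup_dir, metadata_dir)
```

A reasonable pattern is to call `backup_matches` right after `backup_metadata`, before triggering the bin update, and to restore only while processing is still paused.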

Internals

During Step 4 (rugged update-hashed-bins-count), Rugged will:

  1. Check that the repository is paused
  2. Confirm the current number of bins
  3. Confirm the desired number of bins
  4. Generate the new bins (including updates to targets.json and bins.json)
  5. Drain the old bins (into the new ones)
  6. Trigger updates to snapshot.json and timestamp.json
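
The sequence above can be modelled in a few lines of Python. This is a conceptual sketch and not Rugged's implementation: the repository is a plain dictionary, the names `bin_index` and `update_hashed_bins_count` are hypothetical, and targets are assigned to bins by taking the SHA-256 digest of the path modulo the bin count, which is a simplification of the hash-prefix ranges that TUF hashed bin delegations use.

```python
import hashlib


def bin_index(target_path, number_of_bins):
    """Pick a bin for a target path from its SHA-256 digest (simplified)."""
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    return int(digest, 16) % number_of_bins


def update_hashed_bins_count(repo, desired_bins):
    """Rebuild repo['bins'] with desired_bins bins; repo is a dict model."""
    if not repo["paused"]:                      # 1. repository must be paused
        raise RuntimeError("repository is not paused")
    current_bins = len(repo["bins"])            # 2. current number of bins
    if desired_bins < 1:                        # 3. desired number of bins
        raise ValueError("desired_bins must be at least 1")
    if desired_bins == current_bins:
        return repo                             # nothing to change
    new_bins = [dict() for _ in range(desired_bins)]  # 4. generate new bins
    for old_bin in repo["bins"]:                      # 5. drain old bins
        while old_bin:
            path, info = old_bin.popitem()
            new_bins[bin_index(path, desired_bins)][path] = info
    repo["bins"] = new_bins
    repo["snapshot_version"] += 1               # 6. snapshot and timestamp
    repo["timestamp_version"] += 1
    return repo


# Hypothetical repository: 100 targets spread over 4 bins, grown to 16 bins.
repo = {"paused": True, "bins": [dict() for _ in range(4)],
        "snapshot_version": 1, "timestamp_version": 1}
for n in range(100):
    path = f"packages/pkg-{n}.tar.gz"
    repo["bins"][bin_index(path, 4)][path] = {"length": n}
update_hashed_bins_count(repo, 16)
print(len(repo["bins"]), sum(len(b) for b in repo["bins"]))
```

In this model every target is moved, and an interruption partway through the drain would leave targets split between old and new bins, which illustrates why the backup step matters.
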
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Rugged TUF Server is a trademark of Consensus Enterprises.