When using hashed bins, configuration consists mostly of setting the desired number of bins. There is a trade-off between the size of each bin and the size of the snapshot metadata: more bins make each bin smaller but add entries to the snapshot (see ticket #195: Optimize Snapshots). However, predicting the optimal number of bins is difficult (see ticket #137: Simplify hashed bins config). It is generally best to start with a smaller number of bins, and then increase it as the volume of hosted packages grows. This is an important task in the ongoing maintenance of a repository over its lifecycle.
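To get a feel for the trade-off, the following sketch does the arithmetic for a few bin counts. It assumes targets are spread evenly across bins and that the snapshot carries roughly one entry per bin. The figure of 10000 targets and the helper name targets_per_bin are hypothetical, chosen only for illustration.

```shell
# targets_per_bin TOTAL_TARGETS NUMBER_OF_BINS
# Prints the average number of targets per bin, rounded up.
# Assumes an even spread of targets across bins (an idealization).
targets_per_bin() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical repository with 10000 targets.
for bins in 16 64 256; do
    echo "$bins bins: about $(targets_per_bin 10000 "$bins") targets per bin, about $bins snapshot entries"
done
```

With these assumed numbers, going from 16 to 256 bins shrinks each bin from about 625 targets to about 40, while the snapshot grows from about 16 bin entries to about 256.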
The process for changing the number of bins looks like this:
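1. Pause processing (rugged pause-processing)
2. Back up the metadata (cp -r metadata/ metadata.bak, or use git)
3. Update the bin count in the configuration (number_of_bins)
4. Update the hashed bins (rugged update-hashed-bins-count)
5. Validate the repository (rugged validate)
6. Resume processing (rugged resume-processing)

Do not skip Step 2!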
The reason to take a backup is to ensure that you can roll back to a known good state.
Changing the number of hashed bins involves rewriting the entire repository. If the process is interrupted, the result may be corrupted data that is difficult to recover from, especially without a backup.
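The steps above can be wrapped in a small script so that the backup is never skipped. This is only a sketch, not part of Rugged: the function names prepare_resize and apply_resize and the METADATA_DIR and BACKUP_DIR variables are hypothetical, with defaults taken from the cp example above. The rugged subcommands are the ones named in the steps. Step 3 is left as a manual edit because the location of your configuration is not assumed here.

```shell
#!/bin/sh
# Assumed paths, mirroring "cp -r metadata/ metadata.bak"; adjust as needed.
METADATA_DIR="${METADATA_DIR:-metadata}"
BACKUP_DIR="${BACKUP_DIR:-metadata.bak}"

# Steps 1 and 2: pause processing, then back up the metadata.
prepare_resize() {
    # Check first, so we never pause and then fail to take the backup.
    if [ -e "$BACKUP_DIR" ]; then
        echo "refusing to overwrite existing backup: $BACKUP_DIR" >&2
        return 1
    fi
    rugged pause-processing || return 1
    cp -r "$METADATA_DIR" "$BACKUP_DIR"
}

# Steps 4 to 6: update the bins, validate, and resume only if both succeed.
apply_resize() {
    if [ ! -d "$BACKUP_DIR" ]; then
        echo "no backup at $BACKUP_DIR; run prepare_resize first" >&2
        return 1
    fi
    rugged update-hashed-bins-count || return 1
    rugged validate || return 1
    rugged resume-processing
}

# Intended use:
#   prepare_resize
#   (Step 3: edit number_of_bins in your configuration by hand)
#   apply_resize
```

If apply_resize stops early, processing stays paused on purpose, so that you can inspect the repository or restore the backup before anything else writes to it.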
During Step 4 (rugged update-hashed-bins-count), Rugged will rewrite:
- the targets metadata (targets.json and bins.json)
- snapshot.json and timestamp.json.
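One way to see what Step 4 changed is to compare the files named above against the backup from Step 2. The sketch below assumes those four files sit directly inside the metadata directory and its backup under exactly these names, which may not match your layout. The function name changed_metadata is hypothetical.

```shell
# changed_metadata CURRENT_DIR BACKUP_DIR
# Prints the name of each listed file that differs from its backup copy,
# or that is missing on either side.
changed_metadata() {
    for f in targets.json bins.json snapshot.json timestamp.json; do
        if ! cmp -s "$1/$f" "$2/$f" 2>/dev/null; then
            echo "$f"
        fi
    done
}

# Intended use, with the paths from the backup step:
#   changed_metadata metadata metadata.bak
```

This only reports that a file changed, not whether the new content is valid, so it complements rugged validate rather than replacing it.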