This page:
Test suites:
Popular generators: |
## What is Romu?Romu is a new family of random-number generators that match the statistical quality (randomness) of the best popular generators,
but are faster. In fact, due to the processor’s ILP (explained below), Romu generators add Such speed and high quality are possible because Romu generators are nonlinear, unlike most popular generators. A Romu generator combines the linear operation of multiplication with the nonlinear operation of rotation, yielding better randomness at higher speed. Romu stands for You can learn all the details about Romu generators in the paper. Although this is a formal academic paper, the math in it isn’t overwhelming because it’s at the level of advanced algebra. The math derives equations for probabilities of trouble, such as the snowflake probability described next. ## One Snowflake in HistoryAbove, we mentioned that Romu generators are nonlinear. There is a reason people have avoided nonlinear generators in the past: Unlike a linear generator, a nonlinear generator does not have one cycle that covers all possible internal state-values. Instead, it gives you a random permutation of those values. Random permutations are known to consist of multiple cycles of random lengths. In the past, when generators stored their state in one 32-bit variable, these random cycle-lengths would cause users to sometimes encounter cycles that were too short. That problem eliminated nonlinear generators from consideration. But nowadays, processors can easily handle 64-bit variables, and having two or three such variables
in the generator’s state is no problem, which makes encountering a too-short cycle all but impossible.
For example, if you gave RomuTrio (with 192 bits of state)
the impossibly large job of outputting 2 Some people might insist that a generator have one cycle with a guaranteed period (which eliminates nonlinear generators). That is unwise because they will lose the substantial benefit of higher speed in order to avoid a will-never-happen problem. The chance of randomly selecting one snowflake in Earth’s history and the Queen is nil. Losing a benefit to avoid that nil is unwise. The snowflake-and-Queen probability tells us that the problem of too-short cycles has been solved, and now we can enjoy the increased speed that nonlinear generators can provide. ## Top QualityThe Romu family has unusually good randomness, which is verified by passing the toughest test suites: BigCrush and PractRand. Both are brutal to random number generators, especially PractRand (which is newer). All Romu random number generators using 64-bit arithmetic pass both suites with no signs of stress. RomuDuoJr is the worst such generator, and even it passes easily. It’s easier to appreciate the improved randomness of Romu generators using dot-plots of consecutive pairs of random numbers. Below, the left plot is from an LCG (linear congruential generator), which is historically the most common. The right plot is from a simple Romu generator. To make their patterns obvious, both generators were scaled down to 10 bits of state and run for their entire periods. The nonlinear Romu is obviously more random than the linear LCG.
## FastThere are several reasons why Romu generators are faster than others: - Their nonlinear arithmetic requires fewer operations to attain a given level of randomness.
- That improved randomness means that post-process scrambling of output is not needed, making Romu even faster.
- They make good use of the processor’s ILP.
ILP stands for ```
```
The computations in this function will consume only 3 clock cycles in today’s PC CPUs. Here’s what happens in each of those clock cycles:
The multiply consumes 3 cycles, which is why the function consumes 3 cycles. The “Multiplier” column on the right shows when the multiplier is busy. Note that the other calculations are done at the same time as the multiply (in parallel). That is ILP in action. Romu generators are designed to take maximum advantage of this processor feature, making them very fast. But that’s not the only way that ILP helps. Notice that the function returns the old value of ## Capacity
Unless we missed one in the literature, it appears that the capacities of only three families of generators have been measured by scaled-down testing: Romu, PCG, and the Blackman-Vigna generators. Blackman has kindly supplied us with capacity estimates of their best generators (found in the paper). Thus, these are the only generator-families that will give you confidence for substantial jobs. If you run a medium-size or larger job with any other kind of generator, you will not know whether you will exceed its capacity. Huge jobs are divided into several thousand streams of random numbers, all running in parallel for weeks or months.
Each stream can consist of up to 2 ## The Romu GeneratorsRomu generators use 64- or 32-bit arithmetic. The source-code for all of these is here. Here are the 64-bit generators. All of them were tested in PractRand to 2 **RomuQuad**- This is more than anyone could need. It can easily handle the largest jobs and has the lowest probabilities of trouble. But it’s a little slower than RomuTrio below because it uses more registers in the CPU.**RomuTrio**- This is the generator we recommend most highly. It is very fast, has no problem with the largest jobs, and has inconceivably low one-snowflake-in-history probabilities of poor results, as discussed above. Try this one first.**RomuDuo**- With 128 bits of state, this smaller model might be a little faster than RomuTrio due to using fewer registers. It is adequate for large jobs while giving low probabilities of trouble.**RomuDuoJr**- This is the fastest Romu generator using 64-bit math because it uses the fewest registers and consists of only three arithmetic operations: rotate, multiply, and subtract. But its capacity is the lowest, so it’s recommended only for jobs up to its estimated capacity of 2^{48}values, or perhaps less in order to leave some safety margin.
Here are the 32-bit generators. Use these only if your processor lacks fast 64-bit instructions. **RomuQuad32**- This is a half-size version of RomuQuad, so we recommend it for general purpose use.**RomuTrio32**- This is a half-size version of RomuTrio, and it’s suitable for all but the largest jobs.**RomuMono32**- This little jewel updates its 32-bit state with only two arithmetic instructions. Its capacity is 2^{26}values, constraining it to small jobs. Note that it outputs 16-bit values, not 32-bit. But this generator has an unusual feature: Even though it’s nonlinear, its period is known and is nearly 2^{32}, so there is no chance of encountering a short period.
Romu generators are forgiving about seeding. For example, in multiple runs, it’s fine to set one state variable to a run-counter (1,2,3,...) and to set the remaining variables to zero. Romu generators quickly escape from an initial state that’s nearly all zeros, but it doesn’t hurt to call the generator a few times to scramble its state so that the first few values used in the job will be random. However, to guarantee that multiple streams are uncorrelated, it’s prudent to seed the state variables with a different kind of random number generator. We recommend SplitMix64 (by Vigna) and SplitMix32. The source-code of both are in appendices in the paper. ## The Competition
## More InformationThe paper has all the details, including source-code. It gives you the details of how testing and extrapolation were done, how constants were determined, plus the underlying mathematical theory of multi-cycle nonlinear generators, including equations for probabilities. Source-code is also available here. The Contact link at the upper-right corner will tell you how to contact me both privately and through social media. I welcome thoughtful comments and suggestions. Updated 2020-4-8 |