How can we generate a uniform sequence of random numbers? The randomness so beautifully and abundantly generated by nature has not always been easy for humans to extract and quantify.
Pseudo-Random Number Generators are fundamental tools in many areas of software development: PRNGs do not produce truly random numbers but rather deterministic sequences that simulate randomness. The quality of a PRNG is determined by its ability to produce sequences that are statistically indistinguishable from true random sequences for practical purposes.
Let’s take a short trip around the evolution of PRNGs from their origins to the state-of-the-art algorithms. We will delve into the practical applications of PRNGs in modern software development, using the Java programming language as a guide. From simulations to game development, we’ll explore how PRNGs are crucial in these contexts.
Randomness vs. Pseudo-Randomness
In the realm of software development, the choice between true randomness and PRNGs often comes down to a delicate balance between unpredictability and reproducibility. While true randomness is crucial in security-critical applications, it’s typically slower and harder to generate reliably in software environments. On the other hand, PRNGs, while not truly random, offer a level of predictability that’s ideal for simulations, game development, and more.
By contrast, pseudo-randomness, generated through different algorithms, is deterministic and repeatable, making it ideal for simulations, game development, procedural content generation, statistical sampling and more. In these cases, the controlled predictability of pseudo-randomness ensures that results are consistent and debuggable, which is crucial for software testing and iterative design.
The Origins of PRNGs
Let’s take a step back in time to the early 20th century, when the seeds of PRNGs were first sown. In 1946, John von Neumann introduced the middle-square method, one of the earliest techniques to generate pseudo-random numbers. This method, which involved squaring a number and extracting the middle digits as the next number in the sequence, was a pioneering step in the evolution of PRNGs.
In the 1950s and 1960s, the linear congruential generator (LCG) emerged as a popular PRNG due to its simplicity and ease of implementation. An LCG generates the next number in the sequence using a simple formula:
LCGs were widely adopted due to their straightforward design and implementation. However, they have several limitations, such as short periods, predictability, and poor statistical properties. But wait, what are these metrics?
Theoretical Foundations and the new PRNGs
Researchers have developed more sophisticated algorithms based on deeper mathematical principles to address the limitations of early PRNGs. Key theoretical concepts that guide PRNG development include:
- Periodicity
-
The period of a PRNG is the length of the sequence before it begins to repeat. A long period is desirable to minimize the likelihood of repetition in practical applications.
- Uniform Distribution
-
A good PRNG should generate uniformly distributed numbers across its output range. (A uniform distribution is one where all possible outcomes are finite and equally likely to occur)
- Statistical Independence
-
The output of a PRNG should exhibit no discernible patterns or correlations, appearing statistically independent from one another.
The Mersenne Twister, introduced by Makoto Matsumoto and Takuji Nishimura in 1997, became a popular PRNG due to its long period (2^19937 - 1) and strong statistical properties. Despite its advantages, the Mersenne Twister also has limitations, such as slow initialization and relatively high memory usage.
The site https://www.random.org/, created in 1998, was the first Randomness-as-a-Service provider of free genuinely random numbers. It now offers mobile apps for true random coin flipping, dice rolling, card shuffling, and more.
Modern PRNG Algorithms
Modern PRNGs have evolved to meet the demands of contemporary computing environments, which require high-speed generation, low memory usage, and improved statistical properties. These elements are side effects of the adoption of Cloud and Constrained Resources Systems.
Recent notable contributions include algorithms by Sebastiano Vigna, Melissa O’Neill, and Daniel Lemire, but more incredible authors are engaged in an ever-active research field.
Xorshift and xoshiro/xoroshiro Family
The Xorshift generators, proposed by George Marsaglia in 2003 and later refined by Sebastiano Vigna, use a combination of bitwise XOR operations and shifts to generate pseudo-random numbers. While xorshift is fast and efficient, it has some weaknesses in certain statistical tests. To address these issues, Vigna developed the xoshiro (extended xorshift-rotations) and xoroshiro (xorshift-rotate) families, which provide better statistical properties and are optimized for modern CPU architectures.
public class Xoshiro256 {
private long[] state = new long[4];
public Xoshiro256(long seed) {
state[0] = seed;
state[1] = seed ^ 0x9E3779B97F4A7C15L;
state[2] = seed ^ 0x3C6EF372FE94F82AL;
state[3] = seed ^ 0xA54FF53A5F1D36F1L;
}
public long next() {
long result = state[0] + state[3];
long t = state[1] << 17;
state[2] ^= state[0];
state[3] ^= state[1];
state[1] ^= state[2];
state[0] ^= state[3];
state[2] ^= t;
state[3] = Long.rotateLeft(state[3], 45);
return result;
}
}
PCG (Permuted Congruential Generator) Family
The PCG family, introduced by Melissa O’Neill in 2014, is based on a simple linear congruential generator but applies additional permutation functions to improve its statistical properties. PCG generators are known for their simplicity, efficiency, and excellent statistical performance, offering a good balance of speed and randomness.
public class PCG32 {
private long state;
private long inc;
public PCG32(long seed, long seq) {
state = seed;
inc = (seq << 1) | 1;
}
public int next() {
long oldState = state;
state = oldState * 6364136223846793005L + inc;
int xorShifted = (int)(((oldState >>> 18) ^ oldState) >>> 27);
int rot = (int)(oldState >>> 59);
return (xorShifted >>> rot) | (xorShifted << ((-rot) & 31));
}
}
LXM (Linear and Multiplicative with XOR)
Sebastiano Vigna also developed the LXM family of generators, which combines linear and multiplicative congruential generators with XOR operations. The LXM generators offer excellent statistical properties and long periods, making it a compelling choice for applications requiring high-quality randomness.
This new family of generators are introduced in Java Language, together with a refactoring of the randomness commodities area, thanks to JEP 356
public class LXM {
private long state;
private final long multiplier = 0x5DEECE66DL;
public LXM(long seed) {
state = (seed ^ multiplier) & ((1L << 48) - 1);
}
public int next() {
state = (state * multiplier + 0xBL) & ((1L << 48) - 1);
return (int)(state >>> 16) ^ (int)(state);
}
}
Testing the new Generators
Daniel Lemire has made significant contributions to the study of PRNGs, particularly in the context of non-cryptographic applications where performance and statistical reliability are crucial. He has conducted extensive testing on well-known PRNGs, such as PCG, SplitMix64, and various Xoroshiro generators, to evaluate their performance across standard randomness tests like PractRand and L’Ecuyer’s TestU01. His results emphasize that while some PRNGs, including PCG32 and SplitMix64, generally pass these rigorous tests, others, like Xoroshiro128+, fail under specific conditions. This analysis has provided developers with valuable insights into which generators to use depending on their stability and performance requirements.
Importance of PRNGs in Modern Software
PRNGs play a crucial role in various areas of software development, directly impacting the performance, correctness, and reliability of a wide range of applications:
- Scientific Simulations
-
PRNGs are essential for Monte Carlo simulations, statistical sampling, and other methods that rely on randomness to model complex systems.
- Games and Procedural Generation
-
PRNGs are used to generate dynamic game environments, procedural levels, randomized AI behaviour, and unpredictable game mechanics.
- Machine Learning
-
High-quality random numbers are required for randomized algorithms, neural network weight initialization, data shuffling, and testing procedures.
- Randomized Algorithms
-
Many algorithms use randomness to achieve better average performance or handle input data uncertainty. In some cases, these algorithms are related to Probabilistic Data Structures used in more modern systems
- Distributed System Stability
-
Some applications as for Randomized Backoff and Randomized Retry Delays in Circuit Breaker Pattern, Consistent Hashing, or Load Balancing and Sharding
At the time of this writing, another technical branch related to PRNGs is to enhance the algorithms using SIMD (Single Instruction Multiple Data) and Vectorized API obtaining an interesting increase in production speed and optimizations at the compilation stage.
Comparison of PRNGs
Generator | Pros | Cons | Period | Speed | Statistical Quality |
---|---|---|---|---|---|
Linear Congruential |
Simple, easy to implement |
Short period, poor high-dimensional properties, predictable |
Up to 232 or 264 |
Fast |
Moderate |
Mersenne Twister |
Long period, good statistical properties |
High memory usage, slow initialization, not suitable for all applications |
219937 - 1 |
Moderate |
High |
xorshift |
Very fast, low memory usage |
Poor statistical properties in some cases |
Up to 2128 |
Very Fast |
Moderate |
xoshiro/xoroshiro |
Good speed, improved statistical properties over xorshift |
Can fail specific statistical tests (e.g., low-dimensional uniformity) |
Up to 2256 |
Very Fast |
High |
PCG |
Simple, efficient, excellent statistical quality |
Moderate speed, depends on implementation |
Up to 264 |
Fast |
Very High |
LXM |
Excellent statistical properties, very long period, robust construction |
More complex to implement, moderate speed |
Up to 2128 |
Fast |
Very High |
Conclusion
The evolution of PRNGs has advanced significantly, from simple methods like LCGs to sophisticated algorithms such as xoshiro, PCG, and LXM. Each PRNG provides a different balance of speed, memory usage, period length, and statistical quality, making selecting the right generator based on specific application requirements essential. These advances suggest that PRNGs will increasingly underpin both AI’s practical tools and foundational algorithms, bridging traditional randomization techniques with intelligent, data-driven adjustments. Modern Languages support different families of PRNGs, helping developers create robust, efficient, and reliable software. Funnily enough, you can discover how challenging and exciting randomness can be.
To go further:
- Xoshiro / Xoroshiro generators and the PRNG shootout
- LXM: Better Splittable Pseudorandom Number Generators
- PCG: A Family of Better Random Number Generators
- Pseudorandom numbers in Java, Part 1
- Pseudorandom numbers in Java, Part 2
- Random Numbers in Java: Comparison of java.util.random, Xorshift, and PCG
- SIMD-Oriented Fast Mersenne Twister: a 128-bit Pseudorandom Number Generator
- PCG — Java Implementation. High quality fast random number generator