There is a long back story to this which I'll skip. In short, one of my interests is prime number finding, and I noticed a big difference in performance between my Skylake systems. With the CPUs at equal overclocks, the system with Ripjaws 5 F4-3200C16-8GVK (2x8GB) was much faster than the ones running two sticks from a Ripjaws 4 F4-3333C16-4GRRD (4x4GB) kit. The performance followed the RAM, not the CPU or motherboard. After more testing, I found the gap shrank significantly with all 4 modules fitted. Bearing in mind that Skylake is dual channel only, is there some other effect from fitting 4 modules that provides this advantage?
To illustrate the magnitude of the difference, I tested on one system: i7-6700K at 4.2 GHz, HT off, 4.1 GHz cache, F4-3333C16-4GRRD at 3000 16-18-18-38, on an MSI Z170A Gaming Pro motherboard.
The test is the Prime95 28.7 built-in benchmark, looking only at the 4-worker throughput in iterations/second at selected FFT sizes. Each configuration was run once.
On each line, the first value is with 4 modules fitted, the second is with 2 modules fitted (still in dual channel mode), and the last is the first divided by the second, to show relative performance.
FFT size   4 modules   2 modules   Ratio
1024k      997.60      819.47      1.22
2048k      468.45      376.43      1.24
4096k      228.53      182.85      1.25
8192k      111.84      86.51       1.29
That is a 22% advantage at 1024k, rising to 29% at 8192k as the workload grows. A 1024k FFT would be 8MB of data, x4 for the four workers, so this hits RAM hard: the CPU wants to work on 32MB of data, which is far too big to fit in the 8MB L3 cache at once. At bigger FFT sizes, proportionately more RAM is used.
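As a rough check of the arithmetic above, here is a minimal Python sketch that recomputes the ratio column and estimates the combined working set for each FFT size. It assumes 8 bytes per FFT element (which is what "1024k FFT would be 8MB" implies) and the 8MB L3 cache of the i7-6700K. Prime95's real memory layout may differ, so treat the sizes as estimates. The function names are my own, for illustration only.

```python
# Throughput from the table above, in iterations/second:
# FFT size in K elements -> (4 modules fitted, 2 modules fitted)
results = {
    1024: (997.60, 819.47),
    2048: (468.45, 376.43),
    4096: (228.53, 182.85),
    8192: (111.84, 86.51),
}

BYTES_PER_ELEMENT = 8  # assumption: one double-precision value per FFT element
WORKERS = 4            # the benchmark was run with 4 workers
L3_CACHE_MB = 8        # L3 cache size of the i7-6700K


def working_set_mb(fft_k, workers=WORKERS):
    """Estimated combined data size in MB for all workers at one FFT size."""
    return fft_k * 1024 * BYTES_PER_ELEMENT * workers / (1024 * 1024)


def speedup(four_modules, two_modules):
    """Relative throughput of the 4-module setup over the 2-module setup."""
    return four_modules / two_modules


for fft_k, (four, two) in results.items():
    size = working_set_mb(fft_k)
    print(f"{fft_k}k: ratio {speedup(four, two):.2f}, "
          f"working set ~{size:.0f}MB "
          f"({size / L3_CACHE_MB:.0f}x the L3 cache)")
```

Even the smallest case is about four times the size of the L3 cache under these assumptions, so every FFT size in the table should be dominated by RAM traffic.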
So why does this Ripjaws 4 kit run much faster with 4 modules than with 2, while the Ripjaws 5 kit runs fast even with 2 modules? If I wanted to get more RAM without necessarily buying the same parts again, how would I know whether it would give me the higher performance level I'm seeking?
Note that I've tried some other RAM-specific benchmarks, and they didn't show the significant performance difference I see here.