Linux Performance on Xeon EM64T


Goal
Method
The hardware
The software
The benchmarks
Results
Conclusions

Goal: to determine the performance limits and usability of several Linux distributions on scientific computing workstations.

Method: we assembled a test machine, installed three versions of Linux on it, and benchmarked its CPU and memory performance.

The hardware:

We first had to choose a hardware configuration that would be a good candidate for our new scientific computing workstations and for parallel cluster nodes. The criteria for picking the hardware were:

The following configuration was chosen:

Motherboard

Supermicro X6DAE-G2-O
 Dual Intel® 64-bit Xeon Support, up to 3.60 GHz, 800 MHz FSB
 Intel® E7525 (Tumwater) Chipset
 Up to 16GB DDRII 400 SDRAM
 Intel® 82545GM Single-port Gigabit Ethernet Controllers
 2x SATA Ports via ICH5R SATA Controller
 1 (x16) & 1 (x4) PCI-Express
 1x 64-bit 133MHz PCI-X,
 2x 64-bit 100MHz PCI-X,
 1x 32-bit 33MHz PCI
 AC'97 Audio, 6-Channel Sound

CPUs (2x)

Intel Xeon "Irwindale" Processor BX80546KG3000FA
3.0GHz, 800MHz FSB, 2MB L2 Cache, PGA4, HT, EM64T, XD, Active HS

RAM (4x)

Kingston KVR400D2S8R3/512
512MB, DDR2 DIMM, 400MHz, REG ECC, CL3, Single Rank, 1.8V

Hard drive

Western Digital Caviar SE 120GB ATA-100 WD1200JB
Average Seek Time - 8.9 ms
Spindle Speed - 7200 rpm
Buffer Size - 8 MB


While the main focus of our tests was to compare results obtained on the same hardware, for reference we also included some interesting results from two other computers: a Pentium 4 at 2.6 GHz with an 800 MHz FSB, and a dual Xeon at 2.8 GHz with a 533 MHz FSB. We consider these platforms viable low-cost alternatives for scientific workstations and parallel cluster nodes.

The software:

Next, a Linux distribution had to be picked. We used the following criteria, aimed at maximum productivity:

We considered Novell/SUSE to be the distribution most suitable for our needs. For historical and comparison reasons, we also included a Redhat 7.3 installation, which we consider the longest-lived and most successful Linux distribution ever.

Our main test system had the following software configuration:
  1. SUSE 9.2 64-bit, kernel 2.6.8-24.14-smp, gcc version 3.3.4 (pre 3.3.5 20040809), ld-2.3.3.so
  2. SUSE 9.1 32-bit, kernel 2.6.5-7.151-smp, gcc version 3.3.3, ld-2.3.3.so
  3. Redhat 7.3, kernel 2.4.20-42.7.legacysmp, gcc version 2.96 20000731 -and- gcc version 3.0.4, ld-2.2.5.so

Hardware support problems:
  1. None of these kernels correctly recognizes the L2 cache size of the Xeon "Irwindale" CPUs: /proc/cpuinfo reports 0 KB of L2 cache under Redhat 7.3 and SUSE 9.1, and only 16 KB (out of 2048 KB) under SUSE 9.2.

  2. As expected, Redhat 7.3 has problems configuring the gigabit Ethernet and the sound hardware built into the Supermicro X6DAE-G2-O motherboard. Everything else seems to work fine.
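To check what a given kernel reports for the L2 cache (problem 1 above), one can read the `cache size` field of /proc/cpuinfo. The Python sketch below assumes the x86 layout `cache size\t: 2048 KB` (other architectures format the file differently); the function name `reported_cache_kb` is our own hypothetical helper.

```python
import re
from pathlib import Path

# Matches x86 lines such as "cache size\t: 2048 KB" (layout assumed; other
# architectures format /proc/cpuinfo differently).
_CACHE_RE = re.compile(r"^cache size\s*:\s*(\d+)\s*KB", re.MULTILINE)


def reported_cache_kb(cpuinfo_text: str) -> list[int]:
    """Return the cache size (in KB) the kernel reports for each logical CPU."""
    return [int(kb) for kb in _CACHE_RE.findall(cpuinfo_text)]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        sizes = reported_cache_kb(path.read_text())
        print(sizes or "no 'cache size' field found")
```

On the test system, a correct kernel would be expected to report 2048 for every logical CPU.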

All the other computers mentioned in the results were running Redhat 7.3, kernel 2.4.20-42.7.legacy(smp), gcc version 2.96, ld-2.2.5.so.

The benchmarks:

It should first be noted that the "user experience" offered by a computer depends mainly on the characteristics of the job. To find out which bus or component of the computer determines the speed of a certain job, the user needs to analyze where that job spends most of its time. For example, in most cases an ftp transfer depends mainly on the speed of the network and not on the clock speed of the CPU.
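One simple first step in that analysis is to compare the CPU time a job consumes with the wall-clock time it takes. The sketch below (the helper name `cpu_fraction` is hypothetical) relies on Python's `os.times()`, whose `children_user` and `children_system` fields count the CPU time of finished child processes on Unix; on Windows those fields are always zero. A ratio well below 1 suggests the job mostly waits on the network, the disk, or something else, as in the ftp example; a ratio near 1 per busy core suggests it is limited by the CPU and memory.

```python
import os
import subprocess
import sys
import time


def cpu_fraction(cmd: list[str]) -> float:
    """Run cmd and return (child CPU time) / (wall-clock time).

    Near 1.0 per busy core: likely CPU/memory bound.
    Near 0.0: the job mostly waits (network, disk, timers).
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # On Unix, os.times() accumulates CPU time of children that have been waited for.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return cpu / wall


if __name__ == "__main__":
    # A job that mostly sleeps uses little CPU, so expect a value close to 0.
    print(f"{cpu_fraction([sys.executable, '-c', 'import time; time.sleep(0.5)']):.2f}")
```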

Our tests specifically target CPU and cache/memory performance and are intended to probe their limits. We compared kernel 2.4 vs. 2.6, gcc 2.96 vs. gcc 3.0.4, and 32-bit vs. 64-bit. To study these, we picked the following benchmark sets:
We ran the benchmarks both in the standard way (so that the results can be safely compared with other systems) and with custom parameter values that make the problems larger.
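For STREAM, the main such parameter is the array length. A rule of thumb given in the STREAM source is that each array should be at least four times the size of the available cache, so that the benchmark measures memory rather than cache. The sketch below (hypothetical helper `min_stream_elements`) applies that rule, assuming 8-byte double-precision elements and counting the L2 caches of both CPUs from the hardware list.

```python
def min_stream_elements(cache_bytes_per_cpu: int, n_cpus: int,
                        bytes_per_element: int = 8, factor: int = 4) -> int:
    """Smallest array length so that each STREAM array is >= factor x total cache."""
    total_cache = cache_bytes_per_cpu * n_cpus
    # Ceiling division: round up to a whole number of elements.
    return -(-(factor * total_cache) // bytes_per_element)


# Two "Irwindale" Xeons with 2048 KB of L2 cache each.
n = min_stream_elements(2048 * 1024, 2)
print(n)                   # 2097152 elements per array
print(3 * n * 8 / 2**20)   # 48.0 MiB for the three arrays a, b, c
```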


Selected Results:

Triad

See our complete STREAM bandwidth results and details.
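Triad is the STREAM kernel a[i] = b[i] + q*c[i]. STREAM counts two reads and one write per iteration, i.e. 24 bytes for 8-byte doubles, and divides by the elapsed time to report MB/s (1 MB = 10^6 bytes). The Python sketch below only illustrates the kernel and that byte accounting; the function names are hypothetical, and a pure-Python loop measures interpreter speed, not memory bandwidth, so the real benchmark must be the compiled STREAM code.

```python
import time
from array import array


def triad(a: array, b: array, c: array, q: float) -> None:
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth_mb_s(n: int, seconds: float, bytes_per_element: int = 8) -> float:
    """STREAM's accounting for Triad: 2 reads + 1 write = 3 words per iteration."""
    return 3 * bytes_per_element * n / seconds / 1e6


if __name__ == "__main__":
    n = 200_000
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    # Interpreter overhead dominates, so this is far below the real memory bandwidth.
    print(f"{triad_bandwidth_mb_s(n, elapsed):.1f} MB/s")
```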

IDEA

Assignment

Fourier

See our complete NBENCH-BYTE results, observations, and benchmark meaning.
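NBENCH-BYTE also combines individual tests such as IDEA, Assignment, and Fourier into summary indexes relative to a baseline machine. Our understanding is that each index is the geometric mean of the per-test ratios (result divided by baseline result); the sketch below shows that calculation with hypothetical values, not our measurements. The function name `bench_index` is ours.

```python
import math


def bench_index(results: dict[str, float], baseline: dict[str, float]) -> float:
    """Geometric mean of per-test ratios result / baseline.

    Both dicts map test names to scores where higher is better
    (e.g. iterations per second).
    """
    if not results:
        raise ValueError("need at least one test result")
    logs = [math.log(results[name] / baseline[name]) for name in results]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical ratios: one test twice as fast as the baseline and one
    # half as fast combine to an index of about 1.0 ("no net change").
    print(bench_index({"test_a": 2.0, "test_b": 0.5}, {"test_a": 1.0, "test_b": 1.0}))
```

A geometric mean treats a 2x gain and a 2x loss symmetrically, which is why a single summary index can hide the wide per-test spread seen between the 32-bit and 64-bit systems.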

Conclusions (as of April 2005):

  1. Depending on the benchmark, performance under the 64-bit OS ranges from twice as fast to half as fast as under the 32-bit version on the same hardware. Test a typical job to see whether 64-bit brings any improvement.

  2. The new EM64T Xeons with an 800 MHz FSB generally do not bring enough performance improvement to justify upgrading older 533 MHz FSB parallel cluster nodes.

  3. The memory bandwidth of our test system is practically OS-independent. It is 10-20% lower than that of the single-CPU P4 system and about 50% higher than that of the older 533 MHz FSB Xeon computer.
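The roughly 50% advantage over the 533 MHz FSB Xeon is close to what the bus clocks alone would predict. Assuming both front-side buses are 64 bits (8 bytes) wide and that peak bandwidth is transfers per second times bus width, the back-of-the-envelope arithmetic below (hypothetical helper `fsb_peak_gb_s`) gives the ratio of peak figures; it is an estimate, not a measurement.

```python
def fsb_peak_gb_s(mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Peak front-side-bus bandwidth in GB/s (1 GB = 1e9 bytes), assuming a 64-bit bus."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9


new = fsb_peak_gb_s(800)   # "800 FSB": 800 MT/s
old = fsb_peak_gb_s(533)   # "533 FSB": nominally 533 MT/s
print(f"{new:.2f} {old:.2f} {new / old:.2f}")  # 6.40 4.26 1.50
```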


Comments and suggestions to: Florin Manolache | florin@andrew.cmu.edu