Err, what's PAE?
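PAE (Physical Address Extension) is a "workaround" that lets a 32-bit(!) x86 OS see more than 4GB of RAM, 4GB being the limit for 32-bit memory addresses. The workaround is not needed on x86-64 processors running in 64-bit mode ("long mode"), where addresses are natively wider (long mode paging is actually built on top of the PAE page table format, so the mechanism itself lives on there).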
How does it work?
In short, it widens physical memory addresses by 4 bits (32-bit -> 36-bit), widens the page table entries to 64 bits and adds one more level to the page table hierarchy, and voila: the OS can access up to 64GB of RAM (which is not science fiction these crazy days..). Of course, a single 32-bit process is not aware of any of this, and still has only 4GB of virtual address space, even with PAE.
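To make the numbers concrete, here is a small sketch of the arithmetic. The function names are mine, made up for illustration; the 2 + 9 + 9 + 12 bit split is the PAE layout for 4KB pages (large pages split differently and are not covered here).

```python
GIB = 1 << 30


def addressable_gib(address_bits):
    """How many GiB can be addressed with this many physical address bits."""
    return (1 << address_bits) // GIB


def split_pae_virtual_address(vaddr):
    """Split a 32-bit virtual address the way PAE paging does for 4KB pages.

    Bits 31-30 pick one of 4 page-directory-pointer entries (the extra level),
    bits 29-21 pick one of 512 page directory entries,
    bits 20-12 pick one of 512 page table entries,
    bits 11-0  are the offset inside the 4KB page.
    """
    pdpt_index = (vaddr >> 30) & 0x3
    pd_index = (vaddr >> 21) & 0x1FF
    pt_index = (vaddr >> 12) & 0x1FF
    offset = vaddr & 0xFFF
    return pdpt_index, pd_index, pt_index, offset
```

Calling addressable_gib(32) and addressable_gib(36) gives the 4GB and 64GB limits mentioned above. Note that the virtual address being split is still only 32 bits wide, which is why a single process stays limited to 4GB.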
Performance penalty: yes or no, and how much?
I was given the task of researching PAE so I could recommend whether my company should use it or not, especially performance-wise.
PAE has been supported in hardware since the Intel Pentium Pro (back in the mid 90s..). Reading "hardware supported" might mislead one into thinking that it's all accelerated and there's no performance penalty. But even in hardware, PAE adds one more level to every page table walk, and my weak hardware knowledge tells me that it might still slow something down...
The Linux kernel of most 32-bit distributions (RedHat 5 in particular) ships with PAE disabled, with an optional PAE-enabled kernel available. On 32-bit Windows, PAE used to be disabled by default, but since WinXP SP2 Windows is PAE-enabled by default. I've also seen some Linux distros enabling PAE in their default kernel recently.
Googling for "PAE performance effects" was not easy, and that's the main reason I wrote this article: to spread the knowledge. The best articles I've found are listed at the bottom.
My research conclusions, then:
- The main reason for PAE being disabled by default seems to be hardware compatibility: hardware with no PAE support can't boot a PAE-enabled kernel. That's mostly history now anyway, since all recent x86 processors support PAE.
- The performance penalty is very low (according to RedHat, the average is 1% and it's no more than 10%). Of course it depends on your exact scenarios, etc etc.
- As a friend suggested to me: in most cases PAE is not needed, since x86-64 is so widespread. One can simply run 32-bit apps on a 64-bit OS. PAE lets the OS see up to 64GB, while x86-64 (current implementations) lets the OS see at least 16TB! So my main conclusion is.. that PAE is dead. All modern x86 processors (since ~year 2006) have x86-64 support.
- The best thing is compiling your code as 64-bit and running a 64-bit OS, of course.
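If you want to check what a given Linux box supports, the CPU flags in /proc/cpuinfo tell you: "pae" means PAE and "lm" (long mode) means x86-64. A minimal sketch, assuming an x86 Linux machine; the function names are mine, for illustration only:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All cores normally report the same flags; the first entry is enough.
            return set(value.split())
    return set()


def describe(cpuinfo_text):
    """Report whether the CPU advertises PAE and x86-64 (long mode)."""
    flags = cpu_flags(cpuinfo_text)
    return {"pae": "pae" in flags, "x86_64": "lm" in flags}
```

On the machine itself you would pass in the contents of /proc/cpuinfo, e.g. describe(open("/proc/cpuinfo").read()). A CPU that shows "lm" is a candidate for simply installing a 64-bit OS instead of bothering with PAE.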
Accessing beyond the 4GB on 32bit mode
According to the PAE article on Wikipedia, there are ways for a 32-bit process to access areas of RAM beyond its regular 4GB of virtual address space. It looks like Windows has a nice API for that, Address Windowing Extensions (AWE), and Linux is able to do it with the mmap() system call. I couldn't figure out exactly how, though. Any idea?
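My best guess at how the mmap() approach could work: keep the data in a big file on a RAM-backed filesystem (tmpfs, e.g. /dev/shm) and map only a small "window" of it into the process at any time, moving the window around as needed. Here is a sketch of that windowing idea. This is my assumption about the technique, not something taken from the articles, and the function names and sizes are made up; I use a small temp file so it runs anywhere. A real 32-bit program would also need 64-bit file offsets to reach beyond 4GB (in C that means building with _FILE_OFFSET_BITS=64).

```python
import mmap
import os
import tempfile


def write_window(path, offset, size, payload):
    """Map `size` bytes of the file starting at `offset` and store payload there."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size, offset=offset) as window:
            window[:len(payload)] = payload
            window.flush()


def read_window(path, offset, size, length):
    """Map the same kind of window again and read `length` bytes back."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size, offset=offset) as window:
            return bytes(window[:length])


def demo(total_size=64 * 1024 * 1024, window_size=1024 * 1024, directory=None):
    """Create a big (sparse) file, then touch it through small windows only."""
    # Window offsets must be a multiple of the mapping granularity.
    assert window_size % mmap.ALLOCATIONGRANULARITY == 0
    payload = b"beyond the window"
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "r+b") as f:
            f.truncate(total_size)  # extends the file without writing data
        last = total_size - window_size
        write_window(path, last, window_size, payload)
        # The window is unmapped by now; map it again to show the data stayed.
        tail = read_window(path, last, window_size, len(payload))
        head = read_window(path, 0, window_size, len(payload))
        return tail, head
    finally:
        os.remove(path)
```

Only window_size bytes of address space are used at any moment, no matter how big the file is. Passing directory="/dev/shm" on Linux would put the file on tmpfs, which as far as I understand is what lets the data sit in RAM above the 4GB line on a PAE kernel.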
- RedHat article by Bob Matthews & Norm Murray: page 9, last paragraph till page 10.
- Novell NetWare article which seems relevant.
Phewww. My longest post so far, I think