0 Replies. Latest reply on Feb 2, 2009 2:01 PM

    Steps involved in a page fault due to instruction fetch.

      Hi,

       

      I noticed some odd behavior when handling page faults caused by instruction fetches.  For context, I'm running Xen with a single HVM guest on top of it.  As part of my research, I'm trying to capture execution jumps between different parts of the guest OS by manipulating page table permissions.
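      For concreteness, the permission changes described here come down to two bits of a 64-bit page-table entry: the Present bit (bit 0) and the execute-disable bit (bit 63, which is only honored when EFER.NXE is set). The following is a minimal sketch under that assumption; the macro and helper names are hypothetical and are not Xen APIs.

```c
#include <stdint.h>

/* Architectural bit positions in a 64-bit (PAE / IA-32e) page-table entry. */
#define PTE_PRESENT (1ULL << 0)   /* P: translation is valid */
#define PTE_NX      (1ULL << 63)  /* XD/NX: instruction fetches are disallowed */

/* Hypothetical helper: keep the mapping present but forbid instruction fetches. */
static uint64_t pte_make_noexec(uint64_t pte)
{
    return pte | PTE_NX;
}

/* Hypothetical helper: allow execution again but mark the entry not present. */
static uint64_t pte_make_exec_not_present(uint64_t pte)
{
    return pte & ~PTE_NX & ~PTE_PRESENT;
}

/* An instruction fetch through this entry can succeed only if it is
 * present and the execute-disable bit is clear. */
static int pte_allows_fetch(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0 && (pte & PTE_NX) == 0;
}
```

      Note that this only edits the in-memory entry; any cached translation has to be invalidated separately before the change is guaranteed to take effect.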

       

      What I noticed is that, after the page fault, if I clear the top-level page directory (level 4), I don't see the instruction-fetch fault again.  To the best of my knowledge, the TLB is flushed when the guest is resumed.  I also noticed that if the page that caused the instruction-fetch fault is made executable but not present, the error code still reflects the previous I/D flag value, which I can only attribute to it being an artifact of the previous fault (error bits weren't cleared?).
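      For reference, the error code mentioned above can be decoded as in the sketch below. It assumes the architectural page-fault error code layout (bit 0 = P, bit 1 = W/R, bit 2 = U/S, bit 3 = RSVD, bit 4 = I/D); the macro and function names are illustrative only and are not taken from Xen.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits. */
#define PFEC_PRESENT    (1u << 0)  /* 0: page not present, 1: protection violation */
#define PFEC_WRITE      (1u << 1)  /* 0: read access,      1: write access */
#define PFEC_USER       (1u << 2)  /* 0: supervisor mode,  1: user mode */
#define PFEC_RSVD       (1u << 3)  /* 1: reserved bit set in a paging structure */
#define PFEC_INSN_FETCH (1u << 4)  /* I/D: 1 if caused by an instruction fetch */

/* Returns 1 if the fault was reported as an instruction fetch. */
static int pfec_is_insn_fetch(uint32_t ec)
{
    return (ec & PFEC_INSN_FETCH) != 0;
}

/* Returns 1 if the fault was a protection violation on a present page. */
static int pfec_is_protection(uint32_t ec)
{
    return (ec & PFEC_PRESENT) != 0;
}

/* Print a one-line human-readable summary of the error code. */
static void pfec_print(uint32_t ec)
{
    printf("ec=0x%02x: %s, %s, %s%s%s\n",
           (unsigned)ec,
           (ec & PFEC_PRESENT) ? "protection violation" : "not present",
           (ec & PFEC_WRITE) ? "write" : "read",
           (ec & PFEC_USER) ? "user" : "supervisor",
           (ec & PFEC_RSVD) ? ", reserved bit" : "",
           (ec & PFEC_INSN_FETCH) ? ", instruction fetch" : "");
}
```

      Logging the full error code with something like pfec_print() on every fault makes it easier to compare the P and I/D bits across consecutive faults on the same page.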

       

      I read that the exec (execute-disable) bit is checked only when a translation is loaded into the TLB.  But since the guest TLB is empty upon resume, shouldn't the instruction fetch cause a page fault again?  Is there any source that explains which checks are performed after an instruction-fetch page fault?

       

      So far I've searched:

      Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide

      Intel® 64 and IA-32 Architectures Application Note: TLBs, Paging-Structure Caches, and Their Invalidation

      and the errata:

      Intel® Core™2 Duo Processor E8000 and E7000 Series Specification Update

       

      There is a mention of updating page table entries during a page fault, which might cause unexpected behavior (erratum AW48).  Could the page faults I'm observing fall into this category?  Is there any way to work around this problem?  I'm particularly interested in how the hardware continues after the fault is handled: which checks are performed again, and are there any assumptions and/or optimizations that may lead to this problem?

       

      Thanks in advance,

       

      John