Wednesday, June 15, 2011

Paging in Windows

Windows uses a demand-paging algorithm to load pages into memory. It also uses clustering, so that some of the adjacent pages are loaded along with the faulting page (on the assumption that the process will request them soon).

Windows also includes a prefetcher that preloads pages it expects to be used in the near future. The prefetcher bases this decision on the usage history it has collected over time. It adjusts the priority of different pages to influence their loading behavior.

When the prefetcher initiates the loading of pages, the pages are loaded and added to the Standby list. When a process or the system later refers to those pages, they are transferred to the Working Set through an inexpensive soft page fault.

Page Priority

As with processes and threads, pages also have priorities assigned to them. The Standby list keeps its pages organized by priority, and the lowest-priority pages are used first for page replacement. A number of factors determine the priority assigned to a given page, including the priority of the process or thread accessing the page and the usage history of the page.
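The replacement rule above can be sketched as a set of queues, one per priority level. This is only an illustrative model: the StandbyList class, the eight priority levels and the PFN values are assumptions made for the example, not the actual Windows data structures.

```python
from collections import deque

NUM_PRIORITIES = 8  # assumption for this sketch: priorities 0 (lowest) to 7 (highest)


class StandbyList:
    """Hypothetical model: one FIFO queue of page frame numbers per priority."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def add(self, pfn, priority):
        self.queues[priority].append(pfn)

    def take_for_replacement(self):
        # Scan from the lowest priority upward and reuse the oldest page found.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing left on the standby list


standby = StandbyList()
standby.add(pfn=100, priority=5)
standby.add(pfn=200, priority=1)
standby.add(pfn=300, priority=5)
victims = [standby.take_for_replacement() for _ in range(4)]
```

Here the page at priority 1 is taken first, even though it was added after a priority 5 page; pages of equal priority are taken in the order they arrived.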

Page Lists

The memory manager keeps pages on different lists depending on their state, which helps it manage the page frame resource for the system. As the need arises, the memory manager moves pages between these lists. Following is a short description of the page lists maintained in the system:

  • Working Set: Physical pages that are assigned to a process or system address space are kept in this set.
  • Modified Page List: Pages that were written to and then removed from the Working Set stay on this list until their contents are written back to the pagefile. Once written back, the page is moved to the Standby list.
  • Standby List: Pages on this list have almost the same status as pages in the Working Set. If the need arises, such a page can be moved back to the Working Set quickly through a soft page fault. It holds valid, current content, so no page-in I/O is required to reuse it. Pages on this list were taken out of a Working Set (sometimes via the Modified Page list), typically because the process was not using them at the time.
  • Free Page List: Pages that are available for use but contain stale data, so they need to be initialized before they are handed out.
  • Zero Page List: Pages that are available for use and already initialized to zero. They are used to satisfy demand-zero page requests.
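The movement between lists described above can be sketched as a small state machine. The PageState enum and the trim, write_back and soft_fault helpers are hypothetical names chosen for this sketch; it models only the transitions mentioned in the list, not the full behavior of the memory manager.

```python
from enum import Enum


class PageState(Enum):
    WORKING_SET = "working set"
    MODIFIED = "modified"
    STANDBY = "standby"
    FREE = "free"
    ZEROED = "zeroed"


def trim(dirty):
    """State of a page right after it is removed from a working set."""
    return PageState.MODIFIED if dirty else PageState.STANDBY


def write_back(state):
    """A modified page moves to standby once its content has been saved."""
    if state is not PageState.MODIFIED:
        raise ValueError("only modified pages are written back")
    return PageState.STANDBY


def soft_fault(state):
    """A page still resident on the modified or standby list returns without disk I/O."""
    if state not in (PageState.MODIFIED, PageState.STANDBY):
        raise ValueError("page is not resident on a transition list")
    return PageState.WORKING_SET


state = trim(dirty=True)      # written-to page leaves the working set
state = write_back(state)     # content saved, page moves to standby
state = soft_fault(state)     # process touches it again
```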

Page Frame Number (PFN) Database

Although each page frame can be reached directly using the frame number and the page size, there are also certain properties attached to each frame that are maintained separately in a database. It has one record for each page frame. The layout of this record may vary depending on the state of the page, but some data members are common to all states. Following is a description of some of the more relevant data members of the PFN data structure:

  • Backward/Forward Pointer: Point to the previous and next PFN records. They are used to link the pages when they are added to a page list (e.g. the Standby, Free, Zero, or Modified Page list).
  • Page Priority
  • Reference/Shared Count
  • PTE Address/PFN of PTE: Information used to point back to the PTE that was referring to this page from the user or system address space.
  • Original PTE Content: Used for restoring the PTE value when the page is removed from the Working Set of the process or system.
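As a rough sketch of the record described above, the following models a PFN entry and the forward/backward linking used by the page lists. The field names, the PfnRecord class, the dictionary used as the database and the 4 KB page size are all assumptions made for illustration; the real structure is more compact and varies with the page state.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KB small pages


def frame_address(pfn):
    """Physical address of a frame, derived from its number and the page size."""
    return pfn * PAGE_SIZE


@dataclass
class PfnRecord:
    forward: Optional[int] = None      # next PFN on the same page list
    backward: Optional[int] = None     # previous PFN on the same page list
    priority: int = 0
    share_count: int = 0
    pte_address: Optional[int] = None  # back pointer to the PTE using this frame
    original_pte: int = 0              # PTE value to restore on removal


pfn_database = {n: PfnRecord() for n in range(4)}


def link(pfns):
    """Chain the given frames into a doubly linked list through their records."""
    for prev, nxt in zip(pfns, pfns[1:]):
        pfn_database[prev].forward = nxt
        pfn_database[nxt].backward = prev


link([2, 0, 3])  # e.g. three frames placed on the same list
```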


Monday, June 13, 2011

Prototype PTE and Shared Memory

To enable sharing of memory among different processes, an additional layer is added to the virtual-to-physical address translation. For each memory section that is being shared, a Segment structure (courtesy of the Section object) is created. It contains the complete list of Prototype PTEs pointing to the shared pages of that section.

A Prototype PTE is a special type of PTE that forms the basic construct for supporting shared memory in Windows. Prototype PTEs are the same as regular PTEs but with the Prototype bit field set, and they contain enough information to access the desired physical memory. A Prototype PTE also helps the memory manager bring a page into memory if it is not already there: for pages backed by the pagefile or a mapped file, for example, it contains information such as the page offset.

Prototype PTEs do not appear in the page tables and are not used directly for address translation. They are present only in the Segment structure. When a process opens the Section object for a shared memory region, the page table of that process is populated with PTEs that point to the Prototype PTEs in the corresponding Segment structure. When the process first refers to any of the shared memory mapped into its address space, the memory manager uses the information in the Prototype PTE to update the process PTE with the page frame number of the resident shared page.

If this is the first time any process has referenced that shared memory, the Prototype PTE is also updated to point directly to the resident page. At the same time, the corresponding PFN database entry is updated to indicate the number of processes sharing that page. The PFN database entry also contains a back pointer to the Prototype PTE, so that if the memory manager later decides to change the page state and move the page elsewhere, it can use this pointer to update the Prototype PTE accordingly.
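The fault path described above can be sketched as follows. Everything here is a simplified assumption for illustration: the PrototypePte class, the fault_on_shared_page helper, the dictionaries standing in for page tables and the PFN database, and the trivial frame allocator starting at frame 10.

```python
import itertools


class PrototypePte:
    """Hypothetical prototype PTE: lives in the segment, not in a page table."""

    def __init__(self):
        self.valid = False
        self.pfn = None


frame_allocator = itertools.count(10)  # stand-in for bringing a page into memory
share_count = {}                       # pfn -> number of process PTEs using it
back_pointer = {}                      # pfn -> prototype PTE describing it


def fault_on_shared_page(process_ptes, vpn, proto):
    """Resolve the first reference a process makes to a shared page."""
    if not proto.valid:
        # First reference by any process: make the page resident and
        # point the prototype PTE at it.
        proto.pfn = next(frame_allocator)
        proto.valid = True
        share_count[proto.pfn] = 0
        back_pointer[proto.pfn] = proto
    # Fill the process PTE with the frame number and count the new sharer.
    process_ptes[vpn] = proto.pfn
    share_count[proto.pfn] += 1
    return proto.pfn


shared = PrototypePte()
process_a, process_b = {}, {}
pfn_a = fault_on_shared_page(process_a, vpn=0x40, proto=shared)
pfn_b = fault_on_shared_page(process_b, vpn=0x90, proto=shared)
```

Both processes end up with the same frame number in their own PTEs, even though they map the page at different virtual page numbers.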



Sunday, June 12, 2011

Page States

If a given page is in the Working Set, the PTE that points to it has the Valid bit flag set to one. This means that the PTE points to a valid physical page, and it contains the page frame number of that physical page.

Otherwise, if the Valid bit flag is zero (which broadly indicates an invalid page), the page can be in one of several other special page states. The actual state can be determined by looking at the remaining PTE fields. Following is a short description of some of these other page states:

 - Page is backed up in the page file: The PTE contains the page file number and page file offset.
 - Demand Zero Page: When the page is first referenced, the memory manager should allocate a zero-initialized page and assign it to the given PTE. A demand-zero PTE looks like a page file PTE, except that the page file number and the page file offset are set to zero.
 - Transition: The page is still resident, but it is not in the Working Set. It could be on the Standby list, the Modified Page list, etc. The PTE contains the page frame number of the resident page. The Transition bit flag is set to one (with the Prototype bit flag clear) to indicate the transition state of this page. To use the page again, the memory manager has to bring it back into the Working Set.
 - Zero PTE: No page has been assigned to this PTE yet. When the address is first referenced, the memory manager should check the VADs to determine the reserve/commit state of the virtual memory and act accordingly. If the virtual memory is not committed, the memory manager raises an access violation.
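The decision order described above can be sketched as a small classifier over a 32-bit PTE value. The bit positions below are an assumed, simplified layout chosen for the example (Valid in bit 0, Transition in bit 11, page file number in bits 1 to 4, page file offset from bit 12 up), and classify_pte is a hypothetical helper, not a Windows API.

```python
VALID = 1 << 0        # assumed position of the Valid bit
TRANSITION = 1 << 11  # assumed position of the Transition bit
PAGEFILE_NUMBER_SHIFT = 1
PAGEFILE_NUMBER_MASK = 0xF
PAGEFILE_OFFSET_SHIFT = 12


def classify_pte(pte):
    """Return the page state implied by a PTE value under the assumed layout."""
    if pte == 0:
        return "zero"          # consult the VADs
    if pte & VALID:
        return "valid"         # page is in the working set
    if pte & TRANSITION:
        return "transition"    # resident on the standby or modified list
    number = (pte >> PAGEFILE_NUMBER_SHIFT) & PAGEFILE_NUMBER_MASK
    offset = pte >> PAGEFILE_OFFSET_SHIFT
    if number == 0 and offset == 0:
        return "demand zero"   # page file fields are all zero
    return "page file"         # content must be read from the page file


PROTECTION = 0x80  # arbitrary non-zero protection bits for the examples
states = [
    classify_pte(0),
    classify_pte((0x1234 << 12) | VALID),
    classify_pte((0x1234 << 12) | TRANSITION),
    classify_pte(PROTECTION),
    classify_pte((5 << 12) | (1 << 1) | PROTECTION),
]
```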

Thursday, June 2, 2011

Large and Small Pages

The system makes use of both large and small pages, each for its own merits. Although current hardware can support page sizes as large as 1 GB, the system chooses the sizes of its large and small pages based on performance results. For example, on x86 Windows systems, small pages are 4 KB and large pages are 4 MB (2 MB on PAE systems).

Large pages give better performance because they make efficient use of the TLB. When a byte in a large page is referenced, the translation for the whole page is cached in the TLB. That cached entry then speeds up access to every other byte in the same page.
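The TLB benefit comes down to simple arithmetic: one cached translation covers one page, so fewer entries are needed to cover the same region when pages are larger. A minimal sketch, using the x86 page sizes mentioned above and an arbitrary 16 MB region chosen for the example:

```python
KB = 1024
MB = 1024 * KB


def tlb_entries_needed(region_bytes, page_bytes):
    """Number of translations needed to cover a region (ceiling division)."""
    return -(-region_bytes // page_bytes)


region = 16 * MB
small_page_entries = tlb_entries_needed(region, 4 * KB)  # 4096 entries
large_page_entries = tlb_entries_needed(region, 4 * MB)  # 4 entries
```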

On the downside, memory protection is enforced at page granularity. With large pages, read-only code and read/write data often end up mapped to the same page, which forces the protection of that page to be relaxed to read/write. Any faulty or malicious program can then write to the read-only code mapped in that page and go undetected.

Windows configures its page usage so that it can take advantage of both large and small pages. It maps core operating system images and data to large pages, and user programs and data to small pages. For debugging purposes, a developer can override this behavior and run Driver Verifier to disable large pages.