Memory addressing principle

Memory addressing comes up often when analyzing network security incidents. Today I am going to talk about how memory addressing works. The article is divided into two parts: “memory addressing principle” and “memory addressing mode.”

As information technology and data processing develop, new demands are placed on the performance and capacity of computer hardware, and a computer’s processing capacity needs to be able to grow as actual needs change.

Today an ordinary computer has a hard drive of more than 200 GB and 4 GB of memory. Even so, that capacity can still seem inadequate when handling large amounts of data or running large games, so it has to be expanded, for example by upgrading memory from 4 GB to 8 GB or 16 GB. In my own experience, things really do improve a lot after the upgrade. But a larger memory needs matching hardware support: the hardware must be able to address that many memory locations. For example, an 8-bit microcontroller could not make use of 16 GB of memory even if it were installed.

Where there is demand, there is a market. From the 8-bit 51-series microcontroller, to the 8086 with its 20-bit addressing, to 32-bit Windows 2003 and 64-bit Windows 10, each generation of computers has been driven by the growing demand for information processing.

When it comes to memory expansion, many people are not very clear about how an application ends up using physical memory addresses. Take the 32-bit and 64-bit systems in common use today: how does the system convert a virtual address into a linear address, how is the linear address converted into a physical address, and which registers and data structures are involved? I believe many people only partly understand this. Like me, they would like to work through concrete examples of the conversion to grasp what is really going on. So let us go through the whole conversion process from logical address to physical address together.

Article Content

1. Introduction to real mode and protected mode
2. Protected mode addressing basics
2.1 memory address concept
2.2 virtual address, linear address, physical address relationship
2.3 segment mechanism and case analysis
- 2.3.1 Segment Descriptor Basics
- 2.3.2 Segment descriptor table instance parsing
- 2.3.3 Segment selector structure
- 2.3.4 Logical Address to Linear Address Translation Example Resolution
2.4 page mechanism and case analysis
- 2.4.1 PDE structure and how to find the memory page directory
- 2.4.2 page table structure analysis

1. Introduction to real mode and protected mode

The CPU has three common modes of operation: real mode, protected mode, and virtual 8086 mode.

Real mode: When the CPU is reset or powered on, it starts in real mode. In real mode, memory is addressed the same way as on the 8086: the contents of a 16-bit segment register are multiplied by 16 (10H) to give the segment base address, and the 16-bit offset is added to form a 20-bit physical address, so the maximum addressable space is 1 MB. In real mode, all segments are readable, writable, and executable. Real mode has no segment protection and no paging, so the address formed from segment and offset is used directly as the physical address.

From this we learn:

In real mode the maximum addressable space is 1 MB; memory above 1 MB cannot be used in real mode.
In real mode all data in memory can be accessed. There is no distinction between user mode and kernel mode.
The BIOS, the MBR code, and the NTLDR startup phase all run in real mode.
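
The real-mode calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name real_mode_physical and the sample segment:offset values are my own, and the wrap to 20 bits assumes an original 8086 with 20 address lines.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a real-mode segment:offset pair."""
    # Both inputs are 16-bit register values.
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Segment * 16 (a left shift by 4) gives the segment base; add the offset.
    # Masking to 20 bits models the 8086's 20 address lines (assumption).
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical sample values, not taken from the article.
print(hex(real_mode_physical(0x1234, 0x0010)))  # 0x12350
```

Because the result is only 20 bits wide, no segment:offset pair can reach memory above 1 MB on such a processor.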

Protected mode: We are all familiar with protected mode, since it is the mode in which current operating systems run. It uses a memory management mechanism to convert linear addresses to physical addresses and provides a complete task protection mechanism.

Protected mode basics:

Today’s applications run in protected mode.
Horizontal protection, also known as protection between tasks: in a multitasking operating system, one task cannot corrupt the code of another task. This is achieved through memory paging, with the memory pages of different tasks mapped to different physical memory.
Vertical protection, also known as protection within a task: system code and application code share the same address space, but system code runs at a high privilege level and application code at a low privilege level. Only higher-privilege code may access lower-privilege code, which prevents user code from corrupting system code.

Virtual 8086 mode: V86 mode for short, this is a real-mode environment running inside protected mode, which lets pure 16-bit programs execute under 32-bit protected mode. An 8086 program can run as a protected-mode task, so virtual 8086 mode allows 8086 programs to be executed without leaving protected mode.

Virtual 8086 basics:

The addressable address space is 1 MB.
16-bit DOS programs can run in virtual 8086 mode.
In V86 mode, the code segment is always writable, the same as in real mode, and the data segment is also executable.
Writing a V86-mode program on a 32-bit system:

2. Protected mode addressing basics

Next, taking a 32-bit system as an example, I describe the registers and data structures involved in address translation in protected mode.

2.1 memory address concept

Logical address: In C programming you can read the address of a variable (the & operator). That value is actually a logical address; it can also be an address returned by a call to malloc or new. The address is relative to the data segment of the current process and has no direct relation to the absolute physical address. Only in Intel real mode does the logical address correspond directly to the physical address (because real mode has no segment protection or paging mechanism, so the CPU performs no automatic address translation). Application programmers only need to deal with logical addresses; segmentation and paging are completely transparent to them and are handled only by system programmers. Although application programmers can operate on memory directly, they can only do so within the memory segments the operating system has allocated to them. A logical address consists of a segment identifier plus an offset within the specified segment, written as [segment identifier: intra-segment offset].

Linear address: An intermediate layer in the translation from logical address to physical address. Program code produces a logical address, that is, an offset within a segment; adding the base address of the corresponding segment yields a linear address. If paging is enabled, the linear address is then translated to produce a physical address. If paging is not enabled, the linear address is the physical address. The linear address space of the Intel 80386 is 4 GB (2 to the 32nd power bytes, addressed over a 32-bit address bus).

Physical address: The address that appears on the CPU’s external address bus to select a location in physical memory. It is the final result of address translation. If paging is enabled, the linear address is converted to a physical address using entries in the page directory and page tables. If paging is not enabled, the linear address is used directly as the physical address, as in real mode.

2.2 virtual address, linear address, physical address relationship

Address translation in protected mode is transparent to the programmer. So how does the memory management mechanism convert a virtual address to a physical address? When an instruction in a program accesses a logical address, the CPU first converts it to a linear address based on the contents of the segment register, and then converts the linear address to a physical address through the page tables. If the CPU finds that the memory page containing the linear address is not in physical memory, a page fault exception occurs, and it is handled by the operating system’s memory manager routine. The memory manager learns about the fault from the status information of the exception; in particular, the CR2 register contains the linear address that caused the fault, and the memory manager loads the required memory page into physical memory. The exception handler then returns, the processor re-executes the instruction that caused the page fault, and since the required page is now in physical memory, no further page fault occurs.

2.3 segment mechanism and case analysis

As mentioned above, before a linear address can be converted to a physical address, the logical address must first be converted to a linear address. The system uses the segment management mechanism for this conversion. In protected mode, the linear address is obtained from “segment selector + segment offset”.

The CPU’s segment mechanism provides a way to divide the system’s memory space into smaller protected areas, each of which is a segment. On a 32-bit system, this means the 4 GB logical address space is divided into different segments. Each segment has its own start address (base address), limit, and access rights. The key data structure for implementing the segment mechanism is the segment descriptor.

The following shows the segment register values of a program:

The figure shows the values of the segment registers, such as the code segment CS, the stack segment SS, and the data segment DS. From the values we can see that SS = DS = ES; why some segments have the same value will be explained later. Take the address 0x83e84110 given in the example: where is the segment descriptor, where is the intra-segment offset, and how is the logical address converted to a linear address? I believe many people cannot wait to see the whole process, so next let us look in detail at the conversion from logical address to linear address.

As mentioned above, under segment management an address consists of a segment selector and an offset within the segment. The actual conversion process is shown in the following figure

As the figure shows, to convert a logical address to a linear address, the segment selector is first used to find the segment descriptor in the descriptor table, and then the base address from the segment descriptor is added to the offset to get the linear address. In other words, obtaining the segment descriptor takes three steps:

Get the segment selector.
Get the segment descriptor table.
Use the index in the segment selector to locate the segment descriptor in the segment descriptor table.

So far we have only talked about segment descriptor + offset address, and have not explained the segment selector or the segment descriptor table. So we need to sort out the concepts of segment selector, segment descriptor table, and segment descriptor, and how to obtain each of them.

2.3.1 Segment Descriptor Basics

As the figure above shows, the segment selector is used to find the segment descriptor in the segment descriptor table. So what is a segment descriptor, and how do we find the segment descriptor table?

In protected mode, each memory segment is described by a segment descriptor. Its structure is shown below:

As you can see, a segment descriptor is an 8-byte data structure that describes the location, size, access control, and status of a segment. The most basic fields of the segment descriptor are the segment base and the segment limit. The segment base is 4 bytes long (bytes 3, 4, 5, and 8 in the figure); 4 bytes are exactly enough to express any address in the 4 GB linear address space (0x00000000-0xffffffff). The segment limit is 20 bits long (bytes 1 and 2 plus the lower 4 bits of byte 7).
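
To make the byte layout concrete, here is a small Python sketch that pulls the base and limit out of an 8-byte descriptor. The function name parse_descriptor and the sample bytes are my own illustration (a typical flat-model code descriptor, not the one from the figure). Python indexes bytes from 0, so byte 1 in the text is raw[0] here.

```python
def parse_descriptor(raw: bytes) -> dict:
    """Extract the segment base and limit from an 8-byte segment descriptor."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is exactly 8 bytes")
    # Limit: bytes 1 and 2, plus the lower 4 bits of byte 7 (20 bits in total).
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    # Base: bytes 3, 4 and 5 (low 24 bits) plus byte 8 (high 8 bits).
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    return {"base": base, "limit": limit}


# Hypothetical sample: a flat code segment with base 0 and limit 0xFFFFF.
sample = bytes.fromhex("ffff0000009bcf00")
fields = parse_descriptor(sample)
print(hex(fields["base"]), hex(fields["limit"]))  # 0x0 0xfffff
```

The sketch ignores the access-rights and flag bits in bytes 6 and 7; it only shows how the base and limit are scattered across the descriptor.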

2.3.2 Segment descriptor table instance parsing

In today’s multitasking systems there are usually many tasks running at the same time, each task has multiple segments, and each segment needs a segment descriptor (described in the previous section), so the system contains many segment descriptors. For ease of management, the descriptors are stored in a segment descriptor table, the one drawn in the figure above. IA-32 processors have three kinds of descriptor table: GDT, LDT, and IDT.

GDT is the global descriptor table. A system usually has only one GDT. The GDT is also the descriptor table shown in the figure above, and it is used by all programs and tasks in the system. The LDT and IDT are not the focus today.

So how do we find where the GDT is stored? The system uses the GDTR register to hold the location and limit of the GDT, that is, the system finds the GDT through the GDTR register. In 32-bit mode GDTR is 48 bits long: the upper 32 bits are the base address and the lower 16 bits are the limit. In IA-32e mode it is 80 bits long: the upper 64 bits are the base address and the lower 16 bits are the limit.

The first entry (entry 0) in the GDT is reserved as the null descriptor. How do we find where the GDT is located? By looking at the GDTR register, as shown below

From the figure above, the GDT is located at address 0x8095000, and the gdtl value shows that the GDT limit is 1023, for a total length of 1024 bytes. Each segment descriptor is 8 bytes, so there are 128 entries in total. The first entry in the figure is the null descriptor.
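
The arithmetic above can be checked with a short Python sketch. It assumes the 48-bit GDTR image is laid out in memory as a 16-bit limit followed by a 32-bit base, little-endian; the helper names decode_gdtr, entry_count, and descriptor_address are my own. The base and limit values are the ones quoted in the text.

```python
import struct


def decode_gdtr(raw: bytes) -> tuple:
    """Split a 6-byte (48-bit) GDTR image into (base, limit)."""
    if len(raw) != 6:
        raise ValueError("a 32-bit GDTR image is 6 bytes")
    # Lower 16 bits: limit. Upper 32 bits: base address.
    limit, base = struct.unpack("<HI", raw)
    return base, limit


def entry_count(limit: int) -> int:
    """Number of 8-byte descriptors; the limit is the last valid byte offset."""
    return (limit + 1) // 8


def descriptor_address(base: int, index: int) -> int:
    """Linear address of descriptor number `index` in the table."""
    return base + index * 8


# Values quoted in the text: base 0x8095000, limit 1023.
image = struct.pack("<HI", 1023, 0x8095000)
base, limit = decode_gdtr(image)
print(hex(base), limit, entry_count(limit))  # 0x8095000 1023 128
print(hex(descriptor_address(base, 1)))      # 0x8095008
```

Entry 0 at the base address is the null descriptor, so the first usable descriptor sits 8 bytes further on.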

2.3.3 Segment selector structure

Earlier we introduced the format of the segment descriptor table and the segment descriptor. So what is a segment selector, and how does it find the segment descriptor?

The segment selector is used to locate the required segment descriptor in the segment descriptor table. The segment selector format is as follows:

The segment selector occupies 16 bits, that is, two bytes. The upper 13 bits are the index of the segment descriptor in the segment descriptor table. The lower 3 bits hold other attributes (the table indicator and the requested privilege level), which I will not go into here. A 13-bit index means that up to 8K = 8192 descriptors can be indexed, although as we saw in the previous section, the GDT here has only 128 entries.

In protected mode, all segment registers (CS, SS, DS, ES, FS, GS) hold segment selectors.
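
As a rough illustration, the 16-bit selector can be split with a few shifts and masks. The function name decode_selector is my own. The field names ti (table indicator, bit 2) and rpl (requested privilege level, bits 0-1) describe the lower 3 bits that this article does not cover in detail. The value 0x0008 is the CS value used in the next section; 0x001B is a made-up second sample.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,       # upper 13 bits: descriptor index
        "ti": (selector >> 2) & 0x1,  # bit 2: 0 = GDT, 1 = LDT
        "rpl": selector & 0x3,        # bits 0-1: requested privilege level
    }


print(decode_selector(0x0008))  # {'index': 1, 'ti': 0, 'rpl': 0}
print(decode_selector(0x001B))  # {'index': 3, 'ti': 0, 'rpl': 3}
```

Multiplying the index by 8 gives the byte offset of the descriptor inside the table, which is why the selector 0x0008 points at the second GDT entry.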

2.3.4 Logical Address to Linear Address Translation Example Resolution

Now that we have learned how a logical address is converted to a linear address, let us see what linear address corresponds to the logical address 0x83e84110 from earlier.

First, the address 0x83e84110 is a logical address in the code segment. The offset within the segment is already known: it is the value of the EIP register, 0x83e34110. The segment selector is in the CS register, CS = 0008, and its upper 13 bits give GDT index 1, which is the second descriptor (the first is the null descriptor). The second entry of the GDT is the 8 bytes marked in red

The segment base is obtained from bytes 3, 4, 5, and 8 of the segment descriptor.

As shown in the figure above, bytes 3, 4, 5, and 8 of the second descriptor give the value 0x00000000. We now have the segment base and the segment offset. The final linear address is segment base + segment offset = 0x0 + 0x83e34110 = 0x83e34110.

Thus we see that on a 32-bit system the offset in the logical address is the linear address.

In fact, looking at the other segment selectors shows that the segments selected by DS, ES, and SS also have a base address of 0x0. This is because 32-bit protected-mode systems use a flat memory model, in which these segments share the same base address and limit. Since the base address is 0, the linear address equals the segment offset, that is, the logical address.

In short:

A segment descriptor is 8 bytes.
GDTR is 48 bits.
A segment selector is 2 bytes.

2.4 page mechanism and case analysis

The previous sections covered the conversion from logical address to linear address. The next step is to look at how a linear address is converted to a physical address, which requires understanding some of the relevant data structures.

As mentioned earlier, if the CPU finds that the memory page containing the linear address is not in physical memory, a page fault exception occurs, and it is handled by the operating system’s memory manager routine. The memory manager learns about the fault from the status information of the exception; in particular, the CR2 register contains the linear address that caused the fault, and the memory manager loads the required memory page into physical memory. The exception handler then returns, the processor re-executes the instruction that caused the page fault, and since the required page is now in physical memory, no further fault occurs.

2.4.1 PDE structure and how to find the memory page directory

From the above figure we know that the page directory can be found through the CR3 register. So what is CR3? On a 32-bit system, CR3 stores the start address of the page directory, so the CR3 register is also called the page directory base register. On a 32-bit system, each application maps its 4 GB of linear addresses to physical addresses differently, and each application has a different CR3 value. That is, each application has a different page directory base address.

The page directory is used to store page directory entries (PDEs). The page directory occupies one 4 KB memory page and each PDE is 4 bytes long, so the page directory contains at most 1024 PDEs. When PAE is not enabled there are two kinds of PDE; here we only discuss the common kind, which points to a page table of 4 KB pages.

The upper 20 bits of a page directory entry are the upper 20 bits of the starting physical address of the page table that the PDE points to; the lower 12 bits of that starting address are 0. In other words, the page table is found through the upper 20 bits of the PDE. Because the lower 12 bits of the page table address are 0, the page table must be aligned on a 4 KB boundary. That is, the entries in the page directory are used to locate which page table to use (each application has many page tables).
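
Here is a minimal Python sketch of these two steps. It assumes standard 32-bit paging without PAE, where the upper 10 bits of the linear address select one of the 1024 PDEs. The function names pde_address and page_table_base are my own, and the PDE value 0x12345067 is made up for illustration. The CR3 value is the one this article reports for calc.exe.

```python
FRAME_MASK = 0xFFFFF000  # upper 20 bits of a 32-bit value


def pde_address(cr3: int, linear: int) -> int:
    """Physical address of the PDE that covers a given linear address."""
    directory_base = cr3 & FRAME_MASK        # page directory is 4 KB aligned
    directory_index = (linear >> 22) & 0x3FF  # upper 10 bits: 0..1023
    return directory_base + directory_index * 4  # each PDE is 4 bytes


def page_table_base(pde: int) -> int:
    """Physical start address of the page table a PDE points to."""
    # Upper 20 bits come from the PDE; the lower 12 bits are always 0.
    return pde & FRAME_MASK


print(hex(pde_address(0x2960A000, 0x83E34110)))  # 0x2960a83c
print(hex(page_table_base(0x12345067)))          # 0x12345000 (hypothetical PDE)
```

The lower 12 bits of the PDE are free to carry attribute flags precisely because the page table address never needs them.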

Take a running calc program as an example. Its CR3 register value is the DirBase value, as shown below

The CR3 register value for calc.exe is 0x2960a000. The corresponding page directory table (PDT) structure is shown below

2.4.2 page table structure analysis

The page table is used to hold page table entries (PTEs). Each page table occupies one 4 KB memory page and each PTE occupies 4 bytes, so each page table holds at most 1024 PTEs. The upper 20 bits of a PTE are the upper 20 bits of the starting physical address of the final page to be used, so 4 KB memory pages are also aligned on 4 KB boundaries.
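
Putting the page directory and page table together, the full walk from linear address to physical address can be modelled in Python. This is a sketch under stated assumptions: 32-bit paging without PAE, 4 KB pages only, a linear address split into 10 + 10 + 12 bits, and bit 0 of each entry used as the present flag. Physical memory is simulated with a dictionary, and the PDE and PTE values in it are invented; only the CR3 value and the linear address come from this article. The names split_linear, translate, and PageFault are my own.

```python
FRAME_MASK = 0xFFFFF000  # upper 20 bits of an entry or address


class PageFault(Exception):
    """Models the page fault raised when an entry is not present."""


def split_linear(linear: int) -> tuple:
    """Split a linear address into (directory index, table index, offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(cr3: int, linear: int, phys_mem: dict) -> int:
    """Walk page directory and page table; return the physical address."""
    dir_index, table_index, offset = split_linear(linear)
    # Step 1: read the PDE from the page directory that CR3 points to.
    pde = phys_mem.get((cr3 & FRAME_MASK) + dir_index * 4, 0)
    if not (pde & 1):
        raise PageFault(hex(linear))
    # Step 2: read the PTE from the page table that the PDE points to.
    pte = phys_mem.get((pde & FRAME_MASK) + table_index * 4, 0)
    if not (pte & 1):
        raise PageFault(hex(linear))
    # Step 3: upper 20 bits from the PTE, lower 12 bits from the address.
    return (pte & FRAME_MASK) | offset


# Simulated physical memory: address -> 32-bit entry (hypothetical values).
memory = {
    0x2960A83C: 0x12345067,  # PDE 527 -> page table at 0x12345000
    0x123458D0: 0x0ABCD067,  # PTE 564 -> page frame at 0x0ABCD000
}
print(hex(translate(0x2960A000, 0x83E34110, memory)))  # 0xabcd110
```

Any address whose PDE or PTE is missing from the dictionary raises PageFault, which stands in for the page fault exception that the memory manager would handle.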