How to find the MCFG table after locating the RSDP?

I have located the RSDP by searching for its signature string.
I know the table contains a pointer that directs me to the XSDT.
However, when I compared the "Length" field against the entries in the XSDT, only the FACP, APIC, and SSDT tables were present.
Does this mean that the system does not contain an MCFG table, and therefore that I cannot use memory-mapped access to reach the PCIe configuration space?
Thanks a lot!
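(For reference, enumerating the XSDT entries looks roughly like this -- a minimal sketch, assuming the ACPI tables are identity-mapped and using a hypothetical acpi_sdt_header struct that mirrors the standard 36-byte SDT header:)

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout mirroring the common ACPI SDT header. */
struct acpi_sdt_header {
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
} __attribute__((packed));

/* The XSDT header is followed by 64-bit physical pointers to the other
 * tables; note these entries are only 4-byte aligned. */
static struct acpi_sdt_header *find_table(struct acpi_sdt_header *xsdt,
                                          const char sig[4])
{
    uint64_t *entries = (uint64_t *)((uint8_t *)xsdt + sizeof(*xsdt));
    size_t count = (xsdt->length - sizeof(*xsdt)) / sizeof(uint64_t);

    for (size_t i = 0; i < count; i++) {
        struct acpi_sdt_header *t =
            (struct acpi_sdt_header *)(uintptr_t)entries[i];
        if (memcmp(t->signature, sig, 4) == 0)
            return t;   /* e.g. find_table(xsdt, "MCFG") */
    }
    return NULL;        /* table not present */
}

If no entry's signature matches "MCFG", the firmware simply did not publish the table.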

MCFG is not part of the ACPI specification. The table is described in the PCI Firmware Specification document.
Section 2.7.2 states:
The ACPI MCFG table describes the location of the PCI Express configuration space, and this table will be present in a firmware implementation compliant to this specification version 3.0 (or later).
So it means your firmware is not compliant with the PCI Firmware Specification v3.0 or later.
Reading further, section 4.1 about ECAM says:
On PC-compatible systems, the enhanced configuration access mechanism allows PCI configuration space to be accessed using memory primitives rather than I/O-based primitives (CF8/CFC mechanism).
So it means that in your case, if you are talking about a PC-compatible system, only type 1 (CF8/CFC) access to the PCI configuration space is available (you cannot reach the space beyond 256 bytes).
Of course, it may just be a bug in the firmware, which for some reason forgot to describe the table. On x86 you may try to access the traditional ECAM window (starting at 0xE0000000) and check whether it works (be sure the memory region is marked as reserved by the OS).
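As an illustration, here is a minimal sketch of type 1 access in C, assuming ring-0 code (outl/inl are hand-rolled port-I/O helpers here; your kernel may already provide equivalents):

#include <stdint.h>

/* Port-I/O helpers wrapping the OUT/IN instructions. */
static inline void outl(uint16_t port, uint32_t val)
{
    __asm__ volatile ("outl %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint32_t inl(uint16_t port)
{
    uint32_t val;
    __asm__ volatile ("inl %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Type 1 configuration read: only the first 256 bytes of each
 * function's configuration space are reachable this way. */
static uint32_t pci_config_read32(uint8_t bus, uint8_t dev,
                                  uint8_t func, uint8_t offset)
{
    uint32_t address = (1u << 31)               /* enable bit */
                     | ((uint32_t)bus  << 16)
                     | ((uint32_t)dev  << 11)   /* device: 5 bits */
                     | ((uint32_t)func << 8)    /* function: 3 bits */
                     | (offset & 0xFC);         /* dword-aligned register */
    outl(PCI_CONFIG_ADDRESS, address);
    return inl(PCI_CONFIG_DATA);
}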

Related

Reading and writing memory, but having trouble writing to a virtual address

I am trying to write a program where I scan a process's memory and can also write to those addresses (just like Cheat Engine). However, I did some research and found out that the memory I was reading is virtual memory: I can read it, but I can't write to it, and to translate it I need page tables. So my question is: where can I find these page tables, and is there any other way to write using the virtual address I get?
Virtual memory is an elaborate illusion. What you think is read/write RAM may actually be data in swap space, or "read only, copy on write", or something else.
To maintain the illusion, and for security, and for compatibility (e.g. a 32-bit program running on a 64-bit CPU with a 64-bit kernel), user-space is not given access to page tables.
An OS or kernel might provide an abstract interface to some of the information (with suitable restrictions and limitations for security). One example of this would be the VirtualQuery() and VirtualQueryEx() functions in Windows (see https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualqueryex ).
In a similar way, an OS or kernel might provide an abstract interface to alter a page's permissions (with suitable restrictions and limitations for security). One example of this would be the VirtualProtect() function in Windows (see https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualprotect ).
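A minimal sketch of how these fit together for the cross-process case, using VirtualQueryEx and VirtualProtectEx (the cross-process variant of VirtualProtect); hProcess and addr are hypothetical and assume the target process was opened with suitable access rights:

#include <stdio.h>
#include <windows.h>

/* hProcess is assumed to have been opened with
 * PROCESS_QUERY_INFORMATION | PROCESS_VM_OPERATION; addr is the
 * virtual address you want to inspect in that process. */
void inspect_and_unprotect(HANDLE hProcess, LPVOID addr)
{
    MEMORY_BASIC_INFORMATION mbi;

    /* Ask the kernel what it is willing to reveal about this region. */
    if (VirtualQueryEx(hProcess, addr, &mbi, sizeof(mbi)) != sizeof(mbi))
        return;

    printf("base=%p size=%zu protect=0x%lx\n",
           mbi.BaseAddress, (size_t)mbi.RegionSize,
           (unsigned long)mbi.Protect);

    /* Make the region writable (subject to the kernel's restrictions);
     * after this, WriteProcessMemory may succeed on it. */
    DWORD oldProtect;
    VirtualProtectEx(hProcess, mbi.BaseAddress, mbi.RegionSize,
                     PAGE_READWRITE, &oldProtect);
}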
... and is there any other way to write using the virtual address I get?
If your CPU is an 80x86 CPU that supports Intel's transactional extensions (TSX), you can misuse "transactions" to suppress page faults (make them cause a "transaction abort" instead of triggering a page fault).
This won't allow you to write to a read-only or "not present" page, but it will allow you to attempt the write without being detected by the OS.
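A rough sketch of the idea using the RTM intrinsics; note that transactions can also abort for unrelated reasons (interrupts, cache capacity), so this is a heuristic at best:

#include <immintrin.h>
#include <stdint.h>

/* Probe whether a write to `p` would fault, without the OS seeing a
 * page fault (requires a CPU with RTM; compile with -mrtm). A fault
 * inside the transaction aborts it instead of trapping into the kernel. */
static int write_would_fault(volatile uint8_t *p)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        *p = *p;        /* attempt the write inside the transaction */
        _xabort(0);     /* roll everything back; nothing is committed */
    }
    /* If our explicit _xabort executed, the write itself succeeded;
     * any other abort reason (fault, interrupt, capacity) lands here
     * without that flag, so treat it as "would fault" heuristically. */
    return (status & _XABORT_EXPLICIT) == 0;
}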

Is UEFI required to map 4k pages on x64?

I am creating a kernel for x64 which boots via UEFI. While the kernel has to be loaded at a low-ish address (I believe, because UEFI requires identity-mapped pages, so it cannot be mapped higher than the highest physical address), I want to relocate it up to the end of memory. During this process I intend to create new paging structures, and in order to reduce memory consumption I wanted to reuse the page tables used to map the image in the lower half. However, these page tables will only exist if 4k paging is used by UEFI, so my question is whether or not UEFI is required to use 4k paging on x64. I believe the answer is no, but I hope otherwise and wanted to see if this is true.
Now, I understand that UEFI allocates memory via BootServices->AllocatePages in 4k chunks it refers to as pages, but is this required to translate to the actual mapping structures used? I noticed that section 2.3.6 of the UEFI 2.8 specification, which covers the AArch64 calling conventions, states:
MMU configuration: Implementations must use only 4k pages [...]
There is no similar stipulation in section 2.3.4, on the x64 calling conventions, which is why I believe the answer is no.
EDIT:
Based upon what I've already seen and the comment by Peter Cordes, I believe the standard does not specify exactly what it should be. Thus, a revised version of the question is: does the standard specify 4k translation granularity? If not, do most UEFI vendors on x64 use 4k pages?
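For what it's worth, since boot-services paging is identity-mapped, you can answer this at runtime by walking the structures CR3 points at and testing the PS bit -- a minimal sketch, assuming 4-level paging and that the caller reads CR3 itself (e.g. with inline assembly):

#include <stdint.h>

/* PS set in a PDPTE selects a 1 GiB page; PS set in a PDE selects a
 * 2 MiB page. Identity mapping lets us dereference table pointers
 * directly. */
#define PTE_PRESENT (1ull << 0)
#define PTE_PS      (1ull << 7)
#define PTE_ADDR    0x000FFFFFFFFFF000ull

static int uses_4k_page(uint64_t cr3, uint64_t vaddr)
{
    uint64_t *pml4 = (uint64_t *)(cr3 & PTE_ADDR);
    uint64_t pml4e = pml4[(vaddr >> 39) & 0x1FF];
    if (!(pml4e & PTE_PRESENT)) return 0;

    uint64_t *pdpt = (uint64_t *)(pml4e & PTE_ADDR);
    uint64_t pdpte = pdpt[(vaddr >> 30) & 0x1FF];
    if (!(pdpte & PTE_PRESENT) || (pdpte & PTE_PS)) return 0; /* 1 GiB */

    uint64_t *pd = (uint64_t *)(pdpte & PTE_ADDR);
    uint64_t pde = pd[(vaddr >> 21) & 0x1FF];
    if (!(pde & PTE_PRESENT) || (pde & PTE_PS)) return 0;     /* 2 MiB */

    return 1;   /* the mapping goes through a 4 KiB PTE */
}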

Atomicity of small PCIE TLP writes

Are there any guarantees about how card-to-host writes from a PCIe device targeting regular memory are observed from a software process's perspective, where a single TLP write is fully contained within a single CPU cache line?
I'm wondering about a case where my device may write some number of words of data, followed by a byte to indicate that the structure is now valid (for example, an event completion):
struct alignas(SYSTEM_CACHE_LINE_SIZE) PCIE_COMPLETION_T {
    uint64_t data_a;
    uint64_t data_b;
    uint64_t data_c;
    uint64_t data_d;
    uint8_t  valid;
};
Can I use a single TLP to write this structure, such that when software sees the valid member change to 1 (having previously been cleared to zero by software), the other data members will also reflect the values that I wrote, and not previous values?
Currently I'm performing two writes: first writing the data, then marking it as valid. This doesn't have any apparent race conditions, but it does of course add unwanted overhead.
The most relevant question I can see on this site seems to be Are writes on the PCIe bus atomic? although this appears to relate to the relative ordering of TLPs.
Perusing the PCIe 3.0 specification, I didn't find anything that seemed to explicitly cover my concern; I don't think I need AtomicOps in particular. Given that I'm only concerned about interactions with x86-64 systems, I also dug through the Intel architecture guides, but came up no clearer.
Instinctively it seems that such a write should be possible to perceive atomically -- especially as it is said to be a transaction -- but equally I can't find much in the way of documentation explicitly confirming that view (nor am I quite sure where I'd need to look; probably the CPU vendor's documents?). I also wonder whether such a scheme can be extended over multiple cache lines -- i.e., if the valid byte sits on a second cache line written by the same TLP transaction, can I be assured that the first cache line will be perceived no later than the second?
The write may be broken into smaller units, as small as dwords, but if it is, they must be observed in increasing address order.
PCIe revision 4, section 2.4.3:
If a single write transaction containing multiple DWs and the Relaxed Ordering bit Clear is accepted by a Completer, the observed ordering of the updates to locations within the Completer's data buffer must be in increasing address order. This semantic is required in case a PCI or PCI-X Bridge along the path combines multiple write transactions into the single one. However, the observed granularity of the updates to the Completer's data buffer is outside the scope of this specification.
While not required by this specification, it is strongly recommended that host platforms guarantee that when a PCI Express write updates host memory, the update granularity observed by a host CPU will not be smaller than a DW.
As an example of update ordering and granularity, if a Requester writes a QW to host memory, in some cases a host CPU reading that QW from host memory could observe the first DW updated and the second DW containing the old value.
I don't have a copy of revision 3, but I suspect this language is in that revision as well. To help you find it, Section 2.4 is "Transaction Ordering" and section 2.4.3 is "Update Ordering and Granularity Provided by a Write Transaction".
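Practically, this ordering guarantee is what makes a "valid byte at the highest address" scheme work on the consumer side. A minimal consumer sketch in C11, assuming the struct from the question and that the device writes the whole structure in one non-Relaxed-Ordered TLP:

#include <stdatomic.h>
#include <stdint.h>

/* Because the DW updates of a single non-Relaxed-Ordered write become
 * visible in increasing address order, once the CPU observes valid == 1
 * the earlier fields have already been updated. The acquire fence stops
 * the compiler/CPU from hoisting the data reads above the flag check. */
static int try_consume(volatile struct PCIE_COMPLETION_T *slot,
                       uint64_t out[4])
{
    if (slot->valid != 1)
        return 0;                               /* not ready yet */

    atomic_thread_fence(memory_order_acquire);  /* flag before data */

    out[0] = slot->data_a;
    out[1] = slot->data_b;
    out[2] = slot->data_c;
    out[3] = slot->data_d;

    slot->valid = 0;                            /* hand the slot back */
    return 1;
}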

What is meant by "CPU generates logical address space"?

As far as I read from the book (correct me if I am wrong), after a compiler puts the compiled code in storage, the CPU creates logical addresses, and those logical addresses are mapped to physical memory through the MMU (Memory Management Unit). I also know that the CPU cannot directly access anything other than physical memory.
Then how does the CPU produce logical addresses for the process in the first place?
It sounds like you have a bit of confusion about what things do.
The operating system defines the logical address space by setting up page tables that map logical pages to physical page frames. The operating system loads hardware registers of the CPU so that the CPU knows about the page tables the OS has defined.
This use of page tables to define logical address spaces is an integral part of a modern CPU. In some systems, the only use of physical addresses is within page tables.
The compiler generates an object code file that describes the instructions and data used and created.
The linker combines object code into an executable file that defines how the program will be loaded into memory.
The loader reads the instructions in an executable file and sets up the logical address space to run the program. The loader calls system routines that set up the page tables that define the logical address space.
For example, where the executable has read-only data, the loader will call OS routines to create read-only pages in the logical address space and map them to the data in the executable file.
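As a concrete illustration of that last step, here is a minimal POSIX sketch of mapping a file's read-only data into the logical address space (the file name and 4096-byte size are hypothetical placeholders); the kernel builds the page-table entries, and the process only ever sees the logical address that mmap returns:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("program", O_RDONLY);
    if (fd < 0) return 1;

    /* Ask the OS to map the file's data read-only at an address of
     * the kernel's choosing; page tables are built behind the scenes. */
    void *rodata = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    if (rodata == MAP_FAILED) return 1;

    printf("read-only data mapped at logical address %p\n", rodata);

    munmap(rodata, 4096);
    close(fd);
    return 0;
}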

MS-DOS, what determines the memory model selection

In this article we can see that 16-bit systems have different memory models.
Through that answer we know that a COM application always uses the tiny model (all segments are the same one), but for the other executables, what makes the operating system use one model or another?
I did not see any flag in the MS-DOS header that would help with this choice, so how does MS-DOS determine which memory model to use?
The selection of a memory model is a compiler option, not something the OS decides. You can assume that DOS itself always works with the large memory model (far pointers for CS and DS).
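To make that concrete, here is a small sketch for a 16-bit DOS compiler such as Open Watcom or Turbo C; the far keyword and MK_FP() are compiler extensions, not standard C, and the memory model is chosen by a compiler switch (e.g. -ml for large), which fixes the size of plain pointers:

#include <dos.h>
#include <stdio.h>

int main(void)
{
    /* A far pointer always carries an explicit segment:offset pair;
     * here, colour text-mode video memory at segment 0xB800. */
    char far *video = (char far *)MK_FP(0xB800, 0);
    *video = 'A';

    /* In the tiny/small models a plain pointer is a 16-bit offset;
     * in the large model it is a 32-bit segment:offset pair. */
    printf("sizeof(char *)     = %u\n", (unsigned)sizeof(char *));
    printf("sizeof(char far *) = %u\n", (unsigned)sizeof(char far *));
    return 0;
}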