Why doesn't Postgres store a page_id field in its PageHeaderData struct? - postgresql

From: https://www.postgresql.org/docs/15/storage-page-layout.html
With this information, if you were given an arbitrary page, I don't understand how you'd be able to tell what its page number is.
Why doesn't Postgres store something like page_id, and similar fields (i.e. next/prev_page_id)?
Field                Type            Length   Description
pd_lsn               PageXLogRecPtr  8 bytes  LSN: next byte after last byte of WAL record for last change to this page
pd_checksum          uint16          2 bytes  Page checksum
pd_flags             uint16          2 bytes  Flag bits
pd_lower             LocationIndex   2 bytes  Offset to start of free space
pd_upper             LocationIndex   2 bytes  Offset to end of free space
pd_special           LocationIndex   2 bytes  Offset to start of special space
pd_pagesize_version  uint16          2 bytes  Page size and layout version number information
pd_prune_xid         TransactionId   4 bytes  Oldest unpruned XMAX on page, or zero if none
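To make the layout in the table above concrete, here is a minimal sketch that decodes those 24 bytes from a raw page. The little-endian byte order, the split of pd_lsn into two 32-bit halves (high, then low), the masks applied to pd_pagesize_version, and the helper name parse_page_header are all assumptions made for illustration, and the page is synthetic rather than read from a real relation file.

```python
import struct

# Field order and widths follow the table above. "<" (little-endian) is an
# assumption about the machine that wrote the page.
PAGE_HEADER = struct.Struct("<IIHHHHHHI")  # 4+4 + 6*2 + 4 = 24 bytes


def parse_page_header(page: bytes) -> dict:
    """Decode the fixed-size header at the start of a page (hypothetical helper)."""
    (lsn_hi, lsn_lo, checksum, flags, lower, upper,
     special, pagesize_version, prune_xid) = PAGE_HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": (lsn_hi << 32) | lsn_lo,
        "pd_checksum": checksum,
        "pd_flags": flags,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        # Assumed packing: page size in the high byte, layout version in the low byte.
        "page_size": pagesize_version & 0xFF00,
        "layout_version": pagesize_version & 0x00FF,
        "pd_prune_xid": prune_xid,
    }


# Synthetic empty 8 kB page: free space runs from byte 24 to the end of the page.
page = PAGE_HEADER.pack(0, 0x01000028, 0, 0, 24, 8192, 8192, 8192 | 4, 0)
page += bytes(8192 - PAGE_HEADER.size)

header = parse_page_header(page)
print(header)
```

None of the decoded fields is a page number, which is exactly the gap the question asks about.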

Related

Calculate Multi Level Page Table Size in Bytes

If I have a system with a 2-level page table such that each page is 4096 bytes, page table entries are 8 bytes, and page tables are 4096 bytes at each level, what is the minimum page table size for a program with access to 1000 pages of memory?
I figured out that each page table can have 512 entries so we need at least 2 page tables but how do I account for the 2 levels.
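One way to account for the second level, sketched under the assumption that the 1000 pages are contiguous (so they fit in the fewest second-level tables) and that a single top-level table is always needed. The function name is made up for illustration.

```python
import math

PTE_SIZE = 8       # bytes per page table entry (from the question)
TABLE_SIZE = 4096  # bytes per page table at each level (from the question)


def min_page_table_bytes(pages: int) -> int:
    """Smallest total size of a 2-level page table that maps `pages` pages."""
    entries_per_table = TABLE_SIZE // PTE_SIZE               # 512
    leaf_tables = math.ceil(pages / entries_per_table)       # second-level tables
    top_tables = 1                                           # one top-level table points at them
    return (leaf_tables + top_tables) * TABLE_SIZE


print(min_page_table_bytes(1000))  # 12288
```

With these assumptions, the two second-level tables plus the one top-level table take 3 * 4096 bytes.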

How to compute the address at which the page table entry is stored?

Suppose a system has:
20-bit virtual addresses,
1024 byte pages,
24-bit physical addresses,
4 byte page table entries,
a page table base pointer set to physical (byte) address 0x1000,
a single-level page table structure.
Based on the above information, what is the address at which the page table entry for the virtual address 0x1000 is stored? (Note that page table entries are larger than one byte.) Write your answer as a hexadecimal number.
1024 (2^10) byte pages -> page offset = 10 bits
virtual address 0x1000 -> 100 0000000000 (VPN, then the 10-bit offset)
VPN: 100 -> 0x4
PTE address = 0x1000 (page table base pointer) + 0x4 (VPN) * 4 (size of PTE) = 0x1010
So the correct answer is 0x1010.
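The same arithmetic as a small sketch. The function name and its default parameters are illustrative; the defaults are the values given in the question.

```python
def pte_address(vaddr: int, page_size: int = 1024,
                pte_size: int = 4, base: int = 0x1000) -> int:
    """Address of the PTE for `vaddr` in a single-level page table."""
    offset_bits = page_size.bit_length() - 1  # 1024 -> 10 offset bits
    vpn = vaddr >> offset_bits                # drop the page offset
    return base + vpn * pte_size              # index into the table, in bytes


print(hex(pte_address(0x1000)))  # 0x1010
```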

Are the Physical Page Numbers in this diagram the same between all?

I'm currently reading a text book on xv6, and understand this so far ...
Virtual Address: First 20 bits to index into a PTE. The PTE takes these 20 bits and turns them into a Physical Page Number: PPN. The remaining 12 bits are used for offset, which will be the same in both virtual and physical addresses.
Paging: Paging hardware uses first 10 bits of 20 bits in the virtual address to select a page directory entry (PDE). If a PDE is present, uses next 10 bits of virtual address to select a page table entry (PTE). Something like this ...
00 0000 0011 | 00 0000 0010 | 0000 0000 0101
Page Dir. (3) | Page Table E. (2) | Offset (5)
Question: Is the PPN shown in the diagrams the same across all of them? I also know that a page directory entry and a page table entry differ by only 1 bit, which is set to 0 or 1 depending on whether you are at the page directory or the page table. Is the PPN common between all 3 then? (Physical Address, Page Table, Page Directory).
Hopefully, this answers your question. If you access a 32-bit address, 12 bits are reserved for the offset into the page. They play no part in address translation.
The CR3 register points to a page directory. Although not specified in your diagram, I believe this points to a physical page frame. That page frame contains an array of page directory entries. The top 10 bits in your address are an index into that array.
So now you have a structure like the one in your diagram. That structure contains a pointer to a physical page frame (PPN) containing a page table. Again, this is a physical address that would be padded with zeroes. You use the value in the PPN field to find the page table.
Your page table is an array of structures that look just like the directory. What is misleading in your diagram is that the D bit may or may not be set in a page table while it is always clear in a directory. The next 10 bits in your address are an index into this table. Use those to locate the desired page table entry.
As before you have a PPN. On this second iteration, this is a pointer to a physical address BUT now it is the actual memory page you want to access. Pad the 20 bits of the PPN with zero and add the lower 12 bits of your address and you have the physical address.
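The walk described above can be sketched as follows. Physical memory is simulated with a dict, entries are assumed to be 4 bytes wide with the present flag in bit 0, and the table contents are made up for illustration. The PPN values differ at each step because each one points at a different physical frame.

```python
PRESENT = 0x1            # assumed position of the "present" flag
PPN_MASK = 0xFFFFF000    # upper 20 bits of an entry: the PPN padded with 12 zero bits
ENTRY_SIZE = 4           # assumed size of a PDE/PTE in bytes


def translate(vaddr: int, cr3: int, phys_mem: dict) -> int:
    """Walk a two-level (10-10-12) page table held in simulated physical memory."""
    pd_index = (vaddr >> 22) & 0x3FF   # top 10 bits select the PDE
    pt_index = (vaddr >> 12) & 0x3FF   # next 10 bits select the PTE
    offset = vaddr & 0xFFF             # low 12 bits pass through unchanged

    pde = phys_mem[cr3 + ENTRY_SIZE * pd_index]
    if not pde & PRESENT:
        raise LookupError("page directory entry not present")
    page_table = pde & PPN_MASK        # physical address of the page table

    pte = phys_mem[page_table + ENTRY_SIZE * pt_index]
    if not pte & PRESENT:
        raise LookupError("page table entry not present")
    frame = pte & PPN_MASK             # physical address of the data page

    return frame | offset


# The address from the question: directory index 3, table index 2, offset 5.
vaddr = (3 << 22) | (2 << 12) | 5
phys_mem = {
    0x1000 + ENTRY_SIZE * 3: 0x2000 | PRESENT,  # PDE 3 -> page table at 0x2000
    0x2000 + ENTRY_SIZE * 2: 0x5000 | PRESENT,  # PTE 2 -> data page at 0x5000
}
print(hex(translate(vaddr, 0x1000, phys_mem)))  # 0x5005
```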

How to convert a Virtual address to Physical Address?

If I have a virtual address 0xF3557100, how do I convert it to a physical address, and what are the values of the offset, page directory and page table?
The PTE (page table entry) for that address has the value 0x87124053.
Thanks.
Sadly, what you are asking is system dependent. You would need to know the size of the page to begin with.
In the simplest case, the lowest order bits corresponding to the page size are the offset and the remaining high order bits specify the page table entry.
You say that you have the value of the page table entry. You then need to know the structure of the page table entry. Some part of that will indicate the physical address. Other parts will define page attributes.
In short, we'd need to know a whole lot more information.
In general, you cannot translate a VA to a PA from this information alone.
Each architecture has a constant PAGE_SHIFT value. Your address is 32 bits wide, and most 32-bit architectures use a PAGE_SHIFT of 12.
This value determines the width of the offset, so your offset is 12 bits, which also means your page size is 4096 bytes. An architecture can support more than one PAGE_SHIFT value, but we take the 12-bit case here because it is the default on most systems.
The PTE contains the page frame number along with status and protection information. The lower 12 bits of the PTE hold the status and protection bits, and the upper 20 bits hold the PPN. As a principle, a virtual frame number is mapped to a physical frame number and the offset is the same in both. So drop the lowest 12 bits of the PTE and append the lowest 12 bits of the VA.
The offset from the VA is 0x100, so the physical address is 0x87124100.
According to the 10-10-12 rule (there is no general rule for this division):
offset = 12 bits
page table = page directory = 10 bits
Now you can easily calculate the relevant bit values from the given address.
1111001101 0101010111 000100000000
page directory offset = 1111001101
page table offset = 0101010111
page offset = 000100000000
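A short sketch that reproduces these numbers, assuming the 10-10-12 split and a PTE whose upper 20 bits are the PPN, as described above. The function names are illustrative.

```python
def split_10_10_12(vaddr: int) -> tuple:
    """Return (page directory index, page table index, page offset)."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def physical_address(vaddr: int, pte: int) -> int:
    """Replace the low 12 status bits of the PTE with the offset from the VA."""
    return (pte & 0xFFFFF000) | (vaddr & 0xFFF)


pd, pt, off = split_10_10_12(0xF3557100)
print(f"{pd:010b} {pt:010b} {off:012b}")  # 1111001101 0101010111 000100000000
print(hex(physical_address(0xF3557100, 0x87124053)))  # 0x87124100
```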

Making sense of Postgres row sizes

I got a large (>100M rows) Postgres table with structure {integer, integer, integer, timestamp without time zone}. I expected the size of a row to be 3*integer + 1*timestamp = 3*4 + 1*8 = 20 bytes.
In reality the row size is pg_relation_size(tbl) / count(*) = 52 bytes. Why?
(No deletes are done against the table: pg_relation_size(tbl, 'fsm') ~= 0)
Calculation of row size is much more complex than that.
Storage is typically partitioned in 8 kB data pages. There is a small fixed overhead per page, possible remainders not big enough to fit another tuple, and more importantly dead rows or a percentage initially reserved with the FILLFACTOR setting.
And there is even more overhead per row (tuple): an item identifier of 4 bytes at the start of the page, the HeapTupleHeader of 23 bytes and alignment padding. The start of the tuple header as well as the start of tuple data are aligned at a multiple of MAXALIGN, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.
Quoting the manual on the system table pg_type:
typalign is the alignment required when storing a value of this type.
It applies to storage on disk as well as most representations of the
value inside PostgreSQL. When multiple values are stored
consecutively, such as in the representation of a complete row on
disk, padding is inserted before a datum of this type so that it
begins on the specified boundary. The alignment reference is the
beginning of the first datum in the sequence.
Possible values are:
c = char alignment, i.e., no alignment needed.
s = short alignment (2 bytes on most machines).
i = int alignment (4 bytes on most machines).
d = double alignment (8 bytes on many machines, but by no means all).
Read about the basics in the manual here.
Your example
This results in 4 bytes of padding after your 3 integer columns, because the timestamp column requires double alignment and needs to start at the next multiple of 8 bytes.
So, one row occupies:
23 -- heaptupleheader
+ 1 -- padding or NULL bitmap
+ 12 -- 3 * integer (no alignment padding here)
+ 4 -- padding after 3rd integer
+ 8 -- timestamp
+ 0 -- no padding since tuple ends at multiple of MAXALIGN
Plus item identifier per tuple in the page header (as pointed out by @A.H. in the comment):
+ 4 -- item identifier in page header
------
= 52 bytes
So we arrive at the observed 52 bytes.
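The tally above can be reproduced with a rough sketch. It assumes fixed-width columns, no NULL bitmap, MAXALIGN of 8 and the 23-byte header and 4-byte item identifier quoted above. The function names are illustrative and not part of PostgreSQL.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def heap_row_bytes(columns, header: int = 23, maxalign: int = 8, item_id: int = 4) -> int:
    """Estimate on-page bytes for one row; `columns` is a list of (size, typalign)."""
    offset = align_up(header, maxalign)           # tuple data starts at a MAXALIGN boundary
    for size, alignment in columns:
        offset = align_up(offset, alignment) + size  # pad before the datum, then store it
    return align_up(offset, maxalign) + item_id   # pad the tuple end, add the item identifier


# {integer, integer, integer, timestamp}: 4-byte ints, 8-byte timestamp with 'd' alignment.
columns = [(4, 4), (4, 4), (4, 4), (8, 8)]
print(heap_row_bytes(columns))  # 52
```

Moving the timestamp column first removes the 4 bytes of padding before it, but the tuple end is then padded back up to a multiple of 8, so the estimate stays at 52 for this particular table.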
The calculation pg_relation_size(tbl) / count(*) is a pessimistic estimation. pg_relation_size(tbl) includes bloat (dead rows) and space reserved by fillfactor, as well as overhead per data page and per table. (And we didn't even mention compression for long varlena data in TOAST tables, since it doesn't apply here.)
You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name'); for more information on table and tuple size.
Related:
Table size with page layout
Calculating and saving space in PostgreSQL
Each row has metadata associated with it. The correct formula is (assuming naïve alignment):
3 * 4 + 1 * 8 == 20 bytes of your data
24 bytes == row overhead
total size per row: 24 + 20 == 44 bytes
Add the 4 bytes of alignment padding before the timestamp and the 4-byte item identifier, and you get roughly 52 bytes. I actually wrote postgresql-varint specifically to help with this exact use case. You may want to look at a similar post for additional details re: tuple overhead.