Postgres tables' internal organization

I found an explanation of how things work internally in PostgreSQL. It included a picture of the page layout (not reproduced here) and the following explanation:
After the page header is an array of item identifiers composed of (offset,
length) pairs pointing to the actual items.
Because an item identifier is never moved until it is freed, its index
can be used on a long-term basis to reference an item, even when the
item itself is moved around on the page to compact free space. A
pointer to an item is called a CTID (ItemPointer); created by
PostgreSQL, it consists of a page number and the index of an item
identifier.
Could you be so kind as to clear a couple of things up here?
Am I right that the items near the page header are CTIDs themselves, or are items and CTIDs different things?
Is it the CTIDs that never move around, or the rows?
Depending on the answers, maybe I'll understand what the following means exactly: "Because an item identifier is never moved until it is freed, its index can be used on a long-term basis to reference an item, even when the item itself is moved around on the page to compact free space."
In any case, an additional, more detailed explanation would be nice.

What is called “item” in the picture is a “line pointer” in PostgreSQL jargon. It is defined in src/include/storage/itemid.h:
/*
 * A line pointer on a buffer page.  See buffer page definitions and comments
 * for an explanation of how line pointers are used.
 *
 * In some cases a line pointer is "in use" but does not have any associated
 * storage on the page.  By convention, lp_len == 0 in every line pointer
 * that does not have storage, independently of its lp_flags state.
 */
typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of line pointer, see below */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;

typedef ItemIdData *ItemId;
These line pointers are stored in an array right after the page header.
See the excellent documentation in src/include/storage/bufpage.h:
/*
 * A postgres disk page is an abstraction layered on top of a postgres
 * disk block (which is simply a unit of i/o, see block.h).
 *
 * specifically, while a disk block can be unformatted, a postgres
 * disk page is always a slotted page of the form:
 *
 * +----------------+---------------------------------+
 * | PageHeaderData | linp1 linp2 linp3 ...           |
 * +-----------+----+---------------------------------+
 * | ... linpN |                                      |
 * +-----------+--------------------------------------+
 * |           ^ pd_lower                             |
 * |                                                  |
 * |             v pd_upper                           |
 * +-------------+------------------------------------+
 * |             | tupleN ...                         |
 * +-------------+------------------+-----------------+
 * |       ... tuple3 tuple2 tuple1 | "special space" |
 * +--------------------------------+-----------------+
 *                                  ^ pd_special
 *
 * NOTES:
 *
 * linp1..N form an ItemId (line pointer) array.  ItemPointers point
 * to a physical block number and a logical offset (line pointer
 * number) within that block/page.  Note that OffsetNumbers
 * conventionally start at 1, not 0.
 *
 * tuple1..N are added "backwards" on the page.  Since an ItemPointer
 * offset is used to access an ItemId entry rather than an actual
 * byte-offset position, tuples can be physically shuffled on a page
 * whenever the need arises.  This indirection also keeps crash recovery
 * relatively simple, because the low-level details of page space
 * management can be controlled by standard buffer page code during
 * logging, and during recovery.
 */
Answers to your questions:
The ctid of a tuple is its physical address, consisting of the block number (starting at 0) and the line pointer number (starting at 1). You can identify the line pointer from the ctid of a table row: it is the second number. For example, (321,5) would be the fifth line pointer on the 322nd page.
The location of the actual tuple in the block is not fixed: it is stored in lp_off. That allows PostgreSQL to move the data around in a block without changing the physical address (tid) of the tuples. The line pointer itself never changes.
As explained above, the actual data can move in the block, but the line pointer doesn't change. The ctid of a tuple is what is stored in the index. The statement should be clear now.
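To make the indirection tangible, here is a minimal SQL sketch; the table name and data are invented for illustration, and pageinspect is a contrib extension you may have to install first:

CREATE TABLE t (id integer, val text);
INSERT INTO t SELECT g, 'row ' || g FROM generate_series(1, 3) AS g;

-- ctid is exactly the (block number, line pointer number) pair discussed above
SELECT ctid, id, val FROM t;
-- on a freshly created table this should show (0,1), (0,2) and (0,3)

-- pageinspect lets you look at the line pointer array itself:
CREATE EXTENSION pageinspect;
SELECT lp, lp_off, lp_flags, lp_len
FROM heap_page_items(get_raw_page('t', 0));
-- lp_off and lp_len are the ItemIdData fields quoted above

One caveat: an UPDATE does not overwrite a tuple in place but creates a new tuple version, so the ctid of an updated row does change. The "never moved" guarantee is about the line pointer slot of one tuple version, not about the logical row.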

Related

Scala GraphLoader.edgeListFile (NumberFormatException)

I'm loading the edges of a graph from a file:
val graph = GraphLoader.edgeListFile(sc, "comb.txt")
However, it's throwing an error:
java.lang.NumberFormatException: For input string: "116374117927631468606"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
I think it accepts only integer node values. How do I fix this?
https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/graphx/GraphLoader.html
The API documentation clearly states the following:
/**
 * Loads a graph from an edge list formatted file where each line contains two integers: a source
 * id and a target id. Skips lines that begin with #.
 *
 * If desired the edges can be automatically oriented in the positive
 * direction (source Id is less than target Id) by setting canonicalOrientation to
 * true.
 */
And the value 116374117927631468606 is certainly too big for that. The official site gives:
final val MaxValue: Int(2147483647)
The largest value representable as an Int.
In fact, GraphX vertex IDs are of type VertexId, which is an alias for Long, and 116374117927631468606 (about 1.2e20) exceeds even Long.MaxValue (9223372036854775807, about 9.2e18). To fix this, you will have to map your original node IDs into the Long range yourself (for example by hashing them or by assigning sequential Long IDs) before loading the edge list.

PostgreSQL - How is the cost of Sort Node in the Query Plan calculated?

I have the following query plan in PostgreSQL:
Unique  (cost=487467.14..556160.88 rows=361546 width=1093)
  ->  Sort  (cost=487467.14..488371.00 rows=361546 width=1093)
        Sort Key: (..)
        ->  Append  (cost=0.42..108072.53 rows=361546 width=1093)
              ->  Index Scan using (..)  (cost=0.42..27448.06 rows=41395 width=1093)
                    Index Cond: (..)
                    Filter: (..)
              ->  Seq Scan on (..)  (cost=0.00..77009.02 rows=320151 width=1093)
                    Filter: (..)
I just wonder how exactly the two cost values of the Sort node are calculated. I understand how it works for the scans and the Append, but I can't find anything regarding the Sort cost calculation.
I'm looking for something like the formula for the Seq Scan, which is:
(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost)
The query for the plan was basically something like this (not exactly, because it contained a view, but you get the idea):
SELECT * FROM (
SELECT *, true AS storniert
FROM auftragsposition
WHERE mengestorniert > 0::numeric AND auftragbestaetigt = true
UNION
SELECT *, false AS storniert
FROM auftragsposition
WHERE mengestorniert < menge AND auftragbestaetigt = true
) as bla
It is implemented (and documented; the source code is often the only documentation) in src/backend/optimizer/path/costsize.c, function cost_sort(). The basic cost is about N*log(N) compare operations for an in-memory sort (a disk-based sort may be slower, and its costs are estimated too).
This N*log(N) is expected: https://en.wikipedia.org/wiki/Sorting_algorithm#Efficient_sorts ("general sorting algorithms are almost always based on an algorithm with average time complexity ... O(n log n)"):
https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/optimizer/path/costsize.c#L1409
/*
 * cost_sort
 *    Determines and returns the cost of sorting a relation, including
 *    the cost of reading the input data.
 *
 * If the total volume of data to sort is less than sort_mem, we will do
 * an in-memory sort, which requires no I/O and about t*log2(t) tuple
 * comparisons for t tuples.
 *
 * If the total volume exceeds sort_mem, we switch to a tape-style merge
 * algorithm.  There will still be about t*log2(t) tuple comparisons in
 * total, but we will also need to write and read each tuple once per
 * merge pass.  We expect about ceil(logM(r)) merge passes where r is the
 * number of initial runs formed and M is the merge order used by tuplesort.c.
 * Since the average initial run should be about sort_mem, we have
 *    disk traffic = 2 * relsize * ceil(logM(p / sort_mem))
 *    cpu = comparison_cost * t * log2(t)
 *
 * If the sort is bounded (i.e., only the first k result tuples are needed)
 * and k tuples can fit into sort_mem, we use a heap method that keeps only
 * k tuples in the heap; this will require about t*log2(k) tuple comparisons.
 *
 * The disk traffic is assumed to be 3/4ths sequential and 1/4th random
 * accesses (XXX can't we refine that guess?)
 *
 * By default, we charge two operator evals per tuple comparison, which should
 * be in the right ballpark in most cases.  The caller can tweak this by
 * specifying nonzero comparison_cost; typically that's used for any extra
 * work that has to be done to prepare the inputs to the comparison operators.
 *
 * 'pathkeys' is a list of sort keys
 * 'input_cost' is the total cost for reading the input data
 * 'tuples' is the number of tuples in the relation
 * 'width' is the average tuple width in bytes
 * 'comparison_cost' is the extra cost per comparison, if any
 * 'sort_mem' is the number of kilobytes of work memory allowed for the sort
 * 'limit_tuples' is the bound on the number of output tuples; -1 if no bound
 *
 * NOTE: some callers currently pass NIL for pathkeys because they
 * can't conveniently supply the sort keys.  Since this routine doesn't
 * currently do anything with pathkeys anyway, that doesn't matter...
 * but if it ever does, it should react gracefully to lack of key data.
 * (Actually, the thing we'd most likely be interested in is just the number
 * of sort keys, which all callers *could* supply.)
 */
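Condensed into formulas (writing $C$ for comparison_cost, $t$ for the number of input tuples, $k$ for the bound of a bounded sort, $M$ for the merge order and $r$ for the number of initial runs), the comment amounts to:

$$\text{cpu} = C \cdot t \cdot \log_2 t \ \ \text{(quicksort and external sort)}, \qquad \text{cpu} = C \cdot t \cdot \log_2(2k) \ \ \text{(bounded heap sort)}$$
$$\text{disk} = 2 \cdot \text{npages} \cdot \lceil \log_M r \rceil \cdot (0.75 \cdot \text{seq\_page\_cost} + 0.25 \cdot \text{random\_page\_cost})$$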
Here are parts of the actual calculations: disk-based sort, bounded heap sort, quicksort. (There seem to be no estimates for parallel sort yet: https://wiki.postgresql.org/wiki/Parallel_Internal_Sort, https://wiki.postgresql.org/wiki/Parallel_External_Sort.)
...
path->rows = tuples;

/*
 * We want to be sure the cost of a sort is never estimated as zero, even
 * if passed-in tuple count is zero. Besides, mustn't do log(0)...
 */
if (tuples < 2.0)
    tuples = 2.0;

/* Include the default cost-per-comparison */
comparison_cost += 2.0 * cpu_operator_cost;

...

if (output_bytes > sort_mem_bytes)
{
    ...
    /*
     * We'll have to use a disk-based sort of all the tuples
     */

    /*
     * CPU costs
     *
     * Assume about N log2 N comparisons
     */
    startup_cost += comparison_cost * tuples * LOG2(tuples);

    /* Disk costs */

    /* Compute logM(r) as log(r) / log(M) */
    if (nruns > mergeorder)
        log_runs = ceil(log(nruns) / log(mergeorder));
    else
        log_runs = 1.0;
    npageaccesses = 2.0 * npages * log_runs;
    /* Assume 3/4ths of accesses are sequential, 1/4th are not */
    startup_cost += npageaccesses *
        (seq_page_cost * 0.75 + random_page_cost * 0.25);
}
else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
{
    /*
     * We'll use a bounded heap-sort keeping just K tuples in memory, for
     * a total number of tuple comparisons of N log2 K; but the constant
     * factor is a bit higher than for quicksort. Tweak it so that the
     * cost curve is continuous at the crossover point.
     */
    startup_cost += comparison_cost * tuples * LOG2(2.0 * output_tuples);
}
else
{
    /* We'll use plain quicksort on all the input tuples */
    startup_cost += comparison_cost * tuples * LOG2(tuples);
}

/*
 * Also charge a small amount (arbitrarily set equal to operator cost) per
 * extracted tuple. We don't charge cpu_tuple_cost because a Sort node
 * doesn't do qual-checking or projection, so it has less overhead than
 * most plan nodes. Note it's correct to use tuples not output_tuples
 * here --- the upper LIMIT will pro-rate the run cost so we'd be double
 * counting the LIMIT otherwise.
 */
run_cost += cpu_operator_cost * tuples;
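As a hedged back-of-the-envelope check against the plan above (assuming the default cpu_operator_cost = 0.0025, so comparison_cost = 2 * 0.0025 = 0.005): if this sort had run fully in memory, its startup cost would be roughly the input cost plus the comparison term, which you can compute right in psql:

-- hypothetical check; all figures are taken from the plan above
SELECT 108072.53                                  -- total cost of the Append input
     + 0.005 * 361546 * log(2, 361546::numeric);  -- comparison_cost * t * log2(t)
-- ≈ 141450

That is far below the reported startup cost of 487467.14, which suggests the planner priced the disk-based branch here: 361546 rows of width 1093 come to roughly 377 MB, far more than any plausible work_mem, and the npageaccesses term above accounts for the difference.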

Reading wrong data when indexing a nested array

I'm implementing a dcache in a pipelined processor.
My dcache is 2-way set-associative with 2 words per block and 8 indexes.
This is how I initialized my cache structure:
typedef struct packed {
logic [25:0] tag;
logic valid, dirty;
word_t [1:0] data;
} block_t;
typedef struct packed {
block_t [1:0] way;
} dcache_t;
dcache_t [7:0] cache;
So to access a word: cache[i].way[j].data[k]
I can write to the cache fine.
index, way, and sel are variables driven by combinational logic that determine where to index.
So, for example, this line is in my always_ff block for the cache:
cache[index].way[way].data[sel] = ccif.dload[CPUID];
After the above line of code, the following gets stored into cache
for index = 6, way = 0, sel = 0
cache[6].way[0].data[0] <== 0x01234567
and after the next clock cycle the following for index = 6, way = 0, sel = 1
cache[6].way[0].data[1] <== 0x89ABCDEF
Since I load two words at a time.
...but when I read from it using index = 6, way = 0, sel = 1
dcif.dmemload = cache[index].way[way].data[sel];
The following gets read from my cache
dcif.dmemload <== 0xCDEF0123
I get the wrong value and don't know why, since the value in the cache is still the same and hasn't changed.
This is the current state of a section of my cache at the time of reading
+-------+------------+------------+
| index | data[1] | data[0] |
+-------+------------+------------+
| 6 | 89ABCDEF | 01234567 |
+-------+------------+------------+
Any ideas? I'm confused because my indexing works fine when writing, but something weird happens when reading.
Edit: the value read isn't always offset by 2 bytes.
I'm not sure if I have too many nested arrays.
This is a bug in ModelSim/Questa that is fixed in the next release.
The workaround is to not make the entire nested structure packed. You probably did not mean to have your cache packed anyway; you should not pack your arrays unless you need to access the whole array as a single integral value. Declare the top-level array unpacked instead:
dcache_t cache[7:0];

Postgresql earth_box algorithm

I found this tutorial on how to find something within a specified radius. My question is: what algorithm was used to implement it?
If you mean the earth_box, the idea is to come up with a data type that can be useful with a GiST index (generalized search tree):
http://www.postgresql.org/docs/current/static/gist-intro.html
See in particular the links at the bottom of the maintainers' page:
http://www.sai.msu.su/~megera/postgres/gist/
One leads to:
The GiST is a balanced tree structure like a B-tree, containing (key, pointer) pairs. But keys in the GiST are not integers like the keys in a B-tree. Instead, a GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. For example, keys in a B+-tree-like GiST are ranges of numbers ("all data items below this pointer are between 4 and 6"); keys in an R-tree-like GiST are bounding boxes ("all data items below this pointer are in California"); keys in an RD-tree-like GiST are sets ("all data items below this pointer are subsets of {1,6,7,9,11,12,13,72}"); etc. To make a GiST work, you just have to figure out what to represent in the keys, and then write 4 methods for the key class that help the tree do insertion, deletion, and search.
http://gist.cs.berkeley.edu/gist1.html
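For context, here is a sketch of the usage pattern this enables (the table, column names and coordinates are invented; cube and earthdistance are the contrib extensions involved): earth_box computes a bounding cube that the GiST index can search quickly, and because the box only over-approximates the circle, an exact earth_distance filter is applied on top:

CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;

CREATE TABLE places (name text, lat float8, lon float8);
CREATE INDEX places_loc_idx ON places USING gist (ll_to_earth(lat, lon));

-- everything within 10 km of a point; the cube-based functions work in meters
SELECT name
FROM places
WHERE earth_box(ll_to_earth(48.8566, 2.3522), 10000) @> ll_to_earth(lat, lon)
  AND earth_distance(ll_to_earth(48.8566, 2.3522),
                     ll_to_earth(lat, lon)) < 10000;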
If you mean the earth distance itself, the meaty part of the source is:
/* compute difference in longitudes - want < 180 degrees */
longdiff = fabs(long1 - long2);
if (longdiff > M_PI)
longdiff = TWO_PI - longdiff;
sino = sqrt(sin(fabs(lat1 - lat2) / 2.) * sin(fabs(lat1 - lat2) / 2.) +
                cos(lat1) * cos(lat2) * sin(longdiff / 2.) * sin(longdiff / 2.));
if (sino > 1.)
        sino = 1.;
return 2. * EARTH_RADIUS * asin(sino);
https://github.com/postgres/postgres/blob/master/contrib/earthdistance/earthdistance.c#L50
It computes the great-circle distance between two points on the surface of a sphere (without considering the height of the two points). The EARTH_RADIUS constant in that file is expressed in statute miles, so that is the unit of the result.
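For reference, the code above matches the haversine formula for the great-circle distance between points $(\phi_1, \lambda_1)$ and $(\phi_2, \lambda_2)$ on a sphere of radius $R$:

$$d = 2R \arcsin\left(\sqrt{\sin^2\frac{\phi_1 - \phi_2}{2} + \cos\phi_1 \cos\phi_2 \sin^2\frac{\Delta\lambda}{2}}\right)$$

The clamping of sino to 1 simply guards against floating-point rounding pushing the argument of asin above 1.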

Progress 4GL - equivalent of the Enumerable.Take(TSource) method

Thanks in advance for looking at this question!
This is all in the context of a FOR EACH loop which can get quite lengthy - think 100,000 records - and I'm looking for a way to take n records from a given position in that result set, e.g. start at record 4000 and take the next 500 records.
I was looking around for keywords, in the ABL Reference, such as:
Position
LookAhead
RECID - Whether we can find a RECID at the nth position
Query Tuning
So far no luck. Any smarties out there with a hint?
Here is an example that I created against the sports database. The sports database is a sample database similar to the AdventureWorks database in SQL Server.
This should get you started:
def var v-query as char no-undo.
def var h as handle no-undo.
/* Here is where can set a dynamic query */
assign v-query = "for each orderline no-lock".
/* Create handle used for query */
create query h.
/* Set the table against the query so you can access it conveniently */
/* If you have other tables in your "for each", simply do a */
/* set-buffers on each table */
h:set-buffers(buffer orderline:handle).
/* Prepare Query */
h:query-prepare(v-query).
/* DO Open */
h:query-open.
/* Setup query to row 10 */
h:reposition-to-row(10).
LINE_LOOP:
repeat:
    /* Read next row */
    h:get-next.
    /* Check if we are not past the end */
    if h:query-off-end then leave LINE_LOOP.
    /* Since we added orderline as a buffer we can now use it here */
    disp orderline.ordernum
         orderline.linenum
         orderline.itemnum
         orderline.price
         orderline.qty.
end. /* repeat */
h:query-close.
FYI, the Progress Knowledge Base and the PSDN have great samples and tutorials.
Another example - filling ProDataSet temp-tables with batching:
DEFINE TEMP-TABLE ttOrder LIKE Order.
DEFINE DATASET dsOrder FOR ttOrder.
/* you can set a buffer object */
/*DEFINE DATA-SOURCE srcOrder FOR Order.*/
/* or you can set a query */
DEFINE QUERY qOrder FOR Order SCROLLING.
QUERY qOrder:QUERY-PREPARE ("FOR EACH Order").
DEFINE DATA-SOURCE srcOrder FOR QUERY qOrder.
BUFFER ttOrder:ATTACH-DATA-SOURCE( DATA-SOURCE srcOrder:HANDLE ).
/*The maximum number of ProDataSet temp-table rows to retrieve in each FILL operation.*/
BUFFER ttOrder:BATCH-SIZE = 10.
/* Empties the table before the FILL operation begins.
   Without this attribute, the next FILL would append rows to those already in the temp-table */
BUFFER ttOrder:FILL-MODE = "EMPTY".
/* first time - result 1 - 10 */
DATASET dsOrder:FILL ().
FOR EACH ttOrder:
DISPLAY ttOrder.ordernum.
END.
/* set the startpoint to position 11 */
DATA-SOURCE srcOrder:RESTART-ROW = 11.
/* second time 11 - 20 */
DATASET dsOrder:FILL ().
FOR EACH ttOrder:
DISPLAY ttOrder.ordernum.
END.