Index fragmentation in PostgreSQL - postgresql

I can't figure out what exactly fragmentation represents for a Postgres index. In the source code of pgstattuple I found this comment, but it isn't particularly clear to me.
/*
* If the next leaf is on an earlier block, it means a
* fragmentation.
*/

B-tree leaf pages form a logical chain (a doubly linked list), and if you follow that chain in the forward direction you encounter the key values in sorted order. They also have a "physical" order: their block number within the file. The extension considers it fragmentation if following the linked list leads to a "physically" earlier page in the file.
You could quibble with this definition for any number of reasons, but it is the one the extension adopted. For example, if the logically next page is 100 pages forward, that is arguably just as fragmented as if it were one page backwards. But if the chain does skip forward a bunch, some other page must point backwards to pick up the skipped pages (unless they are unused), so they get counted when that other page is encountered. Also, the "physical" order of pages in the file is really just the "logical" order of another layer of abstraction (the file system), and different file systems handle it differently.
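In other words, the statistic just counts leaf pages whose logical successor has a smaller block number. A toy sketch of that definition (not the actual pgstatindex code; the chain is given as block numbers in linked-list order):

```python
def leaf_fragmentation(leaf_chain):
    """Percentage of leaf pages whose logical successor lives on an
    earlier block, mirroring the comment quoted above.

    leaf_chain: leaf block numbers in linked-list (logical) order.
    """
    if not leaf_chain:
        return 0.0
    fragmented = sum(1 for cur, nxt in zip(leaf_chain, leaf_chain[1:])
                     if nxt < cur)
    return 100.0 * fragmented / len(leaf_chain)
```

So a chain like [1, 2, 5, 3] counts one fragmented page (5 -> 3), even though 2 -> 5 also skips around in the file.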

Related

Is the access to the top-level page table random?

The top-level page table will not have all of its entries in use. What happens when I access an entry whose present bit ("in/out of place") is 0?
The handling given in the book is to force a page fault, but this page fault is different from what I expected: it may simply terminate the process. Why not just map the page in?
In addition, if access to the top-level page table is random (the virtual address space can be large), then, as in this case in the figure, there are obviously many empty entries in the top-level page table. Isn't it easy to hit one of them?
I think the key question is: does a program have fixed virtual addresses after it is compiled? Can the operating system ensure that it never strays across the boundary into unmapped addresses?
I think I have found the answer. After compilation the program is assigned logical addresses, and the entries of the top-level page table correspond to the program's logical memory layout. For example, the program text is at the bottom, corresponding to page-table entry 0, and the last entry of the page table corresponds to the program's stack area. If the program accesses an unmapped page-table entry, it means it is accessing an address region it should not touch. I wonder if my idea is correct?
Sorry if my English makes this hard to understand. Thank you for your answer!
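For what it's worth, the lookup described above can be sketched with a toy two-level page table. The 10/10/12-bit address split of classic 32-bit x86 is an assumption for illustration; an access through an unmapped top-level entry faults, which is exactly the "should not be accessed" case:

```python
PAGE_SIZE = 4096  # 12-bit offset

class PageFault(Exception):
    pass

def translate(top_table, vaddr):
    """Two-level lookup: 10 bits top index, 10 bits second index,
    12 bits offset. An unmapped top-level entry means the address
    was never given to the process, so we raise a fault."""
    top_i = (vaddr >> 22) & 0x3FF
    second_i = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF
    second = top_table.get(top_i)
    if second is None:
        raise PageFault(f"unmapped top-level entry {top_i}")
    frame = second.get(second_i)
    if frame is None:
        raise PageFault(f"page not present {top_i}:{second_i}")
    return frame * PAGE_SIZE + offset
```

A process that only has entry 0 mapped (text near the bottom) faults as soon as it touches an address whose top 10 bits select an empty entry.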

Cloud Storage rewrite not resetting the componentCount property

I'm composing several files into one, and then I perform a "rewrite" operation to reset componentCount so the result won't block further compositions (to avoid the 1024-component problem, actually). But the rewritten object's componentCount property increases as if it were just a "rename" request.
It is stated in documentation (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite):
When you rewrite a composite object where the source and destination
are different locations and/or storage classes, the result will be a
composite object containing a single component (and, as always with
composite objects, it will have only a crc32c checksum, not an MD5).
It is not clear to me what they mean by "different locations" -- different object names and/or different buckets?
Is there a way to reset this count w/o downloading and uploading resulting composite?
Locations refers to where the source and destination buckets are located geographically (us-east1, asia, etc.) -- see https://cloud.google.com/about/locations
If your rewrite request is between buckets in different locations and/or storage classes, the operation copies bytes and (for composite objects) results in a new object with component count 1. Otherwise the operation completes without byte copying, and in that case (for composite objects) it does not change the component count.
It's no longer necessary to reset the component count using rewrite or download/upload, because there is no longer a restriction on the component count: composing more than 1024 parts is allowed.
https://cloud.google.com/storage/docs/composite-objects
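One limit that does still apply is that a single compose request accepts at most 32 source objects, so merging many parts takes several rounds of composes. A small planner like this can lay out the calls (the compose_plan helper and the tmp-... intermediate names are made up for illustration; it only plans the requests, it does not talk to GCS):

```python
def compose_plan(parts, final_name, max_sources=32):
    """Yield (sources, destination) compose steps that merge `parts`
    into `final_name`, never exceeding max_sources per call."""
    round_num = 0
    current = list(parts)
    while len(current) > max_sources:
        nxt = []
        for i in range(0, len(current), max_sources):
            group = current[i:i + max_sources]
            if len(group) == 1:
                # a lone leftover needs no compose of its own
                nxt.append(group[0])
                continue
            dest = f"tmp-{round_num}-{i // max_sources}"
            yield group, dest
            nxt.append(dest)
        current = nxt
        round_num += 1
    yield current, final_name
```

For 100 parts this produces four intermediate composes (32 + 32 + 32 + 4 sources) and one final compose of the four intermediates.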

Database and item orders (general)

I'm currently experimenting with a Node.js-based app where I put in a list of books and they get posted to a forum automatically every x minutes.
Now my question is about the order in which these things are posted.
I use MongoDB (not sure if this changes the question or not) and I just add a new entry for every item to be posted. Normally, things are posted in the exact order I add them.
However, for the web interface of this experiment, I built a re-ordering interaction where I can simply drag and drop elements to reorder them.
My question is: how can I reflect this change to the database?
Or more in general terms, how can I order stuff in general, in databases?
For instance, if I drag the 1000th item to 1st position, every entry between 1 and 1000 needs to be updated in the database. That does not seem like a valid and proper solution to me.
Any enlightenment is appreciated.
An elegant way might be lexicographic sorting. Introduce a string attribute for each item. Make the initial length of the values large enough to accommodate the estimated number of items. E.g., if you expect 1000 items, let the keys be baa, bab, bac, ... bba, bbb, bbc, ...
Then, when an item is moved between two other items, assign the moved item a sorting value that is roughly equidistant (lexicographically) from those neighbours. So to move an item between dei and dej, give it the value deim. To move an item between fadd and fado, give it the value fadi.
Keys starting with a were deliberately left unused initially, to leave room for elements that get dragged before the first one. Never use the key a itself, as it would then be impossible to move an element before that one.
Of course, the characters used may vary according to the sort order provided by the database.
This solution should work fine as long as elements are not reordered extremely frequently. In the worst case this can lead to longer and longer attribute values, but if the movements are somewhat evenly distributed, the length of the values should stay reasonable.
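The "equidistant key" step can be sketched as follows. This is a toy helper, assuming lowercase a-z keys and that a key strictly between the two neighbours exists (i.e. the upper key is not just the lower key plus trailing a's); it treats keys as base-26 fractions and takes their midpoint:

```python
from fractions import Fraction

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def _to_frac(s):
    # interpret a key as a base-26 fraction in [0, 1), 'a' = 0
    return sum(Fraction(ALPHABET.index(ch), 26 ** i)
               for i, ch in enumerate(s, start=1))

def _to_key(f, length):
    # emit the first `length` base-26 digits of the fraction
    out = []
    for _ in range(length):
        f *= 26
        digit = int(f)
        out.append(ALPHABET[digit])
        f -= digit
    return "".join(out)

def key_between(a, b):
    """Return a key strictly between a and b (requires a < b)."""
    mid = (_to_frac(a) + _to_frac(b)) / 2
    length = 1
    while True:
        candidate = _to_key(mid, length)
        if a < candidate < b:
            return candidate
        length += 1
```

For example, key_between('fadd', 'fado') yields 'fadi', matching the hand-picked value above; when the gap is tight the result simply grows by a character, as in key_between('dei', 'dej').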

Consistent hashing on multiple machines

I've read the article http://n00tc0d3r.blogspot.com/ about the idea of consistent hashing, but I'm confused about how the method works across multiple machines.
The basic process is:
Insert
Hash an input long URL to a single integer;
Locate a server on the ring and store the key--longUrl pair on that server;
Compute the shortened URL using base conversion (from base 10 to base 62) and return it to the user. (How does this step work? On a single machine there is an auto-incremented id to convert into a short URL, but what value do we convert on multiple machines? There is no global auto-incremented id.)
Retrieve
Convert the shortened URL back to the key using base conversion (from base 62 to base 10);
Locate the server containing that key and return the longUrl. (And how can we locate the server containing the key?)
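The base-conversion steps (step 3 of Insert and step 1 of Retrieve) are mechanical; here is a minimal sketch, where the ordering of the 62-symbol alphabet is an arbitrary choice:

```python
import string

# 0-9, a-z, A-Z: 62 symbols
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode62(n: int) -> str:
    """Convert a non-negative integer key to a base-62 token."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))

def decode62(s: str) -> int:
    """Convert a base-62 short-URL token back to the integer key."""
    n = 0
    for ch in s:
        n = n * 62 + BASE62.index(ch)
    return n
```

Retrieval is just decode62 on the token, which recovers the exact key that was stored at insert time.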
I don't see any clear answer on that page for how the author intended it; I think this is basically an exercise for the reader. Here are some ideas:
Implement it as described, with hash-table-style collision resolution. That is, when creating the URL, if the key already maps to something, deal with that in some way. Rehashing or an arithmetic transformation (e.g., add 1) are both possibilities. Naively, this means a theoretical worst case of hitting a server n times while trying to find an available key.
There are many ways to take that basic idea and refine it, e.g., search for another available key on the same server by rehashing iteratively until you find one that lands there.
Allow servers to talk to each other, and coordinate on the autoincrement id.
This is probably not a great solution, but it might work well in some situations: give each server (or set of servers) a separate namespace, e.g., the first 16 bits select a server. On creation, choose one at random. Then you just need to figure out how to map that namespace. The namespaces only really matter for who is allowed to create which IDs, so adding nodes or rebalancing later is no big deal.
Let me know if you want more elaboration; I think there are a lot of ways this could go. It is annoying that the author didn't elaborate on this point; in my experience with these sorts of algorithms, collision resolution and similar problems tend to be at the very heart of a practical implementation of a distributed system.
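To make the "locate a server on the ring" part concrete, here is a minimal consistent-hash ring with virtual nodes (the MD5-based hash and 100 vnodes per server are arbitrary illustration choices). The key property: dropping a server only remaps the keys that lived on it.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: each server owns many points ('virtual
    nodes') on the ring; a key belongs to the first point at or after
    its own hash, wrapping around."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key):
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Because the surviving servers' virtual nodes hash to the same positions, any key that was not on the removed server keeps its assignment.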

How do I model a queue on top of a key-value store efficiently?

Suppose I have a key-value database and I need to build a queue on top of it. How can I achieve this without bad performance?
One idea might be to store the queue in an array kept under a single fixed key. This is quite simple to implement, but very slow, as the complete array must be loaded and saved on every read or write.
I could also implement a linked list with random keys, plus one fixed key that acts as the entry point to element 1. Depending on whether I prefer fast reads or fast writes, the fixed key could point to the first or the last entry in the queue (so I would have to traverse it forward or backward).
Or, going further, I could keep two fixed pointers: one to the first item, one to the last.
Any other suggestions on how to do this effectively?
A key-value structure is fundamentally similar to plain memory storage, where the physical address plays the role of the key. So any data structure can certainly be modeled on top of key-value storage, including a linked list.
A linked list is a list of nodes, each carrying a reference to its previous or next node. Each node can itself be viewed as a small key-value structure: by adding a prefix to the key, the node's fields can be stored separately in a flat table of key-value pairs.
Going further, a special suffix on the key can remove the need for explicit pointer information altogether. Such a "pretend list" might look something like this:
pilot-last-index: 5
pilot-0: Rei Ayanami
pilot-1: Shinji Ikari
pilot-2: Soryu Asuka Langley
pilot-3: Touji Suzuhara
pilot-5: Makinami Mari
The corresponding algorithm is easy to imagine. If you can run a daemon thread to maintain these keys, pilot-5 could be renamed to pilot-4 in the example above. Even where an extra thread is not allowed, the contents of the queue are unaffected; there is just some overhead for the gap in the sequence.
Which of the two approaches to apply is a trade-off between storage space and CPU time.
Thread safety is a real, but old, problem. Just like the classes implementing the ConcurrentMap interface in the JDK, atomic operations on key-value data are widely provided. Some key-value middleware, such as memcached, offers similar methods that let you update a key or value atomically and thread-safely. But these are algorithmic concerns rather than properties of the key-value structure itself.
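The prefix-per-node idea can be sketched like this, with a plain dict standing in for the store (the head key and node:... prefixes are invented for illustration):

```python
import uuid

def ll_push_front(kv, item):
    """Store a new list node as flat key-value pairs: the node's value
    and next-pointer live under a shared random prefix, and the fixed
    'head' key is repointed at it."""
    node_key = f"node:{uuid.uuid4().hex}"
    kv[f"{node_key}:value"] = item
    kv[f"{node_key}:next"] = kv.get("head")  # old head, or None
    kv["head"] = node_key

def ll_items(kv):
    """Walk the chain from the fixed 'head' key, one lookup per node."""
    cur = kv.get("head")
    while cur is not None:
        yield kv[f"{cur}:value"]
        cur = kv[f"{cur}:next"]
```

Each push or traversal step touches only a couple of keys, instead of rewriting the whole structure as the single-array approach would.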
I think it depends on the kind of queue you want to implement, and no solution will be perfect because a key-value store is not the right data structure for this kind of task. There will be always some kind of hack involved.
For a simple first-in-first-out queue, you could use a few key-value entries like the following:
{
oldestIndex:5,
newestIndex:10
}
In this example there would be 6 items in the queue (5, 6, 7, 8, 9, 10). Items 0 to 4 are already done, and there is no item 11 yet. A producer would increment newestIndex and save its item under the key 11. A consumer would take the item under key 5 and increment oldestIndex.
Note that this approach can lead to problems if you have multiple consumers/producers, and if the queue is never empty you can't reset the indexes.
But the multithreading problem is also true for linked lists etc.
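A minimal sketch of that counter-based queue, with a plain dict standing in for the key-value store (the KVQueue class and key names are made up; here newestIndex points at the next free slot, a slight variation on the layout above, and as noted it is not safe for concurrent producers/consumers unless the store offers atomic increments):

```python
class KVQueue:
    """FIFO queue over a key-value store. Two fixed keys hold the
    head/tail counters; each item lives under its own key, so every
    push or pop touches O(1) entries."""

    def __init__(self, kv, name="q"):
        self.kv, self.name = kv, name
        self.kv.setdefault(f"{name}:oldestIndex", 0)
        self.kv.setdefault(f"{name}:newestIndex", 0)

    def push(self, item):
        i = self.kv[f"{self.name}:newestIndex"]
        self.kv[f"{self.name}:{i}"] = item
        self.kv[f"{self.name}:newestIndex"] = i + 1

    def pop(self):
        i = self.kv[f"{self.name}:oldestIndex"]
        if i == self.kv[f"{self.name}:newestIndex"]:
            raise IndexError("queue empty")
        item = self.kv.pop(f"{self.name}:{i}")
        self.kv[f"{self.name}:oldestIndex"] = i + 1
        return item
```

With a real store, push and pop would each become a couple of get/set calls, and the counters would need the store's atomic-increment primitive to be safe under concurrency.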