Cherry-pick a commit together with its merge commit without resolving conflicts again

Here is my case:
├── (c0) ── (c1) ── (c2-merge-commit) ── (b0)── (b1-merge-commit)
I wanted to combine c0, c1 and c2 into one commit and have this:
├── (squashed : c0, c1, c2-merge-commit) ── (b0)── (b1-merge-commit)
So I moved HEAD to c2, then back to c0, and squash-merged c2 (now reachable as HEAD@{1} in the reflog):
git merge --squash HEAD@{1}
This part went quite well and I had:
├── (squashed : c0, c1, c2)
Now I need to add the two commits (b0) and (b1-merge-commit). I used cherry-pick, but then I have to resolve the conflicts of b0 again. I don't want to, because those conflicts were already resolved in (b1-merge-commit).
As a workaround, I first squashed b0 and b1 into a new commit, and then cherry-picked that one onto (squashed : c0, c1, c2). But obviously this solution won't scale if I squash 10+ commits back with many merge commits flying around.
TL;DR:
I want to carry two commits onto my branch: one is an actual change (b0) with conflicts, and the other is a merge commit (b1) where those conflicts are already resolved. When I cherry-pick them one by one, git asks me to resolve the conflicts of b0, even though they were already resolved in the merge commit b1. So far I could only manage it by squashing b0 and b1 into one new commit and cherry-picking that onto my branch.

I found two approaches so far. Neither gives exactly what I want, but at least I don't need to resolve the conflicts again.
First one, as already mentioned in the question:
I squashed b0 and b1 on another branch, switched back to the original branch, and cherry-picked the squashed commit:
├── (squashed : c0, c1, c2)-(squashed: b0, b1)
Second one: cherry-pick the merge commit directly. This is faster and simpler:
$> git cherry-pick -m 1 <hash-b1>
The -m 1 option tells cherry-pick to treat the merge commit's first parent as the mainline, so the merge's combined diff (including the conflict resolution) is applied as a single change. The branch now looks like this:
├── (squashed : c0, c1, c2)-(b1-merge-commit)


How to handle concurrent adds on the same key in Last Write Wins map?

I am implementing an LWW map and in my design, all added key-value pairs have timestamps as is expected from LWW. That works for me until the same key is added in two replicas with different values at the same time. I can't understand how to make the merge operation commutative in this scenario.
Example:
Replica1 => add("key1", "value1", "time1")
Replica2 => add("key1", "value2", "time1")
Merge(Replica1, Replica2) # What should be the value of key1 in the resulting map?
Let's see what last write wins means in terms of causality. Say two clients C1 and C2 change the same data D (same key) to values D1 and D2 respectively. If they never talk to each other, you can pick either D1 or D2 as the last value, and both are OK.
But if they do talk to each other, say C1 changed the value to D1 and informed C2, which as a result changed it to D2, then D1 and D2 have a causal dependency: D1 happens before D2. If your system picks D1 as the last value in the merge, you have broken the last-write-wins guarantee.
Now coming to your question: when two clients make two requests in parallel, those requests cannot have a causal dependency, as both were in flight together, so any value you pick is fine, as long as every replica picks the same one deterministically.
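The deterministic tie-break is what makes the merge commutative. A minimal sketch (assuming values are comparable, e.g. strings; the map shape {key: (value, timestamp)} is illustrative, not a fixed API):

```python
# Minimal LWW-map merge sketch: ties on timestamp are broken by a
# deterministic, replica-independent rule (here: the larger value wins),
# so merge stays commutative even for concurrent adds of the same key.

def merge(a, b):
    """Merge two LWW maps of the form {key: (value, timestamp)}."""
    out = dict(a)
    for key, (val, ts) in b.items():
        if key not in out:
            out[key] = (val, ts)
        else:
            cur_val, cur_ts = out[key]
            # Higher timestamp wins; on a tie, compare the values
            # themselves so both replicas pick the same winner.
            if (ts, val) > (cur_ts, cur_val):
                out[key] = (val, ts)
    return out

r1 = {"key1": ("value1", 1)}
r2 = {"key1": ("value2", 1)}

assert merge(r1, r2) == merge(r2, r1)  # commutative
```

Any total order works for the tie-break (largest value, replica ID, hash of the value), as long as all replicas use the same one.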

Decomposing tables where some attributes aren't in any minimal, non-trivial functional dependency

While writing a library to automatically decompose a given table, I came across some special cases where the first steps of Bernstein's synthesis for 3NF (ACM 1976) give me results I don't expect.
Take this simple case:
a b
---
1 1
2 1
1 2
2 2
By my understanding, this is the full set of functional dependencies:
{} -> {}
a -> {}
a -> a
b -> {}
b -> b
ab -> {}
ab -> a
ab -> b
ab -> ab
By eye, we can see that both attributes together form a candidate key, and there's no normalisation to be done. However, suppose we take the full FD set above, and try to apply Bernstein to decompose the table. We expect to get the same table back.
Bernstein has the following first steps:
Eliminate extraneous attributes. We simplify to
{} -> {} (repeated)
a -> a (repeated)
b -> b (repeated)
ab -> ab
Find a non-redundant covering. ab -> ab is redundant by augmentation, so we have
{} -> {}
a -> a
b -> b
I'd say the latter two are redundant as well, by reflexivity. If we keep them, the remaining steps give two non-equivalent keys, which result in two separate relations after applying the rest of Bernstein synthesis. If we don't keep them, there's nothing to work with, so the remaining steps give no tables.
Where is the problem in the above?
This appears to be solved by an addendum to Bernstein's synthesis that I came across in lecture videos from Gary Boeticcher, then at UHCL: if the decomposition does not contain a table with one of the original table's candidate keys, then adding a table with one of those candidate keys makes the decomposition lossless. In this case, after applying Bernstein's synthesis and getting no tables back, we can add a table with both attributes a and b. This gives us back the original table, as expected.
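A quick way to sanity-check the "both attributes together form a candidate key" claim is to compute attribute closures. A sketch (the FD representation as (lhs, rhs) frozenset pairs is my own, not from Bernstein's paper):

```python
# Sketch: attribute closure under a set of FDs, used to confirm that with
# only trivial dependencies, {a, b} is the sole candidate key of the table.

def closure(attrs, fds):
    """Closure of a set of attributes. fds: list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

# After removing everything implied by reflexivity, the cover of the FD
# set above is empty: no non-trivial dependencies remain.
fds = []
all_attrs = frozenset("ab")

assert closure({"a"}, fds) == frozenset("a")  # a alone is not a key
assert closure({"b"}, fds) == frozenset("b")  # neither is b
assert closure({"a", "b"}, fds) == all_attrs  # only {a, b} covers everything
```

With an empty cover, no subset's closure reaches all attributes except {a, b} itself, which matches the by-eye observation that ab is the only candidate key.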

PostgreSQL Index physical layout

I am trying to understand PostgreSQL's physical index layout. What I came to know is that indexes are stored as a set of pages organized as a B-tree data structure. I am trying to understand how vacuuming impacts indexes. Does it help to contain their size?
B-tree indexes are decades-old technology, so a web search will turn up plenty of good detailed descriptions. In a nutshell:
A B-tree is a balanced tree of index pages (8KB in PostgreSQL), that is, every branch of the tree has the same depth.
The tree is usually drawn upside down, the starting (top) node is the root node, and the pages at the bottom are called leaf nodes.
Each level of the tree partitions the search space; the deeper the level, the finer the partitioning, until the individual index entries are reached in the leaf nodes.
Each entry in an index page points to a table entry (in the leaf nodes) or to another index page at the next level.
This is a sketch of an index with depth three, but mind the following:
some nodes are omitted; in reality all leaf nodes are on level 3
in reality there are not three entries (keys) per node, but around 100
┌───────────┐
level 1 (root node) │ 20 75 100 │
└───────────┘
╱ ╱ │ ╲
╱ ╱ │ ╲
╱ ╱ │ ╲
┌───────────┐┌─────┐┌──────────┐┌─────┐
level 2 │ 5 10 15 ││ ... ││ 80 87 95 ││ ... │
└───────────┘└─────┘└──────────┘└─────┘
╱ ╱ │ ╲
╱ ╱ │ ╲
╱ ╱ │ ╲
┌─────┐┌─────┐┌──────────┐┌─────┐
level 3 (leaf nodes) │ ... ││ ... ││ 89 91 92 ││ ... │
└─────┘└─────┘└──────────┘└─────┘
Some notes:
The pointers to the next level are actually in the gaps between the entries, searching in an index is like “drilling down” to the correct leaf page.
Each node is also linked with its siblings to facilitate insertion and deletion of nodes.
When a node is full, it is split into two new nodes. This splitting can recurse upward and even reach the root node. When the root node is split, the depth of the index increases by 1.
In real life, the depth of a B-tree index can hardly exceed 5.
When an index entry is deleted, an empty space remains. There are techniques to consolidate that by joining pages, but this is tricky, and PostgreSQL doesn't do that.
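The "hardly exceeds 5" claim follows from the fan-out arithmetic: with around 100 entries per 8KB page, a tree of depth d can address roughly 100**d entries. A quick sketch (the fan-out of 100 is the rough figure from above, not an exact PostgreSQL constant):

```python
# Rough capacity estimate: with a fan-out of ~100 entries per 8KB page,
# a B-tree of depth d can address about 100**d index entries, which is
# why depth rarely exceeds 5 in practice.

def btree_depth(num_entries, fanout=100):
    """Smallest depth whose capacity covers num_entries."""
    depth = 1
    while fanout ** depth < num_entries:
        depth += 1
    return depth

assert btree_depth(10_000) == 2          # fits in two levels
assert btree_depth(1_000_000) == 3
assert btree_depth(10_000_000_000) == 5  # ten billion entries: still depth 5
```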
Now to your question:
When a table (heap) entry is removed by VACUUM because it is not visible for any active snapshot, the corresponding entry in the index is removed as well. This results in empty space in the index, which can be reused by future index entries.
Empty index pages can be deleted, but the depth of the index will never be reduced. So mass deletion can (after VACUUM has done its job) reduce the index size, but will more likely result in a bloated index with pages that contain only a few keys and lots of empty space.
A certain amount of index bloat (up to more than 50%) is normal, but if unusual usage patterns like mass updates and deletes cause bad index bloat, you'll have to rewrite the index with REINDEX, thereby getting rid of bloat. Unfortunately this operation locks the index, so that all concurrent access is blocked until it is done.

Find distinct tree hierarchy

I have DB with following tables and relations:
Trunk (1..n) Branch (1..n) Twig (0..n) Leaf
Branch, Twig and Leaf have similar columns.
For example Twig has columns:
id, branchID, name, description, key1, key2, key3
I don't care about name and description, but I care about keys 1-3.
My goal is to find distinct tree structure (or hierarchy) and get Trunk IDs that follow same tree configuration.
Same tree configuration means:
each Trunk has the same number of Branches (1 - n)
Branches have the same key values (but can have different names and descriptions)
each Branch has the same number of Twigs (1 - n)
Twigs have the same key values (but can have different names and descriptions)
each Twig has the same number of Leaves (0 - n)
Leaves have the same key values (but can have different names and descriptions)
When I join all the tables (Trunk, Branch, Twig, and left join Leaf) and group them by the relevant key values,
I only get Trunks that share a specific Trunk-to-Leaf combination, but they might differ in another branch (or miss it altogether).
Now I have 500 Trunks, while there might be only 8 different tree configurations.
My expected result would be: TrunkID - TreeConfiguration (1 - 8)
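Since a single GROUP BY can't compare whole subtrees, one approach is to canonicalize each Trunk's tree into a nested, sorted signature of key values (ignoring name and description) and group Trunks by that signature. Sketched here in Python with hypothetical data shapes; the same idea can be applied in SQL by aggregating each level's keys into a sorted string:

```python
from collections import defaultdict

# Sketch: assign each Trunk a canonical signature built from its
# Branch/Twig/Leaf key values, then group Trunks with identical
# signatures. Data shapes are hypothetical: each node is (keys, children).

def signature(branches):
    """branches: list of (keys, twigs); twigs: list of (keys, leaves)."""
    return tuple(sorted(
        (b_keys, tuple(sorted(
            (t_keys, tuple(sorted(leaves)))
            for t_keys, leaves in twigs)))
        for b_keys, twigs in branches))

trunks = {
    1: [(("k1",), [(("t1",), [("l1",)])])],
    2: [(("k1",), [(("t1",), [("l1",)])])],  # same configuration as trunk 1
    3: [(("k2",), [])],                      # different configuration
}

groups = defaultdict(list)
for trunk_id, branches in trunks.items():
    groups[signature(branches)].append(trunk_id)

assert sorted(map(sorted, groups.values())) == [[1, 2], [3]]
```

Each distinct signature then becomes one TreeConfiguration number, giving the expected TrunkID - TreeConfiguration mapping.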

How to merge only some files?

I am trying to merge part of a commit from the default branch (not all files, and parts of other files) to a named branch. I tried graft, but it just takes the whole commit without giving me a chance to choose. How would this be done?
Example:
A---B---C---D
     \
      E---(G)
G does not exist yet. Let's say C and D each added 5 files and modified 5 files. I want G to have 2 of the 5 files added at C, all the modifications to one of the files, and one modification to another file. I would ideally like it to also have something similar from D.
When I selected graft to local..., all I got was the whole C change-set. Same for merge with local...
The unit of merging is a whole changeset, so C and D should have been committed in smaller pieces. You could merge the whole thing now and revert some files, but then you won't be able to merge the rest later: they're considered merged already.
What I'd do is make a branch parallel to C-D, rooted at B in your example, that contains copies of the changes in C and D but splits them into coherent parts. Then you can merge whole changesets from that, and close (or perhaps even delete) the original C-D branch.
     C---D
    /
A---B--C1--D1--C2--D2   (equivalent to C--D)
    \
     E---(G?)
In the above, C1 and C2 together are equivalent to C. While I was at it, I went ahead and reordered the four new changesets (use a history-rewriting tool such as rebase), so that you can then simply merge D1 with E:
     C---D
    /
A---B--C1--D1--C2--D2
    \      \
     E------G
If reordering the new changesets is not an option, you'd have to do some fancy tap-dancing to commit the partial changesets in the order C1, D1, C2, D2; it's probably a lot less trouble to use graft (or transplant) to copy the changes that you're not allowed to merge separately. E.g., in the following you can still merge C1, but then you need a copy of D1 (labeled D1'), since there's no way to merge D1 without pulling C2 along with it.
     C---D
    /
A---B--C1--C2--D1--D2
    \   \
     E--G1--D1'