Is jsonb conversion to text deterministic? - postgresql

This is the short version of an overly long question that sadly attracted no answers.
Is it possible, given two jsonb variables x and y, to have both
1. (x = y) yield true, and
2. (x::text = y::text) yield false
I ask this question because it appears there is no promised order in which a jsonb object will be unpacked into a string. I'd just like to be sure this is the case.
Thanks in advance for feedback!
Edit:
The original question covers the "why" for this question, but the skinny is that I hope to group data in different rows based upon a hash of many columns represented as text, some of which are jsonb.
I don't care which way the object comes in or which way it gets unpacked, but I do care if two jsonb fields which are equivalent as jsonb are not equivalent as text strings.
As it seems I cannot count on text representations to be presented in the same way, I've normalized out the jsonb field to a separate table with the jsonb field set as a unique index.
And if I write more here... this question will approach the length of the one it derives from!
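For concreteness, a minimal sketch of that normalization, with hypothetical table and column names (not from the original question):

-- Hypothetical sketch: the jsonb payload lives in its own table under a unique
-- constraint, and other rows reference it by id, so the grouping hash can use
-- the integer key instead of a text cast of the jsonb value.
CREATE TABLE json_payload (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL UNIQUE
);

CREATE TABLE observations (
    id         bigserial PRIMARY KEY,
    payload_id bigint REFERENCES json_payload (id)
    -- ...the other columns that take part in the grouping hash
);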

Formally the order is not deterministic because of the JSON object definition:
An object is an unordered set of name/value pairs.
Practically it appears that objects are sorted by length of keys and then alphabetically:
with example(col) as (
    values ('{"cc": 1, "ab": 1, "a": 1, "aa": 1, "b": 2, "abc": 1}'::jsonb)
)
select col::text
from example;

                          col
-------------------------------------------------------
 {"a": 1, "b": 2, "aa": 1, "ab": 1, "cc": 1, "abc": 1}
(1 row)
Note that this behavior is undocumented and may change in future releases (though that seems unlikely).
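As a quick sanity check (not a proof), the same object entered with different key order compares equal as jsonb and, in practice, also casts to the same text; the values below are made up for the demo, and the ordering is still not formally promised:

select x = y             as jsonb_equal,
       x::text = y::text as text_equal
from (values ('{"b": 2, "a": 1}'::jsonb,
              '{"a": 1, "b": 2}'::jsonb)) as t(x, y);

 jsonb_equal | text_equal
-------------+------------
 t           | t
(1 row)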

Related

JSONB Data Type Modification in Postgresql

I have a question about modifying a jsonb value in Postgres.
Basic setup:
array => ["1", "2", "3"]
and I have a PostgreSQL table with an id column and a jsonb column named, let's just say, cards.
 id |      cards
----+------------------
  1 | {"1": 3, "4": 2}
That's the data in the table named test.
Question:
How do I convert the cards of id 1 FROM {"1": 3, "4": 2} TO {"1": 4, "4": 2, "2": 1, "3": 1}?
How I expect the changes to occur:
From the array, increment by 1 every element that already exists as a key in the cards jsonb (so {"1": 3} becomes {"1": 4}), and insert the elements that don't exist as keys into the cards jsonb with a value of 1 (so {"1": 4, "4": 2} becomes {"1": 4, "4": 2, "2": 1, "3": 1}), purely through Postgres.
Partial Solution
I asked a senior for support regarding my question and was told this:
Roughly (names may differ): object keys to explode cards, array_elements to explode the array, left join them, do the calculation, re-aggregate the object. There may be a more direct way to do this but the above brute-force approach will work.
So I tried to follow that using the two functions json_each_text() and json_array_elements_text(), but ended up stuck halfway, as I was unable to understand what they meant by left joining the two columns:
SELECT jsonb_each_text(tester_cards) AS each_text,
       jsonb_array_elements_text('[["1", 1], ["2", 1], ["3", 1]]') AS array_elements
FROM tester
WHERE id = 1;
TLDR;
An UPDATE statement that checks whether each key from an array exists in the jsonb data and, accordingly, either increments its value by 1 or inserts the key into the jsonb with a value of 1.
Now it might look like I'm asking to be spoonfed, but I really haven't managed to find any way to solve it, so any assistance would be highly appreciated 🙇
The key insight is that with jsonb_each and jsonb_object_agg you can round-trip a JSON object in a subquery:
SELECT id, (
    SELECT jsonb_object_agg(key, value)
    FROM jsonb_each(cards)
) AS result
FROM test;
(online demo)
Now you can JOIN these key-value pairs against the jsonb_array_elements of your array input. Your colleague was close, but not quite right: it requires a full outer join, not just a left (or right) join to get all the desired object keys for your output, unless one of your inputs is a subset of the other.
SELECT id, (
    SELECT jsonb_object_agg(COALESCE(obj_key, arr_value), …)
    FROM jsonb_array_elements_text('["1", "2", "3"]') AS arr(arr_value)
    FULL OUTER JOIN jsonb_each(cards) AS obj(obj_key, obj_value) ON obj_key = arr_value
) AS result
FROM test;
(online demo)
All that's left now is the actual calculation and the conversion into an UPDATE statement:
UPDATE test
SET cards = (
    SELECT jsonb_object_agg(
        COALESCE(key, arr_value),
        COALESCE(obj_value::int, 0) + (arr_value IS NOT NULL)::int
    )
    FROM jsonb_array_elements_text('["1", "2", "3"]') AS arr(arr_value)
    FULL OUTER JOIN jsonb_each_text(cards) AS obj(key, obj_value) ON key = arr_value
);
(online demo)
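For the sample row from the question, the statement above should leave the row looking like this (jsonb prints keys in its own storage order, not insertion order):

SELECT id, cards FROM test WHERE id = 1;

 id |              cards
----+----------------------------------
  1 | {"1": 4, "2": 1, "3": 1, "4": 2}
(1 row)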

plpgsql jsonb_set for JSON array of objects with nested arrays

Using PostgreSQL 14.0 PL/pgSQL (inside a DO block). Attempting to:
1. query a certain key ('county') in a jsonb array of objects (which in turn has an object + nested arrays) based on a dynamic variable value (named cty.cty_name)
2. retrieve the value and change it
3. update said jsonb to reflect the updated value in (2)
4. after executing (3) on multiple values, create a new table with the above jsonb as a row with one column
Steps (1) and (2) execute properly. But, for the life of me, I can't figure (3) out.
jsonb object (res) -- may have 100s of array items at the root:
[ {"county": "x", "dpa": ["a", "b", "c"]},
{"county": "y", "dpa": ["d", "e", "f"]},
{"county": "z", "dpa": ["h", "i", "j"]},
...
]
code for (1) and (2) above:
execute format('select jsonb_path_query_array(''%s'', ''$[*]?(@.%s=="%s")'')',
               res, 'county', cty.cty_name) into s1;
execute format('select jsonb_array_elements(''%s'')->''%s''', s1, 'dpa') into s2;
s2 := s2 || jsonb_build_array(r1.name);
where say:
cty.cty_name is y (which is created from a select in for loop)
r1.name is m
s2 holds the new value e.g. ["d", "e", "f", "m"]
Now, to execute (3), I need the path to dpa for which the key county matches value y at some index in res. Having tried (and failed miserably at) various permutations of jsonb_path_query with SQL/JSON path, dollar-quoted strings, jsonb_path_query_array with double-quoted hell inside format() queries, and other SO solutions which use idx or idx-1 (but I don't have the JSON in a table), I had to resort to soliciting the Borg collective's wisdom. Help please.
The problem you're running into with the current approach is twofold:
there's no way to "delete" the matching county object in place (via jsonb_set(), etc.)
there's no way to force uniqueness within the JSON document itself (to utilize the ON CONFLICT ... DO UPDATE mechanism) to accomplish the same
When we get to
Now, to execute (3) I need path to dpa for which key county matches value y in some index in res.
instead of updating the existing record in-place, why not just remove the matching record (with now-stale value for "dpa"), re-aggregate what remains (i.e. the non-matching objects), and then append the updated matching object to the aggregated jsonb array, similar to:
SELECT jsonb_agg(a) || jsonb_build_object(
           'county', 'y',
           'dpa', (jsonb_path_query_array(res, '$[*] ? (@.county == "y")') #> '{0,dpa}')
                  || jsonb_build_array('m'))
FROM jsonb_array_elements(res) a
WHERE NOT a @> (jsonb_path_query_array(res, '$[*] ? (@.county == "y")') -> 0);
This gives a single jsonb value back per your specification in (4); you should be able to parameterize this into your EXECUTE invocation as necessary.
A note on output order: if you're looping over the initial "res" array, the objects in the resulting array will end up ordered according to the cursor you're iterating through for "county" (with respect to its "county" values).
This is because a full cycle through said cursor removes each of the old objects and appends the replacement at the end of the resultant array, so defining an ORDER BY clause on that cursor will be important if the order matters.
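For illustration, here is a self-contained sketch of the same query with the sample array inlined as a literal; the county value 'y' and the appended element 'm' are hard-coded stand-ins for cty.cty_name and r1.name:

-- Self-contained sketch: inline the sample res value so the query runs as-is.
WITH input(res) AS (
    VALUES ('[{"county": "x", "dpa": ["a", "b", "c"]},
              {"county": "y", "dpa": ["d", "e", "f"]},
              {"county": "z", "dpa": ["h", "i", "j"]}]'::jsonb)
)
SELECT jsonb_agg(a) || jsonb_build_object(
           'county', 'y',
           'dpa', (jsonb_path_query_array(res, '$[*] ? (@.county == "y")') #> '{0,dpa}')
                  || jsonb_build_array('m'))
FROM input, jsonb_array_elements(res) AS a
WHERE NOT a @> (jsonb_path_query_array(res, '$[*] ? (@.county == "y")') -> 0)
GROUP BY res;
-- Expected: county y ends up appended after the remaining objects,
-- with dpa = ["d", "e", "f", "m"].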

updating postgres jsonb column

I have the below JSON in my table column, which is of type jsonb:
{
    "abc": 1,
    "def": 2
}
I want to remove the "abc" key from it and insert "mno" with some default value. I followed the below approach for it:
UPDATE books SET books_desc = books_desc - 'abc';
UPDATE books SET books_desc = jsonb_set(books_desc, '{mno}', '5');
and it works.
Now I have another table with JSON as below:
{
    "a": {
        "abc": 1,
        "def": 2
    },
    "b": {
        "abc": 1,
        "def": 2
    }
}
In this JSON too, I want to do the same thing: take out "abc" and introduce "mno" with some default value. Please help me achieve this.
The keys "a" and "b" are dynamic and can change. But the values for "a" and "b" will always have same keys but values may change.
I need a generic logic.
Requirement 2:
abc:true should get converted to xyz:1.
abc:false should get converted to xyz:0.
demo:db<>fiddle
Because of the possible variety of your JSON keys it can be complicated to generate a common query: you would need to give jsonb_set() the exact path, which is hard without knowing the actual keys.
A simple workaround is to use the regexp_replace() function on the text representation of the JSON value to replace the relevant objects:
UPDATE my_table
SET my_data =
    regexp_replace(my_data::text, '"abc"\s*:\s*\d+', '"mno":5', 'g')::jsonb;
For added requirement 2, I wrote the below queries based on the already given solution:
UPDATE books
SET book_info =
    regexp_replace(book_info::text, '"abc"\s*:\s*true', '"xyz":1', 'g')::jsonb;

UPDATE books
SET book_info =
    regexp_replace(book_info::text, '"abc"\s*:\s*false', '"xyz":0', 'g')::jsonb;
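As an aside, for the nested case a regex-free approach is also possible. A sketch, assuming the my_table/my_data names above and that every top-level value is itself an object: rebuild each inner object with jsonb_each and jsonb_object_agg, dropping "abc" and adding "mno" with a default value.

-- Sketch (assumes every top-level value is an object): for each outer key,
-- remove "abc" from the inner object and append "mno" with a default of 5.
UPDATE my_table
SET my_data = (
    SELECT jsonb_object_agg(outer_key,
                            (inner_value - 'abc') || jsonb_build_object('mno', 5))
    FROM jsonb_each(my_data) AS t(outer_key, inner_value)
);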

MongoDb: how to create the right (composite) index for data with many searchable fields

UPDATE: I need to add that the point of this question is to allow me to define schemas for Json Rest Stores. The user can search by any one key, or several keys. So, I cannot easily predict what the users will search by -- it could be 1, 2, 5 fields (this is especially true for data-rich fields like people, bookings, etc.)
Imagine that I have an index as such:
{ "item": 1, "location": 1, "stock": 1 }
Following the MongoDb manual on indexes:
MongoDB can use this index to support queries that include:
the item field,
the item field and the location field,
the item field and the location field and the stock field, or
only the item and stock fields; however, this index would be less efficient than an index on only item and stock.
MongoDB cannot use this index to support queries that include:
only the location field,
only the stock field, or
only the location and stock fields.
Now, suppose I have a schema with exactly these fields:
item: String
location: String
stock: String
qty: number
And imagine I want to make sure every query is indeed indexed. I would do:
For item:
item, location, stock, qty
item, location, qty, stock
item, stock, qty, location
item, stock, location, qty
item, qty, location, stock
item, qty, stock, location
For location:
...you know the gist
Now... this seems a little insane. If you have a database where you have TEN searchable fields, this becomes clearly unworkable as the number of indexes grows exponentially.
Am I missing something? My idea was to define a schema, define which fields were searchable, and write a function that makes up all of the needed indexes regardless of what fields were present and what fields weren't. However, I am thinking about it, and... well, I must be missing something.
Am I?
I will try to explain what this means by example. B-tree-based indexes are not something MongoDB-specific; they are a rather common concept.
So when you create an index, you give the database an easier way to find something. The index is stored separately, with a pointer to the location of the original document. This information is ordered, and you can think of it as a binary tree with a really nice property: the search is reduced from O(n) (a linear scan) to O(log(n)), which is much, much faster because each step trims the search space in half (potentially reducing 10^6 comparisons to around 20 lookups). For example, say we have a big collection with documents like {a: some int, b: 'some other things'} and we index it by a. We end up with another data structure which is sorted by a. It looks this way (by this I do not mean that it is another collection; this is just for demonstration):
{a : 1, pointer: to the document with a = 1},   // if a = 1 is the smallest value in the starting collection
...
{a : 999, pointer: to the document with a = 999} // assuming that 999 is the biggest value
So now we are searching for a = 18. Instead of going through all the elements one by one, we take something in the middle; if it is bigger than 18, we divide the lower part in half and check the element there. We continue until we find a = 18, then look at the pointer and, knowing it, extract the original document.
The situation with a compound index is similar (instead of ordering by one field we order by many). For example, you have a collection:
{ "item": 5, "location": 1, "stock": 3, 'a lot of other fields' } // was stored at position 5 on the disk
{ "item": 1, "location": 3, "stock": 1, 'a lot of other fields' } // position 1 on the disk
{ "item": 2, "location": 5, "stock": 7, 'a lot of other fields' } // position 3 on the disk
... huge amount of other data
{ "item": 1, "location": 1, "stock": 1, 'a lot of other fields' } // position 9 on the disk
{ "item": 1, "location": 1, "stock": 2, 'a lot of other fields' } // position 7 on the disk
and want an index { "item": 1, "location": 1, "stock": 1 }. The lookup table would look like this (one more time - this is not another collection, this is just for demonstration):
{ "item": 1, "location": 1, "stock": 1, pointer = 9 }
{ "item": 1, "location": 1, "stock": 2, pointer = 7 }
{ "item": 1, "location": 3, "stock": 1, pointer = 1 }
{ "item": 2, "location": 5, "stock": 7, pointer = 3 }
.. huge amount of other data (but not necessarily here; if item were 1 it would be somewhere next to the other item = 1 entries)
{ "item": 5, "location": 1, "stock": 3, pointer = 5 }
See that here everything is basically sorted by item, then by location, and then by stock.
In the same way as with a single index, we do not need to scan everything. If we have a query which looks for item = 2, location = 5 and stock = 7, we can quickly identify where the documents with item = 2 are, and then in the same way quickly identify which of those have location = 5, and so on.
And now the interesting part. Although we created just one index (it is a compound index, but it is still one index), we can use it to quickly find elements
only by item. Really, all we need is the first step, so there is no point in creating another index {item: 1}, because it is already covered by the compound index.
also only by item and location (we need only 2 steps).
Cool: one index, but it helps us in three different ways. But wait a minute: what if we want to find by item and stock? It looks like we can speed up this query as well: we can find all elements with the specific item in O(log(n)), and... here we have to stop, the magic has finished; we need to iterate through all of them. But that is still pretty good.
But maybe it can help us with other queries. Let's look at a query by location, which at first glance looks like it is already ordered. But if you look at the location column, you see that it is a mess: a 1 near the beginning and another 1 at the end. It cannot help you at all.
I hope this clarifies a few things:
why indexes are good (they reduce the time from O(n) to potentially O(log(n)))
why a compound index can help with some queries even though we have not created an index on that particular field, yet cannot help with some others
which indexes are covered by a compound index
why indexes can harm (they create an additional data structure which has to be maintained)
And this should tell you another valid thing: an index is not a silver bullet. You cannot speed up all your queries, so it sounds silly to think that by creating indexes on all fields EVERYTHING would be super fast.
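Since the explanation above stresses that this is not MongoDB-specific, here is the same leftmost-prefix idea sketched in SQL terms (table, column, and index names are made up for the demo):

-- Compound B-tree index: efficient only for leftmost-prefix conditions.
CREATE TABLE inventory (item text, location text, stock int, qty int);
CREATE INDEX inventory_item_location_stock_idx
    ON inventory (item, location, stock);

-- Can use the index efficiently:
--   WHERE item = 'x'
--   WHERE item = 'x' AND location = 'y'
--   WHERE item = 'x' AND location = 'y' AND stock = 3
-- Cannot use it efficiently:
--   WHERE location = 'y'
--   WHERE stock = 3
EXPLAIN SELECT * FROM inventory WHERE item = 'x' AND location = 'y';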
What are your real query patterns? It's very unlikely that you would need to create all of these possible index combinations. I also doubt that including qty in the index would be of much use. Do you need to search for things where qty == 4 independent of location and item type?
An index doesn't need to identify every single record, it just needs to be specific enough to make any final scan small. Given an item code or a stock value are there really that many locations that you'd also need to index on them?
I suspect in this case an index on item, an index on location and an index on stock would be sufficient to answer most likely queries with sufficient speed. (But we'd need to know more about what these field names mean and what the count and distribution of values is within them.)
Use explain with your queries and you can see how well they are performing. Add indices as necessary, don't create every possible ordering.

Sorting an Array

I have an array containing all string values. The contents of the array are order_id, waypoint_x, waypoint_y. How do I sort it based on order_id and have the results come out like 1, 2, 3, 4... (i.e. as if the field were an integer type rather than a string)?
When I sort using the waypoints as the array is at the moment, the results come out 1, 10, 11, 12...
Regards,
Stephen
If you check the documentation of NSArray, you'll see there are different methods for sorting the array: sortedArrayUsingFunction, sortedArrayUsingSelector, sortedArrayUsingComparator, etc.
There's a nice example of how to use sortedArrayUsingFunction to sort with integer values in the answer to this question: Sorting an array of doubles or CLLocationDistance values on the iPhone