Postgresql Query - select json - postgresql

I have a PostgreSQL query whose result I want to save as .json, but only a specific part of the result:
SELECT info FROM d.tests where tag like 'HMIZP'
The result of this query is:
{"blabla":{a lot of blabla}, "Body":[{....
I just want everything from "Body" onward (including "Body" itself).
How can I do it?

You can combine the extraction with building a new JSON object:
SELECT json_build_object('Body',json_extract_path('{"blabla": { "a": "a lot of blabla"},"Body": [{"a": [1,2]}, {"b":2}]}','Body'))
| json_build_object |
| :--------------------------------- |
| {"Body" : [{"a": [1,2]}, {"b":2}]} |
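If the goal is a .json file on disk, the same trimming could also be done on the client after the row has been fetched. The sketch below is only an illustration under assumptions: it assumes the info value arrives as a JSON string, and extract_body and save_body are hypothetical helper names, not part of any library.

```python
import json
from pathlib import Path


def extract_body(info_text: str) -> str:
    """Return a JSON document that keeps only the "Body" key of info_text."""
    info = json.loads(info_text)
    # Mirrors json_build_object('Body', json_extract_path(info, 'Body')).
    return json.dumps({"Body": info.get("Body")})


def save_body(info_text: str, path: str) -> None:
    """Write the trimmed document to path (the file name is up to the caller)."""
    Path(path).write_text(extract_body(info_text), encoding="utf-8")


# Sample value taken from the query in the answer above.
sample = '{"blabla": {"a": "a lot of blabla"}, "Body": [{"a": [1, 2]}, {"b": 2}]}'
print(extract_body(sample))
```

Calling save_body(sample, "body.json") would then write the trimmed document to a file of that (hypothetical) name.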

Related

Postgresql - Filter object array and extract required values in a json object

I have a PostgreSQL table like below:
| data |
| -------------- |
| {"name":"a","tag":[{"type":"country","value":"US"}]} |
| {"name":"b","tag":[{"type":"country","value":"US"},{"type":"country","value":"UK"}]} |
| {"name":"c","tag":[{"type":"gender","value":"male"}]} |
The goal is to extract every value in the "tag" array whose "type" is "country" and aggregate them into a text array. The expected result is as follows:
| result |
| -------------- |
| ["US"] |
| ["US", "UK"] |
| [] |
I've tried to expand the "tag" array and aggregate the desired values back; however, that requires a unique id to group the results by. Hence, I added a row-number column to serve as a unique id. Here is what I've done:
SELECT ROW_NUMBER() OVER () AS id, * INTO data_table_with_id FROM data_table;
SELECT ARRAY_AGG(tag_value) AS result
FROM (
    SELECT
        id,
        json_array_elements("data"::json->'tag')->>'type' AS tag_type,
        json_array_elements("data"::json->'tag')->>'value' AS tag_value
    FROM data_table_with_id
) tags
WHERE tag_type = 'country'
GROUP BY id;
Is it possible to use a single select to filter the object array and get the required results?
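You can do this with a JSON path function (available since PostgreSQL 12; the column must be jsonb, so cast it with data::jsonb if it is stored as json or text):
select jsonb_path_query_array(data, '$.tag[*] ? (@.type == "country").value')
from data_table;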
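The path expression reads as: take every element of tag, keep those whose type equals "country", and return their value. To make those filter semantics explicit, here is a small Python sketch of the same logic run over the three sample rows; country_values is a hypothetical helper name and the rows are copied from the question, so this is an illustration rather than a replacement for the query.

```python
import json


def country_values(data_text: str) -> list:
    """Return the "value" of every tag whose "type" is "country"."""
    doc = json.loads(data_text)
    return [
        tag["value"]
        for tag in doc.get("tag", [])
        if tag.get("type") == "country"
    ]


rows = [
    '{"name":"a","tag":[{"type":"country","value":"US"}]}',
    '{"name":"b","tag":[{"type":"country","value":"US"},{"type":"country","value":"UK"}]}',
    '{"name":"c","tag":[{"type":"gender","value":"male"}]}',
]
results = [country_values(row) for row in rows]
print(results)  # one list per row; rows without a country tag give []
```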

Postgres SQL query, that will group fields in nested JSON objects

I need a SQL query in Postgres that produces a JSON document with grouped/nested data;
see the example below.
Given a table "issues" with the following example data:
+--------------------------------------+-------+------------+-----------------------+
| product_id | level | typology | comment |
+--------------------------------------+-------+------------+-----------------------+
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 1 | electronic | LED broken |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 1 | mechanical | missing gear |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 1 | mechanical | cover damaged |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 2 | electric | switch wrong color |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 2 | mechanical | missing o-ring |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 2 | electric | plug wrong type |
| 3567ae01-c7b3-4cd7-9e4f-85730aab89ee | 1 | mechanical | gear wrong dimensions |
+--------------------------------------+-------+------------+-----------------------+
product_id, typology and comment are strings.
level is an integer.
I want to obtain this JSON:
{
"e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5": {
"1": {
"electronic": [ "LED broken" ],
"mechanical": [ "missing gear", "cover damaged"]
},
"2": {
"electric": [ "switch wrong color", "plug wrong type" ],
"mechanical": [ "missing o-ring" ]
}
},
"3567ae01-c7b3-4cd7-9e4f-85730aab89ee": {
"1": {
"mechanical": [ "gear wrong dimensions"]
}
}
}
So I began writing a query like this:
SELECT array_to_json(array_agg(json_build_object(
    product_id, json_build_object(
        level, json_build_object(
            typology, comment
        )
    )
))) FROM issues
but I couldn't figure out how to group/aggregate to obtain the desired JSON.
SELECT
    jsonb_object_agg(key, value)
FROM (
    SELECT
        jsonb_build_object(product_id, jsonb_object_agg(key, value)) AS products
    FROM (
        SELECT
            product_id,
            jsonb_build_object(level, jsonb_object_agg(key, value)) AS level
        FROM (
            SELECT
                product_id,
                level,
                jsonb_build_object(typology, jsonb_agg(comment)) AS typology
            FROM
                issues
            GROUP BY product_id, level, typology
        ) s,
        jsonb_each(typology)
        GROUP BY product_id, level
    ) s,
    jsonb_each(level)
    GROUP BY product_id
) s,
jsonb_each(products)
jsonb_agg() aggregates several values into one JSON array. This is what the innermost subquery does with the comments.
The next step is more complicated: merging several JSON objects into one object. First you need to expand the elements into a key column and a value column using jsonb_each(). Then you can aggregate these two columns using the aggregate function jsonb_object_agg().
This is why the outer steps look somewhat difficult. Every level of aggregation (level and product_id) needs these steps because you want to merge the elements into single non-array JSON objects.
Because every aggregation needs its own GROUP BY clause, every step is done in its own subquery.
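To check what the three nested aggregation levels are expected to produce, the same grouping can be mirrored on the client. This is only a sketch in plain Python, not the SQL approach itself: the rows are copied from the question's table and group_issues is a hypothetical helper name.

```python
import json
from collections import defaultdict

P1 = "e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5"
P2 = "3567ae01-c7b3-4cd7-9e4f-85730aab89ee"

# (product_id, level, typology, comment), as in the "issues" table.
rows = [
    (P1, 1, "electronic", "LED broken"),
    (P1, 1, "mechanical", "missing gear"),
    (P1, 1, "mechanical", "cover damaged"),
    (P1, 2, "electric", "switch wrong color"),
    (P1, 2, "mechanical", "missing o-ring"),
    (P1, 2, "electric", "plug wrong type"),
    (P2, 1, "mechanical", "gear wrong dimensions"),
]


def group_issues(rows):
    """Nest comments by product_id, then level, then typology."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for product_id, level, typology, comment in rows:
        # JSON object keys are strings, so the integer level is converted.
        nested[product_id][str(level)][typology].append(comment)
    # Convert the defaultdicts to plain dicts before returning.
    return {
        product: {level: dict(types) for level, types in levels.items()}
        for product, levels in nested.items()
    }


grouped = group_issues(rows)
print(json.dumps(grouped, indent=2))
```

Each dictionary level corresponds to one GROUP BY step of the query: typology innermost, then level, then product_id.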

postgreSQL: jsonb traversal

I currently have a table which contains a column with a JSON object representing Twitter cashtags.
For example, this is my original query:
SELECT
DATA->'id' as tweet_id,
DATA->'text' as tweet_text,
DATA->'entities'->'symbols' as cashtags
FROM documents
LIMIT 10
The cashtags column will return something like
[{"text":"HEMP","indices":[0,5]},{"text":"MSEZ","indices":[63,68]}]
How can I traverse this datatype, which is listed as jsonb, in order to, say, return only results where the text is equal to HEMP or MSEZ?
The value data->'entities'->'symbols' is a JSON array. You can unnest the array using the function jsonb_array_elements(), e.g.:
SELECT
data->'id' as tweet_id,
data->'text' as tweet_text,
value as cashtag
FROM documents,
jsonb_array_elements(data->'entities'->'symbols')
where value->>'text' in ('HEMP', 'MSEZ');
tweet_id | tweet_text | cashtag
----------+------------+---------------------------------------
1 | "my_tweet" | {"text": "HEMP", "indices": [0, 5]}
1 | "my_tweet" | {"text": "MSEZ", "indices": [63, 68]}
(2 rows)
or:
SELECT DISTINCT
data->'id' as tweet_id,
data->'text' as tweet_text,
data->'entities'->'symbols' as cashtags
FROM documents,
jsonb_array_elements(data->'entities'->'symbols')
WHERE value->>'text' in ('HEMP', 'MSEZ');
tweet_id | tweet_text | cashtags
----------+------------+------------------------------------------------------------------------------
1 | "my_tweet" | [{"text": "HEMP", "indices": [0, 5]}, {"text": "MSEZ", "indices": [63, 68]}]
(1 row)
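The two queries differ in what a result row represents: the first returns one row per matching array element, the second one row per document that has at least one match. A small Python sketch of that difference follows; the first document matches the sample output above, while the second document (id 2, "AAPL") is made up purely for illustration.

```python
documents = [
    {"id": 1, "text": "my_tweet", "entities": {"symbols": [
        {"text": "HEMP", "indices": [0, 5]},
        {"text": "MSEZ", "indices": [63, 68]},
    ]}},
    # Hypothetical non-matching document, added only for the example.
    {"id": 2, "text": "other_tweet", "entities": {"symbols": [
        {"text": "AAPL", "indices": [0, 5]},
    ]}},
]
wanted = {"HEMP", "MSEZ"}

# Variant 1: one result per matching array element (the unnesting query).
per_element = [
    (doc["id"], symbol)
    for doc in documents
    for symbol in doc["entities"]["symbols"]
    if symbol["text"] in wanted
]

# Variant 2: one result per document with at least one match (the DISTINCT query).
per_document = [
    (doc["id"], doc["entities"]["symbols"])
    for doc in documents
    if any(symbol["text"] in wanted for symbol in doc["entities"]["symbols"])
]

print(len(per_element), len(per_document))
```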

joining with a DISTINCT ON on an ordered subquery in sqlalchemy

Here is (an extremely simplified version of) my problem.
I'm using Postgresql as the backend and trying to build a sqlalchemy query
from another query.
Table setup
Here are the tables with some random data for the example.
You can assume that each table was declared in sqlalchemy declaratively, with
the name of the mappers being respectively Item and ItemVersion.
At the end of the question you can find a link where I put the code for
everything in this question, including the table definitions.
Some items.
item
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
A table containing versions of each item. Each has at least one.
item_version
+----+---------+---------+-----------+
| id | item_id | version | text |
+----+---------+---------+-----------+
| 1 | 1 | 0 | item_1_v0 |
| 2 | 1 | 1 | item_1_v1 |
| 3 | 2 | 0 | item_2_v0 |
| 4 | 3 | 0 | item_3_v0 |
+----+---------+---------+-----------+
The query
Now, for a given sqlalchemy query over Item, I want a function that returns
another query, but this time over (Item, ItemVersion), where the Items are
the same as in the original query (and in the same order!), and where the
ItemVersion are the corresponding latest versions for each Item.
Here is an example in SQL, which is pretty straightforward:
First a random query over the item table
SELECT item.id as item_id
FROM item
WHERE item.id != 2
ORDER BY item.id DESC
which corresponds to
+---------+
| item_id |
+---------+
| 3 |
| 1 |
+---------+
Then from that query, if I want to join the right versions, I can do
SELECT sq2.item_id AS item_id,
       sq2.item_version_id AS item_version_id,
       sq2.item_version_text AS item_version_text
FROM (
    SELECT DISTINCT ON (sq.item_id)
           sq.item_id AS item_id,
           iv.id AS item_version_id,
           iv.text AS item_version_text
    FROM (
        SELECT item.id AS item_id
        FROM item
        WHERE id != 2
        ORDER BY id DESC) AS sq
    JOIN item_version AS iv
      ON iv.item_id = sq.item_id
    ORDER BY sq.item_id, iv.version DESC) AS sq2
ORDER BY sq2.item_id DESC
Note that it has to be wrapped in a subquery a second time because the
DISTINCT ON discards the ordering.
Now the challenge is to write a function that does that in sqlalchemy.
Here is what I have so far.
First the initial sqlalchemy query over the items:
session.query(Item).filter(Item.id != 2).order_by(desc(Item.id))
Then I'm able to build my second query but without the original ordering. In
other words I don't know how to do the second subquery wrapping that I did in
SQL to get back the ordering that was discarded by the DISTINCT ON.
def join_version(session, query):
    sq = aliased(Item, query.subquery('sq'))
    sq2 = session.query(sq, ItemVersion) \
        .distinct(sq.id) \
        .join(ItemVersion) \
        .order_by(sq.id, desc(ItemVersion.version))
    return sq2
I think this SO question could be part of the answer but I'm not quite
sure how.
The code to run everything in this question (database creation, population and
a failing unit test with what I have so far) can be found here. Normally
if you can fix the join_version function, it should make the test pass!
Ok so I found a way. It's a bit of a hack but still only queries the database twice so I guess I will survive! Basically I'm querying the database for the Items first, and then I do another query for the ItemVersions, filtering on item_id, and then reordering with a trick I found here (this is also relevant).
Here is the code:
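from sqlalchemy import desc, text
from sqlalchemy.orm import aliased

def join_version(session, query):
    # First round trip: fetch the items in their original order.
    items = query.all()
    item_ids = [i.id for i in items]
    # Latest version per item via DISTINCT ON (item_id).
    items_v_sq = session.query(ItemVersion) \
        .distinct(ItemVersion.item_id) \
        .filter(ItemVersion.item_id.in_(item_ids)) \
        .order_by(ItemVersion.item_id, desc(ItemVersion.version)) \
        .subquery('sq')
    sq = aliased(ItemVersion, items_v_sq)
    # Second round trip: restore the original order with idx() from the
    # intarray extension; newer SQLAlchemy versions expect raw SQL
    # fragments like this one to be wrapped in text().
    items_v = session.query(sq) \
        .order_by(text('idx(array{}, sq.item_id)'.format(item_ids)))
    return zip(items, items_v)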
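Since the items are already in memory at this point, the pairing and reordering could arguably also be done in Python rather than with idx(). The sketch below shows that idea with plain tuples standing in for ItemVersion rows; latest_versions is a hypothetical helper, and the data is copied from the item_version table in the question.

```python
# Order of the original item query (id != 2, ORDER BY id DESC).
item_ids = [3, 1]

# (id, item_id, version, text), as in the item_version table.
item_versions = [
    (1, 1, 0, "item_1_v0"),
    (2, 1, 1, "item_1_v1"),
    (3, 2, 0, "item_2_v0"),
    (4, 3, 0, "item_3_v0"),
]


def latest_versions(item_ids, item_versions):
    """Pair each item id with its highest-version row, keeping item_ids order."""
    latest = {}
    for row in item_versions:
        _, item_id, version, _ = row
        # Keep the row only if it is the newest version seen so far.
        if item_id not in latest or version > latest[item_id][2]:
            latest[item_id] = row
    return [(item_id, latest[item_id]) for item_id in item_ids]


pairs = latest_versions(item_ids, item_versions)
print(pairs)
```

This keeps the database work to a single DISTINCT ON query over ItemVersion and does not depend on the intarray extension, at the cost of doing the reordering on the client.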

group_by or distinct with postgres/dbix-class

I have a posts table like so:
+-----+----------+------------+------------+
| id | topic_id | text | timestamp |
+-----+----------+------------+------------+
| 789 | 2 | foobar | 1396026357 |
| 790 | 2 | foobar | 1396026358 |
| 791 | 2 | foobar | 1396026359 |
| 792 | 3 | foobar | 1396026360 |
| 793 | 3 | foobar | 1396026361 |
+-----+----------+------------+------------+
How would I go about "grouping" the results by topic id, while pulling the most recent record (sorting by timestamp desc)?
I've come to the understanding that I might not want "group_by" but rather "distinct on". My postgres query looks like this:
select distinct on (topic_id) topic_id, id, text, timestamp
from posts
order by topic_id desc, timestamp desc;
This works great. However, I can't figure out if this is something I can do in DBIx::Class without having to write a custom ResultSource::View. I've tried various arrangements of group_by with selects and columns, and have tried distinct => 1. If/when a result is returned, it doesn't actually preserve the uniqueness.
Is there a way to write the query I am trying through a resultset search, or is there perhaps a better way to achieve the same result through a different type of query?
Check out the section in the DBIC Cookbook on grouping results.
I believe what you want is something along the lines of this though:
my $rs = $base_posts_rs->search(undef, {
    columns  => [ {topic_id=>"topic_id"}, {text=>"text"}, {timestamp=>"timestamp"} ],
    group_by => ["topic_id"],
    order_by => [ {-desc=>"topic_id"}, {-desc=>"timestamp"} ],
});
Edit: A quick and dirty way to get around strict SQL grouping would be something like this:
my $rs = $base_posts_rs->search(undef, {
    columns => [
        { topic_id => \"MAX(topic_id)" },
        { text => \"MAX(text)" },
        { timestamp => \"MAX(timestamp)" },
    ],
    group_by => ["topic_id"],
    order_by => [ {-desc=>"topic_id"}, {-desc=>"timestamp"} ],
});
Of course, use the appropriate aggregate function for your need.
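On the question of a different type of query: one option that needs only GROUP BY and MAX is to join the table against a grouped subquery holding each topic's latest timestamp. The sketch below demonstrates that SQL with Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made so the example runs anywhere); it does not show how to express the join in DBIx::Class, and two posts sharing the same latest timestamp in a topic would both be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER, topic_id INTEGER, text TEXT, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (789, 2, "foobar", 1396026357),
        (790, 2, "foobar", 1396026358),
        (791, 2, "foobar", 1396026359),
        (792, 3, "foobar", 1396026360),
        (793, 3, "foobar", 1396026361),
    ],
)

# Join each post against its topic's maximum timestamp, so every returned
# row is a real post rather than a mix of per-column maxima.
latest = conn.execute(
    """
    SELECT p.topic_id, p.id, p.text, p.timestamp
    FROM posts AS p
    JOIN (
        SELECT topic_id, MAX(timestamp) AS max_ts
        FROM posts
        GROUP BY topic_id
    ) AS m ON m.topic_id = p.topic_id AND m.max_ts = p.timestamp
    ORDER BY p.topic_id DESC
    """
).fetchall()
print(latest)
```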