group_by or distinct with postgres/dbix-class - postgresql

I have a posts table like so:
+-----+----------+------------+------------+
| id | topic_id | text | timestamp |
+-----+----------+------------+------------+
| 789 | 2 | foobar | 1396026357 |
| 790 | 2 | foobar | 1396026358 |
| 791 | 2 | foobar | 1396026359 |
| 792 | 3 | foobar | 1396026360 |
| 793 | 3 | foobar | 1396026361 |
+-----+----------+------------+------------+
How would I could about "grouping" the results by topic id, while pulling the most recent record (sorting by timestamp desc)?
I've come to the understanding that I might not want "group_by" but rather "distinct on". My postgres query looks like this:
select distinct on (topic_id) topic_id, id, text, timestamp
from posts
order by topic_id desc, timestamp desc;
This works great. However, I can't figure out if this is something I can do in DBIx::Class without having to write a custom ResultSource::View. I've tried various arrangements of group_by with selects and columns, and have tried distinct => 1. If/when a result is returned, it doesn't actually preserve the uniqueness.
Is there a way to write the query I am trying through a resultset search, or is there perhaps a better way to achieve the same result through a different type of query?

Check out the section in the DBIC Cookbook on grouping results.
I believe what you want is something along the lines of this though:
my $rs = $base_posts_rs->search(undef, {
columns => [ {topic_id=>"topic_id"}, {text=>"text"}, {timestamp=>"timestamp"} ],
group_by => ["topic_id"],
order_by => [ {-desc=>"topic_id"}, {-desc=>"timestamp"} ],
})
Edit: A quick and dirty way to get around strict SQL grouping would be something like this:
my $rs = $base_posts_rs->search(undef, {
columns => [
{ topic_id => \"MAX(topic_id)" },
{ text => \"MAX(text)" },
{ timestamp => \"MAX(timestamp)" },
],
group_by => ["topic_id"],
order_by => [ {-desc=>"topic_id"}, {-desc=>"timestamp"} ],
})
Of course, use the appropriate aggregate function for your need.

Related

Postgresql - Filter object array and extract required values in a json object

I have a PostgreSQL table like below:
| data |
| -------------- |
| {"name":"a","tag":[{"type":"country","value":"US"}]} |
| {"name":"b","tag":[{"type":"country","value":"US"}]}, {"type":"country","value":"UK"}]} |
| {"name":"c","tag":[{"type":"gender","value":"male"}]} |
The goal is to extract all the value in "tag" array with "type" = "country" and aggregate them into a text array. The expected result is as follows:
| result |
| -------------- |
| ["US"] |
| ["US", "UK"] |
| [] |
I've tried to expand the "tag" array and aggregate the desired result back; however, it requires a unique id to group up the results. Hence, I add a column with row number to serve as unique id. Here is what I've done:
SELECT ROW_NUMBER() OVER () AS id, * INTO data_table_with_id FROM data_table;
SELECT ARRAY_AGG(tag_value) AS result
FROM (
SELECT
id,
json_array_elements("data"::json->'tag')->>'type' as tag_type,
json_array_elements("data"::json->'tag')->>'value' as tag_value
FROM data_table_with_id
) tags
WHERE tag_type = 'country'
GROUP BY id;
Is it possible to use a single select to filter the object array and get the required results?
You can do this easily with a JSON path function:
select jsonb_path_query_array(data, '$.tag[*] ?(#.type == "country").value')
from data_table;

Postgres SQL query, that will group fields in nested JSON objects

I need a SQL query in Postgres that produce a JSON with grouped/inherited data,
see example below.
having a table "issues" with following example data:
+--------------------------------------+-------+------------+-----------------------+
| product_id | level | typology | comment |
+--------------------------------------+-------+------------+-----------------------+
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 1 | electronic | LED broken |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 1 | mechanical | missing gear |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 1 | mechanical | cover damaged |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 2 | electric | switch wrong color |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 2 | mechanical | missing o-ring |
| e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5 | 2 | electric | plug wrong type |
| 3567ae01-c7b3-4cd7-9e4f-85730aab89ee | 1 | mechanical | gear wrong dimensions |
+--------------------------------------+-------+------------+-----------------------+
product_id, typology and comment are string.
level is an integer.
I want to obtain this JSON:
{
"e1227f18-0c1f-4ebb-8cbf-a09c74ba14f5": {
"1": {
"electronic": [ "LED broken" ],
"mechanical": [ "missing gear", "cover damaged"]
},
"2": {
"electronic": [ "switch wrong color", "plug wrong type" ],
"mechanical": [ "missing o-ring" ]
}
},
"3567ae01-c7b3-4cd7-9e4f-85730aab89ee": {
"1": {
"mechanical": [ "gear wrong dimensions"]
}
}
}
So I begun to wrote a query like this:
SELECT array_to_json(array_agg(json_build_object(
product_id, json_build_object(
level, json_build_object(
typology, comment
)
)
))) FROM issues
but I didn't realize ho to group/aggregate to obtain the wanted JSON
step-by-step demo:db<>fiddle
SELECT
jsonb_object_agg(key, value)
FROM (
SELECT
jsonb_build_object(product_id, jsonb_object_agg(key, value)) as products
FROM (
SELECT
product_id,
jsonb_build_object(level, jsonb_object_agg(key, value)) AS level
FROM (
SELECT
product_id,
level,
jsonb_build_object(typology, jsonb_agg(comment)) AS typology
FROM
issues
GROUP BY product_id, level, typology
) s,
jsonb_each(typology)
GROUP BY product_id, level
) s,
jsonb_each(level)
GROUP BY product_id
) s,
jsonb_each(products)
jsonb_agg() aggregates some values into one JSON array. This has been done with the comments.
After that there is a more complicated step. To aggregate two different JSON objects into one object, you need to do this:
simplified demo:db<>fiddle
First you need to expand the elements into a key and a value column using jsonb_each(). Now you are able to aggregate these two columns using the aggregate function jsonb_object_agg(). See also
This is why the following steps look somewhat difficult. Every level of aggregation (level and product_id) need these steps because you want to merge the elements into single non-array JSON objects.
Because every single aggregation needs separate GROUP BY clauses, every step is done in its own subquery.

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my datatables in pictorial form:
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination and if the products not sold in that store it won't show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including with product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table

Optimizing MongoDB indexing multiple field with multiple query

I am new to database indexing. My application has the following "find" and "update" queries, searched by single and multiple fields
reference | timestamp | phone | username | key | Address
update x | | | | |
findOne | x | x | | |
find/limit:16 | x | x | x | |
find/limit:11 | x | | | x | x
find/limit:1/sort:-1 | x | x | | x | x
find | x | | | |
1)update({"reference":"f0d3dba-278de4a-79a6cb-1284a5a85cde"}, ……….
2)findOne({"timestamp":"1466595571", "phone":"9112345678900"})
3)find({"timestamp":"1466595571", "phone":"9112345678900", "username":"a0001a"}).limit(16)
4)find({"timestamp":"1466595571", "key":"443447644g5fff", "address":"abc road, mumbai, india"}).limit(11)
5)find({"timestamp":"1466595571", "phone":"9112345678900", "key":"443447644g5fff", "address":"abc road, mumbai, india"}).sort({"_id":-1}).limit(1)
6)find({"timestamp":"1466595571"})
I am creating index
db.coll.createIndex( { "reference": 1 } ) //for 1st, 6th query
db.coll.createIndex( { "timestamp": 1, "phone": 1, "username": 1 } ) //for 2nd, 3rd query
db.coll.createIndex( { "timestamp": 1, "key": 1, "address": 1, phone: 1 } ) //for 4th, 5th query
Is this the correct way?
Please help me
Thank you
I think what you have done looks fine. One way to check if your query is using an index, which index is being used, and whether the index is effective is to use the explain() function alongside your find().
For example:
db.coll.find({"timestamp":"1466595571"}).explain()
will return a json document which details what index (if any) was used. In addition to this you can specify that the explain return "executionStats"
eg.
db.coll.find({"timestamp":"1466595571"}).explain("executionStats")
This will tell you how many index keys were examined to find the result set as well as the execution time and other useful metrics.

Implicit mapping based on column values

I'm trying to do an implicit mapping in EF6 that is based on the values of the various columns. I have a simple table like so...
col1 | col2 | col3
with each of the columns nullable. A dependency would be from a column that has an entry, e.g.
"bla" | "blub" | "qwer"
to any entry that has one or more of the rows at null but the others the same like ...
"bla" | null | "qwer"
null | "bulb" | "qwer"
null | "blub" | null
is that somehow expressible as a "Map" operation or do I have to write a custom select for that (and if so, how)?
modelBuilder.Entity<MyDbType>()
.HasMany(e => e.Dependents)
.WithMany(e => e.DependsOn)
.Map(m => m.???)