PostgreSQL: create fixed-size groups of rows

We have a table of items in which each item has an invoice id. We process this data in chunks based on invoice id (100 invoices at a time). Can you help create a query that assigns a group id to each set of 100 invoices (a chunk)? Here's a logical example of what we want to achieve:
In this scenario, we know in advance that we have 9 rows and 5 invoices. We want to create groups of 2 invoices each, except for the last group.

SELECT n1.*,
       n2.r
FROM dbtable n1
JOIN (
    -- Number the distinct inv_ids in inv_id order and put every 2 in one group;
    -- change the /2 to e.g. /4 (or /100) for a different group size.
    -- The explicit ORDER BY makes the numbering deterministic.
    SELECT inv_id,
           (row_number() OVER (ORDER BY inv_id) - 1) / 2 AS r
    FROM (
        -- Get distinct inv_ids
        SELECT DISTINCT inv_id
        FROM dbtable
    ) n2a
) n2 ON n1.inv_id = n2.inv_id;
This query has the advantage that it selects the correct groups of inv_ids even when the inv_ids are not consecutive.
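As a rough cross-check of the grouping idea, here is a small sketch that runs a single-pass variant in an in-memory SQLite database from Python. It swaps in dense_rank() for the DISTINCT subquery plus row_number(), which is a different formulation, not the answer's query. The item_id column and the sample inv_id values are made up for illustration (9 rows, 5 invoices), and a SQLite build with window functions (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout: only inv_id comes from the question.
conn.execute("CREATE TABLE dbtable (item_id INTEGER PRIMARY KEY, inv_id INTEGER)")
sample_inv_ids = [10, 10, 20, 35, 35, 35, 40, 51, 51]  # 9 rows, 5 invoices
conn.executemany(
    "INSERT INTO dbtable (inv_id) VALUES (?)",
    [(inv_id,) for inv_id in sample_inv_ids],
)

GROUP_SIZE = 2  # would be 100 in the real workload

# dense_rank() gives every distinct inv_id a consecutive number (1, 2, 3, ...),
# so integer division by the group size yields the chunk number.
rows = conn.execute(
    """
    SELECT item_id,
           inv_id,
           (dense_rank() OVER (ORDER BY inv_id) - 1) / ? AS grp
    FROM dbtable
    ORDER BY item_id
    """,
    (GROUP_SIZE,),
).fetchall()

# Map each invoice to its group for easy inspection.
groups = {inv_id: grp for _, inv_id, grp in rows}
print(groups)
```

The same expression should also work in PostgreSQL, since dense_rank() and integer division behave the same way there, but that has not been verified against the original data.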

Related

PG SQL UNION With ORDER BY, LIMIT Performance Optimization

I am trying to execute a query with an ORDER BY clause and a LIMIT clause for performance. Consider the following schema.
ONE
(id, name)
(1 , a)
(2 , b)
(5 , c)
TWO
(id, name)
(3 , d)
(4 , e)
(5 , f)
I want to be able to get a list of people from tables one and two ordered by ID.
The current query I have is as follows.
WITH combined AS (
(SELECT * FROM one ORDER BY id DESC)
UNION ALL
(SELECT * FROM two ORDER BY id DESC)
)
SELECT * FROM combined ORDER BY id LIMIT 5
The output will be:
(id, name)
(1 , a)
(2 , b)
(3 , d)
(4 , e)
(5 , c)
You'll notice that the last row, "c" or "f", changes based on the order of the UNION (one UNION two versus two UNION one). That's not important, as I only care about the ordering by ID.
Unfortunately, this query does a full scan of both tables because of the ORDER BY on "combined". Tables one and two each have billions of rows.
I am looking for a query that reads both tables simultaneously, if possible. That is, rather than scanning all of "one" for the entries I need, it would read both tables in ID order and repeatedly take the lower ID: while the next ID in one table is lower than the next ID in the other, it keeps reading from that table, and it switches to the other table once its ID becomes higher or equal.
The correct order of reading the table, given one UNION two would be a, b, d, e, c/f.
Do you just mean this?
WITH combined AS (
(SELECT * FROM one ORDER BY id LIMIT 5)
UNION ALL
(SELECT * FROM two ORDER BY id LIMIT 5)
)
SELECT * FROM combined ORDER BY id LIMIT 5
That will select the 5 "lowest id" rows from each table (which is the minimum you need to guarantee 5 output rows) and then find the lowest of those.
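To see why taking only the first 5 rows of each side is enough, here is a small Python sketch of the same idea outside the database. It uses the rows from the question's example, merges the two already-sorted lists the way the asker describes (always taking the lower id next), and compares the result with a full sort. The helper name by_id is made up for illustration.

```python
from heapq import merge
from itertools import islice

# Rows from the question's example tables, already sorted by id
# (in a database, an index on id would provide this order).
one = [(1, "a"), (2, "b"), (5, "c")]
two = [(3, "d"), (4, "e"), (5, "f")]

LIMIT = 5


def by_id(row):
    """Sort key: the id column."""
    return row[0]


# Take at most LIMIT rows from each side, merge the two sorted streams
# by id, and stop as soon as LIMIT rows have been produced.
top = list(islice(merge(one[:LIMIT], two[:LIMIT], key=by_id), LIMIT))

# Reference result: sort everything, then cut.
full = sorted(one + two, key=by_id)[:LIMIT]

print(top)
```

No row beyond the fifth of either table can appear in the overall top 5, because at least five rows with a lower or equal id already come before it in its own table.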
Thanks to a_horse_with_no_name's comment on Richard Huxton's answer regarding adding an index, the query runs considerably faster, from indeterminate to under one minute.
In my case, the query was still too slow, and I came across the following solution.
Consider using results from one table to limit results from another table. The following solution, in combination with indexing by id, worked for my tables with billions of rows, but operates on the assumption that table "one" is faster than table "two" to finish the query.
WITH first AS (
    SELECT * FROM one ORDER BY id LIMIT 5
),
filter AS (
    -- Largest id among the rows already found in "one"
    SELECT max(id) AS id FROM first
),
second AS (
    SELECT * FROM two
    WHERE id <= (SELECT filter.id FROM filter)
    ORDER BY id LIMIT 5
),
combined AS (
    (SELECT * FROM first)
    UNION ALL
    (SELECT * FROM second)
)
SELECT * FROM combined ORDER BY id LIMIT 5;
By using the largest ID from the first query as an upper bound, I can limit the range that the database scans for the second query: a row in "two" with a higher ID can never make it into the lowest 5. Note that this bound is only valid when "first" returns all 5 rows; if table "one" has fewer rows, the filter on "two" has to be dropped.

Postgres query filter by non column in table

I have a challenge: filtering a query not by a value that is present in a table, but by a value that is returned by a function.
Consider a table that contains all sales in the database:
id, description, category, price, col1 , ..... col n
I have a function that returns the sales similar to a given one (based on rules and business logic). This function queries all records in the sales table again and validates matches on some fields.
similar_sales (sale_id integer) -> returns an integer[]
Now I need to list all similar sales for each sale in the sales table.
select s.id, similar_sales (s.id)
from sales s
But similar_sales can return null, and I am only interested in sales that have at least one similar sale.
select id, similar
from (
select s.id, similar_sales (s.id) as similar
from sales s
) q
where #similar >= 1 (Pseudocode)
limit x
I can't apply the limit in the subquery because I don't know which sales have similar sales and which don't.
I just want to run the subquery for a small set of rows rather than the entire table, to gain query performance (a pagination strategy).
You can try this:
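-- "similar" is a reserved word in PostgreSQL, so the array column gets its own alias here.
-- cardinality() is used because isempty() applies to range types, not arrays;
-- cardinality(NULL) is NULL, so null results are filtered out as well.
select s.id, t.similar_ids
from sales s
cross join lateral similar_sales (s.id) as t(similar_ids)
where cardinality(t.similar_ids) > 0
limit x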
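The point of combining the filter with a LIMIT and no ORDER BY is that evaluation can stop as soon as enough qualifying rows have been found. The following Python sketch mimics that behaviour with a generator; it illustrates the idea only and is not how PostgreSQL executes the query. The similar_sales function here is a hypothetical stand-in that records how often it is called.

```python
from itertools import islice

calls = []  # records every sale id the stand-in function was asked about


def similar_sales(sale_id):
    """Hypothetical stand-in for the SQL function: returns a list of ids."""
    calls.append(sale_id)
    # Made-up rule: only every third sale has a similar sale.
    return [sale_id - 1] if sale_id % 3 == 0 else []


def sales_with_similar(sale_ids):
    """Lazily yield (sale_id, similar_ids) for sales with at least one match."""
    for sale_id in sale_ids:
        similar = similar_sales(sale_id)
        if similar:
            yield sale_id, similar


# "LIMIT 2": stop pulling rows once two qualifying sales have been produced.
page = list(islice(sales_with_similar(range(1, 101)), 2))

print(page)
print(len(calls))
```

Only the first six of the hundred sales are evaluated here. Whether the database can stop early in the same way depends on the plan it chooses, so it is worth checking with EXPLAIN ANALYZE.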

How to write proper/efficient query

I have a question about the right way of writing the query.
I have an employees table; let's say there are 4 columns: employee_id, department, salary, email.
There are some records without an email address. I'd like to find the most efficient way to write a SQL query using a window function that returns the salary sum per department, divided by the total salary of all those without an email address.
I have 2 solutions; presumably only one of them is the more efficient. Can anyone give any advice about it?
select department, sum(salary) as total
from employees
where email is null
group by 1
option 1
select a.department , a.total/(select sum(salary) from employees where email is null)
from (
select department, sum(salary) as total
from employees
where email is null
group by 1
) a
option 2
select a.department , a.total/sum(a.total) over()
from (
select department, sum(salary) as total
from employees
where email is null
group by 1
) a
I guess that query 2 is more efficient, but is it the right way? And is it valid to leave the OVER clause empty?
Just started using PostgreSQL instead of MySQL 5.6.
Your second query is better.
The first query has to scan employees twice, while the second query only scans the (hopefully smaller) result set of the subquery to calculate the sum.
It is perfectly valid to leave the OVER clause empty. That just means the window covers all result rows, so every row gets the same total (which is what you want).
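A quick way to convince yourself that both options return the same shares is to run them side by side on a few rows. The sketch below does that in an in-memory SQLite database from Python; the sample employees are invented, 1.0 * is added to force floating-point division, and a SQLite build with window functions (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(employee_id INTEGER, department TEXT, salary INTEGER, email TEXT)"
)
# Made-up sample rows; only employee 4 has an email address.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "sales", 100, None),
        (2, "sales", 300, None),
        (3, "eng", 600, None),
        (4, "eng", 900, "d@example.com"),
    ],
)

per_department = """
    SELECT department, SUM(salary) AS total
    FROM employees
    WHERE email IS NULL
    GROUP BY department
"""

# Option 1: scalar subquery, which reads employees a second time.
option1 = dict(conn.execute(f"""
    SELECT a.department,
           1.0 * a.total / (SELECT SUM(salary) FROM employees WHERE email IS NULL)
    FROM ({per_department}) AS a
""").fetchall())

# Option 2: empty OVER (), which sums over the grouped result rows only.
option2 = dict(conn.execute(f"""
    SELECT a.department,
           1.0 * a.total / SUM(a.total) OVER ()
    FROM ({per_department}) AS a
""").fetchall())

print(option1, option2)
```

With this data both options give a share of 0.6 for eng and 0.4 for sales.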

How can I combine two PIVOTs that use different aggregate elements and the same spreading/grouping elements into a single row per ID?

I couldn't find an exact duplicate question, so please point me to one if you know of one.
https://i.stack.imgur.com/Xjmca.jpg
See the screenshot (sorry for link, not enough rep). In the table I have ID, Cat, Awd, and Xmit.
I want a resultset where each row is a distinct ID plus the aggregate Awd and Xmit amounts for each Cat (so four add'l columns per ID).
Currently I'm using two CTEs, one to aggregate each of Awd and Xmit. Both make use of the PIVOT operator, using Cat to spread and ID to group. After each CTE does its thing, I'm INNER JOINing them on ID.
WITH CTE1 (ID, P_Awd, G_Awd) AS (
SELECT ...
FROM Table
PIVOT(SUM(Awd) FOR Cat IN ('P', 'G'),
CTE2 ([same as CTE1 but replace "Awd" with "Xmit"])
SELECT ID, P_Awd, P_Xmit, G_Awd, G_Xmit
FROM CTE1 INNER JOIN CTE2 ON CTE1.ID = CTE2.ID
The output of this (greatly simplified) is two rows per ID, with each row holding the resultset of one CTE or the other.
What am I overlooking? Am I overcomplicating this?
Here is one method via a CROSS APPLY.
Also, this assumes you don't need dynamic SQL.
Example
Select *
From (
Select ID
,B.*
From YourTable A
Cross Apply ( values (cat+'_Awd',Awd)
,(cat+'_Xmit',Xmit)
) B(Item,Value)
) src
Pivot (sum(Value) for Item in ([P_Awd],[P_XMit],[G_Awd],[G_XMit]) ) pvt
Returns (limited set; it is best not to use images for sample data)
ID P_Awd P_XMit G_Awd G_XMit
1 1000 500 1000 0
2 2000 1500 500 500
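For comparison, the same one-row-per-ID result can also be produced without PIVOT, using conditional aggregation (SUM over a CASE expression). This is a different technique from the CROSS APPLY answer. The sketch below runs it in an in-memory SQLite database from Python; the input rows are an assumption, chosen so that they reproduce the output shown above, since the original data exists only as a screenshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Cat TEXT, Awd INTEGER, Xmit INTEGER)")
# Assumed sample rows (the real data is only available as an image).
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "P", 1000, 500),
        (1, "G", 1000, 0),
        (2, "P", 2000, 1500),
        (2, "G", 500, 500),
    ],
)

# One pass over the table: each CASE keeps only the rows of one category,
# and GROUP BY ID collapses them into a single row per ID.
rows = conn.execute(
    """
    SELECT ID,
           SUM(CASE WHEN Cat = 'P' THEN Awd  END) AS P_Awd,
           SUM(CASE WHEN Cat = 'P' THEN Xmit END) AS P_Xmit,
           SUM(CASE WHEN Cat = 'G' THEN Awd  END) AS G_Awd,
           SUM(CASE WHEN Cat = 'G' THEN Xmit END) AS G_Xmit
    FROM YourTable
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
```

The SELECT statement itself uses only standard SQL, so it should carry over to SQL Server unchanged, although it was only run against SQLite here.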

How to get a rowNum-like column in SQLite (iPhone)

I have an SQLite database table like this (not in ascending order).
But I need to retrieve the table in ascending order by name. When I sort it in ascending order, the rowIds come out in jumbled order, as follows.
But I need to retrieve a limited number of contacts, 5 at a time, in ascending order,
like Aaa - Eeee and then Ffff - Jjjjj ...
But setting limits like 0-5, 5-10, ... is not possible using rowids, since they are in jumbled order.
So I need another column (like rowNum in Oracle) that is always in order 1, 2, 3, 4, 5, 6, 7, ..., as follows.
How can I retrieve that column along with the existing columns?
Note: we don't have a ROWNUM-like column in SQLite.
The fake rownum solution is clever, but I am afraid it doesn't scale well (for a complex query you have to join and count, for each row, the number of rows before the current row).
I would consider using create table tmp as select /*your query*/,
because in a create-as-select operation the rowid assigned when inserting
the rows is exactly what the rownum would be (a counter). It is specified by the SQLite doc.
Once the initial query has been inserted, you only need to query the tmp table:
select rowid, /* your columns */ from tmp
order by rowid
You can use offset/limit.
Get the first, 2nd, and 3rd groups of five rows:
select rowid, name from contactinfo order by name limit 0, 5
select rowid, name from contactinfo order by name limit 5, 5
select rowid, name from contactinfo order by name limit 10, 5
Warning: using the above syntax requires SQLite to read through all prior records in sorted order. So to reach the 11th record for statement number 3 above, SQLite needs to read the first 10 records. If you have a large number of records, this can be problematic from a performance standpoint.
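One common way around that cost is to remember the last name of the previous page and ask for the rows after it, instead of using an offset (often called keyset pagination). This is a different technique from the limit/offset statements above. The sketch below compares both in an in-memory SQLite database from Python; the sample names are invented, the helper functions are hypothetical, and names are assumed to be unique (otherwise a tiebreaker column such as rowid would be needed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contactinfo (name TEXT)")
# Made-up names, inserted in jumbled order.
names = ["Ffff", "Aaa", "Jjjjj", "Ccc", "Hhhh", "Bbb", "Eeee", "Iiii", "Dddd", "Gggg"]
conn.executemany("INSERT INTO contactinfo (name) VALUES (?)", [(n,) for n in names])


def page_by_offset(offset, size=5):
    """Offset paging: SQLite has to step over `offset` sorted rows first."""
    cur = conn.execute(
        "SELECT name FROM contactinfo ORDER BY name LIMIT ? OFFSET ?",
        (size, offset),
    )
    return [row[0] for row in cur]


def page_after(last_name, size=5):
    """Keyset paging: start right after the last name already shown."""
    cur = conn.execute(
        "SELECT name FROM contactinfo WHERE name > ? ORDER BY name LIMIT ?",
        (last_name, size),
    )
    return [row[0] for row in cur]


first_page = page_after("")               # every non-empty name sorts after ''
second_page = page_after(first_page[-1])  # continue after the last name seen

print(first_page)
print(second_page)
```

With an index on name (not created in this sketch), the WHERE name > ? form can seek directly to the start of the page rather than reading the earlier rows.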
More info on limit/ offset:
Sqlite Query Optimization (using Limit and Offset)
Sqlite LIMIT / OFFSET query
This is a way of faking a RowNum, hope it helps:
SELECT
(SELECT COUNT(*)
FROM Names AS t2
WHERE t2.name < t1.name
) + (
SELECT COUNT(*)
FROM Names AS t3
WHERE t3.name = t1.name AND t3.id < t1.id
) AS rowNum,
id,
name
FROM Names t1
ORDER BY t1.name ASC