Group rows but choose to show value of one field in group? - tsql

I have a request which I am not certain I can grant. The desire is to group rows by certain fields but also to show an Item Description field that would break up the grouping, and I cannot guarantee that using min() or max() on the Item Description would return the desired description in all cases.
Note that in the code below I have Item Description commented out; this gives the number of rows and the grouping they desire.
SELECT
--L.ITEMDESC,
SUM(L.UNITPRCE) AS Rate,
SUM(L.XTNDPRCE) AS Price,
L.QUANTITY AS Quantity,
RTRIM(L.UOFM) AS UOFM,
C.COMMENT_1,
C.COMMENT_2,
C.COMMENT_3
FROM
SOP10200 L, SOP10202 C
WHERE
L.SOPNUMBE = C.SOPNUMBE
AND L.SOPTYPE = C.SOPTYPE
AND L.LNITMSEQ = C.LNITMSEQ
AND L.SOPNUMBE = '00644680'
GROUP BY
C.COMMENT_1, C.COMMENT_2, C.COMMENT_3, L.QUANTITY, L.UOFM--, L.ITEMDESC
Results for example transaction:
Rate Price Quantity UOFM COMMENT 1 COMMENT 2 COMMENT 3
0.37891 63.56 167.72421 THERMS CESeq: 52593^Act^TARGET 10/13/2015 10/31/2015
0.34254 30.23 88.27579 THERMS CESeq: 52593^Act^TARGET 11/1/2015 11/10/2015
If I include the Item Description for explanatory purposes, this is the result:
ITEMDESC Rate Price Quantity UOFM COMMENT 1 COMMENT 2 COMMENT 3
Gas on the PG&E System 0.34691 58.19 167.72421 THERMS CESeq: 52593^Act^TARGET 10/13/2015 10/31/2015
PGE SPCC ADDER 0.03200 5.37 167.72421 THERMS CESeq: 52593^Act^TARGET 10/13/2015 10/31/2015
Gas on the PG&E System 0.31054 27.41 88.27579 THERMS CESeq: 52593^Act^TARGET 11/1/2015 11/10/2015
PGE SPCC ADDER 0.03200 2.82 88.27579 THERMS CESeq: 52593^Act^TARGET 11/1/2015 11/10/2015
In this specific scenario, with this data, they have said they want to see Gas on the PG&E System as the description, but I cannot simply use min() or max() to resolve it, because that may not be appropriate for other transactions.
This image shows that, in one of the tables, I may be able to use a Unit Cost not equal to zero as the rule for choosing which line's description to use, but I am struggling to see how to implement that in code. I would like to suggest that approach to the client. How can I leverage Unit Cost <> 0 to choose which Item Description is displayed?

Here is the outline:
select *
from
(
select a, b, c, unitcost -- a, b = the grouping columns, c = the item description
, sum(UNITPRCE) over (partition by a, b) as p1
, sum(XTNDPRCE) over (partition by a, b) as p2
, row_number() over (partition by a, b order by case when unitcost <> 0 then 0 else 1 end) as rn -- the non-zero unit cost row gets rn = 1
from YourTable -- placeholder for the joined SOP tables in the question
) as tt
where tt.rn = 1
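Mapped onto the question's tables, that outline might look something like this (a sketch only, not tested against the data; it assumes UNITCOST is the "Unit Cost" column mentioned in the question):
-- Sketch only: one row per group, keeping the description from the line
-- whose unit cost is non-zero, with the sums taken over the whole group.
SELECT tt.ITEMDESC, tt.Rate, tt.Price, tt.QUANTITY AS Quantity, RTRIM(tt.UOFM) AS UOFM,
tt.COMMENT_1, tt.COMMENT_2, tt.COMMENT_3
FROM
(
SELECT L.ITEMDESC, L.QUANTITY, L.UOFM, C.COMMENT_1, C.COMMENT_2, C.COMMENT_3,
SUM(L.UNITPRCE) OVER (PARTITION BY C.COMMENT_1, C.COMMENT_2, C.COMMENT_3, L.QUANTITY, L.UOFM) AS Rate,
SUM(L.XTNDPRCE) OVER (PARTITION BY C.COMMENT_1, C.COMMENT_2, C.COMMENT_3, L.QUANTITY, L.UOFM) AS Price,
ROW_NUMBER() OVER (PARTITION BY C.COMMENT_1, C.COMMENT_2, C.COMMENT_3, L.QUANTITY, L.UOFM
ORDER BY CASE WHEN L.UNITCOST <> 0 THEN 0 ELSE 1 END) AS rn
FROM SOP10200 L
JOIN SOP10202 C
ON L.SOPNUMBE = C.SOPNUMBE
AND L.SOPTYPE = C.SOPTYPE
AND L.LNITMSEQ = C.LNITMSEQ
WHERE L.SOPNUMBE = '00644680'
) AS tt
WHERE tt.rn = 1;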

Paparazzi's answer is good. Another solution would be to do something like the following:
SELECT
MAX(CASE WHEN L.UNITCOST <> 0 THEN L.ITEMDESC ELSE NULL END) AS ITEMDESC,
SUM(L.UNITPRCE) AS Rate,
SUM(L.XTNDPRCE) AS Price,
L.QUANTITY AS Quantity,
RTRIM(L.UOFM) AS UOFM,
C.COMMENT_1,
C.COMMENT_2,
C.COMMENT_3
FROM
SOP10200 L, SOP10202 C
WHERE
L.SOPNUMBE = C.SOPNUMBE
AND L.SOPTYPE = C.SOPTYPE
AND L.LNITMSEQ = C.LNITMSEQ
AND L.SOPNUMBE = '00644680'
GROUP BY
C.COMMENT_1, C.COMMENT_2, C.COMMENT_3, L.QUANTITY, L.UOFM
If only FIRST_VALUE were an aggregate function instead of an analytic (window) function you could use it directly, but alas we are stuck with these cruddy solutions.
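That said, FIRST_VALUE can still be used as a window function inside a derived table and grouped over afterwards. A hedged sketch (it assumes SQL Server 2012 or later and, as above, that UNITCOST is the "Unit Cost" column from the question):
-- Sketch only: FIRST_VALUE picks the description of the non-zero-cost line
-- within each group; the outer query then groups exactly as before.
SELECT
x.ITEMDESC,
SUM(x.UNITPRCE) AS Rate,
SUM(x.XTNDPRCE) AS Price,
x.QUANTITY AS Quantity,
RTRIM(x.UOFM) AS UOFM,
x.COMMENT_1, x.COMMENT_2, x.COMMENT_3
FROM
(
SELECT
FIRST_VALUE(L.ITEMDESC) OVER (
PARTITION BY C.COMMENT_1, C.COMMENT_2, C.COMMENT_3, L.QUANTITY, L.UOFM
ORDER BY CASE WHEN L.UNITCOST <> 0 THEN 0 ELSE 1 END) AS ITEMDESC,
L.UNITPRCE, L.XTNDPRCE, L.QUANTITY, L.UOFM,
C.COMMENT_1, C.COMMENT_2, C.COMMENT_3
FROM SOP10200 L
JOIN SOP10202 C
ON L.SOPNUMBE = C.SOPNUMBE
AND L.SOPTYPE = C.SOPTYPE
AND L.LNITMSEQ = C.LNITMSEQ
WHERE L.SOPNUMBE = '00644680'
) AS x
GROUP BY x.COMMENT_1, x.COMMENT_2, x.COMMENT_3, x.QUANTITY, x.UOFM, x.ITEMDESC;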

Related

How to keep one record in specific column and make other record value 0 in group by clause in PostgreSQL?

I have a set of data like this:
The result should look like this:
My query:
SELECT max(pi.pi_serial) AS proforma_invoice_id,
max(mo.manufacturing_order_master_id) AS manufacturing_order_master_id,
max(pi.amount_in_local_currency) AS sales_value
FROM proforma_invoice pi
JOIN schema_order_map som ON pi.pi_serial = som.pi_id
LEFT JOIN manufacturing_order_master mo ON som.mo_id = mo.manufacturing_order_master_id
WHERE to_date(pi.proforma_invoice_date, 'DD/MM/YYYY') BETWEEN to_date('01/03/2021', 'DD/MM/YYYY') AND to_date('19/04/2021', 'DD/MM/YYYY')
AND pi.pi_serial in (9221,
9299)
GROUP BY mo.manufacturing_order_master_id,
pi.pi_serial
ORDER BY pi.pi_serial
Option 1: Create a "Running Total" field in Crystal Reports to sum up only one "sales_value" per "proforma_invoice_id".
Option 2: Add a helper column to your Postgresql query like so:
case
when row_number()
over (partition by proforma_invoice_id
order by manufacturing_order_master_id)
= 1
then sales_value
else 0
end
as sales_value
I prepared this SQLFiddle with an example for you (and would of course like to encourage you to do the same for your next db query related question on SO, too :-)
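For context, here is a sketch of how Option 2 might slot into the query from the question (not necessarily the exact SQLFiddle contents; the window function is evaluated after the GROUP BY, so only grouped columns are used in it):
-- Sketch: the row_number() helper keeps sales_value on only one row per
-- proforma invoice and zeroes it on the remaining rows of that invoice.
SELECT max(pi.pi_serial) AS proforma_invoice_id,
max(mo.manufacturing_order_master_id) AS manufacturing_order_master_id,
CASE
WHEN row_number() OVER (PARTITION BY pi.pi_serial
ORDER BY mo.manufacturing_order_master_id) = 1
THEN max(pi.amount_in_local_currency)
ELSE 0
END AS sales_value
FROM proforma_invoice pi
JOIN schema_order_map som ON pi.pi_serial = som.pi_id
LEFT JOIN manufacturing_order_master mo ON som.mo_id = mo.manufacturing_order_master_id
WHERE to_date(pi.proforma_invoice_date, 'DD/MM/YYYY')
BETWEEN to_date('01/03/2021', 'DD/MM/YYYY') AND to_date('19/04/2021', 'DD/MM/YYYY')
AND pi.pi_serial IN (9221, 9299)
GROUP BY mo.manufacturing_order_master_id, pi.pi_serial
ORDER BY pi.pi_serial;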

Is there a more efficient / elegant way to write this code I have?

I'm wondering if anybody can help me out with any or all of this code below. I've made it work, but it seems inefficient to me and is probably quite a bit slower than optimal.
Some basic background on the necessity of this code in the first place:
I have a table of shipping records that does not include the corresponding invoice number. I've looked all through the tables and I continue to do so. In fact, only this morning I discovered that if a packing slip has been generated I can link the shipping table to the packing slip table via the packing slip ID and grab the invoice number from there. Absent that link, however, I'm forced to guess. In most instances that's not terribly difficult, because the invoice table has number, line and release that can match up. But when there are multiple shipments for a number, line and release (for instance, when a line is partially shipped) there can be multiple answers, only one of which is correct. I am partially helped by the presence of a column in the shipping table that states what the date sequence is for that number, line and release, but there are still circumstances where the process I use for "guessing" can be somewhat ambiguous.
What my procedure does is this. First, it creates a table of data that includes the invoice number if there was a pack slip to link it through.
Next, it dumps all of that data into a second table, this time filling in a "guess" about the invoice number (only where the invoice was NULL in the first table) based on partitioning all the shipping records by number, line, release, date sequence and date, doing the same for the invoice table, and trying to line the two up by date.
Finally, it parses through that table, finds any remaining NULLs, and essentially matches them up with the first invoice record for that number, line and release.
Both guesses have added characters to show that they are, in fact, guesses.
IF OBJECT_ID('tempdb..#cosTable') IS NOT NULL
DROP TABLE #cosTable
DECLARE @cosTable2 TABLE (
ID INT IDENTITY
,co_num CoNumType
,co_line CoLineType
,co_release CoReleaseType
,date_seq DateSeqType
,ship_date DateType
,inv_num NVARCHAR(14)
)
DECLARE
@co_num_ck CoNumType
,@co_line_ck CoLineType
,@co_release_ck CoReleaseType
DECLARE @Counter1 INT = 0
SELECT cos.co_num, cos.co_line, cos.co_release, cos.date_seq, cos.ship_date, cos.qty_invoiced, pck.inv_num
INTO #cosTable
FROM co_ship cos
LEFT JOIN pckitem pck
ON cos.pack_num = pck.pack_num
AND cos.co_num = pck.co_num
AND cos.co_line = pck.co_line
AND cos.co_release = pck.co_release
;WITH cos_Order
AS(
SELECT co_num, co_line, co_release, qty_invoiced, date_seq, ship_date, ROW_NUMBER () OVER (PARTITION BY co_num, co_line, co_release ORDER BY ship_date) AS cosrow
FROM co_ship
WHERE qty_invoiced > 0
),
invi_Order
AS(
SELECT inv_num, co_num, co_line, co_release, ROW_NUMBER () OVER (PARTITION BY co_num, co_line, co_release ORDER BY RecordDate) AS invirow
FROM inv_item
WHERE qty_invoiced > 0
),
cos_invi
AS(
SELECT cosO.*, inviO.inv_num
FROM cos_Order cosO
LEFT JOIN invi_Order inviO
ON cosO.co_num = inviO.co_num AND cosO.co_line = inviO.co_line AND cosO.cosrow = inviO.invirow)
INSERT INTO @cosTable2
SELECT cosT.co_num, cosT.co_line, cosT.co_release, cosT.date_seq, cosT.ship_date, COALESCE(cosT.inv_num,'*'+cosi.inv_num) AS inv_num
FROM #cosTable cosT
LEFT JOIN cos_invi cosi
ON cosT.co_num = cosi.co_num
AND cosT.co_line = cosi.co_line
AND cosT.co_release = cosi.co_release
AND cosT.date_seq = cosi.date_seq
AND cosT.ship_date = cosi.ship_date
WHILE @Counter1 < (SELECT MAX(ID) FROM @cosTable2) BEGIN
SET @Counter1 += 1
SET @co_num_ck = (SELECT co_num FROM @cosTable2 WHERE ID = @Counter1)
SET @co_line_ck = (SELECT co_line FROM @cosTable2 WHERE ID = @Counter1)
SET @co_release_ck = (SELECT co_release FROM @cosTable2 WHERE ID = @Counter1)
IF EXISTS (SELECT * FROM @cosTable2 WHERE ID = @Counter1 AND inv_num IS NULL)
UPDATE @cosTable2
SET inv_num = '^' + (SELECT TOP 1 inv_num FROM @cosTable2 WHERE
@co_num_ck = co_num AND
@co_line_ck = co_line AND
@co_release_ck = co_release)
WHERE ID = @Counter1 AND inv_num IS NULL
END
SELECT * FROM @cosTable2
ORDER BY co_num, co_line, co_release, date_seq, ship_date
You're in a bad spot - as @craig.white and @HLGEM suggest, you've inherited something without sufficient constraints to make the data correct or safe...and now you have to "synthesize" it. I get that guesses are the best you can do, and you can, at least, make your guesses reasonable performance-wise.
After that, you should squeal loudly to get some time to fix the db - to apply the constraints needed to prevent further crapification of the data.
Performance-wise, the while loop is a disaster. You'd be better off replacing that whole mess with a single update statement...something like:
update c0
set inv_num = '^' + c1.inv_num
from
@cosTable2 c0
left outer join
(
select
co_num,
co_line,
co_release,
inv_num
from
@cosTable2
where
inv_num is not null
group by
co_num,
co_line,
co_release,
inv_num
) as c1
on
c0.co_num = c1.co_num and
c0.co_line = c1.co_line and
c0.co_release = c1.co_release
where
c0.inv_num is null
...which does the same thing the loop does, only in a single statement.
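One small caveat (my addition, not part of the answer above): if a (co_num, co_line, co_release) combination ever has more than one non-null inv_num, the grouped subquery returns several candidate rows and the update picks one arbitrarily. Taking MIN() mirrors the loop's SELECT TOP 1 and keeps it to a single candidate:
-- Sketch of a variation: MIN() guarantees one candidate invoice per key,
-- matching what the loop's SELECT TOP 1 effectively did.
update c0
set inv_num = '^' + c1.inv_num
from @cosTable2 c0
join (
select co_num, co_line, co_release, MIN(inv_num) as inv_num
from @cosTable2
where inv_num is not null
group by co_num, co_line, co_release
) as c1
on c0.co_num = c1.co_num
and c0.co_line = c1.co_line
and c0.co_release = c1.co_release
where c0.inv_num is null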
It seems to me that you are trying very hard to solve a problem that should not exist. What you describe is an unfortunately common situation where a process has grown organically, without intent or specific direction, as the business has grown, and that has made automated data extraction nearly impossible. You very much need a set of policies and procedures. For a (very crude and simple) example:
1: An order must exist before a packing slip can be generated.
2: A packing slip must exist before an invoice can be generated.
3: An invoice is created using data from the packing slip and the order (what was requested, what was picked, what do we bill).
Again, this is a crude example just to illustrate the idea.
All of the data MUST be entered at the proper time or someone has not done their job.
It is not in the IT department's typical skill set to accurately and consistently provide management with good data when such data does not exist.

PostgreSQL array_agg order for window functions

The answer to my question was almost here: PostgreSQL array_agg order
Except that I wanted to array_agg over a window function:
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name
order by c2.vocabulary_id, c2.concept_name)
over (partition by ca.min_levels_of_separation),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where
c.concept_code = '44054006'
order by min_levels_of_separation;
So, maybe this will work in some future version, but I get this error
ERROR: aggregate ORDER BY is not implemented for window functions
LINE 2: select distinct c.concept_name, array_agg(c2.vocabulary_id||...
^
I should probably be selecting from a subquery like the first answer to the quoted question above suggests. I was hoping for something as simple as the order by (in that question's second answer). Or maybe I'm just being lazy about the query and should be doing a group by instead of select distinct.
I did try putting the order by in the windowing function (over (partition by ca.min_levels_of_separation order by c2.vocabulary_id, c2.concept_name)), but I get these sort of repeated rows that way:
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus"}";1
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus","MedDRA:Diabetes mellitus (incl subtypes)"}";1
"Type 2 diabetes mellitus";"{"MedDRA:Diabetes mellitus","MedDRA:Diabetes mellitus (incl subtypes)","SNOMED:Diabetes mellitus"}";1
(btw: http://www.ohdsi.org/ if you happen to be curious about where I got the medical vocabulary tables)
Yes, it does look like I was being muddle-headed and didn't need the window function. This seems to work:
select c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name
order by c2.vocabulary_id, c2.concept_name),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where c.concept_code = '44054006'
group by c.concept_name, ca.min_levels_of_separation
order by min_levels_of_separation
I won't accept my answer for a while since it just avoids the question instead of actually answering it, and someone might have something more useful to say on the matter.
Like this:
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name ) over (partition by ca.min_levels_of_separation order by c2.vocabulary_id, c2.concept_name),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where
c.concept_code = '44054006'
order by min_levels_of_separation;
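A note beyond either answer above (hedged, and untested against these tables): the growing arrays in the question come from the default window frame, which ends at the current row. Spelling out a frame that covers the whole partition makes every row in a level carry the same, fully ordered array, so distinct collapses the duplicates:
-- Sketch only: an explicit frame makes the window aggregate cover the whole
-- partition, so the array no longer grows row by row.
select distinct c.concept_name,
array_agg(c2.vocabulary_id||':'||c2.concept_name)
over (partition by ca.min_levels_of_separation
order by c2.vocabulary_id, c2.concept_name
rows between unbounded preceding and unbounded following),
ca.min_levels_of_separation
from concept c
join concept_ancestor ca on c.concept_id = ca.descendant_concept_id
and max_levels_of_separation > 0
join concept c2 on ca.ancestor_concept_id = c2.concept_id
where c.concept_code = '44054006'
order by min_levels_of_separation;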

Why does usage of lower() change the order of the result set?

I have a table where I store information about users. The table has the following structure:
CREATE TABLE PERSONS
(
ID NUMBER(20, 0) NOT NULL,
FIRSTNAME VARCHAR2(40),
LASTNAME VARCHAR2(40),
BIRTHDAY DATE,
CONSTRAINT PERSONEN_PK PRIMARY KEY
(ID)
ENABLE
);
After inserting some test data:
SET DEFINE OFF;
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('1','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('2','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('3','Carl','Carlchen',to_date('01.01.12','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('4','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('5','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('6','Carl','Carlchen',to_date('01.01.12','DD.MM.RR'));
I want to select all duplicates of a given user. Let's use "Max Mustermann" for example:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE p.firstname = 'Max'
AND p.lastname = 'Mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.firstname,p.lastname;
This gives me a result like this:
id first last birthday
=================================
1 Max Mustermann 31.10.89
2 Max Mustermann 31.10.89
4 Max Mustermann 31.10.89
5 Max Mustermann 31.10.89
I want to do a case insensitive compare, so I change the query using lower (and trim) like this:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE lower(trim(p.firstname)) = lower(trim('mAx '))
AND lower(trim(p.lastname)) = lower(trim(' musteRmann '))
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.lastname,p.firstname;
Now surprise the order has changed!
id first last birthday
=================================
1 Max Mustermann 31.10.89
5 Max Mustermann 31.10.89
4 Max Mustermann 31.10.89
2 Max Mustermann 31.10.89
Why does the order change just by using lower() (same result when used without trim())? I can get a stable ordering by adding the id column to the ORDER BY. But shouldn't lower() have no effect on the ordering?
Workaround by also using id column for ORDER BY:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE p.firstname = 'Max'
AND p.lastname = 'Mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.firstname,p.lastname,p.id;
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE lower(trim(p.firstname)) = lower(trim('mAx '))
AND lower(trim(p.lastname)) = lower(trim(' musteRmann '))
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.lastname,p.firstname,p.id;
If the values being ordered by are identical, then the DBMS is free to choose any order it sees fit (the same way it is free to choose any order if no order by is specified at all).
Because all values of the columns in the order by are identical, the resulting order is not stable. The only way to get a stable order is to include a unique column as an additional order criterion to break ties - exactly what you did when you added the id column.
Why does the order change, just by using lower()
From a technical point of view, I'd guess that applying lower() changed the execution plan and therefore the access path to the data.
But again (just to make sure): ordering on identical values never guarantees a stable order!
There is no ordering without an order by clause. Sometimes it looks like there might be (group by fooled a lot of people in older releases), but it's only coincidental and must not be relied upon. In your case you're ordering by some columns, but you expect duplicates within that ordering to be further ordered implicitly, which won't happen - or at least cannot be relied on.
In this case Oracle probably happens to be retrieving the rows for your first query in the order you inserted them purely as a side effect of how it's reading data from the blocks, and the order by sorts them within that set without actually changing them (or quite likely it's skipping the order by step internally if it realises it's pointless; the explain plan would tell you that).
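For what it's worth, a quick way to look at that plan (standard Oracle tooling, shown purely as an illustration):
-- Illustration only: capture and display the plan for the lower()/trim() query.
EXPLAIN PLAN FOR
SELECT p.id, p.firstname, p.lastname, p.birthday
FROM persons p
WHERE lower(trim(p.firstname)) = 'max'
AND lower(trim(p.lastname)) = 'mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.lastname, p.firstname;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);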
If you change the order the records are created in:
...
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values
('5','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values
('4','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
...
then the result 'order' changes too:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE p.firstname = 'Max'
AND p.lastname = 'Mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.firstname,p.lastname;
ID FIRSTNAME LASTNAME BIRTHDAY
---------- -------------------- -------------------- ---------
1 Max Mustermann 31-OCT-89
2 Max Mustermann 31-OCT-89
5 Max Mustermann 31-OCT-89
4 Max Mustermann 31-OCT-89
Once you add the function, things change enough for that happy accident to go out of the window, even if the records are inserted in id order (which has no relevance to the DB internally). lower() isn't changing the ordering; you just aren't getting lucky any more.
You cannot expect or rely on an order unless you fully specify it in the order by clause.

T-SQL Query to process data in batches without breaking groups

I am using SQL 2008 and trying to process the data I have in a table in batches, however, there is a catch. The data is broken into groups and, as I do my processing, I have to make sure that a group will always be contained within a batch or, in other words, that the group will never be split across different batches. It's assumed that the batch size will always be much larger than the group size. Here is the setup to illustrate what I mean (the code is using Jeff Moden's data generation logic: http://www.sqlservercentral.com/articles/Data+Generation/87901)
DECLARE @NumberOfRows INT = 1000,
@StartValue INT = 1,
@EndValue INT = 500,
@Range INT
SET @Range = @EndValue - @StartValue + 1
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable;
SELECT TOP (@NumberOfRows)
GroupID = ABS(CHECKSUM(NEWID())) % @Range + @StartValue
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
This will create a table with about 435 groups of records containing between 1 and 7 records in each. Now, let's say I want to process these records in batches of 100 records per batch. How can I make sure that my GroupID's don't get split between different batches? I am fine if each batch is not exactly 100 records, it could be a little more or a little less.
I appreciate any suggestions!
This will result in batches of slightly fewer than 100 entries; it removes any group that isn't entirely within the selection:
WITH cte AS (SELECT TOP 100 * FROM (
SELECT GroupID, ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY GroupID) r
FROM #SomeTestTable) a
ORDER BY GroupID, r DESC)
SELECT c1.GroupID FROM cte c1
JOIN cte c2
ON c1.GroupID = c2.GroupID
AND c2.r = 1
It selects the groups with the lowest GroupIDs, limited to 100 entries, into a common table expression along with a row number; it then uses the row number to throw away any group that isn't entirely in the selection (row number 1 must be in the selection for the group to qualify, since the row numbers are ordered descending before cutting with TOP).
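If the goal is to walk the whole table batch by batch, an alternative (a sketch of my own, not from the answer above, assuming batches of roughly 100 rows) is to assign every group a batch number up front from a running total of group sizes. The correlated subquery keeps it SQL Server 2008-compatible:
-- Sketch only: every row of a group gets the same BatchNo, so no group is
-- ever split; batch sizes hover around 100 rows rather than hitting it exactly.
;WITH GroupSizes AS (
SELECT GroupID, COUNT(*) AS GroupSize
FROM #SomeTestTable
GROUP BY GroupID
),
RunningTotals AS (
SELECT g.GroupID,
(SELECT SUM(g2.GroupSize) FROM GroupSizes g2 WHERE g2.GroupID <= g.GroupID) AS RunningRows
FROM GroupSizes g
)
SELECT t.GroupID,
(rt.RunningRows - 1) / 100 AS BatchNo
FROM #SomeTestTable t
JOIN RunningTotals rt ON rt.GroupID = t.GroupID
ORDER BY BatchNo, t.GroupID;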