How to take the max value from one column for each value in a second column, in a table with many other columns, in SAS Enterprise Guide?

I have a table in SAS Enterprise Guide like the one below:
COL1 - date
COL2 - numeric
COL1 | COL2 | COL3 | COL4 | COL5
--------- |-------|-------|------|-------
01APR2021 | 11 | XXX | XXX | XXX
01MAY2021 | 5 | XXX | XXX | XXX
01MAY2021 | 25 | XXX | XXX | XXX
01JUN2021 | 10 | XXX | XXX | XXX
... | ... | ... | ... | ...
For each date in COL1, I need to select the max value in COL2.
Moreover, the output must also keep the rest of my columns: COL3, COL4, COL5.
So the result should look like the table below: for date 01MAY2021 in COL1 there are two values in COL2, and 25 > 5, so the row with 25 is kept.
COL1 | COL2 | COL3 | COL4 | COL5
--------- |-------|-------|------|-------
01APR2021 | 11 | XXX | XXX | XXX
01MAY2021 | 25 | XXX | XXX | XXX
01JUN2021 | 10 | XXX | XXX | XXX
... | ... | ... | ... | ...
How can I do that in SAS Enterprise Guide?

You need a GROUP BY COL1 clause to compute MAX(COL2) within each group, plus a HAVING clause that keeps only the rows whose COL2 equals that group maximum. Note that two rows in a group can share the same max, in which case both appear in your result set.
Example:
proc sql;
create table want_table as
select * from have_table
group by COL1
having COL2 = max(COL2)
;
quit;
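The query above relies on SAS PROC SQL remerging the group maximum back onto every row, which is why select * can sit next to GROUP BY. As a rough cross-check outside SAS, the same "rows equal to their group's max" logic can be sketched in Python with the standard-library sqlite3 module, written as an explicit join. The sample rows and the ISO date strings are assumptions made for illustration only; they are not the asker's data.

```python
import sqlite3

# Hypothetical sample rows mirroring the question (dates as ISO strings).
rows = [
    ("2021-04-01", 11, "a"),
    ("2021-05-01", 5, "b"),
    ("2021-05-01", 25, "c"),
    ("2021-06-01", 10, "d"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have_table (COL1 TEXT, COL2 INTEGER, COL3 TEXT)")
conn.executemany("INSERT INTO have_table VALUES (?, ?, ?)", rows)

# Join every row to its group's maximum. Rows tied for the maximum
# would all be returned, just as with the HAVING version.
query = """
    SELECT h.COL1, h.COL2, h.COL3
    FROM have_table AS h
    JOIN (SELECT COL1, MAX(COL2) AS max_col2
          FROM have_table
          GROUP BY COL1) AS g
      ON h.COL1 = g.COL1 AND h.COL2 = g.max_col2
    ORDER BY h.COL1
"""
want = conn.execute(query).fetchall()
print(want)
```

If exactly one row per date is required even when there are ties, an extra tie-breaking rule is needed; neither version picks one for you.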

Parse text data in PostgreSQL

I've got a PostgreSQL database, one table with 2 text columns, stored data like this:
id| col1 | col2 |
------------------------------------------------------------------------------|
1 | value_1, value_2, value_3 | name_1(date_1), name_2(date_2), name_3(date_3)|
2 | value_4, value_5, value_6 | name_4(date_4), name_5(date_5), name_6(date_6)|
I need to parse the rows into a new table like this:
id | col1 | col2 | col3 |
1 | value_1 | name_1 | date_1 |
1 | value_2 | name_2 | date_2 |
...| ... | ... | ... |
2 | value_6 | name_6 | date_6 |
How might I do this?
step-by-step demo:db<>fiddle
SELECT
id,
u_col1 as col1,
col2_matches[1] as col2, -- 5
col2_matches[2] as col3
FROM
mytable,
unnest( -- 3
regexp_split_to_array(col1, ', '), -- 1
regexp_split_to_array(col2, ', ') -- 2
) as u (u_col1, u_col2),
regexp_matches(u_col2, '(.+)\((.+)\)') as col2_matches -- 4
1. Split the data of the first column into an array.
2. Split the data of the second column into an array of the form {a(a), b(b), c(c)}.
3. Expand the elements of both arrays pairwise into their own records.
4. Split each element of the form a(b) into an array of the form {a,b}.
5. Show the required columns. For col2 and col3, take the first and the second array element from step 4.
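To see what each of the five steps produces, they can be mirrored in plain Python. This is only an illustrative sketch, not part of the PostgreSQL solution: the helper name split_row is hypothetical, and it assumes both columns contain the same number of comma-separated items.

```python
import re

def split_row(row_id, col1, col2):
    """Mirror the five numbered steps for a single input row."""
    values = col1.split(", ")                # step 1: first column -> list
    named = col2.split(", ")                 # step 2: second column -> list of "a(b)"
    out = []
    for value, item in zip(values, named):   # step 3: pair elements into records
        match = re.search(r"(.+)\((.+)\)", item)  # step 4: "a(b)" -> (a, b)
        if match is None:
            continue  # like the SQL, a non-matching element yields no row
        # step 5: emit the required columns
        out.append((row_id, value, match.group(1), match.group(2)))
    return out

parsed = split_row(
    1,
    "value_1, value_2, value_3",
    "name_1(date_1), name_2(date_2), name_3(date_3)",
)
for record in parsed:
    print(record)
```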

How can I write a function with two tables inputs and one table output in PostgreSQL?

I want to create a function that can create a table, in which part of the columns is derived from the other two tables.
input table1:
This is a static table for each loan. Each loan has only one row with information related to that loan. For example, original unpaid balance, original interest rate...
| id | loan_age | ori_upb | ori_rate | ltv |
| --- | -------- | ------- | -------- | --- |
| 1 | 360 | 1500 | 4.5 | 0.6 |
| 2 | 360 | 2000 | 3.8 | 0.5 |
input table2:
This is a dynamic table for each loan. Each loan has several rows showing the loan performance in each month. For example, current unpaid balance, current interest rate, delinquency status...
| id | month| cur_upb | cur_rate |status|
| ---| --- | ------- | -------- | --- |
| 1 | 01 | 1400 | 4.5 | 0 |
| 1 | 02 | 1300 | 4.5 | 0 |
| 1 | 03 | 1200 | 4.5 | 1 |
| 2 | 01 | 2000 | 3.8 | 0 |
| 2 | 02 | 1900 | 3.8 | 0 |
| 2 | 03 | 1900 | 3.8 | 1 |
| 2 | 04 | 1900 | 3.8 | 2 |
output table:
The output table contains information from table1 and table2. Payoffupb is the last record of cur_upb in table2. This table is built for model development.
| id | loan_age | ori_upb | ori_rate | ltv | payoffmonth| payoffupb | payoffrate |lastStatus | modification |
| ---| -------- | ------- | -------- | --- | ---------- | --------- | ---------- |---------- | ------------ |
| 1 | 360 | 1500 | 4.5 | 0.6 | 03 | 1200 | 4.5 | 1 | null |
| 2 | 360 | 2000 | 3.8 | 0.5 | 04 | 1900 | 3.8 | 2 | null |
Most columns in the output table can be taken directly or derived from columns in the two input tables, but some columns cannot be derived and are left blank.
My main question is: how do I write a function that takes two tables as inputs and outputs another table?
I already wrote the feature transformation part for data files in 2018, but I need to do the same thing again for data files in some other years. That's why I want to create a function to make things easier.
Since you want to insert the latest entry of table2 for each entry of table1, try this:
insert into table3 (id, loan_age, ori_upb, ori_rate, ltv,
payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id)
t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv, t2.month, t2.cur_upb,
t2.cur_rate, t2.status
from
table1 t1
inner join
table2 t2 on t1.id=t2.id
order by t1.id , t2.month desc
DEMO1
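DISTINCT ON is specific to PostgreSQL. To check the "latest row per id" logic against the sample data from the question without a Postgres server, here is a small sketch using Python's standard-library sqlite3 module. It swaps DISTINCT ON for a correlated MAX(month) subquery, which gives the same result only under the assumption that each id has at most one row per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, loan_age INTEGER, ori_upb INTEGER,
                         ori_rate REAL, ltv REAL);
    CREATE TABLE table2 (id INTEGER, month TEXT, cur_upb INTEGER,
                         cur_rate REAL, status INTEGER);
    INSERT INTO table1 VALUES (1, 360, 1500, 4.5, 0.6), (2, 360, 2000, 3.8, 0.5);
    INSERT INTO table2 VALUES
        (1, '01', 1400, 4.5, 0), (1, '02', 1300, 4.5, 0), (1, '03', 1200, 4.5, 1),
        (2, '01', 2000, 3.8, 0), (2, '02', 1900, 3.8, 0), (2, '03', 1900, 3.8, 1),
        (2, '04', 1900, 3.8, 2);
""")

# For each loan, keep only the table2 row with the greatest month.
query = """
    SELECT t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv,
           t2.month, t2.cur_upb, t2.cur_rate, t2.status
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.id = t1.id
    WHERE t2.month = (SELECT MAX(m.month) FROM table2 AS m WHERE m.id = t1.id)
    ORDER BY t1.id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

The month values are compared as zero-padded text here, so '04' sorts after '03'; unpadded month strings would need a numeric cast.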
EDIT for your updated question:
A function to do the above, assuming the structure of table1, table2, and table3 is always identical:
create or replace function insert_values(table1 varchar, table2 varchar, table3 varchar)
returns int as $$
declare
count_ int;
begin
execute format('insert into %I (id, loan_age, ori_upb, ori_rate, ltv, payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id) t1.id, t1.loan_age, t1.ori_upb,
t1.ori_rate,t1.ltv,t2.month,t2.cur_upb, t2.cur_rate, t2.status
from %I t1 inner join %I t2 on t1.id=t2.id order by t1.id , t2.month desc',table3,table1,table2);
GET DIAGNOSTICS count_ = ROW_COUNT;
return count_;
end;
$$
language plpgsql
Call the function as shown below; it returns the number of inserted rows:
select * from insert_values('table1','table2','table3');
DEMO2

How can I reverse the content of a table?

I need a general procedure that can reverse the content of a table.
Basically, I have a ragged table (a dimension table for an OLAP DB) built from the top level, and I need to convert it to a bottom-level perspective.
How can I do it?
+-----+------+------+------+------+
| Co1 | Col2 | Col3 | Col4 | Col5 |
+-----+------+------+------+------+
| 1 | 9765 | 1234 | A | |
| 2 | 9765 | 1235 | A | |
| 3 | 9765 | 1235 | | |
| 4 | 9764 | 4567 | 789 | A1 |
| 5 | 9764 | | | |
| 6 | 9764 | 4568 | 3453 | A2 |
+-----+------+------+------+------+
+------+------+------+------+------+
| Co1 | Col2 | Col3 | Col4 | Col5 |
+------+------+------+------+------+
| A | 1234 | 9765 | 1 | |
| A | 1235 | 9765 | 2 | |
| 1235 | 9765 | 3 | | |
| A1 | 789 | 4567 | 9764 | 4 |
| 9764 | 5 | | | |
| A2 | 3453 | 4568 | 9764 | 6 |
+------+------+------+------+------+
If I understand your example tables correctly, you can either insert the columns in reverse order if you are creating a new table with the data, or simply give the fields new names in the query results by providing an alias with the AS keyword.
Both are shown together below to be concise, and because they don't conflict with one another. If you only need to select the data with different column names, remove the INSERT clause. If you are creating a new table, you may omit the aliases.
INSERT INTO table2
( Col5, Col4, Col3, Col2, Col1 )
SELECT
Col1 as Col5
, Col2 as Col4
, Col3
, Col4 as Col2
, Col5 as Col1
FROM table1
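One caveat: the expected output in the question appears to reverse only the filled cells of each row and keep the blanks on the right, which a fixed column swap would not do for rows of different depth. A minimal sketch of that per-row behaviour in Python follows. The helper name reverse_ragged is hypothetical, and it assumes empty cells are represented as empty strings and only occur at the end of a row.

```python
def reverse_ragged(row, empty=""):
    """Reverse the filled cells of a row, keeping the blanks on the right."""
    filled = [cell for cell in row if cell != empty]
    filled.reverse()
    # Pad back to the original width with empty cells.
    return filled + [empty] * (len(row) - len(filled))

# Rows 3 and 4 from the question's first table.
shallow = reverse_ragged(["3", "9765", "1235", "", ""])
deep = reverse_ragged(["4", "9764", "4567", "789", "A1"])
print(shallow)
print(deep)
```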

Postgres DISTINCT ON equivalent in Hibernate Query Language

I need Postgres DISTINCT ON equivalent in HQL. For example consider the following.
SELECT DISTINCT ON (Col2) Col1, Col4 FROM tablename;
on table
Col1 | Col2 | Col3 | Col4
---------------------------------
AA1 | A | 2 | 1
AA2 | A | 4 | 2
BB1 | B | 2 | 3
BB2 | B | 5 | 4
Col2 will not be shown in the result as below
Col1 | Col4
------------
AA1 | 1
BB1 | 3
Can anyone give a solution in HQL? I need to use DISTINCT as it is part of a bigger query.
Sorry but I misread your question:
No, Hibernate does not support a DISTINCT ON query.
Here is a possible duplicate of your question: Postgresql 'select distinct on' in hibernate

T-SQL: Rows to Columns With Count

Let me draw up the table first (there are dozens of columns and dozens of values under Code in reality)
Code | Pat | Col1 | Col2 | Col3
---------------------------------
ABC | 001 | | XX | Q1
ABC | 002 | xx | xx | Q1
ABC | 003 | xx | xxx | Q1
DEF | 004 | xx | xx | Q1
DEF | 005 | xx | xx | Q1
DEF | 006 | xx | xxx | Q1
The resulting table needs to look like
ABC | DEF
---------
2 | 3
3 | 3
Let me try and explain. For each 'Code' column, I would need to count the number of entries in Col1 to ColX where the cell is not null/empty.
So in the example above, Code ABC has a count of 2 in Col1 and a count of 3 in Col2. Similarly for DEF, both have a count of 3.
I've tried lots of things but got to the point where I'm now looking at a blank page again!
ALTERNATIVELY
Code | Col1 | Col2
--------------------
ABC | 2 | 3
DEF | 3 | 3
Please advise
The alternative solution can be reached by using GROUP BY and summing up a calculated number:
SELECT
[Code],
SUM(CASE WHEN ISNULL(Col1, '') = '' THEN 0 ELSE 1 END) as [Col1],
SUM(CASE WHEN ISNULL(Col2, '') = '' THEN 0 ELSE 1 END) as [Col2],
...
FROM T
GROUP by [Code]
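The conditional-sum pattern can be checked against the sample data with Python's standard-library sqlite3 module. This is a sketch under a few assumptions: SQLite's IFNULL stands in for T-SQL's ISNULL, the empty Col1 cell of the first row is stored as NULL, only two of the "dozens" of columns are included, and the output aliases n_col1 and n_col2 are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Code TEXT, Pat TEXT, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        ("ABC", "001", None, "XX", "Q1"),   # empty Col1 stored as NULL
        ("ABC", "002", "xx", "xx", "Q1"),
        ("ABC", "003", "xx", "xxx", "Q1"),
        ("DEF", "004", "xx", "xx", "Q1"),
        ("DEF", "005", "xx", "xx", "Q1"),
        ("DEF", "006", "xx", "xxx", "Q1"),
    ],
)

# Count a cell as 1 when it is neither NULL nor an empty string, then sum per Code.
query = """
    SELECT Code,
           SUM(CASE WHEN IFNULL(Col1, '') = '' THEN 0 ELSE 1 END) AS n_col1,
           SUM(CASE WHEN IFNULL(Col2, '') = '' THEN 0 ELSE 1 END) AS n_col2
    FROM T
    GROUP BY Code
    ORDER BY Code
"""
counts = conn.execute(query).fetchall()
print(counts)
```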