Lag Function to populate missing rows - postgresql

I am trying to come up with a SQL query to get at the data in a Postgres 9.6 database table.
Table Data
I have tried several variations of window functions, but none of them seems to work.
Based on the input column C3, I want to project a fourth column, C4, and the output should look like this:
Final Desired output
How can I accomplish this using SQL? The table can have up to 100 million records.

I was able to get the desired output using the following SQL:
with t2 as (select c1, c3 from Test_table where c3 is not null)
update Test_table t1
set c3 = t2.c3
from t2
where t1.c1 <= t2.c1
and t1.c3 is null;
select
c1
,c2
,C3
,dense_rank() over(order by c3) cr
from Test_table
order by c1;

Use a window function to select the smallest c3 in a window ordered by c1 descending, but sort the whole output by c1 ascending:
select c1, c2, c3, min(c3) over (order by c1 desc) as c4 from t order by c1;
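To make clear what that window computes, here is a minimal Python sketch. It assumes hypothetical sample rows (the table images from the question are not available) and unique c1 values. With ORDER BY c1 DESC and the default frame, each row sees itself plus every row with a larger c1, and min() skips NULLs:

```python
# Hypothetical sample rows (c1, c3); None stands for SQL NULL.
rows = [(1, None), (2, None), (3, 10), (4, None), (5, 20), (6, None)]


def min_over_c1_desc(rows):
    """Emulate min(c3) OVER (ORDER BY c1 DESC), assuming c1 is unique."""
    running = None          # smallest non-NULL c3 seen so far
    c4_by_c1 = {}
    # Walk the rows from the largest c1 down to the smallest.
    for c1, c3 in sorted(rows, key=lambda r: r[0], reverse=True):
        if c3 is not None and (running is None or c3 < running):
            running = c3
        c4_by_c1[c1] = running
    # Final ORDER BY c1 ascending.
    return [(c1, c3, c4_by_c1[c1])
            for c1, c3 in sorted(rows, key=lambda r: r[0])]


result = min_over_c1_desc(rows)
for row in result:
    print(row)
```

Note that this fills each NULL with the next non-NULL value only if c3 grows together with c1; if it does not, the running minimum can pick a value from further down the table.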

I ran the SQL you provided and got this output. I thought the picture described what I really wanted. Hopefully, showing the output of your SQL next to the desired output will help.
SQL output from your query and desired output

Related

informix 14.10 How to "select" returns a specific phrase such as None or blank instead of no result

I have a query like this:
select c1 , ( select d1 from table2 where dt) from table1 where ct
If there is no d1 matching condition dt, I get no result, but I would like a result like this:
--c1 d1
value1 NONE or Blank
value2 NONE or Blank
. .
. .
Can anybody help?

The NVL function can be used to return either of its two arguments depending on whether the first evaluates to NULL. So your example query could be written as:
select c1 , NVL(( select d1 from table2 where dt), "NONE") from table1 where ct
The data types of the two arguments need to be compatible, for example both character or both numeric.
More information can be found at https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1445.htm
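As a rough illustration of the same idea, here is a sketch that runs against SQLite from Python. SQLite has no NVL, so COALESCE stands in for it here. The table contents and the join condition are made up, since the real conditions dt and ct are not shown in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: only 'value1' has a matching row in table2.
conn.executescript("""
    CREATE TABLE table1 (c1 TEXT);
    CREATE TABLE table2 (c1 TEXT, d1 TEXT);
    INSERT INTO table1 VALUES ('value1'), ('value2');
    INSERT INTO table2 VALUES ('value1', 'found');
""")

# A scalar subquery that matches no row evaluates to NULL,
# and COALESCE replaces that NULL with the fallback text.
rows = conn.execute("""
    SELECT t1.c1,
           COALESCE((SELECT t2.d1 FROM table2 t2 WHERE t2.c1 = t1.c1), 'NONE')
    FROM table1 t1
    ORDER BY t1.c1
""").fetchall()

print(rows)
```

The row without a match still comes back, with the fallback text in place of the missing d1.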

Need to split column into rows and columns

I have a table like this:
ID cst
1 string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8,string9, 65
2 string10;-3;string11;string12;56;string13;6;string14;string15;9
etc.
Now I want to split the cst column into 5 columns and multiple rows.
So like this:
ID C1 C2 C3 C4 C5
1 string1 3 string2 string3 34
1 string4 -1 string5 string6 12
1 string7 5 string8 string9 65
2 string10 -3 string11 string12 56
2 string13 6 string14 string15 9
etc.
How can I accomplish this? I am on SQL Server 2017, so I can use the string_split function. The problem with this function is that it produces only one output column...
Preferably I would like to create a UDF that outputs a table. The function would take these input parameters: the string, the separator character, and the number of columns. That way the function can be used dynamically with a varying number of columns.
P.S. The strings can be of variable length, of course.

Try something along these lines:
Hint: There are some "normal" commas in your sample data.
I assumed these were typos and used semicolons instead.
If that assumption is wrong, you can use a general REPLACE() to turn each "," into ";" first.
Create a temp table to simulate your issue
CREATE TABLE #tbl(ID INT, cst VARCHAR(1000));
INSERT INTO #tbl(ID,cst)
VALUES(1,'string1;3;string2;string3;34;string4;-1;string5;string6;12;string7;5;string8;string9; 65')
,(2,'string10;-3;string11;string12;56;string13;6;string14;string15;9');
--The query (for almost any version of SQL-Server, find v2017+ as UPDATE below)
WITH cte AS
(
SELECT t.ID
,B.Nr
,A.Casted.value('(/x[sql:column("B.Nr")]/text())[1]','varchar(max)') AS ValueAtPosition
,(B.Nr-1) % 5 AS Position
,(B.Nr-1)/5 AS GroupingKey
FROM #tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.cst,';','</x><x>') + '</x>' AS XML)) A(Casted)
CROSS APPLY(SELECT TOP(A.Casted.value('count(x)','int')) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) FROM master..spt_values) B(Nr)
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;
The idea in short:
We use APPLY to add your string, cast to XML, to the result set. This helps to split the string ("a;b;c" => <x>a</x><x>b</x><x>c</x>).
We use another APPLY to create a tally on the fly with a computed TOP clause. It returns as many virtual rows as there are elements in the XML.
We use sql:column() to grab each element's value by its position, and some simple maths to create a grouping key and a running position from 0 to 4.
We use GROUP BY together with MAX(CASE...) to place the values in the fitting column (old-fashioned pivot or conditional aggregation).
Hint: If you want this fully generic, with a number of columns not known in advance, you cannot use any kind of function or ad-hoc query. You would instead need some kind of dynamic statement creation together with EXEC within a stored procedure.
To be honest, this might be a case of the XY problem. Such approaches are the wrong idea, at least in almost all situations I can think of.
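The position arithmetic from the list above can be sketched outside SQL. This is only an illustration in Python, not a replacement for the query; the function name split_into_rows and its parameters are made up, loosely following the parameters the question asks for (string, separator, number of columns). The SQL uses (Nr-1) because Nr starts at 1, while the index here starts at 0:

```python
def split_into_rows(text, sep=";", ncols=5):
    """Split text on sep and regroup the pieces into rows of ncols values."""
    parts = [p.strip() for p in text.split(sep)]
    rows = {}
    for index, value in enumerate(parts):
        position = index % ncols    # target column, 0-based (Position)
        group = index // ncols      # target row (GroupingKey)
        rows.setdefault(group, [None] * ncols)[position] = value
    return [tuple(rows[g]) for g in sorted(rows)]


cst = "string10;-3;string11;string12;56;string13;6;string14;string15;9"
table = split_into_rows(cst)
for row in table:
    print(row)
```

A trailing group with fewer than ncols values is padded with None, which mirrors the NULLs that MAX(CASE...) would produce.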
UPDATE for SQL-Server 2017+
You are on v2017, which supports JSON. That is a bit faster for position-safe string splitting. Try this:
SELECT t.ID
,A.*
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
The general idea is the same. We transform the string into a JSON array ("a;b;c" => ["a","b","c"]) and read it with APPLY OPENJSON().
You can perform the same maths at the "key" column and do the rest as above.
Just because it is ready here, this is the full query for v2017+
WITH cte AS
(
SELECT t.ID
,A.[key]+1 AS Nr
,A.[value] AS ValueAtPosition
,A.[key] % 5 AS Position
,A.[key]/5 AS GroupingKey
FROM #tbl t
CROSS APPLY OPENJSON(CONCAT('["',REPLACE(t.cst,';','","'),'"]')) A
)
SELECT ID
,GroupingKey
,MAX(CASE WHEN Position=0 THEN ValueAtPosition END) AS C1
,MAX(CASE WHEN Position=1 THEN ValueAtPosition END) AS C2
,MAX(CASE WHEN Position=2 THEN ValueAtPosition END) AS C3
,MAX(CASE WHEN Position=3 THEN ValueAtPosition END) AS C4
,MAX(CASE WHEN Position=4 THEN ValueAtPosition END) AS C5
FROM cte
GROUP BY ID,GroupingKey
ORDER BY ID,GroupingKey;

The easiest option here honestly might be the following steps:
Write out the current table to a CSV flat file, using semicolon as the separator (which is also the separator inside the current cst column).
Then load the CSV using SQL Server's bulk loading tool, again with semicolon as the column separator. This will yield a table with 16 columns: ID, and then C1 through C15.
Create a new table (ID, C1, C2, C3, C4, C5).
Then populate the new table using:
INSERT INTO newTable (ID, C1, C2, C3, C4, C5)
SELECT ID, C1, C2, C3, C4, C5 FROM loadedTable UNION ALL
SELECT ID, C6, C7, C8, C9, C10 FROM loadedTable UNION ALL
SELECT ID, C11, C12, C13, C14, C15 FROM loadedTable;
While the above suggestion might seem like a lot of work, SQL Server has poor support for regex and complex string splitting operations, especially on earlier versions. Working directly with your current table might be either not possible or more work than the above.

Interbase - combining queries

I have a table with two columns: Question and Answer. I want a query that returns a single column in which each Question is followed by its respective Answer.
For example:
Table "Question_Answer"
Q1, A1
Q2, A2
Q3, A3
Q4, A4
Query output, column name "Question_Answer_Result"
Q1
A1
Q2
A2
Q3
A3
Q4
A4
I tried the following command:
select "Question_Answer_Result"
from (select "Question_Answer"."Question"
from "Question_Answer"
union all
select "Question_Answer"."Answer"
from "Question_Answer"
)
but I receive a message "Unexpected end of command".
What would be the right SQL command?
Thanks.

What are you selecting from the subquery? The select is looking for column "Question_Answer_Result" from the subquery result; no such column exists. The only column is "Question".

You are missing a column reference in the initial select. As alluded to in my first response, the column 'question' is the result of union subquery. You need to select that column by name (or * will suffice) and then rename it.
select question 'question_answer'
from (
select question from question_answer
union all
select answer from question_answer
)
EDIT: this is SQLite, with an ORDER BY to get your exact output
select question 'question_answer'
from (
select question from question_answer
union all
select answer from question_answer
) x
order by substr(question,length(question),1),substr(question,1) desc
Note: without the ORDER BY, the order of the output will not be as you illustrated.
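If you would rather not sort on pieces of the text itself, one alternative is to carry an explicit sort key through the UNION ALL. Below is a minimal sketch using SQLite from Python. It is not Interbase syntax, and it assumes that SQLite's rowid reflects the order in which the pairs were inserted; the column names item, pair and part are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question_answer (question TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO question_answer VALUES (?, ?)",
    [("Q1", "A1"), ("Q2", "A2"), ("Q3", "A3"), ("Q4", "A4")],
)

# 'pair' keeps a question and its answer together,
# 'part' puts the question (0) before the answer (1).
rows = conn.execute("""
    SELECT item
    FROM (
        SELECT question AS item, rowid AS pair, 0 AS part FROM question_answer
        UNION ALL
        SELECT answer, rowid, 1 FROM question_answer
    ) x
    ORDER BY pair, part
""").fetchall()

interleaved = [r[0] for r in rows]
print(interleaved)
```

In a real table you would use whatever key identifies a question/answer pair instead of rowid.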

Can we write a hive query in Spark - UDF

Can we write a Hive query inside a Spark UDF?
For example, I have two tables:
Table A and B
where b1 contains column names of A and b2 contains the value of that column in A.
Now I want to query the tables in such a way that I get the result below:
Result.
Basically, I want to replace the values of the columns in A with values from B, based on the column names and their corresponding values.
To achieve that I wrote a Spark UDF, convert, as below:
def convert(colname: String, colvalue:String)={
sqlContext.sql("SELECT b3 from B where b1 = colname and b2 = colvalue").toString;
}
I registered it as:
sqlContext.udf.register("conv",convert(_:String,_:String));
Now my main query is-
val result = sqlContext.sql("select a1 , conv('a2',a2), conv('a3',a3)");
result.take(2);
It gives me a java.lang.NullPointerException.
Can someone please tell me whether this feature is supported in Spark/Hive?
Any other approach is also welcome.
Thanks!

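No, a UDF does not permit running a query inside it. The UDF body runs on the executors, where the driver's sqlContext is not available, which is why you get the NullPointerException.
You can only pass data in as arguments and transform it to return the final result at row/column level.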

Here is the solution to your question. You can do it in Hive itself.
WITH a_plus_col
AS (SELECT a1
,'a2' AS col_name
,a2 AS col_value
FROM A
UNION ALL
SELECT a1
,'a3' AS col_name
,a3 AS col_value
FROM A)
SELECT a_plus_col.a1 AS r1
,MAX(CASE WHEN a_plus_col.col_name = 'a2' THEN B.b3 END) AS r2
,MAX(CASE WHEN a_plus_col.col_name = 'a3' THEN B.b3 END) AS r3
FROM a_plus_col
INNER JOIN B ON ( a_plus_col.col_name = b1 AND a_plus_col.col_value = b2)
GROUP BY a_plus_col.a1;
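To check the shape of that query without a cluster, here is a small sketch that runs the same SQL against SQLite from Python. The contents of A and B are made up, because the tables from the question are not shown; only the column names a1, a2, a3, b1, b2, b3 come from the post. An ORDER BY is added so the output is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: B maps (column name, column value) to a replacement b3.
conn.executescript("""
    CREATE TABLE A (a1 INTEGER, a2 TEXT, a3 TEXT);
    CREATE TABLE B (b1 TEXT, b2 TEXT, b3 TEXT);
    INSERT INTO A VALUES (1, 'x', 'y'), (2, 'z', 'y');
    INSERT INTO B VALUES ('a2', 'x', 'X-label'),
                         ('a2', 'z', 'Z-label'),
                         ('a3', 'y', 'Y-label');
""")

rows = conn.execute("""
    WITH a_plus_col AS (
        SELECT a1, 'a2' AS col_name, a2 AS col_value FROM A
        UNION ALL
        SELECT a1, 'a3' AS col_name, a3 AS col_value FROM A
    )
    SELECT a_plus_col.a1 AS r1,
           MAX(CASE WHEN a_plus_col.col_name = 'a2' THEN B.b3 END) AS r2,
           MAX(CASE WHEN a_plus_col.col_name = 'a3' THEN B.b3 END) AS r3
    FROM a_plus_col
    INNER JOIN B ON (a_plus_col.col_name = B.b1 AND a_plus_col.col_value = B.b2)
    GROUP BY a_plus_col.a1
    ORDER BY r1
""").fetchall()

print(rows)
```

The CTE first unpivots A into (a1, col_name, col_value) rows so that a plain join against B can do the lookup, and the MAX(CASE...) step pivots the results back into one row per a1.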

Find equal twin record postgresql

I have a table company with 60 columns. The goal is to create a tool to find, compare and eliminate duplicates in this table.
Example: I have a record with id 22 and I know it has a twin because I run this (simplified code):
SELECT min(co_id),co_name,count(*) FROM co
GROUP BY co_name
HAVING count(*) > 1
The result shows there is one twin (count 2), and I get the oldest id via min(co_id).
My question is: how do I search for the twin's co_id by just passing the oldest id?
Something like:
SELECT co_id FROM co
WHERE co_name EQUAL TO co_id='22'
LIMIT 2
Sample data:
id co_name
22 Volvo
23 Volvo
24 Ford
25 Ford
I know id 22 and I want to search for the twin 23 based on the content of 22.
The closest I have found is this, which is far from generic and a nightmare for comparing 60 fields:
SELECT id,
(SELECT max(b.id) from co b
WHERE a.co_name = b.co_name
LIMIT 1) as twin
FROM co a
WHERE id='22'
How do I do this in a more simple and generic way? I just want the twin record co_id.
Thank you in advance!

select max_co,co_name from (
select max(co_id) max_co,min(co_id) min_co,co_name from co
group by co_name having count(*)>1) t where min_co=(your old co id as input);

You can join your table with itself:
SELECT c1.*
FROM
co_name c1 INNER JOIN co_name c2
ON c1.co_name=c2.co_name
AND c1.id>c2.id
this will return all duplicated records (but not the original record with the lowest id). Or since you're using Postgresql you can use a window function:
SELECT *
FROM (
SELECT
id,
co_name,
row_number() OVER (PARTITION by co_name ORDER BY id) as row
FROM
co_name
) s
WHERE
row>1;
Please see an example here.
If you want to compare multiple columns, the JOIN solution would be more flexible. I don't know exactly how you want to compare your columns or how exactly you define "twin" rows, but a query like this should help:
SELECT c1.*
FROM
co_name c1 INNER JOIN co_name c2
ON (
c1.co_name=c2.co_name
OR c1.co_city=c2.co_city
OR c1.co_owner=c2.co_owner
OR ...
) AND c1.id>c2.id
if you just want duplicated records of id=22 then you can try with this:
SELECT c1.*
FROM
co_name c1 INNER JOIN co_name c2
ON c1.co_name=c2.co_name
AND c1.id>c2.id
WHERE
c2.id=22
or if you just want a single twin, comparing 60 columns, you can try with this query:
SELECT MIN(c1.id) as Twin /* or MAX(c1.id), depending what you're after */
FROM
co_name c1 INNER JOIN co_name c2
ON (
c1.co_name=c2.co_name
OR c1.co_city=c2.co_city
OR c1.co_owner=c2.co_owner
OR ...
) AND c1.id>c2.id
WHERE
c2.id=22
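Here is a small runnable sketch of the "twins of a known id" self-join, using SQLite from Python with the sample data from the question. The helper name find_twins is made up, and the sketch compares ids with <> instead of >, an intentional change so that the lookup also works when you start from the newer record:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE co (id INTEGER PRIMARY KEY, co_name TEXT);
    INSERT INTO co VALUES (22, 'Volvo'), (23, 'Volvo'), (24, 'Ford'), (25, 'Ford');
""")


def find_twins(conn, known_id):
    """Return ids of other rows that share co_name with the row known_id."""
    rows = conn.execute(
        """
        SELECT c1.id
        FROM co c1
        INNER JOIN co c2
                ON c1.co_name = c2.co_name
               AND c1.id <> c2.id
        WHERE c2.id = ?
        ORDER BY c1.id
        """,
        (known_id,),
    ).fetchall()
    return [r[0] for r in rows]


twins = find_twins(conn, 22)
print(twins)
```

To compare more columns, extend the ON clause with further column comparisons, depending on how you define a twin.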

I found one solution that works on 60 columns if I use variables instead of hardcoding the columns in the query. Thanks, everybody, for all the input. Some of the answers were along the same track.
SELECT id,
(SELECT max(b.id) from co b
WHERE concat(a.co_name,etc) = concat(b.co_name,etc)
LIMIT 1) as twin
FROM co a
WHERE id='22'
It is not the best solution, since it fetches only one twin at a time and is far from generic. Thanks for pointing me in the right direction. A generic solution would be nicer.
