Finding if values in two columns exist - postgresql

I have two columns of dates and I want to run a query that returns TRUE when a row has a value in the first column and a value in the second column.
I know how to do it when I'm looking for a match (when the entry in column A is the SAME as the entry in column B), but I don't know how to check whether the entries in columns A and B both exist.
Does anyone know how to do this? Thanks!

If data in a column is present, it IS NOT NULL. You can test that on both columns, combined with an AND, to get your result:
SELECT (date1 IS NOT NULL AND date2 IS NOT NULL) AS both_dates
FROM mytable;
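
If you want only the rows where both dates are present, rather than a TRUE/FALSE per row, the same predicate works as a filter; a minimal sketch against the same hypothetical table:

SELECT *
FROM mytable
WHERE date1 IS NOT NULL
  AND date2 IS NOT NULL;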

So, rephrasing:
For any two entries in table x with date columns a and b, is there some pair of rows x1 and x2 where x1.a = x2.b?
If that's what you're trying to do, you want a self-join, e.g., assuming a single key column named id:
SELECT x1.id, x2.id, x1.a AS x1_a_x2_b
FROM mytable x1
INNER JOIN mytable x2 ON (x1.a = x2.b);
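
If you only need a single TRUE/FALSE for whether any such pair exists at all, an EXISTS check is a cheap alternative; a sketch against the same assumed table:

SELECT EXISTS (
    SELECT 1
    FROM mytable x1
    JOIN mytable x2 ON x1.a = x2.b
) AS any_match;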


Create rows from part of column names

Source data
I am working on an ELT project to load data from CSV files into PostgreSQL, where I will transform it. The CSV files have many columns that are consistent across files, but they also contain inconsistently named activity columns like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
id       | activity
---------+-------------------------------------------------------------------------------------------------------------------
12345678 | {"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}
98765432 | {"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}
JSON to columns
Using the amazing create_jsonb_flat_view function from this post I can convert the jsonb to columns like this:
id       | Date (05/19/2020) | Type (05/19/2020) | Date (06/03/2020) | Type (06/03/2020) | Date (10/23/2020) | Type (10/23/2020)
---------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------
12345678 | null              | null              | 06/01/2020        | E                 | null              | null
98765432 | 05/18/2020        | B                 | null              | null              | 10/26/2020        | T
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
id       | ActivityDate | Date       | Type
---------+--------------+------------+-----
12345678 | 05/19/2020   | null       | null
12345678 | 06/03/2020   | 06/01/2020 | E
98765432 | 05/19/2020   | 05/18/2020 | B
98765432 | 10/23/2020   | 10/26/2020 | T
I followed your link to the create_jsonb_flat_view article yesterday and then forgot about this question. So while I thank you for pointing me there, I think mentioning it worked against you.
A more conventional approach using regexp_replace() works here. I left the date values as strings, but you can convert them with to_date() if needed:
with parse as (
    select id, e.k, e.v,
           regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
           regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       k_date_only as activity_date,
       min(v) filter (where k_no_date = 'Date') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
db<>fiddle here
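
If you do want real date values instead of strings, here is a sketch of the to_date() conversion mentioned above, reusing the parse CTE (the MM/DD/YYYY format is assumed from the sample data):

select id,
       to_date(k_date_only, 'MM/DD/YYYY') as activity_date,
       to_date(min(v) filter (where k_no_date = 'Date'), 'MM/DD/YYYY') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;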
@Mike-Organek's answer works beautifully!
However, I was curious whether the regexp_replace() calls might be slowing the query down, and it seemed I could get the same results using a simpler function.
Since Mike gave me a great example to start from, I modified it to split on the space between Date and (05/19/2020).
For 20,000 rows, the query went from an average of 7 seconds on my local machine to an average of 0.9 seconds.
Here is the resulting query:
with parse as (
    select id, e.k, e.v,
           split_part(e.k, ' ', 1) as k_no_date,
           trim(split_part(e.k, ' ', 2), '()') as k_date_only
    from rawinput
    cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
       k_date_only as activity_date,
       min(v) filter (where k_no_date = 'Date') as date,
       min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
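
To see what the two functions produce for a single key, here is a standalone check (key value copied from the sample data):

select split_part('Date (05/19/2020)', ' ', 1) as k_no_date,               -- 'Date'
       trim(split_part('Date (05/19/2020)', ' ', 2), '()') as k_date_only; -- '05/19/2020'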

Want to delete all rows of records containing null values in DolphinDB

I have a table where a record may contain null values in one or more columns. I want to delete every record that contains a null value in any column. I'm wondering if there is any suggested way to do that in DolphinDB?
Try the DolphinDB function rowAnd to express the condition across all columns.
The following script is for your reference. It keeps a row only when every column passes the isValid check, i.e., it drops the records that contain a NULL:
sym = take(`a`b`c, 110)
id  = 1..100 join take(int(), 10)    // last 10 elements are NULL
id2 = take(int(), 10) join 1..100    // first 10 elements are NULL
t = table(sym, id, id2)
// keep only the rows where every column is valid (non-NULL)
t[each(isValid, t.values()).rowAnd()]
The output contains only the rows where every column is non-null.

How to re-map array column values in select in Postgresql?

Is it possible to re-map integer values from a Postgres array column in the select? This is what I have:
select unnest(tag_ids) from mention m where id = 288201;
unnest
---------
-143503
-143564
125192
143604
137694
tag_ids is an integer[] column.
I would like to translate those numbers. Functions like abs(unnest(..)) work, but I found I cannot use a CASE statement. Thanks.
If you want to do anything non-trivial with the elements from an array, unnest it in the FROM clause and use the set-returning function like a table:
select u.tag_id
from mention m
cross join unnest(m.tag_ids) as u(tag_id)
where m.id = 288201;
Now, u.tag_id is an integer column that you can use like any other column, e.g. in a CASE expression.
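For example, a hypothetical re-mapping with CASE (the labels are invented for illustration; the ids come from the output above):

select case u.tag_id
           when 125192 then 'news'
           when 143604 then 'sports'
           else 'other'
       end as tag_label
from mention m
cross join unnest(m.tag_ids) as u(tag_id)
where m.id = 288201;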

What does a column assignment using an aggregate in the columns area of a select do?

I'm trying to decipher another programmer's code who is long-gone, and I came across a select statement in a stored procedure that looks like this (simplified) example:
SELECT #Table2.Col1, #Table2.Col2, #Table2.Col3,
       MysteryColumn = CASE WHEN y.Col3 IS NOT NULL
                            THEN #Table2.MysteryColumn - y.Col3
                            ELSE #Table2.MysteryColumn END
INTO #Table1
FROM #Table2
LEFT OUTER JOIN (
    SELECT Table3.Col1, Table3.Col2, Col3 = SUM(Table3.Col3)
    FROM Table3
    INNER JOIN #Table4 ON #Table4.Col1 = Table3.Col1 AND #Table4.Col2 = Table3.Col2
    GROUP BY Table3.Col1, Table3.Col2
) AS y ON #Table2.Col1 = y.Col1 AND #Table2.Col2 = y.Col2
WHERE #Table2.Col2 < @EnteredValue
My question: what does the fourth column of the primary selection do? Does it produce a boolean value checking whether the values are equal? Does it set #Table2.MysteryColumn to some value and then insert it into #Table1? Or does it just update #Table2.MysteryColumn and not output a value into #Table1?
The same thing seems to happen inside the sub-query on the third column, and I am equally at a loss as to what that does.
MysteryColumn = gives the expression a name, also called a column alias. The fact that a column in #Table2 has the same name is beside the point.
Since the query uses SELECT ... INTO, the alias also becomes the column's name in the resulting temporary table. See the SELECT clause documentation, noting | column_alias = expression, and the INTO clause.
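A stripped-down illustration of that naming behavior (T-SQL; the table and values are made up):

SELECT Doubled = 2 * 21
INTO #tmp;

SELECT Doubled FROM #tmp;   -- returns 42; the alias became the column name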

Another way of returning rows if any of the columns has different value for the same id

Is there another way to return rows for the same id by joining two tables, returning a row if any of the column values differ for that id?
Select Table1.No, Table2.No, Table1.Name, Table2.Name,
       Table1.ID, Table2.ID, Table1.ID_N, Table2.ID_N
From MyFirstTable Table1
JOIN MySecondTable Table2 ON Table1.No = Table2.No
where Table1.ID != Table2.ID or Table1.ID_N != Table2.ID_N
In the example above I check only two columns, but in my real case there are at least 20.
Is there another statement I can use instead of enumerating each column in the WHERE condition?
...WHERE BINARY_CHECKSUM(Table1.*) <> BINARY_CHECKSUM(Table2.*)
or
...WHERE BINARY_CHECKSUM(Table1.Field1, Table1.Field2, ...) <> BINARY_CHECKSUM(Table2.Field1, Table2.Field2, ...)
*this assumes you have no blob fields in your tables
http://technet.microsoft.com/en-us/library/ms173784.aspx
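Put together with the join from the question, the explicit-column form would look something like this (column list shortened to the ones shown in the question):

SELECT Table1.No
FROM MyFirstTable Table1
JOIN MySecondTable Table2 ON Table1.No = Table2.No
WHERE BINARY_CHECKSUM(Table1.Name, Table1.ID, Table1.ID_N)
   <> BINARY_CHECKSUM(Table2.Name, Table2.ID, Table2.ID_N)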
If No is a PK
Select Table1.No, Table1.Name, Table1.ID, Table1.ID_N
From MyFirstTable Table1
except
Select Table2.No, Table2.Name, Table2.ID, Table2.ID_N
From MySecondTable Table2
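If you also want to see the matching row from the second table next to each difference, one sketch is to join the EXCEPT result back (assuming No is unique in both tables):

SELECT t1.No, t1.Name, t1.ID, t1.ID_N,
       t2.Name AS Name2, t2.ID AS ID2, t2.ID_N AS ID_N2
FROM (
    SELECT No, Name, ID, ID_N FROM MyFirstTable
    EXCEPT
    SELECT No, Name, ID, ID_N FROM MySecondTable
) AS t1
JOIN MySecondTable t2 ON t2.No = t1.No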