How to join 2 tables on fields which have different formats? - postgresql

I have 2 tables with the following structure:
Table A:
id - number
a_d - text
where A.a_d has the text format: "yyyy-mm-dd 00:00:00" (examples: 2001-08-22 00:00:00, or 2002-03-23 00:00:00)
Table B:
id - number
a_d - text
where B.a_d has the text format: "dd-month-yyyy" (example: 01-jul-2002 or 09-feb-2005)
I want to run a join query on the text fields of those tables.
select a.a_d
from A a
join B b
on a.a_d =?= b.a_d
I can't change or update the tables, only read data from them.
How can I compare these 2 fields if they have different formats?

Use TO_DATE to convert the text dates into bona fide dates before comparing:
SELECT a.a_d
FROM A a
INNER JOIN B b
ON a.a_d::date = TO_DATE(b.a_d, 'DD-mon-YYYY');
Note that the a_d field in table A happens to be a text timestamp which can already be directly cast to date, so we only need TO_DATE for the B table.
Ideally you should store your dates and timestamps in proper columns rather than text. Then, the join would be possible without costly conversions.
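To see why converting both sides to real dates makes the comparison work, here is a small Python sketch of the same idea outside the database. The function names and the sample values are made up for illustration; only the two text formats come from the question.

```python
from datetime import datetime

def parse_a(text):
    # Table A style: "yyyy-mm-dd 00:00:00"
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").date()

def parse_b(text):
    # Table B style: "dd-mon-yyyy"; %b matches abbreviated English month
    # names case-insensitively under Python's default C locale
    return datetime.strptime(text, "%d-%b-%Y").date()

# Hypothetical sample values in the two formats
rows_a = ["2001-08-22 00:00:00", "2002-07-01 00:00:00"]
rows_b = ["01-jul-2002", "09-feb-2005"]

# "Join" on the parsed dates rather than on the raw text
dates_b = {parse_b(b) for b in rows_b}
matches = [a for a in rows_a if parse_a(a) in dates_b]
print(matches)
```

The raw strings never compare equal, but once both are parsed they do, which is what the cast and TO_DATE achieve in the join condition.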

Related

Create rows from part of column names

Source data
I am working on an ELT project to load data from CSV files into PostgreSQL where I will transform it. The CSV files have many columns that are consistent across files, but also contain activity columns that are inconsistent with names like Date (05/19/2020), Type (05/19/2020), etc.
In the loading script I am merging all of the columns with dates in the column name into one jsonb column so I don't have to constantly add new columns to the raw data table.
The resulting jsonb column in the raw data table looks like this:
id | activity
12345678 | {"Date (05/19/2020)": null, "Type (05/19/2020)": null, "Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}
98765432 | {"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", "Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}
JSON to columns
Using the amazing create_jsonb_flat_view function from this post I can convert the jsonb to columns like this:
id | Date (05/19/2020) | Type (05/19/2020) | Date (06/03/2020) | Type (06/03/2020) | Date (10/23/2020) | Type (10/23/2020)
10629465 | null | null | 06/01/2020 | E | |
98765432 | 05/18/2020 | B | | | 10/26/2020 | T
Need to move part of column name to row
Now, this is where I'm stuck. I need to remove the portion of the column name that is the Activity Date (e.g. (05/19/2020)) and create a row for each id and ActivityDate with additional columns for Date and Type like this:
id | ActivityDate | Date | Type
12345678 | 05/19/2020 | null | null
12345678 | 06/03/2020 | 06/01/2020 | E
98765432 | 05/19/2020 | 05/18/2020 | B
98765432 | 10/23/2020 | 10/26/2020 | T

I followed your link to the create_jsonb_flat_view article yesterday and then forgot this question. While I thank you for pointing me there, I think that mentioning it worked against you.
A more conventional approach using regexp_replace() works here. I left the date values as strings, but you can convert them with to_date() if needed:
with parse as (
select id, e.k, e.v,
regexp_replace(e.k, '\s+\([0-9/]{10}\)', '') as k_no_date,
regexp_replace(e.k, '^.+([0-9/]{10}).+', '\1') as k_date_only
from rawinput
cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
k_date_only as activity_date,
min(v) filter (where k_no_date = 'Date') as date,
min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
@Mike-Organek's answer works beautifully!
However, I was curious if the regexp_replace() calls might be slowing the query down a bit and it seemed I could get the same results using a simpler function.
Since Mike gave me a great example to start with I modified it to split on the space between Date and (05/19/2020).
For 20,000 rows, the average run time on my local machine dropped from 7 sec to 0.9 sec.
Here is the resulting query:
with parse as (
select id, e.k, e.v,
split_part(e.k, ' ', 1) as k_no_date,
trim(split_part(e.k, ' ', 2),'()') as k_date_only
from rawinput
cross join lateral jsonb_each_text(activity) as e(k, v)
)
select id,
k_date_only as activity_date,
min(v) filter (where k_no_date = 'Date') as date,
min(v) filter (where k_no_date = 'Type') as type
from parse
group by id, k_date_only;
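For readers who want to check the reshaping logic outside the database, the following is a rough Python sketch of the same split-and-pivot step. The `reshape` function is hypothetical, and the input is the two sample rows from the question. It relies on the same assumption as the split_part() version: the part of the key before the first space is the column name and the rest is the parenthesised activity date.

```python
import json
from collections import defaultdict

# The two sample rows from the question: (id, activity as JSON text)
raw = [
    (12345678, '{"Date (05/19/2020)": null, "Type (05/19/2020)": null, '
               '"Date (06/03/2020)": "06/01/2020", "Type (06/03/2020)": "E"}'),
    (98765432, '{"Date (05/19/2020)": "05/18/2020", "Type (05/19/2020)": "B", '
               '"Date (10/23/2020)": "10/26/2020", "Type (10/23/2020)": "T"}'),
]

def reshape(rows):
    """Turn one JSON object per id into one row per (id, activity date)."""
    grouped = defaultdict(dict)
    for row_id, activity in rows:
        for key, value in json.loads(activity).items():
            # Split at the first space: "Date (05/19/2020)" -> "Date", "(05/19/2020)"
            name, _, rest = key.partition(" ")
            activity_date = rest.strip("()")
            grouped[(row_id, activity_date)][name] = value
    return [
        (row_id, activity_date, cols.get("Date"), cols.get("Type"))
        for (row_id, activity_date), cols in sorted(grouped.items())
    ]

for row in reshape(raw):
    print(row)
```

A key whose name itself contains a space (for example "Start Date (05/19/2020)") would break this split, in both the sketch and the split_part() query, so the regexp version is the safer choice if such keys can appear.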

How to group by date and calculate the averages at the same time

I am quite new to this, so here it goes: I am trying to convert from unixtime to date format and then group by date while calculating the average of another column. This is in MariaDB.
CREATE OR REPLACE
VIEW `history_uint_view` AS select
`history_uint`.`itemid` AS `itemid`,
date(from_unixtime(`history_uint`.`clock`)) AS `mydate`,
AVG(`history_uint`.`value`) AS `value`
from
`history_uint`
where
((month(from_unixtime(`history_uint`.`clock`)) = month((now() - interval 1 month))) and ((`history_uint`.`value` in (1,
0))
and (`history_uint`.`itemid` in (54799, 54810, 54821, 54832, 54843, 54854, 54865, 54876, 54887, 54898, 54909, 54920, 58165, 58226, 59337, 59500, 59503, 59506, 60621, 60624, 60627, 60630, 60633, 60636, 60639, 60642, 60645, 60648, 60651, 60654, 60657, 60660, 60663, 60666, 60669, 60672, 60675, 60678, 60681, 60684, 60687, 60690, 60693, 60696, 60699, 64610)))
GROUP by 'itemid', 'mydate', 'value'
When you select aggregate functions (like AVG) alongside non-aggregated columns, list every non-aggregated column in the GROUP BY clause and leave the aggregated ones out.
So your group by should look like:
GROUP by itemid, mydate
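If you use single quotes (like 'itemid'), MariaDB treats them as string literals, not column names, so every row ends up in the same group. Leave column names unquoted or wrap them in backticks.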
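The effect of grouping by a quoted string can be reproduced with a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MariaDB, which is an assumption made only so the example runs anywhere; the table name `history` and the sample rows are made up. In both engines a single-quoted token in GROUP BY is a constant string, so all rows collapse into one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (itemid INTEGER, mydate TEXT, value REAL)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 1.0),
        (1, "2020-01-01", 0.0),
        (1, "2020-01-02", 1.0),
        (2, "2020-01-01", 0.0),
    ],
)

# Grouping by string literals: every row has the same group key
by_literal = conn.execute(
    "SELECT AVG(value) FROM history GROUP BY 'itemid', 'mydate'"
).fetchall()

# Grouping by the actual columns: one average per (itemid, mydate)
by_column = conn.execute(
    "SELECT itemid, mydate, AVG(value) FROM history "
    "GROUP BY itemid, mydate ORDER BY itemid, mydate"
).fetchall()

print(len(by_literal), "group when quoted")
print(len(by_column), "groups when unquoted")
```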

"Function does not exist" errors when trying to split column containing array of timestampz into delimited text string in Postgres

I have a table with columns that contain arrays that I want converted into strings so I can split them by the delimiter into multiple columns.
I'm having trouble doing this with arrays of dates with timezones.
create materialized view matview1 as select
(location) as location,
(nullif(split_part(string_agg(distinct name,'; '),'; ',1),'')) as name1,
(nullif(split_part(string_agg(distinct name,'; '),'; ',2),'')) as name2,
(nullif(split_part(string_agg(distinct name,'; '),'; ',3),'')) as name3,
(array_agg(distinct(event_date_with_timestamp))) as event_dates
from table2 b
group by location;
In the code above I'm creating a materialized view of a table to consolidate all table entries related to certain locations into single rows.
How can I create additional columns for each event_date entry like I did with the names (e.g Name1, Name2 and Name3 from the 'name' array)?
I tried changing the array to a string format with:
(nullif(split_part(array_to_string(array_agg(distinct(event_date_with_timestamp))),'; ',1),'')) as event_date1
But this throws the error:
"function array_to_string(timestamp with time zone[]) does not exist"
And casting to different datatypes always produces errors saying I can't cast from type timestamptz into anything else.
I found a way to accomplish this by casting from timestamptz to text and then back again like this:
(nullif(split_part(string_agg(distinct event_date::text,'; '),'; ',1),'')::date) as date1,

Finding if values in two columns exist

I have two columns of dates and I want to run a query that returns TRUE if there is a date in existence in the first column and in existence in the second column.
I know how to do it when I'm looking for a match (if the data entry in column A is the SAME as the entry in column B), but I don't know how to check whether an entry exists in both column A and column B.
Does anyone know how to do this? Thanks!
If data in a column is present, it IS NOT NULL. You can test that on both columns and combine the tests with AND to get your result:
SELECT (date1 IS NOT NULL AND date2 IS NOT NULL) AS both_dates
FROM mytable;
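A quick way to check what that query returns for each combination of present and missing dates is to run it on a tiny table. This sketch uses Python's built-in sqlite3 module, and the three sample rows are made up. SQLite reports the result as 1 or 0, where PostgreSQL would show true or false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", "2021-02-01"),  # both dates present
        (2, "2021-03-01", None),          # second date missing
        (3, None, None),                  # both missing
    ],
)

flags = conn.execute(
    "SELECT id, (date1 IS NOT NULL AND date2 IS NOT NULL) AS both_dates "
    "FROM mytable ORDER BY id"
).fetchall()
print(flags)
```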
So, rephrasing:
For any two entries in table x with date columns a and b, is there some pair of rows x1 and x2 where x1.a = x2.b?
If that's what you're trying to do, you want a self-join, e.g., assuming there is a single key column named id:
SELECT x1.id, x2.id, x1.a AS x1_a_x2_b
FROM mytable x1
INNER JOIN mytable x2 ON (x1.a = x2.b);

Postgres: buckets always filled from left in crosstab query

My query looks like this:
SELECT mthreport.*
FROM crosstab
('SELECT
to_char(ipstimestamp, ''mon DD HH24h'') As row_name,
varid::text || log.varid || ''_'' || ips.objectname::text As bucket,
COUNT(*)::integer As bucketvalue
FROM loggingdb_ips_boolean As log
INNER JOIN IpsObjects As ips
ON log.Varid=ips.ObjectId
WHERE ((log.varid = 37551)
OR (log.varid = 27087)
OR (log.varid = 50876)
OR (log.varid = 45096)
OR (log.varid = 54708)
OR (log.varid = 47475)
OR (log.varid = 54606)
OR (log.varid = 25528)
OR (log.varid = 54729))
GROUP BY to_char(ipstimestamp, ''yyyy MM DD HH24h''), row_name, objectid, bucket
ORDER BY to_char(ipstimestamp, ''yyyy MM DD HH24h''), row_name, objectid, bucket' )
As mthreport(item_name text, varid_37551 integer,
varid_27087 integer ,
varid_50876 integer ,
varid_45096 integer ,
varid_54708 integer ,
varid_47475 integer ,
varid_54606 integer ,
varid_25528 integer ,
varid_54729 integer ,
varid_29469 integer)
the query can be tested against a test table with this connection string:
"host=bellariastrasse.com port=5432 dbname=IpsLogging user=guest password=guest"
The query is syntactically correct and runs fine. My problem is that the COUNT(*) values always fill the leftmost columns. However, in many instances the left columns should hold a zero or a NULL, and only the 2nd (or n-th) column should be filled. My brain is melting and I cannot figure out what is wrong!
The solution for your problem is to use the crosstab() variant with two parameters.
The second parameter (another query string) produces the list of output columns, so that NULL values in the data query (the first parameter) are assigned correctly.
Check the manual for the tablefunc extension, and in particular crosstab(text, text):
The main limitation of the single-parameter form of crosstab is that
it treats all values in a group alike, inserting each value into the
first available column. If you want the value columns to correspond to
specific categories of data, and some groups might not have data for
some of the categories, that doesn't work well. The two-parameter form
of crosstab handles this case by providing an explicit list of the
categories corresponding to the output columns.
Emphasis mine. I posted a couple of related answers recently here or here or here.
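The difference between the two forms can be pictured with a small Python sketch. It only mimics the fill behaviour described in the manual excerpt and is not how tablefunc is implemented; the function names and the sample rows are made up for illustration.

```python
def fill_from_left(rows, n_columns):
    """Mimic crosstab(text): each value goes into the first free column."""
    out = {}
    for row_name, _category, value in rows:
        cells = out.setdefault(row_name, [])
        if len(cells) < n_columns:  # extra values for a row are skipped
            cells.append(value)
    # Pad the remaining columns with None (NULL)
    return {name: cells + [None] * (n_columns - len(cells))
            for name, cells in out.items()}

def fill_by_category(rows, categories):
    """Mimic crosstab(text, text): each value lands in its category's column."""
    index = {category: i for i, category in enumerate(categories)}
    out = {}
    for row_name, category, value in rows:
        cells = out.setdefault(row_name, [None] * len(categories))
        if category in index:  # categories outside the list are ignored
            cells[index[category]] = value
    return out

# Hypothetical (row_name, bucket, bucketvalue) rows, already sorted
rows = [
    ("jan 01 10h", "27087", 4),
    ("jan 01 11h", "37551", 2),
    ("jan 01 11h", "50876", 7),
]
categories = ["37551", "27087", "50876"]

print(fill_from_left(rows, len(categories)))
print(fill_by_category(rows, categories))
```

In the first result the value 4 for bucket 27087 lands in the leftmost column, which is the symptom described in the question. In the second it lands in the 27087 column and the other columns stay NULL.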