SQL INSERT fails with "column i.dates does not exist" when the column seemingly does exist - PostgreSQL

I'm using PostgreSQL and pgAdmin. I'm attempting to copy data from a staging table to a production table using INSERT INTO with a SELECT FROM statement, converting a date column with to_char() along the way. This may or may not be the wrong approach. The SELECT fails because apparently "column i.dates does not exist".
The question is: why am I getting 'column i.dates does not exist'?
The schema of the two tables is identical except for the date column's type. I've matched the schemas with the exception of the to_char() conversion, and I've checked and double-checked that the column exists.
This is the code I'm trying:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT
    i.location AS location,
    i.dates AS dates,
    i.temperature AS temperature,
    i.rh AS rh,
    i.winddir AS winddir,
    i.windspeed AS windspeed,
    i.droughtfactor AS droughtfactor,
    i.curing AS curing,
    i.cloudcover AS cloudcover
FROM (
    SELECT location,
           to_char(to_timestamp(dates, 'YYYY-DD-MM HH24:MI'), 'HH24:MI YYYY-MM-DD HH24:MI'),
           temperature, rh, wd, ws, df, cu, cc
    FROM wosweathergrids
) i;
The error I'm receiving is:
ERROR: column i.dates does not exist
LINE 4: i.dates as dates,
^
SQL state: 42703
Character: 151
My data schema is like:
+-----------------+-----+-------------+-----------------------------+-----+
| TABLE | NUM | COLNAME | DATATYPE | LEN |
+-----------------+-----+-------------+-----------------------------+-----+
| weathergrids | 1 | id | integer | 32 |
| weathergrids | 2 | location | numeric | 6 |
| weathergrids | 3 | dates | timestamp without time zone | |
| weathergrids | 4 | temperature | numeric | 3 |
| weathergrids | 5 | rh | numeric | 4 |
| weathergrids | 6 | wd | numeric | 4 |
| weathergrids | 7 | wsd | numeric | 4 |
| weathergrids | 8 | df | numeric | 4 |
| weathergrids | 9 | cu | numeric | 4 |
| weathergrids | 10 | cc | numeric | 4 |
| wosweathergrids | 1 | id | integer | 32 |
| wosweathergrids | 2 | location | numeric | 6 |
| wosweathergrids | 3 | dates | character varying | 16 |
| wosweathergrids | 4 | temperature | numeric | 3 |
| wosweathergrids | 5 | rh | numeric | 4 |
| wosweathergrids | 6 | wd | numeric | 4 |
| wosweathergrids | 7 | ws | numeric | 4 |
| wosweathergrids | 8 | df | numeric | 4 |
| wosweathergrids | 9 | cu | numeric | 4 |
| wosweathergrids | 10 | cc | numeric | 4 |
+-----------------+-----+-------------+-----------------------------+-----+

Your derived table (subquery) named i has no column named dates because that column is "hidden" inside the to_char() call: as you don't define an alias for that expression, no column dates is visible "outside" of the derived table.
But I don't see the reason for a derived table to begin with. Also, aliasing a column with the same name is unnecessary: i.location AS location is exactly the same thing as i.location. (Note too that the outer query references i.winddir, i.windspeed, i.droughtfactor, i.curing and i.cloudcover, which the inner query never produces; per your schema the source columns are wd, ws, df, cu and cc.)
So your query can be simplified to:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT
    location,
    to_timestamp(dates, 'YYYY-DD-MM HH24:MI'),
    temperature,
    rh,
    wd,
    ws,
    df,
    cu,
    cc
FROM wosweathergrids;
You don't need to give an alias to the to_timestamp() expression, because in an INSERT ... SELECT statement columns are matched by position, not by name.
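For completeness: if you did want to keep the derived table, the fix for the original error is simply to alias the expression inside it. A sketch, using the source column names from your schema:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT i.location, i.dates, i.temperature, i.rh, i.wd, i.ws, i.df, i.cu, i.cc
FROM (
    SELECT location,
           to_timestamp(dates, 'YYYY-DD-MM HH24:MI') AS dates,  -- the alias makes the column visible outside
           temperature, rh, wd, ws, df, cu, cc
    FROM wosweathergrids
) i;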

Related

Create a PostgreSQL function that becomes a formula field of a table, retrieving related data from another table

The following example works on SQL Server. It is a function that performs a calculation on another table: given the current table's Id, it aggregates rows from that other table and returns a single value.
Question: how do I do the same thing in PostgreSQL?
SELECT TOP(5) * FROM Artists;
+------------+------------------+--------------+-------------+
| ArtistId | ArtistName | ActiveFrom | CountryId |
|------------+------------------+--------------+-------------|
| 1 | Iron Maiden | 1975-12-25 | 3 |
| 2 | AC/DC | 1973-01-11 | 2 |
| 3 | Allan Holdsworth | 1969-01-01 | 3 |
| 4 | Buddy Rich | 1919-01-01 | 6 |
| 5 | Devin Townsend | 1993-01-01 | 8 |
+------------+------------------+--------------+-------------+
SELECT TOP(5) * FROM Albums;
+-----------+------------------------+---------------+------------+-----------+
| AlbumId | AlbumName | ReleaseDate | ArtistId | GenreId |
|-----------+------------------------+---------------+------------+-----------|
| 1 | Powerslave | 1984-09-03 | 1 | 1 |
| 2 | Powerage | 1978-05-05 | 2 | 1 |
| 3 | Singing Down the Lane | 1956-01-01 | 6 | 3 |
| 4 | Ziltoid the Omniscient | 2007-05-21 | 5 | 1 |
| 5 | Casualties of Cool | 2014-05-14 | 5 | 1 |
+-----------+------------------------+---------------+------------+-----------+
The function
CREATE FUNCTION [dbo].[ufn_AlbumCount] (@ArtistId int)
RETURNS smallint
AS
BEGIN
    DECLARE @AlbumCount int;
    SELECT @AlbumCount = COUNT(AlbumId)
    FROM Albums
    WHERE ArtistId = @ArtistId;
    RETURN @AlbumCount;
END;
GO
Now, in SQL Server, after adding the computed column with ALTER TABLE Artists ADD AlbumCount AS dbo.ufn_AlbumCount(ArtistId); we can query the table and get the following result.
+------------+------------------+--------------+-------------+--------------+
| ArtistId | ArtistName | ActiveFrom | CountryId | AlbumCount |
|------------+------------------+--------------+-------------+--------------|
| 1 | Iron Maiden | 1975-12-25 | 3 | 5 |
| 2 | AC/DC | 1973-01-11 | 2 | 3 |
| 3 | Allan Holdsworth | 1969-01-01 | 3 | 2 |
| 4 | Buddy Rich | 1919-01-01 | 6 | 1 |
| 5 | Devin Townsend | 1993-01-01 | 8 | 3 |
| 6 | Jim Reeves | 1948-01-01 | 6 | 1 |
| 7 | Tom Jones | 1963-01-01 | 4 | 3 |
| 8 | Maroon 5 | 1994-01-01 | 6 | 0 |
| 9 | The Script | 2001-01-01 | 5 | 1 |
| 10 | Lit | 1988-06-26 | 6 | 0 |
+------------+------------------+--------------+-------------+--------------+
But how can I achieve this in PostgreSQL?
Postgres doesn't support "virtual" computed columns (i.e. computed columns that are generated at runtime), so there is no exact equivalent. The most efficient solution is a view that does the counting:
create view artists_with_counts
as
select a.*,
coalesce(t.album_count, 0) as album_count
from artists a
left join (
select artist_id, count(*) as album_count
from albums
group by artist_id
) t on a.artist_id = t.artist_id;
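For illustration, the view can then be queried like any table; album_count behaves just like the computed column on SQL Server:
select *
from artists_with_counts
where album_count >= 3;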
Another option is to create a function that can be used as a "virtual column" in a select - but as this is done row-by-row, this will be substantially slower than the view.
create function album_count(p_artist artists)
returns bigint
as
$$
select count(*)
from albums a
where a.artist_id = p_artist.artist_id;
$$
language sql
stable;
Then you can include this as a column:
select a.*, a.album_count
from artists a;
Using the function like that requires prefixing the function reference with the table alias (alternatively, you can call it as album_count(a)).
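For reference, the equivalent explicit call form mentioned above:
select a.*, album_count(a)
from artists a;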
Online example

How to get non-aggregated measures?

I calculate my metrics with SQL and publish the resulting table to Tableau Server. Afterward, I use this data source to create charts and dashboards.
For one analysis, I have already calculated the measures per day with SQL. When I use the resulting table in Tableau, it aggregates these measures with SUM by default. However, I don't want the SUM or AVG of the averages, or the SUM of the percentiles.
What I want is the result I would get if I didn't select the date dimension and didn't GROUP BY date in SQL, as shown below.
Here is the query:
SELECT
-- date,
COUNT(DISTINCT id) AS count_of_id,
AVG(timediff_in_sec) AS avg_timediff,
PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_50
FROM
(
--subquery
) AS t1
-- GROUP BY date
Here are the first few rows of the resulting table:
+------------+--------------+-------------+---------------+---------------+
| date | avg_timediff | count_of_id | percentile_25 | percentile_50 |
+------------+--------------+-------------+---------------+---------------+
| 10/06/2020 | 61,65186364 | 22 | 8,5765 | 13,3015 |
| 11/06/2020 | 127,2913333 | 3 | 15,6045 | 17,494 |
| 12/06/2020 | 306,0348214 | 28 | 12,2565 | 17,629 |
| 13/06/2020 | 13,2664 | 5 | 11,944 | 13,862 |
| 14/06/2020 | 16,728 | 7 | 14,021 | 17,187 |
| 15/06/2020 | 398,6424595 | 37 | 11,893 | 19,271 |
| 16/06/2020 | 293,6925152 | 33 | 12,527 | 17,134 |
| 17/06/2020 | 155,6554286 | 21 | 13,452 | 16,715 |
| 18/06/2020 | 383,8101429 | 7 | 266,048 | 493,722 |
+------------+--------------+-------------+---------------+---------------+
How can I achieve the desired output above?
Drag them all into the dimensions list; then they are static dimensions. For your use case you could also just drag the Date field to Rows. Aggregating a single value, which is what you have for each date, returns the same value whatever the aggregation type.
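To see why in SQL terms: each date carries exactly one value of each measure, so any aggregate over that single value returns it unchanged. A hypothetical illustration (daily_metrics stands in for the published table):
-- min, max and avg over a single row per group all return the same value
select date,
       min(avg_timediff),
       max(avg_timediff),
       avg(avg_timediff)
from daily_metrics
group by date;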

How to retrieve information from three tables under the conditions below in PostgreSQL

I have three tables.
TABLE_1:
T2_ID | ver     | date                          | boolean
---------------------------------------------------------
1 | X-20-50 | 2019-01-01 16:20:51.722336+00 | TRUE
2 | X-50-30 | 2019-02-26 16:20:51.722336+00 | TRUE
3 | X-20-32 | 2019-03-20 16:20:51.722336+00 | FALSE
1 | X-20-50 | 2019-01-09 16:20:51.722336+00 | FALSE
2 | X-20-50 | 2019-12-02 16:20:51.722336+00 | TRUE
3 | X-20-50 | 2019-01-24 16:20:51.722336+00 | TRUE
TABLE_2:
id | type | scheduler
--------------------------------------------------
1 | ABC | w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12
2 | PQR | w5,w9
3 | TRC | w1,w4,w8
TABLE_3
start_date_of_ver | end_date_of_ver | ver_name
-----------------------------------------------------------
2019-01-01 00:00:00+00 | 2019-04-01 00:00:00+00 | X-20-50
2019-02-25 00:00:00+00 | 2019-05-26 00:00:00+00 | X-50-30
2019-03-15 00:00:00+00 | 2019-06-06 00:00:00+00 | X-20-32
Table 4 should fulfill the conditions below:
It takes a version name (ver_name) as input.
From this ver_name it takes the version's start date and end date (from table_3). If the version period is 3 months, it creates a 12-week table with the type from table_2 as the first column and fills in the twelve weeks according to table_2's scheduler column.
Information in table_4 is updated as and when table_1 gets entries for that particular week that are TRUE.
Note: table_1 entries are generated on a daily basis.
Desired table: it takes only ver_name as input and calculates the table below.
When table_1 doesn't have any entries, table_4 should look like this:
Table_4: X-20-50
id_of_table_2 | week_1 | week_2 | week_3 | week_4 | week_5 | week_6 | week_7 | week_8 | week_9 | week_10 | week_11 | week_12 |
------------------------------------------------------------------------------------------------------------------------------
ABC | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | w12 |
PQR | | | | | w5 | | | | w9 | | | |
TRC | w1 | | | w4 | | | | w8 | | | | |
When table_1 has entries, table_4 should look like this:
X-20-50
id_of_table_2 | week_1 | week_2 | week_3 | week_4 | week_5 | week_6 | week_7 | week_8 | week_9 | week_10 | week_11 | week_12 |
------------------------------------------------------------------------------------------------------------------------------
ABC | Done | Done | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | w12 |
PQR | | | | | w5 | | | | w9 | | | |
TRC | Done | | | w4 | | | | w8 | | | | |
You can create a function that takes the starting date of a week as input.
Example:
create function a(start_date date)
returns json
language plpgsql
cost 100
volatile
as $BODY$
declare
    outputjson json;
begin
    -- collect all rows from the 7 days starting at start_date as a JSON array
    execute 'select json_agg(t) from table_name t
             where t.date >= $1 and t.date < $1 + 7'
        into outputjson
        using start_date;
    return outputjson;
end;
$BODY$;
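A hypothetical call, returning all rows of table_name from the seven days starting 2019-01-01:
select a(date '2019-01-01');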
Hope this will help.
Your requirement needs a little refinement. You want to retrieve weekly data yet never define your week. On what day does it begin? Are all weeks 7 days long? What happens when Dec 31 falls on a Tuesday: is Friday Jan 3 in the same week? (See the current year's calendar.) Then there is the issue of the user input and what it represents: is it the desired start date, with the week being that date plus the next 6 days, or is it any date within the weekly period?
The following assumes the ISO 8601 definition (google it; there's lots of material): every week begins on Monday and all weeks are 7 days long. (Thus the week containing 31-Dec-2019 also includes 3-Jan-2020.) The routine extracts the ISO year and ISO week from the user-entered date.
--setup
create table weekly_something( c1 text, c2 text, date1 timestamptz, someem boolean);
insert into weekly_something( c1, c2, date1, someem )
values ('ABC','AB-20-50','2019-11-25 16:20:51.722336+00',TRUE)
, ('PQR','AB-50-30','2019-11-26 16:20:51.722336+00',TRUE)
, ('TRC','CD-20-32','2019-11-27 16:20:51.722336+00',FALSE)
, ('ABC','AB-20-50','2019-12-02 16:20:51.722336+00',FALSE)
, ('ABC','AB-20-50','2019-12-02 16:20:51.722336+00',TRUE)
, ('JFF','yy-45-89','2019-12-31 16:20:51.722336+00',TRUE)
, ('JFF','yy-89-30','2020-01-03 16:20:51.722336+00',TRUE) ;
-- JFF Just For Fun
-- SQL Function
create function week_of(week_date date)
returns setof weekly_something
language sql stable strict
as $$
select *
from weekly_something
where (extract('isoyear' from week_date), extract('week' from week_date)) =
(extract('isoyear' from date1), extract('week' from date1));
$$;
-- test
select * from week_of('2019-11-26');
select * from week_of('2019-12-30');
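Given the sample data, the first call should return the three rows from ISO week 48 of 2019 (Mon 25-Nov through Sun 01-Dec), and the second should return both JFF rows, since 31-Dec-2019 and 03-Jan-2020 fall in the same ISO week (week 1 of 2020).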

PostgreSQL: getting unified rows

I have one table where I dump all records from different sources (x, y, z) like below
+----+--------+
| id | source |
+----+--------+
| 1 | x |
| 2 | y |
| 3 | x |
| 4 | x |
| 5 | y |
| 6 | z |
| 7 | z |
| 8 | x |
| 9 | z |
| 10 | z |
+----+--------+
Then I have one mapping table where I map values between sources based on my use case, like below:
+----+-----------+
| id | mapped_id |
+----+-----------+
| 1 | 2 |
| 1 | 9 |
| 3 | 7 |
| 4 | 10 |
| 5 | 1 |
+----+-----------+
I want merged results where I see only unique rows, like:
+-----+------------+
| id | mapped_ids |
+-----+------------+
| 1 | 2,9,5 |
| 3 | 7 |
| 4 | 10 |
| 6 | null |
| 8 | null |
+-----+------------+
I have tried different options but could not figure this out. Is there a way I can write joins to do this? I have to use the mapping table where the associations are stored, and identify the unique records along with the records that are not mapped anywhere.
My understanding is: you want to see all dump_table IDs that do not appear in the mapped_id column, and then aggregate the mapped_ids for those that are left:
select d1.id,
array_agg(m1.mapped_id order by m1.mapped_id) filter (where m1.mapped_id is not null) as mapped_ids
from dump_table d1
left join mapping_table m1 using (id)
where not exists (select *
from mapping_table m2
where m2.mapped_id = d1.id)
group by d1.id;
Online example: https://rextester.com/JQZ17650
Try something like this (using the dump and mapping table names from the question):
SELECT id, source, ARRAY_AGG(mapped_id) AS mapped_ids
FROM dump_table AS t1
LEFT JOIN mapping_table AS t2 USING (id)
GROUP BY id, source;

Assign a constant integer to newly inserted values when the string value already exists

I'm looking for an intelligent sequence generator that can assign a constant int value to a column if a string column's value already exists in the table. The scenario is as below:
_____________________________
| Col1 | Col2 | Col 3 |
|---------------------------|
| a | a | 1 |
| b | a | 1 |
| c | a | 1 |
| u | b | 2 |
| v | b | 2 |
| w | b | 2 |
-----------------------------
Let's say I insert another row with ('d', 'a') in Col1 and Col2 respectively. I want Col3 to become 1 automatically, since the Col3 value corresponding to 'a' already exists as 1, so the table becomes:
_____________________________
| Col1 | Col2 | Col 3 |
|---------------------------|
| a | a | 1 |
| b | a | 1 |
| c | a | 1 |
| u | b | 2 |
| v | b | 2 |
| w | b | 2 |
| d | a | 1 |
------------------------------
Is there a way I can define this in CREATE TABLE so that the Col3 value is filled in upon insertion into Col1 and Col2?
Edit:
The scenario is something like this:
______________________________________________________________
| Col1 | Col2 | Col 3 |
|------------------------------------------------------------|
| Adobe | Adobe | 1 |
| Adobe Systems | Adobe | 1 |
| Adobe Systems Inc | Adobe | 1 |
| Honeywell | Honeywell | 2 |
| Honeywell Inc | Honeywell | 2 |
| Honeywell Inc. | Honeywell | 2 |
--------------------------------------------------------------
And when I add new data, I would like it to become:
______________________________________________________________
| Col1 | Col2 | Col 3 |
|------------------------------------------------------------|
| Adobe | Adobe | 1 |
| Adobe Systems | Adobe | 1 |
| Adobe Systems Inc | Adobe | 1 |
| Honeywell | Honeywell | 2 |
| Honeywell Inc | Honeywell | 2 |
| Honeywell Inc. | Honeywell | 2 |
| Adobe Systems Incorporated | Adobe | 1 |
--------------------------------------------------------------
The Col3 value has to be an integer for faster joins with other tables. I will insert values for Col1 & Col2, and the corresponding value should be available in Col3 upon insert.
Just normalize it:
create table corporation (
corporation_id serial,
short_name text
);
insert into corporation (short_name) values
('Adobe'),('Honeywell');
select * from corporation;
corporation_id | short_name
----------------+------------
1 | Adobe
2 | Honeywell
Now your table is:
| Adobe | 1
| Adobe Systems | 1
| Adobe Systems Inc | 1
| Honeywell | 2
| Honeywell Inc | 2
| Honeywell Inc. | 2
| Adobe Systems Incorporated | 1
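A minimal sketch of that detail table (table and column names are assumed, not from the original post):
create table company_name (
    name           text primary key,
    corporation_id integer not null references corporation (corporation_id)
);
insert into company_name (name, corporation_id) values
    ('Adobe', 1),
    ('Adobe Systems', 1),
    ('Adobe Systems Inc', 1),
    ('Honeywell', 2),
    ('Honeywell Inc', 2),
    ('Honeywell Inc.', 2),
    ('Adobe Systems Incorporated', 1);
Joins against other tables then use the integer corporation_id, which is exactly the Col3 you asked for.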
"The Col3 value has to be an integer for faster joins with other tables."
So you need unique, but not necessarily sequential, values. You can use a hash function to map strings to integers, e.g.:
postgres=# select hashtext('Adobe');
hashtext
-----------
173079840
(1 row)
postgres=# select hashtext('Honeywell');
hashtext
------------
-453308048
(1 row)
With such a function, you avoid the need to look up existing values in the table.
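A sketch of how that looks at insert time (the companies table and its columns are assumed for illustration):
-- every row carrying 'Adobe' in Col2 gets the same integer, with no lookup
insert into companies (col1, col2, col3)
values ('Adobe Systems Incorporated', 'Adobe', hashtext('Adobe'));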
But are you really sure you will have performance problems using strings for foreign keys? You should test it on your data, I think. (Also test on upcoming 9.5, with its abbreviated keys feature.)