Hi, I'm trying to automate a workflow in Airflow that appends rows to an external Spectrum table daily. After each append I need to update numRows on the Spectrum table to the existing table's row count plus the count of the rows I am appending.
CREATE EXTERNAL TABLE spectrum.my_external_table
(
id INTEGER,
barkdata_timestamp timestamp,
created_at timestamp,
updated_at timestamp
)
PARTITIONED BY (asofdate timestamp)
STORED AS PARQUET
LOCATION 's3://<SOME BUCKET>/manifest'
TABLE PROPERTIES ('numRows'='<some number>');
ALTER TABLE spectrum.my_external_table
ADD PARTITION (asofdate='2021-03-03 00:00:00') LOCATION 's3://<SOME BUCKET>/asofdate=2021-03-03 00:00:00/';
ALTER TABLE spectrum.my_external_table
SET TABLE PROPERTIES ('numRows'='<HELP HERE should be count(*) from my_external_table + count(*) from table_I_unloaded_to_s3 where asofdate='2021-03-03 00:00:00'>');
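One way to fill in that placeholder from an Airflow task is to run the two counts as ordinary queries, add them in Python, and render the total into the statement as a literal. The sketch below covers only the rendering step. The function name `build_num_rows_statement` and the sample counts are hypothetical, and it assumes the property value has to be a literal rather than a subquery.

```python
def build_num_rows_statement(table: str, existing_rows: int, appended_rows: int) -> str:
    """Return an ALTER TABLE statement that sets numRows to the combined count."""
    if existing_rows < 0 or appended_rows < 0:
        raise ValueError("row counts must be non-negative")
    total = existing_rows + appended_rows
    # Assumption: the property value must be a literal, so the sum is
    # computed client-side and inlined into the statement text.
    return f"ALTER TABLE {table} SET TABLE PROPERTIES ('numRows'='{total}');"


# Hypothetical counts: 1000 rows already in the table, 250 rows just unloaded.
statement = build_num_rows_statement("spectrum.my_external_table", 1000, 250)
print(statement)
```

The two inputs would come from `count(*)` on the external table and `count(*)` on the unloaded table filtered by `asofdate`, fetched in an earlier step of the same task.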
I have a table
CREATE TABLE IF NOT EXISTS prices
(shop_id integer not null,
good_id varchar(24) not null,
eff_date timestamp with time zone not null,
price_wholesale numeric(20,2) not null default 0 constraint chk_price_ws check (price_wholesale >= 0),
price_retail numeric(20,2) not null default 0 constraint chk_price_rtl check (price_retail >= 0),
constraint pk_prices primary key (shop_id, good_id, eff_date)
)partition by list (shop_id);
CREATE TABLE IF NOT EXISTS prices_1 partition of prices for values in (1);
CREATE TABLE IF NOT EXISTS prices_2 partition of prices for values in (2);
CREATE TABLE IF NOT EXISTS prices_3 partition of prices for values in (3);
CREATE TABLE IF NOT EXISTS prices_4 partition of prices for values in (4);
...
CREATE TABLE IF NOT EXISTS prices_6 partition of prices for values in (100);
I'd like to delete outdated prices. The table is huge, so I delete records in small batches.
If I use a loop with the variable v_shop_id, then after 6 iterations Postgres starts scanning all partitions. I simplified the code; the real code has an inner loop over shop_id.
If I use a loop without the variable (specifying the value explicitly), Postgres doesn't scan all partitions.
Here is the code with the variable:
do $$
declare
v_shop_id integer;
v_date_time timestamp with time zone := now();
begin
v_shop_id := 8;
for step in 1..10 loop
delete from prices p
using (select pd.good_id, max(pd.eff_date) as mxef_dt
from prices pd
where pd.eff_date < v_date_time - interval '30 days'
and pd.shop_id = v_shop_id
group by pd.good_id
having count(1)>1
limit 40000) pfd
where p.eff_date <= pfd.mxef_dt
and p.shop_id = v_shop_id
and p.good_id = pfd.good_id;
end loop;
end; $$ LANGUAGE plpgsql;
How can I force Postgres to scan only the one desired partition?
Let's say I have a partitioned table A.
create table A (
col1 timestamp,
col2 int
)
partition by range (col2);
create table partition1 partition of A for values from (minvalue) to (y);
create table partition2 partition of A for values from (y) to (maxvalue);
copy A from '/some/csv/file' with (format csv);
The above code gives me a partitioned table A populated with the data. I want to create another table using:
create table B as (
select *,
col2 * 3 as col3 -- Add a new column
from A
);
Can I save A as a partitioned CSV/'insert_format' file?
Is it possible for B to be partitioned the same way A is?
I'm using this query to find duplicate dates, but I'm not sure how to sum the values for each duplicate date, average them, and remove the duplicates.
DB Schema
date_time
datapoint_1
datapoint_2
SQL Query
SELECT date_time, COUNT(date_time)
FROM MYTABLE
GROUP BY date_time
HAVING COUNT(date_time) > 1
ORDER BY COUNT(date_time)
I would create a new table to replace the old one. That is easier and might even perform better:
CREATE TABLE mytable2 (LIKE mytable);
INSERT INTO mytable2 (date_time, datapoint_1, datapoint_2)
SELECT m.date_time, avg(m.datapoint_1), avg(m.datapoint_2)
FROM mytable AS m
GROUP BY m.date_time;
Then you can drop mytable and rename mytable2 to replace it.
To prevent new rows from creating duplicates, you could change the way you insert data:
-- to keep track of counts
ALTER TABLE mytable ADD numval integer DEFAULT 1;
-- to prevent duplicates
ALTER TABLE mytable ADD UNIQUE (date_time);
-- to insert new rows
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES ('2021-06-30', 42.0, -34.9)
ON CONFLICT (date_time)
DO UPDATE SET numval = mytable.numval + 1,
datapoint_1 = mytable.datapoint_1 + excluded.datapoint_1,
datapoint_2 = mytable.datapoint_2 + excluded.datapoint_2;
-- to select the averages
SELECT date_time,
datapoint_1 / numval AS datapoint_1,
datapoint_2 / numval AS datapoint_2
FROM mytable;
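To see the running-sum approach end to end, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere; it needs SQLite 3.24 or newer for ON CONFLICT). The sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    " date_time TEXT UNIQUE,"      # UNIQUE is what triggers ON CONFLICT
    " datapoint_1 REAL,"
    " datapoint_2 REAL,"
    " numval INTEGER DEFAULT 1)"   # how many rows were folded into this one
)

UPSERT = """
INSERT INTO mytable (date_time, datapoint_1, datapoint_2)
VALUES (?, ?, ?)
ON CONFLICT (date_time)
DO UPDATE SET numval = numval + 1,
              datapoint_1 = datapoint_1 + excluded.datapoint_1,
              datapoint_2 = datapoint_2 + excluded.datapoint_2
"""

# Made-up rows: the first two share a date_time and get merged.
sample_rows = [
    ("2021-06-30", 42.0, -34.0),
    ("2021-06-30", 58.0, -36.0),
    ("2021-07-01", 7.0, 1.0),
]
for row in sample_rows:
    conn.execute(UPSERT, row)

# Stored values are sums, so dividing by numval gives the averages.
running_averages = conn.execute(
    "SELECT date_time, datapoint_1 / numval, datapoint_2 / numval "
    "FROM mytable ORDER BY date_time"
).fetchall()
print(running_averages)
```

The table keeps one row per date_time, holding sums plus a count, so the average is only computed when reading.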
When you use GROUP BY, you can also use aggregate functions to reduce multiple rows to a single one (COUNT, which you used, is one such function). In your case the query would be:
SELECT date_time, avg(datapoint_1), avg(datapoint_2)
FROM MYTABLE
GROUP BY date_time
For every distinct date_time you will get a single row with the average of datapoint_1 and datapoint_2.
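A quick way to check this on sample data is the sketch below. It runs the same GROUP BY query through Python's built-in sqlite3 module, used here only as a convenient stand-in for the real database, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (date_time TEXT, datapoint_1 REAL, datapoint_2 REAL)"
)
# Made-up rows: 2021-06-30 appears twice, 2021-07-01 once.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2021-06-30", 10.0, 1.0),
        ("2021-06-30", 20.0, 3.0),
        ("2021-07-01", 5.0, 7.0),
    ],
)

# One output row per distinct date_time, with each datapoint averaged.
averaged = conn.execute(
    "SELECT date_time, avg(datapoint_1), avg(datapoint_2) "
    "FROM mytable GROUP BY date_time ORDER BY date_time"
).fetchall()
print(averaged)  # [('2021-06-30', 15.0, 2.0), ('2021-07-01', 5.0, 7.0)]
```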
I have a table in PostgreSQL named 'views', containing information about users viewing a classified ad.
CREATE TABLE views (
view_id uuid DEFAULT gen_random_uuid() NOT NULL,
user_id uuid NOT NULL,
ad_id uuid NOT NULL,
timestamp timestamp with time zone DEFAULT now() NOT NULL
);
I want to be able to insert a row for a specific user/ad ONLY when there is no other row 'younger' than 5 minutes. So I want to check if there already is a row with the user ID and the ad ID and where the timestamp is less than 5 minutes old. If so, I want to do something like INSERT... ON CONFLICT DO NOTHING.
Is this possible to do with a UNIQUE constraint? Or do I need a CHECK constraint, or do I have to do a separate query first every time I insert this?
You have to do a lookup first, but you can do the lookup and the insert in one statement using something like this:
with invars (user_id, ad_id) as (
values (?, ?) -- Pass your two ids in
)
insert into views (user_id, ad_id)
select user_id, ad_id
from invars i
where not exists (select 1
from views
where (user_id, ad_id) = (i.user_id, i.ad_id)
and "timestamp" >= now() - interval '5 minutes');
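The same lookup-and-insert pattern can be tried out with the sketch below. It makes several assumptions for the sake of a runnable example: Python's built-in sqlite3 stands in for PostgreSQL, the timestamp is stored as integer seconds in a column called `viewed_at`, the current time is passed in explicitly, and the helper `record_view` is hypothetical.

```python
import sqlite3

WINDOW_SECONDS = 5 * 60  # the 5-minute window from the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE views ("
    " user_id TEXT NOT NULL,"
    " ad_id TEXT NOT NULL,"
    " viewed_at INTEGER NOT NULL)"  # seconds; stands in for the timestamp column
)


def record_view(user_id: str, ad_id: str, now: int) -> bool:
    """Insert a view unless the same user/ad pair has one within the window.

    Returns True when a row was inserted, False when it was skipped.
    """
    cur = conn.execute(
        """
        INSERT INTO views (user_id, ad_id, viewed_at)
        SELECT :user_id, :ad_id, :now
        WHERE NOT EXISTS (
            SELECT 1
            FROM views
            WHERE user_id = :user_id
              AND ad_id = :ad_id
              AND viewed_at >= :now - :window
        )
        """,
        {"user_id": user_id, "ad_id": ad_id, "now": now, "window": WINDOW_SECONDS},
    )
    return cur.rowcount == 1


first = record_view("u1", "a1", 1000)         # no earlier row: inserted
too_soon = record_view("u1", "a1", 1120)      # 2 minutes later: skipped
later = record_view("u1", "a1", 1301)         # just over 5 minutes later: inserted
other_user = record_view("u2", "a1", 1100)    # different user: inserted
print(first, too_soon, later, other_user)
```

Because the check and the insert are a single statement, the caller needs only one round trip and can use the returned flag to tell whether the view was recorded.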
I need to add a serial field, or an incrementing id, to each row returned by a query.
The following code is my attempt at what I want.
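create temp table tt_Final as
SELECT
Table1.Nombrem as "Transaccion",
Table1.number as "number",
-- row_number() stands in for the serial the attempt asked for (an assumption about the intent)
row_number() over () as "Id"
from Table1;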