Is there a way to make a Redshift Copy while at the same time generating the row_number() within the destination table?
I am basically looking for the equivalent of the below, except that the group of rows does not come from a SELECT but from a COPY command loading a file from S3:
insert into aTable
(id,x,y,z)
select
#{startIncrement}+row_number() over(order by x) as id,
x,y,z
from anotherTable;
Thx
What I understand from your question is that you need to insert an additional column, id, into the table, and that id is not present in the CSV file. If my understanding is right, please follow the approach below.
Copy the data from the CSV file to a temp table, say "aTableTemp", which has the same schema but without the "id" column. Then insert the data from "aTableTemp" into "aTable" as follows:
Insert into aTable
Select #{startIncrement}+row_number() over(order by x) as id, * from aTableTemp
I hope this solves your problem.
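For completeness, a minimal end-to-end sketch of that approach; the column types, S3 path, and IAM role are placeholders rather than details from the question:

-- Hypothetical staging table matching the file's columns (no id column).
create temp table aTableTemp (x varchar(100), y varchar(100), z varchar(100));

-- Load the S3 file into the staging table; path and credentials are placeholders.
copy aTableTemp (x, y, z)
from 's3://my-bucket/my-file.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
csv;

-- Generate the ids while moving the rows into the real table.
insert into aTable (id, x, y, z)
select #{startIncrement} + row_number() over (order by x), x, y, z
from aTableTemp;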
Maybe COPY into a table with an IDENTITY column and just don't include that column in the COPY column list?
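A rough sketch of that idea, with the table definition, S3 path, and IAM role assumed for illustration:

-- The id column is generated automatically; the COPY lists only the file's columns.
create table aTable (
    id bigint identity(1, 1),
    x  varchar(100),
    y  varchar(100),
    z  varchar(100)
);

copy aTable (x, y, z)
from 's3://my-bucket/my-file.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
csv;

Note that Redshift IDENTITY values are guaranteed to be unique but not necessarily consecutive, so this won't reproduce the exact #{startIncrement}+row_number() numbering.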
I need the results below.
Table:
Order  postcode  qnty
123    2234      1
Expected result:
Order     123
Postcode  2234
Qnty      1
SQL Server:
select pvt.element_name, pvt.element_value
from (select [order], postcode from [table name]) up
unpivot (element_value for element_name in ([order], postcode)) as pvt;
How can I achieve this in DB2?
Db2 for IBM i doesn't have a built-in unpivot function. AFAIK, it's not available on any Db2 platform, unless it's been added recently.
The straightforward method:
select 'ORDER' as key, "ORDER" as value
from mytable
UNION ALL
select 'POSTCODE', postcode
from mytable
UNION ALL
select 'QNTY', char(qnty)
from mytable;
A better-performing method is to do a cross join between the source table and a correlated VALUES clause with as many rows as there are columns to unpivot.
select
Key, value
from mytable T,
lateral (values ('ORDER', t."ORDER")
, ('POSTCODE', t.postcode)
, ('QNTY', varchar(t.qnty))
) as unpivot(key, value);
However, you'll need to know ahead of time which columns you're unpivoting.
If you don't know them, there are some ways to unpivot with XMLTABLE (or possibly JSON_TABLE) that might work. I've never used them, and I'm out of time to spend answering this question. You can find some examples via Google.
I have created a stored procedure for LUW that rotates a table:
https://github.com/angoca/db2tools/blob/master/pivot.sql
You just need to call the stored procedure, passing the table name as a parameter, and it will return a cursor with the column headers in the first column.
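A hypothetical invocation; the procedure name and signature are assumptions based on the file name pivot.sql, so check the script for the actual ones:

-- Assumed call: pass the table name and read the returned cursor.
CALL PIVOT('MYTABLE');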
I want to copy some data from one table to another, but I don't want to copy all the rows, just a few of them (like only the first 100).
I didn't find an option in the COPY command for this.
So is that possible or not?
Thank you in advance.
You didn't look very hard.
https://www.postgresql.org/docs/current/static/sql-copy.html
To copy into a file just the countries whose names start with 'A':
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO
'/usr1/proj/bray/sql/a_list_countries.copy';
So you would want COPY (SELECT ... LIMIT 100).
I'm assuming you want to copy between databases or via an intermediate file or something; otherwise you just want to use INSERT ... SELECT as shown in another answer.
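For instance, a minimal sketch of the file route; the table, ordering column, and output path are assumptions:

-- Export only the first 100 rows (by id) of a hypothetical table to a file.
COPY (SELECT * FROM source_table ORDER BY id LIMIT 100)
TO '/tmp/first_100_rows.copy';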
You can use
INSERT INTO table2(col_list)
SELECT col_list FROM table1 WHERE conditions ORDER BY order_col_list LIMIT X;
You can limit the number of rows with COPY (SELECT ... LIMIT) TO, but not with COPY ... FROM. You did not find such an option in the docs because there is none.
use insert into ... select from ... limit 100 instead
I'm trying to get the whole result of a query into a variable, so I can loop through it and make inserts.
I don't know if it's possible.
I'm new to Postgres and procedures, so any help will be very welcome.
Something like:
declare result (I don't know what kind of data type I should use to get a query);
select into result label, number, desc from data
Thanks in advance!
I think you should read the PostgreSQL documentation about cursors.
But if you just want to insert data from one table to another, you can do:
insert into data2 (label, number, "desc")
select label, number, "desc"
from data
If you want to "save" the data from a query, you can also use a temporary table, which you can create with a usual create table or with create table as:
create temporary table temp_data as
(
select label, number, "desc"
from data
)
see documentation
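And since the question mentions looping over the result to make inserts, here is a minimal PL/pgSQL sketch of that pattern; the table and column names come from the question, and data2 is the target table used above (a plain INSERT ... SELECT, as above, is usually preferable):

do $$
declare
    rec record;  -- holds one row of the query result at a time
begin
    for rec in
        select label, number, "desc" from data
    loop
        insert into data2 (label, number, "desc")
        values (rec.label, rec.number, rec."desc");
    end loop;
end;
$$;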
I have a dataset with the following schema, which I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when I use the keyword DISTINCT in the Hive query, the DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to call_id. Something along the lines of:
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However, this is not valid in Hive (I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with the same call_id, since the other columns differ and each row as a whole is distinct.
NOTE: There is no arithmetic relation between a,b,c,x,y,z, etc. So any trick of averaging or summing is not viable.
Any ideas how I can do this?
One quick idea, not the best one, but it will do the job:
hive> create table temp1 (a int, b string);
hive> insert overwrite table temp1
      select call_id, max(concat(stat1,'|',stat2,'|',stat3)) from intable group by call_id;
hive> insert overwrite table intable
      select a, split(b,'\\|')[0], split(b,'\\|')[1], split(b,'\\|')[2] from temp1;
(Note that split() takes a regular expression, so the pipe delimiter has to be escaped as '\\|'.)
"I want to apply the DISTINCT operation only to the call_id"
But how will then Hive know which row to eliminate?
Without knowing the amount of data / the size of the stat fields you have, the following query can do the job:
select distinct i1.call_id, i1.stat2, i1.stat3
from (
    select call_id, min(concat(stat1, stat2, stat3)) as smin
    from intable
    group by call_id
) i2
join intable i1
  on i1.call_id = i2.call_id
 and concat(i1.stat1, i1.stat2, i1.stat3) = i2.smin;
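As a side note, a hedged alternative sketch, assuming a Hive version with windowing functions (0.11 or later): keep one arbitrary row per call_id with row_number().

select call_id, stat2, stat3
from (
    select call_id, stat2, stat3,
           row_number() over (partition by call_id order by stat1) as rn
    from intable
) t
where rn = 1;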
I've got two SQL Server 2008 tables: one is an "Import" table containing new data and the other a "Destination" table with the live data. Both tables are similar but not identical (there are more columns in the Destination table, updated by a CRM system), but both tables have three "phone number" fields: Tel1, Tel2 and Tel3. I need to remove all records from the Import table where any of the phone numbers already exist in the destination table.
I've tried knocking together a simple query (just a SELECT to test with just now):
select t2.account_id
from ImportData t2, Destination t1
where
(t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
... but I'm aware this is almost certainly Not The Way To Do Things, especially as it's very slow. Can anyone point me in the right direction?
This query requires a little more than this information. If you want to write it in an efficient way, we need to know whether each load contains more duplicates or more new records. I assume that account_id is the primary key and has a clustered index.
I would use the temporary table approach: create a normalized table #tmp with an index on the phone number and account_id, like this:
SELECT Phone, account_id INTO #tmp
FROM
    (SELECT account_id, Tel1, Tel2, Tel3
     FROM Destination) p
UNPIVOT
    (Phone FOR TelCol IN (Tel1, Tel2, Tel3)) AS unpvt;
Create a nonclustered index on this table, with the phone number as the first column and the account id as the second. You can't escape one full table scan, so I assume you can scan the import table (probably the smaller one). Then just join with this table and use the NOT EXISTS qualifier as explained. Then, of course, drop the table after the processing.
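A rough sketch of those steps, under the assumptions above (the index name and delete criteria are illustrative):

-- Index the staging table on phone number, with account_id as the second key column.
CREATE NONCLUSTERED INDEX IX_tmp_Phone ON #tmp (Phone, account_id);

-- Remove the import rows whose numbers already appear in the destination.
DELETE i
FROM ImportData i
WHERE EXISTS (
    SELECT 1
    FROM #tmp t
    WHERE t.Phone <> ''
      AND t.Phone IN (i.Tel1, i.Tel2, i.Tel3)
);

-- Clean up the staging table afterwards.
DROP TABLE #tmp;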
luke
I am not sure about the performance of this query, but since I made the effort of writing it, I will post it anyway...
;with aaa(tel)
as
(
select Tel1
from Destination
union
select Tel2
from Destination
union
select Tel3
from Destination
)
,bbb(tel, id)
as
(
select Tel1, account_id
from ImportData
union
select Tel2, account_id
from ImportData
union
select Tel3, account_id
from ImportData
)
select distinct b.id
from bbb b
where b.tel in
(
select a.tel
from aaa a
intersect
select b2.tel
from bbb b2
)
Exists will short-circuit the query and not do a full traversal of the table like a join. You could refactor the where clause as well, if this still doesn't perform the way you want.
SELECT *
FROM ImportData t2
WHERE NOT EXISTS (
select 1
from Destination t1
where (t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
)
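And since the goal is to remove those rows from the import table, a hedged follow-up sketch of the corresponding DELETE, using the same tables and columns as above but inverted to EXISTS so it targets the rows whose numbers are already in the destination:

DELETE t2
FROM ImportData t2
WHERE EXISTS (
    select 1
    from Destination t1
    where (t2.Tel1!='' AND t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3))
       or (t2.Tel2!='' AND t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3))
       or (t2.Tel3!='' AND t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3))
);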