Situation
The output table out gets its values from one of the source tables, source1 or source2. All these tables have a unique id, and another table, mix, keeps the relation between these ids.
Problem
Given an update on source1 with ids sId, I'd like to insert values into out with new unique newIds and at the same time update the mix table with (newId, sId, source1).
Is it incorrect to think that doing this in two steps (below) is slower/less efficient?
Expectation
I'd like to insert data into the two tables at the same time, since they're not dependent on each other, and I want to know whether either insert fails. If I had my way, it would be something like:
SELECT
    s.id as sId, s.value as value, fn_uuid() as newId, 'source1' as sourceName
INTO
    [mix (newId, sId, sourceName), out (newId, value)]
FROM
    source1 s;
Current approach
SELECT fn_uuid() AS newId, id AS sId, 'source1' AS sourceName INTO mix FROM source1;
and then
SELECT m.newId, s.value INTO "out" FROM source1 s JOIN mix m ON s.id = m.sId;
assuming that fn_uuid() is defined as
CREATE OR REPLACE FUNCTION public.fn_uuid()
RETURNS character varying AS
$$
import uuid
return str(uuid.uuid1())
$$
LANGUAGE plpythonu VOLATILE;
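For reference, PostgreSQL 9.1+ can get close to the wished-for single statement with a data-modifying CTE: both inserts then live in one statement and succeed or fail together. A minimal sketch, assuming the table and column names above ("out" is quoted because OUT is a reserved word in Postgres):

WITH new_mix AS (
    -- first insert: generate the new ids and record the mapping
    INSERT INTO mix (newId, sId, sourceName)
    SELECT fn_uuid(), s.id, 'source1'
    FROM source1 s
    RETURNING newId, sId
)
-- second insert: reuse the ids returned by the first insert
INSERT INTO "out" (newId, value)
SELECT m.newId, s.value
FROM new_mix m
JOIN source1 s ON s.id = m.sId;

If fn_uuid() only ever generates a UUID, the built-in gen_random_uuid() (from pgcrypto, or core in PostgreSQL 13+) would avoid the plpythonu dependency.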
Related
I am trying to add the same data for a row into my table x number of times in PostgreSQL. Is there a way of doing that without manually entering the same values x number of times? I am looking for the equivalent of GO [count] in SQL Server, if such a thing exists in Postgres.
Use the function generate_series(), e.g.:
insert into my_table
select id, 'alfa', 'beta'
from generate_series(1,4) as id;
Idea
Produce a resultset of a given size and cross join it with the record that you want to insert x times. What would still be missing is the generation of proper PK values. A specific suggestion would require more details on the data model.
Query
The sample query below presupposes that your PK values are autogenerated.
CREATE TABLE test ( id SERIAL, a VARCHAR(10), b VARCHAR(10) );

INSERT INTO test (a, b)
WITH RECURSIVE Numbers(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1
    FROM Numbers
    WHERE i < 5 -- This is the value `x`
)
SELECT adhoc.*
FROM Numbers n
CROSS JOIN ( -- This is the single record to be inserted multiple times
    SELECT 'value_a' a
         , 'value_b' b
) adhoc;
Note / Reference
The solution is adapted from an existing answer, with minor modifications (there are a host of other solutions for generating x consecutive numbers with hierarchical / recursive SQL queries, so the choice of reference is somewhat arbitrary).
I am trying to insert multiple records, obtained from a join, into another table, user_to_property. In the user_to_property table, user_to_property_id is the primary key and not null; it is not auto-incrementing. So I am trying to set user_to_property_id manually, incrementing it by 1.
WITH selectedData AS
( -- selection of the data that needs to be inserted
    SELECT t2.user_id AS userId
    FROM property_lines t1
    INNER JOIN user t2 ON t1.account_id = t2.account_id
)
INSERT INTO user_to_property (user_to_property_id, user_id, property_id, created_date)
VALUES ((SELECT MAX(user_to_property_id) + 1 FROM user_to_property),
        (SELECT selectedData.userId FROM selectedData),
        3, now());
The above query gives me the below error:
ERROR: more than one row returned by a subquery used as an expression
How do I insert multiple records into a table from a join of other tables? The user_to_property table should contain a unique record per user_id and property_id combination; for the same pair there should be only one record.
Typically for INSERT you use either VALUES or SELECT. The structure VALUES (SELECT ...) often (generally?) just causes more trouble than it is worth, and it is never necessary: you can always SELECT a constant or an expression. In this case, convert to just SELECT. For generating your ID, get the max value from your table and then add the row_number of the row that you are inserting:
insert into user_to_property ( user_to_property_id
                             , user_id
                             , property_id
                             , created_date
                             )
with start_with(current_max_id) as
   ( select max(user_to_property_id) from user_to_property )
select current_max_id + id_incr, user_id, 3, now()
from ( select t2.user_id
            , row_number() over() as id_incr
       from property_lines t1
       join users t2 on t1.account_id = t2.account_id
     ) js
join start_with on true;
A couple of notes:
1) DO NOT use user as a table name, or as any other object name. It is a documented reserved word in both Postgres and the SQL standard (and has been since at least Postgres v7.1 and the SQL-92 standard).
2) You really should create another column, or change the column type of user_to_property_id, so the value is auto-generated (see the sketch below). Using max() + 1, or anything based on that idea, is a virtual guarantee that you will generate duplicate keys, much to the amusement of users and developers alike: consider what happens under MVCC when two users run the query concurrently.
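A minimal sketch of that change, assuming PostgreSQL 10+ and that user_to_property_id is an integer column; the start value below is an assumption and should be any value above the current maximum:

ALTER TABLE user_to_property
    ALTER COLUMN user_to_property_id
    ADD GENERATED BY DEFAULT AS IDENTITY
    (START WITH 1000); -- assumed start; pick a value above MAX(user_to_property_id)

With that in place, concurrent inserts draw their ids from the backing sequence and cannot collide the way max() + 1 can.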
I have two queries, one using UNION ALL and one inserting into a temp table.
Query 1
select *
from (
    select a.id as id, a.name as name from a
    union all
    select b.id as id, b.name as name from b
) t; -- the derived table needs an alias in Postgres
Query 2
drop table if exists temporary;
create temp table temporary as
select id as id, name as name
from a;
insert into temporary
select id as id, name as name
from b;
select * from temporary;
Which one is better for performance?
I would expect the second option to have better performance, at least at the database level. Both versions require a full table scan of both the a and b tables, but the first version materializes an unnecessary intermediate result set, used only for the purpose of the insert.
The only potential issue with doing two separate inserts is latency, i.e. the time it might take some process to get to and from the database. If you are worried about this, then you can limit to one insert statement:
INSERT INTO temporary (id, name)
SELECT id, name FROM a
UNION ALL
SELECT id, name FROM b;
This would just require one trip to the database.
I think UNION ALL is the better-performing way, but I am not sure; you can try it yourself. The run tab of a SQL client always shows the time taken. I took a snapshot in Oracle; MySQL and SQL Server have the same kind of tool.
I have the following dataset, which I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when I use the keyword DISTINCT in the Hive query, the DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to the call_id, something along the lines of:
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However, this is not valid in Hive (I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with same call_id as the other columns are different and the row on the whole is distinct.
NOTE: There is no arithmetic relation between a,b,c,x,y,z, etc. So any trick of averaging or summing is not viable.
Any ideas how I can do this?
One quick idea, not the best one, but it will do the job:
hive> create table temp1 (a int, b string);
hive> insert overwrite table temp1
      select call_id, max(concat(stat1, '|', stat2, '|', stat3)) from intable group by call_id;
hive> insert overwrite table intable
      select a, split(b, '\\|')[0], split(b, '\\|')[1], split(b, '\\|')[2] from temp1;
"I want to apply the DISTINCT operation only to the call_id"
But how will Hive then know which row to eliminate?
Without knowing the amount of data / the size of the stat fields you have, the following query can do the job:
select distinct i1.call_id, i1.stat2, i1.stat3
from (
    select call_id, min(concat(stat1, stat2, stat3)) as smin
    from intable
    group by call_id
) i2
join intable i1
    on i1.call_id = i2.call_id
   and concat(i1.stat1, i1.stat2, i1.stat3) = i2.smin;
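If your Hive version has windowing functions (Hive 0.11+), a sketch of the same deduplication with row_number() is shorter. It keeps one arbitrary row per call_id; the ORDER BY stat1 tiebreaker is an assumption, since any deterministic ordering works:

select call_id, stat2, stat3
from (
    select call_id, stat2, stat3,
           row_number() over (partition by call_id order by stat1) as rn -- rank rows within each call_id
    from intable
) t
where rn = 1; -- keep exactly one row per call_id

This also avoids the corner case where concat() of different stat combinations happens to produce the same string.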
I have a DataTable with Id (Guid) and Name (string) columns. I traverse the DataTable and run validation criteria on the Name (say, it should contain only letters and numbers), and then add the corresponding Id to a List if the Name passes the validation.
Something like below:
List<Guid> validIds = new List<Guid>();
foreach (DataRow row in DataTable1.Rows)
{
    if (IsValid((string)row["Name"]))
    {
        validIds.Add((Guid)row["Id"]);
    }
}
In addition to this validation, I should also check that the Name is not repeated anywhere in the DataTable (comparing case-insensitively); if it is repeated, I should not add the corresponding Id to the List.
Things I am thinking about / have thought about:
1) I can keep another List of the Names seen so far and check each "Name" against it; if it already exists there, I skip the corresponding Guid.
2) I cannot use a HashSet directly, as that would treat "Test" and "test" as different strings and not as duplicates.
3) Copy the DataTable into another one holding only the distinct Names (I haven't tried this, and the code might be incorrect; please correct me wherever possible):
DataTable1.CaseSensitive = false; // make the distinct comparison treat "Test" and "test" as equal
DataTable copiedDataTable = DataTable1.DefaultView.ToTable(true, "Name");
I would then loop through the original DataTable and check for the existence of the Name in copiedDataTable; if it exists, I won't add the Id to the List.
Is there a better, more optimal way to achieve the same? I always need to think about performance. Although there are many related questions on SO, I didn't find a problem similar to this one; if you could point me to a similar question, that would be helpful.
EDIT: The number of records might vary from 2000 to 3000.
Thanks
If you are looking to prevent duplicates, it may be grueling work, and I don't know how many records you are dealing with at a time... If it is a small set, I'd consider running a query before each attempted insert from your LIVE source, based on:
select COUNT(*) as CountOnFile
from ProductionTable
where UPPER(name) = UPPER(<name from live data>);

If the resulting CountOnFile > 0, don't add the record.
If you are dealing with a large dataset, like a bulk import, I would pull all the data into a temp table, then do a query using NOT IN... something like:
create table OkToBeAdded as
    select distinct upper( TempTable.Name ) as Name, GUID
    from TempTable
    where upper( TempTable.Name )
          not in ( select upper( LiveTable.Name )
                   from LiveTable
                   where upper( TempTable.Name ) = upper( LiveTable.Name ) );

insert into LiveTable ( Name, GUID )
select Name, GUID from OkToBeAdded;
Obviously, this SQL is a sample and would need to be adjusted for your specific back-end source.
/* I did this entirely in SQL and avoided ADO.NET */
/* I pass the CSV of valid object Ids and split it into a table */
DECLARE @TableTemp TABLE
(
    TempId uniqueidentifier
);
INSERT INTO @TableTemp
SELECT CAST(Data AS uniqueidentifier) AS ID
FROM dbo.Split1(@ValidObjectIdsAsCSV, ',');

/* Self-join with Table1 to detect duplicate names and update the column value only where the name is not duplicated */
UPDATE A
SET IsValidated = 1
FROM Table1 AS A
INNER JOIN @TableTemp AS Temp ON A.ID = Temp.TempId
WHERE NOT EXISTS (SELECT B.Name, COUNT(B.Name)
                  FROM Table1 AS B
                  WHERE A.Name = B.Name
                  GROUP BY B.Name
                  HAVING COUNT(B.Name) > 1);