Redshift: Short cut to transform column values into a table

Redshift: Short cut to transform column values into a table - amazon-redshift

I have a need to solve an table transformation problem and I am looking for any short cuts in Redshift.
I have a table that contains a unique identity, a property_name, and a property_value as columns:
{{uid=1, property_name=A, property_value =AA}, {uid=1, property_name=B, property_value=BB}}
I need to transform this into a table where the unique_id and property_names are the columns and populate it with the associated property_values. In the example above the table would be transformed as so:
{{uid=1, A=AA, B=BB}}
Are there any redshift shortcuts anyone is aware of that would make this as easy as possible?
Thanks.

Not sure if I understood your question correctly, but below query looks like a solution to me
select uid, property_name + ' = ' + property_value from Your_Table group by uid, property_name , property_value
If I’ve made a bad assumption please comment and I’ll refocus my answer.

Related

How to save a query result into a column using postgresql

This may be a simple fix but Im somehow drawing a blank. I have this code below, and I want the results that I got from it to be added into their own column in an existing table. How would i go about doing this.
Select full_name, SUM(total) as sum_sales
FROM loyalty
where invoiceyear = 2013
GROUP BY full_name
order by sum_sales DESC;
This leaves me with one column with the name of employee and the second with their sales from that year.
How can i just take these results and add them into a column in addition to the table
Is it as simple as...
Alter table loyalty
Add column "2013 sales"
and then add in some sort of condition?
Any help would be greatly appreciated!

If i got your question right, you need to first alter the table allowing the new field to be null (you can change it later on) then you could use an insert clause to store the value permanently.

TSQL order by but first show these

I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first showing all international items, next by names who dont start with international.
My question is not about adding columns to make this work, or use multiple DB's, or a largerTSQL script to clone a DB into a new order.
I just wonder if anything after 'Where or order by' can be tricked to do this.

You can use expressions in the ORDER BY:
Select * From MyTable
order by
CASE
WHEN name like 'international%' THEN 0
ELSE 1
END,
id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)

Another way (slightly cleaner and a tiny bit faster)
-- Sample Data
DECLARE #mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT #mytable([name])
VALUES('international something' ),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM #mytable AS t
ORDER BY -PATINDEX('international%', t.[name]);
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.

ST_contains taking too much time

I am trying to match latitude/longitude to a particular neighbor location using below query
create table address_classification as (
select distinct buildingid,street,city,state,neighborhood,borough
from master_data
join
Borough_GEOM
on st_contains(st_astext(geom),coordinates) = 'true'
);
In this, coordinates is of below format
ST_GeometryFromText('POINT('||longitude||' '||latitude||')') as coordinates
and geom is of column type geometry.
i have already created indexes as below
CREATE INDEX coordinates_gix ON master_data USING GIST (coordinates);
CREATE INDEX boro_geom_indx ON Borough_GEOM USING gist(geom);
I have almost 3 million records in the main table and 200 geometric information in the GEOM table. Explain analyze of the query is taking so much time (2 hrs).
Please let me know, how can i optimize this query.
Thanks in advance.

As mentioned in the comments, don't use ST_AsText(): that doesn't belong there. It's casting the geom to text, and then going back to geom. But, more importantly, that process is likely to fumble the index.
If you're unique on only column then use DISTINCT ON, no need to compare the others.
If you're unique on the ID column and your only joining to add selectivity then consider using EXISTS. Do any of these columns come from the borough_GEOM other than geom?
I'd start with something like this,
CREATE TABLE address_classification AS
SELECT DISTINCT ON (buildingid),
buildingid,
street,
city,
state,
neighborhood,
borough
FROM master_data
JOIN borough_GEOM
ON ST_Contains(geom,coordinates);

Redshift Copy and row_number function

Is there a way to make a Redshift Copy while at the same time generating the row_number() within the destination table?
I am basically looking for the equivalent of the below except that the group of rows does not come from a select but from a copy command for a file on S3
insert into aTable
(id,x,y,z)
select
#{startIncrement}+row_number() over(order by x) as id,
x,y,z,
from anotherTable;
Thx

I understand from your question is that, you need to insert an additional column id into table and that id was not there in the csv file. If my understanding is right, please follow the below approach,
Copy the data from the csv file to a temp table say "aTableTemp" which has the schema without column "id". Then insert data from "aTableTemp" into "aTable" as follows
Insert into aTable
Select #{startIncrement}+row_number() over(order by x) as id, * from aTableTemp
I hope this should solve your problem

Maybe copy into a table with a identity column just don’t copy into that field?

T-SQL: Find column match within a string (LIKE but different)

Server: SQL Server 2008 R2
I apologize in advance, as I'm not sure of the best way to verbalize the question. I'm receiving a string of email addresses and I need to see if, within that string, any of the addresses exist as a user already. The query that obviously doesn't work is shown below, but hopefully it helps to clarify what I'm looking for:
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress LIKE '%user1#domain.com,user2#domain.com%'
I was hoping SQL had an "InString" operator, that would check for matches "within the string", but I my Google abilities must be weak today.
Any assistance is greatly appreciated. If there simply isn't a way, I'll have to dig in and do some work in the codebehind to split each item in the string and search on each one.
Thanks in advance,
Beems

Split the input string and use IN clause
to split the CSV to rows use this.
SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1#domain.com,user2#domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)
Now use the above query in where clause.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN(SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1#domain.com,user2#domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a))
Or use can use Inner Join
SELECT f_emailaddress
FROM tb_users A
JOIN (SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1#domain.com,user2#domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) B
ON a.f_emailaddress = b.f_emailaddress

You first need to split the CSV list into a temp table and then use that to INNER JOIN with your existing table, as that will act as a filter.
You cannot use CONTAINS unless you have created a Full Text index on that table and column, which I doubt is the case here.
For example:
CREATE TABLE #EmailAddresses (Email NVARCHAR(500) NOT NULL);
INSERT INTO #EmailAddress (Email)
SELECT split.Val
FROM dbo.Splitter(#IncomingListOfEmailAddresses);
SELECT usr.f_emailaddress
FROM tb_users usr
INNER JOIN #EmailAddresses tmp
ON tmp.Email = usr.f_emailaddress;
Please note that the reference to "dbo.Splitter" is a placeholder for whatever string splitter you already have or might get. Please do not use any splitter that makes use of a WHILE loop. The best options are either the SQLCLR- or XML- based ones. The XML-based ones are generally fast but do have some issues with encoding if the string to be split has special XML characters such as &, <, or ". If you want a quick and easy SQLCLR-based splitter, you can download the Free version of the SQL# library (which I am the creator of, but this feature is in the free version) which contains String_Split and String_Split4k (for when the input is always <= 4000 characters).

SQL has a CONTAINS and an IN function. You can use either of those to accomplish your task. Click on either for more information via MSDNs website! Hope this helps.
CONTAINS
CONTAINS will look to see if any values in your data contain the entire string you provided. Kind of similar in presentations to LIKE '%myValue%';
SELECT f_emailaddress
FROM tb_users
WHERE CONTAINS (f_emailaddress, 'user1#domain.com');
IN
IN will return matches for any values in the provided comma delimited list. They need to be exact matches however. You can't provide partial terms.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN ('user1#domain.com','user2#domain.com')
As far as splitting each of the values out into separate strings, have a look at the StackOverflow question found HERE. This might point you in the proper direction.

You can try like this(not tested).
Before using this, make sure that you have created a Full Text index on that table and column.
Replace your comma with AND then
SELECT id,email
FROM t
where CONTAINS(email, 'user1#domain.com and user2#domain.com');

--prepare temp table for testing
DECLARE #tb_users AS TABLE
(f_emailaddress VARCHAR(100))
INSERT INTO #tb_users
( f_emailaddress)
VALUES ( 'user1#domain.com' ),
( 'user2#domain.com' ),
( 'user3#domain.com' ),
( 'user4#domain.com' )
--Your query
SELECT f_emailaddress
FROM #tb_users
WHERE 'user1#domain.com,user2#domain.com' LIKE '%' + f_emailaddress + '%'