SAS: Concat columns from different tables (same number of rows) - merge

I have two tables with the same number of rows but no column I could use to join them together. For example:
data table1(keep=Model) table2(keep=MSRP);
set sashelp.cars;
run;
How could I concatenate table1 and table2 so that their columns end up side by side?
In Python I would do it as pandas.concat([table1, table2], axis=1), but here everything I try, like:
data concated;
set table1 table2;
run;
or
proc sql;
create table joined as
select * from table1
union
select * from table2;
delete from joined where Model is missing or MSRP is missing;
run;
but especially the second one gave me an error:
ERROR: Column 1 from the first contributor of UNION is not the same
type as its counterpart from the second.
So if I understand correctly, I cannot have this kind of join with different types of variables.
Thanks!

You can use a merge statement without any by statement to get a row-by-row matching of the observations from two or more datasets.
data concated;
merge table1 table2;
run;
You could also just use separate set statements for each dataset.
data concated;
set table1;
set table2;
run;
The difference appears when the two datasets have different numbers of observations. With merge, the number of observations will match that of the larger dataset (the variables contributed only by the smaller dataset will retain their last values). With the set statements, the result will only have the number of observations in the smaller dataset: the step ends when either of the set statements reads past the end of its input dataset.
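A tiny sketch of that difference, using two toy datasets with three and five observations:
data a;
do x = 1 to 3; output; end;
run;
data b;
do y = 1 to 5; output; end;
run;
data via_merge; /* 5 observations; x keeps its last value (3) once A is exhausted */
merge a b;
run;
data via_set; /* 3 observations; the step stops when SET A reads past the end of A */
set a;
set b;
run;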

To do something like your query, you need to add a row number as shown below, and then join and delete.
data table1(keep=Model var) table2(keep=MSRP var);
set sashelp.cars;
var = _n_;
run;
proc sql;
create table joined(drop=var) as
select a.*, b.* from table1 a
full join table2 b
on a.var = b.var;
delete from joined where Model is missing or MSRP is missing;
quit;

Related

How to do a SELECT * followed by a join in SEA-ORM

I want to do a join with another table. I followed the tutorial on the site and my code compiles, but it's not performing the join and instead just selects the first table.
SELECT
"table1.col1",
"table1.col2",
"table1.col3"
FROM
"table1"
JOIN "table2" ON "table1"."col1" = "table2"."col1"
LIMIT
1
It is only returning the data from table1 and not concatenating the columns where the condition for table1 and table2 is met.
I execute the query using the following code:
Entity::find()
.from_raw_sql(Statement::from_string(DatabaseBackend::Postgres, query.to_owned()))
.all(&self.connection)
.await?
That returns a Vec<Model>. Is this the correct way? Also, how can I build a SQL statement using an Entity as the base which looks like SELECT * from "table1".
After SELECT (and before FROM) you specify which columns to include in the output, and you are selecting only three columns from table1 in your code. Add the columns you want to include from table2 there, and you may get the results you want.
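For example (the table2 column names below are placeholders for whatever you actually need; note that each part of a qualified name is quoted separately in PostgreSQL):
SELECT
"table1"."col1",
"table1"."col2",
"table1"."col3",
"table2"."col2",
"table2"."col3"
FROM
"table1"
JOIN "table2" ON "table1"."col1" = "table2"."col1"
LIMIT 1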

Is it a function or a table in this LATERAL join case?

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:
SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
All of the above is extracted from the PostgreSQL manual. LINK
Now I think I finally get what LATERAL means. In this case, I am not sure whether get_product_names is a table or a function. The following is my understanding:
A: get_product_names(m.id) is a function that takes m.id as an input parameter and returns a table, which is aliased as pname. Overall, table m is joined to that result, and the WHERE condition keeps the rows where it is NULL.
B: get_product_names is a table; table m is LEFT JOINed to get_product_names on m.id, and pname is an alias for get_product_names. Overall, table m is joined to that table, and the WHERE condition keeps the rows where it is NULL.
get_product_names is a table function (also known as set returning function or SRF in PostgreSQL slang). Such a function does not necessarily return a single result row, but arbitrarily many rows.
Since the result of such a function is a table, you typically use it in SQL statements where you would use a table, that is in the FROM clause.
A simple example is
SELECT * FROM generate_series(1, 5);
generate_series
-----------------
1
2
3
4
5
(5 rows)
You can also use normal functions in this way; they are then treated as table functions that return exactly one row.
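For example, an ordinary scalar function such as upper() can be placed in the FROM clause and behaves like a table with a single row:
SELECT * FROM upper('hello');
 upper
-------
 HELLO
(1 row)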

How to combine non-matching datasets horizontally in SAS/SQL?

I work in SAS writing both SAS base and SQL statements.
My problem is, that I have two datasets I want to combine horizontally.
The one data set is called Code and have one variable: Code_value
It has 55 observations, with no duplicate values.
The other data set is called Persons and have one variable: Person_id
It has 167 unique person_id values.
I want to create a dataset where I join these datasets; there are no matching values in the two datasets.
I want to force the datasets together, so that for each person_id there is a row with every code_value.
So I end up with these value combinations:
Code1 Pid1
Code1 Pid2
Code1 Pid3
...
Code2 Pid1
Code2 Pid2
Code2 Pid3
...and so on, ending up with a dataset with 2 variables and 9,185 (55 x 167) rows in total.
I have tried a data step with merge, and also tried to write SQL with a full join, but with no luck.
Can anyone help?
Kind Regards
Maria
This is known as a cross join. I prefer to write the CROSS JOIN explicitly.
proc sql;
create table want as
select *
from code
CROSS JOIN
persons;
quit;
Or without any explicit join specification:
proc sql;
create table want as
select *
from code, persons;
quit;
Both should give you the same answer.
The ON condition for the join should be 1=1. This will cause all rows in one to match all rows in two.
Example, 3 rows in one, 5 rows in two, 15 rows in crossings:
data one;
do i = 1 to 3;
output;
end;
run;
data two;
do j = 1 to 5;
output;
end;
run;
proc sql;
create table crossings as
select *
from one full join two on 1=1
;
quit;
If there are any column names in common, you should either rename them or coalesce() them.
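A minimal sketch of the rename approach, assuming (hypothetically) that both input tables had a column named id:
proc sql;
create table crossings as
select a.id as id_one, b.id as id_two
from one a full join two b on 1=1;
quit;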

Hive: How to do a SELECT query to output a unique primary key using HiveQL?

I have the following dataset, which I want to transform into a table that can be exported to SQL. I am using Hive. The input is as follows:
call_id,stat1,stat2,stat3
1,a,b,c,
2,x,y,z,
3,d,e,f,
1,j,k,l,
The output table needs to have call_id as its primary key so it needs to be unique. The output schema should be
call_id,stat2,stat3,
1,b,c, or (1,k,l)
2,y,z,
3,e,f,
The problem is that when I use the keyword DISTINCT in the Hive query, the DISTINCT applies to all the columns combined. I want to apply the DISTINCT operation only to the call_id. Something along the lines of
SELECT DISTINCT(call_id), stat2,stat3 from intable;
However, this is not valid in Hive (I am not well-versed in SQL either).
The only legal query seems to be
SELECT DISTINCT call_id, stat2,stat3 from intable;
But this returns multiple rows with the same call_id, since the other columns are different and the row as a whole is distinct.
NOTE: There is no arithmetic relation between a, b, c, x, y, z, etc., so any trick of averaging or summing is not viable.
Any ideas how I can do this?
One quick idea, not the best one, but it will do the work:
hive> create table temp1(a int, b string);
hive> insert overwrite table temp1
select call_id, max(concat(stat1,'|',stat2,'|',stat3)) from intable group by call_id;
hive> insert overwrite table intable
select a, split(b,'\\|')[0], split(b,'\\|')[1], split(b,'\\|')[2] from temp1;
(Note that split() takes a regular expression, so the '|' separator has to be escaped as '\\|'.)
Regarding "I want to apply the DISTINCT operation only to the call_id": how will Hive then know which row to eliminate?
Without knowing the amount of data / size of the stat fields you have, the following query can do the job:
select distinct i1.call_id, i1.stat2, i1.stat3 from (
select call_id, MIN(concat(stat1, stat2, stat3)) as smin
from intable group by call_id
) i2 join intable i1 on i1.call_id = i2.call_id
AND concat(i1.stat1, i1.stat2, i1.stat3) = i2.smin;

Updating multiple rows in one table based on multiple rows in a second

I have two tables, table1 and table2, both of which contain columns that store PostGIS geometries. What I want to do is see where the geometry stored in any row of table2 intersects with the geometry stored in any row of table1, and update a count column in table1 with the number of intersections. For example, if a geometry in row 1 of table1 intersects with the geometries stored in 5 rows of table2, I want to store a count of 5 in a separate column of table1. The tricky part for me is that I want to do this for every row of table1 at the same time.
I have the following:
UPDATE circles SET intersectCount = intersectCount + 1 FROM rectangles
WHERE ST_INTERSECTS(circles.geom, rectangles.geom);
...which doesn't seem to be working. I'm not too familiar with postgres (or sql in general) and I'm wondering if I can do this all in one statement or if I need a few. I have some ideas for how I would do this with multiple statements (or using for loop) but I'm really looking for a concise solution. Any help would be much appreciated.
Thanks!
something like:
update t1 set ctr=helper.cnt
from (
select t1.id, count(*) as cnt
from t1, t2
where st_intersects(t1.col, t2.col)
group by t1.id
) helper
where helper.id=t1.id
?
btw: Your version does not work, because a row can get updated only once in a single update statement.
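Applied to the tables in the question, the same pattern would look roughly like this (a sketch, assuming circles has a unique id column, which the question does not show; circles with no intersecting rectangles are simply left untouched):
UPDATE circles
SET intersectCount = helper.cnt
FROM (
    -- count intersecting rectangles per circle
    SELECT c.id, count(*) AS cnt
    FROM circles c
    JOIN rectangles r ON ST_Intersects(c.geom, r.geom)
    GROUP BY c.id
) helper
WHERE helper.id = circles.id;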