I have a table in Spark which looks like below:
Table1
col1 string
col2 int
col3 string
col4 int
col5 string
I have another table which looks like below:
Table2
col1 string
col2 int
col3 string
I want to dynamically read the schema of table1 and alter the schema of table2, so that the schemas of both tables match (same column names and data types).
So finally table2 should look like below:
Table2
col1 string
col2 int
col3 string
col4 int
col5 string
Is it possible to achieve this using PySpark?
Yes. You can read your table into a DataFrame and then get its schema using:
tableSchema = yourDataFrame.schema
You can also get it as a list of the table's fields:
tableSchema = yourDataFrame.schema.fields
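Reading the schema is only half of the question; the alignment step still has to add the missing columns to table2. A minimal sketch of the diff logic, assuming the schemas from the question — shown here over plain (name, type) pairs, with the PySpark call that would apply the result noted in a comment:

```python
# Sketch: find which columns of table1's schema are missing from table2.
# In PySpark these pairs would come from
#   [(f.name, f.dataType.simpleString()) for f in df.schema.fields]
table1_schema = [("col1", "string"), ("col2", "int"), ("col3", "string"),
                 ("col4", "int"), ("col5", "string")]
table2_schema = [("col1", "string"), ("col2", "int"), ("col3", "string")]

existing = {name for name, _ in table2_schema}
missing = [(name, dtype) for name, dtype in table1_schema
           if name not in existing]

# In PySpark, each missing column could then be added as a typed null, e.g.:
#   from pyspark.sql.functions import lit
#   df2 = df2.withColumn(name, lit(None).cast(dtype))
print(missing)  # -> [('col4', 'int'), ('col5', 'string')]
```

The same diff also catches type mismatches if you compare the pairs rather than just the names.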
I have the following table in Postgres 11.
col1  col2  source    col3
a     abc   curation  rejected
a     abc   DB
b     etg   DB        accepted
c     jfh   curation
How can I assign a value to col3 based on the values in col1 and col2?
The expected output is:
col1  col2  source    col3
a     abc   curation  rejected
a     abc   DB        rejected
b     etg   DB        accepted
c     jfh   curation  null
Is there a way to check whether the values in col1 and col2 in subsequent rows are identical and, if so, assign the same col3 value to all such rows (where col3 in the other row is null)?
Any help is highly appreciated.
It's not entirely clear what the criteria are, but at a basic level it depends on how you want to query this data; there are multiple ways you could do this.
Generated Columns
drop table if exists atable;
CREATE TABLE atable (
    cola text,
    colb text GENERATED ALWAYS AS (
        case when cola = 'a' then 'rejected' else null end
    ) STORED
);
insert into atable(cola) values ('a');
A View.
create or replace view aview as
select cola, case when cola='a' then 'rejected' else null end as colb
from atable;
Both would yield the same results.
cola|colb |
----+--------+
a |rejected|
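The view variant can be sketched runnably with SQLite via Python's stdlib sqlite3 — SQLite stands in for Postgres here (the answer targets Postgres 11, but the CASE logic is identical), and the table/view names follow the answer:

```python
import sqlite3

# Demonstrate the view approach; SQLite stands in for Postgres,
# 'atable'/'aview' follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (cola TEXT)")
conn.execute("INSERT INTO atable (cola) VALUES ('a'), ('b')")
conn.execute("""
    CREATE VIEW aview AS
    SELECT cola,
           CASE WHEN cola = 'a' THEN 'rejected' ELSE NULL END AS colb
    FROM atable
""")
rows = conn.execute("SELECT cola, colb FROM aview ORDER BY cola").fetchall()
print(rows)  # -> [('a', 'rejected'), ('b', None)]
```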
Other options could be a materialized view or simple query logic.
You have options.
update a2 set
    col3 = case when col1 = 'a' then 'rejected'
                when col1 = 'b' then 'accepted'
                when col1 = 'c' then null
           end
where col3 is null
returning *;
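The hard-coded CASE above doesn't generalize to new data. For the pattern actually described in the question — copy col3 from the row with the same col1/col2 pair that has a value — a correlated-subquery UPDATE is one generic option. A sketch using SQLite via Python's sqlite3 (SQLite stands in for Postgres, and the table name t is hypothetical; the UPDATE itself is valid in both):

```python
import sqlite3

# Fill a null col3 from another row with the same (col1, col2) pair.
# SQLite stands in for Postgres; table name 't' is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, source TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("a", "abc", "curation", "rejected"),
     ("a", "abc", "DB", None),
     ("b", "etg", "DB", "accepted"),
     ("c", "jfh", "curation", None)],
)
conn.execute("""
    UPDATE t
    SET col3 = (SELECT t2.col3 FROM t t2
                WHERE t2.col1 = t.col1 AND t2.col2 = t.col2
                  AND t2.col3 IS NOT NULL)
    WHERE col3 IS NULL
""")
rows = conn.execute("SELECT col1, col3 FROM t ORDER BY col1, source").fetchall()
print(rows)  # -> [('a', 'rejected'), ('a', 'rejected'), ('b', 'accepted'), ('c', None)]
```

The c/jfh row keeps its null because no sibling row supplies a value, matching the expected output in the question.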
You can also use triggers. Note that generated columns are only available from Postgres 12, so you would need to upgrade to use them.
I have a recordset (one row only):
Col1 Col2 ... Coln
Val1 Val2 ... Valn
I need to transpose it:
Field Value
Col1 Val1
Col2 Val2
... ...
Coln Valn
Can someone help me?
Thanks
Something like:
SELECT Field = 'Col1', Value = Val1
FROM TableA
UNION ALL
SELECT Field = 'Col2', Value = Val2
FROM TableA
UNION ALL
...
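The pattern above uses SQL Server's alias = expression syntax; with standard AS aliases the same transpose can be sketched runnably in SQLite via Python's sqlite3 (TableA and its columns are hypothetical, following the question):

```python
import sqlite3

# Transpose one row into (Field, Value) pairs with UNION ALL.
# Standard 'AS' aliases replace the SQL Server 'alias = expr' form;
# TableA and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Col1 TEXT, Col2 TEXT)")
conn.execute("INSERT INTO TableA VALUES ('Val1', 'Val2')")
rows = conn.execute("""
    SELECT 'Col1' AS Field, Col1 AS Value FROM TableA
    UNION ALL
    SELECT 'Col2', Col2 FROM TableA
""").fetchall()
print(rows)
```

One UNION ALL branch per column is needed, so for many columns it's usually worth generating the query from the column list (or using the engine's UNPIVOT/CROSS APPLY feature where available).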
create table test (
col1 varchar(20),
col2 varchar(20)
)
When col1 has value '1', col2 cannot be null.
When col1 has any other value, col2 can be null.
Is there a way to write a check constraint based on the values of particular columns?
You can write a table-level constraint, sure.
CREATE TABLE test (
col1 VARCHAR(20),
col2 VARCHAR(20),
CHECK (col1 != '1' OR col2 IS NOT NULL)
);
Either col1 isn't '1' (and col2 can be anything), or col1 is '1' (and col2 can't be null).
See the third example in the manual.
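The constraint can be exercised with SQLite via Python's sqlite3 — the CHECK syntax is the same as in the answer, and only the third insert should fail:

```python
import sqlite3

# The table-level CHECK from the answer, demonstrated in SQLite
# (the CHECK syntax is the same here as in Postgres/SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        col1 VARCHAR(20),
        col2 VARCHAR(20),
        CHECK (col1 != '1' OR col2 IS NOT NULL)
    )
""")
conn.execute("INSERT INTO test VALUES ('2', NULL)")  # allowed: col1 isn't '1'
conn.execute("INSERT INTO test VALUES ('1', 'x')")   # allowed: col2 not null
try:
    conn.execute("INSERT INTO test VALUES ('1', NULL)")  # violates the CHECK
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # -> True
```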
Given a table similar to this:
Col1 Col2
---- ----
A A
A A
B B
C C
C D
I'm trying to write a query which will identify all values in Col1 which appear more than once AND have differing values in Col2. So a query that would return only rows with C in Col1 (because there are two rows with C in Col1, and they have differing values in Col2).
Group by col1 and keep only the groups having more than one distinct col2 value. These automatically have col1 appearing more than once too.
select col1
from your_table
group by col1
having count(distinct col2) > 1
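A quick runnable check of this query with SQLite via Python's sqlite3, using the sample data from the question (your_table is the placeholder name from the answer):

```python
import sqlite3

# Find col1 values that occur with more than one distinct col2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)",
                 [("A", "A"), ("A", "A"), ("B", "B"), ("C", "C"), ("C", "D")])
rows = conn.execute("""
    SELECT col1
    FROM your_table
    GROUP BY col1
    HAVING COUNT(DISTINCT col2) > 1
""").fetchall()
print(rows)  # -> [('C',)]
```

A appears twice but with identical col2 values, so only C qualifies, as the question expects.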
I've got two tables:
TableA
Col1
Col2
TableB
Col3
Col4
I want to join them together:
SELECT * from TableA join TableB ON (...)
Now, in place of ... I need to write an expression that evaluates to:
If Col3 is not null, then true iff Col1 == Col3; otherwise
if Col3 is null, then true iff Col2 == Col4.
What would be the most elegant way to do this?
ON (Col1=Col3 OR (Col3 IS NULL AND Col2=Col4))
should do the trick (if Col3 is null, Col1 = Col3 cannot evaluate to TRUE).
Try this:
SELECT *
FROM TableA
JOIN TableB
ON Col1 = Col3
OR (Col3 IS NULL AND Col2 = Col4)
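Both answers use the same condition; it can be verified with SQLite via Python's sqlite3, with hypothetical sample rows chosen so each branch of the OR fires once:

```python
import sqlite3

# Join on Col1 = Col3, falling back to Col2 = Col4 when Col3 is null.
# Sample rows are hypothetical: one match via each branch of the OR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Col1 TEXT, Col2 TEXT)")
conn.execute("CREATE TABLE TableB (Col3 TEXT, Col4 TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [("x", "p"), ("y", "q")])
conn.executemany("INSERT INTO TableB VALUES (?, ?)",
                 [("x", "ignored"), (None, "q")])
rows = conn.execute("""
    SELECT Col1, Col3, Col4
    FROM TableA
    JOIN TableB
      ON Col1 = Col3
      OR (Col3 IS NULL AND Col2 = Col4)
    ORDER BY Col1
""").fetchall()
print(rows)  # -> [('x', 'x', 'ignored'), ('y', None, 'q')]
```

The 'x' row matches via Col1 = Col3; the 'y' row matches via the null fallback on Col2 = Col4.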