Spark 2.4 Unable to Insert Record using variable - scala

I am trying to insert a record into a table using a variable, but it is failing.
command:
val query = "INSERT into TABLE Feed_metadata_s2 values ('LOGS','RUN_DATE',{} )".format(s"$RUN_DATE")
spark.sql(s"query")
spark.sql("INSERT into TABLE Feed_metadata_s2 values ('LOGS','ExtractStartTimestamp',$ExtractStartTimestamp)")
error:
INSERT into TABLE Feed_metadata_s2 values ('SDEDLOGS','ExtractStartTimestamp',$ExtractStartTimestamp)
------------------------------------------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)

It seems you're mixing up string interpolation: you need to put an s before the string literal in the last query so that the variable is substituted into it. The first two lines can also be simplified:
val query = s"INSERT into TABLE Feed_metadata_s2 values ('LOGS','RUN_DATE',$RUN_DATE)"
spark.sql(query)
spark.sql(s"INSERT into TABLE Feed_metadata_s2 values ('LOGS','ExtractStartTimestamp',$ExtractStartTimestamp)")

Related

Split the column value and make key as column name in postgres query

I have a table with a column whose values look like this:
data_as_of_date:20210202 unique_cc:3999
data_as_of_date:20220202 unique_cc:1999
I need to convert this column into the following:
data_as_of_date unique_cc
20210202 3999
20220202 1999
Sample data:
create table test (val varchar);
insert into test(val) values ('data_as_of_date:20210202 unique_cc:3999');
insert into test(val) values ('data_as_of_date:20220202 unique_cc:1999');
I have tried unnest with string_to_array and the crosstab function, but it is not working.
You don't need unnest or a crosstab for this. A simple regular expression should do the trick:
select substring(the_column from 'data_as_of_date:([0-9]{8})') as data_as_of_date,
       substring(the_column from 'unique_cc:([0-9]{4})') as unique_cc
from the_table;
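If it helps to see the same capture-group idea outside the database, here is a tiny Scala sketch of the pattern; the value names and sample strings are made up for illustration and are not part of the original question:

// Two capture groups, one per key, matched against the whole string.
val Pattern = "data_as_of_date:([0-9]{8}) unique_cc:([0-9]+)".r
val rows = Seq(
  "data_as_of_date:20210202 unique_cc:3999",
  "data_as_of_date:20220202 unique_cc:1999"
)
rows.foreach {
  case Pattern(dataAsOfDate, uniqueCc) => println(s"$dataAsOfDate $uniqueCc")
  case other                           => println(s"no match: $other")
}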

DB2 Update statement not working using JDBC

I have a few rows stored in a source table (referred to as $schema.$sourceTable in the UPDATE query below). This table has 3 columns: TABLE_NAME, PERMISSION_TAG_COL, PT_DEPLOYED.
I have an update statement stored in a string like:
var update_PT_Deploy = s"UPDATE $schema.$sourceTable SET PT_DEPLOYED = 'Y' WHERE TABLE_NAME = '$tableName';"
My source table does have rows whose TABLE_NAME matches $tableName (a parameter), since I inserted them using another function of my program. The default value of PT_DEPLOYED was specified as NULL when I inserted the rows.
I'm trying to execute update using JDBC in the following manner:
println(update_PT_Deploy)
val preparedStatement: PreparedStatement = connection.prepareStatement(update_PT_Deploy)
val row = preparedStatement.execute()
println(row)
println("row updated in table successfully")
preparedStatement.close()
The above piece of code does not throw any exception, but when I query my table in a tool like DBeaver, the NULL value of PT_DEPLOYED does not get updated to Y.
If I execute the same query as printed from update_PT_Deploy inside DBeaver, the query works and the table updates. I am sure I am following the correct steps.
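No answer is included in this excerpt, but two things are worth checking, purely as guesses from the snippet: the trailing semicolon inside the JDBC statement, and whether the connection has auto-commit disabled, in which case the update is never committed and other sessions (such as DBeaver) keep seeing NULL. A rough sketch of a parameterised, explicitly committed version, reusing the names from the question:

val sql = s"UPDATE $schema.$sourceTable SET PT_DEPLOYED = 'Y' WHERE TABLE_NAME = ?"  // no trailing ';'
val ps = connection.prepareStatement(sql)
ps.setString(1, tableName)               // bind the value instead of interpolating it
val rowsUpdated = ps.executeUpdate()     // returns the number of affected rows
println(s"$rowsUpdated row(s) updated")
connection.commit()                      // required if auto-commit is off
ps.close()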

Spark SQL - regexp_replace not updating the column value

I ran the following query in Hive and it successfully updated the column value in the table: select id, regexp_replace(full_name,'A','C') from table
But when I ran the same query from Spark SQL, it did not update the actual records:
hiveContext.sql("select id, regexp_replace(full_name,'A','C') from table")
But when I do hiveContext.sql("select id, regexp_replace(full_name,'A','C') from table").show(), it displays A replaced with C successfully -- but only in the display, not in the actual table.
I tried to assign the result to another variable
val vFullName = hiveContext.sql("select id, regexp_replace(full_name,'A','C') from table")
and then
vFullName.show() -- it displays the original values without replacement
How do I get the value replaced in the table from SparkSQL?
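No answer is shown here, but the behaviour itself is expected: a SELECT never modifies the stored data, and Spark DataFrames are immutable, so regexp_replace only affects the result set. To persist the change you have to write the result back; a minimal sketch follows (the target table name is made up, and Spark will generally refuse to overwrite a table it is reading from in the same job, so write to a staging table and swap afterwards):

val replaced = hiveContext.sql(
  "select id, regexp_replace(full_name,'A','C') as full_name from table")

// Write the replaced values to a separate table, then rename/swap it into place.
replaced.write.mode("overwrite").saveAsTable("table_with_replacement")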

Replace rows based on a modified timestamp

I am looking for an efficient method (which I can reuse for similar situations) to drop rows which have been updated.
My table has many columns, but the important ones are:
creation_timestamp, id, last_modified_timestamp
My primary key is the creation_timestamp and the id. However, after an id has been created, it can be modified by other users, which is indicated by the last_modified_timestamp. Each day I need to:
1) Read a daily file and add any new rows (based on creation_timestamp and id)
2) Remove old rows which have a different last_modified_timestamp and replace them with the latest versions.
I typically do most of my operations with Pandas (Python library) and psycopg2, so I am not extremely familiar with PostgreSQL 9.6, which is the database I am using. My initial approach is to just add the last_modified_timestamp to the primary key and then use a view to SELECT DISTINCT based on the latest changes. However, that seems like 'cheating', and I will be wasting space since I do not need to retain previous versions.
EDIT:
import pandas as pd

# DATABASE_COLUMNS, PRIMARY_KEY, FACT_TABLE and DATABASE are constants defined elsewhere.

def create_update_query(df, table=FACT_TABLE):
    # Build an upsert: insert each row, or update it if the primary key already exists.
    columns = ', '.join([f'{col}' for col in DATABASE_COLUMNS])
    constraint = ', '.join([f'{col}' for col in PRIMARY_KEY])
    placeholder = ', '.join([f'%({col})s' for col in DATABASE_COLUMNS])
    updates = ', '.join([f'{col} = EXCLUDED.{col}' for col in DATABASE_COLUMNS])
    query = f"""
        INSERT INTO {table} ({columns})
        VALUES ({placeholder})
        ON CONFLICT ({constraint})
        DO UPDATE SET {updates};"""
    query = ' '.join(query.split())  # collapse the multi-line string into a single line
    return query

def load_updates(df, connection=DATABASE):
    conn = connection.get_conn()
    cursor = conn.cursor()
    df1 = df.where((pd.notnull(df)), None)  # replace NaN/NaT with None so psycopg2 writes NULL
    insert_values = df1.to_dict(orient='records')
    for row in insert_values:
        cursor.execute(create_update_query(df), row)
    conn.commit()
    cursor.close()
    del cursor
    conn.close()
This appears to work. I was running into some issues, so right now I am looping through each row of the DataFrame as a dictionary and inserting that row. Also, I had to figure out a way to fill the NaN columns with None, because I was getting errors with Timestamp dtypes and blank values, etc.

How to insert DB Default values under EF?

When adding a new record like this:
ContentContacts c2 = new ContentContacts();
c2.updated_user = c2.created_user = loggedUserId;
c2.created_date = c2.updated_date = DateTime.UtcNow;
db.ContentContacts.AddObject(c2);
I'm getting
Cannot insert the value NULL into column 'main_email_support', table 'SQL2008R2.dbo.ContentContacts'; column does not allow nulls. INSERT fails. The statement has been terminated.
But the default value in the database is an empty string. Why am I getting this error? Shouldn't EF say something like:
"Oh, it's a null value, so let's add the column default value instead"?
I did a small test: I created a table with a column that had a default value but did not allow nulls.
Then this SQL Statement:
INSERT INTO [Test].[dbo].[Table_1]
([TestText])
VALUES
(null)
Gives this error:
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'TestText', table
'Test.dbo.Table_1'; column does not allow nulls. INSERT fails.
The problem here is that the insert specifies all the columns, including those with default values, and then tries to set those columns to null.
You have 2 options:
Update the table through a view, which does not contain the default columns
Set the default values in your C# code
A default value is business logic, so there is a case for it being set in the business layer of your application.