DELTA TABLES - The specified properties do not match the existing properties - pyspark

I'm using a set of user-defined properties on Databricks Delta tables for metadata management. The problem is that when I need to change one of those properties, I get the error message 'FAILED Error: The specified properties do not match the existing properties at /mnt/silver/...'.
The Databricks documentation only states that an exception will be raised, and I couldn't find any argument that forces it to accept the new values.
Is it possible to update just the table properties?
Any suggestions?
Sample Code:
query = f'''
CREATE TABLE if not exists {tableMetadataDBName}.{tableMetadataTableName}
(
... my columns ...
-- COMMON COLUMNS
,Hash string
,sourceFilename STRING
,HashWithFileDate string
,Surrogate_Key STRING
,SessionId STRING
,SessionRunDate TIMESTAMP
,Year INT GENERATED ALWAYS AS ( YEAR(fileDate))
,Month INT GENERATED ALWAYS AS ( MONTH(fileDate))
,fileDate DATE
)
USING DELTA
COMMENT '{tableDocumentationURL}'
LOCATION "{savePath}/parquet"
OPTIONS( "compression"="snappy")
PARTITIONED BY (Year, Month, fileDate )
TBLPROPERTIES ("DataStage"="{txtDataStage.upper()}"
,"Environment"="{txtEnvironment}"
,"source"="{tableMetadataSource}"
,"CreationDate" = "{tableMetadataCreationDate}"
,"CreatedBy" = "{tableMetadataCreatedBy}"
,"Project" = "{tableMetadataProject}"
,"AssociatedReports" = "{tableMetadataAssociatedReports}"
,"UpstreamDependencies" = "{tableMetadataUpstreamDependencies}"
,"DownstreamDependencies" = "{tableMetadataDownstreamDependencies}"
,"Source" = "{tableMetadataSource}"
,"PopulationFrequency" = "{tableMetadataPopulationFrequency}"
,"BusinessSubject" = "{tableMetadataBusinessSubject}"
,"JiraProject" = "{tableMetadataJiraProject}"
,"DevOpsProject" = "{tableMetadataDevOpsProject}"
,"DevOpsRepository" = "{tableMetadataDevOpsRepository}"
,"URL" = "{tableMetadataURL}") '''
spark.sql(query)

Yes, it's possible to change just the properties. Use "ALTER TABLE [table_name] SET TBLPROPERTIES ..." for that:
query = f"""ALTER TABLE {table_name} SET TBLPROPERTIES (
'CreatedBy' = '{tableMetadataCreatedBy}'
,'Project' = '{tableMetadataProject}'
....
)"""
spark.sql(query)
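If the properties already live in a Python dict, the statement can be generated instead of written by hand. This is only a minimal sketch: `build_set_tblproperties` is a hypothetical helper (not part of PySpark or Delta), the table name and values are made up, and the backslash escaping of quotes is an assumption about how the string literals are parsed.

```python
def build_set_tblproperties(table_name, properties):
    """Build an ALTER TABLE ... SET TBLPROPERTIES statement from a dict."""
    if not properties:
        raise ValueError("at least one property is required")

    def quote(text):
        # Assumption: backslash-escaping backslashes and single quotes is
        # sufficient for these string literals.
        return "'" + str(text).replace("\\", "\\\\").replace("'", "\\'") + "'"

    pairs = ",\n  ".join(
        f"{quote(key)} = {quote(value)}" for key, value in properties.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES (\n  {pairs}\n)"


query = build_set_tblproperties(
    "metadata_db.my_table",  # hypothetical table name
    {"CreatedBy": "jane.doe", "Project": "Sales"},  # hypothetical values
)
print(query)
# In a notebook the statement would then be run with: spark.sql(query)
```

Only the properties listed in the dict are part of the generated statement, so the existing CREATE TABLE call does not need to be touched.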

Related

Table not created, even after validating the table existence

class datalog(display_clock):
    def con_mysql(self):
        cat = mysql.connector.connect(
            host="localhost", user="subramanya", passwd="Sureshbabu#4155", database="CFM")
        if (cat):
            datacursor = cat.cursor()
            todaydate = d
            check_table = (
                "SELECT count(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME=%s")
            datacursor.execute(check_table, (todaydate,))
            result = datacursor.fetchone()
            if (result):
                self.success_login()
            else:
                datacursor.execute(
                    "CREATE TABLE {today}(Sl_no INT NOT NULL AUTO_INCREMENT PRIMARY KEY,date DATE,Start_time TIME,End_time TIME,Item CHAR(255),Weight FLOAT, Amount INTEGER(10))".format(today=todaydate))
                self.success_login()
        else:
            datacursor.Terminate
            self.error_display.insert(0.0, "Connecting Database failed!!!")
I tried to check whether a table already exists for today's date and, if not, to create it.
No error occurred, but the table for the current date was not created.
Welcome to stackoverflow!
I believe there is a small misconception here. You don't need to check whether the table exists beforehand and create it afterwards. Most current database systems accept an IF NOT EXISTS condition in the CREATE TABLE statement.
CREATE TABLE IF NOT EXISTS sales (
sale_id INT NOT NULL
);
This means the sales table is created only if it does not already exist.
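As a side note on why the original check likely never reached the CREATE branch: a count(*) query always returns exactly one row, so fetchone() gives back a non-empty tuple, and a non-empty tuple is truthy even when the count is 0. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere; sqlite_master plays the role of INFORMATION_SCHEMA.TABLES here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
cur = conn.cursor()

# count(*) always yields exactly one row, so fetchone() is never None.
cur.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
    ("20200915",),
)
row = cur.fetchone()
print(row, bool(row))          # the tuple is truthy even though the count is 0
table_missing = row[0] == 0    # compare the count itself, not the tuple

# IF NOT EXISTS makes the statement safe to run repeatedly.
for _ in range(2):
    cur.execute('CREATE TABLE IF NOT EXISTS "20200915" (sale_id INTEGER NOT NULL)')

cur.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
    ("20200915",),
)
table_count = cur.fetchone()[0]
print(table_count)
```

Running the CREATE statement twice raises no error and leaves a single table, which is why the existence check can be dropped altogether.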
Also, I strongly recommend refactoring your code a wee bit. Take this as a suggestion (please adapt it to your project's needs):
import mysql.connector
from datetime import datetime

class Settings:
    # Please avoid hard-coded credentials; load them from configuration instead.
    DB_HOST = "localhost"
    DB_USER = "subramanya"
    DB_PASSWD = "Sureshbabu#4155"
    DB_SCHEMA = "CFM"

class datalog(display_clock):
    def db_connect(self):
        conn = mysql.connector.connect(
            host=Settings.DB_HOST,
            user=Settings.DB_USER,
            passwd=Settings.DB_PASSWD,
            database=Settings.DB_SCHEMA
        )
        if not conn:
            raise Exception("Connecting Database failed!!!")
        return conn

    def ensure_table(self):
        conn = self.db_connect()
        # The cursor comes from the connection; a connection has no datacursor attribute.
        cursor = conn.cursor()
        # Backticks are needed because the table name consists only of digits.
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS `{0}`(
                Sl_no INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
                date DATE,
                Start_time TIME,
                End_time TIME,
                Item CHAR(255),
                Weight FLOAT,
                Amount INTEGER(10)
            );
            """.format(datetime.today().strftime('%Y%m%d'))  # format 20200915
        )

    def run(self):
        self.ensure_table()
        self.success_login()
There are plenty of ways to write this code, but keep in mind that readability matters a lot.

Query with defined value after variable ${}

I have a script that reads data stored in a text file and then uses ${} interpolation to insert that data into a query.
Example:
The data kept in the text file is abc.
The statement below executes the query with productId = 'abc'.
Now I want to append a fixed value after abc, so that the query becomes:
productId = 'abc/NDC-1111'
What is the exact syntax I need to use?
//Read productId
def productId = new File(RunConfiguration.getProjectDir() + "/Data Files/productId.txt")
//SQL statement
dbQuery2 = /SELECT * FROM db.t1 where productId = '${productId.text}'/
You can just put the suffix right after the interpolated expression, inside the SQL quotes:
dbQuery2 = "SELECT * FROM db.t1 where productId = '${productId.text}/NDC-1111'"

Postgresql Update & Inner Join

I am trying to update data in Table: local.import_payments from Table: local.payments based on update and Inner Join queries. The query I used:
Update local.import_payments
Set local.import_payments.client_id = local.payments.payment_for_client__record_id,
local.import_payments.client_name = local.payments.payment_for_client__company_name,
local.import_payments.customer_id = local.payments.customer__record_id,
local.import_payments.customer_name = local.payment_from_customer,
local.import_payments.payment_id = local.payments.payment_id
From local.import_payments
Inner Join local.payments
Where local.payments.copy_to_imported_payments = 'true'
The client_id, client_name, customer_id, customer_name in the local.import_payments need to get updated with the values from the table local.payments based on the condition that the field copy_to_imported_payments is checked.
I am getting a syntax error while executing the query. I tried a couple of things, but they did not work. Can anyone look over the query and let me know where the issue is?
Try the following:
UPDATE local.import_payments
SET client_id = lpay.payment_for_client__record_id,
    client_name = lpay.payment_for_client__company_name,
    customer_id = lpay.customer__record_id,
    customer_name = lpay.payment_from_customer,
    payment_id = lpay.payment_id
FROM local.payments AS lpay
WHERE lpay.<<field>> = local.import_payments.<<field>>
AND lpay.copy_to_imported_payments = 'true'
You shouldn't specify the schema/table for the updated columns, only the column names:
Do not include the table's name in the specification of a target column — for example, UPDATE table_name SET table_name.col = 1 is invalid.
(from the docs)
You shouldn't use the table being updated in the FROM clause, except in the case of a self-join.
You can make your query shorter by using the "column-list syntax".
update local.import_payments as target
set (
client_id,
client_name,
customer_id,
customer_name,
payment_id) = (
source.payment_for_client__record_id,
source.payment_for_client__company_name,
source.customer__record_id,
source.payment_from_customer,
source.payment_id)
from local.payments as source
where
<join condition> and
source.copy_to_imported_payments = 'true'

Selecting parent records by filtering multiple fields of collection of links

I have been trying to figure this out for a couple of days now, but I can't come up with a query that gives me the correct results. The essence of the task is that I am trying to retrieve all the nodes of a graph that have children with attributes that satisfy multiple constraints. The issue is that a node may have multiple linked nodes, and when I try to apply criteria to restrict which nodes are returned by the query, the criteria need to be imposed against sets of nodes instead of individual nodes.
Let me explain the problem in more detail through an example. Here is a sample schema of companies and locations. Each company can have multiple locations.
create class company extends V;
create property company.name STRING;
create class location extends V;
create property location.name STRING;
create property location.type INTEGER;
create property location.inactive STRING;
Let me now create a couple of records to illustrate the problem I have.
create vertex company set name = 'Company1';
create vertex location set name = 'Location1', type = 5;
create vertex location set name = 'Location2', type = 7;
create edge from (select from company where name = 'Company1') to (select from location where name in ['Location1', 'Location2']);
create vertex company set name = 'Company2';
create vertex location set name = 'Location3', type = 6;
create vertex location set name = 'Location4', type = 5, inactive = 'Y';
create edge from (select from company where name = 'Company2') to (select from location where name in ['Location3','Location4']);
I want to retrieve all companies that either don't have a location of type 5 or have a location of type 5 that is inactive (inactive = 'Y'). The query that I tried initially is shown below. It doesn't work because $loc.type is evaluated against a collection of values instead of an individual record, so the is null test is not applied to the 'inactive' field of each location record but to the collection of 'inactive' values for each parent record. I tried sub-queries, the set function, append and so on, but I can't get it to work.
select from company let loc = out() where $loc.type not contains 5 or ($loc.type contains 5 and $loc.type is null)
You can try with this query:
select expand($c)
let $a = ( select expand(out) from E where out.@class = "company" and in.@class="location" and in.type = 5 and in.inactive = "Y" ),
$b = ( select from company where 5 not in out("E").type ),
$c = unionAll($a,$b)
UPDATE
I have created this graph
You can use this query
select expand($f)
let $a = ( select from E where out.@class = "company" and in.@class="location" ),
$b = ( select expand(out) from $a where in.type = 5 and in.inactive = "Y"),
$c = ( select expand(out) from $a where in.type = 5 and in.inactive is null),
$d = difference($b,$c),
$e = ( select from company where 5 not in out("E").type ),
$f = unionAll($d,$e)
Hope it helps.
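To make the set logic of the query above easier to follow, here is a rough Python sketch of the same steps over the sample data from the question. It is only an illustration of the reasoning, not OrientDB code: the dict layout and the `names_where` helper are made up for this example, and plain Python sets stand in for the query's difference and unionAll (which is safe here because the two sets being combined cannot overlap).

```python
# Sample data mirroring the vertices and edges created in the question.
companies = {
    "Company1": [
        {"name": "Location1", "type": 5, "inactive": None},
        {"name": "Location2", "type": 7, "inactive": None},
    ],
    "Company2": [
        {"name": "Location3", "type": 6, "inactive": None},
        {"name": "Location4", "type": 5, "inactive": "Y"},
    ],
}


def names_where(predicate):
    """Companies with at least one location satisfying the predicate."""
    return {
        company
        for company, locations in companies.items()
        if any(predicate(location) for location in locations)
    }


b = names_where(lambda l: l["type"] == 5 and l["inactive"] == "Y")   # $b
c = names_where(lambda l: l["type"] == 5 and l["inactive"] is None)  # $c
d = b - c                                                            # $d
e = set(companies) - names_where(lambda l: l["type"] == 5)           # $e
f = d | e                                                            # $f
print(sorted(f))
```

The key point is that each predicate is evaluated per location record, and only the resulting sets of companies are combined afterwards.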
Try this query:
select expand($d) from company
let $a=(select from company where out().type <> 5 and name contains $parent.current.name),
$b=(select from company where out().type contains 5 and name=$parent.current.name),
$c=(select from company where out().inactive contains "Y" and name=$parent.current.name),
$d=unionall($a,intersect($b,$c))
Hope it helps,
Regards,
Michela

INSERT INTO not working in IF block - T-SQL

I'm working on a procedure which should transfer a number of items (value @p_count) from the old store to the new store.
SET @countOnOldStore = (SELECT "count" FROM ProductStore WHERE StoreId = @p_oldStoreId AND ProductId = @p_productID)
SET @countOnNewStore = (SELECT "count" FROM ProductStore WHERE StoreId = @p_newStoreID AND ProductId = @p_productID)
SET @ShiftedCount = @countOnOldStore - @p_count
SET @newStoreAfterShift = @countOnNewStore + @p_count
IF @ShiftedCount > 0
BEGIN
    DELETE FROM ProductStore WHERE storeId = @p_oldStoreId and productID = @p_productID
    INSERT INTO ProductStore (storeId,productId,"count") VALUES (@p_oldStoreId,@p_productID,@ShiftedCount)
    DELETE FROM ProductStore WHERE storeId = @p_newStoreID and productID = @p_productID
    INSERT INTO ProductStore (storeId,productId,"count") VALUES (@p_newStoreID,@p_productID,@newStoreAfterShift)
END
ELSE
    PRINT 'ERROR'
Well... the second insert is not working and I can't figure out why. It says:
Cannot insert the value NULL into column 'count', table 'dbo.ProductStore'; column does not allow nulls. INSERT fails.
Can anyone see the problem and explain it to me? It's a school project.
It looks like your entire query should just be:
UPDATE ProductStore
SET [count] = [count] + CASE
    WHEN storeId = @p_NewStoreID THEN @p_Count
    ELSE -@p_Count END
WHERE
    productID = @p_ProductID and
    storeID in (@p_NewStoreID,@p_OldStoreID)
If either value in the following is NULL, the total will be NULL:
SET @newStoreAfterShift = @countOnNewStore + @p_count
Check both values (@countOnNewStore, @p_count) for NULL.
Looks like you are not assigning any value to @p_count, so it is NULL and so are @ShiftedCount and @newStoreAfterShift.
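The NULL propagation described above is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made so it runs anywhere; the store and product ids are made up). It shows one possible way @countOnNewStore could end up NULL: if the new store has no row yet for the product, the scalar subquery returns NULL, and NULL plus any number is still NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE ProductStore (storeId INT, productId INT, "count" INT NOT NULL)'
)
# Only the old store (id 1) has stock; the new store (id 2) has no row yet.
conn.execute("INSERT INTO ProductStore VALUES (1, 10, 8)")

# A scalar subquery that matches no row yields NULL, and NULL + 3 is NULL.
naive = conn.execute(
    'SELECT (SELECT "count" FROM ProductStore '
    "WHERE storeId = 2 AND productId = 10) + 3"
).fetchone()[0]

# COALESCE substitutes 0 for the missing value before the addition.
guarded = conn.execute(
    'SELECT COALESCE((SELECT "count" FROM ProductStore '
    "WHERE storeId = 2 AND productId = 10), 0) + 3"
).fetchone()[0]

print(naive, guarded)
```

The same guard can be applied to each variable in the procedure before the arithmetic, so that a missing row or an unassigned parameter shows up as 0 (or as an explicit error) rather than as a failed INSERT.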