Py4JJavaError: An error occurred while calling o771.save. Azure Synapse Analytics Notebook - pyspark

Here is the PySpark code I am running in the notebook:
from delta.tables import DeltaTable  # needed for DeltaTable.isDeltaTable / DeltaTable.forPath
data_lake_container = 'abfss://abc.dfs.core.windows.net'
stage_folder = 'abc'
delta_lake_folder = 'abc'
source_folder = 'abc'
source_wildcard = 'abc.parquet'
key_column = 'Id'
key_column1 = 'LastModifiedDate'
source_path = data_lake_container + '/' + stage_folder + '/' + source_folder + '/' + source_wildcard
delta_table_path = data_lake_container + '/' + delta_lake_folder + '/' + source_folder
sdf = spark.read.format('parquet').option("recursiveFileLookup", "true").load(source_path)
if DeltaTable.isDeltaTable(spark, delta_table_path):
    # The Delta table already exists: upsert the incoming rows into it.
    delta_table = DeltaTable.forPath(spark, delta_table_path)
    delta_table.alias("existing").merge(
        source=sdf.alias("updates"),
        # Match rows on both key columns (Id and LastModifiedDate).
        condition=("existing." + key_column + " = updates." + key_column + " and existing." + key_column1 + " = updates." + key_column1)
    ).whenMatchedUpdateAll(
    ).whenNotMatchedInsertAll(
    ).execute()
else:
    # No Delta table yet: create it from the source data.
    sdf.write.format('delta').save(delta_table_path)
While executing the above code, I get the following error:
Py4JJavaError: An error occurred while calling o771.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
at org.apache.spark.sql.delta.files.TransactionalWrite.$anonfun$writeFiles$1(TransactionalWrite.scala:216)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:107)
Kindly help me resolve this error.

Py4JJavaError: An error occurred while calling o771.save.
: org.apache.spark.SparkException: Job aborted.
This error generally occurs when the versions of the Spark connector and Spark are incompatible.
Refer to: org.apache.spark.SparkException: Job aborted due to stage failure: Task from application
If that does not work for you, please share the full stack trace of the error. It is difficult to identify the issue from the information shared so far.

@AbhishekKhandave, when I looked into the full error, I found a date column containing values earlier than '1900-01-01'. That was the issue. I was finally able to run the script. Thank you for your response.
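For anyone hitting the same thing, here is a minimal sketch of the kind of check that surfaces such values before the write. It is plain Python on hypothetical sample rows, not the notebook's actual data; the choice of `LastModifiedDate` as the offending column and the helper name `rows_before_cutoff` are assumptions for illustration. In the notebook the same idea would be expressed as a filter on the DataFrame.

```python
from datetime import date

# Threshold quoted in the comment above.
CUTOFF = date(1900, 1, 1)

def rows_before_cutoff(rows, column, cutoff=CUTOFF):
    """Return the rows whose `column` holds a date earlier than `cutoff`."""
    return [
        row for row in rows
        if row.get(column) is not None and row[column] < cutoff
    ]

# Hypothetical sample data standing in for the source parquet rows.
sample = [
    {"Id": 1, "LastModifiedDate": date(2021, 5, 4)},
    {"Id": 2, "LastModifiedDate": date(1899, 12, 31)},
    {"Id": 3, "LastModifiedDate": None},
]

suspect = rows_before_cutoff(sample, "LastModifiedDate")
print([row["Id"] for row in suspect])
```

Rows flagged this way can then be corrected or excluded before saving.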

TypeError: __init__() got an unexpected keyword argument 'n_features_to_select' - feature selection using forward selection

I am trying to do feature selection using the forward selection method.
I tried the previously answered questions but didn't find a proper solution. My code is as follows:
def forward_selection_rf(data, target, number_of_features=14):
    # adapt number of features to select: if the requested number
    # is greater than the features available, go for 75% of the
    # features instead
    if number_of_features > len(data.columns):
        print("SFS: Wanted " + str(number_of_features) + " from " + str(len(data.columns)) + " features. Sanifying to 75%")
        number_of_features = 0.75
    # Sequential Forward Selection (sfs)
    sfs1 = sfs(RandomForestClassifier(
            n_estimators=70,
            criterion='gini',
            max_depth=15,
            min_samples_split=2,
            min_samples_leaf=1,
            min_weight_fraction_leaf=0.0,
            max_features='auto',
            max_leaf_nodes=None,
            min_impurity_decrease=0.0,
            min_impurity_split=None,
            bootstrap=True,
            oob_score=False,
            n_jobs=-1,
            random_state=0,
            verbose=0,
            warm_start=False,
            class_weight='balanced'
        ),
        n_features_to_select=14,
        direction='forward',
        scoring='roc_auc',
        cv=5,
        n_jobs=3)
    sfs1.fit(data, target)
    return sfs1
Running it raises the following error:
forward_selection_rf(X, y, number_of_features=14)
Traceback (most recent call last):
File "C:\Users\drash\AppData\Local\Temp\ipykernel_37980\1091017691.py", line 1, in <module>
forward_selection_rf(X, y, number_of_features=14)
File "C:\Users\drash\OneDrive\Desktop\Howto Health\untitled3.py", line 102, in forward_selection_rf
TypeError: __init__() got an unexpected keyword argument 'n_features_to_select'
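An error like this usually means the class bound to `sfs` does not accept that keyword, for instance because a selector from a different library was imported than the one whose parameter names are being used. One way to check is to inspect the constructor's signature. The sketch below uses a hypothetical stand-in class (`StandInSelector`, with made-up parameters `k_features` and `forward`), since the question does not show which class `sfs` actually refers to; in the real script you would pass `sfs` itself.

```python
import inspect

class StandInSelector:
    """Hypothetical stand-in for whatever class is bound to `sfs`."""
    def __init__(self, estimator, k_features=1, forward=True):
        self.estimator = estimator
        self.k_features = k_features
        self.forward = forward

def accepted_keywords(cls):
    """List the parameter names that cls.__init__ accepts (without self)."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]

def unsupported_keywords(cls, **kwargs):
    """Return the keyword names in kwargs that cls.__init__ would reject."""
    accepted = set(accepted_keywords(cls))
    return sorted(name for name in kwargs if name not in accepted)

print(accepted_keywords(StandInSelector))
print(unsupported_keywords(StandInSelector,
                           n_features_to_select=14, direction="forward"))
```

If `n_features_to_select` shows up as unsupported for the real class, either switch to the parameter names that class documents or import the selector that does define `n_features_to_select`.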

Extracting features with VectorAssembler

When I try to extract the features from the VectorAssembler output, I get an error saying the job was aborted due to stage failure.
cols_to_keep_unscaled = ['Type_Index']
cols_to_scale = ['AirtemperatureK', 'ProcesstemperatureK', 'Rotationalspeedrpm','TorqueNm','Toolwearmin']
def extract(row):
    return (row.Type_index) + tuple(row.scaled_features.toArray().tolist())
clean_df = clean_df.select(*cols_to_keep_unscaled, "scaled_features").rdd.map(extract).toDF(cols_to_keep_unscaled + cols_to_scale)
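Two things in `extract` look like plausible causes, though without the full stack trace this is only a guess: the field is spelled `Type_index` while the selected column is `Type_Index`, and `(row.Type_index)` is just a parenthesised value, not a tuple, so adding a tuple to it would fail. The sketch below shows the tuple point in plain Python; the `Row` namedtuple and the sample values are stand-ins for a Spark row, and a plain list replaces `scaled_features.toArray().tolist()`.

```python
from collections import namedtuple

# Stand-in for a Spark Row with the two selected columns (assumption).
Row = namedtuple("Row", ["Type_Index", "scaled_features"])

def extract(row):
    # (value,) with a trailing comma is a one-element tuple, so it can be
    # concatenated with the tuple of scaled feature values.
    return (row.Type_Index,) + tuple(row.scaled_features)

row = Row(Type_Index=2, scaled_features=[0.1, 0.2, 0.3, 0.4, 0.5])
flat = extract(row)
print(flat)
```

The result has one leading label plus five feature values, which matches the six column names passed to `toDF` in the question.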

How to convert the MNIST dataset into an array

Hello, consider the following code:
import numpy as np
# load the MNIST training data CSV file into a list
training_data_file = open("Training_Set/mnist_train_100.csv", 'r')
training_data_list = training_data_file.readlines()
training_data_file.close()
for record in training_data_list:
    all_values = record.split(',')
    x_inputs = (np.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
    print("xinput=" + str(x_inputs))
print(len(training_data_list))
MyCompleteInput = np.array(x_inputs,len(training_data_list))
I want to put x_inputs and len(training_data_list) into an array so that, if I print the shape of the array, I get (784, 100).
But when I run my code I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-38-b0f129f57bcb> in <module>()
11 print("xinput=" + str(x_inputs))
12 print(len(training_data_list))
---> 13 MyCompleteInput = np.array(x_inputs,len(training_data_list))
14
15
TypeError: data type not understood
Can somebody help me out? Thanks.
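The line should be
MyCompleteInput = np.array((x_inputs,len(training_data_list)))
Do this and the TypeError will be gone. The second positional argument of np.array is dtype, so the length was being interpreted as a data type; the extra set of parentheses passes both values as a single tuple instead. Note that this only removes the TypeError; it does not by itself produce an array of shape (784, 100).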
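To actually end up with one column per record, the scaled values of every record have to be collected rather than overwritten on each loop iteration. Here is a minimal sketch of that idea in plain Python so it runs without any dependencies; the helper names and the tiny three-pixel sample lines are made up for illustration. With the real file you would pass `training_data_list`, and wrapping the result in `np.array(...)` would give an ndarray.

```python
def scale_record(record):
    """Drop the label (first value) and scale pixels from 0..255 to 0.01..1.0."""
    values = record.strip().split(",")
    return [int(v) / 255.0 * 0.99 + 0.01 for v in values[1:]]

def build_input_matrix(lines):
    """Return a pixels-by-records matrix as a list of lists."""
    rows = [scale_record(line) for line in lines]  # one row per record
    return [list(column) for column in zip(*rows)]  # transpose

# Made-up sample: 2 records with a label and 3 pixels each.
sample_lines = ["5,0,255,128\n", "3,255,0,64\n"]
matrix = build_input_matrix(sample_lines)
shape = (len(matrix), len(matrix[0]))
print(shape)
```

For a file with 100 records of 784 pixels each, the same function would yield 784 rows of 100 values.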

Drools rule issue after migrating to 6.x from 5.3

I am getting an issue with the rule below. It works fine in 5.3 but throws an error (must be boolean expression) in 6.x.
String drl="import com.drools.Applicant;"
+ "rule \"Is of valid age\" "
+ " when $a : Applicant(age > 18 && name matches \"(?i).*\"+ name + \"(.|\n|\r)*\")"
+ " then $a.setValid( true ); "
+ " System.out.println(\"validation: \" + $a.isValid());\n"+
"end";
The issue is with this line:
" when $a : Applicant(age > 18 && name matches \"(?i).*\"+ name + \"(.|\n|\r)*\")"
Any advice?
The expression isn't correct, since name cannot be resolved as part of an expression. Use a binding.
$a : Applicant($n: name, age > 18, name matches \"(?i).*\"+ $n + \"(.|\n|\r)*\")"
(I don't think the constraint makes much sense - it's merely a test of whether a name matches itself, with or without arbitrary characters before and after. Moreover, the ?i is superfluous.)
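The point in the parenthetical can be illustrated outside Drools. The sketch below uses Python's `re` module rather than Java's regex engine, so treat it as an analogy only; it also escapes the name with `re.escape`, which the rule does not do, to keep names with special characters from being read as regex syntax. The sample names are made up.

```python
import re

def matches_own_name(name):
    """Build the rule's pattern from the name, then test the name against it."""
    # Same shape as the rule: "(?i).*" + name + "(.|\n|\r)*"
    pattern = "(?i).*" + re.escape(name) + "(.|\n|\r)*"
    # fullmatch requires the whole string to match the pattern.
    return re.fullmatch(pattern, name) is not None

names = ["Alice", "bob", "Mr. Smith"]
results = {name: matches_own_name(name) for name in names}
print(results)
```

Every name matches the pattern built from itself, so the constraint does not filter anything beyond the age check.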

Unable to execute Pig scripts using Azure PowerShell

This is my Pig script:
$QueryString = "A = load 'wasb://$containername@$StorageAccount.blob.core.windows.net/table1' using PigStorage(',') as (col1 chararray,col2 chararray,col3 chararray,col4 chararray,col5 chararray,col6 chararray,col7 int,col8 int);" +
"user_list = foreach A GENERATE $0;" +
"unique_user = DISTINCT user_list;" +
"unique_users_group = GROUP unique_user ALL;" +
"uu_count = FOREACH unique_users_group GENERATE COUNT(unique_user);" +
"DUMP uu_count;"
I get this error when I execute the above Pig script:
2015-04-14 23:17:55,177 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <line 1, column 166> mismatched input 'chararray' expecting RIGHT_PAREN
Failed to parse: <line 1, column 166> mismatched input 'chararray' expecting RIGHT_PAREN
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:241)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:509)
at org.apache.pig.Main.main(Main.java:156)
2015-04-14 23:17:55,177 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 166> mismatched input 'chararray' expecting RIGHT_PAREN
I edited the LOAD statement like this, and the rest of the script is the same:
$QueryString = "A = load 'wasb://$containername@$StorageAccount.blob.core.windows.net/table1';" +
The error I get now is:
2015-04-14 23:23:00,117 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <line 1, column 162> Syntax error, unexpected symbol at or near ';'
Failed to parse: <line 1, column 162> Syntax error, unexpected symbol at or near ';'
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:241)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:509)
at org.apache.pig.Main.main(Main.java:156)
2015-04-14 23:23:00,132 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 162> Syntax error, unexpected symbol at or near ';'
Details at logfile: C:\apps\dist\hadoop-2.4.0.2.1.9.0-2196\logs\pig_1429053777602.log
I don't understand what the error is. Can someone help me execute this query in Windows PowerShell? (I am using Windows PowerShell ISE, so I can edit the queries.)
The issue is in this statement: user_list = foreach A GENERATE $0;. PowerShell interprets $0 inside a double-quoted string as a variable, and since it is not defined, PowerShell substitutes an empty string. You can either define a variable in the script, like $0 = '$0';, or just escape the $ like this:
user_list = foreach A GENERATE `$0;
PowerShell uses ` (the backtick, next to the '1' key) as the escape character in double-quoted strings.
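To see what the Pig parser ends up receiving, here is a deliberately simplified model of that substitution, written in Python. It is not PowerShell's real parser: the function `expand_like_powershell` is made up for illustration and only handles plain `$name` variables and the backtick escape.

```python
import re

def expand_like_powershell(text, variables):
    """Rough model of variable expansion in a double-quoted string."""
    def replace(match):
        if match.group(1):
            # Backtick-escaped dollar sign: keep a literal "$".
            return "$"
        # Plain $name: substitute its value, or "" if it is undefined.
        return str(variables.get(match.group(2), ""))
    return re.sub(r"(`\$)|\$(\w+)", replace, text)

# Undefined $0 disappears, leaving "GENERATE ;" for Pig to choke on.
print(expand_like_powershell("user_list = foreach A GENERATE $0;", {}))
# Escaping with a backtick keeps the literal $0.
print(expand_like_powershell("user_list = foreach A GENERATE `$0;", {}))
# Defining a variable named 0 whose value is '$0' has the same effect.
print(expand_like_powershell("user_list = foreach A GENERATE $0;", {"0": "$0"}))
```

The first case is consistent with the "unexpected symbol at or near ';'" parse error in the question, and the other two correspond to the two fixes described here.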
So the script can look like:
$0 = '$0';
$QueryString = "A = load 'wasb://$containerName@$storageAccountName.blob.core.windows.net/table1' using PigStorage(',') as (col1,col2,col3,col4,col5,col6,col7,col8) ;"+
"user_list = foreach A GENERATE $0;" +
"unique_user = DISTINCT user_list;" +
"unique_users_group = GROUP unique_user ALL;" +
"uu_count = FOREACH unique_users_group GENERATE COUNT(unique_user);" +
"DUMP uu_count;"
or:
$QueryString = "A = load 'wasb://$containerName@$storageAccountName.blob.core.windows.net/table1' using PigStorage(',') as (col1,col2,col3,col4,col5,col6,col7,col8) ;"+
"user_list = foreach A GENERATE `$0;" +
"unique_user = DISTINCT user_list;" +
"unique_users_group = GROUP unique_user ALL;" +
"uu_count = FOREACH unique_users_group GENERATE COUNT(unique_user);" +
"DUMP uu_count;"