Py4JJavaError: An error occurred while calling o771.save. Azure Synapse Analytics Notebook - pyspark

Here is the PySpark code I use in the notebook:
from delta.tables import DeltaTable

data_lake_container = 'abfss://abc.dfs.core.windows.net'
stage_folder = 'abc'
delta_lake_folder = 'abc'
source_folder = 'abc'
source_wildcard = 'abc.parquet'
key_column = 'Id'
key_column1 = 'LastModifiedDate'
source_path = data_lake_container + '/' + stage_folder + '/' + source_folder + '/' + source_wildcard
delta_table_path = data_lake_container + '/' + delta_lake_folder + '/' + source_folder
sdf = spark.read.format('parquet').option("recursiveFileLookup", "true").load(source_path)
if DeltaTable.isDeltaTable(spark, delta_table_path):
    delta_table = DeltaTable.forPath(spark, delta_table_path)
    delta_table.alias("existing").merge(
        source=sdf.alias("updates"),
        # Match rows on both the Id and LastModifiedDate columns
        condition=("existing." + key_column + " = updates." + key_column + " and existing." + key_column1 + " = updates." + key_column1)
    ).whenMatchedUpdateAll(
    ).whenNotMatchedInsertAll(
    ).execute()
else:
    sdf.write.format('delta').save(delta_table_path)
While executing the code above, I get the following error:
Py4JJavaError: An error occurred while calling o771.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
at org.apache.spark.sql.delta.files.TransactionalWrite.$anonfun$writeFiles$1(TransactionalWrite.scala:216)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:107)
Please help me resolve this error.
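The merge chain in the question implements an upsert: target rows whose Id and LastModifiedDate both match a source row are overwritten, and unmatched source rows are appended. Below is a minimal plain-Python model of that behavior, not Delta Lake's implementation; `merge_upsert` is a hypothetical helper working on lists of dicts, and it assumes the key columns are unique in the existing table.

```python
def merge_upsert(existing, updates, keys):
    """Model of merge(...).whenMatchedUpdateAll().whenNotMatchedInsertAll().

    existing, updates: lists of dicts (one dict per row).
    keys: column names that make up the match condition.
    Assumes the key columns are unique within `existing`.
    """
    # Map each existing row's key tuple to its position in the table.
    target_index = {tuple(row[k] for k in keys): i for i, row in enumerate(existing)}
    result = [dict(row) for row in existing]
    matched = set()
    for row in updates:
        key = tuple(row[k] for k in keys)
        i = target_index.get(key)
        if i is None:
            # whenNotMatchedInsertAll: append the source row as a new row.
            result.append(dict(row))
        else:
            # Delta rejects a merge in which several source rows match one target row.
            if i in matched:
                raise ValueError(f"multiple source rows match key {key}")
            matched.add(i)
            # whenMatchedUpdateAll: replace every column with the source values.
            result[i] = dict(row)
    return result


existing = [{"Id": 1, "LastModifiedDate": "2023-01-01", "Name": "a"}]
updates = [
    {"Id": 1, "LastModifiedDate": "2023-01-01", "Name": "b"},
    {"Id": 1, "LastModifiedDate": "2023-02-01", "Name": "c"},
]
print(merge_upsert(existing, updates, ["Id", "LastModifiedDate"]))
```

Because LastModifiedDate is part of the match condition, a record whose LastModifiedDate has changed does not match and is inserted as a second row with the same Id, as the second update in the example shows. Whether that is intended depends on the table's design.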

Py4JJavaError: An error occurred while calling o771.save.
: org.apache.spark.SparkException: Job aborted.
This error generally occurs when the version of the Spark connector is not compatible with the Spark version.
Refer to: org.apache.spark.SparkException: Job aborted due to stage failure: Task from application
If the above solution does not work for you, please share the full stack trace of the error. It is difficult to identify the issue from the information shared.

@AbhishekKhandave, when I looked at the full error, I found a date column with values earlier than '1900-01-01'. That was the issue, and I was finally able to run the script. Thank you for your response.
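The poster does not show how the offending values were located or fixed. One way to find them is sketched below in plain Python over rows represented as dicts; in the notebook itself the equivalent would be a DataFrame filter on each date or timestamp column, for example a condition like `col < '1900-01-01'`. `find_pre_cutoff_dates` and the column names in the sample rows are hypothetical.

```python
from datetime import date, datetime

CUTOFF = date(1900, 1, 1)


def find_pre_cutoff_dates(rows, cutoff=CUTOFF):
    """Return (row_index, column, value) for every date/datetime before cutoff."""
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            # datetime is a subclass of date, so check it first and compare by day.
            if isinstance(value, datetime):
                day = value.date()
            elif isinstance(value, date):
                day = value
            else:
                continue  # not a date-like value
            if day < cutoff:
                hits.append((i, column, value))
    return hits


# Hypothetical sample rows with made-up column names.
rows = [
    {"Id": 1, "BirthDate": date(1899, 12, 31)},
    {"Id": 2, "BirthDate": date(1950, 1, 1), "CreatedAt": datetime(1850, 5, 1, 12, 0)},
]
print(find_pre_cutoff_dates(rows))
```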

Related

TypeError: __init__() got an unexpected keyword argument 'n_features_to_select' : - feature selection using forward selection

I am trying to do feature selection using sequential forward selection.
I tried previously answered questions but didn't find a proper solution. My code is as follows:
def forward_selection_rf(data, target, number_of_features=14):
    # adapt number of features to select: if the requested number
    # is greater than the features available, go for 75% of the
    # features instead
    if number_of_features > len(data.columns):
        print("SFS: Wanted " + str(number_of_features) + " from " + str(len(data.columns)) + " features. Falling back to 75%")
        number_of_features = 0.75
    # Sequential Forward Selection (SFS)
    sfs1 = sfs(RandomForestClassifier(
            n_estimators=70,
            criterion='gini',
            max_depth=15,
            min_samples_split=2,
            min_samples_leaf=1,
            min_weight_fraction_leaf=0.0,
            max_features='auto',
            max_leaf_nodes=None,
            min_impurity_decrease=0.0,
            min_impurity_split=None,
            bootstrap=True,
            oob_score=False,
            n_jobs=-1,
            random_state=0,
            verbose=0,
            warm_start=False,
            class_weight='balanced'
        ),
        # use the (possibly adjusted) argument instead of a hard-coded 14
        n_features_to_select=number_of_features,
        direction='forward',
        scoring='roc_auc',
        cv=5,
        n_jobs=3)
    sfs1.fit(data, target)
    return sfs1
Calling it raises the following runtime error:
forward_selection_rf(X, y, number_of_features=14)
Traceback (most recent call last):
File "C:\Users\drash\AppData\Local\Temp\ipykernel_37980\1091017691.py", line 1, in <module>
forward_selection_rf(X, y, number_of_features=14)
File "C:\Users\drash\OneDrive\Desktop\Howto Health\untitled3.py", line 102, in forward_selection_rf
TypeError: __init__() got an unexpected keyword argument 'n_features_to_select'
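The traceback suggests that whatever `sfs` refers to does not accept `n_features_to_select`. The import is not shown, so this is an assumption: if `sfs` is mlxtend's `SequentialFeatureSelector`, its constructor takes `k_features` and `forward=True` instead, whereas scikit-learn's `sklearn.feature_selection.SequentialFeatureSelector` (added in version 0.24) does take `n_features_to_select` and `direction`. To make the technique itself explicit, here is a minimal library-free sketch of greedy forward selection; `forward_selection` and the `score` callback are hypothetical names.

```python
from typing import Callable, List, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    n_features_to_select: int,
) -> List[str]:
    """Greedy sequential forward selection.

    Start from an empty set and repeatedly add the single remaining feature
    whose inclusion gives the highest score, until n_features_to_select
    features have been chosen.
    """
    if not 0 < n_features_to_select <= len(features):
        raise ValueError("n_features_to_select must be between 1 and len(features)")
    selected: List[str] = []
    remaining = list(features)
    while len(selected) < n_features_to_select:
        # Score each candidate together with the features already selected.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy score (hypothetical): the sum of fixed per-feature weights.
weights = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}
print(forward_selection(list(weights), lambda cols: sum(weights[c] for c in cols), 2))
# ['a', 'c']
```

In the question's setting, `score` would be something like the mean cross-validated ROC AUC of the random forest on the candidate columns. Separately, depending on the installed scikit-learn version, `min_impurity_split` (removed in 1.0) would raise a similar unexpected-keyword TypeError for RandomForestClassifier, and `max_features='auto'` is no longer accepted from 1.3 on.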

Extracting Features vectorAssembler

When I try to extract the features from the VectorAssembler output, I get an error saying the job was aborted due to stage failure.
cols_to_keep_unscaled = ['Type_Index']
cols_to_scale = ['AirtemperatureK', 'ProcesstemperatureK', 'Rotationalspeedrpm', 'TorqueNm', 'Toolwearmin']
def extract(row):
    return (row.Type_index) + tuple(row.scaled_features.toArray().tolist())
clean_df = clean_df.select(*cols_to_keep_unscaled, "scaled_features").rdd.map(extract).toDF(cols_to_keep_unscaled + cols_to_scale)
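Two details in an `extract` function like this are easy to get wrong. `(row.Type_index)` is just the value in parentheses, not a tuple, so adding it to a tuple raises a TypeError inside the executors; a one-element tuple needs a trailing comma, `(x,)`. Also, Row attribute access is case-sensitive, so `Type_index` will not find a column named `Type_Index`. Either failure would typically surface on the driver as a stage failure. The sketch below reproduces the corrected logic in plain Python; `FakeRow` is a hypothetical stand-in for a PySpark Row, with a list in place of the vector (whose `toArray().tolist()` returns a plain list).

```python
from collections import namedtuple

# Hypothetical stand-in for a pyspark Row; like Row, attribute names are case-sensitive.
FakeRow = namedtuple("FakeRow", ["Type_Index", "scaled_features"])


def extract(row):
    # (x,) builds a one-element tuple; plain (x) would just be x.
    return (row.Type_Index,) + tuple(row.scaled_features)


row = FakeRow(Type_Index=2.0, scaled_features=[0.1, 0.2, 0.3, 0.4, 0.5])
print(extract(row))  # (2.0, 0.1, 0.2, 0.3, 0.4, 0.5)
```

In the notebook, the same body would use `tuple(row.scaled_features.toArray().tolist())` for the vector part.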

How to convert mnist dataset in array

Hello, consider the following code:
import numpy as np

# load the mnist training data CSV file into a list
training_data_file = open("Training_Set/mnist_train_100.csv", 'r')
training_data_list = training_data_file.readlines()
training_data_file.close()
for record in training_data_list:
    all_values = record.split(',')
    x_inputs = (np.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
    print("xinput=" + str(x_inputs))
print(len(training_data_list))
MyCompleteInput = np.array(x_inputs,len(training_data_list))
I want to combine x_inputs and len(training_data_list) into one array, so that printing its shape gives (784, 100).
But when I run my code, I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-38-b0f129f57bcb> in <module>()
11 print("xinput=" + str(x_inputs))
12 print(len(training_data_list))
---> 13 MyCompleteInput = np.array(x_inputs,len(training_data_list))
14
15
TypeError: data type not understood
Can somebody help me out? Thanks.

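Change the line to:
MyCompleteInput = np.array((x_inputs,len(training_data_list)))
Do this and your error will be gone. The second positional parameter of np.array is dtype, so len(training_data_list) was being interpreted as a data type; the extra set of parentheses passes both values as a single sequence instead.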
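Note that the tuple `(x_inputs, len(training_data_list))` pairs the last row's pixel values with a count, so it does not produce an array of shape (784, 100). A minimal sketch that does is below, assuming every line holds a label followed by 784 pixel values: collect each scaled row in a list, stack the rows, and transpose. `build_input_matrix` is a hypothetical name, and `np.asarray(..., dtype=float)` stands in for `np.asfarray`, which was removed in NumPy 2.0.

```python
import numpy as np


def build_input_matrix(lines):
    rows = []
    for record in lines:
        all_values = record.strip().split(',')
        # Skip the label in column 0 and scale pixels from 0..255 into 0.01..1.00.
        x = np.asarray(all_values[1:], dtype=float) / 255.0 * 0.99 + 0.01
        rows.append(x)
    # Stacking gives (n_records, 784); transposing gives (784, n_records).
    return np.array(rows).T


# Three synthetic records, each a label plus 784 pixels of value 255.
sample = [f"{label}," + ",".join(["255"] * 784) for label in range(3)]
m = build_input_matrix(sample)
print(m.shape)  # (784, 3)
```

With the 100-line training file, `build_input_matrix(training_data_list).shape` would be (784, 100).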

Drools rule issue after migrating to 6.x from 5.3

I am having an issue with the rule below. It works fine in 5.3 but throws an error (must be boolean expression) in 6.x.
String drl="import com.drools.Applicant;"
+ "rule \"Is of valid age\" "
+ " when $a : Applicant(age > 18 && name matches \"(?i).*\"+ name + \"(.|\n|\r)*\")"
+ " then $a.setValid( true ); "
+ " System.out.println(\"validation: \" + $a.isValid());\n"+
"end";
The issue is with this line:
" when $a : Applicant(age > 18 && name matches \"(?i).\"+ name + \"(.|\n|\r)\")"
Any advice?

The expression isn't correct, since name cannot be resolved as part of an expression. Use a binding:
$a : Applicant($n: name, age > 18, name matches \"(?i).*\"+ $n + \"(.|\n|\r)*\")"
(I don't think the constraint makes much sense: it merely tests whether a name matches itself, with or without arbitrary characters before and after. Moreover, the ?i is superfluous.)
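The remark that the constraint is just a self-match can be checked with a quick Python model. Java's `String.matches` requires the whole string to match, so `re.fullmatch` is the closest Python equivalent; `self_match` is a hypothetical helper, and Python's regex dialect is assumed to agree with Java's for this simple pattern. The check is true for ordinary names but can fail when a name contains regex metacharacters, because the name is spliced into the pattern unescaped.

```python
import re


def self_match(name):
    # Mirrors the rule: name matches "(?i).*" + name + "(.|\n|\r)*"
    pattern = "(?i).*" + name + "(.|\n|\r)*"
    return re.fullmatch(pattern, name) is not None


for n in ["Alice", "o'brien", "Jean Luc", "a+b"]:
    print(n, self_match(n))
```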

Unable to execute pig scripts using Azure powershell

This is my Pig script:
$QueryString = "A = load 'wasb://$containername@$StorageAccount.blob.core.windows.net/table1' using PigStorage(',') as (col1 chararray,col2 chararray,col3 chararray,col4 chararray,col5 chararray,col6 chararray,col7 int,col8 int);" +
"user_list = foreach A GENERATE $0;" +
"unique_user = DISTINCT user_list;" +
"unique_users_group = GROUP unique_user ALL;" +
"uu_count = FOREACH unique_users_group GENERATE COUNT(unique_user);" +
"DUMP uu_count;"
I get this error when I execute the above Pig script:
'2015-04-14 23:17:55,177 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <line 1, column 166> mismatched input 'chararray' expecting RIGHT_PAREN
Failed to parse: <line 1, column 166> mismatched input 'chararray' expecting RIGHT_PAREN
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:241)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:509)
at org.apache.pig.Main.main(Main.java:156)
2015-04-14 23:17:55,177 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 166> mismatched input 'chararray' expecting RIGHT_PAREN
I edited the LOAD statement as follows, keeping the rest of the script the same:
$QueryString = "A = load 'wasb://$containername@$StorageAccount.blob.core.windows.net/table1';" +
The error I get now is:
2015-04-14 23:23:00,117 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <line 1, column 162> Syntax error, unexpected symbol at or near ';'
Failed to parse: <line 1, column 162> Syntax error, unexpected symbol at or near ';'
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:241)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:509)
at org.apache.pig.Main.main(Main.java:156)
2015-04-14 23:23:00,132 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 162> Syntax error, unexpected symbol at or near ';'
Details at logfile: C:\apps\dist\hadoop-2.4.0.2.1.9.0-2196\logs\pig_1429053777602.log
I don't understand what the error is. Can someone help me execute this query in Windows PowerShell? (I am using Windows PowerShell ISE, so I can edit the queries.)

The issue is in this statement: user_list = foreach A GENERATE $0;. PowerShell interprets $0 as a variable, and since it is not defined, PowerShell substitutes an empty string. You can either define a variable in the script, like $0 = '$0';, or just escape the $, like:
user_list = foreach A GENERATE `$0;
PowerShell uses the ` (backtick, next to the '1' key) as an escape character in double-quoted strings.
So the script can look like:
$0 = '$0';
$QueryString = "A = load 'wasb://$containerName@$storageAccountName.blob.core.windows.net/table1' using PigStorage(',') as (col1,col2,col3,col4,col5,col6,col7,col8) ;"+
"user_list = foreach A GENERATE $0;" +
"unique_user = DISTINCT user_list;" +
"unique_users_group = GROUP unique_user ALL;" +
"uu_count = FOREACH unique_users_group GENERATE COUNT(unique_user);" +
"DUMP uu_count;"
or
$QueryString = "A = load 'wasb://$containerName@$storageAccountName.blob.core.windows.net/table1' using PigStorage(',') as (col1,col2,col3,col4,col5,col6,col7,col8) ;"+
"user_list = foreach A GENERATE `$0;" +
"unique_user = DISTINCT user_list;" +
"unique_users_group = GROUP unique_user ALL;" +
"uu_count = FOREACH unique_users_group GENERATE COUNT(unique_user);" +
"DUMP uu_count;"

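To see why the Pig parser ends up at ';', here is a toy Python model of the part of PowerShell's double-quoted string expansion that matters in this answer. It is a simplification: `expand_like_powershell` is a hypothetical helper that only handles simple `$name` variables and the backtick escape, and it ignores `${...}`, `$(...)` subexpressions, scoped names such as `$env:PATH`, and strict mode. It reproduces both fixes from the answer: escaping with a backtick, and defining `$0` with the literal value '$0'.

```python
import re

# Either a backtick-escaped dollar sign, or $ followed by a simple variable name.
_TOKEN = re.compile(r"`\$|\$(\w+)")


def expand_like_powershell(text, variables):
    def repl(match):
        if match.group(0) == "`$":
            return "$"  # escaped: keep a literal dollar sign
        # Undefined variables expand to an empty string (non-strict mode).
        return str(variables.get(match.group(1), ""))
    return _TOKEN.sub(repl, text)


line = "user_list = foreach A GENERATE $0;"
print(expand_like_powershell(line, {}))             # $0 vanishes, leaving "GENERATE ;"
print(expand_like_powershell(line, {"0": "$0"}))    # the $0 = '$0'; workaround
print(expand_like_powershell("GENERATE `$0;", {}))  # the backtick workaround
```

The first call shows the empty `GENERATE ;` that Pig then rejects with "unexpected symbol at or near ';'".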
We Keep Coding