pyspark parseException exceptions - pyspark

ParseException in PySpark
try:
    sql
except Exception as e:
    logmessage('header ' + str(e), func1())
I get a ParseException every time I use str(e); without it, the code works fine.
Is any conversion needed? Thanks

Related

Catching spark exceptions in PySpark

I have a Databricks notebook that reads csv files as a first step in an ETL pipeline.
Sometimes the CSV files do not have the required schema, which causes the notebook to crash. I need to handle these errors when they occur instead of letting the entire pipeline crash.
Below is my code where I attempt to handle these errors. When a faulty CSV file is read, I expect the output to be "Exception caught".
try:
    newData = (spark.read
        .format("csv")
        .option("delimiter", "|")
        .option("mode", "FAILFAST")
        .option("inferSchema", "false")
        .option("enforceSchema", "true")
        .option("header", "True")
        .schema(schema)
        .load(bronzePath + "/" + fileName + "*")
    )
    newData.display()
except FileReadException as e:
    # Do stuff, handle exception
    print("Exception caught")
But the exception is never caught, and I get the full exception as output:
FileReadException: Error while reading file dbfs:/mnt/datalake/bronze/<myPath>/<myFile.csv>.
Caused by: SparkException: Malformed records are detected in record parsing. Parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'.
Caused by: BadRecordException: org.apache.spark.sql.catalyst.csv.MalformedCSVException: Malformed CSV record
Caused by: MalformedCSVException: Malformed CSV record
Googling the issue helped me understand that it is not possible to catch Scala exceptions by name in PySpark.
Is there some other way that I can catch this exception? What other alternatives do I have?
I need some way to handle any faulty CSV files that are received. Using PERMISSIVE or DROPMALFORMED mode is not an option for me, since I need to react to and treat the faulty files.
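One possible workaround, sketched here under assumptions: since `FileReadException` is a JVM class and not a Python name, catch a broad Python exception around the action that forces the read and inspect its message. The helper names `is_malformed_csv_error` and `run_guarded` are hypothetical, the marker strings are copied from the stack trace above, and matching on message text is an assumption that may need adjusting for a given Spark or Databricks version.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Substrings copied from the stack trace quoted in the question.
MALFORMED_MARKERS = (
    "FileReadException",
    "Malformed records are detected",
    "MalformedCSVException",
)


def is_malformed_csv_error(exc: BaseException) -> bool:
    """Return True if the exception text mentions a malformed-CSV failure."""
    text = str(exc)
    return any(marker in text for marker in MALFORMED_MARKERS)


def run_guarded(
    action: Callable[[], T],
    on_bad_file: Callable[[Exception], None],
) -> Optional[T]:
    """Run action(); route malformed-CSV failures to on_bad_file, re-raise the rest."""
    try:
        return action()
    except Exception as exc:  # broad on purpose: the JVM class has no Python name
        if is_malformed_csv_error(exc):
            on_bad_file(exc)
            return None
        raise
```

In the notebook, the guarded action would need to wrap both the `load` call and something that forces evaluation (for example `newData.display()`), because Spark reads lazily; the callback is where the faulty file could be logged or moved aside.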

How to output many sqlite3 commands inside one powershell script

I was wondering how to run the following sqlite3 commands from a single PowerShell script:
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .once -x
sqlite> SELECT * FROM analysis;
I tried the following in PowerShell:
echo "SELECT * FROM analysis" echo ".once -x;" | sqlite3 C:\ProgramData\PROISER\ISASPSUS\datastore\dsfile.db
But, as you may know, two echo statements cannot be chained together like that. The error PowerShell throws is:
sqlite3 : Error: near line 2: near "echo": syntax error
At line:1 char:51
+ ... once -x;" | sqlite3 C:\ProgramData\PROISER\ISASPSUS\datastore\dsfile. ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Error: near lin...": syntax error:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
Store your SQL in a file, and pass the filename to SQLite. Essentially:
sqlite3 DB.db ".read FILENAME"
See Running a Sqlite3 Script from Command Line
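As a rough illustration of the file-based approach, the Python sketch below writes the dot-command and the query to a script file and builds the `sqlite3 DB.db ".read FILENAME"` invocation. The function name `build_sqlite_invocation` is hypothetical, and the sketch assumes the script path contains no spaces and that `sqlite3` is on the PATH when the command is eventually run.

```python
from pathlib import Path
from typing import List, Sequence


def build_sqlite_invocation(
    db_path: str, script_path: str, commands: Sequence[str]
) -> List[str]:
    """Write one command per line to script_path and return the argv that runs it."""
    Path(script_path).write_text("\n".join(commands) + "\n", encoding="utf-8")
    # Mirrors the form from the answer: sqlite3 DB.db ".read FILENAME"
    return ["sqlite3", db_path, f".read {script_path}"]
```

Calling it with `[".once -x", "SELECT * FROM analysis;"]` puts the dot-command on its own line before the query; the returned list can then be passed to `subprocess.run(argv, check=True)`.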

Ansible through WinRM: win_find throws OutOfMemoryException

I'm working on some Ansible playbooks that have to run on Windows Server 2008 with Powershell 3.
As part of one of these playbooks I do a win_find to get a list of files in a directory:
- name: filecheck
  win_find:
    paths: C:\dev\envs
  register: envs
When I try to run the playbook, it runs correctly through the preliminary operations (e.g. printing some debug variables) but then errors out when the above task is executed:
"module_stderr": "Exception of type 'System.OutOfMemoryException' was thrown.\r\nAt line:60 char:9\r\n+ If (-not $obj.GetType)\r\n+ ~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : OperationStopped: (:) [], OutOfMemoryException\r\n + FullyQualifiedErrorId : System.OutOfMemoryException\r\n \r\n\r\n",
I don't see how a non-recursive find on a folder with three files could consume 150 MB of memory, let alone the 2048 MB that has been allocated to shells over WinRM.
Does anyone know how I can fix this issue or try and find the root cause?

What is the difference between ServiceFabricClusterConfiguration and ServiceFabricClusterManifest?

My question regards updating the configuration for a Service Fabric standalone Windows cluster.
What is the difference between ServiceFabricClusterConfiguration and ServiceFabricClusterManifest?
Suppose that I want to change the ApplicationPorts setting. I see these options:
Using ServiceFabricClusterConfiguration:
1. Use Get-ServiceFabricClusterConfiguration
2. Edit the JSON file
3. Start an upgrade using Start-ServiceFabricClusterConfigurationUpgrade
or
Using ServiceFabricClusterManifest:
1. Use Get-ServiceFabricClusterManifest
2. Edit the XML file
3. Start an upgrade using
Register-ServiceFabricClusterPackage -Config -ClusterManifestPath "ClusterConfigv2.xml"
Start-ServiceFabricClusterUpgrade -ClusterManifestVersion 2 -Config
I tried to change ApplicationPorts via ServiceFabricClusterManifest. As a result, the ApplicationPorts value in the JSON (Get-ServiceFabricClusterConfiguration) now differs from the value in the XML (Get-ServiceFabricClusterManifest).
My questions are:
What is the difference between the two approaches?
What is the approach I should take?
Since the different Get- commands give different results, which is the way to see the actual applied configuration?
Update:
I get the following error when I run the Start-ServiceFabricClusterConfigurationUpgrade command.
Exception : System.Exception: Exception of type 'System.Exception' was thrown.
at System.Fabric.Interop.NativeClient.IFabricClusterManagementClient7.EndUpgradeConfiguration(IFabricAsyncOperationContext context)
at System.Fabric.Interop.Utility.<>c__DisplayClassa.<WrapNativeAsyncInvoke>b__9(IFabricAsyncOperationContext context)
at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
TargetObject : Microsoft.ServiceFabric.Powershell.ClusterConnection
CategoryInfo : NotSpecified: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFa...gurationUpgrade], Exception
FullyQualifiedErrorId : StartClusterConfigurationUpgradeErrorId,Microsoft.ServiceFabric.Powershell.StartClusterConfigurationUpgrade
ErrorDetails :
InvocationInfo : System.Management.Automation.InvocationInfo
ScriptStackTrace : at <ScriptBlock>, <No file>: line 1
PipelineIterationInfo : {}
PSMessageDetails :
For on-premises deployments, Start-ServiceFabricClusterConfigurationUpgrade is the supported mechanism and the only one you should use. As long as you use only one mechanism, you shouldn't receive inconsistent results.
The detailed error is in the trace logs. Could you provide the two JSON files? Also, a common error is to upgrade without updating the JSON config version, which is the "clusterConfigurationVersion" field in the JSON config.
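The version check mentioned above could be automated before starting an upgrade. The sketch below is hypothetical: it compares the "clusterConfigurationVersion" field of the currently applied JSON config with that of the edited one and reports whether it changed. The function name is an assumption; only the field name comes from the answer.

```python
import json


def version_was_bumped(current_json: str, edited_json: str) -> bool:
    """Return True if the edited config carries a different, non-missing version."""
    current = json.loads(current_json).get("clusterConfigurationVersion")
    edited = json.loads(edited_json).get("clusterConfigurationVersion")
    return edited is not None and edited != current
```

A `False` result would indicate that the edited file still carries the old version (or none at all), which is the situation the answer warns about.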

Is the DB2 CLPPlus editor able to run XQuery?

I am new to the CLPPlus editor and I'm trying a simple query that works if I execute it from a file like this:
db2 -td% -svf C:\query.sql
and the query.sql file contains:
SELECT tx.ID,XMLQUERY('for $e in $d/Client/Address return data($e)' passing tx.contactinfo as "d") FROM clients tx %
If I paste the query as it is into the CLP or the CLPPlus editor, I get errors.
Error FROM CLP: SQL0104N An unexpected token "for $e in $d/Client/Ad"
was found following "LECT tx.ID,XMLQUERY(". Expected tokens may
include: "
Error from CLPPLUS: SQL16002N An XQuery expression has an unexpected
token "in" following "for ". Expected tokens may include: "is". Error
QName=err:XPST0003.