Synapse Spark error when writing dataframe - scala

I'm running a Spark notebook in Azure Synapse Analytics that writes dataframes to Parquet format after Cognitive Services transformations.
I'm receiving the following error:
org.apache.spark.SparkException: Failed to execute user defined function (functions$$$Lambda$1550/1889298707: (struct>,displayName:string,tasks:struct>,warnings:array>,statistics:struct>>,errors:array>,modelVersion:string>>>,entityLinkingTasks:array>,language:string,id:string,url:string,dataSource:string>>,warnings:array>,statistics:struct>>,errors:array>,modelVersion:string>>>,entityRecognitionPiiTasks:array>,redactedText:string,warnings:array>,statistics:struct>>,errors:array>,modelVersion:string>>>,keyPhraseExtractionTasks:array,warnings:array>,statistics:struct>>,errors:array>,modelVersion:string>>>,sentimentAnalysisTasks:array,confidenceScores:struct,sentences:array,offset:int,length:int>>,warnings:array>>>,errors:array>,modelVersion:string>>>>>) => array>,warnings:array>,statistics:struct>,error:struct>>,entityLinking:array>,language:string,id:string,url:string,dataSource:string>>,warnings:array>,statistics:struct>,error:struct>>,entityRecognitionPii:array>,redactedText:string,warnings:array>,statistics:struct>,error:struct>>,keyPhraseExtraction:array,warnings:array>,statistics:struct>,error:struct>>,sentimentAnalysis:array,confidenceScores:struct,sentences:array,offset:int,length:int>>,warnings:array>>,error:struct>>>>)
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:136)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ScalaUDF_1$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_8_4$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:93)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:304)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: The value (0) of the type (java.lang.Integer) cannot be converted to the string type
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:303)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:295)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:113)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:268)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:248)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:113)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:258)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:248)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:113)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.$anonfun$toCatalystImpl$2(CatalystTypeConverters.scala:174)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:174)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:164)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:108)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:258)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:248)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:113)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.$anonfun$toCatalystImpl$2(CatalystTypeConverters.scala:174)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:174)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:164)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:108)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:48
Here is the input schema (after DF transformations):
optional binary productAsin (STRING);
optional binary username (STRING);
optional int64 ratingScore;
optional binary reviewTitle (STRING);
optional binary reviewUrl (STRING);
optional binary reviewReaction (STRING);
optional binary reviewedIn (STRING);
optional binary reviewDescription (STRING);
optional boolean isVerified;
optional binary avatar (STRING);
optional binary variant (STRING);
optional group reviewImages (LIST) {
repeated group list {
optional binary element (STRING);
}
}
optional int64 position;
optional group error {
optional binary response (STRING);
optional group status {
optional group protocolVersion {
optional binary protocol (STRING);
required int32 major;
required int32 minor;
}
required int32 statusCode;
optional binary reasonPhrase (STRING);
}
}
optional group textAnalysis (LIST) {
repeated group list {
optional group element {
optional group entityRecognition (LIST) {
repeated group list {
optional group element {
optional group result {
optional binary id (STRING);
optional group entities (LIST) {
repeated group list {
optional group element {
optional binary text (STRING);
optional binary category (STRING);
optional binary subcategory (STRING);
optional int32 offset;
optional int32 length;
required double confidenceScore;
}
}
}
optional group warnings (LIST) {
repeated group list {
optional group element {
optional binary code (STRING);
optional binary message (STRING);
optional binary targetRef (STRING);
}
}
}
optional group statistics {
required int32 charactersCount;
required int32 transactionsCount;
}
}
optional group error {
optional binary id (STRING);
optional binary error (STRING);
}
}
}
}
optional group entityLinking (LIST) {
repeated group list {
optional group element {
optional group result {
optional binary id (STRING);
optional group entities (LIST) {
repeated group list {
optional group element {
optional binary name (STRING);
optional group matches (LIST) {
repeated group list {
optional group element {
required double confidenceScore;
optional binary text (STRING);
required int32 offset;
required int32 length;
}
}
}
optional binary language (STRING);
optional binary id (STRING);
optional binary url (STRING);
optional binary dataSource (STRING);
}
}
}
optional group warnings (LIST) {
repeated group list {
optional group element {
optional binary code (STRING);
optional binary message (STRING);
optional binary targetRef (STRING);
}
}
}
optional group statistics {
required int32 charactersCount;
required int32 transactionsCount;
}
}
optional group error {
optional binary id (STRING);
optional binary error (STRING);
}
}
}
}
optional group entityRecognitionPii (LIST) {
repeated group list {
optional group element {
optional group result {
optional binary id (STRING);
optional group entities (LIST) {
repeated group list {
optional group element {
optional binary text (STRING);
optional binary category (STRING);
optional binary subcategory (STRING);
optional int32 offset;
optional int32 length;
required double confidenceScore;
}
}
}
optional binary redactedText (STRING);
optional group warnings (LIST) {
repeated group list {
optional group element {
optional binary code (STRING);
optional binary message (STRING);
optional binary targetRef (STRING);
}
}
}
optional group statistics {
required int32 charactersCount;
required int32 transactionsCount;
}
}
optional group error {
optional binary id (STRING);
optional binary error (STRING);
}
}
}
}
optional group keyPhraseExtraction (LIST) {
repeated group list {
optional group element {
optional group result {
optional binary id (STRING);
optional group keyPhrases (LIST) {
repeated group list {
optional binary element (STRING);
}
}
optional group warnings (LIST) {
repeated group list {
optional group element {
optional binary code (STRING);
optional binary message (STRING);
optional binary targetRef (STRING);
}
}
}
optional group statistics {
required int32 charactersCount;
required int32 transactionsCount;
}
}
optional group error {
optional binary id (STRING);
optional binary error (STRING);
}
}
}
}
optional group sentimentAnalysis (LIST) {
repeated group list {
optional group element {
optional group result {
optional binary id (STRING);
optional binary sentiment (STRING);
optional group statistics {
required int32 charactersCount;
required int32 transactionsCount;
}
optional group confidenceScores {
required double positive;
required double neutral;
required double negative;
}
optional group sentences (LIST) {
repeated group list {
optional group element {
optional binary text (STRING);
optional binary sentiment (STRING);
optional group confidenceScores {
required double positive;
required double neutral;
required double negative;
}
required int32 offset;
required int32 length;
}
}
}
optional group warnings (LIST) {
repeated group list {
optional group element {
optional binary code (STRING);
optional binary message (STRING);
optional binary targetRef (STRING);
}
}
}
}
optional group error {
optional binary id (STRING);
optional binary error (STRING);
}
}
}
}
}
}
}
}
This error occurs only with this dataframe. Is there a way to handle the exception per row in the notebook, so that when this error occurs the offending row is simply skipped and dropped?

Related

The return type 'int' isn't a 'String?', as required by the closure's context

What is the problem when I convert a String value into an int? It gives me the error mentioned in the question title.
If I understand your question correctly, you need to turn a String into an int?
To do that, use int.parse(yourString) -> you will get int
If you are getting an error between int and int?, or String and String? (the same type but with a question mark), try the ?? operator.
For example (with string value):
MyObject(
stringField: myString ?? '', //you are ensuring that if your String is null, there will be no error
);
If you want to return String data from int data, you need to convert it.
String test() {
int num = 0;
return(num.toString());
}
But if you want to return int data from String data, you need to parse it to an int first.
int test() {
String pi = '3';
int now = int.parse(pi); // result from change String to int
return now;
}

Shorter Alternative to ternary to generate empty string if nil?

I have a parameter of Type Double?.
When this parameter is nil, I want to have an empty string.
I can use if (variable == nil) ? "" : String(variable!) but is there a shorter alternative?
Using Optional.map and the nil-coalescing operator ?? you can do
var variable: Double? = 1.0
let string = variable.map { String($0) } ?? ""
The closure is called (and the string returned) if the variable is not nil, otherwise map returns nil and the expression evaluates to the empty string.
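Both branches of that expression can be checked with a minimal sketch. The helper name `stringOrEmpty` is hypothetical and only wraps the map/?? pattern described above:

```swift
// Hypothetical helper (not from the answer above) wrapping the map/?? pattern.
func stringOrEmpty(_ value: Double?) -> String {
    // map runs the closure only when value is non-nil; ?? supplies the fallback.
    return value.map { String($0) } ?? ""
}

print(stringOrEmpty(1.0)) // 1.0
print(stringOrEmpty(nil)) // prints an empty line
```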
I don't see a simple way to simplify your code. An idea is to create a Double extension like this:
extension Optional where Wrapped == Double {
var asString: String {
self == nil ? "" : String(self!)
}
}
And then instead of that if condition you just use:
variable.asString
If you want to use the resulting string in another string, like this:
let string = "The value is: \(variable)"
and possibly specify what to print when variable is nil :
let string = "The value is: \(variable, nil: "value is nil")"
you can write a handy generic extension for String.StringInterpolation. It takes a value of any type and prints it; if the value is an optional that is nil, it prints the specified "default" string instead:
extension String.StringInterpolation {
mutating func appendInterpolation<T>(_ value: T?, `nil` defaultValue: @autoclosure () -> String) {
if let value = value {
appendLiteral("\(value)")
} else {
appendLiteral(defaultValue())
}
}
}
Example:
var d: Double? = nil
print("Double: \(d, nil: "value is nil")")
d = 1
print("Double: \(d, nil: "value is nil")")
let i = 1
print("Integer: \(i, nil: "value is nil")")
Output on the console:
Double: value is nil
Double: 1.0
Integer: 1
Just for fun, a generic approach to cover all types that conform to LosslessStringConvertible:
extension LosslessStringConvertible {
var string: String { .init(self) }
}
extension Optional where Wrapped: LosslessStringConvertible {
var string: String { self?.string ?? "" }
}
var double = Double("2.7")
print(double.string) // "2.7\n"
Property wrappers should give you the desired result. A property wrapper has two special variables, wrappedValue and projectedValue, which add a layer of separation and let you wrap your custom logic.
wrappedValue - manipulate this variable with getters and setters. It is of little use in our case, as it is of type Double?.
projectedValue - this is our focus, as we can use this variable to project the Double as a String.
The implementation is as below
@propertyWrapper
struct DoubleToString {
private var number: Double = 0.0
var projectedValue: String = ""
var wrappedValue: Double?{
get {
return number // Not really required
}
set {
if let value = newValue { // Check for nil
projectedValue = value.description // Convert to string
number = value
}
}
}
}
Now we create a struct which uses this wrapper.
struct NumbersTest {
@DoubleToString var number1: Double?
@DoubleToString var number2: Double?
}
Running the code below gives the desired result. $number1 gives us the projectedValue, and if we omit the $ symbol we get the wrappedValue.
var numbersTest = NumbersTest()
numbersTest.number1 = 25.0
numbersTest.number2 = nil
print(numbersTest.$number1) //"25.0"
print(numbersTest.$number2) //""
By using property wrappers you can keep the variable interoperable to get both Double and String values easily.

Type-Constrained Optional Extension visible to other types

So because I got bored of having to manually nil-coalesce optional values, I decided to make the common default values into nice and easy-to-use extensions on Optional, as seen below:
public extension Optional {
var exists: Bool { return self != nil }
}
public extension Optional where Wrapped == String {
var orEmpty: Wrapped { return self ?? "" }
}
public extension Optional where Wrapped == Int {
var orZero: Wrapped { return self ?? 0 }
}
public extension Optional where Wrapped == Double {
var orZero: Wrapped { return self ?? 0 }
}
public extension Optional where Wrapped == Float {
var orZero: Wrapped { return self ?? 0 }
}
public extension Optional where Wrapped == Bool {
var orFalse: Wrapped { return self ?? false }
}
My issue arises when attempting to use these on an optional value with a specific type.
I can call .orZero on a variable of type String? and get one of the following errors:
Ambiguous reference to member 'orZero'
'String?' is not convertible to 'Optional'
I'd like to know why Xcode is providing the .orZero properties of such an optional as valid auto-completion options? I would've thought the generic constraints would prevent me from being able to see them.
For what it's worth, I'm using Xcode 10.1, and Swift 4.2
.orZero being provided as an auto-completion is a bug. It can be circumvented by rewriting your extensions in terms of the appropriate literal protocols.
public extension Optional where Wrapped: ExpressibleByIntegerLiteral {
var orZero: Wrapped { return self ?? 0 }
}
public extension Optional where Wrapped: ExpressibleByBooleanLiteral {
var orFalse: Wrapped { return self ?? false }
}
Expressed in this way, Swift can now figure out that .orZero ought not to be suggested for, e.g., a variable of type String?, and if you try to call it anyway, it will give the error Type 'String' does not conform to protocol 'ExpressibleByIntegerLiteral'.
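A short usage sketch of the literal-protocol version follows. The variable names are illustrative assumptions, and the extensions are repeated (without `public`) only so the snippet stands alone:

```swift
// Same constraints as in the answer above, repeated so the snippet is self-contained.
extension Optional where Wrapped: ExpressibleByIntegerLiteral {
    var orZero: Wrapped { return self ?? 0 }
}
extension Optional where Wrapped: ExpressibleByBooleanLiteral {
    var orFalse: Wrapped { return self ?? false }
}

// Illustrative values; the names are hypothetical.
let count: Int? = nil
let ratio: Double? = 2.5
let flag: Bool? = nil

print(count.orZero) // 0
print(ratio.orZero) // 2.5
print(flag.orFalse) // false
```

Int, Double and Float all conform to ExpressibleByIntegerLiteral, so a single constrained extension covers the three numeric cases from the question.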
Take a look at this.
In your extension, .orZero is only for Int, Float and Double.
For String you have .orEmpty.

Unable to return an optional as an optional

Why can't I get Swift to return a value as an optional?
I have a function that checks whether an optional contains a value, and returns it as an optional if it doesn't:
var someOptional: String?
func checkIfOptional<T>(value: T?) -> (String, T) {
if let _value = value {
return (("Your optional contains a value. It is: \(_value)"), (_value))
} else {
return (("Your optional did not contain a value"), (value?)) //ERROR: Value of optional type 'T?' not unwrapped; did you mean to use '!' or '?'?
}
}
When the optional is nil, it should return the same optional that was given to the function.
If there is a value, it should return the unwrapped value.
If you want to return an optional you have to declare the return type as optional
func checkIfOptional<T>(value: T?) -> (String, T?) {
if let _value = value {
return ("Your optional contains a value. It is: \(_value)", value)
} else {
return ("Your optional did not contain a value", value)
// or even return ("Your optional did not contain a value", nil)
}
}
I removed all unnecessary parentheses.
You may want to declare an enum like this:
enum Value<T> {
case full(String, T)
case empty(String, T?)
}
func checkIfOptional<T>(_ value: T?) -> Value<T> {
if let _value = value {
return .full("Your optional contains a value. It is: \(_value)", _value)
} else {
return .empty("Your optional did not contain a value.", value)
}
}
var toto: String?
print(checkIfOptional(toto)) // empty("Your optional did not contain a value.", nil)
print(checkIfOptional("Blah")) // full("Your optional contains a value. It is: Blah", "Blah")
To treat a Value you should use switch this way:
var toto: String?
let empty = checkIfOptional(toto)
let full = checkIfOptional("Blah")
func treatValue<T>(_ value: Value<T>) {
switch(value) {
case .full(let msg, let val):
print(msg)
print(val)
case .empty(let msg, _):
print(msg)
}
}
treatValue(empty) // Your optional did not contain a value.
treatValue(full) // Your optional contains a value. It is: Blah\nBlah
But all of this seems to me to only add needless complexity to the straightforward type that is Optional. So you might want to expand on what you are trying to achieve here.
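As a sketch of the simpler route hinted at here, Optional alone can carry the same information, with only the message computed separately. The function name `message(for:)` is hypothetical:

```swift
// Hypothetical helper: builds the message and leaves the optional itself untouched.
func message<T>(for value: T?) -> String {
    switch value {
    case .some(let unwrapped):
        return "Your optional contains a value. It is: \(unwrapped)"
    case .none:
        return "Your optional did not contain a value."
    }
}

let toto: String? = nil
print(message(for: toto))   // Your optional did not contain a value.
print(message(for: "Blah")) // Your optional contains a value. It is: Blah
```

The caller keeps the original optional and can unwrap it with `if let` wherever the value itself is needed.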

Verify a string as a valid UUID in Swift

Given a supplied String, how can I verify whether the String is a valid UUID in Swift?
A valid UUID (as far as I know) could be:
33041937-05b2-464a-98ad-3910cbe0d09e
3304193705b2464a98ad3910cbe0d09e
You could use UUID
var uuid = UUID(uuidString: yourString)
This will return nil if yourString is not a valid UUID
Note: this only validates the first case you presented, not the second, but adding the dashes yourself is trivial.
The following is updated for Swift 4.0 to determine if a string is a valid UUID.
let uuidHyphens = "33041937-05b2-464a-98ad-3910cbe0d09e"
let uuidNoHyphens = "3304193705b2464a98ad3910cbe0d09e"
if UUID(uuidString: uuidHyphens) != nil {
print("UUID string with hyphens is valid") // Will be valid
} else {
print("UUID string with hyphens is not valid")
}
// In this scenario, the UUID will be nil,
if UUID(uuidString: uuidNoHyphens) != nil {
print("UUID string with no hyphens is valid")
} else {
print("UUID string with no hyphens is not valid") // Will not be valid
}
The string passed in to the UUID init must contain hyphens, otherwise the check will fail. If you are expecting strings without hyphens, then you can utilize an approach such as what is discussed here to add hyphens to a string if it satisfies a length of 32.
The relevant section from Apple's documentation:
Create a UUID from a string such as
“E621E1F8-C36C-495A-93FC-0C247A3E6E5F”.
I wrote the extension below to turn an unhyphenated string into a UUID string:
extension String {
var uuid: String? {
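// Only a 32-character string can be an unhyphenated UUID; shorter input would trap on the index math below.
guard count == 32 else { return nil }
var string = self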
var index = string.index(string.startIndex, offsetBy: 8)
for _ in 0..<4 {
string.insert("-", at: index)
index = string.index(index, offsetBy: 5)
}
// The init below is used to check the validity of the string returned.
return UUID(uuidString: string)?.uuidString
}
}
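A usage sketch for an extension along these lines follows. The property name `hyphenatedUUID` is hypothetical, and the length guard is an assumption added here so that short input returns nil instead of trapping:

```swift
import Foundation

extension String {
    /// Hyphenated, validated UUID string, or nil if self is not a 32-digit hex string.
    /// (Hypothetical name, chosen to avoid clashing with the `uuid` property above.)
    var hyphenatedUUID: String? {
        guard count == 32 else { return nil }
        var string = self
        // Insert hyphens at character offsets 8, 13, 18 and 23 (8-4-4-4-12 layout).
        for offset in [8, 13, 18, 23] {
            string.insert("-", at: string.index(string.startIndex, offsetBy: offset))
        }
        // UUID(uuidString:) returns nil for anything that is not a valid UUID.
        return UUID(uuidString: string)?.uuidString
    }
}

print("3304193705b2464a98ad3910cbe0d09e".hyphenatedUUID ?? "invalid")
print("not-a-uuid".hyphenatedUUID ?? "invalid") // invalid
```

Each index is recomputed from startIndex after every insertion, which avoids reusing an index obtained before the string was mutated.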