Strange error while writing parquet file to s3 - scala

While trying to write a DataFrame to S3 I am getting the NullPointerException below. Sometimes the job goes through fine and sometimes it fails.
I am using EMR 5.20 and Spark 2.4.0.
Spark session creation:
val spark = SparkSession.builder
  .config("spark.sql.parquet.binaryAsString", "true")
  .config("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
  .config("spark.sql.parquet.filterPushdown", "true")
  .config("spark.sql.parquet.fs.optimized.committer.optimization-enabled", "true")
  .getOrCreate()

spark.sql("myQuery")
  .write
  .partitionBy("partitionColumn")
  .mode(SaveMode.Overwrite)
  .option("inferSchema", "false")
  .parquet("s3a://...filePath")
Can anyone help resolve this mystery? Thanks in advance.
java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.s3.lite.S3Errors.isHttp200WithErrorCode(S3Errors.java:57)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:100)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.deleteObjects(AmazonS3LiteClient.java:127)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.deleteAll(Jets3tNativeFileSystemStore.java:364)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.doSingleThreadedBatchDelete(S3NativeFileSystem.java:1372)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.delete(S3NativeFileSystem.java:663)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.delete(EmrFileSystem.java:332)
at org.apache.spark.internal.io.FileCommitProtocol.deleteWithJob(FileCommitProtocol.scala:124)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions(InsertIntoHadoopFsRelationCommand.scala:223)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:122)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:557)
... 55 elided

You're using SaveMode.Overwrite and the error line com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.deleteObjects(AmazonS3LiteClient.java:127) indicates a problem during the deletion operation of the overwrite.
I would check and make sure the S3 permissions in the IAM policy for your EMR EC2 instance profile allow the s3:DeleteObject action for the file path in your call to write Parquet. It should look something like this:
{
  "Sid": "AllowWriteAccess",
  "Action": [
    "s3:DeleteObject",
    "s3:Get*",
    "s3:List*",
    "s3:PutObject"
  ],
  "Effect": "Allow",
  "Resource": [
    "<arn_for_your_filepath>/*"
  ]
}
In between jobs, do you use different file paths in your call to write Parquet? If so, that would explain the intermittent job failures.
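One way to sanity-check that permission from the cluster itself is to create and then delete a throwaway object under the same prefix, using the Hadoop FileSystem API from the driver. A minimal sketch, with a hypothetical bucket and prefix standing in for the real output path:
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = spark.sparkContext.hadoopConfiguration
// Hypothetical bucket/prefix - substitute the real output path
val probe = new Path("s3a://my-bucket/my/prefix/_permission_probe")
val fs = FileSystem.get(new URI("s3a://my-bucket"), hadoopConf)
fs.create(probe).close()   // needs s3:PutObject
fs.delete(probe, false)    // needs s3:DeleteObject, which SaveMode.Overwrite relies on
If the delete call fails here, the IAM policy (rather than Spark) is the thing to fix.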

Looks like a bug in the AWS code. That is closed source - you have to take it up with them.
I do see a hint that this is an error in the code trying to parse error responses. Maybe something has failed, but the client code that parses that error response is buggy. That isn't unusual - it's the failure handling that rarely gets enough test coverage.

Related

Error using "append" mode with Pyspark saveAsTable method

I'm able to write my DataFrame as a Hive table this way:
mydf.write.format("parquet").saveAsTable("mydb.mytable")
But when I try to append the same data to the same table using "append" mode like this:
mydf.write.mode("append").format("parquet").saveAsTable("mydb.mytable")
I get an error:
py4j.protocol.Py4JJavaError: An error occurred while calling o106.saveAsTable.
: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike$class.head(IterableLike.scala:107)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:466)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:463)
at scala.Option.map(Option.scala:146)
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:463)
at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:516)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:216)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:166)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
No idea why I'm getting this. Please help
Thanks
It's worth making sure the table has been created before appending new records, with a simple SQL query:
sqlContext.sql("SELECT * FROM mydb.mytable")
In addition, there is no need to set the file format, as it was already defined when the table was created. So if you update your code to the following line it should work smoothly.
mydf.write.mode("append").saveAsTable("mydb.mytable")
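Putting the two suggestions together, a minimal Scala sketch of the check-then-append flow; the table name mydb.mytable comes from the question, the helper itself is only illustrative, and it assumes spark.catalog.tableExists is available in your Spark version:
import org.apache.spark.sql.{DataFrame, SparkSession}

def writeReport(spark: SparkSession, mydf: DataFrame): Unit = {
  if (!spark.catalog.tableExists("mydb.mytable")) {
    // First run: create the table and fix the format here
    mydf.write.format("parquet").saveAsTable("mydb.mytable")
  } else {
    // Later runs: append without repeating the format option
    mydf.write.mode("append").saveAsTable("mydb.mytable")
  }
}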
Update:
I tested the same scenario without getting an error message. These are the steps I took:
[Code and result screenshot]
You can access the Google Colab notebook I created for this test.
I hope this helps.
I had the same problem and I solved it by setting the "header" option to "true".
Example:
df.write.format("orc").option("header", "true").mode("append").insertInto("db.table")
Hope it helps someone in the future.

How to store a spark DataFrame as CSV into Azure Blob Storage

I'm trying to store a Spark DataFrame as a CSV on Azure Blob Storage from a local Spark cluster.
First, I set the config with the Azure account/account key (I'm not sure which is the proper config, so I've set all of these):
sparkContext.getConf.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accountKey)
sparkContext.hadoopConfiguration.set(s"fs.azure.account.key.${account}.dfs.core.windows.net", accountKey)
sparkContext.hadoopConfiguration.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accountKey)
Then I try to store the CSV with the following:
val filePath = s"wasbs://${container}@${account}.blob.core.windows.net/${prefix}/${filename}"
dataFrame.coalesce(1)
  .write.format("csv")
  .options(Map(
    "header" -> (if (hasHeader) "true" else "false"),
    "sep" -> delimiter,
    "quote" -> quote
  ))
  .save(filePath)
But then this fails with "Job aborted" and the following stack trace:
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:196)
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
But when I look in the blob container, I can see my file; however, I cannot read it back into a Spark DataFrame. I get the error "Unable to infer schema for CSV. It must be specified manually." and the following stack trace:
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
scala.Option.getOrElse(Option.scala:121)
org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
It seems that the problem was already reported on the Databricks forum.
What is the proper way to store a DataFrame on Azure Blob?
It turns out that well before the job fails there was an internal error:
Caused by: java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(Ljava/net/URI;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/blob/BlobRequestOptions;Lcom/microsoft/azure/storage/OperationContext;)Ljava/lang/String;
at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2372)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.restoreKey(NativeAzureFileSystem.java:918)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:819)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at com.univocity.parsers.common.AbstractWriter.close(AbstractWriter.java:876)
... 18 more
What's happening is that after creating a temp file with the actual data, it tries to move the file to the location given by the user using CloudBlob.startCopyFromBlob. As usual, Microsoft broke this by renaming the method to CloudBlob.startCopy.
I'm using "org.apache.hadoop" % "hadoop-azure" % "3.2.1", which is the most recent release of "hadoop-azure", and it seems to have stuck with the older startCopyFromBlob, so I need to use an old azure-storage version that still has this method, probably 2.x.x.
See https://github.com/Azure/azure-storage-java/issues/113
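A hedged build.sbt sketch of that pinning; the azure-storage version below (2.2.0) is an assumption, not something verified against hadoop-azure 3.2.1:
// Keep hadoop-azure, pin azure-storage to a 2.x release that still exposes
// CloudBlob.startCopyFromBlob (the exact version is an assumption)
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-azure" % "3.2.1",
  "com.microsoft.azure" % "azure-storage" % "2.2.0"
)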

IOException while overwriting parquet

I have a parquet file, say abc/A.parquet. A few records are filtered out based on a certain condition to create a DataFrame, and I am trying to overwrite the file with the resulting filtered DataFrame using SaveMode.Overwrite, but it throws the exception below.
Command used to overwrite:
filterDF.coalesce(1).write.mode("overwrite").parquet("file:/home/psub2/cls_parquet2/file:/home/psub7/abc/A.parquet")
failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File file:/home/psub7/abc/A.parquet does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
Please help, thanks in advance.
Conceptually you can't read and write a DataFrame from and to the same file. The IOException is thrown when you read a DataFrame from file A and then try to write that same DataFrame back into file A. You can overwrite parquet file A only if you didn't read the DataFrame from file A.
For example, you can read a DataFrame from file A and overwrite file B, as in the sketch below.
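A minimal sketch of that workaround; the filter condition and the destination path B are illustrative assumptions:
import org.apache.spark.sql.functions.col

val df = spark.read.parquet("file:/home/psub7/abc/A.parquet")
val filterDF = df.filter(col("status") === "active")   // hypothetical condition
filterDF.coalesce(1)
  .write
  .mode("overwrite")
  .parquet("file:/home/psub7/abc/B.parquet")            // write to a different path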

Intellij "Evaluate Expression" results in NoSuchMethodException

I am trying to inspect the contents of a moderately large Scala map of objects in IntelliJ while debugging an application. I enter the following in the "Evaluate" dialog: prices.get.keys.toList.filter(k => k.startsWith("GC")), where prices is a Future[Map[String, SomeObject]]. I have also tried it directly on the Iterable, without the toList conversion, with the same results.
I get the following exception:
Error during generated code invocation:
com.intellij.debugger.engine.evaluation.EvaluateException: Method threw 'java.lang.NoSuchMethodError' exception.
The top of the stack trace is simply the line on which I have the breakpoint.
Anyone else run into this? If so, is there a workaround?
The stack trace is as follows:
c.p.p.e.u.ClassUnderTest$$anonfun$4$GeneratedEvaluatorClass$10$1.invoke(FileToCompile1993.scala:85)
c.p.p.e.u.ClassUnderTest$$anonfun$4.apply(ClassUnderTest.scala:81)
c.p.p.e.u.ClassUnderTest$$anonfun$4.apply(ClassUnderTest.scala:76)
scala.collection.immutable.HashSet$HashSet1.filter0(HashSet.scala:313)
scala.collection.immutable.HashSet$HashTrieSet.filter0(HashSet.scala:929)
scala.collection.immutable.HashSet$HashTrieSet.filter0(HashSet.scala:929)
scala.collection.immutable.HashSet.filter(HashSet.scala:167)
scala.collection.immutable.HashSet.filter(HashSet.scala:35)
c.p.p.e.u.ClassTest.get(ClassTest.scala:76)
c.p.p.e.u.ClassTest.$$anonfun$1.apply$mcV$sp(ClassTest.scala:35)
c.p.p.e.u.ClassTest$$anonfun$1.apply(ClassTest.scala:22)
c.p.p.e.u.ClassTest$$anonfun$1.apply(ClassTest.scala:22)
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
org.scalatest.Transformer.apply(Transformer.scala:22)
org.scalatest.Transformer.apply(Transformer.scala:20)
org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
org.scalatest.Suite$class.withFixture(Suite.scala:1122)
org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
org.scalatest.FlatSpec.runTest(FlatSpec.scala:1683)
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
scala.collection.immutable.List.foreach(List.scala:381)
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
scala.collection.immutable.List.foreach(List.scala:381)
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
org.scalatest.Suite$class.run(Suite.scala:1424)
org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1683)
org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
org.scalatest.SuperEngine.runImpl(Engine.scala:545)
org.scalatest.FlatSpecLike$class.run(FlatSpecLike.scala:1760)
org.scalatest.FlatSpec.run(FlatSpec.scala:1683)
org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
scala.collection.immutable.List.foreach(List.scala:381)
org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
org.scalatest.tools.Runner$.run(Runner.scala:883)
org.scalatest.tools.Runner.run(Runner.scala)
org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:67)
I have updated the Scala plugin to the latest release 2018.2.9 and am on the latest release of IntelliJ IDEA (from the About dialog):
IntelliJ IDEA 2018.2 (Ultimate Edition)
Build #IU-182.3684.101, built on July 24, 2018
The Scala debugger has some issues with the evaluation of complex expressions (those containing lambdas, anonymous classes, class declarations, etc.). One of them, https://youtrack.jetbrains.com/issue/SCL-14194, looks like the cause of your problem, because the debugger is stopped in a lambda (c.p.p.e.u.ClassUnderTest$$anonfun$4.apply(ClassUnderTest.scala:81)) and the expression itself contains a lambda.
The reason is that IDEA compiles the expression before evaluation and does not always capture the context properly. As a workaround, you can try to evaluate the expression from other debugger stops, for example from the body of a regular method.
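For instance, a hedged sketch of that workaround: move the computation into an ordinary method on the class under test and set the breakpoint inside it (the gcKeys helper and the SomeObject type are illustrative, echoing the question):
def gcKeys(prices: Map[String, SomeObject]): List[String] = {
  val keys = prices.keys.toList          // stop the debugger on this line
  keys.filter(k => k.startsWith("GC"))   // then evaluate simpler expressions here
}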

Elasticsearch Scripts to add element into an array

I am working with Elasticsearch in a Scala project, using elastic4s as the client. I am trying to add elements to a document from an iterator, one by one.
while (iterator.hasNext) {
  counter += 1
  client.execute {
    update id reportID in "reports/report" script "ctx._source.elasticData += output" params Map("output" -> iterator.next().toStringifiedJson)
  }.await
}
The above code does not work, yielding the following error:
[ERROR] [03/06/2015 14:44:23.515] [SparkActorSystem-akka.actor.default-dispatcher-5] [akka://SparkActorSystem/user/spark-actor] failed to execute script
org.elasticsearch.ElasticsearchIllegalArgumentException: failed to execute script
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:189)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:176)
at org.elasticsearch.action.update.TransportUpdateAction.shardOperation(TransportUpdateAction.java:170)
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$1.run(TransportInstanceSingleOperationAction.java:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: script_lang not supported [groovy]
at org.elasticsearch.script.ScriptService.dynamicScriptEnabled(ScriptService.java:521)
at org.elasticsearch.script.ScriptService.verifyDynamicScripting(ScriptService.java:398)
at org.elasticsearch.script.ScriptService.compile(ScriptService.java:363)
at org.elasticsearch.script.ScriptService.executable(ScriptService.java:503)
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:183)
... 6 more
(The same error repeats in the log.)
The problem is with the script, I assume, but I could not find any solution. Please help...
Does adding the Groovy dependency solve the problem? Please see this gist.
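If that is the route you try, a hedged build.sbt sketch; the groovy-all version (2.4.3) is an assumption, so match it to whatever your Elasticsearch release expects:
// Put the Groovy runtime on the classpath so the "groovy" script_lang can be compiled
libraryDependencies += "org.codehaus.groovy" % "groovy-all" % "2.4.3"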