Error using "append" mode with Pyspark saveAsTable method - pyspark

I'm able to write my dataframe as a Hive table this way:
mydf.write.format("parquet").saveAsTable("mydb.mytable")
But when I try to append the same data to the same table using "append" mode, like this:
mydf.write.mode("append").format("parquet").saveAsTable("mydb.mytable")
I get an error:
py4j.protocol.Py4JJavaError: An error occurred while calling o106.saveAsTable.
: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike$class.head(IterableLike.scala:107)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:48)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:466)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:463)
at scala.Option.map(Option.scala:146)
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:463)
at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:516)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:216)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:166)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
No idea why I'm getting this. Please help. Thanks.

It's worth making sure the table has been created before appending new records, with a simple SQL query:
sqlContext.sql("SELECT * FROM mydb.mytable")
In addition, there is no need to set the file format, as it has already been defined for the table. So if you update your code to the following line, it should work smoothly:
mydf.write.mode("append").saveAsTable("mydb.mytable")
Update:
I tested the same scenario without getting an error message. These are the steps I took:
[Code and result screenshot]
You can access the Google Colab notebook I created for this test.
I hope this helps.

I had the same problem and I solved it by setting the "header" option to "true".
Example:
df.write.format("orc").option("header", "true").mode("append").insertInto("db.table")
Hope it helps someone in the future

Related

How to store a Spark DataFrame as CSV in Azure Blob Storage

I'm trying to store a Spark DataFrame as a CSV on Azure Blob Storage from a local Spark cluster.
First, I set the config with the Azure account/account key (I'm not sure which is the proper config, so I've set all of these):
sparkContext.getConf.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accountKey)
sparkContext.hadoopConfiguration.set(s"fs.azure.account.key.${account}.dfs.core.windows.net", accountKey)
sparkContext.hadoopConfiguration.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accountKey)
Then I try to store the CSV with the following
filePath = s"wasbs://${container}#${account}.blob.core.windows.net/${prefix}/${filename}"
dataFrame.coalesce(1)
.write.format("csv")
.options(Map(
"header" -> (if (hasHeader) "true" else "false"),
"sep" -> delimiter,
"quote" -> quote
))
.save(filePath)
But then this fails with "Job aborted" and the following stack trace:
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:196)
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
But when I look in the blob container, I can see my file. However, I cannot read it back into a Spark DataFrame; I get the error "Unable to infer schema for CSV. It must be specified manually." and the following stack trace:
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
scala.Option.getOrElse(Option.scala:121)
org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
It seems the problem was already reported on the Databricks forum.
What is the proper way to store a DataFrame on Azure Blob?
It turns out that well before the job fails, there is an internal error:
Caused by: java.lang.NoSuchMethodError: com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(Ljava/net/URI;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/AccessCondition;Lcom/microsoft/azure/storage/blob/BlobRequestOptions;Lcom/microsoft/azure/storage/OperationContext;)Ljava/lang/String;
at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2372)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.restoreKey(NativeAzureFileSystem.java:918)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsOutputStream.close(NativeAzureFileSystem.java:819)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at com.univocity.parsers.common.AbstractWriter.close(AbstractWriter.java:876)
... 18 more
What's happening is that after creating a temp file with the actual data, it tries to move the file to the location given by the user using CloudBlob.startCopyFromBlob. As always, Microsoft broke this by renaming the method to CloudBlob.startCopy.
I'm using "org.apache.hadoop" % "hadoop-azure" % "3.2.1", which is the most recent release of hadoop-azure, and it seems to have stuck with the older startCopyFromBlob, so
I need to use an old azure-storage version that has this method, probably 2.x.x.
See https://github.com/Azure/azure-storage-java/issues/113
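A build.sbt sketch of the pin described above; the azure-storage version here is a guess and would need to be verified against the linked issue:

// Force an azure-storage release that still ships CloudBlob.startCopyFromBlob.
// 2.2.0 is an assumption, not a verified compatible version.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-azure" % "3.2.1",
  "com.microsoft.azure" % "azure-storage" % "2.2.0"
)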

Strange error while writing Parquet file to S3

While trying to write a dataframe to S3 I am getting the below error with a NullPointerException. Sometimes the job goes through fine, and sometimes it fails.
I am using EMR 5.20 and Spark 2.4.0.
Spark session creation:
val spark = SparkSession.builder
.config("spark.sql.parquet.binaryAsString", "true")
.config("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
.config("spark.sql.parquet.filterPushdown", "true")
.config("spark.sql.parquet.fs.optimized.committer.optimization-enabled","true")
.getOrCreate()
spark.sql("myQuery").write.partitionBy("partitionColumn").mode(SaveMode.Overwrite).option("inferSchema","false").parquet("s3a://...filePath")
Can anyone help resolve this mystery? Thanks in advance.
java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.s3.lite.S3Errors.isHttp200WithErrorCode(S3Errors.java:57)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:100)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.deleteObjects(AmazonS3LiteClient.java:127)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.deleteAll(Jets3tNativeFileSystemStore.java:364)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.doSingleThreadedBatchDelete(S3NativeFileSystem.java:1372)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.delete(S3NativeFileSystem.java:663)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.delete(EmrFileSystem.java:332)
at org.apache.spark.internal.io.FileCommitProtocol.deleteWithJob(FileCommitProtocol.scala:124)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions(InsertIntoHadoopFsRelationCommand.scala:223)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:122)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:557)
... 55 elided
You're using SaveMode.Overwrite, and the error line com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.deleteObjects(AmazonS3LiteClient.java:127) indicates a problem during the delete phase of the overwrite.
I would check that the S3 permissions in the IAM policy for your EMR EC2 instance profile allow the s3:DeleteObject action on the file path in your Parquet write. It should look something like this:
{
  "Sid": "AllowWriteAccess",
  "Action": [
    "s3:DeleteObject",
    "s3:Get*",
    "s3:List*",
    "s3:PutObject"
  ],
  "Effect": "Allow",
  "Resource": [
    "<arn_for_your_filepath>/*"
  ]
}
In between jobs, do you use different file paths in your call to write Parquet? If so, that would explain the intermittent job failures.
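Separately, if the overwrite only needs to replace the partitions being rewritten, one option on Spark 2.3+ (so it applies to the 2.4.0 setup here) is dynamic partition overwrite, which narrows the delete to the affected partitions instead of the whole path. It is not a substitute for the s3:DeleteObject permission, just a way to shrink what gets deleted. A sketch:

import org.apache.spark.sql.SaveMode

// Overwrite only the partitions present in the incoming data.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.sql("myQuery")
  .write
  .partitionBy("partitionColumn")
  .mode(SaveMode.Overwrite)
  .parquet("s3a://...filePath")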
Looks like a bug in the AWS code. It is closed source, so you have to take it up with them.
I do see a hint that this is an error in the code that tries to parse error responses: maybe something has failed, but the client code that parses that error response is itself buggy. That's not unusual; it's the failure-handling paths that rarely get enough test coverage.

Intellij "Evaluate Expression" results in NoSuchMethodException

I am trying to inspect the contents of a moderately large Scala map of objects in IntelliJ while debugging an application. I enter the following in the "Evaluate" dialog: prices.get.keys.toList.filter(k => k.startsWith("GC")), where prices is a Future[Map[String, SomeObject]]. I have also tried filtering directly on the Iterable, without the toList conversion, with the same results.
I get the following exception:
Error during generated code invocation:
com.intellij.debugger.engine.evaluation.EvaluateException: Method threw 'java.lang.NoSuchMethodError' exception. The top of the stack trace is simply the line on which I have the breakpoint.
Anyone else run into this? If so, is there a workaround?
The stack trace is as follows:
c.p.p.e.u.ClassUnderTest$$anonfun$4$GeneratedEvaluatorClass$10$1.invoke(FileToCompile1993.scala:85)
c.p.p.e.u.ClassUnderTest$$anonfun$4.apply(ClassUnderTest.scala:81)
c.p.p.e.u.ClassUnderTest$$anonfun$4.apply(ClassUnderTest.scala:76)
scala.collection.immutable.HashSet$HashSet1.filter0(HashSet.scala:313)
scala.collection.immutable.HashSet$HashTrieSet.filter0(HashSet.scala:929)
scala.collection.immutable.HashSet$HashTrieSet.filter0(HashSet.scala:929)
scala.collection.immutable.HashSet.filter(HashSet.scala:167)
scala.collection.immutable.HashSet.filter(HashSet.scala:35)
c.p.p.e.u.ClassTest.get(ClassTest.scala:76)
c.p.p.e.u.ClassTest.$$anonfun$1.apply$mcV$sp(ClassTest.scala:35)
c.p.p.e.u.ClassTest$$anonfun$1.apply(ClassTest.scala:22)
c.p.p.e.u.ClassTest$$anonfun$1.apply(ClassTest.scala:22)
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
org.scalatest.Transformer.apply(Transformer.scala:22)
org.scalatest.Transformer.apply(Transformer.scala:20)
org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
org.scalatest.Suite$class.withFixture(Suite.scala:1122)
org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
org.scalatest.FlatSpec.runTest(FlatSpec.scala:1683)
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
scala.collection.immutable.List.foreach(List.scala:381)
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
scala.collection.immutable.List.foreach(List.scala:381)
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
org.scalatest.Suite$class.run(Suite.scala:1424)
org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1683)
org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
org.scalatest.SuperEngine.runImpl(Engine.scala:545)
org.scalatest.FlatSpecLike$class.run(FlatSpecLike.scala:1760)
org.scalatest.FlatSpec.run(FlatSpec.scala:1683)
org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
scala.collection.immutable.List.foreach(List.scala:381)
org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
org.scalatest.tools.Runner$.run(Runner.scala:883)
org.scalatest.tools.Runner.run(Runner.scala)
org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:67)
I have updated the Scala plugin to the latest release 2018.2.9 and am on the latest release of IntelliJ IDEA (from the About dialog):
IntelliJ IDEA 2018.2 (Ultimate Edition)
Build #IU-182.3684.101, built on July 24, 2018
The Scala debugger has some issues with the evaluation of complex expressions (with lambdas, anonymous classes, class declarations, etc.). One of them, https://youtrack.jetbrains.com/issue/SCL-14194, looks like the cause of your problem, because the debugger stopped inside a lambda (c.p.p.e.u.ClassUnderTest$$anonfun$4.apply(ClassUnderTest.scala:81)) and the expression itself contains a lambda.
The reason is that IDEA compiles the expression before evaluating it, and it does not always capture the context properly. As a workaround, you can try to evaluate the expression from other debugger stops, for example from the body of an ordinary method.
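A sketch of that workaround: pull the lambda-bearing expression into an ordinary method and set the breakpoint there, so the debugger evaluates from a plain stack frame rather than from a generated anonymous-function class (gcKeys is a hypothetical helper, not part of the original code):

// Break on the filter line, then evaluate sub-expressions from that frame.
def gcKeys[V](prices: Map[String, V]): List[String] = {
  val keys = prices.keys.toList
  keys.filter(k => k.startsWith("GC"))  // <- breakpoint here
}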

Problems with classloading when running JUnit plug-in tests in Eclipse

I have a problem when running JUnit plug-in tests in Eclipse 4.3 and 4.2, and possibly other versions. This worked in version 3.6, so I do not know what has changed.
The tests are written using JUnit 4 with parameterized tests (though I have tested with normal tests and the problem is the same). The test cases are written in YAML format and inflated at runtime using SnakeYAML.
When running the tests as normal JUnit tests everything works fine, but when running them as JUnit plug-in tests they fail, since SnakeYAML can no longer inflate the objects from YAML.
This is the code that reads test cases from YAML:
// imports needed by this snippet
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;
import org.yaml.snakeyaml.TypeDescription;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.Constructor;
import static org.junit.Assert.assertEquals;

public static ArrayList<StartsWithCommentTestCase> readTests(String filename) {
    // set up how Java types match to the YAML file
    Constructor testCaseConstructor = new Constructor(StartsWithCommentTestCase.class);
    TypeDescription testCaseDesc = new TypeDescription(StartsWithCommentTestCase.class);
    testCaseConstructor.addTypeDescription(testCaseDesc);
    Yaml yamlParser = new Yaml(testCaseConstructor);

    // read all the documents in the test file
    String tests = TestUtils.readFile(filename);
    ArrayList<StartsWithCommentTestCase> testCases = new ArrayList<StartsWithCommentTestCase>();
    for (Object testCase : yamlParser.loadAll(new StringReader(tests))) {
        testCases.add((StartsWithCommentTestCase) testCase);
    }
    return testCases;
}

@Test
public void bla() {
    List<StartsWithCommentTestCase> tests = readTests(TEST_FILES_DIR + "starts-with-comment.yaml");
    for (StartsWithCommentTestCase test : tests) {
        boolean actual = ToggleCommentHandler.startsWithComment(test.getLine());
        assertEquals(test.getName(), test.getExpected(), actual);
    }
}
And this is the stacktrace that is received when running as a JUnit Plug-in test
Can't construct a java object for tag:yaml.org,2002:pde.test.tests.StartsWithCommentTestCase;exception=Class not found: pde.test.tests.StartsWithCommentTestCase
in "<reader>", line 4, column 1:
name: Empty line
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:325)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:181)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:140)
at org.yaml.snakeyaml.constructor.BaseConstructor.getData(BaseConstructor.java:109)
at org.yaml.snakeyaml.Yaml$1.next(Yaml.java:317)
at pde.test.tests.StartsWithCommentTest.readTests(StartsWithCommentTest.java:86)
at pde.test.tests.StartsWithCommentTest.bla(StartsWithCommentTest.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.pde.internal.junit.runtime.RemotePluginTestRunner.main(RemotePluginTestRunner.java:62)
at org.eclipse.pde.internal.junit.runtime.CoreTestApplication.run(CoreTestApplication.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethodWithException(EclipseAppContainer.java:587)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:198)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:354)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:636)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:591)
at org.eclipse.equinox.launcher.Main.run(Main.java:1450)
at org.eclipse.equinox.launcher.Main.main(Main.java:1426)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: pde.test.tests.StartsWithCommentTestCase
at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:625)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:313)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:323)
... 49 more
The strange thing (for me at least) is that I can create a StartsWithCommentTestCase object with
new StartsWithCommentTestCase()
without any problem, also when running as a plug-in test. In other words, the class is available at runtime, but for some reason not available to SnakeYAML.
Any pointers on this would be very helpful :)
Edit 1
The first lines of the YAML file:
#all the lines must have newlines in them since newlines are returned
#by Document.get()
---
name: Empty line
line:
expected: false
---
name: "Line where first character is #"
line: "#
"
expected: true
Edit 2
Added more code to the sample.
It looks like the JUnit plug-in doesn't know how to read the SnakeYAML code. Are you sure the class loader of SnakeYAML is correctly initialized when you run the tests through the plug-in?
And how does the mentioned line 4 look in your YAML code? Is it empty?
Maybe the JUnit plug-in uses a different jar dependency than your ordinary setup. Which Maven binary/path does the JUnit plug-in use? Is it the same as your ordinary one? If you have different m2 configurations, or even different repos, there will be strange issues.
I found out that this is because I had moved the SnakeYAML dependency to another plug-in. SnakeYAML was then part of a plug-in with a different class loader than the test plug-in.
The solution was to add the following to the META-INF/MANIFEST.MF file in the dependency plug-in:
Eclipse-BuddyPolicy: registered
And the following to the META-INF/MANIFEST.MF file in the test plug-in:
Eclipse-RegisterBuddy: <symbolic name of the dependency plugin>
Now on to see if I can move the SnakeYAML dependency back into a lib/ folder or similar.
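An alternative worth sketching, which avoids buddy policies altogether, is to hand SnakeYAML the test plug-in's class loader explicitly via its CustomClassLoaderConstructor (shown here in Scala for brevity; the Java call is the same shape):

import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.CustomClassLoaderConstructor

// Resolve YAML tags with the class loader that can actually see the test
// classes, instead of the loader of the bundle that contains SnakeYAML.
val constructor = new CustomClassLoaderConstructor(
  classOf[StartsWithCommentTestCase].getClassLoader)
val yamlParser = new Yaml(constructor)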

Formatting a simple string, but `java.lang.NoSuchMethodError`

I'm using Scala 2.9.2.
If I run the Scala REPL and test a simple piece of code, it works fine:
...
val title = "Hashing file (%s)..." format sizeToStr(file.length)
But I can't understand what's going on: when I put that code into a simple app, it compiles OK, yet at runtime it throws this:
java.lang.NoSuchMethodError: scala.Predef$.augmentString(Ljava/lang/String;)Lscala/collection/immutable/StringOps;
at group.pals.penguin.app.shasher.Shasher$.calcHash(shasher.scala:119)
at group.pals.penguin.app.shasher.Shasher$.main(shasher.scala:76)
at group.pals.penguin.app.shasher.Shasher.main(shasher.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.JarRunner$.run(MainGenericRunner.scala:16)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.JarRunner$.runAndCatch(MainGenericRunner.scala:16)
at scala.tools.nsc.JarRunner$.runJar(MainGenericRunner.scala:28)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:78)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
Could you please help me out? Thanks.
It seems like you're compiling against a different version of the standard library than you're using at runtime.
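A quick diagnostic sketch (not from the original post) to confirm which scala-library is actually loaded at runtime:

// Print the version and the jar location of the runtime scala-library.
// (getCodeSource can be null for bootstrap-loaded classes.)
println(scala.util.Properties.versionString)
println(classOf[scala.collection.immutable.StringOps]
  .getProtectionDomain.getCodeSource.getLocation)

In sbt, pinning scalaVersion := "2.9.2" in the build keeps the compile-time and runtime libraries in sync.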