Scala. Eclipse. SBT crashes on pattern matching and says nothing. SBT bug? - scala

Sorry for the long message; I hope you'll read it.
I've started my coursework in Scala: an FTP server that uses pattern matching to recognize commands. I write my code in Eclipse. I think SBT has a bug, but I don't know where to report it.
Description:
Compilation takes a long time. After compiling, the IDE shows no errors, but the following line appears in the "Problems" panel:
The SBT builder crashed while compiling your project. This is a bug in
the Scala compiler or SBT. Check the Error Log for details. The error
message is: ch.epfl.lamp.fjbg.JCode$OffsetTooBigException: offset too
big to fit in 16 bits: 38838 FTPDaemon Unknown Scala Problem
But the program still starts and works. Then the following error appears at run time:
Exception in thread "main" java.lang.ClassFormatError: Truncated class file
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:787)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:447)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.egslava.ftp.ControlConnection.<init>(ControlConnection.scala:14)
at org.egslava.ftp.Main$.main(Main.scala:38)
at org.egslava.ftp.Main.main(Main.scala)
This error appears when I call:
new ControlConnection().start();
ControlConnection uses a variable currentState (a reference to the abstract class FtpState). The current state may be an instance of the DoLogin class or of the WaitForCommandsState class.
WaitForCommandsState contains the following block of code:
message match {
  case owner.Noop()    => "200 NOOP ok\r\n"
  case owner.User(_)   => "530 Can't change from guest user\r\n"
  case owner.Pass(_)   => "230 Already logged in\r\n"
  case owner.Pasv()    => pasv() + "\r\n"
  case owner.List()    => list() + "\r\n"
  case "condition"     => "error"
  case owner.Nlst()    => nlst() + "\r\n"
  case owner.TypeCMD() => "" // "200 Switching to binary mode\r\n"
  case "PWD"           => "257 \"" + currentDirectory + "\"\r\n"
  case "SITE HELP"     => "200-\r\n200\r\n"
  case owner.Cwd(path) => "250 Directory successfully changed\r\n"
  case "condition2"    => "error2"
  case unrecognizedCommand => "500 Unrecognized command " + unrecognizedCommand + "\r\n"
}
If I comment out either of these lines:
case "condition" => "error"
or
case "condition2" => "error2"
the error disappears.
What is going on?

The algorithm that compiles pattern matching fails on overly large match expressions: it produces more bytecode than the JVM allows in a single method, which is what the 16-bit offset in the exception refers to. As I understand it, this is to be fixed in Scala 2.10.
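Until then, a common workaround is to split the big match across several helper methods so that no single compiled method exceeds the limit. A minimal sketch reusing names from the question (the grouping into "session" and "transfer" commands is purely illustrative):

// Split one huge match into smaller methods so each compiled
// method body stays under the JVM's bytecode limit.
def handle(message: String): String =
  handleSessionCommand(message).getOrElse(handleTransferCommand(message))

def handleSessionCommand(message: String): Option[String] = message match {
  case owner.Noop()  => Some("200 NOOP ok\r\n")
  case owner.User(_) => Some("530 Can't change from guest user\r\n")
  case owner.Pass(_) => Some("230 Already logged in\r\n")
  case _             => None
}

def handleTransferCommand(message: String): String = message match {
  case owner.Pasv()        => pasv() + "\r\n"
  case owner.List()        => list() + "\r\n"
  case unrecognizedCommand => "500 Unrecognized command " + unrecognizedCommand + "\r\n"
}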

Related

spark: SAXParseException while writing to parquet on s3

I'm trying to read in some json, infer a schema, and write it out again as parquet to s3 (s3a). For some reason, about a third of the way through the writing portion of the run, Spark always errors out with the error included below. I can't find any obvious reason for the issue: it isn't out of memory; there are no long GC pauses; and there don't seem to be any additional error messages in the logs of the individual executors.
The script runs fine on another set of data that I have, which is of a very similar structure, but several orders of magnitude smaller.
I am running Spark 2.0.1 with Hadoop 2.7 and am using the FileOutputCommitter. The algorithm version doesn't seem to matter.
Edit:
This does not appear to be a problem with badly formed json or corrupted files. I have unzipped and read in each file individually without error.
Here's a simplified version of the script:
object Foo {
  def parseJson(json: String): Option[Map[String, Any]] = {
    if (json == null)
      Some(Map())
    else
      parseOpt(json).map((j: JValue) => j.values.asInstanceOf[Map[String, Any]])
  }
}
// read in as text and parse json using json4s
val jsonRDD: RDD[String] = sc.textFile(inputPath)
  .map(row => Foo.parseJson(row))
// infer a schema that will encapsulate the most rows in a sample of size sampleRowNum
val schema: StructType = Infer.getMostCommonSchema(sc, jsonRDD, sampleRowNum)
// check each document's compatibility with the schema
val jsonWithCompatibilityRDD: RDD[(String, Boolean)] = jsonRDD
.map(js => (js, Infer.getSchemaCompatibility(schema, Infer.inferSchema(js)).toBoolean))
.repartition(partitions)
val jsonCompatibleRDD: RDD[String] = jsonWithCompatibilityRDD
.filter { case (js: String, compatible: Boolean) => compatible }
.map { case (js: String, _: Boolean) => js }
// create a dataframe from documents with compatible schema
val dataFrame: DataFrame = spark.read.schema(schema).json(jsonCompatibleRDD)
It completes the earlier schema-inferring steps successfully. The error itself occurs on the last line, but I suppose that could encompass at least the immediately preceding statement, if not earlier:
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1345)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
... 8 more
Suppressed: java.lang.NullPointerException
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:147)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetFileFormat.scala:569)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$abortTask$1(WriterContainer.scala:282)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$2.apply$mcV$sp(WriterContainer.scala:258)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1354)
... 9 more
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler). Response Code: 200, Response Text: OK
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3480)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:604)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:962)
at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:1147)
at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:1136)
at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:142)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:400)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:117)
at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetFileFormat.scala:569)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:267)
... 13 more
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:150)
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:279)
at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:75)
at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:72)
at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:712)
... 29 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2; XML document structures must start and end within the same entity.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.skipChar(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:141)
... 35 more
Here's my conf:
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError
spark.executor.memory 16G
spark.executor.uri https://s3.amazonaws.com/foo/spark-2.0.1-bin-hadoop2.7.tgz
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.buffer.dir /raid0/spark
spark.hadoop.fs.s3n.buffer.dir /raid0/spark
spark.hadoop.fs.s3a.connection.timeout 500000
spark.hadoop.fs.s3n.multipart.uploads.enabled true
spark.hadoop.parquet.block.size 2147483648
spark.hadoop.parquet.enable.summary-metadata false
spark.jars.packages com.databricks:spark-avro_2.11:3.0.1
spark.local.dir /raid0/spark
spark.mesos.coarse false
spark.mesos.constraints priority:1
spark.network.timeout 600
spark.rpc.message.maxSize 500
spark.speculation false
spark.sql.parquet.mergeSchema false
spark.sql.planner.externalSort true
spark.submit.deployMode client
spark.task.cpus 1
I can think of three possible reasons for this problem.
1. JVM version. The AWS SDK checks for the following versions: "1.6.0_06", "1.6.0_13", "1.6.0_17", "1.6.0_65", "1.7.0_45". If you are using one of them, try upgrading.
2. Old AWS SDK. Refer to https://github.com/aws/aws-sdk-java/issues/460 for a workaround.
3. If you have lots of files in the directory where you are writing, you might be hitting https://issues.apache.org/jira/browse/HADOOP-13164. Consider increasing the timeout to larger values, for example as shown below.
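For the third case, the timeout is raised in the Spark conf; for example (the value is illustrative):

spark.hadoop.fs.s3a.connection.timeout 2000000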
A SAXParseException usually indicates a badly formatted XML file. Since the job consistently fails roughly a third of the way through, it is probably failing in the same place every time (on a file whose partition is roughly a third of the way through the partition list).
Can you paste your script? It may be possible to wrap the Spark step in a try/catch block that prints out the file when this error occurs, which would let you easily zoom in on the problem.
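A sketch of that idea (filePaths is a hypothetical list of the individual s3 object paths; each file is probed on the driver):

import scala.util.Try

// Try to fully read each input file on its own; collect the paths that fail.
val badFiles = filePaths.filter { path =>
  Try(sc.textFile(path).count()).isFailure
}
badFiles.foreach(p => println(s"Failed to read: $p"))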
From the logs:
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2; XML document structures must start and end within the same entity.
and
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
It looks like you have a corrupted or incorrectly formatted file, and the error is actually occurring during the read portion of the task. You could confirm this by trying another operation that forces a read, such as count().
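For example, with the RDD from the question:

// Forces a full read without writing anything: if the same
// AmazonClientException shows up here, the failure is on the read path.
jsonCompatibleRDD.count()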
If that is confirmed, the goal becomes finding the corrupted file. You could do this by listing the s3 files, calling sc.parallelize() on that list, and then trying to read each file in a custom function using map().
import boto3
from pyspark.sql import Row

def scanKeys(startKey, endKey):
    bucket = boto3.resource('s3').Bucket('bucketName')
    for obj in bucket.objects.filter(Prefix='prefix', Marker=startKey):
        if obj.key < endKey:
            yield obj.key
        else:
            return

def testFile(s3Path):
    s3obj = boto3.resource('s3').Object(bucket_name='bucketName', key=s3Path)
    body = s3obj.get()['Body']
    # ... logic to test the file format, or use a try/except and attempt to parse it ...
    if fileFormattedCorrectly:
        return Row(status='Good', key=s3Path)
    else:
        return Row(status='Fail', key=s3Path)

keys = list(scanKeys(startKey, endKey))
keyListRdd = sc.parallelize(keys, 1000)
keyListRdd.map(testFile).filter(lambda x: x.status == 'Fail').collect()
This will return the s3 paths of the incorrectly formatted files.
For Googlers:
If you:
have a versioned bucket
use s3a://
see ListBucketHandler and listObjects in your error message
Quick solution:
use s3:// instead of s3a://, which will use the S3 driver provided by EMR
You may see this error because older versions of s3a:// use the S3::ListObjects (v1) API instead of S3::ListObjectsV2. The v1 API returns extra info such as the owner, and it is not robust against a large number of deletion markers. Newer versions of the s3a:// driver solve this problem, but you can always use the s3:// driver instead, as in the example below.
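For example (bucket and path are placeholders):

// Write through the EMR-provided s3:// filesystem instead of s3a://.
dataFrame.write.parquet("s3://my-bucket/output/")  // was: s3a://my-bucket/output/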
Quote:
the V1 list API always returns 5000 entries (as set in fs.s3a.paging.maximum), except for the final entry. If you have versioning turned on in your bucket, deleted entries retain tombstone markers with references to their versions. These surface in the S3 side of list calls but get stripped out of the response. So, for a very large tree, S3 may end up having to keep a channel open while it skips over thousands to millions of deleted objects before it can find actual ones to return, which can time out connections.
Quote:
Introducing a new version of the ListObjects (ListObjectsV2) API that allows listing objects with a large number of delete markers.
Quote:
If there are thousands of delete markers, the list operation might time out.

akka.actor.ActorLogging does not log the stack trace of exception by logback

I am using Logback + SLF4J to do the logging for actors with the akka.actor.ActorLogging trait. However, when I call log.error("Error occur!", e), the stack trace of the exception e is not logged; only the line Error occur! WARNING arguments left: 1 is printed. I wonder why, and how I can get the stack trace into the log file. Thank you. The following is my logback.groovy configuration.
appender("FILE", RollingFileAppender) {
file = "./logs/logd.txt"
append = true
rollingPolicy(TimeBasedRollingPolicy) {
fileNamePattern = "./logs/logd.%d{yyyy-MM-dd}.log"
maxHistory = 30
}
encoder(PatternLayoutEncoder) {
pattern = "%date{ISO8601} [%thread] %-5level %logger{36} %X{sourceThread} - %msg%n"
}
}
root(DEBUG, ["FILE"])
Akka has its own separate logging, which is configured in Akka's application.conf. If you want to bridge to SLF4J/Logback, use these settings:
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
}
See: http://doc.akka.io/docs/akka/2.0/scala/logging.html
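Note that Slf4jLogger lives in the separate akka-slf4j module, so the project also needs that dependency. A sketch for SBT (the version must match your Akka version; shown here for Akka 2.0):

libraryDependencies += "com.typesafe.akka" % "akka-slf4j" % "2.0"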
As far as I can see here, the cause (a Throwable) should be the first argument of log.error:
def error(cause: Throwable, message: String)
That's why you see "WARNING arguments left": your Throwable argument was simply ignored.
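So the fix is simply to swap the two arguments:

// Throwable first, message second: now the stack trace is logged.
log.error(e, "Error occur!")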
The 'cause' exception should be the first argument to error, not the second (as correctly mentioned by JasonG in a comment on another answer).
Using the Akka log system instead of 'bare' scala-logging has some advantages around automatically added metadata and easier testing/filtering.
See also:
http://doc.akka.io/docs/akka/2.4.16/scala/logging.html
http://doc.akka.io/api/akka/2.4/akka/event/LoggingAdapter.html#error(cause:Throwable,message:String):Unit

scala nsc IMain bind() speed and memory issues

We are using tools.nsc.interpreter.IMain's bind() and interpret() methods to execute Scala scripts on a server. This is on Scala 2.9.1 and Java 7u2.
After repeatedly using the same IMain instance, the bind() method suddenly starts to take a very long time (5-6 seconds and even longer). I have tried close() and reset(), but nothing helps. The weird thing is that the sudden slowness only occurs after several uses.
Code snippet (that is executed over and over again):
main.bind("status", status)
try {
main.interpret(prepare(restriction, input))
} catch {
case e: Exception =>
status.setCode("ERR6")
status.setSummary("Error Interpreting Restriction")
status.setType(MetaFileElements.ERROR_VALUE)
status.setValue("Restriction: \"" + restriction + "\", Input: \"" + input + "\"")
}
Another issue: eventually the process crashes with this error:
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.findBootstrapClass(Native Method)
at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1061)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at scala.tools.nsc.util.Exceptional$.unwrap(Exceptional.scala:140)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$handleException$1$1.apply(IMain.scala:821)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$handleException$1$1.apply(IMain.scala:818)
at scala.tools.nsc.interpreter.IMain$$anonfun$withoutBindingLastException$2.apply(IMain.scala:228)
at scala.util.control.Exception$Catch.apply(Exception.scala:88)
at scala.tools.nsc.interpreter.IMain.withoutBindingLastException(IMain.scala:226)
at scala.tools.nsc.interpreter.IMain$Request.handleException$1(IMain.scala:818)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:838)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:471)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:468)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:525)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:544)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:545)
at com.nomura.fi.spg.kozo.meta.client.helper.RestrictionsHelper$.execute(RestrictionsHelper.scala:22)

Scala Object Serialization

I am having trouble opening multiple objects that were serialized into a single .bin file. Right now, only one object is read when I attempt to open the file. After the first object is read, the error below is displayed (and no further objects are read). My code looks like the following:
val ois = new ObjectInputStream(new BufferedInputStream(new FileInputStream(chooser.selectedFile)))
val toRead: Int = ois.readInt()
for (i <- 0 to toRead) {
  ois.readObject() match {
    case anObject: myObject =>
      aMutableBuffer += anObject
    case _ =>
  }
  ois.close()
}
The error that I am getting consists of many repetitions of the following:
Exception in thread "AWT-EventQueue-0" java.security.PrivilegedActionException:java.security.PrivilegedActionException: java.io.IOException: Stream closed
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:649)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:296)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Caused by: java.security.PrivilegedActionException: java.io.IOException: Stream closed
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
at java.awt.EventQueue$2.run(EventQueue.java:652)
at java.awt.EventQueue$2.run(EventQueue.java:650)
... 9 more
Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
at java.io.BufferedInputStream.read(BufferedInputStream.java:241)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at ABC.PhotoshopApp$$anonfun$ABC$PhotoshopApp$$fileOpenPicture$1.apply$mcVI$sp(PhotoshopApp.scala:32)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:75)
at ABC.PhotoshopApp$.ABC$PhotoshopApp$$fileOpenPicture(PhotoshopApp.scala:31)
at ABC.PhotoshopApp$$anon$9$$anon$11$$anon$1$$anonfun$2.apply$mcV$sp(PhotoshopApp.scala:111)
at scala.swing.Action$$anon$2.apply(Action.scala:60)
at scala.swing.Action$$anon$1.actionPerformed(Action.scala:78)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2028)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2351)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
at javax.swing.AbstractButton.doClick(AbstractButton.java:389)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:809)
at com.apple.laf.AquaMenuItemUI.doClick(AquaMenuItemUI.java:137)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.menuDragMouseReleased(BasicMenuItemUI.java:913)
at javax.swing.JMenuItem.fireMenuDragMouseReleased(JMenuItem.java:568)
at javax.swing.JMenuItem.processMenuDragMouseEvent(JMenuItem.java:465)
at javax.swing.JMenuItem.processMouseEvent(JMenuItem.java:411)
at javax.swing.MenuSelectionManager.processMouseEvent(MenuSelectionManager.java:305)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:852)
at java.awt.Component.processMouseEvent(Component.java:6373)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
at java.awt.Component.processEvent(Component.java:6138)
at java.awt.Container.processEvent(Container.java:2085)
at java.awt.Component.dispatchEventImpl(Component.java:4735)
at java.awt.Container.dispatchEventImpl(Container.java:2143)
at java.awt.Component.dispatchEvent(Component.java:4565)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4621)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4282)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4212)
at java.awt.Container.dispatchEventImpl(Container.java:2129)
at java.awt.Window.dispatchEventImpl(Window.java:2478)
at java.awt.Component.dispatchEvent(Component.java:4565)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:679)
at java.awt.EventQueue.access$000(EventQueue.java:85)
at java.awt.EventQueue$1.run(EventQueue.java:638)
at java.awt.EventQueue$1.run(EventQueue.java:636)
... 14 more
This only happens when I read an object into my buffer that keeps track of the objects. Moreover, I am able to save the file correctly (I have run tests to ensure everything gets there). Does anyone have any ideas about what is going on here?
You are closing your stream ois at the end of each iteration, inside the loop. Then you try to read from it again on the first line of the next iteration, which obviously fails with an IOException: Stream closed. Move the close() call outside the loop.
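A corrected sketch of the loop from the question, with close() moved into a finally block after the loop (note also that 0 to toRead performs toRead + 1 iterations; until is probably what you want):

import java.io.{BufferedInputStream, FileInputStream, ObjectInputStream}

val ois = new ObjectInputStream(
  new BufferedInputStream(new FileInputStream(chooser.selectedFile)))
try {
  val toRead: Int = ois.readInt()
  for (_ <- 0 until toRead) {
    ois.readObject() match {
      case anObject: myObject => aMutableBuffer += anObject
      case _ =>
    }
  }
} finally {
  ois.close() // close once, after all objects have been read
}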

Casbah's problem with large number of returned objects

Casbah (or the Java driver for MongoDB) seems to have a problem dealing with a large number of returned objects. For example, the following code segment produces an IllegalArgumentException and doesn't return a single result (full stack trace below). However, if I reduce the limit(...) to 1994, everything seems to work fine.
for (link <- links; query = link $exists true) {
  val group = new HashMap[String, Set[(String, String)]] with MultiMap[String, (String, String)]
  log.find(query, fieldsToGet.result).limit(1996) foreach {
    x =>
      group.addBinding(x.get(link).toString, (x.get("_id").toString(), x.get("eventType").toString))
  }
  allGroups += link -> group
}
Apr 26, 2011 8:23:40 PM com.mongodb.DBTCPConnector$MyPort error
SEVERE: MyPort.error called
java.lang.IllegalArgumentException: response too long: 1278031173
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:101)
at com.mongodb.DBPort.go(DBPort.java:66)
at com.mongodb.DBPort.call(DBPort.java:56)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:266)
at com.mongodb.DBCursor._check(DBCursor.java:309)
at com.mongodb.DBCursor._hasNext(DBCursor.java:431)
at com.mongodb.DBCursor.hasNext(DBCursor.java:456)
at com.mongodb.casbah.MongoCursorBase$class.hasNext(MongoCursor.scala:72)
at com.mongodb.casbah.MongoCursor.hasNext(MongoCursor.scala:517)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at com.mongodb.casbah.MongoCursor.foreach(MongoCursor.scala:517)
at Sequencer$$anonfun$3.apply(Sequencer.scala:23)
at Sequencer$$anonfun$3.apply(Sequencer.scala:20)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at Sequencer$.<init>(Sequencer.scala:20)
at Sequencer$.<clinit>(Sequencer.scala)
at Sequencer.main(Sequencer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Exception in thread "main" java.lang.ExceptionInInitializerError
at Sequencer.main(Sequencer.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Caused by: java.lang.IllegalArgumentException: response too long: 1278031173
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:101)
at com.mongodb.DBPort.go(DBPort.java:66)
at com.mongodb.DBPort.call(DBPort.java:56)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:266)
at com.mongodb.DBCursor._check(DBCursor.java:309)
at com.mongodb.DBCursor._hasNext(DBCursor.java:431)
at com.mongodb.DBCursor.hasNext(DBCursor.java:456)
at com.mongodb.casbah.MongoCursorBase$class.hasNext(MongoCursor.scala:72)
at com.mongodb.casbah.MongoCursor.hasNext(MongoCursor.scala:517)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at com.mongodb.casbah.MongoCursor.foreach(MongoCursor.scala:517)
at Sequencer$$anonfun$3.apply(Sequencer.scala:23)
at Sequencer$$anonfun$3.apply(Sequencer.scala:20)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at Sequencer$.<init>(Sequencer.scala:20)
at Sequencer$.<clinit>(Sequencer.scala)
... 6 more
It seems the exception is produced by the following check in Response.java in the Java driver:
ByteArrayInputStream bin = new ByteArrayInputStream( b );
_len = Bits.readInt( bin );
if ( _len > ( 32 * 1024 * 1024 ) )
    throw new IllegalArgumentException( "response too long: " + _len );
Could it be caused by that particular object returned? Or could this be an issue with Casbah?
Thanks,
Derek
It looks like the Java driver checks whether the current response block is greater than 32 megabytes and throws the exception when it is.
If you set batchSize(FEWER_NUMBER_OF_DOCS) on the cursor, it will reduce the lock time in the database and return less than 32 MB worth of data per batch.
I would play around with the batchSize to see what is optimal for your application; see the sketch after the link below.
http://api.mongodb.org/scala/casbah/2.1.2/scaladoc/
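A sketch of the cursor call from the question with a batch size applied (500 is illustrative; tune it to your document size):

// Cap each network round trip so a single batch stays well
// under the driver's 32 MB response limit.
log.find(query, fieldsToGet.result).limit(1996).batchSize(500) foreach {
  x =>
    group.addBinding(x.get(link).toString, (x.get("_id").toString, x.get("eventType").toString))
}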
The max should probably be increased in the Java driver.
The strange part about your response is that it claims to contain ~1.19 GB of data.
If your response doesn't actually have that much data, it may indicate that the collection is corrupt.