scala nsc IMain bind() speed and memory issues

We are using the bind() and interpret() methods of tools.nsc.interpreter.IMain to execute Scala scripts on a server. This is on Scala 2.9.1 and Java 7u2.
After repeatedly using the same IMain instance, the bind() method suddenly starts to take a very long time (5-6 seconds and even longer). I have tried close() and reset(), but nothing helps. The weird thing is that the sudden slowness only occurs after several uses.
Code snippet (that is executed over and over again):
main.bind("status", status)
try {
  main.interpret(prepare(restriction, input))
} catch {
  case e: Exception =>
    status.setCode("ERR6")
    status.setSummary("Error Interpreting Restriction")
    status.setType(MetaFileElements.ERROR_VALUE)
    status.setValue("Restriction: \"" + restriction + "\", Input: \"" + input + "\"")
}
Another issue is that eventually the process crashes with this error:
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.findBootstrapClass(Native Method)
at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1061)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at scala.tools.nsc.util.Exceptional$.unwrap(Exceptional.scala:140)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$handleException$1$1.apply(IMain.scala:821)
at scala.tools.nsc.interpreter.IMain$Request$$anonfun$handleException$1$1.apply(IMain.scala:818)
at scala.tools.nsc.interpreter.IMain$$anonfun$withoutBindingLastException$2.apply(IMain.scala:228)
at scala.util.control.Exception$Catch.apply(Exception.scala:88)
at scala.tools.nsc.interpreter.IMain.withoutBindingLastException(IMain.scala:226)
at scala.tools.nsc.interpreter.IMain$Request.handleException$1(IMain.scala:818)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:838)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:471)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:468)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:525)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:544)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:545)
at com.nomura.fi.spg.kozo.meta.client.helper.RestrictionsHelper$.execute(RestrictionsHelper.scala:22)
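A note for later readers: each interpret() call compiles the snippet into new classes that stay pinned in PermGen by the interpreter's class loader, which likely explains both the growing slowdown and the eventual OutOfMemoryError. Beyond raising -XX:MaxPermSize, one mitigation is to discard the whole IMain every N evaluations so its loader and generated classes become collectable. A minimal sketch, assuming the three-argument bind(name, boundType, value) overload and a hypothetical Status type for the bound value:
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Recycle the interpreter every `maxUses` evaluations so the classes
// generated by interpret() can be unloaded with their class loader.
class RecyclingInterpreter(maxUses: Int) {
  private var uses = 0
  private var main = fresh()

  private def fresh(): IMain = {
    val settings = new Settings
    settings.usejavacp.value = true // reuse the host JVM's classpath
    new IMain(settings)
  }

  def evaluate(status: Status, script: String): Unit = {
    if (uses >= maxUses) { // throw away the old instance and its classes
      main.close()
      main = fresh()
      uses = 0
    }
    uses += 1
    main.bind("status", "Status", status) // "Status" is the assumed bound-type name
    main.interpret(script)
  }
}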

Related

pyspark on EMR with boto3: copying S3 objects fails with Futures timed out after [100000 milliseconds]

I have a PySpark application that transforms CSV to Parquet, and before that happens I'm copying some S3 objects from one bucket to another.
PySpark with Spark 2.4, EMR 5.27, maximizeResourceAllocation set to true.
I have CSV files of various sizes, from 80 KB to 500 MB.
Nonetheless, my EMR cluster (it doesn't fail locally with spark-submit) fails at 70% completion on a file that is 166 MB (a previous one at 480 MB succeeded).
The job is simple:
def organise_adwords_csv():
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(S3_ORIGIN_RAW_BUCKET)
    for obj in bucket.objects.filter(Prefix=S3_ORIGIN_ADWORDS_RAW + "/"):
        key = obj.key
        copy_source = {
            'Bucket': S3_ORIGIN_RAW_BUCKET,
            'Key': key
        }
        key_tab = obj.key.split("/")
        if len(key_tab) < 5:
            print("continuing from length", obj)
            continue
        file_name = ''.join(key_tab[len(key_tab)-1:len(key_tab)])
        if file_name == '':
            print("continuing", obj)
            continue
        table = file_name.split("_")[1].replace("-", "_")
        new_path = "{0}/{1}/{2}".format(S3_DESTINATION_ORDERED_ADWORDS_RAW_PATH, table, file_name)
        print("new_path", new_path)  # <- the last print will end here
        try:
            s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
            print("copy done")
        except Exception as e:
            print(e)
            print("an exception occurred while copying")

if __name__ == '__main__':
    organise_adwords_csv()
    print("copy Final done")  # <- never printed
    spark = SparkSession.builder.appName("adwords_transform") \
        ...
But in stdout, no errors or exceptions are showing.
In the stderr logs:
19/10/09 16:16:57 INFO ApplicationMaster: Waiting for spark context initialization...
19/10/09 16:18:37 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/10/09 16:18:37 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
)
19/10/09 16:18:37 INFO ShutdownHookManager: Shutdown hook called
I'm completely blind here; I don't understand what is failing or why.
How can I figure that out? Locally it works like a charm (but is super slow, of course).
Edit:
After many tries I can confirm that the function:
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
makes the EMR cluster time out, even though it has already processed 80% of the files.
Does anyone have a recommendation about this?
s3.meta.client.copy(copy_source, S3_DESTINATION_RAW_BUCKET, new_path)
This will fail for any source object larger than 5 GB. Please use a multipart upload in AWS instead. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#multipartupload

mongodb allanbank async driver callback being called twice

I have a simple callback bound to a findAsync call. Every few requests I observe a failure, even though the data in the database is static.
Some quick debugging shows that my callback is ALWAYS being called twice. The cases where my app fails are caused by a null MongoIterator<Document> being passed in to my callback. Of course, I expected only a single callback, when the data is ready (or if there was an exception).
Is this expected behavior? Is there something I can do to ensure my callback is only called once, when the query is complete?
Here is the code snippet:
collection.findAsync(
    [
        callback: { MongoIterator<Document> v ->
            List data = []
            try {
                while (v.hasNext()) {
                    data.add(docToJson(v.next()))
                }
            } finally {
                if (v != null) v.close()
            }
            sendReply([ status: 'ok', data: data ])
        },
        exception: { Throwable t ->
            sendReply([ status: 'error', message: t.message ])
        }
    ] as Callback<MongoIterator<Document>>,
    find
)
Here is the stack trace:
Unexpected MongoDB Connection closed: Auth(MongoDB(43026-->localhost/127.0.0.1:27017)). Will try to reconnect.
Reconnected to localhost/127.0.0.1:27017
Exception in thread "MongoDB 43026<--localhost/127.0.0.1:27017" java.lang.NullPointerException: Cannot invoke method hasNext() on null object
at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:32)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:54)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)
at mongoAsync$_run_closure2_closure6.doCall(mongoAsync.groovy:126)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:909)
at groovy.lang.Closure.call(Closure.java:411)
at org.codehaus.groovy.runtime.ConvertedMap.invokeCustom(ConvertedMap.java:50)
at org.codehaus.groovy.runtime.ConversionHandler.invoke(ConversionHandler.java:81)
at com.sun.proxy.$Proxy14.callback(Unknown Source)
at com.allanbank.mongodb.client.AbstractReplyCallback.handle(AbstractReplyCallback.java:82)
at com.allanbank.mongodb.client.AbstractValidatingReplyCallback.callback(AbstractValidatingReplyCallback.java:72)
at com.allanbank.mongodb.client.AbstractValidatingReplyCallback.callback(AbstractValidatingReplyCallback.java:33)
at com.allanbank.mongodb.connection.message.ReplyHandler.reply(ReplyHandler.java:77)
at com.allanbank.mongodb.connection.socket.SocketConnection.reply(SocketConnection.java:560)
at com.allanbank.mongodb.connection.socket.SocketConnection$ReceiveRunnable.receiveOne(SocketConnection.java:735)
at com.allanbank.mongodb.connection.socket.SocketConnection$ReceiveRunnable.run(SocketConnection.java:683)
at java.lang.Thread.run(Thread.java:722)
Thanks for the extra information.
I am pretty sure I found the bug.
There is a race between the callback/iterator for the query getting the name of the server handling the query (needed for the GetMore requests) and the receipt of a reply. I am not sure how it has not been seen by others.
There is a patched jar here: [redact]. Let me know if it solves the problem for you.
Assuming it does, I will push this fix out as 1.2.2 tomorrow.
Rob.
Edit: The 1.2.2 version is now available with the fix: http://www.allanbank.com/mongodb-async-driver/download.html
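Until the fixed driver is in place, one defensive stopgap is to make the callback idempotent, so a duplicate delivery is silently dropped. A sketch, assuming only that the driver's Callback<V> interface exposes the callback/exception methods used in the snippet above:
import java.util.concurrent.atomic.AtomicBoolean
import com.allanbank.mongodb.Callback

// Delegate only the first invocation (either callback or exception);
// drop any duplicate delivery caused by the race described above.
class OnceCallback[V](delegate: Callback[V]) extends Callback[V] {
  private val done = new AtomicBoolean(false)

  override def callback(result: V): Unit =
    if (done.compareAndSet(false, true)) delegate.callback(result)

  override def exception(thrown: Throwable): Unit =
    if (done.compareAndSet(false, true)) delegate.exception(thrown)
}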

java.net.SocketException: Invalid argument

Error receiving connection: java.net.SocketException: Invalid argument
Exception in thread "Thread-698" java.lang.RuntimeException: Did not establish input/output
at Handin2.ConnectionHandlerImpl.run(ConnectionHandlerImpl.java:124)
at java.lang.Thread.run(Thread.java:722)
I have been searching a lot on this exception. Everywhere, the conclusion seems to be that if you compile with the JVM option -Djava.net.preferIPv4Stack=true it will solve the problem. It doesn't for me, though. I am running IntelliJ and trying to implement a Chord system with sockets and threads. This is where the above-mentioned exception is caught and thrown:
if (this.remote == null || this.node == null) {
    throw new RuntimeException("Connection handler has not been properly "
            + "initalialized");
}
try {
    if (this.out == null) {
        this.out = new ObjectOutputStream(remote.getOutputStream());
    }
    if (this.in == null) {
        this.in = new ObjectInputStream(remote.getInputStream());
    }
} catch (IOException e) {
    System.out.println("Trying to make input/output for remote " + remote);
    System.err.println(node.getKey()/Constants.NORMALIZE_KEY + ": Error receiving connection: " + e.toString());
    return false;
}
I don't know if more code snippets are needed in order to help, but I am simply lost on this exception, since the solutions I find when searching are not helping at all.
Any hints on what triggers this error? The SocketException is caught by the given try-catch block as a special case of IOException.
if you compile with the JVM option -Djava.net.preferIPv4Stack=true it will solve the problem.
No. It will help if you execute with that JVM option; it has nothing to do with the compiler.
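For example (Handin2.Main is a placeholder for the actual main class): java -Djava.net.preferIPv4Stack=true -cp out Handin2.Main. In IntelliJ, the equivalent is adding -Djava.net.preferIPv4Stack=true to the VM options field of the run configuration, since that option is applied when launching the JVM, not when compiling.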

Scala / Eclipse: SBT crashes on pattern matching and says nothing. SBT bug?

Sorry for the long message; I hope you'll read it.
I started my course work in Scala: an FTP server. It uses pattern matching for command recognition. I write my code in Eclipse. I think SBT has a bug, but I don't know where to report it.
Description:
Long compile time. After compilation there is no error in the IDE, but the following line appears in the "Problems" panel:
The SBT builder crashed while compiling your project. This is a bug in the Scala compiler or SBT. Check the Error Log for details. The error message is: ch.epfl.lamp.fjbg.JCode$OffsetTooBigException: offset too big to fit in 16 bits: 38838 FTPDaemon Unknown Scala Problem
But the program starts and works. Then the next error occurs at run time:
Exception in thread "main" java.lang.ClassFormatError: Truncated class
file at java.lang.ClassLoader.defineClass1(Native Method) at
java.lang.ClassLoader.defineClass(ClassLoader.java:787) at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:447) at
java.net.URLClassLoader.access$100(URLClassLoader.java:71) at
java.net.URLClassLoader$1.run(URLClassLoader.java:361) at
java.net.URLClassLoader$1.run(URLClassLoader.java:355) at
java.security.AccessController.doPrivileged(Native Method) at
java.net.URLClassLoader.findClass(URLClassLoader.java:354) at
java.lang.ClassLoader.loadClass(ClassLoader.java:423) at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at
java.lang.ClassLoader.loadClass(ClassLoader.java:356) at
org.egslava.ftp.ControlConnection.(ControlConnection.scala:14)
at org.egslava.ftp.Main$.main(Main.scala:38) at
org.egslava.ftp.Main.main(Main.scala)
This error shows up when I do
new ControlConnection().start();
ControlConnection uses a variable currentState (a reference to the abstract class FtpState). The current state may be an instance of the DoLogin class or of the WaitForCommandsState class.
WaitForCommandsState contains the following block of code:
message match {
  case owner.Noop() => "200 NOOP ok\r\n"
  case owner.User(_) => "530 Can't change from guest user\r\n"
  case owner.Pass(_) => "230 Already logged in\r\n"
  case owner.Pasv() => pasv() + "\r\n"
  case owner.List() => list() + "\r\n"
  case "condition" => "error"
  case owner.Nlst() => nlst() + "\r\n"
  case owner.TypeCMD() => "" // "200 Switching to binary mode\r\n"
  case "PWD" => "257 \"" + currentDirectory + "\"\r\n"
  case "SITE HELP" => "200-\r\n200\r\n"
  case owner.Cwd(path) => "250 Directory successfuly changed\r\n"
  case "condition2" => "error2"
  case unrecognizedCommand => "500 Unrecognized command " + unrecognizedCommand + "\r\n"
}
If I comment out either of the lines
case "condition" => "error"
or
case "condition2" => "error2"
the error disappears.
What is it?
The algorithm for compiling pattern matching fails on overly large match statements by producing more bytecode than the JVM allows in a single method. As I understand it, this is to be fixed in Scala 2.10.
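Until then, a common workaround is to split the oversized match across several smaller methods, so that no single method's bytecode outgrows the 16-bit jump offsets. A sketch using the extractors from the question (method names are illustrative):
// Each helper handles a slice of the commands and returns None for the rest,
// keeping every generated method comfortably under the bytecode limit.
def reply(message: Any): String =
  replyAuth(message) orElse replyTransfer(message) getOrElse
    ("500 Unrecognized command " + message + "\r\n")

private def replyAuth(message: Any): Option[String] = message match {
  case owner.Noop()  => Some("200 NOOP ok\r\n")
  case owner.User(_) => Some("530 Can't change from guest user\r\n")
  case owner.Pass(_) => Some("230 Already logged in\r\n")
  case _             => None
}

private def replyTransfer(message: Any): Option[String] = message match {
  case owner.Pasv() => Some(pasv() + "\r\n")
  case owner.List() => Some(list() + "\r\n")
  case owner.Nlst() => Some(nlst() + "\r\n")
  case _            => None
}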

Scala Object Serialization

I am having trouble opening multiple objects that were serialized into a single .bin file. Right now, I can only get one object to be read in when I attempt to open the file. After the first object is read, the error message below is displayed (and no further objects are read). My code looks like the following:
val ois = new ObjectInputStream(new BufferedInputStream(new FileInputStream(chooser.selectedFile)))
val toRead: Int = ois.readInt()
for (i <- 0 to toRead) {
  ois.readObject() match {
    case anObject: myObject =>
      aMutableBuffer += anObject
    case _ =>
  }
  ois.close()
}
The error that I am getting is many repetitions of the following:
Exception in thread "AWT-EventQueue-0" java.security.PrivilegedActionException:java.security.PrivilegedActionException: java.io.IOException: Stream closed
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:649)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:296)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Caused by: java.security.PrivilegedActionException: java.io.IOException: Stream closed
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
at java.awt.EventQueue$2.run(EventQueue.java:652)
at java.awt.EventQueue$2.run(EventQueue.java:650)
... 9 more
Caused by: java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
at java.io.BufferedInputStream.read(BufferedInputStream.java:241)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at ABCk.PhotoshopApp$$anonfun$ABCPhotoshopApp$$fileOpenPicture$1.apply$mcVI$sp(PhotoshopApp.scala:32)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:75)
at ABC.PhotoshopApp$.ABC$PhotoshopApp$$fileOpenPicture(PhotoshopApp.scala:31)
at ABC.PhotoshopApp$$anon$9$$anon$11$$anon$1$$anonfun$2.apply$mcV$sp(PhotoshopApp.scala:111)
at scala.swing.Action$$anon$2.apply(Action.scala:60)
at scala.swing.Action$$anon$1.actionPerformed(Action.scala:78)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2028)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2351)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
at javax.swing.AbstractButton.doClick(AbstractButton.java:389)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:809)
at com.apple.laf.AquaMenuItemUI.doClick(AquaMenuItemUI.java:137)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.menuDragMouseReleased(BasicMenuItemUI.java:913)
at javax.swing.JMenuItem.fireMenuDragMouseReleased(JMenuItem.java:568)
at javax.swing.JMenuItem.processMenuDragMouseEvent(JMenuItem.java:465)
at javax.swing.JMenuItem.processMouseEvent(JMenuItem.java:411)
at javax.swing.MenuSelectionManager.processMouseEvent(MenuSelectionManager.java:305)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:852)
at java.awt.Component.processMouseEvent(Component.java:6373)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
at java.awt.Component.processEvent(Component.java:6138)
at java.awt.Container.processEvent(Container.java:2085)
at java.awt.Component.dispatchEventImpl(Component.java:4735)
at java.awt.Container.dispatchEventImpl(Container.java:2143)
at java.awt.Component.dispatchEvent(Component.java:4565)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4621)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4282)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4212)
at java.awt.Container.dispatchEventImpl(Container.java:2129)
at java.awt.Window.dispatchEventImpl(Window.java:2478)
at java.awt.Component.dispatchEvent(Component.java:4565)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:679)
at java.awt.EventQueue.access$000(EventQueue.java:85)
at java.awt.EventQueue$1.run(EventQueue.java:638)
at java.awt.EventQueue$1.run(EventQueue.java:636)
... 14 more
This only happens when I read an object into my buffer that keeps track of the objects. Moreover, I am able to save the file correctly (I have run tests to ensure everything gets there). Does anyone have any idea what is going on here?
You are closing your stream ois inside the loop, at the end of the loop body. On the next iteration you then try to read from it again in the first line of the loop, which obviously fails with an IOException: Stream closed.
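A corrected sketch: read everything, then close the stream exactly once, in a finally block. (If readInt() returns the number of objects written, the loop bound should probably also be 0 until toRead rather than 0 to toRead.)
import java.io.{BufferedInputStream, FileInputStream, ObjectInputStream}

val ois = new ObjectInputStream(
  new BufferedInputStream(new FileInputStream(chooser.selectedFile)))
try {
  val toRead: Int = ois.readInt()
  for (i <- 0 until toRead) { // 'until': readInt() yields a count, not a last index
    ois.readObject() match {
      case anObject: myObject => aMutableBuffer += anObject
      case _ => // ignore objects of other types, as in the original
    }
  }
} finally {
  ois.close() // close once, after all objects have been read
}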