Does running a Spring Batch application on GraalVM improve its performance?

I am working on a Spring Batch job that processes millions of records over a period of 5-6 hours each time it runs.
The application runs on Java 8, and I was wondering whether running that batch job on GraalVM would add any significant performance improvement.

It could (it should, even), but it is best to measure for yourself.
Switching to GraalVM should be a matter of changing JAVA_HOME (or whatever path to java you use), running your job, and seeing how long it takes.
If you have the time and resources, run the same job both with your usual JVM and with GraalVM and see whether GraalVM helps on your workload.

Related

Scala Spark IntelliJ Idea development process

I am currently using Spark to build my dimensional data model, and we upload the jar to an AWS EMR cluster to test it. However, this is tedious and time-consuming for testing and building tables.
I would like to know what others are doing to speed up their development. One possibility I came across in my research is running Spark jobs directly from the IDE with IntelliJ IDEA, and I would like to know about other development processes that make iteration faster.
The approaches I have tried so far are:
1. Installing Spark and HDFS on two or three commodity PCs and testing the code before submitting it to the cluster.
2. Running the code on a single node to catch trivial mistakes.
3. Submitting the jar file to the cluster.
The first and third approaches both require building the jar file, which can take a lot of time. The second is not suitable for finding and fixing the bugs and problems that only arise in a distributed environment.
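For the IDE-based approach mentioned above, a common pattern is to point the application at a local master and a small data sample so the whole job runs inside IntelliJ's JVM. The sketch below assumes Spark SQL is on the classpath; the object name and input path are made up for illustration.

    import org.apache.spark.sql.SparkSession

    // Hypothetical local entry point for fast iteration; names and paths are examples only.
    object DimModelLocal {
      def main(args: Array[String]): Unit = {
        // local[*] runs the driver and executors inside the IDE's JVM, so no jar upload is needed.
        val spark = SparkSession.builder()
          .appName("dim-model-local")
          .master("local[*]")
          .getOrCreate()

        // Point at a small sample of the real input to keep the feedback loop short.
        val orders = spark.read.parquet("data/sample/orders")

        // Run the same transformations you would ship to EMR.
        orders.groupBy("customer_id").count().show()

        spark.stop()
      }
    }

Setting the master only in this local entry point (or in an IntelliJ run configuration) keeps the production jar unchanged, so the same transformation code can still be submitted to EMR with spark-submit.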

How does a Spark application start when using sbt run?

I want to understand the underlying mechanism by which the Spark application starts when I execute sbt run.
What is the difference between this and running Spark in standalone mode and then deploying the application onto it using spark-submit?
If someone can explain how the jar is submitted, and who creates and assigns the tasks in both cases, that would be great.
Please help me out with this, or point me to something I can read to clear up my doubts.
First, read this.
Once you are familiar with the terminology, the different roles, and their responsibilities, read the paragraphs below for a summary.
There are different ways to run a Spark application (a Spark app is nothing but a bunch of class files with an entry point).
You can run the Spark application as a single Java process (usually for development purposes). This is what happens when you run sbt run.
In this mode, all the services, such as the driver and the executors, run inside a single JVM.
But this way of running is only for development and testing purposes, as it won't scale, meaning you won't be able to process a huge amount of data. This is where the other ways of running a Spark app come into the picture (standalone, Mesos, YARN, etc.).
Now read this.
In these modes, there are dedicated JVMs for the different roles. The driver runs as a separate JVM, and there can be tens to thousands of executor JVMs running on different machines (crazy, right?).
The interesting part is that the same application that runs inside a single JVM is distributed to run on thousands of JVMs. The distribution of the application, the life cycle of these JVMs, making them fault-tolerant, and so on are all taken care of by Spark and the underlying cluster framework.
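To make the two modes concrete, here is a minimal sketch of an entry point that works in both: under sbt run nothing sets spark.master, so it falls back to local[*] and everything lives in one JVM; under spark-submit the master comes from the submit command and the driver and executors run as separate JVMs. The object name and input path are hypothetical.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Hypothetical entry point; the object name and input path are examples only.
    object WordCountApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()

        // Under `sbt run` nothing has set spark.master, so fall back to local[*]:
        // the driver and executor all live inside the single JVM that sbt runs the app in.
        // Under `spark-submit --master ...` the submit script sets spark.master,
        // and the cluster manager launches separate driver and executor JVMs.
        if (!conf.contains("spark.master")) conf.setMaster("local[*]")

        val spark = SparkSession.builder().config(conf).appName("word-count").getOrCreate()

        val counts = spark.read.textFile("data/input.txt")
          .selectExpr("explode(split(value, ' ')) as word")
          .groupBy("word")
          .count()

        counts.show()
        spark.stop()
      }
    }

In both cases the driver turns the job into tasks and schedules them onto executors; what changes is where those JVMs run and who launches them. With sbt run, the class is simply executed in the JVM sbt starts; with something along the lines of spark-submit --class WordCountApp --master spark://host:7077 app.jar, the jar is shipped to the cluster and the cluster manager allocates the executor processes that the driver then hands tasks to.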

Service Fabric slow development cycle

I'm still very new to Service Fabric, but I'm surprised that something as advanced as this is so slow to debug. I'm using a fairly fast machine, but it takes 4-5 minutes to tear down and restart the cluster. I've googled it and can't find anyone else reporting this as a showstopper.
Some clues to help with your slow development turnaround time:
When developing locally, consider using a one-node cluster in order to speed up deployments and upgrades (fewer upgrade/fault domains): https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-get-started-with-a-local-cluster#one-node-and-five-node-cluster-mode
You need to set up/create your cluster once, then start it and keep it running between debugging sessions; Visual Studio will take care of uninstalling/upgrading the SF apps when starting the debugger.
You can modify the properties of the SF application project to decide whether your SF app will be uninstalled and reinstalled, or upgraded, when starting the debugger, which affects the deployment time.
Consider running from an SSD, which will speed up compilation and deployment (both are file-intensive).
Expect less than one minute to compile, deploy, and attach the debugger for an SF app with 2-3 services.
Yes, we have the same issue. We have around 10 services in our application and debugging is very slow; VS fails to refresh the one-node cluster all the time, so a cluster reset is the only solution. Every debug run takes about 5 minutes.
Yes, it is a very disappointing development process; the only advantage is some reuse of C# code. If you have not decided what to use for your cloud solution, abandon C# as early as possible and go for any JS-based language that has no intermediate binaries.

Tweaking SBT performance

Are there any SBT options, JVM options, or Scala compiler options that positively or negatively influence SBT's startup and compile time? What options can I play around with? The documentation is very poor and doesn't specify anything.
Background: I switched from a two-year-old MacBook Air to a brand new MacBook Pro. Same OS, same configuration. The new one has a much faster CPU and twice as much memory. For some reason, starting SBT and compiling my Play2 application is significantly faster on my old, much weaker machine. The difference in compile time is consistently as high as 10 seconds.
Turns out it was 2 things:
There was a conflicting dependency on SLF4J, which caused an SBT warning every time it started. I assume SBT tried to resolve the dependency in the background, significantly slowing down startup speed. Once that was removed, I saved 3-4 seconds.
When the compilation of the Play2 application was triggered by hitting http://localhost:9000 in the browser, there was another issue related to the hostname. I don't know exactly why this was having such a big impact, but after running scutil --set HostName "localhost" I reduced the time it takes to compile the application and get the Play2 application in the browser by nearly 10 seconds!
So overall, these 2 little changes saved me more than 10 seconds in the development cycle. I hope somebody finds this useful as well.
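For the first point, the usual fix is to exclude the conflicting SLF4J binding in build.sbt. The sketch below is only illustrative: the answer does not say which artifacts were involved, so the organization and module names here are assumptions.

    // build.sbt - excluding a conflicting SLF4J binding (artifact names are assumptions)
    libraryDependencies += ("com.example" %% "some-library" % "1.0.0")
      .exclude("org.slf4j", "slf4j-log4j12")   // drop the extra binding pulled in transitively

    // Or exclude it from every dependency at once:
    excludeDependencies += ExclusionRule("org.slf4j", "slf4j-log4j12")

A dependency report (for example the dependencyTree task provided by the sbt-dependency-graph plugin) helps identify which dependency is pulling the duplicate binding in.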

How to Speed Up IBM Rational Application Developer/Rational Software Architect

I want to know how I can speed up RSA 7.5 (an IDE by IBM with Eclipse under the hood and WebSphere server runtimes), mainly server start. The first time I start it after a computer reboot it loads fast, but after that it takes forever to start/stop the server. The server's debug mode takes forever to start.
I am using the WebSphere 7 server runtime with IBM RSA 7.5.
So basically, RAD/RSA ships with WebSphere runtimes, which let you configure and start/stop the server runtime from within RAD/RSA. The runtime allows you to develop web apps and test them by deploying them onto the WebSphere runtime.
The problem I am facing is that the WebSphere runtime works fine after a computer reboot but is very slow after several deployments/publishes of the same web app.
I would be grateful for any tips to speed up RSA server start/shutdown, as well as overall performance tips. I have plenty of memory (12 GB) and a 6-core Core i7 on Windows 7.
Of course, if you are running the server in debug mode it's going to be a lot slower, but you have a few options, like putting the server in development mode or doing some fine-tuning as to which applications should start. Take a look at these articles:
Rational Application Developer Performance Tips - Case study: Tuning WebSphere Application Server V7 and V8 for performance
Performance tuning WebSphere Application Server 7 on AIX 6.1
WebSphere tuning for the impatient: How to get 80% of the performance improvement with 20% of the effort
WebSphere Performance Monitoring & Tuning
Some of these are a bit dated, but they have good information that may still be relevant to your issues, especially the first one.
Make sure that the workspaces are stored on a local disk.
Edit - forgot this: buy an SSD. It makes a huge difference when developing.
If you have a virus scanner, disable on-access scanning in the SDP installation directory (including the server plugin) and in all your workspaces.
Uninstall any applications (EARs) you are not using - the more you have installed, the longer the server takes to start up. If your server takes too long to start, RAD/RSA will assume it has timed out and will stop it before it finishes starting; if this happens, increase the start timeout limit by double-clicking your server in the Servers tab and modifying the values in the Timeouts section.
Oh, and if you have a lot of datasources defined, with auto-starting connection pools holding a lot of connections, it may also take a while to start the pools.
But that can't explain it all... I haven't tested, but since WAS and RSA seem to spend a lot of time doing absolutely nothing, I am starting to suspect it's trying to download schemas or something. If you have the time, you could try to trace it and see if you find something like that...
I came across this post while trying to troubleshoot my RSA performance. I figured I would update it with a recommendation for improving performance on RSA 8.0.4.
http://publib.boulder.ibm.com/infocenter/radhelp/v8/index.jsp?topic=/com.ibm.performance.doc/topics/cperformance.html has some excellent tips on improving performance in the "Performance Tips" section. After implementing just some of the "Always" tips, I've found my memory usage reduced significantly and performance much improved.
You should start with the "Always" tips and then move on to the "Sometimes" and "Rarely" ones for finer tuning.