Spark Job Server No Such Process while starting the job server - spark-jobserver

I have issue while trying to start the job server. Im getting the below error,
./server_start.sh
./server_start.sh: line 41: kill: (113865) - No such process
Please help me on this.

Related

Celery lose worker

I use celery 4.4.0 version in my project(Ubuntu 18.04.2 LTS). When i raise Exception('too few functions in features to classify') , celery project lost worker and i get such logs:
[2020-02-11 15:42:07,364] [ERROR] [Main ] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 0.')
Traceback (most recent call last):
File "/var/lib/virtualenvs/simus_classifier_new/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost human_status(exitcode)), billiard.exceptions.WorkerLostError: Worker exited prematurely: exitcode 0.
[2020-02-11 15:42:07,474] [DEBUG] [ForkPoolWorker-61] Closed channel #1
Do you have any idea how to solve this problem?
WorkerLostError are almost like OutOfMemory errors - they can't be solved. They will continue to happen from time to time. What you should do is to make your task(s) idempotent and let Celery retry tasks that failed due to worker crash.
It sounds trivial, but in many cases it is not. Not all tasks can be idempotent for an example. Celery still has bugs in the way it handles WorkerLostError. Therefore you need to monitor your Celery cluster closely and react to these events, and try to minimize them. In other words, find why the worker crashed - Was it killed by the system because it was consuming all the memory? Was it killed simply because it was running on an AWS spot instance, and it got terminated? Was it killed by someone executing kill -9 <worker pid>? All these circumstances could be handled this way or another...

FTPD Server Issue

So I am trying to use my xampp server and for the life of me can't understand why my ProFTPD will not turn on. It only became cause for concern when I saw the word "bogon" in the application log. Can anyone translate to me what the application log means and maybe how I go about troubleshooting the problem ?
Stopping all servers...
Stopping Apache Web Server...
/Applications/XAMPP/xamppfiles/apache2/scripts/ctl.sh : httpd stopped
Stopping MySQL Database...
/Applications/XAMPP/xamppfiles/mysql/scripts/ctl.sh : mysql stopped
Starting ProFTPD...
Exit code: 8
Stdout:
Checking syntax of configuration file
proftpd config test fails, aborting
Stderr:
bogon proftpd[3948]: warning: unable to determine IP address of 'bogon'
bogon proftpd[3948]: error: no valid servers configured
bogon proftpd[3948]: Fatal: error processing configuration file '/Applications/XAMPP/xamppfiles/etc/proftpd.conf'

Starting Parpool in MATLAB

I tried starting parpool in MATLAB 2015b. Command as follows,
parpool('local',3);
This command should allocate 3 workers. Whereas I received an error stating failure to start parpool. The error message as follows,
Error using parpool (line 94)
Failed to start a parallel pool. (For information in addition to
the causing error, validate the profile 'local' in the Cluster Profile
Manager.)
A similar query was posted in (https://nl.mathworks.com/matlabcentral/answers/196549-failed-to-start-a-parallel-pool-in-matlab2015a). I followed the same procedure, to validate the local profile as per the suggestions.
Using distcomp.feature( 'LocalUseMpiexec', false); or distcomp.feature( 'LocalUseMpiexec', true) in startup.m didn't create any improvement. Also when attempting to validate local profile still gives error message as follows,
VALIDATION DETAILS
Profile: local
Scheduler Type: Local
Stage: Cluster connection test (parcluster)
Status: Passed
Description:Validation Passed
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
Stage: Job test (createJob)
Status: Failed
Description:The job errored or did not reach state finished.
Command Line Output:
Failed to determine if job 24 belongs to this cluster because: Unable to
read file 'C:\Users\varad001\AppData\Roaming\MathWorks\MATLAB
\local_cluster_jobs\R2015b\Job24.in.mat'. No such file or directory..
Error Report:(none)
Debug Log:(none)
Stage: SPMD job test (createCommunicatingJob)
Status: Failed
Description:The job errored or did not reach state finished.
Command Line Output:
Failed to determine if job 25 belongs to this cluster because: Unable to
read file 'C:\Users\varad001\AppData\Roaming\MathWorks\MATLAB
\local_cluster_jobs\R2015b\Job25.in.mat'. No such file or directory..
Error Report:(none)
Debug Log:(none)
Stage: Pool job test (createCommunicatingJob)
Status: Skipped
Description:Validation skipped due to previous failure.
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
Stage: Parallel pool test (parpool)
Status: Skipped
Description:Validation skipped due to previous failure.
Command Line Output:(none)
Error Report:(none)
Debug Log:(none)
I am receiving these error only in my cluster machine. But launching parpool in my standalone PC is working perfectly. Is there a way to rectify this issue?

How do I fix server Oops error when deploying Play Framework 2.0 application as a Windows service?

I was following the processes in this very helpful post:
How do I run a Play Framework 2.0 application as a Windows service?
when I ran into trouble in Step 9. When I execute runConsole.bat, the service cycles between the running and restarting states. The full log is here:
wrapper.log
but what jumps out at me in the log are the following:
INFO|7268/0|play.core.server.NettyServer|13-12-28 13:07:28|Oops, cannot start the server.
INFO|7268/0|play.core.server.NettyServer|13-12-28 13:07:28|Configuration error: Configuration error[Cannot connect to database [default]]
...
INFO|7268/0|play.core.server.NettyServer|13-12-28 13:07:28|Caused by: org.h2.jdbc.JdbcSQLException: Database may be already in use: "Locked by another process". Possible solutions: close all other connection(s); use the server mode [90020-168]
...
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:28|restart process due to default exit code rule
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:28|restart internal RUNNING
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:28|stopping process with pid/timeout 7268 45000
INFO|wrapper|play.core.server.NettyServer|13-12-28 13:07:30|process exit code: -1
...
INFO|7812/1|play.core.server.NettyServer|13-12-28 13:07:45|[error] c.j.b.h.AbstractConnectionHook - Failed to obtain initial connection Sleeping for 0ms and trying again. Attempts left: 0. Exception: null
INFO|7812/1|play.core.server.NettyServer|13-12-28 13:07:45|Oops, cannot start the server.
INFO|7812/1|play.core.server.NettyServer|13-12-28 13:07:45|Configuration error: Configuration error[Cannot connect to database [default]]
repeats several times...
Half way through writing this post, I realized that before step 9., you should terminate the start.bat script which you start up in step 6. When I did this, runConsole.bat would execute normally without all the errors I list above.

MongoDB ReplicaSet on windows azure - Sockect exception error

I have 3 node replica set deployed in windows azure. While doing performance testing, the test code halts after sometime. In the server I can see the following error log -
Fri Aug 30 23:14:59.982 [conn2454] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [ip:port]
For the performance test I am using multithreaded code to only read data from the replicaset.
So far I have tried http://docs.mongodb.org/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-sharded-clusters-and-replica-sets. But it did not help so far.
Any thoughts/suggestions will be welcomed.
Thanks
This is old, but just in case someone else stumbles across this.
You need to set the TCP/IP keep alive times different from the stock Linux configuration if you are running under Azure:
echo 45 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes