CDC with WSO2 Streaming Integrator and Postgres DB - postgresql

I am trying to setup Change Data Capture (CDC) between WSO2 Streaming Integrator and a local Postgres DB.
I have added the Postgres Driver (v42.2.5) to SI_HOME/lib and I am able to read data from the database from a Siddhi application.
I am following the CDCWithListeningMode example to implement CDC and I am using pgoutput as the logical decoding plugin. But when I run the application I get the following log.
[2020-04-23_19-02-37_460] INFO {org.apache.kafka.connect.json.JsonConverterConfig} - JsonConverterConfig values:
converter.type = key
schemas.cache.size = 1000
schemas.enable = true
[2020-04-23_19-02-37_461] INFO {org.apache.kafka.connect.json.JsonConverterConfig} - JsonConverterConfig values:
converter.type = value
schemas.cache.size = 1000
schemas.enable = false
[2020-04-23_19-02-37_461] INFO {io.debezium.embedded.EmbeddedEngine$EmbeddedConfig} - EmbeddedConfig values:
access.control.allow.methods =
access.control.allow.origin =
bootstrap.servers = [localhost:9092]
header.converter = class org.apache.kafka.connect.storage.SimpleHeaderConverter
internal.key.converter = class org.apache.kafka.connect.json.JsonConverter
internal.value.converter = class org.apache.kafka.connect.json.JsonConverter
key.converter = class org.apache.kafka.connect.json.JsonConverter
listeners = null
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
offset.flush.interval.ms = 60000
offset.flush.timeout.ms = 5000
offset.storage.file.filename =
offset.storage.partitions = null
offset.storage.replication.factor = null
offset.storage.topic =
plugin.path = null
rest.advertised.host.name = null
rest.advertised.listener = null
rest.advertised.port = null
rest.host.name = null
rest.port = 8083
ssl.client.auth = none
task.shutdown.graceful.timeout.ms = 5000
value.converter = class org.apache.kafka.connect.json.JsonConverter
[2020-04-23_19-02-37_516] INFO {io.debezium.connector.common.BaseSourceTask} - offset.storage = io.siddhi.extension.io.cdc.source.listening.InMemoryOffsetBackingStore
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.server.name = localhost_5432
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.port = 5432
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - table.whitelist = SweetProductionTable
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - cdc.source.object = 1716717434
[2020-04-23_19-02-37_517] INFO {io.debezium.connector.common.BaseSourceTask} - database.hostname = localhost
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - database.password = ********
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - name = CDCWithListeningModeinsertSweetProductionStream
[2020-04-23_19-02-37_518] INFO {io.debezium.connector.common.BaseSourceTask} - server.id = 6140
[2020-04-23_19-02-37_519] INFO {io.debezium.connector.common.BaseSourceTask} - database.history = io.debezium.relational.history.FileDatabaseHistory
[2020-04-23_19-02-38_103] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - user 'user_name' connected to database 'db_name' on PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit with roles:
role 'user_name' [superuser: false, replication: true, inherit: true, create role: false, create db: false, can log in: true] (Encoded)
[2020-04-23_19-02-38_104] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - No previous offset found
[2020-04-23_19-02-38_104] INFO {io.debezium.connector.postgresql.PostgresConnectorTask} - Taking a new snapshot of the DB and streaming logical changes once the snapshot is finished...
[2020-04-23_19-02-38_105] INFO {io.debezium.util.Threads} - Requested thread factory for connector PostgresConnector, id = localhost_5432 named = records-snapshot-producer
[2020-04-23_19-02-38_105] INFO {io.debezium.util.Threads} - Requested thread factory for connector PostgresConnector, id = localhost_5432 named = records-stream-producer
[2020-04-23_19-02-38_293] INFO {io.debezium.connector.postgresql.connection.PostgresConnection} - Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLSN=null]
[2020-04-23_19-02-38_704] ERROR {io.siddhi.core.stream.input.source.Source} - Error on 'CDCWithListeningMode'. Connection to the database lost. Error while connecting at Source 'cdc' at 'insertSweetProductionStream'. Will retry in '5 sec'. (Encoded)
io.siddhi.core.exception.ConnectionUnavailableException: Connection to the database lost.
at io.siddhi.extension.io.cdc.source.CDCSource.lambda$connect$1(CDCSource.java:424)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:793)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Cannot create replication connection
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.(PostgresReplicationConnection.java:87)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.(PostgresReplicationConnection.java:38)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$ReplicationConnectionBuilder.build(PostgresReplicationConnection.java:362)
at io.debezium.connector.postgresql.PostgresTaskContext.createReplicationConnection(PostgresTaskContext.java:65)
at io.debezium.connector.postgresql.RecordsStreamProducer.(RecordsStreamProducer.java:81)
at io.debezium.connector.postgresql.RecordsSnapshotProducer.(RecordsSnapshotProducer.java:70)
at io.debezium.connector.postgresql.PostgresConnectorTask.createSnapshotProducer(PostgresConnectorTask.java:133)
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:86)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:45)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:677)
... 3 more
Caused by: io.debezium.jdbc.JdbcConnectionException: ERROR: could not access file "decoderbufs": No such file or directory
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initReplicationSlot(PostgresReplicationConnection.java:145)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.(PostgresReplicationConnection.java:79)
... 12 more
Caused by: org.postgresql.util.PSQLException: ERROR: could not access file "decoderbufs": No such file or directory
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
at org.postgresql.replication.fluent.logical.LogicalCreateSlotBuilder.make(LogicalCreateSlotBuilder.java:48)
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initReplicationSlot(PostgresReplicationConnection.java:108)
... 13 more
Debezium defaults to decoderbufs plugin - "could not access file "decoderbufs": No such file or directory".
According to this answer, the issue is due to the configuration of decoderbufs plugin.
Details
Postgres - 11.4
siddhi-cdc-io - 2.0.3
Debezium - 0.8.3
How do I configure the embedded debezium engine to use the pgoutput plugin? Will changing this configuration fix the error?
Please help me with this issue. I have not found any resources that can help me.

you either need to update the Debezium to the latest 1.1 version - this will enable you to use pgoutput plugin using plugin.name config option or you need to deploy (and maybe build) decoderbufs.so library to your PostgreSQL database.
I'd recommend the former as 0.8.3 is very old version.

I observed this behavior with PostgreSQL 12 when I tried to do CDC with pgoutput logical decoding output plug-in. It seems like even though I configured the database with pgoutput, the siddhi extension is trying to make the connection using "decoderbufs" as decoding plug-in.
When I tried configuring decoderbufs as the logical decoding output plug-in in the database level, I was able to use siddhi io extension without any issue.
It seems like for now, Siddhi io CDC only supports decoderbufs logical decoding output plug-in with PostgreSQL.

Related

PyFlink CDC Connectors Postgres failure

Trying to follow the Flink CDC Connectors Postgres tutorial using PyFlink:
https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/mysql-postgres-tutorial.html
Failing Code
ddl = """
CREATE TABLE shipments (
shipment_id INT,
order_id INT,
origin STRING,
destination STRING,
is_arrived BOOLEAN
) WITH (
'connector' = 'postgres-cdc',
'hostname' = 'localhost',
'port' = '5432',
'username' = 'postgres',
'password' = 'postgres',
'database-name' = 'postgres',
'schema-name' = 'public',
'slot.name' = 'slot2',
'table-name' = 'shipments'
);
"""
table_env.execute_sql(ddl)
table2: Table = table_env.sql_query("SELECT * FROM shipments")
table2.execute().print()
Main Stacktrace
Caused by: io.debezium.DebeziumException: Creation of replication slot failed
at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:141)
at io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:130)
at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:759)
at io.debezium.embedded.ConvertingEngineBuilder$2.run(ConvertingEngineBuilder.java:188)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "CREATE_REPLICATION_SLOT"
Position: 1
Troubleshooting
CDC needs access to Postgres' WAL, write ahead log. In order to get the WAL Postgres needs to have a replication slot. So you need to set this up in Postgres, I got that done with this command:
ALTER SYSTEM SET wal_level = logical
SELECT * FROM pg_create_logical_replication_slot('slot2', 'test_decoding')
Then restart Postgres.
CDC code that get the replication slot
jdbcConnection.getReplicationSlotState(connectorConfig.slotName(), connectorConfig.plugin().getPostgresPluginName());
called from:
https://github.com/debezium/debezium/blob/main/debezium-connector-postgres/src/main/java/io/debezium/connector/postgresql/PostgresConnectorTask.java
When that doesn't work you it will try to create the replication slot and that is where the exception is thrown.
Possible version issue
Maybe the problem is that I am running Postgres 13 and Ververica CDC might only support up to Postgres 12.

Why can't I connect to Gremlin-Server?

Abstract
I'm trying to set up a Titan/Cassandra/Gremlin-Server stack in Docker (v1.13.0). The problem I'm facing is that applications trying to connect to Gremlin-Server on the default port 8182 are reporting errors (details below).
First, here is some relevant version information:
Cassandra v2.2.8
Titan v1.0.0 (Hadoop 1)
Gremlin 3.2.3
Setup
Setup takes place in a Dockerfile in order to be reproducible. It assumes that a Cassandra container already exists, running a cassandra.yaml in which start_rpc has been set to true.
The Dockerfile is as follows:
FROM openjdk:alpine
ENV TITAN 'titan-1.0.0-hadoop1'
RUN apk update && apk add bash unzip && rm -rf /var/cache/apk/* \
&& adduser -S -s /bin/bash -D srg \
&& wget -O /tmp/$TITAN.zip http://s3.thinkaurelius.com/downloads/titan/$TITAN.zip \
&& unzip /tmp/$TITAN.zip -d /opt && ln -s /opt/$TITAN /opt/titan \
&& rm /tmp/*.zip \
&& chown -R srg /opt/$TITAN/ \
&& /opt/titan/bin/gremlin-server.sh -i org.apache.tinkerpop gremlin-python 3.2.3
COPY conf/gremlin-server/* /opt/$TITAN/conf/gremlin-server/
USER srg
WORKDIR /opt/titan
EXPOSE 8182
CMD ["bin/gremlin-server.sh", "conf/gremlin-server/srg.yaml"]
The astute reader will note that I am copying custom configuration files into the container, namely a Gremlin-Server configuration file (srg.yaml) and a titan graph properties file (srg.properties).
srg.yaml
host: localhost
port: 8182
threadPoolWorker: 1
gremlinPool: 8
scriptEvaluationTimeout: 30000
serializedResponseTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
graph: conf/gremlin-server/srg.properties
}
plugins:
- aurelius.titan
scriptEngines: {
gremlin-groovy: {
imports: [java.lang.Math],
staticImports: [java.lang.Math.PI],
scripts: [scripts/empty-sample.groovy]},
gremlin-jython: {},
gremlin-python: {},
nashorn: {
imports: [java.lang.Math],
staticImports: [java.lang.Math.PI]}}
serializers:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { useMapperFromGraph: graph }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { useMapperFromGraph: graph }}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { useMapperFromGraph: graph }}
processors:
- { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
metrics: {
consoleReporter: {enabled: true, interval: 180000},
csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
jmxReporter: {enabled: true},
slf4jReporter: {enabled: true, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}}
threadPoolBoss: 1
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false}
srg.properties
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
storage.backend=cassandrathrift
storage.hostname=cassandra # refers to the linked container
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
# Start elasticsearch inside the Titan JVM
index.search.backend=elasticsearch
index.search.directory=db/es
index.search.elasticsearch.client-only=false
index.search.elasticsearch.local-mode=true
Execution
The container is run with the following command: docker run -ti --rm=true --link test.cassandra:cassandra -p 8182:8182 titan.
Here is the log output for Gremlin-Server:
0 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
297 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Configuring Gremlin Server from conf/gremlin-server/srg.yaml
439 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics ConsoleReporter configured with report interval=180000ms
448 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics CsvReporter configured with report interval=180000ms to fileName=/tmp/gremlin-server-metrics.csv
557 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics JmxReporter configured with domain= and agentId=
561 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
1750 [main] INFO com.thinkaurelius.titan.core.util.ReflectiveConfigOptionLoader - Loaded and initialized config classes: 12 OK out of 12 attempts in PT0.148S
1972 [main] INFO com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager - Closed Thrift connection pooler.
1990 [main] INFO com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration - Generated unique-instance-id=ac1100031-ad2d5ffa52e81
2026 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Configuring index [search]
2386 [main] INFO org.elasticsearch.node - [Lunatik] version[1.5.1], pid[1], build[5e38401/2015-04-09T13:41:35Z]
2387 [main] INFO org.elasticsearch.node - [Lunatik] initializing ...
2399 [main] INFO org.elasticsearch.plugins - [Lunatik] loaded [], sites []
6471 [main] INFO org.elasticsearch.node - [Lunatik] initialized
6472 [main] INFO org.elasticsearch.node - [Lunatik] starting ...
6477 [main] INFO org.elasticsearch.transport - [Lunatik] bound_address {local[1]}, publish_address {local[1]}
6507 [main] INFO org.elasticsearch.discovery - [Lunatik] elasticsearch/u2StmRW1RsyEHw561yoNFw
6519 [elasticsearch[Lunatik][clusterService#updateTask][T#1]] INFO org.elasticsearch.cluster.service - [Lunatik] master {new [Lunatik][u2StmRW1RsyEHw561yoNFw][ad2d5ffa52e8][local[1]]{local=true}}, removed {[Lunatik][kKyL9UE-R123LLZTTrsVCw][ad2d5ffa52e8][local[1]]{local=true},}, reason: local-disco-initial_connect(master)
6908 [main] INFO org.elasticsearch.http - [Lunatik] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.3:9200]}
6909 [main] INFO org.elasticsearch.node - [Lunatik] started
6923 [elasticsearch[Lunatik][clusterService#updateTask][T#1]] INFO org.elasticsearch.gateway - [Lunatik] recovered [0] indices into cluster_state
7486 [elasticsearch[Lunatik][clusterService#updateTask][T#1]] INFO org.elasticsearch.cluster.metadata - [Lunatik] [titan] creating index, cause [api], templates [], shards [5]/[1], mappings []
8075 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Initiated backend operations thread pool of size 4
8241 [main] INFO com.thinkaurelius.titan.diskstorage.Backend - Configuring total store cache size: 94787290
8641 [main] INFO com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog - Loaded unidentified ReadMarker start time 2017-01-21T16:31:28.750Z into com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller#3520958b
8642 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] was successfully configured via [conf/gremlin-server/srg.properties].
8643 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
14187 [main] INFO com.jcabi.manifests.Manifests - 108 attributes loaded from 264 stream(s) in 185ms, 108 saved, 3371 ignored: ["Agent-Class", "Ant-Version", "Archiver-Version", "Bnd-LastModified", "Boot-Class-Path", "Build-Date", "Build-Host", "Build-Id", "Build-Java-Version", "Build-Jdk", "Build-Job", "Build-Number", "Build-Time", "Build-Timestamp", "Build-Version", "Built-At", "Built-By", "Built-OS", "Built-On", "Built-Status", "Bundle-ActivationPolicy", "Bundle-Activator", "Bundle-BuddyPolicy", "Bundle-Category", "Bundle-ClassPath", "Bundle-Classpath", "Bundle-Copyright", "Bundle-Description", "Bundle-DocURL", "Bundle-License", "Bundle-Localization", "Bundle-ManifestVersion", "Bundle-Name", "Bundle-NativeCode", "Bundle-RequiredExecutionEnvironment", "Bundle-SymbolicName", "Bundle-Vendor", "Bundle-Version", "Can-Redefine-Classes", "Change", "Class-Path", "Created-By", "DynamicImport-Package", "Eclipse-AutoStart", "Eclipse-BuddyPolicy", "Eclipse-SourceReferences", "Embed-Dependency", "Embedded-Artifacts", "Export-Package", "Extension-Name", "Extension-name", "Fragment-Host", "Git-Commit-Branch", "Git-Commit-Date", "Git-Commit-Hash", "Git-Committer-Email", "Git-Committer-Name", "Gradle-Version", "Gremlin-Lib-Paths", "Gremlin-Plugin-Dependencies", "Gremlin-Plugin-Paths", "Ignore-Package", "Implementation-Build", "Implementation-Build-Date", "Implementation-Title", "Implementation-URL", "Implementation-Vendor", "Implementation-Vendor-Id", "Implementation-Version", "Import-Package", "Include-Resource", "JCabi-Build", "JCabi-Date", "JCabi-Version", "Java-Vendor", "Java-Version", "Main-Class", "Main-class", "Manifest-Version", "Maven-Version", "Module-Email", "Module-Origin", "Module-Owner", "Module-Source", "Originally-Created-By", "Os-Arch", "Os-Name", "Os-Version", "Package", "Premain-Class", "Private-Package", "Require-Bundle", "Require-Capability", "Scm-Connection", "Scm-Revision", "Scm-Url", "Specification-Title", "Specification-Vendor", "Specification-Version", "Tool", "X-Compile-Source-JDK", "X-Compile-Target-JDK", "hash", "implementation-version", "mode", "package", "url", "version"]
14842 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-jython ScriptEngine
15540 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded nashorn ScriptEngine
16076 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-python ScriptEngine
16553 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines - Loaded gremlin-groovy ScriptEngine
17410 [main] INFO org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor - Initialized gremlin-groovy ScriptEngine with scripts/empty-sample.groovy
17410 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
17419 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardtitangraph[cassandrathrift:[cassandra]], standard]
17565 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
17566 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
17808 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
17811 [main] INFO org.apache.tinkerpop.gremlin.server.AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0
17958 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
17959 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182.
1/21/17 4:34:20 PM =============================================================
-- Meters ----------------------------------------------------------------------
org.apache.tinkerpop.gremlin.server.GremlinServer.errors
count = 0
mean rate = 0.00 events/second
1-minute rate = 0.00 events/second
5-minute rate = 0.00 events/second
15-minute rate = 0.00 events/second
180564 [metrics-logger-reporter-thread-1] INFO org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics - type=METER, name=org.apache.tinkerpop.gremlin.server.GremlinServer.errors, count=0, mean_rate=0.0, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second
Symptoms
So far, everything appears to be working as intended. The logs indicate that I am able to load srg.properties and bind the data structure to a variable called graph.
The problem appears when I try to connect to the Gremlin-Server instance over the exported port 8182, for example using gremlin-python:
# executed via python 3.6.0 on the host machine, i.e. not inside of Docker
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','graph'))
produces the following exception ...
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-10-59ad504f29b4> in <module>()
----> 1 g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/','g'))
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/gremlin_python/driver/driver_remote_connection.py in __init__(self, url, traversal_source, username, password, loop, graphson_reader, graphson_writer)
41 self._password = password
42 if loop is None: self._loop = ioloop.IOLoop.current()
---> 43 self._websocket = self._loop.run_sync(lambda: websocket.websocket_connect(self.url))
44 self._graphson_reader = graphson_reader or GraphSONReader()
45 self._graphson_writer = graphson_writer or GraphSONWriter()
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tornado/ioloop.py in run_sync(self, func, timeout)
455 if not future_cell[0].done():
456 raise TimeoutError('Operation timed out after %s seconds' % timeout)
--> 457 return future_cell[0].result()
458
459 def time(self):
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tornado/concurrent.py in result(self, timeout)
235 return self._result
236 if self._exc_info is not None:
--> 237 raise_exc_info(self._exc_info)
238 self._check_done()
239 return self._result
/Users/lthibault/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tornado/util.py in raise_exc_info(exc_info)
HTTPError: HTTP 599: Stream closed
Suspecting a problem specific to this library:
1) attempt to connect to the websocket port with nc
$ nc -z -v localhost 8182
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src ::1 port 58627
dst ::1 port 8182
rank info not available
TCP aux info available
Connection to localhost port 8182 [tcp/*] succeeded!
2) attempt to connect to Gremlin-Server using a different client library, namely go-gremlin
Test case:
package main
import (
"fmt"
"log"
"github.com/go-gremlin/gremlin"
)
func main() {
if err := gremlin.NewCluster("ws://localhost:8182/gremlin"); err != nil {
log.Fatal(err)
}
data, err := gremlin.Query(`graph.V()`).Exec()
if err != nil {
log.Fatalf("Query error: %s", err)
}
fmt.Println(string(data))
}
Output:
$ go run cmd/test/main.go
2017/01/21 14:47:42 Query error: unexpected EOF
exit status 1
Current Conclusions & Questions
From the previous tests, I conclude that this is an application-level problem (i.e. a problem on the websocket or ws protocol level, not a problem in the host or container networking stack). Indeed, nc reports that the socket connection is successful, but in both the Python and Go client libraries ostensibly complain of an inappropriate (empty) response from the server.
I have tried removing the /gremlin path from the websocket URL both in gremlin-python and in go-gremlin, to no avail.
My question is: where do I go from here? Any suggestions or diagnostic paths would be most appreciated!
The main problem is that the host in your Gremlin Server configuration is set to the default which is localhost. This will only allow connections from the server itself. You need to change the value to an external IP of the server or 0.0.0.0.
The other issue is that gremlin-python server plugin was made available with Apache TinkerPop 3.2.2. Titan 1.0.0 uses TinkerPop 3.0.1. I dobut that the gremlin-python 3.2.3 plugin will work with Titan 1.0.0.
Update: Consider using JanusGraph 0.1.1 which uses TinkerPop 3.2.3. JanusGraph was forked from Titan, so the code is basically the same with updated dependencies.

Spark + Flume : "Unable to create Rpc client"

I am trying to connect a scala spark-shell with a flume agent.
I launch the shell :
 ./bin/spark-shell
--jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-sink_2.10-1.6.0.jar
--jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume_2.10-1.6.0.jar
--jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-assembly_2.10-1.6.0.jar
And launch a scala script to listen on port 10000 of localhost :
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
 
 
val host = "localhost"
val port = 10000
val batchInterval = Milliseconds(5000)
 
// Create the context and set the batch size
val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
val ssc = new StreamingContext(sc, batchInterval)
 
// Create a flume stream that polls the Spark Sink running in a Flume agent
val stream = FlumeUtils.createPollingStream(ssc, host, port)
 
// Print out the count of events received from this server in each batch
stream.count().map(cnt => "Received " + cnt + " flume events." ).print()
 
ssc.start()
ssc.awaitTermination()
 
Then I configure and start a shell agent :
1) configuration : I make a tail - f on a file appended by another running script (I think it doesn't matter to detail that here)
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = spark
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /Users/romain/Informatique/notebooks/spark_scala/velib/logs/trajets.csv
agent1.sources.source1.channels = channel1
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 2000000
agent1.channels.channel1.transactionCapacity = 1000000
agent1.sinks = avroSink
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.channel = channel1
agent1.sinks.avroSink.hostname = localhost
agent1.sinks.avroSink.port = 10000
2) start :
./bin/flume-ng agent --conf conf --conf-file ./conf/avro_velib.conf --name agent1 -Dflume.root.logger=INFO,console
And then things go ok till an error :
2017-02-01 15:09:55,688 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink avroSink
2017-02-01 15:09:55,688 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:289)] Starting RpcSink avroSink { host: localhost, port: 10000 }...
2017-02-01 15:09:55,688 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source source1
2017-02-01 15:09:55,689 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:169)] Exec source starting with command:tail -f /Users/romain/Informatique/notebooks/spark_scala/velib/logs/trajets.csv
2017-02-01 15:09:55,689 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: avroSink: Successfully registered new MBean.
2017-02-01 15:09:55,689 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SINK, name: avroSink started
2017-02-01 15:09:55,689 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:206)] Rpc sink avroSink: Building RpcClient with hostname: localhost, port: 10000
2017-02-01 15:09:55,689 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:126)] Attempting to create Avro Rpc client.
2017-02-01 15:09:55,691 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SOURCE, name: source1: Successfully registered new MBean.
2017-02-01 15:09:55,692 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: source1 started
2017-02-01 15:09:55,710 (lifecycleSupervisor-1-1) [WARN - org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:634)] Using default maxIOWorkers
2017-02-01 15:10:15,802 (lifecycleSupervisor-1-1) [WARN - org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:294)] Unable to create Rpc client using hostname: localhost, port: 10000
org.apache.flume.FlumeException: NettyAvroRpcClient { host: localhost, port: 10000 }: RPC connection error
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:182)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:121)
at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:638)
at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:89)
at org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:127)
at org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:211)
at org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:292)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:10000
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:203)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:152)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:168)
... 16 more
Any idea welcom
You should start "listenner"(spark app in your case) first and then launch "writer" (flume-ng)
Instead of localhost use your machine name i.e (machine name used in spark conf) in both scala and avroconf.
agent1.sinks.avroSink.hostname = <Machine name>
agent1.sinks.avroSink.port = 10000

Cannot connect to Cassandra using Astyanax

I am running latest version of cassandra on my mac inside docker. I have installed cassandra dev center on my mac and I can connect to cassandra using devcenter and I can easily query my tables.
http://i.imgur.com/48MztBM.png
http://i.imgur.com/jPKblly.png
I can see that all required ports for cassandra are also open. (as shown below)
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e698e61e3bac cassandra:latest "/docker-entrypoint.s" 9
minutes ago Up 9 minutes 0.0.0.0:7000->7000/tcp, 0.0.0.0:7199-
>7199/tcp, 0.0.0.0:9042->9042/tcp, 0.0.0.0:9160->9160/tcp, 7001/tcp cassandra
I also logged in, into my cassandra container and executed the following commands
root#e698e61e3bac:/# nodetool -h 127.0.0.1 enablethrift
root#e698e61e3bac:/# nodetool -h 127.0.0.1 statusthrift
running
However when I run the code below
val configImpl = new AstyanaxConfigurationImpl()
configImpl.setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
configImpl.setCqlVersion("3.4.2")
configImpl.setTargetCassandraVersion("3.7")
val poolConfig = new ConnectionPoolConfigurationImpl("MyConnectionPool")
poolConfig.setPort(9160)
poolConfig.setMaxConnsPerHost(1)
poolConfig.setSeeds("192.168.1.169")
val context = new AstyanaxContext.Builder()
.forCluster("localhost")
.forKeyspace("movielens_small")
.withAstyanaxConfiguration(configImpl)
.withConnectionPoolConfiguration(poolConfig)
.withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
.buildKeyspace(ThriftFamilyFactory.getInstance())
context.start()
val keyspace = context.getClient()
val cf = new ColumnFamily[UUID, String]("cf", UUIDSerializer.get, StringSerializer.get())
val result = keyspace.prepareQuery(cf).withCql("select name from movies").execute()
val data = result.getResult.getRows()
for {
row <- data
col <- row.getColumns
} {
println(col)
}
context.shutdown()
I get the following error
07:32:10,606 INFO ConnectionPoolMBeanManager:53 - Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=MyConnectionPool,ServiceType=connectionpool
07:32:10,625 INFO CountingConnectionPoolMonitor:194 - AddHost: 192.168.1.169
07:32:10,748 INFO CountingConnectionPoolMonitor:194 - AddHost: 172.17.0.2
07:32:10,748 INFO CountingConnectionPoolMonitor:205 - RemoveHost: 192.168.1.169
[error] (run-main-0) com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=172.17.0.2(172.17.0.2):9160, latency=2003(2003), attempts=1]Timed out waiting for connection
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=172.17.0.2(172.17.0.2):9160, latency=2003(2003), attempts=1]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
at com.netflix.astyanax.thrift.AbstractThriftCqlQuery.execute(AbstractThriftCqlQuery.java:41)
at com.abhi.CassandraScanner$.delayedEndpoint$com$abhi$CassandraScanner$1(CassandraScanner.scala:41)
at com.abhi.CassandraScanner$delayedInit$body.apply(CassandraScanner.scala:19)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:378)
at scala.App.main(App.scala:76)
at scala.App.main$(App.scala:74)
I was able to solve the problem. The problem was with the NodeDiscoveryType. I changed my code to NodeDiscoveryType.NONE and it worked perfectly. Although I still don't understand why its mandatory for the NodeDiscoveryType to be NULL and why it won't work with RING_DESCRIBE.
Hopefully someone will elaborate further.

How do I resolve a MongoDB timeout error when connecting via the Scala Play! framework?

I am connecting to MongoDB while using the Scala Play! framework. I end up getting this timeout error:
! #6j672dke5 - Internal server error, for (GET) [/accounts] ->
play.api.Application$$anon$1: Execution exception[[MongoTimeoutException: Timed out while waiting to connect after 10000 ms]]
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10-2.2.1.jar:2.2.1]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10-2.2.1.jar:2.2.1]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$12$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:165) [play_2.10-2.2.1.jar:2.2.1]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$12$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:162) [play_2.10-2.2.1.jar:2.2.1]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) [scala-library-2.10.4.jar:na]
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185) [scala-library-2.10.4.jar:na]
Caused by: com.mongodb.MongoTimeoutException: Timed out while waiting to connect after 10000 ms
at com.mongodb.BaseCluster.getDescription(BaseCluster.java:131) ~[mongo-java-driver-2.12.3.jar:na]
at com.mongodb.DBTCPConnector.getClusterDescription(DBTCPConnector.java:396) ~[mongo-java-driver-2.12.3.jar:na]
at com.mongodb.DBTCPConnector.getType(DBTCPConnector.java:569) ~[mongo-java-driver-2.12.3.jar:na]
at com.mongodb.DBTCPConnector.isMongosConnection(DBTCPConnector.java:370) ~[mongo-java-driver-2.12.3.jar:na]
at com.mongodb.Mongo.isMongosConnection(Mongo.java:645) ~[mongo-java-driver-2.12.3.jar:na]
at com.mongodb.DBCursor._check(DBCursor.java:454) ~[mongo-java-driver-2.12.3.jar:na]
Here is my Scala code for connecting to the database:
//models.scala
package models.mongodb
//imports
package object mongoContext {
//context stuff
val client = MongoClient(current.configuration.getString("mongo.host").toString())
val database = client(current.configuration.getString("mongo.database").toString())
}
Here is the actual model that is making the connection:
//google.scala
package models.mongodb
//imports
case class Account(
id: ObjectId = new ObjectId,
name: String
)
object AccountDAO extends SalatDAO[Account, ObjectId](
collection = mongoContext.database("accounts")
)
object Account {
def all(): List[Account] = AccountDAO.find(MongoDBObject.empty).toList
}
Here's the Play! framework MongoDB conf information:
# application.conf
# mongodb connection details
mongo.host="localhost"
mongo.port=27017
mongo.database="advanced"
Mongodb is running on my local machine. I can connect to it by typing mongo at the terminal window. Here's the relevant part of the conf file:
# mongod.conf
# Where to store the data.
# Note: if you run mongodb as a non-root user (recommended) you may
# need to create and set permissions for this directory manually,
# e.g., if the parent directory isn't mutable by the mongodb user.
dbpath=/var/lib/mongodb
#where to log
logpath=/var/log/mongodb/mongod.log
logappend=true
#port = 27017
# Listen to local interface only. Comment out to listen on all interfaces.
#bind_ip = 127.0.0.1
So what's causing this timeout error and how do I fix it? Thanks!
I figured out that I needed to change:
val client = MongoClient(current.configuration.getString("mongo.host").toString())
val database = client(current.configuration.getString("mongo.database").toString())
to:
val client = MongoClient(conf.getString("mongo.host"))
val database = client(conf.getString("mongo.database"))