HADOOPFS - Could not verify the base directory in StreamSets

I am having issues running a pipeline in StreamSets. I see the following error:
HADOOPFS_44 - Could not verify the base directory: 'java.net.ConnectException: Call From SDC/...... to ......failed on connection exception: java.net.ConnectException: Connection refused;
For more details see:
https://cwiki.apache.org/confluence/display/HADOOP2/ConnectionRefused

You should follow the steps given in the Hadoop wiki. The machine running StreamSets Data Collector cannot connect to the Hadoop cluster for some reason.
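Before working through the wiki checklist, it can help to confirm from the Data Collector machine whether the Hadoop host and port accept TCP connections at all. The following is a minimal sketch of such a check, not part of StreamSets or Hadoop; the function name `can_connect` is made up for illustration, and you need to supply the host and port from your own Hadoop FS URI.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name resolution failures.
        return False
```

For example, calling can_connect with the NameNode host and port from your Hadoop FS URI and getting False back reproduces the "Connection refused" symptom outside of the pipeline, which points at the network, the port, or the service rather than at the pipeline configuration.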


"SchemaRegistryException: Failed to get Kafka cluster ID" for LOCAL setup

I downloaded the .tz (I am on a Mac) for Confluent version 7.0.0 from the official Confluent site and was following the setup for LOCAL (1 node). Kafka and ZooKeeper start fine, but the Schema Registry keeps failing (note: I am behind a corporate VPN).
The exception message in the Schema Registry logs is:
[2021-11-04 00:34:22,492] INFO Logging initialized @1403ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2021-11-04 00:34:22,543] INFO Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
[2021-11-04 00:34:22,614] INFO Adding listener: http://0.0.0.0:8081 (io.confluent.rest.ApplicationServer)
[2021-11-04 00:35:23,007] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryException: Failed to get Kafka cluster ID
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1488)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.<init>(KafkaSchemaRegistry.java:166)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:71)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
at io.confluent.rest.Application.configureHandler(Application.java:271)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:245)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:44)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1486)
... 7 more
My schema-registry.properties file has the bootstrap URL set to:
kafkastore.bootstrap.servers=PLAINTEXT://localhost:9092
I saw some posts saying that the Schema Registry may be unable to connect to the Kafka cluster URL because of the localhost address. I am fairly new to Kafka and basically just need this local setup to run a Git repo that uses some Kafka topics, so my questions are:
How can I fix this? (I am behind a corporate VPN, but I figured this shouldn't affect it.)
Do I even need the Schema Registry?
I ended up just going with the Docker local setup instead. The only change I had to make to the Docker Compose YAML was to change the schema-registry port (I changed it to 8082 or 8084, I don't remember exactly, just a port that is not used by any other Confluent service listed in docker-compose.yaml). My local setup is working fine now.
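Picking an unused port, as described above, can be checked before editing docker-compose.yaml. This is a rough sketch using only the Python standard library; the helper names are invented for illustration, and the candidate ports are simply the two mentioned above. It only tells you whether something on the host is bound to the port right now, not whether another service in the compose file will claim it later.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically "address already in use".
            return False


def first_free_port(candidates):
    """Return the first candidate port that is free, or None if all are taken."""
    for port in candidates:
        if is_port_free(port):
            return port
    return None
```

Calling first_free_port([8082, 8084]) would return whichever of those two ports is not currently in use on the host, or None if both are taken.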

Apache Beam Spark Portable Runner

I am running a sample pipeline with the following command:
python "SaiStudy - Apache-Beam-Spark.py" --runner=PortableRunner --job_endpoint=192.168.99.102:8099
Spark is running in a Docker container, and I can see that the JobService is running on port 8099.
I am getting the following error:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1603539936.536000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4090,"referenced_errors":[{"created":"@1603539936.536000000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":394,"grpc_status":14}]}"
When I curl ip:port, I see the following error in the Docker logs:
Oct 24, 2020 11:34:50 AM org.apache.beam.vendor.grpc.v1p26p0.io.grpc.netty.NettyServerTransport notifyTerminated
INFO: Transport failed
org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.http2.Http2Exception: Unexpected HTTP/1.x request: GET /
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:302)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:505)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:444)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at org.apache.beam.vendor.grpc.v1p26p0.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Help Please.
Please find instructions here on how to set up the PortableRunner for Spark:
https://beam.apache.org/documentation/runners/spark/
Basically, you need to set up an additional Docker container (as described there), which acts as the runner between Beam (in any language) and Spark.
You connect Beam to the runner, and the runner to Spark.
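As a rough illustration of the "connect Beam to the runner" half, the sketch below builds the pipeline flags in Python and passes them to a tiny pipeline. It assumes the job server from the linked page is already listening on the endpoint you pass in. The helper names are made up for this example, and `--environment_type=LOOPBACK` is an assumption suited to local testing rather than something taken from the question.

```python
def portable_runner_args(job_endpoint: str, environment_type: str = "LOOPBACK"):
    """Build the command-line flags for submitting to a Beam job server."""
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]


def run_example(job_endpoint: str) -> None:
    """Submit a trivial pipeline to the job server at job_endpoint."""
    # Imported here so the flag helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(portable_runner_args(job_endpoint))
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["hello", "portable", "runner"])
            | "Upper" >> beam.Map(str.upper)
            | "Print" >> beam.Map(print)
        )
```

With the setup from the question, the call would be run_example("192.168.99.102:8099"), provided that address is where the job server container is actually reachable from the machine running Python.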

localstack v0.11.5 and kcl v1.13.3

I am using KCL v1.13.3 with the latest localstack, v0.11.5.
The KCL client now uses the edge service port 4566.
Are the KCL and localstack versions compatible?
I keep getting the following error:
com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException (AmazonHttpClient.java:1163)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper (AmazonHttpClient.java:1109)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute (AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer (AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute (AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500 (AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute (AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute (AmazonHttpClient.java:520)
at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke (AmazonKinesisClient.java:2782)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:141)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse (AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader (DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader (CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse (HttpRequestExecutor.java:273)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse (SdkHttpRequestExecutor.java:82)
at org.apache.http.protocol.HttpRequestExecutor.execute (HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute (MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute (ProtocolExec.java:185)
at org.apache.http.impl.client.InternalHttpClient.doExecute (InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute (SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest (AmazonHttpClient.java:1285)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper (AmazonHttpClient.java:1101)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute (AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer (AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute (AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500 (AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute (AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute (AmazonHttpClient.java:520)
Did you already confirm that the localstack endpoint is functional on that port? For example:
aws kinesis list-streams --endpoint-url http://localhost:4566
(If you do not have the AWS CLI installed and do not want to install it, there's always the option of using Docker.)
Moreover, it might be helpful to share how you are bootstrapping the AWS client. It should be something along the lines of:
AwsClientBuilder.EndpointConfiguration endpointConfig =
    new AwsClientBuilder.EndpointConfiguration("http://localhost:4566", Regions.EU_WEST_1.getName());
return AmazonDynamoDBClientBuilder.standard()
    .withEndpointConfiguration(endpointConfig)
    .build();
Note that if you are running your KCL app inside another Docker container, you might want to change "http://localhost:4566" to "http://localstack:4566".
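One way to avoid hard-coding the host name is to read it from the environment, so the same code works on the host and inside a container. The sketch below shows the idea in Python for brevity. The variable name `LOCALSTACK_ENDPOINT_HOST` is hypothetical (it is not a setting read by localstack or the AWS SDK), and the port 4566 is the edge port mentioned in the question.

```python
import os


def localstack_endpoint(default_host: str = "localhost", port: int = 4566) -> str:
    """Return the localstack edge endpoint URL for the current environment."""
    # LOCALSTACK_ENDPOINT_HOST is a hypothetical variable name for this sketch;
    # set it to "localstack" when the app runs in a container on the same network.
    host = os.environ.get("LOCALSTACK_ENDPOINT_HOST", default_host)
    return f"http://{host}:{port}"
```

The resulting string is what would be passed as the endpoint in the client configuration shown above.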

Failed to open native connection to Cassandra from jar produced by intellij

I've developed a Spark-based application in IntelliJ that reads data from Kafka and saves it in a Cassandra DB.
Connection code in Scala:
val cluster = Cluster.builder().addContactPoint("192.168.0.253").withPort(9042).build();
val session = cluster.connect()
The code works fine when I run it from IntelliJ, but I get this error when I run it from the jar on the command line:
Exception in thread "main" java.io.IOException:
Failed to open native connection to Cassandra at {192.168.0.253}:9042 at .... ....
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed
(tried: /192.168.0.253:9042 (com.datastax.driver.core.exceptions.TransportException:
[/192.168.0.253] Error writing)) at
com.datastax.driver.core.ControlConnection.reconnectInternal(
ControlConnection.java:233) at
com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:403)
at com.datastax.spark.connector.cql.CassandraConnector$.
com$datastax$spark$connector$cql$CassandraConnector$$createSession(
CassandraConnector.scala:155) ... 13 more
I produced the jar file from IntelliJ and built it with dependencies, using the [copy to the output directory and link via manifest] option.
cassandra.yaml file:
#Whether to start the native transport server.
start_native_transport: true
#port for the CQL native transport to listen for clients on
native_transport_port: 9042
Why is this error raised, and how can I fix it?
Thanks in advance

"Unable to associated Elastic IP with cluster" in Eclipse Plugin Tutorial

I am currently evaluating AWS for my company and was following this tutorial on the web:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2241
However, I get the following error during startup of the server instance:
Unable to associated Elastic IP with cluster: Unable to detect that the Elastic IP was correctly associated.
java.lang.Exception: Unable to detect that the Elastic IP was correctly associated
at com.amazonaws.ec2.cluster.Cluster.associateElasticIp(Cluster.java:802)
at com.amazonaws.ec2.cluster.Cluster.start(Cluster.java:311)
at com.amazonaws.eclipse.wtp.ElasticClusterBehavior.launch(ElasticClusterBehavior.java:611)
at com.amazonaws.eclipse.wtp.Ec2LaunchConfigurationDelegate.launch(Ec2LaunchConfigurationDelegate.java:47)
at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:853)
at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:703)
at org.eclipse.debug.internal.core.LaunchConfiguration.launch(LaunchConfiguration.java:696)
at org.eclipse.wst.server.core.internal.Server.startImpl2(Server.java:3051)
at org.eclipse.wst.server.core.internal.Server.startImpl(Server.java:3001)
at org.eclipse.wst.server.core.internal.Server$StartJob.run(Server.java:300)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Then, after a while, another error occurs:
Unable to publish server configuration files: Unable to copy remote file after trying 4 times local file: 'XXXXXXXX/XXX.zip'
Results from first attempt:
Unexpected exception: java.net.ConnectException: Connection timed out: connect
root cause: java.net.ConnectException: Connection timed out: connect
at com.amazonaws.eclipse.ec2.RemoteCommandUtils.copyRemoteFile(RemoteCommandUtils.java:128)
at com.amazonaws.eclipse.wtp.tomcat.Ec2TomcatServer.publishServerConfiguration(Ec2TomcatServer.java:172)
at com.amazonaws.ec2.cluster.Cluster.publishServerConfiguration(Cluster.java:369)
at com.amazonaws.eclipse.wtp.ElasticClusterBehavior.publishServer(ElasticClusterBehavior.java:538)
at org.eclipse.wst.server.core.model.ServerBehaviourDelegate.publish(ServerBehaviourDelegate.java:866)
at org.eclipse.wst.server.core.model.ServerBehaviourDelegate.publish(ServerBehaviourDelegate.java:708)
at org.eclipse.wst.server.core.internal.Server.publishImpl(Server.java:2731)
at org.eclipse.wst.server.core.internal.Server$PublishJob.run(Server.java:278)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Can anyone point me to what I'm doing wrong?
I followed the tutorials and the video tutorials on youtube exactly.
Best Regards
~Jeffrey
The problem has been resolved: it was caused by the Elastic IP being assigned to the server on creation.
Eclipse is unable to connect to the server if the Elastic IP is already in use.