vertica kafka scheduler stops processing messages - apache-kafka

Could you please help me? I have a Vertica scheduler that reads data from an Apache Kafka topic. Periodically the scheduler stops processing messages from the topic: there are no errors in the scheduler log, and the scheduler process itself keeps running, but it no longer processes messages and cannot be stopped correctly; the only option is to kill the process. What could this be connected with, and where else can I look to diagnose the problem?
Below are the last entries from the scheduler log before the problem occurred:
2022-07-12 05:30:20.986 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2022-07-12 05:30:20.986
2022-07-12 05:30:48.837 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 1 [INFO] Lane Worker 1 waiting for batch...
2022-07-12 05:30:48.837 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Sleeping for 2149 milliseconds until 2022-07-12 05:30:50.986. Started frame # 2022-07-12 05:30:20.986.
2022-07-12 05:30:51.018 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2022-07-12 05:30:51.018
2022-07-12 05:31:21.431 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 1 [INFO] Lane Worker 1 waiting for batch...
2022-07-12 05:31:21.456 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2022-07-12 05:31:21.456
2022-07-12 05:31:30.111 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 1 [INFO] Lane Worker 1 waiting for batch...
2022-07-12 05:31:30.111 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Sleeping for 21345 milliseconds until 2022-07-12 05:31:51.456. Started frame # 2022-07-12 05:31:21.456.
2022-07-12 05:31:51.480 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2022-07-12 05:31:51.48
2022-07-12 05:32:13.280 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 1 [INFO] Lane Worker 1 waiting for batch...
2022-07-12 05:32:13.281 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Sleeping for 8200 milliseconds until 2022-07-12 05:32:21.48. Started frame # 2022-07-12 05:31:51.48.
2022-07-12 05:32:21.505 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2022-07-12 05:32:21.505
2022-07-12 05:35:19.932 com.vertica.solutions.kafka.scheduler.config.ConfigurationRefresher::Main_cfg_refresh [INFO] Refreshing Scheduler (refresh interval reached).
2022-07-12 05:35:19.932 com.vertica.solutions.kafka.scheduler.config.ConfigurationRefresher::Main_cfg_refresh [INFO] refresh trial 0
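Since the log simply goes quiet with no errors, a good next step is to find out where the scheduler's JVM is actually stuck. Below is a minimal sketch, assuming a JDK (with the jstack tool) is installed on the scheduler host and you know the scheduler's PID; it just captures a thread dump of the hung process so you can see what the Main and Lane Worker threads are doing:

import java.io.IOException;

// Capture a thread dump of the hung scheduler JVM with jstack.
public class DumpSchedulerThreads {
    public static void main(String[] args) throws IOException, InterruptedException {
        String pid = args[0]; // PID of the scheduler process, e.g. found via jps -l
        Process p = new ProcessBuilder("jstack", "-l", pid)
                .inheritIO()  // print the dump to this program's stdout
                .start();
        p.waitFor();
        // In the dump, check whether the Main / Lane Worker threads are
        // RUNNABLE inside java.net.SocketInputStream.socketRead0 (blocked
        // on a network read with no timeout) or WAITING on an internal queue.
    }
}

A thread stuck forever in socketRead0 against the Vertica or Kafka host would also explain why the process cannot be stopped cleanly; in that case, idle-connection teardown by a firewall and missing socket timeouts on the JDBC/Kafka connections are worth reviewing.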

Related

Play: Restart on code change takes a long time shutting down and starting the HikariCP pool

Waiting for the server to restart when working with Play cost us a lot of time.
One thing I see in the log is that shutting down and starting the HikariCP pools takes a lot of time (> 40 seconds).
Here is the log:
2019-10-31 09:11:47,327 [info] application - Shutting down connection pool.
2019-10-31 09:11:47,328 [info] c.z.h.HikariDataSource - HikariPool-58 - Shutdown initiated...
2019-10-31 09:11:53,629 [info] c.z.h.HikariDataSource - HikariPool-58 - Shutdown completed.
2019-10-31 09:11:53,629 [info] application - Shutting down connection pool.
2019-10-31 09:11:53,629 [info] c.z.h.HikariDataSource - HikariPool-59 - Shutdown initiated...
2019-10-31 09:11:53,636 [info] c.z.h.HikariDataSource - HikariPool-59 - Shutdown completed.
2019-10-31 09:11:53,636 [info] application - Shutting down connection pool.
2019-10-31 09:11:53,636 [info] c.z.h.HikariDataSource - HikariPool-60 - Shutdown initiated...
2019-10-31 09:11:53,640 [info] c.z.h.HikariDataSource - HikariPool-60 - Shutdown completed.
....
2019-10-31 09:12:26,454 [info] p.a.d.DefaultDBApi - Database [amseewen] initialized at jdbc:postgresql://localhost:5432/bpf?currentSchema=amseewen
2019-10-31 09:12:26,454 [info] application - Creating Pool for datasource 'amseewen'
2019-10-31 09:12:26,454 [info] c.z.h.HikariDataSource - HikariPool-68 - Starting...
2019-10-31 09:12:26,455 [info] c.z.h.HikariDataSource - HikariPool-68 - Start completed.
2019-10-31 09:12:26,455 [info] p.a.d.DefaultDBApi - Database [companyOds] initialized at jdbc:sqlserver://localhost:1433;databaseName=companyOds
2019-10-31 09:12:26,455 [info] application - Creating Pool for datasource 'companyOds'
2019-10-31 09:12:26,455 [info] c.z.h.HikariDataSource - HikariPool-69 - Starting...
2019-10-31 09:12:26,456 [info] c.z.h.HikariDataSource - HikariPool-69 - Start completed.
2019-10-31 09:12:26,457 [info] p.a.d.DefaultDBApi - Database [company] initialized at jdbc:oracle:thin:@castor.olymp:1521:citrin
2019-10-31 09:12:26,457 [info] application - Creating Pool for datasource 'company'
2019-10-31 09:12:26,457 [info] c.z.h.HikariDataSource - HikariPool-70 - Starting...
2019-10-31 09:12:26,458 [info] c.z.h.HikariDataSource - HikariPool-70 - Start completed.
2019-10-31 09:12:26,458 [info] p.a.d.DefaultDBApi - Database [amseewen] initialized at jdbc:postgresql://localhost:5432/bpf?currentSchema=amseewen
2019-10-31 09:12:26,458 [info] application - Creating Pool for datasource 'amseewen'
2019-10-31 09:12:26,458 [info] c.z.h.HikariDataSource - HikariPool-71 - Starting...
2019-10-31 09:12:26,459 [info] c.z.h.HikariDataSource - HikariPool-71 - Start completed.
2019-10-31 09:12:26,459 [info] p.a.d.DefaultDBApi - Database [companyOds] initialized at jdbc:sqlserver://localhost:1433;databaseName=companyOds
2019-10-31 09:12:26,459 [info] application - Creating Pool for datasource 'companyOds'
2019-10-31 09:12:26,459 [info] c.z.h.HikariDataSource - HikariPool-72 - Starting...
2019-10-31 09:12:26,459 [info] c.z.h.HikariDataSource - HikariPool-72 - Start completed.
Is there a way to shorten this time?
Updates
I use the Play integration of IntelliJ. The build tool is sbt.
Here is the configuration:
sbt 1.2.8
Thread Pools
We use the default thread pool for the application. For database access we use:
database.dispatcher {
  executor = "thread-pool-executor"
  throughput = 1
  thread-pool-executor {
    fixed-pool-size = 55 # db conn pool (50) + number of cores (4) + housekeeping (1)
  }
}
OK, with the help of billoneil on the HikariCP GitHub page and the suggestions of @Issilva, I could figure out the problem:
The problem is datasources whose database is not reachable (during development). We had configured the application so that it also starts when the database is not reachable (initializationFailTimeout = -1).
So there are 2 problems when shutting down:
The pools are shut down sequentially.
A pool that has no connection takes 10 seconds to shut down.
The suggested solution is not to initialise the datasources that cannot be reached. Apart from a strange exception, the shutdown time problem is solved (down to milliseconds).
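For reference, here is a minimal sketch of the setting mentioned above, using the plain HikariCP Java API (credentials and pool usage are placeholders; in a Play application the same value is set through the datasource configuration). initializationFailTimeout = -1 is what allows a pool to start without a reachable database, and those "empty" pools were the ones taking ~10 seconds each to shut down:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static void main(String[] args) {
        HikariConfig cfg = new HikariConfig();
        // JDBC URL taken from the log above; credentials are placeholders.
        cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/bpf?currentSchema=amseewen");
        cfg.setUsername("user");
        cfg.setPassword("secret");
        // -1 = start the pool even if the database is unreachable.
        // Convenient in development, but each such pool later adds
        // ~10 seconds to the sequential shutdown described above.
        cfg.setInitializationFailTimeout(-1);

        try (HikariDataSource ds = new HikariDataSource(cfg)) {
            // ds.getConnection() ... normal pool usage goes here
        }
    }
}

The fix described above goes the other way: detect the unreachable datasources up front and do not create pools for them at all, so shutdown no longer waits on them.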

Cluster running on a single machine eats too much space in /dev/shm

I am running the example provided by official Akka: https://github.com/akka/akka-samples/tree/2.5/akka-sample-cluster-scala.
My OS is Linux Mint 19 with the latest kernel.
For the Worker Dial-in Example (Transformation Example), I cannot fully run it because there is not enough space in /dev/shm, although I have more than 2 GB of available space.
The problem is that when I launch the first frontend node, it eats some KBs of space. When I launch the second one, it eats some MBs. When I launch the third one, it eats some hundreds of MBs. I cannot even launch the fourth one; it just throws an error which brings the whole cluster down:
[info] Warning: space is running low in /dev/shm (tmpfs) threshold=167,772,160 usable=95,424,512
[info] Warning: space is running low in /dev/shm (tmpfs) threshold=167,772,160 usable=45,088,768
[info] [ERROR] [11/05/2018 21:03:56.156] [ClusterSystem-akka.actor.default-dispatcher-12] [akka://ClusterSystem#127.0.0.1:57246/] swallowing exception during message send
[info] io.aeron.exceptions.RegistrationException: IllegalStateException : Insufficient usable storage for new log of length=50335744 in /dev/shm (tmpfs)
[info] at io.aeron.ClientConductor.onError(ClientConductor.java:174)
[info] at io.aeron.DriverEventsAdapter.onMessage(DriverEventsAdapter.java:81)
[info] at org.agrona.concurrent.broadcast.CopyBroadcastReceiver.receive(CopyBroadcastReceiver.java:100)
[info] at io.aeron.DriverEventsAdapter.receive(DriverEventsAdapter.java:56)
[info] at io.aeron.ClientConductor.service(ClientConductor.java:660)
[info] at io.aeron.ClientConductor.awaitResponse(ClientConductor.java:696)
[info] at io.aeron.ClientConductor.addPublication(ClientConductor.java:371)
[info] at io.aeron.Aeron.addPublication(Aeron.java:259)
[info] at akka.remote.artery.aeron.AeronSink$$anon$1.<init>(AeronSink.scala:103)
[info] at akka.remote.artery.aeron.AeronSink.createLogicAndMaterializedValue(AeronSink.scala:100)
[info] at akka.stream.impl.GraphStageIsland.materializeAtomic(PhasedFusingActorMaterializer.scala:630)
[info] at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:450)
[info] at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:415)
[info] at akka.stream.impl.PhasedFusingActorMaterializer.materialize(PhasedFusingActorMaterializer.scala:406)
[info] at akka.stream.scaladsl.RunnableGraph.run(Flow.scala:588)
[info] at akka.remote.artery.Association.runOutboundOrdinaryMessagesStream(Association.scala:726)
[info] at akka.remote.artery.Association.runOutboundStreams(Association.scala:657)
[info] at akka.remote.artery.Association.associate(Association.scala:649)
[info] at akka.remote.artery.AssociationRegistry.association(Association.scala:989)
[info] at akka.remote.artery.ArteryTransport.association(ArteryTransport.scala:724)
[info] at akka.remote.artery.ArteryTransport.send(ArteryTransport.scala:710)
[info] at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:591)
[info] at akka.actor.ActorRef.tell(ActorRef.scala:124)
[info] at akka.actor.ActorSelection$.rec$1(ActorSelection.scala:265)
[info] at akka.actor.ActorSelection$.deliverSelection(ActorSelection.scala:269)
[info] at akka.actor.ActorSelection.tell(ActorSelection.scala:46)
[info] at akka.actor.ScalaActorSelection.$bang(ActorSelection.scala:280)
[info] at akka.actor.ScalaActorSelection.$bang$(ActorSelection.scala:280)
[info] at akka.actor.ActorSelection$$anon$1.$bang(ActorSelection.scala:198)
[info] at akka.cluster.ClusterCoreDaemon.gossipTo(ClusterDaemon.scala:1330)
[info] at akka.cluster.ClusterCoreDaemon.gossip(ClusterDaemon.scala:1047)
[info] at akka.cluster.ClusterCoreDaemon.gossipTick(ClusterDaemon.scala:1010)
[info] at akka.cluster.ClusterCoreDaemon$$anonfun$initialized$1.applyOrElse(ClusterDaemon.scala:496)
[info] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
[info] at akka.actor.Actor.aroundReceive(Actor.scala:517)
[info] at akka.actor.Actor.aroundReceive$(Actor.scala:515)
[info] at akka.cluster.ClusterCoreDaemon.aroundReceive(ClusterDaemon.scala:295)
[info] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
[info] at akka.actor.ActorCell.invoke(ActorCell.scala:557)
[info] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
[info] at akka.dispatch.Mailbox.run(Mailbox.scala:225)
[info] at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
[info] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[info] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[info] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[info] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
It seems it is sending a huge message (48 MB+?) to every node.
So what's going on here? What is the root cause, and how should I fix it?
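The "huge message" is not a message at all. Akka's Artery transport is built on Aeron, and Aeron pre-allocates a log buffer in /dev/shm for every publication, i.e. for every stream between a pair of nodes, so each extra node multiplies the number of ~48 MB files. A back-of-the-envelope sketch, assuming the default 16 MB Aeron term buffer and the 4 KB log metadata section of the Aeron version bundled with Akka 2.5, reproduces the length=50335744 from the error:

// Why each new inter-node stream costs ~48 MB of /dev/shm.
public class AeronLogSize {
    public static void main(String[] args) {
        long termBufferLength = 16 * 1024 * 1024; // default aeron.term.buffer.length
        int partitionCount = 3;                   // Aeron triple-buffers each log
        long logMetaData = 4096;                  // log metadata section
        long logLength = termBufferLength * partitionCount + logMetaData;
        System.out.println(logLength);            // 50335744, as in the error above
    }
}

So the growth is expected, not a leak: each node opens streams to every other node. Possible mitigations (check the exact knobs for your Akka/Aeron version) are mounting a larger /dev/shm, pointing akka.remote.artery.advanced.aeron-dir at an ordinary disk directory, or running an external media driver with a smaller aeron.term.buffer.length.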

Kafka-Manager Web UI not loading

I have started kafka-manager on a CentOS VM and below are its logs.
[info] o.a.z.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
[info] o.a.z.ZooKeeper - Client environment:java.io.tmpdir=/tmp
[info] o.a.z.ZooKeeper - Client environment:java.compiler=
[info] o.a.z.ZooKeeper - Client environment:os.name=Linux
[info] o.a.z.ZooKeeper - Client environment:os.arch=amd64
[info] o.a.z.ZooKeeper - Client environment:os.version=3.10.0-862.el7.x86_64
[info] o.a.z.ZooKeeper - Client environment:user.name=root
[info] o.a.z.ZooKeeper - Client environment:user.home=/root
[info] o.a.z.ZooKeeper - Client environment:user.dir=/root/Confluent_kafka/kafka-manager-1.3.3.21
[info] o.a.z.ZooKeeper - Initiating client connection, connectString=localhost:3181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@73687e45
[info] o.a.z.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:3181. Will not attempt to authenticate using SASL (unknown error)
[info] o.a.z.ClientCnxn - Socket connection established to localhost/127.0.0.1:3181, initiating session
[info] k.m.a.KafkaManagerActor - zk=localhost:3181
[info] k.m.a.KafkaManagerActor - baseZkPath=/kafka-manager
[info] o.a.z.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:3181, sessionid = 0x16565ff95660000, negotiated timeout = 60000
[info] k.m.a.KafkaManagerActor - Started actor akka://kafka-manager-system/user/kafka-manager
[info] k.m.a.KafkaManagerActor - Starting delete clusters path cache...
[info] k.m.a.DeleteClusterActor - Started actor akka://kafka-manager-system/user/kafka-manager/delete-cluster
[info] k.m.a.DeleteClusterActor - Starting delete clusters path cache...
[info] k.m.a.KafkaManagerActor - Starting kafka manager path cache...
[info] k.m.a.DeleteClusterActor - Adding kafka manager path cache listener...
[info] k.m.a.DeleteClusterActor - Scheduling updater for 10 seconds
[info] k.m.a.KafkaManagerActor - Adding kafka manager path cache listener...
[info] play.api.Play - Application started (Prod)
[info] p.c.s.NettyServer - Listening for HTTP on /0.0.0.0:9000
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Updating internal state...
[info] k.m.a.KafkaManagerActor - Shutting down kafka manager
The Kafka Manager starts perfectly, but the web UI does not load at all.
IPv6 is disabled, and netstat shows this:
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 2670/java
Can someone help with this?
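Since netstat shows the port listening, one quick check is to request the page locally on the VM itself, which separates "application hung" from "firewall in the way". A minimal sketch using only java.net (port 9000 is taken from the log above):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ProbeKafkaManager {
    public static void main(String[] args) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://127.0.0.1:9000/").openConnection();
        conn.setConnectTimeout(5000); // fail fast instead of hanging forever
        conn.setReadTimeout(5000);
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}

If this prints a response locally but a remote browser gets nothing, look at the CentOS firewall (firewalld/iptables) for port 9000; if it times out locally too, the application itself is the problem, and the final "Shutting down kafka manager" line in the log above suggests the manager actor may no longer be running.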

Kafka vertica consumer and rejection table

I see very strange behavior using the Vertica Kafka consumer:
2016-07-27 04:22:17.307 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2016-07-27 04:22:17.307
2016-07-27 04:22:17.330 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Starting compute batches for new Frame.
2016-07-27 04:22:17.431 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Completed computing batch set for current Frame.
2016-07-27 04:22:17.469 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 2 ("openx"."requests"-CREATE#2016-07-27 04:22:17.431) [ERROR] Rolling back MB: [Vertica][VJDBC](4213) ROLLBACK: Object "requests_rej" already exists
java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](4213) ROLLBACK: Object "requests_rej" already exists
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VResultSet.fetchChunk(Unknown Source)
at com.vertica.dataengine.VResultSet.initialize(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.handleExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.executeUpdate(Unknown Source)
at com.vertica.solutions.kafka.scheduler.MicroBatch.execute(MicroBatch.java:193)
at com.vertica.solutions.kafka.scheduler.LaneWorker.run(LaneWorker.java:69)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](4213) ROLLBACK: Object "requests_rej" already exists
... 11 more
2016-07-27 04:22:17.469 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 2 [INFO] Lane Worker 2 waiting for batch...
2016-07-27 04:22:17.469 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Sleeping for 9838 milliseconds until 2016-07-27 04:22:27.307. Started frame # 2016-07-27 04:22:17.307.
2016-07-27 04:22:27.308 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame # 2016-07-27 04:22:27.307
2016-07-27 04:22:27.331 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Starting compute batches for new Frame.
2016-07-27 04:22:27.427 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Completed computing batch set for current Frame.
I do as the documentation says, but on write into Vertica I see this error. Why do I see it? How can I fix it?
The error clearly tells you what is wrong:
Object "requests_rej" already exists
Try dropping the object and re-running your job.
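A minimal sketch of that cleanup over JDBC; the schema and table name come from the log above, the connection details are placeholders, and DROP TABLE IF EXISTS makes it safe to run even if the table is already gone:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DropLeftoverRejTable {
    public static void main(String[] args) throws Exception {
        // Placeholder host/database/credentials -- adjust for your cluster.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:vertica://localhost:5433/mydb", "dbadmin", "secret");
             Statement stmt = conn.createStatement()) {
            // Remove the rejection table left behind by an earlier batch so
            // the scheduler can recreate it on the next frame.
            stmt.execute("DROP TABLE IF EXISTS \"openx\".\"requests_rej\"");
        }
    }
}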

FreeSwitch very slow

I've installed a default, out-of-the-box FreeSWITCH instance, but when I try to make an internal call (extension to extension) it takes around 12 seconds before the call is established and I can hear the ring tone.
When I look at the log I see the connection request almost instantly, but then no activity, and after 10 seconds or more the call starts and I hear the phone ringing.
Here is the log file if it helps; please see the 10-second delay between 13:08:07 and 13:08:17.
freeswitch@vps-1170411-23979.manage.myhosting.com> 2015-09-26 13:07:41.591949 [CONSOLE] mod_voicemail.c:4091 Event Thread Started
2015-09-26 13:08:02.171949 [NOTICE] switch_channel.c:1075 New Channel sofia/internal/1001@168.144.85.16 [25229804-6471-11e5-9558-f1a7477c5309]
2015-09-26 13:08:07.331948 [INFO] mod_dialplan_xml.c:635 Processing BSmarter.CA <1001>->1000 in context default
2015-09-26 13:08:07.331948 [CRIT] mod_dptools.c:1670 WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
2015-09-26 13:08:07.331948 [CRIT] mod_dptools.c:1670 Open /usr/local/freeswitch/conf/vars.xml and change the default_password.
2015-09-26 13:08:07.331948 [CRIT] mod_dptools.c:1670 Once changed type 'reloadxml' at the console.
2015-09-26 13:08:07.331948 [CRIT] mod_dptools.c:1670 WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
2015-09-26 13:08:17.371961 [INFO] switch_ivr_async.c:3932 Bound B-Leg: *1 execute_extension::dx XML features
2015-09-26 13:08:17.371961 [INFO] switch_ivr_async.c:3932 Bound B-Leg: *2 record_session::/usr/local/freeswitch/recordings/1001.2015-09-26-13-08-17.wav
2015-09-26 13:08:17.371961 [INFO] switch_ivr_async.c:3932 Bound B-Leg: *3 execute_extension::cf XML features
2015-09-26 13:08:17.371961 [INFO] switch_ivr_async.c:3932 Bound B-Leg: *4 execute_extension::att_xfer XML features
2015-09-26 13:08:17.391951 [NOTICE] switch_channel.c:1075 New Channel sofia/internal/1000@99.226.75.129:63329 [2e34333a-6471-11e5-957b-f1a7477c5309]
2015-09-26 13:08:17.571984 [NOTICE] sofia.c:6760 Ring-Ready sofia/internal/1000@99.226.75.129:63329!
2015-09-26 13:08:17.591949 [INFO] switch_ivr_originate.c:1193 Sending early media
2015-09-26 13:08:17.591949 [INFO] switch_core_media.c:5395 Activating RTCP PORT 4003
2015-09-26 13:08:17.591949 [NOTICE] sofia_media.c:92 Pre-Answer sofia/internal/1001@168.144.85.16!
2015-09-26 13:08:18.631986 [NOTICE] sofia.c:7580 Hangup sofia/internal/1001@168.144.85.16 [CS_EXECUTE] [ORIGINATOR_CANCEL]
Any idea what the problem might be?
This pause was introduced in order to force people to change the default password. Just edit it in vars.xml and the delay should go away.
Just like Stanislav Sinyagin said: edit conf/vars.xml and change the default password 1234 to a new password.
This is the only way to stop that delay.
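For reference, the value lives in /usr/local/freeswitch/conf/vars.xml (the path is printed in the log above). A sketch of the edit, with a placeholder password:

<!-- conf/vars.xml: the stock value is 1234; set your own password, then
     type 'reloadxml' at the FreeSWITCH console so the change takes effect -->
<X-PRE-PROCESS cmd="set" data="default_password=CHANGE_ME_Str0ng"/>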