I have a cluster of two nodes, i.e. two OrientDB servers (Enterprise Edition 2.2.3) running on two separate machines. Both machines are VMs running Fedora 18. The OrientDB database consists of approximately 75,000 edges and 5,000 nodes.
When I try to stop either node, or both nodes one after the other, I get the following error:
Node1
2017-05-02 17:32:44:811 WARNI Received signal: SIGINT [OSignalHandler]Exception in thread "Timer-1" com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.hazelcast.spi.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:85)
at com.hazelcast.spi.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:80)
at com.hazelcast.spi.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:74)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:309)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:250)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:94)
at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedMap.get(OHazelcastDistributedMap.java:53)
at com.orientechnologies.agent.profiler.OEnterpriseProfiler$14.run(OEnterpriseProfiler.java:772)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid11478.hprof ...
Heap dump file created [744789648 bytes in 21.248 secs]
Node2
2017-05-02 17:32:41:108 INFO [192.168.6.153]:2434 [orientdb] [3.6.3] Running shutdown hook... Current state: ACTIVE [Node]Exception in thread "Timer-1" com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.hazelcast.spi.AbstractDistributedObject.throwNotActiveException(AbstractDistributedObject.java:85)
at com.hazelcast.spi.AbstractDistributedObject.lifecycleCheck(AbstractDistributedObject.java:80)
at com.hazelcast.spi.AbstractDistributedObject.getNodeEngine(AbstractDistributedObject.java:74)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:309)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:250)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:94)
at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedMap.get(OHazelcastDistributedMap.java:53)
at com.orientechnologies.agent.profiler.OEnterpriseProfiler$14.run(OEnterpriseProfiler.java:772)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
How can I solve the heap memory issue?
It seems your real problem is the OutOfMemoryError. The exception from Hazelcast just means that the HazelcastInstance has been stopped, most probably as a consequence of the OOME.
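If the heap really is undersized for this workload plus the distributed (Hazelcast) overhead, the usual first step is to give the OrientDB server JVM more memory and to inspect the heap dump it already produced (java_pid11478.hprof) with a tool such as Eclipse MAT to see what is actually filling the heap. A minimal sketch, assuming the stock server.sh from the 2.2.x distribution, which reads its JVM memory flags from the ORIENTDB_OPTS_MEMORY environment variable (the sizes below are placeholders, size them to what your Fedora VMs can actually spare):

# Placeholder sizes - adjust to the RAM available on each VM.
export ORIENTDB_OPTS_MEMORY="-Xms2G -Xmx2G"
# Optionally bound the off-heap disk cache as well (value in MB, passed through to the JVM by server.sh).
export ORIENTDB_SETTINGS="-Dstorage.diskCache.bufferSize=4096"
./server.sh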
Related
Setup: I have an Artemis broker HA cluster with 3 brokers. The HA policy is replication. Each broker is running in its own VM.
Problem: When I leave my brokers running for a long time, usually after 5-6 hours, I get the error below.
2022-11-21 21:32:37,902 WARN  [org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager is expired on path 0
2022-11-21 21:32:37,902 INFO  [org.apache.activemq.artemis.core.server] AMQ224107: The Critical Analyzer detected slow paths on the broker. It is recommended that you enable trace logs on org.apache.activemq.artemis.utils.critical while you troubleshoot this issue. You should disable the trace logs when you have finished troubleshooting.
2022-11-21 21:32:37,902 ERROR [org.apache.activemq.artemis.core.server] AMQ224079: The process for the virtual machine will be killed, as component org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager#46d59067 is not responsive
2022-11-21 21:32:37,969 WARN  [org.apache.activemq.artemis.core.server] AMQ222199: Thread dump: *******************************************************************************
Complete Thread dump "Thread-517 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7#437da279)" Id=602 TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack#75f49105
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.SynchronousQueue$TransferStack#75f49105
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
What does this really mean? I understand that the critical analyzer sees an error and halts the broker, but what is causing this error?
You may take a look at the documentation. Basically, the broker detects some issue and shuts itself down before it becomes too unresponsive. By setting the critical analyzer policy to LOG you might get more clues about the underlying issue.
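For reference, the critical analyzer is configured in broker.xml; a minimal sketch of the relevant elements inside <core> (the values shown are illustrative, not recommendations):

<!-- illustrative values only -->
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>120000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<!-- LOG reports the slow component instead of halting or shutting down the broker -->
<critical-analyzer-policy>LOG</critical-analyzer-policy>

Keep in mind that LOG only changes how the symptom is handled; a JournalStorageManager expiring on the critical analyzer typically points at slow journal I/O on the host, so the underlying disk latency still needs investigating.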
I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes, after a machine reboot, ZooKeeper does not start on one of the nodes and I see the errors below in the logs:
2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer#692] - Unable to load database on disk
java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain#92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
... 4 more
I have seen ZOOKEEPER-2354 and the symptoms look similar.
support#platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
8
support#platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
7
support#platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
8
support#platform2
The above issue states it was fixed in 3.4.6, but I am observing the same behavior in 3.4.13.
Can someone let me know how I can recover the ZooKeeper node from this?
This has been discussed on the ZooKeeper mailing list. Relevant quote from that thread:
With the other two zookeeper servers running I stopped the zookeeper
in the broken node and then deleted all the contents inside
/var/lib/zookeeper/version-2 and started the zookeeper back on the
node. It is running fine now and got all the data from the other
servers.
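A sketch of that recovery procedure, assuming the default data directory /var/lib/zookeeper and that the other two ensemble members are healthy (the service name below is a placeholder; note that myid lives in the dataDir itself, not in version-2, so it is preserved):

# On the broken node only, with the other two ZooKeeper servers still running:
sudo systemctl stop zookeeper              # or your init script / service name
# Remove the local snapshots, transaction logs and epoch files.
sudo rm -rf /var/lib/zookeeper/version-2/*
sudo systemctl start zookeeper             # the node re-syncs its state from the quorum

Only do this while a quorum of the remaining servers is up, since the wiped node has to copy its data back from them.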
We have a 2-node cluster in WebLogic running Liferay DXP. When we try to start Node-2, we get the error below.
Any thoughts on this, and what could be the best fix? This is impacting our production environment and any help is appreciated.
Jun 18, 2018 1:26:21 AM org.jgroups.protocols.UDP setBufferSize
WARNING: JGRP000015: the receive buffer of socket MulticastSocket was set to 500KB, but the OS only allocated 212.99KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)
1:26:21,354 AM EDT>
and we are seeing this in our logs:
01:26:21,484 ERROR [Start Level: Equinox Container: 123465678-1234-1234-1234-f0a48d1c8347][ClusterExecutorImpl:402] Unable to get cluster node with address Servername-32883
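Regarding the JGRP000015 warning above, the OS-level change it asks for is raising the kernel's maximum socket receive buffer; a hedged sketch for Linux (524288 bytes is just an illustrative value covering the 500KB JGroups requested, and this addresses only the buffer warning, not necessarily the ClusterExecutorImpl error):

# Apply immediately:
sudo sysctl -w net.core.rmem_max=524288
# Persist across reboots:
echo "net.core.rmem_max=524288" | sudo tee -a /etc/sysctl.conf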
Running MongoDB on my company's QA env, I ran into this error in the log:
2015-02-22T04:48:06.194-0500 [rsHealthPoll] SEVERE: Invalid access at address: 0
2015-02-22T04:48:06.290-0500 [rsHealthPoll] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0xf62526 0xf62300 0xf6241f 0x7fc70b581710 0xca12c2 0xca14e7 0xca3bb6 0xf02995 0xefb6d8 0xf9af1c 0x7fc70b5799d1 0x7fc70a2d28fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xf62526]
/usr/bin/mongod() [0xf62300]
/usr/bin/mongod() [0xf6241f]
/lib64/libpthread.so.0(+0xf710) [0x7fc70b581710]
/usr/bin/mongod(_ZN5mongo21ReplSetHealthPollTask12tryHeartbeatEPNS_7BSONObjEPi+0x52) [0xca12c2]
/usr/bin/mongod(_ZN5mongo21ReplSetHealthPollTask17_requestHeartbeatERNS_13HeartbeatInfoERNS_7BSONObjERi+0xf7) [0xca14e7]
/usr/bin/mongod(_ZN5mongo21ReplSetHealthPollTask6doWorkEv+0x96) [0xca3bb6]
/usr/bin/mongod(_ZN5mongo4task4Task3runEv+0x25) [0xf02995]
/usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x128) [0xefb6d8]
/usr/bin/mongod() [0xf9af1c]
/lib64/libpthread.so.0(+0x79d1) [0x7fc70b5799d1]
/lib64/libc.so.6(clone+0x6d) [0x7fc70a2d28fd]
It seems there's some segmentation fault in rsHealthPoll.
This is from a mongod instance running as part of a replica set in a shard-ready cluster (2 mongods + 1 arbiter running with config servers and mongos processes).
This DB mostly receives writes of new records, periodically updates a boolean to true for some records, and serves some reads driven by user activity querying it (a single collection at the moment).
Googling this error only gave me other, older, already-solved segmentation fault bugs in MongoDB Jira.
Has anyone seen this recently, or does anyone know the reason?
I am using JBoss EAP 6.2 as the web application server and Apache mod_cluster for load balancing.
Whenever I try to undeploy my web application, I get the following warning:
14:22:16,318 WARN [org.jboss.modcluster] (ServerService Thread Pool -- 136) MODCLUSTER000025: Failed to drain 2 remaining active sessions from default-host:/starrassist within 10.000000.1 seconds
14:22:16,319 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 136) MODCLUSTER000021: All pending requests drained from default-host:/starrassist in 10.002000.1 seconds
and it takes forever to undeploy; the EAP server group and node on which the application is deployed become unresponsive.
The only workaround is to restart the entire EAP server. My question is: is there an attribute I can set in EAP or mod_cluster so that active sessions expire on their own after a maximum timeout?
To control the timeout for stopping a context, you can use the following configuration value:
Stop Context Timeout
The amount of time, measured in units specified by
stopContextTimeoutUnit, for which to wait for clean shutdown of a
context (completion of pending requests for a distributable context;
or destruction/expiration of active sessions for a non-distributable
context).
CLI Command:
/profile=full-ha/subsystem=modcluster/mod-cluster-config=configuration/:write-attribute(name=stop-context-timeout,value=10)
Ref: Configure the mod_cluster Subsystem
Likewise, if you are using JDK 8, take a look at this issue: Draining pending requests fails with Oracle JDK8
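If you would rather have undeploy skip session draining entirely, the same mod-cluster-config resource also exposes a session-draining-strategy attribute (sketch only; verify the attribute against your EAP 6.2 management model before relying on it):

/profile=full-ha/subsystem=modcluster/mod-cluster-config=configuration/:write-attribute(name=session-draining-strategy,value=NEVER)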