Apache Geode Native Client logs show a connection pool error on starting native client - geode

We run a native client, and I have noticed a "Failed to add endpoint to pool" error in the logs when the client is started.
I set up logging using:
CacheFactory cacheFactory = new CacheFactory();
return cacheFactory
    .Set("log-file", "Geode.log")
    .Set("log-level", "ALL")
    .Set("name", "Dealer")
    .SetPdxReadSerialized(true)
    .Create();
The Geode.log file shows the following:
[info 2020/09/03 12:40:57.906591 GMT Daylight Time ARGO:15580 11876] ClientMetadataService started for pool MyPool2
[debug 2020/09/03 12:40:57.986018 GMT Daylight Time ARGO:15580 25428] SerializationRegistry::deserialize typeId = -1 dsCode = 1
[debug 2020/09/03 12:40:57.986095 GMT Daylight Time ARGO:15580 25428] closing the connection locator1
[debug 2020/09/03 12:40:57.986117 GMT Daylight Time ARGO:15580 25428] closing the connection locator
[fine 2020/09/03 12:40:57.986205 GMT Daylight Time ARGO:15580 25428] Created new endpoint 1.2.3.4:40404 for pool MyPool2
[error 2020/09/03 12:40:57.986256 GMT Daylight Time ARGO:15580 25428] Failed to add endpoint 1.2.3.4:40404 to pool MyPool2
[debug 2020/09/03 12:40:57.986285 GMT Daylight Time ARGO:15580 25428] ThinClientRedundancyManager::maintainRedundancyLevel(): checking redundant list, size = 0
[debug 2020/09/03 12:40:57.986303 GMT Daylight Time ARGO:15580 25428] ThinClientRedundancyManager::maintainRedundancyLevel(): finding nonredundant endpoints, size = 1
[fine 2020/09/03 12:40:57.986321 GMT Daylight Time ARGO:15580 25428] Recovering subscriptions on endpoint [1.2.3.4:40404] from pool MyPool2
[fine 2020/09/03 12:40:57.986339 GMT Daylight Time ARGO:15580 25428] TcrEndpoint::createNewConnection: connectTimeout = m_needToConnectInLock=59000000 appThreadRequest =0
[debug 2020/09/03 12:40:57.986361 GMT Daylight Time ARGO:15580 25428] Tcrconnection const isSecondary = 0 and isClientNotification = 0, this = 00000202EDBAD790, conn ref to endopint 1
[finest 2020/09/03 12:40:57.986438 GMT Daylight Time ARGO:15580 25428] Using socket send buffer size of 64240.
[finest 2020/09/03 12:40:57.986465 GMT Daylight Time ARGO:15580 25428] Using socket receive buffer size of 64240.
[debug 2020/09/03 12:40:57.986482 GMT Daylight Time ARGO:15580 25428] Creating plain socket stream
Can someone explain why we see the error here? The code that is executed is in ThinClientPoolDM.cpp, but the error does not seem to make any difference to the client, which we can see does make a connection. Although the error says the server endpoint was not added to the pool, almost immediately afterwards a fine-level message reports recovering subscriptions on that same endpoint.

There was a longstanding bug in this code causing it to log a failure here on success and vice-versa, which is most likely what you're hitting. This was fixed as part of PR #588 for GEODE-7930, on 4/6/2020. Please see if you have this fix in your local repo and reply if you do and are still hitting the issue.

Related

Celery on Kubernetes executes task 15 minutes after receiving it

I am trying to migrate my Django/Celery app from Nomad (HashiCorp) to Kubernetes, and jobs decorated with @shared_task() are executed 15 minutes after the message is received.
I don't see anything in stats or status, and the Redis connection is OK.
I can see the task in Flower, but it stays in the Started state for 15 minutes:
Received 2021-09-28 20:30:56.387649 UTC
Started 2021-09-28 20:30:56.390532 UTC
Succeeded 2021-09-28 20:46:00.556030 UTC
Received 2021-09-28 21:18:43.436750 UTC
Started 2021-09-28 21:18:43.441041 UTC
Succeeded 2021-09-28 21:33:49.391542 UTC
Celery version is 4.4.2
Any resolution to this problem?
fixed, it's based on redis key cache with setex
thanks

Why can't Linux read hwclock after some month shifts?

We have a Linux system that we are building with Yocto.
We can read our hardware clock after reboots and change both system time and hardware time without any error (most of the time). However, at some month shifts, in every year that we have tried, we run into this error: "hwclock: RTC_RD_TIME: Invalid argument".
Example 1:
root#:~# date
Thu Apr 30 23:59:50 UTC 2020
root#:~# hwclock
Thu Apr 30 23:59:52 2020 0.000000 seconds
root#:~#
root#:~#
root#:~# date
Fri May 1 00:00:10 UTC 2020
root#:~# hwclock
hwclock: RTC_RD_TIME: Invalid argument
root#:~#
This does not happen at every new month: if I do the same test in January, Linux can read the hwclock without any issues. It also does not matter whether the unit is powered or not. If I set the hwclock to the first of May 00:00:00, it keeps track of the time.
The same error occurs at the following month shifts:
Feb (it does not matter if it is leap year or not) -> Mar
Apr -> May
Jun -> Jul
Sep -> Oct
Nov -> Dec
Dec (Not sure because of new year or new month) -> Jan
As I understand it, this happens because rtc-lib.c cannot validate the time correctly.
I have tried this on several different hardware units.
Does anyone have any idea what might cause this?
Solution:
The fault was not in rtc-lib.c. The cause of the error was a faulty RTC implementation. The RTC month value is 1-indexed, but the kernel assumes it is 0-indexed. Added a patch for this to rtc-[my_rtc_model].c and now it seems to be working.
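For anyone wondering why only those month shifts fail, here is a minimal Python sketch of one way such an off-by-one can play out. It assumes (this is a guess, not taken from the actual driver) that the driver copied the kernel's 0-indexed tm_mon straight into the 1-indexed hardware month register and read it back unchanged, so the chip counts days for the previous month. The function names and the simplified validity check are hypothetical stand-ins, not the kernel's code.

```python
import calendar

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def days_in_month(year, month_1based):
    """Number of days in a month, with the month given as 1..12."""
    return calendar.monthrange(year, month_1based)[1]


def rtc_valid(year, tm_mon, tm_mday):
    """Simplified stand-in for the kernel's range check (tm_mon is 0..11)."""
    if not 0 <= tm_mon <= 11:
        return False
    return 1 <= tm_mday <= days_in_month(year, tm_mon + 1)


def hw_next_day(year, hw_mon, hw_day):
    """Advance the chip's own 1-indexed calendar by one day.

    Assumption: if the stored day is already at or past the end of the
    chip's month, the chip rolls over to day 1 of the next month.
    """
    if hw_day < days_in_month(year, hw_mon):
        return hw_mon, hw_day + 1
    return hw_mon % 12 + 1, 1


def failing_transitions(year):
    """Months whose last-day rollover makes the next read fail validation."""
    failures = []
    # January is skipped: a raw 0 is not a valid 1-indexed month, and the
    # chip's behaviour in that case is unknown.
    for tm_mon in range(1, 12):
        last_day = days_in_month(year, tm_mon + 1)
        hw_mon, hw_day = tm_mon, last_day          # assumed buggy write: no +1
        hw_mon, hw_day = hw_next_day(year, hw_mon, hw_day)
        read_mon, read_day = hw_mon, hw_day        # assumed buggy read: no -1
        if not rtc_valid(year, read_mon, read_day):
            failures.append(MONTHS[tm_mon])
    return failures


if __name__ == "__main__":
    print(failing_transitions(2021))
```

Under those assumptions the failing months come out as Feb, Apr, Jun, Sep, Nov and Dec, in both leap and non-leap years, which lines up with the list in the question. For example, April is stored as the chip's March, so the chip ticks on to day 31, which reads back as an invalid "April 31".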

PostgreSQL log file errors

My application is deployed on a remote application server (Linux), and from there it connects to a DB server (PostgreSQL 9.4) running on another remote Linux server. I send a long message to the app server through JMS, and processing this message takes many hours. Unfortunately, I am facing performance issues with the DB server. In the postgresql.log file I can see the following errors and warnings:
< 2017-05-05 09:18:00.676 CEST >LOG: could not receive data from client: Connection timed out
< 2017-05-05 13:38:33.704 CEST >LOG: incomplete startup packet
< 2017-05-05 13:42:29.158 CEST >LOG: unexpected EOF on client connection with an open transaction
< 2017-05-05 13:50:49.163 CEST >LOG: checkpoints are occurring too frequently (1 second apart)
< 2017-05-05 13:50:49.163 CEST >HINT: Consider increasing the configuration parameter "checkpoint_segments".
Do I need to update something in the postgresql.conf file? Can somebody please advise what I should do to avoid these errors?

postgres: EOF detected for even simple queries

I'm running a Postgres server locally on my computer, and it seems that even simple queries like the one below give me an EOF detected error.
For instance, this query
ALTER TABLE maintab ADD COLUMN testing numeric;
UPDATE maintab SET testing = numeric1 * numeric2;
Similar statements also throw an EOF error. I'm also running PostGIS with QGIS, and my spatial queries, no matter how simple, throw this error.
I've looked around forums and documentation, but nothing seems to help solve this problem. Is there anything I can do to stop this?
EDIT
I ran a check on my error logs after doing some Googling. Found these logs, not sure what to make of them
2015-09-04 11:18:31 EDT [1138-4] LOG: terminating any other active server processes
2015-09-04 11:18:31 EDT [1208-3] WARNING: terminating connection because of crash of another server process
2015-09-04 11:18:31 EDT [1208-4] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-09-04 11:18:31 EDT [1208-5] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-09-04 11:18:31 EDT [1138-5] LOG: all server processes terminated; reinitializing
2015-09-04 11:18:31 EDT [3861-1] LOG: database system was interrupted; last known up at 2015-09-04 15:08:49 EDT
2015-09-04 11:18:32 EDT [3861-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-09-04 11:18:32 EDT [3861-3] LOG: record with zero length at 1D/123A250
2015-09-04 11:18:32 EDT [3861-4] LOG: redo is not required
2015-09-04 11:18:32 EDT [3861-5] LOG: MultiXact member wraparound protections are now enabled
2015-09-04 11:18:32 EDT [1138-6] LOG: database system is ready to accept connections
2015-09-04 11:18:32 EDT [3865-1] LOG: autovacuum launcher started
2015-09-04 16:07:22 EDT [1122-1] LOG: database system was interrupted; last known up at 2015-09-04 16:06:25 EDT
2015-09-04 16:07:22 EDT [1179-1] [unknown]@[unknown] LOG: incomplete startup packet
2015-09-04 16:07:23 EDT [1122-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-09-04 16:07:23 EDT [1122-3] LOG: record with zero length at 1D/123A320
2015-09-04 16:07:23 EDT [1122-4] LOG: redo is not required
2015-09-04 16:07:23 EDT [1122-5] LOG: MultiXact member wraparound protections are now enabled
2015-09-04 16:07:23 EDT [1114-1] LOG: database system is ready to accept connections
2015-09-04 16:07:23 EDT [1183-1] LOG: autovacuum launcher started
2015-09-04 12:15:05 EDT [1183-2] LOG: stats collector's time 2015-09-04 16:07:23.363257-04 is later than backend local time 2015-09-04 12:15:05.07308-04
2015-09-04 12:17:34 EDT [1114-2] LOG: server process (PID 3824) was terminated by signal 11: Segmentation fault
2015-09-04 12:17:34 EDT [1114-4] LOG: terminating any other active server processes
2015-09-04 12:17:34 EDT [1183-3] WARNING: terminating connection because of crash of another server process
2015-09-04 12:17:34 EDT [1183-4] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-09-04 12:17:34 EDT [1183-5] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-09-04 12:17:34 EDT [1114-5] LOG: all server processes terminated; reinitializing
2015-09-04 12:17:34 EDT [3828-1] LOG: database system was interrupted; last known up at 2015-09-04 16:07:23 EDT
2015-09-04 12:17:35 EDT [3828-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-09-04 12:17:35 EDT [3828-3] LOG: redo starts at 1D/123A388
2015-09-04 12:17:35 EDT [3828-4] LOG: unexpected pageaddr 1C/F9258000 in log segment 000000010000001D00000001, offset 2457600
2015-09-04 12:17:35 EDT [3828-5] LOG: redo done at 1D/1255C18
2015-09-04 12:17:36 EDT [3828-6] LOG: MultiXact member wraparound protections are now enabled
2015-09-04 12:17:36 EDT [3833-1] LOG: autovacuum launcher started
2015-09-04 12:17:36 EDT [1114-6] LOG: database system is ready to accept connections

Quickfix session setup : logout sent before logon from initiator

My QuickFIX client sends a logout before the logon every day. Is it possible to prevent this logout message from being sent before the logon?
These are the settings I am currently using:
[default]
FileStorePath=/home/quickfix/crons/exe/quickfix/filestore
ConnectionType=initiator
SenderCompID=TN7_42
TargetCompID=EMS
SocketConnectHost=xxxxx
TimeZone=Asia/Tokyo
StartTime=07:50:00 Asia/Tokyo
EndTime=20:00:00 Asia/Tokyo
HeartBtInt=30
ReconnectInterval=5
CheckLatency=N
UseLocalTime=Y
[session]
BeginString=FIX.4.2
SocketConnectPort=12061
ResetOnLogon=Y
ResetOnLogout=Y
ResetOnDisconnect=Y
RefreshOnLogon=N
... and this is the log output I get every day:
Oct 11, 2011 7:56:00 AM quickfix.SessionSchedule <init>
INFO: [FIX.4.2:TN7_42->EMS] daily, 22:50:00-UTC - 11:00:00-UTC
<20111010-22:56:00.820, FIX.4.2:TN7_42->EMS, event> (Session FIX.4.2:TN7_42->EMS schedule is daily, 22:50:00-UTC - 11:00:00-UTC)
<20111010-22:56:00.821, FIX.4.2:TN7_42->EMS, event> (Session state is not current; resetting FIX.4.2:TN7_42->EMS)
<20111010-22:56:00.821, FIX.4.2:TN7_42->EMS, event> (Created session: FIX.4.2:TN7_42->EMS)
Oct 11, 2011 7:56:00 AM quickfix.mina.NetworkingOptions logOption
INFO: Socket option: SocketTcpNoDelay=true
Oct 11, 2011 7:56:00 AM quickfix.mina.NetworkingOptions logOption
INFO: Socket option: SocketSynchronousWrites=false
Oct 11, 2011 7:56:00 AM quickfix.mina.NetworkingOptions logOption
INFO: Socket option: SocketSynchronousWriteTimeout=30000
Oct 11, 2011 7:56:00 AM quickfix.mina.initiator.IoSessionInitiator <init>
INFO: [FIX.4.2:TN7_42->EMS] [/xxxxx:12061]
Oct 11, 2011 7:56:00 AM quickfix.mina.SessionConnector startSessionTimer
INFO: SessionTimer started
Oct 11, 2011 7:56:00 AM quickfix.mina.initiator.InitiatorIoHandler sessionCreated
INFO: MINA session created for FIX.4.2:TN7_42->EMS: local=/xxxxx:48477, class org.apache.mina.transport.socket.nio.SocketSessionImpl, remote=/xxxxx:12061
<20111010-22:56:01.860, FIX.4.2:TN7_42->EMS, outgoing> (8=FIX.4.2^A9=52^A35=5^A34=1^A49=TN7_42^A52=20111010-22:56:01.859^A56=EMS^A10=085^A)
Oct 11, 2011 7:56:01 AM quickfix.Session disconnect
INFO: [FIX.4.2:TN7_42->EMS] Disconnecting: Session reset
Oct 11, 2011 7:56:05 AM quickfix.mina.initiator.InitiatorIoHandler sessionCreated
INFO: MINA session created for FIX.4.2:TN7_42->EMS: local=/xxxxx:48478, class org.apache.mina.transport.socket.nio.SocketSessionImpl, remote=/xxxxx:12061
<20111010-22:56:06.844, FIX.4.2:TN7_42->EMS, outgoing> (8=FIX.4.2^A9=70^A35=A^A34=1^A49=TN7_42^A52=20111010-22:56:06.844^A56=EMS^A98=0^A108=30^A141=Y^A10=166^A)
<20111010-22:56:06.845, FIX.4.2:TN7_42->EMS, event> (Initiated logon request)
<20111010-22:56:06.847, FIX.4.2:TN7_42->EMS, incoming> (8=FIX.4.2^A9=179^A35=5^A49=EMS^A56=TN7_42^A34=1^A43=N^A52=20111010-22:56:06.846^A58=Catastropic Error: Incoming sequence number (1) is less than expected (2) without PossDupFlag being set. Logging out.^A10=226^A)
Oct 11, 2011 7:56:06 AM quickfix.Session disconnect
INFO: [FIX.4.2:TN7_42->EMS] Disconnecting: IO Session closed
<20111010-22:56:06.849, FIX.4.2:TN7_42->EMS, error> (quickfix.SessionException Logon state is not valid for message (MsgType=5))
<20111010-22:56:06.849, FIX.4.2:TN7_42->EMS, event> (Already disconnected: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=5))
<20111010-22:56:10.887, FIX.4.2:TN7_42->EMS, error> (java.net.ConnectException: java.net.ConnectException: Connection refused(Next retry in 5000 milliseconds))
<20111010-22:56:15.898, FIX.4.2:TN7_42->EMS, error> (java.net.ConnectException: java.net.ConnectException: Connection refused(Next retry in 5000 milliseconds))
You're being bitten by bug QFJ-357, which, although it is filed against the Java project, also seems to be an issue for the straight C++ version.
This has actually been fixed in trunk for C++ by the fix in revision 2269.
Incoming sequence number is less than expected
This says it all. The sequence number sent in the FIX message to the acceptor is different from the one the acceptor expects, hence the forced logout message. This is done primarily to keep the acceptor and initiator in sync while sending and receiving messages.
There is a flag in the config that resets all sequence numbers on connection. Use that flag to get past this problem for now, but it is better to stick with the original sequence numbers. In the reject message you should get the sequence number the acceptor expects. Parse that sequence number and then start the logon process again.
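The "parse the sequence number" step could look roughly like this in Python. This is only a sketch: it assumes the acceptor puts the number in the Text (58) field with the exact wording seen in the log above, which is specific to this counterparty and not part of the FIX standard. The helper names are made up for illustration, and it does not use the QuickFIX API.

```python
import re

SOH = "\x01"  # standard FIX field delimiter; the log above prints it as ^A

# Wording taken from the logout message in the question's log (an assumption
# about this particular acceptor, not a FIX-defined format).
_EXPECTED = re.compile(
    r"Incoming sequence number \((\d+)\) is less than expected \((\d+)\)"
)


def parse_fix(raw, sep=SOH):
    """Split a raw FIX message into a {tag: value} dict (no repeating groups)."""
    fields = {}
    for pair in raw.split(sep):
        if not pair:
            continue  # skip the empty piece after the trailing delimiter
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


def expected_seq_from_logout(raw, sep=SOH):
    """Return the sequence number the acceptor expects, or None if not found."""
    fields = parse_fix(raw, sep)
    if fields.get("35") != "5":  # 35=5 is a Logout message
        return None
    match = _EXPECTED.search(fields.get("58", ""))
    return int(match.group(2)) if match else None


if __name__ == "__main__":
    logout = (
        "8=FIX.4.2^A9=179^A35=5^A49=EMS^A56=TN7_42^A34=1^A43=N^A"
        "52=20111010-22:56:06.846^A58=Catastropic Error: Incoming sequence "
        "number (1) is less than expected (2) without PossDupFlag being set. "
        "Logging out.^A10=226^A"
    )
    print(expected_seq_from_logout(logout, sep="^A"))
```

The number returned is what the next outgoing message would need to carry; how you set it depends on your engine and message store.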
Try making your session end time a bit earlier, and confirm that you are actually sending a logout due to the end time being reached, and not just terminating your application without logging out.
There have also been one or two bugs in QuickFIX around this area which did not exist in 1.12, so you might try that older version and see if it works better for you.