Orientdb network connection lost during commit

I am using the Blueprints Graph API for OrientDB against a 2-node cluster running OrientDB 1.7.10. When ingesting simple parent-child data, I intermittently get the following error on commit:
Warning: caught I/O errors from not connected (local socket=?), trying to reconnect (error: java.io.IOException: Channel is closed)
The connection is then reestablished:
Connection re-acquired transparently after 31ms and 1 retries: no errors will be thrown at application level.
This occurs midway through the commit (100 vertices and edges). The result is that the server thinks it has sent the response, but the client hangs forever.
Is there a way to catch this at the application level and, for example, roll back?
I would be very grateful for any help.

As far as I know, a very similar issue was fixed some time ago: https://github.com/orientechnologies/orientdb/issues/2930
One thing to be aware of is the graph's autostart transaction. If it is enabled (and it is by default), you don't need to call begin, just commit. If you do call begin, the transaction will be committed at shutdown, and in that case it can cause this problem.
Another suggestion is to migrate to the 2.0-* releases, which have important improvements in this area too, especially if you are still in the development phase. 2.0 final is going to be released very soon and will be the main focus for the next months.
bye

Related

Opening DB connections to Postgres taking long

Some of our applications are facing issues with the connection pool. I run one of them: a JEE application on Payara 4.1 which uses PostgreSQL 9.5.8.
I have practically no problems when running the application locally against a local DB instance. In the remote environment, however, the application became unresponsive every 10 minutes (it actually answered every request with HTTP status 503). Guessing that it was related to opening connections taking long, we set the parameter idleTimeoutInSeconds="0" in jdbc-resource. Now the same issue happens about 4 times a day. That is an improvement, but neighbouring systems are still complaining.
We usually run with 5 steady connections and allow a maximum of 30. Our application usually uses 1 to 2 connections to handle traffic. With tcpdump I have seen that at a certain point in time the connection pool tries to open many connections. The pool realizes that the connections it holds have been closed by the DB without any notification such as a TCP FIN, and opening each connection takes about 1 second. During these roughly 30 seconds, not all requests can be safely queued and some 503s happen.
Locally everything is fine: opening a connection takes ~50ms and everyone is happy. Our Postgres team is not helping at all and I am stuck with the problem. As I don't see any way to improve the connection pool in JEE, I have radical ideas going in the direction of:
Refreshing the connections myself. All the time. Constantly. (Which would be hard to implement in JEE where I can not simply look into the connection pool and tell each connection to be refreshed just in case).
Replacing the not-helping-at-all JEE implementation of connection pool with something that works better. (Future generations of developers maintaining our app will hate me...)
Replacing the DB with something managed by myself. (Even dumber idea)
Does anyone have:
An idea how I could do 1 or 2 above?
Any other ideas that could help?
Here is my current JDBC resource definition, if needed:
<jdbc-resource poolName="<poolName>" jndiName="<jndiName>" isConnectionValidationRequired="true"
connectionValidationMethod="table" validationTableName="version()" maxPoolSize="30"
validateAtmostOncePeriodInSeconds="30" statementTimeoutInSeconds="30" isTimerPool="true" steadyPoolSize="5"
idleTimeoutInSeconds="0" connectionCreationRetryAttempts="100000" connectionCreationRetryIntervalInSeconds="30"
maxWaitTimeInMillis="2000">

How to know if a Firebird 2.0 database is being accessed?

I know that using Firebird 2.5+ I can check if there are users accessing my database using SQL, but unfortunately, Firebird 2.0 doesn't have this feature. Yes, I know it's an old version, but it's a legacy software and I'm not allowed to upgrade this in a short time... :(
I need to know if someone is connected to my 2.0 Firebird database, due to a process I'll run:
Block connections to DB (but ONLY if no one is connected)
Run my process
Allow users to reconnect again
I can start my process only when there are no users connected.
My database is part of a client/server system (no Web).
Any hints?

-at[tach] : this parameter prevents any new connections to the database from being made with the exception of the SYSDBA and the database owner. The shutdown will fail if there are any sessions connected after the timeout period has expired. It makes no difference if those connected sessions belong to the SYSDBA, the database owner or any other user. Any connections remaining will terminate the shutdown with the following details:
https://firebirdsql.org/manual/gfix-dbstartstop.html
There is also a Services API for this, so your database access library should expose a shutdown function. Specify a short shutdown. If it fails, then there were still some users connected. If it succeeds, you can go on with maintenance, with a guarantee that client applications will not be able to connect.
Alternatively, you can upgrade from Firebird 2.0 to 2.1, which is closer to 2.0 than 2.5 is but already has Monitoring Tables implemented.
However, your approach has one weak point: race conditions. Using Monitoring Tables (M.T.), you envision your work as follows:
Keep querying M.T. (which slows down server work significantly) until there are no other connections.
start maintenance work, that would fail if other connections are active
complete maintenance work
The problem is that reaching the "no other connections" state at step 1 does not stop new connections from being made between steps 1 and 2, or especially between steps 2 and 3.
Even if you make your checks and ensure condition #1, by the time you go on with maintenance some new user may have connected back and be working. Not every time, of course, but as time goes by it will eventually happen one day.
But there is yet one more good thing in FB 2.1 - database-level triggers.
c:\Program Files\Firebird\Firebird_2_1\doc\sql.extensions\README.db_triggers.txt
You can create a regular "all_current_connections" table, using on connect and on disconnect triggers to keep it up to date.
You would perhaps also have to add some logic to your applications so that they update that table with your internal application ID. This lets you distinguish main workflow apps/connections from servicing utilities. However, the CURRENT_USER and CURRENT_CONNECTION pair, which the trigger knows and can store in the table, may be enough on its own, if you can infer the kind of application from the user name alone.
Then the ON DISCONNECT trigger could check whether all "main workflow" apps have disconnected and use POST_EVENT to notify the servicing utilities. However, those utilities would still have to shut down the database first anyway.

You can shut down the database using gfix. The gfix tool will try to shutdown the database and if connections still exist after a timeout, the shutdown will fail.
For example, use:
gfix -shut -attach 5 <your-database>
This will:
prevent new connections from being created,
wait 5 seconds for the existing connections to end,
if after 5 seconds there are still active connections the shutdown will abort,
otherwise, after 5 seconds the database will be shut down.
After shutdown, only SYSDBA or the database owner can create a connection to the database. This is only a viable option if your application itself doesn't use the SYSDBA or database owner account.
You bring the database back online using:
gfix -online <your-database>
For more information, see also Gfix - Database Housekeeping: Database Startup and Shutdown
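The shutdown, maintenance and online steps above can be scripted. Below is a minimal Python sketch that shells out to gfix with exactly the switches shown in the answer.

It rests on a few assumptions. It assumes gfix is on the PATH. It assumes credentials come from the ISC_USER/ISC_PASSWORD environment variables; otherwise add -user and -password to the commands. It also assumes that a failed shutdown is reported through a non-zero exit code. The names shutdown_cmd, online_cmd and run_exclusive are hypothetical helpers, not part of any Firebird tool.

```python
import subprocess


def shutdown_cmd(database, timeout_s=5, gfix="gfix"):
    # Same as: gfix -shut -attach 5 <your-database>
    return [gfix, "-shut", "-attach", str(timeout_s), database]


def online_cmd(database, gfix="gfix"):
    # Same as: gfix -online <your-database>
    return [gfix, "-online", database]


def run_exclusive(database, maintenance, timeout_s=5, gfix="gfix"):
    """Shut the database down, run maintenance(), then bring it back online.

    Returns False without calling maintenance() if the shutdown failed,
    i.e. connections were presumably still active after the timeout.
    """
    shut = subprocess.run(shutdown_cmd(database, timeout_s, gfix),
                          capture_output=True, text=True)
    if shut.returncode != 0:
        # Assumption: gfix signals a failed shutdown with a non-zero exit code.
        return False
    try:
        maintenance()
    finally:
        # Always bring the database back online, even if maintenance() raised.
        subprocess.run(online_cmd(database, gfix), check=True)
    return True
```

Because the database is already shut down before maintenance() runs, this ordering also avoids the race between "check for users" and "start work" described in the previous answer.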

Well, it's not an elegant way, but it works:
I try to rename the database file.
If someone is accessing the database, the rename operation gives me an exception saying that the file is in use by some process.
If the rename succeeds, new users will not be able to access the database anymore (the connection string used by my systems is not changed).
I run the exclusive process I have to.
I rename the database file back to its original name, allowing new users to connect again.
I post my solution in the hope that it helps someone facing a similar problem.
The new version of our product will probably be a Web application. The database has not been chosen yet, but it certainly will not be Firebird.
Thanks to all that tried to give me an answer.
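The rename check described in the answer above can be sketched in Python as a context manager. The file is renamed on entry and always renamed back on exit.

This is a sketch under stated assumptions. The helper name database_renamed and the ".maintenance" suffix are hypothetical. The "file in use" signal only works where the OS refuses to rename an open file, as the author observed on Windows, where os.rename raises an OSError (typically PermissionError). On Linux/macOS a rename usually succeeds even while the server holds the file open, so the check is not reliable there.

```python
import os
from contextlib import contextmanager


@contextmanager
def database_renamed(path, suffix=".maintenance"):
    """Temporarily rename the database file so that new connections fail.

    Raises OSError on entry if the rename is refused (on Windows this is
    how an open, in-use database file shows up).
    """
    tmp_path = path + suffix
    os.rename(path, tmp_path)  # fails here if the file is in use (Windows)
    try:
        yield tmp_path  # run the exclusive process against tmp_path
    finally:
        os.rename(tmp_path, path)  # restore the name so users can reconnect
```

Put the with statement inside try/except OSError. If entering it fails, someone is still connected and the process should be postponed.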

Multiple could not receive data from client: Connection reset by peer Postgresql and Resque

I have a server that runs PostgreSQL. In the logs I see this message for my Resque-based 'worker' box, multiple times a minute. Some minutes there isn't a message, in others there could be 10.
2016-01-12 13:40:36 EST:1.1.8.2(33899):[16141]: LOG: could not receive data from client: Connection reset by peer
Now when I go into the 1.1.8.2 box and look at netstat -ntp, I don't see port 33899, and most of the ports are at least in the 40xxx range by now. That may be conjecture, but I'm at a loss to find out why a Redis/Resque/Puma Rails stack would be printing out these messages, let alone what they mean even if I get to the bottom of it.
Will I gain memory back if they are closed 'normally'?
Is this a thing to be wary of?
How does one debug OLD ports that are open when the db box and the worker box both don't display the ports any more?

This message is probably due to the resque worker task not closing the database connection before it exits. It's not a huge problem, but presumably Postgres is doing a little extra work to clean it up, and it makes a mess of your log file...
One solution is to add a hook to your resque worker's task file (the same file that contains the self.perform definition):
def self.after_perform(*args)
  ActiveRecord::Base.connection.disconnect!
end

503 Server Unavailable - Dynamics CRM Web Service down - how to diagnose?

I provide support for a large application across multiple servers. System has been running live for 6+ months.
8th December: total system failure. iisreset across each of the servers sorted it out. Everything back to normal.
Post failure investigation showed various processes not able to get a response from a particular server which hosts an instance of Dynamics CRM (2011 R11). Specifically it seems the SOAP service was not responding (Organization.svc). 503 - Server Unavailable (really it was just the web service). I suspect it died.
Having the exact time of the error I checked the event logs on the server but these did not have anything of use. The last error prior to the failure was a report rendering error which was 9 minutes before the system actually went down. Surely if web service crashed this would be reflected in the event log?
Fast forward to today, 8th January and the system fails again. The 8th of the month again! iisreset fixes it... again!
Again, the event logs are completely useless, showing no errors prior to the failure.
I entertained the idea of enabling Dynamics CRM trace logging, but this is out of the question due to the performance hit.
Apart from the event logs where else to look? Are there possible external factors or causes? I'm trying to find the root cause but have run out of ideas!

While this may not address the source of your problem, maybe it can help minimize the symptoms. May I suggest that you configure the IIS server to recycle the application pool at a scheduled interval within your production environment.
http://technet.microsoft.com/en-us/library/cc753179%28v=ws.10%29.aspx

Mongo cannot find master during data lookups

I am running a large data update using pymongo. To run the updates, individual records are found using collection.find_one(unique criteria), changes are made, the updates are batched, and finally sent in chunks using db.collection.save([long list of records to save])
On my local machine (running 1.6.3), the imports work fine.
On a remote server (running 1.6.0), which is much faster than my local machine, I can get through a portion of the inserts just fine, but then will suddenly get the following error when looking up original records:
connection = Connection(...)
...
raise AutoReconnect("could not find master/primary")
pymongo.errors.AutoReconnect: could not find master/primary
The number of records I can get through varies somewhat, but is not random.
At first I thought I was running into the connection limit. I started closing connections manually after each record lookup:
collection.database.connection.disconnect()
Which didn't solve the problem. Am I on the right track?

So there are a couple of potential issues here:
raise AutoReconnect("could not find master/primary")
pymongo.errors.AutoReconnect: could not find master/primary
That error indicates that the existing connection has somehow been invalidated. There are a number of reasons this could happen.
The most common reason this happens is that the Primary of a Replica Set has stepped down or has failed. In this case your code needs to:
Catch (or trap) the error.
Decide on a retry strategy. (fail? retry once?...)
Are you doing this?
Are you running Replica Sets or Master/Slave?
Do you have any tracking for the performance of these servers?
Are they having network issues?
Are they switching roles?
collection.database.connection.disconnect()
Which didn't solve the problem. Am I on the right track?
Where is the exception "happening"? Is it coming from the connection itself or the save command?
On a remote server (running 1.6.0)
As of this writing, 1.6.0 is a very old version of MongoDB. There were multiple replication bugs fixed in the subsequent 1.6.x versions and 1.7.x versions. (we're already at 1.8.1rc-0)
I would start by looking at what's happening with your servers, but that may well lead you down the upgrade path.

I've encountered this problem in interactive Python usage with pymongo, where I leave the session idle and get AutoReconnect upon returning. I've handled it this way:
import functools
import time

from pymongo.errors import AutoReconnect

MAX_AUTO_RECONNECT_ATTEMPTS = 5


def graceful_auto_reconnect(mongo_op_func):
    """Gracefully handle a reconnection event, retrying with exponential back-off."""
    @functools.wraps(mongo_op_func)
    def wrapper(*args, **kwargs):
        for attempt in range(MAX_AUTO_RECONNECT_ATTEMPTS):
            try:
                return mongo_op_func(*args, **kwargs)
            except AutoReconnect:
                if attempt == MAX_AUTO_RECONNECT_ATTEMPTS - 1:
                    raise  # out of attempts: let the caller see the error
                wait_t = 0.5 * pow(2, attempt)  # exponential back-off: 0.5 s, 1 s, 2 s, 4 s
                time.sleep(wait_t)
    return wrapper


@graceful_auto_reconnect
def some_func_that_does_mongodb_ops():
    ...
YMMV