BIRT sessions stay open in Vertica - Eclipse

I'm having enormous problems managing connections in Vertica when developing BIRT reports. The basic idea is that sessions never die, so I always hit the connection cap. This is, of course, a problem, because then you can't use the database at all unless you do a close_all_sessions() to nuke everyone.
This happens at just about every level of development there is. First, in esProc, when you develop the underlying logic: if there's a bug in your program before the connection.close(), the connection stays open and esProc opens a new one on the next execution. This adds up REALLY quickly when you have a couple of users developing stuff on the network.
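(As an illustration of the defensive pattern that avoids this kind of leak, closing the connection in a construct that runs even when the script blows up, here is a minimal Python sketch using the vertica-python client; esProc's own syntax differs, and the connection parameters are placeholders.)

    # Minimal sketch: guarantee close() even if the query logic raises.
    # vertica-python is used only as an example client; host/credentials are placeholders.
    from contextlib import closing
    import vertica_python

    conn_info = {"host": "vertica-host", "port": 5433,
                 "user": "dbadmin", "password": "...", "database": "reports"}

    with closing(vertica_python.connect(**conn_info)) as conn:
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM some_table")
        print(cur.fetchone())
    # conn.close() has run by this point, even if execute() raised,
    # so a bug in the script cannot leave the session open.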
Next, in Eclipse it's the same thing. You open a report and Eclipse creates a dozen connections that stay open as long as you keep Eclipse open. Then, when you run the report, it creates another bunch of connections, totally ignoring the ones it already has... and if you have bugs in your report, the dozen extras won't close.
Then on our website, same thing... problem running the report, boom, connections won't close EVER. I've had sessions stay open for two weeks with absolutely no activity. They only disappeared when I restarted Tomcat.
I'm at my wit's end here. There doesn't seem to be ANY way to set a session timeout in Vertica, and I don't know where to even begin looking to solve these problems. Everywhere I could find, the connection timeout was set to 20 seconds... so I would expect a connection to disappear after reaching that time, but of course that's not the case.
I really have no idea what to do... and I'm desperate for some help here. Can anyone give me a clue? I've been at this for two days now and my brain just can't take any more.

You want to use a connection pool instead of direct JDBC access; it will do away with the connection issues on Tomcat and improve performance.
Visit this article for more information.
Define the connection pool (CP) in [Tomcat home]/conf/server.xml
Link the CP to web applications in [Tomcat home]/conf/context.xml
Install Apache Probe or something similar on Tomcat; this will help you check that the CP is correctly defined.
In BIRT reports, use the JNDI URL property to link a data source to the CP.
This will solve the problem for the website, but not for the Eclipse designer. Try upgrading to the most recent BIRT and JDBC driver versions.
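As a rough sketch of what the first two steps can look like (assuming Tomcat 8+ with its default DBCP2 pool and the Vertica JDBC driver on Tomcat's classpath; the resource name, credentials and pool sizes below are placeholders):

    <!-- [Tomcat home]/conf/context.xml, or the webapp's META-INF/context.xml -->
    <Resource name="jdbc/vertica"
              auth="Container"
              type="javax.sql.DataSource"
              driverClassName="com.vertica.jdbc.Driver"
              url="jdbc:vertica://vertica-host:5433/mydb"
              username="dbadmin"
              password="changeme"
              maxTotal="20"
              maxIdle="5"
              maxWaitMillis="10000"
              removeAbandonedOnBorrow="true"
              removeAbandonedTimeout="300"/>

The BIRT data source's JNDI URL property would then point at something like java:comp/env/jdbc/vertica. The removeAbandoned* attributes make the pool reclaim connections that a buggy report borrowed and never returned, which is exactly the leak described above.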

Related

Have you ever experienced connection issues with a Postgres database based on just the db name?

I've been idly bashing away at an issue with Postgres for months now. I've a bit of software (custom in-house stuff) that on 24 out of 25 servers does a certain process absolutely fine, no issues whatsoever.
On the 25th server, though, the process wouldn't quite complete properly; it would fail at the final hurdle, which was a simple date change.
It's been a back-burner type issue so I'd not committed much time to working it out until management started to get angsty so I spent most of yesterday bashing away at it.
Obvious checks were done first:
Postgres version (9.6)
Software version
Windows patches (Server 2019)
GPO's
NTFS permissions
etc
All checked out as matching across every server. Went through the Postgres and the in-house software logs at length, had one of the developers build a standalone executable for the process with a ridiculous amount of logging: still no dice. No indicators. Procmon and Wireshark logs showed the same story, nothing clear at all as to what was going on.
So then we take a backup of the database, load it in with a different name for testing and start running the process, only to find that it now works fine on the cloned database. This leads us to think there's maybe a formatting issue of some kind in the database, conscious that doing the backup & restore would shake things around. So we go back to the live database, back it up again, delete the DB from Postgres and restore from the backup.
No dice. Still broken.
Cue some serious confusion. We've done essentially the same thing with cloning live to test and are still getting the same fault at the end of the process.
After some head scratching and more prodding around in the logs I hit upon an idea of doing a fresh backup of the live DB, deleting the database, restoring the backup with a different name and then pointing the live software install to the newly named live DB and testing the process again.
It works!
For clarity, the database names are basic alpha only. Upper case and lower case. No numbers, no symbols. Less than 15 characters in length.
I'm at a loss as to why it's now working and I'd love to get some input from the community.

How to handle db failure at start and runtime

The premises:
I'm using NodeJS and the MongoDB native driver 2.0+. I have created one connection (pool) which is kept open and reused in all modules. When production ready, the app and the db will be hosted on the same server, probably a VPS.
Questions about MongoDB in a production-ready environment:
Should I anticipate that mongod can crash? And if so, should I have some kind of autostart for it? Can "bugs" in my code be responsible for mongod crashing?
If I should anticipate crashes, this probably also means that I should anticipate disruptions in my app when the db can't be reached. What is the proper way to handle these? How long should I expect these disruptions to be?
If I start my app, manually close mongod and initiate a route call, the default response seems to be "waiting/loading" until mongod is up again. I guess this default behaviour is OK if I don't expect disruptions to last more than a few seconds?
If mongod is not up when starting the app, an exception is thrown and the app will not start. This seems fine, because without the db the app can't do anything. Or should this be handled in another way?
I have searched extensively for this online, but have not found anything useful. Maybe I don't know what search terms to use...
There are a lot of questions crammed into one post here, but I hope someone can give me some answers or provide me with some links to good reading. The big question is: how do I handle db failure at start and at runtime?
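For what it's worth, the fail-fast-at-startup behaviour described above can also be made explicit. The sketch below is in Python with pymongo purely for illustration (the question uses the Node native driver, but the idea is the same); host, port and timeout are placeholders.

    # Refuse to start the app if MongoDB cannot be reached within a few seconds.
    from pymongo import MongoClient
    from pymongo.errors import ServerSelectionTimeoutError

    def connect_or_die():
        # Keep the server selection timeout short so startup fails fast
        # instead of blocking indefinitely.
        client = MongoClient("mongodb://localhost:27017",
                             serverSelectionTimeoutMS=5000)
        try:
            client.admin.command("ping")  # forces a real round trip to mongod
        except ServerSelectionTimeoutError:
            raise SystemExit("MongoDB is unreachable; refusing to start")
        return client

    client = connect_or_die()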

What are the limitations of the Flask built-in web server

I'm a newbie in web server administration. I've read multiple times that Flask's built-in web server is not designed for "production" and must be used only for testing and debugging...
But what if my app only touches a thousand users who occasionally send data to the server?
If it works, when will I have to bother with the configuration of a more sophisticated web server? (I am looking for approximate metrics.)
In a nutshell, I would love to know what the built-in web server can do (with approximate thresholds) and what it cannot.
Thanks a lot !
There isn't one right answer to this question, but here are some things to keep in mind:
With the right amount of horizontal scaling, it is quite possible you could keep scaling out use of the debug server forever. When exactly you would need to start scaling (or switch to using a "real" web server) would also depend on the environment you are hosting in, the expectations of the users, etc.
The main issue you would probably run into is that the server is single-threaded by default. It handles requests one at a time, serially, so if you are trying to serve more than one request (including favicons and static items like images, CSS and JavaScript files), the requests queue up and take longer. If any given request happens to take a long time (say, 20 seconds), your entire application is unresponsive for that time (20 seconds). This is only the default, of course: you could bump the thread count (or have requests handled in other processes), which might alleviate some issues. But even then it can still be slow under a "high" load, and what counts as "high" will depend on your application and the maximum response time your users will accept.
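For instance, with a throwaway Flask app (the app below is a stand-in, not anything from the question), the development server can at least be told to use a thread per request; this softens the problem but does not make it production-grade:

    # Hypothetical minimal app, for illustration only.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "hello"

    if __name__ == "__main__":
        # threaded=True makes the development server handle each request in its
        # own thread instead of strictly one at a time; processes=N is the
        # multi-process alternative (the two options are mutually exclusive).
        app.run(host="127.0.0.1", port=5000, threaded=True)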
Another issue is security: if you are concerned at ALL about security (and not just the security of the data in the application itself, but the security of the box that will be running it as well) then you should not use the development server. It is not ready to withstand any sort of attack.
Finally, the development server could just fail outright. It is not designed to be used as a long-running process (days, weeks, months), and so it has not been well tested to work in this capacity.
So, yes, it has limitations. Yes, you could still conceivably use it in production. And yes, I would still recommend using a "real" web server. If you don't like the idea of needing to install something like Apache or Nginx, you can go with a solution that is just as easy as "run a Python script" by using one of the standalone WSGI servers, which can run a server designed for production with something as simple as python run_app.py on the command line. You typically just need to create a 4-5 line Python script to import and create the server object, point it at your Flask app, and run it.
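For example, a launcher script using waitress (one of several standalone WSGI servers; "myproject" and "app" are placeholder names) can be as short as:

    # run_app.py: minimal sketch of the "run a python script" approach,
    # using waitress as one example of a standalone WSGI server.
    from waitress import serve
    from myproject import app  # your Flask application object

    if __name__ == "__main__":
        serve(app, host="0.0.0.0", port=8080)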
gunicorn could be run with only the following on the command line, no extra script needed:
gunicorn myproject:app
...where "myproject" is the Python package that contains the app Flask object. Keep in mind that one of developers of gunicorn would probably recommend against this approach. See https://serverfault.com/questions/331256/why-do-i-need-nginx-and-something-like-gunicorn.
The OP has long since moved on, but for those who encounter this question in the future I would just add that setting up an Apache server, even on a laptop, is free and pretty easy. It can be readily configured for as few or as many features as you want just by uncommenting or commenting out lines in the config file. There might be an even easier GUI method for doing that nowadays, but just editing the configs is simple.

How should I determine what is issuing a flush_all command

We have a memcached server that is shared by about two dozen apps. One of the web apps (or perhaps one of our utility apps) is issuing a flush_all command periodically. The frequency seems random, or at least we haven't seen a pattern yet. It happens about 10 times an hour.
Here's the rub: I can't figure out a good way to work out which app is doing this. The memcached logs are not helpful at all. Here's what I've done so far:
* grep all source code - Other than memcached libraries I can't see anywhere where we issue this command.
* Enable verbose logging (-vv) in memcached - I see the commands get issued, but the log doesn't show any information about where the command is being issued from.
* Research how to administratively disable this; without an unapproved source patch to memcached I can't figure out a good way to do it.
Has anyone else had this problem? I'm assuming that this is coming from one of our web apps, but it's possible it's from somewhere else too. Any suggestions?
My next step is to set up a second memcached server and move applications over one by one (which will be slow and time-consuming). There must be a better way.
A little late, but in case anyone else hits this...
I'd suggest you set up multiple memcache proxies and configure each application to use a different one. The first proxy I found was twemproxy; I have no idea how good it is.
After that you can use the logs for the proxy to identify which application is issuing the commands.
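To make the routing concrete: each application's cache client just gets pointed at its own proxy endpoint instead of the shared memcached, so any flush_all shows up in that proxy's log. The sketch below uses pymemcache as an example Python client; hostnames and ports are placeholders.

    # App B is configured to talk to "its" proxy on port 11212
    # (app A would use 11211, and so on); the proxy forwards to the real memcached.
    from pymemcache.client.base import Client

    cache = Client(("memcache-proxy.internal", 11212))
    cache.set("greeting", "hello", expire=60)
    print(cache.get("greeting"))  # returns b"hello" (values come back as bytes)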

Crystal Reports hanging

The company has recently implemented software not written by us. The software uses Crystal Reports, and whenever somebody draws a particularly large report and closes their browser before the report has finished loading, we cannot draw any more reports. The only way to fix it is to reset IIS, which is obviously exceptionally bad practice.
Any ideas on how to overcome this?
Thanks
So if one person closes their browser prematurely, the app breaks for everyone? Can two people try loading one of these long-running reports at once? Are there multiple templates, and does this break only one and leave the others OK?
It sounds a bit like the app's implementation of Crystal is holding an exclusive lock on the original template, and so when the user quits prematurely the app doesn't release the template for other users to use.
If it's a SQL Server database it is pulling data from, you could kill the SPID on the SQL Server; that may allow the CR process to exit more gracefully. If you're using IIS 6, you could configure the worker process to recycle automatically after a fixed number of requests or a time frame. Creating multiple worker processes may help as well.
I wonder why it is hanging though, will it succeed if you wait long enough for the prior query and the current one to finish?
Finding a way to speed up the query would be a good idea too; or have large reports run off-hours and delivered to the users.