The company has recently implemented software not written by us. The software uses Crystal Reports and whenever somebody draws a particularly large report and close their browser before the report is finished loading, we cannot draw anymore reports. The only way to fix it is to reset iis which is obviously exceptionally bad practice.
Any ideas on how to overcome this?
Thanks
So if one person closes their browser prematurely, the app breaks for everyone? Can two people try loading one of these long-running reports at once? Are there multiple templates, and this only breaks one and leaves the others ok?
It sounds a bit like the app's implementation of Crystal is holding an exclusive lock on the original template, and so when the user quits prematurely the app doesn't release the template for other users to use.
If it's a SQL server it is pulling data from, you could kill the SPID on the SQL server, that may allow the CR process to exit more gracefully; if you're using IIS6, you could configure a worker process to cycle automatically after a fixed number of requests or a time frame. Creating multiple worker processes may help also.
I wonder why it is hanging though, will it succeed if you wait long enough for the prior query and the current one to finish?
Finding a way to speed up the query would be a good idea too; or have large reports run off-hours and delivered to the users.
Related
I'm having enormous problems managing connections in Vertica when developing BIRT reports. The basic idea is that sessions never die, so I always hit the connection cap. This is, of course, a problem, because then you can't use the database at all unless you do a close_all_sessions() to nuke everyone.
This happens at just about every level of development there is. First, in Esproc, when you develop the underlying logic... if there's a bug in your program before the connection.close(), the connection stays open and Esproc opens up a new one next execution. This adds up REALLY quickly when you have a couple of users developing stuff on the network.
Next, in Eclipse it's the same thing. You open a report and Eclipse creates a dozen connections that'll stay as long as you keep Eclipse open. Then, when you run the report, it'll create another bunch of connection, totally ignoring the ones it already has... and if you have bugs in your report, the dozen extras won't close.
Then on our website, same thing... problem running the report, boom, connections won't close EVER. I've had sessions stay open for two weeks with absolutely no activity. They only disappeared when I restarted Tomcat.
I'm at my whit's end here. There doesn't seem to be ANY way to set a session timeout in Vertica and I don't even know where to even begin looking to solve these problems. Everywhere I could find, the connection timeout was set to 20 seconds... so I would expect a connection to disappear after reaching that time, but of course that's not the case.
I really have no idea what to do here... and I'm desperate for some help here. Can anyone give me a clue? I've been at this for two days now and my brain just can't take anymore.
You want to use a connection pool instead of a direct JDBC access, it will blow away connection issues on Tomcat and improve performances.
Visit this article for more informations.
Define the connection pool (CP) in [Tomcat home]/conf/server.xml
Link the CP to web applications in [Tomcat home]/conf/context.xml
Install Apache Probe or something similar on Tomcat, this will help to test if the CP is correctly defined.
in BIRT reports, use JNDI URL property to link a datasource to the CP
This will solve the problem for the website, but not for Eclipse designer though. Try to upgrade to the most recent birt & jdbc versions.
I'm developing a website (using a LAMP stack) which must handle many user-made scheduling tasks. It works as following: an user creates an event and sets a date, and others users (as many as 63) may join. A few hours before the set date, the system must email each user subscribed to that event. And that's it.
However, I have never handled scheduling, and the only tools I know (poorly) are cron and at. My plan is to create an at job for each event, which will call a script that gets all subscribers emails and mails them.
My question is: is my plan/design good? Is it scalable? Are there better options that I should be aware of?
Why a separate cron job for each event? I've done something similar thing for a newsletter with a cron job just running once per hour and if there are any newsletters to be sent it just handles them. In your case you'd have a script that runs once every hour and gets a list of users for events that happen in the desired time interval since.
It will work. As far as scalability, at the minimum make sure that the script runs in it's own process so it doesn't bog down the server unnecessarily.
Create a php-cli script perhaps?
I'm doing most of my work in Rails nowadays, and there's a wealth of background processing libraries one of them is Resque it uses the redis server to keep track of the jobs
I found a PHP clone https://github.com/chrisboulton/php-resque
Might be overkill for your use case, but give it a shot perhaps
If you would consider a proper framework that uses an application server (and not a simple webserver), Spring has a task scheduling layer that's simple to use. Scheduling jobs on the server really requires more than what a simple LAMP install can do, but I haven't used PHP in a while so maybe there's an equivalent.
Here's an article that compares some of your options.
So we run a downline report. That gathers everyone in the downline of the person who is logged in. Some people of clients run this with no problem as it returns less than 100 records.
Some people of clients however returns 4,000 - 6,000 rows which comes out to be about 8 MB worth of information. I actually had to up my buffer limit on my development machine to handle the large request.
What are some of the best ways to store this large piece of data and help prevent it from being run multiple times consecutively?
Can it be stored in a cookie?
Session is out of the question as this would eat up way to much memory on the server.
I'm open to pretty much anything at this point, trying to better streamline the old process into a much quicker efficient one.
Right now what is done, is it loads the entire recordset, it loops through the recordset building out the data into return_value cells.
Would this be better to turn into a jquery/ajax call?
The only main requirements are:
classic asp
jquery/javascript
T-SQL
Why not change the report to be paged? Phase 1: run the entire query, but the page only displays the right set of rows based on selected page. Now your response buffer problem is fixed. Phase 2: move the paging into the query using Row_Number(), now your database usage problem is fixed. Phase 3: offer the user an option of "display to screen" (using above) or "export to csv" where you can most likely export all the data, since csv is nice and compact.
Using a cookie seems unwise, given the responses to the question What is the maximum size of a web browser's cookie's key?.
I would suggest using ASP to create a file on the Web server and writing the data to that file. When the user requests the report, you can then determine if "enough time" has passed for it to be worth running the report again, or if the cached version is sufficient. User's login details could presumably be used for naming the file, or the Session.SessionID, or you could store something new in the user's session. Advantage of using their login would be that your cache of the report can exist longer than a user's session.
Taking Brian's Answer further, query page count, which would be records returned / items per page rounded up. Then join the results of every page query on client side. Pages start at a offset provided through the query. Now you have the full amount on the client without overflowing your buffer. And it can be tailored to an interface and user option (display x per page).
Our shop has developed a few WEB/SMS/DB solution for a dozen client installations. The applications have some real-time performance requirements, and are just good enough to function properly. The problem is that the clients (owners of the production servers) are using the same server/database for customizations that are causing problems with the performance of the applications that we created and deployed.
A few examples of clients' customizations:
Adding large tables with many text datatypes for the columns that get cast to other data types in the queries
No primary keys, indexes, or FK constraints
Use of external scripts that use count(*) from table where id = x, in a loop from the script, to determine how to construct more queries later in the same script. (no bulk actions that the planner can optimize or just do everything in a single pass)
All new code files on the server are created/owned by root, with 0777 permissions
The clients don't take suggestions/criticism well. If we just go ahead and try to port/change the scripts ourselves, the old code can come back, clobbering any changes that we make! Or with out limited knowledge of their use cases, we break functionality while trying to optimize their changes.
My question is this: how can we limit the resources to queries/applications other that what we create and deploy? Are there any pragmatic options in scenarios like this? We prided ourselves in having an OSS solution, but it seems that it's become a liability.
We use PG 8.3 running on a range on Linux Distos. The clients prefer php, but shell scripts, perl, python, and plpgsql are all used on the system in one form or another.
This problem started about two minutes after the first client was given full access to the first computer, and it hasn't gone away since. Anytime someone whose priorities are getting business oriented work done quickly they will be sloppy about it and screw up things for everyone. That's just how things work, because proper design and implementation are harder than cheap hacks. You're not going to solve this problem, all you can do is figure out how to make it easier for the client to work with you than against you. If you do it right, it will look like excellent service rather than nagging.
First off, the database side. There's now way to control query resources in PostgreSQL. The main difficulty is that tools like "nice" control CPU usage, but if the database doesn't fit in RAM it may very well be I/O usage that is killing you. See this developer message summarizing the issues here.
Now, if in fact it's CPU the clients are burning through, you can use two techniques to improve that situation:
Install a C function that changes the process priority (example 1, example 2) and make sure whenever they run something it gets called first (maybe put it into their psql config file, there are other ways).
Write a script that looks for postmaster processes spawned by their userid and renice them, make it run often in cron or as a daemon.
It sounds like your problem isn't the particular query processes they're running, but rather other modifications they're making to the larger structure. There's only one way to cope with that: you have to treat the client like they're an intruder and use the approaches of that portion of the computer security field to detect when they screw things up. Seriously! Install an intrusion detection system like Tripwire on the server (there are better tools, that's just the classic example), and have it alert you when they touch anything. New file that's 0777? Should jump right out of a proper IDS report.
On the database side, you can't directly detect the database being modified usefully. You should do a pg_dump of the schema every day into a file (pg_dumpall -g and pg_dump -s, then diff that against the last one you delivered and again alert you when it's changed. If you manage that this well, the contact with the client turns into "we noticed you changed on the server...what is it you're trying to accomplish with that?" which makes you look like you're really paying attention to them. That can turn into a sales opportunity, and they may stop fiddling with things as much just knowing you're going to catch it immediately.
The other thing you should start doing immediately is install as much version control software as you can on each client box. You should be able to login to each system, run the appropriate status/diff tool for the install, and see what's changed. Get that mailed to you regularly too. Again, this works best if combined with something that dumps the schema as a component to what it manages. Not enough people use serious version control approaches on the code that lives in the database.
That's the main set of technical approaches useful here. The rest of what you've got is a classic consulting client management problem that's far more of a people problem than a computer one. Cheer up, it could be worse--FSM help you if you give them ODBC access and they discover they can write their own queries in Access or something simple like that.
I’m about to write a workflow in CRM that calls itself every day. This is a recursive workflow.
It will run on half a million entities each day and deactive the record if it was not been upodated in the past 3 days.
I’m worried about performance has anyone else done this.
I haven't personally implemented anything like this, but that's 500,000 records that are floating around in the DB that the async service has to keep track of, which is going to tax your hardware. In addition, CRM keeps track of recursive workflow instances. I don't have the exact specs in front of me, but if a workflow calls itself a set number of times within a certain timeframe, CRM will kill the workflow.
Could you just write a console app that asks the Crm Service for records that haven't been updated in three days, and then deactivate them? Run it as a scheduled task once a day, and then your CRM system doesn't have the burden of keeping track of all those running workflow instances.
EDIT: Ah, I see now you might have been thinking of one workflow that runs on all the records as opposed to workflows running on each record. benjynito's advice makes sense if you go this route, although I still think a scheduled task would be more appropriate than using workflow.
You'll want to make sure your workflow is running in non-peak hours. Assuming you have an on-premise installation you should be able to get away with that. If you're using a hosted instance, you might be worried about one organization running the workflow while another organization is using the system. Use the timeout and maybe a custom workflow activity, if necessary, to force the start time to a certain period.
I'm assuming you'll be as efficient as possible in figuring out which records to deactivate. (i.e. Query Expression would only bring back the records you'll be deactivating).
The built-in infinite loop-protection offered by CRM shouldn't kill your workflow instances. It stops after a call depth of 8, but it resets to 1 if no calls are made for an hour. So the fact that you're doing this once a day should make you OK on the recursive workflow front.