Perl Website with Dancer2 - how can I log user activity, history, etc.?

We have a Perl web interface that I am slowly converting to Dancer2 and PSGI, replacing our slow old plain-vanilla CGI model.
In our old model, we stored everything in sessions: the history of what users did, the call stacks, the data inputs, and so on. You get the idea.
We don't want to do it that way anymore, so that we can keep sessions small and efficient. BUT we'd still like to log what users have been doing, so that when an error gets reported we can see what they did to get to it and what input(s) they entered.
I looked at the logging section of the Dancer2 documentation, but it doesn't quite do what we need: it would only record Dancer2's own messages plus whatever messages I add.
Dancer2::Logger, which I also found, doesn't quite cut it either.
What other libraries could I use to do what I need? I seriously doubt that Perl does NOT have something that does this, so...

Just off the top of my head, I can think of Log::Log4perl and Log::Dispatch, though there are myriad others.
You can use them to write your own log files, separate from Dancer's log.
As for which is best: most logging modules have a similar API for the logging calls themselves, but differ in how they are instantiated at run time and in their configuration syntax. So read the docs for a few of them and maybe try a couple on for size.
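To make the "separate log file" idea concrete, here is a minimal sketch using only core Perl (JSON::PP, Fcntl, POSIX) that appends one JSON line per request to its own activity log. The `log_activity` helper, the file name, and the field names are hypothetical; in a real app you would more likely hand the record to Log::Log4perl or Log::Dispatch, and you should scrub passwords and similar fields before logging parameters.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use JSON::PP ();
use POSIX qw(strftime);

# Canonical mode gives a stable key order, which makes lines easy to grep/diff.
my $json = JSON::PP->new->canonical(1);

# Hypothetical helper: append one JSON record per user action to a file
# kept separate from Dancer2's own log, so the session can stay small.
sub log_activity {
    my ($file, %event) = @_;
    $event{time} //= strftime('%Y-%m-%dT%H:%M:%SZ', gmtime);
    open my $fh, '>>', $file or die "Cannot open $file: $!";
    # Several PSGI workers may append at once, so take an exclusive lock.
    flock $fh, LOCK_EX or die "Cannot lock $file: $!";
    print {$fh} $json->encode(\%event), "\n";
    close $fh or die "Cannot close $file: $!";
    return;
}

# In a Dancer2 app this could be called from a before hook, e.g.:
#   hook before => sub {
#       log_activity('activity.log',
#           user   => session('user'),
#           method => request->method,
#           path   => request->path,
#           params => { params() },
#       );
#   };
```

Because each line is self-describing JSON, reconstructing what a user did before an error becomes a matter of filtering the file by user rather than inflating the session.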

Related

Tricky Issue Handling File Upload in Perl

We use CGI.pm to help us handle file uploading on our website and through our API which is used by our Android & iPhone Apps. We recently noticed that CGI.pm seems to be returning no params for almost 50% of the files being uploaded via our iPhone App. We haven't seen a similar issue with those files being uploaded via our website.
We can't replicate the problem in testing but in production the cgi_error() method of CGI.pm isn't reporting any errors in those cases where the CGI.pm params are missing. We have confirmed that the iPhone App is always including the correct params when POSTing the files for upload.
Quick background on the setup: we have the application deployed on Amazon EC2 servers, load balanced with Amazon Elastic Load Balancers. We also set $CGI::POST_MAX=(1024*100000); so the POST max size is about 100MB, and we have confirmed that all uploads are under this limit.
I'm not sure where to go next. Any ideas on what the issue might be and how to resolve it would be greatly appreciated. Ideas on how to identify the root of the issue, so we can start troubleshooting, would also be helpful.
Thanks in advance for your help!
The loss of params with no error logged is exactly the symptom of the CGI module encountering an error while processing the POST data, such as the POST exceeding $CGI::POST_MAX. Are you using CGI.pm in function-oriented mode by calling param(), or in object-oriented mode by calling $cgi->param()? Regarding cgi_error(), perldoc CGI warns: "When using the function-oriented interface, errors may only occur the first time you call param(). Be ready for this!"
As for debugging, if you suspect CGI.pm is masking errors from you, try looking at the CGI object before doing anything else:
use CGI;
use Data::Dumper;
my $cgi = CGI->new();
warn Dumper($cgi);
warn 'cgi_error: ', $cgi->cgi_error, "\n" if $cgi->cgi_error;
Within the dump of the CGI object you would see an error like this: '.cgi_error' => '413 Request entity too large' - which is what cgi_error() would return for POST_MAX exceeded.
Also, if you are using mod_perl, be aware that CGI can hold onto values such as $CGI::POST_MAX between requests to different apps. (But since you are setting POST_MAX in yours, this wouldn't appear to be your problem.)
We too are seeing this same behaviour with CGI.pm, although we had thought the problem was restricted to IE. We solved it by setting $CGI::POST_MAX=5000000, which is overkill, as it is only a 50k file being passed back and forth.
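Along the same debugging lines, it can help to record what the raw request looked like before CGI.pm reads STDIN, so that a silently dropped upload leaves a trace. The following core-Perl sketch is hypothetical: the `upload_diagnostics` name and its heuristics are assumptions, not part of CGI.pm. It only inspects the CGI environment variables and compares CONTENT_LENGTH against whatever POST_MAX you set (CGI.pm's default of -1 means "no limit").

```perl
use strict;
use warnings;

# Hypothetical pre-parse check: summarize the raw request environment
# before CGI.pm consumes STDIN, flagging a few cheap warning signs.
sub upload_diagnostics {
    my ($env, $post_max) = @_;
    my $type   = $env->{CONTENT_TYPE};
    my $length = $env->{CONTENT_LENGTH};
    my @notes = (
        'method=' . ($env->{REQUEST_METHOD} // '-'),
        'type='   . ($type   // '-'),
        'length=' . ($length // '-'),
    );
    if (!defined $length || $length !~ /^\d+\z/) {
        push @notes, 'WARNING: missing or non-numeric CONTENT_LENGTH';
    }
    elsif ($post_max >= 0 && $length > $post_max) {
        push @notes, "WARNING: exceeds POST_MAX ($post_max)";
    }
    # A multipart body cannot be split into fields without its boundary.
    if (($type // '') =~ m{^multipart/form-data}i && $type !~ /boundary=/i) {
        push @notes, 'WARNING: multipart request without boundary';
    }
    return join ' ', @notes;
}

# Usage in the upload script, before CGI->new:
#   warn upload_diagnostics(\%ENV, $CGI::POST_MAX), "\n";
```

If the logged values for the failing iPhone uploads are missing, malformed, or larger than expected, the problem is probably upstream of CGI.pm's parameter handling, which narrows down where to look.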

Recreate a site from a tcpdump?

It's a long story, but I am trying to save an internal website from the pointy-haired bosses, who see no value in it anymore and will be flicking the switch at some point in the future. I feel the information it contains is important and future generations will want to use it. No, it's not some adult site, but since it's a big corp, I can't say any more.
The problem is that the site is a mess of ASP and Flash that only works under IE7, is buggy under IE8, and is even 32-bit only. All the URLs are session-style gibberish. The Flash objects themselves pull extra information with GET requests to ASP objects. It's really poorly designed for scraping. :)
So my idea is to run tcpdump as I navigate the entire site, then somehow dump the result of every GET into a SQL database. Then, with a little messing with the hosts file, redirect every request to a CGI script that looks up the matching GET request in the database and returns the data. The entire site would then live in a SQL database as URL/data key-value pairs. A flat file may also work.
In theory, I think this is the only way to go about it. The only problem I see is if they do some client-side ActiveX/Flash stuff that generates session URLs that differ each time.
Anyway, I know Perl, and the idea seems simple with the right modules, so I think I can do most of the work in it, but I am open to other ideas before I get started. Maybe this exists already?
Thanks for any input.
To capture, I wouldn't use tcpdump, but either the crawler itself or a web proxy that can be tweaked to save everything, e.g. Fiddler, Squid, or mod_proxy.
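Whichever way the traffic is captured, the replay side of the plan in the question boils down to a lookup from a normalized request key to the saved response body. Here is a minimal core-Perl sketch; the `%IGNORE` list of volatile query parameters and the in-memory `%archive` hash are assumptions (a real archive would live in SQL or flat files, as the question suggests), and the helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical names of query parameters that change on every visit
# (session tokens, cache busters); adjust after inspecting real captures.
my %IGNORE = map { lc($_) => 1 } qw(sessionid sid _);

# Build a stable key: lower-cased host, path as-is, and the remaining
# query parameters sorted so that parameter order does not matter.
sub normalize_key {
    my ($url) = @_;
    my ($host, $path, $query) =
        $url =~ m{^https?://([^/?#]+)([^?#]*)(?:\?([^#]*))?}i
        or return $url;
    $path = '/' if $path eq '';
    my @pairs = grep { length && !$IGNORE{ lc((split /=/, $_, 2)[0]) } }
                split /[&;]/, ($query // '');
    my $key = lc($host) . $path;
    $key .= '?' . join('&', sort @pairs) if @pairs;
    return $key;
}

# Store and look up captured bodies by normalized key.
my %archive;
sub save_response { my ($url, $body) = @_; $archive{ normalize_key($url) } = $body; return }
sub replay        { my ($url) = @_; return $archive{ normalize_key($url) } }
```

Dropping session parameters from the key is what lets a replayed request with a fresh, different session token still find the page captured earlier; it will not help if the session data is baked into the path itself.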

How can I communicate across Perl CGI scripts?

I am searching for efficient ways of communicating between two Perl scripts. I have two scripts: Script 1 generates some data, and I want Script 2 to be able to access that information.
The easiest/dumbest way is to write the data generated by Script 1 to a file and read it later using Script 2. Is there any other way? Can I store the data in memory and make it available to Script 2 (with support from my Linux system, of course)? Meaning malloc some data in Script 1 and make Script 2 able to access it.
There is no guarantee that Script 2 will be run after Script 1, so there should be some way to free that memory using a watchdog timer.
Let me reveal some more context. I am running these scripts on a web server using CGI Perl. At the click of a button, Script 1 runs and generates an HTML page. The user can then add some inputs to this generated page and click a button on it. Now Script 2 should be able to read the data on the new page. I could post the data back to the web server again, but a more efficient way would be to keep a copy of the generated page on the server as well and make it available to Script 2. I would like to avoid writing the generated page to a file; I was thinking of storing it in memory.
This depends somewhat on your usage. One large set of data? Many small messages? Do you care at all about data persistence? Is it TOTALLY asynchronous?
Some of the options are:
For all but the highest-performance web sites, the best approach is to write out the HTML pages to files! Unless inter-process communication is benchmarked to be the performance bottleneck, don't bother with any of the non-file solutions (shared memory, cache, intermediate server).
Specifically for two CGI scripts on the same server: if you run them under mod_perl or some other arrangement that shares a Perl interpreter between the two CGI processes, you can develop a package to serve as a cache. Its package-level variables would be preserved in memory by mod_perl for as long as mod_perl is running, so a writer CGI process and a reader CGI process could use it to communicate. Of course, the usual synchronization/deadlock and persistence issues associated with readers and writers need to be considered.
As an alternative, use Apache::Session sessions to store inter-session data.
As you noted, shared memory. For example use IPC::ShareLite, IPC::Cache, or this solution from perlmonks.
Also, please check Chapter 16 Recipe 12 "Sharing Variables in Different Processes" from O'Reilly's "Perl Cookbook" (no link since non-pirated versions aren't online anywhere I know of)
Use a permanent medium. A file is one option. A database is another.
For async, use an intermediate messaging system (MQ, Tibco, or something more lightweight). Probably a bit of overkill in this scenario, but a valid option to be aware of. This one is likely to be pretty stable, solid, and optimized, but possibly not free and less flexible/tailored.
Or roll your own simple messaging server; it's not THAT complicated for the very simple one you seem to need.
Listen on one port for requests from the first process to store data, listen on another port for requests from the consumer process to send it that data, keep the data in memory, and purge it when it expires (using alarms or a separate watcher child process).
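Following the advice to simply write the pages to files, a minimal core-Perl sketch of the handoff might look like this. The spool directory, the `page*.html` naming scheme, passing the token in a hidden form field, and the helper names are all assumptions; the purge routine plays the role of the watchdog timer the question asks for.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp ();

# Hypothetical spool directory shared by both CGI scripts; it must be
# writable by the web server user and not be served to browsers.
my $SPOOL = File::Spec->catdir(File::Spec->tmpdir, 'page-handoff');

# Script 1: save the generated page and return a random token to embed
# in the form (e.g. a hidden field) so Script 2 can find the page again.
sub save_page {
    my ($html, $dir) = @_;
    $dir //= $SPOOL;
    -d $dir or mkdir $dir or die "Cannot create $dir: $!";
    my $tmp = File::Temp->new(DIR => $dir, TEMPLATE => 'pageXXXXXXXX',
                              SUFFIX => '.html', UNLINK => 0);
    print {$tmp} $html;
    close $tmp or die "Cannot write page: $!";
    my (undef, undef, $name) = File::Spec->splitpath($tmp->filename);
    my ($token) = $name =~ /^page(\w+)\.html\z/;
    return $token;
}

# Script 2: fetch the page by token; reject anything that is not a plain
# word so a crafted token cannot point outside the spool directory.
sub load_page {
    my ($token, $dir) = @_;
    $dir //= $SPOOL;
    return unless defined $token && $token =~ /^\w+\z/;
    my $file = File::Spec->catfile($dir, "page$token.html");
    open my $fh, '<', $file or return;
    local $/;    # slurp the whole file
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Watchdog: run from cron (or at the start of either script) to delete
# pages older than $max_age seconds whose user never came back.
sub purge_old_pages {
    my ($max_age, $dir) = @_;
    $dir //= $SPOOL;
    my $removed = 0;
    for my $file (glob File::Spec->catfile($dir, 'page*.html')) {
        my $mtime = (stat $file)[9] or next;
        $removed += unlink $file if time() - $mtime > $max_age;
    }
    return $removed;
}
```

The token travels with the form, so nothing else has to remember which page belongs to which user, and the purge step keeps abandoned pages from piling up.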
You've tagged your question as "cgi". Are they both CGI programs? In that case, they can just talk to each other by making HTTP requests.
However, you'll have to tell a lot more about why you are trying to do this and what you need to accomplish for us to help you. It's certainly easy for Perl programs to communicate with each other in some fashion, but that doesn't mean it's the right answer for you.
When you have complex requirements for interaction among CGI programs, you probably want to move to a web framework that handles a lot of those details for you. Catalyst might be where you'd want to start. There's even a book for it.

Perl application move causing my head to explode...please help

I'm attempting to move a web app we have (written in Perl) from an IIS6 server to an IIS7.5 server.
Everything seems to be parsing correctly, I'm just having some issues getting the app to actually work.
The app is basically a couple forms. You fill the first one out, click submit, it presents you with another form based on what checkboxes you selected (using includes and such).
I can get past the first form once... but then after that it stops working and pops up the generated error message. After looking into the code and such, it basically states that there aren't any checkboxes selected.
I know the app writes data into .dat files... (at what point, I'm not sure yet), but I don't see those being created. I've looked at file/directory permissions and seemingly I have MORE permissions on the new server than I did on the last. The user/group for the files/dirs are different though...
Would that have anything to do with it? Why would it pass me on to the next form, displaying the correct "modules" I checked the first time and then not any other time after that? (it seems to reset itself after a while)
I know this is complicated so if you have any questions for me, please ask and I'll answer to the best of my ability :).
Btw, total idiot when it comes to Perl.
EDIT AGAIN
I've removed the source so as not to reveal any security vulnerabilities... Thanks for pointing that out.
I'm not sure what else to do to show exactly what's going on with this though :(.
I'd recommend verifying, step by step, that what you think is happening is really happening. Start by watching the HTTP request from your browser to the web server - are the arguments your second perl script expects actually being passed to the server? If not, you'll need to fix the first script.
(start edit)
There are lots of tools to watch the network traffic.
Wireshark will read the traffic as it passes over the network (you can run it on the sending or receiving system, or any system on the collision domain).
You can use a proxy server, like WebScarab (free), Burp, Paros, etc. You'll have to configure your browser to send traffic to the proxy server, which will then forward the requests to the server. These particular servers are intended to aid testing, in that you'll be able to mess with the requests as they go by (and much more).
As Sinan indicates, you can use browser add-ons like LiveHTTPHeaders or Tamper Data for Firefox, or Internet Explorer's developer tools (IIRC).
(end edit)
Next, you should print out all CGI arguments that the second perl script receives. That way, you'll know what the script really thinks it gets.
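If you want to see the raw form data independently of how the script parses it, a minimal core-Perl decoder like the one below shows every value, including repeated checkbox names. `decode_form` is a hypothetical debugging helper, not a replacement for CGI.pm; note that reading STDIN yourself consumes the POST body, so CGI.pm cannot read it afterwards.

```perl
use strict;
use warnings;

# Debugging stand-in for CGI.pm's parameter parsing: decode an
# application/x-www-form-urlencoded string into name => [values].
# Checkboxes that share a name arrive as repeated pairs, so keep every value.
sub decode_form {
    my ($raw) = @_;
    my %params;
    for my $pair (split /[&;]/, $raw // '') {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        for ($name, $value) {    # aliases, so edits apply to both variables
            $_ //= '';
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        push @{ $params{$name} }, $value;
    }
    return \%params;
}

# In the second script, before anything else touches STDIN:
#   my $raw = ($ENV{REQUEST_METHOD} // '') eq 'POST'
#       ? do { read STDIN, my $buf, $ENV{CONTENT_LENGTH} // 0; $buf }
#       : $ENV{QUERY_STRING};
#   my $p = decode_form($raw);
#   warn "$_ = [@{ $p->{$_} }]\n" for sort keys %$p;
```

If the script already uses CGI.pm, listing the names from `$q->param` and each name's values (via `$q->multi_param($name)` in recent CGI.pm versions, or `$q->param($name)` in list context on older ones) gives the same view without touching STDIN.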
Then, you can enable verbose logging in IIS, so that it logs the full HTTP request.
This will get you closer to the source of the problem - you'll know if it's (a) the first script not creating correct HTML, resulting in an incomplete HTTP request from the browser, (b) the IIS server not receiving the CGI arguments for some odd reason, or (c) the arguments aren't getting from the IIS server and into the perl script (or, possibly, that the perl script is not correctly accessing the arguments).
Good luck!
What you need to do is clear.
There is a lot of weird excess baggage in the script. There seemed to be no subroutines. Just one long series of commands with global variables.
It is time to start refactoring.
Get one thing running at a time.
I saw HTML::Template there but you still had raw HTML mixed in with code. Separate code from presentation.

How to limit the effect of client modifications to production systems

Our shop has developed a few WEB/SMS/DB solutions for a dozen client installations. The applications have some real-time performance requirements, and are just good enough to function properly. The problem is that the clients (owners of the production servers) are using the same server/database for customizations that are causing problems with the performance of the applications that we created and deployed.
A few examples of clients' customizations:
Adding large tables with many text datatypes for the columns that get cast to other data types in the queries
No primary keys, indexes, or FK constraints
Use of external scripts that use count(*) from table where id = x, in a loop from the script, to determine how to construct more queries later in the same script. (no bulk actions that the planner can optimize or just do everything in a single pass)
All new code files on the server are created/owned by root, with 0777 permissions
The clients don't take suggestions/criticism well. If we just go ahead and try to port/change the scripts ourselves, the old code can come back, clobbering any changes that we make! Or, with our limited knowledge of their use cases, we break functionality while trying to optimize their changes.
My question is this: how can we limit the resources available to queries/applications other than the ones we create and deploy? Are there any pragmatic options in scenarios like this? We prided ourselves on having an OSS solution, but it seems that it's become a liability.
We use PG 8.3 running on a range of Linux distros. The clients prefer PHP, but shell scripts, Perl, Python, and PL/pgSQL are all used on the system in one form or another.
This problem started about two minutes after the first client was given full access to the first computer, and it hasn't gone away since. Anytime people whose priority is getting business-oriented work done quickly touch a system, they will be sloppy about it and screw things up for everyone. That's just how things work, because proper design and implementation are harder than cheap hacks. You're not going to solve this problem; all you can do is figure out how to make it easier for the client to work with you than against you. If you do it right, it will look like excellent service rather than nagging.
First off, the database side. There's no way to control query resources in PostgreSQL. The main difficulty is that tools like "nice" control CPU usage, but if the database doesn't fit in RAM it may very well be I/O usage that is killing you. See this developer message summarizing the issues here.
Now, if in fact it's CPU the clients are burning through, you can use two techniques to improve that situation:
Install a C function that changes the process priority (example 1, example 2) and make sure whenever they run something it gets called first (maybe put it into their psql config file, there are other ways).
Write a script that looks for postmaster processes spawned by their userid and renice them, make it run often in cron or as a daemon.
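The second technique could look roughly like this in Perl. It is a sketch under stated assumptions: backend process titles have the form `postgres: <role> <database> <host> <activity>` (so the client's database role is visible in `ps` output even though all backends run as the postgres OS user), `client_app` is a made-up role name, and the script runs as root, since lowering the priority of another user's processes normally requires it.

```perl
use strict;
use warnings;

# Pick out backend PIDs whose process title names the given database role.
sub backend_pids_for_role {
    my ($role, @ps_lines) = @_;
    my @pids;
    for my $line (@ps_lines) {
        my ($pid, $who) = $line =~ /^\s*(\d+)\s+postgres:\s+(\S+)\s/ or next;
        push @pids, $pid if $who eq $role;
    }
    return @pids;
}

# Lower the priority of each matching backend via Perl's built-in
# setpriority; 0 is PRIO_PROCESS on Linux. Returns how many were attempted.
sub renice_pids {
    my ($niceness, @pids) = @_;
    for my $pid (@pids) {
        setpriority(0, $pid, $niceness)
            or warn "setpriority($pid) failed: $!\n";
    }
    return scalar @pids;
}

# Hypothetical cron job body:
#   my @ps = `ps -eo pid,args`;
#   renice_pids(10, backend_pids_for_role('client_app', @ps));
```

As noted above, this only helps when CPU is the contended resource; renicing a backend does nothing about its I/O.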
It sounds like your problem isn't the particular query processes they're running, but rather other modifications they're making to the larger structure. There's only one way to cope with that: you have to treat the client like they're an intruder and use the approaches of that portion of the computer security field to detect when they screw things up. Seriously! Install an intrusion detection system like Tripwire on the server (there are better tools, that's just the classic example), and have it alert you when they touch anything. New file that's 0777? Should jump right out of a proper IDS report.
On the database side, you can't usefully detect modifications to the database directly. You should do a pg_dump of the schema every day into a file (pg_dumpall -g and pg_dump -s), then diff that against the last one you delivered and have it alert you when it has changed. If you manage things this well, the contact with the client turns into "we noticed you changed something on the server... what is it you're trying to accomplish with that?", which makes you look like you're really paying attention to them. That can turn into a sales opportunity, and they may stop fiddling with things as much just knowing you're going to catch it immediately.
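A rough sketch of that daily schema check in Perl, under these assumptions: `pg_dumpall` and `pg_dump` are on the PATH and can connect without prompting (for example via a ~/.pgpass file), and the baseline is the dump you saved when you delivered the system. The helper names and paths are hypothetical. File::Compare only reports whether the files differ, so run `diff -u` on them to produce the report you mail to yourself.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Write globals (roles, tablespaces) plus the schema of one database into
# $file, running the tools without a shell so $db needs no quoting.
sub dump_schema {
    my ($db, $file) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    for my $cmd (['pg_dumpall', '-g'], ['pg_dump', '-s', $db]) {
        open my $in, '-|', @$cmd or die "Cannot run $cmd->[0]: $!";
        print {$out} $_ while <$in>;
        close $in or die "$cmd->[0] exited with status $?";
    }
    close $out or die "Cannot close $file: $!";
    return $file;
}

# True if the current dump differs from the delivered baseline.
sub schema_changed {
    my ($baseline, $current) = @_;
    my $rc = compare($baseline, $current);    # 0 same, 1 different, -1 error
    die "Cannot compare $baseline and $current: $!" if $rc < 0;
    return $rc == 1;
}

# Hypothetical cron usage:
#   my $today = dump_schema('clientdb', '/var/lib/schema-watch/today.sql');
#   if (schema_changed('/var/lib/schema-watch/baseline.sql', $today)) {
#       system('diff', '-u', '/var/lib/schema-watch/baseline.sql', $today);
#   }
```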
The other thing you should start doing immediately is install as much version control software as you can on each client box. You should be able to login to each system, run the appropriate status/diff tool for the install, and see what's changed. Get that mailed to you regularly too. Again, this works best if combined with something that dumps the schema as a component to what it manages. Not enough people use serious version control approaches on the code that lives in the database.
That's the main set of technical approaches useful here. The rest of what you've got is a classic consulting client-management problem that's far more of a people problem than a computer one. Cheer up, it could be worse: FSM help you if you give them ODBC access and they discover they can write their own queries in Access or something simple like that.