How can I defer processing during apache / mod_perl page rendering? - perl

I have an apache2 / mod_perl website. On one page, I need to do some server/server communication via SOAP.
The results of this communication are not required for the rendering of the page (but user input is required to trigger this communication).
The SOAP communication is very slow.
So what I want to do is process and print the page for the user, then do all the SOAP stuff behind the scenes.
What's the best way to achieve this? Start a fork? Write the job to a file and have a cron job pick it up?
Thanks

There are two types of solutions. First, you can do what Randal Schwartz suggested here. Second, you could use a message queue such as Beanstalk or Gearman. Beanstalk has a Perl client, is now persistent, and is ideal for lightweight jobs; Gearman has more features and is more actively worked on. There is also TheSchwartz - use it if you can live with sparse documentation. cron is ideal for regularly repeating tasks; for one-off jobs triggered by user input, as in your case, Schedule::At may be more appropriate if you'd rather not run a full message queue.
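For illustration, here is a minimal sketch of the Beanstalk option: the mod_perl handler enqueues the SOAP work and returns the page immediately, and a separate long-running worker drains the queue. It assumes Beanstalk::Client's put/reserve interface, a beanstalkd daemon on localhost, and a hypothetical do_soap_call() wrapping your actual SOAP code.

    # In the page handler (mod_perl): enqueue the job and return at once.
    use Beanstalk::Client;

    my $queue = Beanstalk::Client->new({
        server       => 'localhost',   # beanstalkd must be running here
        default_tube => 'soap-jobs',
    });

    # @soap_args comes from the user's input; the client serializes it.
    $queue->put({ ttr => 300 }, @soap_args);
    # ... now print the page as usual; the SOAP call happens elsewhere.

    # In a separate worker script, started independently of Apache:
    use Beanstalk::Client;

    my $worker = Beanstalk::Client->new({
        server       => 'localhost',
        default_tube => 'soap-jobs',
    });

    while (my $job = $worker->reserve) {
        my @soap_args = $job->args;
        do_soap_call(@soap_args);   # hypothetical wrapper around the slow SOAP calls
        $job->delete;               # remove the job once it has been handled
    }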
Also see an old StackOverflow Thread here

Related

Perl Website with Dancer2 - how can I log user activity, history, etc?

We have a perl web interface that I am currently working on to slowly convert to using Dancer 2 and PSGI instead of our slow old plain vanilla CGI model.
In our old model, we stored everything in sessions -- the history of what the users did, the call stacks, the data inputs, ........ you get the idea.
We do not want to do it that way anymore so that we can keep the sessions small and efficient. BUT, we'd still like to log just what the users have been doing (that way when an error gets reported we can see what they did to get to the error, what input(s) they put in, etc).
I looked at the Logging section of the Dancer2 documentation, but this doesn't seem to quite get to what we need - it would only record Dancer2's own messages plus whatever messages I put in.
This one that I found Dancer2::Logger doesn't seem to quite cut it either.
What other libraries could I use to do what I need? I seriously doubt that Perl does NOT have something that does this, so...
Just off the top of my head, I can think of Log::Log4perl and Log::Dispatch, though there are myriad others.
You can use them to establish your own log files, separate from Dancer's log.
As for the best way: most logging interfaces have much the same API for logging, but differ in run-time instantiation and configuration syntax. So read the docs on a few of them and maybe try a couple on for size.
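As a rough sketch, assuming Log::Log4perl inside a Dancer2 app (the 'user' session key, log path, and message format are placeholders, not anything your app necessarily has), a dedicated activity log alongside Dancer2's own logger could look like this:

    use Dancer2;
    use Log::Log4perl qw(get_logger);

    # A separate logger/appender just for user activity.
    my $log_conf = q(
        log4perl.logger.UserActivity            = INFO, ActivityFile
        log4perl.appender.ActivityFile          = Log::Log4perl::Appender::File
        log4perl.appender.ActivityFile.filename = logs/user_activity.log
        log4perl.appender.ActivityFile.layout   = Log::Log4perl::Layout::PatternLayout
        log4perl.appender.ActivityFile.layout.ConversionPattern = %d [%P] %m%n
    );
    Log::Log4perl->init(\$log_conf);

    my $activity = get_logger('UserActivity');

    # Record every request, plus whatever session details matter to you.
    hook before => sub {
        $activity->info( sprintf '%s %s %s user=%s',
            request->address, request->method, request->path,
            session('user') // 'anonymous' );
    };

When an error report comes in, you grep user_activity.log for that user instead of digging through the session.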

What are the limitations of the flask built-in web server

I'm a newbie in web server administration. I've read multiple times that flask built-in web server is not designed for "production", and must be used only for tests and debug...
But what if my app touches only a thousand users who occasionally send data to the server?
If it works, when will I have to bother with configuring a more sophisticated web server? (I am looking for approximate metrics.)
In a nutshell, I would love to find what the builtin web server can do (with approx thresholds) and what it cannot.
Thanks a lot !
There isn't one right answer to this question, but here are some things to keep in mind:
With the right amount of horizontal scaling, it is quite possible you could keep scaling out use of the debug server forever. When exactly you would need to start scaling (or switch to using a "real" web server) would also depend on the environment you are hosting in, the expectations of the users, etc.
The main issue you would probably run into is that the server is single-threaded: it handles requests one at a time, serially. That means that if you are trying to serve more than one request (including favicons and static items like images, CSS and JavaScript files), the requests will take longer, and if any given request happens to take a long time (say, 20 seconds), then your entire application is unresponsive for that time. This is only the default, of course: you could bump the thread count (or have requests handled in other processes), which might alleviate some issues. But once again, it can still be slow under a "high" load, and what counts as a "high" load depends on your application and the maximum response time you consider acceptable.
Another issue is security: if you are concerned at ALL about security (and not just the security of the data in the application itself, but the security of the box that will be running it as well) then you should not use the development server. It is not ready to withstand any sort of attack.
Finally, the development server could just fail outright. It is not designed to be used as a long-running process (days, weeks, months), and so it has not been well tested to work in this capacity.
So, yes, it has limitations. Yes, you could still conceivably use it in production. And yes, I would still recommend using a "real" web server. If you don't like the idea of installing something like Apache or Nginx, you can go with a solution that is still as easy as "run a Python script" by using one of the standalone WSGI servers, which are designed for production use and can be started with something as simple as running python run_app.py on the command line. You typically just need a 4-5 line Python script that imports and creates the server object, points it at your Flask app, and runs it.
gunicorn could be run with only the following on the command line, no extra script needed:
gunicorn myproject:app
...where "myproject" is the Python package that contains the app Flask object. Keep in mind that one of the developers of gunicorn would probably recommend against this approach. See https://serverfault.com/questions/331256/why-do-i-need-nginx-and-something-like-gunicorn.
The OP has long since moved on, but for those who encounter this question in the future I would just add that setting up an Apache server, even on a laptop, is free and pretty easy. It can be readily configured for as few or as many features as you want just by commenting or uncommenting lines in the config file. There might be an even easier GUI method for doing that nowadays, but just editing the configs is simple.

How to handle large amounts of scheduled tasks on a web server?

I'm developing a website (using a LAMP stack) which must handle many user-made scheduled tasks. It works as follows: a user creates an event and sets a date, and other users (as many as 63) may join. A few hours before the set date, the system must email each user subscribed to that event. And that's it.
However, I have never handled scheduling, and the only tools I know (poorly) are cron and at. My plan is to create an at job for each event, which will call a script that gets all subscribers emails and mails them.
My question is: is my plan/design good? Is it scalable? Are there better options that I should be aware of?
Why a separate job for each event? I've done something similar for a newsletter, with a cron job that just runs once per hour and handles any newsletters that are due to be sent. In your case you'd have a script that runs once every hour and gets the list of users for events that fall within the desired time window (see the sketch after this answer).
It will work. As far as scalability goes, at a minimum make sure that the script runs in its own process so it doesn't bog down the server unnecessarily.
Create a php-cli script perhaps?
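To make the hourly-job idea concrete, here is a rough sketch in Perl (your stack is PHP, and a PHP CLI script would follow exactly the same shape; the database name, table and column names, the 3-hour window, and the notified flag are all assumptions for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:database=myapp', 'myuser', 'secret',
                           { RaiseError => 1 });

    # Events starting within the next 3 hours that haven't been notified yet.
    my $events = $dbh->selectall_arrayref(q{
        SELECT id, title, starts_at FROM events
        WHERE starts_at BETWEEN NOW() AND NOW() + INTERVAL 3 HOUR
          AND notified = 0
    }, { Slice => {} });

    for my $event (@$events) {
        my $emails = $dbh->selectcol_arrayref(
            'SELECT email FROM subscriptions WHERE event_id = ?',
            undef, $event->{id});

        for my $to (@$emails) {
            open my $mail, '|-', '/usr/sbin/sendmail -t' or die $!;
            print {$mail} "To: $to\n",
                          "Subject: Reminder: $event->{title}\n\n",
                          "Your event starts at $event->{starts_at}.\n";
            close $mail;
        }

        # Mark the event so the next run doesn't mail everyone again.
        $dbh->do('UPDATE events SET notified = 1 WHERE id = ?',
                 undef, $event->{id});
    }

You would then install it with a single crontab entry such as 0 * * * * /path/to/notify_events.pl, instead of queueing one at job per event.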
I'm doing most of my work in Rails nowadays, and there's a wealth of background-processing libraries there; one of them is Resque, which uses a Redis server to keep track of the jobs.
I found a PHP clone https://github.com/chrisboulton/php-resque
Might be overkill for your use case, but give it a shot perhaps
If you would consider a proper framework that uses an application server (and not a simple webserver), Spring has a task scheduling layer that's simple to use. Scheduling jobs on the server really requires more than what a simple LAMP install can do, but I haven't used PHP in a while so maybe there's an equivalent.
Here's an article that compares some of your options.

GWT Server side entry point

I followed these instructions
http://code.google.com/webtoolkit/usingeclipse.html
There appears to be no entry point function for the server? How do I run background threads or code not related to the rpc services that the server exports?
For example, what if some embedded database needs to be updated every 5 minutes. So then a background thread would fetch this new data to do the updating
GWT is a client-side technology and has nothing to do with the server side. You can use any server-side technology with it. If you use Java/servlets then you can optionally use GWT-RPC, which is nice, but not required.
Web applications are built around the request-reply paradigm: when a request comes in, they handle it and send back a reply. Servlets are designed around this paradigm. They are used at some of the largest sites and are not just a toy (as you noted in another comment).
When you need something to run periodically, then this is usually the job for Job Schedulers. I'd recommend Quartz, which has great documentation. There is also an example how to initialize it in servlet environment.
That's not how web applications are supposed to work. Read http://code.google.com/intl/de-AT/webtoolkit/doc/latest/tutorial/clientserver.html
If you want to run some processing when a request comes in, and potentially include some dynamic parts, you can just make your pages JSPs or servlets. GWT does not have to be hosted in plain HTML files; the page the server sends just needs to contain HTML, so it can be generated by a JSP or a servlet. In that sense, the closest thing to a server-side entry point is a JSP or a servlet. Otherwise you need to use RPC, and if you would need an RPC call on every page load, you could consider this tip of embedding the RPC payload in the base response.

How can I communicate across Perl CGI scripts?

I am searching for efficient ways of communicating across two Perl scripts. I have two scripts; Script 1 generates some data, and I want Script 2 to be able to access that information.
The easiest/dumbest way is to write the data generated by Script 1 to a file and read it later from Script 2. Is there any other way? Can I store the data in memory and make it available to Script 2 (with support from Linux, of course)? Meaning, malloc some data in Script 1 and make Script 2 able to access it.
There is no guarantee that Script 2 will be run after Script 1, so there should be some way to free that memory, e.g. via a watchdog timer.
Let me reveal some more context. I am running these scripts on a web server using Perl CGI. At the click of a button, Script 1 runs and generates an HTML web page. The user can then add some input to this generated page and click a button on it, and Script 2 should be able to read the data on the new page. I could post the data back to the web server again, but a more efficient way would be to keep a copy of the generated page on the server as well and make it available to Script 2. I would like to avoid writing the generated page to a file; I was thinking of storing it in memory.
This depends somewhat on your usage... One large set of data? Many small messages? Do you care at all about data persistence? Is it TOTALLY asynchronous?
Some of the options are:
For any but the most high-performance web sites, the best approach is to write out the HTML pages to files! Unless the inter-process communication is benchmarked to be the bottleneck, don't bother with any of the non-file solutions (shared memory, cache, intermediate server).
Specifically for two CGI scripts on the same server: if you run them under mod_perl or some other arrangement that shares a Perl interpreter between the two CGI processes, you can develop a package to serve as a cache, which - with its package-level variables - would be preserved in memory by mod_perl for as long as mod_perl is running, and could thus be used by a writer CGI process and a reader CGI process to communicate. Of course the usual synchronization/deadlock and persistence issues associated with readers and writers need to be considered.
As an alternative, use Apache::Session sessions to store inter-session data.
As you noted, shared memory. For example, use IPC::ShareLite, IPC::Cache, or this solution from perlmonks (a minimal IPC::ShareLite sketch follows this list of options).
Also, please check Chapter 16 Recipe 12 "Sharing Variables in Different Processes" from O'Reilly's "Perl Cookbook" (no link since non-pirated versions aren't online anywhere I know of)
Use a permanent medium. A file is one option. A database is another.
For async, use an intermediate messaging system (MQ, Tibco, or something more lightweight). Probably a bit of overkill in this scenario, but a valid option to be aware of. Such a system is likely to be pretty stable, solid, and optimized, but possibly not free and less flexible/tailored.
Or roll your own simple messaging server - it's not THAT complicated for the very simple one you seem to need: listen on one port for requests from the first process to store data, listen on another port for requests from the consumer process to send that data back, keep the data in a storage area in memory, and purge it when it expires using alarms or a separate watcher child process.
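To illustrate the shared-memory option from the list above, here is a minimal IPC::ShareLite sketch. The key 1971, the use of Storable for serialization, and the $generated_html variable are arbitrary illustrative choices, not requirements:

    # Writer (Script 1): store a data structure in a shared-memory segment.
    use IPC::ShareLite;
    use Storable qw(freeze);

    my $share = IPC::ShareLite->new(
        -key     => 1971,      # both scripts must agree on this key
        -create  => 'yes',
        -destroy => 'no',      # leave the segment around for the reader
    ) or die "Cannot create shared memory: $!";

    $share->store( freeze( { page => $generated_html, created => time } ) );

    # Reader (Script 2): attach to the same segment and pull the data back out.
    use IPC::ShareLite;
    use Storable qw(thaw);

    my $share = IPC::ShareLite->new(
        -key     => 1971,
        -create  => 'no',
        -destroy => 'no',
    ) or die "Cannot attach to shared memory: $!";

    my $data = thaw( $share->fetch );
    print $data->{page} if time - $data->{created} < 600;   # ignore stale data

A timestamp check like the one above, or a small cron job that clears old segments, can cover the watchdog/expiry requirement you mentioned.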
You've tagged your question as "cgi". Are they both CGI programs? In that case, they can just talk to each other by making HTTP requests.
However, you'll have to tell a lot more about why you are trying to do this and what you need to accomplish for us to help you. It's certainly easy for Perl programs to communicate with each other in some fashion, but that doesn't mean it's the right answer for you.
When you have complex requirements for interaction among CGI programs, you probably want to move to a web framework that handles a lot of those details for you. Catalyst might be where you'd want to start. There's even a book for it.