Controlling a random webpage through a Perl script - perl

There is a website, say abc.com, that has a search engine. Is it possible to write a Perl script that reads search values from a text file, feeds each one into this search engine, and automatically downloads the files that the search returns? Once a download is complete, the loop should continue until all the search values have been exhausted. I don't have any server details about the website itself.
Any help is much appreciated. Thanks!

This is HTTP client programming. You're basically writing a program that is pretending to be a browser.
The standard module for doing this is probably WWW::Mechanize (see the cookbook and the examples).
If you want something lower level, then the LWP bundle of modules will do all that you want.
There's a free online book, but it's a little old and probably doesn't reflect current best practices.
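As a rough sketch of what such a loop could look like with WWW::Mechanize: the search URL, the form field name "q", and the idea that results appear as links ending in .pdf are all placeholders, since the question gives no details about the site. The helper safe_name is also made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Placeholders, not taken from the question: the search URL, the form
# field name "q", and the assumption that results are links ending in .pdf.
my $search_url = 'http://abc.com/search';

# Turn a search term into something safe to use in a file name.
sub safe_name {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9_-]+/_/g;
    return $text;
}

# Run one search per line of $terms_file and save every matching result.
sub run_searches {
    my ($terms_file) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors

    open my $fh, '<', $terms_file or die "Cannot open $terms_file: $!";
    while ( my $term = <$fh> ) {
        chomp $term;
        next unless length $term;                        # skip blank lines

        # Load the search page and submit its first form with this term.
        $mech->get($search_url);
        $mech->submit_form(
            form_number => 1,
            fields      => { q => $term },
        );

        # Save each result link straight to disk.
        my $count = 0;
        for my $link ( $mech->find_all_links( url_regex => qr/\.pdf$/i ) ) {
            my $file = sprintf '%s_%02d.pdf', safe_name($term), ++$count;
            $mech->get( $link->url_abs, ':content_file' => $file );
        }
    }
    close $fh;
}

# Usage: perl search.pl terms.txt
run_searches( $ARGV[0] ) if @ARGV;
```

The real form number, field name, and link pattern have to be read off the site's HTML first; `$mech->dump_forms` after the initial `get` is a quick way to see what the page offers.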

Related

How to hold or save the DTMF input in VXML? Any guides to set up a test IVR (VXML) service?

I currently have an IVR written in some dodgy old code, which is confusing and goes way over the top for some things.
I want to rewrite one of my basic IVRs in VXML.
From a little research, I found that I can call Perl scripts to check data against databases, so that part isn't too bad.
My question is: what is the syntax to "hold" or save the DTMF input for a menu and then pass it to the Perl script?
Question two.
Hosting of the VXML IVR. Are there any guides to setting up a test service? I have a PABX, and a few servers I can play around with.
To play around with VoiceXML I would recommend Voxeo's excellent platform called Prophecy. You can get two ports for free that you can run on a server or even on your workstation/laptop. They provide a SIP softphone to test your apps, so it does not require any elaborate setup; just a simple install and you are ready to go. They also have a hosted environment that you can test from for free. You just pay for the service if you put it into production. Here is a post that describes how to set up and test applications in the hosted environment. And here is a post on how to set up and test applications if you install Prophecy on your PC. Voxeo's CTO is on the VoiceXML standards committee, so their platform conforms very closely to the standard.
Voxeo's developer site has excellent documentation on VoiceXML that is full of examples. On your question of how to get DTMF input, you can go to the bottom of the left pane in the documentation and click on the element "field". The field element is used to collect information from the caller. To easily do this with DTMF input you can use the built-in grammars. For more information on built-in grammars, look at the documentation on the "type" attribute of the "field" element. Once you get a "filled" event from the "field", you can call your Perl script using a "submit" element. Voxeo's documentation has a link to this article on creating VoiceXML applications with Perl. The Voxeo Forum is also an excellent source of information on VoiceXML and Prophecy. If you cannot find an answer to your question in the Forum, just ask it and their knowledgeable support staff will assist.
If you are also familiar with .NET technologies there is an open source project called VoiceModel that makes it easy to develop VoiceXML applications using ASP.NET. The project has a lot of examples in it.
These resources should get you started with VoiceXML fairly quickly.
To specifically answer your DTMF question, just use <submit> to send the DTMF input to the Perl script, using the namelist attribute (which is just a list of the variables that you need to send).
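On the receiving side, the Perl script sees the namelist variables as ordinary HTTP parameters. Here is a minimal sketch, assuming the script runs as a CGI program, that the <submit> uses the GET method, and that the field is named menu_choice (a made-up name). It uses only core Perl and replies with a small VoiceXML document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode a URL-encoded string: "+" means space, %XX is a hex-encoded byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Parse "a=1&b=2" into a hash of decoded names and values.
sub parse_params {
    my ($query) = @_;
    my %params;
    for my $pair ( split /&/, $query ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $params{ url_decode($name) } = url_decode( $value // '' );
    }
    return %params;
}

# Build the VoiceXML document sent back to the voice browser.
sub vxml_reply {
    my ($digits) = @_;
    $digits =~ s/\D//g;    # keep digits only, so nothing unsafe reaches the XML
    return <<"VXML";
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>You entered $digits.</prompt>
    </block>
  </form>
</vxml>
VXML
}

# Only run as a CGI program; the web server sets GATEWAY_INTERFACE.
# "menu_choice" is a hypothetical field name listed in the namelist.
if ( $ENV{GATEWAY_INTERFACE} ) {
    my %params = parse_params( $ENV{QUERY_STRING} // '' );
    print "Content-Type: application/voicexml+xml\r\n\r\n";
    print vxml_reply( $params{menu_choice} // '' );
}
```

The database lookup would go between parsing the parameters and building the reply. For a POST submit, the same name=value pairs arrive on standard input instead of in QUERY_STRING.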
Also, from the VXML 2.0 specification:
"The <submit> element is used to submit information to the origin Web server and then transition to the document sent back in the response. Unlike <goto>, it lets you submit a list of variables to the document server via an HTTP GET or POST request. For example, to submit a set of form items to the server you might have:
<submit next="log_request" method="post"
        namelist="name rank serial_number"
        fetchtimeout="100s" fetchaudio="audio/brahms2.wav"/>
"

sending files from iphone app to a local server?

Need some help please with web related matters since I don't know much about web (more on the software side of things).
Basically, I am developing an iPhone app and would like to send data to a local server once in a while (for simplicity, let's just say I want to send this info to my personal computer, which will act as the server). This is just some simple data, and I don't care about the format (actually .txt is the best, but I am open to any format that will make it easier; I am just transferring numbers).
What would be the best way to go about this process? A quick step by step explanation would be highly appreciated. From my very basic knowledge I assume that I will need to:
Set up my Mac as a server (which I think should be done from settings?)
Create a URL connection in my app and send the file?
I am probably missing 50 other steps here...
Thanks!
One path is to set up a WebDAV server -- you'll have to search for that, as it's far too big a topic to cover here.
To the specific questions you asked:
1) Your Mac can become a web server by turning on Web Sharing in preferences, or a file server by turning on File Sharing. Be sure to set permissions the way you want them.
2) If your Mac is a web server, you could write a simple CGI script (Perl, Ruby, or the like -- this is simple tutorial stuff that's all over the web) that accepts your text as a parameter. From your iPhone app, you'd make an NSURLRequest to a URL similar to:
http://192.168.10.1/webPage.html?this+is+the+text+I+want+to+send
Of course, you can get fancier and use POST requests (the above example is a GET request), but that's going to require more reading.
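For the GET case, the server-side script can be very small. The following is only a sketch: it assumes the web server is configured to run CGI programs, the file path is a placeholder, and the script would be reached at a CGI URL of your own choosing rather than the .html page shown above. It uses core Perl only and appends whatever follows the "?" to a text file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location for the received text; adjust to taste.
my $log_file = '/tmp/iphone_data.txt';

# Decode a URL-encoded string: "+" means space, %XX is a hex-encoded byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Append the decoded query string to the log file and acknowledge it.
sub handle_request {
    my $text = url_decode( $ENV{QUERY_STRING} // '' );
    open my $out, '>>', $log_file or die "Cannot open $log_file: $!";
    print {$out} "$text\n";
    close $out;
    print "Content-Type: text/plain\r\n\r\nreceived\n";
}

# Only run as a CGI program; the web server sets GATEWAY_INTERFACE.
handle_request() if $ENV{GATEWAY_INTERFACE};
```

Since the data is just numbers, one line per request in a plain text file is probably all the format you need.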
If you want to transfer files via file sharing, that's a bit more complicated.
What would REALLY help us answer is if you could specify the question a bit more tightly. As it is, you've asked about a very broad area that covers quite a bit of ground.

Search engine for CPAN modules

I find the extensive volume of modules available through CPAN to be somewhat at odds with its search capacities. I'm aware that there is a lot of data stored about modules, including the DSLIP tags. However, I'm not aware of a convenient interface to query this database. search.cpan.org seems to provide only a basic textual search, and the DSLIP data is only (AFAIK) shown when browsing by category.
Is there an interface available, either as a command-line utility, in a Perl module, or on a website, that will provide an advanced search query system and render relevant data in a convenient way? In addition to the DSLIP data, I'd ideally like to be able to make things like user ratings, total comments, last update time, and deployment statistics part of the query and/or view.
This is a somewhat obvious answer, but I often use Google to search CPAN. I simply type "site:cpan.org search term here" or simply "cpan search term here" and usually can find an appropriate module quickly. Rarely have I found a need to search the meta-data directly, but I agree it would definitely be nice.
(If someone is interested in starting a project to make that data more searchable, let me know and I'll help out!)
You can get a data dump from PAUSE and do what you like with it. Andreas König is the guy you'll have to talk to. I've never found the DSLIP stuff useful because most people never bother to update it after they register a module.
All of the other stuff you see on CPAN Search is just a mash up of other projects. Most of the stuff that you list does not live in one database. You have to go to each individual project and get its data.
Talk to drrho, he is nursing a pet project involving lots of additional CPAN metadata.
If you want to help make a better search engine than (kobe)search.c.o, get involved in CPANHQ.

What's the best way to write a maintainable web scraping app?

I wrote a Perl script a while ago which logged into my online banking and emailed me my balance and a mini-statement every day. I found it very useful for keeping track of my finances. The only problem is that I wrote it using just Perl and curl, and it was quite complicated and hard to maintain. After a few instances of my bank changing their webpage I got fed up with debugging it to keep it up to date.
So what's the best way of writing such a program in such a way that it's easy to maintain? I'd like to write a nice well engineered version in either Perl or Java which will be easy to update when the bank inevitably fiddle with their web site.
In Perl, something like WWW::Mechanize can already make your script simpler and more robust, because it can find HTML forms in previous responses from the website. You can fill in these forms to prepare a new request. For example:
use WWW::Mechanize;

# $url and $password are assumed to be defined elsewhere.
my $mech = WWW::Mechanize->new();
$mech->get($url);                       # fetch the login page

# Fill in and submit the first form on the page.
$mech->submit_form(
    form_number => 1,
    fields      => { password => $password },
);
die 'Login request failed' unless $mech->success;
WWW::Mechanize and Web::Scraper are the two tools that make me most productive. There's a nice article about that combination at catalyzed.org.
If I were to give you one piece of advice, it would be to use XPath for all your scraping needs. Avoid regexes.
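To illustrate the XPath approach, here is a small sketch using HTML::TreeBuilder::XPath from CPAN (my choice of module, not one named above). The markup is invented and stands in for a bank page; the real page will use different ids and classes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up markup standing in for a bank page; the real page will differ.
my $html = <<'HTML';
<html><body>
  <table id="accounts">
    <tr><td class="name">Current</td><td class="balance">1,234.56</td></tr>
    <tr><td class="name">Savings</td><td class="balance">789.00</td></tr>
  </table>
</body></html>
HTML

# Return a hash of account name => balance text found in the page.
sub balances {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($content);

    my %balance;
    for my $row ( $tree->findnodes('//table[@id="accounts"]//tr') ) {
        my $name   = $row->findvalue('./td[@class="name"]');
        my $amount = $row->findvalue('./td[@class="balance"]');
        $balance{"$name"} = "$amount";    # force plain strings
    }

    $tree->delete;                        # free the parse tree
    return %balance;
}

my %found = balances($html);
print "$_: $found{$_}\n" for sort keys %found;
```

When the bank changes its layout, usually only the XPath expressions need updating, which is the maintainability gain over hand-written regexes.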
Hmm, I just found Finance::Bank::Natwest, which is a Perl module specifically for my bank! Wasn't expecting it to be quite that easy.
A lot of banks publish their data in a standard format, which is commonly used by personal finance packages such as MS Money or Quicken to download transaction information. You could look for that hook and download using the same API, and then parse the data on your end (e.g. parse Excel documents with Spreadsheet::ParseExcel, and Quicken docs with Finance::QIF).
Edit (reply to comment): Have you considered contacting your bank and asking them how you can programmatically log into your account in order to download the financial data? Many/most banks have an API for this (which Quicken etc make use of, as described above).
There's a currently up to date Ruby implementation here:
http://github.com/warm/NatWoogle
Use Perl and the Web::Scraper package.

Best method to write an email poller

I am working on an email polling solution for a multi-user system. Users send emails to their respective IDs, and the messages are polled and inserted into a database.
There are two options that I am considering:
Perl/Unix-based email pollers.
A Java-based poller.
What would you recommend? (Other suggestions are also welcome.)
Instead of polling, why don't you forward the mail to a process? Depending on the mail server you use, you can do that as an alias or even in the .forward file.
I've nothing much to add to this, but there's currently a project on Google Code to rebuild iwantsandy.com as open source.
It's at:
http://code.google.com/p/sandysback/
I'm definitely going to be watching this to see how they parse emails, and have those emails "inserted into a db"
Whichever language you have most experience in!
I personally know Java and Perl well, and for this task I would choose Perl, but the differences are marginal.
Perl would be shorter and sweeter; Java would take longer but would probably be a more robust solution once the database access is sorted out.
I find Perl DBI to be a better and more portable database interface than JDBC, which does not hide database implementations from your code and is sensitive to version changes, i.e. you must have the right version of the right database driver for your target database.
RE: Polling
If you have the option to forward the email to a process, I would highly recommend you do that. (Forwarding generally puts less load on the server than polling does.) If not, then polling is the next best thing. Look into the POP3 client libraries for whichever language you are most comfortable with.
RE: Language choice
If I intended to do a lot of parsing of the emails then Perl would be my choice. If not much parsing is involved then Java would be the way to go for me ;-).
-- In a former life I wrote a Perl script to parse (well structured) incoming emails into HTML pages and post them to a web server.
You have a couple of options. As the original poster said, probably the simplest way is to set up an entry in the aliases file that pipes to a script.
Then the email gets passed as standard input to the script. You can then use a Perl script plus the MIME modules to parse out the bits of the message and do whatever you want with it.
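A sketch of such an alias-driven script follows. It assumes Email::MIME and DBI from CPAN; the table name "messages", its columns, and the idea of passing the DBI data source as the first argument are made up for illustration. An aliases entry would pipe each message to the script, which reads the raw mail from standard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;
use DBI;

# Pull the sender, subject and first text/plain body out of a raw message.
sub parse_message {
    my ($raw) = @_;
    my $email = Email::MIME->new($raw);

    my $body = '';
    $email->walk_parts( sub {
        my ($part) = @_;
        return if $part->subparts;       # skip multipart containers
        return if length $body;          # keep only the first match
        my $type = $part->content_type // 'text/plain';
        $body = $part->body_str if $type =~ m{^text/plain}i;
    } );

    return {
        from    => scalar $email->header('From'),
        subject => scalar $email->header('Subject'),
        body    => $body,
    };
}

# Insert one parsed message into a hypothetical, pre-existing "messages" table.
sub store_message {
    my ( $dsn, $raw ) = @_;
    my $msg = parse_message($raw);
    my $dbh = DBI->connect( $dsn, '', '', { RaiseError => 1, AutoCommit => 1 } );
    $dbh->do(
        'INSERT INTO messages (sender, subject, body) VALUES (?, ?, ?)',
        undef, @{$msg}{qw(from subject body)},
    );
    $dbh->disconnect;
}

# Usage from the alias: mail2db.pl <DBI data source>, with the mail on STDIN.
if (@ARGV) {
    local $/;                            # slurp the whole message
    store_message( $ARGV[0], scalar <STDIN> );
}
```

Placeholders in the INSERT keep header and body text from being interpreted as SQL, which matters because anyone can send mail to the address.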
One might also look at Apache James, which is a custom mail server. It has the equivalent of servlets, called 'mailets', that you put your business logic in. They are often hard to deploy in enterprise scenarios, though, as most companies don't like custom mail servers being deployed.
... the aliases route is probably your best bet. One other note of caution: email isn't guaranteed. If you are using this as some sort of app-to-app messaging system, and you control both ends, you should probably look at something else, like JMS-type messaging.
-Ace