Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
So far I've been using wget and curl to do screen scraping. Now I would like to switch to Perl. What's a good tutorial that covers basic web programming in Perl (preferably without restating the language basics)? I'm talking about basic things like getting and parsing pages, submitting forms, using proxies, etc.
I've used WWW::Mechanize in the past for basic web crawling, including form submission and the like.
There are some pretty good examples.
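As a rough sketch of what that looks like in practice (the helper name `search_site`, the URL, the field name, and the user-agent string are all placeholders, not something from the answers here), fetching a page, optionally going through a proxy, and submitting its first form might go like this:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a page, submit its first form, return the HTML
# of the page that comes back.
sub search_site {
    my ($url, $fields, $proxy) = @_;

    # autocheck => 1 makes Mechanize die on HTTP errors instead of
    # continuing silently.
    my $mech = WWW::Mechanize->new(autocheck => 1, agent => 'my-scraper/0.1');

    # proxy() is inherited from LWP::UserAgent.
    $mech->proxy(['http', 'https'], $proxy) if defined $proxy;

    $mech->get($url);
    $mech->submit_form(form_number => 1, fields => $fields);
    return $mech->content;
}

# Example call (placeholder URL and field name):
# print search_site('http://example.com/search', { q => 'perl' });
```

The call is left commented out because it needs a real site with a form on it; adjust `form_number` or use `form_name` if the page has more than one form.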
These should pretty much cover everything you're looking for:
http://www.perl.com/pub/2002/08/20/perlandlwp.html
http://lwp.interglacial.com/
http://www.perl.com/pub/2003/01/22/mechanize.html
http://gd.tuwien.ac.at/linux/ldp/LDP/LGNET/108/oregan2.html
Tools you will need besides Perl:
HTTP Live Headers (Firefox extension) or an equivalent, to reverse engineer JavaScript requests into primitive GET / POST requests that you can then mimic with Mechanize, LWP, etc.
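For instance, once a header-capture tool shows what a script-driven request looks like, it can be rebuilt by hand. The sketch below assumes a made-up endpoint, parameters, and headers; it only constructs the request and prints it, so nothing is sent until the commented lines are enabled:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Placeholder endpoint, form parameters and headers, standing in for
# whatever the captured request actually contained.
my $request = POST(
    'http://example.com/ajax/load',
    [ page => 2, format => 'json' ],
    'X-Requested-With' => 'XMLHttpRequest',
    Referer            => 'http://example.com/list',
);

# Inspect what would be sent before touching the network.
print $request->as_string;

# To actually send it:
# use LWP::UserAgent;
# my $ua       = LWP::UserAgent->new(agent => 'Mozilla/5.0');
# my $response = $ua->request($request);
# print $response->decoded_content if $response->is_success;
```

Comparing the printed request against the captured one is a quick way to spot a missing header or parameter.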
As already mentioned by other posters, a good headless browser is the WWW::Mechanize module.
I would suggest spending some time learning HTML::TreeBuilder and especially HTML::TreeBuilder::XPath and HTML::Query. The last two become very handy when you want to get actual data out of HTML documents.
HTML::TableExtract is also a nice module to extract data from HTML tables when needed.
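To give a feel for the XPath approach, here is a small sketch that pulls rows out of an inline HTML snippet with HTML::TreeBuilder::XPath. The markup, the `prices` id, and the class names are invented for the example; a real page would come from Mechanize or LWP instead:

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Invented sample markup standing in for a downloaded page.
my $html = <<'HTML';
<html><body>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="name">Gadget</td><td class="price">19.50</td></tr>
  </table>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Select only rows that contain <td> cells, which skips the header row.
my @rows;
for my $tr ($tree->findnodes('//table[@id="prices"]//tr[td]')) {
    my $name  = $tr->findvalue('./td[@class="name"]');
    my $price = $tr->findvalue('./td[@class="price"]');
    push @rows, [ "$name", "$price" ];
}

printf "%s costs %s\n", @$_ for @rows;

# Free the parse tree explicitly once done with it.
$tree->delete;
```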
Basically, using all of the above will give you the ability to crawl most sites.
Have fun crawling (-:
Closed 8 years ago.
I've been using mod_perl for years. I have a few modules that handle Apache requests at early stages: basically custom responses, based on request headers, that alter the normal response from Apache (custom response codes and things like that).
I've been told by others that these days there are better ways to run Perl applications fast (e.g. with a persistent interpreter that only takes subs as request handlers, similar to mod_perl), but none of them can tell me with good authority or experience what is proven to work as fast as mod_perl, or faster.
I'd like to get a more experienced opinion on the subject, and I thought Stack Overflow would be a perfect place to get answers from such people.
So, as of 2014, which alternatives to mod_perl are proven to be good or even better (in terms of performance and reliability) and why? Which pros or cons do you get with them compared to mod_perl?
The Plack module, which implements the Perl Web Server Gateway Interface (PSGI), is popular for good reason.
It presents a standard API that allows a Perl web application to run on old CGI, FastCGI, mod_perl, and others, or it can behave as a stand-alone web server on its own.
I can't offer any benchmark figures, but I will update this answer if I find anything relevant.
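To illustrate the shape of that API, here is a minimal sketch of a PSGI application that chooses its response from a request header, roughly the kind of thing described in the question. The header name `X-Client-Version` and the 400/200 policy are made up for the example; the application itself is just a code reference and needs no modules:

```perl
use strict;
use warnings;

# A PSGI application is a code reference that takes the request environment
# (a hash ref) and returns [ status, [ headers ], [ body chunks ] ].
my $app = sub {
    my ($env) = @_;

    # Request headers arrive as HTTP_* keys, as in CGI.
    # The header name and the policy below are invented for illustration.
    my $client = $env->{HTTP_X_CLIENT_VERSION} // '';

    if ($client eq '') {
        return [ 400, [ 'Content-Type' => 'text/plain' ],
                 [ "Missing X-Client-Version\n" ] ];
    }
    return [ 200, [ 'Content-Type' => 'text/plain' ],
             [ "Hello, client $client\n" ] ];
};

# Call the app directly with a hand-built environment; no server is needed.
my $res = $app->({
    REQUEST_METHOD        => 'GET',
    PATH_INFO             => '/',
    HTTP_X_CLIENT_VERSION => '2.1',
});
print "$res->[0] $res->[2][0]";
```

To serve it, the same code reference would be the last value in an `app.psgi` file (ending the file with `$app;`) and could then be started with `plackup app.psgi` or loaded by any other PSGI-capable server.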
Closed 8 years ago.
Can you recommend a CMS that exposes the entire stored content through some kind of API (HTTP, XML-RPC, web service)?
I want to use it only for creating/editing content, and the content will be then retrieved from another site.
For instance, WordPress has the type of API I am looking for, but unfortunately it lacks some of the functionality I need (hierarchically organized articles and media, article and image ordering, image galleries...).
You should take a look at http://osmek.com/ or http://expressionengine.com/ with the Export it plugin. It allows for channels to be pulled using a REST API.
I have been on a crusade to separate the CMS from the front end in all of my projects. I have used ExpressionEngine in the past and really dig it. Another EE plugin that you may like is Playa; it allows for relational data models.
Good luck
Check out prismic.io, contentful.com and osmek.com
Closed 9 years ago.
With so many tools and technologies lying around, I am looking for suggestions around the best ones (UI/server-side frameworks/database/CMS) to use for building a web(site/app) similar to Facebook itself.
Details of the website cannot be revealed due to privacy concerns. But largely, the experience and interactions would be similar to what Facebook has (such as continuous feeds, groups, upload data/files, comments, etc.), just that it would be in a different domain.
Information (or links) on what technologies/frameworks are such sites/portals using will also be of great help!
Elgg is a great start. They have numerous plugins (some of which even make it look very similar to Facebook). I've seen some prototypes that were built in a few days and have tons of functionality.
The simple answer is PHP. But people likely imagine a LAMP stack.
As I understand it, Facebook has reengineered both the front side and the back side of PHP. They use the HipHop compiler to cut the cost of executing PHP. I don't know the details, but they also use some kind of distributed back-end database instead of the MySQL that a traditional PHP/LAMP setup relies on.
(See http://www.facebook.com/note.php?note_id=24413138919 for a description of one of the mechanisms they use, Cassandra).
If you don't care about scale, you can skip these two steps and save a lot of engineering.
Closed 8 years ago.
Is there any well-written open source Perl code out there (not using any kind of framework) that I could use as a sample for learning good Perl practice?
I've searched around and found many things for PHP, but nothing in Perl that uses no framework.
Thanks in advance.
Have you tried browsing CPAN? You can find code there doing pretty much anything, and many distributions post links to their GitHub repositories, so you can follow along with the development process.
CPAN Ratings has reviews and rankings of a large number of releases, which helps you differentiate between good releases and bad ones, but being able to make this determination for yourself would be best, which you get through learning and experience.
Closed 9 years ago.
I have found services like ClearSpring and Widgetbox for putting content snippets onto a wide range of social networking web sites, but I would like to build my own widgets without a third-party dependency. I have been looking, but I have not yet found a good resource for learning how to create widgets/gadgets for multiple sites. It was easy to build a gadget for iGoogle, but Facebook, MySpace and the others are less obvious.
What is a good resource to create content snippets for multiple sites?
It is actually more than just different standards. The context of each network makes it harder to map widgets one to one. Sure, there is overlap, but each network differs from the others by a large delta.
It's managing the delta that's the hard part, not learning the different syntaxes.
In the mobile space there are a few products, like OpenFeint and PhoneGap.
Most developers tend to roll their own framework since it is such a volatile space.
It seems that they're converging on two camps:
- Facebook
- everybody else (using OpenSocial)
Your iGoogle widget is OpenSocial, so it's almost ready to be used on MySpace and others. Now go to Facebook to cover them all.