Does anyone know of an HTTP client that is scripting-friendly (i.e. the basics: GETs, POSTs) and is capable of executing JavaScript (all of it, not just location redirects)? And one which isn't just launching another browser.
There are now tools to achieve exactly what you are asking. The best class of tool, if not the only one, is probably the "headless browser".
There have apparently been a few attempts at headless browsers, but the one that seems to have got it right is called PhantomJS.
PhantomJS is basically a WebKit browser without any display, so the layout logic, JavaScript, etc. are all in there along with the basic HTTP client, just like in a browser - because it is a browser.
PhantomJS exposes a JavaScript API, but apparently it's not so easy to use on its own. Another project, CasperJS, has popped up to make it more useful.
One more project deserves mention here: SpookyJS. Its job is to act as a middleman between node.js and PhantomJS; since both implement a JavaScript event loop, it's not easy to integrate them directly. With SpookyJS you can script an HTTP client in JavaScript on your desktop or server.
As far as I know there is no such thing available (although I'm keeping an eye on this thread hoping to be proved wrong).
However, if you're prepared to roll up your sleeves and do some work, then it should be possible to implement such a thing based on Firefox with a XUL script - or you might consider looking at, for example, Rhino - which is a JavaScript engine without a browser.
ELinks is a text-mode browser with JavaScript support - so it would probably be simpler to run that in a pty than to implement your own browser component and expose the DOM to Rhino.
Related
I am trying to find a high-level Clojure library for making HTTP and HTTPS requests, parsing out forms and links from responses and then POST-ing updated forms or following links. Ideally something that would automatically handle redirects and cookies (i.e. sessions). That is, I'd like to find something whereby my code can as closely as possible mimic a user driving a webapp from a browser, without the browser.
A number of years ago we used Hpricot and Ruby for a similar task but I'd prefer to do this in Clojure if at all possible. From memory - and I haven't used Hpricot for years - we were able to do all this with minimal effort: we were able to concentrate on the 'what' of driving the application, not the 'how'.
I found clj-http https://github.com/dakrone/clj-http but this seems to be one step lower-level than I'm looking for (no form parsing) - although it is based on Apache HttpComponents http://hc.apache.org/httpcomponents-client-ga/ which does seem to expose a nice, fluent, API for forms http://hc.apache.org/httpcomponents-client-ga/tutorial/html/fluent.html.
The question "Screen scraping in clojure" asks about screen-scraping in Clojure, and it has several good suggestions for that, but nothing that really addresses the above.
HTTP Kit http://www.http-kit.org/client.html looks like it would be a great foundation for the above but doesn't do form parsing or session management (as far as I can see).
Currently I'm veering toward using the Apache HttpComponents Java library directly from Clojure. Can anyone suggest a better - perhaps more idiomatically Clojure - alternative? Or anything that they found worked well in similar circumstances? My goal is to write the minimal amount of code quickly to investigate a problem with a web service. This is not production code. Saving time, rather than getting an 'ideal' solution, is my main concern.
[The background is that I am trying to mimic certain forms of user behaviour in order to first reproduce and then try to track down an intermittent bug in a large body of legacy Java/EJB code. However, the problem only seems to occur about once per several thousand POSTs. (The suspicion is some form of caching issue.) The existence of the problem, after the fact, is easy to detect, however.]
Have you looked at the Enlive library yet? Here is a good tutorial on it.
You seem to really have 2 parts here. The first part is (1) a Selenium-like client, which drives (2) a webserver.
For part (1), either Selenium, Enlive, or something similar will allow you to simulate a browser to submit data, read the responses, and respond from there. For part (2), it seems you just need a regular Clojure web framework such as Ring/Compojure (older & simpler) or Pedestal (newer & more powerful).
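The workflow described in the question (fetch a page, pull out its forms, change some fields, POST the result) can be sketched with nothing but a standard library. The following is a minimal illustration in Python rather than Clojure, only meant to show the shape of the work; the names `FormParser`, `parse_forms` and `build_submission` are invented for this sketch, and it only looks at `<input>` elements (not `<select>` or `<textarea>`).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collect each <form> with its action, method and named <input> values."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Valueless attributes come through as None, so default to "".
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def parse_forms(html):
    """Return a list of form descriptions found in an HTML string."""
    parser = FormParser()
    parser.feed(html)
    parser.close()
    return parser.forms


def build_submission(page_url, form, updates):
    """Return (target_url, encoded_body) for a form with some fields overridden."""
    fields = dict(form["fields"])
    fields.update(updates)
    target = urljoin(page_url, form["action"])  # resolve relative actions
    return target, urlencode(fields).encode("ascii")
```

To actually send the request with session handling, one option (again, just a sketch) is an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))`, which keeps cookies between calls and follows redirects by default.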
I'm trying to set up some integration between Chrome, and various command-line tools and build systems that I have. Almost everything that I want to do within Chrome is supported by the extensions API, so I figured I'd make an extension, set up communication between it and my external tools, and go from there.
Unfortunately, I can't find any sane way to get messages in and out of Chrome. The only thing I could find that would plausibly work at all would be to introduce a local web server as a message broker, have the extension connect to it with WebSockets, and then have the command-line utilities do the same. But that's way too much complexity - it'd basically mean writing a whole IPC framework.
Is there any reasonable way to do this?
There is currently no way to let extensions communicate outside Chrome without XHR, WebSockets, the Socket API, or other traditional methods such as image URLs, JavaScript URLs, etc.
If you want to go overkill, you could try creating an NPAPI plugin that writes protocol messages to a file on disk (the way Apache HTTP Server writes its logs), and create a standalone script in Python or any other scripting language that tails that file. So your API would basically read the file that the NPAPI plugin creates.
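The "script that tails the file" half of that idea is small. Here is a minimal sketch, assuming the plugin appends one newline-terminated message per line; the function name `follow`, its parameters and the message format are all made up for illustration, and unlike `tail -f` it starts reading from the beginning of the file.

```python
import time


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Yield complete lines appended to `path`, polling for new data.

    Stops after `max_idle_polls` consecutive polls that found nothing new
    (None means keep polling forever).
    """
    idle = 0
    buffer = ""  # holds a partial line until its newline arrives
    with open(path, "r", encoding="utf-8") as handle:
        while max_idle_polls is None or idle < max_idle_polls:
            chunk = handle.readline()
            if not chunk:
                # Nothing new yet: wait and try again.
                idle += 1
                time.sleep(poll_interval)
                continue
            idle = 0
            buffer += chunk
            if buffer.endswith("\n"):
                yield buffer.rstrip("\n")
                buffer = ""
```

A command-line tool could then loop with `for message in follow("/path/to/messages.log"): ...` and react to each message; the path here is only a placeholder. Polling is crude, but it avoids any dependency on platform-specific file notification APIs.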
I am looking to start Web Programming in Perl (Perl is the only language I know). The problem is, I have no prior knowledge of anything to do with the web, except surfing it. I have no idea where to start.
So my question(s) is...
Where do I start learning Web Programming? What should I know? What should I use?
I thank everybody in advance for answering and helping.
The key things to understand are:
What you can send to browsers
… or rather, the things you intend to send to browsers, but having an awareness of what else is out there is useful (since, in complex web applications in particular, you will need to select appropriate data formats).
e.g.
HTML
CSS
JavaScript
Images
JSON
XML
PDFs
When you are generating data dynamically, you should also understand the available tools (e.g. the Perl community has a strong preference for TT (Template Toolkit) for generating HTML, but there are other options such as Mason, while JSON::Any tends to be my go-to for JSON).
Transport mechanisms
HTTP (including what status codes to use and when, how to do redirects, what methods (POST, GET, PUT, etc) to use and when).
HTTPS (HTTP with SSL encryption)
How to get a webserver to talk to your Perl
PSGI/Plack if you want modern and efficient
CGI for very simple
mod_perl if you want crazy levels of power (I've seen someone turn the Apache HTTPD into an SMTP spam filter using it).
Security
How to guard against malicious input (which basically comes down to knowing how to take data in one format, such as submitted form data, and safely convert it to another, such as HTML or SQL).
Web Frameworks
You can push a lot of work off to frameworks, which provide structured ways to organise web applications.
Web::Simple is simple
Dancer seems to be holding the middle ground (although I have to confess that I haven't had a chance to use it yet)
Catalyst probably has the steepest learning curve but comes with a lot of power and plugins.
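The security point above, converting data from one format to another, is easiest to see in code. The following is a small sketch in Python (the same ideas exist in Perl as DBI placeholders and template-level HTML escaping); the `comments` table and the function names are invented for the example.

```python
import html
import sqlite3


def render_comment(text):
    """Escape user-supplied text before placing it inside HTML."""
    return "<p>" + html.escape(text) + "</p>"


def store_comment(conn, text):
    """Pass the value as a bound parameter instead of pasting it into the SQL."""
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


# An in-memory database, just for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

# Input that attacks both output formats at once.
hostile = "<script>alert(1)</script>'); DROP TABLE comments; --"

store_comment(conn, hostile)
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
page = render_comment(stored)
```

The input is stored exactly as submitted, and it is only converted at the point where it enters another format: bound as a parameter on the way into SQL, escaped on the way into HTML.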
Depending on the complexity of your project, you could have a look at Catalyst MVC. This is a good framework: it takes care of most of the request handling for you, but still gives you enough insight into what is going on.
There is a good tutorial on CPAN.
If you want to start with mod_perl or CGI, there are also some tutorials:
mod_perl
CGI Doc
If you're looking to try some web programming in Perl, you could try hosting a Dancer app for free on OpenShift Express.
There's even a "Dancer on OpenShift Express" repo to get you started: https://github.com/openshift/dancer-example
Obviously, the answer to the question depends on a number of environmental factors.
In general, I'm wondering what people's experiences are with HtmlUnitDriver as a reliable tool that can be "trusted" to navigate a website basically the same way other browsers do.
Of course, I realize "the way other browsers do" is pretty nebulous; naturally every browser will have its quirks. But I am on a project where we have hundreds of acceptance test scenarios (written in JBehave) and using FirefoxDriver and InternetExplorerDriver, running all of them takes over two hours, which is kind of rough from a continuous integration standpoint. So I'm wondering if it's at least feasible that we could switch our acceptance tests over to use HtmlUnitDriver and expect much faster times with mostly the same behavior (and perhaps we could expect a handful of tests to fail using HtmlUnitDriver and specifically run those tests with a browser-based driver).
Our UI uses GWT, which may or may not complicate things (I don't know).
Basically, in others' experience, does HtmlUnitDriver operate about as well as another browser, or is it really only appropriate for very simple HTML websites with minimal JavaScript and should not be used for an enterprise web application?
From my experience with HtmlUnitDriver, I would say that if you don't use it as your baseline browser when writing your tests, then converting them to use it becomes a bit of a nightmare. This is especially true when it comes to JavaScript-heavy sites.
The main reason for this is the underlying use of htmlunit, which runs page scripts on the Rhino JavaScript engine. In the past I've always had to specify that HtmlUnitDriver start htmlunit emulating Firefox. This, for the most part, solved the JavaScript issues I was finding while running tests using HtmlUnitDriver.
One of the biggest issues I faced when it came to using the same test code for each browser was if, on the site under test, the UI developers had assigned JavaScript events such as onClick() to HTML elements such as a <span>.
The reason for this is that if you were to use WebDriver's .click() method on a WebElement representing the <span>, then htmlunit would not do anything (it expects an onClick() to be called on elements such as an <input>).
To get around this I had to manually fire a click event in JavaScript. You can do this either by using WebDriver's JavascriptExecutor or by using a WebDriverBackedSelenium and Selenium's .fireEvent() method.
So if your site uses such events then I'd say switching to use HtmlUnitDriver could be a big task.
Despite this, I actually use HtmlUnitDriver for all my tests. However, I went through the pain of discovering all of the above a while back, so I now use HtmlUnitDriver as my baseline browser when writing tests.
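For reference, the JavaScript click workaround mentioned above looks roughly like this when written against Selenium's Python bindings, where `execute_script` plays the role of Java's JavascriptExecutor. This is a sketch only: `js_click` is a made-up helper name, and whether `click()` is available on a given element depends on the JavaScript engine behind the driver, so on some setups the script may need to create and dispatch a click event instead.

```python
# The element is handed to the script as arguments[0].
CLICK_SCRIPT = "arguments[0].click();"


def js_click(driver, element):
    """Trigger a click on `element` from JavaScript instead of WebElement.click().

    `driver` is any object with Selenium's execute_script(script, *args) method.
    """
    return driver.execute_script(CLICK_SCRIPT, element)
```

Keeping the workaround in one helper means tests can call `js_click(driver, span)` wherever a plain `.click()` is ignored, and the helper can be changed in one place if the driver's behaviour changes.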
I have used mechanize in Python with great success. However, I am trying to learn Scala. I have an IRC bot that I would like to add some features to, mostly having to do with screen scraping web pages from our corporate intranet. That requires being redirected to a corp-wide login page, then going to the destination, then having to possibly submit another login.
Does anyone know of something that I can use from Scala to get this sort of functionality?
I don't know any Scala effort of similar functionality. Pending answers to the contrary, I advise you to look for Java libraries of similar functionality.
The closest Java libraries I can think of are browser drivers. The most well-known are Selenium and WebDriver. The latter also offers an in-process mode.
Since Selenium's API isn't all that pleasant to use, a couple of projects have sprung up with DSL-ish façades: Selenium DSL and Selenium Inspector.
A caveat is that they are all oriented towards testing web applications, so they might lack features that your case needs.