Perl: Parsing AJAX-loaded content

This is an age-old question regarding Perl web scrapers after Web 2.0: they simply cannot parse dynamically loaded pages, because they need some sort of JavaScript engine to render the page. The issue is much more involved than simply executing JavaScript, since Perl would also have to manage and maintain the DOM.
It seems WWW::Selenium and WWW::Mechanize::Firefox are able to accomplish this by using Firefox (or other browsers) to do the rendering. However, since V8 has become so popular (as seen with Node.js), I'm curious whether there are any new libraries that use it, or whether a browser-independent solution has since appeared that I'm not aware of.
I might usually consider this a closable question, but with so few results when Googling and on Stack Overflow, there shouldn't be too many solutions (if any).
Related (older) Questions:
How can I use Perl to grab text from a web page that is dynamically generated with JavaScript?
How can I handle Javascript in a Perl web crawler?

You mentioned Selenium, but there is the later Selenium::Remote::Driver, which works with a Selenium 2.0 hub.
I see you can also use it without a Selenium hub. From the docs, under "Without Standalone Server" (I haven't used this part myself):
As of v0.25, it's possible to use this module without a standalone
server - that is, you would not need the JRE or the JDK to run your
Selenium tests. See Selenium::Chrome, Selenium::PhantomJS, and
Selenium::Firefox for details. If you'd like additional browsers
besides these, give us a holler over in Github.
PhantomJS may also be of interest, as it is a headless browser.
This is probably not a full answer, but it was too long for a comment.

Related

How to reverse engineer a progressive web app?

I found this free PWA https://www.the-qrcode-generator.com and now wonder how I could build one like it myself.
Since I couldn't find its source code anywhere, I wondered whether it would be difficult to reverse engineer.
I'm interested in building a PWA with QRCode functionality.
This one was created with AngularJS v1.3.20. You can find the source in your browser's developer tools under the Sources tab, and you can easily beautify the code there to make it readable.
If you want to know how they organized their REST API, the browser's Network tab will help a lot: just filter by XHR and examine all the calls from the front end to the back end.
The front end is very hard to reverse engineer, because most sites are served as minified bundles, so you can't see the original code.
You can, however, find other information about what they used to build it. For example, in the HTML source you can see some ng-* attributes, which indicate Angular, and the body element has a data-ng-app attribute, meaning this is AngularJS, and so on.
For the QR logic, you can see that there are no back-end calls, meaning it is written entirely in the client. I would look for existing solutions for that.
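A rough way to automate the ng-* check is to scan the page's HTML for ng-* and data-ng-* attribute names. The Java below is a heuristic sketch only, assuming the markup has already been fetched into a string; the class and method names are hypothetical. It uses a regular expression rather than a real HTML parser, so plain text such as "use ng-repeat here" can produce false positives.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough heuristic: guess whether an HTML page was built with AngularJS (1.x). */
final class AngularFingerprint {
    // Attribute names such as ng-app, data-ng-app, ng-model, data-ng-click,
    // preceded by whitespace and followed by whitespace, '=', '>' or '/'.
    private static final Pattern NG_ATTR =
            Pattern.compile("\\s((?:data-)?ng-[a-zA-Z-]+)(?=[\\s=>/])");

    private AngularFingerprint() {}

    /** Returns the distinct ng-* / data-ng-* attribute names, in order of first appearance. */
    static Set<String> ngAttributes(String html) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = NG_ATTR.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /** True if an AngularJS bootstrap attribute (ng-app or data-ng-app) is present. */
    static boolean looksLikeAngularJs(String html) {
        Set<String> attrs = ngAttributes(html);
        return attrs.contains("ng-app") || attrs.contains("data-ng-app");
    }
}
```

For example, ngAttributes on a page containing a body tag with data-ng-app and an input with ng-model returns both names; the data-ng-app hit is the same signal the answer points out by hand.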

Cannot get JxBrowser to render in Eclipse RCP JavaFX environment

I am currently evaluating JxBrowser 6.17 as an alternative browser technology for an Eclipse RCP-based data maintenance application, since the SWT Browser does not suit our needs.
What are our special needs, anyway? We need to embed an older solution into our new application, because we would not be able to add all the required features to the new application in time. Since those features exist in an older JSP-based web application, we need to embed it to make the functionality available to our customers. This comes with a lot of issues, but we have already found answers for most of them. The biggest issue we are currently facing is that the SWT Browser component does not support instance-based cookies, which we need because our web application uses cookie-based session management.
I have also tried the Mozilla implementation with different profiles, which unfortunately does not work, since profile management is just as static as cookie management.
The next step is to evaluate commercial frameworks, and that is where I am currently running into issues.
I requested an evaluation license for JxBrowser and tried to embed it into our Eclipse 4 RCP application. I would like to embed it using JavaFX, since we already use JavaFX and would like to avoid AWT.
The following code instantiates a JxBrowser and loads the given URL. The request does get fired, but the browser does not render any content at all.
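import org.eclipse.swt.SWT;
import javafx.embed.swt.FXCanvas;
import javafx.scene.Scene;
import com.teamdev.jxbrowser.chromium.Browser;
import com.teamdev.jxbrowser.chromium.javafx.BrowserView;
FXCanvas canvas = new FXCanvas(parent, SWT.NONE);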
Browser browser = new Browser();
BrowserView browserView = new BrowserView(browser);
browser.loadURL("http://www.google.com");
canvas.setScene(new Scene(browserView));
Even when loading HTML directly, it does not render any content.
There are no observable Exceptions or errors.
Does anybody have an idea what the issue is in my case?
In a different part of the application, we use the JavaFX WebView to embed CKEditor, and everything works (more or less) flawlessly, but I am not able to get JxBrowser to render its contents.
I am sure that I am doing something wrong (probably something pretty basic :))
What could I be missing?
Any idea or tip could do the trick ;)

Deleting page version numbers in form action URLs in Wicket for stress-testing purposes

I want to stress test a system based on Apache Wicket, using Grinder.
I used Grinder's TCP Proxy tool to record a test session in my application and then fed the generated test script to Grinder to stress test the system, but the tests did not run successfully.
After a lot of tweaking and debugging, we found that the problem lies in Wicket's URL generation, which mixes the page version number into its URLs.
So I searched, found solutions for removing the page version number from the URLs (like this one), and applied them. They worked and removed the version numbers from the URLs shown in the browser, but the tests still didn't work.
On closer inspection, I found that even though the URLs are clean now, the action attribute of forms still uses URLs with the page version number mixed in, like this one: ./?4-1.[wicket-path of the form]
Is there any way to remove these version numbers from form URLs as well? If not, is there another way to overcome this problem and stress test a Wicket web application?
Thanks in advance
I have not used Grinder, but I have successfully load-tested my Wicket application using the JMeter proxy, without changing Wicket's default versioning mechanism.
Here is the JMeter step-by-step link for your reference:
https://jmeter.apache.org/usermanual/jmeter_proxy_step_by_step.pdf
Basically, all I did was run the proxy server to capture web requests from the browser as test scenarios. Once I had collected the samples, I changed the target host URL to whichever server I wanted to point to (other than localhost).
Alternatively, there is another load-testing tool, BlazeMeter (compatible with JMeter). You could add its Chrome browser plugin to get a quick understanding.
Also, you might want to consider mounting your packages to individual URLs for 'cleaner' URLs. That way, you have a set of known URLs generated for pages within the same package (for example, /reports for all the report pages within the reports package).
Hope this helps!
-Mihir.
You should not ignore or remove the pageId from the URLs. If you remove it, you will request a completely new instance of the page, i.e. you will lose any state from the original page.
Instead of using the href when recording, you need to use the attribute whose name you set (yourself!) with org.apache.wicket.settings.DebugSettings#setComponentPathAttributeName(String).
Grinder/JMeter/Gatling/... should then keep track of this special attribute instead of 'href', and later find the link to click by using a CSS or XPath selector.
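As a minimal sketch of that idea, assume the application enables the attribute in its init() method with getDebugSettings().setComponentPathAttributeName("data-wicket-path"), where "data-wicket-path" is a hypothetical name of your choosing. A load-test script can then look a form up by that stable attribute in each freshly loaded page and read whatever action URL (page id included) Wicket rendered this time, instead of replaying the recorded one. The Java below only handles double-quoted attributes and uses regular expressions; a real script would normally use the tool's own CSS/XPath extractor, and the exact format of the attribute value depends on your Wicket version.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: look a form up by a stable component-path attribute and return the
 * action URL from the current response, rather than hardcoding a recorded URL.
 */
final class WicketFormLookup {
    // Hypothetical attribute name; must match the value passed to
    // DebugSettings#setComponentPathAttributeName(String) in the application.
    static final String PATH_ATTR = "data-wicket-path";

    private static final Pattern FORM_TAG =
            Pattern.compile("<form\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private WicketFormLookup() {}

    /** Value of a double-quoted attribute inside a single tag, if present. */
    static Optional<String> attribute(String tag, String name) {
        Matcher m = Pattern.compile("\\s" + Pattern.quote(name) + "=\"([^\"]*)\"").matcher(tag);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /**
     * Finds the form whose component-path attribute equals {@code path} and returns
     * its action as written in the markup (HTML entities such as &amp; are not decoded).
     */
    static Optional<String> currentAction(String html, String path) {
        Matcher forms = FORM_TAG.matcher(html);
        while (forms.find()) {
            String tag = forms.group();
            if (attribute(tag, PATH_ATTR).map(path::equals).orElse(false)) {
                return attribute(tag, "action");
            }
        }
        return Optional.empty();
    }
}
```

On each iteration the script would fetch the page, call currentAction with the form's path, and submit to the returned URL, so the page id stays whatever the server just issued.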
P.S. If you are not afraid of writing some Scala code then you can take a look at https://github.com/vanillasource/wicket-gatling.

Accurate browser detection/redirect possible using JavaScript?

Please forgive me if this answer is somewhere else on this site or online. If it is, I sure haven't found it in the past several days of searching.
What I am hoping to find is an "accurate" method of detecting a browser and redirecting to a simple, static page if it is not a recent browser.
The samples I have found so far often have not reported the browser actually being used. For instance:
When testing with Navigator 9, I'll get a message that I'm using Firefox 2
When testing with Maxthon 3, it reports I'm using IE 9.
My site displays correctly in all the current browsers I've been testing it with. But I'd like to have a basic static page for the 0.01% who are still using an old browser for whatever reason. They could still get some basic information from my site and be encouraged to update to a more current browser.
If anyone has any useful suggestions, I'd greatly appreciate them.
Thanks so much.
Cheers,
David
Browser detection is never perfect, for a variety of reasons. If you are using jQuery, you should look into jQuery.browser.
I'd try to detect the browser on the server side and do an HTTP redirect if the browser is something non-standard. Most decent frameworks have functionality to detect the browser from the user agent string. Again, this is not perfect, mainly because of the data browsers report. Also, if Maxthon reports it's IE, that's because it is based on IE and therefore the layout engine should be the same.
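A minimal sketch of that server-side check, assuming your framework hands you the raw User-Agent header as a string. The version cut-offs (IE below 9, Firefox below 4) and all names are hypothetical, and unknown agents are deliberately let through rather than downgraded, so a browser that doesn't match either token never gets the basic page.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of server-side user-agent sniffing to decide whether to redirect a
 * request to a basic static page. Thresholds are illustrative assumptions.
 */
final class UserAgentGate {
    private static final Pattern MSIE = Pattern.compile("MSIE (\\d+)");
    private static final Pattern FIREFOX = Pattern.compile("Firefox/(\\d+)");

    private UserAgentGate() {}

    /** Major version captured by {@code p}, if the token occurs in the user agent. */
    static OptionalInt majorVersion(String ua, Pattern p) {
        Matcher m = p.matcher(ua);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** True if the request should go to the basic page; unknown agents are let through. */
    static boolean needsBasicPage(String ua) {
        if (ua == null) {
            return false;
        }
        OptionalInt ie = majorVersion(ua, MSIE);
        if (ie.isPresent()) {
            return ie.getAsInt() < 9;   // hypothetical cut-off
        }
        OptionalInt ff = majorVersion(ua, FIREFOX);
        if (ff.isPresent()) {
            return ff.getAsInt() < 4;   // hypothetical cut-off
        }
        return false;
    }
}
```

Note that a Navigator 9-style user agent, which carries a Firefox/2.0 token, is classified here as Firefox 2 and sent to the basic page, which is exactly the kind of misreport the question describes.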
So you either
support a small number of browsers and cater for their quirks, sending all other browsers to a basic page (this sucks for future versions of browsers because they might be standards-compliant but they will still display your very basic page), or
you have a standards-compliant page for all browsers and then you define alternatives for the ones that give you problems.
I'd go for the second option. It usually all boils down to one version for all browsers, and a number of hacks for various versions of IE. Also, remember to avoid padding in your CSS and use margins instead.
In the end, you probably shouldn't be testing for browsers and version numbers, but supported features. Try using Modernizr.
The $.browser property was deprecated in jQuery 1.3 (and removed in 1.9). The jQuery site strongly recommends using feature detection (jQuery.support) instead of the jQuery.browser property.
This has actually been answered already in another question; please check How can you detect the version of a browser?

How can I extract data from a Java applet (inside the browser)

Well, well, here we go...
We have a Java applet running in a regular browser (Firefox 4+ or IE 5+).
I do NOT have access to the java code / servlet. Nor even to the server.
I NEED to send/retrieve data from this applet. This means I must emulate a user on it by clicking buttons and filling in the form's text boxes, and also read back data (after the server responds), which will be inside text boxes.
So the technologies available to us are C, VB, .NET (mainly the WebBrowser object), PHP (cURL available), JavaScript, and sniffing the browser/server communication with Fiddler.
We really need this. But if it's impossible, we need to know that too.
The data is owned by my company, so no copyright is infringed.
I'm also open to non-traditional solutions, such as saving the page as an image and then extracting the data with OCR software...
So any suggestions or pointers in the right direction would be greatly appreciated.
Thx
Paulo Bueno.
Emulating a user browsing is fraught with problems, so I would suggest an alternative route, if it's feasible. These are the steps I would take:
Grab the applet class or jar from my cache (anyone accessing the page / applet can do this).
Decompile the code into Java source (using JAD or another preferred tool)
Review the process with which the applet communicates to the service
Write an application to submit my data to the service that the applet connects to and handle any responses just as the applet would.
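To speed up the review step, a small triage pass can list which classes in the cached jar reference networking types before you start reading decompiled code. This Java sketch is a crude heuristic, assuming you have copied the jar to a local path; the class name is hypothetical. It searches each .class file's raw bytes for internal names such as java/net/URL and java/net/Socket (class names in the constant pool are stored as UTF-8, so an ASCII search finds them) and does not parse the class file format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists the classes in a jar whose bytes mention networking types (heuristic). */
final class JarNetworkScan {
    // Internal (slash-separated) class names as they appear in a class file's constant pool.
    // Prefix match: "java/net/URL" also matches URLConnection, HttpURLConnection, URLEncoder, etc.
    private static final String[] NET_TYPES = {"java/net/URL", "java/net/Socket"};

    private JarNetworkScan() {}

    static List<String> classesUsingNetwork(String jarPath) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) {
                    continue; // skip resources, manifests, etc.
                }
                byte[] bytes;
                try (InputStream in = jar.getInputStream(entry)) {
                    bytes = in.readAllBytes();
                }
                // ISO-8859-1 maps every byte to one char, so ASCII names survive intact.
                String text = new String(bytes, StandardCharsets.ISO_8859_1);
                for (String type : NET_TYPES) {
                    if (text.contains(type)) {
                        hits.add(entry.getName());
                        break;
                    }
                }
            }
        }
        return hits;
    }
}
```

The classes it reports are the natural starting points when working out how the applet talks to its service.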
You can run any applet without a browser using the "appletviewer" tool that ships with the JDK. This way it is possible, but not practical, to read and send fake input with http://code.google.com/p/windowlicker/ to control the Swing GUI.
But within a regular browser environment, with access to the code, you would rather do this:
Use the "scriptable" and "mayscript" attributes/elements in your object tag. Standard browser JREs include a "plugin.jar" that contains the classes needed for this job. This interface lets applet code and JavaScript communicate in both directions, and from JavaScript you can do whatever you want (e.g. an AJAX request).
This topic is rather complex, so check out what Google turns up:
http://www.htmlcodetutorial.com/applets/_APPLET_MAYSCRIPT.html
http://www.raditha.com/java/javascript.php
Using this interface is a real pain, so I suggest implementing HTTP requests within your applet to tell the PHP server whatever you want to tell it.
regards,
Michael
I do NOT have access to the java code / servlet. Nor even to the
server.
Emm... That is quite an unusual situation. If you have the applet, you should of course have access to its source files to modify it :)
I NEED to send/retrieve data from this applet. This means i must
emulate an user onto it by clicking buttons and filling form's
textboxes and also return data (after server response) wich ll be
inside textboxes.
Anyway, to "emulate" a user you can use the java.awt.Robot class, but it will still require modifying the applet code to support some additional functionality... As far as I remember, JS etc. cannot control a Java applet with outside commands unless the applet itself contains JS-accessible functionality for web page interaction... But you say you don't have any access to the applet source, so there is no way to know whether or how the applet supports netscape.javascript, and it is quite unclear... So I must ask: do you have any docs for this applet?
Coming back from the question text to the question title itself, which says
"How can I extract data from a Java applet (inside the browser)"
I would say that
To extract data from a Java applet, you can use the netscape.javascript library, which supports interaction between Java applets and JS (example, docs). That is the best way in this case.
Good luck
It might not be too difficult to decompile, change, and recompile the applet, unless it is obfuscated.
I use JAD.
https://stackoverflow.com/questions/31353/is-jad-the-best-java-decompiler
If you must get your data by interacting with the java applet instead of reverse engineering it, check out FEST (Fixtures for Easy Software Testing). FEST is designed for testing Java Swing GUIs by simulating user interaction, but you can easily use it to automate your applet as well.
Check out the documentation page on testing applets to get started.