How do I crawl a website written in JSP (JavaServer Pages) using a Perl script? - perl

I have this website: https://www.connect2nse.com/iislNet/UserFolder.jsp
First I tried using WWW::Mechanize, but it doesn't seem to work; as far as I can tell, WWW::Mechanize doesn't work with websites written in JSP. So I researched how to download a file from a website written in JSP, but couldn't find a good approach. Can anybody help me with this? Thanks in advance.

As far as the client is concerned, JavaServer Pages is identical to PHP, Perl, or even static HTML files. The result is a page of HTML that can be rendered and displayed, and the technology that generated it isn't the reason WWW::Mechanize fails to do what you want.
"Doesn't work" is useless as a problem description, and the issue could be pretty much anything. However, if the HTML relies on JavaScript (which is executed on the client system after the page has been retrieved, not on the server), then WWW::Mechanize may be more or less handicapped because it doesn't support JavaScript. For that you will need to use WWW::Mechanize::Firefox or similar, which works by using a real instance of Firefox to render the HTML and execute any JavaScript.
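To illustrate the first point, here is a minimal sketch in Python (standard library only, used as a stand-in for the Perl modules above). The client only ever sees HTML, so link extraction is the same whether the URL ends in .jsp, .php or .html. The sample markup and the file names in it are hypothetical; a real crawler would fetch the body over HTTP first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical response body standing in for what the server would send.
sample = (
    '<html><body><a href="report.csv">Report</a> '
    '<a href="/iislNet/Other.jsp">Other</a></body></html>'
)
links = extract_links(sample, "https://www.connect2nse.com/iislNet/UserFolder.jsp")
print(links)
```

If the links you expect are missing from the raw HTML, that is a strong hint that they are created by JavaScript after the page loads.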

Related

How can I "drill down" into a website using Perl's WWW::Mechanize

I have used the WWW::Mechanize Perl module on a number of projects and it's helped me out a lot.
I am trying to use it on a different site and I can't "drill down" into the content of the site.
The site is https://customer.bookingbug.com/?client=hantsrecyclingcentres#/services
I have tried to figure out what the URL would be to get the content referenced in the resulting HTML, such as bb.d570283b87c834518ba9.css, bb.d570283b87c834518ba9.js and version.js.
I tried to copy the resulting HTML into this posting, but used all sorts of quote and code sample combinations and it wouldn't display properly.
Does anyone have any idea how I "navigate" this site using this Perl module please?
WWW::Mechanize is a web client with some HTML parsing capabilities. But as you clearly noticed, the information you want is not in the HTML document you requested. Either download the correct document (whatever that might be), or do what the browser does and execute the JavaScript. This would require a JavaScript engine. The simplest way to achieve that is to remote-control a web browser (e.g. using Selenium::Chrome).
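A quick way to confirm this diagnosis is to check whether the downloaded document has any visible text at all, or only script references. The following is a rough sketch in Python (standard library only, not the Perl module from the question); the sample markup is an assumption modelled on the file names mentioned in the question, and the id "app" is made up.

```python
from html.parser import HTMLParser

# Tags whose text content is not visible page content.
NON_VISIBLE = {"script", "style", "title"}


class PageProbe(HTMLParser):
    """Record external script URLs and visible text in a document."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.text = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in NON_VISIBLE:
            self._skip += 1
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.scripts.append(src)

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def probe(html):
    parser = PageProbe()
    parser.feed(html)
    parser.close()
    return parser.scripts, parser.text


# Hypothetical shell page: only script references, no content.
sample = (
    '<html><head><script src="bb.d570283b87c834518ba9.js"></script>'
    '<script src="version.js"></script></head>'
    '<body><div id="app"></div></body></html>'
)
scripts, text = probe(sample)
print(scripts, text)
```

An empty text list combined with a non-empty script list suggests the page is built in the browser, in which case a plain HTTP client will never see the content.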

How to include CGI script inside HTML document?

I have to include my CGI script login.pl inside an HTML document index.html. I googled for answers and was surprised to find out that it was hard to get a definitive answer to this question. Some suggest using server side includes, but as far as I understand those are used for putting HTML inside of CGI, which is not what I want. I know that in JSP and PHP one can use tags like <% %> and php tags to include code inside of HTML document. Is there a similar construct for CGI? P.S. I am using CGI.pm framework and want to run output of login.pl inside of index.html.
What you are looking for is HTML::Template or Template::Toolkit. Either of these will allow you to put tags in your HTML file and your CGI script can populate the data.
Note that you will be posting to or getting from the CGI script which will read the HTML file, populate it and then send built HTML file to the client. The client's browser would not be accessing the HTML file directly.
In your case, the client's browser will post to login.pl; then, on the server, the Perl script will run, build the HTML page and serve it to the client.
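The general shape of the template approach can be sketched as follows. This uses Python's string.Template purely as a stand-in; it is not the HTML::Template or Template::Toolkit API, and the placeholder names and the render_login_page helper are made up for illustration.

```python
from html import escape
from string import Template

# In a real setup this text would live in index.html on the server;
# $username and $message are hypothetical placeholders.
PAGE = Template(
    "<html>\n"
    "<body>\n"
    "<h1>Welcome, $username</h1>\n"
    "<p>$message</p>\n"
    "</body>\n"
    "</html>"
)


def render_login_page(username, message):
    """Fill the template; values are escaped so they cannot inject markup."""
    return PAGE.substitute(username=escape(username), message=escape(message))


html = render_login_page("alice", "You are logged in.")
print(html)
```

The script, not the static file, is what the browser requests; the script reads the template, fills it in and returns the result.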
It's not possible. CGI is a method for invoking scripts or programs on the server.
See https://en.wikipedia.org/wiki/Common_Gateway_Interface
PHP, ASP and JSP files are special documents that are parsed by an interpreter on the server side, which then executes the included code. HTML documents cannot be executed, so this is not possible.
However, you could make use of Server Side Includes (SSI, https://en.wikipedia.org/wiki/Server_Side_Includes) and include or call CGI programs from there, although it is less powerful than PHP, JSP and ASP.

How to upload an image with jquery in all browsers

I am currently creating a form that involves a file uploader. Currently my form works fine using just multipart and POST, but in future iterations it will be necessary for the form to be posted with Ajax so that the image can be edited before the form is submitted.
I have seen a lot of material about multiple-file uploaders like jquery-file-upload, SWF and PHP with Uploadify, and a whole host of solutions that don't support IE 7+. However, those are not going to work for this specific project, and I am really just looking for a bare-bones, nothing-fancy solution that just sends the image data to an endpoint.
What is the best way to do this in a way that supports all browsers?
=====EDIT=====
I haven't tested this completely yet, but this solution seems good to me:
https://github.com/francois2metz/html5-formdata
Fine Uploader is a library that provides the ability to support cross-browser uploading. Ajax/xhr post requests are used for all browsers that support the file api. Otherwise, a form-based upload method is used. No flash is used or needed. This is all transparent to the user. Check out fineuploader.com for more details.
You can't send a file through a plain AJAX request; this is just impossible. The HTML5 File API would work, but as you stated in your question, you need to support old browsers. So I think you have to use Flash (Uploadify uses Flash as well); otherwise you don't have any other option.
You can have a look at this question/answer:
jQuery Ajax File Upload

How safe is the data being parsed by RTF editors like TinyMCE?

I have a serious concern about deploying the TinyMCE editor on a website. Looking at the code produced by the editor, it does a great job, and I leave the HTML button off the toolbar configuration so users cannot inject their own source.
However, from what I read in the TinyMCE docs, it claims to degrade nicely to a regular textarea should JavaScript be disabled in a user's browser... and therein lies my concern. If it does revert to a normal textarea, the user is then able to easily inject their own HTML, and this leaves me with a security concern.
I just pass through data created with TinyMCE, and it is used within another page created by my script, so it poses no security risk to my server. The security concern arises over what malicious data may be passed to another user viewing the generated page.
I know many of you will tell me to just use regexes, or to parse this data, but that itself could be a nightmare, as I would be either...
a.) Using regexes to try to clean up the HTML without breaking the generated page, when it is better to parse the data for that anyway.
b.) Reparsing data that has already been parsed by the RTF editor, which would also probably end up breaking the generated page.
If anyone has previous experience with this type of scenario, I would really appreciate a 'heads-up' as to any other risks that using an RTF editor for user data could entail.
I would really like to provide this as a user option, but not if the risks outweigh the benefit, such as giving a user of the RTF editor a chance to take a whack at another user viewing the page that is generated by the script.
My gut feeling is to give the RTF editor a wide berth at this point.
Thanks for any direction you can give me with your own experiences.
You cannot have client-side security on the web. You simply can't trust the browser, because it's easy for a malicious user to substitute a replacement browser that does whatever he wants.
If you accept HTML from users (using TinyMCE or through any other method) and display it to other users, you must sanitize or validate the HTML in some way on the server. If you're using Perl, the leading package seems to be HTML::Scrubber (along with various other modules that help you plug it in to various frameworks). I haven't had occasion to try it myself.
The TinyMCE Security page mentions some ways to make it harder for people to submit arbitrary HTML, but you still need server-side checks.
Regex is generally not considered good for parsing HTML (see the question "RegEx match open tags except XHTML self-contained tags"), but I have noted the "perl" tag :)
My advice when taking markup from users is to always pass it through something that can accept malformed HTML and return well-formed HTML. These parsers generally produce something that can be queried and updated with some form of XPath.
In Python there is a module called BeautifulSoup, Ruby has Nokogiri and in ASP.NET there is a project called HtmlAgilityPack that all do this sort of thing. I'm not sure what library perl has, but I'm sure there would be something.
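The server-side check that both answers call for can be sketched as an allowlist filter. The version below uses only Python's standard html.parser rather than any of the libraries named above, and the allowed tags, attributes and URL schemes are assumptions chosen for illustration. It is a rough sketch, not a vetted sanitizer: it does not rebalance unclosed tags, and a maintained library is the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser

# Assumed policy for illustration; adjust to what the editor may emit.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # remove these tags and their contents


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._drop_depth += 1
            return
        if self._drop_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()) or value is None:
                continue  # drops onclick, style, and so on
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self._drop_depth:
                self._drop_depth -= 1
            return
        if not self._drop_depth and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._drop_depth:
            # Re-escape text so stray < and & cannot form new markup.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


dirty = (
    '<p onclick="x()">Hi <script>alert(1)</script>'
    '<a href="javascript:alert(1)">l</a> <b>bold</b></p>'
)
clean = sanitize(dirty)
print(clean)
```

Because the filter runs on the server, it applies whether the markup came from TinyMCE, a plain textarea, or a hand-crafted request.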

GWT Toolkit: preprocessing files on client side

Is there a way for client-side GWT code to pre-process a file on the client computer?
For example, to calculate a checksum of it before submitting the file to the server.
No, it is not possible. The manipulation of the file is done by the browser, not the HTML code.
Think about it: GWT is 100% JavaScript, and JavaScript has no access whatsoever to the files on your computer. That would be a pretty big security risk! GWT "wraps" the file input box so it can be displayed inside the GWT panel, but once you press the "upload" button, the upload is done by the browser.
You could do file manipulation with another technology, however, such as Java applets. But that is outside of GWT's area...
Using GWT, there is no way to read files on the client side yet. However, in HTML5, you can read files without uploading to a server using the "File API".
Links are provided below.
File API tutorial on html5rocks.com
Example of how to use File API in GWT via JSNI
I'm pretty sure that because GWT code compiles to pure JavaScript, there isn't a way without requiring some third-party browser plugin.
Do you mean from an <input type="file"...> file upload field in a form?
The short answer is no: file uploads are handled by the browser and are sent directly to the server (as an ENCODING_MULTIPART POST). And security restrictions on JavaScript mean there's no way to work around that restriction.