Perl Dancer send_file Issue with Images

I have a Perl Dancer web application that uses GD to dynamically create images. I am trying to deliver these images to the user as PNG. For example:
package MyApp;
use Dancer ':syntax';
use GD;
...
get '/dynamic_image/:var1/:var2' => sub {
    my $im = GD::Image->new(100,100);
    my $black = $im->colorAllocate(0,0,0);
    my $white = $im->colorAllocate(255,255,255);
    $im->rectangle(10,10,90,90,$white);
    my $png = $im->png;
    return send_file( \$png, content_type => 'image/png', filename => params->{var1}."_".params->{var2}.".png" );
};
However, when accessing the above route, Chrome and Firefox don't seem to know what to do with the image data. If I try to use the route in Lightbox, Chrome complains. For example, when clicking on a link to the route, Chrome's console says:
Resource interpreted as Image but transferred with MIME type application/octet-stream: "http://www.example.com/dynamic_image/my/image".
It looks like Dancer is not using content_type correctly. Interestingly, IE8 seems to load the images just fine. Any idea what's going on? I'm currently running it standalone on Windows 7 with Strawberry Perl v5.16.2.

To explain the different behavior with IE: if IE encounters a Content-Type of application/octet-stream, it will attempt to scan the file to determine a more specific MIME type. This behavior is known as MIME sniffing.
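To make the sniffing idea concrete, here is a minimal Python sketch that guesses an image type from a file's leading "magic bytes" when the declared type is the generic application/octet-stream. The signature table and the function name sniff_mime are illustrative assumptions; IE's real detection logic covers many more types and rules.

```python
# Illustrative subset of file signatures ("magic bytes"); not IE's real table.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_mime(data, declared="application/octet-stream"):
    """Return a more specific MIME type when the declared one is generic."""
    if declared != "application/octet-stream":
        return declared  # a specific declared type is trusted as-is
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return declared  # nothing recognised: keep the generic type


if __name__ == "__main__":
    fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
    print(sniff_mime(fake_png))  # image/png
    print(sniff_mime(b"hello"))  # application/octet-stream
```

A browser that does not sniff (or that only warns, as Chrome does here) relies on the declared header instead, which is why the same response can behave differently across browsers.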
I recommend using the GET command-line tool from Perl's LWP distribution to confirm what's going on. You can try this:
GET -sSe http://www.example.com/dynamic_image/my/image | less
The result should include among other things the Content-Type header. It sounds like you'll find that it says application/octet-stream. This starts to look like an issue with Dancer.
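If you would rather check the header from a script than with GET, a rough Python equivalent using only the standard library is sketched below. The function name content_type_of is hypothetical; like GET, it performs a full GET request and simply reports the status and the Content-Type header.

```python
import urllib.request


def content_type_of(url, timeout=10.0):
    """Fetch url and return (HTTP status, Content-Type header value).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # An empty string means the server sent no Content-Type at all.
        return resp.status, resp.headers.get("Content-Type", "")
```

For example, calling content_type_of("http://www.example.com/dynamic_image/my/image") would return the status together with either image/png or application/octet-stream, which tells you whether the content_type option took effect.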
You didn't specify which version of Dancer you are using. Older versions did not support the content_type option to send_file(). If you are reading the latest docs on CPAN and expecting them to apply to an older version, that could explain the confusion.

It does not seem to be a Dancer problem. The same kind of warning shows up in other environments too, for example:
Resource interpreted as Document but transferred with MIME type image

After banging my head against this for a while, I think I can answer my own question. Firefox actually tipped me off to a bug in my own code. When I accessed the dynamically created image in Firefox, it displayed a page with the HTTP request info along with the PNG data, and I noticed some debugging text on that page. It turns out I had left a print in one of the loops that generated the image data (I had used it to verify the image was being built correctly), and that text somehow made it into the "image" itself, which I assume is what confused Firefox and Chrome. So this wasn't a Dancer or application bug, but a PEBKAC issue. Thanks for the input, everybody.
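A quick way to catch this kind of problem is to save the response body and check whether anything precedes the 8-byte PNG signature (0x89, "PNG", CR, LF, 0x1A, LF) that every PNG file starts with. The Python helper below is an illustrative sketch, not part of Dancer or GD, and it only detects stray output written before the image data; text injected in the middle of the data would need a real PNG decoder to spot.

```python
# The fixed 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def stray_prefix(body):
    """Return whatever bytes precede the PNG signature in a response body.

    An empty result means the body starts cleanly with the signature.
    Raises ValueError if the signature does not appear at all.
    """
    idx = body.find(PNG_SIGNATURE)
    if idx < 0:
        raise ValueError("no PNG signature found in body")
    return body[:idx]


if __name__ == "__main__":
    clean = PNG_SIGNATURE + b"...rest of image..."
    print(stray_prefix(clean))                     # b''
    print(stray_prefix(b"row 42 done\n" + clean))  # b'row 42 done\n'
```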

Related

Print page content

I just started playing with Google Chrome apps. I've searched the internet and haven't found a way to print the content of the window. I tried using window.print(), but nothing happened.
As far as I have read, print() won't work since it is called in background.html, which does not have any content. How can I make the call in the correct place and send the content of the app to the printer?
Thank you in advance!
You're right that this can't be done through the Background page, since it is not rendered. What you'll need to do is inject a "content script" into the page you would like to print. The content script would contain the print command, and probably whatever would trigger the print command.
In a nutshell, "content scripts" are scripts that are injected into the pages a user browses. You can inject pretty much any JavaScript you like, and even inject entire JavaScript libraries like jQuery. More details can be found here:
https://developer.chrome.com/extensions/content_scripts.html
If you decide to use a popup window to trigger the print you can pass a message to the window you would like to print. Message passing is how the different components of an extension (background page, content script, popup) communicate. More info can be found here:
http://developer.chrome.com/extensions/messaging.html
Printing in apps is not yet supported, I believe. See
Issue 131279: async version of window.print()

CURL class that works like simple HTML DOM?

So I've been using both cURL and simple_html_dom for a while. For anyone who is not familiar with Simple HTML DOM: it allows you to go through elements with ease, without the hassle of regexes, exploding strings and so on.
E.g.
$html = file_get_html($obj->loc);
$item['title'] = $html->find('#Prod-Name h1',0)->plaintext;
However, as far as I'm aware, this does not support cookies the way cURL does. Is there something out there that does?
I would be interested to hear people's experience with screen scraping/bot creation.
You can just download with cURL and parse the result with the parsing library of your choice. I use this method sometimes, but I'm not very happy with it; it would be nice if PHP had some decent scraping libraries, and even nicer if they were built in.
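The download-then-parse approach can be sketched outside PHP as well. The version below uses Python's standard library: a CookieJar attached to an opener plays the role of cURL's cookie handling, and html.parser stands in for simple_html_dom. The helper names and the "#Prod-Name h1" style lookup are illustrative assumptions, the parser assumes reasonably well-formed markup, and a hand-rolled HTMLParser subclass is far less convenient than a real selector library.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def make_opener():
    """Return (opener, jar): the opener stores cookies from responses in jar
    and sends them back on later requests, much like a cURL cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


class H1InContainer(HTMLParser):
    """Collect the text of the first <h1> inside the element with a given id,
    roughly what the selector '#Prod-Name h1' does."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0      # > 0 while inside the container element
        self.in_h1 = False
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "h1" and not self.found:
                self.in_h1 = True
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags (<br/>) neither open nor close anything

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if self.in_h1 and tag == "h1":
            self.in_h1 = False
            self.found = True
        self.depth -= 1

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)


def first_h1_text(html, container_id):
    """Return the stripped text of the first matching <h1>, or None."""
    parser = H1InContainer(container_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None
```

Typical use: opener, jar = make_opener(), then html = opener.open(url).read().decode("utf-8", "replace") and first_h1_text(html, "Prod-Name"). Cookies set by one response are sent automatically on later opener.open calls.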

Gtkwebkit, save html to pdf

For the last few days I have been searching for the best and shortest way to convert HTML files to PDF. Since I create my HTML files with a C program and view them through GtkWebKit, which uses Cairo, there should be some efficient and direct way to convert the content of the displayed page to PDF with C (I think).
But I can't find any example or direction on the net.
So far, among the different virtual printers, I have found only command-line tools that are written in Perl or that depend on Qt, which I don't want.
Please share any suggestion, example or advice for getting this functionality from GtkWebKit, or, if that's not possible, maybe from some tiny C library.
As far as I can tell from reading the documentation (haven't tried it out myself):
Get the main frame with webkit_web_view_get_main_frame().
Create a GtkPrintOperation with gtk_print_operation_new().
Set the export-file property on your print operation to be the name of the PDF you want to export to.
Print the frame with webkit_web_frame_print_full(). Make sure to pass GTK_PRINT_OPERATION_ACTION_EXPORT as the 'action' parameter.
I once wrote some code to accomplish that without opening a window. But then I ran into a problem using that code from multiple threads (in a web server, e.g.). I did some research and found that GTK itself is single-threaded, so I made my code thread-safe by queuing the print operations to the main thread. Anyway, if it helps, check it out: https://github.com/gnudles/wkgtkprinter

Perl WWW-Mechanize Module

I am using the WWW::Mechanize module to access website controls. Some HTML pages contain frames. I can't get the link names, and I am unable to access the links in frames. Can anyone suggest the right way to resolve this issue?
Working Platform: Windows, Perl
Thanks in advance
From what I see, WWW::Mechanize does not load frames automatically; you need to do so yourself. You can get links to the frames with:
my @frames = $mech->find_all_links( tag => 'frame' );
and then $mech->get each one (cloning your mech object if necessary).
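The same "collect the frame URLs, then fetch each one" idea can be sketched outside Perl. In the Python version below, a parser collects the src attributes of frame and iframe tags and resolves them against the page URL. The names FrameSrcParser and frame_urls are hypothetical, iframe handling is an addition not mentioned above, and the sketch does not fetch anything itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect src attributes of <frame> and <iframe> tags in page order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs of every frame on the page."""
    parser = FrameSrcParser()
    parser.feed(html)
    parser.close()
    # Relative src values are resolved against the framing page's URL.
    return [urljoin(base_url, s) for s in parser.srcs]
```

Each returned URL would then be requested separately, just as each frame link is passed to $mech->get in the Perl version.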

How can I download Yahoo Groups?

I want to download some Yahoo Groups (files, photos, messages, memberlist) and I've found these scripts:
http://freshmeat.net/projects/grabyahoogroup/
http://sourceforge.net/project/showfiles.php?group_id=62034
I've downloaded ActivePerl and the needed modules from CPAN (nothing fancy; they're very easy to find). I've managed to install them, but when I run the script I get an error after it tells me that I've successfully logged in:
"Use of uninitialized value $cells in pattern match (m//) at yahoogroups_files.pl line 244, line 2."
I'm guessing that Yahoo changed the layout of the page or something, but I'm not able to update the script myself. I'm a newbie when it comes to Perl and to understanding the way Yahoo generates its pages; I only know some basic C++. I want to mention that I'm not lazy and I'll try to fix it myself, but I need your help: hints, advice, anything.
PS: I've contacted the author, but he isn't willing to update the scripts.
You would need knowledge in the following fields:
use of an HTML parser
HTTP knowledge (GET/POST/HEAD)
web scraping
I suggest you focus on WWW::Mechanize, since it's capable of all these things (and more).
EDIT: Another solution (one that doesn't need programming) is this: log in to Yahoo Groups with your browser, store the cookie, and then run wget, passing the stored cookie file as a parameter. This way you'll get the task done very quickly.
Find your browser's cookies.txt file on your hard drive, and then call wget like this (if I remember the options correctly):
wget --load-cookies path_to_cookie_file -r -w 60 website
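If you end up scripting the download instead, the same exported cookies.txt can be reused. The Python sketch below loads a Netscape-format cookies file (the format wget's --load-cookies reads) into a MozillaCookieJar and builds an opener that sends those cookies. The function name is hypothetical, and the sketch covers only the cookie part: it does not replicate wget's recursion (-r) or the 60-second wait (-w 60).

```python
import http.cookiejar
import urllib.request


def opener_from_cookies_txt(path):
    """Return (opener, jar) where the opener sends the cookies in path.

    ignore_discard=True keeps session cookies (those without an expiry),
    ignore_expires=True keeps cookies whose expiry time has already passed.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(handler), jar
```

MozillaCookieJar expects the file to start with the usual "# Netscape HTTP Cookie File" header line and raises LoadError otherwise.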
The full man page can be found here
EDIT2: Another option is to use WebDriver to automate Firefox. You can use this article as a guide on how to accomplish this.
By the filename I'm assuming you're using Yahoo Group archiver found here: http://sourceforge.net/projects/grabyahoogroup/
I ran the files script against the SubEthaEdit group and it works great. All of the files downloaded without incident.
Looking at the code, it seems to barf while processing an HTML table in a while loop if $cells is undefined (which is what the "uninitialized value" warning means).
Considering the code did work when I tested it, it's possible there's something going on with the listing of that group's files. You'll want to try outputting $content and figure out where and why the regular expression on line 243 isn't able to process that HTML.
EDIT: If you don't mind posting the group this is happening with, I'm sure I or someone else here can try it out and troubleshoot it. It's tough to pinpoint what's up when the issue can't be reproduced. Also, try the same group I did and see if it works for you. If it does, something is certainly up with the group you're trying.
Dunno if it will help you, but here's what I did to get the message-download working:
http://sourceforge.net/forum/forum.php?thread_id=3283915&forum_id=209170
(I only used message-download, I didn't look at file-download)
I was tinkering with this a while ago to back up my girlfriend's group messages and files from uni. While debugging the latest scripts, I found that there seems to be a bug in the $group_domain declaration (there's also a group declaration bug in yahoo2maildir.pl from the same project; see $request):
($group_domain) = $url =~ /\/\/(.*?groups.yahoo.com)\//;
In this case, I overwrote the $request variable in the function sub download_folder() from
$request = GET "http://$group_domain/group/$group/files$sub_folder/";
to
$request = GET "http://groups.yahoo.com/group/$user_group/files$sub_folder/";
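One thing worth knowing about that regex: it only matches when a slash follows the host, so for a URL such as http://groups.yahoo.com with no path it leaves $group_domain undefined. Rather than hard-coding the host, a more tolerant approach is to parse the URL and fall back to a default. The Python sketch below shows that idea; the function name and the fallback value are assumptions, not code from grabyahoogroup.

```python
from urllib.parse import urlsplit

# Hypothetical default, mirroring the host hard-coded in the fix above.
DEFAULT_GROUP_DOMAIN = "groups.yahoo.com"


def group_domain(url):
    """Return the Yahoo Groups host from url, or the default host.

    Accepts groups.yahoo.com itself and any subdomain of it (e.g. a
    regional prefix); anything else falls back to DEFAULT_GROUP_DOMAIN.
    """
    host = urlsplit(url).hostname or ""  # hostname is lower-cased, may be None
    if host == DEFAULT_GROUP_DOMAIN or host.endswith("." + DEFAULT_GROUP_DOMAIN):
        return host
    return DEFAULT_GROUP_DOMAIN
```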
grabyahoogroup works well in the latest edition, which can be found at the svn repo:
http://grabyahoogroup.svn.sourceforge.net/viewvc/grabyahoogroup/trunk/yahoo_group/
The version at sourceforge.net/projects/grabyahoogroup/files/ HAS BUGS AND DID NOT WORK FOR ME.
I've been looking for a tool that collects messages/conversations from Yahoo! Groups. After struggling to make my own and searching everywhere on the internet, I finally found this tool, which converts your Yahoo! Groups messages into MBOX format.
Download tools
Both of the following are Google Chrome extensions.
Chrome Extension to Download Members posted by Sam Hobbs (2015).
Chrome Application To Download Messages posted by Mark Fletcher (Jan 2016).
Plain string to Base64 binary data
At some point after September 16, 2010 (at least for me), the retrieved messages are no longer plain text but Base64-encoded (ASCII) data. This Swiss converter tool lets you convert the data back into readable text.
Sample content from the MBOX format
VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=
Sample result after conversion
The quick brown fox jumps over the lazy dog.
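The same conversion can be done locally instead of with an online tool. A minimal Python sketch, assuming the decoded messages are UTF-8 text (older messages may use another charset, in which case pass a different encoding):

```python
import base64


def decode_b64_text(encoded, encoding="utf-8"):
    """Decode a Base64 string into text.

    With the default validate=False, b64decode discards characters outside
    the Base64 alphabet, such as the line breaks used to wrap encoded
    bodies in MBOX files.
    """
    return base64.b64decode(encoded).decode(encoding)


sample = "VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4="
print(decode_b64_text(sample))  # The quick brown fox jumps over the lazy dog.
```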
for cause, as of 2019/09
https://github.com/csaftoiu/yahoo-groups-backup
.....