How to verify the name of a file? - perl

I have a function that returns a file for the user avatar and I want to create tests for it. What I want to do is to check the name of the file.
This is the function:
sub get_avatar {
my $self = shift;
my $username = $self->stash('username');
my $home = Mojo::Home->new;
$home->detect('Project');
my $path = $home->child('users', 'avatars', "$username");
$path = $home->child('img', 'default.png') if !(-e $path);
$self->render_file('filepath' => $path);
}
And this is the test:
$file = $t->get_ok('/user/username/avatar')->status_is(200);
# use Data::Dumper;
# diag Dumper($file);
ok $file =~ 'username';
I want to check whether the name of $file is 'username' (which is the name of the file on the server), or 'default' if it is the default avatar.

Don't work with the file path. Work with the actual image. Create two small test images. They can be 2x2 pixel PNG files. Maybe make them single colour, but different. Have one be your default one.
Then serialise that and put it in your test as a string. Run $t->tx->res->body through the same serialisation, and compare these two.
If you wanted, you could also make the test deploy that image before running the code, so your application doesn't depend on the image being there.
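A minimal sketch of the comparison step, using only core modules. It assumes the response body is already available as a byte string (in a Test::Mojo test that would be `$t->tx->res->body`). The helper name `image_fingerprint` and the stand-in byte strings are made up for illustration and are not real PNG data.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: reduce image bytes to a short, comparable string.
sub image_fingerprint {
    my ($bytes) = @_;
    return md5_hex($bytes);
}

# Stand-ins for the two small test images (assumption: a real test would
# read the actual PNG files in binary mode instead).
my $default_png = "\x89PNG-default-avatar";
my $user_png    = "\x89PNG-user-avatar";

# Stand-in for the body returned by the route under test.
my $body = $user_png;

my $is_user    = image_fingerprint($body) eq image_fingerprint($user_png);
my $is_default = image_fingerprint($body) eq image_fingerprint($default_png);

print "user avatar served\n"    if $is_user;
print "default avatar served\n" if $is_default;
```

In a real test the two comparisons would probably become `is(...)` checks from Test::More, one for the known user and one for a user without an avatar.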

Related

Perl: should the function TreeBuilder be adapted when it is in a loop foreach?

My program takes an actor's name and, using that actor's filmography on IMDB, records in a hash all the genres of the movies he has acted in, along with their frequency. However, I have a problem: when I type a name like "brad pitt" or "bruce willis" at the prompt, the program runs indefinitely. How can I find out what the problem is?
Another problem: when I type "nicolas bedos" (the actor name I have been testing with from the beginning), it works, but the index seems to be built from only one movie in the @url_links list. Should the look_down function of the TreeBuilder module be adapted when it is used inside a foreach loop? I thought the @genres list was being overwritten on each iteration, so I added a push(), but the result remains the same.
use LWP::Simple;
use PerlIO::locale;
use HTML::TreeBuilder;
use WWW::Mechanize;
binmode STDOUT, ':locale';
use strict;
use warnings;
print "Enter the actor's name:";
my $acteur1 = <STDIN>; # the user enters the name of the actor
print "We will analyze the filmography of the actor $acteur1 by genre\n";
#we put the link with the given actor in Mechanize variable in order to browse the internet links
my $lien1 = "https://www.imdb.com/find?s=nm&q=$acteur1";
my $mech = WWW::Mechanize->new();
$mech->get($lien1); #we access the search page with the get function
$mech->follow_link( url_regex => qr/nm0/i ); #we access the first result using the follow_link function and the regular expression nm0 which is in the URL
my @url_links= $mech->find_all_links( url_regex => qr/title\/tt/i ); #we insert in an array all the links whose URL matches the regular expression "title/tt"
my $nb_links = @url_links; #we record the number of links in the list in this variable
my $tree = HTML::TreeBuilder->new(); #we create the TreeBuilder module to access a specific text on the page via the tags
my %index; #we create a hashing table
my @genres = (); #we create the genre list to insert all the genres encountered
foreach (@url_links) { #we make a loop to browse all the saved links
my $mech2 = WWW::Mechanize->new();
my $html = $_->url(); #we take the url of the link
if ($html =~ m=^/title=) { #if the url starts with "/title"
$mech2 ->get("https://www.imdb.com$html"); #we complete the link
my $content = $mech2->content; #we take the content of the page
$tree->parse($content); #we access the url and we use the tree to find the strings that interest us
@genres = $tree->look_down ('class', 'see-more inline canwrap', #We have as criterion to access the class = "see-more .."
sub {
my $link = $_[0]->look_down('_tag','a'); #new conditions: <a> tags
$link->attr('href') =~ m{genres=}; #other conditions: "genres" must be in the URL
}
);
}
}
my @genres1 = (); #we create a new list to insert the words found (the genres of films)
foreach my $e (@genres){ #we create a loop to browse the list
my $genre = $e->as_text; #the text of the list element is inserted into the variable
@genres1 = split(/[à| ]/,$genre); #we split on the unnecessary characters (spaces, "à" and "|") so that only the genre terms remain
}
foreach my $e (@genres1){ #another loop to filter listing errors (Genres: etc ..) and add the correct words to the hash table
if ($e ne ("Genres:" or "") ) {
$index{$e}++;
}
}
$tree->delete; #we delete the tree as we no longer need it
foreach my $cle (sort{$index{$b} <=> $index{$a}} keys %index){
print "$cle : $index{$cle}\n"; #we display the hash table with the genres and the number of times that appear in the filmography of the given actor
}
Thank you in advance for your help,
wobot
 
The IMDB Conditions of Use say this:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
So you might want to reconsider what you're doing. Perhaps you could look at the OMDB API instead.

Access environment variable in perl written in double quotes via config file

I have an environment variable $ROOT, e.g. $ROOT = "/someroot". It is used in a Perl file through a config file parameter.
For example, in the config file:
path = '$ROOT/abc/somepath'
In the Perl file, when I write config->{$path} in backticks, the value of $ROOT is expanded, i.e. I get /someroot/abc/somepath. But when I write it in double quotes, "config->{$path}", the result is the literal $ROOT/abc/somepath.
I need it in double quotes for opening files: open (filehandle,"config->{$path}"); How can I get the expanded value of config->{$path} inside double quotes?
P.S. I have also tried $ENV{'config->{$path}'};
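Try
# Prepend the value of the ROOT environment variable to the configured path
# (this assumes $config is a hash reference holding the parsed config file)
my $path = $ENV{"ROOT"} . $config->{path};
open(my $fh, '<', $path) or die "Cannot open $path: $!";
With this approach the configured path must no longer start with $ROOT.
Config file: path = '/abc/somepath'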
Are you looking for this?
sub get_conf {
my ($config, $key) = @_;
my $val = $config->{$key};
return undef if !defined($val);
# Replace every literal "$ROOT" in the value with the environment variable
$val =~ s{\$ROOT\b}{$ENV{ROOT}}g;
return $val;
}
my $path = get_conf($config, 'path');
For a more general solution, try one of the String::Interpolate modules on CPAN. I favor String::Interpolate::RE (disclaimer: I wrote it):
use String::Interpolate::RE 'strinterp';
my $path = strinterp( $config{path}, {}, { useENV=> 1 } );
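If installing a module is not an option, the same idea can be sketched with core Perl only. This is an illustration under assumptions, not what String::Interpolate::RE does internally: the helper name `expand_env` is hypothetical, and it only handles the plain `$NAME` form.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $NAME placeholders from %ENV.
# Unknown variables are left untouched rather than replaced by an empty string.
sub expand_env {
    my ($text) = @_;
    $text =~ s{\$(\w+)}{ exists $ENV{$1} ? $ENV{$1} : "\$$1" }ge;
    return $text;
}

$ENV{ROOT} = '/someroot';      # assumption: normally set by the shell
delete $ENV{NO_SUCH_VAR};      # make sure the second example is undefined

print expand_env('$ROOT/abc/somepath'), "\n";
print expand_env('$NO_SUCH_VAR/abc'), "\n";
```

The value read from the config file is passed through `expand_env` once, and the result can then be used in `open` like any other string.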

First 8 bytes are always wrong when downloading a file from my script

I have a Mojolicious::Lite script that serves an executable file (the user can download the file from the script's URL). I keep the hex-encoded data in an inline template in the DATA section, then decode it and send it with render_data.
get '/download' => sub {
my $self = shift;
my $hex_data = $self->render_partial( 'TestEXE' );
my $bin_data;
while( $hex_data =~ /([^\n]+)\n?/g ) {
$bin_data .= pack "H".(length $1), $1;
}
my $headers = Mojo::Headers->new;
$headers->add( 'Content-Type', 'application/x-download;name=Test.exe' );
$headers->add( 'Content-Disposition', 'attachment;filename=Test.exe' );
$headers->add( 'Content-Description', 'File Transfer');
$self->res->content->headers($headers);
$self->render_data( $bin_data );
};
__DATA__
@@ TestEXE.html.ep
4d5a90000300000004000000ffff0000b8000000000000004000000000000000
00000000000000000000000000000000000000000000000000000000b0000000
0e1fba0e00b409cd21b8014ccd21546836362070726f6772616d2063616e6e6f
....
When I run this locally (via the built-in web server on http://127.0.0.1:3000/, Win7), I get the correct file (size and contents). But when I run it in CGI mode on shared hosting (Linux), the file comes back with the correct size, but its first 8 bytes are always incorrect (and always different). The rest of the file is correct.
If I pass $hex_data instead of $bin_data in my sub, I get what is supposed to be there.
I'm at a loss.
render_partial isn't what you want.
First, re-encode the executable in base64 format, and specify that the template is base64 encoded (This is assuming hex is not a requirement for your app):
@@ template-name (base64)
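One possible way to do the re-encoding, sketched with core modules only. It takes the first hex line from the question as sample input; the variable names are illustrative, and a real conversion would feed in every line of the dump.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# First line of the hex dump from the question.
my @hex_lines = ('4d5a90000300000004000000ffff0000b8000000000000004000000000000000');

# 'H*' consumes the whole hex string, so no per-line length is needed.
my $binary = join '', map { pack 'H*', $_ } @hex_lines;

# encode_base64 wraps its output at 76 characters by default.
my $base64 = encode_base64($binary);
print $base64;

# Round trip: decoding must give back the original bytes.
my $round_trip_ok = decode_base64($base64) eq $binary;
print "round trip ok\n" if $round_trip_ok;
```

The printed text is what would go under the template line marked (base64) in the DATA section.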
Also, you don't actually need a controller method at all. Mojolicious will handle the process for you - all you have to do is appropriately name the template.
use Mojolicious::Lite;
app->start;
__DATA__
@@ Test.exe (base64)
...
http://127.0.0.1:3000/Test.exe will then download the file.
-
If you still want to use a controller method for app-specific concerns, get the data template specifically:
use Mojolicious::Lite;
get '/download' => sub {
my $self = shift;
# http://mojolicio.us/perldoc/Mojolicious/Renderer.pm#get_data_template
my $data = $self->app->renderer->get_data_template({}, 'Test.exe');
# Replace content-disposition instead of adding it,
# to prevent duplication from elsewhere in the app
$self->res->headers->header(
'Content-Disposition', 'attachment;filename=name.exe');
$self->render_data($data);
};
app->start;
__DATA__
@@ Test.exe (base64)
...
http://127.0.0.1:3000/download will get the template, set the header, and then download it as name.exe.

Perl OpenOffice::OODoc - accessing header/footer elements

How do you get elements in the header/footer of an ODT document?
For example, I have:
use OpenOffice::OODoc;
my $doc = odfDocument(file => 'whatever.odt');
my $t=0;
while (my $table = $doc->getTable($t))
{
print "Table $t exists\n";
$t++;
}
When I check the tables, they are all from the body. I can't seem to find any elements from the header or footer.
I found sample code here which led me to the answer:
#! /usr/local/bin/perl
use OpenOffice::OODoc;
my $file='asdf.odt';
# odfContainer is a representation of the zipped odf file
# and all of its parts.
my $container = odfContainer("$file");
# We're going to look at the 'style' part of the container,
# because that's where the header is located.
my $style = odfDocument
(
container => $container,
part => 'styles'
);
# masterPageHeader takes the style name as its argument.
# This is not at all clear from the documentation.
my $masterPageHeader = $style->masterPageHeader('Standard');
my $headerText = $style->getText( $masterPageHeader );
print "$headerText\n"
The master page style defines the look and feel of the document (think CSS). Apparently 'Standard' is the default name for the master page style of a document created by OpenOffice. That was the toughest nut to crack; once I found the example code, the rest fell into my lap.

How can my previously untainted data become tainted again?

I have a bit of a mystery here that I am not quite understanding the root cause of. I am getting an 'Insecure dependency in unlink while running with -T switch' when trying to invoke unlink from a script. That is not the mystery, as I realize that this means Perl is saying I am trying to use tainted data. The mystery is that this data was previously untainted in another script that saved it to disk without any problems.
Here's how it goes... The first script creates a binary file name using the following
# For the binary file upload
my $extensioncheck = '';
my $safe_filename_characters = "a-zA-Z0-9_.";
if ( $item_photo )
{
# Allowable File Type Check
my ( $name, $path, $extension ) = fileparse ( $item_photo, '\..*' );
$extensioncheck = lc($extension);
if (( $extensioncheck ne ".jpg" ) && ( $extensioncheck ne ".jpeg" ) &&
( $extensioncheck ne ".png" ) && ( $extensioncheck ne ".gif" ))
{
die "Your photo file is in a prohibited file format.";
}
# Rename file to Ad ID for adphoto directory use and untaint
$item_photo = join "", $adID, $extensioncheck;
$item_photo =~ tr/ /_/;
$item_photo =~ s/[^$safe_filename_characters]//g;
if ( $item_photo =~ /^([$safe_filename_characters]+)$/ ) { $item_photo = $1; }
else { die "Filename contains invalid characters"; }
}
$adID is generated by the script itself using a localtime(time) function, so it should not be tainted. $item_photo is reassigned using $adID and $extensioncheck BEFORE the taint check, so the new $item_photo is now untainted. I know this because $item_photo itself has no problem with unlink later in the script. $item_photo is only used long enough to create three other image files using ImageMagick before it's tossed using the unlink function. The three filenames created from the ImageMagick processing of $item_photo are created simply like so.
$largepicfilename = $adID . "_large.jpg";
$adpagepicfilename = $adID . "_adpage.jpg";
$thumbnailfilename = $adID . "_thumbnail.jpg";
The paths are prepended to the new filenames to create the URLs, and are defined at the top of the script, so they can't be tainted either. The URLs for these files are generated like so.
my $adpageURL = join "", $adpages_dir_URL, $adID, '.html';
my $largepicURL = join "", $adphotos_dir_URL, $largepicfilename;
my $adpagepicURL = join "", $adphotos_dir_URL, $adpagepicfilename;
my $thumbnailURL = join "", $adphotos_dir_URL, $thumbnailfilename;
Then I write them to the record, knowing everything is untainted.
Now comes the screwy part. In a second script I read these files in to be deleted using the unlink function, and this is where I am getting my 'Insecure dependency' flag.
# Read in the current Ad Records Database
open (ADRECORDS, $adrecords_db) || die("Unable to Read Ad Records Database");
flock(ADRECORDS, LOCK_SH);
seek (ADRECORDS, 0, SEEK_SET);
my @adrecords_data = <ADRECORDS>;
close(ADRECORDS);
# Find the Ad in the Ad Records Database
ADRECORD1:foreach $AdRecord(@adrecords_data)
{
chomp($AdRecord);
my($adID_In, $adpageURL_In, $largepicURL_In, $adpagepicURL_In, $thumbnailURL_In)=split(/\|/,$AdRecord);
if ($flagadAdID ne $adID_In) { $AdRecordArrayNum++; next ADRECORD1 }
else
{
#Delete the Ad Page and Ad Page Images
unlink ("$adpageURL_In");
unlink ("$largepicURL_In");
unlink ("$adpagepicURL_In");
unlink ("$thumbnailURL_In");
last ADRECORD1;
}
}
I know I can just untaint them again, or even just blow them on through knowing that the data is safe, but that is not the point. What I want is to understand WHY this is happening in the first place, as I am not understanding how this previously untainted data is now being seen as tainted. Any help to enlighten where I am missing this connection would be truly appreciated, because I really want to understand this rather than just write the hack to fix it.
Saving data to a file doesn't save any "tainted" bit with the data. It's just data, coming from an external source, so when Perl reads it it becomes automatically tainted. In your second script, you will have to explicitly untaint the data.
After all, some other malicious program could have changed the data in the file before the second script has a chance to read it.
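A minimal sketch of what that explicit untainting could look like for the paths read back from the records file. The helper name `untaint_path`, the allowed character set, and the sample record are assumptions for illustration; the real whitelist should match whatever the first script can actually produce.

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of a path read from the
# records file, or undef if it contains unexpected characters.
sub untaint_path {
    my ($path) = @_;
    return undef unless defined $path;
    return undef if $path =~ m{\.\.};    # refuse directory traversal
    if ( $path =~ m{^([A-Za-z0-9_./-]+)\z} ) {
        return $1;    # under -T, a regex capture yields untainted data
    }
    return undef;
}

# Stand-in for one line of the ad records database.
my $record = "20240101|/ads/20240101.html|/photos/20240101_large.jpg";
my ( $ad_id, @paths ) = split /\|/, $record;

for my $raw (@paths) {
    my $clean = untaint_path($raw);
    if ( defined $clean ) {
        print "would unlink $clean\n";    # real code would call unlink here
    }
    else {
        print "refusing suspicious path\n";
    }
}
```

Validating against a narrow pattern, rather than capturing `(.*)`, is what keeps the untainting meaningful: a record that was altered on disk fails the check instead of reaching unlink.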