How can I get file size in Perl before processing an upload request?

I want to get the file size. I'm doing this:
my $filename=$query->param("upload_file");
my $filesize = (-s $filename);
print "Size: $filesize ";`
Yet it is not working. Note that I did not upload the file. I want to check its size before uploading it. So to limit it to max of 1 MB.

You can't know the size of something before uploading. But you can check the Content-Length request header sent by the browser, if there is one. Then, you can decide whether or not you want to believe it. Note that the Content-Length will be the length of the entire request stream, including other form fields, and not just the file upload itself. But it's sufficient to get you a ballpark figure for conformant clients.
Since you seem to be running under plain CGI, you should be able to get the request body length in $ENV{CONTENT_LENGTH}.
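For example, a minimal sketch of that pre-check in a plain CGI script; the 1 MB limit and the response text are illustrative, not part of the original answer:
#!/usr/bin/perl
use strict;
use warnings;

# Rough early rejection based on the declared request body size.
# Remember: this covers the whole multipart body, not just the file,
# and a non-conformant client can simply lie about it.
my $max_bytes = 1024 * 1024;                  # ~1 MB ceiling
my $declared  = $ENV{CONTENT_LENGTH} || 0;

if ($declared > $max_bytes) {
    print "Status: 413 Request Entity Too Large\r\n";
    print "Content-Type: text/plain\r\n\r\n";
    print "Upload rejected: request body is $declared bytes, limit is $max_bytes.\n";
    exit;
}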

You may also want to set (or sanity-check an existing) $CGI::POST_MAX (from perldoc CGI):
$CGI::POST_MAX
If set to a non-negative integer, this variable puts a ceiling on the size of
POSTings, in bytes. If CGI.pm detects a POST that is greater than the ceiling,
it will immediately exit with an error message. This value will affect both
ordinary POSTs and multipart POSTs, meaning that it limits the maximum size of
file uploads as well. You should set this to a reasonably high value, such as
1 megabyte.
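For example, a minimal sketch of wiring that up; the 1 MB value is just the docs' suggestion, and the error handling is illustrative:
use strict;
use warnings;
use CGI;

# Refuse to process request bodies larger than 1 MB.
# Set this before the CGI object parses the request.
$CGI::POST_MAX = 1024 * 1024;

my $query = CGI->new;

# When the POST exceeds POST_MAX, CGI.pm records an error such as
# "413 Request entity too large" instead of returning the parameters.
if (my $error = $query->cgi_error) {
    print $query->header(-status => $error),
          "Upload rejected: $error\n";
    exit;
}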

The uploaded file is stashed in a temporary location on the server when the form is submitted, so check the file size there. Supply the form field name as $field:
# Get the upload filehandle for the field, locate the server-side
# temporary file behind it, and take that file's size.
my $upload_filehandle = $query->upload($field);
my $tmpfilename       = $query->tmpFileName($upload_filehandle);
my $file_size         = (-s $tmpfilename);
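From there, enforcing the 1 MB cap mentioned in the question might look roughly like this (the limit and the error output are illustrative):
my $max_bytes = 1024 * 1024;

if (!defined $file_size || $file_size > $max_bytes) {
    print $query->header('text/plain'),
          "Upload rejected: file is larger than $max_bytes bytes.\n";
    exit;
}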

This has nothing to do with Perl.
You are trying to read the size of a file on the user's computer using commands that read files on your own server; what you want can't be done using Perl.
This is something that has to be done in the browser, and, looking briefly at these questions, it's either very hard or impossible.
Your best bet is to allow the user to start the upload and abort if the file is too big.

If you want to check before you process the request, you might be better off checking on the web page that triggers the request. I don't think the web browser can do it on its own, but if you don't mind Flash, there are many Flash upload tools that can check things like size (as well as file type) and prevent the upload.
A good one to start with is the YUI Uploader. Lots more here: What is the best multiple file JavaScript / Flash file uploader?
Obviously you would want to check on the server side too, but by the time the user has started sending the request to the server, you are already using up your CPU cycles and bandwidth.

Thanks everyone for your replies; I just found out why $filesize = (-s $filename); was not working before: I was checking the file size while sending an Ajax request rather than when resubmitting the page, which is why the size came back as zero. I changed it to submit the page and it worked. Thanks.

Just read this post, but while checking the Content-Length is a good approximate pre-check, you could also save the file to a temporary folder and then perform any kind of check on it. If it doesn't meet your criteria, just delete it and don't move it to its final destination.

Look at the Perl documentation for the file test operators (-X) and stat on perldoc.perl.org. Also, you can look at this upload script, which does something similar to what you are trying to do.

Related

MessageSummaryItems.PreviewText Clarification

We're making use of the newly added MessageSummaryItems.PreviewText feature. Thank you!!
One issue is: sometimes the PreviewText contains HTML links. From reading through the source I see this in ImapFolderFetch.cs:
var body = message.TextBody ?? message.HtmlBody;
So this is saying: use the Plaintext version if it exists, otherwise fall back to the HTML version?
Therefore if I see links in the preview, I can assume no Plaintext version is available?
Our problem with this is:
If our message only has an HTML version, we could strip the links from the message in our code, but there are only 256 characters of it. In many cases, there will be nothing left to display.
As per your TODO: using the CONVERT extension would be a better approach, but as far as I can tell it's not supported by Gmail?
A fall back would be:
If we could set the preview length for HTML and Plaintext individually, then we could say: if you only have an HTML version, give me 1K of it and I'll strip out the links on the client.
Thoughts?
Very few IMAP servers support the CONVERT extension which is the main reason I didn't implement it.
The PreviewText feature is an attempt at adding a convenience feature to fetch the first 256 bytes of each message body in batched requests in order to minimize latency, but no matter what I do, it's not guaranteed to be useful (since there could be a ton of markup before any real text is included in HTML).
If I were to split text and HTML messages into 2 different batches so that I could request different sizes for each, it would be less efficient and might take significantly longer to fetch, so I'm not sure it's really worth it. The less I'm able to batch at a time, the less useful the feature becomes compared to implementing your own loop over the list of messages and downloading your own specified chunk size, one message at a time.
My suggestion would be to use the PreviewText feature and for the rare messages where the 256 bytes isn't enough, perform a folder.GetStream() on them.

WWW::Mechanize in Perl, Script Gets Killed

I have written a Perl script which uses WWW::Mechanize to connect to a site, log in, and then visit a few pages inside the site. It all works well; however, when I try to visit a large number of pages, the script gets killed. I am sure this has nothing to do with the HTTP server's configuration or the connection limits configured, because the script is running against my own site.
Here's a high level overview of my script:
$url="http://example.com";
$mech=WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get($url);
login to the site using the form fields.
Now, once I am logged in, I connect to URLs within the site as follows:
$i is the iteration counter in a for loop
$internal_url="http://example.com/index.php?page=$i";
$mech->get($internal_url);
perform some operations on the page returned ($mech->content using HTML::TreeBuilder::XPath)
now, I iterate over the for loop connecting to a different internal_url, since the value of $i is incremented in every iteration.
As I said, it all works good. However, after about 180 pages, the script gets killed.
What could be the reason? I have tried multiple times.
I even added a $mech->delete; right before the end of the FOR loop to prevent any memory leak.
However, the only issue is that the login session which was maintained by $mech would be destroyed as a result of this.
I have tried multiple times and this script always gets killed after visiting the same number of pages.
Thanks.
Try this code:
$mech = WWW::Mechanize->new();
$mech->stack_depth(0);
OR
$mech = WWW::Mechanize->new( stack_depth => 0 );
According to the docs: Get or set the page stack depth. Use this if
you're doing a lot of page scraping and running out of memory.
A value of 0 means "no history at all." By default, the max stack
depth is humongously large, effectively keeping all history.
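For context, a minimal sketch of how that might slot into the loop described in the question; the URLs, page range, and login step are placeholders:
use strict;
use warnings;
use WWW::Mechanize;

# stack_depth => 0 stops Mechanize from keeping every fetched page in
# its history, which is the usual source of memory growth in long loops.
my $mech = WWW::Mechanize->new( stack_depth => 0, autocheck => 1 );

$mech->get('http://example.com');
# ... log in here with $mech->submit_form(...) ...

for my $i (1 .. 500) {
    $mech->get("http://example.com/index.php?page=$i");
    my $html = $mech->content;
    # ... parse $html with HTML::TreeBuilder::XPath ...
}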

SWFUpload + jQuery.SWFUpload - Remove File From Queue

I'm facing a big issue IMO.
First, here's my code:
.bind('uploadSuccess', function(event, file, serverData) {
    if (serverData === 'nofile') {
        var swfu = $.swfupload.getInstance('#form');
        swfu.cancelUpload(file.id); // This part is not working :(
    } else {
        alert('File uploaded');
    }
})
In this part I'm checking the server response (I have strict validation restrictions). Now my question: is it possible to remove an uploaded file from the queue? Basically, if the server returns an error I display an error message, but... the file still exists in the queue (I've implemented a filename and filesize check to avoid duplicate uploads) and the user cannot replace the file (due to the upload and queue limits).
I was trying to search for a solution, but without success. Any ideas?
Regards,
Tom
From the link http://swfupload.org/forum/generaldiscussion/881:
"The cancelUpload(file_id) function allows you to cancel any file you have queued. You just have to keep the file's ID value so you can pass it to cancelUpload when you call it."
You probably have to keep the file's ID before sending anything to the server.

C#: Take Out Image Portion of JPEG to Backup Metadata?

This will be a little backwards from the typical approach.
I've used ExifTool for metadata manipulation before, but I really want to keep the best metadata backup I can before I make anything permanent.
What I want to do is remove the compressed image portion of a JPEG file and leave everything else intact. That means backing up EXIF, MakerNotes, IPTC, XMP, etc., whether they appear at the beginning or end of the file.
What I've tried so far is to strip all metadata from a copy of the original JPEG and use that as a basis for which bytes to remove from the original. After looking at the raw data, the stripped copy doesn't seem to be contiguous within the original, and there may be some header information still remaining in the stripped version; I don't really know. Not a good way to do it, I suppose.
Are there any markers that will tell me definitively where the compressed JPEG image data starts and ends? I understand that JPEG files have 0xFFD8 and 0xFFD9 to mark the start and end of the image, but I have come to find out that metadata actually sits between those markers.
I'm using C#.
Thank you.
To do this properly you need to fully parse the JPEG/JFIF format and discard anything you don't want. Metadata is all kept in APP segments or in trailers after the JPEG EOI, so presumably you will toss everything else. Full parsing of a JPEG/JFIF file is not trivial, and for this I refer you to the JPEG/JFIF specification.
You can use the JpegSegmentReader class from my MetadataExtractor library to retrieve specific segments from a JPEG image.

How can I validate an image file in Perl?

How would I validate that a .jpg file is a valid image file? We are having files written to a directory using FTP, but we seem to be picking up the file before it has finished being written, which creates invalid images. I need to be able to identify when it is no longer being written to. Any ideas?
Easiest way might just be to write the file to a temporary directory and then move it to the real directory after the write is finished.
Or you could check here.
JPEG::Error
[arguments: none] If the file reference remains undefined after a call to new, the file is to be considered not parseable by this module, and one should issue some error message and go to another file. An error message explaining the reason of the failure can be retrieved with the Error method:
EDIT:
Image::TestJPG might be even better.
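If you'd rather not pull in a module, a rough completeness check is to confirm the file starts with the JPEG SOI marker (0xFFD8) and ends with the EOI marker (0xFFD9); a file still being written will usually fail the second test. A minimal sketch (the sub name is just for illustration, and note that some writers append data after EOI, so treat this as a heuristic):
use strict;
use warnings;

sub looks_like_complete_jpeg {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return 0;

    read($fh, my $head, 2) == 2 or return 0;   # first two bytes (SOI)
    seek($fh, -2, 2)            or return 0;   # jump to the last two bytes
    read($fh, my $tail, 2) == 2 or return 0;   # last two bytes (EOI)
    close $fh;

    return $head eq "\xFF\xD8" && $tail eq "\xFF\xD9";
}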
You're solving the wrong problem, I think.
What you should be doing is figuring out how to tell when whatever FTPd you're using is done writing the file - that way when you come to have the same problem for (say) GIFs, DOCs or MPEGs, you don't have to fix it again.
Precisely how you do that depends rather crucially on what FTPd on what OS you're running. Some do, I believe, have hooks you can set to trigger when an upload's done.
If you can run your own FTPd, Net::FTPServer or POE::Component::Server::FTP are customizable to do the right thing.
In the absence of that:
1) try tailing the logs with a Perl script that looks for 'upload complete' messages
2) use something like lsof or fuser to check whether anything is locking a file before you try and copy it.
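For the fuser route, a small hedged sketch (fuser exits 0 when some process still has the file open, non-zero otherwise; this assumes a Unix-ish server with fuser installed):
# Returns true while some process (e.g. the FTP daemon writing the
# upload) still has the file open; the -s flag keeps fuser quiet.
sub still_being_written {
    my ($path) = @_;
    return system('fuser', '-s', $path) == 0;
}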
Again looking at the FTP issue rather than the JPG issue.
I check the timestamp on the file to make sure it hasn't been modified in the last X (5) mins - that way I can be reasonably sure they've finished uploading
# time in seconds that the file was last modified
my $last_modified = (stat("$path/$file"))[9];

# current time in seconds since the epoch (ie 1970)
my $epoch_time = time();

# only proceed if the file hasn't been modified in the last 5 mins,
# ie it is no longer being uploaded
unless ( $last_modified >= ($epoch_time - 300) ) {
    # move / edit or whatever
}
I had something similar come up once, more or less what I did was:
my $old_size = 0;
my $current_size;

# keep waiting until the size stops changing between checks
while ( ($current_size = -s $image_file) != $old_size ) {
    $old_size = $current_size;
    sleep 10;
}

process_image($image_file);
Have the FTP process set the readonly flag, then only work with files that have the readonly flag set.
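A minimal sketch of that idea on a Unix-style server, assuming the FTP process clears the write bit when an upload finishes and the script does not run as root (the directory path is a placeholder, and the print stands in for whatever processing you actually do):
use strict;
use warnings;

my $incoming = '/var/ftp/incoming';

opendir my $dh, $incoming or die "Cannot open $incoming: $!";
for my $file (grep { /\.jpe?g$/i } readdir $dh) {
    my $path = "$incoming/$file";
    next if -w $path;                       # still writable => assume still uploading
    print "ready to process: $path\n";      # hand off to your real processing here
}
closedir $dh;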