I am trying to download images from a site with Perl and save them using LWP::Simple's getstore.
Here's an example of the URL:
http://www.aavinvc.com/_includes/blob.asp?Table=user&I=28&Width=100!&Height=100!
It turns out the files I am getting with LWP are completely empty. I even tried cURL and got the same thing: completely empty. Would there be another way to get these?
If the file really contains ASP, then you have to run it through an ASP engine.
If things worked properly, then the URL would return an image file with an appropriate content type. You've just saved it with a .asp extension.
The fix is simple: rename the file, preferably by looking at the Content-Type header returned (trivial with LWP, though I think you'll have to move beyond getstore) and doing it in Perl.
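For example, here is a minimal sketch of that approach with LWP::UserAgent (the content-type-to-extension map and the output file name are my own assumptions):
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.aavinvc.com/_includes/blob.asp?Table=user&I=28&Width=100!&Height=100!';
my $ua  = LWP::UserAgent->new;
my $res = $ua->get($url);
die 'download failed: ', $res->status_line unless $res->is_success;

# Pick a file extension from the Content-Type header instead of trusting ".asp"
my %ext  = ( 'image/jpeg' => 'jpg', 'image/png' => 'png', 'image/gif' => 'gif' );
my $type = $res->header('Content-Type');
my $name = 'image.' . ( $ext{$type} // 'bin' );

open my $fh, '>:raw', $name or die "can't write $name: $!";
print {$fh} $res->decoded_content;   # decoded_content undoes gzip but leaves the image bytes alone
close $fh;
print "saved $name ($type)\n";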
Regarding the update:
I just tried:
#!/usr/bin/perl
use Modern::Perl;
use LWP::Simple;
LWP::Simple::getstore(q{http://www.aavinvc.com/_includes/blob.asp?Table=user&I=28&Width=100!&Height=100}, 'foo.jpeg');
… and it just worked. The file opened without a hitch in my default image viewer.
.asp is not an image format.
Here are two possible explanations:
The images are simple JPEGs generated by .asp files, so just use them as if they were .jpegs - rename them;
You are actually downloading a page that says "LOL I trol U - we don't allow images to be downloaded with Simple.getstore".
I'm trying to embed a GIF in a GitHub Pages page and have tried every way I found online, but none of them work. Here's my link:
https://github.com/jellyfishrui/Interactive-Programming-in-Python-Rice-University/blob/master/Week3/Stopwatch-the-Game/Instructions
The last line of code is the embedding code:
![StopWatch](https://github.com/jellyfishrui/Interactive-Programming-in-Python-Rice-University/blob/master/Week3/Stopwatch-the-Game/StopWatch.gif)
I also tried embedding the PNG (saved in other formats like JPEG as well) and changing the extension to upper/lower case, but none of that loaded the image. I also tried different browsers, and they all turned out the same.
I tried the absolute/relative path and neither worked.
That file has no file extension but you're trying to use Markdown for the image. Try renaming it to Instructions.md.
Also, make sure the casing of the file matches the file you've uploaded. Guessing what case to use isn't likely to work.
Recently I've integrated Google Drive with my iOS application. Everything works fine except .ppt files. Normally, if a file is a Drive file, I use downloadURL to download it. If the file belongs to Google Docs, I use one of the exportLinks (exactly the same as Alain described it here).
However all .ppt files (with "mimeType": "application/vnd.google-apps.presentation") which come from Google Docs are corrupted after being downloaded (I use an export link with exportFormat=pptx). The same file downloaded via web browser works fine.
I use the ASIHTTPRequest lib for downloading files (could that also be the reason for the corrupted .ppt?).
Any ideas why only ppt files cause problems?
I can already tell you that the lib you're using isn't the cause: I'm not using it and I have the same problem. It seems that the code received isn't 200 (if ($httpRequest->getResponseHttpCode() == 200)), as it shows me a specific error message I've asked to return in that case. Also, when I try to download a presentation as PDF or txt, it shows the same error.
It's not really an answer, but I'm also trying to understand why only presentations are causing problems.
EDIT: the code received is 302, if that helps...
EDIT 2: After some testing, I noticed that in the export link the first parameter is the file id and the second is the export format:
https://docs.google.com/feeds/download/presentations/Export?docId=filedid&exportFormat=pptx
But in the 302 code, I have this location:
https://docs.google.com/feeds/download/presentations/Export?exportFormat=pptx&id=fileid
Not only are the two parameters in a different order, but the parameter name is id and not docId.
When I take this URL, use it as the export link and then try to copy the file, it works: I get a 200 response and the contents of the file.
I hope it helps.
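If it helps to see the mechanics outside of iOS, here is a rough Perl sketch of the same experiment (the file id and OAuth token are placeholders, and the Authorization header format is an assumption):
use strict;
use warnings;
use LWP::UserAgent;

my $file_id = 'FILE_ID_HERE';                 # placeholder
my $token   = 'ACCESS_TOKEN_HERE';            # placeholder OAuth access token

# Don't follow redirects automatically, so the 302 and its Location stay visible
my $ua  = LWP::UserAgent->new( max_redirect => 0 );
my $url = "https://docs.google.com/feeds/download/presentations/Export?docId=$file_id&exportFormat=pptx";
my $res = $ua->get( $url, Authorization => "Bearer $token" );

if ( $res->code == 302 ) {
    # Location looks like ...Export?exportFormat=pptx&id=<fileid>
    my $location = $res->header('Location');
    my $pptx     = $ua->get( $location, Authorization => "Bearer $token" );
    print $pptx->code, "\n";                  # 200 here, and the body is the .pptx data
}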
I am trying to write a program in Perl to download images from a website. The problem is I would like to retain the same directory structure as the website it is being downloaded from.
E.g. if the image to be downloaded is at the URL below, then the program should create the directory "folder", then "download" inside it, and put the image inside the innermost folder.
http://www.example.com/folder/download/images.jpg
I am using LWP to download the images.
use LWP::Simple;
getstore($fileURL,$filename);
Look at wget or pavuk. You can also call them from within Perl. That's what I usually do.
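If you want to stay in pure Perl, here is a minimal sketch that rebuilds the URL's path locally with core modules (the example URL is the one from the question; where the local tree is rooted is up to you):
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use URI;
use File::Basename qw(dirname);
use File::Path qw(make_path);

my $url = 'http://www.example.com/folder/download/images.jpg';

# Turn the URL path (/folder/download/images.jpg) into a local path
my $local = '.' . URI->new($url)->path;

# Create the directory tree (./folder/download) if it isn't there yet
make_path( dirname($local) );

my $status = getstore( $url, $local );
print is_success($status) ? "saved $local\n" : "failed: HTTP $status\n";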
I have a website into which many PDFs are uploaded. What I want to do is to download all the PDFs present on the website. To do so I first need to provide a username and password to the website. After searching for some time I found the WWW::Mechanize package, which does this work. Now the problem is that I want to make a recursive search of the website, meaning that if a link does not point to a PDF, I should not simply discard it but should follow it and check whether the new page has links that contain PDFs. In this way I should exhaustively search the entire website to download all the uploaded PDFs. Any suggestion on how to do this?
I'd also go with wget, which runs on a variety of platforms.
If you want to do it in Perl, check CPAN for web crawlers.
You might want to decouple collecting PDF URLs from actually downloading them. Crawling is already lengthy processing, and it might be advantageous to be able to hand off downloading tasks to separate worker processes.
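One hedged way to do that split (the pdf_urls.txt file name is arbitrary): have the crawler only append each PDF URL to a text file, then run a separate worker like this over it.
#!/usr/bin/perl
# downloader worker: fetches the URLs the crawler has collected so far
use strict;
use warnings;
use LWP::Simple;

open my $in, '<', 'pdf_urls.txt' or die "no URL list yet: $!";
while ( my $url = <$in> ) {
    chomp $url;
    ( my $name = $url ) =~ s{.*/}{};           # crude local file name from the URL
    my $status = getstore( $url, $name );
    warn "failed $url (HTTP $status)\n" unless is_success($status);
}
close $in;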
You are right about using the WWW::Mechanize module. This module has a method, find_all_links(), where you can specify a regex to match the kind of links you want to grab or follow.
For example:
my $obj = WWW::Mechanize->new;
.......
.......
my @pdf_links = $obj->find_all_links( url_regex => qr/^.+?\.pdf/ );
This gives you all the links pointing to PDF files. Now iterate through these links and issue a get call on each of them.
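Building on that, here is a hedged sketch of the full loop (the start URL is a placeholder and any login step for your site is left out):
use strict;
use warnings;
use WWW::Mechanize;
use URI;
use File::Basename qw(basename);

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get('http://www.example.com/');                  # placeholder start page

my @pdf_links = $mech->find_all_links( url_regex => qr/^.+?\.pdf/ );

for my $link (@pdf_links) {
    my $url  = $link->url_abs;                          # absolute URL of the PDF
    my $file = basename( URI->new($url)->path );        # local file name
    $mech->get($url);
    $mech->save_content($file);                         # write the response body to disk
    print "saved $file\n";
}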
I suggest to try with wget. Something like:
wget -r --no-parent -A.pdf --user=LOGIN --password=PASSWORD http://www.server.com/dir/
I want to add PDF and Word versions of my resume to my portfolio page and make them downloadable. Does anyone have a simple script?
Add a link to the file and let the browser handle the download.
You may be over-complicating the problem. It's possible to use an href pointing to the location of the .pdf or .doc file; when a user clicks on this in their browser, they will generally be asked whether they would like to save or open the file, depending on their OS/configuration.
If this is still confusing, leave a comment and I'll explain anything you don't get.
Create the PDF. Upload it. Add a link.
Save yourself 30 minutes tossing around with PDFGEN code.
You will want to employ the Content-Disposition HTTP header to force the download; otherwise some browsers may recognize the common file extensions and try to open the file contents automatically. It will feel more professional if the link actually downloads the file instead of launching an app - important for a resume, I think.
Content-Disposition must be generated on the server side as far as I know.
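For instance, here is a minimal sketch assuming a Perl CGI backend (the question doesn't say what the server runs, so that setup is an assumption) with resume.pdf sitting next to the script:
#!/usr/bin/perl
# minimal CGI sketch: serve resume.pdf as a forced download
use strict;
use warnings;

my $file = 'resume.pdf';                       # assumed file name, next to the script
open my $fh, '<:raw', $file or die "can't open $file: $!";

# "attachment" asks the browser to save the file rather than open it inline
print "Content-Type: application/pdf\n";
print "Content-Disposition: attachment; filename=\"$file\"\n\n";

binmode STDOUT;
print while <$fh>;                             # stream the file to the client
close $fh;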
Another option:
Upload your resume to Google Docs.
Add a link to the file on your portfolio page, just as I do in the menu of my blog.
Use the Google Docs Viewer, passing it the URL of the PDF.