Wget downloading incomplete file from a URL [closed] - wget

I want to download a file on my Linux system from this URL:
http://download.oracle.com/otn-pub/java/jdk/7u51-b13/jre-7u51-linux-i586.tar.gz
and I am issuing the following command:
wget -U 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:16.0) Gecko/20100101 Firefox/16.0' http://download.oracle.com/otn-pub/java/jdk/7u51-b13/jre-7u51-linux-i586.tar.gz
The user agent passed to -U is copied from my browser. However, wget downloads a file of only 5.3 KB, whereas the entire file is 46.09 MB, and the downloaded file is corrupted.
How can I resolve this issue?

Looking at the output, you will see that Oracle denied the request with the following message:
In order to download products from Oracle Technology Network you must
agree to the OTN license terms.
Be sure that...
Your browser has "cookies" and JavaScript enabled.
You clicked on "Accept License" for the product you wish to download.
You attempt the download within 30 minutes of accepting the license.
Most probably you have to send some GET or POST value and/or keep session data.

The file isn't 'corrupt' exactly; if you go to that URL in a new browser session you'll see an error page saying 'In order to download products from Oracle Technology Network you must agree to the OTN license terms.'. That is the page you've downloaded - the file size of the page it redirects to is 5307 bytes.
Before you can get the file from the download page you have to accept the license agreement using the radio buttons. Doing so creates a cookie in your browser, and when you get the actual file that cookie is checked. wget doesn't have that cookie available.
You need to download directly from the site in a browser, or arrange for wget to send the license-acceptance cookie itself (for example via its --header or --load-cookies options). Some downloads used to have a wget script attached; it doesn't look like this one does, judging from what's on the download page.
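For what it's worth, the workaround that was widely circulated for Oracle's JDK/JRE downloads was to send the license-acceptance cookie by hand with wget's --header option. Whether the cookie name below is still honoured is an assumption on my part, so treat this as a sketch rather than a guaranteed fix:
# pretend the OTN license was accepted by supplying the cookie Oracle's site normally sets
wget --no-cookies \
  --header "Cookie: oraclelicense=accept-securebackup-cookie" \
  http://download.oracle.com/otn-pub/java/jdk/7u51-b13/jre-7u51-linux-i586.tar.gz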

Related

How can I download PDF files from Google Drive to my iOS app? [closed]

I want to build an app that lets the user download books, and those books are located on Google Drive.
My question is: how do I download PDF files from Google Drive or the internet?
Answer:
You can use the Google Drive Files: get method to directly download PDFs, or use the Files: export method to convert native Drive files to PDF.
More Information:
The official Google Drive documentation[1] only gives download examples for Node.js, Python, and Java, but the approach can be used from whichever language you like. It uses the Files: get[2] method for downloading files to the local machine.
The Files: export method[3] is used for converting files in Google Drive's native formats to another format. The MIME types of Google Drive files are well documented[4], and only certain file types can be exported to PDF[5].
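As a rough sketch of what those two calls look like at the HTTP level (FILE_ID and ACCESS_TOKEN are placeholders you would obtain from your own app's Drive authorization flow):
# Files: get with alt=media downloads a file that is already stored as a PDF
curl -H "Authorization: Bearer ACCESS_TOKEN" \
  "https://www.googleapis.com/drive/v3/files/FILE_ID?alt=media" -o book.pdf
# Files: export converts a native Google Doc to PDF on the way down
curl -H "Authorization: Bearer ACCESS_TOKEN" \
  "https://www.googleapis.com/drive/v3/files/FILE_ID/export?mimeType=application/pdf" -o book.pdf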
References:
[1] Download files | Google Drive API v3
[2] Google Drive API - Files: get
[3] Google Drive API - Files: export
[4] G Suite and Drive MIME Types
[5] G Suite documents and corresponding export MIME Types

Are local robots.txt files read by Facebook and Google? [closed]

I have a folder which is half public: the URL is not linked anywhere, only a few friends know it (and they will not link it), and it is cryptic enough that nobody lands there by accident.
However, the link is sent via Google Mail and Facebook messages. Is there a way to tell Facebook and Google in a local robots.txt file not to index the page?
If I add it to the "global" robots.txt file, then everybody who looks there will see that there might be something interesting in my /secret-folder-12argoe22v4, so I will not do that. But will Facebook / Google look at /secret-folder-12argoe22v4/robots.txt?
The content would be
User-agent: *
Disallow: .
or
User-agent: *
Disallow: /secret-folder-12argoe22v4/
As CBroe mentioned, a robots.txt file must always be at the top level of the site. If you put it in a subdirectory, it will be ignored. One way you can block a directory without publicly revealing its full name is to block just part of it, like this:
User-agent: *
Disallow: /secret
This will block any URL that starts with "/secret", including "/secret-folder-12argoe22v4/".
I should point out that the above is not a 100% reliable way to keep the files out of the search engines. It will keep the search engines from directly crawling the directory, but they can still show it in search results if some other site links to it. You may consider using robots meta tags instead, but even this won't prevent someone from directly following an off-site link. The only really reliable way to keep a directory private is to put it behind a password.
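If you do go the meta tag route, it is a single line in each page's <head>, or the equivalent X-Robots-Tag response header for non-HTML files; note that either one only discourages indexing, it does not restrict access:
<meta name="robots" content="noindex, nofollow">
X-Robots-Tag: noindex, nofollow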

I accidentally deleted a picture in my album on FB and it was uploaded years ago and I dont have a copy how can i get it back [closed]

I am not tech savvy, but I was trying to copy a picture that has been in my FB account album since 2007. It is of the first moments after my daughter was born, when they laid her in my arms after a Cesarean section. It is the only place I had the picture; no other copies, and the actual camera was lost. I was trying to do a "then and now" picture, her birth next to her first day of kindergarten, and now I have lost it by accidentally hitting delete instead of copy.
Is there ANY way to get it back? Does FB have a trash folder?
Beth
So, I know this question is probably off-topic but I can't help but feel bad for lost baby photos.
First of all, if anyone else tagged the photo, it would be linked to their account as well. If you ever emailed anyone the photo, you would likely have sent either the photo itself or a direct link to the Facebook page where it's stored. It is probably still stored there; you just (unfortunately) have no way to access it. So if you ever sent the link to anyone and can find it again, it will still lead to the page.
If you're someone who doesn't routinely purge your browsing history, there may be a chance that a copy of the photo was saved to your browser's temporary files folder while you were looking at the photo. If you use Internet Explorer 7 or 8, here's a guide. Otherwise just Google "<your browser name and version> temp folder location".
http://windows.microsoft.com/en-us/windows-vista/view-temporary-internet-files
Of course, neither of these may work. If so, sorry for your loss of data. Good luck!

Perl, Template-Toolkit and SEO [closed]

I'm not sure how to deploy best practice for SEO in a new project.
I'm building a CMS that will be used by a group of writers to post news articles to a website. I'm developing the site using Perl and Template-Toolkit (TT2). I've also embedded an open source editor (TinyMCE) in the system that will be used for content creation.
I was planning to save the news article content to the DB as text - though I could also save it to flat files and then save the corresponding file paths to the DB.
From an SEO standpoint, I think it would be very helpful if this content could be exposed to search engines. There will be lots of links and images that could help to improve rankings.
If I put this content in the DB, it won't be discoverable ... right?
If I save this content in template files (content.tt) will the .tt files be recognized by search engines?
Note that the template files (.tt) will be displayed as content via a TT2 wrapper.
I'm also planning to generate a Google XML Sitemap using the Sitemap 0.90 standard. Perhaps that is sufficient? Or should I try to make the actual content discoverable?
Thanks ... I'm just not sure how the Google dance deals with .tt files and such.
If I put this content in the DB, it won't be discoverable ... right?
The database is part of your backend. Google cares about what you expose to the front end.
If I save this content in template files (content.tt) will the .tt files be recognized by search engines?
Your template files are also part of your backend.
Note that the template files (.tt) will be displayed as content via a TT2 wrapper.
The wrapper takes the template files and the data in the database and produces HTML pages. The HTML pages are what Google sees.
Link to those pages.
just not sure how the google dance deals with .tt files and such
Google doesn't care at all about .tt files and the like. Google cares about URLs and the resources that they represent.
When Google is given the URL of the front page of your site, it will visit that URL. Your site will respond to that request by generating the front page, presumably in HTML. Google will then parse that HTML and extract any URLs it finds. It will then visit all of those URLs and the process will repeat. Many times.
The back-end technologies don't matter at all. What matters is that your site is made up of well-constructed HTML pages with meaningful links between them.
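To make that concrete, here is the kind of front end Google actually sees; the URLs are made up for illustration, and the sitemap line just reflects the Sitemap 0.90 plan from the question:
GET /              -> index.tt rendered through the TT2 wrapper -> HTML page linking to the articles
GET /articles/42   -> article.tt filled from the article's DB row -> HTML page Google can crawl
GET /sitemap.xml   -> generated Sitemap 0.90 file listing the article URLs above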

REST design for file uploads [closed]

I want to create a REST API for a file upload service that allows a user to:
Open a session
Upload a bunch of files
Close the session
And then later, come back and do things with the files they uploaded in a previous session.
To facilitate dealing with data about each file and dealing with the content of the file itself, this is the URI scheme I am thinking of using:
/sessions/
/sessions/3
/sessions/3/files
/sessions/3/files/5
/sessions/3/files/5/content
/sessions/3/files/5/metadata
This will allow the file metadata to be dealt with separately from the file content. In this case, only a GET is allowed on the file content and file metadata, and to update either one, a new file has to be PUT.
Does this make sense? If not, why and how could it be better?
Why do you need sessions? Is it for authentication and authorization reasons? If so, I would use HTTP Basic over SSL, or Digest. There is then no session to start or end, because HTTP is stateless and the security headers are sent with each request.
My suggestion for the upload resource would be to map it directly onto a private filesystem:
# returns all files and subdirs of root dir
GET /{userId}/files
GET /{userId}/files/file1
GET /{userId}/files/dir1
# create or update file
PUT /{userId}/files/file2
When uploading file content, you would then use a multipart content type.
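A hypothetical multipart request for that last PUT could look like this; the boundary, field names and metadata shape are purely illustrative:
PUT /{userId}/files/file2 HTTP/1.1
Content-Type: multipart/form-data; boundary=XYZ

--XYZ
Content-Disposition: form-data; name="metadata"
Content-Type: application/json

{ "title": "file2", "createdBy": "..." }
--XYZ
Content-Disposition: form-data; name="content"; filename="file2"
Content-Type: application/octet-stream

(binary file bytes)
--XYZ--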
Revised answer after comment
I would achieve the separation of file content and metadata you want by introducing a link (to the file content) inside the upload payload. It keeps the resource structure simple.
Representation of the 'upload' resource:
{
"upload-content" : "http://storage.org/2a34cafa" ,
"metadata" : "{ .... }"
}
Resource actions:
# upload file resource
POST /files
-> HTTP 201 CREATED
-> target location is shown by the HTTP header 'Location: /files/2a34cafa'
# under a session, '/uploads' feels a bit more natural than '/files' as a name
POST /sessions/{sessionId}/uploads
-> HTTP 201 CREATED
-> HTTP header: 'Location: /sessions/{sessionId}/uploads/1'
-> also returning payload
# updating an upload (e.g. its metadata)
PUT /sessions/{sessionId}/uploads/1
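Putting the revised design together, a full client interaction might look roughly like the following; the host name is a placeholder and the JSON follows the representation sketched above:
# 1. push the raw bytes, get back a content location
curl -X POST --data-binary @report.pdf http://api.example.org/files
# -> 201 Created, Location: /files/2a34cafa

# 2. create the upload resource inside the session, linking to that content
curl -X POST -H "Content-Type: application/json" \
  -d '{ "upload-content": "http://api.example.org/files/2a34cafa", "metadata": { "name": "report.pdf" } }' \
  http://api.example.org/sessions/3/uploads
# -> 201 Created, Location: /sessions/3/uploads/1

# 3. later, update the upload's metadata in place
curl -X PUT -H "Content-Type: application/json" \
  -d '{ "metadata": { "name": "final-report.pdf" } }' \
  http://api.example.org/sessions/3/uploads/1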