Wget to download data from a JavaScript-authenticated webpage - wget

I am trying to download data from a webpage that uses the POST method and uses JavaScript to pass the username/password to a function called login.do.
Is there a way to connect and download the data from the webpage?
Please let me know if you need any more information about the situation.
I tried using wget --save-cookies cookies.txt http://x.x.x.x:80
but it doesn't get past the connecting stage.

Have a look at this question, which is similar to yours. Basically, you will probably have to inspect the POST request and make one yourself using wget that contains the necessary login information.
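For example, you could first POST the credentials with wget and save the session cookie, then reuse that cookie for the actual download. This is only a rough sketch under the assumption that the form posts to a login.do URL with fields named username and password; the real endpoint, field names, and data path should be read from the browser's network inspector:

# log in and keep the session cookie (field names and paths are assumptions)
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'username=USER&password=PASS' \
     http://x.x.x.x:80/login.do
# reuse the saved cookie for the page you actually want
wget --load-cookies cookies.txt http://x.x.x.x:80/path/to/data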

Related

Using wget to download page within Moodle

I understand that this might be too specific, but I am still quite new and having some difficulty figuring this out. So, I am hoping someone can offer up a clear command which can achieve my goal.
My goal is to download a page from Moodle. Now, this is Moodle 3.9, but I don't think this matters. The page is a wiki which is part of Moodle. So, there are child pages, links, images, etc. Now, each student gets their own wiki and it is available by URL. However, one must be logged in to access the URL or else a login page will show up.
I did review this question, which seemed like it would help. It even has an answer here which seems specific to Moodle.
By investigating each of the answers, I learned how to get the cookies from Firefox. So, I tried logging into Moodle in my browser, then getting the cookie for the session and using that in wget, according to the Moodle answer cited above.
Everything I try results in a response in Terminal like:
Redirecting output to 'wget-log.4'
I then found out where wget-log files are stored and checked the contents, which seemed to be the login page from Moodle. While I am quite sure I got the cookie correct (copy-and-paste), I am not sure I got everything else correct.
Try as I might (for a couple of hours), I could not get any of the answers to work.
My best guess for the command is as follows:
wget --no-cookies --header "Cookie: MoodleSession=3q00000000009nr8i79uehukm" https://myfancydomainname.edu/mod/ouwiki/view.php?id=7&user=160456
It just redirects me to the wget-log file (which has the Moodle login page).
I should not need any POST data since, once logged in via my browser, I can simply paste this URL and it will take me to the page in question.
Any ideas what the command might be? If not, any idea how I could get some pretty clear (step-by-step) instructions how to figure it out?
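One likely culprit, judging from the symptoms: the unquoted & in the URL makes the shell treat everything after it as a separate command and run wget in the background, which is exactly when wget prints "Redirecting output to 'wget-log...'", and the user parameter never reaches the server. A sketch of the same command with the URL quoted, assuming the MoodleSession cookie is still valid:

wget --no-cookies \
     --header "Cookie: MoodleSession=3q00000000009nr8i79uehukm" \
     "https://myfancydomainname.edu/mod/ouwiki/view.php?id=7&user=160456"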

wget vbulletin forum attachments

I want to download attachments from a vBulletin forum with login. It always gives me an error about unspecified length. In the thread itself only thumbnails are displayed, but I want the full resolution, which is in the attachments. I am targeting only the *.jpg files, not the rest of the forum.
The url looks something like this: http://www.page.com/attachment.php?attachmentid=1234567&d=1234567890
(I think both numbers, "attachmentid" and "d", are random and independent of each other.)
When I try mirroring the whole page, everything works except the attachments (only the thumbnails are downloaded).
Any ideas how I can solve this issue?
Cheers
PS: Httracker brings me to the same problem, alternative solutions welcome as well :)
Since you mentioned downloading attachments from a vBulletin forum with login, make sure you have done the login part first. The steps are as follows.
1) Log in using wget and store the cookies in a text file. The parameter is --save-cookies.
2) Now download the attachment with another wget call, loading the cookies.txt from step 1. The parameter is --load-cookies.
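Roughly, the two calls could look like the sketch below. The login URL and form field names are assumptions (vBulletin versions differ), so inspect the actual login form; the attachment URL is the one from the question:

# step 1: log in and save the session cookies (field names are placeholders)
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data 'vb_login_username=USER&vb_login_password=PASS&do=login' \
     "http://www.page.com/login.php?do=login"
# step 2: fetch the attachment using the saved cookies
wget --load-cookies cookies.txt \
     "http://www.page.com/attachment.php?attachmentid=1234567&d=1234567890"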
More details about these wget parameters can be found here.
Both attachmentid and d cannot be random at the same time; otherwise the forum could not tell which attachment you are asking for.

Facebook like link data extractor

I'm seeking a lib that takes a URL and returns useful information like:
Title
Description
List of images
Anything around?
Embed has a nice API for exactly this purpose. link
Try out the REST API links.preview method - https://developers.facebook.com/docs/reference/rest/links.preview/
You can also test out a few URLs to see if this is what you are looking for.
This one made my day: http://www.embedify.me/
Yes. You can send an AJAX request to a PHP file which gets the contents of the URL using the file_get_contents() function and returns it to the AJAX call. Then you can extract whatever you want from that response.
For an explanation, live demo and script download, follow this link: http://www.voidtricks.com/extracting-url-data-like-facebook/
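If you just want to see the idea from the command line rather than PHP, a rough sketch is to fetch the page and grep out the title and the Open Graph tags. This assumes the page declares og: meta tags and that each tag sits on one line with property before content, which real pages do not guarantee; a proper HTML parser is more robust:

# fetch the page, then pull out title, description and image candidates
curl -sL "http://example.com/article" -o page.html
grep -o '<title>[^<]*</title>' page.html
grep -o '<meta property="og:description" content="[^"]*"' page.html
grep -o '<meta property="og:image" content="[^"]*"' page.html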

Upload a hosted image with Facebook's Graph API by url?

So, I'm using the Facebook Graph API to upload a photo. Using curl, it goes something like this:
curl -F 'access_token={some access token}' \
     -F 'source=@/some/file/path/foo.png' \
     -F 'message=This is a test of programmatic image uploading' \
     https://graph.facebook.com/me/photos
Now, this works fine if I have the file on the machine I'm making the request from. The issue is that the system I'm working on gets the image as a URL (say, "http://example.com/foo.png"). I'd rather not download the image from example.com to my server just to upload it to Facebook, since I have no need to keep it other than that. Is there any way I can just pass the URL to Facebook, or is this impossible?
(-F 'source=@http://example.com/foo.png' does not work)
In the past, we've simply downloaded the file locally to the server, then handled the upload and unlinked it. This way we're also able to be sure the asset was actually available (servers/connections flaking out) before uploading it in the first place. I don't believe you can initiate an upload and have the media come from a third party (I may be wrong, though).
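As a rough sketch of that download-upload-unlink flow, reusing the curl upload from the question (the temporary path, URL, token and message are just placeholders):

# fetch the remote image to a temporary file
curl -s -o /tmp/foo.png "http://example.com/foo.png"
# upload the local copy to the Graph API, then delete it
curl -F 'access_token={some access token}' \
     -F 'source=@/tmp/foo.png' \
     -F 'message=uploaded from a temporary local copy' \
     https://graph.facebook.com/me/photos
rm /tmp/foo.png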

iPhone web services NSURL

Hi, I am working on an application which takes data from a website and displays it in a table. I have been successful in making something like an RSS feed (modeled on a Twitter-style feed, so I think it uses an XML parser), but now I want to get data from a website which doesn't have an RSS feed. I just want to get the titles from the webpage. Any suggestions on how to do this without the XMLParser?
Thanks
I think the best way is to create a PHP/ASP/... page on your server that scrapes the data from the remote website.
Then, in that page, you can use cURL to scrape the data.
See here.
Next, you return the data in the format you want (XML/JSON/etc.).
Finally, you can easily call that script from your code.
On the other hand, be careful about what you scrape: scraping other sites' content can be legally problematic or violate their terms of service, and Apple can reject your app because of that.
There is a nice post talking about it.
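Just to make the cURL step concrete, here is a command-line sketch of pulling titles out of the remote HTML (the URL and the <h2> pattern are assumptions about the target page; inside the server script you would use a real HTML parser rather than grep):

curl -s "http://example.com/" | grep -o '<title>[^<]*</title>'
curl -s "http://example.com/" | grep -o '<h2[^>]*>[^<]*</h2>'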