Translate a web page click to a URL for wget

I use a web page that a client uses to send me large files.
It's in the Far East, so a download often takes dozens of attempts to succeed.
I'd like to script this using bash with wget (or curl) to retry until it succeeds.
The web page has a "Download" button with this:
<a class="button file-node-download" href="#">
Download
</a>
The web page address is https://aaa.bbbbbb.com/p/DWk8R34Qgj0Yof4D.
The filename is the_file.zip.
I click on the button with the href='#' and the browser starts the download.
When I try this in bash all I get is the web page HTML:
wget 'https://aaa.bbbbbb.com/p/DWk8R34Qgj0Yof4D#'
wget 'https://aaa.bbbbbb.com/p/DWk8R34Qgj0Yof4D%23'
wget 'https://aaa.bbbbbb.com/p/DWk8R34Qgj0Yof4D%23the_file.zip'
wget 'https://aaa.bbbbbb.com/p/DWk8R34Qgj0Yof4D%23/the_file.zip'
I can't see any JavaScript on the page that might make this work.

It appears that an event listener is attached to the <a> element. You'd need to find it and see what it actually does before you can build a working wget command from it.
Have a look at the network tab in your browser's development console. You should be able to see what network request is eventually triggered by the click.
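Once the network tab has revealed the real download request, the retry part can be scripted. Below is a minimal sketch of a generic retry wrapper; the function name retry, the attempt limit, the delay, and the DOWNLOAD_URL variable are all assumptions made for illustration, and the real URL has to be whatever request you actually observe in the browser.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  # "$@" is the command to run; the loop ends as soon as it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Hypothetical usage: DOWNLOAD_URL stands for the request seen in the network tab.
# --continue asks wget to resume a partially downloaded file instead of restarting.
# retry 50 10 wget --continue "$DOWNLOAD_URL"
```

If the download turns out to need cookies or a POST body, those would have to be added to the wget (or curl) command itself; the wrapper only handles the retrying.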

Related

How do I access the data shown on the sources tab in Chrome?

So I am a little bit stuck here.
I am doing some scraping with Puppeteer, but at some point I have to download a file. The problem is that the file is "generated" after a click on a button. I know how to click the button in Puppeteer, and how to catch the page's requests and responses, but none of them is of use.
So I have a button on a previous page, this is the button after inspecting it.
<button
id="ReporteOpinionForm:botonConsultar"
name="ReporteOpinionForm:botonConsultar"
class="ui-button ui-widget ui-state-default ui-corner-all ui-button-text-only ui-button"
type="submit">
<span class="ui-button-text ui-c">
Consultar
</span>
</button>
By inspecting the whole page I can see that it uses PrimeFaces and JSF.
Once I click it, an XMLHttpRequest is sent to an XHTML endpoint.
The response (on the right) is nothing more than an ID of some sort. The method is POST and the body is FormData, which I think contains nothing important.
After a few seconds, a new page is loaded with the PDF embedded
But after inspecting the page, it only has an empty body
But if I go to the sources tab on the devtools I can see this
The content is a base64-encoded string which, when decoded to a PDF file, yields the file shown in the viewer. So the primary objective here is to download the file. I have tried several things:
• Catching the request and the response and copying the response, but the response to the XMLHttpRequest is different and is not the base64 string.
• Moving the mouse to the PDF bar that appears at the top of the page and clicking the download button, but that doesn't work either.
• Printing the page to a PDF, but the script breaks in headless mode when the button that generates the PDF is clicked.
I am lost and I don't know what to do or what I am missing or not seeing
Any help would be appreciated, thanks

Googlebot vs "Google Plus +1 Share Button bot"?

Site Setup
I have a fully client-side one page webapp that is dynamically updated and routed on the client side. I redirect any #! requests to a headless server that renders the request with javascript executed and returns the final html to the bot. The head of the site also contains:
<meta name="fragment" content="!">
Fetch as Google works
Using the Fetch as Google webmaster tool, in the Fetch Status page, I can see that the jQuery I used to update the og:title, og:image, and og:description was executed and the default values replaced. Everything looks good, and if I mouseover the URL, the screenshot is correct.
However, with the Google Plus button, no matter what values og:title, og:image, and og:description tags are updated to, the share pop-up always uses the default/initial values.
Attempted use
I call this after each time the site content is updated, rerouted, and og meta content updated.
gapi.plusone.render("plusone-div");
I was assuming that if this approach works for the Googlebot, it should also work for the +1 button. Is there a difference between the Googlebot and whatever is used by +1 to retrieve the site metadata?
edit:
Passing a url containing the #! results in a 'site not found'
gapi.plusone.render("plusone-div", {"href" : 'http://www.site.com/#!city/Paris'});
The Google crawler does not render the snippet when the +1 button is rendered but rather when a user clicks the +1 button (or share button). What you should try is to determine what your server is sending to the Googlebot during this user initiated and asynchronous load by the Google crawler.
You can emulate this by using the following cURL command:
curl -A "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)" http://myurl.com/path/to/page
You can output that command to a file by adding -o testoutput.html to the command.
This will give you an idea of what the Google crawler sees when it encounters your page. The structured data testing tool can also give you hints.
What you'll likely see is that unless you prepare your snippet in a static file or on the server side, you're not going to get the snippet that you desire.
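To check which Open Graph values actually reached the crawler, you can pull the og: tags out of the saved file. This is only a sketch: the helper name og_tags is hypothetical, and it assumes each tag sits on a single line and is written as property="og:..." with double quotes, which may not match your markup.

```shell
#!/usr/bin/env bash
# Print the Open Graph <meta> tags found in a saved HTML file.
# Usage: og_tags FILE
og_tags() {
  # -o prints only the matching part of each line; -i ignores case.
  grep -o -i '<meta[^>]*property="og:[^"]*"[^>]*>' "$1"
}

# Hypothetical usage, with the file written by the curl command's -o option:
# og_tags testoutput.html
```

If the output still shows the default og:title, og:image, and og:description, the jQuery updates were not applied to the HTML that this user agent received.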
If you can provide real URLs to test, I can probably provide more specific feedback.
Google+ fetches pages using the _escaped_fragment_ query parameter, but without the equals sign.
So, it would fetch http://www.site.com/?_escaped_fragment and NOT https://www.site.com?_escaped_fragment_=
The Google Search crawler still uses the fragment with the equals sign; the behavior above applies only to the Google+ crawler.

wget download page which is a post-request

I want to download some pages of results with wget. I have found that what changes each time I click the next-page button is the last number, as shown here:
http://www.sth.com/friends/ --> first page
http://www.sth.com/friends/#quickfinder_member_friends_1=2 --> second page
http://www.sth.com/friends/#quickfinder_member_friends_1=3 --> third page
When I try to download the page with wget, it always saves the first page. From Mozilla's web console I found that it is a POST request and not a GET request.
Does anyone know how I can use wget to download these pages?
Thank you.

Facebook “Like Box” and JQuery

I'm attempting to add a FB "Like Box" to a website I'm developing. Not too familiar with Facebook apps, but so far I've gone the non-IFRAME route, using the FB SDK script include.
I'm fairly certain I've got almost everything setup correctly. In fact, I see the widget appear when I visit the page UNCACHED (i.e. in FF, I hit CTRL+SHIFT+R to reload all content to avoid loading from cache). Once I revisit the site, or move around within the site by clicking links, the content does not reappear.
I'm wondering if it's an issue with a) the channel.php file, or b) the app's interaction with my use of jQuery. The channel.php file is verbatim what is provided by Facebook (using PHP's caching mechanism).
Here's the site currently: http://www.morningfatty.com/demo - It might be easier to list this rather than post several code snippets.
I went to your website and didn't see the like box. I checked the HTML code and it all appeared fine. The div looked like <div data-header="false" data-stream="false" data-border-color="#40ADAD" data-show-faces="true" data-colorscheme="light" data-width="192" data-href="http://www.facebook.com/morningfatty" class="fb-like-box"></div>
I went to https://developers.facebook.com/docs/reference/plugins/like-box/ and tried your link http://www.facebook.com/morningfatty and lo-and-behold the like box didn't display there.
I tried going directly to http://www.facebook.com/morningfatty and it redirected me to http://www.facebook.com/MorningFatty. I noticed the change of case in the name. So I went back and tried http://www.facebook.com/MorningFatty in the like-box and it worked!
I believe that your page will work once you update the casing in the URL. :)

The popup window ("flyout") of a Like button doesn't show up in a Chrome extension

I am trying to put a Like button on a page in a Chrome extension that I've developed.
I use the simple XFBML version:
<fb:like href="http://www.mydomain.com/page?param=1&otherparam=2" send="false" layout="standard" width="400" show_faces="false" font="arial" ref="chrome_notification"></fb:like>
And of course I use the JavaScript SDK.
When I upload this page to my web server, everything works just fine.
But when I run this page within my Chrome extension, the Like button itself works, but the comment popup doesn't show.
In addition, I get these error messages in the console:
Unsafe JavaScript attempt to access frame with URL chrome-extension://eindnjdghfmigkecgibjclhdnadlnbhm/../mypage.html from frame with URL https://www.facebook.com/plugins/like.php?api_key=&channel_url=http%3A%2F%2Fstatic.ak.fbcdn.net%2Fconnect%2Fxd_proxy.php%3Fversion%3D3%23cb%3Df3d383d278%26origin%3Dchrome-extension%253A%252F%252Feindnjdghfmigkecgibjclhdnadlnbhm%252Ff44dd2768%26relation%3Dparent.parent%26transport%3Dpostmessage&extended_social_context=false&font=arial&href=http%3A%2F%2Fwww.mydomain.com%2F%3Fparam%3D1%26otherparam%3D2&layout=button_count&locale=en_US&node_type=link&ref=chrome_notification&sdk=joey&send=false&show_faces=false&width=400. Domains, protocols and ports must match.
and -
Unsafe JavaScript attempt to access frame with URL https://www.facebook.com/plugins/like.php?api_key=158698534219579&channel_url=http%3A%2F%2Fstatic.ak.fbcdn.net%2Fconnect%2Fxd_proxy.php%3Fversion%3D3%23cb%3Df3d383d278%26origin%3Dchrome-extension%253A%252F%252Feindnjdghfmigkecgibjclhdnadlnbhm%252Ff44dd2768%26relation%3Dparent.parent%26transport%3Dpostmessage&extended_social_context=false&font=arial&href=http%3A%2F%2Fwww.mydomain.com%2F%3Fparam%3D1%26otherparam%3D2&layout=button_count&locale=en_US&node_type=link&ref=chrome_notification&sdk=joey&send=false&show_faces=false&width=400 from frame with URL http://www.facebook.com/plugins/comment_widget_shell.php?api_key=&locale=en_US&master_frame_name=f38cd100f8&sdk=joey. Domains, protocols and ports must match.
Can anyone help me find a solution for this?
Could it possibly be related to this known Facebook bug?
http://developers.facebook.com/bugs/293075054049400
Basically, one gets this behavior if secure browsing is enabled on the Facebook user's account.