Tell if a Facebook page redirects

I've noticed that some Facebook pages redirect. For example, the NOFX band page (http://www.facebook.com/pages/NOFX/104336479603261) redirects to their official page (https://www.facebook.com/pages/NOFX-Official-Page/180985116576?rf=104336479603261). What I'm curious about is whether the API can tell me that a page does this. https://graph.facebook.com/104336479603261 doesn't seem to show anything about the redirect, but perhaps there's another way to find it.
Edit:
Solutions that don't use the API are fine.
Edit 2:
Solved. Here is the code I used, in case anyone is interested. It is mostly copied from "How can I determine if a URL redirects in PHP?":
function getURL($URL)
{
    $ch = curl_init($URL);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow any redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the output instead of printing it
    // Without a browser-like user agent, Facebook serves an "unsupported browser" page.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3');
    curl_exec($ch);
    // The effective URL is the final URL after all redirects have been followed.
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $finalUrl;
}
The only thing really worth noting is that I had to add a user agent so I wouldn't get sent to an unsupported-browser page.

Edit: Solutions that don't use the API are fine.
Then just make an HTTP HEAD request for the page URL and see if you get a 301 Moved Permanently status code with a Location header in the response. (Checking only the status code should be enough if you just want to know whether the page redirects; if you also want to know where it redirects to, check the Location header as well.)
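As a rough illustration of that approach, here is a minimal PHP cURL sketch (not part of the original answer; the pageRedirects function name is made up, and a user agent is set because of the unsupported-browser issue mentioned above):
// Sketch: send a HEAD request, then inspect the status code and Location header.
function pageRedirects($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    $headers = curl_exec($ch);
    $status  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    $location = null;
    if (preg_match('/^Location:\s*(\S+)/mi', $headers, $m)) {
        $location = $m[1];
    }
    return array($status >= 300 && $status < 400, $location);
}
list($redirects, $target) = pageRedirects('http://www.facebook.com/pages/NOFX/104336479603261');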

Verifying login to website via powershell - do I need to scrape regardless?

I need to test programmatically whether a set of default credentials works for a site.
Right now I have two options at my disposal. The app will log in if I pass a URL like https://this.com/app/loginprocess.aspx?username=user1&password=pass1.
I can also use POST and Invoke-WebRequest to sign into the site. Something like:
$UserCredentials = Get-Credential
$InvokeResponse = Invoke-WebRequest $Url -SessionVariable MyInvokeSession
$form = $InvokeResponse.Forms
$form.Fields['Username'] = $UserCredentials.UserName
$form.Fields['Password'] = $UserCredentials.GetNetworkCredential().Password
$InvokeResponse = Invoke-WebRequest -Uri ($Url + $form.Action) -WebSession $MyInvokeSession -Method POST -Body $form.Fields
My first question is: no matter which of the two methods I use, I must scrape the web page to check whether the login was successful, correct?
My second question is: is scraping the page the only way to check whether the login worked? If so, what is a good way to scrape for a constant value?
Thanks.
The URLs are always structured the same, and the form is always the same.
It depends on how the site was coded, but every HTTP request returns a status code, and generally you should be able to use it to validate a login without having to look at the response body itself.
Take a look at the value of $InvokeResponse.StatusCode after making the request. Try a correct and an incorrect login: do they return the same status code or different ones? If you get a 200 (or similar) for the valid attempt and a 403 for the invalid one, those are standard success/failure codes for a login and you can use them in your logic. Another way to test whether a login worked is to log in and then request a page that only works for logged-in users, and run the same test on that page's response code. If both tests give the same code for success and failure, you will need to use the content to decide whether you are logged in.
You can check the status code like this:
$r = Invoke-WebRequest -Uri https://www.google.com/
Write-Host "Response code $($r.StatusCode)"
if ($r.StatusCode -lt 400) {
    Write-Host "Probably OK"
} else {
    Write-Host "Something went wrong"
}
and I get the output:
Response code 200
Probably OK
FYI: codes starting with 2 are success (200, 201, and 204 are common ones), codes starting with 3 are redirects (301, 304, etc.), codes starting with 4 are client errors (404 Not Found, 403 Forbidden, etc.), and codes starting with 5 mean something went wrong server-side. BUT it is ultimately up to whoever programmed the page which code gets sent back for any given input, so test thoroughly :-)
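One caveat worth adding (an assumption about your setup, not something from the question): depending on the PowerShell version and parameters, Invoke-WebRequest treats 4xx/5xx responses as errors and throws instead of returning a response object, so the failed-login case may never reach a plain status-code check. A sketch of reading the code anyway, reusing $Url, $form and $MyInvokeSession from the question:
try {
    $r = Invoke-WebRequest -Uri ($Url + $form.Action) -WebSession $MyInvokeSession -Method POST -Body $form.Fields
    $statusCode = $r.StatusCode
} catch {
    # The response that carried the error code is still available on the exception.
    $statusCode = [int]$_.Exception.Response.StatusCode
}
Write-Host "Login attempt returned $statusCode"
if ($statusCode -eq 200) { Write-Host "Looks like the login succeeded" }
elseif ($statusCode -eq 403) { Write-Host "Looks like the login failed" }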
Side note: use POST, not GET, when sending credentials. By default most servers write GET request parameters to their log files in plain text, but they do not do so for POST bodies. You might want to change that password just in case.

Simple API request not working - 403 error

I am trying to run a simple API request from a Perl script, but it does not seem to work. The same request, copied into a web browser, works without any problem.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $query = 'http://checkdnd.com/api/check_dnd_no_api.php?mobiles=9944384761';
my $result = get($query);
print $result."\n";
When I use getprint($query), it gives a 403 error.
If you take a look at the body of the response (i.e. not only at the 403 status code), you will find:
The owner of this website (checkdnd.com) has banned your access based on your browser's signature (2f988642c0f02798-ua22).
This means the site is blocking the client, probably because it looks too much like a non-browser. For this site, a simple fix is to include a User-Agent header. The following works for me:
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->default_header('User-Agent' => 'Mozilla/5.0');   # pretend to be a browser
my $resp = $ua->get('http://checkdnd.com/api/check_dnd_no_api.php?mobiles=9944384761');
my $result = $resp->decoded_content;
The site in question seems to be served by Cloudflare, which has something it calls a "Browser Integrity Check". From the support page for this feature:
... looks for common HTTP headers abused most commonly by spammers and denies access to your page. It will also challenge visitors that do not have a user agent or a non standard user agent (also commonly used by abuse bots, crawlers or visitors).

Use perl REST::Client and multi-part form to post image to Confluence

How can I attach an image to an existing Confluence page, using their latest REST API, from Perl via the REST::Client module?
This may be a Perl question (I may be posting the image incorrectly), or a Confluence question (their API docs may be missing a required detail, such as a required header that curl adds quietly), or both.
I have established a connection object, $client, using the REST::Client module. I have verified that $client is a valid connection by performing a $client->GET to a known Confluence page ID, which correctly returns the page's details.
I attempt to upload an image, using:
$headers = {
    Accept            => 'application/json',
    Authorization     => 'Basic ' . encode_base64($user . ':' . $password),
    X_Atlassian_Token => 'no-check',
    Content_Type      => 'form-data',
    Content           => [ file => ["file1.jpg"] ],
};
$client->POST('rest/api/content/44073843/child/attachment', $headers);
... and the image doesn't appear on the attachments list.
I've packet-sniffed the browser whilst uploading an image there, only to find that it uses the prototype API that is being deprecated. I'd hoped that I could just stand on Atlassian's shoulders in terms of seeing exactly what their post stream looks like, and replicating that... but I don't want to use the API that's being deprecated, since they recommend against it.
The curl example of calling the Confluence API to attach a file that they give at https://developer.atlassian.com/confdev/confluence-rest-api/confluence-rest-api-examples, when my host, filename, and page ID are substituted in, does post the attachment.
I formerly specified comment in my array of Content items, but removed that during debugging to simplify things since the documentation said it was optional.
One thing I'm unclear about is getting the contents of the file into the post stream. In the curl command, the @ accomplishes that. In REST::Client, I'm not sure whether I have to do something more than I did to make that happen.
I can't packet-sniff the outgoing traffic because our server only allows https, and I don't know how (or if it's even possible) to set the REST::Client module or one of its underlying modules to record the SSL info to a log file so that Wireshark can pick it up and decode the resulting TLS traffic, the way one can with the environment variable for Chrome or Firefox. I also don't have access to server logs. So I don't know what the request I'm actually sending looks like (if I did, I could probably say, "OK, it looks wrong right HERE" or "But it looks right?!"). I am therefore unfortunately at a loss as to how to debug it so blindly.
A similar question about posting multipart forms using REST::Client was asked by someone else, more generically, back in April of last year, but received no responses. I'm hoping that since mine is more specific, someone can tell me what I might be doing wrong.
Any help would be appreciated.
You should be capturing your POST response like so:
my $response = $client->POST('rest/api/content/44073843/child/attachment', $headers);
print Dumper $response;
Also, the URL you are using in the POST call is incomplete and won't work; you need the full URL.
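For what it's worth, here is a rough sketch of how those two points might fit together. It is an illustration, not a verified fix: the Confluence host and credentials are placeholders, and HTTP::Request::Common is used only as one possible way to build the multipart body.
use REST::Client;
use HTTP::Request::Common qw(POST);
use MIME::Base64 qw(encode_base64);
use Data::Dumper;

my ($user, $password) = ('admin', 'secret');            # placeholders
my $base = 'https://confluence.example.com';            # placeholder host

# Let HTTP::Request::Common assemble the multipart/form-data body and boundary;
# this request object is used only as a container for that body.
my $form = POST("$base/rest/api/content/44073843/child/attachment",
    Content_Type => 'form-data',
    Content      => [ file => ['file1.jpg'] ],
);

my $client = REST::Client->new();
$client->POST(
    "$base/rest/api/content/44073843/child/attachment",  # full URL, not a relative one
    $form->content,
    {
        'Accept'            => 'application/json',
        'Authorization'     => 'Basic ' . encode_base64("$user:$password", ''),
        'X-Atlassian-Token' => 'no-check',
        'Content-Type'      => scalar $form->header('Content-Type'),
    },
);
print Dumper($client->responseCode(), $client->responseContent());
If you also need to see what is actually going out, REST::Client's getUseragent() returns the underlying LWP::UserAgent, which supports handlers for dumping the outgoing request.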

HTTP error: 403 while parsing a website

So I'm trying to parse this website, http://dl.acm.org/dl.cfm. The website doesn't allow web scrapers, so I get an HTTP Error 403: Forbidden.
I'm using Python, so I tried mechanize to fill in the form (to automate the form filling or a button click), but again I got the same error.
I can't even open the HTML page using the urllib2.urlopen() function; it gives the same error.
Can anyone help me with this problem?
If the website doesn't allow web scrapers/bots, you shouldn't be using bots on the site to begin with.
But to answer your question, I suspect the website is blocking urllib's default user-agent. You're probably going to have to spoof the user-agent to a known browser by crafting your own request.
import urllib2

headers = {"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"}
req = urllib2.Request("http://dl.acm.org/dl.cfm", headers=headers)
response = urllib2.urlopen(req)
EDIT: I tested this and it works. The site is actively blocking based on user agents to stop badly made bots from ignoring robots.txt.

file_get_contents not working with facebook graph api

file_get_contents is not working with the following URL (failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request):
$token_url = "https://graph.facebook.com/oauth/access_token?client_id=235326466577139&redirect_uri=http%3A%2F%2Fapps.facebook.com%2Flikeablephotos%2F&client_secret=CLIENT_SECRET&code=AQDFZbjpAUda8c_gz4wDDuBOVrsn0dApz3s8UA--7hFQIi1wb70-tDE56xXcCtDq-hV5UWzR5YEw_ozuGT24FLfvh9KnqHZ3xdn46P_KxYCf3DHJQA3AAu2ICHBqTk1-6fHTsl6FbagKz83H6dn15kkbKksajA4KcVIoPse5JbuBLlh6V5L1ANe8fzR94iH_SMU";
$response = file_get_contents($token_url);
but if you copy and paste the above URL into the browser address bar, it works just fine and returns:
access_token=AAADWBzZAyUvMBAL2Th6CRtxh2Up5soTCK8N4HJcy0ZBhQgJPxtZArKbuITISMoGLDxNiyeNW4GUZCBvJPeBTH6mx4v83ueUIAAYQJA1WrAZDZD&expires=5112501
but this similar URL (for a different user) also works:
$token_url = "https://graph.facebook.com/oauth/access_token?client_id=235326466577139&redirect_uri=http%3A%2F%2Fapps.facebook.com%2Flikeablephotos%2F&client_secret=CLIENT_SECRET&code=AQC2kTEV96-1Cki2oYUhyzjH6yFe6AJRd1Q3G8fbUXW-IsLJUlactzSwCvGVBK6jh1tL-t7v6dOWJZzbkSYhk0n2z6BHQcpljWAdoXFGB5zLC4FgW8fmxT6hwdRIQOr2dZ95CD_q5yJuOUz_2DItUa3_FF9m2_TmFYGEbxPoiaF47YSTUuZp6g-8ffziJcKDAdo";
when using file_get_contents.
Please help, thanks.
As an alternative to file_get_contents, have you considered using cURL? I use cURL for a lot of requests with great results, and on failure it will not expose your client secret.
Refer to http://php.net/manual/en/book.curl.php
This code snippet is standard in all my apps for getting an application access token, and it can be used for all API calls.
$app_access_token = GetCH();

function GetCH(){
    // App access token endpoint; fill in your own app ID and secret.
    $url = "https://graph.facebook.com/oauth/access_token?client_id=YOUR_APP_ID&client_secret=YOUR_APP_SECRET&grant_type=client_credentials";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    if (substr($url, 0, 8) == 'https://') {
        // The following ensures SSL always works. A little detail:
        // SSL does two things at once:
        // 1. it encrypts communication
        // 2. it ensures the target party is who it claims to be.
        // In short, with the options below, cURL won't check whether the certificate
        // is known and valid; however, it still encrypts communication.
        curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }
    $sendCH = curl_exec($ch);
    curl_close($ch);
    return $sendCH;
}
Finally found the answer; I had the same issue. The problem is most likely that the code Facebook gives you has spaces between the params. If you just copied and pasted from Facebook, you will get an HTTP/1.1 400 Bad Request error. Put all the params on one line and remove all spaces, and this should fix it. Worked for me. I know this is an older post, but if you are having this problem, let me know if it works for you!
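One way to sidestep stray spaces and encoding mistakes entirely is to build the URL from its parameters. This is only a sketch along those lines (the app ID and secret are placeholders, and it assumes the code parameter came back on the redirect):
// Build the token URL programmatically so no spaces or line breaks sneak in;
// http_build_query also URL-encodes the redirect_uri and code values.
$code = $_GET['code'];                       // the code Facebook appended to the redirect_uri
$params = array(
    'client_id'     => 'YOUR_APP_ID',        // placeholder
    'client_secret' => 'YOUR_APP_SECRET',    // placeholder
    'redirect_uri'  => 'http://apps.facebook.com/likeablephotos/',
    'code'          => $code,
);
$token_url = 'https://graph.facebook.com/oauth/access_token?' . http_build_query($params);
$response  = file_get_contents($token_url);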
For debugging:
Take the actual string in $token_url, paste it into a browser address field, and see what happens.
You will get back some JSON error code.
For me, it was that my token had expired. Starting a fresh request worked fine.
The same problem happened to me. The cause was that the access token supplied with the URL had expired, since access tokens are only valid for one hour; so I created a new access token and it worked again. Hope this info helps.