Stop Facebook/Other Apps from crawling my web application via private sharelink - facebook

Edit: The suggested answer does not work, because the robots are not just randomly crawling from my index; they are visiting a specific link when it is entered in a FB message.
I've created a basic chat application in Flask on App Engine. It allows the user to invite others by adding their ID or by giving them a private sharelink that auto-adds whoever visits it (similar to YouTube or Google Drive).
A serious flaw I have found is that if a user posts the link into a Facebook message, Facebook will crawl/visit the link and, by design of my system, be added as a user to the conversation. All of a sudden you'll see 3 random users join the conversation.
My chat system is completely anonymous and designed to be temporary, so there's no login or authentication other than a unique key for each user saved in their session.
So Facebook bots visit the link, get assigned an ID and get authenticated into the conversation because they used the user's share-link. Is there a way I can stop this via either Flask/Python or App Engine? Could I IP ban Facebook?
Some code for the sake of code; this runs for every new visitor:
import hashlib
from datetime import datetime, timezone
from functools import wraps
from flask import session
# fs_database (presumably a Firestore client) is assumed to be created elsewhere
def requires_session(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        if 'profile' not in session:
            user_ref = fs_database.collection('users').document()
            data = {
                'id': user_ref.id,
                'date': datetime.now(timezone.utc)
            }
            # add the user to the database
            user_ref.set(data)
            # save their id to their session
            session['profile'] = data.get('id')
            # create a hash for later on to create a sharelink
            session['share'] = hashlib.sha256(data.get('id').encode('utf-8')).hexdigest()
        return f(*args, **kwargs)
    return decorated
I could maybe add a check first if Facebook-bot: return False

In your case you can block these requests either in your own code or on the Google Cloud Platform side. To be more precise, you can reject certain connections in your code, or you can set firewall rules on your App Engine instance to reject connections coming from certain IPs. The public documentation has more information about firewall rules when using GAE:
Using flex environment.
Using standard environment.
Code-wise, you can check this GitHub repo, which addresses the issue of blocking certain IPs from reaching your Flask app.
The last possible option is authentication, but as the chat is anonymous I guess that's not the solution you are looking for.
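As a rough illustration of rejecting connections in your own code, the check below tests a client IP against a list of blocked networks using Python's standard ipaddress module. The function name is made up for this sketch, and the CIDR ranges are reserved documentation ranges used as placeholders, not Facebook's real crawler ranges; you would have to look up and maintain the real ones yourself.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), NOT real crawler IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked_ip(ip_string, networks=BLOCKED_NETWORKS):
    """Return True if ip_string falls inside any blocked network."""
    try:
        ip = ipaddress.ip_address(ip_string.strip())
    except ValueError:
        # Unparseable addresses are treated as not blocked in this sketch.
        return False
    return any(ip in network for network in networks)
```

In a Flask app this could be called from a before_request hook with the client address (on App Engine, typically the first X-Forwarded-For entry), returning a 403 when it matches.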

The accepted answer led me to this solution: I protected the route with a decorator that reads the User-Agent of the incoming connection to see where it comes from. If it comes from Facebook, the request is turned away.
from functools import wraps
from flask import request, session
# app = Flask(__name__) with a secret key set is assumed to exist
def check_for_robot(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        if 'not_a_robot' not in session:
            # Default to '' so a missing User-Agent header doesn't raise a TypeError
            agent = request.headers.get('User-Agent', '')
            if request.headers.getlist("X-Forwarded-For"):
                ip = request.headers.getlist("X-Forwarded-For")[0]
            else:
                ip = request.remote_addr
            # Stop robots from crawling when sharing conversation links
            # Could use the IPs too
            if 'facebook' in agent or 'Slackbot' in agent:
                return 'No Robots Thanks'
            # Real people will get to here and continue on
            session['not_a_robot'] = True
        return f(*args, **kwargs)
    return decorated
@app.route('/')
@check_for_robot
def index():
    return 'hello human'
This issue also occurs with ANY messaging service that crawls your links to display a preview in the chat message (WhatsApp, Slack, etc.).
This also exposes a vulnerability in these messaging services: they now show incorrect metadata in the chat, yet still embed the link you provided, which could be abused for phishing or clickjacking.
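Since several services do this, the User-Agent test can be pulled into one helper that covers more than Facebook and Slack. This is only a sketch: the token list is illustrative and not exhaustive, crawlers can change or spoof their User-Agent, and treating a missing header as a bot is an assumption of this example rather than anything the services document.

```python
# Illustrative, non-exhaustive substrings of link-preview crawler User-Agents.
PREVIEW_BOT_TOKENS = (
    "facebookexternalhit",
    "facebot",
    "slackbot",
    "whatsapp",
    "twitterbot",
    "telegrambot",
    "discordbot",
    "linkedinbot",
)


def is_preview_bot(user_agent):
    """Return True if the User-Agent looks like a link-preview crawler."""
    if not user_agent:
        # Assumption of this sketch: no User-Agent at all is treated as a bot.
        return True
    agent = user_agent.lower()
    return any(token in agent for token in PREVIEW_BOT_TOKENS)
```

Inside the decorator, the condition would then become something like `if is_preview_bot(request.headers.get('User-Agent')):`.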

Related

hangouts-chat: hangouts chat bot unable to post messages to a Bot implementation https endpoint

I have developed a synchronous HTTPS endpoint that responds to POST messages and configured its URL as the "Bot URL" under the chat bot configuration for Hangouts Chat. It is deployed to an EC2 instance on AWS, and I added a Route 53 entry for the URL https://mychatbot-implementation, which directs HTTPS POSTs to my EC2 instance.
However, chat bot is not posting any messages to the https end point and there are no errors logged.
Link to screenshot of chat-bot configuration
Chat Bot Implementation Code Here:
import logging
from flask import Flask, request, json, render_template, make_response
app = Flask(__name__)
@app.route('/', methods=['POST'])
def on_event():
    event = request.get_json()
    resp = None
    if event['type'] == 'REMOVED_FROM_SPACE':
        logging.info('Bot removed from space...')
    if event['type'] == 'ADDED_TO_SPACE':
        text = 'Thanks for adding me to "%s"!' % event['space']['displayName']
    elif event['type'] == 'MESSAGE':
        text = 'You said: `%s`' % event['message']['text']
    else:
        # A Flask view must return a response, so send an empty JSON body
        return json.jsonify({})
    return json.jsonify({'text': text})
if __name__ == '__main__':
    # ssl_context='adhoc' serves HTTPS with a throwaway self-signed certificate
    app.run(port=8080, ssl_context='adhoc', debug=True, host='my host ip address')
Could someone please advise on the next steps?
Unfortunately, mychatbot-implementation isn't a valid Internet hostname (it has no top-level domain), so Route53 will never be able to route your request (in fact, it won't even receive it). You have two issues to be concerned with (bot implementation and reachability) and need to tackle them separately (divide and conquer) rather than trying to solve everything at once.
I suggest that, to test your bot implementation, you keep your bot running on EC2, get a reachable IP address (w.x.y.z) for your instance (plus port number), and change your configuration to point to that, i.e., https://w.x.y.z:8080/, and see if the Hangouts Chat service can reach your bot. Once you get this working and your bot debugged, you can worry about getting a domain name and registering it with DNS.
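Before involving Hangouts Chat at all, you can also check reachability yourself by POSTing a fake event to the endpoint from another machine. The sketch below uses only the standard library; the function names are made up, and the payload contains just the two fields the handler in the question reads, which is far less than a real Hangouts Chat event carries.

```python
import json
import urllib.request


def build_test_event_request(bot_url, text="ping"):
    """Build (but do not send) a POST that mimics a minimal MESSAGE event."""
    payload = {"type": "MESSAGE", "message": {"text": text}}
    return urllib.request.Request(
        bot_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_event(bot_url, timeout=10):
    """Send the test event and return (status code, response body)."""
    request = build_test_event_request(bot_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `send_test_event("https://w.x.y.z:8080/")` should come back with the "You said" reply if the instance and port are reachable. With the self-signed 'adhoc' certificate from the question, certificate verification in this script will fail unless you pass urlopen an SSL context that accepts it.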

Facebook pixel events call from server

I have absolutely the same question as dan here: Facebook conversion pixel with "server to server" option. The answer there said there was no way, but that was 2013, so I hope something has changed.
So, is there any way to call Facebook pixel events (e.g. CompleteRegistration) from the server side now?
I can describe the situation in more detail. Imagine that a user visits our site, where the FB pixel tracks 'PageView', of course. When the user fills in a form and sends his phone number, we call the 'Lead' event. But then we need to track one more event, when our manager successfully confirms this user! Of course, that happens on another computer, so I have no idea how to "connect" it to the original user.
I've seen a lot of documentation sections like this, but I can't work out whether it's even possible.
Logically, we need to generate a specific id for the user (or it could really be the phone number) when the 'Lead' event is called. Then we should use this id to send 'CompleteRegistration' for that user. But I can't understand how to do it technically.
I would be grateful if somebody could explain it.
P.S. As I understand it, this is fully available in the API for mobile apps. Is it an OK idea to use that for our situation, if there is no other solution?
Use Offline Conversions to record events that happen after a user has left your website. Logging these conversions is technically very easy; setting everything up takes a little effort.
tl;dr: check the code below
Follow the setup steps in the FB docs (Setup steps 1-5), which are:
Set up a Facebook Business Manager account
Add a new app to the Business Manager account
Create an Ad account, if you don't already have one
Create a System User for the ad account
After the setup, follow the Upload Event Data steps on the same page (steps 1-3) to create an offline event set and associate it with your ad. These can be carried out in the Graph API Explorer by following the links in the examples. They can also be done programmatically, but that is outside the scope of making event calls from the server for one campaign.
Once you have created the event set, you can upload your CompleteRegistration events!
You will need to make a multipart form data request to FB; the data key will be an array of your conversion events. As @Cbroe mentioned, you must hash your match keys (the data you have available about your user to match them with a FB user) before sending them to FB. The more match keys you are able to provide, the better your chance of matching your user. So if you can get their email and phone at the same time, you're much more likely to match your user.
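The hashing step could look roughly like the Python sketch below. It assumes SHA-256 over normalized values, which is my understanding of what Facebook's documentation asks for; the exact normalization rules and the "email" key name are assumptions here, and the helper names are made up, so check the current docs before relying on it.

```python
import hashlib
import re


def normalize_phone(phone):
    """Keep digits only, so '+1 (555) 010-0000' becomes '15550100000'."""
    return re.sub(r"\D", "", phone)


def normalize_email(email):
    """Trim surrounding whitespace and lowercase."""
    return email.strip().lower()


def sha256_hex(value):
    """Return the SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_match_keys(phones=(), emails=()):
    """Build a match_keys dict whose values are lists of hashed identifiers."""
    keys = {}
    if phones:
        keys["phone"] = [sha256_hex(normalize_phone(p)) for p in phones]
    if emails:
        # "email" as a key name is an assumption of this sketch.
        keys["email"] = [sha256_hex(normalize_email(e)) for e in emails]
    return keys
```

The resulting dict would take the place of the "<HASH>" placeholders in the match_keys field of each event.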
Here's an example of the call to FB using node.js:
var request = require('request')
// The access token you generated for your system user
var access_token = 'your_access_token'
// The ID of the conversion set you created
var conversionId = 'your_conversion_set_id'
var options = {
  method: 'POST', // request defaults to GET; the upload must be POSTed
  url: 'https://graph.facebook.com/v2.12/' + conversionId + '/events',
  formData: {
    access_token: access_token,
    upload_tag: 'registrations', // optional
    // form fields must be strings, so JSON-encode the array of events
    data: JSON.stringify([{
      match_keys: {
        "phone": ["<HASH>", "<HASH>"]
      },
      currency: "USD",
      event_name: "CompleteRegistration",
      event_time: 1456870902,
      custom_data: { // optional
        event_source: "manager approved"
      },
    }])
  }
}
request(options, function(err, result) {
  // error handle and check for success
})
Offline Conversion Docs
Facebook now has a Server-Side API: https://developers.facebook.com/docs/marketing-api/server-side-api/get-started
Implementing this is similar to implementing the offline events outlined in the accepted answer.
Keep in mind that it will always be cumbersome to track and connect events from the browser and from your server. You need to share a unique user id between the browser and server, so that Facebook (or any other analytics provider) will know that the event belongs to the same user.
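One way to picture the shared id is the Python sketch below: the server hands out an id when the 'Lead' event fires, stores what it needs to match the user later, and uses that id when the manager confirms. Everything here is hypothetical (the class, the in-memory dict standing in for a database, and putting the id in custom_data); the event fields simply mirror the offline-event example in the accepted answer.

```python
import time
import uuid


class LeadStore:
    """In-memory stand-in for a database table of pending leads."""

    def __init__(self):
        self._leads = {}

    def record_lead(self, hashed_phone):
        """Call when the 'Lead' event fires; returns the shared id."""
        lead_id = uuid.uuid4().hex
        self._leads[lead_id] = hashed_phone
        return lead_id

    def build_confirmation_event(self, lead_id, event_time=None):
        """Call when the manager confirms the user, possibly days later."""
        hashed_phone = self._leads[lead_id]
        when = event_time if event_time is not None else time.time()
        return {
            "match_keys": {"phone": [hashed_phone]},
            "event_name": "CompleteRegistration",
            "event_time": int(when),
            # Carrying the id in custom_data is an assumption of this sketch.
            "custom_data": {"lead_id": lead_id},
        }
```

The manager's tool only needs the lead id; the browser session that produced the lead does not have to exist any more.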
Tools like mixpanel.com and amplitude.com may be more tailored to your needs, but will get very expensive once you move out of the free tier (100+ EUR at mixpanel, 1000+ EUR at Amplitude, monthly). Those tools are tailored towards company success, whereas Facebook is tailored towards selling and measuring Facebook ads.

App Engine force login from facebook

I have a Google App Engine app that handles logging in to Yahoo and Google through the OpenID mechanism, and it works well. I have added the Facebook connector (JavaScript library) and it too works well, with the exception that I can't get the app to think it's authenticated. I know why this is occurring: Facebook provides its own "connect" and it's not OpenID.
Is there a way I can call the GAE framework once authenticated to allow for login: required pages to work?
I want to add others like Twitter and Microsoft, but it's a real pain if I can't get GAE to honour the fact it's authenticated.
EDIT
Thanks to @Isaac below I did stumble upon this URL at Facebook:
https://developers.facebook.com/docs/facebook-login/manually-build-a-login-flow/
The redirect URL you specify gets a few tokens that Facebook thinks you need... in their case access_token={access-token}&expires={seconds-til-expiration}.
From the time when I managed to break my app because of the ajax loading, I noticed that Google OAuth puts you back on /_ah/login?continue=<new final end point>, which processes the tokens seamlessly, does something within the framework to register the user as authenticated, and then puts you back on the URL you specified in the first place. It's this I want to understand.
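For reference, pulling the two Facebook values out of that string is the easy part. A minimal Python sketch, assuming exactly the access_token/expires format quoted above (the function name is made up):

```python
from urllib.parse import parse_qs


def parse_facebook_token_response(body):
    """Parse 'access_token=...&expires=...' into (token, seconds or None)."""
    fields = parse_qs(body.lstrip("#?"))
    token = fields.get("access_token", [None])[0]
    expires = fields.get("expires", [None])[0]
    seconds = int(expires) if expires is not None else None
    return token, seconds
```

Having the token parsed does not by itself make the GAE users API treat the visitor as logged in, which is the part still in question.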
Further edit
I've found a few bits for example:
http://code.scotchmedia.com/engineauth/docs/index.html
It's Python, but it looks like it can handle multiple authentication types from multiple vendors (the link shows things like Twitter and Facebook), so this does look possible.
I tried creating a new User on the framework, but there is no setCurrentUser() on the UserService object.
The only thing I can think of now is forging the authentication cookie. Anyone know the best way to do that?
I believe facebook has an oauth login flow. These links might help:
http://hayageek.com/facebook-javascript-sdk/
https://developers.facebook.com/docs/reference/dialogs/oauth/
https://developers.facebook.com/docs/facebook-login/
If you can't get OAuth to work, you could use a @login_required decorator on your endpoints:
import functools
import webapp2
from google.appengine.api import users
# check_facebook_login(handler) is a helper you would have to write yourself
def login_required(func):
    @functools.wraps(func)
    def decorated(self, *args, **kwargs):
        current_user = users.get_current_user()
        if current_user or check_facebook_login(self):
            return func(self, *args, **kwargs)
        elif self.request.method == 'GET':
            self.redirect(users.create_login_url(self.request.url))
        else:
            # error() clears the response, so set the status before writing the body
            self.error(403)
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.write('Forbidden')
    return decorated
class QueryHandler(webapp2.RequestHandler):
    @login_required
    def get(self):
        ...

List of Facebook CDN addresses

I need to compile a list of the addresses of all the CDNs used by Facebook.
Example:
fbcdn-sphotos-a.akamaihd.net
fbstatic-a.akamaihd.net
...
I need these for a captive portal application that allows users to connect to WiFi with Facebook. We allow facebook.com through the firewall for Graph API calls, but one of the issues we had is that the Facebook login dialog takes forever to load and loads without stylesheets/images. We fixed that by whitelisting fbstatic-a.akamaihd.net, but we want to make sure we won't have surprises later.
fbstatic-a.akamaihd.net
fbcdn-profile-a.akamaihd.net
fbcdn-sphotos-a-a.akamaihd.net
fbcdn-creative-a.akamaihd.net
fbexternal-a.akamaihd.net
And:
fbcdn-sphotos-b-a.akamaihd.net
fbcdn-sphotos-c-a.akamaihd.net
fbcdn-sphotos-d-a.akamaihd.net
fbcdn-sphotos-e-a.akamaihd.net
fbcdn-sphotos-f-a.akamaihd.net
fbcdn-sphotos-g-a.akamaihd.net
fbcdn-sphotos-h-a.akamaihd.net
Those run from letter A to letter H; if Facebook adds a new domain, it should look like:
fbcdn-sphotos-i-a.akamaihd.net
So future CDN domains would be
fbcdn-sphotos-j-a.akamaihd.net
fbcdn-sphotos-k-a.akamaihd.net
fbcdn-sphotos-l-a.akamaihd.net
fbcdn-sphotos-m-a.akamaihd.net
fbcdn-sphotos-n-a.akamaihd.net
And so on, up to the letter Z.
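If you want to pre-populate a whitelist along those lines, the hostnames can be generated rather than typed out. This is only a sketch of the naming pattern guessed at above; nothing here confirms that Facebook will actually use the unused letters, and the function name is made up.

```python
import string


def sphotos_hosts(first="a", last="z"):
    """Return fbcdn-sphotos-<letter>-a.akamaihd.net for letters first..last."""
    letters = string.ascii_lowercase
    start, end = letters.index(first), letters.index(last)
    return ["fbcdn-sphotos-%s-a.akamaihd.net" % c for c in letters[start:end + 1]]
```

Calling `sphotos_hosts()` yields all 26 candidates, and `sphotos_hosts("a", "h")` only the ones listed as seen so far.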
As far as I can see, akamaihd is only used for old photos and for special photos (cover photos, for example).
They created new servers, and there is some new logic in them as well. Example:
scontent-a-lhr.xx.fbcdn.net
scontent-b-lhr.xx.fbcdn.net
There are more, but I have also seen some shared photos served from interesting links. For example:
https://fbcdn-sphotos-h-a.akamaihd.net/hphotos-ak-prn2/v/1472407_10151814939401656_1687041676_n.jpg?oh=d92bc9af7f987c5fe298ecc8c717a4e1&oe=5287894B&__gda__=1384659513_abeb3f33223e6e737aff91419149509e
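Because the exact hostnames keep changing, a captive portal might match on patterns instead of a fixed list. The Python sketch below derives its two patterns only from the hostnames quoted in this thread, so treat it as a starting point: Facebook may well use names that do not fit, and the names used here are made up for the example.

```python
import re

# Patterns derived only from hostnames quoted in this thread.
FACEBOOK_CDN_PATTERNS = [
    re.compile(r"^fb[a-z]+(-[a-z]+)*\.akamaihd\.net$"),
    re.compile(r"^scontent(-[a-z0-9]+)*\.xx\.fbcdn\.net$"),
]


def is_known_facebook_cdn(hostname):
    """Return True if hostname fits one of the patterns seen so far."""
    host = hostname.strip().lower().rstrip(".")
    return any(pattern.match(host) for pattern in FACEBOOK_CDN_PATTERNS)
```

Anchoring the patterns at both ends avoids letting through every other site hosted on akamaihd.net, which a plain suffix match on the domain would do.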

Google Data/OAuth/AppEngine/Python - Properly Registering a Web Application

I'm creating a webapp with this combination of tools. I'm authenticating with App Engine in the following manner:
class googleLogin(webapp.RequestHandler):
    def get(self):
        callbackURL = 'http://%s/googleLoginCallback' % getHost()
        # Create a client service
        gdClient = gdata.docs.service.DocsService()
        gdata.alt.appengine.run_on_appengine(gdClient)
        gdClient.SetOAuthInputParameters(gdata.auth.OAuthSignatureMethod.HMAC_SHA1,
                                         _GoogleConsumerKey,
                                         consumer_secret=_GoogleConsumerSecret)
        # Get a Request Token
        requestToken = gdClient.FetchOAuthRequestToken(scopes=_GoogleDataScope,
                                                       extra_parameters={'xoauth_displayname': APP_NAME})
        # Persist token secret
        self.session = Session()
        self.session[TOKENSECRETKEY] = requestToken.secret
        gdClient.auto_set_current_token = True
        gdClient.SetOAuthToken(requestToken)
        authUrl = gdClient.GenerateOAuthAuthorizationURL(callback_url=callbackURL)
        self.redirect(authUrl)
I authenticated my domain with Google at https://www.google.com/accounts/ManageDomain, entering a target URL and am using the given Consumer Key/Secret. For instance, if my domain was 'juno.appspot.com', I am using http://juno.appspot.com as the target url path prefix.
The process is working; however, Google presents this message to the user in a yellow security box:
"The application that directed you here claims to be 'xxxxxx'. We are unable to verify this claim as the application runs on your computer, as opposed to a website. We recommend that you deny access unless you trust the application."
I don't think I should be getting this error, since my server is getting the request token and creating the authorization URL. Does anyone have any insight on how to get rid of this warning?
Google's domain registration has an option to upload a certificate, but I shouldn't need to do that because I'm using OAuth with the HMAC_SHA1 signature method.
Also, not that it should matter, but I'm doing all this through a UIWebView on the iPhone. I'm specifically trying to do all authentication server-side to avoid exposing my Consumer Key/Secret.
Thank you for any tips :)
Solved.
The culprit is this line from above:
extra_parameters={'xoauth_displayname': APP_NAME})
Setting this value for a registered application intentionally triggers a warning to users, as indicated by the Google documentation:
xoauth_displayname:
(optional) String identifying the application. This string is displayed to end users on Google's authorization confirmation page. For registered applications, the value of this parameter overrides the name set during registration and also triggers a message to the user that the identity can't be verified. For unregistered applications, this parameter enables them to specify an application name. In the case of unregistered applications, if this parameter is not set, Google identifies the application using the URL value of oauth_callback; if neither parameter is set, Google uses the string "anonymous".
Removing this line no longer allows me to use a 'nice' name in place of the domain, but it gets rid of that annoying yellow box :)
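If the same code has to serve both a registered and an unregistered deployment, the parameter can be made conditional. A minimal sketch with a made-up helper name; whether FetchOAuthRequestToken is happy with an empty dict rather than no argument is an assumption I have not checked.

```python
def request_token_extra_parameters(app_name, registered):
    """Return the extra_parameters dict for the request-token call."""
    if registered:
        # Registered apps: omit xoauth_displayname to avoid the warning box.
        return {}
    # Unregistered apps: supply a display name instead of "anonymous".
    return {"xoauth_displayname": app_name}
```

The result would be passed as `extra_parameters=request_token_extra_parameters(APP_NAME, registered=True)` in the handler above.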
I'm not sure exactly where the issue may be in your code, but I've got a one page oauth/appengine/gdata example which may at least set you in the right direction. Have you tried to navigate to the site directly from the iPhone/desktop browser to see what message is delivered?
Hope it helps.
Alternatively, is it possibly to do with the user agent the UIWebView sets?