I have a web application which uses URLs that look like this:
http://library.example.com/Register.aspx?query=academic&key=586c70bb-5683-419c-aae9-e596af9ab66a
(The GUID is used instead of a plain int to discourage guessing, which is all we need for now.)
The problem: that long URL frequently breaks when sent via email. It's humans sending the emails, so I can't control the formatting. Sometimes it's the sending email program at fault, sometimes the receiving, but regardless I'm spending too much time on talking people through fixing problems.
Everything has to be from this domain, so I can't use a third-party shortener. I could host my own, but that seems like a kludge.
Any suggestions?
Edits
#Sunny: Thanks for elaborating, but my situation differs from what you assume. A corporate customer (of mine) passes this URL to its employees, and they use it to get to a branded Registration page. They need to give a working email as part of registration, and that gets forwarded to the corporate supervisor.
Registration gets them access to a database, but what they see is not specific to the corporate customer. So the occasional interloper is not a big deal; when they get weeded out by the corporate supervisor, we invite them to subscribe.
#Everybody: the email breakage is not on the punctuation (?&=), but at some predetermined line-length. Surprised me, too. Note that the domain name is long, as is the path to the virtual directory, which is a part of the problem.
After reading the responses, I'm going to use base64 as a pseudo-shortener, something like:
http://a.MyLongDomainName.com/?q=a&key=base64_encoded_GUID
...and see if that survives. Thanks to all.
You can at least shorten it a bit. Right now, you're send a GUID, which is a 128-bit number, in a format that is essentially hexadecimal with extra dashes. If you view the GUID as a byte array and convert it to Base64, you can cut things down a bit. Likewise, "query=academic" could be "q=a".
The GUID is currently taking up 36 characters. Converting to Base-64 cuts this down to 22, saving 14 chars. Replacing "query=academic&key=" with "q=a&k=" shaves off another 13. Cutting a total of 27 characters may well keep your URL short enough not to wrap, despite the presence of ampersands and equal signs.
One more detail: the Base-64 text is going to end with an "=", which will then be hex-encoded into "%3D". The solution is to cut that character off, because it's just padding.
With credit to the original posters, it looks like the best bet is a combination of things:
Compact GUID with base-64.
Shorten key names and, if possible, values.
Wrap URL in angle-braces to encourage client to parse it properly.
If possible, replace key names with URL-rewriting, so that it looks like a path.
If you can't use a third-party URL shortener, then your only option (besides changing the URL structure, as Sunny suggested) is to surround your URL with angle brackets, like this:
<http://library.YourDomainNameHere.com/Register.aspx?query=academic&key=586c70bb-5683-419c-aae9-e596af9ab66a>
Any email client that follows the guidelines found in the Uniform Resource Identifiers (URI): Generic Syntax document should display a clickable link. This is not a fool-proof solution, however, and you'll likely end up resorting to a URL shortening service or restructuring your URLs.
The only alternative to installing your own shortener service (which would be the ideal solution IMO), may be base64 encoding of the whole URL (and using a shorter key). But that would increase string length by 33% (very likely to break in E-Mail clients as well), and look ugly.
I would go with building a URL shortener service that shortens URLs on demand to something like this:
http://library.example.com/go/586c70bb-5683-419c-aae9-e596af9ab66a
There are some prepackaged URL Shorteners that you could host on your own. Here's a codeplex search
http://www.codeplex.com/site/search?query=url%20shortener
This will give you the ability to keep your short url's in house
Alternatively you could some how implement a RESTFul URL that would be a lot harder to screw up
http://library.example.com/Register/Academic/586c70bb-5683-419c-aae9-e596af9ab66a
This solution should work better than the querystring simply because what usually breaks in the email clients is the ?, the =, and the &
I personally think a RESTFul solution is best as it creates the cleanest urls that still make "some" sense.
How about replacing the GUIDS with YouTube style keys
e.g. http://library.example.com/Register.aspx?q=academic&k=jkGlkNu8
By using base-64 strings (instead of Guids which are base-16) and dropping those pesky dashes, you can pack a decent range of unique keys into a small amount of characters.
What about a combination of the methods described here?
Combining shorter URLs with Base64 encoding of the key would turn
http://library.example.com/Register.aspx?query=academic&key=586c70bb-5683-419c-aae9-e596af9ab66a
into
http://l.example.com/register/ac/WGxwu1aDQZyq6eWWr5q2ag
Much more readable, IMO. And lack of chars like ? and & reduces the risk of cut'n'paste errors.
REST-ful url like:
http://www.yourdomainhere.com/register/academic/{userName_here}
might help IMO.
If the user is not registered, this will do it & return a message confirming the fact
If the user has already been registered, there will be no action & perhaps a notification that the user has been registered can be shown.
The routing of the URL and/or validating the request etc. can be implementation detail best left to a module looking at the request pipeline...
HTH.
EDIT:
As pointed out by #Steven below, there is an addition step involved in this solution:
When the user clicks on the REST URL, launch the confirmation/login screen with the user name pre-filled. The user can login to the account & this is confirmation that the user is valid. Till he does the first login, the status of the account can be "not confirmed" & at his first login, it can be "confirmed" without bothering if the click/request has come from the email sent and/or via a request in a web browser.
This will also ensure that it will work for authentic email account since till the user actually does a valid login, the account will not be in "confirmed" status...
Related
Before i start thinking about this programatically, does anyone know if it is possible to actually extract the correct url from an email link that is basically a tracking module?
Our work email system auto blocks tracking based urls from email, so i am thinking of writing something to extract the correct url so people can copy and paste the tracking link into a program and it will provide the correct url.
Is this even possible with the way that email tracking works?
Here is an example of a url in an email that i recently received:
http://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiNTE0MTQ4NSIsImRlbGl2ZXJ5X2lkIjoiOTI0NzI2MTU0IiwidXJsIjoiaHR0cHM6Ly93d3cuYXhzaWVkLmNvbS9nY3NlLWNvbXB1dGVyLXNjaWVuY2Uvb2NyLW5lYS1ndWlkZS8_X19zPXphb2txcDVpaWN4NGkxZndtYmNnIn0
Our system blocks these. It eventually resolves to:
https://www.axsied.com/gcse-computer-science/ocr-nea-guide/?__s=zaokqp5iicx4i1fwmbcg
(got our network admin to check it for me)
I want a system that gets the right url from the ugly mess that is blocked so we can actually view links from emails.
Thanks in advance for any help.
The data in tracking URLs are typically a unique ID pointing to some entry in a database, or are encrypted with a private key, so there's no way to obtain any meaningful information from them. (see answers to this related question: Generate unique link for each website visitor)
More naive approaches will simply encode the data, in which case you may be able to extract useful information from them. Funnily enough, your example URL is a base 64 encoded JSON object containing the link itself:
{
"account_id": "5141485",
"delivery_id": "924726154",
"url": "https://www.axsied.com/gcse-computer-science/ocr-nea-guide/?__s=zaokqp5iicx4i1fwmbcg"
}
In this case you could actually resolve the URL on your own, but this type of approach is uncommon for that very reason.
I am using the current link in my email.
*|baseUrl|*/verifyEmail?token=*|token|*
This however causes one or two people to get strange links from the email and get not found, usually based on some random email providers. E.g. - if I use a 10 minute mail (10minutemail.com), I get the following:
https://10minutemail.com/10MinuteMail/www.mywebsite.com/verifyEmail?token=b32fee82da59e7b4085269faca35ec7025122876
Correct link: www.mywebsite.com/verifyEmail?token=b32fee82da59e7b4085269faca35ec7025122876
Assuming this is due to baseUrl? Am I doing something fundamentally wrong when setting up my email link?
You need to include http:// or https:// with your baseUrl. Otherwise the email client may prepend a default base address instead of 'just' the missing protocol, especially if it is a webmail client.
I have this account creation email that is sent out to anyone who is trying to create an account as I need to authenticate that they are who they say they are.
However, my issue here is that the URL where they need to click when they receive my email is too long and some email clients do not handle that very well and sometimes truncates the URL thus making the URL invalid when clicked.
Because the URL contains the domain name, the hashed email and a long activation code. It looks something like this.
http://domain.com/activation?email=75a5867d3df134bededbaf24ff17624d&key=8fecb20817b3847419bb3de39a609afe
While some email clients are ok with this but some are not...And I don't want to use HTML email and rather stick with plain/text email. Also I heard horrible stories using URL shorteners so I am not sure if I should use them...
Any insights in this area is appreciated!
I would definitely agree with Jason: shorten your url.
Think of what you really need.
Most likely the email address is in the database already, so you can refer to if with a short ID (let's say 7 numbers max). Your signature can be something very simple as substring (base64_url(md5(email+salt)), 0, 5). 5 base64 characters are 64^5=about 1 billion possibilities. This is probably secure enough (and what would the real damage be if someone registered with a wrong email address). So your url would be http://domain.com/activation?email=1234&key=aD5Y_, http://domain.com/activation?e=1234&k=aD5Y_ or even http://domain.com/activation?e=1234aD5Y_ . In the last format you know the last 5 characters are the key, so the rest is the id. Note that the code example assumes md5 to return in an 8-bit string format (and not hex string format), and base64_url uses a url safe base64 method. Also, some background info on a salt.
If your email address has a long id or needs to be encoded in the url as well, or the above is not short enough yet, consider an even shorter form. Basically this will result in making your own url shortener. Just before you insert the link into the email, generate some random 5 character string. Insert this string as key into memcached (or the database), with as value the original url. Then your url could be http://domain.com/redirect?key=rT-tW . When you see this in your app, just retrieve the original url from the database/memcached and redirect there.
Do make sure that your system is robust against the following:
Someone enters an email address (their real email), you send the link
That person changes their email address into something fake on the website before clicking the link, you send a new email to the new (fake) email address
They now click the link from the first email and your website confirms their email address in the second (fake) form.
One way to do this is make sure to use the email address itself (and not for instance just the user id) in the key generation, as suggested above.
Numerical IDs vs names
As an example, which of these would you choose for identifying a single transaction, from a single bank account, for a single company:
/companies/freds-painting-ltd/accounts/savings/transactions/4831
/companies/freds-painting-ltd/accounts/1/transactions/4831
/companies/62362/accounts/1/transactions/4831
You idiot, something totally different! Crikey, did you even READ Fielding's dissertation?
Now, I think the 1st one is the most readable. If I have more than one company, or if I'm someone like an accountant managing multiple companies, it's immediately clear which company, and which account, I'm looking at. It's also more bookmarkable/emailable and would prevent 'fishing' for other companies by changing the company ID. I would want transaction IDs to be unique to an account (I.e. Both 'savings' and 'current' accounts could have transaction '1'
A 'company' will be my 'top-level', or 'first class' resource. Nothing at all would ever be shared between companies. As such, it would be the ideal candidate for a shard (or 'ancestor'/'namespace' in Google App Engine parlance). So I'd only have to worry about the account names being unique within one company. Every company could have an account called 'savings'.
Not sure what the situation in the rest of the world is, though LTDs or PLCs in UK would have a unique name, there could be many 'Dave's Window Cleaning' businesses (what's know as a trading name).
The business owner(s) could potentially opt for the top level /company/company-name URI to be public, and contain some basic details like their website, contact details etc, but everything below that would NEVER be accessible by search engines.
So my thoughts/concerns are:
1) Is it reasonable, when someone signs in to add their business, to say "Sorry, 'Dave's Window Cleaning' business is taken. How about 'Dave's Window Cleaning Portsmouth' (Having taken their location in another field)? My worry with this is that, for a more well known company, you're giving away the fact that they have an account with you. Or that someone could use that form to search for names. Perhaps not a biggie.
2) The size of the company name. Would it be reasonable for a name like 'Dave's Window cleaning, gardening, and loads of other stuff'? Thus creating a URL like 'daves-window-cleaning-gardening-and-loads-of-other-stuff/'
3) How to deal with someone changing their business name - I would approach it by creating a new company with that string ID, copying over everything, then deleting the old resource. The original URI would return 404 rather than redirecting - as you can't guarantee someone else won't want to take the now unused name, or even if more than one person has used the same name in the past.
4) Should the 'real' unique ID be an number in the back end, and for every request to be handled by first doing a query for what company ID this name actually related to.
5) The impact of searching for a transaction in the persistence layer.
6) The possibility of URL rewriting, but then that wouldn't work cleanly in GAE, nor would it solve the issue of ensuring company names are unique.
RESTful webservice vs RESTful website
So, we potentially have this lovely RESTful webservice that the latest snazzy iphone/android app can use (delusions of grandeur). But what about the main website itself? I note, right now, that the URL I see at the top of my page is not 'RESTful': /questions/ask is an action. There is no 'ask' resource on the server. It's more the state of the page, the preparation for POSTing to /questions/ - or if I'm editing, PUTing to /questions/{id}
I also note that Stackoverflow has URIs like /questions/362352/name-of-the-question, and that the latter part can be omitted, and one will be redirected to it.
Should I host a completely separate webapp that consumes my lovely webservice (from the same domain)? Do I even need a separate REST server, or can I rely on content negotiation (JSON/XML) and HTTP verb to select the right method (I'm using Jersey), and return the right representation?
So I could have /companies/aboxo/ return the whole HTML page (using stringtemplate.org) if it's a GET /,text/plain or test/html, and JSON/XML for others?
But what happens for 'add/edit/delete' transaction? Would GET / /companies/freds-painting-ltd/savings/transactions/?template=add be ok (or GET ../transactions/352?template=edit), and that would return the right HTML?
Thinking about this last detail is driving me mad for some reason.
Comments, suggestions, outright ridicule - all welcome!
Marcos
Rails solves the "id vs name" problem by displaying both in the URL but using only the id to actually identify eg:
/companies/62362-freds-painting-ltd/accounts/1-savings/transactions/4831
ie - for the ones that have a "pretty url" the function that generates your path write both the id and the name... but for your router, where relevant: you strip off everything thats not the id.
incidentally, it means your customer could actually write whatever they like into the URL and it'd make no difference:
/companies/62362-i_luv_blue_turtles/accounts/1-your_mum/transactions/4831
and your router still just sees:
/companies/62362/accounts/1/transactions/4831
:)
For a cannonical URI I suggest just /transactions/{id} as I presume the transaction knows what the company and account is. Therefore, #4 :-)
Is SEO a concern? I presume you don't want random folks off the internet googling for X company's transactions?! Therefore, I would just keep names (which may change) out of the URI.
I am trying to embed an ID into an email so that when a recipient replies to an email that my system sends out, my system can pick it up and match the two together.
I have tried appending a custom header, however this is stripped out when the user replies.
I have tried embedding an HTML comment within the email, but outlook does not seem to keep comments when a reply email is created.
Worst case scenario, I can manually try and match the sent and received emails by time span or have a visible tag within the message body.
Does anyone know of a more elegant solution?
Thanks in advance
Email messages already contain such an identifiers, called Message-ID. And there's even a way to send which message you're replying to by sending that ID in a header called In-Reply-To. That's done by pretty much all email clients, that's how they usually do their threading.
It's defined in RFC 822 (yep that's pretty old) and probably re-defined and refined in more modern versions of that.
I have seen a method that includes a one byte image with a unique name that's linked to the user. When they view the email and download the images, your HTTP server will record a hit for that unique image. Of course the user needs to display images, but you can include a message in the body asking them to display the images. We actually include content in an image so they need to show images.
If your incoming e-mail can handle +foo or -foo suffixes, use that.
Many e-mail systems can route user+foo#example.com or user-foo#example.com
to user#example.com. You can replace foo with some kind of identifier.
Several mailing list servers use this for tracking bounces.
While I can't say for certain, my investigation in that sort of matter some time ago yielded the following "conclusion":
Headers are transformed a lot
Message bodies are transformed a lot
This is partly because, I suspect, of:
Need to protect users from malicious intentions
Need to perform "targeted marketing"
I have seen "unique codes" flying around in clear text in the email body but I would suggest having a unique identifier embedded in the return address instead.
The usual approach is to place the id in the subject line and/or somewhere visible in the message text and informing the recipient that he should not modify the subject or quote the original mail when responding.