Localization for REST APIs

I am starting this discussion to gather more information on localization practices for APIs. HTTP does not seem to provide sufficient guidance, and the current state of practice is not adequate either.
The basic problem is that APIs may need to provide content that depends on the user's culture, country, language and timezone. For example, a German user would like to read messages in German, with European date and number formats, metric units, the euro as currency, and times in Central European Time.
Reading through RFC 7231 Section 5.3.5 (Accept-Language) and further into RFC 4647, one may think Accept-Language is sophisticated enough and is what should be used. There are several notable shortcomings, though:
1. Language tags may not be precise enough. For example, a user may request only a language without a country code, which leaves the region ambiguous: "de, en;q=0.8"
2. Even if the user supplies both language and country preferences, it is not clear how to tie the selection of the message locale to the value formatting locale. For example, suppose a user requests "hu-HU, en-US;q=0.9" while the application lacks Hungarian messages but is written in Java, which knows how to format dates in Hungarian. Should the app use English messages with Hungarian dates, or rather English messages with US dates? The actual situation may be more complex.
3. Timezone is not part of a language tag, and there seems to be no standard HTTP header for it.
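To make the first two shortcomings concrete, here is a minimal sketch of what a server actually receives from Accept-Language. The function name `parse_accept_language` is made up for illustration, and treating a malformed weight as 0 is an assumption, not something the RFCs prescribe.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs sorted by descending quality; ties keep header order."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # RFC 7231: a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # assumption: treat a malformed weight as "not acceptable"
        ranges.append((tag.strip(), q))
    # sorted() is stable, so equal weights stay in the order the client sent them
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("de, en;q=0.8"))
# [('de', 1.0), ('en', 0.8)]
```

The result for "de" carries no region, so the server still cannot tell German, Austrian and Swiss formatting conventions apart, and nothing in the list says anything about the timezone.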
I see Microsoft has thought about #2 in ASP.NET and introduced the notions of Culture and UICulture to separate the selection of message language from formatting.
In the Java world, Spring has introduced TimeZoneAwareLocaleContext to address #3.
The W3C has issued a guideline on using Accept-Language for locale settings. It more or less says that Accept-Language alone is not enough.
So what is your thinking?
Do you know of APIs that solve this problem in a comprehensive way? Pointers?
Should APIs accept multiple values for selecting message language, value formatting locale and timezone?
Should Accept-Language be used at all?

Ok guys,
here is a summary of how I answer my question. I hope this helps future API authors.
The fundamental requirements for a UI built on top of an API, excluding currency presentation, seem to be:
Select the best language out of the available product translations using RFC 4647 list of language ranges
Select the best data format out of the available using RFC 4647 list of language ranges
Allow clients to provide distinct preferences for translation and format. There will be cases where people will not find the best translation and yet prefer to see the proper formatting aligned with their culture.
Allow clients to specify a timezone using IANA TZDB identifiers
Format data elements using Unicode CLDR http://cldr.unicode.org/
Use named placeholders in localization bundles e.g. "{drive} is corrupt" is easier to translate properly than "{1} is corrupt"
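The first three requirements can be sketched with the "lookup" scheme from RFC 4647 Section 3.4, run once against the available translations and once against the available formats. This is a simplified illustration: the function name `lookup`, its parameters and the sample tag lists are assumptions, and the input is expected to be a list of language ranges already sorted by preference.

```python
def lookup(priority_list, available, default):
    """RFC 4647 lookup: return the single best tag from `available`, else `default`."""
    by_lower = {tag.lower(): tag for tag in available}  # matching is case-insensitive
    for language_range in priority_list:
        if language_range == "*":
            continue  # the wildcard alone cannot pick a tag in the lookup scheme
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # truncate from the end: "hu-hu" -> "hu"
            # a single-character subtag left dangling at the end is removed too
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


translations = ["en", "de"]             # hypothetical: no Hungarian messages
formats = ["en-US", "hu-HU", "de-DE"]   # hypothetical: Hungarian formatting exists
wanted = ["hu-HU", "en-US"]

print(lookup(wanted, translations, "en"))  # en
print(lookup(wanted, formats, "en-US"))    # hu-HU
```

Running the same preference list against two different sets gives English messages with Hungarian formatting. Whether that mix is desirable is exactly why the client should be able to send separate preferences for translation and format.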
For the HTTP headers, I suggest using three headers:
accept-language - used for selecting the translation, following the guidelines of RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#section-5.3.5
format-locale - used to select the data formatting style if it differs from the translation language preferences. Again a list of language ranges. Defaults to accept-language if omitted.
timezone - used to select the timezone for rendering date and time values. This should be a valid timezone ID from the IANA TZDB https://www.iana.org/time-zones
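A minimal sketch of how a server might resolve these three headers into one set of preferences. The function name `resolve_preferences` and the dictionary keys are hypothetical, and falling back to UTC for a missing or unknown timezone ID is an assumption (rejecting the request would be an equally valid choice). `zoneinfo` needs Python 3.9 or later and relies on time zone data being installed on the host.

```python
from datetime import timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def resolve_preferences(headers):
    """Map the three suggested request headers to translation, format and timezone."""
    lowered = {name.lower(): value for name, value in headers.items()}
    translation = lowered.get("accept-language", "")
    # format-locale defaults to accept-language when it is omitted
    formatting = lowered.get("format-locale") or translation
    zone_id = lowered.get("timezone", "")
    try:
        zone = ZoneInfo(zone_id) if zone_id else timezone.utc
    except (ZoneInfoNotFoundError, ValueError):
        zone = timezone.utc  # assumption: unknown IDs fall back to UTC
    return {"translation": translation, "format": formatting, "timezone": zone}


prefs = resolve_preferences({
    "Accept-Language": "en-US, en;q=0.8",
    "Format-Locale": "hu-HU",
    "Timezone": "Europe/Budapest",
})
print(prefs["translation"], "|", prefs["format"])  # en-US, en;q=0.8 | hu-HU
```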
Implementation-wise, it seems Java 8 and later have everything needed to implement a globalized application. Other languages and older Java versions seem to have varying degrees of issues.

I would keep all data in a universal, locale-independent format: numbers using . as the decimal separator, dates and times in ISO 8601 and in UTC, etc.
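As a small illustration of such a locale-independent payload, the sketch below serializes an amount and a timestamp. The function name `to_wire` and the field names are made up, and sending the decimal as a string is one possible choice, not a requirement.

```python
import json
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def to_wire(amount, moment):
    """Serialize with '.' as decimal separator and an ISO 8601 timestamp in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return json.dumps({
        "amount": str(amount),  # Decimal always renders with '.'
        "created": moment.astimezone(timezone.utc).isoformat(),
    })


cet = timezone(timedelta(hours=1))  # fixed +01:00 offset, for the example only
print(to_wire(Decimal("1234.50"), datetime(2024, 3, 1, 15, 30, tzinfo=cet)))
# {"amount": "1234.50", "created": "2024-03-01T14:30:00+00:00"}
```

The client then formats these values for display according to its own locale and timezone.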
Provide localized text only if it is absolutely necessary. In that case, get the locale from the accept-language header field, and if you have the localized string, pass that. If not, fall back to the string you have.
For example, you might have a multilingual product database that contains product data in several languages. When you write an API for the database, you can select the product data in the user's language (if any).
Here is a sample.
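A minimal sketch of that fallback, assuming a hypothetical product record that stores its names in a dictionary keyed by primary language subtag. The function name `pick_translation` and the sample data are illustrative only.

```python
def pick_translation(texts, preferred_languages, default_language="en"):
    """Return (text, language) for the first preferred language that has a translation."""
    for tag in preferred_languages:
        primary = tag.split("-")[0].lower()  # "de-AT" -> "de"
        if primary in texts:
            return texts[primary], primary
    # Nothing matched: fall back to whatever string exists.
    if default_language in texts:
        return texts[default_language], default_language
    language, text = next(iter(texts.items()))
    return text, language


product_names = {"en": "Cordless drill", "de": "Akku-Bohrmaschine"}
name, language = pick_translation(product_names, ["de-AT", "en"])
print(name, language)  # Akku-Bohrmaschine de
```

Returning the language that was actually used lets the API report it back, for example in a Content-Language response header, so the client knows when it received a fallback.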

Related

Best way to denote time (without date) in Swagger spec

What is the best way to represent a time field in a Swagger specification? The closest type to denote it looks like date-time, but this makes standard deserialisers expect a date to be passed along with the time. Is there a standard or best practice for denoting just a time in a Swagger spec that works well with the Jackson deserialisers?
Is denoting time in milliseconds/seconds and using type string in swagger an acceptable approach?
Depending on what you're trying to represent, this may or may not be a good idea.
If you want to represent a specific timestamp, then it's probably much safer to include the date.
If the date really isn't important (e.g. you want to indicate that an event takes place at 14:00 every day), then I don't believe Swagger has a built-in format for that. However, the Swagger format field is open, and Swagger supports ECMA 262 regex string patterns.
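As an illustration of the pattern approach, the sketch below checks a 24-hour time of day with optional seconds. The expression is only one possible choice, and the names `TIME_OF_DAY` and `is_time_of_day` are made up. It avoids Python-only syntax, so the same expression could be placed, with ^ and $ anchors, in the pattern field of a string property.

```python
import re

# HH:MM or HH:MM:SS on a 24-hour clock
TIME_OF_DAY = re.compile(r"([01][0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?")


def is_time_of_day(value):
    """True when the whole string is a valid time of day."""
    return TIME_OF_DAY.fullmatch(value) is not None


print(is_time_of_day("14:00"))  # True
print(is_time_of_day("24:00"))  # False
```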

How to get the TimeZones array in Swift in Chinese

I need to get the time zone array in Chinese, but TimeZone.knownTimeZoneIdentifiers returns an array in English.
TimeZone.knownTimeZoneIdentifiers returns a list of IDs from the IANA TZ Database. A comprehensive list is available here.
These are not to be translated. They are identifiers, passed as parameters to code to identify a time zone. They are always in English, and their exact spelling, casing, and punctuation should remain intact. If you were to translate it, you would find they are not usable in any API.
If your goal is to display a human-readable translated string in a UI, then you should use the localized names provided by the Unicode CLDR project. I am not an iOS developer, so I can't be certain, but from reading the docs, I believe these are already available to you by using the localizedName instance method of the TimeZone class.

How do you represent forever (infinitely in the future) in iso8601?

An API defines that a date should be sent as iso8601, but we have a requirement to send "forever" as a date, and the standard does not seem to cover this. Can anyone suggest a better solution than Dec 31 9999? Is there a different standard that would be more appropriate?
Quoting ISO 8601:2004(E):
3.5 Expansion
By mutual agreement of the partners in information interchange, it is permitted to expand the component
identifying the calendar year, which is otherwise limited to four digits. This enables reference to dates and
times in calendar years outside the range supported by complete representations, i.e. before the start of the
year [0000] or after the end of the year [9999].
And also relevant may be section 3.7 Mutual agreement which basically says you're free to define your own representations as long as you don't interfere with the representations defined in ISO 8601. So 9999-12-32 or 9999-13-00 could be mutually agreed upon for your proposed forever value.
As to what's common practice, I'd say it depends.
I'd go for 3.7 whenever possible. But it's important to assess your role within the whole set-up. E.g. if you're using a 3rd party API within your own set of components for the sake of convenience or future compatibility, there should be no problem at all. If you're part of a bigger system and you'd have to convince tens of other system parties/components/modules/etc. I'd say it's not worth the trouble.
It is also very important to check legacy code, and to at least sketch out a plan for the migration in case it breaks set-ups beyond belief. That could be anything from documenting your API "extension" to actually sending patches to the legacy code maintainers.

creating dynamic NSLocale with location data

I'm working on an app which should create an NSLocale object based NOT on the user's region (which should remain at the user's preferred language for most interface elements), but on the physical location of the traveler, to format currency. However, to build an NSLocale I need to concatenate the language (e.g. 'en') and the location (e.g. 'US') to initWithLocaleIdentifier:@"en_US", and thus to get the currency conventions into the formatter.
I can get the ISOcountry code from the CLPlacemark, but the language information... is harder to identify. Is there a lookup table of language options for each country, or some other option for initializing an NSLocale object based only upon the 'country' information?
I've made a cheap concatenation of @"us_US" which seems to work as well as @"de_DE" (!), but I don't know if I can count on that in all cases.
Thanks,
Tim
For what it's worth, it seems that the NSLocales can in fact be initialized -- at least for the purpose of obtaining currency information -- with 'gb_GB' or 'us_US' locale identifier. I haven't found exceptions in the locations offered by Xcode.

The Best Way to Parse Dates from an Email

I'm currently developing an app that can parse dates from an email, i.e. extract the times and dates from an email (similar to Gmail).
Currently I do this in PHP, but it is a tad clunky.
What's the best language to do this in, and are there any existing open source solutions?
I think PHP is as capable as any other language. Can we see the code you're using so we can suggest improvements? I'd use a regular expression... you just need a good one that supports a variety of formats.
What I do in my email client is extract all the tokens delimited by whitespace and then iterate over them using heuristics to decide how to classify each token. For instance if the token has a ':' character in it then I treat it as a time, to be parsed as ##:##:##. If it has '.' or '-' treat it as a day/month/year combo, and you have to decide which end is which... could be any number of combinations. If the token starts with a letter (i.e. isalpha(*string)) then you do a month name lookup. If it's a number it could be the day or year... decide based on length and whether you have an existing day or year already etc. If the token starts with '-' or '+' then it's a timezone, parse accordingly.
Seems to work in the field quite well, my email client has been around for 10 years or so. My code is C++, but you can write the same in PHP easily, it's not particularly language specific.
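The token heuristics described above can be sketched roughly as follows. This is not the C++ code from that email client. The names, the punctuation stripping and the length rules for day versus year are assumptions. The timezone check is moved to the front because an offset such as -0500 would otherwise be caught by the '-' date rule.

```python
MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}


def classify(token):
    """Guess what a whitespace-delimited token of a date string represents."""
    token = token.strip(",;()")
    if not token:
        return "unknown"
    if token[0] in "+-" and token[1:].isdigit():
        return "timezone"  # e.g. +0100
    if ":" in token:
        return "time"      # e.g. 14:05:33
    if "." in token or "-" in token:
        return "date"      # day/month/year combo; which end is which is decided later
    if token[0].isalpha():
        return "month" if token[:3].lower() in MONTHS else "word"
    if token.isdigit():
        if len(token) == 4:
            return "year"
        return "day" if len(token) <= 2 else "unknown"
    return "unknown"


print([classify(t) for t in "Tue, 12 Mar 2019 14:05:33 +0100".split()])
# ['word', 'day', 'month', 'year', 'time', 'timezone']
```

A real parser would go on to convert each classified token and resolve ambiguities, such as a two-digit number that could be either a day or a year, using the values it has already collected.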
If you mean the date it was sent (or received), you retrieve it from the mail headers (for example the 'Date:' header), which have a standard date format; see RFC 2822.
Anyway, if you use JavaMail (it's open source now) you can get the sent date with:
Date sentDate = mail.getSentDate();
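For comparison, the same header can be read with Python's standard library. A minimal sketch, in which the function name `sent_date` and the sample message are made up for illustration:

```python
import email
from email.utils import parsedate_to_datetime


def sent_date(raw_message):
    """Parse the Date: header of a raw message into a timezone-aware datetime."""
    message = email.message_from_string(raw_message)
    return parsedate_to_datetime(message["Date"])


raw = (
    "From: alice@example.com\n"
    "Date: Tue, 12 Mar 2019 14:05:33 +0100\n"
    "Subject: Meeting\n"
    "\n"
    "See you on Friday at 10:00.\n"
)
print(sent_date(raw).isoformat())  # 2019-03-12T14:05:33+01:00
```

This covers only the header. Dates mentioned in the body, like "Friday at 10:00" above, still need heuristic parsing.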