I'm considering assembling image name strings based on device type, orientation, and localization.
For example:
#define kLocalCode NSLocalizedString(@"en", @"localization code")
...would result in "background_iphone_portrait_en.png" (or _es, _de, etc.)
It's faster in my workflow to do it this way instead of placing images with the same names in separate localized folders.
Are there any downsides with this method of image localization?
First:
I found this a really creative solution. And it makes me think.
This is not really an answer but:
You can load the 'en' with the following call:
[NSLocale currentLocale]
That way you do not have to translate "en" to "fr", "es", or "de" in your strings files; you can read the language code at runtime instead.
If the user's language is one you do not support, load the English version.
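The naming scheme plus the English fallback can be sketched in a few lines. This is only a language-neutral illustration written in Python, not Cocoa code: the function name, the set of supported languages, and the identifier format are assumptions made for the example.

```python
# Hypothetical set of localizations that actually ship with the app.
SUPPORTED_LANGUAGES = {"en", "es", "de"}
FALLBACK_LANGUAGE = "en"


def image_name(base, device, orientation, locale_identifier):
    """Build a name such as 'background_iphone_portrait_en.png'.

    locale_identifier is assumed to look like 'es', 'es_MX' or 'es-MX';
    only the leading language part is used.
    """
    language = locale_identifier.replace("_", "-").split("-")[0].lower()
    if language not in SUPPORTED_LANGUAGES:
        # Unsupported language: load the English artwork instead.
        language = FALLBACK_LANGUAGE
    return f"{base}_{device}_{orientation}_{language}.png"
```

One downside this makes visible: the fallback has to be handled by your own code, whereas images placed in localized folders get the system's fallback behaviour.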
I am building an app to read results from color measurement devices, and for this purpose I need to know how to store an array of results to a local file on an android smartphone/tablet and read it back from that file so that it's once again an array I can work with.
The results will be result objects, because I also need to tell when the measurement was taken, and what measurement mode was used (such as B/W-measurement or measurement of a light source).
I know how to get strings in and out, but as far as I know, transforming that to an array is impossible without bodgy and inelegant code.
So where do I even get started here?
Should I use plain .txt?
Or should I try to use .xml or .json files?
You should use standard formats for your storing. The json format is a good one for structured data because many tools support displaying or even parsing it.
For instance, you may store it like this
[
[
"result1",
"23:00",
"B/W"
],
[
"result2",
"18:14",
"Color"
]
]
As you can see if you save it as e.g. test.json and drag and drop it onto Firefox, the browser recognizes the format and displays it in a structured viewer. Using a standard format for structured data is a good idea: programming libraries, and even languages such as Python, support it with dedicated classes or functions. It is also easy to write the dump or parse code yourself.
XML is also a good format. JSON is the more modern of the two, while XML came first. Which one to choose depends on the programming ecosystem you work in: some have better support for one, some for the other. For your purposes it doesn't matter which one you use. I've seen JSON a lot in the Android world, but probably just because it is more modern.
But remember, the format is just the framework, which data content you put into it is up to you.
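To make the round trip concrete, here is a minimal sketch in Python using only the standard library. The `Result` fields (`value`, `taken_at`, `mode`) are assumptions based on the question, and it stores each result as a JSON object with named keys rather than the positional arrays shown above. On Android you would do the equivalent with a JSON library such as `org.json`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Result:
    # Hypothetical fields; adapt them to what the device actually reports.
    value: str
    taken_at: str
    mode: str


def save_results(results, path):
    """Write a list of Result objects to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(r) for r in results], handle, indent=2)


def load_results(path):
    """Read the file back into a list of Result objects."""
    with open(path, encoding="utf-8") as handle:
        return [Result(**item) for item in json.load(handle)]
```

Named keys make the file self-describing, so adding a field later does not silently shift the meaning of positions in an array.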
Does anyone know exactly what the new (iOS 6) lowercaseStringWithLocale method of NSString does? The documentation is very skimpy, and I didn't find a single reference to this method in Apple's developer forums.
While localizing my app, I'm interested in changing words from my strings file to lowercase when they appear in a sentence -- except in the German version, where some words should stay in uppercase at all times. Is that what this method is for? Or something completely different?
The discussion in lowercaseString might shed some light:
Note: This method performs the canonical (non-localized) mapping. It is suitable for programming operations that require stable results not depending on the user's locale preference. For localized case mapping for strings presented to users, use the corresponding lowercaseStringWithLocale: method.
So if you're computing the lowercase version of a string for a purpose such as case-insensitive database lookup, use lowercaseString. If you intend to show the user the result, then use lowercaseStringWithLocale.
Note that lowercaseStringWithLocale won't make a decision based on the actual words as to whether the word should be lowercased or not. It does what you ask it to do, and doesn't question your motives.
Lower/uppercasing is indeed locale-dependent. The only example I know about (and it's a killer one, a source of many globalization bugs) is the Turkish i issue. See here for an overview: http://www.codinghorror.com/blog/2008/03/whats-wrong-with-turkey.html
Basically, when you uppercase "Hi" you get "HI" except for Turkey where you get "Hİ"
Likewise, when you lowercase "HI" you get "hi" except for Turkey where you get "hı"
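The difference can be illustrated outside of Cocoa as well. The sketch below is in Python, whose `str.upper()` and `str.lower()` are not locale-aware, so the Turkish rule is written by hand. The helper names are made up for the example and only the dotted/dotless i pairs are handled; on iOS you would pass a Turkish locale to the `...WithLocale:` methods instead.

```python
# Turkish has two distinct letter pairs: dotless ı/I and dotted i/İ.
_TURKISH_LOWER = {ord("I"): "ı", ord("İ"): "i"}
_TURKISH_UPPER = {ord("i"): "İ", ord("ı"): "I"}


def turkish_lower(text):
    """Lowercase text using the Turkish mapping for I and İ."""
    return text.translate(_TURKISH_LOWER).lower()


def turkish_upper(text):
    """Uppercase text using the Turkish mapping for i and ı."""
    return text.translate(_TURKISH_UPPER).upper()


print("Hi".upper(), turkish_upper("Hi"))  # canonical vs. Turkish result
print("HI".lower(), turkish_lower("HI"))
```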
I have googled (well, DuckDuckGo'ed, actually) till I'm blue in the face, but cannot find a list of language codes of the type en-GB or fr-CA anywhere.
There are excellent resources about the components, in particular the W3C I18n page, but I was hoping for a simple alphabetical listing, fairly canonical if possible (something like this one). Cannot find.
Can anyone point me in the right direction? Many thanks!
There are several language code systems and several region code systems, as well as their combinations. As you refer to a W3C page, I presume that you are referring to the system defined in BCP 47. That system is orthogonal in the sense that codes like en-GB and fr-CA simply combine a language code and a region code. This means a very large number of possible combinations, most of which make little sense, like ab-AX, which means Abkhaz as spoken in Åland (I don’t think anyone, still less any community, speaks Abkhaz there, though it is theoretically possible of course).
So any list of language-region combinations would be just a pragmatic list of combinations that are important in some sense, or supported by some software in some special sense.
The specifications that you have found define the general principles and also the authoritative sources on different “subtags” (like primary language code and region code). For the most important parts, the official registration authority maintains the three- and two-letter ISO 639 codes for languages, and the ISO site contains the two-letter ISO 3166 codes for regions. The lists are quite readable, and I see no reason to consider using other than these primary resources, especially regarding possible changes.
There are 2 components in play here :
The language tag which is generally defined by ISO 639-1 alpha-2
The region tag which is generally defined by ISO 3166-1 alpha-2
You can mix and match languages and regions in whichever combination makes sense to you so there is no list of all possibilities.
BTW, you're effectively using a BCP47 tag, which defines the standards for each locale segment.
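As a rough illustration of the two components, the following Python sketch splits a tag into its language and region parts. It is deliberately simplified and is not a full BCP 47 parser: it assumes the shape language[-script][-region], and the function name is made up for the example.

```python
def split_tag(tag):
    """Return (language, region) for tags such as 'en-GB' or 'zh-Hant-TW'.

    region is None when the tag carries no region subtag.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = None
    for part in parts[1:]:
        if len(part) == 1:
            # A single-character subtag starts an extension; stop looking.
            break
        is_alpha2 = len(part) == 2 and part.isascii() and part.isalpha()
        is_digit3 = len(part) == 3 and part.isascii() and part.isdigit()
        if is_alpha2 or is_digit3:
            # Regions are two letters (ISO 3166-1) or three digits (UN M.49).
            region = part.upper()
            break
    return language, region
```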
Unicode maintains such a list :
http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/index.html
Even better, you can have it in an XML format (ideal to parse the list) and with also the usual writing systems used by each language :
http://unicode.org/repos/cldr/trunk/common/supplemental/supplementalData.xml
(look in /LanguageData)
One solution would be to parse this list, it would give you all of the keys needed to create the list you are looking for.
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
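If you want to build your own list from that registry, the file is plain text with records separated by lines containing `%%`. The Python sketch below parses that record format; the sample text is a hand-written excerpt in the same layout (the File-Date value is a placeholder, and most fields are omitted), and the function names are assumptions for the example.

```python
# Hand-written excerpt in the registry's record format (placeholder date).
SAMPLE = """\
File-Date: 2000-01-01
%%
Type: language
Subtag: en
Description: English
%%
Type: language
Subtag: fr
Description: French
%%
Type: region
Subtag: GB
Description: United Kingdom
%%
Type: region
Subtag: CA
Description: Canada
"""


def parse_registry(text):
    """Return a list of records; each maps a field name to a list of values."""
    records, current, last_key = [], {}, None
    for line in text.splitlines() + ["%%"]:
        if line.strip() == "%%":
            if "Type" in current:  # skips the leading File-Date record
                records.append(current)
            current, last_key = {}, None
        elif line[:1].isspace() and last_key:
            # Folded line: continuation of the previous field value.
            current[last_key][-1] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    return records


def subtags(records, kind):
    """Map subtag -> first description for records of the given Type."""
    return {
        r["Subtag"][0]: r["Description"][0]
        for r in records
        if r["Type"] == [kind] and "Subtag" in r and "Description" in r
    }
```

Calling `subtags(parse_registry(text), "language")` and `subtags(parse_registry(text), "region")` on the downloaded file gives you the two lists to combine.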
I think you can take it from here http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html
This can be found at Unicode's Common Locale Data Repository. Specifically, a JSON file of this information is available in their cldr-json repo
We have a working list that we work off of for language code/language name referencing for Localizejs. Hope that helps
List of Language Codes in YAML or JSON?
List of primary language subtags, with common region subtags for each language (based on population of language speakers in each region):
https://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html
For example, for English:
en-US (320,000,000)
en-IN (250,000,000)
en-NG (110,000,000)
en-PK (100,000,000)
en-PH (68,000,000)
en-GB (64,000,000)
(Jukka K. Korpela and tigrish give good explanations for why any combination of language + region code is valid, but it might be helpful to have a list of codes most likely to be in actual use. s-f's link has such useful information sorted by region, so it might also be helpful to have this information sorted by language.)
I've finally managed to build and run pocketsphinx (pocketsphinx_continuous). The problem I'm running into is how to improve accuracy. From what I understand, you can specify a dictionary file (-dict test.dic). So I took the default dictionary file and added some more pronunciations of the same words, for example:
pencil P EH N S AH L
pencil(2) P EH N S IH L
spaghetti S P AH G EH T IY
spaghetti(2) S P UH G EH T IY
Yet pocketsphinx still does not recognize either word at all. I know there is a JSGF file you can specify as well, but that seems more for phrases and grammar. How can I get pocketsphinx to recognize common words such as pencil and spaghetti?
thanks
-Mike
With something like this, you can't be certain, but I can offer the following suggestions:
Perhaps the language model somehow has low probabilities for "spaghetti" and "pencil". As you suggested, you could use a JSGF to test out how it does for recognition if it doesn't use the N-gram models, but instead does a simple grammar (give it like twenty words, including spaghetti and pencil). This way you can see if it is perhaps the language model which makes it difficult to recognize these words, and it can do okay if it considers all the words to have equal probability.
Perhaps you simply pronounce these words poorly, even with the alternative dictionary entries. Try either A. Testing other peoples' voices, or B. Adapting the acoustic model to your voice (see http://cmusphinx.sourceforge.net/wiki/tutorialam)
Also, what does it recognize them as when it fails? If possible, remove from the dictionary the words it mistakes them for.
Again, for overall accuracy, only three things are really going to help you: restricting the grammar, adapting the acoustic model, and perhaps getting higher-quality recording input.
To improve accuracy you may want to try adapting the acoustic model to your voice.
http://cmusphinx.sourceforge.net/wiki/tutorialadapt
To learn how to add new words: http://ghatage.com/tech/2012/12/13/Make-Pocketsphinx-recognize-new-words/
Make sure you put a tab (not a space) after the word and before the start of the pronunciation.
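A quick way to check an edited dictionary for this is a small script. The Python sketch below flags entries that have no tab after the word and groups alternate pronunciations such as `pencil(2)` under the base word. Whether the tab is strictly required is the claim made above, not something this script verifies; the function name is made up for the example.

```python
import re

# 'pencil' or 'pencil(2)': a word with an optional alternate number.
ENTRY = re.compile(r"^(?P<word>[^\s()]+)(?:\((?P<alt>\d+)\))?$")


def parse_dictionary(lines):
    """Return (pronunciations, problems) for an iterable of .dic lines."""
    pronunciations = {}
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        head, sep, phones = line.partition("\t")
        if not sep:
            problems.append((number, "no tab between word and pronunciation"))
            head, _, phones = line.partition(" ")
        match = ENTRY.match(head.strip())
        if not match or not phones.split():
            problems.append((number, "malformed entry"))
            continue
        pronunciations.setdefault(match["word"], []).append(phones.split())
    return pronunciations, problems
```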
Maybe the problem is with Pocketsphinx. I too was not getting good results with Pocketsphinx, but I was getting very good accuracy with Sphinx4 (for a US speaker with a noise-cancelling microphone). Therefore I did a comparison between the two using the same audio recordings. For pocketsphinx I used pocketsphinx_batch with the WSJ audio model and a small vocabulary language model and dictionary (created online with the CMU Cambridge language modelling toolkit). For Sphinx4 I wrote a small Java program using the Sphinx4 library. The result was that Sphinx4 was much more accurate. All the gory details are at http://www.jaivox.com/pocketsphinx.html.
To achieve good accuracy with pocketsphinx:
Important! Check that your mic, audio device, and audio files support 16 kHz, because the general model is trained on 16 kHz acoustic examples.
You should create your own limited dictionary. You cannot use the full cmusphinx-voxforge-de.dic, because accuracy drops dramatically with it.
You should create your own language model.
You can search for Jasper project on GitLab to see how it's implemented.
Also, please check the documentation
This is on the CMUSphinx website
"There are various phonesets to represent phones, such as IPA or SAMPA. CMUSphinx does not yet require you to use any well-known phoneset, moreover, it prefers to use letter-only phone names without special symbols. This requirement simplifies some processing algorithms, for example, you can create files with phone names as part of the filenames without any violating of the OS filename requirements.
A dictionary should contain all the words you are interested in, otherwise the recognizer will not be able to recognize them. However, it is not sufficient to have the words in the dictionary. The recognizer looks for a word in both the dictionary and the language model. Without the language model, a word will not be recognized, even if it is present in the dictionary."
https://cmusphinx.github.io/wiki/tutorialdict/
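Following that quote, a simple sanity check is to compare the words in your dictionary with the unigrams in your language model. The Python sketch below assumes the language model is a text file in ARPA format; the sample model and its probabilities are made up purely for illustration, and the function names are hypothetical.

```python
# Made-up ARPA-format excerpt; the probabilities are illustrative only.
SAMPLE_ARPA = r"""
\data\
ngram 1=4

\1-grams:
-1.0 <s> -0.3
-1.0 </s>
-0.7 pencil -0.3
-0.7 paper -0.3

\end\
"""


def arpa_unigrams(text):
    """Return the set of words listed in the \\1-grams: section."""
    words = set()
    in_unigrams = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = line == "\\1-grams:"
            continue
        fields = line.split()
        if in_unigrams and len(fields) >= 2:
            words.add(fields[1])  # layout: log-probability, word, [backoff]
    return words


def missing_from_language_model(dictionary_words, arpa_text):
    """Words that are in the dictionary but absent from the language model."""
    return set(dictionary_words) - arpa_unigrams(arpa_text)
```

With the sample above, a dictionary containing "pencil" and "spaghetti" would report "spaghetti" as missing from the language model.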
I was discussing this with some friends and we began to wonder about this. Could someone gain access to URLs or other values that are contained in the actual objective-c code after they purchase your app?
Our initial feeling was no, but I wondered if anyone out there had definitive knowledge one way or the other?
I do know that .plist files are readily available.
Examples could be things like:
-URL values kept in a string
-API key and secret values
Yes, strings and information are easily extractable from compiled applications using the strings tool (see here), and it's actually even pretty easy to extract class information using class-dump-x (check here).
Just some food for thought.
Edit: one easy, albeit insecure, way of keeping your secret information hidden is obfuscating it, or cutting it up into small pieces.
The following code:
NSString *string = @"Hello, World!";
will produce "Hello, World!" using the strings tool.
Writing your code like this:
NSString *string = @"H";
string = [string stringByAppendingString:@"el"];
string = [string stringByAppendingString:@"lo"];
...
will show the characters typed, but not necessarily in order.
Again: easy to do, but not very secure.
When you purchase an app it is saved on your hard disk as "FooBar.ipa"; that file is actually in Zip format. You can unzip it and inspect the contents, including searching for strings in the executable. Try it! Constant values in your code are not compressed, encrypted, or scrambled in any way.
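To see why constants are so easy to find, here is a rough Python approximation of what the strings tool does: scan the binary for runs of printable ASCII characters. The function name is made up, the byte blob in the usage example is invented, and the real tool has more options than this sketch.

```python
import re


def printable_strings(data, min_length=4):
    """Return runs of at least min_length printable ASCII bytes in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Invented stand-in for an executable: binary noise around two literals.
blob = (b"\x00\x01" + b"https://api.example.com/v1" +
        b"\x00\xff\x02ab\x00" + b"SECRET_KEY_123" + b"\x90")
print(printable_strings(blob))
```

Both literals come out intact, which is exactly what happens to a URL or API secret compiled into an app.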
I know this has already been answered, but I want to give my own suggestion too.
Again, please remember that no obfuscation technique is 100% safe, so none of them is ideal, but they are often "good enough" (depending on what you want to obfuscate). This means that a determined cracker will be able to read your strings anyway, but these techniques may stop the "casual cracker".
My other suggestion is to "crypt" the strings with a simple XOR. This is incredibly fast, and does not require any authorization if you are selling the app through the App Store (it does not fall into the categories of algorithms that require authorization for exporting them).
There are many snippets around for doing a XOR in Cocoa, see for example: http://iphonedevsdk.com/forum/iphone-sdk-development/11352-doing-an-xor-on-a-string.html
The key you use could be any string, be it a meaningless sequence of characters/bytes or something meaningful to confuse readers (e.g. use name of methods, such as "stringWithContentsOfFile:usedEncoding:error:").
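The XOR idea itself is only a few lines. The sketch below is in Python rather than Cocoa, with a made-up function name, secret, and key; applying the same function twice with the same key restores the original bytes. It is obfuscation only, not encryption: anyone who finds the key in the binary can undo it.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; calling it again reverses it."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical values for illustration.
key = b"stringWithContentsOfFile:usedEncoding:error:"
secret = b"my-api-secret"

obfuscated = xor_bytes(secret, key)    # what you would embed in the app
restored = xor_bytes(obfuscated, key)  # what you compute at runtime
```

In practice you would generate the obfuscated bytes at build time and ship only those plus the key, so the plain secret never appears as a literal.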