I used to prefix instance variable names with an underscore to distinguish them from local variables.
I happened to see the "Google Objective-C Style Guide" and found that it suggests using trailing underscores, but without any detailed explanation of why.
I know it is a coding style issue, but I wonder: are there any advantages to using trailing underscores?
Related: Question about #synthesize (see the blockquotes at the bottom of the answer)
The advantage is: _var is a convention (C99, Cocoa guidelines) for private, but it is so common (even on Apple templates) that Apple uses double underscore __var. Google solves it by using trailing underscore instead.
I updated the other answer with a couple more reasons...
Leading underscore is also discouraged in C++ (see
What are the rules about using an underscore in a C++ identifier?) and Core Data properties
(try adding a leading underscore in the model and you'll get "Name
must begin with a letter").
Trailing underscore seems the sensible choice, but if you like
something else, collisions are unlikely to happen, and if they do,
you'll get a warning from the compiler.
The underscore prefix is commonly used in system / SDK libraries. Using it yourself may therefore cause an instance variable to collide with one in a superclass, and a bug like this is not easy to find.
Take a look at any class provided by the system, like NSView, and you will see this.
Apple uses single leading underscore for ivars so that their variable names won't collide with ours. When you name your ivars, use anything but a single leading underscore.
There is no advantage as such to using trailing underscores. We follow a coding style because it makes code easier to read. The underscore, in this case, helps us differentiate between iVars and local variables. As far as I know, _iVar is more prominent than iVar_.
Almost all programming languages are based on English, which we write and read from left to right.
A leading underscore makes it easier to find iVars and to recognize that a variable is an iVar.
Some languages/platforms like Java, JavaScript, Windows, Dotnet, KDE etc. use UTF-16. Some others prefer UTF-8.
What is the reason that no language/platform uses BOCU-1? What is the rationale for JEP 254 and the JEP 254 equivalent for Dotnet?
Is the reason that BOCU-1 is patented? Are there any technical reasons also?
Edit
My question is not about Java specifically. By JEP 254, I mean compact UTF-16 as mentioned in that proposal. My question is: since BOCU-1 is compact for almost any Unicode string, why doesn't any language/platform use it internally instead of UTF-16 or UTF-8? Such a usage would improve cache performance for any string, and not just ASCII or Latin-1.
Such a usage may also help in non-Latin programming language support in formats like The Language Server Index Format (LSIF).
What is the reason that no language/platform uses BOCU-1?
That question is far too broad in scope for Stack Overflow, and a concise answer is impossible.
However, in the specific case of Java note that someone raised the possibility of Java adopting BOCU-1 as an RFE (Request For Enhancement) in 2002. See JDK-4787935 (str) Reducing the memory footprint for Strings.
That bug was closed with a resolution of "Won't Fix" ten years later:
"Although this is a very interesting proposal, it is highly unlikely that BOCU or any other multi-byte encoding for internal use would be adopted. Furthermore, this comes down to a space-time tradeoff with unclear long-term consequences. Given the length of time this proposal has lingered, it seems appropriate to close it as will not fix".
What is the rationale for JEP 254...?
There is a section of JEP 254 titled "Motivation" which explains that, and in particular it states "most String objects contain only Latin-1 characters". However, if that does not satisfy you, raise a separate question.
Ensure that it is on topic for Stack Overflow by reviewing What topics can I ask about here? first. Two of the people who reviewed JEP 254 (Aleksey Shipilev and Brian Goetz) respond here on SO, so you may get an authoritative answer.
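To make the Latin-1 observation concrete, here is a rough Python sketch of the storage choice that JEP 254 describes: one byte per character when every character fits in Latin-1, two bytes per UTF-16 code unit otherwise. The function name `compact_size` is made up for illustration, and this only models the idea; it is not how the JDK is implemented internally.

```python
def compact_size(s: str) -> int:
    """Bytes needed under a Latin-1-or-UTF-16 scheme (illustrative only)."""
    try:
        # One byte per character if everything fits in Latin-1.
        return len(s.encode("latin-1"))
    except UnicodeEncodeError:
        # Otherwise fall back to UTF-16 (2 bytes per BMP character).
        return len(s.encode("utf-16-le"))


for text in ["hello", "Montréal", "łódź"]:
    # Columns: text, compact size, plain UTF-16 size.
    print(text, compact_size(text), len(text.encode("utf-16-le")))
```

Strings that are pure ASCII or Latin-1 halve in size, while a string with even one character outside Latin-1 gains nothing, which is the gap the question is asking about.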
What is the rationale for ... JEP 254 equivalent for Dotnet?
Again, raise this as a separate SO question.
Is the reason that BOCU-1 is patented?
That question is specifically off topic here: "Legal questions, including questions about copyright or licensing, are off-topic for Stack Overflow", though Wikipedia notes "BOCU-1 is the only Unicode compression scheme described on the Unicode Web site that is known to be encumbered with intellectual property restrictions".
Are there any technical reasons also?
A very important non-technical reason is that the HTML5 specification explicitly forbids the use of BOCU-1!...
Avoid these encodings
The HTML5 specification calls out a number of encodings that you should avoid...
Documents must also not use CESU-8, UTF-7, BOCU-1, or SCSU encodings, since they... were never intended for Web content and the HTML5 specification forbids browsers from recognising them.
Of course that invites the question of why HTML 5 forbids the use of BOCU-1, and the only technical reason I can find for that is that this Mozilla documentation on HTML's <meta> element states:
Authors must not use CESU-8, UTF-7, BOCU-1 and/or SCSU as cross-site scripting attacks with these encodings have been demonstrated.
See this GitHub link for more details on the XSS vulnerability with BOCU-1.
Also note that in line with the HTML5 specification, all the major browsers specifically do not support BOCU-1.
I'm asking again because this question got put on hold:
Is there a file/data representation of the Unicode 9.0 standard?
There is a website, http://unicode.org/, that lists the standard,
and there is a page, http://www.unicode.org/charts/, that has PDFs of all the scripts. For example, 1E900 to 1E95F is reserved for Adlam.
I'm hoping there is some sort of unicode.metadata file that can be read in and parsed so that the following queries can be made:
What is the code range for Osmanya?
How many characters are there for Brahmi?
Unicode is a large standard. The JDK (if it is the Java platform you’re interested in) does not provide APIs that cover ‘all of Unicode’.
I believe the most popular library out there is ICU.
If however the queries you mentioned are all that you’re interested in, then perhaps the java.lang.Character API is all you need.
java.lang.Character.UnicodeBlock.OSMANYA (though I see no method to obtain the codepoint range)
java.lang.Character.UnicodeBlock.BRAHMI
You can for example query whether a codepoint is within the range for Osmanya, by testing the value of (Character$UnicodeBlock/of 1234) for equivalence with Character$UnicodeBlock/OSMANYA.
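For comparison, the two queries from the question can be roughly approximated in Python with the standard `unicodedata` module. This is a sketch under assumptions, not the Java API discussed above: it matches on character names (for example, names starting with "OSMANYA ") rather than on the official block property, the helper name `script_range` is made up, and the results depend on the Unicode version bundled with your Python. The authoritative block ranges are in the `Blocks.txt` file of the Unicode Character Database.

```python
import unicodedata


def script_range(prefix: str):
    """Return (first, last, count) for code points whose name starts with prefix."""
    hits = [
        cp
        for cp in range(0x110000)
        # Unnamed or unassigned code points fall back to "" and never match.
        if unicodedata.name(chr(cp), "").startswith(prefix)
    ]
    if not hits:
        return None
    return hits[0], hits[-1], len(hits)


first, last, count = script_range("OSMANYA ")
print(f"Osmanya: U+{first:04X}..U+{last:04X}, {count} named characters")
print("Brahmi:", script_range("BRAHMI "))
```

Note that the range found this way covers only assigned, named characters, so it can be narrower than the block reserved for the script.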
Whenever I use Random.nextString(int), I get a String of question marks (??????). I've tried creating an instance of Random and using a seed, but nothing works. I am using Scala 2.10.5. Anyone know what the issue is?
In most terminals, when a character is not displayable (there are a lot of existing characters, and you cannot remotely hope to have them all in the font used by your terminal), it will print a question mark instead.
Because the string is random, the vast majority of its characters are very likely to be non-displayable (and thus rendered as a sequence of question marks).
So the strings are valid, they are indeed random (not just a series of question marks), and this is all just a rendering issue. You can easily check that their content really is different each time by displaying the character codes (something like println(myString.map(_.toInt)) will do).
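The same check can be sketched in Python. This is an analogue for illustration, not Scala's `Random.nextString` implementation: the helper `random_string` is hypothetical and simply picks random code points from the Basic Multilingual Plane while skipping surrogates. Printing the numeric code points sidesteps the terminal font entirely.

```python
import random


def random_string(length: int, seed: int) -> str:
    """Random BMP characters, skipping the surrogate range (illustrative)."""
    rng = random.Random(seed)
    chars = []
    while len(chars) < length:
        cp = rng.randrange(1, 0xFFFF)
        if not 0xD800 <= cp <= 0xDFFF:  # surrogates are not valid on their own
            chars.append(chr(cp))
    return "".join(chars)


a = random_string(6, seed=1)
b = random_string(6, seed=2)
# The numbers differ even if both strings would render as "??????".
print([ord(c) for c in a])
print([ord(c) for c in b])
```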
I am looking for a mapping table, a Perl module, or anything else that makes it possible to map characters to a URL-safe version that is also readable.
I need to build URLs without any special characters. The base words are city names in their native language, which means they can contain special characters from that language.
For example, when I have something like the Polish city name 'łódź', I need to get a readable version like 'lodz'.
The major browsers show and accept non-ASCII characters in the URL bar even if they need to be encoded during transmission.
For example,
http://.../city/Montr%C3%A9al
will appear as
http://.../city/Montréal
in the browser's URL bar.
But if you want to convert to a subset of ASCII, you'd start by using Text::Unidecode's unidecode. Then you gotta decide what to do with the characters that must be escaped in URLs.
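As a rough illustration of both options, here is a Python sketch (swapped in for Perl, and far less complete than Text::Unidecode's tables). The helper name `slugify` and the small `EXTRA` mapping are assumptions made up for this example: letters such as 'ł' have no Unicode decomposition, so they need an explicit entry, and anything the table misses is simply dropped.

```python
import re
import unicodedata
from urllib.parse import quote

# Letters that have no Unicode decomposition need an explicit mapping.
# This table is a tiny illustrative sample, not a complete list.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss"})


def slugify(name: str) -> str:
    """Hypothetical helper: reduce a city name to lowercase ASCII for a URL."""
    text = name.translate(EXTRA)
    # Split accented letters into base letter + combining mark, then drop marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Anything still outside ASCII is dropped; other runs become hyphens.
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


print(slugify("łódź"))    # lodz
print(quote("Montréal"))  # Montr%C3%A9al (percent-encoded, as sent on the wire)
```

Collapsing every non-alphanumeric run to a hyphen is one possible answer to the "what to do with characters that must be escaped" question; percent-encoding them instead is the other.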
I am writing an article on Unicode and discussing the advantages of this encoding scheme over outdated methods like ASCII.
As part of my research I am looking for a reference that lists the languages that can be fully represented using only the characters supported by ASCII. I haven't had much luck tracking it down with Google, and I thought I'd tap the collective knowledge of SO to see if anyone had a reasonable list.
Key points:
All languages listed must be able to be completely represented using the character set available in ASCII.
I know this won't be comprehensive, but I am mostly interested in the most common written languages.
There are no natural languages that I know of that can be fully represented in ASCII. Even American English, the language for which ASCII was invented, doesn't work: for one, there are a lot of foreign words that have been integrated into the American English language that cannot be represented in ASCII, like resumé, naïve or a word that probably every programmer uses regularly, schönfinkeln.
And two, ASCII is missing pretty much all typographic characters like “quotation marks”, dashes of various lengths (– and —), ellipses (…), thin and wide spaces and so on, all of which are used in American English.
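A quick way to see this for any given word is to list the characters that fall outside the 7-bit ASCII range. The helper name `non_ascii_chars` below is made up for this sketch.

```python
def non_ascii_chars(text: str) -> list:
    """Return the characters in text that fall outside 7-bit ASCII."""
    return [c for c in text if ord(c) > 0x7F]


for word in ["resume", "resumé", "naïve", "schönfinkeln", "“quoted”"]:
    # An empty list means the word survives a round trip through ASCII.
    print(word, non_ascii_chars(word))
```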
IIRC from my Latin classes, the macrons in Latin are later additions by people studying meters in Latin poetry; they wouldn't have been used in every-day writing. So you've got Latin.
Given loan words, I don't think there are any such languages. Even ugly Americans know the difference between "resume" and "résumé".
I assume you mean natural languages and only 7 bit ASCII?
In that case the list is quite small. Mostly English.
Some constructed languages such as Interlingua and Ido are designed to use only ASCII characters. ‘Real’ languages in everyday use tend to use characters outside the ASCII range, at the very least for loanwords.
Not a widely used language, but Rotokas can be written using only ASCII letters. See http://en.wikipedia.org/wiki/Rotokas_alphabet