Why are bools sometimes referred to as "flags"?

Why are bools sometimes referred to as "flags"? Is it just a metaphor or is there some historical reason behind it?

Flags are an ancient way to convey information. A flag, if we ignore lowering it to half-mast, has only two states - raised or not raised. E.g., consider a white flag - raising it means surrendering. Not raising it, the default state, means that you are not surrendering.
A boolean variable, like a flag, only has two states - true and false.

Flag can be used as a noun and as a verb: to flag can mean to note, mark, or signal something. (Maybe this is derived from the use of nautical flags?)
An early (but probably not the first) use of the term flag in computer history can be found in the IBM 1620 from 1959 (my emphasis):
Memory was accessed two decimal digits at the same time (even-odd digit pair for numeric data or one alphameric character for text data). Each decimal digit was 6 bits, composed of an odd parity Check bit, a Flag bit, and four BCD bits for the value of the digit in the following format:
C F 8 4 2 1
The Flag bit had several uses:
In the least significant digit it was set to indicate a negative number (signed magnitude).
It was set to mark the most significant digit of a number (wordmark).
In the least significant digit of 5-digit addresses it was set for indirect addressing (an option on the 1620 I, standard on the 1620 II). Multi-level indirection could be used (you could even put the machine in an infinite indirect addressing loop).
In the middle 3 digits of 5-digit addresses (on the 1620 II) they were set to select one of 7 index registers.
So a bit used to mark or indicate something was called a flag bit.
Of course the use of "flag" in flag fields or status registers is then quite natural.
But once the association between flag and bit has been established, it is also understandable that their use can become interchangeable. And of course this also holds for boolean variables.
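To make the flag-field idea concrete, here is a minimal sketch (Python, with made-up flag names) of packing several boolean flags into one status word, the way hardware status registers and flag fields do; each bit is a one-bit boolean that can be raised, lowered, and tested:
FLAG_NEGATIVE = 0x01   # hypothetical flag bits, one per binary position
FLAG_ZERO     = 0x02
FLAG_CARRY    = 0x04

status = 0
status |= FLAG_ZERO                   # "raise" a flag
is_zero = bool(status & FLAG_ZERO)    # test a flag: it behaves as a boolean
status &= ~FLAG_CARRY                 # "lower" a flag
print(is_zero)  # True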
PS: The same question was already asked, but unfortunately without an answer.

Related

Is the file hashing/checksum value case insensitive?

My question is only about file hashing rather than hashing functions in general. My assumption is that the value of a file checksum/hash is case insensitive. My concern is that I cannot find any online documentation to confirm that. I only have the following two points to support my claim.
This link contains some file hash values. None of them contains any capital letter. https://www.virtualbox.org/download/hashes/6.1.2/SHA256SUMS
When I use Powershell Get-FileHash cmdlet, all returns are capitals. https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/get-filehash?view=powershell-7
Can anyone help me confirm my assumption, and provide some documentation that applies to files on Windows as well as on Linux?
Hashes and checksums are often presented in hexadecimal notation. Although it is common to use upper case A-F instead of lower case a-f, it does not make any difference.
As for a reference, the question is so basic that it's hard to find a solid one. One source is the ISO/IEC 9899 standard for the C programming language:
A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and the letters a (or A) through f (or F) with values 10 through 15 respectively.
In some use cases, such as CSS, lower case might be preferred, as it is more pleasant to read among other lower case characters. .NET's Int32.ToString supports standard numeric format specifiers: "x" for lower case, "X" for upper case.
In System.Convert, there's ToInt32, which converts string representations in a given base into 32-bit integers. Let's see how the hex value AA is converted to decimal with different letter cases:
[convert]::toint32("aa", 16)
170
[convert]::toint32("AA", 16)
170
[convert]::toint32("aA", 16)
170
[convert]::toint32("Aa", 16)
170
Every letter case combination represents the same decimal value, 170. Don't try this on hashes though, as those are usually larger than 32 bit integers.
My question is only about file hashing rather than hashing functions in general. My assumption is that the value of a file checksum/hash is case insensitive.
Hashes are byte sequences; they don't have case at all.
Hashes are generally encoded as hexadecimal for display, for which the six "letter" digits (a to f) can be either case. That's mostly a style issue, though I've known systems which objected when given the "wrong" case (some would only accept lowercase, others only uppercase).
Also beware that it's not unheard of to store or show hashes as Base64, where case is significant. Without knowing why you're asking (e.g. is it idle musing, or do you have an actual use case), it's hard to answer completely categorically.
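For illustration, here is a minimal sketch (Python, hypothetical file usage) of the usual way to handle this in practice: compute the digest, which hashlib emits as lowercase hex, and normalize case before comparing against a published value:
import hashlib

def sha256_hex(path):
    # Hash the file in chunks; hexdigest() always returns lowercase hex.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def hashes_match(a, b):
    # Case-insensitive comparison of two hex digests.
    return a.strip().lower() == b.strip().lower()

print(hashes_match("aabbcc", "AABBCC"))  # True: same bytes, different case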

Can you create a programming language with just one symbol?

Can you create a programming language with just one symbol, like brainfuck?
Yes, it has been done before - see Unary.
Basically it's a strange encoding of brainfuck. Treat each BF command as a number. The whole program is then also a number, created by concatenating the commands together (with an extra 1 at the front, for unambiguous decoding). Convert that number to the unary numeral system (i.e., the number of digits is your number) and you're done.
Note, however, that programs in this language tend to be very large - a cat program implemented in Unary is (according to the information on that page) 56623 characters long.
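Here is a small sketch (Python) of the encoding just described: map each brainfuck command to three bits, prepend a 1, and read the result as a binary number; that number is the length of the Unary program. It reproduces the 56623 figure for the cat program ,[.,]:
# Command-to-bits table used by Unary (eight commands, three bits each).
BITS = {'>': '000', '<': '001', '+': '010', '-': '011',
        '.': '100', ',': '101', '[': '110', ']': '111'}

def unary_length(bf_program):
    # Prepend a 1 for unambiguous decoding, then read the bits as one number.
    binary = '1' + ''.join(BITS[c] for c in bf_program if c in BITS)
    return int(binary, 2)

print(unary_length(",[.,]"))  # 56623 symbols for the cat program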
MGIFOS, Lenguage and Ellipsis follow the same principle. Note that e.g. a hello world in MGIFOS has more characters than there are particles in the observable universe.
Then Len(language, encoding) extends this principle to any language.
They are called OISC (One Instruction Set Computer).
The first one I know of is Melzak's Arithmetic Machine (1961), with the instruction:
z = x-y or jump if y>x
You also have Zero Instruction Set Computers, which are more like neural nets.
Not forgetting the amazing FRACTRAN of Conway & Guy (1996), which has no instructions at all but interprets a series of fractions (the program) in a Turing-complete way.
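Since FRACTRAN is so small, an interpreter fits in a few lines; here is a sketch (Python) of the rule: repeatedly multiply the current integer by the first fraction in the program that gives an integer result, and halt when none does. The usage example is the classic addition program, a single fraction 3/2:
from fractions import Fraction

def fractran(program, n, max_steps=10000):
    # program: list of (numerator, denominator) pairs; n: starting integer.
    fracs = [Fraction(p, q) for p, q in program]
    for _ in range(max_steps):
        for f in fracs:
            m = n * f
            if m.denominator == 1:   # first fraction giving an integer wins
                n = int(m)
                break
        else:
            break                    # no fraction applies: the program halts
    return n

# Addition: starting from 2**a * 3**b, the program [3/2] halts at 3**(a+b).
print(fractran([(3, 2)], 2**3 * 3**4))  # 2187 == 3**7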

dart, total available string characters?

I'm not familiar with character sets, or with whether languages pick them up from their environment or have them baked into the language itself. I want to make a simple number system in Dart with the largest possible base. Like hex has 0-9a-f, I would give every single character a value in some specified ascending order, with lower case and upper case having different values, to get the largest possible base for my number system. I want to do this so I can send numbers as strings with as few characters as possible. So my question is: does Dart have a standard baked-in character set that I can be certain will exist in every environment it runs in?
You should be able to use every code point, even if no concrete character is assigned to it.
That would only be a problem when you try to display the character.
Some code points are control characters with special meaning (like 0x0000), which you should avoid.
More info here: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
If you want to transport the result over the internet using text protocols you may be limited to ASCII. In this case I suggest Base64 encoding.
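To illustrate the trade-off, here is a rough sketch (in Python rather than Dart, just to show the idea) of encoding a large integer with an arbitrary digit alphabet versus the Base64 route suggested above; the alphabet string is a placeholder for whatever character set you settle on:
import base64

def int_to_base(n, alphabet):
    # Encode a non-negative integer using the given alphabet as the digit set.
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return ''.join(reversed(digits))

def int_to_base64(n):
    # ASCII-safe alternative: Base64 over the big-endian bytes of the number.
    raw = n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")
    return base64.b64encode(raw).decode("ascii")

n = 2**128
print(int_to_base(n, "0123456789abcdef"))  # 33 hex digits
print(int_to_base64(n))                    # 24 Base64 characters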

BCPL octal numerical constants

I've been digging into the history of BCPL due to a question I was asked about the reasoning behind using the prefix "0x" for the representation of hexadecimal numbers.
In my search I stumbled upon a really good explanation of the history behind this token. (Why are hexadecimal numbers prefixed with 0x?)
From this post, however, another question arose:
For octal constants, did BCPL use 8 <digit> (As per specs: http://cm.bell-labs.com/cm/cs/who/dmr/bcpl.pdf) or did it use #<digit> (As per http://rabbit.eng.miami.edu/info/bcpl_reference_manual.pdf) or were both of these syntaxes valid in different implementations of the language?
I've also been able to find a second answer here that used the # syntax, which further piqued my interest in the subject. (Why are leading zeroes used to represent octal numbers?)
Any historical insights are greatly appreciated.
There were many slight variations on syntax in BCPL.
For example, while the one we used had 16-bit cells (so that x!y gave you the 16-bit word at word address x + y, a word address being half of the byte address), we also had a need to access byte addresses and byte values (since we were primarily creating OS and control software on the 6809, a byte-addressable CPU).
Hence in addition to:
x!y - get word from byte address (x + y) * 2
we also had
x!%y - get byte from byte address (x * 2) + y
x%!y - get word from byte address x + (y * 2)
x%%y - get byte from byte address x + y
I'm pretty certain they were implementation-specific as I never saw them anywhere else. And BCPL was around long before language standards were as important as they are today.
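For readers unfamiliar with word versus byte addressing, here is an illustrative model (Python, assuming big-endian 16-bit words; the real implementation may have packed words differently) of the address arithmetic those four operators performed:
mem = bytearray(64)   # toy byte-addressable memory

def word_at(byte_addr):
    return (mem[byte_addr] << 8) | mem[byte_addr + 1]

def bang(x, y):        # x!y  : word at byte address (x + y) * 2
    return word_at((x + y) * 2)

def bang_pct(x, y):    # x!%y : byte at byte address (x * 2) + y
    return mem[(x * 2) + y]

def pct_bang(x, y):    # x%!y : word at byte address x + (y * 2)
    return word_at(x + (y * 2))

def pct_pct(x, y):     # x%%y : byte at byte address x + y
    return mem[x + y]

mem[10:12] = b'\x12\x34'
print(hex(bang(5, 0)))  # 0x1234: word address 5 is byte address 10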
The canonical language specification would have been the earlier one from Richards since he wrote the language (and your second document is for the Essex BCPL implementation about a decade later). But keep in mind that Project MAC was the earliest iteration - there were plenty of advancements after that as well.
For example, there's a 2013 revision of the BCPL User Guide (see Martin's home page) which specifies #b, #o and #x as prefixes for various non-decimal bases.

Encoding that minimizes misreading / mistyping / misspeaking?

Let's say you have a system in which a fairly long key value can be accurately communicated to a user on-screen, via email or via paper; but the user needs to be able to communicate the key back to you accurately by reading it over the phone, or by reading it and typing it back into some other interface.
What is a "good" way to encode the key to make reading / hearing / typing it easy & accurate?
This could be an invoice number, a document ID, a transaction ID or some other abstract value. Let's say for the sake of this discussion the underlying key value is a big number, say 40 digits in base 10.
Some thoughts:
Shorter keys are generally better
a 40-digit base 10 value may not fit in the space given, and is easy to get lost in the middle of
the same value could be represented in base 16 in 33-34 digits
the same value could be represented in base 36 in 26 digits
the same value could be represented in base 64 in 22-23 digits
Characters that can't be visually confused with each other are better
e.g. an encoding that includes both O (oh) and 0 (zero), or S (ess) and 5 (five), could be bad
This issue depends on the font / face used to display the key, which you may be able to control in some cases (like printing on paper) but can't control in others (like web pages and email).
Also depends on whether you can control the exclusive use of upper and / or lower case -- e.g. capital D (dee) may look like O (oh) but lower case d (dee) would not; while lower case l (ell) looks like a 1 (one) while capital L (ell) would not. (With exceptions for especially exotic fonts / faces).
Characters that can't be verbally / aurally confused with each other are better
a (ay) 8 (eight)
B (bee) C (cee) D (dee) E (ee) g (gee) p (pee) t (tee) v (vee) z (zee) 3 (three)
This issue depends on the audio quality of the end-to-end channel -- bigger challenge if the expected user base could have a speech impediment, or may have to speak through a gas mask, or the communication channel could include CB radios or choppy VOIP phone systems.
Adding a check digit or two would detect errors but not help resolve errors.
An alpha - bravo - charlie - delta type dialog can help with hearing errors, but not reading errors.
Possible choices of encoding:
Base 64 -- compact, but too many hard-to-verbalize characters (underscore, dash etc.)
Base 34 -- 0-9 and A-Z but with O (oh) and I (eye) left out as the easiest to confuse with digits
Base 32 -- same as base 34 but leave out the 0 (zero) and 1 (one) as well
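To make the idea concrete, here is a rough sketch (Python; the alphabet is just an illustration, not a standard) of the base-34 option: digits plus letters with O and I removed because they are the easiest to confuse with 0 and 1:
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"   # 34 symbols, no I or O

def encode_key(n):
    # Encode a non-negative integer key with the restricted alphabet.
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 34)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

print(len(encode_key(10**40 - 1)))  # 27: a 40-digit decimal key in 27 symbols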
Is there a generally recognized encoding that is a reasonable solution for this scenario?
When I first heard of it, I liked the article A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable. It encodes data as a sequence of consonants and vowels. It's tied to the English language, though. (For example, in German, f and v sound the same, so they should not both be used.) But I like the general idea.
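As a sketch of how the proquint scheme works (Python; the alphabets and the 127.0.0.1 example are taken from my reading of the proposal, so treat the details as an approximation): every 16-bit chunk becomes a five-letter consonant-vowel-consonant-vowel-consonant syllable, which is easy to read aloud:
CONSONANTS = "bdfghjklmnprstvz"   # 16 consonants -> 4 bits each
VOWELS = "aiou"                   # 4 vowels      -> 2 bits each

def proquint16(word):
    # Encode one 16-bit value as a 5-letter proquint (CVCVC).
    c3 = word & 0xF; word >>= 4
    v2 = word & 0x3; word >>= 2
    c2 = word & 0xF; word >>= 4
    v1 = word & 0x3; word >>= 2
    c1 = word & 0xF
    return CONSONANTS[c1] + VOWELS[v1] + CONSONANTS[c2] + VOWELS[v2] + CONSONANTS[c3]

def encode(n, bits=64):
    # Dash-separated proquints, 16 bits per group, most significant first.
    groups = [(n >> shift) & 0xFFFF for shift in range(bits - 16, -1, -16)]
    return "-".join(proquint16(g) for g in groups)

print(encode(0x7F000001, bits=32))  # lusab-babad, i.e. the IPv4 address 127.0.0.1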