How to construct a DFA that recognizes the set of bit strings that begin with two D's? - discrete-mathematics

I need help with this problem because I've just begun studying finite state automata and couldn't work it out myself.
Does the question mean that I have to give the input 1101 1101, and if so, how do I show which state to go to if the bit string does not start with two D's?

Assuming D means the hexadecimal digit for the decimal number 13, then yes: the binary representation of that number is 1101, and the set of strings beginning with two Ds (meaning the hexadecimal representation of the binary string begins with two Ds) is the set of strings beginning with 11011101. To accept such strings with a DFA you need ten states:
an initial state
one state for each symbol in your string
a dead state
The transitions step from the initial state through the states corresponding to the symbols of your string, in order, consuming one symbol per transition. Any symbol that is out of place leads to the last, dead state, which loops to itself on both inputs. The only accepting state is the one corresponding to the last symbol of your string, and it also loops to itself (since any string formed by appending to this prefix is also a string in your language). The transition function is sketched below.
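As a concrete illustration, here is a minimal sketch of that ten-state DFA in Go, with my own (assumed) state numbering: states 0-7 record how much of the prefix 11011101 has been matched so far, state 8 is the single accepting state, and state 9 is the dead state.

package main

import "fmt"

// acceptsPrefix11011101 simulates the ten-state DFA described above.
func acceptsPrefix11011101(input string) bool {
    const dead = 9
    const pattern = "11011101"
    state := 0
    for i := 0; i < len(input); i++ {
        c := input[i]
        if c != '0' && c != '1' {
            return false // not a bit string at all
        }
        switch {
        case state == dead:
            // dead state loops to itself on both symbols
        case state == len(pattern):
            // accepting state loops to itself on both symbols
        case c == pattern[state]:
            state++ // matched the next symbol of the prefix
        default:
            state = dead // wrong symbol before the prefix was complete
        }
    }
    return state == len(pattern)
}

func main() {
    fmt.Println(acceptsPrefix11011101("11011101"))     // true
    fmt.Println(acceptsPrefix11011101("110111010010")) // true
    fmt.Println(acceptsPrefix11011101("1101"))         // false (prefix incomplete)
    fmt.Println(acceptsPrefix11011101("11001101"))     // false (wrong fourth bit)
}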

Related

Why are bools sometimes referred to as "flags"?

Why are bools sometimes referred to as "flags"? Is it just a metaphor or is there some historical reason behind it?
Flags are an ancient way to convey information. A flag, if we ignore lowering it to half-mast, has only two states - raised or not raised. E.g., consider a white flag - raising it means surrendering. Not raising it, the default state, means that you are not surrendering.
A boolean variable, like a flag, only has two states - true and false.
Flag can be used as a noun and as a verb: to flag can mean to note, mark, or signal something (maybe this is derived from the use of nautical flags?).
An early (but probably not the first) use of the term flag in computer history can be found in the IBM 1620 from 1959 (my emphasis):
Memory was accessed two decimal digits at the same time (even-odd digit pair for numeric data or one alphameric character for text data). Each decimal digit was 6 bits, composed of an odd parity Check bit, a Flag bit, and four BCD bits for the value of the digit in the following format:
C F 8 4 2 1
The Flag bit had several uses:
In the least significant digit it was set to indicate a negative number (signed magnitude).
It was set to mark the most significant digit of a number (wordmark).
In the least significant digit of 5-digit addresses it was set for indirect addressing (an option on the 1620 I, standard on the 1620 II). Multi-level indirection could be used (you could even put the machine in an infinite indirect addressing loop).
In the middle 3 digits of 5-digit addresses (on the 1620 II) they were set to select one of 7 index registers.
So a bit used to mark or indicate something was called a flag bit.
Of course the use of "flag" in flag fields or status registers is then quite natural.
But once the association between flag and bit has been established, it is also understandable that their use can become interchangeable. And of course this also holds for boolean variables.
PS: The same question was already asked, but unfortunately without an answer.

dart, total available string characters?

I'm not familiar with character sets or whether languages pick them up from their environments or bake them into the language itself. I want to make a simple number system in Dart that has the largest possible base: like hex has 0-9a-f, I would have every single character in some specified ascending order, with lower case and upper case having different values, to give me the largest possible base for my number system. I want to do this so I can send numbers as strings with as few characters as possible. So my question is, does Dart have a standard baked-in character set that I can be certain will exist in every environment it runs in?
You should be able to use every value even if no concrete character is assigned to a code.
This would only be a problem when you try to display the character.
Some codes are control characters with special meaning (like 0x0000), which you should avoid; more info here: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
If you want to transport the result over the internet using text protocols you may be limited to ASCII. In this case I suggest Base64 encoding.
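As a rough illustration of the Base64 suggestion (sketched in Go rather than Dart, since the idea is the same in any language; the number and encoding variant chosen here are just assumptions for the example):

package main

import (
    "encoding/base64"
    "fmt"
    "math/big"
)

func main() {
    // Encode a large number's big-endian bytes with URL-safe Base64 so the
    // textual form stays within plain ASCII.
    n, _ := new(big.Int).SetString("123456789012345678901234567890", 10)
    encoded := base64.RawURLEncoding.EncodeToString(n.Bytes())
    fmt.Println(encoded) // compact, ASCII-only representation

    // Decode back to the original number.
    raw, err := base64.RawURLEncoding.DecodeString(encoded)
    if err != nil {
        panic(err)
    }
    fmt.Println(new(big.Int).SetBytes(raw)) // 123456789012345678901234567890
}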

Read UTF-8 encoded string from io.Reader

I am writing a small communication protocol with TCP sockets.
I am able to read and write basic data types such as integers, but I have no idea how to read a UTF-8-encoded string from a slice of bytes.
The protocol client is written in Java and the server is Go.
From what I've read, Go runes are 32 bits long and UTF-8 characters are 1 to 4 bytes long, which makes it impossible to simply cast a byte slice to a String.
I'd like to know how can I read and write this UTF-8 stream.
Note
I have the byte buffer length available at the time I read the string.
Some theory first:
A rune in Go represents a Unicode code point, a number assigned to a particular character in Unicode. It's an alias for int32.
UTF-8 is a Unicode encoding — a format of representing Unicode code points for the means of storage and transmission. UTF-8 might use 1 to 4 bytes to encode a single code point.
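For instance, the standard unicode/utf8 package shows how many bytes each code point needs (a small illustrative sketch):

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    // A few code points and the number of bytes UTF-8 uses for each.
    for _, r := range []rune{'A', 'é', '€', '𝄞'} {
        buf := make([]byte, utf8.UTFMax) // UTFMax == 4
        n := utf8.EncodeRune(buf, r)
        fmt.Printf("U+%04X -> % X (%d byte(s))\n", r, buf[:n], n)
    }
}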
How this maps on Go data types:
Both []byte and string store a series of bytes (a byte in Go is an alias for uint8).
The chief difference is that strings are immutable, so while you can
b := make([]byte, 2)
b[0] = byte('a')
b[1] = byte('z')
you can't
var s string
s[0] = byte('a')
The latter fact is even underlined by the inability to set the string length explicitly (like in imaginary s := make(string, 10)).
While strings in Go contain abstract bytes (you're free to store in them, say, characters encoded using Windows-1252), certain Go statements and type conversions interpret strings as being encoded in UTF-8, in particular:
A type conversion between string and []rune parses the string as a sequence of UTF-8-encoded code points and produces a slice of them. The reverse type conversion takes the Unicode code points from the slice of runes and produces a UTF-8-encoded string.
A range loop over a string loops through Unicode code points comprising the string, not just bytes.
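A small sketch of both behaviours (illustrative only):

package main

import "fmt"

func main() {
    s := "héllo" // 5 code points, 6 bytes (é takes 2 bytes in UTF-8)

    // Conversion to []rune decodes the UTF-8 bytes into code points.
    r := []rune(s)
    fmt.Println(len(s), len(r)) // 6 5

    // A range loop also walks code points, yielding the byte offset of each.
    for i, cp := range s {
        fmt.Printf("byte offset %d: U+%04X\n", i, cp)
    }

    // Converting the rune slice back re-encodes it as UTF-8.
    fmt.Println(string(r) == s) // true
}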
Go also supplies the type conversions between string and []byte and back. Now recall that strings are read-only, while slices of bytes are not. This means a construct like
b := make([]byte, 1000)
io.ReadFull(r, b)
s := string(b)
always copies the data, no matter if you convert a slice to a string or back. This wastes space but is type-safe and enforces the semantics.
Now back to your task at hand.
If you work with reasonably small strings and are not under memory pressure, just convert the byte slices filled by io.Read() (or whatever) to strings. Be sure to reuse the slice you're using to read the data to ease the pressure on the garbage collector; that is, do not allocate a new slice for each new read, since you're going to copy the data the reading code put into it off to a string anyway.
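A minimal sketch of that approach, assuming (as the question states) that the length of the incoming string is known before reading; the function and variable names are just placeholders:

package main

import (
    "fmt"
    "io"
    "strings"
)

// readString reads exactly n bytes from r into a reusable buffer and
// returns them as a string.
func readString(r io.Reader, buf []byte, n int) (string, error) {
    if cap(buf) < n {
        buf = make([]byte, n)
    }
    buf = buf[:n]
    if _, err := io.ReadFull(r, buf); err != nil {
        return "", err
    }
    // string(buf) copies the bytes, so buf can be reused for the next read.
    return string(buf), nil
}

func main() {
    src := strings.NewReader("héllo, world")
    buf := make([]byte, 64) // reused across reads
    s, err := readString(src, buf, 7)
    if err != nil {
        panic(err)
    }
    fmt.Println(s) // "héllo," (7 bytes: h + 2-byte é + l, l, o, comma)
}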
Finally, if you absolutely must not copy the data (say, you're dealing with multi-megabyte strings and you have tight memory requirements), you may try to play dirty tricks by working with memory unsafely and transplanting the memory of a byte slice into a string. Note that if you resort to something like this, you must understand very well that it can break with any new release of Go, and it's not even guaranteed to work at all.
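For illustration only, here is what such a trick can look like on recent Go versions (this sketch assumes Go 1.20 or newer, which added unsafe.String and unsafe.SliceData); the aliasing hazard remains, because mutating the slice afterwards silently mutates the supposedly immutable string:

package main

import (
    "fmt"
    "unsafe"
)

// bytesToString reinterprets b as a string without copying. The caller must
// never modify b afterwards, or the "immutable" string changes under it.
func bytesToString(b []byte) string {
    if len(b) == 0 {
        return ""
    }
    return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
    b := []byte("pretend this is a multi-megabyte payload")
    s := bytesToString(b)
    fmt.Println(len(s) == len(b), s[:7]) // shares memory with b, no copy made
}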

Fully correct Unicode visual string reversal

[Inspired largely by trying to explain the problems with Character Encoding independent character swap, but also these other questions neither of which contain a complete answer: How to reverse a Unicode string, How to get a reversed String (unicode safe)]
Doing a visual string reversal in Unicode is much harder than it looks. In any storage format other than UTF-32 you have to pay attention to codepoint boundaries rather than going byte-by-byte. But that's not good enough, because of combining glyphs; the spec has a concept of "grapheme cluster" that's closer to the basic unit you want to be reversing. But that's still not good enough; there are all sorts of special case characters, like bidi overrides and final forms, that will have to be fixed up.
This pseudo-algorithm handles all the easy cases I know about:
Segment the string into an alternating list of words and word-separators (some word-separators may be the empty string)
Reverse the order of this list.
For each string in the list:
Segment the string into grapheme clusters.
Reverse the order of the grapheme clusters.
Check the initial and final cluster in the reversed sequence; their base characters may need to be reassigned to the correct form (e.g. if U+05DB HEBREW LETTER KAF is now at the end of the sequence it needs to become U+05DA HEBREW LETTER FINAL KAF, and vice versa)
Join the sequence back into a string.
Recombine the list of reversed words to produce the final reversed string.
... But it doesn't handle bidi overrides and I'm sure there's stuff I don't know about, as well. Can anyone fill in the gaps?
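To make steps 4-5 (the inner grapheme-cluster reversal) concrete, here is a rough sketch in Go; it only approximates grapheme clusters by gluing combining marks (category Mn) to the preceding base character, so it is nowhere near full UAX #29 segmentation, and it does none of the word splitting, final-form fixing, or bidi handling described above:

package main

import (
    "fmt"
    "unicode"
)

// reverseClusters reverses a string by approximate grapheme clusters:
// each combining mark (category Mn) stays attached to the rune before it.
func reverseClusters(s string) string {
    var clusters [][]rune
    for _, r := range s {
        if unicode.Is(unicode.Mn, r) && len(clusters) > 0 {
            // Attach the combining mark to the previous cluster.
            last := len(clusters) - 1
            clusters[last] = append(clusters[last], r)
        } else {
            clusters = append(clusters, []rune{r})
        }
    }
    out := make([]rune, 0, len(s))
    for i := len(clusters) - 1; i >= 0; i-- {
        out = append(out, clusters[i]...)
    }
    return string(out)
}

func main() {
    // "e" + COMBINING ACUTE ACCENT stays glued to its base when reversed.
    fmt.Println(reverseClusters("ab\u0065\u0301c")) // "ce\u0301ba", rendered as "céba"
}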

Displaying Unicode Characters

I already searched for answers to this sort of question here, and have found plenty of them -- but I still have this nagging doubt about the apparent triviality of the matter.
I have read this very interesting and helpful article on the subject: http://www.joelonsoftware.com/articles/Unicode.html, but it left me wondering how one would go about identifying individual glyphs given a buffer of Unicode data.
My questions are:
How would I go about parsing a Unicode string, say UTF-8?
Assuming I know the byte order, what happens when I encounter the beginning of a glyph that is supposed to be represented by 6 bytes?
That is, if I interpreted the method of storage correctly.
This is all related to a text display system I am designing to work with OpenGL.
I am storing glyph data in display lists and I need to translate the contents of a string to a sequence of glyph indexes, which are then mapped to display list indices (since, obviously, storing the entire glyph set in graphics memory is not always practical).
Having to represent every string as an array of shorts would require a significant amount of storage, considering everything I need to display.
Additionally, it seems to me that 2 bytes per character simply isn't enough to represent every possible Unicode element.
How would I go about parsing a Unicode string, say UTF-8?
I'm assuming that by "parsing", you mean converting to code points.
Often, you don't have to do that. For example, you can search for a UTF-8 string within another UTF-8 string without needing to care about what characters those bytes represent.
If you do need to convert to code points (UTF-32), then:
Check the first byte to see how many bytes are in the character.
Look at the trailing bytes of the character to ensure that they're in the range 80-BF. If not, report an error.
Use bit masking and shifting to convert the bytes to the code point.
Report an error if the byte sequence you got was longer than the minimum needed to represent the character.
Increment your pointer by the sequence length and repeat for the next character.
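A minimal sketch of those steps, hand-rolled in Go for illustration (a stricter decoder should also reject surrogate code points U+D800 through U+DFFF):

package main

import "fmt"

// decodeUTF8 decodes one code point starting at b[0] and returns the code
// point and the number of bytes consumed, or an error for malformed input.
func decodeUTF8(b []byte) (cp rune, size int, err error) {
    if len(b) == 0 {
        return 0, 0, fmt.Errorf("empty input")
    }
    b0 := b[0]
    switch {
    case b0 < 0x80: // 0xxxxxxx: 1-byte sequence (ASCII)
        return rune(b0), 1, nil
    case b0&0xE0 == 0xC0: // 110xxxxx: 2-byte sequence
        cp, size = rune(b0&0x1F), 2
    case b0&0xF0 == 0xE0: // 1110xxxx: 3-byte sequence
        cp, size = rune(b0&0x0F), 3
    case b0&0xF8 == 0xF0: // 11110xxx: 4-byte sequence
        cp, size = rune(b0&0x07), 4
    default:
        return 0, 0, fmt.Errorf("invalid leading byte %#x", b0)
    }
    if len(b) < size {
        return 0, 0, fmt.Errorf("truncated sequence")
    }
    for _, t := range b[1:size] {
        if t&0xC0 != 0x80 { // trailing bytes must be 10xxxxxx (0x80-0xBF)
            return 0, 0, fmt.Errorf("invalid trailing byte %#x", t)
        }
        cp = cp<<6 | rune(t&0x3F)
    }
    // Reject overlong encodings: the sequence must be the shortest possible.
    minCP := [5]rune{0, 0, 0x80, 0x800, 0x10000}
    if cp < minCP[size] {
        return 0, 0, fmt.Errorf("overlong encoding")
    }
    if cp > 0x10FFFF {
        return 0, 0, fmt.Errorf("code point out of range")
    }
    return cp, size, nil
}

func main() {
    data := []byte("héllo, 世界")
    for i := 0; i < len(data); {
        cp, n, err := decodeUTF8(data[i:])
        if err != nil {
            panic(err)
        }
        fmt.Printf("U+%04X ", cp)
        i += n // advance by the sequence length
    }
    fmt.Println()
}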
Additionally, it seems to me that 2 bytes per character simply isn't enough to represent every possible Unicode element.
It's not. Unicode was originally intended to be a fixed-width 16-bit encoding. It was later decided that 65,536 characters wasn't enough, so UTF-16 was created, and Unicode was redefined to use code points between 0 and 1,114,111.
If you want a fixed-width encoding, you need 21 bits. But there aren't many languages that have a 21-bit integer type, so in practice you need 32 bits.
Well, I think this answers it:
http://en.wikipedia.org/wiki/UTF-8
Why it didn't show up the first time I went searching, I have no idea.