I have an issue with the latest pgAdmin where postgres is automatically truncating function names that are shorter than 63 characters. I don't know if it's related to the language or something else, but here's a function name I'm using:
"βρες_ασθενείς_μίας_μέρας_νοσηλευτή"
postgres truncates the name to:
"βρες_ασθενείς_μίας_μέρας_νοσηλευτ"
which is 33 characters.
Did the rules for the maximum function name size change, or is there something wrong with my preferences?
Thanks for your time.
"4.1.1. Identifiers and Key Words":
The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.
Note that it says 63 bytes, not characters. If you use UTF-8, your untruncated string is 64 bytes long, which is too long. The truncated string is 62 bytes long and fits.
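A quick way to see the difference is to compare the character count with the UTF-8 byte count. Here is a minimal Python sketch using the identifier from the question (the numbers in the comments follow from the counts above):
name = "βρες_ασθενείς_μίας_μέρας_νοσηλευτή"
print(len(name))                       # 34 characters
print(len(name.encode("utf-8")))       # 64 bytes, one over the NAMEDATALEN-1 = 63 limit
truncated = "βρες_ασθενείς_μίας_μέρας_νοσηλευτ"
print(len(truncated.encode("utf-8")))  # 62 bytes, which fits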
Related
I understand that ASCII is a character encoding scheme, where a byte is assigned a certain decimal number, hex code, or letter of our alphabet.
What I don't understand and couldn't find out via Google is how exactly the computer deals with ASCII behind the scenes. For instance when I write a text file with the text "hello world", what is the computer doing? Does it save the bytes in memory and where does the ASCII encoding come into play?
Almost anything that computers store on disk, transfer over the network or keep in their memory is handled as 8-bit chunks of data, called bytes.
Those bytes are just numbers. Anything between 0 and 255 *.
So a 100-byte file is just 100 numbers, one after the other.
A network message is similar: it's just a bunch of numbers one after the other.
(We tend to abstract over those and call them something like "streams", because at some level it often doesn't matter whether you read from a file on disk or receive a network message; they are fundamentally just finite streams of bytes.)
If you want to display a file from the disk as text, something needs to convert those numbers to something meaningful for humans. Because if I tell you that a file contains the bytes 68 65 6c 6c 6f 20 77 6f 72 6c 64 0a, then chances are you don't really know what that means. (By the way, those are hex values which is already an interpretation, one could equivalently say that the file contains the decimal byte values 104, 101, 108, ...)
ASCII is a pattern of how to interpret those numbers. It tells you that 0x68 (decimal 104) represents the character h. And that 0x65 (decimal 101) represents e. And if you apply that mapping to those bytes you'll get hello world.
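To make that concrete, here is a small Python sketch using the byte values from the example above:
data = bytes([0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x77, 0x6F, 0x72, 0x6C, 0x64, 0x0A])
print(list(data))            # the same bytes as decimal numbers: 104, 101, 108, ...
print(data.decode("ascii"))  # applying the ASCII mapping gives "hello world" plus a newline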
That decoding only has to happen when the computer wants to show the text to a user, because internally it doesn't care that 0x68 is h. So if the computer wants to display some text to you, it looks up what character 0x68 represents (h), probably represented again via its Unicode code point, which happens to be U+0068, and then it looks up how that character is represented in the font. The font then has a mapping of U+0068 to some instructions on how to draw the h.
And since we're talking about ASCII it should be mentioned that ASCII is not actually used an awful lot these days, mostly because it only supports a very limited set of characters (basically just barely enough to write English language text, and not even all of that). More commonly used encodings today are UTF-8 (which has the benefit of being ASCII compatible which means all valid ASCII text is also valid UTF-8 text, but not the other way around) and UTF-16. Other encodings that used to be popular, but are on the decline are the ISO-8859-* family (which are basically extended versions of ASCII, but still only support a small number of characters each).
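As a tiny illustration of that ASCII compatibility (just a Python sketch, not tied to any particular text):
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True: ASCII text encodes to identical bytes in UTF-8
print("é".encode("utf-8"))                                 # b'\xc3\xa9': a non-ASCII character needs two UTF-8 bytes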
* So technically even saying "those are numbers between 0 and 255" is already an interpretation. Technically they are 8 bits, each one of which can be off or on. Those can be interpreted as an unsigned number (0 to 255), a signed number (-128 to 127), a character (using something like the ASCII encoding) or potentially anything else you want. But the "unsigned number" interpretation is one of the most straightforward ones.
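To illustrate that footnote, the same bit pattern can be read in several ways; this little Python sketch is only a demonstration:
b = 0xE5                              # the bit pattern 1110 0101
print(b)                              # as an unsigned number: 229
print(b - 256 if b >= 128 else b)     # as a signed (two's complement) number: -27
print(bytes([0x68]).decode("ascii"))  # a different byte, 0x68, read as an ASCII character: h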
For instance when I write a text file with the text "hello world", what is the computer doing?
When you hit those keys on your keyboard, a certain protocol between the keyboard and computer lets the computer know which keys were hit. The computer translates that into a character, like "h", based on what keyboard layout is currently selected. It may also cause your video game character to move sideways or whatever else; there's no direct connection between a key and what it causes to happen. But let's say you're in a text editor and your computer interpreted your hitting the "h" key as "inputting the letter h". It now turns that into some internal, in-memory character representation. Often in-memory representations will be UTF-16 encoded bytes, so the computer can represent any and all possible Unicode characters.
When you hit File → Save as..., say you choose to store the file in ASCII encoding. The text editor now goes through the UTF-16 bytes stored in memory and converts them all into equivalent ASCII bytes, according to a UTF-16/Unicode → ASCII encoding table. Those bytes are stored on disk.
When you open that file again, the text editor reads those bytes from disk, probably turns them into its internal UTF-16 representation, and stores them in memory so you can edit the file. At this point you can typically think about each character as a character; it doesn't matter what bytes it's encoded as, that is abstracted away. An "h" is just an "h" at this point.
Each in-memory character is mapped to a glyph in a font, typically by its Unicode code point, to be able to display a graphical representation of it on screen for you.
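A minimal sketch of that round trip in Python (the file name and text are made up for illustration):
text = "hello world"
# "Save as ASCII": convert the in-memory characters to ASCII bytes and write them to disk
with open("hello.txt", "wb") as f:
    f.write(text.encode("ascii"))
# Opening the file again: read the raw bytes back and decode them into characters
with open("hello.txt", "rb") as f:
    restored = f.read().decode("ascii")
print(restored == text)  # True; the bytes on disk were just one encoding of the same text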
I'm developing a function that needs to detect if a string is Unicode.
I get this string from an Access DB.
Now I'm analyzing every two bytes: if the second byte is 00, then the string is Unicode, but that's not always the case; sometimes I get a pair of bytes like &H2 &HA1.
How can I solve this problem?
Only the characters from 0 to 127 are "safe." ANSI character values from 128 to 255 have different meanings and character mappings in different locales.
For example, in the U.S. English locale:
Option Explicit

' Asc returns the ANSI code of the character in the current locale (137 here),
' while AscW returns its Unicode code point (8240).
Private Sub Form_Load()
    Dim S As String
    S = "‰"
    Debug.Print S, Asc(S), AscW(S)
End Sub
Produces:
‰ 137 8240
If the underlying data is primarily ASCII/ANSI, then your current check is enough. In 16-bit Unicode, such string data will have a majority of characters whose upper byte is 00. Not 100%, but an obvious majority. This won't occur in straight ANSI string data.
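If it helps, here is a rough sketch of that check in Python rather than VB6; the function name and the 50% threshold are arbitrary choices for illustration, not a drop-in replacement:
def looks_like_utf16le(data: bytes) -> bool:
    # Count the 2-byte pairs whose second (high) byte is zero; mostly-ASCII text
    # stored as 16-bit Unicode will have a clear majority of such pairs.
    if len(data) < 2:
        return False
    pairs = len(data) // 2
    zero_high = sum(1 for i in range(1, pairs * 2, 2) if data[i] == 0)
    return zero_high / pairs > 0.5  # "an obvious majority"
print(looks_like_utf16le("hello world".encode("utf-16-le")))  # True
print(looks_like_utf16le(b"hello world"))                     # False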
I'm reviewing for an exam right now and one of the review questions gives an answer that I'm not understanding.
A main memory location of a MIPS processor based computer contains the following bit pattern:
0 01111110 11100000000000000000000
a. If this is to be interpreted as a NULL-terminated string of ASCII characters, what is the string?
The answer that's given is "?p" but I'm not sure how they got that.
Thanks!
Each ASCII character is stored in one 8-bit byte. So given your main memory location, we can break it up into a few bytes.
00111111
01110000
00000000
...
Null terminated strings are terminated with none other than... a null byte! (A byte with all zeros). So this means that your string contains two bytes that are ASCII characters. Byte 1 has a value of 63 and byte 2 has a value of 112. If you have a look at an ASCII chart, you'll see that 63 corresponds to '?' and 112 corresponds to 'p'.
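A quick way to check the arithmetic (just a Python sketch of the conversion above):
# The 32-bit pattern from the question, split into 8-bit bytes
bits = "0" + "01111110" + "11100000000000000000000"
byte_values = [int(bits[i:i + 8], 2) for i in range(0, 32, 8)]
print(byte_values)                          # [63, 112, 0, 0]
# Keep everything up to the null terminator and decode it as ASCII
s = bytes(byte_values)
print(s.split(b"\x00")[0].decode("ascii"))  # ?p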
On my NetBSD system, there is a password hash in master.passwd that looks like this:
$sha1$[5 numbers]$[8 letters]$[17 alpha numeric].[10 alpha numeric]
For privacy concerns I left out the actual values. Would someone be willing to explain the different parts of this? I was under the impression that SHA1 resulted in 20 bytes, so I was very confused about what part was the actual hash, and what part was the salt, and what part everything else was.
The relevant parts can be found in NetBSD src/lib/libcrypt.
For the format: crypt-sha1.c
The format of the encrypted password is:
$<tag>$<iterations>$<salt>$<digest>
where:
<tag> is "sha1"
<iterations> is an unsigned int identifying how many rounds
have been applied to <digest>. The number
should vary slightly for each password to make
it harder to generate a dictionary of
pre-computed hashes. See crypt_sha1_iterations.
<salt> up to 64 bytes of random data, 8 bytes is
currently considered more than enough.
<digest> the hashed password.
The digest is 160 bits = 20 bytes, but it is encoded using base64 (4 bytes for 3 source bytes) to 28 bytes (with one zero padding byte). See util.c for that.
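As a hedged illustration of how those parts line up, here is a small Python sketch; the iteration count, salt, and digest are made-up placeholders, not a real hash:
example = "$sha1$24680$abcdefgh$" + "X" * 17 + "." + "X" * 10
_, tag, iterations, salt, digest = example.split("$")
print(tag)              # sha1
print(int(iterations))  # the number of rounds applied to the digest
print(salt)             # the random salt
print(len(digest))      # 28: 20 digest bytes plus 1 zero pad byte, encoded 3 bytes -> 4 characters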
What is the easiest way to shorten a base64 string? E.g.
PHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIKICAgICAgICAgICAgeG1sbnM6eG1wPSJodHRwOi8v
I just learned how to convert binary to base64. If I'm correct, groups of 24 bits are made and groups of 6 bits are used to create the 64 characters A-Z a-z 0-9 +/
I was wondering whether it is possible to further shrink a base64 string and make it smaller; I was hoping to reduce a 100-character base64 string to 20 or fewer characters.
A 100-character base64 string contains 600 bits of information: base64 carries 6 bits in each character, so it takes 100 characters to represent your data. It is encoded in US-ASCII (by definition) and described in RFC 4648. In order to represent your data in 20 characters, you would need 30 bits in each character (600/20).
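To make the arithmetic concrete, here is a small Python sketch using the example string from the question:
import base64
s = "PHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIKICAgICAgICAgICAgeG1sbnM6eG1wPSJodHRwOi8v"
raw = base64.b64decode(s)
print(len(s), len(s) * 6)  # the character count and the number of bits of information it carries
print(len(raw))            # the raw byte count, 3/4 of the character count: the smallest lossless form of the same data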
In a contrived fashion, using a very large Unicode mapping such as the unified CJK range, it would be possible to pack more bits into each character, but it would still require a minimum of about 40 glyphs (~75 bytes) to represent the data. It would also be really difficult to debug the encoding, and it would be really prone to misinterpretation. Further, the purpose of base64 encoding is to present a representation that is not destroyed by broken intermediate systems; that would very likely not work with anything as obscure as a base2Billion encoding.