I am still relatively new to pgSQL after switching away from mySQL completely.
I am trying to find out what character-length limits, if any, pgSQL has. Specifically, I'm curious whether there is a character limit on the following:
Database Name Length (mySQL is 64 characters)
Username Length (mySQL is 16 characters)
Password Length
I've searched Google, read the pgSQL FAQ, and a few other random posts, but I haven't found a solid answer to any of these. Perhaps pgSQL does not have these limitations like mySQL does. If anyone could shed some light on this, that would be great!
I am currently using pgSQL 9.3.1
The length of any identifier is limited to 63 bytes:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
By default, NAMEDATALEN is 64, so the maximum identifier length is 63 bytes.
As usernames and database names are identifiers, that limit applies to them.
I'm not aware of any length limitation on passwords (although I'm sure there is one).
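For completeness, the 63-byte limit can be checked from application code before creating an object. A minimal sketch (the helper name is mine; 64 is PostgreSQL's compile-time default for NAMEDATALEN, and the limit counts bytes, not characters):

```python
# Hypothetical helper: check whether a proposed PostgreSQL identifier
# (database name, username, table name, ...) fits the default limit of
# NAMEDATALEN - 1 = 63 *bytes*. Multibyte UTF-8 characters count per byte.
NAMEDATALEN = 64  # PostgreSQL compile-time default

def fits_identifier_limit(name: str) -> bool:
    """True if `name` will not be silently truncated by PostgreSQL."""
    return len(name.encode("utf-8")) <= NAMEDATALEN - 1

print(fits_identifier_limit("a" * 63))   # True
print(fits_identifier_limit("a" * 64))   # False
print(fits_identifier_limit("ä" * 32))   # False: 32 characters but 64 bytes
```

Note that PostgreSQL does not reject longer names; it truncates them to 63 bytes with a notice, which is why a pre-flight check like this can be useful.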
I'm investigating a bandwidth problem and stumbled on an issue with retrieving a bytea value. I tested this with PostgreSQL 10 and 14, the respective psql clients and the psycopg2 client library.
The issue is that if the size of a bytea value is, e.g., 10 MB (which I can confirm by doing select length(value) from table where id=1), and I do select value from table where id=1, then the amount of data transferred over the socket is about 20 MB. Note that the value in the database is pre-compressed (so high entropy), and the table is set not to compress the bytea value, to avoid double work.
I can't find any obvious encoding issue since it's all just bytes. I can understand that the psql CLI command may negotiate some encoding so it can print the result, but psycopg2 definitely doesn't do that, and I experience the same behaviour.
I tested the same scenario with a text field, and that nearly worked as expected. I started with copy paste of lorem ipsum and it transferred the correct amount of data, but when I changed the text to be random extended ASCII values (higher entropy again), it transferred more data than it should have. I have compression disabled for all my columns so I don't understand why that would happen.
Any ideas as to why this would happen?
That is normal. By default, values are transferred as text, so a bytea is rendered as a hexadecimal string, which doubles its size.
As a workaround, you could transfer such data in binary mode. The frontend-backend protocol and the C library offer support for that, but it will depend on your client API whether you can make use of that or not.
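The doubling can be seen without a database: here is a standalone sketch of what the default hex output format does to a binary payload (sizes are illustrative; the leading "\x" is the hex-format marker).

```python
# Illustrates why a 10 MB bytea arrives as ~20 MB of text: in PostgreSQL's
# default "hex" bytea output format, every byte becomes two hex digits,
# plus a leading "\x" marker. No database needed for the sketch.
import os

payload = os.urandom(10_000)         # stand-in for a pre-compressed blob
wire_text = "\\x" + payload.hex()    # what the text protocol would carry
print(len(payload), len(wire_text))  # 10000 vs 20002
```

Compression of the protocol stream would not help here either, since hex-encoded high-entropy data stays high entropy; binary transfer mode is the real fix where the client API supports it.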
Currently I use:
The utf8mb4 database character set.
The utf8mb4_unicode_520_ci database collation.
I understand that utf8mb4 supports up to four bytes per character. I also understand that Unicode is a standard that continues to get updates. In the past I thought utf8 was sufficient, until some test data got corrupted; lesson learned. However, I'm having difficulty understanding the upgrade path for both the character set and the collations.
The utf8mb4_unicode_520_ci database collation is based on Unicode Collation Algorithm version 5.2.0. If you navigate to the parent directory you'll see versions up to 14.0 listed at the time of writing. Those are the Unicode standards; then there are the character sets and collations that MariaDB supports.
Offhand, I'm not sure at what point four bytes per character would need to be superseded by eight or even 16, so it's not as simple as just updating the database collation. Additionally, I'm not seeing anything newer than version 5.2.0 in MariaDB's documentation.
So in short my three highly related questions are:
Are the newer collations, such as version 14, still fully compatible with four-byte characters, or have all combinations been exhausted so that up to eight or 16 bytes per character are now required?
What is the latest database collation that MariaDB supports in regards to Unicode versions?
If, per the second question, MariaDB does support a version newer than 5.2.0, is utf8mb4 still sufficient as a character set?
I am not bound to or care about MySQL compatibility.
You can inspect the collations currently supported by your MariaDB instance:
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4';
As far as I know, MariaDB does not support any UTF-8 collation newer than utf8mb4_unicode_520_ci. If you try to use a '900' collation, for example when importing metadata from MySQL to MariaDB, you get errors.
There is no such thing as an 8-byte or 16-byte UTF-8 encoding. UTF-8 is an encoding that uses between 1 and 4 bytes per character, no more than that.
MariaDB also supports utf16 and utf32, but neither of those uses more than 4 bytes per character either. UTF-16 is variable-length, using one or two 16-bit code units per character; UTF-32 is fixed-width, always using 32 bits (4 bytes) per character.
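A quick way to see those per-character byte counts, using a handful of sample characters (UTF-16/UTF-32 shown little-endian, without a BOM):

```python
# Byte counts per character across encodings: UTF-8 uses 1-4 bytes,
# UTF-16 uses 2 or 4 (one or two 16-bit code units), UTF-32 always 4.
for ch in ["A", "é", "€", "🎵"]:
    print(ch,
          len(ch.encode("utf-8")),      # 1, 2, 3, 4
          len(ch.encode("utf-16-le")),  # 2, 2, 2, 4
          len(ch.encode("utf-32-le")))  # always 4
```

So no encoding in this family ever exceeds 4 bytes per character; newer Unicode versions add characters and collation rules, not wider encodings.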
When I pick a name, it says it should be less than 128 characters long. Is there somewhere I can set MongoDB to accept longer names? It would make life easier.
No. You can't set anything to allow a namespace <database>.<collection> to be longer than 128 characters. I agree with Philipp here, that having this limit be a problem hints that you might not be using collections as intended.
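If you want to fail fast in application code rather than at creation time, a hypothetical pre-flight check (the function name and constant are mine, mirroring the fixed limit described above):

```python
# Hypothetical pre-flight check mirroring MongoDB's fixed limit: the full
# namespace "<database>.<collection>" must stay under 128 characters.
MAX_NAMESPACE = 128  # hard limit, not configurable

def namespace_ok(db: str, collection: str) -> bool:
    """True if the combined namespace fits within MongoDB's limit."""
    return len(f"{db}.{collection}") < MAX_NAMESPACE

print(namespace_ok("app", "users"))    # True
print(namespace_ok("app", "c" * 130))  # False
```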
Does anyone know of a simple chart or list that would show all acceptable varchar characters? I cannot seem to find this in my googling.
What codepage? What collation? Varchar stores characters assuming a specific codepage. Only the lower 128 characters (the ASCII range) are standard; higher characters vary by codepage.
The default codepage used matches the collation of the column, whose defaults are inherited from the table, database, and server. All of these defaults can be overridden.
In short, there IS no "simple chart". You'll have to check the character chart for the specific codepage, e.g. using the "Character Map" utility in Windows.
It's far, far better to use Unicode and nvarchar when storing to the database. If you store text data under the wrong codepage you can easily end up with mangled, unrecoverable data. The only way to ensure the correct codepage is used is to enforce it all the way from the client (i.e. the desktop app) through the application server, down to the database.
Even if your client/application server uses Unicode, a difference in the locale between the server and the database can result in faulty codepage conversions and mangled data.
On the other hand, when you use Unicode no conversions are needed or made.
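The kind of mangling described above is easy to reproduce outside the database. A sketch using two common codepages (Windows-1252 for the client, DOS codepage 850 standing in for a mismatched server locale):

```python
# Bytes written under one codepage and read back under another silently
# turn into different characters - no error is raised.
raw = "café".encode("cp1252")    # client writes é as byte 0xE9
wrong = raw.decode("cp850")      # server reads byte 0xE9 as 'Ú'
print(wrong)                     # cafÚ - mangled, with no way to tell
```

Because the bytes themselves are all "valid" in both codepages, nothing flags the corruption; with Unicode end to end there is no conversion step where this can happen.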
I'm using JDBC (jt400) to insert data into an AS/400 table.
The DB table's code page is 424 (Host Code Page 424).
The EBCDIC 424 code page does not support many of the characters that may come from the client -
for example the sign → (ASCII 26, hex 1A).
The result is an incorrect translation.
Is there any built-in way in the toolbox to remove any of the unsupported characters?
You could try creating a logical file over your CCSID 424 physical file with a different code page. On the AS/400 it is possible to create logical files with different code pages for individual columns, by adding the keyword CCSID(<num>). You can even set it to a Unicode character set, e.g. CCSID(1200) for UTF-16. Of course your physical file will still only be able to store characters that are in the 424 code page, and the rest will be replaced by a substitution character, but the translation might be better that way.
There is no way to directly store characters that are not in code page 424 in a column with that code page (the only way I can think of is encoding them somehow across multiple characters, but that is most likely not what you want to do, since it would bring more problems than it "solves").
If you have control over that system and can make bigger changes, you could do it the other way around: create a new Unicode version of that physical file under a different name (I'd propose CCSID(1200); that's as close as you get to UTF-16 on the AS/400, AFAIK, and UTF-8 is not supported by all parts of the system in my experience. IBM does recommend 1200 for Unicode). Then transfer all data from your old file to the new one, delete the old one (back it up first!), and then create a logical file over the new physical file, with the name of the old physical file. In that logical file, change all CCSID-bearing columns from 1200 to 424. That way, existing programs can still work on the data. Of course there will be invalid characters in the logical file once you insert data that is not within the CCSID 424 subset, so you will most likely have to take a look at all programs that use the new logical file.
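If you stay on CCSID 424, one client-side approach to the original "remove unsupported characters" question is to round-trip the string through the code page with a lossy encoder before the insert. A sketch in Python (assuming Python's cp424 codec matches the host code page; in Java, a CharsetEncoder over the IBM424 charset can be used the same way):

```python
# Drop any character the target code page cannot represent, by encoding
# with errors="ignore" and decoding back. Characters outside CCSID 424
# (like the arrow below) simply disappear from the result.
def strip_unsupported(text: str, codepage: str = "cp424") -> str:
    return text.encode(codepage, errors="ignore").decode(codepage)

print(strip_unsupported("done →"))  # the arrow is dropped
```

Using errors="replace" instead would substitute a placeholder character rather than silently dropping data, which may be preferable for auditing.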