The dump function in Oracle displays the internal representation of data:
DUMP returns a VARCHAR2 value containing the data type code, length in bytes, and internal representation of expr
For example:
SQL> SELECT DUMP(cast(1 as number))
2 FROM DUAL;
DUMP(CAST(1ASNUMBER))
--------------------------------------------------------------------------------
Typ=2 Len=2: 193,2
SQL> SELECT DUMP(cast(1.000001 as number ))
2 FROM DUAL;
DUMP(CAST(1.000001ASNUMBER))
--------------------------------------------------------------------------------
Typ=2 Len=5: 193,2,1,1,2
It shows that the first value (1) is stored in 2 bytes, while the second (1.000001) needs 5 bytes.
I suppose the closest function in PostgreSQL is pg_typeof, but it returns only the type name, without any information about byte usage:
SELECT pg_typeof(33);
 pg_typeof
-----------
 integer
(1 row)
Does anybody know if there is an equivalent function in PostgreSQL?
I don't speak PostgreSQL.
However, the Oracle functionality page says that there's Orafce, which
implements in Postgres some of the functions from the Oracle database that are missing (or behaving differently)
It furthermore mentions the dump function:
dump (anyexpr [, int]): Returns a text value that includes the datatype code, the length in bytes, and the internal representation of the expression
One of examples looks like this:
postgres=# select pg_catalog.dump('Pavel Stehule',10);
dump
-------------------------------------------------------------------------
Typ=25 Len=17: 68,0,0,0,80,97,118,101,108,32,83,116,101,104,117,108,101
(1 row)
To me, it looks like Oracle's dump:
SQL> select dump('Pavel Stehule') result from dual;
RESULT
--------------------------------------------------------------
Typ=96 Len=13: 80,97,118,101,108,32,83,116,101,104,117,108,101
SQL>
I presume you'll have to visit GitHub and install the package to see whether you can use it or not.
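For what it's worth, once the Orafce binaries are installed on the server, enabling it and calling dump should look roughly like this (a sketch; the extension name orafce and the pg_catalog schema follow the example above, but may differ by version):
CREATE EXTENSION orafce;
SELECT pg_catalog.dump(1::numeric);
SELECT pg_catalog.dump('Pavel Stehule'::text, 10);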
It is not a complete equivalent, but if you want to figure out the byte values used to encode a string in PostgreSQL, you can simply cast the value to bytea, which will give you the bytes in hexadecimal:
SELECT CAST ('schön' AS bytea);
This will work for strings, but not for numbers.
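As a quick illustration (the output shown assumes a UTF-8 server encoding and the default hex bytea output, so ö occupies two bytes):
SELECT CAST ('schön' AS bytea), octet_length('schön'::text);
     bytea      | octet_length
----------------+--------------
 \x736368c3b66e |            6
(1 row)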
How can I set numeric display format in psql?
I have many character varying and double precision columns in the select query. I can use to_char() or round() on each of the numeric columns, but with many columns this makes the code too repetitive.
Is there a shortcut? For example, are there any psql session settings, something like \pset numeric '9.9EEEEE' (I just made it up, don't try to use it). I could not quickly find such settings in the manual: PostgreSQL: Documentation: psql.
Example:
-- Got this:
=# select bar from (values (123456789), (0.123456789), (0.000000123456789)) table_foo (bar);
bar
-------------------
123456789
0.123456789
0.000000123456789
-- I have to use this workaround:
=# select to_char(bar, '9.9EEEEE') as bar from (values (123456789), (0.123456789), (0.000000123456789)) table_foo (bar);
bar
----------
1.2e+08
1.2e-01
1.2e-07
-- I want this:
-- set some session settings, and then magically:
=# select bar from (values (123456789), (0.123456789), (0.000000123456789)) table_foo (bar);
bar
----------
1.2e+08
1.2e-01
1.2e-07
There is no other way to specify a number format beyond setting the numeric locale. The values are already formatted on the server side; psql only renders them into the final output format (table, HTML, CSV) and doesn't try to reformat them (numericlocale is the exception). I remember a long discussion about the possibility of specifying the output format of the boolean type, but it produced no actual result. psql doesn't have strong functionality for generating rich reports; it is meant to be a fast, safe, and robust interactive client. Although I would like to have stronger capabilities in this area, modern spreadsheets and reporting tools are better suited to some of these tasks.
You could always define your own formatting function, but you'd have to be doing a lot of formatting on a lot of columns for this to be considered an improvement in readability. I wouldn't recommend it for your toy example, but if your real queries have dozens of columns it might be neater.
sophia=> create function x(numeric) returns text language sql as $$ select to_char($1, '9.9EEEEE') $$;
CREATE FUNCTION
sophia=> select x(bar) from (values (123456789), (0.123456789), (0.000000123456789)) table_foo (bar);
x
----------
1.2e+08
1.2e-01
1.2e-07
(3 rows)
sophia=>
I have a database with one column of the type nvarchar. If I write
INSERT INTO table VALUES ("玄真")
It shows ¿¿ in the table. What should I do?
I'm using SQL Developer.
Use single quotes, rather than double quotes, to create a text literal, and for an NVARCHAR2/NCHAR text literal you need to prefix it with N:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value NVARCHAR2(20) );
INSERT INTO table_name VALUES (N'玄真');
Query 1:
SELECT * FROM table_name
Results:
| VALUE |
|-------|
| 玄真 |
First, using NVARCHAR might not even be necessary.
The 'N' character data types are for storing data that doesn't 'fit' in the database's defined character set. There's an auxiliary character set, defined as the NCHAR character set. It's kind of a band-aid: once you create a database, it can be difficult to change its character set. Moral of this story: take great care in defining the character set when creating your database, and do not just accept the defaults.
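For reference, both character sets are fixed when the database is created, roughly like this (a heavily abbreviated sketch; a real CREATE DATABASE statement or DBCA run involves many more clauses):
CREATE DATABASE mydb
  CHARACTER SET AL32UTF8
  NATIONAL CHARACTER SET AL16UTF16;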
Here's a scenario (LiveSQL) where we're storing a Chinese string in both NVARCHAR and VARCHAR2.
CREATE TABLE SO_CHINESE ( value1 NVARCHAR2(20), value2 varchar2(20 char));
INSERT INTO SO_CHINESE VALUES (N'玄真', '我很高興谷歌翻譯。');
select * from SO_CHINESE;
Note that both character sets are in the Unicode family. Note also that I told my VARCHAR2 column to hold 20 characters. That's because some characters may require up to 4 bytes of storage; a byte-based definition of (20) would leave room for only 5 such characters.
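As a small sketch of that difference (table and column names made up for illustration):
CREATE TABLE length_demo (
  byte_limited VARCHAR2(20),       -- 20 bytes under the default BYTE length semantics
  char_limited VARCHAR2(20 CHAR)   -- 20 characters, however many bytes each one needs
);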
Let's look at the same scenario using SQL Developer and my local database.
And to confirm the character sets:
SQL> clear screen
SQL> set echo on
SQL> set sqlformat ansiconsole
SQL> select *
2 from database_properties
3 where PROPERTY_NAME in
4 ('NLS_CHARACTERSET',
5 'NLS_NCHAR_CHARACTERSET');
PROPERTY_NAME              PROPERTY_VALUE    DESCRIPTION
NLS_NCHAR_CHARACTERSET     AL16UTF16         NCHAR Character set
NLS_CHARACTERSET           AL32UTF8          Character set
First of all, you should establish a Chinese-capable character encoding (collation) for your database, for example:
UTF-8, Chinese_Hong_Kong_Stroke_90_BIN, Chinese_PRC_90_BIN, Chinese_Simplified_Pinyin_100_BIN ...
I'll show you an example with SQL Server 2008 (Management Studio), which incorporates all of these collations; however, you can find equivalent character encodings in other databases (MySQL, SQLite, MongoDB, MariaDB...).
Create the database with Chinese_PRC_90_BIN, but you can choose another collation:
In the Select a Page pane (on the left), go to Options > Collation and choose the collation.
Create a Table with the same Collation:
Execute the Insert Statement
INSERT INTO ChineseTable VALUES ('玄真');
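If you prefer a script to the Management Studio dialogs, a rough T-SQL sketch of the same steps could look like this (the database name is made up; the N prefix keeps the literal Unicode regardless of the code page):
CREATE DATABASE ChineseDemo COLLATE Chinese_PRC_90_BIN;
GO
USE ChineseDemo;
GO
CREATE TABLE ChineseTable (value NVARCHAR(20));
INSERT INTO ChineseTable VALUES (N'玄真');
SELECT value FROM ChineseTable;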
I am trying to run a CSV import using the COPY command for some data that includes a guillemet (»). Redshift complains that the column value is too long for the varchar column I have defined. The error in the "Loads" tab in the Redshift GUI displays this character as two dots (..); had it been treated as one character, it would have fit in the varchar column. It's not clear whether there is some sort of conversion error occurring or whether it is just a display issue.
When trying to do plain INSERTs I run into strange behavior as well:
dev=# create table test (name varchar(3));
CREATE TABLE
dev=# insert into test values ('bla');
INSERT 0 1
3 characters treated as 4?
dev=# insert into test values ('bl»');
ERROR: value too long for type character varying(3)
dev=# insert into test values ('b»');
INSERT 0 1
Why does char_length return 2?
dev=# select char_length(name), name from test;
char_length | name
-------------+------
2 | b»
I've checked the client encoding and database encodings and those all seem to be UTF8/UNICODE.
You need to increase the length of your varchar field. Multibyte characters use more than one byte, and the length in a Redshift varchar definition is byte-based, so your special character is taking more than one byte. If it still doesn't work, refer to the Redshift documentation page below:
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html
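For the example in the question, that means sizing the column in bytes rather than characters. A minimal sketch, assuming » is stored as the two UTF-8 bytes 0xc2 0xbb:
drop table if exists test;
create table test (name varchar(4));   -- 'b' + 'l' + the 2-byte » = 4 bytes
insert into test values ('bl»');
select name, char_length(name), octet_length(name) from test;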
I have a table with a bytea field, and it would be convenient if I could do queries via the command line (or pgAdmin's query executor). I've got the hex value as a string. Is there a built in function to convert hex to bytea?
I'd like to do something like:
SELECT * FROM table WHERE my_bytea_field=some_???_function('fa26e312');
where 'fa26e312' is the hex value of the bytea field I want.
Note: this is just to be helpful while I'm developing / debugging things, I can do it via code but I'd like to be able to do it by hand in a query.
Try the built-in decode(string text, format text) function (it returns bytea). You can run queries from the command line using psql in non-interactive mode, that is, with the -c switch (there are some formatting options if you like):
psql -c "SELECT * FROM table WHERE my_bytea_field=decode('fa26e312', 'hex');"
Example:
CREATE TABLE test(id serial, my_bytea_field bytea);
INSERT INTO test (my_bytea_field) VALUES
(E'\\320\\170'::bytea),
(E'\\100\\070'::bytea),
(E'\\377\\377'::bytea);
psql -tc "SELECT * FROM test WHERE my_bytea_field=decode('ffff', 'hex');"
3 | \377\377
SELECT * FROM table WHERE my_bytea_field=E'\\xfa26e312';
Just as in the example in the Binary Data Types docs (note the E'\\x' prefix):
SELECT E'\\xDEADBEEF';
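On a modern server where standard_conforming_strings is on (the default since PostgreSQL 9.1), the plain hex literal should work as well, keeping the placeholder table name from the question:
SELECT * FROM table WHERE my_bytea_field = '\xfa26e312';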
In the psql console you can see the bytea content directly (hex-encoded by default). You can also query it explicitly like this (e.g. in pgAdmin):
xxx=# select id, encode(code_hash, 'hex') from your_table where id = 1195;
id | encode
------+------------------------------------------------------------------
1195 | c5e5dcf215925f7ef4dfaf5f4b4f105bc321c02776d6e7d52a1db3fcd9d011a4
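As a quick sanity check, encode() and decode() are inverses, so you can round-trip a hex string:
xxx=# select encode(decode('fa26e312', 'hex'), 'hex');
  encode
----------
 fa26e312
(1 row)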
I want to format long numbers using thousand separator. It can be done using to_char function just like:
SELECT TO_CHAR(76543210.98, '999G999G990D00')
But when my PostgreSQL server with UTF-8 encoding runs on a Polish version of Windows, such a SELECT ends with:
ERROR: invalid byte sequence for encoding "UTF8": 0xa0
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
In to_char pattern G is described as: group separator (uses locale).
This SELECT works without error when the server is running on Linux with a Polish locale.
As a workaround I use a space instead of G in the format string, but I think there should be a way to set the thousand separator just like in Oracle:
ALTER SESSION SET NLS_NUMERIC_CHARACTERS=', ';
Is such setting available for PostgreSQL?
If you use psql, you can execute this:
\pset numericlocale
Example:
test=# create temporary table a (a numeric(20,10));
CREATE TABLE
test=# insert into a select random() * 1000000 from generate_series(1,3);
INSERT 0 3
test=# select * from a;
a
-------------------
287421.6944910590
140297.9311533270
887215.3805568810
(3 rows)
test=# \pset numericlocale
Showing locale-adjusted numeric output.
test=# select * from a;
a
--------------------
287.421,6944910590
140.297,9311533270
887.215,3805568810
(3 rows)
I'm pretty sure the error message is literally true: 0xa0 by itself isn't a valid UTF-8 byte sequence.
My home server is running PostgreSQL on Windows XP, SP3. I can do this in psql.
sandbox=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
sandbox=# show lc_numeric;
lc_numeric
---------------
polish_poland
(1 row)
sandbox=# SELECT TO_CHAR(76543210.98, '999G999G990D00');
to_char
-----------------
76 543 210,98
(1 row)
I don't get an error message, but I get garbage for the separator. Could this be a code page issue?
As a workaround I use a space instead of G in the format string
Let's think about this. If you use a space, then on a web page the value might split at the end of a line or at the boundary of a table cell. I'd think a nonbreaking space might be a better choice.
And, in Unicode, a nonbreaking space is 0xa0. In Unicode, not in UTF-8. (That is, 0xa0 can't be the first byte of a UTF-8 character; in UTF-8 the nonbreaking space is encoded as the two bytes 0xc2 0xa0. See UTF-8 Bit Distribution.)
Another possibility is that your client is expecting one byte order, and the server is giving it a different byte order. Since the numbers are single-byte characters, the byte order wouldn't matter until, well, it mattered. If the client is expecting a big endian MB character, and it got a little endian MB character beginning with 0xa0, I'd expect it to die with the error message you saw. I'm not sure I have a way to test this before I go to work today.
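If you do go with the nonbreaking-space idea, here is a minimal sketch on the PostgreSQL side, assuming a UTF-8 database where chr(160) yields U+00A0 (it reuses the space-based format string from the question's workaround):
SELECT replace(trim(to_char(76543210.98, '999 999 990D00')), ' ', chr(160)) AS formatted;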