Should bytea used for binary operations in psql?? I am trying to do binary operations & column with datatype bytea!!
Eg select * from orders where order_val & {bitMask} >0;
where order_val will be my binary strings stored in bytea, and bitMask is generated by the user query!.
Related
The dump function in Oracle displays the internal representation of data:
DUMP returns a VARCHAR2 value containing the data type code, length in bytes, and internal representation of expr
Fore example:
SELECT DUMP(cast(1 as number ))
2 FROM DUAL;
DUMP(CAST(1ASNUMBER))
--------------------------------------------------------------------------------
Typ=2 Len=2: 193,2
SQL> SELECT DUMP(cast(1.000001 as number ))
2 FROM DUAL;
DUMP(CAST(1.000001ASNUMBER))
--------------------------------------------------------------------------------
Typ=2 Len=5: 193,2,1,1,2
It shows that the first 1 uses 2 byte for storing and the second example uses 5 bytes for storing.
I suppose the similar function in PostgreSQL is pg_typeof but it returns only the type name without information about byte usage:
SELECT pg_typeof(33);
pg_typeof
integer (1 row)
Does anybody know if there is an equivalent function in PostgreSQL?
I don't speak PostgreSQL.
However, Oracle functionality page says that there's Orafce which
implements in Postgres some of the functions from the Oracle database that are missing (or behaving differently)
It, furthermore, mentions the dump function
dump (anyexpr [, int]): Returns a text value that includes the datatype code, the length in bytes, and the internal representation of the expression
One of examples looks like this:
postgres=# select pg_catalog.dump('Pavel Stehule',10);
dump
-------------------------------------------------------------------------
Typ=25 Len=17: 68,0,0,0,80,97,118,101,108,32,83,116,101,104,117,108,101
(1 row)
To me, it looks like Oracle's dump:
SQL> select dump('Pavel Stehule') result from dual;
RESULT
--------------------------------------------------------------
Typ=96 Len=13: 80,97,118,101,108,32,83,116,101,104,117,108,101
SQL>
I presume you'll have to visit GitHub and install the package to see whether you can use it or not.
It is not a complete equivalent, but if you want to figure out the byte values used to encode a string in PostgreSQL, you can simply cast the value to bytea, which will give you the bytes in hexadecimal:
SELECT CAST ('schön' AS bytea);
This will work for strings, but not for numbers.
I have a database with one column of the type nvarchar. If I write
INSERT INTO table VALUES ("玄真")
It shows ¿¿ in the table. What should I do?
I'm using SQL Developer.
Use single quotes, rather than double quotes, to create a text literal and for a NVARCHAR2/NCHAR text literal you need to prefix it with N
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value NVARCHAR2(20) );
INSERT INTO table_name VALUES (N'玄真');
Query 1:
SELECT * FROM table_name
Results:
| VALUE |
|-------|
| 玄真 |
First, using NVARCHAR might not even be necessary.
The 'N' character data types are for storing data that doesn't 'fit' in the database's defined character set. There's an auxiliary character set defined as the NCHAR Character set. It's kind of a band aid - once you create a database it can be difficult to change its character set. Moral of this story - take great care in defining the Character Set when creating your database and do not just accept the defaults.
Here's a scenario (LiveSQL) where we're storing a Chinese string in both NVARCHAR and VARCHAR2.
CREATE TABLE SO_CHINESE ( value1 NVARCHAR2(20), value2 varchar2(20 char));
INSERT INTO SO_CHINESE VALUES (N'玄真', '我很高興谷歌翻譯。' )
select * from SO_CHINESE;
Note that both the character sets are in the Unicode family. Note also I told my VARCHAR2 string to hold 20 characters. That's because some characters may require up to 4 bytes to be stored. Using a definition of (20) would give you only room to store 5 of those characters.
Let's look at the same scenario using SQL Developer and my local database.
And to confirm the character sets:
SQL> clear screen
SQL> set echo on
SQL> set sqlformat ansiconsole
SQL> select *
2 from database_properties
3 where PROPERTY_NAME in
4 ('NLS_CHARACTERSET',
5 'NLS_NCHAR_CHARACTERSET');
PROPERTY_NAME PROPERTY_VALUE DESCRIPTION
NLS_NCHAR_CHARACTERSET AL16UTF16 NCHAR Character set
NLS_CHARACTERSET AL32UTF8 Character set
First of all, you should to establish the Chinese character encoding on your Database, for example
UTF-8, Chinese_Hong_Kong_Stroke_90_BIN, Chinese_PRC_90_BIN, Chinese_Simplified_Pinyin_100_BIN ...
I show you an example with SQL Server 2008 (Management Studio) that incorporates all of this Collations, however, you can find the same characters encodings in other Databases (MySQL, SQLite, MongoDB, MariaDB...).
Create Database with Chinese_PRC_90_BIN, but you can choose other Coallition:
Select a Page (Left Header) Options > Collation > Choose the Collation
Create a Table with the same Collation:
Execute the Insert Statement
INSERT INTO ChineseTable VALUES ('玄真');
The PostgreSQL types bytea and bit varying sound similar:
bytea stores binary strings.
bit varying stores strings of 1's and 0's.
The documentation does not mention a maximum size for either. Is it 1GB like character varying?
I have two separate use cases, both over a table with millions of rows:
Storing MD5 hashes
That would be a bytea with a length of 16 bytes or a bit(128). It would be used for:
Deduplication: Heavy use of GROUP BY, with an index I suppose.
Querying with WHERE md5 = for exact matches only.
Displaying as a hex string for human use.
Storing arbitrary binary data
Strings of binary data of varying length up to 4kB for:
Bitwise operations to find the strings matching a certain mask. Example at the end of this post.
Extracting some bytes, for instance get the integer value of the byte 14 in my string.
Some deduplication.
Working example for the bitwise operation, using bit varying. The mask is X'00FF00' and the it returns only the row X'AAAAAA'. I shortened the strings for the example but it would be over their full length, up to 4kB. Is it possible to do something similar with bytea?
CREATE TABLE test1 (mystring bit varying);
INSERT INTO test1 VALUES (X'AAAAAA'), (X'ABCABC');
SELECT * FROM test1 WHERE mystring & X'00FF00' = X'00AA00';
Which of bytea and bit varying is the more appropriate?
I saw the UUID type is made to store exactly 16 bytes, would that be any advantage to store the MD5's?
In general, if you're not using bitwise operations you should be using bytea.
I store larger values in bytea and then convert substrings to bit varying for bitwise operations where possible, mostly because clients understand bytea much more consistently than bit varying and the I/O format is more compact.
MD5 values should be stored as bytea. Bitwise operations on them make no sense, and you generally want to fetch them as binary.
I think bit varying really has two uses:
To store flags fields that are literally bit strings; and
As an interim data type for internal calculations
For pretty much everything else, use bytea.
There's nothing stopping you storing a 4k bitfield if that's what it is, though.
It appears the maximum length of bytea is 1 GB. [1]
For bitwise operation use bit varying (explanation see below)
For storing MD5 hash use bytea. It will take less storage than bit varying
The benefit using UUID is UUID algorithm somehow guarantees your uniqueness, not only in your table, but also in your database or even across your database (even if you generate UUID in your application). I think if you are using UUID without dashes it will be more efficient for storing, comparing and sorting in UUID (comparison between bytea and UUID see below).
For bitwise operation use bit varying
If you concern about storage:
bit varying takes more storage than bytea. If you are okay then you should try comparing the function they both offer:
bit varying
vs
bytea
So far I can see bit varying will be more suitable for you to do bitwise operation though bytea is generally accepted way to store arbitrary data.
PostgreSQL offers a single bytea operator: concatenation. You can append one byte value to another bytea value using the concatenation operator ||. [1]
Note that you cannot compare two bytea value, even for equality/inequality. You can, of course, convert bytea value into another value using the CAST(), and that opens up other operators. [1]
Comparison between UUID and bytea
create table u(uuid uuid primary key, payload character(300));
create table b( bytea bytea primary key, payload character(300));
INSERT INTO u
SELECT uuid_generate_v4()
FROM generate_series(1,1000*1000);
INSERT INTO b
SELECT random_bytea(16)
FROM generate_series(1,1000*1000);
VACUUM ANALYZE u;
VACUUM ANALYZE b;
## Your table size
SELECT pg_size_pretty(pg_total_relation_size('u'));
pg_size_pretty
----------------
81 MB
SELECT pg_size_pretty(pg_total_relation_size('b'));
pg_size_pretty
----------------
101 MB
## Speed comparison
\timing on
## Common select
select * from u limit 1000;
Time: 1.433 ms
select * from b limit 1000;
Time: 1.396 ms
## Random Select
SELECT * FROM u OFFSET random()*1000 LIMIT 10000;
Time: 42.453 ms
SELECT * FROM b OFFSET random()*1000 LIMIT 10000;
Time: 10.962 ms
Conclusion : I don't think there will be more benefit using UUID except its uniqueness and smaller size (will be faster to insert)
Note: No Index, there is only one connection
Some source :
PostgreSQL: "The Comprehensive Guide to Building, Programming, And Administratoring PostgreSQL Databases" Book
Take this sqlite3 interactive session:
sqlite> create table d(t date);
sqlite> insert into d values('1913-12-23');
sqlite> select t,typeof(t) from d;
1913-12-23|text
sqlite>
Why is the date field showing up as type text? Would storage space be smaller and queries be faster if it were numeric?
Sqlite3 datatype documentation shows that the affinity for date columns is numeric according to the table in §2.2. I would guess that date fields stored as numeric values as opposed to text would take up less space, so that would be ideal. I could very well not have a good grasp of sqlite yet, so my concerns may be invalid.
For easy copy/paste:
create table d(t date);
insert into d values('1913-12-23');
select t,typeof(t) from d;
The reason that typeof('1913-12-23') returns text is that '1913-12-23' is a string, and is stored in the table as a string.
Type affinity comes into play only when the conversion would be lossless; e.g., the value '123456' would be converted into a number.
SQLite does not have a native date data type.
To allow dates to be handled by the built-in date functions, you have to store them either as yyyy-mm-dd strings or as numbers (either Julian day numbers, or seconds since the Unix epoch).
Integers take up less storage than strings, so queries are faster if the DB's bottleneck is I/O, which is likely.
I'm new in postgreSQL
how to do this
select * from table_abc where table_abc.a>=7a and table_abc.b<=7a
all value is HEX in column a, b and input value
Thanks
EDIT :
table_abc
a bytea
b bytea
c text
Careful, here. In Postgres, bytea is a byte array. You look like you want to store a single byte in those columns.
I don't see a single-byte type in the list of datatypes at http://www.postgresql.org/docs/9.0/static/datatype.html.
You can go with an integer type. For example, when I say this:
select x'7A'::integer
I get 122.
If you intend to store a single byte in these columns and write your queries with hex values, then I suggest you make the columns integers and query like this:
select * from table_abc where table_abc.a>=x'7a'::integer and table_abc.b<=x'7a'::integer