Converting bytea back to varchar - postgresql

In Postgres when I want to save a varchar to a bytea column, this is made easy by an implicit conversion. So I can simply execute
UPDATE my_table SET my_bytea_col = 'This varchar will be converted' WHERE id = 1;
I use this all the time. However, I would like to occasionally see the contents of this column as a varchar. IDEs will handle this for you, but I would prefer in my use case to return the results with the bytea converted back to a varchar.
Of course I've tried something like this, among more complex options:
select my_bytea_col::VARCHAR from my_table WHERE id = 1
This, however, doesn't return my original readable text. How else can I convert my bytea back to the original varchar after postgres's implicit conversion in updates and inserts like the one above?

If the string encoding is UTF-8, you could use
SELECT convert_from(my_bytea_col, 'UTF8')
FROM my_table
WHERE id = 1;
If the encoding is different, you need to supply the appropriate second argument (e.g. LATIN1) to convert_from.
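To illustrate the full round trip, here is a minimal sketch with a hypothetical throwaway table (bytea_demo), assuming the stored text is UTF-8:
CREATE TEMP TABLE bytea_demo (id int, my_bytea_col bytea);
-- the same implicit conversion as in the question: a string literal stored as bytea
INSERT INTO bytea_demo VALUES (1, 'This varchar will be converted');
-- convert_from() recovers the original text (the encoding must match what went in)
SELECT convert_from(my_bytea_col, 'UTF8')
FROM bytea_demo
WHERE id = 1;
-- returns: This varchar will be converted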
May I remark that I consider it not a good idea to store text strings as bytea?

Related

Postgres truncates trailing zeros for timestamps

Postgres (v11.3, 64-bit, Windows) truncates trailing zeros for timestamps. So if I insert the timestamp '2019-06-12 12:37:07.880' into the table and read it back as text, Postgres returns '2019-06-12 12:37:07.88'.
Table date_test:
CREATE TABLE public.date_test (
    id SERIAL,
    "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    CONSTRAINT pkey_date_test PRIMARY KEY (id)
);
SQL command when inserting data:
INSERT INTO date_test (timestamp) VALUES( '2019-06-12 12:37:07.880' )
SQL command to retrieve data:
SELECT dt.timestamp::TEXT FROM date_test dt
returns '2019-06-12 12:37:07.88'
Do you consider this a bug or a feature?
My real issue is: I'm running queries from a C++ program and I have to convert the data returned from the database to appropriate data types. Since the protocol is text-based, everything I read from the database is plain text. When parsing timestamps, I first tokenize the string and then convert each token to an integer. Because the millisecond part is truncated, the last token is "88" instead of "880", and converting "88" to an integer yields a different value than converting "880" does.
That's the default display format when using a cast to text.
If you want to see all three digits, use to_char():
SELECT to_char(dt.timestamp, 'yyyy-mm-dd hh24:mi:ss.ms')
FROM date_test dt;
This will return '2019-06-12 12:37:07.880'.
It’s a matter of presentation only.
First note that 07.88 seconds and 07.880 seconds are the same amount of time (as are 7.88 and 07.880000000, for that matter).
PostgreSQL internally represents a timestamp in a way that we shouldn’t be concerned about as long as it’s an unambiguous representation. When you retrieve the timestamp, it is formatted into some string. This is where PostgreSQL apparently chooses not to print redundant trailing zeros. So it’s probably not even correct to say that it truncates anything. It just refrains from generating that 0.
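A quick sketch to convince yourself of that:
-- both literals denote the same instant, so the comparison is true
SELECT '2019-06-12 12:37:07.880'::timestamp = '2019-06-12 12:37:07.88'::timestamp;
-- the text cast simply never generates the redundant trailing zero
SELECT '2019-06-12 12:37:07.880'::timestamp::text;  -- 2019-06-12 12:37:07.88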
I think that the nice solution would be to modify your parser in C++ to accept any number of decimals and parse them correctly with and without trailing zeroes. Another solution that should work is given in the answer by a_horse_with_no_name.

Date/time formatting for table creation

I am creating a table that will be populated with a COPY. Here's the format of that data:
6/30/2014 2:33:00 PM
MM-DD-YYYY HH:MM:SS ??
What would I use as the formatting for the CREATE TABLE statement?
CREATE TABLE practice (
Data_Time ????
)
One alternative might be to read it in as varchar and format it later. Seems convoluted, though.
Always store timestamps as timestamp (or timestamptz).
Never use string types (text, varchar, ...) for that.
CREATE TABLE practice (
practice_id serial PRIMARY KEY
, data_time timestamp NOT NULL
);
If your timestamp literals are clean and follow the standard MDY format, you can set the DateStyle temporarily for the transaction to read proper timestamp types directly:
BEGIN;
SET LOCAL datestyle = 'SQL, MDY'; -- works for your example
COPY practice (data_time) FROM '/path/to/file.csv';
COMMIT;
Otherwise, your idea is not that bad: COPY to a temporary table with a text column, sanitize the data there, and INSERT proper timestamps from it, possibly using to_timestamp().
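A minimal sketch of that route, with a hypothetical staging table (practice_staging) and the sample format from the question:
BEGIN;
CREATE TEMP TABLE practice_staging (raw_ts text);
COPY practice_staging FROM '/path/to/file.csv';
-- sanitize here as needed, then parse literals like '6/30/2014 2:33:00 PM'
INSERT INTO practice (data_time)
SELECT to_timestamp(raw_ts, 'MM/DD/YYYY HH12:MI:SS AM')
FROM practice_staging;
COMMIT;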
You should pretty much never use varchar(n) in Postgres; always use text. But it sounds to me like you want two columns:
create table practice (date_time timestamp, format text)

Show all numeric rows or vice-versa postgresql

I have a table named "temp_table" with a column named "temp_column" of type varchar. The problem is that "temp_column" needs to be of type integer, but if I simply alter the column type, it generates an error, since some rows contain non-numeric data.
I want a query that shows all rows where "temp_column" has non-numeric values (or the other way around), so I can UPDATE/SET the values accordingly. I'm having a hard time since ISNUMERIC is not available in PostgreSQL.
How can I do this?
This will show all rows where you have non-integer values in that column. It uses a regular expression to find all values that contain anything other than digits:
select *
from temp_table
where temp_column ~ '[^0-9]';
this can also be used in an update statement:
update temp_table
set temp_column = null
where temp_column ~ '[^0-9]';
Note that this also treats "numeric" values like 3.14 as non-integers, since they contain a character other than 0-9.
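Once the offending values are cleaned up (or set to NULL as above), the column type itself can be changed in place. A sketch, assuming everything that remains is either NULL or plain digits:
ALTER TABLE temp_table
ALTER COLUMN temp_column TYPE integer
USING temp_column::integer;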

How to convert PostgreSQL escape bytea to hex bytea?

I got an answer showing how to check for one specific BOM in a PostgreSQL text column. What I would really like to have is something more general, i.e. something like
select decode(replace(textColumn, '\\', '\\\\'), 'escape') from tableXY;
The result of a UTF8 BOM is:
\357\273\277
This is bytea in escape (octal) output format; it can be shown as hex by switching bytea_output, e.g. in pgAdmin:
update pg_settings set setting = 'hex' WHERE name = 'bytea_output';
select '\357\273\277'::bytea
The result is:
\xefbbbf
What I would like to have is this result as one query, e.g.
update pg_settings set setting = 'hex' WHERE name = 'bytea_output';
select decode(replace(textColumn, '\\', '\\\\'), 'escape') from tableXY;
But that doesn't work: the result is empty, probably because decode() cannot handle hex output.
If the final purpose is to get the hexadecimal representation of all the bytes that constitute the strings in textColumn, this can be done with:
SELECT encode(convert_to(textColumn, 'UTF-8'), 'hex') from tableXY;
It does not depend on bytea_output. BTW, this setting only plays a role at the final stage of a query, when a result column of type bytea has to be returned in text format to the client (which is the most common case, and what pgAdmin does). It's a matter of representation only; the actual values represented (the series of bytes) are identical.
In the query above, the result is of type text, so the setting is irrelevant anyway.
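As a quick sanity check of that approach (a sketch assuming a UTF-8 database; E'\uFEFF' is simply a literal BOM code point):
SELECT encode(convert_to(E'\uFEFF' || 'payload', 'UTF-8'), 'hex');
-- the output starts with efbbbf, the UTF-8 encoding of the BOM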
I think your query with decode(..., 'escape') can't work because the argument is supposed to be in escape format, and it's not; per the comments, the column contains plain XML strings.
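For contrast, decode(..., 'escape') does work when the input really is in escape format, e.g. with the octal representation from earlier:
SELECT decode('\357\273\277', 'escape');
-- the same three bytes, shown as \xefbbbf with bytea_output = 'hex'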
With the great help of Daniel-Vérité, I now use this general query to check for all kinds of BOM or Unicode character problems:
select encode(convert_to(textColumn, 'UTF-8'), 'hex'), * from tableXY;
I had a problem with pgAdmin and overly long columns, as they showed no result. For pgAdmin I use this query instead:
select encode(convert_to(substr(textColumn, 1, 100), 'UTF-8'), 'hex'), * from tableXY;
Thanks Daniel!

TSQL Prefixing String Literal on Insert - Any Value to This, or Redundant?

I just inherited a project that has code similar to the following (rather simple) example:
DECLARE @Demo TABLE
(
Quantity INT,
Symbol NVARCHAR(10)
)
INSERT INTO #Demo (Quantity, Symbol)
SELECT 127, N'IBM'
My interest is with the N before the string literal.
I understand that the prefix N is to specify encoding (in this case, Unicode). But since the select is just for inserting into a field that is clearly already Unicode, wouldn't this value be automatically upcast?
I've run the code without the N and it appears to work, but am I missing something that the previous programmer intended? Or was the N an oversight on his/her part?
I expect behavior similar to when I pass an int to a decimal field (auto-upcast). Can I get rid of those Ns?
Your test is not really valid; try something like a Chinese character instead. As I remember, if you don't prefix the literal with N, the correct character will not be inserted.
For example: the first select shows a question mark, while the bottom one shows a square:
select '作'
select N'作'
A better example; even here the output is not the same:
declare @v nvarchar(50), @v2 nvarchar(50)
select @v = '作', @v2 = N'作'
select @v, @v2
Since this looks like a stock table, why are you using Unicode at all? Are there even ticker symbols that contain Unicode characters? I have never seen any, and that includes ISINs, CUSIPs and SEDOLs.
Yes, SQL Server will automatically widen varchar to nvarchar, so you can remove the N in this case. Of course, if you're specifying a string literal with characters that can't be represented in the database's default collation, then you do need it.
It's like you can suffix a number with "L" in C et al to indicate it's a long literal instead of an int. Writing N'IBM' is either being precise or a slave to habit, depending on your point of view.
One trap for the unwary: nvarchar doesn't get automatically converted to varchar, and this can be an issue if your application is all Unicode and your database isn't. For example, we had this with the jTDS JDBC driver, which bound all parameter values as nvarchar, resulting in statements effectively like this:
select * from purchase where purchase_reference = N'AB1234'
(where purchase_reference was a varchar column)
Since the automatic conversions are only one way, that became:
select * from purchase where CONVERT(NVARCHAR, purchase_reference) = N'AB1234'
and therefore the index of purchase_reference wasn't used.
By contrast, the reverse is fine: if purchase_reference was an nvarchar, and an application passed in a varchar parameter, then the rewritten query:
select * from purchase where purchase_reference = CONVERT(NVARCHAR, 'AB1234')
would be fine. In the end we had to disable binding parameters as Unicode, which in turn caused a raft of i18n problems that were considered less serious.
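A minimal repro of that trap, sketched against a hypothetical purchase table (whether the seek is really lost also depends on the column's collation):
CREATE TABLE purchase (
    purchase_id INT IDENTITY PRIMARY KEY,
    purchase_reference VARCHAR(20) NOT NULL
);
CREATE INDEX ix_purchase_reference ON purchase (purchase_reference);
-- matching types: a straightforward index seek
SELECT * FROM purchase WHERE purchase_reference = 'AB1234';
-- nvarchar literal: the varchar column side is converted up to nvarchar,
-- which can turn the seek into a scan
SELECT * FROM purchase WHERE purchase_reference = N'AB1234';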