I am trying to fix Microsoft Word smart quotes (and other Word smart characters) that were inserted into some content through copy/paste. While we work on a permanent solution, I am trying to create a script so we can fix the data as it becomes an issue.
To test it out I'm running the following query: select title from DigArticleArticle where ArticleId = 8249. This correctly retrieves our title, complete with the question mark shown in place of the invalid character. To replace this I tried the following query:
select REPLACE(title, CHAR(8216), char(39)), Title from DigArticleArticle where ArticleID = 8249
This returns null as the first column. Why would my replace return null? Even if the character code isn't found it should still return the original string.
Try:
select REPLACE(title, NCHAR(8216), char(39)), Title from DigArticleArticle where ArticleID = 8249
CHAR() only handles character codes 0 through 255. In this case the Unicode version is needed: NCHAR() handles the range 0 through 65535.
From the MSDN Docs on the argument for char
CHAR ( integer_expression )
Arguments
integer_expression
Is an integer from 0 through 255. NULL is returned if the integer expression is not in this range.
8216 is larger than 255, so CHAR(8216) is NULL.
For replace
Return Types
Returns nvarchar if one of the input arguments is of the nvarchar data type; otherwise, REPLACE returns varchar.
Returns NULL if any one of the arguments is NULL.
So you'll always get back NULL if CHAR(8216) is an argument to REPLACE.
As per trekstuff's answer, you should use NCHAR instead.
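For comparison outside the database, here is a minimal Python sketch of the same cleanup. It is only an illustration, not the T-SQL fix itself: the name fix_smart_quotes is hypothetical, and the set of characters to map is an assumption (8216 is the code point from the question, the other three are its usual companions).

```python
# Hypothetical mapping from Word "smart" quotes to plain ASCII quotes,
# keyed by Unicode code point (assumed set of characters).
SMART_QUOTES = {
    8216: "'",  # left single quote, the code point from the question
    8217: "'",  # right single quote
    8220: '"',  # left double quote
    8221: '"',  # right double quote
}


def fix_smart_quotes(text: str) -> str:
    """Replace smart quotes with their ASCII equivalents."""
    # str.translate accepts a dict keyed by code point, so values above 255
    # are handled the same way as any other character.
    return text.translate(SMART_QUOTES)


if __name__ == "__main__":
    print(fix_smart_quotes(chr(8216) + "quoted" + chr(8217)))
```

Text without any of the mapped characters comes back unchanged, which is the behaviour the question expected from REPLACE.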
I am attempting to import a CSV into ADF; however, the file header is not the first line of the file. Its position is dynamic, so I need to match it based on the first column (e.g. "TestID,"), which is a string.
Example Data (Header is on Line 4)
Date:,01/05/2022
Time:,00:30:25
Test Temperature:,25C
TestID,StartTime,EndTime,Result
TID12345-01,00:45:30,00:47:12,Pass
TID12345-02,00:46:50,00:49:12,Fail
TID12345-03,00:48:20,00:52:17,Pass
TID12345-04,00:49:12,00:49:45,Pass
TID12345-05,00:50:22,00:51:55,Fail
I found this article, which addresses this issue, but I am struggling to rewrite the expression so that it tests for a string instead of an integer.
https://kromerbigdata.com/2019/09/28/adf-dynamic-skip-lines-find-data-with-variable-headers
First Expression
iif(!isNull(toInteger(left(toString(byPosition(1)),1))),toInteger(rownum),toInteger(0))
As the article states, this expression looks at the first character of each row and, if it is an integer, returns the row number (rownum).
How do I perform this action for a string (e.g. "TestID,")?
Many Thanks
Jonny
I think you want to treat the first line that starts with a string as your header, and to ignore preceding lines that start with a number. You can use the isNan function to check whether the first character is not a number (i.e. a string), as in the modified expression below:
iif(isNan(left(toString(byPosition(1)),1))
,toInteger(rownum)
,toInteger(0)
)
Here is a breakdown of the above expression:
left(toString(byPosition(1)),1): gets the first character from the left side of the first column.
isNan: checks whether that character is "not a number".
iif: if it is not a number (true), returns rownum; otherwise (false), returns 0.
Or you can also use functions like isInteger() to check if the first character is an integer or not and perform actions accordingly.
Later on, as explained in the cited article, you need to find the minimum rownum to skip.
Hope it helps.
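To make the idea concrete outside ADF, here is a rough Python sketch of the "find the header row, then skip everything before it" logic. It is not a data flow expression; it matches the first column against the literal "TestID" rather than testing for a number, and the function names are made up for illustration. The sample data is the one from the question.

```python
import csv

SAMPLE = """Date:,01/05/2022
Time:,00:30:25
Test Temperature:,25C
TestID,StartTime,EndTime,Result
TID12345-01,00:45:30,00:47:12,Pass
TID12345-02,00:46:50,00:49:12,Fail
TID12345-03,00:48:20,00:52:17,Pass
TID12345-04,00:49:12,00:49:45,Pass
TID12345-05,00:50:22,00:51:55,Fail
"""


def find_header_index(lines, first_column="TestID"):
    """Return the 0-based index of the first row whose first column matches."""
    for index, row in enumerate(csv.reader(lines)):
        if row and row[0] == first_column:
            return index
    return None


def read_records(text, first_column="TestID"):
    """Skip the preamble and parse the remaining lines using the header row."""
    lines = text.splitlines()
    header_index = find_header_index(lines, first_column)
    if header_index is None:
        raise ValueError("no header row starting with " + first_column)
    return list(csv.DictReader(lines[header_index:]))


if __name__ == "__main__":
    for record in read_records(SAMPLE):
        print(record["TestID"], record["Result"])
```

The header index plays the same role as the minimum rownum in the article: it is the number of lines to skip before the real data starts.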
This is not obvious to me.
When I do:
SELECT MAX("SequenceNumber") FROM "TrackingEvent";
It returns perfectly fine with the correct result
When I do:
SELECT nextval(pg_get_serial_sequence("TrackingEvent", "SequenceNumber")) AS NextId
It returns an error which says
column "TrackingEvent" does not exist.
Not only is it wrong, but the first argument of the function pg_get_serial_sequence takes a table name and not a column name, so the error is also misleading.
Anyway, can someone explain to me why I get an error on the pg_get_serial_sequence function?
pg_get_serial_sequence() expects a string as its argument, not an identifier. String constants are written with single quotes in SQL: "TrackingEvent" is an identifier, while 'TrackingEvent' is a string constant.
But because the function converts the string constant to an identifier, you need to include the double quotes as part of the string constant. This, however, only applies to the table name, not the column name, as explained in the manual:
Because the first parameter is potentially a schema and table, it is not treated as a double-quoted identifier, meaning it is lower cased by default, while the second parameter, being just a column name, is treated as double-quoted and has its case preserved.
So you need to use:
SELECT nextval(pg_get_serial_sequence('"TrackingEvent"', 'SequenceNumber'))
This is another good example of why using quoted identifiers is a bad idea. You should rename "TrackingEvent" to tracking_event and "SequenceNumber" to sequence_number.
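As a rough mental model of the folding rule quoted from the manual, here is a small Python sketch. It is a deliberate simplification and not how PostgreSQL actually parses names: it ignores schema-qualified names, and fold_identifier is a hypothetical helper.

```python
def fold_identifier(name: str) -> str:
    """Simplified model of identifier case folding for a single name.

    Assumption: the name is not schema-qualified. A double-quoted name keeps
    its case; an unquoted name is folded to lower case.
    """
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Strip the surrounding quotes; a doubled quote inside stands for one quote.
        return name[1:-1].replace('""', '"')
    return name.lower()


if __name__ == "__main__":
    print(fold_identifier('TrackingEvent'))    # unquoted: folded
    print(fold_identifier('"TrackingEvent"'))  # quoted: case preserved
```

This is why 'TrackingEvent' ends up looking for a table named trackingevent, while '"TrackingEvent"' finds the mixed-case table.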
I am importing data into a Postgres database. The table I am importing to includes a couple of columns with dates.
The CSV file I am uploading, however, has empty values for some of the date fields.
The table looks like this:
dot_number bigint,
legal_name character varying,
dba_name character varying,
carrier_operation character varying,
hm_flag character varying,
pc_flag character varying,
...
mcs150_date date,
mcs150_mileage bigint,
The data looks like this:
1000045,"GLENN M HINES","","C","N","N","317 BURNT BROW RD","HAMMOND","ME","04730","US","317 BURNT BROW RD","HAMMOND","ME","04730","US","(207) 532-4141","","","19-NOV-13","20000","2012","23-JAN-02","ME","1","2"
1000050,"ROGER L BUNCH","","C","N","N","108 ST CHARLES CT","GLASGOW","KY","42141","US","108 ST CHARLES CT","GLASGOW","KY","42141","US","(270) 651-3940","","","","72000","2001","23-JAN-02","KY","1","1"
I have tried doing this:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER NULL '';
But I get this error:
ERROR: invalid input syntax for type date: "" CONTEXT: COPY cc, line 24, column mcs150_date: ""
********** Error **********
ERROR: invalid input syntax for type date: "" SQL state: 22007
Context: COPY cc, line 24, column mcs150_date: ""
This is probably pretty simple, but none of the solutions I've found online worked.
You need to specify the QUOTE character so that "" would be interpreted as NULL, like so:
COPY CC FROM 'C:\Users\Owner\Documents\FMCSA Data\FMCSA_CENSUS1_2016Sep.txt' DELIMITER ',' CSV HEADER QUOTE '"' NULL '';
QUOTE '"' was the addition.
Docs: https://www.postgresql.org/docs/current/static/sql-copy.html
I ended up importing as text and then altering the tables according to the correct type.
Just for any future reference.
The docs at https://www.postgresql.org/docs/current/sql-copy.html say:
NULL
Specifies the string that represents a null value. The default is \N
(backslash-N) in text format, and an unquoted empty string in CSV
format. You might prefer an empty string even in text format for
cases where you don't want to distinguish nulls from empty strings.
This option is not allowed when using binary format.
So remove the quotes around the empty strings in the data to obtain a NULL value for these empty date fields.
Just for future reference, the issue here was probably the date format of the non-null date values. It's common for an MS Excel file saved to CSV to have dates in that format, 01-JUL-16, but when doing a COPY, PostgreSQL only accepts date strings that match one of the format masks it can handle by default, so you may first need to convert the values to one of the standard date formats[1].
That, AND the null handling for null date values.
[1] (and perhaps dealt with the consequences of having a 2-digit year upfront, particularly that years prior to 1969 will be interpreted as 20xx).
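If you end up pre-processing the file before the COPY, a conversion step could look roughly like the Python sketch below. It is a sketch only: parse_census_date is a made-up name, it assumes English month abbreviations in the current locale, and it treats an empty field as NULL (None).

```python
from datetime import date, datetime
from typing import Optional


def parse_census_date(value: str) -> Optional[date]:
    """Convert a DD-MON-YY string such as 19-NOV-13 to a date; empty means NULL."""
    if value == "":
        return None
    # %b matches the month abbreviation case-insensitively.
    # %y follows the POSIX pivot: 69-99 become 1969-1999, 00-68 become 2000-2068.
    return datetime.strptime(value, "%d-%b-%y").date()


if __name__ == "__main__":
    for raw in ["19-NOV-13", "", "23-JAN-02"]:
        parsed = parse_census_date(raw)
        # ISO 8601 output is a format PostgreSQL accepts for date columns.
        print(repr(raw), "->", parsed.isoformat() if parsed else "NULL")
```

The two-digit-year pivot is the same consequence the footnote warns about, so check it against the real range of dates in your data.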
I'm trying to clean out excessive trailing zeros, I used the following query...
UPDATE _table_ SET _column_=trim(trailing '00' FROM '_column_');
...and I received the following error:
ERROR: column "_column_" is of type numeric but expression is of type text.
I've played around with the quotes, since that is usually what it boils down to for text versus numeric, but without any luck.
The CREATE TABLE syntax:
CREATE TABLE _table_ (
id bigint NOT NULL,
x bigint,
y bigint,
_column_ numeric
);
You can cast the argument to text and the result back to numeric:
UPDATE _table_ SET _column_=trim(trailing '00' FROM _column_::text)::numeric;
Also note that you don't quote column names with single quotes as you did.
Postgres version 13 now comes with the trim_scale() function:
UPDATE _table_ SET _column_ = trim_scale(_column_);
trim takes string parameters, so _column_ has to be cast to a string (varchar for example). Then, the result of trim has to be cast back to numeric.
UPDATE _table_ SET _column_=trim(trailing '00' FROM _column_::varchar)::numeric;
Another (arguably more consistent) way to clean out the trailing zeroes from a NUMERIC field would be to use something like the following:
UPDATE _table_ SET _column_ = CAST(to_char(_column_, 'FM999999999990.999999') AS NUMERIC);
Note that you would have to modify the FM pattern to match the maximum expected precision and scale of your _column_ field. For more details on the FM pattern modifier and/or the to_char(..) function see the PostgreSQL docs here and here.
Edit: Also, see the following post on the gnumed-devel mailing list for a longer and more thorough explanation on this approach.
Be careful with all the answers here. Although this looks like a simple problem, it's not.
If you have pg 13 or higher, you should use trim_scale (there is an answer about that already). If not, here is my "Polyfill":
DO $x$
BEGIN
    -- Create the function only when no trim_scale exists yet (PostgreSQL < 13).
    IF NOT EXISTS (SELECT 1 FROM pg_proc WHERE proname = 'trim_scale') THEN
        CREATE FUNCTION trim_scale(numeric) RETURNS numeric AS $$
            SELECT CASE
                WHEN trim($1::text, '0')::numeric = $1 THEN trim($1::text, '0')::numeric
                ELSE $1
            END
        $$ LANGUAGE SQL;
    END IF;
END;
$x$;
And here is a query for testing the answers:
WITH test as (SELECT unnest(string_to_array('1|2.0|0030.00|4.123456000|300000','|'))::numeric _column_)
SELECT _column_ original,
trim(trailing '00' FROM _column_::text)::numeric accepted_answer,
CAST(to_char(_column_, 'FM999999999990.999') AS NUMERIC) another_fancy_one,
CASE WHEN trim(_column_::text, '0')::numeric = _column_ THEN trim(_column_::text, '0')::numeric ELSE _column_ END my FROM test;
Well... it looks like I'm trying to show the flaws of the earlier answers, but I just can't come up with other test cases. Maybe you should write more, if you can.
I like short syntax instead of fancy SQL keywords, so I always go with :: over CAST and a function call with comma-separated args over constructs like trim(trailing '00' FROM _column_). But that's personal taste only; you should check your company or team standards (and fight to change them XD).
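To show why the plain text trim is risky, here is a small Python sketch using the same test values as the query above. It only models the idea with Python's Decimal type and is not PostgreSQL's trim_scale; the function names are hypothetical and only finite values are handled.

```python
from decimal import Decimal


def naive_trim_text(value: Decimal) -> str:
    """Mimic trimming every trailing '0' from the text form of the number."""
    return str(value).rstrip("0")


def strip_trailing_zeros(value: Decimal) -> Decimal:
    """Drop trailing zeros after the decimal point only (finite values only)."""
    normalized = value.normalize()
    if normalized.as_tuple().exponent > 0:
        # normalize() turns 300000 into 3E+5; expand it back to a plain integer.
        return normalized.quantize(Decimal(1))
    return normalized


if __name__ == "__main__":
    for text in ["1", "2.0", "0030.00", "4.123456000", "300000"]:
        number = Decimal(text)
        print(text, naive_trim_text(number), strip_trailing_zeros(number))
```

The interesting case is 300000: a blind text trim eats the significant zeros of an integer, which is exactly what a fractional-only trim has to avoid.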
I have a variable l_string varchar2(100).
The value of the string is l_string='L000EW45UY678IOP'.
I want to extract the digits from this string, like this:
00045678
Please tell me how to get the above result.
Google the TRANSLATE function.
It will let you replace the alphabetic characters by nothing, leaving the digits behind.
Try this:
regexp_replace('L000EW45UY678IOP', '[^0-9]', '')
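The same pattern can be checked quickly outside the database. This is a small Python sketch, with extract_digits as a made-up name, that removes every character that is not a digit:

```python
import re


def extract_digits(text: str) -> str:
    """Remove every character that is not 0-9, keeping the digits in order."""
    return re.sub(r"[^0-9]", "", text)


if __name__ == "__main__":
    print(extract_digits("L000EW45UY678IOP"))
```

The leading zeros survive because the result stays a string; converting it to a number would drop them.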