TSQL Prefixing String Literal on Insert - Any Value to This, or Redundant? - tsql

I just inherited a project that has code similar to the following (rather simple) example:
DECLARE @Demo TABLE
(
Quantity INT,
Symbol NVARCHAR(10)
)
INSERT INTO @Demo (Quantity, Symbol)
SELECT 127, N'IBM'
My interest is with the N before the string literal.
I understand that the prefix N is to specify encoding (in this case, Unicode). But since the select is just for inserting into a field that is clearly already Unicode, wouldn't this value be automatically upcast?
I've run the code without the N and it appears to work, but am I missing something that the previous programmer intended? Or was the N an oversight on his/her part?
I expect behavior similar to when I pass an int to a decimal field (auto-upcast). Can I get rid of those Ns?

Your test is not really valid; try something like a Chinese character instead. As I recall, if you don't prefix the literal, the correct character will not be inserted.
For example, the first query below shows a question mark, while the second one shows a square:
select '作'
select N'作'
A better example; even here the output is not the same:
declare @v nvarchar(50), @v2 nvarchar(50)
select @v = '作', @v2 = N'作'
select @v, @v2
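The character is lost because the unprefixed literal is first interpreted as varchar in the database's code page, and only then widened to nvarchar. A rough Python illustration of that step follows. The code page (Windows-1252) is an assumption, since the real one depends on the database collation, and through_code_page is a hypothetical helper name, not anything SQL Server exposes.

```python
def through_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page, roughly the way an
    unprefixed literal is squeezed into varchar before being widened."""
    # Characters the code page cannot represent are replaced with '?'.
    return text.encode(code_page, errors="replace").decode(code_page)


print(through_code_page("IBM"))  # IBM
print(through_code_page("作"))   # ?
```

Plain ASCII such as 'IBM' survives the round trip, which is why the original test appeared to work without the N.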
Since this looks like a stock table, why are you using Unicode at all? Are there even ticker symbols that need Unicode? I have never seen any, and that includes ISINs, CUSIPs and SEDOLs.

Yes, SQL Server will automatically convert (widen) varchar to nvarchar, so you can remove the N in this case. Of course, if you're specifying a string literal whose characters can't be represented in the code page of the database's default collation, then you need it.
It's like you can suffix a number with "L" in C et al to indicate it's a long literal instead of an int. Writing N'IBM' is either being precise or a slave to habit, depending on your point of view.
One trap for the unwary: nvarchar doesn't get automatically converted to varchar, and this can be an issue if your application is all Unicode and your database isn't. For example, we had this with the jTDS JDBC driver, which bound all parameter values as nvarchar, resulting in statements effectively like this:
select * from purchase where purchase_reference = N'AB1234'
(where purchase_reference was a varchar column)
Since the automatic conversions are only one way, that became:
select * from purchase where CONVERT(NVARCHAR, purchase_reference) = N'AB1234'
and therefore the index of purchase_reference wasn't used.
By contrast, the reverse is fine: if purchase_reference was an nvarchar, and an application passed in a varchar parameter, then the rewritten query:
select * from purchase where purchase_reference = CONVERT(NVARCHAR, 'AB1234')
would be fine. In the end we had to disable binding parameters as Unicode, hence causing a raft of i18n problems that were considered less serious.

Related

Alphanumeric sorting without any pattern on the strings [duplicate]

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record into the table:
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip the letters from em_code using a regexp and store them in ec_alpha
Cast the remaining part to an integer, ec_num
Increment it by one: ec_num++
Pad with sufficient zeros and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
The first step will return EM999 instead of EM1000, so it will generate EM1000 again as the new em_code, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 10, it is possible to specify an ICU collation that sorts numbers within strings naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
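To make the difference concrete, here is a small Python sketch of the same comparison outside the database. It assumes a fixed two-letter prefix, as substring(em_code, 3) does, and next_code is a hypothetical helper written only for this illustration.

```python
codes = ["EM001", "EM999", "EM1000"]

# Plain string comparison: '9' > '1' at the third character, so EM999 "wins".
last_by_text = max(codes)

# Compare on the numeric part only, like ORDER BY substring(em_code, 3)::int.
last_by_number = max(codes, key=lambda code: int(code[2:]))


def next_code(code: str, width: int = 3) -> str:
    """Increment the numeric part and pad it back to at least `width` digits."""
    prefix, number = code[:2], int(code[2:])
    return prefix + str(number + 1).zfill(width)


print(last_by_text)               # EM999
print(last_by_number)             # EM1000
print(next_code(last_by_number))  # EM1001
```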
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
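For comparison, the same replacement in Python, as a minimal sketch (digits_only is a hypothetical name). Python's re.sub replaces every occurrence by default, so there is no counterpart to the 'g' flag.

```python
import re


def digits_only(code: str) -> str:
    """Remove every non-digit character, keeping only the digits."""
    return re.sub(r"\D", "", code)


print(digits_only("EM1000"))  # 1000
```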
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
You can use just this line:
ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
I thought of another way of doing this that uses less database storage than padding and takes less time than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
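To see what the generated sort keys look like, here is a rough Python analogue of the function above. It is a sketch rather than a port: natsort_key is a hypothetical name, and str.rjust never truncates, whereas lpad cuts values longer than the target length.

```python
import re


def natsort_key(s: str, width: int = 20) -> str:
    """Build a plain string key: each run of non-digits, a \\x01 separator,
    then the following run of digits left-padded with zeros to `width`."""
    parts = re.findall(r"(\D*)(\d*)", s)
    return "".join(text + "\x01" + digits.rjust(width, "0") for text, digits in parts)


codes = ["EM999", "EM1000", "EM001"]
print(sorted(codes))                   # ['EM001', 'EM1000', 'EM999']
print(sorted(codes, key=natsort_key))  # ['EM001', 'EM999', 'EM1000']
```

Because the key is an ordinary string, it compares correctly with plain string ordering, which is what makes the SQL version easy to index.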

Converting bytea back to varchar

In Postgres when I want to save a varchar to a bytea column, this is made easy by an implicit conversion. So I can simply execute
UPDATE my_table SET my_bytea_col = 'This varchar will be converted' WHERE id = 1;
I use this all the time. However, I would like to occasionally see the contents of this column as a varchar. IDEs will handle this for you, but I would prefer in my use case to return the results with the bytea converted back to a varchar.
Of course I've tried something like this, among more complex options:
select my_bytea_col::VARCHAR from my_table WHERE id = 1
This, however, doesn't return my original readable text. How else can I convert my bytea back to the original varchar after postgres's implicit conversion in updates and inserts like the one above?
If the string encoding is UTF-8, you could use
SELECT convert_from(my_bytea_col, 'UTF8')
FROM my_table
WHERE id = 1;
If the encoding is different, you need to supply the appropriate second argument (e.g. LATIN1) to convert_from.
May I remark that I consider it not a good idea to store text strings as bytea?
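The difference between the two results can be sketched in Python. This assumes the text was stored as UTF-8 and that the server shows bytea in its hex output format; both are assumptions about the setup, not something stated in the question.

```python
# What ends up in the bytea column: the raw bytes of the string.
stored = "This varchar will be converted".encode("utf-8")

# Roughly what a plain cast to text displays: an escaped hex string.
as_hex_text = "\\x" + stored.hex()

# The analogue of convert_from(my_bytea_col, 'UTF8'): decode the bytes.
decoded = stored.decode("utf-8")

print(as_hex_text[:10])  # \x54686973
print(decoded)           # This varchar will be converted
```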

SQL WHERE clause not functional with string

I am trying to run a query that has a where clause with a string from a column of type VARCHAR(50) through PHP, yet for some reason it does not work in either PHP or MySQLWorkbench. My database looks like:
The table title is 'paranoia' where the column 'codename' is VARCHAR(50) and 'target' is VARCHAR(50). The query I am trying to run takes the form, when searching for a codename entry clearly named '13Brownie' with no spaces, as follows:
UPDATE paranoia SET target='sd' WHERE codename='13Brownie'
Yet for some reason passing a string as the argument for codename is ineffective. The WHERE clause works when I do codename=7 or codename=3 and returns those respective integer codenames, and I can do codename=0 to get all the other lettered codenames. The string input works in neither MySQLWorkbench nor the PHP script I will be using to update the selected rows, but the integer input does.
It seems like the WHERE clause is only taking the integer values of my string input or the column is actually made up of the integer values of each entry, but the column codename is clearly defined as VARCHAR(50). I have been searching for hours but to no avail.
It is likely that there are white-space characters in the data. Things to try:
SELECT * FROM paranoia WHERE codename like '13%'
SELECT * FROM paranoia WHERE codename = '13Brownie '
SELECT codename, LENGTH(codename) FROM paranoia
VARCHAR(10) is a valid type that accepts a string of at most 10 characters. I think this could happen because of a foreign key constraint enforced against another table. Check whether you have such a constraint using the "relation view" if you are on phpMyAdmin.

Postgresql not truncating overlength strings

According to the documentation, strings longer than that specified by character varying or VARCHAR should be truncated:
If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)
but I can not get it to work. Now the documentation does say one has to "explicitly" cast a value to character varying so maybe I am missing that. Below is a simple test table:
create table test1(
tval character varying(20));
where the following fails with ERROR: value too long for type character varying(20)
insert into test1 values
('this is a super long string that we want to see if it is really truncated');
How can I get this to work?
This won't truncate, because it's just an assignment:
create table test1(tval character varying(20));
insert into test1 values ('this is a super long string that we want to see if it is really truncated');
but this will, because it's an explicit cast:
insert into test1 values (CAST('this is a super long string that we want to see if it is really truncated' AS varchar(20)));
To get truncation behaviour you must use an explicit cast, and frankly I wish the SQL standard didn't specify that.
The better way to handle this is to be explicit about what you want:
insert into test1 values (left('this is a super long string that we want to see if it is really truncated', 20));
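The two behaviours can be modelled in a few lines of Python. This is only a sketch: assign_varchar and cast_varchar are hypothetical names, and the one subtlety included is the documented rule that plain assignment still truncates when the excess characters are all spaces.

```python
def assign_varchar(value: str, n: int) -> str:
    """Model of plain assignment to varchar(n): an over-length value is an
    error unless the excess consists only of spaces."""
    if value[n:].strip(" "):
        raise ValueError(f"value too long for type character varying({n})")
    return value[:n]


def cast_varchar(value: str, n: int) -> str:
    """Model of an explicit CAST(value AS varchar(n)): silently truncates."""
    return value[:n]


long_text = "this is a super long string that we want to see if it is really truncated"
print(cast_varchar(long_text, 20))  # this is a super long
```

Calling assign_varchar(long_text, 20) raises ValueError, mirroring the error from the plain INSERT.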
There is another solution, not to specify the n when creating the column:
If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.

how to get db2 without any appended values

select rtrim(char(PKG_AGR_IDR)),rtrim(char(STA_DTE))
from test FETCH FIRST 10 ROW ONLY
"0010000010. 2014-03-14"
"0010000010. 2014-03-14"
I need data as below:
0010000010 2014-03-14
I am planning to write a script that applies rtrim(char(fieldname)) to each field. Is there any combination of functions that gives the proper output for both fields?
One might presume that the OP might have been written more like the following, to better describe the scenario:
Some background about what is being done will be included, such that later references [such as to field_name] will be previously-explained rather than having to be intuited by a reviewer.
The intention is to enable dynamically generating an SQL SELECT statement that will retrieve a character-representation of the data from the columns of a specified TABLE. Given the DDL create table THE_SCHEMA.TEST ( PKG_AGR_IDR NUMERIC(10, 0), STA_DTE DATE ) and given the following DML used to populate that TABLE with a sample-row insert into THE_SCHEMA.TEST VALUES(10000010, '2014-03-14'), what is desired is to obtain a result-set [limited to the first ten rows for the purpose of testing] that would include the data from each column [of the TABLE named TEST in THE_SCHEMA] as VARCHAR data, as produced from the following query that would have been generated from the metadata stored in the SYSCOLUMNS catalog VIEW:
select rtrim(char(PKG_AGR_IDR)), rtrim(char(STA_DTE)) from test FETCH FIRST 10 ROW ONLY
The single expression generated as 'RTRIM(CHAR(' CONCAT COLUMN_NAME CONCAT '))' from the SYSCOLUMNS data, as seen twice in the query noted just prior, seems unable to provide desirable results when applied to a column-name irrespective the value of the DATA_TYPE of the COLUMN_NAME being formatted by that character-expression. Specifically, for example, the result of the dynamically generated query select RTRIM(CHAR(PKG_AGR_IDR)), RTRIM(CHAR(STA_DTE)) from THE_SCHEMA.TEST FETCH FIRST 10 ROW ONLY produces the following output:
0010000010. 2014-03-14
However the expected\desired output would be:
0010000010 2014-03-14
Is there any expression like RTRIM(CHAR(column_name)) that will function for all the columns in a TABLE, to obtain the data as character-string, regardless the data-type of the columns, whether they be numeric, varchar or date?
Yet even with that more complete description of the scenario\background:
The claims about what is the output from the original expression are unexpected from the CHAR scalar effecting Decimal to Character casting, at least for the DB2 for i SQL for which the zero-scale packed decimal (DECIMAL) and zoned decimal (NUMERIC) SQL data types are represented without a decimal separator [aka decimal point] despite the optional decimal-character as the second argument. As well the CHAR scalar omits leading zeroes when casting from numeric. Thus the DB2 for i SQL would have obtained a result of the string '10000010' rather than either of '0010000010.' or '10000010.'
I suppose the issue may be specific to the DB2 for Z or the DB2 LUW, and perhaps this topic was incorrectly tagged with DB2 for i? Or perhaps there may be a[n unstated] concern about an apparent incompatibility betwixt the DB2 variants? Yet having read the documentation, the described results seem contrary to what is documented, so I suspect the actual problem for the OP may be due to having encountered a defect [in whatever is the unstated variant of the DB2 and release level that is being used].?
I do not expect that there will be any one expression that will perform what is desired for each of NUMERIC, VARCHAR, and DATE [nor for each of INTEGER, SMALLINT, NUMERIC, DECIMAL, VARCHAR, and DATE]. For omission of the decimal point, the DB2 for i SQL is probably the most like what is expressed as desired, but then the leading zeroes are always trimmed http://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/db2/rbafzscachar.htm
... Leading zeros are not returned. Trailing zeros are returned. If the scale of decimal-expression is zero, the decimal character is not returned. ...
The DB2 LUW SQL seems at least somewhat incoherent with regard to the topic of leading zeroes, as example 6 suggests none and then example 7 shows they are there, but like the above doc reference, clearly there should be no leading zero characters http://www.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000777.html
... Leading zeros are not included. Trailing zeros are included. ... If the scale of decimal-expression is zero, the decimal character is not returned. ...
I did not research a DB2 for Z doc link.
I would expect that the solution will entail using a CASE expression, perhaps for the DATA_TYPE value. That is what I did coding something similar, though I just used VARCHAR casting scalar and did not do any trimming. However my requirement for CASE was not about keeping leading zero characters, instead mostly for choosing the correct decimal-separator character. And because the second argument decimal-character [for CHAR or VARCHAR] is disallowed for the INTEGER numeric types [sqlcode -171 aka SQL0171], the CASE expression for just the numeric types would be sufficiently resolved using just the following expression CASE WHEN DATA_TYPE IN ('INTEGER', 'SMALLINT', 'BIGINT') THEN ', ' concat DecSep concat ')' ELSE ')' appended to the 'VARCHAR(' concat where DecSep was the one-character variable having either the comma or period as the chosen decimal separator. Yet because the second argument [for CHAR or VARCHAR] is specific to the data type of the first argument, the character and date\time data types had their own CASE expression CASE WHEN DATA_TYPE IN ('DATE', 'TIME') THEN ', ' concat StdFmt concat ')' ELSE ')' appended to the 'VARCHAR(' concat where StdFmt was the three-character variable having one of the standards format specifications of ISO, USA, EUR, or JIS.
Not sure what you are asking. Remove double quotes? remove dot?
You can use substr, giving the start position and the length, and concatenate the two values.
select substr(trim(PKG_AGR_IDR), 2, 11) || ' ' || trim(char(STA_DTE))
from test FETCH FIRST 10 ROW ONLY