T-SQL implicit conversion between 2 varchars - tsql

I have some T-SQL (SQL Server 2008) that I inherited and am trying to find out why some of the queries are running really slowly. In the actual execution plan I have three clustered index scans costing 19%, 21% and 26%, so this seems to be the source of my problem.
The contents of the fields are usually numeric (but some job numbers have an alpha prefix).
The database design (vendor supplied) is pretty poor. The max length of a job number in their application is 12 chars, but in the tables that are joined it is defined as varchar(50) in some places and varchar(15) in others. My parameter is a varchar(12), but I get the same thing if I change it to a varchar(50).
The node contains this:
Predicate: [Live_Costing].[dbo].[TSTrans].[JobNo] as [sts1].[JobNo]=CONVERT_IMPLICIT(varchar(50),[#JobNo],0)
sts1 is a derived table, but the table it pulls jobno from is a varchar(50)
I don't understand why it's doing an implicit conversion between 2 varchars. Is it just because they are different lengths?
I'm fairly new to the execution plan
Is there an easy way to figure out which node in the exec plan relates to which part of the query?
Is the predicate the join clause?
Regards
Mark

Some variables can have a collation.
Regardless you need to verify your collations, which can be specified at server, DB, table, and column level.
First, check your collation between tempdb and the vendor supplied database. It should match. If it doesn't, it will tend to do implicit conversions.
Assuming you cannot modify the vendor supplied code base, one or more of the following should help you:
1) Predefine your temp tables and specify the same collation for the key field as in the db in use, rather than tempdb.
2) Provide collations when doing string comparisons.
3) Specify collation for key values if using "select into" with a temp table
4) Make sure your collations on your tables and columns match your database collation (VERY important if you imported only specific tables from a vendor into an existing database.)
If you can change the vendor supplied code base, I would suggest reviewing the cost of making all of your char keys the same length and NOT varchar. In SQL Server a varchar value carries 2 bytes of storage overhead on top of the actual data length. The caveat is that if you create a fixed length character field not null, it will be padded to the right (unavoidable).
Ideally, you would have int keys, and only use varchar fields for user interaction/lookup:
create table Products(ProductID int not null identity(1,1) primary key clustered, ProductNumber varchar(50) not null)
alter table Products add constraint uckProducts_ProductNumber unique(ProductNumber)
Then do all joins on ProductID rather than ProductNumber. Filtering on ProductNumber alone would be perfectly fine.

Related

Is there an efficiency difference between varchar and int as PK

Could somebody tell me whether it is a good idea to use varchar as a PK? I mean, is it less efficient than or equal to int/uuid?
For example: I want to use a car VIN as a PK, but I'm not sure how well it will be indexed or work as an FK, or whether there are some pitfalls.
It depends on which kind of data you are going to store.
In some cases (I would say in most cases) it is better to use integer-based primary keys:
1) bigint needs only 8 bytes, whereas a varchar can require more space. For this reason, a varchar comparison is often more costly than a bigint comparison.
2) When joining tables, it is more efficient to join on integer-based values rather than strings.
3) An integer-based unique key is more appropriate for table relations, for instance if you are going to store this primary key in other tables as a separate column. Again, a varchar will require more space in the other tables too (see point 1).
This post on stackexchange compares non-integer types of primary keys on a particular example.

Does PostgreSQL create an internal key (probably an int type) as primary key for a table without a primary key specified?

From https://stackoverflow.com/a/40597571/3284469
If you don't specify a primary key, RDBMS will help you choose an unique and non-null key, OR create an internal key (probably an int type) as primary key for this table.
Could you give some examples for the "OR" case, where an RDBMS (PostgreSQL in particular, and possibly also MySQL or SQL Server) creates an "internal key (probably an int type) as primary key" for a table without a primary key specified?
Does PostgreSQL have something similar to MySQL?
Thanks.
For Postgres:
From "5.4. System Columns":
oid
The object identifier (object ID) of a row. This column is only present if the table was created using WITH OIDS, or if the default_with_oids configuration variable was set at the time. This column is of type oid (same name as the column); see Section 8.18 for more information about the type.
and
ctid
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.
Both come close to what you're searching for but have restrictions as you can read in the documentation. So, as the manual states, using a user-defined PK is the better choice.
For SQL Server:
There is the undocumented pseudo column %%physloc%%. It describes the physical location of a row. That, however, might be subject to change if the row gets physically moved for whatever reason. And it's undocumented, that is, its behavior might change any time between releases or even just patches, or it might be removed completely without further notice. So using a user-defined PK is the better choice here too.

Create big integer from the big end of a uuid in PostgreSQL

I have a third-party application connecting to a view in my PostgreSQL database. It requires the view to have a primary key but can't handle the UUID type (which is the primary key for the view). It also can't handle the UUID as the primary key if it is served as text from the view.
What I'd like to do is convert the UUID to a number and use that as the primary key instead. However,
SELECT x'14607158d3b14ac0b0d82a9a5a9e8f6e'::bigint
Fails because the number is out of range.
So instead, I want to use SQL to take the big end of the UUID and create an int8 / bigint. I should clarify that maintaining order is 'desirable' but I understand that some of the order will change by doing this.
I tried:
SELECT x(substring(UUID::text from 1 for 16))::bigint
but the x operator for converting hex doesn't seem to like brackets. I abstracted it into a function but
SELECT hex_to_int(substring(UUID::text from 1 for 16))::bigint
still fails.
How can I get a bigint from the 'big end' half of a UUID?
Fast and without dynamic SQL
Cast the leading 16 hex digits of a UUID in text representation as bitstring bit(64) and cast that to bigint. See:
Convert hex in text representation to decimal number
Conveniently, excess hex digits to the right are truncated in the cast to bit(64) automatically - exactly what we need.
Postgres accepts various formats for input. Your given string literal is one of them:
14607158d3b14ac0b0d82a9a5a9e8f6e
The default text representation of a UUID (and the text output in Postgres for data type uuid) adds hyphens at predefined places:
14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e
The manual:
A UUID is written as a sequence of lower-case hexadecimal digits, in
several groups separated by hyphens, specifically a group of 8 digits
followed by three groups of 4 digits followed by a group of 12 digits,
for a total of 32 digits representing the 128 bits.
If input format can vary, strip hyphens first to be sure:
SELECT ('x' || translate(uuid_as_string, '-', ''))::bit(64)::bigint;
Cast actual uuid input with uuid::text.
Note that Postgres uses signed integers, so the bigint overflows to negative numbers in the upper half, which should be irrelevant for this purpose.
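To make the arithmetic concrete outside the database, here is a minimal Python sketch of the same idea: take the leading 64 bits of the UUID and reinterpret them as a signed 64-bit value. The helper name uuid_high_bigint is hypothetical, and the sketch assumes the cast behaves as a plain two's-complement reinterpretation, as the note above about negative numbers suggests. It is a cross-check for the SQL, not a replacement for it.

```python
import uuid

def uuid_high_bigint(value: str) -> int:
    """Return the leading 64 bits of a UUID as a signed 64-bit integer.

    Hypothetical helper for illustration only.
    """
    # uuid.UUID accepts the text form with or without hyphens.
    high = uuid.UUID(value).int >> 64  # top 64 of the 128 bits
    # Reinterpret as two's complement, the way a signed bigint would read it.
    return high - (1 << 64) if high >= (1 << 63) else high

# The UUID from the question, in both accepted spellings.
print(uuid_high_bigint("14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e"))
print(uuid_high_bigint("14607158d3b14ac0b0d82a9a5a9e8f6e"))
# A UUID whose top bit is set maps to a negative number.
print(uuid_high_bigint("ffffffff-ffff-4fff-bfff-ffffffffffff"))
```

Because the hyphens are stripped before the bits are taken, both spellings of the same UUID give the same number.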
DB design
If at all possible add a bigserial column to the underlying table and use that instead.
This is all very shaky, both the problem and the solution you describe in your self-answer.
First, a mismatch between a database design and a third-party application is always possible, but usually indicative of a deeper problem. Why does your database use the uuid data type as a PK in the first place? They are not very efficient compared to a serial or a bigserial. Typically you would use a UUID if you are working in a distributed environment where you need to "guarantee" uniqueness over multiple installations.
Secondly, why does the application require the PK to begin with (incidentally: views do not have a PK, the underlying tables do)? If it is only to view the data then a PK is rather useless, particularly if it is based on a UUID (and there is thus no conceivable relationship between the PK and the rest of the tuple). If it is used to refer to other data in the same database or do updates or deletes of existing data, then you need the exact UUID and not some extract of it because the underlying table or other relations in your database would have the exact UUID. Of course you can convert all UUID's with the same hex_to_int() function, but that leads straight back to my point above: why use uuids in the first place?
Thirdly, do not mess around with things you have little or no knowledge of. This is not intended to be offensive, take it as well-meant advice (look around on the internet for programmers who tried to improve on cryptographic algorithms or random number generation by adding their own twists of obfuscation; quite entertaining reads). There are 5 algorithms for generating UUID's in the uuid-ossp package and while you know or can easily find out which algorithm is used in your database (the uuid_generate_vX() functions in your table definitions, most likely), do you know how the algorithm works? The claim of practical uniqueness of a UUID is based on its 128 bits, not a 64-bit extract of it. Are you certain that the high 64-bits are random? My guess is that 64 consecutive bits are less random than the "square root of the randomness" (for lack of a better way to phrase the theoretical drop in periodicity of a 64-bit number compared to a 128-bit number) of the full UUID. Why? Because all but one of the algorithms are made up of randomized blocks of otherwise non-random input (such as the MAC address of a network interface, which is always the same on a machine generating millions of UUIDs). Had 64 bits been enough for randomized value uniqueness, then a uuid would have been that long.
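One way to see what actually lives in the high half of a UUID is to inspect it directly. The following Python sketch uses only the standard uuid module; the helper names high64 and version_from_high64 are hypothetical. It does not settle how likely collisions are. It only shows that the high 64 bits always contain a fixed 4-bit version field, and that for a version 1 UUID they are exactly the three timestamp fields (the node value sits in the low half).

```python
import uuid

def high64(u: uuid.UUID) -> int:
    """Top 64 bits of the 128-bit UUID value (hypothetical helper)."""
    return u.int >> 64

def version_from_high64(high: int) -> int:
    """The version nibble occupies bits 12..15 of the high half."""
    return (high >> 12) & 0xF

v4 = uuid.uuid4()  # random-based UUID
v1 = uuid.uuid1()  # time-based UUID

print(version_from_high64(high64(v4)))
print(version_from_high64(high64(v1)))

# For version 1 the high half is the timestamp fields and nothing else.
rebuilt = (v1.time_low << 32) | (v1.time_mid << 16) | v1.time_hi_version
print(rebuilt == high64(v1))
```

So even for a random (version 4) UUID, at most 60 of the 64 extracted bits can vary.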
What a better solution would be in your case is hard to say, because it is unclear what the third-party application does with the data from your database and how dependent it is on the uniqueness of the "PK" column in the view. An approach that is likely to work if the application does more than trivially display the data without any further use of the "PK" would be to associate a bigint with every retrieved uuid in your database in a (temporary) table and include that bigint in your view by linking on the uuids in your (temporary) tables. Since you can not trigger on SELECT statements, you would need a function to generate the bigint for every uuid the application retrieves. On updates or deletes on the underlying tables of the view or upon selecting data from related tables, you look up the uuid corresponding to the bigint passed in from the application. The lookup table and function would look somewhat like this:
CREATE TEMPORARY TABLE temp_table(
  tempint bigserial PRIMARY KEY,
  internal_uuid uuid);
CREATE INDEX ON temp_table(internal_uuid);
CREATE FUNCTION temp_int_for_uuid(pk uuid) RETURNS bigint AS $$
DECLARE
  id bigint;
BEGIN
  SELECT tempint INTO id FROM temp_table WHERE internal_uuid = pk;
  IF NOT FOUND THEN
    INSERT INTO temp_table(internal_uuid) VALUES (pk)
    RETURNING tempint INTO id;
  END IF;
  RETURN id;
END; $$ LANGUAGE plpgsql STRICT;
Not pretty, not efficient, but fool-proof.
Cast a hex literal built from the first 16 hex digits of the UUID (hyphens stripped) to bit(64), and cast that to bigint:
select ('x' || substr(replace(UUID::text, '-', ''), 1, 16))::bit(64)::bigint
Solution found.
UUID::text will return a string with hyphens. In order for substring(UUID::text from 1 for 16) to create a string that x can parse as hex the hyphens need to be stripped first.
The final query looks like:
SELECT hex_to_int(substring((select replace(id::text,'-','')) from 1 for 16))::bigint FROM table
The hex_to_int function needs to be able to handle a bigint, not just int. It looks like:
CREATE OR REPLACE FUNCTION hex_to_int(hexval character varying)
  RETURNS bigint AS
$BODY$
DECLARE
  result bigint;
BEGIN
  -- Build and run a hex bit-string literal cast, e.g. x'1f'::bigint
  EXECUTE 'SELECT x''' || hexval || '''::bigint' INTO result;
  RETURN result;
END;
$BODY$ LANGUAGE plpgsql;

How to ensure the accuracy of aggregated data in PostgreSQL table?

I have two PostgreSQL tables: table A contains records of individual clients' credit movements (increases / decreases) and table B contains aggregated data from table A. Simplified structure of the tables (I removed FKs and rules):
CREATE TABLE "public"."credit_review" (
"id" SERIAL,
"client_id" INTEGER NOT NULL,
"credit_change" INTEGER DEFAULT 0 NOT NULL,
"itime" TIMESTAMP(0) WITH TIME ZONE DEFAULT now()
) WITHOUT OIDS;
CREATE TABLE "public"."credit_review_aggregated" (
"id" SERIAL,
"credit_amount" INT DEFAULT 0 NOT NULL,
"valid_to_review_id" INT NOT NULL,
"client_id" INTEGER NOT NULL,
"itime" TIMESTAMP(0) WITH TIME ZONE DEFAULT now()
) WITHOUT OIDS;
Column "credit_review_aggregated.valid_to_review_id" is FK to "credit_review.id".
Because it is very important that the data in the aggregation table is correct, I'm looking for a way to guarantee it. Two ideas occurred to me:
1) Disable deleting and updating records in both tables.
2) On the aggregated table, create a trigger to check whether the entered data is correct (and if not, reject the insert). I don't like this too much, because when a record is inserted into the aggregation table the credit_amount value will be computed twice (once in the application and a second time in the trigger).
Do you have any advice on how to guarantee this?
I'm not entirely clear on the invariant you're trying to enforce, but from the general outlines of the problem, I would be inclined to use trigger code to enforce it, and use SERIALIZABLE transactions. Enforcing invariants across multiple tables becomes very tricky very quickly otherwise.
http://wiki.postgresql.org/wiki/SSI
Full disclosure: Because my employer needed to enforce complex integrity rules across multiple tables, I worked on adding SSI to PostgreSQL, along with Dan R.K. Ports of MIT.

SQLite3 UPDATE statement has no effect

I'm no SQL expert, but I have a bug in an iPhone app where an UPDATE statement has no effect on the db.
I have been using the SQLite Manager plugin for Firefox to try to debug it by repeatedly amending and running the UPDATE on the db. I also ran the statement through an SQL validator, which said it complied with the core SQL standard.
Can you spot anything wrong with the statement given below?
UPDATE sections
SET
title = 'What is acne ? ABC',
text = 'Pus on your face',
created = '2010-03-10 18:46:55',
modified = '2011-07-04 17:38:44',
position = 1,
condition_id = 4
WHERE id = 10;
There is some confusion and inconsistency in the way SQLite and the various implementations by Mozilla, Google, Adobe, and others handle numeric primary keys in databases whose tables were created outside of these implementations and where the primary keys were defined as an integer type but not as "INTEGER" [verbatim] -- that is, they were defined as INT or INT16 or INT32 etc.
INTEGER PRIMARY KEY in mothership SQLite is an alias for the rowid.
INT PRIMARY KEY in mothership SQLite is not an alias for the rowid.
A consortium member (or any implementor) may or may not follow this rule. (SQLite is in the public domain, of course.)
See section 2.0 here: http://www.sqlite.org/datatypes.html
and see section on RowId and Primary Key here: http://www.sqlite.org/lang_createtable.html#rowid
A PRIMARY KEY column only becomes an integer primary key if the
declared type name is exactly "INTEGER". Other integer type names like
"INT" or "BIGINT" or "SHORT INTEGER" or "UNSIGNED INTEGER" causes the
primary key column to behave as an ordinary table column with integer
affinity and a unique index, not as an alias for the rowid. [emphasis
added]
An implementor who does not follow the rule might not have even been aware that they were breaking the rule in the first place, since it is a "gotcha" arcane sort of rule. Anyway, what this means practically is that one implementation may treat a supplied value as an alias for the rowid and another implementation might not. If given the value 10, one might retrieve the tuple whose rowid = 10 and one might retrieve the tuple where the specified column's value = 10. This, of course, leads to spurious results in queries -- and they might look like perfectly good and plausible results but they are dead wrong.
Consider the following simple test: using flagship SQLite's utilities, not those provided by one of the implementors, execute the following DDL and DML statements; then, in your implementation, open the database and execute the DML statements again to compare the DML results:
CREATE TABLE TEST
("id" INT PRIMARY KEY, "name" text); -- ** NOTE "INT" not "INTEGER"
INSERT INTO TEST
(id, name)
VALUES
(7,'seven');
-- ** *** N.B. THE ROWID OF THE ROW INSERTED ABOVE = 1 *** **
select rowid, id, name from test;
-- result: 1 | 7 | seven
select * from TEST;
-- result: 7 | seven
select * from TEST where id = 7;
-- result: ????? [ymmv]
select * from TEST where id = 1;
-- result: ????? [ymmv]
Depending on how the specific implementation treats an INT primary key the third select statement above (select * from TEST where id = 7) may return one row or it may return nothing!
If the implementation treats the INT PK as an alias for the row id, well, there is no row whose rowid = 7, and so it will return nothing. If the implementation treats the INT PK as a normal value, it will find the row.
Now, if you were to insert more rows into table TEST, you would eventually create a row whose rowid = 7. In one of these wayward implementations, when you use this where-clause -- where id = 7 --- you might think you were addressing the tuple whose id=7, but you'd actually be addressing the tuple whose rowid=7. You would get the wrong tuple and you might not realize it. Consider the possibilities when joining a child table to a parent table: the child table contains foreign key value of 7. What tuple does an inner join return from the parent table? It depends on whether the implementation honors the distinction between INT and INTEGER primary keys.
Last year, I documented this thoroughly for Adobe AIR, BTW, and also reported it on the SQLite news group. It is possible that some implementations have changed the behavior in the interim.
When creating SQLite tables, it is best to use INTEGER [verbatim] for primary keys, not any of the other recognized int types.
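The difference between the two declarations can be checked against the standard SQLite library through Python's built-in sqlite3 module. This is a minimal sketch: the table names t_int and t_integer are hypothetical, and it only shows how the stock library behaves, not how any third-party implementation does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column layout; only the declared type of the primary key differs.
conn.execute('CREATE TABLE t_int ("id" INT PRIMARY KEY, "name" TEXT)')
conn.execute('CREATE TABLE t_integer ("id" INTEGER PRIMARY KEY, "name" TEXT)')

conn.execute("INSERT INTO t_int (id, name) VALUES (7, 'seven')")
conn.execute("INSERT INTO t_integer (id, name) VALUES (7, 'seven')")

# INT PRIMARY KEY: the rowid is assigned independently of id.
int_row = conn.execute("SELECT rowid, id, name FROM t_int").fetchone()
# INTEGER PRIMARY KEY: id is an alias for the rowid.
integer_row = conn.execute("SELECT rowid, id, name FROM t_integer").fetchone()

print(int_row)
print(integer_row)

# With the stock library, filtering on id matches the column value in both tables.
by_id = conn.execute("SELECT id, name FROM t_int WHERE id = 7").fetchall()
print(by_id)
```

Running the same statements through the implementation under suspicion and comparing the rowid column is a quick way to tell whether it honors the distinction.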
If your query is correct, then you need to make sure of 2 things.
1. Have you finalized the statement like this?
sqlite3_finalize(selectStatement);
2. If you are testing in the simulator, are you sure you are checking the database update in the following path?
/user/Library/Application Support/iPhone Simulator/Your_Version_Number/Applications/YOUR_APPLICATION_GUID/Documents
Hope this helps.
The only thing in the query that strikes me as worthy of further investigation is the question mark in the value for [title]. Remove it and see if that changes anything. Maybe it's being incorrectly parsed somewhere along the way as a parameter placeholder.
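A quick way to test that suspicion is to run the statement against the standard SQLite library and look at the affected-row count. The sketch below uses Python's built-in sqlite3 module with a cut-down, hypothetical version of the sections table (only id and title). It compares the literal form of the UPDATE with a bound-parameter form, which avoids the question of how a "?" inside the text is parsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced schema: only the columns needed for the check.
conn.execute("CREATE TABLE sections (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO sections (id, title) VALUES (10, 'old title')")

# Literal form: the "?" sits inside a quoted string.
cur = conn.execute(
    "UPDATE sections SET title = 'What is acne ? ABC' WHERE id = 10"
)
literal_rows = cur.rowcount

# Bound form: values are passed separately from the SQL text.
cur = conn.execute(
    "UPDATE sections SET title = ? WHERE id = ?",
    ("What is acne ? ABC", 10),
)
bound_rows = cur.rowcount

# An UPDATE whose WHERE clause matches nothing reports zero affected rows.
cur = conn.execute("UPDATE sections SET title = ? WHERE id = ?", ("x", 999))
missing_rows = cur.rowcount

conn.commit()
title = conn.execute("SELECT title FROM sections WHERE id = 10").fetchone()[0]
print(literal_rows, bound_rows, missing_rows, title)
```

If the affected-row count in the app is zero, the WHERE clause is not matching any row, which points back at how the id column was declared rather than at the SET list.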