How to specify on-error behavior for PostgreSQL conversion to UUID

I need to write a query to join 2 tables based on a UUID field.
Table 1 contains user_uuid of type uuid.
Table 2 has this user_uuid at the end of a url, after the last slash.
The problem is that sometimes this url contains some other value, not castable to uuid.
My workaround like this works pretty well:
LEFT JOIN table2 on table1.user_uuid::text = regexp_replace(table2.url, '.*[/](.*)$', '\1')
However, I have a feeling that a better solution would be to try to cast to uuid before joining.
And here I have a problem. Such a query:
LEFT JOIN table2 on table1.user_uuid = cast(regexp_replace(table2.url, '.*[/](.*)$', '\1') as uuid)
gives ERROR: invalid input syntax for type uuid: "rfa-hl-21-014.html" SQL state: 22P02
Is there any elegant way to specify the behavior on a cast error? I mean, without tons of regexp checks and case-when-then-end...
Appreciate any help and ideas.

There are additional considerations when converting a uuid to text. Postgres will yield the converted value in standard form (lower case and hyphenated). However, other formats of the same uuid value could occur in your input, for example upper case or without hyphens. As text these would not compare equal, but as uuid they would. A join that normalizes both sides looks like this:
select *
from table1 t1
join table2 t2
  on replace(t1.t_uuid::text, '-', '') = replace(lower(t2.t_stg), '-', '');
Since your data clearly contains non-uuid values, you cannot assume standard uuid format either. There are also additional formats (although apparently not often used) for a valid UUID. You may want to review the UUID Type documentation.
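For illustration, these two literals differ as text but denote the same uuid value (upper case and omitted hyphens are both accepted uuid input formats):
select 'A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11'::uuid
     = 'a0eebc999c0b4ef8bb6d6bb9bd380a11'::uuid;  -- returns true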

You could cast the uuid from table 1 to text and join that with the suffix from table 2. That will never give you a type conversion error.
This might require an extra index on the expression in the join condition if you need fast nested loop joins.
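A sketch of that approach, reusing the expressions from the question (the index name is made up):
select *
from table1 t1
left join table2 t2
  on t1.user_uuid::text = regexp_replace(t2.url, '.*[/](.*)$', '\1');
-- hypothetical expression index so the planner can drive a fast nested loop join
create index table2_url_suffix_idx
  on table2 (regexp_replace(url, '.*[/](.*)$', '\1'));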

Related

Snowflake invalid identifier when performing a join

I have been trying to do an outer join across two different tables in two different schemas. I am trying to filter the variants table beforehand, removing SKUs that are shorter than 4 or longer than 5 digits; the join was not working with a simple WHERE clause at the end, hence this decision.
The problem is that if I do not put the quotes in, Snowflake says I am using invalid identifiers. However, when I run it with the quotes, it works, but every value in the raw.stitch_heroku.spree_variants.SKU column comes back as the column name itself, all across the table!
SELECT
analytics.dbt_lcasucci.product_category.product_description,
'raw.stitch_heroku.spree_variants.SKU'
FROM analytics.dbt_lcasucci.product_category
LEFT JOIN (
SELECT * FROM raw.stitch_heroku.spree_variants
WHERE LENGTH('raw.stitch_heroku.spree_variants.SKU')<=5
and LENGTH('raw.stitch_heroku.spree_variants.SKU')>=4
) ON 'analytics.dbt_lcasucci.product_category.product_id'
= 'raw.stitch_heroku.spree_variants.SKU'
Is there a way to work around this? I am confused and have not found this issue on any forum yet!
Thanks in advance
Firstly, single quotes define a string literal ('this is text'), whereas double quotes delimit table/column names ("this_is_a_table_name").
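A minimal illustration of the difference, using your SKU column:
-- selects the literal string itself for every row, which is why the column
-- appeared to contain its own name
SELECT 'raw.stitch_heroku.spree_variants.SKU'
FROM raw.stitch_heroku.spree_variants;
-- selects the values of the SKU column
SELECT SKU
FROM raw.stitch_heroku.spree_variants;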
Adding aliases to the tables makes the SQL more readable, and the duplicated LENGTH predicate can be reduced with a BETWEEN, so this should work better:
SELECT pc.product_description,
       sp.SKU
FROM analytics.dbt_lcasucci.product_category AS pc
LEFT JOIN (
    SELECT SKU
    FROM raw.stitch_heroku.spree_variants
    WHERE LENGTH(SKU) BETWEEN 4 AND 5
) AS sp
    ON pc.product_id = sp.SKU;
I reduced the sub-select's results to just SKU, since that is the only column you use from sp; although, given you are comparing product_id to SKU, as your example stands you may not need the join to sp at all.
The invalid identifiers error implies to me that something is named incorrectly. The first step is to check that the tables exist, that the columns are named as you expect, and that the column types on both sides of the JOIN x ON y clause match, via:
describe table analytics.dbt_lcasucci.product_category;
describe table raw.stitch_heroku.spree_variants;

PostgreSQL use function result in ORDER BY

Is there a way to use the results of a function call in the order by clause?
My current attempt (I've also tried some slight variations):
SELECT it.item_type_id, it.asset_tag, split_part(it.asset_tag, 'ASSET', 2)::INT as tag_num
FROM serials.item_types it
WHERE it.asset_tag LIKE 'ASSET%'
ORDER BY split_part(it.asset_tag, 'ASSET', 2)::INT;
While my general assumption is that this can't be done, I wanted to know if there was a way to accomplish this that I wasn't thinking of.
EDIT: The query above gives the following error [22P02] ERROR: invalid input syntax for integer: "******"
Your query is generally OK; the problem is that for some row the result of split_part(it.asset_tag, 'ASSET', 2) is the string ******, and that string cannot be cast to an integer.
You may want to remove the ORDER BY and the cast in the select list, and add a WHERE split_part(it.asset_tag, 'ASSET', 2) = '******', for instance, to narrow down that data issue.
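To surface every offending row at once, something like this should work (assuming the tag suffix is supposed to be all digits):
select it.item_type_id, it.asset_tag
from serials.item_types it
where it.asset_tag like 'ASSET%'
  and split_part(it.asset_tag, 'ASSET', 2) !~ '^[0-9]+$';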
Once that is resolved, having such a function in the order by list is perfectly fine. The quoted section of the documentation in the comments on the question refers to applying an order by clause to the results of UNION, INTERSECT, etc. queries. In other words, the order by found in this query:
(select column1 as result_column1 from table1
 union
 select column2 from table2)
order by result_column1
can only refer to the accumulated result columns, not to expressions on individual rows.

Will Postgres' DISTINCT function always return null as the first element?

I'm selecting distinct values from tables through Java's JDBC connector, and it seems that the NULL value (if there is any) is always the first row in the ResultSet.
I need to remove this NULL from the List into which I load the ResultSet. The logic looks only at the first element and ignores it if it's null.
I'm not using any ORDER BY in the query; can I still trust that logic? I can't find any reference to this in Postgres' documentation.
You can add a check for NOT NULL, simply like:
select distinct columnName
from Tablename
where columnName IS NOT NULL
Also, if you are not providing the ORDER BY clause, the order in which you get the results is not guaranteed, so you cannot rely on it. It is better, and recommended, to provide an ORDER BY clause if you want your result in a particular order (i.e., ascending or descending).
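If you do want the NULLs in a predictable position, PostgreSQL also lets you state it explicitly; note that a plain ascending sort puts NULLs last by default:
select distinct columnName
from Tablename
order by columnName asc nulls first;  -- or NULLS LAST (the default for ASC)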
If you are looking for a reference, the PostgreSQL documentation says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
If it is not stated in the manual, I wouldn't trust it. However, just for fun and to try to figure out what logic is being used: running the following query does bring the NULL to the top (for no apparent reason), while all other values come back in an apparently random order:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
However, cross joining the table with a modified version of itself brings two NULLs to the top while leaving other NULLs dispersed throughout the result set. So there doesn't seem to be a clear-cut logic that clumps all NULL values at the top:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
cross join (select n+3 as n2 from t) t2

How to combine DISTINCT and ORDER BY in array_agg of jsonb values in PostgreSQL

Note: I am using the latest version of Postgres (9.4)
I am trying to write a query which does a simple join of 2 tables, groups by the primary key of the first table, and does an array_agg of several fields in the 2nd table, which I want returned as objects. The array needs to be sorted by a combination of 2 fields in the json objects, and also de-duplicated.
So far, I have come up with the following:
SELECT
zoo.id,
ARRAY_AGG(
DISTINCT ROW_TO_JSON((
SELECT x
FROM (
SELECT animals.type, animals.name
) x
))::JSONB
-- ORDER BY animals.type, animals.name
)
FROM zoo
JOIN animals ON animals.zooId = zoo.id
GROUP BY zoo.id;
This results in one row for each zoo, with an aggregate array of jsonb objects, one for each animal, without duplicates.
However, I can't seem to figure out how to also sort the array by the fields in the commented-out part of the code.
If I take out the DISTINCT, I can ORDER BY the original fields, which works great, but then I have duplicates.
If you use row_to_json() you will lose the column names unless you put in a row that is typed. If you "manually" build the jsonb object with json_build_object() using explicit names then you get them back:
SELECT zoo.id, array_agg(za.jb) AS animals
FROM zoo
JOIN (
    SELECT DISTINCT ON (zooId, "type", "name")
           zooId,
           json_build_object('animal_type', "type", 'animal_name', "name")::jsonb AS jb
    FROM animals
    -- DISTINCT ON requires its expressions to match the leftmost ORDER BY expressions
    ORDER BY zooId, "type", "name"
) AS za ON za.zooId = zoo.id
GROUP BY zoo.id;
You can ORDER BY the elements of a jsonb object in general (e.g. ORDER BY jb->>'animal_type'), but not here: DISTINCT ON requires its expressions to match the leftmost ORDER BY expressions, so the subquery has to order by the plain columns. Sorting on the jsonb objects would also be rather inefficient (first building all the jsonb objects, then sorting and throwing out duplicates), and at the aggregate level the combination of DISTINCT with an ORDER BY on other columns is plain impossible with standard SQL. You can achieve the same result, however, by applying the DISTINCT clause before building the jsonb object, as above.
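For reference, this is the aggregate-level combination that PostgreSQL rejects (a minimal sketch against the same tables):
select zoo.id,
       array_agg(distinct animals.name order by animals.type)
from zoo
join animals on animals.zooId = zoo.id
group by zoo.id;
-- ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list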
Also, avoid use of SQL key words like "type" and standard data types like "name" as column names. Both are non-reserved keywords so you can use them in their proper contexts, but practically speaking your commands could get really confusing. You could, for instance, have a schema, with a table, a column in that table, and a data type each called "type" and then you could get this:
SELECT type::type FROM type.type WHERE type = something;
While PostgreSQL will graciously accept this, it is plain confusing at best and prone to error in all sorts of more complex situations. You can get a long way by double-quoting any key words, but they are best just avoided as identifiers.

SQL DateTime Conversion Fails when No Conversion Should be Taking Place

I'm modifying an existing query for a client, and I've encountered a somewhat baffling issue.
Our client uses SQL Server 2008 R2 and the database in question provides the user the ability to specify custom fields for one of its tables by making use of an EAV structure. All of the values stored in this structure are varchar(255), and several of the fields are intended to store dates. The query in question is being modified to use two of these fields and compare them (one is a start, the other is an end) against the current date to determine which row is "current".
The issue I'm having is that part of the query does a CONVERT(DateTime, eav.Value) in order to turn the varchar into a DateTime. The conversions themselves all succeed, and I can include the converted value as part of the SELECT clause, but part of the query is giving me a conversion error:
Conversion failed when converting date and/or time from character string.
The real kicker is this: if I define the base for this query (getting a list of entities with the two custom field values flattened into a single row) as a view, select against the view, and filter it by getdate(), then it works correctly; but it fails if I add a join to a second table using one of the (non-date) fields from the view. I realize that this might be somewhat hard to follow, so I can post an example query if desired, but this question is already getting a little long.
I've tried recreating the basic structure in another database and including sample data, but the new database behaves as expected, so I'm at a loss here.
EDIT In case it's useful, here's the statement for the view:
create view Festival as
select
e.EntityId as FestivalId,
e.LookupAs as FestivalName,
convert(Date, nvs.Value) as ActivityStart,
convert(Date, nve.Value) as ActivityEnd
from tblEntity e
left join CustomControl ccs on ccs.ShortName = 'Activity Start Date'
left join CustomControl cce on cce.ShortName = 'Activity End Date'
left join tblEntityNameValue nvs on nvs.CustomControlId = ccs.IdCustomControl and nvs.EntityId = e.EntityId
left join tblEntityNameValue nve on nve.CustomControlId = cce.IdCustomControl and nve.EntityId = e.EntityId
where e.EntityType = 'Festival'
The failing query is this:
select *
from Festival f
join FestivalAttendeeAll fa on fa.FestivalId = f.FestivalId
where getdate() between f.ActivityStart and f.ActivityEnd
Yet this works:
select *
from Festival f
where getdate() between f.ActivityStart and f.ActivityEnd
(EntityId/FestivalId are int columns)
I've encountered this type of error before; it's due to the "order of operations" performed by the execution plan.
You are getting that error message because the execution plan for your statement (generated by the optimizer) is performing the CONVERT() operation on rows that contain string values that can't be converted to DATETIME.
Basically, you do not have control over which rows the optimizer performs that conversion on. You know that you only need that conversion done on certain rows, and you have predicates (WHERE or ON clauses) that exclude those rows (limit the rows to those that need the conversion), but your execution plan is performing the CONVERT() operation on rows BEFORE those rows are excluded.
(For example, the optimizer may be electing to a do a table scan, and performing that conversion on every row, before any predicate is being applied.)
I can't give a specific answer, without a specific question and specific SQL that is generating the error.
One simple approach to addressing the problem would be to use the ISDATE() function to test whether the string value can be converted to a date.
That is, replace:
CONVERT(DATETIME,eav.Value)
with:
CASE WHEN ISDATE(eav.Value) > 0 THEN CONVERT(DATETIME, eav.Value) ELSE NULL END
or:
CONVERT(DATETIME, CASE WHEN ISDATE(eav.Value) > 0 THEN eav.Value ELSE NULL END)
Note that the ISDATE() function is subject to some significant limitations, such as being affected by the DATEFORMAT and LANGUAGE settings of the session.
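Applied to the view from the question, the guard would look something like this (a sketch reusing your table and column names, subject to the ISDATE() caveats above):
create view Festival as
select
    e.EntityId as FestivalId,
    e.LookupAs as FestivalName,
    case when isdate(nvs.Value) = 1 then convert(Date, nvs.Value) end as ActivityStart,
    case when isdate(nve.Value) = 1 then convert(Date, nve.Value) end as ActivityEnd
from tblEntity e
left join CustomControl ccs on ccs.ShortName = 'Activity Start Date'
left join CustomControl cce on cce.ShortName = 'Activity End Date'
left join tblEntityNameValue nvs on nvs.CustomControlId = ccs.IdCustomControl and nvs.EntityId = e.EntityId
left join tblEntityNameValue nve on nve.CustomControlId = cce.IdCustomControl and nve.EntityId = e.EntityId
where e.EntityType = 'Festival'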
If there is some other indication on the eav row, you could use some other test, to conditionally perform the conversion.
CASE WHEN eav.ValueIsDateTime=1 THEN CONVERT(DATETIME, eav.Value) ELSE NULL END
The other approach I've used is to try to gain some modicum of control over the order of operations of the optimizer, using inline views or Common Table Expressions, with operations that force the optimizer to materialize them and apply predicates, so that happens BEFORE any conversion in the outer query.
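Note that SQL Server is free to inline views and CTEs rather than materialize them, so when the predicate really must be applied first, an intermediate temporary table is the more reliable fence. A hedged sketch, again using the question's EAV table:
-- materialize only the values that parse as dates...
select eav.EntityId, eav.Value
into #date_values
from tblEntityNameValue eav
where isdate(eav.Value) = 1;
-- ...so the conversion in the follow-up query can never see a bad row
select dv.EntityId, convert(datetime, dv.Value) as ParsedValue
from #date_values dv;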