How to replace ORA_HASH function of Oracle in Postgres?

How can I replace Oracle's ORA_HASH function in Postgres? I am looking to port batch-and-merge logic that was written for Oracle over to Postgres.
SELECT DISTINCT ACCNT_TYPE,ACCNT_SUB_TYPE,ACCNT_FROM_VAL,ACCNT_TO_VAL AS T_ACCNT_TO_VAL,
ORA_HASH("NAME"||TO_CHAR(LIFECYCLE_DATE,'DD-MM-YYYY HH24:MI:SS')||LEGACY_ICA_TYPE||TO_CHAR(PURGE_DATE,'DD-MM-YYYY HH24:MI:SS')
||HUB_STATE_IND||LIFECYCLE_STATUS_CD||VAT_ID||LICENSED_SW||PRIMARY_ICA||TOKEN_ACCT_SRV_DESC) AS HASH_VAL
FROM C_ACCNT
All the fields passed to ORA_HASH are used to decide whether an INSERT or an UPSERT should be done.
An almost identical query (only the table name differs) does a left outer join, and if the hash value differs the row is considered for UPSERT.
Why does this query always give me a NULL response?
select md5(p.src_id || p.type || p.accountName)
FROM ACCOUNT p;
If the type value is NULL in the DB, then md5 returns NULL. This is bad.

If all you need is a hash function for a string, use the PostgreSQL built-in hashtext.
For the concatenation, you could use
hashtext(concat(p.src_id, p.type, p.accountName))
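Putting that together, the original query could be sketched in Postgres like this (column and table names taken from the question). Note that concat() treats NULL arguments as empty strings, which also sidesteps the NULL problem from the md5 example, and that hashtext() is an internal function returning an integer, so its values will not match ORA_HASH values computed in Oracle:
SELECT DISTINCT accnt_type, accnt_sub_type, accnt_from_val, accnt_to_val AS t_accnt_to_val,
       hashtext(concat(name,
                       to_char(lifecycle_date, 'DD-MM-YYYY HH24:MI:SS'),
                       legacy_ica_type,
                       to_char(purge_date, 'DD-MM-YYYY HH24:MI:SS'),
                       hub_state_ind, lifecycle_status_cd, vat_id,
                       licensed_sw, primary_ica, token_acct_srv_desc)) AS hash_val
FROM c_accnt;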

Related

Postgres Functions: Getting the Return Table Column Details

I feel the need to get the column names and data types of the table returned by any function that has a 'record' return data type, because...
A key process in an existing SQL Server-based system makes use of a stored procedure that takes a user-defined function as a parameter. An initial step gets the column names and types of the table returned by the function that was passed as a parameter.
In Postgres 13 I can use pg_proc.prorettype and the corresponding pg_type to find functions that return record types...that's a start. I can also use pg_get_function_result() to get the string containing the information I need. But, it's a string, and while I ultimately will have to assemble a very similar string, this is just one application of the info. Is there a tabular equivalent containing (column_name, data_type, ordinal_position), or do I need to do that myself?
Is there access to a composite data type the system may have created when such a function is created?
One option that I think will work for me, but I think it's a little weird, is to:
create temp table t as select * from function() limit 0;
then look that table up in information_schema.columns, assemble what I need, and drop the temp table, putting all of this into a function.
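That workaround might be sketched like this (the function name is hypothetical; the temp table lands in a pg_temp schema, hence the extra filter):
CREATE TEMP TABLE t AS SELECT * FROM my_function() LIMIT 0;

SELECT column_name, data_type, ordinal_position
FROM information_schema.columns
WHERE table_name = 't'
  AND table_schema LIKE 'pg_temp%';

DROP TABLE t;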
You can query the catalog table pg_proc, which contains all the required information:
SELECT coalesce(p.na, 'column' || p.i),
       p.ty::regtype,
       p.i
FROM pg_proc AS f
   CROSS JOIN LATERAL unnest(
         coalesce(f.proallargtypes, ARRAY[f.prorettype]),  -- proallargtypes is NULL if there are no OUT arguments
         f.proargmodes,
         f.proargnames
      ) WITH ORDINALITY AS p(ty, mo, na, i)
WHERE f.proname = 'interval_ok'
  AND coalesce(p.mo, 'o') IN ('o', 't')  -- keep only OUT and TABLE columns
ORDER BY p.i;
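For example, for a hypothetical function with two OUT parameters:
CREATE FUNCTION interval_ok(OUT ok boolean, OUT checked_at timestamptz)
  RETURNS record
  LANGUAGE sql AS 'SELECT true, now()';

the catalog query returns one row per result column:
ok         | boolean                  | 1
checked_at | timestamp with time zone | 2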

PostgreSQL, allow to filter by not existing fields

I'm using PostgreSQL with a Go driver. Sometimes I need to query fields that may not exist, just to check whether something exists in the DB. Before querying, I can't tell whether a given field exists. Example:
where size=10 or length=10
By default I get the error column "length" does not exist; however, the size column could exist and I could get some results.
Is it possible to handle such cases to return what is possible?
EDIT:
Yes, I could get all the existing columns first. But the initial queries can be rather complex and not created by me directly, I can only modify them.
That means the query can be simple like the previous example and can be much more complex like this:
WHERE size=10 OR (length=10 AND n='example') OR (c BETWEEN 1 and 5 AND p='Mars')
If the missing columns are length and c, does that mean I have to parse the SQL, split it by OR (or other operators), check every part of the query, remove any part with missing columns, and in the end generate a new SQL query?
Any easier way?
I would check within the information schema first:
select column_name from information_schema.columns where table_name = 'table_name';
and then build the query based on the result.
Why don't you get a list of columns that are in the table first? Like this
select column_name
from information_schema.columns
where table_name = 'table_name' and (column_name = 'size' or column_name = 'length');
The result will be the columns that exist.
There is no way to do what you want, short of constructing the SQL string from the list of available columns, which you can get by querying information_schema.columns.
SQL statements are parsed before they are executed, and there is no conditional compilation and no short-circuiting, so you get an error as soon as a non-existent column is referenced.
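A minimal sketch of that approach, using the predicate from the first example (the table name mytable is hypothetical): build the WHERE clause from only those candidate columns that actually exist, then splice it into the final query in the application code.
SELECT coalesce(string_agg(format('%I = 10', column_name), ' OR '), 'false')
FROM information_schema.columns
WHERE table_name = 'mytable'
  AND column_name IN ('size', 'length');
-- returns e.g. 'size = 10' if only "size" exists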

What PostgreSQL type is good for storing an array of strings and offering fast lookups afterwards

I am using PostgreSQL 11.9
I have a table containing a jsonb column with an arbitrary number of key-value pairs. There is a requirement that, when we perform a search, the values from this column be included as well. Searching in jsonb is quite slow, so my plan is to create a trigger which will extract all the values from the jsonb column:
select t.* from app.t1, jsonb_each(column_jsonb) as t(k,v)
or something like that, and then insert the values into a newly created column in the same table, so I can use that column for faster searches.
My question is what type would be most suitable for storing the keys and then searching within them. Currently the search looks like this:
CASE
WHEN something IS NOT NULL
THEN EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
END
where the search_term is what the user entered from the front end.
This is not going to be pretty, and normalizing the data model would be better.
You can define a function
CREATE FUNCTION jsonb_values_to_string(
j jsonb,
separator text DEFAULT ','
) RETURNS text LANGUAGE sql IMMUTABLE STRICT
AS 'SELECT string_agg(value->>0, $2) FROM jsonb_each($1)';
Then you can query like
WHERE jsonb_values_to_string(column_jsonb, '|') ILIKE 'search_term'
and you can define a trigram index on the left hand side expression to speed it up.
Make sure that you choose a separator that does not occur in the data or the pattern...
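The trigram index could be sketched like this (requires the pg_trgm extension; table and column names are the ones from the question). The indexed expression must match the one used in the WHERE clause for the index to be usable:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX t1_jsonb_values_trgm_idx ON app.t1
   USING gin (jsonb_values_to_string(column_jsonb, '|') gin_trgm_ops);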

Are multiple calls to a STABLE function optimized across CTEs?

According to the PostgreSQL documentation, 38.7. Function Volatility Categories:
A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.
What about when using CTEs? If I call a STABLE function inside a CTE and then again inside the primary SELECT query, will the optimizer optimize both calls of the function to a single call?
How can you tell? (I don't know how to use EXPLAIN.)
In the impractical example below, I want to make sure that the function get_user_by_id() is only called once.
CREATE TABLE users (
id bigserial PRIMARY KEY,
username varchar(64) NOT NULL
);
CREATE FUNCTION get_user_by_id(_id bigint) RETURNS users AS $$
SELECT *
FROM users
WHERE id = _id
$$ LANGUAGE SQL STABLE;
INSERT INTO users (username) VALUES ('user1');
WITH error AS (
SELECT -1 AS code
WHERE (SELECT get_user_by_id(3) IS NULL)
)
SELECT code AS id, NULL AS username FROM error
UNION ALL
SELECT * FROM get_user_by_id(3) WHERE id IS NOT NULL;
I think the answer is no, multiple calls to a STABLE function are not optimized across CTEs. I don't know how to prove it, but according to the PostgreSQL 11 documentation, 7.8. WITH Queries (Common Table Expressions), CTEs seem to be separate "auxiliary statements" that don't appear to be well integrated with the parent query. For example:
. . . the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary subquery. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards.
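One way to tell without reading EXPLAIN output is to make the calls visible: replace the SQL function with an equivalent PL/pgSQL version that logs each call, run the query, and count the NOTICE messages (a sketch based on the example above):
CREATE OR REPLACE FUNCTION get_user_by_id(_id bigint) RETURNS users AS $$
BEGIN
   RAISE NOTICE 'get_user_by_id(%) called', _id;
   RETURN (SELECT u FROM users u WHERE id = _id);
END
$$ LANGUAGE plpgsql STABLE;
Alternatively, EXPLAIN (ANALYZE, VERBOSE) shows where in the plan the function is evaluated.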

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
There are various places I had in mind for storing these queries, either a variable of some sort or a table column, the idea being to then have these queries executed.
I thought of storing the generated queries in a variable and then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement, which is the approach employed in the Oracle and SQL Server solutions I found, but Postgres won't let me declare a variable outside a function, and I've been scratching my head over how this fits together, whether it's even the direction to follow, or whether there's something simpler.
Would you have any pointers? I'm currently trying something like the following, inspired by another question, but I have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
  RETURNS void AS
$$
DECLARE
   dyn_sql text;
BEGIN
   dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
              FROM information_schema.columns
              WHERE table_name = 'salesdata';
   execute dyn_sql
END
$$ LANGUAGE plpgsql;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE (
      SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
                  , string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
                  , pg_typeof(_tbl)::text)
      FROM   pg_attribute
      WHERE  attrelid = pg_typeof(_tbl)::text::regclass
      AND    NOT attisdropped  -- no dropped (dead) columns
      AND    attnum > 0        -- no system columns
      );
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answers with detailed explanations:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
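To illustrate with a hypothetical two-column table: min() over the varchar column yields text, and the row cast coerces each column back to its declared type:
CREATE TABLE tbl (id bigint, name varchar(10));

SELECT (t::tbl).* FROM (SELECT min(id), min(name) FROM tbl) t;
-- the second output column is varchar(10) again, not text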
This might fail for some aggregate functions. sum() returns numeric for sum(bigint_column) to accommodate a sum overflowing the base data type. Casting back to bigint might fail ...
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but to draw a complete profile I really need a variety of measures: min and max values, counts, null counts, means, etc., so a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types; I was sort of expecting this to throw a spanner in the works at some point. My main concern was how to automate the query generation and, above all, its execution.
I have tried the function you provided (I probably will need to start learning some plpgsql) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
By the way, what is the (t::abc) notation referred to as? In Python this would be a list slice, but that's probably not the case in PL/pgSQL.