I have imported tables from Postgres to hdfs by using sqoop. My table have uuid field as primary key and my command sqoop as below:
sqoop import --connect 'jdbc:postgresql://localhost:5432/mydb' --username postgreuser --password 123456abcA --driver org.postgresql.Driver --table users --map-column-java id=String --target-dir /hdfs/postgre/users --as-avrodatafile --compress -m 2
But I got the error:
Import failed: java.io.IOException: org.postgresql.util.PSQLException: ERROR: function min(uuid) does not exist
I tried executed the sql command: SELECT min(id) from users and got the same error. How could I fix it ? I use Postgres 9.4, hadoop 2.9.0 and sqoop 1.4.7
I'd like to credit #robin-salih 's answer, I've used it and implementation of min for int, to build following code:
CREATE OR REPLACE FUNCTION min(uuid, uuid)
RETURNS uuid AS $$
BEGIN
IF $2 IS NULL OR $1 > $2 THEN
RETURN $2;
END IF;
RETURN $1;
END;
$$ LANGUAGE plpgsql;
create aggregate min(uuid) (
sfunc = min,
stype = uuid,
combinefunc = min,
parallel = safe,
sortop = operator (<)
);
It almost the same, but takes advantages of B-tree index, so select min(id) from tbl works in few millis.
P.S. I'm not pgsql expert, perhaps my code is somehow wrong, double check before use in production, but I hope it uses indexes and parallel execution correctly. I've made it just from sample code, not digging into theory behind aggregates in PG.
Postgres doesn't have built-in function for min/max uuid, but you can create your own using the following code:
CREATE OR REPLACE FUNCTION min(uuid, uuid)
RETURNS uuid AS $$
BEGIN
IF $2 IS NULL OR $1 > $2 THEN
RETURN $2;
END IF;
RETURN $1;
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE min(uuid)
(
sfunc = min,
stype = uuid
);
I found the answer's provided by #robin-salih and #bodgan-mart to be a great starting point but ultimately incorrect. Here's a solution which worked better for me:
CREATE FUNCTION min_uuid(uuid, uuid)
RETURNS uuid AS $$
BEGIN
-- if they're both null, return null
IF $2 IS NULL AND $1 IS NULL THEN
RETURN NULL ;
END IF;
-- if just 1 is null, return the other
IF $2 IS NULL THEN
RETURN $1;
END IF ;
IF $1 IS NULL THEN
RETURN $2;
END IF;
-- neither are null, return the smaller one
IF $1 > $2 THEN
RETURN $2;
END IF;
RETURN $1;
END;
$$ LANGUAGE plpgsql;
create aggregate min(uuid) (
sfunc = min_uuid,
stype = uuid,
combinefunc = min_uuid,
parallel = safe,
sortop = operator (<)
);
For more details, see my post at How to select minimum UUID with left outer join?
I am defining min/max aggregates for uuids using least/greatest which I believe should give the best performance as those are native to postgres (but I haven't benchmarked it).
Since least/greatest are special forms (to my understanding) I have to proxy them using a function which I am marking as immutable and parallel safe.
least/greatest already have proper null-handling behavior.
I am using these in production on Postgres 13.
create or replace function min(uuid, uuid)
returns uuid
immutable parallel safe
language plpgsql as
$$
begin
return least($1, $2);
end
$$;
create aggregate min(uuid) (
sfunc = min,
stype = uuid,
combinefunc = min,
parallel = safe,
sortop = operator (<)
);
create or replace function max(uuid, uuid)
returns uuid
immutable parallel safe
language plpgsql as
$$
begin
return greatest($1, $2);
end
$$;
create aggregate max(uuid) (
sfunc = max,
stype = uuid,
combinefunc = max,
parallel = safe,
sortop = operator (>)
);
This is not a issue with sqoop. Postgres doesn't allow min/max on uuid. Each uuid is unique and is not considered bigger/smaller than other.
To fix this in sqoop you might need to use some other field as the split-by key. I used created_At timestamp as my split-by key instead.
Related
I have this question, I was doing some migration from SQL Server to PostgreSQL 12.
The scenario, I am trying to accomplish:
The function should have a RETURN Statement, be it with SETOF 'tableType' or RETURN TABLE ( some number of columns )
The body starts with a count of records, if there is no record found based on input parameters, then simply Return Zero (0), else, return the entire set of record defined in the RETURN Statement.
The Equivalent part in SQL Server or Oracle is: They can just put a SELECT Statement inside a Procedure to accomplish this. But, its a kind of difficult in case of PostgreSQL.
Any suggestion, please.
What I could accomplish still now - If no record found, it will simply return NULL, may be using PERFORM, or may be selecting NULL as column name for the returning tableType columns.
I hope I am clear !
What I want is something like -
============================================================
CREATE OR REPLACE FUNCTION public.get_some_data(
id integer)
RETURNS TABLE ( id_1 integer, name character varying )
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
p_id alias for $1;
v_cnt integer:=0;
BEGIN
SELECT COUNT(1) FROM public.exampleTable e
WHERE id::integer = e.id::integer;
IF v_cnt= 0 THEN
SELECT 0;
ELSE
SELECT
a.id, a.name
public.exampleTable a
where a.id = p_id;
END;
$BODY$;
If you just want to return a set of a single table, using returns setof some_table is indeed the easiest way. The most basic SQL function to do that would be:
create function get_data()
returns setof some_table
as
$$
select *
from some_table;
$$
language sql;
PL/pgSQL isn't really necessary to put a SELECT statement into a function, but if you need to do other things, you need to use RETURN QUERY in a PL/pgSQL function:
create function get_data()
returns setof some_table
as
$$
begin
return query
select *
from some_table;
end;
$$
language plpgsql;
A function as exactly one return type. You can't have a function that sometimes returns an integer and sometimes returns thousands of rows with a dozen columns.
The only thing you could do, if you insist on returning something is something like this:
create function get_data()
returns setof some_table
as
$$
begin
return query
select *
from some_table;
if not found then
return query
select (null::some_table).*;
end if;
end;
$$
language plpgsql;
But I would consider the above an extremely ugly and confusing (not to say stupid) solution. I certainly wouldn't let that pass through a code review.
The caller of the function can test if something was returned in the same way I implemented that ugly hack: check the found variable after using the function.
One more hack to get as close as possible to what you want. But I will repeat what others have told you: You cannot do what you want directly. Just because MS SQL Server lets you get away poor coding does not mean Postgres is obligated to do so. As the link by #a_horse_with_no_name implies converting code is easy, once you migrate how you think about the problem in the first place. The closest you can get is return a tuple with a 0 id. The following is one way.
create or replace function public.get_some_data(
p_id integer)
returns table ( id integer, name character varying )
language plpgsql
as $$
declare
v_at_least_one boolean = false;
v_exp_rec record;
begin
for v_exp_rec in
select a.id, a.name
from public.exampletable a
where a.id = p_id
union all
select 0,null
loop
if v_exp_rec.id::integer > 0
or (v_exp_rec.id::integer = 0 and not v_at_least_one)
then
id = v_exp_rec.id;
name = v_exp_rec.name;
return next;
v_at_least_one = true;
end if;
end loop ;
return;
end
$$;
But that is still just a hack and assumes there in not valid row with id=0. A much better approach would by for the calling routing to check what the function returns (it has to do that in one way or another anyway) and let the function just return the data found instead of making up data. That is that mindset shift. Doing that you can reduce this function to a simple select statement:
create or replace function public.get_some_data2(
p_id integer)
returns table ( id integer, name character varying )
language sql strict
as $$
select a.id, a.name
from public.exampletable a
where a.id = p_id;
$$;
Or one of the other solutions offered.
I have a Postgresql 9.6 (and postgis 2.5) function that calculates a json with parameters based on a geometry of a record.
I created a json field and wrote the json calculation output there, and all was well.
UPDATE public.recordtable
SET polygoncalc = public.fn_calculate_polygon(5)
WHERE id = 5;
When I try to run it on Postgres 11.2 , it returns a error:
SQL Error [0A000]:ERROR: set-returning functions are not allowed in UPDATE
Based on this link I understand there has been a change in postgresql, but honestly,
how can I write the json into the field in Postgres 11.2 without getting the error ?
Thank you in advance.
EDIT: I changed the output field to type json and added the function text
the function:
CREATE OR REPLACE FUNCTION public.fn_create_jsn_point(pid double precision)
RETURNS TABLE(jsn json)
LANGUAGE plpgsql
AS $function$
DECLARE
p_id double precision = pid;
myjson json = null ;
begin
return QUERY
select row_to_json(finaljson)
from(
select
ST_x(a.wgs_84_dump_point) as X,
ST_y(a.wgs_84_dump_point) as Y
(
select (st_transform(st_centroid(t.geom),4326)) wgs_84_dump_point
from baserecords t where base_id = p_id
) a ) finaljson;
END;
$function$
;
The function is returning a table. No matter how many rows are returned, it is returning a table, i.e a set.
Since you know a single value will be returned, you can change the function to restrict it:
CREATE OR REPLACE FUNCTION public.fn_create_jsn_point(pid double precision)
RETURNS json -- <---------------------------- Return a single value
LANGUAGE plpgsql
AS $function$
DECLARE
p_id double precision = pid;
myjson json = null ;
begin
return ( -- <---------------------------------- return a single value
select row_to_json(finaljson)
from(
select
ST_x(a.wgs_84_dump_point) as X,
ST_y(a.wgs_84_dump_point) as Y
(
select (st_transform(st_centroid(t.geom),4326)) wgs_84_dump_point
from baserecords t where base_id = p_id
) a ) finaljson
);
END;
$function$
;
Write an aggregate to count the number of times the number 40 is seen in a column.
Use your aggregate to count the number of 40 year olds in the directory table.
This is what I was doing:
Create function aggstep(curr int) returns int as $$
begin
return curr.count where age = 40;
end;
$$ language plpgsql;
Create aggregate aggs(integer) (
stype = int,
initcond = '',
sfunc = aggstep);
Select cas(age) from directory;
You could do it for example like this:
First, create a transition function:
CREATE FUNCTION count40func(bigint, integer) RETURNS bigint
LANGUAGE sql IMMUTABLE CALLED ON NULL INPUT AS
'SELECT $1 + ($2 IS NOT DISTINCT FROM 40)::integer::bigint';
That works because FALSE::integer is 0 and TRUE::integer is 1.
I use IS NOT DISTINCT FROM rather than = so that it does the correct thing for NULLs.
The aggregate can then be defined as
CREATE AGGREGATE count40(integer) (
SFUNC = count40func,
STYPE = bigint,
INITCOND = 0
);
You can then query like
SELECT count40(age) FROM directory;
I want create a function:
CREATE OR REPLACE FUNCTION medibv.delAuto(tableName nvarchar(50), columnName nvarchar(100),value
nvarchar(100))
RETURNS void AS
$BODY$
begin
DELETE from tableName where columnName=value
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
I have these parameters: tableName, columnName, value.
I want tableName as table in PostgreSQL.
CREATE OR REPLACE FUNCTION medibv.delauto(tbl regclass, col text, val text
,OUT success bool)
RETURNS bool AS
$func$
BEGIN
EXECUTE format('
DELETE FROM %s
WHERE %I = $1
RETURNING TRUE', tbl, col)
USING val
INTO success;
RETURN; -- optional in this case
END
$func$ LANGUAGE plpgsql;
Call:
SELECT medibv.delauto('myschema.mytbl', 'my_txt_col', 'foo');
Returns TRUE or NULL.
There is no nvarchar type in Postgres. You may be thinking of SQL Server. The equivalent would be varchar, but most of the time you can simply use text.
regclass is a specialized type for registered table names. It's perfect for the case an prevents SQL injection for the table name automatically and most effectively. More in the related answer below.
The column name is still prone to SQL injection. I sanitize the function with format(%I).
format() requires PostgreSQL 9.1+.
Your function did not report what happened. One or more rows may be found and deleted. Or none at all. As a bare minimum I added a boolean OUT column which will be TRUE if one or more rows were deleted. Because (quoting the manual here):
If multiple rows are returned, only the first will be assigned to the INTO variable.
Lastly, use USING with EXECUTE to pass in values. Don't cast back and forth. This is inefficient and prone to errors and to SQLi once more.
Find more explanation and links in this closely related answer:
Table name as a PostgreSQL function parameter
Use EXECUTE to run dynamic commands:
CREATE OR REPLACE FUNCTION medibv.delAuto(tableName nvarchar(50), columnName nvarchar(100),value
nvarchar(100))
RETURNS void AS
$BODY$
begin
EXECUTE 'DELETE FROM ' || tableName || ' WHERE ' || columnName || '=' || value;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
How can I write a stored procedure that contains a dynamically built SQL statement that returns a result set? Here is my sample code:
CREATE OR REPLACE FUNCTION reporting.report_get_countries_new (
starts_with varchar,
ends_with varchar
)
RETURNS TABLE (
country_id integer,
country_name varchar
) AS
$body$
DECLARE
starts_with ALIAS FOR $1;
ends_with ALIAS FOR $2;
sql VARCHAR;
BEGIN
sql = 'SELECT * FROM lookups.countries WHERE lookups.countries.country_name >= ' || starts_with ;
IF ends_with IS NOT NULL THEN
sql = sql || ' AND lookups.countries.country_name <= ' || ends_with ;
END IF;
RETURN QUERY EXECUTE sql;
END;
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100 ROWS 1000;
This code returns an error:
ERROR: syntax error at or near "RETURN"
LINE 1: RETURN QUERY SELECT * FROM omnipay_lookups.countries WHERE o...
^
QUERY: RETURN QUERY SELECT * FROM omnipay_lookups.countries WHERE omnipay_lookups.countries.country_name >= r
CONTEXT: PL/pgSQL function "report_get_countries_new" line 14 at EXECUTE statement
I have tried other ways instead of this:
RETURN QUERY EXECUTE sql;
Way 1:
RETURN EXECUTE sql;
Way 2:
sql = 'RETURN QUERY SELECT * FROM....
/*later*/
EXECUTE sql;
In all cases without success.
Ultimately I want to write a stored procedure that contains a dynamic sql statement and that returns the result set from the dynamic sql statement.
There is room for improvements:
CREATE OR REPLACE FUNCTION report_get_countries_new (starts_with text
, ends_with text = NULL)
RETURNS SETOF lookups.countries AS
$func$
DECLARE
sql text := 'SELECT * FROM lookups.countries WHERE country_name >= $1';
BEGIN
IF ends_with IS NOT NULL THEN
sql := sql || ' AND country_name <= $2';
END IF;
RETURN QUERY EXECUTE sql
USING starts_with, ends_with;
END
$func$ LANGUAGE plpgsql;
-- the rest is default settings
Major points
PostgreSQL 8.4 introduced the USING clause for EXECUTE, which is useful for several reasons. Recap in the manual:
The command string can use parameter values, which are referenced in
the command as $1, $2, etc. These symbols refer to values supplied in
the USING clause. This method is often preferable to inserting data
values into the command string as text: it avoids run-time overhead of
converting the values to text and back, and it is much less prone to
SQL-injection attacks since there is no need for quoting or escaping.
IOW, it is safer and faster than building a query string with text representation of parameters, even when sanitized with quote_literal().
Note that $1, $2 in the query string refer to the supplied values in the USING clause, not to the function parameters.
While you return SELECT * FROM lookups.countries, you can simplify the RETURN declaration like demonstrated:
RETURNS SETOF lookups.countries
In PostgreSQL there is a composite type defined for every table automatically. Use it. The effect is that the function depends on the type and you get an error message if you try to alter the table. Drop & recreate the function in such a case.
This may or may not be desirable - generally it is! You want to be made aware of side effects if you alter tables. The way you have it, your function would break silently and raise an exception on it's next call.
If you provide an explicit default for the second parameter in the declaration like demonstrated, you can (but don't have to) simplify the call in case you don't want to set an upper bound with ends_with.
SELECT * FROM report_get_countries_new('Zaire');
instead of:
SELECT * FROM report_get_countries_new('Zaire', NULL);
Be aware of function overloading in this context.
Don't quote the language name 'plpgsql' even if that's tolerated (for now). It's an identifier.
You can assign a variable at declaration time. Saves an extra step.
Parameters are named in the header. Drop the nonsensical lines:
starts_with ALIAS FOR $1;
ends_with ALIAS FOR $2;
Use quote_literal() to avoid SQL injection (!!!) and fix your quoting problem:
CREATE OR REPLACE FUNCTION report_get_countries_new (
starts_with varchar,
ends_with varchar
)
RETURNS TABLE (
country_id integer,
country_name varchar
) AS
$body$
DECLARE
starts_with ALIAS FOR $1;
ends_with ALIAS FOR $2;
sql VARCHAR;
BEGIN
sql := 'SELECT * FROM lookups.countries WHERE lookups.countries.country_name ' || quote_literal(starts_with) ;
IF ends_with IS NOT NULL THEN
sql := sql || ' AND lookups.countries.country_name <= ' || quote_literal(ends_with) ;
END IF;
RETURN QUERY EXECUTE sql;
END;
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100 ROWS 1000;
This is tested in version 9.1, works fine.