Some background.
I am working on a showcase for indexes. To provide more insight into how Postgres handles them, I am using the pageinspect module. My setup (on 9.3.5) is the following:
CREATE EXTENSION IF NOT EXISTS pageinspect;
DROP SCHEMA IF EXISTS pin CASCADE;  -- CASCADE so re-runs work with existing objects
CREATE SCHEMA pin;
SET search_path TO pin,public;
CREATE TABLE bat AS
SELECT id,
translate((random()*123456789)::text,'0987654321.','abcdefghijk') str
FROM generate_series(1, 10) id;
ALTER TABLE bat ADD CONSTRAINT p_bat PRIMARY KEY (id);
CREATE INDEX i_bat_str ON bat(str);
VACUUM ANALYZE bat;
And now I can show some details (meta and the only leaf page):
SELECT * FROM bt_metap('i_bat_str');
WITH pages AS (
   SELECT relname, relpages, blkno
   FROM   pg_class ic, generate_series(1, relpages - 1) s(blkno)
   WHERE  oid = 'pin.i_bat_str'::regclass
)
SELECT blkno, s.*
FROM   pages, bt_page_items('pin.i_bat_str', blkno) s
WHERE  blkno = 1;
The last query produces a series of data values. I understand that the first byte is a single-byte header, and in my case its least-order bit is set because I'm on x86_64.
So my question is: is it possible to check the server's endianness via a function (or via SQL) without compiling extra code? I know how to do it in C, but I'd like to avoid that.
I'd probably use plperl or plpython.
For example:
CREATE OR REPLACE FUNCTION little_endian() RETURNS boolean LANGUAGE plperlu AS $$
use Config;
# '1234' = 32-bit little-endian, '12345678' = 64-bit little-endian
return $Config{byteorder} eq '1234' || $Config{byteorder} eq '12345678';
$$;
or:
CREATE OR REPLACE FUNCTION little_endian() RETURNS boolean LANGUAGE plpythonu AS $$
import sys
return sys.byteorder == "little"
$$;
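Either function can then be checked straight from SQL:
SELECT little_endian();  -- t on x86_64 (little-endian), f on big-endian platforms such as s390x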
Related
I am trying to remove duplicated data from some of our databases based on unique ids. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite a few databases and different schemas and tables, I wanted to start using variables to reduce the chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are perceived as columns, or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
  (my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
   EXECUTE format(
      'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE ARRAY[id] <@ %4$L::integer[] RETURNING id)
       DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id'
     , my_target_schema, my_source_schema, my_source_table, unique_keys::text);  -- <@ is the array containment operator
END;
$$;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys, and then deletes these rows from my_source_schema.my_source_table.
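For example, a hypothetical call with the names from the question (the target table my_source_table_duplicates must already exist in my_target_schema):
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', ARRAY['1','2','3']);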
See the test result in dbfiddle.
As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead if you want to wrap it in an outer transaction (as you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
  LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
   _target_schema CONSTANT text := 's2';  -- hardcoded
   _target_table  text := _source_table || '_duplicates';
   _sql           text := format(
      'WITH del AS (
          DELETE FROM %I.%I
          WHERE  id = ANY($1)
          RETURNING *
          )
       INSERT INTO %I.%I
       TABLE  del'
     , _source_schema, _source_table
     , _target_schema, _target_table);
BEGIN
   RAISE NOTICE '%', _sql;           -- debug
   EXECUTE _sql USING _unique_keys;  -- execute
   GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per session. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL injection when concatenating SQL dynamically; a small illustration follows the links below. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
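To see why format() with %I matters, a contrived example with a malicious "table name":
SELECT format('DELETE FROM %I.%I WHERE id = ANY($1)', 's1', 't1; --');
-- result: DELETE FROM s1."t1; --" WHERE id = ANY($1)
-- %I renders the malicious name as a harmless quoted identifier,
-- while naive string concatenation would inject it verbatim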
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be inspected before actually executing it; that's what RAISE NOTICE '%', _sql; is for.
I suggest commenting out the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since it has to find the rows only once, it comes at about half the cost of a separate SELECT and DELETE. And it's perfectly safe: if anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands could also run into concurrency issues or race conditions, which are ruled out this way because DELETE implicitly locks the rows to be deleted (a bare version of the pattern follows the example link below). Example:
Replicating data between Postgres DBs
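Stripped of the dynamic SQL, the pattern boils down to this (names hardcoded from the example call above):
WITH del AS (
   DELETE FROM s1.t1
   WHERE  id = ANY('{1,3}'::int[])
   RETURNING *
   )
INSERT INTO s2.t1_duplicates
TABLE  del;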
Or you can build the statements in a client program like psql and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
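A minimal sketch of the \gexec route, assuming the same schemas s1 and s2 and a list of source tables:
SELECT format('CREATE TABLE s2.%1$I_duplicates (LIKE s1.%1$I)', tbl)
FROM   unnest('{t1,t2}'::text[]) AS t(tbl) \gexec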
Based on Erwin's answer, with a minor optimization...
create or replace procedure pg_temp.p_archive_dump
  (_source_schema text, _source_table text,
   _unique_key int[], _target_schema text)
language plpgsql as
$$
declare
   _row_count    bigint;
   _target_table text;
begin
   _target_table := _source_table || '_' || array_to_string(_unique_key, '_');  -- e.g. t1_1_2
   raise notice 'the deleted table records will be stored in %.%', _target_schema, _target_table;
   execute format('create table %I.%I as select * from %I.%I limit 0'
                 , _target_schema, _target_table, _source_schema, _source_table);
   execute format('with mm as (delete from %I.%I where id = any (%L) returning *) insert into %I.%I table mm'
                 , _source_schema, _source_table, _unique_key, _target_schema, _target_table);
   get diagnostics _row_count = row_count;
   raise notice 'rows affected: %', _row_count;
end
$$;
If your _unique_key array is small, this solution also creates the target table for you; you still need to create the target schema yourself.
If the array is large, you can customize the procedure to give the dumped table a more suitable name.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is the source table, {1,2} is the array of unique keys you want to extract to the new table, and s2 is the target schema.
I need to create a function which returns the results of a SELECT query. This SELECT query is a JOIN of a few temporary tables created inside the function. Is there any way to create such a function? Here is an example (it is very simplified; in reality there are multiple temp tables with long queries):
CREATE OR REPLACE FUNCTION myfunction () RETURNS TABLE (column_a TEXT, column_b TEXT) AS $$
BEGIN
   CREATE TEMPORARY TABLE raw_data ON COMMIT DROP
   AS
   SELECT d.column_a, d2.column_b FROM dummy_data d JOIN dummy_data_2 d2 using (id);
   RETURN QUERY (select distinct column_a, column_b from raw_data limit 100);
END;
$$
LANGUAGE 'plpgsql' SECURITY DEFINER
I get this error:
[Error] Script lines: 1-19 -------------------------
ERROR: RETURN cannot have a parameter in function returning set;
use RETURN NEXT at or near "QUERY" Position: 237
I apologize in advance for any obvious mistakes, I'm new to this.
Psql version is PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1)
The most recent version of Greenplum Database (5.0) is based on PostgreSQL 8.3, and it supports the RETURN QUERY syntax. Just tested your function on:
PostgreSQL 8.4devel (Greenplum Database 5.0.0-beta.10+dev.726.gd4a707c762 build dev)
The most probable error this could raise in Postgres:
ERROR: column "foo" specified more than once
Meaning, there is at least one more column name (other than id, which is folded to one instance by the USING clause) included in both tables. This would not raise an exception in a plain SQL SELECT, which tolerates duplicate output column names. But you cannot create a table with duplicate column names.
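A minimal demo of that error, assuming both (hypothetical) tables share a second column foo besides id:
CREATE TABLE dummy_data   (id int, foo text);
CREATE TABLE dummy_data_2 (id int, foo text);

CREATE TEMPORARY TABLE raw_data AS
SELECT *                          -- expands to id, foo, foo
FROM   dummy_data d
JOIN   dummy_data_2 d2 USING (id);
-- ERROR:  column "foo" specified more than once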
The problem also applies to Greenplum (as you later declared), which is not Postgres. It was forked from PostgreSQL in 2005 and developed separately. The current Postgres manual hardly applies any more; look to the Greenplum documentation.
And psql is just the standard PostgreSQL interactive terminal program. Obviously you are using the one shipped with PostgreSQL 8.2.15, but the RDBMS is still Greenplum, not Postgres.
Syntax fix (for Postgres, like you first tagged, still relevant):
CREATE OR REPLACE FUNCTION myfunction()
  RETURNS TABLE (column_a text, column_b text) AS
$func$
BEGIN
   CREATE TEMPORARY TABLE raw_data ON COMMIT DROP AS
   SELECT d.column_a, d2.column_b  -- explicit SELECT list avoids duplicate column names
   FROM   dummy_data d
   JOIN   dummy_data_2 d2 USING (id);

   RETURN QUERY
   SELECT DISTINCT column_a, column_b
   FROM   raw_data
   LIMIT  100;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;
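Call it like any other set-returning function:
SELECT * FROM myfunction();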
The example wouldn't need a temp table at all, unless you access the temp table after the function call in the same transaction (ON COMMIT DROP). Otherwise, a plain SQL function is better in every way. Syntax for Postgres and Greenplum:
CREATE OR REPLACE FUNCTION myfunction(OUT column_a text, OUT column_b text)
  RETURNS SETOF record AS
$func$
   SELECT DISTINCT d.column_a, d2.column_b
   FROM   dummy_data d
   JOIN   dummy_data_2 d2 USING (id)
   LIMIT  100;
$func$ LANGUAGE sql SECURITY DEFINER;
Not least, it should also work for Greenplum.
The only remaining reason for this function is SECURITY DEFINER. Else you could just use the simple SQL statement (possibly as prepared statement) instead.
RETURN QUERY was added to PL/pgSQL with version 8.3 in 2008, some years after the fork of Greenplum. Might explain your error msg:
ERROR: RETURN cannot have a parameter in function returning set;
use RETURN NEXT at or near "QUERY" Position: 237
Aside: LIMIT without ORDER BY produces arbitrary results. I assume you are aware of that.
If for some reason you actually need temp tables and cannot upgrade to Greenplum 5.0 like A. Scherbaum suggested, you can still make it work in Greenplum 4.3.x (like in Postgres 8.2). Use a FOR loop in combination with RETURN NEXT; a sketch follows the examples below.
Examples:
plpgsql error "RETURN NEXT cannot have a parameter in function with OUT parameters" in table-returning function
How to use `RETURN NEXT` in PL/pgSQL correctly?
Use of custom return types in a FOR loop in plpgsql
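For illustration, a minimal sketch of that route (untested on Greenplum; it uses OUT parameters, since RETURNS TABLE also only arrived in PostgreSQL 8.4):
CREATE OR REPLACE FUNCTION myfunction(OUT column_a text, OUT column_b text)
  RETURNS SETOF record AS
$func$
BEGIN
   CREATE TEMPORARY TABLE raw_data ON COMMIT DROP AS
   SELECT d.column_a, d2.column_b
   FROM   dummy_data d
   JOIN   dummy_data_2 d2 USING (id);

   FOR column_a, column_b IN
      SELECT DISTINCT r.column_a, r.column_b FROM raw_data r LIMIT 100
   LOOP
      RETURN NEXT;  -- emits the current values of the OUT parameters
   END LOOP;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;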
I am new to PostgreSQL and found a trigger which serves my purpose completely, except for one little thing. The trigger is quite generic and runs across different tables and logs different field changes. I found it here.
What I now need to do is test for a specific field which changes as the tables change on which the trigger fires. I thought of using substr, as all the columns will have the same name format, e.g. XXX_cust_no, but the XXX prefix can be 2 to 4 characters long. I need to log the value in the XXX_cust_no field with every record that is written to the history / audit table. Using a bunch of IF / ELSE statements to accomplish this is not something I would like to do.
The trigger as it now works logs the table_name, column_name, old_value and new_value. However, I need to log the XXX_cust_no of the record that was changed as well.
Basically, you need dynamic SQL for dynamic column names. format() helps to safely build the DML command. Pass values from NEW and OLD with the USING clause.
Given these tables:
CREATE TABLE tbl (
t_id serial PRIMARY KEY
,abc_cust_no text
);
CREATE TABLE log (
id int
,table_name text
,column_name text
,old_value text
,new_value text
);
It could work like this:
CREATE OR REPLACE FUNCTION trg_demo()
  RETURNS TRIGGER AS
$func$
BEGIN
   EXECUTE format('
      INSERT INTO log(id, table_name, column_name, old_value, new_value)
      SELECT ($2).t_id
           , $3
           , $4
           , ($1).%1$I
           , ($2).%1$I', TG_ARGV[0])
   USING OLD, NEW, TG_RELNAME, TG_ARGV[0];
   RETURN NEW;
END
$func$ LANGUAGE plpgsql;
CREATE TRIGGER demo
BEFORE UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE trg_demo('abc_cust_no'); -- col name here.
SQL Fiddle.
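A quick test of the trigger; the expected content of log follows directly from the function above (sample values hypothetical):
INSERT INTO tbl (abc_cust_no) VALUES ('A-001');
UPDATE tbl SET abc_cust_no = 'A-002' WHERE t_id = 1;
SELECT * FROM log;
-- id | table_name | column_name | old_value | new_value
--  1 | tbl        | abc_cust_no | A-001     | A-002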
Related answer on dba.SE:
How to access NEW or OLD field given only the field's name?
List of special variables visible in plpgsql trigger functions in the manual.
This query returns the OID of the function whose name and signature is supplied:
select 'myfunc(signature)'::regprocedure::oid;
But is there something in PostgreSQL's plpgsql, like a myNameAndSignature() function, so we could use dynamic SQL to build a statement that gets the OID of the current function and then creates a temporary table with the OID appended to its name?
The statement to execute dynamically is:
create temp table TT17015
I'm new to PostgreSQL; maybe there's a better way to handle the naming of temporary tables, so that functions which use temp tables and call each other don't get an error that a particular temp table they are trying to delete is in use elsewhere?
Using the OID of a function does not necessarily prevent a naming conflict. The same function could be run multiple times in the same session.
If you are in need of a unique name, use a SEQUENCE. Run once in your database:
CREATE SEQUENCE tt_seq;
Then, in your plpgsql function or DO statement:
DO
$$
DECLARE
   _tbl text := 'tt' || nextval('tt_seq');
BEGIN
   EXECUTE 'CREATE TEMP TABLE ' || _tbl || '(id int)';
END
$$;
Drawback is that you have to use dynamic SQL for dynamic identifiers. Plain SQL commands do not accept parameters for identifiers.
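The sequence value makes the name unique and safe to concatenate, but format() with %I keeps the quoting habit consistent; the same block written that way:
DO
$$
DECLARE
   _tbl text := 'tt' || nextval('tt_seq');
BEGIN
   EXECUTE format('CREATE TEMP TABLE %I (id int)', _tbl);
END
$$;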
How to insert a text file into a field in PostgreSQL?
I'd like to insert a row with fields from a local or remote text file.
I'd expect a function like gettext() or geturl() in order to do the following:
% INSERT INTO collection(id, path, content) VALUES(1, '/etc/motd', gettext('/etc/motd'));
-S.
The easiest method would be to use one of the embeddable scripting languages. Here's an example using plpythonu:
CREATE FUNCTION gettext(url TEXT) RETURNS TEXT
AS $$
    import urllib2
    try:
        f = urllib2.urlopen(url)
        return ''.join(f.readlines())
    except Exception:
        return ""
$$ LANGUAGE plpythonu;
One drawback to this example function is that its reliance on urllib2 means you have to use "file:///" URLs to access local files, like this:
select gettext('file:///etc/motd');
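For purely local files, a variant of the same idea without the URL scheme (a hypothetical helper using plain Python file access; the same caveats about untrusted languages apply):
CREATE OR REPLACE FUNCTION getfile(path TEXT) RETURNS TEXT
AS $$
    # read a local file directly instead of going through urllib2
    with open(path) as f:
        return f.read()
$$ LANGUAGE plpythonu;
select getfile('/etc/motd');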
Thanks for the tips. I've found another answer with a built-in function.
You need superuser rights in order to execute it!
-- 1. Create a function to load a doc
-- DROP FUNCTION get_text_document(CHARACTER VARYING);
CREATE OR REPLACE FUNCTION get_text_document(p_filename CHARACTER VARYING)
  RETURNS TEXT AS $$
   -- Set the read length to some big number because we are too lazy to grab the
   -- actual length, and it will cut off at the EOF anyway
   SELECT CAST(pg_read_file(E'mydocuments/' || $1, 0, 100000000) AS TEXT);
$$ LANGUAGE sql VOLATILE SECURITY DEFINER;
ALTER FUNCTION get_text_document(CHARACTER VARYING) OWNER TO postgres;
-- 2. Determine the location of your cluster by running as super user:
SELECT name, setting FROM pg_settings WHERE name='data_directory';
-- 3. Copy the files you want to import into <data_directory>/mydocuments/
-- and test it:
SELECT get_text_document('file1.txt');
-- 4. Now do the import (HINT: File must be UTF-8)
INSERT INTO mytable(file, content)
VALUES ('file1.txt', get_text_document('file1.txt'));
Postgres's COPY command is exactly for this.
My advice is to load it into a temporary table, and then transfer the data across to your main table when you're happy with the formatting, e.g.:
CREATE TABLE text_data (text varchar);
COPY text_data FROM 'C:\mytempfolder\textdata.txt';
INSERT INTO main_table (value)
SELECT string_agg(text, chr(10)) FROM text_data;  -- note: without ORDER BY the line order is not guaranteed
DROP TABLE text_data;
Also see this question.
You can't. You need to write a program that reads the file's contents (or a URL's) and stores them in the desired field.
Use COPY instead of INSERT
reference: http://www.commandprompt.com/ppbook/x5504#AEN5631