Reproducible shuffle on Postgres - postgresql

Is there an equivalent of MySQL's rand(seed) in Postgres that works in a single statement? I have no way to guarantee that setseed() will be executed just before random() in the same db instance. Also, there are just too many columns to use a union.
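One approach that may fit, sketched here under assumptions and not taken from the question: sort by a hash of a unique key concatenated with the seed, for example ORDER BY md5(id::text || 'seed'). That needs no setseed() and no session state. The Python below only mirrors that ordering so the idea can be checked locally. The `id` column, the `my_table` name and the seed value are all hypothetical, and the actual order in Postgres may depend on the collation used to compare the hex strings.

```python
import hashlib

# Hypothetical Postgres statement this sketch mirrors; assumes a unique `id`
# column and a table called `my_table` (neither appears in the question).
QUERY = "SELECT * FROM my_table ORDER BY md5(id::text || %(seed)s)"


def shuffle_key(row_id, seed):
    """Hex digest of the id's text form concatenated with the seed."""
    return hashlib.md5(f"{row_id}{seed}".encode("utf-8")).hexdigest()


def seeded_shuffle(row_ids, seed):
    """Return row_ids in an order determined only by the ids and the seed."""
    return sorted(row_ids, key=lambda row_id: shuffle_key(row_id, seed))


if __name__ == "__main__":
    ids = list(range(1, 11))
    print(seeded_shuffle(ids, "42"))
    print(seeded_shuffle(ids, "42"))  # same seed, same order
```

Changing the seed string gives a different but equally repeatable order. The sort key is computed per row, so this will not use an index and sorts the whole result set.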

Related

Querying a PostgreSQL database from Snowflake

PostgreSQL offers a way to query a remote database through dblink.
Similarly (sort-of), Exasol provides a way to connect to a remote Postgres database via the following syntax:
CREATE CONNECTION JDBC_PG
TO 'jdbc:postgresql://...'
IDENTIFIED BY '...';
SELECT * FROM (
IMPORT FROM JDBC AT JDBC_PG
STATEMENT 'SELECT * FROM MY_POSTGRES_TABLE;'
)
-- one can even write direct joins such as
SELECT
t.COLUMN,
r.other_column
FROM MY_EXASOL_TABLE t
LEFT JOIN (
IMPORT FROM JDBC AT JDBC_PG
STATEMENT 'SELECT key, other_column FROM MY_POSTGRES_TABLE'
) r ON r.key = t.KEY
This is very convenient for importing data from PostgreSQL directly into Exasol without having to use a temporary file (csv, pg_dump...).
Is it possible to achieve the same thing from Snowflake (querying a remote PostgreSQL database from Snowflake with a direct live connection)? I couldn't find any mention of it in the documentation.
Have you looked into using external functions? It's not exactly what you're looking for (Snowflake doesn't have that capability yet) but it can be used as a workaround in some use cases. For instance, you could create a Python function on AWS Lambda that queries PostgreSQL for small amounts of data (due to Lambda limits) or have it trigger a PostgreSQL process that dumps to S3 to trigger Snowpipe for the bulk import use case.
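To make the external-function idea a bit more concrete, here is a minimal sketch of what the Lambda side could look like, assuming the function sits behind an API Gateway proxy integration. `lookup_in_postgres` is a hypothetical stand-in: a real function would query PostgreSQL through a driver such as psycopg2, which is left out so the sketch stays self-contained.

```python
import json


def lookup_in_postgres(key):
    """Hypothetical stand-in for a real PostgreSQL lookup.

    A real Lambda would open a connection and run a parameterised SELECT;
    a dict keeps this sketch self-contained.
    """
    fake_table = {"a": "alpha", "b": "beta"}
    return fake_table.get(key)


def handler(event, context=None):
    """Answer one Snowflake external-function batch.

    Snowflake sends {"data": [[row_number, arg1, ...], ...]} and expects
    {"data": [[row_number, result], ...]} back, with row numbers preserved.
    """
    rows = json.loads(event["body"])["data"]
    results = [[row_number, lookup_in_postgres(key)] for row_number, key in rows]
    return {"statusCode": 200, "body": json.dumps({"data": results})}
```

Each call handles one batch of rows, so this suits small lookups. For bulk loads, the dump-to-S3 plus Snowpipe route mentioned above is the better fit.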

Simple Select query is running more than 5 hours db2

We have the select query below. It has been running for more than 5 hours to fetch the data.
select ColumnA,
ColumnB,
ColumnC,
ColumnD,
ColumnE
from Table
where CodeN <> 'Z'
Is there any way we can collect stats, or any other way to improve performance?
And in DB2, is there a table where we can check whether stats have been collected on the table above?
The RUNSTATS command collects table and index statistics. Note that this is a Db2 command and not an SQL statement, so you may either run it with the Db2 Command Line Processor (CLP) or through the relational interface, using a special stored procedure that is able to run such commands:
RUNSTATS command using the ADMIN_CMD procedure.
Statistics are stored in the SYSSTAT schema views. Refer to Road map to the catalog views - Table 2. Road map to the updatable catalog views.
How many rows exist in the table?
Also, the not-equal operator '<>' is not an indexable predicate.
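As a rough sketch of the ADMIN_CMD route, the helper below just builds the CALL statement you would send over any SQL connection. The schema and table names are placeholders, and the RUNSTATS options shown are one common choice, not a recommendation from the answer. The check query reads SYSCAT.TABLES (my addition; the answer points at the SYSSTAT views), where STATS_TIME shows the last time statistics were collected.

```python
def runstats_call(schema, table):
    """Build a CALL statement that runs RUNSTATS through ADMIN_CMD.

    RUNSTATS needs a fully qualified table name, hence the schema argument.
    """
    command = (
        f"RUNSTATS ON TABLE {schema}.{table} "
        "WITH DISTRIBUTION AND DETAILED INDEXES ALL"
    )
    return f"CALL SYSPROC.ADMIN_CMD('{command}')"


# Shows when statistics were last collected; STATS_TIME is NULL when no
# statistics are available for the table.
STATS_CHECK = (
    "SELECT STATS_TIME, CARD FROM SYSCAT.TABLES "
    "WHERE TABSCHEMA = ? AND TABNAME = ?"
)

if __name__ == "__main__":
    # MYSCHEMA and MYTABLE are hypothetical names.
    print(runstats_call("MYSCHEMA", "MYTABLE"))
    print(STATS_CHECK)
```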

How to load bulk data to table from table using query as quick as possible? (postgresql)

I have a large table (postgre_a) which has 0.1 billion records with 100 columns. I want to duplicate this data into the same table.
I tried to do this using sql
INSERT INTO postgre_a select i1 + 100000000, i2, ... FROM postgre_a;
However, this query has been running for more than 10 hours now, so I want to do this faster. I tried to do this with copy, but I cannot find a way to use a COPY FROM statement with a query.
Is there any other method that can do this faster?
You cannot directly use a query in COPY FROM, but maybe you can use COPY FROM PROGRAM with a query to do what you want:
COPY postgre_a
FROM PROGRAM '/usr/pgsql-10/bin/psql -d test'
' -c ''copy (SELECT i1+ 100000000, i2, ... FROM postgre_a) TO STDOUT''';
(Of course you have to replace the path to psql and the database name with your values.)
I am not sure if that is faster than using INSERT, but it is worth a try.
You should definitely drop all indexes and constraints before the operation and recreate them afterwards.
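The fiddly part of the COPY FROM PROGRAM statement is the nested quoting: the shell command is a SQL string literal, so every single quote inside it has to be doubled. A small sketch of building that statement, assuming the inner query itself contains no single quotes (the helper name and the two-column SELECT in the example are illustrative only):

```python
def copy_from_program(table, psql_path, dbname, inner_query):
    """Build a COPY ... FROM PROGRAM statement around an inner SELECT.

    Assumes inner_query contains no single quotes; they would break the
    shell quoting around the -c argument.
    """
    if "'" in inner_query:
        raise ValueError("inner_query must not contain single quotes")
    shell_command = f"{psql_path} -d {dbname} -c 'copy ({inner_query}) TO STDOUT'"
    # Inside a SQL string literal, each single quote is written twice.
    sql_literal = shell_command.replace("'", "''")
    return f"COPY {table} FROM PROGRAM '{sql_literal}';"


if __name__ == "__main__":
    print(copy_from_program(
        "postgre_a",
        "/usr/pgsql-10/bin/psql",
        "test",
        "SELECT i1 + 100000000, i2 FROM postgre_a",  # hypothetical column list
    ))
```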

SAS SQL Pass Through

I would like to know what gets executed first in the SAS SQL pass thru in this code:
Connect To OLEDB As MYDB ( %DBConnect( Catalog = MYDB ) ) ;
Create table MYDB_extract as
select put(Parent,$ABC.) as PARENT,
put(PFX,z2.) as PFX,*
From Connection To MYDB
( SELECT
Appointment,Parents,Children,Cats,Dogs
FROM MYDB.dbo.FlatRecord
WHERE Appointment between '20150801' and '20150831'
And Children > 2);
Disconnect from MYDB;
Since MS SQL-Server doesn't support the PUT function, will this query cause ALL of the records to be processed locally or only the resultant records from the DBMS?
The explicit pass-through query will still process and will return to SAS what it returns (however many records that is). Then, SAS will perform the put operations on the returned rows.
So if 10000 rows are in the table, and 500 rows meet the criteria in where, 500 records will go to SAS and then be put; SQL will handle the 10000 -> 500.
If you had written this in implicit pass through, then it's possible (if not probable) that SAS might have done all of the work.
First the code in the inline view will be executed on the server:
SELECT Appointment,Parents,Children,Cats,Dogs
FROM MYDB.dbo.FlatRecord
WHERE Appointment between '20150801' and '20150831' And Children > 2
Rows that meet that WHERE clause will be returned by the DBMS to SAS over the OLEDB connection.
Then SAS will (try and) select from that result set, applying any other code, including the put functions.
This isn't really any different from how an inline view works in any other DBMS, except that here you have two different database engines, one running the inner query and SAS running the outer query.
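The order of operations can be mimicked with two plain functions, one standing in for each engine. This is only a toy model with made-up data (10000 rows, 500 of which match, as in the example numbers above); zero-padding stands in for the SAS put(..., z2.) call.

```python
def dbms_inner_query(table):
    """Stand-in for the pass-through SELECT that runs on the DBMS."""
    return [
        row for row in table
        if "20150801" <= row["Appointment"] <= "20150831" and row["Children"] > 2
    ]


def sas_outer_query(rows):
    """Stand-in for the SAS step; zero-padding plays the role of put(..., z2.)."""
    return [dict(row, CHILDREN_Z2=f"{row['Children']:02d}") for row in rows]


# Made-up data: 10000 rows, of which 500 satisfy the WHERE clause.
table = [
    {"Appointment": "20150815" if i < 500 else "20140101", "Children": 3}
    for i in range(10000)
]

returned = dbms_inner_query(table)  # runs first, on the server
result = sas_outer_query(returned)  # runs second, in SAS, on 500 rows only
print(len(table), len(returned), len(result))
```

The formatting function never sees the 9500 rows that were filtered out on the server, which is the whole point of explicit pass-through.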

Query for PostgreSQL Server status variable?

In my project I want to collect the PostgreSQL server's performance counters, and I need a query to collect them from the database. I am new to PostgreSQL. When I searched, I found something like:
SELECT * FROM pg_stat_database
But when I use this in Java in the following manner (here Map_PostgreSQL is a hashmap):
while(rs.next())
{
Counter_Name.add(rs.getString(1).trim());
Map_PostgreSQL.put(rs.getString(1).trim(), rs.getString(2));
}
I got output like
{12024=template0, 1=template1, 12029=postgres}
What is the actual query to collect its status variables, like "SHOW GLOBAL STATUS" in MySQL?
Thanks in advance..
First, try running the SQL query in your PostgreSQL shell to see exactly which data is returned and how it is organised in rows and columns.
You'll see that the hashmap keys are your datid values (database ids) and the values are your database names.
I think you assumed that the statistics were structured in "rows", whereas they are structured in columns.
Don't forget: PostgreSQL is a database server, which means it can handle several databases (and in fact it has several databases, because some of them are already created, such as the 'postgres' database itself - which Postgres (the server) uses internally, or 'template0').
By launching :
SELECT * FROM pg_stat_database;
You're asking the server to return statistics for every database (provided you're allowed to get them).
If you only want stats for your own database, do:
SELECT * FROM pg_stat_database WHERE datname='your_database_name';
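Since each counter is a column, the map you want is column name to value for a single row, not column 1 to column 2 across rows. A small sketch of that mapping, using a hand-written row with illustrative values instead of a live connection (in JDBC the column names would come from ResultSetMetaData, in a Python DB-API driver from cursor.description):

```python
def counters_by_name(column_names, row):
    """Pair each column name with the value in the same position of the row."""
    return dict(zip(column_names, row))


# Illustrative values only; the column names are a subset of pg_stat_database.
columns = ["datid", "datname", "numbackends", "xact_commit", "xact_rollback",
           "blks_read", "blks_hit"]
row = (12029, "postgres", 1, 2048, 3, 150, 9000)

counters = counters_by_name(columns, row)
print(counters["datname"], counters["xact_commit"])
```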
Hope this helped