Where to get example data used in PostgreSQL documentation? - postgresql

I would like to follow through and experiment with the examples in the PostgreSQL documentation but find it difficult to compare impacts of commands without actual working data.
Is there sample data or a default database that corresponds to all the examples in the PostgreSQL documentation?
Problem:
Getting table or column "does not exist" errors when executing SQL in documentation (https://www.postgresql.org/docs/10/tutorial-window.html):
SELECT depname, empno, salary, enroll_date
FROM
(SELECT depname, empno, salary, enroll_date,
rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
FROM empsalary
) AS ss
WHERE pos < 3;
Error:
SQL Error [42703]: ERROR: column "enroll_date" does not exist
Attempted workarounds:
Stack Builder does not list any sample data installation options.
The sources do not contain the required sample data (e.g. the "src\tutorial" directory et al. in https://ftp.postgresql.org/pub/source/v10.23/postgresql-10.23.tar.gz).
Any third-party sample data I found appears incomplete or unofficial (e.g. "advanced-psql-examples.sql" from https://gist.github.com/marko-asplund/5561404 is missing the "enroll_date" column).
Versions:
PostgreSQL 10.23 (https://web3.pioneersoftware.co.uk/files/pgsql/10/postgresql-10.23-1-windows.exe)
Stack Builder 4.2.1
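For experimenting in the meantime, one option is to build a small stand-in table by hand. The sketch below is only an illustration: the rows are invented (they are not official PostgreSQL sample data), the column types are guesses, and it uses Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later, which has window functions) so that it runs without a server. The same CREATE TABLE and INSERT statements should be easy to adapt for psql, probably with enroll_date declared as date.

```python
import sqlite3

# Hypothetical stand-in for the docs' empsalary table; every row is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empsalary ("
    " depname TEXT, empno INTEGER, salary INTEGER, enroll_date TEXT)"
)
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?, ?)",
    [
        ("develop", 11, 5200, "2007-08-01"),
        ("develop", 7, 4200, "2008-01-01"),
        ("develop", 9, 4500, "2008-01-01"),
        ("personnel", 5, 3500, "2007-12-10"),
        ("personnel", 2, 3900, "2006-12-23"),
        ("sales", 1, 5000, "2006-10-01"),
        ("sales", 3, 4800, "2007-08-01"),
        ("sales", 4, 4800, "2007-08-08"),
    ],
)

# The query from the question, with a final ORDER BY added so the
# output order is stable.
top_two = conn.execute(
    """
    SELECT depname, empno, salary, enroll_date
    FROM (
        SELECT depname, empno, salary, enroll_date,
               rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
        FROM empsalary
    ) AS ss
    WHERE pos < 3
    ORDER BY depname, salary DESC, empno
    """
).fetchall()

for row in top_two:
    print(row)
```

With these made-up rows the query returns the two best-paid employees per department, which is enough to see what changing the PARTITION BY or ORDER BY does.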

Related

Cannot create materialized view with ORDER BY clause in TimescaleDb 2.7.0

The Timescale docs seem to suggest that since 2.7.0 it should be possible to create materialized views that include an ORDER BY clause. (See the "timescaledb.finalized" option here and "function support" here.)
However, I have not been able to get this to work for me. When I try to create my materialized view I get:
ERROR: invalid continuous aggregate query
DETAIL: ORDER BY is not supported in queries defining continuous aggregates.
HINT: Use ORDER BY clauses in SELECTS from the continuous aggregate view instead.
Is there something fundamental I'm misunderstanding about how this should work?
Here is the full script:
> select extname, extversion from pg_extension where extname = 'timescaledb';
extname | extversion
-------------+------------
timescaledb | 2.7.0
(1 row)
> CREATE TABLE stocks_real_time (
time TIMESTAMPTZ NOT NULL,
price DOUBLE PRECISION NULL
);
CREATE TABLE
> SELECT create_hypertable('stocks_real_time','time');
create_hypertable
-------------------------------
(7,public,stocks_real_time,t)
(1 row)
> CREATE MATERIALIZED VIEW mat_view_stocks_real_time
WITH (timescaledb.continuous)
AS (
SELECT
time_bucket('60 minutes', time) as bucketed_time,
AVG(price) as price
FROM stocks_real_time
GROUP BY bucketed_time
ORDER BY bucketed_time
);
ERROR: invalid continuous aggregate query
DETAIL: ORDER BY is not supported in queries defining continuous aggregates.
HINT: Use ORDER BY clauses in SELECTS from the continuous aggregate view instead.
I still get the same error if I explicitly add "timescaledb.finalized=true" to the with clause.
(NB: I work at Timescale!)
We have an open issue to support this. I think the confusion comes from the fact that we now support aggregates that contain an ORDER BY clause, which means things like SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY price) or SELECT array_agg(foo ORDER BY time).
An ORDER BY on the continuous aggregate definition itself is what the open issue covers. In the meantime you can apply the ORDER BY in the SELECT from the continuous aggregate, i.e. SELECT * FROM mat_view_stocks_real_time ORDER BY bucketed_time, and that should work just fine.

DB2 to Netezza Migration

I have a query in DB2, shown below.
What would the equivalent syntax be in Netezza?
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
First, I don't think your statement is valid.
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
The WHERE clause is incomplete, and there is a closing parenthesis without an opening one.
FETCH FIRST is standard SQL syntax (it was added in SQL:2008), so it's very likely that this will work. However, the usual way to do this in Netezza is with a LIMIT clause. All that said, this is how I'd write the query to get the intended result (omitting your WHERE clause since I can't infer the intent):
select distinct acct_num from gtd_demo_dim limit 1;

PostgreSQL - must appear in the GROUP BY clause or be used in an aggregate function

I am getting this error with PostgreSQL in production mode, but it works fine with sqlite3 in development mode.
ActiveRecord::StatementInvalid in ManagementController#index
PG::Error: ERROR: column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = ...
^
: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id
@myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all
If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.
If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates when multiple rows share the same user_id. You must either use an aggregate function that expresses what you want (min, max, avg, string_agg, array_agg, etc.) or add the column(s) of interest to the GROUP BY.
Alternatively, you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.
Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.
If you have:
 col1 | col2
------+------
 fred |   42
 bob  |    9
 fred |   44
 fred |   99
and you do:
SELECT col1, col2 FROM mytable GROUP BY col1;
then it's obvious that you should get the row:
 bob  |    9
but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:
SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;
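As a quick check of the aggregate form, here is a minimal sketch that loads the four example rows and runs the max query. It uses Python's built-in sqlite3 module purely so it can run without a server. Note that SQLite would also accept the bare-column query that PostgreSQL rejects, so only the aggregate version is shown. The ORDER BY is an addition to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("fred", 42), ("bob", 9), ("fred", 44), ("fred", 99)],
)

# One row per col1; the aggregate states explicitly which col2 is wanted.
rows = conn.execute(
    "SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1 ORDER BY col1"
).fetchall()
print(rows)  # [('bob', 9), ('fred', 99)]
```

Swapping max for min, avg or count(*) gives the other unambiguous answers for fred.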
I recently moved from MySQL to PostgreSQL and encountered the same issue. Just for reference, the best approach I've found is to use DISTINCT ON as suggested in this SO answer:
Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
This will let you get one record for each unique value in your chosen column that matches the other query conditions:
MyModel.where(:some_col => value).select("DISTINCT ON (unique_col) *")
I prefer DISTINCT ON because I can still get all the other column values in the row. DISTINCT alone will only return the value of that specific column.
After receiving this error often myself, I realised that Rails (I am using Rails 4) automatically adds an 'order by id' at the end of a grouping query, which often causes the error above. So make sure you append your own .order(:group_by_column) at the end of your Rails query. You will end up with something like this:
@problems = Problem.select('problems.username, sum(problems.weight) as weight_sum').group('problems.username').order('problems.username')
@myestate1 = Estate.where(:Mgmt => current_user.Company)
@myestate = @myestate1.select("DISTINCT(user_id)")
This is what I did.

Fetching rows in DB2

I know in DB2 (using version 9.7) I can select the first 10 rows of a table by using this query:
SELECT *
FROM myTable
ORDER BY id
FETCH FIRST 10 ROWS ONLY
But how can I get, for example, rows 11 to 20?
I can't use the primary key or the ID to help me...
Thanks in advance!
Here's a sample query that gets rows 11 to 20 from a table containing state names, abbreviations, etc.
SELECT *
FROM (
SELECT stabr, stname, ROW_NUMBER() OVER(ORDER BY stname) AS rownumber
FROM states
WHERE stcnab = 'US'
) AS xxx
WHERE rownumber BETWEEN 11 AND 20 ORDER BY stname
Edit: ORDER BY is necessary to guarantee that the row numbering is consistent between executions of the query.
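The ROW_NUMBER() approach can be tried out with a small self-contained sketch. This one runs against an in-memory SQLite database through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) rather than DB2. The states table is cut down to a single column, the 30 names are invented, and fetch_page is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (stname TEXT)")
# 30 made-up names: state_01 .. state_30
conn.executemany(
    "INSERT INTO states VALUES (?)",
    [(f"state_{i:02d}",) for i in range(1, 31)],
)


def fetch_page(conn, first_row, last_row):
    """Return rows first_row..last_row (1-based, inclusive), ordered by stname."""
    return conn.execute(
        """
        SELECT stname, rownumber
        FROM (
            SELECT stname, ROW_NUMBER() OVER (ORDER BY stname) AS rownumber
            FROM states
        ) AS numbered
        WHERE rownumber BETWEEN ? AND ?
        ORDER BY rownumber
        """,
        (first_row, last_row),
    ).fetchall()


page = fetch_page(conn, 11, 20)
print(page[0], page[-1])  # ('state_11', 11) ('state_20', 20)
```

Because the numbering comes from the window's ORDER BY rather than from a key column, this works even when the ID has gaps or cannot be used.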
You can also use DB2's MySQL compatibility mode. You just need to set the compatibility vector to MYS, and then you can use LIMIT and OFFSET in your queries.
db2set DB2_COMPATIBILITY_VECTOR=MYS
db2stop
db2start
An excellent article written by DB2 experts from IBM: https://www.ibm.com/developerworks/mydeveloperworks/blogs/SQLTips4DB2LUW/entry/limit_offset?lang=en
Compatibility vector in the InfoCenter: http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.apdv.porting.doc/doc/r0052867.html
A blog post about this: http://victorsergienko.com/db2-supports-limit-and-offset/

Proper GROUP BY syntax

I'm fairly proficient in MySQL and MSSQL, but I'm just getting started with Postgres. I'm sure this is a simple issue, so to be brief:
SQL error:
ERROR: column "incidents.open_date" must appear in the GROUP BY clause or be used in an aggregate function
In statement:
SELECT date(open_date), COUNT(*)
FROM incidents
GROUP BY 1
ORDER BY open_date
The type for open_date is timestamp with time zone, and I get the same results if I use GROUP BY date(open_date).
I've tried going over the postgres docs and some examples online, but everything seems to indicate that this should be valid.
The problem is with the unadorned open_date in the ORDER BY clause.
This should do it:
SELECT date(open_date), COUNT(*)
FROM incidents
GROUP BY date(open_date)
ORDER BY date(open_date);
This would also work (though I prefer not to use integers to refer to columns for maintenance reasons):
SELECT date(open_date), COUNT(*)
FROM incidents
GROUP BY 1
ORDER BY 1;
"open_date" is not in your select list, "date(open_date)" is.
Either of these will work:
order by date(open_date)
order by 1
You can also name your columns in the select statement, and then refer to that alias:
select date(open_date) "alias" ... order by alias
Some databases require the keyword, AS, before the alias in your select.
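The alias form can be sketched end to end as follows. This is an illustration only: the incidents rows are invented, open_day and incident_count are made-up alias names, and it runs on SQLite through Python's sqlite3 module so no server is needed. SQLite is more lenient than PostgreSQL about the original ORDER BY open_date, so the sketch shows only the corrected query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (open_date TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?)",
    [
        ("2023-01-02 09:15:00",),
        ("2023-01-01 23:59:00",),
        ("2023-01-02 17:30:00",),
        ("2023-01-03 08:00:00",),
    ],
)

# Group on the expression and sort on its alias, so ORDER BY only
# refers to something that is in the select list.
daily_counts = conn.execute(
    """
    SELECT date(open_date) AS open_day, COUNT(*) AS incident_count
    FROM incidents
    GROUP BY date(open_date)
    ORDER BY open_day
    """
).fetchall()
print(daily_counts)
# [('2023-01-01', 1), ('2023-01-02', 2), ('2023-01-03', 1)]
```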