Calculate a table column using another table on the server side - PostgreSQL

Suppose there are two tables in the DB:
Table registries:
Column      | Type                        | Modifiers
------------+-----------------------------+-----------
registry_id | integer                     | not null
name        | character varying           | not null
...
uploaded_at | timestamp without time zone | not null
Table rows:
Column      | Type              | Modifiers
------------+-------------------+-----------
row_id      | character varying | not null
registry_id | integer           | not null
row         | character varying | not null
In the real world, each entry in registries is just a CSV file and rows holds the lines of those files. In my Scala/Slick application, I want to know how many lines each file has.
registries:
1,foo,...
2,bar,...
3,baz,...
rows:
aaa,1,...
bbb,1,...
ccc,2,...
desired result:
1,foo,... - 2
2,bar,... - 1
3,baz,... - 0
My code now is (Slick 3.0):
def getRegistryWithLength(rId: Int) = {
  val q1 = registries.filter(_.registryId === rId).take(1).result.headOption
  val q2 = rows.filter(_.registryId === rId).length.result

  val registry = Await.result(db.run(q1), 5.seconds)
  val length   = Await.result(db.run(q2), 5.seconds)

  (registry, length)
}
(I know Await is a bad idea.)
How can I implement getRegistryWithLength with a single SQL query?
I could add a column row_n to the registries table, but then I would be forced to update row_n after every insert or delete on the rows table.
How can I have the column row_n of registries calculated automatically on the database server side?

The basic query could be:
SELECT r.*, COALESCE(n.ct, 0) AS ct
FROM   registries r
LEFT   JOIN (
   SELECT registry_id, count(*) AS ct
   FROM   rows
   GROUP  BY registry_id
   ) n USING (registry_id);
The LEFT [OUTER] JOIN is essential so you do not drop rows from registries that have no related rows in rows.
COALESCE to return 0 instead of NULL where no related rows are found.
There are many related answers on SO. One here:
SQL: How to save order in sql query?
You could wrap this in a VIEW for convenience:
CREATE VIEW reg_rn AS
SELECT ...
... which you query like a table.
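For illustration, a sketch of such a view based on the query above (the name row_n for the count column is just taken from the question):
CREATE VIEW reg_rn AS
SELECT r.*, COALESCE(n.ct, 0) AS row_n
FROM   registries r
LEFT   JOIN (
   SELECT registry_id, count(*) AS ct
   FROM   rows
   GROUP  BY registry_id
   ) n USING (registry_id);
-- then, for a single registry:
SELECT * FROM reg_rn WHERE registry_id = 1;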
Aside: It's unwise to use reserved SQL key words as identifiers. row is a no-go for a column name (even if allowed in Postgres).

Thanks to Erwin Brandstetter for the awesome answer; using it, I wrote the code for my Scala/Slick application.
The Scala code looks much more complicated than the plain SQL:
val registryQuery = registries.filter(_.userId === userId)
val rowQuery = rows.groupBy(_.registryId).map { case (regId, rowItems) => (regId, rowItems.length) }
val q = registryQuery.joinLeft(rowQuery).on(_.registryId === _._1).map {
  case (registry, rowsCnt) => (registry, rowsCnt.map(_._2))
}
but it works!

Related

Is there any way to match multiple date ranges for inclusion in other multiple ranges in postgresql

For example, I have allowed ranges in the database - (08:00-12:00), (12:00-15:00) - and a requested range I want to test - (09:00-14:00). Is there any way to tell that my test range is covered by the allowed ranges in the database? The test range may be split across even more parts; I just want to know whether it fully fits within the list of time ranges in the database.
You don't provide the table structure, so I have no idea of the data type. Let's assume the values are text:
t=# select '(8:00, 12:30)' a,'(12:00, 15:00)' b,'(09:00, 14:00)' c;
a | b | c
---------------+----------------+----------------
(8:00, 12:30) | (12:00, 15:00) | (09:00, 14:00)
(1 row)
Then here is how you can do it:
t=# \x
Expanded display is on.
t=# with d(a,b,c) as (values('(8:00, 12:30)','(12:00, 15:00)','(09:00, 14:00)'))
    , w as (select '2017-01-01 ' h)
    , timerange as (
        select
          tsrange(concat(w.h,split_part(substr(a,2),',',1))::timestamp, concat(w.h,split_part(a,',',2))::timestamp) ta
        , tsrange(concat(w.h,split_part(substr(b,2),',',1))::timestamp, concat(w.h,split_part(b,',',2))::timestamp) tb
        , tsrange(concat(w.h,split_part(substr(c,2),',',1))::timestamp, concat(w.h,split_part(c,',',2))::timestamp) tc
        from w
        join d on true
      )
    select *, ta + tb glued, tc <@ ta + tb fits from timerange;
-[ RECORD 1 ]----------------------------------------
ta | ["2017-01-01 08:00:00","2017-01-01 12:30:00")
tb | ["2017-01-01 12:00:00","2017-01-01 15:00:00")
tc | ["2017-01-01 09:00:00","2017-01-01 14:00:00")
glued | ["2017-01-01 08:00:00","2017-01-01 15:00:00")
fits | t
First you need to "cast" your times to timestamps, as there is no time range type in Postgres, so we take the same day for all times (w.h = 2017-01-01) and convert a, b, c into the tsrange values ta, tb, tc with the default bounds (inclusive lower, exclusive upper), which fits our case.
Then use the range union operator (+, see https://www.postgresql.org/docs/current/static/functions-range.html#RANGE-FUNCTIONS-TABLE) to get the "glued" interval.
Lastly, check whether the requested range is contained in the glued one with the <@ operator.
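For reference, a stripped-down version of the same check with hard-coded timestamps (the literals are just the example values from above):
SELECT tsrange('2017-01-01 08:00', '2017-01-01 12:30')
       + tsrange('2017-01-01 12:00', '2017-01-01 15:00') AS glued,
       tsrange('2017-01-01 09:00', '2017-01-01 14:00')
       <@ (tsrange('2017-01-01 08:00', '2017-01-01 12:30')
           + tsrange('2017-01-01 12:00', '2017-01-01 15:00')) AS fits;
-- note: + raises an error if the two ranges neither overlap nor are adjacent,
-- so ranges with a gap between them cannot be "glued" this way.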

Recursive postgres query to view

I have the following table which models a very simple hierarchical data structure with each element pointing to its parent:
Table "public.device_groups"
Column | Type | Modifiers
--------------+------------------------+---------------------------------------------------------------
dg_id | integer | not null default nextval('device_groups_dg_id_seq'::regclass)
dg_name | character varying(100) |
dg_parent_id | integer |
I want to query the recursive list of subgroups of a specific group.
I constructed the following recursive query which works fine:
WITH RECURSIVE r(dg_parent_id, dg_id, dg_name) AS (
    SELECT dg_parent_id, dg_id, dg_name
    FROM   device_groups
    WHERE  dg_id = 1
  UNION ALL
    SELECT dg.dg_parent_id, dg.dg_id, dg.dg_name
    FROM   r pr, device_groups dg
    WHERE  dg.dg_parent_id = pr.dg_id
)
SELECT dg_id, dg_name
FROM   r;
I now want to turn this into a view where I can choose which group I want to drill down for using a WHERE clause. This means I want to be able to do:
SELECT * FROM device_groups_recursive WHERE dg_id = 1;
And get all the (recursive) subgroups of the group with id 1
I was able to write a function (by wrapping the query from above), but I would like to have a view instead of the function.
Side-note: I know about the shortcomings of an adjacency list representation; I cannot change it currently.
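One way to get such a view (a sketch, not taken from the thread) is to compute the closure for every group and carry the id of the starting group along as the column to filter on:
CREATE VIEW device_groups_recursive AS
WITH RECURSIVE r AS (
    SELECT dg_id AS root_id, dg_id, dg_name, dg_parent_id
    FROM   device_groups
  UNION ALL
    SELECT pr.root_id, dg.dg_id, dg.dg_name, dg.dg_parent_id
    FROM   r pr
    JOIN   device_groups dg ON dg.dg_parent_id = pr.dg_id
)
SELECT root_id AS dg_id, dg_id AS subgroup_id, dg_name
FROM   r;
-- SELECT * FROM device_groups_recursive WHERE dg_id = 1;
Note that the view computes the recursion for all groups and only then filters on dg_id, so on large tables the wrapped function may still perform better.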

group by in postgres sql with the error "must appear in the GROUP BY clause or be used in an aggregate function" [duplicate]

I've been migrating some of my MySQL queries to PostgreSQL to use Heroku. Most of my queries work fine, but I keep having a similar recurring error when I use group by:
ERROR: column "XYZ" must appear in the GROUP BY clause or be used in
an aggregate function
Could someone tell me what I'm doing wrong?
MySQL which works 100%:
SELECT `availables`.*
FROM `availables`
INNER JOIN `rooms` ON `rooms`.id = `availables`.room_id
WHERE (rooms.hotel_id = 5056 AND availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24')
GROUP BY availables.bookdate
ORDER BY availables.updated_at
PostgreSQL error:
ActiveRecord::StatementInvalid: PGError: ERROR: column
"availables.id" must appear in the GROUP BY clause or be used in an
aggregate function:
SELECT "availables".* FROM "availables" INNER
JOIN "rooms" ON "rooms".id = "availables".room_id WHERE
(rooms.hotel_id = 5056 AND availables.bookdate BETWEEN E'2009-10-21'
AND E'2009-10-23') GROUP BY availables.bookdate ORDER BY
availables.updated_at
Ruby code generating the SQL:
expiration = Available.find(:all,
  :joins      => [ :room ],
  :conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?",
                   hostel_id, date.to_s, (date + days - 1).to_s ],
  :group      => 'availables.bookdate',
  :order      => 'availables.updated_at')
Expected Output (from working MySQL query):
+-----+-------+-------+------------+---------+---------------+---------------+
| id | price | spots | bookdate | room_id | created_at | updated_at |
+-----+-------+-------+------------+---------+---------------+---------------+
| 414 | 38.0 | 1 | 2009-11-22 | 1762 | 2009-11-20... | 2009-11-20... |
| 415 | 38.0 | 1 | 2009-11-23 | 1762 | 2009-11-20... | 2009-11-20... |
| 416 | 38.0 | 2 | 2009-11-24 | 1762 | 2009-11-20... | 2009-11-20... |
+-----+-------+-------+------------+---------+---------------+---------------+
3 rows in set
MySQL's totally non-standards-compliant GROUP BY can be emulated by Postgres' DISTINCT ON. Consider this:
MySQL:
SELECT a,b,c,d,e FROM table GROUP BY a
This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.
Postgres:
SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c
This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.
Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.
Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.
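Applied to the query from the question, DISTINCT ON could look roughly like this (a sketch; which row to keep per bookdate is an assumption, here the most recently updated one):
SELECT DISTINCT ON (availables.bookdate) availables.*
FROM   availables
INNER  JOIN rooms ON rooms.id = availables.room_id
WHERE  rooms.hotel_id = 5056
AND    availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24'
ORDER  BY availables.bookdate, availables.updated_at DESC;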
PostgreSQL is more SQL compliant than MySQL. All fields in the output - except those computed with an aggregate function - must be present in the GROUP BY clause.
MySQL's GROUP BY can be used without an aggregate function (which is contrary to the SQL standard) and returns the first row in the group (I don't know based on what criteria), while PostgreSQL requires every selected column that is not listed in the GROUP BY clause to be wrapped in an aggregate function (MAX, SUM, etc.).
Correct; the way to fix this is to use :select, select each field that you wish to decorate the resulting object with, and group by them.
Nasty - but that is how GROUP BY is supposed to work, as opposed to MySQL, which guesses what you mean if you don't put fields in your GROUP BY.
If I remember correctly, in PostgreSQL you have to add every column you fetch from the table where the GROUP BY clause applies to the GROUP BY clause.
Not the prettiest solution, but changing the group parameter to output every column in the model works in PostgreSQL:
expiration = Available.find(:all,
  :joins      => [ :room ],
  :conditions => [ "rooms.hotel_id = ? AND availables.bookdate BETWEEN ? AND ?",
                   hostel_id, date.to_s, (date + days - 1).to_s ],
  :group      => Available.column_names.collect { |col| "availables.#{col}" },
  :order      => 'availables.updated_at')
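The SQL this generates is roughly the following (a sketch; the column list is taken from the expected output above):
SELECT availables.*
FROM   availables
INNER  JOIN rooms ON rooms.id = availables.room_id
WHERE  rooms.hotel_id = 5056
AND    availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24'
GROUP  BY availables.id, availables.price, availables.spots, availables.bookdate,
       availables.room_id, availables.created_at, availables.updated_at
ORDER  BY availables.updated_at;
Since id looks like the primary key, every row ends up in its own group, so this mostly satisfies PostgreSQL's GROUP BY rule rather than collapsing rows per bookdate.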
According to MySQL's "Debunking GROUP BY Myths" (http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html), SQL (the 2003 version of the standard) does not require columns referenced in the SELECT list of a query to also appear in the GROUP BY clause, provided they are functionally dependent on the GROUP BY columns.
For others looking for a way to order by any field, including a joined field, in PostgreSQL, use a subquery:
SELECT * FROM (
  SELECT DISTINCT ON (availables.bookdate) availables.*
  FROM   availables
  INNER  JOIN rooms ON rooms.id = availables.room_id
  WHERE  rooms.hotel_id = 5056
  AND    availables.bookdate BETWEEN '2009-11-22' AND '2009-11-24'
) AS distinct_selected
ORDER BY distinct_selected.updated_at
or Arel:
subquery = SomeRecord.select("distinct on(xx.id) xx.*, jointable.order_field")
                     .where("...").joins("...")
result = SomeRecord.select("*")
                   .from("(#{subquery.to_sql}) AS distinct_selected")
                   .order("distinct_selected.order_field ASC")
I think that .uniq [1] will solve your problem.
[1] Available.select('...').uniq
Take a look at http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields

Why does SELECT with a WHERE clause return 0 rows on a Cassandra table? (it should return 2 rows)

I created a minimal example of a users TABLE on a Cassandra 2.0.9 database. I can use SELECT to select all its rows, but I do not understand why adding my WHERE clause (on an indexed column) returns 0 rows.
(I also do not get why the CONTAINS statement causes an error here, as presented below, but let's assume this is not my primary concern.)
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  KEY varchar PRIMARY KEY,
  password varchar,
  gender varchar,
  session_token varchar,
  state varchar,
  birth_year bigint
);
INSERT INTO users (KEY, gender, password) VALUES ('jessie', 'f', 'avlrenfls');
INSERT INTO users (KEY, gender, password) VALUES ('kate', 'f', '897q7rggg');
INSERT INTO users (KEY, gender, password) VALUES ('mike', 'm', 'mike123');
CREATE INDEX ON users (gender);
DESCRIBE TABLE users;
Output:
CREATE TABLE users (
key text,
birth_year bigint,
gender text,
password text,
session_token text,
state text,
PRIMARY KEY ((key))
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX users_gender_idx ON users (gender);
This SELECT works OK
SELECT * FROM users;
key | birth_year | gender | password | session_token | state
--------+------------+--------+-----------+---------------+-------
kate | null | f | 897q7rggg | null | null
jessie | null | f | avlrenfls | null | null
mike | null | m | mike123 | null | null
And this does not:
SELECT * FROM users WHERE gender = 'f';
(0 rows)
This also fails:
SELECT * FROM users WHERE gender CONTAINS 'f';
Bad Request: line 1:33 no viable alternative at input 'CONTAINS'
It sounds like your index may have become corrupt. Try rebuilding it. Run this from a command prompt:
nodetool rebuild_index yourKeyspaceName users users_gender_idx
However, the larger issue here is that secondary indexes are known to perform poorly. Some have even identified their use as an anti-pattern. DataStax has a document designed to guide you in appropriate use of secondary indexes. And this is definitely not one of them.
creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.
While gender may not be a boolean column, it has the same cardinality. A secondary index on this column is a terrible idea.
If querying by gender is something you really need to do, then you may need to find a different way to model or partition your data. For instance, PRIMARY KEY (state, gender, key) will allow you to query gender by state.
SELECT * FROM users WHERE state='WI' and gender='f';
That would return all female users from the state of Wisconsin. Of course, that would mean you would also have to query each state individually. But the bottom line is that Cassandra does not handle queries on low-cardinality keys/indexes well, so you have to be creative in how you solve these types of problems.
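A sketch of that alternative model in CQL (the table name users_by_state is an assumption, and the column list is trimmed to the essentials):
CREATE TABLE users_by_state (
  state varchar,
  gender varchar,
  key varchar,
  password varchar,
  PRIMARY KEY (state, gender, key)
);
-- all female users from Wisconsin, served from the 'WI' partition:
SELECT * FROM users_by_state WHERE state = 'WI' AND gender = 'f';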

postgresql return the first positive value from multiple columns queried

I have a table in a PostgreSQL database with multiple columns, out of which only one will have a value entered.
SELECT "Garden_GUID", "Municipality_Amajuba", "Municipality_Ilembe", "Municipality_Sisonke" from forms_garden
WHERE "Garden_GUID" = 'testguid';
Garden_GUID | Municipality_Amajuba | Municipality_Ilembe | Municipality_Sisonke
-------------+----------------------+---------------------+----------------------
testguid | Dannhauser | |
(1 row)
I wish to create a view in which the entries from those columns are collated into a single column.
I have tried:
CREATE VIEW municipality (GUID,funder,municipality)
AS SELECT "Garden_GUID"GUID,"Funder"funder,"Municipality_Amajuba","Municipality_Ilembe","Municipality_Sisonke"municipality
FROM forms_garden;
but it returns an error:
ERROR: column "municipality" specified more than once
Is there any way to query the various municipality_* columns row by row and only return the first positive entry?
Many thanks in advance.
I think coalesce() is what you're looking for:
with forms_garden as (
select 'guid1' guid, 'Dannhauser' amajuba, null ilembe, null sisonke
union all select 'guid2', null, 'muni2', null
union all select 'guid3', null, null, 'muni3'
) select guid, coalesce(amajuba,ilembe,sisonke) municipality from forms_garden;
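Applied to the view from the question, that would look something like this (a sketch; it keeps the quoted, case-sensitive column names from the question):
CREATE VIEW municipality (guid, funder, municipality) AS
SELECT "Garden_GUID",
       "Funder",
       coalesce("Municipality_Amajuba", "Municipality_Ilembe", "Municipality_Sisonke")
FROM   forms_garden;
If the "empty" municipality columns can hold empty strings rather than NULL, wrap each of them in NULLIF(column, '') inside the coalesce() so the first non-empty value is returned.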