I'm saving dynamic objects (objects of which I do not know the type upfront) using the following 2 tables in Postgres:
CREATE TABLE IF NOT EXISTS objects(
id UUID NOT NULL DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY(id)
);
CREATE TABLE IF NOT EXISTS object_values(
id UUID NOT NULL DEFAULT gen_random_uuid(),
event_id UUID NOT NULL,
param TEXT NOT NULL,
value TEXT NOT NULL,
);
So for instance, if I have the following objects:
dog = [
{ breed: "poodle", age: 15, ...},
{ breed: "husky", age: 9, ...},
}
monitors = [
{ manufacturer: "dell", ...},
}
It will live in the DB as follows:
-- objects
| id | user_id | name |
|----|---------|---------|
| 1 | 1 | dog |
| 2 | 2 | dog |
| 3 | 1 | monitor |
-- object_values
| id | event_id | param | value |
|----|----------|--------------|--------|
| 1 | 1 | breed | poodle |
| 2 | 1 | age | 15 |
| 3 | 2 | breed | husky |
| 4 | 2 | age | 9 |
| 5 | 3 | manufacturer | dell |
Note, these tables are big (hundreds of millions). Generally optimised for writing.
What would be a good way of querying/filtering objects based on multiple object params? For instance: Select the number of all husky dogs above the age of 10 per unique user.
I also wonder whether it would have been better to denormalise the tables and collapse the params onto a JSON column (and use gin indexes).
Are there any standards I can use here?
"Select the number of all husky dogs above the age of 10 per unique user" - The following query would do it.
SELECT user_id, COUNT(DISTINCT event_id) AS num_husky_dogs_older_than_10
FROM objects o
INNER JOIN object_values ov
ON o.id_ = ov.event_id
AND o.name_ = 'dog'
GROUP BY o.user_id
HAVING MAX(CASE WHEN ov.param = 'age'
AND ov.value_::integer >= 10 THEN 1 END) = 1
AND MAX(CASE WHEN ov.param = 'breed'
AND ov.value_ = 'husky' THEN 1 END) = 1;
Since your queries are most likely affected by having always the same JOIN operation between these two tables on the same fields, would be good to have a indices on:
the fields you join on ("objects.id", "object_values.event_id")
the fields you filter on ("objects.name", "object_values.param", "object_values.value_")
Check the demo here.
Related
I have some data which looks something like this after I run some filters on it:
Business Type | Business City | Min Rating | Max Rating
--------------+---------------+------------+-----------
Restaurant | Barcelona | 2 | 8
Restaurant | Barcelona | 1 | 10
Restaurant | Madrid | 4 | 8
Diner | Madrid | 5 | 8
Diner | Madrid | 1 | 8
Diner | Barcelona | 3 | 8
I need to return only one row for each business type and city combination. The row I return needs to be the one where the min and max ratings are the closest to specific numbers. For example, I want to return the one closest to a minimum rating of 3 and a max rating of 7. This would result in:
Business Type | Business City | Min Rating | Max Rating
--------------+---------------+------------+-----------
Restaurant | Barcelona | 2 | 8
Restaurant | Madrid | 4 | 8
Diner | Madrid | 5 | 8
Diner | Barcelona | 3 | 8
This is in a Rails 5 application using ActiveRecord. I'm open to using Arel, ActiveRecord DSL, or PostgreSQL.
You can define a custom aggregate function for that:
CREATE FUNCTION closer_to(integer, integer, integer) RETURNS integer
LANGUAGE sql AS
'SELECT CASE WHEN abs($1 - $3) < abs($2 - $3) THEN $1 ELSE $2 END';
CREATE AGGREGATE closest_to(integer, integer) (
STYPE = integer,
SFUNC = closer_to
);
Then you can write your query as:
SELECT "Business Type", "Business City",
closest_to("Min Rating", 3) AS "Min Rating",
closest_to("Max Rating", 7) AS "Max Rating",
FROM mytable
GROUP BY "Business Type", "Business City";
Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my datatables in pictorial form:
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination and if the products not sold in that store it won't show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including with product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table
ALTER TABLE users ADD todo map;
UPDATE users SET todo = { '1':'1111', '2':'2222', '3':'3' ,.... } WHERE user_id = 'frodo';
now ,i want to run the follow cql ,but failed ,is here any other method ?
SELECT user_id, todo['1'] FROM users WHERE user_id = 'frodo';
ps:
the length my map can change. for example : { '1':'1111', '2':'2222', '3':'3' } or { '1':'1111', '2':'2222', '3':'3', '4':'4444'} or { '1':'1111', '2':'2222', '3':'3', '4':'4444' ...}
If you want to use a map collection, you'll have the limitation that you can only select the collection as a whole (docs).
I think you could use the suggestion from the referenced question, even if the length of your map changes. If you store those key/value pairs for each user_id in separate fields, and make your primary key based on user_id and todo_k, you'll have access to them in the select query.
For example:
CREATE TABLE users (
user_id text,
todo_k text,
todo_v text,
PRIMARY KEY (user_id, todo_k)
);
-----------------------------
| user_id | todo_k | todo_v |
-----------------------------
| frodo | 1 | 1111 |
| frodo | 2 | 2222 |
| sam | 1 | 11 |
| sam | 2 | 22 |
| sam | 3 | 33 |
-----------------------------
Then you can do queries like:
select user_id,todo_k,todo_v from users where user_id = 'frodo';
select user_id,todo_k,todo_v from users where user_id = 'sam' and todo_k = 2;
I have a table categories:
ID | NAME | PARENT ID | POSITION | LEVEL | ORDER
----------------------------------------------------------------------------
1 | root | -1 | 0x | 0 | 255
2 | cars | 1 | 0x58 | 1 | 10
5 | trucks | 1 | 0x68 | 1 | 10
13 | city cars | 2 | 0x5AC0 | 2 | 255
14 | offroad cars | 2 | 0x5B40 | 2 | 255
where:
ID int ident
NAME nvarchar(255)
PARENT ID int
POSITION hierarchyid
LEVEL hierarchyid GetLevel()
ORDER tinyint
This table model specifies model name and category where it belongs. Example:
ID | NAME | CATEGORY
-----------------------------
1 | Civic | 13
2 | Pajero | 14
3 | 815 | 5
4 | Avensis | 13
where:
ID int ident
NAME nvarchar(255)
CATEGORY int link to ID category table
What I am trying to do is to be able to show:
all models - would show all models from root recursively,
models within category cars (cars included)
models from city cars (or its children if any)
How do I use hierarchyid for such filtering and how to join the table for results with models? Is that a quick way how to show all model results from certain level?
I believe this would have given you what you were looking for:
declare #id hierarchyid
select #id = Position from Categories where Name = 'root' -- or 'cars', 'city cars', etc.
select m.*
from Models m
join Categories c on m.Category = c.ID
where c.Position.IsDescendantOf(#id) = 1
For more information on the IsDescendantOf method and other hierarchyid methods, check the method reference.
You going to want to use a CTE: Common Table Expression
https://web.archive.org/web/20210927200924/http://www.4guysfromrolla.com/webtech/071906-1.shtml
Introduced in SQL 2005 the allow for an easy way to do hierarchic or recursive relationships.
This is pretty close to your example:
http://www.sqlservercurry.com/2009/06/simple-family-tree-query-using.html
Creating a SQL query that performs math with variables from multiple tables
This question I asked previously will help a bit as far as layout, for the sake of saving time I'll include the important bits and add in more detail for scenarios pertaining to this:
A mockup of what the tables look like:
Inventory
ID | lowrange | highrange | ItemType
----------------------------------------
1 | 15 | 20 | 1
2 | 21 | 30 | 1
3 | null | null | 1
4 | 100 | 105 | 2
MissingOrVoid
ID | Item | Missing | Void
---------------------------------
1 | 17 | 1 | 0
1 | 19 | 1 | 0
4 | 102 | 0 | 1
4 | 103 | 1 | 0
4 | 104 | 1 | 0
TableWithDataEnteredForItemType1
InventoryID| ItemID | Detail1 | Detail2 | Detail3
-------------------------------------------------
1 | 16 | Some | Info | Here
1 | 18 | More | Info | Here
1 | 20 | Data | Is | Here
2 | 21 | .... | .... | ....
2 | 24 | .... | .... | ....
2 | 28 | .... | .... | ....
2 | 29 | .... | .... | ....
2 | 30 | .... | .... | ....
TableWithDataEnteredForItemType2
InventoryID| ItemID | Col1 | Col2 | Col3
----------------------------------------
4 | 101 | .... | .... | ....
I attempted this. I know it is not functional but it illustrates what I'm trying to do and I personally haven't seen anything written up like this before:
SELECT CASE WHEN (I.ItemType = 1) THEN SELECT TONE.ItemID FROM
TableWithDataEnteredForItemType1 TONE WHEN (I.ItemType = 2)
THEN SELECT TTWO.ItemID FROM TableWithDataEnteredForItemType2
TTWO END AS ItemMissing Inventory I JOIN CASE WHEN (I.ItemType = 1) THEN
TableWithDataEnteredForItem1 T WHEN (I.ItemType = 2) THEN
TableWithDataEnteredForItem2 T END ON
I.ID = T.InventoryID WHERE ItemMissing NOT BETWEEN IN (SELECT
I.lowrange FROM Inventory WHERE I.lowrange IS NOT NULL) AND IN
(SELECT I.highrange FROM Inventory WHERE I.highrange IS NOT NULL)
AND ItemMissing NOT IN (SELECT Item from MissingOrVoid)
The result should be:
ItemMissing
----
15
22
23
25
26
27
105
I know I'm probably not even going in the right direction with my query, but I was hoping I could get some direction as to fixing it to get the results that are needed.
Thanks
Edit:
Specific requirements (thought I included this but appears I didn't) - return list of all items not accounted for in the system. There are two ways of something being accounted for: 1) an entry in the corresponding ItemType table 2) located in MissingOrVoid (known items to be missing or removed).
DDL (I had to look this up as I wasn't sure what was meant here. Being that I have very little experience creating tables by scripting, this will probably be psuedo-DDL):
Inventory
ID - int, identifier/primary key, not nullable
lowrange - int, nullable
highrange - int, nullable
itemtype - int, not nullable
MissingOrVoid
ID - int, foreign key for Inventory.ID, not nullable
Item - int, identifier/primary key, not nullable
missing - bit, not nullable
void - bit, not nullable
Tables for Item types:
IntenvoryID - int, foreign key for Inventory.ID, not nullable
ItemID - int, primary key, not nullable
Everything else - not needed for querying, just data about the item (included
to show that the tables aren't the same content)
Edit 2: Here's an incredibly inefficient C# and Linq way of doing it but maybe of some help:
List<int> Items = new List<int>();
List<int> MoV = (from c in db.MissingOrVoid Select c.Item).ToList();
foreach (Table...ItemType1 row in db.Table...ItemType1)
Items.Add(row.ItemID);
foreach (Table...ItemType2 row in db.Table...ItemType2)
Items.Add(row.ItemID);
List<Range> InventoryRanges = new List<Range>();
foreach (Inventory row in db.Inventories)
{
if (row.lowrange != null && row.highrange != null)
InventoryRanges.Add(new Range(row.lowrange, row.highrange));
}
foreach (int item in Items)
{
foreach (Range range in InventoryRanges)
{
if (range.lowrange <= item && range.highrange >= item)
Items.Remove(item);
}
if (MoV.Contains(item))
Items.Remove(item);
}
return Items;
There's a ready made number table called master..spt_values, which can be quite helpful in this case. Note though, that you can use this table if the distance between lowrange and highrange cannot exceed 2047, otherwise create, populate and use your own number table instead.
Here's the method:
SELECT
ItemMissing = i.Item
FROM (
SELECT
i.ID,
Item = i.lowrange + v.number,
i.ItemType
FROM Inventory i
INNER JOIN master..spt_values v
ON v.type = 'P' AND v.number BETWEEN 0 AND i.highrange - i.lowrange
) inv
LEFT JOIN MissingOrViod m
ON inv.ID = m.ID AND inv.Item = m.Item
LEFT JOIN TableWithDataEnteredForItem1 t1 ON inv.ItemType = 1
AND inv.ID = t1.InventoryID AND inv.Item = t1.ItemID
LEFT JOIN TableWithDataEnteredForItem2 t2 ON inv.ItemType = 2
AND inv.ID = t2.InventoryID AND inv.Item = t2.ItemID
WHERE m.ID IS NULL AND t1.InventoryID IS NULL AND t2.InventoryID IS NULL
The subselect expands the Inventory table into a complete item list with item IDs as defined by lowrange and highrange (this is where the number table comes in handy). The obtained list is then compared against the other three tables to find and exclude those items that are present in them. The remaining items, then, constitute the list of 'items missing'.