Postgresql different where based on column value? - postgresql

Is it possible to dynamically change the where clause as a query executes based on the value of one of the columns? That is, lets say (and this is a completely made up example) I have a table of students, classes attended, and if they were tardy. I want to see for each student a list of all classes they have attended since the last time they were tardy. So the query would look something like this:
SELECT student, class, classdate
FROM attendance
WHERE classdate>(<<SOME QUERY that selects the most recent tardy for each student>>)
ORDER BY student,classdate;
or, to put it into more programing terminology, to perhaps make it clearer:
for studentName in (SELECT distinct(student) FROM attendance):
SELECT student, class, classdate
FROM attendance
WHERE classdate>(SELECT classdate
FROM attendance
WHERE tardy=true AND student=studentName
ORDER BY classdate DESC
LIMIT 1)
ORDER BY classdate;
Is there any way to do this with a single query, or do I need to do a separate query for each student (essentially as per the loop above)? The actual use case is more complicated to explain (it has to do with certifying records, and which ones need to be looked at) but conceptually it is the same.

Just use multiple aliases (e.g. a1 and a2) for the attendance table, so you can refer to the "outer" table alias in the subquery:
SELECT student, class, classdate
FROM attendance a1
WHERE classdate>(SELECT classdate
FROM attendance a2
WHERE tardy=true AND a2.student=a1.student
ORDER BY classdate DESC
LIMIT 1)
ORDER BY classdate;

Related

How to design MongoDB collection relation correctly

I am trying to create a simple DB of Customers/Products/Orders to practice my skills with aggregate and so on. I need some help on how to design the relation between those collections correctly (that i'll really be able to store the details in the right way).
I go for the minimal build and information-->
Customer:
(1) customer_id ,
(2)orders (this will be an array with all the order_id's this customer did)
Product:
(1) product_id
Order:
(1) order_id ,
(2) products (this will be an array with all the products in this order) + I wonder how I add quantity for each product ,
(3) user_id (this will store the user_id did this order) or i don't even need this field??
I hope someone could tell me if this is a right thinking, because I come from SQL and this what I would do there I guess
// I'm not using embedded document because in case i'll want to change a specific field value it would change in all places

How to best structure csv data for tableau that has "multiple categories"?

I have a set of 100 “student records”, I want to have checkboxes for each "favorite_food_type" and "favorite_food", whichever is checked would filter a "bar graph" that counts number of reports that contain that specific "favorite_food"type" and "favorite_food" schema could be:
name
favorite_food_type (e.g. vegetable)
favorite_food (e.g. banana)
I would like to in the dashboard be able to select via checkboxes, “Give me all the COUNT OF DISTINCT students with favorite_food of banana, apple, pear“ and filter graphs for all records. My issue is for a single student record, maybe one student likes both banana and apple. How do I best capture that? Should I have:
CASE A: Duplicate Records (this captures the two different “favorite_food”, but now I have to figure out how many students there are (which is one student)
NAME, FAVORITE_FOOD_TYPE,FRUIT
Charlie, Fruit, Apple
Charlie, Fruit, Pear
CASE B: Single Records (this captures the two different “favorite_food”, but is there a way to pick out from delimiters?)
NAME, FAVORITE_FOOD_TYPE,FRUITS
Charlie, Fruit, Apple#Pear
CASE C: Column for Each Fruit (this captures one record per student, but need a loooot of columns for each fruit, many would be false)
NAME, FAVORITE_FOOD_TYPE, APPLE, BANANA, PINEAPPLE, PEAR
Charlie, Fruit, TRUE, FALSE, TRUE, FALSE
I want to do this as easy as possible.
Avoid Case B if at all possible. Repeating information is almost always best handled by repeating rows -- not by cramming multiple values into a single table cell, nor by creating multiple columns such as Favorite_1 and Favorite_2
If you are provided data with multiple values in a field, Tableau does have functions and data connection features that can be used to split a single field into its constituent parts to form multiple fields. That works well with fixed number of different kinds of information -- say splitting a City, State field into separate fields for City and State.
Avoid Case C if at all possible. That cross tab structure makes it hard to analyze the data and make useful visualizations. Each value is treated as a separated field.
If you are provided data in crosstab format, Tableau allows you to pivot the data in the data connection pane to reshape into a form with fewer columns and many rows.
Case A is usually the best approach. You can simplify it further by factoring out repeating information into separated tables -- a process known as normalization. Then you can use a join to recombine the tables and see the repeating information when desired.
A normalized approach to your example would have two tables (or tabs in excel). The first table would have exactly one row per student with 2 columns: name and favorite_food_type. The second table would have a row per student/favorite food combination, with 2 columns: name and favorite_food. Now each student can have as many favorite foods as you like or none at all. Since both columns have a name field, that would be the key used to join (combine) the tables when needed.
Given that table design, you could have 2 data sources in Tableau. The first one just pointed to the student table and could be used to create visualizations that only involved students and favorite_food_types. The second data source would use a (left) join to read from both tables and could be used to look at favorite foods. When working with the second data source, you would have to be careful about reporting information about student names and favorite food types to account for the duplicate information. So use the first data source when possible. Finally, you could put both kinds of visualizations on a dashboard and use filter and highlight actions to make interaction seamless despite the two sources -- getting the best of both worlds.

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider this contrived example schema:
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit It seems what I'm talking about it an activity stream. The suggestion in this answer to filter by time first is sounding pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for marching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
Update this is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
SELECT DISTINCT ON (activity.created, activity.id)
*
FROM activity
LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
ON true
LEFT OUTER JOIN subscription
ON subscription.object_id = act_ref.act_ref
WHERE activity.created BETWEEN :lower_date AND :upper_date
AND subscription.user_id = :user_id
ORDER BY activity.created DESC,
activity.id,
act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment #> operator, and no subquery:
SELECT *
FROM activity
JOIN subscription ON activity.object_ref #> subscription.object_id
WHERE subscription.user_id = :user_id
AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

TSQL - Deleting with Inner Joins and multiple conditions

My question is a variation on one already asked and answered (TSQL Delete Using Inner Joins) but I have a different level of complexity and I couldn't see a solution to it.
My requirement is to delete Special Prices which haven't been accessed in 90 days. Special Prices are keyed on Customer ID and Product ID and the products have to matched to a Customer Order Detail table which also contains a Customer ID and a Product ID. I want to write one function that will look at the Special Price table for each Customer, compare each Product for that Customer with the Customer Order Detail table and if the Maximum Order Date is more than 90 days earlier than today, delete it from the Special Price table.
I know I can use a CURSOR (slow but effective) but would prefer to have a single query like the one in the TSQL Delete Using Inner Joins example. Any ideas and/or is more information required?
I cannot dig more on the situation of your system but i think and if it is ok for you, check MERGE STATEMENT, it might be a help instead of using cursors. check this Link MERGE STATEMENT

products and configurable_products in postgresql

I have a Product table and a ConfigurableProduct table.
If there are several variations of the same product like a shirt in different colors I create a ConfigurableProduct.
When a user is looking at the catalog he should see a list of products unless there is a ConfigurableProduct, then he should see it with a select box for each variations.
How do I structure the tables for Product and ConfigurableProduct and how do I query the db so I can page through the results?
Thanks
I am going to answer this as if you do not have tables created. I am not sure if that is true though.
The following is a simple example, but I assume you have more data.
products
id
name
configurable_products
id
variation
product_id REFERENCES products(id)
You can just make the configurable products a reference to products.
If you want a listing of products with their configurations then you can do:
select p.name, c.variation
from products p left outer join configurable_products c
on (p.id = c.product_id);
Of course you can just search for all the configurable_products based on the product id too when needed.
As for the paging part of your question you will have to clarify what you mean. You can use limit to limit results if you don't want to get everything at once.