How does a self join table decide what to display when conditions are "identical" - postgresql

I have a simple chat table. The chat table has a user_id column and a recipient_id column and a boolean agrees_to_chat column.
What I'd like to do is display the users that user 1 wants to chat with and who also want to chat with user 1.
(Note that there will be cases where 1 agrees to chat with 2, but 2 has not gone online to signal a preference yet. Obviously in those cases I don't want a chat to show up.)
Here's what I've come up with so far.
SELECT c1.user_id, c1.recipient_id, c2.user_id, c2.recipient_id FROM chats c1, chats c2
WHERE c1.recipient_id = c2.user_id
AND c1.user_id = c2.recipient_id
AND c2.user_id=1
AND c2.agrees_to_chat=true
AND c1.agrees_to_chat=true
For some reason setting c2.user_id = 1 results in what I want: records where user_id = 1, along with people who have agreed to chat listed in the recipient_id column.
However if I set it to c1.user_id=1 I get the same results flipped over. Namely, now my results are still people who have agreed to chat, but now the recipient_id = 1 for all results, and the user_id is the different users.
This matters to me because I want to serve data that shows everyone who has agreed to chat with user 1. If I reference recipient_id in my code, I need to know that it won't change. For example, on my computer I noticed that c2.user_id = 1 results in what I want, but in this SQL Fiddle it seems that c1.user_id = 1 gets what I need: http://sqlfiddle.com/#!15/799a9/2
So what's going on here? Is there something I'm not understanding about my query? Alternatively is there a better query for what I'm trying to achieve?

You don't need all 4 columns, since you already know 1st and 4th (and 2nd and 3rd) will be equal. Use SELECT c2.user_id, c2.recipient_id FROM ... or SELECT c1.user_id, c1.recipient_id FROM .... In case you actually need several copies of the same column from the self-joined tables, you can give names to them: SELECT c1.user_id AS user_id1, c1.recipient_id AS recipient_id1, c2.user_id AS user_id2, c2.recipient_id AS recipient_id2 FROM ...
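To see why the two filters give mirrored output, here is a minimal sketch that runs the question's join against a few made-up rows. It uses Python's built-in sqlite3 module rather than PostgreSQL, and the sample data and the mutual_chats helper are assumptions for illustration only. The join condition is symmetric, so each mutual pair matches once from each side; the filter picks which side user 1 sits on, and since the first two selected columns always come from c1, they flip when the filter moves to c2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chats (user_id INTEGER, recipient_id INTEGER, agrees_to_chat BOOLEAN)"
)
conn.executemany(
    "INSERT INTO chats VALUES (?, ?, ?)",
    [
        (1, 2, True),   # 1 and 2 agree with each other
        (2, 1, True),
        (1, 3, True),   # 3 has not signalled a preference yet
        (1, 4, True),   # 4 declined
        (4, 1, False),
    ],
)


def mutual_chats(filter_alias):
    """Run the self join, filtering user 1 on either the c1 or the c2 side."""
    sql = f"""
        SELECT c1.user_id, c1.recipient_id, c2.user_id, c2.recipient_id
        FROM chats c1
        JOIN chats c2
          ON c1.recipient_id = c2.user_id
         AND c1.user_id = c2.recipient_id
        WHERE {filter_alias}.user_id = 1
          AND c1.agrees_to_chat = 1
          AND c2.agrees_to_chat = 1
    """
    return conn.execute(sql).fetchall()


rows_c1 = mutual_chats("c1")  # user 1 appears in the c1 columns
rows_c2 = mutual_chats("c2")  # same pair, but user 1 appears in the c2 columns
print(rows_c1, rows_c2)
```

Selecting the columns from the same alias that you filter on keeps user 1 in user_id regardless of which alias that is.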

Related

Postgres SELECT id LAG(id) OVER (ORDER BY id) FROM invoice WHERE id = 2

I've looked all over the internet and I fail to get this query running as expected.
I've got a table of invoices and some invoices are related to one another because they belong to the same project.
My ticket says I've got to get the PREVIOUS invoice based on a provided invoice.
Say Project A has 10 invoices, and I'm looking at invoice #4, I've got to write a query which will return the ID of the previous Invoice. Bear in mind, the invoice table is home to all sorts of projects, and each project could have many invoices on their own, so I want to avoid getting many IDs back and then iterating over them.
To illustrate the issue, I've written this fiddle.
It works somewhat acceptably when I don't filter for steps.id, but that means returning hundreds of IDs to sift through.
I've tried and tried but I can't seem to get the column previousStep to be kind of bound to the ID column.
Simply find the invoice with the largest id below the given one within the same project:
SELECT inv2.id
FROM invoice AS inv1
JOIN invoice AS inv2
ON inv1.project = inv2.project AND inv1.id > inv2.id
WHERE inv1.id = 1057638
ORDER BY inv2.id DESC
LIMIT 1;
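A window function such as LAG only sees the rows that survive the WHERE clause, so filtering down to a single id leaves it nothing to look back at; the self join avoids that. Below is a small sketch of the same query run against made-up data, using Python's sqlite3 module in place of PostgreSQL. The sample rows and the previous_invoice_id helper are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, project TEXT)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "A"), (5, "B"), (7, "A"), (8, "A")],
)


def previous_invoice_id(invoice_id):
    """Return the id of the preceding invoice in the same project, or None."""
    row = conn.execute(
        """
        SELECT inv2.id
        FROM invoice AS inv1
        JOIN invoice AS inv2
          ON inv1.project = inv2.project AND inv1.id > inv2.id
        WHERE inv1.id = ?
        ORDER BY inv2.id DESC
        LIMIT 1
        """,
        (invoice_id,),
    ).fetchone()
    return row[0] if row else None


print(previous_invoice_id(7))  # previous invoice of project A
print(previous_invoice_id(1))  # first invoice of its project, so no predecessor
```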

Can I obtain a postgresql lock that prevents concurrent inserts for a portion of a table?

I have a join table attendees of rooms and users with an extra column that represents a randomly assigned string (from within a pool of strings).
I want to prevent two users entering at the same time from accidentally getting assigned the same string -- so I'd want User B to wait to look up the previous users in the room (and their assigned strings) until User A has completed inserting -- but I don't want to do a table lock since a row insertion for say room_id = 12 doesn't affect an insertion for room_id = 77.
I can't use a unique constraint to solve this, since duplicate strings are possible in the case of a large number of users in a single room (the strings get reassigned evenly once all of them have been used once).*
My guess is doing something like
SELECT room_id, user_id, random_string FROM attendees WHERE room_id = ? FOR UPDATE
isn't going to help because it's not going to prevent User B from doing an insert -- and even if that SELECT FOR UPDATE prevented user B from doing the same call to read the rows corresponding to that room, what happens if both User A and User B are the first ones to join (and there aren't any rows for that room_id to lock)?
Would I use an advisory lock that's keyed on the room_id? Would that still help if I had multiple concurrent writes (e.g. User A should finish first, then User B, then User C)?
*Here's the example of why the strings aren't necessarily unique: say the pool of strings is "red", "blue", "green" -- the first three users that enter are each assigned one of them randomly, then the pool resets. The next three users are also assigned them randomly, so for the six users in the room, exactly two would have "red", two "blue", and two "green".
After a frustrating few days on SO I ended up getting a helpful response on Reddit that solves the problem:
Use a SELECT FOR UPDATE query on the rooms table, locking the row that corresponds to the room until the first user’s transaction completes.
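As a rough sketch of how that could look in application code: the function below expects a DB-API cursor that uses %s placeholders (as psycopg2 does) and must be called inside a single transaction. The rooms.id column name, the join_room and pick_string helpers, and the "least used string" selection rule are all assumptions made for illustration; only the attendees columns and the red/blue/green pool come from the question.

```python
import random
from collections import Counter

POOL = ["red", "blue", "green"]  # example pool from the question


def pick_string(already_assigned, pool=POOL):
    """Pick randomly among the pool entries that have been used least so far."""
    counts = Counter(already_assigned)
    fewest = min(counts[s] for s in pool)
    return random.choice([s for s in pool if counts[s] == fewest])


def join_room(cur, room_id, user_id):
    """Assign a string to a joining user; call inside one open transaction."""
    # Lock the parent row. A second user joining the same room blocks here
    # until this transaction commits or rolls back; other rooms are unaffected.
    # Assumes the rooms table has an "id" primary key (hypothetical name).
    cur.execute("SELECT id FROM rooms WHERE id = %s FOR UPDATE", (room_id,))

    # Safe to read now: nobody else can be mid-insert for this room.
    cur.execute(
        "SELECT random_string FROM attendees WHERE room_id = %s", (room_id,)
    )
    assigned = [row[0] for row in cur.fetchall()]

    chosen = pick_string(assigned)
    cur.execute(
        "INSERT INTO attendees (room_id, user_id, random_string) "
        "VALUES (%s, %s, %s)",
        (room_id, user_id, chosen),
    )
    return chosen
```

Because the lock is taken on the rooms row rather than on attendees rows, it also covers the case where the room has no attendees yet.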

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider this contrived example schema:
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit: It seems what I'm talking about is an activity stream. The suggestion in this answer to filter by time first is sounding pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for matching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
Update: this is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
SELECT DISTINCT ON (activity.created, activity.id)
*
FROM activity
LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
ON true
LEFT OUTER JOIN subscription
ON subscription.object_id = act_ref.act_ref
WHERE activity.created BETWEEN :lower_date AND :upper_date
AND subscription.user_id = :user_id
ORDER BY activity.created DESC,
activity.id,
act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment @> operator, and no subquery:
SELECT *
FROM activity
JOIN subscription ON activity.object_ref @> ARRAY[subscription.object_id]
WHERE subscription.user_id = :user_id
AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
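For anyone puzzling over what the DISTINCT ON query with ordinality is doing, here is the same selection rule written out in plain Python. This is only a sketch of the logic, not the code the application runs. It assumes object_ref is ordered from the outermost parent to the target itself, so the highest ordinality is the deepest level; the visible_activities name and the dict shapes are made up for the example.

```python
def visible_activities(activities, subscriptions):
    """Return ids of activities one user should see, newest first.

    activities: list of dicts with 'id', 'created' and 'object_ref'
        (list of object ids, outermost parent first: an assumption).
    subscriptions: dict mapping object id -> subscribed flag for that user.
    """
    visible = []
    for act in sorted(activities, key=lambda a: a["created"], reverse=True):
        # Walk from the deepest reference outwards; the first subscription
        # row found decides, which mirrors ORDER BY ordinality DESC
        # combined with DISTINCT ON.
        for ref in reversed(act["object_ref"]):
            if ref in subscriptions:
                if subscriptions[ref]:  # the outer WHERE sub.subscribed = true
                    visible.append(act["id"])
                break
    return visible


activities = [
    {"id": "a1", "created": 1, "object_ref": ["store", "cat", "prod1"]},
    {"id": "a2", "created": 2, "object_ref": ["store", "cat", "prod2"]},
    {"id": "a3", "created": 3, "object_ref": ["other"]},
]
# Subscribed to the whole category, but explicitly unsubscribed from prod2.
subscriptions = {"cat": True, "prod2": False}
result = visible_activities(activities, subscriptions)
print(result)
```

Here a2 is hidden because the unsubscribe on prod2 sits deeper than the category subscription, and a3 is hidden because nothing in its reference list is subscribed to at all.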
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

How to design database schema for meteor/mongodb for this situation?

I don't know how the collection for a meteor app should be.
In MySQL I would have 3 tables:
table_1: id, column_a, column_b
table_2: id, table_1_id, column_c, column_d
table_3: id, column_e, column_f
table_2 and table_3 could have the same identical rows. Some information can be in both tables, in table_2 and not in table_3, in table_3 and not in table_2.
I know that in meteor/mongodb when you design database schema, you need to know how you will access/display the information. I want to display something like this:
table_1.column_a
show all rows from table2 where table_2.table_1_id=table_1.id; and I
also want to check if table_2.column_c=table_3.column_e, if it's true
then I want to display that row from table_3.
I hope you understand, also if you have some suggestions about subscriptions/publications would be much appreciated.
P.S. I am sorry for the title of this topic, but I couldn't find a more specific title.
UPDATE:
Explaining it above I better understand the problem.
What I want is like a list of products(list A), and every product has a list of specifications. And I would like to have another list(list B), where I have a list of specifications with more details.
And I want to display the product details, including its list of specifications, and when it displays the specifications of the products, I want to search in list B to find if there is a similar item, to show its full description.
I want to make that search when it's displaying because I want to be able to add the specification details(list B) later and this list will be updated periodically.
The list A(title, and another 3-4 columns) would have tens of thousands of products, the list of specification(title) of products in list A would have 10-20 items, and list B(title, description, status) would have a few hundreds.
I have an idea to create a collection of list A and for every product in there add an array with the specifications, and another collection for list B. I would subscribe/publish the whole collection of list B, and when I display the list A, I would search for every specification in list B. I don't know how good this idea is.

How do you store and display if a user has voted or not on something?

I'm working on a voting site and I'm wondering how I should handle votes.
For example, on SO, when you vote for a question (or answer) your vote is stored, and each time you go back to the page you can see that you already voted for this question because the up/down buttons are colored.
How do you do that? I have several ideas, but I'm wondering whether they would put a heavy load on the database.
Here are my ideas:
Write a helper that checks, for every question, whether a vote has been cast.
That means the number of queries will depend on the number of items displayed on the page (usually ~20).
Loop over my items to collect their ids, and for each page write a single query that returns, per item, whether a vote has been cast or NULL.
Looks OK because there is only one query no matter how many items are on the page, but it may break some MVC/Domain Model design, I don't know.
When a user logs in (or a guest for whom an anonymous user is created), retrieve all their votes and store them in the session; if a new vote is cast, just add it to the session.
Looks nice because no queries are needed at all except the first one. However, depending on the number of votes cast (maybe a lot for each user), this can increase the size of the session for each user and potentially make authentication slow.
How do you do it? Any other ideas?
For example, let's assume you have a table that stores votes and the user who cast them.
Let's assume you keep votes in user_votes when a vote is cast, with a table structure something like the one below:
id of type int autoincrement
user_id type int, Foreign key representing users table
question_id type of int, Foreign key representing questions table
Now, since the user will be logged in, when you fetch the questions, do a left join against the user_votes table on the user_id.
Something like
SELECT q.id, q.question, uv.id
FROM questions AS q
LEFT JOIN user_votes AS uv ON
uv.question_id = q.id AND
uv.user_id = <logged_in_user_id>
WHERE <Your criteria>
In the view you can check whether the id is present. If so, mark the question as voted; otherwise not.
You may need to adapt this to the fields of your questions table. I am assuming you store questions in a questions table, users in a users table, and so on, all having a primary key id.
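Here is a small runnable sketch of that left join, using Python's sqlite3 module and made-up rows just to show the shape of the result. The sample data and the questions_with_vote_flag helper are assumptions; the table and column names follow the structure described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (id INTEGER PRIMARY KEY, question TEXT);
    CREATE TABLE user_votes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER,
        question_id INTEGER REFERENCES questions(id)
    );
    INSERT INTO questions VALUES (1, 'First?'), (2, 'Second?'), (3, 'Third?');
    INSERT INTO user_votes (user_id, question_id) VALUES (7, 1), (7, 3), (8, 2);
    """
)


def questions_with_vote_flag(user_id):
    """Fetch every question plus a flag telling whether this user voted on it."""
    rows = conn.execute(
        """
        SELECT q.id, q.question, uv.id
        FROM questions AS q
        LEFT JOIN user_votes AS uv
          ON uv.question_id = q.id AND uv.user_id = ?
        ORDER BY q.id
        """,
        (user_id,),
    ).fetchall()
    # uv.id is NULL (None in Python) when the user has not voted on the question.
    return [(qid, text, vote_id is not None) for qid, text, vote_id in rows]


flags = questions_with_vote_flag(7)
print(flags)
```

Keeping the user_id condition in the ON clause rather than in WHERE is what lets questions without a vote still appear in the result.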
Thanks
You could use a combination of your suggested strategies.
Retrieve all the votes made by the logged in user for recent/active questions only and store them in the session.
You then have the ones that are more likely to be needed while still reducing the amount you need to store in the session.
In the less likely event that you need other results, query for just those as and when you need to.
This strategy will reduce the amount you need to store in the session and also reduce the number of calls you make to your database.
Just based on the information that you've given so far, I would take the second approach: get the IDs of all the items on the page, and then do a single query to get all the user's votes for that list of item IDs. Then pass the collection of the user's item votes to your view, so it can render items differently when the user has voted for that item.
The other two approaches seem like they would tend to be less efficient, if I understood you correctly. Using a view helper to initiate an individual query for each item to check if the user has voted on it could lead to a lot of unnecessary queries. And preloading all the user's voting history at login seems to add unnecessary overhead, getting data that isn't always needed and adding the burden of keeping it up to date for the duration of the session.
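A minimal sketch of that second approach, again with Python's sqlite3 module and invented data: one query per page, restricted to the ids actually being displayed. The user_votes layout and the voted_ids helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_votes (user_id INTEGER, question_id INTEGER);
    INSERT INTO user_votes VALUES (7, 1), (7, 3), (7, 40), (8, 2);
    """
)


def voted_ids(user_id, page_question_ids):
    """Return the subset of the page's question ids that the user has voted on."""
    if not page_question_ids:
        return set()  # avoid generating an empty IN () list
    # One placeholder per id, so the values are still passed as parameters.
    placeholders = ", ".join("?" for _ in page_question_ids)
    rows = conn.execute(
        "SELECT question_id FROM user_votes "
        f"WHERE user_id = ? AND question_id IN ({placeholders})",
        (user_id, *page_question_ids),
    ).fetchall()
    return {qid for (qid,) in rows}


voted = voted_ids(7, [1, 2, 3])
print(voted)
```

The view then only needs a membership check per item, for example `item_id in voted`, to decide how to render the vote buttons.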