PostgreSQL - Select from one table based on another - postgresql

I have a table of tweets (@OneToMany) and another table of analyzedtweets (@ManyToOne) with 'n' analyzedtweets (one per analyst) for each entry in the tweet table. Essentially, I can have any number of analysts (represented in a table), and each one can analyze a tweet just once. To make it a bit more complex, the entries in the tweet table are grouped by process, which is represented by yet another table.
My question is, how would I query the analyzedtweet table for the tweet_id in the last entry given a specific process_id and analyst_id and then use that to find the next tweet in the tweet table also given the same process_id and analyst_id? Basically, I want to give the analyst the next tweet that he/she has not yet analyzed within that specific process (run).
Here are my tables:
CREATE TABLE tweet (
id SERIAL PRIMARY KEY,
process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
...
);
CREATE TABLE analyzedtweet (
id SERIAL PRIMARY KEY,
tweet_id INTEGER NOT NULL REFERENCES tweet(id) ON DELETE CASCADE ON UPDATE CASCADE,
analyst_id INTEGER NOT NULL REFERENCES analyst(id) ON DELETE CASCADE ON UPDATE CASCADE,
process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
...
);
CREATE TABLE process (
id SERIAL PRIMARY KEY,
...
);
CREATE TABLE analyst (
id SERIAL PRIMARY KEY,
...
);
The only way I know how to do this is in 2 steps:
Given a specific process_id (processId) and analyst_id (analystId), run the following query to give me the last tweet_id analyzed by that analyst in that process:
SELECT tweet_id from analyzedtweet WHERE analyzedtweet.analyst_id = analystId AND analyzedtweet.process_id = processId ORDER BY analyzedtweet.tweet_id DESC LIMIT 1
Take the result of the above query (referred to as latestTweetId) and run the following query:
SELECT * from tweet WHERE tweet.id > latestTweetId AND tweet.process_id = processId ORDER BY tweet.id DESC LIMIT 1
I'm sure there is a much better way to do this with JOIN, I just can't figure out how.
Finally, I am using Hibernate and would like to get the POJO back.

If you are fetching the latest tweet for a given process_id and analyzedtweet_id, use this query:
List<Tweet> t = session.createQuery(
        "select t from Tweet t " +
        "join t.process p, t.analyzedtweet a " +
        "where p.id = ?1 and a.id = ?2 " +
        "order by t.id desc")
    .setParameter(1, process_id)
    .setParameter(2, analyzedtweet_id)
    .setMaxResults(1)
    .getResultList();
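For completeness, the asker's two steps can be collapsed into one SQL statement with a subselect. A sketch (:processId and :analystId are placeholder parameters; ordering ascending returns the next not-yet-analyzed tweet, and COALESCE covers an analyst who has analyzed nothing yet in this process):

```sql
-- Sketch: next tweet in this process after the analyst's latest analyzed one.
SELECT t.*
FROM tweet t
WHERE t.process_id = :processId
  AND t.id > COALESCE(
        (SELECT MAX(a.tweet_id)
           FROM analyzedtweet a
          WHERE a.analyst_id = :analystId
            AND a.process_id = :processId),
        0)
ORDER BY t.id ASC
LIMIT 1;
```

Run through Hibernate as a native query (session.createNativeQuery(sql, Tweet.class)), this still returns the mapped POJO.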


Use index to speed up query using values from different tables

I have a table products, a table orders and a table orderProducts.
products have a name as a PK (apple, banana, mango) and a price.
orders have a created_at date and an id as a PK.
orderProducts connects orders and products, so they have a product_name and an order_id. Now I would like to show all orders for a given product that happened in the last 24 hours.
I use the following query:
SELECT
orders.id,
orders.created_at,
products.name,
products.price
FROM
orderProducts
JOIN products ON
products.name=orderProducts.product
JOIN orders ON
orders.id=orderProducts.order
WHERE
products.name='banana'
AND
orders.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY
orders.created_at
This works, but I would like to optimize this query with an index. This index would need to be ordered first by
the product name, so it can be filtered on,
and then by the created_at of the order in descending order, so it can select only the ones from the last 24 hours.
The problem is that, from what I have seen, indexes can only be created on a single table, without the possibility of joining another table's values to it. Since two individual indexes do not solve this problem either, I was wondering if there was an alternative way to optimize this particular query.
Here are the table scripts:
CREATE TABLE products
(
name text PRIMARY KEY,
price integer
);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orderProducts
(
product text REFERENCES products(name),
"order" integer REFERENCES orders(id)
);
First of all: please do not put indexes everywhere - that leads to slower write operations.
As proposed by @Laurenz Albe - do not guess - check.
Other than that, note that you already know the product name, and the price is repeated - so you could query that once. Whether in your case two queries are going to be faster than a single one... Check that.
Please read the docs. I would try this index:
create index orders_created_at_id on orders (created_at desc, id);
Normally id should go first, since that is unique, however here the system should be able to filter on both predicates - WHERE/join. Just guessing here.
On orderProducts I would like to see an index on both columns, although for this query only one should be needed. In practice you are going from products to orders, or the other way around - both paths are possible, which is why I wrote about indexing both columns. I would use two separate indexes:
create index orderproducts_product on orderproducts (product) include ("order");
create index orderproducts_order on orderproducts ("order") include (product);
Probably that is not changing much, but... the idea is to answer from the index alone, without touching the table itself.
These rules are important in terms of performance:
An integer index is faster than a string index, so you should try to make primary keys integers; joins between tables use the primary keys too.
If the WHERE clauses always use the same two fields, create a composite index on both fields.
Foreign keys are not indexed automatically; you must create indexes on foreign-key columns manually.
So, the recommended table scripts would be:
CREATE TABLE products
(
id serial primary key,
name text,
price integer
);
CREATE UNIQUE INDEX products_name_idx ON products USING btree (name);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX orders_created_at_idx ON orders USING btree (created_at);
CREATE TABLE orderProducts
(
product_id integer REFERENCES products(id),
order_id integer REFERENCES orders(id)
);
CREATE INDEX orderproducts_product_id_idx ON orderproducts USING btree (product_id, order_id);
---- OR ----
CREATE INDEX orderproducts_product_id ON orderproducts (product_id);
CREATE INDEX orderproducts_order_id ON orderproducts (order_id);
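With the revised schema, the query itself joins on the integer keys; a sketch (column names follow the scripts above - verify the plan with EXPLAIN (ANALYZE, BUFFERS) rather than guessing):

```sql
-- Sketch against the revised schema: products.name is filtered through its
-- unique index, and the joins run on integer keys.
SELECT o.id, o.created_at, p.name, p.price
FROM orderProducts op
JOIN products p ON p.id = op.product_id
JOIN orders o ON o.id = op.order_id
WHERE p.name = 'banana'
  AND o.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY o.created_at;
```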

Delete all records that violate new unique constraint

I have a table that has the following fields
----------------------------------
| id | user_id | doc_id |
----------------------------------
I want to create a new unique constraint to make sure that there are no repeat user_id and doc_id records. Aka a user can only be linked to a doc one time. That is simple enough.
ALTER TABLE mytable
ADD CONSTRAINT uniquectm_const UNIQUE (user_id, doc_id);
The issue is I have records that currently violate that constraint. I was wondering if there is an easy way to query for those records or to tell postgres just delete anything that violates the constraint.
Identifying the records that violate your new key (note that PostgreSQL requires an alias on the subquery):
SELECT *
FROM
(
SELECT id, user_id, doc_id
, COUNT(*) OVER (PARTITION BY user_id, doc_id) AS unique_check
FROM mytable
) AS dupes
WHERE unique_check > 1;
Then you can figure out from those duplicates, which should be deleted and perform the delete.
To my knowledge there is no other way to perform this since any automated "Delete any duplicates" command would leave the database engine to decide which of the two-or-more duplicate records to get rid of.
If the entire record is a duplicate (all columns match) then you could just create a new table with your new unique constraint and do a INSERT INTO newtable SELECT DISTINCT * FROM oldtable but I'm betting that isn't the case.
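If the survivor choice can be made deterministic instead - say, keep the row with the smallest id per (user_id, doc_id) pair - a self-join DELETE automates the cleanup. A sketch (run it inside a transaction and inspect the result first):

```sql
-- Sketch: delete every duplicate except the lowest-id row per pair.
DELETE FROM mytable a
USING mytable b
WHERE a.user_id = b.user_id
  AND a.doc_id = b.doc_id
  AND a.id > b.id;
```

After that, the ALTER TABLE ... ADD CONSTRAINT from the question succeeds.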

Is there any way in sql to resolve this scenario?

I have a table in my database like this:
CREATE TABLE assignments
(
id uuid,
owner_id uuid NOT NULL
);
Now I want to check whether the IDs I am getting from the request already exist in the table. If an ID exists, I will update its owner_id; and if an ID exists in the table but I do not get it in the request, I have to delete that record.
(In short, it is an update mechanism in which I get multiple IDs to update in the table: if a record exists both in the database and in the request, I update it in the database, and if a record exists in the table but not in the request, I delete it from the database.)
This can be done with a single INSERT statement with the ON CONFLICT clause. Your first task will be creating a PK (or UNIQUE) constraint on the table. Presumably that would be id.
alter table assignments
add primary key (id);
Then insert your data; the ON CONFLICT clause will update the owner_id for any existing id.
insert into assignments (id, owner_id)
values (<id>, <owner>)
on conflict (id)
do update
set owner_id = excluded.owner_id;
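The upsert covers the update half; deleting rows whose id did not appear in the request still needs its own statement. A sketch, where :id1 and :id2 are placeholders standing in for the ids received in the request:

```sql
-- Sketch: remove assignments that were not mentioned in the request.
DELETE FROM assignments
WHERE id NOT IN (:id1, :id2);
```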

Recursive DELETE statement to remove all posts within topic category and all subtopics

I have a challenge I am solving that requires me to do the following:
"Delete all published posts in the “Customer Success” topic and its subtopics. A subtopic is a child or descendant topic (similar to folders vs subfolders) and can include many levels. For example, there might be a topic “Company” with a subtopic “Engineering” with a subtopic “Backend” with a subtopic “Elixir” so the hierarchy is Company > Engineering > Backend > Elixir. Here Elixir is also a subtopic of Company."
The tables I have that may be included in this statement are the: public.topics, public.posts_topics, and public.posts
I am new to PostgreSQL and have never done anything like a recursive deletion of child elements before. I know the posts_topics table has a foreign key to both the posts and topics tables.
Does anyone have any advice for how this statement should be written?
When you want to create an appropriate table structure you should think about FOREIGN KEYS. The FK connects a record in one table with a record of the same or another table. You can tell the FK that the related record must be deleted if the referenced record is deleted. This is done in the table definition using
FOREIGN KEY (column) REFERENCES another_table(primary_key_column) ON DELETE CASCADE
demo:db<>fiddle
CREATE TABLE topics (
id int PRIMARY KEY,
name text,
-- references another id in same table (build the topic hierarchy)
parent int REFERENCES topics(id) ON DELETE CASCADE
);
CREATE TABLE posts (
id int PRIMARY KEY,
post_text text,
-- connects the owner topic to the post
owner_id int REFERENCES topics(id) ON DELETE CASCADE
);
-- in fact, because we are using the owner_id in table "posts",
-- this table is not really required anymore but you requested it
CREATE TABLE posts_topics (
p_id int REFERENCES posts(id) ON DELETE CASCADE,
t_id int REFERENCES topics(id) ON DELETE CASCADE
);
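With those ON DELETE CASCADE clauses in place, one plain DELETE on the root topic removes the whole subtree. Assuming the "Customer Success" topic has id 2:

```sql
-- Cascades to all subtopics (via topics.parent), their posts
-- (via posts.owner_id), and the posts_topics rows.
DELETE FROM topics WHERE id = 2;
```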
However, if you really wanted to do this with a recursive query, the query could look like this:
demo:db<>fiddle
WITH RECURSIVE trace_tree AS (
SELECT id -- 1
FROM topics
WHERE id = 2
UNION
SELECT t.id
FROM topics t
JOIN trace_tree tt ON tt.id = t.parent
), del_posts_topics AS ( -- 2
DELETE FROM posts_topics WHERE t_id = ANY (
SELECT * FROM trace_tree
)
RETURNING p_id
), del_posts AS ( -- 3
DELETE FROM posts WHERE id = ANY (
SELECT * FROM del_posts_topics
)
)
DELETE FROM topics WHERE id = ANY ( -- 4
SELECT * FROM trace_tree
);
This is the recursion (WITH RECURSIVE). A recursive CTE contains two parts, combined by the UNION clause. The first is the recursion initialization; the second is the recursive step, which joins the previous run against the current table. Finally this returns a list of ids representing the given topic and all of its descendants.
Next CTE: delete all entries in the posts_topics join table whose topic t_id is in the list queried above. The DELETE statement returns the post ids (p_id) that were deleted in this step, using RETURNING.
The previously returned (and deleted) posts' p_ids are used in this DELETE statement to delete the actual posts.
Finally, delete all the topics queried in (1).

What suitable locking technique to prevent insertion of data? [PostgreSQL]

I want to ensure that one worker has exactly one manager
CREATE TABLE IF NOT EXISTS user_relationships (
object_id SERIAL PRIMARY KEY NOT NULL UNIQUE,
manager_id INT REFERENCES users (object_id) NOT NULL,
worker_id INT REFERENCES users (object_id) NOT NULL,
CHECK (manager_id != worker_id),
UNIQUE (manager_id, worker_id)
);
I have series of SQL statements using Read Committed level of transaction isolation,
BEGIN;
SELECT * FROM users WHERE id = manager_id AND acc_type = 'manager' FOR UPDATE;
SELECT * FROM users WHERE id = worker_id AND acc_type = 'worker' FOR UPDATE;
SELECT * FROM relationships WHERE id = worker_id;
INSERT INTO relationships (m, w) VALUES (manager_id, worker_id);
COMMIT;
I added the first two FOR UPDATE clauses to prevent other concurrent transactions from changing the users' account type mid-transaction.
I could not figure out what kind of "trick" to use for the third query. The third query should return an empty list, to ensure that the worker is not yet owned by any manager.
FOR UPDATE does not work on the third query because I am expecting no rows back.
Because of the third query, I run the risk of concurrent transactions adding the same worker to different managers.
What can I do to enforce one worker to one manager?
You probably need to add a UNIQUE constraint to worker_id:
CREATE TABLE IF NOT EXISTS user_relationships (
object_id SERIAL PRIMARY KEY NOT NULL UNIQUE,
manager_id INT REFERENCES users (object_id) NOT NULL,
worker_id INT REFERENCES users (object_id) UNIQUE NOT NULL,
CHECK (manager_id != worker_id)
);
But it is better to add a manager_id INT REFERENCES users (object_id) column to the users table and not use user_relationships at all.
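The suggested alternative can be sketched as a migration; the column name manager_id follows the answer's proposal, and it is left nullable here because a top-level manager has no manager of their own:

```sql
-- Sketch: store the manager directly on users. Each users row can hold
-- at most one manager_id, so "one worker, one manager" is structural.
ALTER TABLE users
    ADD COLUMN manager_id INT REFERENCES users (object_id);
```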