Loop insert with select - PostgreSQL

I have the following structures
Tickets
+----+---------------------+-----------+---------------+
| id | price | seat_id | flight_id |
+----+---------------------+-----------+---------------+
Seats
+----+--------+-----------+
| id | letter | number |
+----+--------+-----------+
| 1 | A | 1 |
| 2 | A | 2 |
| 3 | A | 3 |
+----+--------+-----------+
I want to insert 2 tickets using only one query, where the letter is A and the number is between 1 and 2. I guess that to make more than one insert at a time I have to use some PL/pgSQL loop, but I don't know how to do it, and I don't know if this is the right approach.

Not sure what you are actually wanting to do, but from your description I'll assume you want 2 rows in tickets referencing ids 1 and 2 from seats.
SQL works in sets, not in individual rows. Loops are available via PL/pgSQL, but avoid them whenever possible. Inserting 2 rows does not require one; in fact it is almost exactly the same as inserting a single row. Since you did not specify values for price and flight_id, I'll just omit them. To insert 2 rows:
INSERT INTO tickets (id, seat_id) VALUES (1, 1), (2, 2);
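And if the goal is to pick the seats by their attributes rather than hard-coding the ids, a single set-based INSERT ... SELECT does it with no loop at all. A runnable sketch (using SQLite via Python for demonstration; the statement itself is the same in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seats (id INTEGER PRIMARY KEY, letter TEXT, number INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, price NUMERIC,
                          seat_id INTEGER REFERENCES seats(id), flight_id INTEGER);
    INSERT INTO seats (id, letter, number) VALUES (1, 'A', 1), (2, 'A', 2), (3, 'A', 3);
""")

# One statement, no loop: insert one ticket per matching seat.
conn.execute("""
    INSERT INTO tickets (seat_id)
    SELECT id FROM seats WHERE letter = 'A' AND number BETWEEN 1 AND 2
""")

print(conn.execute("SELECT seat_id FROM tickets ORDER BY seat_id").fetchall())
# → [(1,), (2,)]
```

The SELECT can match any number of rows; the INSERT creates exactly one ticket for each of them in a single statement.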

Related

Thread safety in relational database triggers

I have a question regarding the thread-safety of trigger operations in relational databases like MariaDB or MySQL.
Imagine a table structure like
+----+-------+----------+--------+
| ID | NAME | CATEGORY | OFFSET |
+----+-------+----------+--------+
| 1 | name1 | CAT_1 | 0 |
+----+-------+----------+--------+
| 2 | name2 | CAT_1 | 1 |
+----+-------+----------+--------+
| 3 | name3 | CAT_2 | 0 |
+----+-------+----------+--------+
| 4 | name4 | CAT_1 | 2 |
+----+-------+----------+--------+
| 5 | name5 | CAT_2 | 1 |
+----+-------+----------+--------+
Please note the value of the OFFSET column in relation to CATEGORY. The offset increases by 1 every time a record of a particular category is inserted.
For example, the next record with id = 6 of type CAT_1 will have offset = 3,
and a record with id = 7 of type CAT_2 will have offset = 2.
New records will be inserted via a REST API, and the id and offset need to be returned in the response.
Now this process needs to be thread-safe, i.e. no two records of the same category (even if inserted concurrently via HTTP requests to the API) should have the same offset value.
One way I thought of doing this is via a BEFORE INSERT trigger that would read the last offset value of the to-be-inserted category and insert the new record with that value plus 1.
What I am unsure about is if this process is thread-safe.
Can it result in a situation where two simultaneous inserts of the same category execute triggers that read the same previous offset value and calculate the same current offset?
If yes, what would be a thread-safe way of doing it?
Any help would be greatly appreciated
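For what it's worth, one common way to sidestep the race is to compute the offset inside the INSERT itself, so the read and the write happen in one atomic statement rather than in a separate trigger read. A minimal sketch, using SQLite via Python and a hypothetical items table (the column is named offset_ because OFFSET is a reserved word; in MySQL/MariaDB you would additionally rely on the locks the statement takes under your isolation level):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        name TEXT, category TEXT, offset_ INTEGER
    )
""")

def insert_item(name, category):
    # Compute MAX(offset)+1 and insert in a single statement, so two
    # concurrent inserts of the same category cannot both read the same
    # previous value: each statement executes atomically in its transaction.
    cur = conn.execute("""
        INSERT INTO items (name, category, offset_)
        SELECT ?, ?, COALESCE(MAX(offset_) + 1, 0) FROM items WHERE category = ?
    """, (name, category, category))
    conn.commit()
    return cur.lastrowid  # the id to return in the API response

for n, c in [("name1", "CAT_1"), ("name2", "CAT_1"), ("name3", "CAT_2")]:
    insert_item(n, c)

print(conn.execute("SELECT name, category, offset_ FROM items ORDER BY id").fetchall())
# → [('name1', 'CAT_1', 0), ('name2', 'CAT_1', 1), ('name3', 'CAT_2', 0)]
```

This is only a sketch of the pattern, not a drop-in answer for MySQL/MariaDB; there you would want to verify the locking behaviour of INSERT ... SELECT for your storage engine, or serialise per-category inserts with an explicit lock.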

Selecting value for the latest two distinct columns

I am trying to write an SQL query that will return the latest data value for each distinct tag in my table.
Currently, I select the distinct values of the tag column and afterwards iterate through those values programmatically, ordering by timestamp and limiting to 1 for each. These tags can be any number and may not always be posted together (one time only tag 1 may be posted, whereas other times tags 1, 2 and 3 may be).
Although it gives the expected outcome, this seems inefficient in a lot of ways, and because I don't have enough SQL experience, this was so far the only way I found of performing the task...
--------------------------------------------------
| name | tag | timestamp | data |
--------------------------------------------------
| aa | 1 | 566 | 4659 |
--------------------------------------------------
| ab | 2 | 567 | 4879 |
--------------------------------------------------
| ac | 3 | 568 | 1346 |
--------------------------------------------------
| ad | 1 | 789 | 3164 |
--------------------------------------------------
| ae | 2 | 789 | 1024 |
--------------------------------------------------
| af | 3 | 790 | 3346 |
--------------------------------------------------
Therefore the expected outcome is {3164, 1024, 3346}
Currently what I'm doing is:
"select distinct tag from table"
Then I store all the distinct tag values programmatically and iterate programmatically through these values using
"select data from table where '"+ tags[i] +"' in (tag) order by timestamp desc limit 1"
Thanks,
This comes close, but beware: if two rows with the same tag share the maximum timestamp, you will get duplicates in the result set.
select data from table
join (select tag, max(timestamp) as maxtimestamp from table group by tag) as latesttags
  on table.tag = latesttags.tag and table.timestamp = latesttags.maxtimestamp
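To show the query producing the expected outcome against the sample data, here it is run in SQLite via Python (the table is renamed to readings, since table is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (name TEXT, tag INTEGER, timestamp INTEGER, data INTEGER);
    INSERT INTO readings VALUES
        ('aa', 1, 566, 4659), ('ab', 2, 567, 4879), ('ac', 3, 568, 1346),
        ('ad', 1, 789, 3164), ('ae', 2, 789, 1024), ('af', 3, 790, 3346);
""")

# Latest data value per tag: join each row against its tag's max timestamp.
rows = conn.execute("""
    SELECT readings.data
    FROM readings
    JOIN (SELECT tag, MAX(timestamp) AS maxtimestamp
          FROM readings GROUP BY tag) AS latesttags
      ON readings.tag = latesttags.tag
     AND readings.timestamp = latesttags.maxtimestamp
    ORDER BY readings.tag
""").fetchall()

print([r[0] for r in rows])
# → [3164, 1024, 3346]
```

One round trip instead of one query per tag; the caveat about ties on the maximum timestamp still applies.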

How do I generate a random sample of groups, including all people in the group, where the group_id (but not the person_id) changes across time?

I have data that looks like this:
+----------+-----------+------------+------+
| group_id | person_id | is_primary | year |
+----------+-----------+------------+------+
| aaa1 | 1 | TRUE | 2000 |
| aaa2 | 1 | TRUE | 2001 |
| aaa3 | 1 | TRUE | 2002 |
| aaa4 | 1 | TRUE | 2003 |
| aaa5 | 1 | TRUE | 2004 |
| bbb1 | 2 | TRUE | 2000 |
| bbb2 | 2 | TRUE | 2001 |
| bbb3 | 2 | TRUE | 2002 |
| bbb1 | 3 | FALSE | 2000 |
| bbb2 | 3 | FALSE | 2001 |
+----------+-----------+------------+------+
The data design is such that
person_id uniquely identifies an individual across time
group_id uniquely identifies a group within each year, but may change from year to year
each group contains primary and non-primary individuals
My goal is three-fold:
Get a random sample, e.g. 10%, of primary individuals
Get the data on those primary individuals for all time periods they appear in the database
Get the data on any non-primary individuals that share a group with any of the primary individuals that were sampled in the first and second steps
I'm unsure where to start with this, since I need to first pull a random sample of primary individuals and get all observations for them. Presumably I can do this by generating a random number that's the same within any person_id, then sample based on that. Then, I need to get the list of group_id that contain any of those primary individuals, and pull all records associated with those group_id.
I don't know where to start with these queries and subqueries, and unfortunately the interface I'm using to access this database can't link information across separate queries, so I can't pull a list of random person_id values for primary individuals and then use that text file to filter group_id in a second query; I have to do it all in one query.
A quick way to get this done is:
SELECT
    data_result.*
FROM
    data AS data_groups
    JOIN (
        SELECT person_id
        FROM data
        WHERE is_primary
        GROUP BY person_id
        ORDER BY random()
        LIMIT 1
    ) AS selected_primary
        ON data_groups.person_id = selected_primary.person_id
    JOIN data AS data_result
        ON data_groups.group_id = data_result.group_id
        AND data_groups.year = data_result.year
I even made a fiddle so you can test it.
The query is pretty straightforward: it gets the sample, then it gets their groups, and then it gets all the users of those groups.
Please pay attention to the LIMIT 1 clause, which is there only because the data set is so small. You can put a larger value, or a subquery that computes the right percentage.
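As a runnable sketch of the same approach, here it is against the sample rows in SQLite via Python (both SQLite and PostgreSQL provide random(); is_primary is stored as 0/1 here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (group_id TEXT, person_id INTEGER, is_primary INTEGER, year INTEGER);
    INSERT INTO data VALUES
        ('aaa1', 1, 1, 2000), ('aaa2', 1, 1, 2001), ('aaa3', 1, 1, 2002),
        ('aaa4', 1, 1, 2003), ('aaa5', 1, 1, 2004),
        ('bbb1', 2, 1, 2000), ('bbb2', 2, 1, 2001), ('bbb3', 2, 1, 2002),
        ('bbb1', 3, 0, 2000), ('bbb2', 3, 0, 2001);
""")

# Sample one primary person at random, then pull every row that shares a
# (group_id, year) with any of that person's rows.
rows = conn.execute("""
    SELECT data_result.*
    FROM data AS data_groups
    JOIN (SELECT person_id FROM data WHERE is_primary
          GROUP BY person_id ORDER BY random() LIMIT 1) AS selected_primary
      ON data_groups.person_id = selected_primary.person_id
    JOIN data AS data_result
      ON data_groups.group_id = data_result.group_id
     AND data_groups.year = data_result.year
""").fetchall()

# If person 2 was sampled, person 3's rows in bbb1/bbb2 come along too;
# with this sample data either choice yields 5 rows.
print(rows)
```

With these ten rows, sampling person 1 returns their 5 solo rows, while sampling person 2 returns their 3 rows plus person 3's 2 shared-group rows.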
If anyone has an answer using windowing functions I'd like to see that.
Note: next time please provide the schema and the data insertion so it is easier to answer.

Removing duplicates from Sphinx Search based on one column

I have one table which I need to do a search on, this table is formed through a join between 2 other tables.
ThinkingSphinx::Index is defined on table posts, my posts_index.rb looks something like this:
join 'LEFT JOIN threads ON posts.parent = threads.id'
indexes 'posts.text', as: :posts_text
indexes 'threads.text', as: :threads_text
and my tables:
threads
| id | text |
| 0 | test title |
| 1 | foo bar |
posts
| id | parent | text
| 0 | 0 | some stuff
| 1 | 0 | more stuff
What I need to do is to perform a sphinx search on both the thread.text and the posts.text. Say I do a search on the word stuff, this comes back
thread.id | posts.id | thread.text | posts.text |
0 | 0 | test title | some stuff |
0 | 1 | test title | more stuff |
this is what I need, but if I do a search on the word test, this comes back
thread.id | posts.id | thread.text | posts.text |
0 | 0 | test title | some stuff |
0 | 1 | test title | more stuff |
This is NOT what I want: as you can see, there is an extra unnecessary row returned, and in this case I only want the first row. I cannot do a group_by, because one thread can have many posts that may or may not contain the search term, and I still need to return all of the posts that are hit. The only time I don't want a duplicate result is when the search term is found ONLY in the thread title. For various reasons I cannot just filter after the Sphinx search; it has to be written into the query.
About the only way to do this would be to arrange for the thread.text column to be blank on all but one post (e.g. the first); that way a keyword match against the thread title can only ever match one post.
(But then you can't get matches where some words match the title and some match the post text, for anything but the first post.)
... I'm also not sure exactly how to arrange for this in ThinkingSphinx.
Not really a solution, but I ended up just adding another index and removing the title field from the other index.

How do I return multiple documents that share indexes efficiently?

I'm indexing some data using Sphinx. I have objects that are categorised, and the categories have a hierarchy. My basic table structure is as follows:
Objects
| id | name |
| 1 | ABC |
| 2 | DEF |
...
Categories
| id | name | parent_id |
| 1 | My Category | 0 |
| 2 | A Child | 1 |
| 3 | Another Child | 1 |
...
Object_Categories
| object_id | category_id |
| 1 | 2 |
| 2 | 3 |
...
My config currently is:
sql_query = SELECT categories.id, objects.name, parent_id FROM categories \
LEFT JOIN object_categories ON categories.id = object_categories.category_id \
LEFT JOIN objects ON objects.id = object_categories.object_id
sql_attr_uint = parent_id
This returns category IDs for any categories that contain objects matching my search, but I need to adjust it to get objects in that category or any of its children.
Obviously, I could UNION this query with another that gets the IDs of the matched categories' parents, and so on (it could be up to 4 or 5 levels deep), but this seems hugely inefficient. Is there a way to return multiple document IDs in the first field, or to avoid repeated needless indexing?
I'm a Sphinx noob, so I'm not sure how to approach the problem.
See
http://www.sitepoint.com/hierarchical-data-database/
It's talking about a database, but the same system works equally well within Sphinx. It can take a while to get your head around, but it's well worth mastering (IMHO!).
(I.e. add the left/right columns to the database, and then include them as attributes in the Sphinx index.)
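A minimal sketch of the nested-set idea from that article, using the sample tables in plain SQLite via Python (no Sphinx here, and the lft/rgt values are assumptions added for illustration): every descendant's bounds fall inside its ancestor's bounds, so "this category or any of its children" becomes a single range predicate instead of a UNION per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
    -- 'My Category' spans 1..6 and therefore contains both children.
    INSERT INTO categories VALUES
        (1, 'My Category', 1, 6),
        (2, 'A Child', 2, 3),
        (3, 'Another Child', 4, 5);
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO objects VALUES (1, 'ABC'), (2, 'DEF');
    CREATE TABLE object_categories (object_id INTEGER, category_id INTEGER);
    INSERT INTO object_categories VALUES (1, 2), (2, 3);
""")

# All objects in category 1 or any of its descendants: one range test on
# the child's lft, no recursion and no UNION per level of the hierarchy.
rows = conn.execute("""
    SELECT DISTINCT objects.name
    FROM categories AS parent
    JOIN categories AS child
      ON child.lft BETWEEN parent.lft AND parent.rgt
    JOIN object_categories ON object_categories.category_id = child.id
    JOIN objects ON objects.id = object_categories.object_id
    WHERE parent.id = 1
    ORDER BY objects.name
""").fetchall()

print([r[0] for r in rows])
# → ['ABC', 'DEF']
```

In the Sphinx setup, the same lft/rgt pair stored as integer attributes on each document lets you filter "in subtree" with a plain range filter at search time.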