Cassandra has slow (rpc timeout) read requests with a long IN operator - nosql

I have the following table structure:
SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000;
uid | cid | v
----------------------------+-----------+-------
0x5103be34e695ba3c31000000 | 02j1Dy9G1 | True
0x5103be34e695ba3c31000000 | 03szNx7G1 | False
0x5103be34e695ba3c31000000 | 0SREjO9G1 | True
0x5103be34e695ba3c31000000 | 0bQ4Qn9G1 | True
0x5103be34e695ba3c31000000 | 0ojEVLWF1 | True
0x5103be34e695ba3c31000000 | 1NiWfO9G1 | True
0x5103be34e695ba3c31000000 | 1fSmhWGF1 | True
0x5103be34e695ba3c31000000 | 1o0Ri3TF1 | True
A user (uid) likes (True) or dislikes (False) content (cid).
To answer "Is this content liked by the user?" I run:
SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000 AND cid IN ('Rqy9V79J',....more than 2000 cids...);
This fails with an rpc timeout.
A plain SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000 works very fast.
How can I speed up the read request with IN? Should I use a different data structure?
Any ideas?

An IN operator with many parameters requires more memory for each thread.
To fix it, try setting JVM_OPTS="$JVM_OPTS -Xss512k"
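If raising the thread stack size alone does not help, a common workaround (a sketch only; the chunk size and placeholder cids are assumptions) is to split the 2000+ values into several smaller IN lists and merge the results client-side, since the single-partition read is already fast:

-- several small queries instead of one huge IN list
-- ('cid_001' ... 'cid_100' are placeholders; tune the chunk size for your cluster)
SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000 AND cid IN ('cid_001', ..., 'cid_100');
SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000 AND cid IN ('cid_101', ..., 'cid_200');

Alternatively, since the plain SELECT * FROM v WHERE uid = ... already returns quickly, you can fetch the whole partition once and filter the 2000 cids on the client.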

Related

Listing duplicated entities based on two fields with django orm

My objective is to get all orders with a duplicated order_number per location that have checkout = true at least once, but I'm having trouble translating this to the Django ORM.
Example data:
| id | order_number | location_id | checkout |
|----|--------------|-------------|----------|
| 1 | 1 | 1 | true |
| 2 | 1 | 1 | true |
| 3 | 1 | 1 | false |
| 4 | 2 | 1 | true |
| 5 | 1 | 2 | true |
| 6 | 2 | 2 | false |
select count(*), order_number, location_id from orders where checkout = true group by location_id, order_number having count(*) > 1;
The expected result:
| count | order_number | location_id |
|-------|--------------|-------------|
| 2 | 1 | 1 |
I already tried this, but it's not working as expected:
>>> Order.objects.filter(checkout=True).values_list('order_number', 'location_id').annotate(count_order_number=Count("order_number")).filter(count_order_number__gt=1)
<QuerySet []>
I'm using
Django=3.2
postgresql
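A likely culprit (an assumption, not confirmed in the question): if the Order model declares a default Meta.ordering, Django appends the ordering columns to the GROUP BY of an annotated queryset, so every group becomes unique, every count is 1, and the count_order_number__gt=1 filter matches nothing. In SQL terms, the failing queryset may be emitting something like:

-- hypothetical SQL when a default ordering (e.g. by id) leaks into the grouping
SELECT order_number, location_id, COUNT(order_number) AS count_order_number
FROM orders
WHERE checkout = true
GROUP BY order_number, location_id, id -- the extra "id" makes every group unique
HAVING COUNT(order_number) > 1;

Appending .order_by() to clear the default ordering, and using .values() rather than .values_list() before .annotate(), usually restores the intended two-column GROUP BY.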

View rows as columns in Postgres SQL without using crosstab, as crosstab is not providing the expected results

I am using Postgres 9.6 and I have a result set like this:
employee Name|collegeName | Date |attendance
-------------|------------|----------|-----------
employee1 |college1 |2020-05-01| true
employee1 |college2 |2020-05-01| false
employee2 |college3 |2020-05-01| true
employee3 |college4 |2020-05-02| true
employee4 |college5 |2020-05-02| false
employee5 |college1 |2020-05-03| true
employee6 |college3 |2020-05-03| false
My desired result is as follows:
employee Name|collegeName | 2020-05-01 | 2020-05-02 | 2020-05-03
-------------|------------|------------|------------|-----------
employee1 |college1 | true | |
employee1 | college2 | false | |
employee2 | college3 | true | |
employee3 | college4 | | true |
employee4 |college5 | | false |
employee5 | college1 | | | true
employee6 | college3 | | |false
I tried using crosstab but couldn't get the desired result. Please help.
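No answer is attached here, but a common crosstab-free approach is conditional aggregation with FILTER, which Postgres supports since 9.4 and which therefore works on 9.6. A sketch (the table name attendance_log is an assumption, and the date columns are hardcoded):

SELECT "employee Name",
       "collegeName",
       bool_or(attendance) FILTER (WHERE "Date" = DATE '2020-05-01') AS "2020-05-01",
       bool_or(attendance) FILTER (WHERE "Date" = DATE '2020-05-02') AS "2020-05-02",
       bool_or(attendance) FILTER (WHERE "Date" = DATE '2020-05-03') AS "2020-05-03"
FROM attendance_log
GROUP BY "employee Name", "collegeName"
ORDER BY "employee Name", "collegeName";

A group with no row for a given date yields NULL for that column, matching the blanks in the desired output.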

Filling gaps in postgresql

I have an Actions table whose rows are ordered by time:
| time | session |
|----------|-----------|
| 16:10:10 | session_1 |
| 16:13:05 | null |
| 16:16:43 | null |
| 16:23:12 | null |
| 16:24:01 | session_2 |
| 16:41:32 | null |
| 16:43:56 | session_3 |
| 16:51:22 | session_4 |
I want to write a select that replaces each null with the previous non-null value.
How do I get this result with PostgreSQL?
| time | session |
|----------|-----------|
| 16:10:10 | session_1 |
| 16:13:05 | session_1 |
| 16:16:43 | session_1 |
| 16:23:12 | session_1 |
| 16:24:01 | session_2 |
| 16:41:32 | session_2 |
| 16:43:56 | session_3 |
| 16:51:22 | session_4 |
update Actions a
set session = (
    select session
    from Actions
    where time = (
        select max(time)
        from Actions b
        where b.time < a.time and session is not null
    )
) where session is null;
I tried this with 'time' as int and 'session' as int [easier to add data].
drop table Actions;
create table Actions (time int, session int);
insert into Actions values (1,10),(2,null),(3,null),(4,2),(5,null),(6,3),(7,4);
select * from Actions order by time;
update Actions a ...;
select * from Actions order by time;
EDIT
Response to your modified question.
select a1.time, a2.session
from Actions a1
inner join Actions a2
    on a2.time = (
        select max(time)
        from Actions b
        where b.time <= a1.time and session is not null
    )
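A set-based alternative (my sketch, not part of the original answer) uses the gaps-and-islands trick: count(session) ignores nulls, so every null row inherits the group number of the most recent non-null session, and max() within that group recovers the session value:

with marked as (
    select time, session,
           count(session) over (order by time) as grp -- counts only non-null sessions
    from Actions
)
select time, max(session) over (partition by grp) as session
from marked
order by time;

On the integer test data above this returns (1,10),(2,10),(3,10),(4,2),(5,2),(6,3),(7,4), the same result as the correlated subquery, but with window functions instead of a subquery per row.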

Sphinx query takes too much time

I am building an index on a table with ~90,000,000 rows. Full-text search must be done on a varchar field called email. I also set parent_id as an attribute.
When I run queries for words with a small number of hits, they return immediately:
mysql> SELECT count(*) FROM users WHERE MATCH('diedsmiling');
+----------+
| count(*) |
+----------+
| 26 |
+----------+
1 row in set (0.00 sec)
mysql> show meta;
+---------------+-------------+
| Variable_name | Value |
+---------------+-------------+
| total | 1 |
| total_found | 1 |
| time | 0.000 |
| keyword[0] | diedsmiling |
| docs[0] | 26 |
| hits[0] | 26 |
+---------------+-------------+
6 rows in set (0.00 sec)
Things get complicated when I search for words with a large number of hits:
mysql> SELECT count(*) FROM users WHERE MATCH('mail');
+----------+
| count(*) |
+----------+
| 33237994 |
+----------+
1 row in set (9.21 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| total | 1 |
| total_found | 1 |
| time | 9.210 |
| keyword[0] | mail |
| docs[0] | 33237994 |
| hits[0] | 33253762 |
+---------------+----------+
6 rows in set (0.00 sec)
Filtering on the parent_id attribute doesn't help:
mysql> SELECT count(*) FROM users WHERE MATCH('mail') AND parent_id = 62003;
+----------+
| count(*) |
+----------+
| 21404 |
+----------+
1 row in set (8.66 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| total | 1 |
| total_found | 1 |
| time | 8.666 |
| keyword[0] | mail |
| docs[0] | 33237994 |
| hits[0] | 33253762 |
+---------------+----------+
Here are my sphinx configs:
source src1
{
    type = mysql
    sql_host = HOST
    sql_user = USER
    sql_pass = PASS
    sql_db = DATABASE
    sql_port = 3306 # optional, default is 3306
    sql_query = \
        SELECT id, parent_id, email \
        FROM users
    sql_attr_uint = parent_id
}
index test1
{
    source = src1
    path = /var/lib/sphinx/test1
}
The query that I need to run looks like:
SELECT * FROM users WHERE MATCH('mail') AND parent_id = 62003;
I need to get all emails that match a certain word and have a certain parent_id.
My questions are:
Is there a way to optimize the situation described above? Is there a more suitable matching mode for this type of query? If I migrate to a server with SSD disks, will the performance gain be significant?
If you just need the count, you can do:
SELECT id FROM index WHERE MATCH(...) LIMIT 0 OPTION ranker=none; SHOW META;
and read the count from total_found.
That will be much more efficient than count(*), which invokes a group by.
Or even CALL KEYWORDS('word', 'index', 1); if you are only counting single words.
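Applied to the test1 index and the slow keyword from the question, the count-only form would look like this (a usage sketch):

SELECT id FROM test1 WHERE MATCH('mail') LIMIT 0 OPTION ranker=none;
SHOW META; -- the total_found row holds the count

With LIMIT 0 no result rows are materialized, and with ranker=none no ranking is computed, so Sphinx only has to walk the document list for 'mail'.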

Postgres group by with aggregate (last comment in a conversation across all conversations)

I want to get the last comment in a conversation between two people.
My table structure as follows:
Table "public.comments"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+-----------------------------+-------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('comments_id_seq'::regclass) | plain | |
body | text | not null | extended | |
target_id | integer | | plain | |
target_type | character varying(255) | | extended | |
created_at | timestamp without time zone | not null | plain | |
updated_at | timestamp without time zone | not null | plain | |
user_id | integer | | plain | |
My Attempt:
SELECT
    comments.id,
    max(SELECT id comments.created_at),
    CASE
        WHEN user_id = 1 THEN CONCAT(user_id,'_',target_id)
        WHEN target_id = 1 THEN CONCAT(target_id,'_',user_id)
    END
FROM comments
WHERE
    comments.user_id = 1
    OR
    (comments.target_type = 'User'
    AND
    comments.target_id = 1)
GROUP BY
    CASE
        WHEN user_id = 1 THEN CONCAT(user_id,'_',target_id)
        WHEN target_id = 1 THEN CONCAT(target_id,'_',user_id)
    END
So I figured out how to group the comments, but I'm stuck on how to order by created_at and get the latest id and its information.
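No answer is attached here, but one common Postgres approach (a sketch reusing the pair key from the attempt above) is DISTINCT ON, which keeps exactly one row per group once the rows are ordered newest-first:

SELECT DISTINCT ON (pair) *
FROM (
    SELECT comments.*,
           CASE
               WHEN user_id = 1 THEN CONCAT(user_id, '_', target_id)
               WHEN target_id = 1 THEN CONCAT(target_id, '_', user_id)
           END AS pair
    FROM comments
    WHERE comments.user_id = 1
       OR (comments.target_type = 'User' AND comments.target_id = 1)
) t
ORDER BY pair, created_at DESC;

DISTINCT ON (pair) returns the first row of each pair after the ORDER BY, i.e. the latest comment in each conversation involving user 1.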