Cassandra has slow (rpc timeout) read requests with a long IN operator

I have the following table structure:
SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000;
uid | cid | v
----------------------------+-----------+-------
0x5103be34e695ba3c31000000 | 02j1Dy9G1 | True
0x5103be34e695ba3c31000000 | 03szNx7G1 | False
0x5103be34e695ba3c31000000 | 0SREjO9G1 | True
0x5103be34e695ba3c31000000 | 0bQ4Qn9G1 | True
0x5103be34e695ba3c31000000 | 0ojEVLWF1 | True
0x5103be34e695ba3c31000000 | 1NiWfO9G1 | True
0x5103be34e695ba3c31000000 | 1fSmhWGF1 | True
0x5103be34e695ba3c31000000 | 1o0Ri3TF1 | True
A user (uid) likes (True) or dislikes (False) a piece of content (cid).
The query I need answers the question "Is this content liked by the user?":
SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000 AND cid IN ('Rqy9V79J',....more than 2000 cids...);
It fails with an rpc timeout.
A plain SELECT * FROM v WHERE uid = 0x5103be34e695ba3c31000000 works very fast.
How can I speed up a read request with IN? Should I use a different data structure?
Any ideas?

An IN operator with many parameters requires more memory for each thread.
To fix it, try increasing the JVM thread stack size: JVM_OPTS="$JVM_OPTS -Xss512k"
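Since the question notes that the plain per-uid SELECT is fast, another option that may be worth trying is to read the user's whole partition and filter it on the client instead of sending 2000+ cids in IN. The following is only a sketch under assumptions: `filter_partition` is a hypothetical helper, the rows are assumed to arrive as (cid, v) pairs, and the partition is assumed to be small enough to read in full.

```python
def filter_partition(rows, wanted_cids):
    """Keep only the requested cids from one user's rows.

    rows: iterable of (cid, v) pairs, e.g. what a query such as
          SELECT cid, v FROM v WHERE uid = ? might return (assumed shape).
    wanted_cids: the cids that would otherwise go into the IN list.
    """
    wanted = set(wanted_cids)  # set gives O(1) membership checks
    return {cid: v for cid, v in rows if cid in wanted}


# Sample rows taken from the table in the question.
rows = [
    ("02j1Dy9G1", True),
    ("03szNx7G1", False),
    ("0SREjO9G1", True),
]

# 'Rqy9V79J' is requested but has no row for this user, so it is absent.
result = filter_partition(rows, ["03szNx7G1", "0SREjO9G1", "Rqy9V79J"])
print(result)
```

A cid missing from the returned dict means the user has neither liked nor disliked that content.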

Related

Listing duplicated entities based on two fields with django orm

My objective is to get all orders whose order_number is duplicated within a location and that have checkout = true at least once, but I'm having trouble translating this to the Django ORM.
Example data:
| id | order_number | location_id | checkout |
|----|--------------|-------------|----------|
| 1 | 1 | 1 | true |
| 2 | 1 | 1 | true |
| 3 | 1 | 1 | false |
| 4 | 2 | 1 | true |
| 5 | 1 | 2 | true |
| 6 | 2 | 2 | false |
The SQL I want to reproduce:
select count(*), order_number, location_id from orders where checkout = true group by location_id, order_number having count(*) > 1;
Expected result:
| count | order_number | location_id |
|-------|--------------|-------------|
| 2 | 1 | 1 |
I already tried this, but it's not working as expected:
>>> Order.objects.filter(checkout=True).values_list('order_number', 'location_id').annotate(count_order_number=Count("order_number")).filter(count_order_number__gt=1)
<QuerySet []>
I'm using Django 3.2 with PostgreSQL.
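To confirm what the ORM query has to reproduce, the target SQL can be run against the sample rows. This is a minimal sketch that uses an in-memory SQLite database in place of PostgreSQL (an assumption made only so the example is self-contained); booleans are stored as 1/0, and `find_duplicates` is a hypothetical helper name.

```python
import sqlite3


def find_duplicates(conn):
    """Run the GROUP BY / HAVING query from the question."""
    return conn.execute(
        """
        SELECT COUNT(*), order_number, location_id
        FROM orders
        WHERE checkout = 1
        GROUP BY location_id, order_number
        HAVING COUNT(*) > 1
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER, order_number INTEGER, location_id INTEGER, checkout INTEGER)"
)
# Sample rows from the question (true -> 1, false -> 0).
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 1),
        (2, 1, 1, 1),
        (3, 1, 1, 0),
        (4, 2, 1, 1),
        (5, 1, 2, 1),
        (6, 2, 2, 0),
    ],
)

duplicates = find_duplicates(conn)
print(duplicates)
```

If the same data is in the Django-managed table, an empty QuerySet suggests that the SQL Django generates differs from this query; printing the QuerySet's `.query` attribute shows the generated statement for comparison.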

View rows as columns in PostgreSQL without using crosstab, as crosstab is not providing the expected results

I am using Postgres 9.6 and I have a result set like this:
employee Name|collegeName | Date |attendance
-------------|------------|----------|-----------
employee1 |college1 |2020-05-01| true
employee1 |college2 |2020-05-01| false
employee2 |college3 |2020-05-01| true
employee3 |college4 |2020-05-02| true
employee4 |college5 |2020-05-02| false
employee5 |college1 |2020-05-03| true
employee6 |college3 |2020-05-03| false
My desired result is as follows:
employee Name|collegeName | 2020-05-01 | 2020-05-02 | 2020-05-03
-------------|------------|------------|------------|-----------
employee1    |college1    | true       |            |
employee1    |college2    | false      |            |
employee2    |college3    | true       |            |
employee3    |college4    |            | true       |
employee4    |college5    |            | false      |
employee5    |college1    |            |            | true
employee6    |college3    |            |            | false
I tried using crosstab but couldn't get the desired result. Please help.
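One common way to pivot without crosstab is conditional aggregation: one aggregate per target column, each looking only at rows for its date. The sketch below demonstrates the idea on the sample data with an in-memory SQLite database standing in for PostgreSQL (an assumption for self-containment). The table name `attendance`, the column names, and the helper `pivot_attendance` are hypothetical, and attendance is stored as text so that MAX works; with a real boolean column in PostgreSQL, an aggregate such as bool_or would be needed instead, because max is not defined for booleans there.

```python
import sqlite3

DAYS = ["2020-05-01", "2020-05-02", "2020-05-03"]


def pivot_attendance(conn, days):
    """Return one row per (employee, college) with one column per day.

    `days` must be non-empty; each date is bound as a query parameter.
    """
    # One conditional aggregate per output column.
    cols = ", ".join(
        "MAX(CASE WHEN day = ? THEN attendance END)" for _ in days
    )
    sql = (
        f"SELECT employee_name, college_name, {cols} "
        "FROM attendance "
        "GROUP BY employee_name, college_name "
        "ORDER BY employee_name, college_name"
    )
    return conn.execute(sql, days).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance ("
    "employee_name TEXT, college_name TEXT, day TEXT, attendance TEXT)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?)",
    [
        ("employee1", "college1", "2020-05-01", "true"),
        ("employee1", "college2", "2020-05-01", "false"),
        ("employee2", "college3", "2020-05-01", "true"),
        ("employee3", "college4", "2020-05-02", "true"),
        ("employee4", "college5", "2020-05-02", "false"),
        ("employee5", "college1", "2020-05-03", "true"),
        ("employee6", "college3", "2020-05-03", "false"),
    ],
)

pivoted = pivot_attendance(conn, DAYS)
for row in pivoted:
    print(row)
```

Cells with no matching row come back as NULL (None in Python), which corresponds to the empty cells in the desired result. The list of dates has to be known before the query is built, so a changing set of dates means generating the column list dynamically.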

Filling gaps in postgresql

I have an Actions table whose rows are ordered by time:
| time | session |
|----------|-----------|
| 16:10:10 | session_1 |
| 16:13:05 | null |
| 16:16:43 | null |
| 16:23:12 | null |
| 16:24:01 | session_2 |
| 16:41:32 | null |
| 16:43:56 | session_3 |
| 16:51:22 | session_4 |
I want to write a SELECT that puts the previous non-null value in place of each null.
How can I get this result with PostgreSQL?
| time | session |
|----------|-----------|
| 16:10:10 | session_1 |
| 16:13:05 | session_1 |
| 16:16:43 | session_1 |
| 16:23:12 | session_1 |
| 16:24:01 | session_2 |
| 16:41:32 | session_2 |
| 16:43:56 | session_3 |
| 16:51:22 | session_4 |

update Actions a
set session = (
    select session
    from Actions
    where time = (
        select max(time) from Actions b
        where b.time < a.time and session is not null
    )
) where session is null;
I tried this with 'time' as int and 'session' as int [easier to add data].
drop table Actions;
create table Actions (time int, session int);
insert into Actions values (1,10),(2,null),(3,null),(4,2),(5,null),(6,3),(7,4);
select * from Actions order by time;
update Actions a ...;
select * from Actions order by time;
EDIT
Response to your modified question.
select a1.time, a2.session
from Actions a1
inner join Actions a2
    on a2.time = (
        select max(time) from Actions b
        where b.time <= a1.time and session is not null
    )
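The SELECT above can be tried on the integer test data from this answer. This is a small sketch that uses an in-memory SQLite database rather than PostgreSQL (an assumption, only to keep it self-contained); the column references in the subquery are qualified with `b.` and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Actions (time INT, session INT)")
# Same test rows as in the answer: time and session are both ints.
conn.executemany(
    "INSERT INTO Actions VALUES (?, ?)",
    [(1, 10), (2, None), (3, None), (4, 2), (5, None), (6, 3), (7, 4)],
)

FILL_QUERY = """
SELECT a1.time, a2.session
FROM Actions a1
INNER JOIN Actions a2
    ON a2.time = (
        -- latest row at or before a1 that has a session
        SELECT MAX(b.time) FROM Actions b
        WHERE b.time <= a1.time AND b.session IS NOT NULL
    )
ORDER BY a1.time
"""

filled = conn.execute(FILL_QUERY).fetchall()
for row in filled:
    print(row)
```

Because this is an inner join, any rows that come before the first non-null session have nothing to join to and are dropped from the result.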

Sphinx query takes too much time

I am making an index on a table with ~90 000 000 rows. Fulltext search must be done on a varchar field, called email. I also set parent_id as an attribute.
When I run queries for emails that match words with a small number of hits, they return immediately:
mysql> SELECT count(*) FROM users WHERE MATCH('diedsmiling');
+----------+
| count(*) |
+----------+
| 26 |
+----------+
1 row in set (0.00 sec)
mysql> show meta;
+---------------+-------------+
| Variable_name | Value |
+---------------+-------------+
| total | 1 |
| total_found | 1 |
| time | 0.000 |
| keyword[0] | diedsmiling |
| docs[0] | 26 |
| hits[0] | 26 |
+---------------+-------------+
6 rows in set (0.00 sec)
Things get complicated when I search for emails that match words with a large number of hits:
mysql> SELECT count(*) FROM users WHERE MATCH('mail');
+----------+
| count(*) |
+----------+
| 33237994 |
+----------+
1 row in set (9.21 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| total | 1 |
| total_found | 1 |
| time | 9.210 |
| keyword[0] | mail |
| docs[0] | 33237994 |
| hits[0] | 33253762 |
+---------------+----------+
6 rows in set (0.00 sec)
Filtering by the parent_id attribute doesn't help:
mysql> SELECT count(*) FROM users WHERE MATCH('mail') AND parent_id = 62003;
+----------+
| count(*) |
+----------+
| 21404 |
+----------+
1 row in set (8.66 sec)
mysql> show meta;
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| total | 1 |
| total_found | 1 |
| time | 8.666 |
| keyword[0] | mail |
| docs[0] | 33237994 |
| hits[0] | 33253762 |
Here are my sphinx configs:
source src1
{
type = mysql
sql_host = HOST
sql_user = USER
sql_pass = PASS
sql_db = DATABASE
sql_port = 3306 # optional, default is 3306
sql_query = \
SELECT id, parent_id, email \
FROM users
sql_attr_uint = parent_id
}
index test1
{
source = src1
path = /var/lib/sphinx/test1
}
The query that I need to run looks like:
SELECT * FROM users WHERE MATCH('mail') AND parent_id = 62003;
I need to get all emails that match a certain word and have a certain parent_id.
My questions are:
Is there a way to optimize the situation described above?
Maybe there is a more suitable matching mode for this type of query?
If I migrate to a server with SSD disks, will the performance gain be significant?

To get just the count, you can run:
Select id from index where match(...) limit 0 option ranker=none; show meta;
and read the count from total_found.
This will be much more efficient than count(*), which invokes group by.
Or, if you only search for single words, even call keywords('word','index',1);

Postgres group by with aggregate (last comment in a conversation across all conversations)

I want to get the last comment in a conversation between two people.
My table structure as follows:
Table "public.comments"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+-----------------------------+-------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('comments_id_seq'::regclass) | plain | |
body | text | not null | extended | |
target_id | integer | | plain | |
target_type | character varying(255) | | extended | |
created_at | timestamp without time zone | not null | plain | |
updated_at | timestamp without time zone | not null | plain | |
user_id | integer | | plain | |
My Attempt:
SELECT
comments.id,
max(SELECT id comments.created_at),
CASE
WHEN user_id = 1 THEN CONCAT(user_id,'_',target_id)
WHEN target_id = 1 THEN CONCAT(target_id,'_',user_id)
END
FROM comments
WHERE
comments.user_id = 1
OR
(comments.target_type = 'User'
AND
comments.target_id = 1)
GROUP BY
CASE
WHEN user_id = 1 THEN CONCAT(user_id,'_',target_id)
WHEN target_id = 1 THEN CONCAT(target_id,'_',user_id)
END
So I figured out how to group the comments, but I'm stuck on how to order by created_at and get the latest id and the rest of the information for each conversation.
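One possible approach is to compute the latest created_at per conversation in a subquery and join it back to comments to recover the full row. The sketch below is a rough illustration under several assumptions: it runs on an in-memory SQLite database instead of PostgreSQL, uses a reduced version of the table, restricts both directions to target_type = 'User', identifies a conversation by the id of the other participant, and assumes no two comments in a conversation share the same created_at. The helper name `last_comments` and the sample rows are made up for the example.

```python
import sqlite3

LAST_COMMENT_QUERY = """
SELECT c.id, c.body, c.created_at, k.other_id
FROM comments AS c
JOIN (
    -- latest timestamp per conversation partner of :me
    SELECT
        CASE WHEN user_id = :me THEN target_id ELSE user_id END AS other_id,
        MAX(created_at) AS last_at
    FROM comments
    WHERE target_type = 'User' AND (user_id = :me OR target_id = :me)
    GROUP BY CASE WHEN user_id = :me THEN target_id ELSE user_id END
) AS k
    ON c.created_at = k.last_at
   AND c.target_type = 'User'
   AND (
        (c.user_id = :me AND c.target_id = k.other_id)
     OR (c.target_id = :me AND c.user_id = k.other_id)
   )
ORDER BY c.created_at
"""


def last_comments(conn, me):
    """Return the newest comment of each conversation involving user `me`."""
    return conn.execute(LAST_COMMENT_QUERY, {"me": me}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER, body TEXT, target_id INTEGER, "
    "target_type TEXT, created_at TEXT, user_id INTEGER)"
)
# Made-up sample rows: user 1 talks with users 2 and 3.
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "hi", 2, "User", "2020-01-01 10:00:00", 1),
        (2, "hello", 1, "User", "2020-01-01 10:05:00", 2),
        (3, "yo", 3, "User", "2020-01-01 09:00:00", 1),
        (4, "unrelated", 3, "User", "2020-01-01 11:00:00", 2),
    ],
)

latest = last_comments(conn, 1)
for row in latest:
    print(row)
```

In PostgreSQL itself, DISTINCT ON with an ORDER BY on created_at DESC is a commonly used alternative for this kind of "latest row per group" query.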
