I am trying to use pgbench to run a test on PolarDB for PostgreSQL.
This is the command I used to perform the test.
pgbench -M prepared -r -c 16 -j 4 -T 30 -p 10001 -d pgbench -l
And this is the result:
... ...
client 2 sending P0_10
client 2 receiving
client 2 receiving
client 14 receiving
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 32
query mode: prepared
number of clients: 16
number of threads: 4
duration: 30 s
number of transactions actually processed: 49126
latency average = 9.772 ms
tps = 1637.313156 (including connections establishing)
tps = 1637.438330 (excluding connections establishing)
statement latencies in milliseconds:
1.128 \set aid random(1, 100000 * :scale)
0.068 \set bid random(1, 1 * :scale)
0.040 \set tid random(1, 10 * :scale)
0.041 \set delta random(-5000, 5000)
0.104 BEGIN;
3.815 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.590 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
1.188 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
1.440 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.327 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.481 END;
I wonder if there is a way to calculate the P99 latency from this result, or is there some extra parameter I need to provide to pgbench?
The -l flag caused pgbench to write per-transaction log files. You need to look in those log files for the latencies. For me that looks something like this:
cat pgbench_log.107915*|wc
36635 219810 1033548
sort pgbench_log.107915* -k3rn|head -n 366|tail -n 1
13 990 65184 0 195589 166574
so about 65.184 ms was the P99 latency: column 3 of the log is the per-transaction latency in microseconds, and line 366 of the descending sort marks the top 1% of the 36635 logged transactions. I would question whether that number actually means anything, though. After all, the very last transaction had to wait nearly 30 seconds before it even got its turn on the server, so why wasn't its 'latency' 30 seconds? What about the transactions that never got their turn at all?
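If you would rather have Postgres compute it exactly, you can load the log back into a table and use percentile_cont(). A sketch, assuming PostgreSQL 9.4+ and the log file names from the commands above:

-- column layout matches the default per-transaction -l output shown above
CREATE TEMP TABLE tx_log (
    client_id  int,
    tx_no      bigint,
    latency_us bigint,
    script_no  int,
    epoch_s    bigint,
    epoch_us   bigint
);
\copy tx_log from program 'cat pgbench_log.107915*' with (format text, delimiter ' ')
SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_us) / 1000.0 AS p99_ms
FROM tx_log;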
When running a SELECT query at 300+ QPS, query latency climbs to around 20,000 ms with CPU usage at ~100%. I can't identify what the issue is here.
System details:
vCPUs: 16
RAM: 64 GiB
Create hypertable:
CREATE TABLE test_table (
    time TIMESTAMPTZ NOT NULL,
    val1 TEXT NOT NULL,
    val2 DOUBLE PRECISION NOT NULL,
    val3 DOUBLE PRECISION NOT NULL
);
SELECT create_hypertable('test_table', 'time');
SELECT add_retention_policy('test_table', INTERVAL '2 day');
Query:
SELECT * FROM test_table WHERE val1 = 'abc';
Timescale version:
2.5.1
Number of rows in test_table:
6.9M
Latencies (ms):
p50=8493.722203,
p75=8566.074792,
p95=21459.204448,
p98=21459.204448,
p99=21459.204448,
p999=21459.204448
CPU usage of postgres SELECT query as per the htop command.
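One observation on this setup: nothing above creates an index on val1, so every SELECT has to scan all 6.9M rows across the chunks, which is consistent with these latencies and the pegged CPU. A minimal sketch of the usual first step (the index name is made up; pairing the filter column with time DESC is the common hypertable pattern):

-- assumption: schema as defined in the question, index name invented
CREATE INDEX test_table_val1_time_idx ON test_table (val1, time DESC);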
Take the following example:
Operation Signal Value Balance
1 + 100.00 100.00
2 + 50.00 150.00
3 + 10.00 160.00
Now I have two concurrent transactions that each take the last operation, add the value, and then save (insert). Existing rows are never updated.
Transaction 1
Takes the last operation (3), locks it "for update", and inserts:
Operation Signal Value Balance
1 + 100.00 100.00
2 + 50.00 150.00
3 + 10.00 160.00
4 + 10.00 170.00
Transaction 2
Takes the last operation and tries to lock it "for update". Since the first transaction holds a "for update" lock, it waits to get the latest row. In this example, transaction 2 began before transaction 1 ended.
The actual row returned by PostgreSQL is #3 (after the first lock is released). So it ends up like this:
Operation Signal Value Balance
1 + 100.00 100.00
2 + 50.00 150.00
3 + 10.00 160.00
4 + 10.00 170.00
5 + 10.00 170.00
So the balance ends up as 170.00, while the desired result is 180.00.
This is at the READ COMMITTED isolation level.
In SERIALIZABLE mode, transaction 2 throws a concurrency error.
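In SQL, the pattern looks roughly like this (a sketch; the table name operations and the literal values are assumed from the example above):

BEGIN;
-- both sessions run this under READ COMMITTED
SELECT operation, balance
FROM operations
ORDER BY operation DESC
LIMIT 1
FOR UPDATE;   -- transaction 2 blocks here until transaction 1 commits
-- the application computes new balance = old balance + value, then:
INSERT INTO operations (operation, signal, value, balance)
VALUES (4, '+', 10.00, 170.00);
COMMIT;

When transaction 1 commits, transaction 2's SELECT still returns row #3, because its snapshot predates the newly inserted row #4; that is the lost update shown above.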
I have tried the following:
Selected the latest row with "for update";
Selected a row with this_is_the_last = true "for update", changed it to false, and then inserted a new one with this_is_the_last = true. PostgreSQL ends up returning a row whose this_is_the_last is now false (in transaction 2, after transaction 1 releases the "for update" lock).
Is there a way to take a row-level lock and make transaction 2 wait for transaction 1, such that transaction 2 will not select the same "latest" row as transaction 1?
This kind of “lost update” need never happen in PostgreSQL, even if you use READ COMMITTED isolation.
Define operation as the primary key of the table.
Each of your transactions will execute:
INSERT INTO transaction (operation, signal, value, balance)
SELECT operation + 1,
'+',
10,
balance + 10
FROM transaction
ORDER BY operation DESC
LIMIT 1;
Then if there is a concurrent transaction, one of the two will receive a primary key violation, since operation is unique. That transaction simply retries the operation until it succeeds.
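A minimal retry sketch in PL/pgSQL (the function name is made up; the fixed '+' and 10 values follow the example above):

CREATE OR REPLACE FUNCTION add_operation() RETURNS void AS $$
BEGIN
    LOOP
        BEGIN
            INSERT INTO transaction (operation, signal, value, balance)
            SELECT operation + 1, '+', 10, balance + 10
            FROM transaction
            ORDER BY operation DESC
            LIMIT 1;
            RETURN;   -- insert succeeded, we are done
        EXCEPTION WHEN unique_violation THEN
            NULL;     -- lost the race: loop and re-read the new last row
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

Under READ COMMITTED, each retried statement gets a fresh snapshot, so the re-read sees the row the competing transaction just committed.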
I am getting a constant TPS of 500 when using pgbench with different target TPS values as input. How can I increase the TPS in a Postgres BDR setup? Please suggest.
bdr_version used : 1.0.3-2017-11-21-15283ba
Sample output and pgbench command input:
Scenario 1:
shared_buffers = 1024 MB, throttled to 2000 TPS
Command used:
pgbench -h <hostname> -p 5432 -U postgres -d <dbname> -c 50 -j 50 -r -R 2000
Output:
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 50
duration: 2400 s
number of transactions actually processed: 1200964
latency average: 894071.370 ms
latency stddev: -nan ms
rate limit schedule lag: avg 893971.489 (max 1819008.190) ms
tps = 500.382936 (including connections establishing)
tps = 500.385107 (excluding connections establishing)
statement latencies in milliseconds:
0.109225 \set nbranches 1 * :scale
0.100045 \set ntellers 10 * :scale
0.085917 \set naccounts 100000 * :scale
0.064758 \setrandom aid 1 :naccounts
0.057230 \setrandom bid 1 :nbranches
0.052089 \setrandom tid 1 :ntellers
0.049827 \setrandom delta -5000 5000
0.471368 BEGIN;
0.678317 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.599331 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
80.328257 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
15.537372 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.698180 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.995450 END;
Thu Aug 16 09:52:30 UTC 2018
Scenario 2:
shared_buffers = 1024 MB, throttled to 6000 TPS
Command used:
pgbench -h <hostname> -p 5432 -U postgres -d <dbname> -c 50 -j 50 -r -R 6000
Output:
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 50
duration: 2400 s
number of transactions actually processed: 1184746
latency average: 1096729.554 ms
latency stddev: -nan ms
rate limit schedule lag: avg 1096628.305 (max 2208483.098) ms
tps = 493.625936 (including connections establishing)
tps = 493.629106 (excluding connections establishing)
statement latencies in milliseconds:
0.108491 \set nbranches 1 * :scale
0.098740 \set ntellers 10 * :scale
0.084497 \set naccounts 100000 * :scale
0.064168 \setrandom aid 1 :naccounts
0.056658 \setrandom bid 1 :nbranches
0.051678 \setrandom tid 1 :ntellers
0.049427 \setrandom delta -5000 5000
0.480755 BEGIN;
0.696514 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.607114 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
81.404448 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
15.775295 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.708403 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.009272 END;
Thu Aug 16 11:33:17 UTC 2018
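For what the output is worth: 1200964 transactions / 2400 s ≈ 500.4 tps and 1184746 / 2400 s ≈ 493.6 tps, which matches the reported rates, and the average rate limit schedule lag (~894 s and ~1097 s) shows transactions starting ever further behind the -R schedule. Both runs therefore look like a server that saturates near 500 TPS regardless of the requested rate, with the ~80 ms pgbench_tellers UPDATE as the dominant per-statement cost.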
I am using a Postgres 8.1 database, and I want to write a query that selects data in 4-hour intervals.
As the image shows, the data is currently stored as a subscriber_id with a date/time.
I want data like this:
No. of Subscriber | Interval
0                 | 0-4
0                 | 4-8
7                 | 8-12
1                 | 12-16
0                 | 16-20
0                 | 20-24
Basically, each day has 24 hours; if I divide 24/4 = 6, I have a total of 6 intervals per day:
0-4
4-8
8-12
12-16
16-20
20-24
So I need the count of subscribers within these intervals. Is there a date function in Postgres that solves my problem, or how can I write a query for this?
NOTE: Please write your solution for Postgres version 8.1.
Use generate_series() to generate the periods and left-join date_time to the appropriate periods, e.g.:
with my_table(date_time) as (
values
('2016-10-24 11:10:00'::timestamp),
('2016-10-24 11:20:00'),
('2016-10-24 15:10:00'),
('2016-10-24 21:10:00')
)
select
format('%s-%s', p, p+4) as "interval",
sum((date_time notnull)::int) as "no of subscriber"
from generate_series(0, 20, 4) p
left join my_table
on extract(hour from date_time) >= p and extract(hour from date_time) < p + 4 -- half-open, so boundary hours are not counted twice
group by p
order by p;
interval | no of subscriber
----------+------------------
0-4 | 0
4-8 | 0
8-12 | 2
12-16 | 1
16-20 | 0
20-24 | 1
(6 rows)
I doubt that there is anyone alive who still remembers version 8.1. You can try:
create table periods(p integer);
insert into periods values (0),(4),(8),(12),(16),(20);
select
p as "from", p+4 as "to",
sum((date_time notnull)::int) as "no of subscriber"
from periods
left join my_table
on extract(hour from date_time) >= p and extract(hour from date_time) < p + 4
group by p
order by p;
from | to | no of subscriber
------+----+------------------
0 | 4 | 0
4 | 8 | 0
8 | 12 | 2
12 | 16 | 1
16 | 20 | 0
20 | 24 | 1
(6 rows)
In Postgres, you can do this by generating all the intervals for your time periods. This is a little tricky, because you have to pick out the dates in your data. However, generate_series() is really helpful.
The rest is just a left join and aggregation:
select dt.dt, count(t.t)
from (select generate_series(min(d.dte), max(d.dte) + interval '23 hour', interval '4 hour') as dt
from (select distinct date_trunc('day', t.t)::date as dte from t) d
) dt left join
t
on t.t >= dt.dt and t.t < dt.dt + interval '4 hour'
group by dt.dt
order by dt.dt;
Note that this keeps the period as the date/time of the beginning of the period. You can readily convert this to a date and an interval number, if that is more helpful.
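For example, a sketch of that conversion (the timestamp literal is made up; assumes 4-hour slots starting at midnight):

-- 15:10 falls in hours 12-16, i.e. interval number 4 of the day
SELECT ts::date AS day,
       extract(hour FROM ts)::int / 4 + 1 AS interval_no
FROM (VALUES ('2016-10-24 15:10:00'::timestamp)) v(ts);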
The previous solution also works. Adding one more option: instead of creating a table for the periods, we can use an array and the unnest() function in this query.
My code is:
select
p as "from", p+4 as "to",
sum((date_time is not null)::int) as "no of subscriber"
from unnest(ARRAY[0,4,8,12,16,20]) as p
left join my_table
on extract(hour from date_time) >= p and extract(hour from date_time) < p + 4
group by p
order by p;
I think it would be better to run six different queries, since you know the time intervals (lower and upper limits).
I am trying to write a simple query to calculate the percentage of occurrence of distinct instances in a table.
Can I do this in one go?
Below is my code, which is giving me an error:
select 100 * total_sum/sum(total_sum) from jav_test;
In the past, when I have had to do similar things, this is the approach I've taken:
SELECT
jav_test.total_sum AS total_sum,
withsum.total_sum AS sum_of_all_total_sum,
100 * (jav_test.total_sum / withsum.total_sum) AS percentage
FROM
jav_test,
(SELECT sum(total_sum) AS total_sum FROM jav_test) withsum -- This computes sum(total_sum) here as a single-row single-column table aliased as "withsum"
;
The presence of the total_sum and sum_of_all_total_sum columns in the output is just to convince myself that the correct math took place; the one you are interested in is percentage, based on the query you posted in the question.
After populating a small dummy table, this was the result:
hive> describe jav_test;
OK
total_sum int
Time taken: 1.777 seconds, Fetched: 1 row(s)
hive> select * from jav_test;
OK
28
28
90113
90113
323694
323694
Time taken: 0.797 seconds, Fetched: 6 row(s)
hive> SELECT
> jav_test.total_sum AS total_sum,
> withsum.total_sum AS sum_of_all_total_sum,
> 100 * (jav_test.total_sum / withsum.total_sum) AS percentage
> FROM jav_test, (SELECT sum(total_sum) AS total_sum FROM jav_test) withsum;
...
... lots of mapreduce-related spam here
...
Total MapReduce CPU Time Spent: 3 seconds 370 msec
OK
28 827670 0.003382990805514275
28 827670 0.003382990805514275
90113 827670 10.887551802046708
90113 827670 10.887551802046708
323694 827670 39.10906520714777
323694 827670 39.10906520714777
Time taken: 41.257 seconds, Fetched: 6 row(s)
hive>
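For what it's worth, on engines with window functions (PostgreSQL 8.4+, Hive 0.11+) the same percentage can be computed in one pass, without the self-join. A sketch against the same table:

-- sum(total_sum) OVER () is the grand total; the 100.0 literal forces
-- non-integer division in PostgreSQL (Hive's / already returns a double)
SELECT total_sum,
       100.0 * total_sum / sum(total_sum) OVER () AS percentage
FROM jav_test;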