Sphinx RT-index updating in parallel - sphinx

Since I can't find it anywhere: is it possible to update a Sphinx RT index in parallel?
For instance, I have noticed a reduction in processing speed when documents have more than 1,000,000 words. I would therefore like to split the work so that documents with over 1,000,000 words are processed in a separate thread, without holding back the smaller documents.
However, I haven't been able to find any benchmarks of updating the RT index in parallel, nor any documentation on it.
Are there others who use this approach, or is it considered bad practice?

First of all, let me remind you that when you update something in a Sphinx real-time index (the same applies to Manticore Search, Lucene, Solr and Elasticsearch), you don't actually update anything in place: you just add the change to a new segment (a RAM chunk in the case of Sphinx), which will eventually (usually much later) be merged with other segments, and only then is the change really applied. Therefore the question is how fast you can populate the RT RAM chunk with new records and how concurrency changes the throughput. I've made a test based on https://github.com/Ivinco/stress-tester and here's what I got:
snikolaev@dev:~/stress_tester_github$ for conc in 1 2 5 8 11; do ./test.php --plugin=rt_insert.php -b=100 --data=/home/snikolaev/hacker_news_comments.smaller.csv -c=$conc --limit=100000 --csv; done;
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
1;100;28.258;3537;100000;99957;0.275;0.202;0.519;1.221
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
2;100;18.811;5313;100000;99957;0.34;0.227;0.673;2.038
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
5;100;16.751;5967;100000;99957;0.538;0.326;1.163;3.797
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
8;100;20.576;4857;100000;99957;0.739;0.483;1.679;5.527
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
11;100;23.55;4244;100000;99957;0.862;0.54;2.102;5.849
I.e. increasing concurrency from 1 to 11 (in my case on an 8-core server) lets you raise the throughput from roughly 3500 to 4200 documents per second: about 20%, which is not bad, but not a great performance boost either.
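If you want to try a similar concurrent-insert test yourself without the stress-tester, here's a minimal sketch of the idea. The details are my assumptions, not from the original post: a SphinxQL listener on 127.0.0.1:9306, an RT index named rt with a single text field content, and the pymysql package installed.

# Minimal sketch: several writer threads insert batches into one RT index over SphinxQL.
# Hypothetical setup: SphinxQL on 127.0.0.1:9306, RT index "rt" with a text field "content".
import threading
import pymysql

DOCS = [(i, "document number %d" % i) for i in range(1, 100001)]   # toy data
BATCH_SIZE = 100
CONCURRENCY = 5

def writer(docs):
    conn = pymysql.connect(host="127.0.0.1", port=9306, user="")
    cur = conn.cursor()
    for start in range(0, len(docs), BATCH_SIZE):
        batch = docs[start:start + BATCH_SIZE]
        placeholders = ", ".join(["(%s, %s)"] * len(batch))
        args = [value for row in batch for value in row]
        cur.execute("INSERT INTO rt (id, content) VALUES " + placeholders, args)
    conn.close()

# split the documents between CONCURRENCY writer threads
chunks = [DOCS[i::CONCURRENCY] for i in range(CONCURRENCY)]
threads = [threading.Thread(target=writer, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

How much this helps depends on how many cores searchd can actually use for the insert threads, which is what the numbers above illustrate.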
In your case perhaps another approach will work better: instead of updating one index, you can update multiple indexes and then use a distributed index to combine them all for searching; in other words, so-called sharding. For example, if you write to two RT indexes instead of one, you can get this:
snikolaev@dev:~/stress_tester_github$ for conc in 1 2 5 8 11; do ./test.php --plugin=rt_insert.php -b=100 --data=/home/snikolaev/hacker_news_comments.smaller.csv -c=$conc --limit=100000 --csv; done;
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
1;100;28.083;3559;100000;99957;0.274;0.206;0.514;1.223
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
2;100;18.03;5543;100000;99957;0.328;0.225;0.653;1.919
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
5;100;15.07;6633;100000;99957;0.475;0.264;1.066;3.821
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
8;100;18.608;5371;100000;99957;0.613;0.328;1.479;4.897
concurrency;batch size;total time;throughput;elements count;latencies count;avg latency, ms;median latency, ms;95p latency, ms;99p latency, ms
11;100;26.071;3833;100000;99957;0.632;0.294;1.652;4.729
I.e. about 6600 docs per second at concurrency 5. That's almost 90% better than the initial throughput, which seems like a good result. By playing with the number of indexes and the concurrency level you can find the optimal settings for your case.
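Here's a minimal sketch of what that client-side sharding could look like. The index names rt_shard_0 / rt_shard_1 and the modulo routing are my assumptions; on the Sphinx side a distributed index would list both shards so that searches still hit a single index.

# Minimal sketch: route each document to one of two RT indexes by id (hypothetical names).
import pymysql

SHARDS = ["rt_shard_0", "rt_shard_1"]

def insert_doc(cursor, doc_id, text):
    shard = SHARDS[doc_id % len(SHARDS)]          # simple modulo routing
    cursor.execute("INSERT INTO " + shard + " (id, content) VALUES (%s, %s)", (doc_id, text))

conn = pymysql.connect(host="127.0.0.1", port=9306, user="")
cur = conn.cursor()
for doc_id, text in enumerate(["first document", "second document", "third document"], start=1):
    insert_doc(cur, doc_id, text)
conn.close()

Each writer (or writer thread) can then stick to its own shard, which is what produced the second set of numbers above.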

Related

Troubleshooting Latency Increase for Lambda to EFS Reads

The Gist
We've got a Lambda job running that reads data from EFS (elastic throughput) at up to 200 TPS of read requests.
PercentIOLimit is well below 20%.
Latency goes from about 20 ms to about 400 ms during traffic spikes.
Are there any steps I can take to get more granularity into where the latency for the reads is coming from?
Additional Info:
At low TPS (~5), reads take about 10-20 ms.
At higher TPS (~50), p90 can take 300-400 ms.
I'd really like to narrow down what limit is causing these latency spikes, especially when the IOPercent usage is around 60%.

Downlink and uplink throughput calculation for LTE bandwidth

Three LTE small cells are located in a room, roughly the same distance from each other, and there is some interference between them. All the cells have the same bandwidth (15 MHz) and are configured on the same frequency.
Each cell is connected to a separate UE (phone), and I measured the downlink and uplink throughput:
Each phone receives roughly 25 to 38 Mbps.
Later I changed the bandwidth of one cell to 10 MHz and the other two cells to 5 MHz, with appropriate frequencies, and measured the downlink and uplink throughput again:
Each phone now gets higher throughput than in the previous test (~29 to 46 Mbps).
My question is: since the bandwidth was reduced significantly, I would assume the throughput should drop as well, but it is higher than before. Why?
All the cells have the same bandwidth (15 MHz) and are configured on the same frequency.
That's the key point, because they do interfere with each other. What is the reason for using the same frequency band for all three cells?
Each phone now gets higher throughput than in the previous test (~29 to 46 Mbps). My question is: since the bandwidth was reduced significantly, I would assume the throughput should drop as well, but it is higher than before.
Most likely, reducing the bandwidth reduced the level of total interference.
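As a rough, made-up illustration of why that can happen (the numbers are hypothetical, not from your test): the achievable rate scales roughly as C = B * log2(1 + SINR). With 15 MHz but heavy co-channel interference pushing the SINR down to about 3 (~5 dB), C is roughly 15 * log2(4) = 30 Mbps. With 10 MHz and less interference raising the SINR to about 15 (~12 dB), C is roughly 10 * log2(16) = 40 Mbps. A narrower but cleaner channel can therefore deliver more throughput, which matches what you observed.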

On AWS RDS Postgres, what could cause disk latency to go up while iOPs / throughput go down?

I'm investigating an approximately 3 hour period of increased query latency on a production Postgres RDS instance (m4.xlarge, 400 GiB of gp2 storage).
The driver seems to be a spike in both read and write disk latencies: I see them going from a baseline of ~0.0005 up to a peak of 0.0136 for write latency and 0.0081 for read latency.
I also see an increase in disk queue depth from a baseline of around 2, to a peak of 14.
When there's a spike in disk latencies, I generally expect to see an increase in the amount of data being written to disk. But read IOPS, write IOPS, read throughput, and write throughput all went down (by approximately 50%) during the period when latency was elevated.
I also have server-side metrics on the total query volume I'm sending (measured in both queries per second and amount of data written: this is a write-heavy workload), and those metrics were flat during this time period.
I'm at a loss for what to investigate next. What are possible reasons that disk latency could increase while IOPS go down?

wrk --latency: the meaning of the latency distribution

I use wrk to test my service:
wrk -t2 -c10 -d20s --latency http://192.168.0.105:8102/get
Output:
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 525.29ms 210.25ms 1.73s 82.12%
Req/Sec 11.21 7.05 40.00 65.48%
Latency Distribution
50% 489.47ms
75% 570.62ms
90% 710.66ms
99% 1.56s
377 requests in 20.08s, 4.54MB read
Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec: 18.77
Transfer/sec: 231.74KB
But I do not understand the meaning of the Latency Distribution section:
Latency Distribution
50% 489.47ms
75% 570.62ms
90% 710.66ms
99% 1.56s
I got it: it's the standard percentile calculation. Each line means that that fraction of requests completed within the given latency, e.g. 90% of all requests completed within 710.66 ms and 99% within 1.56 s.
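For reference, here's a minimal sketch of how such percentile lines can be computed from raw latency samples using the simple nearest-rank definition (the sample numbers are made up; wrk itself uses an internal histogram, so its exact method may differ):

# Nearest-rank percentile over a list of latency samples (milliseconds).
import math

def percentile(samples, p):
    ordered = sorted(samples)
    rank = int(math.ceil(p / 100.0 * len(ordered)))   # nearest-rank definition
    return ordered[rank - 1]

latencies_ms = [480, 490, 505, 520, 560, 575, 700, 715, 900, 1560]   # made-up samples
for p in (50, 75, 90, 99):
    print("%d%%  %d ms" % (p, percentile(latencies_ms, p)))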

Should I increase the thread pool size?

I am seeing almost 99% of threads in a parked/waiting state and only 1% running, out of a thread pool of size 100. CPU usage is 25% and memory is just 2 GB out of the 6 GB allocated. Should I increase the thread pool size? I think so, because the system has enough resources and more threads would utilize them better and increase throughput.