Write speed on Cloud Firestore - google-cloud-firestore

In the beta documentation, it stated that the maximum writes per second per database (at beta) is 10k. However, I am not clear on the context.
Does that mean that if, for example, I have 1 million records in BigQuery and I want to write each row into its own document, Firestore can write 10k documents per second?
Or, assume we have 100k concurrent users connected to Firestore and every one of them performs 1 update; after 1 second there will be 10k successful writes in Firestore, so it will take 10 seconds for Firestore to complete all the writes from the 100k concurrent users?
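Taking the documented 10k/sec ceiling at face value, the arithmetic behind both scenarios in the question can be sketched as follows (a back-of-envelope lower bound, not an observed throughput; it ignores client fan-out, network latency, and other limits):

```python
MAX_WRITES_PER_SEC = 10_000  # documented per-database ceiling

def min_seconds_to_write(num_docs, rate=MAX_WRITES_PER_SEC):
    """Lower bound on wall-clock time, assuming the per-database write
    ceiling is the only bottleneck."""
    return num_docs / rate

print(min_seconds_to_write(1_000_000))  # 1M BigQuery rows -> at least 100 s
print(min_seconds_to_write(100_000))    # 100k concurrent updates -> at least 10 s
```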

Related

Process last n records in pyspark kafka streaming

I need to do some operations on the last 100,000 records (customer bills) of each of several different stores, every five minutes. What is the best approach or steps I need to follow in PySpark Structured Streaming? The input source is Kafka.
I also have to gradually delete old records beyond the 100k per store, because I only need the most recent 100k records per store at any time.
For example, I need to find out the details of the product 'p1' from last 100k records of store 'S1', and last 100k records from store 'S2' and so on.
Structured Streaming doesn't have such an option. One workaround is to estimate the messages per second, then calculate how long it takes to accumulate 100K records. Finally, you can use startingTimestamp. Please refer to https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-streaming-queries
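The estimation step of that workaround can be sketched in plain Python (the ingest rate is an assumption you must measure yourself; if the real rate fluctuates, the window will contain more or fewer than 100K records):

```python
from datetime import datetime, timedelta

def starting_timestamp_ms(records_needed, est_records_per_sec, now=None):
    """Estimate the Kafka startingTimestamp (epoch milliseconds) that
    should cover roughly the last `records_needed` messages, given an
    estimated ingest rate. Only an approximation."""
    now = now or datetime.utcnow()
    window = timedelta(seconds=records_needed / est_records_per_sec)
    return int((now - window).timestamp() * 1000)

# Example: ~100k records at an estimated ~500 msgs/sec -> look back 200 s.
ts = starting_timestamp_ms(100_000, 500)
```

The resulting value would then be passed as the `startingTimestamp` option on the Kafka source, per the Spark docs linked above.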

How do calculate IO operations in Cloudant account for Bluemix pricing? (throughput - lookups, reads, writes, queries)

The Cloudant Standard plan is described as "100 reads / sec, 50 writes / sec, 5 global queries / sec".
Is this IO/s calculated per end-to-end request? Or is it based on the query execution plan?
Let's give some examples
Q1. Let's say I use a bulk operation to create 3 new documents in Cloudant (Bluemix Standard plan).
Is that 1 write operation or 3 write operations?
Q2. A query that aggregates (joins) 1000 indexed docs on "name, age range, join time" and returns them as one result.
Is that 1 read, or 1000 + 1 reads?
Q3. While on the Standard plan (limit 100 reads / sec), assume 100 users execute the query from Q2 at the same time.
How is IO calculated? 1 * 100 reads? (1000 + 1) * 100 reads?
Do some users' queries fail because of the IO limit?
The Cloudant pricing method is not clearly documented anywhere I can find.
Can anyone point me in the right direction?
I want to know exactly how Standard plan usage is measured.
It would be even better if you could include a worked calculation in your answer!
Also answered here, https://developer.ibm.com/answers/questions/525190/how-do-calculate-io-operations-in-cloudant-account/
Bulk operations currently count as 1 W, regardless of the number of docs they contain.
A query is a request to a URL containing one of _design, _find or _search, again unrelated to the number of documents actually involved. Note that some of these API endpoints (search) are paged, so it would be 1 query per requested page of results.
I assume that by "100 users" you mean 100 concurrent connections using the same credentials, as Cloudant's rate limiting is applied per account. If so, the sum total of requests counts towards the limit. When that bucket is full, any further requests are cut off and fail with a 429: Too Many Requests HTTP status code.
As an example, let's say you have a Standard account where you've set the rate limit to allow 100 queries per second. You have 100 concurrent connections hitting _find repeatedly, each query returning 1000 documents. Cloudant will allow 100 queries per second, so on average each of your connections will get 1 query per second fulfilled, and any attempts to push harder than that will result in 429 HTTP errors. With 10 concurrent connections, on average each will get 10 qps, etc.
Cloudant rate limits at the HTTP level. No splitting of bulk ops into their constituent parts takes place, at least not yet.
Documentation for how this all hangs together can be found here: https://cloud.ibm.com/docs/services/Cloudant?topic=cloudant-pricing#pricing
The Cloudant offering in the IBM Cloud Catalog has a link to the documentation. In the docs is a description of the plans with additional examples. The docs also have sections that explain how read and write operations are calculated.
The HTTP code 429 is returned by Cloudant to indicate too many requests. The documentation discusses this and includes code samples on how to handle it.
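As a minimal sketch of such handling (the `do_request` callable and its return shape are placeholders, not Cloudant's client API): retry with exponential backoff whenever a 429 comes back, and give up after a bounded number of attempts.

```python
import time

def with_backoff(do_request, max_retries=5, base_delay=0.5):
    """Call do_request() (expected to return a (status_code, body) tuple)
    and retry on HTTP 429 with exponential backoff: base_delay, then 2x,
    4x, ... Raises RuntimeError if still rate-limited after max_retries."""
    for attempt in range(max_retries):
        status, body = do_request()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```

Official Cloudant client libraries ship their own retry plumbing; this only illustrates the shape of the logic.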

Google Firestore Scalability Limit

As per the Google Firebase documentation, Firestore supports up to 1,000,000 simultaneous connections and 10,000 writes per second per database. We ran a scalability test to check whether Firestore is viable for our solution, and we are encountering errors like "Connect to firebase.googleapis.com:443: Connection timed out" during write operations for 40K samples, and "firestore.googleapis.com:443: failed to respond" during GET operations while the 40K samples are being written. We would like to understand the Firestore limits and its scalability.
We are running a JMeter script to write data to Firebase (Firestore Blaze plan purchased for testing) on multiple VMs and PCs connected to a wired network to check scalability. The JMeter script writes data to Firebase using REST API PATCH, and each script on a PC/VM writes 5K records over a period of 5 minutes. There are a total of 8 PCs/VMs, which together write 40K records to Firebase. During this we also GET 500 records in 5 minutes, 2 times a day. During this test we are hitting the Firebase failures.
The errors are "Connect to firebase.googleapis.com:443: Connection timed out" during write operations for 40K samples, "firestore.googleapis.com:443: failed to respond", and connection resets.
Are you considering the other limits for your specific operations?
According to https://firebase.google.com/docs/firestore/quotas
For example, for the write operation:
Maximum writes per second per database = 10,000 (up to 10 MiB per second)
Maximum write rate to a document = 1 per second
Maximum write rate to a collection in which documents contain sequential values in an indexed field = 500 per second
Maximum number of writes that can be passed to a Commit operation or performed in a transaction = 500
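To stay under the 500-writes-per-commit limit, client code typically has to chunk its pending writes before committing; a minimal, library-agnostic sketch of just the chunking step (the 500 constant is the documented limit, everything else is illustrative):

```python
MAX_WRITES_PER_BATCH = 500  # documented Commit/transaction limit

def chunk_writes(writes, batch_size=MAX_WRITES_PER_BATCH):
    """Split a list of pending writes into batches that each respect the
    per-commit limit. Each batch would then be sent as one Commit call."""
    return [writes[i:i + batch_size] for i in range(0, len(writes), batch_size)]

batches = chunk_writes(list(range(1200)))
# 1200 writes -> batches of 500, 500, and 200
```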
For read operations you also have limits on security-rules document access calls: maximum number of exists(), get(), and getAfter() calls per request:
10 for single-document requests and query requests.
20 for multi-document reads, transactions, and batched writes. The previous limit of 10 also applies to each operation.
For example, imagine you create a batched write request with 3 write operations and that your security rules use 2 document access calls to validate each write. In this case, each write uses 2 of its 10 access calls and the batched write request uses 6 of its 20 access calls.
Exceeding either limit results in a permission denied error.
Some document access calls may be cached, and cached calls do not count towards the limits.
I think that exceeding one of these limits could be causing those connections to be aborted.

How many (maximum) DB2 multi-row fetch cursors can be maintained in a PL/I or COBOL program?

How many DB2 multi-row fetch cursors can be maintained in a PL/I or COBOL program while keeping good performance?
I have a requirement to maintain 4 cursors in a PL/I program, but I am concerned about the number of multi-row fetch cursors in a single program.
Is there any way to check whether a multi-row fetch is more effective than a normal cursor? I tried with 1000 records but I couldn't see a difference in running time.
IBM published some information (PDF) about multi-row fetch performance when this feature first became available in DB2 8 in 2005. Their data mentions nothing about the number of cursors in a given program, only the number of rows fetched.
From this I infer the number of multi-row fetch cursors itself is not of concern from a performance standpoint -- within reason. If someone pushes the limits of reason with 10,000 such cursors I will not be responsible for their anguish.
The IBM Redbook linked to earlier indicates there is a 40% CPU time improvement retrieving 10 rows per fetch, and a 50% CPU time improvement retrieving 100+ rows per fetch. Caveats:
The performance improvement using multi-row fetch in general depends on:
Number of rows fetched in one fetch
Number of columns fetched (more improvement with fewer columns), and the data type and size of the columns
Complexity of the fetch. The fixed overhead saved for not having to go between the database engine and the application program has a lower percentage impact for complex SQL that has longer path lengths.
If the multi-row fetch reads more rows per statement, it results in CPU time improvement, but after 10 to 100 rows per multi-row fetch, the benefit is decreased. The benefit decreases because, if the cost of one API overhead per row is 100% in a single-row statement, it gets divided by the number of rows processed in one SQL statement. So it becomes 10% with 10 rows, 1% with 100 rows, 0.1% for 1000 rows, and then the benefit becomes negligible.
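The diminishing-returns arithmetic in that quote can be sketched directly (a toy model of the amortization, not a DB2 measurement):

```python
def relative_api_overhead(rows_per_fetch):
    """Per-row API overhead relative to a single-row fetch: the fixed
    cost of one call between the application and the database engine is
    amortized over the rows returned by that call."""
    return 1.0 / rows_per_fetch

for n in (1, 10, 100, 1000):
    print(n, f"{relative_api_overhead(n):.1%}")
# 1 -> 100.0%, 10 -> 10.0%, 100 -> 1.0%, 1000 -> 0.1%
```

This matches the quote's point: past 100 or so rows per fetch, the remaining overhead is already tiny, so fetching larger arrays buys little.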
The Redbook also has some discussion of how they did their testing to arrive at their performance figures. In short, they varied the number of rows retrieved and reran their program several times, pretty much what you'd expect.

How can I speed up MongoDB?

I have a crawler which consists of 6+ processes. Half of the processes are masters which crawl the web, and when they find jobs they put them inside a jobs collection. In most cases the masters save 100 jobs at once (that is, they get 100 jobs and save each of them separately, as fast as possible).
The second half of the processes are slaves which constantly check whether new jobs of some type are available for them; if yes, a slave marks a job in_process (this is done using findOneAndUpdate), then processes the job and saves the result in another collection.
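The claim step described above relies on findOneAndUpdate being atomic: finding an unclaimed job and marking it in_process happen as one operation, so two slaves can never claim the same job. An in-memory analogue of that check-and-set (pure Python, only to illustrate the semantics; the real call goes through the MongoDB driver and the server provides the atomicity):

```python
import threading

_lock = threading.Lock()  # stands in for the server-side atomicity

def claim_job(jobs):
    """Atomically find the first job with status 'new', flip it to
    'in_process', and return it -- the single-operation behavior that
    findOneAndUpdate gives you on the server side. Returns None when
    no unclaimed job exists."""
    with _lock:
        for job in jobs:
            if job["status"] == "new":
                job["status"] = "in_process"
                return job
    return None

jobs = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
first = claim_job(jobs)   # claims job 1
second = claim_job(jobs)  # claims job 2, never job 1 again
```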
Moreover, from time to time the master processes have to read a lot of data from the jobs collection to synchronize.
So to sum up, there are a lot of read and write operations on the db. When the db was small it was working fine, but now that I have ~700k job records (the job document is small, it has 8 fields and proper indexes / compound indexes), my db slows down. I can observe this when displaying a "stats" page which basically executes ~16 count operations with some conditions (on indexed fields).
When the master/slave processes are not running, the stats page displays after 2 seconds. When they are running, the same page takes around 1 minute to display, and sometimes it does not display at all (timeout).
So how can I make my db handle more requests per second? Do I have to replicate it?