Firestore ignores limit (on Flutter) - flutter

I have a simple collection, and to test it I created 10k documents in it.
After that, when I run a simple query with limit(5):
Firestore.instance.collection(myCollection).orderBy(myOrderBy).limit(5).getDocuments();
And I see this in my console:
W/CursorWindow(21291): Window is full: requested allocation 253420 bytes, free space 68329 bytes, window size 2097152 bytes
I/zygote64(21291): Background concurrent copying GC freed 535155(13MB) AllocSpace objects, 5(1240KB) LOS objects, 50% free, 17MB/35MB, paused 60us total 102.836ms
When I go to my Firebase dashboard, I see 10k reads.
So I conclude that my query returns its 5 results but reads the entire collection, which can quickly degrade performance and increase the cost.
I looked for a solution and found this:
Firestore.instance.settings(persistenceEnabled: false);
It seems to work, but I have trouble understanding it.
Does Firestore load the entire collection by default so it can serve queries offline?
Would changing the Firestore settings when launching the application be enough, or are there surprises I should expect?
And if I disable persistence, I assume that a write the user makes while offline will no longer be persisted and synced once they are online again. Is there a compromise?
Thanks,

Firestore's offline storage behaves as a cache, persisting any documents it has recently seen. It does not preload documents you haven't told it to load with a query/read operation, so for the query you show that would be at most 5 documents each time you execute it.
Did you by any chance add the 10K documents from the same client where you are running the query? If so, the local cache of that client would contain all of those documents, since that client added them. In that case you'll want to uninstall/reinstall the app to wipe the cache and get a more realistic picture of what your users would see.
The fact that you see 10K reads in your usage tab is a separate issue, not explained by the code you shared. One thing to keep in mind is that documents loaded in the console are also charged as reads.

Related

Bulk update and insert using Meteor method calls in a loop causing high CPU usage

My application is on Meteor 1.6.0.1 and I am using reywood:publish-composite and matb33:collection-hooks for DB relations.
I need to insert a list of 400 people from an Excel file into a collection. Currently I insert them from the client using a Meteor method inside a loop, but when I watch Galaxy during this, CPU usage is very high, 70-80% and sometimes 100%.
Once all the data is inserted, I need to send a mail and update each record, so I send the mail and update using Meteor method calls one by one, which again pushes the CPU to 70-80%.
How can I do the above tasks in a correct and efficient way? Please help.
Thanks.
I suspect that you are not using oplog tailing and that you are inserting while some other part of your app has subscriptions to publications open. Without oplog tailing, Meteor polls the collections and generates lots of slow queries on each document insert.
You can enable it by passing a URL to Meteor at startup. See https://docs.meteor.com/environment-variables.html#MONGO-OPLOG-URL for more info.
Oplog tailing eases the strain on the server and should reduce the high CPU usage to a manageable level.
If you are still having issues, you may have to set up some tracing, e.g. Monti APM: https://docs.montiapm.com/introduction
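Independently of oplog tailing, the 400 inserts are usually much cheaper as one server-side method call that does a single bulk insert, rather than 400 client-side method calls in a loop. A minimal sketch in TypeScript, assuming a hypothetical People collection and illustrative field names (not taken from the question's code):

// Hypothetical server-side Meteor method: one call inserts all rows at once.
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

export const People = new Mongo.Collection('people');

interface PersonRow {
  name: string;
  email: string;
}

Meteor.methods({
  async 'people.bulkInsert'(rows: PersonRow[]) {
    // Use the underlying Node MongoDB driver for a single bulk insert.
    // Note: rawCollection() bypasses matb33:collection-hooks callbacks.
    const result = await People.rawCollection().insertMany(
      rows.map((r) => ({ ...r, createdAt: new Date() }))
    );

    // Do the follow-up work (mail + status updates) outside the method's
    // latency path so the client call returns quickly.
    Meteor.defer(() => {
      // send the notification mail and mark the rows as processed here
    });

    return result.insertedCount;
  },
});

The client then calls Meteor.call('people.bulkInsert', rows) once with the parsed Excel rows instead of looping.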

What is the optimal way to do server-side paging in Express.js with Mongoose

I'm currently doing a project on my own MEAN stack.
In a new project I'm creating, I have a collection that I'm paging with Express on the server side, returning a page of results each time (e.g. 10 results out of the 2000 total) plus the total number of rows found for the query the user performed (e.g. 193 for UserID 3).
Although this works fine, I'm afraid it will create an enormous load on the server, since a user can easily pull 50-60 pages per session with 10, 20, 50 or even 100 results each.
My question to you guys is: if I have, say, 1000 concurrent users paging every few seconds like this, will MongoDB be able to cope? If not, what might my alternatives be?
Also, is there any way I can simulate such concurrent read tests against my app/MongoDB?
Please take into account that I must do server-side paging because the app will be quite dynamic and the information can change very often.
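For concreteness, the setup described above looks roughly like this (Express + Mongoose; the Order model, field names and route are illustrative, not taken from the question):

// Hypothetical Express route that pages a Mongoose collection with skip/limit
// and also returns the total number of matching rows.
import express from 'express';
import mongoose from 'mongoose';

const Order = mongoose.model(
  'Order',
  new mongoose.Schema({ userId: Number, createdAt: Date })
);

const app = express();

app.get('/orders', async (req, res) => {
  const userId = Number(req.query.userId);
  const limit = Math.min(Number(req.query.limit) || 10, 100); // cap the page size
  const offset = Number(req.query.offset) || 0;
  const filter = { userId };

  // Run the page query and the total count in parallel.
  const [items, total] = await Promise.all([
    Order.find(filter).sort({ createdAt: -1 }).skip(offset).limit(limit).lean(),
    Order.countDocuments(filter),
  ]);

  res.json({ items, total, offset, limit });
});

app.listen(3000);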
If you're planning on only using a single webserver, you could cache the result set belonging to a certain page in memory. If you're planning on using multiple webservers, caching in-memory would lead to different result sets across servers, so in that case I'd recommend storing your cache either in MongoDB or in Redis.
A certain result set would be stored under a certain key in your cache. Your key would probably be composed of something like entityName + filterOptions + offset + resultsLimit. So for example you're loading movies with title=titanic, skipping the first 100, so offset=100 and loading only 50 per page so limit=50, which would all be concatenated into a single key.
When a request comes in, you would first try to load the result set from the cache. If the result set is inside the cache, you'll return that to the client. If it's not in the cache, you'd query the database for the latest result set, put that in the cache and return it to the client.
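A minimal sketch of that cache-aside flow, assuming the node-redis client and a hypothetical loadPage() callback that runs the actual Mongoose query (the key layout and the 60-second TTL are illustrative choices):

// Hypothetical cache-aside wrapper around a paged query, using Redis.
import { createClient } from 'redis';

const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();

type Page = { items: unknown[]; total: number };

async function getPageCached(
  entityName: string,
  filterOptions: Record<string, unknown>,
  offset: number,
  limit: number,
  loadPage: () => Promise<Page> // runs the database query on a cache miss
): Promise<Page> {
  // Compose the key from entity name + filter options + offset + limit.
  const key = `${entityName}:${JSON.stringify(filterOptions)}:${offset}:${limit}`;

  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached); // cache hit: skip the database entirely
  }

  const page = await loadPage(); // cache miss: query MongoDB
  await redis.set(key, JSON.stringify(page), { EX: 60 }); // expire after 60 seconds
  return page;
}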
Whether or not you could pull it off with 1000 concurrent users depends a lot on your hardware, the data you are loading, how you're loading it and the efficiency of your implementation. There's one way to find out, and that's testing.
Of course, by using the asynchronous capabilities of Node.js you can achieve the best scalability, so every call that can be executed asynchronously, such as database calls, should definitely be executed asynchronously.
You could load test your application for free from your local computer using Apache JMeter, or have it tested using, for example, Azure.

Lucene searches are slow via AzureDirectory

I'm having trouble understanding the complexities of Lucene. Any help would be appreciated.
We're using a Windows Azure blob to store our Lucene index, with Lucene.Net and AzureDirectory. A WorkerRole contains the only IndexWriter; it adds 20,000 or more records a day and changes a small number (fewer than 100) of the existing documents. A WebRole on a different box is set up to take two snapshots of the index (into another AzureDirectory), alternating between the two, and telling the WebService which directory to use as it becomes available.
The WebService has two IndexSearchers that alternate, reloading as the next snapshot is ready; one IndexSearcher is supposed to handle all client requests at a time (until the newer snapshot is ready). The IndexSearcher sometimes takes a long time (minutes) to instantiate, and other times it's very fast (a few seconds). Since the directory is already physically on disk (not using the blob at this stage), we expected instantiation to be fast, so this is one confusing point.
We're currently up around 8 million records. The Lucene search used to be very fast, but now it's very slow. To try to improve this, we've started running IndexWriter.Optimize on the index once a day after we back it up; some resources online indicate that Optimize is not required for frequently changing indexes, but others indicate that it is, so we're not sure.
The big problem is that whenever our web site has more traffic than a single user, we're getting timeouts on the Lucene search. We're trying to figure out if there's a bottleneck at the IndexSearcher object. It's supposed to be thread-safe, but it seems like something is blocking the requests so that only a single search is performed at a time. The box is an Azure VM, set to a Medium size so it has lots of resources available.
Thanks for whatever insight you can provide. Obviously, I can provide more detail if you have any further questions, but I think this is a good start.
I have much larger indexes and have not run into these issues (~100 million records).
Put the indexes in memory if you can (8 million records sounds like it should fit into memory, depending on the number of analyzed fields etc.). You can use RAMDirectory as the cache directory.
IndexSearcher is thread-safe and supposed to be re-used, but I am not sure if that is the reality. In Lucene 3.5 (Java version) they have a SearcherManager class that manages multiple threads for you.
http://java.dzone.com/news/lucenes-searchermanager
Also, a non-Lucene point: if you are on an extra-large+ VM, make sure you are taking advantage of all of the cores. Especially if you have a Web API/ASP.NET front-end for it, those calls should all be asynchronous.

Zero evictions memcached, but items still disappear

Items stored in Memcached seem to disappear without reason (TTL: 86400, but sometimes gone within 60s). However, there's enough free space, and stats report zero evictions.
The items that get lost seem to be the larger ones, and they seem to disappear after adding some other big items. Could it be that the slab for larger items is full and items are being evicted without being reported?
Memcached version 1.4.5.
Keys can get evicted before their expiration in memcached; this is a side effect of how memcached handles memory (see this answer for more details).
If the items you are storing are large enough that this is becoming a problem, memcached may be the wrong tool for the task you are trying to perform. You essentially have two practical options in this scenario:
break the data you're trying to cache into smaller chunks (see the sketch below for one way to do this), or
if this isn't feasible for any reason, use some sort of permanent storage, the nature of which will depend on the data you're trying to store (choices include Redis, MongoDB, a SQL database, the filesystem, etc.)
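A minimal sketch of the chunking option, using the memjs client as an example (the 512 KB chunk size and key layout are illustrative choices):

// Hypothetical helpers that split a large value into chunks below memcached's
// default 1 MB item limit and store them under numbered keys plus a manifest.
import memjs from 'memjs';

const client = memjs.Client.create('localhost:11211');
const CHUNK_SIZE = 512 * 1024; // stay well under the 1 MB default item size

async function setLarge(key: string, value: Buffer, ttlSeconds: number) {
  const chunkCount = Math.ceil(value.length / CHUNK_SIZE);
  for (let i = 0; i < chunkCount; i++) {
    const chunk = value.subarray(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
    await client.set(`${key}:chunk:${i}`, chunk, { expires: ttlSeconds });
  }
  // The manifest records how many chunks make up the value.
  await client.set(key, String(chunkCount), { expires: ttlSeconds });
}

async function getLarge(key: string): Promise<Buffer | null> {
  const manifest = await client.get(key);
  if (!manifest.value) return null;
  const chunkCount = Number(manifest.value.toString());

  const chunks: Buffer[] = [];
  for (let i = 0; i < chunkCount; i++) {
    const { value } = await client.get(`${key}:chunk:${i}`);
    if (!value) return null; // a chunk was evicted; treat the whole value as a miss
    chunks.push(value);
  }
  return Buffer.concat(chunks);
}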

How does MongoDB stack up for very large data sets where only some of the data is volatile

I'm working on a project where we periodically collect large quantities of e-mail via IMAP or POP, perform analysis on it (such as clustering into conversations, extracting important sentences etc.), and then present views via the web to the end user.
The main view will be a Facebook-like profile page for each contact, showing the most recent (20 or so) conversations that each of them has had in the e-mail we capture.
For us, it's important to be able to retrieve the profile page and recent 20 items frequently and quickly. We may also be frequently inserting recent e-mails into this feed. For this, document storage and MongoDB's low-cost atomic writes seem pretty attractive.
However we'll also have a LARGE volume of old e-mail conversations that won't be frequently accessed (since they won't appear in the most recent 20 items, folks will only see them if they search for them, which will be relatively rare). Furthermore, the size of this data will grow more quickly than the contact store over time.
From what I've read, MongoDB seems to more or less require the entire data set to remain in RAM, and the only way to work around this is to use virtual memory, which can carry a significant overhead. In particular, if Mongo isn't able to differentiate between the volatile data (profiles/feeds) and the non-volatile data (old emails), this could end up being quite nasty (and since it seems to delegate virtual memory allocation to the OS, I don't see how this would be possible for Mongo to do).
It would seem that the only choices are to either (a) buy enough RAM to store everything, which is fine for the volatile data but hardly cost-efficient for capturing terabytes of e-mail, or (b) use virtual memory and watch reads/writes on our volatile data slow to a crawl.
Is this correct, or am I missing something? Would MongoDB be a good fit for this particular problem? If so, what would the configuration look like?
MongoDB does not "require the entire data set to remain in RAM". See http://www.mongodb.org/display/DOCS/Caching for an explanation as to why/how it uses virtual memory the way it does.
It would be fine for this application. If your sorting and filtering were more complex you might, for example, want to use a Map-Reduce operation to create a collection that's "display ready", but for a simple date-ordered set the existing indexes will work just fine.
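A minimal sketch of the simple date-ordered case with the official MongoDB Node driver (collection and field names are illustrative, not taken from the question):

// Hypothetical "20 most recent conversations for a contact" query backed by a
// compound index on { contactId, lastMessageAt }.
import { MongoClient, ObjectId } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const conversations = client.db('mail').collection('conversations');

// The compound index serves both the filter and the descending sort.
await conversations.createIndex({ contactId: 1, lastMessageAt: -1 });

async function recentConversations(contactId: ObjectId) {
  return conversations
    .find({ contactId })
    .sort({ lastMessageAt: -1 }) // newest first
    .limit(20)
    .toArray();
}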
MongoDB uses mmap to map documents into virtual memory (not physical RAM). Mongo does not require the entire dataset to be in RAM but you will want your 'working set' in memory (working set should be a subset of your entire dataset).
If you want to avoid mapping large amounts of email into virtual memory you could have your profile document include an array of ObjectIds that refer to the emails stored in a separate collection.
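A sketch of that referencing pattern (document shapes and collection names are illustrative):

// Hypothetical profile documents that hold only ObjectId references to the
// most recent emails, which live in their own, much larger collection.
import { MongoClient, ObjectId } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const db = client.db('mail');

// profiles: small, hot documents that stay in the working set
// emails:   large, cold documents fetched only when actually viewed
async function profileFeed(profileId: ObjectId) {
  const profile = await db.collection('profiles').findOne({ _id: profileId });
  if (!profile) return [];

  // recentEmailIds is an array of ObjectIds maintained as new mail arrives.
  return db
    .collection('emails')
    .find({ _id: { $in: profile.recentEmailIds } })
    .toArray();
}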
#Andrew J
Typically you need enough RAM to hold your working set; this is as true for MongoDB as it is for an RDBMS. So if you want to hold the last 20 emails for all users without going to disk, you need that much memory. If this exceeds the memory of a single system, you can use MongoDB's sharding feature to spread data across multiple machines, thereby aggregating the memory, CPU and I/O bandwidth of the machines in the cluster.
#mP
MongoDB allows you, as the application developer, to specify the durability of your writes, from a single node in memory to multiple nodes on disk. The choice is yours, depending on what your needs are and how critical the data is; not all data is created equal. In addition, in MongoDB 1.8 you can specify --dur, which writes a journal file for all writes. This further improves the durability of writes and speeds up recovery if there is a crash.
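In current drivers that per-write durability choice is expressed as a write concern. A hedged sketch with the MongoDB Node driver (database and collection names are illustrative):

// Hypothetical insert that is only acknowledged once it is in the primary's
// on-disk journal; w: 'majority' would additionally wait for a majority of nodes.
import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const emails = client.db('mail').collection('emails');

await emails.insertOne(
  { subject: 'hello', receivedAt: new Date() },
  { writeConcern: { w: 1, j: true } } // acknowledged by the primary, journaled to disk
);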
And what happens to all the stuff Mongo had in memory if your computer crashes? I'm guessing that it has no logs, so the answer is probably bad luck.