DynamoDB queries on Hierarchical structures - nosql

In this lecture, the speaker gives two examples of hierarchical structures:
as item collections
as Documents (JSON)
Suppose I need to query for the movies in which a particular actor appeared, or for the length of tracks. How would one go about this?
I have a feeling that creating GSIs would work for the first case, but I have no clue whether it would be possible in the second case.

To the best of my knowledge, regarding the second case, there is no way to query on anything inside the Attributes object.
If a query of that sort is needed, a hierarchical structure as an item collection would be the choice.
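As a rough illustration of why the item-collection layout can be queried by actor, here is a plain JavaScript simulation (no AWS SDK involved). The key names `PK` and `SK`, the sample items, and the helper `queryInvertedIndex` are all assumptions made for this sketch. The idea is that a GSI which swaps the partition and sort keys would let you look up every movie item that carries a given actor key.

```javascript
// Hypothetical item collection: each movie/actor pairing is its own item.
// PK and SK are assumed key names, not taken from the lecture.
const items = [
  { PK: 'MOVIE#m1', SK: 'ACTOR#alice' },
  { PK: 'MOVIE#m1', SK: 'ACTOR#bob' },
  { PK: 'MOVIE#m2', SK: 'ACTOR#alice' },
  { PK: 'MOVIE#m3', SK: 'ACTOR#carol' },
];

// Simulates querying a GSI whose partition key is the table's SK and whose
// sort key is the table's PK (an "inverted" index). The filter below only
// stands in for that lookup; a real GSI would not scan the whole table.
function queryInvertedIndex(table, actorKey) {
  return table
    .filter((item) => item.SK === actorKey)
    .map((item) => item.PK)
    .sort();
}

const moviesWithAlice = queryInvertedIndex(items, 'ACTOR#alice');
console.log(moviesWithAlice);
```

In the document (JSON) layout the actors would sit inside a nested attribute, which cannot serve as an index key, so there is nothing equivalent to look up.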

Related

How to create search that shows results from multiple firestore collections in Flutter?

I have 2 different Firestore collections namely 'restaurants' and 'dishes'.
I have currently created 2 separate searches, where the user either searches for a restaurant in search1, connected to the 'restaurants' collection, or for a dish in search2, connected to the 'dishes' collection.
Please suggest on how to have just one search which searches in both collections in the backend and shows results from both 'restaurants' and 'dishes' at one place. Something like a universal search. So if user types 'burger' it should show 'chicken burger dish' as well as 'burger kind restaurant'.
I was able to make the searches work independently, but I need help with the strategy to combine them into one.
Thanks for your help.
Please suggest how to have just one search which searches in both collections in the backend and shows results from both 'restaurants' and 'dishes' in one place.
There is currently no way in which you can perform a search in two different collections at the same time. Why? Because the queries in Firestore are shallow, meaning that they only get documents from the collection that the query is run against. So to solve this, you have to perform two separate queries.
There is a workaround that might help, which would be to add all the data you want to search on to a single document, most likely in a field of type array. That way, you can perform a search using the "array-contains" operator. Alternatively, for small datasets, you can search using contains.
If the above solution doesn't work for you, then you have to implement a third-party search service, as mentioned in the official documentation.
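A minimal sketch of the two-query approach, written in JavaScript rather than Dart and using in-memory arrays in place of real Firestore calls. The functions `searchRestaurants`, `searchDishes`, `mergeResults` and `universalSearch` are hypothetical names, and the sample data is made up. The point is only to run both searches concurrently and merge the results into one list tagged by source.

```javascript
// In-memory stand-ins for the two collections (sample data is made up).
const restaurants = [{ name: 'Burger Kind' }, { name: 'Pasta Place' }];
const dishes = [{ name: 'Chicken Burger' }, { name: 'Veggie Pizza' }];

// Substring matching is used only for this in-memory stand-in;
// a real app would issue its existing Firestore queries here instead.
const matchByName = (docs, term) =>
  docs.filter((d) => d.name.toLowerCase().includes(term.toLowerCase()));
const searchRestaurants = async (term) => matchByName(restaurants, term);
const searchDishes = async (term) => matchByName(dishes, term);

// Merge two result lists into one, tagging each entry with its source.
function mergeResults(restaurantDocs, dishDocs) {
  return [
    ...restaurantDocs.map((doc) => ({ type: 'restaurant', ...doc })),
    ...dishDocs.map((doc) => ({ type: 'dish', ...doc })),
  ];
}

// Run both searches concurrently, then show the merged list in one place.
async function universalSearch(term) {
  const [r, d] = await Promise.all([searchRestaurants(term), searchDishes(term)]);
  return mergeResults(r, d);
}

universalSearch('burger').then((results) => console.log(results));
```

Tagging each entry with a `type` lets the UI render restaurants and dishes differently while still showing them in a single result list.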

MongoDB/Mongoose what's the best way to count a lot of documents with a filter?

I understand that estimatedDocumentCount() uses metadata to count, which makes it faster when there are a lot of documents. But the drawback is that you can't add a filter to it like you can with countDocuments(). If there are still a lot of documents but you want to use a filter, what's the best way to do that, if there is one?
Well, you got it right.
countDocuments(...) is how you count documents with a filter.
If you're facing issues with speed, I'd advise you to add an index on the fields you're planning to filter with. That way it's an index scan and the result will be almost immediate.
It would follow that you are filtering based on the contents of the document, and that you would not be able to do so only with metadata.
You could add an index, or insert your data into an exclusive collection using normalized data models.
The documentation also suggests that you could create a view and count based on the metadata.
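To make the index argument concrete, here is a toy JavaScript model. It is not MongoDB's actual implementation and does not use the MongoDB driver. It contrasts counting by scanning every document with counting through a prebuilt lookup keyed on the filtered field. The `status` field, the sample documents, and the function names are made up for the illustration.

```javascript
// Sample documents; the `status` field is a made-up example.
const docs = [
  { _id: 1, status: 'open' },
  { _id: 2, status: 'closed' },
  { _id: 3, status: 'open' },
];

// Without an index: every document is examined (a collection scan).
function countByScan(collection, field, value) {
  let n = 0;
  for (const doc of collection) {
    if (doc[field] === value) n += 1;
  }
  return n;
}

// Toy "index": maps each field value to the ids of the documents holding it.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc._id);
  }
  return index;
}

// With the index: only the entries for the requested value are touched.
function countByIndex(index, value) {
  return (index.get(value) || []).length;
}

const statusIndex = buildIndex(docs, 'status');
console.log(countByScan(docs, 'status', 'open'), countByIndex(statusIndex, 'open'));
```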

Use publications or separate into collections (performance)?

I have a collection, in which only two queries are ever called on it.
Ex. Cars.find({color: 'red'}); and Cars.find({color: 'blue'});
I was wondering if I should just create RedCars and BlueCars collections instead of using two publications on Cars.
Thinking of performance here, if the Cars collection were to get very large, would it be more performant to use two collections? Also, they are never called on the same template. Each has its own template.
Thanks
From a Mongo perspective, if a single field across the documents in a collection begins to look like an index (as you have described above), it is a good candidate for one: with an index on that field, queries against it become highly tuned. You can create and tune this index (and if you have a lot of data that falls into a scenario like the one you have described, you should) using standard Mongo indexing parameters against the database. There is more to performance than this as well. For example, if the workload is high read, low write, Mongo will often keep portions or all of the queried data in memory for quick retrieval if it can.
As for whether it is better to split these into two collections, that's a tough one. From a performance standpoint it might be about the same either way if you tune your indexes properly and allow Mongo to do what it does best. However, from the Meteor standpoint, I would consider it much easier to keep them in a single collection, for code maintainability and testability.
In terms of performance, if the collection does get large, your application will end up receiving a lot more data than you expected if changes are made to either blue or red cars. A better solution than creating two collections is to use a parameterized subscription that filters down to only the data set you are looking at.
e.g.
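Meteor.publish('cars', function (c) {
  // Validate the client-supplied argument before using it in a query.
  check(c, String);
  // Publish only the cars of the requested color.
  return Cars.find({ color: c });
});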
Then you can access the data by subscribing with Meteor.subscribe('cars', 'blue').

MongoDB database design - contest application

I'm building a contest application, which has 4 collections so far:
contest
questions
matches
users
I want to store every user's score for every match they are assigned to, but I really can't find a proper way to achieve this.
All I've come up with is to replace matches in users with an array in which each element contains a reference to the matches collection and a score field. But I think this is not very efficient.
EDIT
I was thinking about another solution: a separate collection called scores that contains three fields: user, match and score.
Here's my schema structure:
Contests:
Questions:
Matches:
Users:
Note: Any recommended adjustments to the current design are welcome too.
Since MongoDB is not designed to support relationships between collections, you might end up with some duplicated work. I would suggest finding a way to store as much data as you can in a single document.
Your scores would go in each match document; the users array would probably have this structure: {'users':[{user_id:'xxx',score:xxx},{user_id:'xxx',score:xxx}]}
The other solution would be what you describe: to have, in each user document, a matches array with a structure like this: {'matches':[{match_id:'xxx',score:xxx},{match_id:'xxx',score:xxx}]}
You can also have both, which might be more efficient depending on the kind of queries you will need to run. You can also have a field in the subdocuments that stores the user/match name/title.
Note: As you can see, you have two options: either you optimize for document size (so you can store more) or you optimize for performance (so you can read faster and with fewer resources).
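The embedded layout and the separate scores collection mentioned in the question can be compared with a small in-memory JavaScript sketch. The field names (`user_id`, `score`, `user`, `match`) follow the shapes described above; the sample ids, score values, and helper functions are assumptions for illustration, and no MongoDB driver is used.

```javascript
// Option A: scores embedded in each match document.
const matchDocs = [
  { _id: 'm1', users: [{ user_id: 'u1', score: 7 }, { user_id: 'u2', score: 4 }] },
  { _id: 'm2', users: [{ user_id: 'u1', score: 9 }] },
];

// Option B: a separate scores collection, one document per (user, match) pair.
const scoreDocs = [
  { user: 'u1', match: 'm1', score: 7 },
  { user: 'u2', match: 'm1', score: 4 },
  { user: 'u1', match: 'm2', score: 9 },
];

// With option A, collecting one user's scores means walking every match's users array.
function scoresForUserEmbedded(matches, userId) {
  return matches.flatMap((m) =>
    m.users
      .filter((u) => u.user_id === userId)
      .map((u) => ({ match: m._id, score: u.score }))
  );
}

// With option B, the same lookup is a flat filter on a single field.
function scoresForUserFlat(scores, userId) {
  return scores
    .filter((s) => s.user === userId)
    .map((s) => ({ match: s.match, score: s.score }));
}

console.log(scoresForUserEmbedded(matchDocs, 'u1'));
console.log(scoresForUserFlat(scoreDocs, 'u1'));
```

Both helpers return the same data; which layout is preferable depends on whether you mostly read scores per match or per user.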
Hope this is of some help.

Query for set complement in CouchDB

I'm not sure that there is a good way to do this with the facilities CouchDB provides, but I'd like to somehow extract the relative complement of the sets of two different document types over a particular key.
For example, let's say that I have documents representing users and posts, both of which have a username field (unique among user documents). There's a validation in place ensuring that a user document exists for the username in every post, but there may be any number of post documents with a given username, including none. It's trivial to create a view which counts the number of posts per username. The view can even include zero-counts by emitting zero post-counts for the user documents in the view map function. What I want to do, though, is retrieve just the list of users who have zero associated posts.
It's possible to build the view I described above and filter client-side for zero-value results, but in my actual situation the number of results could be very, very large, and the interesting results a relatively small proportion of the total. Is there a way to do this server-side and retrieve just the interesting results?
I would write a map function to iterate through the documents and emit the users (or just usernames) with 0 posts.
Then I would write a list function to iterate through the map function results and format them however you want (JSON, csv, etc).
(I would NOT use a reduce function to format the results, even if a reduce function appears to work OK in development. That is just my own experience from lessons learned the hard way.)
Personally I would filter on the client-side until I had performance issues. Next I would probably use Teddy's _filter technique—all pretty standard CouchDB stuff.
However, I stumbled across (IMO) an elegant way to find set complements. I described it when exploring how to find documents missing a field.
The basic idea
Finding non-members of your view obviously can't be done with a simple query (and a straightforward index scan). However, it can be done in constant memory and linear time by iterating through two query results simultaneously.
One query is for all possible document ids. The other query is for matching documents (those you don't want). Importantly, CouchDB sorts query results, therefore you can calculate the complement efficiently.
See my details in the previous question. The basic idea is that you iterate through both (sorted) lists simultaneously, and when you can say "hey, this document id is listed in the full set but it's missing in the sub-set", that is a hit.
(You don't have to query _all_docs, you just need two queries to CouchDB: one returning all possible values, and the other returning values not to be counted.)
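The simultaneous iteration can be sketched in plain JavaScript as follows. This is a minimal sketch under stated assumptions: the two arrays stand in for the sorted rows returned by the two CouchDB queries, the keys are simple ASCII strings so that JavaScript's `<` comparison agrees with the order CouchDB returns them in, and the name `complement` and the sample usernames are made up for the example.

```javascript
// Yields every id that appears in `allIds` but not in `excludedIds`.
// Both inputs must be sorted ascending under the same ordering.
function* complement(allIds, excludedIds) {
  let j = 0;
  for (const id of allIds) {
    // Skip any excluded ids that sort before the current id.
    while (j < excludedIds.length && excludedIds[j] < id) j += 1;
    if (j < excludedIds.length && excludedIds[j] === id) {
      j += 1; // present in both lists: not part of the complement
    } else {
      yield id; // in the full set but missing from the sub-set: a hit
    }
  }
}

const allUsers = ['alice', 'bob', 'carol', 'dave'];
const usersWithPosts = ['alice', 'carol'];
console.log([...complement(allUsers, usersWithPosts)]); // prints [ 'bob', 'dave' ]
```

Each list is walked once, so the work is linear in the combined length. Because it is a generator, hits can be consumed as they are found; if the two inputs were streamed from CouchDB rather than held in arrays, only the current position in each would need to be kept in memory.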