I am working with a scenario that I am not sure can work, or will work well. In my current project I am trying to find an efficient way of working with geopoints in Firestore. The straightforward approach, where a document contains a single geopoint field, is pretty self-explanatory and easy to query. However, I have to work with a varying number of geopoints for a single document (Article), because a specific piece of content may need to be available in more than one geographic area.
For example, an article may need to be available only in NYC, Denver, and Seattle. Using a geopoint for each location and searching by radius is, in general, a pretty standard task; that would be enough if I only wanted the article to be available in Seattle, but now it needs to be available in two more places.
The solution as I see it currently is to use an array and fill it with geopoints. The structure would look something like this:
articleText (String),
sortTime (Timestamp),
tags (Array)
- ['tagA','tagB','tagC','tagD'],
availableLocations (Array)
- [(Geopoint), (Geopoint), (Geopoint), (Geopoint)]
I would then perform a query to get all content within 10 miles of a specific geopoint, starting at a specific sortTime.
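For reference, here is roughly what writing that structure and running the time-based part of the query might look like, sketched with the Firebase Admin SDK for Node.js as a stand-in (the collection name "articles", the coordinates, and the start timestamp are placeholders, and the 10-mile radius check is not expressed in the query itself):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function writeAndListArticles() {
  // Write an article that should be available in several locations.
  await db.collection('articles').add({
    articleText: '...',
    sortTime: admin.firestore.Timestamp.now(),
    tags: ['tagA', 'tagB', 'tagC', 'tagD'],
    availableLocations: [
      new admin.firestore.GeoPoint(40.7128, -74.0060),   // NYC
      new admin.firestore.GeoPoint(39.7392, -104.9903),  // Denver
      new admin.firestore.GeoPoint(47.6062, -122.3321),  // Seattle
    ],
  });

  // Page through articles ordered by sortTime, starting at a given time.
  // The radius check against availableLocations would still have to happen
  // after the documents are read, since it is not part of this query.
  const start = admin.firestore.Timestamp.fromDate(new Date('2020-01-01'));
  const snapshot = await db.collection('articles')
    .orderBy('sortTime', 'desc')
    .startAt(start)
    .limit(20)
    .get();
  return snapshot.docs.map((doc) => doc.data());
}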
What I don't know is whether putting the geopoints in an array works well, or whether it should be avoided in favor of another data structure.
I have considered replicating an article document for each geopoint, but that does not scale very well if more than a handful of locations need defining. I've also considered creating a "reference" collection where each point is a document that contains the documentID of an article, but this leads to reading each reference document and then reading the actual article document: essentially two document reads for one piece of content, which can get expensive under the Firestore pricing model and may slow things down unnecessarily.
Am I approaching this in an acceptable way? And are there other methods that can work more efficiently?
I have been continuously changing my database structure and just can't find a good solution.
My database stores habits; each document is a different habit. The fields of the habit document hold information about the habit, but where I am struggling is how to structure the completion/history.
Unlike a task, a habit is completed over and over again which means I need to store all that information and be able to query it easily for date ranges and such.
First
The first idea I had was a subcollection of documents that each held information such as the date, the completion status, etc.
Although this idea sounds good, the problem was querying. I am using Flutter, and inside a StreamBuilder it proved difficult to access a sub-collection.
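For reference, the date-range query I would need against such a sub-collection would look roughly like this (sketched with the Node.js Admin SDK as a stand-in, since my actual code is in Dart; the collection and field names are placeholders):

// "history" holds one document per completion, each with a Timestamp
// field called "date" plus the completion status, etc.
async function completionsInRange(db, habitId, startDate, endDate) {
  return db
    .collection('habits').doc(habitId)
    .collection('history')
    .where('date', '>=', startDate)
    .where('date', '<=', endDate)
    .orderBy('date')
    .get();
}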
Second
My second idea was to use a document field in the habit. With this, I could have an array called completed, and the array would store a Map with the date, the status, etc.
With this idea, I made it the furthest, but where I struggle is changing the status of one field in the Map in the array.
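As far as I can tell, changing one entry means reading the whole array, modifying it in memory, and writing it back, roughly like this (again sketched with the Node.js Admin SDK as a stand-in for cloud_firestore; the collection name, field names, and date format are placeholders):

async function setCompletionStatus(db, habitId, date, newStatus) {
  const ref = db.collection('habits').doc(habitId);

  // There is no update operator that changes a single field of one Map
  // inside an array, so the whole "completed" array is rewritten.
  await db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    const completed = snap.get('completed') || [];
    const updated = completed.map((entry) =>
      // Assumes the date is stored as a directly comparable value (e.g. a string).
      entry.date === date ? { ...entry, status: newStatus } : entry
    );
    tx.update(ref, { completed: updated });
  });
}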
Third
My third and most recent idea I haven't made much headway with yet, but I was thinking I could use nested Maps in the habit document's fields. A field called history would hold a Map, and inside that Map each date (as a string) would map to another Map that stores all the essential information.
This idea was great in a lot of ways, but querying the data by date range when the date is in a string format will inevitably be difficult.
Please let me know which structure is more ideal, or if there is a better way to do it. I am using Flutter, so I need to operate within the guidelines of the cloud_firestore package. Any help would be appreciated!
Prior to allensdk version 0.14.5, the CellTypesCache.get_cells() function returned a large, nested structure containing information about cell morphology, ephys features, location, anatomical structure, tissue donors, etc. In version 0.14.5, the structure returned is flat and much smaller.
I see that some of this information is available through get_ephys_features() and get_morphology_features(), but I'm not sure where to find the rest. Where can I go to find out how to migrate my code to the new allensdk version?
Great question. We simplified the returned dictionary from CellTypesCache.get_cells for a few reasons:
1. There were a large number of fields that were variously unexplained, not useful, distracting, and/or redundant with data returned from other functions.
2. The way brain structures were handled made it very difficult to filter cells by cortical layer across species.
3. The query involved a large number of joins and was fairly slow.
(2) was probably the most urgent issue we needed to address. The new dictionary structure is explained in a bit more detail here:
https://github.com/AllenInstitute/AllenSDK/wiki/Release-Notes-(0.14.5)
You are correct that you should look for ephys and morphology features from CellTypesCache.get_ephys_features and CellTypesCache.get_morphology_features (or just CellTypesCache.get_all_features).
If there are any fields you were using in the old dictionary structure that are not now available in the current dictionary, let me know and we can find them again.
I have a question about modeling wishlists using MongoDB and Mongoose. The idea is that a user should be able to have many different wishlists, each containing many wishes, with each wish referencing a single article.
Since a wishlist belongs to a single user, I thought about using an embedded document for it.
The same goes for wishes being embedded in a wishlist.
So I have something like this:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// Define the embedded schemas before the schemas that use them.
var WishSchema = new Schema({
  // ... wish fields
});

var WishlistSchema = new Schema({
  // ... wishlist fields
  wishes: [WishSchema]
});

var UserSchema = new Schema({
  // ... user fields
  wishlists: [WishlistSchema]
});
My question is: what should I do with the article? Should I use a reference, or should I copy the article's data into an embedded document?
If I use an embedded document, I have an update problem: when the article's price changes, updating every wish that references this article becomes a struggle. But accessing those wishes' articles is a piece of cake.
If I use a reference, the update is not a problem anymore, but I have a problem when I filter wishes based on their article's criteria (when I filter the wishes by price, category, etc.).
I think the second way is probably the best, but I don't know if it's possible to build a query that filters wishes based on the article's fields. I tried a lot of things using population, but nothing works very well when you need to filter on a nested object's fields (for example, getting the wishes whose article matches certain conditions).
Is this kind of query doable?
Sorry for the long question and for my bad English, but any advice would be great!
In my experience dealing with NoSQL databases (mainly Mongo), when designing a collection, do not think about the relations. Instead, think about how you would display, page, and retrieve the documents.
I would prefer embedding and updating multiple documents when there's a change, as opposed to using a reference, for several reasons:
Reads would be fast and easy, and filtering is not a problem (as you've said).
Retrieve operations usually happen a lot more often than updates, and with proper indexing you wouldn't really have to worry about performance.
It leverages NoSQL's schema-less nature, and you'll be less prone to restructuring due to requirement changes (new sorting, new filters, etc.).
Paging would be a lot less of a hassle, and the UI would not be restricted in its design by paging and limits.
Joining could become expensive. Redundant data might be a hassle to update, but that's always better than not being able to display data in a particular way because your schema is normalized and joining is difficult.
I'd say the rule of thumb is to only split them when you do not need to display them together. It is not impossible to join them back if you do, but it is definitely more troublesome.
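To make the "join them back" part concrete: with a reference-based layout, filtering wishes by a field of the referenced article takes an aggregation roughly like the one below (written in mongo shell syntax; the same pipeline can be run through User.aggregate(...) in Mongoose, and the "articles" collection, the "article" field on a wish, and the price filter are assumptions based on the question):

db.users.aggregate([
  { $unwind: '$wishlists' },
  { $unwind: '$wishlists.wishes' },
  { $lookup: {
      from: 'articles',
      localField: 'wishlists.wishes.article',
      foreignField: '_id',
      as: 'article'
  } },
  { $unwind: '$article' },
  // Keep only the wishes whose article matches the criteria.
  { $match: { 'article.price': { $lte: 50 } } }
])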
I am building a real estate application using Node and MongoDB. I have two major models:
City
Property
I am now confused because I don't know if I should create a separate collection for cities and one for properties, or if I should put all the properties under their city.
I am concerned because, as the application grows, large cities will become huge documents, which is a design decision that should be made from the start.
Please let me know if you have a best-practice way to handle this kind of situation.
As every property has only one city, this is a one-to-many relationship. In this case you have several options (example documents for each shape are sketched after the list below):
Firstly, remember the 16 MB size limit per document. So, how big is the "many"? How many properties per city?
One-to-Few (just a few hundred): embed the "few" (properties) in the "one" (city).
One-to-Many (no more than a couple of thousand): child-referencing.
Store the ObjectIds of the "many" (property) documents in an array in the "one" (city) document.
One-to-Squillions: parent-referencing.
Store the ObjectId of the "one" (city) in each "many" (property) document.
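Hypothetical example documents for each shape (all field names are only illustrative):

// One-to-Few: embed the properties directly in the city document.
{ name: 'Springfield', properties: [ { address: '12 Oak St', price: 250000 } ] }

// One-to-Many: child-referencing; the city keeps an array of property ObjectIds.
{ name: 'Springfield', properties: [ ObjectId('...'), ObjectId('...') ] }

// One-to-Squillions: parent-referencing; each property stores its city's ObjectId.
{ address: '12 Oak St', price: 250000, city: ObjectId('...') }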
Secondly, if there’s an high ratio of reads to updates, you can considering denormalization. Paying the price of slower and complex updates in order to get more efficient queries.
A proposed solution: having only one collection (properties) and in their documents embed the city document.
As probably, you are going to retrieve the properties by city, don't forget to create an index on the city field.
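A rough sketch of that layout in the mongo shell (the collection and field names are just examples):

// Each property embeds its city as a sub-document.
db.properties.insertOne({
  address: '12 Oak St',
  price: 250000,
  city: { name: 'Springfield', state: 'IL' }
})

// Index the embedded city name so lookups by city stay fast.
db.properties.createIndex({ 'city.name': 1 })

// Retrieve all properties in a given city.
db.properties.find({ 'city.name': 'Springfield' })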
Recommended posts:
http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
http://blog.mongodb.org/post/87892923503/6-rules-of-thumb-for-mongodb-schema-design-part-2
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
I assume the city center coordinates are not going to change very often. So embedding cities in properties is possible if the city documents are not very big and you need information about them each time you read a property. On the other hand, you can put cities into a separate collection and link to them from each property. Finally, you can embed the most useful information about a city in the property along with a link to the full city document (mixed approach). Embedding a city in a property introduces some duplication but may increase read performance. To choose the most appropriate option you need to understand which read/write requests the application will make most often.
I have a Core Data application and I would like to get results from the database based on certain parameters, for example only the events that occurred in the last week, or only those that occurred in the last month. Is it better to do a single fetch for the whole entity and then build the arrays for each situation from that result array, or is it better to use predicates and make multiple fetches?
The answer depends on a lot of factors. I'd recommend perusing the documentation's description of the various store types. If you use the SQLite store type, for example, it's far more efficient to make proper use of date range predicates and fetch only those in the given range.
Conversely, say you search on a non-standard attribute, like looking for a substring within an encrypted string: you'll have to pull everything in, decrypt the strings, do your search, and note the matches.
On the far end of the spectrum, you have the binary store type, which means the whole thing will always be pulled into memory regardless of what kind of fetches you might do.
You'll need to describe your managed object model and the types of fetches you plan to do in order to get a more specific answer.