About MongoDB documents - mongodb

I am new to the NoSQL/MongoDB world and so I am experimenting with several things.
Let's say I would like to create a blog using MongoDB.
I could create a Blog post like this:
{
Title: "My First Post",
Body: "Bla de bla de bla bli bla de bah"
Date: "07/07/2013" // or 06/07/2013 when using javascript date notation
}
And then I would like my readers to post comments. One thing I know about my readers is they get very involved. They write thousands of comments to my blog posts.
First question:
Is it a good idea to embed the comments? Or is it better to store them in their own collection with a reference to the blog post id?
Here's another example. Let's say I'd like to create a social media like web site that has several different types of objects (i.e.: blog posts, video's, contacts) and people can subcribe to the objects so they can comment and read other people's comments.
A comment feed would look like this:
{
Type:[either blog post,video or contact]
Name:"Comments on this crazy video"
SubscribedUsers:[userid1,userid2,userid3...userid999]
Comments: {
{
Name:"Purple Dog",
Date: "07/07/2013 09:12:23",
Text: "Bla bla bla"
},
{
Name:"Shizzly Feather",
Date: "07/07/2013 09:23:08",
Text: "I agree with Purple Dog."
}
}
}
(maybe the notation is a bit off but I hope you get what I mean)
Question two:
Is the above example a good idea for a site like this? Is MongoDB a good fit or should I not use it for stuff like this? (What should I use then?) Or is there another way to achieve the same result (in the end, I would like to show the user an aggregated feed with all comments sorted by date DESC for the feeds they are subscribed to)
What I'm trying to learn (and what hopefully is useful to others) is when to choose MongoDB/NoSQL and when to stick with RDBMS's.

If you are using MongodB, you are better off embedding the comments because the comments are inextricably linked to each post - In fact, comments that are not germane/relevant to the post might be viewed as spam. The embedding shines when you ask for say, every comment that's associated with the post. The embedding sucks when you ask for say, every comment that's been made by anyone in the last 6 hours, but I can't think of a reason why you might want to ask for such a thing :)
You'll like RDBMS if you put the comments in their own table and you DO ask questions like "give me all comments that were made by everyone in the last six hours". On the other hand, RDBMS sucks when Julie C. (rightfully) expects to see all comments associated with her post and you have to run a "join" operation between the "poster" table, the "posts" table and the "comments" table to give her what she expects.
So, choose your poison :)

Here is an another approach which is a hybrid called a bucketing strategy
http://www.slideshare.net/jrosoff/mongodb-advanced-schema-design-inboxes

Related

Can you list multiple features within the same Schema.org "LocationFeatureSpecification"?

I am working on Schema.org Resort schema for a ton of resorts on a travel website and am trying to find the most efficient ways of filling out the schema with regards to amenities.
The current code looks something like this:
"amenityFeature": [
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":"Spa",
"value":"true"
},
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":"Internet Access",
"value":"true"
},
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":"Tennis Courts",
"value":"true"
}
]
My question is, can I write it like this instead to shorten lines of code:
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":[
"Spa", "Internet Access", "Tennis Courts"
],
"value":"true"
}
When I test it in Google’s Structured Data Testing Tool, it doesn’t give any errors. Here is what it looks like in the SDTT when I write it the short way:
And here is what it looks like if I do it the first/long way:
If I do it the short way, I want to make sure all those items are getting listed as amenities and not just different names for the same amenity. Otherwise, I'll go the long route.
No, each LocationFeatureSpecification represents one feature:
Specifies a location feature by providing a structured value representing a feature of an accommodation as a property-value pair of varying degrees of formality.
Your second snippet would represent one feature with multiple names.

Autocomplete with Firebase

How does one use Firebase to do basic auto-completion/text preview?
For example, imagine a blog backed by Firebase where the blogger can tag posts with tags. As the blogger is tagging a new post, it would be helpful if they could see all currently-existing tags that matched the first few keystrokes they've entered. So if "blog," "black," "blazing saddles," and "bulldogs" were tags, if the user types "bl" they get the first three but not "bulldogs."
My initial thought was that we could set the tag with the priority of the tag, and use startAt, such that our query would look something like:
fb.child('tags').startAt('bl').limitToFirst(5).once('value', function(snap) {
console.log(snap.val())
});
But this would also return "bulldog" as one of the results (not the end of the world, but not the best either). Using startAt('bl').endAt('bl') returns no results. Is there another way to accomplish this?
(I know that one option is that this is something we could use a search server, like ElasticSearch, for -- see https://www.firebase.com/blog/2014-01-02-queries-part-two.html -- but I'd love to keep as much in Firebase as possible.)
Edit
As Kato suggested, here's a concrete example. We have 20,000 users, with their names stored as such:
/users/$userId/name
Oftentimes, users will be looking up another user by name. As a user is looking up their buddy, we'd like a drop-down to populate a list of users whose names start with the letters that the searcher has inputted. So if I typed in "Ja" I would expect to see "Jake Heller," "jake gyllenhaal," "Jack Donaghy," etc. in the drop-down.
I know this is an old topic, but it's still relevant. Based on Neil's answer above, you more easily search doing the following:
fb.child('tags').startAt(queryString).endAt(queryString + '\uf8ff').limit(5)
See Firebase Retrieving Data.
The \uf8ff character used in the query above is a very high code point
in the Unicode range. Because it is after most regular characters in
Unicode, the query matches all values that start with queryString.
As inspired by Kato's comments -- one way to approach this problem is to set the priority to the field you want to search on for your autocomplete and use startAt(), limit(), and client-side filtering to return only the results that you want. You'll want to make sure that the priority and the search term is lower-cased, since Firebase is case-sensitive.
This is a crude example to demonstrate this using the Users example I laid out in the question:
For a search for "ja", assuming all users have their priority set to the lowercased version of the user's name:
fb.child('users').
startAt('ja'). // The user-inputted search
limitToFirst(20).
once('value', function(snap) {
for(key in snap.val()){
if(snap.val()[key].indexOf('ja') === 0) {
console.log(snap.val()[key];
}
}
});
This should only return the names that actually begin with "ja" (even if Firebase actually returns names alphabetically after "ja").
I choose to use limitToFirst(20) to keep the response size small and because, realistically, you'll never need more than 20 for the autocomplete drop-down. There are probably better ways to do the filtering, but this should at least demonstrate the concept.
Hope this helps someone! And it's quite possible the Firebase guys have a better answer.
(Note that this is very limited -- if someone searches for the last name, it won't return what they're looking for. Hence the "best" answer is probably to use a search backend with something like Kato's Flashlight.)
It strikes me that there's a much simpler and more elegant way of achieving this than client side filtering or hacking Elastic.
By converting the search key into its' Unicode value and storing that as the priority, you can search by startAt() and endAt() by incrementing the value by one.
var start = "ABA";
var pad = "AAAAAAAAAA";
start += pad.substring(0, pad.length - start.length);
var blob = new Blob([start]);
var reader = new FileReader();
reader.onload = function(e) {
var typedArray = new Uint8Array(e.target.result);
var array = Array.prototype.slice.call(typedArray);
var priority = parseInt(array.join(""));
console.log("Priority of", start, "is:", priority);
}
reader.readAsArrayBuffer(blob);
You can then limit your search priority to the key "ABB" by incrementing the last charCode by one and doing the same conversion:
var limit = String.fromCharCode(start.charCodeAt(start.length -1) +1);
limit = start.substring(0, start.length -1) +limit;
"ABA..." to "ABB..." ends up with priorities of:
Start: 65666565656565650000
End: 65666665656565650000
Simples!
Based on Jake and Matt's answer, updated version for sdk 3.1. '.limit' no longer works:
firebaseDb.ref('users')
.orderByChild('name')
.startAt(query)
.endAt(`${query}\uf8ff`)
.limitToFirst(5)
.on('child_added', (child) => {
console.log(
{
id: child.key,
name: child.val().name
}
)
})

Meteor + MongoDB - display list of posts with their comments count

I'm very new to Meteor, so this question might sound awkward. I'm trying to display a list of all posts
Posts = new Meteor.Collection('posts');
and then in Meteor.publish('posts', ...)
return Posts.find();
and show a number of comments related to each of those posts. Comments are stored in a separate collection
Comments = new Meteor.Collection('comments')
I don't want users to download all comments from the database just to find out comments count for each of the posts - I'm not displaying them here. So
Meteor.publish('comments', function(){
return Comments.find();
})
is not an option.
I know I could denormalize the data and store commentsCount in Post documents. But is there any other way to do this? I'd like it to be observable - or rather live updating, of course. I know how to do it when displaying a single post, but I don't know how to do it for the whole list.
did you know that you can add parameters to your publish function?
you could have something like
Meteor.publish('comments', function(post){
return Comments.find({postId: post._id});
})
this way you get only the comments for one post at a time.
Hope this was what you were looking for
If you're just looking for the number of comments I think you can simply use:
{{commentsCount}}
So an example would be something like:
<h1>Post Title</h1>
<p>{{commentsCount}} comments</p>
Make sure that is used within a post template though or else it won't know which comments it should be counting for.

what is the difference between doc.Content.Text and doc.Range(start, end).Text

Could you please explain what is the difference between doc.Content.Text and doc.Range(start, end).Text
Actually, if I extract a string like
doc.Content.Text.SubString(start, lenofText)
and if I do the same with
doc.Range(start, start + lenofText)
I get correct result for doc.Content.Text but incorrect result with doc.Range ... do you know the reason? I need to find a text and then convert it to a Hyper LINK but the doc.Range does not give the me the correct results...
Your description is a little vague (for instance, how is it not the correct results?) but a document is actually comprised of as many as 17 story parts (which includes things like the main story [the document area], footers, headers, footnotes, and comments). 'Content' refers specifically to the main text story. ‘Doc.Range’ is broader and can include more than one story. If the results are not correct because it looks like the text is offset by a certain number of characters, it may be counting other stories. If you want to limit the results to the body text, specify one of the following:
doc.Content
doc.StoryRanges(wdMainTextStory)

RESTful URL design for search

I'm looking for a reasonable way to represent searches as a RESTful URLs.
The setup: I have two models, Cars and Garages, where Cars can be in Garages. So my urls look like:
/car/xxxx
xxx == car id
returns car with given id
/garage/yyy
yyy = garage id
returns garage with given id
A Car can exist on its own (hence the /car), or it can exist in a garage. What's the right way to represent, say, all the cars in a given garage? Something like:
/garage/yyy/cars ?
How about the union of cars in garage yyy and zzz?
What's the right way to represent a search for cars with certain attributes? Say: show me all blue sedans with 4 doors :
/car/search?color=blue&type=sedan&doors=4
or should it be /cars instead?
The use of "search" seems inappropriate there - what's a better way / term? Should it just be:
/cars/?color=blue&type=sedan&doors=4
Should the search parameters be part of the PATHINFO or QUERYSTRING?
In short, I'm looking for guidance for cross-model REST url design, and for search.
[Update] I like Justin's answer, but he doesn't cover the multi-field search case:
/cars/color:blue/type:sedan/doors:4
or something like that. How do we go from
/cars/color/blue
to the multiple field case?
For the searching, use querystrings. This is perfectly RESTful:
/cars?color=blue&type=sedan&doors=4
An advantage to regular querystrings is that they are standard and widely understood and that they can be generated from form-get.
The RESTful pretty URL design is about displaying a resource based on a structure (directory-like structure, date: articles/2005/5/13, object and it's attributes,..), the slash / indicates hierarchical structure, use the -id instead.
Hierarchical structure
I would personaly prefer:
/garage-id/cars/car-id
/cars/car-id #for cars not in garages
If a user removes the /car-id part, it brings the cars preview - intuitive. User exactly knows where in the tree he is, what is he looking at. He knows from the first look, that garages and cars are in relation. /car-id also denotes that it belongs together unlike /car/id.
Searching
The searchquery is OK as it is, there is only your preference, what should be taken into account. The funny part comes when joining searches (see below).
/cars?color=blue;type=sedan #most prefered by me
/cars;color-blue+doors-4+type-sedan #looks good when using car-id
/cars?color=blue&doors=4&type=sedan #also possible, but & blends in with text
Or basically anything what isn't a slash as explained above.
The formula: /cars[?;]color[=-:]blue[,;+&], though I wouldn't use the & sign as it is unrecognizable from the text at first glance if that's your thing.
** Did you know that passing JSON object in URI is RESTful? **
Lists of options
/cars?color=black,blue,red;doors=3,5;type=sedan #most prefered by me
/cars?color:black:blue:red;doors:3:5;type:sedan
/cars?color(black,blue,red);doors(3,5);type(sedan) #does not look bad at all
/cars?color:(black,blue,red);doors:(3,5);type:sedan #little difference
possible features?
Negate search strings (!)
To search any cars, but not black and red:
?color=!black,!red
color:(!black,!red)
Joined searches
Search red or blue or black cars with 3 doors in garages id 1..20 or 101..103 or 999 but not 5
/garage[id=1-20,101-103,999,!5]/cars[color=red,blue,black;doors=3]
You can then construct more complex search queries. (Look at CSS3 attribute matching for the idea of matching substrings. E.g. searching users containing "bar" user*=bar.)
Conclusion
Anyway, this might be the most important part for you, because you can do it however you like after all, just keep in mind that RESTful URI represents a structure which is easily understood e.g. directory-like /directory/file, /collection/node/item, dates /articles/{year}/{month}/{day}.. And when you omit any of last segments, you immediately know what you get.
So.., all these characters are allowed unencoded:
unreserved: a-zA-Z0-9_.-~
Typically allowed both encoded and not, both uses are then equivalent.
special characters: $-_.+!*'(),
reserved: ;/?:#=&
May be used unencoded for the purpose they represent, otherwise they must be encoded.
unsafe: <>"#%{}|^~[]`
Why unsafe and why should rather be encoded: RFC 1738 see 2.2
Also see RFC 1738#page-20 for more character classes.
RFC 3986 see 2.2
Despite of what I previously said, here is a common distinction of delimeters, meaning that some "are" more important than others.
generic delimeters: :/?#[]#
sub-delimeters: !$&'()*+,;=
More reading:
Hierarchy: see 2.3, see 1.2.3
url path parameter syntax
CSS3 attribute matching
IBM: RESTful Web services - The basics
Note: RFC 1738 was updated by RFC 3986
Although having the parameters in the path has some advantages, there are, IMO, some outweighing factors.
Not all characters needed for a search query are permitted in a URL. Most punctuation and Unicode characters would need to be URL encoded as a query string parameter. I'm wrestling with the same problem. I would like to use XPath in the URL, but not all XPath syntax is compatible with a URI path. So for simple paths, /cars/doors/driver/lock/combination would be appropriate to locate the 'combination' element in the driver's door XML document. But /car/doors[id='driver' and lock/combination='1234'] is not so friendly.
There is a difference between filtering a resource based on one of its attributes and specifying a resource.
For example, since
/cars/colors returns a list of all colors for all cars (the resource returned is a collection of color objects)
/cars/colors/red,blue,green would return a list of color objects that are red, blue or green, not a collection of cars.
To return cars, the path would be
/cars?color=red,blue,green or /cars/search?color=red,blue,green
Parameters in the path are more difficult to read because name/value pairs are not isolated from the rest of the path, which is not name/value pairs.
One last comment. I prefer /garages/yyy/cars (always plural) to /garage/yyy/cars (perhaps it was a typo in the original answer) because it avoids changing the path between singular and plural. For words with an added 's', the change is not so bad, but changing /person/yyy/friends to /people/yyy seems cumbersome.
To expand on Peter's answer - you could make Search a first-class resource:
POST /searches # create a new search
GET /searches # list all searches (admin)
GET /searches/{id} # show the results of a previously-run search
DELETE /searches/{id} # delete a search (admin)
The Search resource would have fields for color, make model, garaged status, etc and could be specified in XML, JSON, or any other format. Like the Car and Garage resource, you could restrict access to Searches based on authentication. Users who frequently run the same Searches can store them in their profiles so that they don't need to be re-created. The URLs will be short enough that in many cases they can be easily traded via email. These stored Searches can be the basis of custom RSS feeds, and so on.
There are many possibilities for using Searches when you think of them as resources.
The idea is explained in more detail in this Railscast.
Justin's answer is probably the way to go, although in some applications it might make sense to consider a particular search as a resource in its own right, such as if you want to support named saved searches:
/search/{searchQuery}
or
/search/{savedSearchName}
I use two approaches to implement searches.
1) Simplest case, to query associated elements, and for navigation.
/cars?q.garage.id.eq=1
This means, query cars that have garage ID equal to 1.
It is also possible to create more complex searches:
/cars?q.garage.street.eq=FirstStreet&q.color.ne=red&offset=300&max=100
Cars in all garages in FirstStreet that are not red (3rd page, 100 elements per page).
2) Complex queries are considered as regular resources that are created and can be recovered.
POST /searches => Create
GET /searches/1 => Recover search
GET /searches/1?offset=300&max=100 => pagination in search
The POST body for search creation is as follows:
{
"$class":"test.Car",
"$q":{
"$eq" : { "color" : "red" },
"garage" : {
"$ne" : { "street" : "FirstStreet" }
}
}
}
It is based in Grails (criteria DSL): http://grails.org/doc/2.4.3/ref/Domain%20Classes/createCriteria.html
This is not REST. You cannot define URIs for resources inside your API. Resource navigation must be hypertext-driven. It's fine if you want pretty URIs and heavy amounts of coupling, but just do not call it REST, because it directly violates the constraints of RESTful architecture.
See this article by the inventor of REST.
In addition i would also suggest:
/cars/search/all{?color,model,year}
/cars/search/by-parameters{?color,model,year}
/cars/search/by-vendor{?vendor}
Here, Search is considered as a child resource of Cars resource.
There are a lot of good options for your case here. Still you should considering using the POST body.
The query string is perfect for your example, but if you have something more complicated, e.g. an arbitrary long list of items or boolean conditionals, you might want to define the post as a document, that the client sends over POST.
This allows a more flexible description of the search, as well as avoids the Server URL length limit.
RESTful does not recommend using verbs in URL's /cars/search is not restful. The right way to filter/search/paginate your API's is through Query Parameters. However there might be cases when you have to break the norm. For example, if you are searching across multiple resources, then you have to use something like /search?q=query
You can go through http://saipraveenblog.wordpress.com/2014/09/29/rest-api-best-practices/ to understand the best practices for designing RESTful API's
Though I like Justin's response, I feel it more accurately represents a filter rather than a search. What if I want to know about cars with names that start with cam?
The way I see it, you could build it into the way you handle specific resources:
/cars/cam*
Or, you could simply add it into the filter:
/cars/doors/4/name/cam*/colors/red,blue,green
Personally, I prefer the latter, however I am by no means an expert on REST (having first heard of it only 2 or so weeks ago...)
My advice would be this:
/garages
Returns list of garages (think JSON array here)
/garages/yyy
Returns specific garage
/garage/yyy/cars
Returns list of cars in garage
/garages/cars
Returns list of all cars in all garages (may not be practical of course)
/cars
Returns list of all cars
/cars/xxx
Returns specific car
/cars/colors
Returns lists of all posible colors for cars
/cars/colors/red,blue,green
Returns list of cars of the specific colors (yes commas are allowed :) )
Edit:
/cars/colors/red,blue,green/doors/2
Returns list of all red,blue, and green cars with 2 doors.
/cars/type/hatchback,coupe/colors/red,blue,green/
Same idea as the above but a lil more intuitive.
/cars/colors/red,blue,green/doors/two-door,four-door
All cars that are red, blue, green and have either two or four doors.
Hopefully that gives you the idea. Essentially your Rest API should be easily discoverable and should enable you to browse through your data. Another advantage with using URLs and not query strings is that you are able to take advantage of the native caching mechanisms that exist on the web server for HTTP traffic.
Here's a link to a page describing the evils of query strings in REST: http://web.archive.org/web/20070815111413/http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful
I used Google's cache because the normal page wasn't working for me here's that link as well:
http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful