How to best get around CouchDB's non-RDBMS limitations - nosql

We have two document 'types': Post and User:
Typical post:
{
"_id": "3847345345",
"Schema": "Post",
"Text": "Hello World! This is a post!",
"IsFeatured": true,
"UserID": "12345345345234234"
}
Typical user:
{
"_id": "12345345345234234",
"Schema": "User",
"Username": "georgepowell",
"PostIds": ["3847345345","5135345345","9987453236", ... ]
}
On a web page that displays a Post, the Username for that post (plus whatever other changeable information about that user) is displayed alongside the post. Similar to SO:
This is a typical example of a situation where an SQL JOIN would be perfect, but of course CouchDB doesn't support anything like that. Instead we could make a view that indexes both User documents and Post documents on a Post's _id. Like this:
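function(doc) {
  // === compares; a single = would assign and always be truthy
  if (doc.Schema === 'Post') {
    emit([doc._id, 0], null);
  } else if (doc.Schema === 'User') {
    // one row per post id, so the user sorts right after each of their posts
    doc.PostIds.forEach(function(id) {
      emit([id, 1], null);
    });
  }
}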
which works well, as we can efficiently retrieve all the information we need given a single Post's _id.
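To see why this view behaves like a join, here is a minimal local simulation in plain JavaScript, with no CouchDB involved. The `runMap` and `rowsForPost` helpers and the sample documents are hypothetical; the sketch only collects what `emit` would produce and keeps the rows whose key starts with a given post `_id`, which is roughly what a key-range query does on the real view.

```javascript
// Sample documents shaped like the ones in the question (values are made up).
const docs = [
  { _id: 'p1', Schema: 'Post', Text: 'Hello', IsFeatured: true, UserID: 'u1' },
  { _id: 'p2', Schema: 'Post', Text: 'Bye', IsFeatured: false, UserID: 'u1' },
  { _id: 'u1', Schema: 'User', Username: 'georgepowell', PostIds: ['p1', 'p2'] }
];

// Hypothetical helper: runs a map function over every document and records
// each emit() call together with the id of the document that emitted it.
function runMap(mapFn, allDocs) {
  const rows = [];
  allDocs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows;
}

// Same logic as the view, with emit passed in so it can run outside CouchDB.
function postWithUserMap(doc, emit) {
  if (doc.Schema === 'Post') {
    emit([doc._id, 0], null);
  } else if (doc.Schema === 'User') {
    doc.PostIds.forEach(function (id) {
      emit([id, 1], null);
    });
  }
}

// Rows for one post: the post itself (0) followed by its user (1).
function rowsForPost(postId) {
  return runMap(postWithUserMap, docs)
    .filter(function (row) { return row.key[0] === postId; })
    .sort(function (a, b) { return a.key[1] - b.key[1]; });
}

const joined = rowsForPost('p1');
console.log(joined.map(function (row) { return row.id; }));
```

On a real view, passing `include_docs=true` would return the full Post and User documents for those rows.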
However, if I want to create a view that lists all the posts where IsFeatured == true along with all the user data, I get stuck!
function(doc) {
  if (doc.Schema === 'Post' && doc.IsFeatured) {
    emit([doc._id, 0], null);
  } else if (doc.Schema === 'User') {
    doc.PostIds.forEach(function(id) {
      emit([id, 1], null); // I can't check if the post is featured!
    });
  }
}
Have I reached the limit of CouchDB for relational data? or is this kind of indexing possible in CouchDB?

Since it is a different technology, there are trade-offs. Sometimes something that looks like it will take more resources (an extra request) turns out to be inconsequential in the short run, and in the long run it may give significant scalability, if you need that sort of thing.
CouchDB can handle a lot of different "databases" at the same time, which you can think of as different model spaces. So with the same running instance of CouchDB you could have /users and /posts. This requires absolutely no additional work in terms of configuration or performance of CouchDB.
This can make your map code more straightforward, as you then don't need the 'Schema' field or have to check it in every map function.
Also, you can (and should) have multiple different map/reduce pairs in a given design document. This is important because if two different document "Schema"s both emit(doc.id, doc.val), how can you tell which is which for reduce purposes?
A more CouchDB idiomatic way to look at your data would be that you don't save the post_ids on the user. Just the UserID on the Posts, then have a map something like the following for Posts:
(doc) ->
emit([doc.user_id, doc.isFeatured], null);
emit([doc.isFeatured, doc.createdAt], doc.user_id);
Then a request to the view API with arguments like ?startkey=["12345345345234234"]&endkey=["12345345345234234",{}] would get all their posts (keys are JSON, so strings need double quotes).
Whereas one with ?key=["12345345345234234",true] would get just their featured posts, since isFeatured is emitted as a boolean.
The second emit also gives you ability to quickly get all of the posts that are featured across the whole system sorted by date -- with who made them if you want that data, without getting the whole of the posts sent down the pipe.
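View keys in the query string have to be JSON-encoded and then URL-encoded, which is easy to get wrong by hand. A minimal sketch of a URL builder follows; the `buildViewUrl` helper, the host, and the `posts` database, design document and `by_user` view names are all assumptions made for illustration.

```javascript
// Hypothetical helper: builds a CouchDB view URL of the form
// /{db}/_design/{ddoc}/_view/{view}?name=value&...
// Every parameter value is JSON-encoded, then URL-encoded.
function buildViewUrl(base, db, designDoc, view, params) {
  const query = Object.keys(params)
    .map(function (name) {
      return name + '=' + encodeURIComponent(JSON.stringify(params[name]));
    })
    .join('&');
  return base + '/' + db + '/_design/' + designDoc + '/_view/' + view + '?' + query;
}

const userId = '12345345345234234';

// All posts by one user: every key from [userId] up to [userId, {}].
const allPostsUrl = buildViewUrl('http://localhost:5984', 'posts', 'posts', 'by_user', {
  startkey: [userId],
  endkey: [userId, {}]
});

// Only the featured posts by that user (isFeatured is a boolean).
const featuredUrl = buildViewUrl('http://localhost:5984', 'posts', 'posts', 'by_user', {
  key: [userId, true]
});

console.log(allPostsUrl);
console.log(featuredUrl);
```

The `{}` in the end key works because objects sort after every other JSON type in CouchDB's view collation, so it acts as an upper bound for the second key element.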


How to design calculation API in RESTFul way?

I'm trying to design an API to calculate a result based on inputs.
Real business:
The API compares two securities portfolios (source and target) and returns the orders. The consumer gets the orders so that he/she can then place them to adjust the portfolio from source to target.
If this is hard to understand, here's a similar scenario:
The API compares two texts, then returns the difference between them.
It is a little different from classic CRUD, because the inputs and the output are different resources.
My first thought is like this:
POST /api/difference
{
'source': { ... },
'target': { ... }
}
But it will conflict with the classic payload:
POST /api/difference
{
'lineNumber': ...,
'isAdded': ...
}
Questions:
Should I use a media type to distinguish the input payloads? What should a 'resource' be in this case?
What should the API look like if I also want to place the orders (or apply the text diff) at the same time as the API is called?
I am not sure whether I understand your problem correctly, but in general it depends on whether the resources are already persisted in the system. If both resources are already available in the system, I would simply build a URI like /portfolio/{source_id}/difference/{target_id} which returns the diff result. If only the source exists, I would probably use something like:
POST /portfolio/{source_id}/difference
{target}
If neither resource is available, I would probably first persist such a resource and then make the comparison.
If I understood you correctly, there already exists the resource POST /api/difference and hence you are looking to change MIME type. Instead, why don't you go with the first approach and change the resource name? For example,
POST /api/compare
{
'source': { ... },
'target': { ... }
}
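Whichever URI is chosen, the handler behind it is a pure computation over `source` and `target`. A minimal sketch of that computation for the portfolio case follows; the data shapes are assumptions (a portfolio as a map from symbol to held quantity, an order as `{ symbol, side, quantity }`), and `comparePortfolios` is a hypothetical name.

```javascript
// Hypothetical shapes: a portfolio maps a symbol to a held quantity,
// and an order is { symbol, side, quantity }.
function comparePortfolios(source, target) {
  // Every symbol that appears on either side, in a stable order.
  const symbols = Array.from(
    new Set(Object.keys(source).concat(Object.keys(target)))
  ).sort();

  const orders = [];
  symbols.forEach(function (symbol) {
    // Positive delta: buy more; negative delta: sell the surplus.
    const delta = (target[symbol] || 0) - (source[symbol] || 0);
    if (delta > 0) {
      orders.push({ symbol: symbol, side: 'BUY', quantity: delta });
    } else if (delta < 0) {
      orders.push({ symbol: symbol, side: 'SELL', quantity: -delta });
    }
  });
  return orders;
}

const orders = comparePortfolios({ AAA: 10, BBB: 5 }, { AAA: 4, CCC: 7 });
console.log(orders);
```

Because the function has no side effects, calling it repeatedly with the same payload is safe; actually placing the orders would be a separate, state-changing step.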

Custom fields and global subscriptions for Meteor user accounts

I'm adding custom data to Meteor user accounts for the first time. I've been able to add custom fields without difficulty and I know they're there because I can see them in Mongol. I am publishing via a global subscription so how do I then go about reading data from individual fields? It seems the syntax is very different from that when using publish/subscribe methods.
So, I have user accounts like this (as seen in Mongol):
{
"_id": "#################",
"profile": {
"name": "Test User"
},
"customfields": {
"customfield1": [
"A","B","C"
]
}
}
In server/main.js I have the following
Meteor.publish(null, function() {
return Meteor.users.find(this.userId, {fields:{customfields:1}});
});
This seems to be publishing fine. But what code do I use to render the cursor as data? I've been using variations on code like this in client/main.js and having no success:
var stuff = Meteor.users.find(this.userId).fetch();
console.log(stuff.customfield1);
Any help appreciated.
MyCollection.find() returns a cursor whereas MyCollection.findOne() returns an object, i.e. a single mongodb document.
A publication must return a cursor or array of cursors. Your publication is fine.
You are basically trying to make the customfields key of the user object visible on the client. (The profile key is automatically published by Meteor).
On the client, where you are doing:
var stuff = Meteor.users.find(this.userId).fetch();
You can simply use:
var stuff = Meteor.user();
or
var stuff = Meteor.users.findOne(Meteor.userId());
Then stuff.customfields will contain what you're looking for.
The second form is way too verbose for me unless you're looking for a different user than the logged in user.
Note: this.userId on the client will not be the userId of the current user, it will be undefined. That only works on the server. That may actually be the root cause of your problem. In addition, your publications must be ready() for the data to be available. This isn't true immediately after login for example.
Since customfield1 is nested in customfields, did you try stuff.customfields.customfield1?
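Putting the two points together (the nested path, and the user document not being available yet), a small guard can make the read safe. This is only a sketch: `readCustomField` is a hypothetical helper, and `sampleUser` stands in for what `Meteor.user()` would return so the snippet can run outside Meteor.

```javascript
// Hypothetical helper: safely reads one entry under `customfields`.
// `user` is whatever Meteor.user() returned; it can be null or undefined
// while logged out or before the publication is ready.
function readCustomField(user, fieldName) {
  if (!user || !user.customfields) {
    return undefined;
  }
  return user.customfields[fieldName];
}

// Stand-in for the document shown in the question.
const sampleUser = {
  _id: 'abc',
  profile: { name: 'Test User' },
  customfields: { customfield1: ['A', 'B', 'C'] }
};

console.log(readCustomField(sampleUser, 'customfield1'));
console.log(readCustomField(null, 'customfield1'));
```

In a template helper or autorun, calling it as `readCustomField(Meteor.user(), 'customfield1')` should re-run once the user document arrives, since `Meteor.user()` is a reactive data source.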

Transforming DB Collections in Meteor.publish

Hopefully this question is not too long, but I am trying to include as much detail as possible about what I did.
I am trying to figure out how to implement logic in Meteor.publish() that takes data from the DB, changes all the values in a column and makes the updated collection available for client-side subscription.
Specifically, I have a table that stores messages between users and the recipient is identified by his userId. I would like to replace the userId with his actual phone number which should be available in the Meteor.users table.
When I looked it up online I saw suggestions to use transform, but my understanding is that it's not reactive. I then learned about map but discovered that it returns an array, which breaks the Meteor.publish() method. Finally I found something that uses forEach, self.added() and self.ready(), so my code currently looks like this:
Meteor.publish("myMessages", function () {
var self = this;
Messages.find({
$or: [
{ senderId: this.userId },
{ recipientId: this.userId }
]
}).forEach(function(m) {
m.recipientId = Meteor.users.findOne({ _id: m.recipientId }).username;
console.log("adding msg to collection:");
console.log(m);
self.added("Messages", m._id, m);
});
self.ready();
});
The log messages look right, and when Meteor restarts it prints all the messages from the DB related to the user, with the recipient correctly replaced by the phone number. However, on the client side, when I try to run Messages.findOne(msgId) (with an id I verified exists by selecting it directly in the mongo shell) I get undefined back. Furthermore, running Messages.find() through the developer tools in the browser returns undefined as well, although I expected the messages that showed up in the logs to be available.
I feel that this is a basic use case, but I am not able to make this work. Any help is appreciated!
"You can transform a collection on the server side like this:"
https://stackoverflow.com/a/18344597/4023641
It worked for me.
Unfortunately, changes in the users collection will not reactively update these custom fields.
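For reference, the general shape of a server-side transform inside a publication can be sketched with `cursor.observe` and the publish handler's `added`, `changed`, `removed`, `onStop` and `ready` methods. This is not the code from the linked answer: `publishTransformed` and `fieldsOf` are hypothetical helpers, and the collection name `"messages"` in the usage comment is an assumption. The name passed to `added` has to match the name the client-side collection was created with, which is worth checking against the `"Messages"` string used in the question.

```javascript
// Hypothetical helper: copies a document without its _id, since the id is
// passed separately to added/changed.
function fieldsOf(doc) {
  const fields = Object.assign({}, doc);
  delete fields._id;
  return fields;
}

// Hypothetical helper: republishes every document of `cursor` after passing
// it through `transform`. `sub` is the publish handler's `this`.
// Limitation of this sketch: `changed` resends all current fields and does
// not unset fields that were removed from the document.
function publishTransformed(sub, cursor, collectionName, transform) {
  const handle = cursor.observe({
    added: function (doc) {
      sub.added(collectionName, doc._id, fieldsOf(transform(doc)));
    },
    changed: function (newDoc) {
      sub.changed(collectionName, newDoc._id, fieldsOf(transform(newDoc)));
    },
    removed: function (oldDoc) {
      sub.removed(collectionName, oldDoc._id);
    }
  });
  // Stop observing when the client unsubscribes.
  sub.onStop(function () {
    handle.stop();
  });
  sub.ready();
}

// Usage inside Meteor (not executed here; names are assumptions):
// Meteor.publish("myMessages", function () {
//   const cursor = Messages.find({
//     $or: [{ senderId: this.userId }, { recipientId: this.userId }]
//   });
//   publishTransformed(this, cursor, "messages", function (m) {
//     const user = Meteor.users.findOne({ _id: m.recipientId });
//     return Object.assign({}, m, { recipientId: user && user.username });
//   });
// });
```

As the answer notes, this reacts to changes in the observed messages only; a later change to the looked-up user document would not be pushed to the client.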

Elasticsearch query design via RESTful API

I'm trying to build a query that first lets me get a list of followers that are following a user, second it should take that list and then check to see if they are 'online'.
I have two 'indexes' or endpoints /channel and /following.
The channel endpoint JSON object looks like this (parts abbreviated)
{ channel: {"username":"username1", ... , "online":"true" } }
The following endpoint object looks a bit like this
{ following : {"username1":{"username2":"username2", "username3":"username3"} } }
If I run a simple query /following/_search I get back hits like...
{
"_index": "following",
"_type": "following",
"_id": "_Liso_",
"_score": 1,
"_source": {
"Gabe": "Gabe",
"Gavin": "Gavin"
}
}
This result means that Gavin is following Gabe.
I believe the issue is how I'm storing the data.
In firebase my data looks like this
following
|---Gabe
|----Gavin:Gavin
so each child object of following object has key/value children of {username}:{username}
Now I can run queries that individually get the results I need. For example, if I ask ElasticSearch (ES) if channel "Gavin" is "online" I get back one result depending on whether they are or are not online. And the same with Following. However, I can't seem to get the query to first see who is following Gavin and then see if they are online and return those users who are online.
I've found a better solution (or maybe not). First you query the database for users who are following a user.
From this list you send another query
{
"query":{
"filtered":{
"query":{
"match_all":{}
},
"filter":{
"bool":{
"must":{
"terms":{
"username":["username1"]
}
},
"must_not":{
"terms":{
"online":["true"]
}
}
}
}
}
}
}
This works; however, the username cannot contain capitals. I don't know if this is an indexing issue on my part or whether terms have to be very specific. The solution I'm using on the client side is to lowercase the search terms before I submit them. It's crude and hacky but it works for now.
Issues that I may run into:
If a user has millions of followers, pulling all that data from the database will make the client sluggish.
A possible solution to this is to paginate the following results and run the query for every 20 returned results.
I'll continue to revise the answer as I develop / learn better query methods.
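The two client-side workarounds described above (lowercasing the names and querying in pages of 20) can be combined in one small step before the queries are built. A minimal sketch, where `toTermBatches` is a hypothetical helper and the follower names are made up:

```javascript
// Hypothetical helper: lowercases follower names and splits them into
// batches, one batch per Elasticsearch request.
function toTermBatches(usernames, batchSize) {
  const lowered = usernames.map(function (name) {
    return name.toLowerCase();
  });
  const batches = [];
  for (let i = 0; i < lowered.length; i += batchSize) {
    batches.push(lowered.slice(i, i + batchSize));
  }
  return batches;
}

// A batch size of 2 is used here only to keep the example short.
const batches = toTermBatches(['Gabe', 'Gavin', 'Greta'], 2);
console.log(batches);
```

Each inner array would then be used as the value of `"username"` in the `terms` clause of one request.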

RESTful Many-to-Many possible?

How do I represent a complex resource for a REST post?
Hello,
Currently I have an application which when the user hits "save" it iterates over all of the form elements and creates one mass object which manages a:
var params = [{
attributes1: form1.getValues(),
attributes2: form2.getValues(),
.. ..
}];
I then send this mass object via a RPC POST to my "Entity" model service.
This entity, which I wish to persist data for, is quite complex. All in all, the data is spread across about 30 tables. To help explain my actual question, the "entity" is a building (as in a physical property/house/apartment).
What I would like, is to be able to turn my mess into a RESTful API for saving properties.
The problem I have is that, saving details for a single model that spans a single table is fine. How do I structure my data object for transport when the model has
many to many relationships
one to many relationships
one to one relationships
For example:
Here is a WATERED down version of what I might have on a property and the sample data
propertyId: 1,
locationId: 231234,
propertyName: "Brentwood",
kitchenFeatures: [
{ featureId: 1, details: "Induction hob"},
{ featureId:23, details: "900W microwave"}
],
propertyThemes: [ 12,32,54,65 ]
This actually goes on a lot more.. but you can get the general gist. kitchenFeatures would be an example of a many-to-many, where I have a featuresTable which has all of the features like so:
`featureId`, `feature`
1 "Oven Hob"
23 "Microwave"
and propertyThemes would be an example of another many-to-many.
How am I expected to form my "object" to my RESTful service? Is this even possible?
ie. If I want to save this property I would send it to:
http://example.com/api/property/1
The approach I would use here is hypermedia and links:
/property
/property/{id}
/property/{id}/features/{id}
Depending on your domain you might even get away with:
/property/{id}/features/{name}
or
/property/{id}/features/byname/{name}
Thus you can do REST operations and serve JSON or XHTML hypermedia.
Property details:
Request: GET /property/1
Response:
{
..
"name": "Brentwood",
"features": "/property/1/features"
..
}
Brentwood's features:
GET /property/1/features
{
..
"Kitchen": "/property/1/features/1",
"Dog Room": "/property/1/features/dog%20room",
..
}
GET /property/1/features/1
{
..
"Induction hob": "/property/1/features/1/1",
"900W microwave": "/property/1/features/1/23",
"nav-next" : "/property/1/features/dog%20room",
..
}
To add a relation you can do something like this:
POST /property/1/features
{
..
"Name": "Oven Hob"
..
}
If you know what the relation will be you use a PUT:
PUT /property/1/features/23
{
..
"Name": "Oven Hob"
..
}
You can serve multiple media types:
GET http://host/property/1/features/dog%20room.json
GET http://host/property/1/features/dog%20room.xhtml
For the response in xhtml the response can use named links like this:
..
Kitchen
..
There are other aspects of REST that you can use such as response code which I did not include above.
Thus, to model relations you make use of links which can be in itself a resource that can be operated on with GET, PUT, POST and DELETE or even custom verbs such as ASSOCIATE or LINK. But the first four are the ones that people are used to. Remember PUT is idempotent but not POST. See PUT vs POST in REST
Edit: You can group your links into JSON arrays to give structure to your hypermedia.
I think you're really asking, "How do I represent complex data in a form suitable for transmission within a POST?", right? It's less to do with REST and more to do with your choice of media type. I would suggest starting with a pure JSON representation, using arrays and cross-referenced ID fields to map the relationships. You could also do this with XML, of course.
The examples you gave look right on the money. You just need to ensure that both parties (browser and server) agree on the structure and interpretation of the media type you use.
I'm dealing with the exact same thing. I opted not to use IDs anywhere, but to use URLs everywhere an ID would normally be expected.
So in your case, the kitchenfeatures could simply be an array with urls to:
/feature/1
/feature/23
And the themes to
/propertyTheme/12
/propertyTheme/32
etc..
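As a rough sketch of that convention, the property from the question could be converted from IDs to URLs like this. The `toLinkedProperty` name and the output field names are assumptions, and keeping `details` next to each feature URL is one possible choice, not something the approach requires.

```javascript
// Hypothetical helper: replaces numeric ids with resource URLs.
function toLinkedProperty(property) {
  return {
    propertyName: property.propertyName,
    // Each feature becomes a link to /feature/{id}, keeping its details.
    kitchenFeatures: property.kitchenFeatures.map(function (f) {
      return { feature: '/feature/' + f.featureId, details: f.details };
    }),
    // Each theme id becomes a link to /propertyTheme/{id}.
    propertyThemes: property.propertyThemes.map(function (id) {
      return '/propertyTheme/' + id;
    })
  };
}

// Sample data taken from the question.
const linked = toLinkedProperty({
  propertyId: 1,
  locationId: 231234,
  propertyName: 'Brentwood',
  kitchenFeatures: [
    { featureId: 1, details: 'Induction hob' },
    { featureId: 23, details: '900W microwave' }
  ],
  propertyThemes: [12, 32, 54, 65]
});
console.log(JSON.stringify(linked, null, 2));
```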
In the case of many-to-many relationships, we update all the relations as a whole. Usually we simply dump the existing data, and insert the new relationships.
For one to many relationships we sometimes extend the urls a bit where this makes sense. If you were to have comments functionality on a 'property', this could look like
/property/1/comment/5
But this really depends on the situation for us, for other cases we put it in the top-level namespace.
Is this helpful to you?