Basic GROUP BY statement using OPA MongoDB high level API

My question is quite simple: I'd like to perform a GROUP BY-like statement with MongoDB using the Opa high-level database API. But I don't think that is possible (?)
If I do want to perform a MongoDB $group operation, do I necessarily need to use the low-level API (stdlib.apis.mongo)?
Finally, can I use both the low-level and high-level APIs to communicate with my MongoDB?
Thanks.

I am afraid that, judging from the latest published Opa compiler code, no aggregation is supported :( See the thread in the Opa forum. Also note Quentin's comment about using both the low- and high-level APIs:
"You can use this [low level] library and the built-in [high level] library together, [...]"
See the auto-increment implementation advice from the MLstate guys in this thread. Note the high-level DB field /next_id definition and its initialization with a low-level read and increment.

I just got a different idea.
All MongoDB commands (e.g. the "group" command you are using) are accessible through the virtual collection named $cmd. You just ask the server to find the document {command_name: command_parameter, additional: "options", are: ["listed", "here"]}. You should be able to use every fancy feature of your MongoDB server not yet supported by the Opa API with a single find query. This includes the aggregation framework introduced in version 2.2 and the full-text search that has been in beta since version 2.4.
For example, I want to use the new text command to search the full-text index of the collection coll_name for the query string query. I am currently using the following code (where onsuccess is the function that parses the answer and extracts the ids of the documents found):
{ search: query, project: {_id: 0, id: 1} }
|> Bson.opa2doc
|> MongoCommands.simple_str_command_opts(ll_db, db_name, "text", coll_name, opts)
|> MongoCommon.outcome_map(_, onsuccess, onfailure)
And if you take a look at the source code of the API, simple_str_command_opts is implemented as a findOne() against Mongo.
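For comparison, here is a rough mongo shell equivalent of that findOne() against the $cmd virtual collection (a sketch; the database name test and the collection/field names follow the example above):
// the same 2.4 "text" command issued by querying the $cmd virtual collection;
// this is what db.runCommand({...}) does under the hood
db.getCollection("$cmd").findOne({
    text: "coll_name",             // collection that has the text index
    search: "query string",
    project: { _id: 0, id: 1 }
})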
But instead I could use the high level DB support:
/test/`$cmd`[{text: coll_name, search: query, project: {_id: 0, id: 1}}]
What you have to do is declare the high-level DB collection with a type that includes:
all the fields that you use to make the query,
all the fields that you can get in the possible answer.
For the text command:
type commands = {
    // command
    string text,
    // query
    string search,
    {
        int _id,
        int id,
    } project,
    // result of executing command "text"
    string queryDebugString,
    string language,
    list({
        float score,
        {int id} obj,
    }) results,
    {
        int nscanned,
        int nscannedObjects,
        int n,
        int nfound,
        int timeMicros,
    } stats,
    int ok,
    // in case of failure (`ok: 0`)
    string errmsg,
}
Unfortunately, it is not working :( During application start-up, the Opa run-time DB support tries to create a unique index for the primary key of the set (for the following example, {text, search, project}):
database test {
    article /article[{id}]
    commands /`$cmd`[{text, search, project}]
}
Using a primary key is necessary, since you have to use findOne(), not find(). Creating an index on the virtual collection $cmd is not allowed, so DB initialization fails.
If you find a way to stop Opa from creating that index, you will be able to use all the fancy features of Mongo with nothing more than the high-level API ;)

Related

Cloudant: query in http navigator

I'm using Cloudant, with no auth and CORS enabled.
It works very well; limit and skip work fine, but I can't find how to search for something.
I'm trying to find a document where cp is 24000, for example with this query:
https://1c54473b-be6e-42d6-b914-d0ecae937981-bluemix.cloudant.com/etablissements/_all_docs?skip=0&limit=10&include_docs=true&q=cp:24000
But the query doesn't return the right document.
I've also tried
https://1c54473b-be6e-42d6-b914-d0ecae937981-bluemix.cloudant.com/etablissements/_all_docs?skip=0&limit=10&include_docs=true&_search({'cp':24000})
with no luck.
Oh, and by the way, do you know if the jquery.couch.js lib has been discontinued? I can't even find it on GitHub, nor on my hard disk while I'm using Fauxton, and it is not in the directory either.
The /db/_all_docs endpoint hits the primary index of the database, where all of the documents in the database can be found in _id order.
If you wish to query the database to get a subset of the data, you have three options:
Cloudant Query - hit the POST /db/_find endpoint, passing in a JavaScript object containing the selector which defines the query you wish to perform (like the WHERE clause of a SQL query), e.g. {selector: {cp: 24000}} (see the sketch after this list)
MapReduce - create a Map function in a design document that filters the documents you are interested in. It creates a materialized view that can be queried and filtered later, e.g. function(doc){ emit(doc.cp, null); }
Cloudant Search - this uses the Apache Lucene library to generate an index on the fields you specify. You can then query the index with q=cp:24000, which looks similar to the query you are trying to perform.
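As a minimal sketch of the first option, assuming the database is named etablissements as in the question and CORS stays enabled, a Cloudant Query request from the browser could look like this:
// POST /db/_find with a selector, the Cloudant Query equivalent of "WHERE cp = 24000"
fetch('https://1c54473b-be6e-42d6-b914-d0ecae937981-bluemix.cloudant.com/etablissements/_find', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ selector: { cp: 24000 }, limit: 10 })
})
    .then(function (res) { return res.json(); })
    .then(function (data) { console.log(data.docs); });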

Field's datatype of collection in mongodb

How do I get field information for a collection in MongoDB?
The information I am looking for is:
field name
data type
You will need to loop over all the documents and figure out what the field names used are, and which types each specific field uses. MongoDB does not have a schema, so there is no shortcut to fetch this. Also be aware that a field's value can have totally different data types from document to document, which is another of MongoDB's strengths.
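As a minimal sketch of that loop (mongo shell; the collection name things is taken from the script below, and the 100-document sample size is an arbitrary choice):
// walk a sample of documents and collect the set of types seen for each field name
var types = {};
db.things.find().limit(100).forEach(function (doc) {
    for (var key in doc) {
        var t = (doc[key] === null) ? "null" : typeof doc[key];
        types[key] = types[key] || {};
        types[key][t] = true;
    }
});
printjson(types);   // e.g. { "_id" : { "object" : true }, "name" : { "string" : true }, ... }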
To figure out some statistics, such as field names, the following script can help:
mr = db.runCommand({
    "mapreduce" : "things",
    "map" : function() {
        for (var key in this) { emit(key, null); }
    },
    "reduce" : function(key, stuff) { return null; },
    "out" : "things" + "_keys"
})
Then run distinct on the resulting collection so as to find all the keys:
db[mr.result].distinct("_id");
But there is no way to also include the field types with a Map/Reduce job like this.
You can't determine the schema of a collection. Each of the objects in a collection might have a different schema; you should be aware of this.
I asked a similar question a few months ago; in that post you can find how to retrieve the schema of an object using the Java programming language. However, to the best of my knowledge, there is no way to retrieve the data types other than trying to cast the objects (this is the way the BasicBSONObjects do it).
MongoDB supports dynamic schemas, and there is no built-in feature for schema introspection or analysis as at MongoDB 2.4.
However, it is possible to infer the schema by running a Map/Reduce across either a sample of documents or the entire collection.
There are a few open source tools which package this approach up in a helpful interface, for example:
Schema.js - extends the mongo shell with collection.schema() prototypes
Variety - runs as a standalone script
I like the approach of schema.js, and include it in my ~/mongorc.js startup file so it is available in my mongo shell sessions.
By default schema.js analyzes up to 50 documents in a collection and returns the results inline. There is a limit option to inspect more (or even all) documents in a collection, and it supports the Map/Reduce out options so results can optionally be saved or merged with an output collection.
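A hedged usage sketch, assuming schema.js is loaded via ~/mongorc.js as described above (the collection name users is illustrative; check the schema.js README for the exact options):
// analyzes up to 50 documents by default and prints the inferred schema inline
db.users.schema()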

Haskell mongodb text search

What is the status of text search with the Haskell mongodb driver?
There is no 'LIKE' operator in Mongo similar to the SQL variants, so what is the best way to search a collection or the whole DB for a particular text string?
I've read some people referencing external tools, but I can also see that some text search was implemented in Mongo version 2.4, which is done through the command interface.
There should not be any problems doing it from the console, but how would I do it from the Haskell driver? I found the runCommand function in the driver API and it looks like it should be possible to send the text command to the server, but the signature shows that it returns only one document, not a list of documents. So how is it done correctly?
How would I efficiently search for a word or a sentence in a collection or DB so that it returns a list of documents containing the word? Is it possible to do this without external tools, using Mongo's text search feature? Or should it be done at the application level?
Thanks.
The result type already contains the list of documents (that contain the searched text). Unfortunately, I could not test the query on my running database, but I have used runCommand to run an aggregation (before it was implemented for the Haskell driver). The result document you get for such a query looks something like this:
{
    results: [
        {
            score : ...,
            obj : { ... }
        },
        ...
    ],
    ... ,
    ok : 1
}
The result document has a results field; its value is an array whose elements are documents with the fields score and obj. So in the end, you can find each of the matched documents behind the obj field in the list of results.
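For comparison, this is what the same command and its result look like from the mongo shell (a sketch; the collection name, search term, and limit are illustrative):
// MongoDB 2.4 "text" command: a single result document whose "results" array
// holds the matched documents under "obj"
var res = db.runCommand({ text: "articles", search: "mongodb", limit: 10 })
res.results.forEach(function (r) {
    printjson(r.obj)    // a matched document
})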
For more details, you should take a look here.

How do I describe a collection in Mongo?

So this is day 3 of learning MongoDB. I'm coming from the MySQL universe...
A lot of times, when I need to write a query for a MySQL table I'm unfamiliar with, I would use the "desc" command - basically telling me what fields I should include in my query.
How would I do that in MongoDB? I know, I know... I'm searching for a schema in a schema-less database. =) But how else would users know what fields to use in their queries?
Am I going at this the wrong way? Obviously I'm trying to use a MySQL way of doing things in MongoDB. What's the Mongo way?
Type the query below in the editor / mongo shell:
var col_list = db.emp.findOne();
for (var col in col_list) { print(col); }
The output will give you the names of the fields (columns) in the collection:
_id
name
salary
There is no good answer here. Because there is no schema, you can't 'describe' the collection. In many (most?) MongoDB applications, however, the schema is defined by the structure of the object hierarchy used in the writing application (Java or C# or whatever), so you may be able to reflect over the object library to get that information. Otherwise there is a bit of trial and error.
This is my day 30 or something like that of playing around with MongoDB. Unfortunately, we have switched back to MySQL after working with MongoDB because of my company's current infrastructure issues. But having implemented the same model on both MongoDB and MySQL, I can clearly see the difference now.
Of course, there is a schema involved when dealing with schema-less databases like MongoDB, but the schema is dictated by the application, not the database. The database will shove in whatever it is given. As long as you know that admins are not secretly logging into Mongo and making changes, and all access to the database is controlled through some wrapper, the only place you should look for the schema is your model classes. For instance, in our Rails application, these are two of the models we have in Mongo:
class Consumer
  include MongoMapper::Document

  key :name, String
  key :phone_number, String
  one :address
end

class Address
  include MongoMapper::EmbeddedDocument

  key :street, String
  key :city, String
  key :state, String
  key :zip, String
  key :country, String
end
Now after switching to MySQL, our classes look like this:
class Consumer < ActiveRecord::Base
  has_one :address
end

class Address < ActiveRecord::Base
  belongs_to :consumer
end
Don't get fooled by the brevity of the classes. In the latter version with MySQL, the fields are being pulled from the database directly. In the former example, the fields are right there in front of our eyes.
With MongoDB, if we have to change a particular model, we simply add, remove, or modify the fields in the class itself and it works right off the bat. We don't have to worry about keeping the database tables/columns in sync with the class structure. So if you're looking for the schema in MongoDB, look towards your application for answers and not the database.
Essentially I am saying exactly the same thing as @Chris Shain :)
While factually correct, you're all making this too complex. I think the OP just wants to know what his/her data looks like. If that's the case, you can just
db.collectionName.findOne()
This will show one document (aka. record) in the database in a pretty format.
I had this need too, Cavachon. So I created an open source tool called Variety which does exactly this: link
Hopefully you'll find it to be useful. Let me know if you have questions, or any issues using it.
Good luck!
AFAIK, there isn't a way, and it is logical for it to be so.
MongoDB, being schema-less, allows a single collection to have documents with different fields. So there can't really be a description of a collection, like the description of a table in relational databases.
Though this is the case, most applications do maintain a schema for their collections, and as said by Chris, this is enforced by your application.
As such, you don't have to worry about first fetching the available keys to make a query. You can just ask MongoDB for any set of keys (i.e. the projection part of the query) or query on any set of keys. In both cases, if the keys specified exist on a document they are used; otherwise they aren't. You will not get any error.
For instance (in the mongo shell):
If this is a sample document in your people collection and all documents follow the same schema:
{
    name : "My Name",
    place : "My Place",
    city : "My City"
}
The following are perfectly valid queries:
These two will return the above document:
db.people.find({name : "My Name"})
db.people.find({name : "My Name"}, {name : 1, place : 1})
This will not return anything, but will not raise an error either:
db.people.find({first_name : "My Name"})
This will match the above document, but the returned document will have only the default "_id" property:
db.people.find({name : "My Name"}, {first_name : 1, location : 1})
print('\n--->', Object.getOwnPropertyNames(db.users.findOne())
    .toString()
    .replace(/,/g, '\n---> ') + '\n');
---> _id
---> firstName
---> lastName
---> email
---> password
---> terms
---> confirmed
---> userAgent
---> createdAt
This is an incomplete solution because it doesn't give you the exact types, but useful for a quick view.
const doc = db.collectionName.findOne();
for (const x in doc) {
    print(`${x}: ${typeof doc[x]}`);
}
If you're OK with running a Map/Reduce, you can gather all of the possible document fields.
Start with this post.
The only problem here is that you're running a Map/Reduce, which can be resource intensive. Instead, as others have suggested, you'll want to look at the code that writes the actual data.
Just because the database doesn't have a schema doesn't mean that there is no schema. Generally speaking, the schema information will be in the code.
I wrote a small mongo shell script that may help you.
https://gist.github.com/hkasera/9386709
Let me know if it helps.
You can use the UI tool MongoDB Compass. It shows all the fields in the collection and also shows the variation of the data in them.
If you are using Node.js and want to get all the field names via an API request, this code works for me (db is assumed to be a Mongoose model):
let arrayResult = [];
db.findOne().exec(function (err, docs) {
    if (err) {
        return callback(err);   // report the error
    }
    const JSONobj = JSON.parse(JSON.stringify(docs));
    for (let key in JSONobj) {
        arrayResult.push(key);
    }
    return callback(null, arrayResult);
});
arrayResult will give you all the field/column names.
Output-
[
"_id",
"emp_id",
"emp_type",
"emp_status",
"emp_payment"
]
Hope this works for you!
Say you have a collection called people and you want to find its fields and their data types. You can use the query below:
function printSchema(obj) {
    for (var key in obj) {
        print(key, typeof obj[key]);
    }
}

var obj = db.people.findOne();
printSchema(obj)
The result of this query will list each field name together with its type (e.g. _id object, name string, ...).
You can use Object.keys, like in JavaScript:
Object.keys(db.movies.findOne())

id autoincrement/sequence emulation with CassandraDB/MongoDB etc

I'm trying to build a small web system (URL shortening) using the NoSQL Cassandra DB; the problem I'm stuck on is auto-generation of ids.
Has someone already dealt with this problem?
Thanks.
P.S. UUID does not work for me; I need to use ALL the numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like an SQL sequence.
UPDATED:
The reason why I'm not OK with GUID ids lies within the scope of my application.
My app has a URL-shortening part, and I need to make the URLs as short as possible. So I follow this approach: I take numbers starting from 0 and convert them to base64 strings. As a result I have URLs like mysite.com/QA (where QA is the base64 string).
This was very easy to implement using an SQL DB: I just took the auto-incremented ID, converted it to a URL, and was 100 percent sure that the URL was unique.
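For illustration, here is a minimal sketch of that number-to-token conversion (the alphabet and digit order are my own assumptions; any URL-safe base64-style alphabet works):
// encode a numeric sequence value as a short URL token, one base-64 digit at a time
function encodeId(n) {
    var alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    var s = "";
    do {
        s = alphabet.charAt(n % 64) + s;
        n = Math.floor(n / 64);
    } while (n > 0);
    return s;
}
encodeId(1024);   // "QA", giving a URL like mysite.com/QA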
Don't know about Cassandra, but with Mongo you can have an atomic sequence (it won't scale, but it will work the way it should, even in a sharded environment if the query has the shard key).
It can be done by using the findandmodify command.
Let's say we have a special collection named sequences and we want a sequence for post numbers (named postid); you could use code similar to this:
> db.runCommand( { "findandmodify" : "sequences",
                   "query" : { "name" : "postid" },
                   "update" : { $inc : { "id" : 1 } },
                   "new" : true } );
This command will atomically return the updated (new) document together with a status. The value field contains the returned document if the command completed successfully.
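A minimal sketch of the same idea using the shell's findAndModify helper, with upsert so the counter document is created on first use (the sequences collection and postid name follow the example above; the posts insert is illustrative):
// atomically increment the counter and use the new value as a document id
var ret = db.sequences.findAndModify({
    query:  { name: "postid" },
    update: { $inc: { id: 1 } },
    new:    true,
    upsert: true
});
db.posts.insert({ _id: ret.id, title: "my post" });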
Autoincrement IDs inherently don't scale well as they need a single source to generate the numbers. This is why shardable/replicatable databases such as MongoDB use longer, GUID-like identifiers for objects. Why do you need LONG values so badly?
You might be able to do it using atomic increments, retaining the old value, but I'm not sure. This would be limited to single server setups only.
I'm not sure I follow you. What language are you using? Are we talking about UUIDs?
The following is how you generate UUIDs in some languages:
java.util.UUID.randomUUID(); // (Java) variant 2, version 4
import uuid // (Python)
uuid.uuid1() // version 1