MongoDB full text search with Haskell driver

Is it possible to use MongoDB's full text search with the Haskell driver?
I found 'runCommand' in the Haskell API, but it expects a Document as a parameter. That's fine for all the other commands that MongoDB can run, but the syntax for a text command is:
db.collection.runCommand( "text", {search : "something"})
So I don't know how to get the "text" as a first parameter in front of the Document.
Thanks

The text command can be written as a single document with this structure:
{ text: your_collection
, search: your_text
, filter: your_filter
, limit: your_limit
, project: your_projection
}
I had my suspicions, since all "runCommand" actions have the same structure. So I tried to apply that structure to the text command - but without success. Then I remembered that aggregate has a different structure too and tried that, but that didn't work either. Finally, I found the answer in a Google Groups entry about the Java driver.
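Here is a minimal sketch of sending that command document through the Haskell driver's runCommand; the database name, collection name, and search string below are made-up placeholders:

{-# LANGUAGE OverloadedStrings #-}

import Database.MongoDB
import Data.Text (Text)

main :: IO ()
main = do
  pipe <- connect (host "127.0.0.1")
  -- runCommand takes a plain Document, so "text" is simply the first field
  result <- access pipe master "mydb" $ runCommand
    [ "text"   =: ("articles"  :: Text)   -- your_collection
    , "search" =: ("something" :: Text)   -- your_text
    , "limit"  =: (10 :: Int)             -- optional
    ]
  print result
  close pipe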

Related

MongoDB does not return a field

It must be a silly mistake but I can't find it.
When I run db.getCollection('communes').findOne({}),
I obtain:
{
"_id" : ObjectId("59b851a19db72301ae771c57"),
"COMMUNE" : "ALAA6",
"LIBGEO" : "ROHRBACH",
"PAYS" : "Allemagne"
}
which is fine.
But when I run db.getCollection('communes').findOne({COMMUNE: "ALAA6"}), it returns nothing!
For strange reasons, filtering on other fields works, so when I run db.getCollection('communes').findOne({LIBGEO: "ROHRBACH"}), it returns the result. Same thing when filtering on "PAYS".
Adding quotes around COMMUNE, i.e. running db.getCollection('communes').findOne({"COMMUNE": "ALAA6"}) or using find instead of findOne doesn't change anything.
Any idea?
Ok, my fault. The collection was created from an Excel export, and it seems the first column name contains a strange character. Querying from the Robo3T client did not show it, but querying from Python returns:
{'LIBGEO': 'ROHRBACH',
'PAYS': 'Allemagne',
'_id': ObjectId('59b851a19db72301ae771c57'),
'\ufeffCOMMUNE': 'ALAA6'}
so obviously the \ufeff char before COMMUNE should be removed...
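If re-exporting the data isn't convenient, a $rename update can strip the bad field name. Purely as a sketch, sticking with the Haskell driver used at the top of this page (the database name is assumed):

{-# LANGUAGE OverloadedStrings #-}

import Database.MongoDB
import Data.Text (Text)

-- Rename the BOM-prefixed field on every document in the collection.
-- "\65279" is U+FEFF, the byte-order mark that the Excel export prepended.
-- Run it with: access pipe master "yourdb" fixBomField
fixBomField :: Action IO ()
fixBomField = modify (select [] "communes")
  [ "$rename" =: [ "\65279COMMUNE" =: ("COMMUNE" :: Text) ] ]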

Elasticsearch mongodb river script in index doesn't work

I'm trying to change a few fields' strings using JavaScript.
For example, take only the last part of the URL that comes from Mongo through the river, so that in Elasticsearch I'll have only the end of it.
When creating the index (using curl) I added under "options" the following script:
"script": "ctx.document.shorturl = ctx.document.url.substr(-4);delete ctx.document.url;"
I tried some manipulations such as adding \"...\" or using ctx['doc']['url'] and others, but nothing seems to work.
I always get only the url field with the full URL (shorturl is not created at all).
Can anyone suggest the right syntax to make it work?
Another thing I need to do is combine two fields - lat & long - into one "location" field in order to use it in Kibana. Can anyone suggest the right script for that? (Create a new field called "location" which contains both "lat" & "long" with a comma between them.)
Thanks.
You did substring(-4), hence it returned the whole string: substring treats a negative start index as 0. You should use substring(4) instead:
ctx.document.shorturl = ctx.document.url.substr(4);delete ctx.document.url;

Mongo find by regex: return only matching string

My application has the following stack:
Sinatra on Ruby -> MongoMapper -> MongoDB
The application puts several entries in the database. In order to crosslink to other pages, I've added some sort of syntax. e.g.:
Coffee is a black, caffeinated liquid made from beans. {Tea} is made from leaves. Both drinks are sometimes enjoyed with {milk}
In this example {Tea} will link to another DB entry about tea.
I'm trying to query my MongoDB for all 'linked terms'. Usually in Ruby I would do something like this: /{([a-zA-Z0-9]+)}/, where the () returns the matched string. In Mongo, however, I get the whole record.
How can I get Mongo to return only the matched parts of the record I'm looking for? For the example above it would return:
["Tea", "milk"]
I'm trying to avoid pulling the entire record into Ruby and processing it there.
I don't know if I understand.
db.yourColl.aggregate([
  {
    $match: { "yourKey": { $regex: '[a-zA-Z0-9]', "$options": "i" } }
  },
  {
    $group: {
      _id: null,
      tot: { $push: "$yourKey" }
    }
  }
])
If you don't want duplicates in tot, use $addToSet.
The way I solved this problem was with the string aggregation operators: $indexOfCP to find the starting and ending code point indexes, and $substrCP to extract the string I wanted. Since you could have multiple of these {} terms, you need one projection to identify the CP indexes in one shot and another projection to extract the words you need. Hope this helps.
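For what it's worth, here is a rough sketch of that idea through the Haskell driver's aggregate; the collection name "entries", the field name "body", and the single-term-per-document simplification are all assumptions, and real data with unmatched braces would need extra guards:

{-# LANGUAGE OverloadedStrings #-}

import Database.MongoDB
import Data.Text (Text)

-- Pull out the first {...} term of each document with $indexOfCP / $substrCP.
extractFirstTerm :: Action IO [Document]
extractFirstTerm = aggregate "entries"
  [ -- keep only documents that contain an opening brace
    [ "$match" =: [ "body" =: Regex "\\{" "" ] ]
    -- locate the braces (code point indexes)
  , [ "$project" =:
        [ "body"  =: (1 :: Int)
        , "start" =: [ "$indexOfCP" =: (["$body", "{"] :: [Text]) ]
        , "end"   =: [ "$indexOfCP" =: (["$body", "}"] :: [Text]) ]
        ] ]
    -- cut out the text between them: substrCP(body, start + 1, end - start - 1)
  , [ "$project" =:
        [ "term" =: [ "$substrCP" =:
            [ val ("$body" :: Text)
            , val [ "$add" =: [ val ("$start" :: Text), val (1 :: Int) ] ]
            , val [ "$subtract" =:
                      [ val [ "$subtract" =: [ val ("$end" :: Text), val ("$start" :: Text) ] ]
                      , val (1 :: Int) ] ]
            ] ]
        ] ]
  ]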

MongoDB - searching text

What is the best strategy for selecting mongodb entries in which a string value contains a set of words or phrases? I'm thinking of something equivalent to mysql's LIKE function, e.g.
WHERE (TEXT LIKE "% apple %") or (TEXT LIKE "% banana %")
I've seen options that involve tokenizing the string, but this would mean building unigrams for all the text, which would be huge, no?
Mongo supports text search natively since 2.4, and my experience with it has been pretty positive.
http://docs.mongodb.org/manual/applications/text-search/
You start the server with text search enabled (setParameter textSearchEnabled=true),
then create the text index on the collection,
then search with runCommand.
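Tying this back to the original question, here is a rough sketch of those last two steps through the Haskell driver; the database, collection, and field names are made up, and the server is assumed to be running with textSearchEnabled=true:

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.IO.Class (liftIO)
import Database.MongoDB
import Data.Text (Text)

searchDemo :: Pipe -> IO ()
searchDemo pipe = access pipe master "mydb" $ do
  -- create a text index on the field to be searched
  ensureIndex (index "articles" [ "body" =: ("text" :: Text) ])
  -- then run the text command against the collection
  res <- runCommand [ "text" =: ("articles" :: Text), "search" =: ("apple banana" :: Text) ]
  liftIO (print res)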
MongoDB has no full text search capability right now, but it's easy to use external search engines like SOLR.
I strongly discourage you from trying to rebuild text search with regex, word stemming, etc. yourself. You should rather focus on your app's own features :)
I am using this combination: Mongoid, Sunspot and Mongoid-Sunspot. It works very well in production, and development setup is easy.
You can use the regular expression support in MongoDB queries. More details are available at the following link:
http://docs.mongodb.org/manual/reference/operator/regex/
Here are two examples, in case the above link moves again in the future:
db.collection.find( { field: /acme.*corp/i } );
db.collection.find( { field: { $regex: 'acme.*corp', $options: 'i' } } );
Somehow MongoDB's built-in text search failed to meet my requirements on an existing database which used a compound index. I am now using mongoose-search-plugin and it has been working superbly well. It uses natural stemming and distance algorithms to return a relevance score.
User.search('Malaysia Car Food', {username: 1}, {}, function(err, u){
  console.log('Search Results: ' + JSON.stringify(u));
});

Performing regex queries with PyMongo

I am trying to perform a regex query using PyMongo against a MongoDB server. The document structure is as follows
{
"files": [
"File 1",
"File 2",
"File 3",
"File 4"
],
"rootFolder": "/Location/Of/Files"
}
I want to get all the files that match the pattern *File. I tried doing this as such
db.collectionName.find({'files':'/^File/'})
Yet I get nothing back. Am I missing something? According to the MongoDB docs this should be possible. If I perform the query in the Mongo console it works fine, so does this mean the API doesn't support it, or am I just using it incorrectly?
If you want to include regular expression options (such as ignore case), try this:
import re
regx = re.compile("^foo", re.IGNORECASE)
db.users.find_one({"files": regx})
It turns out regex searches are done a little differently in PyMongo, but they are just as easy.
Regex is done as follows:
db.collectionname.find({'files':{'$regex':'^File'}})
This will match all documents that have a files property containing an item that starts with File.
To avoid the double compilation you can use the bson regex wrapper that comes with PyMongo:
>>> regx = bson.regex.Regex('^foo')
>>> db.users.find_one({"files": regx})
Regex just stores the string without trying to compile it, so find_one can then detect the argument as a 'Regex' type and form the appropriate Mongo query.
I feel this way is slightly more Pythonic than the other top answer, e.g.:
>>> db.collectionname.find({'files':{'$regex':'^File'}})
It's worth reading up on the bson Regex documentation if you plan to use regex queries because there are some caveats.
The solution using re doesn't use the index at all.
You should use commands like:
db.collectionname.find({'files':{'$regex':'^File'}})
(I cannot comment below their replies, so I am replying here.)