Elasticsearch with MongoDB: Searching PDFs

I am trying to store my PDF files in MongoDB's GridFS and then search inside those PDFs using Elasticsearch. I performed the following steps:
1) MongoDB side:
mongod --port 27017 --replSet rs0 --dbpath "D:\Mongo-DB\mongodb-win32-i386-2.0.7\data17"
mongod --port 27018 --replSet rs0 --dbpath "D:\Mongo-DB\mongodb-win32-i386-2.0.7\data18"
mongod --port 27019 --replSet rs0 --dbpath "D:\Mongo-DB\mongodb-win32-i386-2.0.7\data19"
mongo localhost:27017
rs.initiate()
rs.add("hostname:27018")
rs.add("hostname:27019")
mongofiles -hlocalhost:27017 --db testmongo --collection files --type application/pdf put D:\Sherlock-Holmes.pdf
2) Elasticsearch side (installed plugins: bigdesk/head/mapper-attachments/river-mongodb)
Using the Elasticsearch Head plugin, I issued the following request from the "Any request" tab:
PUT http://localhost:9200/_river/mongodb/_meta
{
"type": "mongodb",
"mongodb": {
"db": "testmongo",
"collection": "fs.files",
"gridfs": true,
"contentType": "",
"content": "base64 /path/filename | perl -pe 's/\n/\\n/g'"
},
"index": {
"name": "testmongo",
"type": "files",
"content_type": "application/pdf"
}
}
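The content field above is meant to hold the file's base64-encoded bytes (the shell pipeline shown in the config is a hint for producing that value, not literal configuration). A minimal Python sketch of producing newline-free base64, which is what an attachment-type field generally expects; the field name "content" is taken from the config above:

```python
import base64

def file_to_base64(path):
    """Read a file and return its contents as a newline-free base64
    string, suitable for an attachment-type "content" field."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Demonstrate on in-memory bytes (no real PDF needed):
encoded = base64.b64encode(b"%PDF-1.4 example bytes").decode("ascii")
```

Note that base64.b64encode emits no newlines to begin with, so the perl substitution step from the config comment is unnecessary here.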
Now, when I access the following URL:
http://localhost:9200/testmongo/files/508e82e21e43def09b5e1602?pretty=true
I get the following response (which I believe is as expected):
{
"_index" : "testmongo",
"_type" : "files",
"_id" : "508e82e21e43def09b5e1602",
"_version" : 1,
"exists" : true, "_source" : {"_id":"508e82e21e43def09b5e1602","filename":"D:\\Sherlock-Holmes.pdf","chunkSize":262144,"uploadDate":"2012-10-29T13:21:38.969Z","md5":"025fa2046f9254d2aecb9e52ae851065","length":98272,"contentType":"application/pdf"}
}
But when I try to search this PDF using the following URL:
http://localhost:9200/testmongo/files/_search?q=Albers&pretty=true
it gives me the following result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
It shows no hits, even though the word "Albers" is present in the PDF. Please help. Thanks in advance.

I think you have to specify the property to be searched:
http://localhost:9200/testmongo/files/_search?q=<PROPERTYNAME>:Albers&pretty=true
or, for more complex searches:
$ curl -XPOST 'http://localhost:9200/testmongo/files/_search' -d '{
  "query": {
    "match": { "<PROPERTYNAME>": "value" }
  }
}'
But as far as I know, you can only search properties that were defined when your data was indexed.
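As a sketch, the query-string form of a field-qualified search can be built like this. The field name content is an assumption here; the actual field holding the extracted text depends on your river/attachment mapping:

```python
from urllib.parse import urlencode

def search_url(base, index, doc_type, field, term):
    """Build a field-qualified Elasticsearch query-string search URL."""
    params = urlencode({"q": "%s:%s" % (field, term), "pretty": "true"})
    return "%s/%s/%s/_search?%s" % (base, index, doc_type, params)

url = search_url("http://localhost:9200", "testmongo", "files",
                 "content", "Albers")
```

urlencode percent-escapes the colon, which Elasticsearch decodes back into the field:term syntax.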

Related

How to output the result to a file in MongoDB

I want to list all databases in MongoDB and output them to a txt file, but the following did not work:
mongo 127.0.0.1/test -eval 'var c= show databases;' >>db_list.txt
the error message is
MongoDB shell version: 2.6.12
connecting to: 127.0.0.1/test
2016-12-06T12:12:32.456-0700 SyntaxError: Unexpected identifier
Does anyone know how to make this work? I appreciate any help.
To list databases directly from the shell with --eval, the following command should help:
mongo test --eval "printjson(db.adminCommand('listDatabases'))"
MongoDB shell version: 3.2.10
connecting to: test
{
"databases" : [
{
"name" : "local",
"sizeOnDisk" : 73728,
"empty" : false
},
{
"name" : "m034",
"sizeOnDisk" : 11911168,
"empty" : false
},
{
"name" : "test",
"sizeOnDisk" : 536576,
"empty" : false
}
],
"totalSize" : 12521472,
"ok" : 1
}
This will list all the collection names in a particular DB.
mongo test --eval "printjson(db.getCollectionNames())"
MongoDB shell version: 3.2.10
connecting to: test
[
"aaa",
"areamodel",
"email",
"hex",
"key",
"mel",
"multi",
"ques",
"rich"
]
Alternatively, you can run a query from a script file:
mongo db_name query.js > out.json
Here query.js contains any query, for example:
printjson( db.adminCommand('listDatabases') )
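If you'd rather post-process the output than eyeball it, the listDatabases document shown above reduces to just the names with a few lines of Python (a sketch; it assumes the output was saved as plain JSON, which holds for this particular command since its values are ordinary numbers and strings):

```python
import json

def database_names(listdatabases_json):
    """Extract database names from a listDatabases result document."""
    result = json.loads(listdatabases_json)
    return [db["name"] for db in result["databases"]]

# Abbreviated sample shaped like the output above:
sample = '''{
  "databases": [
    {"name": "local", "sizeOnDisk": 73728, "empty": false},
    {"name": "test", "sizeOnDisk": 536576, "empty": false}
  ],
  "totalSize": 610304,
  "ok": 1
}'''
```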

How to do custom mapping using mongo-connector with Elasticsearch

I want to connect MongoDB and Elasticsearch, and I used mongo-connector to do it. I followed the instructions at the link below to set things up:
http://vi3k6i5.blogspot.in/2014/12/using-elastic-search-with-mongodb.html
I am able to connect MongoDB and Elasticsearch, but by default mongo-connector creates indices in Elasticsearch for all databases in MongoDB. I want to create only one index, for my one database, and I want to insert only selected fields of the documents. For example, in the mongo shell:
use hotels
db.restaurants.insert(
{
"address" : {
"street" : "2 Avenue",
"zipcode" : "10075",
"building" : "1480",
"coord" : [ -73.9557413, 40.7720266 ],
},
"borough" : "Manhattan",
"cuisine" : "Italian",
"grades" : [
{
"date" : ISODate("2014-10-01T00:00:00Z"),
"grade" : "A",
"score" : 11
},
{
"date" : ISODate("2014-01-16T00:00:00Z"),
"grade" : "B",
"score" : 17
}
],
"name" : "Vella",
"restaurant_id" : "41704620"
}
)
This creates the database hotels and the collection restaurants. Now I want to create an index and put only the address field into Elasticsearch for that index.
Below are the steps I tried, but they are not working.
First I started mongo-connector like below:
Imomadmins-MacBook-Pro:~ jayant$ mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager --oplog-ts oplogstatus.txt
Logging to mongo-connector.log.
Then, from a new shell tab, I ran commands like:
curl -XPUT 'http://localhost:9200/hotels.restaurants/'
curl -XPUT "http://localhost:9200/hotels.restaurants/string/_mapping" -d '{
"string": {
"properties" : {
"address" : {"type" : "string"}
}
}
}'
But only the index named hotels.restaurants is created in Elasticsearch; I can't see any documents in it.
Please suggest how to get documents into hotels.restaurants.
Well, I found the answer to my question: when starting mongo-connector, we can specify the namespace and the list of fields we are interested in. See the command below:
$ mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager --oplog-ts oplogstatus.txt --namespace-set hotels.restaurants --fields address,grades,name
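Conceptually, the --fields option keeps only the listed top-level fields of each replicated document. A toy Python sketch of that projection, using field names from the restaurant example above (mongo-connector itself also always preserves _id; this sketch ignores that detail):

```python
def project_fields(doc, fields):
    """Keep only the whitelisted top-level fields of a document,
    the way --fields restricts what gets replicated."""
    return {k: v for k, v in doc.items() if k in fields}

restaurant = {
    "address": {"street": "2 Avenue", "zipcode": "10075"},
    "borough": "Manhattan",
    "cuisine": "Italian",
    "name": "Vella",
    "restaurant_id": "41704620",
}
# Fields missing from a document (here: grades) are simply skipped.
subset = project_fields(restaurant, {"address", "grades", "name"})
```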

Error when using mongorestore to replay an oplog with a BinData field

When using mongorestore with the --oplogReplay option, I found a strange bug: mongorestore cannot correctly handle a $set operation on a BinData field. You may hit the same error if you do the following.
Insert a test document:
db.testData.insert({_id: 10000, data: BinData(0, ""), size: 10})
Update its BinData field:
db.testData.update({_id: 10000}, {$set: {data: BinData(0, "CgxVfs93PiT/DrxMSvASFgoNMTAuMTYwLjIyMi4xMhDEJxgKIAA=")}})
Update another field:
db.testData.update({_id: 10000}, {$set: {size: 20}})
Check the oplog:
use local
db.oplog.rs.find().sort({$natural: -1})
You may see the following response:
{ "ts" : Timestamp(1435627154, 1), "h" : NumberLong("-4979206321598144076"), "v" : 2, "op" : "u", "ns" : "test.testData", "o2" : { "_id" : 10000 }, "o" : { "$set" : { "size" : 20 } } }
{ "ts" : Timestamp(1435627144, 1), "h" : NumberLong("2899524097634687825"), "v" : 2, "op" : "u", "ns" : "test.testData", "o2" : { "_id" : 10000 }, "o" : { "$set" : { "data" : BinData(0,"CgxVfs93PiT/DrxMSvASFgoNMTAuMTYwLjIyMi4xMhDEJxgKIAA=") } } }
{ "ts" : Timestamp(1435627136, 1), "h" : NumberLong("-8486373688715225152"), "v" : 2, "op" : "i", "ns" : "test.testData", "o" : { "_id" : 10000, "data" : BinData(0,""), "size" : 10 } }
Dump these two oplog entries and replay them. In a bash shell:
mongodump --port 27017 -d local -c oplog.rs --query '{"ts" : {$gte: Timestamp(1435627144, 1)}}' -o ./oplogD/
mv ./oplogD/local/oplog.rs.bson ./oplogR/oplog.bson
mongorestore --port 27017 --oplogReplay ./oplogR/
After this, the data is not as expected. In my case, it changed to this:
{ "_id" : 10000, "data" : BinData(0,"ADRAAAAAPiT/DrxMSvASFgoNMTAuMTYwLjIyMi4xMhDEJxgKIAA="), "size" : 20 }
The size field is correct, but the data field is not.
The strangest thing is this: if you dump only one oplog entry and replay it, the data is correct.
mongodump --port 27017 -d local -c oplog.rs --query '{"ts" : Timestamp(1435627144, 1)}' -o ./tmpD/
mv ./tmpD/local/oplog.rs.bson ./tmpR/oplog.bson
mongorestore --port 27017 --oplogReplay ./tmpR/
After the oplog is replayed, the data field is correct:
{ "_id" : 10000, "data" : BinData(0,"CgxVfs93PiT/DrxMSvASFgoNMTAuMTYwLjIyMi4xMhDEJxgKIAA="), "size" : 10 }
Why does this strange thing happen?
It was fixed in this commit:
https://github.com/mongodb/mongo-tools/commit/ed60bbfae7d2b5239bea69f162f0784e17995e91
You can trace the bug report in JIRA:
https://jira.mongodb.org/browse/TOOLS-807
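For reference, applying the oplog entries above in timestamp order should leave both fields intact. A toy Python sketch of the intended replay semantics (this illustrates what correct replay should produce, not how mongorestore is implemented; BinData is stood in for by its base64 string):

```python
def replay_oplog(entries):
    """Apply insert ('i') and $set-update ('u') oplog entries in
    timestamp order, keyed by _id, and return the resulting docs."""
    docs = {}
    for entry in sorted(entries, key=lambda e: e["ts"]):
        if entry["op"] == "i":
            docs[entry["o"]["_id"]] = dict(entry["o"])
        elif entry["op"] == "u":
            docs[entry["o2"]["_id"]].update(entry["o"]["$set"])
    return docs

B64 = "CgxVfs93PiT/DrxMSvASFgoNMTAuMTYwLjIyMi4xMhDEJxgKIAA="
entries = [
    {"ts": 1435627136, "op": "i",
     "o": {"_id": 10000, "data": "", "size": 10}},
    {"ts": 1435627144, "op": "u",
     "o2": {"_id": 10000}, "o": {"$set": {"data": B64}}},
    {"ts": 1435627154, "op": "u",
     "o2": {"_id": 10000}, "o": {"$set": {"size": 20}}},
]
docs = replay_oplog(entries)
```

The expected end state matches the question's single-entry replay: data holds the full base64 payload and size is 20.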

Authentication failed from the HTTP interface and Robomongo

Edit:
Bad news: Robomongo 0.8.x doesn't support SCRAM-SHA-1 (https://github.com/paralect/robomongo/issues/766). The good news is that support is promised for v0.9, which they're working hard on.
The HTTP interface in MongoDB 3.0 also doesn't work with SCRAM-SHA-1 user documents, because "(it) is generally considered insecure" (https://jira.mongodb.org/browse/SERVER-17527).
I've just set up a MongoDB 3.0 replica set, enabled authentication, and created a userAdminAnyDatabase admin and a normal readWrite user.
./mongod --dbpath=/usr/local/mongo/mongodb/data/data1 --logpath=/usr/local/mongo/mongodb/logs/log1/mongodb.log --port 27017 --replSet jv_mongo --smallfiles --fork --rest --httpinterface --keyFile /usr/local/mongo/mongodb/key/mongodb.pem
./mongod --dbpath=/usr/local/mongo/mongodb/data/data2 --logpath=/usr/local/mongo/mongodb/logs/log2/mongodb.log --port 27018 --replSet jv_mongo --smallfiles --fork --rest --httpinterface --keyFile /usr/local/mongo/mongodb/key/mongodb.pem
./mongod --dbpath=/usr/local/mongo/mongodb/data/data3 --logpath=/usr/local/mongo/mongodb/logs/log3/mongodb.log --port 27019 --replSet jv_mongo --smallfiles --fork --rest --httpinterface --keyFile /usr/local/mongo/mongodb/key/mongodb.pem
jv_mongo:PRIMARY> use admin
switched to db admin
jv_mongo:PRIMARY> db.getUser("mongoAdmin");
{
"_id" : "admin.mongoAdmin",
"user" : "mongoAdmin",
"db" : "admin",
"roles" : [
{
"role" : "userAdminAnyDatabase",
"db" : "admin"
}
]
}
jv_mongo:PRIMARY> use comment
switched to db comment
jv_mongo:PRIMARY> db.getUser("comment");
{
"_id" : "comment.comment",
"user" : "comment",
"db" : "comment",
"roles" : [
{
"role" : "readWrite",
"db" : "comment"
}
]
}
And I can access the shell without any problem:
./mongo --port 27017 -u mongoAdmin -p PASSWORD --authenticationDatabase admin
./mongo --port 27017 -u comment -p PASSWORD --authenticationDatabase comment
jv_mongo:PRIMARY> db.user_login.find();
{ "_id" : ObjectId("5506a9de41e1073435ff06b3"), "id" : NumberLong(2), "user_id" : 9527, "login_time" : ISODate("2015-03-16T10:01:02.378Z"), "login_ip" : "127.0.0.1" }
{ "_id" : ObjectId("5506a9de41e1073435ff06b4"), "id" : NumberLong(3), "user_id" : 9538, "login_time" : ISODate("2015-03-16T10:01:02.380Z"), "login_ip" : "127.0.0.1" }
{ "_id" : ObjectId("5506a9de41e1073435ff06b5"), "id" : NumberLong(4), "user_id" : 9549, "login_time" : ISODate("2015-03-16T10:01:02.382Z"), "login_ip" : "127.0.0.1" }
I also successfully accessed Mongo via the Java driver, but I receive an authentication failure when trying Robomongo or http://192.168.106.152:28017.
I'm not very familiar with Mongo or Mongo 3.0; maybe I'm missing some key configuration?
Use MongoChef; it works with MongoDB 3.0+.

MongoDB replica set preventing queries to secondary

To set up the replica set, I've run in 3 separate terminal tabs:
$ sudo mongod --replSet rs0 --dbpath /data/mining --port 27017
$ sudo mongod --replSet rs0 --dbpath /data/mining2 --port 27018
$ sudo mongod --replSet rs0 --dbpath /data/mining3 --port 27019
Then, I configured replication in the Mongo shell and verified that it worked:
> var rsconf = {
_id: "rs0",
members: [
{
_id: 0,
host: 'localhost:27017'
},
{
_id: 1,
host: 'localhost:27018'
},
{
_id: 2,
host: 'localhost:27019'
}
]
};
> rs.initiate(rsconf);
{
"info": "Config now saved locally. Should come online in about a minute.",
"ok": 1
}
// Some time later...
> rs.status()
{
"set": "rs0",
"date": ISODate("2013-06-17T13:23:45-0400"),
"myState": 2,
"syncingTo": "localhost:27017",
"members": [
{
"_id": 0,
"name": "localhost:27017",
"health": 1,
"state": 1,
"stateStr": "PRIMARY",
"uptime": 4582,
"optime": {
"t": 1371489546,
"i": 1
},
"optimeDate": ISODate("2013-06-17T13:19:06-0400"),
"lastHeartbeat": ISODate("2013-06-17T13:23:44-0400"),
"lastHeartbeatRecv": ISODate("2013-06-17T13:23:44-0400"),
"pingMs": 0
},
{
"_id": 1,
"name": "localhost:27018",
"health": 1,
"state": 2,
"stateStr": "SECONDARY",
"uptime": 5034,
"optime": {
"t": 1371489546,
"i": 1
},
"optimeDate": ISODate("2013-06-17T13:19:06-0400"),
"self": true
},
{
"_id": 2,
"name": "localhost:27019",
"health": 1,
"state": 2,
"stateStr": "SECONDARY",
"uptime": 4582,
"optime": {
"t": 1371489546,
"i": 1
},
"optimeDate": ISODate("2013-06-17T13:19:06-0400"),
"lastHeartbeat": ISODate("2013-06-17T13:23:44-0400"),
"lastHeartbeatRecv": ISODate("2013-06-17T13:23:45-0400"),
"pingMs": 0,
"syncingTo": "localhost:27017"
}
],
"ok": 1
}
My script runs fine against the primary:
$ ./runScripts.sh -h localhost -p 27017
MongoDB shell version: 2.4.3
connecting to: localhost:27017/test
Successful completion
However, against either secondary:
$ ./runScripts.sh -h localhost -p 27018
MongoDB shell version: 2.4.3
connecting to: localhost:27017/test
Mon Jun 17 13:30:22.989 JavaScript execution failed: count failed:
{ "note" : "from execCommand", "ok" : 0, "errmsg" : "not master" }
at src/mongo/shell/query.js:L180
failed to load: /.../.../myAggregateScript.js
I've read in multiple places to use rs.slaveOk() or db.getMongo().setSlaveOk(), but neither of these had any effect, whether entered from the shell or called in my script. These statements did not throw errors when called, but they didn't fix the problem, either.
Does anyone know why I can't configure my replset to allow querying of the secondary?
rs.slaveOk() run in the mongo shell will allow you to read from secondaries. Here is a demonstration using the mongo shell under MongoDB 2.4.3:
$ mongo --port 27017
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27017/test
replset:PRIMARY> db.foo.save({})
replset:PRIMARY> db.foo.find()
{ "_id" : ObjectId("51bf5dbd473d5e80fc095b17") }
replset:PRIMARY> exit
$ mongo --port 27018
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27018/test
replset:SECONDARY> db.foo.find()
error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
replset:SECONDARY> rs.slaveOk()
replset:SECONDARY> db.foo.find()
{ "_id" : ObjectId("51bf5dbd473d5e80fc095b17") }
replset:SECONDARY> db.foo.count()
1
You have to run the command rs.slaveOk() in the secondary server's shell.
Use the following to run queries on your MongoDB secondary:
db.getMongo().setReadPref('secondary')
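Under the hood, a read preference simply changes which replica-set members the driver considers for reads. A toy sketch of that selection, using the member states from the rs.status() output above (state 1 = PRIMARY, state 2 = SECONDARY); real drivers also weigh latency, tags, and staleness:

```python
def eligible_members(members, read_pref):
    """Return member names eligible for reads under a read preference.
    state 1 is PRIMARY, state 2 is SECONDARY (as in rs.status())."""
    if read_pref == "primary":
        wanted = {1}
    elif read_pref == "secondary":
        wanted = {2}
    else:  # modes like secondaryPreferred involve fallback logic
        wanted = {1, 2}
    return [m["name"] for m in members
            if m["state"] in wanted and m["health"] == 1]

members = [
    {"name": "localhost:27017", "state": 1, "health": 1},
    {"name": "localhost:27018", "state": 2, "health": 1},
    {"name": "localhost:27019", "state": 2, "health": 1},
]
```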