Pig & MongoDB - How to load document nested fields with MongoLoader?

I have a set of documents stored in MongoDB that look like this:
{
"_id" : { "$oid" : "5201ca52ddf19f9c7aea0bb2"} ,
"id" : 1 ,
"path" : "C://..." ,
"experiences" : [
{ "id" : "1", "date" : "12/2012", "content" : "blabla" },
{ "id" : "2", "date" : "12/2013", "content" : "blabla2" }
]
}
I would like to process the "experiences" fields of these documents to obtain an output like this:
(1,1,12/2012,blabla)
(1,2,12/2013,blabla2)
The schema is (document_id,exp_id,exp_date,exp_content).
I'm loading the documents via Pig and MongoLoader; here is my code:
REGISTER /root/mongo-2.10.1.jar
REGISTER /root/pig_librairies/mongo-hadoop_cdh4.3.0-1.1.0.jar
REGISTER /root/pig_librairies/mongo-hadoop-pig_cdh4.3.0-1.1.0.jar
REGISTER /root/pig_librairies/mongo-hadoop-core_cdh4.3.0-1.1.0.jar
persons = LOAD 'mongodb://localhost/gestion_competences.cv'
USING com.mongodb.hadoop.pig.MongoLoader('id:chararray, path:chararray, experiences:chararray')
AS (id, path, experiences);
I know the problem is here:
experiences:chararray
but I don't know what structure I could use. I tried bags and maps and it doesn't work...
Do you have an idea how to solve the problem?
Thanks

Try experiences:map[]; you can then access values by key, e.g. experiences#'content'.

Related

MongoDB Find and Modify With GraphQL

I am working on a GraphQL mutation and need help here. My document looks like this:
{
"_id" : ObjectId("5bc02db357146d0c385d4988"),
"item_type" : "CategoryMapping",
"id" : null,
"CategoryGroupName" : "Mystries & Thriller",
"CustomCategory" : [
{
"name" : "Private Investigator",
"MappedBisacs" : [
"investigator",
"Privately owned",
"Secret"
]
},
{
"name" : "Crime Investigator",
"MappedBisacs" : [
"crime investigator",
"crime thriller"
]
}
]
}
UI - The user can update MappedBisacs through a list of checkboxes, so they can add, update or delete entries in the list of bisacs.
Problem - When the client sends a GraphQL query like the following:
mutation {
CategoryMapping_add(input: {CategoryGroupName: "Mystries & Thriller", CustomCategory: [{name: "Crime Investigator", MappedBisacs: ["investigator", "dafdfdaf", "dafsdf"]}]}) {
clientMutationId
}
}
I need to find the specific custom category and update its bisac array.
I am not sure I fully understood, but this is more a MongoDB question than a GraphQL one. First you must find the document you want (I would use the id of the document instead of CategoryGroupName); then you can update the array in several ways. For example, after you have found the document, you could read the array content, spread it into a new array together with the new data from your mutation, and save the object with the update method (if you simply want to add new data without removing any).
So, it depends on the case.
Check: https://docs.mongodb.com/manual/reference/operator/update-array/
Hope it helps! :)
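The in-memory approach described above (find the category, spread its array into a new one with the incoming values, then save) could be sketched roughly as follows. This is only a sketch in plain JavaScript: mergeMappedBisacs is a hypothetical helper name, dropping duplicates via Set is an assumption about the desired behaviour, and the step that writes the result back to MongoDB is left out.

```javascript
// Hypothetical helper: returns a copy of the document in which the named
// custom category has the incoming bisacs appended (duplicates dropped).
function mergeMappedBisacs(doc, categoryName, incoming) {
  return {
    ...doc,
    CustomCategory: doc.CustomCategory.map((category) =>
      category.name === categoryName
        ? {
            ...category,
            // Spread old and new values into one array; Set removes repeats.
            MappedBisacs: [...new Set([...category.MappedBisacs, ...incoming])],
          }
        : category
    ),
  };
}

// Trimmed-down version of the document from the question.
const categoryDoc = {
  CategoryGroupName: "Mystries & Thriller",
  CustomCategory: [
    { name: "Private Investigator", MappedBisacs: ["investigator", "Privately owned", "Secret"] },
    { name: "Crime Investigator", MappedBisacs: ["crime investigator", "crime thriller"] },
  ],
};

const mergedDoc = mergeMappedBisacs(categoryDoc, "Crime Investigator", [
  "investigator",
  "dafdfdaf",
  "dafsdf",
]);
console.log(mergedDoc.CustomCategory[1].MappedBisacs);
```

If the checkbox list is meant to replace the stored array rather than extend it, the spread can be swapped for the incoming array alone. The positional $ operator described on the linked update-array page is one way to target the matched array element directly on the server instead of rewriting the whole document.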

How to change a field & add a sub-field in Mongodb

I have mongo documents like the following:
{
"_id" : ObjectId("591e36eb2cdce09936d8e511"),
"bday" : null,
"studentId" : "000004"
}
I want to change it to:
{
"_id" : ObjectId("591e36eb2cdce09936d8e511"),
"bday" : null,
"studentId" : {
"id" : "000004"
}
}
I have tried the following:
db.persons.update({"studentId": "000004"},{$set : {"studentId.id": "000004"}},false,true)
But it is giving an error:
LEFT_SUBFIELD only supports Object: studentId not: 2
Can anyone help?
Thanks,
Sumit.
You see the error because "studentId" is a string and you are trying to add a property inside it. To fix it, set an entirely new object, like this:
db.persons.update({"studentId":"000004"},{$set:{"studentId":{"id":"000004"}}})
Another option, if for example you need to change the structure for the entire collection, is to iterate over each document and update it, like this:
db.persons.find({}).forEach(function (item) {
    // Skip documents that have no studentId to wrap
    if (item.studentId != null) {
        item.studentId = { id: item.studentId };
        db.persons.save(item);
    }
});
I hope this will help
db.collection.update({_id:ObjectId("591e36eb2cdce09936d8e511")},{$set:{"studentId":{"id":"000004"}}})
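The per-document conversion used in the loop above can be tried outside the shell on plain objects. The following is a minimal sketch, assuming studentId is either a string, null, or an already converted object; wrapStudentId is a hypothetical name, and the typeof check (an addition that is not in the answers) keeps the function from wrapping a value twice if it is run again.

```javascript
// Hypothetical helper mirroring the body of the forEach loop: wrap a string
// studentId in an object and leave everything else untouched.
function wrapStudentId(item) {
  if (typeof item.studentId === "string") {
    return { ...item, studentId: { id: item.studentId } };
  }
  return item;
}

const students = [
  { bday: null, studentId: "000004" },
  { bday: null, studentId: null },
  { bday: null, studentId: { id: "000007" } }, // already converted
];

const converted = students.map(wrapStudentId);
console.log(JSON.stringify(converted));
```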

Searching with dynamic field name in MongoDB

I have a situation where records in MongoDB look like this:
{
"_id" : "xxxx",
"_class" : "xxxx",
"orgId" : xxx,
"targetKeyToOrgIdMap" : {
"46784_56139542ecaa34c13ba9e314" : 46784,
"47530_562f1bc5fc1c1831d38d1900" : 47530,
"700004280_56c18369fc1cde1e2a017afc" : 700004280
},
}
I have to find the records where the child nodes of targetKeyToOrgIdMap have a particular value. That is, I know the value that will appear in the record, e.g. the 46784 in "46784_56139542ecaa34c13ba9e314" : 46784. The field name is variable: it is a combination of the value and some random string.
In the example above, I have 46784, and I need to find all the records that have 46784 as a value in that field.
Is there any way to use a regex, or any other means, to get the records that have the value I need in the child nodes of targetKeyToOrgIdMap?
Thanks in advance
You could use MongoDB's $where like this:
db.myCollection.find({ $where: function () {
    // "this" is the document being tested
    for (var key in this.targetKeyToOrgIdMap) {
        if (this.targetKeyToOrgIdMap[key] == 46784) {
            return true;
        }
    }
    return false;
}}).forEach(function (doc) {
    printjson(doc);
});
But be aware that this requires a full collection scan, with the function executed for each document. See the documentation.
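The check performed inside the $where function amounts to "does any value of the embedded map equal the target". A minimal sketch of that predicate in plain JavaScript, run against in-memory sample data rather than a real collection (hasOrgId and the sample records are hypothetical, and strict equality is used here where the answer uses ==):

```javascript
// Hypothetical predicate: true if any value in targetKeyToOrgIdMap equals orgId.
function hasOrgId(record, orgId) {
  const map = record.targetKeyToOrgIdMap || {};
  return Object.values(map).some((value) => value === orgId);
}

const records = [
  {
    _id: "a",
    targetKeyToOrgIdMap: {
      "46784_56139542ecaa34c13ba9e314": 46784,
      "47530_562f1bc5fc1c1831d38d1900": 47530,
    },
  },
  { _id: "b", targetKeyToOrgIdMap: { "700004280_56c18369fc1cde1e2a017afc": 700004280 } },
  { _id: "c" }, // no map at all
];

const matches = records.filter((record) => hasOrgId(record, 46784));
console.log(matches.map((record) => record._id));
```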

how to find a value of a document nested in another document

Hello, I'm new to MongoDB and I have a question that I haven't found an answer to yet.
I would like to know how to find all the users that are younger than a certain age. db.getCollection('data').find({age:{$lt:50}}) is not working.
I would also like to know how to extract all the users' emails to a CSV file.
Regards,
Nati
// 'data' is the collection
// The data looks like this:
db.getCollection('data').find({}) :
/* 1 */
{
"_id" : "8f911",
"userDetails" : { "age" : "19",
"birthday" : "1996/5/11"
},
"username" : "emailemail#do.com"
}
/* 2 */
.
.
.
.
age is nested inside userDetails. Can you try:
db.getCollection('data').find({"userDetails.age":{$lt:50}})
Since age is stored as a string, you can use a JavaScript expression for the query. It will do the type conversion:
db.getCollection('data').find({$where: "this.userDetails.age < 50"})
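The JavaScript expression works because the comparison converts a numeric string to a number first. A small sketch on in-memory sample documents shows the effect (the second user and the isYoungerThan helper are made up for illustration):

```javascript
// Hypothetical helper: compare a string-typed age against a numeric limit.
function isYoungerThan(user, limit) {
  // Number("19") is 19; a non-numeric string becomes NaN and never matches.
  return Number(user.userDetails.age) < limit;
}

const users = [
  { _id: "8f911", userDetails: { age: "19", birthday: "1996/5/11" }, username: "emailemail#do.com" },
  { _id: "x0001", userDetails: { age: "63", birthday: "1952/1/2" }, username: "other#do.com" },
];

const young = users.filter((user) => isYoungerThan(user, 50));
console.log(young.map((user) => user._id));
```

Storing age as a number in the first place would let the plain $lt query on "userDetails.age" work without any JavaScript evaluation.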

Query object within object

I'm using passport.js to store my users into my mongodb. A user object looks like this
{
"_id" : ObjectId("54893faf0907a100006341ee"),
"local" : {
"password" : [encrypted password],
"email" : "johnsmith#domain.com"
},
"__v" : 0
}
In a MongoDB shell, how would I go about listing all the emails? I'm finding it difficult because my data sits two levels deep within the object. Cheers!
You can use distinct to get a list of a field's distinct values in the collection, using dot notation to reference the embedded field:
db.users.distinct('local.email')
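To illustrate what distinct with dot notation returns, here is a rough in-memory equivalent in plain JavaScript. It is only a sketch: getPath and distinctValues are hypothetical helpers, the sample users are invented, and the real distinct command handles cases (such as array fields) that this ignores.

```javascript
// Hypothetical helper: follow a dotted path such as "local.email" into an object.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), obj);
}

// Hypothetical helper: collect the unique defined values found at the path.
function distinctValues(docs, path) {
  const values = docs
    .map((doc) => getPath(doc, path))
    .filter((value) => value !== undefined);
  return [...new Set(values)];
}

const userDocs = [
  { local: { email: "johnsmith#domain.com" }, __v: 0 },
  { local: { email: "janedoe#domain.com" }, __v: 0 },
  { local: { email: "johnsmith#domain.com" }, __v: 0 },
  { __v: 0 }, // no local sub-document
];

const emails = distinctValues(userDocs, "local.email");
console.log(emails);
```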