NULL fields document model - nosql

I'm new to Elasticsearch and the NoSQL document model in general.
In Elasticsearch, what is the best practice for fields that are null?
Should we declare the field as null or leave it out completely?
Example: the email field
[{
id:1,
name:"John",
email:"userone#erj.com"
},
{
id:2,
name:"David",
email:null
},
{
id:3,
name:"Tony"
}]

What you do with the null field is completely up to you. By default, ES ignores a null field: it is not indexed, so it cannot be searched. If you want, you can specify a default value for the null field in the mapping with null_value, which is indexed in place of an explicit null.
Mapping example:
{
  "_default_" : {
    "properties" : {
      "id" : {"type" : "string", "index" : "not_analyzed"},
      "name" : {"type" : "string"},
      "email" : {"type" : "string", "index" : "not_analyzed", "null_value" : "NOEMAILAVAILABLE"}
    }
  }
}
The potential ways to handle this are outlined here: http://www.elasticsearch.org/guide/reference/mapping/core-types.html
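To make the difference between the three example documents concrete, here is a minimal plain-JavaScript sketch (not Elasticsearch code) of the behaviour described above. It assumes, as the Elasticsearch documentation describes, that null_value only replaces an explicit null and that a field left out of the document stays absent. The helper name indexedValue is hypothetical.

```javascript
// Hypothetical helper: which value would be indexed for `field`,
// given the placeholder configured as null_value in the mapping?
function indexedValue(doc, field, nullValue) {
  if (!(field in doc)) return undefined;      // field missing: nothing is indexed
  if (doc[field] === null) return nullValue;  // explicit null: placeholder is indexed
  return doc[field];                          // normal value: indexed as-is
}

// The three documents from the question
const docs = [
  { id: 1, name: "John", email: "userone#erj.com" },
  { id: 2, name: "David", email: null },
  { id: 3, name: "Tony" },
];

const indexed = docs.map((d) => indexedValue(d, "email", "NOEMAILAVAILABLE"));
console.log(indexed);
```

Under these assumptions only David's document becomes searchable by the placeholder. Tony's document, which omits the field, would not match a search for NOEMAILAVAILABLE, so the choice between null and leaving the field out matters once null_value is configured.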

Related

Mongodb query to return field value

I am trying to construct a Mongodb query to return a field value. My JSON looks like this:
"question" : "Global_Deployment",
"displayOrder" : 1,
"answerOptions" : {
"fieldId" : "1001",
"fieldType" : "radiobutton",
"fieldName" : "Global Deployment?",
"fieldLabel" : "Global Deployment?",
"helpText" : "Help will go here",
"emailTagFormControl" : "Global_Deployment?",
"source" : "custom",
"status" : "active",
"required" : "true",
"multiSelect" : "false",
"purgeFlag" : "false",
"enableAuditTrack" : "false",
"fields" : [],
"fieldValue" : "Yes",
"options" : [
{
"optionName" : "Yes"
},
{
"optionName" : "No"
}
],
"comments" : {
"commentId" : "C1001",
"commentDetails" : []
}
My query to reach the field with the fieldName "Global Deployment?" is this:
db.getCollection('requests').find({"sections.questions.answerOptions.fieldName":"Global Deployment?"})
What I want to know is what to add to this query to return the value of "fieldValue", which is on a different line in the JSON. I am new to Mongodb. Any help would be greatly appreciated.
1) If multiple documents in the DB have "fieldName" : "Global Deployment?", then .find() returns all the matching documents. The output is an array of documents, and you need to iterate through it to get answerOptions.fieldValue for each one. This can happen whenever "sections.questions.answerOptions.fieldName" is not a unique field. Check the scenario below:
db.getCollection('requests').find({"sections.questions.answerOptions.fieldName":"Global Deployment?"}, {'sections.questions.answerOptions.fieldValue':1})
Output of find :
/* 1 */
[{
"_id" : ObjectId("5d4e19826e173840500f5674"),
"answerOptions" : {
"fieldValue" : "Yes"
}
},
/* 2 */
{
"_id" : ObjectId("5d4e19826e073840500f5674"),
"answerOptions" : {}
}]
If you only need the documents that have a fieldValue, do this:
db.getCollection('requests').find({"sections.questions.answerOptions.fieldName":"Global Deployment?", 'sections.questions.answerOptions.fieldValue':{$exists: true}}, {'sections.questions.answerOptions.fieldValue':1})
Now that you have an array of documents, iterate through each one to retrieve your value; see the MongoDB cursor tutorial.
2) If you think fieldName is unique across the collection, you can use .findOne(), which returns exactly one document (if there are multiple matching documents, it returns the first one found):
db.getCollection('requests').findOne({"sections.questions.answerOptions.fieldName":"Global Deployment?"}, {'sections.questions.answerOptions.fieldValue':1})
Output of findOne :
{
"_id" : ObjectId("5d4e19826e173840500f5674"),
"answerOptions" : {
"fieldValue" : "Yes"
}
}
As you can see, .find({}, {}) takes two arguments. The second one is called the projection, and it is useful when you want only specific fields in the response. By default, MongoDB returns the entire document, so everything you posted in the question would be retrieved. Data in MongoDB is handled as JSON-like documents, so working with results is similar to working with JSON: you could pull the required fields out of the full result yourself, but if you don't need the entire document, a projection makes better use of the network because only the required fields are returned.
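For comparison, here is a rough sketch of the client-side alternative mentioned above: pulling a dot-notation path out of a full document that has already been fetched. It is plain JavaScript with no database involved. The helper valuesAt is hypothetical, and the sample document assumes that sections and questions are arrays, which the question does not show.

```javascript
// Hypothetical helper: collect every value found at a dot-notation path,
// descending into arrays along the way.
function valuesAt(doc, path) {
  const [head, ...rest] = path.split(".");
  if (doc === null || typeof doc !== "object") return [];
  if (Array.isArray(doc)) return doc.flatMap((item) => valuesAt(item, path));
  if (!(head in doc)) return [];
  if (rest.length === 0) return [doc[head]];
  return valuesAt(doc[head], rest.join("."));
}

// Assumed shape of one document in the requests collection
const request = {
  sections: [
    {
      questions: [
        { answerOptions: { fieldName: "Global Deployment?", fieldValue: "Yes" } },
        { answerOptions: { fieldName: "Another question" } },
      ],
    },
  ],
};

const values = valuesAt(request, "sections.questions.answerOptions.fieldValue");
console.log(values);
```

This works, but the whole document still travels over the network first, which is the cost a projection avoids.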
You can specify a second condition, separated by a comma. Conditions can be combined with either $and or $or.
With simple approach:
{"sections.questions.answerOptions.fieldName":"Global Deployment?","sections.questions.answerOptions.fieldValue":"Yes" }
By using $and method:
.find(
{
$and: [
{"sections.questions.answerOptions.fieldName":"Global Deployment?"},
{"sections.questions.answerOptions.fieldValue":"Yes"}
]
}
)
You can use $or the same way: just replace $and with $or.
Edit:
If you want to retrieve a specific value (in your case fieldValue), the query would be:
db.getCollection('requests').find({
  "sections.questions.answerOptions.fieldName": "Global Deployment?"
}).map(function (item) {
  // Assumes sections and questions are embedded documents, not arrays
  return item.sections.questions.answerOptions.fieldValue;
})
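The correct answer here is the method .distinct() (docs). It is a collection method rather than a cursor method, so pass it the field path and the filter directly.
In your case try it like this:
db.getCollection('requests').distinct('sections.questions.answerOptions.fieldValue', {"sections.questions.answerOptions.fieldName":"Global Deployment?"});
That will return an array of the distinct fieldValue values found in the matching documents.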
If you use findOne, you can use dot notation on the returned document.
For example, start by creating a test collection that is close to your sample:
db.stackOverflow.insertOne({
sections: {
questions: {
question: "Global_Deployment",
displayOrder: 1,
answerOptions: {
fieldId: "1001",
fieldType: "radiobutton",
fieldName: "Global Deployment?",
fieldLabel: "Global Deployment?",
helpText: "Help will go here",
emailTagFormControl: "Global_Deployment?",
source: "custom",
status: "active",
required: "true",
multiSelect: "false",
purgeFlag: "false",
enableAuditTrack: "false",
fields: [],
fieldValue: "Yes",
options: [
{
optionName: "Yes",
},
{
optionName: "No",
},
],
comments: {
commentId: "C1001",
commentDetails: [],
},
},
},
},
})
Then this query will return "Yes":
db.stackOverflow.findOne({}).sections.questions.answerOptions.fieldValue

Elasticsearch how to join publications and keywords

I have defined two indexes in Elasticsearch that are populated by two different queries against a Postgres database. I have many hundreds of documents with thousands of keywords, and I have used Logstash to populate the two indexes.
The first index is called publication and is defined as follows:
"mappings" : {
"doc" : {
"properties" : {
"external_id" : {"type": "text" },
"title" : {"type": "text", "analyzer":"english" },
"description" : { "type" : "text", "analyzer":"english" }
}
}
}
The second index is called keyword and is defined as follows:
"mappings" : {
"doc" : {
"properties" : {
"publication_id" : {"type": "keyword" },
"keyword" : {"type": "keyword" }
}
}
}
The relationship between the two indexes is based on the external_id <-> publication_id.
I am trying to define other indexes in such a way that I can locate all the publications that have a specific keyword, or all the keywords that are defined for a specific publication.

Mongodb native validation for embedded documents

I'm trying to come up with a MongoDB native validation rule for a document that may contain an embedded document: either the embedded document is not present at all, or, if it is present, one or more of its fields are mandatory.
I have an example below. A JSON document has an embedded document user. This user may not exist, but when it exists it must have a field name.
"validator" : {
"$or" : [
{
"user" : {
"$exists" : "false",
"$type" : "null"
}
},
{
"user.name" : {
"$type" : "string",
"$exists" : "true"
}
}
]
}
And when I try to insert an empty JSON document to my collection testschema like db.testschema.insert({}), I get the standard error below which doesn't tell what is wrong and/or why. This should be possible because my document can either contain an embedded document user with a field name or not contain the embedded document at all.
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
Do the operators used inside the validator look correct?
First, true and false should not be passed as strings: the string "false" is truthy, so "$exists" : "false" behaves like "$exists" : true.
Solution: there is no need for the "$type" : "null" check. A field that does not exist has no type, so combining it with "$exists" : false can never match. "$exists" : false alone is enough for your case. The following validation will work for you:
"validator" : {
  "$or" : [
    {
      "user" : {
        "$exists" : false
      }
    },
    {
      "user.name" : {
        "$type" : "string",
        "$exists" : true
      }
    }
  ]
}
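As a quick way to reason about which documents the validator above should accept, here is a plain-JavaScript stand-in for the two $or branches. It is only a sketch of the logic, not MongoDB code: passesValidator is a hypothetical name, and the check ignores edge cases such as user being an array.

```javascript
// Hypothetical stand-in for the $or validator: a document passes when
// `user` is absent, or when `user.name` exists and is a string.
function passesValidator(doc) {
  // Branch 1: { "user": { "$exists": false } }
  const userMissing = !("user" in doc);
  // Branch 2: { "user.name": { "$type": "string", "$exists": true } }
  const nameIsString =
    doc.user !== null &&
    typeof doc.user === "object" &&
    typeof doc.user.name === "string";
  return userMissing || nameIsString;
}

console.log(passesValidator({}));                       // no user at all
console.log(passesValidator({ user: { name: "max" } })); // user with a name
console.log(passesValidator({ user: {} }));              // user without a name
```

Under this logic the empty document from the question is accepted, while a user without a name is rejected.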

MongoDB: insert embedded document directly from another collection

I have two collections:
// Profile
{
_id: "12345",
name: "max",
country: "IT"
}
// Association
{
_id: "43234",
idclub: "1000",
state: "0"
}
I want to insert a Profile into an Association without searching for it first.
In my code I search for an Association, but I don't have the Profile object at that moment; I just have its "id".
Is it possible to perform some kind of insert on collection A that retrieves the object from collection B on the fly, given its own ID?
And is this a common practice? Since I can't find anything about it, it doesn't seem to be the best way...
Thanks
Use the findAndModify method:
db.createCollection("Association");
db.Association.insert({ _id : "43234", idclub:"1000",state:"0"});
db.Association.findAndModify({
query:{ _id:"43234" },
update:{ $set:{ "profile":{ _id:"12345","name":"max","country":"IT" } } }
});
db.Association.find();
{ "_id" : "43234", "idclub" : "1000", "state" : "0", "profile" : { "_id" : "12345", "name" : "max", "country" : "IT" } }

Filtering Mongo items by multiple fields and subfields

I have the following items in my collection:
> db.test.find().pretty()
{ "_id" : ObjectId("532c471a90bc7707609a3d4f"), "name" : "Alice" }
{
"_id" : ObjectId("532c472490bc7707609a3d50"),
"name" : "Bob",
"partner_type1" : {
"status" : "rejected"
}
}
{
"_id" : ObjectId("532c473e90bc7707609a3d51"),
"name" : "Carol",
"partner_type2" : {
"status" : "accepted"
}
}
{
"_id" : ObjectId("532c475790bc7707609a3d52"),
"name" : "Dave",
"partner_type1" : {
"status" : "pending"
}
}
There are two partner types: partner_type1 and partner_type2. A user cannot be an accepted partner in both types. But a user can be, for example, a rejected partner in partner_type1 and accepted in the other.
How can I build a Mongo query that fetches the users that can become partners?
When your user can only be accepted in one partner type, you should turn it around: have a field accepted_as:"partner_type1" or accepted_as:"partner_type2". For people who aren't accepted yet, either omit the field or set it to null.
In both cases, your query to get anyone who is not yet accepted will then be:
{
  "data.accepted_as": null
}
(null matches both non-existing fields as well as fields explicitly set to null)
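The matching rule in the parenthesis can be illustrated with a small plain-JavaScript sketch. It only mimics how a filter on null treats missing and explicitly null fields; it does not talk to MongoDB. The helper matchesNull and the sample users (with accepted_as at the top level) are hypothetical.

```javascript
// Hypothetical stand-in for the filter { accepted_as: null }:
// true when the field is missing or explicitly set to null.
function matchesNull(doc, field) {
  return doc[field] === undefined || doc[field] === null;
}

// Assumed documents after switching to the accepted_as schema
const users = [
  { name: "Alice" },                               // field missing
  { name: "Bob", accepted_as: null },              // field explicitly null
  { name: "Carol", accepted_as: "partner_type2" }, // already accepted
];

const candidates = users
  .filter((u) => matchesNull(u, "accepted_as"))
  .map((u) => u.name);
console.log(candidates);
```

Both Alice and Bob come back as candidates, which is why it does not matter whether unaccepted users omit the field or store null.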
For me the logical schema would be this:
"partner" : {
  "type": 1,
  "status" : "rejected"
}
At least that keeps the paths consistent between documents.
So if you want to stay away from using mapReduce type methods to find out "which field" it is on, and otherwise use plain queries and the aggregation pipeline, then don't vary field paths on documents. If you alter the "data" then that is the most consistent form.