How do I handle fields in elasticsearch that contain a '_'? - mongodb

I am using Mongo-Connector targeting elasticsearch. This works great for keeping elasticsearch up to date, but I have a problem with one of the fields because its name starts with an '_'. The data is being replicated/streamed from mongodb continually, so if I run a rename/reindex, new documents will start showing up with underscores again.
Kibana does not support underscores at the start of a field. What is the best practice for handling this?
I have filed an issue with elastic2-doc-manager for Mongo-Connector to support ingest nodes, but this feels like a much bigger issue with kibana. All my attempts at fixing it using scripted fields and renaming the field have failed.
This seems like a huge problem. I see underscores in data everywhere, seems like a very poor decision on the side of the kibana team.
Kibana Error:
I have found some GitHub references to this issue, but no workarounds:
Closed issue: fields starting with underscore ( _ ) doesn't show
Never-merged pull request: Lift restriction of not allowing '_'-prefixed fields.
Open issue: Allow fields prefixed with _ (originally #4291)

Fields beginning with _ are reserved for use within Elasticsearch, and Kibana does not currently support them. A request for this, https://github.com/elastic/kibana/issues/14856, is still open.
Until then if you would like to use the field in visualizations etc, I believe you need to rename it.
You can't easily rename the field without Logstash or Filebeat, and Mongo-Connector doesn't support either of them. Instead, you can use a scripted field like the one below to create a new field that copies the _ field's value, and then use the new field in visualizations. Add a new scripted field, for example itemType, with the script below and see if it works.
doc['_itemType.keyword'].value
Please note though that only keyword fields can be used like this, text type fields won't work. If your _itemType field is of type text, modify the mapping to include a sub field keyword of keyword type under _itemType and try the scripted field.
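The keyword sub-field mentioned above could be declared with a mapping along these lines. This is only a sketch: the helper name keyword_subfield_mapping is made up for illustration, and the body assumes an Elasticsearch version that has the text and keyword types (5.x or later). Documents indexed before the mapping change will probably need to be reindexed before the sub-field holds any values.

```python
import json


def keyword_subfield_mapping(field_name):
    """Build a mapping body that keeps `field_name` as text and adds a `keyword` sub-field.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    return {
        "properties": {
            field_name: {
                "type": "text",
                # The sub-field is addressed as "<field_name>.keyword",
                # which is what the scripted field above reads.
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }


mapping = keyword_subfield_mapping("_itemType")
print(json.dumps(mapping, indent=2))
```

The printed JSON is the body you would send when updating the index mapping; how you send it (curl, Kibana Dev Tools, a client library) is up to you.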

Related

custom encoder/decoder for inserting or getting documents with mongo-driver

I have read this friendly article about encoding and decoding custom objects using the official Go mongo driver.
There is a nice example of how to marshal them into the extended JSON format (bson.MarshalExtJSONWithRegistry). But I would like to know how to put such a document into a collection with InsertOne() (and later read it back). Look at this pseudocode:
// myReg is a variable created as described in the article linked in the question.
// WithRegistry does **not** exist in the mongo-driver lib; it is part of this pseudocode.
mongoCollection := client.Database("db").Collection("coll").WithRegistry(myReg)
// Now InsertOne() would honor myReg (type *bsoncodec.Registry) when serializing `val` and putting it into MongoDB.
mongoCollection.InsertOne(context.TODO(), val)
I have gone through the API docs and found the Marshaler and Unmarshaler interfaces, but with the registry approach I would be able to (de)serialize the same type differently on different collections (for example, when migrating from an old format to a new one).
So the question is how to use a *bsoncodec.Registry with collection functions (like InsertOne, UpdateOne, FindOne, etc.), or, if that is not possible, what the most idiomatic way to achieve my goal (custom (de)serialization) is.
The Database.Collection() method takes optional options.CollectionOptions parameters, which let you set the bsoncodec.Registry. If you acquire your collection using options configured with a registry, that registry will be used for all operations performed on that collection.
Use it like this:
opts := options.Collection().SetRegistry(myReg)
c := client.Database("db").Collection("coll", opts)
Quoting from my related answer: How to ignore nulls while unmarshalling a MongoDB document?
Registries can be set / applied at multiple levels, even to a whole mongo.Client, or to a mongo.Database or just to a mongo.Collection, when acquiring them, as part of their options, e.g. options.ClientOptions.SetRegistry().
So when you're not doing migration from old to new format, you may set the registry at the "client" level and "be done with it". Your registry and custom coders / decoders will be applied whenever the driver deals with a value of your registered custom type.

Master Data Services - Domain based attributes

We are using Master Data Services as an MDM solution for our SQL Server BI environment. I have an entity containing a first name and last name and then I have created a business rule that concatenates these two fields to form a full name which is then stored in the "name" system field of the entity.
I use this as a domain-based attribute in another entity. The user can then see the full name before linking it as an attribute in the second entity.
I want to be able to restrict the users from capturing data in the first entity against the name attribute because the business rule deals with the logic to populate this attribute. I have read that there are two ways to do this:
1. Set the display width of the attribute to zero. This does not seem to work: the explorer version still shows a narrow version of the field in the rows, and the user can still edit the field in the detail pane.
2. Use security to make the attribute read-only. I have tried different combinations of this, but it seems that you cannot use this functionality for a name field (system field).
This seems like pretty basic functionality that I require and it seems that there is no clear cut way to do this in MDS.
Any assistance will be appreciated.
Thanks
We do exactly the same thing.
I tested it, and whether you create a new member, or edit an existing member, the business rule just overwrites the manual input value in the name attribute.
Is there a specific 'business' reason why you need to restrict data input in the name field? If it is for UX reasons, you can change the display name of the name attribute to something like 'Don't populate', or alternatively make it a '.', so that users won't know what to input.

Can I prefix with $ (dollar sign) my URL query string parameters safely?

I am starting to implement filtering through query strings in my REST API application, where I can filter on any field of an entity in my database. But there are special parameters, such as sort, page, direction, etc., that I want to differentiate from the rest of the fields used to filter the collection.
I want to avoid using an implementation like:
/?filters=field1:value1,field2:value2&page=3&per_page=50
Because I would need to write a custom parser for the filters value.
My desired structure is something more plain, like this:
/?lastname=Halden&country=somewhere&$sort=lastname
So, all the properties that aren't prefixed with $ are used to make the filter and the other properties with the prefix are used to tweak the result.
My actual question is whether this is safe. Could it cause problems when some libraries parse the whole query string?
It should be OK. As a matter of fact, OData, one of the widely used REST standards in enterprise software, uses $ the same way you want to (e.g. $filter, $orderby, $top). Another sensible option is _ (e.g. _sort, _page, etc.)
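As a rough check of how a standard parser treats such a query string, here is a minimal sketch using Python's urllib.parse. The function name split_query and its prefix argument are made up for illustration; it only shows that the $-prefixed names come through intact and can be separated from the filter fields.

```python
from urllib.parse import parse_qsl


def split_query(query_string, prefix="$"):
    """Split a raw query string into (filters, controls) by parameter-name prefix.

    Hypothetical helper: names starting with `prefix` are treated as control
    parameters (sort, page, ...); everything else is a filter field.
    Repeated parameter names are not handled here; the last value wins.
    """
    filters, controls = {}, {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name.startswith(prefix):
            controls[name[len(prefix):]] = value
        else:
            filters[name] = value
    return filters, controls


filters, controls = split_query("lastname=Halden&country=somewhere&$sort=lastname")
print(filters)   # {'lastname': 'Halden', 'country': 'somewhere'}
print(controls)  # {'sort': 'lastname'}

# A client that percent-encodes the dollar sign (%24) yields the same split.
encoded_filters, encoded_controls = split_query("lastname=Halden&%24sort=lastname")
print(encoded_controls)  # {'sort': 'lastname'}
```

Some clients send $ literally and others send %24, so it is worth confirming that whatever parser your framework uses decodes both forms the same way.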

kibana error in displaying some data

I'm indexing from MongoDB 2.4.9 to Elasticsearch 1.1.1 using the River Plugin. And of course, I'm using Kibana3.
The documents in the MongoDB that I have contain a cidr. The cidr is in the format:
"cidr" : "0.0.0.0/00"
I have a table panel and a term panel in my kibana dashboard.
The Table panel shows the part 0.0.0.0/
and the term panel shows the part 00
I need both panels to show the WHOLE cidr value! Like this: 0.0.0.0/00
Does anyone have any idea why these two panels are behaving this way?
Thank you
Elasticsearch is processing the input, and splitting on the "/". logstash should be creating a "raw" version of the field. Try referencing "cidr.raw" in kibana.
If you're not using logstash, you'll need to update the elasticsearch mapping to either set the field to not_analyzed or to add the ".raw" field yourself.
The reference for using not_analyzed is here. Grab the current mapping, edit it, and post it back.
To add ".raw", check out the logstash default template, which shows you the magic to make a multi_field with ".raw".
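The two mapping options described above might look roughly like this, written as Python dicts so they can be dumped to JSON. This is a sketch for the string type used by Elasticsearch 1.x: it shows only the properties section (the surrounding type name and index are left out), and the variable names are my own. Documents indexed before the change will most likely need to be reindexed.

```python
import json

# Option 1: index the whole cidr value as a single, unanalyzed term.
not_analyzed_mapping = {
    "properties": {
        "cidr": {"type": "string", "index": "not_analyzed"}
    }
}

# Option 2: keep the analyzed field and add an unanalyzed "cidr.raw" sub-field,
# similar in spirit to what the logstash default template does.
raw_subfield_mapping = {
    "properties": {
        "cidr": {
            "type": "string",
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed"}
            },
        }
    }
}

print(json.dumps(not_analyzed_mapping, indent=2))
print(json.dumps(raw_subfield_mapping, indent=2))
```

With the second option, the panels would reference cidr.raw instead of cidr.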

MongoDB ObjectId foreign key implementation recommendation

I'm looking for a recommendation on how best to implement MongoDB foreign key ObjectId fields. There seem to be two possible options, either containing the nested _id field or without.
Take a look at the fkUid field below.
{'_id':ObjectId('4ee12488f047051590000000'), 'fkUid':{'_id':ObjectId('4ee12488f047051590000001')} }
OR
{'_id':ObjectId('4ee12488f047051590000000'), 'fkUid':ObjectId('4ee12488f047051590000001')}
Any recommendations would be much appreciated.
I'm having a hard time coming up with any possible advantages for putting an extra field "layer" in there, so I would personally just store the ObjectId directly in fkUid.
I suggest using the default DBRef implementation, which is described here http://www.mongodb.org/display/DOCS/Database+References and is compatible with most language-specific drivers.
If your question is about the naming of the field (what you have in the title), usually the convention is to name it after the object to which it refers.
Both ways you mention mean the same thing, but they have different uses.
Storing fkUid as an object, like 'fkUid':{'_id':ObjectId('4ee12488f047051590000001')}, has its own pros. Let me give an example. Suppose there is a website where users can post images and view images posted by other users. When showing an image, the website also shows the name/username of the user who posted it. With this approach you can also store details like 'fkUid':{'_id':ObjectId('4ee12488f047051590000001'), username: 'SOME_X'}. When you fetch the image from the db, you don't have to send another request to get the username for that _id.
Whereas with the second way, 'fkUid':ObjectId('4ee12488f047051590000001'), you have to send another request to the server just to get the name/username, even though nothing else from that object is needed.
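The difference in lookups can be sketched without a database. In the illustration below, plain dicts stand in for collections and short strings stand in for ObjectIds; the variable and function names are made up for the example.

```python
# Plain dicts stand in for MongoDB collections; string ids stand in for ObjectIds.
users = {"u1": {"_id": "u1", "username": "SOME_X"}}

# Second form from the question: fkUid holds only the referenced id.
image_with_plain_ref = {"_id": "i1", "fkUid": "u1"}

# First form, extended as described above: fkUid is a sub-document
# that also carries a copy of the username.
image_with_embedded = {"_id": "i2", "fkUid": {"_id": "u1", "username": "SOME_X"}}


def username_via_lookup(image, users):
    """Plain reference: a second lookup in `users` is needed to get the username."""
    return users[image["fkUid"]]["username"]


def username_embedded(image):
    """Embedded sub-document: the username is read straight from the image document."""
    return image["fkUid"]["username"]


print(username_via_lookup(image_with_plain_ref, users))  # SOME_X
print(username_embedded(image_with_embedded))            # SOME_X
```

The embedded copy saves the second round trip, but it has to be updated in every image document if the user ever changes their username.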