MongoDB 3.2 document validation: disallow extra fields

We started using MongoDB and we would like to take advantage of the document validation introduced in 3.2.
We would like to disallow extra properties which are not declared in the schema. For example, if the schema says:
"group1.a": {
"$type": "int"
},
"group1.b": {
"$type": "int"
}
I would like the following document to fail:
{
"group1": {
"a": 1,
"b": 2,
"c": 3
}
}
Does anybody have an idea of how to implement this?
Thanks

According to some research, like here, validation only checks the fields listed in the validation schema for existence and rules.
If a field is not on the list, it is not validated.
As this was only introduced with 3.2, you could open a JIRA ticket for this functionality here.
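To make the current behavior concrete, here is a rough sketch of how the validator from the question would be attached to a collection (the collection name "items" is just a placeholder):
db.createCollection("items", {
  validator: {
    "group1.a": { $type: "int" },
    "group1.b": { $type: "int" }
  },
  validationAction: "error"
})

// A document that satisfies the rules for group1.a and group1.b but also carries
// an extra field group1.c still validates, because "c" is never inspected.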

Related

How to match a trigger on a specific field in mongodb stitch?

The $match expression on MongoDB's Stitch application does not work properly.
I am trying to set up a simple update trigger that will only work on one field in a collection.
The trigger setup provides a $match aggregation, which seems simple enough to set up.
For example, if I want the trigger to only fire when the field "online" in a specified collection gets set to "true", I would do:
{"updateDescription.updatedFields": {"online": "true"}}
which for a Stitch trigger is the same as:
{$match: {"updateDescription.updatedFields": {"online": "true"}}}
The problem is when I try to match an update on a field that is an object (for example hours: {online: 40, offline: 120}).
For some reason $exists or $in does not work.
So doing:
{"updateDescription.updatedFields": {"hours": {"$exists": true}}}
does not work, and neither does something like:
{"updateDescription.updatedFields": {"hours.online": {"$exists": true}}}
The $match for the trigger is supposed to work exactly like a normal MongoDB $match.
They just provide one example:
{
"updateDescription.updatedFields": {
"status": "blocked"
}
}
The example is from here:
https://docs.mongodb.com/stitch/triggers/database-triggers/
I have tried hundreds of variations but I can't seem to get it to work.
The trigger works fine if the match is a specific value, like:
{"updateDescription.updatedFields": {"hours.online": {"$numberInt": "20"}}}
and I then set hours.online to 20 in the database.
I was able to have it match items by using an explicit $expr operator, or by declaring it as a single field rather than an embedded object, i.e. "updateDescription.updatedFields.status": "blocked".
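In other words, those two variants would look roughly like the following (a sketch; the field status and the value "blocked" are the ones from the example above). Matching the updated key as a single dotted field:
{ "updateDescription.updatedFields.status": "blocked" }
or wrapping the comparison in an aggregation-style $expr:
{ "$expr": { "$eq": [ "$updateDescription.updatedFields.status", "blocked" ] } }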
I struggled with this myself, trying to get a trigger to fire when a certain nested field was updated to any value (rather than just one specific one).
The issue appears to have to do with how change streams report updated fields.
With credit and thanks to MongoDB support, I can finally offer this as a potential solution, at least for simpler cases:
{
  "$expr": {
    "$not": {
      "$cmp": [
        {
          "$let": {
            "vars": { "updated": { "$objectToArray": "$updateDescription.updatedFields" } },
            "in": { "$arrayElemAt": [ "$$updated.k", 0 ] }
          }
        },
        "example.subdocument.nested_field"
      ]
    }
  }
}
Of course replace example.subdocument.nested_field with your own dot-notation field path.

Azure CosmosDB operation not supported when using $elemMatch and $in

I am doing a query like the following, which works fine with MongoDB, but sometimes fails with CosmosDB. I need it to work with both.
(XXX is a placeholder for any string value. All strings have unique values that are redacted for readability, and actual content should be of no significance.)
{
  server_index: {
    $elemMatch: {
      server: "XXX",
      index: "XXX",
      delete_time: { $exists: false },
      path: {
        $in: [ "XXX", "XXX", "XXX" ]
      }
    }
  }
}
The schema of a document is somewhat like this:
{
  ...,
  server_index: [
    {
      server: "XXX",
      index: "XXX",
      delete_time: ISODate(...), // optional
      path: "XXX"
    },
    {...}, // same as above
    ...
  ],
  ...
}
This query sometimes works as expected with CosmosDB as well, but sometimes I also get the following response:
{
_t: "OKMongoResponse",
ok: 0,
code: 115,
errmsg: "Command is not supported",
$err: "Command is not supported"
}
What is especially strange is that the query seemingly succeeds, and the response above is returned by a "valid" cursor as the first document, which then causes my document parser to "crash".
I am using the C++ legacy driver. Is this even supported by Cosmos DB?
(According to the developer I inherited this project from, it is, and as always when you inherit projects, it all worked fine according to the previous developer... So this may be due to a change in Cosmos DB, due to the nature of my test data, or who knows what...)
Side note: In MongoDB, there is a multi-key index on server_index, which looks like this:
{
"server_index.delete_time" : 1,
"server_index.server" : 1,
"server_index.index" : 1,
"server_index.path" : 1
}
Is this even supported in CosmosDB?
EDIT: Trying to add this index with Robo 3T silently fails, with no error message whatsoever. The index is simply not added. Nice!
(Please don't ask about the strange database schema. It is like it is for a reason, and believe me, I, too, would like to burn it all down and replace it with something else ... I am open for suggestions for alternative queries, though)
This was probably a server-side problem. It seemed wrong in the first place (error status returned as part of the query result), and it has disappeared after a couple of weeks without me changing anything.
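Until something like this is confirmed fixed, a defensive client-side check is one way to avoid the parser crash described above. A minimal sketch in the mongo shell (the collection name "documents" is a placeholder, and the real query would be the $elemMatch/$in query from the question):
// Placeholder query; substitute the real $elemMatch/$in query from above.
var cursor = db.getCollection("documents").find({});

while (cursor.hasNext()) {
  var doc = cursor.next();
  // Cosmos DB has been observed to return its error envelope as an ordinary document.
  if (doc._t === "OKMongoResponse" && doc.ok === 0) {
    print("Server reported an error inside the result set: " + doc.errmsg);
    break;
  }
  printjson(doc);
}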

Index an Elasticsearch document with an existing "id" field

I have documents that I want to index into Elasticsearch with an existing unique "id" field.
I get an array of documents from a REST API endpoint (e.g. http://some.url/api/products) in no particular order, and if a document with that _id already exists in Elasticsearch it should update and reindex the document.
I want to create a new document if no document with the _id in Elasticsearch exists and then update a document, if it matches with an existing document in Elasticsearch.
This could be done with:
PUT products/product/un1qu3-1d-b718-105973677e95
{
"id": "un1qu3-1d-b718-105973677e95",
"state": "packaged"
}
The basic idea is to use the provided "id" field to create or update a document. Extraction of _id from document fields seems deprecated (link). But the indexing/reindexing of documents with the "id" field can be done manually very easily with the Kibana Dev Tools, with Postman, or with a cURL request.
I want to achieve this (re-)indexing of documents that I receive over this api endpoint programmatically.
Is it possible to achieve this with logstash or a simple cronjob? Does Elasticsearch provide any functionality for this? Or do I need to write some custom backend to achieve this?
I thought of either:
1) index the document into Elasticsearch with the "id" field of my document or
2) find an Elasticsearch query that first searches for the document with the specific "id" field and then updates the document.
I was unable to find a solution for either approach and have no clue what a good approach would look like.
Can anyone point me into the right direction on how to achieve this, suggest a better approach or provide a solution?
Any help much appreciated!
Update
I solved the problem with the help of the accepted answer. I used Logstash, the Http_poller input plugin, this article: https://www.elastic.co/blog/new-way-to-ingest-part-1 and this elastic.co question: https://discuss.elastic.co/t/upsert-with-logstash/59116
My output of logstash looks like this at the moment:
output {
  elasticsearch {
    index => "products"
    document_type => "product"
    pipeline => "rename_id"
    document_id => "%{id}"
    doc_as_upsert => true
    action => "update"
  }
}
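For reference, a matching http_poller input for this setup might look roughly like the following (a sketch; the URL is the API endpoint mentioned in the question, while the polling interval and codec are assumptions):
input {
  http_poller {
    urls => {
      products => "http://some.url/api/products"
    }
    schedule => { every => "5m" }   # polling interval is an assumption; adjust as needed
    codec => "json"
  }
}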
Update 2
Just for the sake of completeness, I added the "rename_id" pipeline:
{
"rename_id": {
"description": "_description",
"processors": [
{
"set": {
"field": "_id",
"value": "{{id}}"
}
}
]
}
}
It works this way!
Thanks a lot!
Peter,
If I understand correctly, you want to ingest your documents into Elasticsearch and will have some updates for these documents in the future?
If that's the case:
- Use your document's primary key as the id for the Elasticsearch documents.
- You can ingest the entire document with updated values; Elasticsearch will replace the previous document with the new one, given the primary key is the same. The old document with the same id will be deleted, as sketched below.
We use this approach for our search data.
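As a concrete illustration of that approach (a sketch reusing the id from the question; the "shipped" value is just an example of an updated field):
PUT products/product/un1qu3-1d-b718-105973677e95
{
  "id": "un1qu3-1d-b718-105973677e95",
  "state": "shipped"
}
Indexing again with the same id simply replaces the stored document with this new version.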
You can use ingest pipelines to extract the id from the body, and the _create endpoint (op_type=create) to only create a document if it does not exist. Minor note: if you could specify the id on the client side, indexing would be faster, as adding a pipeline adds a certain overhead.
PUT _ingest/pipeline/my_pipeline
{
"description": "_description",
"processors": [
{
"set": {
"field": "_id",
"value": "{{id}}"
}
}
]
}
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
"foo" : "bar",
"id" : "123"
}
GET twitter/tweet/123
# the document is found under _id 123, because the pipeline set _id from the body

# the following create will fail, because a document with _id 123 already exists
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
"foo" : "bar",
"id" : "123"
}
You can use a script to UPSERT (update or insert) your document:
POST /products/product/un1qu3-1d-b718-105973677e95/_update
{
"script": {
"inline": "ctx._source.state = \"packaged\"",
"lang": "painless"
},
"upsert": {
"id": "un1qu3-1d-b718-105973677e95",
"state": "packaged"
}
}
The above query finds the document with _id = "un1qu3-1d-b718-105973677e95".
If it finds one, it updates state to "packaged"; otherwise it creates a new document with the fields "id" and "state" (you can insert as many fields as you want).

Print the length of a field (string)

How can I print a field and the length of that field?
E.g. I have a document {name: "aaa"} in the collection "names";
then the expected output is:
{name: "aaa", name_length: 3}
Please help.
MongoDB versions before 3.4 don't have an aggregation operator to compute the length of a string value stored in a field. If you are using version 3.2 or older, you will need to implement the length computation outside the DB (such as in the controller layer of an MVC architecture).
Version 3.4, though, includes several new and useful aggregation operators, including the $strLenCP operator, which should serve your purpose. The usage for your case would be as follows:
db.names.aggregate(
[
{
$project: {
"name": 1,
"name_length": { $strLenCP: "$name" }
}
}
]
)
The documentation for the aggregation operator can be found here.
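Note that $project keeps _id by default, so the result will also carry the document's _id. If the output should match the example in the question exactly, a small variation of the same pipeline excludes it:
db.names.aggregate([
  {
    $project: {
      "_id": 0,
      "name": 1,
      "name_length": { $strLenCP: "$name" }
    }
  }
])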

Delete a MongoDB subdocument by value

I have a collection containing documents that look like this:
{
  "user": "foo",
  "topics": {
    "Topic AB": {
      "score": 20,
      "frequency": 3,
      "last_seen": 40
    },
    "Topic BD": {
      "score": 10,
      "frequency": 2,
      "last_seen": 38
    },
    "Topic TF": {
      "score": 19,
      "frequency": 6,
      "last_seen": 20
    }
  }
}
I want to remove subdocuments whose last_seen value is less than 30.
I don't want to use arrays here since I'm using $inc to update the subdocuments in conjunction with upsert (which doesn't support the $ notation).
The real question here is how can I delete a key depending on its value. Using $unset simply drops a subdocument regardless of what it contains.
I'm afraid I don't think this is possible with your current design. Knowing the name of the key whose last_seen value you wish to test, for example Topic TF, you can do
> db.topics.update({"topics.Topic TF.last_seen" : { "$lt" : 30 }},
{ "$unset" : { "topics.Topic TF" : 1} })
However, with an embedded document structure, if you don't know the name of the key that you want to query against then you can't run the query. If the Topic XX keys are only known by what's in the document, you'd have to pull the whole document to find out what keys to test, and at that point you ought to just manipulate the document client-side and then update by _id.
The best option is to use arrays. The $ positional operator works with upserts; it just has a serious gotcha that, in the case of an insert, the $ will be interpreted as part of the field name instead of as an operator, so I understand your conclusion that it doesn't seem feasible. I'm not quite sure how you are using upsert such that arrays seem like they won't work, though. Could you give more detail there, and I'll try to help come up with a reasonable workaround to use arrays and $ with your use case?
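For reference, a rough sketch of what the array-based design recommended above could look like, together with the delete-by-value update (the exact restructuring of topics into an array is an assumption; field names follow the question):
// Each topic becomes an array element carrying its name as a field.
{
  "user": "foo",
  "topics": [
    { "name": "Topic AB", "score": 20, "frequency": 3, "last_seen": 40 },
    { "name": "Topic BD", "score": 10, "frequency": 2, "last_seen": 38 },
    { "name": "Topic TF", "score": 19, "frequency": 6, "last_seen": 20 }
  ]
}

// Removing every topic whose last_seen is below 30 is then a single $pull:
db.topics.update(
  { "user": "foo" },
  { "$pull": { "topics": { "last_seen": { "$lt": 30 } } } }
)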