I have a Spring Boot application with MongoDB, and I need to audit every update made to specific fields of a collection (for data analysis purposes).
Suppose the collection holds documents like this:
{
"_id" : ObjectId("12345678910"),
"label_1" : ObjectId("someIdForLabel1"),
"label_2" : ObjectId("someIdForLabel2"),
"label_3" : ObjectId("someIdForLabel"),
"name": "my data",
"description": "some curious stuff",
"updatedAt" : ISODate("2022-06-21T08:28:23.115Z")
}
I want to write an audit document whenever a label_* field is updated, something like:
{
"_id" : ObjectId("111213141516"),
"modifiedDocument" : ObjectId("12345678910"),
"modifiedLabel" : "label_1",
"newValue" : ObjectId("someNewIdForLabel1"),
"updatedBy" : ObjectId("userId"),
"updatedAt" : ISODate("2022-06-21T08:31:20.315Z")
}
How can I achieve this with MongoListener? I already have two methods for AfterSave and AfterDelete, used for other purposes, but they only give me the whole new Document.
I would rather avoid querying the DB again or using findAndModify() in the first place.
I looked at ChangeStreams too, but I have too many doubts about how they behave with more than one application instance.
Thank you so much, any tip will be appreciated!
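To make the desired output concrete, here is a minimal sketch of the comparison step only. It assumes the previous state of the document is somehow available to the caller, which is exactly what AfterSave does not provide, so it illustrates the target audit format rather than solving the listener problem. The helper name buildAuditEntries is hypothetical, and plain strings stand in for ObjectId values.

```javascript
// Hypothetical helper: compares two versions of a document and returns one
// audit entry per "label_*" field whose value differs in the new version.
// Labels that were removed entirely are not reported by this sketch.
function buildAuditEntries(before, after, updatedBy, updatedAt) {
  return Object.keys(after)
    .filter((key) => key.startsWith("label_"))
    // Compare string forms so that ObjectId-like values compare by content.
    .filter((key) => String(before[key]) !== String(after[key]))
    .map((key) => ({
      modifiedDocument: after._id,
      modifiedLabel: key,
      newValue: after[key],
      updatedBy,
      updatedAt,
    }));
}

// Usage with made-up values: only label_1 changed, so one entry is produced.
const entries = buildAuditEntries(
  { _id: "12345678910", label_1: "someIdForLabel1", label_2: "someIdForLabel2" },
  { _id: "12345678910", label_1: "someNewIdForLabel1", label_2: "someIdForLabel2" },
  "userId",
  new Date()
);
console.log(entries.map((entry) => entry.modifiedLabel));
```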
I'm getting more and more puzzled by how overcomplicated and badly designed MongoDB's query syntax seems to be. Anyway, I have this kind of document in a database with thousands of records:
db.messages.aggregate([{$limit: 1}]).pretty()
{
"_id" : ObjectId("4f16fc97d1e2d32371003f42"),
"body" : "Hey Gillette,\n\nThe heat rate is going to depend on the type of fuel and the construction \ndate of the unit. Unfortunately, most of that info is proprietary. \n\nChris Gaskill is the head of our fundamentals group and he might be able to \nsupply you with some of the guidelines.\n\n-Bass\n\n\n \n\tEnron North America Corp.\n\t\n\tFrom: Lisa Gillette 04/05/2001 02:31 PM\n\t\n\nTo: Eric Bass/HOU/ECT#ECT\ncc: \nSubject: Power Generation Question\n\nHey Bass,\n\nI have a question and I am hoping you can help me. I am wanting to compile a \nlist of all the different types of power plants and their respective heat \nrates to determine some sort of generation ratio.\n\ni.e. Coal 4 mmbtu = 1 MW\n Simple Cycle 11 mmbtu = 1 MW\n\nPlease let me know if you can help me or point me to someone who can. Just \nFYI...Bryan suggested that I call you so blame him as you curse me under your \nbreath right now.\n\nThanks,\nLisa\n\n",
"filename" : "1045.",
"headers" : {
"Content-Transfer-Encoding" : "7bit",
"Content-Type" : "text/plain; charset=us-ascii",
"Date" : ISODate("2001-04-05T14:45:00Z"),
"From" : "eric.bass#enron.com",
"Message-ID" : "<2106897.1075854772243.JavaMail.evans#thyme>",
"Mime-Version" : "1.0",
"Subject" : "Re: Power Generation Question",
"To" : [
"lisa.gillette#enron.com"
],
"X-FileName" : "ebass.nsf",
"X-Folder" : "\\Eric_Bass_Jun2001\\Notes Folders\\Sent",
"X-From" : "Eric Bass",
"X-Origin" : "Bass-E",
"X-To" : "Lisa Gillette",
"X-bcc" : "",
"X-cc" : ""
},
"mailbox" : "bass-e",
"subFolder" : "sent"
}
And I need to find records from address X to address Y.
I managed to catch the "From" records with
db.messages.find({"headers.From": "eric.bass#enron.com"}).pretty().count()
But I can't get the "To" records (and I need to match both together).
To query the "To" field I've tried:
db.messages.find({headers: {$elemMatch :{ "To": "lisa.gillette#enron.com"}}})
But it returns nothing.
What am I missing?
Thanks
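$elemMatch must be applied to an array field, together with the condition(s) its elements have to satisfy. Your query applies it to headers, which is an embedded document rather than an array, so nothing matches. Applied to the headers.To array, it would look like this: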
db.messages.find({"headers.To": {$elemMatch :{$eq:"lisa.gillette#enron.com"}}})
$elemMatch is most useful when several conditions must hold for the same array element. With only a single condition, $elemMatch is not needed; a plain match on the field is enough:
db.messages.find({"headers.To": "lisa.gillette#enron.com"});
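To match sender and recipient together, as the question asks, both dotted paths can go into a single filter document. The following is a minimal sketch; the helper name fromToFilter is hypothetical and only builds the filter object.

```javascript
// Hypothetical helper: builds a filter for messages sent from one address
// to another. "headers.To" is an array, and a plain equality condition on
// an array field matches when any element equals the given value.
function fromToFilter(from, to) {
  return { "headers.From": from, "headers.To": to };
}

const filter = fromToFilter("eric.bass#enron.com", "lisa.gillette#enron.com");
// In the mongo shell this would be used as: db.messages.find(filter)
console.log(JSON.stringify(filter));
```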
Take the following URIs as an example:
/tracks
/tracks/:id
/playlists
/playlists/:id
/playlists/:id/tracks
I have a question about the last URI (/playlists/:id/tracks). How do I add extra information/context to the track objects in relation to their parent playlist?
Examples of context:
Added time of the track to the playlist
Play count of the track within the playlist
Likes per track within the playlist
All tracks have a created timestamp, play count and likes on a global scale. So my question is: how should this playlist-specific information be added to the response of the endpoint?
I've come up with the following for now:
{
"title" : "harder better faster stronger",
"artist" : "daft punk",
"likes" : 234252,
"created_at" : "2012-10-03 09:57:04",
"play_count" : 1203200035,
"relation_to_parent": {
"likes" : 5,
"created_at" : "2014-11-07 19:21:64",
"play_count" : 20
}
}
I've added a field called relation_to_parent, which adds some context about the relation between the child and its parent. I'm not sure whether this is a good way to do it, though. I hope to hear some other solutions.
For 1:n relations you can define a subresource. For n:m relations it is better to define a separate relationship resource. Note that these are just best practices, not standards.
Be aware that you can add links pointing to a different resource. According to the HATEOAS constraint you have to create hyperlinks if you want to expose an operation (for example getting another resource).
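As a rough illustration of such a relationship resource with links, here is a minimal sketch. The function name, the URI of the relationship resource and the _links layout are assumptions made for the example, not part of any standard; the field names come from the question.

```javascript
// Hypothetical builder for a playlist-track relationship resource. It carries
// only the data that belongs to the relation and links to both ends of it.
function playlistTrackRelation(playlistId, trackId, context) {
  return {
    created_at: context.created_at, // when the track was added to the playlist
    likes: context.likes,           // likes within this playlist
    play_count: context.play_count, // plays within this playlist
    _links: {
      self: { href: `/playlists/${playlistId}/tracks/${trackId}` },
      playlist: { href: `/playlists/${playlistId}` },
      track: { href: `/tracks/${trackId}` },
    },
  };
}

const relation = playlistTrackRelation("666", "123", {
  created_at: "2014-11-07 19:21:04",
  likes: 5,
  play_count: 20,
});
console.log(JSON.stringify(relation, null, 2));
```

The global likes and play count stay on /tracks/:id, so the track representation itself does not change depending on where it is requested from.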
I don't think there is a 'one true way' to do this. Personally, I dislike adding the extra information like that, since you are returning a resource-plus when the client asked for a resource. In any case, are 'likes', 'created_at' and 'play_count' actually part of the relation to the parent, or are they part of the track itself?
The two paths I usually see for this are:
/playlist/:id/tracks - returns a list of IDs (or URLs) for actual tracks, which you then fetch with /tracks/:track
/playlist/:id/tracks - returns the actual tracks, as if you had done both steps of the first option.
As for additional information, if it is not part of the tracks, you might do it as (any of these is valid):
info as part of the track, so /tracks/:track always returns the 'play_count' and 'likes', etc.
separate information, i.e. its own resource, if you want to keep the track clean. So you might get it at /tracks/:track/social_info or maybe /social_info/:track where it matches the track ID 1-to-1
If you have actual relation information, then it depends on whether it is 1:1, 1:N, N:1 or N:N. For 1:1, 1:N or N:1 you would probably report it as part of the resource itself, while N:N would either be part of the resource (JSON objects can have depth) or a separate resource.
Personally, I have done all of the above, and find cleaner is better, even if it is multiple retrievals. But now we are delving into opinion....
EDITED:
There are lots of ways to do N:N, here are just some:
/playlist/:id/tracks/:track/social_info - which could be embedded or a link to another object
/social_info/:playlist - more direct
/social_info/playlist/:id if you might have different kinds of social info
Personally (there is that word again; so much of this is personal preference and opinion), every time I have tried using deeper paths, thinking something only makes sense in a parent context, I have found myself ending up making its own resource for it, and linking back, so the 2nd or 3rd option ends up being what I do, with the first linking to it (either convenience to retrieve it or retrieve a list of it).
Mostly, that has not been because of constraints on the server side - e.g. when I write in nodejs, I use http://github.com/deitch/booster which handles multiple paths to the same resource really easily - but because client side frameworks often work better with a one true path.
If you want to fully embrace RESTful service design principles you definitely want to use hyperlinks in your representation format. JSON has some existing specifications if you prefer not to come up with your own: HAL and JSON API. A naive hypermedia format might look like this:
{
"playlist_id" : "666",
"created_at" : "2014-11-07 19:21:64",
"likes" : 5,
"tracks" : [
{"index" : 1,
"begin_at" : "00:02:00",
"end_at" : "00:05:23",
"_links" : {"track" : {
"href" : "/tracks/123",
"type" : "track"}}},
{"index" : 2,
"_links" : {"track" : {
"href" : "/tracks/432",
"type" : "track"}}},
{"index" : 3,
"_links" : {"track" : {
"href" : "/tracks/324",
"type" : "track"}}},
{"index" : 4,
"_links" : {"track" : {
"href" : "/tracks/567",
"type" : "track"}}}]
}
More elaborate features are included in both HAL and JSON API, like defining embedded resources and link templates. Using such semantics you might end up with something like the following:
{
"id" : "666",
"created_at" : "2014-11-07 19:21:64",
"likes" : 5,
"tracks" : [
{"id" : "123",
"index" : 1,
"begin_at" : "00:02:00",
"end_at" : "00:05:23"},
{"id" : "432",
"index" : 2},
{"id" : "324",
"index" : 3},
{"id" : "567",
"index" : 4}
],
"_links" : {
"_self" : {
"href" : "/playlists/666",
"type" : "playlist"},
"tracks" : {
"href" : "/tracks/{id}",
"type" : "track"}
},
"_embedded" : {
"track" : [
{"id" : "123",
"title" : "harder better faster stronger",
"artist" : "daft punk",
"created_at" : "2012-10-03 09:57:04",
"likes" : 234252,
"play_count" : 1203200035},
{"id" : "432",
"title" : "aerodynamic",
"artist" : "daft punk",
"created_at" : "2009-03-07 11:11:11",
"likes" : 33056,
"play_count" : 8796539}
]
}
}
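The "tracks" link in the representation above is a template (/tracks/{id}). A client could expand it roughly as follows; the function name expandTemplate is hypothetical, and only the simple {name} form is handled, not the full URI Template syntax.

```javascript
// Hypothetical helper: replaces each {name} placeholder in a link template
// with the URL-encoded value from params; unknown placeholders are kept.
function expandTemplate(template, params) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in params ? encodeURIComponent(params[name]) : match
  );
}

// Trimmed-down version of the playlist representation shown above.
const playlist = {
  id: "666",
  tracks: [{ id: "123", index: 1 }, { id: "432", index: 2 }],
  _links: { tracks: { href: "/tracks/{id}", type: "track" } },
};

// Builds one URL per track entry: /tracks/123 and /tracks/432.
const trackUrls = playlist.tracks.map((track) =>
  expandTemplate(playlist._links.tracks.href, { id: track.id })
);
console.log(trackUrls);
```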
Also, don't forget that using hyperlinks to express static relationships between entities is just the beginning of the journey. Using Hypermedia As The Engine Of Application State is the real Nirvana... but then you might be aiming too high.
Yesterday I downloaded a dataset consisting of tweets posted during the Confederations Cup games, and I saved it in MongoDB. My data model in the database looks like the following JSON:
{ "_id" : ObjectId("51bc9036194069119ff88c10"),
"text" : "Adianta AGORA ir para ruas e protesta, como em Brasilia e outras capitais contra a copa das confederações e do... http://t.co/41e4GGoe4o",
"created_at" : "2013-06-15 16:03:02",
"id" : NumberLong("345934481724669954"),
"user" : { "image" : "http://a0.twimg.com/profile_images/3425436378/57edc83f19d834283351a3729595d480_normal.jpeg",
"screen_name" : "Fernando_Fontes",
"id" : 54433693,
"name" : "Fernando Fontes"
}
}
Now I want to retrieve tweets that contain a given term in the 'text' field, for example 'Brasilia', but I couldn't write a query that searches for part of the text. I'm still getting started with NoSQL and Big Data. Is there any way to find the documents that have a given word inside the 'text' field?
Thanks a lot in advance,
Thiago
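One possible approach is a regular-expression match on the field using the $regex operator. A minimal sketch, assuming the collection is named tweets (the question does not give its name) and using hypothetical helper names:

```javascript
// Escapes characters that have a special meaning in regular expressions,
// so that the search term is matched literally.
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a filter matching documents whose `field`
// contains `term` anywhere in the string, ignoring case.
function containsFilter(field, term) {
  return { [field]: { $regex: escapeRegex(term), $options: "i" } };
}

const filter = containsFilter("text", "Brasilia");
// In the mongo shell, with the assumed collection name:
//   db.tweets.find(filter)
console.log(JSON.stringify(filter));
```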
I have a basic structure like this:
> db.users.findOne()
{
"_id" : ObjectId("4f384903cd087c6f720066d7"),
"current_sign_in_at" : ISODate("2012-02-12T23:19:31Z"),
"current_sign_in_ip" : "127.0.0.1",
"email" : "something#gmail.com",
"encrypted_password" : "$2a$10$fu9B3M/.Gmi8qe7pXtVCPu94mBVC.gn5DzmQXH.g5snHT4AJSZYCu",
"last_sign_in_at" : ISODate("2012-02-12T23:19:31Z"),
"last_sign_in_ip" : "127.0.0.1",
"name" : "Trip Jameson",
"sign_in_count" : 100,
"usertimes" : [
...thousands and thousands of records like this one....
{
"enddate" : 348268392.115282,
"idle" : 0,
"startdate" : 348268382.116728,
"title" : "My Awesome Title"
},
]
}
So I want to find only the usertimes entries for a single user where the title was "My Awesome Title", and then I want to see what the value of "idle" was in those entries.
So far all I can figure out is that I can find the entire user record with a search like:
> db.users.find({'usertimes.title':"My Awesome Title"})
This just returns the entire User record though, which is useless for my purposes. Am I misunderstanding something?
Returning only the matching embedded documents is currently not supported by MongoDB.
The whole matching User record will always be returned (at least with the current MongoDB version).
See this question for a similar case:
Filtering embedded documents in MongoDB
This is the corresponding Jira issue in the MongoDB tracker:
http://jira.mongodb.org/browse/SERVER-142
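Use the positional $ projection, which returns only the first usertimes element that matched the query (projecting 'idle' alone would not work, because idle is not a top-level field):
db.users.find({'usertimes.title': "My Awesome Title"}, {'usertimes.$': 1});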
May I suggest you take a more detailed look at http://www.mongodb.org/display/DOCS/Querying; it explains how querying works.
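If every matching usertimes entry is needed, one option is an aggregation pipeline that unwinds the array and then filters the individual elements. A minimal sketch, assuming the user is identified by the email field; the function name idleForTitlePipeline is hypothetical.

```javascript
// Hypothetical helper: builds an aggregation pipeline that returns one
// { title, idle } result per matching usertimes entry of the given user.
function idleForTitlePipeline(email, title) {
  return [
    // Select the user and skip users with no matching entry at all.
    { $match: { email: email, "usertimes.title": title } },
    // Turn each usertimes element into its own document.
    { $unwind: "$usertimes" },
    // Keep only the elements with the requested title.
    { $match: { "usertimes.title": title } },
    // Return just the fields of interest.
    { $project: { _id: 0, title: "$usertimes.title", idle: "$usertimes.idle" } },
  ];
}

const pipeline = idleForTitlePipeline("something#gmail.com", "My Awesome Title");
// In the mongo shell this would be used as: db.users.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```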