Yesterday I downloaded a dataset of tweets posted during the Confederations Cup games and saved it in MongoDB. My documents look like the following JSON:
{ "_id" : ObjectId("51bc9036194069119ff88c10"),
"text" : "Adianta AGORA ir para ruas e protesta, como em Brasilia e outras capitais contra a copa das confederações e do... http://t.co/41e4GGoe4o",
"created_at" : "2013-06-15 16:03:02",
"id" : NumberLong("345934481724669954"),
"user" : { "image" : "http://a0.twimg.com/profile_images/3425436378/57edc83f19d834283351a3729595d480_normal.jpeg",
"screen_name" : "Fernando_Fontes",
"id" : 54433693,
"name" : "Fernando Fontes"
}
}
Now I want to retrieve tweets by a term in the 'text' field, for example 'Brasilia'. However, I couldn't write a query that searches for part of the text. I'm just getting started with NoSQL and Big Data. Is there any way to find the documents that have a given word inside the 'text' field?
Thanks a lot in advance,
Thiago
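One way to match part of a string field is a regular-expression filter. A minimal sketch, assuming the collection is called tweets (the question does not give its name): in the mongo shell, db.tweets.find({ text: /Brasilia/i }) would return documents whose text contains "Brasilia" in any letter case. The plain JavaScript below only illustrates the matching logic on in-memory sample documents; buildContainsQuery and matchesQuery are hypothetical helpers, not driver APIs, and the second sample tweet is made up.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter document of the shape MongoDB's $regex operator expects.
function buildContainsQuery(field, term) {
  return { [field]: { $regex: escapeRegExp(term), $options: "i" } };
}

// Hypothetical stand-in for the server: applies the filter to an in-memory document.
function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, cond]) =>
    new RegExp(cond.$regex, cond.$options).test(String(doc[field] ?? ""))
  );
}

const sampleTweets = [
  { id: 1, text: "Adianta AGORA ir para ruas e protesta, como em Brasilia e outras capitais" },
  { id: 2, text: "Watching the match tonight" }, // made-up sample
];

const query = buildContainsQuery("text", "brasilia");
const hits = sampleTweets.filter((doc) => matchesQuery(doc, query));
console.log(hits.map((doc) => doc.id));
```

An unanchored regular expression has to examine every document, so on a large collection a text index may be a better fit for word-based search.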
I have a Spring Boot application with MongoDB, and I need to audit every update made to specified fields of a collection (for data analysis purposes).
If I have a document like:
{
"_id" : ObjectId("12345678910"),
"label_1" : ObjectId("someIdForLabel1"),
"label_2" : ObjectId("someIdForLabel2"),
"label_3" : ObjectId("someIdForLabel"),
"name": "my data",
"description": "some curious stuff",
"updatedAt" : ISODate("2022-06-21T08:28:23.115Z")
}
I want to write an audit document whenever a label_* is updated. Something like
{
"_id" : ObjectId("111213141516"),
"modifiedDocument" : ObjectId("12345678910"),
"modifiedLabel" : "label_1",
"newValue" : ObjectId("someNewIdForLabel1"),
"updatedBy" : ObjectId("userId"),
"updatedAt" : ISODate("2022-06-21T08:31:20.315Z")
}
How can I achieve this with MongoListener? I already have two methods, for AfterSave and AfterDelete, that serve other purposes, but they only give me the whole new Document.
I would rather avoid querying the DB again or using findAndModify() in the first place.
I also looked at Change Streams, but I have too many doubts about how they behave with more than one application instance.
Thank you so much, any tip will be appreciated!
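As a rough illustration of the audit document described above, the sketch below derives audit entries from the set of fields changed by one update. It assumes the input has the shape of a change stream update event (documentKey and updateDescription.updatedFields); buildAuditEntries and its userId parameter are hypothetical, the ids are the question's placeholders written as plain strings, and it is plain JavaScript rather than Spring code.

```javascript
// Hypothetical helper: turns one update event into audit documents,
// one per changed field whose name starts with "label_".
function buildAuditEntries(event, userId, now = new Date()) {
  const updated =
    (event.updateDescription && event.updateDescription.updatedFields) || {};
  return Object.keys(updated)
    .filter((field) => field.startsWith("label_"))
    .map((field) => ({
      modifiedDocument: event.documentKey._id,
      modifiedLabel: field,
      newValue: updated[field],
      updatedBy: userId,
      updatedAt: now,
    }));
}

// Example input: label_1 and updatedAt changed in the same update.
const event = {
  documentKey: { _id: "12345678910" },
  updateDescription: {
    updatedFields: {
      label_1: "someNewIdForLabel1",
      updatedAt: "2022-06-21T08:31:20.315Z",
    },
  },
};

const auditEntries = buildAuditEntries(event, "userId");
console.log(auditEntries);
```

The same filtering logic applies to any source that reports which fields changed; the event does not carry the acting user, which is why it is passed in separately here.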
I created a text index based on title and main_body fields in my Mongo Collection. I have for instance an article with the title: "Abby Bengtsson" and her name "Abby", appearing throughout the actual article in main_body.
Making a text search query: {$text: {$search: 'abby bengtsson'}}, returns the desired article, along with a couple more.
But simply querying her first name: {$text: {$search: 'abby'}}, returns nothing.
I have tried MongoDB Compass, Studio 3T, and shell commands over SSH directly on the server.
But I don't understand why this happens. The same goes for other keywords in other articles.
JSON Doc example
{
"_id" : ObjectId("5e0f4ded35fbd16f21bf3655"),
"category" : {
"category_id" : "5010",
"slug" : {
"0010" : "profiler",
"0020" : "profiler",
"0030" : "profiler"
},
"label" : {
"0010" : "Profiler",
"0020" : "Profiler",
"0030" : "Profiler"
},
"bg_color" : "#B12CA6",
"txt_color" : "#ffffff",
"main_category_id" : "5000"
},
"featured_image" : {
"main" : "https://img.norrbom.com/article/5e0f4d5e35fbd16f21bf3653/78805a221a988e79ef3f42d7c5bfd418-1578061277668/abby.jpg",
"mobile" : "https://img.norrbom.com/article/5e0f4d5e35fbd16f21bf3653/78805a221a988e79ef3f42d7c5bfd418-1578061277668/abby.jpg",
"square" : "https://img.norrbom.com/article/5e0f4d5e35fbd16f21bf3653/78805a221a988e79ef3f42d7c5bfd418-1578061277668/abby.jpg"
},
"metadata" : {
"title" : "Abby Bengtsson",
"description" : "Hon sprudlar av energi och glädje, vilket smittar av sig på hela redaktionen när hon kliver in hos En Sueco. Med sig har hon sin ursöta följeslagare pomeranianen Melwin",
"og" : {
"title" : "Abby Bengtsson",
"description" : "Hon sprudlar av energi och glädje, vilket smittar av sig på hela redaktionen när hon kliver in hos En Sueco. Med sig har hon sin ursöta följeslagare pomeranianen Melwin",
"image" : "https://img.norrbom.com/article/5e0f4d5e35fbd16f21bf3653/78805a221a988e79ef3f42d7c5bfd418-1578061277668/abby.jpg",
"type" : "article",
"site_name" : "En Sueco",
"url" : "https://www.ensueco.com/profil-abby-bengtsson"
},
"twitter" : {
"title" : "Abby Bengtsson",
"description" : "Hon sprudlar av energi och glädje, vilket smittar av sig på hela redaktionen när hon kliver in hos En Sueco. Med sig har hon sin ursöta följeslagare pomeranianen Melwin",
"card" : "summary",
"image" : "https://img.norrbom.com/article/5e0f4d5e35fbd16f21bf3653/78805a221a988e79ef3f42d7c5bfd418-1578061277668/abby.jpg"
}
},
"tags" : [
],
"title" : "Abby Bengtsson",
"state" : NumberInt(1),
"created" : ISODate("2020-01-01T04:17:00.000+0000"),
"modified" : ISODate("2020-01-01T08:27:54.000+0000"),
"version" : NumberInt(19),
"featured" : false,
"language" : "sv",
"magazines" : [
],
"slug" : "profil-abby-bengtsson",
"published" : ISODate("2020-01-02T10:14:00.000+0000"),
"published_until" : null,
"author_alias" : "Text: Sara Laine, sara#norrbom.com Foto: Mugge Fischer, mugge#norrbom.com",
"main_body" : "... stringified JSON object with article ...",
"article_id" : ObjectId("5e0f4d5e35fbd16f21bf3653"),
"origin" : "cms",
"site" : "0020",
"__v" : NumberInt(0)
}
EDIT 18-01-2020
I just tested something. It seems that this issue only occurs for documents where the language property is set to sv (Swedish, per the MongoDB language documentation). If I change the value to da (Danish), the document is returned when I search for "Abby".
I have currently solved my issue in production by setting language_override to a dummy field that doesn't exist. Now all results are returned as they should be. But the behavior with the Swedish language field still confuses me, as it ONLY happens when I set the field to "sv". What sense does it make to have documents in multiple languages, and a text index that is supposed to search based on locale, if it doesn't work for one particular language value?
What version of MongoDB are you using? The functionality has changed a bit from version to version. See https://docs.mongodb.com/manual/core/index-text/#versions for more details.
I tested this out in 4.2 and got the results you would expect.
To test this out I created a free cluster in Atlas (cloud.mongodb.com) and loaded the sample data. Then I navigated to the Collections tab. The sample data contains a database named "sample_mflix" with a collection called "movies". My collection had a default text index that covered the following fields: cast_text_fullplot_text_genres_text_title_text.
Then I navigated to the Find tab. When I ran the searches you described, I got the results you would expect. Both {$text: {$search: 'abby bengtsson'}} and {$text: {$search: 'abby'}} return many results.
Update based on new information added to original question on 18-01-20
I spoke with a colleague who explained to me what is going on:
It is worth noting that text search is designed for stemming with language heuristics. This will have unexpected outcomes with proper nouns like "Abby" (and with multi-language search).
Using query explain output for insight, this is what is happening:
- Abby stems to abby in Swedish but abbi in English, so the term is indexed as abby given the language value of sv in the document.
- A search without any language will default to English (rather than trying to stem in all possible languages) so a default search will not match the indexed term.
To match the indexed language, they would have to provide a language value in the search, e.g. db.articles.find({$text: {$search: 'abby', $language: 'sv'}}).
This is working as designed but doesn't match the user's expectation that queries would be stemmed to match all possible languages (which is probably an unhelpful outcome in terms of relevance).
What they actually want is the solution they arrived at: they should index with a language of none for simple tokenisation without stemming or stopwords.
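The two options described above can be sketched as follows. The collection name articles comes from the example query above and the indexed fields from the question; buildTextQuery and the override field name text_language are hypothetical. The objects are only the arguments one would pass to createIndex and find in the shell. Because the documents in the question carry a language field, the sketch assumes language_override must point at a different field for the index-level default to apply to them.

```javascript
// Index keys for the two fields named in the question.
const textIndexKeys = { title: "text", main_body: "text" };

// Options: "none" disables stemming and stop words; language_override points at
// a field the documents do not have ("text_language" is a made-up name), so the
// per-document "language": "sv" value no longer selects Swedish stemming.
const textIndexOptions = {
  default_language: "none",
  language_override: "text_language",
};
// In the shell: db.articles.createIndex(textIndexKeys, textIndexOptions)

// Hypothetical helper: builds a $text filter, adding $language only when given.
function buildTextQuery(search, language) {
  const text = { $search: search };
  if (language) text.$language = language;
  return { $text: text };
}

// Option 1: keep the existing index and stem the query in Swedish as well.
const swedishQuery = buildTextQuery("abby", "sv");

// Option 2: with the "none" index above, the query needs no $language.
const plainQuery = buildTextQuery("abby");

console.log(JSON.stringify(swedishQuery));
console.log(JSON.stringify(plainQuery));
```

A collection can have only one text index, so switching to the second option means dropping the existing text index before creating the new one.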
I'm working with a MongoDB database where each document looks like this:
{
"_id" : ObjectId("5c1bbcfbfe78c90007af2676"),
"_class" : "dasdas.dsadas.eewrewrdsfsa.dsad",
"deviceid" : 943955,
"checkid" : "23303140",
"dsc247" : 1,
"description" : "Windows-Dienst-Überprüfung - SQL Server VSS Writer",
"checkstatus" : "testerror",
"extra" : "96 fortlaufende Fehler, Öffnen des Dienstes nicht möglich",
"datetime" : ISODate("2018-12-04T15:55:00.000Z"),
"consecutiveFails" : 96,
"emailalerts" : 1,
"smsalerts" : 0,
"emailrecoveryalerts" : 1,
"smsrecoveryalerts" : 0,
"servertime" : ISODate("2018-12-20T16:02:02.000Z")
}
The database has the following indexes:
_id
datetime
deviceid_description
deviceid_description_checkstatus_datetime
The documents are ordered by datetime, but I want to query the data by the servertime field. The problem is that even a simple query like this:
db.check.find(
{
servertime: new ISODate("2018-12-21 23:17:30.000Z")
}
).limit(1)
takes 30 seconds to return a result, and any query beyond that takes forever.
However, if I run the same query on datetime, it finishes in 0.016 seconds.
Is there any way around this problem?
Do I have to change the DB schema? How difficult and time-consuming would that be?
I use Robomongo as my client. Is there a faster tool for accessing the data?
The data has relational characteristics. What about converting it to an SQL database?
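The timings are consistent with the index list above: datetime is indexed and servertime is not, so a filter on servertime has to scan the collection. A minimal sketch of the usual remedy, assuming it is acceptable to add an index: in the shell this would be db.check.createIndex({ servertime: 1 }). The JavaScript below is only a toy model on made-up in-memory data showing why a sorted index avoids the full scan; it does not reflect MongoDB internals, and all function names are hypothetical.

```javascript
// Toy "index": the field's values sorted, each with its position in the collection.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field].getTime(), pos }))
    .sort((a, b) => a.key - b.key);
}

// Full scan: checks documents one by one and counts how many were examined.
function scanFind(docs, field, value) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc[field].getTime() === value.getTime()) return { doc, examined };
  }
  return { doc: null, examined };
}

// Index lookup: binary search over the sorted keys.
function indexFind(docs, index, value) {
  const target = value.getTime();
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (index[mid].key === target) return { doc: docs[index[mid].pos], examined };
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, examined };
}

// Made-up data: 100000 documents with servertime values one second apart.
const base = Date.UTC(2018, 11, 20, 16, 0, 0);
const docs = Array.from({ length: 100000 }, (_, i) => ({
  checkid: String(i),
  servertime: new Date(base + i * 1000),
}));
const wanted = docs[docs.length - 1].servertime;

const scanned = scanFind(docs, "servertime", wanted);
const indexed = indexFind(docs, buildIndex(docs, "servertime"), wanted);
console.log("examined by scan:", scanned.examined, "by index:", indexed.examined);
```

Building an index on a large collection takes time and disk space, but it does not require a schema change or a move to a different database.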
I recently started loading data into MongoDB with mongoimport and noticed that it stores the "_id" field as an ObjectId. When I query this from the "meteor mongo" command line, it works fine:
meteor:PRIMARY> db.Warehouses.find({"_id":ObjectId("571b7a89a990b5b8779b1315")})
{ "_id" : ObjectId("571b7a89a990b5b8779b1315"), "name" : "Stephan Lumber", "street" : "23 East St", "city" : "Plano", "state" : "TX"}
meteor:PRIMARY>
My code can read the value in "_id" using console.log("id ", currentId), which prints ObjectID("571b7a89a990b5b8779b1315").
The variable currentId holds the ID of the currently selected warehouse.
However, when I try to use this to access the data in the code I keep getting "undefined" errors. I have tried many different ways. Here are a few:
warehouse = Warehouses.findOne({"_id":Mongo.ObjectID(currentId)});
warehouse = Warehouses.findOne({"_id":ObjectId(currentId)});
Also, for some reason "ObjectId" is not recognized in the latter.
I don't know what else to try. Any help would be appreciated.
You don't have to wrap it in Mongo.ObjectID or ObjectId. Since currentId already holds an ObjectID, just pass it directly:
warehouse = Warehouses.findOne({"_id": currentId});
I have a basic structure like this:
> db.users.findOne()
{
"_id" : ObjectId("4f384903cd087c6f720066d7"),
"current_sign_in_at" : ISODate("2012-02-12T23:19:31Z"),
"current_sign_in_ip" : "127.0.0.1",
"email" : "something#gmail.com",
"encrypted_password" : "$2a$10$fu9B3M/.Gmi8qe7pXtVCPu94mBVC.gn5DzmQXH.g5snHT4AJSZYCu",
"last_sign_in_at" : ISODate("2012-02-12T23:19:31Z"),
"last_sign_in_ip" : "127.0.0.1",
"name" : "Trip Jameson",
"sign_in_count" : 100,
"usertimes" : [
...thousands and thousands of records like this one....
{
"enddate" : 348268392.115282,
"idle" : 0,
"startdate" : 348268382.116728,
"title" : "My Awesome Title"
},
]
}
So I want to find only the usertimes entries for a single user where the title was "My Awesome Title", and then see what the value of "idle" was in those records.
So far all I can figure out is that I can find the entire user record with a search like:
> db.users.find({'usertimes.title':"My Awesome Title"})
This just returns the entire User record though, which is useless for my purposes. Am I misunderstanding something?
Returning only part of an embedded array is currently not supported by MongoDB.
The whole matching User record will always be returned (at least with the current MongoDB version).
See this question for a similar case:
Filtering embedded documents in MongoDB
This is the corresponding Jira ticket:
http://jira.mongodb.org/browse/SERVER-142
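Use:
db.users.find({'usertimes.title': "My Awesome Title"}, {'usertimes.idle': 1});
The second argument selects the fields to return; the embedded field has to be addressed through its array, and the result contains the idle value of every usertimes entry of the matching user.
I'd also suggest taking a more detailed look at http://www.mongodb.org/display/DOCS/Querying, which explains this further.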
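Since the query returns the matching user document rather than individual array entries, one workaround is to filter the usertimes array in application code. A minimal sketch in plain JavaScript; idleValuesForTitle is a hypothetical helper, the user object stands in for the result of db.users.findOne(...), and the second array entry is made-up sample data. Newer server versions also offer projection operators and the aggregation $filter operator for this, which are worth checking against the documentation for the version in use.

```javascript
// Hypothetical helper: returns the idle values of the embedded records
// whose title equals the given title.
function idleValuesForTitle(user, title) {
  return (user.usertimes || [])
    .filter((entry) => entry.title === title)
    .map((entry) => entry.idle);
}

// Stand-in for the document returned by the query in the question.
const user = {
  name: "Trip Jameson",
  usertimes: [
    { enddate: 348268392.115282, idle: 0, startdate: 348268382.116728, title: "My Awesome Title" },
    { enddate: 348268492.5, idle: 12, startdate: 348268400.1, title: "Another Title" }, // made-up entry
  ],
};

const idleValues = idleValuesForTitle(user, "My Awesome Title");
console.log(idleValues);
```

With thousands of entries per user this still transfers the whole array on every query, so moving usertimes into its own collection may be worth considering.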