Expand a date range in MongoDB

I have a document structure like
{
"startDate": ISODate("2015-01-01T00:00:00Z"),
"endDate" : ISODate("2015-01-10T00:00:00Z"),
"foo" : "bar"
}
Is it possible to expand the date range like this?
{
"dates": [
ISODate("2015-01-01T00:00:00Z"),
ISODate("2015-01-02T00:00:00Z"),
ISODate("2015-01-03T00:00:00Z"),
ISODate("2015-01-04T00:00:00Z"),
ISODate("2015-01-05T00:00:00Z"),
ISODate("2015-01-06T00:00:00Z"),
ISODate("2015-01-07T00:00:00Z"),
ISODate("2015-01-08T00:00:00Z"),
ISODate("2015-01-09T00:00:00Z"),
ISODate("2015-01-10T00:00:00Z")
]
}

As far as I understand, you want to add a dates field to all your documents. Here is an approach I would use (you can do this in the mongo shell):
1) Iterate over all the documents, modifying each one:
db.coll.find().snapshot().forEach(function(o){
o.dates = func(o.startDate, o.endDate);
db.coll.save(o);
});
2) Your function func would be something similar to this answer (you need to modify it a little, because it looks like you only need dates, without the time that is included there).
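The func helper mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like. The name expandDateRange is hypothetical, and the sketch assumes both dates should be truncated to UTC midnight and that the end date is included.

```javascript
// Hypothetical helper (not part of MongoDB): returns one Date per UTC day
// from startDate to endDate, both inclusive.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Drop any time-of-day component by rebuilding the date at UTC midnight.
function toUtcMidnight(date) {
  return Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
}

function expandDateRange(startDate, endDate) {
  const dates = [];
  const last = toUtcMidnight(endDate);
  // UTC has no daylight saving shifts, so stepping by a fixed 24h is safe here.
  for (let t = toUtcMidnight(startDate); t <= last; t += MS_PER_DAY) {
    dates.push(new Date(t));
  }
  return dates;
}

const dates = expandDateRange(
  new Date("2015-01-01T00:00:00Z"),
  new Date("2015-01-10T00:00:00Z")
);
console.log(dates.length); // 10
```

In the shell loop above, this would be used as o.dates = expandDateRange(o.startDate, o.endDate).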

Related

search phrase or words in document with timestamped words

I've been trying to do this for some days, I guess it's time to ask for a little help.
I'm using elasticsearch 6.6 (I believe it could be upgraded if needed) and nest for c# net5.
The task is to create an index where the documents are the result of speech-to-text recognition, and every recognized word has a timestamp (so that the timestamp can be used to find where the word is spoken in the original file). There are 1000+ texts from media files, and every file is 4 hours long (that usually means 5000~15000 words).
The main idea was to split every text into 3-second segments, create a document with the words in each time segment, and index it so that it can be searched.
I thought that would not work that well, so the next idea was to create a document for every window of 10~12 words, scanning the text and jumping by 2 words at a time, so that the search could at least match a decent phrase and also highlight the hits.
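To make the sliding-window idea concrete, here is a rough sketch in plain JavaScript. It borrows the Word and StartMillisec field names from the nested mapping in this question; the function name slidingWindows and the Text field of the output documents are hypothetical.

```javascript
// Hypothetical sketch: builds one small document per window of `size` words,
// advancing by `step` words each time. Each document keeps the timestamp of
// its first word so a hit can be mapped back to a position in the media file.
function slidingWindows(words, size, step) {
  const windows = [];
  for (let i = 0; i < words.length; i += step) {
    const slice = words.slice(i, i + size);
    windows.push({
      StartMillisec: slice[0].StartMillisec,
      Text: slice.map((w) => w.Word).join(" "),
    });
    // Stop once a window has reached the end of the text, to avoid
    // emitting ever shorter trailing windows.
    if (i + size >= words.length) break;
  }
  return windows;
}

// Made-up sample data, only for illustration.
const sampleWords = ["a", "bunch", "of", "things", "to", "say"].map((Word, i) => ({
  Word,
  StartMillisec: i * 400,
}));
console.log(slidingWindows(sampleWords, 4, 2));
```

The overlap between consecutive windows is what lets a phrase that straddles a window boundary still appear whole in at least one document, as long as the phrase is no longer than size - step + 1 words.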
Since it's yet far from perfect, I thought it would be nice to index every whole text as a document so to maintain its coherency, the problem is the timestamp associated with every word. To keep this relationship I tried to use nested objects in the document:
PUT index-tapes-nested
{
"mappings" : {
"_doc" : {
"properties" : {
"$type" : { "type" : "text" },
"ContentId" : { "type" : "long" },
"Inserted" : { "type" : "date" },
"TrackId" : { "type" : "long" },
"Words" : {
"type" : "nested",
"properties" : {
"StartMillisec" : { "type" : "integer" },
"Word": { "type" : "text" }
}
}
}
}
}
}
This kinda works, but I don't know exactly how to write the query to search in the index.
A very basic query could be for example:
GET index-tapes-nested/_search
{
"query":{
"nested":{
"path":"Words",
"score_mode":"avg",
"query":{
"match":{
"Words.Word": "a bunch of things"
}
},
"inner_hits": {}
}
}
}
but something like that, especially with the avg scoring, gives low quality results; there could be the right document in the hits, but it doesn't get the word order, so it's not certain and it's not clear.
As far as I understand it, span_near should come in handy in these situations, but I get no results:
GET index-tapes-nested/_search
{
"query": {
"nested":{
"path":"Words",
"score_mode": "avg",
"query": {
"span_near": {
"clauses": [
{ "span_term": { "Words.Word": "bunch" }},
{ "span_term": { "Words.Word": "of" }},
{ "span_term": { "Words.Word": "things" }}
],
"slop": 2,
"in_order": true
}
}
}
}
}
I don't know much about elasticsearch, maybe I should change approach and change the model, maybe rewriting the query is enough, I don't know, this is pretty time consuming, so any help is really appreciated (is this a fairly common task?). For the sake of brevity I'm cutting some stuff and some ideas, I'm available to give some data or other examples if needed.
I also had problems with the c# nest client to manage the nested index, but that is another story.
This could be interpreted in a few ways, I guess: having something like an "alternative stream" for a field, or metadata for every word, and so on. What I needed was this: https://github.com/elastic/elasticsearch/issues/5736 but it's not done yet, so for now I think I'll go with the annotated_text plugin or the 10-word window.
I have no idea whether, when indexing single words, there can be a query that 'restores' the integrity of the original text (which means 1. grouping the words by an id and 2. ordering them) so that elasticsearch can give the desired results.
I'll keep searching the docs to see if there's something interesting, or if I can hack something together to get what I need (like require_field_match or the intervals query).

Count documents based on Array value and inner Array value

Before I explain my use case, I'd like to state that yes, I could change this application so that it would store things in a different manner or even split it into 2 collections for that matter. But that's not my intention, and I'd rather want to know if this is at all possible within MongoDB (since I am quite new to MongoDB). I can for sure work around this problem if I'd really need to, but rather looking for a method to achieve what I want (no I am not being lazy here, I really want to know a way to do this).
Let's get to the problem then.
I have a document like below:
{
"_id" : ObjectId("XXXXXXXXXXXXXXXXXXXXX"),
"userId" : "XXXXXXX",
"licenses" : [
{
"domain" : "domain1.com",
"addons" : [
{"slug" : "1"},
{"slug" : "2"}
]
},
{
"domain" : "domain2.com",
"addons" : [
{"slug" : "1"},
]
}
]
}
My goal is to check if a specific domain has a specific addon. When I use the below query to count the documents with domain: domain2.com and addon slug: 2 the result should be: 0. However with the below query it returns 1. I know that this is because the query is executed document wide and not just the license index that matched domain2.com. So my question is, how to do a sub $and (or however you'd call it)?
db.test.countDocuments(
{$and: [
{"licenses.domain": "domain2.com"},
{"licenses.addons.slug": "2"},
]}
)
Basically I am looking for something like this (below isn't working obviously), but below should return 0, not 1:
db.test.countDocuments(
{$and: [
{
"licenses.domain": "domain2.com",
$and: [
{ "licenses.addons.slug": "2"}
]
}
]}
)
I know there is $group and $filter operators, I have been trying many combinations to no avail. I am lost at this point, I feel like I am completely missing the logic of Mongo here. However I believe this must be relatively easy to accomplish with a single query (just not for me I guess).
I have been trying to find my answer on the official documentation and via stack overflow/google, but I really couldn't find any such use case.
Any help is greatly appreciated! Thanks :)
What you are describing is searching for a document whose array contains a single element that matches multiple criteria.
This is exactly what the $elemMatch operator does.
Try using this for the filter part:
{
licenses: {
$elemMatch: {
domain: "domain2.com",
"addons.slug": "2"
}
}
}
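To see why the two filters behave differently, here is a plain JavaScript emulation of the matching logic on the sample document. This is only an illustration of the semantics, not how MongoDB evaluates queries internally, and the two function names are hypothetical.

```javascript
const doc = {
  userId: "XXXXXXX",
  licenses: [
    { domain: "domain1.com", addons: [{ slug: "1" }, { slug: "2" }] },
    { domain: "domain2.com", addons: [{ slug: "1" }] },
  ],
};

// Emulates {"licenses.domain": domain, "licenses.addons.slug": slug}:
// each condition may be satisfied by a DIFFERENT license element.
function matchesAcrossElements(d, domain, slug) {
  const hasDomain = d.licenses.some((l) => l.domain === domain);
  const hasSlug = d.licenses.some((l) => l.addons.some((a) => a.slug === slug));
  return hasDomain && hasSlug;
}

// Emulates {licenses: {$elemMatch: {domain: domain, "addons.slug": slug}}}:
// both conditions must hold for the SAME license element.
function matchesSingleElement(d, domain, slug) {
  return d.licenses.some(
    (l) => l.domain === domain && l.addons.some((a) => a.slug === slug)
  );
}

console.log(matchesAcrossElements(doc, "domain2.com", "2")); // true
console.log(matchesSingleElement(doc, "domain2.com", "2")); // false
```

The first result corresponds to the unexpected count of 1 in the question; the second corresponds to the count of 0 that the $elemMatch filter should give.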

Unable to get some info in a subdocument

I am trying to get a value in my mongoDB collection. I would like to get the title of a movie and the sales (nbSold) of this movie for the current month.
Here is how my data are stored :
"_id" : ObjectId("52e6a1aacf0b3b522a8a157a"),
"title" : "Pulp Fiction",
"sales" : [
{
"date" : ISODate("2013-11-01T00:00:00Z"),
"nbSold" : 6
},
{
"date" : ISODate("2013-12-01T00:00:00Z"),
"nbSold" : 2
}
]
I'm using mongoose and this is how I build my query for December 2013:
var query = Movie.find({"title":"Pulp Fiction"}, "title sales.nbSold")
.where("sales.date")
.equals(new Date("2013-12-01"));
However, this is the output that I am receiving :
{ title: 'Pulp Fiction', sales: [ { nbSold: 6 }, { nbSold: 2 } ] }
I would like to have only the title associated with the nbSold of the current month (2 in my case). What is the correct way to do this ?
Thanks a lot for your help.
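First off, you should finish your where call with a condition, and you should call .exec(callback) when you're done building the query. To get back only the matching array element, use select with an $elemMatch projection rather than relying on equals alone. Note that $elemMatch takes a condition document, not a bare value. Try this:
var query = Movie.find({"title":"Pulp Fiction"})
.where("sales.date").equals(new Date("2013-12-01"))
// Project the title plus only the first sales element whose date matches
.select({ title: 1, sales: { $elemMatch: { date: new Date("2013-12-01") } } })
.exec(function(err, docs) {
if (err) { return console.error(err); }
return console.dir(docs);
});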
You should also have a callback. I ran this on my machine, and it definitely works. I posted code earlier that wasn't quite right, but this does the trick. See the documentation for an example.
Also, I'd be concerned as to how you're seeing if the date matches. If the time is off in your Date object but the date matches, Mongo won't find a match. And I don't know exactly how it works in Mongo, but in JavaScript, you can't compare two Date objects directly for equality as they are both different objects with the same value, so you may come across that problem as well. See this post for an example on how to do it.
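The JavaScript side of that caveat is easy to demonstrate. The sketch below compares two Date objects holding the same instant; the sameUtcDay helper is a hypothetical name, shown as one possible way to compare only the calendar day.

```javascript
const first = new Date("2013-12-01T00:00:00Z");
const second = new Date("2013-12-01T00:00:00Z");

// Two distinct objects are never strictly equal, even with identical values.
const sameObject = first === second;

// Comparing the underlying millisecond timestamps compares the instants.
const sameInstant = first.getTime() === second.getTime();

// Hypothetical helper: true when both dates fall on the same UTC calendar day,
// regardless of the time of day.
function sameUtcDay(x, y) {
  return (
    x.getUTCFullYear() === y.getUTCFullYear() &&
    x.getUTCMonth() === y.getUTCMonth() &&
    x.getUTCDate() === y.getUTCDate()
  );
}

console.log(sameObject); // false
console.log(sameInstant); // true
console.log(sameUtcDay(first, new Date("2013-12-01T18:30:00Z"))); // true
```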

Mongodb $elemMatch query inside map

I need to search inside a map element with a certain value in mongodb.
I have this element in data base:
{
"_id": ObjectId("52950e93c4aad399cff0d9f9"),
"_class": "com.company.model.customer.DbCustomer",
"version": NumberLong(0),
"channels": {
"adea3d4e-2a73-4f3e-8a89-a336d6132909": {
"value": "dominik.czech.aal#gmail.com",
"alias": "email1",
"deliveryChannel": "EMAIL",
"status": "GOOD",
"_class": "com.company.model.customer.CustomerEmail"
}
}
}
Where "adea3d4e-2a73-4f3e-8a89-a336d6132909" is a key of a map of channels.
What I want to search is a channel with certain value.
If "channels" were an array the query would be this way:
{ "channels" :
{ "$elemMatch" : { "value" : "dominik.czech.aal#gmail.com" } }
}
But, as channels is a map, I can't use this approach.
Is it possible to search inside a map the same way you search inside an array?
Notice that I want to use a single query, for security reasons I cannot use the map reduce functionality in my database.
Thanks in advance.
AFAIK it's not possible with the current MongoDB operators without scripting, map/reduce, or knowing in advance the keys you want to query.
As a side note, you should design your data structure around how you want to query it, i.e. you should probably consider transforming the channels document into an array.
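As a rough illustration of that restructuring, the plain JavaScript sketch below turns the channels map into an array of channel objects. The function name channelsMapToArray and the id field that holds the former map key are hypothetical choices.

```javascript
// Hypothetical migration helper: converts {key: channel, ...} into
// [{id: key, ...channel}, ...] so the map key is preserved as a field.
function channelsMapToArray(channels) {
  return Object.entries(channels).map(([id, channel]) => ({ id, ...channel }));
}

const customer = {
  channels: {
    "adea3d4e-2a73-4f3e-8a89-a336d6132909": {
      value: "dominik.czech.aal#gmail.com",
      alias: "email1",
      deliveryChannel: "EMAIL",
      status: "GOOD",
    },
  },
};

const channelsArray = channelsMapToArray(customer.channels);
console.log(channelsArray);
```

Once channels is stored as an array like this, the $elemMatch query from the question can be used as written.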
I don't see any array in the sample above; an array would look like myarray: [1, 2, 3].
With this sample, if you want to search on a sub-field value, you can use dot notation like the following. Note that the path has to include the map key, so this only works when you know the key in advance:
db.Collection.find({"channels.adea3d4e-2a73-4f3e-8a89-a336d6132909.value" : "dominik.czech.aal#gmail.com"})
Hope this helps.

How to count values that greater than a value in an array

The data structure is like below:
{
"_id" : ObjectId("5031e3f0a606e8ef48c7da6b"),
"hitTime" : [ 1345446896, 1345446943, 1345446991 ],
"tag" : "a"
}
I want to get the entries that have hitTime > 1345446991 so that I can rank tag popularity.
How do I do this? Or are there other data structures that are more convenient for this kind of incremental count?
From the documentation:
That is, when "value" is inspected, if it is an array, each value in the array is checked.
Based on that, your query should look something like this (it matches documents where at least one hitTime element is greater than the value):
db.collectionName.find({
hitTime : {
$gt : 1345446991
}
});
I am not sure if this works properly, but you might get the idea. =)
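The query above selects whole documents; it does not count how many hitTime values exceed the threshold. As a sketch of the counting and ranking step, here is a plain JavaScript version that could run on the client over the returned documents. The function name countHitsAfter is hypothetical, and the second sample document is made up for illustration.

```javascript
const docs = [
  { hitTime: [1345446896, 1345446943, 1345446991], tag: "a" },
  // Made-up second document, only to show the ranking.
  { hitTime: [1345446950, 1345447100, 1345447200], tag: "b" },
];

// Hypothetical helper: counts, per tag, the hitTime values strictly greater
// than the threshold, drops tags with no such hits, and sorts by count
// in descending order.
function countHitsAfter(documents, threshold) {
  return documents
    .map((d) => ({
      tag: d.tag,
      count: d.hitTime.filter((t) => t > threshold).length,
    }))
    .filter((r) => r.count > 0)
    .sort((x, y) => y.count - x.count);
}

console.log(countHitsAfter(docs, 1345446991));
```

With the threshold from the question, the sample document tagged "a" drops out, because its largest hitTime equals the threshold rather than exceeding it.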