I'm trying to calculate the current age of a file in Elasticsearch.
My plan was to do something like current time - timestamp.
I am now using scripts to do this and managed to subtract one date field from another:
{ "query" :
{
"match_all" : {}
},
"fields" : ["_source"],
"script_fields" :
{ "date_diff" :
{ "script" : "doc[\"two\"].date-doc[\"one\"].date"}
}
}
However, I think it was done wrong, as the answer was definitely not correct (it worked out to a difference of thousands of days).
I have also tried using the Elasticsearch date-math suggestions such as "now", "time", "+1h" etc., and all of these result in an error:
"JsonParseException[Unrecognized token 'time': was expecting 'null', 'true', 'false' or NaN\n at [Source: [B#220c4a0b; line: 1, column: 111]]; }]"
I'm now unsure whether scripts are even the right thing to use. It seems like it should be simple to do, but I can't seem to find any examples.
Is this even possible to do? Is there a better way to do it?
Any help would be appreciated.
Can you show a use case for that? If you need to show the age in a UI, for example, it's still better to respond with the creation date. If you need some analysis, filtering by creation date should do the trick.
If you are sure that you need to return the "age" of a document with an ES query, try:
{
"query": {
"match_all": {}
},
"fields": ["_source"],
"script_fields": {
"age": {
"script": "(DateTime.now().getMillis() - doc['created'].value)/(24*60*60*1000)"
}
}
}
ES handles dates as Unix epoch timestamps, so you should get the current time as an integer number of milliseconds (with DateTime.now().getMillis()) and then subtract the doc's value from that. This gives you the age in milliseconds. Divide by a coefficient if needed (e.g., by 24*60*60*1000 as above, to get the age in days).
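The arithmetic in that script can be checked outside Elasticsearch. The following is only a minimal JavaScript sketch of the same calculation; the function name ageInDays and the sample dates are made up for illustration and are not part of any Elasticsearch API.

```javascript
// Milliseconds in one day, the same coefficient used in the script above.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: age in days given a creation time in epoch milliseconds.
// nowMillis defaults to the current time but can be passed in for testing.
function ageInDays(createdMillis, nowMillis = Date.now()) {
  return (nowMillis - createdMillis) / MS_PER_DAY;
}

// Sample values (assumed, not taken from the question's data).
const created = Date.parse("2014-01-01T00:00:00Z");
const now = Date.parse("2014-01-11T12:00:00Z");
console.log(ageInDays(created, now)); // 10.5
```

If a whole number of days is wanted, wrapping the result in Math.floor would drop the fractional part.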
I've been trying to do this for some days, I guess it's time to ask for a little help.
I'm using Elasticsearch 6.6 (I believe it could be upgraded if needed) and NEST for C# on .NET 5.
The task is to create an index where the documents are the result of speech-to-text recognition, where every recognized word has a timestamp (so that the timestamp can be used to find where the word is spoken in the original file). There are 1000+ texts from media files, and every file is 4 hours long (that usually means 5000~15000 words).
The main idea was to split every text into 3-second-long segments, create a document with the words in each time segment, and index it so that it can be searched.
I thought that would not work very well, so the next idea was to create a document for every window of 10~12 words, scanning the text and jumping by 2 words at a time, so that the search could at least match a decent phrase and have highlighting of the hits too.
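The sliding-window idea can be sketched roughly as below. This is only an illustration in plain JavaScript: buildWindows is a hypothetical helper, the input is assumed to be an array of objects using the StartMillisec and Word field names from the mapping in this question, and the Text field on the output is an assumption.

```javascript
// Hypothetical helper: builds overlapping word windows from timestamped words.
// `words` is assumed to be an array of { StartMillisec, Word } objects.
function buildWindows(words, size = 10, step = 2) {
  const windows = [];
  for (let i = 0; i < words.length; i += step) {
    const slice = words.slice(i, i + size);
    windows.push({
      // Timestamp of the first word, so a hit can be located in the media file.
      StartMillisec: slice[0].StartMillisec,
      // Text of the window, which would be the searchable field of the document.
      Text: slice.map((w) => w.Word).join(" "),
    });
    // Stop once a window has reached the last word.
    if (i + size >= words.length) break;
  }
  return windows;
}

// Tiny made-up sample: six words, windows of four, jumping by two.
const sampleWords = ["a", "bunch", "of", "things", "to", "say"].map((w, n) => ({
  StartMillisec: n * 500,
  Word: w,
}));
console.log(buildWindows(sampleWords, 4, 2));
```

Each returned object would become one indexed document; the overlap between neighbouring windows is what lets a phrase that straddles a window boundary still match somewhere.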
Since that is still far from perfect, I thought it would be nice to index every whole text as a single document, to maintain its coherency. The problem is the timestamp associated with every word. To keep this relationship I tried to use nested objects in the document:
PUT index-tapes-nested
{
"mappings" : {
"_doc" : {
"properties" : {
"$type" : { "type" : "text" },
"ContentId" : { "type" : "long" },
"Inserted" : { "type" : "date" },
"TrackId" : { "type" : "long" },
"Words" : {
"type" : "nested",
"properties" : {
"StartMillisec" : { "type" : "integer" },
"Word": { "type" : "text" }
}
}
}
}
}
}
This kinda works, but I don't know exactly how to write the query to search in the index.
A very basic query could be for example:
GET index-tapes-nested/_search
{
"query":{
"nested":{
"path":"Words",
"score_mode":"avg",
"query":{
"match":{
"Words.Word": "a bunch of things"
}
},
"inner_hits": {}
}
}
}
but something like that, especially with the avg scoring, gives low-quality results: the right document may be among the hits, but the query ignores word order, so the match is neither certain nor clear.
As far as I understand, span_near should come in handy in these situations, but I get no results:
GET index-tapes-nested/_search
{
"query": {
"nested":{
"path":"Words",
"score_mode": "avg",
"query": {
"span_near": {
"clauses": [
{ "span_term": { "Words.Word": "bunch" }},
{ "span_term": { "Words.Word": "of" }},
{ "span_term": { "Words.Word": "things" }}
],
"slop": 2,
"in_order": true
}
}
}
}
}
I don't know much about Elasticsearch. Maybe I should change approach and change the model, or maybe rewriting the query is enough; I don't know. This is pretty time consuming, so any help is really appreciated (is this a fairly common task?). For the sake of brevity I'm cutting some stuff and some ideas; I'm available to give some data or other examples if needed.
I also had problems with the C# NEST client when managing the nested index, but that is another story.
This could be interpreted in a few ways, I guess: having something like an "alternative stream" for a field, or metadata for every word, and so on. What I needed was this: https://github.com/elastic/elasticsearch/issues/5736 but it's not done yet, so for now I think I'll go with the annotated_text plugin or the 10-word window.
I have no idea whether, in the case of indexing single words, there can be a query that 'restores' the integrity of the original text (which means 1. grouping them by an id 2. ordering them) so that Elasticsearch can give the desired results.
I'll keep searching the docs to see if there's something interesting, or if I can hack something to get what I need (like require_field_match or the intervals query).
I'm struggling with a seemingly simple query in mongodb.
I have a job collection that has objects like:
{
"_id" : ObjectId("5995c1fc3c2a353a782ee51b"),
"stages" : [
{
"start" : ISODate("2017-02-02T22:06:26Z"),
"end" : ISODate("2017-02-03T22:06:26Z"),
"name" : "stage_one"
},
{
"start" : ISODate("2017-02-03T22:06:26Z"),
"end" : ISODate("2017-02-07T20:34:01Z"),
"name" : "stage_two"
}
]
}
I want to get a job whose second stage does not have an end time, i.e. end is null or not defined.
According to the mongo docs on querying for null and querying an array of embedded documents, it would seem the correct query should be:
db.job.findOne({'stages.1.end': null})
However, running that query returns me the job above which does have a non-null end date. In fact, if I run the query with count instead of findOne, I see that all jobs are returned - no filtering is done at all.
For completeness, here is the output from an example on a fresh mongo instance:
So in this example, I would expect db.job.findOne({'stages.1.end': null}) to return nothing since there is only one document and its second stage has a non-null end date.
This feels like the sort of issue where it's just me being an idiot and if so, I apologise profusely.
Thanks in advance for your help and let me know if you need any more details!
EDIT:
After some more experimentation, I think I can achieve what I want with the following:
db.job.find({$or: [{'stages.1.end': { $type: 10 }}, {'stages.1.end': {$exists: false}}]})
While this gets the job done, it doesn't feel like the simplest way and I still don't understand why the original query doesn't work. If anyone could shed some light on this it'd be much appreciated.
Can someone please tell me the difference between the following two queries? Both work for me and seem to give correct results, but I am not sure whether there really is any difference.
Retrieve all students having scores between 80 and 95
var query1 = { 'grade' : {"$gt":80}, 'grade' : {"$lt":95} };
var query2 = { 'grade' : {"$gt":80,"$lt":95} };
I think that you will find that your first form does not actually work, even if it did for you on a minimal sample. And it will fail for a very good reason. Consider the following documents:
{ "grade" : 90 }
{ "grade" : 96 }
{ "grade" : 80 }
If you issue your first query form you will get this for a result:
{ "grade" : 90 }
{ "grade" : 80 }
The reason being that you cannot have the "same key" in a document like this, and one will negate the other. In this case the "right" side key takes precedence over the left key and overwrites. This is common behavior for hash or dictionary structures.
This is why the second form is required and will of course return only the document that matches the conditions that are intended to be specified.
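The overwrite can be reproduced without a database. The sketch below is only an illustration: it parses the first query form as JSON to show that a single "grade" key survives, then filters the three sample documents with a hypothetical mini-matcher that understands only $gt and $lt. It is not how MongoDB evaluates queries internally.

```javascript
// Parsing a query with a repeated key: the later "grade" replaces the earlier one.
const parsed = JSON.parse('{"grade": {"$gt": 80}, "grade": {"$lt": 95}}');
console.log(Object.keys(parsed.grade)); // only "$lt" survives

// Hypothetical mini-matcher supporting only $gt and $lt, for illustration.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) =>
    Object.entries(cond).every(([op, bound]) => {
      if (op === "$gt") return doc[field] > bound;
      if (op === "$lt") return doc[field] < bound;
      return false; // any other operator is out of scope for this sketch
    })
  );
}

const docs = [{ grade: 90 }, { grade: 96 }, { grade: 80 }];

// First form: effectively just { grade: { $lt: 95 } }, so 80 slips through.
const firstForm = docs.filter((d) => matches(d, parsed));

// Second form: both bounds live under one key, so both are applied.
const secondForm = docs.filter((d) => matches(d, { grade: { $gt: 80, $lt: 95 } }));

console.log(firstForm, secondForm);
```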
When you have an actual case to use the same field and probably using different conditions you can use the $and operator. Not the best example, but just to clarify:
db.collection.find({ "$and": [
{ "grade": { "$gt": 80 } },
{ "grade": { "$lt": 95 } },
]})
Its real purpose is to combine different conditions on the same field.
For your case though, use the second form you specified.
You should use the second query for the following reason: your query is actually JSON, and in your first query you are providing a duplicate key (grade).
Duplicate keys are permitted by the JSON RFC; however, even Doug Crockford mentioned that he regrets leaving this ambiguity in the spec, because it inevitably leads to all kinds of confusion.
Maybe Mongo parses it correctly right now (or rather, it appears to in your case), but you do not know how long it will stay that way (and some other JSON parsers will tell you that you have an error).
So the best approach is to treat the first query as bad and only use the second one.
I am trying to get a value in my mongoDB collection. I would like to get the title of a movie and the sales (nbSold) of this movie for the current month.
Here is how my data are stored:
"_id" : ObjectId("52e6a1aacf0b3b522a8a157a"),
"title" : "Pulp Fiction",
"sales" : [
{
"date" : ISODate("2013-11-01T00:00:00Z"),
"nbSold" : 6
},
{
"date" : ISODate("2013-12-01T00:00:00Z"),
"nbSold" : 2
}
]
I'm using mongoose and this is how I build my query for December 2013:
var query = Movie.find({"title":"Pulp Fiction"}, "title sales.nbSold")
.where("sales.date")
.equals(new Date("2013-12-01"));
However, this is the output that I am receiving:
{ title: 'Pulp Fiction', sales: [ { nbSold: 6 }, { nbSold: 2 } ] }
I would like to have only the title associated with the nbSold of the current month (2 in my case). What is the correct way to do this?
Thanks a lot for your help.
First off, you should close your where call. You should call .exec(callback) when you're done comparing, and you should be using select instead of equals along with $elemMatch. Try this:
var query = Movie.find({"title":"Pulp Fiction"}, "title sales.nbSold")
.where("sales.date")
.select({ sales: { $elemMatch: { date: new Date("2013-12-01") } }})
.exec(function(err, doc) {
return console.dir(doc);
});
You should also have a callback. I ran this on my machine, and it definitely works. I posted code earlier that wasn't quite right, but this does the trick. See the documentation for an example.
Also, I'd be concerned as to how you're seeing if the date matches. If the time is off in your Date object but the date matches, Mongo won't find a match. And I don't know exactly how it works in Mongo, but in JavaScript, you can't compare two Date objects directly for equality as they are both different objects with the same value, so you may come across that problem as well. See this post for an example on how to do it.
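The JavaScript side of that concern can be shown in a few lines. This is only a small sketch of Date comparison in plain JavaScript; sameUtcDay is a hypothetical helper, and nothing here says how MongoDB itself compares stored dates.

```javascript
const a = new Date("2013-12-01T00:00:00Z");
const b = new Date("2013-12-01T00:00:00Z");

// Two distinct Date objects are never strictly equal, even for the same instant.
const sameObject = a === b; // false

// Comparing the underlying epoch milliseconds compares the instants themselves.
const sameInstant = a.getTime() === b.getTime(); // true

// Hypothetical helper: true when both dates fall on the same UTC calendar day,
// regardless of the time of day.
function sameUtcDay(x, y) {
  return x.toISOString().slice(0, 10) === y.toISOString().slice(0, 10);
}

// Same day as `a`, but a different time of day.
const c = new Date("2013-12-01T15:30:00Z");
const sameDayDifferentTime = sameUtcDay(a, c) && a.getTime() !== c.getTime(); // true

console.log(sameObject, sameInstant, sameDayDifferentTime);
```

When only the calendar day matters, comparing normalized values like this (or querying on a date range) avoids misses caused by a differing time component.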
I am attempting to calculate the total amount of money spent, as tracked in our database. Each order document contains a field "total_price".
I am attempting to use the following code:
db.orders.aggregate({
$group: {
_id: null,
total: {$sum: "$total_price"}
}
})
Unfortunately, the only output I get is: { "result" : [ { "_id" : null, "total" : 0 } ], "ok" : 1 }
But to verify that there is actually numerical data stored that just isn't being totaled: db.orders.find()[0].total_price results in 8.99
Any help would be greatly appreciated. I have very little experience using MongoDB. I've only covered the basics at this point.
Thank you in advance.
$sum only works with ints, longs and floats. Right now, there is no operator to parse a string into a number, although that would be very useful. You can do this yourself, as described in "Mongo convert all numeric fields that are stored as string", but that would be slow.
I would suggest you make sure that your application stores numbers as int/long/float, and that you write a script that iterates over all your documents and updates the value. I would also suggest that you add a feature request at https://jira.mongodb.org/browse/SERVER to add an operator that converts a string to a number.
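To illustrate the effect, here is a rough sketch in plain JavaScript over an in-memory array. It assumes, as this answer implies, that total_price was stored as a string. The sample documents and both helper functions are hypothetical; a real fix would update the documents in the collection itself.

```javascript
// Hypothetical in-memory stand-in for the orders collection.
const orders = [{ total_price: "8.99" }, { total_price: "10.00" }, { total_price: 5 }];

// Mimics the behaviour described above: values that are not numbers add nothing.
function sumNumericOnly(docs, field) {
  return docs.reduce(
    (acc, d) => (typeof d[field] === "number" ? acc + d[field] : acc),
    0
  );
}

// Converts string values of `field` to numbers, leaving unparsable values untouched.
function toNumberField(docs, field) {
  return docs.map((d) => {
    if (typeof d[field] !== "string") return d;
    const n = parseFloat(d[field]);
    return Number.isNaN(n) ? d : { ...d, [field]: n };
  });
}

const before = sumNumericOnly(orders, "total_price");
const after = sumNumericOnly(toNumberField(orders, "total_price"), "total_price");
console.log(before, after);
```

Before the conversion only the one genuinely numeric value is counted; after it, all three contribute to the total.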