Query if key exists in an ElasticSearch hash

How do I check if my query terms are the keys in one of my fields? For example, here's a stored document:
{
field1: "some value",
field2: "some other value",
field3: {
something: [1,2],
else: [2,3]
}
}
The query "something" should return that document. The query "some value" should also return that document. Here's what I have so far:
{
query: {
filtered: {
query: {
multi_match: {
query: query,
fields: ['field1', 'field2'],
operator: 'and'
}
},
filter: {
or: [
{
exists: { field: "field3"}
}
]
}
}
}
}

Assuming you want the words of "some value" to match adjacent to each other, as a phrase, the following should work:
{
"query": {
"filtered": {
"filter": {
"exists": {
"field": "field3"
}
},
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"field1": "some value"
}
},
{
"match_phrase": {
"field2": "some value"
}
}
]
}
},
{
"multi_match": {
"query": "something",
"fields": [
"field1",
"field2"
],
"operator": "and"
}
}
]
}
}
}
}
}

{
query: {
filtered: {
filter: {
or: [
{
query: {
multi_match: {
query: query,
fields: ['field1', 'field2'],
operator: "and"
}
}
},
{
exists: { field: "field3." + query }
}
]
}
}
}
}
The only caveat is that if query is a string with multiple terms (or an array), you'll have to create an exists filter for each term.
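The caveat above can be sketched as a small helper that builds the filter with one exists clause per term. This is only a sketch in Python that produces the request body as a plain dict; the name build_key_or_value_query and the whitespace split are assumptions, not part of the original answer. It keeps the filtered/or form used above, which later Elasticsearch versions replaced with bool queries.

```python
def build_key_or_value_query(query, text_fields=("field1", "field2"), hash_field="field3"):
    """Build a request body matching `query` against text fields or hash keys."""
    # Accept either a string with several terms or a list of terms.
    terms = query.split() if isinstance(query, str) else list(query)

    # Full-text part: the same multi_match as in the answer above.
    clauses = [{
        "query": {
            "multi_match": {
                "query": " ".join(terms),
                "fields": list(text_fields),
                "operator": "and",
            }
        }
    }]

    # Key part: one exists filter per term, e.g. field3.something.
    clauses.extend({"exists": {"field": f"{hash_field}.{term}"}} for term in terms)

    return {"query": {"filtered": {"filter": {"or": clauses}}}}


body = build_key_or_value_query("something else")
print(body["query"]["filtered"]["filter"]["or"][1:])
```

Because the clauses sit in an or filter, a document matches as soon as any one of the terms is a key of field3.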

Cannot find # in OpenSearch query

I have an index with a field whose values can contain a '#', and when the search input includes a '#', I cannot get the query to find it.
Field Data: "#3213939"
Query:
GET /invoices/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"referenceNumber": {
"query": "#32"
}
}
},
{
"wildcard": {
"referenceNumber": {
"value": "*#32*"
}
}
}
]
}
}
}
The standard text analyzer drops the "#" character during analysis, which is why you can't find it.
POST _analyze
{
"text": ["#3213939"]
}
Response:
{
"tokens": [
{
"token": "3213939",
"start_offset": 1,
"end_offset": 8,
"type": "<NUM>",
"position": 0
}
]
}
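You can customize the analyzer so that it keeps the character:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
Alternatively, you can run the wildcard query against the referenceNumber.keyword field (assuming your mapping has that sub-field, as the default dynamic mapping does):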
GET test_invoices/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"referenceNumber": {
"query": "#32"
}
}
},
{
"wildcard": {
"referenceNumber.keyword": {
"value": "*#32*"
}
}
}
]
}
}
}
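To see why only the wildcard on referenceNumber.keyword can match, it may help to compare the pattern with what each field actually indexes. The sketch below is a rough Python illustration that uses fnmatch as a stand-in for wildcard matching; it is not how OpenSearch evaluates queries, and the token value is copied from the _analyze response above.

```python
from fnmatch import fnmatchcase

pattern = "*#32*"

# What the analyzed text field indexes (from the _analyze response above).
analyzed_token = "3213939"
# What a keyword sub-field indexes by default: the original value, unchanged.
keyword_value = "#3213939"

print(fnmatchcase(analyzed_token, pattern))  # False: the "#" is gone
print(fnmatchcase(keyword_value, pattern))   # True: the "#" is preserved
```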

How do I get the current validator of a mongo collection?

How do I get the current validator of a mongo collection?
Or is there no way, and I just need to overwrite it?
You can get it with the db.getCollectionInfos() method.
Example:
Create a collection in an empty database:
db.createCollection( "contacts",
{
validator: { $or:
[
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb\.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
},
validationAction: "warn"
}
)
{ "ok" : 1 }
Run the command:
db.getCollectionInfos()[0].options.validator
Result:
{
"$or" : [
{
"phone": {
"$type": "string"
}
},
{
"email": {
"$regex": /@mongodb\.com$/
}
},
{
"status": {
"$in": [
"Unknown",
"Incomplete"
]
}
}
]
}
The validator is found as an object in the options object.
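Since getCollectionInfos() returns one document per collection, indexing with [0] only works when the collection of interest happens to come first. A minimal Python sketch of looking the validator up by name follows; find_validator is a hypothetical helper, and the sample list is hand-written to mimic the shape of the shell output above.

```python
def find_validator(collection_infos, name):
    """Return the validator of the named collection, or None if it has none."""
    for info in collection_infos:
        if info.get("name") == name:
            return info.get("options", {}).get("validator")
    return None


# Hand-written sample shaped like the getCollectionInfos() result.
infos = [
    {"name": "logs", "type": "collection", "options": {}},
    {
        "name": "contacts",
        "type": "collection",
        "options": {
            "validator": {"$or": [{"phone": {"$type": "string"}}]},
            "validationAction": "warn",
        },
    },
]

print(find_validator(infos, "contacts"))
```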

How to get the items around a specific item?

I'd like to get x items before and y items after (the neighbors of) the record with "lastseen":true:
// (_id fields omitted)
{ "msg": "hello 1" }
{ "msg": "hello 2" }
{ "msg": "hello 3", "lastseen": true }
{ "msg": "hello 4" }
{ "msg": "hello 5" }
For example, if I query with x=1 and y=1 the result should be:
// (_id fields omitted)
{ "msg": "hello 2" }
{ "msg": "hello 3", "lastseen": true }
{ "msg": "hello 4" }
What are my options in mongodb to achieve that?
It is probably simpler to implement the logic on the client side with several queries. Assuming your documents are ordered by _id:
findOne({"lastseen":true})
find({_id: {$lt: <_id from the previous query>}}).sort({_id:-1}).limit(1)
find({_id: {$gt: <_id from the first query>}}).sort({_id:1}).limit(1)
The only way I can imagine to do it in a single query is to group the documents into an array and then use $indexOfArray in combination with $slice:
db.collection.aggregate([
// ensure lastseen is present to calculate index properly
{ $addFields: {lastseen: { $ifNull: [ "$lastseen", false ] } } },
// get all documents into array
{ $group: { _id:null, docs: { $push:"$$ROOT" } } },
// get index of first matched document
{ $project: { docs:1, match: { $indexOfArray: [ "$docs.lastseen", true ] } } },
// slice the array
{ $project: { docs: { $slice: [ "$docs", { $subtract: [ "$match", 1 ] } , 3 ] } } },
// remove added lastseen
{ $project: { docs:
{ $map: {
input: "$docs",
as: "doc",
in: { $cond: {
if: "$$doc.lastseen",
then: "$$doc",
else: { $arrayToObject: { $filter: {
input: { $objectToArray: "$$doc" },
as: "field",
cond: { $ne: [ "$$field.k", "lastseen" ] }
} } }
} }
} }
} },
// un-group documents from the array
{ $unwind: "$docs" },
{ $replaceRoot: {newRoot:"$docs"}}
]);
However, I doubt the efficiency of such a query.
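The index-and-slice idea behind the pipeline can be sketched on the client side as well. The Python below is an illustration only: neighbors is a hypothetical helper, docs stands for documents already fetched in order, and, unlike the pipeline above, the start index is clamped so that a match at position 0 does not produce a negative offset.

```python
def neighbors(docs, x, y, key="lastseen"):
    """Return up to x documents before and y after the first doc where key is true."""
    # Index of the first matching document, similar to $indexOfArray.
    index = next((i for i, doc in enumerate(docs) if doc.get(key)), -1)
    if index == -1:
        return []
    start = max(index - x, 0)  # clamp at the beginning of the list
    return docs[start:index + y + 1]


docs = [
    {"msg": "hello 1"},
    {"msg": "hello 2"},
    {"msg": "hello 3", "lastseen": True},
    {"msg": "hello 4"},
    {"msg": "hello 5"},
]

for doc in neighbors(docs, 1, 1):
    print(doc)
```

With x=1 and y=1 this yields the three documents from the question; larger values of x or y are simply cut off at the ends of the list.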
The short answer: you can use skip() to skip as many documents as you want.
// (_id fields omitted)
{ "msg": "hello 1" }
{ "msg": "hello 2" }
{ "msg": "hello 3", "lastseen": true }
{ "msg": "hello 4" }
{ "msg": "hello 5" }
Command:
db.collection.find({},{_id:0}).skip(1)

Fetching esJsonRDD from elasticsearch with complex filtering in Spark

I am currently fetching the Elasticsearch RDD in our Spark job, filtering with a one-line Elasticsearch query such as this example:
val elasticRdds = sparkContext.esJsonRDD(esIndex, s"?default_operator=AND&q=director.name:DAVID + \n movie.name:SEVEN")
Now if our search query becomes complex like:
{
"query": {
"filtered": {
"query": {
"query_string": {
"default_operator": "AND",
"query": "director.name:DAVID + \n movie.name:SEVEN"
}
},
"filter": {
"nested": {
"path": "movieStatus.boxoffice.status",
"query": {
"bool": {
"must": [
{
"match": {
"movieStatus.boxoffice.status.rating": "A"
}
},
{
"match": {
"movieStatus.boxoffice.status.oscar": "false"
}
}
]
}
}
}
}
}
}
}
Can I still convert that query to an inline Elasticsearch query to use it with esJsonRDD? Or is there any way to use the above query as is with esJsonRDD?
If not, what is a better way to fetch such RDDs in Spark?
I ask because esJsonRDD seems to accept only inline (one-line) Elasticsearch queries.
Use a triple-quoted string, which can span multiple lines:
val query = """{
"query": {
"filtered": {
"query": {
"query_string": {
"default_operator": "AND",
"query": "director.name:DAVID + \n movie.name:SEVEN"
}
},
"filter": {
"nested": {
"path": "movieStatus.boxoffice.status",
"query": {
"bool": {
"must": [
{
"match": {
"movieStatus.boxoffice.status.rating": "A"
}
},
{
"match": {
"movieStatus.boxoffice.status.oscar": "false"
}
}
]
}
}
}
}
}
}
}"""
val elasticRdds = sparkContext.esJsonRDD(esIndex, query)
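If a single-line form of the query is ever needed, serializing the structure is safer than collapsing it by hand. The sketch below uses Python rather than Scala purely for illustration and shortens the query to its query_string part; it only shows that a JSON encoder yields a one-line body with the newline inside the query string escaped.

```python
import json

query = {
    "query": {
        "query_string": {
            "default_operator": "AND",
            # A real newline character, as written with \n in the question.
            "query": "director.name:DAVID + \n movie.name:SEVEN",
        }
    }
}

# Compact separators give one line; the newline is emitted as the two characters \n.
one_line = json.dumps(query, separators=(",", ":"))
print(one_line)
```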

SyntaxError in `$and` query

I am trying to dynamically create an $and query in mongodb, but I get the error SyntaxError: missing ; before statement @(shell). Is it not possible to create an $and query dynamically?
db.firstcoll.find({
$and: [{
Thing: {
$in: stuff
}
},
{
Thing: {
$ne: "One three"
}
},
{
Category: "Second"
}
]
}).forEach(function(doc1) {
var words = doc1.Thing.split(' ');
var NewQuery;
if (words.length == 1) {
NewQuery = {
$and: [{
Thing: {
$regex: words[0]
}
},
{
Category: doc1.Category
}
]
}
};
if (words.length == 2) {
NewQuery = {
$and: [{
Thing: {
$regex: words[0]
}
}, {
Thing: {
$regex: words[1]
}
}, {
Category: doc1.Category
}]
}
};
db.secondcoll.find(
NewQuery
).forEach(function(doc2) {
print("stuff: ", doc2.stuff)
})
})
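For what it is worth, the two if branches differ only in how many $regex clauses they add, so the query can be built in a loop for any number of words. The following is a sketch in Python rather than shell JavaScript, with a hypothetical build_and_query helper; it does not diagnose the SyntaxError, and escaping each word with re.escape is an added assumption that the words should match literally.

```python
import re


def build_and_query(thing, category):
    """Build an $and filter with one $regex clause per word of `thing`."""
    # One clause per word; re.escape is an assumption (literal matching).
    clauses = [{"Thing": {"$regex": re.escape(word)}} for word in thing.split()]
    clauses.append({"Category": category})
    return {"$and": clauses}


print(build_and_query("One two", "Second"))
```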