return nested fields using elastic4s - scala

I have data stored with a nested location object and can't figure out how to get elastic4s to return the location as part of the result of a search. I have data that when queried (from the REST endpoint) looks like this:
{
  "_index": "us_large_cities",
  "_type": "city",
  "_id": "AU7ke-xU_N_KRYZ5Iii_",
  "_score": 1,
  "_source": {
    "city": "Oakland",
    "state": "CA",
    "location": {
      "lat": "37.8043722",
      "lon": "-122.2708026"
    }
  }
}
When I try querying it using elastic4s like so:
search in "us_large_cities"->"city" fields("location", "city") query {
  filteredQuery filter {
    geoPolygon("location") point(37.9, -122.31) point(37.8, -122.31) point(37.8, -122.25) point(37.9, -122.25)
  }
}
I get back results like this:
{
  "_index" : "us_large_cities",
  "_type" : "city",
  "_id" : "AU7keH9l_N_KRYZ5Iig0",
  "_score" : 1.0,
  "fields" : {
    "city" : [ "Berkeley" ]
  }
}
Where I would expect to see "location" but don't. Does anyone know how I specify the fields so that I can actually get the location?

You should try using source filtering instead, as shown below. Note the use of sourceInclude instead of fields.
search in "us_large_cities"->"city" sourceInclude("location", "city") query {
  filteredQuery filter {
    geoPolygon("location") point(37.9, -122.31) point(37.8, -122.31) point(37.8, -122.25) point(37.9, -122.25)
  }
}
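For reference, sourceInclude maps onto Elasticsearch's _source filtering. Below is a sketch of the equivalent raw request body, built as a plain Python dict; the field and index names come from the question, and the `filtered`/`include` keys match the older Elasticsearch versions that still had filteredQuery.

```python
def build_search_body():
    """Sketch of the raw Elasticsearch request body equivalent to the
    elastic4s query above: _source filtering plus a geo_polygon filter.
    Field names ("location", "city") are taken from the question."""
    polygon = [
        {"lat": 37.9, "lon": -122.31},
        {"lat": 37.8, "lon": -122.31},
        {"lat": 37.8, "lon": -122.25},
        {"lat": 37.9, "lon": -122.25},
    ]
    return {
        "_source": {"include": ["location", "city"]},
        "query": {
            "filtered": {
                "filter": {
                    "geo_polygon": {"location": {"points": polygon}}
                }
            }
        },
    }
```

Because _source filtering returns a slice of the stored document rather than individual stored fields, the nested location object comes back intact.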

Related

ELASTICSEARCH - Update the value of a field to current date

I'm looking for a way to update the value of the field reviewed_date through _update_by_query in the following document:
{
  "reviewed_date" : "2022-07-05T09:31:04.742077702Z",
  "timestamp" : "2022-07-05T10:18:52.353852943Z"
}
I'm doing it through the following way:
POST index1/_update_by_query
{
  "script": {
    "source": "ctx._source.reviewed_date = OffsetDateTime.parse(ctx._source.reviewed_date)",
    "lang": "painless"
  },
  "query": {
    "term": {
      "_id": "23r2fvwc3drgr4f2323dsd"
    }
  }
}
This way I only manage to update the timestamp field, even though I am indicating reviewed_date. I would like to be able to assign the value directly to reviewed_date without modifying other fields.
Mapping:
"reviewed_date" : {
  "type" : "date"
},
"timestamp" : {
  "type" : "date"
}
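One way around this (a sketch, not verified against your index) is to compute the new date on the client and pass it in as a script parameter, so the Painless script only performs the assignment. The build_update_body helper below is hypothetical; it just constructs the _update_by_query request body.

```python
from datetime import datetime, timezone


def build_update_body(doc_id):
    """Build an _update_by_query body that sets reviewed_date to 'now'.

    The date is supplied via script params so the script source stays a
    static assignment (hypothetical sketch; the field name reviewed_date
    and the term query on _id come from the question)."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "script": {
            "source": "ctx._source.reviewed_date = params.now",
            "lang": "painless",
            "params": {"now": now},
        },
        "query": {"term": {"_id": doc_id}},
    }
```

Only reviewed_date is touched by the script; other fields in _source are left as they are.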

How to insert a document in MongoDB with one of fields of type ObjectId in Mulesoft Anypoint Studio?

I am trying to insert a document in MongoDB collection with one of the fields of type ObjectId referring to a document in another collection. Refer to the following as an example:
{
  "_id": ObjectId("5d9b5191cbab733354f8345b"),
  "accountBalance": 1234.0,
  "pinCounter": 3,
  "status": "ACTIVE",
  "pinNumber": "1234",
  "accountType": "CURRENT",
  "customerId": ObjectId("5d96e3bd1c9d4400005cbb23")
}
The _id field is generated by MongoDB while customerId (5d96e3bd1c9d4400005cbb23 in above example) is provided in the request.
But mapping the data by simply prepending the string "ObjectId(" does not work; in that case the field is inserted as a string.
This is the format to use for ObjectId:
"_id" : {
  "\$oid": "5d9b5191cbab733354f8345b"
}
So full DW script would look something like:
%dw 2.0
output application/json
---
{
  "_id" : {
    "\$oid": "5d9b5191cbab733354f8345b"
  },
  "accountBalance": 1234.0,
  "pinCounter": 3,
  "status": "ACTIVE",
  "pinNumber": "1234",
  "accountType": "CURRENT",
  "customerId": {
    "\$oid": "5d96e3bd1c9d4400005cbb23"
  }
}
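The "\$oid" wrapper is MongoDB Extended JSON. As a sketch, wrapping a plain hex id string into that shape could look like the following in Python; as_object_id is a hypothetical helper, not part of any driver.

```python
def as_object_id(hex_id):
    """Wrap a 24-character hex string in MongoDB Extended JSON ObjectId
    form. Consumers that accept Extended JSON convert this shape into a
    real ObjectId on insert (hypothetical helper for illustration)."""
    if len(hex_id) != 24 or any(c not in "0123456789abcdef" for c in hex_id.lower()):
        raise ValueError("not a valid ObjectId hex string: %r" % hex_id)
    return {"$oid": hex_id}
```

For example, as_object_id("5d96e3bd1c9d4400005cbb23") produces the same structure as the customerId field in the DataWeave script above.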

How to combine Documents in aggregation pipeline with MongoDB Java driver 3.6?

I am using an aggregation pipeline with the MongoDB Java driver version 3.6. If I have documents that look something like:
doc1 --
{
  "CAR": {
    "VIN": "ASDF1234",
    "YEAR": "2018",
    "MAKE": "Honda",
    "MODEL": "Accord"
  },
  "FEATURES": [
    {
      "AUDIO": "MP3",
      "TIRES": "All Season",
      "BRAKES": "ABS"
    }
  ]
}
doc2 --
{
  "CAR": {
    "VIN": "ASDF1234",
    "AVAILABILITY": "In Stock"
  }
}
And if I submit a query like:
collection.aggregate(
  Arrays.asList(
    Aggregates.match(
      and(
        in("CAR.VIN", vinList),
        or(
          eq("CAR.MAKE", carMake),
          eq("CAR.AVAILABILITY", carAvailability)
        )
      )
    )
  )
)
Let us assume that there are exactly two different records for which the "CAR.VIN" criteria match for every VIN, and I am going to get two results. Rather than deal with two results each time, I would like to merge the documents so that the result looks like this:
{
  "CAR": {
    "VIN": "ASDF1234",
    "YEAR": "2018",
    "MAKE": "Honda",
    "MODEL": "Accord",
    "AVAILABILITY": "In Stock"
  },
  "FEATURES": [
    {
      "AUDIO": "MP3",
      "TIRES": "All Season",
      "BRAKES": "ABS"
    }
  ]
}
The example where I have two and only two results trivializes my need for this. Imagine that vinList is a list of 10000 values, and it might return 2 x 10000 documents. When I return an AggregateIterable to the client that is calling my code, I do not want to impose the requirement that they have to group or collate the results in any way, but that they will receive one document for each result that has all of the information that they will want to parse, cleanly and easily.
Of course, people will suggest that the data is simply combined into one document with all of the data in the MongoDB collection. For reasons that I cannot control, there are two separate documents corresponding to each VIN in the same collection, and that is something that I am unable to change. There is a value in our system that makes this more reasonable than it might seem, so please don't focus on this apparent problem with the data.
I am trying, with not much luck, to utilize the Aggregates.group() operation to merge the fields in my aggregation pipeline. Accumulators.push seems to be the closest operation to what I need, but I do not want to complicate the document structure with extra arrays, etc. Is there a straightforward approach that I am not seeing?
You can try $mergeObjects, added in MongoDB 3.6:
db.cc.aggregate([
  {
    $group: {
      _id: "$CAR.VIN",
      CAR: { $mergeObjects: "$CAR" },
      FEATURES: { $mergeObjects: { $arrayElemAt: ["$FEATURES", 0] } }
    }
  }
]).pretty()
result
{
  "_id" : "ASDF1234",
  "CAR" : {
    "VIN" : "ASDF1234",
    "YEAR" : "2018",
    "MAKE" : "Honda",
    "MODEL" : "Accord",
    "AVAILABILITY" : "In Stock"
  },
  "FEATURES" : {
    "AUDIO" : "MP3",
    "TIRES" : "All Season",
    "BRAKES" : "ABS"
  }
}
To get FEATURES back as an array:
db.cc.aggregate([
  {
    $group: {
      _id: "$CAR.VIN",
      CAR: { $mergeObjects: "$CAR" },
      FEATURES: { $push: { $arrayElemAt: ["$FEATURES", 0] } }
    }
  }
]).pretty()
result
{
  "_id" : "ASDF1234",
  "CAR" : {
    "VIN" : "ASDF1234",
    "YEAR" : "2018",
    "MAKE" : "Honda",
    "MODEL" : "Accord",
    "AVAILABILITY" : "In Stock"
  },
  "FEATURES" : [
    {
      "AUDIO" : "MP3",
      "TIRES" : "All Season",
      "BRAKES" : "ABS"
    },
    null
  ]
}
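The merge semantics relied on above can be sketched in plain Python: $mergeObjects skips null inputs and lets later documents overwrite earlier ones field by field.

```python
def merge_objects(docs):
    """Simulate MongoDB's $mergeObjects accumulator: null inputs are
    skipped, and when documents share a field name the value from the
    last document wins (a sketch of the semantics, not driver code)."""
    merged = {}
    for doc in docs:
        if doc is None:
            continue
        merged.update(doc)
    return merged


# CAR subdocuments from doc1 and doc2 in the question
doc1_car = {"VIN": "ASDF1234", "YEAR": "2018", "MAKE": "Honda", "MODEL": "Accord"}
doc2_car = {"VIN": "ASDF1234", "AVAILABILITY": "In Stock"}
merged = merge_objects([doc1_car, doc2_car])
```

This is why the grouped CAR object contains both the doc1 details and doc2's AVAILABILITY.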

Can I utilize indexes when querying by MongoDB subdocument without known field names?

I have a document structure like follows:
{
  "_id": ...,
  "name": "Document name",
  "properties": {
    "prop1": "something",
    "2ndprop": "other_prop",
    "other3": ["tag1", "tag2"]
  }
}
I can't know the actual field names in properties subdocument (they are given by the application user), so I can't create indexes like properties.prop1. Neither can I know the structure of the field values, they can be single value, embedded document or array.
Is there any practical way to do performant queries to the collection with this kind of schema design?
One option that came to my mind is to add a new field to the document, index it and set used field names per document into this field.
{
  "_id": ...,
  "name": "Document name",
  "properties": {
    "prop1": "something",
    "2ndprop": "other_prop",
    "other3": ["tag1", "tag2"]
  },
  "property_fields": ["prop1", "2ndprop", "other3"]
}
Now I could first run query against property_fields field and after that let MongoDB scan through the found documents to see whether properties.prop1 contains the required value. This is definitely slower, but could be viable.
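That two-step plan (an indexed match on property_fields, then a scan of the properties values) can be sketched in plain Python; find_with_property is a hypothetical illustration of the query plan, not driver code.

```python
def find_with_property(docs, field, value):
    """Two-step filter from the question: first a cheap match on the
    indexable property_fields array, then a scan of the surviving
    documents to check the actual value inside properties.
    Handles scalar and list values only (a sketch)."""
    # Step 1: indexed match on property_fields
    candidates = [d for d in docs if field in d.get("property_fields", [])]

    # Step 2: verify the value inside the properties subdocument
    def matches(stored, wanted):
        return wanted in stored if isinstance(stored, list) else stored == wanted

    return [d for d in candidates if matches(d["properties"].get(field), value)]
```

Step 1 is what the index accelerates; step 2 is the residual document scan the question accepts as the cost of the flexible schema.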
One way of dealing with this is to use a schema like the one below.
{
  "name" : "Document name",
  "properties" : [
    { "k" : "prop1", "v" : "something" },
    { "k" : "2ndprop", "v" : "other_prop" },
    { "k" : "other3", "v" : "tag1" },
    { "k" : "other3", "v" : "tag2" }
  ]
}
Then you can index "properties.k" and "properties.v", for example:
db.foo.createIndex({"properties.k": 1, "properties.v": 1})
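Flattening an arbitrary properties subdocument into that k/v array can be sketched as follows; to_kv_pairs is a hypothetical helper that handles scalar and list values only.

```python
def to_kv_pairs(properties):
    """Flatten a free-form properties subdocument into the indexable
    k/v array schema shown above. List values fan out into one entry
    per element, matching how the "other3" tags appear in the example
    (hypothetical helper; scalar and list values only)."""
    pairs = []
    for key, value in properties.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            pairs.append({"k": key, "v": v})
    return pairs


props = {"prop1": "something", "2ndprop": "other_prop", "other3": ["tag1", "tag2"]}
```

A query like {"properties.k": "other3", "properties.v": "tag2"} can then use the compound index regardless of which field names the user invented.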

How to include different catalogs in a query in MongoDB?

Suppose I have different documents in different collections:
On cars:
{ "_id": 32534534, "color": "red", ... }
On houses:
{ "_id": 93867, "city": "Xanadu", ... }
How can I retrieve the corresponding document to the documents below, in people:
{ "name": "Alonso", "owns": [32534534], ... }
{ "name": "Kublai Khan", "owns": [93867], ... }
Can I use something like the code below?
(Note that I'm not specifying a catalog)
db.find({'_id': 93867})
If not, what would you suggest to achieve this effect?
I have just found this related question: MongoDB: cross-collection queries
Using DBRefs you can store links to documents outside your collection, or even in another MongoDB database. You will have to fetch the references in separate queries; different drivers handle this differently. For example, the Python driver can auto-dereference them.
An example based on yours in the JS shell might look like:
> red_car = {"color": "red", "model": "Ford Perfect"}
{"color": "red", "model": "Ford Perfect"}
> db.cars.save(red_car)
> red_car
{
  "color" : "red",
  "model" : "Ford Perfect",
  "_id" : ObjectId("4f041d96874e6f24e704f887")
}
> // Save as DBRef
> alonso = {"name": "Alonso", "owns": [new DBRef('cars', red_car._id)]}
{
  "name" : "Alonso",
  "owns" : [
    {
      "$ref" : "cars",
      "$id" : ObjectId("4f041d96874e6f24e704f887")
    }
  ]
}
> db.people.save(alonso)
As you can see, DBRefs are a formal spec for referencing objects: they always contain the ObjectId, and can also carry the database and collection. In the example above, the $ref field stores the collection cars. Searching is trivial, as you just query on the DBRef:
> dbref = new DBRef('cars', red_car._id)
> red_car_owner = db.people.find({"owns": {$in: [dbref]}})[0]
> red_car_owner
{
  "_id" : ObjectId("4f0448e3a1c5cd097fc36a65"),
  "name" : "Alonso",
  "owns" : [
    {
      "$ref" : "cars",
      "$id" : ObjectId("4f0448d1a1c5cd097fc36a64")
    }
  ]
}
Dereferencing can be done via the fetch() command in the shell:
> red_car_owner.owns[0].fetch()
{
  "_id" : ObjectId("4f0448d1a1c5cd097fc36a64"),
  "color" : "red",
  "model" : "Ford Perfect"
}
However, depending on your use case, you may want to optimise this and write some code that iterates over the owns array and does as few find() queries as possible.
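A sketch of that optimisation in Python: group the refs by collection first, so each collection can be dereferenced with a single $in query instead of one find() per reference. group_refs is a hypothetical helper, and the ids are plain strings here rather than ObjectIds.

```python
def group_refs(owns):
    """Group DBRef-style dicts ({"$ref": collection, "$id": id}) by
    collection, so a caller can issue one find({"_id": {"$in": ids}})
    per collection rather than one query per reference
    (hypothetical sketch; ids are plain strings for illustration)."""
    by_collection = {}
    for ref in owns:
        by_collection.setdefault(ref["$ref"], []).append(ref["$id"])
    return by_collection


owns = [
    {"$ref": "cars", "$id": "4f0448d1a1c5cd097fc36a64"},
    {"$ref": "houses", "$id": "93867"},
    {"$ref": "cars", "$id": "deadbeefdeadbeefdeadbeef"},
]
```

With the grouping above, three references cost two queries (one against cars, one against houses) instead of three.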
I don't think there is a way to query multiple collections in a single operation. One option is to store everything in the same collection and distinguish the documents with a type field:
{ "_id": 32534534, "type": "car", "color": "red", ... }
{ "_id": 93867, "type": "house", "city": "Xanadu", ... }
You would also need to restructure the people documents so that owns records the type:
{ "name": "Alonso", "owns": { "ids": [32534534], "type": "car" }, ... }
{ "name": "Kublai Khan", "owns": { "ids": [93867], "type": "house" }, ... }
Now you can find the people who own the red car with
db.people.find({ "owns.type": "car", "owns.ids": 32534534 })
and the houses with
db.people.find({ "owns.type": "house", "owns.ids": 93867 })