Querying a map (<String, Object>) in JSON through MongoDB - mongodb

How can I query a map of type Map<String, List>, stored as JSON, in MongoDB?
Sample JSON:
{
"WIDTH": 810,
"HEIGHT": 465,
"MODULES": {
"23": {
"XNAME": "COMP1",
"PARAMS": {
"_Klockers": {
"TYPE": "text",
"VALUE": "Klocker#3"
},
"SUBSYS": {
"TYPE": "text",
"VALUE": "2"
},
"EP": {
"TYPE": "integer",
"VALUE": "2"
}
}
},
"24": {
"XNAME": "COMP2",
"PARAMS": {
"_Rockers": {
"TYPE": "text",
"VALUE": "Rocker#3"
},
"Driver": {
"TYPE": "binary",
"VALUE": 1
},
"EP": {
"TYPE": "long",
"VALUE": "233"
}
}
},
"25": {
"XNAME": "COMP3",
"PARAMS": {
"_Mockers": {
"TYPE": "text",
"VALUE": "Mocker#3"
},
"SYSMain": {
"TYPE": "text",
"VALUE": "2342"
},
"TLP": {
"TYPE": "double",
"VALUE": "2.3"
}
}
}
}
}
Basically, I want to:
1. List the "XNAME" field values for all keys in "MODULES".
Expected output: {"COMP1", "COMP2", "COMP3"}
2. List every "TYPE" in the "PARAMS" object within each key of "MODULES".
Expected output: {"text", "text", "integer", "text", "binary", "long", "text", "text", "double"}
I am new to MongoDB, and any help or pointers would be appreciated.

You can use this aggregation pipeline:
db.collection.aggregate([
  {
    // Needed because the keys under MODULES are dynamic
    $project: {
      modules: { $objectToArray: "$MODULES" }
    }
  },
  // Deconstruct the modules array into one document per module
  { $unwind: "$modules" },
  {
    // Same trick again, because the keys under PARAMS are dynamic too
    $project: {
      types: { $objectToArray: "$modules.v.PARAMS" },
      xname: "$modules.v.XNAME"
    }
  },
  // Deconstruct the types array into one document per parameter
  { $unwind: "$types" },
  {
    // Collect the distinct values; for types, $push instead of $addToSet would keep duplicates
    $group: {
      _id: null,
      xname: { $addToSet: "$xname" },
      types: { $addToSet: "$types.v.TYPE" }
    }
  }
])
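To sanity-check what the pipeline computes, here is a minimal plain-JavaScript sketch that walks the same structure in memory, with no MongoDB involved. The object `doc` is a copy of the sample document from the question, and the helper names `listXnames` and `listTypes` are hypothetical. Since $addToSet only keeps distinct values, the sketch produces both the full list of types (as in the question's expected output) and the distinct set.

```javascript
// Copy of the sample document from the question.
const doc = {
  WIDTH: 810,
  HEIGHT: 465,
  MODULES: {
    "23": {
      XNAME: "COMP1",
      PARAMS: {
        _Klockers: { TYPE: "text", VALUE: "Klocker#3" },
        SUBSYS: { TYPE: "text", VALUE: "2" },
        EP: { TYPE: "integer", VALUE: "2" },
      },
    },
    "24": {
      XNAME: "COMP2",
      PARAMS: {
        _Rockers: { TYPE: "text", VALUE: "Rocker#3" },
        Driver: { TYPE: "binary", VALUE: 1 },
        EP: { TYPE: "long", VALUE: "233" },
      },
    },
    "25": {
      XNAME: "COMP3",
      PARAMS: {
        _Mockers: { TYPE: "text", VALUE: "Mocker#3" },
        SYSMain: { TYPE: "text", VALUE: "2342" },
        TLP: { TYPE: "double", VALUE: "2.3" },
      },
    },
  },
};

// Rough equivalent of $objectToArray + $unwind on MODULES: iterate over the values.
function listXnames(d) {
  return Object.values(d.MODULES).map((m) => m.XNAME);
}

// Rough equivalent of the second $objectToArray + $unwind on PARAMS.
function listTypes(d) {
  return Object.values(d.MODULES).flatMap((m) =>
    Object.values(m.PARAMS).map((p) => p.TYPE)
  );
}

const xnames = listXnames(doc);
const types = listTypes(doc);
const distinctTypes = [...new Set(types)]; // what $addToSet would keep

console.log(xnames); // ["COMP1", "COMP2", "COMP3"]
console.log(types);
console.log(distinctTypes);
```

The full `types` list contains nine entries with repeats, while `distinctTypes` has five, which is the difference between $push and $addToSet in the final $group stage.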

Related

Elasticsearch: Class Cast Exception Scala API

I had been using ES 5.6, and the aggregation queries were working
fine. Recently, we upgraded ES to 7.1, and one of the queries now
throws a ClassCastException. Below are the ES index mapping, the
Scala code, and the ES query that triggers the exception.
Mapping:
{
"orgs": {
"mappings": {
"org": {
"properties": {
"people": {
"type": "nested",
"properties": {
"email": {
"type": "keyword"
},
"first_name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"last_name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"pcsi": {
"type": "keyword"
},
"position": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"position_type": {
"type": "keyword"
},
"source_guid": {
"type": "keyword"
},
"source_lni": {
"type": "keyword"
},
"suffix": {
"type": "keyword"
}
}
}
}
}
}
}
}
Scala Query:
baseQuery.aggs(nestedAggregation("people", OrganizationSchema.People)
.subAggregations(termsAgg("positiontype", "people.position_type")))
Elastic Query:
{"query":{"term":{"_id":{"value":"id"}}},"aggs":{"people":{"nested":{"path":"people"},"aggs":{"positiontype":{"terms":{"field":"people.position_type"}}}}}}
response:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"people": {
"doc_count": 52,
"positiontype": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Board Member",
"doc_count": 28
},
{
"key": "Executive",
"doc_count": 22
},
{
"key": "Others",
"doc_count": 2
}
]
}
}
}
}
Scala code:
def getOrganizationPeopleFilters(
    client: ElasticClient,
    entityType: String,
    entityId: String,
    request: Option[PostFilterApiRequest],
    baseQuery: SearchRequest
): IO[PostFilters] = {
  val q = baseQuery.aggs(
    nestedAggregation("people", OrganizationSchema.People)
      .subAggregations(termsAgg("positiontype", "people.position_type"))
  )
  client.execute {
    q
  }.flatMap { res ⇒
    esToJsonOrganizationPeopleFilters(res.result)
  }
}
The ES query runs and aggregates correctly in Kibana. However, when we flatMap the response in the Scala API code above, it throws a ClassCastException (java.lang.ClassCastException: scala.collection.immutable.Map$Map2 cannot be cast to java.lang.Integer).

Mongo Aggregation using $Max

I have a collection that stores history: a new document is created every time the data changes. I need to extract fields based on the max value of a date field. However, my query either returns all of the dates or requires me to push the fields into an array, which makes the data hard to analyze for an end user.
Expected output as CSV:
MAX(DATE), docID, url, type
1579719200216, 12371, www.foodnetwork.com, food
1579719200216, 12371, www.cnn.com, news
1579719200216, 12371, www.wikipedia.com, info
Sample Doc:
{
"document": {
"revenueGroup": "fn",
"metaDescription": "",
"metaData": {
"audit": {
"lastModified": 1312414124,
"clientId": ""
},
"entities": [],
"docId": 1313943,
"url": ""
},
"rootUrl": "",
"taggedImages": {
"totalSize": 1,
"list": [
{
"image": {
"objectId": "woman-reaching-for-basket",
"caption": "",
"url": "",
"height": 3840,
"width": 5760,
"owner": "Facebook",
"alt": "Woman reaching for basket"
},
"tags": {
"totalSize": 4,
"list": []
}
}
]
},
"title": "The 8 Best Food Items of 2020",
"socialTitle": "The 8 Best Food Items of 2020",
"primaryImage": {
"objectId": "woman-reaching-for-basket.jpg",
"caption": "",
"url": "",
"height": 3840,
"width": 5760,
"owner": "Hero Images / Getty Images",
"alt": "Woman reaching for basket in laundry room"
},
"subheading": "Reduce your footprint with these top-performing diets",
"citations": {
"list": []
},
"docId": 1313943,
"revisionId": "1313943_1579719200216",
"templateType": "LIST",
"documentState": {
"activeDate": 579719200166,
"state": "ACTIVE"
}
},
"url": "",
"items": {
"totalSize": "",
"list": [
{
"type": "recipe",
"data": {
"comInfo": {
"list": [
{
"type": "food",
"id": "https://www.foodnetwork.com"
}
]
},
"type": ""
},
"id": 4,
"uuid": "1313ida-qdad3-42c3-b41d-223q2eq2j"
},
{
"type": "recipe",
"data": {
"comInfo": {
"list": [
{
"type": "news",
"id": "https://www.cnn.com"
},
{
"type": "info",
"id": "https://www.wikipedia.com"
}
]
},
"type": "PRODUCT"
},
"id": 11,
"uuid": "318231jc-da12-4475-8994-283u130d32"
}
]
},
"vertical": "food"
}
This is my current query:
db.collection.aggregate([
{
$match: {
vertical: "food",
"document.documentState.state": "ACTIVE",
"document.templateType": "LIST"
}
},
{
$unwind: "$document.items"
},
{
$unwind: "$document.items.list"
},
{
$unwind: "$document.items.list.contents"
},
{
$unwind: "$document.items.list.contents.list"
},
{
$match: {
"document.items.list.contents.list.type": "recipe",
"document.revenueGroup": "fn"
}
},
{
$sort: {
"document.revisionId": -1
}
},
{
$group: {
_id: {
_id: {
docId: "$document.docId",
date: {$max: "$document.revisionId"}
},
url: "$document.items.list.contents.list.data.comInfo.list.id",
type: "$document.items.list.contents.list.data.comInfo.list.type"
}
}
},
{
$project: {
_id: 1
}
},
{
$sort: {
"document.items.list.contents.list.id": 1, "document.revisionId": -1
}
}
], {
allowDiskUse: true
})
First of all, you should read the documentation for the $group aggregation stage.
You should be doing this instead:
{
  $group: {
    "_id": "$document.docId",
    "date": {
      $max: "$document.revisionId"
    },
    "url": {
      $first: "$document.items.list.contents.list.data.comInfo.list.id"
    },
    "type": {
      $first: "$document.items.list.contents.list.data.comInfo.list.type"
    }
  }
}
This will give you the required output.
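To make the accumulator semantics concrete, here is a minimal plain-JavaScript sketch that emulates such a $group stage in memory. The rows and their flattened field names (`docId`, `revisionId`, `url`, `type`) are hypothetical stand-ins for already-unwound documents, and `groupByDocId` is an illustrative helper, not a MongoDB API. It shows that $max and $first are evaluated independently: $max keeps the largest value seen, while $first keeps whatever arrived first in input order.

```javascript
// Hypothetical, already-unwound rows (flattened field names are assumptions).
const rows = [
  { docId: 12371, revisionId: "12371_1579719200100", url: "www.foodnetwork.com", type: "food" },
  { docId: 12371, revisionId: "12371_1579719200216", url: "www.cnn.com", type: "news" },
  { docId: 555, revisionId: "555_1500000000000", url: "www.wikipedia.com", type: "info" },
];

// Emulates:
// { $group: { _id: "$docId", date: { $max: "$revisionId" },
//             url: { $first: "$url" }, type: { $first: "$type" } } }
function groupByDocId(input) {
  const groups = new Map();
  for (const row of input) {
    const g = groups.get(row.docId);
    if (g === undefined) {
      // First row of this group: seeds both the $max and the $first fields.
      groups.set(row.docId, {
        _id: row.docId,
        date: row.revisionId,
        url: row.url,
        type: row.type,
      });
    } else if (row.revisionId > g.date) {
      // $max keeps the largest value; the $first fields stay untouched.
      g.date = row.revisionId;
    }
  }
  return [...groups.values()];
}

const grouped = groupByDocId(rows);
console.log(grouped);
```

In this sketch the group for docId 12371 ends up with the latest revisionId but the url of the first row, which is why the preceding $sort matters. Also note that grouping on docId alone emits one row per docId; to get one row per url, as in the expected CSV, the url would presumably have to be part of the _id as well.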

Design Search service using elastic search

I am building a search service for a social network. The service should return the names of the users that somebody searches for. For now it will be limited to user-domain search only. I am planning to use Elasticsearch to hold the indexes (user domain details) and to call it from my search service, which runs on Node.js. I cannot work out a design for creating the indexes in Elasticsearch. Should I create them with a batch job, or should I index each user when the user is created?
Any good pointers or a good design would be appreciated.
You should index each user when the user is created.
A simple example may be useful.
Suppose your user data looks like this:
`
[
{
"_id": "1",
"status": true,
"username": "mak",
"userdomain": "mydomain.com",
"name": "mak doe"
},
{
"_id": "2",
"status": true,
"username": "janny",
"userdomain": "mydomain.com",
"name": "janny"
},
{
"_id": "3",
"status": true,
"username": "mac",
"userdomain": "newdomain.com",
"name": "mac peter"
},
{
"_id": "4",
"status": true,
"username": "mak",
"userdomain": "mydomain.com",
"name": "mak peter"
},
{
"_id": "5",
"status": true,
"username": "mak",
"userdomain": "newdomain.com",
"name": "mak peter"
}
]
`
The Elasticsearch mapping looks like this:
`
PUT socialdata
{
"mappings": {
"users": {
"properties": {
"status": {
"type": "boolean"
},
"name": {
"type": "text"
},
"username": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "keyword_lowercase_analyzer"
},
"english": {
"type": "text",
"analyzer": "english"
}
}
},
"userdomain": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "keyword_lowercase_analyzer"
},
"english": {
"type": "text",
"analyzer": "english"
}
}
}
}
}
}
}
`
For bulk upload:
`
POST socialdata/users/_bulk
{ "index": { "_index": "socialdata","_type": "users", "_id": 1 }}
{"status":true,"username": "mak","userdomain": "mydomain.com","name": "mak doe"}
{ "index": { "_index": "socialdata","_type": "users", "_id": 2 }}
{"status":true,"username": "janny","userdomain": "mydomain.com","name": "janny"}
{ "index": { "_index": "socialdata","_type": "users", "_id": 3 }}
{"status":true,"username": "mac","userdomain": "newdomain.com","name": "mac peter"}
{ "index": { "_index": "socialdata","_type": "users", "_id": 4 }}
{"status":true,"username": "mak","userdomain": "mydomain.com","name": "mak peter"}
{ "index": { "_index": "socialdata","_type": "users", "_id": 5 }}
{"status":true,"username": "mak","userdomain": "newdomain.com","name": "mak peter"}
`
To index a single document (here still through the _bulk endpoint):
`
POST socialdata/users/_bulk
{ "index": { "_index": "socialdata","_type": "users", "_id": 1 }}
{"status":true,"username": "mak","userdomain": "mydomain.com","name": "mak doe"}
`
Elasticsearch query:
This query returns only two records:
`
POST socialdata/users/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"username": "mak"
}
},
{
"match": {
"userdomain": "mydomain.com"
}
}
]
}
}
}
`
This query returns only one record:
`
POST socialdata/users/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"username": "mak"
}
},
{
"match": {
"userdomain": "newdomain.com"
}
}
]
}
}
}
`
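Since the search service runs on Node.js, the bulk request body shown above can be generated from user objects, whether for a single newly created user or for a batch. The following is only a sketch: `buildBulkBody` and `newUsers` are hypothetical names, the index and type names are taken from the example above, and the function only builds the newline-delimited body without sending it anywhere.

```javascript
// Builds the newline-delimited JSON body expected by the _bulk endpoint.
// "socialdata" and "users" mirror the example above; newer Elasticsearch
// versions drop mapping types, so _type may need to be removed there.
function buildBulkBody(users, index = "socialdata", type = "users") {
  const lines = [];
  for (const { _id, ...source } of users) {
    // Action line: tells Elasticsearch where to index the next document.
    lines.push(JSON.stringify({ index: { _index: index, _type: type, _id } }));
    // Source line: the document itself, without the _id field.
    lines.push(JSON.stringify(source));
  }
  // The bulk body must end with a newline.
  return lines.join("\n") + "\n";
}

// Hypothetical users, e.g. collected right after sign-up.
const newUsers = [
  { _id: "1", status: true, username: "mak", userdomain: "mydomain.com", name: "mak doe" },
  { _id: "2", status: true, username: "janny", userdomain: "mydomain.com", name: "janny" },
];

const body = buildBulkBody(newUsers);
console.log(body);
```

Calling the helper with a one-element array covers the "index on user creation" case, while a larger array covers the batch case, so both designs can share the same code path.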

How can I use CloudKit web services to query based on a reference field?

I've got two CloudKit data objects that look somewhat like this:
Parent Object:
{
"records": [
{
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"recordType": "ParentObject",
"fields": {
"fsYear": {
"value": "2015",
"type": "STRING"
},
"displayOrder": {
"value": 2015221153856287200,
"type": "INT64"
},
"fjpFSGuidForReference": {
"value": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"type": "STRING"
},
"fsDateSearch": {
"value": "2015221153856287158",
"type": "STRING"
},
},
"recordChangeTag": "id4w7ivn",
"created": {
"timestamp": 1439149087571,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
},
"modified": {
"timestamp": 1439149087571,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
}
}
],
"total":
}
Child Object:
{
"records": [
{
"recordName": "2015221153856287168",
"recordType": "ChildObject",
"fields": {
"District": {
"value": "002",
"type": "STRING"
},
"ZipCode": {
"value": "12345",
"type": "STRING"
},
"InspecReference": {
"value": {
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"action": "NONE",
"zoneID": {
"zoneName": "_defaultZone"
}
},
"type": "REFERENCE"
},
},
"recordChangeTag": "id4w7lew",
"created": {
"timestamp": 1439149090856,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
},
"modified": {
"timestamp": 1439149090856,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
}
}
],
"total": 1
}
I'm trying to write a query to directly access the CloudKit web service and return the Child Object based on the reference of the parent object.
My test JSON looks something like this:
{"query":{"recordType":"ChildObject","filterBy":{"fieldName":"InspecReference","fieldValue":{ "value" : "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57", "type" : "string" },"comparator":"EQUALS"}},"zoneID":{"zoneName":"_defaultZone"}}
However, I'm getting the following error from CloudKit:
{"uuid":"33db91f3-b768-4a68-9056-216ecc033e9e","serverErrorCode":"BAD_REQUEST","reason":"BadRequestException: Unexpected input"}
I'm guessing I have the Record Field Dictionary in the query wrong. However, the documentation isn't clear on what this should look like on a reference object.
You have to re-create the actual object of the reference. In this particular case, the JSON looks like this:
{
"query": {
"recordType": "ChildObject",
"filterBy": {
"fieldName": "InspecReference",
"fieldValue": {
"value": {
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"action": "NONE"
},
"type": "REFERENCE"
},
"comparator": "EQUALS"
}
},
"zoneID": {
"zoneName": "_defaultZone"
}
}
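If the request is assembled in code, the same structure can be produced by a small helper. This is a minimal sketch: `buildReferenceQuery` is a hypothetical function name, and it only builds the JSON payload shown above without performing the HTTP call.

```javascript
// Builds a query payload that filters records by a REFERENCE field.
// The reference value is an object holding the parent's recordName and an
// action, not a plain string.
function buildReferenceQuery(recordType, fieldName, parentRecordName, zoneName = "_defaultZone") {
  return {
    query: {
      recordType,
      filterBy: {
        fieldName,
        fieldValue: {
          value: { recordName: parentRecordName, action: "NONE" },
          type: "REFERENCE",
        },
        comparator: "EQUALS",
      },
    },
    zoneID: { zoneName },
  };
}

const payload = buildReferenceQuery(
  "ChildObject",
  "InspecReference",
  "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57"
);
console.log(JSON.stringify(payload, null, 2));
```

The key difference from the failing request is that `fieldValue.value` is an object and `type` is "REFERENCE" rather than a string value with type "string".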

Get the first document using $in with mongodb

How can I get the first element using $in in MongoDB?
If I have a list like ['car', 'house', 'cat', 'dog'] and a collection containing many documents with these elements, I'd like to find the first document that contains cat, the first that contains dog, and so on.
I've tried to use limit(), but it gives me only one document, which can be car, dog, cat, etc.
Is there a way to combine a limit with $in?
Thanks
EDIT:
Example of the data I have:
{
"_id": {
"$oid": "51d53ace9e674607e837d62d"
},
"sensors": [{
"name": "os-hostname",
"value": "yahourt"
}, {
"name": "os-domain-name",
"value": ""
}, {
"name": "os-platform",
"value": "Win32NT"
}, {
"name": "os-fullname",
"value": "Microsoft Windows XP Professional"
}, {
"name": "os-version",
"value": "5.1.2600.131072"
}],
"type": "os",
"serial": "2_os_os-hostname_yahourt"
} {
"_id": {
"$oid": "51d53ace9e674607e837d62e"
},
"sensors": [{
"name": "cpu-id",
"value": "_Total"
}, {
"name": "cpu-usage",
"value": 37.2257042
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_total"
} {
"_id": {
"$oid": "51d53ace9e674607e837d62f"
},
"sensors": [{
"name": "cpu-id",
"value": "0"
}, {
"name": "cpu-usage",
"value": 48.90282
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_0"
} {
"_id": {
"$oid": "51d53ace9e674607e837d630"
},
"sensors": [{
"name": "cpu-id",
"value": "1"
}, {
"name": "cpu-usage",
"value": 25.54859
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_1"
} {
"_id": {
"$oid": "51d53ace9e674607e837d631"
},
"sensors": [{
"name": "volume-name",
"value": "C:"
}, {
"name": "volume-label",
"value": ""
}, {
"name": "volume-total-size",
"value": "52427898880"
}, {
"name": "volume-total-free-space",
"value": "20305170432"
}, {
"name": "volume-percent-free-space",
"value": "38"
}, {
"name": "volume-reads-per-second",
"value": 0.0
}, {
"name": "volume-writes-per-second",
"value": 9.324152
}, {
"name": "volume-read-bytes-per-second",
"value": 0.0
}, {
"name": "volume-write-bytes-per-second",
"value": 194141.6
}, {
"name": "volume-queue-length",
"value": 0.0
}],
"type": "disk",
"serial": "2_disk_volume-name_c"
}
You cannot add a limit to $in but you could cheat by using the aggregation framework:
db.collection.aggregate([
  { $match: { serial: { $in: [list_of_serials] } } },
  { $sort: { _id: -1 } },
  { $group: {
      _id: '$serial',
      type: { $first: '$type' },
      sensors: { $first: '$sensors' },
      id: { $first: '$_id' }
  } }
]);
This returns the first document found for each serial.
Edit
With the added $sort, the pipeline returns the last inserted document for each serial, according to the _id.
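The match, sort, and group-with-$first steps can be emulated in plain JavaScript to see which document wins for each serial. This is only an in-memory sketch: `firstPerSerial` is a hypothetical helper, the documents are trimmed, and the third entry is an invented newer reading added for illustration. The _id values are compared as plain hex strings here, which is an assumption standing in for ObjectId ordering.

```javascript
// Trimmed documents; the third one is a hypothetical newer reading for the same serial.
const docs = [
  { _id: "51d53ace9e674607e837d62e", type: "cpu", serial: "2_cpu_cpu-id_total" },
  { _id: "51d53ace9e674607e837d62f", type: "cpu", serial: "2_cpu_cpu-id_0" },
  { _id: "51d53ace9e674607e837d640", type: "cpu", serial: "2_cpu_cpu-id_total" },
  { _id: "51d53ace9e674607e837d631", type: "disk", serial: "2_disk_volume-name_c" },
];

function firstPerSerial(input, serials) {
  const wanted = new Set(serials);
  // $match with $in: keep only documents whose serial is in the list.
  const matched = input.filter((d) => wanted.has(d.serial));
  // $sort { _id: -1 }: newest first (string comparison of the hex ids).
  matched.sort((a, b) => (a._id < b._id ? 1 : a._id > b._id ? -1 : 0));
  // $group with $first: keep the first document seen for each serial.
  const firstSeen = new Map();
  for (const d of matched) {
    if (!firstSeen.has(d.serial)) firstSeen.set(d.serial, d);
  }
  return [...firstSeen.values()];
}

const result = firstPerSerial(docs, ["2_cpu_cpu-id_total", "2_disk_volume-name_c"]);
console.log(result);
```

Each requested serial appears once in the result, and for the serial with two readings the one with the larger _id is kept, matching the "last inserted" behaviour described above.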