FIWARE CEP (Proton) Error in derived event attributes with Aggregate EPA

I receive a ContextUpdate event from Orion, and I set up a rule to count the number of received events, possibly with the same "entityId", over a specific time window.
In the derived event of the Aggregate EPA, I want the "DeviceID" attribute to take the same value as the "entityId" of the received event.
But with both the "Deferred" and "Immediate" evaluation policies, in the DeviceID attribute of the consumer output (text file) I get:
"DeviceID":"[Ljava.lang.Object;@4456c43f"
Any hint?
Thanks.
Here is the json configuration:
{
"epn": {
"events": [{
"name": "DCUPiazzaCarraraContextUpdate",
"createdDate": "Thu Oct 22 2015",
"attributes": [{
"name": "entityId",
"type": "String",
"dimension": 0
}, {
"name": "entityType",
"type": "String",
"dimension": 0
}, {
"name": "battery",
"type": "Double",
"dimension": 0
}, {
"name": "temperature",
"type": "Double",
"dimension": 0
}, {
"name": "stato",
"type": "Boolean",
"dimension": 0
}, {
"name": "rssi",
"type": "Integer",
"dimension": 0
}, {
"name": "lqi",
"type": "Integer",
"dimension": 0
}, {
"name": "timestamp",
"type": "String",
"dimension": 0
}, {
"name": "numprog",
"type": "Integer",
"dimension": 0
}, {
"name": "dcu",
"type": "String",
"dimension": 0
}]
}, {
"name": "DCUAbsence",
"createdDate": "Sat Nov 07 2015",
"attributes": [{
"name": "entityId",
"type": "String",
"dimension": 0
}, {
"name": "entityType",
"type": "String",
"dimension": 0
}, {
"name": "AlertType",
"type": "String",
"dimension": 0
}, {
"name": "eventnum",
"type": "Integer",
"dimension": 0
}, {
"name": "DeviceID",
"type": "String",
"dimension": 0
}, {
"name": "DeviceContext",
"type": "String",
"dimension": 0
}]
}],
"epas": [{
"name": "AbsenceDCU",
"createdDate": "Sat Nov 07 2015",
"epaType": "Aggregate",
"context": "AbsenceDCUComp",
"inputEvents": [{
"name": "DCUPiazzaCarraraContextUpdate",
"consumptionPolicy": "Reuse",
"instanceSelectionPolicy": "First"
}],
"computedVariables": [{
"name": "eventnum",
"aggregationType": "Count",
"DCUPiazzaCarraraContextUpdate": "1"
}],
"evaluationPolicy": "Immediate",
"cardinalityPolicy": "Unrestricted",
"internalSegmentation": [],
"derivedEvents": [{
"name": "DCUAbsence",
"reportParticipants": false,
"expressions": {
"entityId": "\"Alert\"",
"entityType": "\"PiazzaCarrara\"",
"AlertType": "\"006\"",
"eventnum": "eventnum",
"DeviceID": "DCUPiazzaCarraraContextUpdate.entityId",
"DeviceContext": "DCUPiazzaCarraraContextUpdate.entityType"
}
}]
}],
"contexts": {
"temporal": [{
"name": "AbsenceDCUWindow",
"createdDate": "Sat Nov 07 2015",
"type": "TemporalInterval",
"atStartup": false,
"neverEnding": false,
"initiators": [{
"initiatorType": "Event",
"initiatorPolicy": "Ignore",
"name": "DCUPiazzaCarraraContextUpdate"
}],
"terminators": [{
"terminatorType": "RelativeTime",
"terminationType": "Discard",
"relativeTime": "5000"
}]
}],
"segmentation": [{
"name": "AbsenceDCUID",
"createdDate": "Thu Dec 17 2015",
"participantEvents": [{
"name": "DCUPiazzaCarraraContextUpdate",
"expression": "DCUPiazzaCarraraContextUpdate.entityId"
}, {
"name": "DCUAbsence",
"expression": "DCUAbsence.DeviceID"
}]
}],
"composite": [{
"name": "AbsenceDCUComp",
"createdDate": "Thu Dec 17 2015",
"temporalContexts": [{
"name": "AbsenceDCUWindow"
}],
"segmentationContexts": [{
"name": "AbsenceDCUID"
}]
}]
},
"consumers": [{
"name": "OnFileAlert",
"createdDate": "Thu Oct 22 2015",
"type": "File",
"properties": [{
"name": "filename",
"value": "/var/log/tomcat7/Alert.json"
}, {
"name": "formatter",
"value": "json"
}, {
"name": "delimiter",
"value": ";"
}, {
"name": "tagDataSeparator",
"value": "="
}, {
"name": "SendingDelay",
"value": "1000"
}, {
"name": "dateFormat",
"value": "dd/MM/yyyy-HH:mm:ss"
}],
"events": [{
"name": "DCUAbsence"
}, {
"name": "DCUPiazzaCarraraContextUpdate"
}]
}],
"producers": [],
"name": "CounterExample"
}
}

In an EPA of type Aggregate, many events of that type can arrive during the context, so when you refer to an input event attribute in a derivation expression you get an array that holds that attribute from all the participant input events of that type.
In your example, DCUPiazzaCarraraContextUpdate.entityId will hold an array of entityId values.
If you want a single value, you can either use an array access function to get one value from the array, or, since you use a segmentation context over the entityId, take the entityId from that context.
To get the attribute from the context, use a context."segmentation-context-name" expression, which in your case is:
context.AbsenceDCUID
In the general case, you can get a specific entry from the array using the array access functions. For example, to get the last value of the array:
ArrayGet(DCUPiazzaCarraraContextUpdate.entityId, ArraySize(DCUPiazzaCarraraContextUpdate.entityId)-1)
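Putting the two options together, a minimal sketch of the derived event's "expressions" block could look like the one below. DeviceID is taken from the segmentation context as described above; the DeviceContext expression is only an assumption that applies the same ArrayGet pattern to entityType, so adapt it to whatever value you actually need:
"expressions": {
"entityId": "\"Alert\"",
"entityType": "\"PiazzaCarrara\"",
"AlertType": "\"006\"",
"eventnum": "eventnum",
"DeviceID": "context.AbsenceDCUID",
"DeviceContext": "ArrayGet(DCUPiazzaCarraraContextUpdate.entityType, ArraySize(DCUPiazzaCarraraContextUpdate.entityType)-1)"
}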

Related

How to group by single field and return more values together

I'm starting to use Apache Druid but I'm having some difficulty running native queries (and some SQL too).
1- Is it possible to groupBy a single column while also returning more channels?
2- How could I groupBy a single column, while returning different grouped items on the same query/row?
Query I'm trying to use:
{
"queryType": "groupBy",
"dataSource": "my-data-source",
"granularity": "all",
"intervals": ["2022-06-27T03:00:00.000Z/2022-06-28T03:00:00.000Z"],
"context:": { "timeout: 30000 },
"dimensions": ["userId"],
"filter": {
"type": "and",
"fields": [
{
"type": "or",
"fields": [{...}]
}
]
},
"aggregations": [
{
"type": "count",
"name": "count"
}
]
}
I tried to add a filtered type inside aggregations: [], but nothing changed.
"aggregations": [
{
"type: "count",
"name": "count"
},
{
"type": "filtered",
"filter": {
"type": "selector",
"dimension": "block_id",
"value": "block1"
},
"aggregator": {
"type": "count",
"name": "block1",
"fieldName": "block_id"
}
}
]
Grouping Aggregator also didn't work.
"aggregations": [
{
"type": "count",
"name": "count"
},
{
"type": "grouping",
"name": "groupedData",
"groupings": ["block_id"]
}
],
Below is the image illustrating the results I'm trying to achieve.
Not sure yet how to get the results in the format you want, but as a start, something like this might be a step:
{
"queryType": "groupBy",
"dataSource": {
"type": "table",
"name": "dataTest"
},
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"filter": null,
"granularity": {
"type": "all"
},
"dimensions": [
{
"type": "default",
"dimension": "d2_ts2",
"outputType": "STRING"
},
{
"type": "default",
"dimension": "d3_email",
"outputType": "STRING"
}
],
"aggregations": [
{
"type": "count",
"name": "myCount",
}
],
"descending": false
}
I'm curious, what is the use case?
Using a SQL query you can do it this way:
SELECT UserID,
sum(1) FILTER (WHERE BlockId = 'block1') as Block1,
sum(1) FILTER (WHERE BlockId = 'block2') as Block2,
sum(1) FILTER (WHERE BlockId = 'block3') as Block3
FROM inline_data
GROUP BY 1
The Native Query for this (from the explain) is:
{
"queryType": "topN",
"dataSource": {
"type": "table",
"name": "inline_data"
},
"virtualColumns": [
{
"type": "expression",
"name": "v0",
"expression": "1",
"outputType": "LONG"
}
],
"dimension": {
"type": "default",
"dimension": "UserID",
"outputName": "d0",
"outputType": "STRING"
},
"metric": {
"type": "dimension",
"previousStop": null,
"ordering": {
"type": "lexicographic"
}
},
"threshold": 101,
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"filter": null,
"granularity": {
"type": "all"
},
"aggregations": [
{
"type": "filtered",
"aggregator": {
"type": "longSum",
"name": "a0",
"fieldName": "v0",
"expression": null
},
"filter": {
"type": "selector",
"dimension": "BlockId",
"value": "block1",
"extractionFn": null
},
"name": "a0"
},
{
"type": "filtered",
"aggregator": {
"type": "longSum",
"name": "a1",
"fieldName": "v0",
"expression": null
},
"filter": {
"type": "selector",
"dimension": "BlockId",
"value": "block2",
"extractionFn": null
},
"name": "a1"
},
{
"type": "filtered",
"aggregator": {
"type": "longSum",
"name": "a2",
"fieldName": "v0",
"expression": null
},
"filter": {
"type": "selector",
"dimension": "BlockId",
"value": "block3",
"extractionFn": null
},
"name": "a2"
}
],
"postAggregations": [],
"context": {
"populateCache": false,
"sqlOuterLimit": 101,
"sqlQueryId": "bb92e899-c127-49b0-be1b-d4b38909d166",
"useApproximateCountDistinct": false,
"useApproximateTopN": false,
"useCache": false,
"useNativeQueryExplain": true
},
"descending": false
}

avro.io.AvroTypeException: The datum [object] is not an example of the schema

I have been struggling with this issue for quite some time. I am working with AvroProducer (Confluent Kafka) and getting an error related to the defined schema.
Here is the complete stacktrace of the issue I am getting:
raise AvroTypeException(self.writer_schema, datum)
avro.io.AvroTypeException: The datum {'totalDifficulty': 2726165051, 'stateRoot': '0xf09bd6730b3ae7f5728836564837d7f776a8f7333628c8b84cb57d7c6d48ebba', 'sha3Uncles': '0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347', 'size': 538, 'logs': [], 'gasLimit': 8000000, 'mixHash': '0x410b2b19519be16496727c93515f399072ffecf06defe4913d00eb4d10bb7351', 'logsBloom': '0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000', 'nonce': '0x18dc6c0d30839c91', 'proofOfAuthorityData': '0xd883010817846765746888676f312e31302e34856c696e7578', 'number': 5414, 'timestamp': 1552577641, 'difficulty': 589091, 'gasUsed': 0, 'miner': '0x48FA5EBc2f0D82B5D52faAe624Fa2426998ab492', 'hash': '0x71259991acb407a85befa8b3c5df26a94a11a6c08f92f3e3b7c9c0e8e1f5916d', 'transactionsRoot': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'receiptsRoot': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'transactions': [], 'parentHash': '0x9f0c25eeab86fc144296cb034c94857beed331936016d60c0986a35ac07d9c68', 'uncles': []} is not an example of the schema {
"type": "record",
"name": "value",
"namespace": "exporter.value.opsnetBlock",
"fields": [
{
"type": "int",
"name": "difficulty"
},
{
"type": "string",
"name": "proofOfAuthorityData"
},
{
"type": "int",
"name": "gasLimit"
},
{
"type": "int",
"name": "gasUsed"
},
{
"type": "string",
"name": "hash"
},
{
"type": "string",
"name": "logsBloom"
},
{
"type": "int",
"name": "size"
},
{
"type": "string",
"name": "miner"
},
{
"type": "string",
"name": "mixHash"
},
{
"type": "string",
"name": "nonce"
},
{
"type": "int",
"name": "number"
},
{
"type": "string",
"name": "parentHash"
},
{
"type": "string",
"name": "receiptsRoot"
},
{
"type": "string",
"name": "sha3Uncles"
},
{
"type": "string",
"name": "stateRoot"
},
{
"type": "int",
"name": "timestamp"
},
{
"type": "int",
"name": "totalDifficulty"
},
{
"type": "string",
"name": "transactionsRoot"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "transactions"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "uncles"
},
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "Child",
"namespace": "exporter.value.opsnetBlock",
"fields": [
{
"type": "string",
"name": "address"
},
{
"type": "string",
"name": "blockHash"
},
{
"type": "int",
"name": "blockNumber"
},
{
"type": "string",
"name": "data"
},
{
"type": "int",
"name": "logIndex"
},
{
"type": "boolean",
"name": "removed"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "topics"
},
{
"type": "string",
"name": "transactionHash"
},
{
"type": "int",
"name": "transactionIndex"
}
]
}
},
"name": "logs"
}
]
}
Can anybody please tell me where I am going wrong here?
Thanks in advance.
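One thing worth double-checking (an observation from the posted datum, not a confirmed diagnosis): Avro's "int" type is a 32-bit signed integer, so its maximum value is 2147483647, while the datum's totalDifficulty is 2726165051, which by itself is enough to make the record fail validation against the schema. A hedged sketch of that field declared as "long" instead:
{
"type": "long",
"name": "totalDifficulty"
},
Any other numeric field that can outgrow the 32-bit range over time (number, timestamp, gasUsed, ...) may need the same treatment.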

azure data factory start pipeline different from starting job

I am going crazy over this issue. I am running Azure Data Factory V1, and I need to schedule a copy job every week from 01/03/2009 through 01/31/2009, so I defined this schedule on the pipeline:
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-31T00:00:00Z",
"isPaused": false,
Monitoring the pipeline, the data factory scheduled these dates:
12/29/2008
01/05/2009
01/12/2009
01/19/2009
01/26/2009
instead of the schedule I wanted:
01/03/2009
01/10/2009
01/17/2009
01/24/2009
01/31/2009
Why doesn't the start date defined on the pipeline correspond to the scheduled dates shown in the monitor?
Many thanks!
Here is the JSON Pipeline:
{
"name": "CopyPipeline-blob2datalake",
"properties": {
"description": "copy from blob storage to datalake directory structure",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "script/dat230.usql",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 5,
"priority": 100,
"parameters": {
"salesfile": "$$Text.Format('/DAT230/{0:yyyy}/{0:MM}/{0:dd}.txt', Date.StartOfDay (SliceStart))",
"lineitemsfile": "$$Text.Format('/dat230/dataloads/{0:yyyy}/{0:MM}/{0:dd}/factinventory/fact.csv', Date.StartOfDay (SliceStart))"
}
},
"inputs": [
{
"name": "InputDataset-dat230"
}
],
"outputs": [
{
"name": "OutputDataset-dat230"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"retry": 1
},
"scheduler": {
"frequency": "Day",
"interval": 7
},
"name": "DataLakeAnalyticsUSqlActivityTemplate",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-11T00:00:00Z",
"isPaused": false,
"hubName": "edxlearningdf_hub",
"pipelineMode": "Scheduled"
}
}
and here the datasets:
{
"name": "InputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureBlob",
"linkedServiceName": "Source-BlobStorage-dat230",
"typeProperties": {
"fileName": "*.txt.gz",
"folderPath": "dat230/{year}/{month}/{day}/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"firstRowAsHeader": true
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
],
"compression": {
"type": "GZip"
}
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": true,
"policy": {}
}
}
{
"name": "OutputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "Destination-DataLakeStore-dat230",
"typeProperties": {
"fileName": "txt.gz",
"folderPath": "dat230/dataloads/{year}/{month}/{day}/factinventory/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t"
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": false,
"policy": {}
}
}
You need to look at the time slices for the datasets and their activity.
The pipeline schedule (badly named) only defines the start and end of the period within which activities can provision and run their time slices.
ADFv1 doesn't use a recursive schedule like the SQL Server Agent. Each execution has to be provisioned at an interval on the timeline (the schedule) you create.
For example, if your pipeline start and end span one year, but your dataset and activity have a frequency of Month and an interval of 1, you will only get 12 executions of whatever is happening.
Apologies, but the concept of time slices is a little difficult to explain if you aren't already familiar with it. Maybe read this post: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/
Hope this helps.
Would you share the JSON for the datasets and the pipeline with us? It would be easier to help you with that at hand.
In the meantime, check whether you are using "style": "StartOfInterval" in the scheduler property of the activity, and also check whether you are using an offset.
Cheers!
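As a hedged sketch only (the property names come from the ADF v1 scheduling documentation, but the exact values depend on how you want the slices aligned), the activity's scheduler block could look like this, with the dataset availability set to match:
"scheduler": {
"frequency": "Day",
"interval": 7,
"style": "StartOfInterval",
"anchorDateTime": "2009-01-03T00:00:00Z"
}
With an anchor of 2009-01-03 and weekly (Day/7) slices, the slices should line up with 01/03, 01/10, 01/17, ... instead of drifting back to 12/29/2008.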

what is the date format from stash api?

In the JSON response below, what is the date format for createdDate and updatedDate? I am not sure how to work backwards to find which format the API is using for dates. I couldn't find this anywhere in the documentation.
{
"size": 1,
"limit": 25,
"isLastPage": true,
"values": [
{
"id": 101,
"version": 1,
"title": "Talking Nerdy",
"description": "It’s a kludge, but put the tuple from the database in the cache.",
"state": "OPEN",
"open": true,
"closed": false,
"createdDate": 1359075920,
"updatedDate": 1359085920,
"fromRef": {
"id": "refs/heads/feature-ABC-123",
"repository": {
"slug": "my-repo",
"name": null,
"project": {
"key": "PRJ"
}
}
},
"toRef": {
"id": "refs/heads/master",
"repository": {
"slug": "my-repo",
"name": null,
"project": {
"key": "PRJ"
}
}
},
"locked": false,
"author": {
"user": {
"name": "tom",
"emailAddress": "tom#example.com",
"id": 115026,
"displayName": "Tom",
"active": true,
"slug": "tom",
"type": "NORMAL"
},
"role": "AUTHOR",
"approved": true
},
"reviewers": [
{
"user": {
"name": "jcitizen",
"emailAddress": "jane#example.com",
"id": 101,
"displayName": "Jane Citizen",
"active": true,
"slug": "jcitizen",
"type": "NORMAL"
},
"role": "REVIEWER",
"approved": true
}
],
"participants": [
{
"user": {
"name": "dick",
"emailAddress": "dick#example.com",
"id": 3083181,
"displayName": "Dick",
"active": true,
"slug": "dick",
"type": "NORMAL"
},
"role": "PARTICIPANT",
"approved": false
},
{
"user": {
"name": "harry",
"emailAddress": "harry#example.com",
"id": 99049120,
"displayName": "Harry",
"active": true,
"slug": "harry",
"type": "NORMAL"
},
"role": "PARTICIPANT",
"approved": true
}
],
"link": {
"url": "http://link/to/pullrequest",
"rel": "self"
},
"links": {
"self": [
{
"href": "http://link/to/pullrequest"
}
]
}
}
],
"start": 0
}
Just making a note that in my case it is a UNIX timestamp, but in milliseconds, so I have to remove three trailing zeroes (i.e. divide by 1000). E.g. the data looks like this:
"createdDate":1555621993000
If interpreted as a UNIX timestamp in seconds, that would be 09/12/51265 at 4:16am (UTC).
By removing the three trailing zeroes I get 1555621993, which is the correct time, 04/18/2019 at 9:13pm (UTC).
Your mileage may vary, but that was a key discovery for me :)
It looks like a UNIX timestamp.
https://en.wikipedia.org/wiki/Unix_time

Get the first document using $in with mongodb

How can I get the first element using $in in Mongo?
If I have a list like ['car', 'house', 'cat', 'dog'], and a collection which contains many documents with these elements, I'd like to find the first document which contains cat, the first which contains dog, etc.
I've tried to use limit(), but it gives me only one document, which can be either car, or dog, or cat, etc.
Is there a way to combine a limit with $in?
Thanks
EDIT:
Example of the data I have:
{
"_id": {
"$oid": "51d53ace9e674607e837d62d"
},
"sensors": [{
"name": "os-hostname",
"value": "yahourt"
}, {
"name": "os-domain-name",
"value": ""
}, {
"name": "os-platform",
"value": "Win32NT"
}, {
"name": "os-fullname",
"value": "Microsoft Windows XP Professional"
}, {
"name": "os-version",
"value": "5.1.2600.131072"
}],
"type": "os",
"serial": "2_os_os-hostname_yahourt"
} {
"_id": {
"$oid": "51d53ace9e674607e837d62e"
},
"sensors": [{
"name": "cpu-id",
"value": "_Total"
}, {
"name": "cpu-usage",
"value": 37.2257042
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_total"
} {
"_id": {
"$oid": "51d53ace9e674607e837d62f"
},
"sensors": [{
"name": "cpu-id",
"value": "0"
}, {
"name": "cpu-usage",
"value": 48.90282
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_0"
} {
"_id": {
"$oid": "51d53ace9e674607e837d630"
},
"sensors": [{
"name": "cpu-id",
"value": "1"
}, {
"name": "cpu-usage",
"value": 25.54859
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_1"
} {
"_id": {
"$oid": "51d53ace9e674607e837d631"
},
"sensors": [{
"name": "volume-name",
"value": "C:"
}, {
"name": "volume-label",
"value": ""
}, {
"name": "volume-total-size",
"value": "52427898880"
}, {
"name": "volume-total-free-space",
"value": "20305170432"
}, {
"name": "volume-percent-free-space",
"value": "38"
}, {
"name": "volume-reads-per-second",
"value": 0.0
}, {
"name": "volume-writes-per-second",
"value": 9.324152
}, {
"name": "volume-read-bytes-per-second",
"value": 0.0
}, {
"name": "volume-write-bytes-per-second",
"value": 194141.6
}, {
"name": "volume-queue-length",
"value": 0.0
}],
"type": "disk",
"serial": "2_disk_volume-name_c"
}
You cannot add a limit to $in but you could cheat by using the aggregation framework:
db.collection.aggregate([
{$match:{serial:{$in:[list_of_serials]}}},
{$sort:{_id:-1}},
{$group:{_id:'$serial',type:{$first:'$type'},sensors:{$first:'$sensors'},id:{$first:'$_id'}}}
]);
That would get a list with the first found document for each serial.
Edit
The pipeline above will get the last inserted document (per serial) according to the _id; sort ascending ({$sort:{_id:1}}) instead if you want the first inserted.