FIWARE Cygnus: no data has been persisted in MongoDB

I am trying to use Cygnus with MongoDB, but no data has been persisted in the database.
Here is the notification received in Cygnus:
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Starting transaction (1437482681-118-0000000000)
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Received data ({ "subscriptionId" : "55a73819d0c457bb20b1d467", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "enocean", "isPattern" : "false", "id" : "enocean:myButtonA", "attributes" : [ { "name" : "ButtonValue", "type" : "", "value" : "ON", "metadatas" : [ { "name" : "TimeInstant", "type" : "ISO8601", "value" : "2015-07-20T21:29:56.509293Z" } ] } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Event put in the channel (id=1454120446, ttl=10)
Here is my agent configuration:
cygnusagent.sources = http-source
cygnusagent.sinks = OrionMongoSink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# TimestampInterceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# GroupingInterceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Grouping rules for the GroupingInterceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /home/egm_demo/usr/fiware-cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# FQDN/IP:port where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_hosts = 127.0.0.1:27017
# a valid user in the MongoDB server (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
#cygnusagent.sinks.mongo-sink.db_prefix = kura
# prefix for the MongoDB collections
#cygnusagent.sinks.mongo-sink.collection_prefix = button
# true if collection names are based on a hash, false for human-readable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# ============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
Here is my grouping rule:
{
  "grouping_rules": [
    {
      "id": 1,
      "fields": [
        "button"
      ],
      "regex": ".*",
      "destination": "kura",
      "fiware_service_path": "/kuraspath"
    }
  ]
}
Any ideas of what I have missed? Thanks in advance for your help!

This configuration parameter is wrong:
cygnusagent.sinks = OrionMongoSink
According to your configuration, it must be mongo-sink (I mean, you are configuring a Mongo sink named mongo-sink when you configure lines such as cygnusagent.sinks.mongo-sink.type).
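With that fix (and keeping the rest of your agent file as posted), the sink declaration would start like this:
cygnusagent.sinks = mongo-sink
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel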
In addition, I would recommend not using the grouping rules feature; it is an advanced feature for sending the data to a collection different from the default one, and at a first stage I would play with the default behaviour. Thus, my recommendation is to leave the path to the file in cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file as it is, but comment out all the JSON within it :)

Related

Connecting Ditto to InfluxDB via Kafka

I am running Kafka and InfluxDB on Docker.
I have created a digital twin on Ditto that correctly updates when I send a message with MQTT.
I want the data to be sent from Ditto to InfluxDB, but once I create the bucket in InfluxDB it shows no data whatsoever.
I have followed this guide: https://www.influxdata.com/blog/getting-started-apache-kafka-influxdb/
(I know it is for a Python program, but the steps should be the same; I just use the Telegraf Kafka consumer plugin instead of the one used in the guide.)
I have created the connection and the Telegraf configuration file, but nothing happens in InfluxDB.
Here is the telegraf.conf:
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://localhost:8086"]
## API token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "digital"
## Destination bucket to write into.
bucket = "arduino"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
# timeout = "5s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
# Read metrics from Kafka topics
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["arduino"]
## When set this tag will be added to all metrics with the topic as the value.
# topic_tag = ""
## Optional Client id
# client_id = "Telegraf"
## Set the minimal supported Kafka version. Setting this enables the use of new
## Kafka features and APIs. Must be 0.10.2.0 or greater.
## ex: version = "1.1.0"
# version = ""
## Optional TLS Config
# enable_tls = false
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## SASL authentication credentials. These settings should typically be used
## with TLS encryption enabled
# sasl_username = "kafka"
# sasl_password = "secret"
## Optional SASL:
## one of: OAUTHBEARER, PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, GSSAPI
## (defaults to PLAIN)
# sasl_mechanism = ""
## used if sasl_mechanism is GSSAPI (experimental)
# sasl_gssapi_service_name = ""
# ## One of: KRB5_USER_AUTH and KRB5_KEYTAB_AUTH
# sasl_gssapi_auth_type = "KRB5_USER_AUTH"
# sasl_gssapi_kerberos_config_path = "/"
# sasl_gssapi_realm = "realm"
# sasl_gssapi_key_tab_path = ""
# sasl_gssapi_disable_pafxfast = false
## used if sasl_mechanism is OAUTHBEARER (experimental)
# sasl_access_token = ""
## SASL protocol version. When connecting to Azure EventHub set to 0.
# sasl_version = 1
# Disable Kafka metadata full fetch
# metadata_full = false
## Name of the consumer group.
# consumer_group = "telegraf_metrics_consumers"
## Compression codec represents the various compression codecs recognized by
## Kafka in messages.
## 0 : None
## 1 : Gzip
## 2 : Snappy
## 3 : LZ4
## 4 : ZSTD
# compression_codec = 0
## Initial offset position; one of "oldest" or "newest".
# offset = "oldest"
## Consumer group partition assignment strategy; one of "range", "roundrobin" or "sticky".
# balance_strategy = "range"
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 1000000
## Maximum messages to read from the broker that have not been written by an
## output. For best throughput set based on the number of metrics within
## each message and the size of the output's metric_batch_size.
##
## For example, if each message from the queue contains 10 metrics and the
## output metric_batch_size is 1000, setting this to 100 will ensure that a
## full batch is collected and the write is triggered immediately without
## waiting until the next flush_interval.
# max_undelivered_messages = 1000
## Maximum amount of time the consumer should take to process messages. If
## the debug log prints messages from sarama about 'abandoning subscription
## to [topic] because consuming was taking too long', increase this value to
## longer than the time taken by the output plugin(s).
##
## Note that the effective timeout could be between 'max_processing_time' and
## '2 * max_processing_time'.
# max_processing_time = "100ms"
## The default number of message bytes to fetch from the broker in each
## request (default 1MB). This should be larger than the majority of
## your messages, or else the consumer will spend a lot of time
## negotiating sizes and not actually consuming. Similar to the JVM's
## `fetch.message.max.bytes`.
# consumer_fetch_default = "1MB"
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json"
The Kafka connection, as it is shown in the Ditto explorer:
{
"id": "0ab4b527-617f-4f4f-8bac-4ffa4b5a8471",
"name": "Kafka 2.x",
"connectionType": "kafka",
"connectionStatus": "open",
"uri": "tcp://192.168.109.74:9092",
"sources": [
{
"addresses": [
"arduino"
],
"consumerCount": 1,
"qos": 1,
"authorizationContext": [
"nginx:ditto"
],
"enforcement": {
"input": "{{ header:device_id }}",
"filters": [
"{{ entity:id }}"
]
},
"acknowledgementRequests": {
"includes": []
},
"headerMapping": {},
"payloadMapping": [
"Ditto"
],
"replyTarget": {
"address": "theReplyTopic",
"headerMapping": {},
"expectedResponseTypes": [
"response",
"error",
"nack"
],
"enabled": true
}
}
],
"targets": [
{
"address": "topic/key",
"topics": [
"_/_/things/twin/events",
"_/_/things/live/messages"
],
"authorizationContext": [
"nginx:ditto"
],
"headerMapping": {}
}
],
"clientCount": 1,
"failoverEnabled": true,
"validateCertificates": true,
"processorPoolSize": 1,
"specificConfig": {
"saslMechanism": "plain",
"bootstrapServers": "localhost:9092"
},
"tags": []
}
The policy file for Ditto:
{
"policyId": "my.test:policy1",
"entries": {
"owner": {
"subjects": {
"nginx:ditto": {
"type": "nginx basic auth user"
}
},
"resources": {
"thing:/": {
"grant": ["READ","WRITE"],
"revoke": []
},
"policy:/": {
"grant": ["READ","WRITE"],
"revoke": []
},
"message:/": {
"grant": ["READ","WRITE"],
"revoke": []
}
}
},
"observer": {
"subjects": {
"ditto:observer": {
"type": "observer user"
}
},
"resources": {
"thing:/features": {
"grant": ["READ"],
"revoke": []
},
"policy:/": {
"grant": ["READ"],
"revoke": []
},
"message:/": {
"grant": ["READ"],
"revoke": []
}
}
}
}
}
When Telegraf reads data from Kafka, it needs to transform it into time-series metrics that InfluxDB can digest. You have correctly selected the JSON parser, but there may be additional configuration required, or even the use of the more powerful json_v2 parser, to be able to set the tags and fields based on the JSON data.
My suggestion is to use the [[outputs.file]] output to see if anything is even getting passed through; probably nothing will show up. Then do the following (see the sketch after this list):
1. Determine what your JSON looks like in Kafka.
2. Decide what you want that JSON to look like as time-series data in InfluxDB.
3. Use the json_v2 parser to set the appropriate tags and fields.
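As a rough sketch of that debugging setup (the brokers and topic are copied from your config; the json_v2 measurement name and paths are placeholders, not the actual structure of your Ditto events, and must be adapted once you have seen the raw messages):
# Temporary debug output: print every metric Telegraf produces to stdout,
# so you can tell whether the Kafka messages are being parsed at all.
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"

[[inputs.kafka_consumer]]
  brokers = ["localhost:9092"]
  topics = ["arduino"]
  data_format = "json_v2"

  # Hypothetical json_v2 mapping -- adapt the paths to your actual payload.
  [[inputs.kafka_consumer.json_v2]]
    measurement_name = "ditto_event"
    [[inputs.kafka_consumer.json_v2.tag]]
      path = "thingId"
    [[inputs.kafka_consumer.json_v2.field]]
      path = "features.temperature.properties.value"
      type = "float"
Once metrics show up on stdout, you can drop the [[outputs.file]] block and keep only the influxdb_v2 output.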

Impossible to fetch data from Kafka to Telegraf

I have a personal project to gather data from an earthquake monitoring API, put it on a Kafka topic in real time, and then retrieve it in Grafana.
For that I use Telegraf and its Kafka consumer plugin, which listens to a topic and retrieves its data before sending it to an InfluxDB bucket that I then use as a source for Grafana.
I did some tests and I know for sure that the data is received on the topic.
{
"Earthquake": {
"metadata": {
"Metadata": {
"generated": "1662533868000",
"url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2022-09-07T05:57:47",
"title": "USGS Earthquakes",
"api": "1.13.6",
"count": "7",
"status": "200"
}
},
"features": [
{
"Features": {
"type": "Feature",
"properties": {
"Properties": {
"mag": "1.7",
"place": "34 km NE of Paxson, Alaska",
"time": "1662532868655",
"updated": "1662532994982",
"tz": "0",
"url": "https://earthquake.usgs.gov/earthquakes/eventpage/ak022bhk6641",
"detail": "https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak022bhk6641&format=geojson",
"felt": "0",
"cdi": "0.0",
"mmi": "0.0",
"alert": "null",
"status": "automatic",
"tsunami": "0",
"sig": "44",
"net": "ak",
"code": "022bhk6641",
"ids": ",ak022bhk6641,",
"sources": ",ak,",
"types": ",origin,phase-data,",
"nst": "0",
"dmin": "0.0",
"rms": "0.6",
"gap": "0.0",
"magType": "ml",
"type": "earthquake"
}
},
"geometry": {
"Geometry": {
"type": "Point",
"coordinates": [
-145.1808,
63.3242,
0.0
]
}
},
"id": "ak022bhk6641"
}
}
],
"bbox": [
-151.5653,
-54.375,
0.0,
-118.39484,
63.3242,
71.4
]
}
}
Here is the telegraf.conf that I use for my InfluxDB data source:
# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## Collected metrics are rounded to the precision specified. Precision is
## specified as an interval with an integer + unit (e.g. 0s, 10ms, 2us, 4s).
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
##
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s:
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
##
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
precision = "0s"
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do no set the "host" tag in the telegraf agent.
omit_hostname = false
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
# # Configuration for sending metrics to InfluxDB 2.0
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://127.0.0.1:8086"]
## Token for authentication.
token = $INFLUX_TOKEN
## Organization is the name of the organization you wish to write to.
organization = "earthWatch"
## Destination bucket to write into.
bucket = "telegraf"
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["general-events"]
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 0
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json_v2"
[[inputs.file.json_v2]]
measurement_name = "Earthquake"
[[inputs.file.json_v2.tag]]
path = "Earthquake.features.#.Features.properties.Properties.url"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.mag"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.place"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.time"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.updated"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.tz"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.url"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.detail"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.felt"
type = "bool"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.cdi"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.mmi"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.alert"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.status"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.tsunami"
type = "bool"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.sig"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.net"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.code"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.ids"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.sources"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.types"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.nst"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.dmin"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.rms"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.gap"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.magType"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.type"
type = "string"
After multiple attempts and modifications of the telegraf.conf file, I was able to retrieve some fields from the data, but other fields are missing and I don't know which configuration change made that work.
Querying just after the topic has received the data (so less than a minute ago), it is still impossible to display anything, as if the messages were not retransmitted.
Any idea what might be missing?
Thank you

How to know number of Request Units (RUs) consumed using azure-cosmosdb-spark

How can I know the number of Request Units (RUs) consumed when using the "azure-cosmosdb-spark" SDK?
And is it possible to keep reads from exceeding the provisioned RUs?
When querying Azure Cosmos DB from Databricks using the "azure-cosmosdb-spark" SDK, the load exceeds the provisioned RUs on Cosmos.
How can I restrict load() so that it does not exceed the provisioned RUs?
I didn't find a configuration option for this in the documentation:
https://github.com/Azure/azure-cosmosdb-spark/wiki/Configuration-references
from pyspark.sql import functions as sf  # assumed import for sf.col below

READ_COSMOS_SPARK_CONN = {
    "Endpoint" : "---",
    "Masterkey" : --,
    "Database" : "--",
    "Collection" : "--",
}
READ_COSMOS_SPARK_CONFIG = {
    **READ_COSMOS_SPARK_CONN,
    "query_pagesize" : "2147483647",
    "SamplingRatio" : "1.0",
    "schema_samplesize" : "10",
}
df = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(**READ_COSMOS_SPARK_CONFIG).load()
dfResult = (
    df
    .filter(sf.col('timestamp') >= "2021-02-21T00:00:00")
    .filter(sf.col('timestamp') < "2021-02-22T00:00:00")
)
display(dfResult)
Note:
I used "query_custom" but no use. It actually increased my RUs consumptions.
and I can't query on partition key for complex reasons.
But I applied index on "timestamp".

Error while generating node info file with database.runMigration

I added database.runMigration: true to my build.gradle file but I'm getting this error when running deployNodes. What's causing this?
[ERROR] 14:05:21+0200 [main] subcommands.ValidateConfigurationCli.logConfigurationErrors$node - Error(s) while parsing node configuration:
- for path: "database.runMigration": Unknown property 'runMigration'
Here's my build.gradle's deployNodes task:
task deployNodes(type: net.corda.plugins.Cordform, dependsOn: ['jar']) {
    directory "./build/nodes"
    ext.drivers = ['.jdbc_driver']
    ext.extraConfig = [
        'dataSourceProperties.dataSourceClassName' : "org.postgresql.ds.PGSimpleDataSource",
        'dataSourceProperties.dataSource.user' : "corda",
        'dataSourceProperties.dataSource.password' : "corda1234",
        'database.transactionIsolationLevel' : 'READ_COMMITTED',
        'database.runMigration' : "true"
    ]
    nodeDefaults {
        projectCordapp {
            deploy = false
        }
        cordapp project(':cordapp-contracts-states')
        cordapp project(':cordapp')
    }
    node {
        name "O=HUS,L=Helsinki,C=FI"
        p2pPort 10008
        rpcSettings {
            address "localhost:10009"
            adminAddress "localhost:10049"
        }
        webPort 10017
        rpcUsers = [[ user: "user1", "password": "test", "permissions": ["ALL"]]]
        extraConfig = ext.extraConfig + [
            'dataSourceProperties.dataSource.url' :
                "jdbc:postgresql://localhost:5432/hus_db?currentSchema=corda_schema"
        ]
        drivers = ext.drivers
    }
}
database.runMigration is a Corda Enterprise property only.
To control database migration in Corda Open Source, use initialiseSchema.
initialiseSchema
Boolean which indicates whether to update the database schema at startup (or create the schema when the node starts for the first time). If set to false, on startup the node will validate whether it is running against a compatible database schema.
Default: true
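Applied to your deployNodes task, the extraConfig map would then look roughly like this (a sketch that keeps your other properties as posted and only swaps the migration property):
ext.extraConfig = [
    'dataSourceProperties.dataSourceClassName' : "org.postgresql.ds.PGSimpleDataSource",
    'dataSourceProperties.dataSource.user' : "corda",
    'dataSourceProperties.dataSource.password' : "corda1234",
    'database.transactionIsolationLevel' : 'READ_COMMITTED',
    // Corda Open Source property instead of the Enterprise-only database.runMigration
    'database.initialiseSchema' : "true"
]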
You may refer to the link below for the other database properties you can set.
https://docs.corda.net/corda-configuration-file.html

Importing nested JSON documents into Elasticsearch and making them searchable

We have a MongoDB collection which we want to import into Elasticsearch (for now as a one-off effort). To this end, we have exported the collection with mongoexport. It is a huge JSON file with entries like the following:
{
  "RefData" : {
    "DebtInstrmAttrbts" : {
      "NmnlValPerUnit" : "2000",
      "IntrstRate" : {
        "Fxd" : "3.1415"
      },
      "MtrtyDt" : "2020-01-01",
      "TtlIssdNmnlAmt" : "200000000",
      "DebtSnrty" : "SNDB"
    },
    "TradgVnRltdAttrbts" : {
      "IssrReq" : "false",
      "Id" : "BMTF",
      "FrstTradDt" : "2019-04-01T12:34:56.789"
    },
    "TechAttrbts" : {
      "PblctnPrd" : {
        "FrDt" : "2019-04-04"
      },
      "RlvntCmptntAuthrty" : "GB"
    },
    "FinInstrmGnlAttrbts" : {
      "ClssfctnTp" : "DBFNXX",
      "ShrtNm" : "AVGO 3.625 10/16/24 c24 (URegS)",
      "FullNm" : "AVGO 3 5/8 10/15/24 BOND",
      "NtnlCcy" : "USD",
      "Id" : "USU1109MAXXX",
      "CmmdtyDerivInd" : "false"
    },
    "Issr" : "549300WV6GIDOZJTVXXX"
  }
}
We are using the following Logstash configuration file to import this data set into Elasticsearch:
input {
  file {
    path => "/home/elastic/FIRDS.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => json
  }
}
filter {
  mutate {
    remove_field => [ "_id", "path", "host" ]
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "firds"
  }
}
All this works fine, the data ends up in the index firds of Elasticsearch, and a GET /firds/_search returns all the entries within the _source field.
We understand that this field is not indexed and thus is not searchable, but searchability is exactly what we are after. We want to make all of the entries within the original nested JSON searchable in Elasticsearch.
We assume that we have to adjust the filter {} part of our Logstash configuration, but how? For consistency reasons, it would not be bad to keep the original nested JSON structure, but that is not a must. Flattening would also be an option, so that e.g.
"RefData" : {
"DebtInstrmAttrbts" : {
"NmnlValPerUnit" : "2000" ...
becomes a single key-value pair "RefData.DebtInstrmAttrbts.NmnlValPerUnit" : "2000".
It would be great if we could do that immediately with Logstash, without using an additional Python script operating on the JSON file we exported from MongoDB.
EDIT: Workaround
Our current work-around is to (1) dump the MongoDB database to dump.json and then (2) flatten it with jq using the following expression, and finally (3) manually import it into Elastic
ad (2): This is the flattening step:
jq -c '. as $in | reduce leaf_paths as $path ({}; . + { ($path | join(".")): $in | getpath($path) }) | del(."_id.$oid")' dump.json > flattened.json
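For step (3), one way to do the manual import is the Elasticsearch bulk API; a rough sketch, assuming a local cluster on localhost:9200 (as in the Logstash output above) and the flattened.json produced by the jq command, one document per line (the file name bulk.ndjson is just a placeholder):
# Turn each flattened document into an action line plus a source line for the bulk API
jq -c '{"index":{}}, .' flattened.json > bulk.ndjson
# Send the result to the firds index
curl -s -H 'Content-Type: application/x-ndjson' -X POST 'http://localhost:9200/firds/_bulk' --data-binary @bulk.ndjson
For a very large export you would want to split bulk.ndjson into chunks before posting.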
References
Walker Rowe: ElasticSearch Nested Queries: How to Search for Embedded Documents
ElasticSearch search in document and in dynamic nested document
Mapping for Nested JSON document in Elasticsearch
Logstash - import nested JSON into Elasticsearch
Remark for the curious: The shown JSON is a (modified) entry from the Financial Instruments Reference Database System (FIRDS), available from the European Securities and Markets Authority (ESMA), which is a European financial regulatory agency overseeing the capital markets.