Connecting Ditto to InfluxDB via Kafka

I am running Kafka and InfluxDB in Docker.
I have created a digital twin in Ditto that correctly updates when I send a message via MQTT.
I want the data to be sent from Ditto to InfluxDB, but once I create the bucket in InfluxDB it shows no data whatsoever.
I have followed this guide: https://www.influxdata.com/blog/getting-started-apache-kafka-influxdb/
(I know it is written for a Python program, but the steps should be the same; I just use the Telegraf Kafka consumer plugin instead of the one used in the guide.)
I have created the connection and the Telegraf configuration file, but nothing happens in InfluxDB.
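One way to confirm that Ditto's Kafka target is actually publishing is the console consumer that ships with Kafka (assuming the broker is reachable as localhost:9092 from wherever you run it; if Kafka runs in Docker this may need to be executed inside the broker container via docker exec):

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic arduino --from-beginning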
Here is the telegraf.conf
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://localhost:8086"]
## API token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "digital"
## Destination bucket to write into.
bucket = "arduino"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
# timeout = "5s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
# Read metrics from Kafka topics
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["arduino"]
## When set this tag will be added to all metrics with the topic as the value.
# topic_tag = ""
## Optional Client id
# client_id = "Telegraf"
## Set the minimal supported Kafka version. Setting this enables the use of new
## Kafka features and APIs. Must be 0.10.2.0 or greater.
## ex: version = "1.1.0"
# version = ""
## Optional TLS Config
# enable_tls = false
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## SASL authentication credentials. These settings should typically be used
## with TLS encryption enabled
# sasl_username = "kafka"
# sasl_password = "secret"
## Optional SASL:
## one of: OAUTHBEARER, PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, GSSAPI
## (defaults to PLAIN)
# sasl_mechanism = ""
## used if sasl_mechanism is GSSAPI (experimental)
# sasl_gssapi_service_name = ""
# ## One of: KRB5_USER_AUTH and KRB5_KEYTAB_AUTH
# sasl_gssapi_auth_type = "KRB5_USER_AUTH"
# sasl_gssapi_kerberos_config_path = "/"
# sasl_gssapi_realm = "realm"
# sasl_gssapi_key_tab_path = ""
# sasl_gssapi_disable_pafxfast = false
## used if sasl_mechanism is OAUTHBEARER (experimental)
# sasl_access_token = ""
## SASL protocol version. When connecting to Azure EventHub set to 0.
# sasl_version = 1
# Disable Kafka metadata full fetch
# metadata_full = false
## Name of the consumer group.
# consumer_group = "telegraf_metrics_consumers"
## Compression codec represents the various compression codecs recognized by
## Kafka in messages.
## 0 : None
## 1 : Gzip
## 2 : Snappy
## 3 : LZ4
## 4 : ZSTD
# compression_codec = 0
## Initial offset position; one of "oldest" or "newest".
# offset = "oldest"
## Consumer group partition assignment strategy; one of "range", "roundrobin" or "sticky".
# balance_strategy = "range"
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 1000000
## Maximum messages to read from the broker that have not been written by an
## output. For best throughput set based on the number of metrics within
## each message and the size of the output's metric_batch_size.
##
## For example, if each message from the queue contains 10 metrics and the
## output metric_batch_size is 1000, setting this to 100 will ensure that a
## full batch is collected and the write is triggered immediately without
## waiting until the next flush_interval.
# max_undelivered_messages = 1000
## Maximum amount of time the consumer should take to process messages. If
## the debug log prints messages from sarama about 'abandoning subscription
## to [topic] because consuming was taking too long', increase this value to
## longer than the time taken by the output plugin(s).
##
## Note that the effective timeout could be between 'max_processing_time' and
## '2 * max_processing_time'.
# max_processing_time = "100ms"
## The default number of message bytes to fetch from the broker in each
## request (default 1MB). This should be larger than the majority of
## your messages, or else the consumer will spend a lot of time
## negotiating sizes and not actually consuming. Similar to the JVM's
## `fetch.message.max.bytes`.
# consumer_fetch_default = "1MB"
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json"
The Kafka connection as it appears in the Ditto explorer:
{
  "id": "0ab4b527-617f-4f4f-8bac-4ffa4b5a8471",
  "name": "Kafka 2.x",
  "connectionType": "kafka",
  "connectionStatus": "open",
  "uri": "tcp://192.168.109.74:9092",
  "sources": [
    {
      "addresses": ["arduino"],
      "consumerCount": 1,
      "qos": 1,
      "authorizationContext": ["nginx:ditto"],
      "enforcement": {
        "input": "{{ header:device_id }}",
        "filters": ["{{ entity:id }}"]
      },
      "acknowledgementRequests": {
        "includes": []
      },
      "headerMapping": {},
      "payloadMapping": ["Ditto"],
      "replyTarget": {
        "address": "theReplyTopic",
        "headerMapping": {},
        "expectedResponseTypes": ["response", "error", "nack"],
        "enabled": true
      }
    }
  ],
  "targets": [
    {
      "address": "topic/key",
      "topics": [
        "_/_/things/twin/events",
        "_/_/things/live/messages"
      ],
      "authorizationContext": ["nginx:ditto"],
      "headerMapping": {}
    }
  ],
  "clientCount": 1,
  "failoverEnabled": true,
  "validateCertificates": true,
  "processorPoolSize": 1,
  "specificConfig": {
    "saslMechanism": "plain",
    "bootstrapServers": "localhost:9092"
  },
  "tags": []
}
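For reference (an illustration, not taken from this setup): with the default target mapping, the events Ditto publishes to the target topic are Ditto Protocol envelopes, roughly of this shape (the thing name, feature path and values below are made up):

{
  "topic": "my.test/mything/things/twin/events/modified",
  "headers": {},
  "path": "/features/sensor/properties/value",
  "value": 23.5,
  "revision": 42,
  "timestamp": "2022-09-07T10:00:00Z"
}

With Telegraf's plain json parser only numeric values become fields by default (strings are ignored unless listed in json_string_fields), which is one reason the answer below suggests the json_v2 parser. Note also that the target address above is literally topic/key; if I read the Ditto connection docs correctly, Kafka target addresses follow the <topic>/<key> pattern, so as configured events would be published to a Kafka topic named "topic" rather than "arduino", which is worth double-checking against what the console consumer shows.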
The policy file for Ditto:
{
  "policyId": "my.test:policy1",
  "entries": {
    "owner": {
      "subjects": {
        "nginx:ditto": {
          "type": "nginx basic auth user"
        }
      },
      "resources": {
        "thing:/": {
          "grant": ["READ", "WRITE"],
          "revoke": []
        },
        "policy:/": {
          "grant": ["READ", "WRITE"],
          "revoke": []
        },
        "message:/": {
          "grant": ["READ", "WRITE"],
          "revoke": []
        }
      }
    },
    "observer": {
      "subjects": {
        "ditto:observer": {
          "type": "observer user"
        }
      },
      "resources": {
        "thing:/features": {
          "grant": ["READ"],
          "revoke": []
        },
        "policy:/": {
          "grant": ["READ"],
          "revoke": []
        },
        "message:/": {
          "grant": ["READ"],
          "revoke": []
        }
      }
    }
  }
}

When Telegraf reads data from Kafka, it needs to transform it into time-series metrics that InfluxDB can digest. You have correctly selected the JSON parser, but there may be additional configuration required, or even the use of the more powerful json_v2 parser, in order to set the tags and fields based on the JSON data.
My suggestion is to add an [[outputs.file]] output to see if anything is even getting passed; probably nothing will show up. Then do the following (a sketch of both steps follows the list):
1) determine what your JSON looks like in Kafka
2) decide what you want that JSON to look like as time-series data in InfluxDB
3) use the json_v2 parser to set the appropriate tags and fields
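A minimal sketch of that debugging setup, assuming Telegraf can reach the broker at localhost:9092; the json_v2 paths are only an illustration and have to be adapted to the Ditto Protocol JSON you actually see on the topic:

# dump whatever the kafka_consumer input produces, to see whether any metrics arrive at all
[[outputs.file]]
  files = ["stdout", "/tmp/telegraf-kafka-debug.out"]
  data_format = "influx"

[[inputs.kafka_consumer]]
  brokers = ["localhost:9092"]
  topics = ["arduino"]
  data_format = "json_v2"

  # illustrative mapping for a Ditto Protocol event; adjust the paths to your payload
  [[inputs.kafka_consumer.json_v2]]
    measurement_name = "arduino"
    [[inputs.kafka_consumer.json_v2.tag]]
      path = "topic"        # e.g. "my.test/mything/things/twin/events/modified"
    [[inputs.kafka_consumer.json_v2.field]]
      path = "value"        # only works if the event's value is a plain number
      type = "float"

If nothing shows up even in the file output, the problem is upstream of Telegraf: either the Ditto target is not publishing, or the broker address is not reachable from wherever Telegraf runs (a common pitfall when Telegraf itself runs in Docker and localhost does not point at the Kafka container).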

Related

Impossible to fetch data from kafka to telegraf

I have a personal project to gather data from an earthquake-monitoring API, put it on a Kafka topic in real time, and then visualize it in Grafana.
For that I use Telegraf and its Kafka consumer plugin, which listens to the topic and forwards the data to an InfluxDB bucket that Grafana uses as a data source.
I did some tests and I know for sure that the data is received on the topic.
{
"Earthquake": {
"metadata": {
"Metadata": {
"generated": "1662533868000",
"url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2022-09-07T05:57:47",
"title": "USGS Earthquakes",
"api": "1.13.6",
"count": "7",
"status": "200"
}
},
"features": [
{
"Features": {
"type": "Feature",
"properties": {
"Properties": {
"mag": "1.7",
"place": "34 km NE of Paxson, Alaska",
"time": "1662532868655",
"updated": "1662532994982",
"tz": "0",
"url": "https://earthquake.usgs.gov/earthquakes/eventpage/ak022bhk6641",
"detail": "https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak022bhk6641&format=geojson",
"felt": "0",
"cdi": "0.0",
"mmi": "0.0",
"alert": "null",
"status": "automatic",
"tsunami": "0",
"sig": "44",
"net": "ak",
"code": "022bhk6641",
"ids": ",ak022bhk6641,",
"sources": ",ak,",
"types": ",origin,phase-data,",
"nst": "0",
"dmin": "0.0",
"rms": "0.6",
"gap": "0.0",
"magType": "ml",
"type": "earthquake"
}
},
"geometry": {
"Geometry": {
"type": "Point",
"coordinates": [
-145.1808,
63.3242,
0.0
]
}
},
"id": "ak022bhk6641"
}
}
],
"bbox": [
-151.5653,
-54.375,
0.0,
-118.39484,
63.3242,
71.4
]
}
}
Here is the telegraf.conf that I use for my InfluxDB data source:
# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## Collected metrics are rounded to the precision specified. Precision is
## specified as an interval with an integer + unit (e.g. 0s, 10ms, 2us, 4s).
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
##
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s:
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
##
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
precision = "0s"
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do no set the "host" tag in the telegraf agent.
omit_hostname = false
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
# # Configuration for sending metrics to InfluxDB 2.0
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://127.0.0.1:8086"]
## Token for authentication.
token = $INFLUX_TOKEN
## Organization is the name of the organization you wish to write to.
organization = "earthWatch"
## Destination bucket to write into.
bucket = "telegraf"
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["general-events"]
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 0
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json_v2"
[[inputs.file.json_v2]]
measurement_name = "Earthquake"
[[inputs.file.json_v2.tag]]
path = "Earthquake.features.#.Features.properties.Properties.url"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.mag"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.place"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.time"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.updated"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.tz"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.url"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.detail"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.felt"
type = "bool"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.cdi"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.mmi"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.alert"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.status"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.tsunami"
type = "bool"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.sig"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.net"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.code"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.ids"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.sources"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.types"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.nst"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.dmin"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.rms"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.gap"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.magType"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.type"
type = "string"
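For what it's worth, the json_v2 sub-tables above are attached to [[inputs.file]], while the Kafka messages are read by [[inputs.kafka_consumer]]; the parser tables normally have to hang off the input plugin that declares data_format = "json_v2". A sketch of the same mapping moved under the Kafka input (only two fields shown for brevity, paths copied from the configuration above):

[[inputs.kafka_consumer]]
  brokers = ["localhost:9092"]
  topics = ["general-events"]
  data_format = "json_v2"

  [[inputs.kafka_consumer.json_v2]]
    measurement_name = "Earthquake"
    [[inputs.kafka_consumer.json_v2.tag]]
      path = "Earthquake.features.#.Features.properties.Properties.url"
    [[inputs.kafka_consumer.json_v2.field]]
      path = "Earthquake.features.#.Features.properties.Properties.mag"
      type = "float"
    [[inputs.kafka_consumer.json_v2.field]]
      path = "Earthquake.features.#.Features.properties.Properties.place"
      type = "string"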
After multiple attempts and modifications of the telegraf.conf file, I was able to retrieve some of the fields from the data, but other fields are missing and I don't know which configuration change made this work.
The image above shows a query made just after the topic received the data (less than a minute ago), but the data is still impossible to display, as if it were never retransmitted.
Any idea what might be missing?
Thank you

getting timeout when submitting fat jar to spark-jobserver (akka.pattern.AskTimeoutException)

I have built my job jar using sbt assembly so that all dependencies are in one jar. When I try to submit my binary to spark-jobserver I am getting akka.pattern.AskTimeoutException.
I modified my configuration to be able to submit large jars (I added parsing.max-content-length = 300m). I also increased some of the timeouts in the configuration, but nothing helped.
After I run:
curl --data-binary @matching-ml-assembly-1.0.jar localhost:8090/jars/matching-ml
I am getting:
{
"status": "ERROR",
"result": {
"message": "Ask timed out on [Actor[akka://JobServer/user/binary-manager#1785133213]] after [3000 ms]. Sender[null] sent message of type \"spark.jobserver.StoreBinary\".",
"errorClass": "akka.pattern.AskTimeoutException",
"stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"]
}
My configuration:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
# spark.master will be passed to each job's JobContext
master = "local[4]"
# master = "mesos://vm28-hulk-pub:5050"
# master = "yarn-client"
# Default # of CPUs for jobs to use for Spark standalone cluster
job-number-cpus = 4
jobserver {
port = 8090
context-per-jvm = false
# Note: JobFileDAO is deprecated from v0.7.0 because of issues in
# production and will be removed in future, now defaults to H2 file.
jobdao = spark.jobserver.io.JobSqlDAO
filedao {
rootdir = /tmp/spark-jobserver/filedao/data
}
datadao {
# storage directory for files that are uploaded to the server
# via POST/data commands
rootdir = /tmp/spark-jobserver/upload
}
sqldao {
# Slick database driver, full classpath
slick-driver = slick.driver.H2Driver
# JDBC driver, full classpath
jdbc-driver = org.h2.Driver
# Directory where default H2 driver stores its data. Only needed for H2.
rootdir = /tmp/spark-jobserver/sqldao/data
# Full JDBC URL / init string, along with username and password. Sorry, needs to match above.
# Substitutions may be used to launch job-server, but leave it out here in the default or tests won't pass
jdbc {
url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
user = ""
password = ""
}
# DB connection pool settings
dbcp {
enabled = false
maxactive = 20
maxidle = 10
initialsize = 10
}
}
# When using chunked transfer encoding with scala Stream job results, this is the size of each chunk
result-chunk-size = 1m
}
# Predefined Spark contexts
# contexts {
# my-low-latency-context {
# num-cpu-cores = 1 # Number of cores to allocate. Required.
# memory-per-node = 512m # Executor memory per node, -Xmx style eg 512m, 1G, etc.
# }
# # define additional contexts here
# }
# Universal context configuration. These settings can be overridden, see README.md
context-settings {
num-cpu-cores = 2 # Number of cores to allocate. Required.
memory-per-node = 2G # Executor memory per node, -Xmx style eg 512m, #1G, etc.
# In case spark distribution should be accessed from HDFS (as opposed to being installed on every Mesos slave)
# spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
# URIs of Jars to be loaded into the classpath for this context.
# Uris is a string list, or a string separated by commas ','
# dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
# Add settings you wish to pass directly to the sparkConf as-is such as Hadoop connection
# settings that don't use the "spark." prefix
passthrough {
#es.nodes = "192.1.1.1"
}
}
# This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
# home = "/home/spark/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.
spray.can.server {
# uncomment the next lines for making this an HTTPS example
# ssl-encryption = on
# path to keystore
#keystore = "/some/path/sjs.jks"
#keystorePW = "changeit"
# see http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#SSLContext for more examples
# typical are either SSL or TLS
encryptionType = "SSL"
keystoreType = "JKS"
# key manager factory provider
provider = "SunX509"
# ssl engine provider protocols
enabledProtocols = ["SSLv3", "TLSv1"]
idle-timeout = 60 s
request-timeout = 20 s
connecting-timeout = 5s
pipelining-limit = 2 # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)
# Needed for HTTP/1.0 requests with missing Host headers
default-host-header = "spray.io:8765"
# Increase this in order to upload bigger job jars
parsing.max-content-length = 300m
}
akka {
remote.netty.tcp {
# This controls the maximum message size, including job results, that can be sent
# maximum-frame-size = 10 MiB
}
}
I came across a similar issue. The way to solve it is a bit tricky. First you need to add spark.jobserver.short-timeout to your configuration. Just modify your configuration like this:
jobserver {
port = 8090
context-per-jvm = false
short-timeout = 60s
...
}
The second (tricky) part is that you can't fix it without modifying the spark-jobserver code. The attribute which causes the timeout is in the class BinaryManager:
implicit val daoAskTimeout = Timeout(3 seconds)
The default is set to 3 seconds, which apparently is not enough for a big jar. You can increase it to, for example, 60 seconds, which solved the problem for me.
implicit val daoAskTimeout = Timeout(60 seconds)
Actually, you can easily bring down the size of the jar: some of the dependent jars can be passed using dependent-jar-uris instead of being assembled into one big fat jar (see the sketch below).
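For reference, a minimal sketch of that in the context-settings block of the configuration above (the jar path is made up for illustration and must exist on, or be reachable from, every node):

context-settings {
  num-cpu-cores = 2
  memory-per-node = 2G
  # jars listed here are added to the context classpath instead of being baked into the fat jar
  dependent-jar-uris = ["file:///opt/libs/matching-ml-deps.jar"]
}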

Why is Cygnus not connecting to another virtual machine running MongoDB?

Good morning,
I have the following set of virtual machines:
VM A
Generic Enablers Orion and Cygnus
IP: 10.10.0.10
VM B
MongoDB
IP: 10.10.0.17
Cygnus configuration is:
/usr/cygnus/conf/cygnus_instance_mongodb.conf
#####
#
# Configuration file for apache-flume
#
#####
# Copyright 2014 Telefonica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-connectors (FI-WARE project).
#
# cosmos-injector is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General
# Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
# later version.
# cosmos-injector is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
# details.
#
# You should have received a copy of the GNU Affero General Public License along with fiware-connectors. If not, see
# http://www.gnu.org/licenses/.
#
# For those usages not covered by the GNU Affero General Public License please contact with iot_support at tid dot es
# Who to run cygnus as. Note that you may need to use root if you want
# to run cygnus in a privileged port (<1024)
CYGNUS_USER=cygnus
# Where is the config folder
CONFIG_FOLDER=/usr/cygnus/conf
# Which is the config file
CONFIG_FILE=/usr/cygnus/conf/agent_mongodb.conf
# Name of the agent. The name of the agent is not trivial, since it is the base for the Flume parameters
# naming conventions, e.g. it appears in .sources.http-source.channels=...
AGENT_NAME=cygnusagent
# Name of the logfile located at /var/log/cygnus. It is important to put the extension '.log' in order to the log rotation works properly
LOGFILE_NAME=cygnus.log
# Administration port. Must be unique per instance
ADMIN_PORT=8081
# Polling interval (seconds) for the configuration reloading
POLLING_INTERVAL=30
/usr/cygnus/conf/agent_mongodb.conf
#####
#
# Copyright 2014 Telefónica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-connectors (FI-WARE project).
#
# fiware-connectors is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General
# Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
# later version.
# fiware-connectors is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
# details.
#
# You should have received a copy of the GNU Affero General Public License along with fiware-connectors. If not, see
# http://www.gnu.org/licenses/.
#
# For those usages not covered by the GNU Affero General Public License please contact with iot_support at tid dot es
#=============================================
# To be put in APACHE_FLUME_HOME/conf/agent.conf
#
# General configuration template explaining how to setup a sink of each of the available types (HDFS, CKAN, MySQL).
#=============================================
# The next tree fields set the sources, sinks and channels used by Cygnus. You could use different names than the
# ones suggested below, but in that case make sure you keep coherence in properties names along the configuration file.
# Regarding sinks, you can use multiple types at the same time; the only requirement is to provide a channel for each
# one of them (this example shows how to configure 3 sink types at the same time). Even, you can define more than one
# sink of the same type and sharing the channel in order to improve the performance (this is like having
# multi-threading).
cygnusagent.sources = http-source
cygnusagent.sinks = mongo-sink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# Timestamp interceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# Destination extractor interceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Matching table for the destination extractor interceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# true if the grouping feature is enabled for this sink, false otherwise
cygnusagent.sinks.mongo-sink.enable_grouping = false
# the FQDN/IP address where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_host = 10.10.0.17:27017
# a valid user in the MongoDB server
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
cygnusagent.sinks.mongo-sink.db_prefix = hvds_
# prefix for the MongoDB collections
cygnusagent.sinks.mongo-sink.collection_prefix = hvds_
# true if collection names are based on a hash, false for human readable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# specify if the sink will use a single collection for each service path, for each entity or for each attribute
cygnusagent.sinks.mongo-sink.data_model = collection-per-entity
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.mongo-sink.attr_persistence = column
#=============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
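Before going further, it is worth confirming from VM A that the MongoDB port on VM B is reachable at all, for instance (assuming the mongo shell or netcat is installed on VM A):

mongo --host 10.10.0.17 --port 27017 --eval "db.adminCommand('ping')"
# or, without the mongo shell:
nc -zv 10.10.0.17 27017

Note that many MongoDB packages bind to 127.0.0.1 by default (bind_ip in mongod.conf), in which case connections from VM A would be refused even though the sink configuration is correct.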
When performing the following steps:
I subscribe the sensor whose data I want to persist:
(curl http://10.10.0.10:1026/NGSI10/subscribeContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
"entities": [
{
"type": "Sensor",
"isPattern": "false",
"id": "sensor003"
}
],
"attributes": [
"potencia_max",
"potencia_min",
"coste",
"co2"
],
"reference": "http://localhost:5050/notify",
"duration": "P1M",
"notifyConditions": [
{
"type": "ONTIMEINTERVAL",
"condValues": [
"PT5S"
]
}
]
}
EOF
Then I create or update that data:
(curl http://10.10.0.10:1026/NGSI10/updateContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
"contextElements": [
{
"type": "Sensor",
"isPattern": "false",
"id": "sensor003",
"attributes": [
{
"name":"potencia_max",
"type":"float",
"value":"1000"
},
{
"name":"potencia_min",
"type":"float",
"value":"200"
},
{
"name":"coste",
"type":"float",
"value":"0.24"
},
{
"name":"co2",
"type":"float",
"value":"12"
}
]
}
],
"updateAction": "APPEND"
}
EOF
The expected result is returned, but when we access the database on VM B to see whether the data has been created and saved, we see that it has not:
admin (empty)
local 0.078GB
localhost (empty)
If we go to the database on VM A we can see that the databases have been created there instead:
admin (empty)
hvds_def_serv 0.078GB
hvds_qsg 0.078GB
local 0.078GB
orion 0.078GB
Could you indicate how this can be solved?
Thank you in advance for your help.
EDIT 1
I subscribe sensor005:
(curl http://10.10.0.10:1026/NGSI10/subscribeContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
"entities": [{
"type": "Sensor",
"isPattern": "false",
"id": "sensor005"
}],
"attributes": [
"muestreo"
],
"reference": "http://localhost:5050/notify",
"duration": "P1M",
"notifyConditions": [{
"type": "ONCHANGE",
"condValues": [
"muestreo"
]
}],
"throttling": "PT1S"
}
EOF
Then I update the data:
(curl http://10.10.0.10:1026/NGSI10/updateContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
"contextElements": [
{
"type": "Sensor",
"isPattern": "false",
"id": "sensor005",
"attributes": [
{
"name":"magnitud",
"type":"string",
"value":"energia"
},
{
"name":"unidad",
"type":"string",
"value":"Kw"
},
{
"name":"tipo",
"type":"string",
"value":"electrico"
},
{
"name":"valido",
"type":"boolean",
"value":"true"
},
{
"name":"muestreo",
"type":"hora/kw",
"value": {
"tiempo": [
"10:00:31",
"10:00:32",
"10:00:33",
"10:00:34",
"10:00:35",
"10:00:36",
"10:00:37",
"10:00:38",
"10:00:39",
"10:00:40",
"10:00:41",
"10:00:42",
"10:00:43",
"10:00:44",
"10:00:45",
"10:00:46",
"10:00:47",
"10:00:48",
"10:00:49",
"10:00:50",
"10:00:51",
"10:00:52",
"10:00:53",
"10:00:54",
"10:00:55",
"10:00:56",
"10:00:57",
"10:00:58",
"10:00:59",
"10:01:60"
],
"kw": [
"200",
"201",
"200",
"200",
"195",
"192",
"190",
"189",
"195",
"200",
"205",
"210",
"207",
"205",
"209",
"212",
"215",
"220",
"225",
"230",
"250",
"255",
"245",
"242",
"243",
"240",
"220",
"210",
"200",
"200"
]
}
}
]
}
],
"updateAction": "APPEND"
}
EOF
These are the two logs generated:
/var/log/contextBroker/contextBroker.log
/var/log/cygnus/cygnus.log
EDIT 2
/var/log/cygnus/cygnus.log with DEBUG

Fiware Cygnus: no data has been persisted in MongoDB

I am trying to use Cygnus with MongoDB, but no data has been persisted in the database.
Here is the notification received in Cygnus:
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Starting transaction (1437482681-118-0000000000)
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Received data ({ "subscriptionId" : "55a73819d0c457bb20b1d467", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "enocean", "isPattern" : "false", "id" : "enocean:myButtonA", "attributes" : [ { "name" : "ButtonValue", "type" : "", "value" : "ON", "metadatas" : [ { "name" : "TimeInstant", "type" : "ISO8601", "value" : "2015-07-20T21:29:56.509293Z" } ] } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Event put in the channel (id=1454120446, ttl=10)
Here is my agent configuration:
cygnusagent.sources = http-source
cygnusagent.sinks = OrionMongoSink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# TimestampInterceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# GroupinInterceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Grouping rules for the GroupingInterceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /home/egm_demo/usr/fiware-cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# FQDN/IP:port where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_hosts = 127.0.0.1:27017
# a valid user in the MongoDB server (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
#cygnusagent.sinks.mongo-sink.db_prefix = kura
# prefix for the MongoDB collections
#cygnusagent.sinks.mongo-sink.collection_prefix = button
# true if collection names are based on a hash, false for human readable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# ============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
Here is my grouping rule:
{
"grouping_rules": [
{
"id": 1,
"fields": [
"button"
],
"regex": ".*",
"destination": "kura",
"fiware_service_path": "/kuraspath"
}
]
}
Any ideas of what I have missed? Thanks in advance for your help!
This configuration parameter is wrong:
cygnusagent.sinks = OrionMongoSink
According to your configuration, it must be mongo-sink (I mean, you are configuring a Mongo sink named mongo-sink when you configure lines such as cygnusagent.sinks.mongo-sink.type).
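In other words, the sinks declaration has to use the same name as the per-sink properties:

cygnusagent.sinks = mongo-sink
cygnusagent.sinks.mongo-sink.channel = mongo-channel
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink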
In addition, I would recommend not using the grouping rules feature; it is an advanced feature for sending the data to a collection different from the default one, and in a first stage I would play with the default behaviour. Thus, my recommendation is to leave the path to the file in cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file, but comment out all the JSON within it :)

Fiware Orion - pepProxy

I'm part of a team that is developing an application that uses the FIWARE GEs as part of the Smart-AgriFood accelerator.
We are using the Orion Context Broker for gathering the data provided by the sensor network, and we intend to use the PEP Proxy to authenticate the sensor nodes for access to the Orion instance. We have tried the following PEP Proxies:
https://github.com/telefonicaid/fiware-orion-pep
https://github.com/ging/fi-ware-pep-proxy
We have only had success with the second (fi-ware-pep-proxy) implementation of the proxy. With fiware-orion-pep we haven't been able to connect to the global Keystone instance (account.lab.fi-ware.org); we have tried account.lab... and cloud.lab... My questions are:
1) Is the Keystone (IDM) instance for authentication account.lab or cloud.lab, and which ports and addresses should be used?
2) Is fiware-orion-pep prepared to authenticate against account.lab.fi-ware.org? Here is why I ask:
This one works with the curl command at >> cloud.lab.fiware.org:4730/v2.0/tokens
{
"auth": {
"passwordCredentials": {
"username": "<my_user>",
"password": "<my_password>"
}
}
}'
This one doesn't work with the curl command at >> account.lab.fi-ware.org:5000/v3/auth/tokens
{
"auth": {
"identity": {
"methods": [
"password"
],
"password": {
"user": {
"domain": {
"name": "<my_domain>"
},
"name": "<my_user>",
"password": "<my_password>"
}
}
}
} }'
3) Which implementation should I be using to authenticate the devices or other calls to the Orion instance?
Here are the configurations that I used:
fiware-orion-pep
config.authentication = {
checkHeaders: true,
module: 'keystone',
user: '<my_user>',
password: '<my_password>',
domainName: '<my_domain>',
retries: 3,
cacheTTLs: {
users: 1000,
projectIds: 1000,
roles: 60
},
options: {
protocol: 'http',
host: 'account.lab.fiware.org',
port: 5000,
path: '/v3/role_assignments',
authPath: '/v3/auth/tokens'
}
};
fi-ware-pep-proxy (this one works); I have set the listening port to 1026 in the source code
var config = {};
config.account_host = 'https://account.lab.fiware.org';
config.keystone_host = 'cloud.lab.fiware.org';
config.keystone_port = 4731;
config.app_host = 'localhost';
config.app_port = '10026';
config.username = 'pepProxy';
config.password = 'pepProxy';
// in seconds
config.chache_time = 300;
config.check_permissions = false;
config.magic_key = undefined;
module.exports = config;
Thanks in advance for the time ... :)
There are currently some differences in how the two PEP Proxies authenticate and validate against the global instances, so they do not behave in exactly the same way.
The one in telefonicaid/fiware-orion-pep was developed to fulfill the PEP Proxy requirements (authentication and validation against a Keystone and Access Control) in individual projects with their own Keystone and Keypass (a flavour of Access Control) installations, and so it evolved faster than the one in ging/fi-ware-pep-proxy and in a slightly different direction. As an example, the former supports multitenancy using the fiware-service and fiware-servicepath headers, while the latter is transparent to those mechanisms. This development direction meant also that the functionality slightly differs from time to time from the one in the global instance.
That being said, the concrete answer:
- Both PEP Proxies should be able to contact the global instance. If one doesn't, please file a bug in the issues of the GitHub repository and we will fix it as soon as possible.
- The ging/fi-ware-pep-proxy was specifically designed for accessing the global instance, so you should be able to use it as expected.
Please, if you try to proceed with the telefonicaid/fiware-orion-pep take note also that:
- the configuration flag authentication.checkHeaders should be false, as the global instance does not currently support multitenancy (see the one-line example after this list).
- the current stable release (0.5.0) is about to change to the next version (probably today), so maybe some of the problems will be solved by the update.
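For example, applied to the config.js shown above, that setting boils down to:

config.authentication.checkHeaders = false;  // the global instance does not support multitenancy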
Hope this clarifies some of your doubts.
[EDIT]
1) I have already installed telefonicaid/fiware-orion-pep (v0.6.0), both from sources and from the RPM package created following the tutorial available on GitHub. When creating the RPM package, it is created with the following name: pep-proxy-0.4.0_next-0.noarch.rpm.
2) Here is the configuration that I used:
/opt/fiware-orion-pep/config.js
var config = {};
config.resource = {
original: {
host: 'localhost',
port: 10026
},
proxy: {
port: 1026,
adminPort: 11211
} };
config.authentication = {
checkHeaders: false,
module: 'keystone',
user: '<##################>',
password: '<###################>',
domainName: 'admin_domain',
retries: 3,
cacheTTLs: {
users: 1000,
projectIds: 1000,
roles: 60
},
options: { protocol: 'http',
host: 'cloud.lab.fiware.org',
port: 4730,
path: '/v3/role_assignments',
authPath: '/v3/auth/tokens'
} };
config.ssl = {
active: false,
keyFile: '',
certFile: '' }
config.logLevel = 'DEBUG'; // List of component
config.middlewares = {
require: 'lib/plugins/orionPlugin',
functions: [
'extractCBAction'
] };
config.componentName = 'orion';
config.resourceNamePrefix = 'fiware:';
config.bypass = false;
config.bypassRoleId = '';
module.exports = config;
/etc/sysconfig/pepProxy
# General Configuration
############################################################################
# Port where the proxy will listen for requests
PROXY_PORT=1026
# User to execute the PEP Proxy with
PROXY_USER=pepproxy
# Host where the target Context Broker is located
# TARGET_HOST=localhost
# Port where the target Context Broker is listening
# TARGET_PORT=10026
# Maximum level of logs to show (FATAL, ERROR, WARNING, INFO, DEBUG)
LOG_LEVEL=DEBUG
# Indicates what component plugin should be loaded with this PEP: orion, keypass, perseo
COMPONENT_PLUGIN=orion
#
# Access Control Configuration
############################################################################
# Host where the Access Control (the component who knows the policies for the incoming requests) is located
# ACCESS_HOST=
# Port where the Access Control is listening
# ACCESS_PORT=
# Host where the authentication authority for the Access Control is located
# AUTHENTICATION_HOST=
# Port where the authentication authority is listening
# AUTHENTICATION_PORT=
# User name of the PEP Proxy in the authentication authority
PROXY_USERNAME=XXXXXXXXXXXXX
# Password of the PEP Proxy in the Authentication authority
PROXY_PASSWORD=XXXXXXXXXXXXX
In the files above I have tried the following parameters:
Keystone instance: account.lab.fiware.org or cloud.lab.fiware.org
User: pep or pepProxy or "user from fiware account"
Pass: pep or pepProxy or "user password from account"
Port: 4730, 4731, 5000
The result is the same as before... telefonicaid/fiware-orion-pep is unable to authenticate:
log file at /var/log/pepProxy/pepProxy
time=2015-04-13T14:49:24.718Z | lvl=ERROR | corr=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | trans=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | op=/v1/updateContext | msg=VALIDATION-GEN-003] Error connecting to Keystone authentication: KEYSTONE_AUTHENTICATION_ERROR: There was a connection error while authenticating to Keystone: 500
time=2015-04-13T14:49:24.721Z | lvl=DEBUG | corr=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | trans=71a34c8b-10b3-40a3-be85-71bd3ce34c8a | op=/v1/updateContext | msg=response-time: 50745 statusCode: 500
result from the client console
{
"message": "There was a connection error while authenticating to Keystone: 500",
"name": "KEYSTONE_AUTHENTICATION_ERROR"
}
Am I doing something wrong here?