Impossible to fetch data from kafka to telegraf - apache-kafka

I have a personal project to gather data from an earthquake monitoring API and put it in real time on a kafka topic and then retrieve it on grafana.
For that I use telegraf and this plugin allowing to listen and to recover data of a topic, before sending them to an influxdb bucket and use it as a source for grafana
I did some tests and I know for sure that the data is received on the topic.
{
"Earthquake": {
"metadata": {
"Metadata": {
"generated": "1662533868000",
"url": "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2022-09-07T05:57:47",
"title": "USGS Earthquakes",
"api": "1.13.6",
"count": "7",
"status": "200"
}
},
"features": [
{
"Features": {
"type": "Feature",
"properties": {
"Properties": {
"mag": "1.7",
"place": "34 km NE of Paxson, Alaska",
"time": "1662532868655",
"updated": "1662532994982",
"tz": "0",
"url": "https://earthquake.usgs.gov/earthquakes/eventpage/ak022bhk6641",
"detail": "https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ak022bhk6641&format=geojson",
"felt": "0",
"cdi": "0.0",
"mmi": "0.0",
"alert": "null",
"status": "automatic",
"tsunami": "0",
"sig": "44",
"net": "ak",
"code": "022bhk6641",
"ids": ",ak022bhk6641,",
"sources": ",ak,",
"types": ",origin,phase-data,",
"nst": "0",
"dmin": "0.0",
"rms": "0.6",
"gap": "0.0",
"magType": "ml",
"type": "earthquake"
}
},
"geometry": {
"Geometry": {
"type": "Point",
"coordinates": [
-145.1808,
63.3242,
0.0
]
}
},
"id": "ak022bhk6641"
}
}
],
"bbox": [
-151.5653,
-54.375,
0.0,
-118.39484,
63.3242,
71.4
]
}
}
Here is my telegraf.conf that i use for my influxdb data source:
# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})
# Global tags can be specified here in key="value" format.
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## Maximum number of unwritten metrics per output. Increasing this value
## allows for longer periods of output downtime without dropping metrics at the
## cost of higher maximum memory usage.
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## Collected metrics are rounded to the precision specified. Precision is
## specified as an interval with an integer + unit (e.g. 0s, 10ms, 2us, 4s).
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
##
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s:
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
##
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
precision = "0s"
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do no set the "host" tag in the telegraf agent.
omit_hostname = false
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
# # Configuration for sending metrics to InfluxDB 2.0
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://127.0.0.1:8086"]
## Token for authentication.
token = $INFLUX_TOKEN
## Organization is the name of the organization you wish to write to.
organization = "earthWatch"
## Destination bucket to write into.
bucket = "telegraf"
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["general-events"]
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 0
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json_v2"
[[inputs.file.json_v2]]
measurement_name = "Earthquake"
[[inputs.file.json_v2.tag]]
path = "Earthquake.features.#.Features.properties.Properties.url"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.mag"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.place"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.time"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.updated"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.tz"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.url"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.detail"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.felt"
type = "bool"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.cdi"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.mmi"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.alert"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.status"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.tsunami"
type = "bool"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.sig"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.net"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.code"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.ids"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.sources"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.types"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.nst"
type = "int"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.dmin"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.rms"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.gap"
type = "float"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.magType"
type = "string"
[[inputs.file.json_v2.field]]
path = "Earthquake.features.#.Features.properties.Properties.type"
type = "string"
After multiple attempts and modification of the telegraf.conf file, I was able to retrieve some fields from the data, but some fields are missing and I don't know what configuration allowed me to do this.
the image above shows a request just after the topic has received the data (so <1 minutes ago), but still impossible to display, as if they were not retransmitted.
Any idea what might be missing?
Thank you

Related

Connecting Ditto to InfluxDB via Kafka

I am running kafka and influxDB on docker.
I have created a digital twin on ditto, that correctly updates when i send a message with mqtt.
I want the data to be sent from ditto to the influxDB but on influxDB once i create the bucket it shows no data whatsoever.
I have followed this guide:https://www.influxdata.com/blog/getting-started-apache-kafka-influxdb/
(i know this is for a python program but the steps should be the same, i just use the telegraf plugin for kafka consumer instead of the one used in the guide).
I have created the connection and the configuration file of telegraf but nothing happens on InfluxDB.
Here is the telegraf.conf
`
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
urls = ["http://localhost:8086"]
## API token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
organization = "digital"
## Destination bucket to write into.
bucket = "arduino"
## The value of this tag will be used to determine the bucket. If this
## tag is not set the 'bucket' option is used as the default.
# bucket_tag = ""
## If true, the bucket tag will not be added to the metric.
# exclude_bucket_tag = false
## Timeout for HTTP messages.
# timeout = "5s"
## Additional HTTP headers
# http_headers = {"X-Special-Header" = "Special-Value"}
## HTTP Proxy override, if unset values the standard proxy environment
## variables are consulted to determine which proxy, if any, should be used.
# http_proxy = "http://corporate.proxy:3128"
## HTTP User-Agent
# user_agent = "telegraf"
## Content-Encoding for write request body, can be set to "gzip" to
## compress body or "identity" to apply no encoding.
# content_encoding = "gzip"
## Enable or disable uint support for writing uints influxdb 2.0.
# influx_uint_support = false
## Optional TLS Config for use on HTTP connections.
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
# Read metrics from Kafka topics
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["arduino"]
## When set this tag will be added to all metrics with the topic as the value.
# topic_tag = ""
## Optional Client id
# client_id = "Telegraf"
## Set the minimal supported Kafka version. Setting this enables the use of new
## Kafka features and APIs. Must be 0.10.2.0 or greater.
## ex: version = "1.1.0"
# version = ""
## Optional TLS Config
# enable_tls = false
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## SASL authentication credentials. These settings should typically be used
## with TLS encryption enabled
# sasl_username = "kafka"
# sasl_password = "secret"
## Optional SASL:
## one of: OAUTHBEARER, PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, GSSAPI
## (defaults to PLAIN)
# sasl_mechanism = ""
## used if sasl_mechanism is GSSAPI (experimental)
# sasl_gssapi_service_name = ""
# ## One of: KRB5_USER_AUTH and KRB5_KEYTAB_AUTH
# sasl_gssapi_auth_type = "KRB5_USER_AUTH"
# sasl_gssapi_kerberos_config_path = "/"
# sasl_gssapi_realm = "realm"
# sasl_gssapi_key_tab_path = ""
# sasl_gssapi_disable_pafxfast = false
## used if sasl_mechanism is OAUTHBEARER (experimental)
# sasl_access_token = ""
## SASL protocol version. When connecting to Azure EventHub set to 0.
# sasl_version = 1
# Disable Kafka metadata full fetch
# metadata_full = false
## Name of the consumer group.
# consumer_group = "telegraf_metrics_consumers"
## Compression codec represents the various compression codecs recognized by
## Kafka in messages.
## 0 : None
## 1 : Gzip
## 2 : Snappy
## 3 : LZ4
## 4 : ZSTD
# compression_codec = 0
## Initial offset position; one of "oldest" or "newest".
# offset = "oldest"
## Consumer group partition assignment strategy; one of "range", "roundrobin" or "sticky".
# balance_strategy = "range"
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 1000000
## Maximum messages to read from the broker that have not been written by an
## output. For best throughput set based on the number of metrics within
## each message and the size of the output's metric_batch_size.
##
## For example, if each message from the queue contains 10 metrics and the
## output metric_batch_size is 1000, setting this to 100 will ensure that a
## full batch is collected and the write is triggered immediately without
## waiting until the next flush_interval.
# max_undelivered_messages = 1000
## Maximum amount of time the consumer should take to process messages. If
## the debug log prints messages from sarama about 'abandoning subscription
## to [topic] because consuming was taking too long', increase this value to
## longer than the time taken by the output plugin(s).
##
## Note that the effective timeout could be between 'max_processing_time' and
## '2 * max_processing_time'.
# max_processing_time = "100ms"
## The default number of message bytes to fetch from the broker in each
## request (default 1MB). This should be larger than the majority of
## your messages, or else the consumer will spend a lot of time
## negotiating sizes and not actually consuming. Similar to the JVM's
## `fetch.message.max.bytes`.
# consumer_fetch_default = "1MB"
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "json"
the kafka connection as it is on ditto explorer:
{
"id": "0ab4b527-617f-4f4f-8bac-4ffa4b5a8471",
"name": "Kafka 2.x",
"connectionType": "kafka",
"connectionStatus": "open",
"uri": "tcp://192.168.109.74:9092",
"sources": [
{
"addresses": [
"arduino"
],
"consumerCount": 1,
"qos": 1,
"authorizationContext": [
"nginx:ditto"
],
"enforcement": {
"input": "{{ header:device_id }}",
"filters": [
"{{ entity:id }}"
]
},
"acknowledgementRequests": {
"includes": []
},
"headerMapping": {},
"payloadMapping": [
"Ditto"
],
"replyTarget": {
"address": "theReplyTopic",
"headerMapping": {},
"expectedResponseTypes": [
"response",
"error",
"nack"
],
"enabled": true
}
}
],
"targets": [
{
"address": "topic/key",
"topics": [
"_/_/things/twin/events",
"_/_/things/live/messages"
],
"authorizationContext": [
"nginx:ditto"
],
"headerMapping": {}
}
],
"clientCount": 1,
"failoverEnabled": true,
"validateCertificates": true,
"processorPoolSize": 1,
"specificConfig": {
"saslMechanism": "plain",
"bootstrapServers": "localhost:9092"
},
"tags": []
}
the policy file for ditto:
{
"policyId": "my.test:policy1",
"entries": {
"owner": {
"subjects": {
"nginx:ditto": {
"type": "nginx basic auth user"
}
},
"resources": {
"thing:/": {
"grant": ["READ","WRITE"],
"revoke": []
},
"policy:/": {
"grant": ["READ","WRITE"],
"revoke": []
},
"message:/": {
"grant": ["READ","WRITE"],
"revoke": []
}
}
},
"observer": {
"subjects": {
"ditto:observer": {
"type": "observer user"
}
},
"resources": {
"thing:/features": {
"grant": ["READ"],
"revoke": []
},
"policy:/": {
"grant": ["READ"],
"revoke": []
},
"message:/": {
"grant": ["READ"],
"revoke": []
}
}
}
}
}
the configuration file of telegraf but nothing happens on InfluxDB
When Telegraf is reading data from Kafka is needs to transform that into time-series metrics that InfluxDB can digest. You have correctly selected the JSON parser, but there may be additional configuraiton required, or even the use of the more powerful json_v2 parser to be able to set the tags and fields based on the JSON data.
My suggestion is to use the [[outputs.file]] output to see if anything is even getting passed, probably nothing will show up. Then do the following:
determine what your JSON looks like in kafka
what you want that JSON to look like as time-series data in influxdb
use the json_v2 parser to set the apporporiate tags and fields.

Grafana 7.4.3 /var/lib/grafana not writeable in AWS ECS - EFS

I'm trying to host a Grafana 7.4 image in ECS Fargate using an EFS volume for persistent storage.
Using Terraform I have created the required resource and given the task access to the EFS volume via an "access point"
resource "aws_efs_access_point" "default" {
file_system_id = aws_efs_file_system.default.id
posix_user {
gid = 0
uid = 472
}
root_directory {
path = "/opt/data"
creation_info {
owner_gid = 0
owner_uid = 472
permissions = "600"
}
}
}
Note that I have set owner permissions as per the guides in https://grafana.com/docs/grafana/latest/installation/docker/#migrate-from-previous-docker-containers-versions (I've tried both group id 0 and 1 as the documentation seems to be inconsistent on the gid).
Using a base alpine image in place of the grafana image I've confirmed the directory /var/lib/grafana exists within container with the correct uid and gids set. However on attempting to run the grafana image I get the error message
GF_PATHS_DATA='/var/lib/grafana' is not writable.
I am launching the task with a terraformed task definition.
resource "aws_ecs_task_definition" "default" {
family = "${var.name}"
container_definitions = "${data.template_file.container_definition.rendered}"
memory = "${var.memory}"
cpu = "${var.cpu}"
requires_compatibilities = [
"FARGATE"
]
network_mode = "awsvpc"
execution_role_arn = "arn:aws:iam::REDACTED_ID:role/ecsTaskExecutionRole"
volume {
name = "${var.name}-volume"
efs_volume_configuration {
file_system_id = aws_efs_file_system.default.id
transit_encryption = "ENABLED"
root_directory = "/opt/data"
authorization_config {
access_point_id = aws_efs_access_point.default.id
}
}
}
tags = {
Product = "${var.name}"
}
}
With the container definition
[
{
"portMappings": [
{
"hostPort": 80,
"protocol": "tcp",
"containerPort": 80
}
],
"mountPoints": [
{
"sourceVolume": "${volume_name}",
"containerPath": "/var/lib/grafana",
"readOnly": false
}
],
"cpu": 0,
"secrets": [
...
],
"environment": [],
"image": "grafana/grafana:7.4.3",
"name": "${name}",
"user": "472:0"
}
]
For "user" I have tried "grafana", "742:0", "742" and "742:1" when trying gid 1.
I believe the terraform, security groups, mount_targets, etc... are all correct as I can get an alpine image to:
ls -lash /var/lib
> drw------- 2 472 root 6.0K Mar 12 11:22 grafana
I believe you have a problem, because AWS ECS https://github.com/aws/containers-roadmap/issues/938
Anyway, file system approach doesn't seems to be very cloud friendly (especially if you want to scale horizontally: problems with concurrent writes from multiple tasks, IOPs limitations, ...). Just provision proper DB (e.g. Aurora RDS Mysql, multi A-Z cluster if you need HA) and you will have nice opsless AWS deployment.

Why error, "alias target name does not lie within the target zone" in Terraform aws_route53_record?

With Terraform 0.12, I am creating a static web site in an S3 bucket:
...
resource "aws_s3_bucket" "www" {
bucket = "example.com"
acl = "public-read"
policy = <<-POLICY
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AddPerm",
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject"],
"Resource": ["arn:aws:s3:::example.com/*"]
}]
}
POLICY
website {
index_document = "index.html"
error_document = "404.html"
}
tags = {
Environment = var.environment
Terraform = "true"
}
}
resource "aws_route53_zone" "main" {
name = "example.com"
tags = {
Environment = var.environment
Terraform = "true"
}
}
resource "aws_route53_record" "main-ns" {
zone_id = aws_route53_zone.main.zone_id
name = "example.com"
type = "A"
alias {
name = aws_s3_bucket.www.website_endpoint
zone_id = aws_route53_zone.main.zone_id
evaluate_target_health = false
}
}
I get the error:
Error: [ERR]: Error building changeset: InvalidChangeBatch:
[Tried to create an alias that targets example.com.s3-website-us-west-2.amazonaws.com., type A in zone Z1P...9HY, but the alias target name does not lie within the target zone,
Tried to create an alias that targets example.com.s3-website-us-west-2.amazonaws.com., type A in zone Z1P...9HY, but that target was not found]
status code: 400, request id: 35...bc
on main.tf line 132, in resource "aws_route53_record" "main-ns":
132: resource "aws_route53_record" "main-ns" {
What is wrong?
The zone_id within alias is the S3 bucket zone ID, not the Route 53 zone ID. The correct aws_route53_record resource is:
resource "aws_route53_record" "main-ns" {
zone_id = aws_route53_zone.main.zone_id
name = "example.com"
type = "A"
alias {
name = aws_s3_bucket.www.website_endpoint
zone_id = aws_s3_bucket.www.hosted_zone_id # Corrected
evaluate_target_health = false
}
}
Here is an example for CloudFront. The variables are:
base_url = example.com
cloudfront_distribution = "EXXREDACTEDXXX"
domain_names = ["example.com", "www.example.com"]
The Terraform code is:
data "aws_route53_zone" "this" {
name = var.base_url
}
data "aws_cloudfront_distribution" "this" {
id = var.cloudfront_distribution
}
resource "aws_route53_record" "this" {
for_each = toset(var.domain_names)
zone_id = data.aws_route53_zone.this.zone_id
name = each.value
type = "A"
alias {
name = data.aws_cloudfront_distribution.this.domain_name
zone_id = data.aws_cloudfront_distribution.this.hosted_zone_id
evaluate_target_health = false
}
}
Many users specify CloudFront zone_id = "Z2FDTNDATAQYW2" because it's always Z2FDTNDATAQYW2...until some day maybe it isn't. I like to avoid the literal string by computing it using data source aws_cloudfront_distribution.
For anyone like me that came here from Google in hope to find the syntax for the CloudFormation and YML, Here is how you can achieve it for your sub-domains.
Here we add a DNS record into the Route53 and redirect all the subnets of example.com to this ALB:
AlbDnsRecord:
Type: "AWS::Route53::RecordSet"
DependsOn: [ALB_LOGICAL_ID]
Properties:
HostedZoneName: "example.com."
Type: "A"
Name: "*.example.com."
AliasTarget:
DNSName: !GetAtt [ALB_LOGICAL_ID].DNSName
EvaluateTargetHealth: False
HostedZoneId: !GetAtt [ALB_LOGICAL_ID].CanonicalHostedZoneID
Comment: "A record for Stages ALB"
My mistakes was:
not adding . at the end of my HostedZoneName
under AliasTarget.HostedZoneId ID is al uppercase in the end of CanonicalHostedZoneID
replace the [ALB_LOGICAL_ID] with the actual name of your ALB, for me it was like: ALBStages.DNSName
You should have the zone in your Route53.
So for us all the below addresses will come to this ALB:
dev01.example.com
dev01api.example.com
dev02.example.com
dev02api.example.com
qa01.example.com
qa01api.example.com
qa02.example.com
qa02api.example.com
uat.example.com
uatapi.example.com

Why Cygnus not connected to another virtual machine with MongoDB?

Good morning,
I have the following set of virtual machines:
VM A
Generic Enablers Orion and Cygnus
IP: 10.10.0.10
VM B
MongoDB
IP: 10.10.0.17
Cygnus configuration is:
/usr/cygnus/conf/cygnus_instance_mongodb.conf
#####
#
# Configuration file for apache-flume
#
#####
# Copyright 2014 Telefonica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-connectors (FI-WARE project).
#
# cosmos-injector is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General
# Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
# later version.
# cosmos-injector is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
# details.
#
# You should have received a copy of the GNU Affero General Public License along with fiware-connectors. If not, see
# http://www.gnu.org/licenses/.
#
# For those usages not covered by the GNU Affero General Public License please contact with iot_support at tid dot es
# Who to run cygnus as. Note that you may need to use root if you want
# to run cygnus in a privileged port (<1024)
CYGNUS_USER=cygnus
# Where is the config folder
CONFIG_FOLDER=/usr/cygnus/conf
# Which is the config file
CONFIG_FILE=/usr/cygnus/conf/agent_mongodb.conf
# Name of the agent. The name of the agent is not trivial, since it is the base for the Flume parameters
# naming conventions, e.g. it appears in .sources.http-source.channels=...
AGENT_NAME=cygnusagent
# Name of the logfile located at /var/log/cygnus. It is important to put the extension '.log' in order to the log rotation works properly
LOGFILE_NAME=cygnus.log
# Administration port. Must be unique per instance
ADMIN_PORT=8081
# Polling interval (seconds) for the configuration reloading
POLLING_INTERVAL=30
/usr/cygnus/conf/agent_mongodb.conf
#####
#
# Copyright 2014 Telefónica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-connectors (FI-WARE project).
#
# fiware-connectors is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General
# Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
# later version.
# fiware-connectors is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
# details.
#
# You should have received a copy of the GNU Affero General Public License along with fiware-connectors. If not, see
# http://www.gnu.org/licenses/.
#
# For those usages not covered by the GNU Affero General Public License please contact with iot_support at tid dot es
#=============================================
# To be put in APACHE_FLUME_HOME/conf/agent.conf
#
# General configuration template explaining how to setup a sink of each of the available types (HDFS, CKAN, MySQL).
#=============================================
# The next tree fields set the sources, sinks and channels used by Cygnus. You could use different names than the
# ones suggested below, but in that case make sure you keep coherence in properties names along the configuration file.
# Regarding sinks, you can use multiple types at the same time; the only requirement is to provide a channel for each
# one of them (this example shows how to configure 3 sink types at the same time). Even, you can define more than one
# sink of the same type and sharing the channel in order to improve the performance (this is like having
# multi-threading).
cygnusagent.sources = http-source
cygnusagent.sinks = mongo-sink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# Timestamp interceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# Destination extractor interceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Matching table for the destination extractor interceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# true if the grouping feature is enabled for this sink, false otherwise
cygnusagent.sinks.mongo-sink.enable_grouping = false
# the FQDN/IP address where the MySQL server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_host = 10.10.0.17:27017
# a valid user in the MongoDB server
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
cygnusagent.sinks.mongo-sink.db_prefix = hvds_
# prefix for the MongoDB collections
cygnusagent.sinks.mongo-sink.collection_prefix = hvds_
# true is collection names are based on a hash, false for human redable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# specify if the sink will use a single collection for each service path, for each entity or for each attribute
cygnusagent.sinks.mongo-sink.data_model = collection-per-entity
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.mongo-sink.attr_persistence = column
#=============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
When performing the following steps:
I subscribe sensor and data persistently want to save :
(curl http://10.10.0.10:1026/NGSI10/subscribeContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d #- | python -mjson.tool) <<EOF
{
"entities": [
{
"type": "Sensor",
"isPattern": "false",
"id": "sensor003"
}
],
"attributes": [
"potencia_max",
"potencia_min",
"coste",
"co2"
],
"reference": "http://localhost:5050/notify",
"duration": "P1M",
"notifyConditions": [
{
"type": "ONTIMEINTERVAL",
"condValues": [
"PT5S"
]
}
]
}
EOF
Then I make the creation or modification of such data:
(curl http://10.10.0.10:1026/NGSI10/updateContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d #- | python -mjson.tool) <<EOF
{
"contextElements": [
{
"type": "Sensor",
"isPattern": "false",
"id": "sensor003",
"attributes": [
{
"name":"potencia_max",
"type":"float",
"value":"1000"
},
{
"name":"potencia_min",
"type":"float",
"value":"200"
},
{
"name":"coste",
"type":"float",
"value":"0.24"
},
{
"name":"co2",
"type":"float",
"value":"12"
}
]
}
],
"updateAction": "APPEND"
}
EOF
The expected result is obtained, but when accessing the database of VM B, to see if they have created and saved the data, we see that it has not happened:
admin (empty)
local 0.078GB
localhost (empty)
If we go to the database of VM A we can see who has created the database:
admin (empty)
hvds_def_serv 0.078GB
hvds_qsg 0.078GB
local 0.078GB
orion 0.078GB
Would I could indicate how it can solve?
Thank you in advance for your help
EDIT 1
I subscribe the sensor005
(curl http://10.10.0.10:1026/NGSI10/subscribeContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d #- | python -mjson.tool) <<EOF
{
"entities": [{
"type": "Sensor",
"isPattern": "false",
"id": "sensor005"
}],
"attributes": [
"muestreo"
],
"reference": "http://localhost:5050/notify",
"duration": "P1M",
"notifyConditions": [{
"type": "ONCHANGE",
"condValues": [
"muestreo"
]
}],
"throttling": "PT1S"
}
EOF
Then I edit data:
(curl http://10.10.0.10:1026/NGSI10/updateContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d #- | python -mjson.tool) <<EOF
{
"contextElements": [
{
"type": "Sensor",
"isPattern": "false",
"id": "sensor005",
"attributes": [
{
"name":"magnitud",
"type":"string",
"value":"energia"
},
{
"name":"unidad",
"type":"string",
"value":"Kw"
},
{
"name":"tipo",
"type":"string",
"value":"electrico"
},
{
"name":"valido",
"type":"boolean",
"value":"true"
},
{
"name":"muestreo",
"type":"hora/kw",
"value": {
"tiempo": [
"10:00:31",
"10:00:32",
"10:00:33",
"10:00:34",
"10:00:35",
"10:00:36",
"10:00:37",
"10:00:38",
"10:00:39",
"10:00:40",
"10:00:41",
"10:00:42",
"10:00:43",
"10:00:44",
"10:00:45",
"10:00:46",
"10:00:47",
"10:00:48",
"10:00:49",
"10:00:50",
"10:00:51",
"10:00:52",
"10:00:53",
"10:00:54",
"10:00:55",
"10:00:56",
"10:00:57",
"10:00:58",
"10:00:59",
"10:01:60"
],
"kw": [
"200",
"201",
"200",
"200",
"195",
"192",
"190",
"189",
"195",
"200",
"205",
"210",
"207",
"205",
"209",
"212",
"215",
"220",
"225",
"230",
"250",
"255",
"245",
"242",
"243",
"240",
"220",
"210",
"200",
"200"
]
}
}
]
}
],
"updateAction": "APPEND"
}
EOF
These are the two logs generated:
/var/log/contextBroker/contextBroker.log
/var/log/cygnus/cygnus.log
EDIT 2
/var/log/cygnus/cygnus.log with DEBUG

Fiware cygnus: no data have been persisted in mongo DB

I am trying to use cygnus with Mongo DB, but no data have been persisted in the data base.
Here is the notification got in cygnus:
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Starting transaction (1437482681-118-0000000000)
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Received data ({ "subscriptionId" : "55a73819d0c457bb20b1d467", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "enocean", "isPattern" : "false", "id" : "enocean:myButtonA", "attributes" : [ { "name" : "ButtonValue", "type" : "", "value" : "ON", "metadatas" : [ { "name" : "TimeInstant", "type" : "ISO8601", "value" : "2015-07-20T21:29:56.509293Z" } ] } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Event put in the channel (id=1454120446, ttl=10)
Here is my agent configuration:
cygnusagent.sources = http-source
cygnusagent.sinks = OrionMongoSink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# TimestampInterceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# GroupinInterceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Grouping rules for the GroupingInterceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /home/egm_demo/usr/fiware-cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# FQDN/IP:port where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_hosts = 127.0.0.1:27017
# a valid user in the MongoDB server (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
#cygnusagent.sinks.mongo-sink.db_prefix = kura
# prefix pro the MongoDB collections
#cygnusagent.sinks.mongo-sink.collection_prefix = button
# true is collection names are based on a hash, false for human redable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# ============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
Here is my rule :
{
"grouping_rules": [
{
"id": 1,
"fields": [
"button"
],
"regex": ".*",
"destination": "kura",
"fiware_service_path": "/kuraspath"
}
]
}
Any ideas of what I have missed? Thanks in advance for your help!
This configuration parameter is wrong:
cygnusagent.sinks = OrionMongoSink
According to your configuration, it must be mongo-sink (I mean, you are configuring a Mongo sink named mongo-sink when you configure lines such as cygnusagent.sinks.mongo-sink.type).
In addition, I would recommend you to not using the grouping rules feature; it is an advanced feature about sending the data to a collection different than the default one, and in a first stage I would play with the default behaviour. Thus, my recommendation is to leave the path to the file in cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file, but comment all the JSON within it :)