Logstash not reading from file input - docker-compose

I'm running this implementation of the ELK stack, which is pretty straightforward and easy to configure.
I can push TCP input through the stack using netcat like so:
nc localhost 5000 < /Users/me/path/to/logs/appOne.log
nc localhost 5000 < /Users/me/path/to/logs/appOneStackTrace.log
nc localhost 5000 < /Users/me/path/to/logs/appTwo.log
nc localhost 5000 < /Users/me/path/to/logs/appTwoStackTrace.log
But I cannot get Logstash to read from the file paths I specify in the config:
input {
  tcp {
    port => 5000
  }
  file {
    path => [
      "/Users/me/path/to/logs/appOne.log",
      "/Users/me/path/to/logs/appOneStackTrace.log",
      "/Users/me/path/to/logs/appTwo.log",
      "/Users/me/path/to/logs/appTwoStackTrace.log"
    ]
    type => "log"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
Here is the startup output from the stack regarding logstash input:
logstash_1 | [2019-01-28T17:44:33,206][INFO ][logstash.inputs.tcp ] Starting tcp input listener {:address=>"0.0.0.0:5000", :ssl_enable=>"false"}
logstash_1 | [2019-01-28T17:44:34,037][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_a1605b28f1bc77daf785a8805c32f578", :path=>["/Users/me/path/to/logs/appOne.log", "/Users/me/path/to/logs/appOneStackTrace.log", "/Users/me/path/to/logs/appTwo.log", "/Users/me/path/to/logs/appTwoStackTrace.log"]}
There is no indication the pipeline has any issues starting.
I've also checked that the log files have been updated since the TCP input was displayed, and they have. The last Logstash-specific log line from the ELK stack comes from either startup or the TCP input.
Here is my entire Logstash start-up logging in case that's helpful:
logstash_1 | Sending Logstash logs to /usr/share/logstash/logs which is now configured via log4j2.properties
logstash_1 | [2019-01-29T13:32:19,391][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
logstash_1 | [2019-01-29T13:32:19,415][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.5.4"}
logstash_1 | [2019-01-29T13:32:23,989][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
logstash_1 | [2019-01-29T13:32:24,648][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elasticsearch:9200/]}}
logstash_1 | [2019-01-29T13:32:24,908][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://elasticsearch:9200/"}
logstash_1 | [2019-01-29T13:32:25,046][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
logstash_1 | [2019-01-29T13:32:25,051][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
logstash_1 | [2019-01-29T13:32:25,108][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//elasticsearch:9200"]}
logstash_1 | [2019-01-29T13:32:25,229][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
logstash_1 | [2019-01-29T13:32:25,276][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"#timestamp"=>{"type"=>"date"}, "#version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
logstash_1 | [2019-01-29T13:32:25,327][INFO ][logstash.inputs.tcp ] Starting tcp input listener {:address=>"0.0.0.0:5000", :ssl_enable=>"false"}
logstash_1 | [2019-01-29T13:32:25,924][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_143c07d174c46eeab78b902edb3b1289", :path=>["/Users/me/path/to/logs/appOne.log", "/Users/me/path/to/logs/appOneStackTrace.log", "/Users/me/path/to/logs/appTwo.log", "/Users/me/path/to/logs/appTwoStackTrace.log"]}
logstash_1 | [2019-01-29T13:32:25,976][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x4d1515ce run>"}
logstash_1 | [2019-01-29T13:32:26,088][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
logstash_1 | [2019-01-29T13:32:26,106][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
logstash_1 | [2019-01-29T13:32:26,432][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

I found the issue: I needed to map the log files from the host into the container (Docker noob). The local paths I was specifying in the Logstash config were fine for TCP, but were unavailable inside the container without volume mapping.
First, I created the container's internal log directories in the Dockerfile for Logstash:
RUN mkdir /usr/share/appOneLogs
RUN mkdir /usr/share/appTwoLogs
Then I volume-mapped my host's log directories into them in the docker-elk/docker-compose.yml file where Logstash is configured:
logstash:
  build:
    context: logstash/
    args:
      ELK_VERSION: $ELK_VERSION
  volumes:
    - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
    - ./logstash/pipeline:/usr/share/logstash/pipeline:ro
    - /Users/me/path/to/appOne/logs:/usr/share/appOneLogs # this bit
    - /Users/me/path/to/appTwo/logs:/usr/share/appTwoLogs # and this bit
  ports:
    - "5000:5000"
  ...
Finally, I replaced the paths in logstash/pipelines/logstash.config with the directories created in the Dockerfile:
file {
  path => [
    "/usr/share/appOneLogs",
    "/usr/share/appTwoLogs",
  ]
}
Also of note, I removed start_position => "beginning" from the file input definition, as it overrides the default behavior of treating files like live streams and starting at the end.
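For reference, here is a minimal sketch of how the adjusted pipeline could look once the volumes are mapped. The *.log glob patterns are an assumption for matching individual files; the original snippet above lists the directories themselves.

input {
  tcp {
    port => 5000
  }
  file {
    # assumption: match the .log files inside the mapped container directories
    path => [
      "/usr/share/appOneLogs/*.log",
      "/usr/share/appTwoLogs/*.log"
    ]
    type => "log"
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}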


containerd multiline logs parsing with fluentbit

After shifting from Docker to containerd as the container runtime used by our Kubernetes cluster, we are not able to show multiline logs in a proper way in our visualization app (Grafana), because containerd prepends some details to the container/pod logs (timestamp, stream, and log severity; specifically, it prepends something like the following, as shown in the sample below: 2022-07-25T06:43:17.20958947Z stdout F ), which causes confusion for the developers and the application owners.
I am showing here a dummy sample of the logs generated by the application and how they get printed on the Kubernetes nodes after containerd prepends the mentioned details.
The following logs are generated by the application (kubectl logs):
2022-07-25T06:43:17,309ESC[0;39m dummy-[txtThreadPool-2] ESC[39mDEBUGESC[0;39m
  ESC[36mcom.pkg.sample.ComponentESC[0;39m - Process message meta {
  timestamp: 1658731397308720468
  version {
      major: 1
      minor: 0
      patch: 0
  }
}
When I check the logs in the filesystem (/var/log/container/ABCXYZ.log):
2022-07-25T06:43:17.20958947Z stdout F 2022-07-25T06:43:17,309ESC[0;39m dummy-[txtThreadPool-2]
ESC[39mDEBUGESC[0;39m
ESC[36mcom.pkg.sample.ComponentESC[0;39m - Process message meta {
2022-07-25T06:43:17.20958947Z stdout F timestamp: 1658731449723010774
2022-07-25T06:43:17.209593379Z stdout F version {
2022-07-25T06:43:17.209595933Z stdout F major: 14
2022-07-25T06:43:17.209598466Z stdout F minor: 0
2022-07-25T06:43:17.209600712Z stdout F patch: 0
2022-07-25T06:43:17.209602926Z stdout F }
2022-07-25T06:43:17.209605099Z stdout F }
I am able to parse the multiline logs with fluentbit, but the problem is that I am not able to remove the details injected by containerd (>> 2022-07-25T06:43:17.209605099Z stdout F .......). So is there any way to configure containerd not to prepend these details to the logs, and to print them exactly as they are generated by the application/container?
On the other hand, is there any plugin to remove such details on the fluentbit side? As per the existing plugins, none of them can manipulate or change the logs (which is logical, as the log agent should not change the logs).
Thanks in advance.
This is the workaround I followed to show the multiline log lines in Grafana, by applying extra fluentbit filters and a multiline parser.
1- First, I receive the stream with a tail input, which parses it with a multiline parser (multilineKubeParser).
2- Then another filter intercepts the stream to do further processing with a regex parser (kubeParser).
3- After that, another filter removes the details added by containerd using a Lua script (remove_dummy).
fluent-bit.conf: |-
  [SERVICE]
      HTTP_Server    On
      HTTP_Listen    0.0.0.0
      HTTP_PORT      2020
      Flush          1
      Daemon         Off
      Log_Level      warn
      Parsers_File   parsers.conf

  [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/*.log
      multiline.Parser  multilineKubeParser
      Exclude_Path      /var/log/containers/*_ABC-logging_*.log
      DB                /run/fluent-bit/flb_kube.db
      Mem_Buf_Limit     5MB

  [FILTER]
      Name                 kubernetes
      Match                kube.*
      Kube_URL             https://kubernetes.default.svc:443
      Merge_Log            On
      Merge_Parser         kubeParser
      K8S-Logging.Parser   Off
      K8S-Logging.Exclude  On

  [FILTER]
      Name    lua
      Match   kube.*
      call    remove_dummy
      Script  filters.lua

  [Output]
      Name                  grafana-loki
      Match                 kube.*
      Url                   http://loki:3100/api/prom/push
      TenantID              ""
      BatchWait             1
      BatchSize             1048576
      Labels                {job="fluent-bit"}
      RemoveKeys            kubernetes
      AutoKubernetesLabels  false
      LabelMapPath          /fluent-bit/etc/labelmap.json
      LineFormat            json
      LogLevel              warn

labelmap.json: |-
  {
    "kubernetes": {
      "container_name": "container",
      "host": "node",
      "labels": {
        "app": "app",
        "release": "release"
      },
      "namespace_name": "namespace",
      "pod_name": "instance"
    },
    "stream": "stream"
  }

parsers.conf: |-
  [PARSER]
      Name         kubeParser
      Format       regex
      Regex        /^([^ ]*).* (?<timeStamp>[^a].*) ([^ ].*)\[(?<requestId>[^\]]*)\] (?<severity>[^ ]*) (?<message>[^ ].*)$/
      Time_Key     time
      Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
      Time_Keep    On
      Time_Offset  +0200

  [MULTILINE_PARSER]
      name           multilineKubeParser
      type           regex
      flush_timeout  1000
      rule  "start_state"  "/[^ ]* stdout .\s+\W*\w+\d\d\d\d-\d\d-\d\d \d\d\:\d\d\:\d\d,\d\d\d.*$/"  "cont"
      rule  "cont"         "/[^ ]* stdout .\s+(?!\W+\w+\d\d\d\d-\d\d-\d\d \d\d\:\d\d\:\d\d,\d\d\d).*$/"  "cont"
filters.lua: |-
  -- Strip the CRI prefix ("<timestamp> stdout F ") that containerd prepends to each line,
  -- e.g. "2022-07-25T06:43:17.209593379Z stdout F version {" becomes "version {".
  function remove_dummy(tag, timestamp, record)
    new_log = string.gsub(record["log"], "%d+-%d+-%d+T%d+:%d+:%d+.%d+Z%sstdout%sF%s", "")
    new_record = record
    new_record["log"] = new_log
    return 2, timestamp, new_record
  end
As I mentioned, this is a workaround until I find another/better solution.
From the configuration options for containerd, it appears there's no way to configure logging in any way. You can see the config doc here.
Also, I checked the logging code inside containerd, and it appears this is prepended to the logs as they are redirected from the stdout of the container. You can see that the testcase here checks for appropriate fields by "splitting" the log line received. It checks for a tag and a stream entry prepended to the content of the log. I suppose that's the way logs are processed in containerd.
The best thing to do would be to open an issue in the project with your design requirement and perhaps the team can develop configurable stdout redirection for you.
This might help. They use a custom regex to capture the message log and the rest of the info in the logs, and then use the lift operation in the nest filter to flatten the JSON.
https://github.com/microsoft/fluentbit-containerd-cri-o-json-log
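For illustration only, a lift with the nest filter might look roughly like this (a sketch; the Nested_under key and the prefix are assumptions, see the linked repository for the exact configuration used there):

[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  kubernetes
    Add_prefix    kubernetes_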

explanation of Service.get from pulumi

I am using a Pulumi Helm Release to deploy a Helm chart that includes many services, and I am trying to get one of the deployed services. https://www.pulumi.com/blog/full-access-to-helm-features-through-new-helm-release-resource-for-kubernetes/#how-do-i-use-it shows we can use Service.get to achieve this goal, but I failed to find any information about the parameters of the method. Could someone explain it a bit or point me to the correct documentation on Service.get?
Thanks
I think there's a bug in that post; it should be -master, not -redis-master:
...
srv = Service.get("redis-master-svc", Output.concat(status.namespace, "/", status.name, "-master"))
As for what's going on here, I'll try to explain, as you're right that this doesn't seem to be documented in a way that's easy to find, as it isn't part of the Kubernetes provider API, but rather part of the core Pulumi resource API.
If you change up the example to use -master instead, you should be able to run the Pulumi program as otherwise quoted in that blog post. Here's the complete, modified program I'm using for reference:
import pulumi
from pulumi import Output
from pulumi_random.random_password import RandomPassword
from pulumi_kubernetes.core.v1 import Namespace, Service
from pulumi_kubernetes.helm.v3 import Release, ReleaseArgs, RepositoryOptsArgs

namespace = Namespace("redis-ns")
redis_password = RandomPassword("pass", length=10)

release_args = ReleaseArgs(
    chart="redis",
    repository_opts=RepositoryOptsArgs(
        repo="https://charts.bitnami.com/bitnami"
    ),
    version="13.0.0",
    namespace=namespace.metadata["name"],
    # Values from Chart's parameters specified hierarchically,
    # see https://artifacthub.io/packages/helm/bitnami/redis/13.0.0#parameters
    # for reference.
    values={
        "cluster": {
            "enabled": True,
            "slaveCount": 3,
        },
        "metrics": {
            "enabled": True,
            "service": {
                "annotations": {
                    "prometheus.io/port": "9127",
                }
            },
        },
        "global": {
            "redis": {
                "password": redis_password.result,
            }
        },
        "rbac": {
            "create": True,
        },
    },
    # By default Release resource will wait till all created resources
    # are available. Set this to true to skip waiting on resources being
    # available.
    skip_await=False)

release = Release("redis-helm", args=release_args)

# We can lookup resources once the release is installed. The release's
# status field is set once the installation completes, so this, combined
# with `skip_await=False` above, will wait to retrieve the Redis master
# ClusterIP till all resources in the Chart are available.
status = release.status

pulumi.export("namespace", status.namespace)

srv = Service.get("redis-master-svc", Output.concat(status.namespace, "/", status.name, "-master"))
pulumi.export("redisMasterClusterIP", srv.spec.cluster_ip)
When you deploy this program with pulumi up (e.g., locally with Minikube), you'll have a handful of running services:
$ pulumi up --yes
...
Updating (dev)
...
Type Name Status
+ pulumi:pulumi:Stack so-71802926-dev created
+ ├─ kubernetes:core/v1:Namespace redis-ns created
+ ├─ random:index:RandomPassword pass created
+ ├─ kubernetes:helm.sh/v3:Release redis-helm created
└─ kubernetes:core/v1:Service redis-master-svc
Outputs:
namespace : "redis-ns-0f9e4b1e"
redisMasterClusterIP: "10.103.98.199"
Resources:
+ 4 created
Duration: 1m13s
$ minikube service list
|-------------------|------------------------------|--------------|-----|
| NAMESPACE | NAME | TARGET PORT | URL |
|-------------------|------------------------------|--------------|-----|
| default | kubernetes | No node port |
| kube-system | kube-dns | No node port |
| redis-ns-0f9e4b1e | redis-helm-b5f3ea12-headless | No node port |
| redis-ns-0f9e4b1e | redis-helm-b5f3ea12-master | No node port |
| redis-ns-0f9e4b1e | redis-helm-b5f3ea12-metrics | No node port |
| redis-ns-0f9e4b1e | redis-helm-b5f3ea12-slave | No node port |
|-------------------|------------------------------|--------------|-----|
Getter functions like Service.get are explained here, in the Resources docs: https://www.pulumi.com/docs/intro/concepts/resources/get/
Service.get takes two arguments. The first is the logical name you want to use to refer to the fetched resource in your stack; it can generally be any string, as long as it's unique among other resources in the stack. The second is the "physical" (i.e., provider-native) ID by which to look it up. It looks like the Kubernetes provider wants that ID to be of the form {namespace}/{name}, which is why you need to use Output.concat to assemble a string composed of the eventual values of status.namespace and status.name (as these values aren't known until the update completes). You can learn more about Outputs and Output.concat in the Resources docs as well: https://www.pulumi.com/docs/intro/concepts/inputs-outputs/
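As an aside, the same two-argument pattern can be reused for the other Services in the chart. For example, here is a sketch that fetches the metrics Service shown in the minikube listing above (the -metrics suffix is taken from that listing and may differ between chart versions):

metrics_svc = Service.get(
    "redis-metrics-svc",  # logical name, unique within the stack
    Output.concat(status.namespace, "/", status.name, "-metrics"),  # "<namespace>/<name>" physical ID
)
pulumi.export("redisMetricsClusterIP", metrics_svc.spec.cluster_ip)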
Hope that helps! Let me know if you have any other questions. I've also submitted a PR to get that blog post fixed up.

Liquibase via Docker - Changelog is not written to disk

I want to set up Liquibase (using Docker) for a PostgreSQL database running locally (not in a container). I followed multiple tutorials, including the one on Docker Hub.
As suggested, I've created a liquibase.docker.properties file in my <PATH TO CHANGELOG DIR>:
classpath: /liquibase/changelog
url: jdbc:postgresql://localhost:5432/mydb?currentSchema=public
changeLogFile: changelog.xml
username: myuser
password: mypass
to be able to run docker run --rm --net="host" -v <PATH TO CHANGELOG DIR>:/liquibase/changelog liquibase/liquibase --defaultsFile=/liquibase/changelog/liquibase.docker.properties <COMMAND>.
When I run [...] generateChangeLog I get the following output (with option --logLevel info):
[2021-04-27 06:08:20] INFO [liquibase.integration] No Liquibase Pro license key supplied. Please set liquibaseProLicenseKey on command line or in liquibase.properties to use Liquibase Pro features.
Liquibase Community 4.3.3 by Datical
####################################################
## _ _ _ _ ##
## | | (_) (_) | ##
## | | _ __ _ _ _ _| |__ __ _ ___ ___ ##
## | | | |/ _` | | | | | '_ \ / _` / __|/ _ \ ##
## | |___| | (_| | |_| | | |_) | (_| \__ \ __/ ##
## \_____/_|\__, |\__,_|_|_.__/ \__,_|___/\___| ##
## | | ##
## |_| ##
## ##
## Get documentation at docs.liquibase.com ##
## Get certified courses at learn.liquibase.com ##
## Free schema change activity reports at ##
## https://hub.liquibase.com ##
## ##
####################################################
Starting Liquibase at 06:08:20 (version 4.3.3 #52 built at 2021-04-12 17:08+0000)
BEST PRACTICE: The changelog generated by diffChangeLog/generateChangeLog should be inspected for correctness and completeness before being deployed.
[2021-04-27 06:08:22] INFO [liquibase.diff] changeSets count: 1
[2021-04-27 06:08:22] INFO [liquibase.diff] changelog.xml does not exist, creating and adding 1 changesets.
Liquibase command 'generateChangeLog' was executed successfully.
It looks like the command ran "successfully" but I could not find the file changelog.xml in my local directory which I mounted, i.e. <PATH TO CHANGELOG DIR>. The mounting however has to be working since it connects to the database successfully, i.e. the container is able to access and read liquibase.docker.properties.
First I thought I might have to tell Docker that it is allowed to write to my disk, but it seems that this should already be supported [from the description on Docker Hub]:
The /liquibase/changelog volume can also be used for commands that write output, such as generateChangeLog
What am I missing? Thanks in advance for any help!
Additional information
Output of docker inspect:
"Mounts": [
{
"Type": "bind",
"Source": "<PATH TO CHANGELOG DIR>",
"Destination": "/liquibase/changelog",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
...
],
When you run generateChangeLog, the path to the file should be specified as /liquibase/changelog/changelog.xml even though for update it needs to be changelog.xml
Example:
docker run --rm --net="host" -v <PATH TO CHANGELOG DIR>:/liquibase/changelog liquibase/liquibase --defaultsFile=/liquibase/changelog/liquibase.docker.properties --changeLogFile=/liquibase/changelog/changelog.xml generateChangeLog
For generateChangeLog, the changeLogFile argument is the specific path to the file to output vs. a path relative to the classpath setting that update and other commands use.
When you include the command line argument as well as a defaultsFile like above, the command line argument wins. That lets you leverage the same default settings while replacing specific settings when specific commands need more/different ones.
Details
There is a distinction between operations that are creating files and ones that are reading existing files.
With Liquibase, you almost always want to use paths to files that are relative to directories in the classpath, like the examples have. The specified changeLogFile gets stored in the tracking system, so if you ever run the same changelog but reference it in a different way (because you moved the root directory or are running from a different machine), then Liquibase will see it as a new file and attempt to re-run already-run changesets.
That is why the documentation has classpath: /liquibase/changelog and changeLogFile: com/example/changelog.xml. The update operation looks in the /liquibase/changelog dir to find a file called com/example/changelog.xml and finds it and stores the path as com/example/changelog.xml.
GenerateChangeLog is one of those "not always relative to classpath" cases because it needs to know where to store the file. If you just specify the output changeLogFile as changelog.xml, it creates that file relative to your process's working directory, which is not what you are needing/expecting.
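To make the contrast concrete, here is a sketch using the same setup as the question (paths elided the same way):

# generateChangeLog: pass the full container path so the file lands in the mounted volume
docker run --rm --net="host" -v <PATH TO CHANGELOG DIR>:/liquibase/changelog liquibase/liquibase --defaultsFile=/liquibase/changelog/liquibase.docker.properties --changeLogFile=/liquibase/changelog/changelog.xml generateChangeLog

# update: the classpath-relative changeLogFile from the properties file (changelog.xml) is enough
docker run --rm --net="host" -v <PATH TO CHANGELOG DIR>:/liquibase/changelog liquibase/liquibase --defaultsFile=/liquibase/changelog/liquibase.docker.properties update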
TL;DR
Prefix the changelog filename with /liquibase/changelog/ and pass it as a command line argument:
[...] --changeLogFile /liquibase/changelog/changelog.xml generateChangeLog
See Nathan's answer for details.
Explanation
I launched the container with -it and overrode the entrypoint to get an interactive shell within the container (see this post):
docker run --net="host" -v <PATH TO CHANGELOG DIR>:/liquibase/changelog -it --entrypoint /bin/bash liquibase/liquibase -s
Executing ls yields the following:
liquibase#ubuntu-rafael:/liquibase$ ls
ABOUT.txt UNINSTALL.txt docker-entrypoint.sh liquibase
GETTING_STARTED.txt changelog examples liquibase.bat
LICENSE.txt changelog.txt lib liquibase.docker.properties
README.txt classpath licenses liquibase.jar
Notable here is docker-entrypoint.sh which actually executes the liquibase command, and the folder changelog which is mounted to my local <PATH TO CHANGELOG DIR> (my .properties file is in there).
Then I ran the same command as before, but this time inside the container:
sh docker-entrypoint.sh --defaultsFile=/liquibase/changelog/liquibase.docker.properties --logLevel info generateChangeLog
I got the same output as above, but guess what shows up when running ls again:
ABOUT.txt changelog examples liquibase.docker.properties
GETTING_STARTED.txt changelog.txt lib liquibase.jar
LICENSE.txt changelog.xml ...
The changelog actually exists! But it is created in the wrong directory...
If you prefix the changelog filename with /liquibase/changelog/, the container is able to write it to your local (mounted) disk.
P.S. This means that the description of the "Complete Example" using "a properties file" from here is not working. I will open an Issue for that.
UPDATE
Specifying the absolute path is only necessary for commands that write a new file, e.g. generateChangeLog (see Nathan's answer). But it is better practice to pass the absolute path via the command line so that you can keep the settings in the defaults file.

Telegraf & InfluxDB: how to convert PROCSTAT's pid from field to tag?

Summary: I am using telegraf to get procstat into InfluxDB. I want to convert the pid from an integer field to a TAG so that I can do group by on it in Influx.
Details:
After a lot of searching I found the following on some site, but it seems to be doing the opposite (converting a tag into a field). I am not sure how to deduce the reverse conversion syntax from it:
[processors]
  [[processors.converter]]
    namepass = [ "procstat",]
    [processors.converter.tags]
      string = [ "cmdline",]
I'm using Influx 1.7.9
The correct processor configuration to convert pid to a tag is as below:
[processors]
  [[processors.converter]]
    namepass = [ "procstat"]
    [processors.converter.fields]
      tag = [ "pid"]
Please refer to the documentation of the converter processor plugin:
https://github.com/influxdata/telegraf/tree/master/plugins/processors/converter
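Once pid is a tag, the group by from the question works; for example, something along these lines in InfluxQL (the cpu_usage field name is an assumption based on typical procstat fields):

SELECT mean("cpu_usage") FROM "procstat" WHERE time > now() - 1h GROUP BY "pid"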
In the latest version of telegraf, the pid can be stored as a tag by specifying it in the input plugin configuration; a converter processor is not needed here.
Set pid_tag = true in the configuration. However, be aware of the performance impact of having pid as a tag when processes are short-lived.
P.S.: You should try to upgrade your telegraf version to 1.14.5. There is a performance improvement fix for the procstat plugin in this version.
Plugin configuration reference: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/procstat
Sample config.
# Monitor process cpu and memory usage
[[inputs.procstat]]
## PID file to monitor process
pid_file = "/var/run/nginx.pid"
## executable name (ie, pgrep <exe>)
# exe = "nginx"
## pattern as argument for pgrep (ie, pgrep -f <pattern>)
# pattern = "nginx"
## user as argument for pgrep (ie, pgrep -u <user>)
# user = "nginx"
## Systemd unit name
# systemd_unit = "nginx.service"
## CGroup name or path
# cgroup = "systemd/system.slice/nginx.service"
## Windows service name
# win_service = ""
## override for process_name
## This is optional; default is sourced from /proc/<pid>/status
# process_name = "bar"
## Field name prefix
# prefix = ""
## When true add the full cmdline as a tag.
# cmdline_tag = false
## Add the PID as a tag instead of as a field. When collecting multiple
## processes with otherwise matching tags this setting should be enabled to
## ensure each process has a unique identity.
##
## Enabling this option may result in a large number of series, especially
## when processes have a short lifetime.
# pid_tag = false
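In other words, a minimal active configuration could be as small as the following (the nginx pattern is just an assumed example):

[[inputs.procstat]]
  ## match processes by pgrep pattern (assumed example)
  pattern = "nginx"
  ## store the PID as a tag instead of a field
  pid_tag = true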

How to configure logstash 2.3.3 websocket

I am trying to get logstash 2.3.3 websocket input working.
Logstash: https://download.elastic.co/logstash/logstash/logstash-2.3.3.tar.gz
Websocket Input Plugin for Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-websocket.html
Websocket server: https://github.com/joewalnes/websocketd/releases/download/v0.2.11/websocketd-0.2.11-linux_amd64.zip
Websocket Client: Chrome Plugin "Simple Web Socket Client"
I am aware of a bug filed last year against logstash 1.5.0 and the websocket input plugin. https://github.com/logstash-plugins/logstash-input-websocket/issues/3 I have also received those same error messages, although I can't reproduce them anymore. The following is my current procedure and result. I am hoping that bug has since been fixed and I just can't find the correct config.
First I installed the plugin and confirmed it is listed as installed.
/app/bin/logstash-plugin list | grep "websocket"
Next, I checked that logstash was working with the following config
input {
  stdin { }
}
output {
  file {
    path => "/app/logstash-2.3.3/logstash-log.txt"
  }
}
Logstash worked.
/app/logstash-2.3.3/bin/logstash agent --config /app/logstash-2.3.3/logstash.conf
Hello World
The file logstash-log.txt contained:
{"message":"Hello World","#version":"1","#timestamp":"2016-07-05T20:04:14.850Z","host":"server-name.domain.com"}
Next I opened port 9300
I wrote a simple bash script to return some numbers
#!/bin/bash
case $1 in
  -t|--to)
    COUNTTO=$2
    shift
    ;;
esac
shift
printf 'Count to %i\n' $COUNTTO
for COUNT in $(seq 1 $COUNTTO); do
  echo $COUNT
  sleep 0.1
done
I started up websocketd pointing to my bash script
/app/websocketd --port=9300 /app/count.sh --to 7
I opened Simple Web Socket Client in Chrome and connected
ws://server-name.domain.com:9300
Success! It returned the following.
Count to 7
1
2
3
4
5
6
7
At this point I know websocketd works and logstash works. Now is when the trouble starts.
Logstash websocket input configuration file
input {
  websocket {
    codec => "plain"
    url => "ws://127.0.0.1:9300/"
  }
}
output {
  file {
    path => "/app/logstash-2.3.3/logstash-log.txt"
  }
}
Run configtest
/app/logstash-2.3.3/bin/logstash agent --config /app/logstash-2.3.3/logstash.conf --configtest
Receive "Configuration OK"
Start up websocketd
/app/websocketd --port=9300 /app/logstash-2.3.3/bin/logstash agent --config /app/logstash-2.3.3/logstash.conf
Back in Simple Web Socket Client, I connect to ws://server-name.domain.com:9300. I see a message pop up that I started a session.
Tue, 05 Jul 2016 20:07:13 -0400 | ACCESS | session | url:'http://server-name.domain.com:9300/' id:'1467732248361139010' remote:'192.168.0.1' command:'/app/logstash-2.3.3/bin/logstash' origin:'chrome-extension://pfdhoblngbopfeibdeiidpjgfnlcodoo' | CONNECT
I try to send "hello world". Nothing apparent happens on the server. After about 15 seconds I see a disconnect message in my console window. logstash-log.txt is never created.
Any ideas for what to try? Thank you!
UPDATE 1:
I tried putting the following in a bash script called "launch_logstash.sh":
#!/bin/bash
exec /app/logstash-2.3.3/bin/logstash agent --config /app/logstash-2.3.3/logstash.conf
Then I started websocketd like so:
/app/websocketd --port=9300 /app/logstash-2.3.3/bin/launch_logstash.sh
Same result; no success.
Upon reading the websocketd documentation more closely, I found that it sends the data received on the socket to the program's stdin. I was trying to listen to a socket in my logstash config, but the data is actually going to that app's stdin. I changed my config to this:
input {
  stdin { }
}
output {
  file {
    path => "/app/logstash-2.3.3/logstash-log.txt"
  }
}
Then launched websocketd like this:
/app/websocketd --port=9300 /app/logstash-2.3.3/bin/logstash agent --config /app/logstash-2.3.3/logstash.conf
So in short, until logstash-input-websocket implements its server option, stdin {} and stdout {} are the input and output to use when websocketd acts as the web server.