Pycapa to Kafka topic - bogus or undecoded characters

I am trying to capture live packets from a network interface using Pycapa from Apache Metron, but when I try to consume the messages from the topic, I receive the following strange characters:
!i�f�U�_� ��mP�pO���62.��a#;�k��o��0�?
!i�f�U�_� ��mP�pO���62.��a#;�k��o��0�?
I am not using the Confluent Platform. In this case, can someone guide me to a solution?
Thank you.

Based on the docs, it looks like pycapa stores the raw network packet data, which is probably what you're seeing here.
If you look at the examples, you'll see there is one for consuming this raw data and piping it into something like tshark, which can parse the raw packets and render them in a readable form:
pycapa --consumer \
--kafka-broker localhost:9092 \
--kafka-topic ciscotopic1 \
--max-packets 10 \
| tshark -i -
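If you would rather handle the raw data in your own consumer instead of piping it through tshark, the key point is to deserialize the message value as bytes rather than as a string (decoding packet bytes as text is exactly what produces the garbled characters above). As a rough sketch only, not part of pycapa, here is what a plain Java consumer using ByteArrayDeserializer could look like; the broker and topic are taken from the example above, and the class and group id names are made up:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

// Sketch: consume the raw pycapa packet data as bytes instead of strings,
// so the payload is not mangled by character decoding.
public class RawPacketConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "pcap-reader");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ciscotopic1"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // record.value() holds the raw packet bytes; hand them to a
                    // pcap-aware library or write them out for tshark/Wireshark.
                    System.out.println("packet of " + record.value().length + " bytes");
                }
            }
        }
    }
}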

Related

Truncated SAMLResponse with TCPdump

I have been trying to use tcpdump to capture the SAML request to the server.
I am interested in the SAMLResponse so I can decode it and get the XML, but tcpdump seems to truncate the output, so I miss a lot of data:
tcpdump -A -nnSs 0 'tcp port 8080 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
This should capture all HTTP requests/responses/bodies, which it does, but the SAMLResponse is truncated:
SAMLResponse=PHNhbWxwOlJlc3BvbnNlIElEPSJfMDAyMDg3MmQtZTlmMi00ZGU5LTkxMGYtM2NiNDc1MjVkNTk2IiBWZXJzaW9uPSIyLjAiIElzc3VlSW5zdGFudD0iMjAxOS0xMS0xM1QyMTo0ODo0Mi42ODlaIiBEZXN0aW5hdG
If I decode that I get:
<samlp:Response ID="_0020872d-e9f2-4de9-910f-3cb47525d596" Version="2.0" IssueInstant="2019-11-13T21:48:42.689Z" Destinat
An incomplete output. If I add -w /tmp/out.pcap I am able to see the entire SAMLResponse in Wireshark. What am I missing here?
I am on Linux and I would like to work with this from the command line. What I don't understand is that sometimes I get more characters than others.
I am not sure if the rest is in another packet separate from this one; if it is, how do I join them in tcpdump?
Thanks
An alternative is to use tcpflow:
tcpflow -c 'port 8080'
Extract from man tcpflow:
DESCRIPTION
tcpflow is a program that captures data transmitted as part of TCP
connections (flows), and stores the data in a way that is
convenient for protocol analysis or debugging. Rather than showing
packet-by-packet information, tcpflow reconstructs the actual data
streams and stores each flow in a separate file for later analysis.
Or you can use tshark.

Use Log4j to log messages in the Liberty console

Our log server consumes our log messages from the Kubernetes pods' stdout, formatted in JSON, and indexes the JSON fields.
We need to include some predefined fields in the messages so that we can track transactions across pods.
For one of our pods we use the Liberty profile and have trouble configuring its logging for these needs.
One idea was to use Log4j to send customized JSON messages to the console. But all messages are corrupted by the Liberty logging system, which handles and modifies all logs written to the console. I failed to configure the Liberty logging parameters (copySystemStreams = false, console log level = NO) for my needs and always end up with Liberty modifying my output and interleaving non-JSON messages.
To work around all that I used the Liberty consoleFormat="json" logging parameter, but this introduced unnecessary fields and also does not allow me to specify my custom fields.
Is it possible to control Liberty logging and the console?
What is the best way to implement my use case with Liberty (and, if possible, Log4j)?
As you mentioned, Liberty has the ability to log to the console in JSON format [1]. The two problems you mentioned with that, for your use case, are 1) unnecessary fields, and 2) it does not allow you to specify your custom fields.
Regarding unnecessary fields: Liberty has a fixed set of fields in its JSON schema, which you cannot customize. If you find you don't want some of the fields, I can think of a few options:
Use Logstash.
Some log handling tools, like Logstash, allow you to remove [2] or mutate [3] fields. If you are sending your logs to Logstash, you could adjust the JSON to your needs that way.
Change the JSON format Liberty sends to stdout using jq.
The default CMD (from the websphere-liberty:kernel Dockerfile) is:
CMD ["/opt/ibm/wlp/bin/server", "run", "defaultServer"]
You can add your own CMD to your Dockerfile to override that as follows (adjust jq command as needed):
CMD /opt/ibm/wlp/bin/server run defaultServer | grep --line-buffered "}" | jq -c '{ibm_datetime, message}'
If your use case also requires sending Log4j output to stdout, I would suggest changing the Dockerfile CMD to run a script you add to the image. In that script you would need to tail your Log4j log file as follows (this could be combined with the above advice on how to change the CMD to use jq as well):
tail -F myLog.json &
/opt/ibm/wlp/bin/server run defaultServer
[1] https://www.ibm.com/support/knowledgecenter/en/SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/rwlp_logging.html
[2] https://www.elastic.co/guide/en/logstash/current/plugins-filters-prune.html
[3] https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html
Just in case it helps, I ran into the same issue and the best solution I found was:
Convert the app to use java.util.logging (JUL).
In server.xml add <logging consoleSource="message,trace" consoleFormat="json" traceSpecification="{package}={level}"/> (swap package and level as required).
Add a bootstrap.properties that contains com.ibm.ws.logging.console.format=json.
This will give you consistent server and application logging in JSON. A couple of lines at server boot are not JSON, but in my case that was just one empty line and a "Launching defaultServer..." line.
I too wanted the JSON structure to be consistent with other containers using Log4j2, so I followed the advice from dbourne above and added jq to the CMD in my Dockerfile to reformat the JSON:
CMD /opt/ol/wlp/bin/server run defaultServer | stdbuf -o0 -i0 -e0 jq -crR '. as $line | try (fromjson | {level: .loglevel, message: .message, loggerName: .module, thread: .ext_thread}) catch $line'
The stdbuf -o0 -i0 -e0 stops the pipe ("|") from buffering its output.
This strips out the Liberty-specific JSON attributes, which is either good or bad depending on your perspective. I don't need to add new values, so I don't have a good recommendation for that.
Although the JUL API is not quite as nice as Log4j2 or SLF4J, it's very little code to wrap the JUL API in something closer to Log4j2, e.g. to have varargs rather than an Object[].
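For illustration, here is a minimal sketch of such a wrapper (the Log class and its method names are made up for the example; they are not part of Liberty, JUL, or Log4j2):
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal varargs-friendly facade over java.util.logging.
public final class Log {
    private final Logger logger;

    private Log(Logger logger) {
        this.logger = logger;
    }

    public static Log get(Class<?> clazz) {
        return new Log(Logger.getLogger(clazz.getName()));
    }

    // JUL's Logger.log(Level, String, Object[]) already understands {0}-style
    // placeholders, so the wrapper only forwards the varargs array.
    public void info(String message, Object... params) {
        logger.log(Level.INFO, message, params);
    }

    public void warn(String message, Object... params) {
        logger.log(Level.WARNING, message, params);
    }

    public void error(String message, Throwable t) {
        logger.log(Level.SEVERE, message, t);
    }
}
Used roughly as Log.get(MyService.class).info("processed {0} records in {1} ms", count, millis); where MyService, count, and millis are placeholders.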
OpenLiberty will also dynamically change logging if you edit the server.xml, so it pretty much has all the necessary bits, IMHO.

How to send PHP app logs directly to ELK service?

According to the documentation there are two ways to send log information to the SwisscomDev ELK service.
Standard way via STDOUT: Every output to stdout is sent to Logstash
Directly send to Logstash
I am asking about way 2: how is this achieved, and in particular what input format is expected?
We're using Monolog in our PHP buildpack based application, and its stdout_handler is working fine.
I tried the GelfHandler (connection refused) and the SyslogUdpHandler (no error, but no result), both configured to use the logstashHost and logstashPort from VCAP_SERVICES as the endpoint to send logs to.
The binding works and the environment variables are set, but I have no idea how to send log information from our application in a form that the SwisscomDev ELK service's Logstash endpoint accepts.
Logstash is configured with a tcp input, which is reachable via logstashHost:logstashPort. The tcp input is configured with its default codec, which is the line codec (source code; not the plain codec as stated in the documentation).
The payload of the log event should be encoded in JSON so that the fields are automatically recognized by Elasticsearch. If this is the case, the whole log event is forwarded without further processing to Elasticsearch.
If the payload is not JSON, the whole log line will end up in the field message.
For your use case with Monolog, I suggest you use the SocketHandler (pointed at logstashHost:logstashPort) in combination with the LogstashFormatter, which will take care of the JSON encoding, with the log events being line-delimited.

How do I get notified when an object is uploaded to my GCS bucket?

I have an app that uploads photos regularly to a GCS bucket. When those photos are uploaded, I need to add thumbnails and do some analysis. How do I set up notifications for the bucket?
The way to do this is to create a Cloud Pub/Sub topic for new objects and to configure your GCS bucket to publish messages to that topic when new objects are created.
First, let's create a bucket PHOTOBUCKET:
$ gsutil mb gs://PHOTOBUCKET
Now, make sure you've activated the Cloud Pub/Sub API.
Next, let's create a Cloud Pub/Sub topic and wire it to our GCS bucket with gsutil:
$ gsutil notification create \
-t uploadedphotos -f json \
-e OBJECT_FINALIZE gs://PHOTOBUCKET
The -t specifies the Pub/Sub topic. If the topic doesn't already exist, gsutil will create it for you.
The -e specifies that you're only interested in OBJECT_FINALIZE messages (objects being created). Otherwise you'll get every kind of message in your topic.
The -f specifies that you want the payload of the messages to be the object metadata for the JSON API.
Note that this requires a recent version of gsutil, so be sure to update to the latest version of gcloud, or run gsutil update if you use a standalone gsutil.
Now we have notifications configured and pumping, but we'll want to see them. Let's create a Pub/Sub subscription:
$ gcloud beta pubsub subscriptions create processphotos --topic=uploadedphotos
Now we just need to read these messages. Here's a Python example of doing just that. Here are the relevant bits:
from google.cloud import pubsub

def poll_notifications(subscription_id):
    client = pubsub.Client()
    subscription = pubsub.subscription.Subscription(
        subscription_id, client=client)
    while True:
        pulled = subscription.pull(max_messages=100)
        for ack_id, message in pulled:
            print('Received message {0}:\n{1}'.format(
                message.message_id, summarize(message)))
            subscription.acknowledge([ack_id])

def summarize(message):
    data = message.data  # object metadata payload (JSON, per the -f json flag)
    attributes = message.attributes
    event_type = attributes['eventType']
    bucket_id = attributes['bucketId']
    object_id = attributes['objectId']
    return "A user uploaded %s, we should do something here." % object_id
Here is some more reading on how this system works:
https://cloud.google.com/storage/docs/reporting-changes
https://cloud.google.com/storage/docs/pubsub-notifications
GCP also offers an earlier version of the Pub/Sub cloud storage change notifications called Object Change Notification. This feature will directly POST to your desired endpoint(s) when an object in that bucket changes. Google recommends the Pub/Sub approach.
https://cloud.google.com/storage/docs/object-change-notification
While using this example, keep in mind two things:
1) The sample code has since been upgraded to Python 3.6 and the pubsub_v1 client, so it might not run on Python 2.7.
2) When calling poll_notifications(projectid, subscriptionname), pass your GCP project ID (e.g. bold-idad) and your subscription name (e.g. asrtopic).

How to pass PcapPackets into Kafka queue

With the code below, which passes PcapPackets to a queue, is it possible to push these into a Kafka queue so that a Kafka consumer can pull the PcapPackets as such from the Kafka producer?
StringBuilder errbuf = new StringBuilder();
Pcap pcap = Pcap.openOffline("tests/test-afs.pcap", errbuf);

PcapPacketHandler<Queue<PcapPacket>> handler = new PcapPacketHandler<Queue<PcapPacket>>() {
    public void nextPacket(PcapPacket packet, Queue<PcapPacket> queue) {
        // Deep-copy the packet; jnetpcap reuses the callback buffer.
        PcapPacket permanent = new PcapPacket(packet);
        queue.offer(permanent);
    }
};

Queue<PcapPacket> queue = new ArrayBlockingQueue<PcapPacket>(1024);
pcap.loop(10, handler, queue);
System.out.println("we have " + queue.size() + " packets in our queue");
pcap.close();
Kafka supports storing arbitrary binary data as messages. In your case you just need to provide a binary serializer for the PcapPacket class (and a deserializer for reading).
See Kafka: writing custom serializer for an example.
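To make that concrete, here is a rough sketch of what such a serializer/deserializer pair could look like for jnetpcap's PcapPacket. The getTotalSize()/transferStateAndDataTo() calls follow jnetpcap's documented pattern for exporting a packet's state and data to a byte array; rebuilding with the PcapPacket(byte[]) constructor on the consumer side is an assumption you should verify against the jnetpcap javadoc for your version:
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;
import org.jnetpcap.packet.PcapPacket;

// Sketch: flatten a PcapPacket (header state + capture data) into a byte
// array for Kafka, and rebuild it on the consumer side.
public class PcapPacketSerializer implements Serializer<PcapPacket> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, PcapPacket packet) {
        if (packet == null) {
            return null;
        }
        // getTotalSize()/transferStateAndDataTo() export the packet's state and
        // data into a plain byte array, as shown in jnetpcap's documentation.
        byte[] buffer = new byte[packet.getTotalSize()];
        packet.transferStateAndDataTo(buffer);
        return buffer;
    }

    @Override
    public void close() { }
}

class PcapPacketDeserializer implements Deserializer<PcapPacket> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public PcapPacket deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        // Assumption: the PcapPacket(byte[]) constructor restores the packet from
        // the transferred bytes; check the jnetpcap javadoc for your version.
        return new PcapPacket(data);
    }

    @Override
    public void close() { }
}
You would then set these classes as value.serializer / value.deserializer in the producer and consumer configs, and send the copied packets from the nextPacket() callback to a KafkaProducer instead of (or in addition to) the in-memory queue.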
Though I am late to the party, I'll share my tool, Pcap Processor (GitHub URL), here in case anyone with similar requirements finds it useful. I developed this tool in Python for my research, to read raw pcap files, process them, and feed them to my stream processor. Since I tried various streaming protocols, I implemented all of them in this tool.
Currently supported sinks:
CSV file
Apache Kafka (encoded into JSON string)
HTTP REST (JSON)
gRPC
Console (just print to the terminal)
For example, to read input.pcap and send it to a Kafka topic, you need to adjust the bootstrap endpoint and topic name in kafka_sink.py. Then, executing the following command from the parent directory will read the file and send the packets to the Kafka queue.
python3 -m pcap_processor --sink kafka input.pcap
For more details and installation instructions, please check the GitHub readme and feel free to open GitHub issues if you encounter any problems.