Logstash file input and avro codec parsing problem - plugins

All,
I am having an issue getting the Logstash file input and the avro codec to work together.
The idea is to read an Avro file, extract a JSON object, and then insert it into Elasticsearch.
Below is my config file.
input {
  file {
    path => "C:/dev/testData/xxx/weather.avro"
    start_position => "beginning"
    codec => avro {
      schema_uri => "C:/dev/testData/xxx/weather.avsc"
    }
    sincedb_path => "C:/dev/testData/property/db"
  }
}
output {
  stdout { codec => rubydebug }
}
Here is the log:
[2018-10-18T13:45:09,523][DEBUG][filewatch.tailmode.handlers.grow] read_to_eof: get chunk
[2018-10-18T13:45:09,599][DEBUG][logstash.inputs.file ] Received line {:path=>"C:/dev/testData/chewer/weather.avro", :text=>"Obj\x01\x04\x14avro.codec\bnull\x16avro.schema\xF2\x02{\"type\":\"record\",\"name\":\"Weather\",\"namespace\":\"test\",\"fields\":[{\"name\":\"station\",\"type\":\"string\"},{\"name\":\"time\",\"type\":\"long\"},{\"name\":\"temp\",\"type\":\"int\"}],\"doc\":\"A weather reading.\"}\x00\xB0\x81\xB3\xC4"}
[2018-10-18T13:45:10,155][ERROR][filewatch.tailmode.handlers.grow] read_to_eof: general error reading C:/dev/testData/chewer/weather.avro {"error"=>"#", "backtrace"=>["org/jruby/ext/stringio/StringIO.java:788:in `read'", "C:/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:106:in `read'", "C:/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:93:in `read_bytes'", "C:/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:99:in `read_string'"]}
The test files are from Avro's own repo (weather.avro)
I use avro 1.8.2 and logstash-codec-avro-3.2.3-java.
Please advise.
Gordon

Related

How can I handle a multipart POST request with akka-http?

I want to handle a multipart request.
If I accept a request using a route like this:
val routePutData = path("api" / "putFile" / Segment) { subDir =>
  entity(as[String]) { str =>
    complete(str)
  }
}
I get the following text (I am trying to send a log4j config):
Content-Disposition: form-data; name="file"; filename="log4j.properties"
Content-Type: application/binary
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd hh:mm:ss} %t %-5p %c{1} - %m%n
----gc0pMUlT1B0uNdArYc0p--
How can I get the array of bytes and the file name from the file I send?
I tried to use entity(as[Multipart.FormData]) and the formFields directive, but it didn't help.
You should keep up with the Akka docs, though I think there were not enough examples in the file uploading section. Anyway, you don't need to extract the entity as a string or byte array; Akka already has a directive called fileUpload. It takes a parameter called fieldName, which is the key to look for in the multipart request, and expects a function that knows what to do given the metadata and the content of the file. Something like this:
post {
  extractRequestContext { ctx =>
    implicit val mat = ctx.materializer
    fileUpload(fieldName = "myfile") {
      case (metadata, byteSource) =>
        val fileName = metadata.fileName
        val futureBytes = byteSource
          .mapConcat[Byte] { byteString =>
            collection.immutable.Iterable.from(byteString.iterator)
          }
          .toMat(Sink.fold(Array.emptyByteArray) {
            case (arr, newLine) => arr :+ newLine
          })(Keep.right)
          .run()
        val filePath = Files.createFile(Paths.get(s"/DIR/TO/SAVE/FILE/$fileName"))
        onSuccess(futureBytes.map(bytes => Files.write(filePath, bytes))) { _ =>
          complete(s"wrote file to: ${filePath.toUri.toString}")
        }
    }
  }
}
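As a side note, the byte-by-byte append above can be written more compactly by concatenating the incoming ByteStrings and converting to an array once at the end. A minimal sketch (not from the original answer), assuming the same fileUpload directive, an implicit materializer and execution context in scope, plus import akka.util.ByteString and scala.concurrent.Future:
fileUpload(fieldName = "myfile") {
  case (metadata, byteSource) =>
    // Concatenate all chunks into one ByteString, then convert to Array[Byte] once.
    val futureBytes: Future[Array[Byte]] =
      byteSource.runFold(ByteString.empty)(_ ++ _).map(_.toArray)
    onSuccess(futureBytes) { bytes =>
      complete(s"received ${metadata.fileName} (${bytes.length} bytes)")
    }
}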
While the above solution looks good, there is also the storeUploadedFile directive to achieve the same with less code, something like:
path("upload") {
def tempDestination(fileInfo: FileInfo): File = File.createTempFile(fileInfo.fileName, ".tmp.server")
storeUploadedFile("myfile", tempDestination) {
case (metadataFromClient: FileInfo, uploadedFile: File) =>
println(s"Server stored uploaded tmp file with name: ${uploadedFile.getName} (Metadata from client: $metadataFromClient)")
complete(HttpResponse(StatusCodes.OK))
}
}
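For a quick way to exercise such a route without a running server, a route-test sketch with akka-http-testkit is shown below. This is an illustration, not part of the original answer: it assumes scalatest 3.0.x-style imports, re-declares a trimmed-down "upload" route with the same "myfile" field name, and the sample file name and content are made up:
import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.testkit.ScalatestRouteTest
import org.scalatest.{Matchers, WordSpec}

class UploadRouteSpec extends WordSpec with Matchers with ScalatestRouteTest {

  // Trimmed-down route in the spirit of the answer above (field name "myfile").
  val route =
    path("upload") {
      storeUploadedFile("myfile", info => java.io.File.createTempFile(info.fileName, ".tmp")) {
        case (metadata, file) =>
          complete(s"stored ${metadata.fileName} (${file.length()} bytes)")
      }
    }

  "the upload route" should {
    "accept a multipart/form-data request with a 'myfile' part" in {
      // Build the multipart body the same way a browser form would.
      val form = Multipart.FormData(Multipart.FormData.BodyPart.Strict(
        "myfile",
        HttpEntity(ContentTypes.`text/plain(UTF-8)`, "log4j.rootLogger=INFO, stdout"),
        Map("filename" -> "log4j.properties")))

      Post("/upload", form) ~> route ~> check {
        status shouldEqual StatusCodes.OK
        responseAs[String] should include("log4j.properties")
      }
    }
  }
}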

How to upload files and get formfields in akka-http

I am trying to upload a file via akka-http, and have gotten it to work with the following snippet
def tempDestination(fileInfo: FileInfo): File =
  File.createTempFile(fileInfo.fileName, ".tmp")

val route =
  storeUploadedFile("csv", tempDestination) {
    case (metadata, file) =>
      // Do my operation on the file.
      complete("File Uploaded. Status OK")
  }
But I'd also like to send a param1/param2 in the posted form.
I tried the following, and it works, but I have to send the parameters via the URL (http://host:port/csv-upload?userid=arvind):
(post & path("csv-upload")) {
storeUploadedFile("csv", tempDestination) {
case (metadata, file) =>
parameters('userid) { userid =>
//logic for processing the file
complete(OK)
}
}
}
The restriction on the file size is around 200-300 MB. I added the following property to my conf:
akka {
  http {
    parsing {
      max-content-length = 200m
    }
  }
}
Is there a way I can get the parameters via the formFields directive?
I tried the following:
fileUpload("csv") {
case (metadata, byteSource) =>
formFields('userid) { userid =>
onComplete(byteSource.runWith(FileIO.toPath(Paths.get(metadata.fileName)))) {
case Success(value) =>
logger.info(s"${metadata}")
complete(StatusCodes.OK)
case Failure(exception) =>
complete("failure")
But with the above code, I hit the following exception:
java.lang.IllegalStateException: Substream Source cannot be materialized more than once
at akka.stream.impl.fusing.SubSource$$anon$13.setCB(StreamOfStreams.scala:792)
at akka.stream.impl.fusing.SubSource$$anon$13.preStart(StreamOfStreams.scala:802)
at akka.stream.impl.fusing.GraphInterpreter.init(GraphInterpreter.scala:306)
at akka.stream.impl.fusing.GraphInterpreterShell.init(ActorGraphInterpreter.scala:593)
Thanks,
Arvind
I got this working with something like:
path("upload") {
formFields(Symbol("payload")) { payload =>
println(s"Server received request with additional payload: $payload")
def tempDestination(fileInfo: FileInfo): File = File.createTempFile(fileInfo.fileName, ".tmp.server")
storeUploadedFile("binary", tempDestination) {
case (metadataFromClient: FileInfo, uploadedFile: File) =>
println(s"Server stored uploaded tmp file with name: ${uploadedFile.getName} (Metadata from client: $metadataFromClient)")
complete(Future(FileHandle(uploadedFile.getName, uploadedFile.getAbsolutePath, uploadedFile.length())))
}
}
}
Full example:
https://github.com/pbernet/akka_streams_tutorial/blob/master/src/main/scala/akkahttp/HttpFileEcho.scala
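To exercise this route end to end, the client has to send both parts in one multipart/form-data body: the plain "payload" field and the "binary" file part (the names must match the directives above). Below is a rough client sketch modeled on the file-upload client pattern from the akka-http docs; the URL http://localhost:8080/upload and the local file name upload.bin are assumptions for illustration:
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.marshalling.Marshal
import akka.http.scaladsl.model._
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source

object FormFieldUploadClient extends App {
  implicit val system = ActorSystem("form-upload-client")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  // One multipart body carrying the ordinary form field and the file part together.
  val formData = Multipart.FormData(Source(List(
    Multipart.FormData.BodyPart.Strict("payload", HttpEntity("some additional payload")),
    Multipart.FormData.BodyPart.fromPath(
      "binary", ContentTypes.`application/octet-stream`, Paths.get("upload.bin"))))) // hypothetical local file

  val response = for {
    entity <- Marshal(formData).to[RequestEntity]
    resp   <- Http().singleRequest(
                HttpRequest(HttpMethods.POST, uri = "http://localhost:8080/upload", entity = entity))
  } yield resp

  response.foreach(r => println(s"Server responded with: ${r.status}"))
}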

How to read a particular line from a log file using logstash

I have to read 3 different lines from log files based on some text and then output the fields in a csv file.
Sample log data:
20110607 095826 [.] !! Begin test. Script filename/text.txt
20110607 095826 [.] Full path: filename/test/text.txt
20110607 095828 [.] FAILED: Test Failed()..
I have to read the file name after "!! Begin test. Script". This is my conf file:
filter {
  grok {
    match => { "message" => "%{BASE10NUM:Date}%{SPACE:pat}{BASE10NUM:Number}%{SPACE:pat}[.]%{SPACE:pat}%{SPACE:pat}!! Begin test. Script%{SPACE:pat}%{GREEDYDATA:file}" }
    overwrite => ["message"]
  }
  if "_grokparserfailure" in [tags] {
    drop {}
  }
}
But it's not giving me a single record; it's parsing the full log file in JSON format with no parsed fields.

logstash (2.3.2) gzip codec does not work

I'm using logstash (2.3.2) to read a gz file using the gzip_lines codec.
The log file example (sample.log) is:
127.0.0.2 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
The command I used to append to a gz file is:
cat sample.log | gzip -c >> s.gz
The logstash.conf is
input {
  file {
    path => "./logstash-2.3.2/bin/s.gz"
    codec => gzip_lines { charset => "ISO-8859-1" }
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    #match => { "message" => "message: %{GREEDYDATA}" }
  }
  #date {
  #  match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  #}
}
output {
  stdout { codec => rubydebug }
}
I have installed the gzip_lines plugin with bin/logstash-plugin install logstash-codec-gzip_lines
and I start Logstash with ./logstash -f logstash.conf.
When I feed s.gz with
cat sample.log | gzip -c >> s.gz
I expect the console to print the data, but nothing is printed out.
I have tried it on Mac and Ubuntu and get the same result.
Is anything wrong with my code?
I checked the code for gzip_lines and it seemed obvious to me that this plugin is not working, at least for version 2.3.2. Maybe it is outdated, because it does not implement the methods specified here:
https://www.elastic.co/guide/en/logstash/2.3/_how_to_write_a_logstash_codec_plugin.html
So the current internal working is like this:
The file input plugin reads the file line by line and sends each line to the codec.
The gzip_lines codec tries to create a new GzipReader object with GzipReader.new(io).
It then goes through the reader line by line to create events.
Because you specify a gzip file, the file input plugin tries to read the gzip file as a regular file and sends its lines to the codec. The codec tries to create a GzipReader with that string, and it fails.
You can modify it to work like this:
Create a file that contains the list of gzip files:
-- list.txt
/path/to/gzip/s.gz
Give it to the file input plugin:
file {
  path => "/path/to/list/list.txt"
  codec => gzip_lines { charset => "ISO-8859-1" }
}
Changes are:
Open the vendor/bundle/jruby/1.9/gems/logstash-codec-gzip_lines-2.0.4/lib/logstash/codecs/gzip_lines.rb file. Add a register method:
public
def register
  @converter = LogStash::Util::Charset.new(@charset)
  @converter.logger = @logger
end
And in the decode method, change:
@decoder = Zlib::GzipReader.new(data)
to:
@decoder = Zlib::GzipReader.open(data)
The disadvantage of this approach is that it won't tail your gzip file but the list file, so you will need to create a new gzip file and append it to the list.
I had a variant of this problem where I needed to decode the bytes in a file to an intermediate string, to prepare for a process input that only accepts strings.
The fact that encoding/decoding issues were ignored in Python 2 is actually really bad IMHO; you may end up with various corrupt-data problems, especially if you need to re-encode the string back into data.
Using ISO-8859-1 works for both gz and text files alike, while UTF-8 only worked for text files. I haven't tried it for PNGs yet.
Here's an example of what worked for me:
data = os.read(src, bytes_needed)
chunk += codecs.decode(data,'ISO-8859-1')
# do the needful with the chunk....

How do I use Play2 Iteratees to consume streaming HTTP with different event names?

I want a functional way of consuming server-sent events (SSE) over HTTP (or streaming HTTP as some call it). Through examples (Scala: Receiving Server-Sent-Events) I've found that Play2 Iteratees work well with its WS client when the event name is set to "message." Here is what the "message" stream looks like:
GET http://streaming.server.com/temperature
event: message
data: {"room":"room1","temp":71,"time":"2015-05-06T00:23:10.203+02:00"}
event: message
data: {"room":"room1","temp":70,"time":"2015-05-06T00:31:18.873+02:00"}
...
And here's what my web client looks like:
import com.ning.http.client.AsyncHttpClientConfig.Builder
import play.api.libs.iteratee.Iteratee
import play.api.libs.iteratee.Execution.Implicits.defaultExecutionContext
import play.api.libs.ws.ning.NingWSClient

object Client extends App {
  val client = new NingWSClient(new Builder().build())
  def print = Iteratee.foreach { chunk: Array[Byte] => println(new String(chunk)) }
  client.url("http://streaming.server.com/temperature").get(_ => print)
}
with some output that it printed to my console:
$ sbt run
[info] Running Client
data: {"room":"room1","temp": 70, "time":"2015-05-06T00:31:14.193+02:00"}
data: {"room":"room1","temp": 70, "time":"2015-05-06T00:31:18.873+02:00"}
...
But when I set "event" to some value other than "message", the Iteratee immediately returns the Done signal just after reading the first value and then stops the stream. The spec I'm required to satisfy uses "event":"put". Below is an example of what the "put" stream looks like:
GET http://streaming.server.com/temperature
event: put
data: {"room":"room1","temp":71,"time":"2015-05-06T00:39:14.281+02:00"}
event: put
data: {"room":"room1","temp":70,"time":"2015-05-06T00:39:18.778+02:00"}
...
I discovered this when I added an onComplete() handler at the end and matched on a Success case like so:
client.url("http://streaming.server.com/temperature").get(_ => print).onComplete {
case Success(s) => println(s)
case Failure(s) => println(f.getMessage)
}
This code now prints:
$ sbt run
[info] Running Client
data: {"room":"room1","temp": 71, "time":"2015-05-06T00:39:14.281+02:00"}
Done((),Empty)
So far, I've only had success with the Jersey library for Java, whose semantics are very similar to the EventSource JavaScript client; however, it doesn't compose and appears to only support single-threaded consumption of SSEs. I would much rather use the Play2 WS + Iteratee libraries. How can I achieve this?