ByteStringCoder for reading data in Apache Beam - apache-beam

I am trying to read data with version 2.4.0 of Apache Beam using the standard TextIO.read(). The data has to be read as a ByteString.
Unfortunately, it doesn't seem like Apache Beam supports .withCoder() in the same way Dataflow does. I can't seem to find an alternative way to introduce a coder. Furthermore, it seems like ByteStringCoder is no longer included in the coders of Apache Beam.
What's the best way to get the same result of Dataflow's .withCoder(ByteStringCoder.of()) with the latest version of Apache Beam? Coders are still present in Apache Beam so there has to be some way to use them.

ByteStringCoder is located in the beam-sdks-java-extensions-protobuf module, so you need to include that dependency:
https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-protobuf
As for TextIO: it uses StringUtf8Coder, so you will probably need to write your own BoundedSource/UnboundedSource.
And then use:
pipeline.apply(Read.from(yourCreatedSource))
You could take inspiration from the current TextSource, where you can probably get away with changing only the element type of FileBasedSource from String to ByteString.
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java

Related

How to use stubsPerConsumer with restdocs

How do I use the stubsPerConsumer feature when creating a stub from a producer with restdocs?
If this is not supported, is it possible to generate the asciidoc snippets from the groovy DSL contract?
Update
It looks like baseClassMappings is not supported when using spring-cloud-contract with restdocs. Has anyone found a clever way to get this to work using the assembly-plugin (that doesn't require a lot of manual setup for each consumer)?
Currently, it's not supported on the producer side with rest docs out of the box. We treat rest docs as a way to do the producer contract approach. Theoretically, what you could do is create different output snippet directories. Instead of, for example, target/snippets you could use target/snippets/myconsumer. Then with the assembly plugin you would just pick up target/snippets. At least, that's how it would work in theory.
As for the contracts and adocs you can check out this: https://github.com/spring-cloud-samples/spring-cloud-contract-samples/blob/master/beer_contracts/src/test/java/docs/GenerateAdocsFromContractsTests.java . It's a poor man's version of going through all of the contracts and generating adoc documentation from them.

Does scala offer async non-blocking IO when working with files?

I am using Scala 2.10 and I wonder if there is some package which offers async IO when working with files?
I did some searching on this topic but mostly found examples like the following
val file = new File(canonicalFilename)
val bw = new BufferedWriter(new FileWriter(file))
bw.write(text)
bw.close()
which is essentially the java.io package with its blocking IO operations (write, read, etc.). I also found the scala-io project, which has this goal, but it seems to be dead: its last activity was in 2012.
What is the best practice in this scenario? Is there a Scala package for this, or is the common way to wrap java.io code in Futures and Observables?
My use case: from an Akka actor, I need to manipulate files on a local or remote file system without blocking. Or is there a better alternative?
Thanks for clarifying this.
Scala does not offer an explicit API for asynchronous file IO; however, the plain Java API is exactly the right thing to use in those cases (this is actually a good thing, we can use all these nice APIs without any wrapping!). You should look into using java.nio.channels.AsynchronousFileChannel, which has been available since JDK7 and makes use of the underlying system async calls for file IO.
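A minimal sketch of that approach, written in Java since AsynchronousFileChannel is a plain JDK API. It adapts the channel's callback-style write to a future so that the caller never blocks. The class name AsyncFileWrite and the method writeAsync are hypothetical, and CompletableFuture assumes Java 8 or later; from Scala, the same handler could complete a scala.concurrent.Promise instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper for illustration; not part of the JDK, Scala, or Akka.
final class AsyncFileWrite {

    // Writes text to path as UTF-8 without blocking the calling thread.
    // The returned future completes with the total number of bytes written.
    static CompletableFuture<Integer> writeAsync(Path path, String text) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        final AsynchronousFileChannel channel;
        try {
            channel = AsynchronousFileChannel.open(path,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            result.completeExceptionally(e);
            return result;
        }
        final ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        // The attachment carries the file position at which the current write started.
        channel.write(buffer, 0L, 0L, new CompletionHandler<Integer, Long>() {
            @Override
            public void completed(Integer written, Long position) {
                long next = position + written;
                if (buffer.hasRemaining()) {
                    // A single write may be partial, so continue from where it stopped.
                    channel.write(buffer, next, next, this);
                } else {
                    closeQuietly(channel);
                    result.complete((int) next);
                }
            }

            @Override
            public void failed(Throwable error, Long position) {
                closeQuietly(channel);
                result.completeExceptionally(error);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if closing fails.
        }
    }
}
```

Inside an actor, the returned future would typically be turned into a message sent back to the actor rather than awaited.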
Akka IO, while not providing file IO in its core, has a module developed by Dario Rexin which lets you use AsynchronousFileChannel with Akka IO in a very simple manner. Have a look at this library to make use of it: https://github.com/drexin/akka-io-file
In the near future Akka will provide file IO in its akka-streams module. It may be an external library for a while though; we're not exactly sure yet where to put this, as it will require users to have at least JDK 7, while most of Akka currently supports JDK6. That said, streams-based asynchronous back-pressured file IO is coming soon :-)
If you're using scalaz-stream for your async support, it has file functionality that's built on the java.nio async APIs; that's probably the approach I'd recommend. If you're using standard Scala futures, you can possibly use akka-io, which I think uses Netty as a backend. Or you can call NIO directly: it only takes a couple of lines to adapt a callback-based API to scalaz or Scala futures.

Equivalent of Akka ByteString in Scala standard API

Is anyone aware of a standard API equivalent to Akka's ByteString: http://doc.akka.io/api/akka/2.3.5/index.html#akka.util.ByteString
This very convenient class has no dependency on any other Akka code, and it saddens me to have to import the whole Akka jar just to use it.
I found this fairly old discussion mentioning adding it to the standard API, but I don't know what happened to this project: https://groups.google.com/forum/#!msg/scalaz/ZFcjGpZswRc/0tCIdXvpGBAJ
Does anyone know of an equivalent piece of code in the standard API? Or in a very lightweight library?
You might want to check out scodec-bits. It provides two types, BitVector and ByteVector (API docs), supporting fast appends, take, drop, random access, etc. The library has zero dependencies. We split it out of scodec precisely because we thought it might be of general use outside of scodec, where it's used heavily.

Environment properties files in scala project

Just starting to learn scala for a new project. Have got to the point where I would like to define different properties files for the different environments the app is going to run on, ideally in a similar way to Rails - very lightweight, just one different properties file per environment that is loaded based on its name. I don't really care if it's a java properties file, YML or scala code.
In the spirit of not reinventing the wheel, I've been looking to see if there is some accepted standard Scala way of doing this, but I can't find one. I've found a few similar but not identical questions here where people suggest using system properties in the startup script, but this feels like it would end up being a nightmare.
I could obviously implement it if need be, but it feels like the sort of thing that should already exist. So, does it?
I'm using sbt if that makes a difference.
I know of Configgy. Also, Akka/Play 2.0 will be using Config, which looks nice too. See blog about the latter.
Basically, Configgy has been used for a while now, but has been deprecated, while Config will be all-new. However, having Config as the default Typesafe Stack configuration tool will probably make it the preferred tool for that pretty fast.
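Independent of the libraries above, the layout described in the question (one properties file per environment, picked by name) can be sketched with plain java.util.Properties. This is not how Configgy or Config work; the class name EnvProperties, the APP_ENV environment variable, the "development" default, and the directory layout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Configgy, Config, or sbt.
final class EnvProperties {

    // Picks the environment name from the (assumed) APP_ENV variable,
    // falling back to "development" when it is not set.
    static String currentEnv() {
        String env = System.getenv("APP_ENV");
        return (env == null || env.isEmpty()) ? "development" : env;
    }

    // Loads <configDir>/<env>.properties, e.g. conf/production.properties.
    static Properties load(Path configDir, String env) throws IOException {
        Path file = configDir.resolve(env + ".properties");
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }
}
```

A call such as EnvProperties.load(Paths.get("conf"), EnvProperties.currentEnv()) would then return the settings for whichever environment the process was started in.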
I have written a Configgy replacement called Configrity. It can use different input formats (like YAML), it's immutable, supports functional patterns, and uses type classes to automatically convert values to the desired type.
I have written BeeConfig, a replacement for java.util.Properties except that it is a Scala API and uses UTF-encoded configuration files. It supports string interpolation, chaining and a bunch of other features. But its main objective is simplicity.
Bitbucket | Blog post
Rick

How to use Brail as a stand-alone general purpose templating engine (like NVelocity)?

I've been using NVelocity as a stand-alone templating engine in my text file generator. The problem with NVelocity is that the macros are quite shaky; pretty much all errors I get are from faulty macro implementation.
It would be cool if I could just use some other templating engine, such as Brail. That way I would just write functions that output strings.
What's the best way of embedding Brail engine? I would like to just pass it a string containing the template (not reading from disk), and I would like to minimize the number of external dependencies.
EDIT: I found the answer myself. Take a look at this source file from Castle.
Try nHaml or Spark, as they both support full standalone or direct usage.
They both support standard C# functions that return strings, etc.
Spark is real HTML, nHaml is DRY HTML
Both very cool!