Referring to the Python programming guide online:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/python.html
I didn't see any material related to streaming programming, such as a Kafka connector and so on.
Also, in the GitHub examples (https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming), I didn't see any Python code either.
If Flink does support Python for streaming programming, could you show me some examples to start with?
Thanks!
Maybe Flink 1.4 will support Python for streaming; see the recent issue FLINK-5886 - Python API for streaming applications.
No, Flink 1.2 does not support Python for streaming.
Flink 1.3 doesn't support it either.
Is there a Scala client library for Apache OpenWhisk?
The short answer is that there isn't an Apache community-supported Scala client. There is quite a bit to seed such an implementation from the Scala REST and CLI bindings defined by this interface: https://github.com/apache/openwhisk/blob/fcbe9ca83829f2194b47f7c61a166396838c6a44/tests/src/test/scala/common/WskOperations.scala#L163, but it is not a standalone client.
If it’s something you’re interested in contributing to the project and community, you should reach out on the Apache project dev list as noted in one of the comments above.
I have a Scala-based application and I need to connect it to Cassandra.
I found the DataStax Enterprise drivers very useful in this regard; they have a lot of cool features, like built-in load balancing for Cassandra, and that is really important for me.
Unfortunately there isn't any native DSE driver for Scala. I know we can use the DSE Java drivers, but in that case we lose a lot of cool Scala features.
I also found the spark-cassandra-connector, which is built by DataStax as well, but the built-in load balancing is really important to me and I don't know whether spark-cassandra-connector supports it or not.
In Java-based applications using the DSE Java driver, I need to configure the built-in load balancer in a configuration file as below:
datastax-java-driver.basic.load-balancing-policy {
  class = DefaultLoadBalancingPolicy
}
I don't know the equivalent way to do this in Scala using spark-cassandra-connector, and I'm not even sure whether it is possible.
Any help would be appreciated. Thanks.
In Scala you can just use the Java driver. Out of the box you don't get support for basic Scala types, but you can solve this problem by importing java-driver-scala-extras into your project (as source code); it works at least for driver 3.x. Another issue is support for Option, but this can be handled via Java's Optional, which has an extra codec in the Java driver.
Regarding the customization of the driver: that part should work from Scala without change. Regarding support for the default policy in Spark: the Spark Cassandra Connector has a separate load-balancing policy for a special reason. It's close to the Java driver's default policy, but with specifics for Spark.
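For illustration, here is a minimal sketch of using the Java driver 3.x directly from Scala; the contact point and keyspace are placeholders, and it assumes cassandra-driver-core 3.x is on the classpath:

import com.datastax.driver.core.Cluster

object CassandraConnectSketch {
  def main(args: Array[String]): Unit = {
    // Build a cluster object pointing at a placeholder contact point
    val cluster = Cluster.builder()
      .addContactPoint("127.0.0.1")
      .build()
    // Connect to a placeholder keyspace and run a simple query
    val session = cluster.connect("my_keyspace")
    val row = session.execute("SELECT release_version FROM system.local").one()
    println(s"Cassandra release version: ${row.getString("release_version")}")
    cluster.close()
  }
}

Scala-friendly codecs, for example for Option, would then come from java-driver-scala-extras and the Optional codec mentioned above.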
Does Apache Beam for Python support the Flink runner right now, or even the portable runner? And is Beam for Java commercially supported on the Flink runner?
Yes, both Python and Java are supported by the Apache Flink runner; a small sketch follows below the reference link.
It is important to understand that the Flink Runner comes in two flavors:
A legacy Runner which supports only Java (and other JVM-based languages)
A portable Runner which supports Java/Python/Go
Ref: https://beam.apache.org/documentation/runners/flink/
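As a minimal sketch of the JVM path, here is a Beam pipeline submitted to the Flink runner via the Java SDK, written from Scala (one of the "other JVM-based languages"); it assumes beam-sdks-java-core and beam-runners-flink are on the classpath, and the [local] master just runs an embedded Flink for testing:

import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.{Create, MapElements, SimpleFunction}

object BeamOnFlinkSketch {
  def main(args: Array[String]): Unit = {
    // --runner=FlinkRunner selects the Flink runner at submission time;
    // [local] starts an embedded Flink cluster for local testing
    val options = PipelineOptionsFactory
      .fromArgs("--runner=FlinkRunner", "--flinkMaster=[local]")
      .create()
    val pipeline = Pipeline.create(options)

    pipeline
      .apply(Create.of("a", "b", "c"))
      .apply(MapElements.via(new SimpleFunction[String, String] {
        override def apply(s: String): String = s.toUpperCase
      }))

    pipeline.run().waitUntilFinish()
  }
}

The portable runner works the same way conceptually, but a pipeline (e.g. in Python) is submitted to a separate job server, which translates it for Flink.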
I think you would have to define what "commercially supported" means:
Do you want to run the Flink Runner as a service, similarly to what Google Cloud Dataflow provides? If so, the closest to this is Amazon's Kinesis Data Analytics, but it is really just a managed Flink cluster.
Many companies use the Flink Runner and contribute back to the Beam project, e.g. Lyft, Yelp, Alibaba, Ververica. This could be seen as a form of commercial support. There are also various consultancies, e.g. BigDataInstitute, Ververica, which could help you manage your Flink Runner applications.
I have a Kafka broker upgraded from 0.8 to 0.11, and now I am trying to upgrade the Spark Streaming job code to be compatible with the new Kafka (I am using Spark 1.6.2).
I searched a lot for steps to follow for this upgrade but didn't find any article, either official or unofficial.
The only article I found useful is this one; however, it covers Spark 2.2 and Kafka 0.10, and it contains a line saying:
However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change
Has anyone tried to integrate Spark Streaming 1.6 with Kafka 0.11, or is it better to upgrade Spark to 2.x first, given the lack of information and support regarding this version mix of Spark Streaming and Kafka?
After a lot of investigation, I found no way to make this move, as Spark Streaming only supports Kafka versions up to 0.10 (which has major differences from Kafka 0.11 and 1.0.x).
That's why I decided to move from Spark Streaming to the new Kafka Streams API, and it was simply awesome: easy to use, very flexible, and with one big advantage: IT IS A LIBRARY. You simply add it to your project; it is not a framework that wraps your code.
The Kafka Streams API supports almost all the functionality provided by Spark (aggregation, windowing, filtering, map/reduce), as the sketch below shows.
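A minimal sketch of that filtering plus windowed aggregation, written against the Kafka Streams Java API from Scala; it assumes a kafka-streams 2.1+ artifact, and the topic name and broker address are placeholders:

import java.time.Duration
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.{Predicate, TimeWindows}
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

object StreamsSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "events-counter")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

    val builder = new StreamsBuilder()
    builder.stream[String, String]("events")              // consume a topic
      .filter(new Predicate[String, String] {             // filtering
        override def test(key: String, value: String): Boolean = value.nonEmpty
      })
      .groupByKey()                                       // prepare aggregation
      .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))  // windowing
      .count()                                            // aggregation

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())
  }
}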
Basic question: which platforms and languages does Apache Kafka currently support?
Kafka is written in Scala, which means it runs on the JVM, so you can effectively run it on any OS that supports the JVM. However, the brokers extract a huge performance boost from the OS's kernel buffer cache; I'm not sure how well this works on a non-Unix system like Windows. The Kafka source code base provides first-class support for Scala and Java clients. You can also find producer and consumer clients in languages like PHP, C++, Python, etc. under the contrib directory.
Apache Kafka runs well and is most stable and performant on Linux (either bare-metal Linux, Linux VMs in private or public clouds, or Linux-based Docker containers). Kafka has been known to run on Windows, but most vendors that commercially support Kafka do not extend their support to Windows for production servers, so it's "community supported" by the Kafka community. Kafka also runs quite well on macOS for development.
The Apache Kafka distribution includes support for Java and Scala clients only, but the larger Kafka community has created a long list of clients for other languages. A good list of the available options is on the Apache Kafka wiki here: https://cwiki.apache.org/confluence/display/KAFKA/Clients
You will find that for some languages (like C#/.NET, Python, or Go) there are two, three, or even more options for client libraries. Some are up to date with the newest Kafka wire protocol changes, such as exactly-once semantics and message headers (added in Apache Kafka 0.11), timestamps (added in 0.10), or the security enhancements and new consumer API (added in 0.9), and others are not. Some have the full set of functions/methods provided in Java (like seek(), consumer group management, or interceptors), but others do not. Some are written purely in the target language and others are wrappers around the librdkafka C/C++ library. Some are commercially supported by a vendor and others are not, so choose based on your needs in terms of functionality, stability, execution environment, and supportability.
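As an example of the kind of low-level control the official Java client exposes (and that some third-party clients lack), here is a small sketch of seeking to an absolute offset, usable from any JVM language; it is written in Scala 2.13 against the kafka-clients library (2.0+ for the Duration-based poll), and the broker address, topic, and offset are placeholders:

import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.jdk.CollectionConverters._

object SeekSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "seek-demo")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    val partition = new TopicPartition("my-topic", 0)
    consumer.assign(Collections.singletonList(partition))  // manual partition assignment
    consumer.seek(partition, 42L)                          // jump straight to offset 42

    for (record <- consumer.poll(Duration.ofSeconds(1)).asScala)
      println(s"${record.offset()}: ${record.value()}")
    consumer.close()
  }
}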