Implement publish-subscribe pattern using Curator/ZooKeeper

If you have implemented the publish-subscribe pattern using Curator/ZooKeeper, could you please share your experience? We're currently doing a proof of concept and your feedback would help big time!
I found this Pub-Sub Example link, which shows an example using Curator. The note in that example says "it is not meant for production". Are they saying it's not a good pattern to use with Curator and they're only using it as an example to show Curator features? Or are they saying the pattern works well with Curator, but the example itself is not something you would implement in production?
It would also help to know the pros and cons besides the well-known 1 MB limit on znode size.
Your help will be greatly appreciated!

I came across this Curator Tech Note link, which talks about using ZooKeeper as a queue source. It looks like most (if not all) of the bullets on that page would apply to using ZooKeeper as a publish-subscribe source as well.
So the verdict is that ZooKeeper would make a poor publish-subscribe mechanism.
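For anyone evaluating the same idea, here is a minimal sketch of the watch-based pattern under discussion (not the linked example itself): a "subscriber" caches a topic znode and reacts when its data changes, while a "publisher" simply writes to that znode. It assumes the curator-recipes NodeCache, a ZooKeeper ensemble at localhost:2181, and a made-up topic path /pubsub/topic-a.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.NodeCache;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.nio.charset.StandardCharsets;

public class ZkPubSubSketch {
    private static final String TOPIC_PATH = "/pubsub/topic-a"; // hypothetical topic znode

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // "Subscriber": cache the topic znode and react whenever its data changes.
        NodeCache subscriber = new NodeCache(client, TOPIC_PATH);
        subscriber.getListenable().addListener(() -> {
            if (subscriber.getCurrentData() != null) {
                String message = new String(
                        subscriber.getCurrentData().getData(), StandardCharsets.UTF_8);
                System.out.println("Received: " + message);
            }
        });
        subscriber.start();

        // "Publisher": write the message (well under the 1 MB znode limit) to the topic znode.
        byte[] payload = "hello subscribers".getBytes(StandardCharsets.UTF_8);
        if (client.checkExists().forPath(TOPIC_PATH) == null) {
            client.create().creatingParentsIfNeeded().forPath(TOPIC_PATH, payload);
        } else {
            client.setData().forPath(TOPIC_PATH, payload);
        }

        Thread.sleep(2000); // give the watch time to fire (demo only)
        subscriber.close();
        client.close();
    }
}

Even in this toy form the Tech Note's caveats show: a subscriber only sees the latest write (intermediate updates can be missed), there is no backlog or delivery guarantee, and every message is bounded by the znode size limit.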

Related

How to write my own MQTT client code without using any API

I'd like to ask for recommendations. I want to write a client program using MQTT to publish data to the broker. I can't use any of the client libraries available on the internet. I would like to ask where I can find examples that work similarly to this as an inspiration for my work; any suggestions regarding the steps I should follow would be highly appreciated.
Thank you!
To do this, you have to know the MQTT protocol well, so that you can correctly use the commands and messages needed to perform the communication.
Regarding the protocol, you can find documentation on the official page and a good explanation on this HiveMQ page. One example of a simple MQTT client implementation, in Java, can be seen in this tutorial.
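If a concrete starting point helps, below is a minimal, non-authoritative sketch in Java of what "no client library" means in practice: hand-building an MQTT 3.1.1 CONNECT packet and reading the CONNACK over a raw TCP socket. The broker address (localhost:1883) and the client id are assumptions; a real client would still need the full variable-length "remaining length" encoding, keep-alive PINGREQs, PUBLISH/SUBSCRIBE handling, and error recovery described in the specification.

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawMqttConnect {
    public static void main(String[] args) throws Exception {
        byte[] clientId = "demo-client".getBytes(StandardCharsets.UTF_8);

        // Variable header + payload of an MQTT 3.1.1 CONNECT packet (clean session).
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.write(new byte[] {0x00, 0x04, 'M', 'Q', 'T', 'T'}); // protocol name
        body.write(0x04);                                        // protocol level 4 = 3.1.1
        body.write(0x02);                                        // connect flags: clean session
        body.write(new byte[] {0x00, 0x3C});                     // keep-alive: 60 seconds
        body.write(new byte[] {(byte) (clientId.length >> 8), (byte) clientId.length});
        body.write(clientId);                                    // payload: client identifier

        try (Socket socket = new Socket("localhost", 1883)) {    // assumed broker address
            OutputStream out = socket.getOutputStream();
            out.write(0x10);          // fixed header: CONNECT packet type
            out.write(body.size());   // remaining length (one byte is enough for this small packet)
            out.write(body.toByteArray());
            out.flush();

            // CONNACK is 4 bytes: 0x20, 0x02, session-present flag, return code.
            InputStream in = socket.getInputStream();
            byte[] connack = new byte[4];
            int read = 0;
            while (read < 4) {
                int n = in.read(connack, read, 4 - read);
                if (n < 0) throw new RuntimeException("Broker closed the connection");
                read += n;
            }
            System.out.println("CONNACK return code: " + connack[3] + " (0 = accepted)");
        }
    }
}

Once CONNECT/CONNACK works, PUBLISH is built the same way: a fixed header (0x30 for QoS 0), the topic name as a length-prefixed UTF-8 string, and then the payload bytes.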

MongoDB change streams in Java

Since this feature is relatively new (MongoDB 3.6), I found very few Java examples.
My questions:
1. What are the best practices for watching change streams?
2. Does watching the stream have to be a blocking call? (That would mean a thread per collection, which is less desirable.)
This is the example I encountered:
http://mongodb.github.io/mongo-java-driver/3.6/driver/tutorials/change-streams/
The blocking call is:
collection.watch().forEach(printBlock);
Thanks,
Rotem.
Change streams make a lot more sense when you look at them in the context of reactive streams. It took me a while to realize this concept has a much broader existence than just the MongoDB driver.
I recommend reviewing the article above and then looking at the example provided here. The two links helped clear things up, and provided insight on how to write code leveraging the reactive streams Mongo driver, which is non-blocking.
Use the MongoDB reactive driver so that the call is non-blocking. We have used this approach in production for the last month with no issues.
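As a rough illustration of that suggestion, here is a minimal sketch using the reactive streams driver (mongodb-driver-reactivestreams). The connection string, database, and collection names are placeholders, change streams require a replica set or sharded cluster, and the subscriber simply requests unbounded demand; production code would also manage demand, resume tokens, and error handling.

import com.mongodb.client.model.changestream.ChangeStreamDocument;
import com.mongodb.reactivestreams.client.MongoClient;
import com.mongodb.reactivestreams.client.MongoClients;
import com.mongodb.reactivestreams.client.MongoCollection;
import org.bson.Document;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

public class ChangeStreamWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string, database, and collection names.
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> collection =
                client.getDatabase("test").getCollection("events");

        // Subscribing is non-blocking: changes are delivered to onNext on driver threads.
        collection.watch().subscribe(new Subscriber<ChangeStreamDocument<Document>>() {
            @Override
            public void onSubscribe(Subscription s) {
                s.request(Long.MAX_VALUE); // unbounded demand, for the sketch only
            }

            @Override
            public void onNext(ChangeStreamDocument<Document> change) {
                System.out.println("Change: " + change.getOperationType()
                        + " " + change.getDocumentKey());
            }

            @Override
            public void onError(Throwable t) {
                t.printStackTrace();
            }

            @Override
            public void onComplete() {
                System.out.println("Change stream closed");
            }
        });

        // Keep the JVM alive so events can arrive (demo only).
        Thread.sleep(60_000);
        client.close();
    }
}

The same Publisher can also be consumed with a reactive library such as Project Reactor or RxJava instead of a hand-written Subscriber, which avoids dedicating a blocked thread per collection.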

Apache Stanbol scalability and real-world applications

I'm starting a project with requirements such as NLP, storage of semantic data, content management, etc., and Apache Stanbol seems like a nice fit, but I'm not exactly sure it's ready, so I'm trying to make an appropriate assessment before starting to work with it. There are a few things that worry me:
Stanbol seems a bit young and immature (newest version 0.12). Has anybody used it in a commercial project/application/setup (I failed to find this information online)? What is the scale of those projects?
How horizontally scalable is Stanbol? What are its cloud/clustering capabilities? As far as I know it relies on Apache Jena for storage, and Jena storage isn't horizontally scalable, which would make Stanbol unable to scale horizontally as well. This is my current understanding; please correct me if I'm wrong. Maybe Jena can be swapped with something else as the RDF storage provider and I'm not aware of it?
Learning resources for Stanbol seem a little scarce. Does anyone know of a place/book/whatever where I can get more understanding about Stanbol under the hood (other than the official Stanbol website and the IKS website)?
Are there any good alternatives? I know there are nice alternatives regarding NLP (e.g. GATE, UIMA), but they lack CMS capabilities.
Thanks.
To your questions:
1) I've been working on a project involving Stanbol (version 0.10). It's still in the pre-production stage. For CMS, we evaluated JackRabbit and Alfresco; Alfresco (CMIS) was found to be a better choice in our case. What I like about Stanbol is the enhancement chains and the set of Enhancement Engines that come by default. This is a small to mid-size project.
3) I found the book Instant Apache Stanbol (Packt Publishing) very practical and useful while going about my work, especially the sections on the Entityhub and Enhancement Engines.
A viable option is to use Redlink, which offers content analysis and linked data services in the cloud using Apache Stanbol and Apache Marmotta in the back end.
The Redlink team has worked on IKS and Apache Stanbol, so getting in contact with them can be a good starting point when deciding to use these technologies in production environments.

Scala distributed middleware

Is there any distributed middleware, like JXTA or JMS, for Scala?
I'm looking for a middleware that provides discovery, name service, service publications, availability verification, groups and so on, for Scala language.
The Akka stack has many features; if you want an AMQP reference, see http://doc.akka.io/docs/akka-modules/1.3.1/modules/amqp.html
I just wrote this to help you use JMS in Scala, should you so wish
https://github.com/fancellu/jmsScala
I'm not sure what you mean by "for Scala". Since it runs within the JVM, with very good interoperability with Java, you can just use whatever Java-based facilities float your boat. In some cases there may be Scala "skins" available, but it doesn't matter a lot. For instance, here's an article on using ActiveMQ from Scala, with no Scala skin.
I think I would go further than Viral and say that Akka is the answer you're looking for. It doesn't provide all the features you mention (or at least takes a different approach to some of them) but it's a very powerful middleware suite that distributes well and is getting a lot of use and attention.

Please suggest direction for my small scala project

As a six-month academic project in college, my three friends and I are going to implement "Distributed Caching" in the Scala language.
Being new to both of these concepts, and this being our first project, I would be really happy if you guys could provide some direction.
I am currently learning scala.
Please let me know which particular features of the language should be learned for this particular project.
Are there any online resources for learning about distributed caching?
Thanks in advance.
You could have a look at Terracotta, especially at how it is used to implement distributed caching, and at the source code of its open source edition. You could even consider Terracotta as the framework for building your distributed cache. I don't have any personal experience using Terracotta with Scala, but it has been done.
Features of the language... Try starting with the Programming in Scala book; it's a very good resource. If you want to do any concurrency, you will have to be proficient in using actors. I would recommend having a look over all the features of Scala; each one has its uses, and you will need to know at least a bit of them to recognise the situations in which to use their power. :)
-- Flaviu Cipcigan
You might want to look at the project Velocity page.
There is also an article on MSDN about distributed caching in general.
I'm not sure, but I think the Akka project might already be doing what you're looking for (and a whole lot more). Perhaps you can take inspiration from that.