Apache Stanbol scalability and real-world applications - content-management-system

I'm starting a project with requirements such as NLP, storage of semantic data, content management etc., and Apache Stanbol seems like a nice fit. I'm not exactly sure it's ready, though, so I'm trying to make an appropriate assessment before starting to work with it, as there are a few things that worry me:
1) Stanbol seems a bit young and immature (newest version 0.12). Has anybody used it in a commercial project/application/setup (I failed to find this information online)? What is the scale of those projects?
2) How horizontally scalable is Stanbol? What are its cloud/clustering capabilities? As far as I know it relies on Apache Jena for storage, and Jena storage isn't horizontally scalable which would make Stanbol unable to scale horizontally as well. I might be wrong about this, but this is my current understanding, please correct me if I'm wrong. Maybe Jena can be swapped with something else to be used as RDF storage provider and I'm not aware of it?
3) Learning resources for Stanbol seem a little scarce. Does anyone know of a place/book/whatever where I can get more understanding about Stanbol under the hood (other than the official Stanbol website and the IKS website)?
4) Are there any good alternatives? I know there are nice alternatives regarding NLP (e.g. GATE, UIMA), but they lack CMS capabilities.
Thanks.

To your questions:
1) I've been working on a project involving Stanbol (version 0.10). It's still in the pre-production stage. For the CMS, we evaluated Jackrabbit and Alfresco; Alfresco (CMIS) was found to be the better choice in our case. What I like about Stanbol is the enhancement chains and the set of Enhancement Engines that come by default. This is a small to mid-size project.
3) I found this book (Instant Apache Stanbol, Packt Publishing) very practical and useful while going about my work, especially the sections on the Entityhub and Enhancement Engines.
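To give a feel for how an enhancement chain is called, here is a minimal Python sketch (my own illustration, not taken from the project above) that builds the HTTP request for Stanbol's Enhancer REST endpoint. It assumes a default launcher listening on http://localhost:8080 and that named chains are addressed as /enhancer/chain/<name>; the function name and the base URL constant are hypothetical, so adjust them to your setup. The request is only built here, not sent.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local Stanbol launcher running on its default port.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(text: str, chain: Optional[str] = None,
                           base: str = STANBOL_BASE) -> Request:
    """Build (but do not send) a POST request for the Stanbol Enhancer.

    With chain=None the default chain at /enhancer is addressed;
    otherwise the named chain at /enhancer/chain/<name>.
    """
    if chain is None:
        path = "/enhancer"
    else:
        path = "/enhancer/chain/" + quote(chain, safe="")
    return Request(
        base + path,
        data=text.encode("utf-8"),  # plain text to be analysed
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",  # ask for the enhancements as Turtle RDF
        },
        method="POST",
    )
```

To actually run it against a live instance you would pass the result to urllib.request.urlopen() and read the RDF from the response body.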

A viable option is to use Redlink, which offers content analysis and linked data services in the cloud using Apache Stanbol and Apache Marmotta in the back-end.
The Redlink team has worked on IKS and Apache Stanbol, so getting in contact with them can be a good starting point when deciding whether to use these technologies in production environments.

Related

Do you know any performant CMS with file based configuration?

I've been looking for an alternative to Drupal for a while now, mainly because I dislike the way Drupal stores all configuration in the database, thus making team development and deployment a real struggle (I'm aware that configuration management is one of the improvements coming up with Drupal 8 and I've been fighting with Features and Strongarm). The other issue I'm having with Drupal is its high memory footprint and the bazillion database queries it does with every page call. I'm not asking for Drupal support, this is just to clarify what I'm looking for when searching for alternatives.
So far I've only found two promising systems that handle configuration right:
Locomotive CMS (built with Ruby on Rails)
Bolt CMS (built with PHP on top of Silex and Symfony)
Is anybody aware of other CMSes that offer a versionable, file-based configuration and therefore painless collaboration and deployment? I couldn't find any CMS round-up that looks at things from this point of view.

Anyone using HyperDex in production?

I just noticed that the relatively new open source NoSQL database "HyperDex" has no mentions in questions on S.O. yet - is anyone using it? How does it compare with other NoSQL engines?
We're working with some folks at LinkedIn to use HyperDex to power some of their custom analytics applications. We're also in discussion with a couple startups to build applications on top of HyperDex.
Our mailing list is the HyperDex discuss list and has been relatively active as of late. Many of these folks are using HyperDex and helping us to improve it.
Finally, HyperDex holds its own against other popular NoSQL engines. The HyperDex performance benchmarks show that HyperDex offers both higher throughput and lower latency than other popular systems. The HyperDex tutorials show just how easy it is to deploy a cluster. Start with the QuickStart and work your way up to deploying a truly fault tolerant cluster.
Emin gave a talk at our company today. It sounds like an interesting project, but I think you would run into some gotchas if you deployed it in production, such as load balancing and choosing an optimal subspace scheme.
You can find the performance comparison at their site. The paper is also worth reading.

What is a good starting point for developing a RESTful web service in Clojure?

I am looking for something lightweight that, at a minimum, supports the following features:
Support for easy definition of actions through metadata
Wrapper that extracts parameters from the request into a Clojure map, or as function parameters
Support for multiple forms of authentication (basic, form, cookie)
Basic authorization based on API method metadata
Session object wrapped in a Clojure map
Live coding from the REPL (no need to restart the server)
Automatic serialization of the return value to JSON and XML
Nice (pluggable) URL parameter handling (e.g. /action/par1/par2 instead of /action?par1=val1&par2=val2)
I know it is relatively easy to roll my own micro-framework for each one of these options, but why reinvent the wheel if something like that already exists? Especially if it:
Is an active project with a rising number of contributors/users
Has at least basic documentation and a tutorial online.
First of all, I think that you are unlikely to find a single shrinkwrapped solution to do all this in Clojure (except in the form of a Java library to be used through interop). What is becoming Clojure's standard Web stack comprises a number of libraries which people mix and match in all sorts of ways (since they happily tend to be perfectly compatible).[1]
Here's a list of some building blocks which you might find useful:
Ring -- Clojure's basic HTTP request handling library; all the other webby libraries (for writing routes &c.) that I know of are compatible with Ring. Ring is being actively developed, has a robust community, is very well-written and has a nice SPEC document detailing its design philosophy. This blog post provides a nice example of how it might be used (reacting to GitHub commits).
Sandbar -- currently an authentication library, more types of functionality planned; under development.
Compojure -- a mature and robust library which provides a nice DSL for writing routes to be used on top of Ring. This will give you the nice URL parameter handling.
Compojure-rest -- "a library for building RESTful applications on top of Compojure". Compojure-rest is, as far as I can tell, in its early stages of development; perhaps you might see this as an opportunity to influence its design. :-)
For dealing with XML, there's clojure.contrib.lazy-xml (and the helper library clojure.contrib.zip-filter.xml) and Enlive (the built-in clojure.xml namespace is currently not very usable); these would be used in tandem (though for your purposes the former might suffice).
For JSON, there is a library in contrib and clojure-json (and I think there was at least one other lib I seem to be forgetting now...); pick the one you like best.
All of these will be perfectly happy with a REPL-driven development style (see the accepted answer to this SO question for a Ring trick which is very much to the purpose here). I suppose the above collection of links does leave a few blind spots (in particular, the authentication story is still being ironed out, as far as I can tell), but hopefully it's a good start.
[1] The only single-package solution for building webapps in Clojure that I know of is Conjure, inspired by Rails; unfortunately I have to admit that I don't know much about it, so if you feel interested, follow the link and look around the sources, wiki &c.
While building my first Clojure REST service I often found myself asking the same question. The Clojure Toolbox helped me a lot: http://www.clojure-toolbox.com/
If you are looking for some sample, real-world, illustrative code to get you started, then you could study the clojure-news-feed project on GitHub, which demonstrates how to implement a non-trivial RESTful web service with compojure/ring that wraps both SQL (PostgreSQL or MySQL) and NoSQL (Cassandra), search (Solr), caching (Redis), event logging (Kafka), connection pooling (c3p0), and real-time metrics via JMX.
This blog about Building a Scalable News Feed Web Service in Clojure provides a good introduction. I ran some load tests against this service on a humble AWS deployment and got about eighty transactions per second with less than a half second average latency per transaction.
Take a look at the liberator library: http://clojure-liberator.github.io/liberator/ It's not a standalone solution, but it is very good for REST service definition.
Just to provide an updated answer to this old question: currently (in 2018) I think Luminus provides an excellent starting point. It uses many of the libraries (ring, compojure, etc.) mentioned in previous answers, is modular, and is as close to "single package" as you can get with Clojure. Specifically for REST, take a look at compojure-api. Luminus recommends buddy for authentication; I've had good success using it for traditional session-based auth as well as OAuth and stateless JWTs.

Memcached and Velocity

I'm investigating using either Memcached or Velocity for distributed caching over a cluster of servers after reading Scott Hanselman's answer to this question. Does anybody know of a Microsoft web site that uses Velocity for its caching? If Microsoft aren't using it then does anybody know of any relatively popular web site that's using it?
It would be pretty foolish for any substantial site to go live (in production) on a CTP of a product (edit - good point in the comments - this isn't a hard rule... there are exceptions, for example stackoverflow). Velocity is currently in CTP2, which is good for building out proof-of-concept and planning for product releases, but that's all. Once it is a supported product, I'm sure we will see plenty of usage. Follow the Velocity product team blog (http://blogs.msdn.com/velocity/) for details.
As far as memcached vs Velocity, they have somewhat overlapping but ultimately different purposes. Memcached is not reliable. That is spelled out very clearly in the documentation and by the authors. It is intended to be blazingly fast, cheap to run and simple to administer. Velocity, on the other hand, is much more familiar to the formal enterprise software crowd. It is complex, with a robust API and is better for a more formal data environment.
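To make "not reliable" concrete: with a memcached-style cache any entry can disappear at any time (eviction, restart), so calling code has to treat a miss as normal and fall back to the real data source. Below is a minimal Python sketch of that cache-aside pattern. It uses an in-memory stand-in rather than a real memcached or Velocity client, and the class and function names are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Optional


class LossyCache:
    """In-memory stand-in for a memcached-style cache (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Returns None on a miss, just like a cache that lost the entry.
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def evict_all(self) -> None:
        # Simulates a restart or memory-pressure eviction.
        self._data.clear()


def cached_lookup(cache: LossyCache, key: str,
                  load: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache, fall back to the loader on a miss."""
    value = cache.get(key)
    if value is None:
        value = load(key)      # authoritative source, e.g. a database query
        cache.set(key, value)  # repopulate for later reads
    return value
```

The point of the design is that correctness never depends on the cache: losing every entry only costs extra calls to the loader.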
memcached is not natively supported on Win32. There is a project that aims to port memcached to Win32: http://jehiah.cz/projects/memcached-win32/
And while they have been successful, they lag a couple of versions (point versions at this point) behind the main release line. So if you're on Win32 I think your best bet would be Velocity.
So while I don't have an answer to your question (what sites use Velocity), I think you're better off going with Velocity over memcached.

Jitterbit vs. BizTalk

Is there anyone who has used or looked into using Jitterbit as well as BizTalk? If so, what are some pros and cons of each, and which one did you go with as your final solution?
Specifically, I'm looking for SAP integration, but any input would be appreciated.
Like Rob, I had not heard of Jitterbit until reading your question (thanks!). I have, however, been working with BizTalk, almost exclusively, for the past 9 years. For that reason I wasn't sure I should be responding, but as Rob did, and nobody else has, I figured it's worth a couple of cents...
From the little reading I've done, it seems to me that Jitterbit, apart from being open source (which has its pros and cons), is trying to lower the entry barrier by offering a relatively simple solution with the promise of rapid development and a drag-n-drop approach "with no custom code".
I'll take their promise at face value, as I know nothing about it, although I have my doubts. So let's assume developing with Jitterbit is really easy; there's one thing I can clearly state: developing with BizTalk isn't.
But, and that's a big but in my view, developing with BizTalk is somewhat difficult not because Microsoft did a bad job of it. On the contrary, it is somewhat difficult because Microsoft wanted to create a tool that could realistically allow enterprises to solve their BPM and integration needs well. In my experience these problems are almost never simple, so Microsoft built a server that has many capabilities and is very strong and very flexible, at the cost of complexity.
So, any experienced technical sales guy can give you a demo of a very simple integration scenario, developed in a few minutes using a lot of drag and drop and configuration, even in BizTalk. But is this a realistic enterprise-level solution? Was it a realistic scenario that was demonstrated? In my experience the answer is almost always no; the problems tend to be complex, and they require a more robust solution.
So, I guess the bottom line would be: if you're looking for a one-off solution, and open source is something you guys work with, Jitterbit is definitely worth looking at, to see if it's capable of helping out and does indeed have a short learning curve (it would be important to look at maintenance, monitoring, troubleshooting, instance management etc.).
If, however, you believe, as is often the case, that your solution would grow to become a BPM/integration platform in your organisation, and you need something more robust - I would put my money on BizTalk being a better candidate.
I've done a fair bit of integration with SAP, starting with the old SAP DCOM connector. More recently I've been involved in the selection of an integration platform to serve in an Enterprise Service Bus pattern.
We did web service samples to connect to SAP on a number of platforms, including BizTalk, Mule, Netweaver, Webmethods and Tibco. Webmethods won out based on licensing and capability, though BizTalk and Netweaver both had very high marks.
Jitterbit was not part of the evaluation - in fact I had to look it up to be sure I understood your question.
If your goal is just to be able to call an RFC, the .NET SAP connector works well.
If your goal is to expose a web service to wrap a process in SAP, then BizTalk is good, but I recommend you see if your organization already has NetWeaver licensed, as there are many web services available directly from SAP with no coding.
My recommendation is to avoid Jitterbit and Mule for the enterprise for now - unless Open Source is actually a popular thing at your place of employment. NetWeaver and BizTalk are very robust, polished products.
If you are looking for something you can ship easily, then Jitterbit may make more sense. Though generally I'd recommend you define it as a web service call, and look to your customers' technology stack for the most appropriate integration technique.
More context of what you are looking to achieve will enable a more accurate answer.
Michael,
We use Jitterbit in our organization and we've been very successful with it in various projects. Our SAP projects use XI and Jitterbit has dramatically simplified the ability to integrate web service interfaces with the various protocols it supports.
In addition to an excellent price (and we now subscribe to Jitterbit for support) we realize great value out of the support service. If we have any questions during our implementations they seem to provide all the subject matter expertise included in the support cost, so we're quite self sufficient.
We still have many other integration solutions in our company including VB and Java programs; it's a mess, but we don't believe that any one platform will meet all of our different divisions' needs. We have been using open source, specifically Linux and Apache for many years now, although IBM and Microsoft are also prevalent here.
We went with Jitterbit as it supports protocols needed to integrate any modern system and with SOA / Web Services being our stated direction Jitterbit was a great fit for what we needed.
Given that Jitterbit is Open Source, I would encourage you to download it and try it out.
I will say it simply: I have been using BizTalk and was one of the people who helped validate the 2006 training course. BizTalk is by far one of the best server applications for business processes available today. You also have to factor in that the price point is ridiculously low compared to what else is out there.