Anyone using HyperDex in production? - nosql

I just noticed that the relatively new open-source NoSQL database "HyperDex" has not been mentioned in any Stack Overflow questions yet. Is anyone using it? How does it compare with other NoSQL engines?

We're working with some folks at LinkedIn to use HyperDex to power some of their custom analytics applications. We're also in discussion with a couple of startups to build applications on top of HyperDex.
Our mailing list is the HyperDex discuss list and has been relatively active as of late. Many of these folks are using HyperDex and helping us to improve it.
Finally, HyperDex holds its own against other popular NoSQL engines. The HyperDex performance benchmarks show that HyperDex offers both higher throughput and lower latency than other popular systems. The HyperDex tutorials show just how easy it is to deploy a cluster. Start with the QuickStart and work your way up to deploying a truly fault tolerant cluster.

Emin gave a talk at our company today. It sounds like an interesting project, but I think you would run into some gotchas if you deployed it in production, such as load balancing and choosing an optimal subspace scheme.
You can find the performance comparison at their site. The paper is also worth reading.

Apache Stanbol scalability and real-world applications

I'm starting a project with requirements such as NLP, storage of semantic data, content management, etc., and Apache Stanbol seems like a nice fit. I'm not sure it's ready, though, so I'm trying to make a proper assessment before starting to work with it, as there are a few things that worry me:
1) Stanbol seems a bit young and immature (newest version 0.12). Has anybody used it in a commercial project/application/setup (I failed to find this information online)? What is the scale of those projects?
2) How horizontally scalable is Stanbol? What are its cloud/clustering capabilities? As far as I know it relies on Apache Jena for storage, and Jena storage isn't horizontally scalable which would make Stanbol unable to scale horizontally as well. I might be wrong about this, but this is my current understanding, please correct me if I'm wrong. Maybe Jena can be swapped with something else to be used as RDF storage provider and I'm not aware of it?
3) Learning resources for Stanbol seem a little scarce. Does anyone know of a place/book/whatever where I can get more understanding about Stanbol under the hood (other than the official Stanbol website and the IKS website)?
4) Are there any good alternatives? I know there are nice alternatives regarding NLP (e.g. GATE, UIMA), but they lack CMS capabilities.
Thanks.
To your questions:
1) I've been working on a project involving Stanbol (version 0.10). It's still in the pre-production stage. For the CMS, we evaluated Jackrabbit and Alfresco; Alfresco (CMIS) was found to be the better choice in our case. What I like about Stanbol is the enhancement chains and the set of Enhancement Engines that come by default. This is a small to mid-size project.
3) I found the book Instant Apache Stanbol (Packt Publishing) very practical and useful while going about my work, especially the sections on Entity hubs and Enhancement engines.
A viable option is to use Redlink, which offers content analysis and linked data services in the cloud using Apache Stanbol and Apache Marmotta in the back end.
The Redlink team has worked on IKS and Apache Stanbol, so getting in contact with them can be a good starting point when deciding whether to use these technologies in production environments.

Is a stateful framework such as Lift scalable?

I watched Harry Heymann, from Foursquare, give a presentation on Lift to the BASE user group. In that video he mentioned that Lift, being stateful, isn't going to scale well.
Is that true? If so, why is that? Note: I know very little about stateful frameworks.
I can't seem to find it on Google; I'll look for it later. Thank you in advance.
When this question comes up on the Lift mailing list, the author of the framework usually replies with a blog post he wrote some time ago, which explains why Lift is stateful and, at the same time, how you can use Lift as a stateless framework.
This is the link
David Pollak has a good answer to this in this Quora thread, in a comment on Jackson Davis's answer:
In practice, scaling a Lift site is much much easier than scaling a LAMP site. Why? Well, state exists someplace. If it exists in the JVM, you get a lot of performance benefits and stability as well as, in Lift's case, lots of security. Contrast that with sessions in memcached. "Whoops, memcached went down, there go a pile of sessions." "Whoops, we've got a new memcached hashing algorithm, there go all the sessions." "Whoops, Google just crawled us creating 200,000 new sessions pushing all but the active sessions out of cache." "Whoops, the Ruby runtime just went wild, ate all the VM on one of our boxes, memcached went down..." So, you try storing sessions in some wacky shared version of MySQL. This solution requires tons of hardware and a team to make sure that the sharing code is correct, etc. Contrast that to using Nginx, Jetty and session affinity. It's about 4 hours of setup time and it just works. See http://blog.harryh.org/post/7550... So, talk to a Facebook engineer about the challenges they go through to manage state between the front end, memcached, MySQL, etc. Compare that to Twitter with the famous fail whale. Compare that to Apple's store and the iTunes store which are written on WebObjects (which is highly stateful.) Lift apps running at scale typically require 7% of the front end resources of a LAMP app. The Lift apps that are running at scale (Foursquare and Novell Pulse are two) do not have the kind of scaling issues associated with LAMP sites that have similar traffic patterns. Scaling with Lift is neither tricky, nor risky. It's simple. It's known. It's proven. Scaling with LAMP is playing whack-a-mole with state and that only becomes a problem at scale. -David Pollak • Jul 20, 2010
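For readers unfamiliar with the "session affinity" mentioned in the quote, here is a minimal sketch of the idea in Python. It is only an illustration: the backend names are hypothetical, and in a real deployment this routing is done by the load balancer's configuration (Nginx in the quote), not by application code. The point is that every request carrying the same session ID lands on the same JVM, so session state can stay in that JVM's memory.

```python
import hashlib

# Hypothetical application servers sitting behind the load balancer.
BACKENDS = ["jetty-1:8080", "jetty-2:8080", "jetty-3:8080"]


def pick_backend(session_id: str, backends=BACKENDS) -> str:
    """Map a session ID to one backend, always the same one for that ID."""
    # A stable hash (unlike Python's built-in hash(), which is salted per
    # process) guarantees the same choice across restarts and processes.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    for sid in ["alice-session", "bob-session", "alice-session"]:
        print(sid, "->", pick_backend(sid))
```

The trade-off is the one the thread is arguing about: if a backend dies, the sessions pinned to it are lost unless they are also persisted somewhere else.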
I think it's pretty clear that Lift apps scale very well. Foursquare and the UK Guardian are both using Lift. Both sites are very highly trafficked and neither has had a material Lift-related outage. Please also see the link that Diego posted. It provides an in-depth discussion of scaling Lift-powered sites.

SmartGWT, ZK and GenericFrame - Online Homework

Good day,
Our school, a small high school in semi-rural New Zealand, is currently looking into online homework solutions. Being one of the IT guys, I have been asked to look into some of the options. We have checked around and there are no robust solutions that cover what we are looking for. So, we are considering development of our own system, either on our own or in collaboration with some other schools.
Before I put significant time into any one option, I thought I should ask for some expert advice.
Please keep in mind that one of our major obstacles is that around 20% of our students are on dial-up because broadband is not available in their area.
We are also not limited to the technologies listed, they just are the ones that we have been looking into up to this point.
With that in mind, here goes.
1. Is there a way to pre-determine the bandwidth needed for these technologies?
2. If bandwidth continued to be too limiting, could the final solution stand alone so we could distribute it to students on CD or USB stick?
3. What are some pros/cons of each for use with databases, specifically MySQL or PostgreSQL? (After all, we do need to keep track of lots of data.)
4. What are some pros/cons of each of these for RIA development?
I appreciate everyone for sharing their time and expertise on the matter.
Cheers,
Ben
1) If you write a full-AJAX application, such as in GWT, the bandwidth will be:
a) the size of the application's JavaScript, images, etc. You may assume that everything is loaded when the user logs in (the image cache may seem big, but it fills up easily)
b) the size of the communication. In GWT this depends only on you! There is no magic full-frame reloading; only what YOU choose to send gets sent
2) I don't quite get your point. Stand-alone applications can be distributed that way, but applications that use databases generally can't
3) PostgreSQL is highly compatible with Oracle: it has the same transaction and SELECT FOR UPDATE behaviour, and PL/pgSQL is heavily inspired by PL/SQL (so stored procedures are easy to rewrite).
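To put rough numbers on point 1, a back-of-the-envelope estimate can be sketched as below. The payload sizes are made-up placeholders, not measurements of GWT, SmartGWT or ZK; you would replace them with sizes taken from your own build (for example from the browser's network panel). The 56 kbit/s figure is the nominal dial-up rate, so real transfers will be slower.

```python
DIALUP_BITS_PER_SECOND = 56_000  # nominal dial-up rate; real throughput is lower


def transfer_seconds(payload_bytes: int,
                     link_bits_per_second: float = DIALUP_BITS_PER_SECOND) -> float:
    """Ideal time to move payload_bytes over the link, ignoring latency and overhead."""
    return payload_bytes * 8 / link_bits_per_second


# Hypothetical sizes, for illustration only.
initial_load_bytes = 500 * 1024  # compiled JavaScript + images, fetched once then cached
per_request_bytes = 2 * 1024     # one small AJAX call, e.g. submitting an answer

if __name__ == "__main__":
    print(f"first load : {transfer_seconds(initial_load_bytes):.1f} s")
    print(f"per request: {transfer_seconds(per_request_bytes):.2f} s")
```

With these placeholder sizes the first load takes a little over a minute on dial-up while each later request takes a fraction of a second, which is why the one-off application download (point a) usually matters more than the ongoing communication (point b).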
I personally suggest MySQL for a school project for its simplicity. PostgreSQL is powerful but a bit complicated to configure, and its visual tool for optimizing queries is not good.
Leaving bandwidth aside, I definitely suggest ZK since, again, it is much easier to learn, develop with and maintain (and also much more powerful). The bandwidth consumption and latency of GWT really depend on how much effort you want to invest and how familiar your people are with distributed computing, while the network traffic is basically the state of the UI (not data), which is reasonably small. In short, you can get the best network bandwidth and latency with GWT if you optimize it fully, while with ZK there is less to worry about but, if you want to improve things, you have to use jQuery (i.e., work in JavaScript).
Thanks lechlukasz, I appreciate your comments and insight.
I will clarify my point about stand alone applications. We have a number of students, as high as 20%, who do not have access to broadband due to their geographic location. We are considering, as part of the design, how we may be able to distribute a stand alone version.
For instance, if we were to abstract all the database calls using a separate class in GWT, we could recompile a stand alone version that didn't make the database calls. The database would likely only be for tracking results and reporting.
In reality, we would likely implement the front end product first with references to empty methods for storing the results in a database and implement those methods at a later time.
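The approach described above (front end written against a storage abstraction, with empty methods first and a database-backed version later) can be sketched as follows. This is written in Python purely for brevity; in GWT the same shape would be a Java interface with two implementations. All class and method names here are hypothetical, and the in-memory store merely stands in for a real MySQL or PostgreSQL implementation.

```python
from abc import ABC, abstractmethod


class ResultStore(ABC):
    """Everything the front end knows about persistence."""

    @abstractmethod
    def save_result(self, student_id: str, assignment_id: str, score: float) -> None:
        ...


class NoOpResultStore(ResultStore):
    """Stand-alone (CD/USB) build: results are simply not recorded."""

    def save_result(self, student_id, assignment_id, score):
        pass


class InMemoryResultStore(ResultStore):
    """Placeholder for the later database-backed implementation."""

    def __init__(self):
        self.rows = []

    def save_result(self, student_id, assignment_id, score):
        self.rows.append((student_id, assignment_id, score))


class HomeworkApp:
    def __init__(self, store: ResultStore):
        self.store = store  # chosen once, when the build is assembled

    def submit(self, student_id, assignment_id, answers, answer_key) -> float:
        correct = sum(1 for a, k in zip(answers, answer_key) if a == k)
        score = correct / len(answer_key)
        self.store.save_result(student_id, assignment_id, score)
        return score


if __name__ == "__main__":
    offline = HomeworkApp(NoOpResultStore())
    print(offline.submit("s1", "hw1", ["a", "b"], ["a", "c"]))
```

Because the marking logic never touches the store's internals, the offline and online builds differ only in which `ResultStore` is passed in.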
For the record, we have started to code up some test cases using GWT/SmartGWT and are pleased with the results. Although we cannot comment on the other technologies considered because we didn't try them to the same extent, we are pleased with the results to this point of the project.
Cheers,
Ben

Choosing NoSQL - availability prioritized

We have thought a bit about running a NoSQL database for our next project. However, we're not sure which platform will give us the best possible availability and has the best built-in replication features to provide it, with the least headache.
Right now, Cassandra appears to be the best candidate, but we would like to hear more about this from someone who has more experience in this area than we do.
Thanks a lot!
High availability will most likely be achieved with a Dynamo clone.
Cassandra is a good option, although it has been bashed recently by several early adopters.
Project Voldemort is also Dynamo-based and therefore easily optimized for high availability; it's what LinkedIn is using.
Another interesting NoSQL option might be membase. I haven't used it myself, but their notion of virtual buckets for rebalancing, as opposed to plain consistent hashing, makes a lot of sense and would appear to provide more robust high availability.
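To illustrate the virtual-bucket idea mentioned above, here is a minimal sketch. It is not membase's actual implementation: the bucket count, server names and rebalancing policy are assumptions chosen for readability. The key point is the two-step lookup: a key always hashes to the same virtual bucket, and a separate table says which server currently owns that bucket, so rebalancing only edits the table.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 64  # hypothetical; fixed for the lifetime of the cluster


def vbucket_for(key: str) -> int:
    """Step 1: key -> virtual bucket. Never changes when servers come or go."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    """Step 2 table: virtual bucket -> owning server, spread round-robin."""
    return {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}


def add_server(vbucket_map, new_server):
    """Hand the new server its fair share by moving whole buckets from the fullest owners."""
    servers = sorted(set(vbucket_map.values())) + [new_server]
    target = NUM_VBUCKETS // len(servers)
    counts = Counter(vbucket_map.values())
    new_map = dict(vbucket_map)
    moved = 0
    for vb in sorted(new_map):
        if moved >= target:
            break
        owner = new_map[vb]
        if counts[owner] > target:
            new_map[vb] = new_server
            counts[owner] -= 1
            moved += 1
    return new_map


def server_for(key, vbucket_map):
    return vbucket_map[vbucket_for(key)]


if __name__ == "__main__":
    before = build_map(["node-a", "node-b"])
    after = add_server(before, "node-c")
    moved = [vb for vb in before if before[vb] != after[vb]]
    print(f"{len(moved)} of {NUM_VBUCKETS} buckets moved")
```

Only the buckets that were explicitly reassigned change owner, and each bucket can be copied to its new server as a unit before the table is switched over.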

Jitterbit vs. BizTalk

Is there anyone who has used or looked into using Jitterbit as well as BizTalk? If so, what are some pros and cons of each, and which one did you go with as your final solution?
Specifically, I'm looking for SAP integration, but any input would be appreciated.
Like Rob, I had not heard of Jitterbit until reading your question (thanks!). I have, however, been working with BizTalk, almost exclusively, for the past 9 years. For that reason I wasn't sure I should be responding, but as Rob did, and nobody else has, I figured it's worth a couple of cents....
From the little reading I've done, it seems to me that Jitterbit, apart from being open source, which has its pros and cons, is trying to lower the entry barrier by offering a relatively simple solution with the promise of rapid development and a drag-n-drop approach "with no custom code".
I'll take their promise at face value, as I know nothing about it, although I have my doubts. So let's assume developing with Jitterbit is really easy; there's one thing I can clearly state: developing with BizTalk isn't.
But, and that's a big but in my view, developing with BizTalk is somewhat difficult not because Microsoft did a bad job of it. On the contrary, it is somewhat difficult because Microsoft wanted to create a tool that could realistically allow enterprises to solve their BPM and integration needs well, and, in my experience, these problems are almost never simple. So Microsoft built a server that has many capabilities and is very strong and very flexible, at the cost of complexity.
So, while any experienced technical sales guy can give you a demo of a very simple integration scenario, developed in a few minutes using a lot of drag and drop and configuration, even in BizTalk, is this a realistic enterprise-level solution? Was it a realistic scenario that was demonstrated? From my experience the answer is almost exclusively no; the problems tend to be complex, and they require a more robust solution.
So, I guess the bottom line would be: if you're looking for a one-off solution, and open source is something you guys work with, Jitterbit is definitely worth looking at, to see if it's capable of helping out and does, indeed, have a short learning curve (it would be important to look at maintenance, monitoring, troubleshooting, instance management, etc.)
If, however, you believe, as is often the case, that your solution would grow to become a BPM/integration platform in your organisation, and you need something more robust - I would put my money on BizTalk being a better candidate.
I've done a fair bit of integration with SAP, starting with the old SAP DCOM connector. More recently I've been involved in the selection of an integration platform to serve in an Enterprise Service Bus pattern.
We did web service samples to connect to SAP on a number of platforms, including BizTalk, Mule, Netweaver, Webmethods and Tibco. Webmethods won out based on licensing and capability, though BizTalk and Netweaver both had very high marks.
Jitterbit was not part of the evaluation - in fact I had to look it up to be sure I understood your question.
If your goal is just to be able to call an RFC, the .NET SAP connector works well.
If your goal is to expose a web service that wraps a process in SAP, then BizTalk is good, but I recommend you check whether your organization already has NetWeaver licensed, as there are many web services available directly from SAP with no coding.
My recommendation is to avoid Jitterbit and Mule for the enterprise for now, unless open source is actually a popular thing at your place of employment. NetWeaver and BizTalk are very robust, polished products.
If you are looking for something you can ship easily, then Jitterbit may make more sense. Generally, though, I'd recommend you define it as a web service call, and look to your customers' technology stack for the most appropriate integration technique.
More context of what you are looking to achieve will enable a more accurate answer.
Michael,
We use Jitterbit in our organization and we've been very successful with it in various projects. Our SAP projects use XI and Jitterbit has dramatically simplified the ability to integrate web service interfaces with the various protocols it supports.
In addition to an excellent price (and we now subscribe to Jitterbit for support) we realize great value out of the support service. If we have any questions during our implementations they seem to provide all the subject matter expertise included in the support cost, so we're quite self sufficient.
We still have many other integration solutions in our company including VB and Java programs; it's a mess, but we don't believe that any one platform will meet all of our different divisions' needs. We have been using open source, specifically Linux and Apache for many years now, although IBM and Microsoft are also prevalent here.
We went with Jitterbit as it supports protocols needed to integrate any modern system and with SOA / Web Services being our stated direction Jitterbit was a great fit for what we needed.
Given that Jitterbit is Open Source, I would encourage you to download it and try it out.
I will say it simply: I have been using BizTalk and was one of the people who helped validate the 2006 training course. BizTalk is by far one of the best server applications for business processes available today. You also have to factor in that the price point is ridiculously low compared to what else is out there.