Consistency techniques for federated distributed systems (DB CAP, Simulations, etc.) - nosql

I am looking for an authoritative source of techniques used for consistency management (something better than best effort). A guideline, book, or other resource would be great.
For example, in distributed cloud databases, I am familiar with the five consistency levels offered by Azure Cosmos DB:
https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels (and, of course, Dynamo, CouchDB, etc. take similar approaches).
In simulation, HLA (High Level Architecture) has an "ACID" level of distributed update with a peer-to-peer system. See: https://www.acm-sigsim-mskr.org/Courseware/Fujimoto/Slides/FujimotoSlides-21-HLATimeManagement.pdf
The best survey I found was https://en.m.wikipedia.org/wiki/PACELC_theorem , but it is pretty thin.
If there is not a survey, is there some faculty at some university who has made this his/her career that we could consult with?

The authoritative source on this (in my humble opinion) is Martin Kleppmann and his book Designing Data-Intensive Applications. In particular, Chapter 9 deals with consistency models. Other chapters cover replication and other issues related to building distributed databases, so Chapter 9 should be read together with them.

Related

Can Apache nifi be used as an application server?

I'm an application developer who mainly develops and maintains enterprise applications such as ERP and HCM systems. After many years in the field, I started feeling that the way business systems are developed is not quite right. After years of maintenance and enhancement by hundreds of developers, a system keeps getting bigger and more complex. In the end, it is just impossible to make big changes to the system, because the logic is all tangled together like Italian noodles. Developers are afraid of causing severe customer issues.
Recently I found the flow-based programming (FBP) paradigm invented by J. Paul Morrison, and I find it really interesting. I very much approve of the idea of developing applications by drawing diagrams visually. As we all know, to develop a business system we start by drawing a business flow diagram. Why can't the business flow diagram just be the system itself?
Naturally, I tried to find FBP implementations, and NiFi is the one that the FBP inventor recommends. I haven't dug very deep into NiFi.
After watching some introduction videos and reading the documentation, I find that most of the time the NiFi experts talk about using NiFi for IoT systems, real-time streaming, and that kind of thing. It seems that NiFi is not related to business systems.
I am looking forward to someone clarifying my doubts. Is NiFi suitable for building business transactional systems?
Apache NiFi is definitely used for many "business logic" systems, especially taking on the role of handling extract/transform/load logic (ETL). While not strictly an ETL tool, NiFi can facilitate data routing and simple event processing in a number of scenarios. The "Powered By NiFi" page lists some public use cases of NiFi, and many are for "business systems" that do not relate to IoT.
Sorry I didn't see your question before - your comments are interesting. I am surprised that you say that NiFi is the FBP software that I recommend - I do list it as a product that is closer to the "classical" FBP philosophy than what we call "FBP-like" or "FBP-inspired" systems, and I assume it is one of the few FBP products that are in the marketplace - unlike my work, which is all public domain. The terms "FBP-like" and "FBP-inspired" are actually thanks to Joe Witt, the developer of NiFi. I try to describe the difference between "classical" FBP and "FBP-like" in my article on https://jpaulm.github.io/fbp/noflo.html . With all due respect to Joe, I find NiFi a bit over-complex, although his data packets are immutable, which has certain advantages. For a complete suite that takes you from a diagram to actual running code, I would suggest you start with the FBP diagramming tool, https://github.com/jpaulm/drawfbp , generate a JavaFBP network, using https://github.com/jpaulm/javafbp , and run! Both of these tools, as well as others on https://github.com/jpaulm/ , are open source. My colleague, Bob Corrick, and I are working on a tutorial which you may find helpful: https://github.com/jpaulm/fbp-tutorial-filter-file .

Enterprise application framework supporting DDD

I spent a short time studying Habanero and found it a good approach for building enterprise applications in a really short period of time.
The pattern which Habanero uses is "Active Record", as its developers say.
My questions are:
Are there any similar frameworks like Habanero which fully support Domain Driven Design by determining aggregate roots, entities and value objects?
Is it the right decision to use such tools in big organizations?
Is it worth training our team on such a tool?
Thank you.
Framework support for Domain Driven Design is quite different from frameworks supporting data-driven applications. Such a framework should increase the productivity of developers who work with a ubiquitous language that evolves with the business and that is learned from a domain expert.
Developers should not have to face concepts like aggregate roots, entities and value objects in the framework, because these are just modelling concepts: conceptual tools rather than ways to ease the development process. Thus a framework exposing abstract classes or interfaces named AggregateRoot, Entity or ValueObject is fundamentally broken. It doesn't provide any real value to an application, just useless indirection.
However:
There are a few frameworks designed to support domain driven design, listed here. Moreover, I'm developing one myself, based on previous experiences that worked very well.
It depends, obviously. For example, we used all of Epic's modeling patterns with success.
We used some "home made" frameworks too, and some of them proved to really increase productivity. However, such frameworks (if useful) always have steep learning curves, and it depends very much on how reliable the software has to be and on the developers' skills.
It depends on the framework, on the complexity of the business (if you don't need a domain expert to understand it, you don't need DDD) and on the developers, too. I have seen success stories and huge failures with different frameworks in different contexts. I've also given a conference talk on the topic (you can see the slides here).
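To make the point about framework base classes more concrete, here is a minimal Python sketch of a domain model written only in the language of the business. Order, OrderLine and Money are hypothetical names chosen for illustration; they do not come from Habanero or any other framework mentioned here. Order plays the role of an aggregate root and Money that of a value object, but neither inherits from a class named AggregateRoot or ValueObject.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Immutable and compared by value, with no ValueObject base class."""
    amount: Decimal
    currency: str

    def plus(self, other):
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)


@dataclass(frozen=True)
class OrderLine:
    product_code: str
    quantity: int
    unit_price: Money

    def total(self):
        return Money(self.unit_price.amount * self.quantity,
                     self.unit_price.currency)


class Order:
    """Acts as the aggregate root: lines are only changed through it."""

    def __init__(self, order_id, currency):
        self.order_id = order_id
        self._currency = currency
        self._lines = []

    def add_line(self, product_code, quantity, unit_price):
        # The invariant is enforced here, in business terms.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._lines.append(OrderLine(product_code, quantity, unit_price))

    def total(self):
        total = Money(Decimal("0"), self._currency)
        for line in self._lines:
            total = total.plus(line.total())
        return total
```

The modelling concepts are still there, but only as design decisions (immutability, a single entry point for changes), not as types the developer has to inherit from.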

Looking for a mature, scalable GraphDB with .NET or C++ binding

My basic requirements from a GraphDB:
Mature (production-ready)
Native .NET or C++ language binding
Horizontal scalability: both
Automated data redundancy and sharding
Distributed graph algorithms / query execution
Currently I disqualified the following:
InfiniteGraph: no C++ / .NET language binding
HyperGraphDB: no C++ / .NET language binding
Microsoft Trinity: Not mature
Neo4j: not distributed
I'm not sure about the scalability of the following:
Sparsity DEX
Franz Inc. AllegroGraph
Sones GraphDB
I found the available information about horizontal scalability capabilities quite general. I guess there are good reasons for this.
Any information would be appreciated.
Unfortunately, your basic requirements already go beyond today's general understanding of graphs, even in academia. No listed pure graph database will be able to satisfy all your needs. Distributed graph algorithms which are aware of large distributed but interconnected graphs are still a big research issue. So for your application it might be best to find a well-matching graph database, graph processing stack or RDF store and implement the missing parts on your own.
When your application is mostly Online Transactional Graph Processing (OLTP) (read/write heavy) with a focus on the vertices, and you can do without the distributed algorithms for the moment, then use one of these:
Neo4j
OrientDB
DEX
HyperGraphDB
InfiniteGraph
InfoGrid
Microsoft Horton
When it is more Online Analytical Processing (OLAP) (mostly read), still with a focus on the vertices, and distribution really matters, then use one of these:
Apache Hama (early stage project)
Microsoft Trinity (research project)
Golden Orb (good, but Java only)
Signal/Collect (http://www.ifi.uzh.ch/ddis/research/sc , but a research project)
Or, if the focus is more on the edges and on logical reasoning/pattern matching, and you need (or, better, can live with) distribution at the edge level as in the Semantic Web, then use one of these RDF/triple/quad stores:
AllegroGraph (okay, they are a graphdb/rdf store hybrid ;)
Jena
Sesame
Stardog
Virtuoso
...and many more RDF stores
Good starting points might be DEX or Neo4j. If you're looking for a good and really fast graph database kernel for C++, DEX might be best, but you would have to implement a lot of networking and distribution stuff on your own. Neo4j has a lot of distribution and fault tolerance, but at the moment more on a vertex sharding level, and its kernel is Java. For ideas and inspiration on implementing distributed graph algorithms, perhaps take a look at Golden Orb and Signal/Collect.
An alternative approach might be starting with AllegroGraph or Stardog. AllegroGraph especially might be a bit tricky in the beginning, until you get used to their way of thinking. Stardog is still young and Java-based, but fast and already quite mature.

Is CouchDB a good persistent layer for Membase?

Membase is great for social games due to its low latency.
As I understand it, CouchDB is an MVCC system using a B+ tree, with a focus on append-only design.
(http://guide.couchdb.org/draft/btree.html)
One of the most important scenarios for Membase is social games.
Social games have a lot of write operations (50+%).
And a good portion of them are in-place updates.
So why is CouchDB a suitable persistent layer for Membase?
I'd also add that CouchDB's append-only log format really doesn't have much relation to whether application writes are new items or updates. The append-only format gives us much better reliability and performance than an in-place system (like sqlite...which is still quite reliable). It's also much easier to take backups of.
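As a rough illustration of why an append-only format does not care whether a write is a new item or an update, here is a minimal Python sketch. It is not CouchDB's actual file format (CouchDB appends B-tree nodes to its database file); AppendOnlyStore and its methods are hypothetical names, and the "index" is just an in-memory dictionary of offsets.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store: every write is appended, nothing is overwritten."""

    def __init__(self, path):
        self.path = path
        # key -> byte offset of the newest revision (not rebuilt on reopen)
        self.latest = {}
        open(path, "ab").close()  # create the file if it does not exist

    def put(self, key, value):
        # An insert and an "in-place update" take exactly the same path:
        # a new record goes to the end of the file and the index moves on.
        record = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(record)
            f.flush()
            os.fsync(f.fileno())
        self.latest[key] = offset

    def get(self, key):
        with open(self.path, "rb") as f:
            f.seek(self.latest[key])
            return json.loads(f.readline().decode("utf-8"))["value"]

    def record_count(self):
        # Old revisions stay in the file until some compaction step removes them.
        with open(self.path, "rb") as f:
            return sum(1 for _ in f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyStore(os.path.join(tmp, "store.log"))
        store.put("player:1", {"score": 10})
        store.put("player:1", {"score": 25})  # logically an update
        print(store.get("player:1"), store.record_count())
```

Because existing bytes are never rewritten, a crash in the middle of a write can at worst leave a partial record at the tail, and a backup is simply a copy of the file.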
Does Membase NEED an append-only log format? Maybe not... Does it NEED CouchDB? YES!
The benefits of map-reduce and indexing as well as eventually consistent replication that CouchDB brings are nothing less than huge for Membase...and the benefits of low-latency, clustering and UI that Membase brings to CouchDB are arguably just as important.
(Disclosure: I work for Couchbase)
Perry Krug
CouchDB has great file formats, great ability to recover from crashes, sophisticated authentication and authorization tools, and a universal, standard, interface: HTTP. CouchDB is poor at low-latency queries, optimized memory utilization, and heavy update speeds (a million per second).
Membase currently has only a simple SQLite file format for persistence, less sophisticated authentication and authorization, using a more obscure protocol. Membase is amazing for low-latency queries, ideal memory utilization, and heavy update speeds.
I think the two complement each other very well. Since the merging effort is coming from core developers in both projects, collaborating together, I expect to see the strengths of both and the weaknesses of neither. Yes, CouchDB is a good persistence layer for Membase.
Money speaks and if there ever was a vote of confidence then here it is, not only from a new lead investor but also from the existing ones as well.
http://www.couchbase.com/press-releases/couchbase-series-C
Besides, don't you think that Membase itself is more than well enough qualified to make an evaluation for such a merger decision?

OODBMS - RDBMS difference and which one is suitable for a factory management system

I searched a bit for the differences between OODBMS and RDBMS, and I pretty much know what they are. However, how do I decide which one is better for which application? Can anyone kindly help me, please?
What I mean by factory management is this: there are production lines that manufacture bottled, frozen and other foodstuffs. The application manages everything from assigning staff to the lines to keeping the production records in the system. Which DBMS is better for such systems?
Thanks in advance.
Here is a nice article by Rick Grehan that describes cases where ODBMS are useful:
http://www.odbms.org/wp-content/uploads/2013/11/006.04-Grehan-When-to-Use-an-ODBMS-2005.pdf
Disclaimer: this is an "old curmudgeon" answer, from a guy who wrote plenty of perfectly functional accounting, manufacturing and other code before OOP came into the mainstream.
With that being said...
Factory management is classic relational database stuff, it's what it was invented to do. The code for classic relational apps tends to follow very predictable patterns, lots of loops over retrieved rows from tables, or straight pass-through stuff: passing data up to the UI or down to the database. If your DB is well-designed, the biz logic you code will be details in those loops or pass-throughs, but those two patterns will dominate.
An OODBMS, on the other hand, from the point of view of this "old curmudgeon", attempts to recast the perfectly and efficiently functional RDBMS into something that will work with classes/objects, for no discernible gain over a system that has proven itself for decades to work extremely well. Classes have little or nothing to do with the classic code patterns that sit on top of relational databases. In fact, they tend to complicate things and can easily get in the way. I'm not saying don't use OOP code to deal with the database, just that OOP was invented for a different kind of problem, a problem that database apps don't happen to have.
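The "loops over retrieved rows" pattern described in this answer can be sketched with Python's built-in sqlite3 module. The schema, table names and sample data below are assumptions made up for illustration (a guess at what the factory in the question might need), not a recommended design.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical in-memory factory schema with a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE production_line (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE assignment (
            staff_id INTEGER NOT NULL REFERENCES staff(id),
            line_id INTEGER NOT NULL REFERENCES production_line(id),
            shift_date TEXT NOT NULL
        );
        CREATE TABLE production_record (
            line_id INTEGER NOT NULL REFERENCES production_line(id),
            shift_date TEXT NOT NULL,
            units INTEGER NOT NULL
        );
        INSERT INTO production_line VALUES (1, 'Bottling'), (2, 'Frozen');
        INSERT INTO staff VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cho');
        INSERT INTO assignment VALUES
            (1, 1, '2024-01-08'), (2, 1, '2024-01-08'), (3, 2, '2024-01-08');
        INSERT INTO production_record VALUES
            (1, '2024-01-08', 900), (1, '2024-01-09', 800), (2, '2024-01-08', 400);
    """)
    return conn


def units_per_line(conn):
    """Classic pattern: run a query, loop over the rows, build the result."""
    totals = {}
    rows = conn.execute("""
        SELECT pl.name, SUM(pr.units)
        FROM production_line AS pl
        JOIN production_record AS pr ON pr.line_id = pl.id
        GROUP BY pl.name
        ORDER BY pl.name
    """)
    for name, units in rows:
        totals[name] = units
    return totals


def staff_on_line(conn, line_name, shift_date):
    """Pass-through style: parameters go down, rows come straight back up."""
    rows = conn.execute("""
        SELECT s.name
        FROM staff AS s
        JOIN assignment AS a ON a.staff_id = s.id
        JOIN production_line AS pl ON pl.id = a.line_id
        WHERE pl.name = ? AND a.shift_date = ?
        ORDER BY s.name
    """, (line_name, shift_date))
    return [name for (name,) in rows]
```

Most of the business logic ends up in the SQL and in small details inside the loops, which is roughly the shape the answer describes.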
The decision to choose an OODBMS or an RDBMS does not depend on a particular application like factory management/automation.
It depends on many criteria, such as:
1) Programming paradigm - If you (the programmer) choose to design and implement in an OO programming language, then an OODBMS is suitable because it stores the objects directly in the database. However, the most widely used type of DBMS is relational, because it is well established commercially and has a sound mathematical foundation.
2) Application specifics - For factory automation/management, responsiveness and fast access are important. OODBMSs are faster than RDBMSs. If you are considering web development, then a lightweight tool like MySQL will fit well.
3) Trend - There is now a paradigm shift from legacy/structured to object/component-oriented programming. Given this trend, an OODBMS is best suited to enterprise applications like factory management.
It depends on the application layer being used. If it takes a simple, more procedural approach (which can still have classes), an RDBMS is more suitable. If you are leaning towards a strictly object-oriented system, an OODBMS can be used.
I usually draw the line of usefulness at the point in which they need to be integrated into enterprise systems. If your project does not necessarily need to integrate, ODBMS is usually easier or technically superior. If you can integrate via web services or "push/pull" into an enterprise system DB, then you can still use an ODBMS, but there might be political pressure against it. (newer ODBMS/RDBMS replication like dRS for db4o may be a good fit) But if you need tight integration with legacy or enterprise datastores, then you're usually forced to use the RDBMS for one reason or another.
That said, your individual production lines might benefit greatly from an ODBMS, which is great at storing oft-changing complex object models and schemas, while the orchestrator system could follow my previous line of thinking.
I've been using ODBMS for many years, and have been dreading the project which requires me to return to purely relational data management. Although recent improvements in ORM tooling have made relational much more pleasant to work with, the ORM+RDBMS solution still can't keep up with ODBMS systems in a few key areas (see the previously mentioned article on odbms.org).