How can I abstract database repository from data service? - jpa

I was reading this article, https://www.infoq.com/articles/spring-data-intro, to understand how a data service layer can be independent of the database (RDBMS/NoSQL). It looks like there's no way to design entities and repositories to be independent of the database. The article was written in 2012. Have any technologies appeared since then that implement this feature?

Before actually answering your question I have to ask: Why do you want to do that? Think twice because abstraction comes at a cost and just doing it in order to have a "clean" design is most certainly not worth it.
Now to your question:
There is no library or framework that does this out of the box.
Probably the thing that comes closest is Spring Data, which you are obviously aware of. If you stick to the store-independent interfaces, your repositories will abstract over the persistence store used, at least to some extent. BUT: you will have to provide different kinds of metadata (typically as annotations on the entities) to make it work. So in this sense, it is a really leaky abstraction.
Of course, you can roll your own: create an interface with the operations you need and provide implementations for the different data stores you want to use. Also, include a storage independent way of providing metadata.
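To make the "roll your own" option concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the Customer entity, the CustomerRepository interface, the table layout); it only illustrates one interface with two store-specific implementations, and it is not how Spring Data does it.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """Hypothetical entity with no store-specific annotations."""
    id: int
    name: str


class CustomerRepository(Protocol):
    """Store-independent contract: only operations every backend can offer."""

    def save(self, customer: Customer) -> None: ...

    def find_by_id(self, customer_id: int) -> Optional[Customer]: ...


class InMemoryCustomerRepository:
    """Dictionary-backed variant, e.g. for tests."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, customer: Customer) -> None:
        self._rows[customer.id] = customer

    def find_by_id(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)


class SqliteCustomerRepository:
    """Relational variant; the table layout is this class's private metadata."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, customer: Customer) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                (customer.id, customer.name),
            )

    def find_by_id(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


def rename_customer(repo: CustomerRepository, customer_id: int, new_name: str) -> bool:
    """Service-layer code that depends only on the interface."""
    existing = repo.find_by_id(customer_id)
    if existing is None:
        return False
    repo.save(Customer(existing.id, new_name))
    return True
```

Note that the interface offers no joins or queries by arbitrary criteria. As soon as you add those, each implementation has to emulate them, which is where the cost discussed next comes from.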
So the question becomes: Why hasn't anybody done that yet? And why should you probably not do it either?
It is hard: Just writing SQL in a way that all (relevant) SQL databases understand is difficult, see this for an example.
You lose a lot of the power of your stores. For example, RDBMSes are great at joining stuff. But joining is basically a no-go for many NoSQL databases. So your API probably shouldn't offer this feature. This basically dumbs everything down to the lowest common denominator, which is going to be really small when you have very different data stores.
It is not worth it. This leads back to my opening question: Why would you want to do it anyway? I certainly see the use of switching different RDBMSes. Some companies only want certain vendors in their datacenter, it is nice to have an in memory variant for testing and so on. But switching a store from for example Oracle to Hazelcast and then to MongoDB to CSV? Why would you want to do that? What is the business value of that?

Related

OODBMS - RDBMS difference and which one is suitable for a factory management system

I searched a bit for the differences between OODBMS and RDBMS, and I pretty much know what they are. However, how do I decide which one is better for which application? Can anyone kindly help me, please?
What I mean by factory management is this: there are production lines that manufacture bottled, frozen and other foodstuffs. The application handles everything from assigning staff to the lines to keeping the production records in the system. Which DBMS is better for such systems?
Thanks in advance.
Here is a nice article by Rick Grehan that describes cases where ODBMS are useful:
http://www.odbms.org/wp-content/uploads/2013/11/006.04-Grehan-When-to-Use-an-ODBMS-2005.pdf
Disclaimer: this is an "old curmudgeon" answer, from a guy who wrote plenty of perfectly functional accounting, manufacturing and other code before OOP came into the mainstream.
With that being said...
Factory management is classic relational database stuff, it's what it was invented to do. The code for classic relational apps tends to follow very predictable patterns, lots of loops over retrieved rows from tables, or straight pass-through stuff: passing data up to the UI or down to the database. If your DB is well-designed, the biz logic you code will be details in those loops or pass-throughs, but those two patterns will dominate.
An OODBMS on the other hand, from the point of view of this "old curmudgeon", attempts to recast the perfectly and efficiently functional RDBMS into something that will work with classes/objects, for no discernible gain over a system that has proven itself for decades to work extremely well. Classes have little or nothing to do with the classic code patterns that sit on top of relational databases. In fact, they tend to complicate things and can easily get in the way. I'm not saying don't use OOP code to deal with the database, just that OOP was invented for a different kind of problem, a problem that database apps don't happen to have.
The decision between an OODBMS and an RDBMS does not depend on a particular application like factory management/automation.
It depends on several criteria, such as:
1) Programming paradigm - If you (the programmer) choose to design and implement in an OO programming language, then an OODBMS is suitable because it stores the objects directly in the database. The most widely used type of DBMS is relational, though, because it is well established commercially and has a sound mathematical foundation.
2) Application specifics - For factory automation/management, responsiveness and fast access are important. OODBMSs are swifter than RDBMSs. If you are considering web development, a lightweight tool like MySQL will fit well.
3) Trend - There is now a paradigm shift from legacy/structured programming to object/component-oriented programming. Given this trend, an OODBMS is the best fit for enterprise applications like factory management.
It depends on the application layer you are using. If it takes a simple, more procedural approach (which can have classes too), an RDBMS is more suitable. If you are leaning towards a strictly object-oriented system, an OODBMS can be used.
I usually draw the line of usefulness at the point in which they need to be integrated into enterprise systems. If your project does not necessarily need to integrate, ODBMS is usually easier or technically superior. If you can integrate via web services or "push/pull" into an enterprise system DB, then you can still use an ODBMS, but there might be political pressure against it. (newer ODBMS/RDBMS replication like dRS for db4o may be a good fit) But if you need tight integration with legacy or enterprise datastores, then you're usually forced to use the RDBMS for one reason or another.
That said, your individual production lines might benefit greatly from an ODBMS, which is great at storing oft-changing complex object models and schemas, while the orchestrator system could follow my previous line of thinking.
I've been using ODBMS for many years, and have been dreading the project which requires me to return to purely relational data management. Although recent improvements in ORM tooling have made relational much more pleasant to work with, the ORM+RDBMS solution still can't keep up with ODBMS systems in a few key areas (see the previously mentioned article on odbms.org).

Do I need to use DDD, Unit of Work, Repositories or something similar for simple web apps?

I'm working on a simple eCart system using .net4 (c#). I've been doing a lot of reading about Unit of Work Pattern, Repository Pattern, and Persistence Ignorance. I think I have a grasp on the strategy and benefits to building my layers this way, but for my simple app I'm wondering if it's necessary and if anyone can point me towards good architecture for my scope.
Please correct me if I'm wrong - the main benefits of using repositories are to make fewer trips to the DB and to separate application architecture from DB architecture. I.e., what's good for DB performance isn't always good for application design, so it's best to design what's best for each and then create an interface between the two.
So here's the question - I want any business transaction that occurs to be saved to the DB as soon as it occurs, so there doesn't seem to be a point in queuing data in repositories and then saving it immediately. Why not just save it directly?
Are there other benefits of DDD that I'm missing or would it be over engineering to build out such a robust architecture for every simple project that comes along? Thanks for any help.
Do you need to use [insert pattern here]? Nope. When it comes right down to it, the best practice is always the one that gets your application done and meets the time, monetary, and technical requirements.
But if you take the "let's just get it done" approach, then be aware of the Technical Debt you might be incurring.
Also, there are a lot of reasons to use some of these patterns (and they don't always have to do with performance), particularly the Unit of Work pattern. This has more to do with the requirements and restrictions that often come with ORMs and such. These issues can be a bit complex, but I suspect that as you begin to implement some of these things you'll start to realize what those issues are and come to understand why these patterns are useful.
Agree with CodingGorilla. Patterns are great unless they conflict with YAGNI.
If every transaction needs to be written immediately (that is, if you have potential contention between the actions of two users) then you will need a queuing mechanism or you can use the underlying transactional mechanism of whatever data repository you might be using (e.g. SQL)
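For illustration, here is a small sketch of leaning on the store's own transactions, written in Python with the standard sqlite3 module rather than .NET. The UnitOfWork class, the stock and orders tables and place_order are all made up for this example; the point is only that several writes either commit together or not at all, even when each business transaction is saved immediately.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical tables for a tiny cart system."""
    conn.execute(
        "CREATE TABLE stock (product_id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)"
    )


class UnitOfWork:
    """Groups several writes so they commit together or not at all."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection

    def __enter__(self) -> sqlite3.Connection:
        return self._conn

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self._conn.commit()
        else:
            self._conn.rollback()
        return False  # never swallow the exception


def place_order(conn: sqlite3.Connection, product_id: int, quantity: int) -> None:
    with UnitOfWork(conn) as tx:
        tx.execute(
            "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
            (product_id, quantity),
        )
        # Decrement only if enough stock is left; the WHERE clause guards
        # against two users racing for the last items.
        cur = tx.execute(
            "UPDATE stock SET quantity = quantity - ? "
            "WHERE product_id = ? AND quantity >= ?",
            (quantity, product_id, quantity),
        )
        if cur.rowcount != 1:
            # Raising here triggers the rollback, which also undoes the INSERT.
            raise ValueError("insufficient stock")
```

If the stock update fails, the order row inserted a moment earlier disappears with the rollback, so no half-finished business transaction is ever visible.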

MongoDB as the primary database?

I have read a lot about MongoDB.
I like all the features it provides, but I wonder if it's possible to have it as the only database for my application, including storing sensitive information.
I know that it compromises the durability part of ACID, but as a solution I will have 1 master and 2 slaves in different locations.
If I do that, is it possible to use it as the primary database, storing everything?
UPDATE:
Let's put it this way.
I really need document storage rather than a traditional DBMS to be able to create my flexible application. But is MongoDB reliable enough to store sensitive customer information if I have multiple database replicas and a master-slave setup? As far as I know, one major downside is that it compromises the D in ACID, so I solve that with multiple databases.
With that in place, are there no major problems left, such as data loss?
And someone told me that with MongoDB a customer could be billed twice. Could someone shed some light on this?
Yes, MongoDB can work as an application's only data store. Use replication and make sure you know about safe writes and w.
"And someone told me that with MongoDB a customer could be billed twice. Could someone enlighten this?"
I'm guessing whoever told you that was talking about eventual consistency. Some NoSQL databases are eventually consistent, meaning that you could have conflicting writes. MongoDB is not one of them, though: it is strongly consistent (just like a relational database).
Whether your application is flexible or not has absolutely nothing to do with whether you use "NoSQL", a "document DB" or a proper RDBMS. Nobody using your application will care either way.
If you need flexibility while coding, you should look into frameworks like ActiveRecord for Ruby, which can make interfacing with the DB much simpler, more generic and more powerful. At that level you can gain a lot more than by just changing the DB, and you can even become DB-agnostic, meaning you can change the DB without changing any code. Indeed, I have found that ActiveRecord boosts my productivity many times over by relieving me of tedious and error-prone "code intermixed with SQL".
Indeed, if you feel you need a schemaless database, for critical data, you are doing something wrong. You are putting your own convenience over critical needs of the projects, or in ignorance thinking you won't get into problems later. Sooner or later, lack of consistency will bite your ass, hard!
I feel you are hesitant about RDBMSs because you are not really that comfortable with all the jargon, syntax and sound CS principles.
Believe me, if you're going to create an application of any value, you are hundred times better learning SQL, ACID and good database-principles in the first place. Just read up on the basics, and build your knowledge from wherever you are now. It's the same for each and every one of us, it takes time, but you learn to do things right from the start.
Low-level tools like MongoDB and equivalent just provide you with infinitely more ammunition to shoot yourself in the foot with. They make it seem easy. In reality however, they leave the hard work for you, the coder, to deal with, while an RDBMS will handle more of the cruft for you once you grok the basics.
Why use computers at all, if you want more work, you can just go back to paper. Design will be a breeze, and implementation can be super-quick. Done. Except it won't be right of course.
In the real world, we can't afford to ignore consistency, database design and many more sound CS principles. Which is why it's a great idea to study them in the first place, and keep learning more and more.
Don't buy into the hype. You ask a question about MongoDB here, but add that you really need its features. With 25 years of computer experience, I simply don't buy it. If you think creatively, an RDBMS can be made to do whatever you want it to, or a framework can be used to save you from errors and wasted time.
Bolting ACID properties onto MongoDB seems like more work to me and, from experience, sounds like an exercise in futility compared with using what is already designed for such purposes.
Is it possible? Sure. Why not? It's possible to store everything as XML in text files if you wanted to.
Is it the best idea? It depends on what you're trying to do, the rest of your system architecture, the scale your site is intended to achieve, and so on.

What SQLite library do you recommend on the iPhone/iPad?

A few people have wrapped the SQLite library or provided alternatives. What are their relative merits?
Core Data
Yes, a bit of a snarky answer.
There are three primary reasons to use SQLite directly or via a wrapper:
1) You are sharing databases across platforms and can't use Core Data's schema.
2) You really do need raw SQLite performance and have the 17th level SQLite API Mastery required to actually achieve said performance advantage over Core Data.
3) You know SQLite inside and out, don't like learning new things, and want to re-invent the bits necessary to get between database and screen. (Slightly snarky again.)
Core Data buys you a tremendous amount of functionality that is subtly very hard to do. That is, object graph management with full integrity and undo support.
bbum gave the most succinct answer, and you should think carefully before deciding not to use Core Data.
But I thought you also deserved an answer to your actual question.
There are basically two wrapper approaches I know of:
FMDB simply provides a much easier-to-use Cocoa API overlay for SQLite. This can be great if you know SQL and database design well and just want a simple database.
Other approaches are generally more object-relational mapping systems, that attempt to give you an object view of an underlying datastore, and hide some queries from you.
In both cases if you have a really simple data store they can make sense to use if you have a specific reason... but using Core Data gives you an awful lot for free (though I'll admit the learning curve can be steep).
There is a third approach: building SQLite statements via a set of Objective-C classes. The classes handle how your data is converted and sanitized, and they ensure that your statements are correctly constructed. You can try https://github.com/ziminji/objective-c-sql-query-builder. This particular library also offers Object Relational Mapping (ORM) that follows the Active Record design pattern.
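To show the general idea of a statement builder, here is a rough sketch in Python rather than Objective-C. It is not the API of the library linked above; SelectBuilder, quote_identifier and their methods are invented for this example. The builder quotes identifiers itself and keeps values out of the SQL text by emitting ? placeholders.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, List, Tuple


def quote_identifier(name: str) -> str:
    """Quote a table or column name for SQLite (double quotes, doubled inside)."""
    return '"' + name.replace('"', '""') + '"'


@dataclass
class SelectBuilder:
    """Hypothetical builder for simple SELECT ... WHERE a = ? AND b = ? statements."""
    table: str
    columns: List[str] = field(default_factory=list)
    conditions: List[Tuple[str, Any]] = field(default_factory=list)

    def select(self, *columns: str) -> "SelectBuilder":
        self.columns.extend(columns)
        return self

    def where_equals(self, column: str, value: Any) -> "SelectBuilder":
        self.conditions.append((column, value))
        return self

    def build(self) -> Tuple[str, List[Any]]:
        """Return the SQL text and the parameter list to bind to it."""
        cols = ", ".join(quote_identifier(c) for c in self.columns) or "*"
        sql = f"SELECT {cols} FROM {quote_identifier(self.table)}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(
                f"{quote_identifier(column)} = ?" for column, _ in self.conditions
            )
        return sql, [value for _, value in self.conditions]
```

A call such as SelectBuilder("person").select("id", "name").where_equals("name", "O'Brien").build() yields a statement with a single placeholder plus a parameter list, which can be passed straight to sqlite3's execute(). The apostrophe in the value never touches the SQL string.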

How best to integrate several systems?

OK, where I work we maintain a fairly substantial number of systems written over the last couple of decades.
The systems are diverse: they use multiple operating systems (Linux, Solaris, Windows), multiple databases (several versions of Oracle, Sybase and MySQL), and even multiple languages (C, C++, JSP, PHP, and a host of others).
Each system is fairly autonomous, even at the cost of entering the same data into multiple systems.
Management recently decided that we should investigate what it will take to get all the systems happily talking to each other and sharing data.
Keep in mind that while we can make software changes to any of the individual systems, a complete rewrite of any one system (or more) is not something management is likely to entertain.
The first thought of several of the developers here was the straightforward one: if System A needs data from System B, it should just connect to System B's database and get it. Likewise, if it needs to give B data, it should just insert it into B's database.
Due to the mess of databases (and versions) used, other developers were of the opinion that we should have one new database, combining the tables from all the other systems to avoid having to juggle multiple connections. By doing this they hope that we might be able to consolidate some tables and get rid of the redundant data entry.
This is about the time I was brought in for my opinion on the whole mess.
The whole idea of using the database as a means of system communication smells funny to me. Business logic will have to be placed in multiple systems (if System A wants to add data to System B, it had better understand B's rules concerning the data before doing the insert). Several systems will most likely have to do some form of database polling to find any changes to their data. And ongoing maintenance will be a headache, as any change to a database schema now propagates to several systems.
My first thought was to take the time and write APIs/Services for the different systems, which once written could be easily used to pass/retrieve data back and forth. A lot of the other developers feel that is excessive and far more work than just using the database.
So what would be the best way to go about getting these systems to talk to each other?
Integrating disparate systems is my day job.
If I were you, I would go to great effort to avoid accessing System A's data from directly within System B. Updating System A's database from System B is extremely unwise. It is exactly the opposite of good practice to make your business logic so diffuse. You will end up regretting it.
The idea of the central database isn't necessarily bad ... but the amount of effort involved is probably within an order of magnitude of rewriting the systems from scratch. It is certainly not something I would attempt, at least in the form you describe. It can succeed, but it is much, much harder and it takes a lot more discipline than the point-to-point integration approach. It's funny to hear it suggested in the same breath as the 'cowboy' approach of just shoving data directly into other systems.
Overall your instincts seem pretty good. There are a couple of approaches. You mention one: implementing services. That's not a bad way to go, especially if you need updates in real time. The other is a separate integration application that is responsible for shuffling the data around. That's the approach I usually take, but usually because I can't change the systems I'm integrating to ask for the data it needs; I have to push the data in. In your case the services approach isn't a bad one.
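As a rough sketch of the services approach (Python for brevity; InventoryService, its part table and its rules are invented, not taken from the systems in the question): the owning system exposes an operation that enforces its own rules, and the other systems call that operation instead of inserting rows themselves.

```python
import sqlite3


class InventoryService:
    """Hypothetical API owned by 'System B'; the only code that touches B's tables."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS part "
            "(sku TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
        )

    def receive_parts(self, sku: str, quantity: int) -> int:
        """Add stock for a SKU and return the new quantity on hand."""
        # B's business rules live here, once, instead of in every caller.
        if not sku.strip():
            raise ValueError("sku must not be empty")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        with self._conn:  # one transaction per call
            cur = self._conn.execute(
                "UPDATE part SET quantity = quantity + ? WHERE sku = ?",
                (quantity, sku),
            )
            if cur.rowcount == 0:
                self._conn.execute(
                    "INSERT INTO part (sku, quantity) VALUES (?, ?)",
                    (sku, quantity),
                )
        return self.quantity_on_hand(sku)

    def quantity_on_hand(self, sku: str) -> int:
        row = self._conn.execute(
            "SELECT quantity FROM part WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else 0
```

System A would call receive_parts (over whatever transport you pick) and never learn B's schema, so B can change its tables without breaking A.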
One thing I would like to say that might not be obvious to someone coming to system integration for the first time is that every piece of data in your system should have a single, authoritative point of truth. If the data is duplicated (and it is duplicated), and the copies disagree with each other, the copy in the point of truth for that data must be taken to be correct. There is just no other way to integrate systems without having the complexity scream skyward at an exponential rate. Spaghetti integration is like spaghetti code, and it should be avoided at all costs.
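One way to picture the point-of-truth rule, as a tiny Python sketch with made-up system and field names: keep an explicit record of which system owns which piece of data, and resolve every disagreement in favour of the owner.

```python
from typing import Any, Dict, List

# Hypothetical ownership map: field name -> system that is authoritative for it.
SOURCE_OF_TRUTH = {"email": "crm", "credit_limit": "billing"}


def authoritative_value(field_name: str, copies: Dict[str, Any]) -> Any:
    """Return the owner's copy of a field; other systems' copies are ignored."""
    return copies[SOURCE_OF_TRUTH[field_name]]


def stale_systems(field_name: str, copies: Dict[str, Any]) -> List[str]:
    """List the systems whose copy disagrees with the owner's and needs a refresh."""
    owner = SOURCE_OF_TRUTH[field_name]
    truth = copies[owner]
    return sorted(
        system
        for system, value in copies.items()
        if system != owner and value != truth
    )
```

Given copies such as {"crm": "a@x.org", "billing": "old@x.org", "shipping": "a@x.org"} for the email field, the CRM value wins and billing is reported as stale. There is no voting and no "newest wins" logic to argue about.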
Good luck.
EDIT:
Middleware addresses the problem of transport, but that is not the central problem in integration. If the systems are close enough together that one app can shove data directly in to another, they're probably close enough that a service offered by one can be called directly by another. I wouldn't recommend middleware in your case. You might get some benefit from it, but that would be outweighed by the increased complexity. You need to solve one problem at a time.
Sounds like you may want to investigate Message Queuing and message-oriented middleware.
MSMQ and Java Message Service being examples.
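A very small sketch of the messaging idea, in Python: the standard library's in-process queue.Queue stands in for a real broker such as MSMQ or a JMS provider, and the message type and function names are invented for this example.

```python
import json
import queue
from typing import Callable, Dict


def publish_customer_created(channel: queue.Queue, customer_id: int, name: str) -> None:
    """System A announces a change without knowing who consumes it."""
    message = {"type": "customer.created", "customer_id": customer_id, "name": name}
    channel.put(json.dumps(message))  # serialized, as it would be on a real wire


def drain(channel: queue.Queue, handlers: Dict[str, Callable[[dict], None]]) -> int:
    """System B processes whatever is waiting; returns the number of handled messages."""
    handled = 0
    while True:
        try:
            raw = channel.get_nowait()
        except queue.Empty:
            return handled
        message = json.loads(raw)
        handler = handlers.get(message["type"])
        if handler is not None:  # unknown message types are skipped
            handler(message)
            handled += 1
```

The sender and receiver share only the message format, not a database schema, and the receiver no longer has to poll tables for changes.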
It seems you are looking for opinions, so I will provide mine.
I agree with the other developers that writing an API for all the different systems is excessive. You would likely get it done faster and have much more control over it if you just take the other suggestion of creating a single database.
One of the challenges you will face is aligning the data in the different systems so that it can be integrated in the first place. It may be that each of the systems you want to integrate holds an entirely different set of data, but more likely the data overlaps. Before diving into writing APIs (which is the route I would take as well, given your description), I would recommend that you try to come up with a logical data model for the data that needs to be integrated. This data model will then help you leverage the data you have in the different systems and make it more useful to the other systems.
I would also highly recommend an iterative approach to the integration. With legacy systems there is so much uncertainty that trying to design and implement it all in one go is too risky. Start small and work your way to a reasonably integrated system. "Fully integrated" is hardly ever worth aiming for.
Interfacing directly by pushing data into, or poking around in, another system's database exposes a lot of one system's internal detail to the other. There are obvious disadvantages: upgrading one system can break the other. Moreover, there can be technical limitations on how one system can access the database of the other (consider how an application written in C on Unix will interact with a SQL Server 2005 database running on Windows 2003 Server).
The first thing you have to decide is the platform where the "master database" will reside, and likewise for the middleware providing the much-needed glue. Instead of going for API-level middleware integration (such as CORBA), I would suggest you consider Message Oriented Middleware. MS BizTalk, Sun's eGate and Oracle's Fusion are some of the options.
Your idea of a new database is a step in the right direction. You might like to read a little about the Enterprise Entity Aggregation pattern.
A combination of "data integration" with a middleware is the way to go.
If you are going for a Middleware + Single Central Database strategy, you might want to consider achieving it in multiple phases. Here's a logical stepped process to consider:
1) Implementation of services/APIs for the different systems, exposing the functionality of each system
2) Implementation of middleware which accesses these APIs and provides an interface for all the systems to access the data/services of other systems (it reads data from the central source if available, else gets it from the other system)
3) Implementation of the central database only, without data
4) Implementation of caching/data-storage services at the middleware level which store data in the central database whenever that data is accessed from any of the systems. E.g., if System A's records 1-5 are fetched by System B through the middleware, the middleware's caching services can store these records in the central database, and the next time these records will be fetched from the central database
5) Data cleansing can happen in parallel
6) You can also create an import mechanism to push data from multiple systems to the central database on a daily basis (automated or manual)
This way, the effort is distributed across multiple milestones and data is gradually stored in the central database on first-accessed-first-stored basis.
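The first-accessed-first-stored step could look roughly like this (a Python sketch; ReadThroughMiddleware and its arguments are hypothetical, and a plain dict stands in for the central database):

```python
from typing import Any, Callable, Dict, Tuple


class ReadThroughMiddleware:
    """Serves records from the central store, falling back to the owning system."""

    def __init__(
        self,
        source_systems: Dict[str, Callable[[int], Any]],
        central_store: Dict[Tuple[str, int], Any],
    ) -> None:
        self._sources = source_systems  # system name -> function(record_id) -> record
        self._central = central_store   # stand-in for the central database

    def fetch(self, system: str, record_id: int) -> Any:
        key = (system, record_id)
        if key in self._central:
            return self._central[key]  # already centralized
        record = self._sources[system](record_id)  # call the system's own API
        self._central[key] = record  # store on first access
        return record
```

This sketch ignores invalidation: once a record is centralized, later changes in the owning system are not seen, so a real version would need to decide how updates reach the central copy.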