Database-independent queries to the extreme in Java... or in general - persistence

Let's say I have an app that should ideally be able to use a relational database, object database, XML files, or whatever to persist its data. In the spirit of coding to interfaces instead of implementations, I have a generic DataStore interface that specifies a contract for all I/O involving the data store. This interface can be implemented by concrete classes such as RDBMSDataStore, OODBMSDataStore, XMLFileDataStore, and so on.
This works well as long as I keep the contents of the DataStore interface simple - i.e. getThis(), getThose(), saveThat(), updateThis(), etc. But as soon as I require more complicated queries, it breaks down. The XMLFileDataStore class obviously doesn't understand SQL, and the RDBMSDataStore class obviously doesn't understand XPath/XQuery. And OODBMSDataStore understands something entirely different depending on the OODBMS in use.
I could adopt a language-independent object query language, write all my queries in that and then have the concrete classes translate them into their native language, but that's a huge task, if I want to be complete.
Are there standards or best practices for handling this kind of situation in Java? Unfortunately it seems like 99% of the world interprets "database independence" to mean "relational database independence" and ignores the object databases, XML databases, document databases, etc. entirely.

From the way I read the question, this sounds a lot like the semantic that Hibernate brings to the table for Java. It even has mode for dealing with XML as the content backing store (using Dom4J). The Hibernate API has a number of extension points that could allow the addition of an OODBMS model. Even if Hibernate turns out not to be the best solution for you (implementation-wise), I think it provides a good example of the types of patterns that can be used to solve the problems you proposed.

Related

Why use interfaces

I see the benefit of interfaces, to be able to add new implementations via contract.
I dont see following problem:
Imagine you have interface DB with method "startTransaction".
Everything is fine you implement it in MySQL, PostgreSQL. But tomorrow you move to mongodb - then you have no transaction support.
What do you do?
1) Empty method - bad because you think you have transactions but u havent
2) Create your own - then you should have some parameters that will be different that regular "startTransaction" method.
And on top of that sometimes simple interfaces just doesnt work.
Example: You need additional parameters for different implementations.
If you're exposing the concept of transactions on your interface, then you must functionally support transactions no matter what, since users of the interface will logically depend on it. I.e., if a caller can start a transaction, then they expect to also be able to roll back a transaction of several queries. Since Mongo doesn't natively have any concept of rolling back transactions, there's one of two possibilities:
You implement the possibility of rolling back queries in code, emulating the functionality of transactions for a database which doesn't natively support it. (Whether that's even reliably possible in Mongo is a debatable topic.)
Your interface is working at the wrong level of abstraction. If your interface is promising functionality an implementation can't deliver, then either the interface or the implementation is unrealistic.
In practice, Mongo and SQL databases are such different beasts that you would either never make this kind of change without changing large parts of your business logic around it; or you specify your interface using an extremely minimal common-denominator interface only, most certainly not exposing technology-specific concepts on an abstract interface.
You are mostly correct, interfaces can be very useful, but also problematic in (fast) changing code, a best practice conserning interfaces is to keep them as small as possible.
When something can handle an transaction, create an interface only for handling an transaction. Split them up in as small as logically possible parts, in that way, when new classes emerge, you can assign them the specific interfaces that can determine their methods.
For the multiple parameter problem, this can indeed be problematic, see if you can determine if this specific value could be moved to a constructor, or indicates that the action that you are doing is indeed sightly different from the action that does not need this parameter.
I hope this helps, goodluck
You are right interfaces are used to add new implementations via contract but those implementations have to posses some similarity.
Let's take an example:
You cannot implement dog using human interface because dog is a living organism.
Same thing you are trying to do here.You are trying to implement a non-sql db using sql db implementation.

Replacements to hand-rolled ADO.NET POCO mapping?

I have written a wrapper around ADO.NET's DbProviderFactory that I use extensively throughout my applications. I also have written a lot of code that maps IDataReader rows to POCOs. However, as I have tons of classes the whole thing is getting to be a pain in the ass to maintain.
I have been looking at replacing the whole she-bang with a micro-orm like Petapoco. I have a few queries though:
I have lots of POCOs that contain other POCOs in them as properties. How well does the Petapoco support this?
Should I use a ORM like Massive or Simple.Data that returns a dynamic object and map that to a POCO?
Are there any approaches I can take to the whole mapping of rows to POCOs? I can't really use convention-based tools as my database isn't particularly consistent in how it is designed.
How about using a text templating/code generator to build out a lightweight persistence layer? I have a battle-hardened open source project called TextMetal to generate the necessary persistence layer based on tried and true architectural decisions. The only lacking thing is object to object relations but it does support query expressions and works well with poorly designed data schemas.
You can see a real world project that uses the above tool call Can Do It For.
Feel free to ask me about any design decisions once you take a look-sse.
Simple.Data automagically casts its dynamic type to static types. It will map nested properties as long as they have been eager-loaded using the .With method. So for example
Customer customer = db.Customer.WithOrders().Get(42);
would populate the Orders property of the customer object.
Could you use QueryFirst, or modify it? It takes your sql and wraps it in vanilla ADO code, generated at design time. You get fresh POCOs from your result schema every time you save your file. Additionally, you can choose to test all queries and regenerate all wrappers via the option in the tools menu. It's dependent on Sql Server and SqlClient, so unless you do some modification, you'll lose DbProviderFactory.

Is core data implementing data mapper pattern?

I know that core data should not be considered as ORM but it still offers the functionality that is similar to ORM. Just curious, is it implementing data mapper pattern? I know "The Data Mapper is a layer of software that separates the in-memory objects from the database. Its responsibility is to transfer data between the two and also to isolate them from each other." (Martin Fowler). IMHO context manager handles all SQL stuff into one transaction, so it's very performance wise design and IMHO core data might be considered implementing data mapper pattern.
One year latter, I will contribute with my two cents
I am not an ORM expert and just recently started something using a Data Mapper, but as a long time Core Data user I can say that no. The main objective of this pattern is having a clear cut of a domain object from all database related operations.
Once I start writing unit tests, the first thing I notice is that I must load a database, even if it is just some in memory store, but I do must load one. Also there are no mappers for each class, I have no control about how each relation is stored.
Core Data loads lots of meta information about your object graph and forces some structure to them. Although you can change the persistent store and bake something of your own, you will have lots of restrictions about how to do it, with a clear "relational" feeling to it.
The idea is good, we might say it is some variation of it. Something that I do love is that the save operation is done by the context, not the object itself. So there is some type of separation.
However look at those functions like "awakeFromFetch" or "didSave", both operations are related with the data store, not a plain domain object. A proper Data Mapper pattern would allow you to define those operations for each persistent store, not unified in a single object.
UPDATE:
Funny enough one day after my answer I had to deal with an old CoreData based project and must come back to improve this answer. To make things clear, I do consider that "seems like a pattern" is not enough. For example, implementation of the facade and adapter patterns is quite similar, but you name them differently depending on how you use them.
Is Core Data implementing data mapper?
I must say that my "not quite" should have been "definitely not!"
I have just been very angry because I needed to rename some fields and later add new ones. Although I do know quite well how auto-migrations work with Core Data I forgot how annoying these are.
How many times do you need some new field, rename something, experiment until you get it right.... and every single tiny change requires a full blown database migration? With Data Mappers this never happens because domain objects are perfectly decoupled. You only touch the database to catch up with the domain objects after you finish some new feature. Core Data forces you to bind at every single moment every single detail of your domain objects.
Boy, how sweet life was until I forgot that "tiny" annoyance of Core Data being the exact opposite of what you can achieve with data mappers.

Perl DAL Design Questions

Recently I've been working on some Perl projects and I'm a very novice Perl programmer. I've been experimenting with DBIx::Class and so far I'm really please with the flexibility and the ease of use. I'm curious though. I come from a .NET background and it seems like we spend a lot of time abstracting our DAL to a certain degree. Is this a good idea with a language like Perl?
Where I want to get shortly is to have the ability to start mocking my DAL so I can write unit tests for tasks. Right now though I'm struggling with how the overall structure and design of the application should look though?
Re: Relationship of the ORM within the application...
Hopefully this is the kind of answer you are looking for...
With most web app frameworks in the "scripting" world (i.e. perl, ruby, python, php), most of the time I've seen the business logic implemented at the ORM object level. E.g. in a Rails app it's at the ActiveRecord level; if you are using DBix::Class it would be at the Result-class level.
More concretely, in the case of DBIx::Class, if you have a table named VENDOR there would be a class called MySchema::Result::Vendor which represents a single row in the table VENDOR. Simply add your business methods to this class.
One disadvantage of this approach is that it ties your business logic with the ORM class which can make (unit) testing more difficult. One solution to this is to use a light-weight database for unit tests (i.e. SQLite), and an ORM like DBIx::Class will facilitate switching between the two. Of course, this won't work if you rely on SQL features which are not implemented in SQLite.
Another approach is to place your business logic methods into a Moose role. Then those methods can be composed into either the DBIx::Class Result class or into a mock object for testing. I can elaborate with an example if you'd like.
One big assumption of the above is that your business object = one row in the database. If this is not the case (i.e. you business object spans more than one table), then you'll probably want to create a "shell" or container object which has as instance members each of the constituent ORM objects. Fortunately, Moose has a nice facility for delegating methods (search for Moose delegation and the handles attribute of instance member declarations), so it is relatively easy to make a composite business object out of two or more ORM objects. Again, I can give you an example of this if you'd like.
HTH
I used to work in perl projects for the web long ago. But after working with things such as Django, perl's tools like DBI, etc now look to me rather rudimentary and outdated. Have a look at the django ORM for example, it's elegant and very productive to use, you can bypass it if your query is too complex or the ORM gets in the way...
These days I'd go python or ruby for that kind of projects.
For one liners, small text parsing or sysadmin stuff I still love to use small perl snippets. But I'm more into DRY than TMTOWTDI for more than a few lines of code these days.

Rules of thumbs for writing "queries" using ADO.NET Entity Framework

I’m currently working on a prototype of a medium size web application, and I thought that it would be good to also experiment with Entity Framework. The problem is that the major part of the application is not the data layer and logic, and so that I don't have much time to play with Entity Framework. On the other hand, the database schema is quite simple.
One of the problems I’m facing is that I cannot find a consistent way to "write queries". As far as I can tell, there are four "interfaces" for the job:
LINQ to Entities
LINQ to Entities using LINQ extension methods
Entity SQL
Query builder
OK, the first two are essentially the same, but it’s good to use just one for maintenance and consistency.
I’m mostly puzzled by the fact that none of them seems to be complete and the most general. I often find myself cornered and using some ugly looking combination of several of them. My guess is that Entity SQL is the most general one, but writing queries using strings feels like a step back. The main reason I’m experimenting with something like Entity Framework is that I like the compile time checking.
Some other random thought / issues:
I often also use the ObjectQuery.Include() method, but again it takes a string. Is this the only way?
When to use ObjectQuery.Execute() (vs. ToList())? Does it actually execute the query?
Should execute queries as soon as possible (e.g. using ToList()) or should I not care just let leave the execution for the first enumeration which gets in the way?
Are ObjectQuery.Skip() and ObjectQuery.Take() available only as extension methods? Is there a better way to do paging? It’s 2009 and almost every web application deals with paging.
Overall, I understand there are many difficulties when implementing an ORM, and often one has to compromise. On the other hand, the direct database access (e.g. ADO.NET) is plain and simple and has well defined interface (tabular results, data readers), so all code - no matter who and when writes it - is consistent. I don’t want to faced with too many choices whenever writing a database query. It’s too tedious and more than likely different developers will come up with different ways.
What are your rules of thumbs?
I use LINQ-to-Entities as much as possible. I also try and formalise to the lambda-form, as opposed to the extended SQL-style syntax. I have to admit to have had problems enforcing relationships and making compromises on efficiency just to expedite my coding of our application (eg. Master->Child tables may need to be manually loaded) but all in all, EF is a good product.
I do use EF's .Include() method for lazy-loading, which as you say, does require a string input. I find no problem with this, other than that of identifying the string to use which is relatively simple. I guess if you're keen on compile-time checking of such relations, a model similar to: Parent.GetChildren() might be more appropriate.
My application does require some "dynamic" queries to be performed, though. I have two ways of meeting this:
a) I create a mediator object, eg. ClientSearchMediator, which "knows" how to search for clients by name, etc. I can then put this through a SearchHandler.Search(ISearchMediator[] mediators) call (for example). This can be used to target specific data structures and sort results accordingly using LINQ-to-Entities.
b) For a looser experience, possibly as a result of a user designing their own query (using high level tools our application provides), eSQL is ideal for this purpose. It can be made to be injection-safe.
I don't have enough knowledge to address all of this, but I'll at least take a few stabs.
I don't know why you think ADO.NET is more consistent than Entity Framework. There are many different ways to use ADO.NET and I've definitely seen inconsistency within a single code base.
Entity Framework is currently a 1.0 release and it suffers from many 1.0 type problems (incomplete & inconsistent API, missing features, etc.).
In regards to Include, I assume you are referring to eager loading. Multiple people (outside of Microsoft) have developed solutions for getting "type safe" includes (try googling something like: Entity Framework ObjectQueryExtension Include). That said, Include is more of a hint than anything. You can't force eager loading and you have to always remember to call the IsLoaded() method to see if your request was fulfilled. As far as I know, the way "Include" works is not changing at all in the next version of Entity Framework (4.0 - to ship with VS 2010).
As far as executing the Linq query as soon as it's built vs. the last possible moment, that decision is situational. Personally, I would probably execute it as soon as it's built for the most part unless there was a compelling reason not to, but I can see other people going the opposite direction.
There are more mature ORMs on the market and Entity Framework isn't necessarily your best option. For the most part, you can bend Entity Framework to your will, but you may end up rolling your own implementation of features that come out of the box with other ORMs.