Testing: Best embedded database to use - PostgreSQL

I've been using Derby (embedded) for testing my applications. It was working fine, but recently it started giving me problems like this one: Derby Sequence Loop in JUnit test.
The same test runs fine with other databases (e.g. H2).
My tech stack is Java 1.6, Spring, Hibernate.
I've seen comparisons of similar databases (H2, HSQLDB, Derby, PostgreSQL and MySQL), but the comparisons don't mention problems like the one I faced.
So, which one is the preferred one for testing purposes?

The best test database is the same version of the same type you're running in production, on the same host OS and architecture.
The question really tends to be "what requirements do you have in testing but not production that make you consider a different DB"?
For example, you might want an embedded database you can easily launch and destroy under unit testing control, without having to specify paths to external binaries, launch and stop a database server, etc. In exchange for that you accept a compromise that your tests are not as good a match for the real system you'll be running on.
When I was building software on Java app servers I ran my tests against PostgreSQL, just like the production machine. It's not like it's hard to initdb and pg_ctl start a DB; you can even do it from a test wrapper if you really feel the need. I think people are a bit too focused on fully in-process embedded DBs and don't properly consider the costs of that.
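For anyone who does want to drive that from a test wrapper, here is a minimal JUnit sketch of the idea, assuming initdb and pg_ctl are on the test machine's PATH; the data directory and port are illustrative, not anything mandated:

```java
import java.util.Arrays;

import org.junit.AfterClass;
import org.junit.BeforeClass;

public class PostgresBackedTest {

    private static final String DATA_DIR = "/tmp/pg-test-data"; // illustrative path

    @BeforeClass
    public static void startPostgres() throws Exception {
        // Initialise a throwaway cluster and start it on a side port;
        // assumes the PostgreSQL binaries are on the PATH.
        run("initdb", "-D", DATA_DIR);
        run("pg_ctl", "-D", DATA_DIR, "-o", "-p 55432", "-w", "start");
    }

    @AfterClass
    public static void stopPostgres() throws Exception {
        run("pg_ctl", "-D", DATA_DIR, "-m", "fast", "-w", "stop");
    }

    private static void run(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("Command failed: " + Arrays.toString(cmd));
        }
    }

    // @Test methods would connect to jdbc:postgresql://localhost:55432/postgres
}
```

The tests then run against exactly the engine you deploy on, at the cost of needing the PostgreSQL binaries wherever the tests run.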

I understand some people think it's better to always use the same database. But there are a few disadvantages:
- Unit tests might be very slow, as each database call goes over TCP/IP.
- Running the tests might be more complicated, as you need to install and run the same database you use for production on your local machine.
- The database you use for production might need a lot of memory, which you might not want on the developer machine.
- There is a risk you inadvertently use features that are only available on one database, which will make switching to another database very hard (if you ever consider doing that in the future).
Using a different database for unit testing also has disadvantages, of course. If you use database-specific features, there is a risk that some tests cannot be run or that the tests don't behave exactly the same way. Because of that, I suggest running the nightly builds against the same database used for production.
Especially if you are using a database abstraction layer such as Hibernate, switching the database should be very easy. In that case it makes a lot of sense to consider an in-memory database such as H2, HSQLDB, or Derby, especially if you expect to write many (or long-running) tests.
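As an illustration, a minimal sketch of pointing Hibernate at an in-memory H2 database for tests; the property values are standard Hibernate/H2 settings, and registering your own mapped classes is left as the commented line:

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class TestSessionFactory {

    // Builds a SessionFactory against an in-memory H2 database.
    // DB_CLOSE_DELAY=-1 keeps the database alive for the whole JVM,
    // not just until the first connection closes; hbm2ddl
    // "create-drop" lets Hibernate create the schema from the mappings.
    public static SessionFactory create() {
        Configuration cfg = new Configuration()
            .setProperty("hibernate.connection.driver_class", "org.h2.Driver")
            .setProperty("hibernate.connection.url",
                         "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1")
            .setProperty("hibernate.connection.username", "sa")
            .setProperty("hibernate.connection.password", "")
            .setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect")
            .setProperty("hibernate.hbm2ddl.auto", "create-drop");
        // cfg.addAnnotatedClass(MyEntity.class); // your mapped entities here
        return cfg.buildSessionFactory();
    }
}
```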

Related

How to make a legacy system heavily reliant on Entity Framework more testable?

We have a fairly large system (~1m lines) which is heavily reliant on Entity Framework 6. This means our DbContext is passed around and used everywhere.
We also have lots of "unit" tests that use an actual SQL Server database. Each machine that runs tests has a dedicated database which gets wiped and set up with the needed data before each test runs.
This is of course not ideal in terms of speed, maintainability, ease of use, etc.
My end goal is to make all of our unit tests (~5k tests) not use an actual database, but a mock of some sort. I know this process is not going to be easy, but I also want it to be as painless as possible.
How can I make my tests faster and more unit-scoped?
There is only one option here, and that is to build an in-memory system. The unfortunate part is that in-memory does not mesh well with Entity Framework, like, at all.
All of the mocks for Entity Framework are suggested as being done with mock databases, which you already have set up.
So the good news is that you can do it in memory on your own! The bad news is that you can do it, in memory, on your own.
You are going to have to create a mirror implementation of Entity Framework which works only on an in-memory set of information. In many ways this is similar to event-driven in-memory cache systems, and the benefit of creating a custom in-memory mocking implementation is that you can easily port it over to support caching your database information, along with an interface for querying that information.
This would expose features such as event-driven push technologies (SignalR) for live data. It will also take a lot of time. If the development time required to implement an in-memory mock of Entity Framework is less than the time lost waiting for tests to complete against the real database, then it may be worth it.
I disagree with your conclusion. I recommend you keep the existing dedicated testing database. I actually think it's an excellent idea to test against an actual database, because it will reveal data-layer issues, such as someone changing the database schema without the change being reflected in your EF entity classes. EF itself is fairly well tested, so chasing code coverage of EF-generated classes is a fool's errand in my personal opinion; the disconnect between entity classes and the database is probably the #1 area where things can (and will) go wrong.
I understand your performance concerns with this approach. There are certainly optimizations to be had, such as not resetting the database before every test (perhaps have the tests assume the database is in a known "good state" before starting) and only resetting it after the test suite completes.
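The question is .NET/EF, but the once-per-suite reset pattern looks the same in any xUnit-style framework; here is a JUnit sketch, where TestDatabase.resetDatabase() is a hypothetical helper wrapping your existing wipe-and-seed scripts:

```java
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class OrderRepositoryIT {

    @BeforeClass
    public static void resetOnce() {
        // Pay the expensive wipe-and-reseed cost once for the whole
        // fixture instead of before every test. resetDatabase() is a
        // hypothetical helper that replays your schema and seed scripts.
        TestDatabase.resetDatabase();
    }

    @Before
    public void checkPreconditions() {
        // Tests now assume a known "good state" rather than a fresh
        // database, so they must not make destructive changes they
        // don't undo themselves.
    }

    @Test
    public void findsSeededOrder() { /* ... */ }
}
```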

MVC 5 Unit tests vs integration tests

I'm currently working on an MVC 5 project using Entity Framework 5 (I may switch to 6 soon). I use database-first development and MySQL with an existing database (about 40 tables). This project started as a "proof of concept", and now my company has decided to go ahead with the software I'm developing. I am struggling with the testing part.
My first idea was to use mostly integration tests. That way I felt I could test my code and also the underlying database. I created a script that dumps the existing database schema into a "test database" in MySQL. I always start my tests with a clean database with no data, and create/delete a bit of data for each test. The thing is, it takes a fair amount of time to run the tests (and I run my tests very often).
I am thinking of replacing my integration tests with unit tests in order to speed them up. I would "remove" the test database and use only mocks instead. I have tested a few methods and it seems to work great, but I'm wondering:
Do you think mocking my database can "hide" bugs that only occur when my code runs against a real database? Note that I don't want to test Entity Framework itself (I'm sure the fine people from Microsoft did a great job on that), but could my code run well against mocks and break against MySQL?
Do you think going from integration testing to unit testing is a kind of "downgrade"?
Do you think dropping integration testing and adopting unit testing for speed considerations is OK?
I'm aware that frameworks exist that run the tests against an in-memory database (e.g. the Effort framework), but I don't see the advantages of this over mocking. What am I missing?
I'm aware that this kind of question is prone to "it depends on your needs" responses, but I'm sure some of you have been through this and can share your knowledge. I'm also aware that in a perfect world I would do both (tests using mocks and tests using the database), but I don't have that kind of time.
As a side question, what tool would you recommend for mocking? I was told that Moq is a good framework, but that it's a little bit slow. What do you think?
Do you think mocking my database can "hide" bugs that only occur when my code runs against a real database? Note that I don't want to test Entity Framework itself (I'm sure the fine people from Microsoft did a great job on that), but could my code run well against mocks and break against MySQL?
Yes. If you only test your code using mocks, it's very easy to gain false confidence in your code. When you mock the database, what you're saying is "I expect these calls to take place". If your code makes those calls, it'll pass the test, but if they're the wrong calls, it won't work in production. At a simple level, if you add or remove a column in your database, the database interaction may need to change, but the process of adding/removing the column is hidden from your tests until you update the mocks.
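To make that concrete: the thread is .NET, but the trap reads the same in Java/Mockito terms. In this sketch, UserDao, UserService, and the column implied by findEmailById are all illustrative names, not anything from the question:

```java
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class UserServiceTest {

    interface UserDao {                    // illustrative data-access interface
        String findEmailById(long id);
    }

    static class UserService {
        private final UserDao dao;
        UserService(UserDao dao) { this.dao = dao; }
        String emailFor(long id) { return dao.findEmailById(id); }
    }

    @Test
    public void returnsEmailForUser() {
        UserDao dao = mock(UserDao.class);
        when(dao.findEmailById(42L)).thenReturn("alice@example.com");

        // Passes regardless of whether the real table still has an
        // email column -- schema drift is invisible to this test.
        assertEquals("alice@example.com", new UserService(dao).emailFor(42L));
    }
}
```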
Do you think going from integration testing to unit testing is a kind of "downgrade"?
It's not a downgrade; it's different. Unit testing and integration testing have different benefits that in most cases complement each other.
Do you think dropping integration testing and adopting unit testing for speed considerations is OK?
OK is very subjective. I'd say no; however, you don't have to run all of your tests all of the time. Most testing frameworks (if not all) allow you to categorise your tests in some way. This lets you create subsets of your tests: you could, for example, have a "DatabaseIntegration" category for all of your database integration tests, or "EndToEnd" for full end-to-end tests.
My preferred approach is to have separate builds. The usual/continuous build that I run before/after each check-in only runs unit tests; this gives quick feedback and validation that nothing has broken. A less frequent daily/overnight build, in addition to running the unit tests, also runs the slower, repeatable integration tests. I would also run the integration tests for areas I've been working on before checking in code, if there's a possibility of the code impacting the integration.
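In JUnit 4 terms, for example, categories are just marker interfaces plus a suite runner; the category and test names here are illustrative:

```java
import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Categories.IncludeCategory;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite.SuiteClasses;

public class CategorisedTests {

    // Marker interfaces used purely as category labels.
    public interface DatabaseIntegration {}
    public interface EndToEnd {}

    public static class OrderTests {
        @Test
        public void fastUnitCheck() { /* runs in every build */ }

        @Category(DatabaseIntegration.class)
        @Test
        public void writesOrderToDatabase() { /* nightly build only */ }
    }

    // A suite the nightly build runs: only the slow database tests.
    @RunWith(Categories.class)
    @IncludeCategory(DatabaseIntegration.class)
    @SuiteClasses(OrderTests.class)
    public static class NightlyDatabaseSuite {}
}
```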
I'm aware that frameworks exist that run the tests against an in-memory database (e.g. the Effort framework), but I don't see the advantages of this over mocking. What am I missing?
I haven't used them, so this is speculation. I would imagine the main benefit is that rather than simulating the database interaction with mocks, you instead set up the database and inspect the post-state. The tests become less about how you did something and more about what data moved. On the face of it, this could lead to less brittle tests; however, you're effectively writing integration tests against another data provider that you're not going to use in production. Whether that's the right thing to do is, again, very subjective.
I guess the second benefit is likely that you don't necessarily need to refactor your code to take advantage of the in-memory database. If your code hasn't been structured to support dependency injection, there's a good chance you will need to do some level of refactoring to support mocking.
I'm also aware that in a perfect world I would do both (tests using mocks and tests using the database), but I don't have that kind of time.
I don't really understand why you feel this is the case. You've already said that you have integration tests that you're planning to replace with unit tests. Unless you need to do major refactoring to support the unit tests, your integration tests should still work. You don't usually need as many integration tests as unit tests, since the unit tests verify the functionality and the integration tests verify the integration, so the overhead of creating them should be relatively small. Using categorisation to determine which tests you run will reduce the time impact of running them.
As a side question, what tool would you recommend for mocking? I was told that Moq is a good framework, but that it's a little bit slow. What do you think?
I've used quite a few different mocking libraries, and for the most part they are all very similar. Some things are easier with different frameworks, but without knowing what you're doing it's hard to say whether you will notice. If you haven't built your code with dependency injection in mind, you may find it challenging to get your mocks to where you need them.
Mocking of any kind is generally quite fast. You're usually (unless you're using partial mocks) removing all of the functionality of the class/interface you're mocking, so it's going to perform faster than your normal code. The only performance issues I've heard about are with MS Fakes/Shims: sometimes (depending on the complexity of the assembly being faked) it can take a while for the fake assemblies to be created.
The two frameworks I've used that are a bit different are MS Fakes/Shims and Typemock. The MS version requires a certain edition of Visual Studio, but lets you generate fake assemblies with shims of certain types of object, which means you don't have to pass your mocks from your test through to where they're used. Typemock is a commercial solution that uses the profiling API to inject code while your tests are running, which means it can reach parts other mocking frameworks can't. Both are particularly useful with a codebase that wasn't written with unit testing in mind, as they can help bridge the gap.

Query on test automation framework

This is regarding an issue I have been facing for some time. Though I have found a solution, I would really like to get some opinions about the approach taken.
We have an application which receives messages from a host, does some processing, and then passes the messages on to an external system. The application is written in Java and has to run on both the Linux/Oracle and HP NonStop Tandem/SQL/MX OS/DB combinations.
I have developed a test automation framework written in Perl. The script traverses directories (specified as an argument to the script) and executes the test cases found under them. Test cases can be organized into directories by functionality; this approach was taken so that a specific piece of functionality can be checked on its own, in addition to the entire regression suite. For verification of the test results, the script reads test-case-specific input files containing SQL queries.
On Linux/Oracle, the Perl DBD/DBI interface is used to query the Oracle database.
When this automation tool was run on the Tandem, I found that there was no DBD/DBI interface for SQL/MX. When we contacted HP, they told us it would be a while before they developed DBD/DBI interfaces for SQL/MX.
To work around this issue, I developed a small Java application which accepts a DB connection string, user name, password and various other parameters. This Java app is now responsible for the test-case verification.
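A minimal sketch of what such a verifier might look like; the argument layout and the exit-code convention here are illustrative, not the OP's actual tool:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal test-verification helper: connects to whatever JDBC URL it
// is given, runs a query, and compares the first column of the first
// row against an expected value.
// Usage: java VerifyQuery <jdbc-url> <user> <password> <sql> <expected>
public class VerifyQuery {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
        try {
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(args[3]);
            if (rs.next() && args[4].equals(rs.getString(1))) {
                System.exit(0);   // pass: the Perl driver checks the exit code
            }
            System.exit(1);       // fail
        } finally {
            conn.close();
        }
    }
}
```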
I must say it meets our current needs, but something tells me (I do not know what) that the approach taken is not a good one, even though I now have the flexibility of running this automation against any DB with a JDBC interface.
Can you please provide feedback on the above approach and suggest a better solution?
Thanks in advance
The question is a bit too broad to comment usefully on except for one part.
If the project is in Java, write the tests in Java. Writing the tests in a different language adds all sorts of complications.
You have to maintain another programming language and its attendant libraries. They can have different caveats and bugs for the same actions, as you ran into with the lack of a database driver in a certain environment.
Having the tests written in a different language than the project drives a wedge between testing and development. Developers will not feel responsible for participating in the testing process because they don't even know the language.
Tests written in a different language cannot leverage any work which has already been done. They have to reimplement basic code to access and work with the data and services, doubling the work and doubling the bugs. If the project code changes APIs or data structures, the test code can easily fall out of sync, adding maintenance hassle.
Java already has well-developed testing tools to do what you want. The whole structure of running specific tests vs the whole test suite is built into frameworks like JUnit.
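For example, the directory-per-functionality layout maps directly onto JUnit suites; the class and test names here are illustrative:

```java
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;
import org.junit.runners.Suite.SuiteClasses;

public class RegressionSuites {

    public static class MessageParsingTests {
        @Test public void parsesHostMessage() { /* ... */ }
    }

    public static class MessageRoutingTests {
        @Test public void routesToExternalSystem() { /* ... */ }
    }

    // Run one functional area on its own...
    @RunWith(Suite.class)
    @SuiteClasses(MessageParsingTests.class)
    public static class ParsingOnlySuite {}

    // ...or the whole regression suite, mirroring the directory layout.
    @RunWith(Suite.class)
    @SuiteClasses({ MessageParsingTests.class, MessageRoutingTests.class })
    public static class FullRegressionSuite {}
}
```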
So I can underscore the point, I wrote Test::More and I'm recommending you not use it here.

Experiences with db4o in production environment

We are planning to migrate from Prevayler (http://prevayler.org/) to db4o (http://www.db4o.com/), so we would like to hear about experiences, pros and cons, and best practices before moving forward. What do you think about it? Is it a good solution? Or would moving to a standard NoSQL solution be better (such as MongoDB or CouchDB)? Thanks!
We use db4o as the main DB in our production environment (both embedded and client/server), so I am going to share some of my experiences.
Pros:
- Very easy for development (you just implement data classes)
- Supports both embedded and client/server under the same interface, which makes it easy to unit test
- Decent performance for small projects
Cons:
- db4o is no longer developed, so it's effectively a dead project and you won't get much support for it
- [client/server] Every time you change the model you need to redeploy the server (never mind that you need to host the server app yourself)
- [client/server] Performance degrades as more clients connect; it is not possible to scale
Summary: db4o is very good as an embedded DB (mobile apps, desktop local DBs), but when it comes to server applications you get into trouble.
Given that I did not receive much feedback, we gave it a try. So far it seemed a good option for an embedded database, which makes deployment much easier. So we rewrote the whole persistence layer, with its unit tests, and it seemed to work fine.
Then we tried it with real data, and we started getting some weird NullPointerExceptions without knowing why. We started reading and found this issue: http://www.gamlor.info/wordpress/2009/09/db4o-activation-update-depth/.
We spent a few hours trying to solve it, but then decided not to spend more time on it and to find another way. CouchDB, OrientDB and MongoDB are still on our list.
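For anyone hitting the same wall: the nulls come from db4o's activation depth, which by default only loads object graphs five levels deep. A minimal sketch of the relevant configuration knobs, where Order stands in for one of your own persisted classes:

```java
import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;
import com.db4o.config.EmbeddedConfiguration;

public class Db4oSetup {
    public static ObjectContainer open() {
        EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();

        // Default activation depth is 5: fields deeper than that come
        // back null even though they are stored. Raising the depth (or
        // cascading activation per class) avoids the surprise NPEs.
        config.common().activationDepth(10);
        config.common().objectClass(Order.class).cascadeOnActivate(true);

        return Db4oEmbedded.openFile(config, "orders.db4o");
    }
}
```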

Derby vs PostgreSQL Performance Comparison

We are currently researching whether to switch from our PostgreSQL DB to an embedded Derby DB. Both would use GlassFish 3 for our data layer. Does anybody have opinions or knowledge that could help us decide?
Thanks!
edit: we are writing some performance tests ourselves right now. Looking for answers based more on experience / first-hand knowledge.
I know I'm late to post an answer here, but I want to make sure nobody makes the mistake of using Derby over any production-quality database in the future. I apologize in advance for how negative this answer is - I'm trying to capture an entire engineering team's ill feelings in a brief Q&A answer.
Our experience using Derby in many small-ish customer deployments has led us to seriously doubt how useful it is for anything but test environments. Some problems we've had:
- Deadlocks caused by lock escalations - this is the biggest one, and it happens to one customer about once every week or two
- Interrupted I/Os cause Derby to fail outright on Solaris (may not be an issue on other platforms) - we had to build a shim to protect it from these failures
- Can't handle complicated queries which MySQL/PostgreSQL would handle with ease
- A buggy transaction log implementation caused a table corruption which required us to export the database and re-import it (we couldn't just drop the corrupted table), and we still lost the table in the process - thank goodness we had a backup
- No LIMIT syntax (workarounds sketched after this list)
- Low performance for complicated queries
- Low performance for large datasets
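On the LIMIT point, a sketch of the usual workarounds; the customers table is illustrative, and the FETCH FIRST clause is only accepted by newer Derby releases:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class TopNQueries {

    // Workaround 1: cap the result size in JDBC itself. This works with
    // any driver, including older Derby versions without LIMIT.
    // The caller is responsible for closing the statement.
    static ResultSet firstTen(Connection conn, String sql) throws Exception {
        Statement stmt = conn.createStatement();
        stmt.setMaxRows(10);
        return stmt.executeQuery(sql);
    }

    // Workaround 2: newer Derby releases accept the SQL-standard
    // FETCH FIRST (and OFFSET) clause in place of MySQL/PostgreSQL-style LIMIT.
    static final String TOP_TEN =
        "SELECT id, name FROM customers ORDER BY id FETCH FIRST 10 ROWS ONLY";
}
```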
Because it's embedded, Derby is more of a competitor to SQLite than to PostgreSQL, which is an extremely mature production-quality database used to store multi-petabyte datasets by some of the largest websites in the world. If you want to be ready for growth and don't want to get caught debugging someone else's database code, I would recommend against Derby. I don't have any experience with SQLite, but I can't imagine it being much less reliable than Derby has been for us while still being as popular as it is.
In fact, we're in the process of porting to PostgreSQL now.
Derby is still relatively slow, but... wherever your Java application goes, your database server goes, completely platform-independent. You don't even need to think about installing a DB server wherever your Java app is copied to.
I was using MySQL with Java, but having an embedded implementation of your database server sitting right inside your Java app gives stunning, unprecedented productivity, freedom and flexibility.
Always having a DB server included, whenever and wherever, on any platform, is just heaven!
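For anyone curious, that convenience really is a couple of lines; a minimal sketch, with the database name illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class EmbeddedDerbyDemo {
    public static void main(String[] args) throws Exception {
        // The embedded driver runs Derby inside this JVM: no server to
        // install, the database is just a directory next to the app.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:derby:myAppDb;create=true");   // created on first run
        System.out.println("Connected: " + conn.getMetaData().getURL());
        conn.close();
    }
}
```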
I have not compared PostgreSQL to Derby directly. However, having used both in different circumstances, I have found Derby to be highly reliable. That said, you will need to pay attention to Derby's configuration to ensure it suits your application's needs.
When looking at the H2 database's stats site, it's worth reading the follow-up discussion, which comes out in favour of Derby compared to the H2 conclusions: http://groups.google.com/group/h2-database/browse_thread/thread/55a7558563248148?pli=1
Some stats from the H2 database site here:
http://www.h2database.com/html/performance.html
There are a number of performance test suites included as part of the Derby source distribution itself; they are used by Derby developers for their own performance testing of Derby. So if you need examples of performance tests, or want additional ones, consider using those. Look in the subdirectory java/testing/org/apache/derbyTesting/perf in the Derby source distribution.
I'm not sure what you mean by Derby being embedded. Certainly that is an option, but you can also run it as a separate server.
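For reference, starting Derby's network server programmatically looks like this; a minimal sketch, using Derby's default host and port, where clients then connect with the client driver (jdbc:derby://localhost:1527/myAppDb) instead of the embedded one:

```java
import java.io.PrintWriter;
import java.net.InetAddress;

import org.apache.derby.drda.NetworkServerControl;

public class DerbyServerDemo {
    public static void main(String[] args) throws Exception {
        // Runs the same Derby engine as a standalone server that
        // multiple JVMs can share over the network.
        NetworkServerControl server = new NetworkServerControl(
                InetAddress.getByName("localhost"), 1527);
        server.start(new PrintWriter(System.out));
        // ... serve clients ...
        server.shutdown();
    }
}
```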