Integration testing: comparing JPA entities

Suppose you are doing some integration testing: you store a fairly large entity in the db, then read it back and would like to compare the two. Obviously it has some associations as well, but that's just the cherry on top of a very unpleasant cake. How do you compare those entities? I have seen a lot of incorrect ideas and feel that this has to be written manually. How do you guys do that?
Issues:
you cannot use equals/hashCode: these are for the natural id.
you cannot use a subclass with a fixed equals, as that would test a different class and can give wrong results when persisting, because the persistence context handles the data differently.
lots of fields: you don't want to type all the comparisons by hand. You want reflection.
@Temporal annotations: you cannot use trivial "reflection equals" approaches, because with @Temporal(TIMESTAMP) java.util.Date <> java.sql.Date
associations: a typical entity that you would like to have properly tested will have several associations, so the tool/approach should ideally support deep comparison. Cycles in the object graph can also ruin the fun.
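The temporal mismatch from the list above can be reproduced with the JDK alone. This is a minimal sketch that assumes the provider hands back a java.sql.Timestamp for a TIMESTAMP column (an assumption, the question does not say which subclass it saw). The class TemporalEqualsDemo and the helper sameInstant are made up for illustration.

```java
import java.sql.Timestamp;
import java.util.Date;

// Illustrative only: shows why an equals-based comparison is fragile when a
// java.util.Date field is read back as a java.sql.Timestamp (assumed here).
class TemporalEqualsDemo {

    // Hypothetical helper: compares two temporal values by instant, not by class.
    static boolean sameInstant(Date a, Date b) {
        if (a == null || b == null) {
            return a == b;
        }
        return a.getTime() == b.getTime();
    }

    public static void main(String[] args) {
        Date stored = new Date(1_000L);
        Date loaded = new Timestamp(1_000L);

        // Date.equals only compares the millisecond value.
        System.out.println("stored.equals(loaded) = " + stored.equals(loaded));
        // Timestamp.equals(Object) rejects anything that is not a Timestamp.
        System.out.println("loaded.equals(stored) = " + loaded.equals(stored));
        // Comparing by instant is symmetric.
        System.out.println("sameInstant = " + sameInstant(stored, loaded));
    }
}
```

The asymmetry is the point: the result depends on which side of the comparison the reloaded value ends up on, so a generic "reflection equals" may pass or fail depending on argument order.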
The best solution I have found:
don't use transmogrifying data types (like Date) in JPA entities.
all associations should be initialized in the entity, because null <> empty list.
calculate a toString externally, say via ReflectionToStringBuilder, and compare the results. The reason for doing it externally is to let the entity keep its own toString; tests should not depend on nobody changing it. Theoretically, the toString can be deep, but the commons recursive ToStringStyle includes the object identifier, which ruins it.
I thought that I could use JSON as the string format, but commons supports that only for a shallow toString, and Jackson (without further instructions on the entity) fails on cycles over associations.
An alternative solution would be to declare subclasses with generated id (say lombok) and use some automatic mapping tool (say remondis mapper), with an option to overcome the differences in Dates/collections.
But I'm listening. Does anyone possess a better solution?
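The "external toString" idea can be sketched without any library. This is not ReflectionToStringBuilder, only a hand-written stand-in that assumes a shallow comparison is enough. ShallowDump and AuditedCustomer are hypothetical names. Nested objects are rendered through their own toString, so associations and cycles are not handled here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

// Hypothetical helper, not part of any library: renders the instance fields of
// an object as "name=value" pairs, independent of the object's own toString().
class ShallowDump {

    static String dump(Object target) {
        StringJoiner out = new StringJoiner(", ", target.getClass().getSimpleName() + "[", "]");
        // Walk up the hierarchy so inherited fields are included as well.
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue; // skip constants and compiler-generated fields
                }
                f.setAccessible(true);
                try {
                    out.add(f.getName() + "=" + f.get(target));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return out.toString();
    }
}

// Hypothetical entity-like class used only for the demonstration.
class AuditedCustomer {
    private Long id;
    private String name;

    AuditedCustomer(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ShallowDumpDemo {
    public static void main(String[] args) {
        String expected = ShallowDump.dump(new AuditedCustomer(1L, "Ann"));
        String actual = ShallowDump.dump(new AuditedCustomer(1L, "Ann"));
        System.out.println(expected.equals(actual));
    }
}
```

Comparing strings also gives a readable diff when the test fails, which is a practical reason to prefer it over a boolean "reflection equals".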

Related

What are the disadvantages of using records instead of classes?

C# 9 introduces record reference types. A record provides some synthesized methods such as a copy constructor, a clone operation, hash code calculation and comparison/equality operations. It seems convenient to me to use records instead of classes in general. Are there reasons not to do so?
It seems to me that currently Visual Studio as an editor does not support records as well as classes but this will probably change in the future.
Firstly, be aware that if it's possible for a class to contain circular references (which is true for most mutable classes), then many of the auto-generated record members can overflow the stack. So that's a pretty good reason not to use records for everything.
So when should you use a record?
Use a record when an instance of a class is entirely defined by the public data it contains, and has no unique identity of its own.
This means that the record is basically just an immutable bag of data. I don't really care about that particular instance of the record at all, other than that it provides a convenient way of grouping related bits of data together.
Why?
Consider the members a record generates:
Value Equality
Two instances of a record are considered equal if they have the same data (by default: if all fields are the same).
This is appropriate for classes with no behavior, which are just used as immutable bags of data. However this is rarely the case for classes which are mutable, or have behavior.
For example if a class is mutable, then two instances which happen to contain the same data shouldn't be considered equal, as that would imply that updating one would update the other, which is obviously false. Instead you should use reference equality for such objects.
Meanwhile if a class is an abstraction providing a service you have to think more carefully about what equality means, or if it's even relevant to your class. For example imagine a Crawler class which can crawl websites and return a list of pages. What would equality mean for such a class? You'd rarely have two instances of a Crawler, and if you did, why would you compare them?
with blocks
with blocks provide a convenient way to copy an object and update specific fields. This is always safe if the object has no identity, as copying it doesn't lose any information. Copying a mutable class, however, loses the identity of the original object, as updating the copy won't update the original. As such you have to consider whether this really makes sense for your class.
ToString
The generated ToString prints out the values of all public properties. If your class is entirely defined by the properties it contains, then this makes a lot of sense. However if your class is not, then that's not necessarily the information you are interested in. A Crawler for example may have no public fields at all, but the private fields are likely to be highly relevant to its behavior. You'll probably want to define ToString yourself for such classes.
All properties of a record are public by default
All properties of a record are immutable by default
By default, I mean when using the simple record definition syntax.
Also, records can only derive from records and you cannot derive a regular class from a record.

The JPA spec's no-arg constructor requirement prevents us from writing completely correct hashCode/equals. How do you cope with that?

OK, so here [1] is a great read on how to define hashCode/equals really correctly, namely with respect to object hierarchies. But here I'd like to ask about pitfall #3 from that article, which shows bizarre behavior when hashCode/equals are defined on mutable fields and a Set is used for collections. We cannot use only final fields and a parameterized constructor, due to the JPA spec. So what are the means to avoid these gotchas? What do you use?
Well, obviously one is to avoid using Set in JPA entities. That does not seem very nice. Another solution could be to "unsupport" setters after the equals method has been called, but that's ridiculous, and equals surely shouldn't have side effects.
So how do you cope with that? Aside from not knowing/ignoring it, which would probably be the default action in the Java world...
[1] https://www.artima.com/lejava/articles/equality.html
If an entity is detached, you need to override equals and hashCode (see the guide referenced below). Every entity has to have an @Id. The ID is immutable. Entities should implement equals and hashCode based on the primary key ID.
Pitfall 3 deals with mutable objects. It cannot be applied to an entity with an immutable ID.
Guide to Implementing equals() and hashCode() with Hibernate
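To make the discussion concrete, here is a small JDK-only sketch of pitfall 3 and of the id-based alternative. MutablePoint and IdBased are hypothetical classes. IdBased assumes the id is assigned once and never changed afterwards, which the answer takes for granted (a generated id that is still null before persisting would need extra care).

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class whose equals/hashCode depend on a mutable field.
class MutablePoint {
    int x;

    MutablePoint(int x) { this.x = x; }

    @Override public boolean equals(Object o) {
        return o instanceof MutablePoint && ((MutablePoint) o).x == x;
    }

    @Override public int hashCode() { return Integer.hashCode(x); }
}

// Hypothetical entity-like class: equality is based only on the id,
// which has no setter and is assumed never to change.
class IdBased {
    private Long id;
    String name;

    IdBased(Long id, String name) { this.id = id; this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof IdBased && Objects.equals(id, ((IdBased) o).id);
    }

    @Override public int hashCode() { return Objects.hashCode(id); }
}

class SetPitfallDemo {

    // Mutating a field that feeds hashCode makes the element unfindable.
    static boolean mutableFoundAfterChange() {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1);
        set.add(p);
        p.x = 2; // the set still files p under the old hash
        return set.contains(p);
    }

    // Mutating a field that does not feed hashCode is harmless.
    static boolean idBasedFoundAfterChange() {
        Set<IdBased> set = new HashSet<>();
        IdBased e = new IdBased(42L, "before");
        set.add(e);
        e.name = "after";
        return set.contains(e);
    }

    public static void main(String[] args) {
        System.out.println("mutable field in hash: " + mutableFoundAfterChange());
        System.out.println("id-based hash: " + idBasedFoundAfterChange());
    }
}
```

The first lookup fails even though the very same instance is in the set, which is exactly the behavior the article warns about. The second one keeps working because nothing that takes part in hashCode has changed.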

Scala, Morphia and Enumeration

I need to store a Scala class in Morphia. With annotations it works well, unless I try to store a collection of _ <: Enumeration
Morphia complains that it does not have serializers for that type, and I am wondering how to provide one. For now I have changed the type of the collection to Seq[String], and fill it by invoking toString on every item in the collection.
That works well, however I'm not sure if that is the right way.
This problem is common to several of the abstraction layers available on top of MongoDB. It all comes back to one root cause: there is no enum equivalent in JSON/BSON. Salat, for example, has the same problem.
In fact, the MongoDB Java driver does not support enums, as you can read in the discussion going on here: https://jira.mongodb.org/browse/JAVA-268 where you can see the problem is still open. Most of the frameworks I have seen for using MongoDB with Java do not implement low-level functionality such as this. I think this choice makes a lot of sense, because they leave you the choice of how to deal with data structures not handled by the low-level driver, instead of imposing one on you.
In general I feel that the absence of support comes not from a technical limitation but rather from a design choice. For enums there are multiple ways to map them, each with its pros and cons, while for other data types it is probably simpler. I don't know the MongoDB Java driver in detail, but I guess supporting multiple "modes" would have required some refactoring (maybe that's why they are talking about a new version of serialization?)
These are two strategies I am thinking about:
If you want to index on an enum and minimize space occupation, you will map the enum to an integer (not using the ordinal, please; in Java you can give the enum constants explicit values).
If your concern is queryability in the mongo shell, because your data will be accessed by data scientists, you would rather store the enum using its string value.
To conclude, there is nothing wrong with adding an intermediate data structure between your native object and MongoDB. Salat supports it through CustomTransformers; with Morphia you may need to do the conversion explicitly. Go for it.
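The two strategies can be sketched in plain Java, independently of Morphia or Salat. AccountStatus and its codes are invented for illustration. The only assumption is that the enum is converted by hand before it reaches the driver.

```java
import java.util.Arrays;

// Hypothetical enum: each constant carries an explicit, stable integer code,
// so the stored value does not depend on declaration order (ordinal()).
enum AccountStatus {
    ACTIVE(10), SUSPENDED(20), CLOSED(30);

    private final int code;

    AccountStatus(int code) { this.code = code; }

    // Strategy 1: compact integer representation for storage and indexing.
    int code() { return code; }

    static AccountStatus fromCode(int code) {
        return Arrays.stream(values())
                .filter(s -> s.code == code)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown code: " + code));
    }
}

class EnumMappingDemo {
    public static void main(String[] args) {
        // Strategy 1: round trip through the explicit integer code.
        int stored = AccountStatus.SUSPENDED.code();
        System.out.println(AccountStatus.fromCode(stored));

        // Strategy 2: round trip through the name, readable in a shell query.
        String storedName = AccountStatus.SUSPENDED.name();
        System.out.println(AccountStatus.valueOf(storedName));
    }
}
```

With explicit codes, reordering or inserting constants later does not silently change the meaning of data that is already stored, which is the reason to avoid the ordinal.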

How to serialize two different sets of fields of an object

I have some classes which I need to serialize in two different ways: first the "basic" fields, and second some other fields.
e.g. a User class for which I sometimes need to serialize just the "first name" and "last name" fields, and sometimes need to serialize the "id" and "email" fields as well.
The best way I have found to do this so far is to mark the basic fields with the [DataMember] attribute and let .NET do the serializing for me, and to mark the rest with a custom attribute and do the serialization myself.
This solution proved to be very costly:
I first serialize the basic attributes (as mentioned, .NET does that for me)
Then I get the property names of the fields marked with the custom attribute (using the reflection namespace),
Then I try to get those fields and their values from the object, and add their serialization to the basic serialization (not very successfully so far).....
Question is:
Is there a better way? Preferably one by which .NET will do the rest of the work for me, and if not, at least one by which I don't need to go through all the object's fields, find the relevant ones and serialize them myself..
Thank you all..
Oren,
Are you having to run these operations 1000x or more per minute? If not, all but the clumsiest of solutions will not be too costly. For example, if you need to do it like this, working from 2 objects is probably just fine. If you haven't actually run real timing comparisons, there's a huge chance you're wrong about what's expensive and what isn't.
But if you want to do it like this anyway, here is a solution that will only take 1% more time.
have an object, e.g., Core, for the subset, and Full for the whole thing
in the constructor of Full, instantiate a private instance of Core (composition pattern sort of). This has insignificant overhead.
Full will not have private member variables for the Core members. Full's setters and getters of the core data will refer to the private instance of Core. So no overhead.
Now you have 2 objects to serialize.
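The Core/Full arrangement described above can be sketched as follows. The question is about .NET, so this Java version only illustrates the structure. CoreUser and FullUser are hypothetical names, and the actual serializer call is left out.

```java
// Hypothetical "Core" object: holds only the basic fields.
class CoreUser {
    private String firstName;
    private String lastName;

    String getFirstName() { return firstName; }
    void setFirstName(String firstName) { this.firstName = firstName; }
    String getLastName() { return lastName; }
    void setLastName(String lastName) { this.lastName = lastName; }
}

// Hypothetical "Full" object: owns a private CoreUser and delegates the basic
// fields to it, so the core data is stored only once.
class FullUser {
    private final CoreUser core = new CoreUser();
    private long id;
    private String email;

    // Core fields are not duplicated here; accessors forward to the core.
    String getFirstName() { return core.getFirstName(); }
    void setFirstName(String firstName) { core.setFirstName(firstName); }
    String getLastName() { return core.getLastName(); }
    void setLastName(String lastName) { core.setLastName(lastName); }

    long getId() { return id; }
    void setId(long id) { this.id = id; }
    String getEmail() { return email; }
    void setEmail(String email) { this.email = email; }

    // Hand this to the serializer when only the basic fields are wanted.
    CoreUser core() { return core; }
}

class CoreFullDemo {
    public static void main(String[] args) {
        FullUser user = new FullUser();
        user.setFirstName("Oren");
        user.setEmail("oren@example.com");

        // The same data is visible through both views.
        System.out.println(user.core().getFirstName());
        System.out.println(user.getEmail());
    }
}
```

Serializing user.core() gives the basic view and serializing user gives the full one, with no reflection over custom attributes at run time.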

Linq to SQL, Entity Framework, Repository Pattern, and Dependency Injection

Stephen Walther's video on MVC and Models is a very good and light discussion of the various topics listed in this question's title. The one question listed in the notes that went unanswered was:
If you create an Interface / Repository pattern for Linq2SQL, do Linq2SQL's classes still cause a dependency on Linq, even though you pass the classes as ToList?
The easy answer is probably YES; however, what standard mechanism would you use to represent the data?
Let's say you have a Product entity that is made up of three tables (Prices, Text, and Photos) (you could have sets of prices for different regions, different text for localization, and different photos). (Sounds like a builder pattern.) Would you create a slice of these tables, grabbing the right prices, text, and photos into a single List? Since Lists may be proprietary, would you use a Dictionary object?
I thank you for your answers. I am very interested in the "standard and proper" way to do it rather than 101 possibilities.
Another quick question: is Entity Framework ready for a complicated database yet? There are a lot of constructs that Linq2SQL likes that EF does not. EF seems to require identity fields as primary keys (HAHA), but it seems like every demo does this. I want to use EF, but I constantly fail to make it work, falling back to Linq2SQL.
If you keep the L2S on the other side of the Repository facade (remember, that's all a Repository is - a facade) then you decouple the rest of your application from L2S. This means that the job of the code behind your repository is to turn the L2S into "domain" objects, custom classes, and then the Repository returns those.
In this sense, the Repository is returning fully formed "Product" objects with all their related Price, Text, and Photo data. This is called an Aggregate Root.
There shouldn't be a problem with Lists, since they are CLR objects.
As far as EF for advanced scenarios, my advice would be not yet, for the reasons you note.
The standard mechanism I'd use to represent the data is a Data Transfer Object. I would never return a LINQ to SQL or Entity Framework object across a service boundary, and I would hesitate to return it across a layer boundary of any kind. This is because these objects will serialize implementation-dependent data.
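Both answers point the same way: the repository hides the data-access classes and hands out plain objects. Here is a rough sketch of that shape, written in Java rather than C# and with three in-memory maps standing in for the Prices, Text and Photos tables. ProductDto, ProductRepository and InMemoryProductRepository are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical DTO: an immutable carrier with no reference to the data-access library.
final class ProductDto {
    final long id;
    final String text;
    final double price;
    final List<String> photos;

    ProductDto(long id, String text, double price, List<String> photos) {
        this.id = id;
        this.text = text;
        this.price = price;
        this.photos = Collections.unmodifiableList(new ArrayList<>(photos));
    }
}

// The facade the rest of the application depends on.
interface ProductRepository {
    Optional<ProductDto> findById(long id);
}

// Stand-in implementation: the maps play the role of the three tables.
class InMemoryProductRepository implements ProductRepository {
    final Map<Long, Double> prices = new HashMap<>();
    final Map<Long, String> texts = new HashMap<>();
    final Map<Long, List<String>> photos = new HashMap<>();

    @Override
    public Optional<ProductDto> findById(long id) {
        if (!texts.containsKey(id) || !prices.containsKey(id)) {
            return Optional.empty();
        }
        // Assemble one fully formed product from the separate sources.
        List<String> productPhotos = photos.getOrDefault(id, Collections.emptyList());
        return Optional.of(new ProductDto(id, texts.get(id), prices.get(id), productPhotos));
    }
}

class RepositoryDemo {
    public static void main(String[] args) {
        InMemoryProductRepository repo = new InMemoryProductRepository();
        repo.texts.put(1L, "Chair");
        repo.prices.put(1L, 49.5);
        repo.photos.put(1L, Collections.singletonList("front.jpg"));

        repo.findById(1L).ifPresent(p -> System.out.println(p.text + " " + p.price + " " + p.photos));
    }
}
```

Callers only ever see ProductRepository and ProductDto, so swapping the implementation behind the interface does not touch the rest of the application.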