Refactoring application: Direct database access -> access through REST

We have a huge database application that must be refactored (there are many reasons for this; the biggest one is security).
What we already have:
MySQL Database
JPA2 (Eclipselink) classes for over 100 tables
Client application that accesses the database directly
What needs to be there:
REST interface
Login/Logout with roles via database
What I've done so far:
Set up Spring MVC 3.2.1 with Spring Security 3.1.1
Using a custom UserDetailsService (contains just static data for testing atm)
Created a few Controllers for testing (simply receiving/providing data)
Design Problems:
We have maaaaany @OneToMany and @ManyToMany relations in our database
1.: (important)
If I'd send the whole object tree with all child objects as a response, I could probably send the whole database at once.
So I need a way to request for example 'all Articles'. But it should omit all the child objects. I've tried this yesterday and the objects I received were tons of megabytes:
@PersistenceContext
private EntityManager em;

@RequestMapping(method = RequestMethod.GET)
public @ResponseBody List<Article> index() {
    List<Article> a = em.createQuery("SELECT a FROM Article a", Article.class).getResultList();
    return a;
}
2.: (important)
If the client receives an Article, at the moment we can simply call article.getAuthor() and JPA will do a SELECT a FROM Author a JOIN Article ar WHERE ar.author_id = ?.
With REST we could make a request to /authors/{id}. But: This way we can't use our old JPA models on the client side, because the model contains Author author and not Long author_id.
Do we have to rewrite every model or is there a simpler approach?
3.: (less important)
Authentication: Make it stateless or not? I've never worked with stateless auth so far, but Spring seems to have some kind of support for it. When I look at some sample implementations on the web I have security concerns: With every request they send username and password. This can't be the right way.
If someone knows a nice solution for that, please tell me. Else I'd just go with standard HTTP Sessions.
4.:
What's the best way to design the client side model?
public class Book {
    int id;
    List<Author> authors; //option1
    List<Integer> authorIds; //option2
    Map<Integer, Author> idAuthorMap; //option3
}
(This is a Book which has multiple authors.) All three options have different pros and cons:
Option 1: I could directly access the corresponding Author model, but if I request a Book model via REST, I may not want the Author model now, only later. So option 2 would be better.
Option 2: I could request a Book model directly via REST and afterwards use the authorIds to fetch the corresponding author(s). But now I can't simply use myBook.getAuthors().
Option 3: This is a mixture of 1 and 2. If I just request the Books with only the Author ids included, I could do something like idAuthorMap.put(authorId, null).
But maybe there's a Java library that handles all the stuff for me?!
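A rough sketch of how options 2 and 3 could be combined on the client: the book keeps only the author ids from the REST response and resolves them on first access through a loader function. All names here (`ClientBook`, `ClientAuthor`, the loader) are hypothetical and not taken from any library; the loader stands in for whatever performs GET /authors/{id}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side models; names and fields are illustrative only.
class ClientAuthor {
    final int id;
    final String name;

    ClientAuthor(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ClientBook {
    private final int id;
    private final List<Integer> authorIds;                        // what the REST response carries
    private final Map<Integer, ClientAuthor> loaded = new HashMap<>(); // filled on demand
    private final Function<Integer, ClientAuthor> authorLoader;   // assumed to wrap GET /authors/{id}

    ClientBook(int id, List<Integer> authorIds, Function<Integer, ClientAuthor> authorLoader) {
        this.id = id;
        this.authorIds = new ArrayList<>(authorIds);
        this.authorLoader = authorLoader;
    }

    int getId() {
        return id;
    }

    // Resolves each author id at most once, then serves it from the local map.
    List<ClientAuthor> getAuthors() {
        List<ClientAuthor> result = new ArrayList<>();
        for (Integer authorId : authorIds) {
            result.add(loaded.computeIfAbsent(authorId, authorLoader));
        }
        return result;
    }
}

class ClientBookDemo {
    public static void main(String[] args) {
        // Stand-in loader; a real one would perform an HTTP request.
        Function<Integer, ClientAuthor> loader = id -> new ClientAuthor(id, "Author #" + id);
        ClientBook book = new ClientBook(4, Arrays.asList(6, 7), loader);
        for (ClientAuthor a : book.getAuthors()) {
            System.out.println(a.id + " " + a.name);
        }
    }
}
```

This keeps myBook.getAuthors() usable while the wire format stays small, at the cost of one request per unresolved author.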
That's it for now. Thank you guys :)
The maybe solution(s):
Problem: Select only the data I need. This means more or less to ignore every @ManyToMany, @OneToMany, @ManyToOne relation.
Solution: Use @JsonIgnore and/or @JsonIgnoreProperties.
Problem: Every ignored relation should get fetched easily without modifying the data model.
Solution: Example models:
class Book {
    int bId;
    Author author; // has @ManyToOne
}
class Author {
    int aId;
    List<Book> books; // has @OneToMany
}
Now I can fetch a book via REST: GET /books/4 and the result will look like that ('cause I ignore all relations via @JsonIgnore): {"bId":4}
Then I have to create another route to receive the related author: GET /books/4/author. Will return: {"aId":6}.
Backwards: GET /authors/6/books -> [{"bId":4},{"bId":42}].
There will be a route for every @ManyToMany, @OneToMany, @ManyToOne, but nothing more. So this will not exist: GET /authors/6/books/42. The client should use GET /books/42.

First, you will want to control how the JPA layer handles your relationships. What I mean is using Lazy Loading vs. Eager loading. This can easily be controlled via the "fetch" option on the annotation, like this:
@OneToMany(fetch = FetchType.LAZY)
What this tells JPA is that, for this related object, only load it when some code requests it. Behind the scenes, a dynamic "proxy" object is created. When you try to access this proxy, it's smart enough to go out and run another SQL query to gather that needed bit. In the case of a Collection, it's even smart enough to grab the underlying objects in batches as you iterate over the items in the Collection. But be warned: access to these proxies has to happen within the same general Session. Otherwise the underlying ORM framework (I don't know how EclipseLink works; I am a Hibernate user) will not know how to associate the sub-requests with the proper domain object. This has a bigger effect when you use transportation frameworks like Flex BlazeDS, which tries to marshal objects using bytecode instead of the interface, and usually gets tripped up when it sees these proxy objects.
You may also want to set your cascade policy, which can be done via the "cascade" option like
@OneToMany(cascade = CascadeType.ALL)
Or you can give it a list like:
@OneToMany(cascade = {CascadeType.MERGE, CascadeType.REMOVE})
Once you control what is getting pulled from your database, you need to look at how you are marshalling your domain objects. Are you sending this via JSON, XML, a mixture depending on the request? What frameworks are you using (Jackson, FlexJSON, XStream, something else)? The problem is, even if you set the fetch type to Lazy, these frameworks will still go after the related objects, thus negating all the work you did telling it to lazily load. This is where things get more specific to the marshalling/serializing scheme: you will need to figure out how to tell your framework what to marshal and what not to marshal. Again, this will be highly dependent on whatever framework is in use.
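One framework-independent way to keep a serializer away from lazy relations is to copy only the scalar fields into a flat summary object before marshalling. A minimal sketch, assuming hypothetical `ArticleEntity` and `ArticleSummary` classes with made-up fields (none of these field names come from the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a JPA entity; a real one would carry annotations.
class ArticleEntity {
    final long id;
    final String title;
    final List<String> comments;   // stands in for a relation that must not be serialized

    ArticleEntity(long id, String title, List<String> comments) {
        this.id = id;
        this.title = title;
        this.comments = comments;
    }
}

// Flat view handed to the serializer: scalar fields only, no relations.
class ArticleSummary {
    final long id;
    final String title;

    ArticleSummary(long id, String title) {
        this.id = id;
        this.title = title;
    }

    static ArticleSummary from(ArticleEntity article) {
        // The comments relation is deliberately never read here.
        return new ArticleSummary(article.id, article.title);
    }

    static List<ArticleSummary> fromAll(List<ArticleEntity> articles) {
        List<ArticleSummary> out = new ArrayList<>(articles.size());
        for (ArticleEntity article : articles) {
            out.add(from(article));
        }
        return out;
    }
}

class ArticleSummaryDemo {
    public static void main(String[] args) {
        List<ArticleEntity> articles = Arrays.asList(
                new ArticleEntity(4, "First", Arrays.asList("c1", "c2")),
                new ArticleEntity(42, "Second", Arrays.asList("c3")));
        for (ArticleSummary s : ArticleSummary.fromAll(articles)) {
            System.out.println(s.id + " " + s.title);
        }
    }
}
```

Because the mapper never touches the relation, a lazy collection stays unloaded no matter which marshalling framework runs afterwards.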

Related

Entity Framework and DDD - Load required related data before passing entity to business layer

Let's say you have a domain object:
class ArgumentEntity
{
    public int Id { get; set; }
    public List<AnotherEntity> AnotherEntities { get; set; }
}
And you have ASP.NET Web API controller to deal with it:
[HttpPost("{id}")]
public IActionResult DoSomethingWithArgumentEntity(int id)
{
    ArgumentEntity entity = this.Repository.GetById(id);
    this.DomainService.DoDomething(entity);
    ...
}
It receives the entity identifier, loads the entity by id, and executes some business logic on it with the domain service.
The problem:
The problem here is with related data. ArgumentEntity has AnotherEntities collection that will be loaded by EF only if you explicitly ask to do so via Include/Load methods.
DomainService is a part of business layer and should know nothing about persistence, related data and other EF concepts.
DoDomething service method expects to receive ArgumentEntity instance with loaded AnotherEntities collection.
You would say - it's easy, just Include required data in Repository.GetById and load whole object with related collection.
Now let's come back from the simplified example to the reality of a large application:
ArgumentEntity is much more complex. It contains multiple related collections, and those related entities have their own related data too.
You have multiple DomainService methods. Each method requires a different combination of related data to be loaded.
I can imagine possible solutions, but all of them are far from ideal:
Always load the whole entity -> but it is inefficient and often impossible.
Add several repository methods: GetByIdOnlyHeader, GetByIdWithAnotherEntities, GetByIdFullData to load specific data subsets in the controller -> but then the controller becomes aware of which data to load and pass to each service method.
Add several repository methods: GetByIdOnlyHeader, GetByIdWithAnotherEntities, GetByIdFullData to load specific data subsets in each service method -> it is inefficient: one SQL query for each service method call. What if you call 10 service methods for one controller action?
Each domain method calls a repository method to load additional required data (e.g. EnsureAnotherEntitiesLoaded) -> it is ugly because my business logic becomes aware of the EF concept of related data.
The question:
How would you solve the problem of loading required related data for the entity before passing it to business layer?
In your example I can see the method DoSomethingWithArgumentEntity, which obviously belongs to the Application Layer. This method calls the Repository, which belongs to the Data Access Layer. I think this does not conform to a classic Layered Architecture: you should not call the DAL directly from the Application Layer.
So your code can be rewritten in another manner:
[HttpPost("{id}")]
public IActionResult DoSomethingWithArgumentEntity(int id)
{
    this.DomainService.DoDomething(id);
    ...
}
In the DomainService implementation you can read from the repository whatever is needed for this specific operation. This avoids your troubles in the Application Layer. In the Business Layer you have more freedom in how to implement reading: with several repository methods that read a partially loaded entity, with EnsureXXX methods, or with something else. The knowledge about what needs to be read for an operation is placed in the operation's code, and you no longer need this knowledge in the app layer.
Every time a situation like this emerges, it is a strong signal that your entity is not properly designed. As krzys said, the entity's parts are not cohesive. In other words, if you often need parts of an entity separately, you should split this entity.
Nice question :)
I would argue that "related data" in itself is not a strict EF concept. Related data is a valid concept with NHibernate, with Dapper, or even if you use files for storage.
I agree with the other points mostly, though. So here's what I usually do: I have one repository method, in your case GetById, which has two parameters: the id and a params Expression<Func<T,object>>[]. Then, inside the repository, I do the includes. This way you don't have any dependency on EF in your business logic (the expressions can be parsed manually for another type of data storage framework if necessary), and each BLL method can decide for itself what related data it actually needs.
public async Task<ArgumentEntity> GetByIdAsync(int id, params Expression<Func<ArgumentEntity, object>>[] includes)
{
    // ctx is a reference to your context; typed as IQueryable so the result of Include can be reassigned
    IQueryable<ArgumentEntity> baseQuery = ctx.ArgumentEntities;
    foreach (var include in includes)
    {
        baseQuery = baseQuery.Include(include);
    }
    return await baseQuery.SingleAsync(a => a.Id == id);
}
Speaking in the context of DDD, it seems that you missed some modeling aspects in your project, which led you to this issue. The Entity you wrote about does not look highly cohesive. If different related data is needed for different processes (service methods), it seems you haven't found the proper Aggregates yet. Consider splitting your Entity into several Aggregates with high cohesion. Then all processes related to a particular Aggregate will need all or most of the data that this Aggregate contains.
So I don't know the answer to your question, but if you can afford to take a few steps back and refactor your model, I believe you will not encounter such problems.

JPA: How to exclude certain fields from loading from database table

I have a class User which holds an email address and password for authenticating users in my web application. This user is mapped to the database via JPA / EclipseLink.
My question is: how can I prevent JPA from loading the password field back from the database? Since I will access the user object in my web app, I'm uncomfortable, from a security standpoint, with sending the password to the browser.
Is there any way I can prevent loading the field in JPA / EclipseLink? Declaring the field transient is not an option, since I want to store it in the database when I call persist() on the user object.
Thanks,
fredddmadison
JB Nizet has a valid point. Retrieving it and serializing it in the Http response are two separate concerns.
I'm not sure what you're using to serialize your data. If this is a REST API, consider Jackson's @JsonIgnore annotation or EclipseLink MOXy's @XmlTransient equivalent. If this uses Java EL (facelets, JSPs), you should be able to select only the bean properties of interest.
If you really must do this during retrieval, consider JPQL's/Criteria API's constructor functionality. Ensure that the object has a constructor that accepts the specified parameters, and keep in mind that it won't be managed in the persistence context if it's retrieved in this manner.
SELECT NEW my.package.User(u.id, u.name, u.etc) FROM User u
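The class named in a constructor expression only needs a constructor matching the selected values. A minimal sketch of such a read-only view, assuming a hypothetical `UserView` class with `id`, `name`, and `email` fields (the field list is an assumption, since the query above elides it with `u.etc`). Because the class has no password field at all, there is nothing to leak:

```java
// Hypothetical target for a constructor expression such as
//   SELECT NEW com.example.UserView(u.id, u.name, u.email) FROM User u
// (package and field names are assumptions). In a real project this would be
// a public class in its own file so the JPA provider can reach the constructor.
final class UserView {
    private final Long id;
    private final String name;
    private final String email;

    public UserView(Long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public String getEmail() {
        return email;
    }

    @Override
    public String toString() {
        return "UserView{id=" + id + ", name=" + name + ", email=" + email + "}";
    }
}

class UserViewDemo {
    public static void main(String[] args) {
        // Built by hand here; with JPA the provider would call the constructor per result row.
        UserView view = new UserView(1L, "fred", "fred@example.com");
        System.out.println(view);
    }
}
```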
Alternatively, consider the @PostLoad lifecycle callback.
@PostLoad
private void postLoad() {
    this.password = null;
}
Finally, this might not be the case, but I would like to reinforce the notion that passwords shouldn't be stored in plaintext. I mention this because returning a hashed salted password that used a secure algorithm (bCrypt, multiple iteration SHA-512, etc) wouldn't be that big a deal (but still isn't ideal).
I have a similar problem. But in my case I have many @OneToMany relationships inside the Entity class, and some of them are EAGER. When I query against this Entity it loads all of them, although for the web service I need only some of them.
I tried TupleQuery, but it's not the solution: to get the needed OneToMany relationships I have to join, and I get many duplicate rows of the main query. That makes the result heavier, not more economical.

JPA lazy loading strategies for remote client using EclipseLink

1. Question
What are known strategies and solutions for LAZY loading from the client side?
I was checking out this stuff http://wiki.eclipse.org/Introduction_to_EclipseLink_Sessions_(ELUG)#Remote_Sessions but not sure if this is a solution to my problem or how to use this.
2. Use-case
I'm developing a three tier application where my presentation layer (Eclipse RCP) is a remote client over network:
[ Eclipse RCP ] <-----(RMI)-----> [ [EJB 3] [JPA 2] [Mysql] ]
Now, I use JPA Entities as my domain model, which I want to use in my client as well. I get the entities from @Session beans over the network.
The problem happens when my JPA entities have LAZY fields. After serialization, and especially because my JPA provider (EclipseLink) is on the other side of the network, I need a strategy to load those LAZY fields from the client.
I'm going to have many entities: 30-40 maybe. A typical scenario would be when the user sees a list of SomethingModel which has many List fields, but those don't need to be shown in the list, only when she wants to change a specific element.
3. Possible Solution
I came up with one solution, e.g.: I create proxy classes for my JPA entities on the client side. When I need a collection field from my domain model, the proxy class will call the remote EJB to populate the field.
class CarModelClient {
    CarModel model;

    public String getColor() {
        return model.getColor();
    }

    public List<Wheels> getWheels() {
        CarModelFacadeRemote carFacade = //get my remote ejb
        model.setWheels( carFacade.getWheels( model.getId() ) );
        return model.getWheels();
    }
}
Well, similar to that.
Thank you for your answers.
Don't try to be too smart and pretend that the client is on the server and the entity manager is always open. Consider the domain objects, at client-side, exactly as you would consider DTOs or JSON objects: objects containing some information coming from the server and serialized over the wire.
Document the service methods called from the client (the facade methods) to specify which associations are initialized and which are not in the returned entities. If you are on some "list" screen and want to see the detailed view of one of the elements of the list, for example, call another service which loads the entity again from the database (and thus gets fresh results), with probably other associations initialized in order to display more details about the entity.
Trying to dynamically initialize the lazy-loaded associations at the client just doesn't work: it's complex, inefficient, results in an obsolete and incoherent graph on the client, doesn't take transactional isolation into account.
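The advice above can be sketched as a facade with one method per screen, each documenting which associations it returns. This is only an illustration under stated assumptions: `CarRecord`, `CarSummary`, `CarDetail`, and `CarFacade` are hypothetical names, and an in-memory map stands in for the database and the EJB plumbing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical server-side record; stands in for a JPA entity with a lazy collection.
class CarRecord {
    final int id;
    final String color;
    final List<String> wheels;

    CarRecord(int id, String color, List<String> wheels) {
        this.id = id;
        this.color = color;
        this.wheels = wheels;
    }
}

// Returned by the list call: no collections, so nothing lazy can reach the client.
class CarSummary {
    final int id;
    final String color;

    CarSummary(int id, String color) {
        this.id = id;
        this.color = color;
    }
}

// Returned by the detail call: the wheels association is always initialized.
class CarDetail {
    final int id;
    final String color;
    final List<String> wheels;

    CarDetail(int id, String color, List<String> wheels) {
        this.id = id;
        this.color = color;
        this.wheels = wheels;
    }
}

class CarFacade {
    private final Map<Integer, CarRecord> store = new LinkedHashMap<>();

    void save(CarRecord car) {
        store.put(car.id, car);
    }

    /** List screen: summaries only, associations are never included. */
    List<CarSummary> listCars() {
        List<CarSummary> result = new ArrayList<>();
        for (CarRecord car : store.values()) {
            result.add(new CarSummary(car.id, car.color));
        }
        return result;
    }

    /** Detail screen: looks the car up again and copies its wheels into the result. */
    CarDetail getCarWithWheels(int id) {
        CarRecord car = store.get(id);
        if (car == null) {
            throw new IllegalArgumentException("No car with id " + id);
        }
        return new CarDetail(car.id, car.color,
                Collections.unmodifiableList(new ArrayList<>(car.wheels)));
    }
}

class CarFacadeDemo {
    public static void main(String[] args) {
        CarFacade facade = new CarFacade();
        facade.save(new CarRecord(1, "red", Arrays.asList("front-left", "front-right")));
        System.out.println(facade.listCars().size() + " car(s) listed");
        System.out.println(facade.getCarWithWheels(1).wheels);
    }
}
```

The client never has to guess whether a collection is loaded: the return type of each facade method says so.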

ASP.NET MVC - I think I am going about this wrong

Or I don't understand this at all.
I have started my ASP.NET MVC application using the Controller --> ViewModel --> Service --> Repository pattern.
Does every type of object (Customer, Product, Category, Invoice, etc.) need to have its own repository and service? If so, how do you bring common items together?
I mean there are a lot of the times when a few of these things will be displayed on the same page. So I am not getting this I don't think.
So I was thinking I need a ShopController, which has a ShopViewModel, which could have categories, sub categories, products, etc. But the problem, for me, is that it just does not seem to mesh well.
Maybe ASP.NET WebForms were for people like me :)
Edit
So would an aggregate consist of say:
Category, SubCategory, Product, ChildProduct, ProductReview with the Product being the aggregate root?
Then in the ViewModels, you would access the Product to get at its child products, reviews, etc.
I am using entity framework 4, so how would you implement lazy loading using the repository/service pattern?
Does every type of object (Customer, Product, Category, Invoice, etc.) need to have its own repository
You should have a repository per aggregate root in your domain. See this question for more information on what is an aggregate root.
In the example you give I could see a CustomerRepository which would handle retrieving all pertinent customer data (a Customer has orders, an order has a customer), and a ProductRepository that handles retrieving product information.
and service? If so, how do you bring common items together?
A service layer is nice but only if there is added value in adding this layer. If your service simply passes straight into the repository it might not be needed. However if you need to perform certain business logic on a Product a ProductService might make sense.
This might not make sense
public void UpdateProduct(Product product)
{
    _repo.Update(product);
}
But if you have logic this layer makes sense to encapsulate your business rules for products.
public void UpdateProduct(Product productToUpdate)
{
    //Perform some sort of business on the productToUpdate, raise domain events, ....
    _repo.Update(productToUpdate);
}
So I was thinking I need a ShopController, which has a ShopViewModel, which could have categories, sub categories, products, etc. But the problem, for me, is that it just does not seem to mesh well.
If the domain is fleshed out, the view model ends up making sense:
public ActionResult Index()
{
    ShopViewModel shopViewModel = new ShopViewModel();
    shopViewModel.Products = _productRepo.GetAll();
    //other stuff on the view model.
    return View(shopViewModel);
}
Update
What happens when you also need to provide data unobtainable from an aggregate root? For example, say I have a create Customer view and in that view, I also need to provide the user with a collection of Companies to choose from to associate a new customer with. Does the collection of Companies come from CustomerRepository or would you also need a CompanyRepository?
If a Company can live by itself (e.g. you edit, update, delete a company), I would suggest a Company is also an aggregate root for your domain (a Customer has a company and a company has a list of Customers). However, if a Company is only obtainable via a Customer, I would treat a company as a ValueType/Value Object. If that is the case, I would create a method on the customer repository to retrieve all CompanyNames.
_repo.GetAllCompanyNames();
Repositories are indispensable, just go with them. They hide our data implementation. Used with an ORM you can pretty much forget about core db activity (CRUD). You'll generally find there's a 1:1 map between an object and a repository, but nothing stops a repository returning anything it likes. Typically though you will be acting upon an instance. Create non-object specific repositories for your queries that don't naturally fit into an existing one.
You will find a lot of conflicting arguments on the "Services" part of it, which some people like to split between Domain Services (I'd call these business rules that don't comfortably fit into a Core Domain Object) and Application Services (logical groupings of operations on Domain Objects). I've actually gone for one separate project called [ProjectName].Core.Operations that lives in my [ProjectName].Core solution folder. Core + Operations = Domain.
An operation might be something that returns a DTO of all the information a View requires, built via a number of repository calls and actions on the Domain. Some people (myself included) prefer to hide Repositories completely from Presentation and instead use Operations (Services) as a facade to them. Just go with gut feeling on naming and don't be afraid, refactoring is healthy. Nothing wrong with a HomePageOperations class with a method GetEveryThingINeedForTheHomepage that returns a ThingsINeedForTheHomePage class.
Keep your controllers as lightweight as possible. All they do is map data to views and views to data, talk to "Services" and handle application flow.
Download and have a look at S#arp architecture or the Who Can Help Me projects. The latter really shows a good architecture IMHO.
Lastly don't forget one of the major concerns of tiers is pluggability/testability, so I advise getting your head around a good IoC container (I'm a fan of Castle.Windsor). Again S#arp architecture is a good place to find about this.
You can pass more than one type of Repository to the controller (I'm assuming you're using some kind of IoC container and constructor injection). You may then decide to compose some type of service object from all of the passed repositories.

DTOs: best practices

I am considering using DTOs instead of passing around my domain objects. I have read several posts here as well as elsewhere, and I understand there are several approaches to getting this done.
If I only have about 10 domain classes in all, and considering that I want to use DTOs rather than domain objects for consumption in my Views (WPF front ends), what is the recommended approach?
I think using tools like AutoMapper may be overkill for my situation. So I am thinking of writing my own mapper class that will have methods for converting a domain type to a DTO type.
What is the best way to do this? Are there any samples to get me started?
Second question: When writing those methods that will create DTOs, how do I deal with setting up all the data, especially when the domain type has references to other domain objects? Do I write equivalent properties in the DTO for mapping to those reference types in the domain class?
Please ask if I have not put my second question in proper words. But I think you understand what I am trying to ask.
Third question: When writing DTOs, should I write multiple DTOs, each containing partial data for a given domain model, so that each of them can cater to a specific View's requirements, or should the DTO have all the data that is in the corresponding model class?
I've been reading a few posts here regarding DTO's and it seems to me that a lot of people equate them to what I would consider a ViewModel. A DTO is just that, Data Transfer Object - it's what gets passed down the wire. So I've got a website and services, only the services will have access to real domain/entity objects, and return DTO's. These may map 1:1, but consider that the DTO's may be populated from another service call, a database query, reading a config - whatever.
After that, the website then can take those DTO and either add them to a ViewModel, or convert into one. That ViewModel may contain many different types of DTO's. A simple example would be a task manager - the ViewModel contains both the task object you are editing, as well as a group of Dto.User objects that the task can be assigned to.
Keep in mind that the services returning DTOs may be used by both a website and maybe a tablet or phone application. These applications would have different views to take advantage of their displays, so the ViewModels would differ, but the DTOs would remain the same.
At any rate, I love these types of discussions, so anyone please let me know what you think.
Matt
I'm kind of using DTOs in a project. I tend to make the DTOs show only the data I need for a specific view. I fetch all the data shown in the view in my data access class. For example, I may have an Order object which references a Client object:
public class Client{
    public int Id{get;set;}
    public string Name{get;set;}
}
public class Order{
    public int OrderID{get;set;}
    public Client client{get;set;}
    public double Total{get;set;}
    public IEnumerable<OrderLine> lines {get;set;}
}
Then in my OrderListDTO I may have something like:
public class OrderListDTO{
    public int OrderId{get;set;}
    public string ClientName{get;set;}
    ...
}
These are the fields I want to show in my view. I fetch all of them in my database access code so I don't have to bother with entity associations in my view or controller code.
Best Way to develop DTOs
The way to start developing DTOs is to understand that their sole purpose is to transfer a subset of your business entities' data to different clients (which could be a UI or an external service). Given this understanding, you could create separate packages for each client and write your DTO classes there. For mapping, you could write your own mapper, defining interfaces to be passed to a factory that creates DTO objects; the interface determines which data is extracted from the entity for which the DTO is being created. You could also define annotations to be placed on your entity fields, but personally, given the number of annotations already in use, I would prefer the interface way. The main thing to note about DTOs is that they are also classes, and data among the DTOs should be reused. In other words, while it may seem tempting to create DTOs for each use case, try to reuse existing DTOs to minimize this.
Getting started
As for getting started: as stated above, the sole purpose of a DTO is to give the client the data it needs. Keeping that in mind, you could just set data into the DTO using setters, or define a factory that creates a DTO from an entity based on an interface.
Regarding your third question, do as is required by your client :)
I joined a project that uses spring-jdbc with a DAO layer. Sometimes the existing entities don't cover all the data that can come from the DB, so I started using DTOs.
Applying a structured-programming rule from the '70s, I put all DTOs into a separate package:
package com.evil.dao; // DAO interfaces for IOC.
package com.evil.dao.impl; // DAO implementation classes.
package com.evil.dao.dto; // DTOs
Now I have rethought this and decided to put each DTO as an inner class on the DAO interface when its result set has no reuse. So a DAO interface looks like:
interface StatisticDao {
    class StatisticDto {
        int count;
        double amount;
        String type;
        public static void extract(ResultSet rs, StatisticDto dto) { ... }
    }
    List<StatisticDto> getStatistic(Criteria criteria);
}
class StatisticDaoImpl implements StatisticDao {
    public List<StatisticDto> getStatistic(Criteria criteria) {
        ...
        RowCallbackHandler callback = new RowCallbackHandler() {
            @Override
            public void processRow(ResultSet rs) throws SQLException {
                StatisticDto dto = new StatisticDto(); // one DTO per row
                StatisticDao.StatisticDto.extract(rs, dto);
                // make action on dto
            }
        };
        namedTemplate.query(query, queryParams, callback);
    }
}
I think that holding related data together (a custom DTO with its DAO interface) makes the code easier to navigate with PageUp/PageDown.
Question 1: If the DTOs you need to transfer are just a simple subset of your domain object, you can use a modelmapper to avoid filling your codebase with logic-less mapping. But if you need to apply some logic/conversion in your mapping, then do it yourself.
Question 2: You can and probably should create a DTO for each domain object referenced by your main DTO. A DTO can have multiple DTOs inside of it, one for each domain object you need to map. To map those, you could do it yourself or use a modelmapper.
Question 3: Don't expose your whole domain if your view does not require it. You also don't need to create a DTO for each view; try to create DTOs that expose what needs to be exposed and can be reused, to avoid having multiple DTOs that share a lot of information. But it mainly depends on your application's needs.
If you need clarification, just ask.
I'm going to assume that your domain model objects have a primary key ID that may correspond to the ID's from the database or store they came from.
If the above is true, then your DTOs will handle type references to other DTOs, which your domain objects hold as object references, in the form of a foreign key ID. So an OrderLine.OrderHeader relationship on the domain object will be OrderLine.OrderHeaderId in the DTO.
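A small sketch of that foreign-key style, written in Java for illustration. `OrderLine` and `OrderHeader` follow the names used above; the other fields and the `OrderLineDto.from` helper are assumptions made up for this example.

```java
// Hypothetical domain classes: the line holds an object reference to its header.
class OrderHeader {
    final int id;
    final String customer;

    OrderHeader(int id, String customer) {
        this.id = id;
        this.customer = customer;
    }
}

class OrderLine {
    final int id;
    final OrderHeader orderHeader;
    final int quantity;

    OrderLine(int id, OrderHeader orderHeader, int quantity) {
        this.id = id;
        this.orderHeader = orderHeader;
        this.quantity = quantity;
    }
}

// DTO: the object reference is replaced by the header's id.
class OrderLineDto {
    final int id;
    final int orderHeaderId;
    final int quantity;

    OrderLineDto(int id, int orderHeaderId, int quantity) {
        this.id = id;
        this.orderHeaderId = orderHeaderId;
        this.quantity = quantity;
    }

    static OrderLineDto from(OrderLine line) {
        return new OrderLineDto(line.id, line.orderHeader.id, line.quantity);
    }
}

class OrderLineDtoDemo {
    public static void main(String[] args) {
        OrderHeader header = new OrderHeader(10, "ACME");
        OrderLineDto dto = OrderLineDto.from(new OrderLine(1, header, 3));
        System.out.println("line " + dto.id + " belongs to order " + dto.orderHeaderId);
    }
}
```

The client can later ask for the header by id if it needs more than the reference.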
Hope that helps.
Can I ask why you have chosen to use DTO's instead of your rich domain objects in the view?
We all know what DTOs are (probably).
But the important question is whether to overuse DTOs or not.
Transferring data using DTOs between "local" services is a good practice, but it puts a huge overhead on your development team.
Here are some facts:
Clients should not see or interact with Entities (Daos). So you always need DTOs for transferring data to/from remote clients (out of the process).
Using DTOs to pass data between services is optional. If you don't plan to split your project up into microservices, there is no need to do that. It will just be overhead for you.
And this is my comment: if you plan to split your project into microservices only in the distant future, or don't plan to do that at all, then DON'T OVERUSE DTOs.
You need to read this article: https://martinfowler.com/bliki/LocalDTO.html