Reuse code in online application and batch processes with Spring Batch - spring-batch

I've got a question regarding Spring Batch and reusability.
I'm working on a project which is composed of an online application developed with Spring Mvc and several batch processes developed with Spring Batch. Some of the online use cases are also executed in the batch processes.
For example, let's imagine that our online application is Spring's Sagan app and the online use case we want to reuse in batch is createOrUpdateMemberProfile. From the online application a single member profile is created or updated in each request. However, in the batch process lots of member profiles are created by reading the information from a CSV file.
To implement this, the first option is to keep the createOrUpdateMemberProfile method as it is and develop new code for the batch process with Spring Batch, using its ItemWriters and other elements. In this way I'm developing the use case twice: once for online transactions and once for batch processes.
The second option is to reuse the createOrUpdateMemberProfile method in the batch process with an ItemWriterAdapter. In this way I'm reusing the online method, but I guess I'm losing some performance because the adapter invokes the method once per item, so I give up the bulk writes of chunk-oriented processing. Depending on the number of member profiles to create from the CSV file, it might be a bit risky.
The third, and last, option is to get rid of the createOrUpdateMemberProfile method and implement the batch process with Spring Batch so that it can be called from the online application as well. That is, the online application would launch the job, but instead of creating many member profiles it would create just one. In this way I'm reusing code, but it sounds really strange to me and I'm discarding it up front.
Right now I'm keen on the second option, but I wonder if I'm losing something apart from performance, or whether I'll struggle with another aspect that I'm not aware of. Any advice or suggestion will be appreciated.
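To make option 2 concrete, this is roughly what I have in mind (a minimal sketch; MemberService, MemberProfile and the method name are just the Sagan-style placeholders from the example above, not real Sagan code):

```java
import org.springframework.batch.item.adapter.ItemWriterAdapter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MemberImportConfig {

    // The adapter invokes memberService.createOrUpdateMemberProfile(item) once per item,
    // so writes happen one at a time instead of as a bulk chunk write.
    @Bean
    public ItemWriterAdapter<MemberProfile> memberProfileWriter(MemberService memberService) {
        ItemWriterAdapter<MemberProfile> adapter = new ItemWriterAdapter<>();
        adapter.setTargetObject(memberService);
        adapter.setTargetMethod("createOrUpdateMemberProfile");
        return adapter;
    }
}
```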
Thanks.

Related

Transition from legacy database to new one that works with legacy application

I have a problem concerning a legacy application that can't be changed in any way (a single executable file with no DLLs), which is connected to a database that can be changed. It is a Visual Basic 6 application connecting to the database using ADO.NET. The database engine is SQL Server 2008. The goal is to create a new, correct database that will work with the legacy application.
It is coupled so tightly that it does not even work with views instead of tables, as suggested here. So the present situation looks like this: (diagram of the current situation)
Currently I am trying to research the problem and find my options. I have an idea that might work:
Since the approach of changing tables to views does not work, I think one possibility is to intercept the communication between the app and the legacy DB, read the commands being sent, and redirect them somewhere else without letting the legacy DB respond to the request.
Each command is either CRUD or a procedure execution, and we know which commands can possibly be sent. Let's suppose that a new database is set up and has views corresponding to the legacy one. Commands are redirected to my own application, which filters everything and manipulates it (somehow) to work with the new schema.
Diagram of intercepted communication
This is my general idea of what I want to do to avoid rewriting the legacy application which is tightly coupled. Someone already asked a question similar to mine.
They discuss how to either dig the commands out of SQL dump files or intercept the communication.
The interception itself doesn't seem to be a problem, as discussed here. But I wonder how the mirror can reply.
The same goes for port mirroring using [TCP packet hijacking](https://reverseengineering.stackexchange.com/a/1816).
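To illustrate the interception point I have in mind, here is a minimal pass-through proxy sketch (Java only for illustration; host names and ports are made up). The legacy app would be pointed at the proxy port instead of the real server; the hard part, actually parsing and rewriting the TDS commands, is not shown:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical pass-through proxy: every request from the legacy app can be
// inspected here before being forwarded (or redirected somewhere else).
public class SqlProxy {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(14330)) {          // port the VB6 app connects to
            while (true) {
                Socket client = listener.accept();
                Socket server = new Socket("legacy-db-host", 1433);      // the real SQL Server
                pump(client.getInputStream(), server.getOutputStream()); // requests
                pump(server.getInputStream(), client.getOutputStream()); // responses
            }
        }
    }

    // Copy bytes in a background thread; a TDS parser/rewriter would hook in here.
    private static void pump(InputStream in, OutputStream out) {
        new Thread(() -> {
            try {
                in.transferTo(out);
            } catch (IOException ignored) {
                // connection closed
            }
        }).start();
    }
}
```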
To sum up, my questions are as follows:
Is this a feasible approach to achieve a smooth transition from the legacy solution to a new one?
If my idea is doable, how can I listen to the DB requests and have a different application, rather than the original DB, create the responses?
Is there a better way to achieve my goal, which is to create a new database with a database abstraction layer so the old legacy application remains functional?

Is it ok to hit the database in integration tests?

I have a very specific situation in an integration test.
I'm developing a REST API composed of a few microservices using Spring Boot. Some of those services have basically CRUD operations to be accessed by a UI application or to be consumed by internal validations/queries.
All database manipulation is done through stored procedures by a legacy library (no JPA), and I'm using a non-standard database. I know that good practice says not to use real databases, but in this scenario I cannot imagine how to use a dummy database at test time (like DBUnit or H2). So:
1 - Is it ok to hit the real database in an integration test?
If 1 is ok, I have another question:
Usually we do not change the data state in unit/integration tests, and the tests should be independent of each other.
However, in my case I only know the entity id from the response of the POST method, which makes it difficult to implement the GET/PUT/DELETE tests. Of course, in the GET/PUT/DELETE tests I can first insert and then perform the other operation, but from this perspective, at the end, I will have a database in a different state than at the beginning of the test. So my other question is:
2 - How can I recover the database to the same status before the tests?
I know this is a specific situation, but I would really appreciate any help in finding an elegant way of testing this scenario.
Thanks in advance.
You should ask differently: is the risk of running tests against your production DB acceptable?
Meaning: if your tests only uncover problems in your code, everybody will be happy.
But if you mess up and the database gets seriously corrupted, the whole site needs to be taken down, and the restore from backup fails at first... so your business goes offline for 2 days; how do you think your manager will like that?
Long story short: it depends. If you can contain the risks, then yes, sure. But if not, look for other alternatives. At the very least, make sure that your manager understands what you are doing.
Integration tests are fine, and I would say a must, as long as you don't run them in a production environment. They let you test the overall application and how you handle responses, serialization, and deserialization. Your test cases should handle what you expect to have in your production database, every test should be isolated, and whatever you create in a test case you must delete afterwards, returning the database to its original state; otherwise you might have clashing test cases. Run your integration tests against a local database or a dedicated testing database.
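As a rough sketch of that clean-up pattern with JUnit 5 and Spring Boot (the /members endpoint and the MemberDto type are made-up placeholders, not anything from your project):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.web.client.TestRestTemplate;

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class MemberApiIT {

    // Placeholder DTO for whatever your API returns.
    static class MemberDto {
        public Long id;
        public String name;
    }

    @Autowired
    private TestRestTemplate rest;

    private Long createdId;

    @Test
    void getReturnsWhatWasPosted() {
        MemberDto toCreate = new MemberDto();
        toCreate.name = "Alice";

        // Create the fixture through the API itself and remember the generated id.
        MemberDto created = rest.postForObject("/members", toCreate, MemberDto.class);
        createdId = created.id;

        MemberDto fetched = rest.getForObject("/members/" + createdId, MemberDto.class);
        assertEquals("Alice", fetched.name);
    }

    @AfterEach
    void cleanUp() {
        // Put the database back in its previous state by removing what the test created.
        if (createdId != null) {
            rest.delete("/members/" + createdId);
        }
    }
}
```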
You can use the in-memory H2 database for interface integration testing and populate it as needed for specific tests. This is useful when you are running in situations where having a database on your Jenkins or similar test system doesn't make sense. It really depends on what you are testing, i.e. end-to-end integration or finer-grained integration.

RavenDB and batch API

Hi, in my project we are doing an import of, let's say, products.
We will have a web service where we will get maybe 10 calls for one import, so we need a transaction that can span several requests.
The import will have both new products that need to be created and existing products that need to be updated.
Right now the only way we can know if the product is already in our system or not is to look at Name or Previous Name on the Product we want to import.
So basically, my questions are:
Can the transaction API described here, http://ravendb.net/docs/client-api/advanced/databasecommands, be used with the batch API described here, http://ravendb.net/docs/1.0/client-api/advanced/databasecommands/batch? (From reading the batch API documentation it sounds like they don't work together.)
And if not, should I use the database commands combined with a transaction? In the DatabaseCommands documentation I can't see how the transaction GUID and the operations can be connected.
The easiest way to handle that is to aggregate things in memory and have just a single SaveChanges call in the end.
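A rough sketch of that pattern, using the RavenDB Java client API here for illustration (the question is about the .NET client, where the idea is the same; Product and the store setup are placeholders):

```java
import net.ravendb.client.documents.IDocumentStore;
import net.ravendb.client.documents.session.IDocumentSession;

import java.util.List;

public class ProductImport {

    private final IDocumentStore store; // initialized once at application startup

    public ProductImport(IDocumentStore store) {
        this.store = store;
    }

    // 'aggregatedProducts' is everything collected in memory from the ~10 web service calls;
    // Product is a placeholder for your own entity class.
    public void importAll(List<Product> aggregatedProducts) {
        try (IDocumentSession session = store.openSession()) {
            for (Product product : aggregatedProducts) {
                session.store(product);   // only tracked by the session so far
            }
            session.saveChanges();        // one round trip, one transaction
        }
    }
}
```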

Approach for JasperReports

I am new to JasperReports, and I need a heads-up on a few things before going forward with my development. My colleagues told me they are able to generate a basic report, but they are stuck on which approach should be used.
I was told we could:
write the queries in each report
run the queries outside the report, and pass the results to the report as a datasource
Which approach is preferable? Does passing the datasource have any performance hit compared to passing the bean? I would also like to know: does the first approach run in a different JVM?
Current Project Architecture
Struts 2 - Spring 2.5 - Spring JDBC
If you and your team are just starting out with JasperReports I would recommend embedding the SQL query into each report. It makes building the reports in iReport much easier, since you can constantly preview your report with live data while working on it.
As far as performance goes, I do not think it is really going to matter in the most basic of examples. If it is just a SQL query, then no matter which scenario you use, it is going to use JDBC with the connection you give it. So I would ignore performance for now.
With that said, if you already have the data (i.e. you have already displayed it on a screen and you want to allow the users to then export it to PDF or whatever), you could simply pass it in as a datasource and not take the performance hit of running the query again.
Another scenario where you may want to use your own datasource is if you want to manipulate the data before it is exported in the report, maybe some crazy sort that you could not pull off in SQL.
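As a rough sketch of the two approaches (the report path, the MemberRow bean and the surrounding service are placeholders for illustration, not anything from your project):

```java
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.data.JRBeanCollectionDataSource;

import java.sql.Connection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReportService {

    // Approach 1: the SQL query lives inside the compiled report,
    // so the report only needs a JDBC connection to run it.
    public JasperPrint fillWithEmbeddedQuery(Connection connection) throws Exception {
        Map<String, Object> params = new HashMap<>();
        return JasperFillManager.fillReport("memberReport.jasper", params, connection);
    }

    // Approach 2: the data is fetched beforehand (e.g. via Spring JDBC)
    // and handed to the report as a bean collection datasource.
    public JasperPrint fillWithBeans(List<MemberRow> rows) throws Exception {
        Map<String, Object> params = new HashMap<>();
        JRBeanCollectionDataSource dataSource = new JRBeanCollectionDataSource(rows);
        return JasperFillManager.fillReport("memberReport.jasper", params, dataSource);
    }
}
```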
As far as your last question, it should all run in the same JVM (at least from my understanding).

How do we share data between two different services

I am currently working on a web service which is periodically polled. It does not store its state and is instantiated every time it is queried. Essentially, it retrieves the state of other external entities, e.g. databases, and delivers it back to the requester.
Recently, the need to store state has arisen, in that:
There is the need to continuously collect data from a particular source and store the bits that are important/relevant
There is the need to collect the aggregate of a particular data source over a period of time
I came up with the following idea:
My main concern here is the fact that I am using a static class (essentially a global) to share data between the two services. Is there a better way of doing this?
edit: Thanks for the responses so far. Apologies for the vagueness of this question: I'm just trying to work out the best way to share data across different services and am unsure about the specifics (i.e. what is required). The platform I am developing on is the .NET Framework, and both services are simply WCF services hosted as a Windows service.
The database route sounds like the most conventional way to go. However, I am reluctant to go down that path for now (mainly for deployment/setup reasons: it introduces the need to create new tables, etc., in addition to simply installing the software), given that at this point we are transferring relatively small amounts of data. This may of course change in the future, and going the database route might be the way to go at that point.
Is there any other way besides adding a database persistence layer?
If you need to collect and aggregate data, you might want to consider using a database between the two layers. Or have I misunderstood something?
You should consider enhancing your question with more requirements: pretty much all options are open here.
Sure, how about data binding? I don't have a lot of information to go on here about your platform, but most sufficiently advanced systems offer it in some form.
You could replace your static shared data with some database representation, with a caching layer (like memcached) between the database and the webservice, so that most of the time the data is available very quickly from the cache, but can be retrieved from the database as needed.
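As a rough sketch of that read-through idea (hypothetical names throughout, and an in-process map standing in for memcached):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachingReader {

    // Placeholder for whatever actually talks to the database.
    public interface Repository {
        String loadByKey(String key);
        void save(String key, String value);
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Repository repository;

    public CachingReader(Repository repository) {
        this.repository = repository;
    }

    public String find(String key) {
        // Served from the cache most of the time; loaded from the database on a miss.
        return cache.computeIfAbsent(key, repository::loadByKey);
    }

    public void update(String key, String value) {
        repository.save(key, value);  // write through to the database...
        cache.put(key, value);        // ...and keep the cached copy consistent
    }
}
```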
I appreciate that you want to keep the architecture simple. Depending on the number of items you have to look up and their permanence, you might consider leveraging your file system or a message queue. It sounds like you want a file system, because that seems to have the least impact on your design.
If you start dealing with tens of thousands of small files, your directories can get hard to navigate and slow to do file lookups in. I typically shoot for about 1,000 to 10,000 files per directory and concoct a routine that generates a path to the file based on the file name pattern. Keeping the files evenly spread across subdirectories is important, and some file systems have a limit on the number of subdirectories in a parent directory.
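A rough sketch of that path-generation idea (Java only for illustration; the same pattern applies on .NET, and the names are made up): bucket files into subdirectories derived from a hash of the file name, so no single directory grows past a few thousand entries.

```java
import java.nio.file.Path;

public class FileStorePaths {

    private final Path root;

    public FileStorePaths(Path root) {
        this.root = root;
    }

    // Spread files over 256 * 256 buckets, e.g. <root>/a3/7f/report.dat
    public Path pathFor(String fileName) {
        int hash = fileName.hashCode();
        String level1 = String.format("%02x", (hash >>> 8) & 0xFF);
        String level2 = String.format("%02x", hash & 0xFF);
        return root.resolve(level1).resolve(level2).resolve(fileName);
    }
}
```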