How best to integrate several systems? - legacy-code

Ok where I work we have a fairly substantial number of systems written over the last couple of decades that we maintain.
The systems are diverse in that multiple operating systems (Linux, Solaris, Windows), Multiple Databases (Several Versions of oracle, sybase and mysql), and even multiple languages (C, C++, JSP, PHP, and a host of others) are used.
Each system is fairly autonomous, even at the cost of entering the same data into multiple systems.
Management recently decided that we should investigate what it will take to get all the systems happily talking to each other and sharing data.
Keep in mind that while we can make software changes to any of the individual systems, a complete rewrite of any one system (or more) is not something management is likely to entertain.
The first thought of several of the developers here was the straight forward: If system A needs data from system B it should just connect to system B's database and get it. Likewise if it needs to give B data it should just insert it into B's database.
Due to the mess of databases (and versions) used, other developers were of the opinion that we should have one new database, combining the tables from all the other systems to avoid having to juggle multiple connections. By doing this they hope that we might be able to consolidate some tables and get rid of the redundant data entry.
This is about the time I was brought in for my opinion on the whole mess.
The whole idea of using the database as a means of system communication smells funny to me. Business logic will have to be placed into multiple systems (if System A wants to add data to System B it better understand B's rules concerning the data before doing the insert), several systems will most likely have to do some form of database polling to find any changes to their data, continuing maintenance will be a headache, as any change to a database schema now propagates several systems.
My first thought was to take the time and write APIs/Services for the different systems, which once written could be easily used to pass/retrieve data back and forth. A lot of the other developers feel that is excessive and far more work than just using the database.
So what would be the best way to go about getting these systems to talk to each other?

Integrating disparate systems is my day job.
If I were you, I would go to great effort to avoid accessing System A's data from directly within System B. Updating System A's database from System B is extremely unwise. It is exactly the opposite of good practice to make your business logic so diffuse. You will end up regretting it.
The idea of the central database isn't necessarily bad ... but the amount of effort involved is probably within an order of magnitude of rewriting the systems from scratch. It is certainly not something I would attempt, at least in the form you describe. It can succeed, but it is much, much harder and it takes a lot more discipline than the point-to-point integration approach. It's funny to hear it suggested in the same breath as the 'cowboy' approach of just shoving data directly into other systems.
Overall your instincts seem pretty good. There are a couple of approaches. You mention one: implementing services. That's not a bad way to go, especially if you need updates in real time. The other is a separate integration application that is responsible for shuffling the data around. That's the approach I usually take, but usually because I can't change the systems I'm integrating to ask for the data it needs; I have to push the data in. In your case the services approach isn't a bad one.
One thing I would like to say that might not be obvious to someone coming to system integration for the first time is that every piece of data in your system should have a single, authoritative point of truth. If the data is duplicated (and it is duplicated), and the copies disagree with each other, the copy in the point of truth for that data must be taken to be correct. There is just no other way to integrate systems without having the complexity scream skyward at an exponential rate. Spaghetti integration is like spaghetti code, and it should be avoided at all costs.
Good luck.
EDIT:
Middleware addresses the problem of transport, but that is not the central problem in integration. If the systems are close enough together that one app can shove data directly in to another, they're probably close enough that a service offered by one can be called directly by another. I wouldn't recommend middleware in your case. You might get some benefit from it, but that would be outweighed by the increased complexity. You need to solve one problem at a time.

Sounds like you may want to investigate Message Queuing and message-oriented middleware.
MSMQ and Java Message Service being examples.

It seems you are looking for opinions, so I will provide mine.
I agree with the other developers that writing an API for all the different systems is excessive. You would likely get it done faster and have much more control over it if you just take the other suggestion of creating a single database.

One of the challenges that you will have is to align the data in each of the different systems so that it can be integrated in the first place. It may be that each of the systems that you want to integrate holds entirely different sets of data but more likely it is data that is overlapping. Before diving into writing API:s (which is the route I would take as well given your description) I would recommend that you try and come up with a logical data model for the data that needs to be integrated. This data model will then help you leverage the data that you are having in the different systems and make it more useful to the other databases.
I would also highly recommend an iterative approach to the integration. With legacy systems there is so much uncertainty that trying to design and implement it all in one go is too risky. Start small and work your way to a reasonably integrated system. "Fully integrated" is hardly ever worth aiming for.

Directly interfacing via pushing/ poking databases exposes a lot of internal detail of one system to another. There are obvious disadvantages: upgrading one system can break the other. Moreover, there can be technical limitations in how one system can access the database of the other (consider how an application written in C on Unix will interact with a SQL Server 2005 database running on Windows 2003 Server).
The first thing you have to decide is the platform where the "master database" will reside, and the same for the middleware providing the much required glue. Instead of going towards API level middleware-integration (such as CORBA), I would suggest you to consider Message Oriented Middleware. MS Biztalk, Sun's eGate and Oracle's Fusion can be some of the options.
Your idea of a new database is a step in the right direction. You might like to read a little bit on Enterprise Entity Aggregation pattern.
A combination of "data integration" with a middleware is the way to go.

If you are going towards Middleware + Single Central Database strategy, you might want to consider achieving this in multiple phases. Here's a logical stepped process which can be considered:
Implementation of services/APIs for different systems which expose the functionality for each system
Implementation of Middleware which accesses these APIs and provides an interface to all the systems to access the data/services from other systems (accesses data from central source if available, else gets it from another system)
Implementation of Central Database only, without data
Implementation of Caching/Data-Storage Services at the Middleware level which can store/cache data in the central database whenever that data is accessed from any of the Systems e.g. IF System A's records 1-5 are fetched by System B through Middleware, the Middleware Data Caching Services can store these records in the centralized database and the next time these records will be fetched from the central database
Data Cleansing can happen in Parallel
You can also create a import mechanism to push data from multiple systems to the central database on a daily basis (automated or manual)
This way, the effort is distributed across multiple milestones and data is gradually stored in the central database on first-accessed-first-stored basis.

Related

Implementing SOA with RESTful service and application APIs?

At the moment we have one huge API which is used by our backoffice, our frontend, and also our public API.
This causes me a lot of headaches because when building new endpoints I find a lot of application specific logic in the code which I don't necessarily want to include in my endpoint. For example, the code to create a user might contain code to send a welcome email, but because that's not needed for the backoffice endpoint I will then need to add a new endpoint without that logic.
I was thinking about a large refactor to break our code base in to a number of smaller highly specific service APIs, then building a set of small application APIs on top of those.
So for example, an application endpoint to create a new user might do something like this after the refactor:
customerService.createCustomer();
paymentService.chargeCard();
emailService.sendWelcomeEmail();
The application and service APIs will be entirely separate code bases (perhaps a separate code base per service), they may also be built using different languages. They will only interact through REST API calls. They will be on the same local network, so latency shouldn't be a huge issue.
Is this a bad idea? I've never seen/worked on a codebase which has separated the two before, so perhaps there is a better architecture to achieve the flexibility and maintainability I'm looking for?
Advise, links, or comments would all be appreciated.
Your idea of making multiple, well-defined services is sound and really it is the best way to approach this. Going with purely micro-services approach however trendy it might seem, proves to be an overkill most often than not. This is why I'd just redesign the existing API/services properly and follow solid and sound SOA design principles below. Good Resources could be found on both serviceorientation.com and soapatterns.org I've always used them as reference in my career.
Consider what types of services you need
(image from serviceorientation.com)
Entity services are generally your Client, Payment services - e.g. services centered around an entity in your domain. They should be business-agnostic, and be able to be reused in all scenarios. They could be called sometimes by clients directly if sufficient for their needs. They could be called by Task services.
Utility services contain logic you're likely to reuse in other services, but are generally not called by the clients directly. Rather, they'd be called by Task and Entity services. An example might be a Transliteration service.
Task services combine and reuse Entity and Utility services into meaningful tasks. Most often they are not that agnostic and they do implement some specific business logic. They have meaningful business operations and they are what clients mostly call.
Principles to follow when redesigning
I strongly recommend going over this cheat sheet and making sure everything there is covered when you do your redesign. It's great help.
In general, you should make sure that:
Each service has a common context and follows the separation of concerns principle. E.g. Clients service is only for clients related operations, etc.
Each of the Entity and Utility services is business-agnostic and basic enough. So it can be reused in multiple scenarios and context without being changed. Contract must be simple - CRUD and only common operations that make sense in most usage scenarios.
Services follow a common data model - make sure all the data structures you use are used uniformly in all services in order to prevent need for integration efforts in the future and promote combination of services for clients to exploit. If you need to receive a customer that another service returns, this should be happening without the need for transformation
OK, but where to put the non-agnostic logic?
Now, you have multiple options for abstracting business logic whenever you have a need for complex business functionality. It depends on your scenario what you're going to chose:
Leave logic to all clients. Let them combine your simplified services
If there is business logic that is commonly implemented in multiple of your applications and has the potential to be reused heavily you can implement a composite service that reuses multiple existing underlying services and exposing the logic.
Service Composability. Concerns on multiple API calls communication overhead.
Well, this is an age-old question - should you make multiple API calls when they will probably create some communication overhead? The answer is - it depends on how complex your scenario is, how much reuse you expect and how flexible you want to be. Also is speed critical? To what extent? In Service Oriented Architecture though, this is a very common approach - to reuse your existing services and combine them in new configurations as needed. Yes, it does add some overhead, but I've seen implementations in very complex environments, for example Telecoms, where thanks to the use of ESB solutions, message queues, etc the overhead is negligible compared to the benefits. Here is a common architecture approach (image from serviceorientation.com):
The mandatory legacy refactoring heads-up
More often than not, changing the established contract for multiple existing client systems is a messy business and could very well lead to lots of refactoring and need for looking for needle-in-a-stack functionality that's somewhere deep in the (possibly) legacy code. Business logic might be dispersed everywhere. So make sure you're ready and have the controls, time and will to lead this battle.
Hope this helps
Is this a bad idea?
No, but this is a big overall question to be able to provide very specific advice.
I'd like to separate this into 3 areas:
Approach
Design
Technology
Working backwards, the Technology is the final and most-specific part, and totally depends on what your current environment is (platforms, skills), and (hopefully) will be reasonable self-evident to you once the other things are in progress.
The Design that you outlined above seems like a good end-state - having multiple, specific, focused APIs, each with their own responsibility. Again, the details of the design will depend on the skills of you and your organization, and the existing platforms that you have. E.g. if you are already using TIBCO (for example) and have a lot invested (licenses, platforms, tools, people) then leveraging some of their published patterns/designs/templates makes sense; but (probably) not if you don't already have TIBCO exposure.
In the abstract, the REST API services seems like a good starting point - there are a lot of tools and platforms at all levels of the system for security, deployment, monitoring, scalability, etc. If you are NGINX users, they have a lot of (platform-independent) thoughts on how to do this also NGINX blog, including some smart thinking on scalability and performance. If you are more adventurous, and have an smart, eager team, a look at Event-driven architecture - see this
Approach (or Process) is the key thing here. Ultimately, this is a refactoring, though your description of "a large refactor" does scare me a little - put that way, it sounds like you are talking about a big-bang change and calling it refactoring. Perhaps it is just language, but what's in my mind would be "an evolution of the 'one huge API' into multiple, specific, focused APIs (by refactoring the architecture)". One place to start is Martin Fowler, while this book is about refactoring software, the principles and approach are the same, just at a higher-level. Indeed, he talks about just this here
IBM talk about refactoring to microservices and make it sound easy to do in one step, but it never is (outside the lab).
You have an existing API, serving multiple internal and external clients. I will suggest that you'll want to keep this interface solid for these clients - separate your refactoring of the implementation from the additional concerns of liaising with and coordinating external systems/groups. My high-level starting approach would be:
identify a small (3-7) number of related methods on the API
ideally if a significant, limited-scope change is needed anyway with these methods, that is good - business value with the code change
design/specify a new stand-alone API specifically for these methods
at first, clone the existing model/naming/style
code a new service just for these
with proper automated CI/CD testing and deployment practices
with associated monitoring
modify the existing API to have calls to these methods re-direct to call the new service
perhaps have a run-time switch to change between the old implementation and the new implementation
remove the old implementation from codebase
capture issues, assumptions and problems along the way
the first pass will involve a lot of learning about what works and doesn't.
then repeat the process over & over, incorporating improvements each time.
At some point in the future, when appropriate due to other business-driven needs, the API published to the back-end, front-end and/or public clients can change, but that is a whole different project.
As you can see, if the API is huge (1,000 methods => 140 releases) this is a many-months process, and having a reasonably frequent release schedule is important. And there may be no value improving code that works reliably and never changes, so a (potentially) large portion of the existing API may remain, just wrapped by a new API.
Other considerations:
public API? Maybe a new version (significant changes) will be needed sooner than the internal APIs
focus on the methods/services used by it
what parts/services change the most (have the most enhancement requests approved)
these are the bits most likely to change, and could benefit most from a better process/architecture
what are future plans for change and where would the API be impacted
e.g. change to user management, change to payment processors, change to fulfilment systems
e.g. new business plans (new products/services)
consider affected methods in the API
Also see:
Using Microservices for Legacy System Modernization
Migrating From a Monolith to APIs and Microservices
Break the Monolith! Loosely Coupled Architecture Brings DevOps Success
From the CEO’s Desk: Application Modernization – Assess, Strategize, Modernize! 9
[Microservices Architecture As A Large-Scale Refactoring Tool 10
Probably the biggest 4 pieces of advice that I can give is:
think refactoring: small changes that don't affect function
think agile: small increments that are valuable, testable, achievable
think continuous: have a vision for where you will (eventually) get to, then work the process continuously
script & automate the processes from code, documentation, testing, deployment, monitoring...
improving it every time!
you have an application/API that works - keep it working!
That is always the first priority (you just need to work to carve-out time/budget for maintenance)
Not a bad idea at all.
Also what are your looking is microservices arch. and with that the question comes is how you break your system into well defined services.
We use Domain Driven Design Arch. to break our system into microservices and lagom framework , which allows every service to be in diff. code base and event driven arch. between microservices.
Now lets look at your problem at low level: you said a service contains code like creating a user and sending a email and one with just creating a user and there might be other code as well.
First we need to understand how many type of code you are writing:
Domain Object Logic (eg: User Object) -- what parameters are valid and all -- this should be independent of service endpoint and should be encapsulated in one Class like user class and we say it an Aggregate in Domain Driven Design terms
Business Reactions -- like on user creation send a email -- using event driven arch. these type of logics are separated into process managers or sagas which could most cases work conditionally like a for user created externally send a mail and for user created internally send a email , by having extra data in the event
Also the current way you are doing it , how are you handling transaction across services???

How can I abstract database repository from data service?

I was reading at this article https://www.infoq.com/articles/spring-data-intro to understand how can a data service layer be independent of database(RDBMS /NoSQL). It looks like there's no way to design entity and repository to be independent of database. This article was written on 2012. Do we've any other technologies since then that has implemented this feature?
Before actually answering your question I have to ask: Why do you want to do that? Think twice because abstraction comes at a cost and just doing it in order to have a "clean" design is most certainly not worth it.
Now to your question:
There is no library or framework that does this out of the box.
Probably the thing getting closest to it is spring data which you are obviously aware of. If you stick to the persistent storage independent interfaces your repositories will abstract over the persistence store use at least to some extent. BUT: you will have to provide different kinds of metadata (typically as annotations on the entities) to make it work. So in this sense, it is a really leaky abstraction.
Of course, you can roll your own: create an interface with the operations you need and provide implementations for the different data stores you want to use. Also, include a storage independent way of providing metadata.
So the question becomes: Why hasn't anybody done that yet? And why you probably shouldn't do it either?
It is hard: Just writing SQL in a way that all (relevant) SQL databases understand is difficult, see this for an example.
You loose a lot of power of your stores. For example, RDBMSes are great at joining stuff. But joining is basically a no go for many No-SQL databases. So your API probably shouldn't offer this feature. This basically dumbs everything down to the common denominator, which is going to be really small when you have very different data stores.
It is not worth it. This leads back to my opening question: Why would you want to do it anyway? I certainly see the use of switching different RDBMSes. Some companies only want certain vendors in their datacenter, it is nice to have an in memory variant for testing and so on. But switching a store from for example Oracle to Hazelcast and then to MongoDB to CSV? Why would you want to do that? What is the business value of that?

OODBMS - RDBMS difference and which one is suitable for a factory management system

I searched a bit for the differences between OODBMS and RDBMS. I pretty much know what they are. However, how I will decide which one is better for which applications. Can anyone kindly help me please?
What I meant for factory management is: there are production lines to manufacture bottled, frozen and other food stuff. The application manages from assigning staff onto the lines, to keep the production records in the system. Which dbms is better for such systems?
Thanks in advance.
Here is a nice article by Rick Grehan that describes cases where ODBMS are useful:
http://www.odbms.org/wp-content/uploads/2013/11/006.04-Grehan-When-to-Use-an-ODBMS-2005.pdf
Disclaimer: this is an "old curmudgeon" answer, from a guy who wrote plenty of perfectly functional accounting, manufacturing and other code before OOP came into the mainstream.
With that being said...
Factory management is classic relational database stuff, it's what it was invented to do. The code for classic relational apps tends to follow very predictable patterns, lots of loops over retrieved rows from tables, or straight pass-through stuff: passing data up to the UI or down to the database. If your DB is well-designed, the biz logic you code will be details in those loops or pass-throughs, but those two patterns will dominate.
An OODMS on the other hand, from the point of view of this "old curmudgeon", attempts to recast the perfectly and efficiently functional RDBMS into something that will work with classes/objects, for no discernible gain over a system that has proven itself for decades to work extremely well. Classes have little or nothing to do with the classic code patterns that sit on top of relational databases. In fact, they tend to complicate things and can easily get in the way. I'm not saying don't use OOP code to deal with the database, just that OOP was invented for a different kind of problem, a problem that database apps don't happen to have.
Decision to choose OODBMS or RDBMS does not depend upon particular application like factory management/automation.
It is depends on many criteria like
1) Programming Paradigm - If you [programmer] choose to visualize/implement in the OO programming language then the OODBMS is suitable to store the objects as directly into the database, but Most widely the type of DBMS Relational, because it is well established commercially and have a good mathematical background.
2) Application Specific - For an factory automation/management, responsiveness and fast access is important. The OODBMS are swifter than the RDBMS. If you considered for web-development then a light-weight tool like MySQL will fit a lot.
3) Trend - Now there is paradigm swift from Legacy/Structural to Object/Component Oriented programming. so therefore, in this trend the OODBMS is best suitable for the Enterprise Applications like factory management, etc.
It depends on the application layer using. If it is a simple approach more towards procedural way [which can have classes too] RDBMS is more suitable. Otherwise if you are more towards a strict object oriented system OODBMS can be used.
I usually draw the line of usefulness at the point in which they need to be integrated into enterprise systems. If your project does not necessarily need to integrate, ODBMS is usually easier or technically superior. If you can integrate via web services or "push/pull" into an enterprise system DB, then you can still use an ODBMS, but there might be political pressure against it. (newer ODBMS/RDBMS replication like dRS for db4o may be a good fit) But if you need tight integration with legacy or enterprise datastores, then you're usually forced to use the RDBMS for one reason or another.
That said, your individual production lines might benefit greatly from an ODBMS which are great at storing oft-changing complex object models and schemas while the orchestrator system could follow my previous line of thinking.
I've been using ODBMS for many years, and have been dreading the project which requires me to return to purely relational data management. Although recent improvements in ORM tooling have made relational much more pleasant to work with, the ORM+RDBMS solution still can't keep up with ODBMS systems in a few key areas (see the previously mentioned article on odbms.org).

SmartGWT, ZK and GenericFrame - Online Homework

Good day,
Our school, a small high school in semi-rural New Zealand, is currently looking into online homework solutions. Being one of the IT guys, I have been asked to look into some of the options. We have checked around and there are no robust solutions that cover what we are looking for. So, we are considering development of our own system, either on our own or in collaboration with some other schools.
Before I put significant time into any one option, I would thought I should ask for some expert advice.
Please keep in mind that one of our major obstacles is that around 20% of our students are on dial-up because broadband is not available in their area.
We are also not limited to the technologies listed, they just are the ones that we have been looking into up to this point.
With that in mind, here goes.
1. Is there a way to pre-determine the bandwidth needed for these technologies?
2. If bandwidth continued to be too limiting, could the final solution stand alone so we could distribute it to students on CD or USB stick?
3. What are some pros/cons of each for use with databases, specifically mysql or postgresql? (After all we do need to keep track of lots of data)
4. What are some pros/cons of each for of these RIA development?
I appreciate everyone for sharing their time and expertise on the matter.
Cheers,
Ben
1) If you write full-AJAX application, such as in GWT, the bandwitch will be:
a) the size of application java script, images, etc., you may consider that everything is loaded when user logs in (cache for images may seems to be big, but it's easily overloaded)
b) the size of communication - in GWT it depends only from you! no magic full-frame reloading, sending is only what YOU are wanting to send
2) I do not catch your point, stand alone applications can be distributed such way, applications that use databases generally can't
3) postgresql has high compatibility with Oracle - same transaction+select for update behaviour, pgPLSQL is highly inspired by PL/SQL (easy to rewrite stored procedures).
I personally suggest MySQL for a school project for its simplicity. PostgreSQL is powerful but a bit complicate to configure and the visual tool for optimizing queries not good.
Without considering the bandwidth, I definitely suggest ZK since, again, it is much easier to learn, to develop and to maintain (also much more powerful). The bandwidth consumption and latency of GWT really depends how much effort you want to invest, and how skillful your people are familiar with distributed computing, while the network bandwidth is basically the states of UI (not data), which is reasonably small. In short, you could have the best network bandwidth and latency if you optimize it at the best with GWT, while ZK is less to worry but, if you want to improve, you have to use jQuery (i.e, in JavaScript).
Thanks lechlukasz, I appreciate your comments and insight.
I will clarify my point about stand alone applications. We have a number of students, as high as 20%, who do not have access to broadband due to their geographic location. We are considering, as part of the design, how we may be able to distribute a stand alone version.
For instance, if we were to abstract all the database calls using a separate class in GWT, we could recompile a stand alone version that didn't make the database calls. The database would likely only be for tracking results and reporting.
In reality, we would likely implement the front end product first with references to empty methods for storing the results in a database and implement those methods at a later time.
For the record, we have started to code up some test cases using GWT/SmartGWT and are pleased with the results. Although we cannot comment on the other technologies considered because we didn't try them to the same extent, we are pleased with the results to this point of the project.
Cheers,
Ben

Do I need to use DDD, Unit of Work, Repositories or something similar for simple web apps?

I'm working on a simple eCart system using .net4 (c#). I've been doing a lot of reading about Unit of Work Pattern, Repository Pattern, and Persistence Ignorance. I think I have a grasp on the strategy and benefits to building my layers this way, but for my simple app I'm wondering if it's necessary and if anyone can point me towards good architecture for my scope.
Please correct me if I'm wrong - the main benefits to using repositories are to create fewer trips to the DB and to separate application architecture from DB architecture. IE - what's good for DB performance isn't always good for application design so it's best to design what's best for both and then create an interface between the two.
So here's the question - I want any business transaction that occurs to be saved to the DB as soon as it occurs, so there doesn't seem to be a point in queuing data in repositories and then saving it immediately. Why not just save it directly?
Are there other benefits of DDD that I'm missing or would it be over engineering to build out such a robust architecture for every simple project that comes along? Thanks for any help.
Do you need to use [insert pattern here]: Nope When it comes right down to it, the best practice is always the one that gets your application done, and meets the time, monetary, and technical requirements.
But if you take the "lets just get it done" approach, then be aware of the Technical Debt you might be incurring.
Also there are a lot of reasons to use some of these patterns (and they don't always have to do with performance), particularly the Unit Of Work pattern. This has more to do with the requirements and restrictions that often come with ORM's and such. These issues can be a bit complex, but I suspect as you begin to implement some of these things you'll start to realize what those issues are and come to understand why these patterns are useful.
Agree with CodingGorilla. Patterns are great unless they conflict with YAGNI.
If every transaction needs to be written immediately (that is, if you have potential contention between the actions of two users) then you will need a queuing mechanism or you can use the underlying transactional mechanism of whatever data repository you might be using (e.g. SQL)