Architecture of microservices: from a business approach or a technical one?

Our team is trying to decouple a monolithic Spring MVC administrative application (create, update, delete), and we want to adopt an architecture based on microservices.
After a bit of research, it seems the best approach is to create microservices according to the problem that a specific part of the software solves, for example, managing clients.
The problem comes when we read some definitions, like the following from Wikipedia:
In software engineering, a monolithic application describes a
single-tiered software application in which the user interface and
data access code are combined into a single program from a single
platform.
Based on that definition, my application is not monolithic, because it is cleanly separated into layers, but it does not follow a microservices architecture either, which confuses me, since on the web everything is framed as Monolithic vs. Microservices.
So, should the microservices architecture be designed based on the business problem it solves?
Should the microservices architecture be designed based on the way in which the application is organized in layers?
Thanks.

I like to view each microservice as a self-contained, smaller monolith. When you force yourself to split up your legacy application into, um, smaller monoliths, you'll find:
60% of your code is scaffolding and will need to be repeated across multiple services.
It's easier to split things (and maintain them that way) if you've established a what-goes-where rule upfront.
The most common approach is to split the application by functionality area. So to answer your question, I'd agree more with the image at the top-right, assuming you intended to show multiple containers there.
And about #1 above, there's often a whole bunch of scaffolding modules that you can avoid writing by hand after all.

From my experience, the most obvious advantage of a microservice is the ability to scale horizontally. User analysis takes too long? Just add 10 more workers. Done. Remove them afterwards. No need to add more RAM/CPU/whatever to the already costly server that runs your monolith.
Do not plan ahead and try to separate out a ClientManager microservice - that should just be a class.
You are thinking about migrating to microservices for a reason. Something is using up too many resources. Find the most problematic process that slows everything down, and create a microservice for it. It can be, for example, report generation, user creation, or data aggregation. Start with planning the API. It will state clearly what responsibilities the service will have and how many resources it will use. When you know what it should do, name it properly.
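For instance, a minimal API-first sketch of a report-generation service, assuming Spring Boot (all names here are illustrative, not a prescribed design):
import org.springframework.web.bind.annotation.*;
// Hypothetical contract for a report-generation microservice. Planning this
// interface first makes the service's responsibilities and resource needs explicit.
@RestController
@RequestMapping("/reports")
public class ReportController {
    // Generation is slow, so the API is asynchronous by design: accept the request,
    // hand back a ticket, and let a horizontally scalable pool of workers (omitted) run it.
    @PostMapping
    public ReportTicket requestReport(@RequestBody ReportRequest request) {
        return new ReportTicket(java.util.UUID.randomUUID().toString(), "PENDING");
    }
    // Clients poll for the finished report by ticket id.
    @GetMapping("/{ticketId}")
    public ReportTicket status(@PathVariable String ticketId) {
        return new ReportTicket(ticketId, "PENDING"); // stub for illustration
    }
}
record ReportRequest(String reportType, String rangeStart, String rangeEnd) {}
record ReportTicket(String ticketId, String status) {}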
Agile software methodologies are your greatest friend in this process. Take the processes one by one. Experiment, iterate, and evaluate. With time, it will become obvious what the microservices should look like.
There is also a hot topic on how to organize code with microservices - I lean towards a monorepo - a single repository with all the code.
Pros: One pull request for many services, easy utils sharing, common dependencies, common deployment procedure and easier automation.
Cons: You can easily break the API contract and do too much work within one microservice (meaning, it can take over another service's responsibility).

Related

Is there a specific standard to follow in order to call something a "Microservices Architecture"?

In the past I worked with what I believe were "microservices". We had a service discovery and the applications communicated with each other through REST sync calls, which I believe is called request / response.
Now we are working on an application that is simply broken down into multiple small applications which work together through publish / subscribe using Kafka; there's no direct communication between them, nor a service discovery, per se.
Is it safe to say that these are "Microservices" too or what should I call them?
As one commenter pointed out, an answer will always be opinionated (and Stack Overflow is generally not the best place for opinion-based questions), but I am not afraid to give mine in an attempt to help.
My view goes along the lines of what Martin Fowler and others have taught us in the last decade, as well as the insight from the pioneering work of Werner Vogels at Amazon. Their definitions hold that microservices are not a single well-defined standard, but an architectural style that can encompass a range of different approaches in distributed computing. What they have in common are the goals of providing scalability in many different dimensions - mostly system complexity, organizational productivity, throughput, and resilience - all while being affordable. To achieve that, the designs should be centered around:
Service boundaries cut according to business goals (a type of Domain Driven Design).
Sizing / Scope of services to be limited (the famous two-pizza team size idea).
Clear separation in terms of organizational responsibilities and maintenance. This includes independent teams and release and maintenance cycles for each service.
Embrace of automation on all layers (this is how cloud virtualization, devops, CI/CD, infrastructure as code, etc. became popular tools).
Design focus on failure-resilient operation, e.g. avoid cascading failures, defined failure states, fallbacks, etc.
A consequence of those goals is also the embrace of distributed system architectures and design for horizontal scalability (statelessness, separation of compute and storage, etc.).
So if you feel your designs are along those lines then in my opinion you would have applied the microservices architectural style successfully.
If you want to learn more about this school of thought I recommend this read.

Why are shared libraries between microservices bad? [closed]

Sam Newman states in his book Building Microservices
The evils of too much coupling between services are far worse than the problems caused by code duplication
I just don't understand how shared code between services is evil. Does the author mean that the service boundaries themselves are poorly designed if a need for a shared library emerges, or does he really mean I should duplicate the code in the case of a common business-logic dependency? I don't see what that solves.
Let's say I have a shared library of entities common to two services. The common domain objects for two services may smell, but one service is the GUI to tweak the state of those entities, and the other is an interface for other services to poll that state for their purposes. Same domain, different function.
Now, if the shared knowledge changes, I would have to rebuild and deploy both services regardless of whether the common code is an external dependency or duplicated across the services. Generally, the same concern applies in every case where two services depend on the same piece of business logic. In this case, I see only harm in duplicating the code: it reduces the cohesion of the system.
Of course, diverging from the shared knowledge may cause headaches in the case of a shared library, but even that could be solved with inheritance, composition, and clever use of abstractions.
So, what does Sam mean by saying code duplication is better than too much coupling via shared libraries?
The evils of too much coupling between services are far worse than the problems caused by code duplication
The author is very unspecific when he uses the generic word "coupling". I would agree that certain types of coupling are a strict no-no (like sharing databases or using internal interfaces). However, the use of common libraries is not one of those. For example, if you develop two microservices using golang, you already have a shared dependency (on golang's base libraries). The same applies to libraries that you develop yourself for sharing purposes. Just pay attention to the following points:
Treat libraries that are shared like you would dependencies on 3rd-party entities.
Make sure each component / library / service has a distinct business purpose (see the sketch after this list).
Version them correctly and leave the decision which version of the library to use to the corresponding micro service teams.
Set up responsibilities for development and testing of shared libraries separately from the micro services teams.
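To make the "distinct business purpose" point concrete, here is a sketch of what a focused shared library could look like (the package and class names are made up for illustration):
package com.example.money; // hypothetical artifact: one library, one business purpose
import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;
// A small, focused value type consumed like any versioned 3rd-party artifact,
// as opposed to a grab-bag "common" library that couples everyone to everything.
public final class Money {
    private final BigDecimal amount;
    private final Currency currency;
    public Money(BigDecimal amount, Currency currency) {
        this.amount = Objects.requireNonNull(amount);
        this.currency = Objects.requireNonNull(currency);
    }
    public Money add(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(amount.add(other.amount), currency);
    }
    public BigDecimal amount() { return amount; }
    public Currency currency() { return currency; }
}
Each consuming service then pins a version of such an artifact and upgrades on its own schedule, exactly as it would for a real 3rd-party dependency.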
Don't forget - the microservices architectural style does not focus so much on code organization or internal design patterns as on the larger organizational and process-relevant aspects that allow scaling application architectures, organizations, and deployments. See this answer for an overview.
Short
The core concept of the microservice architecture is that microservices have their own independent development and release cycles. "Shared libraries" undermine this.
Longer
From my own experience, it's very important to keep microservices as isolated and independent as possible. Isolation is basically about being able to release and deploy the service independently of any other services most of the time.
In other words, it's something like:
you build a new version of a service
you release it (after tests)
you deploy it into production
you have not caused a deployment cascade across your whole environment.
"Shared libraries", as I define them, are the libraries that hinder you from doing so.
It's "funny" how "Shared Libraries" poison your architecture:
Oh we have a User object! Let's reuse it everywhere!
This leads to a "shared library" for the whole enterprise, starts to undermine Bounded Contexts (DDD), and forces you to depend on one technology:
we already have this shared library with the DTOs you need, written in Java...
Repeating myself: a new version of this kind of shared lib will affect all services and complicate your deployments, up to very fragile setups. The consequence is that, at some point, nobody trusts themselves to develop the next release of the common shared library, or everyone fears big-bang releases.
All of this just for the sake of "Don't repeat yourself"? That is not worth it (my experience proves it).
In practice, the shared, compromised "User" object is very seldom better than several focused User objects in the particular microservices.
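As a sketch of those "several focused User objects" (the field names are illustrative), each bounded context keeps only what it actually needs and can evolve alone:
// Billing context: only what invoicing needs.
record BillingUser(String userId, String legalName, String vatNumber) {}
// Support context: only what the helpdesk needs; changes here never force
// a rebuild of the billing service.
record SupportUser(String userId, String displayName, String preferredLanguage) {}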
However, there is never a silver bullet and Sam gives us only a guideline and advice (a heuristic if you like) based on his projects.
My take
I can give you my experience. Don't start a microservice project with reasoning about shared libraries. Just don't build them in the beginning, and accept some code repetition between services. Invest time in DDD and the quality of your domain objects and service boundaries. Learn on the way which parts are stable and which evolve fast.
Once you or your team have gained enough insight, you can refactor some parts into libraries. Such refactoring is usually very cheap in comparison to the reverse approach.
And these libraries should probably cover some boilerplate code and be focused on one task - have several of them, not one common-lib-for-everything. In a comment above, Oswin Noetzelmann gave some advice on how to proceed. Taking his approach to the maximum would lead to good, focused libraries and not toxic "shared libraries".
A good example of tight coupling where duplication would be acceptable is a shared library defining the interface/DTOs between services - in particular, using the same classes/structs to serialize/deserialize data.
Let's say you have two services - A and B - and they both may accept slightly different, but overall almost identical-looking, JSON input.
It would be tempting to make one DTO describing the common keys, also including the very few keys used only by service A or service B, and ship it as a shared library.
For some time the system works fine. Both services add the shared library as a dependency, and build and run properly.
With time, though, service A requires some additional data that changes the structure of the JSON where it was the same before. As a result, you can't use the same classes/structs to deserialize the JSON for both services at the same time - the change is needed for service A, but then service B won't be able to deserialize the data.
You must change the shared library, add the new feature to service A and rebuild it, then rebuild service B to adjust it to the new version of the shared library, even though no logic has changed there.
Now, had you defined the DTOs separately and internally for both services from the very beginning, their contracts could later evolve separately and safely in any direction you could imagine. Sure, at first it might have looked smelly to keep almost identical DTOs in both services, but in the long run it gives you freedom of change.
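A minimal sketch of that freedom, with hypothetical field names - each service owns its DTO, so service A can grow its contract without touching service B:
// Owned by service A - free to evolve; the new field is added only here.
record CreateOrderRequestA(String customerId, String productId, String deliveryWindow) {}
// Owned by service B - unchanged, and it still deserializes its own JSON happily.
record CreateOrderRequestB(String customerId, String productId) {}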
At the end of the day, (micro)services don't differ that much from a monolith. Separation of concerns and isolation are critical. Some dependencies can't be avoided (language, framework, etc.), but before you introduce any additional dependency yourself, think twice about the future implications.
I'd rather follow the given advice - duplicate DTOs and avoid shared code unless you can't avoid it. It has bitten me in the past. The scenario above is a trivial one, but it can be much more nuanced and affect many more services. Unfortunately, it hits you only after some time, so the impact may be big.
There is no absolute answer to this. You'll always find an example of a reasonable exception to the rule. We should take these as 'guidelines'.
With that being said: yes, coupling between services is something to avoid, and a shared library is a warning alarm for coupling.
As other answers have explained, microservices' lifecycles should be independent.
And as for your example, I think it strongly depends on what kind of logic / responsibilities the library has.
If it is business logic, something is odd. Maybe you need to split the library into different libraries with different responsibilities; if that responsibility is unique and can't be split, you should wonder whether those two services should really be one. And if that library has business logic that feels weird in those two services, most likely that library should be a service in its own right.
Each microservice is autonomous, so each executable has its own copy of the shared libraries - so is there really coupling through a shared library?
Spring Boot even packages the language runtime inside the microservice's package.
Nothing is shared, not even the runtime, so I don't see a problem in using a library or common package in a microservice.
If a shared library creates coupling between microservices, then isn't using the same language in different microservices a problem too?
I was also confused while reading "Building Microservices" by Sam Newman

How microservices are different from isolated, standalone service?

I would like to understand how microservices are different from creating a separate, stand-alone service (like REST or SOAP).
For instance, I have to create a license system for my webapp. This can simply be a separate REST service, which can be consumed by the webapp. It can still be tweaked, scaled up/down, and isolated from my webapp.
Why does it need to be a microservice?
What are the pros and cons of each approach ?
"Microservice" is just a buzzword, and people were practicing this architectural style before it was called microservices. Here are the commonly agreed attributes of a microservice. If your service fulfills these points, chances are you can call it a microservice:
Independent life cycles (development, testing, deployment, etc) with independent development teams (to utilize Conway's Law to your advantage).
Focused scope - Implement exactly one specific domain aspect to keep it as "micro" as possible. (Separation of concerns)
Clearly defined interface to the outside world, typically with changes introduced only through versioning and phase-out time frames (see the sketch below).
Scalable design. Often stateless on purpose to allow for deployment of application clusters.
No sharing of databases/persistence. This is important to avoid indirect dependencies.
Additionally some people like to include things like freedom of implementation technologies, infrastructure automation and asynchronous communication.
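As a sketch of the "clearly defined, versioned interface" attribute, using the license service from the question and assuming Spring (the names are illustrative):
import org.springframework.web.bind.annotation.*;
// /api/v1 stays available during a phase-out window, while breaking changes
// would be published under /api/v2 instead of silently altering v1.
@RestController
@RequestMapping("/api/v1/licenses")
public class LicenseControllerV1 {
    @GetMapping("/{id}")
    public License get(@PathVariable String id) {
        return new License(id, "ACTIVE"); // stub for illustration
    }
}
record License(String id, String status) {}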
The cons of not following these practices may not be relevant if you are only a single developer or a small team starting out. But if you have a large development organization, you would start to suffer from the crippling "monolithis" disease, which can slow down development and release cycles significantly.
For a more detailed understanding, I recommend reading the Martin Fowler material, especially the section about trade-offs.

Implementing SOA with RESTful service and application APIs?

At the moment we have one huge API which is used by our backoffice, our frontend, and also our public API.
This causes me a lot of headaches, because when building new endpoints I find a lot of application-specific logic in the code which I don't necessarily want to include in my endpoint. For example, the code to create a user might contain code to send a welcome email, but because that's not needed for the backoffice endpoint, I would then need to add a new endpoint without that logic.
I was thinking about a large refactor to break our code base into a number of smaller, highly specific service APIs, and then building a set of small application APIs on top of those.
So for example, an application endpoint to create a new user might do something like this after the refactor:
customerService.createCustomer();   // service API: persist the new customer
paymentService.chargeCard();        // service API: take the initial payment
emailService.sendWelcomeEmail();    // service API: application-specific side effect
The application and service APIs will be entirely separate code bases (perhaps a separate code base per service), they may also be built using different languages. They will only interact through REST API calls. They will be on the same local network, so latency shouldn't be a huge issue.
Is this a bad idea? I've never seen/worked on a codebase which has separated the two before, so perhaps there is a better architecture to achieve the flexibility and maintainability I'm looking for?
Advice, links, or comments would all be appreciated.
Your idea of making multiple, well-defined services is sound, and really it is the best way to approach this. Going with a purely microservices approach, however trendy it might seem, proves to be overkill more often than not. This is why I'd just redesign the existing API/services properly, following the solid and sound SOA design principles below. Good resources can be found on both serviceorientation.com and soapatterns.org; I've always used them as references in my career.
Consider what types of services you need
(image from serviceorientation.com)
Entity services are generally your Client and Payment services - i.e., services centered around an entity in your domain. They should be business-agnostic and reusable in all scenarios. They can sometimes be called by clients directly, if that is sufficient for their needs. They can also be called by Task services.
Utility services contain logic you're likely to reuse in other services, but they are generally not called by clients directly. Rather, they are called by Task and Entity services. An example might be a transliteration service.
Task services combine and reuse Entity and Utility services to perform meaningful tasks. Most often they are not that agnostic, and they do implement specific business logic. They have meaningful business operations, and they are what clients mostly call (see the sketch below).
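A sketch of a Task service sitting on top of Entity and Utility services, with hypothetical names (plain Java; in practice the fields would be REST clients):
// Hypothetical Task service: implements the business-specific "register customer"
// operation by composing the agnostic Entity and Utility services.
public class CustomerRegistrationTask {
    private final CustomerService customerService; // Entity service
    private final PaymentService paymentService;   // Entity service
    private final EmailService emailService;       // Utility service
    public CustomerRegistrationTask(CustomerService c, PaymentService p, EmailService e) {
        this.customerService = c;
        this.paymentService = p;
        this.emailService = e;
    }
    // Clients call this meaningful business operation instead of the three services.
    public void registerCustomer(String name, String cardToken, String email) {
        String customerId = customerService.createCustomer(name);
        paymentService.chargeCard(customerId, cardToken);
        emailService.sendWelcomeEmail(email); // only this task sends the mail;
                                              // a backoffice task simply wouldn't
    }
}
interface CustomerService { String createCustomer(String name); }
interface PaymentService { void chargeCard(String customerId, String cardToken); }
interface EmailService { void sendWelcomeEmail(String email); }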
Principles to follow when redesigning
I strongly recommend going over this cheat sheet and making sure everything there is covered when you do your redesign. It's great help.
In general, you should make sure that:
Each service has a common context and follows the separation-of-concerns principle. E.g., a Clients service is only for client-related operations, etc.
Each of the Entity and Utility services is business-agnostic and basic enough that it can be reused in multiple scenarios and contexts without being changed. The contract must be simple - CRUD plus only the common operations that make sense in most usage scenarios.
Services follow a common data model - make sure all the data structures you use are used uniformly in all services, to prevent the need for integration efforts in the future and to promote combining services for clients to exploit. If you need to receive a customer that another service returns, this should happen without the need for transformation (see the sketch below).
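For the common-data-model point, a minimal sketch with illustrative fields - one canonical Customer shape that every service contract uses verbatim:
// Canonical Customer used uniformly in all service contracts, so a Customer
// returned by one service can be passed to another without transformation.
public record Customer(String id, String fullName, String email) {}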
OK, but where to put the non-agnostic logic?
Now, you have multiple options for abstracting business logic whenever you need complex business functionality. Which one you choose depends on your scenario:
Leave the logic to the clients; let them combine your simplified services.
If there is business logic that is commonly implemented in multiple of your applications and has the potential to be heavily reused, you can implement a composite service that reuses multiple existing underlying services and exposes the combined logic.
Service composability: concerns about the communication overhead of multiple API calls
Well, this is an age-old question - should you make multiple API calls when they will probably create some communication overhead? The answer is: it depends on how complex your scenario is, how much reuse you expect, and how flexible you want to be. Also, is speed critical? To what extent? In Service-Oriented Architecture, though, this is a very common approach - reuse your existing services and combine them in new configurations as needed. Yes, it does add some overhead, but I've seen implementations in very complex environments, for example telecoms, where thanks to the use of ESB solutions, message queues, etc., the overhead is negligible compared to the benefits. Here is a common architectural approach (image from serviceorientation.com):
The mandatory legacy refactoring heads-up
More often than not, changing the established contract for multiple existing client systems is a messy business and could very well lead to lots of refactoring and to looking for needle-in-a-haystack functionality that's somewhere deep in the (possibly) legacy code. Business logic might be dispersed everywhere. So make sure you're ready and have the controls, time, and will to lead this battle.
Hope this helps
Is this a bad idea?
No, but this is a big overall question to be able to provide very specific advice.
I'd like to separate this into 3 areas:
Approach
Design
Technology
Working backwards, the Technology is the final and most specific part; it totally depends on what your current environment is (platforms, skills), and it will (hopefully) be reasonably self-evident to you once the other things are in progress.
The Design that you outlined above seems like a good end-state - having multiple, specific, focused APIs, each with their own responsibility. Again, the details of the design will depend on the skills of you and your organization, and the existing platforms that you have. E.g. if you are already using TIBCO (for example) and have a lot invested (licenses, platforms, tools, people) then leveraging some of their published patterns/designs/templates makes sense; but (probably) not if you don't already have TIBCO exposure.
In the abstract, REST API services seem like a good starting point - there are a lot of tools and platforms at all levels of the system for security, deployment, monitoring, scalability, etc. If you are NGINX users, they have a lot of (platform-independent) thoughts on how to do this on the NGINX blog, including some smart thinking on scalability and performance. If you are more adventurous and have a smart, eager team, take a look at event-driven architecture - see this.
Approach (or Process) is the key thing here. Ultimately, this is a refactoring, though your description of "a large refactor" does scare me a little - put that way, it sounds like you are talking about a big-bang change and calling it refactoring. Perhaps it is just language, but what's in my mind would be "an evolution of the 'one huge API' into multiple, specific, focused APIs (by refactoring the architecture)". One place to start is Martin Fowler; while his book is about refactoring software, the principles and approach are the same, just at a higher level. Indeed, he talks about just this here.
IBM talks about refactoring to microservices and makes it sound easy to do in one step, but it never is (outside the lab).
You have an existing API, serving multiple internal and external clients. I suggest that you'll want to keep this interface stable for these clients - separate your refactoring of the implementation from the additional concerns of liaising with and coordinating external systems/groups. My high-level starting approach would be:
identify a small (3-7) number of related methods on the API
ideally if a significant, limited-scope change is needed anyway with these methods, that is good - business value with the code change
design/specify a new stand-alone API specifically for these methods
at first, clone the existing model/naming/style
code a new service just for these
with proper automated CI/CD testing and deployment practices
with associated monitoring
modify the existing API so that calls to these methods redirect to the new service
perhaps have a run-time switch to change between the old implementation and the new implementation (see the sketch after this list)
remove the old implementation from the codebase
capture issues, assumptions, and problems along the way
the first pass will involve a lot of learning about what works and what doesn't.
then repeat the process over and over, incorporating improvements each time.
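A sketch of that run-time switch (hypothetical names; the flag would typically come from external config or a feature-flag service):
// Inside the existing monolithic API: selected methods redirect to the new service.
public class UserEndpoint {
    private final LegacyUserLogic legacy;      // old in-process implementation
    private final NewUserServiceClient client; // REST client for the new service
    private final boolean useNewService;       // the run-time switch
    public UserEndpoint(LegacyUserLogic legacy, NewUserServiceClient client, boolean useNewService) {
        this.legacy = legacy;
        this.client = client;
        this.useNewService = useNewService;
    }
    public UserDto getUser(String id) {
        // If the new service misbehaves, flip the flag back - no rollback deploy
        // is needed when the flag is read from external configuration.
        return useNewService ? client.getUser(id) : legacy.getUser(id);
    }
}
interface LegacyUserLogic { UserDto getUser(String id); }
interface NewUserServiceClient { UserDto getUser(String id); }
record UserDto(String id, String name) {}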
At some point in the future, when appropriate due to other business-driven needs, the API published to the back-end, front-end and/or public clients can change, but that is a whole different project.
As you can see, if the API is huge (1,000 methods => 140 releases) this is a many-months process, so having a reasonably frequent release schedule is important. And there may be no value in improving code that works reliably and never changes, so a (potentially) large portion of the existing API may remain, just wrapped by a new API.
Other considerations:
public API? Maybe a new version (significant changes) will be needed sooner than the internal APIs
focus on the methods/services used by it
what parts/services change the most (have the most enhancement requests approved)
these are the bits most likely to change, and could benefit most from a better process/architecture
what are future plans for change and where would the API be impacted
e.g. change to user management, change to payment processors, change to fulfilment systems
e.g. new business plans (new products/services)
consider affected methods in the API
Also see:
Using Microservices for Legacy System Modernization
Migrating From a Monolith to APIs and Microservices
Break the Monolith! Loosely Coupled Architecture Brings DevOps Success
From the CEO's Desk: Application Modernization – Assess, Strategize, Modernize!
Microservices Architecture As A Large-Scale Refactoring Tool
Probably the biggest 4 pieces of advice that I can give are:
think refactoring: small changes that don't affect function
think agile: small increments that are valuable, testable, achievable
think continuous: have a vision for where you will (eventually) get to, then work the process continuously
script & automate the processes from code, documentation, testing, deployment, monitoring...
improving it every time!
you have an application/API that works - keep it working!
That is always the first priority (you just need to work to carve out time/budget for maintenance).
Not a bad idea at all.
Also, what you are looking at is a microservices architecture, and with that the question becomes how you break your system into well-defined services.
We use Domain-Driven Design to break our system into microservices, along with the Lagom framework, which lets every service live in a different code base and gives us event-driven architecture between the microservices.
Now let's look at your problem at a low level: you said one endpoint contains code for creating a user and sending an email, another just creates a user, and there might be other code as well.
First we need to understand what types of code you are writing:
Domain object logic (e.g., a User object) - which parameters are valid, and so on. This should be independent of the service endpoint and should be encapsulated in one class, like a User class; we call this an Aggregate in Domain-Driven Design terms.
Business reactions - like sending an email on user creation. Using event-driven architecture, these kinds of logic are separated into process managers or sagas, which in most cases can act conditionally - e.g., send a mail for a user created externally but not for one created internally - by having extra data in the event (see the sketch below).
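A sketch of such a process manager, with a hypothetical event shape - the source field is the "extra data in the event" that drives the conditional behavior:
// Event published when a user is created; "source" is EXTERNAL or INTERNAL.
record UserCreated(String userId, String email, String source) {}
// Process manager / saga: reacts to the domain event instead of the user
// service calling the mailer inline.
class UserCreatedProcessManager {
    private final MailSender mailSender;
    UserCreatedProcessManager(MailSender mailSender) {
        this.mailSender = mailSender;
    }
    // Invoked by the event-bus subscription (wiring omitted in this sketch).
    void on(UserCreated event) {
        if ("EXTERNAL".equals(event.source())) {
            mailSender.sendWelcomeEmail(event.email());
        }
        // Internally created users get no mail - decided here, not in the user service.
    }
}
interface MailSender { void sendWelcomeEmail(String email); }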
Also, with the current way you are doing it, how are you handling transactions across services?

How best to integrate several systems?

OK, where I work we have a fairly substantial number of systems, written over the last couple of decades, that we maintain.
The systems are diverse in that multiple operating systems (Linux, Solaris, Windows), multiple databases (several versions of Oracle, Sybase, and MySQL), and even multiple languages (C, C++, JSP, PHP, and a host of others) are used.
Each system is fairly autonomous, even at the cost of entering the same data into multiple systems.
Management recently decided that we should investigate what it will take to get all the systems happily talking to each other and sharing data.
Keep in mind that while we can make software changes to any of the individual systems, a complete rewrite of any one system (or more) is not something management is likely to entertain.
The first thought of several of the developers here was the straightforward one: if system A needs data from system B, it should just connect to system B's database and get it. Likewise, if it needs to give B data, it should just insert it into B's database.
Due to the mess of databases (and versions) used, other developers were of the opinion that we should have one new database, combining the tables from all the other systems to avoid having to juggle multiple connections. By doing this they hope that we might be able to consolidate some tables and get rid of the redundant data entry.
This is about the time I was brought in for my opinion on the whole mess.
The whole idea of using the database as a means of system communication smells funny to me. Business logic will have to be placed into multiple systems (if system A wants to add data to system B, it had better understand B's rules concerning that data before doing the insert), several systems will most likely have to do some form of database polling to find changes to their data, and continuing maintenance will be a headache, as any change to a database schema now propagates to several systems.
My first thought was to take the time and write APIs/Services for the different systems, which once written could be easily used to pass/retrieve data back and forth. A lot of the other developers feel that is excessive and far more work than just using the database.
So what would be the best way to go about getting these systems to talk to each other?
Integrating disparate systems is my day job.
If I were you, I would go to great effort to avoid accessing System A's data from directly within System B. Updating System A's database from System B is extremely unwise. It is exactly the opposite of good practice to make your business logic so diffuse. You will end up regretting it.
The idea of the central database isn't necessarily bad ... but the amount of effort involved is probably within an order of magnitude of rewriting the systems from scratch. It is certainly not something I would attempt, at least in the form you describe. It can succeed, but it is much, much harder and it takes a lot more discipline than the point-to-point integration approach. It's funny to hear it suggested in the same breath as the 'cowboy' approach of just shoving data directly into other systems.
Overall, your instincts seem pretty good. There are a couple of approaches. You mention one: implementing services. That's not a bad way to go, especially if you need updates in real time. The other is a separate integration application that is responsible for shuffling the data around. That's the approach I usually take, but usually because I can't change the systems I'm integrating to ask for the data they need; I have to push the data in. In your case, the services approach isn't a bad one.
One thing I would like to say that might not be obvious to someone coming to system integration for the first time is that every piece of data in your system should have a single, authoritative point of truth. If the data is duplicated (and it is duplicated), and the copies disagree with each other, the copy in the point of truth for that data must be taken to be correct. There is just no other way to integrate systems without having the complexity scream skyward at an exponential rate. Spaghetti integration is like spaghetti code, and it should be avoided at all costs.
Good luck.
EDIT:
Middleware addresses the problem of transport, but that is not the central problem in integration. If the systems are close enough together that one app can shove data directly in to another, they're probably close enough that a service offered by one can be called directly by another. I wouldn't recommend middleware in your case. You might get some benefit from it, but that would be outweighed by the increased complexity. You need to solve one problem at a time.
Sounds like you may want to investigate Message Queuing and message-oriented middleware.
MSMQ and Java Message Service being examples.
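To give a flavor of what that looks like in code, here is a minimal JMS sketch, assuming ActiveMQ as the broker (the queue name and payload are made up):
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
public class SystemAPublisher {
    public static void main(String[] args) throws JMSException {
        // System A publishes a change; System B consumes it on its own schedule,
        // so neither system reaches into the other's database.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                session.createProducer(session.createQueue("systemB.customer-updates"));
            producer.send(session.createTextMessage("{\"customerId\":42,\"action\":\"UPDATE\"}"));
        } finally {
            connection.close();
        }
    }
}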
It seems you are looking for opinions, so I will provide mine.
I agree with the other developers that writing an API for all the different systems is excessive. You would likely get it done faster, and have much more control over it, if you just take the other suggestion of creating a single database.
One of the challenges you will have is to align the data in each of the different systems so that it can be integrated in the first place. It may be that each of the systems you want to integrate holds an entirely different set of data, but more likely the data overlaps. Before diving into writing APIs (which is the route I would take as well, given your description), I recommend that you try to come up with a logical data model for the data that needs to be integrated. This data model will then help you leverage the data you have in the different systems and make it more useful to the other databases.
I would also highly recommend an iterative approach to the integration. With legacy systems there is so much uncertainty that trying to design and implement it all in one go is too risky. Start small and work your way up to a reasonably integrated system. "Fully integrated" is hardly ever worth aiming for.
Directly interfacing via pushing/poking databases exposes a lot of one system's internal detail to another. There are obvious disadvantages: upgrading one system can break the other. Moreover, there can be technical limitations in how one system can access the database of the other (consider how an application written in C on Unix will interact with a SQL Server 2005 database running on Windows 2003 Server).
The first thing you have to decide is the platform where the "master database" will reside, and likewise for the middleware providing the much-required glue. Instead of going towards API-level middleware integration (such as CORBA), I would suggest you consider Message-Oriented Middleware. MS BizTalk, Sun's eGate, and Oracle's Fusion can be some of the options.
Your idea of a new database is a step in the right direction. You might like to read a little bit on Enterprise Entity Aggregation pattern.
A combination of "data integration" with a middleware is the way to go.
If you are going towards a Middleware + Single Central Database strategy, you might want to consider achieving this in multiple phases. Here's a logical stepped process which can be considered:
Implementation of services/APIs for the different systems which expose the functionality of each system
Implementation of middleware which accesses these APIs and provides an interface for all the systems to access the data/services of other systems (it accesses data from the central source if available, else gets it from the other system)
Implementation of the central database only, without data
Implementation of caching/data-storage services at the middleware level which can store/cache data in the central database whenever that data is accessed from any of the systems, e.g. if system A's records 1-5 are fetched by system B through the middleware, the middleware data-caching services can store these records in the centralized database, and the next time these records will be fetched from the central database (see the sketch at the end of this answer)
Data cleansing can happen in parallel.
You can also create an import mechanism to push data from multiple systems to the central database on a daily basis (automated or manual).
This way, the effort is distributed across multiple milestones and data is gradually stored in the central database on a first-accessed-first-stored basis.
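A sketch of the cache-aside behavior from step 4, with hypothetical interfaces for the central database and the owning system:
// Middleware-level data access: serve from the central database when possible,
// otherwise fetch from the owning system and store centrally on first access.
public class MiddlewareRecordService {
    private final CentralDatabase central;   // the new, gradually filled database
    private final SourceSystemClient source; // API client for the owning system
    public MiddlewareRecordService(CentralDatabase central, SourceSystemClient source) {
        this.central = central;
        this.source = source;
    }
    public DataRecord fetch(String recordId) {
        DataRecord cached = central.find(recordId);
        if (cached != null) {
            return cached; // later accesses are served centrally
        }
        DataRecord fresh = source.fetch(recordId); // first access goes to the owning system
        central.store(fresh);                      // first-accessed-first-stored
        return fresh;
    }
}
interface CentralDatabase { DataRecord find(String id); void store(DataRecord r); }
interface SourceSystemClient { DataRecord fetch(String id); }
record DataRecord(String id, String payload) {}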