What makes a web-based framework scalable? - frameworks

Thank you very much in advance.
First of all, I conceive of scalability as the ability to design a system that does not need to change when demand for its services, whatever they are, increases considerably. Do you need more hardware (vertical or horizontal)? Fine, add it at your leisure, because the system has been designed to cope with that.
My question is simple to ask but presumably very complex to answer. I would like to know what I should look at in a framework to make sure it will scale accordingly, both in the number of hits and in the number of sessions running simultaneously.
This question is not about technology or any particular framework; it is more of a theoretical question.
I know that this depends very much on having a good database design and proper hardware behind it, with replication, etc. Let's assume all of that exists; even so, my framework must meet some criteria. What are they?
Provide a memcache?
Ability to run across multiple machines (at the web server level) and use many replicated databases? But what is in the software that makes that possible?
etc...
Please, let's not relate the answers with any particular programming language or technology behind.
Thanks again,
D.

I think scalability depends most of all on the use case: if you expect huge amounts of data, focus on the database; if it's about traffic, focus on the server; if it's about adding new features, focus on your data model and the framework you are using...
Comparing a microblogging service like Twitter to a university website or a web service like Google Docs, you will find quite different requirements.

First of all, the common notion of scalability is the ability of software to improve in throughput or capacity as more hardware resources are added (CPUs, memory, bandwidth, etc.).
Software that does not improve with increased resources is not scalable.
Definitions aside, I think your question is about evaluating frameworks you plan to introduce into your implementation that may affect your software's ability to scale.
IMHO the most important factor to evaluate when introducing a framework is whether there is hidden serialization in it, because that serialization effectively transfers to, and constrains, your software. If a framework introduces serialization into your application, it can limit your ability to scale.
How do you evaluate?
Careful source code inspection (if it is open source).
Are there any performance guarantees offered by those who built the framework?
Measure for yourself how introducing the framework affects your performance, and replace it if you are not satisfied.
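To make the "measure yourself" point concrete, here is a minimal sketch of the kind of test I mean, in Python: drive the framework operation with increasing worker counts and watch whether throughput grows. If the framework serializes internally, the curve flattens. (The operation shown is a hypothetical stand-in; note that CPython's GIL is itself a serialization point for CPU-bound work, which incidentally demonstrates exactly the flat curve you are looking for.)

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_framework_operation():
    """Hypothetical stand-in for the framework call under evaluation."""
    sum(i * i for i in range(10_000))  # replace with the real call

def throughput(workers, calls_per_worker=200):
    """Run workers * calls_per_worker operations and return calls/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call_framework_operation)
                   for _ in range(workers * calls_per_worker)]
        for f in futures:
            f.result()
    elapsed = time.perf_counter() - start
    return (workers * calls_per_worker) / elapsed

if __name__ == "__main__":
    # A scalable component should show rising throughput here;
    # a hidden lock (or the GIL, for CPU-bound work) flattens it.
    for n in (1, 2, 4, 8):
        print(f"{n} workers: {throughput(n):8.1f} calls/sec")
```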

Is CouchDB a good persistent layer for Membase?

Membase is great for social games due to its low latency.
As I understand it, CouchDB is an MVCC system built on a B+ tree, with a focus on an append-only design.
(http://guide.couchdb.org/draft/btree.html)
One of the most important scenarios for Membase is social games.
Social games involve a lot of write operations (50%+),
and a good portion of them are in-place updates.
So why is CouchDB a suitable persistent layer for Membase?
I'd also add that CouchDB's append-only log format really doesn't have much relation to whether application writes are new items or updates. The append-only format gives us much better reliability and performance than an in-place system (like sqlite...which is still quite reliable). It's also much easier to take backups of.
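To illustrate why append-only recovers so well, here is a toy key/value log in Python (this is not CouchDB's actual on-disk B+ tree format, just the core idea): every write, including an update, is an append, so a crash can at worst leave a torn final record, and recovery simply replays the intact prefix of the log.

```python
import json
import os

class AppendOnlyStore:
    """Toy append-only key/value log, not CouchDB's real format.
    Updates never overwrite old data; recovery replays the log."""

    def __init__(self, path):
        self.path = path
        self.index = {}               # key -> latest value
        if os.path.exists(path):
            with open(path) as f:     # crash recovery: replay intact lines
                for line in f:
                    try:
                        rec = json.loads(line)
                    except json.JSONDecodeError:
                        break         # torn final write; prefix is still safe
                    self.index[rec["key"]] = rec["value"]

    def put(self, key, value):
        with open(self.path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())      # durable before acknowledging
        self.index[key] = value       # an in-place update is just another append

    def get(self, key):
        return self.index.get(key)
```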
Does Membase NEED an append-only log format? Maybe not... does it NEED CouchDB? YES!
The benefits of map-reduce and indexing as well as eventually consistent replication that CouchDB brings are nothing less than huge for Membase...and the benefits of low-latency, clustering and UI that Membase brings to CouchDB are arguably just as important.
(Disclosure: I work for Couchbase)
Perry Krug
CouchDB has great file formats, great ability to recover from crashes, sophisticated authentication and authorization tools, and a universal, standard, interface: HTTP. CouchDB is poor at low-latency queries, optimized memory utilization, and heavy update speeds (a million per second).
Membase currently has only a simple SQLite file format for persistence and less sophisticated authentication and authorization, and it uses a more obscure protocol. Membase is amazing for low-latency queries, ideal memory utilization, and heavy update speeds.
I think the two complement each other very well. Since the merging effort is coming from core developers in both projects, collaborating together, I expect to see the strengths of both and the weaknesses of neither. Yes, CouchDB is a good persistence layer for Membase.
Money talks, and if there ever was a vote of confidence then here it is, not only from a new lead investor but from the existing ones as well.
http://www.couchbase.com/press-releases/couchbase-series-C
Besides, don't you think that Membase itself is more than qualified to evaluate such a merger decision?

SmartGWT, ZK and GenericFrame - Online Homework

Good day,
Our school, a small high school in semi-rural New Zealand, is currently looking into online homework solutions. Being one of the IT guys, I have been asked to look into some of the options. We have checked around and there are no robust solutions that cover what we are looking for. So, we are considering development of our own system, either on our own or in collaboration with some other schools.
Before I put significant time into any one option, I thought I should ask for some expert advice.
Please keep in mind that one of our major obstacles is that around 20% of our students are on dial-up because broadband is not available in their area.
We are also not limited to the technologies listed, they just are the ones that we have been looking into up to this point.
With that in mind, here goes.
1. Is there a way to pre-determine the bandwidth needed for these technologies?
2. If bandwidth continued to be too limiting, could the final solution stand alone so we could distribute it to students on CD or USB stick?
3. What are some pros/cons of each for use with databases, specifically mysql or postgresql? (After all we do need to keep track of lots of data)
4. What are some pros/cons of each of these for RIA development?
I appreciate everyone for sharing their time and expertise on the matter.
Cheers,
Ben
1) If you write a full-AJAX application, such as in GWT, the bandwidth will consist of:
a) the size of the application's JavaScript, images, etc.; you can assume that everything is loaded when the user logs in (the image cache may seem big, but it is easily overflowed)
b) the size of the communication; in GWT that depends only on you! There is no magic full-frame reloading; only what YOU want to send is sent
2) I don't quite catch your point; stand-alone applications can be distributed that way, but applications that use databases generally can't
3) PostgreSQL has high compatibility with Oracle: the same transaction and SELECT FOR UPDATE behaviour, and PL/pgSQL is heavily inspired by PL/SQL (stored procedures are easy to rewrite).
I personally suggest MySQL for a school project for its simplicity. PostgreSQL is powerful but a bit complicated to configure, and its visual query-optimization tool is not good.
Without considering the bandwidth, I definitely suggest ZK since, again, it is much easier to learn, develop, and maintain (and also much more powerful). The bandwidth consumption and latency of GWT really depend on how much effort you want to invest and how familiar your people are with distributed computing. With ZK, the network traffic is basically the state of the UI (not the data), which is reasonably small. In short, you can get the best bandwidth and latency if you optimize heavily with GWT, while ZK gives you less to worry about; but if you want to improve it further, you have to use jQuery (i.e., JavaScript).
Thanks lechlukasz, I appreciate your comments and insight.
I will clarify my point about stand-alone applications. We have a number of students, as many as 20%, who do not have access to broadband due to their geographic location. We are considering, as part of the design, how we might distribute a stand-alone version.
For instance, if we were to abstract all the database calls into a separate class in GWT, we could recompile a stand-alone version that didn't make the database calls. The database would likely be used only for tracking results and reporting.
In reality, we would likely implement the front-end product first, with references to empty methods for storing the results in a database, and implement those methods at a later time.
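To sketch what that seam might look like (in Python for brevity, though our actual implementation would be Java/GWT; all names here are hypothetical): persistence sits behind one interface, and the stand-alone CD/USB build simply swaps in a no-op implementation.

```python
from abc import ABC, abstractmethod

class ResultStore(ABC):
    """All persistence goes through this seam (hypothetical interface)."""
    @abstractmethod
    def save_result(self, student_id: str, score: int) -> None: ...

class DatabaseResultStore(ResultStore):
    def save_result(self, student_id, score):
        # a real implementation would INSERT into MySQL/PostgreSQL here
        print(f"DB: saved {student_id} -> {score}")

class NullResultStore(ResultStore):
    """Compiled into the CD/USB stand-alone build: no database calls."""
    def save_result(self, student_id, score):
        pass  # results simply aren't tracked offline

def grade_homework(store: ResultStore, student_id: str, answers: list[bool]) -> int:
    score = sum(answers)
    store.save_result(student_id, score)   # same code path in either build
    return score
```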
For the record, we have started to code up some test cases using GWT/SmartGWT and are pleased with the results so far, although we cannot comment on the other technologies because we didn't try them to the same extent.
Cheers,
Ben

How do I plan an enterprise level web application?

I'm at a point in my freelance career where I've developed several web applications for small to medium sized businesses that support things such as project management, booking/reservations, and email management.
I like the work, but find that eventually my applications get to a point where the overhead for maintenance is very high. I look back at code I wrote six months ago and find I have to spend a while just relearning how I originally coded it before I can make a fix or add a feature. I do try to use frameworks (I've used Zend Framework before, and am considering Django for my next project).
What techniques or strategies do you use to plan out an application that is capable of handling a lot of users without breaking and still keeping the code clean enough to maintain easily?
If anyone has any books or articles they could recommend, that would be greatly appreciated as well.
Although there are certainly good articles on that topic, none of them is a substitute for real-world experience.
Maintainability is not something you can plan up front, except on very small projects. It is something you need to take care of during the whole project. In fact, creating loads of classes and infrastructure code in advance can produce code that is even harder to understand than naive spaghetti code.
So my advice is to clean up your existing projects by continuously refactoring them. Look at the parts that were a pain to change, and strive for simpler solutions that are easier to understand and to adjust. If the code is too bad even for that, consider rewriting it from scratch.
Don't start new projects and expect them to succeed just because you read some more articles or used a new framework. Instead, identify the failures of your existing projects and fix their specific problems. Whenever you need to change your code, ask yourself how to restructure it to support similar changes in the future. This is what you need to do anyway, because there will be similar changes in the future.
By doing those refactorings you'll stumble across various specific questions you can ask and read articles about. That way you'll learn more than by just asking general questions and reading general articles about maintenance and frameworks.
Start cleaning up your code today. Don't defer it to your future projects.
(The same is true for documentation. Everyone's first docs are very bad. After several months they turn out to be too verbose and filled with unimportant stuff. So complement the documentation with solutions to the problems you actually had, because the chances are good that next year you'll be confronted with a similar problem. Those experiences will improve your writing style more than any "how to write well" style guide.)
I'd honestly recommend looking at Martin Fowler's Patterns of Enterprise Application Architecture. It discusses a lot of ways to make your application more organized and maintainable. In addition, I would recommend using unit testing to give you better comprehension of your code. Kent Beck's book on Test-Driven Development is a great resource for learning how to address change in your code through unit tests.
To improve the maintainability you could:
If you are the sole developer, adopt a coding style and stick to it. That will give you confidence later, when navigating your own code, about things you could possibly have done and things you absolutely wouldn't have. Being confident about where to look, what to look for, and what not to look for will save you a lot of time.
Always take time to bring documentation up to date. Include the task in the development plan; include that time as part of any change or new feature.
Keep documentation balanced: some high-level diagrams and meaningful comments. The best comments tell what cannot be read from the code itself, like the business reasons or "whys" behind certain chunks of code.
Include in the plan the effort to keep the code structure, folder names, namespaces, and object, variable, and routine names up to date and reflective of what they actually do. This will go a long way toward improving maintainability. Always call a spade a "spade". Avoid large chunks of code; structure it by the means available within your language of choice, and give the chunks meaningful names.
Aim for low coupling and high cohesion. Make sure you are up to date with techniques for achieving these: design by contract, dependency injection, aspects, design patterns, etc. (see the sketch after this list).
From a task-management point of view, you should estimate more time and charge a higher rate for non-continuous pieces of work. Do not hesitate to make the customer aware that you need extra time for small non-continuous changes spread over time, as opposed to bigger continuous projects and ongoing maintenance, since the administration and analysis overhead is greater (you need to manage and analyse each change, including its impact on the existing system, separately). One benefit your customer gets is a greater life expectancy for the system. The other is accurate documentation that preserves their option to seek someone else's help should they decide to do so. Both protect the customer's investment and are strong selling points.
Use source control if you don't do that already
Keep a detailed log of everything done for the customer plus any important communication (a simple computer- or paper-based CMS). Refresh your memory before each assignment.
Keep a log of issues left open, ideas, suggestions per customer; again refresh your memory before beginning an assignment.
Plan ahead how post-implementation support will be conducted, and discuss it with the customer. Make sure your systems are easy to maintain. Plan for parameterisation, monitoring tools, and built-in sanity checks. Sell post-implementation support to the customer as part of the initial contract.
Expand by hiring, even if you just need someone to provide that post-implementation support and do the admin bits.
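As promised above, here is a minimal sketch of dependency injection (Python, with invented names): the service receives its collaborator rather than constructing it itself, which keeps coupling low and makes testing trivial.

```python
class SmtpMailer:
    def send(self, to: str, body: str) -> None:
        print(f"SMTP -> {to}: {body}")    # a real mailer would talk to a server

class InvoiceService:
    """Depends on an abstraction passed in, not on a concrete mailer
    it builds itself: low coupling, easy to swap in tests."""
    def __init__(self, mailer):
        self.mailer = mailer              # injection point

    def bill(self, customer_email: str, amount: float) -> None:
        self.mailer.send(customer_email, f"You owe {amount:.2f}")

class FakeMailer:                         # test double, no network needed
    def __init__(self):
        self.sent = []
    def send(self, to, body):
        self.sent.append((to, body))

service = InvoiceService(FakeMailer())    # inject the fake for a test
service.bill("a@example.com", 99.5)
```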
Recommended reading:
"Code Complete" by Steve Mcconnell
Anything on design patterns also belongs on the list of recommended reading.
The most important advice I can give, having helped grow an old web application into an extremely highly available, high-demand web application, is to encapsulate everything. In particular:
Use good MVC principles and frameworks to separate your view layer from your business logic and data model.
Use a robust persistence layer so you don't couple your business logic to your data model.
Plan for statelessness and asynchronous behaviour.
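A tiny sketch of what "plan for statelessness" means in practice (Python, with a plain dict standing in for a real shared cache such as memcached or Redis; all names are invented): no request handler keeps session state in process memory, so any web node can serve any request.

```python
# Session state lives in a shared store keyed by session id,
# never in the web process itself.
shared_store: dict[str, list[str]] = {}   # stand-in for a real shared cache

def handle_request(session_id: str, item: str) -> list[str]:
    cart = shared_store.get(session_id, [])   # fetch state by key
    cart = cart + [item]                      # pure transformation
    shared_store[session_id] = cart           # write back to the shared store
    return cart

# Two "different servers" (plain calls here) see the same session:
handle_request("sess-42", "book")
print(handle_request("sess-42", "pen"))       # ['book', 'pen']
```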
Here is an excellent article on how eBay tackles these problems
http://www.infoq.com/articles/ebay-scalability-best-practices
Use a framework / MVC system. The more organised and centralized your code is, the better.
Try using memcache. PHP has a built-in extension for it; it takes about ten minutes to set up and another twenty to put into your application. You can cache whatever you want in it. I cache all my database records in it, for every application. It does wonders.
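For example, here is a minimal read-through cache in Python (assuming a memcached daemon on localhost:11211 and the pymemcache library; the database lookup is a hypothetical stand-in):

```python
# Requires memcached on localhost:11211 and `pip install pymemcache`.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def expensive_db_lookup(user_id: int) -> str:
    return f"user-{user_id}-record"            # stand-in for a real query

def get_user(user_id: int) -> str:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()                 # cache hit: skip the database
    row = expensive_db_lookup(user_id)
    cache.set(key, row.encode(), expire=300)   # cache for 5 minutes
    return row
```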
I would recommend using a source control system such as Subversion if you aren't already.
You might also consider SharePoint. It's an environment that is already designed to do everything you have mentioned, and it has many other features you maybe haven't thought about (but may need in the future :-) )
Here's some information from the official site.
There are two different SharePoint environments you can use: Windows SharePoint Services (WSS) or Microsoft Office SharePoint Server (MOSS). WSS is free and ships with Windows Server 2003, while MOSS isn't free but has many more features and covers almost all of your enterprise's needs.

How best to integrate several systems?

OK, where I work we have a fairly substantial number of systems, written over the last couple of decades, that we maintain.
The systems are diverse: multiple operating systems (Linux, Solaris, Windows), multiple databases (several versions of Oracle, Sybase, and MySQL), and even multiple languages (C, C++, JSP, PHP, and a host of others) are in use.
Each system is fairly autonomous, even at the cost of entering the same data into multiple systems.
Management recently decided that we should investigate what it will take to get all the systems happily talking to each other and sharing data.
Keep in mind that while we can make software changes to any of the individual systems, a complete rewrite of any one system (or more) is not something management is likely to entertain.
The first thought of several of the developers here was the straightforward one: if system A needs data from system B, it should just connect to system B's database and get it. Likewise, if it needs to give B data, it should just insert it into B's database.
Due to the mess of databases (and versions) used, other developers were of the opinion that we should have one new database, combining the tables from all the other systems to avoid having to juggle multiple connections. By doing this they hope that we might be able to consolidate some tables and get rid of the redundant data entry.
This is about the time I was brought in for my opinion on the whole mess.
The whole idea of using the database as a means of system communication smells funny to me. Business logic will have to be placed into multiple systems (if System A wants to add data to System B, it had better understand B's rules concerning the data before doing the insert); several systems will most likely have to do some form of database polling to find changes to their data; and continuing maintenance will be a headache, as any change to a database schema now propagates to several systems.
My first thought was to take the time and write APIs/Services for the different systems, which once written could be easily used to pass/retrieve data back and forth. A lot of the other developers feel that is excessive and far more work than just using the database.
So what would be the best way to go about getting these systems to talk to each other?
Integrating disparate systems is my day job.
If I were you, I would go to great effort to avoid accessing System A's data from directly within System B. Updating System A's database from System B is extremely unwise. It is exactly the opposite of good practice to make your business logic so diffuse. You will end up regretting it.
The idea of the central database isn't necessarily bad ... but the amount of effort involved is probably within an order of magnitude of rewriting the systems from scratch. It is certainly not something I would attempt, at least in the form you describe. It can succeed, but it is much, much harder and it takes a lot more discipline than the point-to-point integration approach. It's funny to hear it suggested in the same breath as the 'cowboy' approach of just shoving data directly into other systems.
Overall your instincts seem pretty good. There are a couple of approaches. You mention one: implementing services. That's not a bad way to go, especially if you need updates in real time. The other is a separate integration application that is responsible for shuffling the data around. That's the approach I usually take, but usually because I can't change the systems I'm integrating to ask for the data it needs; I have to push the data in. In your case the services approach isn't a bad one.
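As a sketch of the services approach (Python/Flask here purely for illustration; the table and route names are made up): System B publishes a small read-only API, and nobody else ever opens a connection to B's database.

```python
# `pip install flask` -- a minimal read-only service in front of System B,
# so other systems never touch B's database directly.
import sqlite3
from flask import Flask, jsonify, abort

app = Flask(__name__)

def get_db():
    return sqlite3.connect("system_b.db")      # B's private database

@app.route("/customers/<int:customer_id>")
def customer(customer_id):
    row = get_db().execute(
        "SELECT id, name FROM customers WHERE id = ?",
        (customer_id,)).fetchone()
    if row is None:
        abort(404)
    return jsonify(id=row[0], name=row[1])     # B's schema stays hidden

if __name__ == "__main__":
    app.run(port=8081)
```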
One thing I would like to say that might not be obvious to someone coming to system integration for the first time is that every piece of data in your system should have a single, authoritative point of truth. If the data is duplicated (and it is duplicated), and the copies disagree with each other, the copy in the point of truth for that data must be taken to be correct. There is just no other way to integrate systems without having the complexity scream skyward at an exponential rate. Spaghetti integration is like spaghetti code, and it should be avoided at all costs.
Good luck.
EDIT:
Middleware addresses the problem of transport, but that is not the central problem in integration. If the systems are close enough together that one app can shove data directly in to another, they're probably close enough that a service offered by one can be called directly by another. I wouldn't recommend middleware in your case. You might get some benefit from it, but that would be outweighed by the increased complexity. You need to solve one problem at a time.
Sounds like you may want to investigate Message Queuing and message-oriented middleware.
MSMQ and the Java Message Service are examples.
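For a flavour of what that looks like (sketched with Python and the pika client against a local RabbitMQ broker, rather than MSMQ or JMS; the queue and field names are hypothetical): System A publishes change events and System B consumes them on its own schedule, so neither touches the other's database.

```python
# `pip install pika`, with RabbitMQ running on localhost.
import json
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="system_a.customer_updates", durable=True)

# Producer side (System A) publishes a change event:
channel.basic_publish(
    exchange="",
    routing_key="system_a.customer_updates",
    body=json.dumps({"id": 42, "name": "ACME Corp"}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)

# Consumer side (System B) applies it under its own business rules:
def on_message(ch, method, properties, body):
    event = json.loads(body)
    print("apply under B's own rules:", event)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="system_a.customer_updates",
                      on_message_callback=on_message)
channel.start_consuming()
```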
It seems you are looking for opinions, so I will provide mine.
I agree with the other developers that writing an API for all the different systems is excessive. You would likely get it done faster and have much more control over it if you just take the other suggestion of creating a single database.
One of the challenges you will face is aligning the data in the different systems so that it can be integrated in the first place. It may be that each of the systems you want to integrate holds entirely different sets of data, but more likely the data overlaps. Before diving into writing APIs (which is the route I would take as well, given your description), I would recommend that you try to come up with a logical data model for the data that needs to be integrated. This data model will then help you leverage the data you have in the different systems and make it more useful to the other databases.
I would also highly recommend an iterative approach to the integration. With legacy systems there is so much uncertainty that trying to design and implement it all in one go is too risky. Start small and work your way to a reasonably integrated system. "Fully integrated" is hardly ever worth aiming for.
Directly interfacing by pushing into or poking at databases exposes a lot of one system's internal detail to another. There are obvious disadvantages: upgrading one system can break the other. Moreover, there can be technical limitations in how one system can access the other's database (consider how an application written in C on Unix will interact with a SQL Server 2005 database running on Windows Server 2003).
The first thing you have to decide is the platform where the "master database" will reside, and likewise for the middleware providing the much-required glue. Instead of going towards API-level middleware integration (such as CORBA), I would suggest you consider message-oriented middleware. MS BizTalk, Sun's eGate, and Oracle's Fusion are some of the options.
Your idea of a new database is a step in the right direction. You might like to read a little bit on Enterprise Entity Aggregation pattern.
A combination of "data integration" with a middleware is the way to go.
If you are going for a Middleware + Single Central Database strategy, you might want to consider achieving it in multiple phases. Here's a logical stepped process:
Implement services/APIs for the different systems which expose each system's functionality.
Implement middleware which accesses these APIs and provides an interface for all the systems to access data/services from the other systems (it takes data from the central source if available, otherwise gets it from the source system).
Implement the central database only, without data.
Implement caching/data-storage services at the middleware level which store/cache data in the central database whenever that data is accessed from any of the systems. E.g., if System A's records 1-5 are fetched by System B through the middleware, the middleware's data-caching services can store those records in the centralized database, and the next time those records will be fetched from the central database.
Data cleansing can happen in parallel.
You can also create an import mechanism to push data from multiple systems to the central database on a daily basis (automated or manual).
This way, the effort is distributed across multiple milestones, and data is gradually stored in the central database on a first-accessed-first-stored basis.
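Step 4 is the interesting one, so here is a rough sketch of read-through caching into the central database (Python, with sqlite3 standing in for the central database and a stubbed call to the source system; all names are invented):

```python
import sqlite3

central = sqlite3.connect(":memory:")          # stand-in for the central DB
central.execute("CREATE TABLE records (system TEXT, id INTEGER, data TEXT,"
                " PRIMARY KEY (system, id))")

def fetch_from_system(system: str, record_id: int) -> str:
    return f"{system}-record-{record_id}"      # stand-in for a remote call

def get_record(system: str, record_id: int) -> str:
    row = central.execute(
        "SELECT data FROM records WHERE system = ? AND id = ?",
        (system, record_id)).fetchone()
    if row:                                    # already centralized
        return row[0]
    data = fetch_from_system(system, record_id)
    central.execute("INSERT INTO records VALUES (?, ?, ?)",
                    (system, record_id, data)) # first-accessed-first-stored
    central.commit()
    return data

print(get_record("A", 1))   # fetched from System A, then stored centrally
print(get_record("A", 1))   # served from the central database
```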

Has a system that incorporated a rule engine ever been TRULY successful? [closed]

Our system (exotic commodity derivative trade capture and risk management) is being redeveloped shortly. One proposal that I have heard is that a rule engine will be incorporated to make it easier for the end-users (commodities traders, so fairly sophisticated) to make certain changes to the business logic.
I am a little skeptical of rules engines. The agilist in me wonders if they are just a technical solution to a process problem, i.e., it takes too long for our developers to respond to the business's need for change. The solution to that problem should be a more collaborative approach to development, better test coverage, and more agile practices all around.
Hearing about situations where a rule engine was truly a boon (especially in a trading environment) would certainly be helpful.
I've seen two applications that used the Blaze Rete engine from Fair Isaac.
One application slammed thousands of rules into a single knowledge base, had terrible memory problems, and has become a black box that few understand. I would not call that a success, but it is running in production.
Another application used decision trees to represent on the order of hundreds of questions on a medical form to disposition clients. It was done so elegantly that business people can update the rules as needed, without having to involve a developer. (Still has to be deployed by one, though.) I'd call that a great success.
So it depends on how well focused the problem is, the size of the rule set, the knowledge of the developers. My prejudice is that simply making a rules engine a single point of failure and dumping rules into it probably isn't a good approach. I'd start with a data-driven or table-driven approach and grow that until a rules engine was needed. I'd also strive to encapsulate the rules engine as part of the behavior of an object. I'd hide the rules engine from users and try to partition the rules space into the domain model.
I don't know if I'd say they're ever truly a boon, but I think they can certainly be valuable. I worked on a system for a few years in the insurance industry, where a rules engine was employed quite successfully to allow the business users to create rules that determined what policies were legal, depending on the state.
For instance, if you had to have a copay in certain states, or certain combinations of deductible and copay were not allowed, either because of product considerations, or because it was simply illegal due to state law.
The number of states the company operated in, along with the constant (quarterly) change in rules, would make this a dizzying coding exercise. More importantly, it's not within a programmer's expertise. It adds pointless extra communication, with the end user describing the rule to be put into effect to a programmer who is not an insurance-industry expert like they are.
Designed correctly, a rules engine can still enable a workflow that allows for good testing. In this case, the rules were stored in a database, and there were QA and PROD databases, so the BAs could test their rules in QA and then promote them to PROD.
As with anything, it's usually about the implementation, not the actual technique.
Yes, Microsoft has a Business Rule Engine (BRE) in BizTalk that has been used successfully for years. I've heard that they've had clients buy BizTalk (very expensive) just for the BRE.
In my experience, the practicality of having a business user update the rules is slim to none. It usually takes a technical person to work the business rules editor.
A rule engine is little more than something that executes declarative statements. They come with two primary advantages (that I see):
Your business logic is maintained in a single place instead of being sprinkled throughout application code. Technically, a well-designed application should already do this architecturally, regardless of whether a rule engine is present.
You need to worry [less] about dependencies between declarative statements. The rule engine should be smart enough to decide the order to run rules based on dependencies. You may find that some rule engines support a sequential ordering of rules within a ruleset or calling rulesets (groups of rules) in a particular order, but this isn't really in the spirit of declarative programming. Many rule engines use Rete (an algorithm) to decide when to schedule the execution of declarative statements.
I suspect most, if not all, rule engines add more overhead than the best possible program that doesn't use one. This is similar to how hand-written assembly is generally faster than compiler output (but you usually don't write assembly, because higher-level abstractions are more convenient and productive).
If you were to stop here, then you would probably use programmers to maintain rules and use a rule engine as a convenient way to build a business logic tier in your application. Some rule engines offer something called templates that let you define templates for rules. The advantage here is that non-technical users are supposed to be able to write their own rules and modify existing rules.
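To ground the ordering point above, here is a toy forward-chaining engine in Python (nothing like a production Rete implementation; the rule names and facts are invented): rules are declarative condition/action pairs, and the engine, not the author, decides execution order by re-evaluating until nothing more fires.

```python
# Each rule is a declarative condition ("when") and action ("then").
rules = [
    {"name": "senior_discount",
     "when": lambda f: f.get("age", 0) >= 65 and "discount" not in f,
     "then": lambda f: f.update(discount=0.15)},
    {"name": "loyalty_bonus",   # depends on senior_discount having fired
     "when": lambda f: f.get("years", 0) > 5 and f.get("discount") == 0.15,
     "then": lambda f: f.update(discount=0.20)},
]

def run(facts: dict) -> dict:
    """Naive fixpoint loop: keep firing applicable rules until none apply.
    Rule order in the list doesn't matter; dependencies resolve themselves."""
    fired = True
    while fired:
        fired = False
        for rule in rules:
            if rule["when"](facts):
                rule["then"](facts)
                fired = True
    return facts

print(run({"age": 70, "years": 8}))   # {'age': 70, 'years': 8, 'discount': 0.2}
```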
A rule engine is one more tool in your tool chest that, when used properly, can be valuable.
The problem with many of these rule engines is the lack of speed, and the fact that replacing or augmenting rules can break existing working rules in subtle ways. So you still have to re-test the system thoroughly after each rule change. You're basically just exchanging one computer language for another one with a much smaller base of users. As another poster mentioned, I've yet to see a business analyst successfully use a rule engine. You need a programmer anyway.
I certainly have, but I can't talk about them publicly; it's likely you have interacted with one several times this year ;)
I see two camps: the logic programmers and the business users. Different tools target different sets, some both. The successful cases with business users have only worked when the rules were a subset of the logic, and the users also had a way to define test cases and run them themselves (and were prepared to think logically).
Logic programmers are rarer, but can often be found coming from non-imperative programming backgrounds (they are also the sort of people who find functional programming intuitive).
Keep in mind at the end of the day even with visual tools, if you are telling a computer to do something it is still programming.
I work with lots of vendors in this space, and one of the great things about that is that I get to talk to lots of their customers. So yes, hundreds of companies have got exactly the benefits they were promised: increased agility, better business/IT collaboration, easier regulatory compliance, better consistency of decision making, lower maintenance costs, faster time to market, etc.
Over and over again, across all the major vendors and the open-source players, I see that, used correctly (to automate and improve high-volume operational decisions with many rules, rules that change a lot, rules that interact in complex ways, or rules with high business-domain content), business rules management systems work.
Really.
My experience is limited to (i) not much and (ii) Prolog, but I can safely say that a rule engine can help you express propositional concepts much more cleanly than procedural code.
Rules engines are routinely used in the insurance business. I've worked on systems with hundreds (600-ish) of rules implemented in a rules engine. It worked very well.
Do you have a credit rating? A FICO score, perhaps? That's Fair Isaac COrporation, the developers of the Blaze rules engine.
For a while I worked on the PEATE distributed-computing project, which was developing a system for large-scale, high-volume atmospheric data computation. The system had three parts: the data manager, the scheduler, and the algorithm-execution component. There could be any number of each of these components, all communicating through web services. It allowed different researchers to execute arbitrary jobs against arbitrary data, and also allowed different scheduling mechanisms to be plugged in as requirements changed.
I left the project before it got far off the ground, but this seems like it could potentially fit the scenario and serve as another example of a kind of rule engine. That being said, if the original developers are still going to be the ones writing the algorithms, I can't see much benefit in a rule engine unless it handles substantial overhead that each rule or algorithm would otherwise incur on its own.
This sounds a bit more involved than a simple rule engine, but such an architecture could feasibly apply to a rule engine as well.