Data persistence in Smalltalk / Seaside - persistence

I've been spending some time lately getting acquainted with Smalltalk and Seaside. I'm coming from the Java EE world and as you can imagine it's been challenging getting my mind around some of the Smalltalk concepts. :)
At the moment I'm trying to grasp how data persistence is most typically implemented in the Smalltalk world. The assumption for me as a Java programmer is to use RDMS (ie. MySQL) and ORM (ie. Hibernate). I understand that is not the case for Smalltalk (using Hibernate at least). I'm not necessarily seeking the method that maps most closely to the way it is done in Java EE.
Is it most common to save data into the image, an object store or RDMS? Is it even typical for Smalltalk apps to use RDMS?
I understand there is no one-size-fits-all approach here and the right persistence strategy will depend on the needs of the application (how much data, concurrency, etc). What's a good approach that can start simple but also scale?
I've watched a video of Avi Bryant discussing the strategy he used for persistence and scaling DabbleDB. From what I understand, the customer's data was saved right into the image (one image per customer). That worked in his use case since customers didn't have to share data. Is this a common approach?
Hope I didn't make this TLDR. Many thanks to the insight you Smalltalk guys have provided in my previous questions. It's appreciated.

Justin,
don't worry, Smalltalk is not so different form other languages in this area, it just adds the Image based persistence option.
There are O/R mappers like Hibernate for Smalltalk, the GLORP and its Pharo port DBXtalk are surely the most popular ones these days. These should feel very comfortable for you if you know Hibernate.
Then there are OODB solutions like GemStone or Magma DB or VOSS and many others that let you leave all the O/R-mapping problems behind. Most of these are pretty limited to storing Smalltalk objects, GemStone being an exception in providing bridges to Ruby and other languages.
There also are tools to store Smalltalk objects in modern NoSQL databases like CouchDB, Cassandra, GOODS or others. The trick here is just the conversion of Smalltalk object values to JSON streams and a little HTTP-requesting.
Finally there is the option of saving your complete Smalltalk image. I'd say you can do that in a production environment, but it's not the standard or preferred way of dong it for many people. You do it a lot in development, because you can simply save an image and resume your work the next time exactly with all objects in place as you had them when you saved.
So the base line is: All the storage options you know are available in Smalltalk as well, plus one extra.
Joachim

I guess it basically depends on how big your DB is going to be and what kind of load will it be handling.
In my case, all apps I ever wrote use image persistance with disk serialization. Essentially, you just serialize your objects by using Fuel at request. In my case, I do so every time an important piece of data is dealt with, plus a regular process that serializes them every 24 hours. The image is also automatically saved every 24 hours.
The biggest application I wrote by using this approach is handling all the business processes of a small company of 10 workers plus around 50 freelancers who have been using it every day for a year and a half. The workload is pretty "big" taking in account the application deals with big files all the time, but the app has stayed stable and fast. Switching to a new server and updating the Pharo image was as easy as getting the project back from monticello and materializing the latest serialized "database".
In my opinion, ORM is an unnecessary pain, we're in the object world, and having to flatten our objects feels just wrong, especially when we have nice object-oriented solutions.
So, if your app handles fairly small amounts of data, I'd suggest either my simple approach or SandstoneDB. If your app deals with huge amounts of transactions and data, I'd go Gemstone.
Just my two cents.

Ramon Leon describes the situation, basic strategies, and their tradeoffs beautifully in his blog post.
I would start with his Simple Image Based Persistence framework, which I ported and use in Pharo 1.3. Mariano Martinez Peck recently adapted it to use Fuel (same link). It's very simple, does the job, and gives me much more confidence to play in my image, knowing that even if I permanently damage it, all my data is safe. I just copy the data folders to the new image folder, load my packages, and all my objects are alive in the new image.

Related

NoSQL (e.g. MongoDB) or RDMS (e.g. PostgreSQL) for new Scala project?

I'm developing a brand new project in Scala. It's just an application for a bunch of CRUD operations, however, because of some eccentric requirements, Play2 or Lift does not fit the bill, so I'm going to develop the application from the ground up. This means that Anorm or ScalaQuery becomes less obvious choices for database integration, and leaves me with the question: is it time to try something new?
My past technology stacks mostly included Java and PostgreSQL and I have experience with both ORM and plain SQL. Are NoSQL database management systems like MongoDB a good replacement for a typical RDBMS or are they special case application data stores? Also, how does the choice of database effect the greater Scala system design (if at all)? For example, the fact that you are using a JSON-like interface to talk to the database, and JSON between the web and a REST service, does not mean that much if everything in the middle becomes Scala objects, or does it?
I'm basically asking for someone's experience on moving from relational to object/document type databases, using Scala in particular. I know that good RDBMS integration is promised in the upcoming release of SLICK. So, if a company like TypeSafe decides to make a RDBMS integration part of the TypeSafe stack, then will I be swimming upstream by integrating to MongoDB using Casbah for example?
Apologies if this question appears a bit vague. I do hope that someone with the right insights or experience will be able to help though.
Update:
Apologies for not adding links to SLICK (it being fairly new). Here goes:
Quick overview
Project home
Update 2:
My personal first win for a technology is usually developer productivity - this translates to lightweight and simple: quick to learn, easy to maintain, no magic
I am currently in a similar situation, and since I have some experience with web development and SQL databases, I took it as an opportunity to work with MongoDB, Cashbah (and Scalatra). My experience is still very limited and the project and the amount of data I am working with is pretty small, but here are a few observations I've made.
For the few sets of data I have, performance does not seem to motivate either SQL or NoSQL. However, performance in the presence of huge amounts of data is often listed as a reason for using NoSQL, e.g., by Wikipedia
My documents (entries in the database) arise from benchmarking test suits, and mainly have a static structure, and I am optimistic that I could store them in a fixed-schema SQL database. However, a few substructures are not static, e.g., new test cases are added, new statistics are tracked, others are removed. This was my main motivation for trying a schema-free NoSQL database. Also, because I had the feeling that the document approach of MongoDB makes it much more obvious which data belongs together (i.e., to a document), in contrast to entries in a relational database, where the data would be distributed over various tables and rows, and where a full "document" would need to be reconstructed by joins.
Tools such as Lift-Json or Rogue allow you to work with regular Scala objects in a type-safe, although the data is regularly (de-)serialised as (from) JSON. However, this naturally works best if the structure of your data is mainly static, otherwise, you you are left with using strings to access your data (e.g., for expanding the results of a query using Cashbah).
If you are mainly concerned about a coherent representation of data on server and client side, languages such as Opa or Haxe might be of interest, since they compile to code that can executed on both sides. See this page for "multitarget" or "tierless" languages.
Got too long for a comment. Was just trying to relate my short experience with Scala (about 6 months now, since about when Play2 came out--it's quickly become my go to language).
I've enjoyed using Salat/Casbah with MongoDB in my last few projects; most have been in Play2, but the latest was without a webapp framework. It definitely hasn't felt like swimming upstream.
I would say that there are particular use cases for which I wouldn't use mongo, but it works nicely as a general purpose object data store, especially if you expect to query by id or index and don't need transactions (and will need minimal ad-hoc aggregation type stuff).
Expect to require a separate set of servers dedicated to mongodb (or to use a service dedicated to mongodb), but I guess that's normal for most serious database apps.
I've also used Play2/Anorm, which was surprisingly enjoyable to use for some ad-hoc query dashboard-style report pages. I started trying to go the Squeryl route, but Anorm seemed easier to use for one-off aggregation queries. Haven't looked at SLICK, but it sounds interesting.
It's really hard to say without knowing what problems you would like the app to solve.
I've personally found my productivity increased using NoSQL DBs via REST/JSON. Though bear in mind most NoSQL DBs offer REST interfaces which preclude the need for much middleware, Scala or otherwise, unless you intend to write a webapp with a UI.
If this is a learning exercise, I recommend you try multiple things out, as each NoSQL DB has something different to offer to your toolkit, and have personally found CouchDB, Riak, Neo4j, and MongoDb all with various pluses and drawbacks and good for different purposes.
Hope this helps, good luck.

SmartGWT, ZK and GenericFrame - Online Homework

Good day,
Our school, a small high school in semi-rural New Zealand, is currently looking into online homework solutions. Being one of the IT guys, I have been asked to look into some of the options. We have checked around and there are no robust solutions that cover what we are looking for. So, we are considering development of our own system, either on our own or in collaboration with some other schools.
Before I put significant time into any one option, I would thought I should ask for some expert advice.
Please keep in mind that one of our major obstacles is that around 20% of our students are on dial-up because broadband is not available in their area.
We are also not limited to the technologies listed, they just are the ones that we have been looking into up to this point.
With that in mind, here goes.
1. Is there a way to pre-determine the bandwidth needed for these technologies?
2. If bandwidth continued to be too limiting, could the final solution stand alone so we could distribute it to students on CD or USB stick?
3. What are some pros/cons of each for use with databases, specifically mysql or postgresql? (After all we do need to keep track of lots of data)
4. What are some pros/cons of each for of these RIA development?
I appreciate everyone for sharing their time and expertise on the matter.
Cheers,
Ben
1) If you write full-AJAX application, such as in GWT, the bandwitch will be:
a) the size of application java script, images, etc., you may consider that everything is loaded when user logs in (cache for images may seems to be big, but it's easily overloaded)
b) the size of communication - in GWT it depends only from you! no magic full-frame reloading, sending is only what YOU are wanting to send
2) I do not catch your point, stand alone applications can be distributed such way, applications that use databases generally can't
3) postgresql has high compatibility with Oracle - same transaction+select for update behaviour, pgPLSQL is highly inspired by PL/SQL (easy to rewrite stored procedures).
I personally suggest MySQL for a school project for its simplicity. PostgreSQL is powerful but a bit complicate to configure and the visual tool for optimizing queries not good.
Without considering the bandwidth, I definitely suggest ZK since, again, it is much easier to learn, to develop and to maintain (also much more powerful). The bandwidth consumption and latency of GWT really depends how much effort you want to invest, and how skillful your people are familiar with distributed computing, while the network bandwidth is basically the states of UI (not data), which is reasonably small. In short, you could have the best network bandwidth and latency if you optimize it at the best with GWT, while ZK is less to worry but, if you want to improve, you have to use jQuery (i.e, in JavaScript).
Thanks lechlukasz, I appreciate your comments and insight.
I will clarify my point about stand alone applications. We have a number of students, as high as 20%, who do not have access to broadband due to their geographic location. We are considering, as part of the design, how we may be able to distribute a stand alone version.
For instance, if we were to abstract all the database calls using a separate class in GWT, we could recompile a stand alone version that didn't make the database calls. The database would likely only be for tracking results and reporting.
In reality, we would likely implement the front end product first with references to empty methods for storing the results in a database and implement those methods at a later time.
For the record, we have started to code up some test cases using GWT/SmartGWT and are pleased with the results. Although we cannot comment on the other technologies considered because we didn't try them to the same extent, we are pleased with the results to this point of the project.
Cheers,
Ben

How do I plan an enterprise level web application?

I'm at a point in my freelance career where I've developed several web applications for small to medium sized businesses that support things such as project management, booking/reservations, and email management.
I like the work but find that eventually my applications get to a point where the overhear for maintenance is very high. I look back at code I wrote 6 months ago and find I have to spend a while just relearning how I originally coded it before I can make a fix or feature additions. I do try to practice using frameworks (I've used Zend Framework before, and am considering Django for my next project)
What techniques or strategies do you use to plan out an application that is capable of handling a lot of users without breaking and still keeping the code clean enough to maintain easily?
If anyone has any books or articles they could recommend, that would be greatly appreciated as well.
Although there are certainly good articles on that topic, none of them is a substitute of real-world experience.
Maintainability is nothing you can plan straight ahead, except on very small projects. It is something you need to take care of during the whole project. In fact, creating loads of classes and infrastructure code in advance can produce code which is even harder to understand than naive spaghetti code.
So my advise is to clean up your existing projects, by continuously refactoring them. Look at the parts which were a pain to change, and strive for simpler solutions that are easier to understand and to adjust. If the code is even too bad for that, consider rewriting it from scratch.
Don't start new projects and expect them to succeed, just because your read some more articles or used a new framework. Instead, identify the failures of your existing projects and fix their specific problems. Whenever you need to change your code, ask yourself how to restructure it to support similar changes in the future. This is what you need to do anyway, because there will be similar changes in the future.
By doing those refactorings you'll stumble across various specific questions you can ask and read articles about. That way you'll learn more than by just asking general questions and reading general articles about maintenance and frameworks.
Start cleaning up your code today. Don't defer it to your future projects.
(The same is true for documentation. Everyone's first docs were very bad. After several months they turn out to be too verbose and filled with unimportant stuff. So complement the documentation with solutions to the problems you really had, because chances are good that next year you'll be confronted with a similar problem. Those experiences will improve your writing style more than any "how to write good" style guide.)
I'd honestly recommend looking at Martin Fowlers Patterns of Enterprise Application Architecture. It discusses a lot of ways to make your application more organized and maintainable. In addition, I would recommend using unit testing to give you better comprehension of your code. Kent Beck's book on Test Driven Development is a great resource for learning how to address change to your code through unit tests.
To improve the maintainability you could:
If you are the sole developer then adopt a coding style and stick to it. That will give you confidence later when navigating through your own code about things you could have possibly done and the things that you absolutely wouldn't. Being confident where to look and what to look for and what not to look for will save you a lot of time.
Always take time to bring documentation up to date. Include the task into development plan; include that time into the plan as part any of change or new feature.
Keep documentation balanced: some high level diagrams, meaningful comments. Best comments tell that cannot be read from the code itself. Like business reasons or "whys" behind certain chunks of code.
Include into the plan the effort to keep code structure, folder names, namespaces, object, variable and routine names up to date and reflective of what they actually do. This will go a long way in improving maintainability. Always call a spade "spade". Avoid large chunks of code, structure it by means available within your language of choice, give chunks meaningful names.
Low coupling and high coherency. Make sure you up to date with techniques of achieving these: design by contract, dependency injection, aspects, design patterns etc.
From task management point of view you should estimate more time and charge higher rate for non-continuous pieces of work. Do not hesitate to make customer aware that you need extra time to do small non-continuous changes spread over time as opposed to bigger continuous projects and ongoing maintenance since the administration and analysis overhead is greater (you need to manage and analyse each change including impact on the existing system separately). One benefit your customer is going to get is greater life expectancy of the system. The other is accurate documentation that will preserve their option to seek someone else's help should they decide to do so. Both protect customer investment and are strong selling points.
Use source control if you don't do that already
Keep a detailed log of everything done for the customer plus any important communication (a simple computer or paper based CMS). Refresh your memory before each assignment.
Keep a log of issues left open, ideas, suggestions per customer; again refresh your memory before beginning an assignment.
Plan ahead how the post-implementation support is going to be conducted, discuss with the customer. Make your systems are easy to maintain. Plan for parameterisation, monitoring tools, in-build sanity checks. Sell post-implementation support to customer as part of the initial contract.
Expand by hiring, even if you need someone just to provide that post-implementation support, do the admin bits.
Recommended reading:
"Code Complete" by Steve Mcconnell
Anything on design patterns are included into the list of recommended reading.
The most important advice I can give having helped grow an old web application into an extremely high available, high demand web application is to encapsulate everything. - in particular
Use good MVC principles and frameworks to separate your view layer from your business logic and data model.
Use a robust persistance layer to not couple your business logic to your data model
Plan for statelessness and asynchronous behaviour.
Here is an excellent article on how eBay tackles these problems
http://www.infoq.com/articles/ebay-scalability-best-practices
Use a framework / MVC system. The more organised and centralized your code is the better.
Try using Memcache. PHP has a built in extension for it, it takes about ten minutes to set up and another twenty to put in your application. You can cache whatever you want to it - I cache all my database records in it - for every application. It does wanders.
I would recommend using a source control system such as Subversion if you aren't already.
You should consider maybe using SharePoint. It's an environment that is already designed to do all you have mentioned, and has many other features you maybe haven't thought about (but maybe you will need in the future :-) )
Here's some information from the official site.
There are 2 different SharePoint environments you can use: Windows Sharepoint Services (WSS) or Microsoft Office Sharepoint Server (MOSS). WSS is free and ships with Windows Server 2003, while MOSS isn't free, but has much more features and covers almost all you enterprise's needs.

How best to integrate several systems?

Ok where I work we have a fairly substantial number of systems written over the last couple of decades that we maintain.
The systems are diverse in that multiple operating systems (Linux, Solaris, Windows), Multiple Databases (Several Versions of oracle, sybase and mysql), and even multiple languages (C, C++, JSP, PHP, and a host of others) are used.
Each system is fairly autonomous, even at the cost of entering the same data into multiple systems.
Management recently decided that we should investigate what it will take to get all the systems happily talking to each other and sharing data.
Keep in mind that while we can make software changes to any of the individual systems, a complete rewrite of any one system (or more) is not something management is likely to entertain.
The first thought of several of the developers here was the straight forward: If system A needs data from system B it should just connect to system B's database and get it. Likewise if it needs to give B data it should just insert it into B's database.
Due to the mess of databases (and versions) used, other developers were of the opinion that we should have one new database, combining the tables from all the other systems to avoid having to juggle multiple connections. By doing this they hope that we might be able to consolidate some tables and get rid of the redundant data entry.
This is about the time I was brought in for my opinion on the whole mess.
The whole idea of using the database as a means of system communication smells funny to me. Business logic will have to be placed into multiple systems (if System A wants to add data to System B it better understand B's rules concerning the data before doing the insert), several systems will most likely have to do some form of database polling to find any changes to their data, continuing maintenance will be a headache, as any change to a database schema now propagates several systems.
My first thought was to take the time and write APIs/Services for the different systems, which once written could be easily used to pass/retrieve data back and forth. A lot of the other developers feel that is excessive and far more work than just using the database.
So what would be the best way to go about getting these systems to talk to each other?
Integrating disparate systems is my day job.
If I were you, I would go to great effort to avoid accessing System A's data from directly within System B. Updating System A's database from System B is extremely unwise. It is exactly the opposite of good practice to make your business logic so diffuse. You will end up regretting it.
The idea of the central database isn't necessarily bad ... but the amount of effort involved is probably within an order of magnitude of rewriting the systems from scratch. It is certainly not something I would attempt, at least in the form you describe. It can succeed, but it is much, much harder and it takes a lot more discipline than the point-to-point integration approach. It's funny to hear it suggested in the same breath as the 'cowboy' approach of just shoving data directly into other systems.
Overall your instincts seem pretty good. There are a couple of approaches. You mention one: implementing services. That's not a bad way to go, especially if you need updates in real time. The other is a separate integration application that is responsible for shuffling the data around. That's the approach I usually take, but usually because I can't change the systems I'm integrating to ask for the data it needs; I have to push the data in. In your case the services approach isn't a bad one.
One thing I would like to say that might not be obvious to someone coming to system integration for the first time is that every piece of data in your system should have a single, authoritative point of truth. If the data is duplicated (and it is duplicated), and the copies disagree with each other, the copy in the point of truth for that data must be taken to be correct. There is just no other way to integrate systems without having the complexity scream skyward at an exponential rate. Spaghetti integration is like spaghetti code, and it should be avoided at all costs.
Good luck.
EDIT:
Middleware addresses the problem of transport, but that is not the central problem in integration. If the systems are close enough together that one app can shove data directly in to another, they're probably close enough that a service offered by one can be called directly by another. I wouldn't recommend middleware in your case. You might get some benefit from it, but that would be outweighed by the increased complexity. You need to solve one problem at a time.
Sounds like you may want to investigate Message Queuing and message-oriented middleware.
MSMQ and Java Message Service being examples.
It seems you are looking for opinions, so I will provide mine.
I agree with the other developers that writing an API for all the different systems is excessive. You would likely get it done faster and have much more control over it if you just take the other suggestion of creating a single database.
One of the challenges that you will have is to align the data in each of the different systems so that it can be integrated in the first place. It may be that each of the systems that you want to integrate holds entirely different sets of data but more likely it is data that is overlapping. Before diving into writing API:s (which is the route I would take as well given your description) I would recommend that you try and come up with a logical data model for the data that needs to be integrated. This data model will then help you leverage the data that you are having in the different systems and make it more useful to the other databases.
I would also highly recommend an iterative approach to the integration. With legacy systems there is so much uncertainty that trying to design and implement it all in one go is too risky. Start small and work your way to a reasonably integrated system. "Fully integrated" is hardly ever worth aiming for.
Directly interfacing via pushing/ poking databases exposes a lot of internal detail of one system to another. There are obvious disadvantages: upgrading one system can break the other. Moreover, there can be technical limitations in how one system can access the database of the other (consider how an application written in C on Unix will interact with a SQL Server 2005 database running on Windows 2003 Server).
The first thing you have to decide is the platform where the "master database" will reside, and the same for the middleware providing the much required glue. Instead of going towards API level middleware-integration (such as CORBA), I would suggest you to consider Message Oriented Middleware. MS Biztalk, Sun's eGate and Oracle's Fusion can be some of the options.
Your idea of a new database is a step in the right direction. You might like to read a little bit on Enterprise Entity Aggregation pattern.
A combination of "data integration" with a middleware is the way to go.
If you are going towards Middleware + Single Central Database strategy, you might want to consider achieving this in multiple phases. Here's a logical stepped process which can be considered:
Implementation of services/APIs for different systems which expose the functionality for each system
Implementation of Middleware which accesses these APIs and provides an interface to all the systems to access the data/services from other systems (accesses data from central source if available, else gets it from another system)
Implementation of Central Database only, without data
Implementation of Caching/Data-Storage Services at the Middleware level which can store/cache data in the central database whenever that data is accessed from any of the Systems e.g. IF System A's records 1-5 are fetched by System B through Middleware, the Middleware Data Caching Services can store these records in the centralized database and the next time these records will be fetched from the central database
Data Cleansing can happen in Parallel
You can also create a import mechanism to push data from multiple systems to the central database on a daily basis (automated or manual)
This way, the effort is distributed across multiple milestones and data is gradually stored in the central database on first-accessed-first-stored basis.

What level of complexity requires a framework?

At what level of complexity is it mandatory to switch to an existing framework for web development?
What measurement of complexity is practical for web development? Code length? Feature list? Database Size?
If you work on several different sites then by using a common framework across all of them you can spend time working on the code rather than trying to remember what is located where and why.
I'd always use a framework of some sort, even if it's your own, as the uniformity will help you structure your project. Unless it's a one page static HTML project.
There is no mandatory limit however.
I don't think there is a level of complexity that necessitates a framework. For me whenever I am writing a dynamic site I immediately consider a framework, and if it will save me time, I use it(it almost always does, and I almost always do).
Consider that the question may be faulty. Many of the most complex websites don't use any popular, preexisting, framework. Google has their own web server and their own custom way of doing things, as does Amazon, and probably lots of other sites.
If a framework makes your task easier, or provides added value, go for it. However, when you get that framework you are tied to a new dependancy. I'm starting to essentially recreate a Joel on Software post, so I will redirect you here for more on adding unneeded dependencies to your code:
http://www.joelonsoftware.com/articles/fog0000000007.html
All factors matter. You should measure how much time you can save using 3rd party framework and compare it to the risks of using other's code
Never "mandatory." Some problems are not well solved by any framework. It would be suggestible to switch to a framework when most of the code you are implementing has already be implemented by the framework in question in a way that suits your particular application. This saves you time, energy, and will most likely be more stable than the fresh code you would have written.
This is really two questions, you realize. :-) The answer to the first one is that it's never mandatory, but honestly, parsing HTML request parameters directly is pretty horrible right from the start. I don't want to do it even once, so I tend to go toward a framework relatively early on.
As far as what measurement is practical, well, what are you worried about? All of the descriptions that you list have value. Database size matters primarily for scaling, in my opinion (you can write a very simple app if you have a very simple schema, even if there are hundreds of thousands of rows in the database). The feature list will probably determine the number and complexity of UI pages, which will in turn help to dictate the code length.
There are frameworks that are there for getting moving very quickly with a simple blog, django or RoR all the way to enterprise full-stack applications Zope. Not to be tied to just the buzz world, you also have ASP.Net and J2EE, etc.
All frameworks and libraries are tools at your disposal. Determine which ones will make your life easier for your given project and use them.
I would say the reverse is true. At some point, your project gets so expansive, that you actually get slowed down by the shortcomings of the framework. For sufficiently large projects you may, in fact, be better off developing your own framework, to meet your own needs. I have seen many times where people were held back in the decisions they could make, or the work they could produce, because they were trying to do something that the framework didn't anticipate. And doing these things that the framework doesn't anticipate can be very troublesome. The nice thing about making your own framework, is that it can evolve with your project, to be a help to you system, instead of a hindrance.
So, to conclude, small projects should be use existing frameworks. Large projects should contain their own framework.