Load CMS core files from one server for multiple websites - zend-framework

I'm almost done with our custom CMS system. Now we want to install it for different websites (and more in the future), but every time I change the core files I will need to update each server/website separately.
What I really want is to load the core files from our server, so when I install a CMS I only define the needed config files (on that server) and the rest is loaded from our server. This way I can push changes to the core very simply, and only once.
How do I do this, or is this completely the wrong way? If so, what is the right way? Are there things I need to look out for? Is it secure (without paying thousands for an HTTPS connection)?
I have no idea how or where to begin, and I couldn't find anything helpful (maybe my search was wrong), so anything helps!
Thanks in advance!
Note: My application is built using the Zend Framework.

You can't load the required files remotely at runtime (or you really don't want to ;). This problem comes down to proper release & configuration management, where you update all of your servers. But this can mostly be done automatically.
Depending on how much time you want to spend on this mechanism, there are some things to be aware of. The general idea is that you have one central server which holds the releases, and all the other servers check it for updates, then download and install them. There are lots of possibilities (svn, archives, ...), and the check/update can be done manually from the frontend or by cron jobs in the background. Usually you'll update all changed files except the config files and the database, as those can't simply be replaced but have to be modified in a certain way (this is where update scripts come into play).
This could look like this:
A cronjob runs on the server and checks for updates via svn
If there is a new revision, it does an svn update
This is a very easy-to-implement mechanism, but it has some drawbacks: for example, you can't change the config files and the database. Well, in fact it'd be possible, but quite difficult to achieve.
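As a rough illustration, the svn-based check could be a small script run from cron. This is only a sketch; it assumes the web root is an svn working copy, the svn client is installed, and the path is made up:

```python
#!/usr/bin/env python3
# Sketch of a cron-driven svn update check; /var/www/cms is a hypothetical path.
import subprocess

WORKING_COPY = "/var/www/cms"  # assumption: the web root is an svn working copy

def update_if_needed():
    # 'svn status -u' compares the working copy against the repository;
    # out-of-date items are marked with '*' in the output.
    status = subprocess.run(
        ["svn", "status", "-u", WORKING_COPY],
        capture_output=True, text=True, check=True,
    ).stdout
    if "*" in status:
        subprocess.run(["svn", "update", WORKING_COPY], check=True)

if __name__ == "__main__":
    update_if_needed()
```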
Maybe this is easier with an archive-based solution:
A cronjob checks the update server for a new version. This could be done by reading the contents of a file on the update server and comparing it to a local copy
If there is a new version, download the related archive
Unpack the archive and copy the files
With that approach you might be able to include update scripts in a release to modify configs/databases.
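A minimal sketch of that archive-based check, assuming a plain-text version file and a tarball per release on the update server (all URLs and paths are placeholders):

```python
#!/usr/bin/env python3
# Sketch of an archive-based updater; URLs and paths are made up for illustration.
import io
import tarfile
import urllib.request

VERSION_URL = "https://updates.example.com/latest-version.txt"    # assumption
ARCHIVE_URL = "https://updates.example.com/cms-{version}.tar.gz"  # assumption
LOCAL_VERSION_FILE = "/var/www/cms/VERSION"
INSTALL_DIR = "/var/www/cms"

def check_and_update():
    remote_version = urllib.request.urlopen(VERSION_URL).read().decode().strip()
    try:
        with open(LOCAL_VERSION_FILE) as f:
            local_version = f.read().strip()
    except FileNotFoundError:
        local_version = ""
    if remote_version == local_version:
        return  # already up to date
    # Download and unpack the new release; config files would be excluded
    # from the archive (or handled by an update script shipped inside it).
    data = urllib.request.urlopen(ARCHIVE_URL.format(version=remote_version)).read()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        archive.extractall(INSTALL_DIR)
    with open(LOCAL_VERSION_FILE, "w") as f:
        f.write(remote_version)

if __name__ == "__main__":
    check_and_update()
```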
Automatic update distribution is a very, very complex topic, and these are only two very simple approaches. There are probably many different solutions out there, and selecting the right one is not an easy task (it gets even more complex if you have different versions of a product with dependencies :), and there is no single way it has to be done.

Related

Version-control in a large SSIS ETL project

We're about to do a data transformation from one system to another using SSIS. We are four people who will be working on this continuously for two years, and therefore we need some sort of versioning system. We cannot use Team Foundation. We're currently configuring an SVN server, but digging into it I've seen some big risks.
It seems that a solution is stored in one huge XML file. This must be a huge problem in a combined code/drag-and-drop environment like SSIS, as it will be impossible for SVN to merge the changes correctly, and whenever we get an error when committing we will have to look inside that huge XML file and correct the mistakes manually.
One way to solve this problem is to create many solution projects in SSIS. However, this is not really the setup we want, as we are creating one big monster which will take 2 days to execute and we want to follow its progress as it executes. If we have to create several solutions, are there ways to link their execution and still have a visual overview of what's going on and how well the execution is doing?
Has anyone had similar problems and/or do you have any suggestions as to how to solve them?
Just how many packages are you talking about? If it is hundreds of packages, then what is the specific problem you are trying to avoid? Here are a few things you might be trying to avoid based on your post:
Slow solution and project load time at startup in BIDS. I suppose this could be irritating from time to time. But if you keep BIDS open all day, that seems like a once a day cost.
Slow solution and project load time when you get latest solution definition from your version control system. Again, I suppose this could be irritating from time to time, but how frequently do you need to refresh the whole solution? If you break the solution into separate projects, then you only need to refresh a project. You would only need to refresh the whole solution if you want to get access to a new project within the solution.
What do you mean by "one huge XML file"? The solution file is an XML file that keeps track of the projects. Each project file is an XML file that keeps track of its SSIS packages. So if you have 1,000 SSIS packages evenly distributed across 10 projects in 1 solution, then each file would have no more than 100 objects to track. I can tell you from experience that I've had Reporting Services projects with more RDL files than this and it only took seconds to load the solution properly in BIDS. And as @revelator pointed out, the actual SSIS packages are their own individual XML files. Any version control system should track each of these as separate files and won't combine them into "one huge XML file". If you clarify what you mean by this point, then I think you will get better help on the question.
Whether you are running one package or 1,000 packages, you won't be doing this interactively from BIDS. You will probably deploy the packages to server first and then have the server run the packages. If that's the case, then you will need to call the packages probably with a SQL Server Agent job. Whether you chain the packages by making each package call another package or if you chain the packages by having the job call each package as a separate job step, you can still track where you are in the chain with logging. If you are calling the packages with jobs, then you can track it with job steps too. I run a data warehouse that has scores of packages and I primarily rely on separating processes into jobs that each contain one or more packages. I also chain jobs with start job commands so that I can more easily monitor performance of logical groups of loads. Also, each package shows its execution time in the job history at the step level. Furthermore, I have custom logging in each stored procedure and package that shows how many seconds and rows an individual data load or stored procedure took so that I can troubleshoot performance bottlenecks.
Whatever you do, don't rely on running packages interactively as a way to track performance! You won't get optimal performance running ETL on your machine, let alone running it with a GUI. Run packages in jobs on servers, not desktops. Interactively running packages is just there to help build and troubleshoot individual packages, not to administer daily ETL.
If you are building generic packages that change their targets and sources based on parameters, then you probably need to build a control table in a database that tracks progress. If you are simply moving data from one large system to another as a one-time event, then you are probably going to divide the load into small sets of packages and have separate jobs for each so that you can more easily manage recovering from failures. If you intend to build something that runs regularly to move data, then how could 2 days of constant running for one process even make sense? It sounds like the underlying data would change on you within those 2 days...
If you are concerned about which version control system to use for managing SSIS package projects, then I can say that just about any will do. I've used Visual SourceSafe and Perforce at different companies, and both have the same basic features of checking in and checking out individual packages. I'm sure just about any version control system that integrates with Visual Studio will do this for you.
Hope you find something useful in the above and good luck with your project.
Version control makes it possible to have multiple people developing together and working on the same project. If I am working on something, a fellow ETL developer will not be able to check it out and make changes to it until I am finished with my changes and check them back in. This addresses the common situation where one developer's project artifact and code changes clobber another developer's by accident.
http://blog.sqlauthority.com/2011/08/10/sql-server-who-needs-etl-version-control/
Most ETL projects I work on use SVN as the source control repository. The best method I have found is to break each project or solution down into smaller, distinct (and often independently runnable) packages. For example, say you had a process called ManufacturingImport; this could be your project. Within this you would have a master package, which then calls other packages as required. This means that members of the team can work on distinct packages or pieces of work, rather than everyone trying to edit the same package and getting into troublesome merge situations.

How to create a proper website? [closed]

Web development isn't what it used to be. It used to consist of hacking together a few PHP scripts (I have nothing against PHP; actually it's currently my main programming language), uploading them via FTP to some web host, and that was that. Today, things are more complicated. As I can see by looking at a number of professional and modern websites (SO being the main one; I consider SO a great example of good practice in web development, even if it's made with ASP.NET and hosted on Windows), developing a website is much more than that:
The website code is actually in a repository (that little svn revision in the footer makes my nerdy feelings tingle);
Static files (CSS, JavaScript, images) are stored on a separate domain;
Ok, these were my observations. Now for my questions:
What do you do with JavaScript and CSS files? Do you just not keep them under version control? That would seem stupid. Do you create a separate repository for them?
How do you set up the repository? Do you just create one in the root of the web server? Or do you create some sort of post-commit trigger that copies the latest files to their appropriate destinations?
What happens if you have multiple machines running the website and want to push some changes to all of them?
Every such project has to have configuration files. These differ from the local repository to the remote one. For example, on my development machine I have no MySQL root password, while on the production server I certainly have a password. This password would be stored in a config file, amongst other such things, which would be completely different on my machine and on the server. Maybe they are different between production machines, too (like I said earlier, maybe the website runs on multiple machines for load balancing). How do I handle that?
I'm looking to start a new web project using:
Python + SQLAlchemy + Werkzeug + Jinja2
Apache httpd + mod_wsgi
MySQL
Mercurial
What I'd like is some best practice advice on using the aforementioned tools and answers to my questions above.
You're right, things can get complicated when trying to deploy a scalable website. Here are what I've found to be a few good guidelines (disclaimer: I'm a rails engineer):
Most of the decisions regarding file structure for your code repository are largely based upon the conventions of the language, framework and platform you choose to implement. Many of the questions you brought up (JS, CSS, assets, production vs development) are handled by convention in Rails. However, that may differ from PHP to Python to whichever other language you want to use. I've found you should do some research about the language you're choosing to use, and try to fit the conventions of that community. This will help you when you're trying to get past an obstacle later: your code will be organized like their code, and you'll be able to get answers more easily.
I would version control everything that isn't very substantial in size. The only problem I've found with VC is when your repo gets large. Apart from that I've never regretted keeping a version of previous code.
For deployment to multiple servers, there are many scripts that can help you accomplish what you need to do. For Ruby/Rails, the most widely used tool is Capistrano. There are comparable resources for other languages as well. Basically you just need to configure what your server setup is like, and then write or look to open source for a set of scripts that can deploy/rollback/manipulate your codebase to the servers you've outlined in your config file.
Development vs Production is an important distinction to make. While you can operate without that distinction, it becomes cumbersome quickly when you're having to patch up code all over your repository. If I were you, I'd write some code that runs at the beginning of every request and determines which environment you're running in. Then you have that knowledge available as you process the request. This information can be used for everything from specifying which configuration to use when you connect to your db, all the way to showing debug information in the browser only in development. It comes in handy.
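For the Python stack mentioned in the question, a minimal sketch of that environment check might look like the following; the APP_ENV variable name, URIs and credentials are just placeholders:

```python
# Sketch of per-environment configuration, assuming an APP_ENV variable set on each machine.
import os

CONFIGS = {
    "development": {
        "db_uri": "mysql://root@localhost/myapp_dev",      # no password locally
        "debug": True,
    },
    "production": {
        "db_uri": "mysql://app:secret@db.internal/myapp",  # hypothetical credentials
        "debug": False,
    },
}

ENV = os.environ.get("APP_ENV", "development")
config = CONFIGS[ENV]

# e.g. with SQLAlchemy:
# engine = create_engine(config["db_uri"], echo=config["debug"])
```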
Being RESTful often dictates much of your design with regards to how your site's pages are discovered. Trying to keep your code within the restful framework helps you remember where your code is located, keeps your routing predictable, keeps your code from becoming too coupled, and follows a convention that is becoming more and more accepted. There are obviously other conventions that can accomplish these same goals, but I've had a great experience using REST and it's improved my code substantially.
All that being said. I've found that while you can have good intentions to make a pristine codebase that can scale infinitely and is nice and clean, it rarely turns out this way. If I were you, I'd do a small amount of research on what you feel the most comfortable with and what will help make your life easier, and go with that.
Hopefully that helps!
While I have little experience working with the tools you've mentioned, except for MySQL, I can give you a few fairly standard answers for the questions you posted.
1) It depends on the details, but most often you keep them in the same repository, just in a separate folder.
2) Just because something is committed to the repository doesn't mean that it's ready to go live - it's quite often an intermediate build that could be riddled with bugs. A publish is done manually, with an export from the repository. Setting up the web server in the same folder as an svn checkout is a huge no-no, as the .svn folder contains quite a bit of sensitive information, such as how to push changes to the svn server.
3) You use some sort of NAS or SAN solution, or simply a network share on one of the servers, and read all your data from there. That way, when you push information to one place, it's accessible by all servers. If your network is slow, you set up scripts that push the files out to all the servers automatically from a single location. If you use a multi-server environment in ASP.NET, don't forget to update the machine key in the config files, or your shared encrypted caches, like the viewstate, won't work across servers. Having a session store in a database is also a good idea.
4) I've got a post-build step that only triggers on publish, which replaces my database connection strings with production ones and also changes my Production app config value from false to true in the published web.config/app.config files. I can't see any case where you'd want different config files for different servers serving the same content.
If something is unclear, just comment and I'll try to clarify.
Good luck! // Eric Johansson
I think you are mixing 2 different aspects: source control and deployment. Just because you have all your files in a single repository doesn't mean they have to be deployed that way. It's also arguable whether you should be deploying directly from source control or instead using a build/deploy script which could handle any number of configurations.
Also, hosting static files on a separate domain only really becomes worthwhile on high-traffic websites. Are you sure you aren't prematurely optimising?

How should I create an automated deployment script?

I need to create some way to get a local WAR file deployed on a Linux server. What I have been doing until now is the following process:
Upload WAR using WinSCP.
SSH into server using PuTTY.
Move/rename/delete certain files and folders to prepare for WAR explosion.
Explode WAR.
Send email notifying users of restart.
Stop Tomcat server.
Use tail to make sure server stopped correctly.
Change symlink to point to exploded WAR.
Start Tomcat.
Use tail to make sure server started correctly.
Send email notifying users of completed restart.
This stuff is all relatively straightforward. And I'm sure there are a million and one different ways to do it. I'd like to hear about some options. My first thought was a Bash script. I have very little experience with scripting in general, but I thought this would be a good way to learn. I would also be interested in doing this with Ruby/Python or something similarly current, as I have little to no experience with these languages. I think as a young developer, I should definitely get some sort of scripting language under my belt. I may also be interested in some sort of software solution that could do this stuff for me, although I think scripting would be a better way to go for the sake of ease and customizability (I might have just made that word up).
Some actual questions for those that made it this far. What language would you recommend for automating the process I've listed above? Would this be a good opportunity for me to learn Bash/Ruby/Python/something else, or should I simply take the 10 minutes to do this by hand 2-3 times a week? I would think the answer to that is obviously no. Can I automate these things from my computer, or will I need to set up the scripts to run on the Linux server? Is the email something I can automate, or am I better off doing that part myself?
More questions will almost certainly come up as I do this so thanks to all in advance.
UPDATE
I should mention that I am using Maven to build the WAR, so if I can do all of this with Maven, please let me know.
This might be too heavy duty for your needs, but have you looked at build automation tools such as CruiseControl or Hudson? You might also want to look at Integrity, which is more lightweight and written in Ruby (instead of Java like the other two I mentioned). These tools can do everything you said you needed in your question plus way, way more.
Edit
Since you want this to be more of a learning exercise in scripting languages than a practical solution, here's an idea for you. Instead of manually uploading your WAR to your server each time, set up a Mercurial repository on your server and create a hook (see here, here, and especially here) that executes a Ruby (or Ant, or Maven) script each time a changeset is pushed from a remote computer (i.e. your local workstation). You would write the script so it does all the action items in your list above. That way, you will get to learn three new things: a distributed version control paradigm, how to customize said tool, and how to write Ruby scripts that interact with your operating system (since your actions are very filesystem-heavy).
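As a rough illustration of what such a deployment script could look like (here sketched in Python and driven over SSH from the workstation rather than as a Ruby hook; the host, paths and Tomcat service name are all made up, it assumes passwordless SSH keys, and the email notifications are left out):

```python
#!/usr/bin/env python3
# Sketch of automating the deployment steps from the question via ssh/scp.
import subprocess

HOST = "deploy@myserver.example.com"   # placeholder host
WAR = "target/myapp.war"               # built locally, e.g. by Maven
REMOTE_WAR = "/opt/deploy/myapp.war"
EXPLODED = "/opt/deploy/myapp-new"
SYMLINK = "/opt/tomcat/webapps/myapp"

def ssh(command):
    subprocess.run(["ssh", HOST, command], check=True)

def deploy():
    subprocess.run(["scp", WAR, f"{HOST}:{REMOTE_WAR}"], check=True)  # upload WAR
    ssh(f"rm -rf {EXPLODED} && mkdir -p {EXPLODED}")                  # prepare target dir
    ssh(f"cd {EXPLODED} && unzip -q {REMOTE_WAR}")                    # explode WAR
    ssh("sudo service tomcat stop")                                   # stop Tomcat
    ssh(f"ln -sfn {EXPLODED} {SYMLINK}")                              # switch symlink
    ssh("sudo service tomcat start")                                  # start Tomcat
    ssh("tail -n 20 /opt/tomcat/logs/catalina.out")                   # eyeball the log

if __name__ == "__main__":
    deploy()
```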
The most common tool in my experience is Ant; it's worth learning, it's all pretty simple, and very useful.
You should definitely automate it, and you should aim to have it happen in one step.
What are you using to build the WAR file itself? There's some advantage to using the same tool for build and deployment. On several projects I've used Ant to build a Java project and deploy it to the servers.

need to implement versioning in Online backup tool

I am working on the development of an application that will perform online backup of the files and folders on a PC, automatically or manually. Currently, I keep only the latest version of each file on the server. Now I have to implement versioning, so that only the changes are transferred to the online server and the user can download any of the available versions of a file from the backup server.
I need to perform deduplication for this. I am able to do it using a fixed block size, but I'm facing the overhead of transferring a file containing CRC information with each version that is backed up.
I have never worked with this kind of technology before, so I lack experience here. I am eager to know whether there is a feasible way to embed this functionality in the application without too much pain. Would any third-party tool help to do the same thing? Please let me know.
Note: I am using the FTP protocol to transfer the data.
There's a program called dump that does something similar, but it operates on filesystem blocks rather than files. rsync also may be of interest.
You will need to keep track of a large number of blocks with multiple versions and how they fit into the various versions of the original files, so you will need some kind of database to track this information, and an efficient way to query it to determine which blocks in a given file need to be transferred. Also note that adding something to the beginning of a file will cause all your blocks to be "new" if you use a naive blocking and diff scheme.
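To make the block-tracking idea concrete, here is a minimal sketch of the fixed-block approach, assuming you keep a per-version manifest of block hashes plus a set of hashes already stored on the server. As described above, it still suffers from the insertion problem, which rsync-style rolling checksums are designed to address:

```python
# Sketch of fixed-block deduplication: hash each block of a file and transfer
# only blocks whose hashes the backup server has not seen before.
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KiB; an arbitrary choice for illustration

def block_hashes(path):
    """Return the ordered list of block hashes that make up one version of a file."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def blocks_to_upload(path, known_hashes):
    """known_hashes: set of block hashes already stored on the backup server."""
    manifest = block_hashes(path)
    new_blocks = set(manifest) - set(known_hashes)
    # Store the ordered manifest per version on the server, but upload only the
    # blocks in new_blocks; unchanged blocks are referenced by their hash.
    return manifest, new_blocks
```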
To do this well will be very complex. I highly recommend you thoroughly research already-available solutions, and if you decide you need to write your own, consider the benefits of their designs carefully.

How do I sync my development with the users?

I create websites for people. I have given them the ability to edit certain areas of their published pages using CushyCMS. That works fine, and everyone is happy with it.
When I go to publish some of my more extensive changes, I first need to pull down the latest version that they have produced. Then I make my changes, and upload everything to production.
I would like to use some sort of version control in this process. This should be a classic update-edit-commit-publish workflow, but I'm not sure how to go about this. Basically I want to avoid pulling down everything locally and doing the commits. I only want to pull down what has changed.
I use FileZilla, and it doesn't do a good job of identifying changed files. I can't rely on the file size, because sometimes it stays the same. I can't rely on timestamps because the server time is different from my machine's, and it never seems to work correctly.
How can I get around my problem? I use Notepad++, Subversion and FileZilla, but I'm willing to try other tools if they would make this process easier.
It comes down to CushyCMS's decision to edit files directly rather than put the user-provided content in a database like WordPress, DotNetNuke, Drupal, etc. So the real answer is that you can't get there from here, and you should look into migrating to a database-backed CMS. That's not what you want to hear, though.
Version control will get you part of the way to concurrency, but there is always the possibility of a user updating a page between your pull-down and publishing the revised copy, since your users wouldn't be checking into the version control system directly. That would require them to learn the version control system and negate the ease that CushyCMS (or any CMS, really) provides. You'll want to try to find a system that allows your live site to be the master that you compare against and check files out from. I do not know of any mainstream systems that currently work that way.
I found that it was easiest to use a tool like Beyond Compare to handle the synchronization.
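If you'd rather script the comparison than use a GUI tool, one rough sketch is to hash each remote file over FTP and compare it with your local copy. It still transfers the file contents to do the comparison, but it reliably tells you which files actually changed, independent of sizes and timestamps (host, credentials and paths below are placeholders):

```python
# Sketch: find files whose content differs between the FTP server and a local copy.
import hashlib
from ftplib import FTP

def md5(data):
    return hashlib.md5(data).hexdigest()

def changed_files(host, user, password, paths, local_root):
    ftp = FTP(host)
    ftp.login(user, password)
    changed = []
    for path in paths:
        chunks = []
        ftp.retrbinary(f"RETR {path}", chunks.append)  # download remote file into memory
        remote_hash = md5(b"".join(chunks))
        try:
            with open(f"{local_root}/{path}", "rb") as f:
                local_hash = md5(f.read())
        except FileNotFoundError:
            local_hash = None                          # new file on the server
        if remote_hash != local_hash:
            changed.append(path)
    ftp.quit()
    return changed

# changed_files("ftp.example.com", "user", "pass", ["index.html", "about.html"], "./site")
```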