Need to implement versioning in an online backup tool

I am working on the development of an application that performs online backup of files and folders on a PC, either automatically or manually. Currently I keep only the latest version of each file on the server. Now I have to implement versioning, so that only the changes are transferred to the online server and the user is able to download any of the available versions of a file from the backup server.
I need to perform deduplication for this. I am able to do it using a fixed block size, but I face the overhead of transferring the file's CRC information with each version backup.
I have never worked with this technology before, so I lack experience. Is there a feasible way to embed this functionality in the application without too much pain? Would a third-party tool help to do the same thing? Please let me know.
Note: I am using the FTP protocol to transfer the data.

There's a program called dump that does something similar, but it operates on filesystem blocks rather than files. rsync also may be of interest.
You will need to keep track of a large number of blocks with multiple versions and how they fit into the various versions of the original files, so you will need some kind of database to track this information, and an efficient way to query it to determine which blocks in a given file need to be transferred. Also note that adding something to the beginning of a file will cause all your blocks to be "new" if you use a naive blocking and diff scheme.
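To avoid that "everything becomes new" problem, rsync-style tools derive chunk boundaries from the content itself with a rolling hash instead of cutting at fixed offsets, so an insertion at the start of a file only changes the chunks it touches. Below is a minimal, hedged sketch of content-defined chunking in Python; the window size, mask and hash constants are arbitrary illustration values, not taken from any particular tool.

    import hashlib

    WINDOW = 48            # bytes in the rolling-hash window
    MASK = (1 << 13) - 1   # boundary condition -> average chunk of ~8 KiB
    PRIME, MOD = 31, 1 << 32

    def chunks(data):
        # Cut wherever the rolling hash of the last WINDOW bytes matches the
        # mask, so boundaries depend on content, not on byte offsets.
        out, start, h = [], 0, 0
        pow_w = pow(PRIME, WINDOW - 1, MOD)
        for i, b in enumerate(data):
            if i >= WINDOW:
                h = (h - data[i - WINDOW] * pow_w) % MOD
            h = (h * PRIME + b) % MOD
            if i - start + 1 >= WINDOW and (h & MASK) == 0:
                out.append(data[start:i + 1])
                start = i + 1
        if start < len(data):
            out.append(data[start:])
        return out

    def chunk_digests(data):
        # Each file version becomes an ordered list of chunk digests; only
        # chunks whose digest the server has never stored need uploading.
        return [hashlib.sha1(c).hexdigest() for c in chunks(data)]

The server-side table mapping digests to stored chunks, plus the per-version digest lists, is exactly the kind of database referred to above.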
To do this well will be very complex. I highly recommend you thoroughly research already-available solutions, and if you decide you need to write your own, consider the benefits of their designs carefully.

Related

HTTP based "mirror"

I am looking to implement a PowerShell-based system to manage a local library of assets, specifically a library of Revit Family files. There is a "vetted library" that acts as the source library, to which items can be added, removed or revised. This library then needs to be mirrored on the local machine.
I do this now with the vetted library on the network, and I do a Robocopy /mir at every user logon. This works great for a traditional office environment with laptops that sometimes leave the office, ensuring they have the current library. However, with Work From Home now a major issue, I want to implement similar functionality with a web-hosted library, either on my own server or in an Amazon S3 bucket. My thinking is to make this a two-stage process.
1: When the vetted library is updated, an XML file is also updated, which describes the entire folder structure and file data for the library, including file size and file hash.
2: On the local machine, I download the vetted library map and compare it with the previous map. Missing and extraneous files are easy, though moved files are a bit more complex. Files with different sizes are easy too. If files are the same size, the already computed hashes are compared. In this way I can build a list of files to be deleted locally, as well as new files to be downloaded.
These libraries can easily reach 5 GB and 10k files each, and every year a new library is required. Firms often have as many as 5 yearly versions of the software installed. So, LOTS of files and lots of data.
This seems like the most performant way to handle regular updates, but I wonder if there is a cmdlet already available that handles this kind of thing better?
I know I COULD do this with Dropbox or the like, but there are a number of arguments against it, from the size of the libraries to security and access control (which I will need to address with my solution eventually as well). These libraries can cost tens of thousands of dollars to purchase, and folks aren't going to want to manage them via Dropbox or OneDrive.
And... the fact that Microsoft has OneDrive has me thinking there isn't a built-in PowerShell way to do this, since they want to push OneDrive. In which case, is my file-map comparison approach viable, or is there a better approach I should consider?
I know there is no code here, so maybe I am running afoul of some Stack Overflow rule, but hopefully program specification and planning is seen as an appropriate avenue for questions as well as simple code solutions.
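As a starting point, the map-compare step in (2) is easy to prototype. Here is a minimal sketch of that comparison, written in Python purely for illustration (the same logic ports directly to a PowerShell script); the JSON manifest format and file names are assumptions, since the real maps would be parsed from the XML described in (1).

    import json
    from pathlib import Path

    # A manifest is assumed to be a dict of relative path -> {"size", "hash"}.
    def diff_manifests(previous, current):
        to_delete = [p for p in previous if p not in current]
        to_download = [p for p, meta in current.items()
                       if p not in previous
                       or previous[p]["size"] != meta["size"]
                       or previous[p]["hash"] != meta["hash"]]
        return to_delete, to_download

    previous = json.loads(Path("library-map.previous.json").read_text())
    current = json.loads(Path("library-map.current.json").read_text())
    delete_locally, download_list = diff_manifests(previous, current)

Detecting moved files can be layered on top by matching hashes between the delete and download lists before acting on either.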

How can I code, or directly use, a version control system such as Subversion with MongoDB?

I am setting up a simple online CMS/editing system with a few editors and would like a simple audit trail with diff, history, comparison and rollback functionality for small bits of text.
Our editors have gotten used to the benefits of using XML / SVN, and I would really like to create a simple version of this in my system.
I realise I could probably create my own using, say, a versions/history collection with linked ids like this, but I wondered if this is the best way, or if there is an equivalent to an SVN-API-style interface available?
Btw, I am totally new to MongoDB, so go easy on me :-)
Cheers
Putting the files that make up the database under version control is not a good idea, since they consist only of binary data. They are also rather large right from the beginning, since MongoDB preallocates disk space for them. So you gain nothing by putting the data folders under version control.
If you want to track changes, you could export the data in a serialized form and store that in your VCS. As it grows, though, the advantage of the VCS drops, since it will become very slow.
I assume you need to track the changes from within the data, but since you are dealing with binary data, you are out of luck.
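For the "versions/history collection with linked ids" idea from the question, tracking versions inside MongoDB itself is usually simpler than involving an external VCS for small bits of text. Below is a minimal sketch using pymongo; the database and collection names (cms, documents, document_versions) are made up for illustration.

    from datetime import datetime, timezone
    from pymongo import MongoClient, DESCENDING

    client = MongoClient()
    db = client.cms  # hypothetical database name

    def save_edit(doc_id, text, editor):
        # Every edit becomes an immutable version document; the current text
        # lives in `documents`, the full history in `document_versions`.
        last = db.document_versions.find_one({"doc_id": doc_id},
                                             sort=[("version", DESCENDING)])
        version = (last["version"] + 1) if last else 1
        db.document_versions.insert_one({
            "doc_id": doc_id,
            "version": version,
            "text": text,
            "editor": editor,
            "created_at": datetime.now(timezone.utc),
        })
        db.documents.update_one({"_id": doc_id},
                                {"$set": {"text": text, "version": version}},
                                upsert=True)
        return version

    def history(doc_id):
        return list(db.document_versions.find({"doc_id": doc_id}).sort("version", 1))

    def rollback(doc_id, version):
        # Rolling back simply records the old text as a new version.
        old = db.document_versions.find_one({"doc_id": doc_id, "version": version})
        if old:
            save_edit(doc_id, old["text"], editor="rollback")

Diffs and comparisons between any two versions can then be generated on demand (e.g. with Python's difflib) rather than stored.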

Load CMS core files from one server for multiple servers

I'm almost done with our custom CMS system. Now we want to install this for different websites (and more in the future), but every time I change the core files I will need to update each server/website separately.
What I really want is to load the core files from our server, so when I install a CMS I only define the needed config files (on that server) and the rest is loaded from our server. This way I can push changes to the core very simply, and only once.
How do I do this, or is this a completely wrong approach? If so, what is the right way? What do I need to look out for? Is it secure (without paying thousands for an HTTPS connection)?
I have no idea how to start or where to begin, and couldn't find anything helpful (maybe I used the wrong search terms), so anything is helpful!
Thanks in advance!
Note: My application is built using the Zend Framework.
You can't load the required files remotely at runtime (or really don't want to ;). This comes down to proper release & configuration management, where you update all of your servers. But this can mostly be done automatically.
Depending on how much time you want to spend on this mechanism, there are some things you have to be aware of. The general idea is that you have one central server which holds the releases, and all other servers check it for updates, then download and install them. There are lots of possibilities (svn, archives, ...), and the check/update can be done manually at the frontend or by cron jobs in the background. Usually you'll update all changed files except the config files and the database, as they can't simply be replaced but have to be modified in a certain way (this is where update scripts come into play).
This could look like this:
A cron job runs on each server and checks for updates via svn
If there is a new revision, it does an svn update
This is a very easy mechanism to implement, but it has some drawbacks, e.g. you can't change the config files and the database. Well, in fact it would be possible, but quite difficult to achieve.
Maybe this could be easier with an archive-based solution:
A cron job checks the update server for a new version. This could be done by reading the contents of a file on the update server and comparing it to a local copy
If there is a new version, download the related archive
Unpack the archive and copy the files
With that approach you might be able to include update scripts in the updates to modify configs/databases. A sketch of such an update check is shown below.
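For illustration, here is a minimal Python sketch of such a cron-driven update check. It assumes a hypothetical update-server layout with a latest.txt version file and a cms-<version>.tar.gz archive (containing a top-level core/ directory) next to it; all URLs, paths and names are placeholders, not part of any existing tool.

    import shutil, tarfile, tempfile, urllib.request
    from pathlib import Path

    UPDATE_SERVER = "https://updates.example.com/cms"  # placeholder
    INSTALL_DIR = Path("/var/www/cms")                 # placeholder
    VERSION_FILE = INSTALL_DIR / "core.version"

    def remote_version():
        with urllib.request.urlopen(f"{UPDATE_SERVER}/latest.txt") as resp:
            return resp.read().decode().strip()

    def local_version():
        return VERSION_FILE.read_text().strip() if VERSION_FILE.exists() else "0"

    def update_if_needed():
        new = remote_version()
        if new == local_version():
            return
        with tempfile.TemporaryDirectory() as tmp:
            archive = Path(tmp) / "core.tar.gz"
            urllib.request.urlretrieve(f"{UPDATE_SERVER}/cms-{new}.tar.gz", archive)
            with tarfile.open(archive) as tar:
                tar.extractall(tmp)
            # Copy everything except config files; those are handled by
            # dedicated update scripts instead of being overwritten.
            root = Path(tmp) / "core"
            for src in root.rglob("*"):
                if src.is_file() and "config" not in src.parts:
                    dst = INSTALL_DIR / src.relative_to(root)
                    dst.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src, dst)
        VERSION_FILE.write_text(new)

    if __name__ == "__main__":
        update_if_needed()  # run from cron, e.g. nightly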
Automatic update distribution is a very complex topic, and these are only two very simple approaches. There are probably many different solutions out there, and 'selecting' the right one is not an easy task (it gets even more complex if you have different versions of a product with dependencies :), and there is no "this is the way it has to be done".

Is there any form of Version Control for LSL?

Is there any form of version control for Linden Scripting Language?
I can't see it being worth putting all the effort into programming something in Second Life if, when a database goes down over there, I lose all of my hard work.
Unfortunately there is no source control in-world. I would agree with giggy. I am currently moving my projects over to a Subversion (SVN) system to get them under control. Really should have done this a while ago.
There are many free & paid SVN services available on the net.
Just two free examples:
http://www.sourceforge.net
http://code.google.com
You also have the option to set one up locally so you have more control over it.
Do a search on here for 'subversion' or 'svn' to learn more about how to set one up.
[edit 5/18/09]
You added in a comment you want to backup entire objects. There are various programs to do that. One I came across in a quick Google search was: Second Inventory
I cannot recommend this or any other program as I have not used them. But that should give you a start.
[/edit]
-cb
You can use the Meerkat viewer to back up complete objects, or use some of the test programs of libopenmetaverse to back up in a text environment. I think you can back up scripts from the inventory with them.
Jon Brouchoud, an architect working in SL, developed an in-world collaborative versioning system called Wikitree. It's a visual SVN without the delta-differencing that occurs in typical source code control systems. He announced that it was being open sourced in http://archvirtual.com/2009/10/28/wiki-tree-goes-open-source/#.VQRqDeEyhzM
Check out the video in the blog post to see how it's used.
Can you save it to a file? If so then you can use just about anything, SVN, Git, VSS...
There is no good source control in game. I keep meticulous version information on the names of my scripts and I have a pile of old versions of things in folders.
I keep my source out of game for the most part and use SVN. LSLEditor is a decent app for working with the scripts, and if you create a solution with objects it can emulate a lot of the in-game environment (giving objects, reading notecards, etc.).
I personally keep any code snippets that I feel are worth keeping around on github.com (http://github.com/cylence/slscripts).
Git is a very good source code manager for LSL since its comparisons work line by line, unlike other SCMs such as Subversion or CVS. The reason this is so crucial is that most Second Life scripts live in ONE FILE (since they can't call each other... grrr), so having the comparison done at the file level is not nearly as effective. Comparing line by line is perfect for LSL. With that said, it also (like SourceForge and Google Code) allows you to make your code publicly viewable (if you so choose) and available for download in a compressed file for easier distribution.
Late reply, I know, but some things have changed in Second Life, and some things, well, have not. Since the Third Party Viewer policy still keeps a hard wall up against saving and loading objects between viewer and system, I was thinking about another possibility that has so far been completely overlooked: bots!
Scripted agents, AKA bots, have all the usual avatar actions available to them. Although I have never seen one used as an object repository, there is no reason you couldn't create one. Logged in as a separate account, the agent can be wherever you want, automatically or by command, then collect any or all objects you are working on at set intervals or on command, and anything it has collected can be given to you or your collaborators.
I won't say it's easy to script an agent, and I couldn't even speak to writing an extension for a scripted agent myself, but if you don't want to start from scratch there is an extensive open-source framework to build on, Corrade. Other bot services don't seem to list "object repository" among their abilities either, but any that support CasperVend must already provide the ability to receive items on request.
Of course the lo-fi route, just regularly taking a copy and sending the objects to a backup avatar, may still be a simple backup solution for a single user, although that does necessitate logging in as the other account, either in parallel or once every 20 or so items, to be sure they are being received and not capped by the server. This process cannot rename the items or sort them automatically like a bot could. Identically named items are listed in inventory with the most recent at the top, but this is a mess when working with multiples of various items.
Finally, there is a Coalesce feature for managing several items as one in inventory. This is currently not supported for sending or receiving objects, but in the absence of a bot it can make it easier to keep track of projects you don't wish to actually link as one item. (Caveat: don't rez "no-copy" coalesced items near "no-build" land parcels; any that cannot be rezzed are completely lost.)

Which is the faster way to interact with SourceSafe? Command line or object model?

Our project is held in a SourceSafe database. We have an automated build, which runs every evening on a dedicated build machine. As part of our build process, we get the source and associated data for the installation from SourceSafe. This can take quite some time and makes up the bulk of the build process (which is otherwise dominated by the creation of installation files).
Currently, we use the command line tool, ss.exe, to interact with SourceSafe. The commands we use are for a recursive get of the project source and data, checkout of version files, check-in of updated version files, and labeling. However, I know that SourceSafe also supports an object model.
Does anyone have any experience with this object model?
Does it provide any advantages over using the command line tool that might be useful in our process?
Are there any disadvantages?
Would we gain any performance increase from using the object model over the command line?
I should imagine the command line is implemented internally with the same code as you'd find in the object model, so unless there's a large amount of startup required, it shouldn't make much of a difference.
The cost of rewriting to use the object model is probably more than would be saved by just leaving it as it is. Unless you have a definite problem with the time taken, I doubt this will be much of a solution for you.
You could investigate shadow directories so the latest version is always available and you don't have to perform a "get latest" every time, and you could ensure that you're talking to a local VSS database (all commands are performed directly on the filesystem, so WAN operations are tremendously expensive).
Otherwise, you're stuck unless you'd like to go with a different SCM (and I recommend SVN - there's an excellent converter available on CodePlex for it, with example code showing how to use the VSS and SVN object models).
VSS uses a mounted file system to share the database. When you get a file from SourceSafe, it works at the file-system level, which means that instead of just sending you the file, it sends you all the disk blocks needed to locate the file as well as the file itself. This adds up to a lot more transactions and extra data.
When using VSS over a remote or slow connection, or with huge projects, it can be pretty much unusable.
There is a product which, amongst other things, improves the speed of VSS by ~12 times when used over a network. It does this by implementing a client-server protocol, which can additionally be encrypted - useful when using VSS over the internet.
I don't work for them or have any connection with them; I just used it at a previous company.
See SourceOffSite at www.sourcegear.com.
In answer to the only part of your question which seems to have any substance - no, switching to the object model will not be any quicker, as the "slowness" comes from the protocol used for sharing the files between VSS and the database - see my other answer.
The product I mentioned works alongside VSS to address the problem you have. You still use VSS and have to have licences for it... it just speeds it up where you need it.
Not sure why you marked me down?!
We've since upgraded our source control to Team Foundation Server. When we were using VSS, I noticed the same thing in the CruiseControl.Net build logs (caveat: I never researched what CC uses; I'm assuming the command line).
Based on my experience, I would say the problem is VSS. Our TFS is located over 1000 miles away and gets are faster than when the servers were separated by about 6 feet of ethernet cables.
Edit: To put on my business hat, the time spent waiting for builds plus the time spent trying to speed them up may add up to enough to warrant upgrading, or the VSS add-on mentioned in another post (already +1'd it). I wouldn't spend much of your time building a solution on VSS.
I'm betting running the object model will be slower by at least 2 hours... ;-)
How is the command line tool used? You're not by chance calling the tool once per file?
It doesn't sound like it ('recursive get' pretty much implies you're not), but I thought I'd throw this thought in. Others may have similar problems to yours, and this seems frighteningly common with source control systems.
ClearCase at one client performed like a complete dog because the client's backend scripts did this. Each command line call created a connection, authenticated the user, got a file, and closed the connection. Tens of thousands of times. Oh, the dangers of a command line interface and a little bit of Perl.
With the API, you're very likely to properly hold the session open between actions.
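To make that concrete, here is a small illustration of the per-file anti-pattern versus a single recursive get, sketched in Python with subprocess; it assumes ss.exe is on the PATH and SSDIR points at the database's srcsafe.ini, and the project path and file list are placeholders.

    import subprocess

    PROJECT = "$/MyProject"           # placeholder VSS project path
    files = ["a.cs", "b.cs", "c.cs"]  # placeholder file list

    # Slow: one process launch, connection and authentication per file.
    for f in files:
        subprocess.run(["ss", "Get", f"{PROJECT}/{f}", "-I-"], check=True)

    # Much faster: a single invocation and session, recursive get of the tree.
    subprocess.run(["ss", "Get", PROJECT, "-R", "-I-"], check=True)

The object model wins over the first pattern for the same reason: the database connection is opened once and reused across operations.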