Version control of uploaded images to file system - version-control

After reading Storing Images in DB - Yea or Nay? I think that the file system is the right place for storing images. But I would like to know how you handle backup/version control of uploaded images in your different environments (dev/stage/prod) and for network load balancing?
These problems are pretty easy to handle when working with a database, e.g. making a backup of the production environment and restoring the DB in the development environment.
What do you think of using, for example, git to handle version control of uploaded files?
Production Environment:
An image is uploaded to a shared folder on the web server.
Meta data is stored in the database
The image is automatically added to a git repository
Developer at work:
Checks out the source code.
Runs a script to restore the database.
Runs a script to get the latest images.
I think the solution above is pretty smooth for the developer: the images will be under version control, and the environments can be isolated from each other.
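The "image is automatically added to a git repository" step could be sketched like this. All paths and filenames here are invented for illustration; a real setup would run something like this from the upload handler:

```shell
#!/bin/sh
# Hypothetical sketch of the upload step: after the web app writes the
# image into the shared folder, commit it to a repo kept in that folder.
UPLOAD_DIR=$(mktemp -d)                     # stand-in for the shared folder
cd "$UPLOAD_DIR"
git init -q .
printf 'fake image bytes' > photo-001.jpg   # stand-in for the uploaded file
git add photo-001.jpg
git -c user.name=uploader -c user.email=upload@example.com \
    commit -qm "Add uploaded image photo-001.jpg"
git log --oneline                           # one commit per upload
```

The developer-side "get the latest images" script then reduces to a git pull of this repo.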

For us, the version control isn't as important as the distribution. Meta data is added via the web admin and the images are dropped on the admin server. Rsync scripts push those out to the cluster that serves prod images. For dev/test, we just rsync from prod master server back to the dev server.
rsync is great for load balancing and distribution. If you sub in git for the admin/master server, you have a pretty good solution.
If you're OK with a backup that preserves file history at the time of backup (as opposed to version control with every revision), then some adaptation of this may help:
Automated Snapshot-style backups with rsync.

It can work, but I would store those images in a git repository which would then be a submodule of the git repo with the source code.
That way, a strong relationship exists between the code and the images, even though the images are in their own repo.
Plus, it avoids issues with git gc or git prune being less efficient with a large number of binary files: if the images are in their own repo, with few variations for each of them, the maintenance on that repo is fairly light, whereas the source code repo can evolve much more dynamically, with the usual git maintenance commands in play.
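Wiring that up is essentially one command once the images repo exists. A sketch with invented local paths standing in for real remotes (newer git needs protocol.file.allow set for local-path submodules):

```shell
#!/bin/sh
# Sketch: keep images in their own repo and track it from the source
# repo as a submodule. Repo names and paths are invented for the example.
WORK=$(mktemp -d)
git init -q "$WORK/images"
git -C "$WORK/images" -c user.name=n -c user.email=e@x \
    commit -q --allow-empty -m "image store"
git init -q "$WORK/app"
cd "$WORK/app"
git -c protocol.file.allow=always \
    submodule add "$WORK/images" images      # record the images repo
git -c user.name=n -c user.email=e@x commit -qm "Track images as a submodule"
git submodule status                         # shows the pinned commit
```

The source repo pins an exact commit of the images repo, which is what gives the "strong relationship" between code and images.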

Related

GitHub and coding processes

I've been programming for a little while now and have built a little application which is now hosted on a dedicated server.
Now I have been rolling out different versions of my app with no real understanding of how to manage the process properly.
Is this the proper way to manage a build of an application when using a product like GitHub?
Upload my entire application onto github.
Each time I work on it, download it and install it on my dev server.
When I'm done working on it and it appears to be OK, do I then upload the changed files for the current project I am working on, am I meant to update the entire lot, or am I meant to create a new version of the project?
Once all my changes are updated, is there any way of pushing these to a production machine from github, or generating a listing of the newly changed files so I can update the production machine easily with a checklist of some kind?
My application has about 900 files associated with it, is stored in various folder structures, and is a server-based app (ColdFusion to be precise). As I work alone the majority of the time, I'm struggling to understand how to manage the development of an app...
I also have no idea about using the command line, and my desktop machine is a Mac, with a VM running all my required server apps (Windows Server 2012, MSSQL 2012, etc.).
I really want to make sure I can keep my dev process in order, but I've struggled to understand how to manage a server-side app's development when I'm using a Mac and my dev server is a Windows machine; I feel like I'm stuck in the middle.
You make it sound more complicated than it is.
Upload my entire application onto github.
Well, this is actually 2 steps: First, create a local git repo (git init), then push your repo up to github.
Each time I work on it, download it and install it on my dev server.
Well, you only need to "download" it once to a new dev box. After that, just git pull (or git fetch depending on workflow), which ensures any changes on the server are pulled down. Just the deltas are sent.
Git is a distributed version control system. That means every git repo has the full history of the entire project. So only deltas need to be sent. (This really helps when multiple people are hacking on a project).
When I'm done working on it and it appears to be OK, do I then upload the changed files for the current project I am working on, am I meant to update the entire lot, or am I meant to create a new version of the project?
Hmm, you are using fuzzy terminology here. When you are done editing, you first commit locally (git add ...; git commit), then you push the changes to github (git push). Only the deltas are sent. Every commit is "a new version" if you squint.
Later on, if you want to think in terms of "software releases" (i.e. releasing "version 1.1" after many commits), you can use git tags. But don't worry about that right away.
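That commit, push, and tag cycle can be sketched as follows; a local bare repo stands in for github here, and the file names are invented:

```shell
#!/bin/sh
# Edit -> commit locally -> push the deltas -> tag a release.
WORK=$(mktemp -d)
git init -q --bare "$WORK/hub.git"             # pretend this is github
git -C "$WORK/hub.git" symbolic-ref HEAD refs/heads/master
git init -q "$WORK/dev"
cd "$WORK/dev"
git remote add origin "$WORK/hub.git"
echo '<cfoutput>hello</cfoutput>' > index.cfm  # invented ColdFusion file
git add index.cfm
git -c user.name=n -c user.email=e@x commit -qm "First working version"
git push -q origin HEAD:master                 # only deltas travel
git tag v1.0                                   # later: name a release
git push -q origin v1.0
```

Every commit is "a new version"; the tag just gives one of them a release name.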
Once all my changes are updated, is there any way of pushing these to a production machine from github, or generating a listing of the newly changed files so I can update the production machine easily with a checklist of some kind?
Never mess around with files manually on your server. The server should ONLY be allowed to run a valid, checked-out version of your software. If your production server is running random bits of code, nobody will be able to reproduce problems, because those bits aren't in the version control system.
The super-simple way to deploy is to do a git clone on your server (one time), then git pull to update the code. So you push a change to github, then pull the change from your server.
More advanced: you will want something like Capistrano, which will manage the checkouts for you and separate "checking out" from "deploying" to allow for easier rollback, etc. There may be Windows-specific ways of doing that too. (Sorry, I'm a Linux guy.)
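The clone-once / pull-per-release idea can be sketched end to end like this, with local temp dirs standing in for github and the server's web root:

```shell
#!/bin/sh
# Dev pushes to the hub; the server clones once, then pulls per release.
WORK=$(mktemp -d)
git init -q --bare "$WORK/hub.git"              # pretend github
git -C "$WORK/hub.git" symbolic-ref HEAD refs/heads/master
git clone -q "$WORK/hub.git" "$WORK/dev" 2>/dev/null
cd "$WORK/dev"
echo v1 > app.cfm
git add app.cfm
git -c user.name=n -c user.email=e@x commit -qm "release 1"
git push -q origin master
git clone -q "$WORK/hub.git" "$WORK/prod"       # one-time clone on server
echo v2 > app.cfm
git -c user.name=n -c user.email=e@x commit -qam "release 2"
git push -q origin master
git -C "$WORK/prod" pull -q origin master       # the whole deploy step
cat "$WORK/prod/app.cfm"                        # now at release 2
```

Everything the production machine runs came through the repo, so every deployed state is reproducible.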

How to work on a large number of remote files with PHPStorm

I have a small Debian VPS-box on which I host and develop a few small, private PHP websites.
I develop on a Windows desktop with PHPStorm.
Most of my projects only have a few dozen source files but also contain a few thousand lib files.
I don't want to run a webserver on my local machine because this creates a whole set of problems I don't want to be bothered with for such small projects (e.g. setting up another webserver; synching files between my desktop and the VPS box; managing different configurations for Windows and Debian (different hosts, paths...); keeping DB schema and data in sync).
I am looking for a good way to work with PHPStorm on a large amount of remote files.
My approaches so far:
Mounting the remote file system in Windows (tried via pptp/smb, ftp, webdav) and working on it with PHPStorm as if the files were local.
=> Indexing, synching, and PHPStorm's VCS support became unusably slow. This is probably due to the high latency of file access.
PHPStorm offers the possibility to automatically copy the remote files to the local machine and then synching them when changes are made.
=> After the initial copying, this is fast. Unfortunately, with this setup, PHPStorm is unable to provide VCS support, which I use heavily.
Any ideas on this are greatly appreciated :)
I use PhpStorm in a very similar setup to your second approach (local copies, automatically synced changes) AND, importantly, with VCS support.
Ideal; easiest: In my experience the easiest solution is to check out/clone your VCS branch on your local machine and use your remote file system as a staging platform which remains ignorant of VCS; a plain file system.
Real world; remote VCS required: If, however (as in my case), it is necessary to have VCS on each system (perhaps your remote environment is the standard for your shop, or your shop's proprietary review/build tools are platform-specific), then a slightly different remote setup is required. Even so, treating your remote system as staging is still the best approach.
Example: Perforce - centralized VCS (client work-space)
In my experience, workspace-based VCS systems (e.g. Perforce) are best handled by sharing the same client workspace between the local and remote systems, which has the benefit that VCS file status changes have to be applied only once. The disadvantage is that file system changes on the remote system typically must be handled manually. In my case I manually chmod (or the OS equivalent) my remote files and wash my hands (problem solved). The alternative (dual-workspace) approach requires more moving parts, which I do not advise.
Example: Git - distributed VCS
The easier approach is certainly Git, which has its wonderful magic of detecting file changes without file permissions being directly coupled to the VCS. This makes life easy: you can simply start with a common working branch and create two separate branches, "my-feature" and "my-feature-remote-proxy" for example. Once you decide to merge your changes upstream, you do so (ideally) from your local environment. The remote proxy branch can be reverted or whatever you want. NOTE: in the case of Git I always have two branches, because it's easy. And when your hard drive melts in a freak lightning strike, you have extra redundancy :|
Hope this helps.

DVCS, Databases, and User Generated Content?

I want to create a development environment with my central repository hosted somewhere like bitbucket/github. Then on my dev server and my production server I will have clones.
I will work on new features and make local commits on the dev server. Once this is at a stage that it can be pushed to production, I will push from the development clone to the central repository, then pull from the central repo to the production server.
All this makes sense, but there are 2 parts I cannot figure out.
How to keep the data-base and user-generated content (file uploads, etc.) in sync?
Also, will user generated content get wiped out when I do my next pull+update on the production server?
How do others address this?
Additional info:
This is going to be a MySQL/PHP website. I am also planing on using a mvc framework (probably cake) and I haven't firmly decided which DVCS to use but so far Mercurial is what I am thinking. Not sure if this info matters but adding just in case.
That is why a DVCS is not always the right tool for release management: once your code is in the remote repo on the server, you should have another "rsync"-like mechanism to:
extract the right tag (the one to put into prod)
transform/copy the right files
leave the other files and the database intact.
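Those three steps can be sketched with git archive plus rsync. All paths and the tag name here are invented; the key detail is that the uploads directory is excluded, so user-generated content survives the release (the database, living outside the web root, is untouched for the same reason):

```shell
#!/bin/sh
# Export exactly one tag, then sync it into the web root while leaving
# user-generated uploads alone.
WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
echo '<?php echo "hello"; ?>' > index.php
git add index.php
git -c user.name=n -c user.email=e@x commit -qm "release"
git tag v1.0                                    # the tag to put into prod
mkdir -p "$WORK/webroot/uploads" "$WORK/export"
echo user-photo > "$WORK/webroot/uploads/photo.jpg"   # must survive deploys
git archive v1.0 | tar -x -C "$WORK/export"     # extract the right tag
rsync -a --delete --exclude=uploads/ "$WORK/export/" "$WORK/webroot/"
ls "$WORK/webroot" "$WORK/webroot/uploads"
```

rsync's --delete removes stale code files but does not touch excluded paths, which is exactly the "leave intact" requirement.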

Drupal 6: using bitbucket.org for my Drupal projects as a real version control system dummy

Here is a real version control system dummy! A proper new starter!
The way I have worked so far:
I have a Drupal 6 web project, www.blabla.com, and I do development under www.blabla.com/beta. I work directly on blabla.com/beta on the server: nothing on my local machine, nothing anywhere else. I only take a backup to local from time to time. I know, a horrible and unsafe way :/
The new way I want to work from now on:
I decided to use Mercurial. I have one more developer working on the same project with me. I have the blabla.com Drupal 6 project on Bluehost and do development on blabla.com/beta. I found http://bitbucket.org/ for Mercurial hosting and have created an account.
So now how do I set up things? I'm totally confused after reading tens of article :/
Is bitbucket only for hosting revised files? So if I or my developer friend edits index.php, will bitbucket host only index.php?
From now on, do I have to work at localhost and upload the changes to Bluehost? No more editing directly at blabla.com/beta? Or can I still work on Bluehost, maybe under blabla.com/beta2?
When I need to edit any file, do I first download an update from bitbucket, make my change at localhost, update bitbucket with the edited files, and then upload to Bluehost?
Sorry for silly questions, I really need a guidance...
Appreciate helps so much! thanks a lot!
Is bitbucket only for hosting revised files?
The main service of bitbucket is to host files under revision control, but there is also a way to store arbitrary files there.
So if I or my developer friend edits index.php, will bitbucket host only index.php?
In a typical project, every file which belongs to the product is checked into revision control, not only index.php. See this example.
From now on, do I have to work at localhost and upload the changes to Bluehost? No more editing directly at blabla.com/beta? Or can I still work on Bluehost, maybe under blabla.com/beta2?
Mercurial does not dictate a fixed workflow, but I recommend that you have Mercurial installed where you edit the files. Then, for example, you can see directly which changes you have made since the last commit, without needing to copy the files from your server to your local repository.
I absolutely recommend a workflow where somewhere in the repository there is a script which generates the archive file that is transmitted to the server, containing the revision of the repository at the time the archive was created. This revision information should also be stored somewhere on the server (not necessarily in a publicly accessible area), since it can come in very handy when something goes wrong.
When I need to edit any file, do I first download an update from bitbucket, make my change at localhost, update bitbucket with the edited files, and then upload to Bluehost?
There are several different approaches to get the data to the server:
export the local repo into an archive and transmit it to the server (hg archive production.tar.bz2); this is the most secure variant, since it does not depend on any extra software on the server. But depending on how big the archive is, this approach can waste a lot of bandwidth.
work on the server and copy changed files back; but I don't recommend this, since it is very easy to miss something important
install mercurial on the server, work in a working copy there and hg export locally there into the production area
install mercurial on the server and hg fetch from bitbucket(or any other server-accessible repository)
install mercurial on the server and hg push from your local working copy to the server (and hg update on the server afterwards)
The last two points can expose the repository to the public. This exposure can be both good and bad, depending on what your repository contains and whether you want to share the content. When you want to share the content, or you can limit access to www.blabla.com/beta/.hg, you can clone directly from your web server.
Also note that you should not check in any files with passwords or critical secrets, even when you access-limit the repository. It is much safer to check in template files (with a different name than in production) and copy-and-edit these files on the server.

best practice for backing up cvs repository?

Some of our projects are still on cvs. We currently use tar to backup the repository nightly.
Here's the question:
best practice for backing up a cvs repository?
Context: We're combining several servers across the country onto one central server. The combined repository size is 14 GB. (Yes, this is high, most likely due to lots of binary files, many branches, and the age of the repositories.)
A 'straight tar' of the cvs repository yields a ~5 GB .tar.gz file. Restoring files from 5 GB tar files will be unwieldy. Plus, we fill up tapes quickly.
How well would a full-and-incremental backup approach work, i.e. weekly full backups and nightly incremental backups? What open source tools solve this problem well (e.g. Amanda, Bacula)?
thanks,
bill
You can use rsync to create a backup copy of your repo on another machine if you don't need a history of backups. rsync works in incremental mode, so bandwidth will be consumed only for sending changed files.
I don't think that you need a full history of backups, as the VCS provides its own history management and you need backups ONLY as a failure-protection measure.
Moreover, if you worry about the consistent state of the backed-up repository, you MAY want to use filesystem snapshots; e.g. LVM can produce them on Linux. As far as I know, ZFS from Solaris also has a snapshot feature.
You can skip snapshots only if you run the backup procedure deep at night, when no one touches your repo and your VCS daemon is stopped during the backup :-)
As Darkk mentioned, rsync makes for good backups since only changed things are copied. Dirvish is a nice backup system based on rsync. Backups run quickly. Restores are extremely simple since all you have to do is copy things. Multiple versions of the backups are stored efficiently.
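As a minimal sketch of the weekly-full plus nightly-incremental scheme the question asks about, GNU tar's --listed-incremental mode works; the paths here are invented, and tools like Amanda and Bacula wrap the same idea with scheduling and tape handling:

```shell
#!/bin/sh
# The full backup writes a state file; the next run with the same state
# file archives only what changed since then.
REPO=$(mktemp -d); BK=$(mktemp -d)       # stand-ins for repo and tape dir
echo one > "$REPO/file1,v"
tar -czf "$BK/full.tgz"  -g "$BK/state" -C "$REPO" .   # weekly full
echo two > "$REPO/file2,v"
tar -czf "$BK/incr1.tgz" -g "$BK/state" -C "$REPO" .   # nightly incremental
tar -tzf "$BK/incr1.tgz"                 # only the new file (plus dir entries)
```

A restore replays the full archive first, then each incremental in order, which keeps the nightly tapes small.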