Is there a way to binary patch WAR files?

My web application is getting bigger and bigger; it is now 25 MB. I have to upload it every time, and that takes a while over my DSL connection. I was thinking of using a binary patch system, but I can't find a good one. Requirements:
Work on Linux and Windows
(Desired) Be available on Amazon EC2 Linux via yum
Easy to integrate into scripts
Suggestions? Alternative ways of doing this?

Considering that a WAR file is nothing more than a zip file, my guess is that tiny changes to one file can ripple through the whole binary (such is the nature of compression), so binary patching doesn't really make sense: you might end up sending most of the archive each time anyway.
Instead, may I suggest that you simply explode the WAR file and use something like rsync to keep the contents up to date? I'd think this would be less of a headache while accomplishing the same thing.
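A rough sketch of that approach, where the local directory, user, host, and webapps path are all placeholders:
unzip -o myapp.war -d exploded/
rsync -avz --delete exploded/ user@server:/var/lib/tomcat/webapps/myapp/
rsync will then only transfer the files that actually changed, which is usually a tiny fraction of the 25 MB.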

For Linux: bsdiff/bspatch
The bsdiff package is also available under Cygwin.
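If you do want to try the binary-patch route, usage is along these lines (the file names are placeholders):
bsdiff old.war new.war update.patch
bspatch old.war new.war update.patch
The first command runs on the machine that has both versions and produces update.patch; the second runs on the server and reconstructs new.war from old.war plus the patch. Note the caveat above: with a compressed archive the patch may not end up much smaller than the WAR itself.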

Creating RPM from directory of files

I just want a simple way to create an RPM from a directory full of files. I can't seem to find any simple way of doing this online.
I know that tools such as fpm exist for doing this, but I'd like to understand the RPM build process a little so would rather not use that.
The closest I've found is:
https://www.suse.com/communities/conversations/building-simple-rpms-arbitary-files/
but I will be installing hundreds of files, and I don't really want to write an install command for each of them.
Any pointers appreciated.
Take a look at Jordan Sissel's FPM:
https://github.com/jordansissel/fpm
It's magic for turning one thing into the other: package to compressed directory and back.
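As a rough example, a single fpm invocation along these lines will roll a directory tree into an RPM (the package name, version, and paths here are placeholders):
fpm -s dir -t rpm -n myapp -v 1.0 --prefix /opt/myapp -C ./build .
That takes everything under ./build and packages it so it lands under /opt/myapp when the RPM is installed, with no per-file install commands needed.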
The rpmerizor (http://rpmerizor.sourceforge.net/) may be of interest. The web page also lists other tools (including fpm) that do this kind of thing.

perl: new cpan module maker? local configuration text files and executables, too?

I am writing a Perl program that I want to share with others, eventually via CPAN. It's getting to the point where I should start thinking about this on a bigger scale.
A decade ago, I used the h2xs package maker once. Is this still the most recommended way to get started? There used to be a couple of alternatives. Because I am starting from scratch with very little recollection, anything simple will do at this point.
I need to read a few long text files (not Perl modules) for configuration. Where do I put them and how do I access them, no matter where the module is installed? (FindBin?) __DATA__ is inconvenient.
I need to provide an executable (Linux and OS X). Can putting an executable into the user's path be part of the module installation? (How?)
I would like to be able to continue developing it, run it for test purposes, have a new version, repack it, and reupload it easily.
Before uploading to CPAN, can I share a CPAN bundle for easy local installation by downloaders and testers?
# cpan < mybundle.cpanbundle
Advice appreciated.
regards,
/iaw
If anything I say conflicts with Andy Lester, listen to him instead. He knows more than I ever will.
Module::Starter is a good, simple way to generate module scaffolding. My take is it's been the default for this sort of thing for a few years now.
For configuration/support files, I think you probably want File::ShareDir. Might be worth considering Data::Section if it's just a matter of needing multiple __DATA__ sections though.
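As a rough sketch (the module, distribution, and file names below are placeholders): generate the skeleton with
module-starter --module=My::App --author="Your Name" --email=you@example.com
and then, assuming your configuration files are shipped in the distribution's share directory, locate them at run time with something like
use File::ShareDir qw(dist_file);
my $config = dist_file('My-App', 'settings.txt');
so the code finds them no matter where the module ends up installed.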
You can certainly put scripts in the bin subdirectory of your distribution; the build tool will put them in the right place at install time.
A build tool will take care of the work-flow you describe.
Bundles are something different. You make a distribution and share the tarball/archive.
If you set up PERL5LIB appropriately, you can then repeat make test, make install, make dist to your heart's content. For development and sharing purposes, a lot of projects do their work on GitHub or similar, which makes it easy to share. They have private accounts for business purposes too. Very useful if you want to rewind and see where/when a problem was introduced.
If you get a copy of cpanm (simple to install, fairly lightweight), it can install from a tar.gz file or even directly from a git repository. You can also tell it to install to a local directory (local::lib compatible; another utility that's very useful).
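For instance (the distribution name, URL, and directory are just placeholders):
cpanm My-App-0.01.tar.gz
cpanm git://github.com/you/My-App.git
cpanm -l ~/perl5 My-App-0.01.tar.gz
The first installs from a local tarball, the second straight from a git repository, and the third into a local::lib-style directory under ~/perl5.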
Hopefully that's reasonably up-to-date as of 2014. You may see Dist::Zilla mentioned for module development. My understanding is that it's most useful for those with a large family of CPAN distributions to manage. Oh - if you (or other readers) aren't aware of them, do check out autodie and Try::Tiny around errors and exceptions, Moose (for a full-featured object-oriented framework) and Moo (for a smaller lightweight version).
I think that advice is all reasonably non-controversial. I find cpanm to be much more pleasant than the "full" cpan client, and Moo seems pretty popular nowadays too.
Take a look at Module::Starter and its much more capable (and complex) successor Dist::Zilla.
Whatever you do, don't use h2xs. Module::Starter was created specifically because h2xs was such an inappropriate tool for creating distributions.

Should I put included code under SCM?

I'm developing a web app.
If I include a jQuery plugin (or the jQuery file itself), this has to be put under my static directory, which is under SCM, to be served correctly.
Should I gitignore it, or add it, even if I don't plan on modifying anything from it?
And what about binary files (graphic resources) that might come with it?
Thanks in advance for any advice!
My view is that everything you need for your application to run correctly needs to be managed. This includes third-party code.
If you don't put it under SCM, how is it going to get deployed correctly on your production systems? If you have other ways of ensuring that, that's fine, but otherwise you run the risk that successful deployment is a matter of people remembering to do all the right things, rather than some automated low-risk "push the button" procedure.
If you don't manage it under SCM or something similar, how do you ensure that the versions you develop against and test against are the same? And that they're the same as production? Debugging an issue caused by a version difference you don't notice can be horrible.
I generally add external resources to my project directly. Doing so facilitates deployment and ensures that if someone changes the version of this file in your project, you have a clear audit history of what happened in case it causes issues in the code that you've written. Developers should know not to modify these external resources.
You could use something like git submodules, I suppose, but I haven't felt that this is worth the hassle in the past.
Binary files from external sources can be checked in to the project as well, although if they're extremely large you may want to consider a different approach.
There aren't a lot of reasons not to put external resources like jQuery into your repo, and a couple of good reasons to do it:
If you pull it down from jQuery every time you check out or deploy, you have less control over which version you're using. This holds true for most third-party libraries; you probably don't want to upgrade your libraries without testing with your code to see if it breaks something.
You'll always have a complete copy of your site when you check out your repository and you won't need to go seeking resources that may have become unavailable.
For small (in terms of filesize) things like jQuery and images, I'd just add them unless you're really, really concerned about space.
It depends.
These arguments relate to having a copy of the library on your system and not pulling it from its original location.
Arguments in favour:
It will ensure that everything needed for your project can be found in one place when someone else joins your development team. I've lost count of the number of times I've had to scramble around looking for the right versions of libraries in order to be able to get something working.
If you make any modifications to the library, you can make these changes to the source-controlled version, so when a new version comes out you can use the source control's merging tools to ensure your edits don't go missing.
Arguments against:
It could mean everyone has a copy of the library locally - unless you map the 3rd party tools to a central server.
Deploying could be problematic - again, unless you map the 3rd party tools to a central server and don't include them in the deploy script.

How can I create a Zip archive in Perl?

I need to create a Zip archive after filtering the list of files I want to include. Preferably I'd like the module to work in both Windows and Linux.
Since I need to filter the list of files, I don't really want to use an external program. I'd rather not introduce external dependencies either, so I can compile the script into a single executable on Windows (using ActiveState PDK).
What I already tried
Until now I've used Archive::Zip from CPAN, but it has a major bug on Windows machines that use non-ASCII filenames: the filenames get corrupted in the archive because they don't get translated into Unicode.
There is a bug report filed for that, but it hasn't been updated in over 10 months, and in the module documentation the developer is rather unhelpful (of the "fix your computer or get rid of Windows" kind).
Update:
Thanks to the clarifications from brian and Alan Haggai Alavi, it seems that enough love is being put into Archive::Zip to get these bugs fixed soon and finally have a fully functioning zip module on Windows.
Although the module documentation says some stupid things about Windows, the current maintainer is Adam Kennedy, the same guy who brought you Strawberry Perl. He's definitely not anti-Windows. He released a version in October, so they are working on it. There's also an open grant from The Perl Foundation to fix Archive::Extract bugs. The bug you mention, RT 35334: Filename Encoding by Archive::Zip, maybe just needs someone to show it some love. That could be you. People solve the problems that bother them, so maybe nobody interested in the module needs this just yet.
The module has had problems, and I've been following its progress since I use it in a couple projects. It has gotten a lot better recently and can certainly use some love. Sometimes open source means helping to fix the problems that you encounter. I know this doesn't help you solve your problem immediately, but that's how I think you're going to get this done aside from system() calls.
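For reference, the basic create-an-archive usage looks roughly like this (the filter and file names are made up, and how non-ASCII names fare depends on the bug discussed above):
use Archive::Zip qw(:ERROR_CODES);
my $zip = Archive::Zip->new();
$zip->addFile($_) for grep { /\.html?$/ } glob '*';   # apply whatever filtering you need
$zip->writeToFileNamed('archive.zip') == AZ_OK or die 'write error';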
The above-mentioned bug has recently been solved by the addition of Unicode filename support under Windows. A release featuring the fix will be available on CPAN within a week.
You could try the standard-distribution Archive::Extract. It may not be any better than Archive::Zip, but the documentation says that, if there are problems, it goes under the hood to try to use command-line tools on your system to unzip the file. This is probably most robust on Unix, but Windows has a zip archive utility, and it should be accessible via the command line. Plus, Archive::Extract can handle many other types of compression (theoretically).
Of course, it may turn out that Archive::Extract simply figures out what kind of compression the file uses and then passes it to the appropriate other library, which might be Archive::Zip.
You might also try IO::Uncompress::Unzip and its counterpart, IO::Compress::Zip, for just unzipping, reading, and rezipping, if absolutely necessary. Again, I don't know how much better these will work, but they are all part of the standard library.
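A minimal sketch with IO::Compress::Zip, in case you want to stay with the bundled modules (the file names and filter are placeholders, and I haven't checked how it handles non-ASCII names on Windows):
use IO::Compress::Zip qw(zip $ZipError);
my @files = grep { -f && /\.txt$/ } glob '*';   # whatever filtering you need
zip \@files => 'archive.zip' or die "zip failed: $ZipError";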

Version control of deliverables

We need to regularly synchronize many dozens of binary files (project executables and DLLs) between many developers at several different locations, so that every developer has an up-to-date environment to build and test in. Due to the nature of the project, updates must be done often and on demand (overnight updates are not sufficient). This is not pretty, but we are stuck with it for a time.
We settled on using a regular version (source) control system: put everything into it as binary files, get-latest before testing and check-in updated DLL after testing.
It works fine, but a version control client has a lot of features which don't make sense for us and people occasionally get confused.
Are there any tools better suited for the task? Or maybe a completely different approach?
Update:
I need to clarify that it's not a tightly integrated project; it's more like an extensible system with a heap of "plugins", including third-party ones. We need to make sure those plugin modules work nicely with recent versions of each other and of the core. A centralised build, as suggested, was considered initially, but it's not an option.
I'd probably take a look at rsync.
Just create a .CMD file that contains the call to rsync with all the correct parameters and let people call that. rsync is very smart in deciding what part of files need to be transferred, so it'll be very fast even when large files are involved.
What rsync doesn't do though is conflict resolution (or even detection), but in the scenario you described it's more like reading from a central place which is what rsync is designed to handle.
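The wrapper can be as small as this, assuming cwRsync on the Windows side; the server name, module, and local path are invented for the example:
rem sync_binaries.cmd - pull the latest deliverables from the central server
rsync -avz --delete buildserver::deliverables/current/ /cygdrive/c/dev/binaries/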
Another option is Unison.
You should look into continuous integration and having some kind of centralised build process. I can only imagine the kind of hell you're going through with your current approach.
Obviously that doesn't help with the keeping your local files in sync, but I think you have bigger problems with your process.
Building the project should be a centralized process in order to allow for better control; otherwise your solution will descend into chaos in the long run. Anyway, here is what I'd do:
Create the usual repositories for source files, resources, documentation, etc. for each project.
Create a repository for resources. This will hold the latest binary versions for each project as well as any required resources, files, etc. Keep a good folder structure for each project so developers can "reference" the files directly.
Create a repository for final builds, which will hold the actual stable release. This will get the stable files, produced in an automatic way (if possible) from the checked-in sources. This will hold the real product, the real version for integration testing and so on.
While far from perfect, this will let you define well-established protocols: check in your latest DLL here, generate the "real" version from the latest source there.
What about embedding a 'what' string in the executables and libraries? Then you can synchronise the desired list of versions against a manifest.
We tend to use CVS id strings as a part of the what string.
const char cvsid[] = "@(#)INETOPS_filter_ip_$Revision: 1.9 $"; /* @(#) is the marker the what(1) utility searches for */
Entering the command
what filter_ip | grep INETOPS
returns
INETOPS_filter_ip_$Revision: 1.9 $
We do this for all deliverables so we can see whether the versions in a bundle of libraries and executables match the list in an associated manifest.
HTH.
cheers,
Rob
Subversion handles binary files really well, is pretty fast, and scriptable. VisualSVN and TortoiseSVN make dealing with Subversion very easy too.
You could set up a folder that's checked out from Subversion with all your binary files (that all developers can commit to and update from), then just type "svn update" at the command line, or use TortoiseSVN: right-click on the folder, click "SVN Update", and it'll update all the files and tell you what's changed.
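A minimal setup along those lines, with the repository URL and local path as placeholders: check out once with
svn checkout https://svn.example.com/repos/binaries C:\work\binaries
and from then on everyone just runs
svn update C:\work\binaries
(or the TortoiseSVN equivalent) whenever they need the latest set.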