How to create a VCS-like conflict-merge file? - merge

I'm trying to generate "unresolved-conflict"-like files, with no luck.
I checked the diff manpage and googled about diff, merging, etc., but I only found information about how to handle these files, not about how to actually generate them.
To be clear, what I am trying to do is, given two similar files, generate a single automatically merged one similar to what most VCS systems like Git or Subversion generate for files in "conflict" status.
The main goal is to be able to rapidly edit it and manually resolve all the differences, just as I do in Git or Subversion, but without having the files in any VCS.
I "almost" successfully generated full diffs with the diff -C 1000000 command (since I won't have very large files, that context limit is perfectly acceptable).
...but the resulting file comes with ALL rows modified; that is, prepended with "-" or "+" (depending on whether the row comes from the first or second file) or " " (a space) for common rows.
What I would like to obtain is an "almost unchanged" file with sections like the following example, emphasizing the differences:
<<<<<<<< File1
Section from File1
Foo
========
Section from File2
Bar
>>>>>>>> File2
EDIT:
Answering @s.m.'s comment, I explain here what my exact goal is (because it is too long to explain in a comment):
I'm working on a server to host multiple PostgreSQL clusters acting as hot standbys of distinct masters.
I already successfully implemented binary full/incremental backups (Bacula) on the production servers and also have a helper script to configure hot-standby servers.
But for now we have to set up (and maintain, and ideally periodically check) all of them one by one.
To make this simpler, we are planning to create a single (or possibly multiple) "super" hot-standby server(s) containing multiple clusters replicating different master servers.
My goal is to have a single script to create a new standby cluster easily, without overly complicated tuning and without having to bother about backup setup (because all clusters will be backed up at once).
I have almost successfully implemented that script: it creates a new cluster on a free port, adjusts the needed configuration parameters and puts it in sync with its master.
These adjustments are made over "default" configuration files, but some masters may have special configuration parameters (especially memory adjustments) that must be replicated on the standby because, otherwise, it could be unable to replay some operations of the master. And there is also pg_hba.conf, which defines which users/servers are allowed to connect, which we also want to replicate on the standby (for an eventual failover).
So, to make it easier (and less error prone) to merge both configuration files, I implemented a bash function to retrieve the configuration files from the masters and, now, my goal is to merge them with the aforementioned "default-tuned" ones.
This way, adding a new standby would be as easy as executing our script with the master's network name and reviewing the automatically merged files to manually resolve the few differences encountered in the merge.
EDIT 2:
To be clear, what I was trying to do, in order of preference, was:
Solve it by just using GNU diff (as @s.m. pointed out in his comment), even with complex arguments or by piping to external tools usually available on most Unix-like systems, so I can wrap it in a bash function and use it in my script with no dependencies.
Use some existing tool (rather than reinvent the wheel).
Implement my own tool and use it.

Lacking a better solution, I finally implemented my own tool (which I called 'humandiff') to solve it.
I published it on GitHub and uploaded it as an npm package, so I can now install it from npm on the production servers.
It needs a little setup to be installed, though. That is:
Install NodeJS and NPM (sudo apt-get install nodejs-legacy npm on Debian-like systems).
Install humandiff itself (sudo npm install -g humandiff).
Usage and output examples can be found in the README file, so I won't go into more detail here.
I post this answer just in case someone happens to have the same problem but, anyway, better solutions would be welcome too.
Edit: I forgot to say, even though it's pretty obvious, that in fact I didn't implement any diff algorithm at all. I just noticed that, with the position and offset metadata provided by GNU diff plus one of the original files, it is possible to reconstruct the other one or the merged file I was searching for, so I simply implemented a wrapper to do so. But, instead of calling the GNU diff binary, I found a module also named "diff" in the npm repository that served the same purpose for me.
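For comparison, plain GNU diff can get quite close to the desired output on its own via its line-group format options; here is a rough sketch (the marker labels are placeholders and I have not exercised every corner case):

```bash
#!/bin/bash
# Sketch: emit a git-style "conflict" merge of two files using only GNU diff.
# Unchanged lines are printed as-is; differing regions are wrapped in
# conflict markers. The labels ("File1"/"File2") are placeholders.
conflict_merge() {
    diff \
        --unchanged-group-format='%=' \
        --old-group-format=$'<<<<<<< File1\n%<=======\n>>>>>>> File2\n' \
        --new-group-format=$'<<<<<<< File1\n=======\n%>>>>>>>> File2\n' \
        --changed-group-format=$'<<<<<<< File1\n%<=======\n%>>>>>>>> File2\n' \
        "$1" "$2"
}

conflict_merge file1.txt file2.txt > merged.txt   # exit status 1 just means "files differ"
```

Note that the --*-group-format options are GNU diff extensions, so this will not work with BSD diff.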

Related

Need to SVN export changed projects

I maintain an SVN repository that has code dating back 9 years. This repo contains code for hundreds of applications, many of which are obsolete. The back-end is VisualSVN Server running on a Windows server, and the client used is TortoiseSVN with the CLI tools.
We are now prepping to migrate to a different SCM application, and I need to export only the applications which have been updated since January 1, 2021 (just under 40K revisions). If I only needed to export the artifacts that had changed since that date, it would be simple. But instead, I need to export any project that contains any file (no matter how far down the tree) that has been created or modified since that date.
Note that while tags created after the cut-off date will be migrated (to make audits simpler), these should not be considered when deciding what is active or not.
I have no qualms about making this a multi-step process (identifying active project root folders and putting them into a changelist, then exporting just those folders), but I'd prefer a single-command solution if I can manage it. While Unix commands (like grep) are potentially possible, I'd prefer to stick to Windows commands (like findstr) if at all possible.
Things I want to avoid at all costs:
checking out the entire repo
considering anything in any of the thousands of /tags/ folders (active branches are fine)
I had started out trying
svn diff --summarize -r{2021-01-01}:HEAD https://Url.To/Repo/
but after it had been running a while, I realized that it was trying to look at all of the /tags/ folders and would likely end up taking a week or more to finish.
I've also looked at the ls -r -verbose and log commands, but neither seems to give me what I need.
Can anyone help come up with either the single or the two-step (or more) commands? I don't mind it taking several hours to run (or even overnight), but when we do the final migration, I can't take the SCM offline for more than one weekend, including the import on the other side. As a worst case, I can search the repo for all instances of /trunk/ (this would be the root of each project), and then loop through that output. Is there a cleaner/more elegant way to do this?
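One possible multi-step approach, sketched with Unix-style tools since they make the shortest one-liner (untested against your repository; you could translate it to PowerShell/findstr): use svn log to list every path touched since the cut-off, drop anything under /tags/, and collapse each path to its project root.

```bash
# List changed paths since the cut-off, ignore /tags/, and reduce each path
# to its project root (the part before /trunk/ or /branches/).
# The URL and date come from the question; the output file name is arbitrary.
svn log -v -q -r '{2021-01-01}:HEAD' https://Url.To/Repo/ \
  | grep -E '^   [ADMR] ' \
  | awk '{print $2}' \
  | grep -v '/tags/' \
  | sed -E 's#/(trunk|branches)/.*##' \
  | sort -u > active-projects.txt

# Then export just those roots (or restrict to ${project}/trunk if you prefer):
while read -r project; do
  svn export "https://Url.To/Repo${project}" "export${project}"
done < active-projects.txt
```

Since svn log walks the revision history once rather than diffing the whole tree, it should be considerably cheaper than svn diff --summarize over the entire repository.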

Firebird minimum server installation

I have installed the Firebird server from the zip kit using instsvc.exe. This works well with the Inno Setup Exec function.
instsvc install -auto -name 'FireBird2_5'
My question is: what are the minimum files necessary to install the Firebird server?
The installer is too slow due to unnecessary files. I found this link and I'm looking for something similar.
The total size of Firebird 2.5.8 is 230 files and +/- 30 MB unzipped; I doubt this would really be a problem, but if you really want to minimize things, you can remove the following.
Using Firebird-2.5.8.27089-0_x64.zip as the basis, you can get rid of the following files or folders because they are just examples and documentation, or files for specific purposes (if you know you need them, don't delete them):
doc
examples
help
include
lib
misc
system32
udf (most have been replaced by built-in functions anyway)
Readme.txt
In theory you can remove the intl folder, but that will severely limit character set support in Firebird which can cause a lot of problems, so I'd advise against that.
If I'm not mistaken it should also be possible to remove plugin\fbtrace.dll and fbtrace.conf, but you may want to double check that.
From the bin folder, you can get rid of the following files:
fbguard.exe (make sure you don't enable use of Firebird Guardian using instsvc)
gdef.exe (tool for deprecated GDL DDL language)
gpre.exe (preprocessor for compiling embedded SQL, unlikely you need this)
gsplit.exe (tool for splitting backup files)
install_classic.bat
install_super.bat
install_superclassic.bat
qli.exe (tool for a deprecated query language)
uninstall.bat
If you don't need the administrative tools (though this might not be a good idea, because management and fixing or diagnosing database problems get harder), you can also remove the following from bin:
fb_lock_print.exe
fbsvmgr.exe
fbtracemgr.exe
gbak.exe
gfix.exe
gsec.exe
gstat.exe
isql.exe
nbackup.exe
In theory you could also get rid of fb_inet_server.exe or fbserver.exe, depending on whether you use Classic, SuperServer or SuperClassic. Classic and SuperClassic use fb_inet_server.exe and SuperServer fbserver.exe; you can delete the other.
The other files are either technically necessary or legally necessary (the license notices).
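Purely as an illustration, a trimming script based on the lists above might look like this (run from the unzipped directory, e.g. in Git Bash or a similar shell; double-check each item against your own needs first):

```bash
# Illustrative only: strip the optional files/folders listed above from an
# unzipped Firebird-2.5.8.27089-0_x64 directory.
rm -rf doc examples help include lib misc system32 udf Readme.txt
rm -f bin/fbguard.exe bin/gdef.exe bin/gpre.exe bin/gsplit.exe bin/qli.exe
rm -f bin/install_classic.bat bin/install_super.bat bin/install_superclassic.bat bin/uninstall.bat
# Optional: drop the admin tools too (management and diagnostics get harder).
# rm -f bin/fb_lock_print.exe bin/fbsvmgr.exe bin/fbtracemgr.exe bin/gbak.exe \
#       bin/gfix.exe bin/gsec.exe bin/gstat.exe bin/isql.exe bin/nbackup.exe
# SuperServer setup: keep fbserver.exe, drop the Classic/SuperClassic binary.
rm -f bin/fb_inet_server.exe
```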

Flyway Oracle Deployment

I just started working on a new project. We are building a new application from scratch. Team started with a brand new schema. I wanted to automate the database build process, so I started looking for the options. Flyway seems to be a good one. I have been playing around a bit and found some limitations of the tool. Perhaps, someone will be able to help.
We have the following directory structure for SQL files:
SQL
-- DDL
-- DML
-- PACKAGES
We are doing agile development, so file names are based on the sprint number. The file naming convention we are using is:
Sprint#_script#_userstory#_description
For example:
S1_01_US123_CreateNewTable.sql
S1_02_US123_AddConstraint.sql
Next sprint:
S2_01_US456_AddColumn.sql
And so on...
I set up the JDBC parameters and I am able to connect. I tested basic things like clean, repair, info and migrate with a couple of test scripts, and that worked like a charm. I started to run into issues when I tried deploying all the scripts. Issues like:
- It didn't like the single underscore.
- It didn't like file names starting with S1_01_* where the rest of the file name is different and the files are in different folders.
I have the following questions:
Can I build using Flyway without having to rename the files?
How can I get it to deploy in this order:
DDLs
DMLs
Packages (every time I deploy). We also have separate header and body files, so deploy the header first as well.
Can I change the structure of schema_version table?
Can I do a selective clean? Like flagging some of the objects not to be dropped?
My main concern is running DDLs before everything else. If I can accomplish that, then I can start using Flyway and learn as I go.
Thanks in advance.
Harbinder
Can I build using Flyway without having to rename the files?
Maybe. Experiment with the flyway.sqlMigrationSeparator property. Try "_US" which will break after the script number. You'll also need to set flyway.sqlMigrationPrefix=S.
How can I get it to deploy in this order: DDLs, DMLs, Packages (everytime I deploy). And we have a separate header and body files, so deploy header first as well.
Specify multiple locations (separated with comma) and ensure the version numbering ordering makes sense as if these files were all in the same directory. If running from the command line, turn on debug with -X to see how flyway collects the migrations.
Additionally, if possible, you should consider renaming your package scripts as Repeatable migrations (default prefix: R) so that you only need to change the contents of the file for Flyway to pick them up.
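Put together, the relevant command-line settings might look roughly like this (a sketch, not verified against your exact layout; the same keys can also go into flyway.conf):

```bash
# Sketch: run migrations from the three folders, treating "S" as the version
# prefix and "_US" as the separator, with debug output (-X) to see how the
# migrations are collected and ordered.
flyway -X \
  -locations=filesystem:SQL/DDL,filesystem:SQL/DML,filesystem:SQL/PACKAGES \
  -sqlMigrationPrefix=S \
  -sqlMigrationSeparator=_US \
  migrate
```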
Can I change the structure of schema_version table?
No. This is managed by flyway.
Can I do selective clean? Like flag some of the objects to not to be dropped?
No. In this situation it might be best to set flyway.cleanDisabled=true to stop accidental mistakes. There are callbacks before and after clean if you wish to do extra cleaning but I don't think you can restrict clean itself without delving into the code.
Good luck!

How to make webpack generate consistent hashes across multiple builds

Background
In a Webpack configuration, you can specify the naming convention for emitted files as in [name]-[hash].js. I use this in combination with the html-webpack-plugin to generate .html.erb partials for use in a Rails app to include correct assets on deployment. Every Webpack build produces a unique fingerprint in filenames, which works great ... except for when you scale your app to multiple servers, where Webpack is part of the build process (a fresh new build for each server). Rails does a similar fingerprinting of precompiled assets.
github.css becomes github-448c90f2e2f181cd43b943786ee6f.css
Problem
Because the app is scaled to multiple servers behind a load balancer (using Elastic Beanstalk), the builds must be exactly the same on each deploy. As Webpack generates a unique hash per build, we get 404s on page loads, as the generated assets are not in sync.
Question
Has anyone figured out how to get the same hash across multiple builds? Possibly based on the git commit hash? That's what I'm thinking, but lots of searching has yielded no results. Not above building it myself.
I had the same problem as Kyle: Using Elastic Beanstalk with multiple servers, each server using Webpack generates a different hash.
First, I tried using [contenthash]. I thought this would work because, unlike [hash], it is based on the content of a file. It didn't work. My suspicion is that each server is using a different salt.
I think you could resolve this by specifying the salt with output.hashSalt; however, I have not tested this, as I have since eliminated the need to use a hash in the filename.
The hashes are deterministic and as long as the content of the included files is the same, the hash will be the same as well. This also includes dependencies, so it's important to have the exact same dependencies.
Yarn uses a yarn.lock file to guarantee that the dependencies installed are identical on every install, which makes it very simple to have the exact same build every time on every machine. With npm you can use npm shrinkwrap to lock down the versions of the dependencies, but this is usually quite tedious to manage (one of the reasons Yarn was created and why it uses a lockfile).
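In practice that means pinning the dependency tree before each build; for example, with Yarn (assuming your build script is what runs webpack):

```bash
# Install exactly what yarn.lock specifies, then build; identical inputs on
# every server should yield identical webpack hashes.
yarn install --frozen-lockfile
yarn run build
```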
You might also want to read Guides - Caching in the official docs.

How to deploy: database, source and binary changes in 1 patch?

I'm part of a development team that works on many CMS based projects, using systems like Joomla and Drupal.
In our development process, all of our code changes are managed inside of Git. At the end of a sprint, we create a DIFF that we can apply via patch to live site.
The problem is that most of the time, the changes include
Database Schema Changes
Database Data Changes
Source Code changes
Binary file changes (like images)
Git diff handles source code changes beautifully. Binary files are not included in the diff, except for a reference to the fact that the files have changed.
Database Schema Changes and Database Data Changes are a mess.
I was wondering if anything like a unified patch system exists that could be used to deploy all of these changes in one patch.
So the question is, "Is there a system that can be used to deploy all of these changes in one shot?"
Ideally, this system would allow a dry run like patch does, but for all four data types.
Edit:
Thank you everyone for the feedback that you provided, it was a starting point for my research in this area.
Here is what I found so far:
It's difficult to deploy PHP-based applications using a Linux packaging system because the changes to the project happen iteratively rather than as releases.
It would be possible to use dbconfig to deploy changes to a project, but the problem is generating MySQL DB diffs (schema and data).
What is really missing for the deployment of PHP-based applications is a deployment manager that would be installed on the server and would be the interface for deploying the patches.
I started a Google Wave on this topic and produced a lot of information as a result.
If anyone is interested in reading this wave, please let me know and I will add you.
For handling installation and upgrades of our application, we use the Debian packaging system (.deb packages).
Context:
We are making a J2EE + Flex application, shipped and administered through a VPN.
So not so far from your case.
Fresh installs and upgrades from one version to another are made through Puppet (a system for automating system administration tasks: it installs our .deb).
In the .deb we have
our compiled source code
the schema of the database (handled by [db-config][1])
binary stuff
how to install through apt all the other applications needed (mysql, tomcat, ...)
=> all the stuff needed for a fresh install
We also add the info to go from one version to another:
the scripts for upgrading the database (one for each version)
new binaries
new stuff to launch at machine start (e.g., a few weeks ago we added an ActiveMQ server)
=> Once the .deb is made correctly, we can install or upgrade seamlessly in one operation (it's done automatically, without any prompt).
There is one .deb per release; each .deb has a version number and a signature.
You can pick any of our .debs and do a fresh install or an upgrade from the current version to the version it holds.
The .deb is built in our continuous integration system (we build a .deb every hour, as if we were about to release a new version).
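As a rough sketch of the mechanics (every name and path below is hypothetical, not their actual layout), such a package can be assembled with dpkg-deb:

```bash
# Lay out the package contents, write a minimal control file, then build.
mkdir -p myapp-1.3/DEBIAN myapp-1.3/opt/myapp/sql
cp -r build/* myapp-1.3/opt/myapp/              # compiled code and binaries
cp sql/1.3-update.sql myapp-1.3/opt/myapp/sql/  # per-version DB upgrade script

cat > myapp-1.3/DEBIAN/control <<'EOF'
Package: myapp
Version: 1.3
Architecture: all
Maintainer: Ops Team <ops@example.com>
Depends: mysql-server, tomcat7
Description: Example application package (illustrative only)
EOF

dpkg-deb --build myapp-1.3 myapp_1.3_all.deb
```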
What are the benefits?
Install/upgrade automatically, with confidence.
Roll back a version.
Dry runs are natively supported.
In your precise case
* Database Schema Changes
* Database Data Changes
* Source Code changes
* Binary file changes (like images)
Database => you will have to write migration scripts, one for each version (e.g., 1.2-update.sql, 1.3-update.sql).
Source code and binaries => add them and say in which version they have to be copied/used.
Edit: I'm not sure about source code. We are doing that with compiled code...
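A sketch of what the database part of such an upgrade could look like in a maintainer script (the applied_versions bookkeeping table, database name and paths are all hypothetical):

```bash
# Apply each per-version script exactly once, in order.
for script in /opt/myapp/sql/*-update.sql; do
    version=$(basename "$script" -update.sql)
    applied=$(mysql -N -e "SELECT COUNT(*) FROM applied_versions WHERE version='$version'" myapp)
    if [ "$applied" -eq 0 ]; then
        mysql myapp < "$script"
        mysql -e "INSERT INTO applied_versions (version) VALUES ('$version')" myapp
    fi
done
```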
Some links to get started:
https://wiki.ubuntu.com/PackagingGuide/Complete
http://www.debian.org/doc/manuals/maint-guide/index.fr.html#contents ( in french )
[1]: http://pwet.fr/man/linux/formats/dbconfig dbconfig
[1]: http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html debian
I don't think you'll find a fail-safe mechanism.
I recommend that, when possible, you take into account compatibility with the current published source when making schema/data changes.
This way you can make a very simple tool that runs database scripts committed to a particular SVN location (you don't want diffs of database changes, since if you need further modifications you need different statements).
With the above done, you can have a simple command that runs the database changes, then the binary & source code changes.
For the database there is also the option of schema & data comparison tools; these could be used to compare environments and make sure there isn't anything unexpected missing from the change scripts. They could also generate the change scripts, but as I said, you really want to make sure they won't break the current source.
You can create a tool to do the migrations painlessly -- something similar to Peoplesoft's Patch Upgrade Assistant.
It is basically a standalone executable that reads an "Upgrade Template" and carries out tasks. The upgrade template declaratively describes the upgrade tasks or "steps". The steps could be: copy (for backing up or moving precompiled objects like classes and other binaries), database (for altering schema elements), and SQL scripts (for loading or transforming current data). The steps can carry some predicate logic: if this condition holds, do this, else skip it and go to the next, etc.
The template is usually an XML file. It also provides for manual steps with instructions for manual actions. Each step also specifies whether it is recoverable or not. It would also validate whether the step has succeeded.
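Purely as an illustration of the idea (the element and attribute names below are invented for this sketch, not any tool's actual format), such a template might look like:

```xml
<!-- Hypothetical upgrade template: names are illustrative only. -->
<upgrade from="1.2" to="1.3">
  <!-- back up precompiled objects before touching anything -->
  <step type="copy" recoverable="true">
    <source>classes/</source>
    <target>backup/classes-1.2/</target>
  </step>
  <!-- alter schema elements, then validate the result -->
  <step type="database" recoverable="false" validate="true">
    <script>1.3-schema.sql</script>
  </step>
  <!-- load or transform current data, only if the predicate holds -->
  <step type="sql" condition="hasLegacyData">
    <script>1.3-transform-data.sql</script>
  </step>
  <!-- manual step with instructions for the operator -->
  <step type="manual">
    <instructions>Restart the application server and verify the login page.</instructions>
  </step>
</upgrade>
```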
It may be possible to build an open-source project around this requirement, which is quite common.
You need to save the git commit objects in a local file and then import them into the other repo/branch.
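If that means moving commits between repositories without a shared remote, git bundle is one way to do it (a sketch; the branch and file names are just examples):

```bash
# In the source repository: pack the commits reachable from main into a file.
git bundle create changes.bundle main

# In the target repository: fetch from the bundle into a local branch,
# then merge (or cherry-pick / rebase) as needed.
git fetch /path/to/changes.bundle main:imported-changes
git merge imported-changes
```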