Need to SVN export projects changed since a date

I maintain an SVN repository with code dating back 9 years. This repo contains code for hundreds of applications, many of which are obsolete. The back-end is VisualSVN Server running on a Windows server, and the client used is TortoiseSVN along with the CLI tools.
We are now prepping to migrate to a different SCM application, and I need to export only the applications which have been updated since January 1, 2021 (just under 40K revisions). If I only needed to export the individual artifacts that had changed since that date, it would be simple. But instead, I need to export any project that contains any file (no matter how far down the tree) that has been created or modified since that date.
Note that while tags created after the cut-off date will be migrated (to make audits simpler), these should not be considered when deciding what is active or not.
I have no qualms about making this a multi-step process (identifying active project root folders and putting them into a changelist, then exporting just those folders), but I'd prefer a single-command solution if I can manage it. While Unix-style commands (like grep) are potentially possible, I'd prefer to stick to Windows commands (like findstr) if at all possible.
Things I want to avoid at all costs:
checking out the entire repo
considering anything in any of the thousands of /tags/ folders (active branches are fine)
I had started out trying
svn diff --summarize -r{2021-01-01}:HEAD https://Url.To/Repo/
but after it had been running a while, I realized that it was trying to look at all of the /tags/ and would likely end up taking a week or more to finish.
I've also looked at the svn list --verbose and svn log commands, but neither seems to give me what I need on its own.
Can anyone help come up with either the single-command or the two-step (or more) solution? I don't mind it taking several hours to run (or even overnight), but when we do the final migration, I can't take the SCM offline for more than one weekend, including the import on the other side. As a worst case, I can search the repo for all instances of /trunk/ (this would be the root of each project) and then loop through that output. Is there a cleaner/more elegant way to do this?
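To make the multi-step idea concrete, this is roughly what I have in mind (an untested PowerShell sketch against the svn CLI; it assumes a <root>/trunk layout and paths without spaces, and svn log here runs entirely server-side, so nothing gets checked out):

$url = 'https://Url.To/Repo'
$roots = svn log -q -v -r '{2021-01-01}:HEAD' $url |
    Where-Object { $_ -match '^\s+[ADMR]\s+(/\S+)' } |            # changed-path lines only
    ForEach-Object { $Matches[1] } |
    Where-Object { $_ -notmatch '/tags/' } |                      # ignore tag churn entirely
    ForEach-Object { $_ -replace '/(trunk|branches)(/.*)?$', '' } |
    Sort-Object -Unique
$roots | Set-Content active-projects.txt                          # step 1: the active list
foreach ($r in $roots) {                                          # step 2: export each root
    # exports trunk only; drop the '/trunk' suffix to take branches along as well
    svn export "$url$r/trunk" ('.\export' + ($r -replace '/', '_'))
}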

How to generate VCS-like conflict-merged files?

I'm trying to generate "unresolved-conflict"-like files with no luck.
I checked the diff manpage and googled about diff, merging, etc., but I only found information about how to handle such files, not about how to actually generate them.
To be clear, what I am trying to do is, given two similar files, generate a single automatically merged one, similar to what VCS systems like Git or Subversion generate for files in "conflict" status.
The main goal is to be able to rapidly edit it and manually resolve all differences, just as I do in Git or Subversion, but without having the files in any VCS.
I "almost" successfully generated full diffs with diff -C 1000000command (because I won't have too large files that context limit is pretty acceptable).
...but the resulting file comes with ALL rows marked. That is: prepended with "-" or "+" (depending on whether the row comes from the first or the second file) or " " (space) for common rows.
What I want to obtain is an "almost unchanged" file with sections like the following example emphasizing the differences:
<<<<<<<< File1
Section from File1
Foo
========
Section from File2
Bar
>>>>>>>> File2
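It looks like GNU diff's if-then-else group formats might get close to this; here is a rough, untested sketch (GNU diffutils only; the marker labels are arbitrary):

diff \
  --unchanged-group-format='%=' \
  --old-group-format='<<<<<<<< File1
%<========
>>>>>>>> File2
' \
  --new-group-format='<<<<<<<< File1
========
%>>>>>>>>> File2
' \
  --changed-group-format='<<<<<<<< File1
%<========
%>>>>>>>>> File2
' \
  File1 File2

(The %<, %=, and %> specifiers stand for the lines from the first file, the lines common to both, and the lines from the second file, respectively.)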
EDIT:
Answering #s.m.'s comment, here is my exact goal (it is too long to explain in a comment):
I'm working on a server to host multiple PostgreSQL clusters acting as hot standbys of distinct masters.
I already successfully implemented binary full/incremental backups (Bacula) over the production servers and also have a helper script to configure hot-standby servers.
But for now we have to set up (and maintain, and ideally periodically check) all of them one by one.
To make this simpler, we are planning to create a single (or possibly multiple) "super" hot-standby server(s) containing multiple clusters replicating different master servers.
My goal is to have a single script to create a new standby cluster easily, without too much complicated tuning and without having to bother with backup setup (because all clusters will be backed up at once).
I almost successfully implemented that script: it creates a new cluster on a free port, adjusts the needed configuration parameters, and puts it in sync with its master.
These adjustments are made over "default" configuration files, but some masters may have special configuration parameters (especially memory settings) that must be replicated on the standby because, otherwise, it could be unable to replay some operations of the master. And there is also pg_hba.conf, which defines which users/servers are allowed to connect, and which we also want to replicate on the standby (for an eventual failover).
So, to make it easier (and less error-prone) to merge both configuration files, I implemented a bash function to retrieve the configuration files from the masters, and now my goal is to merge them with the aforementioned "default-tuned" ones.
This way, adding a new standby would be as easy as executing our script with the master's network name and reviewing the automatically merged files to manually resolve the few differences encountered in the merge.
EDIT 2:
To be clear, what I was trying to do, in order of preference, is:
Solve it by just using GNU diff (as #s.m. pointed out in his comment), even with complex arguments or piping to external tools usually available on most Unix-like systems, so I can wrap it in a bash function and use it in my script without any dependencies.
Use some existing tool (rather than reinvent the wheel).
Implement my own tool and use it.
Lacking a better solution, I finally implemented my own tool (which I called 'humandiff') to approach it.
I published it on GitHub and uploaded it as an npm package, so I can now install it from npm on production servers.
It needs a little setup to be installed, though. That is:
Install NodeJS and NPM (sudo apt-get install nodejs-legacy npm in debian-like systems).
Install humandiff itself (sudo npm install -g humandiff).
Usage and output examples can be found in the README file, so I won't go into more detail here.
I post this answer just in case someone happens to have the same problem; anyway, better solutions would be welcome too.
Edit: I forgot to say, even though it's pretty obvious, that I didn't actually implement any diff algorithm at all. I just noticed that, given the position and offset metadata provided by GNU diff plus one of the original files, it is possible to reconstruct the other file, or the merged file I was searching for, so I simply implemented a wrapper to do so. But instead of calling the GNU diff binary, I found a module in the npm repository, also named "diff", which served me for the same task.

Best option for check-in/out with small team using Visual Studio 2012?

I have a small team of web developers who work together on up to 50 external sites. I am trying to find a better solution to using Dreamweaver's check-in check-out for managing source. We have just started using Visual Studio 2012 here and there and I am curious if TFS is the way to go for us. No one here has ever used versioning or any type of source control before, so I am looking for something similar to what they are used to.
If it matters at all, our sites are all hosted on a Windows 2008 R2 server, and largely written in C#.
I think TFS is a good option to consider. As several people have commented, it will be a jump from what you and your team are used to in Dreamweaver, but I personally feel that if you are serious about managing your intellectual property, you will invest in some sort of version control system. With that said, there will be a learning curve regardless of whether you and your team select TFS, SVN, Git, etc.
Assuming you do go with TFS, you do get the added benefit of everything else that comes with TFS - it's not just about version control. This includes work item tracking, automated builds/deployments, reports, a simple SharePoint site, etc.
With TFS you get the benefit of all of these features combined into a single product. You can accomplish a similar setup using open source products as well, but that would require you to piece the products together.
I'd use the integrated Subversion client in Dreamweaver, which does the basic stuff very nicely and doesn't require the tedious navigation process that will lead to your team bypassing the system. The only problem is that DW does not support the latest versions of SVN, so you need to pick an SVN server that is compatible. Try this:
Setting Up Version Control for Dreamweaver CS6 on Windows
1. Any previous attempts to get version control working may well have created some .svn folders and files on your PC. You MUST remove ALL of these and UNINSTALL ALL OTHER VARIETIES of Subversion software from your PC before you start.
2. Go to the VisualSVN Server website and download an archived standard version of their software, version 2.1.16. Don't be tempted to grab a later version, because that will install SVN 1.7 or 1.8, and neither will work with Dreamweaver.
http://www.visualsvn.com/server/changes/
3. Trying to get DW working directly against a local folder using the file:// protocol probably won't work and is also known to put data at risk. You need the server. I chose to install the VisualSVN server with the default settings, other than opting to use Windows logins and go with HTTP, not HTTPS. I decided to have the repositories live on an internal SSD drive, but any local drive will do. When creating a folder for your repositories to live in, use a name that is pretty general, e.g. ourcorepositories. I used lower case for everything.
4. Right-click on ‘Repositories’ to create a new one. Give it a name without any spaces or special characters, e.g. mynewprojectrepo, and check ‘Create default structure’. Before you OK, note the Repository URL and copy it into Notepad or a similar plain text editor so you can refer to it later during step 6 below. It will be something like
http://OFFICEDESKTOP/svn/mynewprojectrepo
Notice that the capitalised part of the URL is the name of your computer. Click OK and you now have a repository for your project.
5. Boot DW and go to your project. If you don’t have a project yet, create one and stick some dummy files and folders in it. Go to Site menu>>Manage sites… and double-click your project. Select Version Control.
6. Set Access to be ‘Subversion’ (no other choices exist), Protocol to be HTTP and for the Server Address enter the name of your computer in lower case e.g.
officedesktop
For the Repository Path enter (e.g., using the current example from step 4 above)
/svn/mynewprojectrepo
The Server Port should be 80. For the Username enter your Windows user name, in lower case. Enter your Windows password for the Password. This is the name and password combo that you use to log in to your PC. Click the Test button and you should get a success message. If not, the best advice is to delete any .svn files and repositories you have created and start again. Be sure not to add any slashes or omit any; the above works. Before you click Save, click the link to the Adobe Subversion resources and bookmark it in your browser. There is a lot of useful background information there. Click Save, click Done.
7. Go to your DW project and open up Local View. All of your site’s files and folders will have a green + sign beside the icon. Right-click on the site folder and click ‘Version Control>>Commit’. It is a very good idea to leave comments whenever you change anything, so leave a Commit Message along the lines of “The initial commit for My New Project” and click to Commit. If you have a lot of files to go to the repository, they’ll take some time to upload. As they upload, the green + signs disappear to show that your local version is in sync with the repo.
8. Okay, that’s it, you have Version Control in Dreamweaver CS6. It may also work in CS5 and 5.5. Check out those Adobe resources for some good insights on workflow. I can’t help with any other ways to implement version control, but I can maybe save you time by saying that DW doesn’t integrate with Git and that the basic, but integrated, Subversion client in Dreamweaver is way better than having no version control. For coverage against physical disaster, I’d also add in a scheduled daily backup of your entire repositories folder to some cloud storage.
Apologies for any errors. I’d recheck all of the steps, but A) I think they’ll get you up and running and B) it’s easier to do the install and set up the first time than the second time (all those .svn files and folders to get rid of).

How can I utilize source control when my working copy needs to be on a shared host without SSH access?

I'm trying to develop a little toy PHP project, and the most convenient location to run it is on a shared host I happen to have for my ill-maintained blog. The problem with this is that I have no way to run Subversion on this shared host, nor do I even have SSH access to be able to access an external repository from the host. Had I been thinking straight a few months ago when the hosting was up for renewal, I probably should have paid a couple extra bucks to switch to something a bit better, but for now I can't justify throwing money at having a second host just for side projects.
This means that a working copy of my project would need to be checked out to my laptop, while the project itself would need to be uploaded to the shared host to run. My best option seems to be creating a virtual machine running Linux and developing everything from within it, but I know from past experience that the extra barrier that creates, small though it may be, is enough to put me off firing the VM up just to do a couple of minutes' work to make some minor change I just thought up. I'd much prefer to just be able to fire up my editor and get to work.
While I'd imagine I'm not the first to encounter such a problem, I haven't had much success finding a solution online. Perhaps there isn't one beyond the VM or "manual mirroring" options, but if there is I'd expect StackOverflow to be the place to find it.
Edit: There's some confusion, it seems, so let me attempt to clarify. The shared host here is basically my dev server, but it has no svn or ssh. In other words, I can svn checkout to my laptop, but I can't run that on my shared host. Similarly, I can run/test my code on the shared host, but I can't do that on my laptop (well, I technically could, but it's Windows, and I don't want to worry about Win-vs.-Linux differences with PHP, since I do want this to become public at some point, and it will certainly be Linux-based at that point).
You might consider writing a post-commit hook to automatically upload the code to your host, so that any time you commit a change, a script executes that:
Checks out a copy of the code into a temporary directory
Uploads that code via FTP (or whatever your preferred method is) to the shared host
Cleans up after itself, optionally informing you via e.g. email when the transfer is successful
Subversion makes enough information available to these scripts at runtime that you could get more sophisticated and opt only to upload the files that changed or alter behavior based on specific property changes, for instance, but for a small project the brute force "copy it all" approach should be fine.
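A minimal sketch of such a hook, assuming the repository lives on a Unix host with lftp available (the host, credentials, and paths are placeholders):

#!/bin/sh
# post-commit: Subversion invokes this with the repo path and the new revision
REPOS="$1"
REV="$2"
TMP=$(mktemp -d) || exit 1
# export rather than checkout, so no .svn folders land on the shared host
svn export -q -r "$REV" "file://$REPOS/trunk" "$TMP/site"
# mirror the exported tree to the shared host over FTP
lftp -u myuser,mypass ftp.example.com -e "mirror -R $TMP/site /public_html; quit"
rm -rf "$TMP"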

Best way to synchronize my code between multiple workstations?

Firstly, I'm not sure if this belongs here or on Programmers. Please move it if it needs to be there.
I am mostly a hobbyist web developer, with a bit of freelance side work. I program anywhere I can, from a laptop on the go to my home PC. I've pretty well settled on NetBeans as my IDE, and XAMPP for my test environment. My question is: how do I best synchronize changes between my different PCs?
I started out FTPing changes to a "dev" area on my webserver, then FTPing them down to my other PC, but that's sort of a pain. Lately I have started using Dropbox, which takes a lot of the pain out, but still isn't quite as seamless as I'd like.
Has anyone come up with a bulletproof way to easily ensure you're always opening up the latest version of your files across multiple PCs which aren't necessarily always (but sometimes are) on your home network?
Free is a necessity.
I personally use Subversion.
It integrates easily with NetBeans or Eclipse, and you say you've got a webserver, which I presume is Linux-based? It's easy to set up in any Linux environment, though I think it can also be set up in a Windows environment.
Then you just run an update on your code when you want to get the latest version, do check-ins when you like it, and you can always go back to earlier code (like if you tried a two-day experiment that didn't work out and now want to delete it all and go back to what you had that was working).
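The day-to-day commands amount to just this (the repository URL is an example):

svn checkout http://yourserver/svn/myproject/trunk myproject   # once per machine
svn update                                                     # pull the latest before you start
svn commit -m "describe what you changed"                      # push your work when you're happy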
Use some version control system. If you are new to this stuff, Subversion would probably be the easiest to start with, and it is very well integrated with NetBeans.
You can set up a repository on your own server or use some external service - there are a lot of them, and almost every one offers some free plan to start with. I'd be glad to give you some pointers if you like.
Learn to use a version control system.
www.github.com is free for open source projects, but you must pay for private repositories for closed source projects.
http://unfuddle.com uses Subversion, and is free for 200 MB of private source.
You may find some of the links in this thread useful.
A very simple and efficient way is to open an account on dropbox.com.
I disagree with a lot of the answers here (a lot are pretty old). Git/SVN is not a synchronization solution (nor a backup); it is just a version control system. (But if done correctly, you can use Git and a sync tool at the same time.)
By using git for synchronization you get the following side effects:
polluted git log: e.g. git commit -am 'synced files'... 'synced files again', 'synced from laptop', 'synced from desktop'
a substandard workflow: every time you leave your workstation or laptop you have to remember to git commit and push. This takes time and mental energy
Instead, I would recommend a solution that offers continuous sync of your files to a central server. You can close your laptop within five seconds (maybe less) and your changes are propagated to a central server, waiting to sync to other devices when they come online. One proviso: you need to make sure you are not syncing folders like .git, so that a sync of your project's .git from your laptop doesn't corrupt the .git on your desktop. Some options (see the rsync sketch after this list) are:
Synology Cloud Station Drive - I can speak personally to this one. It excludes all "." files by default and syncs at every file change. As soon as you save the file, it is synced.
NextCloud/OwnCloud - I now use Nextcloud, sync all computers, and make sure to exclude .git so that each git repo will track independent changes against origin BUT still be synced between devices.
Google Drive
Dropbox
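If you'd rather script it than run a sync client, the same idea can be approximated with rsync on a timer or file-watcher (the paths and host here are placeholders):

# push local edits to the central box, skipping VCS metadata
rsync -az --delete --exclude='.git/' --exclude='.svn/' \
    ~/projects/ user@myserver.example.com:sync/projects/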
You can set up a web-based source repository on something like http://www.github.com, and be able to access it from any computer.

Storing third-party frameworks/middleware in source control when they need to alter your compiler/IDE

I know there are posts that ask how one stores third-party libraries into source control (such as this and this). While those are great answers, I still can't find the answer to this:
How do you store third-party middleware/framework binaries that need to alter your compiler/IDE for the library to work properly? Note: for my needs, I don't need to store the middleware source; I only store header files / libs / JARs... so that it's ready to be linked.
Typically, you simply link libraries to your app, and you are good. But what about middleware / frameworks that need more?
Specific examples:
Qt moc pre-processor.
ZeroC Ice Slice (ice) compiler (similar to CORBA IDL preprocessor).
Basically these frameworks/middleware need to generate their own code before your application can link to it.
From the point of view of the developer, ideally he wants to just check out, and everything should be ready to go. But then my IDE/compiler will not be set up properly yet, so the compilation will fail...
What do you think?
Back up everything, including the setup of the IDE, operating system, etc. This is what I do:
1) Store all 3rd party libraries in source control. I have a branch for all the libraries.
2) Back up the entire toolchain used to build. This includes every tool. Each tool is installed into the same directory on each developer's computer, which makes it simple to set up a developer's machine remotely.
3) This is the most hardcore: prepare one perfect, clean developer IDE setup, then make a VMware/VirtualPC image out of it. This will be useful when you can't seem to get the installers to work in the future.
I learned this lesson the painful way because I often have to wade through Visual Studio 6 code that doesn't build properly.
I think that a better solution is to make sure that the build is self-contained and downloads all necessary software for itself unless you tell it otherwise. This is the way Maven works, and it is really handy. The downside is that it sometimes needs to download an application server or similar, which is highly impractical, but at least the build succeeds, and it becomes the new developer's responsibility to improve the build if needed.
This of course does not work well if your software needs attended installs, but I would try to avoid such dependencies in any case. You can add alternative routes (e.g. the Ant script compiles the code if Eclipse hasn't done it yet). If that is not feasible, an alternative is to fail with a clear indication of what went wrong (e.g. "'CORBA_COMPILER_HOME' not set, please set it and try again").
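For instance, such a fail-fast guard at the top of a build script could be as simple as this (the variable name is taken from the example above; everything else is illustrative):

#!/bin/sh
# Fail early with a clear message instead of a cryptic compile error later.
if [ -z "$CORBA_COMPILER_HOME" ]; then
    echo "'CORBA_COMPILER_HOME' not set, please set it and try again" >&2
    exit 1
fi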
All that said, the most complete solution is of course to ship everything with your app (i.e., OS, IDE, the works), but I doubt that is applicable in the general case; how would you feel about that kind of requirement to build a software product? It also limits people who want to adapt your software to new platforms.
What about adding one step?
A NAnt script started from a .bat file. The developer would only have to execute one .bat file; the .bat file starts NAnt, and the NAnt script can be made to do anything you need.
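For example (the NAnt path and build file name are illustrative):

@echo off
rem build.bat - the single entry point a developer runs
tools\nant\bin\NAnt.exe -buildfile:main.build %*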
This is actually a pretty subtle question. You're talking about how to manage features of the environment which are necessary in order to allow your build to proceed. In this case it's the top level of your code toolchain, but the problem can be generalised to include the entire toolchain, and even key aspects of the operating system.
In my place of work, we have various requirements of the underlying operating system before our code will successfully run. This includes machine-specific configurations as well as ensuring correct versions of system libraries and language runtimes are present. We've dealt with this by maintaining a standard generic build machine image which contains the toolchain requirements we need. We can push this out to a virgin machine and get a basic environment that contains the complete toolchain and any auxiliary programs.
We then use fsvs to version control any additional configuration, which can be layered on to specific groups of machines as needed.
Finally, we use custom scripts hooked in to our CI server (we use Hudson) to perform any pre-processing steps required for specific projects.
The main advantages of this approach for us are:
We can build and deploy developer and production machines very easily (and have IT handle this side of the problem).
We can easily replace failed machines.
We have a known environment for testing (we install everything to a simulated 'production server' before going live).
We (the software team) version control critical configuration details and any explicit pre-processing steps.
I would outsource the task of building the middleware to a specialized build server and only include the binary output as regular 3rd-party dependencies under source control.
Whether this strategy can be successfully applied depends on whether all developers need to be able to change middleware code and recompile it frequently. But that issue could also be solved via a continuous integration server like TeamCity that allows you to create private builds.
Your build process would look like the following:
Middleware repo containing middleware code
Build server, building middleware
Push middleware build output to project repository as 3rd party references
Update: This doesn't really answer how to modify the IDE. It's just a sort-of Maven replacement thingy for C++/Python/Java. You shouldn't need to modify the IDE to build stuff; if you do, you need a different IDE or a system that generates/modifies IDE files for you. (See CMake for a cross-platform C/C++ project file generator.)
I've written a system (first in Ant/BeanShell at two different places, then rewritten in Python at my current job) where third-party packages are compiled separately (by someone), stored, and shared via HTTP.
A somewhat hurried description follows:
Upon start, the build system looks through all modules in the repo and executes each module's setup target, which downloads the specific version of a third-party lib or app that the current code revision uses. These are then unzipped, PATH/INCLUDE etc. are extended (or, for small libs, the files are copied to a single directory for the current repo), and Visual Studio is launched with /useenv.
Each module's file checks for the stuff it needs; anything that needs installing and licensing, such as Visual Studio, Matlab, or Maya, must be on the local computer. If it's not there, the cmd file fails with a nice error message. This way, you can also check that the correct version is there.
So there are a number of directories on the local disk involved. %work% needs to be set via a global environment variable, preferably on a different disk than the system or the source checkout, at least if doing heavy C++.
%work% <- local store for all temp files, unzip, and for each working copy's temp files
%work%/_cache <- downloaded zips (2 gb)
%work%/_local <- local zips (for development, or retrieved by other means while travelling)
%work%/_unzip <- unzips of files in _cache (10 gb)
%work%&_content <- textures/3d models and other big files (syncronized manually, this is 5 gb today, not suitable for VC either)
%work%/D_trunk/ <- store for working copy checked out to d:/trunk
%work%/E_branches/v2 <- store for working copy checked out to e:/branches/v2
So, if trunk uses Boost 1.37 and branches/v2 uses 1.39, both boost-1.39 and boost-1.37 reside in /_cache/ (as zips) and /_unzip/ (as raw files).
When starting Visual Studio via the bat file d:/trunk/BuildSystem/Visual Studio.cmd, INCLUDE points to /_unzip/boost-1.37, while when running e:/branches/v2/BuildSystem/Visual Studio.cmd, INCLUDE points to /_unzip/boost-1.39.
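In spirit, each module's setup step boils down to something like this hypothetical Python sketch (the names, layout, and package URL are illustrative, not the real system):

import os, urllib.request, zipfile

WORK = os.environ["WORK"]                      # e.g. d:/work, set per machine
CACHE = os.path.join(WORK, "_cache")
UNZIP = os.path.join(WORK, "_unzip")

def setup(package, version, base="http://builds.example.com/thirdparty"):
    name = "%s-%s" % (package, version)
    zip_path = os.path.join(CACHE, name + ".zip")
    dest = os.path.join(UNZIP, name)
    if not os.path.exists(zip_path):           # download once per machine
        os.makedirs(CACHE, exist_ok=True)
        urllib.request.urlretrieve("%s/%s.zip" % (base, name), zip_path)
    if not os.path.exists(dest):               # unzip once per version
        os.makedirs(UNZIP, exist_ok=True)
        with zipfile.ZipFile(zip_path) as z:
            z.extractall(dest)
    return dest                                # caller appends this to INCLUDE/PATH

include_dir = setup("boost", "1.37")           # ends up in %work%/_unzip/boost-1.37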
In the repo, only a small set of bootstrap binaries need to be stored (i.e. wget and 7z).
We currently download about 2 GB of packed data, which is unzipped to 10 GB (PDB files are huge!), so keeping this out of source control is essential. Having this system allows us to keep the repo size small enough to use a DVCS such as Mercurial (or Git) instead of SVN, which is very nice. (I'm thinking of using Mercurial's bigfiles extension or file sharing instead of a separately HTTP-served directory.)
It works flawlessly. Developers only need to check out, set an environment variable for their local cache, then run Visual Studio via a specific batch file in the repo. No unzipping or compiling or anything. A new developer can set up his computer in no time. (Installing Visual Studio takes an order of magnitude more time.)
The first time on a new computer takes some time, but after that it's fast, only a few seconds. Downloads/unzips are shared on the local computer, so checking out additional branches/versions does not occupy more space. Working offline is also possible; you just need to get the zip files manually if new ones have been uploaded. (This mechanism is essential for testing new versions/compilations of third-party libraries.)
The basics are in a repo on Bitbucket, but it needs more work before it's ready for the public. Apart from docs and polish, I plan to:
extend it to use CMake instead of raw vcproj files, to make it more cross-platform.
script the entire process from checkout/download of third-party packages to building and zipping them (including storing the download in a local repo) ... currently that's on my dev computer. Not good. Will fix. :)
As for moc, we use Qt's Visual Studio add-in, which stores this in the .vcproj files. It works well. I do think that CMake is one of the best answers for this, though.