How is Accurev Performance? - version-control

How is performance in the current version (4.7) of Accurev?
time to checkout per 100mb, per gb?
time to commit per # of files or mb?
responsiveness of gui when 100+ streams?
I just had a demo of Accurev, and the streams look like a lightweight way to model workflow around code/projects. I've heard people praising Accurev for the streams back end and complaining about performance. Accurev appears to have worked on the performance, but I'd like to get some real world data to make sure it isn't a case of demos-well-runs-less-well.
Does anyone have Accurev performance anecdotes or (even better) data from testing?

I don't have any numbers but I can tell you where we have noticed performance issues.
Our builds typically use 30-40K files from source control. In my workspace currently there are over 66K files including build intermediate and output files, over 15GB in size. To keep AccuRev working responsively we aggressively use the ignore elements so AccuRev ignores any intermediate files such as *.obj. In addition we use the time stamp optimization. In general running an update is quick, but the project sizes are typically 5-10 people so normally only a couple of dozen files come down if you update daily. Even if someone made changes that touched lots of files speed is not an issue. On the other hand a full populate of all 30K+ files is slow. I don't have a time since I seldom do this and on the rare occasion I do, I run the populate when I'm going to lunch or a meeting. I expect it could be as much as 10 minutes. In general source files come down very quickly, but we have some large binary files, 10-20MB, that take a couple of seconds each.
If the exclude rules and ignore elements are not correctly configured, AccuRev can take a couple of minutes to run an update for workspaces of this size. When I hear of other developers complaining about the speed I know something is misconfigured and we get it straightened out.
A year or so ago one of the projects updated Boost with 25K+ files and also added Firefox to the repository (I forget the size, but it made Boost look small). They also added ICU, wrote a lot of software and modified countless files. In all I recall there were approximately 250K+ files sitting in a stream. I unfortunately decided that all their good code should be promoted to the root so all projects could share it. This turned out to be a little beyond what AccuRev could handle well. It was a multi-hour process getting all the changes promoted. As I recall, once Firefox was promoted the rest went smoothly - perhaps a single transaction with over 100K files was the issue?
I recently updated boost and so had to keep and promote 25K+ files. It took a minute or two but not unreasonable considering the number of files and the size of the binaries.
As for the number of streams, we have over 800 streams and workspaces. Performance here is not an issue. In general I find the large number of streams hard to navigate, so I run a filtered view of just my workspaces and just the streams I'm interested in. However, when I need to look at the unfiltered list to find something, performance is fine.
As a final note, AccuRev support is terrific - we call them the voice in the sky. Every now and again we shoot ourselves in the foot using AccuRev and wind up clueless on how to fix things. Almost always we did something dumb and then tried something dumber to fix it. Eventually we place a support request and next thing we know they are walking us through the steps to righteousness, either on the phone or in a GoToMeeting session. I've even contacted them for trivial things that I just don't have time to figure out as I'm having a hectic day, and they kindly walk me through it rather than telling me to RTFM.

Edit 2014: We can now get acceptable X-Windows performance by using the commercial version of RealVNC.
Original comment: This answer applies to any version of Accurev, not just 4.7. Firstly, GUI performance might be OK if you can use the web client. If you can't use the web client and you want GUI performance, then you'd better be using Windows, or have all your developers in one place, i.e. where the Accurev server is located. Try to run the GUI on X-Windows over a WAN? Forget it: our experience has been dozens of seconds or minutes for basic point-and-click operations. This is over a fairly good WAN about 800 miles distant, with an almost optimal ping time. This is not a failing of Accurev, but of X-Windows, and you'll likely have similar problems with other X applications over a WAN. So avoid basic X if you possibly can. Currently we cannot, and our WAN users are forcibly relegated to command-line only.
The basic problem is that Accurev is centralized and you can't increase the speed of light. I believe you can get around WAN latency by running Accurev Replication Servers, but that still does not properly address the problem if you have remote developers at single-person offices over VPN. It is ironic that the replication servers somewhat turn this centralized VCS into a form of DVCS. If you don't have replication servers, then a horrible but somewhat workable work-around is to use a delta-synchronization tool such as rsync to sync your source tree between your local machine where you can run the GUI (i.e. GUI running directly on your Windows or Linux laptop) and the machine where you're actually working (e.g. a UNIX machine 1,000 miles away). Another option is to use something like VNC, which works better over a WAN than X, connecting to a virtual desktop at the Accurev server's location, and use X from there.
At my workplace more than one team has resorted to using Mercurial on the side and promoting to Accurev only when it's strictly necessary. As Stephen Nutt points out above, other necessary work is to use time-stamp optimization and ignores.
We also have our Accurev admins (yes, it requires you to employ people to babysit it) complain when we need to include large numbers of files, despite the fact that they form a core part of our product and MUST be included and version controlled. Draw your own conclusions.

Related

Editor for writing in tandem

I'm working with a couple of student interns, showing them proof-of-concepts.
The amount of code we are writing is extremely small - snippets. I don't have to worry about version control or branches.
What I'm looking for is the simplest html editor for two people to be sharing a single document, not at the same time. In other words, it's ok if I totally overwrite his change if we are both making changes at the same time. In fact, that's preferred!
If we were on a local machine, I could use Notepad++. But since the code is on my VPS, then we are making changes in Dreamweaver and pushing changes up. That's too complicated because I make a change and he doesn't get the change unless he downloads it. We need to make the changes directly on the server with no intermediary steps. And when I make a change to his document, it reloads on his screen.
It needs to be as simple as two people using Notepad++ to change a file on a networked drive.
Remote Desktop only allows 1 person to remote in at a time.
Hmm... I wonder if that's really a restriction or if I could increase the number of concurrent users.
Hmm... I wonder if I could add my VPS drive as a networked drive? That would be slick...

Is it safe to cloud sync TFS workspaces?

Please excuse a newbie question, but I've always used SVN and more recently, Git. Just now am touching TFS for the first time.
If I have two different machines that I work on regularly, can I safely keep the project files in sync using something like Dropbox/Sugarsync/Skydrive?
Are there any pros/cons to be aware of?
(I know that some of you might ask something like why not just checkout on the other machine. Just trying to save a step. I want to just pick up the other machine and do what I need to do without having to check out anything.)
TFS workspaces contain information about the machine name and user that created them. However, if you're using local workspaces and you're not putting any server-side locks on files, then I suppose you could sync them via Dropbox and it should probably work just fine.
That said, I'd never recommend it.
You're not only going to sync all your code but also all the binaries that you're producing each and every time you compile. On top of that, you won't have any change history between machines, and you need to keep monitoring the Dropbox app to make sure things have synced fully before switching machines.
If you want to move changes between two machines I'd recommend using shelvesets. It only takes a few seconds to do and you'll have a more explicit update process between machines. You can be sure of what is happening in your code on each machine and you have an implicit rollback point if you realise you put something in the shelveset you didn't want.

Looking for a version control based backup tool

I'm traveling all the time (every 2-3 months, I'm in a new city or country), with no real permanent address. I've managed to work out all the kinks over the last couple of years...except having a good backup/sync solution.
I have a macbook pro & a thinkpad w701 (which runs two different VMs). It's a pain in the ass because making changes on one machine (such as adding some new music or updating some presentations) requires me to keep track of what changed where. And then every couple of weeks, after syncing the three different images, I try to manually sync it out to a backup drive that I carry around.
It's pretty much the most annoying thing ever...especially when I sometimes make changes on the backup drive and I have to remember not to override them.
What I'd really like is something simple that has more of a version control like workflow:
I can push out changes to some central server (like a commit. Example: I add some changes to my music directory and then I can just commit those changes to backup)
Before the backup happens, I'd like to see a "diff": what files will be overridden, which one's newer, etc
I can access my files off the server (if I'm making an audio mix and need to pull out some songs, I'd like to get them from the server. All the backups can't just be one big binary compressed zip blob)
Dropbox comes pretty close but it lacks the "commit" & "diff" functionality. I thought about using Amazon AWS but that falls short because I can't see diffs and can't access my files directly off aws.
Any ideas? Or any other solutions? I guess what I'd really like is TimeMachine in the cloud or maybe even a NAS that's securely accessible through the internet
You might want to use rsync. It's a Unix synchronization tool that you can use on Windows and Unix variants (including Mac OS X). It uses delta copying to minimize transfer, and it can use hard links to minimize backup size.
You can access all files in every backup as though they were normal files. Diffing can be done using traditional tools. It is all command-line based so if you don't want that you will need to find GUI tools, but I don't know which you could use.
You would need a server with an rsync daemon/service. I don't know if there are providers for it, but you can set up your own VPS starting at a few dollars a month.
Have you looked at Amazon S3? S3 is a data storage mechanism and there are a bunch of tools to "sync" your local directory with S3. Some of the tools are:
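The "diff before the backup happens" step from the question can be sketched in a few lines of Python. This is only an illustration, not how rsync itself is implemented: the function name `preview_sync` and the report categories are made up for this example, and it assumes that comparing file size and modification time is good enough (similar in spirit to rsync's default quick check). It copies nothing; it only reports what a sync would do.

```python
import os
from pathlib import Path


def preview_sync(source: Path, backup: Path) -> dict[str, list[str]]:
    """Classify every file under `source` against `backup` without copying."""
    report = {"new": [], "overwrite": [], "backup_newer": [], "unchanged": []}
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source)
        target = backup / rel
        key = rel.as_posix()
        if not target.exists():
            # Present locally, missing from the backup.
            report["new"].append(key)
            continue
        src, dst = path.stat(), target.stat()
        if src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime):
            report["unchanged"].append(key)
        elif dst.st_mtime > src.st_mtime:
            # The backup copy was edited more recently: do not clobber blindly.
            report["backup_newer"].append(key)
        else:
            report["overwrite"].append(key)
    return report
```

Anything listed under "backup_newer" is the case described in the question, where a change was made directly on the backup drive and should not be overwritten without a look. With rsync itself, a dry run (the -n option) gives a comparable preview.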
http://www.vinodlive.com/2007/08/20/amazon-s3-storage-tools/
Out of these, S3Sync should do what you are looking for, i.e. submit only changed files, with a mode that tells you the changes before submitting them.

How to use replication in combination with version control system?

The situation is as follows:
Our company works at two main production sites, communicating via WAN. We develop software internally which uses about 100 GB of disk space on our servers (application data deployed to our customers, with a lot of images). In order to improve performance, our network administrators chose DFS replication (every 6 hours). This means that our users (people from within the company) do not have to wait (sometimes 2-3 hours) to download the needed files, because they are available locally (over LAN).
The problem is that the algorithm used by DFS replication is "Last Writer Wins". So, in case of simultaneous changes (during development/maintenance), the file with the latest date will win. I would like to avoid such data loss.
I am the project manager for the overall development process. What I want to do is introduce people to version control systems to tackle the simultaneous modifications problem. I plan to use Mercurial for several reasons, mainly because it is distributed, simple to explain, usable for personal use, free, and (most importantly) has great merging capabilities. However, the benefits of the version control system when used locally (LAN) are lost because of the replication process (WAN), which doesn't know how to merge.
Some possible solutions are to:
use only version control over the WAN (hoping that compression will be enough to speed things up)
use only DFS, and track changes manually (error-prone)
find a work-around with both methods
The team is small (about 10 people). Your help and experience are appreciated.
If it were me, I'd have a "central" repository at each location, with the developers from each site working on a different branch. One of those should probably be chosen as the "main" branch (ideally the one that will be making the most changes), although in practice it won't really matter much.
Each team's repo should be synchronized regularly (e.g., daily, on your 6 hour schedule, or even more often) with the repo from the other location, to reflect changes made in that branch. Then they would be merged to the site's branch (ideally this would be done automatically as part of the same update, but the exact details of how that merge will happen may vary, depending on your VCS of choice and your branching model).
Remember: "sync early, sync often"

How to limit the effect of client modifications to production systems

Our shop has developed a few WEB/SMS/DB solutions for a dozen client installations. The applications have some real-time performance requirements, and are just good enough to function properly. The problem is that the clients (owners of the production servers) are using the same server/database for customizations that are causing problems with the performance of the applications that we created and deployed.
A few examples of clients' customizations:
Adding large tables with many text datatypes for the columns that get cast to other data types in the queries
No primary keys, indexes, or FK constraints
Use of external scripts that use count(*) from table where id = x, in a loop from the script, to determine how to construct more queries later in the same script. (no bulk actions that the planner can optimize or just do everything in a single pass)
All new code files on the server are created/owned by root, with 0777 permissions
The clients don't take suggestions/criticism well. If we just go ahead and try to port/change the scripts ourselves, the old code can come back, clobbering any changes that we make! Or, with our limited knowledge of their use cases, we break functionality while trying to optimize their changes.
My question is this: how can we limit the resources available to queries/applications other than what we create and deploy? Are there any pragmatic options in scenarios like this? We prided ourselves on having an OSS solution, but it seems that it's become a liability.
We use PG 8.3 running on a range of Linux distros. The clients prefer php, but shell scripts, perl, python, and plpgsql are all used on the system in one form or another.
This problem started about two minutes after the first client was given full access to the first computer, and it hasn't gone away since. Anytime someone's priority is getting business-oriented work done quickly, they will be sloppy about it and screw up things for everyone. That's just how things work, because proper design and implementation are harder than cheap hacks. You're not going to solve this problem; all you can do is figure out how to make it easier for the client to work with you than against you. If you do it right, it will look like excellent service rather than nagging.
First off, the database side. There's no way to control query resources in PostgreSQL. The main difficulty is that tools like "nice" control CPU usage, but if the database doesn't fit in RAM it may very well be I/O usage that is killing you. See this developer message summarizing the issues here.
Now, if in fact it's CPU the clients are burning through, you can use two techniques to improve that situation:
Install a C function that changes the process priority (example 1, example 2) and make sure whenever they run something it gets called first (maybe put it into their psql config file, there are other ways).
Write a script that looks for postmaster processes spawned by their userid and renices them, and make it run often in cron or as a daemon.
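A minimal sketch of the second technique, in Python, might look like the following. It rests on assumptions that are not in the answer above: that the client's backends can be recognised by a process title of the form "postgres: <user> <database> ..." in the ps output (this depends on the PostgreSQL version and settings), that the script runs with enough privilege to renice those processes, and that a niceness of 10 is a sensible target. The names `backend_pids` and `renice_backends` and the role name in the usage note are hypothetical.

```python
import os
import subprocess


def backend_pids(ps_output: str, db_user: str) -> list[int]:
    """Return PIDs whose command line looks like a backend for `db_user`.

    Expects one "<pid> <args>" pair per line, as printed by
    `ps -e -o pid= -o args=`.
    """
    prefix = f"postgres: {db_user} "
    pids = []
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 1)
        if len(fields) == 2 and fields[1].startswith(prefix):
            pids.append(int(fields[0]))
    return pids


def renice_backends(db_user: str, niceness: int = 10) -> list[int]:
    """Lower the CPU priority of matching backends; return the PIDs changed."""
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    changed = []
    for pid in backend_pids(ps_output, db_user):
        try:
            if os.getpriority(os.PRIO_PROCESS, pid) < niceness:
                os.setpriority(os.PRIO_PROCESS, pid, niceness)
                changed.append(pid)
        except (ProcessLookupError, PermissionError):
            # The backend exited in the meantime, or we lack the privilege.
            continue
    return changed
```

Calling renice_backends("client_role") once a minute from cron would approximate the "run often" part. As noted earlier, this only helps when CPU, not I/O, is the bottleneck.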
It sounds like your problem isn't the particular query processes they're running, but rather other modifications they're making to the larger structure. There's only one way to cope with that: you have to treat the client like they're an intruder and use the approaches of that portion of the computer security field to detect when they screw things up. Seriously! Install an intrusion detection system like Tripwire on the server (there are better tools, that's just the classic example), and have it alert you when they touch anything. New file that's 0777? Should jump right out of a proper IDS report.
On the database side, you can't directly detect the database being modified usefully. You should do a pg_dump of the schema every day into a file (pg_dumpall -g and pg_dump -s), then diff that against the last one you delivered and again have it alert you when it's changed. If you manage this well, the contact with the client turns into "we noticed you changed on the server...what is it you're trying to accomplish with that?" which makes you look like you're really paying attention to them. That can turn into a sales opportunity, and they may stop fiddling with things as much just knowing you're going to catch it immediately.
The other thing you should start doing immediately is install as much version control software as you can on each client box. You should be able to log in to each system, run the appropriate status/diff tool for the install, and see what's changed. Get that mailed to you regularly too. Again, this works best if combined with something that dumps the schema as a component of what it manages. Not enough people use serious version control approaches on the code that lives in the database.
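The daily schema check could be sketched roughly as follows in Python. The function names and the file labels in the diff header are made up for this illustration, and the database name passed to `dump_schema` is whatever the installation uses. Only `pg_dump -s` (schema only) comes from the description above; how the alert is delivered is left out.

```python
import difflib
import subprocess


def dump_schema(dbname: str) -> str:
    """Return the schema-only dump of one database as text (runs pg_dump -s)."""
    return subprocess.run(
        ["pg_dump", "-s", dbname],
        capture_output=True, text=True, check=True,
    ).stdout


def schema_changes(delivered: str, current: str) -> list[str]:
    """Return added ('+') and removed ('-') lines between two schema dumps."""
    lines = list(difflib.unified_diff(
        delivered.splitlines(), current.splitlines(),
        fromfile="delivered.sql", tofile="current.sql",
        lineterm="", n=0,
    ))
    # Drop the two file-header lines and the "@@" hunk markers.
    return [line for line in lines[2:] if not line.startswith("@@")]
```

A cron job could compare the output of dump_schema against the dump saved at delivery time and mail the result whenever schema_changes returns a non-empty list. The same comparison works for the output of pg_dumpall -g.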
That's the main set of technical approaches useful here. The rest of what you've got is a classic consulting client management problem that's far more of a people problem than a computer one. Cheer up, it could be worse--FSM help you if you give them ODBC access and they discover they can write their own queries in Access or something simple like that.