Resource Type in Concourse CI: What happens if Check always returns just the latest version?

I was looking at the git resource and found it curious that Check is doing a clone rather than the much more lightweight ls-remote. I think there are two reasons:
The possibility to filter commits based on metadata and what files changed in the repo
Because the docs say it's supposed to return an array of versions, not just the latest
The first one is obvious, but I don't see the reason for the second.
It is given the configured source and current version on stdin, and must print the array of new versions, in chronological order, to stdout, including the requested version if it's still valid.
But then later it says:
If your resource is unable to determine which versions are newer than the given version (e.g. if it's a git commit that was push -fed over), then the current version of your resource should be returned (i.e. the new HEAD).
So my question is: why can't a resource always just return a single version, the latest? I.e., not even the requested version if the source has moved on?
What functionality would be lost?
Note that this question is related to Implemented a Resource Type: How does Concourse use the output of the check, in, and out scripts?

If you were to do it that way, your resources would only ever show one version, and your get steps would not return the version that was explicitly requested (the version returned by check is passed to in). That would violate the resource type design and, frankly, be bad practice: why would you return something that wasn't requested just because it was newer? What if you need to pin a resource to a specific version?
The git resource does a clone instead of ls-remote because you can configure the resource to recognize new versions only if certain files changed (or the inverse: if there was a new commit but the changed files were all covered by the ignore_paths stanza of the source configuration, check does not return the new version).
So the biggest question is, why do you want to do that? Ease of programming? Performance concerns?
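The contract quoted from the docs can be sketched roughly as follows. This is only an illustration, not the git resource's actual implementation: the in-memory list of versions and the "ref" key are assumptions, and a real check script would read its request as JSON from stdin and print the result to stdout.

```python
import json


def check(all_versions, requested):
    """Return the versions to report, oldest first.

    all_versions: every known version, in chronological order (assumed input).
    requested: the version handed to check, or None on the very first check.
    """
    if not all_versions:
        return []
    latest = all_versions[-1]
    if requested is None or requested not in all_versions:
        # First check, or the requested version no longer exists
        # (e.g. it was force-pushed over): report only the current latest.
        return [{"ref": latest}]
    # Otherwise report the requested version itself plus everything newer.
    start = all_versions.index(requested)
    return [{"ref": v} for v in all_versions[start:]]


print(json.dumps(check(["a1", "b2", "c3"], "b2")))
# prints [{"ref": "b2"}, {"ref": "c3"}]
```

A check that always returned only the last element would skip "b2" here, which is the version a pinned get step would be asking for.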

Related

Storing code metrics

I'd like to write a pre-commit hook that tells you if you've improved/worsened some code metric of a project (e.g. average function length). The hook would have to know what the previous average function length was and I don't know where to store that information. One option would be to store an additional .metrics file in the repo but that sounds clunky. Another option would be to git stash, compute the metrics, git stash pop, compute the metrics again and print the delta. I'm inclined to go with the latter. Are there any other solutions?
Disclaimer: I am the author of the Metrix++ tool, which I use in the workflow described below. I guess the same workflow can be carried out with other tools capable of comparing results.
One of the ideas you suggested works perfectly, if you add a couple of CI checks (see the steps below). I find it solid. Not sure why you are considering it clunky.
I have got a file with metrics results which is updated before each commit and stored in VCS. Let's name this file metrics.db, and consider automation of the following workflow on build/test of a project:
1) If metrics.db has not been changed since the last checkout (i.e. it is the original data for the previous/base revision), copy it to metrics-prev.db.
2) Collect metrics for the current code, which produces the metrics.db file again. Note: it is very helpful when a metrics tool can do iterative scans for best performance (i.e. calculate metrics only for updated functions/classes), since that lets you run the metrics tool on every build, including iterative ones.
3) Compare metrics-prev.db with metrics.db. If the metrics identify regressions, fail the build and [optionally] do not allow the commit (a team rule). If the metrics are good, the build is successful and the commit may happen.
4) [optionally] Run Continuous Integration (CI) to validate that the committed metrics.db file corresponds to the committed code for the same revision (i.e. do the same steps 1-3 and make sure the diff is zero at step 3). If the diff is not zero, somebody forgot to update the metrics.db file and presumably did not execute the pre-commit check, so revert the change.
5) [optionally] CI may do steps 1-3 if you fetch metrics.db from the previous revision as metrics-prev.db. In this case, CI may also check that the collected metrics.db is the same as the committed one (an alternative or addition to step 4).
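The comparison in step 3 could look roughly like the sketch below. It does not use the Metrix++ file format or API: plain dictionaries stand in for the contents of metrics-prev.db and metrics.db, the metric names are made up, and it assumes that a lower value is better for every metric.

```python
def find_regressions(previous, current, tolerance=0.0):
    """Return {metric: (old, new)} for every metric that got worse.

    Assumes lower is better for all metrics (e.g. average function length).
    Metrics missing from `current` are ignored.
    """
    regressions = {}
    for name, old_value in previous.items():
        new_value = current.get(name)
        if new_value is not None and new_value > old_value + tolerance:
            regressions[name] = (old_value, new_value)
    return regressions


# Hypothetical metric values standing in for the two .db files.
prev = {"avg_function_length": 12.0, "max_nesting": 4}
curr = {"avg_function_length": 13.5, "max_nesting": 4}

bad = find_regressions(prev, curr)
exit_code = 1 if bad else 0  # a pre-commit hook or build step would exit with this
```

A non-zero exit status is what makes a pre-commit hook or a build step fail, so the same function can back both the local check and the CI check.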
Another implementation I have seen: metrics.db files are stored on a separate drive, outside VCS, and a custom script locates the corresponding metrics.db for a revision. I find this solution unreliable, as the drive can disappear, files can be moved or renamed, and so on. So placing the file in VCS is the better solution, but either will work.
I have attempted the alternative you suggested: switch to the previous revision and run the metrics tool twice. I abandoned this approach for two reasons. First, the metrics check script alters your source files, so it is impossible to include it in an iterative rebuild and keep working smoothly with your IDE, which will complain about changed files. Second, it is extremely slow compared with iterative re-scans.
Hope it helps.

Why Rational Team Concert changes the files' last modified attribute?

I'm having some issues with the installation of Rational Team Concert on my server.
The thing is that when I upload some changes to the server (any kind), it changes the last modified attribute of the file, but it shouldn't do it.
Is there a way to avoid this behavior?
Thank you in advance!
This is something that we have tried to add to RTC SCM (and we still plan to). However, we found that it needs to be an option on load/update.
Numerous details and discussions are available in this work item on jazz.net.
Regarding timestamp, getting over the fact that relying on it in a version control tool isn't always considered a best-practice (see "What's the equivalent of use-commit-times for git?"), it is actually a complex issue:
an SCM loader wouldn't use just the timestamp to determine which file has changed (Task 179263)
you can have various requirements for that timestamp (like in Defect 159043, where the file timestamp of the modified file on disk is that of when it was delivered, not when I accepted). The variable JAZZ_CCM_SKIP_MOD_TIME=true is mentioned, so check whether that could improve your specific case.
it is all based on the assumption the timestamp is correctly set by the local workstation, which isn't always true, as illustrated in Task 77201

automatically add to each changeset a file that contains the new revision number

Whenever I commit, I want to save in a file the revision number of the changeset that I'm creating. I also want that file to be added to the same changeset.
Note that the revision number of the parent of the working directory is not what I want because the changeset being created will have a higher revision number. Usually it's just the parent revision number + 1, but if someone committed since the time I checked out my working directory, it may be higher.
UPDATE:
It's obviously very strange that I'd be interested in this information, since as the comments below say, it's repo-specific and won't match what others see. However, I am the only developer, using a single repository. I find the repo revision numbers super convenient to keep track of what code was used to generate various research results. I can see how it's not great, but it works in this specific scenario.
Obviously, I could use the hash, but that's harder to remember and use in a conversation. If I did want to use the hash, my question would still remain: how to get the hash of the changeset that's being committed.
Related:
mercurial - I want to add some custom code to be run after commit seems to be unable to achieve the desired outcome.
This article is clearly relevant, but unless I miss something, it relies on the fact that nobody committed to the same repository since the last checkout by the current user.
I'm under Windows 7, TortoiseHG, latest version.
You can probably just put this in there:
TIP=$(hg id --num --rev tip)
NEXT=$(($TIP + 1))
but please do keep in mind that those numbers are almost entirely meaningless. When someone else clones that repository the revision numbers can change. Only the nodeids have any meaning outside the repository in which you looked them up.

eclipse CVS usage: clean timestamps

During synchronisation with the CVS server, Eclipse compares the content of the files (internally it uses CVS commands, of course). But files without any content change are also shown as different if they have a different timestamp, because they were "touched". You always have to check manually in the file comparison dialog whether a file really changed.
Due to auto-generation I have some files that always get new timestamps, so I always have to check manually whether they really contain any change.
In the Eclipse documentation I read:
Update and Commit Operations
There are several flavours of update and commit operations available
in the Synchronize view. You can perform the standard update and
commit operation on all visible applicable changes or a selected
subset. You can also choose to override and update, thus ignoring any
local changes, or override and commit, thus making the remote resource
match the contents of the local resource. You can also choose to clean
the timestamps for files that have been modified locally (perhaps by
an external build tool) but whose contents match that of the server.
That's exactly what I want to do. But I don't know how!? There is no further description/manual ...
Did anybody use this functionality and can help me (maybe even post a screenshot)?
Thanks in advance,
Mayoares
When you perform a CVS Update on a project (using context menu Team->Update), Eclipse implicitly updates the timestamp of local files whose contents match that of the server.

Performing Historical Builds with Mercurial

Background
We use a central repository model to coordinate code submissions between all the developers on my team. Our automated nightly build system has a code submission cut-off of 3AM each morning, when it pulls the latest code from the central repo to its own local repository.
Some weeks ago, a build was performed that included Revision 1 of the repo. At that time, the build system did not in any way track the revision of the repository that was used to perform the build (it does now, thankfully).
-+------- Build Cut-Off Time
|
|
O Revision 1
An hour before the build cut-off time, a developer branched off the repository and committed a new revision in their own local copy. They did NOT push it back to the central repo before the cut-off and so it was not included in the build. This would be Revision 2 in the graph below.
-+------- Build Cut-Off Time
|
| O Revision 2
| |
| |
|/
|
O Revision 1
An hour after the build, the developer pushed their changes back to the central repo.
O Revision 3
|\
| |
-+-+----- Build Cut-Off Time
| |
| O Revision 2
| |
| |
|/
|
O Revision 1
So, Revision 1 made it into the build, while the changes in Revision 2 would've been included in the following morning's build (as part of Revision 3). So far, so good.
Problem
Now, today, I want to reconstruct the original build. The seemingly obvious steps to do this would be to
1) determine the revision that was in the original build,
2) update to that revision, and
3) perform the build.
The problem comes with Step 1. In the absence of a separately recorded repository revision, how can I definitively determine what revision of the repo was used in the original build? All revisions are on the same named branch and no tags are used.
The log command
hg log --date "<cutoff_of_original_build" --limit 1
gives Revision 2 - not Revision 1, which was in the original build!
Now, I understand why it does this - Revision 2 is now the revision closest to the build cut-off time - but it doesn't change the fact that I've failed to identify the correct revision on which to rebuild.
Thus, if I can't use the --date option of the log command to find the correct historical version, what other means are available to determine the correct one?
Considering whatever history might have been in the undo files is gone by now (the only thing I can think of that could give an indication), I think the only way to narrow it down to a specific revision will be a brute force approach.
If the range of possible revisions is fairly large, and the build output changes in size or in some other non-date aspect in a way that is linear or near enough to linear, you may be able to use the bisect command to do a binary search and narrow down which revision you're looking for (or at least get close to it). At each revision where bisect stops, you would build that revision and compare whatever aspect you chose against what the scheduled build produced that night. It might not even require building, depending on the test.
If it really is as simple as the graph you depict and the range of possibilities is short, you could just start from the latest revision it might be and walk backwards a few revisions, testing against the original build.
As for a definitive test comparing the two builds, hashing the test build and comparing it to a hash of the original build might work. If a compile on the nightly build machine and a compile on your machine of the same revision do not produce binary-identical builds, you may have to use binary diffing (such as with xdelta or bsdiff) and look for the smallest diff.
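The hash comparison could be sketched as below, assuming you have already built one artifact per candidate revision and that builds are byte-for-byte reproducible. The function names and the mapping of revision ids to artifact paths are illustrative only.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large build artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matching_revision(original_artifact, candidates):
    """Return the first revision whose artifact hashes like the original.

    candidates: dict mapping a revision id to the path of the artifact
    built from that revision (both supplied by you).
    """
    target = file_sha256(original_artifact)
    for revision, path in candidates.items():
        if file_sha256(path) == target:
            return revision
    return None  # no identical build; fall back to binary diffing
```

If this returns None for every candidate, the builds are probably not reproducible (embedded timestamps, paths and so on), which is the case where the smallest-diff approach applies.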
Mercurial does not have the information you want:
Mercurial does not, out of the box, make it its business to log and track every action performed regarding a repository, such as push, pull, update. If it did, it would be producing a lot of logging information. It does make available hooks that can be used to do that if one so desires.
It also does not care what you do with the contents of the working directory, such as opening files or compiling, so of course it is not going to track that at all. It's simply not what Mercurial does.
It was a mistake to not know exactly what the scheduled build was building. You agree implicitly because you now log that very information. The lack of that information before has simply come back to bite you, and there is no easy way out of it. Mercurial does not have the information you need. If the central repo is just a shared directory rather than a web-hosted repository that might have tracked activity, the only information about what was built is in the compiled version. Whether it is some metadata declared in the source that becomes part of the build, a naive aspect like filesize, or you truly are stuck hashing files, you can't get your answer without some effort.
Maybe you don't need to test every revision; there may be revisions you can be certain are not candidates. Knowing the time of the compile is merely a factor as the upper bound on the range of revisions to test. You know that revisions after that time could not possibly be candidates. What you don't know is what was pushed to the server at the time the build server pulled from it. But you do know that revisions from that day are the most likely. You also know that revisions in parallel unnamed branches are less-likely candidates than linear revisions and merges. If there are a lot of parallel unnamed branches and you know all your developers merge in a particular way, you might know whether the revisions under parent1 or parent2 should be tested.
Maybe you don't even need to compile if there is metadata you can parse from the source code to compare with what you know about the specific build.
And you can automate your search. It would be easiest to do so with a linear search: fewer heuristics to design.
The bottom line is simply that Mercurial does not have a magic button to help in this case.
Apologies, it's probably bad form to answer one's own question, but there wasn't enough room to properly respond in a comment box.
To Joel, a couple of things:
First - and I mean this sincerely - thanks for your response. You provided an option that was considered, but which was ultimately rejected because it would be too complex to apply to my build environment.
Second, you got a little preachy there. In the question, it was understood that because a separately recorded repository revision was absent, there would be 'some effort' to figure out the correct revision. In a response to Lance's comment (above), I agree that recording the 40-character repository hash is the 'correct' way of archiving the necessary build info. However, this question was about what CAN be done IF you do not have that information.
To be clear, I posted my question on StackOverflow for two reasons:
I figured that others must have run into this situation before and that, perhaps, someone may have determined a means to get at the requisite information. So, it was worth a shot.
Information sharing. If others run into this problem in the future, they will have an online reference that clearly explained the problem and discussed viable options for remediation.
Solution
In the end, perhaps my greatest thanks should go to Chris Morgan, who got me thinking to use the central server's mercurial-server logs. Using those logs, and some scripting, I was able to definitively determine the set of revisions that were pushed to the central repository at the time of the build. So, my thanks to Chris and to everyone else who responded.
As Joel said, it is not possible. However there are certain solutions that can help you:
maintain a database of nightly build revisions (date + changeset id)
build server can automatically tag revision it is based on (nightly/)
switch to Bazaar, which manages version numbers differently (branched versions are in the form REVISION_FORKED.BRANCH_NUMBER.BRANCH_REVISION, so your change number 2 would be 1.1.1)
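The first suggestion, a database of nightly build revisions, could be as small as the sketch below. The table layout, function names and the use of SQLite are assumptions for illustration; the changeset id itself would come from the build machine's working copy (for example from hg identify).

```python
import sqlite3


def open_build_log(path=":memory:"):
    """Open (or create) a tiny database mapping build date -> changeset id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS builds ("
        "build_date TEXT PRIMARY KEY, changeset TEXT NOT NULL)"
    )
    return conn


def record_build(conn, build_date, changeset):
    """Store the changeset a nightly build was made from."""
    conn.execute(
        "INSERT OR REPLACE INTO builds VALUES (?, ?)", (build_date, changeset)
    )
    conn.commit()


def changeset_for(conn, build_date):
    """Look up which changeset was built on a given date, or None."""
    row = conn.execute(
        "SELECT changeset FROM builds WHERE build_date = ?", (build_date,)
    ).fetchone()
    return row[0] if row else None
```

Storing the full changeset hash rather than the local revision number keeps the record meaningful outside the build machine's own repository.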