Mercurial cat command using --include - version-control

Using version 4.1.1 of Mercurial, I would like to provide a file specifying a bunch of files as args to an hg cat command, so that each file is output to a different file. I thought the following would work:
hg cat -o 'catOut-%s' --include listfile:files.lst
where files.lst looks like this
foo01.txt
foo02.txt
But it yields an error message saying "invalid arguments" plus a usage message.
Here is an MWE that sets up a code repository with the required structure and then tries running the cat command shown above.
hg init mwe
cd mwe
echo abc > foo01.txt
echo def > foo02.txt
echo PQR > baz.txt
echo files.lst > .hgignore
hg add .hgignore
hg add foo*.txt
hg add baz.txt
echo foo01.txt >> files.lst
echo foo02.txt >> files.lst
hg ci -m "Adding all files"
hg cat -o 'catOut-%s' baz.txt
cat catOut-baz.txt
rm catOut*
hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt
cat catOut-baz.txt
hg cat -o 'catOut-%s' --include listfile:files.lst
Here is a trace of these commands and their results as typed to a shell:
~/tmp $ hg init mwe
~/tmp $ cd mwe
~/tmp/mwe $ echo abc > foo01.txt
~/tmp/mwe $ echo def > foo02.txt
~/tmp/mwe $ echo PQR > baz.txt
~/tmp/mwe $ echo files.lst > .hgignore
~/tmp/mwe $ hg add .hgignore
~/tmp/mwe $ hg add foo*.txt
~/tmp/mwe $ hg add baz.txt
~/tmp/mwe $ echo foo01.txt >> files.lst
~/tmp/mwe $ echo foo02.txt >> files.lst
~/tmp/mwe $ hg ci -m "Adding all files"
~/tmp/mwe $ hg cat -o 'catOut-%s' baz.txt
~/tmp/mwe $ cat catOut-baz.txt
PQR
~/tmp/mwe $ rm catOut*
~/tmp/mwe $ hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt
~/tmp/mwe $ cat catOut-baz.txt
cat: catOut-baz.txt: No such file or directory
~/tmp/mwe $ hg cat -o 'catOut-%s' --include listfile:files.lst
hg cat: invalid arguments
hg cat [OPTION]... FILE...
output the current or given revision of files
options ([+] can be repeated):
-o --output FORMAT print output to file with formatted name
-r --rev REV print the given revision
--decode apply any matching decode filter
-I --include PATTERN [+] include names matching the given patterns
-X --exclude PATTERN [+] exclude names matching the given patterns
(use 'hg cat -h' to show more help)
~/tmp/mwe $
You have to supply a file argument to avoid the error message. But that argument is ignored if an --include and an -o are supplied.
I suspect no one has ever used the --include argument to cat before, because there is a dearth of explanation out there about how --include arguments are handled. Either that or I'm overlooking something obvious.

"You have to supply a file argument to avoid the error message. But that argument is ignored if an --include and an -o are supplied."
It is not literally ignored. The problem is that --include means something odd.
"... because there is a dearth of explanation out there about how --include arguments are handled."
That does seem to be the case! There is a description in hg help patterns but it is rather inadequate (in my opinion at least). What --include means is that only files matching the patterns in the file are used. Think of this as "include only", rather than "also include".
Thus, if your listfile has those two file names in it, you may run, e.g.:
hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt foo01.txt
and Mercurial will extract foo01.txt since it's in the list.
You might think you could use:
hg cat -o 'catOut-%s' --include listfile:files.lst '*'
but you can't (well, you can on Windows, as hg does glob style matching there, but that's the wrong approach). The right trick is to direct hg cat to read a directory, namely the top level directory of the repository:
hg cat .
(though there are similar methods, such as using set:*; see hg help filesets). Then the filtering produced by --include strips you down to just the files you want included.
More "color", as they say in some circles - no need to read this!
(This is just side stuff I found while researching this answer a bit. I wondered how one made hg cat scan every file in a revision, so I plunged into the source.)
For reference, here is the snippet of Python code that implements hg cat:
@command('cat',
    [('o', 'output', '',
     _('print output to file with formatted name'), _('FORMAT')),
    ('r', 'rev', '', _('print the given revision'), _('REV')),
    ('', 'decode', None, _('apply any matching decode filter')),
    ] + walkopts,
    _('[OPTION]... FILE...'),
    inferrepo=True)
def cat(ui, repo, file1, *pats, **opts):
    """output the current or given revision of files

    Print the specified files as they were at the given revision. If
    no revision is given, the parent of the working directory is used.

    Output may be to a file, in which case the name of the file is
    given using a format string. The formatting rules as follows:

    :``%%``: literal "%" character
    :``%s``: basename of file being printed
    :``%d``: dirname of file being printed, or '.' if in repository root
    :``%p``: root-relative path name of file being printed
    :``%H``: changeset hash (40 hexadecimal digits)
    :``%R``: changeset revision number
    :``%h``: short-form changeset hash (12 hexadecimal digits)
    :``%r``: zero-padded changeset revision number
    :``%b``: basename of the exporting repository

    Returns 0 on success.
    """
    ctx = scmutil.revsingle(repo, opts.get('rev'))
    m = scmutil.match(ctx, (file1,) + pats, opts)
    ui.pager('cat')
    return cmdutil.cat(ui, repo, ctx, m, '', **opts)
The most critical line is:
def cat(ui, repo, file1, *pats, **opts):
This means that non-option FILE... arguments (as in the description just before the def) are bound with the first one going to file1 and the rest going to *pats (as a Python tuple). This forces you to pass one or more file-name or file-set arguments.
Those file name arguments (baz.txt or whatever) are passed to scmutil.match, which finds the matching files in the manifest for the specified revision. That revision is the one now in ctx, obtained on the previous line by calling scmutil.revsingle, which takes the last revision given by the --rev option and defaults to the current revision (the first parent of the working directory).
It's scmutil.match that handles the --include option. Unfortunately this code is rather impenetrable:
m = ctx.match(pats, opts.get('include'), opts.get('exclude'),
              default, listsubrepos=opts.get('subrepos'), badfn=badfn)
(with pats being the non-empty file names passed in as command line arguments), which invokes this code in context.py:
def match(self, pats=None, include=None, exclude=None, default='glob',
          listsubrepos=False, badfn=None):
    if pats is None:
        pats = []
    r = self._repo
    return matchmod.match(r.root, r.getcwd(), pats,
                          include, exclude, default,
                          auditor=r.nofsauditor, ctx=self,
                          listsubrepos=listsubrepos, badfn=badfn)
which gets us into match.py's class match object, which is what implements the listfile: part. Here's a bit from that:
matchfns = []
if include:
    kindpats = self._normalize(include, 'glob', root, cwd, auditor)
    self.includepat, im = _buildmatch(ctx, kindpats, '(?:/|$)',
                                      listsubrepos, root)
    roots, dirs = _rootsanddirs(kindpats)
    self._includeroots.update(roots)
    self._includedirs.update(dirs)
    matchfns.append(im)
and self._normalize winds up reading the file given as the listfile argument, so that's what is in kindpats. (The string literal passed to _buildmatch is a regular expression glob suffix pattern, i.e., file names from the include file are followed by an implied trailing slash or end-of-string.)
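To make the effect of that suffix concrete, here is a small sketch that uses grep -E rather than Mercurial itself. The pattern is my own approximation of what a list entry of foo01.txt might compile to: it is anchored at the start, and a plain ERE group stands in for the non-capturing (?:...) group. The test paths are made up.

```shell
# Approximation (an assumption) of the regex built for the include entry "foo01.txt".
pattern='^foo01\.txt(/|$)'

# Print whether a candidate path matches the pattern.
check() {
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        echo "match:    $1"
    else
        echo "no match: $1"
    fi
}

result=$(check foo01.txt; check foo01.txt/sub; check foo01.txt.orig)
echo "$result"
```

The exact name and anything beneath it as a directory match, while a longer file name that merely starts with the same characters does not.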

Related

GitHub - Remove an indexed file from "Languages" on first page

How can I remove this indexed HTML page, which is documentation for one of the external libraries I use in my GitHub repository?
I have tried a lot of different commands, but can't find a way to remove this file from the GitHub Linguist indexer...
Here are the "Languages" that are indexed on the startpage:
[image] Languages on the startpage
The file that I want to exclude:
[image] HTML file that needs to be excluded
TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
Code that I've tried in order to get it removed via the ".gitattributes" file in the root folder (the vendored rule works, but it does not get rid of this HTML file in the GitHub Languages):
### vendored:
TestProject/wwwroot/lib/* linguist-vendored
### documentations:
TestProject/wwwroot/lib/bootstrap-icons/* linguist-documentation
and tried:
TestProject/wwwroot/lib/bootstrap-icons/* -linguist-documentation
and this:
TestProject/wwwroot/lib/bootstrap-icons/docs/* linguist-documentation
and this:
TestProject/wwwroot/lib/bootstrap-icons/docs/* -linguist-documentation
and this:
TestProject/wwwroot/lib/* linguist-documentation
and this:
TestProject/wwwroot/lib/* -linguist-documentation
But I can't figure out how to remove this file:
TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
Please help me with the correct syntax to remove the file from being indexed as a Language in my GitHub repository, main branch. 🙂
You've got the right idea and the right Linguist overrides (either will do the trick). The problem is your path matching isn't quite right.
From the .gitattributes docs
The rules by which the pattern matches paths are the same as in .gitignore files (see gitignore[5]), with a few exceptions:
[...]
If we look in the .gitignore docs (emphasis is mine):
An asterisk "*" matches anything except a slash. The character "?" matches any one character except "/". The range notation, e.g. [a-zA-Z], can be used to match one of the characters in a range. See fnmatch(3) and the FNM_PATHNAME flag for a more detailed description.
Two consecutive asterisks ("**") in patterns matched against full pathname may have special meaning:
[...]
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
The files you're trying to ignore are in sub-directories of the paths you've specified so you need to either:
use TestProject/wwwroot/lib/** linguist-vendored to recurse, or
use TestProject/wwwroot/lib/bootstrap-icons/docs/* linguist-vendored to limit to this directory.
We can demonstrate this without even using Linguist thanks to git check-attr:
$ # Create a repo with just the one file
$ git init -q Test-Project
$ cd Test-Project
$ mkdir -p TestProject/wwwroot/lib/bootstrap-icons/docs/
$ echo "<html>" > TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
$ git add -A
$ git commit -m 'Add file'
[main (root-commit) bed71b5] Add file
1 file changed, 1 insertion(+)
create mode 100644 TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
$
$ # Add your initial override
$ echo "TestProject/wwwroot/lib/* linguist-vendored" > .gitattributes
$ git add -A && git commit -m 'attribs'
[main 7d0a0cf] attribs
1 file changed, 1 insertion(+)
create mode 100644 .gitattributes
$
$ # Check the attributes
$ git check-attr linguist-vendored TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
TestProject/wwwroot/lib/bootstrap-icons/docs/index.html: linguist-vendored: unspecified
$ # So it doesn't have any effect.
$ # Now let's recurse
$ echo "TestProject/wwwroot/lib/** linguist-vendored" > .gitattributes
$ git add -A && git commit -m 'attribs'
[main 9007c34] attribs
1 file changed, 1 insertion(+), 1 deletion(-)
$ git check-attr linguist-vendored TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
TestProject/wwwroot/lib/bootstrap-icons/docs/index.html: linguist-vendored: set
$ # Woohoo!!! It works.
$ # Let's be specific to the docs dir
$ echo "TestProject/wwwroot/lib/bootstrap-icons/docs/* linguist-vendored" > .gitattributes
$ git add -A && git commit -m 'attribs'
[main a46f416] attribs
1 file changed, 1 insertion(+), 1 deletion(-)
$ git check-attr linguist-vendored TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
TestProject/wwwroot/lib/bootstrap-icons/docs/index.html: linguist-vendored: set
$ # Woohoo!!! That works too
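If you want to see at a glance which tracked files a pattern covers, git check-attr can also read paths from standard input. The sketch below assumes git is installed; the scratch repository and the Index.cshtml file are made up for illustration.

```shell
# Scratch repository with one vendored file and one ordinary file.
repo=$(mktemp -d)
cd "$repo" && git init -q .
mkdir -p TestProject/wwwroot/lib/bootstrap-icons/docs TestProject/Pages
echo "<html>" > TestProject/wwwroot/lib/bootstrap-icons/docs/index.html
echo "<p>" > TestProject/Pages/Index.cshtml
echo "TestProject/wwwroot/lib/** linguist-vendored" > .gitattributes
git add -A

# Report the attribute for every tracked path in one go.
report=$(git ls-files | git check-attr --stdin linguist-vendored)
echo "$report"
```

Every path under lib/ should be reported as "set" and everything else as "unspecified", which makes a too-narrow pattern easy to spot before pushing.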
Some good troubleshooting from @lildude showed that:
All the files were ignored correctly.
I had a lot of CSHTML files in my repository that were grouped as HTML+Razor (see this post on GitHub: GitHub linguist discussion).
When I clicked the "HTML" link under Languages on the start page, it took me to: https://github.com/pownas/Test-Project/search?l=html
But the roughly 40% HTML that the start page reported under Languages actually came from the HTML+Razor search: https://github.com/pownas/Test-Project/search?l=HTML%2BRazor

sed not working properly with multiple input files

sed -i is creating a backup of all files in subdirectories before editing in place (as expected) but it's not actually editing files in subdirectories.
$ mkdir -p a/b
$ echo "A" > a/a.txt
$ echo "B" > a/b/b.txt
Now I have two text files, one in a and one in a subdirectory of a.
$ sed -i.bac "1s/^/PREPENDED /" a/**/*.txt
Backups are created for both:
$ find a
a
a/a.txt
a/a.txt.bac
a/b
a/b/b.txt
a/b/b.txt.bac
Only a.txt is edited:
$ cat a/a.txt
PREPENDED A
$ cat a/b/b.txt
B
I'm using ZSH (so I have globstar support) and I'm on Mac.
Why is this happening and how can I fix it?
It's happening because sed treats all of its input files as one continuous stream, so your invocation only has a single line 1, which happens to be in a.txt. If you want the edit applied to each file, you need to invoke sed once per file.
for f in a/**/*.txt
do
sed ... "$f"
done
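A quick way to see the "single line 1" behaviour without touching any files in place is to print line 1 across two inputs. The sketch below works in a scratch directory and uses find instead of ** so it does not depend on globstar; both choices are mine rather than part of the question.

```shell
# Work in a scratch directory so nothing real is modified.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
echo "A" > "$demo/a/a.txt"
echo "B" > "$demo/a/b/b.txt"

# Without -i (or GNU's -s), both inputs form one stream: there is only one "line 1".
first_lines=$(sed -n '1p' "$demo/a/a.txt" "$demo/a/b/b.txt")
echo "$first_lines"

# One sed process per file, so the address 1 is evaluated separately for each file.
find "$demo/a" -type f -name '*.txt' | while IFS= read -r f; do
    sed -i.bac '1s/^/PREPENDED /' "$f"
done
cat "$demo/a/a.txt" "$demo/a/b/b.txt"
```

Note that GNU sed numbers lines per file whenever -i is given, so the original problem is specific to seds, such as the one shipped with macOS, that keep counting across files.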
Since you need to descend through several levels of directories, find is a natural fit: it can run sed once per file in a single line. If you are not familiar with find ... -exec '{}' \; it is worth taking a few minutes to search for it. In your case, the following invocation works well:
find a -type f -name "*.txt" -exec sed -i.bac 's/^/PREPENDED /' '{}' \;
Here find searches directory a and all below for any file (-type f) matching *.txt, then for each file (indicated by '{}') -exec executes sed -i.bac 's/^/PREPENDED /' and lastly an escaped \; is given to indicate the end of the -exec command.
results:
$ ls -1 a
b
a.txt
a.txt.bac
$ ls -1 a/b
b.txt
b.txt.bac
$ cat a/a.txt
PREPENDED A
$ cat a/b/b.txt
PREPENDED B
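As was correctly pointed out, with globstar enabled (shopt -s globstar in bash; zsh expands ** by default) it is unnecessary to use find, as the following invocation of sed is sufficient. Because this expression has no line address, it applies to every line of every file: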
sed -i.bac 's/^/PREPENDED /' a/**/*.txt

comparing two directories with separate diff output per file

I need to see what has changed between two directories that contain different versions of a software's source code. I have found a way to get a single .diff file, but how can I obtain a separate diff file for each changed file in the two directories? I need this because the "main" diff is about 6 MB and I want something handier.
I ran into this problem too, so I ended up with a few lines of shell script. It takes three arguments: the source and destination directories (as used for diff) and a target folder (which must exist) for the output.
It's a bit hacky, but maybe it will be useful to someone. Use with care, especially if your paths contain special characters.
#!/bin/sh
DIFFARGS="-wb"
LANG=C
TARGET=$3
SRC=`echo $1 | sed -e 's/\//\\\\\\//g'`
DST=`echo $2 | sed -e 's/\//\\\\\\//g'`
if [ ! -d "$TARGET" ]; then
echo "'$TARGET' is not a directory." >&2
exit 1
fi
diff -rqN $DIFFARGS "$1" "$2" | sed "s/Files $SRC\/\(.*\?\) and $DST\/\(.*\?\) differ/\1/" | \
while read file
do
if [ ! -d "$TARGET/`dirname \"$file\"`" ]; then
mkdir -p "$TARGET/`dirname \"$file\"`"
fi
diff $DIFFARGS -N "$1/$file" "$2/$file" > "$TARGET"/"$file.diff"
done
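For comparison, here is a sketch of the same idea that walks both trees with find instead of parsing diff's "Files ... differ" messages, so no path escaping is needed. The function name per_file_diffs and the demonstration trees are hypothetical, and treating a file missing on one side as empty relies on diff's -N option behaving as it does in GNU diff.

```shell
# Hypothetical helper (not from the script above): per_file_diffs OLD NEW OUT
per_file_diffs() {
    old=$1
    new=$2
    out=$3
    # Collect relative paths that exist in either tree, without duplicates.
    { (cd "$old" && find . -type f); (cd "$new" && find . -type f); } | sort -u |
    while IFS= read -r rel; do
        rel=${rel#./}
        # cmp -s is silent; non-zero means "different, or missing on one side".
        if ! cmp -s "$old/$rel" "$new/$rel" 2>/dev/null; then
            mkdir -p "$out/$(dirname "$rel")"
            # diff exits 1 when the files differ, so ignore its status here.
            diff -u -N "$old/$rel" "$new/$rel" > "$out/$rel.diff" || true
        fi
    done
}

# Tiny demonstration with made-up trees.
work=$(mktemp -d)
mkdir -p "$work/v1/src" "$work/v2/src" "$work/diffs"
echo "same" > "$work/v1/README"
echo "same" > "$work/v2/README"
echo "int a;" > "$work/v1/src/main.c"
echo "int b;" > "$work/v2/src/main.c"
echo "new" > "$work/v2/src/extra.c"
per_file_diffs "$work/v1" "$work/v2" "$work/diffs"
find "$work/diffs" -type f | sort
```

Unchanged files produce no output file at all, which keeps the target folder limited to what actually changed.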
If you want to compare source code, it is better to commit it to a version control system such as "svn".
After you have done so, diff the two versions of your uploaded code and redirect the output to file.diff:
svn diff --old svn:url1 --new svn:url2 > file.diff
A bash for loop will work for you. The following will diff two directories with C source code and produce a separate diff for each file.
for FILE in $(find <FIRST_DIR> -name '*.[ch]'); do DIFF=<DIFF_DIR>/$(echo $FILE | grep -o '[-_a-zA-Z0-9.]*$').diff; diff -u $FILE <SECOND_DIR>/$FILE > $DIFF; done
Use the correct patch level for the lines starting with +++

Mercurial: Converting existing folders into sub-repos

I have a Mercurial repository that looks like this:
SWClients/
SWCommon
SWB
SWNS
...where SWCommon is a library common to the other two projects. Now, I want to convert SWCommon into a sub-repository of SWClients, so I followed the instructions here and here. However, in contrast to the example in the first link, I want my sub-repository to have the same name as the folder had at the beginning. In detail, this is what I have done:
Create a file map.txt as follows
include SWCommon
rename SWCommon .
Create a file .hgsub as follows
SWCommon = SWCommon
Then run
$ hg --config extensions.hgext.convert= convert --filemap map.txt . SWCommon-temp
...lots of stuff happens...
Then
$ cd SWCommon-temp
$ hg update
101 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd ..
$ mv SWCommon SWCommon-old
$ mv SWCommon-temp SWCommon
$ hg status
abort: path 'SWCommon/SWCommon.xcodeproj/xcuserdata/malte.xcuserdatad/xcschemes/SWCommon.xcscheme' is inside nested repo 'SWCommon'
...which is indeed the case, but why is that a reason to abort? The other strange thing is that if I skip that last 'mv' above and then run 'hg status', I end up with lots of 'missing' files in SWCommon, as you would expect. The example in the link never makes it this far and basically stops at the hg update above. How do you make it work in practice?
Not currently possible. You could create a new repo by converting the original one (with the convert extension enabled, as in the question), like:
$ hg convert --filemap excludemap.txt SWClients SWClients-without-SWCommon
With an excludemap.txt like:
exclude "SWCommon"
And then add the subrepo there.
$ hg convert --filemap map.txt SWCommon SWClients-without-SWCommon/SWCommon
$ cd SWClients-without-SWCommon
$ hg add SWCommon
$ hg ci -m "Created subrepo"
See the mailing list thread that discusses this problem.

How do I identify what branches exist in CVS?

I have a legacy CVS repository which shall be migrated to Perforce.
For each module, I need to identify what branches exist in that module.
I just want a list of branch names, no tags.
It must be a command line tool, for scripting reasons.
For example (assuming there is a cvs-list-branches.sh script):
$ ./cvs-list-branches.sh module1
HEAD
dev_foobar
Release_1_2
Release_1_3
$
As a quick hack :) The same holds true for rlog.
cvs log -h | awk -F"[.:]" '/^\t/&&$(NF-1)==0{print $1}' | sort -u
Improved version as per bdevay, hiding irrelevant output and left-aligning the result:
cvs log -h 2>&1 | awk -F"[.:]" '/^\t/&&$(NF-1)==0{print $1}' | awk '{print $1}' | sort -u
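To see what the filter is doing without a CVS server at hand, you can feed it a hand-written sample. The sketch below assumes the stock cvs layout for the symbolic names section (tab, tag name, colon, revision), which is what the one-liner expects; the tag names and revisions are made up.

```shell
# Made-up excerpt of `cvs log -h` output covering two files.
sample_log() {
    printf 'symbolic names:\n'
    printf '\tRelease-1-0: 1.1\n'
    printf '\tRelease-1-1: 1.1.2.4\n'
    printf '\tMaintenance-BRANCH: 1.1.0.2\n'
    printf 'keyword substitution: kv\n'
    printf 'symbolic names:\n'
    printf '\tMaintenance-BRANCH: 1.3.0.2\n'
    printf 'keyword substitution: kv\n'
}

# Split on "." and ":"; a branch tag has 0 as the second-to-last field.
# The second awk strips the leading tab, and sort -u drops per-file repeats.
branches=$(sample_log |
    awk -F'[.:]' '/^\t/ && $(NF-1) == 0 { print $1 }' |
    awk '{ print $1 }' | sort -u)
echo "$branches"
```

Only Maintenance-BRANCH survives the filter: the two Release tags have no 0 in the second-to-last position of their revision numbers, so they are treated as plain tags.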
You could simply parse the output of cvs log -h. For each file there will be a section named Symbolic names :. All tags listed there whose revision number has a zero as its second-to-last component are branches. E.g.:
$ cvs log -h
Rcs file : '/cvsroot/Module/File.pas,v'
Working file : 'File.pas'
Head revision : 1.1
Branch revision :
Locks : strict
Access :
Symbolic names :
1.1 : 'Release-1-0'
1.1.2.4 : 'Release-1-1'
1.1.0.2 : 'Maintenance-BRANCH'
Keyword substitution : 'kv'
Total revisions : 5
Selected revisions : 0
Description :
===============================================
In this example Maintenance-BRANCH is clearly a branch because its revision number is listed as 1.1.0.2. This is also sometimes called a magic branch revision number.
This will bring up tags too, but tags and branches are basically the same in CVS.
$cvs.exe rlog -h -l -b module1
I have a small collection of "handy" korn shell functions one of which fetches tags for a given file. I've made a quick attempt to adapt it to do what you want. It simply does some seding/greping of the (r)log output and lists versions which have ".0." in them (which indicates that it's a branch tag):
get_branch_tags()
{
typeset FILE_PATH=$1
TEMP_TAGS_INFO=/tmp/cvsinfo$$
/usr/local/bin/cvs rlog $FILE_PATH 1>${TEMP_TAGS_INFO} 2>/dev/null
TEMPTAGS=`sed -n '/symbolic names:/,/keyword substitution:/p' ${TEMP_TAGS_INFO} | grep "\.0\." | cut -d: -f1 | awk '{print $1}'`
TAGS=`echo $TEMPTAGS | tr ' ' '/'`
echo ${TAGS:-NONE}
rm -Rf $TEMP_TAGS_INFO 2>/dev/null 1>&2
}
With WinCvs (a GUI client for Windows) this is trivial: a right click will show you any branches and tags the files have.
Through a shell you may use cvs log -h -l module.
Check for the very first file created and committed in the repository. Open that file on the server; it lists all the tags and branches together.