How do I search a CVS repository for a particular file? - version-control

Is there any way to do it? I only have client access and no access to the server. Is there a command I've missed or some software that I can install locally that can connect and find a file by filename?

You could grep the output of
cvs rlog -Nh .
(note the period character at the end - this effectively means: the whole repository).
That should give you info about the whole shebang including removed files and files added on branches.
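For example, to look for a file whose name contains parser (a hypothetical name), you could grep the header lines, assuming $CVSROOT is set or you run this inside a checked-out working copy:
cvs rlog -Nh . 2>/dev/null | grep -i 'RCS file:.*parser'
Each matching line should show the full path of the file's ,v archive inside the repository.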

You can use
cvs rls -Rde <modulename>
which will give you all files in <modulename>, recursively, e.g.
foo:
/x.py/1.2/Mon Dec 1 23:33:51 2008//
/y.py/1.1/Mon Dec 1 23:33:31 2008//
D/bar////
foo/bar:
/xxx/1.1/Mon Dec 1 23:36:38 2008//
Notice that the -d option also gives you deleted files; not sure whether you
wanted that. Without -e, it gives you only the file names.
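If you're after one particular file name, you can filter that listing, e.g. (using the module foo and the file x.py from the sample output above):
cvs rls -Rde foo 2>/dev/null | grep '/x\.py/'
The directory a match lives in is the closest preceding line ending in a colon.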

Related

Using tac on most recent log file out of several log files in a directory

I have several log files in a directory that we’ll call /path/to/directory, which look like this in a long listing on Red Hat Enterprise Linux 6:
-rw-r-----. 1 root root 17096 Sep 30 11:00 logfile_YYYYDDMM_HHMMSS.log
Several of these log files are generated every day. I need to automatically tac the most recently modified file without typing the exact name of the log file. For example, I’d like to do:
tac /path/to/directory/logfile*.log | grep -m 1 keyword
And have it automatically tac the most recently modified file and grep the keyword in the reverse direction from the end of the log file so it runs quicker. Is this possible?
The problem I’m running into is that there is always more than one log file in /path/to/directory, and I can’t get Linux to automatically tac the most recently modified file as of yet. Any help would be greatly appreciated.
I’ve tried:
tac /path/to/directory/logfile_$(date +%Y%m%d)*.log
which will tac a file created on the present date, but the part that I’m having trouble with is using tac on the newest file (by YYYYMMDD AND HHMMSS), because multiple files can be generated on the same date but only one of them can be the most current, and the most current log file is the only one I care about. I can’t use a symbolic link either. Limitations, sigh.
The problem you seem to be expressing in your question isn't so much about tac, but rather .. how to select the most recent of a set of predictably named files in a directory.
If your filenames really use a logfile_YYYYMMDD_HHMMSS.log format (year, then month, then day, as your later description suggests), then they will sort lexically into chronological order without the need for an innate understanding of dates. Thus, if your shell is bash, you might:
shopt -s nullglob
for x in /path/to/logfile_*.log; do
[[ "$x" > "$file" ]] && file="$x"
done
The nullglob option tells bash to expand a glob matching no files as a null rather than as a literal string. Following the code above, you might want to test for the existence of $file before feeding it to tac, as in the sketch below.
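Putting it together, a minimal sketch (assuming bash, the logfile_YYYYMMDD_HHMMSS naming, and a placeholder keyword):
#!/bin/bash
shopt -s nullglob
file=
# keep the lexically greatest (i.e. newest) matching file name
for x in /path/to/directory/logfile_*.log; do
    [[ "$x" > "$file" ]] && file="$x"
done
# only feed it to tac if we actually found a log file
[[ -n "$file" ]] && tac "$file" | grep -m 1 keyword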

Can we wget with file list and renaming destination files?

I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will avoid the generation of the host's subdirectories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my files-to-download.txt has these kinds of files:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of url's the way I do now, keeping my parameters.
b.- get all files in the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest two approaches.
Use the "-nc" or "--no-clobber" option. From the man page:
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.

A combination with -O/--output-document is only accepted if the given output file does not exist.

Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However, the logic is sound, and with a bit of modification you can get it to work on other platforms/scripts as well.
Sample pseudocode bash script -
for i in `cat list-of-files-to-download.txt`;
do
wget <all your flags except the -i flag> $i -O /path/to/custom/directory/filename ;
done ;
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
This offers much more control over your downloads.
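A minimal sketch of that idea (the output directory and the URL-to-filename scheme are assumptions; substitute your own flags):
#!/bin/bash
# Download every URL in the list into one directory, renaming each file
# after its parent directory so the names no longer collide.
# Note: -N cannot be combined with -O, so the "only if newer" check is lost here.
outdir=/directory/for/downloaded/files
while read -r url; do
    [ -z "$url" ] && continue
    # http://website/directory1/picture-same-name.jpg -> directory1_picture-same-name.jpg
    name=$(echo "$url" | awk -F/ '{print $(NF-1) "_" $NF}')
    wget --user-agent='some-agent' --referer=http://some-referrer.html \
         --timeout=30 -O "$outdir/$name" "$url"
done < list-of-files-to-download.txt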

GitHub API - How do I find out if a file is actually a symlink?

When querying a symlink via the GitHub API, I get different results depending on whether the symlink points to a file or to a directory. The latter is better behaved in that it returns "type": "symlink" as part of its JSON, whereas the former returns "type": "file". Example file symlink, example directory symlink.
It's very confusing when a symlink advertises itself as a file, as GET-ing its download URL will just get you the target of the symlink and not the file contents.
How do I tell if a file is actually a symlink, as opposed to a real file?
Also, is the behaviour of returning type "file" for file symlinks a downright bug? It just doesn't seem right.
The answer is unfortunately "you don't". There is no way (with the API in its current state) to differentiate between a file request and a symlink. From the documentation:
If the requested :path points to a symlink, and the symlink's target is a normal file in the repository, then the API responds with the content of the file [...]
Otherwise, the API responds with an object describing the symlink itself:
I raised this with GitHub support and they confirmed there's no way to do this. They offered to raise it as a request with the internal teams, but I would imagine it's unlikely to get picked up.
One workaround (that isn't suitable in all scenarios) is to request the file via https://raw.githubusercontent.com/, which will return either the file contents if it's a real file, or just the file path if it's a symlink.
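For example (owner, repo, branch, and path here are hypothetical), fetching the raw URL and inspecting the body shows the difference:
curl -s https://raw.githubusercontent.com/someowner/somerepo/master/path/to/maybe-symlink
If the entry is a real file you get its contents back; if it's a symlink you get just the link target (e.g. ../../real-file).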
A symlink in Git doesn't care whether the target is a file or a directory. (Or even if the target exists.)
The API is not returning "file" for a symlink. The file in question is not a symbolic link; it's a regular file. After cloning your repository:
% ls -Flas muzhack/files/littleBitsMidiNotes.ino
4 -rw-r--r-- 1 user group 3247 Jun 7 12:00 muzhack/files/littleBitsMidiNotes.ino
% git ls-files --stage muzhack/files/littleBitsMidiNotes.ino
100644 08918243048ae4a4f57e69a34776e9a0bd1ec7af 0 muzhack/files/littleBitsMidiNotes.ino
A mode of 100644 (that first field returned by ls-files) indicates that this is a regular file. In contrast, the entry that GitHub is reporting as a symlink is, in fact, a symbolic link:
% ls -Flas muzhack/files/adapter-board-files
4 lrwxr-xr-x 1 user group 25 Jun 7 12:00 muzhack/files/adapter-board-files@ -> ../../adapter-board-files
% git ls-files --stage muzhack/files/adapter-board-files
120000 ef17a5e7b4bef4e51f19dc6b4c360c95cbb223c8 0 muzhack/files/adapter-board-files
So the GitHub API appears to be reporting this information correctly.

Perforce: Prevent keywords from being expanded when syncing files out of the depot?

I have a situation where I'd like to diff two branches in Perforce. Normally I'd use diff2 to do a server-side diff but in this case the files on the branches are so large that the diff2 call ends up filling up /tmp on my server trying to diff them and the diff fails.
I can't bring down my server to rectify this, so I'm looking at checking out the content to disk and using diff on the command line to inspect and compare the content.
The trouble is: most of the files have RCS keywords in them that are being expanded.
I know I can remove keyword expansion from a file by opening the files for edit and removing the -k attribute from the files in the process, but that seems a bit brute force. I was hoping I could just tell the p4 sync command not to expand the keywords on checkout. I can't seem to find a way to do this. Is it possible?
As a possible alternative solution, does anyone know if you can tell p4 diff2 which directory to use for temporary space when you call it? If I could tell it to use abundant NAS space instead of /tmp on the Perforce server I might be able to make it work.
I'm using 2010.x version of Perforce if that changes the answer in any way.
There's no way I know of to disable keyword expansion on sync. Here's what I would try:
1) Create a branch spec between the two sets of files
2) Run "p4 files //path/to/files/... | cut -d '#' -f 1 > tmp"
Path to files above should be the right hand side of the branch spec you created
3) p4 -x tmp diff2 -b
This tells p4 to iterate over the lines of text in 'tmp' and treat them as arguments to the command. I think /tmp on your server will get cleared in-between each file this way, preventing it from filling up.
I unfortunately don't have files large enough to test that it works, so this is entirely theoretical.
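Putting those steps together, a rough sketch (the depot path and branch spec name are placeholders, and this is as untested as the steps above):
p4 files //depot/project/branch2/... | cut -d '#' -f 1 > tmp
p4 -x tmp diff2 -b my-branch-spec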
To change the temp directory that p4d uses, just set TEMP or TMP to a different path and restart p4d. If you're on Windows, make sure to call 'p4 set -S perforce TMP=' to set the variable for the Perforce service; without the -S perforce you'll just set it for the current user.
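For example (the paths are placeholders, and this assumes you're able to restart the server):
# Linux: in the environment that starts p4d
export TMP=/mnt/nas/p4tmp
# then restart p4d
# Windows: for a Perforce service named "perforce"
p4 set -S perforce TMP=D:\p4tmp
# then restart the service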

Creating a script that compares multiple files in multiple servers

I have several different Linux servers, all of which are essentially mirrors of each other. However, some of them have gone out of sync (file A on machine 1 is different from file A on machine 2).
I'm in the process of designing a script (shell or Perl only) that will systematically walk through certain directories and diff the corresponding files in the different machines against each other, and generate a meaningful report. Later on, I will try to sync up the files.
These are my thoughts so far on how to approach this:
sftp files to /tmp and diff locally
using ssh and diff
using rsync
My question is: what is the best way to systematically compare two files that are in different machines (but similar directory structure), and are there any built-in Perl utilities that may be helpful?
rsync will figure out the difference and sync your files by sending only the diff. Once two folders get synced, it will be pretty quick. (But the first sync will take some time.)
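If you mainly want a report of what differs before syncing anything, a checksum-based dry run might look like this (host and paths are placeholders):
rsync -avnci --delete /data/mirror/ user@machine2:/data/mirror/
Here -n makes it a dry run, -c compares checksums rather than timestamps and sizes, and -i itemizes each difference it finds.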
You can also use git here. One possible workflow: just check in all files you want to compare (or complete directories, using git add -A). Then create an empty git repository on your local workstation which is used to fetch all the other repositories, and which is used to do the comparisons:
git init
git remote add firstmachine ssh://user@firstmachine/path/to/directory
git remote add othermachine ssh://user@othermachine/path/to/directory
git fetch --all
Now the contents of two machines may be compared:
git diff remotes/firstmachine/master remotes/othermachine/master
Or just compare the contents of a specific file:
git diff remotes/firstmachine/master remotes/othermachine/master -- file/to/compare
It's not strictly necessary to use a third machine for the comparisons. You can also git-fetch the contents from othermachine to firstmachine.
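For example, doing it directly on firstmachine (the remote name and paths are placeholders):
# on firstmachine, inside the repository
git remote add othermachine ssh://user@othermachine/path/to/directory
git fetch othermachine
git diff master othermachine/master
git diff master othermachine/master -- file/to/compare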
I worked on a similar tool (in Python). It ran a cron job at a given time of night that would bring the tar-bzipped files to one server, extract the directories, and run a recursive diff on them. The diff output was then run through some Python scripts that analysed the diff hunks (+ lines / ! lines, etc.) to gauge the amount of change.
Not sure if there are pre-built modules in Perl or Python, but some helper utilities may well be available in one of them.
If you need to know the difference between some local and remote file systems, the following method minimizes the network load:
make a local copy ($C) of the local directory ($D) you want to compare. I.e.:
cp -R $D $C
use rsync to copy the contents of the remote directory ($R) you want to compare over $C (note the trailing slash on the source):
rsync -av --delete $remote_host:$R/ $C
compare $D to $C:
diff -ru $D $C