I am using the following command to get a brief history of the CVS repository.
cvs -d :pserver:*User*:*Password*@*Repo* rlog -N -d "*StartDate* < *EndDate*" *Module*
This works just fine except for one small problem: it lists all tags created on each file in that repository. I want the tag info, but only the tags that were created in the specified date range. How do I change this command to do that?
I don't see a way to do that natively with the rlog command. Faced with this problem, I would write a Perl script to parse the output of the command, correlate the tags to the date range that I want and print them.
Another solution would be to parse the ,v files directly, but I haven't found any robust libraries for doing that. I prefer Perl for that type of task, and the parsing modules don't seem to be very high quality.
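If it helps, here is a rough sketch of that post-processing step as a shell pipeline (awk rather than Perl, but the same idea). It assumes the standard rlog output layout: a "symbolic names:" block listing tag-to-revision pairs for each file, followed by the revision entries selected by -d. The -N flag is dropped so the tag list is actually printed, and a tag is kept only when its revision is among the revisions selected by the date range.

    cvs -d :pserver:*User*:*Password*@*Repo* rlog -d "*StartDate* < *EndDate*" *Module* |
    awk '
        # Start of a new file: remember its name and reset the per-file state.
        /^RCS file:/       { file = $NF; split("", tags); split("", seen); intags = 0 }
        # Collect TAG: revision pairs from the "symbolic names:" block.
        /^symbolic names:/ { intags = 1; next }
        intags && /^[ \t]/ { split($0, a, ":"); gsub(/[ \t]/, "", a[1]); gsub(/[ \t]/, "", a[2]);
                             tags[a[1]] = a[2]; next }
        intags             { intags = 0 }   # first unindented line ends the tag block
        # Revision entries printed below were already filtered by the -d date range.
        /^revision /       { seen[$2] = 1 }
        # End-of-file separator: print only tags whose revision was selected.
        /^=====/           { for (t in tags) if (tags[t] in seen) print file, t, tags[t] }
    '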
Is there any way (for on-premise GitHub) to:
For the N files in the Pull Request,
look at the history of those files,
and add any/all GitHub users (from that history) to the PR's list of code reviewers?
I have searched around.
I found "in general" items like this:
https://www.freecodecamp.org/news/how-to-automate-code-reviews-on-github-41be46250712/
But I cannot find anything regarding the specific "workflow" I describe above.
You can get the list of changed files from the PR into a text file, then run the git blame command below for each file in that list to get the users who appear in the latest version's blame. This can be a fairly simple script (an end-to-end sketch is given after the notes below):
Generate a txt file from the list of files in the PR.
Traverse all the filenames in the txt file (Python, bash, etc.).
Run the blame command for each file and store the authors in a list.
Add reviewers to the PR from that list, either manually or with a small JS script.
For the GitHub-specific part, see the list-pull-requests-files API endpoint.
The blame command is something like:
git blame filename --porcelain | grep "^author " | sort -u
As a note, some of those users may not be available on GitHub anymore; an extra step can be added after collecting the usernames to check whether they still exist. (This looks achievable through the GitHub API.)
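Here is a rough end-to-end sketch in bash, under a few assumptions: the GitHub Enterprise API root, the owner/repo/PR values, and the token variable are all placeholders, and git blame reports commit author names, which may still need mapping to GitHub usernames before the final step.

    #!/usr/bin/env bash
    # Sketch only. Assumes jq and curl are installed, $GITHUB_TOKEN holds a token
    # with repo scope, and GITHUB_API points at your on-premise instance's API root.
    set -euo pipefail

    GITHUB_API="https://github.example.com/api/v3"   # placeholder GHE API root
    OWNER=myorg REPO=myrepo PR=123                   # placeholders

    # 1) Generate a txt file from the list of files in the PR.
    curl -s -H "Authorization: token $GITHUB_TOKEN" \
        "$GITHUB_API/repos/$OWNER/$REPO/pulls/$PR/files?per_page=100" |
        jq -r '.[].filename' > pr_files.txt

    # 2) + 3) Traverse the filenames and collect blame authors.
    : > authors.txt
    while IFS= read -r f; do
        [ -f "$f" ] || continue                      # file may have been deleted in the PR
        git blame --porcelain -- "$f" | grep '^author ' | cut -d' ' -f2- >> authors.txt
    done < pr_files.txt
    sort -u -o authors.txt authors.txt

    # 4) Request reviewers. Note: blame gives author *names*; map them to GitHub
    #    usernames (and drop users who no longer exist, as noted above) before this call.
    reviewers=$(jq -R -s -c 'split("\n") | map(select(length > 0))' authors.txt)
    curl -s -X POST -H "Authorization: token $GITHUB_TOKEN" \
        -d "{\"reviewers\": $reviewers}" \
        "$GITHUB_API/repos/$OWNER/$REPO/pulls/$PR/requested_reviewers"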
I am attempting to regularly archive a few file types hosted on a community website where our admin has been MIA for years, in case he dies or just stops paying for the hosting.
I am able to download all of the files I need using wget -r -np -nd -e robots=off -l 0 URL but this leaves me with about 60,000 extra files to waste time both downloading and deleting.
I am really only looking for files with the extensions "tbt" and "zip". When I add in -A tbt,zip to the input, wget then only downloads a single file, "index.html.tmp". It immediately deletes this file because it doesn't match the file type specified, and then the process stops entirely, with wget announcing that it is finished. It does not attempt to download any of the other files that it grabs when the -A flag is not included.
What am I doing wrong? Why does specifying file types in the way that I did cause it to finish after only looking at one file?
Possibly you're hitting the same problem I've hit when trying to do something similar. When using --accept, wget determines whether a link refers to a file or a directory based on whether or not it ends with a /.
For example, say I have a directory named files, and a web page that has:
<a href="files">Lots o' files!</a>
If I were to request this with wget -r, then wget would happily GET /files, see that it was an HTML document containing a bunch of links, and continue to download those links.
However, if I add -A zip to my command line, and run wget with --debug, I see:
appending ‘http://localhost:8080/files’ to urlpos.
[...]
Deciding whether to enqueue "http://localhost:8080/files".
http://localhost:8080/files (files) does not match acc/rej rules.
Decided NOT to load it.
In other words, wget thinks this is a file (no trailing /) and it doesn't match our acceptance criteria, so it gets rejected.
If I modify the remote file so that it looks like...
<a href="files/">Lots o' files!</a>
...then wget will follow the link and download files as desired.
I don't think there's a great solution to this problem if you need to use wget. As I mentioned in my comment, there are other tools available that may handle this situation more gracefully.
It's also possible you're experiencing a different issue; the output from adding --debug to your command line would clarify things in that case.
I also experienced this issue, on a page where all the download links looked something like this: filedownload.ashx?name=file.mp3. The solution was to match for both the linked file, and the downloaded file. So my wget accept flag looked like this: -A 'ashx,mp3'. I also used the --trust-server-names flag. This catches all the .ashx that are linked in the webpage, then when wget does the second check, all the mp3 files that were downloaded will stay.
As an alternative to --trust-server-names, you may also find the --content-disposition flag helpful. Both flags help rename the file that gets downloaded from filedownload.ashx?name=file.mp3 to just file.mp3.
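For reference, a command along those lines might look like this (the URL is just a placeholder):

    wget -r -np -e robots=off -A 'ashx,mp3' --trust-server-names http://example.com/downloads/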
I'm writing a syntax check tool to parse several files on different branches.
Is there a way for me to read the contents without checking out the file?
The tool is written in Perl.
my $contents = `p4 print //depot/path/to/file`;
(Usual requirements for running a p4 command apply -- make sure the p4 executable is in your PATH, make sure you're authenticated with p4 login, make sure you're connecting to the right server, etc.)
See p4 help print for more info on the print command -- you might find the -q and/or -o flags helpful depending on what exactly you need to do with the output.
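For example, a couple of hypothetical invocations showing those flags (-q drops the depot-path header line, -o writes the output to a named local file), which is handy if the syntax checker wants a file on disk:

    p4 print -q -o /tmp/check.branch1 //depot/branch1/path/to/file
    p4 print -q -o /tmp/check.branch2 //depot/branch2/path/to/file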
I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will avoid the generation of the host's subdirectories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my list-of-files-to-download.txt has these kinds of entries:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second one or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of URLs the way I do now, keeping my parameters.
b.- get all the files in the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest 2 approaches -
Use the "-nc" or the "--no-clobber" option. From the man page -
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.

A combination with -O/--output-document is only accepted if the given output file does not exist.

Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However, the logic is sound, and with a bit of modification you can get it to work on other platforms/scripts as well.
Sample pseudocode bash script -
for i in $(cat list-of-files-to-download.txt); do
    wget <all your flags except the -i flag> "$i" -O /path/to/custom/directory/filename
done
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file already exists on disk, and then decide whether to rename the temp file to the name that you want.
This offers much more control over your downloads.
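Along those lines, a minimal sketch (the destination directory, agent, and referrer are the values from your original command; the timeout placeholder is kept as-is). It builds the local name from the URL so same-named files in different directories no longer collide, and it only checks for an existing local copy rather than re-implementing -N's timestamp check, since -N and -O can't be combined.

    #!/usr/bin/env bash
    DEST=/directory/for/downloaded/files

    while IFS= read -r url; do
        # e.g. http://website/directory1/picture-same-name.jpg
        #   -> website_directory1_picture-same-name.jpg
        name=$(printf '%s' "$url" | sed -e 's|^[a-z]*://||' -e 's|/|_|g')

        if [ ! -e "$DEST/$name" ]; then
            wget --user-agent='some-agent' --referer=http://some-referrer.html \
                 --timeout=xxx -O "$DEST/$name" "$url"
        fi
    done < list-of-files-to-download.txt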
I have a situation where I'd like to diff two branches in Perforce. Normally I'd use diff2 to do a server-side diff but in this case the files on the branches are so large that the diff2 call ends up filling up /tmp on my server trying to diff them and the diff fails.
I can't bring down my server to rectify this, so I'm looking at checking out the content to disk and using diff on the command line to inspect and compare the content.
The trouble is: most of the files have RCS keywords in them that are being expanded.
I know I can remove keyword expansion from a file by opening the files for edit and removing the -k attribute in the process, but that seems a bit brute force. I was hoping I could just tell the p4 sync command not to expand the keywords on checkout, but I can't seem to find a way to do this. Is it possible?
As a possible alternative solution, does anyone know if you can tell p4 diff2 which directory to use for temporary space when you call it? If I could tell it to use abundant NAS space instead of /tmp on the Perforce server I might be able to make it work.
I'm using 2010.x version of Perforce if that changes the answer in any way.
There's no way I know of to disable keyword expansion on sync. Here's what I would try:
1) Create a branch spec between the two sets of files
2) Run "p4 files //path/to/files/... | cut -d '#' -f 1 > tmp"
The path to the files above should be the right-hand side of the branch spec you created.
3) Run "p4 -x tmp diff2 -b <name-of-your-branch-spec>" (a combined sketch of steps 2 and 3 follows below)
This tells p4 to iterate over the lines of text in 'tmp' and treat them as arguments to the command. I think /tmp on your server will get cleared in-between each file this way, preventing it from filling up.
I unfortunately don't have files large enough to test that it works, so this is entirely theoretical.
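Put together, steps 2 and 3 would look roughly like this; 'big-merge' is a hypothetical branch spec whose right-hand side is //depot/projB/...:

    p4 files //depot/projB/... | cut -d '#' -f 1 > tmp
    p4 -x tmp diff2 -b big-merge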
To change the temp directory that p4d uses, just set TEMP or TMP to a different path and restart p4d. If you're on Windows, make sure to call 'p4 set -S perforce TMP=' to set the variable for the Perforce service; without the -S perforce you'll just set it for the current user.
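For example (the NAS and Windows paths are hypothetical):

    # On a Unix server, in the environment that launches p4d, then restart p4d:
    export TMP=/mnt/nas/p4tmp

    # On Windows, for a Perforce service named "perforce":
    p4 set -S perforce TMP=D:\p4tmp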