Using diff3 where filenames contain a dash (-) - diff

I'm trying to use diff3 in this way
diff3 options... mine older yours
My problem is that I probably can't use it, since all my 3 files contain a "dash" within.
The manual mentions:
At most one of these three file names may be `-', which tells diff3 to read the standard input for that file.
so I probably have to rename filenames before running diff3.
If you know for a better solution or a workaround, please let me know about. Thank you!

At most one of these three file names may be `-', which tells diff3 to read the standard input for that file.
It does not state, that your filenames should not contain dash symbols. It simply says, that if you want to, you can put - instead of one of the names, in which case the standard input will be read instead of reading one of the files.
So, you can have as many dashes in your filenames as you like and diff3 should work just fine.
However, on Windows putting filenames in "" for escaping space characters does not work, and I failed to find a suitable workaround. However, you can automatize the process of renaming files (if the files are relatively small, this would not even be too inefficient):
#echo off
copy %1 tempfile_1.txt
copy %2 tempfile_2.txt
copy %3 tempfile_3.txt
"C:\Program Files (x86)\KDiff3\bin\diff3.exe" -E tempfile_1.txt tempfile_2.txt tempfile_3.txt
del tempfile_1.txt tempfile_2.txt tempfile_3.txt
Put this in a file like diff3.cmd, then run diff3.cmd "first file.txt" "second file.txt" "third file.txt".
P.S. Moving files would be more efficient (if they are on the same disk volume as the script, which they are not in your case), you could even move them back to where they were initially, but for some time they would not be present at their original folder.

Related

SAS- Reading multiple compressed data files

I hope you are all well.
So my question is about the procedure to open multiple raw data files that are compressed.
My files' names are ordered so I have for example : o_equities_20080528.tas.zip o_equities_20080529.tas.zip o_equities_20080530.tas.zip ...
Thank you all in advance.
How much work this will be depends on whether:
You have enough space to extract all the files simultaneously into one folder
You need to be able to keep track of which file each record has come from (i.e. you can't tell just from looking at a particular record).
If you have enough space to extract everything and you don't need to track which records came from which file, then the simplest option is to use a wildcard infile statement, allowing you to import the records from all of your files in one data step:
infile "c:\yourdir\o_equities_*.tas" <other infile options as per individual files>;
This syntax works regardless of OS - it's a SAS feature, not shell expansion.
If you have enough space to extract everything in advance but you need to keep track of which records came from each file, then please refer to this page for an example of how to do this using the filevar option on the infile statement:
http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm
If you don't have enough space to extract everything in advance, but you have access to 7-zip or another archive utility, and you don't need to keep track of which records came from each file, you can use a pipe filename and extract to standard output. If you're on a Linux platform then this is very simple, as you can take advantage of shell expansion:
filename cmd pipe "nice -n 19 gunzip -c /yourdir/o_equities_*.tas.zip";
infile cmd <other infile options as per individual files>;
On windows it's the same sort of idea, but as you can't use shell expansion, you have to construct a separate filename for each zip file, or use some of 7zip's more arcane command-line options, e.g.:
filename cmd pipe "7z.exe e -an -ai!C:\yourdir\o_equities_*.tas.zip -so -y";
This will extract all files from all of the matching archives to standard output. You can narrow this down further via the 7-zip command if necessary. You will have multiple header lines mixed in with the data - you can use findstr to filter these out in the pipe before SAS sees them, or you can just choose to tolerate the odd error message here and there.
Here, the -an tells 7-zip not to read the zip file name from the command line, and the -ai tells it to expand the wildcard.
If you need to keep track of what came from where and you can't extract everything at once, your best bet (as far as I know) is to write a macro to process one file at a time, using the above techniques and add this information while you're importing each dataset.

Using wildard with DOS COPY command corrupts destination file

I don't understand the behaviour of the COPY command when using a wildcard.
I have a single text file in C:\Source called mpt*.asm and I want to copy it to C:\Dest. This is needed from a batch script, and I can't be sure of the exact name of mpt*.asm (it may be mpt001.asm for example). The destination name should be exactly mpt.asm.
If I use:
COPY C:\Source\mpt*.asm C:\Dest\mpt.asm
The file file is copied, but has a extra (0x1A) character appended to the end of the file.
If I use:
COPY C:\Source\mpt*.asm C:\Dest\mpt.asm /B
I don't get this spurious character.
If I don't use a wildcard, I don't get the spurious character either. It seems unlikely there is a bug in COPY, but this behavior seems unexpected.
Is there a way of doing this copy without resorting to using /B?
I have never seen that before, but it does append an extra arrow character for me too.
You can work around the issue using xcopy instead.
echo f| xcopy C:\Source\mpt*.asm C:\Dest\mpt.asm
If you read copy /? it says
To append files, specify a single file for destination, but multiple files
for source (using wildcards or file1+file2+file3 format).
So by using a single filename as the the dest, and using a wildcard in the source, it may interpret that as appending, which may be what the extra character is for, but as you aren't appending anything you can see it.
I'm only guessing but that may explain it.

zip recursively each file in a dir, where the name of the file has spaces in it

I am quite stuck; I need to compress the content of a folder, where I have multiple files (extension .dat). I went for shell scripting.
So far I told myself that is not that hard: I just need to recursively read the content of the dir, get the name of the file and zip it, using the name of the file itself.
This is what I wrote:
for i in *.dat; do zip $i".zip" $i; done
Now when I try it I get a weird behavior: each file is called like "12/23/2012 data102 test1.dat"; and when I run this sequence of commands; I see that zip instead of recognizing the whole file name, see each part of the string as single entity, causing the whole operation to fail.
I told myself that I was doing something wrong, and that the i variable was wrong; so I have replaced echo, instead than the zip command (to see which one was the output of the i variable); and the $i output is the full name of the file, not part of it.
I am totally clueless at this point about what is going on...if the variable i is read by zip it reads each single piece of the string, instead of the whole thing, while if I use echo to see the content of that variable it gets the correct output.
Do I have to pass the value of the filename to zip in a different way? Since it is the content of a variable passed as parameter I was assuming that it won't matter if the string is one or has spaces in it, and I can't find in the man page the answer (if there is any in there).
Anyone knows why do I get this behavior and how to fix it? Thanks!
You need to quote anything with spaces in it.
zip "$i.zip" "$i"
Generally speaking, any variable interpolation should have double quotes unless you specifically require the shell to split it into multiple tokens. The internal field separator $IFS defaults to space and tab, but you can change it to make the shell do word splitting on arbitrary separators. See any decent beginners' shell tutorial for a detailed account of the shell's quoting mechanisms.

Rename file containing '©' character

We received as input in our application (running on Windows) a list of files. These files were automatically extracted from a database with a script.
Apparently some of the names are containing special characters (like accents) and these characters are rendered as '©' on our side.
How can rename programmatically these text files (around 900'000) to get rid of this character?
We cannot change the source neither re-extract the files.
The problem is that because of this character another program involved with our system does not accept the files.
Have a look at the unix command rename. It allows you to apply a perl regex to the names of a bunch of files. In this case you might want something like:
$ rename 's/[^a-zA-Z0-9]//' *
In debian the rename command is part of the perl package. It should also be available on CPAN.
I ended up creating a new script that reads the input files and search for special characters in their title.
It was quite easy indeed:
string filename = filename.Replace("©", "e");
Since the '©' is in the filename, the script (in C#) is able to recognize it and replace the match accordingly. In this way I can loop through all the folders and subfolders simply reading the filename and change specials characters.
Thank you all for the contributions!

Basic script to replace image files in multiple directories

I have a situation where we have several thousand image files that have become corrupted on our server (Windows 2008 R2 x64). I have a working image file that I want to replace the corrupt files with. The files must retain the same name and path (size, timestamps, etc do not matter).
So the basic idea would be to replace each corrupt image file with the working file.
I do not write code, only the occasional windows batch file.
Should I use VB or PowerShell (or something else) for this? What will the script look like for this?
I apologize in advance if this question is too basic for stackoverflow.
You don't really need a batch file,
try looking at the for command
e.g.
FOR /R %f in (*.jpg) DO copy newfile.jpg "%f"
This should do a recursive search and copy newfile.jpg over the jpg's it finds.
It all boils down to how you are identifying the broken jpgs.
When I dont use a wild card for example
FOR /R %f in (broken.jpg) DO copy newfile.jpg "%f"
Then newfile.jpg gets copied to every subdirectory. If I use a wildcard ( *,?) the command works as expected. Is there a way to have this commend work with a (set) that does not contain wildcards?