In Shell, how to run a command on two files in a directory at once? - sh

I know how to run a subscript in shell on all files of a similar type. I do:
for filePath in path/*.extension; do
    script.py "$filePath"
done
Currently I have about nine pairs of files with the same extension and very similar base names (think xxx_R1 and xxx_R2). I have a script I want to run that takes in pairs of files. How can I run a script on all those pairs using shell?

I would list the files matching one pattern, strip off the suffix to form a list of the "base" names, then re-append both suffixes. Something like this:
for base in $(ls *_R1 | sed 's/_R1$//')
do
    f1=${base}_R1
    f2=${base}_R2
    script2.py $f1 $f2
done
Alternatively, you could accomplish the same thing by letting sed do the selection as well as the stripping:
for base in $(ls | sed -n '/_R1$/s///p')
...
Both of these are somewhat simplistic, and can fall down if you have files with "funny" names, such as embedded spaces. If that's a possibility for you, you can use some more sophisticated (albeit less obvious) techniques to get around it. Several are mentioned in the links @tripleee has posted. An incremental improvement I like to use, to avoid the improper word splitting that a for ... in loop can do, is to use while and read instead:
ls | sed -n '/_R1$/s///p' | while read base
do
    f1=${base}_R1
    f2=${base}_R2
    script2.py "$f1" "$f2"
done
This still isn't perfect, and will fall down if you happen to have a file with a newline in its name (although personally, I believe that if you have a file with a newline in its name, you deserve whatever miseries befall you :-) ).
Again, if you want something perfect, see the links posted elsewhere.
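One more variant worth sketching: you can avoid parsing ls output entirely by letting the shell glob the _R1 files and deriving each partner name with parameter expansion. This is an illustrative sketch (script2.py stands in for your real pair-processing script); it copes with spaces in names, though still not with the pathological newline case:

for f1 in ./*_R1; do
    f2=${f1%_R1}_R2            # swap the _R1 suffix for _R2
    [ -e "$f2" ] || continue   # skip files that have no partner
    script2.py "$f1" "$f2"
done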

Related

Using sed/awk to print ONLY words that contains matched pattern - Words starting with /pattern/ or Ending with /pattern/

I have the following output:
junos-vmx-x86-64-21.1R1.11.qcow2 metadata-usb-fpc0.img metadata-usb-fpc10.img
metadata-usb-fpc11.img metadata-usb-fpc1.img metadata-usb-fpc2.img metadata-usb-fpc3.img
metadata-usb-fpc4.img metadata-usb-fpc5.img metadata-usb-fpc6.img metadata-usb-fpc7.img
metadata-usb-fpc8.img metadata-usb-fpc9.img metadata-usb-re0.img metadata-usb-re1.img
metadata-usb-re.img metadata-usb-service-pic-10g.img metadata-usb-service-pic-2g.img
metadata-usb-service-pic-4g.img vFPC-20210211.img vmxhdd.img
The output came from the following script:
images_fld=$(for i in $(ls "$DIRNAME_IMG"); do echo ${i%%/}; done)
The previous output is saved in a variable called images_fld.
Problem:
I need to extract the values of junos-vmx-x86-64-21.1R1.11.qcow2, vFPC-20210211.img, and vmxhdd.img. (By "values" I mean the entire word.)
The problem is that the directory containing all the files is constantly being updated and new files are added all the time, which means that I cannot rely on a line number ($N) to extract the names of those files.
I am trying to use awk or sed to achieve this.
Is there a way to:
match all files ending with .qcow2 and then extract the full file name? Like: junos-vmx-x86-64-21.1R1.11.qcow2
match all files starting with vFPC and then extract the full file name? Like: vFPC-20210211.img
match all files starting with vmxhdd and then extract the full file name? Like: vmxhdd.img
I am using those patterns because the file names tend to change with each version I am deploying, but the patterns .qcow2, vFPC, and vmxhdd always remain the same regardless. For that reason, I need to extract the entire string by matching only a partial pattern. Is it possible? Thanks!
Note: I can not rely on files ending with .img as there are quite a lot of them, so it would make it more difficult to extract the specific file names :/
This might work for you (GNU sed):
sed -nE '/\<\S+\.qcow2\>|\<(vFPC|vmxhdd)\S+\>/{s//\n&\n/;s/[^\n]*\n//;P;D}' file
If a string matches the required criteria, delimit it by newlines.
Delete up to and including the first newline.
Print/delete the first line and repeat.
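As a quick check of what the command does, here is a sketch run against a single line containing a few of the names from the output above (the input is assumed to be space-separated words on one line, as in the question):

$ echo 'junos-vmx-x86-64-21.1R1.11.qcow2 metadata-usb-fpc0.img vFPC-20210211.img vmxhdd.img' |
  sed -nE '/\<\S+\.qcow2\>|\<(vFPC|vmxhdd)\S+\>/{s//\n&\n/;s/[^\n]*\n//;P;D}'
junos-vmx-x86-64-21.1R1.11.qcow2
vFPC-20210211.img
vmxhdd.img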
Thanks to KamilCuk I was able to solve the problem. Thank you! For anyone who may need this in the future: instead of using sed or awk, the solution was to split the words onto separate lines with tr and filter them with grep.
echo $images_fld | tr ' ' '\n' | grep '\.qcow2$\|^vFPC\|^vmxhdd'
Basically, the problem that I was having was only to extract the names of the files ending with .qcow2 or starting with vFPC or vmxhdd.
Thank you KamilCuk
Another solution, given by potong, is by using
echo $images_fld | sed -nE '/\<\S+\.qcow2\>|\<(vFPC|vmxhdd)\S+\>/{s//\n&\n/;s/[^\n]*\n//;P;D}'
which gives the same output as KamilCuk's! Thanks both

Using sed, prepend line only once, if there's a match later in file content

I'd like to add a line on top of my output if my input file has a specific word.
However, if I just look for a specific string, then as I understand it, it's too late: the first line is already in the output and I can't prepend to it anymore.
Here's an example of input:
one
two
two
three
If I can find a line with, say, the word two, I'd like to add a new line before the first one, with for example FOUND. I want that line prepended only once, even if there are several matches.
So an input file without any two would remain unchanged, and the example file above would become:
FOUND
one
two
two
three
I know how to prepend with i\, but can't get the context right. From what I understand, it would be something like:
1{
/two/{ # This will search for "two" in the first line; how do I look for it in the whole file?
1i\
FOUND
}
}
EDIT:
I know how to do it using other languages/methods; that's not my question.
Sed has advanced features to work on several lines at once, append/prepend lines and is not limited to substitution. I have a sed file already filled with expressions to modify a python source file, which is why I'd prefer to avoid using something else. I want to be able to add an import at the beginning of a file if a certain class is used.
A Perl solution:
perl -i.bak -0777 -pE 'say "FOUND" if /two/;' in_file
The Perl one-liner uses these command line flags:
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
-E : Tells Perl to look for code in-line, instead of in a file. Also enables all optional features. Here, enables say.
-0777 : Slurp files whole.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
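To see the effect without touching any file, you can drop the -i.bak flag so the result goes to stdout; with the sample input from the question this prints:

$ perl -0777 -pE 'say "FOUND" if /two/;' in_file
FOUND
one
two
two
three

And since the EDIT asks to stay within sed: GNU sed can do the same slurp trick with -z, which reads NUL-separated records, so an ordinary text file arrives as one single record and the address can see a match anywhere in the file. A minimal sketch, assuming GNU sed and an input free of NUL bytes:

sed -z '/two/s/^/FOUND\n/' in_file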
sed is for doing s/old/new on individual strings; that's not what you're trying to do, so you shouldn't bother trying to use sed. There are lots of ways to do this; this one is very efficient, robust, and portable to all Unix systems:
$ grep -Fq 'two' file && echo "FOUND"; cat file
FOUND
one
two
two
three
To operate on a stream instead of (or in addition to) a file and without needing to read the whole input into memory:
awk 'f{print; next} {buf[NR]=$0} /two/{print "FOUND"; for (i=1;i<=NR;i++) print buf[i]; f=1}'
e.g.:
$ cat file | awk 'f{print; next} {buf[NR]=$0} /two/{print "FOUND"; for (i=1;i<=NR;i++) print buf[i]; f=1}'
FOUND
one
two
two
three
That awk script will also work using any awk in any shell on every Unix box.
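For readability, here is the same awk logic spread out with comments (functionally identical to the one-liner above):

awk '
    f     { print; next }    # after the first match: pass lines straight through
          { buf[NR] = $0 }   # before the first match: buffer every line
    /two/ {                  # first line containing "two":
        print "FOUND"        #   emit the header,
        for (i = 1; i <= NR; i++) print buf[i]   # flush the buffered lines,
        f = 1                #   and switch to pass-through mode
    }
' file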

Is there a way to search for a string in only those files where a previous search found a different string?

I can't manage to find a way to make a conditional edit, e.g., changing a string 'ABC' to the text 'TEXT1 TEXT2' in only those files that already match a search criterion.
Example: in all the files that contain the string '-FI-', replace the string 'ABC' with the string 'TEXT1 TEXT2'.
Is there a way/feature to do this, please? I have VSCode 1.37.1 installed on Windows 10. I would prefer something that runs in VSCode, but in the worst case some Linux tooling could help too.
I tried, for example, to work out how to make a search inside a search and then edit, but I don't have enough knowledge to do it with regex.
Thank you.
Based on the clarifying comments, I interpret the question to be:
How can I replace all instances of "ABC" with "TEXT1 TEXT2" within files that also contain the string "-FI-", starting at a given directory and recursively considering all files beneath it?
I would solve this using Cygwin shell commands rather than in VSCode.
First, let's make a file that contains the names of all the files that contain the string "-FI-". At a bash shell, use cd to go to the directory of interest, and run:
$ grep -l -r -- '-FI-' . > files.txt
Breaking this down:
grep searches for text within files.
The -l switch prints the file names rather than matching lines.
The -r switch searches recursively in subdirectories.
The -- tells grep that this is the last command-line option, so subsequent words are treated as arguments (i.e., text to search for). This is necessary because our search text begins with - and would otherwise be treated as an option.
The -FI- is the text to search for (case sensitive; use the -i option for a case-insensitive search).
The . is the place for grep to search, and means "current directory" (plus all files and subdirectories, recursively, due to -r).
The > files.txt part says to write the results to files.txt.
Before going on, open files.txt in an editor or just cat it to the terminal to verify that it looks reasonable to you:
$ cat files.txt
Now we need to do search and replace in this list of files. This isn't so easy to do with just stock shell commands, so I've written my own script that I use to do it:
https://raw.githubusercontent.com/smcpeak/scripts/master/replace-across-files
Save that as a file called "replace-across-files". I normally put such things into $HOME/scripts, also known as ~/scripts, so I will assume you've done the same (make that directory first if necessary).
Now, go back to the directory that has files.txt and run:
perl ~/scripts/replace-across-files 'ABC' 'TEXT1 TEXT2' $(cat files.txt)
This will interactively prompt you for each occurrence. Use y (or just Enter) to accept each one individually, Y to accept all in the current file, and ! to make all replacements.
If you get perl: command not found, then you need to install Cygwin perl first.
One possible gotcha: if any of the file names contain a space, then this won't work because $(cat files.txt) splits files.txt at whitespace boundaries before putting the contents onto the command line. One way to deal with this is to use xargs instead:
$ cat files.txt | xargs -d '\n' perl ~/scripts/replace-across-files -f 'ABC' 'TEXT1 TEXT2'
Breaking this down:
cat files.txt | feeds the contents of files.txt to the next command, xargs, as its input.
xargs adds its input onto the given command line and runs it.
-d '\n' tells xargs to divide its input at newline boundaries, not any whitespace.
-f tells replace-across-files to do all replacements non-interactively. This is necessary because, due to the way xargs works, prompting for each replacement would not work.
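For completeness, if you do not need the interactive confirmation, the whole job can be done with stock tools by piping the grep -l list straight into sed. A sketch assuming GNU sed (for -i) and GNU xargs (for -d), and that the replacement should simply apply everywhere:

$ grep -l -r -- '-FI-' . | xargs -d '\n' sed -i 's/ABC/TEXT1 TEXT2/g'

This makes every replacement unconditionally and keeps no backups, so review the file list first (or use sed -i.bak) before running it on anything important.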
This is pretty simple using an extension I wrote that can use the results from one search to limit the files searched in a second, third, etc. search. Using the Find and Transform extension, make this keybinding (in your keybindings.json):
{
  "key": "alt+m",                  // whatever keybinding you want
  "command": "runInSearchPanel",
  "args": {
    "find": ["-FI-", "ABC"],
    "replace": ["", "TEXT1 TEXT2"],
    "filesToInclude": ["", "${resultsFiles}"],
    "triggerReplaceAll": true
    // "delay": 250                // a pause between searches, in milliseconds
  }
}
"find": ["-F1-", "ABC"], runs 2 finds, first for -F1- and then for ABC.
"replace": ["", "TEXT1 TEXT2"], two replaces, but the first does nothing (it does NOT replace -F1- with an empty string).
"filesToInclude": ["", "${resultsFiles}"], the first "" clears the files to include input, and the second will populate the files to include search input with only the files that were found with the previous search.
If you wanted to start the first search in a particular directory, you could put something into that first "", like ${relativeFileDirname} for example or a few other variables or any string value representing a folder or file path.
"delay": 250 for searching larger groups of files, there must be a delay between each search if you are doing multiple searches like in the current case. This is to allow vscode to complete the previous search and populate the search results. There is a default delay of 2000 or 2 seconds but you can try shorter or longer delay values for your situation.

What is the purpose of filtering a log file using this Perl one-liner before displaying it in the terminal?

I came across this script which was not written by me, but because of an issue I need to know what it does.
What is the purpose of filtering the log file using this Perl one-liner?
cat log.txt | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g'
The log.txt file contains the output of a series of commands. I do not understand what is being filtered here, and why it might be useful.
It looks like the code should remove ANSI escape codes from the input, i.e. codes to set colors, the window title, and so on. Since some of these codes might cause harm, it might be a security measure in case some kind of attack managed to include such escape codes in the log file. Since a log file does not usually contain any such escape codes, this would also explain why you don't see any effect of this statement on normal log files.
For more information about this kind of attack see A Blast From the Past: Executing Code in Terminal Emulators via Escape Sequences.
BTW, while your question looks bad at first view, it actually is not. But you might try to improve your questions by at least formatting them properly. Otherwise you risk that the question gets down-voted fast.
First, the command line suffers from a useless use of cat. perl is fully capable of reading from a file name on the command line.
So,
$ perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g' log.txt
would have done the same thing, but avoided spawning an extra process.
Now, -e is followed by a script for perl to execute. In this case, we have a single global substitution.
\e in a Perl regex pattern corresponds to the escape character, 0x1B.
The pattern following \e looks like the author wants to match ANSI escape sequences.
The -p option essentially wraps the script specified with -e in a while loop, so the s/// is executed for each line of the input.
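To see the substitution in action, feed it a line containing a color escape; \e[31m switches the terminal to red and \e[0m resets it, and each sequence gets replaced by a single space:

$ printf 'plain \e[31mred\e[0m plain\n' | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g'
plain  red  plain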
The pattern probably does the job for this simple purpose, but one might benefit from using Regexp::Common::ANSIescape as in:
$ perl -MRegexp::Common::ANSIescape=ANSIescape,no_defaults -pe 's/$RE{ANSIescape}/ /g' log.txt
Of course, if one uses a script like this very often, one might want to either use an alias, or even write a very short script that does this, as in:
#!/usr/bin/env perl
use strict;
use warnings;
use Regexp::Common 'ANSIescape', 'no_defaults';
while (<>) {
    s/$RE{ANSIescape}/ /g;
    print;
}

I want to print a text file in columns

I have a text file which looks something like this:
jdkjf
kjsdh
jksfs
lksfj
gkfdj
gdfjg
lkjsd
hsfda
gadfl
dfgad
[very many lines, that is]
but would rather like it to look like
jdkjf kjsdh
jksfs lksfj
gkfdj gdfjg
lkjsd hsfda
gadfl dfgad
[and so on]
so I can print the text file on a smaller number of pages.
Of course, this is not a difficult problem, but I'm wondering if there is some excellent tool out there for solving problems like these.
EDIT: I'm not looking for a way to remove every other newline from a text file, but rather a tool which interprets text as "pictures" and then lays these out on the page nicely (by writing the appropriate whitespace symbols).
You can use this Python code:
# Read lines from test.txt and print them in space-separated groups.
tables = input("Enter number of tables ")   # group size, e.g. 2 for two columns
matrix = []
file = open("test.txt")
for line in file:
    matrix.append(line.replace("\n", ""))
    if len(matrix) == int(tables):
        print(" ".join(matrix))
        matrix = []
if matrix:   # flush a leftover partial group (odd number of lines)
    print(" ".join(matrix))
file.close()
(Since you don't name your operating system, I'll simply assume Linux, Mac OS X or some other Unix...)
Your example looks like it can also be described by the expression "joining 2 lines together".
This can be achieved in a shell (with the help of xargs and awk) -- but only for an input file that is structured like your example (the result always puts 2 words on a line, irrespective of how many words each one contains):
cat file.txt | xargs -n 2 | awk '{ print $1" "$2 }'
This can also be achieved with awk alone (this time it really joins 2 full lines, irrespective of how many words each one contains):
awk '{printf $0 " "; getline; print $0}' file.txt
Or use sed --
sed 'N;s#\n# #' < file.txt
Also, xargs could do it:
xargs -L 2 < file.txt
I'm sure other people could come up with dozens of other, quite different methods and commandline combinations...
Caveats: You'll have to test files with an odd number of lines explicitly; the last input line may not be processed correctly in the case of an odd number of lines.
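As for an "excellent tool out there": the standard paste and pr utilities were built for exactly this. Each - operand to paste stands for standard input, so two dashes consume two input lines per output row; pr lays the file out in true page columns, which is closer to the "pictures" layout the EDIT describes. A sketch (both are POSIX tools; exact pr spacing varies by implementation):

$ paste -d ' ' - - < file.txt    # join every 2 lines with a space
jdkjf kjsdh
jksfs lksfj
gkfdj gdfjg
lkjsd hsfda
gadfl dfgad
$ pr -2 -t file.txt              # 2 page columns, no headers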