In bash if I wish to truncate a bunch of files in a directory, I would do the following:
for i in *
do
cat /dev/null > $i
done
In fish, I tried:
for I in *
cat /dev/null > $I
end
but that gives me the error:
fish: Invalid redirection target: $I
So does anyone know how to achieve this?
Thanks.
Works for me. Note that the only way you'll get that error is if variable I is not set. I noticed you used a lowercase letter for your bash example and uppercase for the fish example. Did you perhaps mix the case? For example, this will cause the error you saw:
for i in *
true > $I
end
P.S. In a POSIX shell it's more efficient to do : > $i. Since fish doesn't support :, use true > $i instead; either way you avoid spawning an external command and opening /dev/null.
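For reference, here is the corrected fish loop with a consistent lowercase variable; a minimal sketch that assumes the directory contains only regular files:

for i in *
    true > $i
end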
The following code is a Perl script that greps lines containing 'Stage' from a host log and then matches each line against a regex, incrementing a count on every match:
$command = 'grep \'Stage \' '. $hostlog;
@stage_info = qx($command);
foreach (@stage_info) {
if ( /Stage\s(\d+)\s(.*)/ ) {
$stage_number = $stage_number+1;
}
}
So how do I do this in a Linux shell? Based on my testing, we cannot loop over the output line by line with a simple for loop, since the lines contain spaces.
That is a horrible piece of Perl code you've got there. Here's why:
It looks like you are not using use strict; use warnings;. That is a huge mistake: it will not prevent errors, it will just hide them.
Using qx() to grep lines from a file is a completely redundant thing to do, as this is what Perl does best itself. "Shelling out" a process like that most often slows your program down.
Use some whitespace to make your code readable. This is hard to read, and looks more complicated than it is.
You capture strings by using parentheses in your regex, but you never use these strings.
Re: $stage_number=$stage_number+1, see point 3. And also, this can be written $stage_number++. Using the ++ operator will make your code clearer, will prevent the uninitialized warnings, and save you some typing.
Here is what your code should look like:
use strict;
use warnings;
open my $fh, "<", $hostlog or die "Cannot open $hostlog for reading: $!";
while (<$fh>) {
    if (/Stage\s\d+/) {
        $stage_number++;
    }
}
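One more note: under use strict both variables must be declared before the loop. A minimal preamble, assuming the log path arrives as the first command-line argument, might be:

my $stage_number = 0;
my $hostlog = shift @ARGV or die "usage: $0 hostlog\n";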
You're not doing anything with the internal captures, so why bother? You could do everything with a grep:
$ stage_number=$(grep -E 'Stage[[:space:]][0-9]+[[:space:]]' "$hostlog" | wc -l)
This uses extended regular expressions with POSIX character classes; the Perl-style \s and \d escapes are not portable (GNU grep accepts \s but not \d, and on Solaris even the egrep command may reject them).
If there's something more you have to do, you've got to explain it in your question.
If I understand the issue correctly, you should be able to do this just fine in the shell:
while read -r; do
    if echo "${REPLY}" | grep -q -P 'Stage\s\d+\s'; then
        : # Do what you need to do
    fi
done < test.log
Note that this relies on your grep command supporting the -P option, which lets you use the Perl regular expression as-is.
This is almost it, but bash glob patterns have no quantifier for "one or more digits", so the test below just matches a single digit after "Stage", which is enough to detect the line.
#!/bin/bash
command=( grep 'Stage ' "$hostlog" )
while read -r line
do
    [ "$line" != "${line/Stage [0-9]/}" ] && (( ++stage_number ))
done < <( "${command[@]}" )
On the other hand, taking the function of the Perl script into account rather than the operations it performs, the whole thing could be rewritten as
(( stage_number += $(grep -c 'Stage [0-9][0-9]*[[:space:]]' "$hostlog") ))
or this
stage_number=$(grep -c 'Stage [0-9][0-9]*[[:space:]]' "$hostlog")
if, in the original Perl, stage_number is uninitialised or is initialised to 0.
Background
This is an optimization problem. Oracle Forms XML files have elements such as:
<Trigger TriggerName="name" TriggerText="SELECT * FROM DUAL" ... />
Where the TriggerText is arbitrary SQL code. Each SQL statement has been extracted into uniquely named files such as:
sql/module=DIAL_ACCESS+trigger=KEY-LISTVAL+filename=d_access.fmb.sql
sql/module=REP_PAT_SEEN+trigger=KEY-LISTVAL+filename=rep_pat_seen.fmb.sql
I wrote a script to generate a list of exact duplicates using a brute force approach.
Problem
There are 37,497 files to compare against each other; it takes 8 minutes to compare one file against all the others. Logically, if A = B and A = C, then there is no need to check if B = C. So the problem is: how do you eliminate the redundant comparisons?
The script will complete in approximately 208 days.
Script Source Code
The comparison script is as follows:
#!/bin/bash
echo "Loading directory ..."
for i in $(find sql/ -type f -name \*.sql); do
    echo "Comparing $i ..."
    for j in $(find sql/ -type f -name \*.sql); do
        if [ "$i" = "$j" ]; then
            continue
        fi
        # Case insensitive compare, ignore spaces
        diff -iEbwBaq "$i" "$j" > /dev/null
        # 0 = no difference (i.e., duplicate code)
        if [ $? -eq 0 ]; then
            echo "$i :: $j" >> clones.txt
        fi
    done
done
Question
How would you optimize the script so that checking for cloned code is a few orders of magnitude faster?
Idea #1
Remove the matching files into another directory so that they don't need to be examined twice.
System Constraints
Using a quad-core CPU with an SSD; trying to avoid using cloud services if possible. The system is a Windows-based machine with Cygwin installed -- algorithms or solutions in other languages are welcome.
Thank you!
Your solution, and sputnick's solution, both take O(n^2) time. This can be done in O(n log n) time by sorting the files and using a list merge. It can be sped up further by comparing MD5 digests (or any other cryptographically strong hash) of the files instead of the files themselves.
Assuming you're in the sql directory:
md5sum * | sort > ../md5sums
perl -lane 'print if $F[0] eq $lastMd5; $lastMd5 = $F[0]' < ../md5sums
Using the above code will report only exact byte-for-byte duplicates. If you want to consider two non-identical files to be equivalent for the purposes of this comparison (e.g. if you don't care about case), first create a canonicalised copy of each file (e.g. by converting every character to lower case with tr A-Z a-z < infile > outfile).
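To sketch that idea concretely (hypothetical ../canon directory; tr A-Z a-z lowercases and tr -s squeezes whitespace runs, roughly approximating what diff -iEbw ignores):

mkdir -p ../canon
for f in *.sql; do
    tr 'A-Z' 'a-z' < "$f" | tr -s '[:space:]' ' ' > "../canon/$f"
done
md5sum ../canon/* | sort > ../md5sums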
The best way to do this is to hash each file with something like SHA-1 and then use a set. I'm not sure bash can do this, but Python can. If you want the best performance, though, C++ is the way to go.
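For what it's worth, bash 4+ can hold the set in an associative array. A hedged sketch, assuming GNU coreutils' sha1sum and the sql/ layout from the question:

#!/bin/bash
declare -A seen    # maps content hash -> first file seen with that hash
for f in sql/*.sql; do
    h=$(sha1sum "$f" | cut -d' ' -f1)
    if [[ -n ${seen[$h]} ]]; then
        echo "${seen[$h]} :: $f" >> clones.txt    # duplicate of an earlier file
    else
        seen[$h]=$f
    fi
done

This performs one hash per file instead of one diff per pair, so it is O(n) rather than O(n^2).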
To optimize the comparison of your files:
#!/bin/bash
for i; do
    for j; do
        [[ "$i" != "$j" ]] &&
            if diff -iEbwBaq "$i" "$j" > /dev/null; then
                echo "$i & $j are the same"
            else
                echo "$i & $j are different"
            fi
    done
done
USAGE
./script /dir/*
I am relatively new to Perl and am working on Perl files written by someone else, and I keep encountering the following statement at the beginning of the scripts:
eval '(exit $?0)' && eval 'exec perl -w -S $0 ${1+"$@"}' && eval 'exec perl -w -S $0 $argv:q'
if 0;
What do these two lines do? What is the code checking, and what does the if 0; statement do?
This is a variant of the exec hack. In the days before interpreters could be reliably specified with a #!, this was used to make the shell exec perl. The if 0 on the second line is never read by the shell, which reads the first line only and execs perl, which reads the if 0 and does not re-execute itself.
This is an interesting variant, but I think not quite correct. It seems to be set up to work with either the Bourne shell or with csh variants, using the initial eval to determine which shell is parsing it and then using the appropriate syntax to pass the arguments to perl. The middle clause is sh syntax and the last clause is appropriate for csh. If the second && were || and the initial eval '(exit $?0)' did actually fail in csh, then this would accomplish those goals, but as written I don't think it quite works for csh. Is there a command preceding this that would set $? to some value based on the shell? Even if that were the case and $? were set to a non-zero value, nothing would be exec'ed unless the && were replaced with ||. Something funny is happening.
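For comparison, the minimal sh-only form of the hack (without any csh handling) is the classic two-liner:

#!/bin/sh
eval 'exec perl -wS "$0" ${1+"$@"}'
    if 0;

The ${1+"$@"} idiom expands to "$@" only when at least one argument is present; it works around ancient shells where a bare "$@" with no arguments expanded to a single empty string.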
So, I have a bash script inside of which I'd like to have a conditional which depends on what a perl script returns. The idea behind my code is as follows:
for i in $(ls); do
if $(perl -e "if (\$i =~ /^.*(bleh|blah|bluh)/) {print 'true';}"); then
echo $i;
fi;
done
Currently, this always returns true, and when I tried it with [[]] around the if statement, I got errors. Any ideas anyone?
P.S. I know I can do this with grep, but it's just an example. I'd like to know how to have Bash use Perl output in general.
P.P.S. I know I can do this in two lines, setting the Perl output to a variable and then testing for that variable's value, but I'd rather avoid the extra variable if possible. Seems wasteful.
If you use exit, you can just use an if directly. E.g.
if perl -e "exit 0 if (successful); exit 1"; then
echo $i;
fi;
0 is success, non-zero is failure, and 0 is the default if you don't call exit.
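Applied to the original loop, a minimal sketch (passing the file name as an argument instead of interpolating it into the Perl source) might look like:

for i in *; do
    if perl -e 'exit(shift =~ /(bleh|blah|bluh)/ ? 0 : 1)' "$i"; then
        echo "$i"
    fi
done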
To answer your question, you want perl to exit 1 for failure and exit 0 for success. That being said, you're doing this the wrong way. Really. Also, don't parse the output of ls. You'll cause yourself many headaches.
for file in *; do
    if [[ $file = *bl[eau]h* ]]; then
        echo "$file matches"
    fi
done
for file in *; do
    perl -e "shift =~ /^.*(bleh|blah|bluh)/ || exit 1" "$file" && echo "$file: true"
done
You should never parse the output of ls. You will have, at least, problems with file names containing spaces. Plus, why bother when your shell can glob on its own?
Quoting $file when passing to the perl script avoids problems with spaces in file names (and other special characters). Internally I avoided expanding the bash $file variable so as to not run afoul of quoting problems if the file name contained ", ' or \
Perl always exits with status 0 if the script runs to completion without an explicit exit, so I test for failure inside the script and exit nonzero in that case.
The return value of the previous command is stored in the bash variable $?. You can do something like:
perl script.pl args
if [ $? -eq 0 ]; then
    echo true
else
    echo false
fi
It's a good question. My advice is: keep it simple and go POSIX (avoid Bashisms[1]) where possible.
so ross$ if perl -e 'exit 0'; then echo Good; else echo Bad; fi
Good
so ross$ if perl -e 'exit 1'; then echo Good; else echo Bad; fi
Bad
[1] Sure, the OP's question was tagged bash, but others may want to know the generic POSIX form.
I just had to produce a long XML sequence for some testing purpose, with a lot of elements like <hour>2009.10.30.00</hour>.
This made me drop into a Linux shell and just run
for day in $(seq -w 1 30); do
    for hour in $(seq -w 0 23); do
        echo "<hour>2009.10.$day.$hour</hour>"
    done
done > out
How would I do the same in PowerShell on Windows?
Pretty similar...
$(foreach ($day in 1..30) {
    foreach ($hour in 0..23) {
        "<hour>2009.10.$day.$hour</hour>"
    }
}) > tmp.txt
Added file redirection. If you are familiar with bash the syntax should be pretty intuitive.
If I were scripting I would probably go with orsogufo's approach for readability. But if I were typing this at the console interactively I would use a pipeline approach - less typing and it fits on a single line e.g.:
1..30 | %{$day=$_;0..23} | %{"<hour>2009.10.$day.$_</hour>"} > tmp.txt
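One caveat for both PowerShell versions: seq -w zero-pads its output, while 1..30 does not, so the bash script emits 2009.10.01.00 where the PowerShell one emits 2009.10.1.0. If the padding matters, format the numbers explicitly, e.g.:

1..30 | %{ $day = $_; 0..23 } | %{ '<hour>2009.10.{0:d2}.{1:d2}</hour>' -f $day, $_ } > tmp.txt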