I just had to produce a long XML sequence for testing purposes: lots of elements
like <hour>2009.10.30.00</hour>.
This made me drop into a Linux shell and just run:
for day in $(seq -w 1 30); do
    for hour in $(seq -w 0 23); do
        echo "<hour>2009.10.$day.$hour</hour>"
    done
done > out
How would I do the same in PowerShell on Windows?
Pretty similar...
$(foreach ($day in 1..30) {
    foreach ($hour in 0..23) {
        "<hour>2009.10.$day.$hour</hour>"
    }
}) > tmp.txt
Added file redirection. If you are familiar with bash, the syntax should be pretty intuitive.
If I were scripting I would probably go with orsogufo's approach for readability. But if I were typing this at the console interactively, I would use a pipeline approach: less typing, and it fits on a single line, e.g.:
1..30 | %{$day=$_;0..23} | %{"<hour>2009.10.$day.$_</hour>"} > tmp.txt
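One wrinkle: seq -w zero-pads the numbers, while PowerShell's 1..30 does not, so this produces 2009.10.1.0 rather than 2009.10.01.00. If the padded form matters, a possible variant (just a sketch, using PowerShell's -f format operator to pad to two digits) is:
1..30 | %{$day = '{0:D2}' -f $_; 0..23} | %{"<hour>2009.10.$day.$('{0:D2}' -f $_)</hour>"} > tmp.txt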
Related
In bash if I wish to truncate a bunch of files in a directory, I would do the following:
for i in *
do
cat /dev/null > $i
done
In fish, I tried:
for I in *
cat /dev/null > $I
end
but that gives me the error:
fish: Invalid redirection target: $I
So does anyone know how to achieve this?
Thanks.
Works for me. Note that the only way you'll get that error is if variable I is not set. I noticed you used a lowercase letter for your bash example and uppercase for the fish example. Did you perhaps mix the case? For example, this will cause the error you saw:
for i in *
true > $I
end
P.S.: in a POSIX shell it's more efficient to do : > $i. Since fish doesn't support :, it's more efficient to do true > $i, which avoids spawning an external command and opening /dev/null.
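So with the case fixed, and using true per the note above, the working fish loop is:
for i in *
    true > $i
end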
The following code is a Perl script that greps lines containing 'Stage ' from a host log, then matches each line against a regex and increments a count by 1 on every match:
$command = 'grep \'Stage \' ' . $hostlog;
@stage_info = qx($command);
foreach (@stage_info) {
    if ( /Stage\s(\d+)\s(.*)/ ) {
        $stage_number = $stage_number + 1;
    }
}
So how can I do this in a Linux shell? Based on my tests, we cannot simply loop over the grep output line by line, since the lines contain spaces.
That is a horrible piece of Perl code you've got there. Here's why:
1. It looks like you are not using use strict; use warnings;. That is a huge mistake; leaving them out will not prevent errors, it will just hide them.
2. Using qx() to grep lines from a file is completely redundant, as this is what Perl does best itself. "Shelling out" a process like that most often slows your program down.
3. Use some whitespace to make your code readable. This is hard to read, and looks more complicated than it is.
4. You capture strings by using parentheses in your regex, but you never use these strings.
5. Re: $stage_number = $stage_number + 1, see point 3. Also, this can be written $stage_number++. Using the ++ operator will make your code clearer, will prevent "uninitialized" warnings, and save you some typing.
Here is what your code should look like:
use strict;
use warnings;

my $hostlog      = $ARGV[0];   # path to the host log, as in the original script
my $stage_number = 0;

open my $fh, "<", $hostlog or die "Cannot open $hostlog for reading: $!";
while (<$fh>) {
    if (/Stage\s\d+/) {
        $stage_number++;
    }
}
You're not doing anything with the internal captures, so why bother? You could do everything with a grep:
$ stage_number=$(grep -Ec 'Stage[[:space:]][0-9]+[[:space:]]' "$hostlog")
This uses extended regular expressions (-E), with grep -c doing the counting. Note that \d and \s are Perl regex syntax, not POSIX: GNU grep only honours them with -P, and on Solaris even the egrep command will not accept them, hence the bracket expressions above.
If there's something more you have to do, you've got to explain it in your question.
If I understand the issue correctly, you should be able to do this just fine in the shell:
while read -r; do
    if echo "${REPLY}" | grep -q "Stage "; then
        :   # Do what you need to do
    fi
done < test.log
Note that if your grep command supports the -P option, you can use the Perl regular expression as-is for the test, i.e. grep -qP 'Stage\s\d+\s'.
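For instance, if the goal is the same count the original Perl script computes, the loop body can just increment a counter (test.log standing in for the host log, as above):
stage_number=0
while read -r; do
    if echo "${REPLY}" | grep -q "Stage "; then
        stage_number=$((stage_number + 1))
    fi
done < test.log
echo "${stage_number}"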
This is almost it; bash glob patterns have no quantifier for multiple digits.
#!/bin/bash
command=( grep 'Stage ' "$hostlog" )
while read -r line
do
    [ "$line" != "${line/Stage [0-9]/}" ] && (( ++stage_number ))
done < <( "${command[@]}" )
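Worth noting: although glob patterns lack a repetition operator, bash's [[ =~ ]] operator (bash 3.0 and later) supports extended regular expressions, including +, so a sketch that avoids the substitution trick would be:
re='Stage [0-9]+'
while read -r line
do
    [[ $line =~ $re ]] && (( ++stage_number ))
done < <( grep 'Stage ' "$hostlog" )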
On the other hand, taking into account the function of the Perl script rather than the operations it performs, the whole thing could be rewritten as
(( stage_number += $(grep -c 'Stage [0-9]\+[[:space:]]' "$hostlog") ))
or this
stage_number=$(grep -c 'Stage [0-9]\+[[:space:]]' "$hostlog")
if, in the original Perl, stage_number is uninitialised or is initialised to 0.
Background
This is an optimization problem. Oracle Forms XML files have elements such as:
<Trigger TriggerName="name" TriggerText="SELECT * FROM DUAL" ... />
Where the TriggerText is arbitrary SQL code. Each SQL statement has been extracted into uniquely named files such as:
sql/module=DIAL_ACCESS+trigger=KEY-LISTVAL+filename=d_access.fmb.sql
sql/module=REP_PAT_SEEN+trigger=KEY-LISTVAL+filename=rep_pat_seen.fmb.sql
I wrote a script to generate a list of exact duplicates using a brute force approach.
Problem
There are 37,497 files to compare against each other; it takes 8 minutes to compare one file against all the others. Logically, if A = B and A = C, then there is no need to check if B = C. So the problem is: how do you eliminate the redundant comparisons?
The script will complete in approximately 208 days.
Script Source Code
The comparison script is as follows:
#!/bin/bash
echo Loading directory ...
for i in $(find sql/ -type f -name \*.sql); do
    echo "Comparing $i ..."
    for j in $(find sql/ -type f -name \*.sql); do
        if [ "$i" = "$j" ]; then
            continue
        fi
        # Case insensitive compare, ignore spaces
        diff -iEbwBaq "$i" "$j" > /dev/null
        # 0 = no difference (i.e., duplicate code)
        if [ $? = 0 ]; then
            echo "$i :: $j" >> clones.txt
        fi
    done
done
Question
How would you optimize the script so that checking for cloned code is a few orders of magnitude faster?
Idea #1
Move matching files into another directory so that they don't need to be examined twice.
System Constraints
Using a quad-core CPU with an SSD; trying to avoid using cloud services if possible. The system is a Windows-based machine with Cygwin installed; algorithms or solutions in other languages are welcome.
Thank you!
Your solution and sputnick's solution both take O(n^2) time. This can be done in O(n log n) time by sorting the files and using a list merge. It can be sped up further by comparing MD5 hashes (or any other cryptographically strong hash) of the files, instead of the files themselves.
Assuming you're in the sql directory:
md5sum * | sort > ../md5sums
perl -lane 'print if $F[0] eq $lastMd5; $lastMd5 = $F[0]' < ../md5sums
Using the above code will report only exact byte-for-byte duplicates. If you want to consider two non-identical files to be equivalent for the purposes of this comparison (e.g. if you don't care about case), first create a canonicalised copy of each file (e.g. by converting every character to lower case with tr A-Z a-z < infile > outfile).
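For example, run from the parent of the sql directory, a lower-cased copy of every file could be prepared first (canon/ is just a hypothetical scratch directory), producing an md5sums list for the canonical copies that can be fed to the perl one-liner above:
mkdir -p canon
for f in sql/*.sql; do
    tr A-Z a-z < "$f" > "canon/${f##*/}"
done
( cd canon && md5sum * | sort > ../md5sums )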
The best way to do this is to hash each file, with something like SHA-1, and then use a set to spot duplicates. I'm not sure bash can do this, but Python can. Although if you want the best performance, C++ is the way to go.
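A sketch of that idea with nothing but shell and coreutils (this assumes GNU uniq, whose -w 40 compares only the 40-character SHA-1 prefix and whose --all-repeated prints every member of each duplicate group):
find sql -type f -name '*.sql' -exec sha1sum {} + | sort | uniq -w40 --all-repeated=separate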
To optimize comparison of your files:
#!/bin/bash
for i; do
    for j; do
        [[ "$i" != "$j" ]] &&
            if diff -iEbwBaq "$i" "$j" > /dev/null; then
                echo "$i & $j are the same"
            else
                echo "$i & $j are different"
            fi
    done
done
USAGE
./script /dir/*
Is there a way to filter stdout (or stderr) before being redirected to a file?
"redirecting to a pipe" is probably not the best way to put it but I'm looking for the easiest way to achieve something with that effect.
The usage scenario is the following: I'm using gawk --lint=invalid on principle to detect possible errors in my scripts, and I want to filter out spurious warnings. Instead of redirecting errors to a file and grepping them out when examining the file, I would like the filtering to take place before writing to the file.
Example: this script prints every second line to stderr.
echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid 'BEGIN {b = 1;} // {if (b) print; else print > "/dev/stderr"; b = !b;}' 1>/dev/null 2>errors
cat errors | less
gawk: warning: regexp constant `//' looks like a C++ comment, but is not
b
d
gawk: (FILENAME=- FNR=4) warning: no explicit close of file `/dev/stderr' provided
But you can see the spurious gawk warnings (they are not of concern). They could be filtered out, for example, using:
filter-gawk-output.sh
---------------------
grep -Ev 'looks like a|explicit close'
Is there an elegant way of doing that in-line when redirecting to errors file?
Right now when examining error files I always do
cat errors | ./filter-gawk-output.sh | less
What about:
gawk --lint=invalid 'whatever' INPUTFILE 2> GAWK_ERRORS.LOG
This way STDERR will be redirected to the error log.
I am not aware of gawk having a facility to change the output of warnings. So I think this is more a question about shell syntax.
Given
filter_warnings() { grep -v '^gawk:'; }
awkprog='BEGIN {b = 1;} // {if (b) print; else print > "/dev/stderr"; b = !b;}'
where filter_warnings is for filtering out the gawk warnings and, assuming bash as your shell, we can send stderr down the pipe using the |& syntax (which pipes stdout and stderr together):
echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid "$awkprog" |& filter_warnings
If you want the outputs in files, you need to use parentheses:
(echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid "$awkprog" > output.1) |& filter_warnings > output.2
Here output.1 will contain the gawk program's output to stdout, and output.2 its (filtered) output to stderr.
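Alternatively, bash process substitution lets the filtering happen right at the point of redirection, which is closer to what the question asks for. Reusing the filter script and the errors file from the question, a sketch:
echo -ne 'a\nb\nc\nd\n' | gawk --lint=invalid "$awkprog" 1>/dev/null 2> >(./filter-gawk-output.sh > errors)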
I am trying to use the tee command on Solaris to route the output of one command to two different streams, each of which comprises multiple statements. Here is a snippet of what I coded, but it does not work: this version throws errors about an unexpected end of file. If I change the > to | it throws an error, syntax error near unexpected token do.
todaydir=/some/path
baselen=${#todaydir}
grep sometext $todaydir/somefiles*
while read iline
tee
>(
# this is the first block
do ojob=${iline:$baselen+1:8}
echo 'some text here' $ojob
done > firstoutfile
)
>(
# this is the 2nd block
do ojob=${iline:$baselen+1:8}
echo 'ls -l '$todaydir'/'$ojob'*'
done > secondoutfile
)
Suggestions?
The "while" should begin (and end) inside each >( ... ) substitution, not outside. Thus, I believe what you want is:
todaydir=/some/path
baselen=${#todaydir}
grep sometext $todaydir/somefiles* | tee >(
# this is the first block
while read iline
do ojob=${iline:$baselen+1:8}
echo 'some text here' $ojob
done > firstoutfile
) >(
# this is the 2nd block
while read iline
do ojob=${iline:$baselen+1:8}
echo 'ls -l '$todaydir'/'$ojob'*'
done > secondoutfile
)
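One caveat: tee also copies its input to stdout, so the grep output will additionally echo to the terminal. If that is unwanted, append > /dev/null to the end of the tee command, i.e. (with the two while blocks in place of the elided substitutions):
grep sometext $todaydir/somefiles* | tee >( ... ) >( ... ) > /dev/null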
I don't think the tee command will do that. The tee command writes stdin to one or more files as well as spitting it back out to stdout. Plus, I'm not sure the shell can fork off two sub-processes in the command pipeline like you are trying. You'd probably be better off using something like Perl to fork off a couple of sub-processes and write stdin to each.