This regular expression will match exactly one / and one . in a line. But why is it matching? Can anyone clearly explain each character's role in this regular expression?
if ($fp =~ m{^[^/]*/[^/]*$} and $fp =~ m{^[^.]*.[^.]$})
{
print $fp;
}
if($fp =~ m{^[^/]*/[^/]*$} and $fp =~ m{^[^.]*.[^.]$}) {
            ^\  /^^\  /^^
            | |  || |  ||
------------- |  || |  ||
begin line    |  || |  ||
---------------  || |  ||
any char but /   || |  ||
------------------| |  ||
zero or more      | |  ||
------------------- |  ||
one /               |  ||
---------------------  ||
any char but /         ||
------------------------|
zero or more            |
-------------------------
end of line
So it searches for:
the beginning of the line (^),
followed by zero or more occurrences (*) of any char but / ([^/]),
followed by a /,
followed by zero or more occurrences (*) of any char but / ([^/]),
followed by the end of the line ($).
The "." search is similar, and the 'if' triggers if both are true.
Note that [...] matches a single char from a set. For instance, [abc] matches either an 'a', a 'b', or a 'c'. If the first char is '^', the test is reversed, so [^/] is any char but '/'.
While the previous answers are correct in explaining the regex, they fail to point out that the 2nd regex is actually broken. As written, it will match:
start of line
followed by zero-or-more non-. (dot) characters
followed by ANY character, except \n
followed by ONE non-. (dot) character
end of line
Proof:
$ echo "This should NOT match" | perl -ne 'print if m{^[^.]*.[^.]$}'
This should NOT match <--- INCORRECT MATCH
$ echo "This should. match" | perl -ne 'print if m{^[^.]*.[^.]$}'
<--- INCORRECT MIS-MATCH
$ echo "This should match.!" | perl -ne 'print if m{^[^.]*.[^.]$}'
This should match.! <-- CORRECT (by luck)
$ echo "This should match." | perl -ne 'print if m{^[^.]*.[^.]$}'
This should match. <-- CORRECT
To make it correct, two changes are needed:
the . needs to be escaped (\.)
the 2nd character class needs a *
$ echo "This should NOT match" | perl -ne 'print if m{^[^.]*\.[^.]*$}'
<-- CORRECT
$ echo "This should. match" | perl -ne 'print if m{^[^.]*\.[^.]*$}'
This should. match <-- CORRECT
$ echo "This should match.!" | perl -ne 'print if m{^[^.]*\.[^.]*$}'
This should match.! <-- CORRECT
$ echo "This should match." | perl -ne 'print if m{^[^.]*\.[^.]*$}'
This should match. <-- CORRECT
The first expression: m is the match operator, { opens the expression, ^ is the start of the line, [^/]* is any character other than '/' zero or more times, / is a literal '/', then [^/]* again, $ is the end of the line, and } closes the expression.
I have a file with the following
firsttext=cat secondtext=dog thirdtext=mouse
and I want it to return this string:
"firsttext=cat" "secondtext=dog" "thirdtext=mouse"
I have tried this one-liner, but it gives me an error.
cat oneline | perl -ne 'print \"$_ \" '
Can't find string terminator '"' anywhere before EOF at -e line 1.
I don't understand the error. Why can't it just add the quotation marks?
Also, if I have a variable in this string, I want it to be interpolated like:
firsttext=${animal} secondtext=${othervar} thirdtext=mouse
Which should output
"firsttext=cat" "secondtext=dog" "thirdtext=mouse"
perl -lne '@f = map qq/"$_"/, split; print "@f";' oneline
What you want is this:
cat oneline | perl -ne 'print join " ", map { qq["$_"] } split'
The -n option only loops over the input line by line; it won't split each line on whitespace unless other options are set or you call split yourself.
I have a string like this $data = .|abc|bcd|cde|.
I need the string like this : abc|bcd|cde.
So I do :
$data =~ s/\|$//; # trim the last '|' out...
$data =~ s/^\.| +//gm ; #trim '.' in the begining
$data =~ s/^\|//; # trim '|' in the begining
But the problem I am facing is that the script takes too long to execute. Is there any way to complete the whole operation with a single command?
(Also tried chop($data) but that takes out only the last |)
Please suggest...
$data =~ s/(^[.|]*)|([.|]*$)//g;
That said, I doubt that this will speed up your script significantly.
Another way: $data =~ s/^\.\|(.*)\|/$1/
But as Rene said, your speed bottleneck is probably somewhere else in your script.
I have a little problem. I want to split a line at every pipe character found using the split operator. Like in this example.
echo "000001d17757274585d28f3e405e75ed|||||||||||1||||||||||||||||||||||||" | \
perl -ane '$data = $_ ; chop $data ; @d = split(/\|/ , $data) ; print $#d+1,"\n" ;'
I would expect an output of 36,
since awk, splitting on the delimiter |, returns 36. Instead I get 12, as if the split stopped at the 1 character in the line.
echo "000001d17757274585d28f3e405e75ed|||||||||||1|||||||||||||||||||||||||||||||||||||||" | \
awk -F"|" '{print NF}'
Any idea? I have tried many ways of quoting the |, but without success.
Many thanks in advance.
According to split:
By default, empty leading fields are preserved, and empty trailing ones are deleted.
You need to specify a negative limit to the split to get the trailing ones:
split(/\|/, $data, -1)
Let me try to explain this as clearly as I can...
I have a script that at some point does this:
grep -vf ignore.txt input.txt
This ignore.txt has a bunch of lines with things I want my grep to ignore, hence the -v (meaning I don't want to see them in the output of grep).
Now, what I want to do is I want to be able to know how many lines of input.txt have been ignored by each line of ignore.txt.
For example, if ignore.txt had these lines:
line1
line2
line3
I would like to know how many lines of input.txt were ignored by ignoring line1, how many by ignoring line2, and so on.
Any ideas on how I can do this?
I hope that made sense... Thanks!
Note that the sum of the ignored lines plus the shown lines may NOT add up to the total number of lines... "line1 and line2 are here" will be counted twice.
#!/usr/bin/perl
use warnings;
use strict;
local @ARGV = 'ignore.txt';
chomp(my @pats = <>);
foreach my $pat (@pats) {
    print "$pat: ", qx/grep -c $pat input.txt/;
}
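The double counting mentioned in the note can be reproduced with a tiny data set. This is only a sketch: the file contents are invented, the files live in a temporary directory, and the loop is a plain-shell equivalent of the per-pattern grep -c in the script above.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'line1 here' 'line2 here' 'line1 and line2 are here' 'other' > "$tmp/input.txt"
printf '%s\n' 'line1' 'line2' 'line3' > "$tmp/ignore.txt"

# One grep -c per pattern: counts the input lines each pattern would ignore.
per_pattern=$(while IFS= read -r pat; do
    printf '%s: %s\n' "$pat" "$(grep -c -- "$pat" "$tmp/input.txt")"
done < "$tmp/ignore.txt")
printf '%s\n' "$per_pattern"
rm -rf "$tmp"
```

With this data, line1 and line2 each report 2 and line3 reports 0, so the per-pattern counts add up to 4 even though only 3 of the 4 input lines are actually ignored.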
According to unix.stackexchange
grep -o pattern file | wc -l
counts the total number of occurrences of a given pattern in the file. Given this, and given that you already use a script, one solution is to run several grep instances, one for each pattern you want to ignore, and count the matches of each.
However, I'd try to build a more comfortable solution involving a scripting language like e.g. python.
This script will count the matched lines by hash lookup and save the lines to be printed in @result, where you may process them as you will. To emulate grep, just print them.
I made the script so it can print out an example. To use with the files, uncomment the code in the script, and comment the ones marked # example line.
Code:
use strict;
use warnings;
use v5.10;
use Data::Dumper; # example line
# Example data.
my @ignore = ('line1' .. 'line9'); # example line
my @input = ('line2' .. 'line9', 'fo' .. 'fx', 'line2', 'line3'); # example line
#my $ignore = shift; # first argument is ignore.txt
#open my $fh, '<', $ignore or die $!;
#chomp(my @ignore = <$fh>);
#close $fh;
my @result;
my %lookup = map { $_ => 0 } @ignore;
my $rx = join '|', map quotemeta, @ignore;
#while (<>) { # This processes the remaining arguments, input.txt etc
for (@input) { # example line
    chomp; # Required to avoid bugs due to missing newline at eof
    if (/($rx)/) {
        $lookup{$1}++;
    } else {
        push @result, $_;
    }
}
#say for @result; # This will emulate grep
print Dumper \%lookup; # example line
Output:
$VAR1 = {
'line6' => 1,
'line1' => 0,
'line5' => 1,
'line2' => 2,
'line9' => 1,
'line3' => 2,
'line8' => 1,
'line4' => 1,
'line7' => 1
};
while IFS= read -r pattern ; do
printf '%s:' "$pattern"
grep -c -v "$pattern" input.txt
done < ignore.txt
grep with -c counts matching lines, but with -v added it counts non-matching lines. So, simply loop over the patterns and count once for each pattern.
This will print the number of ignored matches along with the matching pattern:
grep -of ignore.txt input.txt | sort | uniq -c
For example:
$ perl -le 'print "Coroline" . ++$s for 1 .. 21' > input.txt
$ perl -le 'print "line2\nline14"' > ignore.txt
$ grep -of ignore.txt input.txt | sort | uniq -c
1 line14
3 line2
I.e., A line matching "line14" was ignored once. A line matching "line2" was ignored 3 times.
If you just wanted to count the total ignored lines this would work:
grep -cof ignore.txt input.txt
Update: modified the example above to use strings so that the output is a little clearer.
This might work for you:
# seq 1 15 | sed '/^1/!d' | sed -n '$='
7
Explanation:
Delete all lines except those that match. Pipe these matching (ignored) lines to another sed command. Delete all these lines but show the line number only of the last line. So in this example 1 thru 15, lines 1,10 thru 15 are ignored - a total of 7 lines.
EDIT:
Sorry misread the question (still a little confused!):
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt| uniq -c,' ignore.txt | sh
This shows the number of matches for each pattern in ignore.txt
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,' ignore.txt | sh
This shows the number of non-matches for each pattern in ignore.txt
If using GNU sed, these should work too:
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt | uniq -c,;e' ignore.txt
or
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,;e' ignore.txt
N.B. Your success with patterns may vary i.e. check for meta characters beforehand.
On reflection I thought this can be improved to:
sed 's,.*,/&/i\\matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
or
sed 's,.*,/&/!i\\non-matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
But NO, on large files this is actually slower.
Are both ignore.txt and input.txt sorted?
If so, you can use the comm command!
$ comm -12 ignore.txt input.txt
How many lines are ignored?
$ comm -12 ignore.txt input.txt | wc -l
Or, if you want to do more processing, combine comm with awk:
$ comm ignore.txt input.txt | awk '
END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}
{
if ($0 !~ /^\t/) {uniqtotal+=1}
if ($0 ~ /^\t[^\t]/) {commtotal+=1}
if ($0 ~ /^\t\t/) {igtotal+=1}
}'
Here I'm taking advantage of the tabs that the comm command places in its output:
* If there are no tabs, the line is in ignore.txt only.
* If there is a single tab, it is in input.txt only
* If there are two tabs, the line is in both files.
By the way, not all the lines in ignore.txt are ignored. If the line isn't also in input.txt, the line can't really be said to be ignored.
With Dennis Williamson's Suggestion
comm ignore.txt input.txt | awk '
!/^\t/ {uniqtotal++}
/^\t[^\t]/ {commtotal++}
/^\t\t/ {igtotal++}
END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}'