grep -v with newlines not behaving as expected - sh

I'm trying to create a pre-commit hook for git. I have to use /bin/sh for maximum compatibility, so please keep to what will work with sh (not bash etc.).
I'm not a unix developer, so there is probably something pretty fundamental I'm not grasping here, but I can't seem to discover it.
I have a list of the files in a variable. I want to remove those with certain suffixes.
What I thought would work does work here: https://www.online-utility.org/text/grep.jsp
Input Regex: ^.+(\.auto\.sql|\.sln)$
src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/auto/star.LOAD_DimBrandDataLevels_FROM_Mds_mdm_BrandDataConfidence.auto.sql
src/blah/Blah.sln
src/blah/Some Other File.sql
Invert Match (Display Non-Matching Lines)
Correctly returns
src/blah/Some Other File.sql
But when I put it into a sh script it doesn't work (I'm using https://www.jdoodle.com/test-bash-shell-script-online/)
#!/bin/sh
files="src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/auto/star.LOAD_DimBrandDataLevels_FROM_Mds_mdm_BrandDataConfidence.auto.sql
src/blah/Blah.sln
src/blah/Some Other File.sql"
printf "%s" "$files"
numfiles=$( printf '%s' "$files" | grep -c '$' )
printf "\n%s\n" $numfiles
#files=$( printf '%s' "$files" | grep -v "\.auto\.sql") # works but doesn't guarantee end of line
#printf "%s" "$files"
files=$( printf '%s' "$files" | grep -v "^.+(\.auto\.sql|\.sln)$") # doesn't work even though it should match
printf "%s" "$files"
returns
src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/auto/star.LOAD_DimBrandDataLevels_FROM_Mds_mdm_BrandDataConfidence.auto.sql
src/blah/Blah.sln
src/blah/Some Other File.sql
4
src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/auto/star.LOAD_DimBrandDataLevels_FROM_Mds_mdm_BrandDataConfidence.auto.sql
src/blah/Blah.sln
src/blah/Some Other File.sql
Matching with non-end-of-line tokens works, but the end-of-line anchor doesn't.
However, -E works fine and finds only the rows I don't want:
src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/auto/star.LOAD_DimBrandDataLevels_FROM_Mds_mdm_BrandDataConfidence.auto.sql
src/blah/Blah.sln
src/blah/Some Other File.sql
4
src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/auto/star.LOAD_DimBrandDataLevels_FROM_Mds_mdm_BrandDataConfidence.auto.sql
src/blah/Blah.sln
Really not sure what is going on and have exhausted several avenues.
Hopefully someone can shine a bit of light on how to solve this. Thanks

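You need to use -E (--extended-regexp). Without it, grep uses basic regular expressions, in which +, (, ) and | are ordinary literal characters, so the pattern never matches and -v keeps every line: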
$ printf "%s" "$files" | grep -v -E "^.+(\.auto\.sql|\.sln)$"
src/blah/Some Other File.sql
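As a minimal sketch of the difference (the shortened file list and the variable names kept_ere and kept_bre are only illustrative), the same filter can be written either with -E or, staying in basic regular expressions, with one anchored -e pattern per suffix:

```sh
#!/bin/sh
files="src/blah/auto/star.LOAD_DimBrandDataLevels.auto.sql
src/blah/Blah.sln
src/blah/Some Other File.sql"

# Extended regex: +, ( ) and | act as operators.
kept_ere=$(printf '%s\n' "$files" | grep -v -E '^.+(\.auto\.sql|\.sln)$')

# Basic regex only: one anchored pattern per suffix, no alternation needed.
kept_bre=$(printf '%s\n' "$files" | grep -v -e '\.auto\.sql$' -e '\.sln$')

printf '%s\n' "$kept_ere"
printf '%s\n' "$kept_bre"
```

Both variables should end up holding only src/blah/Some Other File.sql. The second form may be handy if you would rather not rely on extended syntax at all.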

Related

Get version of Podspec via command line (bash, zsh) [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d' ' - d = delimiter, followed by a quoted space character; a backslash-escaped space would also have worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to discard everything matched so far, so it is left out of the output
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
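A small sketch of why the ^ anchor in that sed command matters. The sample file below reuses the question's data plus one hypothetical line (sweetpotato) that is not in the original, added only to show the difference:

```sh
#!/bin/sh
# Sample data from the question, plus one hypothetical "sweetpotato" line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
potato: 1234
apple: 5678
sweetpotato: 9999
potato: 5432
EOF

# -n suppresses default output; the p flag prints only lines where the
# substitution happened, i.e. lines that start with "potato:".
anchored=$(sed -n 's/^potato:[[:space:]]*//p' "$sample")

# The unanchored grep + sed pipeline also picks up "sweetpotato:".
unanchored=$(grep 'potato:' "$sample" | sed 's/^.*: //')

printf '%s\n' "$anchored"
rm -f "$sample"
```

Here anchored holds 1234 and 5432, while unanchored also contains 9999.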
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while IFS= read -r line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo "${line##*:\ }";
fi;
done< file.txt
## tells bash to delete the longest match of the pattern "*: " (anything up to and including ": ") from the front of $line.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of the pattern ": *" (": " and everything after it) from the end of $line.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The pattern is written as ":\ " so that the space is unambiguously part of the pattern; inside ${...} bash does not split on the space, so the backslash is optional but harmless.
You can find more like these at the Linux Documentation Project.
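The %% and # / ## expansions are not specific to bash; they are part of POSIX sh as well. A minimal sketch, assuming a single hard-coded input line (the variable names key and value are only illustrative):

```sh
#!/bin/sh
line='potato: 1234'

# Remove the longest suffix matching ": *", leaving the key.
key="${line%%: *}"
# Remove the shortest prefix matching "*: ", leaving the value.
value="${line#*: }"

# Print the value only when the key is exactly "potato".
if [ "$key" = "potato" ]; then
  printf '%s\n' "$value"
fi
```

Because the expansions are inside double quotes, the space in the pattern needs no backslash here.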
Modern Bash has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done < file.txt
grep potato file | grep -o "[0-9].*"

Removing a specific line in bash with an exact string

I'm having trouble in getting sed to remove just the specific line I want. Let's say I have a file that looks like this:
testfile
testfile.txt
testfile2
Currently I'm using this to remove the line I want:
sed -i "/$1/d" file
The issue is that if I give testfile as input, this deletes all three lines, but I want it to remove only the first line. How do I do this?
With grep
grep -x -F -v -- "$1" file
# or
grep -xFv -- "$1" file
-F is for "fixed strings" -- turns off regex engine.
-x is to match entire line.
-v is for "everything but" the matched line(s).
-- to signal the end of options, in case $1 starts with a hyphen.
To save the file
grep -xFv -- "$1" file | sponge file # `moreutils` package
# or
tmp=$(mktemp)
grep -xFv -- "$1" file > "$tmp" && mv "$tmp" file
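A self-contained sketch of the same approach using the three lines from the question. The temporary file and the variable names (list, target, remaining, saved) are assumptions made for the demo:

```sh
#!/bin/sh
list=$(mktemp)
printf '%s\n' testfile testfile.txt testfile2 > "$list"

target='testfile'
# -x whole line, -F literal string, -v invert, -- end of options
remaining=$(grep -xFv -- "$target" "$list")
printf '%s\n' "$remaining"

# Write the result back through a temporary file.
tmp=$(mktemp)
grep -xFv -- "$target" "$list" > "$tmp" && mv "$tmp" "$list"
saved=$(cat "$list")
rm -f "$list"
```

Only the exact line testfile is removed; testfile.txt and testfile2 stay. One thing to keep in mind: if every line gets removed, grep selects nothing and exits non-zero, so the mv after && is skipped.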
So match the whole line.
var=testfile
sed -i '/^'"$var"'$/d' file
# or with " quoting
sed -i "/^$var\$/d" file
You can learn regex in a fun way online with regex crosswords.
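Note that with the sed approach the variable is still treated as a regex, so the dot in testfile.txt matches any character. One possible way around that is to escape the special characters first. The helper name escape_bre below is hypothetical, and the third input line (testfileXtxt) is made up for the demo:

```sh
#!/bin/sh
# escape_bre is a hypothetical helper: it backslash-escapes the characters
# that are special in a basic regex, plus the / used as the sed delimiter.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

var='testfile.txt'
pattern=$(escape_bre "$var")

# Delete only the line that is exactly equal to $var.
result=$(printf '%s\n' testfile testfile.txt testfileXtxt |
  sed "/^$pattern\$/d")
printf '%s\n' "$result"
```

With the escaping in place, testfileXtxt survives; without it, the unescaped dot would delete that line too.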

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on its own line,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred
I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs, the first of which also deletes the spaces and line breaks between the pairs:
cat in_file | tr -d '( \n' | tr ')' '\n' > out_file
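A quick sketch of a two-tr pipeline on the sample line from the question, assuming the values themselves never contain spaces (the variable names input and pairs are only illustrative):

```sh
#!/bin/sh
input='(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)'

# Drop "(", spaces and existing newlines, then turn each ")" into a newline.
pairs=$(printf '%s\n' "$input" | tr -d '( \n' | tr ')' '\n')
printf '%s\n' "$pairs"
```

Deleting the spaces and newlines in the first step avoids a leading space on the second pair and an empty line at the end.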
As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile
This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/g;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
Guess we all know this, but just to emphasize:
Simple single-purpose tools usually take less time to run than awk or sed doing the same job. For instance, try not to use sed/awk where grep can suffice.
In this particular case, I created a file 100000 lines long, each line containing the characters "(" and ")". Then ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
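And the results were:
05.44 sec : Using tr
05.57 sec : Using sed
Note that in the first command time wraps only cat; tr runs outside it, so that figure does not include the work done by tr.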
cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.

grep all lines from start of file to line containing a string

If I have input file containing
statementes
asda
rertte
something
nothing here
I want to grep / extract (without using awk) every line from the start of the file up to and including the line containing the string "something". How can I do this? grep -B does not work since it needs the exact number of lines.
Desired output:
statementes
asda
rertte
something
It's not completely robust, but sure, -B works... just make the -B count huge:
grep -B "$(wc -l < filename)" -e 'something' filename
You could use a bash while loop and exit early when you hit the string:
$ while IFS= read -r line; do
> printf '%s\n' "$line"
> if printf '%s\n' "$line" | grep -q something; then
> break
> fi
> done < file
head -n "$(grep -n -e 'something' filename | head -n 1 | cut -d: -f1)" filename
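Another option, not mentioned in the answers above, is sed's q command, which prints lines as usual and quits after the first match. A minimal sketch using the question's input (the temporary file and variable names are assumptions for the demo):

```sh
#!/bin/sh
input=$(mktemp)
printf '%s\n' statementes asda rertte something 'nothing here' > "$input"

# Print each line as usual; quit right after the first line matching /something/.
head_part=$(sed '/something/q' "$input")
printf '%s\n' "$head_part"
rm -f "$input"
```

This reads the file only once and stops at the first matching line, so a later second match does not affect the result.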

sed or grep or awk to match very very long lines

more file
param1=" 1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr*rfr4fv*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8,
rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn drfr4fdr4fmedmifmitfmifrtfrfrfrfnurfnurnfrunfrufnrufnrufnrufnruf"****
I need to match the content of param1, as in
sed -n "/$param1/p" file
but because of the line length (it is a very long line) I can't match the line.
What's the best way to match very long lines?
The problem you are facing is that param1 contains special characters which are being interpreted by sed. The asterisk ('*') is used to mean 'zero or more occurrences of the previous character', so when this character is interpreted by sed there is nothing left to match the literal asterisk you are looking for.
The following is a working bash script that should help:
#!/bin/bash
param1=' 1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr\*rfr4fv\*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8, rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn'
cat <<EOF | sed "s/${param1}/Bubba/g"
1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr*rfr4fv*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8, rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn
EOF
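If the goal is only to find the line rather than to substitute in it, a fixed-string match sidesteps the escaping altogether. A small sketch with a shortened, made-up value (the strings in line and param1 are illustrative, not the ones from the question):

```sh
#!/bin/sh
line='1,ab*cd*ef 2,3'
param1='ab*cd'

# As a regex, b* means "zero or more b", so the literal text is not found.
regex_hits=$(printf '%s\n' "$line" | grep -c -- "$param1" || true)

# With -F the pattern is taken literally, asterisks included.
fixed_hits=$(printf '%s\n' "$line" | grep -c -F -- "$param1" || true)

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
```

The || true is there only because grep exits non-zero when nothing matches.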
Maybe the problem is that your $param1 contains special characters? This works for me:
A="$(perl -e 'print "a" x 10000')"
echo $A | sed -n "/$A/p"
($A contains 10 000 a characters).
echo $A | grep -F $A
and
echo $A | grep -P $A
also work (the second requires grep with built-in PCRE support). If you want pattern matching, use either this or pcregrep; if you don't, use fixed-string grep (grep -F).
echo $A | grep $A
is too slow.