Need a greedy address range in sed

I have a bash script and I'm working with Vim. This script appends data to a file before the end of the final fold: it copies the file up to the final # }}}, then appends the new data followed by a new # }}}. The following snippet could be so much more elegant if I had a greedy address range.
local END=$(grep -n '# }}}' "$FILENAME" | sed -n "$ s/\([[:digit:]]*\)\(.*\)/\1/p ")
let END=$END-1
sed -n "1, $END {p}" "$FILENAME" > "$TEMPFILE"
In theory if sed supported a '--greedy-address-range' flag I could use this: sed --silent --in-place --greedy-address-range "1, /# }}}/ {p}" $FILENAME
Thank you in advance for any suggestions!

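If I understood the output you need correctly, this will do the job just as well:
tac "$FILENAME" | sed -n '/# }}}/,$p' | tac > "$TEMPFILE"
In order to print all lines up to and including the last match, I reverse the file, use sed to print all lines from the first match to EOF, then reverse it again. Write the result to a temporary file rather than to $FILENAME itself: the shell truncates a redirection target before tac reads it, so "> $FILENAME" would leave the file empty.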
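If you need exactly what the question's snippet writes (every line before the last # }}}, without that line itself), a two-pass awk sketch avoids both tac and the grep/let arithmetic. It assumes the input is a regular file that can be read twice; the function name lines_before_last_fold is hypothetical.

```shell
#!/bin/sh
# Print every line that comes before the LAST line containing "# }}}".
# The file is read twice: pass 1 records the last matching line number,
# pass 2 prints the lines above it. If nothing matches, nothing is printed.
lines_before_last_fold() {
  awk -v pat='# }}}' '
    BEGIN { last = 0 }
    NR == FNR { if (index($0, pat)) last = FNR; next }  # pass 1: find last match
    FNR < last                                          # pass 2: lines above it
  ' "$1" "$1"
}
```

Usage: lines_before_last_fold "$FILENAME" > "$TEMPFILE" is meant to replace the three-line snippet in the question; index() does a fixed-string search, so the braces in "# }}}" need no escaping.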


Get version of Podspec via command line (bash, zsh) [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any run of characters (.*) from the beginning of the line (^) up to the last occurrence of the sequence ": " (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by a space (-d' ' sets the delimiter, d = delimiter, to a space; -d" " works too, as does a backslash-escaped space -d\ as long as a second space separates it from -f2) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to drop everything matched so far (potato: and the space) from the reported match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
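The question asks for lines that start with potato:, but grep 'potato:' matches that string anywhere on a line; the sed -n 's/^potato:...' command anchors with ^ for this reason. A sketch contrasting the two, using hypothetical helper names and a made-up line "sweet potato: 7" that is not in the question's sample, shows the difference; the awk variant splits on ": " and compares the key exactly.

```shell
#!/bin/sh
# Unanchored: any line that merely CONTAINS "potato:" is used.
values_contains() { grep 'potato:' | sed 's/^.*: //'; }

# Exact key: split on ": " and require the first field to be exactly
# "potato", so a line such as "sweet potato: 7" is skipped.
values_exact_key() { awk -F': ' '$1 == "potato" { print $2 }'; }
```

One trade-off: with -F': ' a value that itself contains ": " is cut at that point, since awk only prints the second field.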
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while IFS= read -r line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
${line##*:\ } tells bash to delete the longest prefix of $line that matches the pattern *:\ (anything followed by ": "), i.e. everything up to and including the last ": ".
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, ${line%%:\ *} tells bash to delete the longest suffix of $line that matches the pattern :\ * (": " followed by anything), i.e. everything from the first ": " onward.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
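The pattern to split on is written ":\ ", with the space escaped by a backslash. The escape does no harm, but it is not strictly required: inside ${...} an unescaped space (${line##*: }) works as well.
You can find more examples like these at the Linux Documentation Project.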
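The difference between the single and doubled operators only shows when the separator occurs more than once. A minimal sketch, using a hypothetical helper show_expansions and a made-up value that is not in the question's data, prints all four forms. These expansions are POSIX, so they behave the same in plain sh.

```shell
#!/bin/sh
# Print the four prefix/suffix-removal expansions of "$1", one per line.
show_expansions() {
  line=$1
  printf '%s\n' "${line#*: }"   # shortest prefix matching '*: ' removed
  printf '%s\n' "${line##*: }"  # longest prefix matching '*: ' removed
  printf '%s\n' "${line%: *}"   # shortest suffix matching ': *' removed
  printf '%s\n' "${line%%: *}"  # longest suffix matching ': *' removed
}
```

For example, show_expansions 'potato: 12: 34' prints 12: 34, then 34, then potato: 12, then potato.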
Modern Bash has support for regular expressions:
while IFS= read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done < file.txt
grep potato file | grep -o "[0-9].*"

Matching pattern on multiple lines

I have a file as below
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.70.xxx.xx)
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.71.xxx.xx)
I am trying to extract the values NAME and IP using sed:
cat file1 |
sed ':a
N
$!ba
s/\n/ /g' | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
However, I'm only getting the output:
NAME(BOLIVIA) IP(192.71.xxx.xx)
What I would like is:
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)
Would appreciate it if someone could give me a pointer on what I'm missing.
TIA
Your first sed command joins the whole file into one long line. You could have used tr '\n' ' ' for this, but that is not the problem.
The problem is in the second part, where the greedy .* consumes as much as possible, so the single match runs on to the last IP(...) in the joined line.
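sed has no lazy (non-greedy) quantifier such as Perl's .*?; the usual substitute is a negated bracket expression like [^)]*, which cannot run past a closing parenthesis. A sketch with hypothetical helper names greedy_ip and name_ip_pairs, using grep -o on the question's sample joined into one line, contrasts the two.

```shell
#!/bin/sh
# Greedy: '.*' extends to the LAST ')' on the line, so one IP(...) match
# swallows everything up to the final record.
greedy_ip() { grep -Eo 'IP\(.*\)'; }

# Negated class: '[^)]*' cannot run past a ')', so every NAME(...) and IP(...)
# stops at its own closing parenthesis; paste then pairs them two per line.
name_ip_pairs() { grep -Eo 'NAME\([^)]*\)|IP\([^)]*\)' | paste -d ' ' - -; }

# The question's sample joined into one line, as its first sed command does.
joined=$(printf '%s\n' \
  'NAME(BOLIVIA) TYPE(SA)' 'APPLIC(Java) IP(192.70.xxx.xx)' \
  'NAME(BOLIVIA) TYPE(SA)' 'APPLIC(Java) IP(192.71.xxx.xx)' | tr '\n' ' ')
```

printf '%s\n' "$joined" | greedy_ip prints a single match running from the first IP( to the very last ), while printf '%s\n' "$joined" | name_ip_pairs prints the two NAME/IP pairs the question asks for.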
Your solution could be "fixed" with the ugly
# Do not use this:
sed -zn 's/[^\n]*\(NAME(BOLI...)\)[^\n]*\n[^\n]*\(IP([^)]*)\)[^\n]*/\1 \2/gp' file1
Possible solutions:
cat file1 | paste -d " " - - | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
# or
grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1 | paste -d " " - -
# or
printf "%s %s\n" $(grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1)
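If you are OK with awk, please try the following. It was written and tested at https://ideone.com/bJDzgf with the shown samples only. Note that it prints the bare values (for example BOLIVIA 192.70.xxx.xx) rather than the NAME(...) and IP(...) wrappers.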
awk '
match($0,/^NAME\([^)]*/){
name=substr($0,RSTART+5,RLENGTH-5)
next
}
match($0,/IP\([^)]*/){
print name,substr($0,RSTART+3,RLENGTH-3)
name=""
}
' Input_file
This might work for you (GNU sed):
sed -n '/NAME/{N;/IP/s/\s.*\s/ /p}' file
If a line contains NAME and the following line contains IP, replace everything from the first to the last whitespace character (including the newline added by N) with a single space and print the result.
An alternative shorter awk:
awk '$1 ~ /^NAME/ {nm = $1} $2 ~ /^IP/ {print nm, $2}' file
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)
The issue in your script is the use of .*, which matches greedily, so the substitution spans the whole joined line and you get only one NAME(BOLI...) paired with the last IP(.*).
If you can use Python 3:
#!/bin/bash
python3 -c '
import re, sys
for ar in re.findall(r"(NAME\(BOLI.*?\)).*?(IP\(.*?\))", sys.stdin.read(), re.DOTALL):
print(*ar)
' < input-file

Remove everything in a line before comma

I have multiple files with lines like:
foo, 123456
bar, 654321
baz, 098765
I would like to remove everything on each line before (and including) the comma.
The output would be:
123456
654321
098765
I attempted to use the following after seeing something similar on another question, but the user didn't leave an explanation, so I'm not sure how the wildcard would be handled:
find . -name "*.csv" -type f | xargs sed -i -e '/*,/d'
Thank you for any help you can offer.
METHOD 1:
If it's always the 2nd column you want, you can do this with awk -- this command is actually splitting the rows on the whitespace rather than the comma, so it gets your second column -- the numbers, but without the leading space:
awk '{print $2}' < whatever.csv
METHOD 2:
Or to get everything after the comma (including the space):
sed -e 's/^.*,//g' < whatever.csv
METHOD 3:
If you want to find all of the .csv files and get the output of all of them together, you can do:
sed -e 's/^.*,//g' `find . -name '*.csv' -print`
METHOD 4:
Or the same way you were starting to -- with find and xargs:
find . -name '*.csv' -type f -print | xargs sed -e 's/^.*,//'
METHOD 5:
To turn all of the .csv files into .txt files, processed in the way described above, you can write a brief shell script like this:
Create a script "bla.sh":
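#!/bin/sh
# Note: the backquoted find breaks on file names containing whitespace.
for infile in `find . -name '*.csv' -print` ; do
# Strip the trailing .csv and add .txt (sed 's/.csv/.txt/' would also match
# any character followed by "csv" anywhere in the path).
outfile="${infile%.csv}.txt"
echo "$infile --> $outfile"
sed -e 's/^.*,//g' < "$infile" > "$outfile"
done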
Make it executable by typing this:
chmod 755 bla.sh
Then run it:
./bla.sh
This will create a .txt output file with everything after the comma for each .csv input file.
ALTERNATE METHOD 5:
Or if you need them to be named .csv, the script could be updated like this -- this just makes an output file named "file-new.csv" for each input file named "file.csv":
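#!/bin/sh
# Note: the backquoted find breaks on file names containing whitespace.
for infile in `find . -name '*.csv' -print` ; do
# file.csv -> file-new.csv (only the trailing .csv is replaced)
outfile="${infile%.csv}-new.csv"
echo "$infile --> $outfile"
sed -e 's/^.*,//g' < "$infile" > "$outfile"
done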
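The question's own attempt used sed -i to edit the files in place. A sketch of that, assuming GNU or BSD sed (-i is not part of POSIX sed) and a hypothetical helper name strip_before_comma_inplace, lets find pass the file names to sed directly with -exec ... {} +, which also copes with names containing spaces. The pattern also removes the space after the comma, matching the question's expected output.

```shell
#!/bin/sh
# Rewrite every *.csv file under directory "$1" in place, deleting everything
# up to and including the first comma plus any spaces after it.
# -i.bak keeps a backup copy (file.csv.bak); GNU and BSD sed both accept it.
strip_before_comma_inplace() {
  find "$1" -type f -name '*.csv' -exec sed -i.bak 's/^[^,]*, *//' {} +
}
```

The .bak copies are not matched by -name '*.csv', so running the function does not pick up its own backups.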
Something like this should work for a single file. Let's say the input is 'yourfile' and you want the output to go to 'outfile'.
sed 's/^.*,//' < yourfile > outfile
The syntax to do a search-and-replace is s/input_pattern/replacement/
The ^ anchors the input pattern to the beginning of the line.
A dot . matches any single character; .* matches a string of zero or more of any character.
The , matches the comma.
The replacement pattern is empty, so whatever matched the input_pattern will be removed.
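Because .* is greedy, ^.*, removes everything up to the last comma on the line, not the first. With the question's data (one comma per line) this makes no difference; a sketch with hypothetical helper names and a made-up line foo, 12, 34 shows where the two patterns diverge.

```shell
#!/bin/sh
# '^.*,' is greedy, so it deletes up to the LAST comma on the line.
after_last_comma()  { sed 's/^.*,//'; }
# '^[^,]*,' cannot cross a comma, so it deletes only up to the FIRST one.
after_first_comma() { sed 's/^[^,]*,//'; }
```

printf 'foo, 12, 34\n' | after_last_comma prints " 34", while after_first_comma prints " 12, 34"; both keep the leading space, as the original s/^.*,// does.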

Unix - Split to N files using regexp to name destination file

How do I split a file into N files, using the first 2 characters of each line as the file name?
Example input file:
AA23409234TEXT
BA23201202Other Text
AA23509234YADA
BA23202202More Text.
C1000000000000000000
Should generate 3 files:
AA.txt
AA23409234TEXT
AA23509234YADA
BA.txt
BA23201202Other Text
BA23202202More Text.
C1.txt
C1000000000000000000
I'm thinking of using a sed script similar to this
/^(..)/w \1
But what that really does is create a file named '\1' instead of the capture group.
Any ideas?
$ awk '{fname=substr($0, 1, 2) ".txt"; print >>fname}' input.txt
Or
$ while IFS= read -r line; do printf '%s\n' "$line" >>"${line:0:2}.txt"; done <input.txt
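print > file in awk keeps every output file open until close() or exit, and some awk implementations limit how many files can be open at once. A sketch (hypothetical name split_by_prefix, writing into an output directory given as the second argument) closes each file right after writing. Since reopening with > would truncate the file, it appends with >>, so old .txt files should be removed before re-running.

```shell
#!/bin/sh
# Split file "$1" into <first two characters>.txt files inside directory "$2".
# Only one output file is open at a time; an empty input line would go to ".txt".
split_by_prefix() {
  awk -v dir="$2" '
    {
      out = dir "/" substr($0, 1, 2) ".txt"   # awk strings are 1-indexed
      print >> out
      close(out)                              # release the descriptor now
    }
  ' "$1"
}
```

For the question's sample, split_by_prefix input.txt . creates AA.txt, BA.txt and C1.txt in the current directory.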
The first thing you need to do is determine all of your file names:
filenames=$(sed 's/\(..\).*/\1/' listOfStrings.txt | sort | uniq)
Then, loop through those filenames
for filename in $filenames
do
sed -n "/^$filename/p" listOfStrings.txt > "$filename.txt"
done
I have not tested this, but I think it should work.
This might work for you:
sed 's/\(..\).*/echo "&" >>\1.txt/' file | sh
or if you have GNU sed:
sed 's/\(..\).*/echo "&" >>\1.txt/e' file

sed or grep or awk to match very very long lines

more file
param1=" 1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr*rfr4fv*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8,
rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn drfr4fdr4fmedmifmitfmifrtfrfrfrfnurfnurnfrunfrufnrufnrufnrufnruf"****
I need to match the content of param1 with
sed -n "/$param1/p" file
but because of the line length (a very long line) I can't match the line.
What's the best way to match very long lines?
The problem you are facing is that param1 contains special characters which are being interpreted by sed. The asterisk ('*') is used to mean 'zero or more occurrences of the previous character', so when this character is interpreted by sed there is nothing left to match the literal asterisk you are looking for.
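Rather than escaping each asterisk by hand, the string can be escaped programmatically before it goes into the sed address. A sketch, with a hypothetical helper escape_bre, backslash-escapes the characters that are special in a basic regular expression outside brackets (. [ \ * ^ $) plus the / delimiter; ] needs no escape there. It assumes the string contains no newline.

```shell
#!/bin/sh
# Escape BRE metacharacters and '/' in "$1" so it can be used as a literal
# pattern inside a /.../ sed address. '|' is used as the s delimiter so that
# '/' can appear in the bracket expression.
escape_bre() {
  printf '%s\n' "$1" | sed 's|[[\\/.*^$]|\\&|g'
}
```

Usage: sed -n "/$(escape_bre "$param1")/p" file. When no regex features are needed at all, fixed-string matching with grep -F -- "$param1" file avoids escaping entirely.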
The following is a working bash script that should help:
#!/bin/bash
param1='1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr\*rfr4fv\*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8, rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn'
cat <<EOF | sed "s/${param1}/Bubba/g"
1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr*rfr4fv*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8, rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn
EOF
Maybe the problem is that your $param1 contains special characters? This works for me:
A="$(perl -e 'print "a" x 10000')"
echo $A | sed -n "/$A/p"
($A contains 10 000 a characters).
echo $A | grep -F $A
and
echo $A | grep -P $A
also work (the second requires grep built with PCRE support. If you need pattern matching, use either this or pcregrep; if you don't, use fixed-string grep (grep -F)).
echo $A | grep $A
is too slow.