Remove all the characters from string after last '/' - sed

I have the following input file, and I need to remove all the characters that appear after the last '/' on each line. My expected output is shown below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.

You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep's matching is greedy: .* consumes as many characters as possible, including /, so the match extends to the last / on the line. With -o, grep prints only that matched part.
Original Answer:
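You can use dirname (it prints the directory without a trailing slash, so one is appended here to match the expected output):
while IFS= read -r line; do
  printf '%s/\n' "$(dirname "$line")"
done < file.txt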
or sed:
sed 's~\(.*/\).*~\1~' file.txt
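For comparison, the same "cut after the last /" idea can be done without grep or sed, using only POSIX shell parameter expansion: ${line%/*} removes the shortest suffix matching /*, which is exactly the text after the last /. This is a minimal sketch, not one of the original answers; the function name strip_after_last_slash is hypothetical, and lines without any / are skipped, mirroring grep -o.

```shell
#!/bin/sh
# Hypothetical helper: keep everything up to and including the last '/' on each line.
strip_after_last_slash() {
  while IFS= read -r line; do
    case $line in
      */*) printf '%s/\n' "${line%/*}" ;;  # ${line%/*} drops the shortest suffix matching '/*'
      *) ;;                                # no '/' on the line: print nothing, as grep -o would
    esac
  done
}

# Demo on the sample input from the question.
printf '%s\n' /start/one/two/stopone.js /start/one/two/three/stoptwo.js /start/one/stopxyz.js |
  strip_after_last_slash
```

Because it runs inside the shell, this avoids starting one external process per line, which calling dirname in a loop does.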

perl -lne 'print $1 if /(.*\/)/' your_file

Try this GNU sed command:
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Or with awk:
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file

Related

Insert linebreak in a file after a string

I have a unique (to me) situation:
I have a file, file.txt, with the following data:
"Line1", "Line2", "Line3", "Line4"
I want to insert a linebreak each time the pattern ", is found.
The output in file.txt should look like:
"Line1",
"Line2",
"Line3",
"Line4"
I am having a tough time trying to escape the ", sequence.
I tried sed -i -e "s/\",/\n/g" file.txt, but I am not getting the desired result.
I am looking for a one-liner using either perl or sed.
You may use this GNU sed command:
sed -E 's/(",)[[:blank:]]*/\1\n/g' file.txt
"Line1",
"Line2",
"Line3",
"Line4"
Note how single quotes around the sed command avoid unnecessary escaping.
If you don't have GNU sed, here is a POSIX-compliant sed solution:
sed -E 's/(",)[[:blank:]]*/\1\
/g' file.txt
To save the changes in place, use:
sed -i.bak -E 's/(",)[[:blank:]]*/\1\
/g' file.txt
Could you please try the following? It uses awk's substitution mechanism, in case you are OK with awk.
awk -v s1="\"" -v s2="," '{gsub(/",[[:blank:]]+"/,s1 s2 ORS s1)} 1' Input_file
Here's a Perl solution:
perl -pe 's/",\K\s*/\n/g' file.txt
The substitution pattern matches ", plus any whitespace after it, but the \K tells Perl to exclude everything to its left from the replaced text, so the ", is kept and only the whitespace is replaced. The replacement therefore effectively inserts a newline after each ",.
I used single quotes for the argument to -e, but that doesn't work on Windows (cmd.exe), where you have to use double quotes. Instead of escaping the ", you can specify it by its character code, 0x22, so you can write:
perl -pe "s/\x22,\K\s*/\n/g" file.txt
Or in octal:
perl -pe "s/\042,\K\s*/\n/g" file.txt
Use this Perl one-liner:
perl -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file > out_file
Or this for in-place replacement:
perl -i.bak -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into the array @F on whitespace, or on the regex specified in the -F option.
-F'/"\K,\s*/' : Split into @F on a double quote followed by a comma and zero or more whitespace characters, rather than on whitespace. \K : Cause the regex engine to "keep" everything it matched before the \K and exclude it from the match. As a result, the double quote stays in the @F elements, while the comma and whitespace are removed by the split.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
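The split-and-join idea behind the -F, -a and join usage can also be sketched in POSIX awk, in case Perl isn't available. This is an illustrative sketch rather than part of the answer: split_quoted_list is a hypothetical helper name, and the field separator is assumed to be a double quote, a comma and optional blanks (spaces or tabs). Because awk's split consumes the whole separator, the ", has to be put back when joining.

```shell
#!/bin/sh
# Hypothetical helper: split each line on a quote, a comma and optional blanks,
# then re-join the pieces with the quote, the comma and a newline.
split_quoted_list() {
  awk 'BEGIN { FS = "\",[ \t]*" }   # field separator: quote, comma, optional spaces/tabs
  {
    out = $1
    for (i = 2; i <= NF; i++)
      out = out "\",\n" $i          # put back the quote and comma that the split removed
    print out
  }'
}

# Demo on the sample line from the question.
printf '%s\n' '"Line1", "Line2", "Line3", "Line4"' | split_quoted_list
```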

Grep numbers between colon and comma

I want to grep all results whose usage percentage is over 70.
Example of output:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":1,"dir":"/oradump"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Expected output after the grep:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Awk is better suited here:
$ awk -F'[:,]' '$6>70' file
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Or with Perl:
$ perl -ne'print if /"percentage":([0-9]+),/ and $1 > 70' file
(no pesky separator counting needed)
Or, splitting into fields like the awk answer (Perl counts fields from 0, so $F[5] is awk's $6):
perl -F'[:,]' -ane 'print if $F[5]>70' file
GNU sed (the \| alternation is a GNU extension); this matches values from 71 to 100:
sed -n '/"percentage":\(7[1-9]\|[89][0-9]\|100\),/p' file
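In the spirit of the Perl answer's "no separator counting", the value can also be located by its key in plain awk, using match(), RSTART and RLENGTH. A minimal sketch, assuming each line holds at most one integer "percentage" value; over_threshold is a hypothetical helper that takes the threshold as its argument.

```shell
#!/bin/sh
# Hypothetical helper: print lines whose "percentage" value is greater than $1.
over_threshold() {
  awk -v limit="$1" '
    match($0, /"percentage":[0-9]+/) {
      pct = substr($0, RSTART + 13, RLENGTH - 13)   # 13 = length of the "percentage": prefix
      if (pct + 0 > limit + 0) print
    }'
}

# Demo on two of the sample lines from the question.
printf '%s\n' \
  '{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},' \
  '{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},' |
  over_threshold 70
```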

extract a substring of 11 characters from a line using sed, awk or perl

I have a file with many lines; each line contains either the substring
whatever_blablablalsfjlsdjf;asdfjlds;f/watch?v=yPrg-JN50sw&amp,whatever_blabla
or
whatever_blablabla"/watch?v=yPrg-JN50sw&amp" class=whatever_blablablavwhate
I want to extract a substring like "yPrg-JN50sw" above.
The matching pattern is the 11 characters after the string "/watch?v=".
How can I extract that substring? I hope it can be done with sed or awk in one line; if not, a one-line Perl script is also OK.
You can do
grep -oP '(?<=/watch\?v=).{11}' file
if your grep supports Perl regexes (-P, as in GNU grep), or
sed -n 's/.*\/watch?v=\(.\{11\}\).*/\1/p' file
$ cat file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
$
$ awk 'match($0,/\/watch\?v=/) { print substr($0,RSTART+RLENGTH,11) }' file
yPrg-JN50sw
yPrg-JN50sw
Just with the shell's parameter expansion (bash), extract the 11 characters after "watch?v=":
while IFS= read -r line; do
  tmp=${line##*watch\?v=}
  echo "${tmp:0:11}"
done < filename
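A caveat with the loop above: if a line does not contain watch?v= at all, ${line##...} leaves it unchanged and the loop prints its first 11 characters anyway. A POSIX sh sketch that skips such lines might look like this; extract_ids is a hypothetical name, and printf '%.11s' is used because ${tmp:0:11} is a bash/ksh extension.

```shell
#!/bin/sh
# Hypothetical helper: print the first 11 characters after "watch?v=" on each line;
# lines without that marker are skipped.
extract_ids() {
  while IFS= read -r line; do
    case $line in
      *'watch?v='*)
        tmp=${line##*'watch?v='}   # quoted part is matched literally, so ? is not a glob wildcard
        printf '%.11s\n' "$tmp"    # %.11s prints at most 11 bytes of the string
        ;;
    esac
  done
}

# Demo: both formats from the question, plus a line without an ID.
printf '%s\n' '/watch?v=yPrg-JN50sw&amp' '"/watch?v=yPrg-JN50sw&amp" class=' 'no id here' |
  extract_ids
```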
You could use sed to remove the extraneous information:
sed 's/[^=]\+=//; s/&.*$//' file
Or with awk and sensible field separators:
awk -F '[=&]' '{print $2}' file
Contents of file:
cat <<EOF > file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
EOF
Output:
yPrg-JN50sw
yPrg-JN50sw
Edit: accommodating new requirements mentioned in the comments:
cat <<EOF > file
<div id="" yt-grid-box "><div class="yt-lockup-thumbnail"><a href="/watch?v=0_NfNAL3Ffc" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto result-item-thumb" data-sessionlink="ved=CAMQwBs%3D&ei=CPTsy8bhqLMCFRR0fAodowXbww%3D%3D"><span class="video-thumb ux-thumb yt-thumb-default-185 "><span class="yt-thumb-clip"><span class="yt-thumb-clip-inner"><img src="//i1.ytimg.com/vi/0_NfNAL3Ffc/mqdefault.jpg" alt="Miniature" width="185" ><span class="vertical-align"></span></span></span></span><span class="video-time">5:15</span>
EOF
Use awk with a sensible record separator (a regular-expression RS needs an awk that supports it, such as GNU awk):
awk -v RS='[=&"]' '/watch/ { getline; print }' file
Note that a proper HTML/XML parser is the more robust tool for this sort of task.
grep --perl-regexp --only-matching --regexp='(?<=/watch\?v=)[^&]{0,11}' file
Assuming every line starts with /watch?v= exactly as in the first format you quoted, this should work:
awk '{print substr($0,10,11)}'
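Edit: from the comment on another answer, I gather your lines are much longer and more complicated than this, in which case something more comprehensive is needed. This uses GNU awk's three-argument match(); an explicit character class is used because \w would stop at the - in IDs such as yPrg-JN50sw:
gawk '{if(match($0, "/watch\\?v=([-_[:alnum:]]+)", a)) print a[1]}'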

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon

I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp, but it could be any of a number of text strings).
I tried various sed / awk solutions, but I can't seem to get it working. I have the feeling this is a simple problem, so sorry about that.
This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'
This requires GNU grep (for -P):
grep -Po '(?<=TYPE=)[^;]+'
Meaning: find one or more non-semicolon characters that are preceded by "TYPE=".
One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt
Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp
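The same key=value parsing can be wrapped in a reusable function that takes the key as an argument. A minimal sketch, assuming one line of KEY=VALUE pairs separated by ; with each key at the start of its pair; get_value is a hypothetical name. Using index() instead of a regex means the key is compared literally, and taking everything after the first = keeps any later = inside the value.

```shell
#!/bin/sh
# Hypothetical helper: read KEY=VALUE pairs separated by ';' and print the value
# of the first pair whose key is exactly $1.
get_value() {
  awk -v key="$1" -v RS=';' '
    index($0, key "=") == 1 {            # the record starts with KEY=
      print substr($0, length(key) + 2)  # everything after the first =, later ones included
      exit
    }'
}

# Demo on the second sample line from the awk answer.
printf '%s\n' 'foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0' | get_value TYPE
```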
You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo "$t"
snp

Delete all the lines in a file that contains a specific character

I want to delete all the rows/lines in a file that contain a specific character, '?' in my case. I hope there is a one-line command in Bash, AWK or Perl. Thanks
You can use sed to modify the file "in-place":
sed -i "/?/d" file
Alternatively, use grep:
grep -v "?" file > newfile.txt
Or, just a single line using sed:
sed '/?/d' input
Use -i to edit the file in place.
perl -i -ne'/\?/ or print' file
or
perl -i -pe's/^.*?\?.*//s' file
There are already grep, sed and perl solutions, so just for fun, here is a pure bash one:
pattern='?'
while IFS= read -r line
do
  [[ "$line" =~ "$pattern" ]] || printf '%s\n' "$line"
done
In words: for every line on STDIN, match it against the pattern with =~ (the quoted "$pattern" is matched literally), and if the match is not successful (||), print the line.
Or with awk (the ? is escaped so it is matched literally):
awk '!/\?/' file_name
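The bash loop above relies on [[ ... =~ ... ]], which is a bash feature. A POSIX sh sketch of the same idea uses case instead; quoting "$1" in the pattern makes a ? match literally rather than acting as a single-character glob wildcard. drop_lines_with is a hypothetical helper name, and the extra [ -n "$line" ] test keeps a final line that lacks a trailing newline.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines that do NOT contain the string $1.
drop_lines_with() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *"$1"*) ;;                   # quoted "$1": a ? here is literal, not a glob wildcard
      *) printf '%s\n' "$line" ;;
    esac
  done
}

# Demo: the middle line contains '?' and is dropped.
printf '%s\n' 'keep me' 'drop me?' 'keep me too' | drop_lines_with '?'
```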