I have a simple playlist of song files:
1003 James Brown - The Boss Unknown Artist.mp3
1004 James Brown - Slaughters Theme Unknown Artist.mp3
1005 James Brown - Payback(1) Unknown Artist.mp3
...
I would like them in the following format:
1003 James_Brown_-_The_Boss_Unknown_Artist.mp3
1004 James_Brown_-_Slaughters_Theme_Unknown_Artist.mp3
...
Notice that the space after the leading number is NOT replaced. I have the following simple sed script:
sed "s/ /_/g"
but that also replaces the space after the number. I know how to form capture groups, but that will not help either. How can I convince sed to only apply the replacement to a portion of the input string, rather than the whole string?
You could do
sed 's/ /_/g; s/_/ /'
I.e. first turn all spaces into underscores, then turn the first underscore back into a space.
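As a quick check, the two-step substitution can be run against one of the sample file names. This is only a sketch: it assumes the name contains no underscore before its first space. The second command relies on GNU sed's handling of a number combined with the g flag (start replacing at the Nth match), which POSIX leaves unspecified, so treat it as a GNU-only variant.

```shell
name='1003 James Brown - The Boss Unknown Artist.mp3'

# Two-step approach from the answer: underscore every space,
# then turn the first underscore back into a space.
two_step=$(printf '%s\n' "$name" | sed 's/ /_/g; s/_/ /')

# GNU sed only (assumption): start replacing at the 2nd space.
gnu_only=$(printf '%s\n' "$name" | sed 's/ /_/2g')

printf '%s\n%s\n' "$two_step" "$gnu_only"
```

Both commands should print the same renamed string, with the space after 1003 intact.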
I have an input which looks like this:
1
2

3
4
5

6
And I want to transform it with sed to:
12
345
6
I know it can be easily done with other tools but I want to do it specifically with sed as a learning exercise.
I have attempted this:
sed ':x ; /^ *$/{ N; s/\n// ; bx; }'
But it prints:
123456
Can someone help me fix this?
Quoting from the GNU sed manual:
A common technique to process blocks of text such as paragraphs (instead of line-by-line) is using the following construct:
sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
The first expression, /./{H;$!d} operates on all non-empty lines, and adds the current line (in the pattern space) to the hold space. On all lines except the last, the pattern space is deleted and the cycle is restarted.
The other expressions x and s are executed only on empty lines (i.e. paragraph separators). The x command fetches the accumulated lines from the hold space back to the pattern space. The s/// command then operates on all the text in the paragraph (including the embedded newlines).
And indeed,
sed '/./{H;$!d} ; x ; s/\n//g'
does what you want.
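To see the hold-space construct in action, here is a small sketch that feeds it the question's numbers. The input layout (an empty line after 2 and after 5) is an assumption inferred from the desired output.

```shell
# Groups are separated by empty lines (assumed layout of the question's input).
out=$(printf '1\n2\n\n3\n4\n5\n\n6\n' | sed '/./{H;$!d} ; x ; s/\n//g')

# Each paragraph is collected in the hold space, swapped back in at the
# separator (or at the last line), and has its embedded newlines removed.
printf '%s\n' "$out"
```

The separator lines themselves are consumed: on an empty line the x command swaps the accumulated paragraph into the pattern space and leaves the hold space empty for the next group.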
FWIW here's how to really do that task in UNIX:
$ awk -v RS= -v OFS= '{$1=$1}1' file
12
345
6
The above will work on any UNIX box.
A GNU awk approach:
$ awk -F"\n" '{gsub("\n","");}1' RS='\n{2,}' file
12
345
6
Note that it will add a trailing newline (\n) after the last line.
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
Taiwan 55 144 Asia
North Korea 44 2134 Asia
The above is my data file.
There are empty lines in it.
There are no spaces or tabs in those empty lines.
I want to remove all empty lines in the data.
I did a search, and "Delete empty lines using SED" gave the perfect answer.
Before that, I wrote two sed commands myself:
sed -r 's/\n\n+/\n/g' cou.data
sed 's/\n\n\n*/\n/g' cou.data
I also tried awk's gsub, which was not successful either:
awk '{ gsub(/\n\n*/, "\n"); print }' cou.data
But they don't work, and nothing changes.
Where did I go wrong with my sed code?
Use the following sed to delete all blank lines.
sed '/./!d' cou.data
Explanation:
/./ matches any character (in GNU sed, including an embedded newline), so it selects every line that contains at least one character.
! negates the selector, i.e. it makes the command apply to lines which do not match the selector, which in this case is the empty line(s).
d deletes the selected line(s).
cou.data is the path to the input file.
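A minimal sketch of the command at work, using a few rows of the question's data. The positions of the empty lines are made up for illustration, since the original file layout is not shown.

```shell
# Sample rows with empty lines inserted at arbitrary (hypothetical) positions.
data='France 211 55 Europe

Japan 144 120 Asia


Germany 96 61 Europe'

# Delete every line on which /./ finds no character to match.
cleaned=$(printf '%s\n' "$data" | sed '/./!d')
printf '%s\n' "$cleaned"
```

The same selection can be written as sed '/^$/d', which deletes lines that are empty rather than keeping lines that are not.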
Where did you go wrong?
The following excerpt from How sed Works states:
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.
The parts pertinent to why your sed examples are not working are "reads one line from the input stream, removes any trailing newline" and "adding back the trailing newline if it was removed". Given your examples:
They seem to disregard that sed reads one line at a time.
The newline sequences you're trying to match (\n\n+ and \n\n\n* in your first and second examples respectively) don't actually exist in the pattern space. The trailing newline has been removed by the time your regexp is applied, and it is reinstated when the end of the script is reached.
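The effect of the line-by-line cycle can be demonstrated with a short sketch. The second command assumes GNU sed, whose -z option treats NUL as the line separator, so the whole input lands in the pattern space at once and the newlines become visible to the regexp.

```shell
input='a\n\n\nb\n'   # printf format string: "a", two empty lines, "b"

# Default cycle: each line arrives without its newline, so \n\n+ never matches.
per_line=$(printf "$input" | sed -E 's/\n\n+/\n/g')

# GNU sed only (assumption): with -z the input is a single record,
# so runs of newlines can be matched and collapsed.
whole=$(printf "$input" | sed -z -E 's/\n\n+/\n/g')

printf 'per line:\n%s\nwhole input:\n%s\n' "$per_line" "$whole"
```

The first result still contains the empty lines, while the second has them collapsed. Reading the whole file into memory this way is only reasonable for small inputs; the address-based delete works line by line and has no such cost.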
RobC's answer is great if your lines are terminated by a newline (linefeed, \n) only, because sed separates lines that way. If your lines are terminated by \r\n (CRLF), which you may have your reasons for doing even on a Unix system, you will not get a match, because from sed's perspective the line isn't empty: the \r (CR) counts as a character. Instead you can try:
sed '/^\r$/d' filename
Explanation:
^ matches the start of the line
\r matches the carriage return
$ matches the end of the line
d deletes the selected line(s).
filename is the path to the input file.
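The difference can be checked by counting the surviving lines of a small CRLF sample. This sketch assumes GNU sed, which understands \r inside a regexp; crlf_input and the combined pattern in the last command are illustrative additions, not part of the answer.

```shell
# Hypothetical helper: emits "a", an empty line, "b", all CRLF-terminated.
crlf_input() { printf 'a\r\n\r\nb\r\n'; }

# /./!d keeps all three lines, because the "empty" line still holds a \r.
kept_all=$(crlf_input | sed '/./!d' | wc -l)

# Matching a line that consists of a lone carriage return removes it.
cr_only=$(crlf_input | sed '/^\r$/d' | wc -l)

# Illustrative combined form: delete empty lines with or without a \r.
both=$(printf 'a\r\n\r\nb\n\nc\n' | sed -E '/^\r?$/d' | wc -l)

echo "$kept_all $cr_only $both"
```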
If I have
123456red100green
123456bee010yellow
123456usb110orange
123456sos011querty
123456let101bottle
and I want it to be
123456red111green
123456bee111yellow
123456usb111orange
123456sos111querty
123456let111bottle
Notice: the first 6 characters don't change,
the following 6 change.
also these strings might be anywhere in a file (beginning, end, anywhere)
I want to specify sed to
1)find 123456
2)skip the next three characters
3)replace the next three with 111
The closest I've come to is:
sed '/s/123456....../123456...111/g'
I know dots mean any character, but I don't know the equivalent on the replacement side. In short: how do I tell sed to leave some characters in a match untouched?
Sorry for having been unclear about what I want; please bear with me.
Matching 123456 followed by three characters that are not to be modified, and then replacing the next three characters with 111:
sed 's/\(123456...\).../\1111/g' file
The \( ... \) captures the part of the string that we don't want to modify. These are re-inserted with \1. The whole matching bit of the line is replaced by "the bit in the \( ... \) (i.e. \1) followed by 111".
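Running the substitution over the five sample lines from the question gives a quick sanity check. The sketch below only wraps the command above in a pipeline.

```shell
# Feed the question's sample lines through the capture-group substitution.
fixed=$(printf '%s\n' \
  123456red100green \
  123456bee010yellow \
  123456usb110orange \
  123456sos011querty \
  123456let101bottle |
  sed 's/\(123456...\).../\1111/g')

printf '%s\n' "$fixed"
```

Note that \1111 is read as \1 followed by the literal text 111, because sed back-references are a single digit.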
If you want to change each and every zero (as in your examples), then just sed 's/0/1/g' would do. Or sed -e '/^123456/ s/0/1/g' to do the same on lines starting with 123456.
But to count characters, as you ask, use ( .. ) to capture the part you want to keep and \1 to put it back (using sed -E). So:
echo 123456abcdefgh | sed -Ee 's/^(123456...).../\1111/'
outputs 123456abc111gh. The \1 puts back the part matched by 123456..., and the 111 that follows it is literal text.
(Without -E, you'd need \( .. \) to group.)
I have just started experimenting with sed and don't really get how match capturing works. If I have code like this for capturing two words, sed 's/\([a-z]*\).*\([a-z]*\).*/\1 \2/', why isn't the second word captured?
Edit1: Let's say I have this string: "the brown fox jumps over the lazy dog". I want sed to match "the brown", but it only matches the first word.
(Quoting Sundeep, just to make a Q/A pair.)
Replace the dot in the first .* with a space character, so that the greedy .* cannot swallow the second word:
sed 's/\([a-z]*\) *\([a-z]*\).*/\1 \2/'
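The difference between the two expressions is easiest to see side by side. In this sketch the results are printed in brackets so that the trailing space left by the empty second group is visible.

```shell
s='the brown fox jumps over the lazy dog'

# Original attempt: the first .* is greedy and consumes the rest of the
# line, so the second group can only match the empty string.
greedy=$(printf '%s\n' "$s" | sed 's/\([a-z]*\).*\([a-z]*\).*/\1 \2/')

# Suggested fix: only spaces may sit between the two captured words.
fixed=$(printf '%s\n' "$s" | sed 's/\([a-z]*\) *\([a-z]*\).*/\1 \2/')

printf '[%s]\n[%s]\n' "$greedy" "$fixed"
```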
sample string
There are 1 123 456 drops of water
Is there a way to take out the space used as a thousands separator with sed?
resulting in
There are 1123456 drops of water
Finding the pattern was not difficult,
but I cannot work out how to remove the space:
sed s/[0-9]' '[0-9]/ ??? /
Thank you in advance.
sed 's/\([0-9]\) \([0-9]\)/\1\2/g'
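A short sketch of this command on the sample string, plus one caveat. Because each match consumes the digit on both sides of the space, a single pass with g can miss a space when the digit groups are only one character long. The looping variant at the end is an illustrative addition for that case, not part of the answer.

```shell
# Sample from the question: three-digit groups, so one pass with g is enough.
sample=$(echo 'There are 1 123 456 drops of water' |
  sed 's/\([0-9]\) \([0-9]\)/\1\2/g')

# Single-digit groups: the "2" consumed by the first match cannot also
# start the next match, so one space survives.
one_pass=$(echo '1 2 3' | sed 's/\([0-9]\) \([0-9]\)/\1\2/g')

# Illustrative loop: repeat the substitution until it stops matching.
looped=$(echo '1 2 3' | sed -e ':a' -e 's/\([0-9]\) \([0-9]\)/\1\2/' -e 'ta')

printf '%s\n%s\n%s\n' "$sample" "$one_pass" "$looped"
```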
This should work too -
perl -pe 's/(?<=[0-9])(\s)(?=[0-9])//g'
We use a positive lookbehind and a positive lookahead, each of which looks for a digit. If we find whitespace between them, we replace it with nothing.
[jaypal:~] echo "There are 1 123 456 drops of water" | perl -pe 's/(?<=[0-9])(\s)(?=[0-9])//g'
There are 1123456 drops of water