How do you remove ^# with sed? - sed

I am trying to parse some logs and there is a strange ^# symbol in there. I can remove it in vim by cutting that character and then pasting it into a search, but how do I remove it automatically from the bash command line?
This doesn't work
sed 's/^#//'

When faced with an unwanted byte in a text file represented by some other stand-in symbol, a tool like hexdump or od helps. Try this:
Make a copy of the original file.
Remove everything in the copied file, except a line or two that includes the mystery symbol. Save the file.
To see what the byte really is, do:
hexdump -v -e '/1 "%_ad# "' -e '/1 " _%_u\_\n"' file
From that listing, find the hex code of the unwanted byte (let's
say it's 00), and try:
sed 's/\x00//' file
If that works, run the same sed line on the original file.
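As a rough sketch of the same workflow, the snippet below builds a sample line with one stray control byte, inspects it with od, and strips it. The byte value 0x1d (octal 035) is an arbitrary stand-in for whatever your dump shows, and the \xHH escape in the sed regex is a GNU sed extension; tr is shown as a portable fallback.

```shell
# Build a small sample containing one stray control byte.
# 0x1d (octal 035) is an assumed value; substitute the byte your dump shows.
sample=$(mktemp)
printf 'abc\035def\n' > "$sample"

# od prints every byte in hex, so the stray byte is easy to spot.
od -An -tx1 "$sample"

# GNU sed accepts \xHH in a regex; the g flag removes every occurrence on a line.
cleaned=$(sed 's/\x1d//g' "$sample")

# Portable alternative: tr takes the byte as an octal escape.
cleaned_tr=$(tr -d '\035' < "$sample")

printf '%s\n' "$cleaned" "$cleaned_tr"
rm -f "$sample"
```

Note the g flag: without it, sed removes only the first occurrence of the byte on each line.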

Related

Removing line breaks from CSV exported from Google Sheets

I have some data in the format:
-e, 's/,Chalk/,Cheese/g'
-e, 's/,Black/,White/g'
-e, 's/,Leave/,Remain/g'
in a file data.csv.
Using Gitbash, I use the file command to discover that this is ASCII text with CRLF terminators. If I also use the command cat -v , I see in Gitbash that each line ends ^M .
I want to remove those terminators, to leave a single line.
I've tried the following:
sed -e 's/'\r\n'//g' < data.csv > output.csv
taking care to put the \r\n in single quotes in order that the backslash is treated literally, but it does not work. No error, just no effect.
I'm using Gitbash for Windows.
Quotes within quotes cancel each other out, so you actually undo the quotes around the sed command for the newline characters. You could escape the quotes like 's|'\''\r\n'\''||g', but that would just include them in the string, which would not match anything in your case.
But that is not the only problem; sed by default only processes strings between newlines.
If you have the GNU version of sed, RAM to spare if the file is huge, and are sure the file does not contain data with null characters, try adding the -z argument, like:
sed -z -e 's|\r\n||g' < data.csv > output.csv
Though I guess you probably also want to replace it with a comma:
sed -z -e 's|\r\n|,|g' < data.csv > output.csv
For non-GNU versions of sed, you may have an easier time using tr instead, like:
tr -d '\r' < data.csv | tr '\n' ',' > output.csv
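A small portable sketch of the join, using made-up three-line CRLF data rather than the real data.csv. Note that tr reads standard input only, so the file has to be redirected in; paste -s is swapped in here for the comma-separated variant because it does not leave a trailing comma.

```shell
# Sample file with CRLF line endings (placeholder data, not the real CSV).
crlf=$(mktemp)
printf 'a\r\nb\r\nc\r\n' > "$crlf"

# Drop both terminator bytes to get one unbroken line.
one_line=$(tr -d '\r\n' < "$crlf")

# Or strip the carriage returns, then join the lines with commas.
joined=$(tr -d '\r' < "$crlf" | paste -sd, -)

printf '%s\n' "$one_line" "$joined"
rm -f "$crlf"
```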

Delete lines containing pattern at the end of line

Quite certainly I miss something basic. My file contains lines like
fooLOCATION=sdfmsvdnv
fooLOCATION=
barLOCATION=sadssf
barLOCATION=
and I want to delete all lines ending with LOCATION=.
sed -i '/LOCATION=$/d' file
does not do, it deletes nothing, and I have tried endless variations, but I don't get it. What inline sed command can do this?
There are two approaches here, either print all non-matching lines with
sed -n -i '/LOCATION=$/!p' file
or delete all matching lines with
sed -i '/LOCATION=$/d' file
The first uses the -n command line option to suppress the default action of printing the line. (Write it as a separate option: GNU sed reads -in as -i with a backup suffix of n, which leaves auto-printing on.) We then test for lines that end in LOCATION= and invert the pattern (only keeping those that don't match). When we get a desirable line, we print it with the p command.
The second looks for lines matching the end of line pattern, and deletes those that do.
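To check that the two approaches agree, here is a minimal sketch that runs both on a copy of the sample lines from the question, without -i, so the result goes to standard output instead of modifying a file.

```shell
# Recreate the sample lines from the question in a temporary file.
sample=$(mktemp)
printf 'fooLOCATION=sdfmsvdnv\nfooLOCATION=\nbarLOCATION=sadssf\nbarLOCATION=\n' > "$sample"

# Approach 1: suppress auto-print, then print only the non-matching lines.
kept_by_print=$(sed -n '/LOCATION=$/!p' "$sample")

# Approach 2: delete the matching lines and let auto-print handle the rest.
kept_by_delete=$(sed '/LOCATION=$/d' "$sample")

printf '%s\n' "$kept_by_delete"
rm -f "$sample"
```

Running without -i first is a cheap way to preview the result before editing the file in place.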
Your file contains blank lines, and both of these keep those. If we don't want to keep those, we can change the first option to
sed -n -i '/^$/!{/LOCATION=$/!p}' file
which first checks if a line is not empty, and only bothers checking if it should be printed if it isn't empty. We can modify the second option to
sed -i '/^$/d;/LOCATION=$/d' file
which deletes blank lines and then checks about deleting the other pattern.
We can modify the options to work with different line ending by specifying the difference in the pattern. The difference between line endings on Unix/Linux (\n) and Windows (\r\n) is the presence of an extra carriage return on Windows. Modifying the four commands above to accept either, we get
sed -n -i '/LOCATION=\r\{0,1\}$/!p' file
sed -i '/LOCATION=\r\{0,1\}$/d' file
sed -n -i '/^\r\{0,1\}$/!{/LOCATION=\r\{0,1\}$/!p}' file
sed -i '/^\r\{0,1\}$/d;/LOCATION=\r\{0,1\}$/d' file
Note that in each of these we allow an optional \r before the end of line. We use the curly bracket notation, as sed does not support the question mark optional quantifier in its default (basic regular expression) mode. With the -r option of GNU sed, which enables extended regular expressions, we can replace \{0,1\} with ?.
On a Windows shell, all of the options above require double quotes instead of single quotes.
Your command does work for me:
$ sed -i '/LOCATION=$/d' file
Results, viewed using cat:
$ cat file
fooLOCATION=sdfmsvdnv
barLOCATION=sadssf
Note
If a file has non-Unix line endings such as files from Windows with DOS-formatted line-endings, it can be a reason for failure. A typical remedy is to use dos2unix:
$ dos2unix file
This converter fixes the newline issues, so that file will now have Unix-style line endings. Sed should now properly recognize those line endings, so retry your sed command and it should work.
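If dos2unix is not installed, the effect can be sketched with tr instead (a substitute technique, not what dos2unix does internally). The example below uses made-up two-line CRLF data to show the pattern failing before the carriage returns are stripped and succeeding afterwards.

```shell
# Two lines with DOS (CRLF) endings; the second should be deleted.
dosfile=$(mktemp)
printf 'fooLOCATION=abc\r\nfooLOCATION=\r\n' > "$dosfile"

# With CRLF endings the last character before the newline is \r,
# so =$ never matches and both lines survive.
before=$(sed '/LOCATION=$/d' "$dosfile" | wc -l | tr -d ' ')

# After stripping the carriage returns the pattern matches as expected.
after=$(tr -d '\r' < "$dosfile" | sed '/LOCATION=$/d' | wc -l | tr -d ' ')

printf 'lines kept before: %s, after: %s\n' "$before" "$after"
rm -f "$dosfile"
```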
This might work for you (GNU sed):
sed -i '/LOCATION=\s*$/d' file
This deletes the line if LOCATION= is at the end of the line or if there is any optional white space following the pattern.

Using sed to keep the beginning of a line

I have a file in which some lines start by a >
For these lines, and only these ones, I want to keep the first eleven characters.
How can I do that using sed ?
Or maybe something else is better ?
Thanks !
Muriel
Let's start with this test file:
$ cat file
line one with something or other
>1234567890abc
other line in file
To keep only the first 11 characters of lines starting with > while keeping all other lines:
$ sed -r '/^>/ s/(.{11}).*/\1/' file
line one with something or other
>1234567890
other line in file
To keep only the first eleven characters of lines starting with > and delete all other lines:
$ sed -rn '/^>/ s/(.{11}).*/\1/p' file
>1234567890
The above was tested with GNU sed. For BSD sed, replace the -r option with -E.
Explanation:
/^>/ is a condition. It means that the command which follows only applies to lines that start with >
s/(.{11}).*/\1/ is a substitution command. It replaces the whole line with just the first eleven characters.
-r turns on extended regular expression format, eliminating the need for some escape characters.
-n turns off automatic printing. With -n in effect, lines are only printed if we explicitly ask them to be printed. In the second case above, that is done by adding a p after the substitute command.
Other forms:
$ sed -r 's/(>.{10}).*/\1/' file
line one with something or other
>1234567890
other line in file
And:
$ sed -rn 's/(>.{10}).*/\1/p' file
>1234567890
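For systems where neither -r nor -E is available, the same edit can be sketched with basic regular expressions, and, since the question asks whether another tool might be better, with awk as an alternative. Both are illustrations on the test file above, not part of the tested answer.

```shell
# The test file from above, held in a variable.
input='line one with something or other
>1234567890abc
other line in file'

# Basic-regex sed: the group and the interval need backslashes.
bre_out=$(printf '%s\n' "$input" | sed '/^>/ s/^\(.\{11\}\).*/\1/')

# awk alternative: substr keeps characters 1 through 11 of matching lines.
awk_out=$(printf '%s\n' "$input" | awk '/^>/ { print substr($0, 1, 11); next } { print }')

printf '%s\n' "$bre_out"
```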

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts in stack overflow for this, but couldn't get my code work, hence I am posting a new question.
Below is the content of file temp.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-
22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64-encoded contents of two files named test.txt and test1.txt. I want to extract the base64-encoded content of each file to separate files, test.txt and test1.txt respectively.
To achieve this, I have to remove the xml tags around the base64 contents. I am trying below commands to achieve this. However, it is not working as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
Below command:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>
However, in the output I am expecting only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i. Where am I making a mistake?
When I run the command below, I get the expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces below string
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to test1.txt file.
UPDATE
I have edited the question by updating the source file content. The source file doesn't contain any newline character. The current solution will not work in that case; I have tried it and it failed. wc -l temp must output 1.
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
I added "\1 -> " to show the link from file name to content; for the content only, just remove that part.
This is a POSIX version, so on GNU sed use --posix.
It assumes the base64-encoded content is on the same line as the surrounding tag (if it is spread over several lines, the command needs some modification).
Thanks to JID for the full explanation below.
How it works
sed -n
The -n means no printing so unless explicitly told to print, then there will be no output from sed
's_
This is to substitute the following regex using _ to separate regex from the replacement.
<dp:file name=
Regular text
"\([^"]*\)"
The brackets are a capture group and must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside the brackets is captured. [^"]* means 0 or more occurrences of any character that is not a quote. So really this just captures anything between the two quotes.
>\([^<]*\)
Again uses the capture group this time to capture everything between the > and <
.*
Everything else on the line
_\1 -> \2
This is the replacement, so replace everything in the regex before with the first capture group then a -> and then the second capture group.
_p
Means print the line
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected when the file contains just one line.
The command below works for a file containing only a single line.
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null this sed command outputs the warning sed: Missing newline at end of file.
This is because:
The default Solaris sed ignores the last line, so as not to break existing scripts, because in the original Unix implementation a line was required to be terminated by a newline.
GNU sed has more relaxed behavior, and the POSIX implementation accepts the unterminated line but outputs a warning.
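Putting the pieces together, here is a sketch of pulling each file's content out of a single-line document by name. The XML is a shortened stand-in for temp, and extract_file is a hypothetical helper name; the input is fed through printf, which supplies the trailing newline that the Solaris sed variants care about.

```shell
# Shortened single-line stand-in for the contents of temp.
xml='<env:Body><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></env:Body>'

# extract_file NAME (hypothetical helper): print the text between the
# <dp:file name="temporary://NAME"> tag and its closing </dp:file>.
# NAME is pasted into the regex unescaped, so it must not contain the
# _ delimiter, and its dots match any character.
extract_file() {
    printf '%s\n' "$xml" |
        sed -n 's_.*<dp:file name="temporary://'"$1"'">\([^<]*\)</dp:file>.*_\1_p'
}

first=$(extract_file test.txt)
second=$(extract_file test1.txt)
printf '%s\n' "$first" "$second"
```

Each result could then be redirected to its own file, for example extract_file test.txt > test.txt.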

Add text at the end of each line

I'm on Linux command line and I have file with
127.0.0.1
128.0.0.0
121.121.33.111
I want
127.0.0.1:80
128.0.0.0:80
121.121.33.111:80
I remember my colleagues were using sed for that, but after reading sed manual still not clear how to do it on command line?
You could try using something like:
sed 's/$/:80/' ips.txt > new-ips.txt
Provided that your file format is just as you have described in your question.
The s/// substitution command matches (finds) the end of each line in your file (using the $ character) and then appends (replaces) the :80 to the end of each line. The ips.txt file is your input file... and new-ips.txt is your newly-created file (the final result of your changes.)
Also, if you have a list of IP numbers that happen to have port numbers attached already, (as noted by Vlad and as given by aragaer,) you could try using something like:
sed '/:[0-9]*$/ ! s/$/:80/' ips.txt > new-ips.txt
So, for example, if your input file looked something like this (note the :80):
127.0.0.1
128.0.0.0:80
121.121.33.111
The final result would look something like this:
127.0.0.1:80
128.0.0.0:80
121.121.33.111:80
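One useful property of the guarded form is that it can be run repeatedly without stacking up ports. A minimal sketch, with add_port as a hypothetical wrapper name around the same sed command:

```shell
# add_port (hypothetical name): append :80 only to lines that do not
# already end in a colon followed by digits.
add_port() {
    sed '/:[0-9]*$/ ! s/$/:80/'
}

once=$(printf '127.0.0.1\n128.0.0.0:80\n' | add_port)

# Feeding the result through again changes nothing.
twice=$(printf '%s\n' "$once" | add_port)

printf '%s\n' "$twice"
```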
Concise version of the sed command:
sed -i 's/$/:80/' file.txt
Explanation:
sed stream editor
-i in-place (edit file in place)
s substitution command
/replacement_from_reg_exp/replacement_to_text/ statement
$ matches the end of line (replacement_from_reg_exp)
:80 text you want to add at the end of every line (replacement_to_text)
file.txt the file name
How can this be achieved without modifying the original file?
If you want to leave the original file unchanged and have the results in another file, then drop the -i option and redirect the output (>) to another file:
sed 's/$/:80/' file.txt > another_file.txt
sed 's/.*/&:80/' abcd.txt >abcde.txt
If you'd like to add text at the end of each line in-place (in the same file), you can use -i parameter, for example:
sed -i'.bak' 's/$/:80/' foo.txt
However -i option is non-standard Unix extension and may not be available on all operating systems.
So you can consider using ex (which is equivalent to vi -e/vim -e):
ex +"%s/$/:80/g" -cwq foo.txt
which will add :80 to each line, including blank lines.
So a better method is to check that the line actually contains a number, and only then append, for example:
ex +"g/[0-9]/s/$/:80/g" -cwq foo.txt
If the file has more complex format, consider using proper regex, instead of [0-9].
You can also achieve this using the backreference technique
sed -i.bak 's/\(.*\)/\1:80/' foo.txt
You can also use with awk like this
awk '{print $0":80"}' foo.txt > tmp && mv tmp foo.txt
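Using a text editor, check for ^M (control-M, or carriage return) at the end of each line. You will need to remove them first, then append the additional text at the end of the line. In the first command below, ^M stands for a literal carriage return (typed as Ctrl-V Ctrl-M in most shells), not a caret followed by M; with GNU sed you can write \r instead.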
sed -i 's|^M||g' ips.txt
sed -i 's|$|:80|g' ips.txt
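If typing a literal control-M is awkward, one portable sketch is to capture the carriage return in a shell variable and splice it into the sed script. The variable name cr and the sample addresses are illustrative only.

```shell
# Sample list with CRLF line endings.
ips=$(mktemp)
printf '127.0.0.1\r\n128.0.0.0\r\n' > "$ips"

# Hold a literal carriage return in a variable; command substitution
# strips trailing newlines only, so the \r survives.
cr=$(printf '\r')

# First strip a trailing carriage return, then append the port.
# The script is double-quoted so $cr expands; \$ keeps the end-of-line anchor literal.
fixed=$(sed "s/$cr\$//; s/\$/:80/" "$ips")

printf '%s\n' "$fixed"
rm -f "$ips"
```

Without the first substitution, :80 would land after the carriage return, and the line would look garbled when printed to a terminal.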
sed -i 's/$/,/g' foo.txt
I do this quite often to add a comma to the end of an output so I can just easily copy and paste it into a Python(or your fav lang) array