Removing line breaks from CSV exported from Google Sheets - sed

I have some data in the format:
-e, 's/,Chalk/,Cheese/g'
-e, 's/,Black/,White/g'
-e, 's/,Leave/,Remain/g'
in a file data.csv.
Using Gitbash, I use the file command to discover that this is ASCII text with CRLF terminators. If I also use the command cat -v , I see in Gitbash that each line ends ^M .
I want to remove those terminators, to leave a single line.
I've tried the following:
sed -e 's/'\r\n'//g' < data.csv > output.csv
taking care to put the \r\n in single quotes in order that the backslash is treated literally, but it does not work. No error, just no effect.
I'm using Gitbash for Windows.

Quotes within quotes cancel each other out, so you actually undo the quotes around the sed command for the newline characters. You could escape the quotes like 's|'\''\r\n'\''||g', but that would just include them in the string, which would not match anything in your case.
But that is not the only problem; sed by default only processes strings between newlines.
If you have the GNU version of sed, RAM to spare if the file is huge, and are sure the file does not contain data with null characters, try adding the -z argument, like:
sed -z -e 's|\r\n||g' < data.csv > output.csv
Though I guess you probably also want to replace it with a comma:
sed -z -e 's|\r\n|,|g' < data.csv > output.csv
For non-GNU versions of sed, you may have an easier time using tr instead, like:
tr '\r\n' ',' data.csv > output.csv

Related

Escaping a variable with special characters within sed - comment and uncomment an arbitrary line of source code

I need to comment out a line in a crontab file through a script, so it contains directories, spaces and symbols. This specific line is stored in a variable and I am starting to get mixed up on how to escape the variable. Since the line changes on a regular basis I dont want any escaping in there. I don't want to simply add # in front of it, since I also need to switch it around and replace the line again with the original without the #.
So the goal is to replace $line with #$line (comment) with the possibility to do it the other way around (uncomment).
So I have a variable:
line="* * * hello/this/line & /still/this/line"
This is a line that occurs in a file, file.txt. Wich needs to get comment out.
First try:
sed -i "s/^${line}/#${line}/" file.txt
Second try:
sed -i 's|'${line}'|'"#${line}"'|g' file.txt
choroba's helpful answer shows an effective solution using perl.
sed solution
If you want to use sed, you must use a separate sed command just to escape the $line variable value, because sed has no built-in way to escape strings for use as literals in a regex context:
lineEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$line") # escape $line for use in regex
sed -i "s/^$lineEscaped\$/#&/" file.txt # Note the \$ to escape the end-of-line anchor $
With BSD/macOS sed, use -i '' instead of just -i for in-place updating without backup.
And the reverse (un-commenting):
sed -i "s/^#\($lineEscaped\)\$/\1/" file.txt
See this answer of mine for an explanation of the sed command used for escaping, which should work with any input string.
Also note how variable $lineEscaped is only referenced once, in the regex portion of the s command, whereas the substitution-string portion simply references what the regex matched (which avoids the need to escape the variable again, using different rules):
& in the substitution string represents the entire match, and \1 the first capture group (parenthesized subexpression, \(...\)).
For simplicity, the second sed command uses double quotes in order to embed the value of shell variable $lineEscaped in the sed script, but it is generally preferable to use single-quoted scripts so as to avoid confusion between what the shell interprets up front vs. what sed ends up seeing.
For instance, $ is special to both the shell and sed, and in the above script the end-of-line anchor $ in the sed regex must therefore be escaped as \$ to prevent the shell from interpreting it.
One way to avoid confusion is to selectively splice double-quoted shell-variable references into the otherwise single-quoted script:
sed -i 's/^'"$lineEscaped"'$/#&/' file.txt
awk solution
awk offers literal string matching, which obviates the need for escaping:
awk -v line="$line" '$0 == line { $0 = "#" $0 } 1' file.txt > $$.tmp && mv $$.tmp file.txt
If you have GNU Awk v4.1+, you can use -i inplace for in-place updating.
And the reverse (un-commenting):
awk -v line="#$line" '$0 == line { $0 = substr($0, 2) } 1' file.txt > $$.tmp &&
mv $$.tmp file.txt
Perl has ways to do the quoting/escaping for you:
line=$line perl -i~ -pe '$regex = quotemeta $ENV{line}; s/^$regex/#$ENV{line}/' -- input.txt

Delete lines containing pattern at the end of line

Quite certainly I miss something basic. My file contains lines like
fooLOCATION=sdfmsvdnv
fooLOCATION=
barLOCATION=sadssf
barLOCATION=
and I want to delete all lines ending with LOCATION=.
sed -i '/LOCATION=$/d' file
does not do, it deletes nothing, and I have tried endless variations, but I don't get it. What inline sed command can do this?
There are two approaches here, either print all non-matching lines with
sed -in '/LOCATION=$/!p' file
or delete all matching names with
sed -i '/LOCATION=$/d' file
The first uses the n command line option to suppress the default action of printing the line. We then test for lines that end in LOCATION= and invert the pattern (only keeping those that don't match). When we get a desirable line, we print it with the p option.
The second looks for lines matching the end of line pattern, and deletes those that do.
Your file contains blank lines, and both of these keep those. If we don't want to keep those, we can change the first option to
sed -in '/^$/!{/LOCATION=$/!p}' file
which first checks if a line is not empty, and only bothers checking if it should be printed if it isn't empty. We can modify the second option to
sed -i '/^$/d;/LOCATION=$/d' file
which deletes blank lines and then checks about deleting the other pattern.
We can modify the options to work with different line ending by specifying the difference in the pattern. The difference between line endings on Unix/Linux (\n) and Windows (\r\n) is the presence of an extra carriage return on Windows. Modifying the four commands above to accept either, we get
sed -in '/LOCATION=\r\{0,1\}$/!p' file
sed -i '/LOCATION=\r\{0,1\}$/d' file
sed -in '/^\r\{0,1\}$/!{/LOCATION=\r\{0,1\}$/!p}' file
sed -i '/^\r\{0,1\}$/d;/LOCATION=\r\{0,1\}$/d' file
Note that in each of these we allow an optional \r before the end of line. We use the curly bracket notation, as sed does not support the question mark optional quantifier in normal mode (using the r option to GNU sed for enabling extended regular expressions, we can replace \{0,1\} with ?).
On a Windows shell, all of the options above require double quotes instead of single quotes.
Your command does work for me:
$ sed -i '/LOCATION=$/d' file
Results, viewed using cat:
$ cat file
fooLOCATION=sdfmsvdnv
barLOCATION=sadssf
Note
If a file has non-Unix line endings such as files from Windows with DOS-formatted line-endings, it can be a reason for failure. A typical remedy is to use dos2unix:
$ dos2unix file
This converter fixes the newline issues, so that file will now have Unix-style line endings. Sed should now properly recognize those line endings, so retry your sed command and it should work.
This might work for you (GNU sed):
sed -i '/LOCATION=\s*$/d' file
This deletes the line if LOCATION= is at the end of the line or if there is any optional white space following the pattern.

UNIX Replacing a character sequence in either tr or sed

Have a file that has been created incorrectly. There are several space delimited fields in the file but one text field has some unwanted newlines. This is causing a big problem.
How can I remove these characters but not the wanted line ends?
file is:
'Number field' 'Text field' 'Number field'
1 Some text 999999
2 more
text 111111111
3 Even more text 8888888888
EOF
So there is a NL after the word "more".
I've tried sed:
sed 's/.$//g' test.txt > test.out
and
sed 's/\n//g' test.txt > test.out
But none of these work. The newlines do not get removed.
tr -d '\n' does too much - I need to remove ONLY the newlines that are preceded by a space.
How can I delete newlines that follow a space?
SunOS 5.10 Generic_144488-09 sun4u sparc SUNW,Sun-Fire-V440
A sed solution is
sed '/ $/{N;s/\n//}'
Explanation:
/ $/: whenever the line ends in space, then
N: append a newline and the next line of input, and
s/\n//: delete the newline.
It might be simplest with Perl:
perl -p0 -e 's/ \n/ /g'
The -0 flag makes Perl read the entire file as one line. Then we can substitute using s in the usual way. You can, of course, also add the -i option to edit the file in-place.
How can I delete newlines that follow a space?
If you want every occurrence of $' \n' in the original file to be replaced by a space ($' '), and if you know of a character (e.g. a control character) that does not appear in the file, then the task can be accomplished quite simply using sed and tr (as you requested). Let's suppose, for example, that control-A is a character that is not in the file. For the sake of simplicity, let's also assume we can use bash. Then the following script should do the job:
#!/bin/bash
A=$'\01'
tr '\n' "$A" | sed "s/ $A/ /g" | tr "$A" '\n'

Replace a word with another set of strings in a UNIX file

When I try to replace a string using sed command it works perfectly fine.
For eg :
When i used the below sed command:
sed 's/DB_ALTER/DB_REPRISE/g' /product/dwhrec1/abc.ksh > /product/dwhrec1/abc1.ksh
This command works perfectly fine and replace all the "DB_ALTER" with "DB_REPRISE" and writes the result to abc1.ksh script.
But when I place all such values in a file. for eg:
cat Repla.txt
DB_ALTER
DB_CMD
DB_GEST_COMM
for i in `cat Repla.txt`
do
sed 's/$i/DB_REPRISE/g' /product/dwhrec1/abc.ksh > /product/dwhrec1/abc1.ksh
done
But this does not work. In my file Repla.txt is just an example. In actual it has many values.
Can anyone please help me on this command or suggest some alternative.
Thanks
There are two problems with your script. The first is that the $i variable appears within single quotes. That means that bash will not substitute for the value of i. It needs to be in double-quotes.
Secondly, every time that you run sed, it overwrites the previous abc1.ksh file. You should copy abc.ksh to abc1.ksh and then modify in place abc1.ksh as many times as needed:
cp abc.ksh abc1.ksh
for i in `cat Repla.txt`; do
sed -i'' "s/$i/DB_REPRISE/g" abc1.ksh
done
The -i flag to sed causes it to modify the file in place.
Also, bash will apply word splitting to cat Repla.txt. This can surprise people who were expecting it to work line-by-line, not word-by-word.
Workaround in case your sed does not support -i
The sed on both linux (GNU) and Mac OSX (BSD) support -i. If your sed does not, try:
cmd=
for i in `cat Repla.txt`; do
[ "$cmd" ] && cmd="$cmd;"
cmd="$cmd s/$i/DB_REPRISE/g"
done
sed "$cmd" abc.ksh >abc1.ksh
The above puts all the substitution commands that you need in a single shell variable. This way, sed only needs to be run once and -i is not used.
Another option
If it is acceptable to overwrite the source file, then:
for i in $(cat Repla.txt)
do
sed 's/'$i'/DB_REPRISE/g' abc.ksh >abc1.ksh
mv -f abc1.ksh abc.ksh
done
The above puts in single quotes all of the sed command except for the part that we want the shell to expand. This is not needed in this example but could be useful if your replacement text had shell-active characters. The above also uses the more modern $(...) in place of backquotes for command substitution.
If $i were to contain spaces (it doesn't here), we would need to enclose it in double-quotes to protect it against shell word splitting as in:
for i in $(cat Repla.txt)
do
sed 's/'"$i"'/DB_REPRISE/g' abc.ksh >abc1.ksh
mv -f abc1.ksh abc.ksh
done

sed + remove "#" and empty lines with one sed command

how to remove comment lines (as # bal bla ) and empty lines (lines without charecters) from file with one sed command?
THX
lidia
If you're worried about starting two sed processes in a pipeline for performance reasons, you probably shouldn't be, it's still very efficient. But based on your comment that you want to do in-place editing, you can still do that with distinct commands (sed commands rather than invocations of sed itself).
You can either use multiple -e arguments or separate commands with a semicolon, something like (just one of these, not both):
sed -i 's/#.*$//' -e '/^$/d' fileName
sed -i 's/#.*$//;/^$/d' fileName
The following transcript shows this in action:
pax> printf 'Line # with a comment\n\n# Line with only a comment\n' >file
pax> cat file
Line # with a comment
# Line with only a comment
pax> cp file filex ; sed -i 's/#.*$//;/^$/d' filex ; cat filex
Line
pax> cp file filex ; sed -i -e 's/#.*$//' -e '/^$/d' filex ; cat filex
Line
Note how the file is modified in-place even with two -e options. You can see that both commands are executed on each line. The line with a comment first has the comment removed then all is removed because it's empty.
In addition, the original empty line is also removed.
#paxdiablo has a good answer but it can be improved.
(1) The '/^$/d' clause only matches 100% blank lines.
If you want to also match lines that are entirely whitespace (spaces, tabs etc.) use this instead:
'/^\s*$/d'
(2) The 's/#.*$//' clause only matches lines that start with the # character in column 0.
If you want to also match lines that have only whitespace before the first # use this instead:
'/^\s*#.*$/d'
The above criteria may not be universal (e.g. within a HEREDOC block, or in a Python multi-line string the different approaches could be significant), but in many cases the conventional definition of "blank" lines include whitespace-only, and "comment" lines include whitespace-then-#.
(3) Lastly, on OSX at least, the #paxdiablo solution in which the first clause turns comment lines into blank lines, and the second clause strips blank lines (including what were originally comments) doesn't work. It seems to be more portable to make both clauses /d delete actions as I've done.
The revised command incorporating the above is:
sed -e '/^\s*#.*$/d' -e '/^\s*$/d' inputFile
This tiny jewel removes all # comments, no matter where they begin in a line (see caution below):
sed -e 's/\s*#.*$//'
Example:
text="
this is a # test
#this is a test
#this is a #test
this is # another #test
"
$echo "$text" | sed -e 's/\s*#.*$//'
this is a
this is
Next this removes any resulting blank lines:
$echo "$text" | sed -e 's/\s*#.*$//' | sed -e '/^\s*$/d'
Caution: Depending on the syntax and/or interpretation of the lines your processing, this might not be an appropriate solution, as it just stupidly removes end of lines, even if the '#' is part of your data or code. However, for use cases where you'll never use a hash except for as an end of line comment then it works fine. So just as with all coding, context must be taken into consideration.
Alternative variant, using grep:
cat file.txt | grep -Ev '(#.*$)|(^$)'
you can use awk
awk 'NF{gsub(/^[ \t]*#/,"");print}' file
First example(paxdiablo) is very good except its not change file, just output result. If you want to change it inline:
sudo sed -i 's/#.*$//;/^$/d' inputFile
On (one of) my linux boxes, sed understands extended regular expressions with the -r option, so:
sed -r '/(^\s*#)|(^\s*$)/d' squid.conf.installed
is very useful for showing all non-blank, non comment lines.
The regex matches either start of line followed by zero or more spaces or tabs followed by either a hash or end of line, and deletes those matching lines from the input.