File contains both CRLF and also LF. I want to delete CRLF

I have a file that contains both CRLF and LF line endings. I want to delete the CRLF sequences and leave the existing LF characters in place. I'm having trouble because sed doesn't seem able to match CRLF, and tr deletes both CRLF and LF at the same time. I've tried:
tr -d '\r\n' < file.txt > file2.txt
but that also deletes the bare LFs in my file. I tried deleting only \r, but then the LF from each CRLF gets left behind. I want each CRLF to be deleted, or at least turned into a space, so that the row doesn't end there.
I've tried sed as well, but I'm not able to match CRLF.
Any help?

This will convert a DOS-format file to Linux/Unix format, turning each CRLF into a plain LF:
dos2unix file

The argument to tr is a list of characters to operate on, not a string, so \r\n means \r or \n, not \r followed by \n. By default sed can't remove \r\n because it reads one \n-terminated line at a time and strips the \n as it does so. There is never a \r\n in sed's buffer, just the \r that preceded the \n.
You can do this with GNU awk for multi-char RS:
awk -v RS='\r\n' -v ORS= '1' file
or with GNU sed for -z (but this will read the whole file into memory at once):
sed -z 's/\r\n//g' file
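If neither GNU awk nor GNU sed is available, the same idea can be sketched with any POSIX awk by working line by line: a line that still ends in \r came from a CRLF, so it is printed without the \r and without a newline, and every other line is printed as usual. This is a minimal sketch rather than one of the commands above; the sample file and the variable names are made up for illustration.

```shell
# Hypothetical sample: "a" and "b" are separated by CRLF, "b" and "c" by a bare LF.
sample=$(mktemp)
printf 'a\r\nb\nc\n' > "$sample"

# tr works on a set of characters, so this drops every \r and every \n.
tr_result=$(tr -d '\r\n' < "$sample")

# awk strips the trailing \n when it reads a line. If a \r is left at the end,
# remove it and print without a newline (joining the next line); otherwise
# print the line with its LF restored.
awk_result=$(awk '{ if (sub(/\r$/, "")) printf "%s", $0; else print }' "$sample")

printf 'tr:  %s\n' "$tr_result"
printf 'awk: %s\n' "$awk_result"
rm -f "$sample"
```

Because awk processes one line at a time here, the file is never held in memory as a whole.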

Related

Removing line breaks from CSV exported from Google Sheets

I have some data in the format:
-e, 's/,Chalk/,Cheese/g'
-e, 's/,Black/,White/g'
-e, 's/,Leave/,Remain/g'
in a file data.csv.
Using Git Bash, I ran the file command and found that this is ASCII text with CRLF terminators. If I also run cat -v, I see that each line ends in ^M.
I want to remove those terminators, to leave a single line.
I've tried the following:
sed -e 's/'\r\n'//g' < data.csv > output.csv
taking care to put the \r\n in single quotes in order that the backslash is treated literally, but it does not work. No error, just no effect.
I'm using Git Bash for Windows.

Quotes within quotes cancel each other out: the inner single quotes end and restart the quoted string, so the \r\n ends up outside any quotes. The shell then removes the backslashes, and sed receives s/rn//g. You could escape the quotes like 's|'\''\r\n'\''||g', but that would just include them in the string, which would not match anything in your case.
But that is not the only problem; by default sed only processes the text between newlines, one line at a time.
If you have the GNU version of sed, RAM to spare if the file is huge, and are sure the file does not contain data with null characters, try adding the -z argument, like:
sed -z -e 's|\r\n||g' < data.csv > output.csv
Though I guess you probably also want to replace it with a comma:
sed -z -e 's|\r\n|,|g' < data.csv > output.csv
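For non-GNU versions of sed, you may have an easier time using tr instead. Note that tr reads standard input rather than a file argument, and that mapping both \r and \n to a comma would produce two commas per line ending, so delete the \r first:
tr -d '\r' < data.csv | tr '\n' ',' > output.csv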
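To see why the original command has no effect, it can help to print the argument exactly as the shell hands it to sed. The sketch below does that and then joins a CRLF file with commas using tr and paste. The sample data and variable names are hypothetical, and paste is an addition that the answer above does not mention.

```shell
# The inner quotes end and restart the quoted string, so \r\n sits outside
# any quotes and the shell removes the backslashes before sed runs.
sed_arg='s/'\r\n'//g'
printf 'sed receives: %s\n' "$sed_arg"

# Hypothetical three-line sample with CRLF terminators.
sample=$(mktemp)
printf 'one\r\ntwo\r\nthree\r\n' > "$sample"

# Delete every \r, then let paste join the remaining lines with commas.
joined=$(tr -d '\r' < "$sample" | paste -s -d ',' -)
printf '%s\n' "$joined"
rm -f "$sample"
```

Unlike mapping \n to a comma with tr, paste does not leave a trailing comma after the last field.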

Delete lines containing pattern at the end of line

Quite certainly I miss something basic. My file contains lines like
fooLOCATION=sdfmsvdnv
fooLOCATION=
barLOCATION=sadssf
barLOCATION=
and I want to delete all lines ending with LOCATION=.
sed -i '/LOCATION=$/d' file
does not do, it deletes nothing, and I have tried endless variations, but I don't get it. What inline sed command can do this?

There are two approaches here: either print all non-matching lines with
sed -n -i '/LOCATION=$/!p' file
or delete all matching lines with
sed -i '/LOCATION=$/d' file
The first uses the -n command-line option to suppress the default action of printing each line. (The options are written separately because GNU sed reads -in as -i with a backup suffix of n, not as -i plus -n.) We then test for lines that end in LOCATION= and invert the match, keeping only those that don't match. When we get a desirable line, we print it with the p command.
The second looks for lines matching the end of line pattern, and deletes those that do.
Your file contains blank lines, and both of these keep those. If we don't want to keep those, we can change the first option to
sed -n -i '/^$/!{/LOCATION=$/!p}' file
which first checks if a line is not empty, and only bothers checking if it should be printed if it isn't empty. We can modify the second option to
sed -i '/^$/d;/LOCATION=$/d' file
which deletes blank lines and then checks about deleting the other pattern.
We can modify the commands to work with different line endings by specifying the difference in the pattern. The difference between line endings on Unix/Linux (\n) and Windows (\r\n) is the presence of an extra carriage return on Windows. Modifying the four commands above to accept either, we get
sed -n -i '/LOCATION=\r\{0,1\}$/!p' file
sed -i '/LOCATION=\r\{0,1\}$/d' file
sed -n -i '/^\r\{0,1\}$/!{/LOCATION=\r\{0,1\}$/!p}' file
sed -i '/^\r\{0,1\}$/d;/LOCATION=\r\{0,1\}$/d' file
Note that in each of these we allow an optional \r before the end of line. We use the curly bracket notation because sed does not support the question mark quantifier in its default basic regular expression mode (with the -r option of GNU sed, which enables extended regular expressions, we can replace \{0,1\} with ?).
On a Windows shell, all of the options above require double quotes instead of single quotes.
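The effect of the carriage return can be checked on a small CRLF sample. This sketch assumes a POSIX shell and puts a literal CR into the pattern through a variable instead of relying on \r, which not every sed understands. The file contents and variable names are illustrative only.

```shell
# Hypothetical sample with Windows (CRLF) line endings.
sample=$(mktemp)
printf 'fooLOCATION=abc\r\nfooLOCATION=\r\nbarLOCATION=xyz\r\nbarLOCATION=\r\n' > "$sample"

# A literal carriage return; command substitution only strips trailing newlines.
cr=$(printf '\r')

# The plain pattern never matches, because each line really ends in "=\r".
plain_count=$(sed '/LOCATION=$/d' "$sample" | wc -l | tr -d ' ')

# Allowing zero or one CR before the end of line deletes the intended lines.
tolerant_count=$(sed "/LOCATION=${cr}\{0,1\}\$/d" "$sample" | wc -l | tr -d ' ')

printf 'lines left by plain pattern:    %s\n' "$plain_count"
printf 'lines left by tolerant pattern: %s\n' "$tolerant_count"
rm -f "$sample"
```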

Your command does work for me:
$ sed -i '/LOCATION=$/d' file
Results, viewed using cat:
$ cat file
fooLOCATION=sdfmsvdnv
barLOCATION=sadssf
Note
If a file has non-Unix line endings such as files from Windows with DOS-formatted line-endings, it can be a reason for failure. A typical remedy is to use dos2unix:
$ dos2unix file
This converter fixes the newline issues, so that file will now have Unix-style line endings. Sed should now properly recognize those line endings, so retry your sed command and it should work.
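Where dos2unix is not installed, a rough equivalent can be sketched with grep and tr. Note that tr -d '\r' removes every carriage return, including any in the middle of a line, so this assumes the file has none it needs to keep. The file names here are hypothetical.

```shell
work=$(mktemp -d)
printf 'fooLOCATION=abc\r\nfooLOCATION=\r\n' > "$work/file"

cr=$(printf '\r')
# Count the lines that end in a carriage return.
crlf_lines=$(grep -c "${cr}\$" "$work/file")

# Write a converted copy, then run the original sed command on it.
tr -d '\r' < "$work/file" > "$work/file.unix"
result=$(sed '/LOCATION=$/d' "$work/file.unix")

printf 'CRLF lines: %s\n' "$crlf_lines"
printf '%s\n' "$result"
rm -rf "$work"
```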

This might work for you (GNU sed):
sed -i '/LOCATION=\s*$/d' file
This deletes the line if LOCATION= is at the end of the line or if there is any optional white space following the pattern.

UNIX Replacing a character sequence in either tr or sed

Have a file that has been created incorrectly. There are several space delimited fields in the file but one text field has some unwanted newlines. This is causing a big problem.
How can I remove these characters but not the wanted line ends?
file is:
'Number field' 'Text field' 'Number field'
1 Some text 999999
2 more
text 111111111
3 Even more text 8888888888
EOF
So there is a NL after the word "more".
I've tried sed:
sed 's/.$//g' test.txt > test.out
and
sed 's/\n//g' test.txt > test.out
But none of these work. The newlines do not get removed.
tr -d '\n' does too much - I need to remove ONLY the newlines that are preceded by a space.
How can I delete newlines that follow a space?
SunOS 5.10 Generic_144488-09 sun4u sparc SUNW,Sun-Fire-V440

A sed solution is
sed '/ $/{N;s/\n//}'
Explanation:
/ $/: whenever the line ends in space, then
N: append a newline and the next line of input, and
s/\n//: delete the newline.
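Run against the sample from the question, the command can be checked as follows. This sketch assumes the stray newline after "more" is preceded by a space, as the question states. A ; is added before the closing } because POSIX sed expects a ; or a newline there.

```shell
sample=$(mktemp)
# Line 2 ends in a space followed by an unwanted newline.
printf '1 Some text 999999\n2 more \ntext 111111111\n3 Even more text 8888888888\n' > "$sample"

# When a line ends in a space, append the next line and delete the newline between them.
fixed=$(sed '/ $/{N;s/\n//;}' "$sample")
printf '%s\n' "$fixed"
rm -f "$sample"
```

Each cycle joins at most one following line, so a text field broken across three or more lines would need a loop.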

It might be simplest with Perl:
perl -p0 -e 's/ \n/ /g'
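The -0 flag with no digits sets Perl's input record separator to the NUL character, so a text file containing no NUL bytes is read as a single record (use -0777 to slurp the whole file regardless). Then we can substitute using s in the usual way. You can, of course, also add the -i option to edit the file in place.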

How can I delete newlines that follow a space?
If you want every occurrence of $' \n' in the original file to be replaced by a space ($' '), and if you know of a character (e.g. a control character) that does not appear in the file, then the task can be accomplished quite simply using sed and tr (as you requested). Let's suppose, for example, that control-A is a character that is not in the file. For the sake of simplicity, let's also assume we can use bash. Then the following script should do the job:
#!/bin/bash
A=$'\01'
tr '\n' "$A" | sed "s/ $A/ /g" | tr "$A" '\n'

Using sed to keep the beginning of a line

I have a file in which some lines start with a >.
For these lines, and only these, I want to keep the first eleven characters.
How can I do that using sed?
Or maybe something else is better?
Thanks !
Muriel

Let's start with this test file:
$ cat file
line one with something or other
>1234567890abc
other line in file
To keep only the first 11 characters of lines starting with > while keeping all other lines:
$ sed -r '/^>/ s/(.{11}).*/\1/' file
line one with something or other
>1234567890
other line in file
To keep only the first eleven characters of lines starting with > and deleting all other lines:
$ sed -rn '/^>/ s/(.{11}).*/\1/p' file
>1234567890
The above was tested with GNU sed. For BSD sed, replace the -r option with -E.
Explanation:
/^>/ is a condition. It means that the command which follows only applies to lines that start with >
s/(.{11}).*/\1/ is a substitution command. It replaces the whole line with just the first eleven characters.
-r turns on extended regular expression format, eliminating the need for some escape characters.
-n turns off automatic printing. With -n in effect, lines are only printed if we explicitly ask them to be printed. In the second case above, that is done by adding a p after the substitute command.
Other forms:
$ sed -r 's/(>.{10}).*/\1/' file
line one with something or other
>1234567890
other line in file
And:
$ sed -rn 's/(>.{10}).*/\1/p' file
>1234567890
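As for whether something else might be better: an awk version avoids the regular expression grouping altogether. This is only a sketch of an alternative, not one of the tested sed commands above. It assumes single-byte characters, since some awk implementations count bytes rather than characters in substr.

```shell
sample=$(mktemp)
printf 'line one with something or other\n>1234567890abc\nother line in file\n' > "$sample"

# For lines starting with ">", keep characters 1 through 11; the final 1 prints every line.
trimmed=$(awk '/^>/ { $0 = substr($0, 1, 11) } 1' "$sample")
printf '%s\n' "$trimmed"
rm -f "$sample"
```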

sed + removes all leading and trailing whitespace from each line on solaris system

I have a Solaris machine (SunOS su1a 5.10 Generic_142900-15 sun4v sparc SUNW,Netra-T2000).
The following sed syntax removes all leading and trailing whitespace from each line (I need to remove whitespace because it causes application problems).
sed 's/^[ \t]*//;s/[ \t]*$//' orig_file > new_file
But I noticed that sed also removes the "t" character from the end of each line.
Please advise how to fix the sed syntax/command in order to remove only the leading and trailing whitespace from each line (the solution can be also with Perl or AWK).
Examples (take a look at the last string - set_host)
1)
Original line before running sed command
pack/configuration/param[14]/action:set_host
another example (before I run sed)
+/etc/cp/config/Network-Configuration/Network-Configuration.xml:/cp-pack/configuration/param[8]/action:set_host
2)
the line after I run the sed command
pack/configuration/param[14]/action:set_hos
another example (after I run sed)
+/etc/cp/config/Network-Configuration/Network-Configuration.xml:/cp-pack/configuration/param[8]/action:set_hos

Just occurred to me you can use a character class:
sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
The problem occurs in your sed, and in GNU sed with the --posix option, because (evidently) POSIX interprets [ \t] as a space, a \, or a t. You can fix this by putting a literal tab in place of \t; the easiest way is probably Ctrl+V Tab. If that doesn't work, put the patterns in a file (with the literal tabs) and use sed -f patterns.sed oldfile > newfile.
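One way to get a literal tab into the pattern without typing Ctrl+V Tab is to let printf produce it and store it in a shell variable. A minimal sketch, assuming a POSIX shell; the variable names and the sample line are made up:

```shell
# A literal tab character; command substitution only strips trailing newlines.
tab=$(printf '\t')

# Sample line with leading and trailing spaces and tabs.
line=$(printf '\t  pack/configuration/param[14]/action:set_host \t')

# The bracket expressions hold a space and a real tab, so a trailing "t" is left alone.
trimmed=$(printf '%s\n' "$line" | sed "s/^[ ${tab}]*//;s/[ ${tab}]*\$//")
printf '[%s]\n' "$trimmed"
```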

As @aix noted, the problem is undoubtedly that your sed doesn't understand \t. While GNU sed does, many proprietary Unix flavors don't. HP-UX is one and I believe Solaris is too. If you can't install GNU sed, I'd look to Perl:
perl -pi.old -e 's{^\s+}{};s{\s+$}{}' file
...will trim one or more leading white space (^\s+) [spaces and/or tabs] together with trailing white space (\s+$) updating the file in situ leaving a backup copy as "file.old".
