Remove newline characters in file [duplicate] - perl

This question already has answers here:
What's the most robust way to efficiently parse CSV using awk?
(6 answers)
Closed 5 years ago.
I have a text file with comma-separated values in which some column values contain newline characters. This splits a record across lines and causes data issues.
Sample data
"604","56-1203802","xx","VEN","null","50","1","20","N�
jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K�ï
¿½ï¿½}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"
Expected Output
"604","56-1203802","xx","VEN","null","50","1","20","N�jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K���}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"
I need to remove the newlines inside double-quoted strings.
I tried the below awk command to remove it, but it is not working as expected.
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' infile.txt > outfile.txt
The required result would be to remove the LF and CR characters from the data.
I tried the solutions posted for similar questions, but they did not work for me.
The newline characters are not visible in the file unless it is copied into Notepad++, which shows them as CR LF.

You can try this sed:
sed ':loop; /" *$/!{N;s/\n//g; b loop}' file
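Note that the file in the question has CR LF line endings, so every line really ends in a carriage return and the closing-quote test may never match. A minimal sketch of one way around that, assuming it is acceptable to drop every CR and keep LF as the record terminator; the sample rows are made up for illustration:

```shell
# Made-up sample: record 1 has a CR LF inside its last quoted field.
joined=$(printf '"604","N\r\njT"\r\n"5526","ok"\r\n' |
  tr -d '\r' |
  sed -e ':loop' -e '/" *$/!{' -e 'N' -e 's/\n//g' -e 'b loop' -e '}')
# Each record should now sit on a single line.
printf '%s\n' "$joined"
```

The script is split into several -e parts only so that the label and the closing brace are also accepted by non-GNU sed.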

Related

SED Replace character with ' [duplicate]

This question already has answers here:
How do I replace single quotes with another character in sed?
(6 answers)
Closed 5 months ago.
I am trying to put ' before each line of text and ' at the end of each line of text.
I have been using sed 's/^/1/' file.txt to insert at the beginning of each line and sed 's/$/0/' file.txt to append at the end of each line.
What I am trying to make work is sed 's/^/'/' and sed 's/$/'/'
This would format my file so that each line reads as a command when applied to a separate script.
echo abc | sed "s/.*/'&'/"
Output:
'abc'
From man sed:
The replacement may contain the special character & to refer to that portion of the pattern space which matched.
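The attempts in the question fail because a single quote cannot appear inside a single-quoted shell string. A small sketch of two common ways to quote the script, using made-up input lines:

```shell
# Made-up input lines.
input=$(printf 'abc\ndef ghi')
# Double-quoted script: single quotes need no escaping inside it.
wrapped_dq=$(printf '%s\n' "$input" | sed "s/.*/'&'/")
# Single-quoted script: close the quote, add an escaped quote, reopen.
wrapped_sq=$(printf '%s\n' "$input" | sed 's/.*/'\''&'\''/')
printf '%s\n' "$wrapped_dq"
```

Both forms hand sed the same script, so pick whichever is easier to read in your shell.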

Remove string from the beginning and the end of line keeping the ones in the middle (sed) [duplicate]

This question already has answers here:
How to remove the leading and trailing space from each line of a file using shell script?
(2 answers)
Closed 8 months ago.
The community reviewed whether to reopen this question 8 months ago and left it closed:
Original close reason(s) were not resolved
I have the following text:
>seq1
--A--CGT-A--
>seq2
-GA-T-A-CC--
I would like to remove all "-" from the beginning and the end of the lines, i.e., keeping the "-" between the letters. Expected output:
>seq1
A--CGT-A
>seq2
GA-T-A-CC
I have tried the following sed, but it deletes only the "-" from the beginning.
sed 's/^\(-\)*//'
Can anyone help, please?
You can use
sed 's/^-*\|-*$//g' file
sed -E 's/^-*|-*$//g' file
sed -E 's/^-+|-+$//g' file
Each of the commands removes hyphens from the start and from the end of the lines. Note the g flag that enables multiple matching on the same line.
To support lines with leading or trailing whitespace, add [[:space:]] or \s:
sed 's/^\s*-*\|-*\s*$//g'
sed -E 's/^[[:space:]]*-*|-*[[:space:]]*$//g'
Note: \s and \| examples are only valid for GNU sed.
See the online demo:
#!/bin/bash
s='>seq1
--A--CGT-A--
>seq2
-GA-T-A-CC--'
sed 's/^-*\|-*$//g' <<< "$s"
Output:
>seq1
A--CGT-A
>seq2
GA-T-A-CC
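If GNU extensions are not available, the alternation can be avoided by using two separate substitutions, one per end of the line. A minimal sketch, assuming header lines start with > in column one and should be left alone:

```shell
seqs=$(printf '>seq1\n--A--CGT-A--\n>seq2\n-GA-T-A-CC--')
# First expression strips leading dashes, second strips trailing ones;
# lines starting with > are skipped by both.
trimmed=$(printf '%s\n' "$seqs" | sed -e '/^>/!s/^-*//' -e '/^>/!s/-*$//')
printf '%s\n' "$trimmed"
```

Because each substitution is anchored to one end, no g flag is needed here.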
This removes leading and trailing dashes on every line not beginning with > (including indented >).
sed -E '/^[[:space:]]*>/!{s/^-+|-+$//g}'
Consider allowing for indented dashes as well, for example by matching optional leading whitespace: s/^[[:space:]]*-+//.

How to reverse all words in line with sed? [duplicate]

This question already has answers here:
Reverse input order with sed
(6 answers)
Closed 4 years ago.
For example, we have:
This is the song that doesn`t end
What sed command will turn it into this?
end doesn`t that song the is This
I've found only how to reverse lines in a file (a.k.a. tac):
sed -n '1!G;h;$p'
This might work for you (GNU sed):
sed -r 'G;:a;s/^(\S+)(\s*)(.*\n)/\3\2\1/;ta;s/\n//' file
Append a newline as a delimiter. Split the current line into three and prepend the first word, the following space and the remainder of the line following the newline in that order. Iterate until the pattern matching fails and then remove the introduced newline.
Could you please try the following and let me know if this helps you.
awk '{for(i=NF;i>0;i--){printf("%s%s",$i,(i>1?OFS:ORS))}}' Input_file
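A quick way to check the awk approach on the sentence from the question (a sketch; note that runs of whitespace between words collapse to a single OFS):

```shell
line='This is the song that doesn`t end'
# Walk the fields from last to first; print OFS between words and ORS after the last one.
reversed=$(printf '%s\n' "$line" |
  awk '{ for (i = NF; i > 0; i--) printf("%s%s", $i, (i > 1 ? OFS : ORS)) }')
printf '%s\n' "$reversed"
```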

Unable to remove carriage returns and line feeds in columns enclosed in double quotes [duplicate]

This question already has answers here:
What's the most robust way to efficiently parse CSV using awk?
(6 answers)
Closed 5 years ago.
I want to remove any non-printable newline characters in the column data.
I have enclosed all the columns in double quotes so that newline characters inside a column can be deleted easily while the record delimiter at the end of each line is left alone.
Say I have 4 columns separated by commas and enclosed in quotes in a text file.
I'm trying to remove \n and \r characters only if they appear between the double quotes.
I tried tr, but it deleted every line break and turned the file into one long line without any record separator.
tr -d '\n\r' < in.txt > out.txt
Sample data:
"1","test\n
Sample","data","col4"\n
"2\n
","Test","Sample","data" \n
"3","Sam\n
ple","te\n
st","data"\n
Expected Output:
"1","testSample","data","col4"\n
"2","Test","Sample","data" \n
"3","Sample","test","data"\n
Any suggestions ? Thanks in advance
With GNU sed
sed ':a;N;$!ba;s/\("[^\n\r]*\)[\n\r]*\([^\n\r]*"\)/\1\2/g' file
See this post for the newline replacement without the enclosing ".
Could you please try awk solution and let me know if this helps you.
awk '{gsub(/\r/,"");printf("%s%s",$0,$0~/,$/?"":RS)}' Input_file
Output will be as follows.
"1","test","Sample","data"\n
"2","Test" \n
"3","Sample"
Explanation: printf is given two %s placeholders. The first prints the current line; the second prints nothing if the line ends with a comma (,) and a newline (RS) otherwise. The gsub(/\r/,"") call before printf removes carriage returns, which is needed to get the expected output you showed.
EDIT: Since your post title mentions removing carriage returns, you could also try the following to remove only those. Please state your problem more clearly, though.
tr -d '\r' < Input_file > temp_file && mv temp_file Input_file
The above removes the carriage returns from your Input_file and saves the result back to the same Input_file.
Here's a possible solution:
perl -pe 'if (tr/"// % 2) { chomp; $_ .= <>; redo; }'
If the current line has unbalanced quotes (i.e. an odd number of "), it must end in the middle of a field, so we chomp out the newline, append the next input line, and restart the loop.
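The same quote-parity idea can also be sketched in awk, which is a swapped-in technique rather than the one-liner above. It assumes every field is quoted, that any embedded quote is doubled (""), and that dropping all CRs and emitting LF-terminated records is acceptable; the sample rows are shortened from the question:

```shell
fixed=$(printf '"1","test\r\nSample","data","col4"\r\n"2\r\n","Test"\r\n' |
  awk '{
    gsub(/\r/, "")            # drop carriage returns
    buf = buf $0              # glue this physical line onto the pending record
    quotes += gsub(/"/, "&")  # count double quotes without changing the line
    if (quotes % 2 == 0) {    # quotes balanced: the record is complete
      print buf
      buf = ""
      quotes = 0
    }
  }
  END { if (buf != "") print buf }')
printf '%s\n' "$fixed"
```

A record is printed only once the running quote count is even, so a line break inside a quoted field never ends a record.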

Using command line to find and replace? [duplicate]

This question already has answers here:
Remove everything after the first / including the first / for each line
(3 answers)
Closed 8 years ago.
I'm trying to use command line to find and replace some text. I have a file with a few million lines that are similar to this:
Something-Here/Grafton-WV</loc>
More-Information/Claremore-OK</loc>
This-Is-It/Seminole-OK</loc>
Your-Company/Lunenburg-MA</loc>
What I need to do is remove the slash and everything after it. I've done wildcard find/replace before but I'm not sure what command would need to be used to start at the slash and continue until the end of the line.
Here's what the output should be:
Something-Here
More-Information
This-Is-It
Your-Company
The following one-liner could work for you:
perl -pe 's{/.*}{}' file.txt
Explanation:
Switches:
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
s{/.*}{}: Remove the first forward slash and everything after it on the line (the trailing newline is kept, since . does not match it)
This is usually done with sed:
sed 's|/.*||' file.txt > newfile.txt
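For a quick check on a couple of the sample lines, here is a small sketch comparing the sed command with cut, which is another common tool for keeping only the text before the first delimiter:

```shell
urls=$(printf 'Something-Here/Grafton-WV</loc>\nYour-Company/Lunenburg-MA</loc>')
# sed: delete from the first slash to the end of the line.
with_sed=$(printf '%s\n' "$urls" | sed 's|/.*||')
# cut: keep only the first slash-delimited field.
with_cut=$(printf '%s\n' "$urls" | cut -d/ -f1)
printf '%s\n' "$with_sed"
```

Lines that contain no slash pass through unchanged with either command.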