Perl from command line: When replace a string in a file it removes also the new lines


I'm using perl from the command line to replace duplicate spaces in a text file.
The command I use is:
perl -pi -e 's/\s+/ /g' file.csv
The problem: this command also removes the newlines in the resulting file...
Any idea why this occurs?
Thanks!

\s matches whitespace, which includes the five characters [ \f\n\r\t] (Perl 5.18 and later also include the vertical tab). So you're replacing the newlines with single spaces as well.
In your case, the simplest fix is to enable automatic line-ending processing with the -l flag:
perl -pi -le 's/\s+/ /g' file.csv
This way, each newline is chomped before the -e code runs and appended again when the line is printed.

I'll add my two cents to the previous answer.
If you use this regex in a Perl script itself, you can just change it to:
s/[ ]+/ /g;
That matches only spaces, so it squeezes the spaces on every line without deleting the line endings.

Related

GREP Print Blank Lines For Non-Matches

I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very
is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not preserve the blank second line in the example above. I have searched for over an hour and tried a few things, but nothing has worked out.
I'll happily use a solution in sed if that's easier!

You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
See the online demo yielding
is very
is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses the default line output
s/.*This(.*)new.*|.*/\1/ - matches any text, This, any text (captured into Group 1, \1), new, and then any text again; or else the whole string (in sed, the line). It replaces the match with the Group 1 value, which is empty when the second alternative matched.
p - prints the result of the substitution.
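Here is the same command, sketched against a file instead of a bash here-string so that it also runs in a plain POSIX sh. It assumes GNU sed. Group 1 keeps the spaces around the captured text, so the matching lines actually come out as " is very " and " is not so ". The non-matching line becomes an empty line.

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'This is very new\nThis is quite old\nThis is not so new\n' > "$tmp"

# First alternative: keep only the text between This and new (Group 1).
# Second alternative (.*): matches the whole line, so \1 is empty and
# p prints an empty line for lines without This...new.
out=$(sed -En 's/.*This(.*)new.*|.*/\1/p' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```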
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
See this online demo. The [^"]* matches zero or more chars other than a " char.

With your shown samples, please try the following awk code.
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: This\\s+ OR \\s+new is set as the field separator for all lines of Input_file. In the main program, if NF (the number of fields) is 3, print the 2nd field (next then moves on to the next line). Otherwise, when NF is not 3, simply print a blank line. The second version drops the NF!=3 test because next already skips every line handled by the first block.
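\s inside an awk regex is a GNU awk extension, so the field separator above may not work in other awk implementations (for example mawk). Here is a sketch of the same program using the POSIX bracket expression [[:space:]] instead, run on the sample input. Because the spaces are part of the separator, they are not included in the printed field.

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'This is very new\nThis is quite old\nThis is not so new\n' > "$tmp"

# Separator: "This" plus following whitespace, or whitespace plus "new".
# A matching line splits into 3 fields: "", the middle text, "".
out=$(awk -F'This[[:space:]]+|[[:space:]]+new' 'NF==3{print $2; next} {print ""}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```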

sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
First line: for lines not matching "This.*new", remove all characters, leaving a blank line.
Second line: for lines matching the pattern, keep only the "middle" text.
Note that this is not the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
To get the PCRE-style non-greedy match, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file

This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first alternative matches, the line is replaced by everything between This and new.
Otherwise, the second alternative matches the whole line and removes everything.
N.B. The substitution will always match one of the alternatives. The solution was suggested by Wiktor Stribiżew.

Perl one liner to add text at last but one line of a large file

I am a novice at Perl. Please help me with the programming, using either a one-liner, a Perl proc or a Perl program.
Let's suppose my input file is input.txt and its contents are as follows:
This is an example
This file has three lines
Oh you are mistaken. It has many lines
I want my text here
Thanks for making it to the last line of input.txt.
Below is the output file that I want to generate:
This is an example
This file has three lines
Oh you are mistaken. It has many lines
I want my text here
This line has special characters like $
I love this community
Thanks for making it to the last line of input.txt
I am running this on tcsh. I used the one-liner below:
perl -p -e 'print "This line has special characters like $ \nI love this community" if $. == 9' input.txt > output.txt
The problem is that, in the above example, I know the number of the last line. But in my case, the length of input.txt keeps changing. What changes should I make to the one-liner so that it works without my giving the last line number?
Note: please don't suggest using sed. I tried sed and succeeded in performing the required task. However, my input file is around 325MB and sed takes nearly 25 minutes to do this. I want it done in less than 5 minutes.
Perl version being used: v5.10.1

Instead of a fixed line number, check with eof whether the current line is the last one in the input file:
perl -pe 'print "This line has special characters like \$ \nI love this community\n" if eof' input.txt > output.txt

Using GNU sed to insert text before the last line of input:
sed '$i This line has special characters like $\nI love this community' input.txt > output.txt

Perl one line removes spaces

I pieced together this one-liner to change the values in a CSV file. It works perfectly except that it removes all the spaces. If someone could explain what I'm doing wrong, I would appreciate it.
perl -pne 's/\s+(-?\d+\.?\d*)/$1>100?1000:$1/ge' file

Everything matching the LHS of your regex
\s+(-?\d+\.?\d*)
will be replaced. That includes the whitespace matched by \s+. You can use a zero-width look-behind assertion as Matt suggested:
perl -pe 's/(?<=\s)(-?\d+\.?\d*)/$1>100?1000:$1/ge' file
or the special \K form, which will "keep" everything before the \K:
perl -pe 's/\s+\K(-?\d+\.?\d*)/$1>100?1000:$1/ge' file
Note that both -p and -n loop over every line of your input file(s), so you only need one or the other (-p overrides -n if you specify both). I used -p because it prints each line automatically. Details are in perldoc perlrun.

put all separate paragraphs of a file into a separate line

I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contains a new sequence:
#example
ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK
SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH
and I want to end up with a file looking like:
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
Each sequence is the same length (if that helps).
I would also like to do this over multiple files stored in different directories.
I have just tried
sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt
however, this just deleted the entire file :S
Any help would be appreciated. It doesn't have to be sed; if you know how to do it in perl or something else, that's also great.
Thanks.

All you're asking to do is convert a file of blank-lines-separated records (RS) where each field is separated by newlines into a file of newline-separated records where each field is separated by nothing (OFS). Just set the appropriate awk variables and recompile the record:
$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
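Here is a sketch that runs the command on the sample sequences from the question. The two blank lines between the sequences are assumed, since the question says paragraphs are separated by them. With the default FS, spaces and tabs inside a record would also be removed, which is harmless for sequence data like this.

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'ASDHJDJJDMFFMF\nAKAKJSJSJSL---\nSMSM-....SKSKK\n....SK\n\n\nSKJHDDSNLDJSCC\nAK..SJSJSL--HG\nAHSM---..SKSKK\n-.-GHH\n' > "$tmp"

# RS= (empty) is paragraph mode: records are separated by blank lines and
# newlines act as field separators. $1=$1 rebuilds each record with OFS=
# (empty), so the fields are joined with nothing between them.
out=$(awk '{$1=$1}1' RS= OFS= "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```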

awk '
/^[[:space:]]*$/ {if (line) print line; line=""; next}
{line=line $0}
END {if (line) print line}
'
perl -00 -pe 's/\n//g; $_.="\n"'
For multiple files:
# adjust your glob pattern to suit,
# don't be shy to ask for assistance
for file in */*.txt; do
newfile="/some/directory/$(basename "$file")"
perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done

A Perl one-liner, if you prefer:
perl -nle 'BEGIN{$/=""};s/\n//g;print $_' file
The $/ variable is the equivalent of awk's RS variable. Setting it to the empty string ("") turns on the so-called "paragraph mode" of reading: records are separated by blank lines, and two or more consecutive empty lines are treated as a single separator. For each record read, all newline characters are removed. The -l switch then adds a newline to the end of each output string, giving the desired result.

Just find the double line breaks (\n\n, or \r\n\r\n in Windows-style files) and first replace them with a special marker like :$:
After that, replace every remaining line break with an empty string to get the whole file on one line.
Next, replace your special marker with a single line break :)

replace two newlines to one in shell command line

There are a lot of questions about replacing multiple newlines with one newline, but none of them work for me.
I have a file:
first line
second line MARKER
third line MARKER
other lines
many other lines
I need to replace two newlines (if they exist) after MARKER to one newline. A result file should be:
first line
second line MARKER
third line MARKER
other lines
many other lines
I tried sed ':a;N;$!ba;s/MARKER\n\n/MARKER\n/g' Fail.
sed is useful for single-line replacements but has problems with newlines. It can't find \n\n.
I tried perl -i -p -e 's/MARKER\n\n/MARKER\n/g'. Fail.
This solution looks closer, but it seems that the regexp doesn't react to \n\n.
Is it possible to replace \n\n only after MARKER and not to replace other \n\n in the file?
I am interested in one-line-solution, not scripts.

I think you were on the right track. In a multi-line program, you would load the entire file into a single scalar and run this substitution on it:
s/MARKER\n\n/MARKER\n/g
The trick to getting a one-liner to load a file into a multi-line string is to set $/ in a BEGIN block. This code will get executed once, before the input is read.
perl -i -pe 'BEGIN{$/=undef} s/MARKER\n\n/MARKER\n/g' input

Your Perl solution doesn't work because you are searching for lines that contain two newlines. There is no such thing: -p reads one line at a time, so each line contains at most one newline, at its end. Here's one solution:
perl -ne'print if !$m || !/^$/; $m = /MARKER$/;' infile > outfile
Or in-place:
perl -i~ -ne'print if !$m || !/^$/; $m = /MARKER$/;' file
If you're ok with loading the entire file into memory, you can use
perl -0777pe's/MARKER\n\n/MARKER\n/g;' infile > outfile
or
perl -0777pe's/MARKER\n\K\n//g;' infile > outfile
As above, you can use -i~ to edit in-place. Remove the ~ if you don't want to make a backup.

awk:
kent$ cat a
first line
second line MARKER
third line MARKER
other lines
many other lines
kent$ awk 'BEGIN{RS="\034"} {gsub(/MARKER\n\n/,"MARKER\n"); printf "%s", $0}' a
first line
second line MARKER
third line MARKER
other lines
many other lines

See sed one liners.

awk '
marker { marker = 0; if (/^$/) next }
/MARKER/ { marker = 1 }
{ print }
'

This can be done in very simple sed.
sed '/MARKER$/{n;/./!d}'

This might work for you:
sed '/MARKER/,//{//!d}'
Explanation:
Deletes all lines between pairs of MARKER lines, preserving the MARKER lines themselves.
Or:
sed '/MARKER/{n;N;//D}'
Explanation:
Read the next line after MARKER, then append the line after that. Delete the previous line if the current line is a MARKER line.

We Keep Coding
