I want to create a newline containing incrementing numbers between each line of text - perl

I have lines of text as follows:
The cat and the mouse
Were in the house
They spotted some grouse
I want to put a new line between each line of text with an incrementing number after a ">" so that it looks like this
>1
The cat and the mouse
>2
Were in the house
>3
They spotted some grouse
I would like to do this in perl if possible so I can run it on a Mac. Can anyone help?

Something like this should do it:
perl -pe 'print ">$.\n"' foo.txt

You can use perl from command line,
perl -pe 's|^|>$.$/|' file
$. is current line number, and $/ is input record separator (usually newline \n)

Related

Perl one liner to add text at last but one line of a large file

I am novice to Perl. Please help me in the programming using either one liner or a Perl proc or a Perl program.
Let's suppose my input file is input.txt and its contents are as follows :
This is an example
This file has three lines
Oh you are mistaken. It has many lines
I want my text here
Thanks for making it to the last line of input.txt.
Below is the output file that I want to generate:
This is an example
This file has three lines
Oh you are mistaken. It has many lines
I want my text here
This line has special characters like $
I love this community
Thanks for making it to the last line of input.txt
I am running this on tcsh. I used the below one-liner :
perl -p -e 'print "This line has special characters like $ \nI love this community" if $. == 9' input.txt > output.txt
The problem is that, in the above example, I know the number of last line. But in my code, the length of input.txt keeps changing. What changes should I make to the one-liner so that it works even if I don't give the last line number.
Note: please don't suggest using sed. I tried with sed and I was successful at performing the required task. However, my input file is around 325MB and sed is taking nearly 25 mins to do this task. I want it to be done in less than 5 mins.
Perl version being used : v5.10.1
Instead of a fixed line number, check for the end of the input file with eof:
perl -pe 'print "This line has special characters like \$ \nI love this community\n" if eof' input.txt > output.txt
Using GNU sed to insert text before the last line of input:
sed '$i This line has special characters like $\nI love this community' input.txt > output.txt

How to find patterns across multiple lines using perl

I want to grep some strings spread across multiple lines within some begin and end pattern
Example:
MediaHelper->fetchStrings( names => [ //Here new line may or may not be
**'ubp-firstrun_heading',
'firstrun_text',
'_firstrun-or-start_search',
'installed'** //may end here also );
]);
Using perl or grep, how can I get the list of 4 strings here? The begin pattern is MediaHelper->fetchStrings(names => [ and the end pattern is );
Or are there any other suggestions using other commands like grep, sed or awk?
Try this:
sed -n '/MediaHelper->fetchStrings( names =>/,/);/ p' <yourfile>
Or, if you want to skip the delimiting lines, this:
sed -n '/MediaHelper->fetchStrings( names =>/,/);/ {/MediaHelper->fetchStrings( names =>/b; /^);/b; p}' <yourfile>
If I understand your question, you need to match all strings in all lines (and not just the MediaHelper thing).
If this is the case, then sed is the right tool, because it is by default line-oriented.
In our case, if you want to match the string in every line:
sed "s/.*\('.*'\).*/\1/" <your_file>
Hope it helps
Edit: To be more descriptive, first we need to match the whole line (that's the first and the last .*) and then we enclose in parentheses the part of the line we want to print, which in our case is everything inside single quotes. The \1 before the last delimiter denotes that we want to print the first parenthesized group (in our case it is also the last).
Just process the file in slurp mode instead of line by line:
perl -0777 -ne 'print $1 while m{MediaHelper->fetchStrings\(\s*names\s*=>\s*\[(.*?)\]}gs' file
Explanation:
Switches:
-0777: Slurp mode instead of line by line
-n: Wraps the code in a while(<>){..} loop over the input records; with -0777 the whole file is a single record.
-e: Tells perl to execute the code on command line.

put all separate paragraphs of a file into a separate line

I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contain a new sequence:
#example
ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK
SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH
and I want to end up with a file looking like:
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
each sequence is the same length (if that helps).
I would also be looking to do this over multiple files stored in different directories.
I have just tried
sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt
however this just deleted the entire file :S
any help would be appreciated - doesn't have to be in sed, if you know how to do it in perl or something else then that's also great.
Thanks.
All you're asking to do is convert a file of blank-lines-separated records (RS) where each field is separated by newlines into a file of newline-separated records where each field is separated by nothing (OFS). Just set the appropriate awk variables and recompile the record:
$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
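A quick way to check the RS/OFS idea on a small scale is sketched below. The shortened sequences and the `joined` variable are invented for the demonstration.

```shell
# Two short paragraphs separated by two blank lines (made-up data).
tmp=$(mktemp)
printf 'ASDHJ\nAKAKJ\n\n\nSKJHD\nAK..S\n' > "$tmp"

# RS= switches awk to paragraph mode, where newlines also separate fields;
# $1=$1 forces the record to be rebuilt with the empty OFS between fields.
joined=$(awk '{$1=$1}1' RS= OFS= "$tmp")
printf '%s\n' "$joined"
rm -f "$tmp"
```

Note that this relies on the sequences containing no spaces or tabs, since those would also be treated as field separators and removed.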
awk '
/^[[:space:]]*$/ {if (line) print line; line=""; next}
{line=line $0}
END {if (line) print line}
'
perl -00 -pe 's/\n//g; $_.="\n"'
For multiple files:
# adjust your glob pattern to suit,
# don't be shy to ask for assistance
for file in */*.txt; do
newfile="/some/directory/$(basename "$file")"
perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done
A Perl one-liner, if you prefer:
perl -nle 'BEGIN{$/=""};s/\n//g;print $_' file
The $/ variable is the equivalent of awk's RS variable. When set to the empty string ("") it causes two or more empty lines to be treated as one empty line. This is the so-called "paragraph mode" of reading. For each record read, all newline characters are removed. The -l switch adds a newline to the end of each output string, thus giving the desired result.
Just find the double line breaks (made of \n or \r) first and replace them with a special marker such as :$:
After that, replace every remaining line break with an empty string to get the whole file on one line.
Finally, replace the special marker with a single line break :)

Perl from command line: When replace a string in a file it removes also the new lines

I'm using perl from the command line to replace duplicate spaces in a text file.
The command I use is:
perl -pi -e 's/\s+/ /g' file.csv
The problem: This procedure also removes the newlines in the resulting file.
Any idea why this occurs?
Thanks!
\s matches whitespace: the five characters [ \f\n\r\t] (plus the vertical tab since Perl 5.18). So, you're replacing newlines with single spaces.
In your case, the simplest way is to enable automatic line-ending processing with -l flag:
perl -pi -le 's/\s+/ /g' file.csv
This way, newlines will be chomped before -e statement and appended after.
I'll add my two cents to the previous answer.
If you use this regexp in a Perl script itself, then you can just change it to:
s/[ ]+/ /g;
That matches only literal spaces, so it changes every line without deleting the line endings.

replace two newlines to one in shell command line

There are a lot of questions about replacing multiple newlines with one newline, but none of the answers works for me.
I have a file:
first line
second line MARKER
third line MARKER
other lines
many other lines
I need to replace two newlines (if they exist) after MARKER with one newline. A result file should be:
first line
second line MARKER
third line MARKER
other lines
many other lines
I tried sed ':a;N;$!ba;s/MARKER\n\n/MARKER\n/g' Fail.
sed is useful for single line replacements but has problems with newlines. It can't find \n\n
I tried perl -i -p -e 's/MARKER\n\n/MARKER\n/g' Fail.
This solution looks closer, but it seems that the regexp doesn't react to \n\n.
Is it possible to replace \n\n only after MARKER and not to replace other \n\n in the file?
I am interested in one-line-solution, not scripts.
I think you were on the right track. In a multi-line program, you would load the entire file into a single scalar and run this substitution on it:
s/MARKER\n\n/MARKER\n/g
The trick to getting a one-liner to load a file into a multi-line string is to set $/ in a BEGIN block. This code will get executed once, before the input is read.
perl -i -pe 'BEGIN{$/=undef} s/MARKER\n\n/MARKER\n/g' input
Your Perl solution doesn't work because you are searching for lines that contain two newlines. There is no such thing. Here's one solution:
perl -ne'print if !$m || !/^$/; $m = /MARKER$/;' infile > outfile
Or in-place:
perl -i~ -ne'print if !$m || !/^$/; $m = /MARKER$/;' file
If you're ok with loading the entire file into memory, you can use
perl -0777pe's/MARKER\n\n/MARKER\n/g;' infile > outfile
or
perl -0777pe's/MARKER\n\K\n//g;' infile > outfile
As above, you can use -i~ to edit in-place. Remove the ~ if you don't want to make a backup.
awk:
kent$ cat a
first line
second line MARKER
third line MARKER
other lines
many other lines
kent$ awk 'BEGIN{RS="\x034"} {gsub(/MARKER\n\n/,"MARKER\n");printf "%s", $0}' a
first line
second line MARKER
third line MARKER
other lines
many other lines
See sed one liners.
awk '
marker { marker = 0; if (/^$/) next }
/MARKER/ { marker = 1 }
{ print }
'
This can be done in very simple sed.
sed '/MARKER$/{n;/./!d}'
This might work for you:
sed '/MARKER/,//{//!d}'
Explanation:
Deletes all lines between MARKER lines, preserving the MARKER lines themselves.
Or:
sed '/MARKER/{n;N;//D}'
Explanation:
Read the next line after MARKER, then append the line after that. Delete the previous line if the current line is a MARKER line.