Perl one liner to add text at last but one line of a large file - perl

I am a novice in Perl. Please help me with this, using either a one-liner, a Perl subroutine, or a full Perl program.
Let's suppose my input file is input.txt and its contents are as follows :
This is an example
This file has three lines
Oh you are mistaken. It has many lines
I want my text here
Thanks for making it to the last line of input.txt.
Below is the output file that I want to generate:
This is an example
This file has three lines
Oh you are mistaken. It has many lines
I want my text here
This line has special characters like $
I love this community
Thanks for making it to the last line of input.txt
I am running this on tcsh. I used the below one-liner :
perl -p -e 'print "This line has special characters like $ \nI love this community" if $. == 9' input.txt > output.txt
The problem is that, in the above example, I know the number of the last line. But in my real use case, the length of input.txt keeps changing. What changes should I make to the one-liner so that it works even if I don't give the last line number?
Note: please don't suggest using sed. I tried sed and it did the job, but my input file is around 325MB and sed takes nearly 25 minutes. I want it done in less than 5 minutes.
Perl version being used : v5.10.1

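Instead of a fixed line number, use eof to check whether the end of the input file has been reached. With -p the code runs before each line is printed, so the text ends up just before the last line: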
perl -pe 'print "This line has special characters like \$ \nI love this community\n" if eof' input.txt > output.txt

Using GNU sed to insert text before the last line of input:
sed '$i This line has special characters like $\nI love this community' input.txt > output.txt

Related

Remove whitespaces till we find comma, but this should start skipping first comma in each line of a file

I am in the learning phase of sed and awk commands, trying some complicated logic but couldn't get solution for the below.
File contents:
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
When I try with
$ cat sample.txt | sed -e 's/[[:space:]][^,]*,/,/g'
,apple.com,apple2.com,apple3.com,
,banana.com,banannna.com,
,grape5.com,
is ok but I want to skip this sed for the first comma in each line, so expected output is
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
Any help is appreciated.
If you are using GNU sed, you can do something like
sed -e 's/[[:space:]][^,]*,/,/2g' file
Here the 2 tells sed to start substituting at the 2nd match on each line, and the g extends the substitution to every match after that.
The output for the above command.
sed -e 's/[[:space:]][^,]*,/,/2g' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
An excerpt from the man page of GNU sed
g
Apply the replacement to all matches to the regexp, not just the first.
number
Only replace the numberth match of the regexp.
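To see how the number and g flags interact, the following sketch runs the same pattern with 2 alone and with 2g on the first sample line from the question. It assumes GNU sed; combining a number with g is not specified by POSIX, so other sed implementations may behave differently.

```shell
# Sketch assuming GNU sed: compare the "2" flag with the "2g" flag.
line='This is apple,apple.com 443,apple2.com 80,apple3.com 232,'

# "2" replaces only the second match on the line.
only_second=$(printf '%s\n' "$line" | sed -e 's/[[:space:]][^,]*,/,/2')

# "2g" replaces the second match and every match after it.
second_onward=$(printf '%s\n' "$line" | sed -e 's/[[:space:]][^,]*,/,/2g')

printf '%s\n' "$only_second" "$second_onward"
```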
awk '{gsub(/[ ]+/," "); gsub(/com [0-9]+/,"com")}1' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
The first gsub collapses each run of spaces into a single space, and the second removes the space and number between com and the comma.

I want to create a newline containing incrementing numbers between each line of text

I have lines of text as follows:
The cat and the mouse
Were in the house
They spotted some grouse
I want to put a new line between each line of text with an incrementing number after a ">" so that it looks like this
>1
The cat and the mouse
>2
Were in the house
>3
They spotted some grouse
I would like to do this in Perl if possible so I can run it on a Mac. Can anyone help?
Something like this should do it:
perl -pe 'print ">$.\n"' foo.txt
You can use Perl from the command line:
perl -pe 's|^|>$.$/|' file
$. is the current line number, and $/ is the input record separator (a newline, \n, by default).

Perl from command line: When replace a string in a file it removes also the new lines

I'm using Perl from the command line to replace duplicate spaces in a text file.
The command I use is:
perl -pi -e 's/\s+/ /g' file.csv
The problem: this also removes the newlines in the resulting file.
Any idea why this occurs?
Thanks!
\s means the five characters: [ \f\n\r\t]. So, you're replacing newlines by single spaces.
In your case, the simplest way is to enable automatic line-ending processing with the -l flag:
perl -pi -le 's/\s+/ /g' file.csv
This way, the newline is chomped from each line before the -e code runs and appended again when the line is printed.
I'll add my two cents to the previous answer.
If you use this regexp in a Perl script itself, you can change it to match only literal spaces:
s/[ ]+/ /g;
That collapses runs of spaces on every line without deleting the line endings. Note that it also leaves tabs untouched.

sed - inserting a comma after every 4th line in plain text file

I am trying to insert a comma after the values on lines 1, 5, 9, etc. using sed:
sed '0-4 s/$/,/' in.txt > in2.txt
For some reason this isn't working so I was wondering if anyone has any solutions doing this using awk, sed, or any other methods.
The error I am getting is
sed: 1: "0-4 s/$/,/": invalid command code -
Currently my data looks like this:
City
Address
Zip Code
County
and I was trying to format it like this
City,
Address
Zip Code
County
Much Appreciated.
0-4 indeed is not well-formed sed syntax. I would use awk for this, but it is easy to do it with either.
sed 's/$/,/;n;n;n' file
which substitutes one line and prints it, then prints the next three lines without substitution, then starts over from the beginning of the script; or
awk 'NR % 4 == 1 {sub(/$/,",")} {print}' file
which does the substitution if the line number modulo 4 is 1, then prints unconditionally.
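Both commands can be compared on a two-record sample. This is a sketch; the second record's field names (City2 and so on) are placeholders invented for the demonstration.

```shell
# Sketch: append a comma to lines 1, 5, 9, ... with awk and with sed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' City Address 'Zip Code' County City2 Address2 'Zip Code2' County2 > in.txt

# awk: substitute when the line number modulo 4 is 1, then print every line.
awk_out=$(awk 'NR % 4 == 1 {sub(/$/, ",")} {print}' in.txt)

# sed: substitute on one line, then pass the next three through untouched.
sed_out=$(sed 's/$/,/;n;n;n' in.txt)

printf '%s\n' "$awk_out"
```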
Sed's addressing modes are sometimes a tad disappointing; there is no standard way to calculate line offsets, relative or in reference to e.g. the end of the file. Of course, awk is more complex, but if you can only learn one or the other, definitely go for awk. (Or in this day and age, Python or Perl -- a much better investment.)
This might work for you (GNU sed):
sed '1~4s/$/,/' file

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//g'
But that removed all the newlines and put everything on one line. Is there some way I can tell Perl not to include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
@sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).
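One caveat when swapping tr in for the \W substitution is that the two do not delete exactly the same characters. The POSIX [:punct:] class includes the underscore, while \W treats it as a word character. The sketch below shows the difference on a made-up line and assumes GNU sed for \W.

```shell
# Sketch: tr with POSIX classes versus sed with \W (GNU sed assumed).
line='my_line - 2 words & text'

# tr deletes blanks and punctuation, and "_" counts as punctuation.
tr_out=$(printf '%s\n' "$line" | tr -d '[:blank:][:punct:]')

# \W matches non-word characters, and "_" is a word character.
sed_out=$(printf '%s\n' "$line" | sed 's/\W//g')

printf '%s\n' "$tr_out" "$sed_out"
```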