Delete \n characters from line range in text file - perl

Let's say we have a text file with 1000 lines.
How can we delete new line characters from line 20 to 500 (replace them with space for example)?
My try:
sed '20,500p; N; s/\n/ /;' #better not to say anything
All other lines (1-19 && 501-1000) should be preserved as-is.
I'm familiar with sed, but awk or perl solutions are also welcome; please include an explanation with them, as I'm a perl and awk newbie.

You could use something like this (my example is on a slightly smaller scale :-)
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '{printf "%s%s", $0, (2<=NR&&NR<=5?FS:RS)}' file
1
2 3 4 5 6
7
8
9
10
The second %s in the printf format specifier is replaced by either the Field Separator (a space by default) or the Record Separator (a newline) depending on whether the Record Number is within the range.
Alternatively:
$ awk '{ORS=(2<=NR&&NR<=5?FS:RS)}1' file
1
2 3 4 5 6
7
8
9
10
Change the Output Record Separator depending on the line number and print every line.
You can pass variables for the start and end if you want, using awk -v start=2 -v end=5 '...'
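As a small sketch (scaled down to a ten-line stream, with the range passed in as variables):

```shell
# Join lines 2 through 5 of a ten-line stream; start/end are awk variables.
seq 10 | awk -v start=2 -v end=5 '{ORS = (start<=NR && NR<=end ? FS : RS)}1'
# 1
# 2 3 4 5 6
# 7
# 8
# 9
# 10
```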

This might work for you (GNU sed):
sed -r '20,500{N;s/^(.*)(\n)/\2\1 /;D}' file
or perhaps more readably:
sed ':a;20,500{N;s/\n/ /;ta}' file
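A scaled-down check of the loop version (range 2,5 standing in for 20,500; GNU sed assumed):

```shell
# Join lines 2-5 into one space-separated line; other lines pass through.
seq 10 | sed ':a;2,5{N;s/\n/ /;ta}'
# 1
# 2 3 4 5 6
# 7
# 8
# 9
# 10
```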

Using a perl one-liner to strip the newline:
perl -i -pe 'chomp if 20..500' file
Or to replace it with a space:
perl -i -pe 's/\R/ / if 20..500' file
Explanation:
Switches:
-i: Edit <> files in place (makes backup if extension supplied)
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
chomp: Remove the trailing newline.
20 .. 500: The flip-flop range operator; in this context it is true while the current line number $. is between 20 and 500.

Here's a perl version:
my $min = 5; my $max = 10;
while (<DATA>) {
    if ($. > $min && $. < $max) {
        chomp;
        $_ .= " ";
    }
    print;
}
__DATA__
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Output:
1
2
3
4
5
6 7 8 9 10
11
12
13
14
15
It reads in DATA (which you can set to a filehandle or whatever your application requires) and checks the line number, $.. While the line number is between $min and $max, the line ending is chomped off and a space is added to the end of the line; otherwise, the line is printed as-is.

Related

How to repeat a sequence of numbers to the end of a column?

I have a data file that needs a new column of identifiers from 1 to 5. The final purpose is to split the data into five separate files with no leftover file (split leaves a leftover file).
Data:
aa
bb
cc
dd
ff
nn
ww
tt
pp
with identifier column:
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
Not sure if this can be done with seq? Afterwards it will be split with:
awk '$2 == 1 {print $0}'
awk '$2 == 2 {print $0}'
awk '$2 == 3 {print $0}'
awk '$2 == 4 {print $0}'
awk '$2 == 5 {print $0}'
Perl to the rescue:
perl -pe 's/$/" " . $. % 5/e' < input > output
Note that it produces 0 instead of 5 for every fifth line.
$. is the line number.
% is the modulo operator.
The /e modifier tells the substitution to evaluate the replacement part as code,
i.e. the end of line ($) is replaced with a space concatenated (.) with the line number modulo 5.
$ awk '{print $0, ((NR-1)%5)+1}' file
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
No need for that to create 5 separate files of course. All you need is:
awk '{print > ("file_" ((NR-1)%5)+1)}' file
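A quick check of the one-step split, run in a temporary directory (file names file_1 .. file_5 as produced by the command):

```shell
# Round-robin ten lines into five files: line 1 and line 6 land in file_1, etc.
tmpdir=$(mktemp -d)
cd "$tmpdir"
seq 10 > data.txt
awk '{print > ("file_" ((NR-1)%5)+1)}' data.txt
cat file_1
# 1
# 6
```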
Looks like you're happy with a perl solution that outputs 1-4 then 0 instead of 1-5 so FYI here's the equivalent in awk:
$ awk '{print $0, NR%5}' file
aa 1
bb 2
cc 3
dd 4
ff 0
nn 1
ww 2
tt 3
pp 4
I am going to offer a Perl solution even though it wasn't tagged because Perl is well suited to solve this problem.
If I understand what you want to do, you have a single file that you want to split into 5 separate files based on the position of a line in the data file:
the first line in the data file goes to file 1
the second line in the data file goes to file 2
the third line in the data file goes to file 3
...
Since you already have the line's position in the file, you don't really need the identifier column (though you could pursue that solution if you wanted).
Instead you can open 5 filehandles and simply alternate which handle you write to:
use strict;
use warnings;

my $datafilename = shift @ARGV;

# open filehandles and store them in an array
my @fhs;
foreach my $i ( 0 .. 4 ) {
    open my $fh, '>', "${datafilename}_$i"
        or die "$!";
    $fhs[$i] = $fh;
}

# open the datafile
open my $datafile_fh, '<', $datafilename
    or die "$!";

my $row_number = 0;
while ( my $datarow = <$datafile_fh> ) {
    print { $fhs[$row_number++ % @fhs] } $datarow;
}

# close resources
foreach my $fh ( @fhs ) {
    close $fh;
}

create a hash from command line in perl

I have a file with data like below:
4 1
7 12
2 5
4 4
6 67
12 5
Through the command line I can split each and every line into an array like below:
perl -F'\s+' -ane 'print $F[0]' file
This will print all the first fields.
So the above command transforms every line into an array.
In a similar way, can a hash be created with the first field as the key and the second field as the value for each line?
Try this:
perl -MData::Dumper -ane '$X{$F[0]}=$F[1]}{print Dumper \%X' file
Yes, it can be done.
perl -MData::Dumper -e '%a = map { (split)[0,1] } <ARGV>;print Dumper \%a' dt.txt

Using SED to delete a line that has multiple fields to match

I have a file that has 12 columns of data. I would like to delete / remove the entire line if column 5 equals "A" and column 12 equals "Z". Is this possible using SED?
You can. Suppose your columns are separated by spaces:
sed -i -e '/\([^ ]* *\)\{4\}A *\([^ ]* *\)\{6\}Z/d' file
The -i flag is used to edit the file in place.
The pattern [^ ]* * matches zero or more (indicated by the asterisk) characters that aren't spaces (indicated by the space character after the ^ in the brackets) followed by zero or more spaces.
Placing this pattern between backslashed parenthesis, we can group it into a single expression, and we can then use backslashed braces to repeat the expression. Four times initially, then match an A followed by spaces, then the pattern again repeated six times, then the Z.
Hope this helps =)
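Checked on two sample rows (without -i, so the result goes to stdout): the row with A in column 5 and Z in column 12 is deleted.

```shell
# Delete lines whose 5th field is A and 12th field is Z.
printf '1 2 3 4 5 6 7 8 9 10 11 12\n1 2 3 4 A 6 7 8 9 10 11 Z\n' |
  sed '/\([^ ]* *\)\{4\}A *\([^ ]* *\)\{6\}Z/d'
# 1 2 3 4 5 6 7 8 9 10 11 12
```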
You can do this with sed, but it is much easier with awk:
awk '! ( $5 == "A" && $12 == "Z" )' input-file
or
awk '$5 != "A" || $12 != "Z"' input-file
perl -F -ane 'print unless($F[4] eq "A" and $F[11] eq "Z")' your_file
tested below:
> cat temp
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 A 6 7 8 9 10 11 Z
> perl -F -ane 'print unless($F[4] eq "A" and $F[11] eq "Z")' temp
1 2 3 4 5 6 7 8 9 10 11 12
>

Prevent column shift after character removal

Currently I am using the following one-liner to remove special characters:
sed 's/[-$*=+()]//g'
However sometimes it occurs that a column only contains the special character *.
How can I prevent the column from shifting if it only contains *?
Would it be possible to use a placeholder, so that whenever columns two and/or four contain only * characters, every * in that column is replaced by N?
From:
6 cc-g*$ 10 cc+c
6 c$c$*g$q 10 ***
6 *c*c$$qq 10 ccc
6 ** 10 c$cc
6 ** 10 *
To possibly:
6 ccg 10 ccc
6 ccgq 10 NNN
6 ccqq 10 ccc
6 NN 10 ccc
6 NN 10 N
Try this in awk:
awk '{ if($2 ~ /^[*]+$/) gsub(/[*]/,"N",$2); if($4 ~ /^[*]+$/) gsub(/[*]/,"N",$4); print }' your_file.txt | sed 's/[-$*=+()]//g'
I hope this will help you.
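On two rows from the question, the awk pass turns the all-* fields into N before sed strips the remaining special characters:

```shell
# All-* fields in columns 2 and 4 become N; sed then removes -$*=+() elsewhere.
printf '6 cc-g*$ 10 cc+c\n6 ** 10 *\n' |
  awk '{ if($2 ~ /^[*]+$/) gsub(/[*]/,"N",$2); if($4 ~ /^[*]+$/) gsub(/[*]/,"N",$4); print }' |
  sed 's/[-$*=+()]//g'
# 6 ccg 10 ccc
# 6 NN 10 N
```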
One way using perl. Traverse all fields of each line and substitute special characters unless the field only has * characters. After that print them separated with one space.
perl -ane '
    for my $pos ( 0 .. $#F ) {
        $F[ $pos ] =~ s/[-\$*=+()]//g unless $F[ $pos ] =~ m/\A\*+\Z/;
    }
    printf qq|%s\n|, join qq| |, @F;
' infile
Assuming infile has the content of the question, output will be:
6 ccg 10 ccc
6 ccgq 10 ***
6 ccqq 10 ccc
6 ** 10 ccc
6 ** 10 *
This might work for you (GNU sed):
sed 'h;s/\S*\s*\(\S*\).*/\1/;:a;/^\**$/y/*/N/;s/[*$+=-]//g;H;g;/\n.*\n/bb;s/\(\S*\s*\)\{3\}\(\S*\).*/\2/;ba;:b;s/^\(\S*\s*\)\(\S*\)\([^\n]*\)\n\(\S*\)/\1\4\3/;s/\(\S*\)\n\(.*\)/\2/' file

Concatenate Lines in Bash

Most command-line programs just operate on one line at a time.
Can I use a common command-line utility (echo, sed, awk, etc) to concatenate every set of two lines, or would I need to write a script/program from scratch to do this?
$ cat myFile
line 1
line 2
line 3
line 4
$ cat myFile | __somecommand__
line 1line 2
line 3line 4
sed 'N;s/\n/ /;'
Grab next line, and substitute newline character with space.
seq 1 6 | sed 'N;s/\n/ /;'
1 2
3 4
5 6
$ awk 'ORS=(NR%2)?" ":"\n"' file
line 1 line 2
line 3 line 4
$ paste - - < file
line 1 line 2
line 3 line 4
Not a particular command, but this snippet of shell should do the trick:
cat myFile | while read line; do echo -n $line; [ "${i}" ] && echo && i= || i=1 ; done
You can also use Perl as:
$ perl -pe 'chomp;$i++;unless($i%2){$_.="\n"};' < file
line 1line 2
line 3line 4
Here's a shell script version that doesn't need to toggle a flag:
while read line1; do read line2; echo $line1$line2; done < inputfile
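The same loop, run against a four-line sample file (a temporary file stands in for inputfile):

```shell
# Pair up consecutive lines: read two lines per iteration, print them joined.
sample=$(mktemp)
printf 'line 1\nline 2\nline 3\nline 4\n' > "$sample"
while read line1; do read line2; echo $line1$line2; done < "$sample"
rm -f "$sample"
# line 1line 2
# line 3line 4
```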