I'm a noob when it comes to coding, so bear with me... I am trying to write a script that will insert data from 3 separate files into 3 specific locations within a text file, for example:
#start of script
start_of_line1_text "$1" end_of_line1_text
start_of_line2_text "$2" end_of_line2_text
start_of_line3_text "$3" end_of_line3_text
#output to text when done
$1 is the value in text1
$2 is the value in text2
$3 is the value in text3
I am thinking of using sed but can't quite work out how this would be done...
Or should I just insert a word at $1 after matching random_text? i.e.:
sed '/start_of_line1_text/ a middle_of_line1_text' input
Also, on a larger scale: if text1, text2 and text3 each had multiple values in them, how could you import these values one at a time and save a new file each time? For example:
text1 =
a
b
c
text2 =
e
f
g
text3 =
h
i
j
#start of script
start_of_line1_text "line one of text1" end_of_line1_text
start_of_line2_text "line one of text2" end_of_line2_text
start_of_line3_text "line one of text3" end_of_line3_text
#output to text when done
then:
#start of script
start_of_line1_text "line two of text1" end_of_line1_text
start_of_line2_text "line two of text2" end_of_line2_text
start_of_line3_text "line two of text3" end_of_line3_text
#output to text when done
I'm not fussy about the language used; I am just a bit stuck on how to fit this all together.
Many thanks in advance
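One thing worth knowing about the sed attempt: the `a` command appends a new line after the matching line; it does not insert text inside that line. A minimal Perl sketch of the "fill in the blanks" idea keeps the template inside the script and substitutes the first line of each input file for the `$1`, `$2`, `$3` markers (the same notation the question uses). The helper names `fill_template` and `first_line` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical template: "$1", "$2" and "$3" mark where each file's value goes.
my $template = <<'END';
#start of script
start_of_line1_text "$1" end_of_line1_text
start_of_line2_text "$2" end_of_line2_text
start_of_line3_text "$3" end_of_line3_text
#output to text when done
END

# Replace each $N placeholder (N >= 1) with the N-th value.
# Placeholders without a matching value are left as they are.
sub fill_template {
    my ($tmpl, @values) = @_;
    $tmpl =~ s/\$([1-9]\d*)/defined $values[$1 - 1] ? $values[$1 - 1] : "\$$1"/ge;
    return $tmpl;
}

# Return the first line of a file, without its trailing newline.
sub first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "can't open $path: $!\n";
    my $line = <$fh>;
    close $fh;
    die "$path is empty\n" unless defined $line;
    chomp $line;
    return $line;
}

# Only run when file names are given on the command line.
if (@ARGV) {
    print fill_template($template, map { first_line($_) } @ARGV);
}
```

Run it as `./fill.pl text1 text2 text3 > output.txt` (the script name is arbitrary).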
This problem suits awk very well.
Give this a try:
awk '!f[FNR] {f[FNR]=("out" FNR ".txt"); print "#start of script" >f[FNR]} {print "random_text \"" $0 "\" some_other_text" >f[FNR]} END {for(n in f) print "#output to text when done">f[n]}' text1 text2 text3
The output files (out1.txt, out2.txt, etc.) are generated in the current directory.
You can provide any number of text files as input at the end of the command.
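For comparison, a rough Perl equivalent of the awk approach is sketched below, under the same assumptions: line N of every input file goes into outN.txt, wrapped in header, prefix, suffix and footer strings. `close ARGV if eof` resets `$.` at each file boundary, so `$.` plays the role of awk's `FNR`. The sub name `group_by_line_number` and its argument order are illustrative, not from the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write line N of every input file to "$dir/outN.txt", in input-file order.
# Each output file starts with $first and ends with $last; every data line
# is wrapped as $start . line . $end. Returns the number of files written.
sub group_by_line_number {
    my ($dir, $first, $start, $end, $last, @files) = @_;
    my %lines_for;    # line number => list of formatted lines
    local @ARGV = @files;
    while (my $line = <>) {
        chomp $line;
        push @{ $lines_for{$.} }, $start . $line . $end;
        close ARGV if eof;    # reset $. so it restarts at 1 in the next file (like FNR)
    }
    for my $n (sort { $a <=> $b } keys %lines_for) {
        my $path = "$dir/out$n.txt";
        open(my $out, '>', $path) or die "can't open $path: $!\n";
        print $out "$_\n" for $first, @{ $lines_for{$n} }, $last;
        close($out) or die "can't close $path: $!\n";
    }
    return scalar keys %lines_for;
}

# Usage: ./group.pl text1 text2 text3   (writes out1.txt, out2.txt, ... here)
if (@ARGV) {
    group_by_line_number('.', '#start of script', 'random_text "',
                         '" some_other_text', '#output to text when done', @ARGV);
}
```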
Below is a script-file version with additional parameters:
#!/usr/bin/awk -f
!f[FNR] {
f[FNR]=("out" FNR ".txt")
print first >f[FNR]
}
{
print start $0 end >f[FNR]
}
END {
for(n in f) print last>f[n]
}
The test:
chmod +x ./script.awk
./script.awk -v first="#start of script" \
-v start="random_text \"" -v end="\" some_other_text" \
-v last="#output to text when done" \
text1 text2 text3
What I have so far is:
#!/usr/bin/perl
use strict;
use warnings;
sub rtrim { my $s = shift; $s =~ s/\s+$//; return $s };
sub read_file_line {
my $fh = shift;
if ($fh and my $line = <$fh>) {
chomp $line;
return [split(/\t/,$line)];
}
return;
}
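open(my $f1, '<', "file1.txt") or die "Cannot open file1.txt: $!";
open(my $f2, '<', "file2.txt") or die "Cannot open file2.txt: $!";
open(my $f3, '<', "file3.txt") or die "Cannot open file3.txt: $!";
open(FILE, '>', "group.txt") or die "Cannot open group.txt: $!";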
my $pair1 = read_file_line($f1);
my $pair2 = read_file_line($f2);
my $pair3 = read_file_line($f3);
while ($pair1 and $pair2 and $pair3) {
printf '%s,',$pair1->[0] ;
printf "%s\n",$pair2->[0] ;
printf '%s,',$pair3->[0] ;
printf FILE " line1_text\n" ;
printf FILE " line2_text\n" ;
printf FILE " line3_text\n" ;
# \" closes the quote; a bare \e inside double quotes is the ESC character
printf FILE " start_of_line4_text \"%s\" end_of_line4_text\n",$pair3->[0] ;
printf FILE " start_of_line5_text \"%s\" end_of_line5_text\n",$pair2->[0] ;
printf FILE " start_of_line6_text \"%s\" end_of_line6_text\n",rtrim($pair1->[0]) ;
printf FILE " line7_text\n" ;
printf FILE " line8_text\n\n\n" ;
$pair1 = read_file_line($f1);
$pair2 = read_file_line($f2);
$pair3 = read_file_line($f3);
}
close($f1);
close($f2);
close($f3);
close(FILE) ;
The bit of code below is used in a script that queries a database and produces an output file of records. One problem I am having is that a newline character seems to be inserted at the very end of the file, and it is causing grief with another script I am working on that imports the data into another database. I thought this would be as easy as a chomp on the filehandle, but that is not allowed. I have read about many ways to do this on the net but am not sure which path to take. Do you guys see a way to do this?
while ($queryResults->MoveNext() == $CQPerlExt::CQ_SUCCESS) {
$swCR = $session->GetEntityByDbId("SWCR", $queryResults->GetColumnValue(1));
# Gather data
$swID = $swCR->GetFieldValue("RecordID")->GetValue();
$swData = "<RecordID>" . $swID . "</RecordID>";
foreach $fieldName (@fieldNames)
{
$swData = $swData . "<" . $fieldName . ">" . $swCR->GetFieldStringValue($fieldName) . "</" . $fieldName . ">";
}
# Build file with records separated by custom line delimiter
print OUTFILE $swData . "~~lineDelimiter~~\n";
}
close(OUTFILE);
You could output the newline at the beginning of each line instead of the end, and skip the first one.
my $count = 0;
my @stuff = qw(a b c d);
while (my $letter = shift @stuff) {
print "\n" if $count++;
print $letter;
}
print "__test";
This will output:
a
b
c
d__test
Since you have complete control over the output, I would attack it from that angle by never emitting the final newline in the first place. However, since this is Perl we're talking about, there's more than one way to do it. ;-)
I'm using truncate here to truncate the file to 1 byte less than whatever it was before (-s returns the file size in bytes).
my $file = 'test.txt';
open(my $fh, '>', $file) or die $!;
print $fh "$_\n" for 1..10;
close($fh);
truncate($file, (-s $file) - 1);
Output:
$ od -c test.txt
0000000 1 \n 2 \n 3 \n 4 \n 5 \n 6 \n 7 \n 8 \n
0000020 9 \n 1 0
0000024
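A caveat on the truncate approach: it always removes the last byte, so running it on a file that doesn't end in a newline (or running it twice) removes real data. A slightly more defensive sketch checks the final byte first. `strip_final_newline` is a hypothetical name; it works on raw bytes, so a file with Windows CRLF line endings would keep its trailing `\r`.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Remove one trailing "\n" from $path, but only if the file really ends with one.
# Returns 1 if a byte was removed, 0 otherwise.
sub strip_final_newline {
    my ($path) = @_;
    open(my $fh, '+<:raw', $path) or die "can't open $path: $!\n";
    my $size = -s $fh;
    return 0 unless $size;                        # empty file: nothing to do
    seek($fh, -1, SEEK_END) or die "can't seek $path: $!\n";
    read($fh, my $last, 1) == 1 or die "can't read $path: $!\n";
    return 0 unless $last eq "\n";                # last byte is data: leave it alone
    truncate($fh, $size - 1) or die "can't truncate $path: $!\n";
    close($fh) or die "can't close $path: $!\n";
    return 1;
}

# Usage, after close(OUTFILE) in the question's script:
#   strip_final_newline($outfile_name);   # $outfile_name is whatever OUTFILE was opened with
```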
I am a total novice when it comes to computer programming and perl, so please forgive me if this question is simple!
I am trying to run a Perl script (called ploteig, a component of Eigenstrat, a free genetics software download) that works fine until I get to line 96:
open (YY, ">$dfile") || die "can't open $dfile\n" ;
I get an error that the file can't be opened, and the script dies.
Below, I have provided the entire code (honestly, I have no idea what part of the code could be causing the failure to open the file). The code takes as input a file created previously with Eigenstrat; here is an example of 4 rows and 12 columns:
#eigvals: 20.388 7.503 4.033 2.929 2.822 2.726 2.700 2.590 2.451 2.365
GREY_BI_011_COMSTOCK_11 0.0164 0.0164 0.0382 -0.1283 -0.0658 0.0406 0.0322 0.0105 -0.0851 -0.0625 Case
GREY_BI_014_COMSTOCK_14 0.0191 0.0094 0.0567 -0.0250 0.0804 -0.0531 -0.0165 0.0321 0.1130 -0.0025 Control
GREY_BI_015_COMSTOCK_15 0.0221 -0.0042 -0.0031 0.0091 0.1448 0.0351 0.0430 0.0359 0.0049 0.0791 Control
(rows represent individual sample pca scores, columns specific pcas. First column sample names, last column case or control status)
Additionally, I call the code as follows:
perl ploteig -i combogreyout.pca.evec -p Case:Control -s Out -c 1:2 -x -o utploteig.xtxt -k
I am really unsure where to go from here. I tried changing the file permissions and making sure the file was in the working directory, but it wouldn’t let me change the permissions, and everything indicated it was in the correct directory. However, I am unsure whether either of these is the real problem.
I would very much appreciate any help anyone can give me!
Thank you SO much!
#!/usr/bin/perl -w
### ploteig -i eigfile -p pops -c a:b [-t title] [-s stem] [-o outfile] [-x] [-k] [-y]
###         [-z sep] [-f fixgreen]
use Getopt::Std ;
use File::Basename ;
## pops : separated -x = make postscript and pdf -z use another separator
## -k keep intermediate files
## NEW if pops is a file names are read one per line
getopts('i:o:p:c:s:d:z:t:xkyf',\%opts) ;
$postscmode = $opts{"x"} ;
$oldkeystyle = $opts{"y"} ;
$kflag = $opts{"k"} ;
$keepflag = 1 if ($kflag) ;
$keepflag = 1 unless ($postscmode) ;
$dofixgreen = ( exists $opts{"f"} ? $opts{"f"} : 0 );
$zsep = ":" ;
if (defined $opts{"z"}) {
$zsep = $opts{"z"} ;
$zsep = '\+' if ($zsep eq "+") ; # single quotes keep the backslash, so split sees a literal +
}
$title = "" ;
if (defined $opts{"t"}) {
$title = $opts{"t"} ;
}
if (defined $opts{"i"}) {
$infile = $opts{"i"} ;
}
else {
usage() ;
exit 0 ;
}
open (FF, $infile) || die "can't open $infile\n" ;
@L = (<FF>) ;
chomp @L ;
$nf = 0 ;
foreach $line (@L) {
next if ($line =~ /\#/) ;
@Z = split " ", $line ;
$x = @Z ;
$nf = $x if ($nf < $x) ;
}
printf "## number of fields: %d\n", $nf ;
$popcol = $nf-1 ;
if (defined $opts{"p"}) {
$pops = $opts{"p"} ;
}
else {
die "p parameter compulsory\n" ;
}
$popsname = setpops ($pops) ;
print "$popsname\n" ;
$c1 = 1; $c2 =2 ;
if (defined $opts{"c"}) {
$cols = $opts{"c"} ;
($c1, $c2) = split ":", $cols ;
die "bad c param: $cols\n" unless (defined $c2) ;
}
$stem = "$infile.$c1:$c2" ;
if (defined $opts{"s"}) {
$stem = $opts{"s"} ;
}
$gnfile = "$stem.$popsname.xtxt" ;
if (defined $opts{"o"}) {
$gnfile = $opts{"o"} ;
}
@T = () ; ## trash
open (GG, ">$gnfile") || die "can't open $gnfile\n" ;
print GG "## " unless ($postscmode) ;
print GG "set terminal postscript color\n" ;
print GG "set title \"$title\" \n" ;
print GG "set key outside\n" unless ($oldkeystyle) ;
print GG "set xlabel \"eigenvector $c1\" \n" ;
print GG "set ylabel \"eigenvector $c2\" \n" ;
print GG "plot " ;
$np = @P ;
$lastpop = $P[$np-1] ;
$d1 = $c1+1 ;
$d2 = $c2+1 ;
foreach $pop (@P) {
$dfile = "$stem:$pop" ;
push @T, $dfile ;
print GG " \"$dfile\" using $d1:$d2 title \"$pop\" " ;
print GG ", \\\n" unless ($pop eq $lastpop) ;
chomp $dfile;
open (YY, ">$dfile") || die "can't open $dfile\n" ;
foreach $line (@L) {
next if ($line =~ /\#/) ;
@Z = split " ", $line ;
next unless (defined $Z[$popcol]) ;
next unless ($Z[$popcol] eq $pop) ;
print YY "$line\n" ;
}
close YY ;
}
print GG "\n" ;
print GG "## " if ($postscmode) ;
print GG "pause 9999\n" ;
close GG ;
if ($postscmode) {
$psfile = "$stem.ps" ;
if ($gnfile =~ /xtxt/) {
$psfile = $gnfile ;
$psfile =~ s/xtxt/ps/ ;
}
system "gnuplot < $gnfile > $psfile" ;
if ( $dofixgreen ) {
system "fixgreen $psfile" ;
}
system "ps2pdf $psfile " ;
}
unlink (@T) unless $keepflag ;
sub usage {
print "ploteig -i eigfile -p pops -c a:b [-t title] [-s stem] [-o outfile] [-x] [-k]\n" ;
print "-i eigfile input file first col indiv-id last col population\n" ;
print "## as output by smartpca in outputvecs \n" ;
print "-c a:b a, b columns to plot. 1:2 would be common and leading 2 eigenvectors\n" ;
print "-p pops Populations to plot. : delimited. eg -p Bantu:San:French\n" ;
print "## pops can also be a filename. List populations 1 per line\n" ;
print "[-s stem] stem will start various output files\n" ;
print "[-o ofile] ofile will be gnuplot control file. Should have xtxt suffix\n";
print "[-x] make ps and pdf files\n" ;
print "[-k] keep various intermediate files although -x set\n" ;
print "## necessary if .xtxt file is to be hand edited\n" ;
print "[-y] put key at top right inside box (old mode)\n" ;
print "[-t] title (legend)\n" ;
print "[-f] fix green and yellow colors\n";
print "The xtxt file is a gnuplot file and can be easily hand edited. Intermediate files
needed if you want to make your own plot\n" ;
}
sub setpops {
my ($pops) = @_ ;
local (@a, $d, $b, $e) ;
if (-e $pops) {
open (FF1, $pops) || die "can't open $pops\n" ;
@P = () ;
foreach $line (<FF1>) {
($a) = split " ", $line ;
next unless (defined $a) ;
next if ($a =~ /\#/) ;
push @P, $a ;
}
$out = join ":", @P ;
print "## pops: $out\n" ;
($b, $d , $e) = fileparse($pops) ;
return $b ;
}
@P = split $zsep, $pops ;
return $pops ;
}
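The script's `die "can't open $dfile\n"` throws away the operating system's reason for the failure. A first diagnostic step is to add `$!`, i.e. change that line to `open (YY, ">$dfile") || die "can't open $dfile: $!\n" ;`, which will then print the actual reason (for example a permissions or missing-directory error). One possible culprit, offered only as a guess: the per-population file name is built as `"$stem:$pop"` (here `Out:Case`), and `:` is not a legal file-name character on Windows, so on that platform this open may be rejected or misinterpreted. The sketch below shows both ideas as hypothetical helpers; they are not part of Eigenstrat.

```perl
use strict;
use warnings;

# Hypothetical helper: open $path for writing and, on failure, include the
# operating-system reason ($!) in the message instead of only the file name.
sub open_for_write {
    my ($path) = @_;
    open(my $fh, '>', $path) or die "can't open $path: $!\n";
    return $fh;
}

# Hypothetical workaround: build the per-population file name with '_'
# instead of ':' (ploteig itself uses "$stem:$pop").
sub population_file_name {
    my ($stem, $pop) = @_;
    return "${stem}_$pop";
}

# Usage with the values from the question (-s Out, population Case):
#   my $yy = open_for_write(population_file_name('Out', 'Case'));   # writes Out_Case
```

In ploteig the same `$dfile` value is also written into the gnuplot control file, so changing how `$dfile` is built keeps the two consistent. Note that the default stem (`"$infile.$c1:$c2"`) also contains a colon, which is another reason to pass `-s`.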
I wrote a Perl program to take a regex from the command line, do a recursive search of the current directory for certain filenames and filetypes, grep each one for the regex, and output the results, including filename and line number. [Basic grep + find functionality that I can go in and customize as needed.]
cat <<'EOF' >perlgrep2.pl
#!/usr/bin/env perl
$expr = join ' ', @ARGV;
my @filetypes = qw(cpp c h m txt log idl java pl csv);
my @filenames = qw(Makefile);
my $find="find . ";
my $nfirst = 0;
foreach(@filenames) {
$find .= " -o " if $nfirst++;
$find .= "-name \"$_\"";
}
foreach(@filetypes) {
$find .= " -o " if $nfirst++;
$find .= "-name \\*.$_";
}
@files=`$find`;
foreach(@files) {
s#^\./##;
chomp;
}
@ARGV = @files;
foreach(<>) {
print "$ARGV($.): $_" if m/\Q$expr/;
close ARGV if eof;
}
EOF
cat <<'EOF' >a.pl
print "hello ";
$a=1;
print "there";
EOF
cat <<'EOF' >b.pl
print "goodbye ";
print "all";
$a=1;
EOF
chmod ugo+x perlgrep2.pl
./perlgrep2.pl print
If you copy and paste this into your terminal, you will see this:
perlgrep2.pl(36): print "hello ";
perlgrep2.pl(0): print "there";
perlgrep2.pl(0): print "goodbye ";
perlgrep2.pl(0): print "all";
perlgrep2.pl(0): print "$ARGV($.): $_" if m/\Q$expr/;
This is very surprising to me. The program appears to be working, except that the $. and $ARGV variables do not have the values I expected. From the state of the variables, it appears that perl has already read all three files (a total of 36 lines) when it executes the first iteration of the loop over <>. What's going on? How do I fix it? This is Perl 5.12.4.
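You're using foreach(<>) where you should be using while(<>). foreach evaluates <> in list context, so it reads every file in @ARGV into a temporary list before it starts iterating; by the time the loop body first runs, $ARGV names the last file read and $. holds the total line count.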
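A minimal sketch of the fix, assuming the rest of perlgrep2.pl stays the same: with `while (<>)` Perl reads one line at a time, so `$ARGV` and `$.` describe the line currently being processed, and `close ARGV if eof` resets `$.` at each file boundary. `grep_files` is a hypothetical wrapper that returns the matches instead of printing them, which makes it easy to check.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical wrapper: return "file(line): text" for every line of @files
# that contains $expr literally (\Q...\E quotes any regex metacharacters).
sub grep_files {
    my ($expr, @files) = @_;
    return () unless @files;           # an empty @ARGV would make <> read STDIN
    my @hits;
    local @ARGV = @files;
    local $_;
    while (<>) {                       # one line at a time, unlike foreach (<>)
        push @hits, "$ARGV($.): $_" if m/\Q$expr\E/;
        close ARGV if eof;             # reset $. so line numbers restart in each file
    }
    return @hits;
}

# In perlgrep2.pl itself, the final loop would simply become:
#   while (<>) {
#       print "$ARGV($.): $_" if m/\Q$expr/;
#       close ARGV if eof;
#   }
```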
I have a text file whose lines start with a varying number of spaces, which act as the delimiter.
It's like this:
First line doesn't have any spaces at the beginning.
Second line has 2 spaces.
Third line has 4 spaces at the beginning.
Fourth line has 6 spaces at the beginning.
This pattern then repeats, irregularly, until the end of the file, as in the example text file below.
I want to read these lines from the text file and save them in a CSV file as follows:
lines with no leading spaces in the first column,
lines with 2 spaces in the second column,
lines with 4 spaces in the third column,
lines with 6 spaces in the fourth column.
The text file structure is (representing spaces by #) :
ABC
##EFG"123"
####<HIJK> 22: test file
######LMNOP "Test"
######sssstt"123"
QRS
##TU"223"
####<www> 32: test2 file
######yz test1
####<www> 88: test3 file
######rreeeww
######oooiiiii
##PP
##ss
####<qqq> 89: test6 file
######hhhhggg
######bbbbaaa
######cccczzz
######uu test3
I am new to Perl. I know how to open a file and read through it line by line, but I don't understand how to store this kind of structure in CSV columns.
my $file = 'C:\\outputfile.txt';
open(my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!";
while (my $row = <$fh>) { # reading each row till end of file
chomp $row;
# what should be done here?
}
Please help.
If you have questions about the code, ask and I'll answer, but this is not a good or idiomatic example of Perl code; it was just fast to write.
my $previous_count = "-1"; #beginning, we will think, that no spaces.
my $current_count = "0"; #current default value
my $maximum_count = 3; # as you said
my $to_written = "";
my $delimiter_between_columns = ",";
my $newline_separator = ";";
my $symbol_at_the_beginning = "#"; # any symbol works; for real spaces use ' ' (or '\s' in single quotes: in double quotes "\s" is just "s")
my @aggregate_array_of_ports=();
while(my $row = <DATA>){
#ok, read.
chomp($row);
#print "row is : $row\n";
if($row =~ m/^([$symbol_at_the_beginning]*)/){
#print length($1);
$current_count = length($1) / 2; #take number of spaces divided by 2
$row =~ s/^[$symbol_at_the_beginning]+//;
#hint here, we can get counts as 0,1,2,3 <-see?
#if you take first and third word, you need to add 2 separators.
#OR: if the current count is LESS THAN OR EQUAL TO the previous count, the row is complete and must be output
#print"prev : $previous_count and curr : $current_count\n ";
#print"I will write: $to_written\n";
#print "\n PREV: $previous_count --> CURR: $current_count \n";
if($previous_count>=$current_count){
#output here
print "$to_written".$newline_separator."\n";
$previous_count = 0;
$to_written = "";
}
$previous_count = 0 if($previous_count==-1);
#print "$delimiter_between_columns x($current_count-$previous_count)\n";
#print "current: $current_count previous: $previous_count \n";
$to_written .= $delimiter_between_columns x ($current_count - $previous_count + (($current_count-$previous_count)==3?2:0) )."$row";
if ($current_count==($maximum_count-1)){
#print "I input this!: $to_written\n";
$to_written = prepare_to_input_four_spaces($to_written, $delimiter_between_columns);
}
$previous_count = $current_count;
#print"\n";
}
}
print "$to_written".$newline_separator."\n"; # flush the last row; the loop itself never prints it
sub prepare_to_input_four_spaces{
my $str = shift; #take string
my $delim = shift;
if ($str=~ m/(.+?[>])\s+(\d+)[:]\s+(.+?)$/){
#here I want to find first capture group before [>] (also it includes) |(.+?[>])|
#next, some spaces |\s+| and I want to catch port |(\d+)|.
#next, |[:]| symbol and some spaces again |\s+| before the tail of the string.
#and will catch this tail: |(.+?)$|.
#where $ mean the right "border" of the string (really - end of the string)
$str = $1.$delim.$2.$delim.$3;
}
return $str;
}
=pod
__DATA__
ABC
EFG"123"
HIJK (12345)
LMNOP "Test"
sssstt"123"
QRS
TU"223"
vwx"55"
www"88"
yz:test1
__END__
=cut
__DATA__
ABC
##EFG"123"
####<HIJK> 22: test file
######LMNOP "Test"
######sssstt"123"
QRS
##TU"223"
####<www> 32: test2 file
######yz test1
####<www> 88: test3 file
######rreeeww
######oooiiiii
##PP
##ss
####<qqq> 89: test6 file
######hhhhggg
######bbbbaaa
######cccczzz
######uu test3
This is probably OK for you:
I skipped printing the header and used "|" as the separator; you can change that as you like.
> perl -lne 'if(/^[^\#]/){if($.!=1){print "$a"};$a=$_;}else{s/^#*//g;$a.="|$_";}END{print $a}' temp
ABC|EFG"123"|HIJK (12345)|LMNOP "Test"|sssstt"123"
QRS|TU"223"|vwx"55"|www"88"|yz:test1
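The question's expected-output image isn't included here, so the layout below is an assumption that follows the stated rule literally: each line goes in the column given by its indentation (0 spaces to column 1, 2 to column 2, 4 to column 3, 6 to column 4), one CSV row per input line. Because several values contain double quotes, fields are quoted CSV-style (wrapped in quotes, with embedded quotes doubled); the CPAN module Text::CSV handles the general case more completely. `csv_field` and `line_to_csv_row` are hypothetical names.

```perl
use strict;
use warnings;

# Quote a field for CSV if it contains a comma, a double quote or a newline;
# embedded double quotes are doubled.
sub csv_field {
    my ($value) = @_;
    return $value unless $value =~ /[",\n]/;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Turn one indented line into a CSV row: every $width leading spaces move the
# text one column to the right (0 spaces -> column 1, 2 spaces -> column 2, ...).
sub line_to_csv_row {
    my ($line, $width) = @_;
    my ($indent, $text) = $line =~ /^( *)(.*)$/;
    my $column = int(length($indent) / $width);
    return join ',', ('') x $column, csv_field($text);
}

# Usage: perl indent2csv.pl input.txt > output.csv
if (@ARGV) {
    while (my $line = <>) {
        $line =~ s/\r?\n\z//;          # also strip a Windows carriage return
        next if $line eq '';
        print line_to_csv_row($line, 2), "\n";
    }
}
```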
I have a file that is in the following format:
Preamble
---------------------
Section 1
...
---------------------
---------------------
Section 2
...
---------------------
---------------------
Section 3
...
---------------------
Afterwords
And I want to extract each section between the separators, so that I end up with:
file0:
Section 1
...
file1:
Section 2
...
file2:
Section 3
...
...
Is there a simple way to do this? Thanks.
[Update] Using chomp and $_ makes this even shorter.
If your input record separator is a line of 21 dashes, this is easy with perl -ne. This should do it:
perl -ne 'BEGIN{ $/=("-"x21)."\n"; $i=0; }
do { open F, ">file".($i++);
chomp;
print F;
close F;
} if /^Section/' yourfile.txt
It should work and create the files file0 .. fileN.
Explanation
It's perhaps easier to explain as a stand-alone Perl script:
$/=("-"x21)."\n"; # Set the input-record-separator to "-" x 21 times
my $i = 0; # output file number
open IN, "<yourfile.txt" or die "$!";
while (<IN>) { # Each "record" will be available as $_
do { open F, ">file".($i++);
chomp; # remove the trailing "---..."
print F; # write the record to the file
close F; #
} if /^Section/ # do all this only if this is a Section
}
Perl's awk lineage was useful here, so let's show an awk version for comparison:
awk 'BEGIN{RS="(\n-+)+\n";i=0}
/Section/ {print > ("file_" (i++) ".txt")
}' yourfile.txt
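Not too bad compared to the Perl version; it's actually shorter. Perl's $/ corresponds to awk's RS variable, and awk has the upper hand here: in gawk and mawk, RS may be a regular expression, while Perl's $/ is always a literal string. (POSIX leaves the behavior of a multi-character RS unspecified.)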
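Since Perl's `$/` can't be a pattern, one way to get the awk-style regular-expression separator in Perl is to read the whole file into memory and `split` on a pattern, assuming the file is small enough for that. The pattern below also swallows a closing/opening pair of separator lines, so each piece is exactly one section. `split_sections` is a hypothetical name, and the 21-dash separator is taken from the question.

```perl
use strict;
use warnings;

# Split $text on runs of one or more separator lines (21 dashes each)
# and keep only the pieces that start with "Section".
sub split_sections {
    my ($text) = @_;
    my @pieces = split /^-{21}\n(?:-{21}\n)*/m, $text;
    return grep { /^Section/ } @pieces;
}

# Usage: perl split_sections.pl yourfile.txt   (writes file0, file1, ...)
if (@ARGV) {
    my $text = do { local $/; <> };    # slurp the whole file
    my $i = 0;
    for my $section (split_sections($text)) {
        open(my $out, '>', 'file' . $i++) or die "can't open output file: $!\n";
        print $out $section;
        close $out;
    }
}
```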
You can do it with the shell too:
#!/bin/bash
i=-1           # becomes 0 at the first section, so files are file0, file1, ...
toprint=false  # print nothing (e.g. the preamble) until the first section starts
while IFS= read -r line ; do
    # If the line contains "Section " followed by a
    # digit, this line and the next ones have to be printed
    if printf '%s\n' "$line" | grep -Eq "Section [0-9]+" ; then
        toprint=true
        i=$((i + 1))
        : > "file$i"   # create (or empty) the output file
    fi
    # If the line contains "--------------------",
    # the next lines don't have to be printed
    if printf '%s\n' "$line" | grep -Eq "[-]{20}" ; then
        toprint=false
    fi
    # Print the line if needed
    if $toprint ; then
        printf '%s\n' "$line" >> "file$i"
    fi
done < sections.txt
Here's what you're looking for:
awk '/^-{21}$/ { f++; next } f%2!=0 { print > ("file" (f-1)/2 ".txt") }' file
Results:
Contents of file0.txt:
Section 1
...
Contents of file1.txt:
Section 2
...
Contents of file2.txt:
Section 3
...
As you can see, the above filenames are zero-indexed. If you'd like one-indexed filenames, simply change (f-1)/2 to (f+1)/2. HTH.
Given your file's format, here's one option:
use strict;
use warnings;
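my $fh;
my $sep = '-' x 21;
while (<>) {
    if (/^$sep$/) {                # a separator line ends the current section,
        close $fh if defined $fh;  # so "Afterwords" is not appended to the last file
        undef $fh;
        next;
    }
    if (/^Section\s+(\d+)/) {
        open $fh, '>', 'file' . ( $1 - 1 ) . '.txt' or die $!;
    }
    print $fh $_ if defined $fh;
}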
On your data, creates file0.txt .. file2.txt with file0.txt containing:
Section 1
...