How can I export an Oracle table to tab separated values? - perl

I need to export a table in the database to a tab-separated values file. I am using DBI in Perl, and SQL*Plus. Does either of them (DBI or SQL*Plus) support exporting to or importing from TSV files?
I can write code to do what I need, but I would like to use a ready-made solution if one is available.

It should be relatively simple to dump a table to a file with tab-separated values.
For example:
open(my $outputFile, '>', 'myTable.tsv') or die "Cannot open myTable.tsv: $!";
my $sth = $dbh->prepare('SELECT * FROM myTable');
$sth->execute;
while (my $row = $sth->fetchrow_arrayref) {
    print $outputFile join("\t", @$row) . "\n";
}
$sth->finish;
close $outputFile;
Note that this will not work well if your data contains either a tab or a newline.

From the information you have provided I am guessing you are using DBI to connect to an Oracle instance (since you mentioned sqlplus).
If you want a "ready made" solution as you have indicated, your best bet is to use yasql (Yet Another SQL*Plus), a DBD::Oracle-based database shell for Oracle.
yasql has a neat feature that lets you write an SQL SELECT statement and redirect the output to a CSV file directly from its shell (you need Text::CSV_XS installed for that).
On the other hand you can roll your own script with DBD::Oracle and Text::CSV_XS. Once your statement handles are prepared and executed, all you need to do is:
$csv->print($fh, $_) for @{ $sth->fetchall_arrayref };
Assuming you have initialised $csv with tab as the field separator (sep_char). See the Text::CSV_XS documentation for details.
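A constructor sketch (a configuration fragment only; `binary` and `eol` are optional Text::CSV_XS settings but commonly wanted):

```perl
use Text::CSV_XS;

# Tab as the field separator; eol makes print() terminate each record.
my $csv = Text::CSV_XS->new({ sep_char => "\t", eol => "\n", binary => 1 })
    or die Text::CSV_XS->error_diag;
```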

Here's an approach with awk and sqlplus only. You can store the awk script in a file or copy/paste the one-liner. It uses the HTML output mode so that fields are not clobbered.
Store this script as sqlplus2tsv.awk:
# This requires you to use the -M "HTML ON" option for sqlplus, eg:
# sqlplus -S -M "HTML ON" user@sid @script | awk -f sqlplus2tsv.awk
#
# You can also use the "set markup html on" command in your sql script
#
# Outputs tab delimited records, one per line, without column names.
# Fields are URI encoded.
#
# You can also use the oneliner
# awk '/^<tr/{l=f=""}/^<\/tr>/&&l{print l}/^<\/td>/{a=0}a{l=l$0}/^<td/{l=l f;f="\t";a=1}'
# if you don't want to store a script file
# Start of a record
/^<tr/ {
l=f=""
}
# End of a record
/^<\/tr>/ && l {
print l
}
# End of a field
/^<\/td>/ {
a=0
}
# Field value
# Not sure how multiline content is output
a {
l=l $0
}
# Start of a field
/^<td/ {
l=l f
f="\t"
a=1
}
I didn't test this with long strings and weird characters, but it worked for my use case. An enterprising soul could adapt this technique to a perl wrapper :)
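As a quick sanity check, the one-liner can be fed a tiny hand-written fragment in the shape of `set markup html on` output (one tag or value per line; this is sample input, not real sqlplus output):

```shell
# A two-column row should come out as one tab-separated line.
printf '<tr>\n<td>\nabc\n</td>\n<td>\ndef\n</td>\n</tr>\n' |
awk '/^<tr/{l=f=""}/^<\/tr>/&&l{print l}/^<\/td>/{a=0}a{l=l$0}/^<td/{l=l f;f="\t";a=1}'
```

This prints `abc` and `def` separated by a single tab.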

I have had to do that in the past... I have a perl script to which you pass the query you wish to run, and it pipes that through sqlplus. Here is an excerpt:
open(UNLOAD, "> $file"); # Open the unload file.
$query =~ s/;$//; # Remove any trailing semicolons.
# Build the sql statement.
$cmd = "echo \"SET HEAD OFF
SET FEED OFF
SET COLSEP \|
SET LINES 32767
SET PAGES 0
$query;
exit;
\" |sqlplus -s $DB_U/$DB_P";
@array = `$cmd`; # Execute the sql and store
                 # the returned data in "array".
print $cmd . "\n";
clean(@array); # Remove any non-necessary whitespace.
# This is a method to remove random non needed characters
# from the array
foreach $x (@array) # Print each line of the
{ # array to the unload file.
print UNLOAD "$x\|\n";
}
close UNLOAD; # Close the unload file.
Of course, above I am making it pipe-delimited... if you want tabs you just need \t instead of | in the COLSEP setting and the print.

Related

Emboss Cons for getting consensus sequence for many files, not just one

I installed and configured emboss and can run the simple command line arguments for getting the consensus of one previously aligned multifasta file:
% cons
Create a consensus sequence from a multiple alignment
Input (aligned) sequence set: dna.msf
output sequence [dna.fasta]: aligned.cons
This is perfect for dealing with one file at a time, but I have hundreds to process.
I have started to write a perl script with a foreach loop to try and process this for every file, but I guess I need to be outside of the script to run these commands. Any clue on how I can run a command-line-friendly program for getting a single consensus sequence in fasta format from a previously aligned multifasta file, for many files in succession? I don't have to use EMBOSS; I could use another program.
Here is my code so far:
#!/usr/bin/perl
use warnings;
use strict;
my $dir = ("/Users/roblogan/Documents/Clustered_Barcodes_Aligned");
my @ArrayofFiles = glob "$dir/*"; # put all files in the directory into an array
#print join("\n", @ArrayofFiles), "\n"; # diagnostic print
foreach my $file (@ArrayofFiles){
print 'cons', "\n";
print "/Users/roblogan/Documents/Clustered_Barcodes_Aligned/Clustered_Barcode_Number_*.*.Sequences.txt.out", "\n";
print "*.*.Consensus.txt", "\n";
}
EMBOSS cons has two mandatory qualifiers:
- sequence (to provide the input sequence)
- outseq (for the output).
So you need to provide both of the above.
Now change your code a little to run the program for each file:
my $count=1;
foreach my $file (@ArrayofFiles){
    my $output_path = "/Users/roblogan/Documents/Clustered_Barcodes_Aligned/";
    my $output_file = $output_path . "out$count"; # please change here to get your desired output filename
my $command = "cons -sequence '$file' -outseq '$output_file' ";
system($command);
$count ++;
}
Hope the above code will work for you.

Perl script -- Multiple text file parsing and writing

Suppose I have a directory full of text files (raw text). What I need is a Perl script that will parse the directory's text files one by one, top to bottom, and save their contents in a new single file, appointed by me. In other words, I simply want to create a corpus of many documents. Note: these documents have to be separated by some tag, e.g. indicating the sequence in which they were parsed.
So far I have managed to follow some examples and I know how to read, write and parse text files. But I am not yet in a position to merge them into one script and handle many text files. Can you please provide some assistance? Thanks
edit:
example code for writing to a file.
#!/usr/local/bin/perl
open (MYFILE, '>>', 'data.txt') or die $!;
print MYFILE "text\n";
close (MYFILE);
example code for reading a file.
#!/usr/local/bin/perl
open (MYFILE, '<', 'data.txt') or die $!;
while (<MYFILE>) {
chomp;
print "$_\n";
}
close (MYFILE);
I've also found out about foreach, which can be used for tasks such as this, but I still don't know how to combine them and achieve the result explained in the description.
The important points in this suggestion are:
- the "magic" diamond operator (a.k.a. readline), which reads from each file named in @ARGV,
- the eof function, which tells whether the next readline on the current filehandle will return any data,
- the $ARGV variable, which contains the name of the currently opened file.
With that intro, here we go!
#!/usr/bin/perl
use strict; # Always!
use warnings; # Always!
my $header = 1; # Flag to tell us to print the header
while (<>) { # read a line from a file
if ($header) {
# This is the first line, print the name of the file
print "========= $ARGV ========\n";
# reset the flag to a false value
$header = undef;
}
# Print out what we just read in
print;
}
continue { # This happens before the next iteration of the loop
# Check if we finished the previous file
$header = 1 if eof;
}
To use it, just do: perl concat.pl *.txt > compiled.TXT

Counting records separated by CR/LF (carriage return and newline) in Perl

I'm trying to create a simple script to read a text file that contains records of book titles. Each record is separated by a plain old blank line (\r\n\r\n). I need to count how many records are in the file.
For example here is the input file:
record 1
some text
record 2
some text
...
I'm using a regex to check for carriage return and newline, but it fails to match. What am I doing wrong? I'm at my wits' end.
sub readInputFile {
my $inputFile = $_[0]; #read first argument from the commandline as fileName
open INPUTFILE, "+<", $inputFile or die $!; #Open File
my $singleLine;
my @singleRecord;
my $recordCounter = 0;
while (<INPUTFILE>) { # loop through the input file line-by-line
$singleLine = $_;
push(@singleRecord, $singleLine); # start adding each line to a record array
if ($singleLine =~ m/\r\n/) { # check for carriage return and new line
$recordCounter += 1;
createHashTable(@singleRecord); # send record to make a hash table
@singleRecord = (); # empty the current record to start a new record
}
}
print "total records : $recordCounter \n";
close(INPUTFILE);
}
It sounds like you are processing a Windows text file on Linux, in which case you want to open the file with the :crlf layer, which will convert all CRLF line-endings to the standard Perl \n ending.
If you are reading Windows files on a Windows platform then the conversion is already done for you, and you won't find CRLF sequences in the data you have read. If you are reading a Linux file then there are no CR characters in there anyway.
It also sounds like your records are separated by a blank line. Setting the built-in input record separator variable $/ to a null string will cause Perl to read a whole record at a time.
I believe this version of your subroutine is what you need. Note that people familiar with Perl will thank you for using lower-case letters and underscore for variables and subroutine names. Mixed case is conventionally reserved for package names.
You don't show create_hash_table so I can't tell what data it needs. I have chomped and split the record into lines, and passed a list of the lines in the record with the newlines removed. It would probably be better to pass the entire record as a single string and leave create_hash_table to process it as required.
sub read_input_file {
my ($input_file) = @_;
open my $fh, '<:crlf', $input_file or die $!;
local $/ = '';
my $record_counter = 0;
while (my $record = <$fh>) {
chomp $record;
++$record_counter;
create_hash_table(split /\n/, $record);
}
close $fh;
print "Total records : $record_counter\n";
}
You can do this more succinctly by changing Perl's record-separator, which will make the loop return a record at a time instead of a line at a time.
E.g. after opening your file:
local $/ = "\r\n\r\n";
my $recordCounter = 0;
$recordCounter++ while(<INPUTFILE>);
$/ holds Perl's global record-separator, and scoping it with local allows you to override its value temporarily until the end of the enclosing block, when it will automatically revert back to its previous value.
But it sounds like the file you're processing may actually have "\n\n" record-separators, or even "\r\r". You'd need to set the record-separator correctly for whatever file you're processing.
If your files are not huge multi-gigabytes files, the easiest and safest way is to read the whole file, and use the generic newline metacharacter \R.
This way, it also works if some file actually uses LF instead of CRLF (or even the old Mac standard CR).
Use it with split if you also need the actual records:
perl -ln -0777 -e 'my #records = split /\R\R/; print scalar(#records)' $Your_File
Or if you only want to count the records:
perl -ln -0777 -e 'my $count=()=/\R\R/g; print $count' $Your_File
For more details, see also my other answer here to a similar question.

Check for existence of directory in Perl with wildcard

I need to check whether any of a set of directories exist in a Perl script. The directories are named in the format XXXX*YYY - I need to check for each XXXX and enter an if statement if true.
In my script I have two variables $monitor_location (contains the path to the root directory being scanned) and $clientid (contains the XXXX).
The code snippet below has been expanded to show more of what I'm doing. I have a query which returns each client ID, I'm then looping for each record returned and trying to calculate the disk space used by that client ID.
I have the following code so far (doesn't work):
# loop for each client
while ( ($clientid, $email, $name, $max_record) = $query_handle1->fetchrow_array() )
{
# add leading zeroes to client ID if needed
$clientid=sprintf"%04s",$clientid;
# scan file system to check how much recording space has been used
if (-d "$monitor_location/$clientid\*") {
# there are some call recordings for this client
$str = `du -c $monitor_location/$clientid* | tail -n 1 2>/dev/null`;
$str =~ /^(\d+)/;
$client_recspace = $1;
print "Client $clientid has used $client_recspace of $max_record\n";
}
}
To be clear, I want to enter the if statement if there are any folders that start with XXXX.
Hope this makes sense! Thanks
You can use glob to expand the wildcard:
for my $dir (grep -d, glob "$monitor_location/$clientid*") {
...
}
I have a "thing" against glob. (It seems to only work once for me, meaning you couldn't re-glob that same dir again later in the same script. It's probably just me, though.)
I prefer readdir(). This is definitely longer, but it WFM.
chdir($monitor_location) or die $!;
opendir(my $dh, ".") or die $!;
my @items = grep { -d } grep { /^\Q$clientid/ } readdir($dh);
closedir($dh);
Everything in @items matches what you want.

How to insert a line into the middle of an existing file

Consider an example where I want to insert a few lines of text when a particular pattern matches (if $line =~ m/few lines in here/ then insert lines after the matching line):
*current file:*
"This is my file and i wanna insert few lines in here and other
text of the file will continue."
*After insertion:*
"This is my file and i wanna insert few lines in here this is my
new text which i wanted to insert and other text of the file will
continue."
This is my code:
my $sourcename = $ARGV[1];
my $destname = $ARGV[0];
print $sourcename,"\n";
print $destname,"\n";
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename) or die "Could not open source Excel file $sourcename: $!";
my $source_cell;
#Sheet 1 - source sheet page having testnumber and worksheet number
my $source_sheet = $source_book->{Worksheet}[0]; #It is used to access worksheet
$source_cell = $source_sheet->{Cells}[1][0]; #Reads content of the cell;
my $seleniumHost = $source_cell->Value;
print $seleniumHost,"\n";
open (F, '+>>', "$destname") or die "Couldn't open `$destname': $!";
my $line;
while ($line = <F>){
    print $line;
    if ($line =~ m/FTP/){
        #next if /FTP/;
        print $line;
        print F $seleniumHost;
    }
}
The perlfaq covers this. How do I change, delete, or insert a line in a file, or append to the beginning of a file?
Files are fixed blocks of data. They behave much like a piece of paper. How do you insert a line into the middle of a piece of paper? You can't, not unless you left space. You must recopy the whole thing, inserting your line into the new copy.
In a perl one-liner :
perl -ane 's/few lines in here and other\n/this is my\nnew text which i wanted to insert and other /; s/continue./\ncontinue./; print ' FILE
If you don't want a one-liner, it's easy to make the same substitutions in any script ;)
As long as you know the line:
perl -ne 'if ($. == 8) {s//THIS IS NEW!!!\n/}; print;'
Obviously you'd have to use -i to make the actual changes
OR:
perl -i -pe 'if($. == 8) {s//THIS IS NEW!!!\n/}' file
Someone mentioned Tie::File, which is a solution I'll have to look at for editing a file, but I generally use File::Slurp, which has relatively recently added edit_file and edit_file_lines subs.
Using perl's in-place edit flag (-i), it's easy to add lines to an existing file using Perl, as long as you can key off a text string, such as (in your case) "wanna insert few lines in here":
perl -pi -e 's{wanna insert few lines in here}{wanna insert few lines in here this is my\nnew text which i wanted to insert }' filename
It overwrites your old sentence (don't be scared) with a copy of your old sentence (nothing lost) plus the new stuff you want injected. You can even create a backup of the original file if you wish by passing a ".backup" extension to the -i flag:
perl -p -i'.backup' -e 's{wanna insert few lines in here}{wanna insert few lines in here this is my\nnew text which i wanted to insert }' filename
More info on Perl's search & replace capabilities can be found here:
http://www.atrixnet.com/in-line-search-and-replace-in-files-with-real-perl-regular-expressions/
You can avoid having to repeat the matched text by capturing it and using a backreference in the replacement:
echo -e "first line\nthird line" | perl -pe 's/(^first line$)/$1\nsecond line/'