Extract data and store to db

Extract data and store to db - perl

I have some Question and really need help on this thing but i really dont know where to start from
I want to submit such data via html at once using text area
---START----
|FULLNAME: JANE DEO
|BINF : 492BBJJ
|PRICE: 10
|COUNTRY: GR
---END---
---START----
|FULLNAME: JOHN DEO
|BINF : K92BBJJ
|PRICE: 24
|COUNTRY: AS
---END---
my main point is i want to be able to store multiple data to database instead of inserting 1 by 1
so i want to store each data where is START and END
#!/usr/bin/perl -w
use DBI;
use CGI qw/:standard/;
my $CGI = CGI->new;
my $host = "localhost";
my $dbname = "";
my $usr = "";
my $pwd = '';
my $dbh_usr = DBI->connect("DBI:mysql:$dbname:$host", $usr, $pwd, {RaiseError => 1,})
or die $DBI::errstr;
# $binf extracted from data where is |BINF :
# $price extracted from data where is |PRICE:
# $info this is the whole ---START---- and ---END---
my $upload = $CGI->param("data");
if ($upload) {
my $update_info = $dbh_usr->prepare("INSERT INTO ITEMS(user, pid, basnm, binf, info, status, price) VALUES(?,?,?,?,?,?,?)");
$update_info->execute('join123', '898', 'iono', $binf, $info, 'Active', $price);
$update_info->finish;
}
$dbh_usr->commit;
$dbh_usr->disconnect;
print "Content-type: text/html\n\n";
print <<HTML;
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<h1>The textarea</h1>
<form method="POST">
<textarea name="data" rows="4" cols="50"></textarea>
<br>
<input type="submit" value="Submit">
</form>
</body>
</html>
HTML

my #d1 = split /\n\n/, $upload;
my #data;
foreach (#d1) {
my #a = split /\n/;
$a[2] =~ m/\|BINF : (.+)/;
my $binf = $1;
$a[3] =~ m/\|PRICE: (.+)/;
my $price = $1;
push #data, {binf => $binf, price => $price};
}
foreach (#data) {
say $_->{binf};
say $_->{price};
}

Following sample demo code builds SQL query to add all data in one shot.
Repurpose the sample to your DB table structure.
use strict;
use warnings;
use feature 'say';
my($table,$record,#data,$re);
$table = 'tb_users';
$re = qr/\|(\S+)\s*:\s(.*)/;
while( <DATA> ) {
if( /---START----/ .. / ---END---/ ) {
$record->{$1} = $2 if /$re/;
}
if( / ---END--/ ) {
push #data, $record;
$record = undef;
}
}
my #keys = keys %{$data[0]};
my #values = map { "(" . join(',', map { "'$_'"} $_->#{#keys}) . ")" } #data;
my $query = "INSERT INTO $table
(" . join(',', map { "`$_`" } #keys) . ")
VALUES
" . join(",\n\t", #values) . ";";
say $query;
exit 0;
__DATA__
---START----
|FULLNAME: JANE DEO
|BINF : 492BBJJ
|PRICE: 10
|COUNTRY: GR
---END---
---START----
|FULLNAME: JOHN DEO
|BINF : K92BBJJ
|PRICE: 24
|COUNTRY: AS
---END---
Output
INSERT INTO tb_users
(`BINF`,`COUNTRY`,`FULLNAME`,`PRICE`)
VALUES
('492BBJJ','GR','JANE DEO','10'),
('K92BBJJ','AS','JOHN DEO','24');

Related

Calculations from db values perl cgi

I am trying to fetch calculate the gpa of a student by fetching their grade and the number of credits their class is and calculating it. I am having an issue where it is not calculating the creditsEarned and GPA properly. For the grade of U or F the credits earned should be 0 but that is not what the output is. I am not sure what is wrong with my statements.
#!/usr/bin/perl
#This is going to be the user login check and will set a cookie
use DBI;
use CGI qw(:standard);
use strict;
#Connection error
sub showErrorMsgAndExit {
print header(), start_html(-title=>shift);
print (shift);
print end_html();
exit;
}
#Connecting to the database
my $dbUsername = "root";
my $dbPassword = "password";
my $dsn = "DBI:mysql:f18final:localhost";
my $dbh = DBI->connect($dsn, $dbUsername, $dbPassword, {PrintError => 0});
#error checking
if(!$dbh) {
print header(), start_html(-title=>"Error connecting to DB");
print ("Unable to connec to the database");
print end_html();
exit;
}
print header;
print start_html(-title=>'Edit Classes');
#Need to execute sql command and then iterate row by row
my $sql = "SELECT * FROM tblclasses";
my $sth = $dbh->prepare($sql);
$sth->execute();
my $passedCredits = 0;
my $attemptedCredits = 0;
my $totalHonor = 0;
my $gpa = 0.000;
##SSSssssssearch part
print "<table border=solid 1px>"; #start of table
print "<tr><th>Class Name</th><th>Department</th><th>Class Number</th><th>Grade</th><th>Credits</th>";
print "</tr>";
while( my #row = $sth->fetchrow_array) {
print "<tr><td>";
print $row[1];
print "</td>";
print "<td>";
print $row[2];
print "</td>";
print "<td>";
print $row[3];
print "</td>";
print "<td>";
print $row[4];
print "</td>";
print "<td>";
print $row[5];
print "</td>";
$attemptedCredits = $attemptedCredits + $row[5];
if($row[4] == 'A' || $row[4] == 'a') {
$passedCredits = $passedCredits + $row[5];
$gpa = $gpa + (4 * $row[5]);
}
elsif($row[4] == 'B' || $row[4] == 'b') {
$passedCredits = $passedCredits + $row[5];
$gpa = $gpa + (3 * $row[5]);
}
elsif($row[4] == 'C' || $row[4] == 'c') {
$passedCredits = $passedCredits + $row[5];
$gpa = $gpa + (2 * $row[5]);
}
elsif($row[4] == 'D' || $row[4] == 'd') {
$passedCredits = $passedCredits + $row[5];
$gpa = $gpa + (1 * $row[5]);
}
elsif($row[4] == 'F' || $row[4] == 'f') {
}
elsif($row[4] == 'S' || $row[4] == 's') {
$passedCredits = $passedCredits + $row[5];
}
elsif($row[4] == 'U' || $row[4] == 'u') {
}
#calculate
print "</tr>";
}
print "</table>";
#Need to make a table and populate it with text boxes of all the class data
print "</table>"; #End of table
$gpa = $gpa / $attemptedCredits;
##RReturn values
print qq{
<table border = '1px solid'>
<tr>
<td>
Attempted Credits
</td>
<td>
Passed Credits
</td>
<td>
GPA
</td>
</tr>
<tr>
<td>
$attemptedCredits
</td>
<td>
$passedCredits
</td>
<td>
$gpa
</td>
</tr>
</table>
};
print "<form action=http://localhost/cgi-bin/actions.pl method = 'post' >";
print "<input type = 'submit' name = 'submit' value = 'More Options'>";
print "</form>";
print "<form action=http://localhost/cgi-bin/searchingTran.pl method = 'post' >";
print "<input type = 'text' name = 'search' size = '25'><br>";
print "<input type = 'submit' name = 'submit' value = 'Search'>";
print "</form>";
print end_html();
Here is my output
Also is there a way to print GPA out to three decimal places?

For the grade of U or F the credits earned should be 0 but that is not what the output is.
When you generate the output, you are printing the content of $row[5] before even looking at what the grade was. To display it correctly as 0, you will need to check the grade first and then print either 0 (if the grade was "F" or "U") or $row[5] (if the grade was anything else).
In real-world code, I would recommend using a templating system (such as Template::Toolkit) instead of printing out HTML directly, which would help to avoid this sort of mistake, but I see that this looks like a homework assignment and I doubt that using alternative methods like that would be within the boundaries of the assignment.
Also is there a way to print GPA out to three decimal places?
Use printf or sprintf:
$gpa = sprintf('%0.3f', $gpa / $attemptedCredits);

perl match part of string in two files

I'm using a perl script to look for matches between columns in two tab-delimited files. However for one column I only want to look for a partial match between two strings in two columns.
It concerns $row[4] of $table2 and $row{d} of $table1.
The values in $row[4] of $table2 look like this:
'xxxx'.
The values in $row{d} of $table1 look like this:
'xxxx.aaa'.
If the part before the '.' is the same, there is a match. If not, there is no match. I'm not sure how to implement this in my script. This is what I have so far. I only looks for complete matches between different columns. '...' denotes code that is not important for this question
#! /usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
use Getopt::Long qw(GetOptions);
...
...
chomp( my #header_table2 = split /\t/, <$table2> );
my %lookup;
while(<$table2>){
chomp;
my #row = split(/\t/);
$lookup{ $row[0] }{ $row[1] }{ $row[4] }{ $row[5] }{ $row[6] }{ $row[7] }{ $row[8] } = [ $row[9], $row[10] ];
}
my #header = do {
my $header = <$table1>;
$header =~ s/\t?\n\z//;
split /\t/, $header;
};
print $table3 join ("\t", #header, qw/ name1 name2 /), "\n";
{
no warnings 'uninitialized';
while(<$table1>){
s/\t?\n\z//;
my %row;
#row{#header} = split /\t/;
print $table3 join ( "\t", #row{#header},
#{ $lookup{ $row{a} }{ $row{b} }{ $row{c} }{ $row{d} }{ $row{e} }{ $row{f} }{ $row{g} }
// [ "", "" ] }), "\n";
}
}

This is looking like a job for a database
The solution below isn't going to work, because you are building your %lookup hash with nine levels of keys ($row[0] .. $row[8]) , and accessing it with only seven levels ($row{a} .. $row{g}), so you're going to have to edit in the real situation
I see no reason to next your hashes so deeply. A single key formed by using join on the relevant fields will work fine and probably a little faster. I also see no reason to extract table2 fields into an array and table1 fields into a hash. An array seems fine in both cases
I've solved your immediate problem by copying each #row from table1 into array #key, and removing the last dot and anything following from the fourth element before building the $key string
In view of your history of adding a spare tab character before the newline at the end of each record, I've also added four die statements that verify the size of the header row and columns rows before continuing. You will probably need to tweak those values according to your real data
use strict;
use warnings 'all';
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
use Getopt::Long qw(GetOptions);
use constant TABLE1_COLUMNS => 9;
use constant TABLE2_COLUMNS => 11;
open my $table2, '<', 'table2.txt' or die $!;
my #header_table2 = do {
my $header = <$table2>;
$header =~ s/\t?\n\z//;
split /\t/, $header;
};
die "Incorrect table 2 header count " . scalar #header_table2
unless #header_table2 == TABLE2_COLUMNS;
my %lookup;
while ( <$table2> ) {
chomp;
my #row = split /\t/;
die "Incorrect table 2 column count " . scalar #row
unless #row == TABLE2_COLUMNS;
my $key = do {
local $" = "\n";
"#row[0..8]";
};
$lookup{ $key } = [ #row[9,10] ];
}
open my $table1, '<', 'table1.txt' or die $!;
my #header = do {
my $header = <$table1>;
$header =~ s/\t?\n\z//;
split /\t/, $header;
};
die "Incorrect table 1 header count " . scalar #header
unless #header == TABLE1_COLUMNS;
open my $table3, '>', 'table3.txt' or die $!;
print $table3 join ("\t", #header, qw/ name1 name2 /), "\n";
while ( <$table1> ) {
s/\t?\n\z//;
my #row = split /\t/;
die "Incorrect table 1 column count " . scalar #row
unless #row == TABLE1_COLUMNS;
my $key = do {
my #key = #row;
$key[3] =~ s/\.[^.]*\z//;
local $" = "\n";
"#key";
};
my $lookup = $lookup{ $key } // [ "", "" ];
print $table3 join("\t", #row, #$lookup), "\n";
}

You're going to have a scoping problem because your array #row and your hash %row both exist in completely different scopes.
But if you have variables (say, $foo and $bar) and you want to know if $foo starts with the contents of $bar followed by a dot, then you can do that using a regular expression check like this:
if ($foo =~ /^$bar\./) {
# match
} else {
# no match
}

rearrange data from one column to a row

I have the below data and I need to make the second column as the header. Any help is appreciated.
Data:
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Output:
PRODUCT|Market|Rate Jump|Sanity|Voltality
IBM|100,983874.34324|||7,73894756.93897434897
GOOG||25,873476378.234234|15,8932748|||
MBLY|340,23423423432.6783|||
Code (incomplete / not sure hot to get to the end):
#!/usr/bin/perl
use strict;
use Getopt::Long;
use warnings;
use Data::Dumper;
my $valsep = ',';
my ( %type, %keys, %ccy, %cnt, %avg );
while (<>) {
chomp;
my ( $product, $reason, $count, $lat ) = split /,/;
my $key = "$product,$reason";
if ( not exists( $type{$reason} ) ) {
$type{$reason} = $reason;
}
$ccy{$key} = $product;
$cnt{$key} = $count;
$avg{$key} = $lat;
}
close(INPUT);
print Dumper ( \%ccy );
print Dumper ( \%type );
my ( %pair, %details );
foreach my $rows ( sort keys %ccy ) {
print "the key is : $rows and $ccy{$rows}\n";
foreach my $res ( sort keys %type ) {
print "The type is : $res and $type{$res}\n";
}
}

You just need to keep track of your columns and row data when parsing the data structure.
The following demonstrates:
#!/usr/bin/perl
use strict;
use warnings;
my $fh = \*DATA;
my %columns;
my %rows;
while (<$fh>) {
chomp;
my ( $company, $col, $vals ) = split ',', $_, 3;
# Track Columns for later labeling
$columns{$col}++;
$rows{$company}{$col} = $vals;
}
my #columns = sort keys %columns;
# Header
print join( '|', 'PRODUCT', #columns ), "\n";
for my $company ( sort keys %rows ) {
print join( '|', $company, map { $_ // '' } #{ $rows{$company} }{#columns} ), "\n";
}
__DATA__
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Outputs:
PRODUCT|Market|Rate Jump|Sanity|Voltality
GOOG||25,873476378.234234|15,8932748|
IBM|100,983874.34324|||7,73894756.93897434897
MBLY|340,23423423432.6783|||

The following code will do the job; rather than using several hashes, I've put all the data in a hash of hashes. I've put comments in the script to explain what is happening in case you are not sure. You can, of course, delete them in your script.
#!/usr/bin/perl
use warnings;
use strict;
my %market;
while (<DATA>) {
next unless /\w/;
# remove line endings
chomp;
# split line by commas -- only split into three parts
my #col = split ",", $_, 3;
# save the data as $market{col0}{col1} = col2
$market{$col[0]}{$col[1]} = $col[2];
}
# create an output file
my $outfile = 'output.txt';
open( my $fh, ">", $outfile ) or die "Could not open $outfile: $!";
my #headers = ('Market','Rate Jump','Sanity','Volatility');
# print out the header line, joined by |
print { $fh } join('|', 'PRODUCT', #headers) . "\n";
# for each product in the market data
for my $p (sort keys %market) {
# print the product name
print { $fh } join('|', $p,
# go through the headers using map (map acts like a "for" loop)
# if the relevant property exists in the market data, print it;
# if not, print nothing
map { $market{$p}{$_} // '' } #headers) . "\n";
}
# this is the input data. You might be reading yours in from a file
__DATA__
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Output:
PRODUCT|Market|Rate Jump|Sanity|Volatility
GOOG||25,873476378.234234|15,8932748|
IBM|100,983874.34324|||7,73894756.93897434897
MBLY|340,23423423432.6783|||

How do I print a hash in the same order the elements were added?

Is it possible to use a hash in a LIFO or FIFO way? How can I make sure that the hash is printed in the same order the elements where added?
#!/usr/bin/perl
print "content-type: text/html \n\n";
# BEGINNING HASH
%coins = ( "Quarter" , .25,
"Dime" , .10,
"Nickel", .05 );
# PRINT THE OLD HASH
while (($key, $value) = each(%coins)){
print $key.", ".$value."<br />";
}
# ADD NEW ELEMENT PAIRS
$coins{Penny} = .01;
$coins{HalfDollar} = .50;
# PRINT THE NEW HASH
print "<br />";
while (($key, $value) = each(%coins)){
print $key.", ".$value."<br />";
}
This gives
Nickel, 0.05
Dime, 0.1
Quarter, 0.25
Nickel, 0.05
Dime, 0.1
HalfDollar, 0.5
Penny, 0.01
Quarter, 0.25

You can either keep a separate array of the hash keys, so that you can write
my #coins = qw/ Quarter Dime Nickel Penny HalfDollar /;
for my $key (#coins) {
print "$key, $coins{$key}<br />\n";
}
or you can install and use the Tie::IxHash module, which will keep the hash elements in the order they were inserted, like this
use strict;
use warnings;
use Tie::IxHash;
tie my %coins, 'Tie::IxHash';
print "content-type: text/html \n\n";
%coins = (
Quarter => 0.25,
Dime => 0.10,
Nickel => 0.05,
);
while (my ($key, $value) = each %coins){
print "$key, $value<br />\n";
}
$coins{Penny} = .01;
$coins{HalfDollar} = .50;
print "<br />\n";
while (my ($key, $value) = each(%coins)){
print "$key, $value<br />\n";
}

default hashes don't preserve order - that's the very nature of hashes. You could use a module like Tie::IxHash to preserve the order the items were added.

I have come up with a solution without any need of using Tie::IxHash.
The approach is to add a character/number before the key of the hash.
This will allow us to get the order in which we pushed keys in the hash when keys will be sorted.
Here's the code:
#!/usr/perl/bin -w
use strict;
use Data::Dumper;
my %hash = ();
my #keys = qw(one two three);
my $count = 0;
for(my $i=0; $i<scalar(#keys); $i++) {
$hash{ chr($count) . $keys[$i] } = 1;
$count++;
}
foreach my $key (sort { ord $a <=> ord $b } keys(%hash) ) {
print substr($key, 1) . " => " . $hash{$key}. "\n";
}

How to make LWP and HTML::TableExtract spitting out CSV with Text::CSV

I am currently working on a little parser.
i have had very good results with the first script! This was able to run great!
It fetches the data from the page: http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=20
(note 6142 records) - But note - the data are not separated, so the subequent work with the data is a bit difficult. Therefore i have a second script - see below!
Note - friends helped me with the both scripts. I need to introduce myself as a true novice who needs help in migration two in one. So, you see, my Perl-knowlgedge is not so elaborated that i am able to do the migration into one on my own! Any and all help would be great!
The first script: a spider and parser: it spits out the data like this:
lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183  Abenberg  09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg (Grundschule) Güssübelstr. 2 91183  Abenberg  09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
3 6913 Mittelschule Abenberg  Güssübelstr. 2 91183  Abenberg  09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
4 0402 Johann-Turmair-Realschule Staatliche Realschule Abensberg Stadionstraße 46 93326  Abensberg  09443/9143-0,12,13 09443/914330 Realschulen www.rs-abensberg.de
But i need to separate the data: with commas or someting like that!
And i have a second script. This part can do the CSV-formate. i want to ombine it with the spider-logic. But first lets have a look at the first script: with the great spider-logic.
see the code that is appropiate:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
use Cwd;
use POSIX qw(strftime);
my $te = HTML::TableExtract->new;
my $total_records = 0;
my $suchbegriffe = "e";
my $treffer = 50;
my $range = 0;
my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
my $processdir = "processing";
my $counter = 50;
my $displaydate = "";
my $percent = 0;
&workDir();
chdir $processdir;
&processURL();
print "\nPress <enter> to continue\n";
<>;
$displaydate = strftime('%Y%m%d%H%M%S', localtime);
open OUTFILE, ">webdata_for_$suchbegriffe\_$displaydate.txt";
&processData();
close OUTFILE;
print "Finished processing $total_records records...\n";
print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$suchbegriffe\_$displaydate.txt\n";
unlink 'processing.html';
die "\n";
sub processURL() {
print "\nProcessing $url_to_process$suchbegriffe&a=$treffer&s=$range\n";
getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'tempfile.html') or die 'Unable to get page';
while( <tempfile.html> ) {
open( FH, "$_" ) or die;
while( <FH> ) {
if( $_ =~ /^.*?(Treffer <b>)(d+)( - )(d+)(</b> w+ w+ <b>)(d+).*/ ) {
$total_records = $6;
print "Total records to process is $total_records\n";
}
}
close FH;
}
unlink 'tempfile.html';
}
sub processData() {
while ( $range <= $total_records) {
getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'processing.html') or die 'Unable to get page';
$te->parse_file('processing.html');
my ($table) = $te->tables;
for my $row ( $table->rows ) {
cleanup(#$row);
print OUTFILE "#$row\n";
}
$| = 1;
print "Processed records $range to $counter";
print "\r";
$counter = $counter + 50;
$range = $range + 50;
$te = HTML::TableExtract->new;
}
}
sub cleanup() {
for ( #_ ) {
s/s+/ /g;
}
}
sub workDir() {
# Use home directory to process data
chdir or die "$!";
if ( ! -d $processdir ) {
mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
}
}
But as this-above script-unfortunatley does not take care for the separators i have had to take care for a method, that does look for separators. In order to get the data (output) separated.
So with the separation i am able to work with the data - and store it in a mysql-table.. or do something else...So here [below] are the bits - that work out the csv-formate Note - i want to put the code below into the code above - to combine the spider-logic of the above mentioned code with the logic of outputting the data in CSV-formate.
where to set in the code Question: can we identify this point to migrate the one into the other... !?
That would be amazing... I hope i could make clear what i have in mind...!? Are we able to use the benefits of the both parts (/scripts ) migrating them into one?
So the question is: where to set in with the CSV-Script into the script (above)
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;
my $html= get 'http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20';
$html =~ tr/\r//d; # strip carriage returns
$html =~ s/ / /g; # expand spaces
my $te = new HTML::TableExtract();
$te->parse($html);
my #cols = qw(
rownum
number
name
phone
type
website
);
my #fields = qw(
rownum
number
name
street
postal
town
phone
fax
type
website
);
my $csv = Text::CSV->new({ binary => 1 });
foreach my $ts ($te->table_states) {
foreach my $row ($ts->rows) {
# trim leading/trailing whitespace from base fields
s/^\s+//, s/\s+$// for #$row;
# load the fields into the hash using a "hash slice"
my %h;
#h{#cols} = #$row;
# derive some fields from base fields, again using a hash slice
#h{qw/name street postal town/} = split /\n+/, $h{name};
#h{qw/phone fax/} = split /\n+/, $h{phone};
# trim leading/trailing whitespace from derived fields
s/^\s+//, s/\s+$// for #h{qw/name street postal town/};
$csv->combine(#h{#fields});
print $csv->string, "\n";
}
}
The thing is that i have had very good results with the first script! It fetches the data from the page: http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=20
(note 6142 records) - But note - the data are not separated...!
And i have a second script. This part can do the CSV-formate. i want to combine it with the spider-logic.
where is the part to insert? I look forward to any and all help.
if i have to be more precice - just let me know...

Since you have entered a complete script, I'll assume you want critique of the whole thing.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
use Cwd;
use POSIX qw(strftime);
my $te = HTML::TableExtract->new;
Since you only use $te in one block, why are you declaring and initializing it in this outer scope? The same question applies to most of your variables -- try to declare them in the innermost scope possible.
my $total_records = 0;
my $suchbegriffe = "e";
my $treffer = 50;
In general, english variable names will enable you to collaborate with far more people than german names. I understand german, so I understand the intent of your code, but most of SO doesn't.
my $range = 0;
my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
my $processdir = "processing";
my $counter = 50;
my $displaydate = "";
my $percent = 0;
&workDir();
Don't use & to call subs. Just call them with workDir;. It hasn't been necessary to use & since 1994, and it can lead to a nasty gotcha because &callMySub; is a special case which doesn't do what you might think, while callMySub; does the Right Thing.
chdir $processdir;
&processURL();
print "\nPress <enter> to continue\n";
<>;
$displaydate = strftime('%Y%m%d%H%M%S', localtime);
open OUTFILE, ">webdata_for_$suchbegriffe\_$displaydate.txt";
Generally lexical filehandles are preferred these days: open my $outfile, ">file"; Also, you should check for errors from open or use autodie; to make open die on failure.
&processData();
close OUTFILE;
print "Finished processing $total_records records...\n";
print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_$suchbegriffe\_$displaydate.txt\n";
unlink 'processing.html';
die "\n";
sub processURL() {
print "\nProcessing $url_to_process$suchbegriffe&a=$treffer&s=$range\n";
getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'tempfile.html') or die 'Unable to get page';
while( <tempfile.html> ) {
open( FH, "$_" ) or die;
while( <FH> ) {
if( $_ =~ /^.*?(Treffer <b>)(d+)( - )(d+)(</b> w+ w+ <b>)(d+).*/ ) {
$total_records = $6;
print "Total records to process is $total_records\n";
}
}
close FH;
}
unlink 'tempfile.html';
}
sub processData() {
while ( $range <= $total_records) {
getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'processing.html') or die 'Unable to get page';
$te->parse_file('processing.html');
my ($table) = $te->tables;
for my $row ( $table->rows ) {
cleanup(#$row);
print OUTFILE "#$row\n";
This is the line to change if you want to put commas in separating your data. Look at the join function, it can do what you want.
}
$| = 1;
print "Processed records $range to $counter";
print "\r";
$counter = $counter + 50;
$range = $range + 50;
$te = HTML::TableExtract->new;
}
It's very strange to initialize $te at the end of the loop instead of the beginning. It's much more idiomatic to declare and initialize $te at the top of the loop.
}
sub cleanup() {
for ( #_ ) {
s/s+/ /g;
Did you mean s/\s+/ /g;?
}
}
sub workDir() {
# Use home directory to process data
chdir or die "$!";
if ( ! -d $processdir ) {
mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
}
}
I haven't commented on your second script; perhaps you should ask it as a separate question.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Extract data and store to db - perl

my #d1 = split /\n\n/, $upload; my #data; foreach (#d1) { my #a = split /\n/; $a[2] =~ m/\|BINF : (.+)/; my $binf = $1; $a[3] =~ m/\|PRICE: (.+)/; my $price = $1; push #data, {binf => $binf, price => $price}; } foreach (#data) { say $_->{binf}; say $_->{price}; }

Related

Calculations from db values perl cgi

perl match part of string in two files

rearrange data from one column to a row

How do I print a hash in the same order the elements were added?

How to make LWP and HTML::TableExtract spitting out CSV with Text::CSV

Categories

Resources