I tried to run the following Perl script on the HTML further below. My problem is how to define the correct hash reference, with attribs that specify attributes of interest within my HTML <table> tag itself.
#!/usr/bin/perl
use strict; use warnings;
use HTML::TableExtract;
use YAML;
my $table = HTML::TableExtract->new(keep_html=>0, depth => 1, count => 1, br_translate => 0 );
$table->parse($html);
foreach my $row ($table->rows)
sub cleanup {
for ( #_ ) {
s/\s+//;
s/[\xa0 ]+\z//;
s/\s+/ /g;
}
}
{ print join("\t", #$row), "\n"; }
I want to apply this code on the HTML-document you see further below.
My first approach is to do this with the columns method. But i am not able to figure out how to use the columns method on the below HTML-file: My intuition makes me think it should be something like the following (but my intuition is wrong):
foreach my $column ($table->columns) {
print join("\t", #$column), "\n";
}
The HTML::TableExtract documentation doesn't shed much light (for me anyway).
I can see in the code of the module that the columns method belongs to HTML::TableExtract::Table, but I can't figure out how to use it. I appreciate any help.
Background:
I try to get the table extracted and I have a very very small document of tables that i want to parse with the HTML::TableExtract module I am trying to search for keywords in the HTML - so that i can take them for the attribs I have to print only the necessary data.
I tried going CPAN but could not really find how to search through it for particular keywords. One way to do it would be HTML::TableExtract - the other way would be to parse with HTML::TokeParser I have very little experience with HTML::TokeParser.
Well - one or the other way i need to do this parsing: I want to output the result of the parsed tables into some .text - or even better store it into a database. The problem here is I cant find anyway to search through the resulting parsed table and get necessary data.
The HTML
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> general information </legend>
-->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_result_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_search_info'>
-->
<tr>
<td width="100%" colspan="2" class="ldstabTitel"><strong>data_one </strong></td>
</tr>
<tr>
<td width="27%"><strong>data_two</strong></td>
<td width="73%"> 116439
</td>
</tr>
<tr>
<td width="27%"><strong>official_description</strong></td>
<td width="73%">the name </td>
</tr>
<tr>
<td width="27%"><strong>name of the street</strong></td>
<td width="73%">champs elysee</td>
</tr>
<tr>
<td width="27%"><strong>number and town</strong></td>
<td width="73%"> 75000 paris </td>
</tr>
<tr>
<td width="27%"><strong>telefon</strong></td>
<td width="73%"> 000241 49321
</td>
</tr>
<tr>
<td width="27%"><strong>fax</strong></td>
<td width="73%"> 000241 4093287
</td>
</tr>
<tr>
<td width="27%"><strong>e-mail-adresse</strong></td>
<td width="73%"> <a href=mailto:1111116439#my_domain.org>1222216439#site.org</a>
</td>
</tr>
<tr>
<td width="27%"><strong>internet-site</strong></td>
<td width="73%"> <a href=http://www.thesite.org>http://www.thesite.org</td>
</tr>
<!--
<tr>
<td width="27%"> </td>
<td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank">
[Schuldaten ändern] </a>
</tr>
</td> -->
<tr>
<td width="27%"> </td>
<td width="73%">the department</td>
</tr>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<tr>
<td width="27%"><strong>number of indidviduals</strong></td>
<td width="73%"> 192</td>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<!-- if (!fsp.isEmpty()){
ztext = " ";
int i = 0;
Iterator it = fsp.iterator();
while (it.hasNext()){
String[] zwert = new String[2];
zwert = (String[])it.next();
if (i==0){
if (zwert[1].equals("0")){
ztext = ztext+zwert[0];
}else{
ztext = ztext+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Schüler";
}else{
ztext = ztext+" Schülern";
}
}
i++;
}else{
if (zwert[1].equals("0")){
ztext = ztext+"<br> "+zwert[0];
}else{
ztext = ztext+"<br> "+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Schüler";
}else{
ztext = ztext+" Schülern";
}
}
}
}
-->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>
Thanks for any and all help.
You need to provide something that uniquely identifies the table in question. This can be the content of its headers or the HTML attributes. In this case, there is only one table in the document, so you don't even need to do that. But, if I were to provide anything to the constructor, I would provide the class of the table.
Also, I do not think you want the columns of the table. The first column of this table consists of labels and the second column consists of values. To get the labels and values at the same time, you should process the table row-by-row.
#!/usr/bin/perl
use strict; use warnings;
use HTML::TableExtract;
use YAML;
my $te = HTML::TableExtract->new(
attribs => { class => 'bp_result_tab_info' },
);
$te->parse_file('t.html');
for my $table ( $te->tables ) {
print Dump $table->columns;
}
Output:
---
- 'data_one '
- data_two
- official_description
- name of the street
- number and town
- telefon
- fax
- e-mail-adresse
- internet-site
- á
- á
- number of indidviduals
- á
---
- ~
- "á116439\r\n "
- 'the name '
- champs elysee
- ' 75000 paris '
- "á000241 49321\r\n"
- "á000241 4093287\r\n"
- "á1222216439#site.org\r\n"
- áhttp://www.thesite.org
- the department
- ~
- á192
- ~
Finally, a word of advice: It is clear that you do not have much of an understanding of Perl (or HTML for that matter). It would be better for you to try to learn some of the basics first. This way, all you are doing is incorrectly copying and pasting code from one answer into another and not learning anything.
Related
I am making a report for the invoice line, I have purchased a module in the third-party odoo store and it performs its function well.
But I can't see the discount on the invoice line.
I think this is because the module prevents me, but I already have no developer support.
What I need is that the discount (price list) can be seen on the invoice line.
What table or what element of the invoice line discount?
I leave you the code that I have in the report
''''
<tbody class="invoice_tbody">
<tr t-foreach="invoice_lines[0]" t-as="line">
<td><b><span t-esc="line['client_ref']"/></b>
<span t-esc="line['description']"/></td>
<td class="text-right">
<span t-esc="line['qty']"/>
</td>
<td class="text-right">
<span t-esc="line['price_unit']"/>
</td>
<td t-if="display_discount" class="text-right">
</td>
<td class="text-right" id="subtotal">
<t t-if="line['price_subtotal']">
<span t-esc = "line ['price_subtotal']" t-options = "{& quot; widget & quot ;: & quot; monetario & quot ;, & quot; display_currency & quot ;: o.currency_id}" /> </t>
</td>
</tr>
<tr t-foreach = "range (max (5-len (o.invoice_line_ids), 0))" t-as = "l">
<td t-translation = "off"> & amp; nbsp; </td>
<td class = "hidden" />
<td />
<td />
<td t-if = "display_discount" />
<td />
<td />
</tr>
</tbody>
</t>
'''
Yes, this parameter is in the report
"view / report_invoice_document"
But the report that I try to modify is this
report_invoice_document_inherit
<?xml version="1.0"?>
<data inherit_id="account.report_invoice_document">
<xpath expr="//table[#name='invoice_line_table']/tbody" position="replace">
<t t-if="res_company.is_group_by_so">
<t t-set="invoice_lines" t-value="o.get_invoice_lines()"/>
<tbody class="invoice_tbody">
<tr t-foreach="invoice_lines[0]" t-as="line">
<td><b><span t-esc="line['client_ref']"/></b>
<span t-esc="line['description']"/></td>
<!-- <td class="hidden"><span t-esc="line['client_ref']"/></td> -->
<td class="text-right">
<span t-esc="line['qty']"/>
<!-- <span t-field="l.uom_id" groups="product.group_uom"/> -->
</td>
<td class="text-right">
<span t-esc="line['price_unit']"/>
</td>
</td>
<td t-if="display_discount" class="text-right">
<!-- <span t-esc="line['price_unit']"/> -->
</td>
<td class="text-right" id="subtotal">
<t t-if="line['price_subtotal']">
<span t-esc="line['price_subtotal']" t-options="{"widget": "monetary", "display_currency": o.currency_id}"/></t>
</td>
</tr>
<tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
<td t-translation="off"> </td>
<td class="hidden"/>
<td/>
<td/>
<td t-if="display_discount"/>
<td/>
<td/>
</tr>
</tbody>
</t>
<t t-else="">
<tbody class="invoice_tbody">
<tr t-foreach="o.invoice_line_ids" t-as="l">
<td><span t-field="l.name"/></td>
<td class="hidden"><span t-field="l.origin"/></td>
<td class="text-right">
<span t-field="l.quantity"/>
<span t-field="l.uom_id" groups="product.group_uom"/>
</td>
<td class="text-right">
<span t-field="l.price_unit"/>
</td>
<td t-if="display_discount" class="text-right">
<span t-field="l.discount"/>
</td>
<td class="text-right">
<span t-esc="', '.join(map(lambda x: (x.description or x.name), l.invoice_line_tax_ids))"/>
</td>
<td class="text-right" id="subtotal">
<span t-field="l.price_subtotal" t-options="{"widget": "monetary", "display_currency": o.currency_id}"/>
</td>
</tr>
<tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
<td t-translation="off"> </td>
<td class="hidden"/>
<td/>
<td/>
<td t-if="display_discount"/>
<td/>
<td/>
</tr>
</tbody>
</t>
</xpath>
</data>
I have tried to modify the second report, and put and have looked at the python code in case something
invoice_report_grouped_by \ report \ account_invoice.py
# -*- coding: utf-8 -*-
from odoo import api, models
from datetime import datetime
class AccountInvoice(models.Model):
_inherit = "account.invoice"
def get_notation_amt(self, amt):
'''This method help us to return the value of the product pricing'''
amount = str(amt).split('.')
if len(amount) == 2:
amount = amount[0] + "," + amount[1]
return amount
return amt
#api.multi
def get_product_invoice_lines(self, client_ref=False):
'''This method helps to get the data for the following Invoice Line.'''
product_invoices = []
client_order_ref = []
for line in self.invoice_line_ids:
sale_line = (False, line)
if line.sale_line_ids:
sale_line = (line.sale_line_ids[0].order_id, line)
client_order_ref.append(sale_line)
if client_order_ref:
for ref in client_order_ref:
if (client_ref == ref[0]):
product_invoices.append({'price_subtotal': ref[1].price_unit * ref[1].quantity,
'default_code': ref[1].product_id.default_code,
'client_ref': False,
'discount': ref[1].discount,
'taxes': ",".join(map(lambda x: (x.description or x.name), ref[1].invoice_line_tax_ids)),
'description': ref[1].name,
'qty': self.get_notation_amt(ref[1].quantity),
'price_unit': self.get_notation_amt("{0:.3f}".format(ref[1].price_unit)),
})
else:
for line in self.invoice_line_ids:
product_invoices.append({'price_subtotal': line.price_unit * line.quantity,
'default_code': line.product_id.default_code,
'client_ref': False,
'discount': line.discount,
'taxes': ",".join(map(lambda x: (x.description or x.name), ref[1].invoice_line_tax_ids)),
'description': line.name,
'qty': self.get_notation_amt(line.quantity),
'price_unit': self.get_notation_amt("{0:.3f}".format(line.price_unit)),
})
return product_invoices
#api.multi
def get_invoice_lines(self):
'''This method help to get the invoice line group by Sale order'''
vals = []
sale_order_lines = []
false_sale_order_lines = []
for line in self.invoice_line_ids:
sale_line = False
if line.sale_line_ids:
sale_line = line.sale_line_ids[0].order_id
if sale_line:
sale_order_lines.append(sale_line)
else:
false_sale_order_lines.append(sale_line)
sale_order_lines = list(set(sale_order_lines))
false_sale_order_lines = list(set(false_sale_order_lines))
for sale_order in sale_order_lines:
if sale_order and self.origin:
confirmation_date = str(
sale_order.confirmation_date, '%d-%m-%Y %H:%M:%S').strftime('%d/%m/%Y')
client_ref = sale_order.name + ' - ' + confirmation_date
if sale_order.client_order_ref:
client_ref = client_ref + ' - ' + sale_order.client_order_ref
vals.append({'price_subtotal': False, 'default_code': False,
'client_ref': client_ref, 'description': False,
'qty': False, 'price_unit': False, 'taxes': False, 'discount': False})
vals.extend(self.get_product_invoice_lines(client_ref=sale_order))
# for sort false sale order, display manually invoice line at last
for so in false_sale_order_lines:
vals.extend(self.get_product_invoice_lines(client_ref=so))
return [vals, len(vals)]
You can see the default report here:
https://github.com/odoo/odoo/blob/06f9baae968674547cb2592b1c22147bfb2e8ba9/addons/account/views/report_invoice.xml#L49
<t t-set="display_discount" t-value="any([l.discount for l in o.invoice_line_ids])"/>
This means that if any line has a discount, it should display it.
I think there are two options to disable it. One is to remove that line from the report, or the second option is to set display_discount to false.
Knowing the module that breaks your report, the problem should be easy to find.
But the exact reason is hard to tell without seeing your module.
I'm a beginner and I'm trying to write a Selenium web driver. I'm using eclipse and trying to locate a link available in next column. In each row I have different link. For example in my employee data table I have 2 columns and 10 rows in one page. The first column contains the name of person and second column contains its employee ID (ID is a hyperlink). I'm trying to select the hyperlink of any employee but I'm not able to do it.
I have below the HTML code and sample script.
<span id="37">
<div class="clear"></div>
<div id="FC_Schema.1021.WIP" type="DataCheckingList" name="FC_Schema.1021.WIP">
<input id="baseurl" type="hidden" value="/Web/" name="baseurl">
<input id="hdnDcCnt" type="hidden" value="13" name="hdnDcCnt">
<input id="PageNo" type="hidden" value="1" name="PageNo">
<meta content="width=device-width" name="viewport">
<style type="text/css">
<div id="DataCheckingLoad">
<div id="SupplyChainTabs">
<div id="gbo" class="InnerAlertsTabs">
<table id="one" class="draggable" width="100%">
<thead>
<tbody>
<tr>
<td valign="top" align="center">
<td valign="top">
<td valign="top">
<a onclick="return ButtonClick(this,'TxnyD.Communities.2.1','Org.2000476_Product.THQS|SMS|Questionnaire.WIP_Questionnaire_TxnyD.Communities.2.1_THQS_TxnyD.Operationaloffice.1.2');" href="#f">Axiom
Process LLC</a>
<div class="country-name">
<div class="CompanyInfoContactLinks">
</td>
<td valign="top">
THQS
<div class="clear"> </div>
<a id="Org.2000476_Product.THQS|SMS|Questionnaire.WIP_Questionnaire_TxnyD.Communities.2.1_THQS_TxnyD.Operationaloffice.1.2" onclick="RedirectToQuestionnairePage(id)" href="#">THQS</a>
<div class="clear"> </div>
</td>
In the above code first column contains company name i.e "Axiom
Process LLC" and second column contains company subcription name i.e."THQS". I have to select the THQS link of company "Axiom Process LLC"
Selenium script designed for it as below
public static void main(String[] args) throws Exception {
WebDriver driver = new FirefoxDriver();
driver.manage().window().maximize();
driver.get("http://XXX.XXX.XXX.XXX/Web");
Thread.sleep(1000);
driver.findElement(By.xpath("//input[#id='UserName']")).sendKeys("Gbouser.1");
driver.findElement(By.xpath("//input[#id='Password']")).sendKeys("****");
driver.findElement(By.xpath("//input[#name='Login']")).click();
driver.findElement(By.xpath("//a[contains(text(),'Task')]")).click();
driver.findElement(By.xpath("//a[contains(text(),'Data Checking')]")).click();
driver.findElement(By.partialLinkText("Axiom")).findElement(By.xpath("//a[contains(text(),'THQS')]")).click();
}
So basically these are dynamic tables and each time it will have different records. How can select the link available in column?
I also tried below script as well:
driver.findElement(By.name("*[id^='CLS'][id$='Demolition Limited']")).findElement(By.xpath("//a[contains(text(),'THQS')])[3]")).click();
welcome to Stackoverflow. Before answering your question, just a few pointers. Think that you had a downvote 'cause you didn't format your question well. Otherwise a perfectly legal question, and you would receive help much earlier.
Secondly, don't ever put the real Web site address, username and password :). I was able to login, and understand exactly what you need, but a malicious guy could do bad wonders there.
To answer your question, instead of the line that is causing you problems, try this
Thread.sleep(3000);
TablePageObject tablePageObject = PageFactory.initElements(driver, TablePageObject.class);
tablePageObject.clickLink("Axiom");
Where TablePageObject is
public class TablePageObject {
private WebDriver driver;
#FindBy(css = "table tr")
private List<WebElement> allTableRows; // find all the rows of the table
public TablePageObject(WebDriver driver) {
this.driver = driver;
}
public void clickLink(String companyName) {
for(WebElement row : allTableRows) {
List<WebElement> links = row.findElements(By.cssSelector("a"));
// the first link by row is the company name, the second is link to be clicked
if (links.get(0).getText().contains(companyName)) {
links.get(3).click();
}
}
}
}
An explanation, TablePageObject class will be your programmatic handle for the dynamic table you're trying to control. It follows a page factory pattern of selenium, which I heartly recommend, and you can learn more here https://code.google.com/p/selenium/wiki/PageFactory
It basically keeps a collection of rows of your table, which more or less looks like:
<tr>
<td valign="top" align="center">
// irelevant
</td>
<td valign="top">
// irelevant
</td>
<td valign="top">
<a>Axiom Process LLC</a>
<a>company details</a>
<a>web details</a>
</td>
<td valign="top">
<a>THQS</a>
</td>
<td valign="top" align="center" style="display: none">
// irelevant
</td>
<td valign="top">
//irelevant
</td>
<td valign="top">
// irelevant
</td>
<td valign="top">
</td>
<td>
</td>
</tr>
You collect all the links from the table, check the company name against the first link, and if matched, click the fourth link,
Hope it helps, best,
by the way, I've masked your password in the question
I'm needing some help with Sed. I'm using it on Windows and Mac OSX. I need to Sed to add a
</tr>
<tr>
every 4 lines, after the first <tr> found, and stop doing it on </tr>
i Just can't find a way to doing this.
Every file will have up to 20 tables, so i need to do it automatically...
changing from this
<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 -
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
to this
<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 -
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
Is it possible with sed? If not, what tool should i use?
Thanks
I don't like the idea of using sed to handle HTML code. Said that, try with this:
Content of script.sed:
## For every line between '<tr>' and '</tr>' do ...
/<tr>/,/<\/tr>/ {
## Omit range edges.
/<\/\?tr>/ b;
## Append '<td>...</td>' to Hold Space (HS).
H;
## Get HS to Pattern Space (PS) to work with it.
x;
## If there are at least four newline characters means that exists four
## '<td>' tags too, so add a '<tr>' before them and a '</tr>' after them,
## print, and delete them (already processed).
/\(\n[^\n]*\)\{4\}/ {
s/^\(\n\)/<tr>\1/;
s/$/\n<\/tr>/;
p
s/^.*$//;
}
## Save the '<td>'s to HS again and read next line.
x;
b;
}
## Print all lines out of the range.
p;
Assuming infile with the data posted in the question, run the script like:
sed -nf script.sed infile
That yields:
<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 -
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
try awk
awk '{print}; /<td>/ && ++i==4 {print "</tr>\n<tr>"; i=0}' file
print the line
if it's a <td> then increase i
if i is 4 print </tr><tr> and reset i
Testing with given input the desired output is returned,
with the only "problem" that an extra <tr></tr> appears at the end of the list.
This is fixable but I'm running out of time here.
When I get back I can look into it if you think it is needed.
... part of the end of the result file
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
<tr> <-- extra <tr></tr> here
</tr>
</table>
you can try with regular expressions. You can test following expression on:
http://gskinner.com/RegExr/
Catch expression:
?</td>.<td>.*?</td>.<td>.*?</td>.<td>.*?</td>)(?!.</tr>)
Replace expression:
$1\n</tr>\n<tr>
Flags checked:
global, ignorecase, dotall
Result:
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
You can use editor like Notepad++ for batch replace on many files at once (syntax will be little different).
sed '\!<td>!,\!</table!{N;N;N;i\
</tr>\
<tr>
}' input_file
Perl solution, still using regular expression instead of parsing HTML:
perl -pe '
undef $inside if m{</tr>};
if ($inside and ($. % 4) == $tr_line) {
print "</tr>\n<tr>\n";
}
$inside = 1 if defined $tr_line;
$tr_line = ($. + 1) % 4 if /<tr>/;
' file
Using xsh:
open :F html file ; # Open as html.
while //table/tr[count(td)>4] wrap :U position()=8 tr //table/tr/td ; # Wrap four td's into a tr.
xmove :r //table/tr/tr before .. ; # Unwrap the extra tr.
remove //table/tr[last()] ; # Remove the extra tr.
My tables' rows in HTML are as follows,
<TR bgcolor="#FFFFFF" onmouseover="this.bgColor='#DBE9FF';" onmouseout="this.bgColor='#FFFFFF';">
<TD class="dlfont">07/01/2011 10:33 AM EDT</B> </TD>
<TD class="dlfont">DRB</B> </TD><TD class="dlfont">Blah</B> </TD>
<TD class="dlfont">PPD</B> </TD><TD class="dlfont"> </B> </TD>
<TD class="dlfont">07/01/2011</B> </TD>
<TD width=50 align=center><IMG border='0' src='/images/view.gif' height=10 width=19></TD>
</TR>
<TR bgcolor="#EEEEEE" onmouseover="this.bgColor='#DBE9FF';" onmouseout="this.bgColor='#EEEEEE';">
<TD class="dlfont">07/01/2011 10:33 AM EDT</B> </TD>
<TD class="dlfont">WHPSF</B> </TD>
<TD class="dlfont">Blah</B> </TD>
<TD class="dlfont"> </B> </TD>
<TD class="dlfont"> </B> </TD>
<TD class="dlfont">07/01/2011</B> </TD>
<TD width=50 align=center><IMG border='0' src='/images/view.gif' height=10 width=19></TD>
</TR>
When I extract the rows using HTML::TableExtract, the extra characters </B> also appear at the end and form some kind of special character. How can I get rid of this?
I would keep in mind two things when using HTML::TableExtract with the badly formatted HTML in your question
use keep_html=>1 in the HTML::TableExtract constructor
use a regex to remove the </B> , carefully
Here's some Perl code I wrote to prune the </B> out of the table cells, but note, this could change validly formatted HTML to badly formatted HTML if you blindly apply it in all cases.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
my($f) = #ARGV;
open F,$f;
my $html = join '',<F>;
close F;
### your html didn't include headers, so I added a first table row with td text, time a b c d e f, to help HTML::TableExtract find the table in file, $f
my $te = HTML::TableExtract->new(
keep_html=>1,
headers=>[qw/ time a b c d e f/]);
$te->parse($html);
for my $ts($te->tables)
{
print "Table(",join(',',$ts->coords),":\n";
for my $row ($ts->rows)
{
for my $cell (#$row)
{
next unless $cell;
## maybe add $ at end of regex or other test here to make sure valid cases of <B>...</B> are not affected
$cell =~ s/<\/B> //i;
print $cell."\n";
}
}
}
I need to be able to declare variables and after some markup later I
need to reference them. In order to accomplish this, this is
simplified version of my scala template:
#(map1:
java.util.LinkedHashMap[String,java.util.LinkedHashMap[String,Object]])
#import scala.collection.JavaConversions._
#import play.Logger
#for( (key,value) <- map1) {
<div>
#{
val rmap = Foo.someMethod(value)
val baz = rmap.getOrElse("baz", null)
<table border="0" cellpadding="0" cellspacing="0" >
<tbody>
<tr>
<td rowspan="3">
<div class="bar">
#baz
</div>
</td>
</tr>
</tbody>
</table>
}
</div>
}
Is above valid scala template and if not how can I declare baz and
reference it later in the markup?
I am using 1.2.2RC2 and scala 0.9.1
I was curious so did some digging. See https://groups.google.com/forum/#!topic/play-framework/Mo8hl5I0tBQ - there is no way at the moment, but an interesting work-around is shown. Define utils/Let.scala:
package utils
Object Let {
def let[A,B](a:A)(f:A=>B):B = f(a)
}
and then
#import utils.Let._
#let(2+3){ answer =>
#answer <hr> #answer
}
It's a very functional way of handling it, but then, what'd you expect in Scala :)
You can just use a for comprehension:
#for( (key,value) <- map1;
rmap = Foo.someMethod(value);
baz = rmap.getOrElse("baz", null)
) {
<div>
<table border="0" cellpadding="0" cellspacing="0" >
<tbody>
<tr>
<td rowspan="3">
<div class="bar">
#baz
</div>
</td>
</tr>
</tbody>
</table>
</div>
}
... and if you don't have anything you need to loop over, you can just say #for(i <- List(1); <declare variables>){<html here>}