Generate HTML from stats model summary - multiple-regression

I have the following code to fit a regression model and print the summary to a log file:
import statsmodels.formula.api as smf

# Finding the model fit using multiple regression
fit = smf.ols(self.formula_string, data=df_train).fit()
fit_parameters = str(fit.params)
fit_summary = str(fit.summary())
logger.info('fit_summary:\n%s', fit_summary)
The summary consists of a table followed by a grid. Can the grid part of the summary alone (shown in blue in my sample image) be converted to an HTML file?

The OLS summary is built from three separate tables. Each table can be converted separately to plain text, HTML or LaTeX.
In the following, res is the OLS results instance returned by the fit method:
>>> summ = res.summary()
>>> dir(summ)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '__weakref__', '_repr_html_', 'add_extra_txt',
'add_table_2cols', 'add_table_params', 'as_csv', 'as_html', 'as_latex',
'as_text', 'extra_txt', 'tables']
>>> len(summ.tables)
3
>>> summ.tables[1].as_html()
'<table class="simpletable">\n<tr>\n <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n</tr>\n<tr>\n <th>C(Region)[C]</th> <td> 38.6517</td> <td> 9.456</td> <td> 4.087</td> <td> 0.000</td> <td> 19.826</td> <td> 57.478</td>\n</tr>\n<tr>\n <th>C(Region)[E]</th> <td> 23.2239</td> <td> 14.931</td> <td> 1.555</td> <td> 0.124</td> <td> -6.501</td> <td> 52.949</td>\n</tr>\n<tr>\n <th>C(Region)[N]</th> <td> 28.6347</td> <td> 13.127</td> <td> 2.181</td> <td> 0.032</td> <td> 2.501</td> <td> 54.769</td>\n</tr>\n<tr>\n <th>C(Region)[S]</th> <td> 34.1034</td> <td> 10.370</td> <td> 3.289</td> <td> 0.002</td> <td> 13.459</td> <td> 54.748</td>\n</tr>\n<tr>\n <th>C(Region)[W]</th> <td> 28.5604</td> <td> 10.018</td> <td> 2.851</td> <td> 0.006</td> <td> 8.616</td> <td> 48.505</td>\n</tr>\n<tr>\n <th>Literacy</th> <td> -0.1858</td> <td> 0.210</td> <td> -0.886</td> <td> 0.378</td> <td> -0.603</td> <td> 0.232</td>\n</tr>\n<tr>\n <th>Wealth</th> <td> 0.4515</td> <td> 0.103</td> <td> 4.390</td> <td> 0.000</td> <td> 0.247</td> <td> 0.656</td>\n</tr>\n</table>'
>>> print(summ.tables[1])
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
C(Region)[C]    38.6517      9.456      4.087      0.000      19.826      57.478
C(Region)[E]    23.2239     14.931      1.555      0.124      -6.501      52.949
C(Region)[N]    28.6347     13.127      2.181      0.032       2.501      54.769
C(Region)[S]    34.1034     10.370      3.289      0.002      13.459      54.748
C(Region)[W]    28.5604     10.018      2.851      0.006       8.616      48.505
Literacy        -0.1858      0.210     -0.886      0.378      -0.603       0.232
Wealth           0.4515      0.103      4.390      0.000       0.247       0.656
================================================================================
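The sketch below saves only the coefficient grid as its own HTML page. It is minimal and assumes, as in the session above, that tables[1] of an OLS summary holds the coefficients. The helper names build_table_page and write_table_html are hypothetical. They only wrap the fragment returned by as_html() in a bare HTML document.

```python
import html
from pathlib import Path


def build_table_page(table_html: str, title: str = "Coefficients") -> str:
    """Wrap an HTML <table> fragment in a minimal standalone HTML document."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{table_html}\n"
        "</body>\n</html>\n"
    )


def write_table_html(table_html: str, path: str, title: str = "Coefficients") -> Path:
    """Write the wrapped table to `path` as UTF-8 and return the path."""
    out = Path(path)
    out.write_text(build_table_page(table_html, title), encoding="utf-8")
    return out
```

Using the fit from the question, the call would be write_table_html(fit.summary().tables[1].as_html(), "coefficients.html"). If only the log needs the grid, str(fit.summary().tables[1]) gives the same table as plain text.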


html2text command line breaking html

I'm trying to figure out why html2text is breaking my HTML:
<div><table> <tbody> <tr> <td> <span><strong><span>About</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>Contact</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><a><span>Maths Games Order</span></a></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>FAQ</span></strong></span></td> </tr> </tbody> </table>s<div> <span><strong>Broadbent Maths Ltd<br> 3 High Street, Welbourn, Lincoln, LN5 0NH </strong></span></div> </div>
Processing it with:
cat "/home/spider/original-file.txt" | html2text -utf8 -nobs -style pretty
When I run that, I get:
Input recoding failed due to invalid input sequence. Unconverted part
of text follows. ▒Contact ▒Maths Games Order ▒FAQ
s Broadbent Maths Ltd 3 High Street, Welbourn, Lincoln, LN5 0NH
When I run Devel::Peek::Dump() (Perl), I see the string as:
SV = PV(0x564c0a72c860) at 0x564c09967c80
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x564c0a58bc60 "\n<div><table> <tbody> <tr> <td> <span><strong><span>About</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>Contact</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><a><span>Maths Games Order</span></a></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>FAQ</span></strong></span></td> </tr> </tbody> </table>s<div> <span><strong>Broadbent Maths Ltd<br> 3 High Street, Welbourn, Lincoln, LN5 0NH </strong></span></div> </div>\n"\0 [UTF8 "\n<div><table> <tbody> <tr> <td> <span><strong><span>About</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>Contact</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><a><span>Maths Games Order</span></a></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>FAQ</span></strong></span></td> </tr> </tbody> </table>s<div> <span><strong>Broadbent Maths Ltd<br> 3 High Street, Welbourn, Lincoln, LN5 0NH </strong></span></div> </div>\n"]
CUR = 725
LEN = 736
COW_REFCNT = 1
If I remove the first bit:
<div><table>
It works fine! I don't get why it's breaking there, though. Everything seems OK to me?
OK, I think I've worked it out. In this case, for some reason the • character was breaking it. I replaced it with "-", and now it works with:
html2text -utf8 -nobs -o test-out.txt test.co.uk.txt
It's a bit weird that html2text breaks on a special character like that, though?
UPDATE: The problem turned out to be that, although the page declared UTF-8 in its meta tag, the server was sending it as ISO-8859-1. So I now parse the server header and check it before saving. If it is windows-1252, I use this command instead:
html2text -ansi -nobs -o test-out.txt test.co.uk.txt
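The update boils down to this: the bytes were windows-1252, where • is the single byte 0x95. That byte is not valid UTF-8, so html2text's input recoding failed at that point.

The sketch below is a swapped-in alternative to switching flags. It reads the charset from the server's Content-Type header with Python's email.message.Message, re-encodes the body to UTF-8, and then always calls html2text -utf8. The names charset_from_content_type and to_utf8 are hypothetical. Falling back to UTF-8 when the server sends no charset is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the (lowercased) charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Assumption: fall back to UTF-8 when the header carries no charset.
    return msg.get_content_charset(default)


def to_utf8(raw: bytes, content_type: str) -> bytes:
    """Re-encode a page body to UTF-8 using the charset the server actually sent."""
    charset = charset_from_content_type(content_type)
    # errors="replace" keeps going on stray bytes instead of aborting.
    return raw.decode(charset, errors="replace").encode("utf-8")
```

The result can then be piped into the html2text -utf8 -nobs command shown above, for example with subprocess.run(["html2text", "-utf8", "-nobs"], input=..., capture_output=True).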

invoice report - display discount in invoice line - Odoo 12 - QWeb

I am making a report for the invoice lines. I purchased a module from the third-party Odoo store, and it performs its function well.
But I can't see the discount on the invoice line.
I think this is because of the module, but I no longer have developer support.
What I need is for the discount (from the price list) to be shown on the invoice line.
Which table or field holds the invoice line discount?
Here is the code I have in the report:
<tbody class="invoice_tbody">
<tr t-foreach="invoice_lines[0]" t-as="line">
<td><b><span t-esc="line['client_ref']"/></b>
<span t-esc="line['description']"/></td>
<td class="text-right">
<span t-esc="line['qty']"/>
</td>
<td class="text-right">
<span t-esc="line['price_unit']"/>
</td>
<td t-if="display_discount" class="text-right">
</td>
<td class="text-right" id="subtotal">
<t t-if="line['price_subtotal']">
<span t-esc="line['price_subtotal']" t-options="{&quot;widget&quot;: &quot;monetary&quot;, &quot;display_currency&quot;: o.currency_id}"/></t>
</td>
</tr>
<tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
<td t-translation="off">&amp;nbsp;</td>
<td class="hidden"/>
<td/>
<td/>
<td t-if="display_discount"/>
<td/>
<td/>
</tr>
</tbody>
</t>
Yes, this parameter is in the report view "report_invoice_document".
But the report I am trying to modify is this one:
report_invoice_document_inherit
<?xml version="1.0"?>
<data inherit_id="account.report_invoice_document">
<xpath expr="//table[@name='invoice_line_table']/tbody" position="replace">
<t t-if="res_company.is_group_by_so">
<t t-set="invoice_lines" t-value="o.get_invoice_lines()"/>
<tbody class="invoice_tbody">
<tr t-foreach="invoice_lines[0]" t-as="line">
<td><b><span t-esc="line['client_ref']"/></b>
<span t-esc="line['description']"/></td>
<!-- <td class="hidden"><span t-esc="line['client_ref']"/></td> -->
<td class="text-right">
<span t-esc="line['qty']"/>
<!-- <span t-field="l.uom_id" groups="product.group_uom"/> -->
</td>
<td class="text-right">
<span t-esc="line['price_unit']"/>
</td>
<td t-if="display_discount" class="text-right">
<!-- <span t-esc="line['price_unit']"/> -->
</td>
<td class="text-right" id="subtotal">
<t t-if="line['price_subtotal']">
<span t-esc="line['price_subtotal']" t-options='{"widget": "monetary", "display_currency": o.currency_id}'/></t>
</td>
</tr>
<tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
<td t-translation="off">&nbsp;</td>
<td class="hidden"/>
<td/>
<td/>
<td t-if="display_discount"/>
<td/>
<td/>
</tr>
</tbody>
</t>
<t t-else="">
<tbody class="invoice_tbody">
<tr t-foreach="o.invoice_line_ids" t-as="l">
<td><span t-field="l.name"/></td>
<td class="hidden"><span t-field="l.origin"/></td>
<td class="text-right">
<span t-field="l.quantity"/>
<span t-field="l.uom_id" groups="product.group_uom"/>
</td>
<td class="text-right">
<span t-field="l.price_unit"/>
</td>
<td t-if="display_discount" class="text-right">
<span t-field="l.discount"/>
</td>
<td class="text-right">
<span t-esc="', '.join(map(lambda x: (x.description or x.name), l.invoice_line_tax_ids))"/>
</td>
<td class="text-right" id="subtotal">
<span t-field="l.price_subtotal" t-options='{"widget": "monetary", "display_currency": o.currency_id}'/>
</td>
</tr>
<tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
<td t-translation="off">&nbsp;</td>
<td class="hidden"/>
<td/>
<td/>
<td t-if="display_discount"/>
<td/>
<td/>
</tr>
</tbody>
</t>
</xpath>
</data>
I have tried modifying the second report, and I have also looked at the Python code in case the cause is there:
invoice_report_grouped_by\report\account_invoice.py
# -*- coding: utf-8 -*-
from odoo import api, models


class AccountInvoice(models.Model):
    _inherit = "account.invoice"

    def get_notation_amt(self, amt):
        '''This method helps return the value of the product pricing (comma as decimal separator).'''
        amount = str(amt).split('.')
        if len(amount) == 2:
            amount = amount[0] + "," + amount[1]
            return amount
        return amt

    @api.multi
    def get_product_invoice_lines(self, client_ref=False):
        '''This method helps get the data for the following invoice lines.'''
        product_invoices = []
        client_order_ref = []
        for line in self.invoice_line_ids:
            sale_line = (False, line)
            if line.sale_line_ids:
                sale_line = (line.sale_line_ids[0].order_id, line)
            client_order_ref.append(sale_line)
        if client_order_ref:
            for ref in client_order_ref:
                if client_ref == ref[0]:
                    product_invoices.append({
                        'price_subtotal': ref[1].price_unit * ref[1].quantity,
                        'default_code': ref[1].product_id.default_code,
                        'client_ref': False,
                        'discount': ref[1].discount,
                        'taxes': ",".join(map(lambda x: (x.description or x.name), ref[1].invoice_line_tax_ids)),
                        'description': ref[1].name,
                        'qty': self.get_notation_amt(ref[1].quantity),
                        'price_unit': self.get_notation_amt("{0:.3f}".format(ref[1].price_unit)),
                    })
        else:
            for line in self.invoice_line_ids:
                product_invoices.append({
                    'price_subtotal': line.price_unit * line.quantity,
                    'default_code': line.product_id.default_code,
                    'client_ref': False,
                    'discount': line.discount,
                    # use `line` here: `ref` is not defined in this branch
                    'taxes': ",".join(map(lambda x: (x.description or x.name), line.invoice_line_tax_ids)),
                    'description': line.name,
                    'qty': self.get_notation_amt(line.quantity),
                    'price_unit': self.get_notation_amt("{0:.3f}".format(line.price_unit)),
                })
        return product_invoices

    @api.multi
    def get_invoice_lines(self):
        '''This method helps get the invoice lines grouped by sale order.'''
        vals = []
        sale_order_lines = []
        false_sale_order_lines = []
        for line in self.invoice_line_ids:
            sale_line = False
            if line.sale_line_ids:
                sale_line = line.sale_line_ids[0].order_id
            if sale_line:
                sale_order_lines.append(sale_line)
            else:
                false_sale_order_lines.append(sale_line)
        sale_order_lines = list(set(sale_order_lines))
        false_sale_order_lines = list(set(false_sale_order_lines))
        for sale_order in sale_order_lines:
            if sale_order and self.origin:
                # In Odoo 12, Datetime fields are returned as datetime objects (or False)
                confirmation_date = (sale_order.confirmation_date.strftime('%d/%m/%Y')
                                     if sale_order.confirmation_date else '')
                client_ref = sale_order.name + ' - ' + confirmation_date
                if sale_order.client_order_ref:
                    client_ref = client_ref + ' - ' + sale_order.client_order_ref
                vals.append({'price_subtotal': False, 'default_code': False,
                             'client_ref': client_ref, 'description': False,
                             'qty': False, 'price_unit': False, 'taxes': False, 'discount': False})
            vals.extend(self.get_product_invoice_lines(client_ref=sale_order))
        # Invoice lines without a sale order are displayed last
        for so in false_sale_order_lines:
            vals.extend(self.get_product_invoice_lines(client_ref=so))
        return [vals, len(vals)]
You can see the default report here:
https://github.com/odoo/odoo/blob/06f9baae968674547cb2592b1c22147bfb2e8ba9/addons/account/views/report_invoice.xml#L49
<t t-set="display_discount" t-value="any([l.discount for l in o.invoice_line_ids])"/>
This means that if any line has a discount, the discount column is displayed.
I think there are two ways it could have been disabled: either that line was removed from the report, or display_discount is set to false.
Once you know which module breaks your report, the problem should be easy to find.
But the exact reason is hard to tell without seeing your module.
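In the grouped report, the rows come from the dicts built by get_invoice_lines(). Each dict already carries a 'discount' key, which is False for the sale-order header rows.

Below is a small Python sketch, outside Odoo, of how a display_discount flag and the cell text could be derived from those dicts. The helpers should_display_discount and discount_cell are hypothetical. In the QWeb template, the corresponding step would presumably be printing line['discount'] inside the td t-if="display_discount" cell, which is currently empty.

```python
def should_display_discount(lines):
    """Mirror QWeb's any([l.discount for l in o.invoice_line_ids]) for the
    dicts returned by get_invoice_lines()[0]; header rows carry discount=False."""
    return any(line.get("discount") for line in lines)


def discount_cell(line):
    """Text for the discount column: empty for sale-order header rows."""
    discount = line.get("discount")
    if discount is False or discount is None:
        return ""
    return f"{discount:.2f}"
```

Because header rows use False rather than 0.0, they are skipped by both helpers, while product lines without a discount still show 0.00.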

how to echo a DOM object as HTML content

This is the div content I want to get from a web page:
<div class="result clearfix table-responsive">
<table class="table table-striped">
<thead>
<tr>
<th>Giải thưởng</th>
<th>Trùng khớp</th>
<th>Số lượng giải</th>
<th style="text-align: left; width: 22%;">Giá trị giải (đồng)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Jackpot</td>
<td>Trùng 6 số</td>
<td>0</td>
<td style="text-align: left"><span>27.868.784.500</span></td>
</tr>
<tr>
<td>Giải nhất</td>
<td>Trùng 5 số</td>
<td>18</td>
<td style="text-align: left"><span>10.000.000</span></td>
</tr>
<tr>
<td>Giải nhì</td>
<td>Trùng 4 số</td>
<td>613</td>
<td style="text-align: left"><span>300.000</span></td>
</tr>
<tr>
<td>Giải ba</td>
<td>Trùng 3 số</td>
<td>11047</td>
<td style="text-align: left"><span>30.000</span></td>
</tr>
</tbody>
</table>
<p class="role-result">
<span>Thời hạn lĩnh thưởng của vé trúng thưởng: là 60 (sáu mươi) ngày, kể từ ngày xác định kết quả trúng thưởng hoặc kể từ ngày hết hạn lưu hành của vé xổ số tự chọn số điện toán (nếu có). Quá thời hạn trên, các vé trúng thưởng không còn giá trị lĩnh thưởng.</span>
</p>
<div>
<a class="view-more" href="winning-numbers">Các lần quay trước</a>
</div>
</div>
This is my code to get the div content and echo it on my site:
$kqxsmega = file_get_contents("http://vietlott.vn/vi/trung-thuong/ket-qua-trung-thuong/mega-6-45/");
$dom = new DomDocument();
$dom->loadHTML($kqxsmega);
$finder = new DomXPath($dom);
$classname = "result clearfix table-responsive";
$divContent = $finder->query("//*[contains(@class, '$classname')]");
My code runs fine, and I want to convert $divContent into a string so that I can echo it.
Right now, echoing $divContent shows nothing:
echo $divContent;
Please help me.
Thank you.
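DOMXPath::query() returns a DOMNodeList, not a string, so echo cannot print it directly. In PHP, the usual approach is to loop over the list and echo $dom->saveHTML($node) for each node.

The sketch below shows the same idea (serialize each matched node back to markup) in Python, using the standard library's ElementTree. This is a swapped-in parser that requires well-formed markup, so it suits an extracted fragment like the one above rather than an arbitrary live page. The name outer_html_by_class is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def outer_html_by_class(markup: str, classname: str) -> List[str]:
    """Serialize every element whose class attribute contains `classname`,
    mirroring the XPath test contains(@class, '...') used in the question."""
    root = ET.fromstring(markup)  # assumes well-formed (XHTML-like) markup
    found = []
    for element in root.iter():  # iter() visits the root and all descendants
        if classname in element.get("class", ""):
            node = copy.copy(element)  # shallow copy so the parsed tree is untouched
            node.tail = None  # drop the text that follows the element itself
            found.append(ET.tostring(node, encoding="unicode"))
    return found
```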

how to save values from a table in Grails?

My screenshot:
http://www.4shared.com/photo/Rj_0Ymdt/1111.html
This is my code in the controller:
def simpan(){
def banklimit = new BankLimit()
for(int i = 0; i<params.limitPerDay.size();i++) {
def baaa = CurrencyList.list(params)
banklimit.dayLimit = params.limitPerDay[i].toBigDecimal()
banklimit.alertLimit = (banklimit.dayLimit*0.8)
banklimit.currency = CurrencyList.findBySym(baaa.sym[i])
banklimit.save()
}
return banklimit
}
<table>
<thead>
<tr>
<g:sortableColumn property="sym" title="${message(code: 'banklimit.currency.sym.label', default: 'Simbol')}" />
<g:sortableColumn property="limitday" title="${message(code: 'banklimit.dayLimit.label', default: 'Limit/Day')}" />
</tr>
</thead>
<tbody>
<g:each in="${aaa}" status="i" var="bbb">
<tr class="${(i % 2) == 0 ? 'even' : 'odd'}">
<td id="${bbb.id}"> ${fieldValue(bean: bbb, field: "sym")}</td>
<td id="${bbb.id}" colspan="2"> <g:textField name="limitPerDay" value="${value}" /></td>
</tr>
</g:each>
<tr>
<td></td>
<td><g:submitButton name="save" value="SAVE" /> <g:actionSubmit action="back" value="BACK" /></td>
</tr>
</tbody>
</table>
I want to save the fields in the Limit/Day column when I click the "SAVE" button.
With my code, when I click save, only the last row is saved in the database.
For example, there are 4 rows.
I fill in rows 1, 2, 3 and 4 and click save. Why is only the 4th row saved in the database? Why are rows 1, 2 and 3 not saved?
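You create only one BankLimit (banklimit = new BankLimit()) but save it 4 times. It's one object saved 4 times with different values in the loop, so each save updates the same row and only the last row's values remain.
You need to create a new instance inside the loop: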
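def baaa = CurrencyList.list(params)
for (int i = 0; i < params.limitPerDay.size(); i++) {
    def banklimit = new BankLimit()   // one new object per row
    banklimit.dayLimit = params.limitPerDay[i].toBigDecimal()
    banklimit.alertLimit = banklimit.dayLimit * 0.8
    banklimit.currency = CurrencyList.findBySym(baaa.sym[i])
    banklimit.save()
}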
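The answer's point can be reproduced outside Grails with a tiny, hypothetical save() that behaves like a typical ORM: the first save of an object inserts a row, and later saves of the same object update that row. Record and save_limits are illustrative names, not GORM API, and sqlite3 stands in for the real database.

```python
import sqlite3


class Record:
    """Hypothetical domain object: first save() inserts, later saves update."""

    def __init__(self, conn):
        self.conn = conn
        self.id = None
        self.day_limit = None

    def save(self):
        if self.id is None:
            cur = self.conn.execute(
                "INSERT INTO bank_limit (day_limit) VALUES (?)", (self.day_limit,))
            self.id = cur.lastrowid  # the object now "owns" this row
        else:
            self.conn.execute(
                "UPDATE bank_limit SET day_limit = ? WHERE id = ?",
                (self.day_limit, self.id))


def save_limits(limits, reuse_one_object):
    """Save each limit and return what ends up in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bank_limit (id INTEGER PRIMARY KEY, day_limit REAL)")
    shared = Record(conn)
    for value in limits:
        # The question's bug: reusing `shared` instead of creating a new Record.
        record = shared if reuse_one_object else Record(conn)
        record.day_limit = value
        record.save()
    rows = conn.execute("SELECT day_limit FROM bank_limit ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]
```

With one shared object, only the last value survives. With a new object per iteration, every row is stored.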
On the client side, it's better to drop the duplicate id attributes and give each text field an indexed name:
<g:each in="${aaa}" status="i" var="bbb">
<tr class="${(i % 2) == 0 ? 'even' : 'odd'}">
<td>${fieldValue(bean: bbb, field: "sym")}</td>
<td colspan="2"><g:textField name="limitPerDay[${i}]" value="${value}" /></td>
</tr>
</g:each>

Add two lines every four lines between patterns - SED

I need some help with sed. I'm using it on Windows and Mac OS X. I need sed to add a
</tr>
<tr>
every 4 lines, after the first <tr> is found, and to stop doing it at </tr>.
I just can't find a way to do this.
Every file will have up to 20 tables, so I need to do it automatically.
Changing this:
<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 -
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
into this:
<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 -
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
Is it possible with sed? If not, what tool should I use?
Thanks
I don't like the idea of using sed to handle HTML code. That said, try this:
Content of script.sed:
## For every line between '<tr>' and '</tr>' do ...
/<tr>/,/<\/tr>/ {
## Omit range edges.
/<\/\?tr>/ b;
## Append '<td>...</td>' to Hold Space (HS).
H;
## Get HS to Pattern Space (PS) to work with it.
x;
## If there are at least four newline characters, there are also four
## '<td>' tags, so add a '<tr>' before them and a '</tr>' after them,
## print them, and delete them (already processed).
/\(\n[^\n]*\)\{4\}/ {
s/^\(\n\)/<tr>\1/;
s/$/\n<\/tr>/;
p
s/^.*$//;
}
## Save the '<td>'s to HS again and read next line.
x;
b;
}
## Print all lines out of the range.
p;
Assuming infile contains the data posted in the question, run the script like this:
sed -nf script.sed infile
That yields:
<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 -
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
Try awk:
awk '{print}; /<td>/ && ++i==4 {print "</tr>\n<tr>"; i=0}' file
It prints every line; if the line contains <td>, it increments i; when i reaches 4, it prints </tr> followed by <tr> and resets i to 0.
Tested with the given input, it returns the desired output, with the only "problem" that an extra <tr></tr> appears at the end of the list.
This is fixable, but I'm running out of time here.
When I get back, I can look into it if you think it's needed.
... the end of the resulting file:
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
<tr> <-- extra <tr></tr> here
</tr>
</table>
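One way to avoid the extra <tr></tr> from the awk version is to emit the separator only when another <td> actually follows a full row, rather than right after the fourth cell. The sketch below is line-based Python and assumes, like the awk and sed versions, one tag per line. The name regroup_rows is hypothetical.

```python
def regroup_rows(lines, per_row=4):
    """Re-wrap the <td> lines inside each <tr>...</tr> block into rows of `per_row` cells."""
    out = []
    inside = False
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped == "<tr>":
            inside, count = True, 0
            out.append(line)
        elif stripped == "</tr>":
            inside = False
            out.append(line)
        elif inside and stripped.startswith("<td>"):
            if count == per_row:
                # Close the full row only now that another cell follows,
                # so no empty <tr></tr> is left before the original </tr>.
                out.extend(["</tr>", "<tr>"])
                count = 0
            out.append(line)
            count += 1
        else:
            out.append(line)
    return out
```

For a whole file, something like print("\n".join(regroup_rows(open("file").read().splitlines()))) would apply it.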
You can try regular expressions. You can test the following expression at:
http://gskinner.com/RegExr/
Catch expression:
(<td>.*?</td>.<td>.*?</td>.<td>.*?</td>.<td>.*?</td>)(?!.</tr>)
Replace expression:
$1\n</tr>\n<tr>
Flags checked:
global, ignorecase, dotall
Result:
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
You can use an editor like Notepad++ to batch-replace in many files at once (the syntax will be a little different).
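Here is a Python sketch of the same search-and-replace using the re module, with one deliberate change. Each cell is matched with [^<]* instead of .*? with dotall. The reason: when the lookahead rejects the last group of a row, a lazy .*? can backtrack past </tr> into the next table, whereas with [^<]* a group can never leave its own cells.

The sketch also uses \s* between cells instead of a single character, so indented markup still matches. FOUR_CELLS and split_rows are hypothetical names.

```python
import re

# Four consecutive cells, not followed by the row's closing tag.
FOUR_CELLS = re.compile(
    r"(<td>[^<]*</td>\s*<td>[^<]*</td>\s*<td>[^<]*</td>\s*<td>[^<]*</td>)"
    r"(?!\s*</tr>)",  # the last group of a row is left alone, so no empty row appears
    re.IGNORECASE,
)


def split_rows(html: str) -> str:
    """Insert </tr><tr> after every group of four cells except the last in a row."""
    return FOUR_CELLS.sub(r"\1\n</tr>\n<tr>", html)
```

With several tables per file, each table is handled independently, because no group can span the text between two tables.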
sed '\!<td>!,\!</table!{N;N;N;i\
</tr>\
<tr>
}' input_file
Perl solution, still using regular expressions instead of parsing the HTML:
perl -pe '
undef $inside if m{</tr>};
if ($inside and ($. % 4) == $tr_line) {
print "</tr>\n<tr>\n";
}
$inside = 1 if defined $tr_line;
$tr_line = ($. + 1) % 4 if /<tr>/;
' file
Using xsh:
open :F html file ; # Open as html.
while //table/tr[count(td)>4] wrap :U position()=8 tr //table/tr/td ; # Wrap four td's into a tr.
xmove :r //table/tr/tr before .. ; # Unwrap the extra tr.
remove //table/tr[last()] ; # Remove the extra tr.