HtmlUnit & GWT error - gwt
I have a GWT application that I am trying to index.
I am using HtmlUnit to get the content of the generated HTML:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
HtmlPage refDesing = webClient.getPage("http://localhost:8080/MyGWTApp/#page2");
FileOutputStream fos1 = new FileOutputStream("D:\\work\\out\\page2.html");
fos1.write(refDesing.asXml().getBytes());
fos1.close();
But I get the following warnings, and the returned page is approximately empty!
Dec 22, 2010 6:16:25 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Expected content type of 'application/javascript' or 'application/ecmascript' for remotely loaded JavaScript element at 'http://xxxxxxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxxxxx.nocache.js', but got 'application/x-javascript'.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [485:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [485:29] Error in style rule. Invalid token "\n". Was expecting one of: "}", ";".
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: null [485:29] Ignoring the following declarations in this rule.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [518:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [518:29] Error in style rule. Invalid token "\n ". Was expecting one of: "}", ";".
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: null [518:29] Ignoring the following declarations in this rule.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [541:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [541:29] Error in style rule. Invalid token "\n ". Was expecting one of: "}", ";".
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: null [541:29] Ignoring the following declarations in this rule.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [951:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [951:29] Error in style rule. Invalid token "\n". Was expecting one of: "}", ";".
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: null [951:29] Ignoring the following declarations in this rule.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [977:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>.
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: null [977:29] Error in style rule. Invalid token "\n". Was expecting one of: "}", ";".
Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: null [977:29] Ignoring the following declarations in this rule.
EDIT:
What I mean by approximately empty is shown in this snapshot of the returned HTML:
Please note that not all of the data displayed in the original page (which originally comes from the DB) is returned by HtmlUnit. Also, what does "?" mean? I don't think it is an encoding error, because all the words are plain ASCII characters.
<td align="center" style="vertical-align: top;">
<table class="refDesignGrid" cellspacing="5">
<colgroup>
<col/>
</colgroup>
<tbody align="left">
<tr>
<td align="left" style="vertical-align: top;">
<table cellpadding="0" class="categoryItem" cellspacing="0">
<tbody align="left">
<tr>
<td align="left" style="vertical-align: top;">
<div class="header4">
C++
</div>
</td>
</tr>
</tbody>
</table>
</td>
<td align="left" style="vertical-align: top;">
<table cellpadding="0" class="categoryItem" cellspacing="0">
<tbody align="left">
<tr>
<td align="left" style="vertical-align: top;">
<div class="header4">
Java
</div>
</td>
</tr>
</tbody>
</table>
</td>
<td align="left">
<table cellpadding="0" class="categoryItem" cellspacing="0">
<tbody align="left">
<tr>
<td align="left" style="vertical-align: top;">
<div class="header4">
C#
</div>
</td>
</tr>
</tbody>
</table>
</td>
<td>
?
</td>
</tr>
<tr>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
<tr>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
<tr>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
<tr>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
HtmlUnit can be kinda chatty, and in particular can make things look worse than they are.
Create these two classes:
import org.w3c.css.sac.CSSException;
import org.w3c.css.sac.CSSParseException;
import com.gargoylesoftware.htmlunit.DefaultCssErrorHandler;
/*
* get rid of warnings... and provide a place to hang a break point
*/
public class QuietCssErrorHandler
extends DefaultCssErrorHandler
{
@Override public void error( CSSParseException e ) throws CSSException
{
super.error( e ) ;
}
@Override public void fatalError( CSSParseException e ) throws CSSException
{
super.fatalError( e ) ;
}
@Override public void warning( CSSParseException e ) throws CSSException
{
}
}
and
import com.gargoylesoftware.htmlunit.IncorrectnessListener;
public class SilentIncorrectnessListener
implements IncorrectnessListener
{
@Override public void notify( String message, Object origin )
{
// do nuttin' honey!
}
}
then when you create your WebClient...
wc.setIncorrectnessListener( new SilentIncorrectnessListener() ) ;
wc.setCssErrorHandler( new QuietCssErrorHandler() ) ;
And you should then get fewer warnings.
As for "approximately empty"... what does that mean?
Answer is here:
http://htmlunit.sourceforge.net/faq.html#AJAXDoesNotWork
The main thread using HtmlUnit may be finishing execution before
allowing background threads to run. You have a couple of options:
1. webClient.setAjaxController(new NicelyResynchronizingAjaxController());
will tell your WebClient instance to re-synchronize asynchronous XHR.
2. webClient.waitForBackgroundJavaScript(10000); or
webClient.waitForBackgroundJavaScriptStartingBefore(10000); just after
getting the page and before manipulating it.
3. Explicitly wait for a condition that is expected to be fulfilled when
your JavaScript runs, e.g.
//try 20 times to wait .5 second each for filling the page.
for (int i = 0; i < 20; i++) {
if (condition_to_happen_after_js_execution) {
break;
}
synchronized (page) {
page.wait(500);
}
}
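The loop above is a plain poll-with-timeout pattern. As a language-neutral illustration, here is a minimal Python sketch of the same idea. The name wait_until and its parameters are hypothetical and not part of HtmlUnit; the condition is whatever check tells you the page has finished rendering.

```python
import time


def wait_until(condition, attempts=20, delay=0.5, sleep=time.sleep):
    """Poll `condition` up to `attempts` times, pausing `delay` seconds after each miss.

    Returns True as soon as the condition holds, False if it never does.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for _ in range(attempts):
        if condition():
            return True
        sleep(delay)
    return False
```

With the defaults this mirrors the quoted loop: at most 20 checks, half a second apart, so roughly ten seconds in the worst case.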
html2text command line breaking html
I'm trying to figure out why html2text is breaking my HTML:

<div><table> <tbody> <tr> <td> <span><strong><span>About</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>Contact</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><a><span>Maths Games Order</span></a></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>FAQ</span></strong></span></td> </tr> </tbody> </table>s<div> <span><strong>Broadbent Maths Ltd<br> 3 High Street, Welbourn, Lincoln, LN5 0NH </strong></span></div> </div>

Processing it with:

cat "/home/spider/original-file.txt" | html2text -utf8 -nobs -style pretty

When I run that, I get:

Input recoding failed due to invalid input sequence. Unconverted part of text follows.
▒Contact ▒Maths Games Order ▒FAQ s Broadbent Maths Ltd 3 High Street, Welbourn, Lincoln, LN5 0NH

When I run Devel::Peek::Dump() (Perl), I see the string as:

SV = PV(0x564c0a72c860) at 0x564c09967c80
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x564c0a58bc60 "\n<div><table> <tbody> <tr> <td> <span><strong><span>About</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>Contact</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><a><span>Maths Games Order</span></a></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>FAQ</span></strong></span></td> </tr> </tbody> </table>s<div> <span><strong>Broadbent Maths Ltd<br> 3 High Street, Welbourn, Lincoln, LN5 0NH </strong></span></div> </div>\n"\0 [UTF8 "\n<div><table> <tbody> <tr> <td> <span><strong><span>About</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>Contact</span></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><a><span>Maths Games Order</span></a></strong></span></td> <td> <span><strong><span>•</span></strong></span></td> <td> <span><strong><span>FAQ</span></strong></span></td> </tr> </tbody> </table>s<div> <span><strong>Broadbent Maths Ltd<br> 3 High Street, Welbourn, Lincoln, LN5 0NH </strong></span></div> </div>\n"]
CUR = 725
LEN = 736
COW_REFCNT = 1

If I remove the first bit:

<div><table>

it works fine! I don't get why it's breaking there though; it all seems OK to me.
OK, I think I've worked it out. In this case, for some reason the • was breaking it. I replaced it with "-", and it works now:

html2text -utf8 -nobs -o test-out.txt test.co.uk.txt

It's a bit weird that html2text breaks on HTML entities, though.

UPDATE: The problem turned out to be that while the page declared itself as utf-8 in the meta tag, the server was actually sending it as iso-8859-1. So what I did was parse the charset out of the server header and compare it before saving. If it was windows-1252, I would use this command instead:

html2text -ansi -nobs -o test-out.txt test.co.uk.txt
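The header-versus-meta comparison described in the update could be sketched roughly like this in Python. This is only an illustration under assumptions: the function names and the SINGLE_BYTE set are hypothetical, the regular expressions are a simplification of real charset sniffing, and the only html2text flags used are the ones quoted above (-utf8, -ansi, -nobs).

```python
import re

# Charset labels treated as "use -ansi" in this sketch (an assumption, not a full list).
SINGLE_BYTE = {"iso-8859-1", "windows-1252", "latin1", "cp1252"}


def charset_from_header(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    match = re.search(r'charset=\s*"?([\w.:-]+)', content_type or "", re.I)
    return match.group(1).lower() if match else None


def charset_from_meta(html):
    """Return the charset declared in the first <meta ... charset=...> tag, or None."""
    match = re.search(r'''<meta[^>]+charset=["']?([\w.:-]+)''', html, re.I)
    return match.group(1).lower() if match else None


def html2text_args(content_type, html):
    """Pick html2text flags, letting the server header win over the meta tag."""
    declared = charset_from_header(content_type) or charset_from_meta(html) or "utf-8"
    flag = "-ansi" if declared in SINGLE_BYTE else "-utf8"
    return ["html2text", flag, "-nobs"]
```

The header is checked first because that is what the bytes on the wire actually follow in the case described; the meta tag is only a fallback when the server sends no charset.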
invoice report - display discount in Line invoice - Odoo 12 - Qweb
I am making a report for the invoice lines. I purchased a module from the third-party Odoo store and it performs its function well, but I can't see the discount on the invoice line. I think the module is preventing this, and I no longer have developer support. What I need is for the discount (price list) to be shown on the invoice line. Which table or which element holds the invoice line discount? Here is the code that I have in the report:

<tbody class="invoice_tbody">
  <tr t-foreach="invoice_lines[0]" t-as="line">
    <td><b><span t-esc="line['client_ref']"/></b> <span t-esc="line['description']"/></td>
    <td class="text-right">
      <span t-esc="line['qty']"/>
    </td>
    <td class="text-right">
      <span t-esc="line['price_unit']"/>
    </td>
    <td t-if="display_discount" class="text-right">
    </td>
    <td class="text-right" id="subtotal">
      <t t-if="line['price_subtotal']">
        <span t-esc="line['price_subtotal']" t-options="{&quot;widget&quot;: &quot;monetary&quot;, &quot;display_currency&quot;: o.currency_id}"/>
      </t>
    </td>
  </tr>
  <tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
    <td t-translation="off">&amp;nbsp;</td>
    <td class="hidden"/>
    <td/>
    <td/>
    <td t-if="display_discount"/>
    <td/>
    <td/>
  </tr>
</tbody>
</t>
Yes, this parameter is in the report "view / report_invoice_document". But the report that I am trying to modify is this one, report_invoice_document_inherit:

<?xml version="1.0"?>
<data inherit_id="account.report_invoice_document">
  <xpath expr="//table[@name='invoice_line_table']/tbody" position="replace">
    <t t-if="res_company.is_group_by_so">
      <t t-set="invoice_lines" t-value="o.get_invoice_lines()"/>
      <tbody class="invoice_tbody">
        <tr t-foreach="invoice_lines[0]" t-as="line">
          <td><b><span t-esc="line['client_ref']"/></b> <span t-esc="line['description']"/></td>
          <!-- <td class="hidden"><span t-esc="line['client_ref']"/></td> -->
          <td class="text-right">
            <span t-esc="line['qty']"/>
            <!-- <span t-field="l.uom_id" groups="product.group_uom"/> -->
          </td>
          <td class="text-right">
            <span t-esc="line['price_unit']"/>
          </td>
          </td>
          <td t-if="display_discount" class="text-right">
            <!-- <span t-esc="line['price_unit']"/> -->
          </td>
          <td class="text-right" id="subtotal">
            <t t-if="line['price_subtotal']">
              <span t-esc="line['price_subtotal']" t-options="{&quot;widget&quot;: &quot;monetary&quot;, &quot;display_currency&quot;: o.currency_id}"/></t>
          </td>
        </tr>
        <tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
          <td t-translation="off">&amp;nbsp;</td>
          <td class="hidden"/>
          <td/>
          <td/>
          <td t-if="display_discount"/>
          <td/>
          <td/>
        </tr>
      </tbody>
    </t>
    <t t-else="">
      <tbody class="invoice_tbody">
        <tr t-foreach="o.invoice_line_ids" t-as="l">
          <td><span t-field="l.name"/></td>
          <td class="hidden"><span t-field="l.origin"/></td>
          <td class="text-right">
            <span t-field="l.quantity"/>
            <span t-field="l.uom_id" groups="product.group_uom"/>
          </td>
          <td class="text-right">
            <span t-field="l.price_unit"/>
          </td>
          <td t-if="display_discount" class="text-right">
            <span t-field="l.discount"/>
          </td>
          <td class="text-right">
            <span t-esc="', '.join(map(lambda x: (x.description or x.name), l.invoice_line_tax_ids))"/>
          </td>
          <td class="text-right" id="subtotal">
            <span t-field="l.price_subtotal" t-options="{&quot;widget&quot;: &quot;monetary&quot;, &quot;display_currency&quot;: o.currency_id}"/>
          </td>
        </tr>
        <tr t-foreach="range(max(5-len(o.invoice_line_ids),0))" t-as="l">
          <td t-translation="off">&amp;nbsp;</td>
          <td class="hidden"/>
          <td/>
          <td/>
          <td t-if="display_discount"/>
          <td/>
          <td/>
        </tr>
      </tbody>
    </t>
  </xpath>
</data>

I have tried to modify the second report, and I have also looked at the Python code in case the cause is there:

invoice_report_grouped_by\report\account_invoice.py

# -*- coding: utf-8 -*-
from odoo import api, models
from datetime import datetime


class AccountInvoice(models.Model):
    _inherit = "account.invoice"

    def get_notation_amt(self, amt):
        '''This method help us to return the value of the product pricing'''
        amount = str(amt).split('.')
        if len(amount) == 2:
            amount = amount[0] + "," + amount[1]
            return amount
        return amt

    @api.multi
    def get_product_invoice_lines(self, client_ref=False):
        '''This method helps to get the data for the following Invoice Line.'''
        product_invoices = []
        client_order_ref = []
        for line in self.invoice_line_ids:
            sale_line = (False, line)
            if line.sale_line_ids:
                sale_line = (line.sale_line_ids[0].order_id, line)
            client_order_ref.append(sale_line)
        if client_order_ref:
            for ref in client_order_ref:
                if (client_ref == ref[0]):
                    product_invoices.append({
                        'price_subtotal': ref[1].price_unit * ref[1].quantity,
                        'default_code': ref[1].product_id.default_code,
                        'client_ref': False,
                        'discount': ref[1].discount,
                        'taxes': ",".join(map(lambda x: (x.description or x.name), ref[1].invoice_line_tax_ids)),
                        'description': ref[1].name,
                        'qty': self.get_notation_amt(ref[1].quantity),
                        'price_unit': self.get_notation_amt("{0:.3f}".format(ref[1].price_unit)),
                    })
        else:
            for line in self.invoice_line_ids:
                product_invoices.append({
                    'price_subtotal': line.price_unit * line.quantity,
                    'default_code': line.product_id.default_code,
                    'client_ref': False,
                    'discount': line.discount,
                    'taxes': ",".join(map(lambda x: (x.description or x.name), ref[1].invoice_line_tax_ids)),
                    'description': line.name,
                    'qty': self.get_notation_amt(line.quantity),
                    'price_unit': self.get_notation_amt("{0:.3f}".format(line.price_unit)),
                })
        return product_invoices

    @api.multi
    def get_invoice_lines(self):
        '''This method help to get the invoice line group by Sale order'''
        vals = []
        sale_order_lines = []
        false_sale_order_lines = []
        for line in self.invoice_line_ids:
            sale_line = False
            if line.sale_line_ids:
                sale_line = line.sale_line_ids[0].order_id
            if sale_line:
                sale_order_lines.append(sale_line)
            else:
                false_sale_order_lines.append(sale_line)
        sale_order_lines = list(set(sale_order_lines))
        false_sale_order_lines = list(set(false_sale_order_lines))
        for sale_order in sale_order_lines:
            if sale_order and self.origin:
                confirmation_date = str(
                    sale_order.confirmation_date, '%d-%m-%Y %H:%M:%S').strftime('%d/%m/%Y')
                client_ref = sale_order.name + ' - ' + confirmation_date
                if sale_order.client_order_ref:
                    client_ref = client_ref + ' - ' + sale_order.client_order_ref
                vals.append({'price_subtotal': False,
                             'default_code': False,
                             'client_ref': client_ref,
                             'description': False,
                             'qty': False,
                             'price_unit': False,
                             'taxes': False,
                             'discount': False})
            vals.extend(self.get_product_invoice_lines(client_ref=sale_order))
        # for sort false sale order, display manually invoice line at last
        for so in false_sale_order_lines:
            vals.extend(self.get_product_invoice_lines(client_ref=so))
        return [vals, len(vals)]
You can see the default report here:
https://github.com/odoo/odoo/blob/06f9baae968674547cb2592b1c22147bfb2e8ba9/addons/account/views/report_invoice.xml#L49

<t t-set="display_discount" t-value="any([l.discount for l in o.invoice_line_ids])"/>

This means that if any line has a discount, it should display it. I think there are two options to disable it. One is to remove that line from the report; the second is to set display_discount to false. Knowing the module that breaks your report, the problem should be easy to find, but the exact reason is hard to tell without seeing your module.
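To make the display_discount expression concrete, here is a small Python sketch using plain dictionaries shaped like the rows that get_product_invoice_lines builds in the question (they carry a 'discount' key, while 'price_subtotal' there is computed as price_unit * quantity without the discount). The helper names are hypothetical, and the percentage formula is the usual interpretation of a line discount, stated here as an assumption rather than as the module's behaviour.

```python
def display_discount(lines):
    """Mirror of any([l.discount for l in lines]): True if any line has a non-zero discount."""
    return any(line["discount"] for line in lines)


def discounted_subtotal(price_unit, quantity, discount_percent):
    """Subtotal after applying a percentage discount (assumed to be 0-100)."""
    return price_unit * quantity * (1 - (discount_percent or 0.0) / 100.0)


# Hypothetical rows, shaped like the dictionaries in the question.
rows = [
    {"description": "Item A", "price_unit": 100.0, "qty": 2, "discount": 10.0},
    {"description": "Item B", "price_unit": 50.0, "qty": 1, "discount": 0.0},
]
show_column = display_discount(rows)
```

If show_column is true but the discount cell in the template renders nothing, the column will appear empty even though each row dictionary holds the value.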
Add two lines every four lines between patterns - SED
I need some help with sed. I'm using it on Windows and Mac OS X. I need sed to add a </tr> <tr> every 4 lines after the first <tr> found, and to stop doing so at </tr>. I just can't find a way to do this. Every file will have up to 20 tables, so I need to do it automatically, changing from this:

<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia"> Válido a partir de: 30/07/2012 - DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>

to this:

<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia"> Válido a partir de: 30/07/2012 - DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>

Is it possible with sed? If not, what tool should I use? Thanks
I don't like the idea of using sed to handle HTML code. That said, try this.

Content of script.sed:

## For every line between '<tr>' and '</tr>' do ...
/<tr>/,/<\/tr>/ {

    ## Omit range edges.
    /<\/\?tr>/ b;

    ## Append '<td>...</td>' to Hold Space (HS).
    H;

    ## Get HS to Pattern Space (PS) to work with it.
    x;

    ## If there are at least four newline characters means that exists four
    ## '<td>' tags too, so add a '<tr>' before them and a '</tr>' after them,
    ## print, and delete them (already processed).
    /\(\n[^\n]*\)\{4\}/ {
        s/^\(\n\)/<tr>\1/;
        s/$/\n<\/tr>/;
        p
        s/^.*$//;
    }

    ## Save the '<td>'s to HS again and read next line.
    x;
    b;
}

## Print all lines out of the range.
p;

Assuming infile with the data posted in the question, run the script like:

sed -nf script.sed infile

That yields:

<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia"> Válido a partir de: 30/07/2012 - DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
Try awk:

awk '{print}; /<td>/ && ++i==4 {print "</tr>\n<tr>"; i=0}' file

Print the line; if it's a <td>, increase i; if i is 4, print </tr><tr> and reset i.

Testing with the given input, the desired output is returned, with the only "problem" that an extra <tr></tr> appears at the end of the list. This is fixable but I'm running out of time here. When I get back I can look into it if you think it is needed.

... part of the end of the result file:

<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
<tr> <-- extra <tr></tr> here
</tr>
</table>
You can try with regular expressions. You can test the following expression at http://gskinner.com/RegExr/

Catch expression: (<td>.*?</td>.<td>.*?</td>.<td>.*?</td>.<td>.*?</td>)(?!.</tr>)
Replace expression: $1\n</tr>\n<tr>
Flags checked: global, ignorecase, dotall

Result:

<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>

You can use an editor like Notepad++ for batch replace on many files at once (the syntax will be a little different).
sed '\!<td>!,\!</table!{N;N;N;i\
</tr>\
<tr>
}' input_file
Perl solution, still using regular expressions instead of parsing HTML:

perl -pe '
    undef $inside if m{</tr>};
    if ($inside and ($. % 4) == $tr_line) {
        print "</tr>\n<tr>\n";
    }
    $inside = 1 if defined $tr_line;
    $tr_line = ($. + 1) % 4 if /<tr>/;
' file
Using xsh:

open :F html file ; # Open as html.
while //table/tr[count(td)>4] wrap :U position()=8 tr //table/tr/td ; # Wrap four td's into a tr.
xmove :r //table/tr/tr before .. ; # Unwrap the extra tr.
remove //table/tr[last()] ; # Remove the extra tr.
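Since the question also asks which other tool could be used, here is a minimal Python sketch of the same line-based regrouping. It assumes, as the sample input suggests, that every <tr>, <td>...</td> and </tr> sits on its own line; regroup_rows is a hypothetical helper name, and like the sed and awk answers it does not really parse HTML.

```python
def regroup_rows(lines, per_row=4):
    """Close and reopen the row after every `per_row` <td> lines.

    No split is added when the next line is not another <td>, so the
    last group does not leave an empty <tr></tr> behind.
    """
    out = []
    count = 0
    for i, line in enumerate(lines):
        out.append(line)
        stripped = line.strip()
        if stripped == "<tr>":
            count = 0  # a new row starts, restart the cell counter
        elif stripped.startswith("<td>"):
            count += 1
            nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if count == per_row and nxt.startswith("<td>"):
                out.extend(["</tr>", "<tr>"])
                count = 0
    return out
```

To use it on a file, read the file with splitlines(), pass the list to regroup_rows, and write the result back joined with newlines.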
Issue Binding AutoPopulating List to a form Spring MVC
I have an issue binding the AutoPopulatingList in a form to update the data. I was able to save the data using the AutoPopulatingList, though. Here is the form-backing model:

public class AddUpdateShot {

    private Integer shootId;
    private char shotSelect;
    private String shotNotes;
    private Integer numOfItems;
    private AutoPopulatingList itemNumColors;
    private Integer totalNumOfItems;
    private String shotName;
    ----------
    public void setItemNumColors(AutoPopulatingList itemNumColors){
        this.itemNumColors = itemNumColors;
    }
    public AutoPopulatingList getItemNumColors(){
        return this.itemNumColors;
    }
    --------
}

where each element of itemNumColors is a simple model:

public class ItemNumColor {
    private Integer id;
    private Integer itemNum;
    private String itemName;
    private String colorCode;
    private String colorName;
    ------get and set methods
}

When I first saved the data, depending on how many ItemColors the user wanted, I added the input fields dynamically using jQuery, as shown in the following code:

<form:form id="createShootForm" method="POST" commandName="createShoot">
<tr>
    <td align="left"><label for="shootName">*Shoot Name:</label></td>
    <td><form:input id="shootName" class="required" path="shootName" /></td>
</tr>
------- other input fields in form backing obj----
<c:forEach var="i" begin="${start}" end="${end-1}" step="1" varStatus="status">
<tr>
    <td align="left"><label for="itemNumber${i}">Item Number${i+1}:</label></td>
    <td><form:input id="itemNumber${i}" path="createShoot.itemNumColors[${i}].itemNum" /></td>
    <td><form:select id="color${i}" path="createShoot.itemNumColors[${i}].colorCode">
        <form:option value="" label="Color" />
        </form:select>
    </td>
</tr>
</c:forEach>
<tr id="submitRow">
    <td></td>
    <td></td>
    <td align="right"><input name="submit" type="submit" value="Next" /></td>
</tr>
</table>
</form:form>

The above code worked perfectly fine when I initially saved the data. But now, when the user wants to update the earlier saved data, I am unable to bind the AutoPopulatingList to the JSP.
Here is how I am doing it:

<form:form id="updateShotForm" method="POST" commandName="shotToUpdate">
----other input fields of form backing object---
<c:forEach var="i" begin="0" end="${totalNumOfItems-1}" step="1" varStatus="status">
<tr><td align="left"><label for="itemNumber${i}">ItemNumber${i+1}:</label></td>
    <td><form:input id="itemNumber${i}" path="shotToUpdate.itemNumColors[${i}].itemNum" /></td>
</tr>
</c:forEach>
<tr id="submitRow">
    <td></td>
    <td></td>
    <td align="right"><input name="submit" type="submit" value="Next" />
    </td>
</table>
</form:form>

When I open the edit JSP, I get the following runtime exception:

Sep 7, 2011 10:38:00 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [jalapeno] in context with path [/OnLocation] threw exception [An exception occurred processing JSP page /WEB-INF/views/app/updateShot.jsp at line 256

253: <tr>
254: <td align="left"><label for="itemNumber${i}">Item
255: Number${i+1}:</label></td>
256: <td><form:input id="itemNumber${i}"
257: path="shotToUpdate.itemNumColors[${i}].itemNum" /></td>
258: <td><form:select id="color${i}"
259: path="shotToUpdate.itemNumColors[${i}].colorCode">

Stacktrace:] with root cause
org.springframework.beans.NotReadablePropertyException: Invalid property 'shotToUpdate' of bean class [com.jcrew.jalapeno.app.model.AddUpdateShot]: Bean property 'shotToUpdate' is not readable or has an invalid getter method: Does the return type of the getter match the parameter type of the setter?
    at org.springframework.beans.BeanWrapperImpl.getPropertyValue(BeanWrapperImpl.java:707)
    at org.springframework.beans.BeanWrapperImpl.getNestedBeanWrapper(BeanWrapperImpl.java:555)
    at org.springframework.beans.BeanWrapperImpl.getBeanWrapperForPropertyPath(BeanWrapperImpl.java:532)
    at org.springframework.beans.BeanWrapperImpl.getPropertyValue(BeanWrapperImpl.java:697)
    at org.springframework.validation.AbstractPropertyBindingResult.getActualFieldValue(AbstractPropertyBindingResult.java:98)
    at org.springframework.validation.AbstractBindingResult.getFieldValue(AbstractBindingResult.java:224)
    at org.springframework.web.servlet.support.BindStatus.<init>(BindStatus.java:120)
    at org.springframework.web.servlet.tags.form.AbstractDataBoundFormElementTag.getBindStatus(AbstractDataBoundFormElementTag.java:174)
    at org.springframework.web.servlet.tags.form.AbstractDataBoundFormElementTag.getPropertyPath(AbstractDataBoundFormElementTag.java:194)
    at org.springframework.web.servlet.tags.form.AbstractDataBoundFormElementTag.getName(AbstractDataBoundFormElementTag.java:160)
    at org.springframework.web.servlet.tags.form.AbstractDataBoundFormElementTag.writeDefaultAttributes(AbstractDataBoundFormElementTag.java:123)
    at org.springframework.web.servlet.tags.form.AbstractHtmlElementTag.writeDefaultAttributes(AbstractHtmlElementTag.java:408)
    at org.springframework.web.servlet.tags.form.InputTag.writeTagContent(InputTag.java:140)
    at org.springframework.web.servlet.tags.form.AbstractFormTag.doStartTagInternal(AbstractFormTag.java:102)

I am not sure why I am not able to bind the object to the form this way, since my form-backing object does have an AutoPopulatingList, which I initialised in the controller before loading this form:

AutoPopulatingList itemNumColors = new AutoPopulatingList(ItemNumColor.class);
for( OnLocShotItemNumber onLocItemNumColor : itemNumColorsList){
    ItemNumColor itemColor = new ItemNumColor();
    itemColor.setId(onLocItemNumColor.getId());
    itemColor.setColorCode(onLocItemNumColor.getItemColorCode());
    itemColor.setItemNum(onLocItemNumColor.getItemNumber());
    itemNumColors.add(itemColor);
}
shotToUpdate.setItemNumColors(itemNumColors);
model.put("shotToUpdate", shotToUpdate);
model.put("totalNumOfItems", itemNumColorsList.size());

Any help is greatly appreciated.
Thanks,
Shravanthi
Remove the 'shotToUpdate.' prefix from the path attributes. You have already specified the command object name (commandName="shotToUpdate"), so the path attributes should be relative to the command object, for example path="itemNumColors[${i}].itemNum".
HTML::TableExtract: applying the right attribs to specify the attributes of interest
I tried to run the following Perl script on the HTML further below. My problem is how to define the correct hash reference, with attribs that specify the attributes of interest within my HTML <table> tag itself.

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use YAML;

my $table = HTML::TableExtract->new(keep_html=>0, depth => 1, count => 1, br_translate => 0 );
$table->parse($html);
foreach my $row ($table->rows)

sub cleanup {
    for ( @_ ) {
        s/\s+//;
        s/[\xa0 ]+\z//;
        s/\s+/ /g;
    }
}

{
    print join("\t", @$row), "\n";
}

I want to apply this code to the HTML document you see further below. My first approach is to do this with the columns method, but I am not able to figure out how to use the columns method on the HTML file below. My intuition makes me think it should be something like the following (but my intuition is wrong):

foreach my $column ($table->columns) {
    print join("\t", @$column), "\n";
}

The HTML::TableExtract documentation doesn't shed much light (for me anyway). I can see in the code of the module that the columns method belongs to HTML::TableExtract::Table, but I can't figure out how to use it. I appreciate any help.

Background: I am trying to extract the table, and I have a very small document of tables that I want to parse with the HTML::TableExtract module. I am trying to search for keywords in the HTML so that I can use them for the attribs, and I have to print only the necessary data. I tried searching CPAN but could not really find how to search through it for particular keywords. One way to do it would be HTML::TableExtract; the other would be to parse with HTML::TokeParser, with which I have very little experience. One way or the other, I need to do this parsing: I want to output the result of the parsed tables into a .text file or, even better, store it in a database. The problem is that I can't find any way to search through the resulting parsed table and get the necessary data.
The HTML:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> general information </legend> -->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_result_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_search_info'> -->
<tr>
    <td width="100%" colspan="2" class="ldstabTitel"><strong>data_one </strong></td>
</tr>
<tr>
    <td width="27%"><strong>data_two</strong></td>
    <td width="73%"> 116439 </td>
</tr>
<tr>
    <td width="27%"><strong>official_description</strong></td>
    <td width="73%">the name </td>
</tr>
<tr>
    <td width="27%"><strong>name of the street</strong></td>
    <td width="73%">champs elysee</td>
</tr>
<tr>
    <td width="27%"><strong>number and town</strong></td>
    <td width="73%"> 75000 paris </td>
</tr>
<tr>
    <td width="27%"><strong>telefon</strong></td>
    <td width="73%"> 000241 49321 </td>
</tr>
<tr>
    <td width="27%"><strong>fax</strong></td>
    <td width="73%"> 000241 4093287 </td>
</tr>
<tr>
    <td width="27%"><strong>e-mail-adresse</strong></td>
    <td width="73%"> <a href=mailto:1111116439@my_domain.org>1222216439@site.org</a> </td>
</tr>
<tr>
    <td width="27%"><strong>internet-site</strong></td>
    <td width="73%"> <a href=http://www.thesite.org>http://www.thesite.org</td>
</tr>
<!-- <tr>
    <td width="27%"> </td>
    <td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank"> [Schuldaten ändern] </a>
</tr> </td> -->
<tr>
    <td width="27%"> </td>
    <td width="73%">the department</td>
</tr>
<tr>
    <td width="100%" colspan=2><strong> </strong></td>
</tr>
<tr>
    <td width="27%"><strong>number of indidviduals</strong></td>
    <td width="73%"> 192</td>
<tr>
    <td width="100%" colspan=2><strong> </strong></td>
</tr>
<!--
if (!fsp.isEmpty()){
    ztext = " ";
    int i = 0;
    Iterator it = fsp.iterator();
    while (it.hasNext()){
        String[] zwert = new String[2];
        zwert = (String[])it.next();
        if (i==0){
            if (zwert[1].equals("0")){
                ztext = ztext+zwert[0];
            }else{
                ztext = ztext+zwert[0]+" mit "+zwert[1];
                if (zwert[1].equals("1")){
                    ztext = ztext+" Schüler";
                }else{
                    ztext = ztext+" Schülern";
                }
            }
            i++;
        }else{
            if (zwert[1].equals("0")){
                ztext = ztext+"<br> "+zwert[0];
            }else{
                ztext = ztext+"<br> "+zwert[0]+" mit "+zwert[1];
                if (zwert[1].equals("1")){
                    ztext = ztext+" Schüler";
                }else{
                    ztext = ztext+" Schülern";
                }
            }
        }
    }
-->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>

Thanks for any and all help.
You need to provide something that uniquely identifies the table in question. This can be the content of its headers or its HTML attributes. In this case, there is only one table in the document, so you don't even need to do that. But if I were to provide anything to the constructor, I would provide the class of the table.

Also, I do not think you want the columns of the table. The first column of this table consists of labels and the second column consists of values. To get the labels and values at the same time, you should process the table row by row.

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use YAML;
my $te = HTML::TableExtract->new(
    attribs => { class => 'bp_result_tab_info' },
);
$te->parse_file('t.html');
for my $table ( $te->tables ) {
    print Dump $table->columns;
}

Output:

---
- 'data_one '
- data_two
- official_description
- name of the street
- number and town
- telefon
- fax
- e-mail-adresse
- internet-site
- á
- á
- number of indidviduals
- á
---
- ~
- "á116439\r\n "
- 'the name '
- champs elysee
- ' 75000 paris '
- "á000241 49321\r\n"
- "á000241 4093287\r\n"
- "á1222216439@site.org\r\n"
- áhttp://www.thesite.org
- the department
- ~
- á192
- ~

Finally, a word of advice: it is clear that you do not have much of an understanding of Perl (or of HTML, for that matter). It would be better for you to learn some of the basics first. As it stands, all you are doing is incorrectly copying and pasting code from one answer into another without learning anything.
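The row-by-row processing recommended above could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper names clean_cell and rows_to_pairs are made up for this example, the inline HTML is an abbreviated copy of the table from the question, and the padding in the value cells is assumed to be &nbsp; (which HTML::TableExtract decodes to \xa0).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: treat undef as empty, turn non-breaking spaces
# (\xa0) into plain spaces, trim both ends, collapse whitespace runs.
sub clean_cell {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/\xa0/ /g;
    $text =~ s/^\s+|\s+$//g;
    $text =~ s/\s+/ /g;
    return $text;
}

# Hypothetical helper: turn two-column rows into [label, value] pairs,
# skipping rows that have no value (e.g. the colspan title row).
sub rows_to_pairs {
    my @rows = @_;
    my @pairs;
    for my $row (@rows) {
        my ($label, $value) = map { clean_cell($_) } @{$row}[0, 1];
        next unless length $value;
        push @pairs, [ $label, $value ];
    }
    return @pairs;
}

# Abbreviated copy of the question's table; &nbsp; is an assumption.
my $html = <<'HTML';
<table border="1" class='bp_result_tab_info'>
<tr><td colspan="2"><strong>data_one </strong></td></tr>
<tr><td><strong>data_two</strong></td><td>&nbsp;116439 </td></tr>
<tr><td><strong>fax</strong></td><td>&nbsp;000241 4093287 </td></tr>
</table>
HTML

# Select the table by its class attribute, as in the answer above.
my $te = HTML::TableExtract->new(
    attribs => { class => 'bp_result_tab_info' },
);
$te->parse($html);
$te->eof;

# Print one tab-separated "label<TAB>value" line per usable row.
for my $table ( $te->tables ) {
    print join("\t", @$_), "\n" for rows_to_pairs( $table->rows );
}
```

Keeping the cleanup in small subs means the same pairs can be written to a text file or passed to a database insert instead of being printed. Rows whose label cell is blank (such as "the department") are kept with an empty label; whether that is the right choice depends on what you want to store.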