C# iTextSharp-generated PDF with XMLWorker is breaking lists

I am using the iTextSharp library with XMLWorker version 5.5.12.0 and am having problems when a list is enclosed in a div. Without a div, I use this HTML:
<body>
<span>
<ul>
<ul>
<li>Project Management
<ul>
<li>
<a class="jwiki-small" data-containerid="2544" data-containertype="14" data-objectid="14695" data-objecttype="102" href="https://SampleUrl.com/DOC-146">Sample Text</a>
</li>
</ul>
</li>
</ul>
</ul>
</span>
</body>
and the PDF looks correct.
But formatting problems start once the list is enclosed in a div at any level: the list in the PDF becomes inline.
<body>
<div>
<span>
<ul>
<ul>
<li>Project Management
<ul>
<li>
<a class="jwiki-small" data-containerid="2544" data-containertype="14" data-objectid="14695" data-objecttype="102" href="https://SampleUrl.com/DOC-146">Sample Text</a>
</li>
</ul>
</li>
</ul>
</ul>
</span>
</div>
</body>
Is there anything I can do to handle this? Why does the div influence the formatting of the list?
FYI, here is the CreatePDF method I am using.
private void CreatePDF(string html)
{
var document = new Document(iTextSharp.text.PageSize.A4,20,20,20,20);
var memoryStream = new MemoryStream();
using (var pdfWriter = PdfWriter.GetInstance(document, memoryStream))
{
document.Open();
var htmlContext = new HtmlPipelineContext(null);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
htmlContext.SetImageProvider(new CustomItextImageProvider());
htmlContext.CharSet(Encoding.UTF8);
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
var pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, pdfWriter)));
var xmlWorker = new XMLWorker(pipeline, true);
var xmlParser = new XMLParser(true,xmlWorker);
StringReader rdr = new StringReader((html));
xmlParser.Parse(rdr);
pdfWriter.CloseStream = false;
document.AddCreator("iTextSharp");
document.AddAuthor("ThreeWill");
document.Close();
string fileName = @"c:\temp\" + "test" + DateTime.Now.ToString("yyyy-MM-dd hh.mm.ss") + ".pdf";
var outputFileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write);
memoryStream.Position = 0;
memoryStream.WriteTo(outputFileStream);
outputFileStream.Close();
}
}

First this: your use of <span> is awkward. According to w3schools, the <span> tag is defined and used as follows:
The <span> tag is used to group inline-elements in a document.
The <span> tag provides no visual change by itself.
The <span> tag provides a way to add a hook to a part of a text or a part of a document.
When I look at the result you get, I see that the list is "flattened" to an inline element instead of remaining the block element you want it to be. Since the list sits inside a <span>, which is meant for inline content, that outcome is arguably consistent with the markup. However, I understand why you would consider this an error, because a browser accepts badly written HTML and renders it as expected rather than as it probably should.
How to solve your problem?
You are using a maintenance release of a version of iText that is being phased out. Maintenance release means that this version is no longer supported for companies that aren't iText customers. Only minor bugs are solved. Known problems, such as the one you are encountering now, will not be fixed in iText 5!
Why won't we fix this in iText 5? Because this is already fixed in iText 7.1.
I wrote the following code sample:
// Default conversion settings
ConverterProperties props = new ConverterProperties();
FileStream fs = new FileStream("list.pdf", FileMode.Create);
HtmlConverter.ConvertToPdf(htmlString, fs, props);
Where htmlString contains the HTML from your question.
This is the result I get:
So please stop complaining about errors in (a maintenance release of) an old iText version, and upgrade to iText 7 and pdfHTML! As explained in the introduction of the HTML to PDF tutorial, it will save you from a lot of frustration. It will also save me from a lot of frustration because I have been repeating this same message several times a day in the last couple of weeks.


How do I find second div class with the same name?

I don't even know how to properly ask this.
I just started with python, and I'm trying to make a crawler.
Everything works fine but I can't "call" or "find" the second div with identical class names in the body.
I've been searching the internet for help, but the way people write their code is not similar to what I wrote.
The HTML looks something like this:
<div class="card">
<div class="card-body">...</div>
<div class="card-body">...</div>
My code:
comp_link = comp_card.find('a', class_ = 'link')
href_link = comp_link['href']
link_final = 'https://www.someweb.com' + href_link
prof_text = requests.get(link_final).text
prof_soup = BeautifulSoup(prof_text, 'lxml')
comp_name = prof_soup.find('h2', class_ = 'company-name').text.strip()
comp_info = prof_soup.find('div', class_ ='col-md-12 col-lg-4')
but when I try to use
comp_info = comp_info.find('div', class_ = 'card-body'[1])
it doesn't work.
I've tried to experiment and to use other people's solutions from StackOverflow (but I'm too dumb).
Often, I prefer using CSS selectors. In this simple case you could select the second child that has the class name card-body. You can use the nth-child selector to grab the second div:
import bs4
html = """
<div class="card">
<div class="card-body">Not this</div>
<div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.select('div.card-body:nth-child(2)'))
Output
[<div class="card-body">But this</div>]
If the targeted element is not actually the second child, but simply the second element with the class card-body, it may be advantageous to use nth-child(n of selector). With n set to 2, this selects the second element that matches the specified selector:
html = """
<div class="card">
<div class="other-class">Not this</div>
<div class="card-body">Or this</div>
<div class="card-body">But this</div>
</div>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.select('div:nth-child(2 of .card-body)'))
Output
[<div class="card-body">But this</div>]
BeautifulSoup's CSS selector logic is driven by the SoupSieve library, and more information can be found here: https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:nth-child.
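As for why the attempt in the question fails: 'card-body'[1] indexes the string itself and evaluates to 'a', so find() ends up searching for a div with class "a". If you would rather stay with find/find_all than switch to CSS selectors, a minimal sketch is to collect all matches and index the resulting list. The markup below is made up to mirror the structure described in the question, and the variable names card_bodies, second and second_text are only illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure described in the question
html = """
<div class="col-md-12 col-lg-4">
  <div class="card">
    <div class="card-body">Not this</div>
    <div class="card-body">But this</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
comp_info = soup.find("div", class_="col-md-12 col-lg-4")

# find_all returns every matching descendant in document order
card_bodies = comp_info.find_all("div", class_="card-body")

# Index the result list, not the class name; guard against too few matches
second = card_bodies[1] if len(card_bodies) > 1 else None
second_text = second.get_text(strip=True) if second is not None else None
print(second_text)
```

The length check matters in a crawler, where some pages may contain only one card-body and a bare [1] would raise an IndexError.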

Umbraco 8 how to line break in textareaEditor

When using an Umbraco textarea, I need line breaks to be rendered with the <br/> tag.
The code is here:
var subTitle = item.Value("sliderSubTitle");
and the HTML code is:
<div>
@subTitle
</div>
but the data comes out on a single line.
Can you check this?
<div>
@Html.Raw(subTitle.Replace("\n", "<br />"))
</div>
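One thing to keep in mind with raw output is that any markup an editor typed into the textarea is emitted unescaped. The underlying idea is language-independent: escape the text first, then turn line breaks into <br /> tags. As a rough sketch of that logic in Python (the function name text_to_html_lines is made up for illustration and is not part of Umbraco or Razor):

```python
import html


def text_to_html_lines(text: str) -> str:
    """Escape plain text, then turn line breaks into <br /> tags."""
    # Escape first so that only the <br /> tags added below are real markup
    escaped = html.escape(text)
    # Normalize Windows (\r\n) and bare \r line endings to \n
    normalized = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "<br />")


print(text_to_html_lines("First line\r\nSecond <b>line</b>"))
```

The order is the important part: escaping after the replacement would also escape the inserted <br /> tags.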

Wicket:: add custom attribute to <li> element

how can I add a custom attribute to an html list element?
I tried the following but got markup exception:
WebMarkupContainer con = new WebMarkupContainer("Temp");
con.add(new AttributeAppender("note",true, new Model<String>("Alpha")));
add(con);
HTML:
<li class="segment" wicket:id="Temp">Data Usage</li>
Any suggestions for custom attributes?
Thanks.
You have to use valid markup (note the closing li tag):
<li class="segment" wicket:id="Temp">Data Usage</li>

Selecting a DOM Element when (auto-generated) HTML is not well formed

I'm trying to select a control in order to manipulate it, but I'm having a problem: I can't select it. Maybe it's because of the XML structure, but I really can't change it because it is externally created. So I have this:
<span class="xforms-value xforms-control xforms-input xforms-appearance xforms-optional xforms-enabled xforms-readonly xforms-valid " id="pName">
<span class="focus"> </span>
<label class="xforms-label" id="xsltforms-mainform-label-2_2_4_3_">Name:</label>
<span class="value">
<input readonly="" class="xforms-value" type="text">
</span>
<span class="xforms-required-icon">*</span>
<span class="xforms-alert">
<span class="xforms-alert-icon"> </span>
</span>
</span>
And what I need is to get the input (line 5). I tried a lot, for example:
var elem01 = document.getElementById("pName");
console.log("getElementById: " + elem01);
var elem02 = document.evaluate(".//*[@id='pName']", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
console.log("evaluate: " + elem02);
console.log(elem02.singleNodeValue);
var elem03 = document.querySelector("#pName");
console.log("querySelector: " + elem03);
But none of that allows me to get a reference to the control. What's wrong?
With XPath, the problem seems to be that the XML is not well formed, so document.getElementById("pName") doesn't return anything.
http://jsfiddle.net/wmzyqqja/7/
The problem with your example is that you are executing your JavaScript before the relevant DOM elements are loaded (i.e. your code is in the head element).
This will fix the example:
window.onload = changeControlValue;
JSFiddle: http://jsfiddle.net/TrueBlueAussie/wmzyqqja/8/
Try this
var elem01 = document.getElementById("pName");
var inp = elem01.getElementsByTagName("input")[0];
(in JSFiddle the "onload" setting is required.)

How to use "this" and not "this" selectors in jQuery

I have 4 divs with content like below:
<div class="prodNav-Info-Panel">content</div>
<div class="prodNav-Usage-Panel">content</div>
<div class="prodNav-Guarantee-Panel">content</div>
<div class="prodNav-FAQ-Panel">content</div>
And a navigation list like this:
<div id="nav">
<ul id="navigation">
<li><a class="prodNav-Info" ></a></li>
<li><a class="prodNav-Usage" ></a></li>
<li><a class="prodNav-Guarantee"></a></li>
<li><a class="prodNav-FAQ" ></a></li>
</ul>
</div>
When the page is first displayed I show all the content by executing this:
$('div.prodNav-Usage-Panel').fadeIn('slow');
$('div.prodNav-Guarantee-Panel').fadeIn('slow');
$('div.prodNav-FAQ-Panel').fadeIn('slow');
$('div.prodNav-Info-Panel').fadeIn('slow');
Now, when you click the navigation list item it reveals the clicked content and hides the others, like this:
$('.prodNav-Info').click( function() {
$('div.prodNav-Info-Panel').fadeIn('slow');
$('div.prodNav-Usage-Panel').fadeOut('slow');
$('div.prodNav-Guarantee-Panel').fadeOut('slow');
$('div.prodNav-FAQ-Panel').fadeOut('slow');
});
So what I have is 4 separate functions because I do not know which content is currently displayed. I know this is inefficient and can be done with a couple of lines of code. It seems like there is a way of saying: when this is clicked, hide the rest.
Can I do this with something like $(this) and $(not this)?
Thanks,
Erik
In your particular case you may be able to use the .siblings() method, something like this:
$(this).fadeIn().siblings().fadeOut();
Otherwise, let's say that you have a set of elements stored somewhere that points to all of your elements:
// contains 5 elements:
var $hiders = $(".prodNavPanel");
// somewhere later:
$hiders.not("#someElement").fadeOut();
$("#someElement").fadeIn();
Also, I would suggest changing the classes for your <div> and <a> to something more like:
<div class="prodNavPanel" id="panel-Info">content</div>
....
<a class="prodNavLink" href="#panel-Info">info</a>
This gives you a few advantages over your HTML. First: the links will have useful hrefs. Second: You can easily select all your <div>/<a> tags. Then you can do this with jQuery:
$(function() {
var $panels = $(".prodNavPanel");
$(".prodNavLink").click(function() {
var m = this.href.match(/(#panel.*)$/);
if (m) {
var panelId = m[1];
$panels.not(panelId).fadeOut();
$(panelId).fadeIn();
return false; // prevents browser from "moving" the page
}
});
});