Swift 2 parse HTML and find particular nodes - swift

Using the Kanna import I am currently parsing html using the following code:
if let doc = Kanna.HTML(url: NSURL(string: "https://en.wikipedia.org/wiki/Data")!, encoding: NSUTF8StringEncoding) {
// Search for nodes by XPath
for link in doc.xpath("/html/head...") {
primaryDisplay.text = link.text
print(link.text)
}
}
}
I was wondering how to identify specific "nodes" (not sure if that is the correct term) on an HTML page so that I can parse just the data I want...
Here is an image that shows what I wanted to know... I think...

A simple way to do what you are looking for is to use SwiftSoup.
Try this:
do{
let html = "<!DOCTYPE html>" +
"<html>" +
"<head>" +
"<title>Some webpage</title>" +
"</head>" +
"<body>" +
"<p class='normal'>This is the first paragraph.</p>" +
"<p class='special'><b>this is in bold</b></p>" +
"</body>" +
"</html>";
let doc: Document = try SwiftSoup.parse(html)
let els: Elements = try doc.getElementsByClass("special")
let special: Element? = els.first()//get first element
print(try special?.text())//"this is in bold"
print(special?.tagName())//"p"
print(special?.child(0).tag().getName())//"b"
}catch Exception.Error(let type, let message)
{
// SwiftSoup reports parse/selector problems through Exception.Error
print("\(type): \(message)")
}catch{
print("error")
}

You should also take a look at XPath/XQuery: languages specifically intended to traverse and query XML, which makes them applicable to XHTML and to well-formed HTML. XHTML is basically well-formed HTML.
Assuming you had an xpath/xquery parser installed on your machine, you could...
get a list of all the p elements in the document: //p
get a list of all the p elements having class "special": //p[@class = 'special']
XQuery adds the ability to query documents using a SQL-like syntax called FLWOR.
The difficulty in using this or any other parser for HTML is that the HTML is often not well formed: not every opening tag has a matching closing tag. This makes any kind of parsing somewhat sketchy, as the parser may not be able to figure out the hierarchy implied by the HTML.

Related

Is there any way to get specific css using swiftsoup?

Try to get CSS used using swiftsoup
let link = "<div style='background-image:URL(https://xxxx); color:blue'>"
link.attr("style");
I can get the whole style attribute of the div with link.attr.
But is there any way to get one specific style property? E.g. I want to get the background-image value.
EDIT
Try using regex
let regex = "background-image:(?i)url?[(]"
print(try item.attr("style"))
print("\(try String(item.attr("style").replacingOccurrences(of: regex, with: "", options: .regularExpression)))")
The output shows: https://xxxxxx) -> How can I extend that regex so that it also deletes the trailing ")"?
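One way to approach this is to capture the URL with a group instead of deleting the text around it. The sketch below illustrates the idea in JavaScript. The function name extractBackgroundImageUrl is hypothetical, and the pattern assumes the style string contains at most one background-image declaration. The same pattern would probably carry over to a Swift regular expression, but that is untested here.

```javascript
// Hypothetical helper: returns the URL inside background-image:url(...),
// or null when the style string has no such declaration.
function extractBackgroundImageUrl(style) {
  // Group 1 captures everything between "url(" and ")", skipping optional quotes.
  // The "i" flag lets both URL(...) and url(...) match.
  const pattern = /background-image\s*:\s*url\(\s*['"]?([^'")]+)['"]?\s*\)/i;
  const match = pattern.exec(style);
  return match ? match[1].trim() : null;
}

console.log(extractBackgroundImageUrl("background-image:URL(https://xxxx); color:blue"));
```

Capturing the value avoids having to strip the trailing ")" in a second step.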

How to create virtual XML for ZUGFeRD Invoices

I am trying to create a PDF/A-3b file that contains an embedded XML file in order to be ZUGFeRD-compliant. I use Perl and PDFLib for this purpose. The PDFLib documentation out there only covers Java and PHP. Creating the PDF works fine, but the XML part is my problem.
So how can I create a PVF from the XML and attach it to my PDF?
This is what PDFLib recommends in Java:
// Place XML stream in a virtual PVF file
String pvf_name = "/pvf/ZUGFeRD-invoice.xml";
byte[] xml_bytes = xml_string.getBytes("UTF-8");
p.create_pvf(pvf_name, xml_bytes, "");
// Create file attachment (asset) from PVF file
int xml_asset = p.load_asset("Attachment", pvf_name,
"mimetype=text/xml description={ZUGFeRD invoice in XML format} "
+ "relationship=Alternative documentattachment=true");
// Associate file attachment with the document
p.end_document("associatedfiles={" + xml_asset + "}");
So I thought, take the example and fit it to perl:
my $xmldata = read_file($xmlfile, binmode => ':utf8'); #I use example xml at the moment
my $pvf_xml = "/pvf/ZUGFeRD-invoice.xml";
PDF_create_pvf($pdf, $pvf_xml, $xmldata, ""); #because no OOP i need to call it this way (works with all other PDF Functions)
my $xml_invoice = PDF_load_asset("Attachment", $pvf_xml, "mimetype=text/xml "
."description={Rechnungsdaten im Zugferd-Xml-Format} "
."relationship=Alternative documentattachment=true");
PDF_end_document($pdf, "associatedfiles={".$xml_invoice."}");
In the PHP examples it's also not necessary to convert the XML to a byte array after reading it. I also tried it with unpack, but that doesn't seem to be the problem.
If I call my script I'm just getting:
Usage: load_asset(type, filename, optlist); at signatur_test.pl line 41.
I think the problem is that pvf_xml isn't created on the line before.
Has anyone done this before and knows how to solve it?
Argh, I was just missing the PDF handle in the load_asset call:
my $xml_invoice = PDF_load_asset($pdf, "Attachment", $pvf_xml, "mimetype=text/xml "
."description={Rechnungsdaten im Zugferd-Xml-Format} "
."relationship=Alternative documentattachment=true");
This way it works.

Typoscript: how do I add a parameter to all links in the RTE?

I want to add a parameter to all links entered in the RTE by the user.
My initial idea was to do this:
lib.parseFunc_RTE.tags.link {
typolink.parameter.append = TEXT
typolink.parameter.append.value = ?flavor=lemon
}
So for example:
http://domain.com/mypage.php
becomes
http://domain.com/mypage.php?flavor=lemon
which sounds great -- as long as the link does not already have a query string!
In that case, I obviously end up with two question marks in the URL
So for example:
http://domain.com/prefs.php?id=1234&unit=moon&qty=300
becomes
http://domain.com/prefs.php?id=1234&unit=moon&qty=300?flavor=lemon
Is there any way to add my parameter with the correct syntax, depending on whether the URL already has a query string or not? Thanks!
That would be the solution:
lib.parseFunc_RTE.tags.link {
typolink.additionalParams = &flavor=lemon
}
Note that it has to start with an &; TYPO3 then generates a valid link. The parameter in the link will also be parsed by RealURL if configured accordingly.
Edit: The above solution only works for internal links as described in the documentation https://docs.typo3.org/typo3cms/TyposcriptReference/Functions/Typolink/Index.html
The only solution that works for all links that I see is to use a userFunc
lib.parseFunc_RTE.tags.link {
typolink.userFunc = user_addAdditionalParams
}
Then you need to create a PHP script and include it in your TS with:
includeLibs.rteScript = path/to/yourScript.php
Keep in mind that includeLibs is outdated, so if you are using TYPO3 8.x (and probably 7.3+) you will need to create a custom extension with just a few files.
<?php
function user_addAdditionalParams($finalTagParts) {
// modify the url in $finalTagParts['url']
// $finalTagParts['TYPE'] is an indication of link-kind: mailto, url, file, page, you can use it to check if you need to append the new params
switch ($finalTagParts['TYPE']) {
case 'url':
case 'file':
$parts = explode('#', $finalTagParts['url']);
$finalTagParts['url'] = $parts[0]
. (strpos($parts[0], '?') === false ? '?' : '&')
. 'newParam=test&newParam=test2'
. (isset($parts[1]) ? '#' . $parts[1] : '');
break;
}
return '<a href="' . $finalTagParts['url'] . '"' .
$finalTagParts['targetParams'] .
$finalTagParts['aTagParams'] . '>';
}
PS: I have not tested the actual PHP code, so it may contain errors. If you run into trouble, try debugging the $finalTagParts variable.
Test whether the "?" character is already in the URL and append either "?" or "&", then append your key-value pair. There's a CASE object available in the TypoScript Reference, with an example you can modify for your purpose.
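The separator check itself is small. A minimal sketch of the logic follows, written in JavaScript only for illustration: it is not TypoScript and would not run inside TYPO3. The function name appendQueryParam is hypothetical, and the fragment handling is an added assumption.

```javascript
// Hypothetical helper: appends key=value with "?" or "&", whichever the URL needs.
function appendQueryParam(url, key, value) {
  // Split off a fragment, if any, so the parameter lands before the "#".
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : url.slice(hashIndex);
  // Use "&" when a query string already exists, otherwise start one with "?".
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + encodeURIComponent(key) + "=" + encodeURIComponent(value) + fragment;
}

console.log(appendQueryParam("http://domain.com/mypage.php", "flavor", "lemon"));
console.log(appendQueryParam("http://domain.com/prefs.php?id=1234&unit=moon&qty=300", "flavor", "lemon"));
```

Handling the fragment separately keeps an anchor such as #top at the end of the URL, where it has to be.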
For anyone interested, here's a solution that worked for me using the replacement function of Typoscript. Hope this helps.
lib.parseFunc_RTE.tags.link {
# Start by "replacing" the whole URL by itself + our string
# For example: http://domain.com/?id=100 becomes http://domain.com/?id=100?flavor=lemon
# For example: http://domain.com/index.html becomes http://domain.com/index.html?flavor=lemon
typolink.parameter.stdWrap.replacement.10 {
#this matches the whole URL
search = #^(.*)$#i
# this replaces it with itself (${1}) + our string
replace =${1}?flavor=lemon
# in this case we want to use regular expressions
useRegExp = 1
}
# After the first replacement is done, we simply replace
# the first '?' by '?' and all others by '&'
# the use of optionSplit allows this
typolink.parameter.stdWrap.replacement.20 {
search = ?
replace = ? || & || &
useOptionSplitReplace = 1
}
}

How do I show a REST call response in Test Results with Serenity?

I am using a framework with Serenity BDD (Thucydides), Cucumber and RestAssured. I want to be able to show the Response that I get after performing a request in my Test results HTML page.
Is there any way for doing that?
Thanks!
You can pass valid HTML text as a parameter to @Step methods in the step library. This will show up as formatted text in the reports on the step details page.
This can be achieved by creating a dummy @Step method called description that takes a String parameter. At runtime, the tests supply this method with formatted HTML text as the parameter.
@Step
public void description(String html) {
//do nothing
}
public void about(String description, String...remarks) {
String html =
"<h2 style=\"font-style:italic;color:black\">" + description + "</h2>" +
"<div><p>Remarks:</p>" +
"<ul style=\"margin-left:5%; font-weight:200; color:#434343; font-size:10px;\">";
for (String li : remarks) html += "<li>" + li + "</li>";
html += "</ul></div>";
description(html);
}
This approach is described more fully here.

How to add conditional elements in data-sly-list?

I currently have a data-sly-list that populates a JS array like this:
var infoWindowContent = [
<div data-sly-use.ed="Foo"
data-sly-list="${ed.allassets}"
data-sly-unwrap>
['<div class="info_content">' +
'<h3>${item.assettitle @ context='unsafe'}</h3> ' +
'<p>${item.assettext @ context='unsafe'} </p>' + '</div>'],
</div>
];
I need to add some logic into this array. If the assetFormat property is 'text/html' only then I want to print the <p> tag. If the assetFormat property is image/png then I want to print img tag.
I'm aiming for something like this. Is this possible to achieve?
var infoWindowContent = [
<div data-sly-use.ed="Foo"
data-sly-list="${ed.allassets}"
data-sly-unwrap>
['<div class="info_content">' +
'<h3>${item.assettitle @ context='unsafe'}</h3> ' +
if (assetFormat == "image/png")
'<img src="${item.assetImgLink}</img>'
else if (assetFormat == "text/html")
'<p>${item.assettext @ context='unsafe'}</p>'
+ '</div>'],
</div>
];
To answer your question quickly, yes you can have a condition (with data-sly-test) in your list as follows:
<div data-sly-list="${ed.allAssets}">
<h3>${item.assettitle @ context='html'}</h3>
<img data-sly-test="${item.assetFormat == 'image/png'}" src="${item.assetImgLink}"/>
<p data-sly-test="${item.assetFormat == 'text/html'}">${item.assetText @ context='html'}</p>
</div>
But looking at what you're attempting to do, which is basically rendering that on the client side rather than on the server, let me take a step back to find a better solution than using Sightly to generate JS code.
A few rules of thumb for writing good Sightly templates:
Try not to mix HTML, JS and CSS in the template: Sightly is limited to HTML on purpose and is therefore very poor at outputting JS or CSS. The logic for generating a JS object should therefore be done in the Use-API, using the convenience APIs that are made for that, like JSONWriter.
Also avoid @context='unsafe' as much as possible, unless you filter that string somehow yourself. Each string that is not escaped or filtered could be used in an XSS attack. This is the case even if only AEM authors could have entered that string, because they can be victims of an attack too. To be secure, a system shouldn't rely on none of its users getting hacked. If you want to allow some HTML, use @context='html' instead.
A good way to pass information to JS is usually to use a data attribute.
<div class="info-window"
data-sly-use.foo="Foo"
data-content="${foo.jsonContent}"></div>
For the markup that was in your JS, I'd rather move that to the client-side JS, so that the corresponding Foo.java logic only builds the JSON content, without any markup inside.
package apps.MYSITE.components.MYCOMPONENT;
import com.adobe.cq.sightly.WCMUsePojo;
import org.apache.sling.commons.json.io.JSONStringer;
import com.adobe.granite.xss.XSSAPI;
public class Foo extends WCMUsePojo {
private JSONStringer content;
@Override
public void activate() throws Exception {
XSSAPI xssAPI = getSlingScriptHelper().getService(XSSAPI.class);
content = new JSONStringer();
content.array();
// Your code here to iterate over all assets
for (int i = 1; i <= 3; i++) {
content
.object()
.key("title")
// Your code here to get the title - notice the filterHTML that protects from harmful HTML
.value(xssAPI.filterHTML("title <span>" + i + "</span>"));
// Your code here to detect the media type
if ("text/html".equals("image/png")) {
content
.key("img")
// Your code here to get the asset URL - notice the getValidHref that protects from harmful URLs
.value(xssAPI.getValidHref("/content/dam/geometrixx/icons/diamond.png?i=" + i));
} else {
content
.key("text")
// Your code here to get the text - notice the filterHTML that protects from harmful HTML
.value(xssAPI.filterHTML("text <span>" + i + "</span>"));
}
content.endObject();
}
content.endArray();
}
public String getJsonContent() {
return content.toString();
}
}
A client-side JS located in a corresponding client library would then pick-up the data attribute and write the corresponding markup. Obviously, avoid inlining that JS into the HTML, or we'd be mixing again things that should be kept separated.
jQuery(function($) {
$('.info-window').each(function () {
var infoWindow = $(this);
var infoWindowHtml = '';
$.each(infoWindow.data('content'), function(i, content) {
infoWindowHtml += '<div class="info_content">';
infoWindowHtml += '<h3>' + content.title + '</h3>';
if (content.img) {
infoWindowHtml += '<img src="' + content.img + '">';
}
if (content.text) {
infoWindowHtml += '<p>' + content.text + '</p>';
}
infoWindowHtml += '</div>';
});
infoWindow.html(infoWindowHtml);
});
});
That way, we moved the full logic of that info window to the client-side, and if it became more complex, we could use some client-side template system, like Handlebars. The server Java code needs to know nothing of the markup and simply outputs the required JSON data, and the Sightly template takes care of outputting the server-side rendered markup only.
Looking at the example here, I would put this logic inside a JS Use-API to populate this array.