The proper use of jsoup - append

I recently began learning how to use jsoup:
Document doc = Jsoup.parse(responseString);
Elements pngs = doc.select("div.kk2");
Here is an example of the markup I am parsing, a div that contains an image:
<div class="kk2" id="12" style="border:2px solid #FFFF00; top:-1px; left:-203px; height:151px; width:200px"> <img src = "http:// kk.org / t / ea / ff.jpg "alt =" text "style =" fff "/> </ div>
After selecting the elements, I loop over them:
for(Element png : pngs){
sff2.append(png.attr("abs:href")).append(" ").append(png.text()).append("\n");
}
This gives me the following value:
init ~ kk.org ~ t / ea / ff.jpg ~ text
But I only want to get this value:
http://kk.org/t/ea/ff.jpg
How can I do it?
I tried:
sff2.append(png.attr("alt")).append("").append(png.text()).append("\n");
But without success.

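If I understand correctly, you just want to get the address of the image?
If so, this should do it. The src attribute belongs to the img element inside the div, not to the div itself, so select the img and read its src (the abs: prefix asks jsoup for the absolute URL).
Elements div = doc.select("div[class=kk2]");
Elements pngs = div.select("img");
for (Element png : pngs) {
    String src = png.attr("abs:src");
    src = src.replace(" ", ""); // Remove spaces
    System.out.println(src);
}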

Related

Uima Ruta Heading Levels

I'm trying to tag heading levels (Level 1, Level 2, Level 3) for a set of headings, using the font size information that comes from the HTML. First I extracted the font family and font size with regex rules like:
"<span style=\"font-family:\'(.+?)\'"->1=fontfamily;
"<span style=\"font-family(.+?)font-size:\'(.+?)\'"->2=font size;
Now I need to compare the font sizes of all these headings and tag the heading levels based on them.
Input:
<span style="font-family:'Times New Roman'; font-size:'14pt'"><span class="">MATERIALS AND METHODS</span></span>
<span style="font-family:'Times New Roman'; font-size:'12pt'"><span class="">Chemicals</span></span>
<span style="font-family:'Times New Roman'; font-size:'10pt'"><span class="">HILIC-MS Profiling of Metabolites</span>
You can do something like this (tested with UIMA Ruta 2.5.0):
ENGINE utils.HtmlAnnotator;
TYPESYSTEM utils.HtmlTypeSystem;
CONFIGURE(HtmlAnnotator, "onlyContent" = false);
EXEC(HtmlAnnotator, {TAG});
DECLARE FontFamily;
DECLARE FontSize;
DECLARE Heading (INT level, INT size);
"<span style=\"font-family:\'(.+?)\'"->1=FontFamily;
"<span style=\"font-family(.+?)font-size:\'(\\d+.+?)\'"->2=FontSize;
INT size;
RETAINTYPE(MARKUP);
SPAN{-PARTOF(Heading) -> Heading, Heading.size = size}
<-{FontFamily # FontSize{PARSE(size)};};
# h:Heading{-> size = h.size};
h:Heading{h.size == size -> Heading.level = 1};
h1:Heading{h1.level != 0} # h2:Heading.level == 0
{h1.size>h2.size -> h2.level = (h1.level + 1)};
h1:Heading{h1.level != 0} # h2:Heading.level == 0
{h1.size==h2.size -> h2.level = h1.level};
RETAINTYPE;
These rules use the HtmlAnnotator, which requires somewhat valid HTML. I needed to add <html> tags to the document in order to get it to work.
These rules are not optimal but just a starting point. The actual rules that you should use depend mainly on the use case and on how robust they need to be.
DISCLAIMER: I am a developer of UIMA Ruta
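For comparison, the intended outcome can be sketched outside of Ruta. The snippet below is not a translation of the rules above: it uses a simpler, swapped-in approach that ranks the distinct font sizes globally, so the largest size becomes level 1. The function name and the input shape are assumptions made for illustration.

```javascript
// Hypothetical helper: map heading font sizes (in pt) to heading levels.
// The largest distinct size becomes level 1, the next largest level 2, and so on.
function assignHeadingLevels(headings) {
    // Collect the distinct sizes and sort them from largest to smallest.
    var sizes = Array.from(new Set(headings.map(function (h) { return h.size; })))
        .sort(function (a, b) { return b - a; });
    // Return copies of the headings with a level derived from the size rank.
    return headings.map(function (h) {
        return { text: h.text, size: h.size, level: sizes.indexOf(h.size) + 1 };
    });
}

// The three headings from the input in the question.
var leveledHeadings = assignHeadingLevels([
    { text: "MATERIALS AND METHODS", size: 14 },
    { text: "Chemicals", size: 12 },
    { text: "HILIC-MS Profiling of Metabolites", size: 10 }
]);
```

Unlike the Ruta rules, which derive each level from the preceding heading, this ranking needs all sizes up front and ignores document order.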

How to filter tags in a component dialog. Adobe CQ

I am trying to filter the tags in a component dialog. I know that I can filter it by namespace, however that applies only to root level. Can I filter the tag selection one level deeper?
For example:
etc
  tags
    namespace
      article-type
        blog
        news
      asset-type
        image
        video
I want to filter the tags in the component dialog so the user can only select the tags under 'article-type'.
Thanks,
Yes and no. Officially, according to the widget API, you can go deeper, but there is a "bug" in the widget JavaScript file that prevents it from working. I had the same issue and simply overrode this JavaScript file.
Widget definition:
<article jcr:primaryType="cq:Widget"
    fieldLabel="Article Type"
    name="./cq:tags"
    tagsBasePath="/etc/tags/namespace"
    xtype="tags">
    <namespaces jcr:primaryType="cq:WidgetCollection">
        <ns1 jcr:primaryType="nt:unstructured" maximum="1" name="article-type" />
    </namespaces>
</article>
<asset jcr:primaryType="cq:Widget"
    fieldLabel="Asset Type"
    name="./cq:tags"
    namespaces="[asset-type]"
    tagsBasePath="/etc/tags/offering"
    xtype="tags"/>
In this case only one tag below article-type can be selected; the maximum attribute limits the number. The asset-type widget has no limit, so choose the option that suits your needs.
JavaScript override:
To make this work, you need to change the method CQ.tagging.parseTag in /libs/cq/tagging/widgets/source/CQ.tagging.js:
// private - splits tagID into namespace and local (also works for title paths)
CQ.tagging.parseTag = function(tag, isPath) {
    var tagInfo = {
        namespace: null,
        local: tag,
        getTagID: function() {
            return this.namespace + ":" + this.local;
        }
    };
    var tagParts = tag.split(':');
    // only take the special branch when there is something after the colon,
    // otherwise tagParts[1] would be undefined
    if (tagParts.length > 1 && (tagParts[0] == 'article-type' || tagParts[0] == 'asset-type')) {
        var realTag = tagParts[1];
        var pos = realTag.indexOf('/');
        tagInfo.namespace = realTag.substring(0, pos).trim();
        tagInfo.local = realTag.substring(pos + 1).trim();
    }
    else {
        // parse tag pattern: namespace:local
        var colonPos = tag.indexOf(isPath ? '/' : ':');
        if (colonPos > 0) {
            // the first colon ":" delimits a namespace
            // don't forget to trim the strings (in case of title paths)
            tagInfo.namespace = tag.substring(0, colonPos).trim();
            tagInfo.local = tag.substring(colonPos + 1).trim();
        }
    }
    return tagInfo;
};

TinyMCE - applying a style over bullets and multiple paragraphs applies the style to each bullet & para - how do I avoid?

I'm trying to use the theme_advanced_styles option within TinyMCE to add classes to selections of text within the TinyMCE editor. The problem is that if the selection contains bullets, the style is applied to every bullet (as well as to each individual paragraph).
What I want is for the style class to be added once, at the start of the entire selection. I.e. if my style class is 'expandCollapse', I want:
<p class="expandCollapse">some content... some content... some content... some content... som content... some content... some content...
<ul>
<li>asdsadsadsadsasda</li>
<li>asdsadsa</li>
<li>sada</li>
</ul>
asome content... some content... some content... some content... some content... some content... some content... some content... </p>
But what I get is:
<p class="expandCollapse">some content... some content... some content... some content... some content... some content... some content...
<ul>
<li class="expandCollapse">asdsadsadsadsasda</li>
<li class="expandCollapse">asdsadsa</li>
<li class="expandCollapse">sada</li>
</ul>
</p>
<p class="expandCollapse">asome content... some content... some content... some content... some content... some content... some content... some content... </p>
Any ideas anyone?!
I had to answer my own question, as I needed an answer very quickly. It appears the behaviour I was experiencing is intentional, and it has certainly not been removed in the latest versions of TinyMCE (I tested both 3.x and 4.x).
With this in mind I ended up having to make a plugin to do what I wanted.
I borrowed a huge amount of code by Peter Wilson, from a post he made here: http://www.tinymce.com/forum/viewtopic.php?id=20319 So thanks very much for this Peter!
I ended up slightly changing the rules from my original question in that my solution adds an outer wrapping div around all the content I want to select. This method also allowed me to reliably then grab the required areas of html with jQuery in my front-end site.
My version of Peter's code is just very slightly modified from the original in order to add a class to the DIV, rename it, use a different button etc.
The plugin works perfectly and allows a div to be created that wraps any amount of content within TinyMCE. The inserted divs also get the class name I need.
Add 'customDiv' to both your plugin list and your button bar for it to appear.
(function() {
    tinymce.create("tinymce.plugins.Div", {
        init : function(editor, url) {
            editor.addCommand("mceWrapDiv", function() {
                var ed = this, s = ed.selection, dom = ed.dom, sb, eb, n, div, bm, r, i;
                // Get start/end block
                sb = dom.getParent(s.getStart(), dom.isBlock);
                eb = dom.getParent(s.getEnd(), dom.isBlock);
                // If the document is empty then there can't be anything to wrap.
                if (!sb && !eb) {
                    return;
                }
                // If empty paragraph node then do not use bookmark
                if (sb != eb || sb.childNodes.length > 1 || (sb.childNodes.length == 1 && sb.firstChild.nodeName != 'BR'))
                    bm = s.getBookmark();
                // Move selected block elements into a new DIV - positioned before the first block
                tinymce.each(s.getSelectedBlocks(s.getStart(), s.getEnd()), function(e) {
                    // If this is the first node then we need to create the DIV along with the following dummy paragraph.
                    if (!div) {
                        div = dom.create('div',{'class' : 'expandCollapse'});
                        e.parentNode.insertBefore(div, e);
                        // Insert an empty dummy paragraph to prevent people getting stuck in a nested block. The dummy has a '-'
                        // in it to prevent it being removed as an empty paragraph.
                        var dummy = dom.create('p');
                        e.parentNode.insertBefore(dummy, e);
                        //dummy.innerHTML = '-';
                    }
                    // Move this node to the new DIV
                    if (div!=null)
                        div.appendChild(dom.remove(e));
                });
                if (!bm) {
                    // Move caret inside empty block element
                    if (!tinymce.isIE) {
                        r = ed.getDoc().createRange();
                        r.setStart(sb, 0);
                        r.setEnd(sb, 0);
                        s.setRng(r);
                    } else {
                        s.select(sb);
                        s.collapse(1);
                    }
                } else
                    s.moveToBookmark(bm);
            });
            editor.addButton("customDiv", {
                //title: "<div>",
                image: url + '/customdiv.gif',
                cmd: "mceWrapDiv",
                title : 'Wrap content in expand/collapse element'
            });
        }
    });
    tinymce.PluginManager.add("customDiv", tinymce.plugins.Div);
})();
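As a rough illustration of registering the plugin and its button, a TinyMCE 3.x configuration might look like the sketch below. It assumes the plugin file is saved as plugins/customDiv/editor_plugin.js inside the TinyMCE folder; the mode, theme and the other buttons are placeholders, not part of my actual setup.

```javascript
// Sketch of a TinyMCE 3.x init configuration. Every value other than
// "customDiv" is a placeholder chosen for illustration.
var editorConfig = {
    mode: "textareas",
    theme: "advanced",
    // load the plugin registered with tinymce.PluginManager.add("customDiv", ...)
    plugins: "customDiv",
    // show the button created by editor.addButton("customDiv", ...)
    theme_advanced_buttons1: "bold,italic,bullist,customDiv"
};

// In the page, pass the configuration to TinyMCE once its script has loaded.
if (typeof tinyMCE !== "undefined") {
    tinyMCE.init(editorConfig);
}
```

The name has to match in both places: the plugins entry loads the code, and the button only shows up if it is also listed in a button row.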

Imperavi Redactor 9 removes the &nbsp; character

How can I stop the Redactor editor from automatically removing &nbsp;? Please help.
In newer versions you can set the "cleanSpaces" option to false to disable the automatic removal.
$('#redactor').redactor({ cleanSpaces: false });
The text and code you see will differ between browsers; that is simply how contenteditable fields work. For example, some browsers insert the Unicode non-breaking space character for spaces, while others insert &nbsp;.
Redactor does not provide a method to normalize the text, so you have to process it manually. Check this:
var html = $('#redactor').redactor('get');
var sanitizeHtml = html.replace(/\u00a0/g, ' ').replace(/&nbsp;/g, ' ');
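The same normalization can be wrapped in a small standalone function so it can be tried without the editor. This is only a sketch: normalizeSpaces is a hypothetical helper name, not something Redactor provides.

```javascript
// Hypothetical helper, not part of Redactor: replace both the literal
// non-breaking space character (U+00A0) and the "&nbsp;" entity with a plain space.
function normalizeSpaces(html) {
    return html.replace(/\u00a0/g, " ").replace(/&nbsp;/g, " ");
}

// Example input mixing both forms of the non-breaking space.
var normalized = normalizeSpaces("<p>a\u00a0b&nbsp;c</p>");
```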
To fix this in the cleanup code itself, open redactor.js and find:
syncClean: function(html)
{
    if (!this.opts.fullpage) html = this.cleanStripTags(html);
    html = $.trim(html);
    // removeplaceholder
    html = this.placeholderRemoveFromCode(html);
    // remove space
    html = html.replace(/&#x200b;/gi, '');
    html = html.replace(/&#8203;/gi, '');
    // html = html.replace(/&nbsp;/gi, ' '); // COMMENT THIS!
    ...
}
Comment out the line that replaces &nbsp;.
Profit! :)

XHTML DOM - How to split a tag on IE?

Let's assume I have a part of an HTML document containing the following code (basic structure):
<p>
<span class="1">This is my first content</span>
<span class="2">This is my second content</span>
</p>
I'd like to allow the user to select a part of the text and apply a new class to it.
Let's say the user selects "is my first" in the first span, and applies class "3".
I'd like to have the following result:
<p>
<span class="1">This </span>
<span class="3">is my first</span>
<span class="1"> content</span>
<span class="2">This is my second content</span>
</p>
I've managed to do this in Firefox by using the execCommand "InsertHTML", but I can't find a way to do it in IE (before IE9).
The only result I get is a nested span element, like below:
<p>
<span class="1">This <span class="3">is my first</span> content</span>
<span class="2">This is my second content</span>
</p>
Do you have any idea how I could achieve this?
Any help would be much appreciated!
By the way, if this looks too simple to you, how would you handle the case of a user selecting a portion of text that spans two or more spans? Or two or more paragraphs?
You can get the selected segment using a selection range. I would recommend using Rangy, which is a cross-browser range module.
Here's some "untested" code using jQuery and Rangy to hopefully point you in the right direction, for your first case:
var splitTag = function(className) {
    var sel = rangy.getSelection();
    // this is your selection, in your example "is my first"
    var r0 = sel.getRangeAt(0);
    // create a new range
    var r1 = rangy.createRange();
    // this is the <span> that contains the end of your selection
    var span = r0.endContainer.parentNode;
    // set the new range to start at the end of your phrase and to end at the end of that <span>
    r1.setStart(r0.endContainer, r0.endOffset);
    r1.setEnd(span, span.childNodes.length);
    // move the text after your phrase into a new span that keeps the original class
    var tail = $("<span/>").attr("class", span.className).append(r1.extractContents());
    // get the content of your selection, "is my first"
    var r0Txt = r0.toHtml();
    // make it into a span, with class set to the className argument, which would be 3
    var newContent = $("<span/>").html(r0Txt).attr("class", className);
    r0.deleteContents();
    // insert the new span and the tail after the original span
    $(span).after(newContent, tail);
    sel.removeAllRanges();
};
This should get you the result for your first situation. For selections that cover more than one span, here's an equally untested modification of the code:
var splitTag = function(className) {
    var sel = rangy.getSelection();
    var r0 = sel.getRangeAt(0);
    var r1 = rangy.createRange();
    var span = r0.endContainer.parentNode;
    r1.setStart(r0.endContainer, r0.endOffset);
    r1.setEnd(span, span.childNodes.length);
    var tail = $("<span/>").attr("class", span.className).append(r1.extractContents());
    var r0Txt = r0.toHtml();
    var newContent;
    if (r0.startContainer !== r0.endContainer) {
        // the selection spans multiple DOM nodes:
        // set the class of all spans in the selection to className
        var wrapper = $("<div/>").html(r0Txt);
        wrapper.find("span").attr("class", className);
        newContent = wrapper.contents();
    } else {
        newContent = $("<span/>").html(r0Txt).attr("class", className);
    }
    r0.deleteContents();
    $(span).after(newContent, tail);
    sel.removeAllRanges();
};
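To make the target of the split concrete, here is a DOM-free sketch that works on strings only. The helper name and its parameters are made up for illustration, and it assumes the span contains plain text with no characters that need HTML escaping.

```javascript
// Hypothetical helper: split the text of one span around the selected
// character range [start, end) and return the markup for the pieces.
function splitSpanMarkup(text, originalClass, newClass, start, end) {
    var pieces = [
        { cls: originalClass, text: text.slice(0, start) },
        { cls: newClass, text: text.slice(start, end) },
        { cls: originalClass, text: text.slice(end) }
    ];
    // Drop empty pieces so a selection at either edge leaves no empty spans.
    return pieces
        .filter(function (p) { return p.text.length > 0; })
        .map(function (p) { return '<span class="' + p.cls + '">' + p.text + '</span>'; })
        .join("");
}

// "is my first" occupies characters 5 to 15 of the first span's text.
var markup = splitSpanMarkup("This is my first content", "1", "3", 5, 16);
```

With the values above, markup holds the three sibling spans from the question: "This " with class 1, "is my first" with class 3, and " content" with class 1.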