Handle DTD with Eclipse

I have built this DTD:
<!ELEMENT universes (universe+)>
<!ELEMENT universe (index,name,conf)>
<!ELEMENT index (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT conf (speed,resources-cdr,moons,bots)>
<!ELEMENT speed (game,fleet,resources)>
<!ELEMENT game (#PCDATA)>
<!ELEMENT fleet (#PCDATA)>
<!ELEMENT resources (#PCDATA)>
<!ELEMENT resources-cdr (ships,defs) >
<!ELEMENT ships (#PCDATA)>
<!ELEMENT defs (#PCDATA)>
<!ELEMENT moons (#PCDATA)>
<!ELEMENT bots (#PCDATA)>
and I use it inside an XML file like this:
<!DOCTYPE universes SYSTEM "universes.dtd" >
Now, under Eclipse (Indigo), when I use Ctrl+Space to see the list of proposed elements, I only see the simple elements (the #PCDATA ones), not the others.
For example, I see proposals for index and name, but no proposal for conf.
If I enter the conf tag manually, rather than through content assist, I have the same problem with its nested tags.
How can I change this Eclipse behaviour, please?
Thank you

OK, the problem is solved.
In my case, I had created the XML file before linking it to its DTD.
If I create a new XML file with Eclipse and choose to create the XML file from a DTD, it works.
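For reference, an instance document that is valid against the DTD above looks roughly like this (the element values are just made-up placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE universes SYSTEM "universes.dtd">
<universes>
  <universe>
    <index>1</index>
    <name>Andromeda</name>
    <conf>
      <speed>
        <game>1</game>
        <fleet>1</fleet>
        <resources>1</resources>
      </speed>
      <resources-cdr>
        <ships>30</ships>
        <defs>30</defs>
      </resources-cdr>
      <moons>true</moons>
      <bots>false</bots>
    </conf>
  </universe>
</universes>
Once the file is associated with the DTD like this, Ctrl+Space should propose conf, speed and the other container elements as well, not only the #PCDATA ones.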

Related

VSCode: Delete all occurences of xml tag pair including differing contents

I'm working in a KML (XML) file in VSCode. There are 267 instances of the <description></description> tags, with the same content structure but different contents. I would like a fast way to delete all of the instances of <description>, including their contents, instead of deleting each one manually. I'm not married to VSCode if Notepad++ or another editor will do what I'm trying to do.
Use one command/macro to delete both of these (plus 265 more)
<description><![CDATA[<center><table><tr><th colspan='2' align='center'>
<em>Attributes</em></th></tr><tr bgcolor="#E3E3F3">
<th>NAME</th>
<td>Anderson</td>
</tr><tr bgcolor="#E3E3F3">
</tr></table></center>]]>
</description>
<description><![CDATA[<center><table><tr><th colspan='2' align='center'>
<em>Attributes</em></th></tr><tr bgcolor="#E3E3F3">
<th>NAME</th>
<td>Billingsly</td>
</tr><tr bgcolor="#F00000">
</tr></table></center>]]>
</description>
Thank you, Paul
You can use this regex in vscode find/replace:
\n?<description>[\S\s\n]*?<\/description>\n?
and replace with nothing. The \n? at the beginning and end are there in case you also want to delete the line breaks around the tags; try it out, and remove them if you don't mind empty lines where the deleted content used to be.
Obviously, if you have malformed input, like unmatched <description> or </description> tags, the regex won't work.
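If you would rather use a tool that understands the XML structure instead of a regex, an XSLT identity transform that drops every description element should do the same job. This is just a sketch (the stylesheet file name is made up), not tied to VSCode:
<?xml version="1.0" encoding="UTF-8"?>
<!-- strip-description.xsl: copy everything, but drop <description> elements -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy nodes and attributes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- empty template: description elements (whatever their namespace) produce no output -->
  <xsl:template match="*[local-name()='description']"/>
</xsl:stylesheet>
Run it with any XSLT processor, e.g. xsltproc strip-description.xsl input.kml > output.kml.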

Apache FOP insert special character [duplicate]

I am maintaining a program which uses Apache FOP for printing PDF documents. There have been a couple of complaints about Chinese characters coming up as "####". I have found an existing thread out there about this problem and done some research on my side.
http://apache-fop.1065347.n5.nabble.com/Chinese-Fonts-td10789.html
I do have the uming.ttf font installed on my system. Unlike the person in that thread, I am still getting the "####".
From this point forward, has anyone seen a workaround that would allow you to print complex characters in a PDF document using Apache FOP?
Three steps must be taken for Chinese characters to show up correctly in a PDF file created with FOP (this is also true for any character not available in the default font, and more generally whenever a non-default font is needed).
Let us use this simple FO example to show the warnings produced by FOP when something is wrong:
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="one">
      <fo:region-body />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="one">
    <fo:flow flow-name="xsl-region-body">
      <!-- a block of chinese text -->
      <fo:block>博洛尼亚大学中国学生的毕业论文</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Processing this input, FOP gives several warnings similar to this one:
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Helvetica".
...
Without any explicit font-family indication in the FO file, FOP defaults to using Helvetica, which is one of the Base-14 fonts (fonts that are available everywhere, so there is no need to embed them).
Each font supports a set of characters, assigning a visible glyph to each of them; when a font does not support a character, the warning above is produced, and the PDF shows "#" instead of the missing glyph.
Step 1: set font-family in the FO file
If the default font doesn't support the characters of our text (or we simply want to use a different font), we must use the font-family property to state the desired one.
The value of font-family is inherited, so if we want to use the same font for the whole document we can set the property on the fo:page-sequence; if we need a special font just for some paragraphs or words, we can set font-family on the relevant fo:block or fo:inline.
So, our input becomes (using, as an example, a font I have installed):
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="one">
      <fo:region-body />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="one">
    <fo:flow flow-name="xsl-region-body">
      <!-- a block of chinese text -->
      <fo:block font-family="SimSun">博洛尼亚大学中国学生的毕业论文</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
But now we get a new warning, in addition to the old ones!
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Font "SimSun,normal,400" not found. Substituting with "any,normal,400".
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Times-Roman".
...
FOP doesn't know how to map "SimSun" to a font file, so it defaults to a generic Base-14 font (Times-Roman), which does not support our Chinese characters, and the PDF still shows "#".
Step 2: configure font mapping in FOP's configuration file
Inside FOP's folder, the file conf/fop.xconf is an example configuration; we can directly edit it or make a copy to start from.
The configuration file is an XML file, and we have to add the font mappings inside /fop/renderers/renderer[@mime = 'application/pdf']/fonts/ (there is a renderer section for each possible output MIME type, so check you are inserting your mapping into the right one):
<?xml version="1.0"?>
<fop version="1.0">
  ...
  <renderers>
    <renderer mime="application/pdf">
      ...
      <fonts>
        <!-- specific font mapping -->
        <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
          <font-triplet name="SimSun" style="normal" weight="normal"/>
        </font>
        <!-- "bulk" font mapping -->
        <directory>/Users/furini/Library/Fonts</directory>
      </fonts>
      ...
    </renderer>
    ...
  </renderers>
</fop>
Each font element points to a font file; each font-triplet entry identifies a combination of font-family + font-style (normal, italic, ...) + font-weight (normal, bold, ...) mapped to the font file in the parent font element. Using directory elements it is also possible to automatically configure all the font files inside the indicated folders (but this takes some time if the folders contain a lot of fonts).
If we have a complete file set with specific versions of the desired font (normal, italic, bold, light, bold italic, ...) we can map each file to the precise font triplet, thus producing a very sophisticated PDF.
On the opposite end of the spectrum we can map all the triplets to the same font file, if that is all we have available: in the output all text will appear the same, even if parts of it were marked as italic or bold in the FO file.
Note that we don't need to register all possible font triplets; if one is missing, FOP will use the font registered for a "similar" one (for example, if we don't map the triplet "SimSun,italic,400" FOP will use the font mapped to "SimSun,normal,400", warning us about the font substitution).
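To illustrate the per-triplet mapping described above, the fonts section could look roughly like this; the bold and italic file names are invented here, just to show the pattern:
<fonts>
  <!-- regular -->
  <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
    <font-triplet name="SimSun" style="normal" weight="normal"/>
  </font>
  <!-- bold variant (hypothetical separate file) -->
  <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun-Bold.ttf" embedding-mode="subset">
    <font-triplet name="SimSun" style="normal" weight="bold"/>
  </font>
  <!-- italic variant (hypothetical separate file) -->
  <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun-Italic.ttf" embedding-mode="subset">
    <font-triplet name="SimSun" style="italic" weight="normal"/>
  </font>
</fonts>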
We are not done yet, as without the next and last step nothing changes when we process our input file.
Step 3: tell FOP to use the configuration file
If we are calling FOP from the command line, we use the -c option to point to our configuration file, for example:
$ fop -c /path/to/our/fop.xconf input.fo input.pdf
From Java code we can use (see also FOP's site):
fopFactory.setUserConfig(new File("/path/to/our/fop.xconf"));
Now, at last, the PDF should correctly use the desired fonts and appear as expected.
If instead FOP terminates abruptly with an error like this:
org.apache.fop.cli.Main startFOP
SEVERE: Exception org.apache.fop.apps.FOPException: Failed to resolve font with embed-url '/Users/furini/Library/Fonts/doesNotExist.ttf'
it means that FOP could not find the font file, and the font configuration needs to be checked again; typical causes are a typo in the font URL or insufficient privileges to access the font file.

In Spring MVC form-tag: Escape values for XML, not for XHTML

I use the Spring form taglib to generate HTML forms within my XHTML page, which is delivered with Content-Type: application/xhtml+xml;charset=UTF-8.
By default the taglib escapes characters for HTML, so it escapes e.g. the German umlaut ü to &uuml;, which is OK for HTML, but not for XML - it causes an unknown-entity error on the client.
Of course I still want the XML characters (like <) to be escaped, but not perfectly valid UTF-8 characters. The taglib does have an option escapeHTML which I can set to false (even globally in web.xml), but then the XML entities are not escaped anymore.
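For reference, the global switch I mean is Spring's defaultHtmlEscape context parameter in web.xml, something like:
<context-param>
  <!-- turns HTML escaping off for all Spring form tags -->
  <param-name>defaultHtmlEscape</param-name>
  <param-value>false</param-value>
</context-param>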
Surprisingly Google did not turn up anything useful here. It can't be that much of an uncommon problem, can it?
Read the source, it helps!
The escape symbols are loaded from the classpath, from the file HtmlCharacterEntityReferences.properties in the package org.springframework.web.util.
Create a file with the same name, in the same package, in a classpath location with a higher priority than spring-web.jar, and with the following content:
160 = #160
34 = quot
38 = amp
39 = #39
60 = lt
62 = gt
And you'll be good.
It still feels a little hackish... I could not find any documentation about this and if it is not a documented feature it might easily be changed in a future version. Maybe someone has a better solution...
Every XHTML document served up with application/xhtml+xml should have an XHTML DOCTYPE declaration.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
(or any other valid XHTML DOCTYPE.)
The DTD in the declaration includes all HTML entity names, so you can use all of HTML's named references if you want.
That said, I think it's strange that Spring escapes things like ü. That shouldn't be necessary if the charset is UTF-8.

How can I spell check for multiple languages in emacs?

I write most of my documentation in HTML, using Emacs as my main editor. Emacs lets you interactively spell-check the current buffer with the command ispell-buffer.
Since I switch between a number of languages, I have an HTML comment at the end of each file specifying the main dictionary and personal dictionary for that file. E.g. for Norwegian (norsk) I use the following pair of declarations:
<!-- Local IspellDict: norsk -->
<!-- Local IspellPersDict: ~/.aspell/personal.dict -->
This works great.
However, sometimes I have a paragraph in another language (e.g. English) embedded in an otherwise Norwegian document. Example:
<p xml:lang="en">This paragraph is in English.</p>
The spell-checker naturally flags all the words in such a paragraph as misspellings (since the dictionary only contains Norwegian words).
To avoid this, I've tried to add a "british" dictionary to the document, like this:
<!-- Local IspellDict: british -->
<!-- Local IspellDict: norsk -->
<!-- Local IspellPersDict: ~/.aspell/personal.dict -->
Unfortunately, this does not work. The "british" dictionary is simply ignored.
My preferred solution would be to load an additional dictionary and use it, together with the primary dictionary, for spell-checking. Is this possible?
However, I am also interested in a solution that lets me mark paragraphs as exempt from spell-checking. It is not ideal, but it would stop valid English words from being flagged as misspellings.
PS: I have also looked at the answer to this question: Multilingual spell checking with language detection, but it is much broader and does not address the specific use of Emacs' ispell for doing the spell-check.
Try ispell-multi and flyspell-xml-lang: http://www.dur.ac.uk/p.j.heslin/Software/Emacs/
You can spawn multiple instances of ispell, and use the xml:lang attribute to decide which language to check for.

DocBook macros?

Is there any way of defining macros (like TeX macros or LaTeX defines) in DocBook documents?
DocBook is very verbose, and macros would help a lot. I didn't find them in the quick-start tutorials.
If so, could anyone provide a simple example or a link to one?
Thanks
Not sure if this is exactly what you want or if it fulfills your requirements, but I'm thinking of ENTITYs. You can define them at the top of your XML document (so this is general XML, nothing DocBook-specific), as seen here for 'doc.release.number' and 'doc.release.date'. They can also be included from a separate file, as seen in the third ENTITY row: here SYSTEM means the entities come from another file, 'entities.ent'.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
  <!ENTITY doc.release.number "1.0.0.beta-1" >
  <!ENTITY doc.release.date "April 2010" >
  <!ENTITY % entities SYSTEM "entities.ent" >
  %entities;
]>
<!-- This document is based on http://readyset.tigris.org/nonav/templates/userguide.html -->
<article lang="en">
  <articleinfo>
    <title>&project.impl.title; - User Manual</title>
    <subtitle></subtitle>
    <date>&project.impl.release.date;</date>
    <copyright>
      <year>&doc.release.year;</year>
      <holder>Team - &project.impl.title;</holder>
    </copyright>
    <releaseinfo>&doc.release.number;</releaseinfo>
  </articleinfo>
  <section>
    <title>Introduction</title>
    <para>
      The &project.impl.title; has been created to clean up (X)HTML and XML documents as part of
    </para>
  </section>
</article>
In the document you reference the entities with a leading & and a trailing ;, as in &project.impl.title;.
In the file 'entities.ent' you specify the ENTITY declarations in a similar way:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY project.impl.title 'Maven Tidy Plug-in' >
<!ENTITY project.impl.group-id 'net.sourceforge.docbook-utils.maven-plugin' >
<!ENTITY project.impl.artifact-id 'maven-tidy-plugin' >
<!ENTITY project.impl.release.number '1.0.0.beta-1' >
<!ENTITY project.impl.release.date 'April 2010' >
<!ENTITY project.impl.release.year '2010' >
<!ENTITY project.impl.url '../' >
<!ENTITY project.spec.title '' >
<!ENTITY project.spec.release.number '' >
<!ENTITY project.spec.release.date '' >
<!ENTITY doc.release.year '2010' >
Not exactly what you asked for, but perhaps helpful in some of your cases: you can define templates in your wrapper stylesheet that output the desired fo commands. Some examples:
Code:
<xsl:template match="symbolchar">
  <fo:inline font-family="Symbol">
    <xsl:choose>
      <xsl:when test=".='ge'">≥</xsl:when>
      <xsl:when test=".='le'">≤</xsl:when>
      <xsl:when test=".='sqrt'">√</xsl:when>
      <xsl:otherwise>?!?</xsl:otherwise>
    </xsl:choose>
  </fo:inline>
</xsl:template>
Usage:
<symbolchar>le</symbolchar>
Code:
<xsl:template match="processing-instruction('linebreak')">
  <fo:block/>
</xsl:template>
Usage:
<?linebreak?>
Have you considered generating DocBook from another format (like reStructuredText)?
I found it quite nice for documentation.
Also, you could probably write a macro preprocessor (or look into m4) pretty quickly. If you are using the XML version of DocBook, a simple XSLT will do. Just make up some tags and transform them. Have boilerplate stuff added automatically. And get ready to be really angry at XSLT. For not being all it could be. For making your thinking warp.
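To make the "simple XSLT" suggestion concrete, here is a rough sketch: an identity transform plus one template that expands a made-up macro element (release-note, invented for this example) into boilerplate DocBook before the document goes through the normal DocBook toolchain:
<?xml version="1.0" encoding="UTF-8"?>
<!-- expand-macros.xsl: a minimal "macro" preprocessor for DocBook XML -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- the "macro": <release-note version="1.0"/> expands to a full DocBook note -->
  <xsl:template match="release-note">
    <note>
      <title>Release <xsl:value-of select="@version"/></title>
      <para>This section applies to release <xsl:value-of select="@version"/>.</para>
    </note>
  </xsl:template>
</xsl:stylesheet>
Run it with any XSLT processor (e.g. xsltproc expand-macros.xsl manual.xml > manual-expanded.xml) and feed the result to the normal DocBook stylesheets.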