I have a program that is essentially a search application, and it exists in both VBScript and Perl form (I'm trying to make an executable with a GUI).
Currently the search application outputs an HTML file, and if a section of text in the HTML is longer than twelve lines then it hides anything after that and includes a clickable More... tag.
This is done in XSLT and works with VBScript.
I literally copied and pasted the stylesheet into the Perl program that I'm using and it does everything right except for the More... tag.
Is there any reason why it would be working with the VBScript but not Perl?
I'm using XML::LibXSLT in the Perl script, and here is the template that is supposed to be creating the More... tag:
<xsl:template name="more">
<xsl:param name="text"/>
<xsl:param name="row-id"/>
<xsl:param name="cycle" select="1"/>
<xsl:choose>
<xsl:when test="($cycle > 12) and contains($text,'
')">
<span class="show" onclick="showID('SHOW{$row-id}');style.display = 'none';">More...</span>
<span class="hidden" id="SHOW{$row-id}">
<xsl:call-template name="highlight">
<xsl:with-param name="text" select="$text"/>
</xsl:call-template>
</span>
</xsl:when>
<xsl:when test="contains($text,'
')">
<xsl:call-template name="highlight">
<xsl:with-param name="text" select="substring-before($text,'
')"/>
</xsl:call-template>
<xsl:text>
</xsl:text>
<xsl:call-template name="more">
<xsl:with-param name="text" select="substring-after($text,'
')"/>
<xsl:with-param name="row-id" select="$row-id"/>
<xsl:with-param name="cycle" select="$cycle + 1"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="highlight">
<xsl:with-param name="text" select="$text"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
I believe the problem is that your XSLT is searching the text for CR characters (&#13;).
Windows files use the CR LF character pair to terminate each line of text, and Perl running on a Windows system will, by default, strip the CR when it reads a file in text mode (its :crlf layer), leaving just the LF.
This provides an API compatible with Linux-like systems, which use just LF in the first place, but means your XSLT stylesheet doesn't find any CR characters when it looks for them.
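Perl's :crlf layer can't be shown portably here, but Python's universal-newlines text mode performs the same CR LF to LF translation, so this sketch (Python, not Perl) illustrates the effect: the bytes on disk contain CR, while the text the program sees does not.

```python
import os
import tempfile

# Write a small file with Windows-style CR LF line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write(b"first line\r\nsecond line\r\n")
    path = f.name

try:
    # Binary mode: the CR bytes are still present.
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode with universal newlines (newline=None, the default):
    # every CR LF pair is translated to a single LF on read.
    with open(path, "r", newline=None) as f:
        text = f.read()
finally:
    os.remove(path)

print(raw)           # b'first line\r\nsecond line\r\n'
print(repr(text))    # 'first line\nsecond line\n'
print("\r" in text)  # False: a search for CR finds nothing
```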
I suggest you change your stylesheet to search instead for LF characters, which will be present at the end of every line regardless of the file's origin, and will be seen by both Perl and VBScript.
I think character codes are best expressed in hex, so you would change '&#13;' to '&#xA;' throughout your XSLT code.
Note
By making this change, your strings will be left with a trailing CR after you use substring-before($text, '&#xA;'). You can either leave it in place (I don't think it will do any harm, as it won't be rendered by a browser) or you can remove the CRs from the string that you pass to the more template when you call it:
<xsl:call-template name="more">
<xsl:with-parameter name="text" value="translate($text, '
', '')"/>
...
</xsl:call-template>
That would leave the template with a clean string to process, containing no CR characters.
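To make the CR/LF point concrete, the logic can be modelled in Python. split_for_more is a hypothetical helper that follows the three branches of the more template's xsl:choose (hide the rest once past 12 lines, otherwise output one line and recurse, otherwise output the remainder), with the separator as a parameter so you can compare searching for CR and for LF. It is a sketch of the logic, not a replacement for the stylesheet.

```python
def split_for_more(text, sep, limit=12):
    """Model of the recursive 'more' template (hypothetical helper, not XSLT).

    Returns (shown_lines, hidden_text); hidden_text is None when the
    template would never emit a More... link.
    """
    shown = []
    cycle = 1
    while True:
        if cycle > limit and sep in text:
            # First xsl:when: everything left goes into the hidden span.
            return shown, text
        if sep in text:
            # Second xsl:when: output the line before sep, recurse on the rest.
            line, text = text.split(sep, 1)
            shown.append(line)
            cycle += 1
        else:
            # xsl:otherwise: output whatever is left.
            shown.append(text)
            return shown, None


# 20 lines as Perl on Windows hands them over: LF only, no CR.
lf_text = "\n".join(f"line {i}" for i in range(1, 21))

cr_shown, cr_hidden = split_for_more(lf_text, "\r")  # searching for &#13;
print(len(cr_shown), cr_hidden)  # 1 None  (no More... link)

lf_shown, lf_hidden = split_for_more(lf_text, "\n")  # searching for &#xA;
print(len(lf_shown), lf_hidden.split("\n")[0])  # 12 line 13

# CR LF input split on LF keeps a trailing CR; stripping CRs first removes it.
crlf_text = lf_text.replace("\n", "\r\n")
print(repr(split_for_more(crlf_text, "\n")[0][0]))  # 'line 1\r'
print(repr(split_for_more(crlf_text.replace("\r", ""), "\n")[0][0]))  # 'line 1'
```

Searching for CR in LF-only text never finds a separator, so the whole text comes out as one chunk with no More... link, which matches the symptom in the question.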
I need to create i18n properties files for non-Latin languages (Simplified Chinese, Japanese Kanji, etc.). In the Swing portion of our product, we use Java properties files with the raw UTF-8 characters in them, which NetBeans automatically converts to ISO 8859-1 for us, and that works fine. With JavaFX, this strategy isn't working. Our strategy matches this answer precisely, but it doesn't seem to work in this case.
In my investigation into the problem, I discovered this old article indicating that I need to use native2ascii to convert the characters in the properties file; that still doesn't work.
In order to eliminate as many variables as possible, I created a sample FXML project to illustrate the problem. There are three internationalized labels in Japanese Kanji. The first label has the text in the FXML document. The second loads the raw unescaped character from the properties file. The third loads the escaped Unicode (matching native2ascii output).
jp_test.properties
btn.one=閉じる
btn.two=\u00e9\u2013\u2030\u00e3\ufffd\u02dc\u00e3\u201a\u2039
jp_test.fxml
<?xml version="1.0" encoding="UTF-8"?>
<?import java.lang.*?>
<?import java.util.*?>
<?import javafx.scene.control.*?>
<?import javafx.scene.layout.*?>
<?import javafx.scene.paint.*?>
<?scenebuilder-preview-i18n-resource jp_test.properties?>
<AnchorPane id="AnchorPane" maxHeight="-Infinity" maxWidth="-Infinity" minHeight="-Infinity" minWidth="-Infinity" prefHeight="147.0" prefWidth="306.0" xmlns:fx="http://javafx.com/fxml">
<children>
<Label layoutX="36.0" layoutY="33.0" text="閉じる" />
<Label layoutX="36.0" layoutY="65.0" text="%btn.one" />
<Label layoutX="36.0" layoutY="97.0" text="%btn.two" />
<Label layoutX="132.0" layoutY="33.0" text="Static Label" textFill="RED" />
<Label layoutX="132.0" layoutY="65.0" text="Properties File Unescaped" textFill="RED" />
<Label layoutX="132.0" layoutY="97.0" text="Properties File Escaped" textFill="RED" />
</children>
</AnchorPane>
Result
As you can see, the third label is not rendered correctly.
Environment:
Java 7 u21, u27, u45, u51, 32-bit and 64-bit. (JavaFX 2.2.3-2.2.45)
Windows 7 Enterprise, Professional 64-bit.
UPDATE
I've verified that the properties file is ISO 8859-1.
Most IDEs (NetBeans at least) handle properties files with Unicode escapes by default. If you create the properties file in NetBeans and type Japanese text into it, the text is automatically stored as \uXXXX escapes. To see this, open the properties file in Notepad(++): you will see that the Japanese characters are escaped.
The Unicode-escaped equivalent of "閉じる" is "\u9589\u3058\u308b", whereas "\u00e9\u2013\u2030\u00e3\ufffd\u02dc\u00e3\u201a\u2039" decodes to "é–‰ã�˜ã‚‹". So the program output in the picture is correct. Additionally, if you reopen jp_test.properties in NetBeans, the escaped text is displayed decoded.
EDIT: in response to the comments:
Why does it do this?
It may be because you are omitting the -encoding parameter of native2ascii, in which case it falls back to your system's default charset, which may not be UTF-8. That could explain the output.
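The mismatch can be reproduced in Python, assuming the file was saved as UTF-8 and native2ascii read it using a windows-1252 default charset. That charset is an assumption, but it matches the escapes in the question exactly. escape_like_native2ascii is a hypothetical helper that mimics native2ascii's \uXXXX output; it is not part of the JDK.

```python
def escape_like_native2ascii(text):
    """Hypothetical helper: replace each non-ASCII char with a Java-style \\uXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            # Java strings are UTF-16, so code points above U+FFFF become a surrogate pair.
            cp -= 0x10000
            out.append(f"\\u{0xD800 + (cp >> 10):04x}\\u{0xDC00 + (cp & 0x3FF):04x}")
    return "".join(out)


label = "閉じる"

# Correct: the characters themselves are escaped.
print(escape_like_native2ascii(label))  # \u9589\u3058\u308b

# Wrong charset: the UTF-8 bytes are decoded as windows-1252 first
# (byte 0x81 is undefined in cp1252, hence U+FFFD), then escaped.
misread = label.encode("utf-8").decode("cp1252", errors="replace")
print(escape_like_native2ascii(misread))
# \u00e9\u2013\u2030\u00e3\ufffd\u02dc\u00e3\u201a\u2039
```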
Also, why is it that Java and Swing have no problems with our properties files as they are, but FXML can't handle it?
That shouldn't be the case, because FXML is loaded by Java too. The only difference may be whether the system charset is used or the charset is overridden somewhere in the configuration.
Anyway, I suggest passing native2ascii the -encoding parameter that matches the input files' encoding. More specifically, convert the properties files to UTF-8 first and then run native2ascii on them. If you are using NetBeans as your IDE, there is no need for native2ascii.
Properties files should be ISO 8859-1 encoded, not UTF-8.
Characters can be escaped using \uXXXX.
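As a rough illustration of that format, Python's unicode_escape codec reads plain bytes as ISO 8859-1 and expands \uXXXX escapes, which approximates how java.util.Properties.load(InputStream) reads a file. It is only an approximation: it ignores Properties-specific rules such as line continuations, ':' separators and comments, and the btn.cafe key is made up for illustration.

```python
# Bytes as they would sit on disk: ISO 8859-1 text plus \uXXXX escapes.
# (btn.cafe is a hypothetical key used only to show a Latin-1 byte.)
raw = b"btn.one=\\u9589\\u3058\\u308b\nbtn.cafe=caf\xe9\n"

# unicode_escape: unescaped bytes are read as Latin-1, \uXXXX is expanded.
text = raw.decode("unicode_escape")

# Naive key=value split; real .properties parsing has more rules.
props = dict(line.split("=", 1) for line in text.splitlines())
print(props["btn.one"])   # 閉じる
print(props["btn.cafe"])  # café
```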
Tools such as NetBeans do this by default, AFAIK.
http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html
http://docs.oracle.com/javase/tutorial/i18n/text/convertintro.html
I want to use mPDF to create my PDF files because I use Norwegian letters such as ÆØÅ. The information in the PDF would mostly consist of text written by the user in an HTML form. But I have a problem.
When using this code:
$mpdf->WriteHTML('Text with ÆØÅ');
The PDF will show the special characters.
But when using this:
<?php
include('mpdf/mpdf.php');
$name = 'Name - <b>' . $_POST['name'] . '</b>';
$mpdf = new mPDF();
$mpdf->WriteHTML($name);
$mpdf->Output();
exit;
?>
The special characters will not show.
The HTML form looks like this:
<form action="hidden.php" method="POST">
<p>Name:</p>
<input type="text" name="name">
<input type="submit" value="Send"><input type="reset" value="Clear">
</form>
Why won't the special characters show with this method? And which method should I use?
Since echoing the POST data back onto the website doesn't show the characters either, this clearly isn't an issue with mPDF. When handling content that includes non-ASCII characters, you have to take special care with the website's character encoding.
The mPDF documentation shows that it supports UTF-8, so you might want to use that for your data. POST data is received in the same encoding that the website uses. So if the website is in Latin-1, you will need to call utf8_encode() to convert the POST data to UTF-8. If the website already uses UTF-8, you should be fine.
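The conversion that utf8_encode() performs can be sketched in Python (not PHP) to show what goes wrong at the byte level, assuming the form page is served as Latin-1 and the user types ÆØÅ.

```python
name = "ÆØÅ"

# What a Latin-1 (ISO 8859-1) page posts: one byte per letter.
latin1_bytes = name.encode("latin-1")
print(latin1_bytes)  # b'\xc6\xd8\xc5'

# Reading those bytes as UTF-8, which mPDF expects, does not work:
# each byte is an invalid sequence and becomes U+FFFD.
broken = latin1_bytes.decode("utf-8", errors="replace")
print(len(broken), broken == "\ufffd" * 3)  # 3 True

# Latin-1 to UTF-8 conversion, the same job PHP's utf8_encode() does.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(utf8_bytes)  # b'\xc3\x86\xc3\x98\xc3\x85'
```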
If you don't set a specific encoding in the website header (which you should always do, to avoid this kind of trouble), the encoding may depend on several factors, such as the server's operating system and configuration, or the encoding of the original PHP source file, which in turn depends on your own OS configuration and choice of editor.
I'm creating the German version of a CHM help file. My problem is that umlauts are not displayed in the Table of Contents. I assume it is because of the code page. The .hhc file is ANSI. Converting it to Unicode doesn't help: it displays different, but still wrong, characters.
The file "Table of Contents.hhc" starts with
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD><BODY>
<OBJECT type="text/site properties">
<param name="ImageType" value="Folder">
</OBJECT>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="ÜÜÜÜÜÜÜÜÜÜÜÜÜÜ Uberblick">
<param name="Local" value="overview.htm">
<param name="URL" value="overview.htm">
</OBJECT>
</UL>
</BODY></HTML>
Make sure the "Language" setting in the "Options" section of the project file supports the character you want. Since you are on a Russian system, the default is probably Russian. Change it to German, for instance.
The engine that renders the CHM is Unicode; only the compiler is ANSI.
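The byte-level effect of a code-page mismatch can be sketched in Python under these assumptions: the .hhc file is saved in the German ANSI code page (Windows-1252), and the ANSI compiler reads it using the Russian ANSI code page (Windows-1251), as could happen if the project's Language setting is left at a Russian default. The same byte then means a different character.

```python
title = "Überblick"

# Saved as ANSI on a German system: code page 1252, where Ü is byte 0xDC.
ansi_bytes = title.encode("cp1252")
print(ansi_bytes)  # b'\xdcberblick'

# Read with the Russian ANSI code page 1251, byte 0xDC is a Cyrillic letter.
misread = ansi_bytes.decode("cp1251")
print(misread)  # Ьberblick

# Reading with the matching code page round-trips correctly.
print(ansi_bytes.decode("cp1252"))  # Überblick
```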
Try escaping them (http://www.w3schools.com/tags/ref_entities.asp), or specifying the charset encoding (http://www.w3.org/TR/html4/charset.html#h-5.2.2).
Actually, there's no point in using UTF-8 for CHM files, because CHM doesn't support UTF-8 or Unicode. CHM is an ancient format that Microsoft has not really changed since Windows 98, and it has a number of quirks and restrictions like this one.
For more detail, read:
https://helpman.it-authoring.com/viewtopic.php?t=9294
https://blogs.msdn.microsoft.com/sandcastle/2007/09/29/chm-localization-and-unicode-issues-dbcsfix-exe/
Is there any way of defining macros (like the ones you can define in TeX or LaTeX) in DocBook documents?
DocBook is very verbose, and macros would help a lot. I didn't find them in the quickstart tutorials.
If so, could anyone provide a simple example or a link to one?
Thanks
I'm not sure if this is exactly what you want or if it fulfils your requirements, but I'm thinking of ENTITYs. You can define them at the top of your XML document (this is general XML, nothing DocBook-specific), as seen here for 'doc.release.number' and 'doc.release.date'. They can also be included from a separate file, as seen in the third ENTITY line: SYSTEM means the declarations come from another file, 'entities.ent', and the following %entities; line pulls them in.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY doc.release.number "1.0.0.beta-1" >
<!ENTITY doc.release.date "April 2010" >
<!ENTITY % entities SYSTEM "entities.ent" >
%entities;
]>
<!-- This document is based on http://readyset.tigris.org/nonav/templates/userguide.html -->
<article lang="en">
<articleinfo>
<title>&project.impl.title; - User Manual</title>
<subtitle></subtitle>
<date>&project.impl.release.date;</date>
<copyright>
<year>&doc.release.year;</year>
<holder>Team - &project.impl.title;</holder>
</copyright>
<releaseinfo>&doc.release.number;</releaseinfo>
</articleinfo>
<section>
<title>Introduction</title>
<para>
The &project.impl.title; has been created to clean up (X)HTML and XML documents as part of
</para>
</section>
</article>
In the document, you reference an entity with a leading & and a trailing ;, as in &project.impl.title;.
In the file 'entities.ent', you declare the entities in a similar way:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY project.impl.title 'Maven Tidy Plug-in' >
<!ENTITY project.impl.group-id 'net.sourceforge.docbook-utils.maven-plugin' >
<!ENTITY project.impl.artifact-id 'maven-tidy-plugin' >
<!ENTITY project.impl.release.number '1.0.0.beta-1' >
<!ENTITY project.impl.release.date 'April 2010' >
<!ENTITY project.impl.release.year '2010' >
<!ENTITY project.impl.url '../' >
<!ENTITY project.spec.title '' >
<!ENTITY project.spec.release.number '' >
<!ENTITY project.spec.release.date '' >
<!ENTITY doc.release.year '2010' >
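To check that entity declarations expand as expected without running the full DocBook toolchain, Python's xml.etree.ElementTree expands general entities declared in the internal DTD subset. This sketch inlines two of the declarations above; it does not load the external entities.ent file or the DocBook DTD.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY doc.release.number "1.0.0.beta-1">
  <!ENTITY project.impl.title "Maven Tidy Plug-in">
]>
<article>
  <articleinfo>
    <title>&project.impl.title; - User Manual</title>
    <releaseinfo>&doc.release.number;</releaseinfo>
  </articleinfo>
</article>"""

root = ET.fromstring(DOC)
print(root.findtext("articleinfo/title"))        # Maven Tidy Plug-in - User Manual
print(root.findtext("articleinfo/releaseinfo"))  # 1.0.0.beta-1
```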
Not exactly what you asked for, but perhaps helpful for some of your cases: you can define templates in your wrapper (customization) stylesheet that emit XSL-FO markup. Some examples:
Code:
<xsl:template match="symbolchar">
<fo:inline font-family="Symbol">
<xsl:choose>
<xsl:when test=".='ge'">≥</xsl:when>
<xsl:when test=".='le'">≤</xsl:when>
<xsl:when test=".='sqrt'">√</xsl:when>
<xsl:otherwise>?!?</xsl:otherwise>
</xsl:choose>
</fo:inline>
</xsl:template>
Usage:
<symbolchar>le</symbolchar>
Code:
<xsl:template match="processing-instruction('linebreak')">
<fo:block/>
</xsl:template>
Usage:
<?linebreak?>
Have you considered generating DocBook from another format (like reStructuredText)?
I found it quite nice for documentation.
Also, you could probably write a macro preprocessor (or look into m4) pretty quickly. If you are using the XML version of DocBook, a simple XSLT will do. Just make up some tags and transform them. Have boilerplate stuff added automatically. And get ready to be really angry at XSLT. For not being all it could be. For making your thinking warp.
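A minimal sketch of such a preprocessor, written in Python with xml.etree.ElementTree rather than XSLT: it assumes a made-up macro element, <keys combo="Ctrl+C"/> (hypothetical, not DocBook), and expands it into real DocBook keycombo/keycap markup before the document goes to the DocBook stylesheets.

```python
import xml.etree.ElementTree as ET


def expand_keys(elem):
    """Hypothetical macro: <keys combo="Ctrl+C"/> -> DocBook keycombo/keycap markup."""
    combo = ET.Element("keycombo")
    for key in elem.get("combo", "").split("+"):
        cap = ET.SubElement(combo, "keycap")
        cap.text = key
    return combo


# Map made-up macro tag names to functions returning their replacement element.
MACROS = {"keys": expand_keys}


def expand_macros(root):
    # Snapshot the tree first, because children are replaced while walking it.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            handler = MACROS.get(child.tag)
            if handler is None:
                continue
            replacement = handler(child)
            replacement.tail = child.tail  # keep the text that followed the macro
            parent.remove(child)
            parent.insert(i, replacement)
    return root


source = '<para>Press <keys combo="Ctrl+C"/> to copy.</para>'
result = ET.tostring(expand_macros(ET.fromstring(source)), encoding="unicode")
print(result)
# <para>Press <keycombo><keycap>Ctrl</keycap><keycap>C</keycap></keycombo> to copy.</para>
```

Adding a new macro means adding one function and one MACROS entry; nested macros inside a replacement are not expanded in this sketch.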
I have an XML document that I want to load on the iPhone. Do I need to convert it to a plist first? If so, how?
The XML document looks like this (for one chapter):
<toolTipsBook>
− <chapter index="1" name="Chapter Name">
<line index="1" text="line text here"/>
<line index="2" text=" line text here "/>
<line index="3" text=" line text here "/>
<line index="4" text=" line text here "/>
<line index="5" text=" line text here "/>
<line index="6" text=" line text here "/>
<line index="7" text=" line text here "/>
</chapter>
How can I tell Xcode to display chapter 1, line 1, and then leave space directly under it for my comment on chapter 1, line 1 (which comes from a separate XML document)?
The idea is that I'll have this control for all the chapters in the data I'm loading.
If you have a little time, I'd really appreciate some sample code showing what you mean.
Thanks guys,
You can add the XML file into resources of your application.
You do that by dragging it into the Resources folder in Xcode and, in the popup, choosing to copy the file into the project.
When your application runs, you open the file by referring to it by its name, read in the XML, parse it with NSXMLParser and extract the data you need. There's no need to convert it to a plist.
This assumes that the XML data is static and you don't intend to update and save it.
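NSXMLParser is event-driven: its delegate gets a callback for each start tag together with its attributes. The same idea is sketched below in Python with xml.sax (an analogue, not iOS code). The handler collects line texts per chapter, and a hypothetical comments mapping, standing in for the separate comments document, is interleaved so that each line is followed by its comment slot.

```python
import xml.sax


class ToolTipsHandler(xml.sax.ContentHandler):
    """Collects {chapter index: {line index: text}} from a toolTipsBook document."""

    def __init__(self):
        super().__init__()
        self.chapters = {}
        self._lines = None  # line dict of the chapter currently open

    def startElement(self, name, attrs):
        if name == "chapter":
            self._lines = self.chapters.setdefault(int(attrs["index"]), {})
        elif name == "line" and self._lines is not None:
            self._lines[int(attrs["index"])] = attrs["text"].strip()

    def endElement(self, name):
        if name == "chapter":
            self._lines = None


SAMPLE = b"""<toolTipsBook>
  <chapter index="1" name="Chapter Name">
    <line index="1" text="line text here"/>
    <line index="2" text=" line text here "/>
  </chapter>
</toolTipsBook>"""

handler = ToolTipsHandler()
xml.sax.parseString(SAMPLE, handler)

# Hypothetical comments keyed by (chapter, line), e.g. loaded from the second XML file.
comments = {(1, 1): "My comment on chapter 1, line 1"}

rows = []
for chapter, lines in sorted(handler.chapters.items()):
    for index, text in sorted(lines.items()):
        rows.append(text)
        rows.append(comments.get((chapter, index), ""))  # blank slot if no comment

print(rows)
```

In an app, each pair in rows would map naturally onto a table cell (line text) followed by a comment cell.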
Have you looked at NSXMLParser?
Its purpose is to parse the XML so that you can build a data structure from it.
Then you display that data structure using whatever user interfaces you feel are appropriate (such as a UITableView).