I need to use named entities for special characters, but I am unable to find any for the two characters U+1E7C (Ṽ) and U+1E7D (ṽ). I searched the entity lists available online and could not find them anywhere. Kindly help.
I'm not sure whether there are named entities for those characters. You can define your own, though, or just use either the hex (&#x1E7C; and &#x1E7D;) or decimal (&#7804; and &#7805;) numeric character references.
Example of creating your own named entities:
<!ENTITY Vtilde "&#x1E7C;">
<!ENTITY vtilde "&#x1E7D;">
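Example usage (note that the internal DTD subset is only read by XML parsers, so this works when the document is processed as XML/XHTML; browsers parsing it as text/html ignore such declarations):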
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [
<!ENTITY Vtilde "&#x1E7C;">
<!ENTITY vtilde "&#x1E7D;">
]>
<html>
<head>
<title></title>
</head>
<body>
<p>Here is the uppercase V with a tilde char: "&Vtilde;".</p>
<p>Here is the lowercase v with a tilde char: "&vtilde;".</p>
</body>
</html>
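If you need the numeric references for other characters that have no named entity, they can be derived from the code point. The following is only a minimal JavaScript sketch; the helper name numericReferences is made up for this illustration and is not part of any library.

```javascript
// Hypothetical helper (name invented for this example): build the hex and
// decimal numeric character references for the first character of a string.
function numericReferences(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    hex: "&#x" + codePoint.toString(16).toUpperCase() + ";",
    dec: "&#" + codePoint + ";",
  };
}

// U+1E7C is the uppercase V with tilde, U+1E7D the lowercase one.
console.log(numericReferences("\u1E7C")); // { hex: '&#x1E7C;', dec: '&#7804;' }
console.log(numericReferences("\u1E7D")); // { hex: '&#x1E7D;', dec: '&#7805;' }
```

The resulting strings can be pasted straight into the markup or used as the replacement text of your own ENTITY declarations.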
My website uses UTF-8 encoding and classic ASP with VBScript as the default scripting language. I have split the files into smaller parts for easier management.
I always use this trick on the first line of the separated files to preserve the UTF-8 encoding when saving them; otherwise the non-English characters are converted to weird characters.
mainfile.asp
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
<!--#include file="sub.asp"-->
</body>
</html>
sub.asp
<%if 1=2 then%>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<%end if%>
this is some characters in other language:
تست متن به زبان فارسی
This trick works well when saving the files offline and also when the page runs on the server, because these extra lines are omitted (the condition is always false).
Is there a better way to preserve the encoding in separate files?
I use Microsoft Expression Web for editing files.
I use TextPad to ensure that all main files and includes are saved in UTF-8 encoding. Just hit 'Save As' and set the encoding dropdown in the dialog to the one you want.
Keep the meta tag as well, because it is still necessary.
I am looking to modify an HTML file to make it easier to parse. I need to put each element after the body tag on a separate line.
E.g. my current HTML file is:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-type" />
<meta name="ncc:files" content="78" />
</head>
<body>
<h1 class="title" id="h1">ABOUT DAISY</h1>
<h1 class="section" id="h7">
Cover
</h1>
<span class="page-normal" id="p13">
1
</span>
<h1 class="section" id="h18">
Swadesaabhimaani, K. Kelappan, Muhammad Abdul Rahiman
</h1>
<span class="page-normal" id="p24">
2
</span>
<span class="page-normal" id="p33">
3
</span>
<h1 class="section" id="h38">
Title
</h1>
<span class="page-normal" id="p45">
4
</span>
<h1 class="section" id="h50">
Publication
</h1>
<span class="page-normal" id="p69">
5
</span>
<h1 class="section" id="h74">
K. Ramakrishnapilla
</h1>
</body>
</html>
The required HTML after the <body> tag is:
<h1 class="title" id="h1">ABOUT DAISY</h1>
<h1 class="section" id="h7">Cover</h1>
<span class="page-normal" id="p13">1</span>
That is, each tag and its content must be on a single line, without line breaks.
Please advise how this can be done with sed.
It can be done by first joining all the lines into one, e.g. with tr -d '\n' < INFILE > OUTFILE (tr only reads standard input, so the file has to be redirected into it).
Then identify the container tags you want on separate lines and create a sed script for them, e.g. for <p> and <h1>:
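#sedscript.sed
# [^>]* allows attributes such as class and id; the g flag handles every
# match on the joined line. \b and \n in the replacement assume GNU sed.
s/<h1\b[^>]*>/\n&/g
s/<\/h1>/&\n/g
s/<p\b[^>]*>/\n&/g
s/<\/p>/&\n/g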
Then run it with sed -f sedscript.sed OUTFILE.
Although this might suit your needs, it can't handle malformed HTML (e.g. overlapping tags).
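The join-then-split idea is not tied to sed. Purely as an illustration, here is a rough JavaScript sketch of the same two steps; the function name onePerLine and the sample input are made up for this example, and the same caveat about malformed HTML applies.

```javascript
// Hypothetical helper (name invented for this example): join all lines,
// then put every element named in `tags` on a line of its own.
function onePerLine(html, tags) {
  // Step 1: remove the line breaks, like tr -d '\n'.
  let joined = html.replace(/\r?\n/g, "");
  // Step 2: add a newline before each opening tag and after each closing tag.
  for (const tag of tags) {
    const open = new RegExp("<" + tag + "\\b[^>]*>", "g");
    const close = new RegExp("</" + tag + ">", "g");
    joined = joined.replace(open, "\n$&").replace(close, "$&\n");
  }
  // Drop the empty lines left between adjacent elements.
  return joined.split("\n").filter((line) => line.trim() !== "");
}

const sample = [
  "<body>",
  '<h1 class="section" id="h7">',
  "Cover",
  "</h1>",
  '<span class="page-normal" id="p13">',
  "1",
  "</span>",
  "</body>",
].join("\n");

console.log(onePerLine(sample, ["h1", "span"]).join("\n"));
```

For anything beyond simple, regular markup like the sample, a real HTML parser is the safer choice.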
I always use UTF-8 everywhere. But I just stumbled upon a strange issue.
Here's a minimal example html file:
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript">
function Foo()
{
var eacute_utf8 = "\xC3\xA9";
var eacute_ansi = "\xE9";
document.getElementById("bla1").value = eacute_utf8;
document.getElementById("bla2").value = eacute_ansi;
}
</script>
</head>
<body onload="Foo()">
<input type="text" id="bla1">
<input type="text" id="bla2">
</body>
</html>
The html contains a utf-8 charset header, thus the page uses utf-8 encoding. Hence I would expect the first field to contain an 'é' (e acute) character, and the second field something like '�', as a single E9 byte is not a valid utf-8 encoded string.
However, to my surprise, the first contains 'é' (as if the utf-8 data is interpreted as some ansi variant, probably iso-8859-1 or windows-1252), and the second contains the actual 'é' char. Why is this!?
Note that my problem is not related to the particular encoding that my text editor uses - this is exactly why I used the explicit \x character constructions. They contain the correct, binary representation (in ascii compatible notation) of this character in ansi and utf-8 encoding.
Suppose I would want to insert a 'ę' character, that's unicode U+0119, or 0xC4 0x99 in utf-8 encoding, and does not exist in iso-8859-1 or windows-1252 or latin1. How would that even be possible?
JavaScript strings are always strings of Unicode characters, never bytes. Encoding headers or meta tags do not affect the interpretation of escape sequences. The \x escapes do not specify bytes but are shorthand for individual Unicode characters. Therefore the behavior is expected. \xC3 is equivalent to \u00C3.
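To make this concrete, here is a small sketch (variable names are my own) that can be run in a browser console or in Node. It shows that the \x escapes produce characters rather than bytes, and how a character outside Latin-1 such as 'ę' can be written with a \u escape. TextEncoder is used only to show the UTF-8 bytes that the character would be encoded to.

```javascript
// "\xC3\xA9" is two characters (U+00C3 and U+00A9), not the two UTF-8 bytes of 'é'.
const eacuteUtf8 = "\xC3\xA9";
console.log(eacuteUtf8.length);               // 2
console.log(eacuteUtf8 === "\u00C3\u00A9");   // true

// "\xE9" is the single character U+00E9, which is 'é'.
const eacuteLatin1 = "\xE9";
console.log(eacuteLatin1 === "\u00E9");       // true

// Characters above U+00FF need a \u escape (or a literal in a correctly encoded file).
const eogonek = "\u0119";
console.log(eogonek.length);                  // 1

// The UTF-8 bytes only appear when the string is encoded, e.g. for transport.
const utf8Bytes = Array.from(new TextEncoder().encode(eogonek));
console.log(utf8Bytes.map((b) => b.toString(16))); // [ 'c4', '99' ]
```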
I can't get Zend_Form to accept any Latin characters with diacritics (ü, é, etc.) that are entered into it.
It doesn't accept them even when I'm not validating.
Does anyone know how to get this to work?
Gr. Tosh
After doing a couple of tests, it seems to be a simple character encoding issue.
Your server is probably not delivering documents with UTF-8 encoding. You can easily force this in your view / layout by placing this in your <head> (preferably as the first child):
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
or, if using an HTML5 doctype:
<meta charset="utf-8">
It probably doesn't hurt to set the Zend_View encoding in your application config file as well, though this wasn't necessary in my tests (I think "UTF-8" is the default anyway):
resources.view.encoding = "utf-8"
Is there any way of defining macros (like TeX macros or LaTeX definitions) in DocBook documents?
DocBook is very verbose, and macros would help a lot. I didn't find them in the quickstart tutorials.
If so, could anyone provide a simple example or a link to one?
Thanks
I'm not sure if this is exactly what you want or whether it fulfills your requirements, but I'm thinking of entities. You can define them at the top of your XML document (this is general XML, nothing DocBook-specific), as shown here for 'doc.release.number' and 'doc.release.date'. They can also be included from a separate file, as shown in the third ENTITY row; there, SYSTEM means the declarations come from another file, 'entities.ent'.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY doc.release.number "1.0.0.beta-1" >
<!ENTITY doc.release.date "April 2010" >
<!ENTITY % entities SYSTEM "entities.ent" >
%entities;
]>
<!-- This document is based on http://readyset.tigris.org/nonav/templates/userguide.html -->
<article lang="en">
<articleinfo>
<title>&project.impl.title; - User Manual</title>
<subtitle></subtitle>
<date>&project.impl.release.date;</date>
<copyright>
<year>&doc.release.year;</year>
<holder>Team - &project.impl.title;</holder>
</copyright>
<releaseinfo>&doc.release.number;</releaseinfo>
</articleinfo>
<section>
<title>Introduction</title>
<para>
The &project.impl.title; has been created to clean up (X)HTML and XML documents as part of
</para>
</section>
</article>
In the document you reference the entities through a starting & and ending ; as in &project.impl.title;
In the file 'entities.ent' you specify the ENTITY elements in a similar way:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY project.impl.title 'Maven Tidy Plug-in' >
<!ENTITY project.impl.group-id 'net.sourceforge.docbook-utils.maven-plugin' >
<!ENTITY project.impl.artifact-id 'maven-tidy-plugin' >
<!ENTITY project.impl.release.number '1.0.0.beta-1' >
<!ENTITY project.impl.release.date 'April 2010' >
<!ENTITY project.impl.release.year '2010' >
<!ENTITY project.impl.url '../' >
<!ENTITY project.spec.title '' >
<!ENTITY project.spec.release.number '' >
<!ENTITY project.spec.release.date '' >
<!ENTITY doc.release.year '2010' >
Not exactly what you asked for, but perhaps helpful in some cases: in your wrapper (customization) stylesheet you can define templates that emit the FO markup for you. Some examples:
Code:
<xsl:template match="symbolchar">
<fo:inline font-family="Symbol">
<xsl:choose>
<xsl:when test=".='ge'">≥</xsl:when>
<xsl:when test=".='le'">≤</xsl:when>
<xsl:when test=".='sqrt'">√</xsl:when>
<xsl:otherwise>?!?</xsl:otherwise>
</xsl:choose>
</fo:inline>
</xsl:template>
Usage:
<symbolchar>le</symbolchar>
Code:
<xsl:template match="processing-instruction('linebreak')">
<fo:block/>
</xsl:template>
Usage:
<?linebreak?>
Have you considered generating DocBook from another format, like reStructuredText?
I found it quite nice for documentation.
Also, you could probably write a macro preprocessor (or look into m4) pretty quickly. If you are using the XML version of DocBook, a simple XSLT will do. Just make up some tags and transform them. Have boilerplate stuff added automatically. And get ready to be really angry at XSLT. For not being all it could be. For making your thinking warp.