Lightweight doxygen html output - doxygen

Is there for doxygen is a more lightweight HTML backend, which does not fill the page with tons of divs and tables? When looking at the css file, the output seems quite bloated. It possible to write another backend. I ask if there already exists one.
Reasons why I need this
It makes it easier to integrate the dox with the rest of the website.
I use hyphenate.js to make my "Related Pages" look good. But that script needs to know which tags it should use. This is much easier with less bloat markup.
Doxygen lacks complete documentation on how the output markup making reverse engineering using Firefox Web Developer tool necessary to modify the 1k lines CSS file. Less bloat markup makes less need for documentation, and it makes documentation more easy for Dimitri to maintain.
Less bloat markup makes the pages more portable.

With doxygen you can export your data in html format, tex format, XML (which you can later parse as you want), RTF, Man pages or Docbook.
The html output supports a custom header, footer and stylesheet (CSS) with the HTML_STYLESHEET attribute which might be what you want. You can rewrite those and adjust the output as you like.
If nothing satisfies you, then you might start thinking manually parsing one of the outputs above with your own scripting language and generate the desired format by yourself (if that suits you) or taking over control of the output generation directly via doxygen sources (https://github.com/doxygen/)
Sources: http://www.doxygen.nl/manual/output.html

What you need to do really depends on exactly what you want to end up with.
There are 'Input Filters', eg: ftp://ftp.rsa.com/pub/dsg/public/doxygen/doxyfilt.pl and 'Output Filters, eg: http://www.bigsister.ch/doxygenfilter/doxygenfilter.html . Writing a customized Filter should do exactly what you want. Using existing Code and putting up with it's limitations will be faster and may provide ideas for writing your own Program (if you wish to do so).
You can try this Website http://www.dirtymarkup.com/ with the output that you object to and see if one of the Tools it suggests will "clean" the Code up enough to your liking without removing too much functionality (IE: the ability to click on Links and expand / contract Sections).
If you really want it 'raw' try HTML2Text https://pypi.python.org/pypi/html2text and then you can 'wrestle it back' with Text2HTML http://txt2html.sourceforge.net/ . That will strip it bare and yet give you back some minimal HTML functionality (preserve Linking).
There are MANY 'HTML <-> Text' converters, in many Languages, use a Search Engine to find your own Source; one that is most suitable for you.
Here is a List of Tools from a reputable Site: http://www.w3.org/Tools/html2things.html .
Here is an List of alternates to convert Languages to HTML: http://www.w3.org/Tools/Prog_lang_filters.html . More Info here: http://www.w3.org/Tools/Filters.html .

Related

How to style DITA xrefs in structured Framemaker

In the middle of a conversion project from unstructured Framemaker to DITA-compliant, structured Framemaker. Customer wants xrefs to be underlined in the output. Seems straightforward enough, but I've been all over the documentation and all over the internet and can't find what I need. The EDD file shows that we should be using the "link.external" style, which makes perfect sense, but for the life of me I can't figure out where link.external is defined. I've found one piece of documentation in all my searching that sort of comes close to what I need, but the process for styling an xref, according to this document, is long and laborious. I just can't believe that applying a simple style to an element is so hard. Where would I look for the definition of the "link.external" style (or any other style, for that matter)? What obvious point am I missing?
You apply the style in the Cross-Reference panel using building blocks in the cross-reference format(s).
For example:
Section 2.3.4, Volcanoes.
would be styled using the x-ref format below:
Section <$paranumonly>, <Emphasis><$paratext>.
Therefore, to underline all of the x-refs, create an underline character format such as Underline, and use it in a building block within every x-ref format that you have.
<Underline>“<$paratext>” on page\ <$pagenum>
The change only applies to the x-ref, not to the following text.

HTML Tidy, cleaning up MS Word markup

Have 10 years of archived article data, most of it riddled with MS Word save-as-html markup like <p class="MsoNormal">
First of all, is html tidy up to the task of stripping out MS Word generated markup, or do I need to take another approach?
Secondly, the first few years of articles are globbed together by month and stored in DB as text storage type. I'd dearly love to break these out into individual articles so I can make the site more easily searched (i.e. not bring up an entire month of news when a search term/phrase matches). The only clear pattern I have to work with to isolate the articles is the article title (in bold, between 16-20px) and the article date, generally 10px; both title and date appear prior to article body text. Is there a way to detect the <h1>-ness or <small>-ness of markup when I do not have exact markup to match against?
This may be next to impossible to answer, but just in general, what approach would you take to this unenviable task? ;-) I'm on the JVM in Scala, but could do the cleanup job on LAMP stack as well.
Ideas appreciated!
If I was you, I'd use my favorite HTML::Parser kit for Perl. If goes very well for complex and fuzzily stated problems like yours one.

Translating longer texts (view and email templates) with gettext

I'm developing a multilingual PHP web application, and I've got long(-ish) texts that I need to translate with gettext. These are email templates (usually short, but still several lines) and parts of view templates (longer descriptive blocks of text). These texts would include some simple HTML (things like bold/italic for emphasis, probably a link here or there). The templates are PHP scripts whose output is captured.
The problem is that gettext seems very clumsy for handling longer texts. Longer texts would generally have more changes over time than short texts — I can either change the msgid and make sure to update it in all translations (could be lots of work and very error-prone when the msgid is long), or I can keep the msgid unchanged and modify only the translations (which would leave misleading outdated texts in the templates). Also, I've seen advice against including HTML in gettext strings, but avoiding it would break a single natural piece of text into lots of chunks, which will be an even bigger nightmare to translate and reassemble, and I've also seen advice against unnecessary splitting of gettext strings into separate msgids.
The other approach I see is to ignore gettext altogether for these longer texts, and to separate those blocks in external subtemplates for each locale, and just include the one for the current locale. The disadvantage is that I'm separating the translation effort between gettext .po files and separate templates located in a completely different location.
Since this application will be used as a starting point for other applications in the future, I'm trying to come up with the best approach for the long term. I need some advice for best practices in such scenarios. How have you implemented similar cases? What turned out to work and what turned out a bad idea?
Here's the workflow I used, on a very heavily-trafficked site that had about several dozen long-ish blocks of styled textual content, translated into six languages:
Pick a text-based markup language (we used Markdown)
For long strings, use fixed message IDs like "About_page_intro_markdown" that:
describes the intent of the text
makes clear that it will be interpreted in markdown format
Have our app render "*_markdown" strings appropriately, making sure to allow only a few safe HTML tags
Build a tool for translators that:
shows them their Markdown rendered in realtime (sort of like the Markdown dingus)
makes it easy for them to see the now-authoritative base language translation of the text (since that's no longer in the msgid)
Teach translators how to use the new workflow
Pros of this workflow:
Message IDs don't change all the time
Because translators are editing in a safe higher-level syntax, hard to mess up HTML
Non-technical translators found it very easy to write in Markdown, vs. HTML
Cons of this workflow:
Having static unchanging message IDs means changes in the text need to be transmitted out of band (which we'd do anyway, as long text can raise questions about tone or emphasis)
I'm very happy with the way this workflow operated for our website, and would absolutely recommend it, and use it again. It took a couple of days to get started, but it was easy to build, train, and launch.
Hope this helps, and good luck with your project.
I just had this particular problem, and I believe I solved it in an elegant way.
The problem: We wanted to use Gettext in PHP, and use primary language strings as keys translations. However, for large blocks of HTML (with h1, h2, p, a, etc...) I'd either have to:
Create a translation for each tag with content.
or
Put the entire block with tags in one translation.
Neither of those options appealed to me, so this is what I did:
Keep simple strings ("OK","Add","Confirm","My Awesome App") as regular Gettext .po entries, with the original text as the key
Write content (large text blocks) in markdown, and keep them in files.
Example files would be /homepage/content.md (primary / source text), /homepage/content.da-DK.md, /homepage/content.de-DE.md
Write a class that fetches the content files (for the current locale) and parses it. I then used it like:
<?=Template::getContent("homepage/content")?>
However, what about dynamic large text? Simple. Use a templating engine. I decided on Smarty, and used it in my Template class.
I could now use templating logic.. within markdown! How awesome is that?!
Then came the tricky part..
For content to look good, at times you need to structure your HTML differently. Consider a campaign area with 3 "feature boxes" beneath it. The easy solution: Have a file for the campaign area, and one for each of the 3 boxes.
But I could do better than that.
I wrote a quick block parser, so I would write all the content in one file, and then render each block seperately.
Example file:
[block campaign]
Buy this now!
=============
Blaaaah... And a smarty tag: {$cool}
[/block]
[block feature 1]
Feature 1
---------
asdasd you get it..
[/block]
[block feature 2] ...
And this is how I would render them in the markup:
<?php
// At the top of the document...
// Class handles locale. :)
$template = Template::getContent("homepage/content", [
"cool" => "Smarty variable! AWESOME!"
]);
?>
...
<title><?=_("My Awesome App")?></title>
...
<div class="hero">
<!-- Template data already processed! :) -->
<?=$template->renderBlock("campaign")?>
</div>
<div class="featurebox">
<?=$template->renderBlock("feature 1")?>
</div>
<div class="featurebox">
<?=$template->renderBlock("feature 2")?>
</div>
I'm afraid I can't provide any source code, as this was for a company project, but I hope you get the idea.
gettext wasn't really designed for translating large pieces of text.
fwiw I've included basic HTML (strong, a, etc) in gettext strings as I was confident our translators knew what they were doing (mostly right) and that the translations would be well tested.
I've tried the approach of breaking up the text into one string per paragraph. Roughly as it looks odd if there's one paragraph of English in the middle of the text. Where one of those strings have changed this has meant that we have had to wait for translations before releasing a new version, which has slowed us down. On the plus side it's easy for translators to see which part of the text has changed. This approach worked well for the one application I've tried it with.
Splitting some text out into external locations also worked, but it caused management overhead, rather than just a .po file or two, there was a whole bunch of other text that had to be manually compared to the English version and updated accordingly. This is doable if you remember to provide notes to your translators explaining where and what the difference was in the English version.
I'm still not sold on either approach myself.

Perl: Safe templating language

Here are already several questions in SO about the safe template languages, like:
Safe ERB Language?
templating system that is safe for end users to edit
Is there a “safe” subset of Python for use as an embedded scripting language?
Is Django's templating markup for views safe for end user editing like rails liquid templating
but the above questions are for asp, ruby, python.
My question is: What templating language can allow to be edited by users in perl based web-app?
I want allow for users edit pages, (like in an wiki) with some programming possibilities, so full featured mean with cycles, conditionals, variable substitutions, includes and so on.
Is TT "enough safe"? Is here another solution as TT?
Template::Toolkit Should be fine, as long as you limit what parameters are passed to the template. If you pass classes, the templates will be able to call any method on those classes, and any method on the return values of those classes, etc. It's much better to only pass Hashes.
HTML::Template Is also a good option, and it only allows hashes by default, so you are much less likely to leave open a hole that lets the template authors execute arbitrary code.
In either case, make sure that you read the documentation for whatever you use, and clean the output in order to prevent cross-site scripting attacks. Do not rely on people customizing the templates to get the output encoding correct for you.

Best way to author man pages?

What's the best way to author man pages? Should I write using the standard man macros, or is there some clever package available now that takes some kind of XML-ified source and can output man pages, HTML, ASCII, and what not?
Thanks
I have previously used the GNU version of nroff called groff to write man pages.
Nice intro article on it here:
http://www.linuxjournal.com/article/1158
Doxygen is what you are looking for.
Keep in mind that it is designed to document source code but you could easily adapt it.
It can generate html, pdf, and latex documentation too.
If you are looking at writing once and generating different output formats such as manpages, HTML, plain txt, or even PDF, then docbook should work best.
A tool that is commonly used in the Tcl community is doctools which can produce a restricted (but useful) subset of the manpage format, suitable for rendering with groff or nroff. It can also generate both plain text and HTML directly.
For my atinout program I have been using ronn which lets you write man pages in a very, very readable markdown like syntax. I am extremely happy with it.
atinout(1) -- Send AT commands to modem, capturing the response
===============================================================
## SYNOPSIS
`atinout` <input_file>|`-` <modem_device> <output_file>|`-`<br>
`atinout` `--version`<br>
`atinout` `--usage`<br>
`atinout` `--help`<br>
## DESCRIPTION
**Atinout** reads a list of AT commands. It sends those commands one by one
to the modem, waiting for the final result code for the
currently running command before continuing with the next command in
the list. The output from the commands is saved.
...
see the whole page here.