How to add HTML section in Perl POD using Pod::Weaver - perl

I use Pod::Weaver with Dist::Zilla. It does several good things for me: it adds the VERSION, AUTHOR and LICENSE POD sections automatically, and in my source code I can use simplified POD syntax. For example, I can write "=method new" and it will be converted to correct POD.
Now I wanted to add an image to the POD. To do it I need to add some HTML. So I'm writing in my source code:
=begin HTML
<p>
<a href="http://upload.bessarabov.ru/bessarabov/031VBX4pHw_ALPcxRTVjflnAWuc.png">
<img src="http://upload.bessarabov.ru/bessarabov/VdagpUXEQdMslOqUyOAzwa-DOaU.png" width="500" height="125" alt="Status board graph sample" />
</a>
</p>
=end HTML
Then I run dzil release and release the module on CPAN. After uploading to CPAN I noticed that my HTML POD had been changed by Pod::Weaver and now looks like this:
=for HTML <p>
<p>
<a href="http://upload.bessarabov.ru/bessarabov/031VBX4pHw_ALPcxRTVjflnAWuc.png">
<img src="http://upload.bessarabov.ru/bessarabov/VdagpUXEQdMslOqUyOAzwa-DOaU.png" width="500" height="125" alt="Status board graph sample" />
</a>
</p>
This HTML part has also been moved within the POD. I wanted it to be just after the SYNOPSIS section, but now it is after the last method.
I still want to use Pod::Weaver, because it does a lot of good things, but I want HTML to be put in the exact place of the POD and not to be converted.
How can I do it?

After much reading of perldoc perlpod and testing, the following things are apparent:
1. According to the spec, =begin html regions can safely be translated to =for html paragraphs.
2. However, this is only possible when the body of the region is a single paragraph. So foo\nbar can be translated, but foo\n\nbar cannot.
3. The module doing the transformation from =begin to =for refuses to emit =for when it sees a double \n in the body, and falls back to emitting =begin instead.
4. Given point 3, if you want to force it to emit =begin html, simply adding a blank line (an extra \n) inside the body will do the trick.
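The rule described above says that a =begin region can only become =for when its body contains no blank line. A small Python sketch can model it. This is not Pod::Weaver's or Pod::Elemental's actual code. `collapse_region` is a hypothetical helper that only shows when the folding is possible.

```python
import re


def collapse_region(fmt: str, body: str) -> str:
    """Hypothetical model: fold a =begin/=end region into =for when possible."""
    body = body.strip("\n")
    # A blank (or whitespace-only) line splits the body into several
    # paragraphs, and a single =for paragraph cannot hold more than one.
    if re.search(r"\n[ \t]*\n", body):
        return f"=begin {fmt}\n\n{body}\n\n=end {fmt}\n"
    return f"=for {fmt} {body}\n"
```

So an HTML body with an empty line somewhere inside it stays a =begin html ... =end html region.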
However, the problem remains that, for whatever reason, what appears to be valid POD is not rendering as intended.
Either
A. The Pod2HTML layer is not coded to respond correctly to =for (in which case, a few extra \n's to force an =begin may help), or
B. There's a security level that's preventing you from using complex HTML (which may include hyperlinks in images, a potential XSS security threat).
If the problem is B, then you're not going to get around that by being tricky.
Though I assure you, displaying images IS doable; I do it myself.

Related

Make backticks and links overlap work with GitHub Markdown

We are trying to implement an automatic markdown generator for an easily maintainable documentation.
When mentioning a variable's type, we would like to prefix it with ? when it is nullable, use backticks around it and add a link to its description. For example: `?[Article](#article)`.
However, the backticks break the link syntax because of the overlap. We use `?`[`Article`](#article) instead to make the link work, but it creates a space between ? and Article, as follows: ?Article.
Is it possible to make it look like ?Article with a link on Article only?
I just tested this out and discovered that there is no space between ? and Article. What appears to be a space is simply GitHub's styling of two <code> blocks up against each other.
Wrapping the whole thing in backticks won't work because backticks indicate code, and Markdown treats the contents as if they are a code sample where you want to show the source.
The best workaround I can find is to use <code> tags directly:
<code>?[Article](https://stackoverflow.com/)</code>
On both GitHub and Stack Overflow this renders like so:
?Article
(I have used a link to Stack Overflow as the link target here simply so we get a rendered link as an example. I expect that #article will work equally well in your environment.)
In my opinion this is even a reasonable way of doing what you want. Markdown's backticks compile to <code> tags, and inline HTML code is expressly permitted by Markdown:
For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.
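The question is about an automatic Markdown generator, so the workaround can be built into the generator itself. Below is a minimal Python sketch. `format_type` and its parameters are hypothetical names and are not part of any existing tool.

```python
def format_type(name: str, anchor: str, nullable: bool = False) -> str:
    """Render a type reference as <code>?[Name](#anchor)</code>."""
    prefix = "?" if nullable else ""
    # Raw <code> tags instead of backticks: the link syntax inside is still
    # parsed as Markdown, and the whole text sits in one <code> element.
    return f"<code>{prefix}[{name}](#{anchor})</code>"
```

For example, `format_type("Article", "article", nullable=True)` produces the `<code>?[Article](#article)</code>` string from the answer.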

Restructured text: Fall back if doctest module is missing

It seems that Github does not support the sphinx.ext.doctest syntax in their reStructuredText renderer, which is causing some problems when trying to include a doctest-style code block (.. doctest) in a README.rst which is transcluded into the documentation index (which is rendered by sphinx). If I replace the .. doctest directive with something else, it doesn't render properly as a doctest on Sphinx, but if I don't remove the directive, the code block doesn't render at all (see this gist).
Ideally I'd like to find a solution which just does the right thing in both environments, but failing that, is there a way to fall back to a .. code block or some other supported format (e.g. the rST equivalent of a <NoScript> tag)?

Lightweight doxygen html output

Is there a more lightweight HTML backend for Doxygen, one which does not fill the page with tons of divs and tables? Judging by the CSS file, the output seems quite bloated. It is possible to write another backend, but I'm asking whether one already exists.
Reasons why I need this
It makes it easier to integrate the dox with the rest of the website.
I use hyphenate.js to make my "Related Pages" look good. But that script needs to know which tags it should use. This is much easier with less bloat markup.
Doxygen lacks complete documentation of its output markup. That makes reverse engineering with the Firefox Web Developer tools necessary to modify the 1,000-line CSS file. Less bloated markup needs less documentation, and it makes the documentation easier for Dimitri to maintain.
Less bloated markup makes the pages more portable.
With doxygen you can export your data in html format, tex format, XML (which you can later parse as you want), RTF, Man pages or Docbook.
The HTML output supports a custom header, footer and stylesheet (CSS) via the HTML_HEADER, HTML_FOOTER and HTML_STYLESHEET configuration options, which might be what you want. You can rewrite those and adjust the output as you like.
If nothing satisfies you, then you might consider parsing one of the outputs above with your own scripting language and generating the desired format yourself (if that suits you). Another option is taking control of the output generation directly via the Doxygen sources (https://github.com/doxygen/).
Sources: http://www.doxygen.nl/manual/output.html
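Here is an illustration of the "parse the XML output yourself" route. With GENERATE_XML = YES, Doxygen writes an xml/index.xml file listing every compound and its members. The sketch below assumes that file's layout: compound elements with name and member children. Check it against your Doxygen version. `index_to_html` is a hypothetical helper that turns the index into a bare nested list, with no divs or tables.

```python
import xml.etree.ElementTree as ET
from html import escape


def index_to_html(index_xml: str) -> str:
    """Turn Doxygen's XML index into a minimal nested <ul> (assumed layout)."""
    root = ET.fromstring(index_xml)
    lines = ["<ul>"]
    for compound in root.iter("compound"):
        name = compound.findtext("name", default="")
        kind = compound.get("kind", "")
        lines.append(f"<li>{escape(kind)} {escape(name)}<ul>")
        # Each <member> child is a function, variable, etc. of the compound.
        for member in compound.findall("member"):
            member_name = member.findtext("name", default="")
            lines.append(f"<li>{escape(member_name)}</li>")
        lines.append("</ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

You can then wrap the result in your site's own template and style it with your own CSS.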
What you need to do really depends on exactly what you want to end up with.
There are 'Input Filters', e.g. ftp://ftp.rsa.com/pub/dsg/public/doxygen/doxyfilt.pl, and 'Output Filters', e.g. http://www.bigsister.ch/doxygenfilter/doxygenfilter.html. Writing a customized filter should do exactly what you want. Using existing code and putting up with its limitations will be faster, and it may give you ideas for writing your own program (if you wish to do so).
You can try the website http://www.dirtymarkup.com/ with the output that you object to. See whether one of the tools it suggests cleans the code up enough for your liking without removing too much functionality (i.e. the ability to click on links and expand/collapse sections).
If you really want it 'raw', try HTML2Text https://pypi.python.org/pypi/html2text, and then you can 'wrestle it back' with Text2HTML http://txt2html.sourceforge.net/. That will strip it bare and yet give you back some minimal HTML functionality (preserving links).
There are MANY HTML <-> text converters in many languages. Use a search engine to find the one most suitable for you.
Here is a list of tools from a reputable site: http://www.w3.org/Tools/html2things.html.
Here is a list of alternatives for converting programming languages to HTML: http://www.w3.org/Tools/Prog_lang_filters.html. More info here: http://www.w3.org/Tools/Filters.html.

Magento "Cannot send headers; headers already sent in" error

When running my script, I am getting several errors like this:
Warning: Cannot modify header information - headers already sent by (output started at /some/file.php:12) in /some/file.php on line 23
The lines mentioned in the error messages contain header() and setcookie() calls.
What could be the reason for this? And how to fix it?
No output before sending headers!
Functions that send/modify HTTP headers must be invoked before any output is made.
Otherwise the call fails:
Warning: Cannot modify header information - headers already sent (output started at script:line)
Some functions modifying the HTTP header are:
header / header_remove
session_start / session_regenerate_id
setcookie / setrawcookie
Output can be:
Unintentional:
Whitespace before <?php or after ?>
The UTF-8 Byte Order Mark specifically
Previous error messages or notices
Intentional:
print, echo and other functions producing output
Raw <html> sections prior to <?php code.
Why does it happen?
To understand why headers must be sent before output it's necessary
to look at a typical HTTP
response. PHP scripts mainly generate HTML content, but also pass a
set of HTTP/CGI headers to the webserver:
HTTP/1.1 200 OK
X-Powered-By: PHP/5.3.7
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8

<html><head><title>PHP page output page</title></head>
<body><h1>Content</h1> <p>Some more output follows...</p>
and <img src=internal-icon-delayed>
The page/output always follows the headers. PHP has to pass the
headers to the webserver first. It can only do that once.
After the double linebreak it can never amend them again.
When PHP receives the first output (print, echo, <html>) it will
flush all collected headers. Afterward it can send all the output
it wants. But sending further HTTP headers is impossible then.
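The ordering rule can be illustrated with a toy model, written here in Python rather than PHP. `UnbufferedResponse` is a hypothetical class and is not how PHP is implemented. Headers are collected until the first output. At that point they are flushed together with the blank line that ends them, and any later header call fails.

```python
class UnbufferedResponse:
    """Toy model: headers are collected until the first byte of output."""

    def __init__(self, stream):
        self.stream = stream          # any file-like object with .write()
        self.pending_headers = []
        self.headers_sent = False

    def header(self, line: str) -> None:
        if self.headers_sent:
            raise RuntimeError(
                "Cannot modify header information - headers already sent")
        self.pending_headers.append(line)

    def echo(self, text: str) -> None:
        if not self.headers_sent:
            # First output: flush all headers plus the terminating blank line.
            for h in self.pending_headers:
                self.stream.write(h + "\r\n")
            self.stream.write("\r\n")
            self.headers_sent = True
        self.stream.write(text)
```

Even `echo(" ")` (a single stray space) seals the headers in this model, which matches the whitespace causes discussed below.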
How can you find out where the premature output occurred?
The header() warning contains all relevant information to
locate the problem cause:
Warning: Cannot modify header information - headers already sent by
(output started at /www/usr2345/htdocs/auth.php:52) in
/www/usr2345/htdocs/index.php on line 100
Here "line 100" refers to the script where the header() invocation failed.
The "output started at" note within the parentheses is more significant.
It identifies the source of the previous output. In this example, it's auth.php
and line 52. That's where you have to look for premature output.
Typical causes:
Print, echo
Intentional output from print and echo statements will terminate the opportunity to send HTTP headers. The application flow must be restructured to avoid that. Use functions
and templating schemes. Ensure header() calls occur before messages
are written out.
Functions that produce output include
print, echo, printf, vprintf
trigger_error, ob_flush, ob_end_flush, var_dump, print_r
readfile, passthru, flush, imagepng, imagejpeg
among others and user-defined functions.
Raw HTML areas
Unparsed HTML sections in a .php file are direct output as well.
Script conditions that will trigger a header() call must be placed
before any raw <html> blocks.
<!DOCTYPE html>
<?php
// Too late for headers already.
Use a templating scheme to separate processing from output logic.
Place form processing code atop scripts.
Use temporary string variables to defer messages.
The actual output logic and intermixed HTML output should follow last.
Whitespace before <?php for "script.php line 1" warnings
If the warning refers to output in line 1, then it's mostly
leading whitespace, text or HTML before the opening <?php token.
<?php
# There's a SINGLE space/newline before <? - Which already seals it.
Similarly it can occur for appended scripts or script sections:
?>
<?php
PHP actually eats up a single linebreak after close tags. But it won't
compensate for multiple newlines or tabs or spaces shifted into such gaps.
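A quick way to audit this from outside PHP is to look at the raw bytes before the first opening tag. Below is a minimal Python sketch. `leading_output` is a hypothetical helper, and it only looks for the long `<?php` tag. Short `<?` and `<?=` tags are ignored.

```python
def leading_output(source: bytes) -> bytes:
    """Return the bytes PHP would send before the first '<?php' tag."""
    idx = source.find(b"<?php")
    # Without any opening tag, the whole file is plain output.
    return source if idx == -1 else source[:idx]
```

For a file that is meant to start with PHP code, `leading_output(Path("index.php").read_bytes())` should be `b""`. Anything else, including the three BOM bytes, is premature output.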
UTF-8 BOM
Linebreaks and spaces alone can be a problem. But there are also "invisible"
character sequences that can cause this. Most famously the
UTF-8 BOM (Byte-Order-Mark)
which isn't displayed by most text editors. It's the byte sequence EF BB BF, which is optional and redundant for UTF-8 encoded documents. PHP, however, has to treat it as raw output. It may show up as the characters ï»¿ in the output (if the client interprets the document as Latin-1) or similar "garbage".
In particular, graphical editors and Java-based IDEs are oblivious to its
presence. They don't visualize it (as obliged by the Unicode standard).
Most programmer and console editors, however, do show it.
There it's easy to recognize the problem early on. Other editors may identify
its presence in a file/settings menu (Notepad++ on Windows can identify and
remedy the problem).
Another option to inspect the BOM's presence is resorting to a hex editor.
On *nix systems hexdump is usually available;
if not, a graphical variant simplifies auditing these and other issues.
An easy fix is to set the text editor to save files as "UTF-8 (no BOM)"
or similar to such nomenclature. Often newcomers otherwise resort to creating new files and just copy&pasting the previous code back in.
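Stripping the BOM can also be scripted. Below is a minimal Python sketch using the standard `codecs.BOM_UTF8` constant (the bytes EF BB BF). `strip_bom` is a hypothetical helper that rewrites the file in place, so run it under version control.

```python
import codecs
from pathlib import Path


def strip_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file; return True if one was found."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Applied to a project, a loop such as `for p in Path(".").rglob("*.php"): strip_bom(p)` would clean every PHP file below the current directory.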
Correction utilities
There are also automated tools to examine and rewrite text files
(sed/awk or recode).
For PHP specifically there's the phptags tag tidier.
It rewrites close and open tags into long and short forms, but also easily
fixes leading and trailing whitespace, Unicode and UTF-x BOM issues:
phptags --whitespace *.php
It's safe to use on a whole include or project directory.
Whitespace after ?>
If the error source is mentioned as behind the
closing ?>
then this is where some whitespace or the raw text got written out.
The PHP end marker does not terminate script execution at this point. Any text/space characters after it will be written out as page content
still.
It's commonly advised, in particular to newcomers, that trailing ?> PHP
close tags should be omitted. This avoids a small portion of these cases.
(Quite commonly include()d scripts are the culprit.)
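The trailing case can be checked the same way. Below is a minimal Python sketch. `trailing_output` is a hypothetical helper that treats the last `?>` in the file as the closing tag, so a `?>` inside a string or comment would fool it. Like PHP, it discards exactly one newline right after that tag.

```python
def trailing_output(source: str) -> str:
    """Return the text PHP would echo after the last '?>' (simplified)."""
    idx = source.rfind("?>")
    if idx == -1:
        return ""  # no closing tag, so nothing can leak after it
    rest = source[idx + 2:]
    # PHP swallows a single newline (\r\n, \n or \r) directly after '?>'.
    for newline in ("\r\n", "\n", "\r"):
        if rest.startswith(newline):
            return rest[len(newline):]
    return rest
```

A non-empty result for an include()d file is a likely source of "headers already sent".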
Error source mentioned as "Unknown on line 0"
It's typically a PHP extension or php.ini setting if no error source
is specified.
It's occasionally the gzip stream encoding setting
or the ob_gzhandler.
But it could also be any doubly loaded extension= module
generating an implicit PHP startup/warning message.
Preceding error messages
If another PHP statement or expression causes a warning message or
notice to be printed out, that also counts as premature output.
In this case you need to avoid the error,
delay the statement execution, or suppress the message with e.g.
isset() or @(),
when either doesn't obstruct debugging later on.
No error message
If you have error_reporting or display_errors disabled per php.ini,
then no warning will show up. But ignoring errors won't make the problem go
away. Headers still can't be sent after premature output.
So when header("Location: ...") redirects silently fail it's very
advisable to probe for warnings. Re-enable them with two simple commands
atop the invocation script:
error_reporting(E_ALL);
ini_set("display_errors", 1);
Or set_error_handler("var_dump"); if all else fails.
Speaking of redirect headers, you should often use an idiom like
this for final code paths:
exit(header("Location: /finished.html"));
Preferably even a utility function, which prints a user message
in case of header() failures.
Output buffering as a workaround
PHP's output buffering
is a workaround to alleviate this issue. It often works reliably, but shouldn't
substitute for proper application structuring and separating output from control
logic. Its actual purpose is minimizing chunked transfers to the webserver.
The output_buffering=
setting nevertheless can help.
Configure it in the php.ini
or via .htaccess
or even .user.ini on
modern FPM/FastCGI setups.
Enabling it will allow PHP to buffer output instead of passing it to the webserver instantly. PHP thus can aggregate HTTP headers.
It can likewise be engaged with a call to ob_start();
atop the invocation script. This, however, is less reliable for multiple reasons:
Even if <?php ob_start(); ?> starts the first script, whitespace or a
BOM might get shuffled before, rendering it ineffective.
It can conceal whitespace for HTML output. But as soon as the application logic attempts to send binary content (a generated image for example),
the buffered extraneous output becomes a problem. (Necessitating ob_clean()
as a further workaround.)
The buffer is limited in size, and can easily overrun when left to defaults.
And that's not a rare occurrence either, difficult to track down
when it happens.
Both approaches therefore may become unreliable - in particular when switching between
development setups and/or production servers. This is why output buffering is
widely considered just a crutch / strictly a workaround.
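The effect of output buffering, including its size limit, can be illustrated with a toy model. `BufferedResponse` is a hypothetical Python class and is not how PHP is implemented. The default `limit=4096` mirrors the output_buffering value found in PHP's bundled php.ini files. The flush-on-overflow behaviour is simplified.

```python
class BufferedResponse:
    """Toy model of output buffering: headers stay open until a flush."""

    def __init__(self, stream, limit: int = 4096):
        self.stream = stream          # any file-like object with .write()
        self.limit = limit
        self.headers = []
        self.buffer = []
        self.size = 0
        self.headers_sent = False

    def header(self, line: str) -> None:
        if self.headers_sent:
            raise RuntimeError("headers already sent")
        self.headers.append(line)

    def echo(self, text: str) -> None:
        self.buffer.append(text)
        self.size += len(text)
        if self.size > self.limit:
            # Buffer overrun: output is flushed early, sealing the headers.
            self.flush()

    def flush(self) -> None:
        if not self.headers_sent:
            for h in self.headers:
                self.stream.write(h + "\r\n")
            self.stream.write("\r\n")
            self.headers_sent = True
        self.stream.write("".join(self.buffer))
        self.buffer.clear()
        self.size = 0
```

A stray space echoed early no longer blocks a later header() call in this model. Once the buffer overruns, though, the same failure returns, which is why buffering is only a crutch.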
See also the basic usage example
in the manual, and for more pros and cons:
What is output buffering?
Why use output buffering in PHP?
Is using output buffering considered a bad practice?
Use case for output buffering as the correct solution to "headers already sent"
But it worked on the other server!?
If you didn't get the headers warning before, then the output buffering
php.ini setting
has changed. It's likely unconfigured on the current/new server.
Checking with headers_sent()
You can always use headers_sent() to probe if
it's still possible to... send headers. Which is useful to conditionally print
info or apply other fallback logic.
if (headers_sent()) {
die("Redirect failed. Please click on this link: <a href=...>");
}
else{
exit(header("Location: /user.php"));
}
Useful fallback workarounds are:
HTML <meta> tag
If your application is structurally hard to fix, then an easy (but
somewhat unprofessional) way to allow redirects is injecting a HTML
<meta> tag. A redirect can be achieved with:
<meta http-equiv="Refresh" content="0; url=http://example.com/">
Or with a short delay:
<meta http-equiv="Refresh" content="2; url=../target.html">
This leads to non-valid HTML when utilized past the <head> section.
Most browsers still accept it.
JavaScript redirect
As an alternative, a JavaScript redirect
can be used for page redirects:
<script> location.replace("target.html"); </script>
While this is often more HTML compliant than the <meta> workaround,
it incurs a reliance on JavaScript-capable clients.
Both approaches however make acceptable fallbacks when genuine HTTP header()
calls fail. Ideally you'd always combine this with a user-friendly message and
clickable link as last resort. (Which for instance is what the http_redirect()
PECL extension does.)
Why setcookie() and session_start() are also affected
Both setcookie() and session_start() need to send a Set-Cookie: HTTP header.
The same conditions therefore apply, and similar error messages will be generated
for premature output situations.
(Of course, they're furthermore affected by disabled cookies in the browser
or even proxy issues. The session functionality obviously also depends on free
disk space and other php.ini settings, etc.)
Further links
Google provides a lengthy list of similar discussions.
And of course many specific cases have been covered on Stack Overflow as well.
The WordPress FAQ explains How do I solve the Headers already sent warning problem? in a generic manner.
Adobe Community: PHP development: why redirects don't work (headers already sent)
Nucleus FAQ: What does "page headers already sent" mean?
One of the more thorough explanations is HTTP Headers and the PHP header() Function - A tutorial by NicholasSolutions (Internet Archive link).
It covers HTTP in detail and gives a few guidelines for rewriting scripts.
This error message gets triggered when anything is sent before you send HTTP headers (with setcookie or header). Common reasons for outputting something before the HTTP headers are:
Accidental whitespace, often at the beginning or end of files, like this:
 <?php
// Note the space before "<?php"
?>
To avoid this, simply leave out the closing ?> - it's not required anyway.
Byte order marks at the beginning of a PHP file. Examine your PHP files with a hex editor to find out whether that's the case. They should start with the bytes 3C 3F (<?). You can safely remove the BOM EF BB BF from the start of files.
Explicit output, such as calls to echo, printf, readfile, passthru, code before <? etc.
A warning outputted by php, if the display_errors php.ini property is set. Instead of crashing on a programmer mistake, php silently fixes the error and emits a warning. While you can modify the display_errors or error_reporting configurations, you should rather fix the problem.
Common reasons are accesses to undefined elements of an array (such as $_POST['input'] without using empty or isset to test whether the input is set), or using an undefined constant instead of a string literal (as in $_POST[input], note the missing quotes).
Turning on output buffering should make the problem go away; all output after the call to ob_start is buffered in memory until you release the buffer, e.g. with ob_end_flush.
However, while output buffering avoids the issues, you should really determine why your application outputs an HTTP body before the HTTP header. That'd be like taking a phone call and discussing your day and the weather before telling the caller that he's got the wrong number.
I have got this error many times, and I am certain every PHP programmer has got it at least once.
Possible Solution 1
This error may have been caused by blank spaces before the start of the file or after the end of the file. These blank spaces should not be there.
Example:
THERE SHOULD BE NO BLANK SPACES HERE
echo "your code here";
?>
THERE SHOULD BE NO BLANK SPACES HERE
Check all files associated with file that causes this error.
Note: Sometimes an editor (IDE) like gedit (a default Linux editor) adds a blank line when saving the file. This should not happen. If you are using Linux, you can use the vi editor to remove spaces/lines after ?> at the end of the page.
Possible Solution 2:
If this is not your case, then use ob_start to output buffering:
<?php
ob_start();
// code
ob_end_flush();
?>
This will turn output buffering on and your headers will be created after the page is buffered.
Instead of the below line
//header("Location:".ADMIN_URL."/index.php");
write
echo("<script>location.href = '".ADMIN_URL."/index.php?msg=$msg';</script>");
or
?><script><?php echo("location.href = '".ADMIN_URL."/index.php?msg=$msg';");?></script><?php
It'll definitely solve your problem.
I faced the same problem, and I solved it by writing the header location in the above way.
You do
printf ("Hi %s,</br />", $name);
before setting the cookies, which isn't allowed. You can't send any output before the headers, not even a blank line.
COMMON PROBLEMS:
(copied from: source)
====================
1) There should not be any output (i.e. echo... or HTML code) before the header(...); command.
2) Remove any whitespace (or newline) before <?php and after ?> tags.
3) GOLDEN RULE! Check that the PHP file (and any files it includes) is encoded as UTF-8 without BOM, not just plain "UTF-8". That is the problem in many cases, because a UTF-8 file with a BOM has special invisible characters at its start that your text editor doesn't show!
4) After header(...); you must use exit;
5) Always pass an explicit 301 or 302 status code:
header("location: http://example.com", true, 301 ); exit;
6) Turn on error reporting and find the error. Your error may be caused by a function that is not working. When you turn on error reporting, always fix the top-most error first. For example, it might be "Warning: date_default_timezone_get(): It is not safe to rely on the system's timezone settings." and then further down you may see the "headers already sent" error. After fixing the top-most (1st) error, reload your page. If you still have errors, fix the top-most error again.
7) If none of the above helps, use a JavaScript redirect (a strongly discouraged method); it may be the last resort in special cases:
echo "<script type='text/javascript'>window.top.location='http://website.com/';</script>"; exit;
It is because of this line:
printf ("Hi %s,</br />", $name);
You should not print/echo anything before sending the headers.
A simple tip: A simple space (or invisible special char) in your script, right before the very first <?php tag, can cause this !
Especially when you are working in a team and somebody is using a "weak" IDE or has messed around in the files with strange text editors.
I have seen these things ;)
There is another bad practice that can cause this problem and has not been mentioned yet.
See this code snippet:
<?php
include('a_important_file.php'); //really really really bad practise
header("Location:A location");
?>
Things are okay,right?
What if "a_important_file.php" looks like this, with an extra blank line after its closing ?> tag:
<?php
//some php code
//another line of php code
//no line above is generating any output
?>
----------This is the end of a_important_file-------------------
This will not work. Why? PHP swallows only the single newline directly after ?>, so the extra blank line has already been sent as output.
This is not a common scenario in a single script. But what if you are using an MVC framework which loads lots of files before handing things over to your controller? Then it is not uncommon at all. Be prepared for it.
From PSR-2 2.2 :
All PHP files MUST use the Unix LF (linefeed) line ending.
All PHP files MUST end with a single blank line.
The closing ?> tag MUST be omitted from files containing only php
Believe me, following these standards can save you a hell of a lot of hours :)
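These three rules are easy to check mechanically. Below is a minimal Python sketch. `psr2_file_issues` is a hypothetical helper and not a real linter; tools such as PHP_CodeSniffer do this properly. It assumes the file contains only PHP, which is the case where the ?> rule applies.

```python
def psr2_file_issues(source: bytes) -> list[str]:
    """Report violations of the three PSR-2 file rules quoted above."""
    issues = []
    if b"\r" in source:
        issues.append("line endings are not Unix LF")
    # Exactly one trailing newline: ends with LF, but not with two.
    if not source.endswith(b"\n") or source.endswith(b"\n\n"):
        issues.append("file does not end with a single newline")
    # Assumes a PHP-only file; mixed HTML/PHP templates are exempt.
    if source.rstrip().endswith(b"?>"):
        issues.append("closing ?> tag is not omitted")
    return issues
```

An empty list means the file passes all three checks.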
Sometimes the development process involves both Windows workstations and Linux systems (hosting), and you do not see any output before the related line in the code. In that case the cause could be the formatting of the file and the lack of Unix LF (linefeed)
line endings.
To fix this quickly, we usually rename the file, create a new file with the original name on the Linux system, and copy the content into it. Many times this solves the issue, as some files created on Windows cause it once moved to the hosting.
This fix is an easy fix for sites we manage by FTP and sometimes can save our new team members some time.
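Instead of recreating files by hand, the line endings can be normalized in place. Below is a minimal Python sketch. `normalize_line_endings` is a hypothetical helper that rewrites the file, converting CRLF and lone CR to LF.

```python
from pathlib import Path


def normalize_line_endings(path: Path) -> bool:
    """Convert CRLF / CR line endings to LF; return True if the file changed."""
    data = path.read_bytes()
    # Replace CRLF first so it doesn't turn into two LFs.
    fixed = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if fixed != data:
        path.write_bytes(fixed)
        return True
    return False
```

You can run it over the files before uploading them, or on the server itself if you have shell access.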
Generally this error arises when we send headers after echoing or printing. If this error arises on a specific page, then make sure that page is not echoing anything before calling session_start().
Example of an unexpected error:
 <?php // a whitespace before <?php is also sent as output and triggers the error
session_start();
session_regenerate_id();
//your page content
One more example:
<?php
include 'functions.php';
?> <!-- anything after ?> except a single newline (even this comment) is output and triggers the error -->
<?php
session_start();
session_regenerate_id();
//your page content
Conclusion: Do not output any character before calling session_start() or header(), not even a whitespace or a newline.

Translating longer texts (view and email templates) with gettext

I'm developing a multilingual PHP web application, and I've got long(-ish) texts that I need to translate with gettext. These are email templates (usually short, but still several lines) and parts of view templates (longer descriptive blocks of text). These texts would include some simple HTML (things like bold/italic for emphasis, probably a link here or there). The templates are PHP scripts whose output is captured.
The problem is that gettext seems very clumsy for handling longer texts. Longer texts would generally have more changes over time than short texts — I can either change the msgid and make sure to update it in all translations (could be lots of work and very error-prone when the msgid is long), or I can keep the msgid unchanged and modify only the translations (which would leave misleading outdated texts in the templates). Also, I've seen advice against including HTML in gettext strings, but avoiding it would break a single natural piece of text into lots of chunks, which will be an even bigger nightmare to translate and reassemble, and I've also seen advice against unnecessary splitting of gettext strings into separate msgids.
The other approach I see is to ignore gettext altogether for these longer texts, and to separate those blocks in external subtemplates for each locale, and just include the one for the current locale. The disadvantage is that I'm separating the translation effort between gettext .po files and separate templates located in a completely different location.
Since this application will be used as a starting point for other applications in the future, I'm trying to come up with the best approach for the long term. I need some advice for best practices in such scenarios. How have you implemented similar cases? What turned out to work and what turned out a bad idea?
Here's the workflow I used on a very heavily trafficked site that had several dozen long-ish blocks of styled textual content, translated into six languages:
Pick a text-based markup language (we used Markdown)
For long strings, use fixed message IDs like "About_page_intro_markdown" that:
describes the intent of the text
makes clear that it will be interpreted in markdown format
Have our app render "*_markdown" strings appropriately, making sure to allow only a few safe HTML tags
Build a tool for translators that:
shows them their Markdown rendered in realtime (sort of like the Markdown dingus)
makes it easy for them to see the now-authoritative base language translation of the text (since that's no longer in the msgid)
Teach translators how to use the new workflow
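The "allow only a few safe HTML tags" step can be applied to the HTML produced by whatever Markdown renderer you use. Below is a minimal Python sketch using the standard html.parser module. The tag whitelist, the `sanitize` name and the href check are assumptions made for illustration. Disallowed tags and all attributes except a safe href are dropped, and their text is kept but escaped.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: adjust to what your translators actually need.
ALLOWED_TAGS = {"p", "em", "strong", "a", "ul", "ol", "li", "h2", "h3"}
SAFE_PREFIXES = ("http://", "https://", "/")


class _Whitelist(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text is still emitted
        extra = ""
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(SAFE_PREFIXES):  # blocks javascript: etc.
                extra = f' href="{escape(href)}"'
        self.out.append(f"<{tag}{extra}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(html_text: str) -> str:
    parser = _Whitelist()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

In this scheme the app would call `sanitize(render_markdown(_("About_page_intro_markdown")))`, where `render_markdown` stands for your Markdown library.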
Pros of this workflow:
Message IDs don't change all the time
Because translators are editing in a safe higher-level syntax, hard to mess up HTML
Non-technical translators found it very easy to write in Markdown, vs. HTML
Cons of this workflow:
Having static unchanging message IDs means changes in the text need to be transmitted out of band (which we'd do anyway, as long text can raise questions about tone or emphasis)
I'm very happy with the way this workflow operated for our website, and would absolutely recommend it, and use it again. It took a couple of days to get started, but it was easy to build, train, and launch.
Hope this helps, and good luck with your project.
I just had this particular problem, and I believe I solved it in an elegant way.
The problem: we wanted to use gettext in PHP and use primary-language strings as translation keys. However, for large blocks of HTML (with h1, h2, p, a, etc.) I'd either have to:
Create a translation for each tag with content.
or
Put the entire block with tags in one translation.
Neither of those options appealed to me, so this is what I did:
Keep simple strings ("OK","Add","Confirm","My Awesome App") as regular Gettext .po entries, with the original text as the key
Write content (large text blocks) in markdown, and keep them in files.
Example files would be /homepage/content.md (primary / source text), /homepage/content.da-DK.md, /homepage/content.de-DE.md
Write a class that fetches the content files (for the current locale) and parses it. I then used it like:
<?=Template::getContent("homepage/content")?>
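The locale lookup behind a call like Template::getContent could work roughly as follows. The author's class isn't shown, so this is a Python sketch under assumptions. Translated files are assumed to be named <name>.<locale>.md and to sit next to the source-language <name>.md. `content_path` is a hypothetical helper that falls back to the source file when no translation exists.

```python
from pathlib import Path


def content_path(root: Path, name: str, locale: str) -> Path:
    """Pick <name>.<locale>.md if it exists, else the source-language <name>.md."""
    translated = root / f"{name}.{locale}.md"
    if translated.is_file():
        return translated
    # No translation for this locale yet: use the primary/source text.
    return root / f"{name}.md"
```

Falling back to the source text keeps a page working while its translation is still pending.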
However, what about dynamic large text? Simple. Use a templating engine. I decided on Smarty, and used it in my Template class.
I could now use templating logic.. within markdown! How awesome is that?!
Then came the tricky part..
For content to look good, at times you need to structure your HTML differently. Consider a campaign area with 3 "feature boxes" beneath it. The easy solution: Have a file for the campaign area, and one for each of the 3 boxes.
But I could do better than that.
I wrote a quick block parser, so I could write all the content in one file and then render each block separately.
Example file:
[block campaign]
Buy this now!
=============
Blaaaah... And a smarty tag: {$cool}
[/block]
[block feature 1]
Feature 1
---------
asdasd you get it..
[/block]
[block feature 2] ...
And this is how I would render them in the markup:
<?php
// At the top of the document...
// Class handles locale. :)
$template = Template::getContent("homepage/content", [
"cool" => "Smarty variable! AWESOME!"
]);
?>
...
<title><?=_("My Awesome App")?></title>
...
<div class="hero">
<!-- Template data already processed! :) -->
<?=$template->renderBlock("campaign")?>
</div>
<div class="featurebox">
<?=$template->renderBlock("feature 1")?>
</div>
<div class="featurebox">
<?=$template->renderBlock("feature 2")?>
</div>
I'm afraid I can't provide any source code, as this was for a company project, but I hope you get the idea.
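Since the original code isn't available, here is a minimal Python sketch of such a block parser. It assumes the [block name] ... [/block] syntax from the example file above, with each marker on its own line and LF line endings. `parse_blocks` is a hypothetical name. The Markdown and Smarty rendering of each block would happen afterwards.

```python
import re

# One block: "[block <name>]" on its own line, the body, then "[/block]".
BLOCK_RE = re.compile(r"^\[block ([^\]]+)\]\n(.*?)^\[/block\]",
                      re.DOTALL | re.MULTILINE)


def parse_blocks(source: str) -> dict[str, str]:
    """Map each block name (e.g. "feature 1") to its raw Markdown body."""
    return {name.strip(): body for name, body in BLOCK_RE.findall(source)}
```

A renderBlock("campaign") method would then look up the "campaign" entry and render only that body.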
gettext wasn't really designed for translating large pieces of text.
fwiw I've included basic HTML (strong, a, etc) in gettext strings as I was confident our translators knew what they were doing (mostly right) and that the translations would be well tested.
I've tried the approach of breaking up the text into roughly one string per paragraph. Whenever one of those strings changed, we had to wait for translations before releasing a new version, since a single paragraph of English in the middle of otherwise translated text looks odd. This slowed us down. On the plus side, it's easy for translators to see which part of the text has changed. This approach worked well for the one application I've tried it with.
Splitting some text out into external locations also worked, but it caused management overhead. Rather than just a .po file or two, there was a whole bunch of other text that had to be manually compared to the English version and updated accordingly. This is doable if you remember to give your translators notes explaining where and how the English version changed.
I'm still not sold on either approach myself.