How can I convert a Zend localization/translation array to gettext? - zend-framework

My multi-lingual site already successfully uses the "array" method of Zend translations.
I want to convert from that method to the "gettext" method because I've read that gettext is superior.
I've tried using http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/php2po.html but can't get it to work.
I think it's not meant to handle Zend arrays as the input.
My Zend file (which works) looks like this:
<?php
return array(
'choose your favorite stores' => 'Choose your %1$sfavorite stores%2$s',
'P.S. If you ever have question' => 'P.S. If you ever have questions, %1$semail us%2$s any time.',
'You can also find quick answer' => 'You can also find quick answers on our %1$sHelp page%2$s.',
'Earn X cash' => '%1$sEarn 1-30%% cash back%2$s, get money-saving coupons, and find the best price on every purchase at %3$s2,500+ stores%4$s.'
);
(But it's much longer, and I have multiple languages, each in their own PHP file.)

With the snippet you have given the conversion works for me.
$ php2po en.php en.po -t en.php
processing 1 files...
[###########################################] 100%
$ cat en.po
#. extracted from en.php, en.php
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2012-12-19 10:08+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Translate Toolkit 1.9.1-pre\n"
#: return+array%28-%3E%27choose+your+favorite+stores%27
msgid "Choose your %1$sfavorite stores%2$s"
msgstr "Choose your %1$sfavorite stores%2$s"
#: return+array%28-%3E%27P.S.+If+you+ever+have+question%27
msgid "P.S. If you ever have questions, %1$semail us%2$s any time."
msgstr "P.S. If you ever have questions, %1$semail us%2$s any time."
#: return+array%28-%3E%27You+can+also+find+quick+answer%27
msgid "You can also find quick answers on our %1$sHelp page%2$s."
msgstr "You can also find quick answers on our %1$sHelp page%2$s."
I'm using a Translate Toolkit version from git master, maybe you should try that.
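If php2po is not an option, the shape of the output above is simple enough to produce by hand. The following is a minimal Python sketch, not what php2po does internally. It assumes the PHP array has already been loaded into a Python dict by some other means; the function names `po_quote` and `array_to_po` are made up for this illustration. It writes each array key as an extracted comment and leaves msgstr empty, so the result is a template rather than a finished translation.

```python
def po_quote(text):
    """Escape a string for use inside a double-quoted PO literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def array_to_po(messages):
    """Render {array key: source-language text} as PO entries.

    The array key is kept as an extracted comment ("#."), the text
    becomes the msgid, and msgstr is left empty for the translator.
    """
    entries = []
    for key, text in messages.items():
        entries.append(
            "#. {}\n".format(key)
            + 'msgid "{}"\n'.format(po_quote(text))
            + 'msgstr ""\n'
        )
    return "\n".join(entries)


# Hypothetical sample mirroring the first entry of the question's array.
sample = {
    "choose your favorite stores": "Choose your %1$sfavorite stores%2$s",
}
print(array_to_po(sample))
```

For the non-English files, the same loop could fill msgstr from that language's array while taking msgid from the English one, assuming both arrays share the same keys.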

Related

Localization via Locale::Maketext::Simple always falls back to default instead of .po entry

In a perl Module I want to use https://metacpan.org/pod/Locale::Maketext::Simple to convert strings to different languages.
My .po files are located under /opt/x/languages, e.g. /opt/x/languages/en.po.
In my module I'm using the following header:
use Locale::Maketext::Simple (
Path => '/opt/x/languages',
Style => 'maketext'
);
loc_lang('en');
An entry in the .po files looks like this:
msgid "string [_1] to be converted"
msgstr "string [_1] is converted"
and the check via console with msgfmt -c en.po throws no errors so far.
But when I'm converting a string with loc() like loc("string [_1] to be converted", "xy") it gives me the output of "string xy to be converted" instead of "string xy is converted" as I would expect it. This looks to me like the .po files are not loaded correctly.
How can I check which .po files are found during maketext instantiation? Or am I mixing things up and there's a more general mistake?
Edit 1:
Thanks for the comments, but it still does not work.
I've checked the files with https://poedit.net/ and created the corresponding .mo files (currently for de and en) with this tool as well. They are located next to the .po files (inside /opt/x/languages).
For completeness, my header looks like this:
# MY OWN LANGUAGE FILE (DE)
# 06-2019 by me
#
msgid ""
msgstr ""
"Project-Id-Version: 1.0.0\n"
"POT-Creation-Date: 2019-06-01 00:00+0100\n"
"PO-Revision-Date: 2019-06-02 00:00+0100\n"
"Last-Translator: thatsme <me@me.de>\n"
"Language-Team: unknown\n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.2.3\n"
msgid "string [_1] to be converted"
msgstr "string [_1] is converted"
After some more digging and testing I finally found the issue for this behaviour, so here's my solution so far. Hope this may help others, because you rarely find any good documentation on this topic:
Add libraries
I added the library https://metacpan.org/pod/Locale::Maketext::Simple as stated above, but forgot to add https://metacpan.org/pod/Locale::Maketext::Lexicon.
This took me quite long to see, because there were no exceptions or errors thrown, just... nothing.
In the Maketext::Simple documentation it says
If Locale::Maketext::Lexicon is not present, it implements a minimal localization function by simply interpolating [_1] with the first argument, [_2] with the second, etc.
At first glance this reads as if .po files are loaded even without Locale::Maketext::Lexicon, but in that case loc() simply replaces the placeholders and never looks at the .po files.
Other issues:
I then discovered that all strings were translated except for the ones with placeholders like [_1]. I could not find a reason for this, but I moved to
Style => 'gettext'
and replaced all [_1], [_2]... with %1, %2... - that works like a charm.
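The placeholder rewrite can be done mechanically. Here is a small Python sketch, assuming the .po files only use plain positional placeholders such as [_1]; the helper name `maketext_to_gettext` is made up, and other bracket notation (for example [quant,_1,...]) is deliberately left untouched and would need manual attention.

```python
import re

# Matches plain maketext positional placeholders: [_1], [_2], ...
_PLACEHOLDER = re.compile(r"\[_(\d+)\]")


def maketext_to_gettext(text):
    """Rewrite [_N] placeholders to the %N form used with Style => 'gettext'."""
    return _PLACEHOLDER.sub(r"%\1", text)


print(maketext_to_gettext('msgid "string [_1] to be converted"'))
print(maketext_to_gettext('msgstr "string [_1] is converted"'))
```

Applied to every line of a .po file, this changes msgid and msgstr together, so the loc() calls in the Perl code have to be updated to the %1 form as well.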

How to remove quotes in my product description string?

I'm using OSCommerce for my online store and I'm currently optimizing my product page for rich snippets.
Some of my Google Indexed pages are being marked as "Failed" by Google due to double quotes in the description field.
I'm using an existing code which strips the html coding and truncates anything after 197 characters.
<?php echo substr(trim(preg_replace('/\s\s+/', ' ', strip_tags($product_info['products_description']))), 0, 197); ?>
How can I include the removal of quotes in that code so that the following string:
<strong>This product is the perfect "fit"</strong>
becomes:
This product is the perfect fit
This happened to me too. Try using:
tep_output_string($product_info['products_description'])
" becomes &quot;
We can try using preg_replace_callback here:
$input = "SOME TEXT HERE <strong>This product is the perfect \"fit\"</strong> SOME MORE TEXT HERE";
$output = preg_replace_callback(
"/<([^>]+)>(.*?)<\/\\1>/",
function($m) {
return str_replace("\"", "", $m[2]);
},
$input);
echo $output;
This prints:
SOME TEXT HERE This product is the perfect fit SOME MORE TEXT HERE
The regex pattern used does the following:
<([^>]+)> match an opening HTML tag, and capture the tag name
(.*?) then match and capture the content inside the tag
<\/\\1> finally match the same closing tag
Then, we use a callback function which does an additional replacement to strip off all double quotes.
Note that in general using regex against HTML is bad practice. But, if your text only has single level/occasional HTML tags, then the solution I gave above might be viable.
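As a language-neutral illustration of the parser-based alternative hinted at above, here is a Python sketch of the same pipeline (strip tags, drop double quotes, collapse whitespace, truncate). It is not osCommerce code; the names `_TextOnly` and `plain_description` are made up, and the 197-character limit is taken from the question.

```python
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_description(html_text, limit=197):
    """Strip tags, remove double quotes, collapse whitespace, truncate."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts).replace('"', "")
    text = re.sub(r"\s+", " ", text).strip()
    return text[:limit]


print(plain_description('<strong>This product is the perfect "fit"</strong>'))
```

Because entities are decoded before the quotes are removed, a description that stores the quote as &quot; is handled the same way as one that stores a literal ".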

php gettext include string with phpcode

I'm trying to use gettext to translate the strings on my site.
gettext doesn't have problem detecting strings such as
<? echo _("Donations"); ?>
or
<? echo _("Donate to this site");?>
but of course we usually also have code like this on our site:
<? echo _("$siteName was developed with one thing in mind"); ?>
Of course in the website, the $siteName is displayed correctly as
My Website was developed with one thing in mind
if we put
$siteName = "My Website";
previously.
My problem is that I'm using Poedit to extract all the strings in my code that need to be translated, and Poedit doesn't seem to extract strings that contain PHP variables like the one above. How do I get Poedit to extract those strings too? Or is there another tool I should use?
One possibility is to use sprintf. Just make sure you keep the percent (%) in the poedit string!
echo sprintf( _("This %s can be translated "), 'string');
Or when using multiple variables
echo vsprintf( _("This %s can be %s"), ['string', 'translated']);
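The reason the placeholder approach works is that the catalog lookup happens on the constant string, before any value is filled in. The Python sketch below illustrates this with a plain dict standing in for a compiled catalog; `CATALOG`, `translate` and the German sample text are all made up for the illustration.

```python
# Stand-in for a compiled catalog; the German text is a made-up sample.
CATALOG = {
    "%s was developed with one thing in mind":
        "%s wurde mit einem Ziel entwickelt",
}


def translate(msgid):
    """Return the translation, or the msgid itself when none exists."""
    return CATALOG.get(msgid, msgid)


site_name = "My Website"

# Constant msgid: found in the catalog, then filled in.
good = translate("%s was developed with one thing in mind") % site_name

# Interpolated first: the lookup key already contains "My Website",
# a string no extractor could have placed in the catalog.
bad = translate("%s was developed with one thing in mind" % site_name)

print(good)
print(bad)
```

The second call silently falls back to the untranslated text, which is the same effect an interpolated PHP string has at runtime even if the extractor had picked it up.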

Why is msgid_plural necessary in gettext translation files?

I've read the GNU Gettext manual about Translating plural forms and see its example:
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika je uklonjena"
msgstr[1] "%d datoteke uklonjenih"
msgstr[2] "%d slika uklonjenih"
Why is msgid_plural different from msgid, and doesn't that defeat the purpose of having translations be aware of plural forms?
I'd think that I could do something like this (for English):
#, c-format
msgid "X geese"
msgstr[0] "%d goose"
msgstr[1] "%d geese"
#, c-format
msgid "sentence_about_geese_at_the_lake"
msgstr[0] "There is one goose at the lake."
msgstr[1] "There are %d geese at the lake."
(using just one msgid).
Then in my code, I'd have something like:
<?php echo $this->translate('X geese', $numberA); ?>
<?php echo $this->translate('sentence_about_geese_at_the_lake', $numberB); ?>
If $numberA is 3, it would say "3 geese."
If $numberB is 0, the next line would say "There are 0 geese at the lake."
(because for English the rule is n != 1, so the plural form is used for 0 and for any number greater than 1).
It seems redundant for me to be required to specify 2 msgids for the same collection of phrases.
Thanks for your help!
One of the ideas behind gettext is that the msgid is extracted from source files to create POT files, which are used as base for translations stored in PO files, and later compiled to MO files. A msgid is also used if no suitable translation is found.
A msgid is not a "key" that is non-readable by users; it is a real phrase that can be used in the program. So when in your code you request a translation for a plural (pseudocode here):
ngettext("One file removed", "%d files removed", file_count)
...these two strings will be used a) to extract messages from the source code; these messages will serve as guide for translators b) as the default strings when no suitable translation is found for the current locale.
That's why a plural string has two msgid: to show how they are defined in the source program (for translators) and to be used as default (when no translation exists).
In other localization systems, like Android String Resources or Rails YAML files, it works the way you imagined: the equivalent of a msgid is a single "key" even for plurals. But then the real phrase is not used in the source code, and defining translations is a two-step process even for the original language.
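The fallback role of the two msgids can be seen with Python's standard gettext module, used here only as an illustration. `NullTranslations` is a catalog with no translations at all, so `ngettext` has nothing to return except one of the two source strings; the helper name `removed_message` is made up.

```python
import gettext

# A catalog with no translations: every lookup falls back to the msgids.
untranslated = gettext.NullTranslations()


def removed_message(file_count):
    """Pick singular or plural source text, then fill in the count."""
    template = untranslated.ngettext(
        "One file removed", "%d files removed", file_count)
    # The singular msgid has no %d, so only format when it is present.
    return template % file_count if "%d" in template else template


for n in (0, 1, 3):
    print(removed_message(n))
```

With a single key such as "X geese" there would be nothing readable to fall back to, which is why gettext asks for both strings in the source code.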

Multiple "msgid" for an "msgstr" in gettext

Is it possible to make two or more msgids matching one msgstr?
For example, both ('list.empty') and ('list.null') return "There is no any objects yet."
If I write this way in po file:
msgid "list.empty"
msgid "list.null"
msgstr "There is no any objects yet."
it just fails with the error "missing 'msgstr'".
However,
msgid "list.empty"
msgstr "There is no any objects yet."
msgid "list.null"
msgstr "There is no any objects yet."
looks and works fine, but it is fragile: if I change one msgstr and not the other, the two keys return different results.
Does anyone have any better hacks?
You are approaching gettext in the wrong way. Here is how it works:
msgid is required for each entry
msgctxt is optional and is used to differentiate between two msgid records with same content that may have different translations.
(msgid, msgctxt) is the unique key for the dictionary, if msgctxt is missing you can consider it null.
You should read the gettext documentation before implementing as it's not always straightforward.
In your case, this is how you are supposed to implement it:
msgctxt "list.empty"
msgid "There is no any objects yet."
msgstr ""
msgctxt "list.null"
msgid "There is no any objects yet."
msgstr ""
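The "(msgid, msgctxt) is the unique key" rule can be pictured as a dictionary keyed by a pair. The Python sketch below is only a model of that rule, not how any gettext implementation stores its data; `CATALOG`, this local `pgettext` and the German sample texts are made up.

```python
# Hypothetical catalog keyed by (msgctxt, msgid); translations are samples.
CATALOG = {
    ("list.empty", "There is no any objects yet."): "Die Liste ist leer.",
    ("list.null", "There is no any objects yet."): "Es gibt keine Liste.",
}


def pgettext(context, msgid):
    """Look up by (context, msgid); fall back to the msgid itself."""
    return CATALOG.get((context, msgid), msgid)


print(pgettext("list.empty", "There is no any objects yet."))
print(pgettext("list.null", "There is no any objects yet."))
print(pgettext("list.other", "There is no any objects yet."))
```

The same msgid can carry a different translation per context, and a context with no entry falls back to the readable msgid rather than to an opaque key.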