INTLTOOL_EXTRACT breaks translatable lines - gettext

I don't know why, but the line in my source code and the one in the pot file are not the same. Source code:
#include <libintl.h>
#define _(String) gettext(String)
/* more code */
printf (_("Error while saving file in %s:\n\n%s"), ...);
In the pot file, it looks like this:
#: ../src/main.c:72
#, c-format
msgid ""
"Error while saving file in %s:\n"
"\n"
"%s"
msgstr ""
Question: Why is the line broken, and how can I avoid it? The expected output is:
#: ../src/main.c:72
#, c-format
msgid "Error while saving file in %s:\n\n%s"
msgstr ""
PS: I'm using autotools, so everything is generated with gettextize and intltoolize.
Thanks

The two forms are equivalent. Different pot file writers format their pot files differently, and this is just the way that gettext/intltool does it.
I don't know how to avoid the line breaks, but I wouldn't waste time trying.
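To check that the two forms really are equivalent, you can join the quoted fragments the way a PO reader does. This is only a minimal Python sketch: the helper name po_string is made up for illustration, and it relies on Python's own string-literal parser, which happens to handle the simple escapes used here. It is not a full PO parser.

```python
import ast


def po_string(fragments):
    """Join the quoted fragments of one msgid/msgstr into a single string."""
    # Each fragment is a double-quoted literal; adjacent fragments concatenate.
    return "".join(ast.literal_eval(f.strip()) for f in fragments if f.strip())


# The msgid as written by xgettext (split after each \n) ...
multi_line = ['""', '"Error while saving file in %s:\\n"', '"\\n"', '"%s"']
# ... and the msgid as the question expected it.
single_line = ['"Error while saving file in %s:\\n\\n%s"']

print(po_string(multi_line) == po_string(single_line))
```

Both spellings decode to the same msgid, so translations are looked up identically.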


Localization via Locale::Maketext::Simple always falls back to default instead of .po entry

In a perl Module I want to use https://metacpan.org/pod/Locale::Maketext::Simple to convert strings to different languages.
My .po files are located under /opt/x/languages, e.g. /opt/x/languages/en.po.
In my module I'm using the following header:
use Locale::Maketext::Simple (
Path => '/opt/x/languages',
Style => 'maketext'
);
loc_lang('en');
An entry in the .po files looks like this:
msgid "string [_1] to be converted"
msgstr "string [_1] is converted"
and the check via console with msgfmt -c en.po throws no errors so far.
But when I'm converting a string with loc() like loc("string [_1] to be converted", "xy") it gives me the output of "string xy to be converted" instead of "string xy is converted" as I would expect it. This looks to me like the .po files are not loaded correctly.
How can I check which .po files are found during maketext instantiation? Or am I mixing things up and there's a general mistake?
Edit 1:
Thanks for the comments, but it still does not work.
I've checked the files with https://poedit.net/ and created the corresponding .mo files (currently for de and en) with this tool as well. They are located next to the .po files (inside /opt/x/languages).
For completeness, my header looks like this:
# MY OWN LANGUAGE FILE (DE)
# 06-2019 by me
#
msgid ""
msgstr ""
"Project-Id-Version: 1.0.0\n"
"POT-Creation-Date: 2019-06-01 00:00+0100\n"
"PO-Revision-Date: 2019-06-02 00:00+0100\n"
"Last-Translator: thatsme <me@me.de>\n"
"Language-Team: unknown\n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.2.3\n"
msgid "string [_1] to be converted"
msgstr "string [_1] is converted"
After some more digging and testing I finally found the issue for this behaviour, so here's my solution so far. Hope this may help others, because you rarely find any good documentation on this topic:
Add libraries
I added the library https://metacpan.org/pod/Locale::Maketext::Simple as stated above, but forgot to add https://metacpan.org/pod/Locale::Maketext::Lexicon.
This took me quite long to see, because there were no exceptions or errors thrown, just... nothing.
In the Maketext::Simple documentation it says
If Locale::Maketext::Lexicon is not present, it implements a minimal localization function by simply interpolating [_1] with the first argument, [_2] with the second, etc.
which at first glance reads as if .po files are loaded even without Locale::Maketext::Lexicon, but in fact it only replaces the placeholders.
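To illustrate why the output looked "almost right", here is a rough Python sketch of what such a minimal fallback does, based only on the sentence quoted from the documentation. The function name fallback_loc is hypothetical and this is not the module's actual code.

```python
import re


def fallback_loc(template, *args):
    """Replace [_1], [_2], ... with the positional arguments; no lookup at all."""
    def substitute(match):
        index = int(match.group(1)) - 1  # [_1] refers to the first argument
        return str(args[index])

    return re.sub(r"\[_(\d+)\]", substitute, template)


print(fallback_loc("string [_1] to be converted", "xy"))
```

The msgid comes back with the placeholder filled in, which is exactly the symptom described in the question: no translation, but no error either.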
Other issues:
I then discovered that all strings are translated, except for the ones with placeholders like [_1]. I could not find a reason for this, but I moved to
Style => 'gettext'
and replaced all [_1], [_2]... with %1, %2... - that works like a charm.
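If there are many entries, the placeholder replacement can be scripted. The following is a small Python sketch under the assumption that every placeholder has the plain [_N] form; the helper name bracket_to_gettext is made up for illustration.

```python
import re


def bracket_to_gettext(text):
    """Rewrite maketext-style [_1] placeholders as gettext-style %1."""
    return re.sub(r"\[_(\d+)\]", r"%\1", text)


print(bracket_to_gettext('msgid "string [_1] to be converted"'))
```

Run it over both the msgid and msgstr lines (and over the strings passed to loc() in the code) so that the keys still match.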

How to assign particular context to --keyword for proper_name?

When using the xgettext tool it is possible to automatically add commenting to assist translators with regards to proper names (as documented).
The documentation suggests to add the following to the command line:
--keyword='proper_name:1,"This is a proper name. See the gettext manual, section Names."'
Which results in proper names being extracted to the .pot file like this:
#. This is a proper name. See the gettext manual, section Names.
#: ../Foo.cpp:18
msgid "Bob"
msgstr ""
The problem with this is that no particular context has been defined for that string. Ideally, the proper name would be extracted like this:
#. This is a proper name. See the gettext manual, section Names.
#: ../Foo.cpp:18
msgctxt "Proper Name"
msgid "Bob"
msgstr ""
I've tried the following but with no success:
# Hoping that 0 would be the function name 'proper_name'.
--keyword='proper_name:0c,1,"This is a proper name. See the gettext manual, section Names."'
# Hoping that -1 would be the function name 'proper_name'.
--keyword='proper_name:-1c,1,"This is a proper name. See the gettext manual, section Names."'
# Hoping that the string would be used as the context.
--keyword='proper_name:"Proper Name"c,1,"This is a proper name. See the gettext manual, section Names."'
# Hoping that the string would be used as the context.
--keyword='proper_name:c"Proper Name",1,"This is a proper name. See the gettext manual, section Names."'
Is there a way to force a particular msgctxt to be used for all strings extracted with a keyword (such as proper_name from the example above)?
If there is no option to achieve this with xgettext as-is then I considered perhaps using the following:
--keyword='proper_name:1,"<PROPERNAME>"'
Resulting with:
#. <PROPERNAME>
#: ../Foo.cpp:18
msgid "Bob"
msgstr ""
The problem then becomes how to automatically translate all occurrences of this in the resulting .pot file into the following:
#. This is a proper name. See the gettext manual, section Names.
#: ../Foo.cpp:18
msgctxt "Proper Name"
msgid "Bob"
msgstr ""
If you want to extract a message context, it has to be part of the argument list. And the numerical part in "Nc" has to be a positive integer. All your attempts with 0, -1 are fruitless, sorry.
The signature of your function must look like this:
#define PROPER_NAME "Proper Name"
const char *proper_name(const char *ctx, const char *name);
And then call it like this:
proper_name(PROPER_NAME, "Bob");
That repeats PROPER_NAME all over the code, but it's the only way to get it into the message context.
Maybe file a feature request?
There is also a hack that achieves the same without changing your source code. I assume that you're using C and the standard Makefile (but you can do the same in other languages):
Copy the file POTFILES to POTFILES-proper-names and add a line ./proper_names.pot to POTFILES.in.
Then you have to create proper_names.pot:
xgettext --files-from=POTFILES-proper-names \
--keyword='' \
--keyword='proper_name:1,"Your comment ..."' \
--output=proper_names.pox
This will now only contain the entries that were marked with "proper_name()". Now add the context:
msg-add-content proper_names.pox "Proper Name" >proper_names.pot
rm proper_names.pox
Unfortunately, there is no program called "msg-add-content". Grab one of the zillion po-parsers out there and write one yourself (or take mine at the end of this post).
Now update your PACKAGE.pot as usual. Since "proper_names.pot" is an input file for the main xgettext run, all your extracted proper names, with the context added, end up in your pot file (and their context will be used).
Short of another script for adding a message context to all your entries in a .pot file, use this one:
#! /usr/bin/env perl
use strict;
use Locale::PO;
die "usage: $0 POFILE CONTEXT" unless @ARGV == 2;
my ($input, $context) = @ARGV;
my $entries = Locale::PO->load_file_asarray($input) or die "$input: failure";
foreach my $entry (@$entries) {
$entry->msgctxt($context) unless '""' eq $entry->msgid;
print $entry->dump;
}
You have to install the Perl library "Locale::PO" for it, either with "sudo cpan install Locale::PO" or use the pre-built version that your vendor may have.
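If installing a Perl module is not an option, roughly the same effect can be had with a few lines of plain Python. This is only a sketch under simplifying assumptions: it works line by line instead of using a real PO parser, it treats a msgid "" that is immediately followed by msgstr as the header entry, and it assumes the context contains no characters that need escaping. The function name add_context is made up for illustration.

```python
def add_context(pot_text, context):
    """Insert a msgctxt line before every msgid except the header entry."""
    lines = pot_text.splitlines()
    out = []
    for i, line in enumerate(lines):
        if line.startswith("msgid "):
            following = lines[i + 1] if i + 1 < len(lines) else ""
            # The header is the empty msgid directly followed by its msgstr.
            is_header = line == 'msgid ""' and following.startswith("msgstr")
            if not is_header:
                out.append('msgctxt "%s"' % context)
        out.append(line)
    return "\n".join(out) + "\n"


sample = '#. <PROPERNAME>\n#: ../Foo.cpp:18\nmsgid "Bob"\nmsgstr ""\n'
print(add_context(sample, "Proper Name"), end="")
```

Lines starting with msgid_plural are left alone because the check requires a space after msgid.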

Why is msgid_plural necessary in gettext translation files?

I've read the GNU Gettext manual about Translating plural forms and see its example:
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika je uklonjena"
msgstr[1] "%d datoteke uklonjenih"
msgstr[2] "%d slika uklonjenih"
Why is msgid_plural different from msgid, and doesn't that defeat the purpose of having translations be aware of plural forms?
I'd think that I could do something like this (for English):
#, c-format
msgid "X geese"
msgstr[0] "%d goose"
msgstr[1] "%d geese"
#, c-format
msgid "sentence_about_geese_at_the_lake"
msgstr[0] "There is one goose at the lake."
msgstr[1] "There are %d geese at the lake."
(using just one msgid).
Then in my code, I'd have something like:
<?php echo $this->translate('X geese', $numberA); ?>
<?php echo $this->translate('sentence_about_geese_at_the_lake', $numberB); ?>
If $numberA is 3, it would say "3 geese."
If $numberB is 0, the next line would say "There are 0 geese at the lake."
(because for English, the rule is (n != 1), so plural is used for any number that equals 0 or greater than 1).
It seems redundant for me to be required to specify 2 msgids for the same collection of phrases.
Thanks for your help!
One of the ideas behind gettext is that the msgid is extracted from source files to create POT files, which are used as base for translations stored in PO files, and later compiled to MO files. A msgid is also used if no suitable translation is found.
A msgid is not a "key" that is non-readable by users; it is a real phrase that can be used in the program. So when in your code you request a translation for a plural (pseudocode here):
ngettext("One file removed", "%d files removed", file_count)
...these two strings will be used (a) to extract messages from the source code, where they serve as a guide for translators, and (b) as the default strings when no suitable translation is found for the current locale.
That's why a plural string has two msgids: to show how they are defined in the source program (for translators) and to be used as defaults (when no translation exists).
In other localization systems, like Android String Resources or Rails YAML files, it works like you imagined: the equivalent of a msgid is a single "key" even for plurals. But then the real phrase is not used in the source code, and defining translations is a two-step process even for the original language.
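The fallback role of the two msgids can be seen with Python's standard gettext module, used here purely as an illustration. With no catalog loaded, ngettext simply hands back one of the two source strings. The helper name removed_template is hypothetical.

```python
import gettext

# NullTranslations stands in for "no suitable translation was found".
translations = gettext.NullTranslations()


def removed_template(file_count):
    """Return the message template gettext would fall back to."""
    return translations.ngettext(
        "One file removed", "%d files removed", file_count
    )


for count in (0, 1, 3):
    print(count, "->", removed_template(count))
```

With a single opaque key such as "sentence_about_geese_at_the_lake", the same fallback would show the key itself to the user.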

Perl write to file returns huge weird stacktrace

I have the following problem: when I try to save the file that contains a semicolon in the name it returns a huge and weird stacktrace of the characters on the page. I've tried to escape, to trim and to replace those semicolons, but the result is still the same. I use the following regex:
$value =~ s/([^a-zA-Z0-9_\-.]|;)/uc sprintf("%%%02x",ord($1))/eg;
(I've even added the |; part separately..)
So, when I open the file to write and call the print function it returns lots of weird stuff, like that:
PK!}�3y�[Content_Types].xml ���/�h9\�?�0���cz��:� �s_����o���>�T�� (it is a huge one, this is just a part of it).
Is there any way I could avoid this?
Thank you in advance!
EDIT:
Just interested - what is the PK responsible for in this string? I mean, I can understand that those chars are just the contents of the file, but what is PK? And why does it show the content type?
EDIT 2.0:
I'm uploading the .docx file - when the name doesn't contain the semicolon it works all fine. This is the code for the file saving:
open (QSTR,">", "$dest_file") or die "can't open output file: $qstring_file";
print QSTR $value;
close (QSTR);
EDIT 3.0
This is a .cgi script, that is called after posting some data to the server. It has to save some info about the uploading file to a temp file (name, contents, size) in the manner of key-value pairs. So any file that contains the semicolon causes this error.
EDIT 4.0
Found the cause:
The CGI param function while uploading the params counts semicolon as the delimiter! Is there any way to escape it in the file header?
The PK at the start of the file means that it is a ZIP-compressed file, which is what a docx is.
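This is easy to see for yourself. The Python sketch below builds a tiny in-memory ZIP archive (a stand-in for a real .docx, not an actual Word document) and prints its first bytes.

```python
import io
import zipfile

# Build a one-member archive in memory; a .docx is a ZIP with members
# such as [Content_Types].xml, which is why that name shows up in the dump.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")

data = buffer.getvalue()
print(data[:4])                                # signature of the first entry
print(zipfile.is_zipfile(io.BytesIO(data)))
```

So the "stacktrace" is just the raw bytes of the uploaded archive being printed as text.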
One guess: is ; perhaps not a valid character in file names at the destination?
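Your regexp is more complicated than it needs to be (inside a character class the dot is a literal dot, and ; is already matched by the negated class, so the |; part is redundant):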
$value =~ s/([^a-zA-Z0-9_\-.]|;)/uc sprintf("%%%02x",ord($1))/eg;
Try this:
# replace every invalid char with an underscore
$value =~ s/([^a-zA-Z0-9_\-\.\;])/_/g;
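For comparison, here are both approaches sketched in Python, under the assumption that the semicolon should be treated as an unsafe character too. The function names are made up for illustration, and this is not a drop-in replacement for the CGI script.

```python
import re

# Everything except letters, digits, underscore, dot and hyphen is unsafe.
UNSAFE = re.compile(r"[^a-zA-Z0-9_.\-]")


def percent_encode(value):
    """Replace each unsafe character with %XX (uppercase hex)."""
    return UNSAFE.sub(lambda m: "%%%02X" % ord(m.group(0)), value)


def underscore_unsafe(value):
    """Replace each unsafe character with an underscore."""
    return UNSAFE.sub("_", value)


print(percent_encode("report;v2.docx"))
print(underscore_unsafe("report;v2.docx"))
```

Either way, the name has to be sanitized before the CGI layer splits the parameters on the semicolon, otherwise the damage is already done.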

Localizable.strings corrupted?

I'm trying to add internationalization to my application, and just for testing purposes I added a single line to the file Localizable.strings.
This is my whole file:
"Test locale" = "Test locale"
And when I try run my application I get this error:
Localizable.strings:0: error: validation failed: The data couldn’t be
read because it has been corrupted.
I've tried changing the "Text Encoding" to UTF-16, but that did not resolve it.
If this is your whole file, add a semicolon at the end. Change it to:
"Test locale" = "Test locale";
To get more detailed information you can use the Property List utility from the command line:
plutil -lint <your_strings_file>.strings
The -lint switch checks the syntax. If there is an error, you'll get a line number and more information, and in general better directions on how to fix the issue.
You can verify your Localizable.strings file with this script:
https://github.com/dcordero/Rubustrings
In my case, it was like this:
/* Comment for Very Long Sentence */
"Very Long Sentence Very Long Sentence Very Long Sentence Very Long Sentence " =;
"Very Long Sentence Very Long Sentence Very Long Sentence Very Long Sentence ";
(Notice the ' = ; ' instead of ' = ' at the end of the first line)
In my case it was double quotes inside a string: I needed to add a backslash before each one (\").
I've made a little script that uses plutil to check all the .strings files in a folder.
https://github.com/CarlesEstevadeordal/check_strings
There can be multiple reasons for this:
1. The semicolon is missing at the end.
2. Multiple semicolons at the end.
3. A " within the message, which should be escaped as \".
4. An extra character after the semicolon.
5. Invalid white space in the file.
6. Other invalid characters in the file.
7. Merge conflict markers in the file: <<<<<<< HEAD, ======= and >>>>>>>.
Please note that plutil -lint Localizable.strings returned OK for points 2 and 7!
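Since plutil can miss some of these, a crude line-based check can help narrow things down. The Python sketch below is a heuristic only: it assumes one "key" = "value"; entry per line and single-line comments, which is stricter than what the format allows. The name find_bad_lines is made up for illustration.

```python
import re

# "key" = "value"; with backslash escapes allowed inside the quotes.
ENTRY = re.compile(r'^"(?:[^"\\]|\\.)*"\s*=\s*"(?:[^"\\]|\\.)*";$')


def find_bad_lines(text):
    """Return the 1-based numbers of lines that are not a clean entry."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        is_comment = stripped.startswith("//") or (
            stripped.startswith("/*") and stripped.endswith("*/")
        )
        if not stripped or is_comment:
            continue
        if not ENTRY.match(stripped):
            bad.append(number)
    return bad


sample = (
    '"Test locale" = "Test locale"\n'   # missing semicolon
    '"A" = "B";\n'                      # fine
    '"C" = "D";;\n'                     # double semicolon
    '<<<<<<< HEAD\n'                    # merge conflict marker
    '"E" = "say "hi"";\n'               # unescaped quotes
    '"F" = "say \\"hi\\"";\n'           # escaped quotes, fine
)
print(find_bad_lines(sample))
```

It flags the missing semicolon, the doubled semicolon, the merge marker and the unescaped quotes, including the two cases for which plutil reported OK.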