When using the xgettext tool it is possible to automatically add commenting to assist translators with regards to proper names (as documented).
The documentation suggests to add the following to the command line:
--keyword='proper_name:1,"This is a proper name. See the gettext manual, section Names."'
Which results in proper names being extracted to the .pot file like this:
#. This is a proper name. See the gettext manual, section Names.
#: ../Foo.cpp:18
msgid "Bob"
msgstr ""
The problem with this; is that no particular context has been defined for that string. Here is ideally how the proper name would be extracted:
#. This is a proper name. See the gettext manual, section Names.
#: ../Foo.cpp:18
msgctxt "Proper Name"
msgid "Bob"
msgstr ""
I've tried the following but with no success:
# Hoping that 0 would be the function name 'proper_name'.
--keyword='proper_name:0c,1,"This is a proper name. See the gettext manual, section Names."'
# Hoping that -1 would be the function name 'proper_name'.
--keyword='proper_name:-1c,1,"This is a proper name. See the gettext manual, section Names."'
# Hoping that the string would be used as the context.
--keyword='proper_name:"Proper Name"c,1,"This is a proper name. See the gettext manual, section Names."'
# Hoping that the string would be used as the context.
--keyword='proper_name:c"Proper Name",1,"This is a proper name. See the gettext manual, section Names."'
Is there a way to force a particular msgctxt to be used for all strings extracted with a keyword (such as proper_name from the example above)?
If there is no option to achieve this with xgettext as-is then I considered perhaps using the following:
--keyword='proper_name:1,"<PROPERNAME>"'
Resulting with:
#. <PROPERNAME>
#: ../Foo.cpp:18
msgid "Bob"
msgstr ""
The problem then becomes; how to automatically translate all occurrences of this in the resulting .pot file into the following:
#. This is a proper name. See the gettext manual, section Names.
#: ../Foo.cpp:18
msgctxt "Proper Name"
msgid "Bob"
msgstr ""
If you want to extract a message context, it has to be part of the argument list. And the numerical part in "Nc" has to be a positive integer. All your attempts with 0, -1 are fruitless, sorry.
The signature of your function must look like this:
#define PROPER_NAME "Proper Name"
const char *proper_name(const char *ctx, const char *name);
And then call it like this:
proper_name(PROPER_NAME, "Bob");
That repeats PROPER_NAME all over the code, but it's the only way to get it into the message context.
Maybe file a feature request?
There is also a hack that achieves the same without changing your source code. I assume that you're using C and the standard Makefile (but you can do the same in other languages):
Copy the file POTFILES to POTFILES-proper-names and add a line ./proper_names.pot to POTFILES.in.
Then you have to create proper_names.pot:
xgettext --files-from=POTFILES-proper-names \
--keyword='' \
--keyword='proper_names:1:"Your comment ..."' \
--output=proper_names.pox
This will now only contain the entries that were maked with "proper_names()". Now add the context:
msg-add-content proper_names.pox "Proper Name" >proper_names.pot
rm proper_names.pot
Unfortunately, there is no program called "msg-add-content". Grab one of the zillion po-parsers out there, and write one yourself (or take mine at the end of this post).
Now, update your PACKAGE.pot as usual. Since "proper_names.pox" is an input file for the main xgettext run, all your extracted proper names with the context added, are added to your pot file (and their context will be used).
Short of another script for adding a message context to all your entries in a .pot file, use this one:
#! /usr/bin/env perl
use strict;
use Locale::PO;
die "usage: $0 POFILE CONTEXT" unless #ARGV == 2;
my ($input, $context) = #ARGV;
my $entries = Locale::PO->load_file_asarray($input) or die "$input: failure";
foreach my $entry (#$entries) {
$entry->msgctxt($context) unless '""' eq $entry->msgid;
print $entry->dump;
}
You have to install the Perl library "Locale::PO" for it, either with "sudo cpan install Locale::PO" or use the pre-built version that your vendor may have.
Related
I am trying to identify misspelled words with Aspell via Perl. I am working on a Linux server without administrator privileges which means I have access to Perl and Aspell but not, for example, Text::Aspell which is a Perl interface for Aspell.
I want to do the very simple task of passing a list of words to Aspell and having it return the words that are misspelled. If the words I want to check are "dad word lkjlkjlkj" I can do this through the command line with the following commands:
aspell list
dad word lkjlkjlkj
Aspell requires CTRL + D at the end to submit the word list. It would then return "lkjlkjlkj", as this isn't in the dictionary.
In order to do the exact same thing, but submitted via Perl (because I need to do this for thousands of documents) I have tried:
my $list = q(dad word lkjlkjlkj):
my #arguments = ("aspell list", $list, "^D");
my $aspell_out=`#arguments`;
print "Aspell output = $aspell_out\n";
The expected output is "Aspell output = lkjlkjlkj" because this is the output that Aspell gives when you submit these commands via the command line. However, the actual output is just "Aspell output = ". That is, Perl does not capture any output from Aspell. No errors are thrown.
I am not an expert programmer, but I thought this would be a fairly simple task. I've tried various iterations of this code and nothing works. I did some digging and I'm concerned that perhaps because Aspell is interactive, I need to use something like Expect, but I cannot figure out how to use it. Nor am I sure that it is actually the solution to my problem. I also think ^D should be an appropriate replacement for CTRL+D at the end of the commands, but all I know is it doesn't throw an error. I also tried \cd instead. Whatever it is, there is obviously an issue in either submitting the command or capturing the output.
The complication with using aspell out of a program is that it is an interactive and command-line driver tool, as you suspect. However, there is a simple way to do what you need.
In order to use aspell's command list one needs to pass it words via STDIN, as its man page says. While I find the GNU Aspell manual a little difficult to get going with, passing input to a program via its STDIN is easy enough and we can rewrite the invocation as
echo dad word lkj | aspell list
We get lkj printed back, as due. Now this can run out of a program just as it stands
my $word_list = q(word lkj good asdf);
my $cmd = qq(echo $word_list | aspell list);
my #aspell_out = qx($cmd);
print for #aspell_out;
This prints lines lkj and asdf.
I assemble the command in a string (as opposed to an array) for specific reasons, explained below. The qx is the operator form of backticks, which I prefer for its far superior readability.
Note that qx can return all output in a string, if in scalar context (assigned to a scalar for example), or in a list when in list context. Here I assign to an array so you get each word as an element (alas, each also comes with a newline, so may want to do chomp #aspell_out;).
Comment on a list vs string form of a command
I think that it's safe to recommend to use a list-form for a command, in general. So we'd say
my #cmd = ('ls', '-l', $dir); # to be run as an external command
instead of
my $cmd = "ls -l $dir"; # to be run as an external command
The list form generally makes it easier to manage the command, and it avoids the shell altogether.
However, this case is a little different
The qx operator doesn't really behave differently -- the array gets concatenated into a string, and that runs. The very fact that we can pass it an array is incidental, and not even documented
We need to pipe input to aspell's STDIN, and shell does that for us simply. We can use a shell with command's LIST form as well, but then we'd need to invoke it explicitly. We can also go for aspell's STDIN by means other than the shell but that's more complex
With a command in a list the command name must be the first word, so that "aspell list" from the question is wrong and it should fail (there is no command named that) ... except that in this case it wouldn't (if the rest were correct), since for qx the array gets collapsed into a string
Finally, apsell nicely exposes its API in a C library and that's been utilized for the module you mention. I'd suggest to install it as a user (no privileges needed) and use that.
You should take a step back and investigate if you can install Text::Aspell without administrator privilige. In most cases that's perfectly possible.
You can install modules into your home directory. If there is no C-compiler available on the server you can install the module on a compatible machine, compile and copy the files.
In a perl Module I want to use https://metacpan.org/pod/Locale::Maketext::Simple to convert strings to different languages.
My .po files are located unter /opt/x/languages, e.g. /opt/x/languages/en.po.
In my module I'm using the following header:
use Locale::Maketext::Simple (
Path => '/opt/x/languages',
Style => 'maketext'
);
loc_lang('en');
An entry in the .po files looks like this:
msgid "string [_1] to be converted"
msgstr "string [_1] is converted"
and the check via console with msgfmt -c en.po throws no errors so far.
But when I'm converting a string with loc() like loc("string [_1] to be converted", "xy") it gives me the output of "string xy to be converted" instead of "string xy is converted" as I would expect it. This looks to me like the .po files are not loaded correctly.
How can I check what .po files are found during maketext instantiation? Or am I mixing things up and there' a general mistake?
Edit 1:
Thanks for the comments, but it still does not work.
I've checked the files with https://poedit.net/ and created the corresponding .mo files (currently for de and en) with this tool as well. They are located next to the .po files (inside /opt/x/languages).
For completeness, my header looks like this:
# MY OWN LANGUAGE FILE (DE)
# 06-2019 by me
#
msgid ""
msgstr ""
"Project-Id-Version: 1.0.0\n"
"POT-Creation-Date: 2019-06-01 00:00+0100\n"
"PO-Revision-Date: 2019-06-02 00:00+0100\n"
"Last-Translator: thatsme <me#me.de>\n"
"Language-Team: unknown\n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.2.3\n"
msgid "string [_1] to be converted"
msgstr "string [_1] is converted"
After some more digging and testing I finally found the issue for this behaviour, so here's my solution so far. Hope this may help others, because you rarely find any good documentation on this topic:
Add libraries
I added the library https://metacpan.org/pod/Locale::Maketext::Simple as stated above, but forgot to add https://metacpan.org/pod/Locale::Maketext::Lexicon.
This took me quite long to see, because there were no exceptions or errors thrown, just... nothing.
In the Maketext::Simple documentation it says
If Locale::Maketext::Lexicon is not present, it implements a minimal localization function by simply interpolating [_1] with the first argument, [_2] with the second, etc.
what looks at a first glance that .po files are loaded without Maketext::Lexikon, but it simply replaces placeholders.
Other issues:
I then discovered that all string are translated, except for the ones with placeholders like [_1]. I could not find a reason for this, but I moved to
Style => 'gettext'
and replaced all [_1], [_2]... with %1, %2... - that works like a charm.
I am attempting to exclude certain files from my doxygen generated documentation. I am using version 1.8.14.
My files come in this naming convention:
/Path2/OtherFile.cs
/Path/DAL.Entity/Source.cs
/Path/DAL.Entity/SourceBase.generated.cs
I want to exclude all files that do NOT end in Base.generated.cs, and are located inside of /Path/.
Since it appears doxygen claims to use regex for the exclude_patterns variable, I eventually came up with this:
.*\\Path\\DAL\..{4,15}\\((?<!Base\.generated).)*
Needless to say, it did not work. Nor did multiple other variations. So far a simple wildcard * is the only regex character I have gotten to actually work.
doxygen uses QRegExp for a lot of things, so I assumed that was the library used for this variable as well, but even several variations of a pattern that that library claims to support did not work; granted apparently that library is full of bugs, but I would expect some things to work.
Does doxygen actually use a regex library for this variable?
If so, which library is it?
In either case, is there a method of achieving my goal?
My conclusion is; No... Doxygen Doxyfile does not support real regex. Even though they claim that it do. It's just standard wildcards that work.
We ended up with a really awkward solution to work around this.
What we did is that we added a macro in our CMakeLists.txt that creates a string with everything we want to include in INPUT instead. Manually excluding the parts we don't want.
The sad part is that CMakes regex also is crippled. So we couldn't use advanced regex such as negative lookahead in LIST(FILTER EXLUDE) similar to LIST(FILTER children EXCLUDE REGEX "^((?!autogen/public).)*$")... So even this solution is not really what we wanted.
Our CMakeLists.txt ended up looking something like this
cmake_minimum_required(VERSION 3.9)
project(documentation_html LANGUAGES CXX)
find_package(Doxygen REQUIRED dot)
# Custom macros
## Macro for getting all relevant directories when creating HTML documentain.
## This was created cause the regex matching in Doxygen and CMake are lacking support for more
## advanced syntax.
MACRO(SUBDIRS result current_dir include_regex)
FILE(GLOB_RECURSE children ${current_dir} ${current_dir}/*)
LIST(FILTER children INCLUDE REGEX "${include_regex}")
SET(dir_list "")
FOREACH(child ${children})
get_filename_component(path ${child} DIRECTORY)
IF(${path} MATCHES ".*autogen/public.*$" OR NOT ${path} MATCHES ".*build.*$") # If we have the /source/build/autogen/public folder available we create the doxygen for those interfaces also.
LIST(APPEND dir_list ${path})
ENDIF()
ENDFOREACH()
LIST(REMOVE_DUPLICATES dir_list)
string(REPLACE ";" " " dirs "${dir_list}")
SET(${result} ${dirs})
ENDMACRO()
SUBDIRS(DOCSDIRS "${CMAKE_SOURCE_DIR}/docs" ".*.plantuml$|.*.puml$|.*.md$|.*.txt$|.*.sty$|.*.tex$|")
SUBDIRS(SOURCEDIRS "${CMAKE_SOURCE_DIR}/source" ".*.cpp$|.*.hpp$|.*.h$|.*.md$")
# Common config
set(DOXYGEN_CONFIG_PATH ${CMAKE_SOURCE_DIR}/docs/doxy_config)
set(DOXYGEN_IN ${DOXYGEN_CONFIG_PATH}/Doxyfile.in)
set(DOXYGEN_IMAGE_PATH ${CMAKE_SOURCE_DIR}/docs)
set(DOXYGEN_PLANTUML_INCLUDE_PATH ${CMAKE_SOURCE_DIR}/docs)
set(DOXYGEN_OUTPUT_DIRECTORY docs)
# HTML config
set(DOXYGEN_INPUT "${DOCSDIRS} ${SOURCEDIRS}")
set(DOXYGEN_EXCLUDE_PATTERNS "*/tests/* */.*/*")
set(DOXYGEN_FILE_PATTERNS "*.cpp *.hpp *.h *.md")
set(DOXYGEN_RECURSIVE NO)
set(DOXYGEN_GENERATE_LATEX NO)
set(DOXYGEN_GENERATE_HTML YES)
set(DOXYGEN_HTML_DYNAMIC_MENUS NO)
configure_file(${DOXYGEN_IN} ${CMAKE_BINARY_DIR}/DoxyHTML #ONLY)
add_custom_target(docs
COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_BINARY_DIR}/DoxyHTML -d Markdown
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Generating documentation"
VERBATIM)
and in the Doxyfile we added the environment variables for those fields
OUTPUT_DIRECTORY = #DOXYGEN_OUTPUT_DIRECTORY#
INPUT = #DOXYGEN_INPUT#
FILE_PATTERNS = #DOXYGEN_FILE_PATTERNS#
RECURSIVE = #DOXYGEN_RECURSIVE#
EXCLUDE_PATTERNS = #DOXYGEN_EXCLUDE_PATTERNS#
IMAGE_PATH = #DOXYGEN_IMAGE_PATH#
GENERATE_HTML = #DOXYGEN_GENERATE_HTML#
HTML_DYNAMIC_MENUS = #DOXYGEN_HTML_DYNAMIC_MENUS#
GENERATE_LATEX = #DOXYGEN_GENERATE_LATEX#
PLANTUML_INCLUDE_PATH = #DOXYGEN_PLANTUML_INCLUDE_PATH#
After this we can run cd ./build && cmake ../ && make docs to create our html documentation and have it include the autogenerated interfaces in our source folder without including all the other directories in the build folder.
Quick description of what actually happens in the CMakeLists.txt
# Macro that gets all directories from current_dir recursively and returns the result to result as a space separated string
MACRO(SUBDIRS result current_dir include_regex)
# Gets all files recursively from current_dir
FILE(GLOB_RECURSE children ${current_dir} ${current_dir}/*)
# Filter files so we only keep the files that match the include_regex (can't be to advanced regex)
LIST(FILTER children INCLUDE REGEX "${include_regex}")
SET(dir_list "")
# Let us act on all files... :)
FOREACH(child ${children})
# We're only interested in the path. So we get the path part from the file
get_filename_component(path ${child} DIRECTORY)
# Since CMakes regex also is crippled we can't do nice things such as LIST(FILTER children EXCLUDE REGEX "^((?!autogen/public).)*$") which would have been preferred (CMake regex does not understand negative lookahead/lookbehind)... So we ended up with this ugly thing instead... Adding all build/autogen/public paths and not adding any other paths inside build. I guess it would be possible to write this expression in regex without negative lookahead. But I'm both not really fluent in regex (who are... right?) and a bit lazy in this case. We just needed to get this one pointer task done... :P
IF(${path} MATCHES ".*autogen/public.*$" OR NOT ${path} MATCHES ".*build.*$")
LIST(APPEND dir_list ${path})
ENDIF()
ENDFOREACH()
# Remove all duplicates... Since we GLOBed all files there are a lot of them. So this is important or Doxygen INPUT will overflow... I know... I tested...
LIST(REMOVE_DUPLICATES dir_list)
# Convert the dir_list to a space seperated string
string(REPLACE ";" " " dirs "${dir_list}")
# Return the result! Coffee and cinnamon buns for everyone!
SET(${result} ${dirs})
ENDMACRO()
# Get all the pathes that we want to include in our documentation ... this is also where the build folders for the different applications are going to be... with our autogenerated interfaces which we want to keep.
SUBDIRS(SOURCEDIRS "${CMAKE_SOURCE_DIR}/source" ".*.cpp$|.*.hpp$|.*.h$|.*.md$")
# Add the dirs we want to the Doxygen INPUT
set(DOXYGEN_INPUT "${SOURCEDIRS}")
# Normal exlude patterns for stuff we don't want to add. This thing does not support regex... even though it should.
set(DOXYGEN_EXCLUDE_PATTERNS "*/tests/* */.*/*")
# Normal use of the file patterns that we want to keep in the documentation
set(DOXYGEN_FILE_PATTERNS "*.cpp *.hpp *.h *.md")
# IMPORTANT! Since we are creating all the INPUT paths our self we don't want Doxygen to do any recursion for us
set(DOXYGEN_RECURSIVE NO)
# Write the config
configure_file(${DOXYGEN_IN} ${CMAKE_BINARY_DIR}/DoxyHTML #ONLY)
# Create the target that will use that config to create the html documentation
add_custom_target(docs
COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_BINARY_DIR}/DoxyHTML -d Markdown
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Generating documentation"
VERBATIM)
I know this isn't the answer anyone who stumbles in on this question wants... unfortunately it seems to be the only reasonable solution...
... you all have my deepest condolences...
I have a SQL query
select name from Employee
Output :
Sharma's
How can I store this output in perl string.
I tried below :
$sql =qq {select Name from Employee};
$Arr = &DataBaseQuery( $dbHandle, $sql );
$name = $Arr;
But when I print $name I get output as
Sharma::s
How can I store the single quote in the $name.
First of all, non of standard DBI/DBD exibits behavior you listed, in my experience.
Without knowing details of what DataBaseQuery() does it's impossible to answer conclusively, but a plausible theory can be formed:
Apostrophe is a valid package separator in Perl, equivalent to "::".
Reference: perldoc perlmod
The old package delimiter was a single quote, but double colon is now the preferred delimiter, in part because it's more readable to humans, and in part because it's more readable to emacs macros. It also makes C++ programmers feel like they know what's going on--as opposed to using the single quote as separator, which was there to make Ada programmers feel like they knew what was going on. Because the old-fashioned syntax is still supported for backwards compatibility, if you try to use a string like "This is $owner's house" , you'll be accessing $owner::s ; that is, the $s variable in package owner , which is probably not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house" .
perl -e 'package A::B; $A::B=1; 1;
package main;
print "double-colon: $A::B\n";
print "apostrophe: $A'"'"'B\n";'
double-colon: 1
apostrophe: 1
I have a strong suspicion something within your own libraries inside DataBaseQuery() call was written to be "smart" and to convert apostrophes to double-colons because of this.
If you can't figure out root cause, you can always do one of the following:
Write your own DB wrapper
Assuming your text isn't likely to contain "::", run a regex to fix s#::#'#g; on all results from DataBaseQuery() (likely, in a function serving as a wrapper-replacement for DataBaseQuery())
I have a c++ project which directory structure like below:
server/
code/
BASE/
Thread/
Log/
Memory/
Net/
cmake/
CMakeList.txt
BASE/
CMakeList.txt
Net/
CMakeList.txt
here is part of /cmake/CMakeList.txt:
MACRO(SUBDIRLIST result curdir)
FILE(GLOB children RELATIVE ${curdir} ${curdir}/*)
SET(dirlist "")
FOREACH(child ${children})
IF(IS_DIRECTORY ${curdir}/${child})
SET(dirlist ${dirlist} ${child})
ENDIF()
ENDFOREACH()
SET(${result} ${dirlist})
ENDMACRO()
add_subdirectory(Base)
then use macro in /cmake/Base/CMakeList.txt:
SET(SUBDIR, "")
SUBDIRLIST(SUBDIRS, ${BASE_SRC_DIR})
message("SUBDIRS : " ${SUBDIRS})
output:
SUBDIRS :
I check ${dirlist} by output it's value in macro, I get directory list expected,but when message("result " ${result}) after SET(${result} ${dirlist}),I can not get output expected , what's wrong with my CMakeLists.txt?
There are a couple of minor issues here:
In your macro, SET(dirlist "") could be just SET(dirlist). Likewise, SET(SUBDIR, "") could be just SET(SUBDIRS) (I guess "SUBDIR" is a typo and should be "SUBDIRS". Also you don't want the comma in the set command - probably another typo?)
To output the contents of ${result} in the macro, use message("result: ${${result}}"), since you're not appending ${child} to result each time, but to ${result}. In your example ${result} is SUBDIRS, so ${${result}} is ${SUBDIRS}.
When you call SUBDIRLIST, don't use a comma between the arguments.
When you output the value of SUBDIRS, include ${SUBDIRS} in the quotes, i.e. message("SUBDIRS: ${SUBDIRS}") or you'll lose the semi-colon separators.
Other than those, your macro seems fine.