How to split frontend and backend translations? - gettext

I have a web project (PHP+JS) translated with gettext. For a while it was only translated on the server side, with translations pushed to JS in various odd ways. Now I have converted everything to gettext: I convert my .po files with po2json and load them into the Jed library. But this way I load all translations on the client, even the ones that are never used there!
What I want to do now:
xgettext -js-options *.js > js-empty.po
xgettext -php-options *.php > php-empty.po
magic both-translated.po js-empty.po > js-translated.po
magic both-translated.po php-empty.po > php-translated.po
What should I use as 'magic'?
P.S. I will do the actual translation in one file and then split it purely as an optimization, on every build.

I have found the solution:
msgcomm both-translated.po js-empty.po -o js-translated.po
Visual inspection confirms which entries are translated, and the following command confirms that the number of 'msgid' entries is the same in the empty and translated files:
grep msgid $1 | wc -l
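Putting it together, a build-time split could look roughly like the sketch below; the xgettext extraction options shown are only an illustration, so use whatever your project already extracts with:
#!/bin/sh
# Extract fresh (untranslated) templates for each side of the project.
xgettext --language=JavaScript -o js-empty.po  *.js
xgettext --language=PHP        -o php-empty.po *.php

# Keep only the messages each side actually uses, together with their translations.
msgcomm both-translated.po js-empty.po  -o js-translated.po
msgcomm both-translated.po php-empty.po -o php-translated.po

# Sanity check: msgid counts of the template and the split file should match.
for side in js php; do
    echo "$side: $(grep -c '^msgid' "$side-empty.po") vs $(grep -c '^msgid' "$side-translated.po")"
done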
To add more value to the answer: there is a well-recommended Python library, polib (https://pypi.python.org/pypi/polib), packaged in the main Linux distributions as 'python3-polib' or 'python-polib'. I considered using it to perform this task and may use it in the future for other gettext-related tasks.

Related

colorgcc perl script with output to non-tty enabled writing to C dependency files

OK, so here's my issue. I have written a build script in bash that pipes output to tee and sorts different output to different log files (so I can summarize errors/warnings at the end and get some statistics on files built). I wanted to use the colorgcc perl script (colorgcc.1.3.2) to colorize the output from gcc, and had found in other places that this won't work when piping to tee, since the script checks whether it is writing to something that is not a tty. Having disabled this check, everything was working until I did a full build and discovered that some of the code we receive from another group builds C dependency files (we don't control this code, and changing it or the build process for it isn't really an option).
The problem is that these .d files have the form as follows:
filename.o filename.d : filename.c \
    dependant_file1.h \
    dependant_file2.h
(and so on for however many dependencies there are)
This output from GCC gets written into the .d file, but since it is close enough to a warning/error message, colorgcc outputs color codes (I believe it's the check for filename:lineno:message, but I'm not 100% sure; it could be the filename:message check in the GCCOUT while loop). I've tried editing the regex to avoid matching this, but my perl-fu is admittedly pretty weak. So what I end up with is a color code on each line of these dependency files, which obviously causes the build to fail.
I ended up just replacing the check for ! -t STDOUT with a check for a NO_COLOR envar that I set and unset in the build script for these directories (this emulates the previous behavior of no color for non-tty output). This works great if I run the full script, but not if I cd into the directory and just run make (obviously setting and unsetting the variable manually would work, but that is a pain to do every time). Does anyone have any ideas how to prevent this script from writing color codes into dependency files?
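For reference, the NO_COLOR toggling described above might look roughly like the fragment below; NO_COLOR and DEP_FILE_DIRS are names assumed for illustration, and this presumes colorgcc has been patched to skip colorizing whenever NO_COLOR is set (the real build script was not shown).
# Build the directories that generate .d files with colorgcc's coloring disabled.
for dir in "${DEP_FILE_DIRS[@]}"; do
    ( cd "$dir" && NO_COLOR=1 make )   # subshell keeps the variable scoped
done
make   # everything else still builds with colorized output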
Here's how I worked around this. I added the following to colorgcc to search the gcc input for the flag to generate the .d files and just directly called the compiler in that case. This was inserted in place of the original TTY check.
# Scan the compiler arguments: if gcc is being asked to generate dependency
# files (-M/-MM), skip colorization entirely and exec the real compiler.
foreach my $argnum (0 .. $#ARGV)
{
    if ($ARGV[$argnum] =~ m/^-M{1,2}/)
    {
        exec $compiler, @ARGV
            or die("Couldn't exec");
    }
}
I don't know if this is the proper 'perl' way of doing this sort of operation, but it seems to work. Compiling inside directories that build .d files no longer inserts color codes, and the regular source file builds still get color (both to the terminal and to my log files, as I wanted). I guess sometimes the answer is more hacks instead of "hey, did you try giving up?".

wget appends query string to resulting file

I'm trying to retrieve working webpages with wget and this goes well for most sites with the following command:
wget -p -k http://www.example.com
In these cases I will end up with index.html and the needed CSS/JS etc.
HOWEVER, in certain situations the URL will have a query string, and in those cases I get an index.html with the query string appended.
Example
www.onlinetechvision.com/?p=566
Combined with the above wget command, this will result in:
index.html?p=566
I have tried using the --restrict-file-names=windows option, but that only gets me to
index.html@p=566
Can anyone explain why this is needed and how I can end up with a regular index.html file?
UPDATE: I'm sort of on the fence about taking a different approach. I found out I can take the first filename that wget saves by parsing the output, so the name that appears after "Saving to:" is the one I need.
However, the name is wrapped in this strange character â - rather than just stripping it with a hardcoded fix, where does this come from?
If you try the "--adjust-extension" parameter
wget -p -k --adjust-extension www.onlinetechvision.com/?p=566
you come closer. In the www.onlinetechvision.com folder there will be a file with a corrected extension: index.html@p=566.html, or index.html?p=566.html on *NiX systems. It is simple then to change that file to index.html, even with a script.
If you are on a Microsoft OS, make sure you have a later version of wget - it is also available here: https://eternallybored.org/misc/wget/
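If you go that route, a small rename step can normalize the saved file back to index.html afterwards. The glob below is only a sketch, since the exact saved name depends on the system:
wget -p -k --adjust-extension "http://www.onlinetechvision.com/?p=566"
cd www.onlinetechvision.com
# Rename whichever index.html?p=566.html / index.html@p=566.html variant wget produced.
for f in index.html[?@]*.html; do
    [ -e "$f" ] && mv "$f" index.html
done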
To answer your question about why this is needed, remember that the web server is likely to return different results based on the parameters in the query string. If a query for index.html?page=52 returns different results from index.html?page=53, you probably wouldn't want both pages to be saved in the same file.
Each HTTP request that uses a different set of query parameters is quite literally a request for a distinct resource. wget can't predict which of these changes is and isn't going to be significant, so it's doing the conservative thing and preserving the query parameter URLs in the filename of the local document.
My solution is to do recursive crawling outside wget:
get directory structure with wget (no file)
loop to get main entry file (index.html) from each dir
This works well with WordPress sites. It could miss some pages, though.
#!/bin/bash
#
# get directory structure
#
wget --spider -r --no-parent http://<site>/
#
# loop through each dir
#
find . -mindepth 1 -maxdepth 10 -type d | cut -c 3- > ./dir_list.txt
while read -r line; do
    wget --wait=5 --tries=20 --page-requisites --html-extension --convert-links --execute=robots=off --domains=<domain> --strict-comments "http://${line}/"
done < ./dir_list.txt
The query string is required because of the website's design: the site uses the same standard index.html for all content and then uses the query string to pull in the content from another page, like a script on the server side (it may be client side if you look in the JavaScript).
Have you tried using --no-cookies? It could be storing this information via a cookie and pulling it in when you hit the page. This could also be caused by URL rewrite logic, which you will have little control over from the client side.
Use the -O or --output-document option. See http://www.electrictoolbox.com/wget-save-different-filename/
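For the URL in the question that could look like the following (note that -O writes everything to a single file, so it does not combine well with -p):
wget -O index.html "http://www.onlinetechvision.com/?p=566"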

Search for files & file names using silver searcher

Using Silver Searcher, how can I search for:
(non-binary) files with a word or pattern AND
all filenames, with a word or pattern including filenames of binary files.
Other preferences: would like to have case insensitive search and search through dotfiles.
Tried to alias using this without much luck:
alias search="ag -g $1 --smart-case --hidden && ag --smart-case --hidden $1"
According to the man page of ag
-G --file-search-regex PATTERN
Only search files whose names match PATTERN.
You can use the -G option to perform searches on files matching a pattern.
So, to answer your question:
root@apache107:~/rpm-4.12.0.1# ag -G cpio.c size
rpm2cpio.c
21: off_t payload_size;
73: /* Retrieve payload size and compression type. */
76: payload_size = headerGetNumber(h, RPMTAG_LONGARCHIVESIZE);
The above command searches for the word size in all files that match the pattern cpio.c.
Reference:
man page of ag version 0.28.0
Note 1:
If you are looking for a string in certain file types, say all C source code, there is an undocumented feature in ag to help you quickly restrict searches to certain file types.
The commands below both look for foo in all php files:
find . -name \*.php -exec grep foo {} \;
ag --php foo
While find + grep looks for all .php files, the --php switch in the ag command actually looks for the following file extensions:
.php .phpt .php3 .php4 .php5 .phtml
You can use --cpp for C++ source files, --hh for .h files, --js for JavaScript, and so on. A full list can be found here
Try this:
find . | ag "/.*SEARCHTERM[^/]*$"
The command find . will list all files.
We pipe the output of that to the command ag "/.*SEARCHTERM[^/]*$", which matches SEARCHTERM if it's in the filename, and not just the full path.
Try adding this to your aliases file. Tested with zsh but should work with bash. The problem you encountered in your example is that bash aliases can't take parameters, so you have to first define a function to use the parameter(s) and then assign your alias to that function.
searchfunction() {
    echo "$(ag -g "$1" --hidden)"     # filenames matching the pattern
    echo "$(ag --hidden -l "$1")"     # files whose contents match the pattern
}
alias search=searchfunction
You could modify this example to suit your purpose in a few ways, e.g.:
add/remove the -l flag depending on whether or not you want text results to show the text match or just the filename
add headers to separate text results from filename results
deduplicate results to account for files that match both on filename and text (a sketch of this follows below), etc.
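For example, a hypothetical variant that merges and de-duplicates the two result sets could look like this:
searchboth() {
    {
        ag -g "$1" --hidden       # filenames matching the pattern
        ag --hidden -l "$1"       # files whose contents match the pattern
    } | sort -u                   # drop files that matched on both counts
}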
[Edit: removed unnecessary --smart-case flag per Pablo Bianchi's comment]
Found this question looking for the same answer myself. It doesn't seem like ag has any native capability to search file and directory names. The answers above from Zach Fogg and Jikku Jose both work, but piping find . can be very slow if you're working in a big directory.
I'd recommend using find directly, which is much faster than piping it through ag:
Linux (GNU version of find)
find -name [pattern]
OSX (BSD version of find)
find [pattern]
If you need more help with find, this guide from Digital Ocean is pretty good. I include this because the man pages for find are outrageously dense if you just want to figure out basic usage.
To add to the previous answers, you can use an "Or" Regular Expression to search within files matching different file extensions.
For example, to search for a string just in C++ header files (.hpp) and Makefiles (.mk):
ag -G '.*\.(hpp|mk)' my_string_to_search
After being unsatisfied with mdfind, find, locate, and other attempts, the following worked for me. It uses tree to get the initial list of files, ag to filter out directories, and then awk to print the matching files themselves.
I wound up using tree because it was more (and more easily) configurable than the other solutions I tried and is fast.
This is a fish function:
function ff --description 'Find files matching given string'
    tree . --prune --matchdirs -P "*$argv*" -I "webpack" -i -f --ignore-case -p |
        ag '\[[^d].*' |
        awk '{print $2}'
end
This gives output similar to the following:
~/temp/hello_world $ ff controller
./app/controllers/application_controller.rb
./config/initializers/application_controller_renderer.rb
~/temp/hello_world $

CVS command to get brief history of repository

I am using following command to get a brief history of the CVS repository.
cvs -d :pserver:*User*:*Password*@*Repo* rlog -N -d "*StartDate* < *EndDate*" *Module*
This works just fine except for one small problem: it lists all tags created on each file in that repository. I want the tag info, but I only want the tags that were created in the specified date range. How do I change this command to do that?
I don't see a way to do that natively with the rlog command. Faced with this problem, I would write a Perl script to parse the output of the command, correlate the tags to the date range that I want and print them.
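As a rough illustration of that post-processing idea (using awk here rather than Perl, and assuming the -N flag is dropped so rlog actually prints the tag lists), something along these lines would keep only the tags whose revision appears among the revisions that the -d option already restricted to the date range:
cvs -d :pserver:*User*:*Password*@*Repo* rlog -d "*StartDate* < *EndDate*" *Module* |
awk '
    /^RCS file:/       { file = $3 }                # start of a new file entry
    /^symbolic names:/ { intags = 1; next }         # the tag list follows
    intags && /^\t/    { tag = $1; sub(/:$/, "", tag); tagrev[tag] = $2; next }
    intags             { intags = 0 }               # end of the tag list
    /^revision /       { listed[$2] = 1 }           # revisions within the -d range
    /^=+$/             {                            # separator ending one file
        for (t in tagrev)
            if (tagrev[t] in listed)
                print file, t, tagrev[t]
        delete tagrev; delete listed                # reset for the next file
    }
'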
Another solution would be to parse the ,v files directly, but I haven't found any robust libraries for doing that. I prefer Perl for that type of task, and the parsing modules don't seem to be very high quality.

JSF action, value and binding catalog generator

I am looking for a simple tool that generates a catalog of all used action methods, values and bindings. I'm working on a big JSF/RichFaces project and I have lost the overview of which links refer to which beans. Therefore I need a tool (it would be nice if it were an Eclipse plugin) that generates a simple list of all used EL expressions.
Is there something out there?
Run the following Unix/Linux command in the directory containing the code:
cat * | sed -e '/#{/!d' -e 's/#/\n#/g' -e 's/}/}\n/g' | sed '/#/!d' | sort | uniq
If you are using Windows, install Cygwin and run it with that.
It only works in a single directory at the moment, but it shouldn't be too hard to make the cat call recursive.
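A rough sketch of a recursive variant (the view-file extensions are an assumption; adjust them to whatever the project actually uses):
# Recursively collect EL expressions (#{...}) from typical JSF view files.
find . -type f \( -name '*.xhtml' -o -name '*.jsp' -o -name '*.jspx' \) -exec cat {} + \
    | sed -e '/#{/!d' -e 's/#/\n#/g' -e 's/}/}\n/g' \
    | sed '/#/!d' | sort | uniq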