Poedit encoding issues

I'm using Poedit to create the messages.mo file used by my PHP web app.
I checked that the encoding is UTF-8, and still my accented characters are not showing (e.g. "é", "è", ...). Both the source and target files are set to UTF-8...
Here's the code I use to enable gettext:
<?php
$dir    = "../locale";
$lang   = "fr_FR";
$domain = "messages";
putenv("LANG=$lang");
setlocale(LC_ALL, $lang);
bindtextdomain($domain, $dir);
textdomain($domain);
echo gettext("TEST 1") . "\n";
echo __("Test 2"); // works if using gettext("Test 2");
?>
EDIT: I'm also adding the header of my page here, which states that UTF-8 should be used...
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
EDIT 2: Here is a link to the .po file. I also tried to copy-paste the result here:
R�gle plurielle 1 accessible en �criture.
I should get
Règle plurielle 1 accessible en écriture.
Any idea how to resolve this?
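For what it's worth, one common cause of exactly this symptom: gettext converts messages to the locale's default codeset (often ISO-8859-1 for fr_FR) unless told otherwise, which turns "é"/"è" into "�" on a UTF-8 page. A minimal sketch of forcing UTF-8 output with bind_textdomain_codeset(), reusing the $dir/$lang/$domain values from the question:

```php
<?php
$dir    = "../locale";
$lang   = "fr_FR";
$domain = "messages";
putenv("LANG=$lang");
setlocale(LC_ALL, $lang);
bindtextdomain($domain, $dir);
// Without this call, gettext may recode the catalog to the locale's
// legacy codeset instead of leaving the strings as UTF-8.
bind_textdomain_codeset($domain, "UTF-8");
textdomain($domain);
echo gettext("TEST 1") . "\n";
?>
```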

Related

negative regex with xidel + garbage-collect function

I currently use this command to extract URLs from a site
xidel https://www.website.com --extract "//h1//extract(@href, '.*')[. != '']"
This extracts all URLs (.*), but I would like to change it so that it does not extract URLs that contain specific strings in their URI path. For example, I would like to extract all URLs except the ones that contain -text1- and -text2-.
Also, xidel has a function called garbage-collect, but it's not clear to me how to use it. It could be
--extract garbage-collect()
or
--extract garbage-collect()[0]
or
x:extract garbage-collect()
or
x"extract garbage-collect()
But these didn't reduce the memory usage when extracting URLs from multiple pages using --follow.
Just noticed this old question. It looks like OP's account is suspended, so I hope the following answer will be helpful for other users.
Let's assume 'test.htm':
<html>
<body>
<span class="a-text1-u">1</span>
<span class="b-text2-v">2</span>
<span class="c-text3-w">3</span>
<span class="d-text4-x">4</span>
<span class="e-text5-y">5</span>
<span class="f-text6-z">6</span>
</body>
</html>
To extract all "class"-nodes, except the ones that contain "-text1-" and "-text2-":
xidel -s test.htm -e "//span[not(contains(@class,'-text1-') or contains(@class,'-text2-'))]/@class"
#or
xidel -s test.htm -e "//@class[not(contains(.,'-text1-') or contains(.,'-text2-'))]"
c-text3-w
d-text4-x
e-text5-y
f-text6-z
xidel has a function called garbage-collect but it's not clear to me how to use these functions.
http://www.benibela.de/documentation/internettools/xpath-functions.html#x-garbage-collect:
x:garbage-collect (0 arguments)
Frees unused memory. Always call it as garbage-collect()[0], or it might garbage collect its own return value and crash.
So that would be -e "garbage-collect()[0]".
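To address the --follow part of the question: both expressions can go in one extraction query, since XPath lets you sequence expressions with a comma. A sketch (the URL and filter strings are placeholders, and I haven't measured the memory effect):

```shell
# Follow every link on the page, print the hrefs that don't contain
# the unwanted substrings, then free xidel's unused memory per page.
# garbage-collect()[0] yields an empty sequence, so it prints nothing.
xidel -s "https://www.website.com" \
      --follow "//a/@href" \
      -e "//a[not(contains(@href,'-text1-') or contains(@href,'-text2-'))]/@href, garbage-collect()[0]"
```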

ConvertTo-Html from hash table of html content to html document

I am trying to display some HTML-encoded information in a document that is generated by a scheduled execution of a PowerShell script.
The following MVP illustrates my issue:
@{ a="<div style=""color:red;"">Hello</div>"; b="Hi"}.GetEnumerator() | Select Key, Value | ConvertTo-Html | Out-File -Encoding utf8 -FilePath C:\Scripts\Test.html
Which outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML TABLE</title>
</head><body>
<table>
<colgroup><col/><col/></colgroup>
<tr><th>Key</th><th>Value</th></tr>
<tr><td>a</td><td><div style="color:red;">Hello</div></td></tr>
<tr><td>b</td><td>Hi</td></tr>
</table>
</body></html>
Which, when opened, shows the escaped div markup as literal text.
But I want my Hello to be red, and not to see the escaped HTML div code.
Is there any way to tell ConvertTo-Html not to escape my inputs?
Note: This MVP only illustrates the issue I'm facing. I actually have a very complex report that I would like to decorate for easier viewing (color coding, symbol, et al).
This is the report I am trying to configure:
The main purpose of the ConvertTo-Html cmdlet is to provide an easy-to-use tool for converting lists of objects into tabular HTML reports. The input for this conversion is expected to be non-HTML data, and characters that have a special meaning in HTML are automatically escaped. This cannot be turned off.
Unescaped HTML fragments can be inserted into the HTML report via the parameters -Body, -PreContent, and -PostContent before or after tabular data. However, for more complex reports this probably isn't versatile enough. The best approach in situations like that is to generate the individual parts of your report as fragments, e.g.
$ps = Get-Process | ConvertTo-Html -PreContent '<p>Process list</p>' -Fragment
and then combine all fragments with a here-string:
$html = @"
<html>
<head>
...
</head>
<body>
${ps}
<hr>
${other_fragment}
...
</body>
</html>
"@
As for individual formatting of particular parts of generated fragments: that is not supported. You need to modify the resulting HTML code yourself, either via search & replace (in the fragments or the full HTML content) or by parsing and modifying the full HTML content.
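A minimal sketch of the search & replace route for the hash table from the question, using .NET's WebUtility to undo the escaping. This is only safe because every Value here is trusted markup; with untrusted input you would decode selectively instead:

```powershell
$html = @{ a = '<div style="color:red;">Hello</div>'; b = 'Hi' }.GetEnumerator() |
    Select-Object Key, Value |
    ConvertTo-Html

# ConvertTo-Html escaped the markup; decode the whole document again.
$decoded = [System.Net.WebUtility]::HtmlDecode($html -join "`n")
$decoded | Out-File -Encoding utf8 -FilePath .\Test.html
```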

Read html file as string in Powershell [duplicate]

This question already has answers here:
PowerShell: Store Entire Text File Contents in Variable
(5 answers)
Closed 5 years ago.
I need to read an HTML file and parse the content into a string.
From this
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Index</title>
</head>
<body>
Index
</body>
</html>
To an output like this
$stringValue = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"...
I've tried $stringValue = $htmlFile | ConvertTo-Json, but it transforms some characters into escape codes (e.g. > becomes \u003e), whereas I want to keep the special characters intact.
Any help is appreciated
You can use the command below to read the content of the HTML file and store it in a string variable. The -Raw switch makes Get-Content return the whole file as a single string (without it, you get an array of lines, and casting to [string] joins them with spaces):
[string]$Datas = Get-Content -Raw [HTML_file_Location]
Try to read it as UTF-16 and see if the output is passed through as desired. This answer shows how to read a file as UTF-16:
Reading a "string in little-endian UTF-16 encoding" with BinaryReader
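For completeness, a small sketch of why the plain Get-Content cast loses formatting and what -Raw does instead (demo.html is a throwaway file):

```powershell
'line1', 'line2' | Set-Content demo.html

# Default: Get-Content returns an array of lines; casting the array to
# [string] joins the lines with spaces, so line breaks are lost.
$joined = [string](Get-Content demo.html)   # "line1 line2"

# -Raw returns the whole file as one string, line breaks intact.
$raw = Get-Content -Raw demo.html

Remove-Item demo.html
```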

How to generate caption from img alt attribute

Is there a way to convert an img tag containing an alt attribute (in an HTML file),
<img src="pics/01.png" alt="my very first pic"/>
to an image link plus caption (org file),
#+CAPTION: my very first pic
[[pics/01.png]]
using pandoc?
I'm calling pandoc like this:
$ pandoc -s -r html index.html -o index.org
where index.html contains the img tag from above, but it doesn't add the caption in the output org file:
[[pics/01.png]]
Currently the Org Writer unfortunately throws away the image alt and title strings. Feel free to submit an issue or patch if there's a way to do alt text in Org.
You can also always write a filter to modify the doc AST and add the alt text to an additional paragraph.
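A sketch of that filter route in Python (a hypothetical stand-alone script, not an official pandoc tool): pandoc pipes its JSON AST through any program given with --filter, and the Image inline stores the alt text as its caption. The script below walks the AST and replaces each Image with raw org text carrying a #+CAPTION line, assuming the pandoc-types >= 1.17 JSON shape where an Image's contents are [attr, alt_inlines, [url, title]]:

```python
#!/usr/bin/env python3
"""Hypothetical pandoc JSON filter: replace each Image inline with raw
org-mode text that carries the alt text as a #+CAPTION line.

Illustrative invocation (script name is made up):
    pandoc -s index.html -t org --filter ./img_caption.py -o index.org
"""
import json
import sys


def stringify(inlines):
    """Collect the plain text of a list of pandoc inline nodes."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def walk(node):
    """Recursively rewrite Image nodes; leave everything else intact."""
    if isinstance(node, list):
        return [walk(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Image":
            # pandoc-types >= 1.17: c = [attr, alt_inlines, [url, title]]
            attr, alt, (url, _title) = node["c"]
            raw = "#+CAPTION: {}\n[[{}]]".format(stringify(alt), url)
            return {"t": "RawInline", "c": ["org", raw]}
        return {key: walk(value) for key, value in node.items()}
    return node


if __name__ == "__main__":
    data = sys.stdin.read()
    if data:  # only act when pandoc actually pipes an AST in
        json.dump(walk(json.loads(data)), sys.stdout)
```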
OP here. I didn't manage to make pandoc bend to my needs in this case. But a little bash scripting with some awk help does the trick.
The script replaces all img tags with org-mode equivalents plus captions. Pandoc leaves these alone when converting from html to org-mode.
The awk script,
# replace_img.awk
#
# Sample input:
# <img src="/pics/01.png" alt="my very first pic"/>
# Sample output:
# #+CAPTION: my very first pic
# [[/pics/01.png]]
BEGIN {
# Split the input at "
FS = "\""
}
# Replace all img tags with an org-mode equivalent.
/^<img src/{
print "#+CAPTION: " $4
print "[["$2"]]"
}
# Leave the rest of the file intact.
!/^<img src/
and the bash script,
#!/bin/bash
# replace_img.sh
php_files=$(find . -name "*.php")
for file in $php_files; do
awk -f replace_img.awk "$file" > tmp && mv tmp "$file"
done
Place these files at the root of the project, chmod +x replace_img.sh and then run the script: ./replace_img.sh. Change the extension of the files if needed. I had over 300 php files.

php gettext include string with phpcode

I'm trying to use gettext to translate the strings on my site.
gettext doesn't have a problem detecting strings such as
<? echo _("Donations"); ?>
or
<? echo _("Donate to this site");?>
But obviously we'll usually use code like this on our site:
<? echo _("$siteName was developed with one thing in mind"); ?>
Of course, on the website $siteName is displayed correctly as
My Website was developed with one thing in mind
if we put
$siteName = "My Website";
previously.
My problem is, I'm using poedit to extract all the strings in my code that need to be translated, and it seems poedit doesn't extract strings with PHP variables in them, like the one described above. So how do I get poedit to extract strings with PHP code inside them too? Or is there another tool I should use?
One possibility is to use sprintf. Just make sure you keep the percent (%) in the poedit string!
echo sprintf( _("This %s can be translated "), 'string');
Or when using multiple variables
echo vsprintf( _("This %s can be %s"), ['string', 'translated']);
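One more detail worth noting: translations often reorder the variables, and plain %s placeholders then break. PHP's sprintf supports numbered placeholders (%1$s, %2$s), which refer to arguments by position. A small sketch with illustrative strings (the _() fallback exists only so the snippet runs without the gettext extension):

```php
<?php
// Fallback so the sketch runs even where gettext is unavailable.
if (!function_exists('_')) {
    function _($s) { return $s; }
}

$siteName = "My Website";
$year     = "2010";

// %1$s and %2$s are positional: a translator may swap their order in
// the translated string without breaking the sprintf call, e.g.
//   msgid  "%1$s was founded in %2$s"
//   msgstr "C'est en %2$s que %1$s a vu le jour"
echo sprintf(_('%1$s was founded in %2$s'), $siteName, $year);
?>
```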