When I attempt to display a Japanese string in a UILabel on iOS, it gets displayed using Chinese encoding instead of Japanese.
The two encodings are nearly identical, except in a few specific cases. For example, here is how the character 直 (Unicode U+76F4) is rendered in Chinese (top) vs. Japanese (bottom):
(see here for more examples)
The only time Japanese strings render correctly is when the user's system locale is ja-jp (Japan), but I'd like it to render as Japanese for all users.
Is there any way to force the Japanese encoding? Android has TextView.TextLocale, but I don't see anything similar for UILabel on iOS.
(Same question for Android. I tagged this Swift/Objective-C because, although I'm looking for a Xamarin.iOS solution, the API is almost the same)
You just need to specify a language identifier for the attributed string, like this:
let label = UILabel()
let text = NSAttributedString(string: "直", attributes: [
.languageIdentifier: "ja", // << this !!
.font: UIFont.systemFont(ofSize: 64)
])
label.attributedText = text
Tested with Xcode 13.2 / iOS 15.2
I found an extremely hacky solution that seems to work. However, it seems absurd that there's no way to simply set the locale of a label, so if anyone finds something I missed, please post an answer.
The trick relies on the fact that the Hiragino font displays kanji using Japanese encoding rather than Chinese encoding by default. However, the font looks like shit for English text, so I have to search every string in every label for Japanese substrings and manually change the font using NSMutableAttributedString. The font is also completely broken so I had to find another workaround to fix that.
[assembly: ExportRenderer(typeof(Label), typeof(MyApp.MyLabelRenderer))]
namespace MyApp
{
public class MyLabelRenderer : LabelRenderer
{
private readonly UIFont HIRAGINO_FONT = UIFont.FromName("HiraginoSans-W6", 1); // Size gets updated later
protected override void OnElementPropertyChanged(object sender, PropertyChangedEventArgs e)
{
base.OnElementPropertyChanged(sender, e);
// BUGFIX: Chinese encoding is shown by default. Switch to Hiragino font, which correctly shows Japanese characters
// Taken from https://stackoverflow.com/a/71045204/238419
if (Control?.Text != null && e.PropertyName == "Text")
{
var kanjiRanges = GetJapaneseRanges(Control.Text).ToList();
if (kanjiRanges.Count > 0)
{
var font = HIRAGINO_FONT.WithSize((nfloat)Element.FontSize);
var attributedString = Control.AttributedText == null
? new NSMutableAttributedString(Control.Text)
: new NSMutableAttributedString(Control.AttributedText);
// Search through string for all instances of Japanese characters and update the font
foreach (var (start, end) in kanjiRanges)
{
int length = end - start + 1;
var range = new NSRange(start, length);
attributedString.AddAttribute(UIStringAttributeKey.Font, font, range);
// Bugfix: Hiragino font is broken (https://stackoverflow.com/a/44397572/238419) so needs to be adjusted upwards
// jesus christ Apple
attributedString.AddAttribute(UIStringAttributeKey.BaselineOffset, (NSNumber)(Element.FontSize/10), range);
}
Control.AttributedText = attributedString;
}
}
}
// Returns all (start,end) ranges in the string which contain only Japanese strings
private IEnumerable<(int,int)> GetJapaneseRanges(string str)
{
for (int i = 0; i < str.Length; i++)
{
if (IsJapanese(str[i]))
{
int start = i;
// Look ahead one character so that `end` stays on the last Japanese character of the run
while (i < str.Length - 1 && IsJapanese(str[i + 1]))
{
i++;
}
int end = i;
yield return (start, end);
}
}
}
private static bool IsJapanese(char character)
{
// An approximation. See https://github.com/caguiclajmg/WanaKanaSharp/blob/792f45a27d6e543d1b484d6825a9f22a803027fd/WanaKanaSharp/CharacterConstants.cs#L110-L118
// for a more accurate version
return character >= '\u3000' && character <= '\u9FFF'
|| character >= '\uFF00';
}
}
}
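The range scan is the part of this workaround that is easiest to get wrong by one character, so it can help to check it in isolation. Below is a minimal sketch of the same scan ported to Java, assuming inclusive (start, end) pairs as in the renderer above. The class name `JapaneseRanges` and the method `find` are hypothetical, and the character test is the same rough approximation used in `IsJapanese`.

```java
import java.util.ArrayList;
import java.util.List;

final class JapaneseRanges {
    // Rough approximation: U+3000..U+9FFF (CJK punctuation, kana, ideographs)
    // plus everything from U+FF00 upwards (full-width forms and beyond).
    static boolean isJapanese(char c) {
        return (c >= 0x3000 && c <= 0x9FFF) || c >= 0xFF00;
    }

    // Returns inclusive {start, end} index pairs, one per run of Japanese characters.
    static List<int[]> find(String s) {
        List<int[]> ranges = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (!isJapanese(s.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            // Extend the run only while the NEXT character is still Japanese.
            while (i + 1 < s.length() && isJapanese(s.charAt(i + 1))) {
                i++;
            }
            ranges.add(new int[] {start, i});
            i++;
        }
        return ranges;
    }
}
```

With inclusive pairs, the attribute range length is `end - start + 1`, which matches the `NSRange` computation in the renderer.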
I have a TEXT entry field. I want to limit the entry to letters or digits. Other characters should be converted to an under-score. The following code does this. However when the user pastes into the field, it bypasses the listener and the raw text is put into the field
private class TextKeyVerifyListener implements VerifyListener
{
@Override
public void verifyText( VerifyEvent event )
{
if (Character.isLetterOrDigit( event.character ))
event.text = "" + Character.toUpperCase( event.character );
else if (!Character.isISOControl( event.keyCode ))
event.text = "_";
}
}
How do I trap the paste action so that I can at least re-parse the text field? It seems heavy-handed to do this in the modify listener for each keystroke. Any solution should be cross-platform :-)
Trapping CTRL-V might do this, but the user can also choose Paste from the pop-up menu.
VerifyEvent has a text field containing all the text to be verified. You should be using this rather than the character field. text is set to the full pasted text.
Ok, for anyone else trying to do this:
if (event.keyCode == 0 || !Character.isISOControl( event.keyCode ))
{
StringBuilder text = new StringBuilder();
char[] chars = event.text.toCharArray();
for (char character : chars)
if (Character.isLetterOrDigit( character ))
text.append( Character.toUpperCase( character ) );
else
text.append( "_" ); //$NON-NLS-1$
event.text = text.toString();
}
It's a little heavy for single keystrokes, but it will convert as I wanted.
Thanks!
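The conversion itself does not depend on SWT, so one option is to pull it into a small helper that can be tested without a widget. This is only a sketch: the class name `KeySanitizer` is hypothetical, and it assumes the same rule as above (letters and digits are upper-cased, everything else becomes an underscore).

```java
final class KeySanitizer {
    // Upper-cases letters and digits; replaces every other character with '_'.
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(Character.isLetterOrDigit(c) ? Character.toUpperCase(c) : '_');
        }
        return out.toString();
    }
}
```

Inside the listener, the body of the `if` would then reduce to `event.text = KeySanitizer.sanitize(event.text);`, which handles typed characters and pasted text the same way.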
This answer has some code to convert a locale to a country emoji in Java. I tried implementing it in Dart but no success.
I tried converting the code above to Dart
void _emoji() {
int flagOffset = 0x1F1E6;
int asciiOffset = 0x41;
String country = "US";
int firstChar = country.codeUnitAt(0) - asciiOffset + flagOffset;
int secondChar = country.codeUnitAt(1) - asciiOffset + flagOffset;
String emoji =
String.fromCharCode(firstChar) + String.fromCharCode(secondChar);
print(emoji);
}
"US" locale should output "🇺🇸"
The code you posted works correctly, i.e. print(emoji) successfully prints 🇺🇸.
I assume that the real problem you have is that the Flutter Text widget displays it like this:
It is the US flag, however, I have to agree that it does not look like it when you see it on device as the font size is very small and the flag has a rather high resolution.
You will need to use a custom font and apply it to your Text widget using the following:
Text(emoji,
style: TextStyle(
fontFamily: '...',
),
)
Otherwise, both the conversion and displaying the flags works fine. I believe that they just look different than you expected.
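For comparison, here is a minimal Java sketch of the same conversion, assuming a two-letter country code such as "US". Each letter is mapped to the regional indicator symbol at the same offset from U+1F1E6, and the pair of symbols is what renders as a flag. The class name `FlagEmoji` and the method `flagFor` are hypothetical and are not taken from the linked answer.

```java
import java.util.Locale;

final class FlagEmoji {
    // Code point of REGIONAL INDICATOR SYMBOL LETTER A.
    private static final int REGIONAL_INDICATOR_A = 0x1F1E6;

    // Maps a two-letter country code to its pair of regional indicator symbols.
    static String flagFor(String countryCode) {
        if (countryCode == null || countryCode.length() != 2) {
            throw new IllegalArgumentException("Expected a two-letter country code");
        }
        String upper = countryCode.toUpperCase(Locale.ROOT);
        StringBuilder flag = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Expected letters A-Z only");
            }
            // Each symbol lies outside the BMP, so it is appended as a code point.
            flag.appendCodePoint(REGIONAL_INDICATOR_A + (c - 'A'));
        }
        return flag.toString();
    }
}
```

The resulting string has two code points but four UTF-16 code units, which is why the Dart version builds it with `String.fromCharCode` rather than treating each letter as a single 16-bit character.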
I'm facing a problem when trying to export a Vietnamese document as a PDF using iText.
I put the Vietnamese words in an .xml file like this:
<td fontfamily="Helvetica" fontstyle="0" fontsize="9" align="0" colspan="48" lineoccupied="1">T\u1ED5 ch\u1EE9c tham gia</td>
Then Java code reads the phrases from the XML file and converts the escape sequences to Unicode characters using this method:
public String convertToUnicode(String s) {
int i = 0, len = s.length();
char c;
StringBuffer sb = new StringBuffer(len);
try {
while (i < len) {
c = s.charAt(i++);
if (c == '\\') {
if (i < len) {
c = s.charAt(i++);
if (c == 'u') {
if (Character.digit(s.charAt(i), 16) != -1
&& Character.digit(s.charAt(i + 1), 16) != -1
&& Character.digit(s.charAt(i + 2), 16) != -1
&& Character.digit(s.charAt(i + 3), 16) != -1) {
if (s.substring(i).length() >= 4) {
c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
i += 4;
} else {
sb.append('\\');
}
} else {
sb.append('\\');
}
} // add other cases here as desired...
}
} // fall through: \ escapes itself, quotes any character but u
sb.append(c);
}
} catch (Exception e) {
// Print the exception itself; getStackTrace().toString() only prints the array's identity
System.out.println("Error Generate PDF :: " + e);
return s;
}
return sb.toString();
}
After that, the String is exported to PDF with UTF-8 encoding.
But the program fails to display the Vietnamese characters '\u1ED5' and '\u1EE9'.
The output becomes "T chc tham gia"
Could you please show me how to fix this issue?
Thanks :)
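As an aside on the decoding step: the same conversion can be sketched more compactly with a regular expression, which also avoids reading past the end of the string when fewer than four characters follow the escape prefix. This is a sketch under assumptions, not a drop-in replacement: the class name `EscapeDecoder` is hypothetical, and unlike the method above it leaves every other backslash untouched. It only changes how the string is decoded, not which glyphs the PDF font can draw.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeDecoder {
    // Matches a backslash, the letter u, and exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each well-formed escape with the character it names;
    // malformed or truncated escapes are copied through unchanged.
    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer(s.length());
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslashes in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```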
There are 3 XML Worker examples involving Asian languages on the official iText web site. They parse an XHTML file containing Chinese characters, but it should be easy to adapt them to Vietnamese examples.
You can find the HTML files we're going to parse here:
hero.html
hero2.html
Both files contain the following text:
長空 (Broken Sword), 秦王殘劍 (Flying Snow), 飛雪 (Moon), 如月 (the King), and 秦王 (Sky).
In the first case, a font is defined using CSS:
<span style="font-size:12.0pt; font-family:MS Mincho">長空</span>
In the second case, no specific font is defined:
<body><p>長空 (Broken Sword), 秦王殘劍 (Flying Snow), 飛雪 (Moon), 如月 (the King), and 秦王 (Sky).</p></body>
These files contain UTF-8 characters, so we're going to parse them like this:
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream(HTML), Charset.forName("UTF-8"));
The first thing you need is a font that supports Vietnamese characters. That's something iText can't help you with. In your HTML file, you've defined Helvetica, but that's a standard Type1 font that is never embedded when using iText and that doesn't know how to draw Vietnamese glyphs. That's never going to work.
The first example D07_ParseHtmlAsian will automatically search for a font named MS Mincho. If it finds that font (for instance because you have msmincho.ttc in your Windows fonts directory), the font will show up in your PDF. See hero.pdf. If it doesn't find a font with that name, then the glyphs won't be visible, because you didn't provide any font program for those glyphs.
The second example D07bis_ParseHtmlAsian offers a workaround in case you don't have MS Mincho anywhere. In that case, you have to use an XMLWorkerFontProvider and register a font that can be used instead of MS Mincho. For instance: we use a font stored in the file cfmingeb.ttf and assign the alias MS Mincho:
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/cfmingeb.ttf", "MS Mincho");
The resulting file asian.pdf is slightly different from what we expect, but now we can at least see the Chinese glyphs.
In the third example, the HTML file doesn't tell us anything about the font that needs to be used. We'll define the font using CSS like this:
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream("body {font-family:tsc fming s tt}".getBytes()));
cssResolver.addCss(cssFile);
Now, all the text in the body will use the font TSC FMing S TT (stored in the file cfmingeb.ttf). You can see the difference in the resulting PDF asian2.pdf.
I think your HTML needs to be encoded as UTF-8, and you can embed the special characters as numeric character references: &#xUNUM; for a hexadecimal code or &#NUM; for a decimal code. I can't tell where in your program this should happen, since that code isn't shown, but your final HTML should be:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML LEVEL 1//EN">
<HTML>
<HEAD>
<TITLE>Your Page Title</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
</HEAD>
<BODY>
<!-- YOUR CONTENT HERE -->
<td fontfamily="Helvetica" fontstyle="0" fontsize="9"
align="0" colspan="48"
lineoccupied="1">Tổ chức tham gia</td>
</BODY>
</HTML>
You can cut and paste the above into an HTML file and view the result. More reading pleasure is here Unicode and HTML
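If the text is already available as a Java String, the hexadecimal references can be generated rather than typed by hand. The following is a minimal sketch, assuming that every code point outside ASCII should become a reference. The class name `HtmlEntities` is hypothetical, and the method does not escape markup characters such as `&` or `<`.

```java
final class HtmlEntities {
    // Replaces every non-ASCII code point with a hexadecimal numeric character reference.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                   // plain ASCII is copied as-is
            } else {
                out.append(String.format("&#x%X;", cp));   // e.g. U+1ED5 becomes &#x1ED5;
            }
        });
        return out.toString();
    }
}
```

Iterating over code points rather than chars keeps characters outside the Basic Multilingual Plane as a single reference instead of two surrogate halves.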
I am attempting to use TextMarginFinder to prove that odd and even pages back up correctly when printing. I have based my code on:
http://itextpdf.com/examples/iia.php?id=280
The issue I have is that on odd pages I expect the box to be aligned to the left, showing for example a 1 cm back margin, and on even pages I expect the box to be aligned to the right, also showing a 1 cm back margin. Even in the example above this is not the case, yet when printed the text does back up perfectly because the Trim Box conforms.
In summary, I believe that on certain PDF files the TextMarginFinder locates the text width incorrectly, usually on even pages. This is evident from the width being greater than the actual text. It usually happens when there are slug marks outside the Media Box area.
In the PDF the OP pointed to (margins.pdf from the iText samples themselves) indeed the box is not flush with the text:
If you look into the PDF Content, though, you'll see that many of the lines have a trailing space character, e.g. the first line:
(s I have worn out since I started my ) Tj
These trailing space characters are part of the text. Therefore the box is not flush with the visible text, but it is flush with the text including those space characters.
If you want to ignore such space characters, you can try doing so by filtering such trailing spaces (or for the sake of simplicity all spaces) before they get fed into the TextMarginFinder. To do this I'd explode the TextRenderInfo instances character-wise and then filter those which trim to empty strings.
A helper class to explode the render info objects:
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
public class TextRenderInfoSplitter implements RenderListener
{
public TextRenderInfoSplitter(RenderListener strategy) {
this.strategy = strategy;
}
public void renderText(TextRenderInfo renderInfo) {
for (TextRenderInfo info : renderInfo.getCharacterRenderInfos()) {
strategy.renderText(info);
}
}
public void beginTextBlock() {
strategy.beginTextBlock();
}
public void endTextBlock() {
strategy.endTextBlock();
}
public void renderImage(ImageRenderInfo renderInfo) {
strategy.renderImage(renderInfo);
}
final RenderListener strategy;
}
Using this helper you can update the iText sample like this:
RenderFilter spaceFilter = new RenderFilter() {
public boolean allowText(TextRenderInfo renderInfo) {
return renderInfo != null && renderInfo.getText().trim().length() > 0;
}
};
PdfReader reader = new PdfReader(src);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(RESULT));
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextMarginFinder finder = new TextMarginFinder();
FilteredRenderListener filtered = new FilteredRenderListener(finder, spaceFilter);
parser.processContent(i, new TextRenderInfoSplitter(filtered));
PdfContentByte cb = stamper.getOverContent(i);
cb.rectangle(finder.getLlx(), finder.getLly(), finder.getWidth(), finder.getHeight());
cb.stroke();
}
stamper.close();
reader.close();
The result:
In case of slug area text etc you might want to filter more, e.g. anything outside the crop box.
Beware, though, there might be fonts in which the space character is not invisible, e.g. a font of boxed characters. Taking the spaces out of the equation in that case would be wrong.
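The effect of the filter can also be illustrated without iText. The sketch below uses a hypothetical `Glyph` type as a stand-in for a per-character render info (its text plus its left and right x coordinates) and computes the horizontal extent with and without whitespace-only glyphs. None of these names are part of the iText API.

```java
import java.util.List;

final class MarginSketch {
    // Hypothetical stand-in for one character's render info.
    static final class Glyph {
        final String text;
        final float left;
        final float right;

        Glyph(String text, float left, float right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    // Returns {minLeft, maxRight} over the glyphs, optionally skipping
    // glyphs whose text trims to the empty string.
    static float[] extent(List<Glyph> glyphs, boolean ignoreSpaces) {
        float min = Float.MAX_VALUE;
        float max = -Float.MAX_VALUE;
        for (Glyph g : glyphs) {
            if (ignoreSpaces && g.text.trim().isEmpty()) {
                continue;
            }
            min = Math.min(min, g.left);
            max = Math.max(max, g.right);
        }
        return new float[] {min, max};
    }
}
```

For a line that ends in a space, the right edge with `ignoreSpaces` set to true stops at the last visible character, which is the behaviour the combination of splitter and filter aims for.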
When moving from version 4.1.2 to 5.1.3 of iTextSharp, I came across a bug that happens when generating a PDF from text. When a line starts with leading spaces, the first leading space gets truncated. This is a problem for right-justified columns.
Example: (dashes= spaces)
Input:
------Header
--------------1
--------------2
0123456789
Output:
-----Header
-------------1
-------------2
0123456789 ~~~Notice improper alignment because this column did not have leading space!
The problematic code has been narrowed down to the method "TrimFirstSpace" in the file "iTextSharp/text/pdf/PdfChunk.cs".
This method is called from the PdfDocument class while streaming out the bytes. The problem is that there are no code comments explaining what this method is trying to accomplish.
What should I change to make this work correctly? It seems like commenting out the ELSE branch here should fix it.
public float TrimFirstSpace()
{
BaseFont ft = font.Font;
if (ft.FontType == BaseFont.FONT_TYPE_CJK && ft.GetUnicodeEquivalent(' ') != ' ')
{
if (value.Length > 1 && value.StartsWith("\u0001"))
{
value = value.Substring(1);
return font.Width('\u0001');
}
}
else
{
if (value.Length > 1 && value.StartsWith(" "))
{
value = value.Substring(1);
return font.Width(' ');
}
}
return 0;
}
Newer code changes address the issue. The if statement is important.
Old
chunk = overflow;
chunk.TrimFirstSpace();
New
bool newlineSplit = chunk.IsNewlineSplit();
chunk = overflow;
if (!newlineSplit)
chunk.TrimFirstSpace();
http://sourceforge.net/p/itextsharp/code/518/tree/trunk/src/core/iTextSharp/text/pdf/PdfDocument.cs#l415
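The decision made by the newer code can be modelled in a few lines. This is a simplified Java sketch rather than the iTextSharp implementation: it only mimics the non-CJK branch of TrimFirstSpace, and the names `OverflowTrim` and `startOfNextLine` are hypothetical.

```java
final class OverflowTrim {
    // Mimics the non-CJK branch of TrimFirstSpace: drop one leading space,
    // but only if more text follows it.
    static String trimFirstSpace(String value) {
        if (value.length() > 1 && value.startsWith(" ")) {
            return value.substring(1);
        }
        return value;
    }

    // Newer behaviour: trim only when the line ended because of wrapping,
    // not because of an explicit newline in the input.
    static String startOfNextLine(String overflow, boolean newlineSplit) {
        return newlineSplit ? overflow : trimFirstSpace(overflow);
    }
}
```

With this rule, a line that follows an explicit newline keeps all of its leading spaces, so right-justified columns built from padded text stay aligned, while text that merely wrapped still loses the space at the wrap point.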