This question already has answers here:
How can I render hindi correctly when exporting to pdf?
(4 answers)
Closed 7 months ago.
**
How do I read Hindi content stored in an Informix database from Java code?
How do I display Hindi content in a Jasper iReport report? The content stored in the
database shows up as ??????????????????????????????????????
My code is:
String sql = "select template from ropk_sms where template_id = 1307165174435759958";
List<Map<String, Object>> records = jdbcTemplateObject.queryForList(sql);
for (Map<String, Object> row : records) {
    Sirkdpe0100ActionBean objBean = new Sirkdpe0100ActionBean();
    // The query selects a single column, so take the first value of the row.
    Object value = row.values().iterator().next();
    objBean.setParty_name(value != null ? value.toString().trim() : "");
    System.out.println("DATA" + objBean.getParty_name());
    allRecords.add(objBean);
}
**
Hindi content is just a string of characters, like content in any other language.
The question marks most likely mean that the wrong encoding is used somewhere along the way, or that the font used to display the characters does not support Hindi.
Make sure all your systems use Unicode and the same encoding (e.g. UTF-8).
To have your database sort results correctly, set the collation to Hindi as well (though I do not know how this is done on Informix).
Finally, ensure that the fonts used for rendering contain glyphs for your characters.
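To illustrate the encoding point, here is a minimal sketch. It is in Python rather than the asker's Java, and the sample word and the choice of Latin-1 as the "wrong" charset are assumptions made only for illustration. It shows how Devanagari text turns into question marks as soon as one step uses a charset that cannot represent it, while a consistent UTF-8 round trip keeps it intact.

```python
# Illustrative sample only: a Hindi word written with escapes so that
# the four code points are unambiguous.
text = "\u092a\u093f\u0924\u093e"

# A step that uses a legacy single-byte charset cannot represent
# Devanagari; with errors="replace" every code point becomes "?".
lossy = text.encode("latin-1", errors="replace")
print(lossy)

# A consistent UTF-8 round trip keeps the text intact.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)

# Once the characters have been replaced, decoding cannot recover them.
damaged = lossy.decode("latin-1")
print(damaged == text)
```

If the data already contains literal question marks at some stage, the loss happened before that stage, so it helps to check each hop (database, driver, application, report) in turn.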
This question already has answers here:
When to use Unicode Normalization Forms NFC and NFD?
(2 answers)
Normalizing Unicode
(2 answers)
What is the best way to remove accents (normalize) in a Python unicode string?
(13 answers)
Closed 4 months ago.
I was looking into Matt Parker's problem of finding five 5-letter words that together contain 25 distinct letters (link if you are interested, not really relevant to the issue), but I wanted to do it with French words (and so I had to use the French alphabet).
So I downloaded the following file from GitHub, containing one French word per line, encoded in UTF-8, with the accents. But something I had never seen before occurred: there seem to be two ways (maybe more) to encode accents in UTF-8.
When I opened the file in VSCode (this also happens in Windows Notepad), every accent displayed as it should, but:
when I select a letter with an accent, it shows that 2 characters are selected
when I type a letter with an accent on a French keyboard and then select it, it shows that only 1 character is selected (so the two appear to be distinct, though they display identically)
I then tried the following code in Python:
# both are typed from the keyboard: prints True
print("é" == "é")
# first is copied from the downloaded text file, second is typed from the keyboard: prints False
print("é" == "é")
They display identically, but are not identical!?
I then tried changing the encoding of the file, but it either changed nothing or removed the accents: the word "abaissé" might, for example, show as "abaisse�".
After testing, it seems that the file encodes each accented letter as a separate accent character adjacent to the letter it modifies. When I read the file character by character (I did it in Rust, but I'm pretty sure other languages behave the same), they come as two characters: I first get the accent and then the letter. Rust chars are Unicode scalar values, so if the pair were a single Unicode character, Rust should read it as a single char. That is not the case.
I'm honestly stuck, so if you know, I would be really grateful for your help.
Why do these pairs of characters display as 1 character when they should display as 2 distinct ones?
I know this is not a bug in UTF-8, but it makes handling the data very hard. So how do I recombine them, since the accent shouldn't be decoupled from the letter in the first place?
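The linked duplicates point to Unicode normalization. As a minimal sketch, assuming the file stores accents as combining marks (the decomposed form), Python's standard `unicodedata` module can recombine them into precomposed characters with NFC. The helper name `to_nfc` is made up for this example.

```python
import unicodedata

decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # the single code point U+00E9

# Same rendering, different code point sequences.
print(decomposed == precomposed)
print(len(decomposed), len(precomposed))


def to_nfc(s: str) -> str:
    """Return s in Normalization Form C (composed where possible)."""
    return unicodedata.normalize("NFC", s)


# After normalization the two spellings compare equal.
recombined = to_nfc(decomposed)
print(recombined == precomposed)

# Applied to a whole word read from the file (8 code points before, 7 after).
word = "abaisse\u0301"
print(len(word), len(to_nfc(word)))
```

Normalizing every line once when the file is read should make letter counting and comparisons behave as expected for the usual French accented letters.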
This question already has answers here:
Jasper Reports PDF doesn't export cyrillic values
(1 answer)
How can I test if my font is rendered correctly in pdf?
(1 answer)
Closed 11 months ago.
I am having trouble with Jasper Reports. The task is to change the font used in some reports from DejaVu Sans (used earlier to get all the Croatian diacritic letters: č, ć, đ, š, ž) to Times New Roman. The application uses Jaspersoft Studio version 6.9.0.
The existing Times New Roman does not do what I want: it does not show the Croatian letters at all. So I created a "new" Times New Roman font (I tried multiple encodings and preferences when adding the font). It works OK apart from the letter đ. Has anybody here had a similar problem? It would not be a problem for fixed text, where đ can be replaced with dj, but if the content that needs to be formatted in Times New Roman comes from the database, I have to show it the way it is stored there.
When I say that I don't see the specific letter, I mean that I don't see it in the PDF when the report is published.
Thank you for your time.
This question already has answers here:
How can I render hindi correctly when exporting to pdf?
(4 answers)
Closed 4 years ago.
In the internal preview, Hindi text displays correctly, but in the PDF it differs.
For example: पिता. In the PDF it shows प िता
How can I resolve this?
Special characters such as Hindi script are supported by only a few fonts, such as Arial Unicode MS.
To resolve this issue, open iReport Designer and apply the properties below to the report:
Font Name = "Arial Unicode MS".
PDF font name = This property is deprecated, so leave it blank.
PDF Encoding = 'Identity-H (Unicode with horizontal writing)'.
Hope this helps you resolve the issue.
Regards,
Srikanth Kattam
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I want to know how to convert a font to a Unicode font. I have a PDF file in my native language, but its text is written in a specific font file (a TTF file). I want to convert that text to Unicode.
How can I convert the text to Unicode? Is there any free online software available, or do I have to write code in some language?
I have tried in PHP, but without much success.
Your question mixes several basic concepts (it is unclear whether you want to convert a font or the text it's written with), and I suggest you look a bit deeper into font technology before asking "then so how would I do it".
"Normal" fonts are using Unicode encoding. The "encoding" of a font describes which character image inside a font gets output for a given character code. A font can contain several encodings -- MacRoman, Windows Western -- and nowadays including a Unicode encoding is practically standard.
A font that does not comply with Unicode encoding (or any of the common ones) cannot be used without a translation from its character set to Unicode.
Your description suggests that the font in your PDF may be such a non-conforming font, so you need a table that maps its character codes to Unicode values. Use Google to see if someone else did this before you; if not, you will have to create the table yourself.
However.
Since your text comes out of a PDF, you can no longer rely on the encoding! When a PDF is created, the software that creates it is free to move characters to different positions: usually it creates a subset font from the original, and it can be convenient to reassign character codes. Friendly PDF creators may also include their own encoding in the PDF, but this is not mandatory. If it is missing and your font is subsetted, there is only one solution: you will have to create a translation table for that particular PDF. It will be of no use for other documents using "the same" font, because those will most likely have a different subset.
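As a rough sketch of what such a translation table looks like in code (in Python rather than the PHP the question mentions), assuming the simplest case of a one-to-one mapping: the codes and target characters below are entirely hypothetical, and the real table has to be built by inspecting the font or the particular PDF. The name `legacy_to_unicode` is made up for this example.

```python
# Hypothetical table: legacy character code -> Unicode character.
# These three entries are placeholders, not a real font's mapping.
LEGACY_TO_UNICODE = {
    0x41: "\u0915",  # assumed: code 0x41 is drawn as DEVANAGARI LETTER KA
    0x42: "\u0916",  # assumed: code 0x42 is drawn as DEVANAGARI LETTER KHA
    0x43: "\u0917",  # assumed: code 0x43 is drawn as DEVANAGARI LETTER GA
}


def legacy_to_unicode(raw: str) -> str:
    """Translate text extracted with the legacy codes into Unicode.

    str.translate accepts a dict keyed by code point; characters that
    are not in the table are passed through unchanged.
    """
    return raw.translate(LEGACY_TO_UNICODE)


print(legacy_to_unicode("ABC"))
print(legacy_to_unicode("A B") == "\u0915 \u0916")
```

Some scripts also need characters reordered or merged after the lookup, so a plain table like this may only be a first step.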
Closed 10 years ago.
Possible Duplicate:
How to index pdf, ppt, xl files in lucene (java based or python or php any of these is fine)?
I need to search for a string in a collection of files in a folder that includes PDF, DOCX, and TXT formats. Is it possible to search for a string using Lucene.NET?
Please give some helpful references for this.
Thank you.
You would need to extract the text from the various files (pdf, docx, txt) and insert that text into a Lucene index. Lucene itself cannot read text out of the various document formats.
Extract PDF text in .NET
Docx and Ifilters
Generally search for "extract {document format} text in .net" and you should find plenty of resources.