String encoding when posting to Twitter

I'm developing a program that sends tweets.
I have this piece of code:
StringBuilder sb = new StringBuilder("Recomendo ");
sb.append(lblName.getText());
sb.append(" no canal "+lblCanal.getText());
sb.append(" no dia "+date[2]+"/"+date[1]+"/"+date[0]);
sb.append(" às "+time[0]+"h"+time[1]);
byte[] defaultStrBytes = sb.toString().getBytes("ISO-8859-1");
String encodedString = new String(defaultStrBytes, "UTF-8");
But when I send the tweet, I get "?" or other strange characters in place of accented letters such as "à". I've also tried just
String encodedString = new String(sb.toString().getBytes(), "UTF-8"); //also tried with ISO-8859-1
but the problem remains...

You are encoding the string as Latin-1 and then decoding those bytes as UTF-8. The Latin-1 byte for "à" is not a valid UTF-8 sequence, which is why you get question marks.
Try sending your string as is:
String encodedString = sb.toString();
The charset should be taken care of when you send the message to Twitter. If URL encoding is required, you would do something like
String msg = URLEncoder.encode(encodedString, "UTF-8");
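To make the failure mode above concrete, here is a minimal sketch that reproduces the Latin-1/UTF-8 mix-up and the URL-encoding step using only the JDK. The class and method names (EncodingDemo, misdecode, urlEncode) are made up for illustration and are not part of any Twitter library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingDemo {

    // Reproduces the bug: encode as ISO-8859-1, then decode the bytes as UTF-8.
    // Accented letters become invalid UTF-8 and are replaced by U+FFFD.
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Percent-encodes the untouched string as UTF-8 for use in a URL or form body.
    static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00e0s 21h30"; // "às 21h30"
        System.out.println(misdecode(original)); // accent is lost
        System.out.println(urlEncode(original)); // accent survives as %C3%A0
    }
}
```

A Java String is already Unicode, so no conversion is needed before handing it to the sending code; only the final byte or URL representation has to pick a charset.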

Related

iTextSharp does not show Unicode symbols on pdf document

I have a small problem in ASP.NET Web Forms. I am using iTextSharp to create a PDF document. Everything works well, except that Unicode characters are either not displayed or appear as question marks. Here is my code; if anyone knows what I am doing wrong, I would be very grateful!
BaseFont base = BaseFont.CreateFont("C:\\Windows\\Fonts\\Wingdings.ttf", BaseFont.CP1252, BaseFont.EMBEDDED);
Font font = new Font(base, 12, Font.NORMAL, BaseColor.BLACK);
using (MemoryStream ms = new MemoryStream())
{
using (var document = new Document())
{
HTMLWorker parser = new HTMLWorker(document);
document.NewPage();
string str = " \u2022 \u2706 ";
Phrase phrase = new Phrase(str, font);
document.Add(phrase);
//and after that I have some other strings which are parsed with parser and that works perfectly
}}
I have tried to parse the phrase with XMLWorkerHelper, but it did not work at all. I have tried other fonts such as Wingdings 2 and Wingdings 3, but that did not work either. I have also tried converting the string like this, and the symbols still appeared as question marks:
byte[] bytestr = Encoding.Default.GetBytes(str);
string convertstr = Encoding.UTF8.GetString(bytestr);
Phrase phrase = new Phrase(convertstr, font);
document.Add(phrase);
I have also tried entering them literally, like this, but I got the same question marks:
string str = "•✆";
Phrase phrase = new Phrase(str, font);
document.Add(phrase);
EDIT: I have tried changing the font to Segoe UI Symbol and using escaped characters like \u260F, but it still does not work.

Norwegian Characters as '?' in ics c#

I have been trying to send a calendar event (.ics) as a mail attachment, but Norwegian characters such as 'ø' in the summary and description show up as '?'.
Please help; I am new to calendar events in ASP.NET MVC.
System.Text.StringBuilder str = new StringBuilder();
str.AppendLine("BEGIN:VCALENDAR");
str.AppendLine("PRODID:-//Schedule a Meeting");
str.AppendLine("VERSION:2.0");
str.AppendLine("METHOD:PUBLISH");
str.AppendLine("BEGIN:VEVENT");
str.AppendLine(string.Format("DTSTART:{0:yyyyMMddTHHmmssZ}",model.Startdate));
str.AppendLine(string.Format("DTSTAMP:{0:yyyyMMddTHHmmssZ}", DateTime.UtcNow));
str.AppendLine(string.Format("DTEND:{0:yyyyMMddTHHmmssZ}", model.EndDate));
str.AppendLine("LOCATION: " + model.Location);
str.AppendLine(string.Format("UID:{0}", Guid.NewGuid()));
str.AppendLine(string.Format("DESCRIPTION:{0}", model.desc));
str.AppendLine(string.Format("SUMMARY:{0}", model.Name));
str.AppendLine(string.Format("ORGANIZER:MAILTO:{0}", model.Email));
str.AppendLine("BEGIN:VALARM");
str.AppendLine("TRIGGER:-PT15M");
str.AppendLine("ACTION:DISPLAY");
str.AppendLine("DESCRIPTION:Reminder");
str.AppendLine("END:VALARM");
str.AppendLine("END:VEVENT");
str.AppendLine("END:VCALENDAR");
byte[] byteArray = Encoding.ASCII.GetBytes(str.ToString());
MemoryStream stream = new MemoryStream(byteArray);
Attachment attach = new Attachment(stream, "Invitation.ics");
The problem here is that you're losing the special characters by using the ASCII encoding. Use another encoding, e.g. UTF-8, which is a variable-width multi-byte encoding that can represent every Unicode character.
The link below shows how to specify the encoding used in the ics file:
https://theeventscalendar.com/support/forums/topic/ical-text-encoding/
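The effect described above is not specific to .NET. As a rough illustration, the following Java sketch shows the same loss when text is pushed through an ASCII encoder, and that UTF-8 preserves it. The class name IcsEncodingDemo and the sample SUMMARY line are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class IcsEncodingDemo {

    // Encodes to ASCII bytes and back; characters outside ASCII become '?'.
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Encodes to UTF-8 bytes and back; every character is preserved.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = "SUMMARY:M\u00f8te"; // "SUMMARY:Møte"
        System.out.println(roundTripAscii(line));
        System.out.println(roundTripUtf8(line));
    }
}
```

In the C# code from the question, the equivalent change would be to build the byte array with Encoding.UTF8 instead of Encoding.ASCII.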

Arabic problems with converting html to PDF using ITextRenderer

I am using ITextRenderer to convert HTML to PDF. This is my code:
ByteArrayOutputStream out = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
String inputFile = "C://Users//Administrator//Desktop//aaa2.html";
String url = new File(inputFile).toURI().toURL().toString();
renderer.setDocument(url);
renderer.getSharedContext().setReplacedElementFactory(
new B64ImgReplacedElementFactory());
// Work around the Arabic rendering problem
ITextFontResolver fontResolver = renderer.getFontResolver();
try {
fontResolver.addFont("C://Users//Administrator//Desktop//arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
} catch (DocumentException e) {
e.printStackTrace();
}
renderer.layout();
OutputStream outputStream = new FileOutputStream("C://Users//Administrator//Desktop//HTMLasPDF.pdf");
renderer.createPDF(outputStream, true);
/*PdfWriter writer = renderer.getWriter();
writer.open();
writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
OutputStream outputStream2 = new FileOutputStream( "C://Users//Administrator//Desktop//HTMLasPDFcopy.txt");
renderer.createPDF(outputStream2);*/
renderer.finishPDF();
out.flush();
out.close();
Actual PDF Result:
Expected PDF Result:
How can I get Arabic ligatures to render correctly?
If you want to do this properly (I assume using iText, since your post is tagged as such), you should use
iText7
pdfHTML (to convert HTML to PDF)
pdfCalligraph (to handle Arabic ligatures properly)
a font that supports these features (as indicated by another answer)
For an example, please consult the HTML to PDF tutorial, more specifically the following FAQ item: How to convert HTML containing Arabic/Hebrew characters to PDF?
You need fonts that contain the glyphs you need, e.g.:
public static final String[] FONTS = {
"src/main/resources/fonts/noto/NotoSans-Regular.ttf",
"src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
"src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};
And you need a FontProvider that knows how to find these fonts in the ConverterProperties:
public void createPdf(String src, String[] fonts, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
for (String font : fonts) {
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram);
}
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
Note that the text will come out all wrong if you don't have the pdfCalligraph add-on. That add-on didn't exist at the time Flying Saucer was created, hence you can't use Flying Saucer for converting documents with text in Arabic, Hindi, Telugu,... Read the pdfCalligraph white paper if you want to know more about ligatures.
Greek characters seemed to be omitted; they didn’t show up in the document.
In Flying Saucer the generated PDF uses some kind of default (probably Helvetica) font that contains a very limited character set, which obviously does not contain the Greek code page. (link)
I switched to wkhtmltopdf for the PDF conversion.

Convert persian unicode to Ascii

I need to get the ASCII codes of a Persian string to use in a program, but the method below returns question marks: "??? ????"
public string PerisanAscii()
{
//persian string
string unicodeString = "صبح بخیر";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
return asciiString;
}
Can you help me?
Best regards,
Mohsen
You can convert the Persian string to Windows-1256 (Arabic Windows):
var enc1256 = Encoding.GetEncoding("windows-1256");
var data = enc1256.GetBytes(unicodeString);
System.IO.File.WriteAllBytes(path, data);
ASCII does not support Persian. You may need the old-school Iran System encoding standard; which encoding applies is determined by your AutoCAD application. I don't know whether Windows has a built-in Encoding for it, but you can also convert the characters manually. It's a simple mapping.
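One way to see up front that a target encoding cannot hold a string, instead of discovering the question marks afterwards, is to ask the encoder. The Java sketch below uses only the JDK; the class name EncodabilityCheck and the sample text are assumptions for illustration, and the same idea applies to whichever encoding the consuming application actually expects.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodabilityCheck {

    // Returns true only if every character of text can be represented in charset.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String persian = "\u0635\u0628\u062d"; // the first word of the sample string
        System.out.println(canEncode(StandardCharsets.US_ASCII, persian));
        System.out.println(canEncode(StandardCharsets.UTF_8, persian));
    }
}
```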

WP7's WebBrowser.NavigateToString() and text encoding

Does anyone know how to load a UTF-8 encoded string using the WebBrowser.NavigateToString() method? For now I end up with a bunch of incorrectly displayed characters.
Here's the simple string that won't display correctly:
webBrowser.NavigateToString("ąęłóńżźćś");
The code file is saved with UTF-8 encoding (with signature).
Thanks.
Using ConvertExtendedASCII as suggested works, but is very slow. Using a StringBuilder instead was (in my case) about 800 times faster:
public string FixHtml(string HTML)
{
StringBuilder sb = new StringBuilder();
char[] s = HTML.ToCharArray();
foreach (char c in s)
{
if (Convert.ToInt32(c) > 127)
sb.Append("&#" + Convert.ToInt32(c) + ";");
else
sb.Append(c);
}
return sb.ToString();
}
First, NavigateToString() expects a full HTML document.
Second, since you're passing HTML, it's best to pass HTML entities rather than rely on encodings. Unfortunately, the browser supports only a few named entities, so you should use numeric Unicode values where necessary.
Much like this:
webBrowser1.NavigateToString("<html><body><p>&#243; &#213;</p></body></html>");
Try this article; it should help. In short, it proposes the following snippet to convert your string into an appropriate format:
private static string ConvertExtendedASCII(string HTML)
{
string retVal = "";
char[] s = HTML.ToCharArray();
foreach (char c in s)
{
if (Convert.ToInt32(c) > 127)
retVal += "&#" + Convert.ToInt32(c) + ";";
else
retVal += c;
}
return retVal;
}
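For comparison, here is the same numeric-entity escaping idea as a small Java sketch. It is not the code from the article; the names HtmlEntityEscaper and escapeNonAscii are invented for this example. It walks code points rather than UTF-16 chars, so a character outside the Basic Multilingual Plane becomes a single entity instead of two surrogate halves.

```java
// Hypothetical helper class, for illustration only.
final class HtmlEntityEscaper {

    // Replaces every code point above 127 with a decimal numeric entity (&#NNN;).
    static String escapeNonAscii(String html) {
        StringBuilder sb = new StringBuilder(html.length());
        html.codePoints().forEach(cp -> {
            if (cp > 127) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.append((char) cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("<p>\u00f3 \u00d5</p>"));
    }
}
```

As with the StringBuilder version above, appending to a single builder avoids the repeated string copying that makes the concatenation-based loop slow.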
If you have the UTF-8 data in memory in a byte array, you could try NavigateToStream with a MemoryStream rather than NavigateToString. You should try to ensure there is a BOM on the UTF-8 buffer if you can.
Note that the string in the question is not a UTF8 string. It is a UTF16 string with some garbage in it. By placing zeros between the bytes and storing it in a System.String you corrupted it.
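As a rough illustration of the BOM suggestion above, the following Java sketch builds a UTF-8 byte buffer that starts with the byte order mark (EF BB BF). The class and method names (Utf8Bom, withBom) are assumptions; how the buffer is then handed to the browser control depends on the platform API and is not shown.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8Bom {

    // Returns the UTF-8 encoding of text, prefixed with the 3-byte UTF-8 BOM.
    static byte[] withBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] buffer = withBom("<html><body>\u0105\u0119\u0142</body></html>");
        System.out.println(buffer.length);
    }
}
```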