In C#, how do I read a text file whose fields are separated by the FS (File Separator) character?
It should be something like:
using System.IO;
using System.Text;
// Use the encoding the file was actually written in
var encoding = Encoding.GetEncoding("iso-8859-1");
// where somefile.txt is your file
string text = File.ReadAllText("somefile.txt", encoding);
// Split into fields. FS (File Separator) is the control character 0x1C.
string[] parts = text.Split('\x1C');
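For comparison, here is the same read-and-split in Java, the document's other language. It is a minimal sketch that assumes the file is ISO-8859-1, as the C# answer does (FsSplit and its methods are hypothetical names; Files.readString needs Java 11 or later). One behavioural difference matters for delimited data: Java's String.split takes a regular expression and drops trailing empty fields unless it is given a negative limit, whereas C#'s Split keeps empty entries by default.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FsSplit {
    // ASCII control character 0x1C, "File Separator"
    static final char FS = '\u001C';

    // Splits on FS. String.split takes a regex, but 0x1C is not a regex
    // metacharacter, so it matches literally. The -1 limit keeps trailing
    // empty fields, matching the default behaviour of C#'s String.Split.
    static String[] splitOnFs(String text) {
        return text.split(String.valueOf(FS), -1);
    }

    // Reads the whole file as ISO-8859-1 (assumed, as in the C# answer) and splits it.
    static String[] readAndSplit(Path file) throws IOException {
        String text = Files.readString(file, StandardCharsets.ISO_8859_1);
        return splitOnFs(text);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FsSplit somefile.txt
        for (String part : readAndSplit(Path.of(args[0]))) {
            System.out.println(part);
        }
    }
}
```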
I have been trying for quite some time to generate a PDF in Java using iText (com.itextpdf: kernel, layout, form, pdfa) with text containing special characters (äöüß). I tried several variations, such as loading a TTF file and setting the encoding:
FontProgram fontProgram = FontProgramFactory.createFont("font/FreeSans.ttf");
PdfFont font = PdfFontFactory.createFont(fontProgram, "UTF-8");
document.setFont(font);
This way it just doesn't display special characters at all.
This doesn't work either:
var font = PdfFontFactory.createFont(StandardFonts.HELVETICA, PdfEncodings.UTF8);
document.setFont(font);
I haven't found a solution, and the official tutorials don't seem to cover this.
Other encodings just render placeholder characters.
This is how I add the text:
PdfWriter writer = new PdfWriter(filename);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
Paragraph p = new Paragraph("äüöß");
document.add(p);
document.close();
Edit: I just realized that it works when I load the text from elsewhere, such as an input field, instead of passing a hardcoded string literal. How can I make this work with hardcoded strings?
I tried re-encoding the string as described in https://www.baeldung.com/java-string-encode-utf-8
but none of these methods work either; the wrong characters are always shown.
PdfFont freeUnicode = PdfFontFactory.createFont("font/FreeSans.ttf", PdfEncodings.IDENTITY_H);
String rawString = "äöüß1234'";
byte[] bytes = StringUtils.getBytesUtf8(rawString);
String utf8EncodedString = StringUtils.newStringUtf8(bytes);
document.add(new Paragraph().setFont(freeUnicode)
.add(utf8EncodedString));
Edit: The source code editor uses UTF-8, and I passed UTF-8 to the createFont() method, but that didn't work. When I pass CP1252 and change the source file encoding to ISO-8859-1, the correct characters are shown. It's strange that I couldn't find much information about this problem.
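The second edit hints at a likely cause, though the question does not confirm it: if javac reads the UTF-8 source file using a different default encoding (windows-1252 is assumed here as a common Windows default), every literal such as "äöüß" is already garbled before iText ever sees it, which would also explain why text typed into an input field works. The Java sketch below simulates that misreading; SourceEncodingDemo is a hypothetical name. The usual remedy is to compile with javac -encoding UTF-8 (or the equivalent build-tool setting); writing non-ASCII literals as Unicode escapes, as the sketch does, sidesteps the problem entirely.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingDemo {
    // Simulates a compiler that reads UTF-8 source bytes with the wrong charset:
    // the literal's UTF-8 bytes are decoded as if they were in wrongCharset.
    static String misread(String literal, Charset wrongCharset) {
        byte[] utf8Bytes = literal.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Written with Unicode escapes so this file compiles the same way under any -encoding
        String umlauts = "\u00E4\u00F6\u00FC\u00DF";
        String garbled = misread(umlauts, Charset.forName("windows-1252"));
        // Print code points rather than glyphs so the console encoding does not matter
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        // prints U+00C3 U+00A4 U+00C3 U+00B6 U+00C3 U+00BC U+00C3 U+0178
    }
}
```

Each two-byte UTF-8 sequence turns into two windows-1252 characters, so "äöüß" becomes the familiar "Ã¤Ã¶Ã¼ÃŸ" pattern.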
I've been given string data (stored in a database) that was produced by reading binary TIFF files as text, using something like the following (the originator actually used the VB FileSystemObject):
var txt = File.ReadAllText(@"c:\some_image.tiff");
I need to re-create the original TIFF files. I've tried looping through all available encodings:
var txt = File.ReadAllText(@"c:\test\some_image.tiff");
var encodings = Encoding.GetEncodings();
for (var i = 0; i < encodings.Length; i++)
{
File.WriteAllBytes(@"c:\test\some_image_" + i + ".tiff", encodings[i].GetEncoding().GetBytes(txt));
}
but to no avail.
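Whether the TIFF bytes can be recovered depends on how the original read decoded them. A decoder that maps every byte value to exactly one character (ISO-8859-1 does this) can be reversed exactly, while a decoder that replaces invalid byte sequences (UTF-8 substitutes U+FFFD) discards information that no encoding on the way back can restore. The Java sketch below checks both cases on all 256 byte values; BinaryRoundTrip is a hypothetical name. Which code page the FileSystemObject used on the original machine is not stated in the question; the system's ANSI code page is an assumption worth trying first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTrip {
    // Decodes bytes to text with the given charset, then encodes the text back.
    static byte[] roundTrip(byte[] original, Charset cs) {
        String asText = new String(original, cs);
        return asText.getBytes(cs);
    }

    // All 256 byte values, standing in for arbitrary binary (e.g. TIFF) data.
    static byte[] allByteValues() {
        byte[] b = new byte[256];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) i;
        }
        return b;
    }

    public static void main(String[] args) {
        byte[] data = allByteValues();
        // ISO-8859-1 maps each byte to exactly one char, so this prints true
        System.out.println(Arrays.equals(data, roundTrip(data, StandardCharsets.ISO_8859_1)));
        // UTF-8 replaces invalid sequences with U+FFFD, so this prints false
        System.out.println(Arrays.equals(data, roundTrip(data, StandardCharsets.UTF_8)));
    }
}
```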
When I get the file name from the link, I get "Ð\u0097акаÑ\u0082.jpg" instead of "Закат.jpg".
I determined that the file name arrives in the "windows-1251" encoding. I tried converting it to UTF-8 and UTF-16, but the file name is still displayed incorrectly. Has anyone come across something like this?
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding wind1252 = Encoding.GetEncoding("windows-1252");
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(FileName);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);
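The garbled name contains U+0097 and U+0082, which are C1 control characters. That pattern suggests (an inference, not confirmed in the question) that the name was sent as UTF-8 bytes and decoded as ISO-8859-1, rather than being windows-1251 text. The code above cannot undo this: encoding a string, converting the bytes with Encoding.Convert, and decoding them with the target encoding just reproduces the same string. Reversing the mistake means recovering the original bytes with the wrong charset and decoding them with the right one; in C# that would be Encoding.UTF8.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(FileName)). The same idea as a Java sketch, with MojibakeRepair as a hypothetical name:

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Undoes "UTF-8 bytes decoded as ISO-8859-1": re-encoding with ISO-8859-1
    // recovers the original bytes (each char U+0000..U+00FF maps back to one byte),
    // which are then decoded correctly as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The file name from the question after being mis-decoded, written with escapes
        String garbled = "\u00D0\u0097\u00D0\u00B0\u00D0\u00BA\u00D0\u00B0\u00D1\u0082.jpg";
        String fixed = repair(garbled);
        // Compare against the expected Cyrillic name (also escaped) instead of printing it
        System.out.println(fixed.equals("\u0417\u0430\u043A\u0430\u0442.jpg")); // true
    }
}
```

This only works if the wrong decoding really was ISO-8859-1; if a different code page was involved, that code page has to be used in the first step instead.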
I created a text file in ANSI format containing a Persian word, and I read it with this code:
System.Net.WebClient wc = new System.Net.WebClient();
string textBoxNewsRight2Left = wc.DownloadString("http://dl.rosesoftware.ir/RoseSoftware%20List/Settings/News.Settings.txt");
MessageBox.Show(textBoxNewsRight2Left);
but I only see ???????????????????? characters!
I changed the file format to UTF-8, which fixed the Persian word, but then I ran into another problem, this time with English words.
With the UTF-8 file I used the same code again:
System.Net.WebClient wc = new System.Net.WebClient();
string textBoxNewsRight2Left = wc.DownloadString("http://dl.rosesoftware.ir/RoseSoftware%20List/Settings/News.Settings.txt");
MessageBox.Show(textBoxNewsRight2Left);
Now I see "ÿþr" and none of the words from my file!
Then I set:
wc.Encoding = Encoding.Unicode;
Now I see the words from my file, without "ÿþr".
As far as I can tell, the original file contains hello, and the text read by C# is also hello.
Then I compare textBoxNewsRight2Left with "hello":
if (textBoxNewsRight2Left == "hello")
{
MessageBox.Show("Equal");
}
else
{
MessageBox.Show("Not Equal");
}
The message box shows "Not Equal", even though the text I see is hello!
What is the problem, and how can I fix it?
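A plausible explanation, not confirmed in the question: "ÿþ" is how a single-byte decoder displays the bytes FF FE, the UTF-16LE byte order mark, so the file was probably saved as UTF-16LE ("Unicode" in Notepad). After switching to Encoding.Unicode, that BOM may survive as an invisible U+FEFF at the start of the string, and a trailing line break is invisible in a message box too, so the comparison with "hello" fails even though the text looks right. The Java sketch below reproduces this with the UTF_16LE charset, which keeps a leading BOM as U+FEFF, and normalizes the string before comparing; in C# the same normalization would be text.TrimStart('\uFEFF').Trim(). BomAwareCompare and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class BomAwareCompare {
    // With the UTF_16LE charset, Java keeps a leading byte order mark
    // as the invisible character U+FEFF instead of discarding it.
    static String decodeUtf16Le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // Drops a leading U+FEFF and surrounding whitespace such as a trailing line break.
    static String normalize(String s) {
        if (!s.isEmpty() && s.charAt(0) == '\uFEFF') {
            s = s.substring(1);
        }
        return s.trim();
    }

    public static void main(String[] args) {
        // Bytes of a small "Unicode" (UTF-16LE) file: BOM FF FE, then "hello" and CR LF
        byte[] file = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'e', 0, 'l', 0, 'l', 0, 'o', 0, '\r', 0, '\n', 0};
        String raw = decodeUtf16Le(file);
        System.out.println(raw.equals("hello"));            // false
        System.out.println(normalize(raw).equals("hello")); // true
    }
}
```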
I need the ASCII codes of a Persian string to use in a program, but the method below returns only question marks: "??? ????"
public string PerisanAscii()
{
//persian string
string unicodeString = "صبح بخیر";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
return asciiString;
}
Can you help me?
Best regards,
Mohsen
You can encode the Persian text as Windows-1256 (the Windows Arabic code page):
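using System.Text;
// On .NET Core and .NET 5+, windows-1256 is only available after registering the code-pages provider
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var enc1256 = Encoding.GetEncoding("windows-1256");
var data = enc1256.GetBytes(unicodeString);
System.IO.File.WriteAllBytes(path, data);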
ASCII cannot represent Persian characters. You may need the old Iran System encoding standard; which encoding is required is determined by your AutoCAD application. I don't know whether Windows provides a built-in Encoding for it, but you can also convert the characters manually, since it is a simple mapping.
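To see the difference concretely, the Java sketch below encodes صبح (the first word of the question's string) with windows-1256 and with US-ASCII and prints the byte values. ASCII has no Persian letters, so each one is replaced by '?' (63), which matches the "??? ????" in the question. PersianCodePage.codes is a hypothetical helper; windows-1256 is provided by the jdk.charsets module, which full JDK installations include. The Iran System encoding mentioned above is not covered by this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PersianCodePage {
    // Encodes text with the given charset and returns the byte values as 0..255
    // (Java's byte type is signed, so each byte is masked with 0xFF).
    static int[] codes(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        int[] unsigned = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            unsigned[i] = bytes[i] & 0xFF;
        }
        return unsigned;
    }

    public static void main(String[] args) {
        // The first word of the question's string, written with escapes: sad, beh, hah
        String sobh = "\u0635\u0628\u062D";
        System.out.println(Arrays.toString(codes(sobh, Charset.forName("windows-1256")))); // [213, 200, 205]
        // US-ASCII has no Persian letters; getBytes substitutes '?' (63) for each one
        System.out.println(Arrays.toString(codes(sobh, StandardCharsets.US_ASCII)));       // [63, 63, 63]
    }
}
```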