I am new to Linux and antiword.
I want to convert a Word document to PostScript.
I am able to create the file, but the PS file will not open.
At the time of conversion it gives an error like:
"The combination PostScript and UTF-8 is not supported"
This utility, antiword, the latest version of which I found to be 0.37 (21 Oct. 2005), ships with a few Readme files.
It explains its name in its own History file as follows:
The name comes from: "The antidote against people who send Microsoft(R) Word files to everybody, because they believe that everybody runs Windows(R) and therefore runs Word".
The same file says about its current version:
Beta release, for evaluation by the public.
And it also says, under the header Known Limitations:
[....]
2) Antiword doesn't show all the images included in a Word document.
[....]
7) PostScript output will not work in combination with UTF-8. It only works in combination with character sets ISO-8859-1, ISO-8859-2 and ISO-8859-5.
8) Antiword's error messages are not very helpful.
So here you are. The behavior you are experiencing is exactly as is documented. :-)
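Based on that limitation list, a plausible workaround (a sketch, untested here, assuming a stock install that ships the 8859-1.txt mapping file) is to ask antiword for PostScript output with an ISO-8859-1 character mapping instead of UTF-8:
antiword -m 8859-1.txt -p a4 yourfile.doc > yourfile.ps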
@Tushar: It's also recommended to read the included FAQ document.
Edit: Note this is an older question, from a time when the AWS CLI was v1; as noted in the comments, there are likely better solutions with v2.
I'm using AWS CLI on Windows to query items from DynamoDb. Some of these items include non-ASCII characters.
When the query hits those items, it dies with an error
'charmap' codec can't encode character u'\u010d' in position....
After hours of searching, I finally stumbled across a hackish workaround; under the AWSCLI\encodings directory, I copied utf_8.pyc over cp1252.pyc. This allows me to continue, but of course is ugly.
Before resorting to that, I also tried setting environment variables such as LANG, LC_ALL, LC_CTYPE to various permutations of en-US.UTF-8 or similar, all with no effect that I could see.
Does anyone know how (or is it even possible) to tell AWS CLI to use a particular encoding?
Since you're using the command line interface, a change to the terminal's encoding scheme should fix the issue.
Type:
chcp 65001
in the console (for UTF-8; you may also try different encodings) and retry your operations.
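Since the AWS CLI v1 is a Python program, a quick diagnostic sketch (not part of the CLI itself) is to ask Python which encoding it will use for console output, before and after the chcp call:
import sys
print(sys.stdout.encoding)  # on Windows this typically follows the active console code page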
Maybe this could help as well: I hit the same issue with AWS Translate when storing results to a file (or a PowerShell variable).
With the error:
aws translate translate-text --text "Investigation" --source-language-code "auto" --target-language-code "PL" >> a.txt
'charmap' codec can't encode character '\u015a' in position 1: character maps to <undefined>
Adding an environment variable fixes the problem:
set PYTHONIOENCODING=UTF-8
aws translate translate-text --text "Investigation" --source-language-code "auto" --target-language-code "PL" >> a.txt
The same in PowerShell:
PS C:\Users\???\Documents> $aws = aws translate translate-text --text "Request" --source-language-code "auto" --target-language-code "PL"
'charmap' codec can't encode character '\u015b' in position 4: character maps to <undefined>
PS C:\Users\???\Documents> exit
C:\Users\???\Documents>set PYTHONIOENCODING=UTF-8
C:\Users\???\Documents>powershell
Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
PS C:\Users\???\Documents> $aws = aws translate translate-text --text "Request" --source-language-code "auto" --target-language-code "PL"
PS C:\Users\???\Documents> $aws
{
"TranslatedText": "Prośba",
"SourceLanguageCode": "en",
"TargetLanguageCode": "pl"
}
I've reinstalled the AWS CLI using the upgraded MSI installers, which now use Python 3 instead of Python 2, and the unknown-encoding error is now gone.
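A quick way to confirm the bundled interpreter changed is to run aws --version, whose output includes the Python version the CLI runs on.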
I am using Git Bash on Windows 10, and set PYTHONIOENCODING=UTF-8 did not actually change the environment variable. Using export PYTHONIOENCODING=UTF-8 instead got me past the charmap error.
For Windows 10, with the CLI installed via Python:
Error:
'charmap' codec can't encode characters in position XX-XX: character maps to <undefined>
Solution: run the following command on the command line:
set PYTHONIOENCODING=UTF-8
I've written some code that makes use of the Biopython Entrez wrapper. The code was working fine on my previous Win10 laptop (Python 3.5.1), but I've just ported it to a new Win10 laptop with the same versions of Python and every package installed, and I'm now getting a decode error.
The traceback leads to a function that fetches text: it's attempting to decode the text using cp1252 when it should be using UTF-8. I know that similar questions have been asked, but none have dealt with this problem happening inside a package (Biopython in my case). Copying the UTF-8 encoding file in Python/lib and renaming it to cp1252.py solves the problem, but this obviously is not a long-term solution.
File "C:\Users\arjun\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 21715: character maps to <undefined>
Use the io module for reading if you're using Python 3.x (https://docs.python.org/2/library/io.html#io.open).
By default, it will use the preferred encoding of the platform it runs on. You can also specify your own encoding, as explained in the docs.
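For instance, a minimal sketch (the filename here is hypothetical) that forces UTF-8 instead of the platform default cp1252:
import io
# Request UTF-8 explicitly rather than relying on the platform's preferred encoding
with io.open("entrez_record.txt", "r", encoding="utf-8") as handle:
    text = handle.read()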
I use WAMP server and ElFinder 2.x, and it works fine, except that filenames are encoded in UTF-8 when uploaded, so they look like Список предприятий ВРК123.xlsx in Windows Explorer. It's OK, but it would be nice to be able to copy files with Unicode filenames to ElFinder's folder via Windows Explorer.
As far as I know NTFS uses UTF-16. nao-pon answered here that one needs to set encoding, locale in connector options for multi-byte encodings. I've tried to set these options to 'UTF-16' and 'ru_RU.UTF-16', but ElFinder cannot load folder at all then and gives Invalid backend configuration. Readable volumes not available error.
UPD: it works fine with 'encoding' => 'CP1251', but then it doesn't list files with names like 한자.txt.
I have a similar question to this:
ColdFusion, CFDirectory and the French
which was not given a satisfactory answer.
We have upgraded from ColdFusion 9 to ColdFusion 11. So far, no major problems except the following:
When using CFDirectory to display file names that contain non-ASCII characters (e.g. accents, umlauts), we see the file names with replacement characters (�) instead of the correct UTF-8 equivalents. For example, a file named L’État, c’est moi.pdf is displayed as L�����tat, c���est moi.pdf.
We are confident that this is a ColdFusion issue, as nothing has changed but the ColdFusion version. With ColdFusion 9, CFDirectory worked fine when listing the same accented filenames. Our OS is Red Hat 7.0, and the file names are also displayed correctly in the terminal with the ls command. I have also written a quick PHP script to check whether PHP can read the directory correctly with the readdir command, and there are no problems there either; filenames are rendered correctly.
So I believe this has to be a ColdFusion 11 issue. I have added the -Dfile.encoding=UTF-8 -Dencoding=UTF-8 parameters to the JVM settings in the ColdFusion administrator interface, but it made no difference.
Any suggestions on how to rectify this would be appreciated.
An example of the code used follows:
<cfdirectory
action="list"
directory="#ExpandPath( './' )#/pdfs"
listinfo="name"
name="qFile"
/>
<cfdump
var="#qFile#"
label="All Files"
/>
Have you tried setting the cfprocessingdirective tag?
<cfprocessingdirective pageencoding="utf-8">
CF 11 WikiDocs
Also, in the Chrome Network Inspector, make sure the encoding is being returned correctly, e.g.:
Content-Type:text/html; charset=UTF-8
If your environment is Linux, you need to have a clean UTF-8 configuration.
Please have a look here.
I had the same problem; I just added these lines to ~/.bashrc:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
After that, don't forget to restart your ColdFusion server:
sudo /opt/coldfusion11/cfusion/bin/coldfusion restart
Please see: Why are certain characters not being injected correctly to SQL Server from a CFQUERY?
Make sure your file is saved with encoding Unicode UTF-8.
Also make sure your JVM arguments will process that as well. Admin > Server Settings > Java and JVM. Add " -Dfile.encoding=UTF-8" to the Arguments.
I had the same problem; this solved my bug. In ~/.bashrc on Linux:
export LC_ALL="de_DE.UTF-8"
and after the change, restart the ColdFusion application.
I have a text file that contains localized language strings and is currently encoded in GB2312 (simplified Chinese), but all of my other language files are in UTF-8. I am finding it very difficult to work with this file, as none of my text editors will work properly with it, and they keep corrupting it. Are there any tools to convert this to UTF-8, and are there any downsides to doing this? Would it be better to just keep it as GB2312 and use a different editor (if so, can you recommend one)?
Update: I'm using Windows XP (English install).
Update #2: I've tried using Notepad++ and Notepad2 to edit the GB2312 files, but both are unable to read the files and corrupt them.
You can try this online service that uses the Open Source iconv utility.
You can also install Charco, a command-line version of it on your machine.
For GB2312, you can use CP936 as the encoding.
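To illustrate the CP936 route in code, a minimal Python sketch (the filenames are hypothetical) would be:
# CP936 is the Windows code page covering GB2312 (simplified Chinese)
with open("strings_cn.txt", encoding="cp936") as src:
    data = src.read()
with open("strings_cn_utf8.txt", "w", encoding="utf-8") as dst:
    dst.write(data)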
If you are a .Net developer you can make a small tool that does just that.
I've struggled with this as well and found that it was actually simple to solve from a programmatic point of view.
All you need is something like this (I tested it and it works):
In C#
using System.IO;
using System.Text;

class Gb2312ToUtf8 {
    static void Main(string[] args) {
        string infile = args[0];
        string outfile = args[1];
        // Code page 936 (CP936) covers GB2312 (simplified Chinese)
        using (StreamReader sr = new StreamReader(infile, Encoding.GetEncoding(936))) {
            using (StreamWriter sw = new StreamWriter(outfile, false, Encoding.UTF8)) {
                sw.Write(sr.ReadToEnd());
            }
        }
    }
}
In VB.Net
Imports System.IO
Imports System.Text

Module Gb2312ToUtf8
    Sub Main(ByVal args() As String)
        Dim infile As String = args(0)
        Dim outfile As String = args(1)
        ' Code page 936 (CP936) covers GB2312 (simplified Chinese)
        Using sr As New StreamReader(infile, Encoding.GetEncoding(936))
            Using sw As New StreamWriter(outfile, False, Encoding.UTF8)
                sw.Write(sr.ReadToEnd())
            End Using
        End Using
    End Sub
End Module
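If the C# version above is saved as, say, Gb2312ToUtf8.cs (the file names here are only an example), it can be compiled and run from a Developer Command Prompt roughly like this:
csc Gb2312ToUtf8.cs
Gb2312ToUtf8 input-gb2312.txt output-utf8.txt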
I might be thinking a bit too simple here, but if it's just this one plain text file, you could try the following:
Replace all & by &amp;, all < by &lt; and all > by &gt; (to be on the safe side)
Prepend the following to the text file:
<html><head><meta http-equiv="Content-Type" content="text/html; charset=gb2312" /></head><body><pre>
Open the file in your favorite browser
Select and copy all text
Paste it in Notepad and save as UTF-8.
You'd be done with this before you could have written any code to do the conversion or downloaded any programs that would do the conversion for you.
Of course, I'm not a hundred percent sure this'll work, and your browser would need the correct fonts and everything, but considering you're working with these kinds of files I'm assuming you already have those.
GB 2312 is mostly compatible with GB 18030, so any tool able to deal with the latter should treat GB 2312 correctly as well. There are many tools for converting GB 18030 to UTF-8 (or some other Unicode encoding form), but I can't recommend any specific one for Windows, because I work on Unix. If you want to write a bit of code, the iconv library, or ICU, springs to mind: you'll find all the conversion data readily available in these libraries.
Conversion from GB 2312 to UTF-8 is completely safe and lossless; you shouldn't worry about it.
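As a small check of that compatibility claim in Python (the sample string is just an example): bytes encoded as GB2312 decode identically under the wider GB18030 codec.
sample = "中文".encode("gb2312")
assert sample.decode("gb18030") == "中文"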
I agree with the currently chosen answer that it is "actually simple to solve from a programmatic point of view", especially when your source file contains sensitive information that you do not want to expose to an unknown third-party online service.
Nowadays, Python is available out of the box in most Linux environments and is also easy to install on Windows (easier than installing the C# stack, IMHO). So, without further ado, here is the two-line Python script that converts GB2312 to UTF-8. I tested it; it works.
# Usage: python this_script.py your_input.txt your_output.txt
import io, sys
io.open(sys.argv[2], "w", encoding="utf-8").write(io.open(sys.argv[1], encoding="gb2312").read())
If the command-line tool iconv is available in your OS, you can achieve this by running a one-line script:
# From GB18030
iconv -f gb18030 -t utf8 -o output.txt input.txt
# From GB2312
iconv -f gb2312 -t utf8 -o output.txt input.txt
Check whether your OS has iconv:
$ iconv --version
iconv (Debian GLIBC 2.31-13+deb11u3) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Ulrich Drepper.