How to read an EBCDIC file and find and replace hex value x'BE' - datastage

How do I read an EBCDIC file and find and replace hex value x'BE'?
I have an EBCDIC file coming from z/OS and landing on Linux. This file is peppered with smart single quotes, x'BE'. I want to replace x'BE' with x'7D', which is the standard single quote.
Thanks in advance for any help.

How about
sed 'y/\xbe/\x7d/' input-file > output-file
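Note that \xNN escapes in sed's y command are a GNU extension. If your sed lacks them, tr should perform the same byte-for-byte transliteration using octal escapes (0xBE is octal 276, 0x7D is octal 175):
tr '\276' '\175' < input-file > output-file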

Related

Difference between unicode 0001 and 2401?

I am trying to use the SOH character as a delimiter for a CSV file that my code generates. However, it looks like there are two Unicode characters for SOH?
https://www.compart.com/en/unicode/U+2401
https://www.compart.com/en/unicode/U+0001
I am not sure what the difference between the two is, or which one I should use?
U+0001 is the control character. U+2401 is a symbolic picture of the character.
Example: ␁ (May not display in all browsers, but is a single pictograph of SOH)
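If it helps to see the real control character in action, here is a small shell sketch (it assumes bash, whose $'...' quoting and printf builtin understand \xNN; the field names are made up):
printf 'id\x01name\x01city\n' > sample.csv   # write U+0001 between fields
awk -F $'\x01' '{ print $2 }' sample.csv     # read it back; prints "name"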

How to replace the same character in multiple text files?

So I have over 100 text files, all of which are too large to open in a normal text editor (e.g. Notepad, Notepad++), meaning I cannot use those.
All the text files have the same format; they contain:
abc0001:00000009a
abc0054:000000809a
abc00888:054450000009a
and so on..
I was wondering: how do I replace the ":" in each of those text files with "\n" (the escape for a newline)?
So then it would be:
abc0001
00000009a
abc0054
000000809a
abc00888
054450000009a
How would I do this for all 100 text files, without doing it manually and individually (if there's any way)?
Any help is appreciated.
You can use sed. The following does something similar to what you want. The question concerns Unix, but a lot of Unix utilities have been ported to MS Windows (even sed): http://gnuwin32.sourceforge.net/packages/sed.htm
Something like (where you provide your text file as input, and the output becomes your new text file):
sed 's/:/\n/g'
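To cover all 100 files in one command, GNU sed's -i (edit in place) option plus a shell glob should work; note that \n in the replacement text is also a GNU extension (a sketch assuming the files end in .txt; use -i.bak if you want backups):
sed -i 's/:/\n/g' *.txt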

Windows right single quote error when copying data into Postgres from CSV file

I'm trying to import a CSV file into a Postgres db (ver. 9.3, with database encoding set to UTF8). Using the command below, I get the error (also below):
copy mytable from 'C:/candidate_analyze.csv' delimiter ',' csv;
ERROR: invalid byte sequence for encoding "UTF8": 0x96
After researching, I see that this error is related to Windows-1252, or the Windows version of the right single quote mark being used instead of an apostrophe.
There is a text field in the CSV file (called "orig_text") that has the right single quote mark in it.
This copy functionality is going to be automated, so I can't go in there and manually do a search and replace for the Windows right quote mark every time.
Any ideas as to a solution to this problem?
Any help would be greatly appreciated. Thank you in advance.
The COPY command has an ENCODING option:
ENCODING
Specifies that the file is encoded in the encoding_name. If this option is omitted, the current client encoding is used.
So if your file really is encoded in windows-1252 then you could say:
copy mytable from 'C:/candidate_analyze.csv' delimiter ',' encoding 'windows-1252' csv;
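Alternatively, if you would rather normalize the file to UTF-8 before loading (for instance, as an extra step in the automated job), iconv can do the conversion up front (a sketch; the output file name is made up, and iconv must be available on the machine):
iconv -f WINDOWS-1252 -t UTF-8 candidate_analyze.csv > candidate_analyze_utf8.csv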

Charset conversion from XXX to utf-8, command line

I have a bunch of text files that are encoded in ISO-8859-2 (they have some Polish characters). Is there a command line tool for Linux/Mac that I could run from a shell script to convert this to a saner UTF-8?
Use iconv, for example like this:
iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt
Some more information:
You may want to specify UTF-8//TRANSLIT instead of plain UTF-8. To quote the manpage:
If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.
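For example (same input file as above, assuming a GNU iconv that supports //TRANSLIT):
iconv -f ISO-8859-2 -t UTF-8//TRANSLIT input.txt > output.txt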
For a full list of encoding codes accepted by iconv, execute iconv -l.
The example above makes use of shell redirection. Make sure you are not using a shell that mangles encodings on redirection – that is, do not use PowerShell for this.
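To convert a whole directory in one pass, a small loop works; iconv cannot write to the same file it is reading, hence the temporary file (a sketch assuming the files end in .txt):
for f in *.txt; do
  iconv -f ISO-8859-2 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done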
recode latin2..utf8 myfile.txt
This will overwrite myfile.txt with the new version. You can also use recode without a filename as a pipe.
GNU 'libiconv' should be able to do the job.

How do I replace literal \xNN escapes with their characters in Perl?

I have a Perl script that takes text values from a MySQL table and writes them to a text file. The problem is, when I open the text file for viewing I am getting a lot of hex escapes like \x92 and \x93, which stand for single and double quotes, I guess.
I am using the DBI->quote function to escape the special characters before writing the values to the text file. I have tried using Encode::Encoder, but with no luck. The character set on both tables is latin1.
How do I get rid of those hex characters and get the character to show in the text file?
ISO Latin-1 does not define characters in the range 0x80 to 0x9f, so displaying these bytes in hex is expected. Most likely your data is actually encoded in Windows-1252, which is the same as Latin1 except that it defines additional characters (including left/right quotes) in this range.
\x92 and \x93 are not defined as printable characters in the latin1 character set. If you are certain that you are indeed dealing with latin1, you can simply delete them.
It sounds like you need to change the character sets on the tables, or translate the non-latin-1 characters into latin-1 equivalents. I'd prefer the first solution. Get used to Unicode; you're going to have to learn it at some point. :)
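If you do go the translation route, iconv's //TRANSLIT mode can map the Windows-1252-only characters (smart quotes and the like) to plain latin-1 lookalikes from the command line (a sketch; the file names are hypothetical):
iconv -f CP1252 -t ISO-8859-1//TRANSLIT dump.txt > dump_latin1.txt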