How to change SAS session encoding dynamically

I am writing a SAS script that will run in batch. The encoding of the SASApp session is UTF-8 and all my tables (in an Oracle database and in SAS datasets) use UTF-8 encoding. However, I have one compiled macro which only works with WCYRILLIC encoding (it crashes with an error if I use UTF-8 as the session encoding). This macro doesn't work with my tables; it only performs some auxiliary actions.
The question is: how can I dynamically change the session encoding from UTF-8 to WCYRILLIC before the macro is invoked, and change it back to UTF-8 right after it has executed?

I don't think there is any way to change the session-level encoding option. The documentation page indicates that it can only be set when a session is first started:
Valid in: configuration file, SAS invocation
I think the best you can do is to override the session encoding option for every individual encoding-dependent statement in your problematic macro - i.e. specify encoding=WCYRILLIC on every file, infile, filename, %include and ods statement generated by that macro.
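For example (file names made up; which statements actually need the override depends on what the macro generates):

filename legacy 'c:\temp\legacy_output.txt' encoding='wcyrillic';

data _null_;
   infile 'c:\temp\legacy_input.txt' encoding='wcyrillic';
   input line $200.;
   put line=;
run;

%include 'c:\temp\legacy_code.sas' / encoding='wcyrillic';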
Alternatively, if you have SAS/CONNECT you could write code that signs on to another session with encoding=WCYRILLIC specified in the invocation options just to run your macro, dumping the output back to the parent session.
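An untested sketch of that approach using MP CONNECT (the session name and macro name are placeholders; !sascmd reuses the command that started the parent session, with the extra option appended):

signon cyr sascmd='!sascmd -encoding wcyrillic';

rsubmit cyr;
   /* the child session also needs access to the compiled macro catalog */
   %my_legacy_macro()
endrsubmit;

signoff cyr;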

Related

psql copy from read data from script

I am trying to export / import a set of tables from a PostgreSQL database.
I am using psql's copy from with stdin from a script. I have read that data (previously produced using copy to with stdout) can be read back in, with the end of the data delimited by the command escape \..
What I couldn't work out clearly from the documentation is what happens if \. appears in the previously exported data.
Specifically, this section of the documentation (emphasis mine) isn't very clear about that.
For \copy ... from stdin, data rows are read from the same source that issued the command, continuing until \. is read or the stream reaches EOF. This option is useful for populating tables in-line within a SQL script file. For \copy ... to stdout, output is sent to the same place as psql command output, and the COPY count command status is not printed (since it might be confused with a data row). To read/write psql's standard input or output regardless of the current command source or \o option, write from pstdin or to pstdout.
Can / must a \. appearing in the data be escaped somehow?
I am currently using utf8 encoded text format for the export / import.
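To make it concrete, the import part of the script looks roughly like this (table and rows invented for illustration), with the exported rows pasted in-line and terminated by \.:

\copy mytable from stdin
1	first exported row
2	second exported row
\.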
I think I found the relevant information in the documentation of the SQL COPY command (TEXT Format section, again emphasis mine):
End of data can be represented by a single line containing just backslash-period (\.). An end-of-data marker is not necessary when reading from a file, since the end of file serves perfectly well; it is needed only when copying data to or from client applications using pre-3.0 client protocol.
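In other words, a literal backslash inside the data is itself escaped on export, so copy to can never emit a bare \. line and nothing extra needs to be escaped on import. A quick (hypothetical) way to see it:

create table demo (t text);
insert into demo values (e'\\.');  -- a value made of the two characters \ and .
\copy demo to stdout
-- prints \\. : the backslash is doubled, so the line is not a bare \.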

Load csv to DB2 database

I'd like to ask whether my syntax for loading a CSV-format file into a DB2 database is correct. I cannot verify it myself, as I'm having problems configuring DB2 locally. I'd also like to confirm that the placement of the double quotes is correct for both dateformat and timeformat.
Below is my code snippet.
LOGFILE=/mnt/bin/log/myLog.txt
db2 "load from /mnt/bin/test.csv of del modified by coldel noeofchar noheader dateformat=\"YYYY-MM-DD\" timeformat=\"HH:MM:SS\" usedefaults METHOD P(1,2,3,4,5) messages $LOGFILE insert_update into myuser.desctb(DESC_ID,START_DATE,START_TIME,END_DATE,END_TIME)"
If you use modified by coldel then you should also specify the delimiter character directly after it (for example coldel| for a pipe-delimited file). If the delimiter really is a comma, which is the default, then omit the coldel option.
Additionally, insert_update is an option of the IMPORT command (not of LOAD), but import is a logged operation, which reduces insert throughput. With the LOAD command you can use ... replace into ... instead, as sketched below. Study the docs for the details.
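For instance, with the delimiter left at the default comma and replace into instead of insert_update (note that replace empties the table before loading), the command might look like this - untested, with your remaining modifiers kept as-is:

LOGFILE=/mnt/bin/log/myLog.txt
db2 "load from /mnt/bin/test.csv of del modified by noeofchar noheader dateformat=\"YYYY-MM-DD\" timeformat=\"HH:MM:SS\" usedefaults METHOD P(1,2,3,4,5) messages $LOGFILE replace into myuser.desctb(DESC_ID,START_DATE,START_TIME,END_DATE,END_TIME)"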
The quoting seems OK, but the correctness of the formats depends on the values in the data file.
Refer to the LOAD documentation for details; you should study that page and the related pages.
An alternative to LOAD is the INGEST command (available in current Db2 clients), which has insert, replace, merge and other options and offers high throughput compared to import.
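A minimal, untested example (the database name is a placeholder; for loading into a subset of columns or with explicit field definitions, see the INGEST documentation):

db2 connect to MYDB
db2 "INGEST FROM FILE /mnt/bin/test.csv FORMAT DELIMITED INSERT INTO myuser.desctb"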

Importing SPSS file in SAS - Discrepancies in Language

I am having trouble importing an SPSS file into SAS. The code I am using is:
proc import datafile = "C:\SAS\Germany.sav"
out=test
dbms = sav
replace;
run;
All the data are imported, but the problem is that some of the values of the variables come through slightly altered. So, for instance, in the SPSS file the value of variable "A" is "KÖL", but when imported into SAS the "Ö" is replaced by garbled characters.
What I am thinking is that the problem might be that the .sav file contains some German words that SAS cannot interpret.
Is there a command that loads a library or something in SAS so that it can understand language-specific values?
P.S. I have also found a similar post here: Importing Polish character file in SAS
but the answer is not really clear.
By default, SAS is often installed using the standard Windows-Latin-1 code page, often (incorrectly) called "ASCII". SAS itself can handle just about any encoding, but if it defaults to Windows-Latin-1, it won't handle some Unicode translations.
If you're using SAS 9.3 or 9.4, and possibly earlier versions of v9, you probably have a Unicode version of SAS installed. Look in
\SasFoundation\9.x\nls\
In there you'll probably find "en" (if you're using it in English, anyway), which usually uses the default Windows-Latin-1 code page. You may also find (if they were installed) Unicode-compatible versions. These are really just configuration settings, but they're important enough to get right that SAS supplies pre-baked config files.
In my case I have a "u8" folder under nls, which I can then use to start a SAS session that uses Unicode encoding for my datasets and when reading in data.
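For example, on my machine (your install paths will differ) a batch invocation can point at the Unicode config file, and you can then verify what the session is using:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -config "C:\Program Files\SASHome\SASFoundation\9.4\nls\u8\sasv9.cfg"

proc options option=encoding;
run;
/* the log should show ENCODING=UTF-8 */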
One caveat: I don't know for sure how well the SPSS import engine handles Unicode/MBCS characters. That is a separate issue; if you run the Unicode version of SAS and it still has problems, that may be the cause, and you may need to either export your SPSS file differently or talk to SAS tech support.

FreeTDS runs out of memory from DBD::Sybase

When I add
client charset = UTF-8
to my freetds.conf file, my DBD::Sybase program emits:
Out of memory!
and terminates. This happens when I call execute() on an SQL query statement that returns any ntext fields. I can return numeric data, datetimes, and nvarchars just fine, but whenever one of the output fields is ntext, I get this error.
All these queries work perfectly fine without the UTF-8 setting, but I do need to handle some characters that throw warnings under the default character set. (See related question.)
The error message is not formatted the same way other DBD::Sybase error messages seem to be formatted. I do get a message that a rollback() is being issued, though. (My false AutoCommit flag is being honored.) I think I read somewhere that FreeTDS uses the iconv program to convert between character sets; is it possible that this message is being emitted from iconv?
If I execute the same query with the same freetds.conf settings in tsql (FreeTDS's command-line SQL shell), I don't get the error.
I'm connecting to SQL Server.
What do I need to do to get these queries to return successfully?
I saw this in the .conf file - see if it helps:
# Command and connection timeouts
; timeout = 10
; connect timeout = 10
# If you get out of memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# (Microsoft servers sometimes pretend TEXT columns are
# 4 GB wide!) If you have this problem, try setting
# 'text size' to a more reasonable limit
text size = 64512
These links seem relevant as well and show how the setting can be changed without modifying the freetds.conf file:
http://lists.ibiblio.org/pipermail/freetds/2002q1/006611.html
http://www.freetds.org/faq.html#textdata
The FAQ is particularly unhelpful, not listing the actual error message.
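If you'd rather not touch freetds.conf at all, I believe you can do the equivalent from the client side - an untested sketch, with the connection details as placeholders:

use DBI;

my $dbh = DBI->connect('dbi:Sybase:server=MYSERVER;database=mydb',
                       $user, $pass, { AutoCommit => 0 });

# Ask the server to cap TEXT/NTEXT column data at a sane size
$dbh->do('SET TEXTSIZE 64512');

# DBD::Sybase should also honor the standard DBI LongReadLen attribute,
# which (as I understand it) issues SET TEXTSIZE under the hood:
$dbh->{LongReadLen} = 64512;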

Disabling the PostgreSQL 8.4 tsvector parser's `file` token type

I have some documents that contain sequences such as radio/tested, which I would like to match in queries like
select * from doc
where to_tsvector('english', body) @@ to_tsquery('english', 'radio')
Unfortunately, the default parser treats radio/tested as a file token (despite my being in a Windows environment), so it doesn't match the above query. When I run ts_debug on it, I can see that it's being recognized as a file, and the lexeme ends up being radio/tested rather than the two lexemes radio and test.
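For example:

select alias, token, lexemes
from ts_debug('english', 'radio/tested');
-- alias comes back as "file", token as radio/tested, lexemes as {radio/tested}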
Is there any way to configure the parser not to look for file tokens? I tried
ALTER TEXT SEARCH CONFIGURATION public.english
DROP MAPPING FOR file;
...but it didn't change the output of ts_debug. If there's some way of disabling the file token type, or at least having it recognize both the file token and all the words that it thinks make up the directory names along the way, or if there's a way to get it to treat slashes as hyphens or spaces (without the performance hit of regexp_replacing them myself), that would be really helpful.
I think the only way to do what you want is to create your own parser :-( Copy wparser_def.c to a new file, remove from the parse tables (actionTPS_Base and the ones following it) the entries that relate to files (TPS_InFileFirst, TPS_InFileNext etc), and you should be set. I think the main difficulty is making the module conform to PostgreSQL's C idiom (PG_FUNCTION_INFO_V1 and so on). Have a look at contrib/test_parser/ for an example.
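Once the modified parser is built as a shared module, wiring it in is plain SQL - a rough sketch with made-up function and parser names, following the pattern in contrib/test_parser:

CREATE FUNCTION nofileprs_start(internal, int4) RETURNS internal
    AS 'nofile_parser' LANGUAGE C STRICT;
-- ...likewise for nofileprs_nexttoken, nofileprs_end and nofileprs_lextype...

CREATE TEXT SEARCH PARSER nofile_parser (
    START    = nofileprs_start,
    GETTOKEN = nofileprs_nexttoken,
    END      = nofileprs_end,
    LEXTYPES = nofileprs_lextype
);

CREATE TEXT SEARCH CONFIGURATION public.english_nofile (PARSER = nofile_parser);
-- then ADD MAPPING for the token types you keep, as in the test_parser example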