Importing an SPSS file into SAS - language discrepancies - Unicode

I am having trouble importing an SPSS file into SAS. The code I am using is:
proc import datafile = "C:\SAS\Germany.sav"
    out = test
    dbms = sav
    replace;
run;
All the data are imported, but some of the variable values come through with mangled characters. For instance, in the SPSS file the value of variable "A" is "KÖL", but when imported into SAS it becomes "KÃ–L".
My guess is that the problem stems from the .sav file containing German words that SAS cannot interpret.
Is there a command that loads a library or something in SAS so that it can understand language-specific values?
P.S. I have also found a similar post here: Importing Polish character file in SAS
but the answer is not really clear.

By default, SAS is often installed using the standard Windows-Latin-1 code page (often, and incorrectly, called "ASCII"). SAS itself can handle nearly any encoding, but a session running under Windows-Latin-1 will not correctly translate some Unicode characters.
If you're using SAS 9.3 or 9.4 (and possibly earlier 9.x releases), you probably have a Unicode-capable version of SAS installed. Look in
\SasFoundation\9.x\nls\
In there you'll probably find "en" (if you're running SAS in English, anyway), which usually uses the default Windows-Latin-1 code page. You'll possibly also find Unicode-compatible versions, if they were installed. This is really just a configuration setting, but it's important enough to get right that SAS supplies a pre-built configuration file for it.
In my case there is a "u8" folder under nls, which I can use to run SAS with UTF-8 encoding for my datasets and when I read in data.
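As a rough sketch of how that's used (the install path below is an assumption based on a typical 9.4 layout; adjust for your machine), you start SAS with the Unicode configuration file and can then confirm the session encoding from code:
"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\u8\sasv9.cfg"
/* inside the UTF-8 session: the log should report ENCODING=UTF-8 */
proc options option=encoding;
run;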
One caveat: I don't know for sure how well the SPSS import engine handles Unicode/MBCS characters. This is a separate issue: if you run the Unicode version of SAS and it still has problems, that may be the cause, and you may need to either export your SPSS file differently or talk to SAS tech support.

Related

Enterprise Architect (EA) reverting special characters on diagrams

My diagrams must contain some special characters like ı, ğ and such. When I use these in EA, they revert to the closest related character (i, g, etc.) after I restart EA.
The project is shared, so I can't easily migrate to Jet4, as suggested in older posts. Is that still the only solution?
This happens on some machines only. We use EA 13.
Things I have tried:
Start-Preferences-General-Use Jet 4.0
Start-Preferences-XML Specifications-Code Page:UTF-8
Configure-Source Code Engineering-Code Page for source editing:UTF-8
Control Panel-Languages: Same on every machine.
OS language and version are the same (W10-Eng).
Thanks.
Using a Jet4 database or one of the real database systems is the only option.
The regular .eap files are in the Jet 3.5 format, and that format simply doesn't support Unicode.
Since v14, Jet4 files have been given the .eapx extension.

Encoding Option in Scala

I have a data file that contains some Chinese data which I am not able to read or write properly. I have used the encoding/charset option while reading and writing, but with no luck; I need to set the encoding/charset option when reading and writing a CSV file.
I have tried the following two options:
.option("encoding", "utf-16")
.option("charset","UTF-16")
How should the encoding be set?
I have had some trouble reading files with Chinese characters in Scala before, although not on the Spark platform. Are you sure the encoding used is UTF-16? You can open the file with Notepad or an equivalent editor to check. In my case, I finally succeeded in reading the files with the GB2312 encoding.
If it doesn't work, I would recommend trying a pure Scala or Java application (without Spark) to see whether reading and writing works with the UTF-16 encoding, as sketched below.
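A minimal pure-Scala sketch of that check (the file path and the candidate encodings are assumptions; substitute whatever your data actually uses):
import scala.io.Source
import java.io.{FileOutputStream, OutputStreamWriter}

object EncodingCheck {
  def main(args: Array[String]): Unit = {
    val path = "data/chinese.csv" // hypothetical input file

    // Read the first line under several plausible encodings; the attempt
    // that prints readable Chinese text reveals the file's real encoding.
    for (enc <- Seq("UTF-8", "UTF-16", "GB2312")) {
      val src = Source.fromFile(path, enc)
      try println(s"$enc -> ${src.getLines().take(1).mkString}")
      finally src.close()
    }

    // Write a file with an explicit encoding for the round-trip test.
    val out = new OutputStreamWriter(new FileOutputStream("out.csv"), "UTF-16")
    try out.write("col1,col2\n")
    finally out.close()
  }
}
Once the real encoding is known, passing the same name to Spark's reader via .option("encoding", ...), as in the question, should behave consistently.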

Postgresql fulltext search for Czech language (no default language config)

I am trying to set up full-text search for the Czech language. I am a little bit confused, because I see some cs_cz.affix and cs_cz.dict files inside the tsearch_data folder, but there is no Czech language configuration (it's probably not shipped with Postgres).
So should I create one? Which dictionaries do I have to create/configure? Is there any support for the Czech language at all?
Should I use all possible dictionaries (synonym dictionary, thesaurus dictionary, Ispell dictionary, Snowball dictionary)?
I am able to create a Czech configuration for the Ispell dictionary (along the lines of the sketch below) and it works fine, but I am not sure whether the Ispell configuration alone is enough.
Thanks a lot. I tried to read https://www.postgresql.org/docs/9.5/static/textsearch.html but I am a little bit confused.
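For reference, a minimal sketch of that Ispell-only setup (the dictionary and configuration names are illustrative, and it assumes the cs_cz files plus a czech.stop stop-word file sit in the tsearch_data directory):
CREATE TEXT SEARCH DICTIONARY czech_ispell (
    TEMPLATE = ispell,
    DictFile = cs_cz,
    AffFile = cs_cz,
    StopWords = czech
);

CREATE TEXT SEARCH CONFIGURATION public.czech ( COPY = pg_catalog.simple );

ALTER TEXT SEARCH CONFIGURATION czech
    ALTER MAPPING FOR word, asciiword, hword, asciihword, hword_part, hword_asciipart
    WITH czech_ispell, simple;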
I have never tried it, but you should be able to create a Czech Snowball stemmer as long as you are ready to compile PostgreSQL from source.
There is an explanation in src/backend/snowball/README:
The files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from their libstemmer_c
distribution, with only some minor adjustments of file inclusions. Note
that most of these files are in fact derived files, not master source.
The master sources are in the Snowball language, and are available along
with the Snowball-to-C compiler from the Snowball project. We choose to
include the derived files in the PostgreSQL distribution because most
installations will not have the Snowball compiler available.
To update the PostgreSQL sources from a new Snowball libstemmer_c distribution:
1. Copy the *.c files in libstemmer_c/src_c/ to src/backend/snowball/libstemmer with replacement of "../runtime/header.h" by "header.h", for example
for f in libstemmer_c/src_c/*.c
do
    sed 's|\.\./runtime/header\.h|header.h|' $f >libstemmer/`basename $f`
done
(Alternatively, if you rebuild the stemmer files from the master Snowball sources, just omit "-r ../runtime" from the Snowball compiler switches.)
2. Copy the *.c files in libstemmer_c/runtime/ to src/backend/snowball/libstemmer, and edit them to remove direct inclusions of system headers such as <stdio.h> – they should only include "header.h". (This removal avoids portability problems on some platforms where <stdio.h> is sensitive to largefile compilation options.)
3. Copy the *.h files in libstemmer_c/src_c/ and libstemmer_c/runtime/ to src/include/snowball/libstemmer. At this writing the header files do not require any changes.
4. Check whether any stemmer modules have been added or removed. If so, edit the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the stemmer_modules[] table in dict_snowball.c.
The various stopword files in stopwords/ must be downloaded individually from pages on the snowball.tartarus.org website. Be careful that these files must be stored in UTF-8 encoding.
Now there is a Czech Snowball stemmer available here; it was contributed to the project. There is no stop word dictionary available, but I am sure you can either find one or create one yourself.
The real work would be to install Snowball and use the Snowball-to-C compiler to create the C and header files to add to the PostgreSQL source.
These files should then remain stable, so it shouldn't be difficult to upgrade to a new PostgreSQL version.
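Once the stemmer is compiled in, registering it would look roughly like this (a sketch only: the Language value assumes the new module was added under the name "czech" in the stemmer_modules[] table, and the stop-word file is one you supply):
CREATE TEXT SEARCH DICTIONARY czech_snowball (
    TEMPLATE = snowball,
    Language = czech,
    StopWords = czech
);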
If you are willing to do the work but don't want to patch PostgreSQL and build it from source every time, you could also consider submitting a patch to PostgreSQL. As long as the stemmer works fine, I don't expect that you will meet much resistance there (but the patch submission process is still tedious).

Current scctext replacement for textual representation of VFP binary files

What are people using in VFP 9 as a replacement for the built-in scctext.prg that translates binary VFP files into a textual representation?
We're moving an existing VFP 9 SP1 project into TFS source control, but we need a way to make sure the non-textual files get the comparison benefits that only non-binary text files allow. We plan to check both the textual representation and the binary file into source control (the binary is more for the "just in case" scenario).
According to the document at
http://www.ita-software.com/papers/Borup_Mercurial_Published.pdf
there are at least three options for converting .scx, .frx, .lbx, .prj and other non-PRG DBF files in Visual FoxPro (VFP) to a textual representation. Only some of them allow converting the textual information back to binary; I'm not sure how often we'd really use that.
ALTERNATE SCCTEXT
This one seems older, with the latest version from 2009 - not sure if it's still the preferred tool - and it seems to have no way to convert the textual representation back into a binary file.
http://vfpx.codeplex.com/releases/view/12955
TWOFOX
This one seems similar to FoxBin2Prg, except that it creates XML files. It seems only one dev is working on it, unlike the others, which are open to outside contributions, so I'm not sure how current it is or how widely other developers use it. It does have two-way conversion, as FoxBin2Prg does.
http://www.foxpert.com/downloads.htm
FOXBIN2PRG
This one is fairly recent - but I'm not sure whether it's production-ready enough for production work - and it does have two-way conversion.
http://vfpx.codeplex.com/releases/view/116407
TRIGGERING ONE OF THE ABOVE ON CHANGE OF BINARY FILES IN THE VFP IDE
What are people using to invoke these textual representation options?
I've seen this class, created to run one of the programs listed above on all files in the project; apparently it does so when the date/time of the last generate is older than the date/time on the textual version of the file. One drawback I've read about is that it also generates output for foundation classes and other items that a dev isn't really working on (code that is referenced by, but not included in, your project).
http://codepaste.net/9yy1gm
Thanks for any advice from those of you using VFP 9 with source control out there!
You should check out the scX library written by Paul McNett, which is published on Ed Leafe's web site. I haven't used it in a mission-critical software project yet, but I have tested it out, and it seemed to catch all the potential problems I've encountered with other scctext replacements.
I haven't used it in a big project for a couple of reasons:
1. It is a breaking change for source control history. Comparing source code in your current SCA or VCA files with the new files generated by scX isn't going to be simple.
2. It isn't a drop-in replacement for scctext. Instead of checking files into and out of source control directly from the IDE, you'll have an intermediary folder. You'll check your files out of source control into one folder, convert them to FoxPro format, and then edit them in the FoxPro IDE. Then you'll save your changes in the FoxPro IDE, convert them back to scX format, and check them into source control.
I'm sure much of #2 can be automated, but combined with #1, making the change to scX wasn't worth it for me.
FoxBin2Prg is production-ready and, AFAIK, it's the only tool that allows diff and merge of the generated text (tx2) files and can regenerate the binaries from them.
The generated files are PRG-style, so developers can treat them like modifying a PRG (with PROC/ENDPROC structures and such), but they aren't meant to compile. The primary use is with SCM tools, but it can be used separately.
I'm actually using it on production code, with a 10-member team making concurrent modifications to forms and classes.
Some documentation is available on VFPx in English and Spanish; internal messages are available in both languages, and as of version v1.19.24 a German translation is available too.
More info on the VFPx site.
Best regards!

How do I export a Crystal Report to a Unicode text file?

I'm trying to export a Crystal Report to a text file, while preserving any Unicode characters that are found within. By default, Crystal Reports seems to export to an ANSI text file.
Here is a highly simplified version of what I'm doing:
Dim objCRReport As CRAXDRT.Report
[...]
objCRReport.ExportOptions.FormatType = 8 'crEFTText
objCRReport.ExportOptions.DestinationType = 1 'crEDTDiskFile
objCRReport.ExportOptions.DiskFileName = "C:\reportInTextFormat.txt"
objCRReport.Export blnPromptUser
Since it creates a file in ANSI format, I lose any special characters that were found within the report. These characters are all fine when you view the Crystal Report directly.
Please note that I am referencing the "Crystal Reports 9 ActiveX Designer Runtime Library" specifically.
I want to point out that I've tried pre-creating a Unicode file with the same name prior to the export, hoping the Crystal code would notice the file, and append to it rather than creating an ANSI file, but unfortunately this is not the case.
I then thought I could get around this problem (ninja style) by exporting to an RTF file (which preserves the characters), then reading the contents of that RTF minus the formatting, and writing them myself to a Unicode text file I create. Unfortunately, to achieve this I had to look into using a RichTextBox, and I encountered a slew of problems with that; I think I'd have more success in VB.NET, but unfortunately I'm stuck with VB6 for this task. The outline I was attempting is sketched below.
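(For the record, a rough VB6 sketch of that workaround, which I never got fully working; it assumes a form with a RichTextBox named rtbReport, a project reference to ADO for ADODB.Stream, and a hypothetical RTF export path:)
Dim objStream As ADODB.Stream

' Load the RTF export, then read back its plain text (formatting stripped)
rtbReport.LoadFile "C:\reportInRtfFormat.rtf", rtfRTF

' Write that text out as a Unicode (UTF-16) file via ADODB.Stream
Set objStream = New ADODB.Stream
objStream.Type = adTypeText
objStream.Charset = "unicode"
objStream.Open
objStream.WriteText rtbReport.Text
objStream.SaveToFile "C:\reportInTextFormat.txt", adSaveCreateOverWrite
objStream.Close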
After trying those approaches, I found an article suggesting that Crystal Reports 9 supports exporting to a Unicode text file, but I have yet to see it work. It mentions that the print engine supports it, so I'm going to dig deeper to see whether I can invoke that directly, in case the .Export call isn't doing so itself (which I doubt).
It turns out Crystal relies heavily on the printer driver for Unicode support, so I looked into that next. The printer driver does have to support Unicode, but that was already the case in my test environment, so while this was interesting to find out, it didn't solve my problem.
So, finally: after a few days of trying to find a solution, my boss decided it was time to cut our losses, and we instead planned a redesign of the feature without involving Crystal Report-to-text exports. I am still very interested in how to export to a Unicode text file from Crystal, though, so please do answer if you know how.