Problem downloading a dataset from a wlatin1 environment to a UTF-8 one - encoding

I'm downloading (with a proc download in SAS) a dataset from a server with wlatin1 encoding to a server with UTF-8 encoding.
I've got the error
ERROR: Some character data was lost during transcoding in the data set
libref.datasetname. Either the data contains characters that are not
representable in the new encoding or truncation occurred during transcoding.
I tried setting inencoding='utf8' or inencoding='asciiany' on the input dataset but it doesn't work (maybe because the wlatin1 server has SAS 9.3 whereas the UTF-8 server has SAS 9.4).
Rewriting the file like in the following code (and then doing the proc download of myoutput) works, but I was wondering if there is a more elegant way to do the same thing.
data myoutput;
set pathin.myinput;
/* Transliterate accented I to plain I */
des_nome = tranwrd(des_nome,'CD'x,'I');
des_nome = tranwrd(des_nome,'ED'x,'i');
/* Transliterate accented A to plain A */
des_nome = tranwrd(des_nome,'C1'x,'A');
des_nome = tranwrd(des_nome,'E1'x,'a');
/* Transliterate accented E to plain E */
des_nome = tranwrd(des_nome,'C9'x,'E');
des_nome = tranwrd(des_nome,'E9'x,'e');
/* Transliterate accented O to plain O */
des_nome = tranwrd(des_nome,'D2'x,'O');
des_nome = tranwrd(des_nome,'D3'x,'O');
des_nome = tranwrd(des_nome,'D6'x,'O');
des_nome = tranwrd(des_nome,'F3'x,'o');
/* Transliterate accented U to plain U */
des_nome = tranwrd(des_nome,'DC'x,'U');
des_nome = tranwrd(des_nome,'F9'x,'u');
/* Transliterate accented Y to plain Y */
des_nome = tranwrd(des_nome,'DD'x,'Y');
des_nome = tranwrd(des_nome,'FD'x,'y');
/* Transliterate stray accent marks to ' */
des_nome = tranwrd(des_nome,'B4'x,"'");
/* Transliterate stray symbols to spaces */
des_nome = tranwrd(des_nome,'A7'x,' '); /* § in the name */
des_nome = tranwrd(des_nome,'A3'x,' '); /* £ in the name */
cod_cap_res = tranwrd(cod_cap_res,'A3'x,' '); /* £ in the postal code (CAP) */
run;

Check out this SAS macro %copy_to_utf8 which will convert your data to UTF8 automatically.
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/nlsref/p1f9ghftk3fgrin16t57ub6svmet.htm
%copy_to_utf8(pathin.myinput, myoutput)

The issue is that the storage length of at least one of the variables in the original dataset is too short to hold the longer UTF-8 representation of at least one of its strings.
Here is a simple way to demonstrate the problem. Create a simple file with all 256 possible characters of the WLATIN1 (or LATIN1) encoding.
%put %sysfunc(getoption(encoding,keyword));
ENCODING=WLATIN1
data 'c:\downloads\wlatin1.sas7bdat';
string = collate(0,256);
run;
NOTE: The data set c:\downloads\wlatin1.sas7bdat has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Now try to read it in a session using UTF-8 encoding. Even if you make the variable longer in the output dataset the conversion fails.
data test1;
length string $1024;
set 'c:\downloads\wlatin1.sas7bdat';
NOTE: Data file WC000001.WLATIN1.DATA is in a format that is native to another host, or the file encoding does not match the
session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
run;
ERROR: Some character data was lost during transcoding in the dataset WC000001.WLATIN1. Either the data contains characters that
are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST1 may be incomplete. When this step was stopped there were 0 observations and 1 variables.
WARNING: Data set WORK.TEST1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
So to fix that, read the file using ENCODING='ANY' (which suppresses the automatic transcoding) and then convert the strings from WLATIN1 to UTF-8 yourself.
data test;
length string $1024;
set 'c:\downloads\wlatin1.sas7bdat' (encoding='any');
string = kcvt(string,'wlatin1','utf-8');
run;
NOTE: There were 1 observations read from the data set c:\downloads\wlatin1.sas7bdat.
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
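
The length expansion described above can be checked outside SAS. The following is only a rough Python sketch, not part of the SAS solution: it assumes Python's cp1252 codec is an adequate stand-in for WLATIN1, and the helper name utf8_length and the sample string are made up for illustration.

```python
def utf8_length(raw: bytes, source_encoding: str = "cp1252") -> int:
    """Return how many bytes `raw` needs once it is re-encoded as UTF-8."""
    return len(raw.decode(source_encoding).encode("utf-8"))


# A 13-character name with two accented letters, stored as single-byte text.
name = "Niccolò Città".encode("cp1252")
print(len(name), utf8_length(name))  # 13 15

# All 256 single-byte values (Latin-1 used here because it defines every byte):
# the upper 128 characters take two bytes each in UTF-8.
all_bytes = bytes(range(256))
print(utf8_length(all_bytes, "latin-1"))  # 384
```

A character variable sized for 13 bytes would therefore be too short for the 15-byte UTF-8 value, which is the truncation the error message refers to.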

You might have better luck letting the newer version of SAS that is running with UTF-8 encoding do the transcoding instead of forcing the remote libref engine to attempt to deal with it.
So if you wanted to download the dataset MYLIB.MYDATA you could first copy the actual dataset file. Then transcode it.
%syslput lwork=%qsysfunc(pathname(work));
rsubmit;
proc download binary infile="%sysfunc(pathname(MYLIB))/mydata.sas7bdat"
outfile="&lwork/mydata.sas7bdat" ;
run;
endrsubmit;
data mydata;
set mydata(encoding='wlatin1');
run;

Related

Issue sending/receiving data over serial connection using MATLAB

I recently connected a reactor control tower to my computer through a serial 'COMMS' port (serial to USB). It appears as a connection on the COM4 port (as shown in the 'Devices and Printers' section of the Control Panel). However, I always get the following message when I try to 'fwrite(s)' or 'fscanf(s)'.
Warning: Unsuccessful read: The specified amount of data was not returned within the Timeout period..
%My COM4
s = serial('COM4');
s.BaudRate = 9600;
s.DataBits = 8;
s.Parity ='none';
s.StopBits = 1;
s.FlowControl='none';
s.Terminator = ';';
s.ByteOrder = 'LittleEndian';
s.ReadAsyncMode = 'manual';
% Building write message.
devID = '02'; % device ID
cmd = 'S'; % command read or write; S for write
readM = cell(961,3);% Read at most 961-by-3 values filling a 961–by–3 matrix in column order
strF = num2str(i);
strF = '11'; %pH parameter
strP = '15'; %pH set point
val = '006.8'; %pH set value
msg_ = strcat('!', devID, cmd, strF, strP, val);%output the string
chksum = dec2hex(mod(sum(msg_),256)); %conversion to hexdec
msg = strcat(msg_,':', char(chksum), ';');
fopen(s); % open the connection to the device
fwrite(s, uint8(msg)); % write the message as binary data (8-bit unsigned integers, uint8)
reply=fscanf(s); % read ASCII data from the device into reply; for binary data use fread
fclose(s); % disconnect s from the device (the object stays in the workspace until deleted)
This leads me to believe that the device is connected but is not sending or receiving information and I am unsure as to how I can configure this or even really check if there is communication occurring between the tower and my computer.

Can not read more than 32 blocks in a single READ MULTIPLE BLOCKS command from M24LR

I am trying to read multiple blocks (all of them in a single READ MULTIPLE BLOCKS command) from a M24LR chip through NFC-V.
let writeData = new Uint8Array(5);
writeData[0] = 0x0A; // Flags
writeData[1] = 0x23; // Read multiple block
writeData[2] = 0x00; // Address of starting block (first 8bit)
writeData[3] = 0x00; // Address (second 8bit)
writeData[4] = 0x1F; // Numbers of block (0x20 is not working)
nfc.transceive(writeData.buffer)
.then(response => {
console.log('response: ' + response);
})
.catch(error => {
console.log('error transceive: ' + JSON.stringify(error));
});
Asking for 32 blocks works well, but asking for 33 blocks makes the command fail with an error.
Is it something that I am doing wrong? Does the READ MULTIPLE BLOCKS command have a limit?
See the datasheet (M24LR64-R: Dynamic NFC/RFID tag IC with 64-Kbit EEPROM with I²C bus and ISO 15693 RF interface, DocID15170 Rev 16, section 26.5; the same also applies to M24LR64E-R, M24LR16E-R, and M24LR04E-R):
The maximum number of blocks is fixed at 32 assuming that they are all located in the same sector. If the number of blocks overlaps sectors, the M24LR64-R returns an error code.
Thus, the READ MULTIPLE BLOCKS command for these chips is limited to 32 blocks.
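
One way to work within that limit is to split a large read into several commands. The following is a rough Python sketch of the bookkeeping only, under several assumptions: sectors are taken to be 32 blocks long, the block address is assumed to be sent low byte first, and the names plan_reads and build_frame are made up for illustration. The flag and command bytes are copied from the question.

```python
MAX_BLOCKS = 32     # per-command limit quoted from the datasheet
SECTOR_BLOCKS = 32  # assumption: each sector holds 32 blocks


def plan_reads(start_block, count):
    """Split a block range into chunks of at most 32 blocks that stay inside one sector."""
    chunks = []
    block, remaining = start_block, count
    while remaining > 0:
        room_in_sector = SECTOR_BLOCKS - (block % SECTOR_BLOCKS)
        n = min(remaining, MAX_BLOCKS, room_in_sector)
        chunks.append((block, n))
        block += n
        remaining -= n
    return chunks


def build_frame(start_block, n_blocks):
    """Build the 5-byte request from the question; the last byte is n_blocks - 1."""
    if not 1 <= n_blocks <= MAX_BLOCKS:
        raise ValueError("n_blocks must be between 1 and 32")
    return bytes([
        0x0A,                       # flags, as in the question
        0x23,                       # READ MULTIPLE BLOCKS
        start_block & 0xFF,         # address, low byte (assumed order)
        (start_block >> 8) & 0xFF,  # address, high byte
        n_blocks - 1,               # 0x1F requests 32 blocks
    ])


for start, n in plan_reads(0, 33):
    print(start, n, build_frame(start, n).hex())
```

Each resulting frame would then be sent with its own transceive call and the responses concatenated.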

Reading data only when present

I'm trying to read the data from the COM3 port.
I'm using this code:
in = fscanf(s);
if(in == 'A')
fclose(s);
break;
end
The problem is that when no data is sent to the COM3 port, fscanf() waits for the timeout interval and then gives a timeout warning.
Is there a way to read data only when it is present?
Read only when data present
You can read out the BytesAvailable-property of the serial object s to know how many bytes are in the buffer ready to be read:
bytes = get(s,'BytesAvailable'); % using getter-function
bytes = s.BytesAvailable; % using object-oriented-addressing
Then you can check the value of bytes to match your criteria. Assuming a char is 1 byte, then you can check for this easily before reading the buffer.
if (bytes >= 1)
in = fscanf(s);
% do the handling of 'in' here
end
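
The same check-before-read pattern can be sketched in Python for comparison. This is only an illustration, not MATLAB code: it assumes a port object that exposes an in_waiting byte count and a read(size) method (the interface pyserial's Serial class provides), and FakePort is a hypothetical stand-in so the sketch runs without hardware.

```python
class FakePort:
    """Hypothetical stand-in for a serial port object, for demonstration only."""

    def __init__(self, data: bytes = b""):
        self._buffer = bytearray(data)

    @property
    def in_waiting(self) -> int:
        # Number of bytes currently sitting in the receive buffer.
        return len(self._buffer)

    def read(self, size: int = 1) -> bytes:
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk


def read_if_available(port) -> bytes:
    """Return whatever is buffered right now, or b'' without waiting."""
    n = port.in_waiting
    if n >= 1:
        return port.read(n)
    return b""


print(read_if_available(FakePort(b"A")))  # b'A'
print(read_if_available(FakePort()))      # b''
```

Because the read is only issued when the buffer is non-empty, the call never has to wait for a timeout.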
Minimize the time to wait
You can manually set the Timeout-property of the serial object s to a lower value so that execution continues sooner than with the default timeout.
set(s,'Timeout',1); % sets timeout to 1 second (default is 10 seconds)
Most likely you will get the following warning:
Unsuccessful read: A timeout occurred before the Terminator was
reached..
It can be suppressed by executing the following command before fscanf.
warning('off','MATLAB:serial:fscanf:unsuccessfulRead');
Here is an example:
s = serial('COM3');
set(s,'Timeout',1); % sets timeout to 1 second (default is 10 seconds)
fopen(s);
warning('off','MATLAB:serial:fscanf:unsuccessfulRead');
in = fscanf(s);
warning('on','MATLAB:serial:fscanf:unsuccessfulRead');
if(in == 'A')
fclose(s);
break;
end

Changing format of many files in Excel

I have a folder filled with thousands of csv files. When I open one file, the data looks like:
20110503 01:46.0 1527.8 1 E
20110503 01:46.0 1537.8 1 E
20110504 37:40.0 1536.6 1 E
20110504 37:40.0 1533.6 1 E
20110504 36:17.0 1531.1 1 E
The second column (time) shows only minutes and seconds before the decimal point. If I select the second column, right-click, choose Format Cells, select Time, and change to the 13:30:55 format, the same data looks like:
20110503 19:01:46 1527.8 1 E
20110503 19:01:46 1537.8 1 E
20110504 0:37:40 1536.6 1 E
20110504 0:37:40 1533.6 1 E
20110504 8:36:17 1531.1 1 E
Now I can see hours, minutes and seconds. I have written a MATLAB function that reads these files, but it needs to be able to read the hours, so it only works after I change the format to display them. Now I have to apply the function to all the files in the folder.
I'm wondering, is there a way to change the default time display so hours are included? If not, is there a way of writing a script to change the format of these files? Thanks!
Note: the part of my matlab function that reads the file looks like:
fid = fopen('E:\Tick Data\Data Output\NGU13.csv','rt');
c = fscanf(fid, '%d,%d:%d:%d,%f,%d,%*c');
datamat = reshape(c,6,length(c)/6)'; % reshape into matrix
yyyymmdd = datamat(:,1);
hr = datamat(:,2);
mn = datamat(:,3);
sec = datamat(:,4);
pp = datamat(:,5); % price
vv = datamat(:,6); % volume
In Excel:
In Notepad, you can see hours, minutes, seconds, and milliseconds:
20111206,09:50:56.411,4.320,1,E
20111206,10:02:10.167,4.300,1,E
20111206,11:24:09.052,4.313,1,E
20111206,11:46:09.359,4.307,1,E
20111206,11:50:22.785,4.320,1,E
For a record of the type
20010402, 09:30:24.456, 4.235, 1, E
you should use this fmt:
fmt = '%f%f:%f:%f%f%f%*s'; % date, hour, minute, seconds (with fraction), price, volume; skip the trailing flag
data = textscan(fid, fmt, 'Delimiter',',','CollectOutput',true);
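
To make the field layout of such a record explicit, here is a rough Python sketch of the same split. It is an illustration only: the function name parse_tick is made up, and the column meanings (date, time, price, volume, flag) are taken from the question.

```python
def parse_tick(line):
    """Split one CSV record into (date, hour, minute, second, price, volume)."""
    date_s, time_s, price_s, volume_s, _flag = [f.strip() for f in line.split(",")]
    hh, mm, ss = time_s.split(":")  # ss keeps the fractional seconds
    return int(date_s), int(hh), int(mm), float(ss), float(price_s), int(volume_s)


row = parse_tick("20111206,09:50:56.411,4.320,1,E")
print(row)  # (20111206, 9, 50, 56.411, 4.32, 1)
```

Reading the raw text this way never goes through Excel's display format, so the hours are always present.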

Need help identifying and computing a number representation

I need help identifying the following number format.
For example, the following number format in MIB:
0x94 0x78 = 2680
0x94 0x78 in binary: [1001 0100] [0111 1000]
It seems that if the MSB is 1, it means another character follows it. And if it is 0, it is the end of the number.
So the value 2680 is [001 0100] [111 1000], formatted properly is [0000 1010] [0111 1000]
What is this number format called and what's a good way for computing this besides bit manipulation and shifting to a larger unsigned integer?
I have seen this called either 7bhm (7-bit has-more) or VLQ (variable length quantity); see http://en.wikipedia.org/wiki/Variable-length_quantity
This is stored big-endian (most significant byte first), as opposed to the C# BinaryReader.Read7BitEncodedInt method described in the question "Encoding an integer in 7-bit format of C# BinaryReader.ReadString", which stores the least significant group first.
I am not aware of any method of decoding other than bit manipulation.
Sample PHP code can be found at
http://php.net/manual/en/function.intval.php#62613
or in Python I would do something like
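def encode_7bhm(i):
    # Python 3 version: returns bytes, most significant group first.
    # The last byte has the high bit clear, which marks the end of the number.
    o = [i & 0x7f]
    i //= 128
    while i > 0:
        # Earlier bytes carry the high "has-more" bit.
        o.insert(0, 0x80 | (i & 0x7f))
        i //= 128
    return bytes(o)

def decode_7bhm(s):
    # s is a bytes object; iterating over it yields integers.
    o = 0
    for v in s:
        o = 128*o + (v & 0x7f)
        if v & 0x80 == 0:
            # found end of encoded value
            break
    else:
        # out of string, and end not found - error!
        raise TypeError
    return o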