Strange byte difference - is this possible? - encoding

In Cygwin, I have two almost identical files; the second is a minimally modified copy of the first. On screen the first lines look identical, so I used the cmp utility to compare the two files byte by byte.
$cat double.c
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/glut.h>
#include <stdlib.h>
static GLfloat spin = 0.0;
void init(void)
{
glClearColor(0.0,0.0,0.0,0.0);
glShadeModel(GL_FLAT);
}
........
$cat double1.c
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/glut.h>
#include <stdlib.h>
static GLfloat spin = 0.0;
void init(void)
{
glClearColor (0.0, 0.0, 0.0, 0.0);
glShadeModel (GL_FLAT);
}
...........
$cmp double.c double1.c
It outputs:
double.c double1.c diffrent: line 8 char 125
So I ran:
$cmp -l double.c double1.c
It outputs:
125 12 40
126 173 12
127 12 173
128 40 12
144 50 40
145 60 50
146 56 60
.........
The byte values are printed in octal: 12 is a newline, 40 is a space and 173 is "{".
A Windows editor shows the difference as follows.
So where does this difference come from? I am confused.
double.c
https://docs.google.com/file/d/0B5qhYcc2Fk0sZmEySUNIc1RzM3M/edit?usp=sharing
double1.c
https://docs.google.com/file/d/0B5qhYcc2Fk0sRTdoV1RDNnB2QkU/edit?usp=sharing
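For reading the cmp -l listing above: each line holds the byte offset (decimal, counted from 1) followed by the differing byte from each file in octal. Here is a minimal Python sketch that turns such a line into readable characters. The helper name decode_cmp_line is made up for illustration and the sample lines are copied from the output above.

```python
def decode_cmp_line(line):
    """Decode one 'cmp -l' line: decimal offset, then two octal byte values."""
    offset, first, second = line.split()
    return int(offset), chr(int(first, 8)), chr(int(second, 8))

# Sample lines copied from the cmp -l output in the question
for line in ["125 12 40", "126 173 12", "144 50 40"]:
    offset, a, b = decode_cmp_line(line)
    print(offset, repr(a), repr(b))
```

Decoded this way, byte 125 is a newline in double.c but a space in double1.c, which would fit an extra space character (for example a trailing one) in double1.c that shifts every following byte by one position.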

Matlab: delete complete line in txt-file if there is a non-ascii-character

I'm currently writing MATLAB code to plot measurement data. Unfortunately there is a hardware problem with the serial communication, and sometimes I receive just gibberish. My code works only for well-formed data, so this gibberish has to be removed. I want something like this pseudocode:
for eachLine
if currentLineContainsNonASCII
delete completeLine
end if
end for
The data is read like this:
rawdataInputFilename = 'measurementData.txt';
fileID = fopen(rawdataInputFilename);
% load data as string
DataCell = textscan(fileID,'%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s','HeaderLines', 1);
I was thinking about first creating a new 'clean' file with only ASCII characters and then reading that file with my actual plotting code.
Where I am stuck is how to identify a non-ASCII character and then delete the whole line, not just overwrite that single character.
Here is some example data. The first and third lines are 'clean' and can be handled by the current code. The second line has non-ASCII characters in it and therefore kills my code. Whitespace characters are Windows line endings, tabs and spaces.
61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1
62 386 Module03 Slot03 27.01.2015 13:47:18 450ÆădzШШ 106.83 22.30 25.20 1 1
63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1
You can just check whether each received character is in the printable ASCII range [32, 126], and skip the line otherwise.
The following function tells you whether a given string contains any non-printable character:
function R = has_non_printable_characters(str)
% Keep only the printable characters (codes 32 to 126)
str2 = str(31<str & str<127);
% If anything was dropped, the input contained non-printable characters
R = (length(str) > length(str2));
end
Note that this test also rejects tabs (code 9), so allow them explicitly if your data is tab-separated.
If, instead of skipping the entire string, you want to remove the non-printable characters and keep the printable ones, modify the function to return str2 (and rename it so that the name matches the new behaviour).
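For comparison, here is a rough sketch of the same range check applied line by line. It is written in Python rather than MATLAB purely for illustration; the function names are hypothetical and the sample lines are shortened copies of the data from the question.

```python
def is_clean(line):
    """True if every character is printable ASCII (codes 32..126) or a tab."""
    return all(32 <= ord(ch) <= 126 or ch == "\t" for ch in line)

def drop_dirty_lines(lines):
    """Keep only the lines made up entirely of printable ASCII and tabs."""
    return [line for line in lines if is_clean(line)]

# Shortened sample lines based on the question's data
sample = [
    "61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84",
    "62 386 Module03 Slot03 27.01.2015 13:47:18 450ÆădzШШ 106.83",
    "63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332",
]
for line in drop_dirty_lines(sample):
    print(line)
```

The whole line is dropped as soon as one character falls outside the allowed range, which is what the pseudocode in the question asks for.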
There are several ways to do it.
Save that to a text file named data.txt:
bla Header bla
61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1
62 386 Module03 Slot03 27.01.2015 13:47:18 450ÆădzШШ 106.83 22.30 25.20 1 1
63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1
Method 1 (using textscan and cellfun):
Removing the non-ASCII line completely:
fileID = fopen('data.txt'); % open file
DataCell = textscan(fileID,'%s','delimiter','','HeaderLines', 1); % read a complete line of text, ignore the first line
fclose(fileID); % close file
DataCell = DataCell{1}; % there is only one string per line
DataCell(cellfun(@(x) any(x>127),DataCell)) = []; % remove a line if there is any non-ASCII character in it; adjust the test to your liking, e.g. any(x>126 | x<32)
celldisp(DataCell)
DataCell{1} =
61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1
DataCell{2} =
63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1
You could now loop over the cell array or, if you like, start all over again with the updated text (e.g. as input to textscan). To do that, join the cells together into one big chunk of text:
strjoin(DataCell','\n')
ans =
61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1
63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1
Method 2 (using regexprep):
I'm loading the whole text file at once and replacing every line that contains a character outside a given set with an empty string ''.
s = fileread('data.txt');
snew = regexprep(s, '.*[^\w\s.:].*\n', '', 'dotexceptnewline')
snew =
61 380 Module03 Slot02 27.01.2015 13:47:13 450 3587 1175 84 101.83 22.30 5.20 1 1
63 391 Module03 Slot04 27.01.2015 13:47:24 ERROR dgsf 5643332 103.26 22.40 25.20 1 1
The [^\w\s.:] bit basically translates to:
Match any character which is not (the ^ means not):
alphabetic, numeric or underscore (\w)
whitespace (\s)
a dot . or
a colon :
If you want to allow any other ASCII character (so that it no longer causes its line to be removed), just add it inside the brackets.
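To experiment with the character class outside MATLAB, the same idea can be sketched with Python's re module. This is only an approximation of the regexprep call above: the re.ASCII flag is an assumption added here so that \w and \s cover ASCII only, and the name remove_dirty_lines is made up.

```python
import re

# A line is "dirty" if it contains any character outside [\w\s.:].
# '.' does not match a newline by default, so each match stays on one line.
DIRTY_LINE = re.compile(r".*[^\w\s.:].*\n?", re.ASCII)

def remove_dirty_lines(text):
    """Delete every line that contains a character outside the allowed set."""
    return DIRTY_LINE.sub("", text)

text = "61 380 Slot02 13:47:13 101.83\n62 386 Slot03 450ÆădzШШ 106.83\n63 391 ERROR dgsf 22.40\n"
print(remove_dirty_lines(text))
```

With this character class a minus sign or a comma also counts as dirty, so the class would need extending if such characters can occur in valid data.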
Here is the code that creates a new text file without the lines containing non-ASCII characters:
%% read in via GUI
[inputFilename, inputPathname] = uigetfile('*.txt', ...
'Pick a .txt file from which you want to remove lines with non ASCII characters.');
if isequal(inputFilename, 0)
disp('User selected ''Cancel''')
return; % nothing to do without an input file
else
disp(['User selected ', fullfile(inputPathname, inputFilename)])
inputFileID = fopen(fullfile(inputPathname, inputFilename)); %open/load file
end
% split the file name into base name and extension (the extension includes the dot)
[~, inputFilenameWOextension, fileExtension] = fileparts(inputFilename);
outputFileID = fopen([inputFilenameWOextension, '_ASCIIonly', fileExtension], 'w'); %overwrite existing file
% get the first line of text; fgetl returns -1 instead of a char array at end of file
tline = fgetl(inputFileID);
while ischar(tline)
% keep only characters with codes below 127
tempStr = tline(tline<127); % 7-bit ASCII without DEL, so tabs are kept
%tempStr = tline(31<tline & tline<127); % printable ASCII only
% write the line only if no character had to be removed
if length(tempStr) == length(tline)
fprintf(outputFileID, '%s\r\n', tline);
end
% get the next line of text
tline = fgetl(inputFileID);
end
fclose(inputFileID);
fclose(outputFileID);

MIDI and Bit Order

I'm stuck trying to format a MIDI Sys Ex message that a device keeps rejecting as invalid. The problem is a section of the message that involves a type of data encoding described below.
According to the manual,
the device "will encode/interpret a consecutive group of 4-bytes"
Byte #0 - b31b30b29b28b27b26b25b24
Byte #1 - b23b22b21b20b19b18b17b16
Byte #2 - b15b14b13b12b11b10b09b08
Byte #3 - b07b06b05b04b03b02b01b00
as the following 5 consecutive SysEx bytes:"
Byte #0 - 0 b06b05b04b03b02b01b00
Byte #1 - 0 b13b12b11b10b09b08b07
Byte #2 - 0 b20b19b18b17b16b15b14
Byte #3 - 0 b27b26b25b24b23b22b21
Byte #4 - 0 0 0 0 b31b30b29b28
where "b" is the bit number. Notice the bit numbering has been flipped. Which way are you supposed to read the bits? MIDI data is, by convention, reverse bit ordered (MSB=7), if that helps. Also, the manual notes that "all data types are in Motorola big-endian byte order."
Here's a description of the message I'm trying to format correctly -
"A command will allow a consecutive group of one to four bytes to be edited. When 3 or less bytes are specified the device expects the Parameter Value field to be bit ordered as if it was performing a full 32-bit (4 byte) parameter change. For example, when editing a two byte parameter, Byte #0 will occupy the bit range of b24-b31, while Byte #1 will occupy bits b16-b23. The remaining bits (b00-b15) in the parameter value field should be set to zero."
bb Parameter Offset - 0 b06b05b04b03b02b01b00
bb Parameter Offset - 0 b13b12b11b10b09b08b07
bb Parameter Offset - 0 b20b19b18b17b16b15b14
bb Parameter Offset - 0 b27b26b25b24b23b22b21
bb Parameter Offset - 0 0 0 0 b31b30b29b28
0b Parameter Byte Size (1 to 4)
00
00
00
00
bb Parameter Value - 0 b06b05b04b03b02b01b00
bb Parameter Value - 0 b13b12b11b10b09b08b07
bb Parameter Value - 0 b20b19b18b17b16b15b14
bb Parameter Value - 0 b27b26b25b24b23b22b21
bb Parameter Value - 0 0 0 0 b31b30b29b28
So, when trying to enter offset values of 15H, 16H, 17H, and 18H, with values of, let's say, 00, 01, 02, and 03 respectively, how would I encode those hex values, or do I even need to encode them? If I do need to, in which direction do I write the bits so the binary values are correct?
When written, the order of bits in a byte is always big-endian, i.e., MSB first.
This can be confirmed by the fact that the MSB must be zero for data bytes.
Four bytes:
15h -> 00010101
16h -> 00010110
17h -> 00010111
18h -> 00011000
SysEx bytes:
0 0011000 -> 18h
0 0101110 -> 2eh
0 1011000 -> 58h
0 0101000 -> 28h
0000 0001 -> 01h
Four bytes:
01 -> 00000001
02 -> 00000010
03 -> 00000011
04 -> 00000100
SysEx bytes:
0 0000100 -> 04h
0 0000110 -> 06h
0 0001000 -> 08h
0 0001000 -> 08h
0000 0000 -> 00h
The basic approach to this conversion is to combine the four 8-bit values into a 32-bit value and then successively move the least significant seven bits into five bytes. Here's a sample program that does what you need.
#include <stdio.h>
#include <stdint.h>
void convert(uint8_t byte0, uint8_t byte1, uint8_t byte2, uint8_t byte3, uint8_t *outBytes) {
uint32_t inBytes = (uint32_t)byte0 << 24 | (uint32_t)byte1 << 16 | (uint32_t)byte2 << 8 | byte3; //combine the input bytes into a single 32-bit value (cast first so the shifts are done in unsigned 32-bit arithmetic)
for (int i = 0; i < 5; i++) {
outBytes[i] = inBytes & 0x7F; //Copy the least significant seven bits into the next byte in the output array
inBytes >>= 7; //Shift right to discard the seven bits that were copied
}
}
void printByteArray(uint8_t *byteArray) {
for (int i = 0; i < 5; i++) {
printf("Byte %d: %02xh\n", i, byteArray[i]);
}
printf("\n");
}
int main(int argc, char *argv[]) {
uint8_t sysExBytes[5]; //Five bytes to contain the converted SysExData
convert(0x15, 0x16, 0x17, 0x18, sysExBytes);
printByteArray(sysExBytes);
convert(0x00, 0x01, 0x02, 0x03, sysExBytes);
printByteArray(sysExBytes);
return 0;
}
Output:
Byte 0: 18h
Byte 1: 2eh
Byte 2: 58h
Byte 3: 28h
Byte 4: 01h
Byte 0: 03h
Byte 1: 04h
Byte 2: 04h
Byte 3: 00h
Byte 4: 00h
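To check a message by hand it can help to have the reverse direction as well. The following Python sketch mirrors the conversion above and adds the inverse; the names pack_sysex and unpack_sysex are made up for illustration and do not come from the device manual.

```python
def pack_sysex(b0, b1, b2, b3):
    """Pack four 8-bit values into five 7-bit SysEx data bytes, lowest bits first."""
    value = (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
    return [(value >> (7 * i)) & 0x7F for i in range(5)]

def unpack_sysex(data):
    """Inverse of pack_sysex: rebuild the four original bytes from five 7-bit bytes."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]

packed = pack_sysex(0x15, 0x16, 0x17, 0x18)
print([format(b, "02x") for b in packed])                 # ['18', '2e', '58', '28', '01']
print([format(b, "02x") for b in unpack_sysex(packed)])   # ['15', '16', '17', '18']
```

If the manual's description of shorter parameters is read literally, a one- or two-byte value would be passed in the leading arguments with the remaining ones set to zero, for example pack_sysex(0x00, 0x01, 0, 0).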
Your question is a bit confusing. 1 byte is 8 bits, therefore the following lines:
Byte - #0 b31b30b29b28b27b26b25b24
Byte - #1 b23b22b21b20b19b18b17b16
Byte - #2 b15b14b13b12b11b10b09b08
Byte - #3 b07b06b05b04b03b02b01b00
do not make sense to me. The first line, #0, has 8 bytes in it. I did some searching, and this is a better explanation (http://www.music.mcgill.ca/~ich/classes/mumt306/midiformat.pdf). The table below takes the contents of page 2, edited for clarity.
Offset | Byte 0 | Byte 1 | Byte 2 | Byte 3
-------- bits | 24-31 | 16-23 | 8-15 | 0-7
00000000 | 00 | | |
00000040 | 40 | | |
0000007F | 7F | | |
00000080 | 81 | 00 | |
00002000 | C0 | 00 | |
00003FFF | FF | 7F | | <--- example
00004000 | 81 | 80 | 00 |
00100000 | C0 | 80 | 00 |
001FFFFF | FF | FF | 7F |
00200000 | 81 | 80 | 80 | 00
08000000 | C0 | 80 | 80 | 00
0FFFFFFF | FF | FF | FF | 7F
Example at offset 3FFF
Hex format 0xFF7F_0000 (32-bit number, unused bits are 0)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Does this help?

DEFLATE Encoding with static Huffman Codes

I need some help understanding how DEFLATE encoding works. I know that it is a combination of the LZSS algorithm and Huffman coding.
So let's encode, for example, "Deflate late". Parameters: [search buffer: 8 KB, look-ahead buffer: 4 KB]. The output of the LZSS algorithm is "Deflate <5, 4>". The next step uses static Huffman coding to reduce the redundancy. Here is my problem: I don't know how I should encode this pair <5, 4> with Huffman.
[Edited]
D 000
f 001
l 010
a 011
t 100
_ 101
e 11
According to this table, the string "Deflate " is written as 000 11 001 010 011 100 11 101. As a next step, let's encode the pair (5, 4). According to the book "Data Compression - The Complete Reference", length 4 is represented by code 258 in the fixed prefix code, followed by the fixed prefix code of distance 5 (code 4 + 1 extra bit).
That can be summarized as:
length 4 -> 258 -> 0000010
distance 5 -> 4 + 1 extra bit -> 00100|0
So the encoded string is written as [header: 1 01] 000 11 001 010 011 100 11 101 0000010 001000 [end-of-block: 0000000]. But if I create a Huffman tree, it is not static Huffman coding anymore, right?
Good day
D 000
f 001
l 010
a 011
t 100
_ 101
e 11
is not the Deflate static code. The static literal/length codes are all 7, 8, or 9 bits, and the distance codes are all 5 bits. You asked about the static codes.
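For reference, the static code lengths mentioned here follow the fixed table in RFC 1951 (section 3.2.6). A small Python sketch of that table, with made-up function names, written with the most significant bit of each code first as in the RFC:

```python
def fixed_litlen_code(symbol):
    """Fixed (static) Huffman code of a literal/length symbol, as an MSB-first bit string."""
    if 0 <= symbol <= 143:        # 8-bit codes 00110000 .. 10111111
        return format(0b00110000 + symbol, "08b")
    if 144 <= symbol <= 255:      # 9-bit codes 110010000 .. 111111111
        return format(0b110010000 + (symbol - 144), "09b")
    if 256 <= symbol <= 279:      # 7-bit codes 0000000 .. 0010111
        return format(symbol - 256, "07b")
    if 280 <= symbol <= 287:      # 8-bit codes 11000000 .. 11000111
        return format(0b11000000 + (symbol - 280), "08b")
    raise ValueError("literal/length symbol must be in 0..287")

def fixed_distance_code(code):
    """Fixed distance codes are simply the 5-bit binary value of the distance code."""
    if not 0 <= code <= 29:
        raise ValueError("distance code must be in 0..29")
    return format(code, "05b")

print(fixed_litlen_code(ord("D")))   # 01110100
print(fixed_litlen_code(258))        # 0000010  (match length 4)
print(fixed_distance_code(4))        # 00100    (distances 5..6, one extra bit)
```

So the literal 'D' costs 8 bits in the static code, compared with the 3 bits assigned in the custom table from the question.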
'Deflate late' encoded in static deflate format as the literals 'Deflate ' and a length 4, distance 5 match in hex is:
73 49 4d cb 49 2c 49 55 00 11 00
That is broken down as follows (bits are read from the least significant part of each byte first):
011 - 01 means fixed code, 1 means last block
00101110 - D
10101001 - e
01101001 - f
00111001 - l
10001001 - a
00100101 - t
10101001 - e
00001010 - space
0100000 - length 4
00100 - distance 5 or 6 depending on one extra bit
0 - extra bit -> distance 5
0000000 - end code
0 - fill bit to byte boundary
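The eleven bytes above can be checked with any inflate implementation. A minimal sketch using Python's zlib module, where the negative wbits value tells zlib to expect a raw deflate stream without a zlib or gzip header:

```python
import zlib

# The hex bytes quoted in the answer above
stream = bytes.fromhex("73 49 4d cb 49 2c 49 55 00 11 00")

# wbits=-15 selects raw deflate (no header or checksum)
text = zlib.decompress(stream, -15)
print(text)  # b'Deflate late'
```

The decoder first emits the eight literals "Deflate " and then copies 4 bytes starting 5 bytes back in the output, which reproduces "late".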

Solaris and Preprocessor Macros

Would someone post the results of cpp -dM < /dev/null from a Solaris 10 or above system?
I'm having trouble locating what preprocessor macros are typically defined. Solaris documentation does not discuss it in detail [1], [2], and Google is not being very helpful.
Thanks in advance.
Solaris 11.1
#define __DBL_MIN_EXP__ (-1021)
#define __FLT_MIN__ 1.17549435e-38F
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 2147483647
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __FLT_EVAL_METHOD__ 0
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __GNUC_PATCHLEVEL__ 3
#define sparc 1
#define __SHRT_MAX__ 32767
#define __LDBL_MAX__ 1.18973149535723176508575932662800702e+4932L
#define __unix 1
#define __LDBL_MAX_EXP__ 16384
#define __SCHAR_MAX__ 127
#define __USER_LABEL_PREFIX__
#define __STDC_HOSTED__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __DBL_DIG__ 15
#define __FLT_EPSILON__ 1.19209290e-7F
#define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L
#define __unix__ 1
#define __DECIMAL_DIG__ 36
#define __LDBL_HAS_QUIET_NAN__ 1
#define __GNUC__ 3
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_HAS_INFINITY__ 1
#define __SVR4 1
#define __DBL_MAX_EXP__ 1024
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __sparc__ 1
#define __GXX_ABI_VERSION 1002
#define __FLT_MIN_EXP__ (-125)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DBL_HAS_QUIET_NAN__ 1
#define __sun 1
#define __REGISTER_PREFIX__
#define __NO_INLINE__ 1
#define __FLT_MANT_DIG__ 24
#define __VERSION__ "3.4.3 (csl-sol210-3_4-20050802)"
#define __sparc 1
#define sun 1
#define unix 1
#define __SIZE_TYPE__ unsigned int
#define __ELF__ 1
#define __FLT_RADIX__ 2
#define __LDBL_EPSILON__ 1.92592994438723585305597794258492732e-34L
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MAX_10_EXP__ 38
#define __LONG_MAX__ 2147483647L
#define __FLT_HAS_INFINITY__ 1
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __LDBL_MANT_DIG__ 113
#define __WCHAR_TYPE__ long int
#define __FLT_DIG__ 6
#define __INT_MAX__ 2147483647
#define __FLT_MAX_EXP__ 128
#define __DBL_MANT_DIG__ 53
#define __WINT_TYPE__ long int
#define __LDBL_MIN_EXP__ (-16381)
#define __LDBL_MAX_10_EXP__ 4932
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __sun__ 1
#define __svr4__ 1
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __GNUC_MINOR__ 4
#define __DBL_MAX_10_EXP__ 308
#define __LDBL_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966L
#define __PTRDIFF_TYPE__ int
#define __LDBL_MIN_10_EXP__ (-4931)
#define __LDBL_DIG__ 33
@jens:
SPARC systems from Solaris 10 onwards are always 64-bit. x64 systems came in with Solaris 10, though you could still boot a 32-bit x86 kernel.
$ isainfo
amd64 i386
$ isainfo -b
64
$ isainfo -v
64-bit amd64 applications
avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3
sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu efs f16c rdrand
32-bit i386 applications
avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3
sse2 sse fxsr mmx cmov sep cx8 tsc fpu efs f16c rdrand
/usr/bin/isainfo is your friend, from the command line.
From a programmatic point of view, examine the #defines in <sys/feature_tests.h>. You'll notice _LP64 ...

Hex combination of binary flags

Which of the following combine to give 63 as a long (in Java), and how?
0x0
0x1
0x2
0x4
0x8
0x10
0x20
I'm working with NetworkManager API flags, if that helps. I'm getting 63 from one of the operations but don't know how I should match the return value to the description.
Thanks
63 is 32 | 16 | 8 | 4 | 2 | 1, where | is the binary or operator.
Or in other words (in hex): 63 (which is 0x3F) is 0x20 | 0x10 | 0x8 | 0x4 | 0x2 | 0x1. If you look at them all in binary, it is obvious:
0x20 : 00100000
0x10 : 00010000
0x08 : 00001000
0x04 : 00000100
0x02 : 00000010
0x01 : 00000001
And 63 is:
0x3F : 00111111
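The decomposition can also be done mechanically by testing each flag with a bitwise and. A short sketch, in Python for brevity (the same operators exist in Java); the list simply repeats the non-zero values from the question and the function name is made up:

```python
# The non-zero flag values listed in the question
FLAGS = [0x1, 0x2, 0x4, 0x8, 0x10, 0x20]

def set_flags(value):
    """Return the flags from FLAGS whose bit is set in value."""
    return [flag for flag in FLAGS if value & flag]

print([hex(flag) for flag in set_flags(63)])
# ['0x1', '0x2', '0x4', '0x8', '0x10', '0x20']
```

The value 0x0 never shows up in such a list, because it has no bits set and therefore contributes nothing to an or-combination.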
If you're getting some return status and want to know what it means, you'll have to use binary and. For example:
if ((status & 0x02) != 0)
{
}
will execute if the flag 0x02 (that is, the 2nd bit from the right) is turned on in the returned status. Most often, these flags have names (descriptions), so the code above will read something like:
if ((status & CONNECT_ERROR_FLAG) != 0)
{
}
Again, the status can be a combination of stuff:
// Check if both flags are set in the status
if ((status & (CONNECT_ERROR_FLAG | WRONG_IP_FLAG)) == (CONNECT_ERROR_FLAG | WRONG_IP_FLAG))
{
}
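When a mask combines several flags, there are two different questions one can ask: is at least one of the bits set, or are all of them set? A small sketch of both tests, again in Python; the flag values are hypothetical and chosen only for the example:

```python
CONNECT_ERROR_FLAG = 0x02  # hypothetical value, for illustration only
WRONG_IP_FLAG = 0x08       # hypothetical value, for illustration only

def has_any(status, mask):
    """True if at least one bit of mask is set in status."""
    return (status & mask) != 0

def has_all(status, mask):
    """True only if every bit of mask is set in status."""
    return (status & mask) == mask

mask = CONNECT_ERROR_FLAG | WRONG_IP_FLAG
print(has_any(0x02, mask), has_all(0x02, mask))  # True False
print(has_any(0x0A, mask), has_all(0x0A, mask))  # True True
```

Comparing the masked value against the mask itself is what distinguishes "all set" from "any set".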
P.S.: To learn why this works, this is a nice article about binary flags and their combinations.
I'd give you the same answer as Chris: your return value 63 (0x3F) seems to be a combination of all the flags you mention in your list (except 0x0).
When dealing with flags, one easy way to figure out by hand which flags are set is by converting all numbers to their binary representation. This is especially easy if you already have the numbers in hexadecimal, since every digit corresponds to four bits. First, your list of numbers:
0x01 0x02 0x04 ... 0x20
| | | |
| | | |
V V V V
0000 0001 0000 0010 0000 0100 ... 0010 0000
Now, if you take your value 63, which is 0x3F (= 3 * 16^1 + F * 16^0, where F = 15) in hexadecimal, it becomes:
0x3F
|
|
V
0011 1111
You quickly see that the lower 6 bits are all set, which is an "additive" combination (bitwise OR) of the above binary numbers.
63 (decimal) equals 0x3F (hex). So 63 is a combination of all of the following flags:
0x20
0x10
0x08
0x04
0x02
0x01
Is that what you were looking for?