Reversing endianness with memcpy - iphone

I'm writing a method that creates an in-memory WAV file. The first 4 bytes of the file should contain the characters 'RIFF', so I'm writing the bytes like this:
Byte *bytes = (Byte *)malloc(len); // overall length of file
char *RIFF = (char *)'RIFF';
memcpy(&bytes[0], &RIFF, 4);
The problem is that this writes the first 4 bytes as 'FFIR', thanks to little-endianness. To correct this problem, I'm just doing this:
Byte *bytes = (Byte *)malloc(len);
char *RIFF = (char *)'FFIR';
memcpy(&bytes[0], &RIFF, 4);
This works, but is there a better-looking way of getting memcpy to reverse the order of the bytes it's writing?

You're doing some bad things with pointers (and some weird but not wrong things). Try this:
Byte *bytes = malloc(len); // overall length of file
char *RIFF = "RIFF";
memcpy(bytes, RIFF, 4);
It'll work fine.
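As an aside (not part of the original answer), here is a minimal sketch of how the start of a RIFF/WAVE header might be written. The point is that memcpy of a string literal copies the bytes in source order, so endianness never enters into it; it's the multi-byte integer fields, such as the chunk size, that need explicit little-endian handling. The helper name write_le32 is made up for illustration.
#include <stdint.h>
#include <string.h>
/* Write a 32-bit value in little-endian byte order, regardless of host endianness. */
static void write_le32(uint8_t *dst, uint32_t value) {
    dst[0] = (uint8_t)(value & 0xFF);
    dst[1] = (uint8_t)((value >> 8) & 0xFF);
    dst[2] = (uint8_t)((value >> 16) & 0xFF);
    dst[3] = (uint8_t)((value >> 24) & 0xFF);
}
/* Start of a RIFF/WAVE header: "RIFF", then the chunk size (file length - 8)
   as a little-endian integer, then "WAVE". */
static void write_riff_header(uint8_t *bytes, uint32_t len) {
    memcpy(bytes, "RIFF", 4);        /* literal bytes, copied left to right */
    write_le32(bytes + 4, len - 8);  /* integer field, written explicitly little-endian */
    memcpy(bytes + 8, "WAVE", 4);
}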


USART format data type

I would like to ask how I can send data via USART as an integer, i.e. a variable which stores a number. I am able to send a char variable, but the terminal shows me the ASCII representation of that number, and I need to see the number itself.
I edited the code as shown below, but it gives me the error: "conflicting types for 'USART_Transmit'"
#include <avr/io.h>
#include <util/delay.h>
#define FOSC 8000000 // Clock Speed
#define BAUD 9600
#define MYUBRR FOSC/16/BAUD-1
void USART_Init( unsigned int ubrr );
void USART_Transmit( unsigned char data );
unsigned char USART_Receive( void );
int main( void )
{
    unsigned char str[5] = "serus";
    unsigned char strLenght = 5;
    unsigned int i = 47;
    USART_Init ( MYUBRR );
    //USART_Transmit('S' );
    while(1)
    {
        /*USART_Transmit( str[i++] );
        if(i >= strLenght)
            i = 0;*/
        USART_Transmit(i);
        _delay_ms(250);
    }
    return(0);
}
void USART_Init( unsigned int ubrr )
{
    /* Set baud rate */
    UBRR0H = (unsigned char)(ubrr>>8);
    UBRR0L = (unsigned char)ubrr;
    /* Enable receiver and transmitter */
    UCSR0B = (1<<RXEN)|(1<<TXEN);
    /* Set frame format: 8data, 2stop bit */
    UCSR0C = (1<<USBS)|(3<<UCSZ0);
}
void USART_Transmit( unsigned int data )
{
    /* Wait for empty transmit buffer */
    while ( !( UCSR0A & (1<<UDRE)) )
        ;
    /* Put data into buffer, sends the data */
    UDR0 = data;
}
unsigned char USART_Receive( void )
{
    /* Wait for data to be received */
    while ( !(UCSR0A & (1<<RXC)) )
        ;
    /* Get and return received data from buffer */
    return UDR0;
}
Do you have any idea what is wrong?
PS: I hope you understand what I'm trying to explain.
I like to use sprintf to format numbers for serial.
At the top of your file, put:
#include <stdio.h>
Then write some code in a function like this:
char buffer[16];
sprintf(buffer, "%d\n", number);
char * p = buffer;
while (*p) { USART_Transmit(*p++); }
The first two lines construct a null-terminated string in the buffer. The last two lines are a simple loop to send all the characters in the buffer. I put a newline in the format string to make it easier to see where one number ends and the other begins.
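If it helps, the same idea can be wrapped in a small helper (just a sketch; USART_Transmit_Number is a made-up name, and it assumes the USART_Transmit routine from your question):
/* Send a signed integer as human-readable ASCII digits followed by a newline. */
void USART_Transmit_Number(int number)
{
    char buffer[16];
    char *p;
    sprintf(buffer, "%d\n", number);   /* format the number as text */
    for (p = buffer; *p != '\0'; p++)  /* send it one character at a time */
        USART_Transmit((unsigned char)*p);
}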
Technically a UART serial connection is just a stream of bits divided into symbols of a certain length. It's perfectly possible to send the data in raw form, but this comes with a number of issues that must be addressed:
How to identify the start and end of a transmission unambiguously?
How to deal with endianness on either side of the connection?
How to serialize and deserialize the data in a robust way?
How to deal with transmission errors?
At the end of the day it turns out that you can never resolve all the ambiguities, and binary data must somehow be escaped or otherwise encoded to prevent misinterpretation.
As far as delimiting transmissions is concerned, that has been addressed by the creators of the ASCII standard through the set of non-printable control characters. Of interest to you are these special control characters:
STX / 0x02 / Start of Text
ETX / 0x03 / End of Text
There are also other control characters which form a pretty complete set for making up data structures; you don't need JSON or XML for this. However, ASCII itself does not support the transmission of arbitrary binary data. The standard staple for that task has long been, and still is, base64 encoding. Use it for transmitting arbitrary binary data.
Numbers you probably should not transmit in binary form at all; just send the digits. If you use octal or hexadecimal digits, parsing them back into integers is super simple (it boils down to a bunch of bit masking and shifting).
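As a sketch of that last point (illustrative only, reusing the USART_Transmit routine from the question; hex digits are trivial to parse back into an integer on the receiving side):
/* Transmit a 16-bit value as four ASCII hex digits, most significant first,
   followed by a newline to delimit one number from the next. */
void USART_Transmit_Hex16(unsigned int value)
{
    static const char digits[] = "0123456789ABCDEF";
    int shift;
    for (shift = 12; shift >= 0; shift -= 4)
        USART_Transmit(digits[(value >> shift) & 0x0F]);
    USART_Transmit('\n');
}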

How does Perl store integers in-memory?

say pack "A*", "asdf"; # Prints "asdf"
say pack "s", 0x41 * 256 + 0x42; # Prints "BA" (0x41 = 'A', 0x42 = 'B')
The first line makes sense: you're taking an ASCII-encoded string and packing it into a string as an ASCII string. In the second line, the packed form is "\x42\x41" because of the little-endianness of short integers on my machine.
However, I can't shake the feeling that somehow, I should be able to treat the packed string from the second line as a number, since that's how (I assume) Perl stores numbers, as little-endian sequence of bytes. Is there a way to do so without unpacking it? I'm trying to get the correct mental model for the thing that pack() returns.
For instance, in C, I can do this:
#include <stdio.h>
int main(void) {
    char c[2];
    short *x = (short *)c;
    c[0] = 0x42;
    c[1] = 0x41;
    printf("%d\n", *x); // Prints 16706 == 0x41 * 256 + 0x42
    return 0;
}
If you're really interested in how Perl stores data internally, I'd recommend PerlGuts Illustrated. But usually, you don't have to care about stuff like that because Perl doesn't give you access to such low-level details. These internals are only important if you're writing XS extensions in C.
If you want to "cast" a two-byte string to a C short, you can use the unpack function like this:
$ perl -le 'print unpack("s", "BA")'
16706
However, I can't shake the feeling that somehow, I should be able to treat the packed string from the second line as a number,
You need to unpack it first.
To be able to use it as a number in C, you need
char* packed = "\x42\x41";
int16_t int16;
memcpy(&int16, packed, sizeof(int16_t));
To be able to use it as a number in Perl, you need
my $packed = "\x42\x41";
my $num = unpack('s', $packed);
which is basically
use Inline C => <<'__EOI__';
SV* unpack_s(SV* sv) {
    STRLEN len;
    char* buf;
    int16_t int16;
    SvGETMAGIC(sv);
    buf = SvPVbyte(sv, len);
    if (len != sizeof(int16_t))
        croak("usage");
    Copy(buf, &int16, 1, int16_t);
    return newSViv(int16);
}
__EOI__
my $packed = "\x42\x41";
my $num = unpack_s($packed);
since that's how (I assume) perl stores numbers, as little-endian sequence of bytes.
Perl stores numbers in one of the following three fields of a scalar:
IV, a signed integer of size perl -V:ivsize (in bytes).
UV, an unsigned integer of size perl -V:uvsize (in bytes). (ivsize=uvsize)
NV, a floating-point number of size perl -V:nvsize (in bytes).
In all cases, native endianness is used.
I'm trying to get the correct mental model for the thing that pack() returns.
pack is used to construct "binary data" for interfacing with external APIs.
I see pack as a serialization function. It takes Perl values as input and outputs a serialized form. The fact that the serialized output happens to be a Perl byte string is more of an implementation detail than a core feature.
As such, all you're really expected to do with the resulting string is feed it to unpack, though the serialized form is convenient for moving data between processes, hosts, planets.
If you're interested in turning it into a number instead, consider using vec:
say vec "BA", 0, 16; # prints 16961, i.e. 0x4241, since vec treats the bytes as big-endian
To take a closer look at the string's internal representation, take a look at Devel::Peek, though you're not going to see anything surprising with a pure ASCII string.
use Devel::Peek;
Dump "BA";
SV = PV(0xb42f80) at 0xb56300
REFCNT = 1
FLAGS = (POK,READONLY,pPOK)
PV = 0xb60cc0 "BA"\0
CUR = 2
LEN = 16

Objective-C character encoding - Change char to int, and back

Simple task: I need to convert two characters to two numbers, add them together and change that back to a character.
What I have got (works perfectly in Java, where encoding is handled for you, I guess):
int myChar1 = (int)([myText1 characterAtIndex:i]);
int myChar2 = (int)([myText2 characterAtIndex:keyCurrent]);
int newChar = (myChar1 + myChar2);
//NSLog(@"Int's %d, %d, %d", textChar, keyChar, newChar);
char newC = ((char) newChar);
NSString *tmp1 = [NSString stringWithFormat:@"%c", newC];
NSString *tmp2 = [NSString stringWithFormat:@"%@", newString];
newString = [NSString stringWithFormat:@"%@%@", tmp2, tmp1]; //Adding these char's in a string
The algorithm is perfect, but now I can't figure out how to handle the encoding properly. I would like to do everything in UTF-8, but I have no idea how to get a char's UTF-8 value, for instance, and once I have it, how to change that value back into a char.
The NSLog in the code outputs the correct values, but when I try to do the opposite with the algorithm (i.e. subtracting the values), it goes wrong: it gets the wrong character value for weird/odd characters.
NSString works with unichar characters that are 2 bytes long (16 bits). A char is one byte long, so it can only store code points from U+0000 to U+00FF (i.e. Basic Latin and Latin-1 Supplement).
You should do your math on unichar values, then use +[NSString stringWithCharacters:length:] to create the string representation.
But there is still an issue with that solution: your code may generate code points between U+D800 and U+DFFF, which aren't valid Unicode characters. The standard reserves them for encoding code points from U+10000 to U+10FFFF in UTF-16 as pairs of 16-bit code units. In such a case, your string would be ill-formed and could neither be displayed nor converted to UTF-8.
Also, the temporary variable tmp2 is useless, and you should not create a new newString every time you concatenate; use an NSMutableString instead, as in the sketch below.
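Putting those suggestions together, a rough sketch might look like this (untested; the key indexing with i % [myText2 length] is just a stand-in for the question's keyCurrent, and it ignores the surrogate-range caveat above):
NSMutableString *newString = [NSMutableString string];
for (NSUInteger i = 0; i < [myText1 length]; i++) {
    unichar c1 = [myText1 characterAtIndex:i];
    unichar c2 = [myText2 characterAtIndex:(i % [myText2 length])]; // repeat the key
    unichar sum = c1 + c2;                                          // do the math on unichar values
    [newString appendString:[NSString stringWithCharacters:&sum length:1]];
}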
I am assuming that your strings are NSStrings consisting of numerals which represent a number. If that is the case, you could try the following:
Include the following headers:
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
Then use the following code:
// convert NSString to UTF8 string
const char * utf8String1 = [myText1 UTF8String];
const char * utf8String2 = [myText2 UTF8String];
// convert UTF8 string into long integers
long num1 = strtol(utf8String1, NULL, 0);
long num2 = strtol(utf8String2, NULL, 0);
// perform calculations
long calc = num1 - num2;
// convert calculated value back into NSString
NSString * calcText = [[NSString alloc] initWithFormat:@"%li", calc];
// convert calculated value back into UTF8 string
char calcUTF8[64];
snprintf(calcUTF8, 64, "%li", calc);
// log results
NSLog(#"calcText: %#", calcText);
NSLog(#"calcUTF8: %s", calcUTF8);
Not sure if this is what you meant, but from what I understood, you want to create an NSString with the UTF-8 string encoding from a char?
If that's what you want, maybe you can use the initWithCString:encoding: method in NSString.
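For example (a sketch, assuming you have a NUL-terminated C string containing UTF-8 bytes):
const char *utf8Bytes = "caf\xC3\xA9"; // "café" encoded as UTF-8
NSString *s = [[NSString alloc] initWithCString:utf8Bytes encoding:NSUTF8StringEncoding];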

What is wrong with this zlib string decompression?

I'm trying to create a simple string decompression routine for my app.
/*
Decompresses the source buffer into the destination buffer. sourceLen is
the byte length of the source buffer. Upon entry, destLen is the total size
of the destination buffer, which must be large enough to hold the entire
uncompressed data. (The size of the uncompressed data must have been saved
previously by the compressor and transmitted to the decompressor by some
mechanism outside the scope of this compression library.) Upon exit, destLen
is the actual size of the uncompressed buffer.
uncompress returns Z_OK if success, Z_MEM_ERROR if there was not
enough memory, Z_BUF_ERROR if there was not enough room in the output
buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete.
*/
[Base64 initialize];
NSData * data = [Base64 decode:@"MDAwMDAwNTB42vPMVkhKzVNIBeLsnNTMPB0IpVCWWZyVqpAJkalKTVUoS8xTSMpJLC0HALWrEYi="];
NSString * deBase64 = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
int lengteOP = [[deBase64 substringWithRange:NSMakeRange(0,8)] intValue];
NSUInteger lengteIP = [deBase64 length];
const unsigned char *input = (const unsigned char *) [[deBase64 substringFromIndex:8] cStringUsingEncoding:NSASCIIStringEncoding];
unsigned char * dest;
uncompress(dest, lengteOP, input, lengteIP);
I get an EXC_BAD_ACCESS error when I try this.
The string is built with Delphi code using ZLib, the same library as in the iPhone SDK.
It's a base64-encoded string whose first 8 characters represent the length of the string, followed by the zlib-compressed data.
I did not run the code but your issue is likely to be dest. Here is a snippet from the documentation.
Upon entry, destLen is the total size
of the destination buffer, which must
be large enough to hold the entire
uncompressed data.
The destination needs to have the memory allocated before calling the function, otherwise it will attempt to write data to invalid memory causing EXC_BAD_ACCESS.
Try the following:
unsigned char * dest = malloc(sizeof(unsigned char) * lengteOP);
uLongf destLen = lengteOP; // in: size of dest; out: actual uncompressed size
uncompress(dest, &destLen, input, lengteIP);
//Use dest (create NSString with proper encoding for example)
free(dest);
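It's also worth checking the return value that the comment block at the top of the question documents; for example, capturing the status of the same call (a sketch):
int status = uncompress(dest, &destLen, input, lengteIP);
if (status != Z_OK) {
    // Z_MEM_ERROR, Z_BUF_ERROR or Z_DATA_ERROR, as documented above
    NSLog(@"uncompress failed with status %d", status);
}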
- finished code
/*
Decompresses the source buffer into the destination buffer. sourceLen is
the byte length of the source buffer. Upon entry, destLen is the total size
of the destination buffer, which must be large enough to hold the entire
uncompressed data. (The size of the uncompressed data must have been saved
previously by the compressor and transmitted to the decompressor by some
mechanism outside the scope of this compression library.) Upon exit, destLen
is the actual size of the uncompressed buffer.
uncompress returns Z_OK if success, Z_MEM_ERROR if there was not
enough memory, Z_BUF_ERROR if there was not enough room in the output
buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete.
ZEXTERN int ZEXPORT uncompress OF((Bytef *dest, uLongf *destLen,
const Bytef *source, uLong sourceLen));
*/
[Base64 initialize];
NSData * data = [Base64 decode:@"MDAwMDAwNTB42vPMVkhKzVNIBeLsnNTMPB0IpVCWWZyVqpAJkalKTVUoS8xTSMpJLC0HALWrEYi="];
NSString * deBase64 = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
uLongf lengthOriginal = [[deBase64 substringWithRange:NSMakeRange(0,8)] intValue];
uLongf lengteOP = lengthOriginal; // in: size of dest; out: actual uncompressed size
NSUInteger lengteIP = [deBase64 length];
NSString * codedString = [deBase64 substringFromIndex:8];
const unsigned char *input = (const unsigned char *) [codedString cStringUsingEncoding:NSISOLatin1StringEncoding];
unsigned char * dest = malloc(sizeof(unsigned char) * (lengthOriginal + 1));
uncompress(dest, &lengteOP, input, lengteIP);
dest[lengteOP] = '\0'; // NUL-terminate so it can be used as a C string
NSString * bla = @"Decoded string :";
NSLog(@"%@", [bla stringByAppendingString:[NSString stringWithCString:(const char *)dest encoding:NSASCIIStringEncoding]]);
free(dest);

Converting an NSString to and from UTF32

I'm working with a database that includes hex codes for UTF32 characters. I would like to take these characters and store them in an NSString. I need to have routines to convert in both ways.
To convert the first character of an NSString to a unicode value, this routine seems to work:
const unsigned char *cs = (const unsigned char *)
[s cStringUsingEncoding:NSUTF32StringEncoding];
uint32_t code = 0;
for ( int i = 3 ; i >= 0 ; i-- ) {
    code <<= 8;
    code += cs[i];
}
return code;
However, I am unable to do the reverse (i.e. take a single code and convert it into an NSString). I thought I could just do the reverse of what I do above by simply creating a c-string with the UTF32 character in it with the bytes in the correct order, and then create an NSString from that using the correct encoding.
However, converting to / from cstrings does not seem to be reversible for me.
For example, I've tried this code, and the "tmp" string is not equal to the original string "s".
char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSString *tmp = [NSString stringWithCString:cs encoding:NSUTF32StringEncoding];
Does anyone know what I am doing wrong? Should I be using "wchar_t" for the cstring instead of char *?
Any help is greatly appreciated!
Thanks,
Ron
You have a couple of reasonable options.
1. Conversion
The first is to convert your UTF32 to UTF16 and use those with NSString, as UTF16 is the "native" encoding of NSString. It's not actually all that hard. If the UTF32 character is in the BMP (i.e. its high two bytes are 0), you can just cast it to unichar directly. If it's in any other plane, you can convert it to a surrogate pair of UTF16 characters. You can find the rules on the Wikipedia page. But a quick (untested) conversion would look like
UTF32Char inputChar = // my UTF-32 character
inputChar -= 0x10000;
unichar highSurrogate = inputChar >> 10; // leave the top 10 bits
highSurrogate += 0xD800;
unichar lowSurrogate = inputChar & 0x3FF; // leave the low 10 bits
lowSurrogate += 0xDC00;
Now you can create an NSString using both characters at the same time:
NSString *str = [NSString stringWithCharacters:(unichar[]){highSurrogate, lowSurrogate} length:2];
To go backwards, you can use [NSString getCharacters:range:] to get the unichars back and then reverse the surrogate pair algorithm to get your UTF32 character back (any character which isn't in the range 0xD800-0xDFFF should just be cast to UTF32 directly).
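A sketch of that reverse step (untested) might be:
UTF32Char outputChar;
unichar units[2] = {0, 0};
NSUInteger count = MIN((NSUInteger)2, [str length]);
[str getCharacters:units range:NSMakeRange(0, count)];
if (count == 2 && units[0] >= 0xD800 && units[0] <= 0xDBFF) {
    // high surrogate followed by low surrogate: recombine into one code point
    outputChar = 0x10000
               + (((UTF32Char)(units[0] - 0xD800)) << 10)
               + (units[1] - 0xDC00);
} else {
    // BMP character: the unichar value is the code point itself
    outputChar = units[0];
}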
2. Byte buffers
Your other option is to let NSString do the conversion directly without using cStrings. To convert a UTF32 value into an NSString you can use something like the following:
UTF32Char inputChar = // input UTF32 value
inputChar = NSSwapHostIntToLittle(inputChar); // swap to little-endian if necessary
NSString *str = [[[NSString alloc] initWithBytes:&inputChar length:4 encoding:NSUTF32LittleEndianStringEncoding] autorelease];
To get it back out again, you can use
UTF32Char outputChar;
if ([str getBytes:&outputChar maxLength:4 usedLength:NULL encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, 1) remainingRange:NULL]) {
outputChar = NSSwapLittleIntToHost(outputChar); // swap back to host endian
// outputChar now has the first UTF32 character
}
There are two problems here:
1:
The first one is that both [NSString cStringUsingEncoding:] and [NSString getCString:maxLength:encoding:] return the C-string in native-endianness (little) without adding a BOM to it when using NSUTF32StringEncoding and NSUTF16StringEncoding.
The Unicode standard states that: (see, "How I should deal with BOMs")
"If there is no BOM, the text should be interpreted as big-endian."
This is also stated in NSString's documentation: (see, "Interpreting UTF-16-Encoded Data")
"... if the byte order is not otherwise specified, NSString assumes that the UTF-16 characters are big-endian, unless there is a BOM (byte-order mark), in which case the BOM dictates the byte order."
Although they're referring to UTF-16, the same applies to UTF-32.
2:
The second one is that [NSString stringWithCString:encoding:] internally uses CFStringCreateWithCString to create the string from the C-string. The problem with this is that CFStringCreateWithCString only accepts strings using 8-bit encodings. From the documentation: (see, "Parameters" section)
The string must use an 8-bit encoding.
To solve this issue:
Explicitly state the encoding endianness you want to use both ways (NSString -> C-string and C-string -> NSString)
Use [NSString initWithBytes:length:encoding:] when trying to create an NSString from a C-string encoded in UTF-32 or UTF-16.
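Putting both points together, a round trip with an explicitly little-endian encoding might look like this (a sketch, using the same autorelease convention as the answer above):
NSString *s = @"Ron";
// NSString -> bytes, with the endianness stated explicitly
NSData *utf32 = [s dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
// bytes -> NSString, via initWithBytes:length:encoding: with the same explicit encoding
NSString *back = [[[NSString alloc] initWithBytes:[utf32 bytes]
                                           length:[utf32 length]
                                         encoding:NSUTF32LittleEndianStringEncoding] autorelease];
// [back isEqualToString:s] is YES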
Hope this helps!