Using memcpy/memset - iphone

When using memset or memcpy within an Obj-C program, will the compiler optimise the setting (memset) or copying (memcpy) of data into 32-bit writes or will it do it byte by byte?

You can see the libc implementations of these methods in the Darwin source. In 10.6.3, memset works at the word level. I didn't check memcpy, but probably it's the same.
You are correct that it's possible for the compiler to do the work inline instead of calling these functions. I suppose I'll let someone who knows better answer what it will do, though I would not expect a problem.

Memset will come as part of your standard C library so it depends on the implementation you are using. I would guess most implementations will copy in blocks of the native CPU size (32/64 bits) and then the remainder byte-by-byte.
Here is glibc's version of memcpy for an example implementation:
void *
memcpy (dstpp, srcpp, len)
void *dstpp;
const void *srcpp;
size_t len;
{
unsigned long int dstp = (long int) dstpp;
unsigned long int srcp = (long int) srcpp;
/* Copy from the beginning to the end. */
/* If there not too few bytes to copy, use word copy. */
if (len >= OP_T_THRES)
{
/* Copy just a few bytes to make DSTP aligned. */
len -= (-dstp) % OPSIZ;
BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);
/* Copy whole pages from SRCP to DSTP by virtual address manipulation,
as much as possible. */
PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len);
/* Copy from SRCP to DSTP taking advantage of the known alignment of
DSTP. Number of bytes remaining is put in the third argument,
i.e. in LEN. This number may vary from machine to machine. */
WORD_COPY_FWD (dstp, srcp, len, len);
/* Fall out and copy the tail. */
}
/* There are just a few bytes to copy. Use byte memory operations. */
BYTE_COPY_FWD (dstp, srcp, len);
return dstpp;
}
So you can see it copies a few bytes first to get aligned, then copies in words, then finally in bytes again. It does some optimized page copying using some kernel operations.

Related

How can I make a good hash function without unsigned integers?

I'm looking for a simple hash function that doesn't rely on integer overflow, and doesn't rely on unsigned integers.
The problem is that I have to create the hash function in blueprint from Unreal Engine (only has signed 32 bit integer, with undefined overflow behavior) and in PHP5, with a version that uses 64 bit signed integers.
So when I use the 'common' simple hash functions, they don't give the same result on both platforms because they all rely on bit-overflowing behavior of unsigned integers.
The only thing that is really important is that is has good 'randomness'. Does anyone know something simple that would accomplish this?
It's meant for a very basic signing symstem for sending messages to a server. Doesn't need to be top security... it's for storing high scores of a simple game on a server. The idea is that I would generate several hash-integers from the message (using different 'start numbers') and append them to make a hash-signature ). I just need to make sure that if people sniff the network messages send to the server that they cannot easily send faked messages. They would need to provide the correct hash-signature with their message, which they shouldn't be able to do unless they know the hash function being used. Ofcourse if they reverse engineer the game they can still 'hack' it, but I wouldn't know how to counter that...
I have no access to existing hash functions in the unreal engine blueprint system.
The first thing I would try would be to simulate the behavior of unsigned integers using signed integers, by explicitly applying the modulo operator whenever the accumulated hash-value gets large enough that it might risk overflowing.
Example code in C (apologies for the poor hash function, but the same technique should be applicable to any hash function, at least in principle):
#include <stdio.h>
#include <string.h>
int hashFunction(const char * buf, int numBytes)
{
const int multiplier = 33;
const int maxAllowedValue = 2147483648-256; // assuming 32-bit ints here
const int maxPreMultValue = maxAllowedValue/multiplier;
int hash = 536870912; // arbitrary starting number
for (int i=0; i<numBytes; i++)
{
hash = hash % maxPreMultValue; // make sure hash cannot overflow in the next operation!
hash = (hash*multiplier)+buf[i];
}
return hash;
}
int main(int argc, char ** argv)
{
while(1)
{
printf("Enter a string to hash:\n");
char buf[1024]; fgets(buf, sizeof(buf), stdin);
printf("Hash code for that string is: %i\n", hashFunction(buf, strlen(buf)));
}
}

D language unsigned hash of string

I am a complete beginner with the D language.
How to get, as an uint unsigned 32 bits integer in the D language, some hash of a string...
I need a quick and dirty hash code (I don't care much about the "randomness" or the "lack of collision", I care slightly more about performance).
import std.digest.crc;
uint string_hash(string s) {
return crc320f(s);
}
is not good...
(using gdc-5 on Linux/x86-64 with phobos-2)
While Adams answer does exactly what you're looking for, you can also use a union to do the casting.
This is a pretty useful trick so may as well put it here:
/**
* Returns a crc32Of hash of a string
* Uses a union to store the ubyte[]
* And then simply reads that memory as a uint
*/
uint string_hash(string s){
import std.digest.crc;
union hashUnion{
ubyte[4] hashArray;
uint hashNumber;
}
hashUnion x;
x.hashArray = crc32Of(s); // stores the result of crc32Of into the array.
return x.hashNumber; // reads the exact same memory as the hashArray
// but reads it as a uint.
}
A really quick thing could just be this:
uint string_hash(string s) {
import std.digest.crc;
auto r = crc32Of(s);
return *(cast(uint*) r.ptr);
}
Since crc32Of returns a ubyte[4] instead of the uint you want, a conversion is necessary, but since ubyte[4] and uint are the same thing to the machine, we can just do a reinterpret cast with the pointer trick seen there to convert types for free at runtime.

Can ancillary data be portably allocated?

IEEE Std 1003.1-2008's <sys/socket.h> section doesn't provide the CMSG_SPACE or CMSG_LEN macros, and instead merely says:
Ancillary data consists of a sequence of pairs, each consisting of a
cmsghdr structure followed by a data array.
Is there a portable way to allocate ancillary data without CMSG_SPACE, or to attach ancillary data to a message without CMSG_LEN? That quote suggests to me that a single buffer with size (sizeof(struct cmsghdr)+ sizeof data)*nr_of_pairs (where data may change per pair, of course), with each individual cmgshdr.cmsglen = sizeof(struct cmsghdr) + sizeof data and msg.msg_controllen = (sizeof(struct cmsghdr)+ sizeof data)*nr_of_pairs, but all of the system-specific documentation for CMSG_SPACE/CMSG_LEN suggests that there are alignment issues that may get in the way of this.
OK, so from what I can tell my guess as to how to allocate wouldn't work in general (I couldn't get it to work on Linux, I had to use CMSG_SPACE/CMSG_LEN instead). Based on the diagram in section 4.2 of rfc2292, I came up with the following definitions for CMSG_SPACE and CMSG_LEN that I think should be portable to conforming implementations of IEEE Std 1003.1-2008:
#include <stddef.h>
#include <sys/socket.h>
#ifndef CMSG_LEN
socklen_t CMSG_LEN(size_t len) {
return (CMSG_DATA((struct cmsghdr *) NULL) - (unsigned char *) NULL) + len;
}
#endif
#ifndef CMSG_SPACE
socklen_t CMSG_SPACE(size_t len) {
struct msghdr msg;
struct cmsghdr cmsg;
msg.msg_control = &cmsg;
msg.msg_controllen = ~0ULL; /* To maximize the chance that CMSG_NXTHDR won't return NULL */
cmsg.cmsg_len = CMSG_LEN(len);
return (unsigned char *) CMSG_NXTHDR(&msg, &cmsg) - (unsigned char *) &cmsg;
}
#endif
Obvously this should be done with macros, but I think this shows the idea. This seems really hacky to me and, due to possible size checks in CMSG_NXTHDR, can't be shoved into a compile-time constant, so probably the next version of POSIX should define CMSG_SPACE and CMSG_LEN since any program using ancillary data has to use them anyway.

Perl XS and C++ passing pointer to buffer

I know almost no C++ so that's not helping, and my XS isn't much better. I'm creating an XS interface for a C++ library and I have almost all my methods working except one.
The method in Perl should look like this:
$return_data = $obj->readPath( $path );
The method is defined as this the .h file:
int readPath(const char* path, char* &buffer, bool flag=true);
The "buffer" will get allocated if it's passed in NULL.
There's two additional versions of readPath with different signatures, but they are not the ones I want. (And interestingly, when I try and compile it tells me the "candidates" are the two I don't want.) Is that because it's not understanding the "char * &"?
Can someone help with the xsub I need to write?
I'm on Perl 5.14.2.
BTW -- I've also used a typemap "long long int" to T_IV. I cannot find any documentation on how to correctly typemap long long. Any suggestions how I should typemap long long?
Thanks,
I've never dealt with C++ from C or XS. If it was C, it would be:
void
readPath(SV* sv_path)
PPCODE:
{
char* path = SvPVbyte_nolen(sv_path, len);
char* buffer = NULL;
if (!readPath(path, &buffer, 0))
XSRETURN_UNDEF;
ST(0) = sv_2mortal(newSVpv(buffer, 0));
free(buffer);
XSRETURN(1);
}
Hopefully, that works or you can adjust it to work.
I assumed:
readPath returns true/false for success/failure.
buffer isn't allocated on failure.
The deallocator for buffer is free.
Second part of the question is TYPEMAP for long long (or long long int).
Normally long is at least 32 bits and long long is at least 64. The default typemap for long is T_IV. Perl's equivalent for long long is T_IV, too.
But sometimes, you wan't to reduce warnings for cast. So, you can use T_LONG for long. T_LONG is an equivalent to T_IV but explicitly casts the return to type long. The TYPEMAP for T_LONG is descripted at $PERLLIB/ExtUtils/typemap
With this knowledge you can write you own TYPEMAP for long long int:
TYPEMAP: <<TYPEMAPS
long long int T_LONGLONG
INPUT
T_LONGLONG
$var = (long long int)SvIV($arg)
OUTPUT
T_LONGLONG
sv_setiv($arg, (IV)$var);
TYPEMAPS

Determing the number of bytes ready to be recv()'d

I can use select() to determine if a call to recv() would block, but once I've determined that their are bytes to be read, is their a way to query how many bytes are currently available before I actually call recv()?
If your OS provides it (and most do), you can use ioctl(..,FIONREAD,..):
int get_n_readable_bytes(int fd) {
int n = -1;
if (ioctl(fd, FIONREAD, &n) < 0) {
perror("ioctl failed");
return -1;
}
return n;
}
Windows provides an analogous ioctlsocket(..,FIONREAD,..), which expects a pointer to unsigned long:
unsigned long get_n_readable_bytes(SOCKET sock) {
unsigned long n = -1;
if (ioctlsocket(sock, FIONREAD, &n) < 0) {
/* look in WSAGetLastError() for the error code */
return 0;
}
return n;
}
The ioctl call should work on sockets and some other fds, though not on all fds. I believe that it works fine with TCP sockets on nearly any free unix-like OS you are likely to use. Its semantics are a little different for UDP sockets: for them, it tells you the number of bytes in the next datagram.
The ioctlsocket call on Windows will (obviously) only work on sockets.
No, a protocol needs to determine that. For example:
If you use fixed-size messages then you know you need to read X bytes.
You could read a message header that indicates X bytes to read.
You could read until a terminal character / sequence is found.