How would one transfer files larger than 2,147,483,646 bytes (~2 GiB) with Win32 TransmitFile()?

Quoted from MSDN entry for TransmitFile:
The maximum number of bytes that can be transmitted using a single call to the TransmitFile function is 2,147,483,646, the maximum value for a 32-bit integer minus 1. The maximum number of bytes to send in a single call includes any data sent before or after the file data pointed to by the lpTransmitBuffers parameter plus the value specified in the nNumberOfBytesToWrite parameter for the length of file data to send. If an application needs to transmit a file larger than 2,147,483,646 bytes, then multiple calls to the TransmitFile function can be used with each call transferring no more than 2,147,483,646 bytes. Setting the nNumberOfBytesToWrite parameter to zero for a file larger than 2,147,483,646 bytes will also fail since in this case the TransmitFile function will use the size of the file as the value for the number of bytes to transmit.
Alright. Sending a file of size 2*2,147,483,646 bytes (~ 4 GiB) with TransmitFile would then have to be divided into two parts at minimum (e.g. 2 GiB + 2 GiB in two calls to TransmitFile). But how exactly would one go about doing that, while preferably also keeping the underlying TCP connection alive in between?
When the file is indeed <=2,147,483,646 bytes in size, one could just write:
HANDLE fh = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN, NULL);
TransmitFile(SOCK_STREAM_socket, fh, 0, 0, NULL, NULL, TF_DISCONNECT);
to let Windows handle all the lower-level stuff (caching, chunking the data up into pieces for efficient transmission, etc.). However, unlike the comparable Linux sendfile() syscall, there is no immediately obvious offset argument in the call (although the fifth argument, LPOVERLAPPED lpOverlapped, is probably exactly what I'm looking for). I suppose I could hack something together, but I'm also looking for a graceful, good-practice Win32 solution from someone who actually knows about this stuff.
You can use the lpOverlapped parameter to specify a 64-bit offset within the file at which to start the file data transfer by setting the Offset and OffsetHigh member of the OVERLAPPED structure. If lpOverlapped is a NULL pointer, the transmission of data always starts at the current byte offset in the file.
So, for lack of a minimal example readily available on the net, which calls are necessary to accomplish such a task?

Managed to figure it out based on the comments.
So, if LPOVERLAPPED lpOverlapped is a null pointer, the call starts transmission at the current file offset (much like the Linux sendfile() syscall and its off_t *offset parameter). This offset can be manipulated with SetFilePointerEx, so one could write:
#define TRANSMITFILE_MAX 2147483646 /* 2^31 - 2, the documented per-call limit */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

LARGE_INTEGER total_bytes;
total_bytes.QuadPart = 0;
while (total_bytes.QuadPart < filesize) {   /* filesize: 64-bit byte count */
    DWORD bytes = (DWORD)MIN(filesize - total_bytes.QuadPart, TRANSMITFILE_MAX);
    if (!TransmitFile(SOCK_STREAM_socket, fh, bytes, 0, NULL, NULL, 0))
    { /* error handling */ }
    total_bytes.QuadPart += bytes;          /* advance QuadPart, not HighPart */
    SetFilePointerEx(fh, total_bytes, NULL, FILE_BEGIN);
}
closesocket(SOCK_STREAM_socket);
to accomplish the task.
Not very elegant imo, but it works.
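Alternatively, the lpOverlapped route quoted in the question avoids moving the file pointer at all: put the 64-bit offset in the OVERLAPPED structure and advance it yourself. A minimal sketch under the same assumptions as above (fh, SOCK_STREAM_socket and a 64-bit filesize already set up; error handling elided). Note that with an OVERLAPPED the call may complete asynchronously, so the sketch waits on the event before sending the next chunk:

/* Sketch: chunked TransmitFile using OVERLAPPED offsets. */
LONGLONG offset = 0;
while (offset < filesize) {
    DWORD chunk = (DWORD)MIN(filesize - offset, TRANSMITFILE_MAX);
    OVERLAPPED ov = {0};
    ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);  /* low 32 bits of offset */
    ov.OffsetHigh = (DWORD)(offset >> 32);         /* high 32 bits */
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (!TransmitFile(SOCK_STREAM_socket, fh, chunk, 0, &ov, NULL, 0)) {
        if (WSAGetLastError() != WSA_IO_PENDING)
            { /* error handling */ }
        else
            WaitForSingleObject(ov.hEvent, INFINITE); /* wait for this chunk */
    }
    CloseHandle(ov.hEvent);
    offset += chunk;
}
closesocket(SOCK_STREAM_socket);

This also keeps the TCP connection alive between chunks, as the question asks: no TF_DISCONNECT is passed until you are done, and the final close is a single closesocket at the end.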


How to get string input from a socket in D?

I'm using this code to listen to a port:
int start(){
    ushort port = 61888;
    listener = new TcpSocket();
    assert(listener.isAlive);
    listener.blocking = false;
    listener.bind(new InternetAddress(port));
    listener.listen(10);
    writefln("Listening on port %d.", port);

    enum MAX_CONNECTIONS = 60;
    auto socketSet = new SocketSet(MAX_CONNECTIONS + 1);
    Socket[] reads;

    while (true)
    {
        socketSet.add(listener);
        foreach (sock; reads)
            socketSet.add(sock);
        Socket.select(socketSet, null, null);
    }
    return 0;
}
As far as I know, sockets deal with the bytes as they are. I want to find a way to convert these bytes (which are essentially SQL requests) to strings. How can I do so, given that the input is UTF-8, a variable-width encoding?
You seem to have a few questions here.
How do I get chars from bytes?
Cast them with cast(char[]) st. This aliases the bytes, giving you a slice of the exact same data, and doesn't require a new allocation. You are not yet assuming that the bytes are valid UTF-8, but autodecoding or other parts of your program might complain if they aren't. You can run it through std.utf.validate if you want.
Or do basically the same thing with std.string.assumeUTF(st), which at least asserts on invalid UTF in debug builds.
How do I get a string from char[]?
You can unsafely alias the char[] with std.exception.assumeUnique(st), or you can allocate an immutable copy with st.idup or std.utf.toUTF8(st).
What if my fixed buffer of bytes contains invalid UTF-8 -- because it got cut off?
If that's a risk, you can use the low-level std.utf tools (decodeFront and catching UTFException is one way) to peel off the valid UTF-8 and then check whether you have remaining bytes, or check that the end of the input is valid UTF-8.
How do I know if I've gotten a complete SQL statement with my fixed buffer socket I/O?
Instead of just passing the raw SQL statement over the line, you can define a network protocol that includes information like the statement size, or that has 'end of statement' markers that you can watch for.
I've a cheat sheet for string type conversions, which links to a file of more elaborate unittests.
Try this:
import std.exception: assumeUnique;
string s = assumeUnique (cast(char[])ubyteArray);

What is the correct behavior of C_Decrypt in pkcs#11?

I am using C_Decrypt with the CKM_AES_CBC_PAD mechanism. I know that my ciphertext, which is 272 bytes long, should actually decrypt to 256 bytes, which means a full block of padding was added.
I know that, according to the standard, when invoking C_Decrypt with a NULL output buffer, the function may return an output length somewhat longer than the actually required length. In particular when padding is used this is understandable, as the function can't know how many padding bytes are in the final block without carrying out the actual decryption.
So the question is: if I know that I should get exactly 256 bytes back, as in the scenario explained above, does it make sense that I still get a CKR_BUFFER_TOO_SMALL error despite passing a 256-byte buffer? (To be clear: I am indicating this length in the appropriate output-buffer-length parameter; see the parameters of C_Decrypt to observe what I mean.)
I am encountering this behavior with a Safenet Luna device and am not sure what to make of it. Is it my code's fault for not querying for the length first by passing NULL in the output buffer, or is this a bug on the HSM/PKCS11 library side?
One more thing I should perhaps mention is that when I provide a 272-byte (256+16) output buffer, the call succeeds and I get back my expected plaintext, but also the padding block, i.e. 16 final bytes with the value 0x10. However, the output length is updated correctly to 256, not 272. This also proves that I am not accidentally using CKM_AES_CBC instead of CKM_AES_CBC_PAD, which I suspected for a moment as well :)
I have used the CKM.AES_CBC_PAD mechanism with C_Decrypt in the past. You have to make two calls to C_Decrypt (1st to get the size of the plain text, 2nd for the actual decryption). See the documentation, which talks about determining the length of the buffer needed to hold the plain text.
Below is the step-by-step code to show the behavior of decryption:
//Defining the decryption mechanism
CK_MECHANISM mechanism = new CK_MECHANISM(CKM.AES_CBC_PAD);
//Initialize to zero -> variable to hold size of plain text
LongRef lRefDec = new LongRef();
// Get ready to decrypt
CryptokiEx.C_DecryptInit(session_1, mechanism, key_handleId_in_hsm);
// Get the size of the plain text -> 1st call to decrypt
CryptokiEx.C_Decrypt(session_1, your_cipher, your_cipher.length, null, lRefDec);
// Allocate space to the buffer to store plain text.
byte[] clearText = new byte[(int)lRefDec.value];
// Actual decryption -> 2nd call to decrypt
CryptokiEx.C_Decrypt(session_1, your_cipher, your_cipher.length, clearText, lRefDec);
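For reference, the same two-call convention in the raw PKCS#11 C API; a minimal sketch, assuming the session, key handle, IV and ciphertext already exist (error handling abbreviated):

#include <stdlib.h>
#include "pkcs11.h"  /* OASIS PKCS#11 header; platform macros assumed set up */

/* Sketch of the standard length-query convention for C_Decrypt. */
CK_RV decrypt_with_query(CK_SESSION_HANDLE session, CK_OBJECT_HANDLE hKey,
                         CK_BYTE *iv, CK_BYTE *cipher, CK_ULONG cipherLen)
{
    CK_MECHANISM mech = { CKM_AES_CBC_PAD, iv, 16 };  /* 16-byte AES IV */
    CK_ULONG plainLen = 0;
    CK_BYTE *plain;
    CK_RV rv;

    rv = C_DecryptInit(session, &mech, hKey);
    if (rv != CKR_OK) return rv;

    /* 1st call: NULL output buffer -> plainLen receives an upper bound,
       which may include room for the padding block. */
    rv = C_Decrypt(session, cipher, cipherLen, NULL_PTR, &plainLen);
    if (rv != CKR_OK) return rv;

    plain = malloc(plainLen);

    /* 2nd call: actual decryption; plainLen is updated to the exact
       unpadded length (256 in the question's scenario). */
    rv = C_Decrypt(session, cipher, cipherLen, plain, &plainLen);
    /* ... use plain[0..plainLen) ... */
    free(plain);
    return rv;
}

With CKM_AES_CBC_PAD the first call may well report the full ciphertext length (272 here) as the upper bound, which is consistent with the CKR_BUFFER_TOO_SMALL you see when passing a 256-byte buffer, even though the final length ends up smaller.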
Sometimes decryption fails because the input data misled the decryption algorithm (the encryption succeeds, but the corresponding decryption fails). So it is important not to send raw bytes directly to the encryption algorithm; encoding the input data with a UTF-8/16 scheme keeps it from being misinterpreted as network control bytes.

Matlab TCP-IP Server Lossless Transfer

I have tried and failed to implement a TCP-listen server in Matlab that is "lossless". By lossless, I mean using the Linux socat utility to send a file:
socat -u file.bin TCP4:127.0.0.1:50000
And receive a byte-exact match to that file within Matlab:
function t = test
    fid = fopen('x', 'W');
    t = tcpip('0.0.0.0', 50000, 'NetworkRole', 'server', 'InputBufferSize', 50*1024^2);
    t.BytesAvailableFcnMode = 'byte';
    t.BytesAvailableFcnCount = 1024^2;
    t.BytesAvailableFcn = {@FCN, fid};
    fopen(t);

function FCN(obj, event, fid)
    x = fread(igetfield(obj, 'jobject'), obj.BytesAvailable, 0, 0);
    fwrite(fid, x(1), 'int8');
I've tested this a good bit and had decent success in terms of transfer rate (without the fwrite, and using /dev/zero as the file, it saturates a gigabit link) and low CPU load. The trick is bypassing Matlab's tcpip() default wrapper* and accessing a lower-level method via:
igetfield(obj,'jobject')
For the 2139160576-byte file I test with, it usually receives ~2139153336 bytes. I've tried various other implementations that receive the fread() output into structs, cells, and array concatenation; they are also missing a few KiB. I've tried a repeating pattern of 512 random bytes; one test had a byte mismatch at the beginning.
Socat->Socat transfer works (obviously).
Socat->my-matlab-code holds at fopen() until socat connects. No data is transferred until fread() is called. I've tried throttling the transfer with the linux "pv" utility:
pv -L 10m file.bin | socat -u - TCP4:127.0.0.1:50000
To no effect.
My question is: where are the bytes going, or what should I test next?
(Edited to include further test results):
Outputting to a file, i.e. the fwrite() call, is unnecessary. It is easier and faster to execute t = test, wait for the transfer to complete (i.e. for the socat client to return), then query the total bytes transferred from within Matlab:
t.BytesAvailable + t.ValuesReceived
On my Windows machine, this value is always less than the file size of 2139160576 bytes. On my Ubuntu machine, the values occasionally match. Furthermore, when they do not match, the "netstat -s" counts of segments retransmitted and packets discarded do not change. Wireshark monitoring of the loopback interface shows a final Matlab/server ACK sequence number of 2139160578, presumably 2 more than the file size due to both the server and client incrementing by one.
*As an aside, Matlab's Instrument Control Toolbox implementation of fread is a terrible wrapper around lower-level code I can't see: matlab\toolbox\shared\instrument\@icinterface\fread.m, function localFormatData, line 296. All the data types are explicitly cast to double with numeric type conversion. This results in massive CPU load, not to mention lossy conversion between data types.

Very few write cycles in stm32f4

I'm using a STM32F401VCT6U "discovery" board, and I need to provide a way for the user to write addresses in memory at runtime.
I wrote what can be simplified to the following function:
uint8_t Write(uint32_t address, uint8_t* values, uint8_t count)
{
    uint8_t index;
    for (index = 0; index < count; ++index) {
        if (IS_FLASH_ADDRESS(address + index)) {
            /* flash write */
            FLASH_Unlock();
            if (FLASH_ProgramByte(address + index, values[index]) != FLASH_COMPLETE) {
                FLASH_Lock();
                return FLASH_ERROR;
            }
            FLASH_Lock();
        } else {
            /* ram write */
            ((uint8_t*)address)[index] = values[index];
        }
    }
    return NO_ERROR;
}
In the above, address is the base address, values is a buffer of size at least count containing the bytes to write to memory, and count is the number of bytes to write.
Now, my problem is the following: when the above function is called with a base address in flash and count=100, it works normally the first few times, writing the passed values buffer to flash. After those first few calls, however, I cannot write arbitrary values anymore: I can only reset bits in the values already in flash. E.g. an attempt to write 0xFF over 0x7F will leave 0x7F in the flash, while writing 0xFE over 0x7F will leave 0x7E, and writing 0x00 over any value will succeed (but no other value will be writable to that address afterwards).
I can still write normally to other addresses in the flash by changing the base address, but again only a few times (two or three calls with count=100).
This behaviour suggests that the maximum write count of the flash has been reached, but I cannot imagine it can be so fast. I'd expect at the very least 10,000 writes before exhaustion.
So what am I doing wrong?
You have misunderstood how flash works: it is not, for example, as straightforward as writing EEPROM. The behaviour you are describing is normal for flash.
To repeatedly write the same flash address, the whole sector must first be erased using FLASH_EraseSector. Generally, any data that needs to be preserved across this erase must be buffered either in RAM or in another flash sector.
If you are repeatedly writing a small block of data and are worried about flash burnout due to too many erase/write cycles, you would want to write an interface to the flash where on each write you move your data along the sector to unwritten flash, keeping track of its current offset from the start of the sector. Only when you run out of bytes in the sector would you need to erase and start again at the start of the sector.
ST's "right way" is detailed in AN3969: EEPROM emulation in STM32F40x/STM32F41x microcontrollers
This is more or less the process:
Reserve two Flash pages
Write the latest data to the next available location along with its 'EEPROM address'
When you run out of room on the first page, write all of the latest values to the second page and erase the first
Begin writing values where you left off on page 2
When you run out of room on page 2, repeat on page 1
This is insane, but I didn't come up with it.
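To make the append idea behind both suggestions concrete, here is a minimal sketch in the style of the SPL flash driver used in the question (FLASH_Unlock, FLASH_ProgramHalfWord, FLASH_Lock). The reserved sector addresses and the (virtual address, value) record layout are illustrative, not taken from AN3969:

#define EE_SECTOR_BASE 0x08008000u  /* assumed reserved sector start */
#define EE_SECTOR_END  0x0800C000u  /* assumed reserved sector end */

/* Append one (virtual address, value) record at the first erased slot. */
int ee_write(uint16_t virt_addr, uint16_t value)
{
    uint32_t addr;
    for (addr = EE_SECTOR_BASE; addr < EE_SECTOR_END; addr += 4) {
        if (*(volatile uint32_t *)addr == 0xFFFFFFFFu) { /* erased slot */
            FLASH_Unlock();
            FLASH_ProgramHalfWord(addr, virt_addr);   /* check FLASH_Status in real code */
            FLASH_ProgramHalfWord(addr + 2, value);
            FLASH_Lock();
            return 0;
        }
    }
    /* Sector full: copy the latest value of each virtual address to the
       other reserved sector, erase this one, and swap their roles. */
    return -1;
}

Reading a value back is then a scan from the end of the used area toward the start for the most recent record whose virtual address matches; the sector is only erased when it is completely full, which is what spreads the wear.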
I have a working and tested solution, but it is rather different from @Ricibob's answer, so I decided to make this a separate answer.
Since my user can write anywhere in a select flash sector, my application cannot take on the responsibility of erasing the sector when needed while buffering to RAM only the data that needs to be preserved.
As a result, I transferred to my user the responsibility of erasing the sector when a write to it doesn't work (this way, the user remains free to use another address in the sector to avoid too many write-erase cycles).
Solution
Basically, I expose a write(uint32_t startAddress, uint8_t count, uint8_t* values) function that returns WRITE_SUCCESSFUL on success and CANNOT_WRITE_FLASH on failure.
I also provide my user with a getSector(uint32_t address) function that returns the id, start address and end address of the sector corresponding to the address passed as a parameter. This way, the user knows what range of address is affected by the erase operation.
Lastly, I expose an eraseSector(uint8_t sectorID) function that erases the flash sector whose id has been passed as a parameter.
Erase Policy
The policy for a failed write is different from @Ricibob's suggestion of "erase if the value in flash is different from FF", as the Flash programming manual documents that a write will succeed as long as it only resets bits (which matches the behavior I observed in the question):
Note: Successive write operations are possible without the need of an erase operation when changing bits from '1' to '0'.
Writing '1' requires a Flash memory erase operation.
If an erase and a program operation are requested simultaneously, the erase operation is performed first.
So I use the macro CAN_WRITE(a,b), where a is the original value in flash and b the desired value. The macro is defined as:
!(~a & b)
which works because:
the logical not (!) will transform 0 to true and everything else to false, so ~a & b must equal 0 for the macro to be true;
any bit at 1 in a is at 0 in ~a, so it will be 0 whatever its value in b is (you can transform a 1 into either 1 or 0);
if a bit is 0 in a, then it is 1 in ~a; if b equals 1 then ~a & b != 0 and we cannot write, while if b equals 0 it's OK (you can transform a 0 to 0 only, not to 1).
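For reference, the macro spelled out (with added parentheses) together with its behavior on the values from the question:

/* a = byte currently in flash, b = desired new value.
   True iff b only clears bits of a (no 0 -> 1 transition needed). */
#define CAN_WRITE(a, b) (!(~(a) & (b)))

/* Examples matching the observed behavior:
   CAN_WRITE(0x7F, 0x7E) -> true  (only clears bit 0)
   CAN_WRITE(0x7F, 0xFF) -> false (would need to set bit 7: erase required)
   CAN_WRITE(0xAB, 0x00) -> true  (0x00 is always writable) */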
List of flash sectors in STM32F4
Lastly, and for future reference (as it is not that easy to find), the list of flash sectors in the STM32F4 can be found on page 7 of the Flash programming manual.

socket conversation terminator

When reading data from a socket, it is important either to keep a message terminator symbol or to put packet size information at the beginning of the message.
If a terminator symbol is used and a binary message is sent, there is no guarantee that the terminator symbol will not appear in the middle of the message (unless some special encoding is used).
On the other hand, if size information is attached: the size is unsigned, and if one byte is used for it, messages longer than 255 bytes cannot be transferred. If a 4-byte integer is used, it is not even guaranteed that the 4 bytes will arrive as a whole; just 2 bytes of the size information may arrive, and code that assumes the size information has arrived may use those 2 bytes while the rest of the integer data is discarded. Waiting for 4 bytes to be available in the read buffer may wait forever if only 3 bytes are available (e.g. if the total stream is 7 bytes or 4077 bytes long).
Here come two possible approaches:
1. a sizeInfo separator chunk: read until the separator is found; once found, read until sizeInfo bytes have passed;
2. keep an unreadyBytes count initialized to 4; upon receiving the sizeInfo, change it accordingly and count it down as payload arrives.
Which one of these two is safer to use? Please criticize.
Edit
My central question is how to make sure that the size bytes have arrived properly, assuming messages are of variable size.
It is not even guaranteed that the 4 bytes will arrive as a whole; just 2 bytes of the size information may arrive, and code that assumes the size information has arrived may use those 2 bytes while the rest of the integer data is discarded. Waiting for 4 bytes to be available in the read buffer may wait forever if only 3 bytes are available (e.g. if the total stream is 7 bytes or 4077 bytes long).
If you have a 4-byte length descriptor you should always read at least 4 bytes, because the sender should have written these bytes in every message your server receives. If you can't get them, maybe there has been a problem in the transmission. I really can't understand your problem.
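To make "always read at least 4 bytes" concrete: a single read() on a stream socket may return fewer bytes than requested, so the usual pattern is a loop that keeps reading until the requested count has arrived. A minimal sketch, assuming POSIX sockets (the readn name and interface are illustrative):

#include <errno.h>
#include <unistd.h>

/* Read exactly n bytes from fd, looping over short reads.
   Returns n on success, fewer on EOF, -1 on error. */
ssize_t readn(int fd, void *buf, size_t n)
{
    size_t left = n;
    char *p = buf;
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error */
        }
        if (r == 0)
            break;          /* EOF: peer closed before n bytes arrived */
        p += r;
        left -= r;
    }
    return (ssize_t)(n - left);
}

With this, receiving a message is: readn the 4-byte length, convert it with ntohl, then readn exactly that many payload bytes. The partial-arrival problem from the question disappears because you never act on the length until all 4 of its bytes are in.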
Anyway, I suggest you not use any separator chunk.
Put a header on the data blocks you are transmitting and use a buffer to reconstruct the packet flow.
You must read at least the header of a packet to determine its length.
You can define a basic structure for a packet:
struct packet{
    uint32 id;                      /* holds the payload length in bytes */
    char payload[MAX_PAYLOAD_SIZE];
};
Then you read data from the socket, storing it into a buffer:
struct packet buffer;
Then you can read the data from the socket:
int n;
n = read(newsocket, &buffer, sizeof(uint32) + MAX_PAYLOAD_SIZE);
read returns the number of bytes read. If you read exactly one packet from the sender, then n = sizeof(uint32) + id. Otherwise you may have read more data (e.g. the sender sent you more packets). If you are receiving a stream of data split into units (represented by packet structures), then you may use an array of packet to store the complete packets received and a temporary buffer to manage incoming fragments.
Something like:
struct packet packet_buffer[MAX_PACKET_STORED];
char temp_buffer[MAX_PAYLOAD_SIZE + sizeof(uint32)];
int n;
n = read(newsocket, temp_buffer, sizeof(uint32) + MAX_PAYLOAD_SIZE);
//here suppose we have received a packet with a 100-byte payload + 32 bits of
//length information + a 100-byte fragment of the next packet.
//then:
int first_pack_len, second_pack_len, second_pack_data_available_in_buffer;
first_pack_len = *((uint32 *)&temp_buffer[0]); //retrieve first packet's length
memcpy(&packet_buffer[0], temp_buffer, first_pack_len + sizeof(uint32)); //store the first packet into the array
second_pack_data_available_in_buffer = n - (first_pack_len + sizeof(uint32)); //total bytes read minus size of the first packet
second_pack_len = *((uint32 *)&temp_buffer[first_pack_len + sizeof(uint32)]);
I hope to have been clear enough. But maybe I'm misunderstanding your question.
Pay attention also to the fact that the two communicating end-systems could have different endianness, so it is better to use the htonl/ntohl functions on the length when sending/receiving it. (But this is another issue.)
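For completeness, a sketch of the sending side under the same assumptions; writen is the hypothetical write-side mirror of the readn loop sketched earlier (both names are illustrative):

#include <arpa/inet.h>   /* htonl */
#include <stdint.h>
#include <unistd.h>

/* Hypothetical mirror of readn: loops until all n bytes are written
   or an error occurs; returns the number of bytes written or -1. */
ssize_t writen(int fd, const void *buf, size_t n);

/* Send one length-prefixed packet: a 4-byte network-order length,
   then the payload itself. */
int send_packet(int fd, const void *payload, uint32_t len)
{
    uint32_t netlen = htonl(len);               /* network byte order */
    if (writen(fd, &netlen, sizeof netlen) != (ssize_t)sizeof netlen)
        return -1;
    if (writen(fd, payload, len) != (ssize_t)len)
        return -1;
    return 0;
}

Because the length always travels in network byte order and the receiver converts it back with ntohl, the framing works regardless of each end's native endianness.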