Binary files sent over socket are corrupted - sockets

I'm writing code to transfer files between two computers over a TCP socket. I need to attach a kind of header to the file bytes I send so that the receiver knows they are part of a file. Let's say my header is data. The string I send will then be: data <file bytes>.
I'm able to send them and the receiver is able to receive them, but the received file is corrupted. It works for plain text files, but other (binary) files do not come through intact.
while(1){
    fp = (char*) malloc (56);
    rc = recv(connfd,fp,55,0);
    if(strcmp(fp,"stop") == 0){
        break;
    }
    fp = fp + 5; //I do this to skip the 'data<space>" header
    wr = write(fd,pf2,rc-5);
    tot = tot + wr;
    printf("Received a total of %d bytes rc = %d \n",tot, rc);
}
If I send the file without the header, it arrives uncorrupted, but I need those 'data' headers for this particular code. What am I doing wrong?

fp = fp + 5; //I do this to skip the 'data<space>" header
But you don't receive the data<space> header in every recv() call. You have to keep a buffer to which you add all the data you receive, until you encounter another "data<space>".
Please note though that separators are generally a bad idea. What if you send a file that has the string "data<space>" in it? Your client will assume that after that, a new file will be sent, while in fact you're still receiving the original file.
Try sending some kind of message-length header instead, for example a uint32, which occupies four bytes before each file you send. You can then read the first four bytes and know how many more bytes to expect for that file.
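As a rough illustration of the length-header idea, here is a minimal sender-side sketch in C. The names write_all and send_frame are hypothetical helpers, not part of any library, and the 4-byte big-endian header layout is an assumption about the protocol. Error handling is kept minimal (for example, EINTR is not retried).

```c
#include <arpa/inet.h>   /* htonl */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>      /* write */

/* Hypothetical helper: write exactly len bytes to fd, retrying after
   short writes. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send one frame, i.e. a 4-byte length in network
   byte order followed by the payload bytes. */
int send_frame(int fd, const void *payload, uint32_t len)
{
    uint32_t header = htonl(len);
    if (write_all(fd, &header, sizeof header) != 0)
        return -1;
    return write_all(fd, payload, len);
}
```

Because the receiver counts bytes instead of searching for a marker, the payload may contain "data " or any other byte sequence without confusing it.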

Related

Invalid field in source data: 0 TCP_Message

I'm using ProtoBuf-Net to serialize and deserialize TCP_Messages.
I've tried all the suggestions I've found here, so I really don't know where the mistake is.
Serialization happens on the server side, and deserialization happens in a client-side application.
Serialize code:
public void MssGetCardPersonalInfo(out RCPersonalInfoRecord ssPersonalInfoObject, out bool ssResult) {
ssPersonalInfoObject = new RCPersonalInfoRecord(null);
TCP_Message msg = new TCP_Message(MessageTypes.GetCardPersonalInfo);
MemoryStream ms = new MemoryStream();
ProtoBuf.Serializer.Serialize(ms, msg);
_tcp_Client.Send(ms.ToArray());
_waitToReadCard.Start();
_stopWaitHandle.WaitOne();
And the deserialize:
private void tpcServer_OnDataReceived(Object sender, byte[] data, TCPServer.StateObject clientState)
{
TCP_Message message = new TCP_Message();
MemoryStream ms = new MemoryStream(data);
try
{
//ms.ToArray();
//ms.GetBuffer();
//ms.Position = 0;
ms.Seek(0, SeekOrigin.Begin);
message = Serializer.Deserialize<TCP_Message>(ms);
} catch (Exception ex)
{
EventLog.WriteEntry(_logSource, "Error deserializing: " + ex.Message, EventLogEntryType.Error, 103);
}
As you can see, I've tried a bunch of different approaches, now commented out.
I have also tried to deserialize using DeserializeWithLengthPrefix, but it didn't work either.
I'm a bit of a noob at this, so if you could help me I would really appreciate it.
Thanks
The first thing to look at here is: is the data you receive the data you send? Until you can answer "yes" to that, all other questions are moot. It is very easy to confuse network code and end up reading partial frames, etc. As a crude debugger test:
Debug.WriteLine(Convert.ToBase64String(ms.GetBuffer(), 0, (int)ms.Length));
should work. If the two base-64 strings are not identical, then you aren't working with the same data. This can be because of a range of reasons, including packet splitting and combining. You need to keep in mind that in a stream, what you send is not what you get - at least, not down to the fragment level. You might "send" data in the way of:
one bundle of 20 bytes
one bundle of 10 bytes
but at the receiving end, it would be entirely legitimate to read:
1 byte
22 bytes
7 bytes
All that TCP guarantees is the order and accuracy of the bytes. It says nothing about their breakdown in terms of chunks. When writing network code, there are basically 2 approaches:
have one thread that synchronously reads from a stream and local buffer (doesn't scale well)
async code (very scalable), but accept that you're going to have to do a lot of "do I have a complete frame? if not, append to an input buffer; if so, process any available frame data (could be multiple), then shuffle any incomplete data to the start of the buffer"
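The "do I have a complete frame?" bookkeeping described above can be sketched as follows. This is in C rather than C#, and it assumes a framing of a 4-byte big-endian length followed by the payload, which the original post does not specify. The names inbuf, inbuf_append and inbuf_next_frame are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INBUF_CAP 4096   /* assumed upper bound on buffered bytes */

struct inbuf {
    unsigned char data[INBUF_CAP];
    size_t used;          /* number of valid bytes in data */
};

/* Append a chunk exactly as the socket delivered it.
   Returns 0 on success, -1 if the chunk does not fit. */
int inbuf_append(struct inbuf *b, const void *chunk, size_t n)
{
    assert(b->used <= INBUF_CAP);
    if (n > INBUF_CAP - b->used)
        return -1;
    memcpy(b->data + b->used, chunk, n);
    b->used += n;
    return 0;
}

/* If a complete frame is buffered, copy its payload to out, shuffle the
   leftover bytes to the start of the buffer and return the payload length.
   Returns -1 if more data is needed, -2 if the payload exceeds out_cap. */
long inbuf_next_frame(struct inbuf *b, unsigned char *out, size_t out_cap)
{
    if (b->used < 4)
        return -1;                        /* header still incomplete */
    uint32_t len = ((uint32_t)b->data[0] << 24) | ((uint32_t)b->data[1] << 16) |
                   ((uint32_t)b->data[2] << 8)  |  (uint32_t)b->data[3];
    if (len > out_cap)
        return -2;
    if (b->used - 4 < len)
        return -1;                        /* payload still incomplete */
    memcpy(out, b->data + 4, len);
    b->used -= 4 + (size_t)len;
    memmove(b->data, b->data + 4 + len, b->used);
    return (long)len;
}
```

After every read, the caller would append the chunk and then call inbuf_next_frame in a loop until it returns -1, since one chunk may complete several frames.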

Using GSocketClient, how do I read incoming data without knowing how many incoming bytes there will be?

I am still struggling to be able to read incoming response messages from a piece of hardware my program is communicating with.
I am using a GSocketClient and am able to connect and successfully send messages using g_output_stream_write(). I then want to read the response sent back from the device, but I have no way of knowing how many bytes the reply will be in order to use g_input_stream_read(). I have also tried using g_input_stream_read_all(), but this seems to block the application and never return. I don't know how g_input_stream_read_all() determines that it has reached the end of a stream, but I assume the problem is somewhere there?
I know that there is incoming data because I can use g_input_stream_read() with a made-up byte size like 5 and I then see the first 5 incoming bytes, but the response size will always be different.
So my question is: is there a way to determine how much data is waiting to be read, so that I can plug that into g_input_stream_read() as the size to read? And if not, what is the correct usage of g_input_stream_read_all() to keep it from blocking the way I am seeing?
Does something like the following work?
#define BUF_SIZE 1024
guint8 buffer[BUF_SIZE];
GByteArray *array = g_byte_array_new();
gsize bytes_read;
GError *error = NULL;
while (g_input_stream_read_all(istream, buffer, BUF_SIZE, &bytes_read, NULL, &error))
{
g_byte_array_append(array, buffer, bytes_read);
if (bytes_read < BUF_SIZE)
/* We've reached the end of the stream */
break;
}
if (error)
// error handling code

CocoaAsyncSocket and reading data from a socket

On my TCP-socket based server, I send packets over the stream, where each packet consists of a header specifying the number of bytes in the packet, followed by that number of bytes. For those familiar with Erlang, I'm simply setting the {packet, 4} option. On the iOS side, I have code that looks like this, assuming I want to figure out the size of the stream for this message:
[asyncSocket readDataToLength:4 withTimeout:-1 tag:HEADER_TAG];
That works fine and the following delegate method callback is invoked:
onSocket:didReadData:withTag:
I figure the next logical step is to figure out the size of the stream, and I do that with:
UInt32 readLength;
[data getBytes:&readLength length:4];
readLength = ntohl(readLength);
After hard coding a string of 12 bytes on the server-side, readLength does indeed read 12 on the client also, so all is good so far. I proceed with the following:
[sock readDataToLength:readLength withTimeout:1 tag:MESSAGE_TAG];
At this point, though, the callback onSocket:didReadData:withTag: is no longer invoked. Instead the read times out, probably because I didn't handle the read properly, and this delegate method gets invoked:
- (NSTimeInterval)onSocket:(AsyncSocket *)sock shouldTimeoutReadWithTag:(long)tag elapsed:(NSTimeInterval)elapsed bytesDone:(NSUInteger)length
so in total, the server is sending 16 bytes, a 4 byte header and a 12 byte binary stream.
I'm confident that the error is on how I'm using CocoaAsyncSocket. What's the right way to go about reading the rest of the stream after I figure out its size?
** UPDATE **
I changed my client and it seems to be working now. The problem is, I don't understand the point of readDataToLength with the new solution. Here's what I changed my initial read to:
[socket readDataWithTimeout:-1 tag:HEADER_TAG];
Now in my callback, I just do the following:
- (void)onSocket:(AsyncSocket *)sock didReadData:(NSData *)data withTag:(long)tag {
if (tag == HEADER_TAG) {
UInt32 readLength;
[data getBytes:&readLength length:4];
readLength = ntohl(readLength);
int offset = 4;
NSRange range = NSMakeRange(offset, readLength);
char buffer[readLength];
[data getBytes:&buffer range:range];
NSLog(@"buffer %s", buffer);
//[sock readDataToLength:readLength withTimeout:1 tag:MESSAGE_TAG];
} else if (tag == MESSAGE_TAG) {
//[sock readDataToLength:4 withTimeout:1 tag:HEADER_TAG];
}
}
So everything is coming in as one atomic payload. Perhaps this is because of the way Erlang {packet, 4} works. I hope it is. Otherwise, what's the point of readDataToLength? There's no way to know the length of a message in advance on the client, so what is a good use case for that method?
It depends on how you send from the Erlang side, I suppose. The option {packet, 4} sends each data packet with a 4-byte length prefixed to it. Each send operation in Erlang results in one packet being sent with its length prefixed (the maximum packet size with a 4-byte length, for example, is 2 GB). The relevant part of the Erlang documentation is the description of the socket options set with inet:setopts/2.
I'm guessing the data is the total accumulated data read from the socket so far. If that data contains your whole packet, it's fine. But if not, you might want to continue to do a blocked read from the socket using readDataToLength with the remaining data.

socket conversation terminator

When reading data from a socket, it is important either to use a message terminator symbol or to add the packet size at the beginning of the message.
If a terminator symbol is used and a binary message is sent, there is no guarantee that the terminator symbol will not appear in the middle of the message (unless some special encoding is used).
On the other hand, suppose size information is attached. The size is unsigned, so if one byte is used for it, it cannot describe messages longer than 255 bytes. If a 4-byte integer is used, it is not even guaranteed that the 4 bytes will arrive as a whole. Only 2 bytes of the size information may arrive; if the reader assumes the size has arrived, it will use those 2 bytes and the rest of the integer will be discarded. Waiting for 4 bytes to be available in the read buffer may wait forever if only 3 bytes are available (e.g. if the total buffer is 7 bytes or 4077 bytes long).
Here are two possible ways:
sizeInfo separator chunk: read until the separator is found; once found, read until sizeInfo bytes have passed
keep an unreadyBytes counter initialized to 4; upon receiving the sizeInfo, change it accordingly
Which of these two is safer to use? Please criticize.
Edit
My central question is how to make sure that the size bytes have arrived properly, assuming messages are of variable size.
If a 4-byte integer is used, it is not even guaranteed that the 4 bytes will arrive as a whole. Only 2 bytes of the size information may arrive; if the reader assumes the size has arrived, it will use those 2 bytes and the rest of the integer will be discarded. Waiting for 4 bytes to be available in the read buffer may wait forever if only 3 bytes are available (e.g. if the total buffer is 7 bytes or 4077 bytes long).
If you have a 4-byte length descriptor, you should always read at least 4 bytes, because the sender should have written these bytes in every message your server receives. If you can't get them, there may have been a problem in transmission. I really can't understand your problem.
Anyway, I suggest that you not use any separator chunk.
Put a header on the data blocks you are transmitting and use a buffer to reconstruct the packet flow.
You must at least read the header of a packet to determine its length.
You can define a basic structure for a packet:
struct packet{
    uint32_t id;                      // used here as the payload length; needs <stdint.h>
    char payload[MAX_PAYLOAD_SIZE];
};
Then you declare a buffer to hold the data read from the socket:
struct packet buffer;
Then you can read the data from the socket:
int n;
n = read(newsocket, &buffer, sizeof(uint32_t) + MAX_PAYLOAD_SIZE);
read returns the number of bytes read. If you read exactly one packet from the sender, then n equals sizeof(uint32_t) plus the payload length carried in id. Otherwise you may have read less (a partial packet) or more data (e.g. the sender sent you several packets). If you are receiving a stream of data split into units (represented by packet structures), you can use an array of packets to store the complete packets received and a temporary buffer to manage incoming fragments.
Something like:
struct packet packet_buffer[MAX_PACKET_STORED];
char temp_buffer[MAX_PAYLOAD_SIZE + 4];
int n;
n = read(newsocket, temp_buffer, sizeof(temp_buffer));
// suppose we have received a 32-bit length, a 100-byte payload, and then 100 more bytes
// that are a fragment of the next packet.
// then:
uint32_t first_pack_len, second_pack_len;
memcpy(&first_pack_len, temp_buffer, sizeof(uint32_t)); // retrieve packet length (memcpy avoids unaligned access)
memcpy(&packet_buffer[0], temp_buffer, first_pack_len + sizeof(uint32_t)); // store the first packet into the array
int second_pack_data_available_in_buffer = n - (first_pack_len + sizeof(uint32_t)); // total bytes read minus size of the first packet
memcpy(&second_pack_len, &temp_buffer[first_pack_len + sizeof(uint32_t)], sizeof(uint32_t)); // length of the second packet
I hope to have been clear enough. But maybe I'm misunderstanding your question.
Also note that the two communicating end systems may have different endianness, so it is a better idea to use the htonl/ntohl functions on the length value when sending and receiving it. (But this is another issue.)
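On the central question of making sure the size bytes have really arrived, one common approach is to loop on read() until the full count has been received. The sketch below assumes a blocking socket and a 4-byte length in network byte order. read_exact and recv_frame are hypothetical names, and error handling is minimal (EINTR is not retried).

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>      /* read */

/* Hypothetical helper: read exactly len bytes, looping over short reads.
   Returns 0 on success, -1 on error or if the peer closed early. */
int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: receive one frame (4-byte length, then payload).
   Returns the payload length, or -1 on error or if it exceeds cap. */
long recv_frame(int fd, void *payload, size_t cap)
{
    uint32_t header;
    if (read_exact(fd, &header, sizeof header) != 0)
        return -1;
    uint32_t len = ntohl(header);
    if (len > cap)
        return -1;
    if (read_exact(fd, payload, len) != 0)
        return -1;
    return (long)len;
}
```

With this shape, a header that arrives as 2 bytes now and 2 bytes later is handled by the loop, and nothing is discarded. The call blocks only until the sender has actually written the rest.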

unix sockets: how to send really big data with one "send" call?

I'm using a unix socket for data transfer (SOCK_STREAM mode).
I need to send a string of more than 100k chars. First, I send the length of the string, which is sizeof(int) bytes.
length = strlen(s)
send(sd, length, sizeof(int))
Then I send the whole string
bytesSend = send(sd, s, length)
but to my surprise "bytesSend" is less than "length".
Note that this works fine when I send smaller strings.
Maybe there are some limitations of the "send" system call that I've been missing ...
The send system call is supposed to be fast, because the program may have other useful things to do. Certainly you do not want to wait for the data to be sent out and the other computer to send a reply - that would lead to terrible throughput.
So, all send really does is queues some data for sending and returns control to the program. The kernel could copy the entire message into kernel memory, but this would consume a lot of kernel memory (not good).
Instead, the kernel only queues as much of the message as is reasonable. It is the program's responsibility to re-attempt sending of the remaining data.
In your case, use a loop to send the data that did not get sent the first time.
while(length > 0) {
    bytesSent = send(sd, s, length, 0);
    if (bytesSent == 0)
        break; //socket probably closed
    else if (bytesSent < 0)
        break; //handle errors appropriately
    s += bytesSent;
    length -= bytesSent;
}
At the receiving end you will likely need to do the same thing.
Your initial send() call is wrong. You need to pass send() the address of the data, i.e.:
bytesSend = send(sd, &length, sizeof(int), 0);
Also, this runs into some classical risks, with endianness, size of int on various platforms, et cetera.