Decoding delimited frames from byte arrays - sockets

I have frames that are delimited by start and stop bytes (the delimiter bytes do not otherwise appear in the stream).
I read a chunk from disk or a network socket, and I then need to pass each frame to a deserializer, but only after I have de-framed it.
Frames may span multiple chunks; note how frame 3 is split across array 1 and array 2.
Rather than reinvent the wheel for this common problem, do any GitHub or similar projects exist?
I am investigating ReadOnlySequenceSegment<T> from https://www.codemag.com/article/1807051/Introducing-.NET-Core-2.1-Flagship-Types-Span-T-and-Memory-T and will post updates as I work out the requirements.
Update
Following Stephen Cleary's link (thank you!) to https://github.com/davidfowl/TcpEcho/blob/master/src/Server/Program.cs, I have the code below.
My data is JSON, so unlike in the original question the delimiter tokens do appear inside the stream. I therefore have to count the array delimiters and only declare a frame once I have found the outermost [ and ] characters.
The code below works and does fewer manual copies (I am not sure whether copies are still made behind the scenes), and it is much neater using David Fowler's approach.
However, I am converting to an array instead of using buffer.PositionOf((byte)'[') because I could not see how to call PositionOf with an offset applied (i.e. scan deeper into the frame, past previously found delimiter tokens).
Am I butchering the library in a brute-force way, or is the code below good to go with the array conversion?
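using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;
class Program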
{
static async Task Main(string[] args)
{
using var stream = File.Open(args[0], FileMode.Open);
var reader = PipeReader.Create(stream);
while (true)
{
ReadResult result = await reader.ReadAsync();
ReadOnlySequence<byte> buffer = result.Buffer;
while (TryDeframe(ref buffer, out ReadOnlySequence<byte> line))
{
// Process the line.
var str = System.Text.Encoding.UTF8.GetString(line.ToArray());
Console.WriteLine(str);
}
// Tell the PipeReader how much of the buffer has been consumed.
reader.AdvanceTo(buffer.Start, buffer.End);
// Stop reading if there's no more data coming.
if (result.IsCompleted)
{
break;
}
}
// Mark the PipeReader as complete.
await reader.CompleteAsync();
}
private static bool TryDeframe(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> frame)
{
int frameCount = 0;
int start = -1;
int end = -1;
var bytes = buffer.ToArray();
for (var i = 0; i < bytes.Length; i++)
{
var b = bytes[i];
if (b == (byte)'[')
{
if (start == -1)
start = i;
frameCount++;
}
else if (b == (byte)']')
{
frameCount--;
if (frameCount == 0)
{
end = i;
break;
}
}
}
if (start == -1 || end == -1) // no frame found
{
frame = default;
return false;
}
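// Slice takes (start, length), so the frame length is end - start + 1.
frame = buffer.Slice(start, end - start + 1);
// Skip everything up to and including the closing ']'.
buffer = buffer.Slice(end + 1);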
return true;
}
}
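For comparison, the bracket-depth counting itself can be written as a small state machine that carries its state from one chunk to the next, so no chunk ever needs to be rescanned. The following is only a rough sketch in plain C rather than C#, and the names (Deframer, deframer_push) and the fixed 256-byte frame limit are made up for illustration. Like the code above, it assumes '[' and ']' never occur inside JSON string values.

```c
#include <stddef.h>

/* Hypothetical de-framer state; it persists across chunks. */
typedef struct {
    int depth;          /* current '[' nesting depth */
    int done;           /* set when the previous byte completed a frame */
    size_t len;         /* number of bytes stored in frame */
    char frame[256];    /* bytes of the frame being assembled (assumed limit) */
} Deframer;

/* Feed one byte. Returns 1 when the byte completes an outermost [...] frame
 * (available in d->frame[0 .. d->len)), 0 if more data is needed, and
 * -1 if the frame does not fit in the buffer. */
int deframer_push(Deframer *d, char c)
{
    if (d->done) {              /* previous call completed a frame: start fresh */
        d->len = 0;
        d->done = 0;
    }
    if (c == '[')
        d->depth++;
    if (d->depth == 0)
        return 0;               /* byte lies between frames: ignore it */
    if (d->len == sizeof d->frame)
        return -1;              /* frame too large for this sketch */
    d->frame[d->len++] = c;
    if (c == ']' && --d->depth == 0) {
        d->done = 1;            /* outermost bracket closed */
        return 1;
    }
    return 0;
}
```

A caller would zero-initialise one Deframer, loop over each chunk as it arrives, and hand d->frame to the deserializer whenever deframer_push returns 1. A frame split across two chunks is handled naturally because depth and len survive between calls.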

do any github or similar projects exist?
David Fowler has an echo server that uses Pipelines to implement delimited frames.

HAL_UARTEx_RxEventCallback() circular DMA: What address is the data?

I'm using the HAL with an STM32F3xx, implementing UART receive with circular DMA. The data should continually be received into the huart->pRxBuffPtr buffer, overwriting old data as new data arrives, and the HAL_UARTEx_RxEventCallback() function gets called regularly to copy out the data before it gets overwritten. The HAL_UARTEx_RxEventCallback() function receives a size parameter and of course a pointer huart, but no direct indication of where in the huart->pRxBuffPtr the freshly arrived data was DMA'd to.
How do I know whereabouts in huart->pRxBuffPtr the freshly arrived data starts?
Thank you to Tom V for the hint. For future generations, here is the solution code: a function that returns true and stores the next available byte in *pData, or returns false if no new byte has arrived:
bool uart_receiveByte(uint8_t *pData) {
static size_t dmaTail = 0u;
bool isByteReceived = false;
// dmaHead is the next position in the buffer the DMA will write to.
// dmaTail is the next position in the buffer to be read from.
const size_t dmaHead = huart->RxXferSize - huart->hdmarx->Instance->CNDTR;
if (dmaTail != dmaHead) {
isByteReceived = true;
*pData = pRxDmaBuffer[dmaTail];
if (++dmaTail >= huart->RxXferSize) {
dmaTail = 0u;
}
}
return isByteReceived;
}
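To see why the head position is the buffer size minus the remaining-transfer counter, here is a small host-side simulation that runs without any hardware. It is only a sketch: SimRx, sim_dma_write and sim_receive_byte are invented names, and the remaining field merely stands in for the CNDTR register, which counts down from the buffer size and reloads in circular mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 8u  /* illustrative buffer size */

/* Simulated receiver; 'remaining' plays the role of CNDTR. */
typedef struct {
    uint8_t buf[RX_BUF_SIZE];
    size_t remaining;   /* transfers left before the DMA wraps around */
    size_t tail;        /* next position the application will read */
} SimRx;

void sim_init(SimRx *rx)
{
    rx->remaining = RX_BUF_SIZE;
    rx->tail = 0;
}

/* What the DMA does for each received byte. */
void sim_dma_write(SimRx *rx, uint8_t byte)
{
    size_t head = RX_BUF_SIZE - rx->remaining;  /* next write position */
    rx->buf[head] = byte;
    if (--rx->remaining == 0)
        rx->remaining = RX_BUF_SIZE;            /* circular mode reload */
}

/* Same logic as uart_receiveByte above, against the simulated counter. */
bool sim_receive_byte(SimRx *rx, uint8_t *out)
{
    size_t head = RX_BUF_SIZE - rx->remaining;
    if (rx->tail == head)
        return false;                           /* nothing new has arrived */
    *out = rx->buf[rx->tail];
    rx->tail = (rx->tail + 1) % RX_BUF_SIZE;
    return true;
}
```

Note that, as with the real function, the reader cannot detect an overrun: if the DMA writes a full buffer's worth of bytes before the application reads, head equals tail again and the data looks like "nothing received".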

A flaw reported by Flawfinder, but I don't think it makes sense

The question is specific to a pattern that Flawfinder reports:
The snippet
unsigned char child_report;
...
auto readlen = read(pipefd[0], (void *) &child_report, sizeof(child_report));
if(readlen == -1 || readlen != sizeof(child_report)) {
_ret.failure = execute_result::PREIO ; // set some flags to report to the caller
close(pipefd[0]);
return _ret;
}
...
int sec_read = read(pipefd[0], (void *) &child_report, sizeof(child_report));
child_report = 0; // we are not using the read data at all
// we just want to know if the read is successful or not
if (sec_read != 0 && sec_read != -1) { // if success
_ret.failure = execute_result::EXEC; // it means that the child is not able to exec
close(pipefd[0]); // as we set the close-on-exec flag
return _ret; // and we do write after exec in the child
}
It turned out that Codacy (and therefore Flawfinder) reports the following issue on both read calls:
Check buffer boundaries if used in a loop including recursive loops (CWE-120, CWE-20).
I don't understand.
There is no loop.
In the second case we are not using the read data at all
This is not typical C string, and we don't rely on the ending '\0'
Is there any flaw that I'm not aware of in the code?
I finally concluded that this is a false positive. I checked Flawfinder's code, and it appears to be doing basic pattern matching.
https://github.com/david-a-wheeler/flawfinder/blob/293ca17d8212905c7788aca1df7837d4716bd456/flawfinder#L1057

Manatee.Trello Moving Cards

I'm writing a small application to manage Trello Boards in only a few aspects such as sorting Cards on a List, moving/copying Cards based on Due Date and/or Labels, archiving Lists on a regular basis and generating reports based on Labels, etc. As such, I've been putting together a facade around the Manatee.Trello library to simplify the interface for my services.
I've been getting comfortable with the library and things have been relatively smooth. However, I wrote an extension method on the Card class to move Cards within or between Lists, and another method that calls this extension method repeatedly to move all Cards from one List to another.
My issue is that when running the code on a couple of dummy lists with 7 cards in one, it completes without error, but at least one card doesn't actually get moved (though as many as 3 cards have failed to move). I can't tell if this is because I'm moving things too rapidly, or if I need to adjust the TrelloConfiguration.ChangeSubmissionTime, or what. I've tried playing around with delays but it doesn't help.
Here is my calling code:
public void MoveCardsBetweenLists(
string originListName,
string destinationListName,
string originBoardName,
string destinationBoardName = null)
{
var fromBoard = GetBoard(originBoardName); // returns a Manatee.Trello.Board
var toBoard = destinationBoardName == null
|| destinationBoardName.Equals(originBoardName, StringComparison.OrdinalIgnoreCase)
? fromBoard
: GetBoard(destinationBoardName);
var fromList = GetListFromBoard(originListName, fromBoard); // returns a Manatee.Trello.List from the specified Board
var toList = GetListFromBoard(destinationListName, toBoard);
for (int i = 0; i < fromList.Cards.Count(); i++)
{
fromList.Cards[i].Move(1, toList);
}
}
Here is my extension method on Manatee.Trello.Card:
public static void Move(this Card card, int position, List list = null)
{
if (list != null && list != card.List)
{
card.List = list;
}
card.Position = position;
}
I've created a test that replicates the functionality you want. Basically, I create 7 cards on my board, move them to another list, then delete them (just to maintain initial state).
private static void Run(System.Action action)
{
var serializer = new ManateeSerializer();
TrelloConfiguration.Serializer = serializer;
TrelloConfiguration.Deserializer = serializer;
TrelloConfiguration.JsonFactory = new ManateeFactory();
TrelloConfiguration.RestClientProvider = new WebApiClientProvider();
TrelloAuthorization.Default.AppKey = TrelloIds.AppKey;
TrelloAuthorization.Default.UserToken = TrelloIds.UserToken;
action();
TrelloProcessor.Flush();
}
#region http://stackoverflow.com/q/39926431/878701
private static void Move(Card card, int position, List list = null)
{
if (list != null && list != card.List)
{
card.List = list;
}
card.Position = position;
}
[TestMethod]
public void MovingCards()
{
Run(() =>
{
var list = new List(TrelloIds.ListId);
var cards = new List<Card>();
for (int i = 0; i < 10; i++)
{
cards.Add(list.Cards.Add("test card " + i));
}
var otherList = list.Board.Lists.Last();
for(var i = 0; i < cards.Count; i++)
{
Move(cards[i], i, otherList);
}
foreach (var card in cards)
{
card.Delete();
}
});
}
#endregion
Quick question: Are you calling TrelloProcessor.Flush() before your execution ends? If you don't, then some changes will likely remain in the request processor queue when the application ends, so they'll never be sent. See my wiki page on processing requests for more information.
Also, I've noticed that you're using 1 as the position for each move. By doing this, you'll end up with an unreliable ordering. The position data that Trello uses is floating point. To position a card between two other cards, it simply takes the average of the two neighboring cards' positions. In your case (if the destination list is empty), I'd suggest passing the loop index as the position. If the destination list isn't empty, you'll need to calculate a new position based on the other cards in the list (using the same averaging method Trello uses).
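The averaging idea can be illustrated with a few lines of arithmetic. This is just a sketch in C, not Manatee.Trello or Trello API code: position_for_index is a made-up helper, and the values it uses for an empty list, for the front of the list and for the end of the list are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Given the positions of the cards already in the destination list
 * (sorted ascending), compute a position that places a new card at
 * `index`. Only the "average of the two neighbours" rule comes from the
 * answer above; the edge cases are assumed conventions. */
double position_for_index(const double *pos, size_t count, size_t index)
{
    if (count == 0)
        return 1.0;                       /* assumed position in an empty list */
    if (index == 0)
        return pos[0] / 2.0;              /* halfway between 0 and the first card */
    if (index >= count)
        return pos[count - 1] + 1.0;      /* assumed step past the last card */
    return (pos[index - 1] + pos[index]) / 2.0;  /* average of the two neighbours */
}
```

For example, with existing positions 1.0, 2.0 and 4.0, inserting at index 1 gives 1.5, which sorts between the first two cards without touching any other card's position.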
Finally, I like the extension code you have. If you have ideas that you think would be useful to add to the library, please feel free to fork the GitHub repo and create a pull request.

order of execution of forked processes

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/sem.h>
#include<sys/ipc.h>
int sem_id;
void update_file(int number)
{
struct sembuf sem_op;
FILE* file;
printf("Inside Update Process\n");
/* wait on the semaphore until its value is positive, then decrement it. */
sem_op.sem_num = 0;
sem_op.sem_op = -1; /* <-- Amount by which the value of the semaphore is to be decreased */
sem_op.sem_flg = 0;
semop(sem_id, &sem_op, 1);
/* we "locked" the semaphore, and are assured exclusive access to file. */
/* manipulate the file in some way. for example, write a number into it. */
file = fopen("file.txt", "a+");
if (file) {
fprintf(file, " \n%d\n", number);
fclose(file);
}
/* finally, signal the semaphore - increase its value by one. */
sem_op.sem_num = 0;
sem_op.sem_op = 1;
sem_op.sem_flg = 0;
semop( sem_id, &sem_op, 1);
}
void write_file(char* contents)
{
printf("Inside Write Process\n");
struct sembuf sem_op;
sem_op.sem_num = 0;
sem_op.sem_op = -1;
sem_op.sem_flg = 0;
semop( sem_id, &sem_op, 1);
FILE *file = fopen("file.txt","w");
if(file)
{
fprintf(file,contents);
fclose(file);
}
sem_op.sem_num = 0;
sem_op.sem_op = 1;
sem_op.sem_flg = 0;
semop( sem_id, &sem_op, 1);
}
int main()
{
//key_t key = ftok("file.txt",'E');
sem_id = semget( IPC_PRIVATE, 1, 0600 | IPC_CREAT);
/*here 100 is any arbit number to be assigned as the key of the
semaphore,1 is the number of semaphores in the semaphore set, */
if(sem_id == -1)
{
perror("main : semget");
exit(1);
}
int rc = semctl( sem_id, 0, SETVAL, 1);
pid_t u = fork();
if(u == 0)
{
update_file(100);
exit(0);
}
else
{
wait();
}
pid_t w = fork();
if(w == 0)
{
write_file("Hello!!");
exit(0);
}
else
{
wait();
}
}
If I compile and run the above code as C, write_file() is called after update_file().
Whereas if I compile and run the same code as C++, the order of execution is reversed. Why is that?
Just some suggestions, but it looks to me like it could be caused by a combination of things:
The wait() call is supposed to take a pointer argument (that can be NULL). Compiler should have caught this, but you must be picking up another definition somewhere that permits your syntax. You are also missing an include for sys/wait.h. This might be why the compiler isn't complaining as I'd expect it to.
Depending on your machine/OS configuration the fork'd process may not get to run until after the parent yields. Assuming the "wait()" you are calling isn't working the way we would be expecting, it is possible for the parent to execute completely before the children get to run.
Unfortunately, I wasn't able to duplicate the same temporal behavior. However, when I generated assembly files for each of the two cases (C & C++), I noticed that the C++ version is missing the "wait" system call, but the C version is as I would expect. To me, this suggests that somewhere in the C++ headers this special version without an argument is being #defined out of the code. This difference could be the reason behind the behavior you are seeing.
In a nutshell: add #include <sys/wait.h>, and change your wait calls to wait(NULL) (or wait(0)).
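As a rough illustration of the suggested fix, the sketch below forks a child, lets it write a one-byte tag to a file descriptor, and blocks in wait() until that child has exited before returning. The helper name run_child_and_wait and the tag-writing are invented for this example; the point is only that, with the header included and a proper status pointer passed, the second child cannot start before the first has finished.

```c
#include <sys/types.h>
#include <sys/wait.h>   /* declares wait(); this is the header missing from the question */
#include <unistd.h>

/* Fork a child that writes `tag` to fd, then block until it exits.
 * Returns 0 if the child ran and exited cleanly, -1 otherwise. */
int run_child_and_wait(int fd, char tag)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;                      /* fork failed */
    if (pid == 0) {
        /* Child: do the work, then leave without running parent cleanup. */
        ssize_t n = write(fd, &tag, 1);
        _exit(n == 1 ? 0 : 1);
    }
    /* Parent: wait() takes a pointer for the exit status (NULL is allowed). */
    int status;
    if (wait(&status) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_child_and_wait(fd, 'U') followed by run_child_and_wait(fd, 'W') on the write end of a pipe should always leave the bytes in the order U, W, whether the file is compiled as C or as C++.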

Bounded Buffers (Producer Consumer)

In the shared-memory bounded-buffer problem, why can we have at most (n-1) items in the buffer at the same time, where n is the buffer's size?
Thanks!
In an OS development class in college, I had an adjunct teacher that claimed it was impossible to have a software-only solution that could use all N elements in the buffer.
I proved him wrong with something I decided to call the race track solution (inspired by the fact that I like to run track).
On a race track, you are not limited to a 400 meter race; a race can consist of more than one lap. What happens if two runners are neck and neck in a race? How do you know whether they are tied, or whether one runner has lapped the other? The answer is simple: in a race, we don't monitor a runner's position on the track; we monitor the distance each runner has traversed. Thus, when two runners are neck and neck, we can disambiguate between a tie and one runner having lapped the other.
So, our algorithm has an N-element array, and manages a 2N race. We don't restart the producer/consumer's counter back to zero until they finish their respective 2N race.
We don't allow the producer to be more than one lap ahead of the consumer, and we don't allow the consumer to be ahead of the producer.
Actually, we only have to monitor the distance between the producer and consumer.
The code is as follows:
Item track[LAP];
int consIdx = 0;
int prodIdx = 0;
void consumer()
{ while(true)
{ int diff = abs(prodIdx - consIdx);
if(0 < diff) //If the consumer isn't tied
{ track[consIdx%LAP] = null;
consIdx = (consIdx + 1) % (2*LAP);
}
}
}
void producer()
{ while(true)
{ int diff = (prodIdx - consIdx);
if(diff < LAP) //If prod hasn't lapped cons
{ track[prodIdx%LAP] = Item(); //Advance on the 1-lap track.
prodIdx = (prodIdx + 1) % (2*LAP);//Advance in the 2-lap race.
}
}
}
It's been a while since I originally solved the problem, so this is according to my best recollection. Hopefully I didn't overlook any bugs.
Hope this helps!
Oops, here's a bug fix:
Item track[LAP];
int consIdx = 0;
int prodIdx = 0;
void consumer()
{ while(true)
{ int diff = prodIdx - consIdx; //When prodIdx wraps to 0 before consIdx,
diff = 0<=diff? diff: diff + (2*LAP); //think in 3 Laps until consIdx wraps to 0.
if(0 < diff) //If the consumer isn't tied
{ track[consIdx%LAP] = null;
consIdx = (consIdx + 1) % (2*LAP);
}
}
}
void producer()
{ while(true)
{ int diff = prodIdx - consIdx;
diff = 0<=diff? diff: diff + (2*LAP);
if(diff < LAP) //If prod hasn't lapped cons
{ track[prodIdx%LAP] = Item(); //Advance on the 1-lap track.
prodIdx = (prodIdx + 1) % (2*LAP);//Advance in the 2-lap race.
}
}
}
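The same idea can be written as a compilable C sketch. This is my own restatement rather than the original code: Ring, ring_put, ring_get and ring_count are illustrative names, the items are plain ints, and the busy-wait loops are replaced by functions that return false when they cannot proceed. It is deliberately single-threaded; a real producer/consumer pair would additionally need atomic or otherwise synchronized access to the two indices.

```c
#include <stdbool.h>

#define LAP 4   /* buffer capacity N; illustrative value */

typedef struct {
    int track[LAP];
    int prod;   /* producer distance, kept in [0, 2*LAP) */
    int cons;   /* consumer distance, kept in [0, 2*LAP) */
} Ring;

/* Number of items currently stored: the producer's lead over the consumer. */
int ring_count(const Ring *r)
{
    int diff = r->prod - r->cons;
    return diff >= 0 ? diff : diff + 2 * LAP;   /* producer wrapped first */
}

bool ring_put(Ring *r, int item)
{
    if (ring_count(r) == LAP)
        return false;                       /* producer is a full lap ahead */
    r->track[r->prod % LAP] = item;         /* position on the 1-lap track */
    r->prod = (r->prod + 1) % (2 * LAP);    /* distance in the 2-lap race */
    return true;
}

bool ring_get(Ring *r, int *item)
{
    if (ring_count(r) == 0)
        return false;                       /* tied: nothing to consume */
    *item = r->track[r->cons % LAP];
    r->cons = (r->cons + 1) % (2 * LAP);
    return true;
}
```

Because the indices run over two laps, a lead of 0 (empty) and a lead of LAP (full) are different values, so all LAP slots can be in use at once.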
Well, theoretically a bounded buffer can hold as many elements as its size. But what you are describing could be related to implementation details, such as needing a clean way to tell when the buffer is empty versus full. The question "Empty element in array-based bounded buffer" deals with a similar issue; see if it helps.
However you can of course have implementations that have all n slots filled up. That's how the bounded buffer problem is defined anyway.
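To make the empty/full ambiguity concrete, here is a minimal sketch of the common textbook scheme in which both indices wrap at n. The names (Classic, classic_put, and so on) are made up for this example. With only head and tail to go on, head == tail has to mean "empty", so the buffer must be declared full one slot early, which is where the (n-1) limit comes from.

```c
#include <stdbool.h>

#define N 4     /* number of slots; illustrative value */

typedef struct {
    int slots[N];
    int head;   /* next slot to write, in [0, N) */
    int tail;   /* next slot to read, in [0, N) */
} Classic;

bool classic_empty(const Classic *q)
{
    return q->head == q->tail;
}

/* Full is declared when advancing head would make it equal to tail,
 * because head == tail is already reserved to mean "empty". */
bool classic_full(const Classic *q)
{
    return (q->head + 1) % N == q->tail;
}

bool classic_put(Classic *q, int item)
{
    if (classic_full(q))
        return false;
    q->slots[q->head] = item;
    q->head = (q->head + 1) % N;
    return true;
}
```

With N set to 4, three calls to classic_put succeed and the fourth is refused even though one slot is still unused. Keeping a separate item count, or letting the indices run over 2N positions, removes that restriction.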