Cloning / Copying / Duplicating Streams in Lazarus

I developed a procedure that receives a TStream; the base type, so that any TStream descendant can be passed in.
This procedure is intended to create one thread per core, or several threads in general. Each thread will perform a detailed, read-only analysis of the stream data. Since Pascal class instances are passed by reference, never by value, the threads would collide: their read positions on the shared stream would interleave.
To fix this, I want the procedure itself to do all the work of duplicating the given TStream in memory, assigning each copy to a new variable. That way I can duplicate the TStream as many times as needed, so that each thread gets its own unique TStream. When a thread finishes, it frees its copy.
Note: the procedure lives inside a DLL; the threading itself works.
Note 2: the goal is that the procedure does all the necessary work itself, i.e. without any help from the calling code. I could easily pass an array of TStream instead of a single TStream, but I do not want that! The service must be provided entirely by the procedure.
Do you have any idea how to do this?
Thank you.
Addition:
I had a low-level idea, but my knowledge of Pascal is limited:
1. Identify the object's address in memory, and its size.
2. Allocate a new block of memory with the same size as the original object.
3. Copy the entire raw contents of the object to this new address.
4. Create a TStream pointer that points to this new address in memory.
Would this work, or is it stupid? If it can work, how do I do it? An example, please!
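To see why the raw-copy idea falls short, here is a small illustrative sketch (Python rather than Pascal, purely for demonstration; ToyStream is a made-up stand-in for a stream object): a byte-for-byte copy of an object duplicates its plain fields, but any field that is a reference or OS handle still points at the same underlying resource.

```python
import copy

# Toy stand-in for a TStream descendant: the object holds a reference to
# shared underlying storage (think: an OS file handle), plus a position.
class ToyStream:
    def __init__(self, storage):
        self.storage = storage   # shared resource, not the data itself
        self.position = 0

original = ToyStream(bytearray(b"payload"))

# A shallow copy duplicates only the object's fields, including the
# *reference* to the storage. This is what copying the raw object memory
# would produce.
clone = copy.copy(original)

assert clone.storage is original.storage  # same underlying resource!

# Plain fields like the position are duplicated...
clone.position = 5
assert original.position == 0

# ...but the storage is shared, so the two "streams" still interfere.
clone.storage[0:1] = b"X"
assert bytes(original.storage) == b"Xayload"
```

A TFileStream cloned this way would carry the same OS file handle, so the threads' read positions would most likely still collide; a true clone needs a new handle (for example, a new TFileStream opened on the same file name).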
2nd Addition:
Just as an example, suppose the program performs brute-force attacks on encrypted streams (only an illustration; it is not the real use case):
Scenario: a 30 GB file on a CPU with 8 cores.
1. TMemoryStream:
Create 8 TMemoryStreams and copy the entire file into each one. This results in 240 GB of RAM in use simultaneously. I consider this idea broken. Besides, it would increase the processing time to the point where multithreading is no faster: I would have to read the whole file into memory and only then, once loaded, begin to analyze it. Broken!
 * A poor alternative with TMemoryStream is to copy the file gradually into the TMemoryStreams in batches of 100 MB per core (800 MB total), so as not to fill memory. Each thread then analyzes only its 100 MB and frees the memory, until the whole file has been processed. The problem is that this would require Synchronize() inside the DLL, which we know does not work, as described in my open question "Synchronize() in DLL freezes without errors and crashes".
2. TFileStream:
This is worse in my opinion. I receive a TStream, create 8 TFileStreams and copy all 30 GB into each one. That is terrible because it occupies 240 GB on disk, a lot even for an HDD. The read and write time of the copy would make the multithreaded implementation end up slower than a single thread. Broken!
Conclusion: both approaches above require synchronize() to queue each thread's reads from the file, so the threads do not really operate simultaneously, even on a multicore CPU. I know that even with simultaneous access to the file (directly creating several TFileStreams), the operating system would still queue the threads to read the file one at a time, because the HDD is not truly parallel: it cannot read two pieces of data at once. This is a physical limitation of the HDD! However, the OS's queue management is much more effective and reduces this inherent bottleneck efficiently, unlike a manual synchronize() implementation. This justifies my idea of cloning the TStream: it would leave all the file-access queueing to the OS, without any intervention on my part, and I know the OS will do it better than I would.
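The "let the OS manage the queue" idea above can be sketched like this (Python for illustration only; in Pascal each thread would simply construct its own TFileStream over the same file name):

```python
import os
import tempfile
import threading

# Write a small stand-in for the "30 GB file".
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"0123456789" * 1000)   # 10,000 bytes

results = []
results_lock = threading.Lock()

def analyze():
    # Each worker opens its OWN handle on the same file, so each one has
    # an independent read position; the OS schedules the disk access.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(4096)
            if not chunk:
                break
            total += len(chunk)   # stand-in for the real analysis
    with results_lock:
        results.append(total)

threads = [threading.Thread(target=analyze) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 8 workers read the full file independently; no manual queueing.
assert results == [10000] * 8
os.remove(path)
```

Each worker holds an independent handle and position, so no Synchronize() or critical section is needed for the reads themselves.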
Example
In the example above, I want 8 threads to analyze the same stream simultaneously, each in a different way, bearing in mind that the threads do not know what kind of stream they were given: it can be a file stream, a stream from the Internet, or even a small TStringStream. The main program creates only one stream and passes it along with configuration parameters. A simple example:
TModeForceBrute = (M1, M2, M3, M4, M5, ...);
TModesFB = set of TModeForceBrute;
TService = record
  stream: TStream;
  modes: array of TModesFB;
end;
For example, it should be possible to analyze the stream with only M1, only M2, or both [M1, M2]. The composition of the TModesFB set changes the way the stream is analyzed.
Each item in the "modes" array, which works as a task list, will be processed by a different thread. An example task list (JSON-like representation):
{
  Stream: MyTstream,
  modes: [
    [M1, m5],
    [M1],
    [M5, m2],
    [M5, m2, m4, m3],
    [M1, m1, m3]
  ]
}
Note: for the analyzer, [m1] + [m2] <> [m1, m2]; running two modes separately is not the same as running them combined.
In Program:
function analysis(Task: TService; maxCores: integer): TMyResultType; external 'mydll.dll';
In DLL:
// Basic, simple and fast example! May contain syntax or logic errors.
function analysis(Task: TService; maxCores: integer): TMyResultType;
var
  i, processors: integer;
begin
  processors := getCPUCount();
  if (maxCores < processors) and (maxCores > 0) then
    processors := maxCores;
  SetLength(globalThreads, processors);
  for i := 0 to processors - 1 do
    // Obviously the number of modes need not match the processor count.
    if i < Length(Task.modes) then begin
      globalThreads[i] := TAnalysisThread.Create(true, Task.stream, Task.modes[i]);
      globalThreads[i].Start();
    end;
[...]
end;
Note: with a single thread the program works beautifully, with no known errors.
I want each thread to handle one type of analysis, and I cannot use Synchronize() in the DLL. Understand? Is there an adequate, clean solution?
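For reference, the dispatch logic in the DLL sketch above (clamp the worker count to the core count, then give each mode set its own thread) can be condensed as follows; this is an illustrative Python rendering, with analysis, modes_list and max_cores mirroring the Pascal names:

```python
import os
import threading

def analysis(modes_list, max_cores):
    # Clamp the worker count to the CPU count, as in the DLL sketch.
    processors = os.cpu_count() or 1
    if 0 < max_cores < processors:
        processors = max_cores

    results = {}
    lock = threading.Lock()

    def worker(task_index, mode_set):
        # Placeholder for the real per-mode-set analysis of the stream.
        with lock:
            results[task_index] = sorted(mode_set)

    threads = []
    for i, mode_set in enumerate(modes_list):
        if i >= processors:   # more tasks than workers: stop here
            break             # (a real version would queue the rest)
        t = threading.Thread(target=worker, args=(i, mode_set))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results

out = analysis([{"M1", "M5"}, {"M1"}], max_cores=8)
assert 0 in out and out[0] == ["M1", "M5"]
```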

Cloning a stream is code like this:
streamdest := TMemoryStream.Create;
streamsrc.Position := 0;
streamdest.CopyFrom(streamsrc, streamsrc.Size);
streamsrc.Position := 0;
streamdest.Position := 0;
However, doing this across DLL borders is hard, since the DLL has its own copy of the libraries and of library state. This is currently not recommended.

I'm answering my own question, because it seems no one had a really good solution. Perhaps because there is none!
So I adapted the idea of Marco van de Voort and Ken White into a solution that works, using a TMemoryStream loaded in partial batches of 50 MB, with a TRTLCriticalSection for synchronization.
The solution still has the same drawbacks mentioned in the second addition, namely:
queuing access to the HDD is the responsibility of my program and not of the operating system;
a single thread holds the same data twice in memory.
Depending on the processor speed, a thread may analyze its 50 MB of memory quickly while loading the next batch into memory is slow. In that case the multiple threads end up running sequentially, losing the advantage of multithreading, because every thread is congested waiting for file access, as if they were a single thread.
So I consider this a dirty solution. But for now it works!
Below is a simple example; this adaptation may therefore contain obvious logic and/or syntax errors, but it is enough to demonstrate the idea.
Using the same example from the question: instead of passing a stream to "analysis", a method pointer is passed. This method is responsible for reading the stream in synchronized 50 MB batches.
In both DLL and Program:
TLotLoadStream = function(var toStm: TMemoryStream; lot: integer): int64 of object;
TModeForceBrute = (M1, M2, M3, M4, M5, ...);
TModesFB = set of TModeForceBrute;
TService = record
  reader: TLotLoadStream; { changed here <<<<<<< }
  modes: array of TModesFB;
end;
In Program:
type
{ another code here }
TForm1 = class(TForm)
{ another code here }
CS : TRTLCriticalSection;
stream: TFileStream;
function MyReader(var toStm: TMemoryStream; lot: integer): int64;
{ another code here }
end;
function analysis(Task: TService; maxCores: integer): TMyResultType; external 'mydll.dll';
{ another code here }
implementation
{ another code here }
function TForm1.MyReader(var toStm: TMemoryStream; lot: integer): int64;
const
  lotSize = (1024 * 1024) * 50; // 50 MB
var
  ler: int64; // number of bytes to read in this batch
begin
  result := -1;
  {
    MUST BE DONE BEFOREHAND - FOR EXAMPLE IN TForm1.Create():
    InitCriticalSection(self.CS);
  }
  toStm.Clear;
  ler := 0;
  { ENTERING THE CRITICAL SECTION }
  EnterCriticalSection(self.CS);
  { POSITIONING AT THE START OF THE BATCH }
  self.stream.Seek(lot * lotSize, soBeginning);
  if (lot = 0) and (lotSize >= self.stream.Size) then
    ler := self.stream.Size
  else if self.stream.Size >= (lotSize + (lot * lotSize)) then
    ler := lotSize
  else
    ler := self.stream.Size - self.stream.Position; // stream starts at 0
  { COPYING }
  if (ler > 0) then
    toStm.CopyFrom(self.stream, ler);
  { LEAVING THE CRITICAL SECTION }
  LeaveCriticalSection(self.CS);
  result := ler;
end;
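Stripped of the Pascal details, MyReader reduces to "lock, seek to lot * lotSize, read at most lotSize bytes". A minimal sketch of that logic (Python for illustration; the three-way size check collapses here because a short read is returned automatically at end of stream):

```python
import io
import threading

LOT_SIZE = 4   # 50 MB in the real code; tiny here for demonstration

lock = threading.Lock()                 # plays the TRTLCriticalSection role
source = io.BytesIO(b"ABCDEFGHIJ")      # plays the shared source stream

def read_lot(lot):
    """Return the bytes of batch number `lot`, or b'' past end of stream."""
    with lock:                          # Enter/LeaveCriticalSection
        source.seek(lot * LOT_SIZE)     # position at the start of the batch
        return source.read(LOT_SIZE)    # read at most one batch

assert read_lot(0) == b"ABCD"
assert read_lot(2) == b"IJ"    # final, short batch
assert read_lot(3) == b""      # past the end: tells the thread to stop
```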
In DLL:
{ another code here }
// Basic, simple and fast example! May contain syntax or logic errors.
function analysis(Task: TService; maxCores: integer): TMyResultType;
var
  i, processors: integer;
begin
  processors := getCPUCount();
  if (maxCores < processors) and (maxCores > 0) then
    processors := maxCores;
  SetLength(globalThreads, processors);
  for i := 0 to processors - 1 do
    // Obviously the number of modes need not match the processor count.
    if i < Length(Task.modes) then begin
      globalThreads[i] := MyThreadAnalysis.Create(true, Task.reader, Task.modes[i]);
      globalThreads[i].Start();
    end;
{ another code here }
end;
In DLL Thread Class:
type
{ another code here }
MyThreadAnalysis = class(TThread)
{ another code here }
reader: TLotLoadStream;
procedure Execute;
{ another code here }
end;
{ another code here }
implementation
{ another code here }
procedure MyThreadAnalysis.Execute;
var
  Stream: TMemoryStream;
  lot, n: integer;
  { My analyzer is already written around buf; rewriting it would be too much
    work, so there are two reads, two loads into memory, as I already
    mentioned in the question! }
  buf: array[1..$F000] of byte; // 60K
begin
  lot := 0;
  Stream := TMemoryStream.Create;
  self.reader(Stream, lot);
  while assigned(Stream) and (Stream.Size > 0) do begin
    Stream.Seek(0, soBeginning);
    { 2nd load, into the memory buffer buf }
    while Stream.Position < Stream.Size do begin
      n := Stream.read(buf, sizeof(buf));
      { MY CODE HERE }
    end;
    inc(lot);
    self.reader(Stream, lot);
  end;
  Stream.Free;
end;
As you can see, this is a stopgap solution. I still hope to find a clean solution that lets me duplicate the stream handle in such a way that access to the data is managed by the operating system and not by my program.

Related

How to understand the beat in chisel language?

I am studying the design of riscv-boom, which is written in Chisel. When I read its source code, I often cannot tell when the signals in these circuits get their values. Normal programs are executed statement by statement, but hardware languages like Chisel are not.
https://github.com/riscv-boom/riscv-boom/blob/master/src/main/scala/exu/issue-units/issue-slot.scala
For example, the link above is the source code of Issue-slot in Boom.
103: val next_uop = Mux(io.in_uop.valid, io.in_uop.bits, slot_uop)
113: state := io.in_uop.bits.iw_state
126: next_state := state
127: next_uopc := slot_uop.uopc
128: next_lrs1_rtype := slot_uop.lrs1_rtype
129: next_lrs2_rtype := slot_uop.lrs2_rtype
155: slot_uop := io.in_uop.bits
208: for loop
This is some code from the IssueSlot class in the link above. In the Chisel hardware language, these := operators should mean that the signals are connected together by wires. So do these signals change at the same time? For example, when io.in_uop.valid is true, do the assignments above all take effect at the same time?
For example, suppose the current uop is fmul and in_uop is fadd. When io.in_uop.valid is true, the code above all executes at the same time. But then there is a problem:
origin
uop = fmul
when io.in_uop.valid
103: val next_uop = io.in_uop.bits (fadd uop)
113: state := io.in_uop.bits.iw_state (fadd state)
126: next_state := state (fmul state)
127: next_uopc := slot_uop.uopc (fmul uopc)
128: next_lrs1_rtype := slot_uop.lrs1_rtype (fmul lrs1_rtype)
129: next_lrs2_rtype := slot_uop.lrs2_rtype (fmul lrs2_rtype)
155: slot_uop := io.in_uop.bits (fadd uop)
208: for loop
When io.in_uop.valid is true, the issue slot takes in the fadd information. At the same time, the original fmul information is still driven onto the next_* signals. That seems unreasonable. Where does my understanding go wrong?
As for the for loop at line 207: I still find it hard to understand. Is the for loop executed within one clock cycle? For example, if I use a for loop to traverse a queue, when does the loop finish?
If anyone is willing to answer me, I would be extremely grateful!
first
The a := b expression means that b is connected to a, yes. b is the source and a is the sink.
If a new connection to a is made later in the code, the earlier one is replaced (last connect wins).
In the following code, a is connected to c:
a := b
a := c
This may look strange, but it is useful for setting a default value that is then overridden in a conditional branch.
For example:
a := b
when(z === true.B) {
  a := c
}
By default, a is connected to b, except when z is true.
second
Do not forget that Chisel is an HDL generator. It generates hardware; some keywords are pure Scala (if, for, ...) and others are Chisel hardware constructs (when, Mux, ...).
So the for loop at line 208 is executed at elaboration (code-generation) time, and what it emits is hardware: Chisel when/Mux logic.
I'd highly recommend you spend some time with the Chisel Bootcamp. It can really help you grasp the generator aspects of Chisel.

FireDAC Query RecordCountMode

I am trying to configure a FireDAC TFDQuery component so it fetches records on demand in batches of no more than 500, but I need it to report back what is the total record count for the query not just the number of fetched records. The FetchOptions are configured as follows:
FetchOptions.AssignedValues = [evMode, evRowsetSize, evRecordCountMode, evCursorKind, evAutoFetchAll]
FetchOptions.CursorKind = ckForwardOnly
FetchOptions.AutoFetchAll = afTruncate
FetchOptions.RecordCountMode = cmTotal
FetchOptions.RowSetSize = 500
This immediately returns all records in the table, not just 500. I have tried setting RecsMax to 500, which does limit the fetched records, but then RecordCount for the query shows only 500, not the total.
The FireDAC help file states that setting RecordCountMode to 'cmTotal' causes FireDAC to issue
SELECT COUNT(*) FROM (original SQL command text).
Either there is a bug or I am doing something wrong!
I cannot see what other properties I could change. I am confused about the relationship between RowSetSize and RecsMax, and didn't find that the help file clarified it.
I have tried playing with AutoFetchAll (again, I am confused about this property's purpose): noting that it was set to afAll, I set it to afTruncate to see if that would make a difference, but it didn't.
I have tested FetchOptions' fmOnDemand mode with an FDTable component and an FDQuery component, both with identical FetchOptions settings (i.e. RowSetSize = 50), against a dataset of 425,000 rows fetched over a network server.
FDTable performs as expected. It loads just 50 tuples, and does so almost instantly. When pressing Ctrl+End to get to the end of the DBGrid display, it loads just 100 tuples, and while scrolling it rarely loads more than 100 tuples. The impact on memory is negligible. But scrolling is slow.
FDQuery loads 50 tuples, but takes around 35 seconds to do so and consumes over 0.5 GB of memory in the process. If you press Ctrl+End to move to the end of the connected DBGrid, it does so virtually instantly, but in the process it loads the entire table and consumes a further 700 MB of memory.
I also experimented with CachedUpdates. The results above were with CachedUpdates off. With it on, there was no impact at all on the performance of FDQuery (still poor), but FDTable then loaded the entire table at start-up, taking over half a minute to do so and consuming 1.2 GB of memory.
It looks like fmOnDemand mode is only practically usable with FDTable with CachedUpdates off, and is not suitable for use with FDQuery at all.
The results of my tests using fmOnDemand with PostgreSQL and MySQL are basically the same. With FDTable, fmOnDemand only downloads what it needs, limited by RowSetSize. With a RowSetSize of 50 it initially downloads 50 tuples, and no matter where you scroll to it never downloads more than 111 tuples (though doubtless that depends on the size of the connected DBGrid). If you disconnect the FDTable from its data source, it initially downloads 50 tuples, and if you then navigate to any record in the underlying table it downloads that one tuple only and discards all other data.
FDQuery in fmOnDemand downloads only the initial 50 tuples when opened, but if you navigate by RecNo it downloads every tuple in between. I had rather hoped it would use LIMIT and OFFSET clauses to fetch only the records actually being requested.
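The behaviour hoped for here (get the total via COUNT(*), but pull rows in windows) can be sketched with plain SQL paging; this is an illustrative Python/sqlite3 sketch of that idea, not something FireDAC generates itself, as the tests above show:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)",
                 [("row%d" % i,) for i in range(1200)])

base_sql = "SELECT * FROM t"
ROWSET_SIZE = 500

# Total count without fetching the rows (what cmTotal is documented to do):
total = conn.execute("SELECT COUNT(*) FROM (%s)" % base_sql).fetchone()[0]
assert total == 1200

# Fetch one window on demand (the LIMIT/OFFSET behaviour hoped for above):
def fetch_batch(batch_no):
    return conn.execute("%s LIMIT ? OFFSET ?" % base_sql,
                        (ROWSET_SIZE, batch_no * ROWSET_SIZE)).fetchall()

assert len(fetch_batch(0)) == 500
assert len(fetch_batch(2)) == 200   # last, partial window
```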
To recreate the test for PostgreSQL you need the following FireDAC components:
object FDConnectionPG: TFDConnection
  Params.Strings = (
    'Password='
    'Server='
    'Port='
    'DriverID=PG')
  ResourceOptions.AssignedValues = [rvAutoReconnect]
  ResourceOptions.AutoReconnect = True
end
object FDQueryPG: TFDQuery
  Connection = FDConnectionPG
  FetchOptions.AssignedValues = [evMode, evRowsetSize]
end
object FDTable1: TFDTable
  CachedUpdates = True
  Connection = FDConnectionPG
  FetchOptions.AssignedValues = [evMode, evRowsetSize, evRecordCountMode]
  FetchOptions.RecordCountMode = cmFetched
end
If you wish to recreate it with MySQL, you will need basically the same FireDAC components, but the FDConnection needs to be set up as follows:
object FDConnectionMySql: TFDConnection
  Params.Strings = (
    'DriverID=MySQL'
    'ResultMode=Use')
  ResourceOptions.AssignedValues = [rvAutoReconnect]
  ResourceOptions.AutoReconnect = True
end
You'll need an edit box, two buttons, a checkbox, a timer and a label and the following code:
procedure TfrmMain.Button1Click(Sender: TObject);
begin
  if not FDQueryPG.IsEmpty then
  begin
    FDQueryPG.EmptyDataSet;
    FDQueryPG.ClearDetails;
    FDQueryPG.Close;
  end;
  if not FDTable1.IsEmpty then
  begin
    FDTable1.EmptyDataSet;
    FDTable1.ClearDetails;
    FDTable1.Close;
  end;
  lFetched.Caption := 'Fetched 0';
  lFetched.Update;
  if cbTable.Checked then
  begin
    FDTable1.TableName := '[TABLENAME]';
    FDTable1.Open();
    lFetched.Caption := 'Fetched ' + FDTable1.Table.Rows.Count.ToString;
  end
  else
  begin
    FDQueryPG.SQL.Text := 'Select * from [TABLENAME]';
    FDQueryPG.Open;
    lFetched.Caption := 'Fetched ' + FDQueryPG.Table.Rows.Count.ToString;
  end;
  Timer1.Enabled := True;
end;
procedure TfrmMain.Button2Click(Sender: TObject);
begin
  if cbTable.Checked then
    FDTable1.RecNo := StrToInt(Edit1.Text)
  else
    FDQueryPG.RecNo := StrToInt(Edit1.Text);
end;
procedure TfrmMain.cbTableClick(Sender: TObject);
begin
  Timer1.Enabled := False;
end;
procedure TfrmMain.Timer1Timer(Sender: TObject);
begin
  if cbTable.Checked then
    lFetched.Caption := 'Fetched ' + FDTable1.Table.Rows.Count.ToString
  else
    lFetched.Caption := 'Fetched ' + FDQueryPG.Table.Rows.Count.ToString;
  lFetched.Update;
end;

How to modify bit bash sequence for write delays and read delays of DUT?

I have a DUT where writes take 2 clock cycles and reads take 2 clock cycles before they can actually happen. I use a register model and tried the built-in sequence uvm_reg_bit_bash_seq, but the writes and reads happen with only 1 clock cycle of delay. Could anyone tell me an effective way to model the 2-clock-cycle delays and verify this, so that the DUT latencies are taken care of?
I am facing the following error now:
Writing a 1 in bit #0 of register "ral_pgm.DIFF_STAT_CORE1" with initial value 'h0000000000000000 yielded 'h0000000000000000 instead of 'h0000000000000001
I found one way of doing it: I took the existing uvm_reg_single_bit_bash_seq and modified it by declaring a p_sequencer and adding 2-clock-cycle delays after the write and read method calls, matching the DUT latency. This fixed the issue. I also added a get() call after the read method to avoid fetching a stale value after the read operation.
...
`uvm_declare_p_sequencer(s_master_sequencer)
rg.write(status, val, UVM_FRONTDOOR, map, this);
wait_for_clock(2); // write latency
rg.read(status, val, UVM_FRONTDOOR, map, this);
wait_for_clock(2); // read latency
if (status != UVM_IS_OK) begin
  `uvm_error("mtm_reg_bit_bash_seq",
             $sformatf("Status was %s when reading register \"%s\" through map \"%s\".",
                       status.name(), rg.get_full_name(), map.get_full_name()))
end
val = rg.get(); // newly added get() call to fetch the value after the read
...
task wait_for_clock(int m = 1);
  repeat (m) begin
    @(posedge p_sequencer.vif.CLK);
  end
endtask: wait_for_clock
...

Using burst_read/write with register model

I've a register space of 16 registers.
These are accessible through serial bus (single as well as burst).
I've UVM reg model defined for these registers.
However none of the reg model method supports burst transaction on bus.
As a workaround:
1. I can declare a memory model for the same space and use it whenever I need burst access, but it seems redundant to declare 2 separate classes for the same thing, and this approach won't mirror the register values correctly.
2. I can create a function which loops for the number of bytes and accesses the registers one by one; however, this doesn't create a burst transaction on the bus.
So I would like to know if there is a way to use burst_read and burst_write methods with the register model. It would be nice if burst_read and burst_write supported mirroring (the current implementation doesn't), but if not I can use .predict and .set, so that is not a big concern.
Or can I easily implement a method on the register model to support burst operations?
I found this to help get you started:
http://forums.accellera.org/topic/716-uvm-register-model-burst-access/
The guy mentions using the optional 'extension' argument that read/write take. You could store the burst length inside a container object (think int vs. Integer in Java) and then pass that as an argument when calling write() on the first register.
A rough sketch (not tested):
// inside your register sequence
uvm_queue #(int) container = new("container");
container.push_front(4);
start_reg.write(status, data, .extension(container));

// inside your adapter
function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
  int burst_len = 1;
  uvm_reg_item reg_item = get_item();
  uvm_queue #(int) extension;
  if ($cast(extension, reg_item.extension))
    burst_len = extension.pop_front();
  // do the stuff here based on the burst length
  // ...
endfunction
I've used uvm_queue because there isn't any trivial container object in UVM.
After combining opinions provided by Tudor and links in the discussion, here is what works for adding burst operation to reg model.
This implementation doesn't show all the code but only required part for adding burst operation, I've tested it for write and read operation with serial protocols (SPI / I2C). Register model values are updated correctly as well as RTL registers are updated.
Create a class to hold data and burst length:
class burst_class extends uvm_object;
  `uvm_object_utils(....)
  int burst_length;
  byte data[$];
  function new(string name);
    super.new(name);
  endfunction
endclass
Inside the register sequence (for a read, don't initialize the data):
burst_class obj;
obj = new ("burstInfo");
obj.burst_length = 4; // replace with actual length
obj.data.push_back (data1);
obj.data.push_back (data2);
obj.data.push_back (data3);
obj.data.push_back (data4);
start_reg.read (status,...., .extension(obj));
start_reg.write (status, ...., .extension (obj));
After a successful operation, the data values are written from (or, for a read, collected into) the obj object.
In the adapter class (reg2bus is updated for writes, bus2reg for reads):
All the information about the transaction is available in reg2bus, except the data in the case of a read.
adapter class
uvm_reg_item start_reg;
int burst_length;
burst_class adapter_obj;
reg2bus implementation
start_reg = this.get_item();
if ($cast(adapter_obj, start_reg.extension) && (adapter_obj != null))
  burst_length = adapter_obj.burst_length;
else
  burst_length = 1; // so that the current adapter implementation still works
Update the size of the transaction here according to burst_length, and assign the data correctly.
For reads, bus2reg needs to be updated as well.
bus2reg implementation: it already has all the control information, since reg2bus is always executed before bus2reg, so use the values captured in reg2bus.
Assign data according to burst_length to the object passed through the extension (adapter_obj in this case).

High CPU and Memory Consumption on using boost::asio async_read_some

I have made a server that reads data from a client, using boost::asio async_read_some for the reading. I have one handler function, and _ioService->poll() runs the event-processing loop to execute ready handlers. In the handler _handleAsyncReceive I deallocate the buf that was allocated in receiveDataAsync. bufferSize is 500.
Code is as follows:
bool
TCPSocket::receiveDataAsync( unsigned int bufferSize )
{
    char *buf = new char[bufferSize + 1];
    try
    {
        _tcpSocket->async_read_some( boost::asio::buffer( (void*)buf, bufferSize ),
                                     boost::bind( &TCPSocket::_handleAsyncReceive,
                                                  this,
                                                  buf,
                                                  boost::asio::placeholders::error,
                                                  boost::asio::placeholders::bytes_transferred ) );
        _ioService->poll();
    }
    catch (std::exception& e)
    {
        LOG_ERROR("Error Receiving Data Asynchronously");
        LOG_ERROR( e.what() );
        delete [] buf;
        return false;
    }
    // we don't delete buf here, as it will be deleted by the callback _handleAsyncReceive
    return true;
}
void
TCPSocket::_handleAsyncReceive(char *buf, const boost::system::error_code& ec, size_t size)
{
    if(ec)
    {
        LOG_ERROR ("Error occurred while receiving data asynchronously.");
        LOG_ERROR ( ec.message() );
    }
    else if ( size > 0 )
    {
        buf[size] = '\0';
        LOG_DEBUG("Deleting Buffer");
        emit _asyncDataReceivedSignal( QString::fromLocal8Bit( buf ) );
    }
    delete [] buf;
}
The problem is that the buffer is allocated at a much faster rate than it is deallocated, so memory usage grows at an exponential rate; at some point it consumes all the memory and the system becomes stuck. CPU usage also sits around 90%. How can I reduce the memory and CPU consumption?
You have a memory leak. io_service::poll() does not guarantee that it will dispatch your _handleAsyncReceive. It can dispatch another event (e.g. an accept), and then the memory at char *buf is lost. My guess is that you are calling receiveDataAsync from a loop, but that isn't even necessary: the leak exists in any case (just at a different speed).
It's better to follow the asio examples and work with the suggested patterns rather than inventing your own.
You might consider using a wrap-around buffer, also called a circular buffer. Boost has a circular buffer template available; you can read about it here. The idea behind it is that when it becomes full, it circles around to the beginning and stores things there. You can do the same thing with other structures or arrays as well. For example, I currently use a byte array for this purpose in my application.
The advantage of using a dedicated large circular buffer to hold your messages is that you don't have to worry about creating and deleting memory for each new message that comes in. This avoids fragmentation of memory, which could become a problem.
To determine an appropriate size for the circular buffer, think about the maximum number of messages that can be in some stage of processing simultaneously; multiply that number by the average message size, and then by a fudge factor of perhaps 1.5. The average message size for my application is under 100 bytes. My buffer size is 1 megabyte, which allows at least 10,000 messages to accumulate without affecting the wrap-around buffer. But if more than 10,000 messages did accumulate without being completely processed, the circular buffer would be unusable and the program would have to be restarted. I have been thinking about reducing the size of the buffer, because the system would probably be dead long before it hit the 10,000-message mark.
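A minimal sketch of such a circular buffer (Python for illustration; in C++ this role could be played by boost::circular_buffer or a raw byte array, as described above):

```python
class RingBuffer:
    """Minimal fixed-size circular byte buffer (illustrative sketch)."""
    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._cap = capacity
        self._read = 0    # next index to read from
        self._size = 0    # bytes currently stored

    def write(self, data):
        if len(data) > self._cap - self._size:
            raise OverflowError("buffer full; consumer has fallen behind")
        for b in data:
            # wrap past the end of the underlying array
            self._buf[(self._read + self._size) % self._cap] = b
            self._size += 1

    def read(self, n):
        n = min(n, self._size)
        out = bytearray()
        for _ in range(n):
            out.append(self._buf[self._read])
            self._read = (self._read + 1) % self._cap
            self._size -= 1
        return bytes(out)

rb = RingBuffer(8)
rb.write(b"abcdef")
assert rb.read(4) == b"abcd"     # frees space at the front
rb.write(b"ghij")                # wraps around past the end
assert rb.read(6) == b"efghij"
```

With the sizing rule above (average message size times maximum backlog, times a fudge factor), hitting the overflow condition means the consumer has fallen irrecoverably behind, which is why the program would have to be restarted in that case.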
As PSIAlt suggests, consider following the Boost.Asio examples and building upon their patterns for asynchronous programming.
Nevertheless, consider whether multiple read calls need to be queued on the same socket at all. If the application only allows a single read operation to be pending on the socket, then resource usage drops:
There is no longer a scenario where an excessive number of handlers are pending in the io_service.
A single buffer can be preallocated and reused for each read operation. For example, the following asynchronous call chain requires only a single buffer, and allows the next asynchronous read operation to start while the previous data is being emitted in the Qt signal, since QString performs deep copies.
TCPSocket::start()
{
receiveDataAsync(...) --.
} |
.---------------'
| .-----------------------------------.
v v |
TCPSocket::receiveDataAsync(...) |
{ |
_tcpSocket->async_read_some(_buffer); --. |
} | |
.-------------------------------' |
v |
TCPSocket::_handleAsyncReceive(...) |
{ |
QString data = QString::fromLocal8Bit(_buffer); |
receiveDataAsync(...); --------------------------'
emit _asyncDataReceivedSignal(data);
}
...
tcp_socket.start();
io_service.run();
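The call chain above can be simulated with a toy handler queue standing in for the io_service (Python sketch; FakeSocket, its async_read_some and the queue are illustrative stand-ins, not real Boost.Asio APIs):

```python
import io
from collections import deque

pending = deque()   # stand-in for the io_service's handler queue

class FakeSocket:
    """Illustrative stand-in for the TCP socket; not a real asio API."""
    def __init__(self, data):
        self._stream = io.BytesIO(data)
    def async_read_some(self, buf, handler):
        # Queue the read; a real io_service would run it asynchronously.
        def op():
            chunk = self._stream.read(len(buf))
            buf[:len(chunk)] = chunk
            handler(len(chunk))
        pending.append(op)

received = []
buf = bytearray(4)            # ONE preallocated buffer, reused for every read
sock = FakeSocket(b"hello world")

def handle_receive(n):
    if n == 0:
        return                # no more data: the chain ends
    data = bytes(buf[:n])     # deep copy, like QString::fromLocal8Bit
    sock.async_read_some(buf, handle_receive)   # start the next read
    received.append(data)     # then process the copied data

sock.async_read_some(buf, handle_receive)
while pending:                # io_service.run()
    pending.popleft()()

assert b"".join(received) == b"hello world"
```

Because the next read is only queued before the received data is processed, a single handler is pending at any time, and the one buffer is never overwritten while its previous contents are still needed (the deep copy guarantees that).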
It is important to identify when and where the io_service's event loop will be serviced. Generally, applications are designed so that the io_service does not run out of work, and the processing thread simply waits for events to occur. Thus, it is fairly common to set up the asynchronous chains first and then process the io_service event loop at a much higher scope.
On the other hand, if it is determined that TCPSocket::receiveDataAsync() really should process the event loop in a blocking manner, then consider using synchronous operations instead.