Postgres 13.6: my background worker causes high CPU with the warning "worker took too long to start; canceled"

I am using PostgreSQL and Go in my project.
I have created a PostgreSQL background worker in Go. This background worker listens on a LISTEN/NOTIFY channel in Postgres and writes the data it receives to a file.
The trigger (which supplies data to the channel), the background worker registration, and the background worker's main function are all in one .so file. The core background-process logic is in another helper .so file. The background worker's main function loads the helper .so using dlopen and executes the core logic from it.
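The file-writing step is essentially the following (a simplified, hypothetical sketch rather than the exact worker code; the payload arrives as a *pq.Notification from lib/pq):
// Simplified sketch with hypothetical names; uses "fmt", "os" and "github.com/lib/pq".
// Appends one NOTIFY payload (delivered as *pq.Notification) to the output file.
func writeNotificationToFile(path string, n *pq.Notification) error {
    f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        return err
    }
    defer f.Close()
    // n.Channel is the LISTEN channel name, n.Extra carries the payload.
    _, err = fmt.Fprintf(f, "%s: %s\n", n.Channel, n.Extra)
    return err
}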
Issue faced:
It works fine on Windows, but on Linux (tested with Postgres 13.6) there is a problem.
It runs as intended for a while, but after about 2-3 hours the PostgreSQL postmaster's CPU utilization shoots up and the server seems to get stuck. The spike is very sudden: CPU usage is normal until it starts climbing, and the climb itself takes less than 5 minutes.
I am then unable to establish a connection to Postgres from the psql client.
In the log the following message keeps repeating:
WARNING: worker took too long to start; canceled.
I tried commenting out various areas of my code, adding sleeps at different places in the core processing loop, and even disabling the trigger, but the issue still occurs.
Software and libraries used:
go version go1.17.5 linux/amd64
Postgres 13.6 on Ubuntu
lib/pq: https://github.com/lib/pq
My core processing loop looks like this:
maxReconn := time.Minute
listener := pq.NewListener(<connectionstring>, minReconn, maxReconn, EventCallBackFn) // libpq used here.
defer listener.UnlistenAll()
if err = listener.Listen("mystream"); err != nil {
    panic(err)
}
var itemsProcessedSinceLastSleep int = 0
for {
    select {
    case signal := <-signalChan:
        PgLog(PG_LOG, "Exiting loop due to termination signal : %d", signal)
        return 1
    case pgstatus := <-pmStatusChan:
        PgLog(PG_LOG, "Exiting loop as postmaster is not running : %d ", pgstatus)
        return 1
    case data := <-listener.Notify:
        itemsProcessedSinceLastSleep = itemsProcessedSinceLastSleep + 1
        if itemsProcessedSinceLastSleep >= 1000 {
            time.Sleep(time.Millisecond * 10)
            itemsProcessedSinceLastSleep = 0
        }
        ProcessChangeReceivedAtStream(data) // This performs the data processing
    case <-time.After(10 * time.Second):
        time.Sleep(100 * time.Millisecond)
        var cEpoch = time.Now().Unix()
        if cEpoch-lastConnChkTime > 1800 {
            lastConnChkTime = cEpoch
            if err := listener.Ping(); err != nil {
                PgLog(PG_LOG, "Seems to be a problem with connection")
            }
        }
    default:
        time.Sleep(time.Millisecond * 100)
    }
}

Related

Fast insertion in a loop with MongoDB makes the application not responding

I am using Qt Creator 5 and MongoDB (mongocxx).
The code below is placed inside a loop; the loop runs whenever data is available on the socket. The data is read from the socket quickly, but when I put the MongoDB insertion code there, the built application stops responding. The data still reaches MongoDB without any problem.
The data can be seen in the Qt Creator editor; only the built application becomes unresponsive (not responding).
How can I fix it?
code part:
in header file:
mongocxx::client& conn1 = get_client();
mongocxx::database db;
db = conn1["ASSET_TRACKING"];
in Loop:
if (k != -1)
{
    // qDebug() << "value found at " << k;
    qWarning("Got db conn");
    switch (k)
    {
    case 0:
    {
        std::vector<bsoncxx::document::value> documents;
        documents.push_back(
            bsoncxx::builder::stream::document{} << "tag_ID" << e.toStdString() << "POSX" << posx << "POSY" << posy << "POSZ" << posz << finalize);
        db["Tagdemo"].insert_many(documents);

Play sounds synchronously using snd_pcm_writei

I need to play sounds upon certain events, and want to minimize
processor load, because some image processing is being done too, and
processor performance is limited.
For the present, I play only one sound at a time, and I do it as
follows:
At program startup, sounds are read from .wav files
and the raw pcm data are loaded into memory
a sound device is opened (snd_pcm_open() in mode SND_PCM_NONBLOCK)
a worker thread is started which continuously calls snd_pcm_writei()
as long as it is fed with data (data->remaining > 0).
Somewhat condensed, the worker thread function is:
static void *Thread_Func (void *arg)
{
    thrdata_t *data = (thrdata_t *)arg;
    snd_pcm_sframes_t res;
    while (1)
    {
        pthread_mutex_lock (&lock);
        if (data->shall_stop)
        {
            data->shall_stop = false;
            snd_pcm_drop (data->pcm_device);
            snd_pcm_prepare (data->pcm_device);
            data->remaining = 0;
        }
        if (data->remaining > 0)
        {
            res = snd_pcm_writei (data->pcm_device, data->bufptr, data->remaining);
            if (res == -EAGAIN)
            {
                pthread_mutex_unlock (&lock); // release the lock before retrying
                continue;
            }
            if (res < 0) // error
            {
                fprintf (stderr, "snd_pcm_writei() error: %s\n", snd_strerror (res));
                snd_pcm_recover (data->pcm_device, res, 0);
            }
            else // another chunk has been handed over to sound hw
            {
                data->bufptr += res * bytes_per_frame;
                data->remaining -= res;
            }
            if (data->remaining == 0) snd_pcm_prepare (data->pcm_device);
        }
        pthread_mutex_unlock (&lock);
        usleep (sleep_us); // processor relief
    }
} // Thread_Func
OK, so this works well for one sound at a time. How do I play several at once?
I found dmix, but it seems to be a user-level tool for mixing streams coming from separate programs.
Furthermore, I found the Simple Mixer Interface in the ALSA Project C Library Interface, but without any hint, example, or tutorial on how to use all these functions, each of which is described by a single line of text.
As a last resort I could calculate the mean value of all the buffers to be played simultaneously. So far I've been avoiding that, hoping that an ALSA solution might use sound hardware resources and thus relieve the main processor.
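To illustrate that fallback, the mixing would amount to roughly the following (a rough sketch in Go rather than C, just to show the per-sample summing and clamping; names are made up):
// mix sums several int16 PCM buffers sample by sample, clamping to the int16
// range to avoid wrap-around. Dividing by len(bufs) instead of clamping gives
// the plain mean, at the cost of lower volume.
func mix(bufs ...[]int16) []int16 {
    if len(bufs) == 0 {
        return nil
    }
    n := len(bufs[0])
    for _, b := range bufs {
        if len(b) < n {
            n = len(b) // mix only up to the shortest buffer
        }
    }
    out := make([]int16, n)
    for i := 0; i < n; i++ {
        sum := 0
        for _, b := range bufs {
            sum += int(b[i])
        }
        if sum > 32767 {
            sum = 32767
        } else if sum < -32768 {
            sum = -32768
        }
        out[i] = int16(sum)
    }
    return out
}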
I'd be thankful for any hint about how to continue.

Workflow execution fails when a worker is restarted on the same workflow service client

We're in the process of writing a .NET Cadence client and ran into an issue while unit testing workflows. When we start a worker, execute a workflow, stop the worker, start it again, and then try to execute another workflow, the first workflow completes, but every workflow after the first hangs during the client.ExecuteWorkflow() call and eventually fails with a START_TO_CLOSE timeout. I replicated this behavior by munging the greetings cadence-samples workflow; see the loop in func main():
package main

import (
    "context"
    "time"

    "go.uber.org/cadence/client"
    "go.uber.org/cadence/worker"
    "go.uber.org/zap"

    "github.com/pborman/uuid"
    "github.com/samarabbas/cadence-samples/cmd/samples/common"
)

// This needs to be done as part of a bootstrap step when the process starts.
// The workers are supposed to be long running.
func startWorkers(h *common.SampleHelper) worker.Worker {
    // Configure worker options.
    workerOptions := worker.Options{
        MetricsScope: h.Scope,
        Logger:       h.Logger,
    }
    return h.StartWorkers(h.Config.DomainName, ApplicationName, workerOptions)
}

func startWorkflow(h *common.SampleHelper) client.WorkflowRun {
    workflowOptions := client.StartWorkflowOptions{
        ID:                              "greetings_" + uuid.New(),
        TaskList:                        ApplicationName,
        ExecutionStartToCloseTimeout:    time.Minute,
        DecisionTaskStartToCloseTimeout: time.Minute,
    }
    return h.StartWorkflow(workflowOptions, SampleGreetingsWorkflow)
}

func main() {
    // setup the SampleHelper
    var h common.SampleHelper
    h.SetupServiceConfig()

    // Loop:
    // - start a worker
    // - start a workflow
    // - block and wait for workflow result
    // - stop the worker
    for i := 0; i < 3; i++ {
        // start the worker
        // execute the workflow
        workflowWorker := startWorkers(&h)
        workflowRun := startWorkflow(&h)

        // create context
        // get workflow result
        var result string
        ctx, cancel := context.WithCancel(context.Background())
        err := workflowRun.Get(ctx, &result)
        if err != nil {
            panic(err)
        }

        // log the result
        h.Logger.Info("Workflow Completed", zap.String("Result", result))

        // stop the worker
        // cancel the context
        workflowWorker.Stop()
        cancel()
    }
}
This is not a blocking issue and will probably not come up in production.
Background:
We (Jeff Lill and I) noticed this issue during unit testing workflows in our .NET Cadence client. When we run our workflow tests individually they all pass, but when we run multiple at a time (sequentially, not in parallel), we see the behavior described above. This is because of the cleanup done in the .NET Cadence client dispose() method called after a test completes (pass or fail). One of the dispose behaviors is to stop workers created during a test. When the next test runs, new workers are created using the same workflow service client, and this is where the issue arises.

Errors when many clients connect to Go server

The full code can be downloaded at https://groups.google.com/forum/#!topic/golang-nuts/e1Ir__Dq_gE
Could anyone help me improve this sample code so that it is bug-free?
I think it will help us develop bug-free client/server code.
My development steps:
Create a server which can handle multiple connections, one goroutine per connection (a minimal sketch follows this list).
Build a client which works fine with a simple protocol.
Expand the client to simulate multiple clients (with option -n=1000 clients as default).
TODO: try to reduce locking in the server.
TODO: try to use bufio to improve throughput.
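For reference, the server in step 1 is essentially a goroutine-per-connection accept loop, roughly like this sketch (hypothetical echo handler; the real protocol code is in the linked thread):
package main

import (
    "bufio"
    "log"
    "net"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:10000")
    if err != nil {
        log.Fatalln("Err:Listen():", err)
    }
    defer ln.Close()
    for {
        conn, err := ln.Accept()
        if err != nil {
            log.Println("Err:Accept():", err)
            continue
        }
        go handle(conn) // one goroutine per connection
    }
}

// handle echoes newline-terminated messages back to the client.
func handle(conn net.Conn) {
    defer conn.Close()
    r := bufio.NewReader(conn)
    for {
        line, err := r.ReadBytes('\n')
        if err != nil {
            return // EOF or network error: drop this connection
        }
        if _, err := conn.Write(line); err != nil {
            return
        }
    }
}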
I found this code to be very unstable, with three problems:
Launching 1000 clients, one of them gets an EOF when reading from the server.
Launching 1050 clients, "too many open files" occurs soon (no clients get opened at all).
Launching 1020 clients, I get a runtime error with a long stack trace:
Start pollServer: pipe: too many open files
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x28 pc=0x4650d0]
Here is my simplified code:
const ClientCount = 1000

func main() {
    srvAddr := "127.0.0.1:10000"
    var wg sync.WaitGroup
    wg.Add(ClientCount)
    for i := 0; i < ClientCount; i++ {
        go func(i int) {
            client(i, srvAddr)
            wg.Done()
        }(i)
    }
    wg.Wait()
}

func client(i int, srvAddr string) {
    conn, e := net.Dial("tcp", srvAddr)
    if e != nil {
        log.Fatalln("Err:Dial():", e)
    }
    defer conn.Close()
    // SetDeadline replaces the old conn.SetTimeout API
    conn.SetDeadline(time.Now().Add(time.Duration(proto.LINK_TIMEOUT_NS)))
    l1 := proto.L1{uint32(i), uint16(rand.Uint32() % 10000)}
    log.Println(conn.LocalAddr(), "WL1", l1)
    e = binary.Write(conn, binary.BigEndian, &l1)
    if e == io.EOF {
        return
    }
    if e != nil {
        return
    }
    // ...
}
This answer on Server Fault [1] suggests that for servers that handle a lot of connections, setting a higher ulimit is the thing to do. Also check the application for memory leaks or file descriptor leaks using lsof.
ulimit -n 99999
[1] https://serverfault.com/a/48820/110909
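A process can also raise its own soft limit up to the hard limit at startup; here is a minimal sketch using Go's syscall package on Linux (illustrative only, not from the original code):
package main

import (
    "log"
    "syscall"
)

func main() {
    var rl syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        log.Fatal(err)
    }
    rl.Cur = rl.Max // raise the soft limit to the hard limit
    if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        log.Fatal(err)
    }
    log.Printf("open-file limit: soft=%d hard=%d", rl.Cur, rl.Max)
}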

Why would alSourceUnqueueBuffers fail with INVALID_OPERATION

Here's the code:
ALint cProcessedBuffers = 0;
ALenum alError = AL_NO_ERROR;
alGetSourcei(m_OpenALSourceId, AL_BUFFERS_PROCESSED, &cProcessedBuffers);
if ((alError = alGetError()) != AL_NO_ERROR)
{
    throw "AudioClip::ProcessPlayedBuffers - error returned from alGetSourcei()";
}
alError = AL_NO_ERROR;
if (cProcessedBuffers > 0)
{
    alSourceUnqueueBuffers(m_OpenALSourceId, cProcessedBuffers, arrBuffers);
    if ((alError = alGetError()) != AL_NO_ERROR)
    {
        throw "AudioClip::ProcessPlayedBuffers - error returned from alSourceUnqueueBuffers()";
    }
}
The call to alGetSourcei returns with cProcessedBuffers > 0, but the following call to alSourceUnqueueBuffers fails with INVALID_OPERATION. This is an erratic error that does not always occur. The program containing this sample code is a single-threaded app running in a tight loop (typically it would be synced with a display loop, but in this case I'm not using a timed callback of any sort).
Try alSourceStop(m_OpenALSourceId) first, then alSourceUnqueueBuffers(), and after that restart playback with alSourcePlay(m_OpenALSourceId).
I solved the same problem this way, but I don't know why it has to be done like this.
As mentioned in this SO thread, if you have AL_LOOPING enabled on a streaming source, the unqueue operation will fail.
The looping flag has some sort of lock on the buffers while it is enabled. The answer by MyMiracle hints at this as well: stopping the sound releases that hold, but stopping is not necessary.
AL_LOOPING is not meant to be set on a streaming source, as you manage the source data in the queue. Keep queuing, it will keep playing. Queue from the beginning of the data, it will loop.