Perform long-polling from nodejs (possible memory leak) - facebook

I wrote a piece of code that is going to perform a request to Facebook.
Now i wrapped this code into a infinite loop which is going to send those requests every 10 seconds using timeouts.
Code:
var poll = function(socket, userProvider) {
var lastCallTime = new Date();
var polling = true;
// The stream itself, non blocking
function performPoll() {
var results = feed(function (err, data) {
lastCallTime = new Date();
// PROCESS DATA
// Check new posts
if (polling) {
setTimeout(performPoll, 1000 * 10);
}
});
};
// Start infinite loop
performPoll();
};
The feed(cb) is just going to call a request to Facebook requesting data, this works 100% and does what i want it to do, the only problem that i am having now is that this piece of code is keeping to increase my memory usage. After a few minutes it increased by 50MB already (From 50 -> 100).
Is there anybody that can help me identify the cause of this?

v8 does not collect memory immediately. If it stabilizes at 100mb, then it is to be expected. For more information, checkout nodejs setTimeout memory leak?
If you really, really want to clear the memory, use global.gc(). Read this blog about how to call garbage collector manually.

Related

Running Mirth Channel with API Requests to external server very slow to process

In this question Mirth HTTP POST request with Parameters using Javascript I used a semblance of the first answer. Code seen below.
I'm running this code for a file that has nearly 46,000 rows. Which equates to about 46,000 requests hitting our external server. I'm noting that Mirth is making requests to our API endpoint about 1.6 times per second. This is unusually slow, and I would like some help to understand whether this is something related to Mirth, or related to the code above. Can repeated Imports in a for loop cause slow downs? Or is there a specific Mirth setting that limits the number of requests sent?
Version of Mirth is 3.12.0
Started the process at 2:27 PM and it's expected to be finished by almost 8:41 PM tonight, that's ridiculously slow.
//Skip the first header row
for (i = 1; i < msg['row'].length(); i++) {
col1 = msg['row'][i]['column1'].toString();
col2...
...
//Insert into results if the file and sample aren't already present
InsertIntoDatabase()
}
function InsertIntoDatabase() {
with(JavaImporter(
org.apache.commons.io.IOUtils,
org.apache.http.client.methods.HttpPost,
org.apache.http.client.entity.UrlEncodedFormEntity,
org.apache.http.impl.client.HttpClients,
org.apache.http.message.BasicNameValuePair,
com.google.common.io.Closer)) {
var closer = Closer.create();
try {
var httpclient = closer.register(HttpClients.createDefault());
var httpPost = new HttpPost('http://<server_name>/InsertNewCorrection');
var postParameters = [
new BasicNameValuePair("col1", col1),
new BasicNameValuePair(...
...
];
httpPost.setEntity(new UrlEncodedFormEntity(postParameters, "UTF-8"));
httpPost.setHeader('Content-Type', 'application/x-www-form-urlencoded')
var response = closer.register(httpclient.execute(httpPost));
var is = closer.register(response.entity.content);
result = IOUtils.toString(is, 'UTF-8');
} finally {
closer.close();
}
}
return result;
}

process Swift DispatchQueue without affecting resource

I have a Swift DispatchQueue that receives data at 60fps.
However, depending on phones or amount of data received, the computation of those data becomes expensive to process at 60fps. In actuality, it is okay to process only half of them or as much as the computation resource allows.
let queue = DispatchQueue(label: "com.test.dataprocessing")
func processData(data: SomeData) {
queue.async {
// data processing
}
}
Does DispatchQueue somehow allow me to drop some data if a resource is limited? Currently, it is affecting the main UI of SceneKit. Or, is there something better than DispatchQueue for this type of task?
There are a couple of possible approaches:
The simple solution is to keep track of your own Bool as to whether your task is in progress or not, and when you have more data, only process it if there's not one already running:
private var inProgress = false
private var syncQueue = DispatchQueue(label: Bundle.main.bundleIdentifier! + ".sync.progress") // for reasons beyond the scope of this question, reader-writer with concurrent queue is not appropriate here
func processData(data: SomeData) {
let isAlreadyRunning = syncQueue.sync { () -> Bool in
if self.inProgress { return true }
self.inProgress = true
return false
}
if isAlreadyRunning { return }
processQueue.async {
defer {
self.syncQueue.async { self.inProgress = false }
}
// process `data`
}
}
All of that syncQueue stuff is to make sure that I have thread-safe access to the inProgress property. But don't get lost in those details; use whatever synchronization mechanism you want (e.g. a lock or whatever). All we want to make sure is that we have thread-safe access to the Bool status flag.
Focus on the basic idea, that we'll keep track of a Bool flag to know whether the processing queue is still tied up processing the prior set of SomeData. If it is busy, return immediately and don't process this new data. Otherwise, go ahead and process it.
While the above approach is conceptually simple, it won't offer great performance. For example, if your processing of data always takes 0.02 seconds (50 times per second) and your input data is coming in at a rate of 60 times per second, you'll end up getting 30 of them processed per second.
A more sophisticated approach is to use a GCD user data source, something that says "run the following closure when the destination queue is free". And the beauty of these dispatch user data sources is that it will coalesce them together. These data sources are useful for decoupling the speed of inputs from the processing of them.
So, you first create a data source that simply indicates what should be done when data comes in:
private var dataToProcess: SomeData?
private lazy var source = DispatchSource.makeUserDataAddSource(queue: processQueue)
func configure() {
source.setEventHandler() { [unowned self] in
guard let data = self.syncQueue.sync(execute: { self.dataToProcess }) else { return }
// process `data`
}
source.resume()
}
So, when there's data to process, we update our synchronized dataToProcess property and then tell the data source that there is something to process:
func processData(data: SomeData) {
syncQueue.async { self.dataToProcess = data }
source.add(data: 1)
}
Again, just like the previous example, we're using syncQueue to synchronize our access to some property across multiple threads. But this time we're synchronizing dataToProcess rather than the inProgress state variable we used in the first example. But the idea is the same, that we must be careful to synchronize our interation with a property across multiple threads.
Anyway, using this pattern with the above scenario (input coming in at 60 fps, whereas processing can only process 50 per second), the resulting performance much closer to the theoretical max of 50 fps (I got between 42 and 48 fps depending upon the queue priority), rather than 30 fps.
The latter process can conceivably lead to more frames (or whatever you're processing) to be processed per second and results in less idle time on the processing queue. The following image attempts to graphically illustrate how the two alternatives compare. In the former approach, you'll lose every other frame of data, whereas the latter approach will only lose a frame of data when two separate sets of input data came in prior to the processing queue becoming free and they were coalesced into a single call to the dispatch source.

Bulk operations in Mongoskin [duplicate]

I'm having trouble using Mongoskin to perform bulk inserting (MongoDB 2.6+) on Node.
var dbURI = urigoeshere;
var db = mongo.db(dbURI, {safe:true});
var bulk = db.collection('collection').initializeUnorderedBulkOp();
for (var i = 0; i < 200000; i++) {
bulk.insert({number: i}, function() {
console.log('bulk inserting: ', i);
});
}
bulk.execute(function(err, result) {
res.json('send response statement');
});
The above code gives the following warnings/errors:
(node) warning: possible EventEmitter memory leak detected. 51 listeners added. Use emitter.setMaxListeners() to increase limit.
TypeError: Object #<SkinClass> has no method 'execute'
(node) warning: possible EventEmitter memory leak detected. 51 listeners added. Use emitter.setMaxListeners() to increase limit.
TypeError: Object #<SkinClass> has no method 'execute'
Is it possible to use Mongoskin to perform unordered bulk operations? If so, what am I doing wrong?
You can do it but you need to change your calling conventions to do this as only the "callback" form will actually return a collection object from which the .initializeUnorderedBulkOp() method can be called. There are also some usage differences to how you think this works:
var dbURI = urigoeshere;
var db = mongo.db(dbURI, {safe:true});
db.collection('collection',function(err,collection) {
var bulk = collection.initializeUnorderedBulkOp();
count = 0;
for (var i = 0; i < 200000; i++) {
bulk.insert({number: i});
count++;
if ( count % 1000 == 0 )
bulk.execute(function(err,result) {
// maybe do something with results
bulk = collection.initializeUnorderedBulkOp(); // reset after execute
});
});
// If your loop was not a round divisor of 1000
if ( count % 1000 != 0 )
bulk.execute(function(err,result) {
// maybe do something here
});
});
So the actual "Bulk" methods themselves don't require callbacks and work exactly as shown in the documentation. The exeception is .execute() which actually sends the statements to the server.
While the driver will sort this out for you somewhat, it probably is not a great idea to queue up too many operations before calling execute. This basically builds up in memory, and though the driver will only send in batches of 1000 at a time ( this is a server limit as well as the complete batch being under 16MB ), you probably want a little more control here, at least to limit memory usage.
That is the point of the modulo tests as shown, but if memory for building the operations and a possibly really large response object are not a problem for you then you can just keep queuing up operations and call .execute() once.
The "response" is in the same format as given in the documentation for BulkWriteResult.

graphQL slow response and duplicate response

Does anyone have experienced about slow response from using graphQL ?
This is my code in resolver:
getActiveCaseWithActiveProcess(){
console.log ("getActiveCaseWithActiveProcess");
var result = [];
var activeElements = ActiveElements.find({
type:"signal",
$or:[
{signalRef:"start-process"},
{signalRef:"start-task"},
{signalRef:"close-case"}
]
},{limit:200}).fetch();
for (var AE of activeElements){
var checkAECount = ActiveElements.find({caseId:AE['caseId']}).count();
if (checkAECount <= 3){
console.log ('caseId: ' + AE['caseId']);
var checkExistInResult = result.filter(function (obj) {
return obj.caseId === AE['caseId'];
})[0];
if (checkExistInResult == null){
result.push({
caseId: AE['caseId'],
caseStart: AE['createdDate']
});
}
}
}
console.log("loaded successfully");
return result;
}
I have a huge data from my collection actually. Approximately 20000 records. However when I load this, the response is too slow and it can repeat to reload by itself which makes the response is even longer.
I20160812-04:07:25.968(0)? caseId: CASE-0000000284,
I20160812-04:07:26.890(0)? caseId: CASE-0000000285
I20160812-04:07:28.200(0)? caseId: CASE-0000000285
I20160812-04:07:28.214(0)? getActiveCaseWithActiveProcess
I20160812-04:07:28.219(0)? caseId: CASE-0000000194
I20160812-04:07:29.261(0)? caseId: CASE-0000000197
As you notice from my attachment above, at this time(20160812-04:07:28.214) the server repeats to load from the beginning again, and that's why the response will take longer.
This is not always happening. It happens when the server loads slowly. When the server loads fast. Everything just runs smoothly.
Not really enough information to answer that question here, but my guess would be that it has nothing to do with GraphQL. I think your client just cancels the request and makes another one because the first one timed out. You can find out if that happens by logging requests to your server before they're passed to GraphQL.

RXJS : Idiomatic way to create an observable stream from a paged interface

I have paged interface. Given a starting point a request will produce a list of results and a continuation indicator.
I've created an observable that is built by constructing and flat mapping an observable that reads the page. The result of this observable contains both the data for the page and a value to continue with. I pluck the data and flat map it to the subscriber. Producing a stream of values.
To handle the paging I've created a subject for the next page values. It's seeded with an initial value then each time I receive a response with a valid next page I push to the pages subject and trigger another read until such time as there is no more to read.
Is there a more idiomatic way of doing this?
function records(start = 'LATEST', limit = 1000) {
let pages = new rx.Subject();
this.connect(start)
.subscribe(page => pages.onNext(page));
let records = pages
.flatMap(page => {
return this.read(page, limit)
.doOnNext(result => {
let next = result.next;
if (next === undefined) {
pages.onCompleted();
} else {
pages.onNext(next);
}
});
})
.pluck('data')
.flatMap(data => data);
return records;
}
That's a reasonable way to do it. It has a couple of potential flaws in it (that may or may not impact you depending upon your use case):
You provide no way to observe any errors that occur in this.connect(start)
Your observable is effectively hot. If the caller does not immediately subscribe to the observable (perhaps they store it and subscribe later), then they'll miss the completion of this.connect(start) and the observable will appear to never produce anything.
You provide no way to unsubscribe from the initial connect call if the caller changes its mind and unsubscribes early. Not a real big deal, but usually when one constructs an observable, one should try to chain the disposables together so it call cleans up properly if the caller unsubscribes.
Here's a modified version:
It passes errors from this.connect to the observer.
It uses Observable.create to create a cold observable that only starts is business when the caller actually subscribes so there is no chance of missing the initial page value and stalling the stream.
It combines the this.connect subscription disposable with the overall subscription disposable
Code:
function records(start = 'LATEST', limit = 1000) {
return Rx.Observable.create(observer => {
let pages = new Rx.Subject();
let connectSub = new Rx.SingleAssignmentDisposable();
let resultsSub = new Rx.SingleAssignmentDisposable();
let sub = new Rx.CompositeDisposable(connectSub, resultsSub);
// Make sure we subscribe to pages before we issue this.connect()
// just in case this.connect() finishes synchronously (possible if it caches values or something?)
let results = pages
.flatMap(page => this.read(page, limit))
.doOnNext(r => this.next !== undefined ? pages.onNext(this.next) : pages.onCompleted())
.flatMap(r => r.data);
resultsSub.setDisposable(results.subscribe(observer));
// now query the first page
connectSub.setDisposable(this.connect(start)
.subscribe(p => pages.onNext(p), e => observer.onError(e)));
return sub;
});
}
Note: I've not used the ES6 syntax before, so hopefully I didn't mess anything up here.