Swift: fastest way to copy buffers of UInt32

I've got code that has two UnsafeMutableBufferPointer<UInt32>s, and need to selectively copy from one buffer to the other, as fast as possible. There is some logic that decides whether the element at a particular index/address is copied or not, so unfortunately I can't just memcpy() the whole lot.
What is the quickest way to do a per-element buffer copy in Swift?
For the purpose of simplicity, I've removed the per-element logic in the examples below.
This is slow (56ms):
for i in 0..<size {
    if ... {
        dest[i] = source[i]
    }
}
This is much faster (18ms), but I'd like to go quicker:
var srcPtr = source.baseAddress!
var dstPtr = dest.baseAddress!
for _ in 0..<size {
    if ... {
        dstPtr.pointee = srcPtr.pointee
    }
    srcPtr = srcPtr.advanced(by: 1)
    dstPtr = dstPtr.advanced(by: 1)
}
Is there a quicker way?
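One approach worth benchmarking, sketched below with a hypothetical shouldCopy(_:) predicate standing in for the removed per-element logic, is to coalesce consecutive copyable elements into runs and move each run with a single bulk copy rather than element by element:

// Sketch: find runs of indices that pass the (hypothetical) shouldCopy
// predicate and copy each run with one bulk assign instead of per element.
let src = source.baseAddress!
let dst = dest.baseAddress!
var i = 0
while i < size {
    // Skip elements that should not be copied.
    while i < size && !shouldCopy(i) { i += 1 }
    let runStart = i
    // Extend the run while the predicate keeps passing.
    while i < size && shouldCopy(i) { i += 1 }
    if i > runStart {
        // One bulk copy for the whole run.
        (dst + runStart).assign(from: src + runStart, count: i - runStart)
    }
}

Whether this beats the per-element loop depends entirely on how often the predicate flips, so it would need measuring against real data.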

Related

What could cause a Meteor app to sometimes create two documents?

I have a Meteor app that takes the text of an article and splits it into paragraphs, sentences, words, and characters, and then stores that in a JSON object which I then save as a document in a collection. The document I am testing now ends up as 15133 bytes in MongoDB.
When I insert the document it takes about 20 or 30 seconds to insert. Then sometimes it starts going through my article creation routine again and inserts another document. Sometimes it ends up inserting 3 or more documents. Sometimes it behaves as it should and only inserts 1 document into the collection.
What should I be looking for that could be causing this behavior?
Here is my code, as requested:
Meteor.methods({
  'createArticle': function (text, title) {
    var article = {}
    article.title = title
    article.userID = "sdfgsdfg"
    article.text = text
    article.paragraphs = []
    var paragraphs = splitArticleIntoParagraphs(text)
    console.log("paragraphs", paragraphs)
    _.each(paragraphs, function (paragraph, p) {
      if (paragraph !== "") {
        console.log("paragraph", paragraph)
        article.paragraphs[p] = {}
        article.paragraphs[p].read = false
        article.paragraphs[p].text = paragraph
        console.log("paragraphs[p]", article.paragraphs[p])
        var sentences = splitParagraphIntoSentences(paragraph)
        article.paragraphs[p].sentences = []
      }
      _.each(sentences, function (sentence, s) {
        if (sentence !== "") {
          article.paragraphs[p].sentences[s] = {}
          console.log("sentence", sentence)
          article.paragraphs[p].sentences[s].text = sentence
          article.paragraphs[p].sentences[s].read = false
          console.log("paragraphs[p].sentences[s]", article.paragraphs[p].sentences[s])
          var wordsForward = splitSentenceIntoWordsForward(sentence)
          console.log("wordsForward", JSON.stringify(wordsForward))
          article.paragraphs[p].sentences[s].forward = {}
          article.paragraphs[p].sentences[s].forward.words = wordsForward
          // var wordsReverse = splitSentenceIntoWordsReverse(sentence)
          _.each(wordsForward, function (word, w) {
            if (word) {
              // console.log("word", JSON.stringify(word))
              // article.paragraphs[p].sentences[s] = {}
              // article.paragraphs[p].sentences[s].forward = {}
              // article.paragraphs[p].sentences[s].forward.words = []
              article.paragraphs[p].sentences[s].forward.words[w] = {}
              article.paragraphs[p].sentences[s].forward.words[w].wordID = word._id
              article.paragraphs[p].sentences[s].forward.words[w].simp = word.simp
              article.paragraphs[p].sentences[s].forward.words[w].trad = word.trad
              console.log("word.simp", word.simp)
              var characters = word.simp.split('')
              console.log("characters", characters)
              article.paragraphs[p].sentences[s].forward.words[w].characters = []
              _.each(characters, function (character, c) {
                if (character) {
                  console.log("character", character, p, s, w, c)
                  article.paragraphs[p].sentences[s].forward.words[w].characters[c] = {}
                  article.paragraphs[p].sentences[s].forward.words[w].characters[c].text = character
                  article.paragraphs[p].sentences[s].forward.words[w].characters[c].wordID = Words.findOne({simp: character})._id
                }
              })
            }
          })
        }
      })
    })
    // console.log("article", JSON.stringify(article))
    // console.log(JSON.stringify(article.paragraphs[10].sentences[1].forward))//.words[4].characters[0])
    console.log("done")
    var id = Articles.insert(article)
    console.log("id", id)
    return id
  }
})
I call the method here:
Template.articleList.events({
  "click #addArticle": function (event) {
    event.preventDefault();
    var title = $('#title').val();
    var text = $('#text').val();
    $('#title').value = '';
    $('#text').value = '';
    $('#text').attr('rows', '3');
    Meteor.call('createArticle', text, title);
  }
})
An important thing to keep in mind is that Meteor methods do not work very well for CPU-intensive tasks. Since your Meteor server runs in a single thread, any blocking computation - like yours - will affect all client connections, e.g. by delaying DDP heartbeats. This, in turn, can result in clients thinking that the connection was dropped.
As @ghybs suggested in one of the comments, your method is probably triggered several times by an impatient DDP client that thinks the server has disconnected. The easiest way to prevent this behavior is to add the noRetry flag to Meteor.apply, as explained here:
https://docs.meteor.com/api/methods.html#Meteor-apply
I believe Meteor.call does not have this option.
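For illustration, the client-side call from the question could go through Meteor.apply with that flag; a minimal sketch:

// Sketch: same method call, but via Meteor.apply with noRetry, so the client
// will not automatically re-send the call after an apparent disconnect.
Meteor.apply('createArticle', [text, title], { noRetry: true }, function (error, id) {
  if (error) {
    console.error(error);
  } else {
    console.log('article id', id);
  }
});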
Another strategy would be to make sure that your methods are idempotent, i.e. calling them more than once should not produce any additional effects. This is usually true - at least when you use method simulation - because retrying a db insert will re-use the same document id, which will fail on the second try. For some reason this is not happening in your case.
Finally, the problem you described suggests that a different pattern should probably be used for a computationally expensive task like yours. If I were you, I would start by splitting the job into several steps:
First I would make sure that the document is uploaded to the server with a POST request rather than through DDP.
Then I would implement a "process file" server-side method that grabs the file which is already on the server or in the database (in case you used a files collection). The first thing the method should do is call this.unblock(), but that's not all.
Ideally the computational task should be executed in a separate process. Only when that process completes would the method return, telling the actual caller that the job is done. But since we called this.unblock(), that caller can perform other tasks, e.g. calling other methods/subscriptions, while waiting for the result.
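A rough outline of such a method, assuming the uploaded text is already stored in some collection the server can read (Uploads and buildArticleFromText below are placeholders, not names from the question):

Meteor.methods({
  'processArticle': function (fileId) {
    // Let other method calls and subscriptions from this client proceed
    // while this long-running method is still executing.
    this.unblock();
    var upload = Uploads.findOne(fileId);      // placeholder collection
    return buildArticleFromText(upload.text);  // placeholder for the heavy work
  }
});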
Sometimes, having a separate process will not be good enough. I've experienced situations where I had to delegate the task to other worker server(s).

Perl code structure for post-processing

The question I have is a bit abstract, but I'll attempt to be clear in my statement. (This is something of a "rubber duck effect" post, so I'll be thankful if just typing it out gets me somewhere. Replies, however, would be brilliant!)
I have old Fortran code that I can't change (at least not yet), so I'm stuck with its awkward output.
I'm using Perl to post-process poorly annotated ASCII output files, which, as you might imagine, are a very specialized mixture of text and numbers. "Ah, the perfect Perl objective," you say. Yes, it is. But what I've come up with is pretty terrible coding, I've recently concluded.
My question is about the generic structure that folks prefer to use to achieve such an objective. As I've said, I'm not happy with the one I've chosen.
Here's some pseudocode of the structure that I've arrived at:
flag1 = 0;
flag2 = 0;
while (<INPUT>) {
    if (cond1) {
        do something [like parse and set a header];
        flag1 = 1;
    } else {
        next;
    }
    if (flag1 == 1 && cond2) {
        do something else [like process a block of data];
    } else {
        next;
    }
}
The objective of the above code is to be able to break the processing into blocks corresponding to the poorly partitioned ASCII file -- there isn't much by way of "tags" in the file, so the conditions (cond1, cond2, etc.) are involved. The purpose of having the flags set is, among other reasons, to track the progress of the code through the file.
It occurs to me now that a better structure might be
while (<INPUT>) {
    do stuff;
}
while (<INPUT>) {
    do other stuff;
}
In any event, if my rambling inspires any thoughts I'd appreciate hearing them.
Thank you
Your original structure is perfectly fine. You're building a state machine, and doing it in a completely reasonable way that can't really be made any more idiomatic.
The only thing you can possibly do if you wish is to modularize the code a wee bit:
our %state = (last => 0, current => 0, next => 0);
our %extra_flags = ();

sub cond1 { my ($line) = @_; return $next_state }   # returns 0 if cond == false
sub cond2 { my ($line) = @_; return $next_state }   # returns 0 if cond == false

our %conditions = (
    0 => \&cond1,
    1 => \&cond2,   # flag1 is set
);

while (my $line = <INPUT>) {
    my $current = $state{current};
    if ($state{next} = $conditions{$current}->($line, $current)) {
        $do_stuff{$current}{ $state{next} }->($line);
        $state{last}    = $state{current};
        $state{current} = $state{next};
        next;
    }
}
If the file does indeed lend itself to being processed in multiple loops, that would be a much clearer way to do it than emulating that with conditionals, IMO.
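A minimal sketch of that multi-loop style, with a made-up section marker and made-up helpers, relying on the fact that the second loop simply resumes where the first one stopped:

# First pass: consume the header section, stop at a (hypothetical) boundary.
while (<INPUT>) {
    last if /^BEGIN DATA/;
    parse_header_line($_);
}

# Second pass: the handle is already positioned at the data block.
while (<INPUT>) {
    process_data_line($_);
}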
If not, even if there are just a few exceptions to code around, it's probably better to stick with the original approach you describe.

NodeJS: What is the proper way to handle TCP socket streams? Which delimiter should I use?

From what I understood here, "V8 has a generational garbage collector. Moves objects around randomly. Node can't get a pointer to raw string data to write to socket." So I shouldn't store data that comes from a TCP stream in a string, especially if that string becomes bigger than Math.pow(2,16) bytes (hope I'm right so far..).
What, then, is the best way to handle all the data coming from a TCP socket? So far I've been trying to use _:_:_ as a delimiter because I think it's reasonably unique and won't clash with the rest of the data.
A sample of the data that would come would be something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data
This is what I tried to do:
net = require('net');
var server = net.createServer(function (socket) {
  socket.on('connect', function () {
    console.log('someone connected');
    buf = new Buffer(Math.pow(2,16)); // new buffer with size 2^16
    socket.on('data', function (data) {
      if (data.toString().search('_:_:_') === -1) { // If there's no separator in the data that just arrived...
        buf.write(data.toString()); // ... write it on the buffer. it's part of another message that will come.
      } else { // if there is a separator in the data that arrived
        parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
        if (parts.length == 2) {
          msg = buf.toString('utf-8', 0, 4) + parts[0];
          console.log('MSG: ' + msg);
          buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
        } else {
          msg = buf.toString() + parts[0];
          for (var i = 1; i <= parts.length - 1; i++) {
            if (i !== parts.length - 1) {
              msg = parts[i];
              console.log('MSG: ' + msg);
            } else {
              buf.write(parts[i]);
            }
          }
        }
      }
    });
  });
});
server.listen(9999);
Whenever I try to console.log('MSG' + msg), it will print out the whole buffer, so it's useless to see if something worked.
How can I handle this data the proper way ? Would the lazy module work, even if this data is not line oriented ? Is there some other module to handle streams that are not line oriented ?
It has indeed been said that there's extra work going on because Node has to take that buffer and then push it into V8 / cast it to a string. However, doing a toString() on the buffer isn't any better. There's no good solution to this right now, as far as I know, especially if your end goal is to get a string and fool around with it. It's one of the things Ryan mentioned at NodeConf as an area where work needs to be done.
As for the delimiter, you can choose whatever you want. A lot of binary protocols choose to include a fixed header, so that you can put things in a normal structure, which often includes a length. In this way, you slice apart a known header and get information about the rest of the data without having to iterate over the entire buffer. With a scheme like that, you can use a tool like:
node-binary - https://github.com/substack/node-binary
node-ctype - https://github.com/rmustacc/node-ctype
As an aside, buffers can be accessed via array syntax, and they can also be sliced apart with .slice().
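To make the fixed-header idea concrete, here is a minimal length-prefix sketch (this is an illustration, not the question's protocol; it assumes each message is preceded by a 4-byte big-endian length):

var net = require('net');

// Sketch: length-prefixed framing. Every message is preceded by a 4-byte
// big-endian length, so there is no need to scan for a delimiter at all.
var server = net.createServer(function (socket) {
  var pending = new Buffer(0);
  socket.on('data', function (chunk) {
    pending = Buffer.concat([pending, chunk]);
    // Extract messages while at least one complete frame is buffered.
    while (pending.length >= 4) {
      var len = pending.readUInt32BE(0);
      if (pending.length < 4 + len) break; // wait for the rest of the frame
      var msg = pending.slice(4, 4 + len).toString('utf8');
      console.log('MSG: ' + msg);
      pending = pending.slice(4 + len);
    }
  });
});
server.listen(9999);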
Lastly, check here: https://github.com/joyent/node/wiki/modules -- find a module that parses a simple TCP protocol and seems to do it well, and read some code.
You should use the new streams2 API: http://nodejs.org/api/stream.html
Here are some very useful examples: https://github.com/substack/stream-handbook
https://github.com/lvgithub/stick

What is a better way to do the below program (C# 3.0)?

Consider the below program
private static bool CheckFactorPresent(List<FactorReturn> factorReturnCol)
{
    bool IsPresent = true;
    StringBuilder sb = new StringBuilder();

    //Get the exposure names from Exposure list.
    //Since this will remain same, so it has been done outside the loop
    List<string> lstExposureName = (from item in Exposures
                                    select item.ExposureName).ToList<string>();

    foreach (FactorReturn fr in factorReturnCol)
    {
        //Build the factor names from the ReturnCollection dictionary
        List<string> lstFactorNames = fr.ReturnCollection.Keys.ToList<string>();

        //Check if all the Factor Names are present in ExposureName list
        List<string> result = lstFactorNames.Except(lstExposureName).ToList();
        if (result.Count() > 0)
        {
            result.ForEach(i =>
            {
                IsPresent = false;
                sb.AppendLine("Factor" + i + "is not present for week no: " + fr.WeekNo.ToString());
            });
        }
    }
    return IsPresent;
}
Basically I am checking whether all the factor names (lstFactorNames) are present in the exposure names (lstExposureName) list by using lstFactorNames.Except(lstExposureName).
Then, by using the Count() function (if Count() > 0), I am writing the error messages to the StringBuilder (sb).
I am sure that someone can write a better implementation than the one presented,
and I am looking forward to it so I can learn something new from that program.
I am using C# 3.0 and .NET Framework 3.5
Thanks
Save for some naming convention issues, I'd say that looks fine (for what I can figure out without seeing the rest of the code, or the purpose of the effort). The naming conventions, though, need some work. A sporadic mix of ntnHungarian, PascalCase, camelCase, and abbrv is a little disorienting. Try just naming your local variables camelCase exclusively and things will look a lot better. Best of luck to you - things are looking good so far!
- EDIT -
Also, you can clean up the iteration at the end by just running a simple foreach:
...
foreach (var except in result)
{
    isPresent = false;
    builder.AppendFormat("Factor{0} is not present for week no: {1}\r\n", except, fr.WeekNo);
}
...

Simplest way to process a list of items in a multi-threaded manner

I've got a piece of code that opens a data reader and for each record (which contains a url) downloads & processes that page.
What's the simplest way to make it multi-threaded so that, let's say, there are 10 slots which can be used to download and process pages simultaneously, and as slots become available the next rows are read, and so on?
I can't use WebClient.DownloadDataAsync
Here's what I have tried to do, but it hasn't worked (i.e. the "worker" is never run):
using (IDataReader dr = q.ExecuteReader())
{
    ThreadPool.SetMaxThreads(10, 10);
    int workerThreads = 0;
    int completionPortThreads = 0;

    while (dr.Read())
    {
        do
        {
            ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
            if (workerThreads == 0)
            {
                Thread.Sleep(100);
            }
        } while (workerThreads == 0);

        Database.Log l = new Database.Log();
        l.Load(dr);

        ThreadPool.QueueUserWorkItem(delegate(object threadContext)
        {
            Database.Log log = threadContext as Database.Log;
            Scraper scraper = new Scraper();
            dc.Product p = scraper.GetProduct(log, log.Url, true);
            ManualResetEvent done = new ManualResetEvent(false);
            done.Set();
        }, l);
    }
}
You do not normally need to play with the Max threads (I believe it defaults to something like 25 per proc for worker, 1000 for IO). You might consider setting the Min threads to ensure you have a nice number always available.
You don't need to call GetAvailableThreads either. You can just start calling QueueUserWorkItem and let it do all the work. Can you repro your problem by simply calling QueueUserWorkItem?
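For example, stripped of the GetAvailableThreads/Sleep polling, the loop from the question could reduce to something like this sketch (same types as in the question):

using (IDataReader dr = q.ExecuteReader())
{
    while (dr.Read())
    {
        Database.Log l = new Database.Log();
        l.Load(dr);

        // Just queue the work item; the thread pool schedules it when a
        // thread becomes available.
        ThreadPool.QueueUserWorkItem(delegate(object threadContext)
        {
            Database.Log log = (Database.Log)threadContext;
            Scraper scraper = new Scraper();
            dc.Product p = scraper.GetProduct(log, log.Url, true);
        }, l);
    }
}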
You could also look into the Task Parallel Library, which has helper methods to make this kind of stuff more manageable and easier.
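As a sketch of that route (Parallel.ForEach requires .NET 4, or the Parallel Extensions CTP on 3.5; types reused from the question), the "10 slots" requirement maps directly onto MaxDegreeOfParallelism:

// Read the rows first, then process at most 10 pages at a time.
var logs = new List<Database.Log>();
using (IDataReader dr = q.ExecuteReader())
{
    while (dr.Read())
    {
        var log = new Database.Log();
        log.Load(dr);
        logs.Add(log);
    }
}

Parallel.ForEach(
    logs,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    log =>
    {
        var scraper = new Scraper();
        dc.Product p = scraper.GetProduct(log, log.Url, true);
    });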