Beginner Pd user here. I have a number message, and I am trying to get the numerical difference between the current number and the previous number - does anyone know of a simple way to do this?
This is probably the simplest way to do it:
If you need the absolute difference, use the object abs before the final result (to ignore the negative sign).
Related
One of the features in my random forest model has missing values. There are 5 reasons for the data is missing and I know the reason for all the missing values. My question is how can I feed this information into the model? I can create a categorical variable (or encoded dummies) for the reason of the data being missing but how can I make sure that random forest gets information from this categorical variable when there is a missing value in my main variable?
Adding another variable will not help you much, because 1) Random For
rest assumes independence of the variables, so you will not be able to entangle two variables and 2) it does not guarantee that it will use it all.
If you want to use Random Forest, you will have to impute the missing values one way or another.
The most simple approach is if your variable is in some range, set the missing values to an out of range values encoding the reasons. That is if your variable lays in range [-1..1], set the missing value (say) to -101 if the reason is reason #1, -102 for the reason #2, etc. The idea is to allow the algorithm find a distinct borders between different values.
Second method called MissForest is a bit more computationally complex. As you don't know the value, information about why you miss it does not contribute much. Still, you can find the best value to set instead of the missing one iteratively.
I need to calculate a pseudo random number in a given range (e.g. 0-150) based on another, strictly increasing number. Is there a mathematical way to solve this?
I am given one number x, which increases by 1 every day. Based on this number, I need to - somehow - calculate a number in a given range, which seems to be random.
I have a feeling that there is an easy mathematical solution for this, but sadly I am not able to find it. So any help would be appreciated. Thanks!
One sound way to do that is to hash the number x (either its binary representation or in text form) and then to use the hash to produce the 'random' number in the desired range (say by taking the first 32 bits of the hash and extracting by any known method the desired value). A cryptographic hash can be used like Sha256, but this is not necessary, MurmurHash is possibly a good one for your application.
Normally when you generate a random number, a seed value is used so that the same sequence of psuedorandom numbers isn't repeated. When a seed isn't explicitly given, many systems will use the time as a seed value.
Perhaps you could use x as a seed.
Here's an article explaining seeding: https://www.statisticshowto.com/random-seed-definition/
I'm trying to write a basic digit counter (an integer is inputted and the number of digits of that integer is outputted) for positive integers. This is my general formula:
dig(x) := Math.floor(Math.log(x,10))
I tried implementing the equivalent of dig(x) in Ruby, and found that when I was computing dig(1000) I was getting 2 instead of 3 because Math.log was returning 2.9999999999999996 which would then be truncated down to 2. What is the proper way to handle this problem? (I'm assuming this problem can occur regardless of the language used to implement this approach, but if that's not the case then please explain that in your answer).
To get an exact count of the number of digits in an integer, you can do the usual thing: (in C/C++, assuming n is non-negative)
int digits = 0;
while (n > 0) {
n = n / 10; // integer division, just drops the ones digit and shifts right
digits = digits + 1;
}
I'm not certain but I suspect running a built-in logarithm function won't be faster than this, and this will give you an exact answer.
I thought about it for a minute and couldn't come up with a way to make the logarithm-based approach work with any guarantees, and almost convinced myself that it is probably a doomed pursuit in the first place because of floating point rounding errors, etc.
From The Art of Computer Programming volume 2, we will eliminate one bit of error before the floor function is applied by adding that one bit back in.
Let x be the result of log and then do x += x / 0x10000000 for a single precision floating point number (C's float). Then pass the value into floor.
This is guaranteed to be the fastest (assuming you have the answer in numerical form) because it uses only a few floating point instructions.
Floating point is always subject to roundoff error; that's one of the hazards you need to be aware of, and actively manage, when working with it. The proper way to handle it, if you must use floats is to figure out what the expected amount of accumulated error is and allow for that in comparisons and printouts -- round off appropriately, compare for whether the difference is within that range rather than comparing for equality, etcetera.
There is no exact binary-floating-point representation of simple things like 1/10th, for example.
(As others have noted, you could rewrite the problem to avoid using the floating-point-based solution entirely, but since you asked specifically about working log() I wanted to address that question; apologies if I'm off target. Some of the other answers provide specific suggestions for how you might round off the result. That would "solve" this particular case, but as your floating operations get more complicated you'll have to continue to allow for roundoff accumulating at each step and either deal with the error at each step or deal with the cumulative error -- the latter being the more complicated but more accurate solution.)
If this is a serious problem for an application, folks sometimes use scaled fixed point instead (running financial computations in terms of pennies rather than dollars, for example). Or they use one of the "big number" packages which computes in decimal rather than in binary; those have their own round-off problems, but they round off more the way humans expect them to.
I am using the Levenshtein distance algorithm to compare a company name provided as a user input against a database of known company names to find closest match. By itself, the algorithm works okay, but I want to build in a Bias so that the edit distance is considered lower if the initial parts of the strings match.
For Example, if the search criteria is "ABCD", then both "ABCD Co." and "XYX ABCD" have identical Edit Distance. However I want to add weight to the fact that the initial parts of the first string matches the search criteria more closely than the second string.
One way of doing this might be to modify the insert/delete/replace costs to be higher at the beginning of the strings and lower towards the end. Does anyone have an example of a successful implementation of this? Is using Levenshtein distance still the best way to do what I am trying to achieve? Is my assumption of the approach accurate?
UPDATE: For my immediate purposes I have decided to forgo the above and instead use the Jaro Winkler edit distance which seems to solve the problem. However I will leave this open for further inputs.
What you're looking for looks like a Smith-Waterman local alignment: http://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm
I have written matlab codes for two different block matching algorithms, extensive search and three step search, but i am not sure how i can check whether i am getting the correct results. Is there any standard way to check these or any standard code which i can run and compare my result with.I read somewhere that JM software can be used but i didnt find any way to use it.
You can always use the results produced by your algorithms to create the next frame of video and then analyze its quality by either visually inspecting it (which is rather subjective, and we like to deal in numbers) or calculating the mean square error between the produced image and the one you're trying to estimate. Mean square error of the exhaustive (extensive) search should be lower than the one three-step gives you.
Well, did you try to plot it? I mean,after the block-matching you have a new image, right?.
A way to know if you result if true or not is to check the sum of the difference of 2 frames.
A - pre_frame
B - post_frame
C - Compensated frame
If abs(abs(A-B)) is lower than abs(abs(A-C))) that mean it could be true.
Next time, try to specify your algoritm. Also, put your code here to help you more.