Let's say you have an array of size n with randomly generated elements and you want to use quicksort to sort the array. For large enough n (say 1,000,000), in order to speed up quicksort, it would make sense to stop recursing when the array gets small enough and use insertion sort instead. In such an implementation, the base case for quicksort is some value base > 1. What would be the optimal base value to choose, and why?
Think about the time complexity of quicksort (average and worst case) and the time complexity of other sorts that might do better for small n.
Try starting with Wikipedia - it has good starting info about comparing the two algorithms. When you have a more specific question, feel free to come back.
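If it helps, here is a rough Python sketch of the hybrid approach the question describes. It is an illustration, not a tuned implementation: the cutoff BASE below is an assumed value, and the optimal cutoff depends on the machine, the data, and the constant factors of the insertion sort, so it has to be measured (values in the tens are common).

```python
import random

BASE = 16  # assumed cutoff; the best value is machine- and data-dependent

def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place; fast for small slices."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that switches to insertion sort once a partition has <= BASE elements."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo + 1 <= BASE:
            insertion_sort(a, lo, hi)
            return
        pivot = a[random.randint(lo, hi)]  # random pivot avoids the sorted-input worst case
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse on the smaller half, loop on the larger to keep the stack shallow.
        if j - lo < hi - i:
            hybrid_quicksort(a, lo, j)
            lo = i
        else:
            hybrid_quicksort(a, i, hi)
            hi = j

data = [random.randint(0, 10**6) for _ in range(10000)]
hybrid_quicksort(data)
assert data == sorted(data)
```

Timing this against a plain quicksort for a few cutoff values is the simplest way to see where the crossover sits on your hardware.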
I am looking for a solution to split my data into test and train sets, but I want to have all the levels of my categorical variable in both test and train.
My variable has 200 levels and the data is 18 million records. I tried the sampleBy function with fractions (0.8) and could get the training set, but had difficulty getting the test set, since there is no index in Spark; even after creating a key, using a left join or subtract to get the test set is very slow!
I want to do a groupBy on my categorical variable and randomly sample each category, and if there is only one observation for a category, put it in the train set.
Is there a default function or library to help with this operation?
A pretty hard problem.
I don't know of a built-in function which will help you get this. Using sampleBy and then subtraction would work, but as you said, it would be pretty slow.
Alternatively, I wonder if you can try this*:
Use window functions to add a row number, and pull everything with rownum = 1 into a separate dataframe, which you will add back to your training set at the end.
On the remaining data, use randomSplit (a dataframe method) to divide into training and test.
Add the separated data from step 1 to the training set.
This should work faster; a rough sketch follows below.
*(I haven't tried it before! Would be great if you can share what worked in the end!)
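For what it's worth, a minimal PySpark sketch of those three steps might look like the following. It is untested, and the input path and the column name category are placeholders, not taken from your data:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder input and column name -- substitute your own.
df = spark.read.parquet("path/to/data")
cat_col = "category"

# Step 1: number the rows within each category. row_number() needs an
# ordering, so F.rand() is used here just to pick an arbitrary row per level.
w = Window.partitionBy(cat_col).orderBy(F.rand(seed=42))
df_num = df.withColumn("rn", F.row_number().over(w))

guaranteed_train = df_num.filter(F.col("rn") == 1).drop("rn")
rest = df_num.filter(F.col("rn") > 1).drop("rn")

# Step 2: split the remaining rows.
train_rest, test = rest.randomSplit([0.8, 0.2], seed=42)

# Step 3: add the one-row-per-level dataframe back into training, so every
# level is guaranteed to appear in the training set.
train = train_rest.unionByName(guaranteed_train)
```

Note that this guarantees every level appears in training (matching your "single observation goes to train" rule); levels with very few rows may still be absent from the test split.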
I understand that a CRC verifies data integrity by producing a checksum, which is the result of polynomial long division. I've heard hash values referred to as hash checksums, so my question is whether hash functions use some sort of polynomial division as well? I know they break the data up into blocks, so my guess would be that the hash functions create some relationship between the polynomial check value and how the data is divided into the different blocks. Can someone let me know if I'm way off base here?
A CRC is a hash function, but there are many other ways to implement a hash function. The other ways generally do not use polynomial division, though there are some that use a CRC as a part of the hash calculation, in order to make use of hardware CRC instructions. Most hash functions use a long, convoluted series of ands, nots, exclusive-ors, integer additions, multiplications, and modulos.
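As a quick illustration of the distinction, Python's standard library happens to expose both kinds: zlib.crc32 is a CRC (polynomial division over GF(2)), while hashlib.sha256 is a general-purpose cryptographic hash built from rotations, XORs, and modular additions.

```python
import hashlib
import zlib

data = b"hello world"

# CRC-32: a small, linear hash computed by polynomial division over GF(2).
print(hex(zlib.crc32(data)))

# SHA-256: a cryptographic hash built from rotations, XORs, and modular adds.
print(hashlib.sha256(data).hexdigest())
```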
I am confused about a situation which is presented on the following slide:
The last sentence says that:
It is important to note that deterministic does not mean that x_t is non-random.
What does this mean? If A and B are random variables, then x_t must be random, right?
I think the point may be that nature may choose randomly among different paths, but once you know which path has been chosen you can predict future values of x_t on the path from past values x_{t-1}, etc. So e.g. nature may flip a coin to choose between the following two paths: x_t=0 for all t, and x_t=1 for all t. Then if you don't know the path, x_t is indeed random. But once you know x_{t-1}, you know x_t.
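To make that concrete, suppose the slide's process is something like a linear trend with random coefficients (this exact form is my assumption, not a quote from the slide):

```latex
% Assumed process: a deterministic (linear) trend with random coefficients
% A and B, drawn once and then fixed along the realized path.
\[
  x_t = A + B\,t .
\]
% Unconditionally, x_t is random because A and B are:
\[
  \mathbb{E}[x_t] = \mathbb{E}[A] + t\,\mathbb{E}[B], \qquad
  \operatorname{Var}(x_t) = \operatorname{Var}(A) + t^{2}\operatorname{Var}(B)
    + 2t\,\operatorname{Cov}(A,B).
\]
% Conditional on the realized path, it is deterministic: x_0 = A and
% x_1 - x_0 = B pin down both coefficients, so every later x_t is known
% exactly. "Deterministic" refers to the absence of new shocks entering at
% each t, not to x_t being non-random.
```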
Given a fair coin (0/1), how do you simulate a fair die (0 to 5)?
The obvious answer I know is to toss 3 times and treat each toss as a bit, producing a value from 0 to 7 (2^3 outcomes).
If the result is 6 or 7, discard and repeat.
Well, theoretically the worst case of this is really bad: there is no bound on how long you might keep rejecting (another question in itself, related to Monte Carlo / Las Vegas style algorithms). Let's keep this solution on the shelf.
So now my question is,
Is there, or can there be, a fixed number n of coin tosses that can guarantee simulating the die?
Of course, if one exists, I would like to know the minimum number of tosses. :)
Since no power of two (2^n) is divisible by three, I couldn't think of any way to solve this. :(
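For reference, the rejection approach described above is easy to write down; here is a small Python sketch (fair_coin is a stand-in for whatever coin source you actually have):

```python
import random

def fair_coin():
    """Stand-in for the fair coin: returns 0 or 1 with equal probability."""
    return random.randint(0, 1)

def fair_die():
    """Toss 3 coins to get a value in 0..7 and reject 6 and 7.

    Each attempt succeeds with probability 6/8, so the expected number of
    tosses is 3 * 8/6 = 4, but there is no fixed worst-case bound on the
    number of tosses.
    """
    while True:
        value = (fair_coin() << 2) | (fair_coin() << 1) | fair_coin()
        if value < 6:
            return value  # uniform over 0..5
```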
I need a list of large Carmichael numbers (10 to 100 digits). Is there any website which provides such data? It would also be helpful if I could get a list of large odd composite numbers and primes as well.
Did you already try The On-Line Encyclopedia of Integer Sequences? Carmichael numbers are sequence A002997, and from there you can find a link to a Table of n, a(n) for n = 1..10000.
Here are the first 33: http://oeis.org/search?q=carmichael&language=english&go=Search.
This is generally a good site for sequences.
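If you pull candidates from that table and want to sanity-check them locally, Korselt's criterion (a composite n is a Carmichael number iff it is squarefree and p - 1 divides n - 1 for every prime p dividing n; Carmichael numbers are also always odd with at least three prime factors, which the sketch uses as quick filters) is easy to code. A sketch using sympy, bearing in mind that the factorization step is only practical when the candidate's factors are within reach:

```python
from sympy import factorint

def is_carmichael(n):
    """Check Korselt's criterion. Relies on factoring n, so it is only
    practical when n's factorization is feasible."""
    if n < 3 or n % 2 == 0:
        return False
    factors = factorint(n)                      # {prime: exponent}
    if len(factors) < 3:                        # Carmichael numbers have >= 3 prime factors
        return False
    if any(e > 1 for e in factors.values()):    # must be squarefree
        return False
    return all((n - 1) % (p - 1) == 0 for p in factors)

print(is_carmichael(561))    # True  (smallest Carmichael number)
print(is_carmichael(1105))   # True
print(is_carmichael(1729))   # True
print(is_carmichael(1001))   # False (7 * 11 * 13, but 6 does not divide 1000)
```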