Let's say a 40 mm diameter tool is used during a side milling operation, and the offset is specified with syntax like this:
G41 D4
From the above syntax, my understanding is that D must be followed by the tool diameter divided by 10. Am I right?
I've seen another example where a 25 mm diameter tool was used and the D there was followed by 2, not 2.5 (25 divided by 10). How come?
Can you please explain how the number that follows "D" is obtained?
First of all, keep in mind that there are many different dialects of G-code, so you should always mention the specifications of the machine you are working with.
As described in the LinuxCNC documentation, G41 turns on cutter compensation to the left of the programmed path, and the D word that follows it is a tool number, not a diameter. So D4 refers to tool number 4 and D2 refers to tool number 2. Also
if there is no D word the radius of the currently loaded tool will be used
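so I think if you want to use the radius of the currently loaded tool, you should leave the D word out:
G41
If you would rather state the cutter diameter directly, LinuxCNC also has dynamic compensation, G41.1, where D is the diameter itself:
G41.1 D40
or
G41.1 D25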
I am working on software to encode postal addresses using the PostBar barcode symbology in use in Canada.
I can't find the relevant information for these codes. Wikipedia does describe PostBar, but with a caveat saying that the article is about the D12 type, whereas Canada Post actually uses the types D52.01/D82.01/S52.40 and S82.39, which are different and undocumented. (I also know the "CANADA POST CORPORATION 4-STATE BAR CODE HANDBOOK" document, which doesn't help.)
I need the specifics of the encoding of the fields (DCI, Postal Code, Address Locator...) and the parameters of Reed-Solomon parity bits.
I am not after an implementation, which I am able to craft myself. Thank you in advance for any tip.
This is the only thing I could find on the subject. It is not much, I'm afraid:
https://en.wikipedia.org/wiki/Canada_Post#Barcodes
Canada Post uses a 13 character barcode for their pre-printed labels. Bar codes consist of two letters, followed by eight sequence digits, and a ninth digit which is the check digit. The last two characters are the letters CA. The check digit seems to ignore the letters and only concern itself with the first 8 numeric digits. The scheme is to multiply each of those 8 digits by a different weighting factor, (8 6 4 2 3 5 9 7). Add up the total of all of these multiplications and divide by 11. The remainder after dividing by 11 gives a number from 0 to 10. Subtracting this from 11 gives a number from 1 to 11. That result is the check digit, except in the two cases where it is 10 or 11. If 10 it is then changed to a 0, and if 11 then it is changed to a 5. The check digit may be used to verify if a barcode scan is correct, or if a manual entry of the barcode is correct.
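The check-digit scheme quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described in the quote; the function names and the format checks in `is_valid` are my own assumptions, and it covers the 13-character label code, not the 4-state PostBar encoding asked about.

```python
# Weighting factors for the 8 sequence digits, as given in the quoted text.
WEIGHTS = (8, 6, 4, 2, 3, 5, 9, 7)

def check_digit(serial: str) -> int:
    """Return the check digit for an 8-digit sequence number."""
    if len(serial) != 8 or not serial.isdigit():
        raise ValueError("expected exactly 8 digits")
    total = sum(int(d) * w for d, w in zip(serial, WEIGHTS))
    result = 11 - (total % 11)  # a value from 1 to 11
    if result == 10:
        return 0
    if result == 11:
        return 5
    return result

def is_valid(barcode: str) -> bool:
    """Check a 13-character code: 2 letters, 8 digits, check digit, 'CA'.

    The layout test here is an assumption based on the description above.
    """
    if len(barcode) != 13 or not barcode[:2].isalpha() or barcode[11:] != "CA":
        return False
    body, check = barcode[2:10], barcode[10]
    if not (body.isdigit() and check.isdigit()):
        return False
    return check_digit(body) == int(check)

print(check_digit("12345678"))    # 5
print(is_valid("AB123456785CA"))  # True
```

For "12345678" the weighted sum is 204, 204 mod 11 is 6, and 11 - 6 gives 5.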
And as a bonus, an explanation of the barcodes, in Dutch:
https://www.postnl.nl/Images/Brochure-KIX-code-van-PostNL_tcm10-10210.pdf
I don't think we (Canada Post) use PostBar anymore. Management made adoption too much of a pain for the mailer, so it died. I haven't seen one on an envelope in years. Now that OCR tech is so good, it wouldn't help that much to include a PostBar anyway.
What they should have done is given away software that printed the address labels in alphanumeric order of the postal code and printed a bunch of positional marks on the top fold of the envelope based on that same postal code. That way a postal clerk need not even take the mail out of the box to see where it should be shipped to. LVMs (large volume mailers) would do this for a rebate on their bill.
As for smaller businesses or the general public, we should have just sold them prepaid envelopes in 2 or 3 standard sizes for a dime less than the cost of a stamp alone. A standard envelope can have a dedicated spot for a machine-readable postal code. I would have gone with good old public-domain Braille! Printed or in Sharpie :-) Oh well, I'm rambling now, I'll stop.
I am relatively new to vowpal wabbit and would like to find out about the -b parameter (feature bits in feature table).
My training data are something like this. There are a total of about 1 million words.
1 | a = "word" b ="word131232" c="word1233" d = "word123124" e = "word23145"
However, each row would only have 5 features. How many bits should I use? I tried to run it, and it seems that with an increasing number of examples, the number of features set seem to increase. I do not understand why this is so.
If you use -b 18 (which is the default), the features will be hashed into a table with 2^18 items, so if the number of unique features in your dataset is close to 2^18 (or even higher), you should increase the parameter -b, so there are not so many hash collisions. There is no easy way to detect the number of collisions, but the common practice is to tune the parameter -b for the best progressive validation loss (or holdout loss if you use more passes). Of course, it also depends on the available memory on your machine.
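To get a rough feeling for how -b relates to collisions, you can estimate the expected number of colliding features under an idealized hash. The sketch below assumes a perfectly uniform hash, which is not a statement about vw's actual hash function, and uses 1,000,000 unique features only because that is the figure from the question; `expected_collisions` is a made-up helper, not part of vw.

```python
import math

def expected_collisions(n_features: int, bits: int) -> float:
    """Expected number of features landing in an already occupied slot.

    Assumes each feature is hashed uniformly at random into 2**bits slots.
    """
    slots = 2 ** bits
    # Expected occupied slots: slots * (1 - (1 - 1/slots) ** n_features),
    # written with log1p/expm1 to stay accurate when 1/slots is tiny.
    occupied = -slots * math.expm1(n_features * math.log1p(-1.0 / slots))
    return n_features - occupied

for bits in (18, 20, 22, 24, 26):
    print(bits, round(expected_collisions(1_000_000, bits)))
```

The estimate only says how crowded the table is; whether the collisions actually hurt is still best judged by the validation loss, as described above.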
1 | a = "word" b ="word131232" c="word1233" d = "word123124" e = "word23145"
Note that this example is wrong (not what you intended) because of the spaces around =. The equal sign has no special meaning (unlike colon which is used for separating the feature value). Features cannot contain space in their name. There is no need to enclose feature names in quotes. So the example should look like
1 | word word131232 word1233 word123124 word23145
If the prefix a, b, c, d, e has some special meaning (i.e. a=word42 should be a different feature than b=word42) you can use:
1 | a=word b=word131232 c=word1233 d=word123124 e=word23145
If all your words are already mapped to integers (within the range 0-2^b), you can use them directly as feature names and no hashing will be done (unless you specify --hash=all):
1 | 0 131232 1233 123124 23145
See the wiki page about input format.
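If your data currently sits in some other structure, a small helper can emit lines in the prefixed form shown above. This is a minimal sketch: `to_vw_line` and the dict layout are hypothetical, and it only guards against the characters that have a special meaning in the input format (whitespace, colon and the pipe).

```python
def to_vw_line(label, features):
    """Build one vw input line from a label and a {prefix: word} mapping."""
    tokens = []
    for prefix, word in features.items():
        token = f"{prefix}={word}"
        # Whitespace would split the feature, ':' would start a value,
        # and '|' would start a new namespace.
        if any(ch.isspace() or ch in ":|" for ch in token):
            raise ValueError(f"invalid character in feature {token!r}")
        tokens.append(token)
    return f"{label} | " + " ".join(tokens)

row = {"a": "word", "b": "word131232", "c": "word1233",
       "d": "word123124", "e": "word23145"}
print(to_vw_line(1, row))
# 1 | a=word b=word131232 c=word1233 d=word123124 e=word23145
```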
the number of features set seem to increase
In the progress report (printed by default at every 2^x-th example), the last column, current features, shows the number of features for the current example (including the constant feature and quadratic/cubic/... features if you use them). It should not be increasing (unless your data are unusual).
In the final report, vw prints total feature number, which is the average number of features per example times the number of examples times the number of passes (so it is not the number of unique features in the dataset).
I have a set of say, 100, genomic features for which I've created a fasta file with a 500 bp window around each. I've searched these windows for a DNA sequence and found an average of 1.5 sequences per individual 500 bp window in the feature set. By chance, I expect the sequence to be present once every 1024 bp, or on average ~0.49 of my sequence per 500 bp window.
My question is how can I determine whether the 1.5 binding sites per individual feature I've uncovered is significant or not, and obtain a p-value?
And as a follow up, if I use the same set of 100 windows and search for a different sequence with the same probability (1/1024) and determine that there are now an average of 0.9 sequences per individual window, how can I determine whether this is significantly different than the 1.5 for the sequence for which I searched above?
As a second follow up, if I search for the same two sequences above (both found on average 1/1024 base pairs) in a different set of 500 bp windows for a different feature type (say, n=50), how can I determine if the results of this search are significantly different than the results above (particularly if the difference between sequence A and sequence B in feature set 1 and feature set 2 is significant)?
Thank you in advance.
I ended up using simulations to answer all of the above questions. Generate windows of the desired size, 500 bp in this case, of random genomic sequence. Search for motifs in X windows (where X = number of individuals in a feature set) and compare with the results obtained by searching for motifs in the features of interest. Repeat with a sample size equal to that of the second feature set being analyzed. To compare features with one another, do the analogous simulation and compare the results.
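Here is a minimal sketch of that kind of simulation. Several things in it are assumptions rather than what I actually ran: the background is uniform random A/C/G/T instead of sequence sampled from a genome, the motif "ACGTA" is a hypothetical 5-mer (a fixed 5-mer matches the 1/1024 = 4^-5 chance rate), and the function names are made up.

```python
import random

def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def simulate_mean_counts(motif, n_windows, window_len=500, n_sims=1000, seed=0):
    """Mean motif count per window for each of n_sims random window sets."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        total = 0
        for _ in range(n_windows):
            window = "".join(rng.choices("ACGT", k=window_len))
            total += count_motif(window, motif)
        means.append(total / n_windows)
    return means

def empirical_p_value(observed_mean, simulated_means):
    """One-sided p-value: share of simulations at least as extreme as observed."""
    hits = sum(m >= observed_mean for m in simulated_means)
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return (hits + 1) / (len(simulated_means) + 1)

if __name__ == "__main__":
    null_means = simulate_mean_counts("ACGTA", n_windows=100, n_sims=200)
    print(empirical_p_value(1.5, null_means))
```

The smallest p-value this can report is 1/(n_sims + 1), so increase n_sims if you need a finer resolution. For the second feature set, rerun with n_windows=50; to compare two motifs, simulate the difference of their means in the same way and see where the observed difference falls.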
The IADD instruction in IJVM adds two 1-word numbers. When I add EEEEEEEE to itself I get DDDDDDDC. What happens to the carry 1? How can I get it? Is it saved in a register?
It appears that the carry-out bit is lost.
No version of the IJVM Assembly Language Specification that I've come across says anything about a carry-out bit, or carry flag.
IADD Pop two words from stack; push their sum
downeyt adds:
The MIC1 that interprets IJVM only has two condition codes, N and Z. The carry out from the ALU is not stored. The microarchitecture could be modified to store the carry out, like it stores the N and Z bits.
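To see what happens to the lost bit, the addition can be mimicked outside IJVM. The sketch below is Python, not IJVM code, and assumes 32-bit words; the helper names are made up. The second function uses a general property of unsigned arithmetic (the wrapped sum is smaller than an operand exactly when a carry occurred), which is not something the IJVM specification mentions.

```python
MASK = 0xFFFFFFFF  # assuming 32-bit words

def iadd(a: int, b: int):
    """Return (wrapped 32-bit sum, carry-out) for two unsigned words."""
    full = (a & MASK) + (b & MASK)
    return full & MASK, full >> 32

def carry_from_result(a: int, result: int) -> int:
    """Recover the carry from one operand and the wrapped sum alone."""
    return 1 if (result & MASK) < (a & MASK) else 0

total, carry = iadd(0xEEEEEEEE, 0xEEEEEEEE)
print(f"{total:08X} carry={carry}")               # DDDDDDDC carry=1
print(carry_from_result(0xEEEEEEEE, total))       # 1
```

On IJVM itself the comparison would have to be built from the signed instructions that are available, so treat this as an explanation of the arithmetic rather than a recipe.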
Is there an online version of the k-Means clustering algorithm?
By online I mean that every data point is processed in serial, one at a time as they enter the system, hence saving computing time when used in real time.
I have written one myself with good results, but I would really prefer to have something "standardized" to refer to, since it is to be used in my master's thesis.
Also, does anyone have advice for other online clustering algorithms?
(lmgtfy failed ;))
Yes there is. Google failed to find it because it's more commonly known as "sequential k-means".
You can find two pseudo-code implementations of sequential K-means in this section of some Princeton CS class notes by Richard Duda. I've reproduced one of the two implementations below:
Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*(x - mi)
    end_if
end_until
The beautiful thing about it is that you only need to remember the mean of each cluster and the count of the number of data points assigned to the cluster. Once you update those two variables, you can throw away the data point.
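For reference, the pseudo-code above translates almost line by line into Python. This is my own minimal sketch, not code from the class notes; the class name is made up, and squared Euclidean distance is assumed for "closest".

```python
class SequentialKMeans:
    """Online k-means: keeps only one mean and one count per cluster."""

    def __init__(self, initial_means):
        self.means = [list(m) for m in initial_means]  # initial guesses
        self.counts = [0] * len(self.means)            # n1..nk start at zero

    def update(self, x):
        """Assign x to its closest mean, move that mean, return its index."""
        i = min(
            range(len(self.means)),
            key=lambda j: sum((m - a) ** 2 for m, a in zip(self.means[j], x)),
        )
        self.counts[i] += 1
        rate = 1.0 / self.counts[i]
        # mi <- mi + (1/ni) * (x - mi)
        self.means[i] = [m + rate * (a - m) for m, a in zip(self.means[i], x)]
        return i

km = SequentialKMeans([[0.0, 0.0], [10.0, 10.0]])
for point in ([1, 1], [9, 9], [3, 3], [11, 11]):
    km.update(point)
print(km.means)   # [[2.0, 2.0], [10.0, 10.0]]
print(km.counts)  # [2, 2]
```

Note that with this update rule the first point assigned to a cluster replaces the initial guess entirely (1/ni is 1), so the initial means only decide which cluster the early points fall into.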
I'm not sure where you would be able to find a citation for it. I would start looking in Duda's classic text Pattern Classification and Scene Analysis or the newer edition Pattern Classification. If it's not there, you could try Chris Bishop's newest book or Daphne Koller and Nir Friedman's recent text.