I am using genetic algorithms to determine survival in my Netlogo model, and the ultimate output of the GA is a decimal number between 0 and 1, inclusive. For crossover / mutation purposes, I need to work with gray code rather than binary numbers. I have a function to convert binary to decimal, but not gray code to binary (which I've struggled with).
Any suggestions on how to code a gray code to binary function?
Thanks for the above comments.
I did find another solution: the "bitstring" extension for NetLogo. I spoke with the author, and he added a "to-gray" function.
https://github.com/garypolhill/netlogo-bitstring
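For completeness, the Gray-to-binary conversion itself is just a running XOR down the bits, so it is easy to code by hand if you'd rather not use the extension. A minimal sketch of the algorithm (written in MATLAB here rather than NetLogo, purely to show the idea) might look like:
function b = gray2bin(g)
    % g and b are bit vectors, most significant bit first.
    % The first binary bit is copied; each later binary bit is the XOR
    % of the previous binary bit and the current Gray bit.
    b = zeros(size(g));
    b(1) = g(1);
    for i = 2:numel(g)
        b(i) = xor(b(i-1), g(i));
    end
end
For example, gray2bin([1 1 0 1]) returns [1 0 0 1], i.e. Gray 1101 is binary 1001 (decimal 9); that result can then be fed to your existing binary-to-decimal function.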
I want to convert data (double precision, 15 decimal digits) to another type (quadruple precision, 34 decimal digits), so I used the vpa function like this:
data = sin(2*pi*frequency*time);
quad_data = vpa(data,34);
But the type of the result is sym, not double, and when I checked each cell of the sym data, there was a 1x1 sym in each cell. I tried to apply the fft function to quad_data, but it didn't work. Is there any way to change the precision of the double type from 15 to 34 digits?
The only numeric floating point types that MATLAB currently supports are double, single, and half. Extended precision types can be achieved via the Symbolic Math Toolbox (e.g., vpa) or 3rd party code (e.g., John D'Errico's FEX submission, the High Precision Floating point (HPF) class). But even then, only a subset of floating point functions will typically be supported. If the function you are trying to use doesn't support the variable type, then you would have to supply your own function.
Also, you are not building vpa objects properly in the first place. Typically you would convert the operands to vpa first and then do arithmetic on them. Doing the arithmetic in double precision first as you are doing with data, and then converting to extended precision vpa, just adds garbage to the values. E.g., set the digits first and then use vpa('pi') to get the full extended precision version of pi as a vpa variable.
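For instance (a sketch only; frequency and time are the variables from the question), keeping everything in vpa from the start would look roughly like:
digits(34);                                              % work with 34 significant digits
quad_data = sin(2*vpa('pi')*vpa(frequency).*vpa(time));  % convert the operands first, then do the arithmetic
% Note: quad_data is still of class sym, so functions such as fft that
% require a double input will still refuse it.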
There is a commercial 3rd-party toolbox for this purpose, called the Multiprecision Computing Toolbox for MATLAB.
This tool implements many of the mathematical operations you would expect from double inputs, and according to benchmarks on the website, it's much faster than vpa.
Disclosure: I am not affiliated with the creators of this tool in any way; however, I can say that we had a good experience with this tool for one of our lab's projects.
The other suggestion I can give is to do the high-precision arithmetic in another language/environment to which MATLAB provides interfaces (e.g., C, Python, Java), and which should have the quad data type implemented.
I made a basic OCR system in MATLAB using correlation. (It's not a professional project, just an exercise, and I am not using MATLAB's ocr() function.) My code works almost correctly on clean text images. But if I make the job a little harder (photographing the text from the side, at an angle), my code does not give good results. I use Principal Component Analysis to correct the text alignment, but when I do this (photo taken at an angle), the characters end up very close together and I can't separate them for the recognition step.
Original image and after preprocessing (adaptive thresholding, adjusting, PCA)
How can I separate the characters correctly?
An alternative to what Yves suggests is to erode the image, which is implemented as imerode in MATLAB. Perhaps scale the image first (though it is not needed here).
e.g. with this code:
ocr(imerode(I,strel('disk',3)))
where I is your "BOOLEAN" black-white image, I receive
ocrText with properties:
Text: 'BOOLEAN↵↵'
CharacterBoundingBoxes: [9×4 double]
CharacterConfidences: [9×1 single]
Words: {'BOOLEAN'}
WordBoundingBoxes: [14 36 208 43]
WordConfidences: 0.5477
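If you are starting from the raw photo rather than an already-binarized image, the same idea could be wired up roughly as below; the file name, the adaptive-threshold choice, and the disk radius are all placeholders to tune for your images:
img = imread('boolean_photo.jpg');             % placeholder file name
bw  = imbinarize(rgb2gray(img), 'adaptive');   % adaptive thresholding, as in the question
bw  = imerode(bw, strel('disk', 3));           % shrink the strokes so touching letters separate
results = ocr(bw);
results.Words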
Splitting characters is a pretty difficult problem.
Unless the character widths are constant (which is the case for this image but might not be true with other letters), methods based on projection analysis (vertical extent of the characters as a function of abscissa) will fail.
Actually, for a method to be effective, it must be font-aware, i.e. know in advance what the alphabet looks like. In other words, you can't separate segmentation from recognition.
A possibility is to attempt to decompose the blob assumed to be made of touching characters (possibly based on projections or known character sizes), perform the recognition, and check the recognition results. Preferably, try several decompositions and keep the best.
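As a rough illustration of the projection-based decomposition (with the caveats above, and assuming lineBW is a binary image of a single line with character pixels set to true), one could cut at the middle of the near-empty columns and then let the recognizer judge the result:
profile  = sum(lineBW, 1);                  % ink per column (vertical projection)
g        = double(profile <= 1);            % columns that are (almost) empty
edges    = diff([0, g, 0]);
gapStart = find(edges == 1);
gapEnd   = find(edges == -1) - 1;
cutCols  = round((gapStart + gapEnd) / 2);  % candidate cut positions
% Slice lineBW at cutCols, recognize each slice, and keep the decomposition
% whose recognition score is best.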
I encountered a problem while using MATLAB. I'm doing some computations on OTC instruments (pricing, constructing a discount curve, etc.), first in Excel and then in MATLAB (for comparison). While I'm 100% sure that the computations in Excel are good (compared to market data), it seems that MATLAB is producing some differences (e.g., -4.18E-05). The MATLAB algorithm looks fine. I was wondering - maybe it is because MATLAB rounds some computations - I have heard a little bit about that. I tried to convert double numbers using the vpa() function, but it doesn't seem to work with doubles. Any other ideas?
Excel uses 64 bit double precision floating point numbers compliant with IEEE 754 floating point specification.
The way that Excel treats results like =1/5 and appears to compute them exactly (despite this example not being a dyadic rational) is purely down to formatting. It handles =1/3 + 1/3 + 1/3 similarly. It's quite smart really if you think about it: the implementers of Excel had no real choice given that the average Excel user is not au fait with the finer points of floating point arithmetic and would simply scorn a spreadsheet package that "couldn't even get 1/5 correct".
That all said, you're very unlucky if you get a difference of -4.18E-05 between the two systems. That's because double floating point is accurate to around 15 significant figures. Your algorithms would have to be implemented very poorly indeed for the error terms to bubble up to that magnitude if you're consistently using double precision floating point types.
Most likely (and I too work in finance), the difference will be in the way you're interpolating your discount curve. That's where I would look first if I were you.
Given the value of the error compared to the default format settings, this is almost certainly because of using the default format short and comparing the output on the command line to the real value.
x = 5.4444418
Output:
x =
5.4444
Then:
x-5.4444
Output:
ans =
4.1800e-05
The value stored in x remains 5.4444418; it is only the value displayed on the command line that changes.
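A quick way to see this for yourself is to switch the display format; the stored value does not change, only the number of digits printed:
x = 5.4444418;
format short
x            % displayed as 5.4444
format long
x            % displays the full stored value, approximately 5.444441800000000
x - 5.4444   % approximately 4.1800e-05, the "difference" from the question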
It's part of the process of OCR, which is:
How do I segment sentences into words, and then into characters?
What are candidate algorithms for this task?
As a first pass:
process the text into lines
process a line into segments (connected parts)
find the largest white band that can be placed between each pair of segments.
look at the sequence of widths and select "large" widths as white space.
everything between white space is a word.
Now all you need is a good enough definition of "large".
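A rough MATLAB sketch of steps 2-4 for a single line (assuming the Image Processing Toolbox, and a binary image lineBW with text pixels set to true) might be:
cc      = bwconncomp(lineBW);                                 % step 2: connected segments
stats   = regionprops(cc, 'BoundingBox');
boxes   = sortrows(reshape([stats.BoundingBox], 4, [])', 1);  % one [x y w h] row per segment, left to right
gaps    = boxes(2:end, 1) - (boxes(1:end-1, 1) + boxes(1:end-1, 3));  % step 3: white band widths
isSpace = gaps > 2 * median(gaps);                            % step 4: a crude definition of "large"
% Step 5: each run of segments between the gaps flagged in isSpace is one word.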
First, NIST (the National Institute of Standards and Technology) published a protocol known as the NIST Form-Based Handwriting Recognition System about 15 years ago for this exact question, i.e., extracting and preparing text-as-image data for input to machine learning algorithms for OCR. Members of this group at NIST also published a number of papers on this System.
The performance of their classifier was demonstrated by data also published with the algorithm (the "NIST Handwriting Sample Forms.")
Each of the half-dozen or so OCR data sets I have downloaded and used has referenced the data extraction/preparation protocol used by NIST to prepare the data for input to their algorithm. In particular, I am pretty sure this is the methodology relied on to prepare the Boston University Handwritten Digit Database, which is regarded as benchmark reference data for OCR.
So if the NIST protocol is not a genuine standard at least it's a proven methodology to prepare text-as-image for input to an OCR algorithm. I would suggest starting there, and using that protocol to prepare your data unless you have a good reason not to.
In sum, the NIST data was prepared by extracting 32 x 32 normalized bitmaps directly from a pre-printed form.
Here's an example:
00000000000001100111100000000000
00000000000111111111111111000000
00000000011111111111111111110000
00000000011111111111111111110000
00000000011111111101000001100000
00000000011111110000000000000000
00000000111100000000000000000000
00000001111100000000000000000000
00000001111100011110000000000000
00000001111100011111000000000000
00000001111111111111111000000000
00000001111111111111111000000000
00000001111111111111111110000000
00000001111111111111111100000000
00000001111111100011111110000000
00000001111110000001111110000000
00000001111100000000111110000000
00000001111000000000111110000000
00000000000000000000001111000000
00000000000000000000001111000000
00000000000000000000011110000000
00000000000000000000011110000000
00000000000000000000111110000000
00000000000000000001111100000000
00000000001110000001111100000000
00000000001110000011111100000000
00000000001111101111111000000000
00000000011111111111100000000000
00000000011111111111000000000000
00000000011111111110000000000000
00000000001111111000000000000000
00000000000010000000000000000000
I believe that the BU data-prep technique subsumes the NIST technique but adds a few steps at the end, not with higher fidelity in mind but to reduce file size. In particular, the BU group (see the sketch after this list):
began with the 32 x 32 bitmaps; then
divided each 32 x 32 bitmap into non-overlapping 4 x 4 blocks;
counted the number of activated pixels in each block ("1" is activated; "0" is not);
the result is an 8 x 8 input matrix in which each element is an integer (0-16).
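A minimal MATLAB sketch of that block-counting step, assuming bitmap is a 32 x 32 array of 0s and 1s like the example above:
counts = zeros(8, 8);
for r = 1:8
    for c = 1:8
        block = bitmap(4*r-3:4*r, 4*c-3:4*c);  % one non-overlapping 4 x 4 block
        counts(r, c) = sum(block(:));          % number of activated pixels, 0-16
    end
end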
For finding a binary sequence like 101000000000000000010000001, detect the sub-sequences 0000, 0001, 001, 01, 1.
I am assuming you are using the Image Processing Toolbox in MATLAB.
To distinguish text in an image, you might want to follow these steps:
Grayscale conversion (speeds things up greatly).
Contrast enhancement.
Erode the image lightly to remove noise (scratches/blips).
Dilation (heavy).
Edge detection (or ROI calculation).
With trial and error, you'll get the proper parameters such that the image you obtain after the 5th step contains convex regions surrounding each letter/word/line/paragraph.
NOTE:
Essentially, the more you dilate, the larger the elements you get; i.e., the least dilation would be useful in identifying letters, whereas comparatively heavy dilation would be needed to identify lines and paragraphs.
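Here is a rough MATLAB sketch of steps 1-5; the file name is a placeholder, a binarization step is added before the morphology, and the structuring-element sizes are exactly the trial-and-error parameters mentioned above:
img   = imread('page.jpg');               % placeholder file name
g     = rgb2gray(img);                    % 1. grayscale
g     = imadjust(g);                      % 2. contrast enhancement
bw    = imbinarize(g);                    %    (binarize before the morphology)
bw    = imerode(bw, strel('disk', 1));    % 3. light erosion to remove blips
bw    = imdilate(bw, strel('disk', 7));   % 4. heavy dilation: letters merge into word/line blobs
stats = regionprops(bw, 'BoundingBox');   % 5. regions of interest to crop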
Online ImgProc MATLAB docs
Check out the "Examples in Documentation" section in the online docs or refer to the Image Processing Toolbox documentation in the MATLAB Help menu.
The examples given there will guide you to the proper functions to call and their various formats.
I tried to assign a very small number to a double value, like so:
double verySmall = 0.000000001;
That's 9 fractional digits. For some reason, when I multiply this value by 10, I get something like 0.000000007. I vaguely remember there being problems with writing numbers like this as plain-text literals in source code. Do I have to wrap it in some function or directive in order to feed it correctly to the compiler? Or is it fine to type such small numbers directly in the source?
The problem is with floating point arithmetic, not with writing literals in source code. It is not designed to be exact. The best way around it is to not use the built-in double: use integers only (if possible) with power-of-10 coefficients, sum everything up, and display the final useful figure after rounding.
Standard floating point numbers are not stored in a perfect format, they're stored in a format that's fairly compact and fairly easy to perform math on. They are imprecise at surprisingly small precision levels. But fast. More here.
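Although the question is about Objective-C, the representation issue is identical for IEEE 754 doubles in any language; for illustration, in MATLAB:
fprintf('%.17g\n', 0.1 + 0.2)    % prints 0.30000000000000004, not 0.3
fprintf('%.17g\n', 0.000000001)  % what is stored is the closest double to 1e-9, not 1e-9 exactly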
If you're dealing with very small numbers, you'll want to see if Objective-C or Cocoa provides something analogous to the java.math.BigDecimal class in Java. This is precisely for dealing with numbers where precision is more important than speed. If there isn't one, you may need to port it (the source to BigDecimal is available and fairly straightforward).
EDIT: iKenndac points out the NSDecimalNumber class, which is the analogue for java.math.BigDecimal. No port required.
As usual, you need to read stuff like this in order to learn more about how floating-point numbers work on computers. You cannot expect to be able to store any random fraction with perfect results, just as you can't expect to store any random integer: underneath it all there are bits, and there are only a limited number of them.