For my computer vision class, I'm going to be doing a project where I extract information about a hallway based on an image of that hallway. In particular, the lines of the hallway which extend toward a vanishing point will be of interest. My question is whether I should use Matlab, OpenCV, or something else to implement this.
I don't have a ton of time for this project. This makes Matlab seem like a good option, since you can usually get things up and running quickly there. On the other hand, I hope to take what I do for this class project and extend it further for research once the class is complete. That makes OpenCV seem better, as (from what I've read) it's much more efficient. Another option would be to implement it in Matlab for the project and then port that code to OpenCV later. It should be noted that I have plenty of experience with C/C++, but only a little with Matlab and OpenCV.
At the moment, I'm leaning toward just using OpenCV from the start. However, I would like the opinion of someone who's had a bit more experience here than myself. If you'd recommend something over both OpenCV and Matlab, please say so. Also, if you have any tips on what packages or toolkits might be useful for such a project, they would be greatly appreciated.
Any suggestions? Thanks for your time!
With which one would it be easier for you to write a piece of code that reads an image file and displays it?
If you know C++ very well, then it should be easy to debug the code. Since you say you have little experience with Matlab, a small mistake in the code there could take a long time to track down.
So I suggest breaking the problem down into:
- Read the image and display it; this is very easy in both.
- Detect edges using a simple, classic method; this is super easy in both. Display the result and visually check that it's done correctly.
- Use a robust line-fitting method. RANSAC and the Hough transform are probably what you're going to use, and OpenCV makes using them easier than you might guess (see the sketch after this list). Matlab also has built-in functions to detect lines with the Hough transform and gives you the start/end points of each segment, but if you're only after the vanishing point, you shouldn't need those.
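For example, the edge-detection and line-fitting steps above might look roughly like this with OpenCV's Python bindings (the C++ API mirrors these calls; "hallway.jpg" and the thresholds are placeholder values you would tune):

```python
import cv2
import numpy as np

img = cv2.imread("hallway.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Classic edge detection.
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform: returns line segments as (x1, y1, x2, y2).
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=50, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

cv2.imshow("edges", edges)
cv2.imshow("lines", img)
cv2.waitKey(0)
```

The intersections of the dominant lines (e.g. found with RANSAC over pairwise intersections) then give you a vanishing-point estimate.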
The decision is yours; this is not a very difficult problem, and you can find loads of help on the web. Good luck with the project, and please let us know how it goes.
I'm building a solution to fit a number of objects into a box as efficiently as possible. I hope to implement more efficient algorithms soon, but to start with I'm going to use the brute-force method, checking every possible position. This is fine for now since the box is small and holds very few items; later, the complexity will grow.
I'm using Unity to let the user see how the items ultimately fit in the box. My initial thought was to also use Unity's physics and collision detection to implement the best-fit algorithm; but with a huge number of locations and positions to check, is this a bad approach? Am I much better off running my algorithm against a data structure instead? A 10x10x10 box with even three 1x1x1 objects has almost a billion possible position combinations...
I'm new to Unity so any advice is welcome; thanks!
Update: right, so this problem is definitely in the bin-packing family, which I know is NP-hard. I'm assuming a rectangular box filled with rectangular, box-shaped items of random dimensions.
My question is: given my particular algorithm, when we ask "is there currently something in this x,y,z space?", would it be more efficient to figure that out via code, or to use Unity objects with collision detection?
Based on the answers I've seen, I can see using Unity would be profoundly inefficient.
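For reference, the data-structure alternative mentioned above could be as simple as a 3D occupancy grid; a minimal sketch (Python standing in for the eventual C#, assuming whole-cell positions and axis-aligned items; all names are placeholders):

```python
import numpy as np

# A 10x10x10 boolean grid: True means that unit cell is already occupied.
grid = np.zeros((10, 10, 10), dtype=bool)

def can_place(grid, pos, size):
    # pos = (x, y, z) corner of the item, size = (w, h, d) in whole cells.
    x, y, z = pos
    w, h, d = size
    region = grid[x:x + w, y:y + h, z:z + d]
    # Reject placements that stick out of the box or overlap an occupied cell.
    return region.shape == (w, h, d) and not region.any()

def place(grid, pos, size):
    x, y, z = pos
    w, h, d = size
    grid[x:x + w, y:y + h, z:z + d] = True

# Brute force would then try every position/item combination and keep the best packing.
if can_place(grid, (0, 0, 0), (1, 1, 1)):
    place(grid, (0, 0, 0), (1, 1, 1))
```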
If you LITERALLY want to know:
"is there currently something in this x,y,z space?"
then the best possible way to do that is to simply use Unity's engine. You trivially check the AABB to see whether a point is inside it (or perhaps just check for an intersection), using one of the many queries the engine already provides.
I understand that the question "is there currently something in this x,y,z space?" is or could be one important part of whatever solution you are planning. And indeed the best way to do that is to let Unity's engine do that. It's absolutely impossible you or I could write anything as efficient -- to begin with it comes right off the quaternion cloud in the GPU.
That is the actual answer to what you have now stated is your specific question.
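Conceptually, the AABB tests being referred to reduce to a few comparisons per axis. Here is a plain-Python sketch of the idea only (not the Unity API; inside Unity, the engine's bounds queries such as Bounds.Contains and Bounds.Intersects already do this for you, and do it faster):

```python
def point_in_aabb(p, box_min, box_max):
    # p, box_min, box_max are (x, y, z) tuples; box_min/box_max are the
    # minimum and maximum corners of the axis-aligned box.
    return all(box_min[i] <= p[i] <= box_max[i] for i in range(3))

def aabb_overlap(min_a, max_a, min_b, max_b):
    # Two axis-aligned boxes intersect iff their extents overlap on every axis.
    return all(min_a[i] <= max_b[i] and min_b[i] <= max_a[i] for i in range(3))
```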
Now regarding the more general issue, which I first fully explained when that was the question you were asking :)
Here are some of my thoughts on trivial "box packing" algorithms in 2D, at the level useful in video games.
https://stackoverflow.com/a/35228592/294884
Regarding 3D "box packing" it's absolutely impossible to offer any guidance unless you include a screen shot of what you are trying to do and fully explain the shapes and constraints involved.
If you are a mathematician looking for the latest algorithmic thinking on the matter, just google something like "3d box packing algorithm".
Again, readers here have utterly no clue what shapes/etc you are dealing with, so please click Edit and explain!
Note too that sphere packing is a really fascinating scientific problem, if that's what you are talking about:
https://en.wikipedia.org/wiki/Close-packing_of_equal_spheres
Are there any tools or algorithms in Matlab or OpenCV that will take multiple images of an object as input (from different locations around the object) and produce the 3D coordinates of the object in the world?
Like Naveh said, in OpenCV the building blocks are there, but putting it together is something you would have to do.
That being said, people have generated a number of SfM tools in both C++ and Matlab. Depending on your goals there are a number of prepackaged things you can look at:
-There is an SfM Matlab Toolbox here; I have not personally used it, but I've seen it mentioned a number of times.
-If you are just looking for a black-box solution, check out Visual SfM; it is a GUI-fied version of a common SfM workflow.
-A while ago I put together a guide for installing the Visual SfM components individually on Fedora, in case you want to dig into them. I'm not sure how relevant it is now, but it might help.
Regardless, you should certainly educate yourself on the processes involved in creating 3D structure from imagery. It is a complicated process with many details which need to be understood.
What you are asking for is a fully fledged structure from motion algorithm. I don't think such a thing exists in MATLAB or OpenCV right off the shelf. However, the building blocks required for such an algorithm are there.
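To give a sense of what those building blocks look like, here is a bare-bones two-view reconstruction sketched with OpenCV's Python bindings (the image names and the intrinsic matrix K are placeholders; a real SfM pipeline adds more views, bundle adjustment, and careful outlier handling):

```python
import cv2
import numpy as np

# Placeholder inputs: two images of the object and the camera's calibrated intrinsics.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])  # replace with real calibration values

# 1. Detect and match features between the two views.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.array([kp1[m.queryIdx].pt for m in matches], dtype=np.float64)
pts2 = np.array([kp2[m.trainIdx].pt for m in matches], dtype=np.float64)

# 2. Recover the relative camera pose from the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate the inlier matches into 3D points.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
inliers = mask.ravel() > 0
pts4d = cv2.triangulatePoints(P1, P2, pts1[inliers].T, pts2[inliers].T)
pts3d = (pts4d[:3] / pts4d[3]).T
print(pts3d.shape[0], "3D points recovered (up to scale)")
```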
I suggest you do some background reading to better understand which specific algorithm will suit your needs. A good place to start is chapter 7 of Richard Szeliski's textbook; a free draft is available here. The book is recommended both as a good general computer vision textbook and, for your question in particular, because this is an area in which Szeliski himself is quite an expert.
I have an MP3 file and need to continuously detect and show the Hz value of this MP3 file as it plays. A bit of googling shows that I have two options: use an FFT, or use Apple's Accelerate framework. Unfortunately I haven't found an easy-to-use sample of either. All the samples, like AurioTouch etc., need tons of code just to get a simple number for a sample buffer. Is there an easy example of pitch detection for iOS?
For example, I've found https://github.com/clindsey/pkmFFT, but it's missing some files that its author has removed. Is there anything similar that works?
I'm afraid not. Working with sound is generally hard, and Core Audio is no exception. Now to the matter at hand.
The FFT is an algorithm for transforming input from the time domain to the frequency domain. It is not inherently tied to sound processing; you can use it for things other than sound as well.
Accelerate is an Apple-provided framework which, among many other things, offers an FFT implementation. So you don't actually have two options there, just one technique and an implementation of it.
Now, depending on what you want to do (e.g. whether you favour speed over accuracy, or robustness over simplicity) and the type of waveform you have (simple, complex, human speech, music), the FFT may not be enough on its own, or may not even be the right choice for your task. There are other options: autocorrelation, zero-crossing, cepstral analysis, and maximum likelihood, to mention a few. None of them is trivial except zero-crossing, which also gives you the poorest results and will fail on complex waveforms.
Here is a good place to start:
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
There are also other questions on SO.
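To make the FFT route concrete, the core idea behind FFT-based pitch estimation (roughly what the article linked above walks through) looks something like the following NumPy sketch; on iOS you would perform the same steps with Accelerate's vDSP FFT routines rather than NumPy:

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    # Window the buffer to reduce spectral leakage, transform it, and pick the
    # frequency bin with the largest magnitude.
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Example: a 440 Hz sine sampled at 44.1 kHz.
sr = 44100
t = np.arange(2048) / sr
print(dominant_frequency(np.sin(2 * np.pi * 440.0 * t), sr))  # roughly 440 Hz
```

Keep in mind that plain peak-picking like this is limited by the FFT bin resolution (about 21 Hz for a 2048-sample buffer at 44.1 kHz), which is one reason real pitch trackers add interpolation or use the alternative methods listed above.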
However, as indicated by other answers, this is not something that can just be "magically" done. Even if you license code from someone (e.g., iZotope and z-plane both make excellent code for doing what you want), you still need to understand what's going on to get data into and out of their libraries.
If you need fast pitch detection, go with http://www.schmittmachine.com/dywapitchtrack.html
You'll find iOS sample code inside.
If you need an FFT, you should use Apple's Accelerate framework.
Hope this helps.
Hi, I have been searching through research papers for features that would be good to use in the neural network I'm building to classify handwritten characters for OCR. I am a beginner, so until now I have just taken the image of the handwritten character, put a bounding box around it, and resized it to a 15x20 binary image, which gives me an input layer of 300 features. In the papers I have found on Google (most of which are quite old) the methods really vary. My accuracy is not bad with just the binary grid of the image, but I was wondering if anyone has other features I could use to boost my accuracy, or could even just point me in the right direction. I would really appreciate it!
Thanks,
Zach
I haven't read any actual papers on this topic, but my advice would be to get creative. Use anything you could think of that might help the classifier identify numbers.
My first thought would be to try to identify "lines" in the image, maybe via a modified "sliding window" algorithm (a sliding/rotating line?), or to try to find a "line of best fit" for the image (to help the classifier respond to changes in slant or writing style). Really, though, if you're using a neural network, it should be picking up on these sorts of things without your manual help (that's the whole point of them!).
I would focus first on the structure and topology of your net to try to improve performance, and worry about additional features only if you cannot get satisfactory performance some other way. You could also try improving the features you already have: make sure the character is centered in the image, and maybe try an algorithm that deskews italicised characters so they are upright.
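One concrete version of that deskewing idea is the moment-based shear used in OpenCV's digit-recognition sample; a rough sketch, assuming a binarised character image (e.g. your 15x20 grid):

```python
import cv2
import numpy as np

def deskew(char_img):
    # char_img: a binarised (0/255) character image, e.g. a 15x20 grid.
    h, w = char_img.shape
    m = cv2.moments(char_img)
    if abs(m["mu02"]) < 1e-2:
        return char_img.copy()  # already (close to) upright
    skew = m["mu11"] / m["mu02"]
    # Shear horizontally so the character's principal axis becomes vertical.
    M = np.float32([[1, skew, -0.5 * h * skew],
                    [0, 1, 0]])
    return cv2.warpAffine(char_img, M, (w, h),
                          flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
```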
In my experience these sorts of things don't often help, but you could get lucky and run into one that improves your net :)
I have a small UIImage (jpg) with a single typed number. I want to be able to read the number with some kind of pattern recognition. I'm really not sure where to start, so any help would be appreciated.
My initial idea was to compare this image with other images: for instance, compare the image with that of a 1, 2, 3, etc. until a match is found. That just seems slow and cumbersome, and I wondered if there is a better way to do it.
Thanks
Update - I'm trying to convert sudoku puzzles from newspaper print to interactive puzzles
No, you are right, it will be slow and cumbersome. But on the plus side, you don't have to write it yourself:
http://sourceforge.net/projects/opencvlibrary/
Still not exactly easy though, and I'm not sure about the licensing, so… You don't mention why you need to do this (it sounds a little odd).
Maybe you can avoid it? If you know the images are the numerical digits 0-9, is there another way to track which one a particular image is, apart from the way its pixels are arranged?
Sorry if that sounds like I'm missing the point… Maybe you could fill in a few more details?
I read this really good write-up about this exact problem here: http://sudokugrab.blogspot.com/2009/07/how-does-it-all-work.html
It doesn't have any code samples, but explains the concepts, and might be able to point you in the right direction.
The following tutorial may be right down your alley:
http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
It is a simple tutorial on doing number recognition and comes with the source code also.
Additionally, you may want to do a search for an OCR SDK (Optical Character Recognition Software Development Kit); you will surely find a stack of them. Commercial ones are pricey, though.
I would go for a "roll your own" approach along the lines of the OpenCV tutorial, especially since you are only interested in numbers.
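To give a flavour of the roll-your-own route, and since only the digits 0-9 are needed, even plain template matching can work; here is a rough OpenCV sketch in Python (the file names are placeholders, and the templates would be digits cropped from the same newspaper font):

```python
import cv2

# Hypothetical template files: one clean image per digit, cut from the puzzle's font.
templates = {d: cv2.imread("digit_%d.png" % d, cv2.IMREAD_GRAYSCALE) for d in range(10)}

def classify_digit(cell_img):
    # cell_img: a grayscale crop containing one printed digit.
    scores = {}
    for digit, tmpl in templates.items():
        resized = cv2.resize(cell_img, (tmpl.shape[1], tmpl.shape[0]))
        # Normalised cross-correlation: higher score means a better match.
        scores[digit] = cv2.matchTemplate(resized, tmpl, cv2.TM_CCOEFF_NORMED).max()
    return max(scores, key=scores.get)
```

This falls over quickly with handwriting or noisy scans, which is where a trained classifier becomes worth the effort.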
All the best :-)