VNRecognizeTextRequest stopped recognising text in iOS 15 - swift

I'm using VNRecognizeTextRequest via VNImageRequestHandler with the following settings:
request.recognitionLevel = .accurate
request.usesLanguageCorrection = false
request.recognitionLanguages = ["en-US", "de-DE"]
I'm using real-time capture from AVFoundation, converting the CMSampleBuffer to a CVPixelBuffer and sending it to VNImageRequestHandler.
Basically the code is taken from Apple sample:
https://developer.apple.com/documentation/vision/reading_phone_numbers_in_real_time
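For reference, a minimal sketch of that setup (the delegate plumbing and the .up orientation are my assumptions, not part of the original code):

import Vision
import CoreVideo

func recognizeText(in pixelBuffer: CVPixelBuffer) {
    // Build the request with the settings listed above.
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        let strings = observations.compactMap { $0.topCandidates(1).first?.string }
        print(strings)
    }
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = false
    request.recognitionLanguages = ["en-US", "de-DE"]

    // The pixel buffer comes from the AVCaptureVideoDataOutput delegate callback.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])
    try? handler.perform([request])
}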
The thing is, it was working well until iOS 15 arrived. Then it stopped recognising text and started producing the following errors:
Could not determine an appropriate width index for aspect ratio 0.0062
Could not determine an appropriate width index for aspect ratio 0.0078
Could not determine an appropriate width index for aspect ratio 0.0089
...
I was able to partly fix it by changing recognitionLevel from .accurate to .fast, but I prefer the former as it gives better results. Also, .fast sometimes recognises only individual characters from words; for example, if there is a number with spaces like "7 2 5 6", it may recognise only the 7 or the 2, and so on.
Thanks in advance for suggestions.

Related

Can Flutter render images from raw pixel data? [duplicate]

Setup
I am using a custom RenderBox to draw.
The canvas object in the code below comes from the PaintingContext in the paint method.
Drawing
I am trying to render pixels individually by using Canvas.drawRect.
I should point out that these are sometimes larger and sometimes smaller than the pixels on screen they actually occupy.
for (int i = 0; i < width * height; i++) {
  // in this case the rect size is 1
  canvas.drawRect(
      Rect.fromLTWH((i % width).toDouble(), (i ~/ width).toDouble(), 1, 1),
      Paint()..color = colors[i ~/ width][i % width]);
}
Storage
I am storing the pixels as a List<List<Color>> (colors in the code above). I tried differently nested lists previously, but they did not cause any noticeable differences in performance.
The memory on my Android Emulator test device increases by 282.7MB when populating the list with a 999x999 image. Note that it only temporarily increases by 282.7MB. After about half a minute, the increase drops to 153.6MB and stays there (without any user interaction).
Rendering
With a resolution of 999x999, the code above causes a GPU max of 250.1 ms/frame and a UI max of 1835.9 ms/frame, which is obviously unacceptable. The UI freezes for two seconds when trying to draw a 999x999 image, which should be a piece of cake (I would guess) considering that 4k video runs smoothly on the same device.
CPU
I am not exactly sure how to track this properly using the Android profiler, but while populating or changing the list, i.e. drawing the pixels (which is the case for the above metrics as well), CPU usage goes from 0% to up to 60%. Here are the AVD performance settings:
Cause
I have no idea where to start since I am not even sure what part of my code causes the freezing. Is it the memory usage? Or the drawing itself?
How would I go about this in general? What am I doing wrong? How should I store these pixels instead?
Efforts
I have tried so many things that did not help at all that I will only point out the most notable ones:
I tried converting the List<List<Color>> to an Image from the dart:ui library, hoping to use Canvas.drawImage. In order to do that, I tried encoding my own PNG, but I have not been able to render more than a single row. However, it did not look like that would boost performance. When trying to convert a 9999x9999 image, I ran into an out-of-memory exception. Now, I am wondering how video is rendered at all, as any 4k video will easily take up more memory than a 9999x9999 image if a few seconds of it are in memory.
I tried implementing the image package. However, I stopped before completing it as I noticed that it is not meant to be used in Flutter but rather in HTML. I would not have gained anything using that.
This one is pretty important for the following conclusion I will draw: I tried to just draw without storing the pixels, i.e. using Random.nextInt to generate random colors. When trying to randomly generate a 999x999 image, this resulted in a GPU max of 1824.7 ms/frame and a UI max of 2362.7 ms/frame, which is even worse, especially in the GPU department.
Conclusion
This is the conclusion I reached before trying my failed attempt at rendering using Canvas.drawImage: Canvas.drawRect is not made for this task as it cannot even draw simple images.
How do you do this in Flutter?
Notes
This is basically what I tried to ask over two months ago (yes, I have been trying to resolve this issue for that long), but I think that I did not express myself properly back then and that I knew even less what the actual problem was.
The highest resolution I can properly render is around 10k pixels in total. I need at least 1 million.
I am thinking that abandoning Flutter and going native might be my only option. However, I would like to believe that I am just approaching this problem completely wrong. I have spent about three months trying to figure this out and I did not find anything that led me anywhere.
Solution
dart:ui has a function that converts pixels to an Image easily: decodeImageFromPixels
Example implementation
Issue on performance
Does not work in the current master channel
I was simply not aware of this back when I created this answer, which is why I wrote the "Alternative" section.
Alternative
Thanks to @pslink for reminding me of BMP after I wrote that I had failed to encode my own PNG.
I had looked into it previously, but I thought it looked too complicated without sufficient documentation. Now, I found this nice article explaining the necessary BMP headers and implemented 32-bit BGRA (ARGB, but BGRA is the order of the default mask) by copying Example 2 from the "BMP file format" Wikipedia article. I went through all the sources but could not find an original source for this example. Maybe the authors of the Wikipedia article wrote it themselves.
Results
Using Canvas.drawImage and my 999x999 pixels converted to an image from a BMP byte list, I get a GPU max of 9.9 ms/frame and a UI max of 7.1 ms/frame, which is awesome!
| ms/frame | Before (Canvas.drawRect) | After (Canvas.drawImage) |
|-----------|---------------------------|--------------------------|
| GPU max | 1824.7 | 9.9 |
| UI max | 2362.7 | 7.1 |
Conclusion
Canvas operations like Canvas.drawRect are not meant to be used like that.
Instructions
First off, this is quite straightforward; however, you need to populate the byte list correctly, otherwise you will get an error that your data is not correctly formatted and see no results, which can be quite frustrating.
You will need to prepare your image before drawing as you cannot use async operations in the paint call.
In code, you need to use a Codec to transform your list of bytes into an image.
final list = [
  0x42, 0x4d, // 'B', 'M'
  ...];
// make sure that you either know the file size, data size and data offset beforehand
// or that you edit these bytes afterwards
final Uint8List bytes = Uint8List.fromList(list);
final Codec codec = await instantiateImageCodec(bytes);
final Image image = (await codec.getNextFrame()).image;
You need to pass this image to your drawing widget, e.g. using a FutureBuilder.
Now, you can just use Canvas.drawImage in your draw call.

Apple Vision – Can't recognize a single number as region

I want to use VNDetectTextRectanglesRequest from the Vision framework to detect regions in an image containing only one character, the number '9', on a white background. I'm using the following code to do this:
private func performTextDetection() {
    let textRequest = VNDetectTextRectanglesRequest(completionHandler: self.detectTextHandler)
    textRequest.reportCharacterBoxes = true
    textRequest.preferBackgroundProcessing = false
    let handler = VNImageRequestHandler(cgImage: loadedImage.cgImage!, options: [:])
    DispatchQueue.global(qos: .userInteractive).async {
        do {
            try handler.perform([textRequest])
        } catch {
            print("Error")
        }
    }
}
func detectTextHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results, !observations.isEmpty else {
        fatalError("no results")
    }
    print("there is result")
}
The number of observations I get is 0; however, if I provide an image with the text '123' on a black background, '123' is detected as a region with text. The same problem occurs for 2-digit numbers: '22' on a white background also doesn't get detected.
Why does the Vision API only detect numbers of 3+ digits on a white background in my case?
Lone characters continue to be a problem for VNRecognizeTextRequest and VNDetectTextRectanglesRequest in Xcode 12.5 and Swift 5.
I've seen VNDetectTextRectanglesRequest find virtually all the individual words on a sheet of paper, but fail to detect lone characters [when processing the entire image]. Setting the property VNDetectTextRectanglesRequest.regionOfInterest to a smaller region may help.
What has worked for me is to have the single characters occupy more of the region of interest (ROI) for VNRecognizeTextRequest. I tested single characters at a variety of heights, and it became clear that single characters would start reading once they reached a certain size within the ROI.
For some single characters, detection seems to occur when the ROI is roughly three times the width and three times the height of the character itself. That's a rather tight region of interest. Placing it correctly is another problem, but also solvable.
If processing time isn't an issue for your application, you can create an array [CGRect] spanning a region suspected to contain lone characters.
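For illustration, here is a rough sketch of that idea (the function name and the assumption that the ROIs are already normalized are mine):

import Vision

// Run one text request per candidate region of interest.
// Each rect is normalized (0...1, with a lower-left origin).
func recognizeLoneCharacters(in cgImage: CGImage, candidateROIs: [CGRect]) -> [String] {
    var results: [String] = []
    for roi in candidateROIs {
        let request = VNRecognizeTextRequest { request, _ in
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            results.append(contentsOf: observations.compactMap { $0.topCandidates(1).first?.string })
        }
        request.recognitionLevel = .accurate
        request.regionOfInterest = roi   // keep the ROI roughly 3x the character size, as noted above
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        // perform(_:) runs synchronously, so appending to `results` here is safe
        try? handler.perform([request])
    }
    return results
}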
My suspicion is that when VNRecognizeTextRequest performs an initial check for edge content, edge density, and/or image features that resemble strokes, it exits early if it doesn't find enough candidates. That initial check may simply be an embedded VNDetectTextRectanglesRequest. Whatever the initial check is, it runs fast, so I don't imagine it's that complicated.
For more about stroke detection to find characters, search for SO posts and articles about the Stroke Width Transform. Also this: https://www.microsoft.com/en-us/research/publication/detecting-text-in-natural-scenes-with-stroke-width-transform/. The SWT is meant to work on "natural" images, such as text seen outdoors.
There are some hacks to get around the problem. Some of these hacks are unpleasant, but for a particular application they may be worth it.
Create a grid of small regions of interest (ROIs). Run the text request on one ROI after the other.
As a cheap substitute for VNDetectTextRectanglesRequest, look for regions of the image with edge content that suggests a single character may be present. If nothing else, this could help ignore regions where there is no edge content.
Try using a scaling filter to scale up the image before processing it; that could ensure single characters are big enough to read (a sketch of this follows this list). (For CIFilters, a very handy resource is https://cifilter.io/)
Run multiple passes on your image. First, run OCR on the full image. Then get the bounding boxes for words that were read. Search for suspicious gaps between boxes. Run grids of small ROIs on the suspiciously blank regions.
Use Tesseract as a backup. (https://www.seemuapps.com/swift-optical-character-recognition-tutorial)
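As mentioned in the scaling-filter item above, here is a sketch of upscaling with Core Image before running OCR (the filter choice and scale factor are assumptions; any resampling filter would do):

import CoreImage

// Upscale the input so lone characters occupy more of the frame before OCR.
func upscaled(_ image: CIImage, by scale: CGFloat) -> CIImage? {
    let filter = CIFilter(name: "CILanczosScaleTransform")
    filter?.setValue(image, forKey: kCIInputImageKey)
    filter?.setValue(scale as NSNumber, forKey: kCIInputScaleKey)
    filter?.setValue(1.0 as NSNumber, forKey: kCIInputAspectRatioKey)
    return filter?.outputImage
}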

Displaying a 500x500 pixel image in a 640x852 pixel NSImageView without any kind of blurriness (Swift on OS X)

I have spent hours on Google searching for an answer to this and trying pieces of code but I have just not been able to find one. I also recognise that this is a question that has been asked lots of times, however I do not know what else to do now.
I have access to 500x500 pixel rainfall radar images from the Met Office's DataPoint API, covering the UK. They must be displayed in a 640x852 pixel area (an NSImageView, which currently has its scaling property set to axis independent) because this is the correct size of the map generated for the boundaries covered by the imagery. I want to display them at the enlarged size of 640x852 using the nearest-neighbour algorithm and in an aliased format. This can be achieved in Photoshop by going to Image > Image Size... and setting Resample to Nearest Neighbour (hard edges). The source images should remain at 500x500 pixels; I just want to display them in a larger view.
I have tried setting the magnificationFilter of the NSImageView.layer to all three of the different kCAFilter... options but this has made no difference. I have also tried setting the shouldRasterize property of the NSImageView.layer to true, which also had no effect. The images always end up being smoothed or anti-aliased, which I do not want.
Having recently come from C#, there could be something I have missed as I have not been programming in Swift for very long. In C# (using WPF), I was able to get what I want by setting the BitmapScalingOptions of the image element to NearestNeighbour.
To summarise, I want to display a 500x500 pixel image in a 640x852 pixel NSImageView in a pixelated form, without any kind of smoothing (irrespective of whether the display is retina or not) using Swift. Thanks for any help you can give me.
Below is the image source:
Below is the actual result (screenshot from a 5K iMac):
This was created by simply setting the image property of the NSImageView in the tableViewSelectionDidChange event of my NSTableView (which is used to select the times to show the image for), using:
let selected = times[timesTable.selectedRow]
let formatter = NSDateFormatter()
formatter.dateFormat = "d/M/yyyy 'at' HH:mm"
let date = formatter.dateFromString(selected)
formatter.dateFormat = "yyyyMMdd'T'HHmmss"
imageData.image = NSImage(contentsOfFile: basePathStr +
"RainObs_" + formatter.stringFromDate(date!) + ".png")
Below is what I want it to look like (ignoring the background and cropped out parts). If you save the image yourself you will see it is pixellated and aliased:
Below is the map that the source is displayed over (the source is just in an NSImageView laid on top of another NSImageView containing the map):
Try using a custom subclass of NSView instead of an NSImageView. It will need an image property with a didSet observer that sets needsDisplay. In the drawRect() method, either:
use the drawInRect(_:fromRect:operation:fraction:respectFlipped:hints:) method of the NSImage with a hints dictionary of [NSImageHintInterpolation:NSImageInterpolation.None], or
save the current value of NSGraphicsContext.currentContext.imageInterpolation, change it to .None, draw the NSImage with any of the draw...(...) methods, and then restore the context's original imageInterpolation value
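A minimal sketch of such a subclass, written in current Swift (the class and property names are placeholders):

import Cocoa

class PixelatedImageView: NSView {
    var image: NSImage? {
        didSet { needsDisplay = true }   // redraw whenever a new image is set
    }

    override func draw(_ dirtyRect: NSRect) {
        guard let image = image else { return }
        // Option 1 from above: pass an interpolation hint directly to the draw call.
        image.draw(in: bounds,
                   from: .zero,           // .zero means "use the whole source image"
                   operation: .sourceOver,
                   fraction: 1.0,
                   respectFlipped: true,
                   hints: [.interpolation: NSImageInterpolation.none.rawValue])
    }
}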

MATLAB "CCTV" image processing, contrast filtering/feature detection

I'm a bit of a noob in MATLAB (and image processing in general) and I'm wondering if you can help me with a bit of an issue I'm having. Essentially, I'm given an image of an alley, and then multiple images of the same alley, but with different contrasts and some of the images have a picture of a robber in them. I need to be able to detect the robbers in the images, and run the same code on all of the images (i.e. I'm not allowed to custom-tailor the code for specific images). Here's what I have so far:
background = imread('backalley.jpg');
criminal = imread('backalleyX.jpg');  % where X is the number of the image;
                                      % there are 16 in total, from 0 to 15
J = imhist(background);
K = histeq(criminal, J);
diffImage = abs(double(background) - double(K));
thresholdValue = 103;
filteredImage = diffImage > thresholdValue;
(Keep in mind I'm still playing around with the thresholdValue)
This leaves me with either a gray image if there isn't a robber, or a black-and-white image showing some of the features of the robber. The issue I'm having is that three of the 16 images, which start with a very high contrast, leave most of the features of the alley still visible even after histogram equalization. Is there anything I can do to filter these images or adjust the contrast better that won't cause an issue with the rest of the successfully processed images? Unfortunately, since I'm new here, I can't post images showing what's going on, sorry.
EDIT: Here is a link to the photobucket album: http://s997.photobucket.com/user/52TulaSKS/library/Image%20Processing
All of the images needing processing are there, as well as the original, and examples of processed images. I gave titles to the important ones (original, ones giving me trouble, and the examples of correctly and incorrectly processed images).
Change your threshold to a higher value.

CGImageGetBytesPerRow() returns different values on iOS simulator and iOS Device

I have an image taken with my iPod touch 4 that's 720x960. In the simulator, calling CGImageGetBytesPerRow() on the image returns 2880 (720 * 4 bytes), which is what I expected. However, on the device, CGImageGetBytesPerRow() returns 3840, i.e. the "bytes per row" computed along the height. Does anyone know why the behavior differs, even though the image I'm calling CGImageGetBytesPerRow() on has a width of 720 and a height of 960 in both cases?
Thanks in advance.
Bytes per row can be anything as long as it is sufficient to hold the image bounds, so best not to make assumptions that it will be the minimum to fit the image.
I would guess that on the device, bytes per row is dictated by some optimisation or hardware consideration: perhaps an image buffer that does not have to be changed if the orientation is rotated, or an image sensor that transfers extra bytes of dead data per row which are then ignored instead of doing a second transfer into a buffer with the minimum bytes per row, or some other reason that would only make sense if we knew the inner workings of these devices.
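To make that concrete, here is a rough sketch (the function name is mine) of the safe way to walk pixel rows, stepping by the reported stride rather than width * 4:

import CoreGraphics

// Returns the first byte of each pixel row, stepping by bytesPerRow (which may be padded).
func firstByteOfEachRow(of image: CGImage) -> [UInt8] {
    guard let data = image.dataProvider?.data,
          let base = CFDataGetBytePtr(data) else { return [] }
    let rowStride = image.bytesPerRow        // may be larger than width * bytesPerPixel
    return (0..<image.height).map { row in
        base[row * rowStride]                // first byte of this row
    }
}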
It may be slightly different because of internal memory allocation: "The number of bytes used in memory for each row of the specified bitmap image (or image mask)." (from the CGImageGetBytesPerRow documentation)
Consider using NSBitmapImageRep for some special tasks.