Can I tell Google Cloud Vision to isolate the largest (font size) text it reads in an image?


I would like to use Cloud Vision to recognize and isolate the "main" text in a picture (for instance, the largest writing on packaging). I assume Cloud Vision could do that on the backend by looking at the font size (how many pixels each letter takes up), but I am not sure whether the API exposes that in its output.
My best guess at where this info would be in the documentation is the following link, but I couldn't see anything related to this: https://cloud.google.com/natural-language/reference/rest/v1beta1/documents/annotateText

The TEXT_DETECTION feature of the Vision API
(https://cloud.google.com/vision/docs/samples#detecting_text_in_images)
returns an array of annotations, each pairing a string with a boundingPoly.
The JSON representation of AnnotateImageResponse is below.
"textAnnotations": [
  {
    object(EntityAnnotation)
  }
]
EntityAnnotation:
https://cloud.google.com/vision/reference/rest/v1/images/annotate#EntityAnnotation
You can estimate the font size (in pixels) from the vertices of each BoundingPoly: for roughly horizontal text, the height of a word's box approximates how large it is rendered.
https://cloud.google.com/vision/reference/rest/v1/images/annotate#BoundingPoly
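A minimal sketch of that idea in Python, assuming you already have the JSON of an images:annotate TEXT_DETECTION response parsed into a dict (e.g. with json.loads). It assumes the first textAnnotations entry covers the whole detected text and the following entries are individual words, that the text is roughly horizontal, and that box height is a usable stand-in for font size. The helper names, the 0.8 tolerance, and the sample data are illustrative, not part of the API.

```python
from typing import Any, Dict, List


def box_height(annotation: Dict[str, Any]) -> int:
    """Approximate text height in pixels from an annotation's boundingPoly.

    Assumes roughly horizontal text. Missing coordinates default to 0,
    because zero-valued fields can be omitted from the JSON.
    """
    ys = [v.get("y", 0) for v in annotation["boundingPoly"]["vertices"]]
    return max(ys) - min(ys)


def largest_text(response: Dict[str, Any], tolerance: float = 0.8) -> List[str]:
    """Return words whose box height is at least `tolerance` times the tallest word's."""
    # Entry 0 describes the whole detected text block; the words follow it.
    words = response.get("textAnnotations", [])[1:]
    if not words:
        return []
    tallest = max(box_height(w) for w in words)
    return [w["description"] for w in words if box_height(w) >= tolerance * tallest]


def box(x0: int, y0: int, x1: int, y1: int) -> Dict[str, Any]:
    """Build a rectangular boundingPoly for the hand-written example below."""
    return {"vertices": [{"x": x0, "y": y0}, {"x": x1, "y": y0},
                         {"x": x1, "y": y1}, {"x": x0, "y": y1}]}


# Hand-written example response (not real API output).
sample = {
    "textAnnotations": [
        {"description": "BIG SALE\nsmall print", "boundingPoly": box(0, 0, 200, 120)},
        {"description": "BIG", "boundingPoly": box(10, 10, 80, 70)},
        {"description": "SALE", "boundingPoly": box(90, 12, 190, 68)},
        {"description": "small", "boundingPoly": box(10, 100, 40, 112)},
        {"description": "print", "boundingPoly": box(45, 100, 80, 112)},
    ]
}
print(largest_text(sample))  # ['BIG', 'SALE']
```

Using a tolerance rather than keeping only the single tallest box keeps multi-word headlines together; it will likely need tuning for real packaging photos, especially when text is rotated.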

Related

In the Web Audio API, how do I obtain an array (e.g. a Float32Array) from a stream (e.g. a microphone stream) covering several seconds?

I would like to fill an array from a stream for around ten seconds (I wish to do some processing on the data). So far I can:
(a) obtain the microphone stream using MediaRecorder;
(b) use an analyser and analyser.getFloatTimeDomainData(dataArray) to obtain an array, but it is size-limited to only a little over half a second of data. I can also successfully output the data after processing back onto a stream and to outDestination;
(c) I have also experimented with obtaining a 'chunks' array from MediaRecorder directly, but the problem then is that I can't find any MIME type that would give me a simple array of values, i.e. an uncompressed, sample-by-sample, single-channel set of values (a longer version of dataArray in (b)).
I am wondering if I am missing a simple way around this problem.
Solutions I have seen tend to use step (b), poll regularly and then reassemble a longer array; however, the timing seems a bit tricky.
I've also seen suggestions to use audio worklets. I might have to do this, but would prefer a simpler solution!
Alternatively, if someone knows how to make MediaRecorder output the chunks array in a simple single-channel Float32 format, that would do the trick.
Or maybe I'm missing something simpler?
I have code showing the steps that have been successful and will upload it if anyone requests.
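Whichever source delivers the samples (polled analyser frames, or an AudioWorklet, which processes audio in 128-frame blocks), the reassembly step amounts to copying consecutive chunks into a preallocated buffer of sample_rate × seconds floats. The Python sketch below shows only that bookkeeping, not the browser API; the class name is hypothetical, and it assumes mono float32 chunks that arrive in order with no gaps or overlaps. Polling the analyser cannot guarantee that, which is likely why the timing feels tricky and why worklets are often suggested.

```python
from array import array
from typing import Sequence


class SampleAccumulator:
    """Collect consecutive mono float32 chunks into one fixed-length buffer."""

    def __init__(self, sample_rate: int, seconds: float) -> None:
        self.capacity = int(sample_rate * seconds)  # e.g. 48000 * 10
        # Typecode "f" is a C float (32-bit on common platforms).
        self.buffer = array("f", [0.0]) * self.capacity
        self.filled = 0

    def push(self, chunk: Sequence[float]) -> bool:
        """Append as much of `chunk` as still fits; return True once the buffer is full."""
        take = min(len(chunk), self.capacity - self.filled)
        self.buffer[self.filled:self.filled + take] = array("f", chunk[:take])
        self.filled += take
        return self.filled == self.capacity


# Tiny numbers for illustration: 8 "samples" per second, 1 second, chunks of 3.
acc = SampleAccumulator(sample_rate=8, seconds=1.0)
for _ in range(3):
    done = acc.push([0.5, 0.5, 0.5])
print(acc.filled, done)  # 8 True
```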

MaryTTS HMM voice quality changes with text length

I am using MaryTTS as a text to speech engine inside a Grails Application.
During app testing I found that the speech quality drastically changes (for the worse) with increasing text length when using an HMM voice.
So naturally I tested via the MARY Web Client while tweaking all HMM-relevant parameters (F0Add, F0Scale and Rate), as well as removing them or leaving the default values, but with no success.
The voice I am using is bits1-hsmm:5.2 (German Female)
Gradle dependency:
compile "de.dfki.mary:voice-bits1-hsmm:5.2"
The code is as simple as:
def marytts = new LocalMaryInterface()
marytts.locale = Locale.GERMAN
marytts.generateAudio text
Everything works fine up to the point where the text to convert goes over 120 characters (not only in the code but also via the MARY Web Client).
Here is the text I used for the last tests:
Baumaßnahmen im Mai und Oktober Notwendige Instandhaltungsarbeiten an der Münchner S-Bahn-Stammstrecke sollen von nun an gebündelt stattfinden. Die Bahn möchte dadurch die baubedingten Fahrplaneinschränkungen durch gesperrte Gleise geringer halten.
To hear the difference in quality, use part of the text (the first couple of words) vs. the whole.
Another important point: this does not occur when using a Unit Selection voice.
Am I missing something like a configuration or specific parameter set or is this the standard behaviour of HMM voices inside MaryTTS?
It would be great to be able to use this voice with decent quality, since Unit Selection voices are not available as standalone dependencies, and splitting the text into smaller parts and playing them sequentially is not really something I would consider.
Any input is appreciated.
Update
Further trial and error showed that the robotic background sound is added when the text contains punctuation marks such as . , : ; [ ] { }, independent of text length! I'm not really sure what the root cause is, but at least with some text manipulation before the conversion the voice is usable.
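The text manipulation described in the update could look like the sketch below: replace the listed marks with spaces and collapse whitespace before calling generateAudio. It is written in Python for illustration (the same regular expression should work from Groovy); the function name is hypothetical, and dropping sentence punctuation may change pauses and intonation, so listen to the result.

```python
import re

# The punctuation marks the update reports as triggering the artifact.
_PUNCTUATION = re.compile(r"[.,:;\[\]{}]")


def strip_punctuation(text: str) -> str:
    """Replace the listed punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(_PUNCTUATION.sub(" ", text).split())


print(strip_punctuation("Baumaßnahmen im Mai und Oktober. Die Bahn möchte: [geringer] halten."))
# Baumaßnahmen im Mai und Oktober Die Bahn möchte geringer halten
```

Hyphens are deliberately left alone, so compounds such as "S-Bahn-Stammstrecke" survive unchanged.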

Controlling light using midi inputs

I am currently using Max/MSP to create an interactive system between lights and sound.
I have Philips Hue lighting hooked up to Max/MSP, and now I want to trigger an increase in brightness/saturation whenever a note arrives from a MIDI instrument. Does anyone have any ideas how this might be accomplished?

I have built this.
I used the [shell] object and fed it an array of parameters via a JavaScript file that calls the Hue API. There is a lag of about 1/6 of a second between commands.
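JavaScript file:
inlets = 1;
outlets = 1;

// Bridge address and API username
var bridge = "192.168.0.100";
var hash = "newdeveloper";

// Default values (list() takes its own parameters, which shadow these)
var bulb = 1;
var brt = 200;
var satn = 250;
var hcolor = 10000;

// Called when a list (bulb, hue, brightness, saturation, transition time) arrives at the inlet
function list(bulb, hcolor, brt, satn, tran) {
    execute('PUT', 'http://' + bridge + '/api/' + hash + '/lights/' + bulb + '/state',
        '"{\\\"on\\\":true,\\\"hue\\\":' + hcolor + ', \\\"bri\\\":' + brt + ',\\\"sat\\\":' + satn + ',\\\"transitiontime\\\":' + tran + '}"');
}

// Send a curl command line out of outlet 0 to the [shell] object
function execute($method, $url, $message) {
    outlet(0, "curl --request", $method, "--data", $message, $url);
}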

To control Philips Hue you need to issue calls to a RESTful, HTTP-based API, like so: http://www.developers.meethue.com/documentation/core-concepts, using the [jweb] or [maxweb] objects: https://cycling74.com/forums/topic/making-rest-call-from-max-6-and-saving-the-return/
Generally however, to control lights you use DMX, the standard protocol for professional lighting control. Here is a somewhat lengthy post on the topic: https://cycling74.com/forums/topic/controlling-video-and-lighting-with-max/, scroll down to my post from APRIL 11, 2014 | 3:42 AM.

How to change the bri/sat of your lights is explained in the following link (registration/login required):
http://www.developers.meethue.com/documentation/lights-api#16_set_light_state
You will need to know the IP address of your Hue bridge, which is explained here: http://www.developers.meethue.com/documentation/getting-started, and a valid username.
Also bear in mind the performance limitations. As a general rule, you can send up to 10 light state commands per second. I would recommend leaving a 100 ms gap between commands to avoid flooding the bridge (and losing commands).
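A Python sketch of the same set-light-state call, assuming the v1 REST endpoint from the linked docs (PUT http://<bridge>/api/<username>/lights/<id>/state with a JSON body of on/bri/sat/hue/transitiontime). The bridge IP and username reuse the example values from the JavaScript above; the function and the Throttle helper are hypothetical names, and the throttle simply enforces the recommended 100 ms gap between sends.

```python
import json
import time
import urllib.request


def light_state_request(bridge_ip: str, username: str, light_id: int,
                        bri: int, sat: int, hue: int,
                        transition: int = 0) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets one light's state.

    In the v1 API, bri is 1-254, sat is 0-254, hue is 0-65535,
    and transitiontime is in multiples of 100 ms.
    """
    body = {"on": True, "bri": bri, "sat": sat, "hue": hue, "transitiontime": transition}
    return urllib.request.Request(
        f"http://{bridge_ip}/api/{username}/lights/{light_id}/state",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


class Throttle:
    """Block until at least `min_gap` seconds have passed since the previous send."""

    def __init__(self, min_gap: float = 0.1) -> None:
        self.min_gap = min_gap
        self._last = float("-inf")

    def wait(self) -> None:
        delay = self._last + self.min_gap - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


req = light_state_request("192.168.0.100", "newdeveloper", 1, bri=200, sat=250, hue=10000)
print(req.get_method(), req.full_url)
# PUT http://192.168.0.100/api/newdeveloper/lights/1/state
# To actually send it (requires a reachable bridge):
# throttle = Throttle(); throttle.wait(); urllib.request.urlopen(req)
```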

Are you interested in finding out how to map this data from a MIDI input to the Philips Hue lights within Max? Or are you already familiar with Max?
Using Tommy b's JavaScript (which you could put into a [js] object), you could, for example, scale the MIDI messages you want to use with the midiin and borax objects and map them to the outputs you want using the scale object. Karlheinz Essl's RTC library is a good place to start with algorithmic composition if you want to transform the data at all: http://www.essl.at/software.html
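To illustrate the mapping step outside Max, here is a Python sketch that turns a raw three-byte MIDI note-on message into a Hue brightness value, roughly what a [scale 0 127 1 254] object does to a note-on velocity in a patch. The function names are hypothetical, and the clamping and the 1..254 target range are assumptions chosen to match Hue's bri range.

```python
from typing import Optional


def scale(value: float, in_lo: float, in_hi: float, out_lo: float, out_hi: float) -> int:
    """Linear mapping from [in_lo, in_hi] to [out_lo, out_hi], clamped to the input range."""
    value = min(max(value, in_lo), in_hi)
    return round(out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo))


def brightness_for_message(status: int, note: int, velocity: int) -> Optional[int]:
    """Return a Hue 'bri' value for a note-on message, or None for anything else.

    Note-on status bytes are 0x90-0x9F (one per channel). A note-on with
    velocity 0 is conventionally a note-off, so it is ignored here.
    The note number is unused; it could select which bulb to address.
    """
    if (status & 0xF0) != 0x90 or velocity == 0:
        return None
    return scale(velocity, 0, 127, 1, 254)


print(brightness_for_message(0x90, 60, 127))  # 254
print(brightness_for_message(0x90, 60, 64))   # 128
print(brightness_for_message(0x80, 60, 64))   # None
```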

+1 for DMX light control via Max. There are lots of good max-to-dmx tutorials and USB-DMX hardware is getting pretty cheap. However, as someone who previously believed in dragging a bunch of computer equipment on stage just to control a light or two with an instrument, I'd recommend researching and purchasing a simple one channel "color organ" circuit kit (e.g., Velleman MK 110). Controlling a 120/240V light bulb via audio is easier than you might think; a computer for this type of application is usually overkill. Keep it simple and good luck!

How to use FindBarcodesInUIImage?

I am developing a barcode scanner application for iPhone.
Library: RedLaser
I just want to scan a barcode from an existing image, not from the camera.
I couldn't find any documentation on calling the FindBarcodesInUIImage method manually.
Is there any sample code?

Does this snippet from the documentation help?
This method analyses a given image and returns information on any barcodes discovered in the image. It is intended to be used in cases where the user already has a picture of a barcode (in their photos library, for example) that they want to decode. This method performs a thorough check for all barcode symbologies we support, and is not intended for real-time use.
When scanning barcodes using this method, you cannot (and need not) specify a scan orientation or active scan region; the entire image is scanned in all orientations. Nor can you restrict the scan to particular symbol types. If such a feature is absolutely necessary, you can implement it post-scan by filtering the result set.
FindBarcodesInUIImage operates synchronously, but can be placed in a thread. Depending on image size and processor speed, it can take several seconds to process an image.

void ScanImageForBarcodes(UIImage *inputImage)
{
    NSSet *resultSet = FindBarcodesInUIImage(inputImage);
    // Print the results
    NSLog(@"%@", resultSet);
}
If the SDK did not find any barcodes in the image, the log message will be (null). Otherwise, it will be something like:
{(
(0x19e0660) Code 39: 73250110 -- (1 finds)
)}
This log message indicates a found set containing one item, a Code 39 barcode with the value "73250110".
Remember that the SDK is not guaranteed to find barcodes in an image. Even if an image contains a barcode, the SDK might not be able to read it, and you will receive no results.
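The documentation above says that restricting symbol types has to be done post-scan by filtering the result set, and that the synchronous call can be moved onto another thread. The idea is language-independent, so here is a Python sketch of both; the Barcode type and the scan callable are hypothetical stand-ins, not RedLaser's API. On iOS you would apply the same filter to the NSSet returned by FindBarcodesInUIImage and run the call off the main thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Barcode:
    """Hypothetical result entry standing in for an SDK result object."""
    symbology: str
    value: str


def filter_by_symbology(results: Iterable[Barcode], allowed: FrozenSet[str]) -> Set[Barcode]:
    """Post-scan filtering: keep only the symbol types you care about."""
    return {b for b in results if b.symbology in allowed}


def scan_in_background(scan: Callable[[], Iterable[Barcode]],
                       allowed: FrozenSet[str]) -> "Future[Set[Barcode]]":
    """Run a slow, synchronous scan on a worker thread and filter its results."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(lambda: filter_by_symbology(scan(), allowed))
    executor.shutdown(wait=False)  # the submitted task still runs to completion
    return future


def fake_scan() -> Iterable[Barcode]:
    """Hypothetical scan result used for the example."""
    return [Barcode("Code 39", "73250110"), Barcode("QR Code", "hello")]


future = scan_in_background(fake_scan, frozenset({"Code 39"}))
print(future.result())  # {Barcode(symbology='Code 39', value='73250110')}
```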

Formula or API for calculating desktop icon spacing on Windows XP

I've built a simple application that applies grid-lines to an image or just simple colors for use as desktop wallpaper. The idea is that the desktop icons can be arranged within the grid. The problem is that depending on more things than I understand the actual spacing in pixels seems to be different from system to system. I've learned that at least these things play a factor:
Resolution (duh)
Taskbar size and placement
Fonts
There has to be more than this. Maybe there's some API call that I don't know about?

There are 1001 ways to get/set this (but I only know two) :-D
Windows Registry:
HKEY_CURRENT_USER\Control Panel\Desktop\WindowMetrics
The values are IconSpacing and IconVerticalSpacing.
By code (WMI):
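using System.Management; // requires a reference to System.Management.dll

public static class DesktopMetrics
{
    // Query WMI's Win32_Desktop class for its IconSpacing property
    public static string GetWinIconSpace()
    {
        using (var searcher = new ManagementObjectSearcher("root\\CIMV2", "SELECT * FROM Win32_Desktop"))
        {
            foreach (ManagementObject wmi in searcher.Get())
            {
                try
                {
                    return "Desktop Icon Spacing: " + wmi.GetPropertyValue("IconSpacing").ToString();
                }
                catch { } // property missing or null on this instance; try the next one
            }
        }
        return "Desktop Icon Spacing: Unknown";
    }
}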
And the third way, which I never tried, you can find here.
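If you read IconSpacing and IconVerticalSpacing from the registry, note that WindowMetrics values are usually stored as negative numbers in twips (1/1440 inch), so at 96 DPI one pixel is 15 twips. The Python sketch below converts such a value and computes grid-line offsets for the wallpaper. It assumes 96 DPI by default and that icons are laid out from the work-area origin (which moves when the taskbar sits on the top or left); the helper names are hypothetical, and any extra margin Explorer applies is not modeled, so check the result against a real desktop.

```python
from typing import List


def registry_metric_to_pixels(raw: int, dpi: int = 96) -> int:
    """Convert a WindowMetrics value such as IconSpacing to pixels.

    Negative values are twips (1/1440 inch). Treating non-negative
    values as already being pixels is an assumption.
    """
    if raw < 0:
        return round(-raw * dpi / 1440)
    return raw


def grid_lines(origin: int, length: int, spacing: int) -> List[int]:
    """Offsets of grid lines every `spacing` px, starting at the work-area origin."""
    return [origin + i * spacing for i in range(length // spacing + 1)]


spacing = registry_metric_to_pixels(-1125)  # an example IconSpacing value
print(spacing)                              # 75
print(grid_lines(0, 300, spacing))          # [0, 75, 150, 225, 300]
```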

There might also be a size problem caused by the scaling algorithm if the requested icon size is not available (an icon file is actually a collection of icons, as explained in the post "Icons and cursors know where they came from" on The Old New Thing).

