Is there a way to determine Unicode compatibility, particularly with newer sets? I want to use the Unicode 6.0 character ❌ (U+274C), but was not sure which browsers it was compatible with. I searched on caniuse.com, but could not find compatibility tables.
This is not a browser issue. Any browser can handle the character ❌, but the user needs to have a font installed that has a glyph for it; otherwise they'll see some kind of placeholder.
A glyph for U+274C Cross Mark ❌ is supplied as part of fonts bundled with Windows 7, and at least the Android and Ubuntu versions I have here, so that's not bad. If you need to be sure, you can use an embedded font that has such a glyph.
Alternatively there's the very old and well-supported U+00D7 Multiplication Sign ×, or other alternatives.
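When checking font coverage tables or Unicode charts for a character, it helps to have its code point in U+ notation. A minimal JavaScript sketch (the helper name toUPlus is made up for illustration):

```javascript
// Hypothetical helper: format each code point of a string in U+XXXX notation.
function toUPlus(text) {
  // Array.from iterates by code point, so characters outside the BMP stay intact.
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(toUPlus('\u274C').join(' ')); // U+274C
console.log(toUPlus('\u00D7').join(' ')); // U+00D7
```

This only identifies the character; whether a glyph exists still depends on the fonts installed on the user's machine.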
(There are cases when a browser—or, more specifically, the text layout engine used by the browser—does need to understand what a character is to render it correctly, in addition to having font support. But that's for when characters are added as part of complex scripts like Arabic where characters change their shape in response to context. Not a concern for simple standalone characters like ❌.)
You can check if an emoji is supported by:
Rendering it into a Canvas.
Resizing it to a 1 × 1 to get a single pixel average of it.
Comparing that average value to the one you get when rendering the square character that shows up when an emoji is missing.
Something like this (just a starting point; it only works in Chrome):
function emojiExists(emoji) {
  try {
    const canvas = document.createElement('canvas');
    const context = canvas.getContext('2d');
    // Set font baseline, size and family:
    context.textBaseline = 'top';
    context.font = '100px sans-serif';
    // Scale so that we get a 1 x 1 representation of the emoji
    // (just an average of all the pixels):
    context.scale(0.01, 0.01);
    // Write the emoji:
    context.fillText(emoji, 0, 0);
    // Just for testing. Uncomment this line and comment out context.scale(...);
    // document.body.appendChild(canvas);
    // [0, 0, 0, 42] is the value returned for the rectangle character that shows
    // up for missing emojis:
    return context.getImageData(0, 0, 1, 1).data.join(',') !== '0,0,0,42';
  } catch (err) {
    // Canvas might not be supported, so report the emoji as unsupported.
    return false;
  }
}
console.log(emojiExists('😃')); // https://emojipedia.org/smiling-face-with-open-mouth/
console.log(emojiExists('💭')); // https://emojipedia.org/right-anger-bubble/
console.log(emojiExists('🛹')); // https://emojipedia.org/skateboard/
console.log(emojiExists('🇨🇳')); // https://emojipedia.org/flag-for-china/
For me, with Chrome Version 63.0.3239.132 (Official Build) (64-bit), I get true, true, false, false, but it doesn't work right in Firefox. It seems to be a known bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1209480
You might need to adapt the code (and change or dynamically calculate the '0,0,0,42' value) when using custom fonts or different browsers.
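One way to calculate the reference value dynamically, sketched under assumptions: render a code point that no font should cover and use its pixel as the "missing" baseline. The names makeEmojiChecker and canvasPixel are made up here, and the choice of U+FFFF (a noncharacter) as the sample is an assumption that may not produce the same placeholder in every browser or font setup.

```javascript
// Hypothetical factory: renderPixel(text) must return a comparable string
// for the averaged pixel of the rendered text (e.g. "0,0,0,42").
function makeEmojiChecker(renderPixel, missingSample = '\uFFFF') {
  // Assumption: U+FFFF has no glyph, so it renders as the placeholder box.
  const missingPixel = renderPixel(missingSample);
  return (emoji) => renderPixel(emoji) !== missingPixel;
}

// Browser-only renderer that follows the same canvas steps as emojiExists.
function canvasPixel(text) {
  const canvas = document.createElement('canvas');
  const context = canvas.getContext('2d');
  context.textBaseline = 'top';
  context.font = '100px sans-serif';
  context.scale(0.01, 0.01);
  context.fillText(text, 0, 0);
  return context.getImageData(0, 0, 1, 1).data.join(',');
}

// Usage in a browser:
// const emojiExists = makeEmojiChecker(canvasPixel);
// console.log(emojiExists('\u{1F603}'));
```

Passing the renderer in as a parameter keeps the comparison logic separate from the canvas, so a different font or measuring strategy can be swapped in without touching the check itself.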
I finally got some sort of pdf scanner to work. It reads into the callback functions without a problem, but when I try to NSLog the result from a CGPDFScannerPopString I get a result like this:
ˆ ˛˝ # ˜˜˜ #˜' ˜˜˜ "˜ '˜˜ " ' ˜˜
No string to be found here...
Any ideas of what it can be?
This is my callback function:
static void op_Tj (CGPDFScannerRef s, void *info)
{
    CGPDFStringRef string;
    if (!CGPDFScannerPopString(s, &string))
        return;
    // CGPDFStringCopyTextString returns an owned CFStringRef; hand ownership to ARC.
    NSLog(@"string: %@", (__bridge_transfer NSString *)CGPDFStringCopyTextString(string));
}
Thanks already!
Edit: Example PDF
You should be aware that the CGPDFStringRef is not a ASCII string or something similar at all. Cf. http://developer.apple.com/library/mac/documentation/graphicsimaging/Reference/CGPDFString/Reference/reference.html --- it is a "series of bytes—unsigned integer values in the range 0 to 255" which have to be interpreted according to the latest PDF reference.
The PDF reference in turn will tell you that the interpretation of the bytes depends on the font used, and while ASCII-like interpretations are common in case of European languages, they are not mandatory, and in case of Asian languages where font subset embedding is very common, the interpretation may look random.
CGPDFStringCopyTextString tries to interpret those bytes accordingly, but there does not have to be a sensible interpretation as a regular string.
EDIT: Inspection of the sample PDF Ron supplied showed that the encoding of the font in object 3 0 (which is dominant on most pages of the document) is indeed not a standard encoding but instead:
<</Type/Encoding
/Differences[0/.notdef/C/O/V/E/R/space/slash/H/L/F/underscore/W/B/five/eight/four
/zero/two/six/D/one/period/three/Z/I/N/G/U/S/T/colon/seven/A/M/P/Y
/plus/nine/X/hyphen/i/s/p/a/t/c/h/n/f/o/K/greater/equal/l/m/y/J/Q
/parenleft/parenright/comma/dollar/ampersand/d/r/v/b/e/u/w/k/g/x/bar
/quotesingle/asterisk/q/question/percent]
>>
Looking at the top of the first document page
COVER / HLF_CWEB_58408485 / 58408485 / 26DEC12 10.30.22Z
BRIEFING INCLUDES FOLLOWING FLIGHTS:
26DEC12 OR0337 EHAM0630 MUVR1710 PHOYE VSM+2/8 179
NEXT FLIGHTS OF AIRCRAFT:
26DEC12 OR0338 MUVR1830 MMUN1940 PHOYE VSM+2/8 213
26DEC12 OR0338 MMUN2105 EHAM0655 PHOYE GPT+2/7 263
27DEC12 OR0365 EHAM0900 TNCB1930 PHOYE BAH+1/8 272
27DEC12 OR0366 TNCB2030 TNCC2110 PHOYE BAH+1/8 250
27DEC12 OR0366 TNCC2250 EHAM0835 PHOYE ASD+1/8 199
that encoding seems to have been created by handing out the next number, starting from one, to each newly required glyph. This obviously results in a highly individualistic encoding...
That being said, the font object does include both an /Encoding entry and a /ToUnicode entry. Thus, if the method CGPDFStringCopyTextString were given a reference to the font here and really tried, it would easily be able to translate those bytes into the corresponding text. The fact that it doesn't produce anything decent seems to indicate that it simply does not know which font to interpret the bytes for --- I assume it does at least try...
For accurate text extraction, therefore, you have to interpret the bytes in the CGPDFStringRef yourself using the information of the font selected in the content stream. If you don't want to do that from scratch, you might be interested in PDFKitten, a framework for extracting data from PDFs in iOS. While it is not yet perfect (some font structures can baffle it), it is a good starting point.
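To make the byte interpretation concrete, here is a rough JavaScript sketch that decodes bytes with the first entries of the /Differences array quoted above. The helper names and the small glyph-name table are made up for illustration; a real extractor would prefer the font's /ToUnicode map and a full glyph list, and would have to track the current font from the content stream.

```javascript
// First entries of the /Differences array: a number sets the next character
// code, and each following glyph name takes the next code in turn.
const differences = [0, '.notdef', 'C', 'O', 'V', 'E', 'R', 'space', 'slash',
  'H', 'L', 'F', 'underscore', 'W', 'B', 'five', 'eight', 'four'];

// Partial glyph-name table (illustrative only, not a complete glyph list).
const glyphNameToChar = {
  space: ' ', slash: '/', underscore: '_', five: '5', eight: '8', four: '4',
};

// Hypothetical helper: build a code -> glyph name map from the array.
function buildEncoding(diff) {
  const codeToName = new Map();
  let code = 0;
  for (const entry of diff) {
    if (typeof entry === 'number') {
      code = entry;
    } else {
      codeToName.set(code, entry);
      code += 1;
    }
  }
  return codeToName;
}

// Hypothetical helper: map each byte to a character, U+FFFD when unknown.
function decodeBytes(bytes, codeToName) {
  return bytes.map((b) => {
    const name = codeToName.get(b);
    if (name === undefined || name === '.notdef') return '\uFFFD';
    if (name.length === 1) return name; // single-letter names map to themselves
    return glyphNameToChar[name] || '\uFFFD';
  }).join('');
}

const encoding = buildEncoding(differences);
console.log(decodeBytes([1, 2, 3, 4, 5, 6, 7, 6, 8, 9, 10], encoding)); // COVER / HLF
```

Bytes 1 to 5 are control characters in ASCII, which is consistent with the garbage seen when such a string is printed without the font's encoding.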
I am looking for character sets to display each character in my LED Display Board.
Normally I have to put all these characters together in an array of booleans, for example H and A:
bool[] H = { 1,0,0,0,0,1, bool[] A = { 0,0,1,1,0,0,
1,0,0,0,0,1, 0,1,0,0,1,0,
1,1,1,1,1,1, 0,1,1,1,1,0,
1,0,0,0,0,1, 1,0,0,0,0,1,
1,0,0,0,0,1 } 1,0,0,0,0,1 }
I think such collections should already be available on the internet, but searching for "character set" I found nothing. What I am after is a list of as many characters as possible expressed in this bitmap format.
Do you have a tip for me? It would save me a lot of stupid work :)
Thank you very much for the help. I appreciate it.
Regards,
Chris
Check out this site: character-set-generator
This site should work well for you.
To import fonts into C code for an LED display, we need to convert the bitmap of a font into byte arrays so that it can easily be superimposed onto the video memory of the display.
You will find a number of tools that fit this requirement.
The best byte code format to use in LED display code is shown below:
{0x07e0, 0x1ff8, 0x3ffc, 0x600e, 0x8001, 0x0000 }, // Byte code for A
{0x8001, 0x4006, 0x7ffc, 0x3ff8, 0x0fe0, 0x0000 }, // Byte code for B
{0x0018, 0x003e, 0x003e, 0x003e, 0x003e, 0x0018, 0x0000 }, // Byte code for W
Some time back I built a custom application on the VC++ platform. It generates byte code for a bitmap in the above form.
I can share that app if you really need it.
Using such byte code formats you can easily build LED display signs like the ones shown at the following links:
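As a rough illustration of the conversion step, the JavaScript sketch below packs the "H" bitmap from the question into one integer per row. The helper name packRows and the row-wise, leftmost-pixel-first bit order are assumptions; the 16-bit values listed above may use a different orientation, so adapt the order to your display's memory layout.

```javascript
// The "H" bitmap from the question, 6 columns wide, one row per line.
const H = [
  1, 0, 0, 0, 0, 1,
  1, 0, 0, 0, 0, 1,
  1, 1, 1, 1, 1, 1,
  1, 0, 0, 0, 0, 1,
  1, 0, 0, 0, 0, 1,
];

// Hypothetical helper: pack each row into one integer,
// with the leftmost pixel as the highest bit.
function packRows(bits, width) {
  const rows = [];
  for (let i = 0; i < bits.length; i += width) {
    let value = 0;
    for (let j = 0; j < width; j++) {
      value = (value << 1) | (bits[i + j] ? 1 : 0);
    }
    rows.push(value);
  }
  return rows;
}

const hex = packRows(H, 6).map((v) => '0x' + v.toString(16).padStart(2, '0'));
console.log(hex.join(', ')); // 0x21, 0x21, 0x3f, 0x21, 0x21
```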
http://www.ledsignsdisplays.com/indoor-led-signs.html
http://photonplay.com/indoor-led-display.html
Specifically, I'd like to write the Windows Media Player frames, such as the WM/MediaClassPrimaryID, WM/MediaClassSecondaryID and WM/WMCollectionGroupID frames.
I'm using PowerShell, but C# would be good too.
I'm a bit rusty, but the following should be mostly correct. I remember these tags being UTF-16 but you'll probably want to get an existing tag and try decoding its value with the Unicode encoder to be sure.
// Get or create the ID3v2 tag (GetTag returns the base Tag type, so cast it).
TagLib.Id3v2.Tag id3v2_tag = (TagLib.Id3v2.Tag)file.GetTag(TagLib.TagTypes.Id3v2, true);
if (id3v2_tag != null) {
    // Get the private frame, create if necessary.
    TagLib.Id3v2.PrivateFrame frame = TagLib.Id3v2.PrivateFrame.Get(id3v2_tag, "WM/MediaClassPrimaryID", true);
    // Set the frame data to your value. I am 90% sure that these are encoded with UTF-16.
    frame.PrivateData = System.Text.Encoding.Unicode.GetBytes(value);
}
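To sanity-check the encoding guess on bytes dumped from an existing frame, a small round-trip helper can be useful. This is a JavaScript sketch for illustration only; the function names are made up, and it assumes little-endian UTF-16 without a byte order mark, which is what .NET's Encoding.Unicode.GetBytes produces.

```javascript
// Encode a string as UTF-16LE bytes (low byte first, no byte order mark).
function encodeUtf16le(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// Decode UTF-16LE bytes back into a string; a trailing odd byte is ignored.
function decodeUtf16le(bytes) {
  let text = '';
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    text += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return text;
}

// "AB" is placeholder data, not a real frame value.
console.log(encodeUtf16le('AB').join(',')); // 65,0,66,0
console.log(decodeUtf16le([65, 0, 66, 0])); // AB
```

If decoding the bytes of an existing frame this way yields unreadable text, the frame probably holds something other than UTF-16 and should be written back in whatever form it was read.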
Why is it so hard to figure out how to draw Unicode characters on the iPhone, deriving simple font metrics along the way, such as how wide each imaged glyph is going to be in the font of choice?
It looks like it'd be easy with NSLayoutManager, but that API apparently isn't available on the phone. It appears the way people are doing this is to use a private API, CGFontGetGlyphsForUnichars, which won't get you past the Apple gatekeepers into the App store.
Can anybody point me to documentation that shows how to do this? I'm losing hair rapidly.
Howard
I assumed that the exclusion of CGFontGetGlyphsForUnichars was an oversight rather than a deliberate move; however, I'm not betting the farm on it. So instead I use
[NSString drawAtPoint:withFont:]; (in UIStringDrawing.h)
and
[NSString sizeWithFont:];
This also has the advantage of performing decent substitution on characters missing from your font, something that CGContextShowGlyphs does not do.
CoreText is the answer if you want to draw Unicode text rather than use CGContextShowGlyphsAtPositions. It is also better than [NSString drawAtPoint:withFont:] if you need custom drawing.
Here is an example:
CTLineRef line = CTLineCreateWithAttributedString((CFAttributedStringRef)attributedString);
CFArrayRef runArray = CTLineGetGlyphRuns(line);
// In more complicated cases, loop over runArray;
// here I assume the array contains only one CTRunRef.
const CTRunRef run = (CTRunRef)CFArrayGetValueAtIndex(runArray, 0);
// Do not use CTFontCreateWithName, otherwise you won't see e.g. Chinese characters.
const CTFontRef font = CFDictionaryGetValue(CTRunGetAttributes(run), kCTFontAttributeName);
CFIndex glyphCount = CTRunGetGlyphCount(run);
CGGlyph glyphs[glyphCount];
CGPoint glyphPositions[glyphCount];
CTRunGetGlyphs(run, CFRangeMake(0, 0), glyphs);
//you can modify positions further
CTRunGetPositions(run, CFRangeMake(0, 0), glyphPositions);
CTFontDrawGlyphs(font, glyphs, glyphPositions, glyphCount, context);
CFRelease(line);
I've made a pretty suitable replacement for the private function. Read about it here:
http://thoughts.codemelody.com/2009/07/a-replacement-for-cgfontgetglyphsforunichars/
I've built a simple application that applies grid-lines to an image or just simple colors for use as desktop wallpaper. The idea is that the desktop icons can be arranged within the grid. The problem is that depending on more things than I understand the actual spacing in pixels seems to be different from system to system. I've learned that at least these things play a factor:
Resolution (duh)
Taskbar size and placement
Fonts
There has to be more than this. Maybe there's some api call that I don't know about?
There are 1001 ways to get/set this (but I only know 2) :-D
Windows Registry:
HKEY_CURRENT_USER\Control Panel\Desktop\WindowMetrics
The values are IconSpacing and IconVerticalSpacing.
By code:
using System.Management;
public string GetWinIconSpace()
{
ManagementObjectSearcher searcher = new ManagementObjectSearcher("root\\CIMV2","SELECT * FROM Win32_Desktop");
foreach (ManagementObject wmi in searcher.Get())
{
try
{
return "Desktop Icon Spacing: " + wmi.GetPropertyValue("IconSpacing").ToString();
}
catch { }
}
return "Desktop Icon Spacing: Unknown";
}
And a 3rd way, which I have never tried, can be found here
There might also be a size problem due to the scaling algorithm if the requested size of the icon is not available.
(since an icon file is actually a collection of icons, as explained in this thread about Icons and cursors know where they came from, from The Old New Thing)
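For the registry route, the raw numbers still need converting to pixels. The JavaScript sketch below assumes the commonly described convention that WindowMetrics stores spacing as -15 times the pixel value (so -1125 corresponds to 75 pixels); treat that factor as an assumption and verify it on your target systems, especially with DPI scaling. The function name is made up for illustration.

```javascript
// Assumption: IconSpacing / IconVerticalSpacing are stored as -15 * pixels.
// Registry values arrive as strings, so convert before dividing.
function iconSpacingToPixels(registryValue) {
  return Number(registryValue) / -15;
}

console.log(iconSpacingToPixels('-1125')); // 75
```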