CNTK: Start of features is set to 1 - mechanism of UCIFastReader - frameworks

sorry for this rather simple question, however there is yet too little documentation about the usage of Microsoft's OpenSource AI library CNTK.
I continue to witness people setting the reader's feature start to 1, while setting the labels start to 0. But should both of them be always 0, as informations does in computer science always start from the zero point? Wouldn't we lose one piece of information this way?
Example of CIFAR10 02_BatchNormConv
features=[
#dimension = 3 (rgb) * 32 (width) * 32(length)
dim=3072
start=1
]
labels=[
dim=1
start=0
labelDim=10
labelMappingFile=$DataDir$/labelsmap.txt
]
Update: New format
Microsoft has recently updated this, in order to get rid of these confusion and make the CNTK Definition Language more readable.
Instead of having to define the start of the values within the line, you can now simply define the type of data in the dataset itself:
|labels <tab seperated values> | features <tab seperated values> [EndOfLine/EOL]
if you want to reverse the order of features and lables you can simply go for:
|features <tab seperated values> | labels <tab seperated values> [EndOfLine/EOL]
You only have still to define the dim value, in order to specify the amount of values you want to input.
Note: There's no | at the end of the row. EOL indicates the end of the row.
For more information visit the CNTK Wiki on this topic.

You are misunderstanding how the reader works. The UCIFastReader works on a file which contains tab separated feature vector. Each line in this file corresponds to an entry (an image in this case), as well as its classification.
So, dim tells the reader how many columns to read, while start tells the reader from which column to start reading.
So, if you had an image of size 2x2, with a 2 labels for each, your file could be of the form <image_pixel_columns><label_columns>:
0 0 0 0 0 0
0 0 1 0 1 0
...
So the first 4 entries in the line are your features (image pixel values), and the last two are your labels. Your reader would be of the form:
reader=[
readerType=UCIFastReader
file=$DataDir$/Train.txt
randomize=None
features=[
dim=4
start=0
]
labels=[
dim=2
start=4
labelDim=10
labelMappingFile=$DataDir$/labelsmap.txt
]
]
It's just that all examples given have the label placed in the first column.

Related

itextsharp , why is GetSingleSpaceWidth() returning 0 when a space is visually obvious?

Hi All,
This is a question related to itextsharp version 5.5.13.1. I am using a custom LocationTextExtractionStrategy implementation to extract sensible words from a PDF document. I am calling the method GetSingleSpaceWidth of TextRenderInfo to determine when to
join two adjacent blocks of characters into a single word as per the SFO link
itext java pdf to text creation
This approach has generally worked well. However, if you look at the attached document, the words "Credit" and "Extended" is giving me some problems.
Why are all the characters shown encircled in the screen capture returning a zero value for GetSingleSpaceWidth? This causes a problem . Instead of two separate words, my logic returns me one word "CreditExtended".
I understand that itextsharp5 is not supported any more. Any suggestions would be highly appreciated?
Sample document
https://drive.google.com/open?id=1pPyNRXvnUyIA2CeRrv05-H9q0sTUN97d
As already conjectured in a comment, the cause is that the font in question does not contain a regular space glyph, or even more exactly, does not map any of its glyphs to the Unicode value U+0020 in its ToUnicode map.
If a font has a ToUnicode map, iText uses only the information from that map. Thus, iText does not identify a space glyph in that font, so it cannot provide the actual SingleSpaceWidth value and returns 0 instead.
The font in question is named F5 and has this ToUnicode map:
/CIDInit /ProcSet findresource begin
14 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
4 beginbfchar
<0004> <0041>
<0012> <0043>
<001C> <0045>
<002F> <0049>
endbfchar
1 beginbfrange
<0044> <0045> <004D>
endbfrange
13 beginbfchar
<0102> <0061>
<0110> <0063>
<011A> <0064>
<011E> <0065>
<0150> <0067>
<015D> <0069>
<016F> <006C>
<0176> <006E>
<017D> <006F>
<0189> <0070>
<018C> <0072>
<0190> <0073>
<019A> <0074>
endbfchar
5 beginbfrange
<01C0> <01C1> <0076>
<01C6> <01C7> <0078>
<0359> <0359> [<2026>]
<035A> <035B> <2018>
<035E> <035F> <201C>
endbfrange
1 beginbfchar
<0374> <2013>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
As you can see, there is no mapping to <0020>.
The use of fonts in this PDF page is quite funny, by the way:
Its body is (mostly) drawn using Calibri, but it uses two distinct PDF font objects for this, F4 which uses WinAnsiEncoding from character 32 through 122, i.e. including the space glyph, and F5 which uses Identity-H and provides the above quoted ToUnicode map without a space glyph. Each maximal sequence of glyphs without gap is drawn separately; if that whole sequence can be drawn using F4, that font is used, otherwise F5 is used.
Thus, CMI, (Credit, and sub-indexes are drawn using F4 while I’ve, “Credit, and Extended” are drawn using F5.
In your problem string “Credit Extended”, therefore, we see two consecutive sequences drawn using F5. Thus, you'll get a 0 SingleSpaceWidth both for the “Credit t and the Extended” E.
At first glance these are the only two consecutive sequences using F5, so you have that issue only there.
As a consequence you should develop a fallback strategy for the case of two consecutive characters both coming with a 0 SingleSpaceWidth, e.g. using something like a third of the font size.

Is there a way to get the page number of the start of a paragraph with Applescript in Mac Word 2011?

I have a Word document and want to get the page number for any arbitrary paragraph within the document. I realise that paragraphs can span pages, so I actually need to ask about the start (or end) of the paragraph. Something like this pseudocode:
set the_page_number to page number of character 1 of paragraph 1 of my_document
I haven't been able to figure out how you link a range object with any kind of information about its rendering and am officially baffled.
Does anyone know the proper way?
I just found this question about dealing with this in C#: How do I find the page number for a Word Paragraph?
Poking around in the answer to that I found reference to range.get_Information(Word.WdInformation.wdActiveEndPageNumber)
It turns out there's a get range information command in the applescript dictionary, so you can do this:
set the_range to text object of character 1 of paragraph 123 of the_document
set page_number to get range information the_range information type active end adjusted page number
That'll get the page number that would be printed (e.g. if you'd set the document to start at page 42, this will produce the number you expect). Or, you can get the number without adjustment, i.e. your document page numbering is set to start at 42, but you want the page number as if numbering started at 1.
set the_range to text object of character 1 of paragraph 456 of the_document
set page_number to get range information the_range information type active end page number
Phew.

Zebra Programming Language (ZPL) II using ^FB or ^TB truncates text at specific lenghts

I am writing code to print labels for botanic gardens. Each label is printed individually but with different information on each label. Each label contains a scientific name which can vary greatly in size and thus can go over 2 lines (our label size is 10cm wide by 2.5cm high).
Our problem occurs mainly with the name when we go over 24 characters (See line with **).
If we choose a name that has 24 characters or less then it prints fine.
Anything more it will not print.
If we take all the other "items" off the label and just leave the "name" element then it prints only the first 24 characters and truncates the rest (we did this to test whether a possible overlap between our ^FB block and another element could be causing this problem).
We tried this with other elements that use a ^FB and we found that they displayed the same behaviour but varied in the length at which this issue occurred: for example "cc" (short for country code) had a limit of 21 characters.
For added information: we compile this code within a BASIC environment and use variables such as ":name:", ":Acc.dt":" as seen bellow. Our database provides this information and we have checked for any internal routines that would have truncated long names etc. Our code was working fine in ZPL but we recently had to move to ZPL II (we purchased a newer model GX430t) and had to modify our ZPL code at which point this problem started to occur.
Here is our code:
^XA
^LH40,40
^MMT
^PW1200
^LL1200
^FO16,05^A0N,35,^FDAcc. num.^FS
^FO170,05^A0,35,^FV":accnum:"^FS
^FO360,05^A0,35,^FV":qual:"^FS
^FO350,35^A0N,30,^FDAcc.dt.^FS
^FO450,35^A0N,30,^FB790,3,0,L,
^FH\^FV":accdt:"^FS
^FO430,70^^A0N,25,^FB790,3,0,L,
^FH\^FDProv. type^FS
^FO560,70^A0N,25,^FV":provtype:"^FS
^FO800,225^A0N,30,^FB790,3,0,L,
^FV":cc:"^FS
**^FO10,100^A0N,40,^FB790,3,0,L,
^FV":name:"^FS**
^FO1000,05^A0,35,^FV":proptype:"^FS
^FO5,225^A0,25^FVColl.^FS
^FO55,225^A0,25^FV":coll:"^FS
^FO375,225^A0,25,^FV":consstat:"^FS
^FO1000,70^A0,25,^FV":reqby:"^FS
^FO535,180^BCN,55,N,N,N^FV":qual:"^FS
^FO60,45^BCN,35,N,N,N^FV":accnum:"^FS
^PQ1,0,1,Y
^XZ
Here is what we have tried to fix this (apologies if some seem like wild cards):
Changing font type, size, and location on label;
Changing ^FO to ^FT;
Looked at our internal database logic;
Taking away ^FH\;
Changing the values within the ^FB line (we tried nearly all possible permutations);
Manually typed in a name longer than 24 characters (using notepad - no database/compiler) - same issue.
Any thoughts on this would be greatly appreciated
Kerry
I've had this issue before, and across printer manufacturers, firmwares and languages.
First, some paraphrased explanations straight out of the 2014 ZPL II Programming Guide (P1012728-009 Rev. A).
"The ^TB command prints a text block with defined width and height. The text block has an automatic word-wrap function. If the text exceeds the block height, the text is truncated."
"The ^FB (Field Block) command allows you to print text into a defined block type format. It can format a ^FD (Field Data) string into a block of text using the origin, font, and
rotation specified for the text string, and it contains an automatic word-wrap function."
Technically, the difference between a text block and a field block is that height is in dots for the former and in lines for the latter.
Also notice that although not mentioned, the ^FB command also truncates text that does not fit in the number of lines specified, and here's where the font size of the A0 command and the line spacing of the FB command now play an important role in determining whether to show or truncate that second or third line.
Incidentally, in other languages such as TSPL there is no truncation of text blocks--if you tell the block to be 3 lines in height but there's enough text for 4 lines, line 4 overlaps line 3 to indicate this--which may seem awful, but it is better than the data loss of truncation, which is not obvious.
For both commands:
"Using ^FT (Field Typeset) for your data takes the baseline origin of
the last possible line of text, meaning that the field block will be
filled from bottom to top."
"Using ^FO (Field Origin) means that the field block will be filled from top to bottom."
In reality, I have only been able to make the ^FB command work as expected, but that may be because ^TB is not implemented in the firmware I've worked with (ZPL II "compliant" Bluetooth printers).
You can test the following snippet for a 2x2 label in the Labelary Viewer:
^XA
~TA0
^MTD
^MNW
^MMT
^MFN
~SD15
^PR6
^PON
^PMN
^PW406
^LS0
^LRN
^LL406
^LT0
^LH0,0
^CI0
^XZ
^XA
^FO324,10,0^FB386,2,0,C,0^A0R,36,28.8^FH^FD"The King" Cupcake^FS
^FO278,10,0^FB386,1,0,C,0^A0R,28,22.4^FH^FDUse By 11/24/2015 02:45 PM^FS
^FO152,10,0^FB386,1,0,C,0^A0R,24,19.2^FH^FD11/24/2015 02:45 PM^FS
^FO62,140,0^FB250,1,0,R,0^A0R,24,19.2^FH^FDSL: 4 hours^FS
^FO38,10,0^FB386,1,0,L,0^A0R,18,14.4^FH^FDPREP DATE:^FS
^FO8,10,0^FB386,1,0,L,0^A0R,28,22.4^FH^FD11/24/2015 10:45 AM^FS
^FO62,10,0^FB50,1,0,L,0^A0R,24,19.2^FH^FDEMP:^FS
^FO92,10,0^FB376,3,0,J,0^A0R,18,14.4^FH^FDIngredients: 1 1/2 cups all-purpose flour, 1 teaspoon baking powder, 1/2 teaspoon salt, 8 tablespoons (1 stick) unsalted butter, room temperature, 1 cup sugar, 3 large eggs, 1 1/2 teaspoons pure vanilla extract, 3/4 cup milk.^FS
^PQ3,,,Y
^XZ
In particular, I've preceeded the A0 and FD commands with FB. Using the viewer, you can quickly test the effects of changing from FT and FO in the ingredients line, the effects of changing the A0 font sizes and the effects of changing the FB number of lines from say 3 to 2 (the viewer does not truncate text btw).
Of course there is no match for actually printing a label, for your ZPL II "compliant" printer may or may not truncate text according to its manufacturer and firmware version.
I hope that helps!

Excel: searching for a value within multiple arrays within cells

I'm trying to set up an error check between two systems and need to compare week numbers in different formats. One system produces week numbers in a text format e.g "8-15, 18, 31-32" and the other produces discrete values. How would I see whether a value e.g 16 fell within a multiple range like the one above?
It's part of a bigger issue where I'm checking a reference number, day, time and week number (e.g XXX111 Weds 9:00 9) in one system against the output of another system (e.g XXX111 Wed 9:00 7:11, 13, 16, 52-63 or XXX111 Thu 9:00 5, 6, 11-16). Despite lots of searching I've hit a wall with the bit above so any help would be greatly appreciated.
I'd rather not use VBA if possible. Thanks in advance for your wisdom.
Assumed:
7:11 should be 7-11
63 should be 53
A number not part of a range (eg 18) is not a problem
Ranges are in Text format
I hope the following helps or at least is ‘a step in the right direction’:
A Parse the components
Eg for 8-15, 18, 31-32, paste into a cell (say A1) and Data > Data Tools - Text to Columns > Delimited > Next > check Comma, Space and Treat consecutive delimiters as one > Next > Select Columns as required > select Text for each > Finish
May be easier to deal with a single column so select data, Copy > Select A2 > Paste Special > Transpose > OK and Delete contents of Row1.
B Add your search value (16) into B1
C Copy the formula below into B2 and copy down as required:
=AND(B$1>=VALUE(LEFT($A2,SEARCH("-",$A2)-1)),B$1<=VALUE(RIGHT($A2,LEN($A2)-SEARCH("-",$A2)))))
The result should be TRUE where the search value is within or on either bound of the discrete range:
The formula uses the hyphen to ‘recognise’ a discrete range. SEARCH looks for where it is positioned (because there could be one or two characters either side of it). LEFT and RIGHT are for the lower and upper bounds (in the case of RIGHT used in conjunction with LEN to address whether the upper bound is one or two characters). VALUE is required to convert the Text into something that can be equated to the search value. AND is for the process to consider both bounds in determining whether ‘in range’.
“I’d rather not use VBA if possible” – but might be advisable!
However, use of some fixed references ($) should make it a little easier than otherwise with standard formulae only because the given discrete ranges (which may be appended in ColumnA) can be queried for various search values by copying the formulae across to the right/down as required and entering (as Number format) further search values in Row1.

How can I generate this hash?

I'm new to programming (just started!) and have hit a wall recently. I am making a fansite for World of Warcraft, and I want to link to a popular site (wowhead.com). The following page shows what I'm trying to figure out: http://www.wowhead.com/?talent#ozxZ0xfcRMhuVurhstVhc0c
From what I understand, the "ozxZ0xfcRMhuVurhstVhc0c" part of the link is a hash. It contains all the information about that particular talent spec on the page, and changes whenever I add or remove points into a talent. I want to be able to recreate this part, so that I can then link my users directly to wowhead to view their talent trees, but I havn't the foggiest idea how to do that. Can anyone provide some guidance?
The first character indicates the class:
0 Druid
c Hunter
o Mage
s Paladin
b Priest
f Rogue
h Shaman
I Warlock
L Warrior
j Death Knight
The remaining characters indicate where in each tree points have been allocated. Each tree is separate, delimited by 'Z'. So if e.g. all the points are in the third tree, then the 2nd and 3rd characters will be "ZZ" indicating "end of first tree" and "end of second tree".
To generate the code for a given tree, split the talents up into pairs, going left-to-right and top-to-bottom. Each pair of talents is represented by a single character. So for example, in the DK's Blood tree segment, the first character will indicate the number of points allocated to Butchery and Subversion, and the second character will stand for Blade Barrier and Bladed Armor.
What character represents each allocation among the pair? I'm sure there's an algorithm, probably based on the ASCII character set, but all I've worked out so far is this lookup table. Find the number of points in the first talent along the top, and the number of points in the second talent along the left side. The encoded character is at the intersection.
0 1 2 3 4 5
0 0 o b h L x
1 z k d u p t
2 M R r G T g
3 c s f I j e
4 m a w N n v
5 V q i A y E
So if our Death Knight has one point in Butchery and two points in Subversion, the first character is 'R'. If instead we put no points in those two and five in Blade Barrier, the first two characters will be "0x". Trailing '0's (all the other pairs in the tree with no points allocated) can be omitted, as can trailing 'Z' delimiters (when there are no points in the subsequent trees). For one final example, the entire code for a DK with just a single point in Toughness would be "jZ0o": "Death Knight", "End of the first tree", "No points in the first pair of talents", "one point in the first talent of the second pair".
Can anyone work out what function generates the lookup table above? There's probably a clue in the codes for the classes: in alphabetical order (except for the DK which was added to the game after the others), they correspond to a series in the lookup table of (0,0), (0,3), (1,0), (1,3), (2,0), etc.
If you go to http://www.wowhead.com/?talent and start using the talent tree you can see the mysterious code being built up in the address bar as you click on the various boxes. So it's definitely not a hash but some kind of encoded structure data.
As the code is built up as you click the logic for building the code will be in the JavaScript on that page.
So my advice is do a view source on the page, download the JavaScript files and have a look at them.
I think it isn't a hash value, because hash values are normally one-ways values. This means you cannot (easily) restore the original information from which the hash code was generated.
Best thing would be to contact someone from wowhead.com and ask them how to interpret this information. I am sure they will help you out with some information about what type of encoding they use for the parameters. But without any help of the developers from wowhead.com it is almost impossible to figure out what information is encoded into this parameter.
I am not even sure the parameter you mentioned contains the talents of your character. Maybe it's just a session id or something like that. Take a look into the post data your browser sends to the server, it may contain a hidden field with the value you are searching for (you can use Tamper Data Firefox Addon).
I don't think ozxZ0xfcRMhuVurhstVhc0c is a hash value. I think it is a key (probably encrypted/encoded in some way). The server uses this key to retrieve information from it database. Since you don't have access to the database you don't know which key is needed, let alone how to encode it.
You need the original function that generates the hash.
I don't think that's public though :(
Check this out: hash wikipedia
Good luck learning how to program!
These hashes are hard to 'reverse engineer' unless you know how it was generated.
For example, it could be:
s1 = "random_string-" + score;
hash = encrypt(s1)
...etc
so it is hard to get the original data back from the hash (that is the whole point anyway).
your best bet would be link to the profile that would have the latest score ..etc