Start token in window word embeddings - neural-network

I'm using the pre-trained SENNA embeddings and feeding a 3 word window into a Dense neural net.
Does senna have a start or end token embedding?
Or do I create a random vector?
Sentence: 'McDonalds sells fries'
input 1: ['<s>', 'McDonalds', 'sells']
But there is no embedding for <s>...
Do I create my own? (all -1 for example)?

Reading the main page at https://ronan.collobert.com/senna/, it seems that tokenisation (which would include special sentence boundary tokens) is taken care of internally.
Is there a reason why you want to include them?
From the site, they seem to be implicit in the input context (a single sentence).
Taken from the website:
Usage
SENNA reads input sentences from the standard input
and outputs tags into the standard output.
The most likely command line usage for SENNA is therefore:
senna [options] < input.txt > output.txt
Of course you can run SENNA in an interactive mode
without the "pipes" < and >.
Each input line is considered as a sentence.
SENNA has its own tokenizer for separating words,
which can be deactivated with the -usrtokens option.
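If you do feed the raw embeddings into your own network, one common workaround is to pad each sentence with a boundary token and map that token to a fixed vector of your choosing. The sketch below is only an illustration under stated assumptions: the '<s>' token, the all-zero pad vector, the 50-dimensional size and the toy embedding table are all hypothetical stand-ins, not something taken from the SENNA distribution.

```python
import random

EMB_DIM = 50   # assumed embedding size; adjust to the vectors you actually load
PAD = "<s>"    # hypothetical boundary token, as in the question


def make_windows(tokens, size=3):
    """Return one window of `size` tokens centred on each word, padded at the edges."""
    half = size // 2
    padded = [PAD] * half + list(tokens) + [PAD] * half
    return [padded[i:i + size] for i in range(len(tokens))]


def embed_window(window, embeddings, pad_vector):
    """Concatenate the vectors of a window into one flat input for a dense layer."""
    flat = []
    for token in window:
        flat.extend(pad_vector if token == PAD else embeddings[token])
    return flat


# Toy stand-in for the pre-trained table (random values, for illustration only).
rng = random.Random(0)
sentence = ["McDonalds", "sells", "fries"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in sentence}

pad_vector = [0.0] * EMB_DIM  # assumption: zeros; a small random vector also works
windows = make_windows(sentence)
features = [embed_window(w, embeddings, pad_vector) for w in windows]
print(windows[0])
```

Whether zeros, a constant, or a random vector works best is an empirical question; what matters is that the same vector is used consistently at training and prediction time.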

Starspace: What is the interpretation of the labelDoc fileFormat?

The starspace documentation is unclear on the parameter 'fileFormat' which takes the value 'labelDoc' or 'fastText'.
I would like to understand intuitively what material difference setting this parameter would make.
Currently, my best guess is that if you set fileFormat to 'fastText' then all tokens in the training file that do not have the prefix '__label__' will be broken down into character-level n-grams as in fastText.
Alternatively, if you set fileFormat to 'labelDoc' then starspace will assume that all tokens are actually labels, and you do not need to prepend '__label__' to the tokens, because they will be recognized as labels anyway.
Is my thinking correct?
The way StarSpace uses the labels depends heavily on the trainMode you are using. The labelDoc format is useful with a trainMode that relies only on labels (trainMode 1 through 4). There it can be equivalent to use the fastText format with the __label__ prefix, but some trainModes (trainMode 1 or 3) benefit from the labelDoc format because it lets a whole sentence act as a single label element.
To clarify: if you are performing a text classification task (as explained in this example), labelDoc wouldn't have any input recognized. On the other hand, as you stated, the fastText format will break down all non-labeled text as input and learn to predict the __label__ tags.
An example for the labelDoc format would be developing a content-based recommender system (as explained in this example), where every tab-separated sentence is used on the LHS or RHS at training time. But if you take a collaborative approach (the content of the articles, or wherever your sentences come from, is not taken into account), it can be trained with either the fastText format (specifying the __label__ prefix) or the labelDoc format, as labels are picked randomly at training time for the LHS or RHS. (This second example is explained here.)
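To make the difference between the two layouts concrete, here is a rough sketch of how one training line might be written in each format. It is based only on the description above: a fastText-style line is plain words followed by __label__-prefixed tags, and a labelDoc-style line is a set of tab-separated sentences. The helper names and the sample words are made up for illustration and are not part of StarSpace.

```python
def fasttext_line(words, labels):
    """Input words followed by labels, each label carrying the __label__ prefix."""
    return " ".join(list(words) + [f"__label__{label}" for label in labels])


def labeldoc_line(sentences):
    """Several sentences on one line, separated by tabs; no prefix is needed."""
    return "\t".join(" ".join(sentence) for sentence in sentences)


# Classification-style example: text as input, one tag to predict.
print(fasttext_line(["cheap", "flights", "to", "rome"], ["travel"]))

# Content-based example: each tab-separated sentence can serve as LHS or RHS.
print(repr(labeldoc_line([["cheap", "flights"], ["hotel", "deals", "in", "rome"]])))
```

Check the StarSpace README for the exact requirements of the trainMode you pick before relying on either layout.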

Using fprintf() and disp() functions to display messages to command window in MATLAB?

Currently working on a project in which I must take multiple user-inputs. Because my input prompts must outline specific formatting to the user regarding how they should input their values, this makes each input prompt rather lengthy and so I've deemed it appropriate to separate each one with a line break so that it's easy to tell them apart/so that it looks nice. The last prompt is two lines long, so it would be hard to distinguish this one from the rest if they were all jumbled together rather than separated by line breaks.
I've explored the usage of fprintf() and disp(), and have found that fprintf() has some tricky behavior and sometimes will not work without including things like fflushf(), etc. Moreover, I've read that fprintf() is actually purposed for writing data to text files (from the MathWorks page, at least), and using it for another purpose is something I could definitely see my professor deducting points for if there is indeed an easier way (we are graded very harshly on script efficiency).
The disp() command seems to be more in-line with what I'm looking for, however I can't find anything on it being able to support formatting operators like \n. For now, I've resorted to replacing the usage of \n with disp(' '), however this is certainly going to result in a deduction of points.
TL;DR Is there a more efficient way to create line-breaks without using fprintf('text\n')? I'll attach a portion of my script for you to look at:
disp('i) For the following, assume Cart 1 is on the left and Cart 3 is on the right.');
disp('ii) Assume positive velocities move to the right, while negative velocities move to the left.');
prompt = '\nEnter an array of three cart masses (kg) in the form ''[M1 M2 M3]'': ';
m = input(prompt);
prompt = '\nEnter an array of three initial cart velocities (m/s) in the form ''[V1 V2 V3]'': ';
v0 = input(prompt);
disp(' ');
disp('Because the initial position of the three carts is not specified,');
prompt = 'please provide which two carts will collide first in the form ''[CartA CartB]'': ';
col_0 = input(prompt);
You can get disp to display a new line with the newline function. Putting multiple strings in square brackets will concatenate them.
disp(['Line 1' newline 'Line 2'])
You mention using fprintf. Although it is documented mainly for writing to files, calling it without a file identifier prints to the Command Window. If you would rather stay with disp, you can use the sprintf function to build the same formatted string.
disp(sprintf('Line 1 \nLine 2'))
In addition to Matt's solution, I figured out another way to solve my problem and wanted to post it here for anyone in the future with the same problem.
After some experimentation and some thought, I figured the most efficient way to do this (ideally) would not involve using disp() or fprintf() at all and instead would, in theory, involve actually manipulating the input prompts themselves to appear on multiple lines (rather than adding 'dummy' lines before the last line of each prompt, to make it seem as if it was all part of the prompt itself). I've been aware this whole time that simply a newline character \n will give me a linebreak in the middle of the sentence, and in theory this would work. But because the very last prompt is two lines long, simply typing one line with \n halfway through would make that line of code very long, which is what I was trying to avoid in the first place.
I realize my initial question didn't explicitly mention concatenating two (or more) strings to form an input prompt that appears on multiple lines both in the console and in the script itself, but that's essentially where I was going with this post and I apologize for any lack of clarity regarding this.
Anyway, I fixed this problem without having to use disp() or fprintf() by declaring the prompt as a concatenated character array, rather than as a single string with the preceding lines of the prompt printed above it using disp() and/or fprintf(), as you can see in the code I originally provided in the question. Here's how it looked before:
disp(' ');
disp('Because the initial position of the three carts is not specified,');
prompt = 'please provide which two carts will collide first in the form ''[CartA CartB]'': ';
col_0 = input(prompt);
versus how it looks now:
prompt = ['\nBecause the initial position of the three carts is not specified, please',...
'\nprovide which two carts will collide first in the form ''[CartA CartB]'': '];
col_0 = input(prompt);
In short, you can build the entire prompt by concatenating its pieces inside square brackets and inserting \n wherever you see fit.

How to run a disassembled code 6502?

I have to program the 6502 in assembly.
I was required to use the VICE 128 emulator.
I was told that the Commodore 128 is compatible with the 6502 instruction set.
I am a novice. I was given a practical demonstration, but I did not understand any of it.
There was an 80-column interface that was brought up with a command (which one?).
The machine-language or assembly instructions (the program)
were entered directly into this 80-column matrix.
The data were also entered into this matrix.
So is this matrix the memory? What does each line represent?
I was told that this is disassembled 6502 code, but I do not know what that means.
I'm very confused.
I want to run this simple program that
performs the sum of two numbers.
The two numbers are stored in the first page, at word zero and word one. I want to store the result in the second word of the first page.
I imagined that the first line contains 80 words. Is that right?
So I put here the data in hexadecimal (3 and 2).
$03 $02
LDA $00
ADC $01
STA $02
But I have a syntax error.
I hope someone can help me because it escapes me how things work.
Thanks in advance
First, on the 6502 we deal with bytes, not words (it's an 8-bit architecture).
You don't mention which macro assembler you are using, but I assume that it's trying to interpret $03 as an opcode, not data. I looked up two options:
in ca65 you can use
.BYTE $03, $02
in dasm you use
HEX 03 02
In addition, 6502 has no concept of 80 anything (words, lines whatever). The only 80 I can think of is the old terminals that had 80 columns. I don't see how this is relevant here.
How to run a disassembled code 6502?
You have to assemble the code back into machine code.
Each 6502 instruction assembles to 1, 2, or 3 bytes: the first is called the opcode, and the optional second and third bytes hold the data used by the instruction (the operand).
You need a program to translate the instruction mnemonics to bytes. There were many such programs on the Commodore.
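To illustrate what such a translation program does, here is a deliberately tiny sketch that handles only the three instructions from the question, all in zero-page addressing. The opcode values ($A5 for LDA, $65 for ADC, $85 for STA) are the standard 6502 zero-page encodings; everything else (the function name, the input format, the lack of labels or other addressing modes) is a simplification for illustration and not how a real assembler or the C128 monitor is built.

```python
# Zero-page opcodes for the three mnemonics used in the question.
OPCODES = {"LDA": 0xA5, "ADC": 0x65, "STA": 0x85}


def assemble(lines):
    """Translate lines such as 'LDA $00' into opcode/operand byte pairs."""
    machine_code = bytearray()
    for line in lines:
        mnemonic, operand = line.split()
        machine_code.append(OPCODES[mnemonic])               # first byte: the opcode
        machine_code.append(int(operand.lstrip("$"), 16))    # second byte: the operand
    return bytes(machine_code)


program = assemble(["LDA $00", "ADC $01", "STA $02"])
print(" ".join(f"{byte:02X}" for byte in program))  # A5 00 65 01 85 02
```

Each two-byte pair is one instruction. A real program would normally also clear the carry flag (CLC) before ADC so that a stale carry does not change the sum.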
The Commodore 128 had a built-in monitor that let you enter instructions to assemble directly. You can enter it by typing MONITOR at the BASIC prompt. You would need to first set the address, then use "assemble" commands. Then use the "go" command at the starting address to run it. Use BASIC POKE command to set locations containing data, before you enter the monitor. The address 0B00 is a good address to use as it's the tape buffer which is unused except during tape I/O.
Good luck.

What is print <<EOF; and how is it working? [duplicate]

What exactly is the statement in https://stackoverflow.com/questions/4151279/perl-print-eof doing? I came across the previous post but didn't understand what it is trying to explain. What is that PETE? Can anyone explain every line? How does the code work?
print <<EOF;
This is
a multiline
string
EOF
print <<PETE;
This is
a multiline
string
PETE
What is the difference and similarity between these two? In place of PETE I have used many other words like DOG and it works the same every time.
This is called a here-doc. It grabs everything from the next line up to (but not including) an end marker line. In a shell, that text is presented as standard input to the program you're running; in Perl, it becomes a string, which print then outputs. The end marker line is controlled by the text following the <<.
As an example, in bash (which I'm more familiar with than Perl), the command:
cat <<EOF
hello
goodbye
EOF
will run cat and then send two lines to its standard input (the hello and goodbye lines). Perl also has this feature though the syntax is slightly different (as you would expect, given it's a different language). Still, it's close enough for the explanation to still hold.
Wikipedia has an entry for this which you probably would have found had you known it was called a here-doc, but otherwise it would be rather hard to figure it out.
In your particular case, there is no difference between using EOF and PETE. What matters is the relationship between the here-doc marker (the bit following <<) and the line that ends the text: they must match.
For example, if one of your input lines was EOF, you couldn't really use that as a marker since the standard input would be terminated prematurely:
cat <<EOF
This section contains the line ...
EOF
but then has more stuff
and this line following is the real ...
EOF
In that case, you could use PETE (or anything else that doesn't appear in the text on its own line).
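The premature-termination problem can be sketched in a few lines. The function below is only a toy model of the rule described above (collect lines until the first line that equals the marker). It is not how Perl or bash implement here-docs, and the name read_heredoc is made up for this example.

```python
def read_heredoc(lines, marker):
    """Return (body, rest): the lines before the first marker line, and those after it."""
    body = []
    for index, line in enumerate(lines):
        if line == marker:
            return body, lines[index + 1:]
        body.append(line)
    raise ValueError(f"terminator {marker!r} not found")


text = [
    "This section contains the line ...",
    "EOF",
    "but then has more stuff",
    "and this line following is the real ...",
    "EOF",
]

# With EOF as the marker, the here-doc ends after the first line.
body, rest = read_heredoc(text, "EOF")
print(len(body), len(rest))

# With a marker that does not occur in the text, all four lines are kept.
body2, rest2 = read_heredoc(text[:-1] + ["PETE"], "PETE")
print(len(body2), len(rest2))
```

In the first call the three leftover lines would be treated as ordinary code or commands, which is why a marker that never appears alone on a line of the text is the safer choice.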
There are other options such as using quotes around the marker (so the indentation can look better) and the use of single or double quotes to control variable substitution.
If you go to the perlop page and search for <<EOF, it will hopefully all become clear.
See Quote and Quote-like Operators (it's pretty well explained).

Help me understand this Perl statement with <<'ESQ'

substr($obj_strptime,index($strptime,"sub")+6,0) = <<'ESQ';
shift; # package
....
....
ESQ
What is this ESQ and what is it doing here? Please help me understand these statements.
It marks the end of a here-doc section.
EOF is more traditional than ESQ though.
This construct is known as a here-doc (because you're getting standard input from a document here rather than an external document on the file system somewhere).
It basically reads everything from the next line up to but excluding an end marker line. In a shell, that text is used as standard input to the program or command that you're running; in Perl, it becomes a string value, which in your statement is assigned into the substr. The end marker line is controlled by the text following the <<.
As an example, in bash (which I'm more familiar with than Perl), the command:
cat <<EOF
hello
goodbye
EOF
will run cat and then send two lines to its standard input (the hello and goodbye lines). Perl also has this feature though the syntax is slightly different (as you would expect, given it's a different language). Still, it's close enough for the explanation to still hold.
Wikipedia has an entry for this which you probably would have found had you known it was called a here-doc, but otherwise it would be rather hard to figure it out.
You can basically use any suitable marker. For example, if one of your input lines was EOF, you couldn't really use that as a marker since the standard input would be terminated prematurely:
cat <<EOF
This section contains the line ...
EOF
but then has more stuff
and this line following is the real ...
EOF
In that case, you could use DONE (or anything else that doesn't appear in the text on its own line).
There are other options such as using quotes around the marker (so the indentation can look better) and the use of single or double quotes to control variable substitution.
If you go to the perlop page and search for <<EOF, it will hopefully all become clear.