Orange: importing data with hash (#) in column name

I want to load a data set that contains a hash (#) symbol in the header. I work with bigrams, and some of the columns are named "d#" or "z#".
According to the Orange docs,
the hash is used to attach attribute type information, so for some of the columns I get an error that I am using a wrong specifier ("Invalid attribute flag z").
Is there any workaround to tell Orange that my labels don't carry type flags?

You could prefix those header labels with a correct type# specifier. For example, if you have bag-of-bigrams data, replace the d# and z# columns with C#d# and C#z#, marking them as continuous (counts).
With some luck, Orange will interpret the leading C# as the continuous-type flag and the rest as the attribute name.
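That renaming is easy to script before loading the file into Orange. A minimal sketch, assuming a tab-separated file with a single header row (the function name and file layout are assumptions, not Orange API):

```python
# Prefix every column header with the continuous-type flag "C#", so that
# a bigram column like "d#" becomes "C#d#" and Orange no longer tries to
# read the trailing "#" as a type specifier.

def prefix_headers(header_line, sep="\t"):
    """Return the header line with 'C#' prepended to each column name."""
    cols = header_line.rstrip("\n").split(sep)
    return sep.join("C#" + c for c in cols)

print(prefix_headers("d#\tz#\tth#"))  # C#d#	C#z#	C#th#
```

You would apply this to the first line of the file only and leave the data rows untouched.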

Related

Extracting data from old text file into usable format

I have some data in a text file in the following format:
1079,40,011,1,301 17,310 4,668 6,680 1,682 1,400 7,590 2,591 139,592 332,565 23,568 2,569 2,595 1,471 1,470 10,481 12,540 117,510 1,522 187,492 9,533 41,558 15,555 12,556 9,558 27,546 1,446 1,523 4000,534 2000,364 1,999/
1083,40,021,1,301 4,310 2,680 1,442 1,400 2,590 2,591 90,592 139,595 11,565 6,470 2,540 66,522 4,492 1,533 19,546 3,505 1,523 3000,534 500,999/
These examples represent what would be two rows in a spreadsheet. The first four values (in the first example, "1079,40,011,1") each go into their own column. The rest of the data are in a paired format, first listing a name of a column, designated by a number, then a space followed by the value that should appear in that column. So again, example: 301 17,310 4,668 6: in this row, column 301 has a value of 17, column 310 has value of 4, column 668 has value of 6, etc. Then 999/ indicates an end to that row.
Any suggestions on how I can transform this text file into a usable spreadsheet would be greatly appreciated. There are thousands of "rows", so I can't just convert them manually, and I don't have the coding skills to write such a transformation myself.
This is messy but since there is a pattern it should be doable. What software are you using?
My first idea would be to identify where the delimiter changes from comma to space. Is it based on a fixed width, i.e. always after 14 characters? Or is it based on the delimiter, i.e. always after the 4th comma?
Once you've done that, you could make two passes at the data. The first pass imports the first four values from the beginning of the line which are separated by comma. The second pass imports the remaining values which are separated by space.
If you include a row number when importing, you can then use it to join the first and second passes.
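If some scripting is an option, the two passes can be folded into one small function. A sketch under the assumptions stated in the question: the first four comma-separated values are fixed columns, the rest are "column value" pairs, and ",999/" ends the record:

```python
# Parse one record of the described format into the four fixed values
# plus a {column-number: value} mapping for the paired part.

def parse_record(line):
    line = line.strip().rstrip("/")
    parts = line.split(",")
    fixed = parts[:4]                 # first four values: their own columns
    pairs = {}
    for item in parts[4:]:
        if item == "999":             # 999 marks the end of the row
            break
        col, _, val = item.partition(" ")
        pairs[col] = val
    return fixed, pairs

fixed, pairs = parse_record("1083,40,021,1,301 4,310 2,680 1,442 1,400 2,999/")
print(fixed)         # ['1083', '40', '021', '1']
print(pairs["301"])  # 4
```

Running this over every line and collecting the union of all column numbers seen would give you the full spreadsheet header; a csv.DictWriter could then write one row per record.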

Trying to work around the error DF-CSVWriter-InvalidEscapeSetting

So I have a dataset which I want to export to csv with pipe as separator and no escape character.
That dataset contains in fact 4 source columns, 3 regular ones (just text) and one variable one.
That last column holds another subset of values that are also separated with a pipe.
Purpose is that the export looks like this, where the values are coming from my 4th field.
COL1|COL2|COL3|VAL1|VAL2|VAL3|....
The number of values can differ for each record.
When I set the CSV export separator to ";", I get this result, which is expected:
COL1;COL2;COL3;VAL1|VAL2|VAL3|....
However setting it to "|", it throws the error DF-CSVWriter-InvalidEscapeSetting.
Most likely this is because it detects the separator character in my 4th field and then enforces that an escape character be set.
That is logical in most cases, but here I would like it to ignore the check and just export as-is.
Any way how I can work around this, perhaps with a different approach or some additional settings?
Split & flatten produces extra rows but that's not what I want.
Regards,
Sven Peeters
Because the column value contains the same character as your delimiter, writing with no escape character set will throw an error.
You have to either change the delimiter to a different character, or set both the Quote character and the Escape character to double quote (").

Can to_tsvector be used on a column having single word rather than a sentence?

I am trying to find the English words in a Column "Name" in Postgresql.
I found the function "to_tsvector" does this:
If the Name column has the value "The Best Dogs", the vector returns the following tokens:
best:2 dogs:3
I wonder if it will work when the input is a single word (rather than a sentence),
e.g. the value "Dogs" in the column.
Can I expect Dogs:1?
I tried it and it's not working; I am wondering whether it's a configuration issue or an input-format issue.

Matlab - Using special characters in table header

I have constructed a table with data from a struct and now wish to add custom headers to the columns before exporting the table. I found the following command:
T.Properties.VariableNames{'OldHeader'} = 'NewHeader';
This command however does not allow me to use spaces or special characters for my headers. My table contains the output from processed lab data and I wish to have headers like "Vol. [mL]" and "Conc. [wt%]".
To illustrate using the example from matlab documentation:
S.Name = {'CLARK';'BROWN';'MARTIN'};
S.Gender = {'M';'F';'M'};
S.SystolicBP = [124;122;130];
S.DiastolicBP = [93;80;92];
T = struct2table(S)
T.Properties.VariableNames{'Gender'} = 'Sex';
The above works, but restricts me to normal characters and no spaces. My question is how to change "Gender" to "Vol. [mL]" - if even possible?
As @Jubobs already mentioned in the comments, there are rules for naming variables that prevent you from choosing the exact name that you want. From the documentation that I found by googling the topic:
A valid variable name starts with a letter, followed by letters,
digits, or underscores. MATLAB® is case sensitive, so A and a are not
the same variable. The maximum length of a variable name is the value
that the namelengthmax command returns.
You cannot define variables with the same names as MATLAB keywords,
such as if or end. For a complete list, run the iskeyword command.
However, I could think of two easy ways to work around this:
Use a different, valid name, for example the variable name Vol_mL instead of Vol. [mL].
Store the display names in a separate list, indexed by a short code such as v1 for the first variable: make v1 the table's variable name, and look up the full header text when you export.
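The second workaround is language-agnostic: keep safe internal names in the table and substitute the display headers only at export time. A minimal sketch (Python here for brevity; all names are made up for illustration):

```python
# Map safe internal variable names to the desired display headers.
display_names = {"v1": "Vol. [mL]", "v2": "Conc. [wt%]"}

def export_header(internal_names):
    """Replace internal names with display headers where one is defined."""
    return [display_names.get(name, name) for name in internal_names]

print(export_header(["Name", "v1", "v2"]))  # ['Name', 'Vol. [mL]', 'Conc. [wt%]']
```

In MATLAB the same idea would be a cell array of header strings written out (e.g. with fprintf) ahead of the numeric data, while the table itself keeps the valid names.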

ReadParse() and Hash values order

I am trying to read form values into a hash (%in) using the ReadParse() function, but I am not getting the elements in the order I submitted them in the form. I want to get them in the same order as submitted.
Please give me a solution. Thanks.
Check perldoc CGI, section "FETCHING THE NAMES OF ALL THE PARAMETERS PASSED TO YOUR SCRIPT":
my @names = $query->param;
As of version 1.5, the array of parameter names returned will be in the same order as they were submitted by the browser. Usually this order is the same as the order in which the parameters are defined in the form (however, this isn't part of the spec, and so isn't guaranteed).
Hash keys/values are not stored in the order they are added.
What are you trying to accomplish? Perhaps there is another way?
I didn't realize that the order is specified in the HTML spec:
application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content
type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by
'+', and then reserved characters are escaped as described in
[RFC1738], section 2.2: Non-alphanumeric characters are replaced by
'%HH', a percent sign and two hexadecimal digits representing the
ASCII code of the character. Line breaks are represented as "CR LF"
pairs (i.e., '%0D%0A').
The control names/values are listed in the
order they appear in the document. The name is separated from the
value by '=' and name/value pairs are separated from each other by
'&'.
[http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4]
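This pairs-in-document-order guarantee is observable outside Perl as well; for comparison, a quick sketch with Python's standard library, which parses an x-www-form-urlencoded body into an ordered list of pairs:

```python
# parse_qsl returns (name, value) tuples in the order they appear in the
# encoded body, unlike a plain hash/dict keyed by name.
from urllib.parse import parse_qsl

body = "name=Sven&city=Antwerp&lang=perl"  # sample body, made up
pairs = parse_qsl(body)
print([name for name, value in pairs])  # ['name', 'city', 'lang']
```

The Perl-side equivalent of this list-of-pairs approach is to iterate over $query->param for the names (as above) rather than over the keys of %in.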