Saving a CSV with the following format in iPhone application - iphone

In my application I have a simple logbook where the user can save simple posts about an event. The format is like this:
Date, duration(seconds), distance(km), a comment AND categories with a variable number between 0 - 4 AND circumstances/conditions with a variable number between 0 - 4
An example would be:
Header of CSV file
Date,Duration,Distance,Comment
Then multiple rows like this
07.02.11,7800,300,"A comment"
07.02.11,7800,300,"A comment"
07.02.11,7800,300,"A comment"
But how can I add the categories and conditions to this format and how would I know where in the categories/conditions end in the CSV if I at a later point in the application want to import the file again?
(I do not need help with how to save etc this to file, already done that, but I could need guidiance on how to format it, thank you)
(This seems pretty odd)
Header of CSV file
Date,Duration,Distance,Comment, Category, Category, Category, Category, Condition, Condition, Condition, Condition
Then multiple rows like this
07.02.11,7800,300,"A comment", "Categoryname", "Categoryname", "Categoryname","Categoryname", "Condtion", "Condition", "Condtion", "Condition"
(Would this be better)
Header of CSV file
Date,Duration,Distance,Comment, Category, Condition
Then multiple rows like this
07.02.11,7800,300,"A comment", "Multiple category names separted by -", "Multiple condition names separted by -"

I think you last proposal of separating conditions or categories using a special separator symbol (the hyphen in your example) is the right one.
By the way I would suggest two extra things:
use a less common separator, that is something you can forbid the user to use without limiting user choice; probably the hyphen is a character you don't want to forbid, use a different sequence such as three pipes: ||| which is not common.
if possible (but be careful in this case about final destination of the CSV file) you can avoid using the standard "comma separator". The reason for this is that if comma is used inside the fields content, then this content must be separated by double quotes. This is some time problematic if you need to do some custom parsing by other software. Normally when I know that my CSV will not be used as source import from other software (e.g. Numbers or Excel) I prefer to use a different separator, e.g. a sequence of 2 hash (##) or something more "strange". Note that in this case you are no more strict-CSV compliant! but there is some software, like OpenOffice, which is more flexible with this special formats.

The second solution you are proposing will work but practically defeats the idea of using a standard format, since you will need to do the parsing of categories and conditions on your own instead of using a standard CSV parser. Writing your own parser is never good.
I would personally do this differently: not trying to put everything in a single file and have two files, one for events (each event has a unique id) and the other for categories/conditions (each category condition is associated to an event through the event's id, multiple categories/events for a given event would appear on multiple lines associated sharing the same event id). Both files would be standard CSV files.
As an alternative, if you are not tied to CSV for any reason, you might think of using JSON, which allows for a richer set of data types, including arrays, and offers plenty of code that you can reuse. This will not require much change to your code.
Another option, more "canonical" (IMO) but also more expensive in terms of code rewrite, would be using sqlite3.
If I had to choose, I would go for JSON, but I don't know if this is ok for you.

Related

Dataprep import dataset does not detect headers in first row automatically

I am importing a dataset from Google Cloud Storage (parameterized) into Dataprep. So far, this worked perfectly fine and one of the feature that I liked is that it auto detects that the first row in my (application/octet-stream) .csv file are my headers.
However, today I tried to import a new dataset and it did not detect the headers, but it auto assigned column1, column2...
What has changed and or why is this the case. I have checked the box auto-detect and use UTF-8:
While the auto-detect option is usually pretty good, there are times that it fails for numerous reasons. I've specifically noticed this when the field names contain certain characters (e.g. comma, invisible characters like zero-width-non-joiners, null bytes), or when multiple different styles of newline delimiters are used within the same file.
Another case I saw this is when there were more columns of data than there were headers.
As you already hit on, you can use the following snippet to do mostly the same thing:
rename type: header method: filter sanitize: true
. . . or make separate recipe steps to convert the first row to header and then bulk-rename to your own liking.
More often than not, however, I've found that when auto-detect fails on a previously working file, it tends to be a sign of some sort of issue with the source file. I would look for mismatched data, as well as misplaced commas within the output, as well as comparing the header and some data rows to the original source using a plaintext editor.
When all else fails, you can try a CSV validator . . . but in my experience they tend to be incredibly opinionated when it comes to the formatting options of the fileā€”so depending on the system generating the CSV, it could either miss any errors or give false-positives. I have had two experiences where auto-detect fails for no apparent reason on perfectly clean files, so it is possible that process was just skipped for some reason.
It should also be noted that if you have a structured file that was correctly detected but want to revert it, you can go to the dataset details, select the "..." (More) button, and choose "Remove structure..." (I'm hoping that one day they'll let you do the opposite when you want to add structure to a raw dataset or work around bugs like this!)
Best of luck!
Can be resolved as a transformation within a Flow:
rename type: header method: filter sanitize: true

Is there any cross-references mechanism in Netbeans?

One should think the title of this post self-explanatory: I would like to generate a table of all variable names and references inside a Netbeans project.
Is there a means I can do this?
I am mainly concerned with javascript.
By 'cross-reference' I mean a way to collect all names of variables and display in a list, along with those names, all places where the said variables are used. The term 'variables' also includes names of functions, of course.
Ideally the list generated could be exported in some format for ulterior use.

Postgres full text search ignore url

I am trying to use PostgreSQL to implement a full-text search system.
I encounter this strange or may be intended feature with that.
While trying to index or search for a column which contains names of files with extension (e.g. myimage.jpg), the system treats it as a url and does not properly tokenize.
I referred to the documentation and see that via ts_debug that the file name is taken as a host of a url.
Could some one tell how to take all inputs as normal word in the FTS of PostgreSQL.
Also, on a second request, how can one do a contains, startswith, and endswith searches with it?
Update
I have now tried the statement create text search configuration..., copied from pg_catalog.english and removed host,url, and url_path and then specified the configuration for the ts_debug method. But still no go., myimage.jpg is still identified as host.
Version
I use version 9.4
tl;dr Look at pre-parsing your input and removing punctuation if you really only want words (and not emails, urls, hosts, etc).
So after trying to figure this out myself the issue is that you don't seem to be able to easily customise the parser. From my understanding the parser runs first, which generates tokens. Those tokens are then matched to dictionaries.
By removing host, url, url_path from the configuration all you are doing is making it so that these tokens don't get looked up in a dictionary, resulting in no lexeme from these tokens. Which essentially means that they don't exist in terms of search. Which is not want you want...
Ideally what you need to do is customise the parser to not generate those tokens in the first place, or to also generate overlapping tokens (similar to how hyphenated words generate a token for the entire word as well as individual components) . This doesn't seem to be possible at the moment without writing a custom parser.
The only solution to this would be to pre-parse the text to remove the full stop. Note that if you rely on other types of tokens like version (e.g. 8.3.0) or email (e.g. name#domain.com) this will break those. So you may need to be a bit clever on how you remove characters.
select ts_debug('english', replace('this-is-a-file.jpg', '.', ' '));
"(asciihword,"Hyphenated word, all ASCII",this-is-a-file,{english_stem},english_stem,{this-is-a-fil})"
"(hword_asciipart,"Hyphenated word part, all ASCII",this,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",is,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",a,{english_stem},english_stem,{})"
"(blank,"Space symbols",-,{},,)"
"(hword_asciipart,"Hyphenated word part, all ASCII",file,{english_stem},english_stem,{file})"
"(blank,"Space symbols"," ",{},,)"
"(asciiword,"Word, all ASCII",jpg,{english_stem},english_stem,{jpg})"
In terms of your second question. Are you talking about partial word matches? You get this a little bit with the stemming when using a config like english, so running becomes run which will match if you search for run or running. If you're talking about fuzzy matching it gets a little more complicated. I suggest reading this article http://rachbelaid.com/postgres-full-text-search-is-good-enough/

Reading large csv files with strings containing commas as one field

I have a large .csv file (~26000 rows). I want to be able to read it into matlab. Another problem is that it contains a collection of strings delimited by commas in one of the fields.
I'm having trouble reading it. I tried stuff like tdfread, which won't work here. Any tricks with textscan i should be aware about?
Is there any other way?
I'm not sure what is generating your CSV file but that is your problem.
The point of a CSV file, is that the file itself designates separation of fields. If the text of the CSV contains commas, then nothing you can do will help you. How would ANY program know when the text in a single field contains commas, or when that comma is a field delimiter?
Proper CSV would have a text qualifier. Some generators/readers gives you the option to use one. The standard text qualifier is a " (quote). Its changeable, though, because your text may contain those, too.
Again, its all about generating proper CSV content.
There's a chance that xlsread won't give you the answer you expect -- do the strings always appear in the same columns, for example? I think (as everyone else seems to :-) that it would be more robust to just use
fid = fopen('yourfile.csv');
and then either textscan
t = textscan(fid, '%s', delimiter', sprintf('\n'));
t = t{1};
or just fgetl (the example in the help is perfect).
After that you can do some line-by-line processing -- using textscan again on the text content of each line, for example, is a nice, quick way to get a cell-array that will allow fast analysis of each line.
You have a problem because you're reading it in as a .csv, and you have commas within your data. You can get it in Excel and manipulate the date, possibly extract the unwanted commas with Excel formulas. I work with .csv files for DB imports quite a bit. I imagine matLab has similar rules, which is - no commas in your data.
Can you tell us more about your data? Are there commas throughout, our just one column? Maybe you can read it in as tab delimited?
Are you using a Unix system? The reason I am asking is that you could use a command-line function such as sed and regular expressions to clean those data files before you pass them into Matlab. Here is a link that explains how to do exactly what you are looking for.
Since, as others have observed, your file is CSV with commas inside what you think of as a single field, it's going to be hard to persuade Matlab that that really is only one field. I think your best strategy is going to be to read one line at a time, into a string acting as a buffer, and to translate it, field-by-field, into the variables or other data structures that you want. Since Matlab has in-built regular expression capabilities this shouldn't be too hard.
And, as others have already suggested, posting a sample of your data would help us to help you.
One easy solution is:
path='C:\folder1\folder2\';
data = 'data.csv';
data = dataset('xlsfile',sprintf('%s\%s', path,data));
Of course you could also do the following:
[data,path] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile',sprintf('%s\%s', path,data));
now you will have loaded the data as dataset. An easy way to get a column 1 for example is
double(data(1))

Disabling the PostgreSQL 8.4 tsvector parser's `file` token type

I have some documents that contain sequences such as radio/tested that I would like to return hits in queries like
select * from doc
where to_tsvector('english',body) ## to_tsvector('english','radio')
Unfortunately, the default parser takes radio/tested as a file token (despite being in a Windows environment), so it doesn't match the above query. When I run ts_debug on it, that's when I see that it's being recognized as a file, and the lexeme ends up being radio/tested rather than the two lexemes radio and test.
Is there any way to configure the parser not to look for file tokens? I tried
ALTER TEXT SEARCH CONFIGURATION public.english
DROP MAPPING FOR file;
...but it didn't change the output of ts_debug. If there's some way of disabling file, or at least having it recognize both file and all the words that it thinks make up the directory names along the way, or if there's a way to get it to treat slashes as hyphens or spaces (without the performance hit of regexp_replaceing them myself) that would be really helpful.
I think the only way to do what you want is to create your own parser :-( Copy wparser_def.c to a new file, remove from the parse tables (actionTPS_Base and the ones following it) the entries that relate to files (TPS_InFileFirst, TPS_InFileNext etc), and you should be set. I think the main difficulty is making the module conform to PostgreSQL's C idiom (PG_FUNCTION_INFO_V1 and so on). Have a look at contrib/test_parser/ for an example.