I have a .csv file and I can't read it on Octave. On R I just use the command below and everything is read alright:
myData <- read.csv("myData.csv", stringsAsFactors = FALSE)
However, when I go to Octave it doesn't do it properly with the below command:
myData = csvread('myData.csv',1,0);
When I open the file with Notepad, the data looks something like the below. Note there isn't a comma separating the last column name (i.e. Column3) from the first value (i.e. Value1) and the same thing happens with the last value of the first row (i.e. Value3) and the first value of the second row (i.e Value4)
Column1,Column2,Column3Value1,Value2,Value3Value4,Value5,Value6
The Column1 is meant for date values (with format yyyy-mm-dd hh:mm:ss), I don't know if that has anything to do with the problem.
Alex's answers already explains why csvread does not work for your case. That function only reads numeric data and returns an array. Since your fields are all strings, you need something that reads a csv file into a cell array.
That function is named csv2cell and is part of the io package.
As a separate note, if you plan to make operation with those dates, you may want to convert those dates as strings, into serial date numbers. This will allow you to put your dates in a numeric array which will allow for faster operations and reduced memory usage. Also, the financial package has many functions to deal with dates.
csvread only reads numeric data, so a date does not qualify unfortunately.
In Octave you might want to check out the dataframe package. In Matlab you would do readtable.
Otherwise there are also more primitive functions you can use like textscan.
Related
I am trying to read in some data in date format and the solution is eluding me. Here are four of my tries using the simplest self-contained examples I could devise. (And the site is making me boost my text-to-code ratio in order for this to post, so please ignore this sentence).
*EDIT - my example was too simplistic. I have spaces in my variables, so I do need to specify positions (the original answer said to ignore positions entirely). The solution below works, but the date variable is not a date.
data clinical;
input
name $ 1-13
visit_date $ 14-23
group $ 25
;
datalines;
John Turner 03/12/1998 D
Mary Jones 04/15/2008 P
Joe Sims 11/30/2009 J
;
run;
No need to specify the lengths. datalines already assumes space-delimited values. A simple way to specify an informat is to use a : after each input variable.
data clinical;
input ID$ visit_date:mmddyy10. group$;
format visit_date mmddyy10.; * Make the date look human-readable;
datalines;
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
;
run;
Output:
ID visit_date group
01 03/12/1998 D
02 04/15/2008 P
03 11/30/2009 J
A friend of mine suggested this, but it seems odd to have to switch syntax markedly depending on whether the variable is a date or not.
data clinical;
input
name $ 1-12
#13 visit_date MMDDYY10.
group $ 25 ;
datalines;
John Turner 03/12/1998 D
Mary Jones 04/15/2008 P
Joe Sims 11/30/2009 J
;
run;
SAS provides a lot of different ways to input data, just depending on what you want to do.
Column input, which is what you start with, is appropriate when this is true:
To read with column input, data values must have these attributes:
appear in the same columns in all the input data records
consist of standard numeric form or character form
Your data does not meet this in the visit_date column. So, you need to use something else.
Formatted input is appropriate to use when you want these features:
With formatted input, an informat follows a variable name and defines how SAS reads the values of this variable. An informat gives the data type and the field width of an input value. Informats also read data that is stored in nonstandard form, such as packed decimal, or numbers that contain special characters such as commas.
Your visit_date column matches this requirement, as you have a specific informat (mmddyy10.) you would like to use to read in the data into date format.
List input would also work, especially in modified list format, in some cases, though in your example of course it wouldn't due to the spaces in the name. Here's when you might want to use it:
List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data is located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input does not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order that the data values are read.
(For completeness, there is also Named input, though that's more rare to see, and not helpful here.)
You can mix Column and Formatted inputs, but you don't want to mix List input as it doesn't have the same concept of pointer control exactly so it can be easy to end up with something you don't want. In general, you should use the input type that's appropriate to your data - use Column input if your data is all text/regular numerics, use formatted input if you have particular formats for your data.
I have a program that writes dates as text in a column called Expression. Is there a way to write a query that will remove the day of the month? I need it to be something that I can share with multiple people to use in multiple databases. Examples of the format of the values stored are 6/2021, 06/2021, and 06/01/2021.
You can use IIf in the query to compare the length of the data, and pad/split as required. Something like:
SELECT
Expression,
IIf(Len([Expression])=6,"0" & [Expression],IIf(Len([Expression])=7,[Expression],IIf(Len([expression])=10,Left(Expression,2) & "/" & right(expression,4)))) AS OutputData
FROM tblFormatDate;
You may need to add a few more IIfs, and also a final false return statement. If it starts to get messy with the data, then you may want to look at creating a custom VBA function that makes the process flow easier to understand.
Regards,
I have an imported CSV file with string values.
In this file there are amounts, of which several lines equal 0,00
I want to create a TotalCA column by adding several fields in my table and convert it to a numeric value.
I use the toDecimal function and the values are all returned NULL and the created column is grayed..
I have done a lot of research and I can't find a solution. Can you help me?
Thank you
Lea
I made an example csv data if I understand you correctly:
Like you said, some rows are enriched with values greater than 0, and others contain "0.00" when it is a zero value. Actually, the row data contains different data type, int and decimal.
For these reason and as I tested, no matter toDecimal(), toFloat() or toDouble(), all of the functions don't work. I use Derived column expression to do the data conversion.
We can't keep these data and only can choose one type of them. If you choose the decimal or float, other rows data would be converted to '11.0', I think that also doesn't you want.
Source Projection: I preset the column type to double:
(Decimal can't keep '0.00', it only returns '0')
In one word, the only way is that use String data type to keep the data. And also use String data type to receive the data in sink dataset.
HTH.
Thank you all for your answers.
Here is my CSV file
If I go to the Source Projection module and change the type of my column LFC1_UM01S to decimal this is what I get:
Why are some values considered as NULL?
To decimal column
I have a csv file containing 8 lines and 1777 columns.
I need to read all the contents in matlab, excluding the first line and first column. First line and first column contain strings and matlab can't parse them.
Do you have any idea?
data = csvread(filepath);
The code above reads all the contents
As suggested, csvread with a range will read in the numeric data. If you would like to read in the strings as well (which are presumably column headers), you can use readtable:
t = readtable(filepath);
This will create a table with the column headers in your file as variable names of the columns of the table. This way you can keep the strings associated with the data, if need be.
I have many files, with different numbers of columns (.txt files). How to automaticaly change this %s%s%s... string to correspond numbers '%s' to amount of column?
data=textscan(fid,'%*s%*s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%','HeaderLines',skip_lines,'CollectOutput',1);
The files looks like:
I have algoritm which load many files and it will nice, when I could automatize this.
try this method:
importdata('path to file')
You can specify the delimiter as well. this method is adaptive and you do not need to take care about columns. This method return the header, text data and number data in separate variables.