I want to convert the second column of table T using datenum.
The elements of this column are '09:30:31.848', '15:35:31.325', etc. When I use datenum('09:30:31.848','HH:MM:SS.FFF') on a single element everything works, but when I apply datenum to the whole column it fails. I tried datenum(T(:,2),'HH:MM:SS.FFF') and got this error message:
"The input to DATENUM was not an array of character vectors"
Here is a snapshot of T:
Thank you
You are not extracting the data from the table, but rather a slice of the table (so it stays a table). Refer to the data in the table using T.colName:
times_string = ['09:30:31.848'; '15:35:31.325'];
T = table(times_string)
times_num = datenum(T.times_string, 'HH:MM:SS.FFF')
Alternatively, you can slice the table using curly braces to extract the data (if you want to use the column number instead of name):
times_num = datenum(T{:,2}, 'HH:MM:SS.FFF')
I am working with the OpenFoodFacts dataset using PySpark. There are quite a lot of columns which are made up entirely of missing values, and I want to drop those columns. I have been looking up ways to retrieve the number of missing values in each column, but the results are displayed in a table format instead of giving me the actual numeric total of null values.
The following code shows the number of missing values in a column but displays it in a table format:
from pyspark.sql.functions import col, isnan, when, count
data.select([count(when(isnan("column") | col("column").isNull(), "column"))]).show()
I have tried the following codes:
This one does not work as intended, as it doesn't drop any columns:
for c in data.columns:
    if data.select([count(when(isnan(c) | col(c).isNull(), c))]) == data.count():
        data = data.drop(c)
data.show()
This one I am currently trying, but it takes ages to execute:
for c in data.columns:
    if data.filter(data[c].isNull()).count() == data.count():
        data = data.drop(c)
data.show()
Is there a way to get ONLY the number? Thanks
If you need the number instead of the table display, you need to use .collect():
list_of_values = data.select([count(when(isnan("column") | col("column").isNull(), "column"))]).collect()
What you get is a list of Row objects, which contain all the information in the table; index into it to get the bare number, e.g. list_of_values[0][0].
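If you just want the dropping logic itself, independent of Spark, it can be sketched in plain Python; the toy table below (column names and values invented) stands in for the DataFrame:

```python
# Toy stand-in for a DataFrame: column name -> list of values (None = null).
data = {
    "code":     ["001", "002", "003"],
    "brand":    [None, None, None],       # entirely null, should be dropped
    "quantity": ["1kg", None, "500g"],
}
n_rows = 3

# Count nulls per column in one pass, like the count(when(...)) aggregation.
null_counts = {c: sum(v is None for v in vals) for c, vals in data.items()}

# Keep only the columns that are not entirely null.
data = {c: vals for c, vals in data.items() if null_counts[c] < n_rows}
```

The same shape carries over to Spark: compute all the null counts in one select, collect once, then drop the all-null column names, rather than running a separate count() per column in a loop.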
I have some samples I need to take.
In order to create a good identifier/serial number for the samples, I want it to be a product of its characteristics.
For example, if the sample was taken from India and the temperature was 40 degrees then I would click dropdowns in the form to create those two entries and then a serial number would be spat out in the form "Ind40".
Assuming that your form is bound to a table, you can create a calculated column in the table that concatenates the values from other columns into a single value.
For instance, create a new column and give it a name (for example, SerialNbr). Then for Data Type select "Calculated". An expression builder window will appear:
Enter the columns you'd like to concatenate and separate them with &. Here is an example of how the expression could look:
Left([Country],3) & [Temperature]
This expression takes the first 3 characters from the Country column and combines them with the value from the Temperature column to create the value in the SerialNbr column. The calculated column will automatically update when values are entered into the other fields. I'd also suggest adding another value to the calculated expression to help avoid duplicates, such as the date/time of submission.
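Outside Access, the expression amounts to a prefix plus a number; a minimal Python sketch of the same logic (the function name and sample values are mine, not from the form):

```python
def serial_number(country, temperature):
    # Mirrors the Access expression: Left([Country],3) & [Temperature]
    return country[:3] + str(temperature)

print(serial_number("India", 40))  # -> Ind40
```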
I'm using matlab to read in COVID-19 data provided by Johns Hopkins as a .csv-file using urlread, but I'm not sure how to use textscan in the next step in order to convert the string into a table. The first two columns of the .csv-file are strings specifying the region, followed by a large number of columns containing the registered number of infections by date.
Currently, I just save the string returned by urlread locally and open this file with importdata afterwards, but surely there should be a more elegant solution.
You have mixed up two things: either you want to read from the downloaded csv-file using `textscan` (and `fopen`/`fclose`, of course), or you want to use `urlread` (or rather `webread`, as MATLAB recommends not to use `urlread` anymore). I go with the latter, since I have never done this myself^^
So, first we read in the data and split it into rows
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv";
% read raw data as single character array
web = webread(url);
% split the array into a cell array representing each row of the table
row = strsplit(web,'\n');
Then we allocate a table (pre-allocation is good for MATLAB as it stores variables on consecutive addresses in the RAM, so tell MATLAB beforehand how much space you need):
len = length(row);
% get the CSV-header as information about the number of columns
Head = strsplit(row{1},',');
% allocate table
T = [table(strings(len,1),strings(len,1),'VariableNames',Head(1:2)),...
repmat(table(NaN(len,1)),1,length(Head)-2)];
% rename columns of table
T.Properties.VariableNames = Head;
Note that I did a little trick to allocate so many separate columns of `NaN`s by repeating a single table. However, concatenating this table with the table of strings is difficult, as both contain the column names Var1 and Var2. That is why I named the columns of the first table right away.
Now we can actually fill the table (which is a bit nasty thanks to the person who found it nice to write `Korea, South` into a comma-separated file):
for i = 2:len
% split this row into columns
col = strsplit(row{i},',');
% quick conversion
num = str2double(col);
% keep strings where the result is NaN
lg = isnan(num);
str = cellfun(@string,col(lg));
T{i,1} = str(1);
T{i,2} = strjoin(str(2:end));% this is a nasty workaround necessary due to "Korea, South"
T{i,3:end} = num(~lg);
end
This should also work for the days that are about to come. Let me know what you're actually going to do with the data.
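As a side note, the `Korea, South` workaround is only needed because strsplit knows nothing about CSV quoting; a quoting-aware parser keeps the field whole. A quick illustration in Python (the sample row is invented):

```python
import csv

# A CSV row where one field contains a comma inside quotes.
row = '"Korea, South",3/1/20,1,2'

# Naive comma splitting breaks the quoted field apart ...
naive = row.split(',')             # ['"Korea', ' South"', '3/1/20', '1', '2']

# ... while a CSV parser keeps it intact.
parsed = next(csv.reader([row]))   # ['Korea, South', '3/1/20', '1', '2']
```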
My data in CSV is like this (actual-data image):
Actual Data
And I want to convert this Data into:
Expected Data
(hivetablename.hivecolumnname = dbtablename.dbtablecolumn)
By joining the multiple row values into a single row value, like above.
Please note that 'AND' is a literal placed between the conditions being built, which appears up to the second-to-last record.
Once the last record is reached, only the condition appears (xx=yy).
I would like the result in Scala Spark.
Many thanks in advance!
I have a .csv file and I can't read it on Octave. On R I just use the command below and everything is read alright:
myData <- read.csv("myData.csv", stringsAsFactors = FALSE)
However, when I go to Octave it doesn't do it properly with the below command:
myData = csvread('myData.csv',1,0);
When I open the file with Notepad, the data looks something like the below. Note there isn't a comma separating the last column name (i.e. Column3) from the first value (i.e. Value1), and the same thing happens between the last value of the first row (i.e. Value3) and the first value of the second row (i.e. Value4):
Column1,Column2,Column3Value1,Value2,Value3Value4,Value5,Value6
The Column1 is meant for date values (with format yyyy-mm-dd hh:mm:ss), I don't know if that has anything to do with the problem.
Alex's answer already explains why csvread does not work for your case: that function only reads numeric data and returns an array. Since your fields are all strings, you need something that reads a csv file into a cell array.
That function is named csv2cell and is part of the io package.
As a separate note, if you plan to do operations with those dates, you may want to convert those date strings into serial date numbers. This will let you put your dates in a numeric array, which allows for faster operations and reduced memory usage. Also, the financial package has many functions for dealing with dates.
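The string-to-number idea is not Octave-specific; for illustration, here is the same conversion in Python with just the stdlib (sample timestamp invented), turning a 'yyyy-mm-dd hh:mm:ss' string into a single number you can keep in a numeric array:

```python
from datetime import datetime

# Parse a 'yyyy-mm-dd hh:mm:ss' string ...
d = datetime.strptime("2020-03-01 09:30:00", "%Y-%m-%d %H:%M:%S")

# ... and turn it into one number (seconds since the Unix epoch),
# analogous to Octave's serial date numbers.
seconds = (d - datetime(1970, 1, 1)).total_seconds()
```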
csvread only reads numeric data, so a date does not qualify unfortunately.
In Octave you might want to check out the dataframe package. In Matlab you would do readtable.
Otherwise there are also more primitive functions you can use like textscan.