How to properly remove NaN values from table - matlab

After reading an Excel spreadsheet into MATLAB, I unfortunately have NaNs in the resulting table. For example, this Excel table:
would result in this table:
where an additional column of NaNs appears. I tried to remove the NaNs with the following code snippet:
measurementCells = readtable('MWE.xlsx','ReadVariableNames',false,'ReadRowNames',true);
measurementCells = measurementCells(any(isstruct(measurementCells('TIME',1)),1),:);
However, this results in a 0x6 table with no values left in it. How can I properly remove the NaNs without removing any data from the table?

Either this:
tab = tab(~any(ismissing(tab),2),:);
or:
tab = rmmissing(tab);
if you want to remove rows that contain one or more missing values.
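The row filter above keeps a row only if none of its cells is missing. The following is a plain-Python sketch of that same logic, not MATLAB code: the table is modelled as a list of lists, and "missing" is assumed to mean None, float NaN or empty text (loosely mirroring ismissing's defaults). If, as in the question, an entire column is NaN, then every row contains a missing value and the row filter leaves an empty table, so a column-wise variant that drops only columns that are entirely missing is shown as well. The helper names are hypothetical.

```python
import math

def is_missing(value):
    # Assumption: None, float NaN and empty text count as missing.
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value == ""

def drop_rows_with_missing(rows):
    # Keep a row only if none of its cells is missing
    # (the same test as tab(~any(ismissing(tab),2),:)).
    return [row for row in rows if not any(is_missing(cell) for cell in row)]

def drop_all_missing_columns(rows):
    # Keep column j only if at least one row has a real value in it.
    keep = [j for j in range(len(rows[0]))
            if not all(is_missing(row[j]) for row in rows)]
    return [[row[j] for j in keep] for row in rows]

nan = float("nan")

# Hypothetical table with one NaN cell in the DATE row.
table = [
    ["TIME", 1.0, 2.0],
    ["DATE", nan, 3.0],
    ["NAME", 4.0, 5.0],
]
clean = drop_rows_with_missing(table)
print(clean)  # [['TIME', 1.0, 2.0], ['NAME', 4.0, 5.0]]

# Hypothetical table whose last column is entirely NaN.
with_nan_column = [
    ["TIME", 1.0, nan],
    ["DATE", 2.0, nan],
]
rows_only = drop_rows_with_missing(with_nan_column)     # every row is dropped
columns_only = drop_all_missing_columns(with_nan_column)  # only the NaN column is dropped
```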
If you instead want to replace missing values with other values, read about how the fillmissing (https://mathworks.com/help/matlab/ref/fillmissing.html) and standardizeMissing (https://mathworks.com/help/matlab/ref/standardizemissing.html) functions work. The examples in their documentation are thorough and should help you find the solution that best fits your needs.
A final option is to detect NaN values (and handle them however you prefer) directly in the call to readtable, using the EmptyValue parameter. However, this works only for numeric data.
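To illustrate what "replacing" means compared with "removing", here is a plain-Python sketch of two common fill strategies on a single column: filling with a constant, and carrying the previous non-missing value forward. It only models the idea behind fillmissing's 'constant' and 'previous' methods; it is not MATLAB code, and the function names and data are hypothetical.

```python
import math

def is_missing(value):
    # Assumption: None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))

def fill_constant(column, fill_value):
    # Replace every missing entry with the same value.
    return [fill_value if is_missing(v) else v for v in column]

def fill_previous(column):
    # Carry the last non-missing value forward.
    # Leading gaps have no previous value, so they stay missing (as None).
    result = []
    last = None
    for v in column:
        if is_missing(v):
            result.append(last)
        else:
            last = v
            result.append(v)
    return result

nan = float("nan")
col = [1.0, nan, nan, 4.0, nan]
filled_const = fill_constant(col, 0.0)   # [1.0, 0.0, 0.0, 4.0, 0.0]
filled_prev = fill_previous(col)         # [1.0, 1.0, 1.0, 4.0, 4.0]
leading_gap = fill_previous([nan, 2.0])  # [None, 2.0]
```

Unlike the row filter, both strategies keep the table's shape, so no data is lost; the trade-off is that the filled values are assumptions rather than measurements.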

Related

Remove rows with a certain value in any column in pyspark

I am working in PySpark to clean a dataset. The dataset has "?" in various rows and columns. I want to remove any row that has this value anywhere in it. I tried the following:
df = df.replace("?", "np.Nan")
df=df.dropna()
However, this did not remove those values.
I have searched online but can't find an answer I understand (I am a newbie).
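A likely reason the attempt does nothing: "np.Nan" in quotes is just another non-null string, so replacing "?" with it leaves every cell non-null and dropna() has nothing to drop. The sketch below models this with plain Python dictionaries instead of a Spark DataFrame (the rows and helper names are hypothetical). In PySpark itself, replacing with None instead, for example df.replace("?", None), should turn the "?" cells into real nulls that dropna() removes; None is accepted as a replacement value in recent Spark versions, but check this against the version you use.

```python
# Hypothetical rows standing in for a DataFrame.
rows = [
    {"a": "1", "b": "x"},
    {"a": "?", "b": "y"},
    {"a": "3", "b": "?"},
    {"a": "4", "b": "z"},
]

def replace_value(rows, old, new):
    # Replace every cell equal to `old` with `new`.
    return [{k: (new if v == old else v) for k, v in row.items()} for row in rows]

def drop_null_rows(rows):
    # Model of dropna(): drop any row that contains a real null (None here).
    return [row for row in rows if all(v is not None for v in row.values())]

# Replacing "?" with the *string* "np.Nan" produces no nulls, so nothing is dropped.
wrong = drop_null_rows(replace_value(rows, "?", "np.Nan"))

# Replacing "?" with a real null lets the null-dropping step remove those rows.
right = drop_null_rows(replace_value(rows, "?", None))
```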

How to obtain rows of an incomplete Matlab table?

After getting answers to this question, I realized that I do have a problem importing data into MATLAB, but it has nothing to do with NaNs; rather, it comes from the different data types stored in the table.
Using the same example as in the other question, importing an Excel table with
measurementTable = readtable('MWE.xlsx','ReadVariableNames',false,'ReadRowNames',true);
produces the following MATLAB table:
As you can see, the values in columns 1 to 4 are of type cell, while the values in column 5 are of type double. If I now try to obtain a single row of the table by using
measurementTable{'DATE',:}
I get the error message:
Cannot concatenate the table variables 'Var5' and 'Var1', because their types are double and cell.
How can I tackle this problem?
As you worked out, the command you are using fails because MATLAB tries to concatenate the cells and doubles into a single array.
Since you have multiple data types, you need to store your "row" in a cell array.
You can obtain a single row of mixed data by doing:
table2cell(measurementTable('DATE',:))
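The underlying issue is that a homogeneous numeric container cannot hold text, while a heterogeneous one can. As a rough Python analogy (not MATLAB behaviour), the standard array module plays the role of a double array and a plain list plays the role of a cell array; the row values are hypothetical.

```python
from array import array

# Hypothetical mixed-type row: four text values and one number.
row = ["2021-01-01", "Sensor A", "ok", "x", 42.0]

# A homogeneous array of doubles rejects the text values
# (comparable to MATLAB refusing to concatenate cell and double).
try:
    array("d", row)
    homogeneous_ok = True
except TypeError:
    homogeneous_ok = False

# A heterogeneous container keeps each value with its own type,
# which is what table2cell gives you for a table row.
as_cells = list(row)
```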

Handle missing data

I was wondering how people typically handle missing data.
I read some articles about imputing missing data, where basically the idea is to replace the missing data by some value calculated in some way.
For example, suppose I have a table with some missing cells, and I want to fill these cells using some imputation technique. I imagine I should first choose a function f carefully and then apply f to the existing data in the table to compute the value that replaces a specific missing value. Is this true?
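That description matches the simplest form of imputation: compute a replacement value with some function f of the observed values, then put it in each gap. A minimal Python sketch, assuming a single numeric column where NaN marks a missing cell and f defaults to the mean (the function name and data are hypothetical):

```python
import math
import statistics

def impute(column, f=statistics.mean):
    # Compute the replacement value from the observed (non-NaN) entries only.
    observed = [v for v in column if not math.isnan(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = f(observed)
    # Replace each NaN with that single value; observed entries are unchanged.
    return [fill if math.isnan(v) else v for v in column]

column = [1.0, float("nan"), 3.0, 10.0]
by_mean = impute(column)                       # NaN replaced by the mean of 1, 3, 10
by_median = impute(column, statistics.median)  # NaN replaced by 3.0
```

Mean and median imputation ignore the other columns; more elaborate methods (for example regression or nearest-neighbour imputation) use other columns to compute the replacement, but follow the same pattern of filling each gap with a computed value.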

Transpose data using Talend

I have this kind of data:
I need to transpose this data into something like this using Talend:
Help would be much appreciated.
dbh's suggestion should indeed work, but I have not tried it.
However, I have another solution that doesn't require changing the input format and is not too complicated to implement: the job has only 2 transformation components (tDenormalize and tMap).
The job looks like the following:
Explanation:
Your input is read from a CSV file (it could also be a database or any other kind of input).
tDenormalize denormalizes your value column (column 2) based on the id column (column 1), joining the values with a delimiter (";" in my case). This results in the 2 rows shown.
tMap splits the aggregated column into multiple columns using Java's String.split() method, spreading the resulting array across the output columns. The tMap should look like this:
Since Talend doesn't allow storing Array objects, make sure to store the split String in Object format. Then cast that object to an array on the right side of the map.
That approach should give you the expected result.
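The two steps of that job can be sketched outside Talend to show what each component does to the data. This is a plain-Python model, not Talend code: the input rows are hypothetical (the original data is not shown), ";" is assumed as the delimiter, and values containing ";" would break the split. A width check is included because, as noted below, an id with fewer entries than the others produces rows with fewer columns.

```python
def denormalize(rows, delimiter=";"):
    # Model of tDenormalize: group values by id and join them with a delimiter.
    grouped = {}
    for row_id, value in rows:
        grouped.setdefault(row_id, []).append(value)
    return [(row_id, delimiter.join(values)) for row_id, values in grouped.items()]

def split_columns(denormalized, delimiter=";"):
    # Model of the tMap step: split the aggregated string into separate columns.
    return [[row_id] + joined.split(delimiter) for row_id, joined in denormalized]

def has_uniform_width(table):
    # Every output row should have the same number of columns.
    return len({len(row) for row in table}) <= 1

# Hypothetical input: three values per id.
rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]
aggregated = denormalize(rows)    # [(1, "a;b;c"), (2, "d;e;f")]
table = split_columns(aggregated)  # [[1, "a", "b", "c"], [2, "d", "e", "f"]]

# Uneven input (id 2 has one value fewer) yields ragged rows.
uneven = split_columns(denormalize(rows[:-1]))
```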
IMPORTANT:
tDenormalize might shuffle the rows, so with larger inputs you may get unsorted output. Sort it if needed, or use tDenormalizeSortedRow instead.
tDenormalize behaves like an aggregation component: it scans the whole input before processing, which can cause performance issues with particularly large inputs (tens of millions of records).
Your input is probably wrong: you have 5 entries with id 1 and 6 entries with id 2. Since 6 columns are expected, you should always have 6 lines per id. If not, you should implement dbh's solution, and you will probably HAVE TO add a column with a key.
You can use Talend's tPivotToColumnsDelimited component to achieve this. You will most likely need an additional column in your data to represent the field name.
For example: "Identifier, field name, value".
Then you can use this component to pivot the data and write a file as output. If you need to process the data further, read the resulting file with tFileInputDelimited.
See the documentation and an example at
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/13.43+tPivotToColumnsDelimited
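To show what pivoting "Identifier, field name, value" rows into columns produces, here is a plain-Python sketch using the standard csv module. It only models the transformation and is not the tPivotToColumnsDelimited component; the field names and values are hypothetical, and missing fields are written as empty cells.

```python
import csv
import io

def pivot_to_columns(triples):
    # Collect field names in first-seen order and values per identifier.
    fields = []
    table = {}
    for ident, field, value in triples:
        if field not in fields:
            fields.append(field)
        table.setdefault(ident, {})[field] = value
    # Write one CSV row per identifier, one column per field name.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id"] + fields)
    for ident, values in table.items():
        writer.writerow([ident] + [values.get(f, "") for f in fields])
    return out.getvalue()

# Hypothetical input with the extra "field name" column.
triples = [
    ("1", "name", "Alice"),
    ("1", "city", "Paris"),
    ("2", "name", "Bob"),
    ("2", "city", "Oslo"),
]
csv_text = pivot_to_columns(triples)
```

The extra field-name column is what makes the pivot possible: without it, there is no reliable way to tell which output column each value belongs to.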

Convert single row to multiple rows in tableau based on some delimiter

I have a column with data like: a,b,c,d,e
I need to display it (in a worksheet) as
a
b
c
d
e
Note: the values need to be split on ','.
Do I need to use a calculated field, or is there another approach?
I looked at the SPLIT function, but it generates new columns; I want to keep the values in a single column.
Is this something that could work? (You said it's just a matter of visualization without altering the data, right?)
I just created a calculated field (CF) like this:
REPLACE(value,",","
")
EDIT: since your need seems to involve data manipulation (you want multiple rows instead of one), I think the best approach is the SPLIT function, even though, as you noticed, it creates new columns.
Otherwise, if it's just a visualization need, you could use the solution posted above, which shows your data ("a,b,c,d,e") in the same cell with the same horizontal alignment, simply replacing the commas with line breaks.
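The difference between the two answers is whether the data itself changes. The following Python sketch (not Tableau syntax; function names are hypothetical) contrasts a data-level split that turns one record into one row per item with a display-level replacement that keeps a single value but inserts line breaks, as the REPLACE calculated field does.

```python
def split_to_rows(record_id, value, delimiter=","):
    # Data manipulation: one output row per delimited item.
    return [(record_id, part) for part in value.split(delimiter)]

def as_multiline_label(value, delimiter=","):
    # Visualization only: still one value, with line breaks instead of commas.
    return value.replace(delimiter, "\n")

rows = split_to_rows(1, "a,b,c,d,e")
label = as_multiline_label("a,b,c,d,e")
```

Only the first form gives you separate rows to count, filter or aggregate; the second just changes how one cell is displayed.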