How do I extract the last string of a csv file and append it to the other? - matlab

I have a CSV file with many rows, each having 101 columns; the 101st column is a char, while the rest of the columns are doubles. E.g.
1,-2.2,3 ... 98,99,100,N
I implemented a filter that operates on the numbers and wrote the result to a different file, but now I need to map the last column of my old CSV onto my new CSV. How should I approach this?
I did the original loading using loadcsv, but that didn't seem to load the character, so how should I proceed?

In MATLAB there are many ways to do this; this answer uses tables:
Input
test.csv
1,2,5,A
2,3,5,G
5,6,8,C
8,9,7,T
test2.csv
1,2,1.2
2,3,8
5,6,56
8,9,3
Script
t1 = readtable('test.csv','ReadVariableNames',false); % Read the first csv file (it has no header row)
lastcol = t1{:,end}; % Extract the last column
t2 = readtable('test2.csv','ReadVariableNames',false); % Read the second csv file (no header row either)
t2.addedvar = lastcol; % Add the last column of the first file to the table from the second file
writetable(t2,'test3.csv','Delimiter',',','WriteVariableNames',false) % Write the new table to a file
Note that test3.csv is a new file, but you could also overwrite test2.csv.
'WriteVariableNames',false writes the csv file without the table's header row.
Output
test3.csv
1,2,1.2,A
2,3,8,G
5,6,56,C
8,9,3,T
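If only the last column of the old file is needed, textscan can skip the numeric fields instead of loading them. This is a minimal sketch, assuming comma-delimited rows with no header line; the file name old_demo.csv and the small value of nNum are made up for illustration (the layout in the question would use nNum = 100):

```matlab
% Hypothetical demo file standing in for the original CSV
fname = fullfile(tempdir, 'old_demo.csv');
fid = fopen(fname, 'w');
fprintf(fid, '1,-2.2,3,N\n4,5.5,6,A\n');
fclose(fid);

nNum = 3;                               % numeric columns per row (100 in the question)
fmt  = [repmat('%*f', 1, nNum) '%s'];   % skip nNum numbers, keep the trailing text field

fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);

lastcol = C{1};                         % N-by-1 cell array of character vectors
```

lastcol can then be assigned to a new table variable exactly as in the script above.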

Related

Writing Tables from Matlab into CSV

I have several tables in MATLAB, and I would like to write them all to one .csv file, concatenating them vertically. I would like to keep the column names from each table as the top row, and would like to use a loop to write the csv. The ultimate goal is to read the data into R, but R.matlab did not work well. Any suggestions on how to do this?
Alternatively how can I change filenames in a for loop using the iterator?
e.g. along the lines of
for i=1:10
writecsv('mydatai.csv',data(i))
end
So I must have at the end 10 csv files as output.
You can change the filename within the loop by using the sprintf string formatting function, for example:
dlmwrite(sprintf('mydata%i.csv', i), data(i) )
Note that the %i portion of the string is the sprintf formatting operator for an integer, it is just a coincidence that you also decided to name your iterator variable 'i'.
You can append extra data to an existing CSV by using the dlmwrite function, which uses a comma delimiter as the default, and including the '-append' flag.
Another way would be to use
writetable(Table,filename )
and to change the file name on every iteration you can use
filename = ['mydata' num2str(i) '.csv']
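Putting the two suggestions together, a loop that writes one CSV per table could look like the sketch below. The tables, their variable names and the output folder are assumptions for illustration only:

```matlab
outDir = tempdir;                        % assumption: any writable folder will do

% Two small hypothetical tables standing in for the real data
data = {table([1;2], [0.5;0.7], 'VariableNames', {'id','value'}), ...
        table([3;4], [0.1;0.9], 'VariableNames', {'id','value'})};

files = cell(size(data));
for i = 1:numel(data)
    % Build mydata1.csv, mydata2.csv, ... from the loop counter
    files{i} = fullfile(outDir, sprintf('mydata%d.csv', i));
    writetable(data{i}, files{i});       % column names go into the first row by default
end
```

Because writetable writes the variable names as a header row by default, each file keeps its column names, which is what the question asks for.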

How to import column of numbers from mixed string numerical text file

Variations of this question have already been asked several times, for example here. However, I can't seem to get this to work for my data.
I have a text file with 3 columns. First and third columns are floating point numbers. Middle column is strings. I'm only interested in getting the first column really.
Here's what I tried:
filename=fopen('heartbeatn1nn.txt');
A = textscan(filename,'%f','HeaderLines',0);
fclose(filename);
When I do this A comes out as just a single number--the first element in the column. How do I get the whole column? I've also tried this with the '.tsv' file extension, same result.
Also tried:
filename=fopen('heartbeatn1nn.txt');
formatSpec='%f';sizeA=[1 Inf];
A = fscanf(filename,formatSpec,sizeA);
fclose(filename);
with same result.
Could the file size be a problem? Not sure how many rows but probably quite a few since file size is 1.7M.
Assuming the columns in your text file are separated by single whitespace characters your format specification should look like this:
A = textscan(filename,'%f %s %f');
A now contains the complete file content. To obtain the first column:
first_col = A{:,1};
Alternatively, you can tell textscan to skip the unneeded fields with the * option:
first_col = cell2mat( textscan(filename, '%f %*s %*f') );
This returns only the first column.
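A complete version including opening and closing the file might look like this. It is a sketch on a made-up three-line file (heartbeat_demo.txt is a hypothetical name), assuming whitespace-separated columns as above:

```matlab
% Hypothetical sample file: number, string, number
fname = fullfile(tempdir, 'heartbeat_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, '0.81 N 1.5\n0.79 N 2.3\n0.83 A 3.1\n');
fclose(fid);

fid = fopen(fname, 'r');                 % textscan expects a file identifier
C = textscan(fid, '%f %*s %*f');         % keep column 1, skip columns 2 and 3
fclose(fid);

first_col = C{1};                        % numeric column vector
```

With the format '%f' alone, textscan stops at the first field it cannot parse as a number, which would explain why only a single value came back in the question.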

Storing comma separated .csv data from a web source into a matrix in matlab

I'm trying to download this comma-separated data and store it as a matrix that I can then access. So far I have code which I think should store the data in a file called test.csv, but I'm not sure:
>> urlwrite('http://xweb.geos.ed.ac.uk/~weather/jcmb_ws/JCMB_2013_Mar.csv','test.csv');
d = csvread('test.csv');
??? Error using ==> dlmread at 145
Mismatch between file and format string.
Trouble reading number from file (row 1, field 1) ==> date-
Error in ==> csvread at 50
m=dlmread(filename, ',', r, c);
I am getting the above error. It reads the data fine using urlread. Does anybody know what the correct syntax should be and how I can get it stored as a matrix? Thanks in advance.
You can get the data directly from web without saving to file with URLREAD:
webdata = urlread('http://xweb.geos.ed.ac.uk/~weather/jcmb_ws/JCMB_2013_Mar.csv');
This will give you the whole file as one string with lines delimited by '\n'. You can process it in multiple ways. For example:
tmp = textscan(webdata,['%s',repmat('%f',1,8)],'delimiter',',','headerlines',1);
date = tmp{1};
data = horzcat(tmp{2:end});
To get column headers you can do, for example:
colheader = textscan(webdata,'%s',1,'delimiter','\n');
colheader = regexp(colheader{:},',','split');
colheader = colheader{:};
You can also convert the data to a structure:
Data = cell2struct(tmp, genvarname(colheader),2);
Try using readtext.m. That is a program which can read almost any text file. The problem in your data could be: they don't have uniform delimiters i.e. somewhere two columns are separated by tab, somewhere by comma.
The operation can be performed like this:
urlwrite('http://xweb.geos.ed.ac.uk/~weather/jcmb_ws/JCMB_2013_Mar.csv','test.csv');
data= readtext('test.csv');
It should work.
Your problem lies right here:
Trouble reading number from file (row 1, field 1) ==> date-
Matlab says it encountered "date-" in the first cell. I guess you have a header row or two. You can check in the file and call
d = csvread('test.csv',ROW,0);
where ROW is the zero-based row offset at which the actual data starts, i.e. the number of header rows.
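A small self-contained check of the header-skipping call, on a made-up file (hdr_demo.csv is a hypothetical name) with one header row. csvread documents its row and column offsets as zero-based, so an offset of 1 skips exactly one line:

```matlab
% Hypothetical numeric CSV with a single header line
fname = fullfile(tempdir, 'hdr_demo.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'a,b,c\n1,2,3\n4,5,6\n');
fclose(fid);

nHeader = 1;                             % number of header rows to skip
d = csvread(fname, nHeader, 0);          % row offset nHeader, column offset 0
```

Note that csvread handles numeric fields only, so a file whose first column holds date strings would still need textscan or a table-based reader.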

How can I copy columns from several files into the same output file using Perl

This is my problem.
I need to copy 2 columns each from 7 different files to the same output file.
All input and output files are CSV files.
And I need to add each new pair of columns beside the columns that have already been copied, so that at the end the output file has 14 columns.
I believe I cannot use
open(FILEHANDLE,">>file.csv").
Also, all 7 CSV files have nearly 20,000 rows each, so I'm reading and writing the files line by line.
It would be a great help if you could give me an idea as to what I should do.
Thanks a lot in advance.
Provided that your lines are 1:1 (Meaning you're combining data from line 1 of File_1, File_2, etc):
open all 7 files for input
open output file
read line of data from all input files
write line of combined data to output file
Text::CSV is probably the way to access CSV files.
You could define a CSV handler for each file (including the output), use the getline or getline_hr (returns a hashref) methods to fetch the data, combine it into arrayrefs, then use print.

How do I sort files into multiple folders in MATLAB?

I am a little stuck on a problem. I have tons of files generated daily and I need to sort them by file name and date, so that my MATLAB script can read them. I currently do this manually, but was wondering if there is an easier way in MATLAB to sort and copy files.
My file names look like:
data1_2009_12_12_9.10
data1_2009_12_12_9.20
data1_2009_12_12_9.30
data1_2009_12_12_9.40
data2_2009_12_12_9.10
data2_2009_12_12_9.20
data2_2009_12_12_9.30
data2_2009_12_12_9.40
data3_2009_12_12_9.10
data3_2009_12_12_9.20
data3_2009_12_12_9.30
data3_2009_12_12_9.40
...
and tons of files like this.
Addition to the above problem:
There has to be an easier way to stitch the files together.
I mean appending file
'data1_2009_12_12_9.20' after file 'data1_2009_12_12_9.10' and so on,
so that I am left with one huge text file at the end named data1_2009_12_12 (or whatever), containing all the data stitched together.
The only way I know of now is to open every file with an individual dlmread command in MATLAB and write them out one after another with xlswrite (or, more trivially, to copy and paste manually).
Working in the field of functional imaging research, I've often had to sort large sets of files into a particular order for processing. Here's an example of how you can find files, parse the file names for certain identifier strings, and then sort the file names by a given criteria...
Collecting the files...
You can first get a list of all the file names from your directory using the DIR function:
dirData = dir('your_directory'); %# Get directory contents
dirData = dirData(~[dirData.isdir]); %# Use only the file data
fileNames = {dirData.name}; %# Get file names
Parsing the file names with a regular expression...
Your file names appear to have the following format:
'data(an integer)_(a date)_(a time)'
so we can use REGEXP to parse the file names that match the above format and extract the integer following data, the three values for the date, and the two values for the time. The expression used for the matching will therefore capture 6 "tokens" per valid file name:
expr = '^data(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)\.(\d+)$';
fileData = regexp(fileNames,expr,'tokens'); %# Find tokens
index = ~cellfun('isempty',fileData); %# Find index of matches
fileData = [fileData{index}]; %# Remove non-matches
fileData = vertcat(fileData{:}); %# Format token data
fileNames = fileNames(index); %# Remove non-matching file names
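To see what the expression captures, here is a quick check on a single name taken from the question. The second name, notes.txt, is a made-up example of a file that does not match:

```matlab
expr = '^data(\d+)_(\d+)_(\d+)_(\d+)_(\d+)\.(\d+)$';

% A matching name yields one set of six string tokens
tok  = regexp('data1_2009_12_12_9.10', expr, 'tokens');
vals = str2double(tok{1});               % data number, year, month, day, hour, minute

% A non-matching name yields an empty cell array, which the index above filters out
noMatch = regexp('notes.txt', expr, 'tokens');
```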
Sorting based on the tokens...
You can convert the above string tokens to numbers (using the STR2DOUBLE function) and then convert the date and time values to a date number (using the function DATENUM):
nFiles = size(fileData,1); %# Number of files matching format
fileData = str2double(fileData); %# Convert from strings to numbers
fileData = [fileData zeros(nFiles,1)]; %# Add a zero column (for the seconds)
fileData = [fileData(:,1) datenum(fileData(:,2:end))]; %# Format dates
The variable fileData will now be an nFiles-by-2 matrix of numeric values. You can sort these values using the function SORTROWS. The following code will sort first by the integer following the word data and next by the date number:
[fileData,index] = sortrows(fileData,1:2); %# Sort numeric values
fileNames = fileNames(index); %# Apply sort to file names
Concatenating the files...
The fileNames variable now contains a cell array of all the files in the given directory that match the desired file name format, sorted first by the integer following the word data and then by the date. If you now want to concatenate all of these files into one large file, you could try using the SYSTEM function to call a system command to do this for you. If you are using a Windows machine, you can do something like what I describe in this answer to another SO question where I show how you can use the DOS for command to concatenate text files. You can try something like the following:
inFiles = strcat({'"'},fileNames,{'", '}); %# Add quotes, commas, and spaces
inFiles = [inFiles{:}]; %# Create a single string
inFiles = inFiles(1:end-2); %# Remove last comma and space
outFile = 'total_data.txt'; %# Output file name
system(['for %f in (' inFiles ') do type "%f" >> "' outFile '"']);
This should create a single file total_data.txt containing all of the data from the individual files concatenated in the order that their names appear in the variable fileNames. Keep in mind that each file will probably have to end with a new line character to get things to concatenate correctly.
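If you would rather not depend on a shell command, the same concatenation can be sketched in plain MATLAB with fileread. This is an alternative to the SYSTEM call above, not the method from the linked answer; the working folder and the file contents are made up so the example runs on its own:

```matlab
% Hypothetical input files in a scratch folder
workDir = tempname;
mkdir(workDir);
fileNames = {'data1_2009_12_12_9.10', 'data1_2009_12_12_9.20'};
for k = 1:numel(fileNames)
    fid = fopen(fullfile(workDir, fileNames{k}), 'w');
    fprintf(fid, '%d,%d\n', k, 10*k);
    fclose(fid);
end

% Append each file's text to the output, in the order given by fileNames
outFile = fullfile(workDir, 'total_data.txt');
fidOut = fopen(outFile, 'w');
for k = 1:numel(fileNames)
    txt = fileread(fullfile(workDir, fileNames{k}));
    fprintf(fidOut, '%s', txt);          % copy the contents unchanged
end
fclose(fidOut);
```

As with the shell version, each input file needs to end with a newline for the rows to line up in the combined file.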
An alternative to what @gnovice suggested is to loop over the file names and use sscanf() to recover the different sections in the filenames you are interested in:
n = sscanf(filename, 'data%d_%d_%d_%d_%d.%d')
n(1) %# data number
n(2) %# the year
...
Example:
files = dir('data*'); %# list all entries beginning with 'data'
parts = zeros(length(files), 6); %# read all the 6 parts into this matrix
for i=1:length(files)
parts(i,:) = sscanf(files(i).name, 'data%d_%d_%d_%d_%d.%d')'; %'#transposed
end
[parts idx] = sortrows(parts, [6 1]); %# sort by one/multiple columns of choice
files = files(idx); %# apply the new order to the files struct
EDIT:
I just saw your edit about merging those files. That can be done easily from the shell. For example, let's create one big file for all data from the year 2009 (assuming it makes sense to stack the files on top of each other):
on Windows:
type data*_2009_* > 2009.backup
on Unix:
cat data*_2009_* > 2009.backup
In Matlab the function call
files = dir('.');
returns a structure (called files) with fields
name
date
bytes
isdir
datenum
You can use your usual Matlab techniques for manipulating files.name.
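For instance, the names can be collected into a cell array and sorted. This is a small sketch on a scratch folder with two made-up files:

```matlab
% Hypothetical scratch folder with two empty files
workDir = tempname;
mkdir(workDir);
for name = {'b.txt', 'a.txt'}
    fclose(fopen(fullfile(workDir, name{1}), 'w'));
end

files = dir(workDir);
files = files(~[files.isdir]);           % drop '.', '..' and any subfolders
names = sort({files.name});              % cell array of names in alphabetical order
```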