Importing .csv WITHOUT the NA in empty cells

Six years ago this guy asked my question...
Getting rid of NA values in R when importing a CSV file
...but no final answer showed up because he never displayed his data
Here is an image of my data in a .csv file. If you need it as code, just say so and I will supply it (I just used the RAND() function in Excel to create it).
There are no "missing values" in the usual sense. It's just that some columns have more values and others fewer. I will never need to invert, so the data need not be rectangular. All the "empty" cells are always at the bottom. And na.rm does not work, as I need to operate on each column separately with a special function that does not take na.rm as an argument. How can I import this into R with NOTHING in the cells that are empty? I have tried readxl, RStudio, using Excel instead of .csv, and lots of other approaches. Most of them want me to eliminate rows with NA in them. That does not work, either.
Thanks to anyone who can help
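No answer is recorded for this question. One way to picture the result being asked for is a set of per-column vectors of different lengths rather than a rectangular table. The sketch below is in Python rather than R, purely as an illustration. The function name read_ragged_columns, the sample contents and the column names A, B, C are all assumptions, not the asker's data. In R the analogous structure would be a list holding one vector per column.

```python
import csv
import io

def read_ragged_columns(handle):
    """Read a CSV and return {column name: list of floats}, skipping empty cells."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            if cell.strip():  # empty cells are simply never stored
                columns[name].append(float(cell))
    return columns

# Hypothetical sample standing in for the Excel-generated file
sample = io.StringIO("A,B,C\n0.1,0.5,0.9\n0.2,0.6,\n0.3,,\n")
columns = read_ragged_columns(sample)
for name, values in columns.items():
    print(name, len(values), values)
```

Each column can then be passed on its own to a function that has no missing-value option, because no placeholder is ever stored.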

Related

Reading csv from the second line and creating output

I want to read only one column from a CSV file. The problem is that I want to read this column starting from the second line, but with these commands:
[d1,tex]= xlsread(filename1);
name=tex(:,4)
it's reading from the first line.
Also, I would like to create a matrix that will include two columns that come from commands (equations, etc.) in my MATLAB code.
xlsread is deprecated by MathWorks. Try using readtable in the future.
To your original question, I'm assuming that you want to read everything in the 4th column from the second row onward. If so, your second line is incorrect:
name = tex(2:end,4)
Without further example code, I can't answer the rest of your question. Add some details and I'll see what I can do.

Latex variable automation: writing variables for Latex directly from Stata and MATLAB

I am working on a project which requires analyzing a large (>50GB) dataset on a server, in both Stata and MATLAB. Both parts are required and I cannot use only one of them.
My ultimate goal is to generate a .tex file named something like commands.tex which looks like this:
\newcommand{\var1}{val1}
\newcommand{\var2}{val2} % MATLAB file matlab_file.m on DD/MM/YYYY
\newcommand{\var3}{val3} % Stata file stata_file.m on DD/MM/YYYY
...
where variables are ordered alphabetically and each of the values is most probably a number. Note that the comments would help me trace where I generated the values. The file is meant to be used so that, after a preamble, I can use LaTeX in the following way:
<preamble>
\input{commands.tex}
\begin{document}
Variable 1 has a value of \var1 and variable 2 has a value of \var2.
\end{document}
The purpose of this is so that I can analyze locally (or remotely) a sample, say of 0.1 or 10 percent of the total observations, write a report with those, and then run the analysis again with a bigger size. I want to completely eliminate the chances of me copying a number wrong.
I am trying to write some code in both MATLAB and Stata, but I think that is beyond my expertise, and I would be very grateful if someone could help me figure out how to do it. To be honest, I feel I would be able to do the MATLAB part, but for the Stata part I have no idea.
Stata code
What I am trying to do is to generate a command that takes as an input a name and a scalar and as an output defines the corresponding variable in my commands.tex file detailed above. My goal is to be able to generate something like this:
sysuse auto
reg price weight
define_variable PriceWeight = _b[weight], format(%4.2f)
and what I hope the code to do is that:
If \newcommand{\PriceWeight} does not exist in commands.tex then it adds its value to the list, preserving the alphabetical order.
If the variable exists then it deletes its value and rewrites above it, with the value given in the scalar.
I know how to give the values to a program in Stata, but I do not exactly know how to use those values and perform the necessary commands. The syntax is something like:
program define define_variable
syntax anything = X, [format(string)]
<other code>
end
Note: Of course, I need something way deeper than regression coefficients, but as a simple example this would suffice.
MATLAB code
This seems to be easier in MATLAB, but I do not know exactly how to automate the process. In MATLAB what I want to be able to do is something like:
clc; clear;
PriceWeight = 3
define_variable('PriceWeight',PriceWeight,format)
where, again, it automatically goes to the single file and updates it accordingly. Any help will be very much appreciated.
Based on your comments and assuming that your file with all relevant variables is not huge, I would suggest getting your data from Stata to Matlab, and update your variables there as necessary (using functions such as exist or strcmp if you have a list of names). A quick google search gives me this link for Stata to Matlab.
To make it easy to process you might want to create a cell (I will call C), where one column contains all variable names and one column contains the scalar values.
Then, once you have assembled all your variables, you can sort your cell alphabetically and write it to a file using this. Of course, you would write a .tex file and then iterate over your cell with something like
fprintf(fID,'\\newcommand{\\%s}{%f}\n',C{i,1},C{i,2})
I hope this is understandable and helps.
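The update rule described in the question (add the command if it is missing, overwrite it if it exists, keep the file alphabetical) can be sketched independently of Stata and MATLAB. The following is a minimal Python sketch under stated assumptions. It borrows the name define_variable from the question, but its signature, the regular expression and the one-command-per-line layout are illustrative choices, not an existing tool. LaTeX command names may contain only letters, so the sketch uses names such as PriceWeight rather than var1.

```python
import os
import re
import tempfile

# Matches lines such as: \newcommand{\PriceWeight}{3.00} % optional comment
LINE_RE = re.compile(r"\\newcommand\{\\([A-Za-z]+)\}\{(.*?)\}(.*)")

def define_variable(path, name, value, fmt="%.2f", source=""):
    """Insert or overwrite one command definition in `path`, keeping names sorted."""
    entries = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                match = LINE_RE.match(line.strip())
                if match:  # lines that do not match the pattern are dropped
                    entries[match.group(1)] = (match.group(2), match.group(3))
    comment = " % " + source if source else ""
    entries[name] = (fmt % value, comment)  # replaces any earlier definition
    with open(path, "w") as fh:
        for key in sorted(entries, key=str.lower):  # case-insensitive order
            val, tail = entries[key]
            fh.write("\\newcommand{\\%s}{%s}%s\n" % (key, val, tail))

# Hypothetical usage in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "commands.tex")
define_variable(path, "PriceWeight", 3, source="MATLAB file matlab_file.m")
define_variable(path, "Alpha", 0.5)
define_variable(path, "PriceWeight", 2.044)  # overwrites the earlier value
with open(path) as fh:
    print(fh.read())
```

Rewriting the whole file on every call keeps the logic short. That is reasonable as long as the file holds a modest number of commands, as the answer above assumes.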

MATLAB: datasets stuck in each box

I'm unsure how exactly to phrase my problem. I have run a script and it is putting each file of data into one box, making four files, a 1x4 array. I can click on each and it will expand, but this is not an acceptable format. Is it possible to extract this and turn it into one single file? I attached a picture.
Your second file has 23 columns instead of 24, so you cannot just add them together. If you add a column to the second file you can convert them using:
b=cell2mat(a')
Adding a column can be done like this:
a{2}(:,24)=nan;
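The idea behind this answer is to pad the narrower block with NaN until every block has the same number of columns, and only then stack them. The Python sketch below is a language-neutral illustration of that idea. The helper pad_columns and the tiny blocks are hypothetical and are not the asker's data.

```python
import math

def pad_columns(block, width):
    """Return a copy of `block` (a list of rows) with each row NaN-padded to `width`."""
    return [row + [math.nan] * (width - len(row)) for row in block]

# Hypothetical stand-in for the 1x4 cell array: the second block is one column short
blocks = [
    [[1.0, 2.0, 3.0]],
    [[4.0, 5.0], [6.0, 7.0]],
    [[8.0, 9.0, 10.0]],
    [[11.0, 12.0, 13.0]],
]
width = max(len(row) for block in blocks for row in block)

# Stack all blocks vertically once every row has the same length
stacked = [row for block in blocks for row in pad_columns(block, width)]
for row in stacked:
    print(row)
```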

Jasper reports: Overlapped fields not shown in excel output

Take the following example:
I have one very wide column (let's say 150pt), positioned at x=0
I have 2 columns of 25pt, positioned on x=100 and x=125. Thus, these are overlapping the first.
Depending on certain conditions (parameters to the report), I do or do not print the 2 overlapping columns. I do this by using the "print when expression ...".
This works like a charm when I use the PDF as output, but when I generate the report in excel, I do not get the big field, it is just missing. As long as I do not print the 2 overlapping fields, everything remains OK.
Any ideas on how to solve this one?
Thanks
Sounds like the answer is no ... hopefully there is a better answer
http://community.jaspersoft.com/questions/503288/missing-columns-excel
... the columns are overlapping. And the so-called "grid
exporters" like the HTML, XLS and CSV exporters do not support
overlapping elements. The elements that are behind do not print.
I hope this helps. Teodor

Reading large csv files with strings containing commas as one field

I have a large .csv file (~26000 rows). I want to be able to read it into matlab. Another problem is that it contains a collection of strings delimited by commas in one of the fields.
I'm having trouble reading it. I tried stuff like tdfread, which won't work here. Any tricks with textscan I should be aware of?
Is there any other way?
I'm not sure what is generating your CSV file but that is your problem.
The point of a CSV file, is that the file itself designates separation of fields. If the text of the CSV contains commas, then nothing you can do will help you. How would ANY program know when the text in a single field contains commas, or when that comma is a field delimiter?
Proper CSV would have a text qualifier. Some generators/readers give you the option to use one. The standard text qualifier is a " (quote). It's changeable, though, because your text may contain those, too.
Again, it's all about generating proper CSV content.
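To illustrate what a text qualifier buys you, here is a small Python sketch using the standard csv module. The sample rows are invented for the example. When fields are wrapped in double quotes, a reader that honors the qualifier treats the commas inside them as data, not as delimiters.

```python
import csv
import io

# Hypothetical file contents; the third field of each data row contains commas
raw = 'id,name,tags\n1,alpha,"red, green, blue"\n2,beta,"x, y"\n'

rows = list(csv.reader(io.StringIO(raw)))
for row in rows:
    print(len(row), row)  # every row has exactly three fields

# Writing works the same way: the qualifier is added only where it is needed
out = io.StringIO()
csv.writer(out).writerow(["3", "gamma", "a, b"])
print(out.getvalue())
```

If the file was generated without any qualifier, no reader can recover the field boundaries reliably, which is the point made above.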
There's a chance that xlsread won't give you the answer you expect -- do the strings always appear in the same columns, for example? I think (as everyone else seems to :-) that it would be more robust to just use
fid = fopen('yourfile.csv');
and then either textscan
t = textscan(fid, '%s', 'delimiter', sprintf('\n'));
t = t{1};
or just fgetl (the example in the help is perfect).
After that you can do some line-by-line processing -- using textscan again on the text content of each line, for example, is a nice, quick way to get a cell-array that will allow fast analysis of each line.
You have a problem because you're reading it in as a .csv, and you have commas within your data. You can open it in Excel and manipulate the data, possibly extracting the unwanted commas with Excel formulas. I work with .csv files for DB imports quite a bit. I imagine MATLAB has similar rules, namely no commas in your data.
Can you tell us more about your data? Are there commas throughout, or just in one column? Maybe you can read it in as tab delimited?
Are you using a Unix system? The reason I am asking is that you could use a command-line function such as sed and regular expressions to clean those data files before you pass them into Matlab. Here is a link that explains how to do exactly what you are looking for.
Since, as others have observed, your file is CSV with commas inside what you think of as a single field, it's going to be hard to persuade Matlab that that really is only one field. I think your best strategy is going to be to read one line at a time, into a string acting as a buffer, and to translate it, field-by-field, into the variables or other data structures that you want. Since Matlab has in-built regular expression capabilities this shouldn't be too hard.
And, as others have already suggested, posting a sample of your data would help us to help you.
One easy solution is:
path = 'C:\folder1\folder2\';
data = 'data.csv';
% fullfile joins folder and file name; '\%' is not a valid sprintf escape
data = dataset('xlsfile', fullfile(path, data));
Of course you could also do the following:
[data, path] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile', fullfile(path, data));
Now you will have loaded the data as a dataset. An easy way to get column 1, for example, is
double(data(:,1))