How can I import a worksheet from a password-protected xlsx workbook into R?
I would like to be able to convert an Excel worksheet into a csv file without having to go through Excel itself.
This is possible for xls workbooks using the Perl-based function xls2csv from the gdata package. I gather the problem is that Spreadsheet::XLSX doesn't support it.
There are a variety of functions and packages for importing non-encrypted xlsx workbooks, but none seems to address this issue.
At present it seems the only alternatives are to go through Excel or to figure out how to write Perl code that can do it.
This looks like what you need, except that it doesn't use the xlsx package:
https://stat.ethz.ch/pipermail/r-help/2011-March/273678.html
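library(RDCOMClient)
# Start Excel through COM (Windows only; Excel must be installed)
eApp <- COMCreate("Excel.Application")
# Open the workbook, supplying its password
wk <- eApp$Workbooks()$Open(Filename="your_file", Password="your_password")
# Save the first sheet to a temporary file
tf <- tempfile()
wk$Sheets(1)$SaveAs(tf, 3)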
To build on ed82's answer, there are a few caveats:
You may need to pass another password parameter, WriteResPassword (see the Excel Workbooks.Open documentation).
I didn't find learning the COM interface appealing after getting used to the xlsx R package, so I prefer to immediately save an unprotected copy of the Excel file, close it, and read the copy with another package:
library(RDCOMClient)
library(xlsx)
eApp <- COMCreate("Excel.Application")
# Find out whether you need to pass Password, WriteResPassword, or both
wk <- eApp$Workbooks()$Open(Filename = filename, Password = "somepass", WriteResPassword = "somepass")
# Save a copy with the passwords cleared (otherwise the copy is still password-protected)
wk$SaveAs(Filename = '...somepath...', WriteResPassword = '', Password = '')
# The copied file is still open in Excel via COM, so close it
wk$Close(SaveChanges = FALSE)
# Now read it into a data.frame using a familiar package, {xlsx}
my.data <- read.xlsx('...somepath...', sheetIndex = ...)
I am a newbie in MATLAB programming. I am having trouble writing code that imports multiple CSV files from a certain folder into one:
This is my code:
%% Importing multiple CSV files
myDir = uigetdir; %gets directory
myFiles = dir(fullfile(myDir,'*.csv')); %gets all csv files in struct
for k = 1:length(myFiles)
data{k} = csvread(myFiles{k});
end
I use uigetdir so that data can be selected from any folder, because I am trying to make an automated program that others can use flexibly. The code I run only finds the directory and lists the files; it does not merge the CSV files into one and read the result in with "import data". I want the files to be merged and read as one file.
My merged file should be semicolon-delimited and consist of 47 CSV files merged together; it should look like this (the picture shows one of the CSV files I have):
[Image: my merged file]
I have been working on this for a whole day, but I always get an error. Please help me. Thank you very much in advance for your help.
As the error message states, you're attempting to reference myFiles as a cell array when it is not. The output of dir is a structure array, which cannot be indexed with braces like a cell array.
You want to do something like the following:
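data = cell(1, numel(myFiles)); % preallocate one cell per file
for k = 1:numel(myFiles)
% dir returns a struct array, so build the path from its folder and name fields
filepath = fullfile(myFiles(k).folder, myFiles(k).name);
data{k} = csvread(filepath);
end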
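Since the stated goal is one merged, semicolon-delimited file, here is a minimal sketch of that merge step in Python rather than MATLAB, using only the standard csv and pathlib modules. It assumes every input file uses `;` as its delimiter and starts with the same header row; the function name `merge_csv_files` and the paths are hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(folder, output_path, delimiter=";"):
    """Merge every *.csv file in `folder` into `output_path`.

    Assumes all inputs share the same header row and delimiter: the header
    is written once, and the header row of every later file is skipped.
    Returns the number of data rows written.
    """
    # Collect the inputs (sorted, for a stable order) before creating the output
    files = sorted(Path(folder).glob("*.csv"))
    rows_written = 0
    header_written = False
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter)
        for path in files:
            with open(path, newline="") as f:
                reader = csv.reader(f, delimiter=delimiter)
                header = next(reader, None)
                if header is None:  # empty file: nothing to merge
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_csv_files("csv_input", "merged.csv")` would return the number of data rows written. Keep the output file outside the input folder, so that a later run does not merge the previous result into itself.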
I want to read multiple files from a folder but this code does not work properly:
direction=dir('data');
for i=3:length(direction)
Fold_name=strcat('data\',direction(i).name);
filename = fullfile(Fold_name);
fileid= fopen(filename);
data = fread (fileid)';
end
I modified your approach to make it simpler. Just use this form:
folder = 'address\datafolder\'; % provide the address of the folder where your data is located
filenames = dir([folder, '*.txt']); % specify your data format so that other files are not imported; this example uses .txt files
for k = 1:numel(filenames)
% your code here, e.g. read fullfile(folder, filenames(k).name)
end
This should work. It is more robust, because it applies to any folder without you having to worry about file names, their order, or skipping entries such as '.' and '..'. If you only want certain files of the same format within the folder, I recommend putting them in a separate folder.
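For comparison, the same filter-by-pattern idea can be sketched in Python with pathlib's `glob`. The function name `list_data_files` and the `*.txt` default are illustrative assumptions, not part of the answer above.

```python
from pathlib import Path


def list_data_files(folder, pattern="*.txt"):
    """Return the regular files in `folder` whose names match `pattern`, sorted by name.

    The pattern skips files of other formats, and Path.glob never yields the
    '.' and '..' entries, so no index offset (like starting the loop at 3) is needed.
    """
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())
```

A loop such as `for path in list_data_files("datafolder"):` then plays the role of the `for k = 1:numel(filenames)` loop.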
If you need access to the contents of all the files after reading them:
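direction=dir('data');
for i=3:length(direction)
Fold_name=strcat('data\',direction(i).name);
filename = fullfile(Fold_name);
fileid(i)= fopen(filename);
data{i-2} = fread (fileid(i))'; % one cell per file, so earlier contents are not overwritten
fclose(fileid(i)); % close each file after reading it
end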
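The key change is storing each file's contents separately instead of overwriting a single variable. A minimal Python sketch of the same idea, assuming raw bytes are wanted (roughly what `fread(fileid)` returns by default); the function name `read_all_files` is hypothetical.

```python
from pathlib import Path


def read_all_files(folder, pattern="*"):
    """Read every regular file in `folder` that matches `pattern`.

    Returns a list with one bytes object per file (in name order), so the
    contents of earlier files are kept rather than overwritten. The `with`
    block closes each file as soon as it has been read.
    """
    contents = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            with open(path, "rb") as f:
                contents.append(f.read())
    return contents
```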
I have several data log files (here: 34) for which I have to calculate certain values. I wrote a separate function to publish the results of the calculation to a PDF file. But I can only publish one file at a time, so it takes a while to publish all 34 files.
Now I want to automate this with a loop: import the data, calculate the values, and publish the results for each log file to a new PDF file. At the end I want 34 PDF files, one for each log file.
My problem is that I couldn't find a way to rename the PDF files during publishing. The PDF file is always named after the script that calculates the values, so it is overwritten on each iteration of the loop. At the end everything is calculated, but I only have the PDF for the last log file.
I found this hacky solution that modifies MATLAB's publish function, but since I don't have admin rights I can't use it:
"This is really hacky, but I would modify publish to accept a new option prefix. Replace line 93
[scriptDir,prefix] = fileparts(fullPathToScript);
with
if ~isfield(options, 'prefix')
[scriptDir,prefix] = fileparts(fullPathToScript);
else
[scriptDir,~] = fileparts(fullPathToScript);
prefix = options.prefix; end
Now you can set options.prefix to whatever filename you want. If you want to be really hardcore, make the appropriate modifications to supplyDefaultOptions and checkOptionFields as well."
Any suggestions?
Thanks in advance,
Martin
Here's one idea using movefile to rename the resultant published PDF on each iteration:
for i = 1:34
file = publish(files(i)); % Replace with your own command(s)
[pathStr,fileName,ext] = fileparts(file);
newFile = [pathStr filesep() fileName '_' int2str(i) ext]; % Example: append _# to each
[success,msg,msgid] = movefile(file,newFile);
if ~success
error(msgid,msg);
end
end
Also used are fileparts and filesep. See this question for other ways to rename and move files.
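The rename step (split the path, append `_<i>` before the extension, move the file) carries over directly to other languages. A minimal Python sketch, assuming the published file already exists; `publish` itself has no counterpart here, and the names `add_index_suffix` and `move_with_suffix` are hypothetical.

```python
import os
import shutil


def add_index_suffix(path, index):
    """Insert `_<index>` before the extension, e.g. 'out/report.pdf' -> 'out/report_3.pdf'."""
    root, ext = os.path.splitext(path)
    return f"{root}_{index}{ext}"


def move_with_suffix(path, index):
    """Rename `path` to its suffixed name and return the new path.

    shutil.move raises an exception (e.g. FileNotFoundError) on failure,
    which plays the role of the explicit error() call in the MATLAB loop.
    """
    new_path = add_index_suffix(path, index)
    shutil.move(path, new_path)
    return new_path
```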
I am new to programming and Python and am trying to write a program to process astronomical data. I have a huge list of files named like ww_12m_no0021.spc, ww_12m_no0022.spc, and so on. I want to move the odd-numbered files and the even-numbered files into two separate folders.
import shutil
import os
for file in os.listdir("/Users/asifrasha/Desktop/python_test/input"):
    if os.path.splitext(file)[1] == ".spc":
        print file
        shutil.copy(file, os.path.join("/Users/asifrasha/Desktop/python_test/output",file))
This copies all the .spc files to a different folder. I am struggling with how to copy only the odd-numbered files (no0021, no0023, …) to a separate folder. Any help or suggestions would be much appreciated!
import os
import shutil
# Modify these to your needs
input_dir = "/Users/asifrasha/Desktop/python_test/input"
odd_dir = "/Users/asifrasha/Desktop/python_test/output/odd"
even_dir = "/Users/asifrasha/Desktop/python_test/output/even"
os.makedirs(odd_dir, exist_ok=True)  # create the output folders if needed
os.makedirs(even_dir, exist_ok=True)
for filename in os.listdir(input_dir):
    basename, extension = os.path.splitext(filename)
    if extension == ".spc":
        num = basename[-4:]  # Get the numbers (i.e. the last 4 characters)
        num = int(num, 10)  # Convert to int (base 10)
        if num % 2:  # Odd
            dest_dir = odd_dir
        else:  # Even
            dest_dir = even_dir
        # os.listdir returns bare names, so build the full source path
        src = os.path.join(input_dir, filename)
        dest = os.path.join(dest_dir, filename)
        shutil.copy(src, dest)
Obviously you can simplify it a bit; I'm just trying to be as clear as possible.
Assuming your files are named ww_12m_no followed by the number:
if int(os.path.splitext(file)[0][9:]) % 2 == 1:
    pass  # file is odd-numbered, go ahead and copy it here
If the length of the prefix can vary, I would use a regular expression instead. I haven't tested this code, but that's the gist of it. I'm not sure this question belongs here, though.
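A minimal sketch of the regular-expression variant, assuming the number is always the run of digits immediately before `.spc`; the names `SPC_NUMBER`, `file_number`, and `is_odd_numbered` are hypothetical.

```python
import re

# A trailing run of digits just before ".spc", e.g. "ww_12m_no0021.spc" -> "0021".
# Anchoring at the extension makes the length of the prefix irrelevant.
SPC_NUMBER = re.compile(r"(\d+)\.spc$")


def file_number(filename):
    """Return the file's trailing number as an int, or None if the name does not match."""
    match = SPC_NUMBER.search(filename)
    return int(match.group(1)) if match else None


def is_odd_numbered(filename):
    """True only for .spc files whose trailing number is odd."""
    number = file_number(filename)
    return number is not None and number % 2 == 1
```

Anchoring the pattern at the extension, rather than slicing at a fixed offset, means a name like `other_prefix_7.spc` still yields 7.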
I am reading an .xls file using Spreadsheet::ParseExcel and get the data as is.
But when reading an .xlsx file using Spreadsheet::XLSX, the values read are truncated.
For example, the value 2.4578 is read as 2.4578 from the .xls file but as 2.45 from the .xlsx file.
Why is the .xlsx data being corrupted?
I created a simple workbook containing one sheet and only the value 2.4578 in A1 and ran the following script:
use strict;
use warnings;
use Spreadsheet::XLSX;
my $excel = Spreadsheet::XLSX->new('Book1.xlsx');
my ($sheet) = @{ $excel->{Worksheet} };
print $sheet->{Cells}[0][0]{Val}, "\n";
Output:
C:\Temp> x
2.4578000000000002
So, in this simple case, everything seems to be OK.
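The long number printed above is not a different value. Here is a minimal illustration in Python (used only for convenience) showing that 2.4578 and 2.4578000000000002 denote the same IEEE 754 double, the second being that double written out to 17 significant digits. That the .xlsx file stores the cell value in this 17-digit form is an assumption suggested by the output above, not something verified here.

```python
def float_renderings(x):
    """Return the shortest round-tripping text of `x` and its 17-significant-digit text."""
    return repr(x), format(x, ".17g")


shortest, full = float_renderings(2.4578)
print(shortest, full)  # prints: 2.4578 2.4578000000000002
# Both strings parse back to the identical double
print(float(shortest) == float(full) == 2.4578)  # prints: True
```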
If you can post a short, self-contained example that exhibits the problem and a small sample .xlsx file which we can look at, we would have a better chance of identifying the problem.
Try $cell->{Val} for the unformatted raw value instead of $cell->Value() for the Excel-formatted value.