Reading files into a pyspark dataframe from directories and subdirectories - pyspark

I have the code below to read all files within a directory, but I am struggling to read the files in its subdirectories too. I won't always know what the subdirectories are, so I cannot list them explicitly.
Can anyone advise me, please?
df = my_spark.read.format("csv").option("header", "true").load(yesterday+"/*.csv")

Use wildcards after the directory path whose subdirectories you want to read. Each * matches a single path level:
"path/*/*"

Thanks to Joby:
can you try giving wildcards in this way and see "path/*/*" – Joby
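As a hedged sketch of the wildcard approach: since each * matches one path level, one pattern per depth is needed when the nesting is not known in advance. The helper names depth_patterns, read_csv_tree and read_csv_recursive, and the max_depth parameter, are illustrative assumptions and not part of the original post. The two reader functions expect an existing SparkSession and are not called here, so the pattern logic can be checked without Spark. On Spark 3.0 or later, the recursiveFileLookup reader option is an alternative that avoids wildcards altogether.

```python
import posixpath


def depth_patterns(base, max_depth, ext="csv"):
    """Return one glob pattern per directory depth, from 0 to max_depth.

    depth_patterns("data", 2) -> data/*.csv, data/*/*.csv, data/*/*/*.csv
    """
    return [
        posixpath.join(base, *(["*"] * depth), f"*.{ext}")
        for depth in range(max_depth + 1)
    ]


def read_csv_tree(spark, base, max_depth=2):
    """Load CSV files at every depth up to max_depth (hypothetical helper).

    `spark` is an existing SparkSession. Assumption: every pattern matches
    at least one file; Spark may raise an error for a pattern that matches none.
    """
    return (
        spark.read.format("csv")
        .option("header", "true")
        .load(depth_patterns(base, max_depth))
    )


def read_csv_recursive(spark, base):
    """Alternative for Spark 3.0+: let the reader descend into subdirectories."""
    return (
        spark.read.format("csv")
        .option("header", "true")
        .option("recursiveFileLookup", "true")  # walk all nested folders
        .option("pathGlobFilter", "*.csv")      # keep only CSV files
        .load(base)
    )
```

With the variable from the question, read_csv_tree(my_spark, yesterday) would cover files directly in the directory and up to two levels below it.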

Related

load .mat files with the same prefix name in matlab (e.g. load abc*.mat)

I have many .mat files in a folder that share the same prefix, for example abc***.mat.
Is there a way to use a wildcard when loading files?
I can write dir abc*.mat and get the list of files, but I can't load abc*.mat...
Thanks!
I was hoping there was a way to do this without a for loop, but following Cris Luengo's suggestion (thanks!), here is a way to accomplish it.
temp = dir('abc*.mat');
for ii = 1:numel(temp)
    load(temp(ii).name);
end

Extracting Multiple 7z Files Overrides Same Folder

I'm currently working on a project where I take multiple 7z files and extract the contents of these files in a folder named the same way as the 7z file itself. I also apologize if something like this has been answered already; I spent time trying to investigate this issue but I can't seem to find anyone else who has had a similar issue.
For example:
a1.7z -> <targetpath>/a1/<contents within a1.7z>
The following shell line: a1.7z | % {& "C:\Program Files\7-Zip\7z.exe" "x" $_.fullname "-o<targetpath>\a1" -y -r}
Works like a dream, but only for one 7z file. Whenever I extract a second 7z file, it does not create a new folder; the contents are added to the first folder instead, and the second folder is never made. When I manually select all of the 7z files I want to extract, right-click, and choose 'Extract to "*\"', it does what I want, but I can't figure out how to script this action. I should also mention that some of the 7z files, when extracted, contain subfolders of the same name. I'm not sure whether this is throwing off the recursion, but I assume it might be.
Any help or advice on this topic would be greatly appreciated!
If you get all the .7z files as System.IO.FileInfo objects (using Get-ChildItem), you can use Mathias's comment as one way to do this with the pipeline. However, I recommend putting this inside a loop and choosing the folder names more carefully, e.g. "NofFolder_$($_.BaseName)", in case more than one folder has the same name.
It really depends on the format you want.
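The per-archive loop suggested above can be sketched as follows. The sketch is in Python rather than PowerShell, and the names build_extract_commands, extract_all, seven_zip and target are hypothetical. The 7-Zip arguments x, -o and -y are taken from the question; the -r switch is left out of this sketch. It builds one command per archive, each with its own output folder named after the archive, so a second archive never reuses the first folder.

```python
from pathlib import Path
import subprocess


def build_extract_commands(archives, target,
                           seven_zip=r"C:\Program Files\7-Zip\7z.exe"):
    """Return one 7z command per archive, each with its own output folder."""
    commands = []
    for archive in archives:
        archive = Path(archive)
        # a1.7z -> <target>/a1 ; the folder name is derived inside the loop,
        # so it changes with every archive instead of being hard-coded.
        out_dir = Path(target) / archive.stem
        commands.append([seven_zip, "x", str(archive), f"-o{out_dir}", "-y"])
    return commands


def extract_all(source_dir, target):
    """Run the commands for every .7z file in source_dir (requires 7-Zip)."""
    archives = sorted(Path(source_dir).glob("*.7z"))
    for command in build_extract_commands(archives, target):
        subprocess.run(command, check=True)
```

Building the commands separately from running them makes it easy to print and inspect the output folders before anything is extracted.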

How to put all the file names into an array in MATLAB

I plan to list all the file names in the current folder (including subfolders) and put them and their paths into an array. I can use s=dir to get the names and paths of all the files in the current folder, and I can also use "dir **/." to show the files in the current folder and its subfolders.
But when I use "s=dir **/.", MATLAB gives me an error and I am not able to proceed. Can anyone help me with this?
The reason I want to do this is to compare two folders that may contain plenty of duplicate files. I want to use the file name as the indicator to find newly added or removed files, so that I can update our Excel log.
Thank you for your help.
To list only the files and not the directories, try:
file_names = dir('**/');
file_names = file_names(~[file_names.isdir]);
file_names = {file_names.name}
You were really close. Command syntax cannot return a value, so call dir as a function instead:
s = dir('**\');
That should get you what you need.
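The comparison the question is aiming for (file names added to or removed from a folder tree) can be sketched like this. It is written in Python rather than MATLAB, and the function names list_files and compare_folders are hypothetical. It assumes, as the question does, that the file name alone identifies a file.

```python
import os


def list_files(root):
    """Map each file name under root (recursively) to the folders containing it."""
    found = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            found.setdefault(name, []).append(folder)
    return found


def compare_folders(old_root, new_root):
    """Return (added, removed) file names, comparing by name only."""
    old_names = set(list_files(old_root))
    new_names = set(list_files(new_root))
    added = sorted(new_names - old_names)
    removed = sorted(old_names - new_names)
    return added, removed
```

Keeping the folder list for each name means duplicates of the same file name in several subfolders remain visible if they need to be reported.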

TYPO3 Filecollection, Type "Folder from Storage" - Recursion possible?

Is there a way to get a file list recursively from one file collection that points to a directory in fileadmin?
Currently I have only got it to work with files directly in that directory, not with files in its subdirectories.
So instead of setting up lots of file collections, one for each subdirectory,
I'd like to set only the top-level directory (here "Kurs77") and have the files displayed, including those from subdirectories.
The reason is that editors may add an unknown number of subdirectories, and I'd like the files to be displayed automagically in the file list in the front end, without the need to create an ever-increasing number of file collections.
cheers,
Tom
It seems that this is a missing feature. Check out https://forge.typo3.org/issues/61238. It seems that the underlying API is able to do that.
So one solution would be to use TypoScript to make that work.
To give the correct answer now: the recursive option is of course available, but it is part of the sys_file_collection record.
In TYPO3 9 this works out of the box. It is a pity that the folder is not shown as the title, but recursion works.

MATLAB: How do I copy files with a specific extension to a folder one directory above it?

I am trying to copy specific files from one folder into another folder one directory above it. I want to do this for all of the folders I have at once. Here's my file structure:
201415ContinuousForDropTeqc/StationA/201411/
This path has 25 folders labeled 5 through 30 (representing days).
In each of these 25 folders there are 3 folders named 'dat', 'RAW', 'rinex'.
I want all files ending in .14o from the RAW folder (there are many other file types in this folder as well) to be copied to the rinex folder.
I'm also hoping I can find a way to repeat this for every day in the 201411 folder. This last part isn't critical, since I think I can type the path manually and just run the script that copies and pastes the files I want.
I hope this was clear. I'm new-ish to MATLAB.
Thank you in advance for your help!
Tiffany
You can do all of that with the dir command.
Use it twice: first to get the day folders, and then to get the files within each folder.
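base = '201415ContinuousForDropTeqc/StationA/201411/';
Days = dir(base);
% keep only the day folders; dir also returns '.' and '..'
Days = Days([Days.isdir] & ~ismember({Days.name}, {'.', '..'}));
for k = 1:numel(Days)
    rawDir = fullfile(base, Days(k).name, 'RAW');
    files = dir(fullfile(rawDir, '*.14o'));
    for n = 1:numel(files)
        % build full paths so the script works from any current folder
        copyfile(fullfile(rawDir, files(n).name), fullfile(base, Days(k).name, 'rinex'));
    end
end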
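For comparison, the same two-level walk (day folders first, then the matching files in each RAW folder) can be sketched in Python. The function name copy_obs_files and its parameters are hypothetical; the folder names RAW and rinex and the .14o extension come from the question. Day folders that lack either subfolder are skipped.

```python
from pathlib import Path
import shutil


def copy_obs_files(month_dir, pattern="*.14o"):
    """Copy files matching pattern from each <day>/RAW into <day>/rinex."""
    copied = []
    for day_dir in sorted(Path(month_dir).iterdir()):
        raw = day_dir / "RAW"
        rinex = day_dir / "rinex"
        # Skip plain files and day folders without both subfolders.
        if not (raw.is_dir() and rinex.is_dir()):
            continue
        for src in sorted(raw.glob(pattern)):
            dest = rinex / src.name
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(dest)
    return copied
```

A call such as copy_obs_files("201415ContinuousForDropTeqc/StationA/201411") would handle every day folder in one pass and return the list of files it created.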