I have a gcs bucket with many files
gs://bucket/file_0001.txt
gs://bucket/file_0002.txt
...
gs://bucket/file_1000.txt
In a Jupyter notebook, I can copy everything to a local directory with something like
subprocess.run(f"gsutil -m cp {gcs_path}/*.txt {local_path}/".strip().split())
Now suppose I am given a list of file names
f = ['file_0123', 'file_0456', 'file_0789']
My question is, how can I copy the files in this list, without a for loop?
EDIT: In a simpler setup, I just want to copy all the files within a numeric range, e.g., file_0010, file_0011, ..., file_0100. How can I do that? I found using a regex for this to be very messy.
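One possible approach, sketched under the assumption that your gsutil version supports the `-I` option of `cp` (read the list of URLs to copy from stdin): build the URLs in Python and hand them to a single gsutil call. The helper names `build_urls`, `names_in_range`, and `copy_urls`, and the `.txt` suffix default, are made up for illustration. The list comprehensions still iterate in Python, but gsutil is invoked only once.

```python
import subprocess


def build_urls(gcs_path, names, suffix=".txt"):
    """Return one gs:// URL per file name (suffix is an assumption)."""
    return [f"{gcs_path.rstrip('/')}/{name}{suffix}" for name in names]


def names_in_range(start, stop, prefix="file_", width=4):
    """Return zero-padded names for start..stop, both ends included."""
    return [f"{prefix}{i:0{width}d}" for i in range(start, stop + 1)]


def copy_urls(urls, local_path):
    """Pass all URLs to one gsutil process on stdin (cp -I)."""
    return subprocess.run(
        ["gsutil", "-m", "cp", "-I", local_path],
        input="\n".join(urls) + "\n",
        text=True,
        check=True,
    )
```

For the list in the question this would be `copy_urls(build_urls(gcs_path, f), local_path)`, and for the numeric range `copy_urls(build_urls(gcs_path, names_in_range(10, 100)), local_path)`.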
I've got a directory of CSV files.
I can do:
\ls /home/chris/data/
which works, producing a list:
"AAA.csv"
"AAAU.csv"
"AABA.csv"
etc.
However, when I try to assign it, so I can actually do something useful with it:
files: \ls /home/chris/data/
I'm unable to actually save the list.
What am I doing wrong?
You can use the system keyword instead; \ only works if it's the first character on a line:
files: system "ls /home/chris/data/"
Alternatively, key can be used, which I prefer because it is platform-independent with respect to both path separators and the shell command used to list files:
key `:/home/chris/data
I would like to create a list (array) of file names in a folder so that I can use PeopleCode to loop through them and delete files that match a pattern and fall within a date range.
I'm pretty sure I have the last half of that, matching a pattern and in a date range, but I do not know how to get the list on remote servers. I can do it on our local servers, but not remote ones.
I had hoped that this would work:
Local object &files = CreateJavaObject("java.io.File", SFO_DEL_FTP_AET.FTPDIRECTORY | "*.*");
But I don't think it is working.
Can somebody help me?
Thanks,
JPS
You can use Java to access/modify the files in a directory. Try:
Local JavaObject instead of Local object
We created a PS component to view, upload, and delete files in an App Server directory. You can see how we did it here:
https://github.com/cy2hq/PeopleSoft-Directory-Viewer
I have date-wise folders of the form root-dir/yyyy/mm/dd,
each containing many files.
I want to update the timestamp of all the files that fall within a certain date range,
for example 2 weeks (i.e., 14 folders), so that these files can be picked up by my file-streaming data ingestion process.
What is the easiest way to achieve this?
Is there a way to do it in the UI console, or does it have to be done through gsutil?
Please help.
GCS objects are immutable, so the only way to "update" the timestamp would be to copy each object on top of itself, e.g., using:
gsutil cp gs://your-bucket/object1 gs://your-bucket/object1
(and looping over all objects you want to do this to).
This is a fast (metadata-only) operation, which will create a new generation of each object, with a current timestamp.
Note that if you have versioning enabled on the bucket, doing this will create an extra version of each file you copy this way.
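As a rough sketch of the looping part, assuming the objects really are named root-dir/yyyy/mm/dd/..., the date prefixes for a range can be generated in Python and turned into copy-onto-itself commands. The function names `date_prefixes` and `rewrite_commands` and the bucket name are hypothetical; the object URLs under each prefix would still have to be listed first (for example with `gsutil ls <prefix>/**`), and nothing here runs gsutil.

```python
from datetime import date, timedelta


def date_prefixes(bucket, root, start, days):
    """Return gs://bucket/root/yyyy/mm/dd for `days` consecutive days."""
    return [
        f"gs://{bucket}/{root}/{d:%Y/%m/%d}"
        for d in (start + timedelta(days=n) for n in range(days))
    ]


def rewrite_commands(object_urls):
    """Build one 'copy the object on top of itself' command per URL."""
    return [["gsutil", "cp", url, url] for url in object_urls]


# Two weeks of prefixes starting on a made-up date.
prefixes = date_prefixes("your-bucket", "root-dir", date(2016, 12, 25), 14)
```

Each command list can then be passed to `subprocess.run` once the object URLs are known.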
When you say "folders in the form of root-dir/yyyy/mm/dd", do you mean that you're copying those objects into your bucket with names like gs://my-bucket/root-dir/2016/12/25/christmas.jpg? If not, see Mike's answer; but if they are named with that pattern and you just want to rename them, you could use gsutil's mv command to rename every object with that prefix:
$ export BKT=my-bucket
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/01/15/file1.txt
gs://my-bucket/2016/01/15/some/file.txt
gs://my-bucket/2016/01/15/yet/another-file.txt
$ gsutil -m mv gs://$BKT/2016/01/15 gs://$BKT/2016/06/20
[...]
Operation completed over 3 objects/12.0 B.
# We can see that the prefixes changed from 2016/01/15 to 2016/06/20
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/06/20/file1.txt
gs://my-bucket/2016/06/20/some/file.txt
gs://my-bucket/2016/06/20/yet/another-file.txt
My team faces the need to encrypt all files in a repository with AES256. For this purpose, we decided we are going to zip all files with such encryption, using the same key for all of them.
The problem we have is that these files sit on a NAS, so from Windows boxes they are accessible only via a network path to them.
The directory structure is something like this:
Original Structure:
Root
-1
|--folder1
|---file1.ext
|---file2.ext
|--folder2
|---filea.ext
|---fileb.ext
|--folder2.a
|---filec.ext
and so on...
Essentially, what we need is to have all the original files contained in a zip file, keeping their original names, which would be something like this:
Desired Outcome:
|-Root
|-1
|--folder1
|---file1.zip
|---file2.zip
|--folder2
|---filea.zip
|---fileb.zip
|--folder2a
|---filec.zip
and so on...
To accomplish this, we tried a batch script that calls 7-Zip, but it only works if it's run from the root directory, which we cannot do because the files are not on a server.
Here is the syntax of the batch script we came up with:
FOR /R %%i IN ("*.wmv") DO "C:\Program Files\7-Zip\7z.exe" a -mx0 -tzip -pPasswordHere "%%~dpni.zip" "%%i"
But, as I wrote previously, it only works when run from the root folder, which we cannot do because the files sit on a network location.
Mapping the drive or making a symbolic link to it doesn't do the trick either.
I've also looked into 7-Zip's own options, namely its "-r" switch, but I couldn't find a way to get the desired outcome (that is, recurse through all folders in the remote tree structure, of which there are a lot, and keep the original file names).
I'm open to any suggestions, as any kind of script, trick or gizmo that gets the job done will be more than welcome. =)
Thanks a million in advance!,
Sebas.
----SOLUTION----
I actually found a solution here, mapping the drive in a different way (it's so simple it just made me feel stupid(er), but it's altogether beautiful).
Using the batch script below, the remote share can be mapped like so:
You can map a drive using
net use X: \\server\directory
and then you can change to that directory using
pushd X:
(Post from which the answer was taken: Batch File Iterating through files on a local network server)
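For comparison, here is an alternative sketch (not the solution found above) that avoids depending on the current directory at all: Python's `os.walk` accepts the root path directly, so each file's zip target can be derived next to it. The 7-Zip path and switches are copied from the batch script in the question; the function name `zip_commands` and the `extension` parameter are made up, and the function only builds the commands rather than running them.

```python
import os
from pathlib import Path

# Path taken from the batch script in the question.
SEVEN_ZIP = r"C:\Program Files\7-Zip\7z.exe"


def zip_commands(root, password, extension=".wmv"):
    """Return one 7z command per matching file found under root."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extension):
                continue
            source = Path(dirpath) / name
            # Same folder, same base name, .zip extension.
            target = source.with_suffix(".zip")
            commands.append(
                [SEVEN_ZIP, "a", "-mx0", "-tzip", f"-p{password}",
                 str(target), str(source)]
            )
    return commands
```

Each returned list could then be handed to `subprocess.run(cmd, check=True)`, with `root` given as the share path, e.g. `r"\\server\directory"`.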
Is it possible to list only the folders in a bucket using the gsutil tool?
I can't see anything listed here.
For example, I'd like to list only the folders in this bucket:
Folders don't actually exist. gsutil and the Storage Browser do some magic under the covers to give the impression that folders exist.
You could filter your gsutil results to only show entries that end with a forward slash, but this may not show all the "folders". It will only show "folders" that were created manually (i.e., not those that exist only implicitly because an object name contains slashes):
gsutil ls gs://bucket/ | grep -e "/$"
Just to add here: if you drag a folder tree directly into the Google Cloud Storage web GUI, you don't really get an entry for each parent folder. Instead, each file name is a fully qualified path, e.g. "/blah/foo/bar.txt", rather than a folder hierarchy blah > foo > bar.txt.
The trick here is to first use the GUI to create a folder called blah and then create another folder called foo inside (using the button in the GUI) and finally drag the files in it.
When you now list the files, you will get a separate entry for
blah/
foo/
bar.txt
rather than only one
blah/foo/bar.txt
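Since folders are only implied by the slashes in object names, one way to recover them is to derive every prefix yourself from a full object listing. This is a minimal sketch that works on plain object names (without the gs://bucket/ part); the function name `implied_folders` is hypothetical, and how you obtain the names is left open.

```python
def implied_folders(object_names):
    """Return every 'folder' prefix implied by slashes in the names."""
    folders = set()
    for name in object_names:
        for i, ch in enumerate(name):
            if ch == "/":
                # Everything up to and including this slash is a prefix.
                folders.add(name[: i + 1])
    return sorted(folders)


print(implied_folders(["blah/foo/bar.txt"]))
# prints ['blah/', 'blah/foo/']
```

The result is the same whether or not separate placeholder entries such as blah/ and foo/ exist in the listing.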