Firestore: Finding large files or directories - google-cloud-firestore

I have a few hundred folders in my Firebase Storage bucket, each containing a couple of small images. However, my storage size is almost 80 GB!
Is there a way to find the culprit files or folders? I can't seem to find a way to view folder sizes or get a list of the largest files without causing a huge number of reads.

Using the gsutil tool, you can try the du command:
gsutil du -sh gs://YOUR_BUCKET/YOUR_DIRECTORY
The -s flag gives you only the total size of the directory; if you remove it, you will also see the sizes of the files inside.
The -h flag prints object sizes in a human-readable format (e.g., 1 KiB, 234 MiB, 2 GiB).
You can then find out which files are the biggest.
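If you want to drill down further, a simple follow-up (just a sketch; gs://YOUR_BUCKET is a placeholder for your bucket) is to dump the per-object sizes with du and sort them:
# List every object with its size in bytes and keep the 20 largest
gsutil du gs://YOUR_BUCKET | sort -n | tail -n 20
The first column of each line is the object size in bytes, so the last lines printed are your biggest files.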

Related

copy (multiple) files only if filesize is smaller

I'm trying to make my image reference library take up less space. I know how to make Photoshop batch save directories of images with a particular amount of compression. BUT some of my images were originally saved with more compression than I would have used.
So I wind up with two directories of images, some of the newer files have a larger filesize, some smaller, and some the same. I want to copy over the new images into the old directory, excluding any files that have a larger filesize (or the same, though these probably aren't numerous enough for me to care about the extra time to process them).
I obviously don't want to sit there and parse through each file, but other than that I'm not picky about how it gets tackled.
running Windows 10, btw.
We have similar situations. Instead of Photoshop, I use FFmpeg (with its qscale option) to batch re-encode multiple images into a subfolder, then use XXCOPY to overwrite only the larger original source images. In fact I ended up creating a batch file which lets FFmpeg do the batch re-encoding (using its "best" qscale setting), then lets ExifTool batch-copy the metadata to the newly encoded images, then lets XXCOPY copy only the smaller newly created images. It's all automated, and the "new" folder with its leftover newly created but larger-sized images is deleted too. This way I save considerable disk space, as I have many images categorized/grouped in many different folders. But you should make a test run first or back up your images. I hope this works for you.
Here is my XXCOPY command line:
xxcopy "C:\SOURCE" "C:\DESTINATION" /s /bzs /y
The original post/forum where I learned this from is:
overwrite only files which are smaller
https://groups.google.com/forum/#!topic/alt.msdos.batch.nt/Agooyf23kFw
Just to add, XXCOPY can also do the reverse and keep the larger file instead, which I think is the /BZL switch. I think that's also mentioned in the original post/forum.
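For illustration only, here is a rough shell sketch of the same re-encode, copy-metadata, keep-the-smaller-file idea described above; the ffmpeg and exiftool invocations, the qscale value, and the "new" subfolder name are my assumptions, not the poster's exact batch file:
mkdir -p new
for f in *.jpg; do
  # Re-encode with FFmpeg's qscale option (lower value = higher quality)
  ffmpeg -i "$f" -q:v 2 "new/$f"
  # Copy the original metadata onto the re-encoded image with ExifTool
  exiftool -overwrite_original -TagsFromFile "$f" -all:all "new/$f"
  # Keep the new file only if it is actually smaller than the original
  if [ "$(stat -c%s "new/$f")" -lt "$(stat -c%s "$f")" ]; then
    mv "new/$f" "$f"
  else
    rm "new/$f"
  fi
done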

Matlab: Number of files in a folder excluding the file names information

I am looking for a way to count the number of files in a folder path without caring about the names of the files. The dir function extracts all the names, which is unnecessary for my specific application.
Since I'm looking at 100 folders, each containing almost 35,000 files, it is very time consuming if I use the "dir" function.
Any help is greatly appreciated.
Do
someDir = 'c:\Users\You\somePath\';  % whatever directory you want to count files in
[status, cmdout] = system(['dir ' someDir '*.* /s']);
and you can parse out the number of files from cmdout.
This should be faster because it's just running a system command, so you lose all the overhead of MATLAB.

How to check if there is enough free space inside directory on Linux

I want to check all available space in directory A using 'stat'. Then I want to check the size of directory B using 'du', and if directory A has enough free space, I want to copy B into A.
The question is what arguments I need to pass to the 'stat' and 'du' commands so that they return their output in the same format (nodes, bytes, etc.).
On Linux there is no limit to the total size of files contained in a directory, and there isn't even a limit on how many files can be placed in one. This can all be found in the Linux man pages.
If the device that A is on is different from the one B is on, you may be curious about how much available space is left on A's device. For that you use:
stat --file-system A B
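To answer the format question directly, here is a minimal sketch (assuming GNU coreutils and bc; A and B are the directories from the question) that puts both numbers in bytes before comparing:
# Size of B in bytes (du prints the size first, then the name)
needed=$(du -s --block-size=1 B | cut -f1)
# Free bytes on the filesystem A lives on: available blocks times block size
avail=$(stat -f --format='%a*%S' A | bc)
if [ "$needed" -lt "$avail" ]; then
  cp -r B A/
else
  echo "not enough space on A" >&2
fi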

Reading image files serially from a folder in matlab

I tried to read the .jpg files from a folder in MATLAB using the dir command, but I am not getting them starting from the first image stored in the folder; instead it starts from the 10th image. I want to know how to read the files serially, starting from the beginning.
I am almost certain that if you are using a simple enough command, it will give you all files that are there. However, this may not seem to be the case because of this little line in the description:
Results appear in the order returned by the operating system.
This may mean that you will first see files like 1, 100, 1000, 1999 and only later a file numbered 2. Of course you can sort the results after you have collected them, and then process them in your desired order.
For completeness, you would use something like:
dir *.jpg
or if you want to be sure to catch everything that even remotely resembles .jpg:
dir *.*j*p*g*

Copying constantly changing directory

I am trying to copy files from a directory that is in constant use by a security-cam program. To archive these .jpg files to another HD, I first need to copy them. The problem is, the directory is being filled as the copying proceeds, at a rate of about 10 .jpgs per second. I have the option of stopping the program, doing the copy, then starting it again, which is not what I want to do for many reasons. Or I could use the find/mtime approach. I have tried the following:
find /var/cache/zm/events/* -mmin +5 -exec cp -r {} /media/events_cache/ \;
Under normal circumstances this would work, but it seems the directories are also changing their timestamps and branching off in different directions, so it never comes out logically, and for some reason each directory is very deep, like /var/cache/zm/events/../../../../../../../001.jpg x 3000. All I want to do is copy the files and directories via cron with a simple command line if possible. With the directories constantly changing, is there a way to make this copy without stopping the program?
Any insight would be appreciated.
rsync should be a better option in this case, but you will need to try it out. Try setting it up at off-peak hours when the traffic is not that high.
Another option would be setting up the directory on a volume that uses mirroring or RAID 5; this way you do not have to worry about losing data (if that indeed is your concern).
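As a starting point, here is a minimal sketch of the rsync approach run from cron (the paths are from the question; the 10-minute interval and the --ignore-existing flag are my assumptions):
# crontab entry: archive new snapshots every 10 minutes, skipping files already copied
*/10 * * * * rsync -a --ignore-existing /var/cache/zm/events/ /media/events_cache/
Because rsync only transfers files it hasn't already copied, it copes better with a directory that keeps filling up than a one-shot cp.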