How can I check the storage size of each folder in EFS? - amazon-efs

In my EFS directory, I have a folder structure like the one below:
project/A
/B
/C
...
I want to monitor each folder's storage size every day,
e.g. folder A => 500 GB, folder B => 200 GB.
How can I do this efficiently?
Can I see this information in the AWS console?

The best option I found is:
du -h --max-depth=1
Note: --max-depth=1 makes it faster.
You can also use ncdu; it takes a while to index everything, but it gives an excellent navigation option...
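As far as I know, the EFS console and its CloudWatch metrics only show the total metered size of the whole file system, not a per-folder breakdown, so a small script around du is the usual approach. A minimal sketch you could schedule with cron (the /mnt/efs/project mount point and log path are assumptions, adjust them to your setup):
#!/bin/sh
# sketch: log each project folder's size once a day (run from cron)
# /mnt/efs/project is a hypothetical mount point - adjust to your own
BASE=/mnt/efs/project
LOG=/var/log/efs-folder-sizes.log
{
  date '+%Y-%m-%d'
  du -sh "$BASE"/*/
} >> "$LOG"
# example crontab entry:  0 3 * * * /usr/local/bin/efs-folder-sizes.sh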

Related

GSUTIL CP using file size

I am trying to copy files from a directory on my Google Compute Instance to a Google Cloud Storage bucket. I have it working, however there are ~35k files but only ~5k have any data in them.
Is there any way to copy only the files above a certain size?
I've not tried this but...
You should be able to do this using a resumable transfer and setting the threshold to 5k (it defaults to 8 MiB). See: https://cloud.google.com/storage/docs/gsutil/commands/cp#resumable-transfers
It may be advisable to set BOTO_CONFIG specifically for this copy, (a) to be intentional and (b) to remind yourself how it works. See: https://cloud.google.com/storage/docs/boto-gsutil
Resumable uploads have the added benefit, of course, of resuming if there are any failures.
Recommend: try this on a small subset and confirm it works to your satisfaction.
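Untested, but a sketch of what that BOTO_CONFIG suggestion could look like: a throwaway config whose [GSUtil] section sets resumable_threshold (the file path and the 5000-byte value are placeholders; depending on how gsutil is installed you may also need your credentials section in that file):
# sketch only: one-off boto config for this copy
cat > /tmp/boto-for-this-copy <<'EOF'
[GSUtil]
resumable_threshold = 5000
EOF
BOTO_CONFIG=/tmp/boto-for-this-copy gsutil -m cp -r /path/to/dir gs://your-bucket
# resumable_threshold controls when gsutil switches to resumable uploads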
While it's not possible to do this with gsutil alone, you can do it by filtering on file size and feeding the matching names to cp with the -I flag. If you're using a Linux Compute Engine instance you can do it with the du and awk commands:
du -b * | awk '{if ($1 > 1000) print $2 }' | gsutil -m cp -I gs://bucket2
The command gets the size in bytes of each file in the current directory (du -b reports apparent size in bytes) and copies to bucket2 only the files larger than 1000 bytes; you can change that value to suit your needs.

Too many files on my Databricks Community cluster, but where?

I started playing with streaming on my Community Edition Databricks, but after a few minutes of producing test events I ran into a problem. I believe it's somehow connected with some temporary small files produced during the streaming process. I would like to find and remove them, but I can't find where they are stored. My exception is:
com.databricks.api.base.DatabricksServiceException: QUOTA_EXCEEDED: You have exceeded the maximum number of allowed files on Databricks Community Edition. To ensure free access, you are limited to 10000 files and 10 GB of storage in DBFS. Please use dbutils.fs to list and clean up files to restore service. You may have to wait a few minutes after cleaning up the files for the quota to be refreshed. (Files found: 11492);
I have tried running a shell script to find out the number of files in each folder, but unfortunately I can't find anything suspicious: mostly lib, usr and other folders containing system or Python files are there, and nothing that could have been produced by my streaming. This is the script I use:
find / -maxdepth 2 -mindepth 1 -type d | while read dir; do
printf "%-25.25s : " "$dir"
find "$dir" -type f | wc -l
done
Where can I find the cause of the "too many files" problem? Maybe it's not connected to streaming at all?
To make it clear, I have not uploaded many custom files to /FileStore
It looks like you have only checked for files on the local filesystem and not DBFS itself. You can take a look at DBFS by running the following cell in a Databricks notebook:
%fs
ls /
or:
%python
dbutils.fs.ls("/")
You could check for files there and remove them with dbutils.fs.rm or %fs rm. Also take a look at the /tmp folder on DBFS and delete any files there.
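To narrow down which DBFS directory actually holds the files, here is a rough, untested sketch that walks DBFS with dbutils.fs.ls and counts files per top-level directory (directory entries returned by dbutils.fs.ls have names ending in "/"):
%python
# rough sketch: count files under each top-level DBFS directory
def count_files(path):
    total = 0
    try:
        entries = dbutils.fs.ls(path)
    except Exception:                      # some system paths may not be listable
        return 0
    for entry in entries:
        if entry.name.endswith("/"):       # sub-directory: recurse
            total += count_files(entry.path)
        else:
            total += 1
    return total

for top in dbutils.fs.ls("/"):
    # skip mounted sample data such as /databricks-datasets, which is not yours
    if top.name.endswith("/") and not top.path.startswith("dbfs:/databricks-datasets"):
        print(top.path, count_files(top.path))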

How to list all files in Google Storage bucket in a short time?

I have a Google Storage bucket that contains more than 20k files. Is there any way to list all the filenames in the bucket in a short time?
It depends on what you mean by "short", but:
One thing you can do to speed up listing a bucket is to shard the listing operation. For example, if your bucket has objects that begin with English alphabetic characters you could list each letter in parallel and combine the results. You could do this with gsutil in bash like this:
gsutil ls gs://your-bucket/a* > a.out &
gsutil ls gs://your-bucket/b* > b.out &
...
gsutil ls gs://your-bucket/z* > z.out &
wait
cat ?.out > listing.out
If your bucket has objects with different naming you'd have to adjust how you do the sharding.
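The same sharding can be written as a loop instead of 26 hand-typed lines; a sketch assuming bash and object names that start with a-z (prefixes that match nothing will just leave an empty .out file and an error message from gsutil):
for prefix in {a..z}; do
  gsutil ls "gs://your-bucket/${prefix}*" > "${prefix}.out" &
done
wait
cat ?.out > listing.out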

Batch file: How to make * ignore ".svn" directories

I need to zip a large and deep directory tree with thousands of files on various levels of the tree.
The problem is that the whole tree is under SVN version control. SVN keeps its hidden metadata in a ".svn" directory in every dir, which inflates the size of the resulting ZIP by more than 100% (unacceptable, since the resulting archive is intended for online distribution).
Currently I'm using this:
7z u archive.zip baseDir\*.png
7z u archive.zip baseDir\*\*.png
7z u archive.zip baseDir\*\*\*.png
7z u archive.zip baseDir\*\*\*\*.png
...where the number of * levels is the maximum theoretical depth of the tree, and all of this is repeated for every extension that can possibly appear in the tree. This works - it builds the archive exactly as it should - but it takes far too long (a few minutes), since the whole tree has to be traversed many times.
And I want to make it faster, since I need to repeat this for every debug session.
Is there a more efficient way to select the "real" files in the directory tree?
Thanks for any help!
Try -xr!.svn
Stupid site won't recognise my answer because it was too simple...
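Untested, but putting that switch into the command from the question, 7-Zip's normal directory recursion should let you add the whole tree in one pass and drop every .svn directory at any depth:
rem add the whole tree in one pass, skipping every .svn directory
7z u archive.zip baseDir -xr!.svn
7-Zip also has recursive include switches (e.g. -ir!*.png) if you still need to restrict by extension, though I haven't tried combining them here.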

How do I know how much disk space each Solaris zone is occupying?

If I use the df command, I can only see how much disk space is used on the Solaris server as a whole. But I want to know how much disk space a particular Solaris zone is occupying.
The problem I found with these solutions is that they do not take into account directory inheritance. Yes, you will find out how much "space" is under a certain directory. But if you want to actually find out how much extra space a zone is taking, you have to go a different route.
Do a
zonecfg -z zonename info
where zonename is the name of the zone. And look at each inherit-pkg-dir line.
inherit-pkg-dir:
    dir: /lib
inherit-pkg-dir:
    dir: /sbin
Any line that has inheritance is hard-linked to the zone. So you will be double
counting against the global zone if you simply do a
du -sh /zonepath/zonename
Basically you have to count only the directories (excluding /proc, and maybe /tmp) that aren't listed in any inherit-pkg-dir lines.
cd /zonepath/zonename/root
du -sh bin dev etc home kernel opt system var ...
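Untested, but here is that idea as a small script, assuming ksh and the /zonepath/zonename layout used above (zonename is a placeholder for your zone's name):
#!/bin/ksh
# sum only the directories that are NOT inherited from the global zone
zone=zonename                                  # placeholder: your zone name
zoneroot=/zonepath/$zone/root                  # adjust to your zonepath
# collect the inherit-pkg-dir entries, e.g. /lib /platform /sbin /usr
inherited=" $(zonecfg -z "$zone" info inherit-pkg-dir | awk '$1 == "dir:" {print $2}' | tr '\n' ' ')"
for d in "$zoneroot"/*; do
    [ -d "$d" ] || continue
    name="/${d##*/}"
    case "$name" in /proc|/tmp) continue ;; esac   # skip pseudo/temp filesystems
    case "$inherited" in
        *" $name "*) ;;                            # inherited: already counted in the global zone
        *) du -sk "$d" ;;
    esac
done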
try the du command
Yes - definitely su to root to do this.
I was able to run: /usr/xpg4/bin/du -sk -h -x /zonepath/zonename
to get the space that was used in a UFS root partition of a zone.
For example, /usr/xpg4/bin/du -sk -h -x /zonepath/zonename
returned the following: 3.5G /zonepath/zonename
The -x option, when evaluating file sizes, evaluates only those files that have the same device as the file specified by the file operand.
The -x option only seems to work when calling du via this path: /usr/xpg4/bin/du
This also worked to display the used space of the ZFS-attached drives in the zone! We mounted one ZFS LUN at the path /zonepath/zonename/data, and running this matched the output of "zfs list" for that dataset:
# /usr/xpg4/bin/du -sk -h -x /zonepath/zonename/data
11G /zonepath/zonename/data
If you run # /usr/xpg4/bin/du -sk -h -x /zonepath/zonename
then you should get an overall total of the used space in the zone, such as:
53G /zonepath/zonename
It will not include NFS attached drives, nor will it include directories that root is not the owner of.
I hope this helps!
Since I tried both John's solution and Pierre-Luc's solution, here is what works for me:
List all the zones (from the global zone):
tcsh>zoneadm list -civ
  ID NAME     STATUS   PATH                    BRAND   IP
   0 global   running  /                       native  shared
   1 myZone1  running  /export/zones/myZone1   native  shared
   2 myZone2  running  /export/zones/myZone2   native  shared
Run du -sk as root (since local zones are not readable from the global zone, I had to du -sk them as root):
tcsh>sudo du -sk /export/zones/myZone1
9930978 /export/zones/myZone1
According to "Solaris Operating System: Managing ZFS in Solaris 10 Containers", the following command will give you the information you require.
zfs list
If you install the zone on a zfs volume then you can use the zfs tools ("zfs list") to quickly see how much space has been used.
Otherwise you'll have to use "du" as you already discovered (which will be much slower).
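For example, assuming the zone roots live on datasets under rpool/zones (substitute your own pool/dataset name), something like this shows per-zone usage directly:
# per-zone space usage, assuming each zone root is its own ZFS dataset
zfs list -r rpool/zones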