How to reduce par folder size in /tmp?

I have a Perl project packaged with PAR::Packer and distributed as an RPM on RedHat.
I found that a par-<six hex chars> directory is created for each user who runs my command. In /tmp:
lirian   19M  par-6c3973
andy     19M  par-6d6a7a
raymond  19M  par-726679
john     19M  par-736a69
Can I reduce the size of the par directory, or use one shared par folder for all users?
More: I've read the instructions for pp and PAR::FAQ#RPM.
I use pp -d to package the par file.

I think I found a not-so-perfect but good-enough answer.
Now I use "pp -d -T lirian-tool" to package my Perl script.
According to pp-option-page:
-T, --tempcache
Set the program unique part of the cache directory name that is used if the program is run without -C. If not set, a hash of the executable is used.
When the program is run, its contents are extracted to a temporary directory. On Unix systems, this is commonly /tmp/par-USER/cache-XXXXXXX. USER is replaced by the name of the user running the program, but "spelled" in hex. XXXXXXX is either a hash of the executable or the value passed to the -T or --tempcache switch.
So now we have /tmp/par-USER/cache-lirian-tool. Each user's cache directory may take 20M, which is acceptable.
And you can have the packed program clean up its extracted files instead of caching them by packaging with "pp -C":
-C, --clean
Clean up temporary files extracted from the application at runtime. By default, these files are cached in the temporary directory; this allows the program to start up faster next time.
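Putting it together, the packaging command looks something like this (the output name, script name, and -T value are placeholders based on the question):
# -T fixes the per-user cache directory suffix; use -C instead if you want the extracted files removed when the program exits
pp -d -T lirian-tool -o lirian-tool lirian-tool.pl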

Related

LSF moving files into created output dir

When executing a job on LSF you can specify the working directory and create an output directory, e.g.
bsub -cwd /home/workDir -outdir /home/%J program inputfile
where it will look for inputfile in the specified working directory. The -outdir option will create a new directory based on the job ID.
What I'm wondering is how you get the results created in the working directory during the run into the newly created output dir.
You can't add a command like
mv * /home/%J
as the underlying OS has no understanding of the %J identifier. Is there an option in LSF for piping the data inside the job, where it knows the jobId?
You can use the environment variable $LSB_JOBID.
mv * /data/${LSB_JOBID}/
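For example, inside the job script itself (program and output file names here are just placeholders), assuming the job was submitted with -outdir "/data/%J":
#!/bin/bash
# run the program, then move its output into the directory LSF created via -outdir
./program inputfile
mv result_* /data/${LSB_JOBID}/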
If you copy the data inside your job script then it will hold the compute resource during the data copy. If you're copying a small amount of data then it's not a problem. But if it's a large amount of data you can use bsub -f so that other jobs can start while the data copy is ongoing.
bsub -outdir "/data/%J" -f "/data/%J/final < bigfile" sh script.sh
bigfile is the file that your job creates on the compute host. It will be copied to /data/%J/final after the job finishes. It even works on a non-shared filesystem.

Can we wget with file list and renaming destination files?

I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will avoid the generation of the host's subdirectories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my files-to-download.txt has these kind of files:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of url's the way I do now, keeping my parameters.
b.- get all files in the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest 2 approaches -
Use the "-nc" or the "--no-clobber" option. From the man page -
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.
When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.
A combination with -O/--output-document is only accepted if the given output file does not exist.
Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However the logic is sound, and with a bit of modification you can get it to work on other platforms/shells as well.
Sample pseudocode bash script -
for i in $(cat list-of-files-to-download.txt); do
    wget <all your flags except the -i flag> "$i" -O /path/to/custom/directory/filename
done
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
This offers much more control over your downloads.
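A rough sketch of that idea: download each file under a name derived from its URL path, so same-named files from different directories don't collide. The destination directory, user agent, referer, and timeout are placeholders from the question's command, and the existence check approximates -nc rather than -N's timestamp logic:
outdir=/path/to/custom/directory
while read -r url; do
    # e.g. http://website/directory1/picture-same-name.jpg -> directory1-picture-same-name.jpg
    name=$(echo "${url#*://*/}" | tr '/' '-')
    if [ ! -f "$outdir/$name" ]; then
        tmp=$(mktemp)
        wget --user-agent='some-agent' --referer=http://some-referrer.html --timeout=120 -O "$tmp" "$url" \
            && mv "$tmp" "$outdir/$name" || rm -f "$tmp"
    fi
done < list-of-files-to-download.txt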

How to make a file executable using Makefile

I want to copy a particular file using Makefile and then make this file executable. How can this be done?
The file I want to copy is a .pl file.
For copying I am using the general cp -rp command, which works fine. But now I want to make this file executable from the Makefile.
It's bad practice to use cp and chmod; use the install command instead.
all:
	install -m 0777 hello ../hello
You can use the -m option with install to set the permission mode, and note that with install you can set not only the permissions but also the owner of the file.
You can still use chmod instead, but it would be bad practice:
all:
	cp hello ../hello
	chmod +x ../hello
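Applied to the .pl file from the question, the rule boils down to something like this (the script name and destination directory are placeholders):
# copy the (hypothetical) myscript.pl to its destination and make it executable in one step
install:
	install -m 0755 myscript.pl /destination/dir/myscript.pl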
Update: install vs cp
cp simply copies files with their current permissions; install not only copies but can also set permissions and ownership via its flags (which is what you need here).
One significant difference is that cp truncates the destination file and starts copying data from the source into the destination file. install, on the other hand, removes the destination file first.
This is significant because if the destination file is already in use, bad things could happen to whoever is using that file when you cp a new file on top of it, e.g. overwriting an executable that is running might fail. Truncating a data file that an existing process is busy reading/writing could cause pretty weird behavior. If you just remove the destination file first, as install does, things continue much like normal - the removed file isn't actually removed until all processes close that file. [source]
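A quick way to see this difference is to compare inode numbers before and after, using the same hello/../hello names as the examples above:
ls -i ../hello            # note the inode number
cp hello ../hello         # same inode afterwards: the existing file was truncated and rewritten
install hello ../hello    # different inode: the old file was unlinked and a new one created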
For more details check these,
install vs. cp; and mmap
How is install -c different from cp

Limit to number of files to cp in parallel

I'm running the gsutil cp command in parallel (with the -m option) on a directory with 25 4 GB JSON files (which I am also compressing with the -z option).
gsutil -m cp -z json -R dir_with_4g_chunks gs://my_bucket/
When I run it, it prints to the terminal that it is copying all but one of the files, i.e. it prints one of these lines per file:
Copying file://dir_with_4g_chunks/a_4g_chunk [Content-Type=application/octet-stream]...
Once the transfer for one of them is complete, it says that it'll be copying the last file.
The result is that one file only starts to copy when one of the others finishes, significantly slowing down the process.
Is there a limit to the number of files I can upload with the -m option? Is this configurable in the boto config file?
I was not able to find the .boto file on my Mac (as per jterrace's answer above), so instead I specified these values using the -o switch:
gsutil -m -o "Boto:parallel_thread_count=4" cp directory1/* gs://my-bucket/
This seemed to control the rate of transfer.
From the description of the -m option:
gsutil performs the specified operation using a combination of multi-threading and multi-processing, using a number of threads and processors determined by the parallel_thread_count and parallel_process_count values set in the boto configuration file. You might want to experiment with these values, as the best value can vary based on a number of factors, including network speed, number of CPUs, and available memory.
If you take a look at your .boto file, you should see this generated comment:
# 'parallel_process_count' and 'parallel_thread_count' specify the number
# of OS processes and Python threads, respectively, to use when executing
# operations in parallel. The default settings should work well as configured,
# however, to enhance performance for transfers involving large numbers of
# files, you may experiment with hand tuning these values to optimize
# performance for your particular system configuration.
# MacOS and Windows users should see
# https://github.com/GoogleCloudPlatform/gsutil/issues/77 before attempting
# to experiment with these values.
#parallel_process_count = 12
#parallel_thread_count = 10
I'm guessing that you're on Windows or Mac, because the default values for non-Linux machines are 24 threads and 1 process. This would result in copying 24 of your files first, then the last file afterward. Try experimenting with increasing these values to transfer all 25 files at once.
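For example, you could uncomment those two lines in your .boto file and raise the thread count so that all 25 files start at once (the exact numbers are just a starting point to experiment with):
# in your .boto file, below the generated comment shown above
parallel_process_count = 1
parallel_thread_count = 25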

How to know each solaris zone is occupying how much disk space?

If I use the df command, I can only see how much disk space is being used on the Solaris server overall. But I want to know how much disk space a particular Solaris zone is occupying.
The problem I found with these solutions is that they do not take into account directory inheritance. Yes, you will find out how much "space" is under a certain directory. But if you want to actually find out how much extra space a zone is taking, you have to go a different route.
Do a
zonecfg -z zonename info
where zonename is the name of the zone. And look at each inherit-pkg-dir line.
inherit-pkg-dir:
    dir: /lib
inherit-pkg-dir:
    dir: /sbin
Any line that has inheritance is hard-linked to the zone, so you will be double counting against the global zone if you simply do a
du -sh /zonepath/zonename
Basically you have to count only the directories (excluding /proc, and maybe /tmp) that aren't listed in any inherit-pkg-dir lines.
cd /zonepath/zonename/root
du -sh bin dev etc home kernel opt system var ...
Try the du command.
Yes - definitely su to root to do this.
I was able to run: /usr/xpg4/bin/du -sk -h -x /zonepath/zonename
to get the space that was used in a UFS root partition of a zone.
For example, /usr/xpg4/bin/du -sk -h -x /zonepath/zonename
returned the following: 3.5G /zonepath/zonename
The -x option, when evaluating file sizes, evaluates only those files that have the same device as the file specified by the file operand.
The -x option only seems to work when calling du with this path: /usr/xpg4/bin/du
This also worked to display the used space of the ZFS-attached drives in the zone! We mounted one ZFS LUN at the path /zonepath/zonename/data, and running this matched the output of "zfs list" for that dataset:
# /usr/xpg4/bin/du -sk -h -x /zonepath/zonename/data
11G /zonepath/zonename/data
If you run # /usr/xpg4/bin/du -sk -h -x /zonepath/zonename
then you should get an overall total of the used space in the zone such as:
53G /zonepath/zonename
It will not include NFS attached drives, nor will it include directories that root is not the owner of.
I hope this helps!
Since I tried both John's solution and Pierre-Luc's solution, here is what works for me:
List all the zones (from the global zone):
tcsh> zoneadm list -civ
  ID NAME     STATUS   PATH                    BRAND   IP
   0 global   running  /                       native  shared
   1 myZone1  running  /export/zones/myZone1   native  shared
   2 myZone2  running  /export/zones/myZone2   native  shared
du -sk each zone as root (since local zones are not readable from the global zone, I had to du -sk them as root):
tcsh> du -sk /export/zones/myZone1
9930978 /export/zones/myZone1
According to "Solaris Operating System Managing ZFS in Solaris 10 Containers", the following command would give you the information you require.
zfs list
If you install the zone on a zfs volume then you can use the zfs tools ("zfs list") to quickly see how much space has been used.
Otherwise you'll have to use "du" as you already discovered (which will be much slower).
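A minimal sketch, assuming each zone was installed on its own dataset under a parent dataset named rpool/zones (a placeholder name):
# show space used by each zone's dataset, recursively under the parent
zfs list -r rpool/zones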