cp with reflink flag: how does the system determine if reflink is possible?

When I copy a file using reflink option, e.g.,
cp --reflink foo bar
how, and at what stage in the execution of cp, is it determined whether the underlying file system supports COW? I tried looking into coreutils/src/cp.c but couldn't find the specific system call/ioctl or other mechanism that checks for COW capability and accordingly proceeds with the copy or reports an error:
cp: failed to clone 'bar' from 'foo': Bad address
In short, I am looking for how the --reflink=auto option is resolved.

BTRFS_IOC_CLONE and FICLONE are the ioctl request codes tried by cp. The former is the Btrfs-specific request, while the latter was introduced when XFS gained reflink support [1]. You can strace the cp command to see what is happening in the version that you have.
[1] http://lwn.net/Articles/702633/
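For example, a minimal way to watch this happen yourself (the exact ioctl name printed depends on your kernel headers and cp version):
# Trace just the ioctl calls cp makes while forcing a reflink copy.
strace -e trace=ioctl cp --reflink=always foo bar
# On a clone-capable filesystem you would expect something like
#   ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = 0
# whereas on a filesystem without reflink support the ioctl fails
# (e.g. EOPNOTSUPP), which is the point at which --reflink=auto
# silently falls back to a regular copy.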

Related

How to correctly use `gsutil -q stat` in scripts?

I am creating a KSH script to check whether a subdirectory exists in a GCS bucket. I am writing the script like this:
#!/bin/ksh
set -e
set -o pipefail
gsutil -q stat ${DESTINATION_PATH}/
PATH_EXIST=$?
if [ ${PATH_EXIST} -eq 0 ] ; then
# do something
fi
The weird thing is that when ${DESTINATION_PATH}/ does not exist, the script exits without evaluating PATH_EXIST=$?. If ${DESTINATION_PATH}/ exists, the script runs normally as expected.
Why does this happen? How can I do better?
The statement set -e means that your script will exit as soon as a command exits with a non-zero status.
The gsutil stat command can be used to check whether an object exists:
gsutil -q stat gs://some-bucket/some-object
It has an exit status of 0 for an existing object and 1 for a non-existent object.
However, it is advised against using it with subdirectories:
Note: Unlike the gsutil ls command, the stat command does not support
operations on sub-directories. For example, if you run the command:
gsutil -q stat gs://some-bucket/some-subdir/
gsutil will look for
information about an object called some-subdir/ (with a trailing
slash) inside the bucket some-bucket, as opposed to operating on
objects nested under gs://some-bucket/some-subdir/. Unless you
actually have an object with that name, the operation will fail.
The reason your command does not fail when ${DESTINATION_PATH}/ exists is that if you create the folder using the Cloud Console, i.e. the UI, a placeholder object is created with that name. But to be clear, folders don't exist in Google Cloud Storage; they are just a visualization of the bucket's object hierarchy.
So if you upload an object named newFolder/object to your bucket and newFolder does not exist, it will be "created", but gsutil -q stat ${DESTINATION_PATH}/ will still return exit code 1. However, if you create the folder using the UI and run the same command, it will return exit code 0. So follow the documentation, and avoid using stat to check whether a directory exists.
Instead, if you want to check whether a subdirectory exists, check whether it contains any object:
gsutil -q stat ${DESTINATION_PATH}/*
which will return 0 if there is any object in the subdirectory and 1 otherwise.
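Putting this together with set -e, a minimal sketch of a safe check (the bucket path is a placeholder):
#!/bin/ksh
set -e
set -o pipefail
# Hypothetical bucket path; adjust for your environment.
DESTINATION_PATH="gs://some-bucket/some-subdir"
# Using the command itself as the if condition keeps set -e from aborting
# the script when stat returns a non-zero status.
if gsutil -q stat "${DESTINATION_PATH}"/*; then
  echo "subdirectory contains at least one object"
else
  echo "subdirectory is empty or does not exist"
fi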

How to make a file executable using Makefile

I want to copy a particular file using Makefile and then make this file executable. How can this be done?
The file I want to copy is a .pl file.
For copying, I am using the usual cp -rp command, which works fine. But now I want to make this file executable from the Makefile.
It's bad practice to use cp and chmod; instead, use the install command.
all:
	install -m 0777 hello ../hello
You can use the -m option with install to set the permission mode, and install can also set the owner of the file (see its -o option), not just the permissions.
You can still use cp and chmod instead, but it is considered bad practice:
all:
	cp hello ../hello
	chmod +x ../hello
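Applied to the original question, a minimal Makefile sketch (the script name myscript.pl and the ../bin destination are placeholders):
install-script:
	# install -d creates the destination directory if needed; install -m
	# copies the script and marks it executable in one step.
	install -d ../bin
	install -m 0755 myscript.pl ../bin/myscript.pl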
Update: install vs cp
cp simply copies files with their current permissions; install not only copies, but can also change permissions/ownership via its flags. (This is what your requirement was.)
One significant difference is that cp truncates the destination file and starts copying data from the source into the destination file. install, on the other hand, removes the destination file first.
This is significant because if the destination file is already in use, bad things could happen to whoever is using that file if you cp a new file on top of it; e.g. overwriting an executable that is running might fail, and truncating a data file that an existing process is busy reading/writing could cause pretty weird behavior. If you just remove the destination file first, as install does, things continue much as normal: the removed file isn't actually gone until all processes close it. [source]
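One way to see this difference for yourself (an illustrative session using the file names from the examples above):
ls -i ../hello            # note the inode number
cp hello ../hello         # cp truncates in place, so the inode number stays the same
ls -i ../hello
install hello ../hello    # install unlinks the old file first, so a new inode appears
ls -i ../hello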
For more details, check these:
install vs. cp; and mmap
How is install -c different from cp

gsutil returning 0 code even if it fails

We are trying to script a fail-safe copy using gsutil.
The problem is that gsutil cp returns 0 even when it fails. Is this expected? Do I have to parse the log?
/usr/local/bin/gsutil -m cp -L gsutilM.log gs://my-bucket/mydir/myfile1.gz /home/myuser
From log file:
Result,Description
error, CommandException: crc32c signature computed for local file (FGa0jw==) doesn't match cloud-supplied digest (N1S6Ew==).
Local file (/home/myuser/myFile1.gz) will be deleted.
Thanks
I tried modifying the code that performs the crc32c check to force that condition to occur. I then downloaded a file, saw output like yours, and verified that $status was set to 1.
What OS and shell are you using?
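For reference, a minimal sketch of checking the exit code yourself (Bourne-style shells use $?; csh/tcsh use $status; the grep over the -L manifest is only a rough illustration of parsing the log):
/usr/local/bin/gsutil -m cp -L gsutilM.log gs://my-bucket/mydir/myfile1.gz /home/myuser
rc=$?
echo "gsutil exit code: $rc"
# Rough fallback: scan the -L manifest for rows whose Result column says "error".
if grep -q 'error' gsutilM.log; then
  echo "the manifest reports at least one failed transfer"
fi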

Limit to number of files to cp in parallel

I'm running the gsutil cp command in parallel (with the -m option) on a directory with 25 4 GB JSON files (which I am also compressing with the -z option).
gsutil -m cp -z json -R dir_with_4g_chunks gs://my_bucket/
When I run it, it prints to the terminal that it is copying all but one of the files. By this I mean that it prints one of these lines per file:
Copying file://dir_with_4g_chunks/a_4g_chunk [Content-Type=application/octet-stream]...
Once the transfer of one of them completes, it says that it will copy the last file.
The result is that one file only starts to copy when one of the others has finished, which significantly slows down the process.
Is there a limit to the number of files I can upload with the -m option? Is this configurable in the boto config file?
I was not able to find the .boto file on my Mac (as per jterrace's answer), so instead I specified these values using the -o switch:
gsutil -m -o "Boto:parallel_thread_count=4" cp directory1/* gs://my-bucket/
This seemed to control the rate of transfer.
From the description of the -m option:
gsutil performs the specified operation using a combination of
multi-threading and multi-processing, using a number of threads and
processors determined by the parallel_thread_count and
parallel_process_count values set in the boto configuration file. You
might want to experiment with these values, as the best value can vary
based on a number of factors, including network speed, number of CPUs,
and available memory.
If you take a look at your .boto file, you should see this generated comment:
# 'parallel_process_count' and 'parallel_thread_count' specify the number
# of OS processes and Python threads, respectively, to use when executing
# operations in parallel. The default settings should work well as configured,
# however, to enhance performance for transfers involving large numbers of
# files, you may experiment with hand tuning these values to optimize
# performance for your particular system configuration.
# MacOS and Windows users should see
# https://github.com/GoogleCloudPlatform/gsutil/issues/77 before attempting
# to experiment with these values.
#parallel_process_count = 12
#parallel_thread_count = 10
I'm guessing that you're on Windows or Mac, because the default values for non-Linux machines are 24 threads and 1 process. This would result in copying 24 of your files first, then the last file afterward. Try experimenting with increasing these values to transfer all 25 files at once.
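For example, a sketch of overriding both values on the command line for this upload (the numbers are illustrative, and the section name has varied: the answer above uses Boto:, while current gsutil documentation lists these options under GSUtil:):
gsutil -m \
  -o "GSUtil:parallel_process_count=1" \
  -o "GSUtil:parallel_thread_count=25" \
  cp -z json -R dir_with_4g_chunks gs://my_bucket/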

How to read a block in a storage pool (zpool) using dd?

I want to read a block in a zpool storage pool using the dd command. Since a zpool doesn't create a device file the way other volume managers such as VxVM do, I don't know which block device to use for reading. Is there any way to read block-by-block data in a zpool?
You can probably use the zdb command. Here is a pdf about it, and the help output.
http://www.bruningsystems.com/osdevcon_draft3.pdf
# zdb --help
zdb: illegal option -- -
Usage: zdb [-CumdibcsDvhL] poolname [object...]
zdb [-div] dataset [object...]
zdb -m [-L] poolname [vdev [metaslab...]]
zdb -R poolname vdev:offset:size[:flags]
zdb -S poolname
zdb -l [-u] device
zdb -C
Dataset name must include at least one separator character '/' or '#'
If dataset name is specified, only that dataset is dumped
If object numbers are specified, only those objects are dumped
Options to control amount of output:
-u uberblock
-d dataset(s)
-i intent logs
-C config (or cachefile if alone)
-h pool history
-b block statistics
-m metaslabs
-c checksum all metadata (twice for all data) blocks
-s report stats on zdb's I/O
-D dedup statistics
-S simulate dedup to measure effect
-v verbose (applies to all others)
-l dump label contents
-L disable leak tracking (do not load spacemaps)
-R read and display block from a device
Below options are intended for use with other options (except -l):
-A ignore assertions (-A), enable panic recovery (-AA) or both (-AAA)
-F attempt automatic rewind within safe range of transaction groups
-U <cachefile_path> -- use alternate cachefile
-X attempt extreme rewind (does not work with dataset)
-e pool is exported/destroyed/has altroot/not in a cachefile
-p <path> -- use one or more with -e to specify path to vdev dir
-P print numbers parsable
-t <txg> -- highest txg to use when searching for uberblocks
Specify an option more than once (e.g. -bb) to make only that option verbose
Default is to dump everything non-verbosely
Unfortunately, I don't know how to use it.
# zdb
tank:
version: 28
name: 'tank'
...
vdev_tree:
...
children[0]:
...
children[0]:
...
path: '/dev/label/bank1d1'
phys_path: '/dev/label/bank1d1'
...
So I took the array indexes 0 0 to get my first disk (bank1d1) and did this command. It did something. I don't know how to read the output.
zdb -R tank 0:0:4e00:200 | strings
Have fun... try not to destroy anything. Here is your warning from the man page:
The zdb command is used by support engineers to diagnose failures and
gather statistics. Since the ZFS file system is always consistent on
disk and is self-repairing, zdb should only be run under the direction
of a support engineer.
And please tell us what you actually were looking for. Was Alan right that you wanted to do backups?
You can read from the underlying raw devices in the pool, but as far as I can tell there's no concept of a single contiguous block device representing the whole pool.
A ZFS pool is not the single contiguous run of sectors that 'classic' volume managers present. ZFS's internal structure is closer to a tree, which would be somewhat challenging to represent as a flat array of blocks.
Ben Rockwood's blog post "zdb: Examining ZFS At Point-Blank Range" may help you get a better idea of what's under the hood.
I have no idea what doing so would be useful for, but you certainly can read blocks from the underlying devices used by the pool; they are shown by the zpool status command. If you are really asking about zvols rather than zpools, they are accessible under /dev/zvol/rdsk/pool-name/zvol-name. If you want to look at internal zpool data, you probably want to use zdb.
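For instance, a rough sketch of reading raw blocks from one of the pool's devices with dd (the device path is illustrative, taken from the zdb output above; read-only inspection is harmless, but never write to these devices):
# List the devices backing the pool, then pull one 128 KiB block from one of them.
zpool status tank
dd if=/dev/label/bank1d1 bs=128k skip=100 count=1 2>/dev/null | strings | head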
If you want to back up ZFS filesystems, you should be using the following tools:
'zfs snapshot' to create a stable snapshot of the filesystem
'zfs send' to send a copy of the snapshot to somewhere else
'zfs receive' to go back from a snapshot to a filesystem.
'dd' is almost certainly not the tool you should be using. In your case you could 'zfs send' and redirect the output into a file on your other filesystem.
See chapter 7 of the ZFS administration guide for more details.
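A minimal sketch of that snapshot/send approach (pool, dataset, and snapshot names are hypothetical):
# Create a stable snapshot, then serialize it into a file on another filesystem.
zfs snapshot tank/data@backup1
zfs send tank/data@backup1 > /otherfs/tank-data-backup1.zfs
# Later, restore it into a (new) dataset:
# zfs receive tank/data_restored < /otherfs/tank-data-backup1.zfs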