psutil and du give different output

I'm checking a directory's size, but du and psutil give very different outputs:
>>> import psutil
>>> print(psutil.disk_usage("/home/user1"))
diskusage(total=52586614784, used=3006468096, free=49580146688, percent=5.7)
so the used size is 3006468096 bytes (roughly 2.8 GB).
With du,
du -sb /home/user1
498960095 /home/user1
which is roughly 0.5 GB.
To me the du result seems correct (there isn't much in the directory), but I wonder why psutil gives such a result.

I'm pretty sure psutil is giving you the usage of the whole mounted filesystem that contains the path you're passing (e.g. /, possibly). You can confirm by checking the df -h output.
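To make the distinction concrete, here is a minimal sketch (assuming /home/user1 sits on the root filesystem; note that du -sb reports apparent sizes, which is also what os.path.getsize returns): psutil.disk_usage() answers "how full is the filesystem containing this path", while a du-style total has to walk the tree itself.
import os
import psutil

# Filesystem-level stats: any path on the same mount gives the same numbers.
print(psutil.disk_usage("/home/user1"))
print(psutil.disk_usage("/"))  # same output if /home is not a separate mount

# A du-style total walks the tree and sums file sizes.
total = 0
for root, dirs, files in os.walk("/home/user1"):
    for name in files:
        try:
            total += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass  # unreadable files, broken symlinks, etc.
print("directory total: {} bytes".format(total))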

gsutil multiprocessing and multithreading do not sustain CPU usage & copy rate on a GCP instance

I am running a script to copy millions (2.4 million, to be exact) of images from several GCS buckets into one central bucket, with all buckets in the same region. I was originally working from one CSV file but broke it into 64 smaller ones so each process can iterate through its own file and not wait for the others. When the script launches on a 64-vCPU, 240 GB memory instance on GCP, it runs fine for about an hour and a half: in 75 minutes 155 thousand files copied over, with CPU usage registering a sustained 99%. After this, the CPU usage drastically declines to 2% and the transfer rate falls significantly. I am really unsure why this is. I keep track of files that fail by creating blank files in an errors directory, so there is no write lock when writing to a central error file. Code is below. It is not a spacing or syntax error; some spacing got messed up when I copied it into the post. Any help is greatly appreciated.
Thanks,
Zach
import os
import subprocess
import csv
from multiprocessing.dummy import Pool as ThreadPool
from multiprocessing import Pool as ProcessPool
import multiprocessing

gcs_destination = 'gs://dest-bucket/'
source_1 = 'gs://source-1/'
source_2 = 'gs://source-2/'
source_3 = 'gs://source-3/'
source_4 = 'gs://source-4/'

def copy(img):
    try:
        imgID = img[0]        # extract name
        imgLocation = img[9]  # extract its location on gcs
        print imgID + " " + imgLocation
        source = ""
        if imgLocation == '1':
            source = source_1
        elif imgLocation == '2':
            source = source_2
        elif imgLocation == '3':
            source = source_3
        elif imgLocation == '4':
            source = source_4
        print str(os.getpid())
        command = "gsutil -o GSUtil:state_dir=.{} cp {}{}.tar.gz {}".format(os.getpid(), source, imgID, gcs_destination)
        prog = subprocess.call(command, shell=True)
        if prog != 0:
            command = "touch errors/{}_{}".format(imgID, imgLocation)
            os.system(command)
    except:
        print "Doing nothing with the error"

def split_into_threads(csv_file):
    with open(csv_file) as f:
        csv_f = csv.reader(f)
        pool = ThreadPool(15)
        pool.map(copy, csv_f)

if __name__ == "__main__":
    file_names = [None] * 64
    # Read in CSV file of all records
    for i in range(0, 64):
        file_names[i] = 'split_origin/origin_{}.csv'.format(i)
    process_pool = ProcessPool(multiprocessing.cpu_count())
    process_pool.map(split_into_threads, file_names)
For gsutil, I agree strongly with the multithreading suggestion of adding -m. Further, composite uploads (-o) may be unnecessary and undesirable, as the images are not gigabytes each in size and need not be split into shards; they're likely in the X-XX MB range.
Within your Python function, you are calling gsutil commands, which are in turn calling further Python functions. It should be cleaner and more performant to leverage the Google-made client library for Python, available below; gsutil is built for interactive CLI use rather than for being called programmatically.
https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-python
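A minimal sketch of that approach (bucket names are just the ones from the question, and the object name is a placeholder; copy_blob issues a server-side copy, so the image bytes never transit your VM):
from google.cloud import storage

client = storage.Client()
source = client.bucket("source-1")
dest = client.bucket("dest-bucket")

# Server-side, bucket-to-bucket copy of a single object.
blob = source.blob("IMG_ID.tar.gz")  # placeholder object name
source.copy_blob(blob, dest)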
Also, for gsutil, see your ~/.boto file and look at the multiprocessing and multithreading values. Beefier machines can handle greater thread and process counts. For reference, I work from my MacBook Pro with 1 process and 24 threads. I use an Ethernet adapter, hardwire into my office connection, and get incredible performance off the internal SSD (>450 Mbps; that's megabits, not bytes). The transfer rates are impressive nonetheless.
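To the best of my knowledge the relevant keys live in the [GSUtil] section of ~/.boto (check the commented defaults in your own file); with my settings above it would look like:
[GSUtil]
parallel_process_count = 1
parallel_thread_count = 24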
I strongly recommend using the "-m" flag on gsutil to enable multithreaded copying.
Also, as an alternative, you can use the Storage Transfer Service [1] to move data between buckets.
[1] https://cloud.google.com/storage/transfer/

Reversing a hash to find something which works, but hashcat seems to have issues

I saw some unfamiliar code on a project I was working on.
I saw a function which said:
var salt = 1514691869198;
var result = hex_hmac_sha1(salt, hmac_sha1(password));
// result is: 462435F34EAD6BB7C70751D90984DADD90EED9A4
I was having some issues with hashcat, though. It seems to be getting killed early because of a driver problem or something.
It seems that option -m 160 would be the one I want to use, since 160 = HMAC-SHA1 (key = $salt) in its man page.
The sha1.js file I was looking at, which gave me the code above, showed the salt as the key, which makes me think the 160 mode is the most relevant.
Obviously this is a nested SHA-1, but finding something that reverses it would be ideal.
I am aware that reversing a hash would not return the actual password, but I figured I could run a wordlist and attempt to find a hash which matches this one.
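For what it's worth, what I'm effectively trying to brute-force looks roughly like this in Python (a sketch only: I'm assuming the inner hmac_sha1(password) boils down to a plain SHA-1 of the password, which I'd still have to confirm in sha1.js; mywordlist.txt is my wordlist):
import hashlib
import hmac

TARGET = "462435f34ead6bb7c70751d90984dadd90eed9a4"
SALT = b"1514691869198"  # the salt is used as the HMAC key

def inner_digest(password):
    # Stand-in for the site's hmac_sha1(password); plain SHA-1 is
    # assumed here purely for illustration.
    return hashlib.sha1(password).hexdigest()

def candidate_hash(password):
    # Outer hash: hex_hmac_sha1(salt, ...) with the salt as the key.
    return hmac.new(SALT, inner_digest(password).encode(), hashlib.sha1).hexdigest()

with open("mywordlist.txt", "rb") as wordlist:
    for line in wordlist:
        word = line.rstrip(b"\r\n")
        if candidate_hash(word) == TARGET:
            print("match: " + word.decode(errors="replace"))
            break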
That being said, I was thinking I could find a string which works. I am having issues, though, building the hashcat command (or finding this answer in general). I was not sure how I would put the hash into the command. I was thinking it would be along the lines of:
hashcat -m160 462435F34EAD6BB7C70751D90984DADD90EED9A4:1514691869198 mywordlist.txt
but it seems to fail for me with the following:
* Device #1: Not a native Intel OpenCL runtime. Expect massive speed loss.
You can use --force to override, but do not report related errors.
No devices found/left.
Started: Sat Dec 30 22:52:33 2017
Stopped: Sat Dec 30 22:52:33 2017
and if I used --force it would say:
hashcat (pull/1273/head) starting...
OpenCL Platform #1: The pocl project
====================================
* Device #1: pthread-Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz, 2656/2656 MB allocatable, 1MCU
Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates
Rules: 1
Applicable optimizers:
* Zero-Byte
* Not-Iterated
* Single-Hash
* Single-Salt
Watchdog: Hardware monitoring interface not found on your system.
Watchdog: Temperature abort trigger disabled.
Watchdog: Temperature retain trigger disabled.
* Device #1: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=64 -D CUDA_ARCH=0 -D VECT_SIZE=1 -D DEVICE_TYPE=2 -D DGST_R0=3 -D DGST_R1=4 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=5 -D KERN_TYPE=160 -D _unroll -cl-std=CL1.2'
* Device #1: Kernel m00160_a0.0bbec6e5.kernel not found in cache! Building may take a while...
Kernel library file /usr/share/pocl/kernel-i686-pc-linux-gnu.bc doesn't exist.
Try reading How to use hashcat on CPU only. The relevant part:
Download the latest OpenCL drivers and runtimes for CPU:
https://software.intel.com/en-us/articles/opencl-drivers#latest_CPU_runtime
Latest release: 16.1.1 at the time of writing.

What corruption is indicated by WinDbg and !chkimg?

I often get BSODs, and WinDbg reports similar corruption for most of them:
4: kd> !chkimg -lo 50 -d !nt
fffff80177723e6d-fffff80177723e6e 2 bytes - nt!MiPurgeZeroList+6d
[ 80 fa:00 e9 ]
2 errors : !nt (fffff80177723e6d-fffff80177723e6e)
and
CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
fffff8021531ae6d-fffff8021531ae6e 2 bytes - nt!MiPurgeZeroList+6d
[ 80 fa:00 aa ]
2 errors : !nt (fffff8021531ae6d-fffff8021531ae6e)
What does it mean? What is being compared with what, and how can the corruption be similar across crashes? Does it explicitly indicate a RAM problem?
UPDATE
What do these numbers mean: fffff80177723e6d and fffff8021531ae6d? What does it mean that their endings coincide?
What does the following mean: nt!MiPurgeZeroList+6d?
I already answered this on superuser.com. WinDbg downloads the original EXEs/DLLs from the symbol server, and the !chkimg command then detects corruption in the images of executable files by comparing them to the copy on the symbol store:
All sections of the file are compared, except for sections that are discardable, that are writeable, that are not executable, that have "PAGE" in their name, or that are from INITKDBG. You can change this behavior by using the -ss, -as, or -r switches.
!chkimg displays any mismatch between the image and the file as an image error, with the following exceptions:
Addresses that are occupied by the Import Address Table (IAT) are not checked.
Certain specific addresses in Hal.dll and Ntoskrnl.exe are not checked, because certain changes occur when these sections are loaded. To check these addresses, include the -nospec option.
If the byte value 0x90 is present in the file, and if the value 0xF0 is present in the corresponding byte of the image (or vice versa), this situation is considered a match. Typically, the symbol server holds one version of a binary that exists in both uniprocessor and multiprocessor versions. On an x86-based processor, the lock instruction is 0xF0, and this instruction corresponds to a nop (0x90) instruction in the uniprocessor version. If you want !chkimg to display this pair as a mismatch, set the -noplock option.
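Regarding the update: nt!MiPurgeZeroList+6d is a symbol plus a hexadecimal offset, i.e. the corrupted bytes sit 0x6d bytes past the start of the kernel's MiPurgeZeroList routine. The long numbers are kernel virtual addresses. They differ between crashes because ntoskrnl is rebased (ASLR) on each boot, but their endings coincide because the offset within the image, and therefore within the memory page, stays the same. A quick check of that claim:
a = 0xfffff80177723e6d  # address from the first dump
b = 0xfffff8021531ae6d  # address from the second dump
print(hex(a & 0xfff), hex(b & 0xfff))  # both print 0xe6d: same page offset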
If the RAM is fine, check the HDD and its cables for errors (run a disk diagnostic tool, and run chkdsk to detect and fix NTFS issues). You can also connect the HDD to a different SATA port on the mainboard.

Listing the volumes on Solaris OS

I am new to the Solaris OS and am trying to write a script which collects volume data from a Solaris box.
We wrote a similar script for Linux, where we used the "df -P" command to list the volumes and selected the entries that start with "/dev".
By default, on Linux, I could see a volume such as "/dev/sda1".
When I run the df command on the Solaris box (df -k), I cannot see any entry similar to /dev/* in the output.
When I mounted a CD, I could see an entry in the df output as below:
/dev/dsk/c1t1d0s2 57632 57632 0 100% /media/VBOXADDITIONS_5.0.14_105127
So, on Solaris, what pattern should I look for to pick out the volumes?
And why am I not seeing at least one volume under /dev/? Is it "/dev" or something else?
I am using a Solaris 11 image on Oracle VirtualBox.
When I try the "format" command, I can see 3 disks:
AVAILABLE DISK SELECTIONS:
0. c1d0 <VBOX HAR-8ea18e8b-2b2a0a5-0001-31.25GB> testvolu
/pci@0,0/pci-ide@1,1/ide@0/cmdk@0,0
1. c2d0 <VBOX HAR-b4343b55-dbed77c-0001 cyl 1020 alt 2 hd 64 sec 32>
/pci@0,0/pci-ide@1,1/ide@1/cmdk@0,0
2. c3t0d0 <ATA-VBOX HARDDISK-1.0 cyl 1009 alt 2 hd 64 sec 32>
/pci@0,0/pci8086,2829@d/disk@0,0
But I don't see any partitions in "df -k".
Also, I read here (https://docs.oracle.com/cd/E19455-01/805-6331/6j5vgg680/index.html) that disk names should be in "/dev/dsk/*" format.
Solaris 11 uses ZFS, which has no one-to-one relationship between volumes (partitions) and file systems.
You can look at the zpool status output to get the underlying devices.
$ zpool status
pool: rpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
Here, the whole c1t0d0 disk is used, hence no sN (slice) or pN (partition) suffix.
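If the goal is a collection script, parsing zpool status is the Solaris 11 analogue of grepping df -P for /dev. A rough sketch, assuming the default layout shown above (real pools can add mirror and spare lines, so treat this as a starting point rather than a complete parser):
import subprocess

# List the devices backing each ZFS pool by parsing `zpool status`.
output = subprocess.check_output(["zpool", "status"]).decode()

pool = None
for line in output.splitlines():
    fields = line.split()
    if not fields:
        continue
    if fields[0] == "pool:":
        pool = fields[1]
    elif (pool and len(fields) >= 5
          and fields[0] != pool and fields[0] != "NAME"):
        # Device rows look like "c1t0d0  ONLINE  0  0  0".
        print(pool, fields[0])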

Weird results on db_dump (Berkeley DB)

I have a Berkeley DB file that is about 400 MB in size.
$> ls -alh ses.db
-rw-rw-r-- 1 junyoung junyoung 391M Sep 23 17:32 ses.db
After dumping it, I checked the size again.
$> db_dump ses.db > ses.db.dump
$> ls -alh ses.db.dump
-rw-rw-r-- 1 junyoung junyoung 2.2M Sep 23 18:09 ses.db.dump
The resulting file is far smaller than I expected.
What's the reason for this? Any comments?
There could be many reasons for this, to be sure, but possibly the most common one is that the database once held many more records which were later deleted. That space is not returned to the filesystem.
See this thread in the Oracle forums for more information: https://community.oracle.com/thread/879030. And, as it says there, try the db_stat command to get some visibility into what's going on in your database.
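For example, a quick check along those lines (a sketch: db_stat -d prints per-database statistics, including a free-page count, and in db_dump's default output each key and each value occupies one space-prefixed line):
import subprocess

# Page-level statistics straight from the database file; a large
# free-page count would confirm that space from deleted records is
# still held by the file.
print(subprocess.check_output(["db_stat", "-d", "ses.db"]).decode())

# Count key/value lines in the dump (two lines per record).
with open("ses.db.dump") as dump:
    data_lines = sum(1 for line in dump if line.startswith(" "))
print("records in dump: ~{}".format(data_lines // 2))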