What's the purpose of fcntl with parameter F_DUPFD?

I traced an Oracle process and found that it first opens the file /etc/netconfig as file handle 11, then duplicates it to 256 by calling fcntl with parameter F_DUPFD, and then closes the original file handle 11. Later it reads using file handle 256. So what's the point of duplicating the file handle? Why not just work with the original file handle?
12931: 0.0006 open("/etc/netconfig", O_RDONLY|O_LARGEFILE) = 11
12931: 0.0002 fcntl(11, F_DUPFD, 0x00000100) = 256
12931: 0.0001 close(11) = 0
12931: 0.0002 read(256, " # p r a g m a i d e n".., 1024) = 1024
12931: 0.0003 read(256, " t s t p i _ c".., 1024) = 215
12931: 0.0002 read(256, 0x106957054, 1024) = 0
12931: 0.0001 lseek(256, 0, SEEK_SET) = 0
12931: 0.0002 read(256, " # p r a g m a i d e n".., 1024) = 1024
12931: 0.0003 read(256, " t s t p i _ c".., 1024) = 215
12931: 0.0003 read(256, 0x106957054, 1024) = 0
12931: 0.0001 close(256) = 0

On some systems, like Solaris, standard I/O with FILE only works with file descriptors 0-255, because the implementation of the FILE structure stores the descriptor in an 8-bit integer instead of an int. If a program uses a lot of file descriptors, it's useful to keep descriptors 3-255 free by moving long-lived descriptors above 255 with fcntl(fd, F_DUPFD, 256). Otherwise, functions like fopen(), freopen() and fdopen() will fail once 256 files are open.
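As a rough illustration of that trick, here is a minimal Python sketch (using the standard fcntl module; /etc/netconfig is just the file from the trace) that mirrors the traced sequence: open, duplicate onto a descriptor at or above 256, and close the original so the low number stays free for stdio:
import fcntl
import os
# Open the file; the kernel returns the lowest free descriptor (11 in the trace above).
low_fd = os.open("/etc/netconfig", os.O_RDONLY)
# Duplicate onto the first free descriptor >= 256, then drop the low one so
# descriptors 0-255 stay available for stdio. Note: F_DUPFD fails with EINVAL
# if the process's descriptor limit is below 256.
high_fd = fcntl.fcntl(low_fd, fcntl.F_DUPFD, 256)
os.close(low_fd)
data = os.read(high_fd, 1024)
os.close(high_fd)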

As an aside, they're file descriptors rather than file handles. The latter are a C feature used with fopen and its brethren while descriptors are more UNIXy, for use with open et al.
Interesting. The only reason that comes to mind is that some other piece of code has a specific need for the file descriptor to be 256. I suspect only Oracle would know the bizarre reasons for that. In any case, you're not guaranteed to get 256; you get the first available file descriptor greater than or equal to that number.
From a bit of investigation (I don't know every little thing about the innards of UNIX off the top of my head), there are attributes that belong to a group of duplicated descriptors such as file position and access mode. There are other attributes that belong to a single file descriptor, even when duplicated, such as the close-on-exec flag in GNULib.
Doing a duplicate (either with dup, dup2 or your fcntl) could be a way to create two descriptors, one with different file descriptor attributes, but I can't see that being the case in your question since the first descriptor is closed anyway. As you say, why not just use the low descriptor?
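To make that distinction concrete, here is a small Python sketch (again using /etc/netconfig purely as an example file): duplicated descriptors share attributes such as the file offset, while a flag like close-on-exec belongs to each descriptor individually:
import fcntl
import os
fd1 = os.open("/etc/netconfig", os.O_RDONLY)
fd2 = os.dup(fd1)                      # both refer to the same open file description
os.read(fd1, 100)                      # advances the shared file offset...
print(os.lseek(fd2, 0, os.SEEK_CUR))   # ...so this prints 100 (assuming the file is that long)
# Close-on-exec, by contrast, is per-descriptor: setting it on fd1 leaves fd2 untouched.
flags = fcntl.fcntl(fd1, fcntl.F_GETFD)
fcntl.fcntl(fd1, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
print(fcntl.fcntl(fd2, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)  # still 0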
Interestingly enough, if you google for netconfig f_dupfd, you will see similar traces where the fcntl fails and it continues to read that file with the low descriptor so my thoughts on the matter are that this is an attempt to preserve low file descriptors as much as possible. For example:
4327: open("/etc/netconfig", O_RDONLY|O_LARGEFILE) = 4
4327: fcntl(4, F_DUPFD, 0x00000100) Err#22 EINVAL
4327: read(4, " # p r a g m a i d e n".., 1024) = 1024
4327: read(4, " t s t p i _ c".., 1024) = 215
4327: read(4, 0x00296B80, 1024) = 0
4327: lseek(4, 0, SEEK_SET) = 0
4327: read(4, " # p r a g m a i d e n".., 1024) = 1024
4327: read(4, " t s t p i _ c".., 1024) = 215
4327: read(4, 0x00296B80, 1024) = 0
4327: close(4) = 0
Maybe the software has a byte array of file descriptors somewhere that's limited so it attempts to move other files above the 255-limit.
But really, that's just guesswork on my part (although I'd like to think it's relatively intelligent guesswork). Also keep in mind that it may not be Oracle itself doing this. The netconfig stuff is used in a lot of places so it may be some underlying library doing that, especially in light of the fact that most of the afore-mentioned web hits weren't Oracle-specific (ftp, remsh and so on).

Here is another example of when the technique of reserving low-numbered file descriptors is needed.
Assume that a process opens a large number of file descriptors, e.g. it accepts more than 1024 simultaneous socket connections. At the same time the process also uses a third-party library that opens socket connections and uses select() to see if the sockets are ready for reading or writing. Additionally, the third-party library was compiled with __FD_SETSIZE set to 1024 (the default value).
If the library opens a socket when all file descriptors below 1024 are in use, it will get a descriptor that select() and the associated FD_* macros cannot cope with. This will result in the process crashing or undefined behaviour.
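A minimal sketch of that mitigation in Python (assuming the process's file-descriptor limit has been raised well above 1024; the helper name is made up for illustration): the application moves its own long-lived descriptors above 1024 so the select()-based library keeps receiving numbers below FD_SETSIZE:
import fcntl
import os
def open_above_1024(path, flags=os.O_RDONLY):
    """Open path but return a descriptor >= 1024, keeping low numbers
    free for code that is limited by FD_SETSIZE (e.g. select())."""
    fd = os.open(path, flags)
    try:
        high = fcntl.fcntl(fd, fcntl.F_DUPFD, 1024)
    except OSError:
        # EINVAL if the descriptor limit is too low; fall back to the low
        # descriptor, just like the failing /etc/netconfig trace above.
        return fd
    os.close(fd)
    return high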

Related

MATLAB H5 files cannot exceed 2GB

If I create an h5 file larger than 2GB, I get an error; smaller than 2GB, things work fine. I didn't think this file size was a problem with the h5 format. This does not seem to be a problem on my Windows 8.1 machine, but it is on my Ubuntu 20.04 machine.
MWE:
fname = 'tmp001.h5';
h5create(fname,'/DS1',[195 2048 1500],'DataType','single'); % doesn't work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
fname = 'tmp002.h5';
h5create(fname,'/DS1',[195 2048 1400],'DataType','single');% doesn't work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
fname = 'tmp003.h5';
h5create(fname,'/DS1',[195 2048 1300],'DataType','single');% does work
h5write(fname,'/DS1',1,[1 1 1],[1 1 1]);
This only seems to be a problem if I give it only a segment of the data set, and even then it works provided the size of the data is more than 2^14! See files 4 and 5:
fname = 'tmp004.h5';
h5create(fname,'/DS1',[536886273],'DataType','single');% does work
h5write(fname,'/DS1',rand(536886273,1)); % filesize > 2gb but length > 2^14
fname = 'tmp005.h5';
h5create(fname,'/DS1',[536886273],'DataType','single');% doesn't work
h5write(fname,'/DS1',1,1,1); % filesize > 2gb but length < 2^14
fname = 'tmp006.h5';
h5create(fname,'/DS1',[536886272],'DataType','single');% does work
h5write(fname,'/DS1',1,1,1); % length > 2^14 but okay since filesize < 2gb
Error:
Warning: The following error was caught while executing 'H5ML.id' class destructor:
Error using hdf5lib2
The HDF5 library encountered an error and produced the following stack trace information:
H5FD_truncate driver truncate request failed
H5F_dest low level truncate failed
H5F_try_close problems closing file
H5O_close problem attempting file close
H5D_close unable to release object header
H5I_dec_app_ref can't decrement ID ref count
H5I_dec_app_ref_always_close can't decrement ID ref count
H5Dclose can't decrement count on dataset ID
Error in H5ML.id/close (line 47)
H5ML.hdf5lib2(obj.callback, obj.identifier);
Error in H5ML.id/delete (line 36)
obj.close();

Encoding Spotify URI to Spotify Codes

Spotify Codes are little barcodes that allow you to share songs, artists, users, playlists, etc.
They encode information in the different heights of the "bars". There are 8 discrete heights that the 23 bars can be, which means 8^23 different possible barcodes.
Spotify generates barcodes based on their URI schema. This URI spotify:playlist:37i9dQZF1DXcBWIGoYBM5M gets mapped to this barcode:
The URI has a lot more information (62^22) in it than the code. How would you map the URI to the barcode? It seems like you can't simply encode the URI directly. For more background, see my "answer" to this question: https://stackoverflow.com/a/62120952/10703868
The patent explains the general process; this is what I have found.
This is a more recent patent
When using the Spotify code generator the website makes a request to https://scannables.scdn.co/uri/plain/[format]/[background-color-in-hex]/[code-color-in-text]/[size]/[spotify-URI].
Using Burp Suite, when scanning a code through Spotify, the app sends a request to Spotify's API: https://spclient.wg.spotify.com/scannable-id/id/[CODE]?format=json, where [CODE] is the media reference that you were looking for. This request can be made through Python, but only with the [TOKEN] that was generated through the app, as this is the only way to get the correct scope. The app token expires in about half an hour.
import requests
head = {
    "X-Client-Id": "58bd3c95768941ea9eb4350aaa033eb3",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "App-Platform": "iOS",
    "Accept": "*/*",
    "User-Agent": "Spotify/8.5.68 iOS/13.4 (iPhone9,3)",
    "Accept-Language": "en",
    "Authorization": "Bearer [TOKEN]",
    "Spotify-App-Version": "8.5.68",
}
response = requests.get('https://spclient.wg.spotify.com:443/scannable-id/id/26560102031?format=json', headers=head)
print(response)
print(response.json())
Which returns:
<Response [200]>
{'target': 'spotify:playlist:37i9dQZF1DXcBWIGoYBM5M'}
So 26560102031 is the media reference for your playlist.
The patent states that the code is first detected and then possibly converted into 63 bits using a Gray table. For example 361354354471425226605 is encoded into 010 101 001 010 111 110 010 111 110 110 100 001 110 011 111 011 011 101 101 000 111.
However the code sent to the API is 6875667268, I'm unsure how the media reference is generated but this is the number used in the lookup table.
The reference contains the integers 0-9, compared to the Gray table of 0-7, implying that an algorithm using normal binary has been used. The patent talks about using a convolutional code and then the Viterbi algorithm for error correction, so this may be the output from that; something that is impossible to recreate without the states, I believe. However, I'd be interested if you can interpret the patent any better.
This media reference is 10 digits however others have 11 or 12.
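As a small illustration of the Gray-table step, here is a sketch using the level-to-bits mapping worked out in the decoding answer further down (000, 001, 011, 010, 110, 111, 101, 100 for heights 0-7); it reproduces the example above:
# Assumed Gray table: bar height (0-7) -> 3 bits, taken from the decoding answer below
GRAY = ["000", "001", "011", "010", "110", "111", "101", "100"]
def heights_to_bits(heights):
    """Convert a string of bar heights, e.g. '361354354471425226605', into 63 bits."""
    return " ".join(GRAY[int(h)] for h in heights)
print(heights_to_bits("361354354471425226605"))
# 010 101 001 010 111 110 010 111 110 110 100 001 110 011 111 011 011 101 101 000 111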
Here are two more examples of the raw distances, the gray table binary and then the media reference:
1.
022673352171662032460
000 011 011 101 100 010 010 111 011 001 100 001 101 101 011 000 010 011 110 101 000
67775490487
2.
574146602473467556050
111 100 110 001 110 101 101 000 011 110 100 010 110 101 100 111 111 101 000 111 000
57639171874
edit:
Some extra info:
There are some posts online describing how you can encode any text, such as spotify:playlist:HelloWorld, into a code; however, this no longer works.
I also discovered through the proxy that you can use the domain to fetch the album art of a track above the code. This suggests a closer integration of Spotify's API and this scannables URL than previously thought, as it not only stores the URIs and their codes but can also validate URIs and return updated album art.
https://scannables.scdn.co/uri/800/spotify%3Atrack%3A0J8oh5MAMyUPRIgflnjwmB
Your suspicion was correct - they're using a lookup table. For all of the fun technical details, the relevant patent is available here: https://data.epo.org/publication-server/rest/v1.0/publication-dates/20190220/patents/EP3444755NWA1/document.pdf
Very interesting discussion. Always been attracted to barcodes so I had to take a look. I did some analysis of the barcodes alone (didn't access the API for the media refs) and think I have the basic encoding process figured out. However, based on the two examples above, I'm not convinced I have the mapping from media ref to 37-bit vector correct (i.e. it works in case 2 but not case 1). At any rate, if you have a few more pairs, that last part should be simple to work out. Let me know.
For those who want to figure this out, don't read the spoilers below!
It turns out that the basic process outlined in the patent is correct, but lacking in details. I'll summarize below using the example above. I actually analyzed this in reverse, which is why I think the code description is basically correct except for step (1), i.e. I generated 45 barcodes and all of them matched this encoding.
1. Map the media reference as integer to 37 bit vector.
Something like write the number in base 2, with the least significant bit
on the left and zero-padding on right if necessary.
57639171874 -> 0100010011101111111100011101011010110
2. Calculate CRC-8-CCITT, i.e. generator x^8 + x^2 + x + 1
The following steps are needed to calculate the 8 CRC bits:
Pad with 3 bits on the right:
01000100 11101111 11110001 11010110 10110000
Reverse bytes:
00100010 11110111 10001111 01101011 00001101
Calculate CRC as normal (highest order degree on the left):
-> 11001100
Reverse CRC:
-> 00110011
Invert check:
-> 11001100
Finally append to step 1 result:
01000100 11101111 11110001 11010110 10110110 01100
3. Convolutionally encode the 45 bits using the common generator
polynomials (1011011, 1111001) in binary with puncture pattern
110110 (or 101, 110 on each stream). The result of step 2 is
encoded using tail-biting, meaning we begin the shift register
in the state of the last 6 bits of the 45 long input vector.
Prepend stream with last 6 bits of data:
001100 01000100 11101111 11110001 11010110 10110110 01100
Encode using first generator:
(a) 100011100111110100110011110100000010001001011
Encode using 2nd generator:
(b) 110011100010110110110100101101011100110011011
Interleave bits (abab...):
110100001111110000101110111100110100111100011010111001110001000101011000010110000111001111
Puncture every third bit:
111000111100101111101110111001011100110000100100011100110011
4. Permute data by choosing indices 0, 7, 14, 21, 28, 35, 42, 49,
56, 3, 10..., i.e. incrementing 7 modulo 60. (Note: unpermute by
incrementing 43 mod 60).
The encoded sequence after permuting is
111100110001110101101000011110010110101100111111101000111000
5. The final step is to map back to bar lengths 0 to 7 using the
gray map (000,001,011,010,110,111,101,100). This gives the 20 bar
encoding. As noted before, add three bars: short one on each end
and a long one in the middle.
UPDATE: I've added a barcode (levels) decoder (assuming no errors) and an alternate encoder that follows the description above rather than the equivalent linear algebra method. Hopefully that is a bit more clear.
UPDATE 2: Got rid of most of the hard-coded arrays to illustrate how they are generated.
The linear algebra method defines the linear transformation (spotify_generator) and mask to map the 37 bit input into the 60 bit convolutionally encoded data. The mask is the result of the 8-bit inverted CRC being convolutionally encoded. The spotify_generator is a 37x60 matrix that implements the product of the generators for the CRC (a 37x45 matrix) and convolutional codes (a 45x60 matrix). You can create the generator matrix from an encoding function by applying the function to each row of an appropriately sized identity matrix; for example, a CRC function that adds 8 bits to each 37 bit data vector, applied to each row of a 37x37 identity matrix.
import numpy as np
import crccheck
# Utils for conversion between int, array of binary
# and array of bytes (as ints)
def int_to_bin(num, length, endian):
if endian == 'l':
return [num >> i & 1 for i in range(0, length)]
elif endian == 'b':
return [num >> i & 1 for i in range(length-1, -1, -1)]
def bin_to_int(bin,length):
return int("".join([str(bin[i]) for i in range(length-1,-1,-1)]),2)
def bin_to_bytes(bin, length):
b = bin[0:length] + [0] * (-length % 8)
return [(b[i]<<7) + (b[i+1]<<6) + (b[i+2]<<5) + (b[i+3]<<4) +
(b[i+4]<<3) + (b[i+5]<<2) + (b[i+6]<<1) + b[i+7] for i in range(0,len(b),8)]
# Return the circular right shift of an array by 'n' positions
def shift_right(arr, n):
return arr[-n % len(arr):len(arr):] + arr[0:-n % len(arr)]
gray_code = [0,1,3,2,7,6,4,5]
gray_code_inv = [[0,0,0],[0,0,1],[0,1,1],[0,1,0],
[1,1,0],[1,1,1],[1,0,1],[1,0,0]]
# CRC using Rocksoft model:
# NOTE: this is not quite any of their predefined CRC's
# 8: number of check bits (degree of poly)
# 0x7: representation of poly without high term (x^8+x^2+x+1)
# 0x0: initial fill of register
# True: byte reverse data
# True: byte reverse check
# 0xff: Mask check (i.e. invert)
spotify_crc = crccheck.crc.Crc(8, 0x7, 0x0, True, True, 0xff)
def calc_spotify_crc(bin37):
bytes = bin_to_bytes(bin37, 37)
return int_to_bin(spotify_crc.calc(bytes), 8, 'b')
def check_spotify_crc(bin45):
data = bin_to_bytes(bin45,37)
return spotify_crc.calc(data) == bin_to_bytes(bin45[37:], 8)[0]
# Simple convolutional encoder
def encode_cc(dat):
gen1 = [1,0,1,1,0,1,1]
gen2 = [1,1,1,1,0,0,1]
punct = [1,1,0]
dat_pad = dat[-6:] + dat # 6 bits are needed to initialize
# register for tail-biting
stream1 = np.convolve(dat_pad, gen1, mode='valid') % 2
stream2 = np.convolve(dat_pad, gen2, mode='valid') % 2
enc = [val for pair in zip(stream1, stream2) for val in pair]
return [enc[i] for i in range(len(enc)) if punct[i % 3]]
# To create a generator matrix for a code, we encode each row
# of the identity matrix. Note that the CRC is not quite linear
# because of the check mask so we apply the lambda function to
# invert it. Given a 37 bit media reference we can encode by
# ref * spotify_generator + spotify_mask (mod 2)
_i37 = np.identity(37, dtype=bool)
crc_generator = [_i37[r].tolist() +
list(map(lambda x : 1-x, calc_spotify_crc(_i37[r].tolist())))
for r in range(37)]
spotify_generator = 1*np.array([encode_cc(crc_generator[r]) for r in range(37)], dtype=bool)
del _i37
spotify_mask = 1*np.array(encode_cc(37*[0] + 8*[1]), dtype=bool)
# The following matrix is used to "invert" the convolutional code.
# In particular, we choose a 45 vector basis for the columns of the
# generator matrix (by deleting those in positions equal to 2 mod 4)
# and then inverting the matrix. By selecting the corresponding 45
# elements of the convolutionally encoded vector and multiplying
# on the right by this matrix, we get back to the unencoded data,
# assuming there are no errors.
# Note: numpy does not invert binary matrices, i.e. GF(2), so we
# hard code the following 3 row vectors to generate the matrix.
conv_gen = [[0,1,0,1,1,1,1,0,1,1,0,0,0,1]+31*[0],
[1,0,1,0,1,0,1,0,0,0,1,1,1] + 32*[0],
[0,0,1,0,1,1,1,1,1,1,0,0,1] + 32*[0] ]
conv_generator_inv = 1*np.array([shift_right(conv_gen[(s-27) % 3],s) for s in range(27,72)], dtype=bool)
# Given an integer media reference, returns list of 20 barcode levels
def spotify_bar_code(ref):
bin37 = np.array([int_to_bin(ref, 37, 'l')], dtype=bool)
enc = (np.add(1*np.dot(bin37, spotify_generator), spotify_mask) % 2).flatten()
perm = [enc[7*i % 60] for i in range(60)]
return [gray_code[4*perm[i]+2*perm[i+1]+perm[i+2]] for i in range(0,len(perm),3)]
# Equivalent function but using CRC and CC encoders.
def spotify_bar_code2(ref):
bin37 = int_to_bin(ref, 37, 'l')
enc_crc = bin37 + calc_spotify_crc(bin37)
enc_cc = encode_cc(enc_crc)
perm = [enc_cc[7*i % 60] for i in range(60)]
return [gray_code[4*perm[i]+2*perm[i+1]+perm[i+2]] for i in range(0,len(perm),3)]
# Given 20 (clean) barcode levels, returns media reference
def spotify_bar_decode(levels):
level_bits = np.array([gray_code_inv[levels[i]] for i in range(20)], dtype=bool).flatten()
conv_bits = [level_bits[43*i % 60] for i in range(60)]
cols = [i for i in range(60) if i % 4 != 2] # columns to invert
conv_bits45 = np.array([conv_bits[c] for c in cols], dtype=bool)
bin45 = (1*np.dot(conv_bits45, conv_generator_inv) % 2).tolist()
if check_spotify_crc(bin45):
return bin_to_int(bin45, 37)
else:
print('Error in levels; Use real decoder!!!')
return -1
And example:
>>> levels = [5,7,4,1,4,6,6,0,2,4,3,4,6,7,5,5,6,0,5,0]
>>> spotify_bar_decode(levels)
57639171874
>>> spotify_bar_code(57639171874)
[5, 7, 4, 1, 4, 6, 6, 0, 2, 4, 3, 4, 6, 7, 5, 5, 6, 0, 5, 0]

Meaning of the benchmark variables in snakemake

I included a benchmark directive to some of the rules in my snakemake workflow, and the resulting files have the following header:
s h:m:s max_rss max_vms max_uss max_pss io_in io_out mean_load
The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".
I can guess that columns 1 and 2 are two different ways of displaying the time taken to execute the rule (in seconds, and converted to hours, minutes and seconds).
io_in and io_out are likely related to disk read and write activity, but in what units are they measured?
What are the others? Is this documented somewhere?
Edit: Looking at the source code
I've found the following piece of code in /snakemake/benchmark.py, that might well be where the benchmark data come from:
def _update_record(self):
    """Perform the actual measurement"""
    # Memory measurements
    rss, vms, uss, pss = 0, 0, 0, 0
    # I/O measurements
    io_in, io_out = 0, 0
    # CPU seconds
    cpu_seconds = 0
    # Iterate over process and all children
    try:
        main = psutil.Process(self.pid)
        this_time = time.time()
        for proc in chain((main,), main.children(recursive=True)):
            meminfo = proc.memory_full_info()
            rss += meminfo.rss
            vms += meminfo.vms
            uss += meminfo.uss
            pss += meminfo.pss
            ioinfo = proc.io_counters()
            io_in += ioinfo.read_bytes
            io_out += ioinfo.write_bytes
            if self.bench_record.prev_time:
                cpu_seconds += proc.cpu_percent() / 100 * (
                    this_time - self.bench_record.prev_time)
        self.bench_record.prev_time = this_time
        if not self.bench_record.first_time:
            self.bench_record.prev_time = this_time
        rss /= 1024 * 1024
        vms /= 1024 * 1024
        uss /= 1024 * 1024
        pss /= 1024 * 1024
        io_in /= 1024 * 1024
        io_out /= 1024 * 1024
    except psutil.Error as e:
        return
    # Update benchmark record's RSS and VMS
    self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
    self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
    self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
    self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
    self.bench_record.io_in = io_in
    self.bench_record.io_out = io_out
    self.bench_record.cpu_seconds += cpu_seconds
So this seems to come from functionalities provided by psutil.
I will just leave this here for future reference.
Reading through the snakemake >= 6.0.0 benchmark module and psutil's memory_info(), memory_full_info(), io_counters() and cpu_times(), as previously suggested:
colname    type (unit)       description
s          float (seconds)   Running time in seconds
h:m:s      string (-)        Running time in hour, minutes, seconds format
max_rss    float (MB)        Maximum "Resident Set Size": the non-swapped physical memory a process has used
max_vms    float (MB)        Maximum "Virtual Memory Size": the total amount of virtual memory used by the process
max_uss    float (MB)        "Unique Set Size": the memory which is unique to a process and which would be freed if the process was terminated right now
max_pss    float (MB)        "Proportional Set Size": the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only)
io_in      float (MB)        the number of MB read (cumulative)
io_out     float (MB)        the number of MB written (cumulative)
mean_load  float (-)         CPU usage over time, divided by the total running time (first row)
cpu_time   float (-)         CPU time summed for user and system
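If you want to inspect these values programmatically, here is a minimal sketch (the file path is just a placeholder) that loads one of the tab-separated benchmark files with pandas:
import pandas as pd
# Benchmark output is a tab-separated table with the columns described above.
bench = pd.read_csv("benchmarks/my_rule.tsv", sep="\t")
print(bench[["s", "max_rss", "io_in", "io_out"]])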
Benchmarking in snakemake could certainly be better documented, but psutil is documented here:
get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps.
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.
psutil.disk_io_counters(perdisk=False)
Return system disk I/O statistics as a namedtuple including the following attributes:
read_count: number of reads
write_count: number of writes
read_bytes: number of bytes read
write_bytes: number of bytes written
read_time: time spent reading from disk (in milliseconds)
write_time: time spent writing to disk (in milliseconds)
The code you found confirms that all the memory usage and IO counters are reported in MB (bytes divided by 1024 * 1024).
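For completeness, a small sketch (not part of snakemake itself, and io_counters() is not available on every platform) showing where those raw numbers come from and the same bytes-to-MB conversion used in benchmark.py:
import psutil
proc = psutil.Process()            # the current process, as an example
mem = proc.memory_full_info()      # rss, vms, uss, pss are reported in bytes
io = proc.io_counters()            # read_bytes / write_bytes are cumulative byte counts
to_mb = lambda n: n / 1024 / 1024  # same conversion as in benchmark.py
print(f"rss:    {to_mb(mem.rss):.2f} MB")
print(f"io_in:  {to_mb(io.read_bytes):.2f} MB")
print(f"io_out: {to_mb(io.write_bytes):.2f} MB")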

How to divide or cut a YUV file?

I have a YUV file with 150 frames, and I want to divide it into 2 files of 75 frames each. How do I go about doing this? Is there any software to do this?
No specific software is needed. All you need to do is read/write the number of bytes that corresponds to a frame into a new file. "Normally" the YCbCr format used is subsampled according to 4:2:0, i.e. the chroma samples are reduced by a factor of 2 both horizontally and vertically, meaning that 1 frame in YCbCr 4:2:0 corresponds to
1 frame = width x height x 3 / 2 bytes
If you are on a linux based system, you can use the dd utility to extract the first n-frames into a new file like this:
dd if=input.yuv bs=1 count=$((width*height*3/2*num_frames)) of=output.yuv
for the first 10 frames of a 1080p clip, the above would be:
dd if=input.yuv bs=1 count=$((1920*1080*3/2*10)) of=output.yuv
or
dd if=input.yuv bs=1 count=3110400 of=output.yuv
or use your favorite programming/scripting language to do this.
For example, the following Python script writes the first 10 frames to a new file (one frame at a time); tweak it to your needs:
#!/usr/bin/env python
f_in = 'BQMall_832x480_60.yuv'
f_out = 'BQMall_first_10_frames.yuv'
f_size = 832*480*3//2  # bytes per 4:2:0 frame (integer division so read() gets an int)
with open(f_in, 'rb') as fd_in, open(f_out, 'wb') as fd_out:
    for i in range(10):
        data = fd_in.read(f_size)
        fd_out.write(data)
I suggest that you do not use bs=1; copying one byte at a time is very slow. Use one frame as the block size instead:
dd if=176x144.yuv bs=$((176*144*3/2)) count=$FrameNo of=output.yuv
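And to answer the original question directly, here is a minimal Python sketch (frame size assumed to be 4:2:0 as above; adjust the resolution and file names to your clip) that splits a 150-frame file into two files of 75 frames each:
width, height = 176, 144              # adjust to your clip's resolution
frame_size = width * height * 3 // 2  # bytes per YCbCr 4:2:0 frame
with open('input.yuv', 'rb') as src:
    with open('part1.yuv', 'wb') as out1:
        for _ in range(75):
            out1.write(src.read(frame_size))
    with open('part2.yuv', 'wb') as out2:
        for _ in range(75):
            out2.write(src.read(frame_size))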

How to remove header of a wavfile in Matlab

I need to remove the first 1024 bytes of a wave file. I tried to do this, but I got a corrupted/distorted wav file:
wavFile = fopen(fullFileName, 'r'); % Open file for reading
fseek (wavFile, 1024, -1);% Skip past header, which is 1024 bytes long.
wF = fread (wavFile, inf, 'int16');% 16-bit signed integers
wF = wF(:)';
newWavFile = fopen(strcat('new_',fileNames(fileNo).name), 'w+');% Open file for writing
fwrite(newWavFile,wF);
fclose(wavFile);
What can be the problem?
I managed to remove the header and correct the distortion with:
wavFile = fopen(fullFileName, 'r');% Open file for reading
fseek (wavFile, 1024, -1);% Skip past header, which is 1024 bytes long.
wF = fread (wavFile, inf, 'int16');% 16-bit signed integers
wF = wF(:)';
wF = 0.8*wF/max(abs(wF));
newWavFile = strcat('1_',fileNames(fileNo).name);% Output file name
wavwrite(wF,16000,16,newWavFile);