How to replay a block I/O trace file in CSV format

I'm trying to replay an MS-Cambridge I/O trace that has been converted to a CSV file, but I don't know how to turn this CSV file into a blktrace binary file that can be replayed by fio or another tool.
Can anyone tell me how to convert a CSV-formatted I/O trace into a replayable binary format?

https://ysoh.wordpress.com/2011/09/06/706/ talks about converting the SNIA Cambridge trace files to Disksim format, but unfortunately I've yet to see anything about getting the CSV or Disksim files into blktrace format. It may be easier to convert those files into fio's trace v2 format (which is just ASCII) and then use fio to play the result back...

As Anon said, you can convert the CSV file to a simple ASCII iolog (e.g. trace.log) and write a fio job file (e.g. test.fio), then run fio test.fio on the command line.
Specifically, here is an example of the two files:
trace.log:
fio version 2 iolog
/dev/loop1 add
/dev/loop1 open
/dev/loop1 write 0 4096
/dev/loop1 write 4096 4096
/dev/loop1 close
test.fio:
[write-test]
ioengine=libaio
iodepth=32
direct=1
thread=1
read_iolog=trace.log
Hope it helps you understand.
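If your CSV follows the MSR Cambridge column layout, a minimal conversion sketch along these lines should get you to an iolog like the one above. The column order (Timestamp, Hostname, DiskNumber, Type, Offset, Size, ResponseTime), the Read/Write labels, the file name msr-cambridge.csv, and the /dev/loop1 target are all assumptions you will need to adapt to your trace:
{
  echo "fio version 2 iolog"
  echo "/dev/loop1 add"
  echo "/dev/loop1 open"
  awk -F',' '{ op = ($4 == "Read") ? "read" : "write"; print "/dev/loop1", op, $5, $6 }' msr-cambridge.csv
  echo "/dev/loop1 close"
} > trace.log
Offsets and sizes are assumed to already be in bytes, which is what fio's iolog format expects.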

Related

Import data in gzip archive to mongodb

I have data stored in folders of gzip archives; each archive contains one big file with JSON in the following format:
{key:value, key:value}
{key:value, key:value}
{key:value, key:value}
I need to import the data into MongoDB. What is the best way to do that? I can't extract the archives on my PC, as each uncompressed file is about 1950 MB.
You can unzip the files to STDOUT and pipe the stream into mongoimport. Then you don't need to save the uncompressed file to your local disk:
gunzip --stdout your_file.json.gz | mongoimport --uri=<connection string> --collection=<collection> --db=<database>
I've imported tens of billions of lines of CSV and JSON into MongoDB in the past year, even from zipped formats. Having tried all the approaches to save precious time, here's what I recommend:
unzip the file
pass it as an argument to mongoimport
create the index on the fields you want, but ONLY at the end of the entire data insert process.
You can find the mongoimport documentation at: https://www.mongodb.com/docs/database-tools/mongoimport/
If you have a lot of files, you may want to write a for loop in bash that unzips each archive and passes the file name as an argument to mongoimport.
If you are worried about running out of disk space, you can also delete the unzipped file at the end of each individual mongoimport run.
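As a rough sketch of such a loop (the archive path and index field are placeholders, as are the <connection string>, <database>, and <collection> values, same as in the command above):
for f in /path/to/archives/*.json.gz; do
  gunzip -k "$f"    # -k keeps the .gz and writes the uncompressed .json next to it
  mongoimport --uri="<connection string>" --db=<database> --collection=<collection> --file="${f%.gz}"
  rm "${f%.gz}"     # reclaim the disk space before moving on to the next file
done
mongosh "<connection string>" --eval 'db.<collection>.createIndex({ someField: 1 })'   # build the index only once everything is loaded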
Hope it helped!

How can I read a zipped CSV file with KDB?

I've got a number of CSV files saved with pandas as zip files. I'd like to read them into KDB without having to manually unzip them in a terminal beforehand.
It looks like KDB supports compression:
https://code.kx.com/q/kb/file-compression/
But I can't figure out how to get it to decompress the file; what I read in looks like the literal bytes of the zip file.
How do I read a zipped CSV file in KDB?
Named pipes can be used for this purpose
https://code.kx.com/q/kb/named-pipes/
q)system"rm -f fifo && mkfifo fifo"
q)system"unzip -p t.zip t.csv > fifo &"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo

ffmpeg concat command not reading input file correctly

I am trying to concatenate two video files using ffmpeg, and I am receiving an error.
ffmpeg -f concat -safe 0 -i list.txt -c copy concat.mp4
And the error output I receive is:
[concat @ 0x7ff922000000] Line 1: unknown keyword '43.mp4'
list.txt: Invalid data found when processing input
It turns out that the file names in the list have to be specially formatted to look like:
file '/path/to/file1.wav'
with the keyword file included. I spent a lot of time trying to work out why ffmpeg encountered an error reading the file names; it didn't matter whether they were in the list or on the command line. Only after I used the command
for f in *.wav; do echo "file '$f'" >> mylist.txt; done
from ffmpeg's manual to build the list did I have success. The only difference was the additional keyword file.
Here you can read it yourself: https://trac.ffmpeg.org/wiki/Concatenate#demuxer
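So for the error above, list.txt needs to look something like this (the second entry is just an illustrative placeholder):
file '43.mp4'
file '44.mp4'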

How to create an identical gzip of the same file?

I have a file whose contents do not change between runs. It is passed through gzip and only the compressed form is stored. I'd like to be able to regenerate the gzip and only update my copy when the two actually differ. As it stands, diffing tools (diff, xdelta, subversion) see the files as having changed.
For background: I'm storing a mysqldump of an important database in a Subversion repository. The intention is that a cron job periodically dumps the db, gzips it, and commits the file. Currently, every time the file is dumped and gzipped, it is considered as differing. I'd prefer not to have my revision numbers needlessly increase every 15 minutes.
I realize I could store the dump as plain text, but I'd prefer not to, as it's rather large.
The command I am currently using to generate the dumps is:
mysqldump $DB --skip-extended-insert | sed '$d' | gzip -n > $REPO/$DB.sql.gz
The -n instructs gzip to remove the filename/timestamp information. The sed '$d' removes the last line of the file where mysqldump places a timestamp.
At this point, I'm probably going to revert to storing it in a plain text fashion, but I was curious as to what kind of solution there is.
Resolved: Mr. Bright was correct; I had mistakenly used a capital N when the correct argument was a lowercase one.
"The -N instructs gzip to remove the filename/timestamp information."
Actually, that does just the opposite: -n is what tells gzip to forget the original file name and time stamp.
I think gzip is preserving the original date and timestamp on the file(s) which will cause it to produce a different archive.
-N --name
When compressing, always save the original file name and time stamp; this is the default. When decompressing, restore the original file name and time stamp if present. This option is useful on systems which have a limit on file name length or when the time stamp has been lost after a file transfer.
But watch out: two gzips of the same unchanged file made at different times will differ, because gzip itself stamps the archive with its creation date, which is written to the header of the gzip file. Two apparently different gzips can therefore contain exactly the same content. (The -n flag discussed above suppresses that header timestamp as well.)
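A quick way to check this with the command from the question, as a sketch (the dump file names are arbitrary): produce two dumps with -n and compare the compressed bytes; if the underlying data has not changed, cmp should report no difference:
mysqldump $DB --skip-extended-insert | sed '$d' | gzip -n > dump1.sql.gz
mysqldump $DB --skip-extended-insert | sed '$d' | gzip -n > dump2.sql.gz
cmp dump1.sql.gz dump2.sql.gz && echo "identical - nothing new to commit"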

How to read multiple pcap files >2GB?

I am trying to parse large pcap files with libpcap, but there is a file size limitation, so my captures are split into 2 GB files. I have 10 files of 2 GB each and I want to parse them in one shot. Is there a way to feed this data to libpcap sequentially (each file separately) so that it can parse them all in the same run?
I am not aware of any tools that will allow you to replay more than one file at a time.
However, if you have the disk space, you can use mergecap to merge the ten files into a single file and then replay that.
Mergecap supports merging the packets either
in chronological order of each packet's timestamp in each file, or
ignoring the timestamps and performing what amounts to a packet version of 'cat': writing the contents of the first file to the output, then the next input file, and so on.
Mergecap is part of the Wireshark distribution.
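For example, something along these lines (the file names are placeholders; -w names the output file, and adding -a switches from timestamp order to the straight concatenation described above):
mergecap -w merged.pcap capture1.pcap capture2.pcap capture3.pcap
You can then read merged.pcap as a single file.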
I had multiple 2 GB pcap files and used the following one-liner to go through each pcap file sequentially, with a display filter. This worked without merging the pcap files (avoiding extra disk space and CPU):
for i in /mnt/tmp1/tmp1-pcap-ens1f1-tcpdump* ; do tcpdump -nn -r $i host 8.8.8.8 and tcp ; done
Explanation:
for i in /mnt/tmp1/tmp1-pcap-ens1f1-tcpdump*   # path to the files, with * as a wildcard
do tcpdump -nn -r $i host 8.8.8.8 and tcp      # read each file in sequence; -nn stops tcpdump resolving IPs or port numbers
done                                           # end of the loop
Note: Please remember to adjust the file path and display filter according to your needs.