tcpdump session issue in moving dump files out of current folder - dump

I would like to have a tcpdump script which dumps into files let's say every hour.
This I can achieve quite simply like this:
tcpdump -i eth0 -G 3600 -w /tmp/files/<some-name>-%F-%H-%M-%S.pcap -Z root -z gzip
I want to MOVE the "finished" files to s3 for which I'm using the rclone tool:
rclone move /tmp/files remote:<s3 bucket name> --filter "- *.pcap"
All runs fine apart from the fact that whenever I move some any of the *pcap.gz files the currently processed *.pcap file size is enlarged with all the current session data which makes the file pretty big.
Does this mean that I can't move out any of the files from the directory and have to restart the tcpdump command on regular basis?
Thanks

Modify your tcpdump command to add a capture filter that excludes the rclone traffic. For example, assuming the remote IP address and TCP port number are 192.0.2.1 and 1234, respectively, apply a capture filter of "not (host 192.0.2.1 and tcp port 1234)" to exclude that traffic.

Related

PostgreSQL COPY pipe output to gzip and then to STDOUT

The following command works well
$ psql -c "copy (select * from foo limit 3) to stdout csv header"
# output
column1,column2
val1,val2
val3,val4
val5,val6
However the following does not:
$ psql -c "copy (select * from foo limit 3) to program 'gzip -f --stdout' csv header"
# output
COPY 3
Why do I have COPY 3 as the output from this command? I would expect that the output would be the compressed CSV string, after passing through gzip.
The command below works, for instance:
$ psql -c "copy (select * from foo limit 3) to stdout csv header" | gzip -f -c
# output (this garbage is just the compressed string and is as expected)
߉T`M�A �0 ᆬ}6�BL�I+�^E�gv�ijAp���qH�1����� FfВ�,Д���}������+��
How to make a single SQL command that directly pipes the result into gzip and sends the compressed string to STDOUT?
When you use COPY ... TO PROGRAM, the PostgreSQL server process (backend) starts a new process and pipes the file to the process's standard input. The standard output of that process is lost. It only makes sense to use COPY ... TO PROGRAM if the called program writes the data to a file or similar.
If your goal is to compress the data that go across the network, you could use sslmode=require sslcompression=on in your connect string to use the SSL network compression feature I built into PostgreSQL 9.2. Unfortunately this has been deprecated and most OpenSSL binaries are shipped with the feature disabled.
There is currently a native network compression patch under development, but it is questionable whether that will make v14.
Other than that, you cannot get what you want at the moment.
copy is running gzip on the server and not forwarding the STDOUT from gzip on to the client.
You can use \copy instead, which would run gzip on the client:
psql -q -c "\copy (select * from foo limit 3) to program 'gzip -f --stdout' csv header"
This is fundamentally the same as piping to gzip, which you show in your question.
If the goal is to compress the output of copy so it transfers faster over the network, then...
psql "postgresql://ip:port/dbname?sslmode=require&sslcompression=1"
It should display "compression active" if it's enabled. That probably requires some server config variable to be enabled though.
Or you can simply use ssh:
ssh user#dbserver "psql -c \"copy (select * from foo limit 3) to stdout csv header\" | gzip -f -c" >localfile.csv.gz
But... of course, you need ssh access to the db server.
If you don't have ssh to the db server, maybe you have ssh to another box in the same datacenter that has a fast network link to the db server, in that case you can ssh to it instead of the db server. Data will be transferred uncompressed between that box and the database, compressed on the box, and piped via ssh to your local machine. That will even save cpu on the database server since it won't be doing the compression.
If that doesn't work, well then, why not put the ssh command into the "to program" and have the server send it via ssh to your machine? You'll have to setup your router and open a port, but you can do that. Of course you'll have to find a way to put the password in the ssh command line, that's usually a big no-no, but maybe just for once. Or just use netcat instead, that doesn't require a password.
Also, if you want speed, please, use zstd instead of gzip.
Here's an example with netcat. I just tested it and it worked.
On destination machine which is 192.168.0.1:
nc -lp 65001 | zstd -d >file.csv
In another terminal:
psql -c "copy (select * from foo) to program 'zstd -9 |nc -N 192.168.0.1 65001' csv header" test
Note -N option for netcat.
You can use copy to PROGRAM:
COPY foo_table to PROGRAM 'gzip > /tmp/foo_table.csv' delimiters',' CSV HEADER;

"cat urls.txt | gsutil -m cp -I gs://target-bucket-name/" consistently hangs after transferring ~10,000 files

I am trying to copy ~80,000 images from one google cloud storage bucket to another.
I am initiating this operation from a mac with google cloud sdk 180.0.1 which contains gsutil 4.28.
The ~url of each image to be transferred in in a text file which I feed to gsutil cp like so ...
$cat urls.txt | gsutil -m cp -I gs://target-bucket-name/
wherein urls.txt looks like ...
head -3 urls.txt
gs://source-bucket-name/1506567870546.jpg
gs://source-bucket-name/1506567930548.jpg
gs://source-bucket-name/1507853339446.jpg
The process consistently hangs after ~10,000 of the images have been transferred.
I have edited $HOME/.boto to uncomment:
parallel_composite_upload_threshold = 0
This has not prevented the operation from hanging.
I am uncertain what causes the hanging.
The underlying need is for a general utility to copy N items from one bucket to another. I need a work around that will enable me to accomplish that mission.
UPDATE
Removing the -m option seems to work around the hanging problem but the file transfer is now significantly slower. I would like to be able avoid the hanging problem whilst still gaining the speed that comes with using concurrency if possible.
gstuil should not be hanging. This is a bug. Could you record the output of gsutl -D and when it hangs, create an issue in the gsutil github repo with the output attached and comment here with a link to it? You can use the following command to log the output:
$ cat urls.txt | gsutil -D -m cp -I gs://target-bucket-name/ 2>&1 | tee output
In the meanwhile, you could try experimenting with reducing the number of threads and processes that the parallel mode (-m) uses by changing these defaults in your boto file.
parallel_process_count = 1 # Default - 12
parallel_thread_count = 10 # Default - 10
Note that gsutil has options to copy all files in a bucket or subdirectory to a new bucket, as well as only copy files that have changed or do not exist in the target with the following commands:
gsutil -m cp gs://source-bucket/ gs://target-bucket
gsutil -m cp 'gs://source-bucket/dir/**' gs://target-bucket
gsutil -m rsync -r gs://source-bucket gs://target-bucket

Efficient way upload multiple files to separate locations on google cloud storage

Here is my case, I'd like to copy multiple files to separate locations on google cloud storage, e.g:
gsutil -m cp /local/path/to/d1/x.csv.gz gs://bucketx/d1/x.csv.gz
gsutil -m cp /local/path/to/d2/x.csv.gz gs://bucketx/d2/x.csv.gz
gsutil -m cp /local/path/to/d3/x.csv.gz gs://bucketx/d3/x.csv.gz
...
I have over 10k of such files, and executing them by separate calls of gsutil seems to be really slow, and lots of time is wasted on setting up network connection. What's the most efficient way to do that, please?
If your paths are consistently of the nature in your example, you could do this with one gsutil command:
gsutil -m cp -r /local/path/to/* gs://bucketx
However, this only works if you want the destination naming to mirror the source. If your paths are arbitrary mappings of source name to destination name, you'll need to run individual commands (which as you note can be sped up with parallel).

gsutil rsync with gzip compression

I'm hosting publicly available static resources in a google storage bucket, and I want to use the gsutil rsync command to sync our local version to the bucket, saving bandwidth and time. Part of our build process is to pre-gzip these resources, but gsutil rsync has no way to set the Content-Encoding header. This means we must run gsutil rsync, then immediately run gsutil setmeta to set headers on all the of gzipped file types. This leaves the bucket in a BAD state until that header is set. Another option is to use gsutil cp, passing the -z option, but this requires us to re-upload the entire directory structure every time, and this includes a LOT of image files and other non-gzipped resources that wastes time and bandwidth.
Is there an atomic way to accomplish the rsync and set proper Content-Encoding headers?
Assuming you're starting with gzipped source files in source-dir you can do:
gsutil -h content-encoding:gzip rsync -r source-dir gs://your-bucket
Note: If you do this and then run rsync in the reverse direction it will decompress and copy all the objects back down:
gsutil rsync -r gs://your-bucket source-dir
which may not be what you want to happen. Basically, the safest way to use rsync is to simply synchronize objects as-is between source and destination, and not try to set content encodings on the objects.
I'm not completely answering the question but I came here as I was wondering the same thing trying to achieve the following:
how to deploy efficiently a static website to google cloud storage
I was able to find an optimized way for deploying my static web site from a local folder to a gs bucket
Split my local folder into 2 folders with the same hierarchy, one containing the content to be gzip (html,css,js...), the other the other files
Gzip each file in my gzip folder (in place)
Call gsutil rsync in for each folder to the same gs destination
Of course, it is only a one way synchronization and deleted local files are not deleted remotely
For the gzip folder the command is
gsutil -m -h Content-Encoding:gzip rsync -c -r src/gzip gs://dst
forcing the content encoding to be gzippped
For the other folder the command is
gsutil -m rsync -c -r src/none gs://dst
the -m option is used for parallel optimization. The -c option is needed to force using checksum validation (Why is gsutil rsync re-downloading all our files?) as I was touching each local file in my build process. the -r option is used for recursivity.
I even wrote a script for it (in dart): http://tekhoow.blogspot.fr/2016/10/deploying-static-website-efficiently-on.html

How do I copy symbolic links between servers?

We are moving web servers. (LAMP)
Webserver 1 has hundreds of symbolic links pointing to files in a different directory (eg. ../../files/001.png). When we move over to the new server (have downloaded the site to my computer then reuploading to Webserver 2 using SFTP client Transmit). It does not copy the symbolic links...
Is there a better way to get the symbolic links from one server to another apart from recreating them on the new server?
rsync -a from one server to another will preserve file attributes and symlinks.
rsync -av user#server:/path/to/source user#server2:/path/to/target
Something like the following? On Webserver 1:
tar czf - the_directory | (ssh Webserver2 "cd /path/to/wherever && tar xzf -")
This creates a tar of the stuff to copy to STDOUT, and pipes it to an untar on the other server over ssh. It can be faster than a recursive ssh copy too.