enabling compression during "gsutil cp" download - google-cloud-storage

The gsutil cp command supports the -J (or -j ext) options to compress data during transport only, thereby saving network bandwidth and speeding up the copy itself.
Is there an equivalent way to do this when downloading from GCS to local machine? That is, if I have an uncompressed text file at gs://foo/bar/file.json, is there some equivalent to -J that will compress the contents of "file.json" during transport only?
The goal is to speed up a copy from remote to local, and not just for a single file but dozens. I'm already using -m to do parallel copies, but would like to transmit compressed data to reduce network transfer time.
I didn't find anything relevant in the docs, and including -J doesn't appear to do anything during downloads. I've tried the following, but the "ETA" numbers printed by gsutil look identical whether -J is present or absent:
gsutil cp -J gs://foo/bar/file.json .

This feature is not yet available.
As an alternative, you will need to implement your own compression solution, for example with App Engine, Cloud Functions, or Cloud Run. Your application will need to compress your files while they are in Cloud Storage.
The ideal setup would be to use -m together with compressed files, so that you are making parallel copies of compressed archives rather than of every individual object. Consider the following structures: if [1] is how your bucket is laid out today, you are downloading each file individually; with [2], you would only download the compressed archives (a short download sketch follows the trees).
[1]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ └───DropAll.sql
├───barconfig
│ ├───barRecreate.sh
│ └───reGenAll.sql
├───Baz
│ ├───BadBaz.sh
│ └───Drop.sh
...
[2]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ ├───DropAll.sql
│ └───FooScripts.zip
├───barconfig
│ ├───barRecreate.sh
│ ├───reGenAll.sql
│ └───barconfig.zip
├───Baz
│ ├───BadBaz.sh
│ ├───Drop.sh
│ └───Baz.zip
...
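For example, once the archives exist in the bucket, a single parallel copy pulls only the compressed objects. The bucket and folder names below are just the illustrative ones from the trees above:
gsutil -m cp gs://Foo/FooScripts/FooScripts.zip \
    gs://Foo/barconfig/barconfig.zip \
    gs://Foo/Baz/Baz.zip .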
Once your data has been downloaded, consider deleting the compressed files, since they are no longer needed for your operations and you will be charged for storing them. You can also raise a Feature Request on the Public Issue Tracker; it will be routed to the Cloud Storage team, who can look into the feasibility of this request.

Related

How to load fish configuration from a remote repository?

I have a zillion machines in different places (home network, cloud, ...) and I use fish on each of them. The problem is that I have to synchronize their configuration every time I change something in there.
Is there a way to load the configuration from a remote repository? (i.e. a place where it would be stored, not necessarily git, but ideally I would manage it on GitHub). In that case I would just have a one-liner everywhere.
I do not care too much about startup time; loading the config each time would be acceptable.
I cannot push the configuration to the machines (via Ansible for instance) - not all of them are reachable from everywhere directly - but all of them can reach the Internet.
There are two parts to your question. Part one is not specific to fish. For systems I use on a regular basis I use Dropbox. I put my ~/.config/fish directory in a Dropbox directory and symlink to it. For machines I use infrequently, such as VMs I use for investigating problems unique to a distro, I use rsync to copy from my main desktop machine. For example,
rsync --verbose --archive --delete -L --exclude 'fishd.*' krader@macpro:.config .
Note the exclusion of the fishd.* pattern. That's part two of your question and is unique to fish. Files in your ~/.config/fish directory named with that pattern are the universal variable storage and are currently unique for each machine. We want to change that -- see https://github.com/fish-shell/fish-shell/issues/1912. The problem is that file contains the color theme variables. So to copy your color theme requires exporting those vars on one machine:
set -U | grep fish_color_
Then doing set -U on the new machine for each line of output from the preceding command. Obviously if you have other universal variables you want synced you should just do set -U and import all of them.
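For instance, if the previous command printed lines like these (the values here are only illustrative):
fish_color_command 005fd7
fish_color_comment 990000
you would run the following on the new machine:
set -U fish_color_command 005fd7
set -U fish_color_comment 990000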
Disclaimer: I wouldn't choose this solution myself. Using a cloud storage client as Kurtis Rader suggested, or a periodic cron job to pull changes from a git repository (+ symlinks), seems a lot easier and more robust.
On those systems where you can't or don't want to sync with your cloud storage, you can download the configuration file specifically, using curl for example. Some precious I/O time can be saved by using HTTP cache-control mechanisms. With or without cache control, you will still need to open a connection to a remote server each time (or every X runs, or after Y time has passed), and that already wastes quite some time.
Following is a suggestion for such a fish script, to get you started:
#!/usr/bin/fish
set -l TMP_CONFIG /tmp/shared_config.fish
curl -s -o $TMP_CONFIG -D $TMP_CONFIG.headers \
    -H "If-None-Match: \"$SHARED_CONFIG_ETAG\"" \
    https://raw.githubusercontent.com/woj/dotfiles/master/fish/config.fish
if test -s $TMP_CONFIG
    mv $TMP_CONFIG ~/.config/fish/conf.d/shared_config.fish
    set -U SHARED_CONFIG_ETAG (sed -En 's/ETag: "(\w+)"/\1/p' $TMP_CONFIG.headers)
end
Notes:
Warning: Not tested nearly enough
Assumes fish v2.3 or higher.
sed behavior varies from platform to platform.
Replace woj/dotfiles/master/fish/config.fish with the repository, branch and path that apply to your case.
You can run this from a cron job (see the crontab example below), but if you insist on updating the configuration file on every init, change the script to place the configuration in a path that's not already automatically loaded by fish, e.g.:
mv $TMP_CONFIG ~/.config/fish/shared_config.fish
and in your config.fish run this whole script file, followed by a
source ~/.config/fish/shared_config.fish
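If you go the cron route instead, a crontab entry along these lines would refresh the shared config periodically (the script location below is just an assumed path):
# refresh the shared fish config once an hour
0 * * * * /usr/bin/fish /home/you/bin/update_shared_config.fish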

Transfer a MongoDB database over an unstable connection

I have a fairly small MongoDB instance (15GB) running on my local machine, but I need to push it to a remote server so my partner can work on it. The problem is twofold:
The server only has 30GB of free space
My local internet connection is very unstable
I tried copyDatabase to transfer it directly, but it would take approximately 2 straight days to finish, in which the connection is almost guaranteed to fail at some point. I have also tried both mongoexport and mongodump but both produce files that are ~40GB, which won't fit on the server, and that's ignoring the difficulties of transferring 40GB in the first place.
Is there another, more stable method that I am unaware of?
Since your mongodump output is much larger than your data, I'm assuming you are using MongoDB 3.0+ with the WiredTiger storage engine and your data is compressed but your mongodump output is not.
As at MongoDB 3.2, the mongodump and mongorestore tools now have support for compression (see: Archiving and Compression in MongoDB Tools). Compression is not used by default.
For your use case as described I'd suggest:
Use mongodump --gzip to create a dump directory with compressed backups of all of your collections.
Use rsync --partial SRC .... DEST or similar for a (resumable) file transfer over your unstable internet connection.
NOTE: There may be some directories you can tell rsync to ignore with --exclude; for example the local and test databases can probably be skipped. Alternatively, you may want to specify a database to backup with mongodump --gzip --db dbname.
Your partner can use a similar rsync commandline to transfer to their environment, and a command line like mongorestore --gzip /path/to/backup to populate their local MongoDB instance.
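Put together, a rough sketch of that workflow could look like this (host names and paths are placeholders):
# On your local machine: compressed dump of just the database you need
mongodump --gzip --db mydb --out /backups/mydb-dump
# Resumable transfer over the unstable link; re-run the same command until it completes
rsync --partial --archive /backups/mydb-dump/ remote-host:/backups/mydb-dump/
# On the remote server: restore from the compressed dump
mongorestore --gzip /backups/mydb-dump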
If you are going to transfer dumps on an ongoing basis, you will probably find rsync's --checksum option useful to include. Normally rsync transfers "updated" files based on a quick comparison of file size and modification time. A checksum involves more computation but would allow skipping collections that have identical data to previous backups (aside from the modification time).
If you need to sync data changes on an ongoing basis, you may also be better off moving your database to a cloud service (e.g. a Database-as-a-Service provider like MongoDB Atlas, or your own MongoDB instance).

monitoring number of postgresql connections

I'd like to monitor the number of concurrent PostgreSQL connections from my Zabbix server, so I've created a cron job that writes the COUNT of rows in pg_stat_activity to a file, which Zabbix reads once a minute.
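Roughly, the cron job looks something like this (the output path is just an example, and it assumes a user that can connect to PostgreSQL):
# crontab entry: write the current connection count to a file every minute
* * * * * psql -At -c "SELECT count(*) FROM pg_stat_activity;" > /tmp/pg_connection_count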
My problem is that I might have a scenario where I get a COUNT of, say, 10, then have a quick peak of 50 connections, and drop back to 10 before I do the COUNT again.
In this case the peak would not be noticed.
I've wondered about some counter being incremented/decremented on each connection/disconnection, but couldn't figure out how to do this.
Also, I could do the COUNT at a higher frequency and keep a per-minute average, but that does not solve the problem.
Any thoughts on the matter?
Thanks,
Gabriel
Use log files. Here is a quick tutorial for Linux.
1)
Find out where the postgresql.conf file is located:
postgres=# show config_file;
┌──────────────────────────────────────────┐
│                config_file               │
├──────────────────────────────────────────┤
│ /etc/postgresql/9.5/main/postgresql.conf │
└──────────────────────────────────────────┘
2)
Find and edit these parameters in it (save a copy of the file somewhere first):
log_connections = on
log_disconnections = on
log_destination = 'csvlog'
logging_collector = on
3)
Restart PostgreSQL:
sudo service postgresql restart
(not sure but probably sudo service postgresql reload will be enough)
4)
Find out where the logs are stored:
postgres=# show data_directory; show log_directory;
┌──────────────────────────────┐
│        data_directory       │
├──────────────────────────────┤
│ /var/lib/postgresql/9.5/main │
└──────────────────────────────┘
┌───────────────┐
│ log_directory │
├───────────────┤
│ pg_log        │
└───────────────┘
5)
Almost done. In the files /var/lib/postgresql/9.5/main/pg_log/*.csv you will find records about connections/disconnections. It is up to you how to deal with this info.
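For example, assuming the default csvlog column order (timestamp first), a per-minute count of new connections can be pulled from those files with something like:
# count "connection authorized" events per minute from the CSV logs
grep -h 'connection authorized' /var/lib/postgresql/9.5/main/pg_log/*.csv \
  | cut -d',' -f1 | cut -c1-16 | sort | uniq -c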

How to restore a single mongodb database with oplogReplay?

I'm having trouble finding if this is even a supported operation. I found things that suggest it didn't use to be, but I'm not getting any logs that indicate it's not supported (just confusing logs) and I wasn't able to find anything in mongo's docs that indicate it's not allowed.
For reference:
OS: Centos6.6
Mongod: v3.0.2
Mongo Shell: v3.0.2
Mongodump: v3.0.2
Mongorestore: v3.0.2
Here's the command I'm running to create my dump (I am using auth):
mongodump -u username -p password --authenticationDatabase admin --oplog
Here's the original file structure after a dump.
└── dump
├── oplog.bson
├── admin
│   ├── system.users.bson
│   ├── system.users.metadata.json
│   ├── system.version.bson
│   └── system.version.metadata.json
├── dogs
│   ├── tails.bson
│   └── tails.metadata.json
└── mydata
    ├── objects.bson
    ├── objects.metadata.json
    ├── fs.chunks.bson
    ├── fs.chunks.metadata.json
    ├── fs.files.bson
    ├── fs.files.metadata.json
    ├── configuration.bson
    └── configuration.metadata.json
I've tried a few different variations of restore to get what I want, but they each seem a little off. After reading the following in mongo's docs concerning mongorestore:
--db does not control which BSON files mongorestore restores. You must use the mongorestore path option to limit that restored data.
it seems to me that I should be able to copy the oplog.bson into the particular database's folder that I want to restore and then run the following from inside dump/:
mongorestore -u username -p password --authenticationDatabase admin --oplogReplay --db dogs dogs
I found this confusing because it gives these logs:
2015-05-13T22:10:12.694+0000 building a list of collections to restore from dogs dir
2015-05-13T22:10:12.695+0000 reading metadata file from dogs/tails.metadata.json
2015-05-13T22:10:12.695+0000 restoring dogs.oplog from file dogs/oplog.bson
2015-05-13T22:10:12.696+0000 no indexes to restore
2015-05-13T22:10:12.696+0000 finished restoring dogs.oplog
2015-05-13T22:10:12.696+0000 restoring dogs.tails from file dogs/tails.bson
2015-05-13T22:10:12.697+0000 restoring indexes for collection dogs.tails from metadata
2015-05-13T22:10:12.697+0000 finished restoring dogs.tails
2015-05-13T22:10:12.697+0000 replaying oplog
2015-05-13T22:10:12.697+0000 no oplog.bson file in root of the dump directory, skipping oplog application
2015-05-13T22:10:12.697+0000 done
The first part about dogs.oplog makes it seem as if things are working; however, the later message about the oplog confuses me.
No matter which variations of directories and paths I try, I can't seem to get rid of this message in particular:
2015-05-13T22:10:12.697+0000 replaying oplog
2015-05-13T22:10:12.697+0000 no oplog.bson file in root of the dump directory, skipping oplog application
Does this mean my oplog replay isn't happening? Is my point-in-time backup / restore still doing what I expect? I recall seeing some tickets about improving the log messages of mongotools, perhaps this is just poor logging?

Nothing changed (missing files, see hg status) when committing

I mapped an FTP site to a drive, went and did hg init,
then added a file, did hg add, then hg commit -u username -m 'message'
I am getting the message nothing changed (2 missing files, see hg status)
hg status return this:
X:\public_html>hg status
A .htaccess
A index.html
I can't seem to find anyone else with a problem remotely close to mine, and the official docs didn't help me either.
I'm out of ideas, every bit of information is appreciated.
It seems incredibly unlikely this is going to work on a FTP mapped drive. Version control systems rely on coherent filesystem primitives (lock counts, etc.) that your mapping software likely doesn't fake correctly enough. Mercurial has its own protocol for moving changes to/from a computer (push and pull over HTTP or SSH) and that's the right way to get stuff to and from the machine on which the FTP server is running.
That said, you might have a small bit of luck with:
hg commit -u username -m 'message' .htaccess index.html
if the problem is commit not detecting the files as modified/added.
It looks like the server you're FTPing to/from is a Linux box, so it's already running sshd. That means you can clone to it with:
hg clone c:\localclone ssh://you@there//full/path/to/repo
and can push/pull from that URL as well.
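Once that clone exists, a plain push from inside c:\localclone (same placeholder URL as above) keeps the remote repository up to date:
hg push ssh://you@there//full/path/to/repo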