Version Control in BigQuery [duplicate] - version-control

I am working with BigQuery, and a few hundred views have been created. Most of them are not used and should be deleted. However, there is a chance that some are still in use, so I cannot just blindly delete them all. I therefore need to back up all the view definitions before deleting anything.
Does anyone know of a good way to do this? I am not trying to save the data, just the view definition queries and their names.
Thanks for reading!

Building on the existing answer, you can automate backing up all the views by parsing the output of bq with jq:
#!/bin/bash
# List all datasets in the default project (skip the header lines of the sparse output)
DATASETS=$(bq ls --format=sparse | tail -n+3)
for d in $DATASETS; do
    # List every table/view in the dataset as "id, type" pairs
    TABLES=$(bq ls --format=prettyjson "$d" | jq '.[] | "\(.id), \(.type)"')
    IFS=$'\n'
    for table in $TABLES; do
        # Skip anything that is not a view
        [[ ! "$table" == *VIEW* ]] && continue
        view=$(echo "$table" | sed -e 's/"//g' | cut -d , -f 1)
        # Extract the view's SQL and write it to <view-id>.sql
        query=$(bq show --format=prettyjson "$view" | jq -r '.view.query')
        echo -e "$query" > "$view.sql"
    done
done
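If you later need to recreate one of the views from its backup, bq mk --view can take the saved SQL. A minimal sketch (the dataset and view names are placeholders, and --use_legacy_sql=false assumes the view was written in standard SQL):
# Hypothetical restore: recreate a view from a backed-up definition file
bq mk --use_legacy_sql=false --view "$(cat mydataset.myview.sql)" mydataset.myview_restored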

Part 1.
Issue the bq ls command. The --format flag can be used to control the output. If you are listing views in a project other than your default project, add the project ID to the dataset in the following format: [PROJECT_ID]:[DATASET].
bq ls --format=pretty [PROJECT_ID]:[DATASET]
Where:
[PROJECT_ID] is your project ID.
[DATASET] is the name of the dataset.
When you run the command, the Type field displays either TABLE or VIEW. For example:
+---------+-------+---------------------+-------------------+
| tableId | Type  | Labels              | Time Partitioning |
+---------+-------+---------------------+-------------------+
| mytable | TABLE | department:shipping |                   |
| myview  | VIEW  |                     |                   |
+---------+-------+---------------------+-------------------+
Part 2.
Issue the bq show command. The --format flag can be used to control the output. If you are getting information about a view in a project other than your default project, add the project ID to the dataset in the following format: [PROJECT_ID]:[DATASET]. To write the view properties to a file, add > [PATH_TO_FILE] to the command.
bq show --format=prettyjson [PROJECT_ID]:[DATASET].[VIEW] > [PATH_TO_FILE]
Where:
[PROJECT_ID] is your project ID.
[DATASET] is the name of the dataset.
[VIEW] is the name of the view.
[PATH_TO_FILE] is the path to the output file on your local machine.
Examples:
Enter the following command to display information about myview in mydataset. mydataset is in your default project.
bq show --format=prettyjson mydataset.myview
Enter the following command to display information about myview in mydataset. mydataset is in myotherproject, not your default project. The view properties are written to a local file — /tmp/myview.json.
bq show --format=prettyjson myotherproject:mydataset.myview > /tmp/myview.json
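If you only want the view's SQL rather than the full JSON output, jq can pull out the .view.query field used in the script further up (myview.sql is just an example output name):
# Save only the view's SQL definition, not its metadata
bq show --format=prettyjson mydataset.myview | jq -r '.view.query' > myview.sql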

You could try using bqup, which is a Python script that some colleagues and I have been using to regularly back up BigQuery views and table schemata.

Related

How do I fuzzy-find all files containing specific text using ripgrep and fzf and open them in VSCode

I have the following command to fuzzy-find files on the command line and open the selected file in VSCode:
fzf --print0 -e | xargs -0 -r code
Now I also want to be able to search file contents for a string. I can find the searched string on the command line:
rg . | fzf --print0 -e
but opening the file in VSCode no longer works with this command:
rg . | fzf --print0 -e | xargs -0 -r code
because VSCode is passed a single string containing both the file name and the matched text, which of course does not name an existing file.
How can I combine the two commands above so that the file containing the searched string is passed to VSCode?
The --vimgrep option to rg returns just what the doctor ordered. That's what I used for the VSCode extension issue you filed:
https://github.com/rlivings39/vscode-fzf-quick-open/commit/101a6d8e44b707d11e661ca10aaf37102373c644
It returns data like:
$ rg --vimgrep
extension.ts:5:5:let fzfTerminal: vscode.Terminal | undefined = undefined;
extension.ts:6:5:let fzfTerminalPwd: vscode.Terminal | undefined = undefined;
Then you can cut out the first 3 fields and pass them to code -g:
rg --vimgrep --color ansi | fzf --ansi --print0 | cut -z -d : -f 1-3 | xargs -0 -r code -g
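If you use this often, you could wrap the pipeline in a small shell function. A sketch (the name rgcode is just an illustration; the pattern defaults to "." so it behaves like the command above when called without arguments):
# Hypothetical helper: fuzzy-search file contents and open the pick in VSCode at the matching line
rgcode() {
    rg --vimgrep --color ansi "${1:-.}" | fzf --ansi --print0 | cut -z -d : -f 1-3 | xargs -0 -r code -g
}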

gsutil command to delete old files from last day

I have a bucket in Google Cloud Storage with a tmp folder in it. Thousands of files are created in this directory every day, and I want to delete files older than one day every night. I could not find a gsutil argument for this job, so I had to use a classic, simple shell script. But the files are being deleted very slowly.
I have 650K files accumulated in the folder, and 540K of them must be deleted. My shell script ran for a whole day and only managed to delete 34K files.
The gsutil lifecycle feature cannot do exactly what I want: it cleans the whole bucket, whereas I just want to regularly delete the files under a certain folder. I also want the deletion to be faster.
I'm open to your suggestions and your help. Can I do this with a single gsutil command, or with a different method?
Here is the simple script I created for testing (I prepared it to delete files in bulk temporarily):
## step 1 - I pull the file list together with their dates and save it to list1.txt.
gsutil -m ls -la gs://mygooglecloudstorage/tmp/ | awk '{print $2,$3}' > /tmp/gsutil-tmp-files/list1.txt
## step 2 - I filter the information saved in list1.txt: based on the current date, I save the older files to list2.txt.
cat /tmp/gsutil-tmp-files/list1.txt | awk -F "T" '{print $1,$2,$3}' | awk '{print $1,$3}' | awk -F "#" '{print $1}' |grep -v `date +%F` |sort -bnr > /tmp/gsutil-tmp-files/list2.txt
## step 3 - I replace the first field of each line with the gsutil delete command, turning the list into a shell script.
cat /tmp/gsutil-tmp-files/list2.txt | awk '{$1 = "/root/google-cloud-sdk/bin/gsutil -m rm -r "; print}' > /tmp/gsutil-tmp-files/remove-old-files.sh
## step 4 - I set the script permissions and delete the old lists.
chmod 755 /tmp/gsutil-tmp-files/remove-old-files.sh
rm -rf /tmp/gsutil-tmp-files/list1.txt /tmp/gsutil-tmp-files/list2.txt
## step 5 - I run the shell script and remove it once it is done.
/bin/sh /tmp/gsutil-tmp-files/remove-old-files.sh
rm -rf /tmp/gsutil-tmp-files/remove-old-files.sh
There is a very simple way to do this, for example:
gsutil -m ls -l gs://bucket-name/ | grep 2017-06-23 | grep .jpg | awk '{print $3}' | gsutil -m rm -I
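For the tmp/ folder in the question, the same pattern can be pointed at a specific day, e.g. yesterday. A sketch, assuming GNU date for the -d option and the bucket name from the question:
gsutil -m ls -l gs://mygooglecloudstorage/tmp/ | grep "$(date -d yesterday +%F)" | awk '{print $3}' | gsutil -m rm -I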
There isn't a simple way to do this with gsutil or object lifecycle management as of today.
That being said, would it be feasible for you to change the naming format for the objects in your bucket? That is, instead of uploading them all under "gs://mybucket/tmp/", you could append the current date to that prefix, resulting in something like "gs://mybucket/tmp/2017-12-27/" (see the sketch after this list). The main advantages to this would be:
Not having to do a date comparison for every object; you could run gsutil ls "gs://mybucket/tmp/" | grep "gs://[^/]\+/tmp/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/$" to find those prefixes, then do date comparisons on the last portion of those paths.
Being able to supply a smaller number of arguments on the command line (prefixes, rather than the name of each individual file) to gsutil -m rm -r, thus being less likely to pass in more arguments than your shell can handle.
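A minimal sketch of that nightly cleanup, assuming the gs://mybucket/tmp/YYYY-MM-DD/ layout described above and GNU date (the bucket and prefix names are placeholders):
#!/bin/bash
# Sketch only: delete dated prefixes older than yesterday under gs://mybucket/tmp/
BUCKET="gs://mybucket"
CUTOFF=$(date -d "1 day ago" +%F)    # keep today and yesterday
gsutil ls "$BUCKET/tmp/" |
grep "gs://[^/]\+/tmp/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/$" |
while read -r prefix; do
    d=$(basename "$prefix")          # the YYYY-MM-DD portion
    # ISO dates compare correctly as plain strings
    if [[ "$d" < "$CUTOFF" ]]; then
        gsutil -m rm -r "$prefix"
    fi
done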

Is there a way to see a count of all contributions made to GitHub by me?

The calendar shows contributions made in the last year. Is there a way to see a similar count but without a start date restriction?
Is there a way to see a similar count but without a start date restriction?
No, but I built git-stats, a tool that tracks your local commits and shows graphs like GitHub does.
An example with my graphs.
You can use the GitHub API to retrieve statistics on your repositories and a few lines of code to generate a global count.
Note: the request limit for unauthenticated access is pretty low. I advise you to generate a token (Settings > Developer settings > Personal access tokens) with the Access commit status and Read all user profile data rights.
Here is a small bash script using curl and jq. You just have to change the user name. You can also uncomment the AUTH line and set your generated token to avoid hitting the query limit:
#!/bin/bash

# Parameters
USER=jyvet
#AUTH="-u $USER:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
GAPI="https://api.github.com"

REPOS=$(curl $AUTH -s $GAPI/users/$USER/repos | jq -c -r '.[].name')
COMMITS=0
ADDITIONS=0
DELETIONS=0

# Iterate over all the repositories owned by the user
for r in $REPOS; do
    STATS=$(curl $AUTH -s "$GAPI/repos/$USER/$r/stats/contributors" |
            jq ".[] | select(.author.login == \"$USER\")" 2> /dev/null)
    if [ $? -eq 0 ]; then
        tmp=$(echo -n "$STATS" | jq '.total' 2> /dev/null)
        COMMITS=$(( COMMITS + tmp ))
        tmp=$(echo -n "$STATS" | jq '[.weeks[].a] | add' 2> /dev/null)
        ADDITIONS=$(( ADDITIONS + tmp ))
        tmp=$(echo -n "$STATS" | jq '[.weeks[].d] | add' 2> /dev/null)
        DELETIONS=$(( DELETIONS + tmp ))
    fi
done

echo "Commits: $COMMITS, Additions: $ADDITIONS, Deletions: $DELETIONS"
Result:
> Commits: 193, Additions: 20403, Deletions: 2687
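Note that the repository listing is paginated (30 entries per page by default), so if you own more repositories than that, raise per_page (maximum 100) or loop over the pages; for example:
# Fetch up to 100 repositories in one call instead of the default 30
REPOS=$(curl $AUTH -s "$GAPI/users/$USER/repos?per_page=100" | jq -c -r '.[].name')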

Zenity --list --checklist issue

I'm trying to create a Zenity list to select kernel versions for removal.
So far I have:
dpkg -l | grep linux-image- | cut -f 3 -d ' ' | sed -e 's/^/FALSE /' | zenity --list --checklist --title="Select the Kernel versions to remove" --column="Kernel Version"
Most of this works in isolation, but I can't get the checkbox bits to work at all.
I just end up with a list of unchecked checkboxes and no corresponding items.
Finally figured it out, though I couldn't find it explained anywhere...
You need to specify a column name for all columns INCLUDING the checkbox column.
AND there was no need to include the word FALSE at the start of every line, as the Zenity help pages and examples I read implied. Strange.
So:
dpkg -l | grep linux-image- | cut -f 3 -d ' ' | zenity --list --checklist --title="Select the Kernel versions to remove" --column="Remove?" --column="Kernel Version"
works perfectly now (other than a GLib-WARNING... Bad file descriptor (9) on my system which is another issue).
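To act on the selection, you can capture zenity's output. A sketch, assuming zenity prints the checked kernel versions joined by the --separator string and that apt-get purge is how you remove them:
# Hypothetical follow-up: remove the kernels the user ticked
selected=$(dpkg -l | grep linux-image- | cut -f 3 -d ' ' | zenity --list --checklist \
    --title="Select the Kernel versions to remove" \
    --column="Remove?" --column="Kernel Version" --separator=" ")
[ -n "$selected" ] && sudo apt-get purge $selected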

Dynamically building an exclude list for both rsync and egrep formats

I wonder if anyone out there can assist me in trying to solve an issue I have.
I have written a set of shell scripts with the purpose of auditing remote file systems based on a GOLD build on an audit server.
As part of this, I do the following:
1) Use rsync to work out any new files or directories, any modified or removed files
2) Use find ${source_filesystem} -ls on both local & remote to work out permissions differences
Now as part of this there are certain files or directories that I am excluding, e.g. logs, trace files, etc.
So in order to achieve this I use 2 methods:
1) RSYNC - I have an exclude-list that is added using the --exclude-from flag
2) find -ls - I use an egrep -v statement to exclude the same entries as the rsync exclude-list:
e.g. find -L ${source_filesystem} -ls | egrep -v "$SEXCLUDE_supt"
So my issue is that I have to maintain 2 separate lists, and this is a bit of an admin nightmare.
I am looking for some assistance or advice on whether it is possible to dynamically build a single list of exclusions that can be used for both the rsync and the find -ls.
Here is the format of what the exclude lists look like:
RSYNC:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_*.ear
dcswebrdtEAR_weblogic_*.ear
FIND:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver_weblogic_|dcswebrdtEAR_weblogic_"
You don't need to create a second list for your find command. grep can handle a list of patterns using the -f flag. From the manual:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
Here's what I'd do:
find -L ${source_filesystem} -ls | grep -Evf your_rsync_exclude_file_here
This should also work for file names containing spaces. Please let me know how it goes.
In the end the grep -Evf approach was a bit of a nightmare, because rsync does not use regular expressions; its exclude patterns look similar but are not the same.
So I then pursued my other idea of dynamically building the exclude list for egrep by parsing the rsync exclude-list and building the variable on the fly to pass into egrep.
This is the method I used:
#!/bin/ksh
# Create Signature of current build
AFS=$1

#Create Signature File
crSig()
{
    find -L ${SRC} -ls | egrep -v "$SEXCLUDE" | awk '{fws = ""; for (i = 11; i <= NF; i++) fws = fws $i " "; print $3, $6, fws}' | sort >${BASE}/${SIFI}.${AFS}
}

#Setup SRC, TRG & SCROOT
LoadAuditReqs()
{
    export SRC=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $2'}`
    export TRG=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $3'}`
    export SCROOT=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $4'}`
    export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
    export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
}

#Load Properties File
LoadProperties()
{
    . /users/rpapp/rpmonit/audit_tool/conf/environment.properties
}

#Functions
LoadProperties
LoadAuditReqs
crSig
So with these new variables:
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
I use them to remove "*" and "/", then escape the special characters by prepending "\" to each of them.
Then, using tr, I replace each newline with "|" and finally strip the trailing "|" to produce the $SEXCLUDE variable used by egrep in the crSig function.
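For example, feeding a few entries from the exclude-list above through that pipeline (printf is used here purely for illustration) produces:
$ printf '%s\n' '*.log' '8.6_Code' '**/lost+found*/' |
    sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' | tr "\n" "|" | sed 's/\(.*\)|/\1/'
\.log|8\.6\_Code|lost\+found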
What do you think?