How to remove Grafana data periodically?

I have deployed a Grafana monitoring system. It saves its database in the "/home/user/data" directory. The problem is that the data persists forever, so the filesystem fills up, and I would like to remove this data periodically, for example weekly.

You do not say what data you would like to remove or what is generating it (is it logs?). It seems strange to simply delete data from your database; won't your users miss their dashboards?
I am going to assume you want to remove data from the database. There are a few ways to do this.
Save a clean copy of the SQLite database and then replace the database file with it once a week. This will lose all the data accumulated in between.
For most data saved in the database, you could use the Grafana API to remove data.
An example would be to remove dashboards. With curl and basic auth:
curl -X DELETE http://admin:admin@localhost:3000/api/dashboards/db/testdash
Use the sqlite3 CLI to write SQL queries that delete data directly.
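For instance, a minimal sketch assuming the default SQLite file at /home/user/data/grafana.db and that the bulk of the growth is old annotations; the table and column names are assumptions you should verify with .tables / .schema first, and Grafana should be stopped before touching the file directly:
sqlite3 /home/user/data/grafana.db <<'SQL'
-- delete annotations older than 7 days (epoch is assumed to be in milliseconds)
DELETE FROM annotation WHERE epoch < (strftime('%s', 'now', '-7 days') * 1000);
-- reclaim the freed space on disk
VACUUM;
SQL
A cron entry running this once a week would cover the "weekly" part of the question.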

Related

How to replicate a Postgres DB with only a sample of the data

I'm attempting to mock a database for testing purposes. What I'd like to do is, given a connection to an existing Postgres DB, retrieve the schema, limit the data pulled to 1000 rows from each table, and persist both of these components as a file which can later be imported into a local database.
pg_dump doesn't seem to fulfill my requirements, as there's no way to tell it to retrieve only a limited number of rows from each table; it's all or nothing.
The COPY/\copy commands can help fill this gap; however, there doesn't seem to be a way to copy data from multiple tables into a single file. I'd rather avoid creating one file per table. Is there a way to work around this?
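One workaround, sketched roughly here (the connection string, schema filter and row limit are placeholders, and sampled child rows may violate foreign keys on import), is to dump the schema alone and then append a capped COPY block per table to the same SQL file:
SRC="postgresql://user:pass@host/dbname"
OUT="sample.sql"
# schema only, no data
pg_dump --schema-only "$SRC" > "$OUT"
# ordinary tables in the public schema
TABLES=$(psql "$SRC" -At -c "SELECT quote_ident(tablename) FROM pg_tables WHERE schemaname = 'public';")
for t in $TABLES; do
  {
    echo "COPY public.$t FROM stdin;"
    psql "$SRC" -c "\\copy (SELECT * FROM public.$t LIMIT 1000) TO STDOUT"
    echo '\.'
  } >> "$OUT"
done
The resulting file can then be replayed into a local database with psql -f sample.sql.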

How to update a postgres database in kubernetes using scripts [CI/CD]?

I have a Postgres database deployed in Kubernetes, attached to a PVC [with RWX access mode]. What is the right way to update the database (e.g. create a table) through my CI/CD instead of logging in to the pod and running queries [without deleting the PVC]?
My understanding is that the background reason for your question is how to deploy DB structure changes onto production with minimal downtime. For that I'd go with Blue-Green Deployments [1].
After your comments, I assume that you already have a running instance of PostgreSQL and would like to modify the content of the DB by altering the file structure directly on "disk" (in this case, the PVC).
Modifying the data structure directly on disk is not the best idea if we are speaking about data integrity, etc.
The reasons for that statement are explained in this article [2], which describes exactly how PostgreSQL stores data on disk.
PostgreSQL (by default) writes blocks of data (what PostgreSQL calls pages) to disk in 8k chunks.
Additionally, there is a mapping between a table and its file path, so PostgreSQL knows exactly which file stores which table:
SELECT pg_relation_filepath('test_data');
pg_relation_filepath
----------------------
base/20886/186770
In this example the file /database/base/20886/186770 contains the actual data for the table test_data.
"What is the right way to update the database instead of logging in to the pod and running queries?"
However, if you are sure that you have a complete set of files the DB needs to operate (like the ones you work with during pg_dump / pg_restore), you can try placing that data on another PVC and recreating the pod; however, that will still result in downtime.
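As a sketch of the CI/CD route itself (the service name, image tag, CI secret variable and migration.sql file are placeholders to adapt to your setup), you can apply versioned DDL from the pipeline with a throwaway client pod instead of exec-ing into the database pod:
kubectl run psql-migrate --rm -i --restart=Never \
  --image=postgres:15 \
  --env="PGPASSWORD=${DB_PASSWORD}" \
  -- psql -h postgres -U app -d appdb -v ON_ERROR_STOP=1 < migration.sql
Keeping migration.sql in the repository means every schema change goes through the same review and deployment process as the application code.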
Hope that helps.

Insert MongoDB document with an objectId that existed in the past

I have a bunch of collections (12) and I need to rename many of their fields. I can't do it live in the database; all I can do is download and re-upload a dump of it.
So I've downloaded the collections with mongodump, manipulated the data, and I'm planning to use mongorestore to push it back to the database.
I'm wondering what will happen with the ObjectIds. I know that an ObjectId is unique throughout the database, so I'm thinking of deleting all the old data right before running mongorestore. Is that OK, or will I still have problems with the ids?
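Roughly, this is the round trip I have in mind (database and path names are placeholders); as I understand it, mongorestore's --drop flag drops each target collection before restoring it, so the old documents and their _ids would be gone before the modified ones are inserted:
mongodump --db mydb --out ./dump
# ...rename the fields inside ./dump...
mongorestore --db mydb --drop ./dump/mydb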
You can specify whatever value you want for an ObjectId; you can even use a string instead of an ObjectId.
If you have a production app, you need to perform the upgrade and migrate the data through the application itself, step by step.
If you have a single-process, single-threaded application, or if you can run your app that way, that is the simplest case. Otherwise you need a synchronization service.
Be careful with async/await, promises, and other asynchronous processes: they read data into memory at one point in time and continue processing that data at a later point, and you need to keep that in mind.
You need to:
modify the service so that it can handle both data formats
create migration code that goes through all the data and migrates it (a sketch of this step is shown below)
modify the service to accept only the new data format once the migration is done
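As an illustration of the migration step only, a one-off pass of that kind could look like this (database, collection and field names are placeholders):
mongosh "mongodb://localhost:27017/mydb" --eval '
  db.customers.updateMany(
    { old_field: { $exists: true } },          // only documents not yet migrated
    { $rename: { "old_field": "new_field" } }  // per-document rename
  );
'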

Remove a series in Graphite when using postgres for storage?

I just installed Graphite using Postgres for storage and I am sending data to Graphite using statsd. Works fine!
My issue is that I created a bunch of series (mostly gauges) just for testing, and I want them gone, but I see no way to delete them. I have no Whisper files to delete since I am using Postgres.
Looking at the tables in Postgres for the Graphite database, I see nothing that contains the series. I see my custom graphs and my user, but nowhere in the Graphite database can I find my testing series to blow away.
Any pointers? Are the series not kept in the postgres DB?
Graphite only uses PostgreSQL/MySQL/SQLite for storing user profiles, saved graphs & dashboards, and events (annotation-style data). Time-series metrics are stored in the native Whisper files. In most cases these files will exist under /opt/graphite/storage/whisper/.
Say you sent a metric by accident named foo.bar.baz. This file will exist at /opt/graphite/storage/whisper/foo/bar/baz.wsp and can be deleted from the command-line with sudo rm /opt/graphite/storage/whisper/foo/bar/baz.wsp.
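If you sent a whole tree of test metrics, you can remove the subtree in one go instead of one file at a time (the foo.test prefix below is just an example):
sudo find /opt/graphite/storage/whisper/foo/test -name '*.wsp' -delete
# clean up the now-empty directories as well
sudo find /opt/graphite/storage/whisper/foo/test -type d -empty -delete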

Mongodb to redshift

We have a few collections in mongodb that we wish to transfer to redshift (on an automatic incremental daily basis).
How can we do it? Should we export from Mongo to CSV?
I wrote some code to export data from Mixpanel into Redshift for a client. Initially the client was exporting to Mongo, but we found Redshift offered very large performance improvements for queries. So first of all we transferred the data out of Mongo into Redshift, and then we came up with a direct solution that transfers the data from Mixpanel to Redshift.
To store JSON data in Redshift, you first need to create SQL DDL for the schema in Redshift, i.e. a CREATE TABLE script.
You can use a tool like Variety to help as it can give you some insight into your Mongo schema. However it does struggle with big datasets - you might need to subsample your dataset.
Alternatively, ddlgenerator can generate DDL from various sources, including CSV or JSON. It also struggles with large datasets (at least with the dataset I was dealing with, which was 120 GB).
So in theory you could use mongoexport to generate CSV or JSON from Mongo and then run it through ddlgenerator to get a DDL.
In practice I found using JSON export a little easier because you don't need to specify the fields you want to extract. You need to select the JSON array format. Specifically:
mongoexport --db <your db> --collection <your_collection> --jsonArray > data.json
head data.json > sample.json
ddlgenerator postgresql sample.json
Here - because I am using head - I use a sample of the data to show the process works. However, if your database has schema variation, you want to compute the schema based on the whole database which could take several hours.
Next you upload the data into Redshift.
If you have exported JSON, you need to use Redshift's Copy from JSON feature. You need to define a JSONpath to do this.
For more information check out the Snowplow blog - they use JSONpaths to map the JSON on to a relational schema. See their blog post about why people might want to read JSON to Redshift.
Turning the JSON into columns allows much faster queries than other approaches such as using JSON_EXTRACT_PATH_TEXT.
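As a minimal sketch of that COPY step, assuming the exported JSON has already been uploaded to S3 (the bucket, table, column list and IAM role are placeholders; mongoexport writes _id as an extended-JSON object, hence the bracket notation):
cat > jsonpaths.json <<'EOF'
{
  "jsonpaths": [
    "$['_id']['$oid']",
    "$['name']",
    "$['created_at']['$date']"
  ]
}
EOF
aws s3 cp jsonpaths.json s3://my-bucket/jsonpaths.json
psql "$REDSHIFT_URL" -c "
  COPY my_table
  FROM 's3://my-bucket/data.json'
  IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
  JSON 's3://my-bucket/jsonpaths.json';
"
Each entry in the jsonpaths array maps, in order, to a column of the target table.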
For incremental backups, it depends on whether data is being added or data is changing. For analytics, it's normally the former. The approach I used is to export the analytics data once a day, then copy it into Redshift incrementally.
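A sketch of that daily job, assuming the documents carry a created_at timestamp (a placeholder field name), GNU date, and a mongoexport version that accepts extended-JSON dates in --query:
SINCE=$(date -u -d 'yesterday' +%Y-%m-%dT00:00:00Z)
UNTIL=$(date -u -d 'today' +%Y-%m-%dT00:00:00Z)
QUERY="{\"created_at\": {\"\$gte\": {\"\$date\": \"$SINCE\"}, \"\$lt\": {\"\$date\": \"$UNTIL\"}}}"
mongoexport --db mydb --collection events --jsonArray --query "$QUERY" \
  > "events_$(date -u +%F).json"
# then upload the file to S3 and COPY it into Redshift as shown above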
Here are some related resources although in the end I did not use them:
Spotify has an open-source project called Luigi - this code claims to upload JSON to Redshift, but I haven't used it so I don't know if it works.
Amiato have a web page that says they offer a commercial solution for loading JSON data into Redshift - but there is not much information beyond that.
This blog post discusses performing ETL on JSON datasources such as Mixpanel into Redshift.
Related Reddit question
Blog post about dealing with JSON arrays in Redshift
Honestly, I'd recommend using a third party here. I've used Panoply (panoply.io) and would recommend it. It'll take your Mongo collections and flatten them into their own tables in Redshift.
AWS Database Migration Service (DMS) has added support for MongoDB and Amazon DynamoDB, so I think the best option going forward for migrating from MongoDB to Redshift is DMS.
MongoDB versions 2.6.x and 3.x as a database source
Document mode and table mode supported
Supports change data capture (CDC)
Details - http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
A few questions whose answers would be helpful:
Is this an append-only, ever-growing incremental sync, i.e. is data only being added and not updated or removed, or is your Redshift instance interested only in additions anyway?
Is it acceptable for the data to become inconsistent when deletes/updates happen at the source and are not fed to the Redshift instance?
Does it need to be a daily incremental batch, or can it be real-time as the changes happen?
Depending on your situation, mongoexport may work for you, but you have to understand its shortcomings, which are described at http://docs.mongodb.org/manual/reference/program/mongoexport/ .
I had to tackle the same issue (not on a daily basis though).
As already mentioned, you can use mongoexport to export the data, but keep in mind that Redshift doesn't support arrays, so if your collection data contains arrays you'll find it a bit problematic.
My solution to this was to pipe the mongoexport output into a small utility program I wrote that transforms the mongoexport JSON rows into my desired CSV output.
Piping the output also allows you to parallelize the process.
mongoexport allows you to add a MongoDB query to the command, so if your collection data supports it, you can spawn N different mongoexport processes, pipe their results into the transforming program, and decrease the total runtime of the migration.
Later on, I uploaded the files to S3 and performed a COPY into the relevant table.
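Roughly, the pipeline looked like the following (bucket, table and role names are placeholders, the query is just an example of how the work can be split, and ./to_csv stands for the small transforming utility, which is not shown here):
mongoexport --db mydb --collection events --query '{"region": "eu"}' \
  | ./to_csv > events_eu.csv   # run several of these with different queries in parallel
aws s3 cp events_eu.csv s3://my-bucket/events/events_eu.csv
psql "$REDSHIFT_URL" -c "
  COPY events
  FROM 's3://my-bucket/events/'
  IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
  CSV;
"
COPY with an S3 prefix loads every file under it, so each parallel export can simply drop its CSV into the same prefix.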
This should be a pretty easy solution.
Stitch Data is the best tool I've ever seen for replicating incrementally from MongoDB to Redshift within a few clicks and minutes.
It automatically and dynamically detects DML and DDL changes on the tables being replicated.