Cloud SQL 2nd gen (5.7) very slow after Google Maintenance - google-cloud-sql

After Google maintenance on my Cloud SQL instance last Saturday night, access to the database from my GAE app is very slow... this is especially obvious on huge tables. Previously I had very good performance, now it is terrible. Queries are 10 to 20x slower, which causes error 500 on the app side after hitting the 60s timeout... also, the instance CPU is near 100% most of the time, where before it was around 20%.
Read/write operations are also far higher than before...
I've already checked table status, optimized the relevant tables, dropped and rebuilt indexes, etc...
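For reference, those checks were along these lines (a sketch; sms_sms is the table that shows up in the process list below):
CHECK TABLE `sms_sms`;
OPTIMIZE TABLE `sms_sms`;
ANALYZE TABLE `sms_sms`;   -- refresh index statistics after the maintenance window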
It remains the same... I can't get it fixed... the current instance version is 5.7.14-google (Google) on a db-n1-standard-1 tier.
When checking processes, I often see these states: 'Sending data' and 'optimizing'... those processes take a long time and are always related to COUNT queries...
For example:
| 298620 | root | cloudsqlproxy~173.194.90.100 | db_name | Query | 7 | Sending data | SELECT COUNT(*) AS `__count` FROM `sms_sms`
WHERE NOT ((`sms_sms`.`origin` = (`sms_sms`.`recipient`) |
| 298636 | root | cloudsqlproxy~74.125.93.164 | db_name | Query | 8 | optimizing | SELECT COUNT(*) AS `__count` FROM `sms_sms`
How can I overcome this? It seems to be some problem with the last maintenance update, and nothing I do solves it...
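For reference, a minimal way to look at how MySQL executes the count shown above is a plain EXPLAIN (a sketch; the WHERE condition is my reading of the truncated query text in the process list):
EXPLAIN SELECT COUNT(*) AS `__count`
FROM `sms_sms`
WHERE NOT (`sms_sms`.`origin` = `sms_sms`.`recipient`);
SHOW INDEX FROM `sms_sms`;   -- check index cardinality after the maintenance window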
Any help would be appreciated!
Thanks

Related

Should KSQL tables be showing multiple rows per key for aggregates?

My understanding of KSQL tables is that they show an "as is" view of our data rather than all the data.
So if I have a simple aggregating query and I SELECT from my table, I should see the data as it is at this point in time.
My data (stream):
MY_TOPIC_STREAM:
15 | BEACH | Steven Ebb | over there
24 | CIRCUS | John Doe | an adress
30 | CIRCUS | Alice Small | another address
35 | CIRCUS | Barry Share | a home
35 | CIRCUS | Garry Share | a home
40 | CIRCUS | John Mee | somewhere
45 | CIRCUS | David Three | a place
45 | CIRCUS | Mary Three | a place
45 | CIRCUS | Joffrey Three | a place
My table definition:
CREATE TABLE MY_TABLE WITH (VALUE_FORMAT='AVRO') AS
SELECT ROWKEY AS APPLICATION, COUNT(*) AS NUM_APPLICANTS
FROM MY_TOPIC_STREAM
WHERE header->eventType = 'CIRCUS'
GROUP BY ROWKEY;
I am confused as to why I see multiple rows per key in my table, even though the eventual aggregates are correct.
SELECT * FROM MY_TABLE;
APPLICATION NUM_APPLICANTS
24 1
30 1
--> 35 1 <-- why do I see this?
35 2
40 1
--> 45 1 <-- why do I see this?
--> 45 2 <-- why do I see this?
45 3
My sink topic also shows me the same as the table output - presumably this is correct?
I expected my table result to be:
APPLICATION NUM_APPLICANTS
24 1
30 1
35 2
40 1
45 3
Outputs abridged for brevity and readability above, but you get the gist.
So - are my expectations of the table and sink topic outputs off the mark?
UPDATE
Matthias' answer below correctly explains that the table and sink topic show changelog events, so it is normal to see intermediate values. What was confusing me was that I was seeing all of the intermediate rows.
It turned out that this was because I was using the Confluent 5.2.1 docker-compose, which sets the environment variable KSQL_STREAMS_CACHE_MAX_BYTES_BUFFERING=0. This disables caching of intermediate results in KSQL aggregations, so the table shows more rows than expected while eventually arriving at the correct aggregates. Setting this to e.g. 10MB caused the data to be output as expected.
This behaviour is not immediately obvious in the documentation for those starting to play with KSQL and using Docker to stand up the instances! This issue pointed me in the right direction, and this page documents the parameters. I had spent a long time on this and could not work out why it was not behaving as expected; I hope this helps someone.
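For completeness, the session-level equivalent from the KSQL CLI looks like this (a sketch; it assumes your KSQL version accepts Kafka Streams properties via SET, otherwise set the KSQL_STREAMS_CACHE_MAX_BYTES_BUFFERING environment variable on the server as described above):
SET 'cache.max.bytes.buffering' = '10000000';   -- roughly 10 MB; 0 disables caching and surfaces every intermediate aggregate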
Not sure what version you are using; however, SELECT * FROM MY_TABLE; does not return the current content of the table, but the table's changelog stream (this holds for older versions; in newer versions the query you show is not valid, as the syntax was changed).
Since the transition from KSQL to ksqlDB, the query you showed would be called a push query expressed as SELECT * FROM my_table EMIT CHANGES;.
Furthermore, ksqlDB introduced pull queries that allow you to look up the current state. However, SELECT * FROM my_table; is not supported as a pull query yet (it will be added in the future). You can only do table lookups for a specific key, i.e., there must be a WHERE clause on the key at the moment.
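A minimal sketch of both query forms (the key value 45 is taken from the example data above; the exact key column name and quoting depend on your ksqlDB version and schema):
-- push query: streams the table's changelog, including intermediate counts
SELECT * FROM MY_TABLE EMIT CHANGES;
-- pull query: point lookup of the current state for a single key
SELECT * FROM MY_TABLE WHERE ROWKEY = '45';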
Check out the docs for more details: https://docs.ksqldb.io/en/latest/concepts/queries/pull/

How to downsample data older than 14 days only in influxdb and keep it in the same table as raw data?

I have a setup collecting metrics from Telegraf into InfluxDB; Grafana then uses InfluxDB as a data source to display graphs.
My problem is reducing disk usage, so I want to downsample old data (older than 3 days) and keep the new data (younger than 3 days) as is (raw).
I tried InfluxDB Retention Policies (RP) and Continuous Queries (CQ) as described in this guide:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
influxdb ("telegraf")
+----------------------------+
| |
| +-----------------------+ |
| | table disk_raw | |
| | CURRENT RP (RAW) +---------+
| | (deleted after 3d) | | |
| +-----------------------+ | |CQ (average 30 min of datapoints into 1)
| +-----------------------+ | |
| | table_disk_ds | | |
| | LONGTERM RP +<--------+
| |(downsampled, kept 90d)| |
| +-----------------------+ |
| +<----+
+----------------------------+ |
|
|
grafana | grafana query
+----------------------------+ |
| | |
| +----------------------+ | |
| | data graph | +-----+
| +----------------------+ |
| |
+----------------------------+
The problem is that this solution gives you two tables, one for raw data and one for downsampled data, with the CQ constantly writing into the downsampled one.
That is not so good for me because:
- I am using Grafana to query InfluxDB, and a graph reads from a single table; I want one graph covering both old and new data.
- Using two tables increases disk usage.
Is there any way to downsample old records in the very same table?
Configuration example:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
grafana query
SELECT mean("used_percent") FROM "disk" WHERE ("device" = 'dm-0') AND $timeFilter GROUP BY time(10s) fill(none)
EDIT2: Here's a workaround implemented with template variables in Grafana
https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
This seems like a really good solution.
ORIGINAL ANSWER
Looking into the example from the influxdb page you linked:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."orders"
FROM "orders"
GROUP BY time(30m)
END
If you specify the same source and target table, namely orders, in both the INTO and FROM clauses, then the data will be written back into the same table.
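For illustration, a sketch of such a CQ adapted to the disk data from the question (the retention policy names rp_short and rp_long and the database name telegraf are assumptions):
CREATE CONTINUOUS QUERY "cq_disk_30m" ON "telegraf" BEGIN
  SELECT mean("used_percent") AS "used_percent"
  INTO "rp_long"."disk"
  FROM "rp_short"."disk"
  GROUP BY time(30m), *
END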
However, that does not solve your issue.
You would still need two queries to get the data from both retention policies. If you do a generic select * from disk_raw ... Influx will use the default retention policy and return data just from there.
The way you usually go about this is by running two queries and concatenating the results. In a single request something like
select * from rp_short.diskraw; select * from rp_long.diskraw
EDIT:
Here is a discussion of why it's not possible to do what you (and a lot of other people) want https://github.com/influxdata/influxdb/issues/2625
And also some ways to work around it.
Briefly, one way is to handle the downsampled and high-resolution data manually (i.e. not with a CQ) and keep them in the same retention policy. Another is to use a proxy that augments the query depending on its time range in order to get the correct data.

Issue with displaying of information in a Tables/Matrix visual - Power BI

Hi, I'm new to Power BI Desktop, but I have come across an issue when displaying information. Hopefully it's due to my lack of knowledge, but I can't seem to find a way to display values in rows one after the other, similar to pivot table functionality.
For example so if I had the following table
Location | Salary | Number
A | 100 | 1
A | 200 | 2
B | 100 | 3
B | 400 | 4
C | 400 | 5
D | 800 | 6
What I'd like to produce is something like .....
A | B | C | D
300 | 500 | 400 | 800 <-- Salary Sum
3 | 7 | 5 | 6 <-- Number Sum
I have a direct connection to my data source; please suggest a way to display the above with a table/matrix visual.
Thank you in advance
Unfortunately this is currently not supported in Power BI, but maybe there is some light at the end of the tunnel... The Power BI team has started working on this much-requested feature. See here
As Tom said, this is available with the August release. You can check which version you have by going to File -> Help -> About. If you have an older version, you can go here to download the right one for you (32-bit vs 64-bit).
Once you have made sure you are running the August version, simply create a matrix with Location in the Columns field and Salary and Number in the Values field. Then go into the formatting pane and under Values, turn Show on rows to on.
Try this: go to the Query Editor, select the first column of the desired table (Location in your case), and from the Transform tab select Unpivot Other Columns.
That's it! Now go and drop in your visual.

Is it possible to make multiple fields default to the same date, but also be individually editable?

I am VERY new to Access - I was sort of thrust into designing a database for a research project I'm involved in. So, please bear with me because I know next to nothing :) The problem I am having is thus:
My database is for a medical research project, and is very time and date dependent, by which I mean I need to capture the date and time for each piece of data so that we end up with a sort of timeline of events for each subject.
As is, I have something like the following for each piece of data (each in its own field):
ArrivalDate
ArrivalTime
HeartRateDate
HeartRateTime
HeartRateData
TemperatureDate
TemperatureTime
TemperatureData
BloodPressureDate
BloodPressureTime
BloodPressureData
There are around 200 similar pieces of data that I need to collect for each patient. To avoid having to re-enter the same data over and over, and also to reduce the potential for error, I would like to have all of the date fields in a given patient record default to the first one that is entered, in this case "Arrival Date". However, I also need each date field to be editable without affecting the others. The reason for this is that in the event that a patient's visit occurs over the span of a few days we can accurately record that.
I have tried messing around with the default value setting, as well as setting the control source to reference the "Arrival Date" field, but then of course any changes to one field affect them all. I am not even sure that what I am trying to do is possible but I will appreciate any help and/or suggestions!
Thank you in advance
Having all this data in separate columns of a big table isn't going to work. You don't measure things like temperature or blood pressure only once per patient, do you?
This is a classic one-to-many relation.
You should have a separate Measurements table, looking e.g. like this:
+--------+-----------+---------------+------------------+-----------+
| MeasID | PatientID | MeasType | MeasDateTime | MeasValue |
+--------+-----------+---------------+------------------+-----------+
| 1 | 1 | Temperature | 2017-05-17 14:30 | 38.2 |
| 2 | 1 | BloodPressure | 2017-05-17 14:30 | 130/90 |
| 3 | 1 | Temperature | 2017-05-17 18:00 | 38.5 |
| 4 | 2 | Temperature | etc. | |
+--------+-----------+---------------+------------------+-----------+
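If it helps, here is a rough Access SQL sketch of that table (column names and types are illustrative, and a separate Patients table is assumed):
CREATE TABLE Measurements (
    MeasID       AUTOINCREMENT PRIMARY KEY,
    PatientID    LONG NOT NULL,
    MeasType     TEXT(50),
    MeasDateTime DATETIME,
    MeasValue    TEXT(50)
);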
As Barmar wrote, there is no reason to have separate columns for date and time.
In the form where measurements are entered, you can use the BeforeInsert event to set MeasDateTime to the current time, with the Now() function.
So the user never has to enter it manually, but they can edit it if the measurement was at a different time than entering the data.

merging rows in postgres with matching fields

I have a table with data that appears in this format:
email | interest | major | employed |inserttime
jake#example.com | soccer | | true | 12:00
jake#example.com | | CS | true | 12:01
Essentially, this is a survey application and users sometimes hit the back button to add new fields. I later changed the INSERT logic to an UPSERT so it just updates the row where email = currentUsersEmail; however, for the data inserted prior to this code change there are many duplicate entries for single users. I have tried some GROUP BYs with no luck, as it keeps saying:
the ID column must appear in the GROUP BY clause or be used in an aggregate function
Certainly there will be edge cases with clashing data; for example, a user may have entered true for the employed column in one row and false in another. For now I am not going to take this into account.
I simply want to merge or flatten these values into a single row, in this case it would look like:
email | interest | major | employed |inserttime
jake#example.com | soccer | CS | true | 12:01
I am guessing I would take the most recent inserttime. I have been writing the web application in Scala/Play, but for this task I think a language like Python might be easier, if I cannot do it directly through psql.
You can GROUP BY and flatten using MAX():
SELECT email,
       MAX(interest)   AS interest,
       MAX(major)      AS major,
       MAX(employed)   AS employed,   -- if employed is a boolean column, use bool_or(employed) instead, since max(boolean) is not defined in Postgres
       MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email;