We're designing an API through which partners can post images together with titles. I'm having trouble imagining how to implement an API endpoint for this that supports i18n in the least complicated way for our partners (for example, letting them use their existing IDs instead of having to remember the IDs we return).
Our first database table idea looked like:
image_id | partner_id | category_id | image | description
---------+------------+-------------+-------+-------------------
123      | 1          | 8           | url.. | This is my image!
234      | 2          | 5           | url.. | A pretty image.
But we would probably split the description out into its own table, which would look something like this:
image_id | language | description
---------+----------+--------------------
123      | en       | This is my image!
123      | de       | Dies ist mein Bild!
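In SQL the split might look roughly like this (a sketch only; the types and constraints are assumptions, the names are taken from the tables above):

-- one row per image; language-independent data only
CREATE TABLE image (
    image_id    SERIAL PRIMARY KEY,
    partner_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    image       TEXT NOT NULL          -- the image URL
);

-- one row per image and language
CREATE TABLE image_description (
    image_id    INTEGER NOT NULL REFERENCES image (image_id),
    language    CHAR(2) NOT NULL,      -- e.g. 'en', 'de'
    description TEXT NOT NULL,
    PRIMARY KEY (image_id, language)
);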
So what could the API endpoint look like?
1. A single request with an array of images, where the same image may appear several times with different language/description values? (The image would not be persisted multiple times, just the different descriptions.) A possible payload for this option is sketched after this list.
2. N requests with images and a Content-Language header that applies to all the images contained in the request? (Resulting in the same persistence as option 1.)
3. A single request to create images with a "default" language/description value, plus a PUT endpoint to add further language/description values afterwards?
4. Something completely different?
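For what it's worth, a request body for option 1 might look something like this (the endpoint, the field names, and the externalId field, which lets partners reuse their existing IDs instead of remembering ours, are all assumptions):

POST /images    (hypothetical endpoint; field names are illustrative)
{
  "images": [
    {
      "externalId": "partner-image-1",
      "categoryId": 8,
      "image": "https://example.com/image.jpg",
      "descriptions": [
        { "language": "en", "description": "This is my image!" },
        { "language": "de", "description": "Dies ist mein Bild!" }
      ]
    }
  ]
}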
What I have are two streams (from two different systems, imported via connectors). Some of the information from the two streams will be used to build combined information.
Currently I'm working with ksqlDB, but I'm having problems with the last step: reducing the information from both streams.
Both streams contain a tree structure (id/parentId), so I've used a second table for each stream to look up certain information from the parents; this is then joined into a table containing all the information needed for the final reduce.
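For illustration, a stripped-down sketch of that enrichment step for one of the streams could look like this in ksqlDB (the stream, table, and column names are invented, and LATEST_BY_OFFSET is just one way to materialize the latest row per id):

-- table holding the latest row per node id, used to look up parents
CREATE TABLE nodes_a AS
  SELECT id,
         LATEST_BY_OFFSET(matchExtra1) AS matchExtra1
  FROM stream_a
  GROUP BY id;

-- enrich each event with information from its parent node
CREATE STREAM enriched_a AS
  SELECT s.id,
         s.match,
         p.matchExtra1 AS parentExtra1
  FROM stream_a s
  JOIN nodes_a p ON s.parentId = p.id;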
The main matching column is always the same; however, one or more additional columns (which ones is not fixed) are also needed for the final match. Those columns might also only partially match each other.
An example output of the table might look like this:
| id | match | matchExtra1 | matchExtra2 | matchExtra3 |
|----|-------|-------------|-------------|-------------|
| 1  | 1     | Extra1      | Extra2      | Extra3      |
| 2  | 1     | Extra1      | Extra4      | Extra5      |
| 3  | 1     | Extra6      | Extra7      | Extra8      |
| 4  | 1     | Extra9      | Extr10      | tra8        |
In this case, ids 1 and 2 should be matched (they share Extra1), and ids 3 and 4 should be another match ("tra8" is a partial match of "Extra8").
If this is possible within ksqlDB, that would be great. If we need to work with low-level Kafka instead, that's fine as long as we can achieve the end result.
Basic flow as I have it right now: (diagram omitted)
How can I recursively get the breakdown of "Others" when Top N is applied to dimensions?
Imagine a measure Sales Amount sliced by 3 dimensions, Region, Category, and Product, with Top 1 applied to each dimension. The result I want to see is a table like the one below. On each slice, the rest of the members are grouped as "Others".
Region | Category | Product | Sales
============================================
Europe | Bikes | Mountain Bikes | $100
| |------------------------
| | Others | $ 30
|-----------------------------------
| Others | Gloves | $ 50
| |------------------------
| | Others | $120
--------------------------------------------
Others | Clothes | Jackets | $ 80
| |------------------------
| | Others | $130
|-----------------------------------
| Others | Shoes | $ 90
| |------------------------
| | Others | $110
--------------------------------------------
When an "Others" appears, I want to see the Top 1 of the next dimension within the scope of this "Others". This seems a little tricky. e.g. tuples like (North America, Clothes) and (Central America, Clothes) need to be aggregated as (Other Regions, Clothes). Is there a neat way to aggregate the measure based on the 2nd dimension, Category?
Alternatively, I think a sub-cube that filters out Europe would easily provide the breakdown of (Other Regions, Clothes) and (Other Regions, Other Categories). However, this is likely to result in creating many dependent queries. For easy processing of the result set, it would be ideal if the query returned data in the above format.
Can this possibly be achieved by a single MDX query?
To get the breakdown of "Others" we can use a dynamic set together with the EXCEPT() and AGGREGATE() functions.
In each of the three dimensions we need to create a named dynamic set that holds two members (the Top 1 member and "Others").
As an example, in the Category dimension I have created a calculated "Others" member and a dynamic set holding those two members, like this:
CREATE MEMBER CURRENTCUBE.[Product].[French Product Category Name].[ALL].[OTHERS] AS
    // everything except the Top 1 category, aggregated into one member
    AGGREGATE(
        EXCEPT(
            [Product].[French Product Category Name].[French Product Category Name].MEMBERS,
            TOPCOUNT([Product].[French Product Category Name].[French Product Category Name].MEMBERS,
                     1, [Measures].[Sales Amount])));

CREATE DYNAMIC SET CURRENTCUBE.[TOP1 and Others] AS
    {TOPCOUNT([Product].[French Product Category Name].[French Product Category Name].MEMBERS,
              1, [Measures].[Sales Amount]),
     [Product].[French Product Category Name].[ALL].[OTHERS]};
Because the set is dynamic, the Top 1 and Others members will change according to the filters and slicers that you apply.
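A query using the set could then look something like this (the cube name [Adventure Works] is an assumption):

SELECT [Measures].[Sales Amount] ON COLUMNS,
       [TOP1 and Others] ON ROWS
FROM [Adventure Works]  // hypothetical cube name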
I have a setup where Telegraf collects metrics into InfluxDB, and Grafana then uses InfluxDB as a data source to display graphs.
My problem is reducing disk usage: I want to downsample old data (older than 3 days) and keep the newer data (younger than 3 days) as-is (raw).
I tried InfluxDB's Retention Policies (RP) and Continuous Queries (CQ) as described in this guide:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
influxdb ("telegraf")
+----------------------------+
| |
| +-----------------------+ |
| | table disk_raw | |
| | CURRENT RP (RAW) +---------+
| | (deleted after 3d) | | |
| +-----------------------+ | |CQ (average 30 min of datapoints into 1)
| +-----------------------+ | |
| | table_disk_ds | | |
| | LONGTERM RP +<--------+
| |(downsampled, kept 90d)| |
| +-----------------------+ |
| +<----+
+----------------------------+ |
|
|
grafana | grafana query
+----------------------------+ |
| | |
| +----------------------+ | |
| | data graph | +-----+
| +----------------------+ |
| |
+----------------------------+
The problem is that this solution gives you two tables, one for raw data and one for downsampled data, with the CQ constantly writing into the downsampled one.
That is not so good for me because:
- Grafana reads from a single table into a graph, and I want one graph covering both old and new data.
- Keeping two tables increases disk usage.
Is there any way to downsample old records in the very same table?
My configuration follows the example from the guide linked above.
Grafana query:
SELECT mean("used_percent") FROM "disk" WHERE ("device" = 'dm-0') AND $timeFilter GROUP BY time(10s) fill(none)
EDIT2: Here's a workaround implemented with template variables in Grafana
https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
This seems like a really good solution.
ORIGINAL ANSWER
Looking at the example from the InfluxDB page you linked:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website", mean("phone") AS "mean_phone"
INTO "a_year"."orders"
FROM "orders"
GROUP BY time(30m)
END
If you specify the same source and target table, namely orders, in both the INTO and FROM clauses, then the downsampled data will be written into the same table.
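Applied to the example above, that variant would be a sketch like this (untested; the 30-minute means would land in the same orders measurement, under the field names mean_website and mean_phone):

CREATE CONTINUOUS QUERY "cq_30m_same" ON "food_data" BEGIN
  SELECT mean("website") AS "mean_website", mean("phone") AS "mean_phone"
  INTO "orders"
  FROM "orders"
  GROUP BY time(30m)
END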
However, that does not solve your issue.
You would still need two queries to get the data from both retention policies. If you do a generic select * from disk_raw ... Influx will use the default retention policy and return data only from there.
The way you usually go about this is by running two queries and concatenating the results. In a single request, something like:
select * from rp_short.disk_raw; select * from rp_long.disk_raw
EDIT:
Here is a discussion of why it's not possible to do what you (and a lot of other people) want, along with some ways to work around it: https://github.com/influxdata/influxdb/issues/2625
Briefly, one way is to handle the downsampling and the high-resolution data manually (i.e. not with a CQ) and keep both in the same retention policy. Another is to use a proxy that augments the query depending on its time range, so that the correct data is fetched.
I have a group of users who each have a variable that assigns them to a group. I can't share the data, but hopefully this example data will be sufficient.
+-----+-----------+--------------+
| ID | Age Group | Location |
+-----+-----------+--------------+
| 1 | 18-34 | East Spain |
| 2 | 35-44 | North China |
| 3 | 35-44 | East China |
| 4 | 65+ | East Congo |
| 5 | 45-54 | North Japan |
| 6 | 0-17 | North Spain |
| 7 | 65+ | North Congo |
| 8 | 45-54 | East Japan |
| 9 | 0-17 | North Spain |
| 10 | 18-34 | East China |
| 11 | 18-34 | North China |
+-----+-----------+--------------+
My end goal is to create a sheet/dashboard with a pie chart for age grouping. I want to filter this pie chart based on the Area; however, I want there to be two selections, one for Area (East/North) and one for Country (Spain/China/Congo/Japan). The filters will both be "Single Value Lists", so only one Area and one Country can be selected at a time, but together they will combine to filter the patients. For example, if 'East' were chosen for the Area selection and 'China' for the Country selection, the pie chart would only show patients 3 and 10.
This reduces the number of selections that a user faces from 8 to 6. I know this isn't much of a difference, but in the actual data there are many more permutations, so the reduction would really help to de-clutter the sheet/dashboard.
I've created the parameters for both Area and Country, but I don't know how to combine the two parameters to affect which patients are selected.
Let me know if I can clarify anything. If parameters aren't the way to do this, I am also open to other suggestions!
Thanks so much!
Why not split the location into two columns and then create a filter for each column? Then you have exactly the functionality that you want, using plain filters without parameters and calculations.
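Assuming Location always has the form "<Area> <Country>" with a single space, the split could be done with two calculated fields like these (Tableau's built-in custom split on the field would work just as well):

// Area (calculated field)
SPLIT([Location], ' ', 1)

// Country (calculated field)
SPLIT([Location], ' ', 2)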
You could then drag Country onto Area in the data pane to tell Tableau there is a hierarchical relationship between the fields, and set the filter for Country to show "only relevant values", and the filter for Area to show "all values in the database" -- via the little black caret menu at the top right of the filter control.
Then the filter control for Country would only display values for the selected Area.
The other advantage of this approach is that you wouldn't need to maintain a separate list of parameter values; the set of values is discovered automatically from your data. If areas or countries appear, get renamed, or are removed from your database, you'll see that reflected automatically in the filter choices. So if Korea unifies or the US splits into red USA and blue USA, you'll see that automatically, without risking blocking access to new data simply because a list of parameter values is out of date, as can happen with parameters.
Create a calculated field that concatenates the values from your two parameters and tests the result against your Location field. Then put that calculated field on the Filters card and set it to True.
The calculated field should look like this:
([Area] + ' ' + [Country]) = [Location]
I have a table that looks like this:
email            | interest | major | employed | inserttime
-----------------+----------+-------+----------+-----------
jake@example.com | soccer   |       | true     | 12:00
jake@example.com |          | CS    | true     | 12:01
Essentially, this is a survey application and users sometimes hit the back button to add new fields. I later changed the INSERT logic to an UPSERT so it just updates the row where email = currentUsersEmail; however, for the data inserted prior to this code change there are many duplicate entries for single users. I have tried some GROUP BYs with no luck, as it continually says the

ID column must appear in the GROUP BY clause or be used in an aggregate function.
Certainly there will be edge cases with clashing data; for instance, a user may have entered true for the employed column the first time and false the second. For now I am not going to take this into account.
I simply want to merge or flatten these values into a single row; in this case it would look like this:
email            | interest | major | employed | inserttime
-----------------+----------+-------+----------+-----------
jake@example.com | soccer   | CS    | true     | 12:01
I am guessing I would take the most recent inserttime. I have been writing the web application in Scala/Play, but for this task I think a language like Python might be easier, if I cannot do it directly through psql.
You can GROUP BY email and flatten the other columns using MAX():
SELECT email,
       MAX(interest)   AS interest,
       MAX(major)      AS major,
       MAX(employed)   AS employed,  -- if employed is a real boolean, use BOOL_OR(employed); PostgreSQL has no MAX(boolean)
       MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email;
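If you then want to physically replace the duplicate rows with the flattened ones, a sketch along these lines should work (your_table is the placeholder name from above; the same MAX()/BOOL_OR() caveat applies, and you should test this inside a transaction first):

BEGIN;

-- collect one flattened row per email
CREATE TEMP TABLE flattened AS
SELECT email,
       MAX(interest)   AS interest,
       MAX(major)      AS major,
       MAX(employed)   AS employed,
       MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email;

-- replace the duplicates with the flattened rows
DELETE FROM your_table;
INSERT INTO your_table (email, interest, major, employed, inserttime)
SELECT email, interest, major, employed, inserttime
FROM flattened;

COMMIT;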