SQLBase, query multiple values from one table column - left-join

I have a big problem with a SQLBase database, or its engine.
I have experience with MySQL but not with SQLBase.
I have:
multiple tables joined together,
a work order that has
multiple values in a column, and
I want them in a single row of the query result.
For example, this is what I want:
table
ordernr|type|..............|productnr
-------------------------------------
1141356| v1 | .............|fe465
1141356| v2 | .............|hty546
1141356| v3 | .............|rgrg211
1454446| v1 | .............|dw885
1454446| v2 | .............|fee885
1454446| v3 | .............|wwf6664
1231231| v1 | .............|ff664
1591591| v1 | .............|gg123
1591591| v2 | .............|jj5891
query result
ordernr | .............| v1 | v2 | v3
--------------------------------------------
1141356 | ............ |fe465|hty546|rgrg211
1454446 | ............ |dw885|fee885|wwf6664
1231231 | ............ |ff664| - | -
1591591 | ............ |gg123|jj5891| -
But when I try it, I get only the orders that have one, or two, or three values,
depending on how I write the query,
but I want all orders to show up.
I tried using a left join, but without success.
Only ordernr comes from the other table.
Please ask if you need more information.
I will do my best to help.
EDIT:
Hi! It works! Somehow my query started working as it should. But let me say that I have worked with MySQL over 10 years without any bigger hassle, but this SQLBase is giving me high blood pressure. :)

Without knowing the schema: have you tried GROUP BY, i.e. GROUP BY ordernr, to get everything for an order on one row?
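To make the GROUP BY suggestion concrete, here is a minimal sketch of the usual conditional-aggregation pivot. It runs against an in-memory SQLite database through Python's sqlite3 module only so that it is self-contained. The table name `orders`, its three columns, and the helper `pivot_orders` are assumptions based on the sample in the question, and whether SQLBase accepts CASE inside an aggregate in exactly this form should be checked against its own documentation.

```python
import sqlite3

# Hypothetical table mirroring the question's sample data (SQLite stands in for SQLBase).
ROWS = [
    (1141356, "v1", "fe465"), (1141356, "v2", "hty546"), (1141356, "v3", "rgrg211"),
    (1454446, "v1", "dw885"), (1454446, "v2", "fee885"), (1454446, "v3", "wwf6664"),
    (1231231, "v1", "ff664"),
    (1591591, "v1", "gg123"), (1591591, "v2", "jj5891"),
]

# One output column per type; MAX() skips the NULLs produced by non-matching rows.
PIVOT_SQL = """
SELECT ordernr,
       MAX(CASE WHEN type = 'v1' THEN productnr END) AS v1,
       MAX(CASE WHEN type = 'v2' THEN productnr END) AS v2,
       MAX(CASE WHEN type = 'v3' THEN productnr END) AS v3
FROM orders
GROUP BY ordernr
ORDER BY ordernr
"""


def pivot_orders(rows):
    """Return one row per ordernr with v1/v2/v3 as columns (None where missing)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (ordernr INTEGER, type TEXT, productnr TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in pivot_orders(ROWS):
        print(row)
```

Because the grouping is on ordernr alone, orders with fewer than three types still appear, with None in the missing columns.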

Should KSQL tables be showing multiple rows per key for aggregates?

My understanding of KSQL tables is that they show an "as is" view of our data rather than all the data.
So if I have a simple aggregating query and I SELECT from my table, I should see the data as it is at this point in time.
My data (stream):
MY_TOPIC_STREAM:
15 | BEACH | Steven Ebb | over there
24 | CIRCUS | John Doe | an adress
30 | CIRCUS | Alice Small | another address
35 | CIRCUS | Barry Share | a home
35 | CIRCUS | Garry Share | a home
40 | CIRCUS | John Mee | somewhere
45 | CIRCUS | David Three | a place
45 | CIRCUS | Mary Three | a place
45 | CIRCUS | Joffrey Three | a place
My table definition:
CREATE TABLE MY_TABLE WITH (VALUE_FORMAT='AVRO') AS
SELECT ROWKEY AS APPLICATION, COUNT(*) AS NUM_APPLICANTS
FROM MY_TOPIC_STREAM
WHERE header->eventType = 'CIRCUS'
GROUP BY ROWKEY;
I am confused as to why I see multiple rows in my table even though the eventual aggregates are correct?
SELECT * FROM MY_TABLE;
APPLICATION NUM_APPLICANTS
24 1
30 1
--> 35 1 <-- why do I see this?
35 2
40 1
--> 45 1 <-- why do I see this?
--> 45 2 <-- why do I see this?
45 3
My sink topic also shows me the same as the table output - presumably this is correct?
I expected my table result to be:
APPLICATION NUM_APPLICANTS
24 1
30 1
35 2
40 1
45 3
Outputs abridged for brevity and readability above, but you get the gist.
So - are my expectations of the table and sink topic outputs off the mark?
UPDATE
Matthias's answer below correctly explains that the table and sink topic show changelog events, so it is normal to see intermediate values. However, what was confusing me was that I was seeing all intermediate rows. It turned out that this was because I was using the Confluent 5.2.1 docker-compose file, which sets the environment variable KSQL_STREAMS_CACHE_MAX_BYTES_BUFFERING=0. This disables caching of all intermediate results in KSQL aggregations, and therefore the table shows more rows than expected while eventually arriving at the correct aggregates. Setting this to e.g. 10MB caused the data to be output as expected. This behaviour is not immediately obvious from the documentation for those starting to play with KSQL and using Docker to stand up the instances! This issue pointed me in the right direction, and this page documents the parameters. I had spent a long time on this and could not work out why it was not behaving as expected! I hope this helps someone.
Not sure what version you are using; however, SELECT * FROM MY_TABLE; does not return the current content of the table but the table's changelog stream (this holds for older versions; in newer versions the query you show is not valid because the syntax was changed).
Since the transition from KSQL to ksqlDB, the query you showed is called a push query and is written as SELECT * FROM my_table EMIT CHANGES;.
Furthermore, ksqlDB introduced pull queries, which allow you to look up the current state. However, SELECT * FROM my_table; is not supported as a pull query yet (it will be added in the future). You can only do table lookups for a specific key, i.e., there must be a WHERE clause at the moment.
Check out the docs for more details: https://docs.ksqldb.io/en/latest/concepts/queries/pull/
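The difference between the changelog and the current state can be illustrated with a small plain-Python simulation. This is only a sketch of the idea, not ksqlDB code: the functions `changelog` and `current_state` are hypothetical names, and the keys are the CIRCUS rows from the question's stream.

```python
# Keys of the CIRCUS events from the question's stream, in arrival order.
CIRCUS_KEYS = [24, 30, 35, 35, 40, 45, 45, 45]


def changelog(keys):
    """Emit one (key, count) update per input record, as an uncached aggregation would."""
    counts = {}
    updates = []
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
        updates.append((key, counts[key]))
    return updates


def current_state(updates):
    """Collapse a changelog to the latest value per key (what a key lookup would see)."""
    state = {}
    for key, count in updates:
        state[key] = count  # later updates overwrite earlier ones
    return list(state.items())


if __name__ == "__main__":
    log = changelog(CIRCUS_KEYS)
    print(log)                 # every intermediate count, e.g. (45, 1), (45, 2), (45, 3)
    print(current_state(log))  # one entry per key with its final count
```

The first list corresponds to the rows the question marks with "why do I see this?", and the second to the result that was expected. A non-zero cache sits between the two: it can swallow some intermediate updates before they are emitted, but it does not change the final values.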

How to downsample data older than 14 days only in influxdb and keep it in the same table as raw data?

I have a setup that collects metrics from Telegraf into InfluxDB. Grafana then uses InfluxDB as a data source to display graphs.
My problem is reducing disk usage, so I want to downsample old data (older than 3 days) and keep the new data (newer than 3 days) as is (raw).
I tried Retention Policy (RP) of influxdb and Continuous Queries (CQ) as described in guide:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
 influxdb ("telegraf")
+----------------------------+
|                            |
| +-----------------------+  |
| | table disk_raw        |  |
| | CURRENT RP (RAW)      +---------+
| | (deleted after 3d)    |  |      |
| +-----------------------+  |      |CQ (average 30 min of datapoints into 1)
| +-----------------------+  |      |
| | table_disk_ds         |  |      |
| | LONGTERM RP           +<--------+
| |(downsampled, kept 90d)|  |
| +-----------------------+  |
|                            +<----+
+----------------------------+     |
                                   |
                                   |
 grafana                           | grafana query
+----------------------------+     |
|                            |     |
| +----------------------+   |     |
| | data graph           |   +-----+
| +----------------------+   |
|                            |
+----------------------------+
The problem is that this solution gives you two tables, one for raw data and one for downsampled data, and the CQ is constantly writing to the downsampled one.
That does not work well for me because:
I am using Grafana to query InfluxDB, and it reads from a single table into the graph. I want one graph for both old and new data.
Using 2 databases increases disk usage.
Is there any way to downsample old records in the very same table?
Configuration example:
https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention
Grafana query:
SELECT mean("used_percent") FROM "disk" WHERE ("device" = 'dm-0') AND $timeFilter GROUP BY time(10s) fill(none)
EDIT2: Here's a workaround implemented with template variables in Grafana
https://github.com/grafana/grafana/issues/4262#issuecomment-475570324
This seems like a really good solution.
ORIGINAL ANSWER
Looking at the example from the InfluxDB page you linked:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."orders"
FROM "orders"
GROUP BY time(30m)
END
If you specify the same source and target table, namely orders, in both the INTO and FROM clauses, then the data will be written to the same table.
However, that does not solve your issue.
You would still need two queries to get the data from both retention policies. If you do a generic select * from disk_raw ..., Influx will use the default retention policy and return data only from there.
The way you usually go about this is to run two queries and concatenate the results. In a single request, something like:
select * from rp_short.diskraw; select * from rp_long.diskraw
EDIT:
Here is a discussion of why it's not possible to do what you (and a lot of other people) want https://github.com/influxdata/influxdb/issues/2625
And also some ways to work around it.
Briefly, one way is to handle the downsampling of the high-resolution data manually (i.e. not with a CQ) and keep everything in the same retention policy. Another is to use a proxy that rewrites the query depending on its time range in order to fetch the correct data.
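The manual approach can be sketched in plain Python as follows. This is an illustration under stated assumptions, not InfluxDB client code: points are modelled as (unix_seconds, value) tuples, the function `downsample_old_points` and its parameters are hypothetical, and reading from and writing back to InfluxDB (including deleting the raw points that were averaged) is left out.

```python
from collections import defaultdict


def downsample_old_points(points, now, max_raw_age, bucket):
    """Average points older than max_raw_age into fixed buckets; keep newer points raw.

    points: iterable of (unix_seconds, value) tuples.
    now, max_raw_age, bucket: all expressed in seconds.
    Returns a time-sorted list of (unix_seconds, value) tuples.
    """
    cutoff = now - max_raw_age
    buckets = defaultdict(list)
    recent = []
    for ts, value in points:
        if ts < cutoff:
            buckets[ts - ts % bucket].append(value)  # key is the bucket start time
        else:
            recent.append((ts, value))
    averaged = [(start, sum(vals) / len(vals)) for start, vals in buckets.items()]
    return sorted(averaged + recent)


if __name__ == "__main__":
    sample = [(0, 10.0), (600, 20.0), (1800, 30.0), (100000, 40.0)]
    print(downsample_old_points(sample, now=100100, max_raw_age=3600, bucket=1800))
```

With the figures from the question, max_raw_age would be 3 * 86400 (3 days) and bucket would be 1800 (30 minutes).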

Issue with displaying of information in a Tables/Matrix visual - Power BI

Hi, I'm new to Power BI Desktop and have come across an issue when displaying information. Hopefully it's just due to my lack of knowledge, but I can't seem to find a way to display values in rows, one after the other, similar to how pivot tables work.
For example, if I had the following table:
Location | Salary | Number
A | 100 | 1
A | 200 | 2
B | 100 | 3
B | 400 | 4
C | 400 | 5
D | 800 | 6
What I'd like to produce is something like .....
A | B | C | D
300 | 500 | 400 | 800 <-- Salary Sum
3 | 7 | 5 | 6 <-- Number Sum
I have a direct connection to my data source. Please suggest a way to display this with a table or matrix visual.
Thank you in advance.
Unfortunately this is currently not supported in Power BI, but maybe there is some light at the end of the tunnel... The Power BI team have started working on this much requested feature. See here
As Tom said, this is available with the August release. You can check which version you have by going to File -> Help -> About. If you have an older version, you can go here to download the right one for you (32-bit vs 64-bit).
Once you have made sure you are running the August version, simply create a matrix with Location in the Columns field and Salary and Number in the Values field. Then go into the formatting pane and under Values, turn Show on rows to on.
Try this: go to the query editor, select the first column of the desired table (Location in your case), and from the Transform tab select Unpivot Other Columns.
That's it! Now go and drop your visual.
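To make the target shape explicit, here is a plain-Python sketch (not Power BI or Power Query code) that sums Salary and Number per Location and lays the measures out as rows, which is the layout asked for in the question. The function name `measures_on_rows` is hypothetical; the data is the sample table from the question.

```python
# (Location, Salary, Number) rows from the question's sample table.
DATA = [
    ("A", 100, 1), ("A", 200, 2),
    ("B", 100, 3), ("B", 400, 4),
    ("C", 400, 5),
    ("D", 800, 6),
]


def measures_on_rows(rows):
    """Sum each measure per location; return (locations, {measure name: sums per location})."""
    totals = {}
    for location, salary, number in rows:
        sums = totals.setdefault(location, [0, 0])
        sums[0] += salary
        sums[1] += number
    locations = sorted(totals)
    return locations, {
        "Salary Sum": [totals[loc][0] for loc in locations],
        "Number Sum": [totals[loc][1] for loc in locations],
    }


if __name__ == "__main__":
    locations, measures = measures_on_rows(DATA)
    print(" | ".join(locations))
    for name, values in measures.items():
        print(" | ".join(str(v) for v in values), "<--", name)
```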

merging rows in postgres with matching fields

I have a table that looks like this:
email | interest | major | employed |inserttime
jake#example.com | soccer | | true | 12:00
jake#example.com | | CS | true | 12:01
Essentially, this is a survey application, and users sometimes hit the back button to add new fields. I later changed the INSERT logic to an UPSERT so that it just updates the row where email=currentUsersEmail. However, for the data inserted prior to this code change there are many duplicate entries for single users. I have tried some GROUP BYs with no luck, as it continually says that the
ID column must appear in the GROUP BY clause or be used in an aggregate function.
Certainly there will be edge cases with clashing data, for example where a user entered true for the employed column the first time and false the second time. For now I am not going to take this into account.
I simply want to merge or flatten these values into a single row, in this case it would look like:
email | interest | major | employed |inserttime
jake#example.com | soccer | CS | true | 12:01
I am guessing I would take the most recent inserttime. I have been writing the web application in Scala/Play, but for this task I think a language like Python might be easier if I cannot do it directly through psql.
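You can GROUP BY and flatten using MAX(). Assuming employed is a boolean column, use BOOL_OR() for it, since PostgreSQL does not define MAX() for booleans:
SELECT email, MAX(interest) AS interest,
MAX(major) AS major, BOOL_OR(employed) AS employed,
MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email;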
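The flattening can be tried out with the following minimal sketch. It uses an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL, purely so that it is self-contained. The column types are assumptions, missing fields are assumed to be NULL, and employed is stored as 0/1 because SQLite has no boolean type, so MAX() is used for it here (in PostgreSQL a boolean column would need BOOL_OR() instead). The helper name `flatten` is hypothetical.

```python
import sqlite3

# MAX() ignores NULLs, so each column ends up with its only (or largest) non-NULL value.
FLATTEN_SQL = """
SELECT email,
       MAX(interest)   AS interest,
       MAX(major)      AS major,
       MAX(employed)   AS employed,
       MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email
"""


def flatten(rows):
    """Merge duplicate survey rows into one row per email."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table "
            "(email TEXT, interest TEXT, major TEXT, employed INTEGER, inserttime TEXT)"
        )
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(FLATTEN_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    duplicates = [
        ("jake#example.com", "soccer", None, 1, "12:00"),
        ("jake#example.com", None, "CS", 1, "12:01"),
    ]
    print(flatten(duplicates))
```

Note that MAX() picks the largest value per column independently, so when two rows hold different non-NULL values for the same field, the result is not necessarily the most recently inserted one.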

Calculate median and average in a partition in Tableau using table calculation

I have a details table of posts and subjects scraped from a forum. Each row is a single subject (i.e. postID and subjectID together form the primary key of the table), and I have some measures at subject level and some at post level. For example:
+---------+-------------+--------------+------------+--------------+--------+
| post.ID | post.Author | post.Replies | subject.ID | subject.Rank | year |
+---------+-------------+--------------+------------+--------------+--------+
| 1 | mike | 10 | movie | 4 | 1990 |
| 1 | mike | 10 | comics | 6 | 1990 |
| 2 | sarah | 0 | tv | 10 | 2001 |
| 3 | tom | 4 | tv | 10 | 2003 |
| 3 | tom | 4 | comics | 6 | 2003 |
| 4 | mike | 1 | movie | 4 | 2008 |
+---------+-------------+--------------+------------+--------------+--------+
I want to study the trend of posts and subjects by year and color it by subject.Rank.
The first two are easily measured by putting COUNTD(post.ID) and COUNTD(subject.ID) on Rows and 'year' on Columns.
But if I drag MEDIAN(subject.Rank) onto Color, I get a wrong result: it is calculated at row level, not at the level of distinct subject.ID.
I think I can accomplish this using table calculation features, but I have no idea how to proceed.
It sounds like you are trying to treat Subject.Rank as a dimension, instead of as a measure. If so, just convert it to a dimension on the worksheet in question by right clicking on the field and choosing dimension. You can also convert it to a dimension in the data pane by dragging the field from the measures section up to the dimensions section. That will tell Tableau to treat that field as a dimension by default in the future.
A field can be treated as a dimension in some cases and as a measure in others. It depends on what you are trying to achieve. If you are familiar with SQL, dimensions are used to partition data rows for aggregation using the GROUP BY clause.
Finally, count distinct (COUNTD) can be expensive on large datasets. Often, you can get the same result another way. So try to think of other approaches and save COUNTD for when you really need it.
Try using an LOD expression: {fixed [1st LEVEL],[2nd level]: median()}
or
Table calculation approach:
When you add the median, open Edit Table Calculation and, under the advanced Compute Using option, add your fields (make sure they are ordered the way you want the calculation to run). Then click OK and choose the "At the level" and "Restarting every" settings.
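The difference between a row-level median and a median over distinct subjects can be reproduced outside Tableau with a short plain-Python sketch. This is not Tableau syntax; the function names are hypothetical, and the rows are taken from the question's table. Mike's rows are used for the demonstration because "movie" appears twice among them.

```python
from statistics import median

# (post_id, author, subject_id, subject_rank) rows from the question's table.
ROWS = [
    (1, "mike", "movie", 4), (1, "mike", "comics", 6),
    (2, "sarah", "tv", 10),
    (3, "tom", "tv", 10), (3, "tom", "comics", 6),
    (4, "mike", "movie", 4),
]


def row_level_median(rows):
    """Median over every row, so a subject that appears twice is counted twice."""
    return median(rank for _, _, _, rank in rows)


def subject_level_median(rows):
    """Median over distinct subjects: keep one rank per subject_id first."""
    rank_by_subject = {subject: rank for _, _, subject, rank in rows}
    return median(rank_by_subject.values())


if __name__ == "__main__":
    mike = [row for row in ROWS if row[1] == "mike"]
    print(row_level_median(mike))      # ranks 4, 6, 4
    print(subject_level_median(mike))  # ranks 4 (movie), 6 (comics)
```

For mike's rows the row-level median is 4, while the subject-level median is 5.0, which is the kind of discrepancy described in the question. Fixing the level of detail to the subject before taking the median is what the LOD expression above aims at.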