How to track rows (by id) with a specific column value using Kafka JDBC Connector? - apache-kafka

I have a table containing a large number of records. There's a column defining a type of the record. I'd like to collect records with a specific value in that column. Kind of:
Select * FROM myVeryOwnTable WHERE type = "VERY_IMPORTANT_TYPE"
What I've noticed I can't use WHERE clause in a custom query when I choose incremental(+timestamp) mode, otherwise I'd need to take care if filtering on my own.
The background of that I'd like to achieve is that I use Logstash to transfer some type of data from MySQL to ES. That's easily achievable there by using query that can contain where clause. However, with Kafka I can transfer my data much quicker (almost instantly) after inserting new rows in DB.
Thank you for any hints or advices.
Thanks to #wardziniak I was able to set it up.
query=select * from (select * from myVeryOwnTable p where type = 'VERY_IMPORTANT_TYPE') p
topic.prefix=test-mysql-jdbc-
incrementing.column.name=id
however, I was expecting a topic test-mysql-jdbc-myVeryOwnTable so I've registered my consumer to that. However, using the query shown above table name is skipped so my topic was named exactly as prefix defined above. So I've just updated my properties topic.prefix=test-mysql-jdbc-myVeryOwnTable and it seems to be working just fine.

You can use subquery in your Jdbc Source Connector query property.
Sample JDBC Source Connector configuration:
{
...
"query": "select * from (select * from myVeryOwnTable p where type = 'VERY_IMPORTANT_TYPE') p",
"incrementing.column.name": "id",
...
}

Related

format issue in scala, while having wildcards characters

I have a sql query suppose
sqlQuery="select * from %s_table where event like '%holi%'"
listCity=["Bangalore","Mumbai"]
for (city<- listCity){
print(s.format(city))
}
Expected output:
select * from Bangalore_table where event like '%holi%'
select * from Mumbai_table where event like '%holi%'
Actual output:
unknown format conversion exception: Conversion='%h'
Can anyone let me how to solve this, instead of holi it could be anything iam looking for a generic solution in scala.
If you want the character % in a formatting string you need to escape it by repeating it:
sqlQuery = "select * from %s_table where event like '%%holi%%'"
More generally I would not recommend using raw SQL. Instead, use a library to access the database. I use Slick but there are a number to choose from.
Also, having different tables named for different cities is really poor database design and will cause endless problems. Create a single table with an indexed city column and use WHERE to select one or more cities for inclusion in the query.

Issue with UUID datatype when laoding postgres table using Apache NiFi

Database
Postgres 9.6
Contains several tables that have a UUID column (containing the ID of each record)
NiFi
Latest release (1.7.1)
Uses Avro 1.8.1 (as far as I know)
Problem description
When scheduling the tables using the ExecuteSQL processor, the following error message occurs:
ExecuteSQL[id=09033e32-e840-1aed-3062-6e8cbc5551ba] ExecuteSQL[id=09033e32-e840-1aed-3062-6e8cbc5551ba] failed to process session due to createSchema: Unknown SQL type 1111 / uuid (table: country, column: id) cannot be converted to Avro type; Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: createSchema: Unknown SQL type 1111 / uuid (table: country, column: id) cannot be converted to Avro type
Note that the flowfiles aren't removed from the incoming queue, nor sent to the 'failure' relationship, resulting in an endless loop of failing attempts.
Attempts to fix issue
I tried enabling the Use Avro Logical Types property of the ExecuteSQL processor, but the same error occurred.
Possible but not preferred solutions
I currently perform a SELECT * from each table. A possible solution (I think) would be to specify each column, and have the query cast the uuid to a string. Though this could work, I'd strongly prefer not having to list every column separately.
A last note
I did find this Jira ticket: https://issues.apache.org/jira/browse/AVRO-1962
However, I'm not sure how to interpret this. Is it implemented or not? Should it work or not?
I believe UUID is not a standard JDBC type and is specific to Postgres.
The JDBC types class shows that SQL type 1111 is "OTHER":
/**
* The constant in the Java programming language that indicates
* that the SQL type is database-specific and
* gets mapped to a Java object that can be accessed via
* the methods <code>getObject</code> and <code>setObject</code>.
*/
public final static int OTHER = 1111;
So I'm not sure how NiFi could know what to do here because it could be anything depending on the type of DB.
Have you tried creating a view where you define the column as ::text?
SELECT
"v"."UUID_COLUMN"::text AS UUID_COLUMN
FROM
...

How do I pass results from one Custom Query to another?

I am using Tableau to create custom queries, I need to pass the results from one custom query into another. So for example one custom query might have the following query:
select *
from bime.test_tab
where record_date = '2018-05-21'.
I then create another custom query which uses the results from the query above. Is this possible?
Sure you can by making this custom query as a subquery
select * from (select *
from bime.test_tab
where record_date = '2018-05-21') a
As you might already know that we do not have a leverage to use temp tables in Tableau custom queries, hence we usually create subqueries for that.
Considering your case,
SELECT BIM_TABLE.* FROM
(SELECT * FROM
BIME.TEST_TAB WHERE RECORD_DATE= '2018-05-21')BIM_TABLE
You can post your actual problem with the relevant columns if you need further help.
PS - It's advised to do most of your data manipulations in SQL itself (if you are running your workbook on live connection), as Tableau is not able to properly optimize part of the 'Custom SQL' query.

Construct SQL where clause to pull data within Power Query

Basically I want to retrieve rows of data that meet my clause conditions using Power Query.
I got 400 rows of lookup values in my spreadsheet.
Each row represent 1 lookup code for example, code AAA1, AAB2 and so on
So lets say I have a select statement and I want to construct the where clauses using the above codes so my end sql statement will look like
select * from MyTable where Conditions in ('AA1', 'AAB2')
so so far I have this
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID",
Int64.Type}}),
test = Sql.Database("myserver", "myDB", [Query="SELECT * FROM myTable where" & #"Changed Type" & "])"
in
test
Obviously that didnt work but thats my pseduo scenario anyway.
Please could you advice what to do?
Thank you
Peddie
I would create a "lookup" Power Query based on the Excel table. I would set the "Load To" properties to "Only Create Connection".
Then I would start the main Query by connecting to the SQL server using the Navigator to select "MyTable". Then I would add a Merge step to the main Query, to join to the "lookup" Query, matching the "Conditions" column to the "lookup" code. I would set the Join Type to "Inner". The Merge properties window will show you visually if the 2 columns you select actually contain matching data.
This approach does not require any coding, and is easier to build, extend and maintain.
Mike Honey's join is best for your problem, but here's a more general solution if you find yourself needing other logic in your where clause.
Normally Power query only generates row filters on an equality expression, but you can put any code you want in a Table.SelectRows filter, like each List.Contains({"AA1", "AAB2"}, [Conditions])
So for your table, your query would look something like:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID", Int64.Type}}),
test = Sql.Database("myserver", "myDB"),
yourTable = test{[Name="myTable"]}[Data],
filtered = Table.SelectRows(yourTable, each List.Contains(#"Changed Type"[Form ID], [Conditions]))
in
filtered
The main downside to using the library functions is that Table.SelectRows only knows how to generate SQL where clauses for specific expression patterns, so the row filter probably runs on your machine after downloading the whole table, instead of having the Sql Server run the filter.

Converting complex query with inner join to tableau

I have a query like this, which we use to generate data for our custom dashboard (A Rails app) -
SELECT AVG(wait_time) FROM (
SELECT TIMESTAMPDIFF(MINUTE,a.finished_time,b.start_time) wait_time
FROM (
SELECT max(start_time + INTERVAL avg_time_spent SECOND) finished_time, branch
FROM mytable
WHERE name IN ('test_name')
AND status = 'SUCCESS'
GROUP by branch) a
INNER JOIN
(
SELECT MIN(start_time) start_time, branch
FROM mytable
WHERE name IN ('test_name_specific')
GROUP by branch) b
ON a.branch = b.branch
HAVING avg_time_spent between 0 and 1000)t
GROUP BY week
Now I am trying to port this to tableau, and I am not being able to find a way to represent this data in tableau. I am stuck at how to represent the inner group by in a calculated field. I can also try to just use a custom sql data source, but I am already using another data source.
columns in mytable -
start_time
avg_time_spent
name
branch
status
I think this could be achieved new Level Of Details formulas, but unfortunately I am stuck at version 8.3
Save custom SQL for rare cases. This doesn't look like a rare case. Let Tableau generate the SQL for you.
If you simply connect to your table, then you can usually write calculated fields to get the information you want. I'm not exactly sure why you have test_name in one part of your query but test_name_specific in another, so ignoring that, here is a simplified example to a similar query.
If you define a calculated field called worst_case_test_time
datediff(min(start_time), dateadd('second', max(start_time), avg_time_spent)), which seems close to what your original query says.
It would help if you explained what exactly you are trying to compute. It appears to be some sort of worst case bound for avg test time. There may be an even simpler formula, but its hard to know without a little context.
You could filter on status = "Success" and avg_time_spent < 1000, and place branch and WEEK(start_time) on say the row and column shelves.
P.S. Your query seems a little off. Don't you need an aggregation function like MAX or AVG after the HAVING keyword?