How to send unbounded TableResult to Kafka sink?

I am using the Table API to create two streams, let's call them A and B. Using executeSql I am joining the two tables. The output is a TableResult. I want to send the joined result to a Kafka sink. Please find the code below.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
String ddlUser = "CREATE TABLE UserTable (\n" +
        "id BIGINT,\n" +
        "name STRING\n" +
        ") WITH (\n" +
        "'connector' = 'kafka',\n" +
        "'topic' = 'USER',\n" +
        "'properties.bootstrap.servers' = 'pkc:9092',\n" +
        "'properties.group.id' = 'testGroup',\n" +
        "'scan.startup.mode' = 'earliest-offset',\n" +
        "'format' = 'json',\n" +
        "'properties.security.protocol' = 'SASL_SSL',\n" +
        "'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";',\n" +
        "'properties.sasl.mechanism' = 'PLAIN'\n" +
        ")";
tEnv.executeSql(ddlUser);
String ddlPurchase = "CREATE TABLE PurchaseTable (\n" +
        "transactionId BIGINT,\n" +
        "userID BIGINT,\n" +
        "item STRING\n" +
        ") WITH (\n" +
        "'connector' = 'kafka',\n" +
        "'topic' = 'PURCHASE',\n" +
        "'properties.bootstrap.servers' = 'pkc:9092',\n" +
        "'properties.group.id' = 'purchaseGroup',\n" +
        "'scan.startup.mode' = 'earliest-offset',\n" +
        "'format' = 'json',\n" +
        "'properties.security.protocol' = 'SASL_SSL',\n" +
        "'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";',\n" +
        "'properties.sasl.mechanism' = 'PLAIN'\n" +
        ")";
tEnv.executeSql(ddlPurchase);
String useQuery = "SELECT * FROM UserTable";
String purchaseQuery = "SELECT * FROM PurchaseTable JOIN UserTable ON PurchaseTable.userID = UserTable.id";
TableResult joinedData = tEnv.executeSql(purchaseQuery);
How to send unbounded TableResult to Kafka sink?

You need to insert into a destination table that is also backed by the Kafka connector: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/common/#emit-a-table
In that example they create a temporary table, but, as you have already done for the sources, you can create a table with the Kafka connector (https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/kafka/) and have the stream inserted into it. I haven't tested it, but it should be something like this:
tEnv.sqlQuery(purchaseQuery).executeInsert("DestinationTable");
or
tEnv.executeSql("INSERT INTO DestinationTable SELECT * FROM PurchaseTable JOIN UserTable ON PurchaseTable.userID = UserTable.id");
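For completeness, a minimal sketch of the whole flow, assuming a destination topic named JOINED and reusing the connection settings from the question (untested; the table name, topic, and the omitted SASL properties are assumptions to adjust):

// Sink table backed by the Kafka connector; the schema mirrors the join output.
String ddlSink = "CREATE TABLE DestinationTable (\n" +
        "transactionId BIGINT,\n" +
        "userID BIGINT,\n" +
        "item STRING,\n" +
        "id BIGINT,\n" +
        "name STRING\n" +
        ") WITH (\n" +
        "'connector' = 'kafka',\n" +
        "'topic' = 'JOINED',\n" +
        "'properties.bootstrap.servers' = 'pkc:9092',\n" +
        "'format' = 'json'\n" +
        ")";
tEnv.executeSql(ddlSink);

// Submits a long-running INSERT job. The inner join of two append-only
// Kafka tables is itself append-only, so a plain Kafka sink can consume it.
TableResult result = tEnv.executeSql(
        "INSERT INTO DestinationTable " +
        "SELECT * FROM PurchaseTable JOIN UserTable " +
        "ON PurchaseTable.userID = UserTable.id");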

Related

How to use Android SQLite SELECT with two parameters?

This code returns an empty cursor. What is wrong here?
The data is already there in the SQLite DB.
public static final String COL_2 = "ID";
public static final String COL_3 = "TYPE";
public Cursor checkData(String id, String type) {
    SQLiteDatabase db = getWritableDatabase();
    Cursor res = db.rawQuery("SELECT * FROM " + TABLE_NAME + " WHERE " + COL_2 + " = " + id + " AND " + COL_3 + " = " + type, null);
    return res;
}
When you pass strings as parameters you must quote them inside the SQL statement.
But building the SQL by concatenating quoted string values makes your code unsafe.
The recommended way is to use ? placeholders:
public Cursor checkData(String id, String type) {
    SQLiteDatabase db = getWritableDatabase();
    String sql = "SELECT * FROM " + TABLE_NAME + " WHERE " + COL_2 + " = ? AND " + COL_3 + " = ?";
    Cursor res = db.rawQuery(sql, new String[] {id, type});
    return res;
}
The parameters id and type are passed as a string array in the 2nd argument of rawQuery().
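For illustration, here is how the placeholder version might be called and consumed; dbHelper and the sample arguments are hypothetical, and the column names come from the constants above:

// Hypothetical usage; dbHelper is an instance of the SQLiteOpenHelper
// subclass that declares checkData().
Cursor cursor = dbHelper.checkData("42", "warning");
try {
    while (cursor.moveToNext()) {
        long rowId = cursor.getLong(cursor.getColumnIndexOrThrow("ID"));
        String rowType = cursor.getString(cursor.getColumnIndexOrThrow("TYPE"));
        // ... use the row ...
    }
} finally {
    cursor.close(); // always release the cursor when done
}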
I finally solved it.
public Cursor checkData(String id, String type) {
    SQLiteDatabase db = getWritableDatabase();
    Cursor res = db.rawQuery("SELECT * FROM " + TABLE_NAME + " WHERE " + COL_2 + " = '" + id + "' AND " + COL_3 + " = '" + type + "'", null);
    return res;
}
If COL_3 is a string column, try this:
Cursor res = db.rawQuery("SELECT * FROM " + TABLE_NAME + " WHERE " + COL_2 + " = " + id + " AND " + COL_3 + " = '" + type + "'", null);

PSQLException: The column name clazz_ was not found in this ResultSet

I am trying to fetch a PlaceEntity. I've previously stored a bunch of GooglePlaceEntity objects, where:
@Entity
@Table(name = "place")
@Inheritance(strategy = InheritanceType.JOINED)
public class PlaceEntity extends AbstractTimestampEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
}
and
@Entity
@Table(name = "google_place")
public class GooglePlaceEntity extends PlaceEntity {
    // Additional fields ..
}
However, I neither want to send the information stored in google_place nor load it unnecessarily. For this reason I am only fetching:
public interface PlaceRepository extends JpaRepository<PlaceEntity, Long> {

    @Query(value = "" +
            "SELECT * " +
            "FROM place " +
            "WHERE earth_distance( " +
            "  ll_to_earth(place.latitude, place.longitude), " +
            "  ll_to_earth(:latitude, :longitude) " +
            ") < :radius",
            nativeQuery = true)
    List<PlaceEntity> findNearby(@Param("latitude") Float latitude,
                                 @Param("longitude") Float longitude,
                                 @Param("radius") Integer radius);
}
and what I get is this:
org.postgresql.util.PSQLException: The column name clazz_ was not found in this ResultSet.
at org.postgresql.jdbc.PgResultSet.findColumn(PgResultSet.java:2588) ~[postgresql-9.4.1208-jdbc42-atlassian-hosted.jar:9.4.1208]
at org.postgresql.jdbc.PgResultSet.getInt(PgResultSet.java:2481) ~[postgresql-9.4.1208-jdbc42-atlassian-hosted.jar:9.4.1208]
at com.zaxxer.hikari.pool.HikariProxyResultSet.getInt(HikariProxyResultSet.java) ~[HikariCP-2.7.8.jar:na]
at org.hibernate.type.descriptor.sql.IntegerTypeDescriptor$2.doExtract(IntegerTypeDescriptor.java:62) ~[hibernate-core-5.2.14.Final.jar:5.2.14.Final]
at org.hibernate.type.descriptor.sql.BasicExtractor.extract(BasicExtractor.java:47) ~[hibernate-core-5.2.14.Final.jar:5.2.14.Final]
...
I am able to run this statement in pure PostgreSQL:
SELECT * FROM place WHERE
earth_distance(
ll_to_earth(place.latitude, place.longitude),
ll_to_earth(17.2592522, 25.0632745)
) < 1500;
but not using the JpaRepository.
By the way, fetching a GooglePlaceEntity does work, however:
@Query(value = "" +
        "SELECT * " +
        "FROM place JOIN google_place ON google_place.id = place.id " +
        "WHERE earth_distance( " +
        "  ll_to_earth(place.latitude, place.longitude), " +
        "  ll_to_earth(:latitude, :longitude) " +
        ") < :radius",
        nativeQuery = true)
List<GooglePlaceEntity> findNearby(@Param("latitude") Float latitude,
                                   @Param("longitude") Float longitude,
                                   @Param("radius") Integer radius);
In case of @Inheritance(strategy = InheritanceType.JOINED), when you retrieve data without nativeQuery = true in a JPA repository, Hibernate executes SQL like the following:
SELECT
    table0_.id as id1_1_,
    table0_.column2 as column2_2_1_,
    ... (main table cols)
    table0_1_.column1 as column1_1_0_,
    ... (cols of tables 1 to N-1)
    table0_N_.column1 as column1_1_9_,
    ... (cols of table N)
    CASE WHEN table0_1_.id is not null then 1
         ... (tables 1 to N-1)
         WHEN table0_N_.id is not null then N
         WHEN table0_.id is not null then 0
    END as clazz_
FROM table table0_
    left outer join table1 table0_1_ on table0_.id = table0_1_.id
    ... (joins of the other tables)
    left outer join tableN table0_N_ on table0_.id = table0_N_.id
In the SQL above you can see the clazz_ column. If you want Hibernate to map the ResultSet to your superclass (PlaceEntity) from a native query, you have to provide the clazz_ column in the SELECT yourself.
In your case it will be:
@Query(value = "" +
        "SELECT *, 0 AS clazz_ " +
        "FROM place " +
        "WHERE earth_distance( " +
        "  ll_to_earth(place.latitude, place.longitude), " +
        "  ll_to_earth(:latitude, :longitude) " +
        ") < :radius",
        nativeQuery = true)
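Putting the pieces together, the full repository method with the clazz_ fix applied would look roughly like this (an untested sketch assembled from the snippets above; only the SELECT list changes):

public interface PlaceRepository extends JpaRepository<PlaceEntity, Long> {

    // 0 matches the CASE ... END as clazz_ value Hibernate generates
    // for the root entity of the JOINED hierarchy.
    @Query(value = "" +
            "SELECT *, 0 AS clazz_ " +
            "FROM place " +
            "WHERE earth_distance( " +
            "  ll_to_earth(place.latitude, place.longitude), " +
            "  ll_to_earth(:latitude, :longitude) " +
            ") < :radius",
            nativeQuery = true)
    List<PlaceEntity> findNearby(@Param("latitude") Float latitude,
                                 @Param("longitude") Float longitude,
                                 @Param("radius") Integer radius);
}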
You should use the name of the entity class instead of the table name in the query. Change place to PlaceEntity.
@Query(value = "" +
        "SELECT * " +
        "FROM place JOIN google_place ON google_place.id = place.id " +
        "WHERE earth_distance( " +
        "  ll_to_earth(place.latitude, place.longitude), " +
        "  ll_to_earth(:latitude, :longitude) " +
        ") < :radius",
        nativeQuery = true)
List<GooglePlaceEntity> findNearby(@Param("latitude") Float latitude,
                                   @Param("longitude") Float longitude,
                                   @Param("radius") Integer radius);

Writing query with parameters to avoid SQL Injections

I have done this before, but in this case I have an insert-into-table query where the value of one column of the target table comes from the result of another query. Given that, I'm not sure my parameterized query is formatted the right way.
Here is the original query, before the SQL injection fix:
cmd.CommandText += "insert into controlnumber (controlnumber, errorid) " +
    "values ('" + ControlNumber + "', (select errorid from error where " +
    "errordescription = '" + ErrorDescription + "' and errortype = '" + ErrorType + "' " +
    "and applicationid = " + ApplicationID + " and statusid = " + StatusID +
    " and userid = " + UserID + " and errortime = '" + ErrorTime + "'));";
This is the query after I tried to fix the SQL injection:
cmd.CommandText = "insert into ControlTable (ControlNumber, ErrorID) " +
    "values (@ControlNum, (select errorid from error where errordescription = " +
    "@ErrorDescription and errortype = @errorType and applicationid = " +
    "@ApplicationID and statusid = @StatusID and userid = @UserID and " +
    "errortime = @ErrorTime))";
This is where I add parameters:
.....
command.CommandType = CommandType.Text;
command.Parameters.AddWithValue("@ErrorDescription", ErrorDesc);
command.Parameters.AddWithValue("@ControlNum", cntNumber);
command.Parameters.AddWithValue("@errorType", ErrorType);
command.Parameters.AddWithValue("@ApplicationID", AppID);
command.Parameters.AddWithValue("@StatusID", StatusID);
command.Parameters.AddWithValue("@UserID", UserID);
....
I'm just wondering if my CommandText is formatted the right way.
Thanks.
try this:
cmd.CommandText = "insert into ControlTable (ControlNumber, ErrorID) " +
    "select @ControlNum, errorid from error where errordescription = " +
    "@ErrorDescription and errortype = @errorType and applicationid = " +
    "@ApplicationID and statusid = @StatusID and userid = @UserID and " +
    "errortime = @ErrorTime";
When using INSERT INTO ... SELECT ... FROM, you do not use the VALUES keyword. The syntax is:
INSERT INTO TABLE(columns) SELECT ... FROM TABLE2

ESPER: 'Partition by' CLAUSE ERROR

The issue I have is with the 'partition by' clause in 'match_recognize'. The 'partition by' clause seems to support just 99 different partition values: when I have 100 or more different ids it does not group correctly. To test this I have the following EPL query:
select * from TemperatureSensorEvent
match_recognize (
    partition by id
    measures A.id as a_id, A.temperature as a_temperature
    pattern (A)
    define
        A as prev(A.id) is null
)
I am using this query basically to get the first event (the first temperature) of each device. Testing with 10, 20, 50, ... 99 different devices it works fine, but with more than 99 it seems that Esper forgets the events sent before the device with id=100: if I then send an event from the device with id=1, Esper treats it as if it were the first event.
So it seems that 'partition by' supports just 99 different partitions, and adding one more resets the statement, or something like that. Is this a restriction of the 'partition by' clause? How can I increase this threshold, given that I have more than 100 devices?
Esper version: 5.1.0
Thanks in advance
Demo Class:
public class EsperDemo
{
    public static void main(String[] args)
    {
        Configuration config = new Configuration();
        config.addEventType("TemperatureSensorEvent", TemperatureSensorEvent.class.getName());
        EPServiceProvider esperProvider = EPServiceProviderManager.getProvider("EsperDemoEngine", config);
        EPAdministrator administrator = esperProvider.getEPAdministrator();
        EPRuntime esperRuntime = esperProvider.getEPRuntime();

        // query to get the first event of each temperature sensor
        String query = "select * from TemperatureSensorEvent "
                + "match_recognize ( "
                + "  partition by id "
                + "  measures A.id as a_id, A.temperature as a_temperature "
                + "  after match skip to next row "
                + "  pattern (A) "
                + "  define "
                + "    A as prev(A.id) is null "
                + ")";

        TemperatureSubscriber temperatureSubscriber = new TemperatureSubscriber();
        EPStatement cepStatement = administrator.createEPL(query);
        cepStatement.setSubscriber(temperatureSubscriber);

        TemperatureSensorEvent temperature;
        Random random = new Random();
        int sensorsQuantity = 100; // it works fine until 99 sensors
        for (int i = 1; i <= sensorsQuantity; i++) {
            temperature = new TemperatureSensorEvent(i, random.nextInt(20));
            System.out.println("Sending temperature: " + temperature.toString());
            esperRuntime.sendEvent(temperature);
        }
        temperature = new TemperatureSensorEvent(1, 64);
        System.out.println("Sending temperature: sensor with id=1 again: " + temperature.toString());
        esperRuntime.sendEvent(temperature);
    }
}
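The post does not include the event class, so for reference here is a best-guess sketch of the POJO the demo assumes (field names inferred from the EPL and the constructor calls):

// Hypothetical event POJO implied by the demo; Esper reads the id and
// temperature properties through the JavaBean getters.
public class TemperatureSensorEvent
{
    private final int id;
    private final int temperature;

    public TemperatureSensorEvent(int id, int temperature) {
        this.id = id;
        this.temperature = temperature;
    }

    public int getId() { return id; }
    public int getTemperature() { return temperature; }

    @Override
    public String toString() {
        return "TemperatureSensorEvent{id=" + id + ", temperature=" + temperature + "}";
    }
}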

Uploading data using C# console application

I have a C# console application that uploads data into a SQL Server database after doing a bit of calculation, which is done using various C# functions. The problem is that it takes almost 1 second to calculate and upload one line of data, and I have to upload 50,000 lines of data the same way.
Please suggest a way to solve this problem.
P.S.: I am using a StringBuilder to compose separate insert statements and upload them in bulk. That process takes only 1 minute.
Inserting or updating the database is hardly taking any time, as mentioned above; the calculation takes most of the time. I am attaching a code sample of one function below:
public void EsNoMinLim()
{
    ds = new DataSet();
    ds = getDataSet("select aa.Country, aa.Serial_No from UEM_Data aa inner join (select distinct " +
        "IId, Country from UEM_Data where Active_Status is null) bb on aa.iid = bb.iid where aa.Serial_No <> '0'").Copy();
    execDML("Delete from ProMonSys_Grading");
    StringBuilder strCmd = new StringBuilder();
    foreach (DataRow dRow in ds.Tables[0].Rows)
    {
        SiteCode = dRow["Country"].ToString();
        Serial_No = dRow["Serial_No"].ToString();
        ds_sub = new DataSet();
        ds_sub = getDataSet("select EsNo_Abs_Limit from EsNo_Absolute_Limit where Fec_Coding_Rate in " +
            "(select MODCOD from FEC_Master where NMS_Value in (select Top 1 FEC_Rate from " +
            "DNCC_Billing_Day where Serial_No = '" + Serial_No + "' and [Date] = (select max([Date]) " +
            "from DNCC_Billing_Day where Serial_No = '" + Serial_No + "')))").Copy();
        if (ds_sub.Tables[0].Rows.Count > 0 && Convert.ToString(ds_sub.Tables[0].Rows[0][0]) != "")
        {
            Min_EsNo = Convert.ToString(ds_sub.Tables[0].Rows[0][0]);
        }
        else
        {
            Min_EsNo = "a";
        }
        if (Min_EsNo != "a")
        {
            ds_sub = new DataSet();
            ds_sub = getDataSet("select Top 1 modal_Avg_EsNo from DNCC_Billing_Day where " +
                "Serial_No = '" + Serial_No + "' and [Date] = (select max([Date]) from DNCC_Billing_Day " +
                "where Serial_No = '" + Serial_No + "')").Copy();
            if (ds_sub.Tables[0].Rows.Count > 0 && Convert.ToString(ds_sub.Tables[0].Rows[0][0]) != "")
            {
                Avg_EsNo = Convert.ToString(ds_sub.Tables[0].Rows[0][0]);
            }
            else
            {
                Avg_EsNo = "-1";
            }
            ds_sub = new DataSet();
            ds_sub = getDataSet("select Top 1 Transmit_Power from ProMonSys_Threshold where Serial_No = '" + Serial_No + "'").Copy();
            if (ds_sub.Tables[0].Rows.Count > 0 && Convert.ToString(ds_sub.Tables[0].Rows[0][0]) != "")
            {
                Threshold_EsNo = Convert.ToString(ds_sub.Tables[0].Rows[0][0]);
            }
            else
            {
                Threshold_EsNo = "-1";
            }
            getGrade = EsNoSQFGrading(Min_EsNo, Avg_EsNo, Threshold_EsNo);
            // ';' separates the batched insert statements in the StringBuilder
            strCmd.Append("insert into ProMonSys_Grading(SiteCode, Serial_No, EsNo_Grade) " +
                "values('" + SiteCode + "','" + Serial_No + "','" + getGrade + "');");
        }
    }
    execDML_StringBuilder(strCmd);
}
Find out which part of the process is the expensive one. Use Stopwatch to check how long loading, calculating, and saving take separately. Then you know which part to improve (and can tell us).