Apache Flink: LEFT JOIN with a TableFunction does not return the expected result

Flink version: 1.3.1
I created two tables: one from memory and another from a UDTF. When I tested JOIN and LEFT JOIN, both returned the same result. I expected the LEFT JOIN to return more rows than the JOIN.
My test code is:
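import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.functions.TableFunction;

public class ExerciseUDF {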
public static void main(String[] args) throws Exception {
test_3();
}
public static void test_3() throws Exception {
// 1. set up execution environment
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
DataSet<WC> input = env.fromElements(
new WC("Hello", 1),
new WC("Ciao", 1),
new WC("Hello", 1));
// 2. register the DataSet as table "WordCount"
tEnv.registerDataSet("WordCount", input, "word, frequency");
Table table;
DataSet<WC> result;
DataSet<WCUpper> resultUpper;
table = tEnv.scan("WordCount");
// 3. table left join user defined table
System.out.println("table left join user defined table");
tEnv.registerFunction("myTableUpperFunc",new MyTableFunc_2());
table = tEnv.sql("SELECT S.word as word, S.frequency as frequency, T.myupper as myupper FROM WordCount as S left join LATERAL TABLE(myTableUpperFunc(S.word)) as T(word,myupper) on S.word = T.word");
resultUpper = tEnv.toDataSet(table, WCUpper.class);
resultUpper.print(); // output: WCUpper Ciao 1 CIAO, but no row for Hello
// 4. table join user defined table
System.out.println("table join user defined table");
tEnv.registerFunction("myTableUpperFunc",new MyTableFunc_2());
table = tEnv.scan("WordCount");
table = tEnv.sql("SELECT S.word as word, S.frequency as frequency, T.myupper as myupper FROM WordCount as S join LATERAL TABLE(myTableUpperFunc(S.word)) as T(word,myupper) on S.word = T.word");
resultUpper = tEnv.toDataSet(table, WCUpper.class);
resultUpper.print();
}
public static class WC {
public String word;
public long frequency;
// public constructor to make it a Flink POJO
public WC() {
}
public WC(String word, long frequency) {
this.word = word;
this.frequency = frequency;
}
@Override
public String toString() {
return "WC " + word + " " + frequency;
}
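}
// Result POJO: its definition is not shown in the question; assumed to mirror WC plus a myupper field
public static class WCUpper {
public String word;
public long frequency;
public String myupper;
public WCUpper() {
}
@Override
public String toString() {
return "WCUpper " + word + " " + frequency + " " + myupper;
}
}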
// user defined table function
public static class MyTableFunc_2 extends TableFunction<Tuple2<String,String>>{
public void eval(String str){ // hello --> hello HELLO
System.out.println("upper func executed for "+str);
if(str.equals("Hello")){
return;
}
collect(new Tuple2<String,String>(str,str.toUpperCase()));
// collect(new Tuple2<String,String>(str,str.toUpperCase()));
}
}
}
The left join and join queries produce the same output. In both cases only one row is returned:
WCUpper Ciao 1 CIAO
However, I think that the left join query should preserve the 'Hello' rows.

Yes, you are right.
This is a bug in how Flink translates outer joins against a TableFunction when the join has a predicate, and it needs to be fixed.
Thanks, Fabian
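For reference, the result the question expects can be sketched in plain Java, independent of Flink. This is a hypothetical in-memory model (class and method names are made up for illustration), not how Flink executes the query: for each left row the table function is called, its rows are filtered by the ON predicate, and in the LEFT JOIN case a left row without a surviving match is still emitted, with null for the function's column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LateralJoinSketch {
    // One input row of the WordCount table
    static final class WC {
        final String word;
        final long frequency;
        WC(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }
    }

    // One output row; myupper is null when the left row had no match
    static final class WCUpper {
        final String word;
        final long frequency;
        final String myupper;
        WCUpper(String word, long frequency, String myupper) {
            this.word = word;
            this.frequency = frequency;
            this.myupper = myupper;
        }
        @Override
        public String toString() {
            return "WCUpper " + word + " " + frequency + " " + myupper;
        }
    }

    // Same logic as MyTableFunc_2.eval: no rows for "Hello", otherwise (str, STR)
    static List<String[]> upperFunc(String str) {
        List<String[]> rows = new ArrayList<>();
        if (!str.equals("Hello")) {
            rows.add(new String[] {str, str.toUpperCase()});
        }
        return rows;
    }

    // Joins each left row with the rows its own function call produced
    static List<WCUpper> lateralJoin(List<WC> input, boolean leftOuter) {
        List<WCUpper> result = new ArrayList<>();
        for (WC s : input) {
            boolean matched = false;
            for (String[] t : upperFunc(s.word)) {
                if (s.word.equals(t[0])) { // ON S.word = T.word
                    result.add(new WCUpper(s.word, s.frequency, t[1]));
                    matched = true;
                }
            }
            if (leftOuter && !matched) {
                result.add(new WCUpper(s.word, s.frequency, null)); // null-padded row
            }
        }
        return result;
    }

    // The three rows registered as "WordCount" in the question
    static List<WC> sampleInput() {
        return Arrays.asList(new WC("Hello", 1), new WC("Ciao", 1), new WC("Hello", 1));
    }
}
```

With the question's input, the left-outer variant yields three rows (both Hello rows with a null myupper, plus Ciao 1 CIAO), while the inner variant yields only Ciao 1 CIAO.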

Related

How to optimize SQL query in Anylogic

I am generating agents with parameter values coming from an SQL table in AnyLogic. When an agent is generated at the source, I do a lookup (like VLOOKUP) in the table and extract the corresponding values. This works correctly, but it slows the model down.
The structure of the table looks like this:
I query the data from this table with the code below:
double value_1 = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.avg_value)).get(0);
double value_min = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.min_value)).get(0);
double value_max = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.max_value)).get(0);
// Fetch the cluster number from account table
int cluster_num = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.cluster)).get(0);
int act_no = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.actno)).get(0);
String pay_term = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.pay_term)).get(0);
String pay_term_prob = (selectFrom(account_details)
.where(account_details.act_code.eq(z))
.list(account_details.pay_term_prob)).get(0);
But this is very slow, and I want to improve the performance. Someone mentioned that we can create a Java class and then load the table into a collection. Is there an example I can refer to? I am finding it difficult to put the entire code together.
I have created a class using the code below:
public class Customer {
private String act_code;
private int actno;
private double avg_value;
private String pay_term;
private String pay_term_prob;
private int cluster;
private double min_value;
private double max_value;
public String getact_code() {
return act_code;
}
public void setact_code(String act_code) {
this.act_code = act_code;
}
public int getactno() {
return actno;
}
public void setactno(int actno) {
this.actno = actno;
}
public double getavg_value() {
return avg_value;
}
public void setavg_value(double avg_value) {
this.avg_value = avg_value;
}
public String getpay_term() {
return pay_term;
}
public void setpay_term(String pay_term) {
this.pay_term = pay_term;
}
public String getpay_term_prob() {
return pay_term_prob;
}
public void setpay_term_prob(String pay_term_prob) {
this.pay_term_prob = pay_term_prob;
}
public int getcluster() {
return cluster;
}
public void setcluster(int cluster) {
this.cluster = cluster;
}
public double getmin_value() {
return min_value;
}
public void setmin_value(double min_value) {
this.min_value = min_value;
}
public double getmax_value() {
return max_value;
}
public void setmax_value(double max_value) {
this.max_value = max_value;
}
}
I created a collection object like this:
Please point me to a reference for loading this database table into the collection as the next step. Then I want to query the collection based on a condition.
You are on the right track here!
Every time you access the database to read data, there is a computational overhead. So the best option is to access the database only once, at the start of the model: create all the objects you need, store any other data you will need later in Java classes, and then use those Java classes.
My suggestion is to create a Java class to represent a row of your table, as you have done. Then create a map, as you have done, but with a String key and this new class as the value.
Then on model start you can populate this map as follows:
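List<Tuple> rows = selectFrom(customer).list();
for (Tuple row : rows) {
// Customer only has a no-argument constructor, so fill it through its setters
Customer customerData = new Customer();
customerData.setact_code(row.get(customer.act_code));
customerData.setactno(row.get(customer.actno));
customerData.setavg_value(row.get(customer.avg_value));
// ...set the remaining columns the same way
mapOfCustomerData.put(customerData.getact_code(), customerData);
}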
Here mapOfCustomerData is a LinkedHashMap<String, Customer> and customer is the name of the table.
See the model created in this blog post for more details, and for an example of using a scenario object to store all the data from the database in a separate object.
Note: the code above is just an example; read this blog post for more details on using the AnyLogic Internal Database.
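The load-once, look-up-by-key idea can be sketched in plain Java without AnyLogic. In this hypothetical sketch the rows are passed in as a list (standing in for the single selectFrom(...) call at model start) and Customer is trimmed to three fields; each agent then does one map lookup instead of seven database queries.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CustomerCacheSketch {
    // Trimmed-down stand-in for the question's Customer class
    static final class Customer {
        final String actCode;
        final double avgValue;
        final int cluster;
        Customer(String actCode, double avgValue, int cluster) {
            this.actCode = actCode;
            this.avgValue = avgValue;
            this.cluster = cluster;
        }
    }

    private final Map<String, Customer> byActCode = new LinkedHashMap<>();

    // Call once at model start with every row of the table
    void load(List<Customer> rows) {
        for (Customer row : rows) {
            // putIfAbsent keeps the first row per act_code, similar to .get(0) on a query result
            byActCode.putIfAbsent(row.actCode, row);
        }
    }

    // One hash lookup per agent; returns null if the code is not in the table
    Customer lookup(String actCode) {
        return byActCode.get(actCode);
    }
}
```

Using putIfAbsent rather than put is a deliberate choice: if act_code is not unique, the first row loaded wins instead of the last one, which is closer to what the original .get(0) queries returned.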
Before using Java classes, try this first: click the "index" tickbox for all columns that you query with a WHERE clause.

Eloquent hasMany with hasMany and a join in the middle

I have this database structure
orders ====► order_items ====► order_item_meta
║ |
║ |
▼ ▼
order_meta products
The relations are: orders hasMany order_items, order_items hasMany order_item_meta (so orders hasManyThrough order_item_meta), and orders also hasMany order_meta.
In addition, order_items.product_id needs to be joined with the products table.
I have the order_id and I am trying to get the whole data in one call. But I have a weird issue. This is the current code:
$orders = Orders::
with([
'order_items' => function($q) { //#1
$q->leftJoin('products','order_items.product_id', '=', 'products.id');
}
])
->with(['order_items.orderitem_meta']) //#2
->with(['order_meta']); //#3
It seems that with#1 and with#2 are interfering with each other.
Case1: If I do with#1+with#3, I am able to see in the result the data from the product table + the data from order_items, but not the data from order_item_meta.
Case2: If I do with#2+with#3, I am able to see in the result the data from order_items + data from order_item_meta, but not from the product table.
In both cases data from with#3 is ok.
But if I do all three together (with#1+with#2+with#3), I get the same results as Case1: data from order_item_meta is missing.
Orders.php
class Orders extends Model
{
public function order_items()
{
return $this->hasMany('App\OrderItem','order_id','id'); //'foreign_key', 'local_key'
}
public function order_meta()
{
return $this->hasMany('App\OrderMeta','order_id','id'); //'foreign_key', 'local_key'
}
public function orderitem_meta()
{
return $this->hasManyThrough(
'App\OrderItemMeta', // final model
'App\OrderItem', // intermediate model
'order_id', // Foreign key on order_items table...
'order_item_id', // Foreign key on order_itemmeta table...
'id', // Local key on orders table...
'id' // Local key on order_items table...
);
}
}
OrderItem.php
class OrderItem extends Model
{
public function order()
{
return $this->belongsTo('App\Orders');
}
public function orderitem_meta()
{
return $this->hasMany('App\OrderItemMeta','order_item_id','id'); //'foreign_key', 'local_key'
}
}
OrderItemMeta.php
class OrderItemMeta extends Model
{
protected $table = 'order_itemmeta';
public function orderitem()
{
return $this->belongsTo('App\OrderItem', 'order_item_id'); // otherwise the foreign key is guessed as orderitem_id
}
}
What is the correct way to do this query?
I solved it by adding a relationship between the order_items and the products:
in OrderItem.php
public function product()
{
return $this->hasOne('App\Products','id','product_id'); //'foreign_key', 'local_key'
}
then the query becomes this:
$orders = Orders::
with(['order_items.orderitem_meta','order_items.product','order_meta']);
and it works

DISTINCT ON in JPQL or JPA criteria builder

I have a JPA entity User which contains a field (entity) City. I want to select one page of, for example, 10 users but from different cities.
In SQL I would use something like:
SELECT DISTINCT ON (u.city_id) u.username, u.email, u.city_id ....
FROM user u LIMIT 10 OFFSET 0 ....
but I need to do it with JPQL or JPA criteria builder. How can I achieve this?
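To make the target semantics concrete: PostgreSQL's DISTINCT ON (city_id) keeps the first row of each city_id group, where "first" is determined by the ORDER BY (without one it is unpredictable), and the LIMIT is applied afterwards. The hypothetical in-memory equivalent below is not a JPA solution; it only illustrates the shape of the result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DistinctOnSketch {
    // Minimal stand-in for the User entity
    static final class User {
        final String username;
        final long cityId;
        User(String username, long cityId) {
            this.username = username;
            this.cityId = cityId;
        }
    }

    // Keeps the first user per city in input order, then applies the limit
    static List<User> firstPerCity(List<User> orderedUsers, int limit) {
        Map<Long, User> firstByCity = new LinkedHashMap<>();
        for (User u : orderedUsers) {
            firstByCity.putIfAbsent(u.cityId, u);
        }
        List<User> page = new ArrayList<>();
        for (User u : firstByCity.values()) {
            if (page.size() == limit) {
                break;
            }
            page.add(u);
        }
        return page;
    }
}
```

This loads every row into memory, so it only illustrates the result; the answers below push the work into the database.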
I recently came across the same situation and found that there is no direct way to do this with a criteria query.
Here is my solution:
Create a custom SQL function for DISTINCT ON
Register the function in a custom dialect
Set the dialect in the properties
Call the function from the criteria query
1) Create Custom function
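import java.util.List;
import org.hibernate.QueryException;
import org.hibernate.dialect.function.SQLFunction;
import org.hibernate.engine.spi.Mapping;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.type.StandardBasicTypes;
import org.hibernate.type.Type;

// Renders "DISTINCT ON(<args>) <first arg>"; placed first in the select list,
// it becomes the DISTINCT ON prefix of the generated SELECT clause
public class DistinctOn implements SQLFunction {
@Override
public boolean hasArguments() {
return true;
}
@Override
public boolean hasParenthesesIfNoArguments() {
return true;
}
@Override
public Type getReturnType(Type type, Mapping mapping) throws QueryException {
return StandardBasicTypes.STRING;
}
@Override
public String render(Type type, List arguments, SessionFactoryImplementor sessionFactoryImplementor) throws QueryException {
if (arguments.size() == 0) {
throw new QueryException("distinct on should have at least one argument");
}
String commaSeparatedArgs = String.join(",", arguments);
return " DISTINCT ON( " + commaSeparatedArgs + ") " + arguments.get(0) + " ";
}
}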
2) Register Function
public class CustomPostgresSqlDialect extends PostgreSQLDialect {
public CustomPostgresSqlDialect() {
super();
registerFunction("DISTINCT_ON", new DistinctOn());
}
}
3) Update the dialect
Pass your own dialect class name here:
spring.jpa.properties.hibernate.dialect = com.harshal.common.CustomPostgresSqlDialect
4) Use it in Criteria Query
CriteriaBuilder cb = em.getCriteriaBuilder();
// Each result row holds several values, so select Object[] rather than User
CriteriaQuery<Object[]> query = cb.createQuery(Object[].class);
Root<User> user = query.from(User.class);
// SELECT DISTINCT ON (u.city_id) u.city_id, u.username
query.multiselect(
cb.function("DISTINCT_ON", String.class, user.get("city")),
user.get("userName")
);
return em.createQuery(query).getResultList();
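With the (legacy) Hibernate Criteria API you can get close to this. Note that Projections.distinct applies a plain SQL DISTINCT to the whole projection rather than PostgreSQL's DISTINCT ON, so rows are deduplicated across all selected columns, not per city.
Sample code: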
Criteria criteria = session.createCriteria(User.class);
ProjectionList projectionList = Projections.projectionList();
projectionList.add(Projections.property("city_id"), "cityId");
projectionList.add(Projections.property("username"), "username");
projectionList.add(Projections.property("email"), "email");
// Wrap the whole projection list so the generated SQL starts with SELECT DISTINCT
criteria.setProjection(Projections.distinct(projectionList));
criteria.setResultTransformer(Transformers.aliasToBean(User.class));
List list = criteria.list();

How do I execute "select distinct ename from emp" using GreenDao

I am trying to get the distinct values of a column in an SQLite DB using greenDAO. How do I do it? Any help is appreciated.
You have to use a raw query, for example like this:
private static final String SQL_DISTINCT_ENAME = "SELECT DISTINCT "+EmpDao.Properties.EName.columnName+" FROM "+EmpDao.TABLENAME;
public static List<String> listEName(DaoSession session) {
ArrayList<String> result = new ArrayList<String>();
Cursor c = session.getDatabase().rawQuery(SQL_DISTINCT_ENAME, null);
try{
if (c.moveToFirst()) {
do {
result.add(c.getString(0));
} while (c.moveToNext());
}
} finally {
c.close();
}
return result;
}
Of course you can add filter criteria to the query as well.
The static String SQL_DISTINCT_ENAME is used for performance, so that the query string doesn't have to be built every time.
EmpDao.Properties and EmpDao.TABLENAME are used so that the column and table names always match exactly what greenDAO generates.

Build condition with date for jpa2 typesafe query

I have the following query:
SELECT DISTINCT *
FROM Projekt p
WHERE p.bewilligungsdatum = to_date('01-07-2000', 'dd-mm-yyyy')
but I have problems building the condition. Here is my code:
condition = criteriaBuilder.equal((Expression<String>) projekt.get(criterion), "to_date('" + projektSearchField + "', 'dd-mm-yyyy')");
This generates the following:
SELECT DISTINCT *
FROM Projekt p
WHERE p.bewilligungsdatum = 'to_date('01-07-2000', 'dd-mm-yyyy')'
and of course it doesn't work. Which method should I use for date comparison (or how can I remove the outer ' characters around the pattern)?
Why don't you try working with parameters, like this? Then you can do the String-to-Date conversion in Java and pass a real java.util.Date to the database.
EntityManager em; // initialized somewhere
Date datum; // initialized somewhere
...
String queryString = "SELECT p "
+ "FROM Projekt p "
+ "WHERE p.bewilligungsdatum = :datum";
TypedQuery<Projekt> query = em.createQuery(queryString, Projekt.class);
// Bind as an SQL DATE so only the date part is compared
query.setParameter("datum", datum, TemporalType.DATE);
List<Projekt> projekte = query.getResultList();
This way you stay database-independent, because you are not using the vendor-specific to_date function.
Best regards from Bremen ;o)
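The String-to-Date step can be sketched as follows, assuming the search field uses the same day-month-year layout as the to_date call. The Oracle mask 'dd-mm-yyyy' does not carry over literally: in java.text.SimpleDateFormat, "mm" means minutes and "MM" means month. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class ProjektDateSketch {
    // Converts e.g. "01-07-2000" into a java.util.Date for the :datum parameter
    static Date parseSearchField(String text) {
        SimpleDateFormat format = new SimpleDateFormat("dd-MM-yyyy"); // MM = month, mm would be minutes
        format.setLenient(false); // reject impossible dates such as "32-13-2000"
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Expected dd-MM-yyyy but got: " + text, e);
        }
    }
}
```

The returned Date can then be bound to the :datum parameter of the query above.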
This should also work, by passing a date as the parameter of a restriction:
Date datum; // initialized somewhere
Criteria criteria = ...
criteria.add(Restrictions.eq("bewilligungsdatum", datum));
...
Sorry, I had the Hibernate Criteria API in mind.
Then try something like this with the CriteriaBuilder:
Date datum; // initialized somewhere
...
final CriteriaQuery<Projekt> query = criteriaBuilder.createQuery(Projekt.class);
final Root<Projekt> projekt = query.from(Projekt.class);
Predicate condition = criteriaBuilder.equal(projekt.get("bewilligungsdatum"), datum);
query.where(condition);
I have not used this before, so please try it yourself.
You can use SqlBuilder (https://openhms.sourceforge.io/sqlbuilder/) and then build the condition like this:
Object value1 = hire_date; // the column object for hire_date
Object value2 = new CustomObj("to_date('2018-12-01 00:00:00','yyyy-MM-dd HH:mm:ss')");
//CustomObj
public class CustomObj extends Expression {
private Object _value;
public CustomObj(Object value) {
_value = value;
}
@Override
public boolean hasParens() {
return false;
}
@Override
protected void collectSchemaObjects(ValidationContext vContext) {
}
@Override
public void appendTo(AppendableExt app) throws IOException {
app.append(_value);
}
}
BinaryCondition.greaterThan(value1, value2, true); // true = inclusive, i.e. >=
The generated SQL looks like: (hire_date >= to_date('2018-12-01 00:00:00','yyyy-MM-dd HH:mm:ss'))