How to store lat/lon info read from mobile GPS into an SQLite database? - android-sqlite

I built an app that reads the coordinates of the current location (every 10-20 meters) while the mobile moves through the streets and displays the current lat/lon. Now I would like to develop the app further so that it saves (stores) these lat/lon values into an SQLite database. Any help (guide) as a starting point would be appreciated.

First you would need to design the database schema. Storing just lat/lon would probably be pretty useless. You say every 10-20 meters, so there is something else, perhaps the time, that makes the lat/lon useful.
So, assuming that time is the factor that in conjunction with the lat/lon makes it useful, you would want to store the lat, lon and time per row.
That would be 3 columns in your schema, plus the rowid (a special column that is included in all tables, with some less frequently used exceptions), making 4 columns in total (the rowid uniquely identifies a row).
So your schema could be :-
- a column for storing the rowid; just in case, the standard Android BaseColumns will be used, which equates to _ID. The value stored will be a unique integer (long) and it will be the PRIMARY KEY (which implies a unique column).
- a column for the lat; a double should suffice
- likewise for the lon
- a column for the date/time, which can be stored as a long
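Put together, the above schema amounts to this single piece of DDL (it is equivalent to the statement that the helper class shown later builds from its column-name constants) :-
CREATE TABLE IF NOT EXISTS _latlon (
    _id INTEGER PRIMARY KEY, /* an alias of the rowid; uniquely identifies a row */
    _lat REAL, /* latitude */
    _lon REAL, /* longitude */
    _timestamp INTEGER /* e.g. seconds since the Unix epoch */
);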
The next stage is to do all the database handling stuff. This is greatly simplified by extending the SQLiteOpenHelper class.
So you can have a class that does this. That class MUST implement two overriding methods, onCreate and onUpgrade.
onCreate is invoked when the database is created, and is where you create the table(s) and perhaps other components (i.e. setting up the schema). Note that this runs only once for the lifetime of the database.
onUpgrade is for handling when the database version is changed; that is beyond the scope of this answer, so here it will just do nothing.
You probably want to minimise the opening and closing of the database as both can be relatively resource hungry. So a singleton approach without any closing of the database is recommended.
You may wish to include the methods that access the database (the CRUD) in the class.
So putting all the above together you could have, for example, the class DBHelper as :-
import android.content.ContentValues;
import android.content.Context;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;
import android.provider.BaseColumns;

class DBHelper extends SQLiteOpenHelper {

    public static final String DATABASE_NAME = "the_database.db";
    public static final int DATABASE_VERSION = 1;
    public static final String TABLE_NAME_LATLON = "_latlon";
    public static final String COL_NAME_LATLON_ID = BaseColumns._ID;
    public static final String COL_NAME_LATLON_LAT = "_lat";
    public static final String COL_NAME_LATLON_LON = "_lon";
    public static final String COL_NAME_TIMESTAMP = "_timestamp";

    private static final String TABLE_LATLON_CRTSQL = "CREATE TABLE IF NOT EXISTS "
            + TABLE_NAME_LATLON + "("
            + COL_NAME_LATLON_ID + " INTEGER PRIMARY KEY"
            + "," + COL_NAME_LATLON_LAT + " REAL "
            + "," + COL_NAME_LATLON_LON + " REAL "
            + "," + COL_NAME_TIMESTAMP + " INTEGER "
            + ")";

    /* protect the constructor from being used elsewhere, thus forcing use of the singleton */
    private SQLiteDatabase db;
    private DBHelper(Context context) {
        super(context, DATABASE_NAME, null, DATABASE_VERSION);
        db = this.getWritableDatabase();
    }

    /* Use a singleton approach */
    private volatile static DBHelper INSTANCE = null;
    public static DBHelper getInstance(Context context) {
        if (INSTANCE == null) {
            INSTANCE = new DBHelper(context);
        }
        return INSTANCE;
    }

    @Override /* REQUIRED - ****NOTE: ONLY EVER RUNS ONCE FOR THE LIFETIME OF THE DATABASE**** */
    public void onCreate(SQLiteDatabase db) {
        db.execSQL(TABLE_LATLON_CRTSQL);
    }

    @Override /* REQUIRED (but doesn't have to do anything) */
    public void onUpgrade(SQLiteDatabase db, int i, int i1) {
    }

    /* Full Insert method */
    public long insertLatLonRow(Long id, double lat, double lon, Long timestamp) {
        ContentValues cv = new ContentValues();
        if (id != null && id > -1) {
            cv.put(COL_NAME_LATLON_ID, id);
        }
        cv.put(COL_NAME_LATLON_LAT, lat);
        cv.put(COL_NAME_LATLON_LON, lon);
        if (timestamp == null) {
            timestamp = System.currentTimeMillis() / 1000;
        }
        cv.put(COL_NAME_TIMESTAMP, timestamp);
        return db.insert(TABLE_NAME_LATLON, null, cv);
    }

    /* Partial Insert method - lat, lon and timestamp (calls full method) */
    public long insertLatLonRow(double lat, double lon, long timestamp) {
        return insertLatLonRow(null, lat, lon, timestamp);
    }

    /* Partial Insert method (calls full method) */
    public long insertLatLonRow(double lat, double lon) {
        return insertLatLonRow(null, lat, lon, null);
    }

    /* Extract data */
    public Cursor getLatLonsForAPeriod(long start_timestamp, long end_timestamp) {
        return db.query(
                TABLE_NAME_LATLON, /* The table that the rows will be SELECTed from */
                null, /* ALL COLUMNS */
                COL_NAME_TIMESTAMP + " BETWEEN ? AND ?", /* The WHERE clause (less the WHERE keyword); the ?'s will be bound (replaced) */
                new String[]{String.valueOf(start_timestamp), String.valueOf(end_timestamp)}, /* The values to be bound: the first replaces the first ?, the 2nd the 2nd ... */
                null, /* no GROUP BY clause */
                null, /* no HAVING clause */
                COL_NAME_TIMESTAMP /* ORDER BY clause */
        );
    }
}
The DBHelper will do nothing on its own, so here's an example of utilising it. This will insert some data (2 rows), then extract the data and write it to the log :-
import android.database.Cursor;
import android.os.Bundle;
import android.util.Log;
import androidx.appcompat.app.AppCompatActivity;

public class MainActivity extends AppCompatActivity {

    DBHelper mDBHelper;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        mDBHelper = DBHelper.getInstance(this);
        /* Add some test data */
        mDBHelper.insertLatLonRow(10.1, 100.5678);
        mDBHelper.insertLatLonRow(null, 11.1, 11.6789, (System.currentTimeMillis() / 1000) - (60 * 60)); /* 1 hour before */
        /* Extract the data */
        Cursor csr = mDBHelper.getLatLonsForAPeriod(
                (System.currentTimeMillis() / 1000) - (10 * 60 * 60),
                System.currentTimeMillis() / 1000 + (60 * 60)
        );
        int id_offset = csr.getColumnIndex(DBHelper.COL_NAME_LATLON_ID);
        int lat_offset = csr.getColumnIndex(DBHelper.COL_NAME_LATLON_LAT);
        int lon_offset = csr.getColumnIndex(DBHelper.COL_NAME_LATLON_LON);
        int ts_offset = csr.getColumnIndex(DBHelper.COL_NAME_TIMESTAMP);
        while (csr.moveToNext()) {
            Log.d(
                    "LATLONINFO",
                    "ID is " + csr.getString(id_offset)
                            + " LAT is " + csr.getDouble(lat_offset)
                            + " LON is " + csr.getDouble(lon_offset)
                            + " TimeStamp is " + csr.getLong(ts_offset)
            );
        }
    }
}
Result :-
When run, 2 rows will be inserted, the 2nd row having a time 1 hour before the first. Then the data is extracted (for rows between 10 hours ago and 1 hour in the future, so all rows, as both fit this criterion). However, as the rows are ORDERED by the timestamp (in ascending order, the default), the 2nd row is shown first.
The log thus includes (as expected):-
D/LATLONINFO: ID is 2 LAT is 11.1 LON is 11.6789 TimeStamp is 1656841299
D/LATLONINFO: ID is 1 LAT is 10.1 LON is 100.5678 TimeStamp is 1656844899

Related

processing data before presentation

I have a dataset (from a JSON source) with cumulative values. It looks like this:
Could I extract from this dataset the delta from the last hour or the last day (for example, a count from 0 since last midnight)?
What you are asking about falls squarely in the realm of process data, as it usually comes from control systems, a.k.a. process control systems. There may be DCS (Distributed Control Systems) or SCADA out in the field that act as a focal point for receiving data. And there may be a process historian or time-series database for accessing that data, if not on an enterprise level then at least within the process controls network.
Much of the engineering associated with process data has been established for many, many decades. For my examples, I did not want to write too many custom classes so I will use some everyday .NET objects. However, I am adhering to 2 such well-regarded principles about process data:
All times will be in UTC. Usually one does not show the UtcTime until the very last moment when displaying to a local user.
Process Data acknowledges the Quality of a value. While there can be dozens of bad states associated with such Quality, I will use a simple binary approach of good or bad. Since I use double, a value is good as long as it is not double.NaN.
That said, I assume you have a class that looks similar to:
public class JsonDto
{
    public string Id { get; set; }
    public DateTime Time { get; set; }
    public double value { get; set; }
}
Granted, your class name may be different, but the main thing is that this class holds an individual instance of process data. When you read a JSON file, it will produce a List<JsonDto> instance.
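For completeness, here is one way such a list might be produced. This is a minimal sketch, assuming the Newtonsoft.Json package and a hypothetical file name; adapt it to however your JSON actually arrives:
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// "data.json" is a placeholder; the file is assumed to contain an array of
// objects whose properties match JsonDto (Id, Time, value).
var json = File.ReadAllText("data.json");
List<JsonDto> inputValues = JsonConvert.DeserializeObject<List<JsonDto>>(json);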
You will need quite a few methods to transform the data into something a wee bit more usable in order to get to where the rubber finally meets the road: producing hourly differences. But that first requires producing hourly values, because there is no guarantee that your recorded values occur exactly on each hour.
ProcessData Class - lots of methods
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProcessData
{
    public enum CalculationTimeBasis { Auto = 0, EarliestTime, MostRecentTime, MidpointTime }

    public static Dictionary<string, SortedList<DateTime, double>> GetTagTimedValuesMap(IEnumerable<JsonDto> jsonDto)
    {
        var map = new Dictionary<string, SortedList<DateTime, double>>();
        var tagnames = jsonDto.Select(x => x.Id).Distinct().OrderBy(x => x);
        foreach (var tagname in tagnames)
        {
            map.Add(tagname, new SortedList<DateTime, double>());
        }
        var orderedValues = jsonDto.OrderBy(x => x.Id).ThenBy(x => x.Time.ToUtcTime());
        foreach (var item in orderedValues)
        {
            map[item.Id].Add(item.Time.ToUtcTime(), item.value);
        }
        return map;
    }

    public static DateTimeKind UnspecifiedDefaultsTo { get; set; } = DateTimeKind.Utc;

    public static DateTime ToUtcTime(this DateTime value)
    {
        // Unlike ToUniversalTime(), this method assumes any Unspecified Kind may be Utc or Local.
        if (value.Kind == DateTimeKind.Unspecified)
        {
            if (UnspecifiedDefaultsTo == DateTimeKind.Utc)
            {
                value = DateTime.SpecifyKind(value, DateTimeKind.Utc);
            }
            else if (UnspecifiedDefaultsTo == DateTimeKind.Local)
            {
                value = DateTime.SpecifyKind(value, DateTimeKind.Local);
            }
        }
        return value.ToUniversalTime();
    }

    private static DateTime TruncateTime(this DateTime value, TimeSpan interval) => new DateTime(TruncateTicks(value.Ticks, interval.Ticks)).ToUtcTime();

    private static long TruncateTicks(long ticks, long interval) => (interval == 0) ? ticks : (ticks / interval) * interval;

    public static SortedList<DateTime, double> GetInterpolatedValues(SortedList<DateTime, double> recordedValues, TimeSpan interval)
    {
        if (interval <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException($"{nameof(interval)} TimeSpan must be greater than zero");
        }
        var interpolatedValues = new SortedList<DateTime, double>();
        var previous = recordedValues.First();
        var intervalTimestamp = previous.Key.TruncateTime(interval);
        foreach (var current in recordedValues)
        {
            if (current.Key == intervalTimestamp)
            {
                // It's easy when the current recorded value aligns perfectly on the desired interval.
                interpolatedValues.Add(current.Key, current.Value);
                intervalTimestamp += interval;
            }
            else if (current.Key > intervalTimestamp)
            {
                // We do not exactly align at the desired time, so we must interpolate
                // between the "last recorded data" BEFORE the desired time (i.e. previous)
                // and the "first recorded data" AFTER the desired time (i.e. current).
                var interpolatedValue = GetInterpolatedValue(intervalTimestamp, previous, current);
                interpolatedValues.Add(interpolatedValue.Key, interpolatedValue.Value);
                intervalTimestamp += interval;
            }
            previous = current;
        }
        return interpolatedValues;
    }

    private static KeyValuePair<DateTime, double> GetInterpolatedValue(DateTime interpolatedTime, KeyValuePair<DateTime, double> left, KeyValuePair<DateTime, double> right)
    {
        if (!double.IsNaN(left.Value) && !double.IsNaN(right.Value))
        {
            double totalDuration = (right.Key - left.Key).TotalSeconds;
            if (Math.Abs(totalDuration) > double.Epsilon)
            {
                double partialDuration = (interpolatedTime - left.Key).TotalSeconds;
                double factor = partialDuration / totalDuration;
                double calculation = left.Value + ((right.Value - left.Value) * factor);
                return new KeyValuePair<DateTime, double>(interpolatedTime, calculation);
            }
        }
        return new KeyValuePair<DateTime, double>(interpolatedTime, double.NaN);
    }

    public static SortedList<DateTime, double> GetDeltaValues(SortedList<DateTime, double> values, CalculationTimeBasis timeBasis = CalculationTimeBasis.Auto)
    {
        const CalculationTimeBasis autoDefaultsTo = CalculationTimeBasis.MostRecentTime;
        var deltas = new SortedList<DateTime, double>(capacity: values.Count);
        var previous = values.First();
        foreach (var current in values.Skip(1))
        {
            var time = GetTimeForBasis(timeBasis, previous.Key, current.Key, autoDefaultsTo);
            var diff = current.Value - previous.Value;
            deltas.Add(time, diff);
            previous = current;
        }
        return deltas;
    }

    private static DateTime GetTimeForBasis(CalculationTimeBasis timeBasis, DateTime earliestTime, DateTime mostRecentTime, CalculationTimeBasis autoDefaultsTo)
    {
        if (timeBasis == CalculationTimeBasis.Auto)
        {
            // Different (future) methods calling this may require different interpretations of Auto.
            // Thus we leave it to the calling method to declare what Auto means to it.
            timeBasis = autoDefaultsTo;
        }
        switch (timeBasis)
        {
            case CalculationTimeBasis.EarliestTime:
                return earliestTime;
            case CalculationTimeBasis.MidpointTime:
                return new DateTime((earliestTime.Ticks + mostRecentTime.Ticks) / 2L).ToUtcTime();
            case CalculationTimeBasis.MostRecentTime:
                return mostRecentTime;
            case CalculationTimeBasis.Auto:
            default:
                return earliestTime;
        }
    }
}
Usage Example
var inputValues = new List<JsonDto>();
// TODO: Magically populate inputValues

var tagDataMap = ProcessData.GetTagTimedValuesMap(inputValues);
foreach (var item in tagDataMap)
{
    // The following generates hourly differences for the one Tag Id (item.Key)
    // by first generating hourly data, and then finding the delta of that.
    var hourlyValues = ProcessData.GetInterpolatedValues(item.Value, TimeSpan.FromHours(1));

    // Consider the difference between Hour(1) and Hour(2).
    // That is, 2 input values will create 1 output value.
    // Now you must decide which of the 2 input times you use for the 1 output time.
    // This is what I call the CalculationTimeBasis.

    // The time basis used will be Auto, which defaults to the most recent time for this particular method, e.g. Hour(2)
    var deltaValues = ProcessData.GetDeltaValues(hourlyValues);

    // Same as above, except we explicitly state we want the most recent time, e.g. also Hour(2)
    var deltaValues2 = ProcessData.GetDeltaValues(hourlyValues, ProcessData.CalculationTimeBasis.MostRecentTime);

    // Here the calculated differences are the same, except the
    // timestamp now reflects the earliest time, e.g. Hour(1)
    var deltaValues3 = ProcessData.GetDeltaValues(hourlyValues, ProcessData.CalculationTimeBasis.EarliestTime);
}
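As for the "count from 0 since last midnight" part of the question: that is just the difference between the latest cumulative value and the value at midnight. A rough sketch of that, my own assumption built on the methods above (interpolating on a 1-day interval yields values aligned on each UTC midnight, provided recorded values exist on both sides of it; "SomeTagId" is a placeholder):
// Delta since the most recent UTC midnight for one tag.
var recorded = tagDataMap["SomeTagId"];
var daily = ProcessData.GetInterpolatedValues(recorded, TimeSpan.FromDays(1));
var midnight = daily.Last();   // value interpolated at the most recent UTC midnight
var latest = recorded.Last();  // most recent recorded value
double deltaSinceMidnight = latest.Value - midnight.Value;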

How to optimize SQL query in Anylogic

I am generating agents with parameter values coming from an SQL table in AnyLogic. When an agent is generated at the source, I do a lookup in the table and extract the corresponding values from it. For now it is working perfectly, but it is slowing down the performance.
The structure of the table looks like this:
I am querying the data from this table with the code below:
double value_1 = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.avg_value)).get(0);
double value_min = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.min_value)).get(0);
double value_max = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.max_value)).get(0);
// Fetch the cluster number from the account table
int cluster_num = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.cluster)).get(0);
int act_no = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.actno)).get(0);
String pay_term = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.pay_term)).get(0);
String pay_term_prob = (selectFrom(account_details)
        .where(account_details.act_code.eq(z))
        .list(account_details.pay_term_prob)).get(0);
But this is very slow and I want to improve the performance. Someone mentioned that we can create a Java class and then add the table into a collection. Is there any example I can refer to? I am finding it difficult to put the entire code together.
I have created a class with the code below:
public class Customer {
    private String act_code;
    private int actno;
    private double avg_value;
    private String pay_term;
    private String pay_term_prob;
    private int cluster;
    private double min_value;
    private double max_value;

    public String getact_code() {
        return act_code;
    }
    public void setact_code(String act_code) {
        this.act_code = act_code;
    }
    public int getactno() {
        return actno;
    }
    public void setactno(int actno) {
        this.actno = actno;
    }
    public double getavg_value() {
        return avg_value;
    }
    public void setavg_value(double avg_value) {
        this.avg_value = avg_value;
    }
    public String getpay_term() {
        return pay_term;
    }
    public void setpay_term(String pay_term) {
        this.pay_term = pay_term;
    }
    public String getpay_term_prob() {
        return pay_term_prob;
    }
    public void setpay_term_prob(String pay_term_prob) {
        this.pay_term_prob = pay_term_prob;
    }
    public int getcluster() {
        return cluster;
    }
    public void setcluster(int cluster) {
        this.cluster = cluster;
    }
    public double getmin_value() {
        return min_value;
    }
    public void setmin_value(double min_value) {
        this.min_value = min_value;
    }
    public double getmax_value() {
        return max_value;
    }
    public void setmax_value(double max_value) {
        this.max_value = max_value;
    }
}
I have created the collection object like this:
Please provide a reference for adding this database table into the collection as a next step; then I want to query the collection based on a condition.
You are on the right track here!
Every time you access the database to read data there is a computational overhead. So the best option is to access the database only once, at the start of the model. Create all the objects you need, store other data you will need later into Java classes, and then use the Java classes.
My suggestion is to create a Java class for each row in your table, like you have done. And then create a map object - like you have done, but with the key as String and the value as this new object.
Then on model start you can populate this map as follows:
List<Tuple> rows = selectFrom(customer).list();
for (Tuple row : rows) {
    // This assumes Customer has a matching constructor; alternatively use the setters.
    Customer customerData = new Customer(
        row.get( customer.act_code ),
        row.get( customer.actno ),
        row.get( customer.avg_value )
    );
    mapOfCustomerData.put(customerData.getact_code(), customerData);
}
Where mapOfCustomerData is a LinkedHashMap (key String, value Customer) and customer is the name of the table.
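Once the map is populated, the per-agent lookup from the question becomes a single in-memory get instead of seven database queries. A sketch, assuming the map above and the getters from the Customer class:
// One map lookup replaces the seven selectFrom(...) queries per agent.
Customer c = mapOfCustomerData.get(z);
double value_1 = c.getavg_value();
double value_min = c.getmin_value();
double value_max = c.getmax_value();
int cluster_num = c.getcluster();
int act_no = c.getactno();
String pay_term = c.getpay_term();
String pay_term_prob = c.getpay_term_prob();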
See the model created in this blog post for more details and an example of using a scenario object to store all the data from the database in a separate object.
Note: The code above is just an example - read the blog post for more details on using the AnyLogic Internal Database.
Before using Java classes, try this first: click the "index" tickbox for all columns that you query with a WHERE clause.

How to do Geofence monitoring/analytics using KSQLDB?

I am trying to do geofence monitoring/analytics using ksqlDB. I want to get a message whenever a vehicle ENTERS/LEAVES a geofence. Taking inspiration from https://github.com/gschmutz/various-demos/tree/master/kafka-geofencing, I have created a UDF named GEOFENCE; its code is shown further below.
Below is my query to perform a join on the geofence stream and the live vehicle position stream:
CREATE STREAM join_live_pos_geofence_status_1 AS
  SELECT lp1.vehicleid,
         lp1.lat,
         lp1.lon,
         s1p.geofencecoordinates,
         GEOFENCE(lp1.lat, lp1.lon, 'POLYGON((' + s1p.geofencecoordinates + '))') AS geofence_status
  FROM live_position_1 LP1
  LEFT JOIN stream_1_processed S1P WITHIN 72 HOURS
    ON lp1.clusterid = s1p.clusterid
  EMIT CHANGES;
I am taking into account all the geofences created in the last 3 days.
I have created another query that uses the geofence status from the previous query to calculate whether the vehicle is ENTERING/LEAVING the geofence.
CREATE STREAM join_geofence_monitoring_1 AS
  SELECT *,
         GEOFENCE(jlpgs1.lat, jlpgs1.lon, 'POLYGON((' + jlpgs1.geofencecoordinates + '))', jlpgs1.geofence_status) AS geofence_monitoring_status
  FROM join_live_pos_geofence_status_1 JLPGS1
  EMIT CHANGES;
The above query gives me 'INSIDE', 'INSIDE' for the geofence_status and geofence_monitoring_status columns, respectively, or 'OUTSIDE', 'OUTSIDE' for the two columns. I know I am not taking the time aspect into account (these 2 queries should never be executed at exactly the same time, say 't0'), but I am not able to think of the correct way of doing this.
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
import org.geotools.geometry.jts.JTSFactoryFinder;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Point;
import org.locationtech.jts.geom.Polygon;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKTReader;

@UdfDescription(name = "geofence", description = "Determines a point's position relative to a geofence polygon")
public class Geofence {

    private static final String OUTSIDE = "OUTSIDE";
    private static final String INSIDE = "INSIDE";
    private static GeometryFactory geometryFactory = JTSFactoryFinder.getGeometryFactory();
    private static WKTReader wktReader = new WKTReader(geometryFactory);

    @Udf(description = "Returns whether a coordinate lies within a polygon or not")
    public static String geofence(final double latitude, final double longitude, String geometryWKT) {
        boolean status = false;
        String result = "";
        Polygon polygon = null;
        try {
            polygon = (Polygon) wktReader.read(geometryWKT);
            // However, an important point to note is that the longitude is the X value
            // and the latitude the Y value. So we say "lat/long",
            // but JTS will expect it in the order "long/lat".
            Coordinate coord = new Coordinate(longitude, latitude);
            Point point = geometryFactory.createPoint(coord);
            status = point.within(polygon);
            if (status) {
                result = INSIDE;
            } else {
                result = OUTSIDE;
            }
        } catch (ParseException e) {
            throw new RuntimeException(e.getMessage());
        }
        return result;
    }

    @Udf(description = "Returns whether a coordinate moved in or out of a polygon")
    public static String geofence(final double latitude, final double longitude, String geometryWKT, final String statusBefore) {
        String status = geofence(latitude, longitude, geometryWKT);
        if (statusBefore.equals("INSIDE") && status.equals("OUTSIDE")) {
            return "LEAVING";
        } else if (statusBefore.equals("OUTSIDE") && status.equals("INSIDE")) {
            return "ENTERING";
        }
        return status;
    }
}
My question is: how can I correctly calculate that a vehicle is ENTERING/LEAVING a geofence? Is it even possible to do this with ksqlDB?
Would it be correct to say that the join_live_pos_geofence_status_1 stream can have rows that go from INSIDE -> OUTSIDE and then from OUTSIDE -> INSIDE for some key value?
And what you're wanting to do is to output LEAVING and ENTERING events for these transitions?
You can likely do what you want using a custom UDAF. Custom UDAFs take an input and calculate an output, via some intermediate state. For example, an AVG UDAF would take some numbers as input, its intermediate state would be the count of inputs and the sum of inputs, and the output would be sum/count.
In your case, the input would be the current state, e.g. either INSIDE or OUTSIDE. The UDAF would need to store the last two states in its intermediate state, and then the output state can be calculated from this. E.g.
Input     Intermediate        Output
INSIDE    INSIDE              <only a single entry in intermediate - your choice what you output>
INSIDE    INSIDE,INSIDE       no-change
OUTSIDE   INSIDE,OUTSIDE      LEAVING
OUTSIDE   OUTSIDE,OUTSIDE     no-change
INSIDE    OUTSIDE,INSIDE      ENTERING
You'll need to decide what to output when there is only a single entry in the intermediate state, i.e. the first time a key is seen.
You can then filter the output to remove any rows that have no-change.
You may also need to set cache.max.bytes.buffering to zero to stop any results being conflated.
UPDATE: suggested code.
Not tested, but something like the following code may do what you want:
import io.confluent.ksql.function.udaf.Udaf;
import io.confluent.ksql.function.udaf.UdafDescription;
import io.confluent.ksql.function.udaf.UdafFactory;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

@UdafDescription(name = "my_geofence", description = "Computes the geofence status.")
public final class GeoFenceUdaf {

    private static final String STATUS_1 = "STATUS_1";
    private static final String STATUS_2 = "STATUS_2";

    @UdafFactory(description = "Computes the geofence status.",
        aggregateSchema = "STRUCT<" + STATUS_1 + " STRING, " + STATUS_2 + " STRING>")
    public static Udaf<String, Struct, String> calcGeoFenceStatus() {

        final Schema STRUCT_SCHEMA = SchemaBuilder.struct().optional()
            .field(STATUS_1, Schema.OPTIONAL_STRING_SCHEMA)
            .field(STATUS_2, Schema.OPTIONAL_STRING_SCHEMA)
            .build();

        return new Udaf<String, Struct, String>() {

            @Override
            public Struct initialize() {
                return new Struct(STRUCT_SCHEMA);
            }

            @Override
            public Struct aggregate(
                final String newValue,
                final Struct aggregate
            ) {
                if (newValue == null) {
                    return aggregate;
                }
                if (aggregate.getString(STATUS_1) == null) {
                    // First status for this key:
                    return aggregate
                        .put(STATUS_1, newValue);
                }
                final String lastStatus = aggregate.getString(STATUS_2);
                if (lastStatus == null) {
                    // Second status for this key:
                    return aggregate
                        .put(STATUS_2, newValue);
                }
                // Third and subsequent status for this key:
                return aggregate
                    .put(STATUS_1, lastStatus)
                    .put(STATUS_2, newValue);
            }

            @Override
            public String map(final Struct aggregate) {
                final String previousStatus = aggregate.getString(STATUS_1);
                final String currentStatus = aggregate.getString(STATUS_2);
                if (currentStatus == null) {
                    // Only have a single status, i.e. the first status for this key.
                    // What to do? Probably want to do:
                    return previousStatus.equalsIgnoreCase("OUTSIDE")
                        ? "LEAVING"
                        : "ENTERING";
                }
                // Two statuses ...
                if (currentStatus.equals(previousStatus)) {
                    return "NO CHANGE";
                }
                return previousStatus.equalsIgnoreCase("OUTSIDE")
                    ? "ENTERING"
                    : "LEAVING";
            }

            @Override
            public Struct merge(final Struct agg1, final Struct agg2) {
                throw new RuntimeException("Function does not support session windows");
            }
        };
    }
}
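To tie it together, usage would be something like the following. This is an untested sketch with assumed stream and column names: aggregate the per-vehicle status stream with the UDAF, disable result buffering so transitions are not conflated, and filter out the no-change rows downstream:
-- Assumed names throughout; untested sketch.
SET 'cache.max.bytes.buffering' = '0';

CREATE TABLE geofence_transitions AS
  SELECT vehicleid,
         MY_GEOFENCE(geofence_status) AS transition
  FROM join_live_pos_geofence_status_1
  GROUP BY vehicleid
  EMIT CHANGES;

-- Keep only the actual transitions:
SELECT * FROM geofence_transitions WHERE transition != 'NO CHANGE' EMIT CHANGES;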

Apache Flink: LEFT JOIN with a TableFunction does not return expected result

Flink version: 1.3.1
I created two tables: one from memory, another from a UDTF. When I tested join and left join, they returned the same result. What I expected was that the left join would have more rows than the join.
My test code is this:
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.functions.TableFunction;

public class ExerciseUDF {

    public static void main(String[] args) throws Exception {
        test_3();
    }

    public static void test_3() throws Exception {
        // 1. set up the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
        DataSet<WC> input = env.fromElements(
                new WC("Hello", 1),
                new WC("Ciao", 1),
                new WC("Hello", 1));

        // 2. register the DataSet as table "WordCount"
        tEnv.registerDataSet("WordCount", input, "word, frequency");
        Table table;
        DataSet<WC> result;
        DataSet<WCUpper> resultUpper;
        table = tEnv.scan("WordCount");

        // 3. table left join user defined table
        System.out.println("table left join user defined table");
        tEnv.registerFunction("myTableUpperFunc", new MyTableFunc_2());
        table = tEnv.sql("SELECT S.word as word, S.frequency as frequency, S.word as myupper FROM WordCount as S left join LATERAL TABLE(myTableUpperFunc(S.word)) as T(word,myupper) on S.word = T.word");
        resultUpper = tEnv.toDataSet(table, WCUpper.class);
        resultUpper.print(); // output: WCUpper Ciao 1 CIAO - however, without the row having Hello

        // 4. table join user defined table
        System.out.println("table join user defined table");
        tEnv.registerFunction("myTableUpperFunc", new MyTableFunc_2());
        table = tEnv.scan("WordCount");
        table = tEnv.sql("SELECT S.word as word, S.frequency as frequency, T.myupper as myupper FROM WordCount as S join LATERAL TABLE(myTableUpperFunc(S.word)) as T(word,myupper) on S.word = T.word");
        resultUpper = tEnv.toDataSet(table, WCUpper.class);
        resultUpper.print();
    }

    public static class WC {
        public String word;
        public long frequency;

        // public constructor to make it a Flink POJO
        public WC() {
        }

        public WC(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "WC " + word + " " + frequency;
        }
    }

    // user defined table function
    public static class MyTableFunc_2 extends TableFunction<Tuple2<String, String>> {
        public void eval(String str) { // hello --> hello HELLO
            System.out.println("upper func executed for " + str);
            if (str.equals("Hello")) {
                // emit nothing for "Hello", so a left join should still keep that row
                return;
            }
            collect(new Tuple2<String, String>(str, str.toUpperCase()));
        }
    }
}
The output of the left join and the join queries is the same. In both cases only one row is returned:
WCUpper Ciao 1 CIAO
However, I think that the left join query should preserve the 'Hello' rows.
Yes, you are right.
This is a bug in the translation of TableFunction outer joins with predicates and needs to be fixed.
Thanks, Fabian

incompatible types found : double

I am trying to write a program, and the rest of the code so far works, but I am getting an "incompatible types: found double, required GroceryItem" error on line 38. Can anyone help me by explaining why I am receiving this error and how to correct it? Thank you. Here is my code:
import java.util.Scanner;

public class GroceryList {
    private GroceryItem[] groceryArr; // ARRAY HOLDS GROCERY ITEM OBJECTS
    private int numItems;
    private String date;
    private String storeName;

    public GroceryList(String inputDate, String inputName) {
        // FILL IN CODE HERE
        // CREATE ARRAY, INITIALIZE FIELDS
        groceryArr = new GroceryItem[10];
        numItems = 0;
    }

    public void load() {
        Scanner keyboard = new Scanner(System.in);
        double sum = 0;
        System.out.println("Enter the trip date and then hit return:");
        date = keyboard.next();
        keyboard.nextLine();
        System.out.println("Enter the store name and then hit return:");
        storeName = keyboard.next();
        keyboard.nextLine();
        double number = keyboard.nextDouble();
        // NEED TO PROMPT USER FOR, AND READ IN, THE DATE AND STORE NAME.
        System.out.println("Enter each item bought and the price (then return).");
        System.out.println("Terminate with an item with a negative price.");
        number = keyboard.nextDouble();
        while (number >= 0 && numItems < groceryArr.length) {
            groceryArr[numItems] = number; // this is the line the compiler complains about
            numItems++;
            sum += number;
            System.out.println("Enter each item bought and the price (then return).");
            System.out.println("Terminate with an item with a negative price.");
            number = keyboard.nextDouble();
        }
        /*
        // READ IN AND STORE EACH ITEM. STORE NUMBER OF ITEMS
    }

    private GroceryItem computeTotalCost() {
        // add code here
    }

    public void print() {
        // call computeTotalCost
    }
    */
    }
}
"groceryArr[numItems] = number;"
groceryArr[numItems] is an instance of GroceryItem() - 'number' is a double
You need a double variable in your GroceryItem() object to store the 'number' value.
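A minimal sketch of what that could look like. The constructor and fields here are hypothetical - match them to whatever your actual GroceryItem class defines:
// Hypothetical GroceryItem - adjust to your real class.
public class GroceryItem {
    private String name;
    private double price;

    public GroceryItem(String name, double price) {
        this.name = name;
        this.price = price;
    }

    public double getPrice() {
        return price;
    }
}
Then, inside load(), wrap the values in a GroceryItem instead of assigning the double directly:
String name = keyboard.next();       // read the item name as well
double number = keyboard.nextDouble();
groceryArr[numItems] = new GroceryItem(name, number); // now the types match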