I have dataset (from JSON source) with cumulative values. It looks like this:
Could I extract from this dataset delta from last hour or last day (for example, count from 0 since last midnight?)
What you are asking about falls squarely in the realm of process data as it usually comes from control systems aka process controls systems. There may be DCS (Distributed Control Systems) or SCADA out in the field that act as a focal point on receiving data. And there may be a process historian or time-series database for accessing that data, if not on an enterprise level at least not within the process controls network.
Much of the engineering associated with process data has been established for many, many decades. For my examples, I did not want to write too many custom classes so I will use some everyday .NET objects. However, I am adhering to 2 such well-regarded principles about process data:
All times will be in UTC. Usually one does not show the UtcTime until the very last moment when displaying to a local user.
Process Data acknowledges the Quality of a value. While there can be dozens of bad states associated with such Quality, I will use a simple binary approach of good or bad. Since I use double, a value is good as long as it is not double.NaN.
That said, I assume you have a class that looks similar to:
public class JsonDto
{
public string Id { get; set; }
public DateTime Time { get; set; }
public double value { get; set; }
}
Granted your class name may be different, but the main thing is this class holds an individual instance of process data. When you read a JSON file, it will produce a List<jsonDto> instance.
You will need lots of methods to transform the data to something a wee bit more useable in order to get to where the rubber finally meets the road: producing hourly differences. But that requires producing hourly values because there is no guarantee that your recorded values occur exactly on each hour.
ProcessData Class - lots of methods
public static class ProcessData
{
public enum CalculationTimeBasis { Auto = 0, EarliestTime, MostRecentTime, MidpointTime }
public static Dictionary<string, SortedList<DateTime, double>> GetTagTimedValuesMap(IEnumerable<JsonDto> jsonDto)
{
var map = new Dictionary<string, SortedList<DateTime, double>>();
var tagnames = jsonDto.Select(x => x.Id).Distinct().OrderBy(x => x);
foreach (var tagname in tagnames)
{
map.Add(tagname, new SortedList<DateTime, double>());
}
var orderedValues = jsonDto.OrderBy(x => x.Id).ThenBy(x => x.Time.ToUtcTime());
foreach (var item in orderedValues)
{
map[item.Id].Add(item.Time.ToUtcTime(), item.value);
}
return map;
}
public static DateTimeKind UnspecifiedDefaultsTo { get; set; } = DateTimeKind.Utc;
public static DateTime ToUtcTime(this DateTime value)
{
// Unlike ToUniversalTime(), this method assumes any Unspecified Kind may be Utc or Local.
if (value.Kind == DateTimeKind.Unspecified)
{
if (UnspecifiedDefaultsTo == DateTimeKind.Utc)
{
value = DateTime.SpecifyKind(value, DateTimeKind.Utc);
}
else if (UnspecifiedDefaultsTo == DateTimeKind.Local)
{
value = DateTime.SpecifyKind(value, DateTimeKind.Local);
}
}
return value.ToUniversalTime();
}
private static DateTime TruncateTime(this DateTime value, TimeSpan interval) => new DateTime(TruncateTicks(value.Ticks, interval.Ticks)).ToUtcTime();
private static long TruncateTicks(long ticks, long interval) => (interval == 0) ? ticks : (ticks / interval) * interval;
public static SortedList<DateTime, double> GetInterpolatedValues(SortedList<DateTime, double> recordedValues, TimeSpan interval)
{
if (interval <= TimeSpan.Zero)
{
throw new ArgumentOutOfRangeException($"{nameof(interval)} TimeSpan must be greater than zero");
}
var interpolatedValues = new SortedList<DateTime, double>();
var previous = recordedValues.First();
var intervalTimestamp = previous.Key.TruncateTime(interval);
foreach (var current in recordedValues)
{
if (current.Key == intervalTimestamp)
{
// It's easy when the current recorded value aligns perfectly on the desired interval.
interpolatedValues.Add(current.Key, current.Value);
intervalTimestamp += interval;
}
else if (current.Key > intervalTimestamp)
{
// We do not exactly align at the desired time, so we must interpolate
// between the "last recorded data" BEFORE the desired time (i.e. previous)
// and the "first recorded data" AFTER the desired time (i.e. current).
var interpolatedValue = GetInterpolatedValue(intervalTimestamp, previous, current);
interpolatedValues.Add(interpolatedValue.Key, interpolatedValue.Value);
intervalTimestamp += interval;
}
previous = current;
}
return interpolatedValues;
}
private static KeyValuePair<DateTime, double> GetInterpolatedValue(DateTime interpolatedTime, KeyValuePair<DateTime, double> left, KeyValuePair<DateTime, double> right)
{
if (!double.IsNaN(left.Value) && !double.IsNaN(right.Value))
{
double totalDuration = (right.Key - left.Key).TotalSeconds;
if (Math.Abs(totalDuration) > double.Epsilon)
{
double partialDuration = (interpolatedTime - left.Key).TotalSeconds;
double factor = partialDuration / totalDuration;
double calculation = left.Value + ((right.Value - left.Value) * factor);
return new KeyValuePair<DateTime, double>(interpolatedTime, calculation);
}
}
return new KeyValuePair<DateTime, double>(interpolatedTime, double.NaN);
}
public static SortedList<DateTime, double> GetDeltaValues(SortedList<DateTime, double> values, CalculationTimeBasis timeBasis = CalculationTimeBasis.Auto)
{
const CalculationTimeBasis autoDefaultsTo = CalculationTimeBasis.MostRecentTime;
var deltas = new SortedList<DateTime, double>(capacity: values.Count);
var previous = values.First();
foreach (var current in values.Skip(1))
{
var time = GetTimeForBasis(timeBasis, previous.Key, current.Key, autoDefaultsTo);
var diff = current.Value - previous.Value;
deltas.Add(time, diff);
previous = current;
}
return deltas;
}
private static DateTime GetTimeForBasis(CalculationTimeBasis timeBasis, DateTime earliestTime, DateTime mostRecentTime, CalculationTimeBasis autoDefaultsTo)
{
if (timeBasis == CalculationTimeBasis.Auto)
{
// Different (future) methods calling this may require different interpretations of Auto.
// Thus we leave it to the calling method to declare what Auto means to it.
timeBasis = autoDefaultsTo;
}
switch (timeBasis)
{
case CalculationTimeBasis.EarliestTime:
return earliestTime;
case CalculationTimeBasis.MidpointTime:
return new DateTime((earliestTime.Ticks + mostRecentTime.Ticks) / 2L).ToUtcTime();
case CalculationTimeBasis.MostRecentTime:
return mostRecentTime;
case CalculationTimeBasis.Auto:
default:
return earliestTime;
}
}
}
Usage Example
var inputValues = new List<JsonDto>();
// TODO: Magically populate inputValues
var tagDataMap = ProcessData.GetTagTimedValuesMap(inputValues);
foreach (var item in tagDataMap)
{
// Following would generate hourly differences for the one Tag Id (item.Key)
// by first generating hourly data, and then finding the delta of that.
var hourlyValues = ProcessData.GetInterpolatedValues(item.Value, TimeSpan.FromHours(1));
// Consider the difference between Hour(1) and Hour(2).
// That is, 2 input values will create 1 output value.
// Now you must decide which of the 2 input times you use for the 1 output time.
// This is what I call the CalculationTimeBasis.
// The time basis used will be Auto, which defaults to the most recent for this particular method, e.g. Hour(2)
var deltaValues = ProcessData.GetDeltaValues(hourlyValues);
// Same as above except we explicitly state we want the most recent time, e.g. also Hour(2)
var deltaValues2 = ProcessData.GetDeltaValues(hourlyValues, ProcessData.CalculationTimeBasis.MostRecentTime);
// Here the calculated differences are the same except the now
// timestamp now reflects the earliest time, e.g. Hour(1)
var deltaValues3 = ProcessData.GetDeltaValues(hourlyValues, ProcessData.CalculationTimeBasis.EarliestTime);
Related
I'm reverse engineering a file format that stores each field as TLV blocks (type, length, value).
The fields do not have to be in order, or even present at all. Their presence is denoted with a sentinel, which is a 16-bit type identifier and a 32-bit end offset. There are hundreds of unique identifiers, but a decent chunk of those are just single primitive values. aside from denoting the type, they can also identify what field the data should be stored in.
It is also worth noting that there will never be a duplicate id on a parent structure. The only time is can occur is if there are multiple of the same object type in an array/list.
I have successfully written a Kaitai definition for one of them:
meta:
id: struct_02ea
endian: le
seq:
- id: unk_00
type: s4
- id: fields
type: field_block
repeat: eos
types:
sentinel:
seq:
- id: id
type: u2
- id: end_offset
type: u4
field_block:
seq:
- id: sentinel
type: sentinel
- id: value
type:
switch-on: sentinel.id
cases:
0xF0: u1
0xF1: u1
0xF2: u1
0xF3: u1
0xF4: u4
0xF5: u4
size: sentinel.end_offset - _root._io.pos
Handling things this way does work, and I could likely map out the entire format like this. However, when it comes time to compiling this definition into another format, things get nasty.
Since I am wrapping each field in a field_block, the generated code stores these values in that type of object. This is incredibly inefficient when half of the generated field_block objects store a single integer. It would also require the consuming code to iterate through a list of each field block in order to get the actual field's value.
Ideally, I would like to define this structure so that the sentinels are only parsed while Kaitai is reading the data, and each value would be mapped to a field on the parent structure.
Is this possible? This technology is really cool, and I'd love to use it in my project, but I feel like the overhead that this is generating is a lot more trouble than it's worth.
Here's an example of the definition when compiled into C#:
using System.Collections.Generic;
namespace Kaitai
{
public partial class Struct02ea : KaitaiStruct
{
public static Struct02ea FromFile(string fileName)
{
return new Struct02ea(new KaitaiStream(fileName));
}
public Struct02ea(KaitaiStream p__io, KaitaiStruct p__parent = null, Struct02ea p__root = null) : base(p__io)
{
m_parent = p__parent;
m_root = p__root ?? this;
_read();
}
private void _read()
{
_unk00 = m_io.ReadS4le();
_fields = new List<FieldBlock>();
{
var i = 0;
while (!m_io.IsEof) {
_fields.Add(new FieldBlock(m_io, this, m_root));
i++;
}
}
}
public partial class Sentinel : KaitaiStruct
{
public static Sentinel FromFile(string fileName)
{
return new Sentinel(new KaitaiStream(fileName));
}
public Sentinel(KaitaiStream p__io, Struct02ea.FieldBlock p__parent = null, Struct02ea p__root = null) : base(p__io)
{
m_parent = p__parent;
m_root = p__root;
_read();
}
private void _read()
{
_id = m_io.ReadU2le();
_endOffset = m_io.ReadU4le();
}
private ushort _id;
private uint _endOffset;
private Struct02ea m_root;
private Struct02ea.FieldBlock m_parent;
public ushort Id { get { return _id; } }
public uint EndOffset { get { return _endOffset; } }
public Struct02ea M_Root { get { return m_root; } }
public Struct02ea.FieldBlock M_Parent { get { return m_parent; } }
}
public partial class FieldBlock : KaitaiStruct
{
public static FieldBlock FromFile(string fileName)
{
return new FieldBlock(new KaitaiStream(fileName));
}
public FieldBlock(KaitaiStream p__io, Struct02ea p__parent = null, Struct02ea p__root = null) : base(p__io)
{
m_parent = p__parent;
m_root = p__root;
_read();
}
private void _read()
{
_sentinel = new Sentinel(m_io, this, m_root);
switch (Sentinel.Id) {
case 243: {
_value = m_io.ReadU1();
break;
}
case 244: {
_value = m_io.ReadU4le();
break;
}
case 245: {
_value = m_io.ReadU4le();
break;
}
case 241: {
_value = m_io.ReadU1();
break;
}
case 240: {
_value = m_io.ReadU1();
break;
}
case 242: {
_value = m_io.ReadU1();
break;
}
default: {
_value = m_io.ReadBytes((Sentinel.EndOffset - M_Root.M_Io.Pos));
break;
}
}
}
private Sentinel _sentinel;
private object _value;
private Struct02ea m_root;
private Struct02ea m_parent;
public Sentinel Sentinel { get { return _sentinel; } }
public object Value { get { return _value; } }
public Struct02ea M_Root { get { return m_root; } }
public Struct02ea M_Parent { get { return m_parent; } }
}
private int _unk00;
private List<FieldBlock> _fields;
private Struct02ea m_root;
private KaitaiStruct m_parent;
public int Unk00 { get { return _unk00; } }
public List<FieldBlock> Fields { get { return _fields; } }
public Struct02ea M_Root { get { return m_root; } }
public KaitaiStruct M_Parent { get { return m_parent; } }
}
}
Affiliate disclaimer: I'm a Kaitai Struct maintainer (see my GitHub profile).
Since I am wrapping each field in a field_block, the generated code stores these values in that type of object. This is incredibly inefficient when half of the generated field_block objects store a single integer. It would also require the consuming code to iterate through a list of each field block in order to get the actual field's value.
I think that rather than trying to describe the entire format with an ultimate Kaitai Struct specification, it's better for you not to let the generated code parse all the fields automatically. Move the parsing control to your application code, where you use the type Struct02ea.FieldBlock that represents the individual field and basically replicate the "repeat until end of stream" loop that the generated code that you posted was doing:
_fields = new List<FieldBlock>();
{
var i = 0;
while (!m_io.IsEof) {
_fields.Add(new FieldBlock(m_io, this, m_root));
i++;
}
}
The advantage of doing so is that you can adjust the loop to fit your needs. To avoid the overhead you describe, you'll probably want to keep the Struct02ea.FieldBlock object in a local variable inside the loop body, pull only the values you care about (save them in your compact, consumer-friendly output structures) and let it leave the scope after the loop iteration ends. This will allow each original FieldBlock object to get garbage-collected once you process it, so the overhead they have will be limited to a single instance and not multiplied by the number of fields in the file.
The most straightforward and seamless way to prevent the Kaitai Struct-generated code parse fields (but otherwise keep everything the same) is to add if: false in the KSY specification, as #webbnh suggested in a GitHub issue:
seq:
- id: unk_00
type: s4
- id: fields
type: field_block
repeat: eos
if: false # add this
The if: false works better than omitting it from seq entirely, because the kaitai-struct-compiler has occasional troubles with unused types (when compiling the KSY spec with unused types, you may get an error "Unable to derive _parent type in ..." due to a compiler bug). But with this if: false trick, you can't run into them because the field_block type is no longer unused.
I am trying to do geofence monitoring/analytics using KSQLDB. I want to get a message whenever a vehicle ENTERS/LEAVES a geofence. Taking inspiration from the [https://github.com/gschmutz/various-demos/tree/master/kafka-geofencing] I have created a UDF named as GEOFENCE, below is the code for the same.
Below is my query to perform join on geofence stream and live vehicle position stream
CREATE stream join_live_pos_geofence_status_1 AS SELECT lp1.vehicleid,
lp1.lat,
lp1.lon,
s1p.geofencecoordinates,
Geofence(lp1.lat, lp1.lon, 'POLYGON(('+s1p.geofencecoordinates+'))') AS geofence_status
FROM live_position_1 LP1
LEFT JOIN stream_1_processed S1P within 72 hours
ON kmdlp1.clusterid = kmds1p.clusterid emit changes;
I am taking into account all the geofences created in last 3 days.
I have created another query to use the geofence status from previous query to calculate whether the vehicle is ENTERING/LEAVING geofence.
CREATE stream join_geofence_monitoring_1 AS SELECT *,
Geofence(jlpgs1.lat, jlpgs1.lon, 'POLYGON(('+jlpgs1.geofencecoordinates+'))', jlpgs1.geofence_status) geofence_monitoring_status
FROM join_live_pos_geofence_status_1 JLPGS1 emit changes;
The above query give me the output as 'INSIDE', 'INSIDE' for geofence_status and geofence_monitoring_status columns, respectively or the output is 'OUTSIDE', 'OUTSIDE' for geofence_status and geofence_monitoring_status columns, respectively. I know I am not taking into account the time aspect, like these 2 queries should never be executed at same time say 't0' but I am not able to think the correct way of doing this.
public class Geofence
{
private static final String OUTSIDE = "OUTSIDE";
private static final String INSIDE = "INSIDE";
private static GeometryFactory geometryFactory = JTSFactoryFinder.getGeometryFactory();
private static WKTReader wktReader = new WKTReader(geometryFactory);
#Udf(description = "Returns whether a coordinate lies within a polygon or not")
public static String geofence(final double latitude, final double longitude, String geometryWKT) {
boolean status = false;
String result = "";
Polygon polygon = null;
try {
polygon = (Polygon) wktReader.read(geometryWKT);
// However, an important point to note is that the longitude is the X value
// and the latitude the Y value. So we say "lat/long",
// but JTS will expect it in the order "long/lat".
Coordinate coord = new Coordinate(longitude, latitude);
Point point = geometryFactory.createPoint(coord);
status = point.within(polygon);
if(status)
{
result = INSIDE;
}
else
{
result = OUTSIDE;
}
} catch (ParseException e) {
throw new RuntimeException(e.getMessage());
}
return result;
}
#Udf(description = "Returns whether a coordinate moved in or out of a polygon")
public static String geofence(final double latitude, final double longitude, String geometryWKT, final String statusBefore) {
String status = geofence(latitude, longitude, geometryWKT);
if (statusBefore.equals("INSIDE") && status.equals("OUTSIDE")) {
//status = "LEAVING";
return "LEAVING";
} else if (statusBefore.equals("OUTSIDE") && status.equals("INSIDE")) {
//status = "ENTERING";
return "ENTERING";
}
return status;
}
}
My question is how can I calculate correctly that a vehicle is ENTERING/LEAVING a geofence? Is it even possible to do with KSQLDB?
Would it be correct to say that the join_live_pos_geofence_status_1 stream can have rows that go from INSIDE -> OUTSIDE and then from OUTSIDE -> INSIDE for some key value?
And what you're wanting to do is to output LEAVING and ENTERING events for these transitions?
You can likely do what you want using a custom UDAF. Custom UDAFs take and input and calculate an output, via some intermediate state. For example, an AVG udaf would take some numbers as input, its intermediate state would be the number of inputs and the sum of inputs, and the output would be count/sum.
In your case, the input would be the current state, e.g. either INSIDE or OUTSIDE. The UDAF would need to store the last two states in its intermediate state, and then the output state can be calculated from this. E.g.
Input Intermediate Output
INSIDE INSIDE <only single in intermediate - your choice what you output>
INSIDE INSIDE,INSIDE no-change
OUTSIDE INSIDE,OUTSIDE LEAVING
OUTSIDE OUTSIDE,OUTSIDE no-change
INSIDE OUTSIDE,INSIDE ENTERING
You'll need to decide what to output when there is only a single entry in the intermediate state, i.e. the first time a key is seen.
You can then filter the output to remove any rows that have no-change.
You may also need to set cache.max.bytes.buffering to zero to stop any results being conflated.
UPDATE: suggested code.
Not tested, but something like the following code may do what you want:
#UdafDescription(name = "my_geofence", description = "Computes the geofence status.")
public final class GoeFenceUdaf {
private static final String STATUS_1 = "STATUS_1";
private static final String STATUS_2 = "STATUS_2";
#UdafFactory(description = "Computes the geofence status.",
aggregateSchema = "STRUCT<" + STATUS_1 + " STRING, " + STATUS_2 + " STRING>")
public static Udaf<String, Struct, String> calcGeoFenceStatus() {
final Schema STRUCT_SCHEMA = SchemaBuilder.struct().optional()
.field(STATUS_1, Schema.OPTIONAL_STRING_SCHEMA)
.field(STATUS_2, Schema.OPTIONAL_STRING_SCHEMA)
.build();
return new Udaf<String, Struct, String>() {
#Override
public Struct initialize() {
return new Struct(STRUCT_SCHEMA);
}
#Override
public Struct aggregate(
final String newValue,
final Struct aggregate
) {
if (newValue == null) {
return aggregate;
}
if (aggregate.getString(STATUS_1) == null) {
// First status for this key:
return aggregate
.put(STATUS_1, newValue);
}
final String lastStatus = aggregate.getString(STATUS_2);
if (lastStatus == null) {
// Second status for this key:
return aggregate
.put(STATUS_2, newValue);
}
// Third and subsequent status for this key:
return aggregate
.put(STATUS_1, lastStatus)
.put(STATUS_2, newValue);
}
#Override
public String map(final Struct aggregate) {
final String previousStatus = aggregate.getString(STATUS_1);
final String currentStatus = aggregate.getString(STATUS_2);
if (currentStatus == null) {
// Only have single status, i.e. first status for this key
// What to do? Probably want to do:
return previousStatus.equalsIgnoreCase("OUTSIDE")
? "LEAVING"
: "ENTERING";
}
// Two statuses ...
if (currentStatus.equals(previousStatus)) {
return "NO CHANGE";
}
return previousStatus.equalsIgnoreCase("OUTSIDE")
? "ENTERING"
: "LEAVING";
}
#Override
public Struct merge(final Struct agg1, final Struct agg2) {
throw new RuntimeException("Function does not support session windows");
}
};
}
}
I have a table in my entity model called prices. It has several fields named value0, value1, value2, value3, value4... (these are their literal names, sigh..). I cannot rename them or in any way change them.
What I would like is to use an extended entity to create a new property called values. This would be a collection containing value1, value2 etc...
To get access to the values I would then simply need to write prices.values[1]
I need property changed notification for this.
So far I have tried this;
public partial class Prices
{
private ObservableCollection<double?> values = null;
public ObservableCollection<double?> Values
{
get
{
if (values != null)
values.CollectionChanged -= values_CollectionChanged;
else
values = new ObservableCollection<double?>(new double?[14]);
values[0] = value0;
values[1] = value1;
values[2] = value2;
values.CollectionChanged += values_CollectionChanged;
return values;
}
private set
{
value0 = value[0];
value1 = value[1];
value2 = value[2];
}
}
private void values_CollectionChanged(object sender, NotifyCollectionChangedEventArgs e)
{
Values = values;
}
}
The issue comes when trying to set values. if I try to set a value by writing
prices.values[0] = someValue;
The new value is not always reflected in the collection (i.e. when I have previously set value and then try to overwrite the value).
I am willing to try any approach that would achieve my goal, I am not precious about having my solution fixed (although if anyone can explain what I'm missing that would be great!)
You could implement an indexer on Prices class without using a collection.
You can use switch to select the property to write or you can use reflection.
In this case I use reflection.
public double? this[int index]
{
get
{
if (index < 0 || index > 13) throw new ArgumentOutOfRangeException("index");
string propertyName = "Value" + index;
return (double?)GetType().GetProperty(propertyName).GetValue(this);
}
set
{
if (index < 0 || index > 13) throw new ArgumentOutOfRangeException("index");
string propertyName = "Value" + index;
GetType().GetProperty(propertyName).SetValue(this, value);
// Raise your event here
}
}
i am trying to write a program, and the rest of the code so far works but i am getting a incompatible types found : double required :Grocery Item in line 38. Can anyone help me in explaining why I am receiving this error and how to correct it? Thank you. here is my code:
import java.util.Scanner;
public class GroceryList {
private GroceryItem[]groceryArr; //ARRAY HOLDS GROCERY ITEM OBJECTS
private int numItems;
private String date;
private String storeName;
public GroceryList(String inputDate, String inputName) {
//FILL IN CODE HERE
// CREATE ARRAY, INITIALIZE FIELDS
groceryArr = new GroceryItem[10];
numItems = 0;
}
public void load() {
Scanner keyboard = new Scanner(System.in);
double sum = 0;
System.out.println ("Enter the trip date and then hit return:");
date = keyboard.next();
keyboard.nextLine();
System.out.println("Enter the store name and then hit return:");
storeName = keyboard.next();
keyboard.nextLine();
double number = keyboard.nextDouble();
//NEED TO PROMPT USER FOR, AND READ IN THE DATE AND STORE NAME.
System.out.println("Enter each item bought and the price (then return).");
System.out.println("Terminate with an item with a negative price.");
number = keyboard.nextDouble();
while (number >= 0 && numItems < groceryArr.length) {
groceryArr[numItems] = number;
numItems++;
sum += number;
System.out.println("Enter each item bought and the price (then return).");
System.out.println("Terminate with an item with a negative price.");
number = keyboard.nextDouble();
}
/*
//READ IN AND STORE EACH ITEM. STORE NUMBER OF ITEMS
}
private GroceryItem computeTotalCost() {
//add code here
}
public void print() {
\\call computeTOtalCost
}
*/
}
}
"groceryArr[numItems] = number;"
groceryArr[numItems] is an instance of GroceryItem() - 'number' is a double
You need a double variable in your GroceryItem() object to store the 'number' value.
I'm trying to implement a caching scheme for my EF Repository similar to the one blogged here. As the author and commenters have reported the limitation is that the key generation method cannot produce cache keys that vary with a given query's parameters. Here is the cache key generation method:
private static string GetKey<T>(IQueryable<T> query)
{
string key = string.Concat(query.ToString(), "\n\r",
typeof(T).AssemblyQualifiedName);
return key;
}
So the following queries will yield the same cache key:
var isActive = true;
var query = context.Products
.OrderBy(one => one.ProductNumber)
.Where(one => one.IsActive == isActive).AsCacheable();
and
var isActive = false;
var query = context.Products
.OrderBy(one => one.ProductNumber)
.Where(one => one.IsActive == isActive).AsCacheable();
Notice that the only difference is that isActive = true in the first query and isActive = false in the second.
Any suggestions/insight to efficiently generating cache keys which vary by IQueryable parameters would be truly appreciated.
Kudos to Sergey Barskiy for sharing the EF CodeFirst caching scheme.
Update
I took the approach of traversing the IQueryable's expression tree myself with the goal of resolving the values of the parameters used in the query. With maxlego's suggestion, I extended the System.Linq.Expressions.ExpressionVisitor class to visit the expression nodes that we're interested in - in this case, the MemberExpression. The updated GetKey method looks something like this:
public static string GetKey<T>(IQueryable<T> query)
{
var keyBuilder = new StringBuilder(query.ToString());
var queryParamVisitor = new QueryParameterVisitor(keyBuilder);
queryParamVisitor.GetQueryParameters(query.Expression);
keyBuilder.Append("\n\r");
keyBuilder.Append(typeof (T).AssemblyQualifiedName);
return keyBuilder.ToString();
}
And the QueryParameterVisitor class, which was inspired by the answers of Bryan Watts and Marc Gravell to this question, looks like this:
/// <summary>
/// <see cref="ExpressionVisitor"/> subclass which encapsulates logic to
/// traverse an expression tree and resolve all the query parameter values
/// </summary>
internal class QueryParameterVisitor : ExpressionVisitor
{
public QueryParameterVisitor(StringBuilder sb)
{
QueryParamBuilder = sb;
Visited = new Dictionary<int, bool>();
}
protected StringBuilder QueryParamBuilder { get; set; }
protected Dictionary<int, bool> Visited { get; set; }
public StringBuilder GetQueryParameters(Expression expression)
{
Visit(expression);
return QueryParamBuilder;
}
private static object GetMemberValue(MemberExpression memberExpression, Dictionary<int, bool> visited)
{
object value;
if (!TryGetMemberValue(memberExpression, out value, visited))
{
UnaryExpression objectMember = Expression.Convert(memberExpression, typeof (object));
Expression<Func<object>> getterLambda = Expression.Lambda<Func<object>>(objectMember);
Func<object> getter = null;
try
{
getter = getterLambda.Compile();
}
catch (InvalidOperationException)
{
}
if (getter != null) value = getter();
}
return value;
}
private static bool TryGetMemberValue(Expression expression, out object value, Dictionary<int, bool> visited)
{
if (expression == null)
{
// used for static fields, etc
value = null;
return true;
}
// Mark this node as visited (processed)
int expressionHash = expression.GetHashCode();
if (!visited.ContainsKey(expressionHash))
{
visited.Add(expressionHash, true);
}
// Get Member Value, recurse if necessary
switch (expression.NodeType)
{
case ExpressionType.Constant:
value = ((ConstantExpression) expression).Value;
return true;
case ExpressionType.MemberAccess:
var me = (MemberExpression) expression;
object target;
if (TryGetMemberValue(me.Expression, out target, visited))
{
// instance target
switch (me.Member.MemberType)
{
case MemberTypes.Field:
value = ((FieldInfo) me.Member).GetValue(target);
return true;
case MemberTypes.Property:
value = ((PropertyInfo) me.Member).GetValue(target, null);
return true;
}
}
break;
}
// Could not retrieve value
value = null;
return false;
}
protected override Expression VisitMember(MemberExpression node)
{
// Only process nodes that haven't been processed before, this could happen because our traversal
// is depth-first and will "visit" the nodes in the subtree before this method (VisitMember) does
if (!Visited.ContainsKey(node.GetHashCode()))
{
object value = GetMemberValue(node, Visited);
if (value != null)
{
QueryParamBuilder.Append("\n\r");
QueryParamBuilder.Append(value.ToString());
}
}
return base.VisitMember(node);
}
}
I'm still doing some performance profiling on the cache key generation and hoping that it isn't too expensive (I'll update the question with the results once I have them). I'll leave the question open, in case anyone has suggestions on how to optimize this process or has a recommendation for a more efficient method for generating cache keys with vary with the query parameters. Although this method produces the desired output, it is by no means optimal.
i suggest to use ExpressionVisitor
http://msdn.microsoft.com/en-us/library/bb882521(v=vs.90).aspx
Just for the record, "Caching the results of LINQ queries" works well with the EF and it's able to work with parameters correctly, so it can be considered as a good second level cache implementation for EF.
While the solution of the OP works quite well, I found that the performance of the solution is a little bit poor.
The duration of the key generation varied between 300ms and 1200ms for my queries.
However, I've found another solution that has quite better performance (<10ms).
public static string ToTraceString<T>(DbQuery<T> query)
{
var internalQueryField = query.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance).Where(f => f.Name.Equals("_internalQuery")).FirstOrDefault();
var internalQuery = internalQueryField.GetValue(query);
var objectQueryField = internalQuery.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance).Where(f => f.Name.Equals("_objectQuery")).FirstOrDefault();
var objectQuery = objectQueryField.GetValue(internalQuery) as ObjectQuery<T>;
return ToTraceStringWithParameters(objectQuery);
}
private static string ToTraceStringWithParameters<T>(ObjectQuery<T> query)
{
string traceString = query.ToTraceString() + "\n";
foreach (var parameter in query.Parameters)
{
traceString += parameter.Name + " [" + parameter.ParameterType.FullName + "] = " + parameter.Value + "\n";
}
return traceString;
}