FK field over IPC - kdb

Some prerequisites on the remote process:
q)\p 15222
q)t:([id:10 20 30]data:`aa`bb`cc);
q)kt:([]id:`t$10 20 20 30 30 30; num:til 6);
and the following will be performed on a local process:
The size of kt looks the same from both sides:
q)-22!`::15222 "kt"
138
q)`::15222 "-22!kt"
138
But the metas differ:
q)`::15222 "meta kt"
c  | t f a
---| -----
id | j t
num| j
q)meta `::15222 "kt"
c  | t f a
---| -----
id | j
num| j
Why is that? Why doesn't the transferred table carry the same information as its source, even though the sizes are the same?
I suspect this is somehow related to enums - does q completely strip the enum info? Is that true?:
// remote
q)e:`a`b`c;
q)e1:`e$`b`a`c`c`a`b;
// local
q)`::15222 "type e1"
20h
q)type `::15222 "e1"
11h

Yes, enums (and thus foreign keys) aren't preserved when transported over IPC to another process.
Equal length from -22! doesn't mean identical content.

The reason for the difference between the two tables is that the enumeration domain isn't sent over IPC, so there is no guarantee that the enumeration domain exists in the local process or matches the one in the remote process. If the enumeration domain were sent from the remote process, there would also be the risk of overwriting an existing enum domain in the local process.
Another point to note is that -22! checks the serialised size, and serialisation strips any foreign key on both client and server.
It might be worth checking out https://www.aquaq.co.uk/q/adventure-in-retrieving-memory-size-of-kdb-object/ for further reading on determining memory usage of kdb objects.
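If the foreign key is needed on the local side, one workaround (a sketch against the setup above) is to bring the keyed table across first and re-enumerate the id column locally:
q)h:hopen `::15222        / handle to the remote process
q)t:h"t"                  / copy the enumeration domain (the keyed table) first
q)kt:h"kt"
q)update `t$id from `kt   / re-apply the foreign key against the local copy of t
q)meta kt                 / id now shows f=t again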

Ethernet/IP - is it possible to read/write specific data from Assembly Object using Get/Set_Attribute_Single and Extended Logical Format?

We have a Windows application that we are using to test against a CLICK PLC and a Schneider PLC.
My question is around reading and writing data to the Assembly Object, and more specifically if there is a way to read/write from/to a particular spot in the byte array defined by the Assembly Object instance?
Reading the entire array is not so much of an issue, but we really don't want to write the entire array if all we need to do is write a single value. If we have to write all the values back, don't we risk overwriting some values that could have changed underneath us? Yes, we could do a read, change the single value, then write, but there is no guarantee that the other values won't have changed between the read and the write.
The PLCs do not support Get/Set_Member, so we cannot use that, which means we are left with Get/Set_Attribute_Single. From looking at the ODVA documentation, Vol 1, Appendix C, section C-1.4.2 it seems to me that I can create an EPATH that should let me do just that by using the Extended Logical Format?
Ex: If I want to read the first element in the byte array, I should be able to construct my padded EPATH using the Extended Logical Format as follows:
EPATH = 20 04 | 24 65 | 30 03 | 3D 01 | 00 00
(Class ID 4, Instance ID 101, Attribute 3, Extended Logical with 16 bit Logical value and Array Index Type, Array Index 0)
If I call Get_Attribute_Single with this EPATH I get a "Path Segment Error" for either PLC.
Is my thinking correct, that I should be able to get an array element?
If so, is my EPATH correct?
If so, could the error be due to either PLC not supporting this?

Postgres extended statistics with partitioning

I am using Postgres 13 and have created a table with columns A, B and C. The table is partitioned by A with 2 possible values. Partition 1 contains 100 possible values each for B and C, whereas partition 2 has 100 completely different values for B, and 1 different value for C. I have set the statistics target for both columns to the maximum so that this definitely doesn't cause any issue.
If I group by B and C on either partition, Postgres estimates the number of groups correctly. However, if I run the query against the base table where I really want it, it estimates what I assume is no functional dependency between A, B and C, i.e. (p1B + p2B) * (p1C + p2C) for 200 * 101, as opposed to the reality of p1B * p1C + p2B * p2C for 10000 + 100.
I guess I was half expecting it to sum the underlying partitions rather than use the full count of 200 B's and 101 C's that the base table can see. Moreover, if I also add A into the group by then the estimate erroneously doubles further still, as it then thinks that this set will also be duplicated for each value of A.
This all made me think that I need an extended statistic to tell it that A influences either B or C or both. However, if I set one on the partitioned base table and analyze, the value in pg_statistic_ext_data->stxdndistinct is null. Whereas if I set it on the partitions themselves, this does appear to work, though it isn't particularly useful because the estimation is already correct at that level. How do I go about having Postgres estimate against the base table correctly without having to run the query against all of the partitions and unioning them together?
You can define extended statistics on a partitioned table, but PostgreSQL doesn't collect any data in that case. You'll have to create extended statistics on all partitions individually.
You can confirm that by querying the collected data after an ANALYZE:
SELECT s.stxrelid::regclass AS table_name,
       s.stxname AS statistics_name,
       d.stxdndistinct AS ndistinct,
       d.stxddependencies AS dependencies
FROM pg_statistic_ext AS s
   JOIN pg_statistic_ext_data AS d
      ON d.stxoid = s.oid;
There is certainly room for improvement here; perhaps don't allow defining extended statistics on a partitioned table in the first place.
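For example, assuming the partitioned table's children are named t_p1 and t_p2 and the grouping columns are b and c (names are illustrative), the per-partition statistics objects would be created like this:
CREATE STATISTICS t_p1_b_c (ndistinct, dependencies) ON b, c FROM t_p1;
CREATE STATISTICS t_p2_b_c (ndistinct, dependencies) ON b, c FROM t_p2;
ANALYZE t_p1;
ANALYZE t_p2;
After the ANALYZE, the query above should show non-NULL ndistinct entries for both partitions.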
I found that I just had to turn enable_partitionwise_aggregate on to get this to estimate correctly.
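For reference, that setting is off by default and can be enabled per session:
SET enable_partitionwise_aggregate = on;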

OrientDB unique index on properties of two connected vertices

I have two vertex classes V1 and V2, both have the property 'id'.
I have one edge class, E.
V1's id property is a unique index, so all V1 have unique ids.
Now I want that all V2 instances connected to a certain V1 instance have unique ids.
So:
OK (and needed to work)
V1(id:"A") ---- E ----> V2(id:"a")
V1(id:"A") ---- E ----> V2(id:"b")
V1(id:"B") ---- E ----> V2(id:"a")
V1(id:"B") ---- E ----> V2(id:"b")
Not OK
V1(id:"A") ---- E ----> V2(id:"a")
V1(id:"A") ---- E ----> V2(id:"a")
Preferably, as an addition, it should also be possible for V2 instances to exist without edges and they should then be unique in the global scope. If this last part is not possible, the first part is helpful anyways.
Is this possible by database configuration / indexing (on edge or vertices) or do I have to enforce it in the application?
UPDATE
What I mean by configuration / indexing is that it would be prevented (with an exception) if you were trying to add the edge (just like when using a unique index to enforce that only one edge exists between two vertices).
I see only 2 ways to do this:
Put the V2 id attribute on the edge and call it v2id, so you can create a unique index against out + v2id (see the sketch after this list)
Create a hook (trigger) on onBeforeCreate() of class E and do your checks there
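A sketch of option 1 in OrientDB SQL (property and index names are illustrative; depending on the OrientDB version you may also need to declare E.out as a LINK property before indexing it):
CREATE PROPERTY E.v2id STRING
CREATE INDEX E_out_v2id ON E (out, v2id) UNIQUE
With that index in place, creating a second edge from the same V1 to a V2 carrying the same id value raises a duplicate-key exception, which is the behaviour asked for in the update above.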
You can use
update e set out=(select from v1 where id="a"), in=(select from v2 where id="a") upsert where out=(select from v1 where id="a") and in=(select from v2 where id="a")
Hope it helps.

Neo4j: MERGE creates duplicate nodes

My database model has users and MAC addresses. A user can have multiple MAC addresses, but a MAC can only belong to one user. If some user sets his MAC and that MAC is already linked to another user, the existing relationship is removed and a new relationship is created between the new owner and that MAC. In other words, a MAC moves between users.
This is a particular instance of the Cypher query I'm using to assign MAC addresses:
MATCH (new:User { Id: 2 })
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
WITH new, mac
OPTIONAL MATCH ()-[oldr:MAC_ADDRESS]->(mac)
DELETE oldr
MERGE (new)-[:MAC_ADDRESS]->(mac)
The query runs fine in my tests, but in production, for some strange reason it sometimes creates duplicate MacAddress nodes (and a new relationship between the user and each of those nodes). That is, a particular user can have multiple MacAddress nodes with the same Value.
I can tell they are different nodes because they have different node ID's. I'm also sure the Values are exactly the same because I can do a collect(distinct mac.Value) on them and the result is a collection with one element. The query above is the only one in the code that creates MacAddress nodes.
I'm using Neo4j 2.1.2. What's going on here?
Thanks,
Jan
Are you sure this is the entirety of the queries you're running? MERGE has this really common pitfall where it merges everything that you give it. So here's what people expect:
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
1650 ms
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
17 ms
neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
+------------+
| count(mac) |
+------------+
| 1 |
+------------+
1 row
200 ms
So far, so good. That's what we expect. Now watch this:
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })-[r:foo]->(b:SomeNode {label: "Foo!"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
178 ms
neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
+------------+
| count(mac) |
+------------+
| 2 |
+------------+
1 row
2 ms
Wait, WTF happened here? We only specified the same MAC address again, so why was a duplicate created?
The documentation on MERGE specifies that "MERGE will not partially use existing patterns — it’s all or nothing. If partial matches are needed, this can be accomplished by splitting a pattern up into multiple MERGE clauses". So because when we run this path MERGE the whole path doesn't already exist, it creates everything in it, including a duplicate mac address node.
There are frequently questions about duplicated nodes created by MERGE, and 99 times out of 100, this is what's going on.
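For the example above, splitting the path into separate MERGE clauses reuses the existing node (a sketch using the same :MacAddress/:SomeNode pattern):
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
MERGE (b:SomeNode { label: "Foo!" })
MERGE (mac)-[r:foo]->(b);
Each MERGE now matches or creates its own piece of the pattern, so the existing MacAddress node is reused.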
This is the response I got back from Neo4j's support (emphasis mine):
I got some feedback from our team already, and it's currently known that this can happen in the absence of a constraint. MERGE is effectively MATCH or CREATE - and those two steps are run independently within the transaction. Given concurrent execution, and the "read committed" isolation level, there's a race condition between the two.
The team have had some discussion on how to provide a higher guarantee in the face of concurrency, and do have it noted as a feature request for consideration.
Meanwhile, they've assured me that using a constraint will provide the uniqueness you're looking for.
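In Neo4j 2.x Cypher the uniqueness constraint looks like this:
CREATE CONSTRAINT ON (m:MacAddress) ASSERT m.Value IS UNIQUE;
With the constraint in place, two concurrent MERGEs on the same Value can no longer both create a MacAddress node.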

store list in key value database

I'm searching for the best way to store lists associated with a key in a key-value database (like BerkeleyDB or LevelDB).
For example:
I have users and orders from user to user
I want to store a list of order ids for each user, with fast access via range selects (for pagination).
How to store this structure?
I don't want to store it in a serialized format for each user:
user_1_orders = serialize(1,2,3..)
user_2_orders = serialize(1,2,3..)
because the list can be long
I have thought about a separate db file for each user, storing order ids as keys in it, but this does not solve the range-select problem. What if I want to get the ids in the range [5000:5050]?
I know about Redis, but I'm interested in a key-value implementation like BerkeleyDB or LevelDB.
Let's start with a single list. You can work with a single hashmap:
store in row 0 the count of the user's orders
for each new order, store a new row keyed by the incremented count
So your hashmap looks like the following:
key | value
-------------
0 | 5
1 | tomato
2 | celery
3 | apple
4 | pie
5 | meat
A steady increment of the key makes sure that every key is unique. Given that the db is key-ordered and that the pack function translates integers into byte arrays that sort in the same order, you can fetch slices of the list. To fetch orders between 5000 and 5050 you can use bsddb's Cursor.set_range or leveldb's createReadStream (js api).
Now let's expand to multiple users' orders. If you can open several hashmaps, you can repeat the above with one hashmap per user. You may hit some system limits that way (max number of open fds or max number of files per directory), so instead you can share a single hashmap between several users.
What I explain in the following works for both leveldb and bsddb, provided that you pack keys so they sort in lexicographic (byte) order. So I will assume that you have a pack function. In bsddb you have to build a pack function yourself. Have a look at wiredtiger.packing or bytekey for inspiration.
The principle is to namespace the keys using the user's id. It's also called key composition.
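A minimal pack/unpack sketch in Python (an assumption of this answer, not something bsddb or leveldb provide): fixed-width big-endian integers sort the same way as their byte strings, which is exactly the property needed here.
import struct

def pack(*parts):
    # big-endian unsigned 64-bit ints: byte order equals numeric order
    return b"".join(struct.pack(">Q", part) for part in parts)

def unpack(key):
    count = len(key) // 8
    return struct.unpack(">" + "Q" * count, key)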
Say your database looks like the following:
key | value
-------------------
1 | 0 | 2 <--- count column for user 1
1 | 1 | tomato
1 | 2 | orange
... ...
32 | 0 | 1 <--- count column for user 32
32 | 1 | banana
... | ...
You create this database with the following (pseudo) code:
db.put(pack(1, make_uid(1)), 'tomato')
db.put(pack(1, make_uid(1)), 'orange')
...
db.put(pack(32, make_uid(32)), 'banana')
make_uid implementation looks like this:
def make_uid(user_uid):
    # retrieve the current count (0 if this is the user's first order)
    counter_key = pack(user_uid, 0)
    value = db.get(counter_key) or 0
    value += 1  # increment
    # save the new count
    db.put(counter_key, value)
    return value
Then you have to do the correct range lookup; it's similar to the single-list case, but with a composite key. Using the bsddb cursor.set_range(key) API, we retrieve all items between 5000 and 5050 for user 42:
def user_orders_slice(user_id, start, end):
    key, value = cursor.set_range(pack(user_id, start))
    while True:
        uid, order_id = unpack(key)
        if uid != user_id or order_id > end:
            break
        # the value is probably packed somehow...
        yield value
        next_item = cursor.next()
        if next_item is None:  # reached the end of the database
            break
        key, value = next_item
No error checks are done. Among other things, user_orders_slice(42, 5000, 5050) is not guaranteed to return 51 items if you delete items from the list. A correct way to query, say, 50 items is to implement a user_orders_query(user_id, start, limit).
I hope you get the idea.
You can use Redis to store the list in a zset (sorted set), like this:
// this line is called whenever a user place an order
$redis->zadd($user_1_orders, time(), $order_id);
// list orders of the user
$redis->zrange($user_1_orders, 0, -1);
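For the pagination asked about in the question, the same sorted set can be sliced by rank (the page boundaries below are illustrative):
// orders ranked 5000..5050 in insertion order
$redis->zrange($user_1_orders, 5000, 5050);
// or the newest 50 orders first
$redis->zrevrange($user_1_orders, 0, 49);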
Redis is fast enough. But one thing you should know about Redis is that it stores all data in memory, so if the data eventually exceeds the physical memory, you have to shard the data on your own.
You can also use SSDB (https://github.com/ideawu/ssdb), which is a wrapper around leveldb with APIs similar to Redis, but it stores most data on disk and uses memory only for caching. That means SSDB's capacity is 100 times that of Redis - up to TBs.
One way you could model this in a key-value store which supports scans, like leveldb, would be to add the order id to the key for each user. So the new keys would be userId_orderId for each order. Now to get orders for a particular user, you can do a simple prefix scan - scan(userId*). This makes the userId range query slow; in that case you can maintain another table just for userIds, or use another key convention: Id_userId, for getting userIds between [5000-5050].
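A sketch of that key convention using the plyvel LevelDB binding (the path, padding width and payloads are illustrative assumptions):
import plyvel

db = plyvel.DB("/tmp/orders.ldb", create_if_missing=True)

# userId_orderId keys, zero-padded so they sort numerically as well as lexicographically
db.put(b"00000042_00005001", b"order payload")
db.put(b"00000042_00005002", b"order payload")

# all orders for user 42: iterate every key sharing the prefix
for key, value in db.iterator(prefix=b"00000042_"):
    print(key, value)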
Recently I have seen HyperDex adding data-type support on top of leveldb (e.g. http://hyperdex.org/doc/04.datatypes/#lists), so you could give that a try too.
In BerkeleyDB you can store multiple values per key, either in sorted or unsorted order. This would be the most natural solution. LevelDB has no such feature. You should look into LMDB (http://symas.com/mdb/) though; it also supports sorted multi-value keys, and is smaller, faster, and more reliable than either of the others.
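A sketch of the sorted multi-value approach using the py-lmdb binding (database path, names and keys are illustrative):
import lmdb

env = lmdb.open("/tmp/orders.lmdb", max_dbs=2)
orders = env.open_db(b"orders", dupsort=True)  # several sorted values per key

with env.begin(write=True, db=orders) as txn:
    txn.put(b"user:42", b"order:5001")
    txn.put(b"user:42", b"order:5002")  # stored as a second value under the same key

with env.begin(db=orders) as txn:
    cursor = txn.cursor()
    if cursor.set_key(b"user:42"):           # position on the key
        for value in cursor.iternext_dup():  # iterate that key's sorted values
            print(value)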