weird object returned by vector_indexing_suite - boost-python

I have a
std::vector<const T*>
that I return from a c++ function:
getallTs()
I have exposed the T class with:
class_<T,T*>
and the vector like so:
class_<std::vector<const T*> >("TsList")
.def(vector_indexing_suite<std::vector<const T*>,true>())
;
What does the NoProxy argument mean?
I expose the function like so:
def("getallTs", getallTs,
return_value_policy<return_by_value>{});
I observe a weird behaviour.
When I call from python
tlist = getallTs()
I get a TsList object.
len(tlist)
works.
tlist[<anycorrectindex>].<someattribute>
also works.
However, if I just
print(tlist[0])
and
print(tlist[100])
python prints
object T at <address>
This address is the same for all the Ts in tlist.
Also, I cannot iterate over tlist with a Python for loop.
for t in tlist:
doesn't work.
Any ideas what is wrong with the way I am exposing the vector and the function to python?
I understand the python objects that each wrap a c++ T hold a raw pointer to T.
These T instances exist throughout the process in a global table.
The c++ function returns a vector of pointers to those instances.
What does indexing_suite do with those?
Thanks,

When accessing elements by index, the indexing suite defaults to providing a proxy to the element, as a means to provide reference semantics for mutable types that Python users will often expect with collections:
val = c[i]
c[i].m() # Mutates state, equivalent to `val.m()`
assert(val == c[i]) # Have same state.
val.m()
assert(val == c[i]) # Have same state.
In the above example, val is a proxy object that is aware of the container element. When NoProxy is true, one gets value semantics when indexing, resulting in a copy on each index access.
val = c[i] # val is a copy.
c[i].m() # Modify a copy of c[i].
assert(val == c[i]) # These have the same state because c[i] returns a new copy.
val.m()
assert(val != c[i]) # These do not have the same state.
When proxies are not used, mutations to the elements will only persist when invoked on a reference to the element, such as during iteration:
for val in c:
val.m() # modification observed in c[#]
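The difference between copy-on-index and reference-on-iteration can be imitated in plain Python. The following is only an analogy, not Boost.Python code: `Item` and `CopyOnIndex` are hypothetical names, and the container simply copies on `__getitem__` while yielding the stored objects from `__iter__`.

```python
import copy


class Item(object):
    """Hypothetical mutable element type."""

    def __init__(self):
        self.state = 0

    def m(self):
        self.state += 1


class CopyOnIndex(object):
    """Hypothetical container: indexing copies, iteration yields references."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, i):
        # Value semantics: every index access hands out a fresh copy.
        return copy.copy(self._items[i])

    def __iter__(self):
        # Reference semantics: the stored objects themselves are yielded.
        return iter(self._items)


c = CopyOnIndex([Item(), Item()])

c[0].m()                        # mutates a temporary copy only
state_after_index = c[0].state  # the stored element is unchanged

for val in c:
    val.m()                     # mutates the stored elements
state_after_loop = c[0].state   # the change is now visible through c[0]
```

In this sketch the mutation through `c[0]` is lost, while the mutation made during iteration is observed on the next index access.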
When invoking print(c[i]), a temporary proxy object is created and passed to print, and the lifetime of the proxy object ends upon returning from print(). Hence, the memory and identification used by the temporary proxy object may be re-used. This can result in elements appearing to have the same identification:
id0 = id(c[0]) # id of the temporary proxy
id1 = id(c[1]) # id of another temporary proxy
assert(id0 ?? id1) # Non-deterministic if these will be the same.
assert(c[0] is not c[1]) # Guaranteed to not be the same.
On the other hand, during the lifetime of a proxy, other proxies to the same element will have identical identification, and proxies to different elements will have different identification:
c0 = c[0] # proxy to element 0.
c0_2 = c[0] # another proxy to element 0.
c1 = c[1] # proxy to element 1
assert(c0 is c0_2)
assert(c0 is c[0])
assert(c0 is not c1)
In the situation where T has been exposed as being held by T*, iteration over std::vector<const T*> will fail in Python if there is no to-Python conversion for const T* to a Python object. Exposing class T as being held by T* registers automatic to-Python and from-Python conversions for T*, not const T*. When iterating over the collection in Python, references to elements are returned, resulting in a Python object failing to be constructed from a const T*. On the other hand, when accessing elements via index, the resulting Python object is either a proxy or a copy, which can use the existing converters. To resolve this, consider either:
having std::vector<>'s element type be the same as T's held type
explicitly registering a const T* to-Python converter

Related

How to define `last` iterator without collecting/allocating?

Using the example from the Julia Docs, we can define an iterator like the following:
struct Squares
count::Int
end
Base.iterate(S::Squares, state=1) = state > S.count ? nothing : (state*state, state+1)
Base.eltype(::Type{Squares}) = Int # Note that this is defined for the type
Base.length(S::Squares) = S.count
But even though there's a length defined, asking for last(Squares(5)) results in an error:
julia> last(Squares(5))
ERROR: MethodError: no method matching lastindex(::Squares)
Since length is defined, is there a way to iterate through and return the last value without doing an allocating collect? If so, would it be bad to extend the Base.last method for my type?
As you can read in the docstring of last:
Get the last element of an ordered collection, if it can be computed in O(1) time. This is accomplished by calling lastindex to get the last index.
The crucial part is the O(1) computation time. In your example the cost of computing the last element is O(count) if we rely only on the iterator definition (for this particular sequence it would of course be possible to compute it in O(1) time).
The idea is to avoid defining last for collections for which it is expensive to compute it. For this reason the default definition of last is:
last(a) = a[end]
which requires not only lastindex but also getindex defined for the passed value (as the assumption is that if someone defines lastindex and getindex for some type then these operations can be performed fast).
If you look at the Interfaces section of the Julia manual you will notice that the iteration interface (something that your example implements) is less demanding than the indexing interface (something that is defined for your example in the next section of the manual). Usually the distinction is made that the indexing interface is only added for collections that can be indexed efficiently.
If you still want last to work on your type you can either:
add a definition to Base.last specifically - there is nothing wrong with doing this;
add a definition of getindex, firstindex, and lastindex to make the collection indexable (and then the default definition of last would work) - this is the approach presented in the Julia manual
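The cost difference between the two routes can be sketched in Python rather than Julia. This is an analogy only: `Squares`, `last_by_iteration`, and `last_by_index` are hypothetical names, and the code does not reproduce how Julia's `Base.last` is implemented.

```python
from collections import deque


class Squares(object):
    """Python counterpart of the Julia Squares iterator (illustrative only)."""

    def __init__(self, count):
        self.count = count

    def __iter__(self):
        for i in range(1, self.count + 1):
            yield i * i

    def __len__(self):
        return self.count


def last_by_iteration(iterable):
    # O(n) time, O(1) extra memory: a deque with maxlen=1 keeps only the
    # most recent element while the iterator is drained, so nothing is
    # collected into a full list.
    tail = deque(iterable, maxlen=1)
    if not tail:
        raise ValueError("empty iterable has no last element")
    return tail[0]


def last_by_index(squares):
    # O(1) time: possible only because the n-th square has a closed form.
    if squares.count < 1:
        raise ValueError("empty sequence has no last element")
    return squares.count * squares.count
```

Both functions return the same value for a non-empty `Squares`; the first works for any iterable, the second relies on knowledge specific to this sequence.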

Thread safe operations on XDP

I was able to confirm from the documentation that bpf_map_update_elem is an atomic operation if done on HASH_MAPs. Source (https://man7.org/linux/man-pages/man2/bpf.2.html). [Cite: map_update_elem() replaces existing elements atomically]
My question is twofold.
What if the element does not exist, is the map_update_elem still atomic?
Is the XDP operation bpf_map_delete_elem thread safe from User space program?
The map is a HASH_MAP.
Atomic ops, race conditions and thread safety are sort of complex in eBPF, so I will make a broad answer since it is hard to judge from your question what your goals are.
Yes, both the bpf_map_update_elem command via the syscall and the helper function update the maps 'atomically', which in this case means that if we go from value 'A' to value 'B', the program always sees either 'A' or 'B', never some combination of the two (the first bytes of 'B' and the last bytes of 'A', for example). This is true for all map types and for all map-modifying syscall commands (including bpf_map_delete_elem).
This however doesn't make race conditions impossible since the value of the map may have changed between a map_lookup_elem and the moment you update it.
What is also good to keep in mind is that the map_lookup_elem syscall command (userspace) works differently from the helper function (kernel space). The syscall always returns a copy of the data, so changing that copy does not change the map. The helper function, however, returns a pointer to the location in kernel memory where the map value is stored, so you can update the map value directly without using the map_update_elem helper. That is why you often see hash maps used like:
value = bpf_map_lookup_elem(&hash_map, &key);
if (value) {
    __sync_fetch_and_add(&value->packets, 1);
    __sync_fetch_and_add(&value->bytes, skb->len);
} else {
    struct pair val = {1, skb->len};
    bpf_map_update_elem(&hash_map, &key, &val, BPF_ANY);
}
Note that in this example, __sync_fetch_and_add is used to update parts of the map value. We need to do this since updating it like value->packets++; or value->packets += 1 would result in a race condition. __sync_fetch_and_add emits an atomic CPU instruction, which in this case fetches, adds, and writes back in a single instruction.
Also, in this example the two struct fields are each updated atomically, but not together: it is still possible for packets to have been incremented while bytes has not yet been. If you want to avoid this, you need to use a spin lock (via the bpf_spin_lock and bpf_spin_unlock helpers).
Another way to sidestep the issue entirely is to use the _PER_CPU variants of maps, where you trade higher memory use for less contention and better speed.
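To make the race and the per-CPU idea concrete, here is a small Python simulation. It is not eBPF code: the interleaving is written out by hand instead of using real threads, and `NUM_CPUS`, `per_cpu`, and `count_packet` are hypothetical names chosen for illustration.

```python
# Simulated interleaving of two CPUs doing `value->packets++` as separate
# load / add / store steps (no atomic instruction).
shared = {"packets": 0}

cpu0_loaded = shared["packets"]      # CPU 0 loads 0
cpu1_loaded = shared["packets"]      # CPU 1 loads 0 before CPU 0 stores
shared["packets"] = cpu0_loaded + 1  # CPU 0 stores 1
shared["packets"] = cpu1_loaded + 1  # CPU 1 stores 1, one update is lost
lost_update_total = shared["packets"]

# Per-CPU layout: each CPU owns a private slot, so no two CPUs ever write
# the same memory. A reader adds the slots up to get the total.
NUM_CPUS = 2  # hypothetical CPU count
per_cpu = [{"packets": 0} for _ in range(NUM_CPUS)]


def count_packet(cpu):
    # Only `cpu` ever touches this slot, so a plain increment is safe here.
    per_cpu[cpu]["packets"] += 1


count_packet(0)
count_packet(1)
per_cpu_total = sum(slot["packets"] for slot in per_cpu)
```

Two packets were counted in both halves, but the shared counter ends up at 1 while the per-CPU slots sum to 2. The cost of the per-CPU layout is one copy of the value per CPU.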

Unique symbol value on type level

Is it possible to have some kind of unique symbol value on the type level that could be used to distinguish (tag) some record without the need to supply a unique string value?
In JS there is Symbol often used for such things. But I would like to have it without using Effect, in pure context.
Well, it could even be something like accessing the fully qualified module name (which is unique enough for the task), but I'm not sure if this is really relevant or possible in the PureScript context.
Example:
Say There is some module that exposes:
type Worker value state =
  { tag :: String
  , work :: value -> state -> Effect state
  }
makeWorker :: forall value state. Worker value state
performWork :: forall value state. Worker value state -> value -> Unit
This module is used to manage the state of workers, it passes them value and current state value, and gets Effect with new state value, and puts in state map where keys are tags.
Users of the module:
In one module:
worker = makeWorker { tag: "WorkerOne", work }
-- Then this tagged `worker` is used to performWork:
-- performWork worker "Some value"
In another module we use worker with another tag:
worker = makeWorker { tag: "WorkerTwo", work }
So it would be nice if there would be no need to supply a unique string ("WorkerOne", "WorkerTwo") as a tag but use some "generated" unique value. But the task is that worker should be created on the top level of the module in pure context.
The semantics of PureScript as such are pure and pretty much incompatible with this sort of thing. The same expression always produces the same result. The results can be represented differently at a lower level, but in the language semantics they're the same.
And this is a feature, not a bug. In my experience, more often than not, a requirement like yours is an indication of a flawed design somewhere upstream.
An exception to this rule is FFI: if you have to interact with the underlying platform, there is no choice but to play by that platform's rules. One example I can give is React, which uses the JavaScript's implicit object identity as a way to tell components apart.
So the bottom line is: I urge you to reconsider the requirement. Chances are, you don't really need it. And even if you do, manually specified strings might actually be better than automatically generated ones, because they may help you troubleshoot later.
But if you really insist on doing it this way, good news: you can cheat! :-)
You can generate your IDs effectfully and then wrap them in unsafePerformEffect to make it look pure to the compiler. For example:
import Effect.Unsafe (unsafePerformEffect)
import Data.UUID (toString, genUUID)
workerTag :: String
workerTag = toString $ unsafePerformEffect genUUID

Hashcode doesn't change between reruns

object Main extends App {
  var a = new AnyRef()
  println(a hashCode)
}
I have this code in IntelliJ IDEA. I noticed that the hash code does not change between reruns. Moreover, it doesn't change if I restart IDEA or make some light modifications to the code. I can rename variable a or add a few more variables and I still get the same hash code.
Is it cached somewhere? Or is it just the OS allocating the same address to the variable? Are there any consequences of this?
I'd expect it to be new each time, as the OS should allocate a new address on each run.
The implementation for Object.hashCode() can vary between JVMs as long as it obeys the contract, which doesn't require the numbers to be different between runs. For HotSpot there is even an option (-XX:hashCode) to change the implementation.
HotSpot's default is to use a random number generator, so if you are using that (with no -XX:hashCode option) then it seems it uses the same seed on each run, resulting in the same sequence of hash codes. There's nothing wrong with that.
lmm's answer is not correct unless maybe if you are using HotSpot with -XX:hashCode=4 or another JVM that uses this technique by default. But I'm not at all certain about that (you can try yourself by using HotSpot with -XX:hashCode=4 and see if you get another value which also stays the same between runs).
Check out the code for the different options:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/runtime/synchronizer.cpp#l555
There is a comment in there about making the "else" branch the default, which is the Xorshift pattern, which is indeed a pseudo-random number generator which will always provide the same sequence.
The answer from "apangin" on this question says that indeed this has become the default since JDK8 which explains the change from JDK7 you described in your comment.
I can confirm that this is correct, look at the JDK8 source:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/globals.hpp#l1127
--> Default value is now 5, which corresponds to the "else" branch (Xorshift).
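The reason a fixed seed yields the same hash codes on every run can be seen with any deterministic generator. The sketch below uses Marsaglia's generic 32-bit xorshift, which is an assumption made for illustration: it is not HotSpot's exact variant, and `SEED` is an arbitrary value rather than the one the JVM uses.

```python
def xorshift32(seed):
    """Yield an endless stream of 32-bit values from Marsaglia's xorshift32."""
    x = seed & 0xFFFFFFFF
    if x == 0:
        raise ValueError("xorshift needs a non-zero seed")
    while True:
        x ^= (x << 13) & 0xFFFFFFFF  # keep the state within 32 bits
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        yield x


def first_n(seed, n):
    """Return the first n values produced from the given seed."""
    gen = xorshift32(seed)
    return [next(gen) for _ in range(n)]


SEED = 2463534242  # arbitrary fixed seed, not the value HotSpot uses

# Two independent "runs" that start from the same seed.
run_a = first_n(SEED, 3)
run_b = first_n(SEED, 3)
```

Here `run_a` and `run_b` are identical lists, just as two JVM runs that start from the same generator state hand out the same sequence of identity hash codes.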
Some experiment:
scala> class A extends AnyRef
defined class A
scala> val a1= new A
a1: A = A@5f6b1f19
scala> val a2 = new A
a2: A = A@d60aa4
scala> a1.hashCode
res19: Int = 1600855833
scala> a2.hashCode
res20: Int = 14027428
scala> val a3 = new AnyRef
a3: Object = java.lang.Object@16c3388e
scala> a3.hashCode
res21: Int = 381892750
So, it's obvious that the AnyRef hash code is equal to the address of the object. If we get equal hashes, it means the object address is the same on every rerun. And that is true for me with two REPLs.
API tells about AnyRef hashCode method:
The hashCode method for reference types. See hashCode in scala.Any.
And about Any method:
Calculate a hash code value for the object.
The default hashing algorithm is platform dependent.
I guess that the platform determines the location of the object and therefore the value of hashCode.
Any new process gets its own virtual address space from the OS. So while the process might exist at a different physical address each time the program runs, it will be mapped to the same virtual address each time. (ASLR exists, but I understand the JVM doesn't participate in it). You can see this with e.g. a small C program with a string constant in (you might have to deliberately disable ASLR for that program) - if you take a pointer to the string constant and print that pointer as an integer, it will be the same value every time.
hashCode() is not a random number. It is a digested result from analyzing some part of an object. Objects with the same values will, more than likely, have the same hash code. This is true for your case, since the "value" of an AnyRef with no fields is essentially empty.

(Usage of Class Variables) Pythonic - or nasty habit learnt from java?

Hello Pythoneers: the following code is only a mock up of what I'm trying to do, but it should illustrate my question.
I would like to know if this is dirty trick I picked up from Java programming, or a valid and Pythonic way of doing things: basically I'm creating a load of instances, but I need to track 'static' data of all the instances as they are created.
class Myclass:
    counter=0
    last_value=None
    def __init__(self,name):
        self.name=name
        Myclass.counter+=1
        Myclass.last_value=name
And some output of using this simple class , showing that everything is working as I expected:
>>> x=Myclass("hello")
>>> print x.name
hello
>>> print Myclass.last_value
hello
>>> y=Myclass("goodbye")
>>> print y.name
goodbye
>>> print x.name
hello
>>> print Myclass.last_value
goodbye
So is this a generally acceptable way of doing this kind of thing, or an anti-pattern ?
[For instance, I'm not too happy that I can apparently set the counter from both within the class(good) and outside of it(bad); also not keen on having to use full namespace 'Myclass' from within the class code itself - just looks bulky; and lastly I'm initially setting values to 'None' - probably I'm aping static-typed languages by doing this?]
I'm using Python 2.6.2 and the program is single-threaded.
Class variables are perfectly Pythonic in my opinion.
Just watch out for one thing. An instance variable can hide a class variable:
x.counter = 5 # creates an instance variable in the object x.
print x.counter # instance variable, prints 5
print y.counter # class variable, prints 2
print Myclass.counter # class variable, prints 2
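A slightly fuller sketch of the same pitfall, written so that it runs on both Python 2 and Python 3. It reuses the `Myclass` definition from the question (without `last_value`); the variable names `shadowed`, `y_sees`, and `x_sees_again` are only there for illustration.

```python
class Myclass(object):
    counter = 0

    def __init__(self, name):
        self.name = name
        Myclass.counter += 1


x = Myclass("hello")
y = Myclass("goodbye")

x.counter = 5                    # binds a new attribute on x only
shadowed = "counter" in vars(x)  # x now has its own entry in its __dict__
y_sees = y.counter               # falls through to the class attribute

del x.counter                    # remove the shadowing instance attribute
x_sees_again = x.counter         # the class attribute is visible again
```

Assigning through an instance never changes the class attribute; it only hides it for that one object, and deleting the instance attribute makes the class value visible again.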
Do. Not. Have. Stateful. Class. Variables.
It's a nightmare to debug, since the class object now has special features.
Stateful classes conflate two (2) unrelated responsibilities: state of object creation and the created objects. Do not conflate responsibilities because it "seems" like they belong together. In this example, the counting of created objects is the responsibility of a Factory. The objects which are created have completely unrelated responsibilities (which can't easily be deduced from the question).
Also, please use Upper Case Class Names.
class MyClass( object ):
    def __init__(self, name):
        self.name=name
def myClassFactory( iterable ):
    for i, name in enumerate( iterable ):
        yield MyClass( name )
The sequence counter is now part of the factory, where the state and counts should be maintained. In a separate factory.
[For folks playing Code Golf, this is shorter. But that's not the point. The point is that the class is no longer stateful.]
It's not clear from the question how Myclass instances get created. Lacking any clue, there isn't much more that can be said about how to use the factory. An iterable is the usual culprit. Perhaps something that iterates through a list or a file or some other iterable data structure.
Also -- for folks just off the boat from Java -- the factory object is just a function. Nothing more is needed.
Since the example in the question is perfectly unclear, it's hard to know why (1) two unique objects are created with (2) a counter. The two unique objects are already two unique objects and a counter isn't needed.
For example, the static variables in the Myclass are never referenced anywhere. That makes it very, very hard to understand the example.
x, y = myClassFactory( [ "hello", "goodbye" ] )
If the count or last value were actually used for something, then perhaps a meaningful example could be created.
You can solve this problem by splitting the code into two separate classes.
The first class will be for the object you are trying to create:
class MyClass(object):
    def __init__(self, name):
        self.Name = name
And the second class will create the objects and keep track of them:
class MyClassFactory(object):
    Counter = 0
    LastValue = None
    @classmethod
    def Build(cls, name):
        inst = MyClass(name)
        cls.Counter += 1
        cls.LastValue = inst.Name
        return inst
This way, you can create new instances of the class as needed, but the information about the created instances will still be correct.
>>> x = MyClassFactory.Build("Hello")
>>> MyClassFactory.Counter
1
>>> MyClassFactory.LastValue
'Hello'
>>> y = MyClassFactory.Build("Goodbye")
>>> MyClassFactory.Counter
2
>>> MyClassFactory.LastValue
'Goodbye'
>>> x.Name
'Hello'
>>> y.Name
'Goodbye'
Finally, this approach avoids the problem of instance variables hiding class variables, because MyClass instances have no knowledge of the factory that created them.
>>> x.Counter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'Counter'
You don't have to use a class variable here; this is a perfectly valid case for using globals:
_counter = 0
_last_value = None
class Myclass(object):
    def __init__(self, name):
        self.name = name
        global _counter, _last_value
        _counter += 1
        _last_value = name
I have a feeling some people will knee-jerk against globals out of habit, so a quick review may be in order of what's wrong--and not wrong--with globals.
Globals traditionally are variables which are visible and changeable, unscoped, from anywhere in the program. This is a problem with globals in languages like C. It's completely irrelevant to Python; these "globals" are scoped to the module. The class name "Myclass" is equally global; both names are scoped identically, in the module they're contained in. Most variables--in Python equally to C++--are logically part of instances of objects or locally scoped, but this is clearly shared state across all users of the class.
I don't have any strong inclination against using class variables for this (and using a factory is completely unnecessary), but globals are how I'd generally do it.
Is this pythonic? Well, it's definitely more pythonic than having global variables for a counter and the value of the most recent instance.
It's said in Python that there's only one right way to do anything. I can't think of a better way to implement this, so keep going. Despite the fact that many will criticize you for "non-pythonic" solutions to problems (like the needless object-orientation that Java coders like or the "do-it-yourself" attitude that many from C and C++ bring), in most cases your Java habits will not send you to Python hell.
And beyond that, who cares if it's "pythonic"? It works, and it's not a performance issue, is it?