How to randomly shuffle asyncio.Queue in Python?

When I want to randomly shuffle a list in Python, I do:
from random import shuffle
shuffle(mylist)
How would I do the equivalent to an instance of asyncio.Queue? Do I have to convert the queue to a list, shuffle the list, and then put them back on the Queue? Or is there a way to do it directly?

As you can see in the Queue source code, items in a Queue are actually stored in the _queue attribute. You can use this to extend Queue through inheritance:
import asyncio
from random import shuffle


class MyQueue(asyncio.Queue):
    def shuffle(self):
        shuffle(self._queue)


async def main():
    queue = MyQueue()
    await queue.put(1)
    await queue.put(2)
    await queue.put(3)

    queue.shuffle()

    while not queue.empty():
        item = await queue.get()
        print(item)


if __name__ == '__main__':
    asyncio.run(main())
If you want to shuffle an existing Queue instance, you can do it directly:
queue = asyncio.Queue()
shuffle(queue._queue)
Reaching into a private attribute is usually not a good solution for obvious reasons, but on the other hand the probability that Queue's implementation will change in the future in a way that breaks this seems relatively low (to me at least).

Related

Asyncio & asyncpg & aiohttp: one event loop for the connection pool

I've been struggling to find a solution for my problem; I hope I've come to the right place.
I have a Django REST framework API which connects to a PostgreSQL db, and I run bots against my own API in order to do stuff. Here is my code:
def get_or_create_eventloop():
    """Get the event loop only one time (create it if it does not exist)"""
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return asyncio.get_event_loop()
My DB class, which uses asyncpg to connect and create a pool:
class DB():
    def __init__(self, loop):
        self.pool = loop.run_until_complete(self.connect_to_db())

    async def connect_to_db(self):
        return await asyncpg.create_pool(host="host",
                                         database="database",
                                         user="username",
                                         password="pwd",
                                         port=5432)
My API class:
class Api(APIView):
    # create an event loop since it's not the main thread
    loop = get_or_create_eventloop()
    nest_asyncio.apply()  # to avoid the <loop already running> problem
    # init my DB pool directly so I won't have to connect each time
    db_object = DB(loop)

    def post(self, request):
        ...  # I want to be able to call "do_something()"

    async def do_something(self):
        ...
I have my bots running and sending post/get request to my django api via aiohttp.
The problem I'm facing is:
How do I implement the post function in my API so it can handle multiple requests, knowing that each request runs on a new thread, and therefore a new event loop is created, while the pool created by asyncpg is LINKED to the event loop it was created on? I.e. I can't create a new event loop; I need to keep working on the one created at the beginning so I can access my db later (via pool.acquire etc.).
This is what I tried so far without success :
def post(self, request):
    self.loop.run_until_complete(self.do_something())
This raises:
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
which I understand: we are presumably trying to drive the event loop from another thread.
I also tried to use async_to_sync from Django:
@async_to_sync
async def post(..):
    resp = await self.do_something()
The problem here is that async_to_sync CREATES a new event loop for the thread, so I won't be able to access my DB POOL.
edit : cf https://github.com/MagicStack/asyncpg/issues/293 for that (I would love to implement something like that but can't find a way)
Here is a quick example of one of my bots (basic stuff):
import asyncio
from aiohttp import ClientSession


async def send_req(url, session):
    async with session.post(url=url) as resp:
        return await resp.text()


async def run(r):
    url = "http://localhost:8080/"
    tasks = []
    async with ClientSession() as session:
        for i in range(r):
            task = asyncio.create_task(send_req(url, session))
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
    print(responses)


if __name__ == '__main__':
    asyncio.run(run(10))
Thank you in advance
After days of looking for an answer, I found the solution to my problem: I just used the package psycopg3 instead of asyncpg (now I can apply @async_to_sync to my post function and it works).

How can I run my own method in every pymodbus server transaction that is processed

I would like to run my own method in a pymodbus server whenever a message is processed. Is that possible?
Thanks
While running through the examples, I came across an example where they subclass pymodbus.datastore.ModbusSparseDataBlock https://pymodbus.readthedocs.io/en/latest/source/example/callback_server.html
The example probably implements more than you'd need; at minimum you should just override:
__init__: by passing a dict of values, it provides the server with the legal address range a client can request
setValues: this is where the magic happens: here you can add your own callbacks on any incoming values for a given address
My minimal example looks like this:
import logging

from pymodbus.datastore import (
    ModbusServerContext,
    ModbusSlaveContext,
    ModbusSparseDataBlock,
)
from pymodbus.server.sync import StartSerialServer
from pymodbus.transaction import ModbusRtuFramer

logger = logging.getLogger(__name__)


class CallbackDataBlock(ModbusSparseDataBlock):
    """callbacks on operation"""

    def __init__(self):
        super().__init__({k: k for k in range(60)})

    def setValues(self, address, value):
        logger.info(f"Got {value} for {address}")
        super().setValues(address, value)


def run_server():
    block = CallbackDataBlock()
    store = ModbusSlaveContext(di=block, co=block, hr=block, ir=block)
    context = ModbusServerContext(slaves=store, single=True)
    StartSerialServer(
        context,
        framer=ModbusRtuFramer,
        port="/dev/ttyNS0",
        timeout=0.005,
        baudrate=19200,
    )


if __name__ == "__main__":
    run_server()

How to test `Var`s of `scala.rx` with scalatest?

I have a method which connects to a websocket and receives a stream of messages from an outside system.
The simplified version is:
def watchOrders(): Var[Option[Order]] = {
  val value = Var[Option[Order]](None)
  // onMessage( order => value.update(Some(order)) )
  value
}
When I test it (with scalatest), I want it to connect to the real outside system and only check the first 4 orders:
test("watchOrders") {
  var result = List.empty[Order]
  val stream = client.watchOrders()
  stream.foreach {
    case Some(order) =>
      result = order :: result
      if (result.size == 4) { // 1.
        assert(orders should ...) // 2.
        stream.kill() // 3.
      }
    case _ =>
  }
  Thread.sleep(10000) // 4.
}
I have 4 questions:
1. Is this the right way to check the first 4 orders? There is no take(4) method to be found in scala.rx.
2. If the assert fails, the test still passes. How do I fix that?
3. Is this the right way to stop the stream?
4. If the thread doesn't sleep here, the test passes before the code in case Some(order) ever runs. Is there a better way to wait?
One approach you might consider to get a List out of a Var is to use the .fold combinator.
The other issue you have is dealing with the asynchronous nature of the data. Assuming you really want to talk to this real outside system in your test code (i.e., this is closer to the integration-test side of things), you'll want to look at scalatest's support for async testing, and will probably do something like construct a Future out of a Promise that you complete once you have accumulated the 4 elements in your list.
See: http://www.scalatest.org/user_guide/async_testing

Why do Scala futures not work faster even with more threads in the thread pool?

I have the following algorithm in Scala:
1. Do initial call to db to initialize cursor
2. Get 1000 entities from db (returns Future)
3. For every entity, make one additional request to the database and get the modified entity (returns Future)
4. Transform original entity
5. Put transformed entity into the Future callback from #3
6. Wait for all Futures
In Scala it is something like:
val client = ...
val size = 1000
val init: Future = client.firstSearch(size) // request over network
val initResult = Await.result(init, 30.seconds)
var cursorId: String = initResult.getCursorId
while (!cursorId.isEmpty) {
  client.grabWithSize(cursorId).map { response =>
    val futures: Seq[Future[Boolean]] = response.getAllResults.map { result =>
      val grabbedOne: Future[Entity] = client.grabOneEntity(result.id) // request over network
      val resultMap: Map[String, Any] = buildMap(result)
      val transformed: Map[String, Any] = transform(resultMap) // no future here
      grabbedOne.map { grabbedOne =>
        buildMap(grabbedOne) == transformed
      }
    }
    Future.sequence(futures).map(_ => response.getNewCursorId)
  }
}

def buildMap(...): Map[String, Any] // sync call
I noticed that if I double size, every iteration of the while loop becomes about 1.5x slower. But I do not see my PC's processor being loaded more; the load stays near zero, yet the time still increases ~1.5x. Why? I have set up:
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1024))
I think that not all Futures are executed in parallel. But why? And how do I fix it?
I see that in your code, the Futures don't block each other. It's more likely the database that is the bottleneck.
Is it possible to do a SQL join, so you make O(1) rather than O(n) database calls? (If you're using Slick, have a look under the queries section about joins.)
If the CPU load is low, the connection pool is probably maxed out; you'd need to increase it for both the database and the network.

How to properly use spray.io LruCache

I am a quite inexperienced spray/scala developer trying to properly use spray.io's LruCache. What I want to achieve is very simple: I have a Kafka consumer, and when it reads something from its topic I want it to put the value it reads into the cache.
Then in one of the routes I want to read this value. The value is of type String. What I have at the moment looks as follows:
object MyCache {
  val cache: Cache[String] = LruCache(
    maxCapacity = 10000,
    initialCapacity = 100,
    timeToLive = Duration.Inf,
    timeToIdle = Duration(24, TimeUnit.HOURS)
  )
}
To put something into the cache I use the following code:
def message() = Future { new String(singleMessage.message()) }
MyCache.cache(key, message)
Then in one of the routings I am trying to get something from the cache:
val res = MyCache.cache.get(keyHash)
The problem is that the type of res is Option[Future[String]], and it is quite hard and ugly to access the real value in this case. Could someone please tell me how I can simplify my code to make it better and more readable?
Thanks in advance.
Don't try to get the value out of the Future. Instead call map on the Future to arrange for work to be done on the value when the Future is completed, and then complete the request with that result (which is itself a Future). It should look something like this:
path("foo") {
  complete(MyCache.cache.get(keyHash) map (optMsg => ...))
}
Also, if singleMessage.message does not do I/O or otherwise block, then rather than creating the Future the way you are:
Future { new String(singleMessage.message) }
it would be more efficient to do it like so:
Future.successful(new String(singleMessage.message))
The latter just creates an already completed Future, bypassing the use of an ExecutionContext to evaluate the function.
If singleMessage.message does do I/O, then ideally you would do that I/O with some library (like Spray client, if it's an HTTP request) that returns a Future (rather than using Future { ... } to create another thread which will block).