I want to share the URL queue between different jobs of the same spider. The JOBDIR setting only stores the state of a single job. Is there a way to share the URL queue across jobs?
I would take a look at scrapy-redis for this. It seems adequate.
Of course, this assumes you don't mind adding Redis as a dependency.
I'm relatively new to web workers (I simply had no need for them until now). I did a lot of research and think I get the basics...
But :-)...
I'm stuck and hope for definitive input.
I'm rendering a graphic representation of an audio file into an SVG with the Web Audio API. No rocket science, and it works to my satisfaction. With larger audio files, however, it would be great to do this in a web worker. The problem is that inside a web worker I do not have access to the window object, and therefore I cannot access the AudioContext, which I would need to decode the raw data into an AudioBuffer. Is there another way to do it, or a way around this?
No, it is not possible to use WebAudio in a Worker. You will have to use the main thread with WebAudio and then transfer the data you need to the worker.
But see also the spec issue on supporting AudioContext in a Worker
I'm trying to make multiple requests in the background to download many JSON files and check the data in them, but I don't know how to use AFNetworking in that case.
I tried to do it the way the wiki explains, but when it starts downloading the second file the app crashes. I want the whole process to run in the background.
Thanks
AFNetworking will definitely handle this. We use it for exchanging data with a RESTful set of services. The things to keep in mind:
An operation (e.g. AFHTTPRequestOperation) can only be used once.
An operation is asynchronous.
Put your operations in an NSOperationQueue, or use AFHTTPClient (suggested) to manage the operations for you.
When sending multiple requests, always assume that the responses will come back in an arbitrary order. There is no guarantee that you will get the responses in the same order as the requests.
Hope this helps to point you towards a solution to your problem. Without more detail in your question, it's difficult to give you a specific answer.
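The point about response ordering can be illustrated with a small, language-neutral sketch. It is written in Java rather than Objective-C and does not use AFNetworking; the class `ResponseCollector` and the `fetch` function are hypothetical stand-ins for whatever actually performs the request. The idea is only that each response is stored under the key of the request that produced it, so arrival order does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: collect concurrent responses keyed by request, not by arrival order. */
final class ResponseCollector {

    /**
     * Runs fetch for every URL concurrently and returns a map from URL to response body.
     * fetch is an assumed placeholder for the real network call and must not return null.
     */
    static Map<String, String> fetchAll(List<String> urls, Function<String, String> fetch) {
        Map<String, String> results = new ConcurrentHashMap<>();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String url : urls) {
            // Each request records its own result under its own key when it completes.
            pending.add(CompletableFuture
                    .supplyAsync(() -> fetch.apply(url))
                    .thenAccept(body -> results.put(url, body)));
        }
        // Equivalent in spirit to a batch completion handler: wait until all have finished.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return results;
    }
}
```

Because results are looked up by URL, the caller never depends on which request happened to finish first.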
Check out AFHTTPClient's enqueueBatchOfHTTPRequestOperations:progressBlock:completionBlock:, which lets you enqueue multiple request operations at once, with the added bonus of a completion handler that is called when all of those requests have finished, as well as a block for tracking progress. Also note that every single operation can still have its own completion handler (useful if you have to process the results of a request, for example).
If you don't need to customize the request operation (and don't need individual completion blocks), you can also use enqueueBatchOfHTTPRequestOperationsWithRequests:progressBlock:completionBlock:, which allows you to pass an array of NSURLRequest directly without having to build the operations yourself.
We have a predominantly RESTful architecture for our product. This lets us implement almost all of the required functionality nicely, except for a new requirement that has come in.
I need to implement a page that lets the user perform large-scale DB operations synchronously. They should be able to stop the operation partway through if they realize they made a mistake (rather than waiting for it to complete and then doing an undo operation).
I was wondering if someone could give some pointers as to the best way to implement such functionality?
Cheers!
Nirav
How about a resource that encapsulates a set of batch operations? Creating the resource means kicking off the operations (data to indicate what the operations should do is submitted via POST). Updating the resource allows stopping it or modifying it while processing.
I would kick off the large operation in a separate thread. Show the user a constantly updated status of the thread, along with a Cancel button. If the user clicks the Cancel button, you kill the thread.
This is the way I've implemented similar things in the past.
The idea is to give them their control back immediately, but don't let them do anything else until the thread is complete except cancel.
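A minimal sketch of this thread-plus-Cancel approach, assuming a Java back end (the question does not name a language). The class `CancellableBatchJob`, the `processRow` placeholder and the row counts are all hypothetical. Instead of killing the thread outright, the sketch uses cooperative cancellation through interruption, which is the usual safe substitute on the JVM: the worker checks a flag between units of work and stops at a clean point.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a long-running batch job that can be cancelled cooperatively. */
final class CancellableBatchJob {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger processed = new AtomicInteger();
    private Future<?> future;

    /** Starts the job on a background thread and returns immediately. */
    synchronized void start(int totalRows) {
        future = executor.submit(() -> {
            for (int i = 0; i < totalRows; i++) {
                if (Thread.currentThread().isInterrupted()) {
                    return; // cancelled: a real job would roll back its transaction here
                }
                processRow(i);
                processed.incrementAndGet();
            }
        });
    }

    /** Placeholder for the real database work on one row (assumption). */
    private void processRow(int row) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag so the loop sees it
        }
    }

    /** Number of rows handled so far; this is what the status display would poll. */
    int progress() {
        return processed.get();
    }

    /** Called by the Cancel button; returns false if the job was never started or already finished. */
    synchronized boolean cancel() {
        return future != null && future.cancel(true);
    }

    /** Shuts the executor down and waits for the worker to stop; returns true if it stopped in time. */
    boolean awaitStop(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In a web application, `start` would be triggered by the request that creates the job, `progress` would back the status display, and `cancel` would be wired to the Cancel button.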
In generic terms, you need a "job queue" and a way to manage the queue.
You need to integrate a batch manager, or implement your own. There are several products that can help you. As an example, read this article
http://www.ibm.com/developerworks/websphere/techjournal/0801_vignola/0801_vignola.html
Hi, I have a design/architecture question. I would like to send emails from one of my JSP pages, and one particular case has been a bit of a problem: one of the pages will need to send around 50 emails at nearly the same time. I would like the messages to be placed on a queue, with a background thread doing the actual sending. What is the appropriate way to solve this problem? If you know of a tutorial or example code, or if any Tomcat configuration is needed, please let me know.
Thanks,
Your solution is rather sound: append the messages to an internal queue and then let some background task handle them.
Here are a few pointers that may be useful:
Unless you want to go distributed (in which case you should look at JMS), use a BlockingQueue implementation for your queue. In your background thread, just do an infinite loop while take()-ing messages from the queue. Those classes take care of potential concurrency issues for you.
Use a ServletContextListener to start your background thread when your web application starts and to shut it down when the application is stopped.
One possible problem with using a raw BlockingQueue is that when your Web application is stopped, all the messages in the queue are lost. If that's a serious problem, then it would probably be easiest just to use a database for the queue and to use notify() to wake up your background thread, which then processes all requests from the database.
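The pointers above can be sketched roughly as follows. This is a minimal illustration, not a complete mail service: the class `MailQueueWorker` and the `sender` callback are hypothetical, messages are plain strings, and the actual mail call is left to the callback. To stay self-contained the sketch does not depend on the servlet API; the assumption is that `start()` and `stop()` would be called from a ServletContextListener's `contextInitialized` and `contextDestroyed` methods.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: an in-memory mail queue drained by a single background thread. */
final class MailQueueWorker {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Consumer<String> sender; // assumed callback that actually sends one message
    private final Thread worker;

    MailQueueWorker(Consumer<String> sender) {
        this.sender = sender;
        this.worker = new Thread(this::drain, "mail-queue-worker");
        this.worker.setDaemon(true);
    }

    /** Would be called from contextInitialized() in a ServletContextListener. */
    void start() {
        worker.start();
    }

    /** Called by the page: returns immediately, the sending happens in the background. */
    void enqueue(String message) {
        queue.add(message);
    }

    /** Infinite loop that blocks in take() until a message is available. */
    private void drain() {
        try {
            while (true) {
                String message = queue.take();
                try {
                    sender.accept(message);
                } catch (RuntimeException e) {
                    // A failed send should not kill the worker; a real application would log or retry.
                }
            }
        } catch (InterruptedException e) {
            // stop() was called: exit the loop. Messages still in the queue are lost.
        }
    }

    /** Would be called from contextDestroyed() in a ServletContextListener. */
    void stop() {
        worker.interrupt();
    }
}
```

A page that needs to send 50 emails would call `enqueue` 50 times and return to the user straight away. As noted above, anything still queued when `stop()` runs is discarded, which is the case where a database-backed queue becomes worth the extra work.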
I have a web application that I am adding workflow functionality to using Windows Workflow Foundation. I have based my solution around K. Scott Allen's Orders Workflow example on OdeToCode. At the start I didn't realise the significance of the caveat "if you use Delay activities with and configure active timers for the manual scheduling service, these events will happen on a background thread that is not associated with an HTTP request". I now need to use Delay activities and it doesn't work as is with his solution architecture. Has anyone come across this and found a good solution to this? The example is linked to from a lot of places but I haven't seen anyone else come across this issue and it seems like a bit of a show stopper to me.
Edit: The problem is that the results from the workflow are returned to the web application via HttpContext. I am using the ManualWorkflowSchedulerService with useActiveTimers, and this works fine for most situations: workflow events are fired from the web app, so HttpContext still exists when the workflow results are returned and the web app can continue processing. When a Delay activity is used, processing happens on a background thread, and when it tries to return results to the web app there is no valid HttpContext (because there has been no HTTP request), so further processing fails.
I think I need to do all post Delay activity processing within the workflow rather than handing off to the web app.
Cheers.
You didn't describe the problem you are having. But maybe this is of some help.
You can use the ManualWorkflowSchedulerService with useActiveTimers set to true, and the workflow will continue on another thread. Normally this is fine because your HTTP request has already finished and it doesn't really matter.
If, however, you need full control, the workflow runtime lets you get a handle on all loaded workflows using the GetLoadedWorkflows() function. This returns a collection of WorkflowInstance objects. Using these, you can call GetWorkflowNextTimerExpiration() to check which timers have expired. If one has, you can manually resume that workflow. In this case you want to use the ManualWorkflowSchedulerService with useActiveTimers=false so you can control the last thread as well. However, in most cases useActiveTimers=true works perfectly well.