Importing Archived TweetStream Twitter Output into MongoDB?

I have around 1000 lines of Twitter data captured using the Python
tweetstream library. The data was collected using the simple tweetstream
example:
>>> stream = tweetstream.SampleStream("username", "password")
>>> for tweet in stream:
...     print tweet
which produces output like:
{u'user': {u'follow_request_sent': None,
u'profile_use_background_image': True,
u'profile_background_image_url_https': u'https://si0.twimg.com/
profile_background_images/
181013334/25957_1367646636642_1395984493_31038644_61586_n.jpg',
u'verified': False, u'profile_image_url_https': u'https://
si0.twimg.com/profile_images/1820265868/rosajennifer_normal.jpg',
u'profile_sidebar_fill_color': u'DDEEF6', u'id': 46478005,
u'profile_text_color': u'333333', u'followers_count': 505,
u'protected': False, u'location': u'', u'default_profile_image':
False, u'listed_count': 4, u'utc_offset': -21600, u'statuses_count':
35923, u'description': u'Take me as I am or watch me as I go. . .\n
\n', u'friends_count': 315, u'profile_link_color': u'0084B4',
u'profile_image_url': u'http://a1.twimg.com/profile_images/1820265868/
rosajennifer_normal.jpg', u'notifications': None,
u'show_all_inline_media': True, u'geo_enabled': False,
u'profile_background_color': u'C0DEED', u'id_str': u'46478005',
u'profile_background_image_url': u'http://a2.twimg.com/
profile_background_images/
181013334/25957_1367646636642_1395984493_31038644_61586_n.jpg',
u'name': u'rosa jennifer', u'lang': u'en', u'following': None,
u'profile_background_tile': True, u'favourites_count': 82,
u'screen_name': u'rosajennifer', u'url': u'http://www.facebook.com/
profile.php?ref=profile&id=1329240058', u'created_at': u'Thu Jun 11
20:11:28 +0000 2009', u'contributors_enabled': False, u'time_zone':
u'Central Time (US & Canada)', u'profile_sidebar_border_color':
u'C0DEED', u'default_profile': False, u'is_translator': False},
u'favorited': False, u'contributors': None, u'entities':
{u'user_mentions': [{u'indices': [1, 14], u'id': 90939650, u'id_str':
u'90939650', u'name': u'Dajuan(Dae-John)', u'screen_name':
u'Juan_Ton5oup'}], u'hashtags': [], u'urls': []}, u'text':
u'\u201c#Juan_Ton5oup: Spanish girls love jeans with animals outlined
on the back pockets.\u201dfoh lmao', u'created_at': u'Tue Feb 14
00:27:32 +0000 2012', u'truncated': False, u'retweeted': False,
u'in_reply_to_status_id': None, u'coordinates': None, u'id':
169216166617817088, u'source': u'<a href="http://twitter.com/#!/
download/ipad" rel="nofollow">Twitter for iPad</a>',
u'in_reply_to_status_id_str': None, u'in_reply_to_screen_name': None,
u'id_str': u'169216166617817088', u'place': None, u'retweet_count': 0,
u'geo': None, u'in_reply_to_user_id_str': None,
u'in_reply_to_user_id': None}
I have a single file of ~1000 of these, each on a separate line. I've
tried mongoimport as well as a dozen other methods, but I can't seem to
get this data imported. Mongoimport returns this error:
Sat Mar 10 12:51:00 Assertion: 10340:Failure parsing JSON string near:
u'user': {
0x581762 0x528994 0xaa29f3 0xaa4ca3 0xa9b7dd 0xa9f772 0x34df82169d
0x4fe679
mongoimport(_ZN5mongo11msgassertedEiPKc+0x112) [0x581762]
mongoimport(_ZN5mongo8fromjsonEPKcPi+0x444) [0x528994]
mongoimport(_ZN6Import8parseRowEPSiRN5mongo7BSONObjERi+0x8b3)
[0xaa29f3]
mongoimport(_ZN6Import3runEv+0x16e3) [0xaa4ca3]
mongoimport(_ZN5mongo4Tool4mainEiPPc+0x169d) [0xa9b7dd]
mongoimport(main+0x32) [0xa9f772]
/lib64/libc.so.6(__libc_start_main+0xed) [0x34df82169d]
mongoimport(__gxx_personality_v0+0x3d1) [0x4fe679]
exception:Failure parsing JSON string near: u'user': {
I assume this is because the string is not actual JSON; it's some sort
of JSON-like format.
Can anyone help?

The first problem, as you noted, is that this is not valid JSON; it's the repr of a Python dictionary (note the u'...' prefixes, e.g. {u'indices': ...).
Second problem: why are you trying to use mongoimport at all? In Python you can simply save the dictionary to the database with pymongo. This is basically the first example of how to use MongoDB.
>>> import tweetstream
>>> from pymongo import Connection
>>> connection = Connection('localhost', 27017)
>>> db = connection.test_database
>>> posts = db.posts
>>> stream = tweetstream.SampleStream("username", "password")
>>> for tweet in stream:
...     posts.insert(tweet)

The following code works. I had some issues with ast but got past them once I updated Python on my system. This script reads a file line by line in Python dictionary format and prints valid JSON that can be imported into MongoDB.
import json
import ast
mydict = open('data.txt', 'r')
for line in mydict:
    line = ast.literal_eval(line)  # parse the Python dict literal
    line = json.dumps(line)        # re-serialize as valid JSON
    print line
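
If the end goal is just to get the archived file into MongoDB, the two answers above can also be combined into a single script that parses each line with ast.literal_eval and inserts it directly with pymongo, skipping mongoimport entirely. A minimal sketch, assuming the file is named data.txt and using an assumed test_database.tweets collection:
import ast
from pymongo import Connection  # MongoClient in newer pymongo releases

connection = Connection('localhost', 27017)
tweets = connection.test_database.tweets  # assumed database and collection names

with open('data.txt', 'r') as infile:  # assumed file name
    for line in infile:
        line = line.strip()
        if not line:
            continue
        # Each line is the repr of a Python dict, not JSON,
        # so parse it with ast.literal_eval rather than json.loads
        tweets.insert(ast.literal_eval(line))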

Related

Creating a pyspark dataframe from exploded (nested) json values

I'm trying to get nested json values in a pyspark dataframe. I have easily solved this using pandas, but now I'm trying to get it working with just pyspark functions.
print(response)
{'ResponseMetadata': {'RequestId': 'PGMCTZNAPV677CWE', 'HostId': '/8qweqweEfpdegFSNU/hfqweqweqweSHtM=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '/8yacqweqwe/hfjuSwKXDv3qweqweqweHtM=', 'x-amz-request-id': 'PqweqweqweE', 'date': 'Fri, 09 Sep 2022 09:25:04 GMT', 'x-amz-bucket-region': 'eu-central-1', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Contents': [{'Key': 'qweqweIntraday.csv', 'LastModified': datetime.datetime(2022, 7, 12, 8, 32, 10, tzinfo=tzutc()), 'ETag': '"qweqweqwe4"', 'Size': 1165, 'StorageClass': 'STANDARD'}], 'Name': 'test-bucket', 'Prefix': '', 'MaxKeys': 1000, 'EncodingType': 'url', 'KeyCount': 1}
With pandas I can parse this input into a dataframe with the following code:
object_df = pd.DataFrame()
for elem in response:
    if 'Contents' in elem:
        object_df = pd.json_normalize(response['Contents'])
print(object_df)
Key LastModified \
0 202207110000_qweIntraday.csv 2022-07-12 08:32:10+00:00
ETag Size StorageClass
0 "fqweqweqwee0cb4" 1165 STANDARD
(there are sometimes multiple "Contents", so I have to use recursion).
This was my attempt to replicate this with spark dataframe, and sc.parallelize:
object_df = spark.sparkContext.emptyRDD()
for elem in response:
    if 'Contents' in elem:
        rddjson = spark.read.json(sc.parallelize([response['Contents']]))
Also tried:
sqlc = SQLContext(sc)
rddjson = spark.read.json(sc.parallelize([response['Contents']]))
df = sqlc.read.json("multiline", "true").json(rddjson)
df.show()
+--------------------+
| _corrupt_record|
+--------------------+
|[{'Key': '2/3c6a6...|
+--------------------+
This is not working. I already saw some related posts saying that I can use explode like in this example (Stack Overflow answer) instead of json_normalize, but I'm having trouble replicating the example.
Any suggestion on how I can solve this with pyspark or pyspark.sql (without adding additional libraries) is very welcome.
It looks like the issue is that the data contains a Python datetime object (in the LastModified field).
One way around this might be (assuming you're OK with Python standard libraries):
import json
sc = spark.sparkContext
for elem in response:
    if 'Contents' in elem:
        json_str = json.dumps(response['Contents'], default=str)
        object_df = spark.read.json(sc.parallelize([json_str]))
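From there the DataFrame can be inspected like any other; note that LastModified arrives as a plain string because json.dumps(..., default=str) stringifies the datetime. A usage sketch (column names taken from the sample response):
object_df.printSchema()
object_df.select("Key", "LastModified", "Size", "StorageClass").show(truncate=False)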

Error creating universal sentence encoder embeddings using beam & tf transform

I have a simple Beam pipeline that takes some text and gets embeddings using the universal sentence encoder with tf.Transform. Very similar to the demo made using TF 1.
import tensorflow as tf
import apache_beam as beam
import tensorflow_transform.beam as tft_beam
import tensorflow_transform.coders as tft_coders
from apache_beam.options.pipeline_options import PipelineOptions
import tempfile
model = None

def embed_text(text):
    import tensorflow_hub as hub
    global model
    if model is None:
        model = hub.load(
            'https://tfhub.dev/google/universal-sentence-encoder/4')
    embedding = model(text)
    return embedding

def get_metadata():
    from tensorflow_transform.tf_metadata import dataset_schema
    from tensorflow_transform.tf_metadata import dataset_metadata
    metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({
        'id': dataset_schema.ColumnSchema(
            tf.string, [], dataset_schema.FixedColumnRepresentation()),
        'text': dataset_schema.ColumnSchema(
            tf.string, [], dataset_schema.FixedColumnRepresentation())
    }))
    return metadata

def preprocess_fn(input_features):
    text_integerized = embed_text(input_features['text'])
    output_features = {
        'id': input_features['id'],
        'embedding': text_integerized
    }
    return output_features

def run(pipeline_options, known_args):
    argv = None  # if None, uses sys.argv
    pipeline_options = PipelineOptions(argv)
    pipeline = beam.Pipeline(options=pipeline_options)
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        articles = (
            pipeline
            | beam.Create([
                {'id': '01', 'text': 'To be, or not to be: that is the question: '},
                {'id': '02', 'text': "Whether 'tis nobler in the mind to suffer "},
                {'id': '03', 'text': 'The slings and arrows of outrageous fortune, '},
                {'id': '04', 'text': 'Or to take arms against a sea of troubles, '},
            ]))
        articles_dataset = (articles, get_metadata())
        transformed_dataset, transform_fn = (
            articles_dataset
            | 'Extract embeddings' >> tft_beam.AnalyzeAndTransformDataset(preprocess_fn)
        )
        transformed_data, transformed_metadata = transformed_dataset
        _ = (
            transformed_data | 'Write embeddings to TFRecords' >> beam.io.tfrecordio.WriteToTFRecord(
                file_path_prefix='{0}'.format(known_args.output_dir),
                file_name_suffix='.tfrecords',
                coder=tft_coders.example_proto_coder.ExampleProtoCoder(
                    transformed_metadata.schema),
                num_shards=1
            )
        )
    result = pipeline.run()
    result.wait_until_finished()
python 3.6.8, tf==2.0, tf_transform==0.15, apache-beam[gcp]==0.16 (I tried various compatible combos from https://github.com/tensorflow/transform)
I am getting an error when tf_transform calls the graph analyser:
...
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 462, in process
lambda: self._make_graph_state(saved_model_dir))
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tfx_bsl/beam/shared.py", line 221, in acquire
return _shared_map.acquire(self._key, constructor_fn)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tfx_bsl/beam/shared.py", line 184, in acquire
result = control_block.acquire(constructor_fn)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tfx_bsl/beam/shared.py", line 87, in acquire
result = constructor_fn()
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 462, in <lambda>
lambda: self._make_graph_state(saved_model_dir))
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 438, in _make_graph_state
self._exclude_outputs, self._tf_config)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 357, in __init__
tensor_inputs = graph_tools.get_dependent_inputs(graph, inputs, fetches)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/graph_tools.py", line 686, in get_dependent_inputs
sink_tensors_ready)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/graph_tools.py", line 499, in __init__
table_init_op, graph_analyzer_for_table_init, translate_path_fn)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/graph_tools.py", line 560, in _get_table_init_op_source_info
if table_init_op.type not in _TABLE_INIT_OP_TYPES:
AttributeError: 'Tensor' object has no attribute 'type' [while running 'Extract embeddings/TransformDataset/Transform']
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x14152fbe0>>
Traceback (most recent call last):
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/tracking.py", line 190, in __del__
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3872, in as_default
File "/Users/justingrace/.pyenv/versions/3.6.8/lib/python3.6/contextlib.py", line 159, in helper
TypeError: 'NoneType' object is not callable
It appears the graph analyzer is expecting a list of operations with a type attribute, but it is receiving a tensor. I can't grasp why this error is occurring, other than a bug in the graph analyzer or a compatibility issue with tfx_bsl (there seem to be issues with pyarrow 0.14, so I have downgraded to 0.13).
Output of pip freeze:
absl-py==0.8.1
annoy==1.12.0
apache-beam==2.16.0
appnope==0.1.0
astor==0.8.1
astunparse==1.6.3
attrs==19.1.0
avro-python3==1.9.1
backcall==0.1.0
bleach==3.1.0
cachetools==3.1.1
certifi==2019.11.28
chardet==3.0.4
crcmod==1.7
cymem==1.31.2
cytoolz==0.9.0.1
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.0
docopt==0.6.2
en-core-web-lg==2.0.0
en-coref-lg==3.0.0
en-ner-trained==2.0.0
entrypoints==0.3
fastavro==0.21.24
fasteners==0.15
flashtext==2.7
future==0.18.2
fuzzywuzzy==0.16.0
gast==0.2.2
google-api-core==1.16.0
google-apitools==0.5.28
google-auth==1.11.0
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.17.1
google-cloud-bigtable==1.0.0
google-cloud-core==1.3.0
google-cloud-datastore==1.7.4
google-cloud-pubsub==1.0.2
google-pasta==0.1.8
google-resumable-media==0.4.1
googleapis-common-protos==1.51.0
grpc-google-iam-v1==0.12.3
grpcio==1.24.0
h5py==2.10.0
hdfs==2.5.8
httplib2==0.12.0
idna==2.8
importlib-metadata==1.5.0
ipykernel==5.1.4
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.16.0
Jinja2==2.11.1
jsonpickle==1.2
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.2
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
lxml==4.2.1
Markdown==3.2.1
MarkupSafe==1.1.1
mistune==0.8.4
mock==2.0.0
monotonic==1.5
more-itertools==8.2.0
msgpack==0.6.2
msgpack-numpy==0.4.4
murmurhash==0.28.0
nbconvert==5.6.1
nbformat==5.0.4
networkx==2.1
nltk==3.4.5
notebook==6.0.3
numpy==1.18.1
oauth2client==3.0.0
oauthlib==3.1.0
opt-einsum==3.1.0
packaging==20.1
pandas==0.23.0
pandocfilters==1.4.2
parso==0.6.1
pathlib2==2.3.5
pbr==5.4.4
pexpect==4.8.0
pickleshare==0.7.5
plac==0.9.6
pluggy==0.13.1
preshed==1.0.1
prometheus-client==0.7.1
prompt-toolkit==3.0.3
proto-google-cloud-datastore-v1==0.90.4
protobuf==3.11.3
psutil==5.6.7
ptyprocess==0.6.0
py==1.8.1
pyahocorasick==1.4.0
pyarrow==0.13.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pydot==1.4.1
Pygments==2.5.2
PyHamcrest==1.9.0
pymongo==3.10.1
pyparsing==2.4.6
pyrsistent==0.15.7
pytest==5.3.5
python-dateutil==2.8.0
python-Levenshtein==0.12.0
pytz==2019.3
PyYAML==3.13
pyzmq==18.1.1
qtconsole==4.6.0
regex==2017.4.5
repoze.lru==0.7
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.0
scikit-learn==0.19.1
scipy==1.4.1
Send2Trash==1.5.0
six==1.14.0
spacy==2.0.12
tb-nightly==2.2.0a20200217
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-estimator==2.0.1
tensorflow-hub==0.6.0
tensorflow-metadata==0.15.2
tensorflow-serving-api==2.1.0
tensorflow-transform==0.15.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
textblob==0.15.1
tf-estimator-nightly==2.1.0.dev2020012309
tf-nightly==2.2.0.dev20200217
tfx-bsl==0.15.0
thinc==6.10.3
toolz==0.10.0
tornado==6.0.3
tqdm==4.23.3
traitlets==4.3.3
typing==3.7.4.1
typing-extensions==3.7.4.1
ujson==1.35
Unidecode==1.0.22
urllib3==1.25.8
wcwidth==0.1.8
webencodings==0.5.1
Werkzeug==1.0.0
Whoosh==2.7.4
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==2.2.0
This could be an underlying issue, according to this GitHub post. Try using an updated version of TensorFlow (2.1.0), or maybe even an updated version of your Keras packages.

How to create data frames from rdd of word's list

I have gone through all the answers on Stack Overflow and on the internet, but nothing works. So I have this RDD of a list of words:
tweet_words=['tweet_text',
'RT',
'#ochocinco:',
'I',
'beat',
'them',
'all',
'for',
'10',
'straight',
'hours']
**What I have done till now:**
Df =sqlContext.createDataFrame(tweet_words,["tweet_text"])
and
tweet_words.toDF(['tweet_words'])
**ERROR**:
TypeError: Can not infer schema for type: <class 'str'>
Looking at the above code, you are trying to convert a list to a DataFrame. A good Stack Overflow link on this is: https://stackoverflow.com/a/35009289/1100699.
That said, here's a working version of your code:
from pyspark.sql import Row
# Create RDD
tweet_wordsList = ['tweet_text', 'RT', '#ochocinco:', 'I', 'beat', 'them', 'all', 'for', '10', 'straight', 'hours']
tweet_wordsRDD = sc.parallelize(tweet_wordsList)
# Load each word and create row object
wordRDD = tweet_wordsRDD.map(lambda l: l.split(","))
tweetsRDD = wordRDD.map(lambda t: Row(tweets=t[0]))
# Infer schema (using reflection)
tweetsDF = tweetsRDD.toDF()
# show data
tweetsDF.show()
HTH!
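As an alternative sketch (not part of the original answer), the schema-inference error can also be avoided by wrapping each word in a Row before calling createDataFrame, since a bare list of strings gives Spark nothing to build a schema from:
from pyspark.sql import Row

tweet_words = ['tweet_text', 'RT', '#ochocinco:', 'I', 'beat', 'them',
               'all', 'for', '10', 'straight', 'hours']

# Each string becomes a one-column Row, so the schema can be inferred
tweetsDF = sqlContext.createDataFrame([Row(tweet_text=w) for w in tweet_words])
tweetsDF.show()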

bulbs rexster system error

I am using Rexster 2.4.0 and Bulbs 0.3.14
With Rexster running on localhost, I am trying to get familiar with Bulbs, yet when trying:
>>> from bulbs.rexster import Graph
>>> g = Graph()
Traceback (most recent call last):
File "", line 1, in
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rexster/graph.py", line 54, in init
super(Graph, self).init(config)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/base/graph.py", line 58, in init
self.vertices = self.build_proxy(Vertex)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/base/graph.py", line 124, in build_proxy
return self.factory.build_element_proxy(element_class, index_class)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/factory.py", line 19, in build_element_proxy
primary_index = self.get_index(element_class,index_class,index_name)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/factory.py", line 27, in get_index
index = index_proxy.get_or_create(index_name)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rexster/index.py", line 80, in get_or_create
resp = self.client.get_or_create_vertex_index(index_name, index_params)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rexster/client.py", line 660, in get_or_create_vertex_index
resp = self.gremlin(script, params)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rexster/client.py", line 354, in gremlin
return self.request.post(gremlin_path,params)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rest.py", line 128, in post
return self.request(POST, path, params)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rest.py", line 183, in request
return self.response_class(http_resp, self.config)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rexster/client.py", line 198, in init
self.handle_response(response)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rexster/client.py", line 222, in handle_response
response_handler(http_resp)
File "/Users/lolmac/anaconda/lib/python2.7/site-packages/bulbs/rest.py", line 50, in server_error
raise SystemError(http_resp)
SystemError: ({'status': '500', 'transfer-encoding': 'chunked', 'server': 'grizzly/2.2.16', 'connection': 'close', 'date': 'Mon, 14 Oct 2013 19:43:45 GMT', 'access-control-allow-origin': '*', 'content-type': 'application/json'}, '{"message":"","error":"javax.script.ScriptException: groovy.lang.MissingMethodException: No signature of method: groovy.lang.MissingMethodException.stopTransaction() is applicable for argument types: () values: []","api":{"description":"evaluate an ad-hoc Gremlin script for a graph.","parameters":{"rexster.returnKeys":"an array of element property keys to return (default is to return all element properties)","rexster.showTypes":"displays the properties of the elements with their native data type (default is false)","load":"a list of \'stored procedures\' to execute prior to the \'script\' (if \'script\' is not specified then the last script in this argument will return the values","rexster.offset.end":"end index for a paged set of data to be returned","rexster.offset.start":"start index for a paged set of data to be returned","params":"a map of parameters to bind to the script engine","language":"the gremlin language flavor to use (default to groovy)","script":"the Gremlin script to be evaluated"}},"success":false}')
This is an old post: https://groups.google.com/forum/#!msg/gremlin-users/s7Ag1tjbxLs/kaBOSyed_9kJ, but it seems other folks have encountered the same problem. Still, I was not able to find any documentation that indicates what is wrong or what to change in the default configuration.
Grateful for any docs/discussions or ideas that can provide a hint.
You are getting this error because Bulbs 0.3.14 hadn't been updated for TinkerPop 2.4, but that's fixed now -- I just updated Bulbs-Rexster to TinkerPop 2.5.0-SNAPSHOT and pushed Bulbs 0.3.15 to both GitHub and PyPI. All tests pass. Please let me know if that fixes it for you.

Unregistered task in Celery

tasks.py:
from celery import Celery
from django.http import HttpResponse
from anyjson import serialize
celery = Celery('tasks', broker='amqp://guest#localhost//')
##celery.task
def add(request):
    x = int(request.GET['x'])
    y = int(request.GET['y'])
    result = x + y
    response = {'status': 'success', 'retval': result}
    return HttpResponse(serialize(response), mimetype='application/json')
After starting the worker, I tried the HTTP dispatcher:
from celery.task.http import URL
res = URL('http://localhost/add').get_async(x=10, y=10)
res.state
'PENDING'
and got an error like:
[2013-04-03 06:39:54,791: ERROR/MainProcess] Received unregistered task of type u'celery.task.http.HttpDispatchTask'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
More: http://docs.celeryq.org/en/latest/userguide/tasks.html#names
The full contents of the message body was:
{u'utc': True, u'chord': None, u'args': [u'http://localhost/add', u'GET'], u'retries': 0, u'expires': None, u'task': u'celery.task.http.HttpDispatchTask', u'callbacks': None, u'errbacks': None, u'taskset': None, u'kwargs': {u'y': 10, u'x': 10}, u'eta': None, u'id': u'29f83cc9-ba5a-4008-9d2d-6f7bb24b0cfc'} (288b)
Traceback (most recent call last):
File "/usr/local/gdp/python2.7.2/lib/python2.7/site-packages/celery-3.0.13-py2.7.egg/celery/worker/consumer.py", line 435, in on_task_received
strategies[name](message, body, message.ack_log_error)
KeyError: u'celery.task.http.HttpDispatchTask'
Apologies if I'm stating the obvious, but the task decorator ##celery.task is commented out.
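For illustration, a minimal sketch (not from the original answer) of what tasks.py looks like with the decorator actually applied, so that add is registered with the worker; the broker URL and the simplified signature are assumptions here:
from celery import Celery

# assumed broker URL
celery = Celery('tasks', broker='amqp://guest@localhost//')

@celery.task  # the decorator must not be commented out, or the task is never registered
def add(x, y):
    # a plain Celery task normally takes its arguments directly,
    # rather than unpacking a Django request object
    return x + y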