Apache Beam ReadFromSpanner decoding issue

I'm trying to run the following script in a GCP Dataflow pipeline.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from typing import NamedTuple, Optional
from apache_beam.io.gcp.spanner import *
from past.builtins import unicode
import logging


class ItemRow(NamedTuple):
    item_id: unicode


class LogResults(beam.DoFn):
    """Just log the results"""
    def process(self, element):
        logging.info("row: %s", element)
        yield element


class SpannerToSpannerAndBigQueryPipelineOptions(PipelineOptions):
    """
    Runtime Parameters given during template execution
    path parameter is necessary for execution of pipeline
    """
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument(
            '--SOURCE_SPANNER_PROJECT_ID', type=str, help='Source Spanner project ID',
            default='project_id')
        parser.add_argument(
            '--SOURCE_SPANNER_DATASET_ID', type=str, help='Source Spanner dataset ID',
            default='dataset_id')
        parser.add_argument(
            '--SOURCE_SPANNER_INSTANCE_ID', type=str, help='Source Spanner instance ID',
            default='instance_id')
        parser.add_argument(
            '--SOURCE_QUERY', type=str, help='SQL to run in Source Spanner Instance',
            required=True)


# Setup pipeline
def run():
    beam.coders.registry.register_coder(ItemRow, beam.coders.RowCoder)
    pipeline_options = PipelineOptions()
    p = beam.Pipeline(options=pipeline_options)
    importer_options = pipeline_options.view_as(
        SpannerToSpannerAndBigQueryPipelineOptions)

    rows = (
        p
        | "Read from source Spanner" >> ReadFromSpanner(
            project_id=importer_options.SOURCE_SPANNER_PROJECT_ID,
            instance_id=importer_options.SOURCE_SPANNER_INSTANCE_ID,
            database_id=importer_options.SOURCE_SPANNER_DATASET_ID,
            row_type=ItemRow,
            sql='Select item_id from Items WHERE created_ts BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 SECOND) AND CURRENT_TIMESTAMP()',
            timestamp_bound_mode=TimestampBoundMode.MAX_STALENESS,
            staleness=3,
            time_unit=TimeUnit.HOURS,
        ).with_output_types(ItemRow)
    )

    rows | 'Log results' >> beam.ParDo(LogResults())

    result = p.run()
    result.wait_until_finish()


if __name__ == "__main__":
    run()
However, I've been running into issues decoding the results obtained from Spanner. These are the output logs from my Dataflow job:
"An exception was raised when trying to execute the workitem 6665479626992209510 : Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
op.start()
File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/inmemory.py", line 108, in __iter__
yield self._source.coder.decode(value)
File "/usr/local/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 468, in decode
return self.get_impl().decode(encoded)
File "apache_beam/coders/coder_impl.py", line 226, in apache_beam.coders.coder_impl.StreamCoderImpl.decode
File "apache_beam/coders/coder_impl.py", line 228, in apache_beam.coders.coder_impl.StreamCoderImpl.decode
File "apache_beam/coders/coder_impl.py", line 123, in apache_beam.coders.coder_impl.CoderImpl.decode_from_stream
File "/usr/local/lib/python3.7/site-packages/apache_beam/coders/row_coder.py", line 215, in decode_from_stream
is_null in zip(self.components, nulls)))
File "/usr/local/lib/python3.7/site-packages/apache_beam/coders/row_coder.py", line 215, in <genexpr>
is_null in zip(self.components, nulls)))
File "apache_beam/coders/coder_impl.py", line 259, in apache_beam.coders.coder_impl.CallbackCoderImpl.decode_from_stream
File "apache_beam/coders/coder_impl.py", line 261, in apache_beam.coders.coder_impl.CallbackCoderImpl.decode_from_stream
File "/usr/local/lib/python3.7/site-packages/apache_beam/coders/coders.py", line 414, in decode
return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 9: invalid start byte
"
I'm unsure how to solve this problem. I'm using this example from the 2.27.0 docs as a starting point: https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.io.gcp.spanner.html?highlight=spanner#module-apache_beam.io.gcp.spanner. The issue appears to be in decoding the results obtained from Spanner, and there is little to no documentation on how to specify the schema for the Spanner table or tables I'm trying to query.
There is also an experimental IO module for Spanner which does not use the Java expansion service. Is it recommended to switch to the experimental version?
Thanks

I could not get a pipeline running with the apache_beam.io.gcp.spanner module, so I ended up using the apache_beam.io.gcp.experimental.spannerio module instead.
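For reference, a rough sketch of what the experimental connector version can look like (hedged: parameter names follow the apache_beam.io.gcp.experimental.spannerio docs, the IDs and SQL are placeholders, and rows come back as plain lists of column values, so no RowCoder registration is involved):

import logging
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner

def log_row(row):
    # The experimental connector yields each row as a plain list of column values.
    logging.info("row: %s", row)
    return row

def run():
    p = beam.Pipeline(options=PipelineOptions())
    rows = (
        p
        | "Read from source Spanner" >> ReadFromSpanner(
            project_id='project_id',        # placeholder
            instance_id='instance_id',      # placeholder
            database_id='database_id',      # placeholder
            sql='SELECT item_id FROM Items',
        )
    )
    rows | 'Log results' >> beam.Map(log_row)
    p.run().wait_until_finish()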

Related

Odoo 16: Playing with translations and boom! I can no longer access my custom model data on my server

After successfully adding a pot file to my new i18n folder on my local machine, setting "translate=True" on a couple of fields in my carddecks module, and verifying that on localhost I could access my model data, I decided to update my server.
But when I try to access my model data I get the following error:
LINE 1: ..."write_date", COALESCE("carddecks_card"."cardText"->>'pt_PT'...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
Does anyone know what might be causing this?
The source code for the module can be found at https://github.com/diogocsc/carddecks
Full error:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/odoo/http.py", line 1579, in _serve_db
return service_model.retrying(self._serve_ir_http, self.env)
File "/usr/lib/python3/dist-packages/odoo/service/model.py", line 134, in retrying
result = func()
File "/usr/lib/python3/dist-packages/odoo/http.py", line 1608, in _serve_ir_http
response = self.dispatcher.dispatch(rule.endpoint, args)
File "/usr/lib/python3/dist-packages/odoo/http.py", line 1805, in dispatch
result = self.request.registry['ir.http']._dispatch(endpoint)
File "/usr/lib/python3/dist-packages/odoo/addons/website/models/ir_http.py", line 235, in _dispatch
response = super()._dispatch(endpoint)
File "/usr/lib/python3/dist-packages/odoo/addons/base/models/ir_http.py", line 144, in _dispatch
result = endpoint(**request.params)
File "/usr/lib/python3/dist-packages/odoo/http.py", line 698, in route_wrapper
result = endpoint(self, *args, **params_ok)
File "/usr/lib/python3/dist-packages/odoo/addons/web/controllers/dataset.py", line 42, in call_kw
return self._call_kw(model, method, args, kwargs)
File "/usr/lib/python3/dist-packages/odoo/addons/web/controllers/dataset.py", line 33, in _call_kw
return call_kw(request.env[model], method, args, kwargs)
File "/usr/lib/python3/dist-packages/odoo/api.py", line 457, in call_kw
result = _call_kw_model(method, model, args, kwargs)
File "/usr/lib/python3/dist-packages/odoo/api.py", line 430, in _call_kw_model
result = method(recs, *args, **kwargs)
File "/usr/lib/python3/dist-packages/odoo/addons/web/models/models.py", line 62, in web_search_read
records = self.search_read(domain, fields, offset=offset, limit=limit, order=order)
File "/usr/lib/python3/dist-packages/odoo/models.py", line 4968, in search_read
result = records.read(fields, **read_kwargs)
File "/usr/lib/python3/dist-packages/odoo/models.py", line 2992, in read
self._read(stored_fields)
File "/usr/lib/python3/dist-packages/odoo/models.py", line 3235, in _read
cr.execute(query_str, params + [sub_ids])
File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 315, in execute
res = self._obj.execute(query, params)
psycopg2.errors.UndefinedFunction: operator does not exist: character varying ->> unknown
LINE 1: ..."write_date", COALESCE("carddecks_card"."cardText"->>'pt_PT'...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
The above server error caused the following client error:
OwlError: The following error occurred in onWillStart: "Odoo Server Error"
at wrapError (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1445:77)
at onWillStart (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1451:117)
at useModel (https://www.relationalgames.com/web/assets/505-11285c6/web.assets_backend.min.js:4709:1)
at ListController.setup (https://www.relationalgames.com/web/assets/505-11285c6/web.assets_backend.min.js:4430:645)
at new ComponentNode (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1407:136)
at https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1929:6
at View.slot1 (eval at compile (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1892:370), <anonymous>:15:36)
at callSlot (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1508:25)
at WithSearch.template (eval at compile (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1892:370), <anonymous>:8:12)
at Fiber._render (https://www.relationalgames.com/web/assets/504-41b52e3/web.assets_common.min.js:1336:96)
Caused by: RPC_ERROR: Odoo Server Error
at makeErrorFromResponse (https://www.relationalgames.com/web/assets/505-11285c6/web.assets_backend.min.js:967:163)
at XMLHttpRequest.<anonymous> (https://www.relationalgames.com/web/assets/505-11285c6/web.assets_backend.min.js:974:13)
Found the answer to this. On my localhost I was updating my module, but on my server I was just running docker-compose up, without updating the module in the database.
The solution was to go to Settings -> Apps, search for carddecks, and upgrade the module.

Airflow - email operator sending multiple files issue

I am using Airflow 2.2 and trying to send multiple files with the Airflow EmailOperator. The list of files is generated dynamically, and I use an XCom pull to fetch it from the previous task. For some reason, the EmailOperator's files parameter is not able to read the file list from the XCom value. Kindly advise.
Error details:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
args.func(args, dag=self.dag)
File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
_run_raw_task(args, ti)
File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
error_file=args.error_file,
File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
result = execute_callable(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/email.py", line 88, in execute
conn_id=self.conn_id,
File "/usr/local/lib/python3.7/site-packages/airflow/utils/email.py", line 66, in send_email
**kwargs,
File "/usr/local/lib/python3.7/site-packages/airflow/utils/email.py", line 99, in send_email_smtp
mime_charset=mime_charset,
File "/usr/local/lib/python3.7/site-packages/airflow/utils/email.py", line 157, in build_mime_message
with open(fname, "rb") as file:
FileNotFoundError: [Errno 2] No such file or directory: '['
[2022-09-18, 17:24:22 UTC] {{local_task_job.py:154}} INFO - Task exited with return code 1
[2022-09-18, 17:24:23 UTC] {{local_task_job.py:264}} INFO - 0 downstream tasks scheduled from follow-on schedule check
from airflow import DAG
from airflow.operators.email import EmailOperator
from airflow.operators.python import PythonOperator
import os
from datetime import datetime, timedelta

default_args = {
    "owner": 'TEST',
    "depends_on_past": False,
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 0,
}

with DAG(
    dag_id="test9_email_operator_dag",
    default_args=default_args,
    start_date=datetime(2022, 9, 14),
    end_date=datetime(2022, 9, 15),
    catchup=True,
    max_active_runs=1,
    schedule_interval="0 12 * * *",  # Runs every day @ 8AM EST
) as dag:

    def print_local_folder_files(local_temp_folder):
        print("local folder files => ", os.listdir(local_temp_folder))
        files_list = []
        for file in os.listdir(local_temp_folder):
            files_list.append(local_temp_folder + file)
        print("files_list => ", files_list)
        return files_list

    print_local_folder_files = PythonOperator(
        task_id='print_local_folder_files',
        python_callable=print_local_folder_files,
        op_kwargs={'local_temp_folder': "/usr/local/airflow/dags/temp_dir/"},
        do_xcom_push=True,
        provide_context=True,
        dag=dag)

    send_email = EmailOperator(
        task_id='send_email',
        to='test@gmail.com',
        subject='Test Email op Notification',
        html_content='Test email op notification email. ',
        files="{{ task_instance.xcom_pull(task_ids='print_local_folder_files') }}")

    print_local_folder_files >> send_email
You pushed a list to XCom, but XComs are rendered as strings by default, so what you have there is a string representation of a list. This is why reading it fails on the first character: iterating over a string yields one character at a time, hence the FileNotFoundError for '['.
To solve your issue you should set render_template_as_native_obj=True on the DAG object:
with DAG(
    dag_id="test9_email_operator_dag",
    ...,
    render_template_as_native_obj=True,
) as dag:
This lets the Jinja engine know that you expect templates to render to native Python types, so you will get a list rather than a string.
For more information, check the Airflow docs on this feature.
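Putting it together, a minimal sketch of the relevant pieces with native rendering enabled (hedged: trimmed from the DAG in the question, and the address and task names are the question's placeholders):

from datetime import datetime
from airflow import DAG
from airflow.operators.email import EmailOperator

with DAG(
    dag_id="test9_email_operator_dag",
    start_date=datetime(2022, 9, 14),
    schedule_interval="0 12 * * *",
    render_template_as_native_obj=True,  # templates now render to native Python objects
) as dag:
    send_email = EmailOperator(
        task_id='send_email',
        to='test@gmail.com',
        subject='Test Email op Notification',
        html_content='Test email op notification email. ',
        # with native rendering this becomes a real list of paths, not the string "['...']"
        files="{{ task_instance.xcom_pull(task_ids='print_local_folder_files') }}",
    )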

Chaining an iterable task result to a group, then chaining the results of this group to a chord

Related: How to chain a Celery task that returns a list into a group? and Chain a celery task's results into a distributed group. They lack an MWE, do not (to my understanding) fully cover my needs, and may be outdated.
I need to dynamically pass the output of a task to a parallel group of tasks, then pass the results of this group to a final task. I would like to make this a single "meta task".
Here's my attempt so far:
# chaintasks.py
import random
from typing import List, Iterable, Callable

from celery import Celery, signature, group, chord

app = Celery(
    "chaintasks",
    backend="redis://localhost:6379",
    broker="pyamqp://guest@localhost//",
)
app.conf.update(task_track_started=True, result_persistent=True)


@app.task
def select_items(items: List[str]):
    print("In select_items, received", items)
    selection = tuple(set(random.choices(items, k=random.randint(1, len(items)))))
    print("In select_items, sending", selection)
    return selection


@app.task
def process_item(item: str):
    print("In process_item, received", item)
    return item + "_processed"


@app.task
def group_items(items: List[str]):
    print("In group_items, received", items)
    return ":".join(items)


@app.task
def dynamic_map_group(iterable: Iterable, callback: Callable):
    # Map a callback over an iterator and return as a group
    # Credit to https://stackoverflow.com/a/13569873/5902284
    callback = signature(callback)
    return group(callback.clone((arg,)) for arg in iterable).delay()


@app.task
def dynamic_map_chord(iterable: Iterable, callback: Callable):
    callback = signature(callback)
    return chord(callback.clone((arg,)) for arg in iterable).delay()
Now, to try this out, let's spin up a Celery worker and use Docker for RabbitMQ and Redis:
$ docker run -p 5672:5672 rabbitmq
$ docker run -p 6379:6379 redis
$ celery -A chaintasks:app worker -E
Now, let's try to illustrate what I need:
items = ["item1", "item2", "item3", "item4"]
print(select_items.s(items).apply_async().get())
This outputs: ['item3'], great.
meta_task1 = select_items.s(items) | dynamic_map_group.s(process_item.s())
meta_task1_result = meta_task1.apply_async().get()
print(meta_task1_result)
It gets trickier here: the output is [['b916f5d3-4367-46c5-896f-f2d2e2e59a00', None], [[['3f78d31b-e374-484e-b05b-40c7a2038081', None], None], [['bce50d49-466a-43e9-b6ad-1b78b973b50f', None], None]]].
I am not sure what the Nones mean, but I guess the UUIDs represent the subtasks of my meta task. But how do I get their results from here?
I tried:
subtasks_ids = [i[0][0] for i in meta_task1_result[1]]
processed_items = [app.AsyncResult(i).get() for i in subtasks_ids]
but this just hangs. What am I doing wrong here? I'm expecting an output looking like ['processed_item1', 'processed_item3']
What about trying to chord the output of select_items to group_items?
meta_task2 = select_items.s(items) | dynamic_map_chord.s(group_items.s())
print(meta_task2.apply_async().get())
Ouch, this raises an AttributeError that I don't really understand.
Traceback (most recent call last):
File "chaintasks.py", line 70, in <module>
print(meta_task2.apply_async().get())
File "/site-packages/celery/result.py", line 223, in get
return self.backend.wait_for_pending(
File "/site-packages/celery/backends/asynchronous.py", line 201, in wait_for_pending
return result.maybe_throw(callback=callback, propagate=propagate)
File "/site-packages/celery/result.py", line 335, in maybe_throw
self.throw(value, self._to_remote_traceback(tb))
File "/site-packages/celery/result.py", line 328, in throw
self.on_ready.throw(*args, **kwargs)
File "/site-packages/vine/promises.py", line 234, in throw
reraise(type(exc), exc, tb)
File "/site-packages/vine/utils.py", line 30, in reraise
raise value
AttributeError: 'NoneType' object has no attribute 'clone'
Process finished with exit code 1
And finally, what I really am trying to do is something like:
ultimate_meta_task = (
    select_items.s(items)
    | dynamic_map_group.s(process_item.s())
    | dynamic_map_chord.s(group_items.s())
)
print(ultimate_meta_task.apply_async().get())
but this leads to:
Traceback (most recent call last):
File "chaintasks.py", line 77, in <module>
print(ultimate_meta_task.apply_async().get())
File "/site-packages/celery/result.py", line 223, in get
return self.backend.wait_for_pending(
File "/site-packages/celery/backends/asynchronous.py", line 199, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "/site-packages/celery/backends/asynchronous.py", line 265, in _wait_for_pending
for _ in self.drain_events_until(
File "/site-packages/celery/backends/asynchronous.py", line 58, in drain_events_until
on_interval()
File "/site-packages/vine/promises.py", line 160, in __call__
return self.throw()
File "/site-packages/vine/promises.py", line 157, in __call__
retval = fun(*final_args, **final_kwargs)
File "/site-packages/celery/result.py", line 236, in _maybe_reraise_parent_error
node.maybe_throw()
File "/site-packages/celery/result.py", line 335, in maybe_throw
self.throw(value, self._to_remote_traceback(tb))
File "/site-packages/celery/result.py", line 328, in throw
self.on_ready.throw(*args, **kwargs)
File "/site-packages/vine/promises.py", line 234, in throw
reraise(type(exc), exc, tb)
File "/site-packages/vine/utils.py", line 30, in reraise
raise value
kombu.exceptions.EncodeError: TypeError('Object of type GroupResult is not JSON serializable')
Process finished with exit code 1
What I am trying to achieve here is an output looking like 'processed_item1:processed_item3'. Is this even doable using Celery? Any help is appreciated.
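One direction that might work here (a sketch only, not verified against this exact setup): rather than calling .delay() inside dynamic_map_group/dynamic_map_chord and returning a GroupResult (which is what fails to JSON-serialize), make the mapping task bound and let it swap itself for a chord via Task.replace, so the results flow through the canvas itself. This assumes Celery 4+ where self.replace is available, and builds on the chaintasks.py definitions above.

from celery import signature, chord

# Hypothetical helper, not part of the original chaintasks.py:
# the bound task replaces itself with a chord, so the chain's final value is the
# chord body's return value instead of a nested, non-serializable GroupResult.
@app.task(bind=True)
def map_then_join(self, items, callback, body):
    callback = signature(callback)
    body = signature(body)
    return self.replace(chord((callback.clone((item,)) for item in items), body))

# Hypothetical usage, expected to end with a single joined string:
# meta = select_items.s(items) | map_then_join.s(process_item.s(), group_items.s())
# print(meta.apply_async().get())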

Error creating universal sentence encoder embeddings using beam & tf transform

I have a simple Beam pipeline that takes some text and gets embeddings using the Universal Sentence Encoder with tf.Transform, very similar to the demo built with TF 1.
import tensorflow as tf
import apache_beam as beam
import tensorflow_transform.beam as tft_beam
import tensorflow_transform.coders as tft_coders
from apache_beam.options.pipeline_options import PipelineOptions
import tempfile

model = None


def embed_text(text):
    import tensorflow_hub as hub
    global model
    if model is None:
        model = hub.load(
            'https://tfhub.dev/google/universal-sentence-encoder/4')
    embedding = model(text)
    return embedding


def get_metadata():
    from tensorflow_transform.tf_metadata import dataset_schema
    from tensorflow_transform.tf_metadata import dataset_metadata

    metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({
        'id': dataset_schema.ColumnSchema(
            tf.string, [], dataset_schema.FixedColumnRepresentation()),
        'text': dataset_schema.ColumnSchema(
            tf.string, [], dataset_schema.FixedColumnRepresentation())
    }))
    return metadata


def preprocess_fn(input_features):
    text_integerized = embed_text(input_features['text'])
    output_features = {
        'id': input_features['id'],
        'embedding': text_integerized
    }
    return output_features


def run(pipeline_options, known_args):
    argv = None  # if None, uses sys.argv
    pipeline_options = PipelineOptions(argv)
    pipeline = beam.Pipeline(options=pipeline_options)

    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        articles = (
            pipeline
            | beam.Create([
                {'id': '01', 'text': 'To be, or not to be: that is the question: '},
                {'id': '02', 'text': "Whether 'tis nobler in the mind to suffer "},
                {'id': '03', 'text': 'The slings and arrows of outrageous fortune, '},
                {'id': '04', 'text': 'Or to take arms against a sea of troubles, '},
            ]))

        articles_dataset = (articles, get_metadata())

        transformed_dataset, transform_fn = (
            articles_dataset
            | 'Extract embeddings' >> tft_beam.AnalyzeAndTransformDataset(preprocess_fn)
        )

        transformed_data, transformed_metadata = transformed_dataset

        _ = (
            transformed_data | 'Write embeddings to TFRecords' >> beam.io.tfrecordio.WriteToTFRecord(
                file_path_prefix='{0}'.format(known_args.output_dir),
                file_name_suffix='.tfrecords',
                coder=tft_coders.example_proto_coder.ExampleProtoCoder(
                    transformed_metadata.schema),
                num_shards=1
            )
        )

    result = pipeline.run()
    result.wait_until_finish()
Python 3.6.8, tf==2.0, tf_transform==0.15, apache-beam[gcp]==2.16 (I tried various compatible combos from https://github.com/tensorflow/transform).
I am getting an error when tf_transform calls the graph analyser:
...
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 462, in process
lambda: self._make_graph_state(saved_model_dir))
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tfx_bsl/beam/shared.py", line 221, in acquire
return _shared_map.acquire(self._key, constructor_fn)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tfx_bsl/beam/shared.py", line 184, in acquire
result = control_block.acquire(constructor_fn)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tfx_bsl/beam/shared.py", line 87, in acquire
result = constructor_fn()
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 462, in <lambda>
lambda: self._make_graph_state(saved_model_dir))
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 438, in _make_graph_state
self._exclude_outputs, self._tf_config)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/beam/impl.py", line 357, in __init__
tensor_inputs = graph_tools.get_dependent_inputs(graph, inputs, fetches)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/graph_tools.py", line 686, in get_dependent_inputs
sink_tensors_ready)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/graph_tools.py", line 499, in __init__
table_init_op, graph_analyzer_for_table_init, translate_path_fn)
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_transform/graph_tools.py", line 560, in _get_table_init_op_source_info
if table_init_op.type not in _TABLE_INIT_OP_TYPES:
AttributeError: 'Tensor' object has no attribute 'type' [while running 'Extract embeddings/TransformDataset/Transform']
Exception ignored in: <bound method CapturableResourceDeleter.__del__ of <tensorflow.python.training.tracking.tracking.CapturableResourceDeleter object at 0x14152fbe0>>
Traceback (most recent call last):
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/tracking.py", line 190, in __del__
File "/Users/justingrace/.pyenv/versions/hlx36/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3872, in as_default
File "/Users/justingrace/.pyenv/versions/3.6.8/lib/python3.6/contextlib.py", line 159, in helper
TypeError: 'NoneType' object is not callable
It appears that the graph analyzer is expecting a list of operations with a type attribute but is receiving a tensor. I can't see why this error is occurring, other than a bug in the graph analyzer or a compatibility issue with tfx_bsl (there seem to be issues with pyarrow 0.14, so I have downgraded to 0.13).
Output of pip freeze:
absl-py==0.8.1
annoy==1.12.0
apache-beam==2.16.0
appnope==0.1.0
astor==0.8.1
astunparse==1.6.3
attrs==19.1.0
avro-python3==1.9.1
backcall==0.1.0
bleach==3.1.0
cachetools==3.1.1
certifi==2019.11.28
chardet==3.0.4
crcmod==1.7
cymem==1.31.2
cytoolz==0.9.0.1
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.0
docopt==0.6.2
en-core-web-lg==2.0.0
en-coref-lg==3.0.0
en-ner-trained==2.0.0
entrypoints==0.3
fastavro==0.21.24
fasteners==0.15
flashtext==2.7
future==0.18.2
fuzzywuzzy==0.16.0
gast==0.2.2
google-api-core==1.16.0
google-apitools==0.5.28
google-auth==1.11.0
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.17.1
google-cloud-bigtable==1.0.0
google-cloud-core==1.3.0
google-cloud-datastore==1.7.4
google-cloud-pubsub==1.0.2
google-pasta==0.1.8
google-resumable-media==0.4.1
googleapis-common-protos==1.51.0
grpc-google-iam-v1==0.12.3
grpcio==1.24.0
h5py==2.10.0
hdfs==2.5.8
httplib2==0.12.0
idna==2.8
importlib-metadata==1.5.0
ipykernel==5.1.4
ipython==7.12.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.16.0
Jinja2==2.11.1
jsonpickle==1.2
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.1.0
jupyter-core==4.6.2
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
lxml==4.2.1
Markdown==3.2.1
MarkupSafe==1.1.1
mistune==0.8.4
mock==2.0.0
monotonic==1.5
more-itertools==8.2.0
msgpack==0.6.2
msgpack-numpy==0.4.4
murmurhash==0.28.0
nbconvert==5.6.1
nbformat==5.0.4
networkx==2.1
nltk==3.4.5
notebook==6.0.3
numpy==1.18.1
oauth2client==3.0.0
oauthlib==3.1.0
opt-einsum==3.1.0
packaging==20.1
pandas==0.23.0
pandocfilters==1.4.2
parso==0.6.1
pathlib2==2.3.5
pbr==5.4.4
pexpect==4.8.0
pickleshare==0.7.5
plac==0.9.6
pluggy==0.13.1
preshed==1.0.1
prometheus-client==0.7.1
prompt-toolkit==3.0.3
proto-google-cloud-datastore-v1==0.90.4
protobuf==3.11.3
psutil==5.6.7
ptyprocess==0.6.0
py==1.8.1
pyahocorasick==1.4.0
pyarrow==0.13.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pydot==1.4.1
Pygments==2.5.2
PyHamcrest==1.9.0
pymongo==3.10.1
pyparsing==2.4.6
pyrsistent==0.15.7
pytest==5.3.5
python-dateutil==2.8.0
python-Levenshtein==0.12.0
pytz==2019.3
PyYAML==3.13
pyzmq==18.1.1
qtconsole==4.6.0
regex==2017.4.5
repoze.lru==0.7
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.0
scikit-learn==0.19.1
scipy==1.4.1
Send2Trash==1.5.0
six==1.14.0
spacy==2.0.12
tb-nightly==2.2.0a20200217
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-estimator==2.0.1
tensorflow-hub==0.6.0
tensorflow-metadata==0.15.2
tensorflow-serving-api==2.1.0
tensorflow-transform==0.15.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
textblob==0.15.1
tf-estimator-nightly==2.1.0.dev2020012309
tf-nightly==2.2.0.dev20200217
tfx-bsl==0.15.0
thinc==6.10.3
toolz==0.10.0
tornado==6.0.3
tqdm==4.23.3
traitlets==4.3.3
typing==3.7.4.1
typing-extensions==3.7.4.1
ujson==1.35
Unidecode==1.0.22
urllib3==1.25.8
wcwidth==0.1.8
webencodings==0.5.1
Werkzeug==1.0.0
Whoosh==2.7.4
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==2.2.0
This could be an underlying issue, according to this GitHub post. Try using an updated version of TensorFlow (2.1.0), or maybe even updated versions of your Keras packages.
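If tf.Transform gets bumped along with TensorFlow, the dataset_schema API used in get_metadata() above may also need replacing, since newer releases build metadata from a feature spec. A rough sketch of the equivalent metadata (hedged: assumes schema_utils.schema_from_feature_spec is available in the installed tf.Transform release):

# Sketch only: builds the same {'id', 'text'} string metadata from a feature spec.
import tensorflow as tf
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

def get_metadata():
    return dataset_metadata.DatasetMetadata(
        schema_utils.schema_from_feature_spec({
            'id': tf.io.FixedLenFeature([], tf.string),
            'text': tf.io.FixedLenFeature([], tf.string),
        }))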

Issue with multiline TokenInputTransformer in IPython extension

I am trying to use a TokenInputTransformer within an IPython extension module, but it seems that there is something wrong with the standard implementation of token transformers with multiline input. Consider the following minimal extension:
from IPython.core.inputtransformer import TokenInputTransformer

@TokenInputTransformer.wrap
def test_transformer(tokens):
    return tokens

def load_ipython_extension(ip):
    for s in (ip.input_splitter, ip.input_transformer_manager):
        s.python_line_transforms.extend([test_transformer()])
    print "Test activated"
When I load the extension in IPython 1.1.0, I get an unhandled exception with multiline input:
In [1]: %load_ext test
Test activated
In [2]: abs(
...: 2
...: )
Traceback (most recent call last):
File "/Applications/anaconda/bin/ipython", line 6, in <module>
sys.exit(start_ipython())
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/__init__.py", line 118, in start_ipython
return launch_new_instance(argv=argv, **kwargs)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/config/application.py", line 545, in launch_instance
app.start()
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/terminal/ipapp.py", line 362, in start
self.shell.mainloop()
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py", line 436, in mainloop
self.interact(display_banner=display_banner)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py", line 548, in interact
self.input_splitter.push(line)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/core/inputsplitter.py", line 620, in push
out = self.push_line(line)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/core/inputsplitter.py", line 655, in push_line
line = transformer.push(line)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/core/inputtransformer.py", line 152, in push
return self.output(tokens)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/core/inputtransformer.py", line 157, in output
return untokenize(self.func(tokens)).rstrip('\n')
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/utils/_tokenize_py2.py", line 276, in untokenize
return ut.untokenize(iterable)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/utils/_tokenize_py2.py", line 214, in untokenize
self.add_whitespace(start)
File "/Applications/anaconda/lib/python2.7/site-packages/IPython/utils/_tokenize_py2.py", line 199, in add_whitespace
assert row >= self.prev_row
AssertionError
If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@scipy.org
You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
%config Application.verbose_crash=True
Am I doing something wrong or is it really an IPython bug?
It is really an IPython bug, I think. Specifically, the way we handle tokenize fails when an expression involving brackets (()[]{}) is spread over more than one line. I'm trying to work out what we can do about this.
Kind of a late answer, but I was trying to use this in my own extension and hit the same problem. I solved it by simply removing NL tokens from the list (NL is not the same as the NEWLINE token, which ends a statement); NL tokens only appear inside [], (), {}, so they should be safely removable.
from tokenize import NL
from IPython.core.inputtransformer import TokenInputTransformer

@TokenInputTransformer.wrap
def mat_transformer(tokens):
    tokens = list(filter(lambda t: t.type != NL, tokens))
    return tokens
If you are looking for a full example, I've posted my goofy code here: https://github.com/Quinzel/pymat