How to discover undocumented mitmdump inline script parameters? - mitmproxy

I am trying to parse different elements of the request and response headers with an inline script and mitmdump. Some features are undocumented. I will post the lessons learned in reply to this question.

Why not use the official documentation?
http://mitmproxy.org/doc/scripting/inlinescripts.html
The canonical API documentation is the code, which you can browse locally or in our GitHub repo. You can view the API documentation using pydoc (which is installed with Python by default), like this:
pydoc libmproxy.protocol.http.HTTPRequest
This gives better output.

Using dir() in an inline script shows all the attributes that you can use for parsing.
def response(context, flow):
    print dir(flow)
    print dir(flow.request)
    for cookie in flow.response.headers["Set-Cookie"]:
        print "%s:\t%s" % (flow.request.host, cookie)
Results for dir(flow)
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__getattribute__', '__hash__',
'__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', '_backup', '_stateobject_attributes',
'_stateobject_long_attributes', 'accept_intercept', 'backup', 'client_conn', 'copy', 'error', 'from_state',
'get_state', 'id', 'intercept', 'intercepting', 'kill', 'live', 'load_state', 'match', 'modified', 'replace',
'reply', 'request', 'response', 'revert', 'server_conn', 'type']
Results for dir(flow.request)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__',
'__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_assemble_first_line', '_assemble_head',
'_assemble_headers', '_stateobject_attributes', '_stateobject_long_attributes', 'anticache', 'anticomp',
'assemble', 'constrain_encoding', 'content', 'copy', 'decode', 'encode', 'form_in', 'form_out', 'from_state',
'from_stream', 'get_cookies', 'get_decoded_content', 'get_form_urlencoded', 'get_path_components',
'get_query', 'get_state', 'headers', 'host', 'httpversion', 'is_replay', 'load_state', 'method', 'path',
'port', 'pretty_host', 'pretty_url', 'replace', 'scheme', 'set_form_urlencoded', 'set_path_components',
'set_query', 'size', 'stickyauth', 'stickycookie', 'timestamp_end', 'timestamp_start', 'update_host_header',
'url']
Results for dir(flow.response)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__',
'__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', '_assemble_first_line', '_assemble_head', '_assemble_headers',
'_refresh_cookie', '_stateobject_attributes', '_stateobject_long_attributes', 'assemble', 'code', 'content',
'copy', 'decode', 'encode', 'from_state', 'from_stream', 'get_cookies', 'get_decoded_content', 'get_state',
'headers', 'httpversion', 'is_replay', 'load_state', 'msg', 'refresh', 'replace', 'size', 'stream',
'timestamp_end', 'timestamp_start']
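Putting it together: with the attribute names discovered through dir() (and their details through pydoc), an inline script can pull out whatever request and response fields you need. A minimal sketch using only attributes listed above (method, host, path, code, headers); the header name is just an example and exact attribute behaviour may differ between mitmproxy versions:
def response(context, flow):
    # request-side attributes discovered via dir(flow.request)
    print "%s %s%s" % (flow.request.method, flow.request.host, flow.request.path)
    # response-side attributes discovered via dir(flow.response)
    print "status: %s" % flow.response.code
    # headers behave like a case-insensitive multi-dict; indexing by header
    # name yields the list of values sent for that header
    for value in flow.response.headers["Content-Type"]:
        print "Content-Type:\t%s" % value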

Related

How to get vocabulary from WordEmbeddingsModel in sparknlp

I need to create an embedding matrix from embeddings generated by WordEmbeddingsModel in sparknlp. So far I have this code:
from sparknlp.annotator import *
from sparknlp.common import *
from sparknlp.base import *
from pyspark.ml import Pipeline

# define the sparknlp pipeline
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = WordEmbeddingsModel \
    .pretrained("w2v_cc_300d", "sq") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

embeddingsFinisher = EmbeddingsFinisher() \
    .setInputCols("embeddings") \
    .setOutputCols("finished_embeddings") \
    .setOutputAsVector(True)

pipeline = Pipeline(stages=[document, tokenizer, embeddings, embeddingsFinisher])
model = pipeline.fit(spark_train_df)
In this case the fitted model contains a WordEmbeddingsModel annotator, but this annotator doesn't have a getVocab method to fetch the vocabulary. How can I retrieve the vocabulary, given that the list of attributes and methods available on the model is:
dir(model)
['__abstractmethods__', '__class__', '__class_getitem__', '__delattr__', '__dict__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__orig_bases__',
'__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__',
'__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_copyValues', '_copy_params',
'_defaultParamMap', '_dummy', '_from_java', '_is_protocol', '_paramMap', '_params', '_randomUID',
'_resetUid', '_resolveParam', '_set', '_setDefault', '_shouldOwn', '_testOwnParam', '_to_java',
'_transform', 'clear', 'copy', 'explainParam', 'explainParams', 'extractParamMap', 'getOrDefault',
'getParam', 'hasDefault', 'hasParam', 'isDefined', 'isSet', 'load', 'params', 'read', 'save', 'set',
'stages', 'transform', 'uid', 'write']
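In the spirit of the dir() approach above, one thing you can do is pull the fitted WordEmbeddingsModel stage out of the PipelineModel through its stages attribute (listed in the dir() output) and inspect that stage directly. Whether it exposes any vocabulary accessor depends on your Spark NLP version, so treat this as a discovery sketch rather than a guaranteed getVocab replacement; the stage index assumes the pipeline order shown in the question:
# the fitted stages come back in the order they were passed to Pipeline(...)
embeddings_stage = model.stages[2]

# discover what this annotator actually exposes in the installed version
print(type(embeddings_stage))
print([name for name in dir(embeddings_stage) if not name.startswith('_')])

# explainParams() (also listed in dir(model) above) documents every parameter
print(embeddings_stage.explainParams())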

GENSIM: 'TypeError: doc2bow expects an array of unicode tokens on input, not a single string' when trying to create mapping for dictionary

My text looks as follows:
text=['paris', 'shares', 'concerns', 'ecb', 'language', 'eroding', 'status', 'currency', 'union',
'diluting', 'legal', 'obligation', 'most', 'countries', 'join', 'ultimately', 'however', 'welcomes',
'britain', 'support', 'more', 'integrated', 'eurozone', 'recognises', 'uk', 'euro', 'means',
'obliged', 'choose', 'between', 'euro', 'pound', 'comment', 'article', 'moved', 'debates',
'february', 'language', 'english', 'web']
from gensim.corpora.dictionary import Dictionary
dictionary=Dictionary(text)
The error I'm getting:
TypeError: doc2bow expects an array of unicode tokens on input, not a single string
I've tried to transform my text into a list of words, to no avail. I've also tried to convert it to Unicode, to no avail. I'm no Python expert, just trying to analyse some text. My next step would be to check how often each token appears in the document called text. I'm using the IPython notebook.
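The error comes from the fact that Dictionary expects a corpus, i.e. an iterable of documents where each document is itself a list of tokens; passing one flat list of strings makes gensim treat each string as a document. A minimal sketch of the usual fix, reusing the text list from the question: wrap the single tokenized document in a list, then count token frequencies with doc2bow:
from gensim.corpora.dictionary import Dictionary

# each document must be a list of tokens, so wrap the single document
dictionary = Dictionary([text])

# doc2bow returns (token_id, count) pairs for one document
bow = dictionary.doc2bow(text)
print([(dictionary[token_id], count) for token_id, count in bow])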

Is there some documentation on all methods of PixBuf, especially replace_data? (GTK 3)

I was wondering where I can find the documentation for all of the methods that are implemented in PixBuf (found via dir() under Python 3):
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__gdoc__', '__ge__', '__getattribute__', '__gpointer__', '__grefcount__', '__gsignals__', '__gt__', '__gtype__', '__hash__', '__info__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_force_floating', '_ref', '_ref_sink', '_unref', '_unsupported_data_method', '_unsupported_method', 'add_alpha', 'apply_embedded_orientation', 'bind_property', 'bind_property_full', 'chain', 'compat_control', 'composite', 'composite_color', 'composite_color_simple', 'connect', 'connect_after', 'connect_data', 'connect_object', 'connect_object_after', 'copy', 'copy_area', 'deserialize', 'disconnect', 'disconnect_by_func', 'emit', 'emit_stop_by_name', 'equal', 'fill', 'find_property', 'flip', 'force_floating', 'freeze_notify', 'from_pixdata', 'g_type_instance', 'get_bits_per_sample', 'get_byte_length', 'get_colorspace', 'get_data', 'get_file_info', 'get_file_info_async', 'get_file_info_finish', 'get_formats', 'get_has_alpha', 'get_height', 'get_n_channels', 'get_option', 'get_options', 'get_pixels', 'get_properties', 'get_property', 'get_qdata', 'get_rowstride', 'get_width', 'handler_block', 'handler_block_by_func', 'handler_disconnect', 'handler_is_connected', 'handler_unblock', 'handler_unblock_by_func', 'hash', 'install_properties', 'install_property', 'interface_find_property', 'interface_install_property', 'interface_list_properties', 'is_floating', 'list_properties', 'load', 'load_async', 'load_finish', 'new', 'new_for_string', 'new_from_bytes', 'new_from_data', 'new_from_file', 'new_from_file_at_scale', 'new_from_file_at_size', 'new_from_inline', 'new_from_resource', 'new_from_resource_at_scale', 'new_from_stream', 'new_from_stream_async', 'new_from_stream_at_scale', 'new_from_stream_at_scale_async', 'new_from_stream_finish', 'new_from_xpm_data', 'new_subpixbuf', 'notify', 'notify_by_pspec', 'override_property', 'props', 'qdata', 'read_pixel_bytes', 'read_pixels', 'ref', 'ref_count', 'ref_sink', 'replace_data', 'replace_qdata', 'rotate_simple', 'run_dispose', 'saturate_and_pixelate', 'save_to_bufferv', 'save_to_callbackv', 'save_to_stream_finish', 'savev', 'scale', 'scale_simple', 'serialize', 'set_data', 'set_properties', 'set_property', 'steal_data', 'steal_qdata', 'stop_emission', 'stop_emission_by_name', 'thaw_notify', 'to_string', 'unref', 'watch_closure', 'weak_ref']
I am particularly interested in replace_data, as I need to update the reference to the data in the PixBuf (would that be possible?).
Any idea where I can find that documentation?
The main GdkPixbuf documentation is at https://developer.gnome.org/gdk-pixbuf/stable/
The Python-specific (PyGObject) documentation is at https://lazka.github.io/pgi-docs/#GdkPixbuf-2.0
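If you prefer to stay inside Python, PyGObject builds docstrings from the GObject introspection data, so help() on a class or a single method shows its signature. Note that replace_data (like set_data, connect, etc.) shows up in dir() because Pixbuf inherits it from GObject.Object, where it appears to manage arbitrary keyed user data rather than the pixel buffer itself. A small sketch, assuming PyGObject and GdkPixbuf 2.0 are installed:
import gi
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import GdkPixbuf

# docstrings are generated from the introspection annotations
help(GdkPixbuf.Pixbuf.new_from_file_at_scale)

# list everything that is not a dunder/private name, including methods
# inherited from GObject.Object such as replace_data
print([name for name in dir(GdkPixbuf.Pixbuf) if not name.startswith('_')])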

How to create a DataFrame from an RDD of a list of words

I have gone through all the answers on Stack Overflow and on the internet, but nothing works. I have this RDD of a list of words:
tweet_words=['tweet_text',
'RT',
'#ochocinco:',
'I',
'beat',
'them',
'all',
'for',
'10',
'straight',
'hours']
What I have done till now:
Df = sqlContext.createDataFrame(tweet_words, ["tweet_text"])
and
tweet_words.toDF(['tweet_words'])
ERROR:
TypeError: Can not infer schema for type: <class 'str'>
Looking at the above code, you are trying to convert a list to a DataFrame. A good Stack Overflow answer on this is https://stackoverflow.com/a/35009289/1100699.
Saying this, here's a working version of your code:
from pyspark.sql import Row
# Create RDD
tweet_wordsList = ['tweet_text', 'RT', '#ochocinco:', 'I', 'beat', 'them', 'all', 'for', '10', 'straight', 'hours']
tweet_wordsRDD = sc.parallelize(tweet_wordsList)
# Load each word and create row object
wordRDD = tweet_wordsRDD.map(lambda l: l.split(","))
tweetsRDD = wordRDD.map(lambda t: Row(tweets=t[0]))
# Infer schema (using reflection)
tweetsDF = tweetsRDD.toDF()
# show data
tweetsDF.show()
HTH!
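For completeness: the original createDataFrame call fails because Spark cannot infer a schema from bare strings; wrapping each word in a one-element tuple (or a Row) is enough for schema inference. A shorter sketch under the same sqlContext assumption:
# build the DataFrame directly from the list, one tuple per row
tweet_wordsList = ['tweet_text', 'RT', '#ochocinco:', 'I', 'beat', 'them',
                   'all', 'for', '10', 'straight', 'hours']
tweetsDF = sqlContext.createDataFrame([(w,) for w in tweet_wordsList], ["tweet_text"])
tweetsDF.show()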

Devise: a few users cannot reset passwords and get locked out

'devise', '3.4.1'
'rails', '4.1.8'
ruby '2.1.2'
Reset password works for most users but not for others; I have noticed that for these users Devise fails to save the reset_password_sent_at time.
Point me in the right direction if you can...I will add code if needed, just ask.
#devise.rb
config.reset_password_within = 36.hours

#routes
devise_for :users,
  controllers: {
    registrations: 'registrations',
    confirmations: "confirmations",
    :omniauth_callbacks => "users/omniauth_callbacks"
  }
resources :users

#user.rb
devise :database_authenticatable, :registerable, :validatable,
       :recoverable, :rememberable, :trackable, :confirmable, :omniauthable,
       :omniauth_providers => [:facebook, :twitter, :google, :linkedin]