NoSQL Injections with Rasa - MongoDB

Security Concern
I have recently been playing around with Rasa, with MongoDB as the backing database.
I was wondering whether one should somehow preprocess the inputs to Rasa in order to prevent NoSQL injections.
I tried adding a custom component as part of the Rasa NLU pipeline, but as soon as a message hits the first element of the pipeline, it seems that the original text has already been saved in Mongo.
domain_file
language: "de"
pipeline:
- name: "nlu_components.length_limiter.LengthLimiter"
- name: "tokenizer_whitespace"
- name: "intent_entity_featurizer_regex"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
length_limiter.py - notice the "process" method
from rasa_nlu.components import Component

MAX_LENGTH = 300


class LengthLimiter(Component):
    """
    This component shortens the input message to MAX_LENGTH chars
    in order to prevent overloading the bot.
    """

    # Name of the component to be used when integrating it in a
    # pipeline. E.g. ``[ComponentA, ComponentB]``
    # will be a proper pipeline definition where ``ComponentA``
    # is the name of the first component of the pipeline.
    name = "LengthLimiter"

    # Defines what attributes the pipeline component will
    # provide when called. The listed attributes
    # should be set by the component on the message object
    # during test and train, e.g.
    # ```message.set("entities", [...])```
    provides = []

    # Which attributes on a message are required by this
    # component. E.g. if requires contains "tokens", then a
    # previous component in the pipeline needs to have "tokens"
    # within the above described `provides` property.
    requires = []

    # Defines the default configuration parameters of a component.
    # These values can be overwritten in the pipeline configuration
    # of the model. The component should choose sensible defaults
    # and should be able to create reasonable results with the defaults.
    defaults = {
        "MAX_LENGTH": 300
    }

    # Defines what language(s) this component can handle.
    # This attribute is designed for the instance method `can_handle_language`.
    # The default value is None, which means it can handle all languages.
    # This is an important feature for backwards compatibility of components.
    language_list = None

    def __init__(self, component_config=None):
        super(LengthLimiter, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        """Train this component.

        This is the component's chance to train itself provided
        with the training data. The component can rely on
        any context attribute to be present, that gets created
        by a call to :meth:`components.Component.pipeline_init`
        of ANY component and
        on any context attributes created by a call to
        :meth:`components.Component.train`
        of components previous to this one."""
        pass

    def process(self, message, **kwargs):
        """Process an incoming message.

        This is the component's chance to process an incoming
        message. The component can rely on
        any context attribute to be present, that gets created
        by a call to :meth:`components.Component.pipeline_init`
        of ANY component and
        on any context attributes created by a call to
        :meth:`components.Component.process`
        of components previous to this one."""
        # truncate the raw text before later components see it
        message.text = message.text[:self.defaults["MAX_LENGTH"]]

    def persist(self, model_dir):
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(cls, model_dir=None, model_metadata=None, cached_component=None,
             **kwargs):
        """Load this component from file."""
        if cached_component:
            return cached_component
        else:
            component_config = model_metadata.for_component(cls.name)
            return cls(component_config)

I played around with the Mongo tracker store and could not inject anything.
However, if you want to be absolutely sure, you'd have to implement your own input channel rather than an NLU component. There you can change the messages before they are processed by Rasa Core, and therefore before they reach the tracker store.
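For illustration, here is a minimal sketch of the kind of sanitization such a custom input channel could apply before handing the text over to Rasa Core; the helper name and the rules are purely illustrative and not part of Rasa's API:

MAX_LENGTH = 300

def sanitize_user_text(text):
    """Hypothetical helper: truncate overly long input and drop tokens that
    start with '$', which MongoDB would treat as operators if the string were
    ever used to build a query document."""
    cleaned = text[:MAX_LENGTH]
    return " ".join(tok for tok in cleaned.split() if not tok.startswith("$"))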


How can I dump changes corresponding to list using ruamel.yaml?

I am using the solution from the related answer to How to auto-dump modified values in nested dictionaries using ruamel.yaml, which uses a RoundTripRepresenter.
If I make changes to a list, ruamel.yaml applies the changes to the local variable, but it does not dump/write them into the file. Would it be possible to achieve that?
Example config file:
live:
- name: Ethereum
  networks:
  - chainid: 1
    explorer: https://api.etherscan.io/api
For example, I changed name to alper and tried to append a new item to the list.
my code:
from pathlib import Path

# Yaml is the auto-dumping config class from the linked answer


class YamlUpdate:
    def __init__(self):
        self.config_file = Path.home() / "alper.yaml"
        self.network_config = Yaml(self.config_file)
        self.change_item()

    def change_item(self):
        for network in self.network_config["live"]:
            if network["name"] == "Ethereum":
                network["name"] = "alper"
                print(network)
                network.append("new_item")


yy = YamlUpdate()
print(yy.config_file.read_text())
The output, where name remains unchanged in the original file:
{'name': 'alper', 'networks': [{'chainid': 1, 'explorer': 'https://api.etherscan.io/api'}]}
live:
- name: Ethereum
  networks:
  - chainid: 1
    explorer: https://api.etherscan.io/api
I think you should look at making a class SubConfigList that behaves like a list but notifies its parent (in the data structure), like in the other answer where SubConfig notifies its parent; a sketch follows below.
You'll also need to make sure SubConfigList is represented as a sequence in the YAML document.
If you are ever going to have a list at the root of your data structure, you'll need a list-like alternative to the dict-like Config. (Or document for the consumers/users of your code that the root always needs to be a mapping.)
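A minimal sketch of what such a SubConfigList could look like, assuming the parent Config class from the linked answer exposes some save/notify hook (called changed() here purely for illustration):

import ruamel.yaml
from ruamel.yaml.representer import RoundTripRepresenter

class SubConfigList(list):
    # a list that tells its parent to re-dump the file on every mutation
    def __init__(self, parent, *args):
        super().__init__(*args)
        self._parent = parent

    def __setitem__(self, idx, value):
        super().__setitem__(idx, value)
        self._parent.changed()   # hypothetical "write me back to disk" hook

    def append(self, value):
        super().append(value)
        self._parent.changed()

# make ruamel.yaml dump SubConfigList like a plain YAML sequence
yaml = ruamel.yaml.YAML()
yaml.representer.add_representer(SubConfigList, RoundTripRepresenter.represent_list)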

Pytest + Appium test framework

I'm very new to automation development and am currently starting to write an Appium + pytest based Android app testing framework.
I managed to run tests on a connected device using this code, which uses unittest:
import unittest

from appium import webdriver


class demo(unittest.TestCase):
    reportDirectory = 'reports'
    reportFormat = 'xml'
    dc = {}
    driver = None
    # testName = 'test_setup_tmotg_demo'

    def setUp(self):
        self.dc['reportDirectory'] = self.reportDirectory
        self.dc['reportFormat'] = self.reportFormat
        # self.dc['testName'] = self.testName
        self.dc['udid'] = 'RF8MA2GW1ZF'
        self.dc['appPackage'] = 'com.tg17.ud.internal'
        self.dc['appActivity'] = 'com.tg17.ud.ui.splash.SplashActivity'
        self.dc['platformName'] = 'android'
        self.dc['noReset'] = 'true'
        self.driver = webdriver.Remote('http://localhost:4723/wd/hub', self.dc)

    # def test_function1():
    #     code
    # def test_function2():
    #     code
    # def test_function3():
    #     code
    # etc...

    def tearDown(self):
        self.driver.quit()


if __name__ == '__main__':
    unittest.main()
As you can see, all the functions are currently within the 'demo' class.
The intention is to create several test cases for each part of the app (for example: registration, main screen, premium subscription, etc.). That could add up to hundreds of test cases eventually.
It seems to me that simply continuing to list them all in this same class would be messy and would give me very limited control. However, I didn't find any other way to arrange my tests while keeping the device connected via Appium.
The question is what would be the right way to organize the project so that I can:
Set up the device with appium server
Run all the test suites in sequential order (registration, main screen, subscription, etc...).
Perform the cleaning... export results, disconnect device, etc.
I hope I described the issue clearly enough. Would be happy to elaborate if needed.
Well, you have a lot of questions here, so it might be good to split them up into separate threads. But first of all, you can learn a lot about how Appium works by checking out the documentation here, and for the unittest framework here.
All Appium cares about is the capabilities file (or variable). So you can either populate it manually or write some helper function to do that for you. Here is a list of what can be used.
You can create as many test classes (or suites) as you want and add them together in any order you wish. This helps break things up into manageable chunks. (See the example below.)
You will have to create some helper methods here as well, since Appium itself will not do much cleaning. You can use the adb command in the shell for managing Android devices.
import unittest
from unittest import TestCase


# Create a base class for common methods
class BaseTest(unittest.TestCase):

    # setUpClass is only run once per class, not before every test
    @classmethod
    def setUpClass(cls) -> None:
        # Init your driver and read the capabilities here
        pass

    @classmethod
    def tearDownClass(cls) -> None:
        # Do cleanup, close the driver, ...
        pass


# Use the BaseTest class from before
# You can then duplicate this class for other suites of tests
class TestLogin(BaseTest):

    @classmethod
    def setUpClass(cls) -> None:
        super(TestLogin, cls).setUpClass()
        # Do things here that are needed only once (like logging in)

    def setUp(self) -> None:
        # This is executed before every test
        pass

    def testOne(self):
        # Write your tests here
        pass

    def testTwo(self):
        # Write your tests here
        pass

    def tearDown(self) -> None:
        # This is executed after every test
        pass


if __name__ == '__main__':
    # Load the tests from the suite class we created
    test_cases = unittest.defaultTestLoader.loadTestsFromTestCase(TestLogin)
    # If you want to add more suites:
    # test_cases.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(TestSomethingElse))
    # Run the actual tests
    unittest.TextTestRunner().run(test_cases)
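Since the question asks specifically about pytest, here is a rough sketch of the same driver setup/teardown expressed as a session-scoped fixture in conftest.py. It assumes the Appium Python client, and the capability values are simply the ones from the question:

# conftest.py
import pytest
from appium import webdriver

CAPS = {
    'platformName': 'android',
    'udid': 'RF8MA2GW1ZF',
    'appPackage': 'com.tg17.ud.internal',
    'appActivity': 'com.tg17.ud.ui.splash.SplashActivity',
    'noReset': 'true',
}

@pytest.fixture(scope='session')
def driver():
    # one driver for the whole test session; tests just declare `driver` as an argument
    drv = webdriver.Remote('http://localhost:4723/wd/hub', CAPS)
    yield drv
    # cleanup after the last test (export results, disconnect device, etc. could go here)
    drv.quit()

Individual test modules (test_registration.py, test_main_screen.py, ...) then simply take driver as a parameter, and pytest handles ordering, setup, and cleanup per suite.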

How to do 2-way data binding using Python+PyGObject's GObject.bind_property function?

The background to this question (and my overall goal) is to structure a Python GTK application in a nice way. I am trying to bind widget properties to model properties using GTK's bidirectional data bindings.
My expectation is that the bidirectional binding should keep two properties in sync. I find instead that changes propagate in one direction only, even though I am using the GObject.BindingFlags.BIDIRECTIONAL flag. I created the following minimal example and the failing test case test_widget_syncs_to_model to illustrate the problem. Note that in a more realistic example, the model object could be an instance of Gtk.Application and the widget object could be an instance of Gtk.Entry.
import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk, GObject
import unittest


class Obj(GObject.Object):
    """A very simple GObject with a `txt` property."""
    name = "default"
    txt = GObject.Property(type=str, default="default")

    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
        self.connect("notify", self.log)

    def log(self, source, parameter_name):
        print(
            f"The '{self.name}' object received a notify event, "
            f"its txt now is '{self.txt}'."
        )


class TestBindings(unittest.TestCase):
    def setUp(self):
        """Sets up a bidirectional binding between a model and a widget."""
        print(f"\n\n{self.id()}")
        self.model = Obj("model")
        self.widget = Obj("widget")
        self.model.bind_property(
            "txt", self.widget, "txt", flags=GObject.BindingFlags.BIDIRECTIONAL
        )

    @unittest.skip("succeeds")
    def test_properties_are_found(self):
        """Verifies that the `txt` properties are correctly set up."""
        for obj in [self.model, self.widget]:
            self.assertIsNotNone(obj.find_property("txt"))

    @unittest.skip("succeeds")
    def test_model_syncs_to_widget(self, data="hello"):
        """Verifies that model changes propagate to the widget."""
        self.model.txt = data
        self.assertEqual(self.widget.txt, data)

    def test_widget_syncs_to_model(self, data="world"):
        """Verifies that widget changes propagate back into the model."""
        self.widget.txt = data
        self.assertEqual(self.widget.txt, data)  # SUCCEEDS
        self.assertEqual(self.model.txt, data)   # FAILS


if __name__ == "__main__":
    unittest.main()
The above program outputs:
ssF
======================================================================
FAIL: test_widget_syncs_to_model (__main__.TestBindings)
Verifies that widget changes propagate back into the model
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jh/.config/JetBrains/PyCharmCE2021.1/scratches/scratch_14.py", line 52, in test_widget_syncs_to_model
self.assertEqual(self.model.txt, data) # FAILS
AssertionError: 'default' != 'world'
- default
+ world
----------------------------------------------------------------------
Ran 3 tests in 0.001s
FAILED (failures=1, skipped=2)
__main__.TestBindings.test_widget_syncs_to_model
The 'widget' object received a notify event, its txt now is 'world'.
Process finished with exit code 1
My specific question is: how can I get the bidirectional data binding to work? I would be glad if someone could fix my example or provide another working example.
In a broader sense, are bidirectional bindings the way to go for syncing UI state and model state in a well-structured Python GTK application? What is the intended and well-supported way to do this? Thanks!
I got an answer over at the GNOME Discourse thread about bidirectional property bindings in Python.
To make it clearer, the following code does not work because the flags are not passed correctly:
# broken, flags are passed incorrectly as keywords argument:
self.model.bind_property("txt", self.widget, "txt", flags=GObject.BindingFlags.BIDIRECTIONAL)
Instead, flags must be passed as follows:
# functioning, flags are passed correctly as a positional argument:
self.model.bind_property("txt", self.widget, "txt", GObject.BindingFlags.BIDIRECTIONAL)
More example code: proper use of bidirectional bindings is demonstrated, for example, in PyGObject's source code in the test_bidirectional_binding test case.

Is there a way to find out which pytest-xdist gateway is running?

I would like to create a separate log file for each subprocess/gateway that is spawned by pytest-xdist. Is there an elegant way of finding out which subprocess/gateway pytest is currently running in? I'm configuring my root logger with a session-scoped fixture located in conftest.py, something like this:
import logging

import pytest


@pytest.fixture(scope='session', autouse=True)
def setup_logging():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler('xdist.log')
    fh.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    fh.setFormatter(formatter)
    logger.addHandler(fh)
It would be great if I could add a prefix to the log file name based on the gateway number, e.g.:
fh = logging.FileHandler('xdist_gateway_%s.log' % gateway_number)
Without this, each gateway will use the same log file and the logs will get messy. I know that I can add a timestamp to the filename, but that doesn't let me quickly distinguish which file belongs to which gateway.
Similar to @Kanguros's answer, but plugging into the pytest fixture paradigm:
You can get the worker id by [accessing] the slaveinput dictionary. Here's a fixture which makes that information available to tests and other fixtures:
@pytest.fixture
def worker_id(request):
    if hasattr(request.config, 'workerinput'):
        return request.config.workerinput['workerid']
    else:
        return 'master'
This is quoted from a comment on the pytest-xdist Issues tracker/discussion (2016).
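To tie that back to the logging setup from the question, something along these lines should give each worker its own log file; this is a sketch that reads the same workerinput/workerid values as the fixture above:

import logging

import pytest


@pytest.fixture(scope='session', autouse=True)
def setup_logging(request):
    # 'gw0', 'gw1', ... under xdist; 'master' when running without -n
    worker = getattr(request.config, 'workerinput', {}).get('workerid', 'master')
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler('xdist_gateway_%s.log' % worker)
    fh.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
    logger.addHandler(fh)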
I found out that you can access the gateway id in the following way:
slaveinput = getattr(session.config, "slaveinput", None)
if slaveinput:
    gatewayid = slaveinput['slaveid']
Of course you need to be in a place where you can access the session.config object.

New & renamed workflows with existing content

I have a site with a custom content type Content, which initially had a single workflow attached, content_workflow. There are several thousand existing instances of Content.
I now have a need to add a second workflow to this type, content_beta_workflow. How can I update all existing content to be part of the new workflow?
On a related note: if I want to rename the initial workflow to content_alpha_workflow, how can I update all existing content to reflect this change?
If you are simply changing from one workflow to the other, follow these steps:
1. Go to Site Setup > Types
2. Select your custom content type from the drop-down menu; the page will update to display the current workflow
3. Select your new workflow from the dropdown; a map will be generated showing each state in the current workflow
4. For each state, select the state in your new workflow that most closely matches (or is most appropriate)
When you save, all objects of your custom type will be updated to use the new workflow. For each state in the map from the original workflow, existing content in that state will be put into the state you chose in step 4 above. Security settings will be re-indexed and you are done.
As for renaming the old workflow, you can do so in the portal_workflow tool in the ZMI. But only change the human-facing Title of the workflow. Changing the ID may have side effects for the workflow history of your content.
Edit:
Okay, I see from your comment that you are looking to add a new workflow to a type in addition to the one it already has. Here's a bit of sample code to accomplish that:
# wf_tool here is the portal_workflow tool
my_type = 'Content'  # this is your content portal_type name
my_wf = 'content_workflow_beta'
wf_chain = list(wf_tool.getChainForPortalType(my_type))
if my_wf not in wf_chain:
    wf_chain.append(my_wf)
wf_tool.setChainForPortalTypes([my_type], wf_chain)
You can add this code in an upgrade step for the package that defines your content type and workflows. Add a call to updateRoleMappings on the workflow tool and you'll be set to use the new workflow through the standard Plone UI in addition to your original workflow.
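For completeness, that final call is just a one-liner on the same workflow tool as in the snippet above:

# re-apply each workflow's security settings to existing content after changing the chain
wf_tool.updateRoleMappings()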
As you've already found, you can also manually update the workflow history of all objects to rename a workflow ID, but that's a pretty invasive step.
As workflow_history is a dict property on each content item, it was a case of adding or updating suitable items as required. First, I copied the GenericSetup for content_workflow to content_alpha_workflow. Next, I created content_beta_workflow and added it to the profile. Then I wrote the following upgrade step:
import logging

from DateTime import DateTime


def modify_content_workflow_history(context, logger=None):
    if logger is None:
        logger = logging.getLogger('my.product')

    # import the new workflows
    context.portal_setup.runImportStepFromProfile('profile-my.product:default', 'workflow')

    # set up some defaults for the new records
    _history_defaults = dict(
        action=None,
        actor='admin',
        comments='automatically created by update v2',
        time=DateTime(),
    )
    _alpha_defaults = dict(review_state='alpha_state_1', **_history_defaults)
    _beta_defaults = dict(review_state='beta_state_1', **_history_defaults)

    for parent in context.parents.values():
        for content in parent.content.values():
            # don't acquire the parent's history
            if 'parent_workflow' in content.workflow_history:
                content.workflow_history = {}

            # copy content_workflow to content_alpha_workflow
            if 'content_workflow' in content.workflow_history:
                alpha_defaults = content.workflow_history['content_workflow']
                del content.workflow_history['content_workflow']
            else:
                alpha_defaults = (_alpha_defaults,)  # must be a tuple
            content.workflow_history['ctcc_content_alpha_workflow'] = alpha_defaults

            # create the beta workflow history with a modified actor
            beta_defaults = dict(**_beta_defaults)
            beta_defaults['actor'] = u'%suser' % parent.id
            content.workflow_history['ctcc_content_beta_workflow'] = (beta_defaults,)

    logger.info('Content workflow history updated')