Issue with incorporating BERT in RASA Pipeline - chatbot

Rasa provides an option to include pre-trained language models from Hugging Face in the pipeline. As per the docs:
- name: HFTransformersNLP
  # Name of the language model to use
  model_name: "bert"
  # Pre-Trained weights to be loaded
  model_weights: "bert-base-uncased"
  # An optional path to a specific directory to download and cache the pre-trained model weights.
  # The `default` cache_dir is the same as https://huggingface.co/transformers/serialization.html#cache-directory .
  cache_dir: null
Following this, I configured my pipeline as:
- name: "HFTransformersNLP"
# Name of the language model to use
model_name: "bert"
# Pre-Trained weights to be loaded
model_weights: "bert-base-uncased"
cache_dir: "C:/Project ABC/cache/"
But the problem is that when I run the training step, it keeps failing with:
OSError: Model name 'bert-base-uncased' was not found in tokenizers
model name list (bert-base-uncased, bert-large-uncased,
bert-base-cased, bert-large-cased, bert-base-multilingual-uncased,
bert-base-multilingual-cased, bert-base-chinese,
bert-base-german-cased, bert-large-uncased-whole-word-masking,
bert-large-cased-whole-word-masking,
bert-large-uncased-whole-word-masking-finetuned-squad,
bert-large-cased-whole-word-masking-finetuned-squad,
bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased,
bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1,
bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed
'bert-base-uncased' was a path, a model identifier, or url to a
directory containing vocabulary files named ['vocab.txt'] but couldn't
find such vocabulary files at this path or url.
I did some research, and it looks like there might be an issue downloading the files from the internet, so I manually downloaded config.json and pytorch_model.bin and placed them in C:/Project ABC/cache/, but I still get the same error message. Any idea how to resolve this? Not giving a cache directory fails with the same error too.
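One thing worth checking, going by the error text: the tokenizer is looking for vocab.txt, which isn't among the files you downloaded manually. Below is a minimal sketch (assuming the transformers package that HFTransformersNLP wraps is importable) that pre-downloads both the tokenizer files and the weights into the cache directory the pipeline reads:
from transformers import AutoTokenizer, TFAutoModel

# Hypothetical one-off script: populate the cache that the Rasa
# pipeline's cache_dir points at, so training can load from disk.
cache_dir = "C:/Project ABC/cache/"
AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir=cache_dir)  # fetches vocab.txt etc.
TFAutoModel.from_pretrained("bert-base-uncased", cache_dir=cache_dir)    # fetches config + weights
If this script itself fails to download, the problem is network/proxy-related rather than Rasa-related.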


Is there a build_extensions rule in build.yaml to output all generated Flutter models in a common directory?

I am trying to create a build_extensions rule in build.yaml for the builders freezed and json_serializable to output all generated models into the directory lib/generated/model, irrespective of their original location matching lib/**/*.dart.
What I have tried:
I would expect '^lib/**/{{}}.dart': 'lib/generated/model/{{}}.g.dart' to work, but it doesn't match any Dart files.
I also tried things like '^lib/{{path}}/{{file}}.dart': 'lib/generated/model/{{file}}.g.dart' but {{path}} needs to be matched again in the destination, as per documentation (why even enforce this?).
Example:
Base model location: lib/core/feature/profile/profile.dart
Generated outputs after calling flutter pub run build_runner build --delete-conflicting-outputs:
lib/generated/model/profile.g.dart
lib/generated/model/profile.freezed.dart
My current build.yaml (which generates .g.dart and .freezed.dart files in the child generated directory relative to the original model location) is as follows:
targets:
  $default:
    builders:
      source_gen|combining_builder:
        generate_for:
          - lib/**.dart
        options:
          build_extensions:
            # I want this line to "work":
            # '^lib/**/{{}}.dart': 'lib/generated/model/{{}}.g.dart'
            'lib/{{path}}/{{file}}.dart': 'lib/{{path}}/generated/{{file}}.g.dart'
      freezed:
        options:
          build_extensions:
            # I want this line to "work":
            # '^lib/**/{{}}.dart': 'lib/generated/model/{{}}.freezed.dart'
            'lib/{{path}}/{{file}}.dart': 'lib/{{path}}/generated/{{file}}.freezed.dart'
          field_rename: snake
          explicit_to_json: true
      json_serializable:
        options:
          field_rename: snake
          explicit_to_json: true
For anyone else looking for a similar solution: it turns out it's not possible to place multiple generated outputs into the same directory.
Build extensions need to be written in a way that prevents two different files from emitting the same output file (e.g. if you had lib/a/foo.dart and lib/b/foo.dart, there can't be a single foo.g.dart in lib/generated/model). ** is not something build extensions support, and {{}} should be used to match multiple characters and then be used again in the destination. Apart from that, build extensions just match suffixes.
To avoid the problem of different source directories potentially conflicting in the output directory, you'd have to use something like lib/generated/model/{{path}}/{{file}}.g.dart, as shown below; there's no way to put everything into a single directory, even if you make sure that no generated files would be emitted with the same name in the first place.
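As a sketch, the closest conflict-free variant of the desired mapping for the combining builder looks like this (freezed would get the analogous .freezed.dart mapping):
targets:
  $default:
    builders:
      source_gen|combining_builder:
        options:
          build_extensions:
            # {{path}} must reappear on the right-hand side so that
            # lib/a/foo.dart and lib/b/foo.dart can never both map
            # to the same output file.
            'lib/{{path}}/{{file}}.dart': 'lib/generated/model/{{path}}/{{file}}.g.dart'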

Relative path (pathlib) name works on macOS but gives me an error on Windows

I am currently working on a project that uses the pathlib library so that I can work on my Windows desktop when I need to, as well as on my MacBook Pro; essentially, to be able to work across both operating systems. I had not had any issues at all until now. Here is the setup:
I have a pipeline set up to automatically save a .joblib and a whole lot of .png files that will go to a directory called
output_dir = Path('../Trained_Models/Differential_gene_analysis/A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus/Models train on TCGA data and test on Rodriguez data/Oct-XX-20XX')
For example, if I want to save a .joblib file under the name RandomForest_TumorThrombus_104.joblib, I would use the command
joblib.dump(model ,output_dir / 'RandomForest_TumorThrombus_104.joblib')
On my MacBook Pro, I have no issues when this is run, but on Windows it gives me the following error
FileNotFoundError: [Errno 2] No such file or directory: '..\\Trained_Models\\Differential_gene_analysis\\A Kidney Cancer Transcriptome Molecular Signature Identifies Tumors with Tumor Thrombus\\Models train on TCGA data and test on Rodriguez data\\Oct-17-2022\\RandomForest_TumorThrombus_104.joblib'
I have tried using the .resolve() method to get the absolute path, but it still gives me the same error. I have also experimented to see what is going on, such as using os.path.exists(). When using os.path.exists() I get True for the following command:
os.path.exists(output_dir)
So it does indeed recognize that the directory exists. The next thing I tried was renaming the file to something like dddddd.joblib, and that worked. But I found that only a few names would let me save the file. During debugging I found that the most recent traceback frame occurs here:
with open(filename, 'wb') as f:
I was wondering if anyone here has any idea what is going on and how I can fix this issue? Please and thank you.
The solution was to enable long paths on Windows. The full output path here is longer than the legacy 260-character MAX_PATH limit, which is why a short name like dddddd.joblib happened to fit while longer names failed.
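If flipping the Windows registry setting isn't an option, a hedged workaround in code is to hand Windows an extended-length path. The helper below (win_long is a hypothetical name) prefixes the resolved path with \\?\, which tells the Win32 API to skip the MAX_PATH check:
import os
from pathlib import Path

def win_long(path):
    # \\?\ only works with absolute paths, so resolve first.
    resolved = Path(path).resolve()
    if os.name == "nt" and not str(resolved).startswith("\\\\?\\"):
        return Path("\\\\?\\" + str(resolved))
    return resolved

# joblib accepts path-like objects, so the dump would become e.g.:
# joblib.dump(model, win_long(output_dir) / 'RandomForest_TumorThrombus_104.joblib')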

"failed to load any lstm-specific dictionaries for lang " tesseract 4.1

I tried to train Tesseract 4.1 using the OCRD project, but after training completed I copied over the lang.traineddata and am getting the above error.
The Tesseract wiki page is very confusing to understand; it asks to use combine_lang_model after making the lstmf file. So actually I have the lstmf file; I created these files using tif/box pairs.
Please help me with the next step.
Related discussions: Failed to load any lstm-specific dictionaries for lang xxx
Suppose your training folder looks like this:
OCRD/makefile
OCRD/data/foo-ground-truth
You could try the following steps:
1. Find WORDLIST_FILE/NUMBERS_FILE/PUNC_FILE in the makefile, and change them to:
WORDLIST_FILE := data/$(MODEL_NAME).wordlist
NUMBERS_FILE := data/$(MODEL_NAME).numbers
PUNC_FILE := data/$(MODEL_NAME).punc
2. Suppose your base traineddata is eng.traineddata.
2.1 Download the .wordlist/.numbers/.punc files from langdata_lstm.
2.2 Place them in OCRD/data.
2.3 If MODEL_NAME = foo, rename them to: foo.wordlist, foo.numbers, foo.punc.
If you don't have the base traineddata, you could try this too. But if your base traineddata is, say, afr, you should download the files from langdata_lstm/afr.
3. Run make training again (see the command sketch below).
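With the tesstrain makefile, that run typically looks like the following one-liner (a sketch; MODEL_NAME, START_MODEL and TESSDATA are variables the tesstrain makefile understands, and the tessdata path is yours to adjust):
$ make training MODEL_NAME=foo START_MODEL=eng TESSDATA=/path/to/tessdata_best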
The cause of this error:
In OCRD, the default path for the above three files is $(OUTPUT_DIR) = data/$(MODEL_NAME), and all files in this path are automatically generated during the training process.
If the variable START_MODEL is not assigned, the makefile will not generate any related files under this path.
If the variable START_MODEL has been assigned, then foo.lstm-number-dawg, foo.lstm-punc-dawg, foo.lstm-word-dawg and so on will be produced in data/$(MODEL_NAME). But they are not the right ones, so there may be a bug in OCRD.

How can I get "HelloWorld - BitBake Style" working on a newer version of Yocto?

In the book "Embedded Linux Systems with the Yocto Project", Chapter 4 contains a sample called "HelloWorld - BitBake style". I encountered a bunch of problems trying to get the old example working against the "Sumo" release 2.5.
If you're like me, the first error you encountered following the book's instructions was that you copied across bitbake.conf and got:
ERROR: ParseError at /tmp/bbhello/conf/bitbake.conf:749: Could not include required file conf/abi_version.conf
And after copying over abi_version.conf as well, you kept finding more and more cross-connected files that needed to be moved, and then some relative-path errors after that... Is there a better way?
Here's a series of steps which can allow you to bitbake nano based on the book's instructions.
Unless otherwise specified, these samples and instructions are all based on the online copy of the book's code-samples. While convenient for copy-pasting, the online resource is not totally consistent with the printed copy, and contains at least one extra bug.
Initial workspace setup
This guide assumes that you're working with Yocto release 2.5 ("sumo"), installed into /tmp/poky, and that the build environment will go into /tmp/bbhello. If you don't have the Poky tools+libraries already, the easiest way is to clone them with:
$ git clone -b sumo git://git.yoctoproject.org/poky.git /tmp/poky
Then you can initialize the workspace with:
$ source /tmp/poky/oe-init-build-env /tmp/bbhello/
If you start a new terminal window, you'll need to repeat the previous command to get your shell environment set up again, but it should not replace any of the files created inside the workspace the first time.
Wiring up the defaults
The oe-init-build-env script should have just created these files for you:
bbhello/conf/local.conf
bbhello/conf/templateconf.cfg
bbhello/conf/bblayers.conf
Keep these; they supersede some of the book's instructions, meaning that you should not create or keep the files:
bbhello/classes/base.bbclass
bbhello/conf/bitbake.conf
Similarly, do not overwrite bbhello/conf/bblayers.conf with the book's sample. Instead, edit it to add a single line pointing to your own meta-hello folder, ex:
BBLAYERS ?= " \
  ${TOPDIR}/meta-hello \
  /tmp/poky/meta \
  /tmp/poky/meta-poky \
  /tmp/poky/meta-yocto-bsp \
  "
Creating the layer and recipe
Go ahead and create the following files from the book-samples:
meta-hello/conf/layer.conf
meta-hello/recipes-editor/nano/nano.bb
We'll edit these files gradually as we hit errors.
Can't find recipe error
The error:
ERROR: BBFILE_PATTERN_hello not defined
It is caused by the book-website's bbhello/meta-hello/conf/layer.conf being internally inconsistent. It uses the collection-name "hello" but on the next two lines uses _test suffixes. Just change them to _hello to match:
# Set layer search pattern and priority
BBFILE_COLLECTIONS += "hello"
BBFILE_PATTERN_hello := "^${LAYERDIR}/"
BBFILE_PRIORITY_hello = "5"
Interestingly, this error is not present in the printed copy of the book.
No license error
The error:
ERROR: /tmp/bbhello/meta-hello/recipes-editor/nano/nano.bb: This recipe does not have the LICENSE field set (nano)
ERROR: Failed to parse recipe: /tmp/bbhello/meta-hello/recipes-editor/nano/nano.bb
This can be fixed by adding a license setting with one of the values that bitbake recognizes. In this case, add this line to nano.bb:
LICENSE = "GPLv3"
Recipe parse error
ERROR: ExpansionError during parsing /tmp/bbhello/meta-hello/recipes-editor/nano/nano.bb
[...]
bb.data_smart.ExpansionError: Failure expanding variable PV_MAJOR, expression was ${@bb.data.getVar('PV',d,1).split('.')[0]} which triggered exception AttributeError: module 'bb.data' has no attribute 'getVar'
This is fixed by updating the inline Python used in the recipe, because bb.data was deprecated and has since been removed. Instead, call getVar on d directly, ex:
PV_MAJOR = "${#d.getVar('PV',d,1).split('.')[0]}"
PV_MINOR = "${#d.getVar('PV',d,1).split('.')[1]}"
License checksum failure
ERROR: nano-2.2.6-r0 do_populate_lic: QA Issue: nano: Recipe file fetches files and does not have license file information (LIC_FILES_CHKSUM) [license-checksum]
This can be fixed by adding a directive to the recipe telling it what license-info-containing file to grab, and what checksum we expect it to have.
We can follow the way the recipe generates the SRC_URI, and modify it slightly to point at the COPYING file in the same web-directory. Add this line to nano.bb:
LIC_FILES_CHKSUM = "${SITE}/v${PV_MAJOR}.${PV_MINOR}/COPYING;md5=f27defe1e96c2e1ecd4e0c9be8967949"
The MD5 checksum in this case came from manually downloading and inspecting the matching file.
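If you want to verify that checksum yourself, here is a quick sketch (it assumes you've downloaded the matching COPYING file into the current directory):
import hashlib

with open("COPYING", "rb") as f:
    print(hashlib.md5(f.read()).hexdigest())  # expect f27defe1e96c2e1ecd4e0c9be8967949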
Done!
Now bitbake nano ought to work, and when it completes you should see that it has built nano:
/tmp/bbhello $ find ./tmp/deploy/ -name "*nano*.rpm*"
./tmp/deploy/rpm/i586/nano-dbg-2.2.6-r0.i586.rpm
./tmp/deploy/rpm/i586/nano-dev-2.2.6-r0.i586.rpm
I recently worked on that hands-on hello-world project. As far as I can tell, the source code in the book contains some bugs. Below is a list of suggested fixes:
Inheriting native class
In fact, when you build with the bitbake you got from Poky, it builds only for the target, unless you state in your recipe that you are building for the host machine (native). You can do the latter by adding this line at the end of your recipe:
inherit native
Adding license information
It is worth mentioning that the variable LICENSE must be set in any recipe, otherwise bitbake raises an error. In our case, we are trying to build version 2.2.6 of the nano editor; its license is GPLv3, hence it should be declared as follows:
LICENSE = "GPLv3"
Using os.system calls
As the book states, you cannot dereference metadata directly from within a Python function, which means it is mandatory to access metadata through the d dictionary. Below is a suggestion for the do_unpack Python function; you can reuse its pattern for the next tasks (do_configure, do_compile), as sketched after the listing:
python do_unpack() {
    # Metadata can only be reached through the datastore d.
    workdir = d.getVar("WORKDIR", True)
    dl_dir = d.getVar("DL_DIR", True)
    p = d.getVar("P", True)
    # The fetcher left ${P}.tar.gz in DL_DIR; unpack it into WORKDIR.
    tarball_name = os.path.join(dl_dir, p + ".tar.gz")
    bb.plain("Unpacking tarball")
    os.system("tar -x -C " + workdir + " -f " + tarball_name)
    bb.plain("tarball unpacked successfully")
}
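For instance, a do_compile following the same pattern might look like this (an untested sketch; it assumes the tarball unpacked into ${WORKDIR}/${P} and that nano's classic ./configure && make flow applies):
python do_compile() {
    # Same idea: read metadata through d, then shell out.
    workdir = d.getVar("WORKDIR", True)
    p = d.getVar("P", True)
    src_dir = os.path.join(workdir, p)
    bb.plain("Configuring and compiling nano")
    os.system("cd " + src_dir + " && ./configure && make")
}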
Launching the nano editor
After successfully building your nano editor package, you can find your nano executable in the following directory in case you are using Ubuntu (arch x86_64):
./tmp/work/x86_64-linux/nano/2.2.6-r0/src/nano
Should you have any comments or questions, don't hesitate!

Old feature file path is used even after updating to a new path

I am new to cucumber and I am automating a scenario. Initially I kept my feature files in the path C:\Users\test\eclipse-workspace\Automation\src\test\resources\featureFile. Then I moved the feature files to a different path (C:\Users\test\eclipse-workspace\Automation\src\test\com\test\automation\features). I have updated the same in CucumberOptions as shown below.
@CucumberOptions(features = {
    "src/test/java/com/test/automation/features/CO_Self_Service_Home_Page_Personalizations.feature" }, glue = {
    "src/test/java/com/oracle/peoplesoft/HCM/StepDefinitions" })
But when I try to run the feature, I am getting the below exception stating that the feature file is not found. The path shown in the exception is the old path; I am not sure where it is fetched from, as I have updated the new path in the Cucumber options. Can you please help me understand the cause of this issue?
Exception in thread "main" java.lang.IllegalArgumentException: Not a file or directory: C:\Users\test\eclipse-workspace\Automation\src\test\resources\featureFile\Self_Service_Home_Page_Personalizations.feature
    at cucumber.runtime.io.FileResourceIterator$FileIterator.<init>(FileResourceIterator.java:54)
    at cucumber.runtime.io.FileResourceIterator.<init>(FileResourceIterator.java:20)
    at cucumber.runtime.io.FileResourceIterable.iterator(FileResourceIterable.java:19)
    at cucumber.runtime.model.CucumberFeature.loadFromFeaturePath(CucumberFeature.java:103)
    at cucumber.runtime.model.CucumberFeature.load(CucumberFeature.java:54)
    at cucumber.runtime.model.CucumberFeature.load(CucumberFeature.java:34)
    at cucumber.runtime.RuntimeOptions.cucumberFeatures(RuntimeOptions.java:235)
    at cucumber.runtime.Runtime.run(Runtime.java:110)
    at cucumber.api.cli.Main.run(Main.java:36)
    at cucumber.api.cli.Main.main(Main.java:18)
There are a couple of points you need to take care of, as follows:
As per best practices, create the features directory, which will contain the feature file(s), strictly through your IDE only (not through other software such as Notepad, TextPad, or Sublime Text 3), via New -> File.
Create the feature file, i.e. CO_Self_Service_Home_Page_Personalizations.feature, within the features directory, strictly through your IDE only.
Keep your project structure simple by placing the directory containing the feature file(s) just under the project workspace. For feature files, Cucumber works with directory names, so create the features directory just under your project space Automation (the same hierarchy as src). The location of Self_Service_Home_Page_Personalizations.feature will then be:
C:\Users\test\eclipse-workspace\Automation\features\Self_Service_Home_Page_Personalizations.feature
Again, as your class file containing @CucumberOptions mentions glue = {"StepDefinitions" }, ensure that this class file sits in a matching hierarchy, with the StepDefinitions directory resolvable from it.
So your @CucumberOptions will be as follows:
@CucumberOptions(features = {"features" }, glue = {"StepDefinitions" })
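Putting it together, a minimal JUnit runner class would look something like this (class name is hypothetical; the cucumber.api.* imports match the packages visible in your stack trace):
import org.junit.runner.RunWith;
import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;

// Runner class: the features entry is resolved relative to the
// project root, the glue entry relative to the classpath.
@RunWith(Cucumber.class)
@CucumberOptions(features = { "features" }, glue = { "StepDefinitions" })
public class TestRunner {
}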
Execute your test.
Note: Do not move/copy feature file(s)/directory(ies). Delete the unwanted ones and create new ones through your IDE only.