Why does fastText's test command report only 1 example when my test file contains 135? - fasttext

I'm trying to test the model (model.bin) I've made with fastText on a test file (test.txt). The test file contains 135 labeled examples. I expect fastText to evaluate my model on all of them, but instead it tests only 1 example. Where does this problem come from?
I've already done the same thing with another model and another test file, and it all worked nicely.
This is how I test my model. model_baby.bin is the model, and test.data.txt is my test file.
./fasttext test model_baby.bin test.data.txt
N 1
P#1 1
R#1 0.0164
Number of examples: 1
And here is an extract from my test file:
__label__4.0 I love the fact you can hide your stuff. Only down is that the straps to hold it at midpoint and bottom could be better designed for your car. It's got plenty of room which is great. __label__5.0 This hid our ipad wonderfully. Especially for those quick stops where we all had jump out and use the restroom. It zipped, folded and held all our stuff for the kids in the back seat. __label__3.0
Since my test file has more than one labeled example, I expect the reported "Number of examples:" to be greater than 1, but the actual value is 1.

From the official documentation (https://fasttext.cc/docs/en/supervised-tutorial.html): "Each line of the text file contains a list of labels, followed by the corresponding document. All the labels start by the __label__ prefix, which is how fastText recognize what is a label or what is a word."
I don't quite understand your extract: it appears to put several examples on a single line, in which case fastText would read the whole line as one example. I think it should look like this, with one example per line:
__label__4.0 I love the fact you can hide your stuff. Only down is that the straps to hold it at midpoint and bottom could be better designed for your car. It's got plenty of room which is great.
__label__5.0 This hid our ipad wonderfully. Especially for those quick stops where we all had jump out and use the restroom. It zipped, folded and held all our stuff for the kids in the back seat.
__label__3.0 ...
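If the test file really does have everything on one line, a small script can split it. The following is a minimal sketch, assuming each example carries exactly one label (a multi-label example would be split incorrectly); the function names and file paths are hypothetical, not part of fastText.

```python
import re


def split_examples(text: str) -> list[str]:
    """Split text so that every '__label__' token starts a new example.

    Assumption: each example has exactly one label.
    """
    # Break at whitespace that is immediately followed by '__label__'.
    parts = re.split(r"\s+(?=__label__)", text.strip())
    return [part for part in parts if part]


def rewrite_file(src: str, dst: str) -> int:
    """Write one example per line to dst and return the number of examples."""
    with open(src, encoding="utf-8") as f:
        examples = split_examples(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write("\n".join(examples) + "\n")
    return len(examples)
```

Calling rewrite_file("test.data.txt", "test.fixed.txt") (the output name is made up) and then running ./fasttext test on the new file should report one example per label.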


Is it possible to merge raster bands from several folders using GDAL?

I have two folders containing about 15 000 .tif files. Each file in the first folder is a raster with 5 bands, named AA_"number" meaning it looks like
AA_1.tif,
AA_2.tif,
...,
AA_15000.tif.
Each file in the second folder is a raster with 2 bands named BB_"number" and looks like
BB_1.tif,
BB_2.tif,
...,
BB_15000.tif.
My goal is to combine bands 1-3 from the first file in folder AA with band 1 from the first file in folder BB to create a 4-band raster, and to repeat this for every pair to make 15 000 4-band rasters. After doing some research and testing things out in QGIS, I believe the GDAL Merge tool could solve this task, but I have not been able to make it find the right files in different folders. And as I have 2 x 15 000 files, it is not possible to do this selection manually. Does anyone know a smart solution to this, preferably using GDAL or QGIS?
There are many ways to do this, and it really depends on the exact use case, such as the type of analysis or visualization that needs to be done on the result.
With this many files, it could for example be nice to merge them using a VRT. That avoids creating redundant data, but whether it's actually the best solution depends on the use case. Just stacking them in a new TIFF file would of course also work.
Unfortunately, creating such a VRT directly with gdalbuildvrt / gdal.BuildVRT is not possible with multi-band inputs, because when stacking it only takes the first band of each input.
If your inputs are homogeneous in terms of properties, it should be fairly simple to set up a template where you fill in the file locations and write the VRT to disk. For inputs with more heterogeneous properties it might still be possible, but you'll have to be careful to take it all into account.
Conceptually such a VRT would look something like:
<VRTDataset rasterXSize="..." rasterYSize="...">
  <SRS>...</SRS>
  <GeoTransform>....</GeoTransform>
  <VRTRasterBand dataType="..." band="1">
    <ComplexSource>
      <SourceFilename relativeToVRT="0">//some_drive/aa_folder/aa_file1.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      ...
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="..." band="2">
    <ComplexSource>
      <SourceFilename relativeToVRT="0">//some_drive/aa_folder/aa_file1.tif</SourceFilename>
      <SourceBand>2</SourceBand>
      ...
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="..." band="3">
    <ComplexSource>
      <SourceFilename relativeToVRT="0">//some_drive/aa_folder/aa_file1.tif</SourceFilename>
      <SourceBand>3</SourceBand>
      ...
    </ComplexSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="..." band="4">
    <ComplexSource>
      <SourceFilename relativeToVRT="0">//some_drive/bb_folder/bb_file1.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      ...
    </ComplexSource>
  </VRTRasterBand>
</VRTDataset>
You can first use gdalbuildvrt on some of your files to find all the properties that need to be filled in, like projection, pixel dimensions etc. That will work, but gdalbuildvrt will only be able to take the first band from the inputs. If all bands have homogeneous properties (like nodata value etc), that should be fine as a reference.
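To fill in such a template for every file pair, something like the following could work. It is only a sketch under assumptions: the files are named AA_<number>.tif and BB_<number>.tif as in the question, the template text is one you prepare yourself with all the raster properties filled in, and the {aa_path} / {bb_path} placeholders, the function names and the stack_<number>.vrt output names are made up for illustration. It uses plain string replacement and does not call GDAL.

```python
from pathlib import Path


def pair_paths(aa_dir: str, bb_dir: str, number: int) -> tuple[str, str]:
    """Return the AA and BB file paths that share the same number."""
    aa = (Path(aa_dir) / f"AA_{number}.tif").as_posix()
    bb = (Path(bb_dir) / f"BB_{number}.tif").as_posix()
    return aa, bb


def band_sources(aa_dir: str, bb_dir: str, number: int) -> list[tuple[str, int]]:
    """Return (source file, source band) for output bands 1 to 4."""
    aa, bb = pair_paths(aa_dir, bb_dir, number)
    return [(aa, 1), (aa, 2), (aa, 3), (bb, 1)]


def fill_template(template: str, aa_dir: str, bb_dir: str, number: int) -> str:
    """Replace the hypothetical {aa_path} and {bb_path} placeholders."""
    aa, bb = pair_paths(aa_dir, bb_dir, number)
    return template.replace("{aa_path}", aa).replace("{bb_path}", bb)


def write_vrts(template: str, aa_dir: str, bb_dir: str, out_dir: str, count: int) -> None:
    """Write one VRT per file pair, numbered 1..count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for number in range(1, count + 1):
        text = fill_template(template, aa_dir, bb_dir, number)
        (out / f"stack_{number}.vrt").write_text(text, encoding="utf-8")
```

The resulting .vrt files can be used as they are, or converted to real 4-band TIFFs afterwards if physical files are needed.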

Cimplicity Screen - one object/button that is dependent on hundreds of points

So I have created a huge screen that essentially just shows the robot status for every robot in this factory (individually)… At the very end of the project, they decided they want one object on the screen that blinks if any of the 300 robots fault. I am trying to think of a way to make this work. Maybe a global script of some kind? Problem is, I do not do much scripting in Cimplicity, so any help is appreciated.
All the points that are currently used on this screen (to indicate a fault) have very similar names: the beginning is the same. So I was thinking of a script that could recognize whether a bit is high based on PART of its name. The end changes a little each time, but I am sure there is a way to match only part of a string and ignore the rest. If the end has to be hard-coded, that's fine.
You can use a Python script in Cimplicity.
I will not go into detail on the use of python in Cimplicity, which is well described in the documentation indicated above.
Here's an example of what can be done. Note that I don't have a way to test it, and of course it will only work if your robot names follow the format Robot_1, Robot_2, Robot_3 ... Robot_10 ... Robot_300. It also depends on the name and type of the fault variable; as you didn't define it, I assume it is an integer, with zero indicating no error. If you use something else, you can easily change it.
import cimplicity
(...)
OneRobotWithFault = False
# Check each robot's fault code (Robot_1 ... Robot_300) and stop at the first fault
for i in range(1, 301):
    pointName = f'MyFactory.Robot_{i}.FaultCode'
    robotFaultCode = cimplicity.point_get(pointName)
    if robotFaultCode > 0:
        OneRobotWithFault = True
        break
# Write the result to the point "WeHaveRobotWithFault"
cimplicity.point_set("WeHaveRobotWithFault", OneRobotWithFault)

Mozilla DeepSpeech STT suddenly can't spell

I am using DeepSpeech for speech-to-text. Up to 0.8.1, when I ran transcriptions like:
byte_encoding = subprocess.check_output(
"deepspeech --model deepspeech-0.8.1-models.pbmm --scorer deepspeech-0.8.1-models.scorer --audio audio/2830-3980-0043.wav", shell=True)
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I would get back results that were pretty good. But since 0.8.2, where the scorer argument was removed, my results are just rife with misspellings that make me think I am now getting a character level model where I used to get a word-level model. The errors are in a direction that looks like the model isn't correctly specified somehow.
Now when I call:
byte_encoding = subprocess.check_output(
['deepspeech', '--model', 'deepspeech-0.8.2-models.pbmm', '--audio', myfile])
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I now see errors like
endless -> "endules"
service -> "servic"
legacy -> "legaci"
earning -> "erting"
before -> "befir"
I'm not 100% sure that it is related to removing the scorer from the API, but it is one thing I see changing between releases, and the documentation suggested accuracy improvements in particular.
Short: The scorer matches letter output from the audio to actual words. You shouldn't leave it out.
Long: If you leave out the scorer argument, you won't be able to recognize real-world sentences, because the scorer is what matches the output of the acoustic model to the words and word combinations in the textual language model it contains. Bear in mind that each scorer has specific lm_alpha and lm_beta values that make the search even more accurate.
The 0.8.2 version should be able to take the scorer argument. Otherwise, update to 0.9.0, which has it as well. Maybe your environment has changed in some way; I would start over in a new directory and virtual environment.
Assuming you are using Python, you could add this to your code:
ds.enableExternalScorer(args.scorer)
ds.setScorerAlphaBeta(args.lm_alpha, args.lm_beta)
And check the example script.
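If you prefer to keep calling the command-line client through subprocess as in the question, a sketch along these lines passes the scorer again. It assumes the installed deepspeech client accepts --scorer (as it did in the 0.8.1 call from the question); the helper names are made up, and the file names in the usage note are placeholders.

```python
import subprocess


def build_command(model: str, scorer: str, audio: str) -> list[str]:
    """Build the argument list for the deepspeech CLI, including the scorer."""
    return ["deepspeech", "--model", model, "--scorer", scorer, "--audio", audio]


def transcribe(model: str, scorer: str, audio: str) -> str:
    """Run the CLI and return its output as text (requires deepspeech on PATH)."""
    output = subprocess.check_output(build_command(model, scorer, audio))
    return output.decode("utf-8").rstrip("\n")
```

For example, transcribe("deepspeech-0.8.2-models.pbmm", "path/to/your.scorer", "audio/2830-3980-0043.wav"), with the scorer path replaced by the scorer file that belongs to your model release.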

How do I generate a fixed sized list of facts (duplicates included)?

I'm new to ASP & Clingo and I need to work on a project for school. I thought about some basic music generator.
For now, I need to generate notes (I'm sticking with C major for now). I also want to generate them randomly and I don't know how to do that. How can I make the following code generate a random sequence of notes (duplicates too)?
note(c;d;e;f;g;a;b).
20 { play(X) : note(X)} 30.
#show play/1.
So far, the code won't allow for more than 7 as the upper bound, because it won't show duplicate notes.
Current output: play(b) play(g) play(e) play(c)
Wanted output: play(d) play(g) play(f) ...[20-30 randomly generated notes]
I want to be able to add constraints later (such as this note should not be followed by that note, and so on). I appreciate any tips since I know so little about this.
An answer set is a set: its atoms have no order, and duplicates are not possible.
Instead, you want to guess one note for each beat, so that the beat number tells repeated notes apart:
beat(1..8).
1 { play(N,B) : note(N) } 1 :- beat(B).
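To see why the extra beat argument makes repeats possible, here is the same idea as a plain Python analogy. This is not how clingo solves anything; it only illustrates that a set of (note, beat) pairs can contain the same note many times, while a set of bare notes cannot. The names are made up for the illustration.

```python
import random

NOTES = ["c", "d", "e", "f", "g", "a", "b"]


def guess_melody(beats: int, rng: random.Random) -> set[tuple[str, int]]:
    """Pick exactly one note per beat, like play(N,B) with one N for each B."""
    return {(rng.choice(NOTES), beat) for beat in range(1, beats + 1)}


melody = guess_melody(20, random.Random(0))
# 20 distinct (note, beat) pairs, even though only 7 notes exist.
pairs_count = len(melody)
# Dropping the beat collapses repeats, which is the play/1 situation.
distinct_notes = len({note for note, _ in melody})
```

Here pairs_count is always 20, while distinct_notes can never exceed 7, which matches the upper bound you ran into with play/1.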

Google Sheets - Retrieve "A:File1" to "A:File2" where "Sheetname:File1" = "B:File2" if "C:File2" is between "E" and "F" in "File1"

Sorry for the somewhat long title, but I was told to be as specific as possible. :D
My problem will require some explanation.
So, I have 2 spreadsheets files ("Konverteringstabeller" and "Tee Posen").
In "Tee Posen" I have a sheet named "Scores MIK" (golf scorecard and my name).
In "Konverteringstabeller" I have sheets with conversion tables for multiple golf courses, but if one works, all should.
What I need is to find out what course handicap I would get if my golf handicap is "HCP 26,0" (as shown in File 2 Picture), and in this case that result should be 29 (not visible), but you should get the point.
(example: golf hcp 10 would result in course hcp 11, because 10 is between 9,9-10,7)
While I have been able to find the right result, it has only been in the "Konverteringstabeller" spreadsheet file and that is not the place I need it.
I want to have it written in E6 in the "Scores MIK" sheet in File 2.
I should mention that in "Scores MIK : File 2", cell C2 (Ikast Golf Klub) has data validation so I can easily change between the different courses in the "Konverteringstabeller" file once I add more.
What I have been messing with is something with vlookup and importrange with concatenate in it, but I can't figure out how to do it, so I ask for your help.
And I am by no means skilled in the art of Spreadsheets, so I would very much appreciate a detailed explanation.
Picture - Scores MIK (File 2)
Picture - Ikast Golf Klub (File 1)
Thanks in advance!
// Mikkel Christensen
OK, so a couple of notes. First, to keep the reference to the cell holding the sheet name fixed while still allowing its value to change, add '$' to the reference. Also, if the rows B8-E70 will always be in the same position on the various sheets, you need to add $ to those as well.
Here is an example of the whole formula:
=IFERROR(ARRAYFORMULA(VLOOKUP(E5:E25;IMPORTRANGE("spreadsheet key";"'"&C2&"'!$B$8:$E$70");4;TRUE)))
And lastly, using the "&" operator to concatenate is better, at least in my opinion, because CONCATENATE sometimes does not work as well with array formulas. I also personally find it quicker and easier than having to wrap yet another function around my stuff.
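The TRUE at the end of VLOOKUP asks for an approximate match: on a range sorted by its first column, it returns the row with the largest first-column value that is less than or equal to the search key. The Python sketch below mimics that behaviour so the logic is easier to follow. The table rows are invented for illustration, except that handicap 10 maps to course handicap 11 as in the question; the column layout (lower bound, upper bound, unused, course handicap) is also an assumption.

```python
from bisect import bisect_right


def vlookup_approx(key, rows, col):
    """Mimic VLOOKUP(key; rows; col; TRUE) on rows sorted by their first column.

    Returns None where Sheets would give #N/A (key below the first row).
    """
    keys = [row[0] for row in rows]
    index = bisect_right(keys, key) - 1  # last row whose first value <= key
    if index < 0:
        return None
    return rows[index][col - 1]  # col is 1-based, as in Sheets


# Hypothetical conversion table: (lower bound, upper bound, unused, course hcp)
TABLE = [
    (9.0, 9.8, None, 10),
    (9.9, 10.7, None, 11),
    (10.8, 11.6, None, 12),
]

course_hcp = vlookup_approx(10.0, TABLE, 4)
```

With this table, course_hcp is 11 because 10.0 falls in the row that starts at 9.9, which is why only the lower bound column needs to be the first column of the lookup range.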