Why is the ngram-merge of srilm taking wrong input?

This is my first post here, and sorry for my poor English.
I'm currently working with Kaldi and the SRILM tools for my research, and I ran into a strange problem while using ngram-merge to merge the 3-gram .count files generated by ngram-count.
(ngram-count and ngram-merge are two programs in SRILM.)
The code I used in my shell script is as follows:
ngram-merge \
-write $dir_ngram/corpus_${ng}-gram.count \
$dir_ngram/glsp_poj_tlu.txt_${ng}-gram.count /
$dir_ngram/icorpus_tlu.txt_${ng}-gram.count /
$dir_ngram/khkp_tlu.txt_${ng}-gram.count /
$dir_ngram/nmtl_tlu.txt_${ng}-gram.count /
$dir_ngram/total_tlu.txt_${ng}-gram.count /
$dir_ngram/twbb_tlu.txt_${ng}-gram.count
Here $dir_ngram is simply the directory containing the .count files, and ${ng} is 3 because I'm using trigrams for my language model.
But when I run this part of the script, errors occur that look like this:
/kaldi/egs/simple_20190520/source/ngram/icorpus_tlu.txt_3-gram.count: line 1: unk: No such file or directory
/kaldi/egs/simple_20190520/source/ngram/icorpus_tlu.txt_3-gram.count: line 2: syntax error near unexpected token `<'
/kaldi/egs/simple_20190520/source/ngram/icorpus_tlu.txt_3-gram.count: line 2: `<unk> <unk> 11844000'
/kaldi/egs/simple_20190520/source/ngram/khkp_tlu.txt_3-gram.count: line 1: unk: No such file or directory
/kaldi/egs/simple_20190520/source/ngram/khkp_tlu.txt_3-gram.count: line 2: syntax error near unexpected token `<'
/kaldi/egs/simple_20190520/source/ngram/khkp_tlu.txt_3-gram.count: line 2: `<unk> <unk> 449400'
/kaldi/egs/simple_20190520/source/ngram/nmtl_tlu.txt_3-gram.count: line 1: unk: No such file or directory
/kaldi/egs/simple_20190520/source/ngram/nmtl_tlu.txt_3-gram.count: line 2: syntax error near unexpected token `<'
/kaldi/egs/simple_20190520/source/ngram/nmtl_tlu.txt_3-gram.count: line 2: `<unk> <unk> 13706200'
/kaldi/egs/simple_20190520/source/ngram/total_tlu.txt_3-gram.count: line 1: unk: No such file or directory
/kaldi/egs/simple_20190520/source/ngram/total_tlu.txt_3-gram.count: line 2: syntax error near unexpected token `<'
/kaldi/egs/simple_20190520/source/ngram/total_tlu.txt_3-gram.count: line 2: `<unk> <unk> 11155390'
/kaldi/egs/simple_20190520/source/ngram/twbb_tlu.txt_3-gram.count: line 1: unk: No such file or directory
/kaldi/egs/simple_20190520/source/ngram/twbb_tlu.txt_3-gram.count: line 2: syntax error near unexpected token `<'
/kaldi/egs/simple_20190520/source/ngram/twbb_tlu.txt_3-gram.count: line 2: `<unk> <unk> 7575840'
It seems like ngram-merge took the first line of each file as a file name or directory, since the <unk> symbol is on the first line of every .count file (taking icorpus_tlu.txt_3-gram.count as an example):
<unk> 21952800
<unk> <unk> 11844000
<unk> <unk> <unk> 6161460
<unk> <unk> pó-tshî 660
<unk> <unk> pe̍h-liáu-kang 60
<unk> <unk> m̄-sī 3840
<unk> <unk> lîu-hîng 540
<unk> <unk> ē-sái 12900
<unk> <unk> uî-huat 1740
<unk> <unk> kín-tiunn 780
<unk> <unk> tâi-tiong-tshī 840
<unk> <unk> kuī 120
<unk> <unk> tsú-lâng 660
<unk> <unk> tsi̍t 38520
.
.
.
The unk symbol and the second line of the .count file appear in the first and third lines of each error message. I don't know why this is happening, because I think ngram-merge should only open each file and start reading the n-grams, not treat the content as a directory to open. Another strange thing is that the "take content as directory" problem only occurs for the last five files. The first file seems to have no reading or directory problem at all.
I know I could simply merge the corpora together, since none of them is too big, but I'm curious about this problem. Does anybody know how to solve this?
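A likely cause, judging only from the command as posted: the input lines end with a forward slash (/) where a backslash (\) line continuation was probably intended. If so, the ngram-merge command ends after the first input file (with / passed as an extra argument), and the shell then tries to run each remaining .count file as a script of its own. That would match the messages, which are prefixed with the .count file path rather than with ngram-merge, and a line such as `<unk> 21952800` is parsed by the shell as input redirection from a file named unk. A minimal sketch of the corrected call follows; the default values for dir_ngram and ng are assumptions for illustration, and the guard around ngram-merge is only there so the snippet can be tried on a machine without SRILM.

```shell
#!/bin/sh
# Hypothetical defaults; the post does not say what $dir_ngram points to.
dir_ngram=${dir_ngram:-./ngram}
ng=${ng:-3}

# Every line but the last ends in a backslash, so the shell joins them
# into a single argument list: -write, the output file, and six inputs.
set -- \
    -write "$dir_ngram/corpus_${ng}-gram.count" \
    "$dir_ngram/glsp_poj_tlu.txt_${ng}-gram.count" \
    "$dir_ngram/icorpus_tlu.txt_${ng}-gram.count" \
    "$dir_ngram/khkp_tlu.txt_${ng}-gram.count" \
    "$dir_ngram/nmtl_tlu.txt_${ng}-gram.count" \
    "$dir_ngram/total_tlu.txt_${ng}-gram.count" \
    "$dir_ngram/twbb_tlu.txt_${ng}-gram.count"

nargs=$#
echo "ngram-merge will receive $nargs arguments"

# Only call the real tool when it is installed.
if command -v ngram-merge >/dev/null 2>&1; then
    ngram-merge "$@"
fi
```

Quoting the paths is not required by the original script, but it keeps the argument list intact if the directory name ever contains spaces.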

Related

ERROR: Cannot determine the Quarto source path. This script must be run from the bin or common folder

While attempting to use Quarto with a Jupyter notebook, running quarto render yippee.ipynb --to html gives the following error:
/usr/local/bin/quarto: line 7: dirname: command not found
/usr/local/bin/quarto: line 8: readlink: command not found
/usr/local/bin/quarto: line 9: basename: command not found
/usr/local/bin/quarto: line 12: dirname: command not found
/usr/local/bin/quarto: line 23: basename: command not found
ERROR: Cannot determine the Quarto source path. This script must be run from the bin or common folder.
I tried running quarto on its own, which gave exactly the same error. Thanks.

Trying to install NNTP reader tin and parsdate.y error

I am trying to install tin on a CentOS 7 VM. ./configure runs fine, and then when I run make build, I get...
[user@db3 tin-2.4.5]$ make build
make[1]: Entering directory `/home/user/tin-2.4.5/src'
expect 6 shift/reduce conflicts ...
./parsdate.y
./parsdate.y: line 1: fg: no job control
./parsdate.y: line 2: /bin: Is a directory
./parsdate.y: line 3: active.c: command not found
./parsdate.y: line 4: active.c: command not found
./parsdate.y: line 5: active.c: command not found
./parsdate.y: line 6: active.c: command not found
./parsdate.y: line 7: active.c: command not found
./parsdate.y: line 8: active.c: command not found
./parsdate.y: line 9: syntax error near unexpected token `newline'
./parsdate.y: line 9: ` * Originally written by Steven M. Bellovin <smb@research.att.com>'
make[1]: *** [parsdate.o] Error 2
make[1]: Leaving directory `/home/user/tin-2.4.5/src'
make: [build] Error 2 (ignored)
Can someone tell me what I'm doing wrong?
Thanks.

I ran into the exact same problem, in the same environment.
From what I can tell, at this point you may just need to run:
./configure
make clean
make build
Rerunning ./configure after installing bison and running chmod 777 on parsdate.y is how I fixed the compilation errors, but it seems you have already done the bison and chmod parts.

Dual regression error (multiple files in a text file)

I'm running into some trouble using dual_regression. I'm running the following command and getting the following error:
> macminngh:session_one_and_three sondosayyash$ dual_regression /Users/sondosayyash/Downloads/FIX_sNorm/40_subjects.gica/groupmelodic.ica/melodic_IC.nii.gz 1 -1 5000 dualreg_40subj_output.dr 'cat /Users/sondosayyash/Desktop/Users.txt'
/Users/sondosayyash/abin/fsl/bin/dual_regression: line 126: [: too many arguments
mkdir: dualreg_40subj_output.dr: File exists
mkdir: dualreg_40subj_output.dr/scripts+logs: File exists
creating common mask
/bin/sh: line 1: syntax error near unexpected token `dualreg_40subj_output.dr/scripts+logs/drA'
/bin/sh: line 1: `file (dualreg_40subj_output.dr/scripts+logs/drA) does not exist -T 5 -N drB -l dualreg_40subj_output.dr/scripts+logs dualreg_40subj_output.dr/scripts+logs/drB'
doing the dual regressions
sorting maps and running randomise
/bin/sh: line 1: you: command not found
I don't know where I'm going wrong.
The text file, Users.txt, lists many different file paths to filtered_func data.
I have a feeling there is a problem with the text file, but I'm not entirely sure.
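One thing worth checking, assuming the quotes around cat in the command as posted really are single quotes and not backticks: single quotes pass the text cat /Users/.../Users.txt to the script as one literal argument, while backticks or $(...) run cat and pass each path in the file as its own argument. The sketch below shows the difference with a hypothetical two-line list file and a hypothetical helper, count_args, standing in for the real script.

```shell
#!/bin/sh
# Hypothetical list file standing in for Users.txt.
list=$(mktemp)
printf '%s\n' /data/s01/filtered_func_data /data/s02/filtered_func_data > "$list"

# Hypothetical helper: prints how many arguments it was given.
count_args() { echo "$#"; }

# Single quotes: the whole string is one literal argument; cat never runs.
quoted=$(count_args 'cat "$list"')

# Command substitution: cat runs and each path becomes its own argument.
expanded=$(count_args $(cat "$list"))

echo "single-quoted: $quoted argument(s)"
echo "substituted:   $expanded argument(s)"
rm -f "$list"
```

If that is the cause, replacing the single quotes with backticks or $(cat /Users/sondosayyash/Desktop/Users.txt) should hand the individual paths to dual_regression. Note that unquoted substitution splits on whitespace, so paths containing spaces would still break.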

AEM server not getting started from start script AEM6.3

I have an AEM 6.3 server on Linux (Red Hat). It starts from the command line without any issues using the command below:
java -jar aem-author-4502.jar
But I am not able to start the server from the start script, and I get the error below:
# ./start.bat
./start.bat: line 1: #echo: command not found
./start.bat: line 2: ::: command not found
./start.bat: line 3: $'::\r': command not found
./start.bat: line 4: ::: command not found
./start.bat: line 5: ::: command not found
./start.bat: line 5: $'e.g.,\r': command not found
./start.bat: line 6: $'::\r': command not found
./start.bat: line 7: ::: command not found
: No such file or directoryt.bat
./start.bat: line 8: $'::\r': command not found
./start.bat: line 9: $'setlocal\r': command not found
./start.bat: line 10: $'\r': command not found
./start.bat: line 11: ::*: command not found
./start.bat: line 17: syntax error near unexpected token `('
'/start.bat: line 17: `::* runmode(s)
Also, I am not able to set up AEM as a service (Linux).
What could be the reason for this?
One thing I observed is that I don't have a cq.pid file in my crx-quickstart/conf folder.

start.bat is a batch file, one you would run on Windows.
What you need to run is the equivalent shell script, which you should find right next to the one you're trying to execute. It should be present in the <cq_installation_directory>/bin directory.
As per the official documentation, simply running ./start should do the trick on Linux.

Reset a crontab file in centos?

I have a crontab file that I can no longer edit. When executing
/usr/bin/crontab -e
I get the following output:
/usr/bin/crontab: line 8: 15: command not found
/usr/bin/crontab: line 9: 40: command not found
/usr/bin/crontab: line 10: 45: command not found
/usr/bin/crontab: line 11: 00: command not found
/usr/bin/crontab: line 12: 00: command not found
/usr/bin/crontab: line 13: 00: command not found
/usr/bin/crontab: line 14: 45: command not found
/usr/bin/crontab: line 15: 50: command not found
Is there a way for me to reset this file to a default version? All crontab commands fail with the same error, so I am not able to use
/usr/bin/crontab -r
to remove it. Thanks for your assistance.
