How to print debugging information on one specific OpenAPI model?

According to the OpenAPI Generator docs, here is how one can print the generator's model data:
$ java -jar openapi-generator-cli.jar generate \
-g typescript-fetch \
-o out \
-i api.yaml \
-DdebugModels
which outputs some 39,000 lines, making it difficult to find the model of interest.
How to output debug information on just one model?

Unfortunately, there's no way to generate the debug log for just one model or operation.
As a workaround, you can draft a new spec that contains only the model you want to debug.
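For example, a stripped-down spec with a single schema keeps the -DdebugModels output to a manageable size. A minimal sketch (the Pet schema here is a hypothetical stand-in for whichever model you want to inspect):
cat > debug-api.yaml <<'EOF'
openapi: 3.0.0
info:
  title: Debug spec
  version: 1.0.0
paths: {}
components:
  schemas:
    Pet:
      type: object
      properties:
        id:
          type: integer
        name:
          type: string
EOF
java -jar openapi-generator-cli.jar generate \
-g typescript-fetch \
-o out \
-i debug-api.yaml \
-DdebugModels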

Related

How to copy partial topic data from one cluster to another

I have a use case where I need to copy the data from one topic to another topic in a different cluster but I need to copy only from a given offset. What can I use for the above use case?
I have looked into MirrorMaker, as it copies data from one cluster to another, but I cannot figure out how to specify the starting offset.
Is there any utility I can use?
If, as you say, "this will be a one-time operation", you can use kafkacat with its -o option.
For example (the easiest case):
kafkacat -C -b mybroker_cluster_1:9092 -t mytopic1 -o <offset> | \
kafkacat -P -b mybroker_cluster_2:9092 -t mytopic1
You probably still need to add a few parameters to the consumer:
-X message.max.bytes=<value> -X fetch.message.max.bytes=<value> -X receive.message.max.bytes=<value>
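Putting it together, a one-shot copy could look like the sketch below. The -e flag tells the consuming kafkacat to exit once it reaches the end of the topic rather than wait for new messages; the hostnames, topic, offset, and the 10 MB limit are placeholders to adapt to your setup:
kafkacat -C -b mybroker_cluster_1:9092 -t mytopic1 -o <offset> -e \
-X fetch.message.max.bytes=10485760 | \
kafkacat -P -b mybroker_cluster_2:9092 -t mytopic1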

Merge chromosomes in Plink

I have downloaded the 1000G dataset in VCF format. Using Plink 2.0, I have converted it into binary format.
Now I need to merge chromosomes 1-22.
I am using this script:
${BIN}plink2 \
--bfile /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/data/clean/thousand_genomes/from_1000G_web/chr1_1000Gv3 \
--make-bed \
--merge-list /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/data/clean/thousand_genomes/from_1000G_web/chromosomes_1000Gv3.txt \
--out /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/data/clean/thousand_genomes/from_1000G_web/all_chrs_1000G_v3 \
--noweb
But I get this error:
Error: --merge-list only accepts 1 parameter.
The chromosomes_1000Gv3.txt file lists the files for chromosomes 2-22 in this format:
chr2_1000Gv3.bed chr2_1000Gv3.bim chr2_1000Gv3.fam
chr3_1000Gv3.bed chr3_1000Gv3.bim chr3_1000Gv3.fam
....
Any suggestions as to what might be the issue?
Thanks
--merge-list cannot be used in combination with --bfile. A single plink command can take either --bfile/--bmerge or --merge-list, but not both.
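One possible fix, as a sketch: add chr1_1000Gv3 as a line to chromosomes_1000Gv3.txt, drop --bfile, and let --merge-list drive the whole merge (paths shortened here, and this assumes your plink build accepts --merge-list on its own):
${BIN}plink2 \
--merge-list /path/to/chromosomes_1000Gv3.txt \
--make-bed \
--out /path/to/all_chrs_1000G_v3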

Mahout clustering: How to retrieve the name of a named vector

I want to cluster multiple documents using Mahout. The clustering works fine but I have no idea how to find out which documents are located in each cluster.
I read that you can use the --namedVector option when creating the sparse files, but where does it take the ID from, and how can I retrieve this ID after the clustering is completed?
Right now I am doing the following steps:
I have a directory with a file for each document. The files are in the following format with the ID of the document as filename:
filename: documentID.txt
[TITLE]
[CONTENT]
I create a sparse directory with namedVectors using:
./mahout seqdirectory -i tmp/es-out -o tmp/es-out-seqdir -c UTF-8 -chunk 64 -xm sequential
./mahout seq2sparse -i tmp/es-out-seqdir -o tmp/es-out-sparse --maxDFPercent 85 --namedVector
Then I can cluster the results and create a dump:
./mahout kmeans -i tmp/es-out-sparse/tfidf-vectors -c tmp/es-kmeans-clusters -o tmp/es-kmeans -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -x 10 -k 20 -ow --clustering
./mahout clusterdump -i tmp/es-kmeans/clusters-10-final -o tmp/clusterdump -d tmp/es-out-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 20 --evaluate -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -sp 0 --pointsDir tmp/es-kmeans/clusteredPoints
The dump looks like this:
:VL-190{n=1 c=[1:3.407, 110:6.193, 2007:3.736, about:1.762, according:2.948, account:3.507, acting:6.
Top Terms:
epa => 13.471728324890137
mountaintop => 11.364262580871582
mine => 10.942587852478027
Weight : [props - optional]: Point:
[...]
k-means in Mahout is only a toy.
You can use it for howtos and tutorials, but for real use it is too slow, too limited, and too hard to use. (Also, k-means results are not half as good as people think... most of the time they are dogfood.)
Benchmark other tools, and you'll be surprised big time.
I found a way. You can use the seqdumper to extract the cluster mapping:
./mahout seqdumper -i /tmp/es-kmeans/clusteredPoints/part-m-00000 -o /tmp/cluster-points.txt
Then you can use a regex to extract the mapping of vector IDs to cluster IDs.
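For example, something along these lines (a rough sketch; the exact line layout of the seqdumper output varies between Mahout versions, so adjust the pattern to what you actually see in /tmp/cluster-points.txt):
# prints "<clusterId> <documentId>", assuming lines shaped like
# Key: 190: Value: wt: 1.0 distance: 2.5 vec: /documentID.txt = [...]
sed -n 's/^Key: \([0-9]*\): Value: .*vec: \([^ ]*\) =.*/\1 \2/p' /tmp/cluster-points.txt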

Progress of simulations in headless mode

Is there any way to check the progress of simulations in headless mode as opposed to gui?
Basic Code:
$ ~/netlogo-5.1.0/netlogo-headless.sh \
--model ~/myproject/MyModel.nlogo \
--experiment MyExperiment \
--table ~/myproject/MyNewOutputData.csv
I'd suggest doing tail -f ~/myproject/MyNewOutputData.csv. This will show you a live view of the output file as it is written.
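Since each completed run appends a row to the table output, you can also watch the row count to gauge how many runs have finished (a small sketch; note the count includes the handful of header lines BehaviorSpace writes at the top of the file):
watch -n 10 'wc -l < ~/myproject/MyNewOutputData.csv'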

K-means on Mahout returning non-exclusive clusters

In my data I have users with a list of likes. I've dumped these likes into an individual file per user and would like to cluster them. Everything works except that the output has the same likes in multiple clusters. My understanding is that k-means should be exclusive, so I figure the problem is perhaps with how I am dumping the data. For the time being I have also dumped all of the likes without spaces, until I can write a custom tokenizer. Here's what I'm running (from a Ruby script):
system("#{MAHOUT_CMD} seqdirectory -c UTF-8 -i data/users -o data/kmeans/converted")
system("#{MAHOUT_CMD} seq2sparse -i data/kmeans/converted -o data/kmeans/vectors")
system("#{MAHOUT_CMD} kmeans -i data/kmeans/vectors/tfidf-vectors -c data/kmeans/initial_clusters -o data/kmeans/kmeans_clusters -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 0.1 -k 20 -x 20")
last_cluster_folder = Dir["data/kmeans/kmeans_clusters/*"].last.gsub("data/kmeans/kmeans_clusters/", "")
system("#{MAHOUT_CMD} clusterdump -s data/kmeans/kmeans_clusters/#{last_cluster_folder}/ -d data/kmeans/vectors/dictionary.file-0 -dt sequencefile -o data/kmeans/clusters.txt -n 1000")
The output lists the "top terms" in each cluster, but many of the likes occur in several clusters (though with different weights). Is this the normal output for clusterdump? Do I need to find out which cluster each word belongs to by its weight?
Thanks
Mahout probably is only doing approximate k-means. Plus, there might be objects that have the same distance to more than one cluster.
You should, however, be able to just take the k means and then do a 1-nearest-neighbor classification to get a unique result for each object (this is trivial to parallelize and very fast).
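The 1-nearest-neighbor step itself is straightforward. A toy sketch, not Mahout-specific: it assumes you have exported hypothetical centroids.txt and points.txt files as plain text, each line an id followed by whitespace-separated dense vector values, all with the same dimensionality:
awk '
NR == FNR {                      # first file: centroids
  ids[$1] = 1
  for (i = 2; i <= NF; i++) c[$1, i] = $i
  next
}
{                                # second file: points
  best = ""; bestd = -1
  for (k in ids) {
    d = 0
    for (i = 2; i <= NF; i++) { diff = $i - c[k, i]; d += diff * diff }
    if (bestd < 0 || d < bestd) { bestd = d; best = k }
  }
  print $1, best                 # each point gets exactly one cluster
}
' centroids.txt points.txt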