Merge chromosomes in PLINK

I have downloaded the 1000G dataset in VCF format and converted it to binary format using PLINK 2.0.
Now I need to merge chromosomes 1-22.
I am using this script:
${BIN}plink2 \
--bfile /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/data/clean/thousand_genomes/from_1000G_web/chr1_1000Gv3 \
--make-bed \
--merge-list /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/data/clean/thousand_genomes/from_1000G_web/chromosomes_1000Gv3.txt \
--out /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/data/clean/thousand_genomes/from_1000G_web/all_chrs_1000G_v3 \
--noweb
But I get this error:
Error: --merge-list only accepts 1 parameter.
The chromosomes_1000Gv3.txt file lists the filesets for chromosomes 2-22 in this format:
chr2_1000Gv3.bed chr2_1000Gv3.bim chr2_1000Gv3.fam
chr3_1000Gv3.bed chr3_1000Gv3.bim chr3_1000Gv3.fam
....
Any suggestions on what might be the issue?
Thanks

--merge-list cannot be used in combination with --bfile: a single plink command accepts either --bfile/--bmerge or --merge-list, not both.
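As a rough sketch of how to apply that advice (paths shortened for readability, and assuming a PLINK build that actually implements --merge-list, such as PLINK 1.9; newer plink2 releases document --pmerge-list for this instead): drop --bfile and list all 22 per-chromosome filesets, including chromosome 1, in the merge file.
# build a merge list that covers chromosomes 1-22
for chr in $(seq 1 22); do
    echo "chr${chr}_1000Gv3.bed chr${chr}_1000Gv3.bim chr${chr}_1000Gv3.fam"
done > chromosomes_1000Gv3.txt

# merge without --bfile
${BIN}plink \
--merge-list chromosomes_1000Gv3.txt \
--make-bed \
--out all_chrs_1000G_v3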

Related

Training a new font in Arabic causing issues: "Compute CTC targets failed"

I've been trying to train a new Arabic font using Tesseract. I was able to train it at first with the default training_text file available once you install Tesseract, but I wanted to train it using my own generated data.
So I proceeded as follows:
First, I changed the ara.training_text file and inserted some of the data that I want to train my model on.
Then I generated the .tif files using this command:
!/content/tesstutorial/tesseract/src/training/tesstrain.sh --fonts_dir /content/fonts \
--fontlist 'Traditional Arabic' \
--lang ara \
--linedata_only \
--langdata_dir /content/tesstutorial/langdata \
--tessdata_dir /content/tesstutorial/tesseract/tessdata \
--save_box_tiff \
--maxpages 100 \
--output_dir /content/train
Then I extracted ara.lstm from the tessdata_best trained data for Arabic:
!combine_tessdata -e /content/tesstutorial/tesseract/tessdata/best/ara.traineddata ara.lstm
All good so far, but when I proceed to call lstmtraining, I get a "Compute CTC targets failed" error whenever I run training:
!OMP_THREAD_LIMIT=8 lstmtraining \
--continue_from /content/ara.lstm \
--model_output /content/output/araNewModel \
--old_traineddata /content/tesstutorial/tesseract/tessdata/best/ara.traineddata \
--traineddata /content/train/ara/ara.traineddata \
--train_listfile /content/train/ara.training_files.txt \
--max_iterations 200 \
--debug_level -1
I realized that this was only happening when I added Arabic numerals to my training text. When I pass in a training_text file with no Arabic numerals, it works fine.
Can someone tell me what this error is about and how to solve it?

How to print debugging information on one/specific OpenAPI model?

According to the OpenAPI Generator docs, here is how one can print the generator's model data:
$ java -jar openapi-generator-cli.jar generate \
-g typescript-fetch \
-o out \
-i api.yaml \
-DdebugModels
which outputs 39,000 lines, making it a little difficult to find the model of one's interest.
How can I output debug information for just one model?
Unfortunately, there's no way to generate the debug log for just one model or operation.
As a workaround, you can draft a new spec that contains only the model you want to debug.
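As an illustration of that workaround (everything below is hypothetical; minimal-spec.yaml and the Pet schema stand in for your own spec and model), a pared-down spec keeps the -DdebugModels output to a handful of lines:
# minimal-spec.yaml - reduced spec containing only the model of interest
openapi: 3.0.3
info:
  title: Model debug spec
  version: 1.0.0
paths: {}
components:
  schemas:
    Pet:                 # copy the schema you want to inspect from the full spec
      type: object
      properties:
        id:
          type: integer
          format: int64
        name:
          type: string
Then run the same command against the reduced spec:
$ java -jar openapi-generator-cli.jar generate \
-g typescript-fetch \
-o out \
-i minimal-spec.yaml \
-DdebugModels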

Filtering on labels in Docker API not working (possible bug?)

I'm using the Docker API to get info on containers in JSON format. Basically, I want to filter based on label values, but it is not working (it just returns all containers). This filter query DOES work from the command-line docker client, i.e.:
docker ps -a -f label=owner=fred -f label=speccont=true
However, if I try to do the equivalent filter query using the API, it just returns ALL containers (no filtering done), i.e.:
curl -s --unix-socket /var/run/docker.sock http:/containers/json?all=true&filters={"label":["speccont=true","owner=fred"]}
Note that I do uri escape the filters param when I execute it, but am just showing it here unescaped for readability.
Am I doing something wrong here? Or does this seem to be a bug in the Docker API? Thanks for any help you can give!
The correct syntax for filtering containers by label as of Docker API v1.41 is:
curl -s -G -X GET --unix-socket /var/run/docker.sock http://localhost/containers/json \
--data 'all=true' \
--data-urlencode 'filters={"label":["speccont=true","owner=fred"]}'
Note the automatic URL encoding, as mentioned in this Stack Exchange post.
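For comparison, here is a sketch of the same request with the filters value already percent-encoded (the JSON is the one from the question); quoting the whole URL also keeps the shell from splitting it at &:
curl -s --unix-socket /var/run/docker.sock \
"http://localhost/containers/json?all=true&filters=%7B%22label%22%3A%5B%22speccont%3Dtrue%22%2C%22owner%3Dfred%22%5D%7D"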
I felt there was a bug with the API too, but it turns out there is none. I am on API version 1.30.
I get the desired results with this call:
curl -sS localhost:4243/containers/json?filters=%7B%22ancestor%22%3A%20%5B%222bab985010c3%22%5D%7D
I got the URL-escaped string used above with:
python -c 'import urllib; print urllib.quote("""{"ancestor": ["2bab985010c3"]}""")'
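That one-liner is Python 2 syntax; on Python 3 the equivalent would presumably be the following, which prints the same escaped string:
python3 -c 'import urllib.parse; print(urllib.parse.quote("""{"ancestor": ["2bab985010c3"]}"""))'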

Why am I getting the "Valid table name is required for in, out or format" error with BCP?

I want to import a table while keeping the identity column.
In cmd, I enter:
bcp database.edg.Hello in C:\Users\Tech\Downloads\p.csv -c -E
-S 349024ijfpok.windows.net\MSSQLSERVER -T
Which returns:
A valid table name is required for in, out or format options
Is this an issue with the syntax?
You need brackets around the database identifiers. I usually put double quotes (") around the path as well, just to be sure. Complete command:
bcp [database].[edg].[Hello] in "C:\Users\Tech\Downloads\p.csv" -c -E
-S 349024ijfpok.windows.net\MSSQLSERVER -T

Get mysqldump to dump data suitable for psql input (escaped single quotes)

I'm trying to port a database from MySQL to PostgreSQL. I've rebuilt the schema in Postgres, so all I need to do is get the data across, without recreating the tables.
I could do this with code that iterates over all the records and inserts them one at a time, but I tried that and it's waaayyyy too slow for our database size, so I'm trying to use mysqldump and a pipe into psql instead (once per table, which I may parallelize once I get it working).
I've had to jump through various hoops to get this far, turning various flags on and off to get a dump that is vaguely sane. Again, this only dumps the INSERT statements, since I've already prepared the empty schema to receive the data:
/usr/bin/env \
PGPASSWORD=mypassword \
mysqldump \
-h mysql-server \
-u mysql-username \
--password=mysql-password \
mysql-database-name \
table-name \
--compatible=postgresql \
--compact \
-e -c -t \
--default-character-set=utf8 \
| sed "s/\\\\\\'/\\'\\'/g" \
| psql \
-h postgresql-server \
--username=postgresql-username \
postgresql-database-name
Everything except that ugly sed command is manageable. I'm doing that sed to convert MySQL's approach to escaping single quotes inside strings ('O\'Connor') to PostgreSQL's quoting requirements ('O''Connor'). It works until there are strings like this in the dump: 'String ending with a backslash \\'... and yes, it seems there is some user input in our database with this format, which is perfectly valid but doesn't pass my sed command. I could add a lookbehind to the sed command, but I feel like I'm crawling into a rabbit hole. Is there a way to either:
a) Tell mysqldump to quote single quotes by doubling them up
b) Tell psql to expect backslashes to be interpreted as quoting escapes?
I have another issue with BINARY and bytea differences, but I've worked around that with a base64 encoding/decoding phase.
EDIT | Looks like I can do (b) with set backslash_quote = on; set standard_conforming_strings = off;, though I'm not sure how to inject that into the start of the piped output.
Dump the tables to TSV using mysqldump's --tab option and then import using psql's COPY method.
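A minimal sketch of that route, reusing the placeholder names from the question (note that --tab makes the MySQL server itself write the .txt data file, so the server needs write access to the chosen directory, /tmp/dump here):
# dump one table as tab-separated data, without CREATE TABLE statements
mysqldump \
-h mysql-server \
-u mysql-username \
--password=mysql-password \
--default-character-set=utf8 \
--no-create-info \
--tab=/tmp/dump \
mysql-database-name \
table-name

# PostgreSQL's text COPY format also uses tab separators and \N for NULL,
# so the file can usually be loaded directly via psql's \copy:
psql \
-h postgresql-server \
--username=postgresql-username \
postgresql-database-name \
-c "\copy \"table-name\" FROM '/tmp/dump/table-name.txt'"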
The files psqlrc and ~/.psqlrc may contain SQL commands to be executed when the client starts. You can put these three lines, or any other settings you would like, in that file.
SET standard_conforming_strings = 'off';
SET backslash_quote = 'on';
SET escape_string_warning = 'off';
These settings for psql, combined with the following mysqldump command, successfully migrated the data only (no schema) from MySQL 5.1 to PostgreSQL 9.1 with UTF-8 text (Chinese in my case). This method may be the only reasonable way to migrate a large database if creating an intermediate file would be too large or too time-consuming. It requires you to migrate the schema manually, since the two databases' data types are vastly different. Plan on typing out some DDL to get it right.
mysqldump \
--host=<hostname> \
--user=<username> \
--password=<password> \
--default-character-set=utf8 \
--compatible=postgresql \
--complete-insert \
--extended-insert \
--no-create-info \
--skip-quote-names \
--skip-comments \
--skip-lock-tables \
--skip-add-locks \
--verbose \
<database> <table> | psql -n -d <database>
Try this:
sed -e "s/\\\\'/\\\\\\'/g" -e "s/\([^\\]\)\\\\'/\1\\'\\'/g"
Yeah, "Leaning Toothpick Syndrome", I know.