Extracting symbol names from nm output - sed

I'd like to use the symbol names from nm -P -g to generate a .c file. However, I'm not sure how to extract those symbol names.
Reading https://pubs.opengroup.org/onlinepubs/9699919799/utilities/nm.html says:
The format given in nm STDOUT uses <space> characters between the fields, which may be any number of <blank> characters required to align the columns.
I'm not sure how to interpret this - should my regex be ^[^ ]+_mkdocs[ ] [note: workaround for stackoverflow's wonky code formatting] or something else? I want the result to be whatever symbol name I extracted concatenated with (&doc);
e.g.
foo_mkdocs T 0 0
should become
foo_mkdocs(&doc);
but I'm unsure if I'm understanding nm's output format specification correctly.
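One way to read the spec text quoted above is that fields are separated by one or more blanks, so splitting each line on runs of whitespace and taking the first field yields the symbol name. Below is a minimal Python sketch of that reading, not a sed solution. It assumes the symbol name is always the first field and never contains blanks; the function name calls_from_nm and the suffix parameter are made up for illustration.

```python
def calls_from_nm(nm_output, suffix="_mkdocs"):
    """Turn `nm -P -g` style output into lines such as `foo_mkdocs(&doc);`."""
    calls = []
    for line in nm_output.splitlines():
        fields = line.split()  # split() treats any run of blanks as one separator
        if not fields:
            continue  # skip empty lines
        name = fields[0]  # assumption: the symbol name is the first field
        if name.endswith(suffix):
            calls.append(f"{name}(&doc);")
    return calls


if __name__ == "__main__":
    sample = "foo_mkdocs T 0 0\nbar          T 10 4\nbaz_mkdocs   T 20 8\n"
    print("\n".join(calls_from_nm(sample)))
```

Because the split ignores how many blanks separate the columns, the alignment padding mentioned in the specification does not affect the result.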

Related

match string pattern by certain characters but exclude combinations of those characters

I have the following sample string:
'-Dparam="x" -f hello-world.txt bye1.txt foo_bar.txt -Dparam2="y"'
I am trying to use RegEx (PowerShell, .NET flavor) to extract the filenames hello-world.txt, bye1.txt, and foo_bar.txt.
The real use case could have any number of -D parameters, and the -f <filenames> argument could appear in any position between these other parameters. I can't easily use something like split to extract it as the delimiter positioning could change, so I thought RegEx might be a good proposition here.
My attempt is something like this in PowerShell (can be opened on any Windows system and copy pasted into it):
'-Dparam="x" -f hello-world.txt bye1.txt foo_bar.txt -Dparam2="y"' -replace '^.* -f ([a-zA-Z0-9_.\s-]+).*$','$1'
Desired output:
hello-world.txt bye1.txt foo_bar.txt
My problem is that I either only take hello-world.txt, or I get hello-world.txt all the way to the end of the string or next = symbol (as in the example above).
I am having trouble expressing that \s is allowed, since I need to capture multiple space-delimited filenames, but that the combination of \s-[a-zA-Z] is not allowed, as that indicates the start of the next argument.
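The constraint described above (whitespace is allowed, but not whitespace followed by a hyphen) can be expressed with a negative lookahead. The sketch below uses Python's re module rather than PowerShell, on the assumption that the same pattern syntax carries over to the .NET flavor. The helper name filenames_after_f is hypothetical, and filenames that themselves begin with a hyphen are not handled.

```python
import re

# -f, then one or more "whitespace + token" pairs where the token
# does not start with '-' (a leading '-' marks the next argument).
FILES_AFTER_F = re.compile(r"(?:^|\s)-f((?:\s+(?!-)\S+)+)")


def filenames_after_f(command_line):
    """Return the filenames that follow -f, or an empty list if -f is absent."""
    match = FILES_AFTER_F.search(command_line)
    return match.group(1).split() if match else []


if __name__ == "__main__":
    line = '-Dparam="x" -f hello-world.txt bye1.txt foo_bar.txt -Dparam2="y"'
    print(" ".join(filenames_after_f(line)))
```

Hyphens inside a filename such as hello-world.txt are still accepted, because the lookahead only inspects the first character after the whitespace.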

How to recode missing genotype code is " '-' " in the ped file of plink

I'm trying to impute genotype data from the public reference panels but my files fail the file sanity check on Sanger Imputation server and it gives the following error:
failed sanity check :
of Non-ACGTN alternate allele at 1:4635556 .. REF_SEQ:'(null)' vs VCF:'-'
I tried fixing this in PLINK with the following command:
./plink --bfile chr1 --recode vcf --out chr1_vcf --missing-genotype -
but then it gives the error:
Underscore(s) present in sample IDs.
--recode vcf to chr1_vcf.vcf ... done.
and I still see '_' in the newly recoded file.
I would appreciate any help, suggestions and comments.
Thanks
Jasdeep
You will have to replace _ with a different character in your PLINK files before running your command.
From the PLINK manual:
When using --recode vcf, sample IDs are formed by merging the FID and IID and placing an underscore between them. When the FID or IID already contains an underscore, this may make it difficult to reconstruct them from the VCF file; you may want to replace underscores with a different character in PLINK files (Unix tr is handy here).
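The manual suggests tr for this; as an alternative illustration, here is a small Python sketch that rewrites one line of a .fam file. It assumes the usual six-column layout (family ID, individual ID, father ID, mother ID, sex, phenotype) and replaces underscores in all four ID columns so that parent IDs stay consistent with individual IDs. The function name and the default replacement character are arbitrary choices, not anything PLINK prescribes.

```python
def replace_underscores_in_ids(fam_line, replacement="-"):
    """Replace '_' in the four ID columns of one .fam line.

    Assumption: columns 1-4 are family, individual, father and mother IDs.
    The remaining columns (sex, phenotype) are left untouched.
    """
    fields = fam_line.split()
    fields[:4] = [field.replace("_", replacement) for field in fields[:4]]
    return " ".join(fields)


if __name__ == "__main__":
    print(replace_underscores_in_ids("FAM_1 IND_1 0 0 1 2"))
```

Applying the function to every line of the .fam file and writing the result back (after making a backup) would leave the .bed and .bim files unchanged.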

Matlab - Help in listing files using a name-pattern

I'm trying to create a function that lists the content of a folder based on a pattern, however the listing includes more files than needed. I'll explain by an example: Consider a folder containing the files
file.dat
file.dat._
file.dat.000
file.dat.001
...
file.dat.999
I am interested only in the files that are .000, .001 and so on. The files file.dat and file.dat._ are to be excluded.
The later numbering can also be .0000,.0001 and so on, so number of digits is not necessarily 3.
I tried using the dir command with the pattern file.dat.* - this included file.dat for some reason (why is the last dot treated differently?) and file.dat._, which was expected.
The "obvious" set of solutions is to add an additional regular expression or length check - however I would like to avoid that, if possible.
This needs to work both under UNIX and Windows (and preferably MacOS).
Any elegant solutions?
Get all filenames with dir and filter them with the regex '^file\.dat\.\d+$'. This matches:
start of the string (^)
followed by the string file.dat. (file\.dat\.)
followed by one or more digits (\d+)
and then the string must end ($)
Since the names returned by dir are collected into a cell array of char vectors, regexp returns a cell array with the matching indices for each char vector. The matching indices can only be 1 or [], so any is applied to each cell's content to reduce it to true or false. The resulting logical index tells which filenames should be kept.
f = dir('path/to/folder');
names = {f.name};
ind = cellfun(@any, regexp(names, '^file\.dat\.\d+$'));
names = names(ind);
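For readers who want to try the pattern outside MATLAB, the same filter can be sketched in Python. This is only an illustration of the regular expression, not a replacement for the MATLAB code above; the function name numbered_parts is made up.

```python
import re

# Same pattern as above; fullmatch() supplies the ^ and $ anchoring.
NUMBERED_PART = re.compile(r"file\.dat\.\d+")


def numbered_parts(names):
    """Keep only names of the form file.dat.<one or more digits>."""
    return [name for name in names if NUMBERED_PART.fullmatch(name)]


if __name__ == "__main__":
    listing = ["file.dat", "file.dat._", "file.dat.000", "file.dat.0001"]
    print(numbered_parts(listing))
```

The number of digits is not fixed, so both three-digit and four-digit suffixes pass the filter.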

Using nzload to upload a file with two differing date formats

I am trying to load a file onto Netezza that was exported from a table in an Oracle database. The file contains two separate date formats: one field has the format DD-MON-YY and the second field has the format DD-MON-YYYY hh24:MI:SS. Is there any way in nzload to cater for two different date formats within a file?
Thanks
rob..
If your file is fixed-length, you can use zones.
However, if it's field-delimited, you can use a preprocessing tool such as sed to convert all the dates and timestamps to one standard format before piping the output to nzload.
For example:
1. 01-JAN-17
2. 01-JAN-2017 11:20:32
Let's convert the date field to the same format:
cat output.dat |\
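sed -E 's/([0-9]{2})-([A-Z]{3})-([0-9]{2})([^0-9]|$)/\1-\2-20\3\4/g' |\
nzload -dateStyle DMONY -dateDelim '-'
The sed expression is fairly simple; let's break it down:
# look for 2 digits, followed by
# 3 uppercase letters, followed by
# 2 digits, all separated by '-'
# elements are grouped with '()' so they can be referred to by number
# the final group requires a non-digit (or end of line) after the 2-digit year,
# so that dates which already have a 4-digit year are left alone
's/([0-9]{2})-([A-Z]{3})-([0-9]{2})([^0-9]|$)
# reconstruct the date from the group numbers and separator, prefixing YY with 20,
# and put back the character matched by the final group
/\1-\2-20\3\4
# apply globally
/g'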
In nzload we have also specified the date format and its delimiter.
Note that the regular expression has to be adapted to the date formats actually present in the file and to the format they are converted to, so this may not be a universal solution.
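The same normalization can be sketched in Python, which makes it easy to check the behavior on both sample values before loading. This is an illustrative sketch only: the function name and the century parameter are assumptions, and the word boundaries are there so that a date that already has a four-digit year is not touched.

```python
import re

# DD-MON-YY with word boundaries, so the "01-JAN-20" prefix of
# "01-JAN-2017" is not mistaken for a two-digit year.
SHORT_DATE = re.compile(r"\b(\d{2})-([A-Z]{3})-(\d{2})\b")


def expand_two_digit_years(line, century="20"):
    """Rewrite DD-MON-YY as DD-MON-<century>YY; leave DD-MON-YYYY alone."""
    return SHORT_DATE.sub(
        lambda m: f"{m.group(1)}-{m.group(2)}-{century}{m.group(3)}", line
    )


if __name__ == "__main__":
    print(expand_two_digit_years("01-JAN-17|01-JAN-2017 11:20:32"))
```

Prefixing every two-digit year with 20 is itself an assumption about the data; years from the previous century would need a different rule.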

Postgresql: CSV export with escaped linebreaks

I exported some data from a postgresql database using (all) the instruction(s) posted here: Save PL/pgSQL output from PostgreSQL to a CSV file
But some exported fields contain newlines (linebreaks), so I got a CSV file like:
header1;header2;header3
foobar;some value;other value
just another value;f*** value;value with
newline
nextvalue;nextvalue2;nextvalue3
How can I escape (or ignore) these newline characters?
Line breaks are supported in CSV if the fields that contain them are enclosed in double quotes.
So if you had this in the middle of the file:
just another value;f*** value;"value with
newline"
it will be read as one record with 3 fields, spread over 2 lines, and will just work.
On the other hand, without the double quotes, it's an invalid CSV file (when it advertises 3 fields).
Although there's no formal specification for the CSV format, you may look at RFC 4180 for the rules that generally apply.
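As a quick way to see the quoting rule in action, the following Python sketch writes a row containing a line break with the standard csv module and reads it back. It is only an illustration of the CSV behavior, not of the PostgreSQL export itself; the helper names to_csv and from_csv are made up, and the semicolon delimiter mirrors the sample file above.

```python
import csv
import io


def to_csv(rows, delimiter=";"):
    """Serialize rows; fields containing line breaks get double-quoted."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=delimiter).writerows(rows)
    return buffer.getvalue()


def from_csv(text, delimiter=";"):
    """Parse CSV text; a quoted field may span several physical lines."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


if __name__ == "__main__":
    rows = [
        ["header1", "header2", "header3"],
        ["just another value", "f*** value", "value with\nnewline"],
    ]
    text = to_csv(rows)
    print(text)
    print(from_csv(text) == rows)
```

The reader returns two records even though the text spans three physical lines, which is the behavior described above for quoted fields.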