I have a file like the one shown below.
chr10 299448 299468 SRR048973.1457734 255 + 3
chr10 299448 299468 SRR048973.2114188 255 + 3
chr10 299448 299468 SRR048973.4148128 255 + 3
chr10 299945 299971 SRR048973.566192 255 + 6
chr10 299959 299982 SRR048973.762883 255 + 6
chr10 299968 299985 SRR048973.1595367 255 + 6
chr10 299968 299985 SRR048973.2828877 255 + 6
chr10 299968 299985 SRR048973.3711952 255 + 6
chr10 299968 299985 SRR048973.3821978 255 + 6
chr10 300073 300095 SRR048973.975870 255 + 1
chr10 300109 300134 SRR048973.1500469 255 + 1
chr10 300185 300209 SRR048973.655183 255 + 8
chr10 300185 300209 SRR048973.933425 255 + 8
chr10 300185 300209 SRR048973.963046 255 + 8
chr10 300185 300209 SRR048973.3506573 255 + 8
chr10 300185 300209 SRR048973.3627590 255 + 8
chr10 300186 300209 SRR048973.1133369 255 + 8
chr10 300186 300209 SRR048973.2178421 255 + 8
chr10 300186 300209 SRR048973.4047933 255 + 8
chr10 300401 300426 SRR048973.918503 255 + 2
chr10 300401 300426 SRR048973.2870188 255 + 2
Looking at the last column: when it is >= 5, I want to collect the consecutive lines until the value falls back below 5. For the sample file, the output I want looks like this:
chr10 299945 299985 6
chr10 300185 300209 8
299945 comes from the 2nd column where the first 6 starts and 299985 comes from 3rd column where the last 6 ends. Similarly for 8.
I want to do this in Perl.
I tried writing the Perl script but cannot understand how to get coordinates properly.
#!/usr/bin/perl -w
use strict;
use warnings;
open F, '/user/tmp/output.bed' or die $!;
my $i=0;
while(<F>){
    chomp;
    my @s = split;
    if($s[6] >= 5){
        $i++;
    }else{
        if($s[6] < 5){
            $i = 0;
        }
    }
}
How can I do this?
Thanks in advance
Regards
Use a range operator:
use strict;
use warnings;
my @last;
while (<DATA>) {
    my @cols = split ' ';
    if (my $range = $cols[-1] >= 5 .. $cols[-1] < 5 || eof) {
        @last = @cols[0..2,-1] if $range == 1;
        print "@last\n" if $range =~ /E/;
        $last[2] = $cols[2];
    }
}
__DATA__
chr10 299448 299468 SRR048973.1457734 255 + 3
chr10 299448 299468 SRR048973.2114188 255 + 3
chr10 299448 299468 SRR048973.4148128 255 + 3
chr10 299945 299971 SRR048973.566192 255 + 6
chr10 299959 299982 SRR048973.762883 255 + 6
chr10 299968 299985 SRR048973.1595367 255 + 6
chr10 299968 299985 SRR048973.2828877 255 + 6
chr10 299968 299985 SRR048973.3711952 255 + 6
chr10 299968 299985 SRR048973.3821978 255 + 6
chr10 300073 300095 SRR048973.975870 255 + 1
chr10 300109 300134 SRR048973.1500469 255 + 1
chr10 300185 300209 SRR048973.655183 255 + 8
chr10 300185 300209 SRR048973.933425 255 + 8
chr10 300185 300209 SRR048973.963046 255 + 8
chr10 300185 300209 SRR048973.3506573 255 + 8
chr10 300185 300209 SRR048973.3627590 255 + 8
chr10 300186 300209 SRR048973.1133369 255 + 8
chr10 300186 300209 SRR048973.2178421 255 + 8
chr10 300186 300209 SRR048973.4047933 255 + 8
chr10 300401 300426 SRR048973.918503 255 + 2
chr10 300401 300426 SRR048973.2870188 255 + 2
Outputs:
chr10 299945 299985 6
chr10 300185 300209 8
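For comparison, here is a minimal Python sketch of the same idea as the range-operator answer above; it is not a replacement for it. It assumes whitespace-separated rows with an integer in the last column. The function name merge_runs and the threshold parameter are made up for illustration. Like the Perl version, it labels each interval with the value found on the first line of the run, so a run of 6s followed directly by 8s would come out as a single interval labelled 6.

```python
def merge_runs(lines, threshold=5):
    """Collapse consecutive rows whose last column is >= threshold into one interval."""
    merged = []
    current = None  # [chrom, start, end, value] for the run being built
    for line in lines:
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        if int(cols[-1]) >= threshold:
            if current is None:
                # First line of a run: remember chrom, start, end and the value.
                current = [cols[0], cols[1], cols[2], cols[-1]]
            else:
                # Later line of the same run: only the end coordinate moves.
                current[2] = cols[2]
        elif current is not None:
            # Value dropped below the threshold: the run is finished.
            merged.append(" ".join(current))
            current = None
    if current is not None:
        # The input ended while a run was still open.
        merged.append(" ".join(current))
    return merged
```

Passing an open file handle should work as well, since iterating over a file yields its lines, e.g. merge_runs(open("/user/tmp/output.bed")).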
Do you need that counting? Your output seems to not incorporate it...
Using your code sample:
#!/usr/bin/perl -w
use strict;
use warnings;
open F, '/user/tmp/output.bed' or die $!;
my $i=0;
my $wasTheLastGreaterThan5 = 0;
while(<F>){
    chomp;
    my @s = split;
    if(($s[6] >= 5) && !$wasTheLastGreaterThan5){
        # Switched from smaller to greater than 5, do something here.
        $wasTheLastGreaterThan5 = 1;
    }elsif(($s[6] < 5) && $wasTheLastGreaterThan5){
        # Switched from greater to smaller, do something here.
        $wasTheLastGreaterThan5 = 0;
    }
    else {
        # Did not switch, if you need to count, you could do so here.
    }
}
I am creating a form control in LibreOffice and exporting the document to PDF.
Trying to set the text of the control (a text box) using iTextSharp (in other words, from a C# program) only empties the box.
However, if I open the PDF in Acrobat Reader and edit the text in the box, saving the document results in a PDF where it is possible to write to that text box.
Why do I have to do that?
Error reproduction
Clicking the toolbar icon in LibreOffice.
Dragging out a square in the document.
Double clicking that box, giving it the name currenttime.
Exporting to pdf:
c# code
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
if (saveFileDialog1.ShowDialog() == DialogResult.OK)
{
using (var fs = new FileStream(saveFileDialog1.FileName, FileMode.Create))
{
var reader = new PdfReader(openFileDialog1.FileName);
{
using (var pdfStamper = new PdfStamper(reader, fs))
{
var acroFields = pdfStamper.AcroFields;
acroFields.SetField("currentdate", DateTime.Now.ToString());
pdfStamper.FormFlattening = true;
pdfStamper.FreeTextFlattening = true;
pdfStamper.Writer.CloseStream = false;
}
}
reader.Close();
fs.Close();
}
}
}
edit
Here are textual dumps of the PDF. I have replaced the binary data with "some binary data". The text box has been given the default value "123".
The PDF as created by LibreOffice is version 1.4:
%PDF-1.4
some binary data
2 0 obj
<</Length 3 0 R/Filter/FlateDecode>>
stream
some binary data
endstream
endobj
3 0 obj
78
endobj
7 0 obj
<</Type/FontDescriptor/FontName/LiberationSans
/Flags 4
/FontBBox[-543 -303 1301 980]/ItalicAngle 0
/Ascent 905
/Descent -211
/CapHeight 979
/StemV 80
>>
endobj
8 0 obj
<</Type/Font/Subtype/TrueType/BaseFont/LiberationSans
/Encoding/WinAnsiEncoding
/FirstChar 32 /LastChar 255
/Widths[277 277 354 556 556 889 666 190 333 333 389 583 277 333 277 277
556 556 556 556 556 556 556 556 556 556 277 277 583 583 583 556
1015 666 666 722 722 666 610 777 722 277 500 666 556 833 722 777
666 777 722 666 610 722 666 943 666 666 610 277 277 277 469 556
333 556 556 500 556 556 277 556 556 222 222 500 222 833 556 556
556 556 333 500 277 556 500 722 500 500 500 333 259 333 583 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
277 333 556 556 556 556 259 556 333 736 370 556 583 333 736 552
399 548 333 333 333 576 537 333 333 333 365 556 833 833 833 610
666 666 666 666 666 666 1000 722 666 666 666 666 277 277 277 277
722 722 777 777 777 777 777 583 777 722 722 722 722 666 666 610
556 556 556 556 556 556 889 500 556 556 556 556 277 277 277 277
556 556 556 556 556 556 556 548 610 556 556 556 556 500 556 500
]
/FontDescriptor 7 0 R>>
endobj
5 0 obj
<</F1 8 0 R
>>
endobj
9 0 obj
<</Font 5 0 R
/ProcSet[/PDF/Text]
>>
endobj
1 0 obj
<</Type/Page/Parent 6 0 R/Resources 9 0 R/MediaBox[0 0 595 842]/Annots[
4 0 R ]
/Group<</S/Transparency/CS/DeviceRGB/I true>>/Contents 2 0 R>>
endobj
6 0 obj
<</Type/Pages
/Resources 9 0 R
/MediaBox[ 0 0 595 842 ]
/Kids[ 1 0 R ]
/Count 1>>
endobj
10 0 obj
<</Type/XObject
/Subtype/Form
/BBox[0 0 82.7 23.1]
/Resources 9 0 R
/Length 18
/Filter/FlateDecode
>>
stream
some binary data
endstream
endobj
4 0 obj
<</Type/Annot/Subtype/Widget/F 4
/Rect[59.6 759.3 142.5 782.2]
/FT/Tx
/P 1 0 R
/T(currenttime)
/Ff 4096
/V <FEFF003100320033>
/DV <FEFF003100320033>
/DR<</Font 5 0 R>>
/DA(0 0 0 rg /F1 12 Tf)
/AP<<
/N 10 0 R
>>
>>
endobj
11 0 obj
<</Type/Catalog/Pages 6 0 R
/OpenAction[1 0 R /XYZ null null 0]
/Lang(sv-SE)
/AcroForm<</Fields[
4 0 R
]/DR 9 0 R/NeedAppearances true>>
>>
endobj
12 0 obj
<</Creator<FEFF005700720069007400650072>
/Producer<FEFF004C0069006200720065004F0066006600690063006500200035002E0033>
/CreationDate(D:20170606104859+02'00')>>
endobj
xref
0 13
0000000000 65535 f
0000001431 00000 n
0000000019 00000 n
0000000168 00000 n
0000001843 00000 n
0000001347 00000 n
0000001590 00000 n
0000000187 00000 n
0000000357 00000 n
0000001378 00000 n
0000001688 00000 n
0000002073 00000 n
0000002231 00000 n
trailer
<</Size 13/Root 11 0 R
/Info 12 0 R
/ID [ <5F5DD24A5E7FF740A8BB6B15F88EF602>
<5F5DD24A5E7FF740A8BB6B15F88EF602> ]
/DocChecksum /BFFAD3050AA9FF87945C97B9608B3C6C
>>
startxref
2406
%%EOF
After it has been edited in Acrobat Reader (I changed the default value of the text box from "123" to "12"), it is saved as version 1.6 and an interesting x:xmpmeta block is inserted. A lot of empty lines are also inserted in the document. At this point, it is programmatically editable.
%PDF-1.6
%âãÏÓ
7 0 obj
<</Linearized 1/L 6449/O 9/E 2599/N 1/T 6160/H [ 451 149]>>
endobj
13 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<5F5DD24A5E7FF740A8BB6B15F88EF602><FAE65369E246E7409111A7D5BDED1E6F>]/Index[7 17]/Info 6 0 R/Length 52/Prev 6161/Root 8 0 R/Size 24/Type/XRef/W[1 2 1]>>stream
some binary data
endstream
endobj
startxref
0
%%EOF
23 0 obj
<</Filter/FlateDecode/I 92/Length 65/S 38/V 69>>stream
some binary data
endstream
endobj
8 0 obj
<</AcroForm<</DA(/Helv 0 Tf 0 g )/DR 22 0 R/Fields[14 0 R]>>/Lang(sv-SE)/Metadata 1 0 R/OpenAction[9 0 R/XYZ null null 0]/Pages 5 0 R/Type/Catalog>>
endobj
9 0 obj
<</Annots[14 0 R]/Contents 12 0 R/CropBox[0 0 595 842]/Group<</CS/DeviceRGB/I true/S/Transparency>>/MediaBox[0 0 595 842]/Parent 5 0 R/Resources 22 0 R/Rotate 0/Type/Page>>
endobj
10 0 obj
<</BBox[0.0 0.0 82.9 22.9]/Filter/FlateDecode/Length 68/Resources 15 0 R>>stream
some binary data
endstream
endobj
11 0 obj
<</Filter/FlateDecode/First 66/Length 1226/N 9/Type/ObjStm>>stream
some binary data
endstream
endobj
12 0 obj
<</Filter/FlateDecode/Length 78>>stream
some binary data
endstream
endobj
1 0 obj
<</Length 3146/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<xmp:CreateDate>2017-06-06T10:48:59+02:00</xmp:CreateDate>
<xmp:CreatorTool>Writer</xmp:CreatorTool>
<xmp:ModifyDate>2017-06-06T11:20:41+02:00</xmp:ModifyDate>
<xmp:MetadataDate>2017-06-06T11:20:41+02:00</xmp:MetadataDate>
<pdf:Producer>LibreOffice 5.3</pdf:Producer>
<xmpMM:DocumentID>uuid:fcdf7344-18ca-44b6-934c-8d5ab8fc8ea3</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:895fdc09-0aaa-4421-86b2-418c75f88d22</xmpMM:InstanceID>
<dc:format>application/pdf</dc:format>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
2 0 obj
<</Filter/FlateDecode/First 4/Length 48/N 1/Type/ObjStm>>stream
some binary data
endstream
endobj
3 0 obj
<</Filter/FlateDecode/First 4/Length 106/N 1/Type/ObjStm>>stream
some binary data
endstream
endobj
4 0 obj
<</DecodeParms<</Columns 3/Predictor 12>>/Filter/FlateDecode/ID[<5F5DD24A5E7FF740A8BB6B15F88EF602><FAE65369E246E7409111A7D5BDED1E6F>]/Info 6 0 R/Length 37/Root 8 0 R/Size 7/Type/XRef/W[1 2 0]>>stream
some binary data
endstream
endobj
startxref
116
%%EOF
edit2
I have put the files on my Dropbox.
https://www.dropbox.com/sh/5btzl9qqzua18q1/AACIjCrvNZ5cunuLj9sze-l3a?dl=0
As already surmised in a comment, the problem is caused by Libre Office creating the PDF with NeedAppearances set to true in the AcroForm dictionary. Furthermore the wrong field name is used.
Field name
In your code you set the field "currentdate" while in your sample PDFs the field is called "currenttime". Obviously you have to use the correct field name.
NeedAppearances flag
This flag tells a PDF viewer that it shall construct appearance streams and appearance dictionaries for all widget annotations in the document. iText, therefore, when filling in the form field
acroFields.SetField("currentdate", DateTime.Now.ToString());
does not create an appearance for that field - any viewer is required to construct new appearances anyways.
Unfortunately form flattening
pdfStamper.FormFlattening = true;
is implemented by using the existing appearances and only them. As no appearance has been created when setting the field, its flattened form turns out to be empty.
(Strictly speaking this implementation of form flattening is wrong: In this case iText is the PDF processor that wants to make use of the appearances; thus, it should create all appearances here, even ignoring existing ones.)
You can fix this by telling iText to create appearances during form fill-ins in spite of the NeedAppearances flag:
using (var pdfStamper = new PdfStamper(reader, fs))
{
    var acroFields = pdfStamper.AcroFields;
    acroFields.GenerateAppearances = true; // <<<<<<<<<<<<<<<<<<
    acroFields.SetField("currenttime", DateTime.Now.ToString());
    pdfStamper.FormFlattening = true;
    pdfStamper.FreeTextFlattening = true;
    pdfStamper.Writer.CloseStream = false;
}
After adding the marked line above, the flattened output contains the newly set value.
Additionally, LibreOffice does not embed the LiberationSans font. As I have not installed it on my system, I only see dots. I would propose you make LibreOffice embed such fonts or else use the standard 14 fonts. Otherwise your PDFs won't display as desired on a number of computers.
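If you want to check quickly whether a given PDF has the flag discussed above, a rough sketch like the following may help. It is only a text scan of the raw bytes, not a PDF parser. It assumes the AcroForm dictionary is stored uncompressed, as in the LibreOffice dump in the question; if the dictionary sits inside a compressed object stream, the scan will simply find nothing. The function name needs_appearances is made up for illustration.

```python
import re


def needs_appearances(pdf_bytes):
    """Return True if the raw bytes contain '/NeedAppearances true'.

    This is a plain byte scan: it cannot see into compressed object
    streams, so a False result is not proof that the flag is absent.
    """
    return re.search(rb"/NeedAppearances\s+true\b", pdf_bytes) is not None
```

For a file on disk you would pass the result of reading it in binary mode, e.g. open(path, "rb").read(), where path is whatever file you exported.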
This is my bcp command line:
bcp "exec DB.dbo.StoredProcedure" queryout (path to output file) -f (path to format file) -T -S<host>
This is my bcp format file to output a CSV file.
10.0
20
1 SQLINT 0 10 "\t" 1 Id ""
2 SQLNVARCHAR 0 255 "\t" 2 Name SQL_Latin1_General_CP1_CI_AS
3 SQLNVARCHAR 0 255 "\t" 3 Corporation SQL_Latin1_General_CP1_CI_AS
4 SQLNVARCHAR 0 255 "\t" 4 Address1 SQL_Latin1_General_CP1_CI_AS
5 SQLNVARCHAR 0 255 "\t" 5 Address2 SQL_Latin1_General_CP1_CI_AS
6 SQLNVARCHAR 0 255 "\t" 6 City SQL_Latin1_General_CP1_CI_AS
7 SQLNVARCHAR 0 255 "\t" 7 State SQL_Latin1_General_CP1_CI_AS
8 SQLNVARCHAR 0 255 "\t" 8 ZipCode SQL_Latin1_General_CP1_CI_AS
9 SQLNVARCHAR 0 255 "\t" 9 County SQL_Latin1_General_CP1_CI_AS
10 SQLNVARCHAR 0 255 "\t" 10 Phone SQL_Latin1_General_CP1_CI_AS
11 SQLNVARCHAR 0 255 "\t" 11 Latitude SQL_Latin1_General_CP1_CI_AS
12 SQLNVARCHAR 0 255 "\t" 12 Longitude SQL_Latin1_General_CP1_CI_AS
13 SQLNVARCHAR 0 255 "\t" 13 Area SQL_Latin1_General_CP1_CI_AS
14 SQLNVARCHAR 0 255 "\t" 14 Region SQL_Latin1_General_CP1_CI_AS
15 SQLNVARCHAR 0 255 "\t" 15 Email SQL_Latin1_General_CP1_CI_AS
16 SQLNVARCHAR 0 255 "\t" 16 LevelOfCare SQL_Latin1_General_CP1_CI_AS
17 SQLNVARCHAR 0 255 "\t" 17 Description SQL_Latin1_General_CP1_CI_AS
18 SQLNVARCHAR 0 255 "\t" 18 Options SQL_Latin1_General_CP1_CI_AS
19 SQLNVARCHAR 0 255 "\t" 19 Rates SQL_Latin1_General_CP1_CI_AS
20 SQLNVARCHAR 0 255 "\r\n" 20 Photos SQL_Latin1_General_CP1_CI_AS
The output is not getting delimited and it's all over the place. Does anyone know what I'm doing wrong? I can't use a comma as the delimiter due to the content being output.
Hi, I am trying to add up values derived from several columns, and the calculation depends on whether another column has a certain value. For instance:
if lens.qty > 1 then (CASE LENS.LNS_PROGTYPE --DESIGN pOINTS
WHEN 762
THEN 70
when 767
THEN 70
when 768
THEN 70
WHEN 841
THEN 35
WHEN 842
then 35
else 0
end +
case LTRIM(RTRIM(LENS.COATTYP)) --ARC POINTS
when 'HVLL'
then 50
when 'HVLLBLUE'
then 100
else 0
end +
CASE LENS.LNS_IDX --MATERIAL POINTS
when 53
THEN 35
WHEN 56
THEN 35
WHEN 58
then 35
when 61
then 35
else 0
END +
CASE LENS.LNS_MATCLR --COLOR POINTS
WHEN 00
THEN 0
WHEN 46
THEN 35
WHEN 47
THEN 35
WHEN 48
then 35
else 0
end as TOTAL_POINTS)*lens.qty / 2
else
CASE LENS.LNS_PROGTYPE --DESIGN pOINTS
WHEN 762
THEN 70
when 767
THEN 70
when 768
THEN 70
WHEN 841
THEN 35
WHEN 842
then 35
else 0
end +
case LTRIM(RTRIM(LENS.COATTYP)) --ARC POINTS
when 'HVLL'
then 50
when 'HVLLBLUE'
then 100
else 0
end +
CASE LENS.LNS_IDX --MATERIAL POINTS
when 53
THEN 35
WHEN 56
THEN 35
WHEN 58
then 35
when 61
then 35
else 0
END +
CASE LENS.LNS_MATCLR --COLOR POINTS
WHEN 00
THEN 0
WHEN 46
THEN 35
WHEN 47
THEN 35
WHEN 48
then 35
else 0
end as TOTAL_POINTS)
I keep getting a syntax error and I am not sure where I am going wrong.
I am not sure how to do this and, to be honest, I don't completely understand the examples I have seen. Your help would be greatly appreciated.
I would do something like:
(CASE
WHEN LENS.LNS_PROGTYPE IN (762,767,768) THEN 70
WHEN LENS.LNS_PROGTYPE IN (841,842) THEN 35
else 0
end +
case LTRIM(RTRIM(LENS.COATTYP)) --ARC POINTS
when 'HVLL' then 50
when 'HVLLBLUE' then 100
else 0
end +
CASE
WHEN LENS.LNS_IDX IN (53,56,58,61) THEN 35
else 0
END +
CASE
WHEN LENS.LNS_MATCLR IN (46,47,48) THEN 35
else 0
end) * CASE WHEN lens.qty > 1 THEN lens.qty / 2 ELSE 1 END
For the entire expression. But, as I said, I'd also introduce some mapping tables rather than having all of these magic constants in the CASE expressions.
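To illustrate what the mapping-table idea amounts to, here is a small Python sketch in which dictionaries stand in for the lookup tables. The point values are copied from the CASE expressions in the question; the names (DESIGN_POINTS, total_points and so on) are invented for this example. It assumes qty is an integer and that qty / 2 is integer division, as it would be in SQL Server for integer columns. Under that assumption, multiplying by qty / 2 can give a different result for odd quantities than the question's "multiply by qty, then divide by 2".

```python
# Lookup tables standing in for SQL mapping tables (values from the question).
DESIGN_POINTS = {762: 70, 767: 70, 768: 70, 841: 35, 842: 35}
COATING_POINTS = {"HVLL": 50, "HVLLBLUE": 100}
MATERIAL_POINTS = {53: 35, 56: 35, 58: 35, 61: 35}
COLOR_POINTS = {46: 35, 47: 35, 48: 35}


def total_points(progtype, coattyp, idx, matclr, qty):
    """Sum the four point categories, then scale by quantity."""
    base = (
        DESIGN_POINTS.get(progtype, 0)            # unknown codes score 0
        + COATING_POINTS.get(coattyp.strip(), 0)  # strip() mirrors LTRIM(RTRIM(...))
        + MATERIAL_POINTS.get(idx, 0)
        + COLOR_POINTS.get(matclr, 0)
    )
    # Assumption: integer division, mirroring qty / 2 on integer columns.
    multiplier = qty // 2 if qty > 1 else 1
    return base * multiplier
```

In SQL the same thing would be a handful of small tables joined to LENS, which keeps the point values in data rather than in the query text.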
You must make sure that all elements of that string/sum are of the same data type. Cast or convert them appropriately.
I've been using grep -f to obtain patterns from one file and extract lines from the other.
The results are like below:
1 11294199 11294322 40 10 123 0.0813008
1 11294199 11294322 41 6 123 0.0487805
1 11294199 11294322 42 10 123 0.0813008
1 11294199 11294322 43 2 123 0.0162602
1 11293454 11293544 51 1 90 0.0111111
1 11293454 11293544 52 2 90 0.0222222
1 11291356 11291491 54 6 135 0.0444444
1 11291356 11291491 55 8 135 0.0592593
1 11291356 11291491 56 3 135 0.0222222
Now I need to group the results based on the first three columns, and calculate the sum of column 4 for each of the groups:
1 11294199 11294322 (40+41+42+43)
1 11293454 11293544 (51+52)
1 11291356 11291491 (54+55+56)
How can I get such results? Any options in grep to achieve this?
thx
You will need awk to do what you want. Try this:
awk '{ array[$1 "\t" $2 "\t" $3] += $4 } END { for (i in array) print i "\t" array[i] }' file.txt
Results:
1 11294199 11294322 166
1 11291356 11291491 165
1 11293454 11293544 103
HTH
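If awk is not available, roughly the same grouping can be sketched in Python. This assumes whitespace-separated input with an integer in the fourth column; the function name sum_by_key is made up for illustration. One difference from the awk one-liner: awk's "for (i in array)" visits keys in no particular order, whereas a Python dict keeps first-seen order.

```python
from collections import defaultdict


def sum_by_key(lines):
    """Group rows by their first three columns and sum the fourth."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        totals[tuple(fields[:3])] += int(fields[3])
    # One tab-separated output line per group, in first-seen order.
    return ["\t".join(key) + "\t" + str(total) for key, total in totals.items()]
```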
I have a matrix 'eff_tot' with dimension (m x n) which I want to rearrange according to a matrix called 'matches' (e.g. [n2 n3; n4 n5]) and put all the columns not specified in 'matches' at the end.
That is, I want to have [eff_tot(:,n2) eff_tot(:,n3) ; eff_tot(:,n4) eff_tot(:,n5) ; eff_tot(:,n1)].
That's all folks!
Taking the example in the first answer, what I would like to have is:
eff_tot =
81 15 45 15 24
44 86 11 14 42
92 63 97 87 5
19 36 1 58 91
27 52 78 55 95
82 41 0 0 0
87 8 0 0 0
9 24 0 0 0
40 13 0 0 0
26 19 0 0 0
Regards.
Create a vector listing the indices of all the columns in eff_tot and then use SETDIFF to determine which columns do not occur in [n2 n3 n4 n5]. These columns are the unmatched ones. Now concatenate the matched and unmatched column indices to create your column-reordered eff_tot matrix.
>> eff_tot = randi(100, 5, 7)
eff_tot =
45 82 81 15 15 41 24
11 87 44 14 86 8 42
97 9 92 87 63 24 5
1 40 19 58 36 13 91
78 26 27 55 52 19 95
>> n2 = 3; n3 = 5; n4 = 2; n5 = 6;
>> missingColumn = setdiff(1:size(eff_tot, 2), [n2 n3 n4 n5])
missingColumn =
1 4 7
>> eff_tot = [eff_tot(:,n2) eff_tot(:,n3) eff_tot(:,missingColumn); eff_tot(:,n4) eff_tot(:,n5) zeros(size(eff_tot, 1), length(missingColumn))]
eff_tot =
81 15 45 15 24
44 86 11 14 42
92 63 97 87 5
19 36 1 58 91
27 52 78 55 95
82 41 0 0 0
87 8 0 0 0
9 24 0 0 0
40 13 0 0 0
26 19 0 0 0
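The same steps (find the unmatched columns, put them after the first matched pair, pad the remaining blocks with zeros) can be sketched in plain Python with lists of lists. This generalises the answer slightly to any number of rows in 'matches', which is an assumption on my part. The function name rearrange is invented for this example, and column indices are 1-based to match the MATLAB session above.

```python
def rearrange(matrix, matches):
    """Stack one block per row of `matches`; unmatched columns go after the first block.

    `matrix` is a list of equal-length rows; `matches` is a list of lists
    of 1-based column indices, e.g. [[3, 5], [2, 6]].
    """
    n_cols = len(matrix[0])
    used = {c for pair in matches for c in pair}
    # Equivalent of setdiff(1:n, matched): columns not named in `matches`.
    unmatched = [c for c in range(1, n_cols + 1) if c not in used]
    result = []
    for block, pair in enumerate(matches):
        for row in matrix:
            picked = [row[c - 1] for c in pair]
            if block == 0:
                tail = [row[c - 1] for c in unmatched]  # real unmatched data
            else:
                tail = [0] * len(unmatched)             # zero padding
            result.append(picked + tail)
    return result
```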