Extracting certain rows from a text file, skipping others

I have a file that contains a long list of data like this:
b09 fl__2220 fuel20 ddm___an ddm___an dfl___de dfl___de dfl___de dfl___de dfl___de dfl___de dfl___de
fl dfl___de dfl___de dfl___de dfl___de dfl___de dfl___de dfl___de ddm___an ddm___an
simulated 32.9 0.0000 0.0000 0.0659 0.0888 0.1132 0.1298 0.1374 0.1413 0.1452
0.1460 0.1434 0.1404 0.1339 0.1186 0.0946 0.0708 0.0000 0.0000
measured 29.0 0.0000 0.0000 0.0579 0.0780 0.0994 0.1140 0.1207 0.1241 0.1276
0.1283 0.1260 0.1233 0.1177 0.1042 0.0831 0.0622 0.0000 0.0000
I want to extract certain data from a particular row, and then more data from a few rows ahead. From the first row I want to extract 'b09 fl__2220' and then I want to extract the fifth and sixth rows, so everything after 'measured'. The final output would look something like this:
b09 fl__2220 measured 29.0 0.0000 0.0000 0.0579 0.0780 0.0994 0.1140 0.1207 0.1241 0.1276 0.1283 0.1260 0.1233 0.1177 0.1042 0.0831 0.0622 0.0000 0.0000
I can get gawk to extract the b09 and fl__2220 with gawk '/fl__2220/ {print $1, $2}', but how do I get it to skip ahead and read the stuff from 'measured' line onwards to the last 0.0000? Or would something like perl or grep be better for situations like this?
The whole file contains similar records, e.g. 'c12 fl__2211 ...', but I only want the data for b09 fl__2220.

This is pretty simple to do in Perl. It looks like your data is fixed-width, so you can parse it with an unpack template.
Here is the documentation.
pack tutorial
https://perldoc.perl.org/functions/unpack
https://perldoc.perl.org/functions/pack
my $line =
"b09 fl__2220 fuel20 ddm___an ddm___an dfl___de dfl___de dfl___de dfl___de dfl___de dfl___de dfl___de";
my @fields = unpack "A6A10", $line;
###
### produces @fields containing ("b09", "fl__2220"),
### provided the file really pads those columns to 6 and 10 characters
###
You would write the template for each type of line and extract it in the same manner. The A template character automatically trims trailing spaces. The number after A means how many of that type to take. A6 means take 6 characters and trim.
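For comparison, below is a rough Python analogue of an all-"A" unpack template. It is a sketch that assumes the widths 6 and 10 from the answer's "A6A10" template actually match the padded columns in the file. The scraped sample above has single spaces, so the real widths may differ. The name unpack_ascii is made up for illustration.

```python
import string

# Perl's unpack "A" strips trailing whitespace and NUL bytes from each field.
_A_STRIP = string.whitespace + "\0"


def unpack_ascii(widths, line):
    """Rough Python analogue of Perl's unpack with an all-"A" template.

    widths: field widths, e.g. (6, 10) for the template "A6A10".
    Each field takes the next `width` characters and drops trailing
    whitespace/NULs; fields past the end of the line come back empty.
    """
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width].rstrip(_A_STRIP))
        pos += width
    return fields


# Usage: the header line padded to the assumed widths 6 and 10.
header = "b09".ljust(6) + "fl__2220".ljust(10) + "fuel20 ddm___an"
print(unpack_ascii((6, 10), header))  # ['b09', 'fl__2220']
```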
Another way to do it would be to use substr to pull out each substring, but then you have to trim the results yourself. unpack is usually easier.
If you are new to Perl remember to always put use strict; and use diagnostics; at the top of your script. That way you'll get a nice explanatory message whenever anything goes wrong instead of hopelessness!
HTH
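The answer covers parsing a single line. The "skip ahead" part of the question can be handled by a small scan: find the header line whose first two fields match, then take the next line that starts with "measured" together with the one continuation line after it. This is a Python sketch, not the answerer's Perl. It assumes each record has one "measured" row wrapped onto exactly two lines, as in the sample. extract_measured is a hypothetical name.

```python
def extract_measured(lines, site="b09", run="fl__2220"):
    """Return 'site run measured ...' for the first matching record, or None.

    Assumes (as in the sample) that each record starts with a header line whose
    first two fields identify it, and that its 'measured' row is wrapped onto
    exactly one continuation line.
    """
    it = iter(lines)
    for line in it:
        if line.split()[:2] != [site, run]:
            continue  # not the record we want
        # Skip ahead within the record to the 'measured' row.
        for row in it:
            if row.split()[:1] == ["measured"]:
                continuation = next(it, "")  # second half of the wrapped row
                return " ".join([site, run] + row.split() + continuation.split())
        return None  # header found but no 'measured' row followed
    return None
```

Because it accepts any iterable of lines, an open file object can be passed directly, e.g. extract_measured(f) inside a "with open(...) as f" block.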

Related

add rows with strings between a matrix in matlab

I have two matrices that I have concatenated vertically. However, I want to insert two or more rows between them, with a string in those rows. How do I go about doing that?
Basically, this is what I have:
A = 0.7363 0.8217 0.7904 0.5144 0.5341
0.3947 0.4299 0.9493 0.8843 0.0900
0.6834 0.8878 0.3276 0.5880 0.1117
0.7040 0.3912 0.6713 0.1548 0.1363
0.4423 0.7691 0.4386 0.1999 0.6787
0.0196 0.3968 0.8335 0.4070 0.4952
0.3309 0.8085 0.7689 0.7487 0.1897
0.4243 0.7551 0.1673 0.8256 0.4950
0.2703 0.3774 0.8620 0.7900 0.1476
0.1971 0.2160 0.9899 0.3185 0.0550
But I want it to be:
A = 0.7363 0.8217 0.7904 0.5144 0.5341
0.3947 0.4299 0.9493 0.8843 0.0900
0.6834 0.8878 0.3276 0.5880 0.1117
0.7040 0.3912 0.6713 0.1548 0.1363
0.4423 0.7691 0.4386 0.1999 0.6787
MESH PART
0.0196 0.3968 0.8335 0.4070 0.4952
0.3309 0.8085 0.7689 0.7487 0.1897
0.4243 0.7551 0.1673 0.8256 0.4950
0.2703 0.3774 0.8620 0.7900 0.1476
0.1971 0.2160 0.9899 0.3185 0.0550

Assuming CATIA can read the output correctly, you could simply make A a cell array, which can contain both numbers and character strings. A cell array is built with braces { }, as opposed to the square brackets [ ] used for numeric matrices. In your particular case, I would write:
A = {0.7363 0.8217 0.7904 0.5144 0.5341; ...
0.3947 0.4299 0.9493 0.8843 0.0900; ...
0.6834 0.8878 0.3276 0.5880 0.1117; ...
0.7040 0.3912 0.6713 0.1548 0.1363; ...
0.4423 0.7691 0.4386 0.1999 0.6787; ...
'MESH' 'PART' '-' '-' '-' ; ...
0.0196 0.3968 0.8335 0.4070 0.4952; ...
0.3309 0.8085 0.7689 0.7487 0.1897; ...
0.4243 0.7551 0.1673 0.8256 0.4950; ...
0.2703 0.3774 0.8620 0.7900 0.1476; ...
0.1971 0.2160 0.9899 0.3185 0.0550};
The '-' entries next to 'MESH' and 'PART' pad that row to five columns, because every row of a cell array must have the same number of columns as the others. I hope this works for you.

Can Lsqnonlin iterative output display current point for the algorithm?

My optimization using lsqnonlin is running into an error in the 18th iteration. I was wondering if I could see the current point that the algorithm is using at each iteration. It may help me diagnose what's going wrong. Thanks
EDIT: First Pass at Solution
I created myoutput.m
function stop = myoutput(x,optimValues,state)
stop = false;
indicator = x;
disp(indicator)
Then I added the output function to my options:
options = optimset('disp','iter-detailed','MaxFunEvals',1000,'TolFun',1e-5,'OutputFcn',@myoutput);
HW1Fparams= lsqnonlin(HW1Fobjfun4,x0,lb,ub,options)
But I am getting hideous-looking results like these:
I'd appreciate it if someone can help me make it look nicer. Below the break is the rest of the original question.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Full code below. I am using some Financial Toolbox functions. The idea is to calibrate the Hull White One Factor Model to market data. It's a straightforward exercise and I must be specifying things incorrectly because it's totally tripping me up.
ValuationDate = '10-01-2014';
Settle = datenum(ValuationDate);
CurveDates = [735874;
735882;
735906;
735936;
735950;
736040;
736133;
736224;
736314;
736424;
736606;
736788;
736971;
737153;
737336;
737518;
737701;
737884;
738069;
738251;
738433;
738615;
738797;
738979;
739162;
739345;
739528;
739710;
739893;
740075;
740260;
740442;
740624;
740806;
740989;
741171;
741354;
741536;
741719;
741901;
742084;
742269;
742451;
742633;
742815;
742997;
743180;
743362;
743545;
743728;
743911;
744093;
744278;
744460;
744642;
744824;
745006;
745189;
745372;
745554;
745737;
745919;
746102;
746284;
746469;
746651;
746833;
747015;
747198;
747380;
747563;
747745;
747928;
748111;
748296;
748478;
748660;
748842;
749024;
749206;
749389;
749572;
749755;
749937;
750120;
750302;
750487];
ZeroRates = 1.0e-03*[0.0172;
0.0188;
0.0191;
0.0221;
0.0249;
0.0244;
0.0269;
0.0333;
0.0423;
0.0571;
0.0789;
0.1021;
0.1253;
0.1435;
0.1617;
0.1749;
0.1881;
0.1973;
0.2064;
0.2158;
0.2253;
0.2311;
0.2370;
0.2429;
0.2488;
0.2547;
0.2607;
0.2640;
0.2672;
0.2706;
0.2738;
0.2772;
0.2807;
0.2842;
0.2877;
0.2913;
0.2948;
0.2964;
0.2979;
0.2995;
0.3011;
0.3026;
0.3043;
0.3060;
0.3077;
0.3095;
0.3112;
0.3118;
0.3125;
0.3132;
0.3138;
0.3146;
0.3152;
0.3160;
0.3167;
0.3175;
0.3183;
0.3186;
0.3189;
0.3192;
0.3196;
0.3199;
0.3202;
0.3206;
0.3209;
0.3213;
0.3217;
0.3217;
0.3216;
0.3216;
0.3216;
0.3216;
0.3216;
0.3216;
0.3216;
0.3216;
0.3216;
0.3217;
0.3217;
0.3218;
0.3218;
0.3219;
0.3219;
0.3220;
0.3220;
0.3221;
0.3221];
Compounding = 2;
RateSpec = intenvset('Compounding', 2,'ValuationDate', ValuationDate,'StartDates', ValuationDate,'EndDates', CurveDates,'Rates', ZeroRates);
InstrumentMaturity = datenum('12-Sep-2044');
SwaptionBlackVol = [ 0.5940 0.5550 0.4450 0.3710 0.3400 0.3110 0.2910 0.2750 0.2630 0.2520 0.2250 0.2140 0.2080 0.2050;
0.5630 0.5470 0.4420 0.3690 0.3360 0.3090 0.2900 0.2740 0.2630 0.2520 0.2260 0.2150 0.2090 0.2060;
0.5760 0.5330 0.4400 0.3730 0.3410 0.3150 0.2970 0.2820 0.2700 0.2590 0.2330 0.2220 0.2170 0.2140;
0.5840 0.5020 0.4240 0.3730 0.3480 0.3240 0.3060 0.2920 0.2810 0.2710 0.2430 0.2300 0.2230 0.2190;
0.5630 0.4750 0.4100 0.3700 0.3450 0.3230 0.3070 0.2940 0.2830 0.2740 0.2470 0.2330 0.2260 0.2210;
0.5510 0.4520 0.3980 0.3660 0.3410 0.3220 0.3070 0.2950 0.2850 0.2760 0.2500 0.2360 0.2290 0.2240;
0.4630 0.4010 0.3660 0.3440 0.3250 0.3100 0.2990 0.2890 0.2790 0.2720 0.2470 0.2320 0.2260 0.2210;
0.4230 0.3750 0.3480 0.3290 0.3140 0.3030 0.2930 0.2840 0.2760 0.2690 0.2420 0.2300 0.2240 0.2190;
0.3700 0.3470 0.3280 0.3110 0.2960 0.2880 0.2800 0.2730 0.2680 0.2620 0.2360 0.2240 0.2190 0.2150;
0.3420 0.3250 0.3100 0.2970 0.2850 0.2770 0.2700 0.2640 0.2590 0.2540 0.2280 0.2180 0.2140 0.2110;
0.3230 0.3010 0.2900 0.2810 0.2720 0.2650 0.2590 0.2540 0.2500 0.2470 0.2230 0.2130 0.2090 0.2060;
0.3010 0.2860 0.2760 0.2670 0.2580 0.2530 0.2480 0.2450 0.2420 0.2390 0.2160 0.2060 0.2030 0.2000;
0.2850 0.2750 0.2650 0.2560 0.2480 0.2440 0.2400 0.2370 0.2350 0.2320 0.2100 0.2000 0.1970 0.1940;
0.2710 0.2600 0.2510 0.2440 0.2380 0.2340 0.2310 0.2290 0.2260 0.2240 0.2040 0.1940 0.1910 0.1890;
0.2580 0.2470 0.2400 0.2350 0.2300 0.2270 0.2240 0.2210 0.2190 0.2170 0.1980 0.1890 0.1860 0.1840;
0.2460 0.2370 0.2320 0.2270 0.2240 0.2210 0.2180 0.2150 0.2130 0.2110 0.1980 0.1840 0.1820 0.1800;
0.2040 0.1980 0.1950 0.1920 0.1900 0.1890 0.1890 0.1880 0.1880 0.1870 0.1720 0.1660 0.1640 0.1620;
0.1790 0.1750 0.1740 0.1730 0.1730 0.1710 0.1710 0.1700 0.1690 0.1690 0.1530 0.1510 0.1500 0.1480;
0.1650 0.1650 0.1660 0.1670 0.1680 0.1670 0.1670 0.1680 0.1680 0.1680 0.1550 0.1580 0.1560 0.1530;
0.1530 0.1570 0.1590 0.1620 0.1640 0.1650 0.1660 0.1670 0.1680 0.1690 0.1560 0.1650 0.1620 0.1590];
SwaptionExerciseDates = cellstr(['1M ';'2M ';'3M '; '6M ';'9M ';'1Y ';'18M';'2Y ';'3Y ';'4Y ';'5Y ';'6Y ';'7Y ';'8Y ';'9Y ';'10Y';'15Y';'20Y';'25Y';'30Y']);
SwaptionTenors = cellstr(['1Y ';
'2Y ';
'3Y ';
'4Y ';
'5Y ';
'6Y ';
'7Y ';
'8Y ';
'9Y ';
'10Y';
'15Y';
'20Y';
'25Y';
'30Y']);
testmat = zeros(length(SwaptionExerciseDates),1);
for i = 1:length(SwaptionExerciseDates)
if SwaptionExerciseDates{i}(end)=='Y'
testmat(i) = addtodate(Settle,str2double(SwaptionExerciseDates{i}(1:end-1)),'year');
elseif SwaptionExerciseDates{i}(end)=='M'
testmat(i)=addtodate(Settle,str2double(SwaptionExerciseDates{i}(1:end-1)),'month');
end
end
EurExDates= testmat;
EurExDatesFull = repmat(testmat,1,length(SwaptionTenors));
testmat2 = zeros(length(SwaptionExerciseDates),length(SwaptionTenors));
for i = 1:size(EurExDatesFull,1)
for j = 1:size(EurExDatesFull,2)
if SwaptionTenors{j}(end)=='Y'
testmat2(i,j) = addtodate(EurExDatesFull(i,j),str2double(SwaptionTenors{j}(1:end-1)),'year');
elseif SwaptionTenors{j}(end)=='M'
testmat2(i,j)= addtodate(EurExDatesFull(i,j),str2double(SwaptionTenors{j}(1:end-1)),'month');
end
end
end
EurMatFull = testmat2;
relidx = find(EurMatFull <= InstrumentMaturity);
SwaptionBlackPrices = zeros(size(SwaptionBlackVol));
SwaptionStrike = zeros(size(SwaptionBlackVol));
for iSwaption=1:length(SwaptionExerciseDates)
for iTenor=1:length(SwaptionTenors)
[~,SwaptionStrike(iSwaption,iTenor)] = swapbyzero(RateSpec,[NaN 0],Settle, EurMatFull(iSwaption,iTenor),...
'StartDate',EurExDatesFull(iSwaption,iTenor),'LegReset',[1 2],'Basis',2);
SwaptionBlackPrices(iSwaption,iTenor) = swaptionbyblk(RateSpec,'call', SwaptionStrike(iSwaption,iTenor),Settle, ...
EurExDatesFull(iSwaption,iTenor), EurMatFull(iSwaption,iTenor),SwaptionBlackVol(iSwaption,iTenor));
end
end
TimeSpec = hwtimespec(Settle,daysadd(Settle,30*(1:370),6), 12);
% B = (214:224) produces error free solutions.
B = (150:224);
HW1Fobjfun4 = @(x) SwaptionBlackPrices(relidx(B)) - ...
swaptionbyhw(hwtree(hwvolspec(ValuationDate,testmat,x(2),testmat,x(1),'spline'), RateSpec, TimeSpec), 'call',SwaptionStrike(relidx(B)),EurExDatesFull(relidx(B)), 0,EurExDatesFull(relidx(B)), EurMatFull(relidx(B)),'Basis',2, 'SwapReset',12);
options = optimset('disp','iter','MaxFunEvals',1000,'TolFun',1e-5);
x0 = [.1 .01];
lb = [0 0];
ub = [1 1];
HW1Fparams = lsqnonlin(HW1Fobjfun4,x0,lb,ub,options)

Your best bet may be to modify the source lsqnonlin.m file. This can be a somewhat in-depth process, but it gives you the maximum control over what's going on.
Open the file by typing lsqnonlin at the command prompt, highlighting it, then right-clicking and choosing Open Selection. Before you do anything else, save a copy of the file to your default MATLAB working directory (e.g. C:\Users\username\Documents\MATLAB\ on Windows 7). MATLAB puts your default working directory at the top of the search path, so if you have a program with the same name as a MATLAB built-in one, MATLAB will find yours first and use it instead. I don't have that particular function myself, so I can't give you the exact code to put in there, but the solution should be simple enough for you to implement.
With your locally saved version of the code open, note that the first line of the program is a function declaration that looks something like
function [output1,output2,...]=lsqnonlin(input1,input2,...)
According to the MATLAB help page, x is the first output. Presumably it's also called x in the code itself, or something similar; if not, just use the first output parameter. Now that we know the name of the output variable, we can go through the code and find where it is calculated. The routine is probably a wrapper around a more fundamental numerical code: lsqnonlin calls lsqncommon, which then calls either snls or levenbergMarquardt, depending on the details of the problem. Any code that solves something iteratively will eventually end up in a while loop, because it has to perform the same calculation an unknown number of times to converge on a solution. Once you find the while loop, it's simply a matter of adding a little code to output whatever parameter(s) you'd like to look at.
Just remember that as long as you have a file of the same name in your working directory, you'll be calling that one, not the original code. So you may want to delete (or at least move) your modified code after you've finished debugging.
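A lighter-weight alternative to editing lsqnonlin.m is the output-function route already tried in the question. The solver calls the output function once per iteration, so "nicer" output mostly means printing one formatted line per call instead of disp(x). In myoutput.m, that would be an fprintf using optimValues.iteration and x. The Python sketch below only illustrates that callback pattern with a toy update step. iterate_with_callback, format_iterate and the halving step used in the test are hypothetical and are not part of any MATLAB API.

```python
def format_iterate(iteration, x, resnorm):
    """Format one iteration as a single aligned line."""
    coords = ", ".join(f"{v:10.6f}" for v in x)
    return f"iter {iteration:3d}  resnorm {resnorm:12.6e}  x = [{coords}]"


def iterate_with_callback(step, x0, max_iter, callback):
    """Run `step` repeatedly, calling `callback(k, x, resnorm)` after each step.

    `step` maps the current point to (next point, residual norm). Like a MATLAB
    OutputFcn, the callback sees the current point every iteration and can
    return True to stop early.
    """
    x = list(x0)
    for k in range(1, max_iter + 1):
        x, resnorm = step(x)
        if callback(k, x, resnorm):
            break
    return x
```

Keeping formatting in its own function means the callback stays a one-liner (log or print format_iterate(...)), and the fixed-width fields line up from one iteration to the next.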

Gnuplot reading not locale encoding file

I want to plot data of an ISO_8859_1 encoded file (two columns of numbers). Those are the first 10 data points of the file:
#Pe2
1 0.8000
2 0.8000
3 0.8000
4 0.8000
5 0.8000
6 0.8000
7 0.8000
8 0.8000
9 0.8000
10 0.8000
The original file has 15000 data points. I create this data with MATLAB, specifically setting ISO_8859_1 encoding, so I am sure that that's the encoding. This is a snippet of the matlab code:
slCharacterEncoding('ISO-8859-1'); %Instruction before writing anything to the file.
fprintf(fileID,' %7d %7.4f',Tempo(i),y(i)); %For loop in this instruction
fprintf(fileID,'\r'); %Closing the file
fclose(fileID);
This is the script that I run. The script file itself is saved in the default Windows encoding for .txt files:
set encoding iso_8859_1
set terminal wxt size 1000,551
# Line width of the axes
set border linewidth 1.5
# Line styles
set style line 1 lc rgb '#dd181f' lt 1 lw 1 pt 0 # red
# Axes label
set xlabel 'tiempo'
set ylabel 'valor'
plot 'Pe2.txt' with lines ls 1
This is the output of the gnuplot console when I run the script. After that I input "show encoding":
G N U P L O T
Version 4.6 patchlevel 5 last modified February 2014
Build System: MS-Windows 32 bit
Copyright (C) 1986-1993, 1998, 2004, 2007-2014
Thomas Williams, Colin Kelley and many others
gnuplot home: http://www.gnuplot.info
faq, bugs, etc: type "help FAQ"
immediate help: type "help" (plot window: hit 'h')
Terminal type set to 'wxt'
gnuplot> cd 'C:\Example'
gnuplot> load 'script.txt'
"script.txt", line 10: warning: Skipping data file with no valid points
gnuplot> plot 'Pe2.txt' with lines ls 1
^
"script.txt", line 10: x range is invalid
gnuplot> show encoding
nominal character encoding is iso_8859_1
however LC_CTYPE in current locale is Spanish_Spain.1252
gnuplot>
If I open the file, make some change, undo the change and save the file, gnuplot plots it. I guess that's because the editor saves it with the local encoding, which is the one gnuplot uses to read files.
How do I plot files with gnuplot that are not in the local encoding?
I also have what seems to be a similar problem when I output a file with VS2010 C#. If I don't specifically set the culture with:
Thread.CurrentThread.CurrentUICulture = CultureInfo.GetCultureInfo("en-US");
Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("en-US");
I am not able to save a file that gnuplot is able to plot. I believe this last problem is caused by the "," versus "." decimal separator.
In C# I save the files with this:
using (StreamWriter Writer = new StreamWriter(dir + @"\" + (k+1) + "_" + nombre + extension))
{
Writer.WriteLine("#" + (k+1) + "_" + nombre);
Writer.WriteLine();
Writer.WriteLine("{0,32} {1,32}", "#tiempo", "#valor");
for (int i = 0; i < tiempo.GetLength(0); i++)
{
Writer.WriteLine("{0,32} {1,32}", tiempo[i].ToString(), valor[i, k]);
}
}
Thank you.

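Your file uses bare carriage returns (\r, 0x0D) as line breaks, which gnuplot does not recognize. Use line feeds (\n, 0x0A) instead; \r\n also works. In the MATLAB snippet, that means writing fprintf(fileID,'\n') (or '\r\n') after each pair instead of fprintf(fileID,'\r').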
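If regenerating the data is inconvenient, existing files can be converted instead. Below is a minimal Python sketch, assuming the only problem is the bare-CR line endings described in the answer. It works on raw bytes, so the ISO-8859-1 content is left untouched and only CR/CRLF become LF. normalize_newlines and fix_file are hypothetical helper names.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF, leaving other bytes as-is."""
    # Replace CRLF first so it becomes a single LF rather than two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(path):
    """Rewrite the file at `path` in place with LF line endings."""
    with open(path, "rb") as f:  # binary mode: no decoding, no newline translation
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_newlines(data))
```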

Create a file by fetching corresponding data from other files?

I have a list of SNPs for example (let's call it file1):
SNP_ID chr position
rs9999847 4 182120631
rs999985 11 107192257
rs9999853 4 148436871
rs999986 14 95803856
rs9999883 4 870669
rs9999929 4 73470754
rs9999931 4 31676985
rs9999944 4 148376995
rs999995 10 78735498
rs9999963 4 84072737
rs9999966 4 5927355
rs9999979 4 135733891
I have another list of SNPs with the corresponding P-value (P) and BETA (as shown below) for different phenotypes. Here I have shown only one (let's call it file2):
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 rs3094315 742429 G ADD 1123 0.1783 0.2441 -0.3 0.6566 0.7306 0.4652
1 rs12562034 758311 A ADD 1119 -0.2096 0.2128 -0.6267 0.2075 -0.9848 0.3249
1 rs4475691 836671 A ADD 1111 -0.006033 0.2314 -0.4595 0.4474 -0.02608 0.9792
1 rs9999847 878522 A ADD 1109 -0.2784 0.4048 -1.072 0.5149 -0.6879 0.4916
1 rs999985 890368 C ADD 1111 0.179 0.2166 -0.2455 0.6034 0.8265 0.4087
1 rs9999853 908247 C ADD 1110 -0.02015 0.2073 -0.4265 0.3862 -0.09718 0.9226
1 rs999986 918699 G ADD 1111 -1.248 0.7892 -2.795 0.2984 -1.582 0.114
Now I want to make two files named file3 and file4 such that:
file3 should contain:
SNPID Pvalue_for_phenotype1 Pvalue_for_phenotype2 Pvalue_for_phenotype3 and so on....
rs9999847 0.9263 0.00005 0.002 ..............
The first column (SNP IDs) in file3 will be fixed (all the SNPs on my chip will be listed here). I want to write a program that matches the SNP IDs in file3 and file2, fetches the P-value for each SNP ID from file2 and puts it in file3.
file4 should contain:
SNPID BETAvalue_for_phenotype1 BETAvalue_for_phenotype2 BETAvalue_for_phenotype3 .........
rs9999847 0.01812 -0.011 0.22
The first column (SNP IDs) in file4 will be fixed (all the SNPs on my chip will be listed here). I want to write a program that matches the SNP IDs in file4 and file2, fetches the BETA for each SNP ID from file2 and puts it in file4.

It's a simple exercise in transposing columns to rows with awk (cf. "How to transfer the data of columns to rows (with awk)?").
file2 to file3.
I assume you have a machine with plenty of RAM, because I think file2 has millions of lines.
You could save this code in a file called column2row.awk:
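#!/usr/bin/awk -f
BEGIN {
    snp = 2   # column holding the SNP ID
    val = 12  # column to collect: 12 = P (use 7 for BETA)
}
# skip the header line(s) of file2
$snp == "SNP" { next }
{
    # test membership with "in": a bare vector[$snp] test is false for a stored value of 0
    if ($snp in vector)
        vector[$snp] = vector[$snp] "," $val
    else
        vector[$snp] = $val
}
END {
    for (id in vector)
        print id "," vector[id]
}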
Here snp is column 2 (the SNP ID) and val is column 12 (the P-value).
now you could run script:
/usr/bin/awk -f column2row.awk file2 > file3
If you have limited RAM, you can split the load by creating one small file per SNP:
tail -n +2 file1 | while read -r s _; do grep -w "$s" file2 > "$s.snp"; /usr/bin/awk -f column2row.awk "$s.snp" >> file3; done
For each line of file1 after its header, this reads the first field (the SNP name) into $s and greps file2 for that SNP to create a small file per SNP name. It then runs the awk script on that file to append a line to file3.
file2 to file4.
To build file4, change val in the awk script from column 12 (P) to column 7 (BETA).
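The awk script builds its rows from whatever SNP IDs occur in file2, and the for-in loop prints them in arbitrary order. If file3/file4 must list every SNP from file1 in file1's order, one lookup table per phenotype file does that. Below is a Python sketch that assumes one PLINK-style result file per phenotype, each with the header shown for file2. The function names, the "P_for_phenotype1"-style column headers and the "NA" placeholder for missing values are all assumptions, not part of the original answer.

```python
def read_snp_ids(lines):
    """SNP IDs from file1 (first column), skipping its SNP_ID header."""
    ids = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] != "SNP_ID":
            ids.append(fields[0])
    return ids


def read_column(lines, column):
    """Map SNP ID -> value of `column` (e.g. "P" or "BETA") in one result file."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    snp_idx, col_idx = header.index("SNP"), header.index(column)
    return {row[snp_idx]: row[col_idx] for row in body}


def build_table(snp_ids, phenotype_files, column, missing="NA"):
    """One output line per SNP (in file1 order), one column per phenotype file."""
    lookups = [read_column(lines, column) for lines in phenotype_files]
    names = [f"{column}_for_phenotype{i}" for i in range(1, len(lookups) + 1)]
    out = [" ".join(["SNPID"] + names)]
    for snp in snp_ids:
        out.append(" ".join([snp] + [lk.get(snp, missing) for lk in lookups]))
    return out
```

Calling build_table with column "P" gives the file3 lines and with "BETA" the file4 lines. Each dictionary holds only two short strings per SNP, so memory grows with the number of SNPs rather than with the width of file2.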

what's the purpose of fcntl with parameter F_DUPFD

I traced an Oracle process and found that it first opens /etc/netconfig as file handle 11, then duplicates it as 256 by calling fcntl with F_DUPFD, and then closes the original handle 11. Later it reads using handle 256. So what's the point of duplicating the file handle? Why not just work on the original one?
12931: 0.0006 open("/etc/netconfig", O_RDONLY|O_LARGEFILE) = 11
12931: 0.0002 fcntl(11, F_DUPFD, 0x00000100) = 256
12931: 0.0001 close(11) = 0
12931: 0.0002 read(256, " # p r a g m a i d e n".., 1024) = 1024
12931: 0.0003 read(256, " t s t p i _ c".., 1024) = 215
12931: 0.0002 read(256, 0x106957054, 1024) = 0
12931: 0.0001 lseek(256, 0, SEEK_SET) = 0
12931: 0.0002 read(256, " # p r a g m a i d e n".., 1024) = 1024
12931: 0.0003 read(256, " t s t p i _ c".., 1024) = 215
12931: 0.0003 read(256, 0x106957054, 1024) = 0
12931: 0.0001 close(256) = 0

On some systems, like Solaris, standard I/O with FILE only works with file descriptors 0-255, because its implementation of the FILE structure stores the descriptor in an 8-bit integer instead of an int. If a program uses a lot of file descriptors, it's useful to keep descriptors 3-255 free by moving other descriptors above them with fcntl(fd, F_DUPFD, 256). Otherwise, functions like fopen(), freopen() and fdopen() will fail once descriptors 0-255 are all in use.
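Below is a minimal sketch of that technique using Python's fcntl module (Unix only). move_fd_high is a hypothetical helper. It mirrors the two truss traces in this thread: on success the low descriptor is closed, and if F_DUPFD fails (e.g. EINVAL when 256 is not below the descriptor limit) the program simply keeps using the low descriptor.

```python
import fcntl
import os


def move_fd_high(fd, floor=256):
    """Move `fd` to the lowest free descriptor >= floor, keeping low numbers free.

    Returns the new descriptor and closes the original one. If F_DUPFD fails,
    returns `fd` unchanged so the caller can carry on with the low descriptor.
    Note: the duplicate does not carry over the close-on-exec flag (F_DUPFD
    clears FD_CLOEXEC on the new descriptor).
    """
    try:
        high = fcntl.fcntl(fd, fcntl.F_DUPFD, floor)
    except OSError:
        return fd
    os.close(fd)
    return high
```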

As an aside, they're file descriptors rather than file handles. The latter are a C feature used with fopen and its brethren, while descriptors are more UNIXy, for use with open et al.
Interesting. The only reason that comes to mind is that some other piece of code has a specific need for the file descriptor to be 256. I suspect only Oracle would know the bizarre reasons for that. In any case, you're not guaranteed to get 256; you get the lowest available file descriptor greater than or equal to that number.
From a bit of investigation (I don't know every little thing about the innards of UNIX off the top of my head), some attributes are shared by a group of duplicated descriptors, such as the file position and access mode. Other attributes belong to a single file descriptor even when it is duplicated, such as the close-on-exec flag (FD_CLOEXEC).
Doing a duplicate (with dup, dup2 or your fcntl) could be a way to create two descriptors with different per-descriptor attributes. But I can't see that being the case in your question, since the first descriptor is closed anyway. As you say, why not just use the low descriptor?
Interestingly enough, if you google for netconfig f_dupfd, you will see similar traces where the fcntl fails and the process continues to read the file with the low descriptor. So my thought is that this is an attempt to preserve low file descriptors as much as possible. For example:
4327: open("/etc/netconfig", O_RDONLY|O_LARGEFILE) = 4
4327: fcntl(4, F_DUPFD, 0x00000100) Err#22 EINVAL
4327: read(4, " # p r a g m a i d e n".., 1024) = 1024
4327: read(4, " t s t p i _ c".., 1024) = 215
4327: read(4, 0x00296B80, 1024) = 0
4327: lseek(4, 0, SEEK_SET) = 0
4327: read(4, " # p r a g m a i d e n".., 1024) = 1024
4327: read(4, " t s t p i _ c".., 1024) = 215
4327: read(4, 0x00296B80, 1024) = 0
4327: close(4) = 0
Maybe the software has a byte array of file descriptors somewhere that's limited so it attempts to move other files above the 255-limit.
But really, that's just guesswork on my part (although I'd like to think it's relatively intelligent guesswork). Also keep in mind that it may not be Oracle itself doing this. The netconfig stuff is used in a lot of places, so it may be some underlying library doing that, especially given that most of the aforementioned web hits weren't Oracle-specific (ftp, remsh and so on).

Here is another example where the technique of reserving low-numbered file descriptors is needed.
Assume that a process opens a large number of file descriptors, e.g. it accepts more than 1024 simultaneous socket connections. At the same time, the process uses a third-party library that opens socket connections and uses select() to see whether sockets are ready for reading or writing. Additionally, the third-party library was compiled with __FD_SETSIZE set to 1024 (the default value).
If the library opens a socket when all file descriptors below 1024 are in use, it will get a descriptor that select() and the associated FD_* macros cannot cope with. This results in undefined behaviour, which can crash the process.
