Network data in SQL Server - identifying separate routes - tsql

I am looking for help with a database task, which would probably be easier to solve in an object-oriented programming language. For now, I am still trying to find a T-SQL/SQL Server solution.
I use a source table that contains data about routes. Each record describes one link of a route with routeID, originNodeID and destinationNodeID. The most complicated example of data from this table looks like this:
routeID originNodeID destinationNodeID
WRTV ... ...
WRTX 5 10
WRTX 10 15
WRTX 15 20
WRTX 20 25
WRTX 25 30
WRTX 25 1505
WRTX 25 2005
WRTX 30 35
WRTX 30 1005
WRTX 35 40
WRTX 40 45
WRTX 45 50
WRTX 1005 1010
WRTX 1015 1020
WRTX 1505 1510
WRTX 1510 1515
WRTX 2005 2010
WRTX 2010 2015
WRTX 2020 2025
WRTY .... ....
So, as you can see, each routeID describes not a linear route but a route with branches. The route from the example may look like this:
            1515   1020
            /      /
           /      /
5 ------ 25 --- 30 -------50
           \
            \
            2025
Now, what I need to do is split this route into separate routes:
5-25-30-50 WRTX1
5-25-30-1020 WRTX2
5-25-1515 WRTX3
5-25-2025 WRTX4
For each of the new routes I just need the link sequence, like this:
routeID originNodeID destinationNodeID
WRTX1 5 10
WRTX1 10 15
WRTX1 15 20
WRTX1 20 25
WRTX1 25 30
WRTX1 30 35
WRTX1 35 40
WRTX1 40 45
WRTX1 45 50
WRTX2 5 10
WRTX2 10 15
WRTX2 15 20
WRTX2 20 25
WRTX2 25 30
WRTX2 30 1005
WRTX2 1005 1010
WRTX2 1015 1020
WRTX3 5 10
WRTX3 10 15
WRTX3 15 20
WRTX3 20 25
WRTX3 25 1505
WRTX3 1505 1510
WRTX3 1510 1515
WRTX4 5 10
WRTX4 10 15
WRTX4 15 20
WRTX4 20 25
WRTX4 25 2005
WRTX4 2005 2010
WRTX4 2010 2015
WRTX4 2020 2025
Do you have any idea how to solve my problem? Preferably I would like to implement the solution in SQL Server, but I have only a little experience with loops and cursors, which could possibly be useful here. I once even built an ETL process, but it only worked when there was a single point where the route splits.
I would be grateful for any help.

Not everything needs to be programmed in SQL. There is no universal programming language; some tasks are better done in other languages. For your task, it is better to use Python together with the SQL database.
You can build the rows in Python and insert them into the database. Below is an example script, but you need to provide the correct string "5-25-30-50 WRTX1 5-25-30-1020 WRTX2 5-25-1515 WRTX3 5-25-2025 WRTX4" and a correct example of the data table.
Your table contains the node 10, but it does not appear in the line above. Because of this, I do not understand how the string "5-25-30-50 WRTX1 5-25-30-1020 WRTX2 5-25-1515 WRTX3 5-25-2025 WRTX4" should be decomposed. For example, should "5-25-30-50 WRTX1" be decomposed into "5 25 WRTX1", "25 30 WRTX1", "30 50 WRTX1", and so on?
EXAMPLE FOR PYTHON + MSSQL
import pymssql

ServName = 'YourMSSQLServName'
DBName = 'YourDBName'
conn = pymssql.connect(server=ServName, database=DBName)
cursor = conn.cursor()
# Parameterized INSERT: pymssql fills in the %s placeholders and quotes the values
querytxt = '''
INSERT INTO [routing]
([routeID], [originNodeID], [destinationNodeID])
VALUES
(%s, %s, %s)'''
limit = 1000  # safety cap on loop iterations
Mask = 'WRTX'
# Each line of the file holds a string such as "5-25-30-50 WRTX1 5-25-30-1020 WRTX2 ..."
with open('rote.txt', 'r') as F:
    L = [R.strip() for R in F]
for Line in L:
    LineLast = Line
    j = 1
    while len(LineLast) != 0:
        # Node chain before the next 'WRTX', e.g. '5-25-30-50'
        PingLines = LineLast.partition(Mask)[0].strip()
        LineTemp = LineLast.partition(Mask)[2].strip()
        # Route number after 'WRTX' up to the next space (also works for WRTX10, WRTX11, ...)
        Num = LineTemp.partition(' ')[0]
        LineLast = LineTemp.partition(' ')[2]
        PingSet = PingLines.split('-')
        i = 0
        # Every pair of neighbouring nodes becomes one link row
        while i < len(PingSet) - 1:
            Ping1 = PingSet[i]
            Ping2 = PingSet[i + 1]
            i = i + 1
            routeID = Mask + Num
            originNodeID = int(Ping1)
            destinationNodeID = int(Ping2)
            print('Mask = %s\tPing1 = %s\tPing2 = %s' % (routeID, originNodeID, destinationNodeID))
            cursor.execute(querytxt, (routeID, originNodeID, destinationNodeID))
            if i >= limit: break
        j = j + 1
        if j >= limit: break
conn.commit()
conn.close()
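The script above still needs the branch strings to be typed in by hand. As a minimal sketch, assuming the links of one routeID fit in memory and contain no cycles, the separate routes can instead be derived directly from the link table with a depth-first walk from each start node (an origin that never appears as a destination). The function name split_routes is hypothetical. Children are visited in ascending node order, which happens to reproduce the WRTX1 to WRTX4 numbering from the question.

```python
def split_routes(route_id, links):
    """Split one branched route into linear start-to-end routes.

    links: (originNodeID, destinationNodeID) pairs belonging to one routeID.
    Returns (newRouteID, originNodeID, destinationNodeID) rows numbered 1, 2, ...
    Assumes the links contain no cycles (otherwise the walk never ends).
    """
    children = {}
    for origin, dest in links:
        children.setdefault(origin, []).append(dest)
    destinations = {dest for _, dest in links}
    # Start nodes: origins that never appear as a destination
    roots = sorted(set(children) - destinations)

    paths = []
    for root in roots:
        stack = [(root, [])]  # (current node, links walked so far)
        while stack:
            node, path = stack.pop()
            kids = children.get(node, [])
            if not kids:  # leaf reached: one complete linear route
                paths.append(path)
                continue
            # Push in reverse order so the smallest child is explored first
            for nxt in sorted(kids, reverse=True):
                stack.append((nxt, path + [(node, nxt)]))

    rows = []
    for number, path in enumerate(paths, start=1):
        for origin, dest in path:
            rows.append((f"{route_id}{number}", origin, dest))
    return rows


# Hypothetical three-link route: 1-2, then a branch to 3 and to 4
demo = [(1, 2), (2, 3), (2, 4)]
for row in split_routes("R", demo):
    print(*row)  # R1 1 2 / R1 2 3 / R2 1 2 / R2 2 4
```

The resulting rows could then be inserted with the same parameterized INSERT as above. Note that the question's sample table does not list the links 1010-1015 and 2015-2020; without them, nodes 1015 and 2020 would be treated as extra start nodes. The same depth-first idea can also be expressed in T-SQL with a recursive common table expression, if staying inside SQL Server is preferred.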


remove description lines and add time to the first column

AWK experts, I have a file as described below, and I wonder if it is possible to easily convert it to the form that I want:
The file contains multiple variables over one month (only one observation per day, but some days may be missing). The format for each day is the same except for the date/values. However, there are some description lines (containing words and numbers) at the end of each day, and the number of description lines varies from day to day.
KBO BTA Observations at 12Z 01 Feb 2020
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1000.0 92
925.0 765
850.0 1516
754.0 2546 13.0 9.3 78 9.85 150 2 310.2 340.6 312.0
752.0 2569 14.0 9.2 73 9.80 149 2 311.5 342.0 313.4
700.0 3173 -9.20 7.5 89 9.38 120 6 312.6 341.9 314.4
Station information and sounding indices
Station elevation: 2546.0
Lifted index: 1.83
Pres [hPa] of the Lifted Condensation Level: 693.42
1000 hPa to 500 hPa thickness: 5798.00
Precipitable water [mm] for entire sounding: 21.64
8022 KBO BTA Observations at 00Z 02 Feb 2020
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1000.0 97
925.0 758
850.0 1515
753.0 2546 10.8 6.8 76 8.30 190 3 307.9 333.4 309.5
750.0 2580 12.6 7.9 73 8.99 186 3 310.2 338.1 311.9
Here is what I want: remove all the description lines, read the date/time information, and put it in the first column.
Time PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
20200201t12Z 754.0 2546 13.0 9.3 78 9.85 150 2 310.2 340.6 312.0
20200201t12Z 752.0 2569 14.0 9.2 73 9.80 149 2 311.5 342.0 313.4
20200201t12Z 700.0 3173 -9.2 7.5 89 9.38 120 6 312.6 341.9 314.4
20200202t00Z 753.0 2546 10.8 6.8 76 8.30 190 3 307.9 333.4 309.5
20200202t00Z 750.0 2580 12.6 7.9 73 8.99 186 3 310.2 338.1 311.9
Any help is appreciated.
Kelly
something like this...
$ awk 'function m(x)
{return sprintf("%02d",int(index("JanFebMarAprMayJunJulAugSepOctNovDec",x)-1)/3+1)}
NR==1 {print "time PRES TEMP WDIR WSPD RELH"}
/^-+$/ {f=!f}
f {date=p[n] m(p[n-1]) p[n-2]}
!f {n=split($0,p)}
NF==11 && !/[^ 0-9.-]/ {print date,$0}' file | column -t
time PRES TEMP WDIR WSPD RELH
20200201 1000 10 230 5 90
20200201 900 9 200 6 85
20200201 800 9 100 6 87
20200202 1000 9.2 233 5 90
20200202 900 9.1 200 4 80
20200202 800 9 176 2 80
Explanation
the function returns the two-digit month number for a month name by looking up its position in the string of month abbreviations;
f toggles at each dashed line, so the date can be parsed from the title line that was saved just before the first dashed line;
finally, data lines are found heuristically: they have the expected number of fields and contain only digits, spaces, dots or minus signs.
$ cat tst.awk
/^-+$/ && ( ((++dashCnt) % 2) == 1 ) {
    mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",p[n-1])+2)/3
    time = sprintf("%04d%02d%02d", p[n], mthNr, p[n-2])
}
/^[[:upper:][:space:]]+$/ && !doneHdr++ { print "Time", $0 }
/^[0-9.[:space:]-]+$/ && /[0-9]/ { print time, $0 }
{ n = split($0,p) }
$ awk -f tst.awk file | column -t
Time PRES TEMP WDIR WSPD RELH
20200001 1000 10 230 5 90
20200001 900 9 200 6 85
20200001 800 9 100 6 87
20200002 1000 9.2 233 5 90
20200002 900 9.1 200 4 80
20200002 800 9 176 2 80
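Neither awk script above produces the "t12Z" hour suffix that the question asks for. As a sketch only, assuming every title line reads "... Observations at HHZ DD Mon YYYY" and that complete data rows have exactly 11 numeric fields, the same filtering can be written in Python; reshape and is_number are hypothetical names. Values are copied exactly as they appear (so -9.20 stays -9.20), and incomplete rows such as "1000.0 92" are dropped.

```python
import re

MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec"
# Title lines look like "... Observations at 12Z 01 Feb 2020"
TITLE = re.compile(r"Observations at (\d{2}Z) (\d{2}) ([A-Z][a-z]{2}) (\d{4})")


def is_number(text):
    """True if the field parses as a float (e.g. '754.0', '-9.20', '2546')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def reshape(lines, ncols=11):
    """Keep complete data rows and prefix each with a YYYYMMDDtHHZ stamp."""
    out = []
    stamp = None
    header_done = False
    for line in lines:
        fields = line.split()
        title = TITLE.search(line)
        if title:
            hour, day, mon, year = title.groups()
            month = MONTHS.index(mon) // 3 + 1
            stamp = f"{year}{month:02d}{day}t{hour}"
        elif fields[:1] == ["PRES"] and not header_done:
            # Print the column header once, with a leading Time column
            out.append(" ".join(["Time"] + fields))
            header_done = True
        elif stamp and len(fields) == ncols and all(map(is_number, fields)):
            out.append(" ".join([stamp] + fields))
    return out
```

To process the whole file, something like `print("\n".join(reshape(open("file"))))` should work; the unit line (hPa m C ...) also has 11 fields but is rejected because its fields are not numeric.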

Indexing a Structure in matlab

I was under the impression that structures in MATLAB were similar to tables in SQL, but I have a feeling I might be wrong.
I have a rather large dataset consisting of many entries and many fields. Ideally I want to index the structure, pulling out only the data I am interested in. Here is an example of the dataset:
Cond Type Stime ETime
2 10 1 900
2 10 1 900
2 10 1 900
3 1 901 1800
3 1 901 1800
4 1 1801 2700
8 1 901 1800
8 1 901 1800
9 1 901 1800
9 1 901 1800
12 1 901 1800
12 1 901 1800
13 10 1 900
13 10 1 900
13 10 1 900
16 1 901 1800
16 1 901 1800
17 10 1 900
17 10 1 900
17 10 1 900
19 10 1 900
19 10 1 900
19 10 1 900
20 10 1 900
20 10 1 900
20 10 1 900
22 1 901 1800
22 1 901 1800
25 10 1 900
25 10 1 900
25 10 1 900
27 1 901 1800
27 1 901 1800
28 1 901 1800
28 1 901 1800
30 1 1801 2700
31 1 901 1800
31 1 901 1800
32 10 1 900
32 10 1 900
32 10 1 900
35 10 1 900
35 10 1 900
35 10 1 900
What I want to do is pull specific entries for analysis; for example, I want all entries where Type is equal to 10, or all Cond values from 1 to 20 that have ETime == 900.
I can do this with the following:
idx = find([stats.Type] == 10);
[stats(idx).Stime]
but for multiple types I need a for loop, because comparing against a vector throws an error.
idx = find([stats.Type] == 1:10); % Does not work
% must use this
temp = [];
for aa = 1:10
    idx = find([stats.Type] == aa);
    temp = horzcat(idx,temp);
end
[stats(temp).Stime]
Is this the wrong way to use structures? Is there an easier method to index a structure to pull data of interest?
This answer proposes using table indexing instead of struct indexing, which is a bit of a side-step from directly answering the question. However, my comments on this post were deemed useful, so I've formalised them as an answer...
If you use struct2table then you can interact with it as a table, which is generally much more intuitive.
Structures are useful if your fields have different numbers of elements (i.e. you couldn't form a consistent-height table). In almost all other cases, I find tables easier to use.
With tables you can use:
Logical indexing
Sorting (including sortrows by column name)
The family of "join" operations
Dot notation for accessing table columns by name, as you do for accessing struct fields, or selecting multiple columns by name using myTable( :, {'col1','col2'} ). You don't need weird syntactic tricks like [stats.Type] to group outputs; you can just write stats.Type.
I would then use ismember to compare multiple items against a table column...
idx = ismember( stats.Type, 1:10 );
Unless you need the indices, you can skip using find for speed, and just directly index using idx.
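For comparison, here is a rough Python analogue of the logical-mask approach, as a sketch assuming each record is a plain dict built from a few rows of the question's dataset; mask and select are hypothetical helper names. As with ismember followed by logical indexing, each condition yields one True/False flag per record, and the mask is applied directly without computing indices first.

```python
# A few rows copied from the question's dataset: (Cond, Type, Stime, ETime)
rows = [
    (2, 10, 1, 900), (3, 1, 901, 1800), (4, 1, 1801, 2700),
    (13, 10, 1, 900), (16, 1, 901, 1800), (22, 1, 901, 1800),
]
records = [dict(zip(("Cond", "Type", "Stime", "ETime"), r)) for r in rows]


def mask(records, pred):
    """One boolean per record, similar to a MATLAB logical index."""
    return [pred(r) for r in records]


def select(records, m, field):
    """Values of one field for the records where the mask is True."""
    return [r[field] for r, keep in zip(records, m) if keep]


# All entries whose Type is in 1..10 (like ismember(stats.Type, 1:10))
wanted_types = set(range(1, 11))
m1 = mask(records, lambda r: r["Type"] in wanted_types)
stime = select(records, m1, "Stime")

# All Cond values from 1 to 20 that have ETime == 900
m2 = mask(records, lambda r: 1 <= r["Cond"] <= 20 and r["ETime"] == 900)
conds = select(records, m2, "Cond")
print(stime, conds)
```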

How do I delete rows of a matrix that have the same number combination, keeping only one of them?

for a=1:50; %numbers 1 through 50
    for b=1:50;
        c=sqrt(a^2+b^2);
        if c<=50&c(rem(c,1)==0);%if display only if c<=50 and c=c/1 has remainder of 0
            pyth=[a,b,c];%pythagorean matrix
            disp(pyth)
        else c(rem(c,1)~=0);%if remainder doesn't equal to 0, omit output
        end
    end
end
answer=
3 4 5
4 3 5
5 12 13
6 8 10
7 24 25
8 6 10
8 15 17
9 12 15
9 40 41
10 24 26
12 5 13
12 9 15
12 16 20
12 35 37
14 48 50
15 8 17
15 20 25
15 36 39
16 12 20
16 30 34
18 24 30
20 15 25
20 21 29
21 20 29
21 28 35
24 7 25
24 10 26
24 18 30
24 32 40
27 36 45
28 21 35
30 16 34
30 40 50
32 24 40
35 12 37
36 15 39
36 27 45
40 9 41
40 30 50
48 14 50
This problem involves the Pythagorean theorem, but we cannot use the built-in function, so I had to write one myself. The problem is that, for example, columns 1 and 2 of the first two rows contain the same numbers (3 4 and 4 3). How do I code it so that only one of the rows is kept when columns 1 and 2 have the same number combination? I've tried the unique function, but it doesn't remove these combinations. I have read previous posts about deleting duplicates, but they confused me even more. Any help on how to go about this problem would help me immensely!
Thank you
Welcome to StackOverflow.
The problem in your code seems to be that pyth only contains 3 values, [a, b, c]. The unique() function used in the next line has no effect in that case, because pyth contains only one row. Another issue is that the values idx and out are calculated in each loop cycle; this should be done after the loops. Example code could look like this:
pyth = zeros(0,3);
for a=1:50
    for b=1:50
        c = sqrt(a^2 + b^2);
        if c<=50 && rem(c,1)==0
            abc_sorted = sort([a,b,c]);   % sorted, so 3 4 5 and 4 3 5 become identical rows
            pyth = [pyth; abc_sorted];
        end
    end
end
% remove duplicate rows outside of the loop
[~,idx] = unique(pyth, 'rows', 'stable');
out = pyth(idx,:);
disp(out)
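The key step in the code above is making each row canonical (sorting it) before removing duplicates. A plain-Python sketch of the same idea, assuming rows are tuples of numbers; unique_combinations is a hypothetical name. Like unique(..., 'rows', 'stable'), it keeps the first occurrence of each combination and preserves the original order.

```python
def unique_combinations(rows):
    """Drop rows that contain the same numbers in a different order."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row))  # canonical form: (4, 3, 5) -> (3, 4, 5)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


print(unique_combinations([(3, 4, 5), (4, 3, 5), (5, 12, 13), (12, 5, 13)]))
```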
A few other tips for writing MATLAB code:
You do not need to end for or if/else statements with a semicolon.
else statements cover any other case not included before, so they do not need a condition.
Some performance recommendations:
Due to the symmetry of a and b (a^2 + b^2 = b^2 + a^2), the b loop could be constrained to for b=1:a, which would roughly halve the number of loop cycles.
If you use && to combine scalar conditions, the second part is not evaluated if the first part already fails (source).
Regards,
Chris
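The symmetry tip can be taken one step further: if the inner loop only runs over b >= a (the mirror image of b=1:a), the swapped duplicates are never generated, so no de-duplication step is needed. A sketch in Python, assuming Python 3.8+ for math.isqrt; pythagorean_triples is a hypothetical name. The integer square root also avoids floating-point checks like rem(c,1)==0, and because c only grows with b, the inner loop can stop as soon as c exceeds the limit.

```python
from math import isqrt


def pythagorean_triples(limit=50):
    """All (a, b, c) with a <= b, a^2 + b^2 == c^2 and c <= limit."""
    triples = []
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):  # b >= a: (4, 3, 5) is never produced
            c2 = a * a + b * b
            c = isqrt(c2)
            if c > limit:
                break  # c increases with b, so larger b cannot qualify
            if c * c == c2:
                triples.append((a, b, c))
    return triples


for t in pythagorean_triples(50):
    print(*t)
```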
You can also vectorize your algorithm (but it is still brute force):
[X,Y] = meshgrid(1:50,1:50); %generate all the combination
C = (X(:).^2+Y(:).^2).^0.5; %sums of two square for every combination
ind = find(rem(C,1)==0 & C<=50); %get the index
res = unique([sort([X(ind),Y(ind)],2),C(ind)],'rows'); %check for uniqueness
If you want to really optimize your algorithm using math, you should read this question. It will be useful if n >> 50.

Writing to file with no leading spaces at the Start of Line and No Blank End Line

Hi All Fortran Lovers,
I am trying to write three variables to a file as follows:
      program main
      integer N, u
      parameter(u=20)
      open (u, FILE='points.dat', STATUS='new')
      do 10 i= 1, 100
         write(u,100) i, i*2, i*5
 10   continue
 100  format (I5, I10, 9X, I10)
      close(u)
      print *,'COMPLETE!!'
      end
This gives the following output (content of points.dat, shown stripped):
1 2 5
2 4 10
3 6 15
4 8 20
5 10 25
6 12 30
7 14 35
8 16 40
9 18 45
10 20 50
11 22 55
12 24 60
...
...
...
...
...
99 198 495
100 200 500
|(This line added by the write statement)
But I want something like this:
1 2 5
2 4 10
3 6 15
4 8 20
5 10 25
6 12 30
7 14 35
8 16 40
9 18 45
10 20 50
11 22 55
12 24 60
...
...
...
...
...
99 198 495
100 200 500|(The cursor stop here)
i.e. no spaces at the start of each line, and the last line ends right after '500' without a trailing line break.
I tried horizontal spacing with the '1X' specifier, but without success.
Add advance='no' to the write statement, and write an end of line (EOL) only if the line is not the last one:
      do 10 i= 1, 100
         write(u,100,advance='no') i, i*2, i*5
         if (i.ne.100) write(u,*)
 10   continue
Edit: I see now that the Fortran runtime adds an EOL at the end of the file anyway. In that case you have to truncate the file with an external program; see for example https://www.quora.com/How-do-I-chop-off-just-the-last-byte-of-a-file-in-Bash .
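If an external tool is acceptable, the truncation can also be done with a few lines of Python instead of a shell command. A minimal sketch, assuming the file ends with one LF or CRLF line ending that should be removed and is otherwise left untouched; strip_final_newline is a hypothetical name.

```python
import os


def strip_final_newline(path):
    """Remove one trailing line ending (LF or CRLF) from a file, in place."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return
        f.seek(size - 1)
        if f.read(1) != b"\n":
            return  # no trailing newline: nothing to do
        cut = size - 1
        if cut > 0:
            f.seek(cut - 1)
            if f.read(1) == b"\r":
                cut -= 1  # also drop the CR of a Windows-style CRLF
        f.truncate(cut)
```

After the Fortran program has closed points.dat, calling strip_final_newline("points.dat") should leave the cursor right after the last number.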

Find the longest run of sequential integers in a vector

I have a routine that returns a list of integers as a vector.
Those integers come from groups of sequential numbers; for example, it may look like this:
vector = 6 7 8 12 13 14 15 26 27 28 29 30 55 56
Note that above, there are four 'runs' of numbers (6-8, 12-15, 26-30 & 55-56). What I'd like to do is forward the longest 'run' of numbers to a new vector. In this case, that would be the 26-30 run, so I'd like to produce:
newVector = 26 27 28 29 30
This calculation has to be performed many, many times on various vectors, so the more efficiently I can do this the better! Any wisdom would be gratefully received.
You can try this:
v = [ 6 7 8 12 13 14 15 26 27 28 29 30 55 56];
x = [0 cumsum(diff(v)~=1)];
v(x==mode(x))
This results in
ans =
26 27 28 29 30
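The two steps in this answer (label each element with the number of breaks seen so far, then keep the most frequent label) translate directly to Python. A sketch, with longest_run_mode as a hypothetical name; MATLAB's mode returns the smallest value on ties, so the earliest longest run wins, and the sketch mimics that.

```python
from collections import Counter
from itertools import accumulate


def longest_run_mode(v):
    """Longest run of consecutive integers via group labels, like cumsum + mode."""
    # breaks[k] is 1 where the step into v[k] is not exactly 1 (first element: 0)
    breaks = [0] + [int(b - a != 1) for a, b in zip(v, v[1:])]
    labels = list(accumulate(breaks))  # same as [0 cumsum(diff(v)~=1)]
    counts = Counter(labels)
    # Most frequent label; smallest label on ties, like MATLAB's mode
    best = min(counts, key=lambda lab: (-counts[lab], lab))
    return [x for x, lab in zip(v, labels) if lab == best]


print(longest_run_mode([6, 7, 8, 12, 13, 14, 15, 26, 27, 28, 29, 30, 55, 56]))
```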
Here is a solution to get the ball rolling . . .
vector = [6 7 8 12 13 14 15 26 27 28 29 30 55 56]
d = [diff(vector) 0]
maxSequence = 0;
maxSequenceIdx = 0;
lastIdx = 0;   % index just before the current run (0, so the first run is counted in full)
while lastIdx~=find(d~=1, 1, 'last')
    idx = find(d~=1, 1);
    if idx-lastIdx > maxSequence
        maxSequence = idx-lastIdx;
        maxSequenceIdx = lastIdx;
    end
    d(idx) = 1;
    lastIdx=idx;
end
output = vector(1+maxSequenceIdx:maxSequenceIdx+maxSequence)
In this example, the diff command is used to find consecutive numbers. When numbers are consecutive, the difference is 1. A while loop is then used to find the longest group of ones, and the index of this consecutive group is stored. However, I'm confident that this could be optimised further.
Without loops using diff:
vector = [6 7 8 12 13 14 15 26 27 28 29 30 55 56];
seqGroups = [1 find([1 diff(vector)]~=1) numel(vector)+1]; % beginning of group
[~, groupIdx] = max( diff(seqGroups)); % index of the longest group
output = vector( seqGroups(groupIdx):seqGroups(groupIdx+1)-1)
output vector is
ans =
26 27 28 29 30
Without loops - should be faster
temp = [0 find ( ([(vector(2:end) - vector(1:end-1))==1 0])==0)]; % run end positions, with 0 so the first run is included
[len,ind]=max(temp(2:end)-temp(1:end-1));
vec_out = vector(temp(ind)+1:temp(ind)+len)
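The answers above all build helper vectors with diff before picking the longest run. As an alternative sketch, the longest run can also be found in a single pass that only remembers the best start and length seen so far, which may help when the calculation is repeated many times. The function name longest_run is hypothetical; like max in the loop-free answers, it returns the first longest run on ties.

```python
def longest_run(v):
    """Return the longest run of consecutive integers in v (first one on ties)."""
    if not v:
        return []
    best_start, best_len = 0, 1
    start = 0
    for k in range(1, len(v) + 1):
        # A run ends at the end of the list or where the step is not exactly 1
        if k == len(v) or v[k] - v[k - 1] != 1:
            if k - start > best_len:
                best_start, best_len = start, k - start
            start = k
    return v[best_start:best_start + best_len]


print(longest_run([6, 7, 8, 12, 13, 14, 15, 26, 27, 28, 29, 30, 55, 56]))
```

Each element is inspected once, so the cost is linear in the length of the vector.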