Talend - split a string to n rows

Talend - split a string to n rows - talend

I would like to split a string in a column to n rows in Talend.
For example :
column
2aabbccdd
The first number is the "n" which I use to define the row lenght, so the expected result should be :
row 1 = aa
row 2 = bb
row 3 = cc
row 4 = dd
The idea here is to iterate on the string and cut it every 2 characters.
Any idea please ?

I would use a tJavaFlex to split the string, with a trick to have n rows coming out of it.
tJavaFlex's main code:
int n = Integer.parseInt(row1.str.substring(0, 4)); //get n from the first 4 characters
String str2 = row1.str.substring(4); //get the string after n
int nbParts = (str2.length() + 1) / n;
System.out.println("number of parts = " + nbParts);
for (int i = 0; i < nbParts; i++)
{
String part = str2.substring(i * n);
if(part.length() > n)
{
part = part.substring(0, n);
}
row2.str = part;
And tJavaFlex's end code is just a closing brace:
}
The trick is to use a for loop in the main code, but only close it in the end code.
tFixedFlowInput contains just one column holding the input string.

Related

remove duplicates in a table (rexx language)

I have a question about removing duplicates in a table (rexx language), I am on netphantom applications that are using the rexx language.
I need a sample on how to remove the duplicates in a table.
I do have a thoughts on how to do it though, like using two loops for these two tables which are A and B, but I am not familiar with this.
My situation is:
rc = PanlistInsertData('A',0,SAMPLE)
TABLE A (this table having duplicate data)
123
1
1234
12
123
1234
I need to filter out those duplicates data into TABLE B like this:
123
1234
1
12

You can use lookup stem variables to test if you have already found a value.
This should work (note I have not tested so there could be syntax errors)
no=0;
yes=1
lookup. = no /* initialize the stem to no, not strictly needed */
j=0
do i = 1 to in.0
v = in.i
if lookup.v <> yes then do
j = j + 1
out.j = v
lookup.v = yes
end
end
out.0 = j

You can eliminate the duplicates by :
If InStem first element, Move the element to OutStem Else check all the OutStem elements for the current InStem element
If element is found, Iterate to the next InStem element Else add InStem element to OutStem
Code Snippet :
/*Input Stem - InStem.
Output Stem - OutStem.
Array Counters - I, J, K */
J = 1
DO I = 1 TO InStem.0
IF I = 1 THEN
OutStem.I = InStem.I
ELSE
DO K = 1 TO J
IF (InStem.I ?= OutStem.K) & (K = J) THEN
DO
J = J + 1
OutStem.J = InStem.I
END
ELSE
DO
IF (InStem.I == OutStem.K) THEN
ITERATE I
END
END
END
OutStem.0 = J
Hope this helps.

Passing value with numeric format

i have problems regarding passing value with formatting issues.
The description i have put in the codes as i am having problem passing my PreEditedCheque in my code for the if ValidateMMonCheque <> MM part. The output for if length(RawChequenumber) = 15 will be in 1 digit instead of 00001 ( example)
MM = HostGetFLD('','MM')
YY = HostGetFLD('','YY')
PreEditedCheque = substr(RawChequenumber,11,5)
ValidateMMonCheque = substr(RawChequenumber,7,2)
if ValidateMMonCheque <> MM Then *From this statement*
Do
PreEditedCheque = substr('00000',1,5) *This part where those 0 can't be properly shown if pass to the next statement*
EditedCheque = '00'||'2'||'0'||YY||MM||'00'||PreEditedCheque
rc = message(2,2,EditedCheque)
End
if length(RawChequenumber) = 15 Then
EditedCheque = '00'||'2'||'0'||YY||MM||'00'||PreEditedCheque + 1 *Second statement if <>MM ran, this part, the PreEditedCheque will be not in 00001, it will be 1.
rc = PanSetCtlData('PREVIEW',EditedCheque)

What you're asking for is to have the cheque number padded to the left with zeroes in a 5-character field. The Right() function is your friend:
Right(PreEditedCheque, 5, '0') /* "1" -> "00001" */

Printing the values from the corresponding array

Suppose:
1 A
2 B
3 C
I need to print the value corresponding to 1-> A. I have put each in an array:
d1[1,2,3] and s1[A,B,C]. Now I need to print the value in the form shown in the above:
d1[0] s1[0]
1 A
How can I do this using UnityScript? In the program, I did id printing in this format:
1 A
1 B
1 C
2 A
2 B
2 C

What you have probably done, and I'm guessing without seeing the code is something along that you have a for loop within a for loop which means you're processing the first item in the first array and then all items in the second:
for (var i = 0; i < d1.Length; i++) {
for(var j = 0; j < s1.length; j++ {
Debug.log(d1[i] + " " + s1[j])
}
}
Your options depend on the array sizes and how you want to handle them. For example if you know they're the same size consistently then
for(var i = 0; i < d1.Length; i++) {
Debug.log(d1[i] + " " + s1[i])
}
should work. I would wonder if what you're trying to achieve could be done with a different data structure such as a dictionary, etc.

MATLAB Execution Time Increasing

Here is my code. The intent is I have a Wireshark capture saved to a particularly formatted text file. The MATLAB code is supposed to go through the Packets, dissect them for different protocols, and then make tables based on those protocols. I currently have this programmed for ETHERNET/IP/UDP/MODBUS. In this case, it creates a column in MBTable each time it encounters a new register value, and each time it comes across a change to that register value, it updates the value in that line of the table. The first column of MBTable is time, the registers start with the second column.
MBTable is preallocated to over 100,000 Rows (nol is very large), 10 columns before this code is executed. The actual data from a file I'm pulling into the table gets to about 10,000 rows and 4 columns and the code execution is so slow I have to stop it. The tic/toc value is calculated every 1000 rows and continues to increase exponentially with every iteration. It is a large loop, but I can't see where anything is growing in such a way that it would cause it to run slower with each iteration.
All variables get initialized up top (left out to lessen amount of code.
The variables eth, eth.ip, eth.ip.udp, and eth.ip.udp.modbus are all of type struct as is eth.header and eth.ip.header. WSID is a file ID from a .txt file opened earlier.
MBTable = zeros(nol,10);
tval = tic;
while not(feof(WSID))
packline = packline + 1;
fl = fl + 1;
%Get the next line from the file
MBLine = fgetl(WSID);
%Make sure line is not blank or short
if length(MBLine) >= 3
%Split the line into 1. Line no, 2. Data, 3. ASCII
%MBAll = strsplit(MBLine,' ');
%First line of new packet, if headers included
if strcmp(MBLine(1:3),'No.')
newpack = true;
newtime = false;
newdata = false;
stoppack = false;
packline = 1;
end
%If packet has headers, 2nd line contains timestamp
if newpack
Ordered = false;
if packline == 2;
newtime = true;
%MBstrs = strsplit(MBAll{2},' ');
packno = int32(str2double(MBLine(1:8)));
t = str2double(MBLine(9:20));
et = t - lastt;
if lastt > 0 && et > 0
L = L + 1;
MBTable(L,1) = t;
end
%newpack = false;
end
if packline > 3
dataline = int16(str2double(MBLine(1:4)));
packdata = strcat(packdata,MBLine(7:53));
end
end
else
%if t >= st
if packline > 3
stoppack = true;
newpack = false;
end
if stoppack
invalid = false;
%eth = struct;
eth.pack = packdata(~isspace(packdata));
eth.length = length(eth.pack);
%Dissect the packet data
eth.stbyte = 1;
eth.ebyte = eth.length;
eth.header.stbyte = 1;
eth.header.ebyte = 28;
%Ethernet Packet Data
eth.header.pack = eth.pack(eth.stbyte:eth.stbyte+27);
eth.header.dest = eth.header.pack(eth.header.stbyte:eth.header.stbyte + 11);
eth.header.src = eth.header.pack(eth.header.stbyte + 12:eth.header.stbyte + 23);
eth.typecode = eth.header.pack(eth.header.stbyte + 24:eth.header.ebyte);
if strcmp(eth.typecode,'0800')
eth.type = 'IP';
%eth.ip = struct;
%IP Packet Data
eth.ip.stbyte = eth.header.ebyte + 1;
eth.ip.ver = eth.pack(eth.ip.stbyte);
%IP Header length
eth.ip.header.length = 4*int8(str2double(eth.pack(eth.ip.stbyte+1)));
eth.ip.header.ebyte = eth.ip.stbyte + eth.ip.header.length - 1;
%Differentiated Services Field
eth.ip.DSF = eth.pack(eth.ip.stbyte + 2:eth.ip.stbyte + 3);
%Total IP Packet Length
eth.ip.length = hex2dec(eth.pack(eth.ip.stbyte+4:eth.ip.stbyte+7));
eth.ip.ebyte = eth.ip.stbyte + max(eth.ip.length,46) - 1;
eth.ip.pack = eth.pack(eth.ip.stbyte:eth.ip.ebyte);
eth.ip.ID = eth.pack(eth.ip.stbyte+8:eth.ip.stbyte+11);
eth.ip.flags = eth.pack(eth.ip.stbyte+12:eth.ip.stbyte+13);
eth.ip.fragoff = eth.pack(eth.ip.stbyte+14:eth.ip.stbyte+15);
%Time to Live
eth.ip.ttl = hex2dec(eth.pack(eth.ip.stbyte+16:eth.ip.stbyte+17));
eth.ip.typecode = eth.pack(eth.ip.stbyte+18:eth.ip.stbyte+19);
eth.ip.checksum = eth.pack(eth.ip.stbyte+20:eth.ip.stbyte+23);
%eth.ip.src = eth.pack(eth.ip.stbyte+24:eth.ip.stbyte+31);
eth.ip.src = ...
[num2str(hex2dec(eth.pack(eth.ip.stbyte+24:eth.ip.stbyte+25))),'.', ...
num2str(hex2dec(eth.pack(eth.ip.stbyte+26:eth.ip.stbyte+27))),'.', ...
num2str(hex2dec(eth.pack(eth.ip.stbyte+28:eth.ip.stbyte+29))),'.', ...
num2str(hex2dec(eth.pack(eth.ip.stbyte+30:eth.ip.stbyte+31)))];
eth.ip.dest = ...
[num2str(hex2dec(eth.pack(eth.ip.stbyte+32:eth.ip.stbyte+33))),'.', ...
num2str(hex2dec(eth.pack(eth.ip.stbyte+34:eth.ip.stbyte+35))),'.', ...
num2str(hex2dec(eth.pack(eth.ip.stbyte+36:eth.ip.stbyte+37))),'.', ...
num2str(hex2dec(eth.pack(eth.ip.stbyte+38:eth.ip.stbyte+39)))];
if strcmp(eth.ip.typecode,'11')
eth.ip.type = 'UDP';
eth.ip.udp.stbyte = eth.ip.stbyte + 40;
eth.ip.udp.src = hex2dec(eth.pack(eth.ip.udp.stbyte:eth.ip.udp.stbyte + 3));
eth.ip.udp.dest = hex2dec(eth.pack(eth.ip.udp.stbyte+4:eth.ip.udp.stbyte+7));
eth.ip.udp.length = hex2dec(eth.pack(eth.ip.udp.stbyte+8:eth.ip.udp.stbyte+11));
eth.ip.udp.checksum = eth.pack(eth.ip.udp.stbyte+12:eth.ip.udp.stbyte+15);
eth.ip.udp.protoID = eth.pack(eth.ip.udp.stbyte+20:eth.ip.udp.stbyte+23);
if strcmp(eth.ip.udp.protoID,'0000')
eth.ip.udp.proto = 'MODBUS';
%eth.ip.udp.modbus = struct;
eth.ip.udp.modbus.stbyte = eth.ip.udp.stbyte+16;
eth.ip.udp.modbus.transID = eth.pack(eth.ip.udp.modbus.stbyte:eth.ip.udp.modbus.stbyte+3);
eth.ip.udp.modbus.protoID = eth.ip.udp.protoID;
eth.ip.udp.modbus.length = int16(str2double(eth.pack(eth.ip.udp.modbus.stbyte + 8:eth.ip.udp.modbus.stbyte + 11)));
eth.ip.udp.modbus.UID = eth.pack(eth.ip.udp.modbus.stbyte + 12:eth.ip.udp.modbus.stbyte + 13);
eth.ip.udp.modbus.func = hex2dec(eth.pack(eth.ip.udp.modbus.stbyte + 14:eth.ip.udp.modbus.stbyte+15));
eth.ip.udp.modbus.register = eth.pack(eth.ip.udp.modbus.stbyte + 16: eth.ip.udp.modbus.stbyte+19);
%Number of words to a register, or the number of registers
eth.ip.udp.modbus.words = hex2dec(eth.pack(eth.ip.udp.modbus.stbyte+20:eth.ip.udp.modbus.stbyte+23));
eth.ip.udp.modbus.bytes = hex2dec(eth.pack(eth.ip.udp.modbus.stbyte+24:eth.ip.udp.modbus.stbyte+25));
eth.ip.udp.modbus.data = eth.pack(eth.ip.udp.modbus.stbyte + 26:eth.ip.udp.modbus.stbyte + 26 + 2*eth.ip.udp.modbus.bytes - 1);
%If func 16 or 23, loop through data/registers and add to table
if eth.ip.udp.modbus.func == 16 || eth.ip.udp.modbus.func == 23
stp = eth.ip.udp.modbus.bytes*2/eth.ip.udp.modbus.words;
for n = 1:stp:eth.ip.udp.modbus.bytes*2;
%Check for existence of register as a key?
if ~isKey(MBMap,eth.ip.udp.modbus.register)
MBCol = MBCol + 1;
MBMap(eth.ip.udp.modbus.register) = MBCol;
end
MBTable(L,MBCol) = hex2dec(eth.ip.udp.modbus.data(n:n+stp-1));
eth.ip.udp.modbus.register = dec2hex(hex2dec(eth.ip.udp.modbus.register)+1);
end
lastt = t;
end
%If func 4, make sure it is the response, then put
%data into table for register column
elseif false
%need code to handle serial to UDP conversion box
else
invalid = true;
end
else
invalid = true;
end
else
invalid = true;
end
if ~invalid
end
end
%end
end
%Display Progress
if int64(fl/1000)*1000 == fl
for x = 1:length(mess);
fprintf('\b');
end
%fprintf('Lines parsed: %i',fl);
mess = sprintf('Lines parsed: %i / %i',fl,nol);
fprintf('%s',mess);
%Check execution time - getting slower:
%%{
ext = toc(tval);
mess = sprintf('\nExecution Time: %f\n',ext);
fprintf('%s',mess);
%%}
end
end
ext = toc - exst;
Update: I updated my code above to remove the overloaded operators (disp and lt were replaced with mess and lastt)
Was asked to use the profiler, so I limited to 2000 lines in the table (added && L >=2000 to the while loop) to limit the execution time, and here are the top results from the profiler:
SGAS_Wireshark_Parser_v0p7_fulleth 1 57.110 s 9.714 s
Strcat 9187 29.271 s 13.598 s
Blanks 9187 15.673 s 15.673 s
Uigetfile 1 12.226 s 0.009 s
uitools\private\uigetputfile_helper 1 12.212 s 0.031 s
FileChooser.FileChooser>FileChooser.show 1 12.085 s 0.006s
...er>FileChooser.showPeerAndBlockMATLAB 1 12.056 s 0.001s
...nChooser>FileOpenChooser.doShowDialog 1 12.049 s 12.049 s
hex2dec 44924 2.944 s 2.702 s
num2str 16336 1.139 s 0.550 s
str2double 17356 1.025 s 1.025 s
int2str 16336 0.589 s 0.589 s
fgetl 17356 0.488 s 0.488 s
dec2hex 6126 0.304 s 0.304 s
fliplr 44924 0.242 s 0.242 s
It appears to be strcat calls that are doing it. I only explicitly call strcat on one line. Are some of the other string manipulations I'm doing calling strcat indirectly?
Each loop should be calling strcat the same number of times though, so I still don't understand why it takes longer and longer the more it runs...
also, hex2dec is called a lot, but is not really affecting the time.
But anyway, are there any other methods I can use the combine the strings?

Here is the issue:
The string (an char array in MATLAB) packdata was being resized and reallocated over and over again. That's what was slowing down this code. I did the following steps:
I eliminated the redundant variable packdata and now only use eth.pack.
I preallocated eth.pack and a couple "helper variables" of known lengths by running blanks ONCE for each before the loop ever starts
eth.pack = blanks(604);
thisline = blanks(47);
smline = blanks(32);
(Note: 604 is the maximum possible size of packdata based on headers + MODBUS protocol)
Then I created a pointer variable to point to the location of the last char written to packdata.
pptr = 1;
...
dataline = int16(str2double(MBLine(1:4)));
thisline = MBLine(7:53); %Always 47 characters
smline = [thisline(~isspace(thisline)),blanks(32-sum(~isspace(thisline)))]; %Always 32 Characters
eth.pack(pptr:pptr+31) = smline;
pptr = pptr + 32;
The above was inside the 'if packline > 3' block in place of the 'packdata =' statement, then at the end of the 'if stoppack' block was the reset statement:
pptr = 1; %Reset Pointer
FYI, not surprisingly this brought out other flaws in my code which I've mostly fixed but still need to finish. Not a big issue now as this loop executes lightning fast with these changes. Thanks to Yvon for helping point me in the right direction.
I kept thinking my huge table, MBTable was the issue... but it had nothing to do with it.

Collision Hash Function

Hi all I have a big problem with my hash function. I try to explain my problem :
I have a set of char and I want to do an hash function because I want to change the set with hash set, for each char I have a index , so what I do now :
pair --> index p = 1 index a = 2 index i = 3 index r= 4---> so my hash return 1234
but if for example I have
so --> index s = 12 index o = 34 ---> hash 1234
COLLISION!!!!
P.S. : I cannot order my char in alphabetic number....
So , is there anyone that can help me?? THANKS A LOT :)

You could try the string hashing function of Java. This is my C# port which should be simple ported to c++:
int javaHash(String txt) {
uint h = 0;
if(txt.Length > 0) {
for(int i = 0; i < txt.Length; i++) {
h = 31 * h + txt[i];
}
}
return (int)h;
}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Talend - split a string to n rows - talend

Related

remove duplicates in a table (rexx language)

Passing value with numeric format

Printing the values from the corresponding array

MATLAB Execution Time Increasing

Collision Hash Function

Categories

Resources