Scala: creating combinations by key efficiently

Update
The original input is a text file in which each line has the form
( record_id, element )
( 1 1 2 3 )
( 2 2 5 6 7 )
I read the input file with:
sc.textFile(input)
and then process it into the records array format below.
=========================================================================
Here is a sample of the array I am working with in Scala.
The records array looks like this:
Array(Record_Id , Array(Element) )
( 1 , Array(1,2,3 ) )
( 2 , Array(2,5,6,7) )
....
I wrote a Scala map function to extract a prefix of each array (the length is computed dynamically; here I take the first half of the elements, rounded up):
val prefix = records.map(x => ((x._1, x._2), x._2.take((x._2.size * 0.5).ceil.toInt)))
Which will give me
Array(Record_Id , Array(Element) , Prefix)
( 1 , Array(1,2,3 ) , Array(1,2))
( 2 , Array(2,5,6,7) , Array(2,5))
....
Now I am trying to generate an RDD that uses each prefix element as the key, which would look something like this:
(Prefix , (Record_Id , Array(Element)) )
( 1 , (1 , Array(1,2,3)) )
( 2 , (1 , Array(1,2,3)) )
( 2 , (2 , Array(2,5,6,7)) )
( 5 , (2 , Array(2,5,6,7)) )
....
I tried the following:
val pairedWithKey = prefix.map{case (k,v) => v.map(i => k ->i)}
It works perfectly on a small volume of data.
However, once I use it on a larger dataset, it takes forever to load.
I'm still new to Scala; it would be great if anyone could point me to how to improve the performance of this operation.
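One possible reading of the target output above is that every record should emit one (prefix element, record) pair per element of its prefix, which is a flattening step rather than a nested map. The following is only a minimal sketch of that logic in plain Python, with no Spark involved; the names `prefix_of` and `key_by_prefix` are hypothetical and do not come from the question.

```python
import math
from typing import Iterator, List, Tuple

Record = Tuple[int, List[int]]

def prefix_of(elements: List[int]) -> List[int]:
    """First half of the elements, rounded up (mirrors take(ceil(size * 0.5)))."""
    return elements[: math.ceil(len(elements) * 0.5)]

def key_by_prefix(records: List[Record]) -> Iterator[Tuple[int, Record]]:
    """Yield one (prefix_element, record) pair per prefix element of each record."""
    for record_id, elements in records:
        for p in prefix_of(elements):
            yield p, (record_id, elements)

records = [(1, [1, 2, 3]), (2, [2, 5, 6, 7])]
pairs = list(key_by_prefix(records))
```

In Spark terms this corresponds to a flatMap over the records; whether that alone fixes the slowdown on large inputs is not something this sketch can show.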

Related

replace elements in a 2d array

Given a 2d array
select (ARRAY[[1,2,3], [4,0,0], [7,8,9]]);
{{1,2,3},{4,0,0},{7,8,9}}
Is there a way to replace the slice at [2:2][2:] (the {{0,0}}) with values 5 and 6? array_replace replaces a specific value so I'm not sure how to approach this.
I believe it's more readable to code a function in plpgsql. However, a pure SQL solution also exists:
select (
    select array_agg(inner_array order by outer_index)
    from (
        select outer_index,
            array_agg(
                case
                    when outer_index = 2 and inner_index = 2 then 5
                    when outer_index = 2 and inner_index = 3 then 6
                    else item
                end
                order by inner_index
            ) inner_array
        from (
            select item,
                1 + (n - 1) % array_length(a, 2) inner_index,
                1 + (n - 1) / array_length(a, 2) outer_index
            from
                unnest(a) with ordinality x (item, n)
        ) _
        group by outer_index
    ) _
)
from (
    select (ARRAY[[1,2,3], [4,0,0], [7,8,9]]) a
) _;
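The index arithmetic in the query can be checked outside SQL. unnest flattens the array row by row, so with a given number of columns the 1-based position n maps to row 1 + (n - 1) div columns and column 1 + (n - 1) mod columns. A minimal Python sketch of the same idea follows; `replace_cells` is a hypothetical helper, not a PostgreSQL function.

```python
from typing import Dict, List, Tuple

def replace_cells(grid: List[List[int]],
                  replacements: Dict[Tuple[int, int], int]) -> List[List[int]]:
    """Rebuild a rectangular 2-D list, swapping cells named by 1-based (row, column)."""
    n_cols = len(grid[0])
    # Flatten row by row, the same order unnest produces.
    flat = [item for row in grid for item in row]
    result: List[List[int]] = [[] for _ in grid]
    for n, item in enumerate(flat, start=1):
        outer = 1 + (n - 1) // n_cols   # row index
        inner = 1 + (n - 1) % n_cols    # column index
        result[outer - 1].append(replacements.get((outer, inner), item))
    return result

grid = [[1, 2, 3], [4, 0, 0], [7, 8, 9]]
updated = replace_cells(grid, {(2, 2): 5, (2, 3): 6})
```

Both indexes divide by the column count; this only matters for non-square arrays, where the row count and column count differ.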

Multiple AND statements in a FileMaker calculation

I want to calculate whether a value falls outside of 10% of the last two values added to a database. This calculation does not give me correct feedback when I have 'Weight' values close to 10, or from 100-110. Otherwise it works fine.
Case (
Get(RecordNumber) ≤ 2 ; "Continue Collecting Data" ;
(((GetNthRecord ( Weight ; Get(RecordNumber)-2))*.9) ≤ Weight) and
(((GetNthRecord ( Weight ; Get(RecordNumber)-2))*1.1) ≥ Weight) and
(((GetNthRecord ( Weight ; Get(RecordNumber)-1))*.9) ≤ Weight) and
(((GetNthRecord ( Weight ; Get(RecordNumber)-1))*1.1) ≥ Weight);
"Stable";
"Unstable")
I’m going to start with the assumption that your table includes fields for both the primary key and the creation timestamp. If not, I would highly recommend adding both to this table and every other table you create.
Assuming these fields are in place, you need to create another occurrence of the table on which this layout is based, then relate the primary key to itself via a Cartesian (×) join. Sort the relationship by creation timestamp descending. Your calculation is then:
Case (
( ( GetNthRecord ( Weight ; 1 ) * .9 ) ≤ Weight ) and
( ( GetNthRecord ( Weight ; 1 ) * 1.1 ) ≥ Weight ) and
( ( GetNthRecord ( Weight ; 2 ) * .9 ) ≤ Weight ) and
( ( GetNthRecord ( Weight ; 2 ) * 1.1 ) ≥ Weight ) ;
"Stable" ;
"Unstable" )
Another thing I noticed is that your code is kind of complex. The Let function can make things easier to read, and your four criteria can be cut down to whether either difference is out of range. So, a simpler version becomes:
Let ( [
#weight1 = GetNthRecord ( all::weight ; 1 ) ;
#weight2 = GetNthRecord ( all::weight ; 2 )
] ; //end define Let
Case (
Abs ( #weight1 - weight ) > .1 * #weight1 ; "Unstable" ;
Abs ( #weight2 - weight ) > .1 * #weight2 ; "Unstable" ;
"Stable"
) //end Case
) //end Let
Does that help?
Assuming you are using FileMaker v12 or later, this looks like a good place to use the ExecuteSQL function (not the script step) to retrieve the last two values. You could do something like this:
Let (
sqlQuery = "
SELECT t.weight
FROM MyTable t
WHERE t.id <> ?
ORDER BY t.id DESC
FETCH FIRST 2 ROWS ONLY
" ;
ExecuteSQL ( sqlQuery ; "" ; "" ; MyTable::id )
)
This query assumes you have a unique 'id' field (i.e. a primary key) that's defined as an 'auto-enter serial' value. The WHERE clause makes sure that the current record (presumably the one the user is entering) is not included in the query. The ORDER BY DESC clause forces the last two records to the top where we can fetch the 'weight' values easily into a value list.
Assuming you use a 'Set Variable' script step to put the query results into $lastValues, you can then test for whether they are in range like so:
Let ( [
lastValue1 = GetValue ( $lastValues ; 1 ) ;
lastValue2 = GetValue ( $lastValues ; 2 ) ;
Value1_InRange = lastValue1 - Abs ( lastValue1 - weight ) >= ( 0.9 * lastValue1 ) ;
Value2_InRange = lastValue2 - Abs ( lastValue2 - weight ) >= ( 0.9 * lastValue2 )
] ;
Value1_InRange and Value2_InRange // returns 1 if both values within range, 0 if not
)
If I were doing this, I would put the above range-checking code into a custom function so it's generic and can be easily reused:
IsWithinRange ( valueToTest ; lastValue ; range ) =
lastValue - Abs ( lastValue - valueToTest ) >= ( ( 1 - range ) * lastValue )
Then the above range-checking code can be reduced to:
IsWithinRange ( MyTable::weight ; GetValue ( $lastValues ; 1 ) ; 0.1 ) and
IsWithinRange ( MyTable::weight ; GetValue ( $lastValues ; 2 ) ; 0.1 )
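To make the range test easier to reason about, here is a minimal Python sketch of the same formula. It assumes positive weights, and the name `is_stable` is hypothetical; it simply combines the per-value checks the way the calculation above does.

```python
from typing import Iterable

def is_within_range(value_to_test: float, last_value: float, tolerance: float) -> bool:
    """Same formula as the custom function: true when value_to_test is within
    tolerance (as a fraction) of last_value. Assumes last_value is positive."""
    return last_value - abs(last_value - value_to_test) >= (1 - tolerance) * last_value

def is_stable(weight: float, last_values: Iterable[float], tolerance: float = 0.1) -> bool:
    """True only if the weight is within tolerance of every previous value."""
    return all(is_within_range(weight, last, tolerance) for last in last_values)

print(is_stable(105.0, [100.0, 102.0]))  # both previous values are within 10%
print(is_stable(120.0, [100.0, 102.0]))  # more than 10% away from both
```

Note that the comparison is numeric. If the values being compared are text rather than numbers, the ordering can differ (for example, "9" sorts after "10" as text), so it is worth confirming the field types involved.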
One last note: if you use the ExecuteSQL function in a calculated field, be sure to make it 'unstored' so that it only executes when needed. However, I would recommend you avoid that altogether and simply call it from a script step like 'Set Variable'. That way you can control exactly when it executes.
Hope that helps!

How to extract number from a string

I have a string like 'intercompany creditors {DEMO[[1]]}'. I want to extract only the number from the string; in this example, just '1'.
How to do this in Invantive SQL?
You should be able to do so with substr (get some piece of text from specific positions in the text) and instr (get the position from a specific piece of text inside some other text):
select substr
( d
, instr(d, '[[') + 2
, instr(d, ']]') - instr(d, '[[') - 2
)
from ( select 'intercompany creditors {DEMO[[1]]}' d
from dual#DataDictionary
) x
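The position arithmetic behind the substr/instr call can be sketched in Python for comparison. This is only an illustration of the technique; `between` is a hypothetical helper, and returning an empty string when a marker is missing is an assumption of this sketch, not Invantive SQL behaviour.

```python
def between(text: str, opening: str = "[[", closing: str = "]]") -> str:
    """Return the text between the first opening marker and the next closing marker."""
    start = text.find(opening)
    if start == -1:
        return ""
    # Skip past the opening marker, like instr(d, '[[') + 2 in the query.
    content_start = start + len(opening)
    end = text.find(closing, content_start)
    if end == -1:
        return ""
    return text[content_start:end]

print(between("intercompany creditors {DEMO[[1]]}"))  # prints 1
```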

Postgresql query won't work

Any help appreciated; I have been stuck on this for a week now.
Many thanks if you can.
I added an image of the problem, but it has disappeared.
WITH SITESmin as (
SELECT public.acc.Location_Easting_OSGR, public.acc.Location_Northing_OSGR
FROM acc Sites ,
ORDER BY ( acc.Location_Easting_OSGR - Sites.SITE_ETG ) * ( acc.Location_Easting_OSGR - Sites.SITE_ETG ) + (acc.Location_Northing_OSGR - "public"."Sites"."SITE_ETG" ) * ( acc.Location_Northing_OSGR - "public"."Sites"."SITE_NTG" )
LIMIT 1
)
UPDATE ACC
SET acc.Location_Easting_OSGR = SITESmin.acc.Location_Easting_OSGR,
acc.Location_Northing_OSGR = SITESmin.acc.Location_Northing_OSGR
FROM SITESmin;
Here's the error:
Error : ERROR: syntax error at or near "ORDER"
LINE 4: ORDER BY ( acc.Location_Easting_OSGR - Sites.SITE_ETG )...
The ^ caret appears just after the "LINE 4:" colon.
On a second look, I noticed that this query has several problems. If you are using an alias, then stick to that alias. Either many fields are referenced incorrectly, or the query you posted is missing parts that are not present in your example. The update part also looks like it is missing a where condition.
For example:
SELECT public.acc.Location_Easting_OSGR, public.acc.Location_Northing_OSGR
yet you defined the alias "Sites" (the "as" keyword is optional for table aliases, but writing it makes the alias easier to spot):
FROM acc as Sites
WITH SITESmin as (
SELECT Sites.Location_Easting_OSGR, Sites.Location_Northing_OSGR
FROM acc as Sites --, <--- this comma was causing the error; it does not belong there, or some code is missing
ORDER BY (Sites.Location_Easting_OSGR - Sites.SITE_ETG ) * ( Sites.Location_Easting_OSGR - Sites.SITE_ETG ) + (Sites.Location_Northing_OSGR - Sites.SITE_NTG ) * ( Sites.Location_Northing_OSGR - Sites.SITE_NTG )
LIMIT 1
)
UPDATE ACC
-- PostgreSQL does not accept a table-qualified column name as a SET target
SET Location_Easting_OSGR = SITESmin.Location_Easting_OSGR,
Location_Northing_OSGR = SITESmin.Location_Northing_OSGR
FROM SITESmin
--- missing where condition?
;
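The ORDER BY expression appears to be intended as a squared Euclidean distance between an accident location and a site, with LIMIT 1 picking the closest match. That reading is an assumption, since the question does not state it. Under that assumption, the selection logic can be sketched in Python; the function names and sample coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (easting, northing)

def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; the square root is not needed for ranking."""
    d_east = a[0] - b[0]
    d_north = a[1] - b[1]
    return d_east * d_east + d_north * d_north

def nearest_site(location: Point, sites: Sequence[Point]) -> Point:
    """Equivalent of ORDER BY squared distance LIMIT 1 for a single location."""
    return min(sites, key=lambda site: squared_distance(location, site))

sites = [(0.0, 0.0), (9.0, 12.0), (20.0, 20.0)]
closest = nearest_site((10.0, 10.0), sites)
```

Each term pairs easting with easting and northing with northing. The sketch works per location, which is also why an UPDATE of many rows would normally need some condition tying each row to its own nearest site.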

FMP 14 - Auto Populate a Field based on a calculation

I am using FMP 14 and would like to auto-populate field A based on the following calculation:
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v1" ; "1st" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v2" ; "2nd" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v3" ; "3rd" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v4" ; "4th" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v5" ; "5th" ) or
If ( Get ( ActiveLayoutObjectName ) = "tab_Visits_v6" ; "6th" )
The above code is supposed to auto-populate the value 1st, 2nd, 3rd ... in field A depending on the name of the object the Get (ActiveLayoutObjectName) function returns. I have named each of my tabs, but the calculation is only returning 0.
Any help would be appreciated.
thanks.
The way your calculation is written makes very little sense. Each one of the If() statements returns a result of either "1st", "2nd", etc. or nothing (an empty string). You are then applying or to all these results. Since only one of them can be true, your calculation is essentially doing something like:
"" or "2nd" or "" or "" or "" or ""
which happens to return 1 (true), but has no useful meaning.
You should be using the Case() function here:
Case (
Get ( ActiveLayoutObjectName ) = "tab_Visits_v1" ; "1st" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v2" ; "2nd" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v3" ; "3rd" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v4" ; "4th" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v5" ; "5th" ;
Get ( ActiveLayoutObjectName ) = "tab_Visits_v6" ; "6th"
)
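The Case() call is effectively a lookup from object name to label, returning the first match and nothing otherwise. As a rough illustration only, the same mapping looks like this in Python; `visit_label` is a hypothetical name, and the empty-string default mirrors a Case() with no default result.

```python
ORDINALS = {
    "tab_Visits_v1": "1st",
    "tab_Visits_v2": "2nd",
    "tab_Visits_v3": "3rd",
    "tab_Visits_v4": "4th",
    "tab_Visits_v5": "5th",
    "tab_Visits_v6": "6th",
}

def visit_label(active_object_name: str) -> str:
    """Return the label for the active tab, or an empty string if none matches."""
    return ORDINALS.get(active_object_name, "")

print(visit_label("tab_Visits_v2"))  # prints 2nd
```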
Note also that a calculation field may not always refresh itself when the user switches tabs. This refers to an unstored calculation field; if you are trying to use this as the formula to be auto-entered into a "regular" (e.g. Text) field, it will never update.
Added:
Here is our situation. We see a patient a maximum of 6 times. We have
a tab for each one of those 6 visits.
I would suggest you use a portal to a related table of Visits instead of a tab control. A tab control is designed to display fixed components of user interface - not dynamic data. And certainly not data split into separate records. You should have only one unique record for each patient - and as many records for each patient's visits as may be necessary (potentially unlimited).
If you like, you can use the portal rows as buttons to select a specific visit to view in more detail (similar to a tab control, except that the portal shows the "tabs" as vertical rows). A one-row portal to the same Visits table, filtered by the user selection, would work very well for this purpose, I believe.
If the values were "1.", "2.", ... instead, it would be easy:
Right ( Get ( ActiveLayoutObjectName ) ; 1 ) & "."
Thanks for pointing out that my first version does not work.