I have a list of IDs with different patterns; some of them have 4 characters, others 9 characters, etc.
I need to add a leading 0 only to the IDs with 9 characters, without affecting the other items.
With this code I am adding '0' to all items in the list:
df = df.withColumn('ID', F.lpad(F.col('ID'), 10, '0'))
Many thanks!
I got it:
df.withColumn(
    'RESULT',
    F.when(F.length(df['ID']) == 9, F.lpad(df['ID'], 10, '0')).otherwise(df['ID'])
).show(100, False)
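For illustration, a minimal runnable sketch of the same when/otherwise approach on a toy DataFrame (the sample IDs are made up):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
# hypothetical sample data: a 4-character ID and a 9-character ID
df = spark.createDataFrame([("abcd",), ("123456789",)], ["ID"])

# pad to 10 characters only when the ID is exactly 9 characters long
df.withColumn(
    'RESULT',
    F.when(F.length(df['ID']) == 9, F.lpad(df['ID'], 10, '0')).otherwise(df['ID'])
).show()
# 'abcd' stays as-is; '123456789' becomes '0123456789'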
The Postgres documentation seems to only show how to check whether two arrays have overlapping elements.
I need to know whether two arrays have non-overlapping elements.
So if I have 2 arrays:
array_1 = '{1, 2, 3, 4}'
array_2 = '{0, 3, 4, 5}'
This should return false. Checking for inequality doesn't work, because the arrays may be unequal even when they share elements, e.g. when one array repeats the same integer several times.
Is this comparison possible?
Two sets are non-overlapping if they have no elements in common, so using the && (overlaps) operator for arrays and negating the result gives you what you want.
# select NOT (ARRAY[1,2,3,4] && ARRAY[0,3,3,3,3,4,5]) AS non_overlapping;
non_overlapping
-----------------
f
(1 row)
# select NOT (ARRAY[1,2,8,9] && ARRAY[0,3,3,3,3,4,5]) AS non_overlapping;
non_overlapping
-----------------
t
(1 row)
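The same negated && check works directly against table columns; a hypothetical sketch (the table and column names are made up):

-- 'my_arrays' with integer-array columns 'array_1' and 'array_2' is hypothetical
SELECT id, NOT (array_1 && array_2) AS non_overlapping
FROM my_arrays;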
I'm trying to add leading zeros to 'Record Number' special field provided by Crystal Reports 13.
E.g.:
Record Number 1 should be '001'
Record Number 20 should be '020'
I have noticed that there's a related post about customizing table fields by using ToText({table.field},"000"). But this approach doesn't work when I use {recordnumber} instead of {table.field}.
Create a new formula for your desired field, then add this:
Right(("000" + ToText(({Comand.YourField}), 0, "")), 3)
That's it. (Note that if you need more digits you can edit the formula; for 10 digits it would be Right(("0000000000" + ToText(({Comand.YourField}), 0, "")), 10), and so on.)
Since you clarified that you want RecordNumber, use it this way:
Right(("000" + ToText((RecordNumber), 0, "")), 3)
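To see why this works, take RecordNumber = 20: ToText(20, 0, "") gives "20", "000" + "20" gives "00020", and Right("00020", 3) keeps the last three characters, "020".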
Using Spark 2.1.1, I have an N-row CSV as 'fileInput':
colname datatype elems start end
colA float 10 0 1
colB int 10 0 9
I have successfully made an array of sql.rows ...
val df = spark.read.format("com.databricks.spark.csv").option("header", "true").load(fileInput)
val rowCnt:Int = df.count.toInt
val aryToUse = df.take(rowCnt)
Array[org.apache.spark.sql.Row] = Array([colA,float,10,0,1], [colB,int,10,0,9])
Against those Rows and using my random-value-generator scripts, I have successfully populated an empty ListBuffer[Any] ...
res170: scala.collection.mutable.ListBuffer[Any] = ListBuffer(List(0.24455154, 0.108798146, 0.111522496, 0.44311434, 0.13506883, 0.0655781, 0.8273762, 0.49718297, 0.5322746, 0.8416396), List(1, 9, 3, 4, 2, 3, 8, 7, 4, 6))
Now I have a mixed-type ListBuffer[Any] containing differently typed lists.
How do I iterate through and zip these? [Any] seems to defy mapping/zipping. I need to take the N lists generated by the inputFile's definitions, then save them to a CSV file. The final output should be:
ColA, ColB
0.24455154, 1
0.108798146, 9
0.111522496, 3
... etc
The inputFile can then be used to create any number of 'colname's, of any 'datatype' (I have scripts for that), with each type appearing 1 to n times, and any number of rows (defined by 'elems'). My random-generating scripts customize the values per 'start' & 'end', but those columns are not relevant to this question.
Given a List[List[Any]], you can "zip" all these lists together using transpose, if you don't mind the result being a list-of-lists instead of a list of Tuples:
val result: Seq[List[Any]] = list.transpose
If you then want to write this into a CSV, you can start by mapping each "row" into a comma-separated String:
val rows: Seq[String] = result.map(_.mkString(","))
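If you then want an actual file, a minimal sketch using plain java.io (the header and output path are assumptions based on the desired output above):

import java.io.PrintWriter

val out = new PrintWriter("output.csv")   // hypothetical output path
try {
  out.println("ColA,ColB")                // assumed header, per the desired output
  rows.foreach(out.println)
} finally {
  out.close()
}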
(note: I'm ignoring the Apache Spark part, which seems completely irrelevant to this question... the "metadata" is loaded via Spark, but then it's collected into an Array so it becomes irrelevant)
I think the RDD.zipWithUniqueId() or RDD.zipWithIndex() methods can do what you want.
Please refer to the official documentation for more information. Hope this helps.
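For reference, a tiny sketch of what zipWithIndex() produces (the toy RDD is an assumption, not the asker's data):

// assumes an existing SparkSession named `spark`
val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))
rdd.zipWithIndex().collect()   // Array((a,0), (b,1), (c,2))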
The current raw data:
1-2-05.11
1-15-05.20
How can I remove everything after the dot? The expected result is 1-2-05. I tried split_part and also substring, but the result does not fit the requirement.
Any suggestions?
Try this. I assume that your expected output is 1-2-05. (with the trailing .).
Using split_part():
SELECT SPLIT_PART('1-2-05.11','.',1)||'.';
Using substring():
SELECT SUBSTRING('1-15-05.20', 1, LENGTH('1-15-05.20') - 2)
1 is the starting position (from the left) in the string (1-15-05.20) from which the substring is taken.
LENGTH('1-15-05.20') - 2 defines the number of characters to extract. The string 1-15-05.20 has a length() of 10, and you need to remove the last two characters, so 10 - 2, i.e. LENGTH('1-15-05.20') - 2.
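Applied to a table column rather than a literal, the split_part() variant might look like this (the table and column names are hypothetical); unlike the SUBSTRING version, it does not assume exactly two characters after the dot:

SELECT SPLIT_PART(raw_value, '.', 1) || '.' AS trimmed
FROM my_table;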
My question is probably really easy, but I am a Mathematica beginner.
I have a dataset, let's say:
Column 1: numbers from 1 to 10
Column 2: signs
Column 3: other signs
{{1,2,3,4,5,6,7,8,9,10},{d,t,4,/,g,t,w,o,p,m},{g,h,j,k,l,s,d,e,w,q}}
Now I want to extract all rows for which column 1 contains an odd number. In other words, I want to create a new dataset.
I tried to work with Select and OddQ as well as with the If function, but I have absolutely no clue how to put these commands together in the right way!
Taking a stab at what you might be asking...
(table = {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
   Characters["abcdefghij"],
   Characters["ABCDEFGHIJ"]}) // MatrixForm

table[[All, 1 ;; -1 ;; 2]] // MatrixForm   (* keep every other 'column': parts 1, 3, 5, ... *)
or perhaps this:
Select[table, OddQ[#[[1]]] &]
{{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}}
Note that the convention in Mathematica is the reverse of what you use in your description:
rows are the first-level sublists.
Let's take your original data
mytable = {{1,2,3,4,5,6,7,8,9,10},{d,t,4,"/",g,t,w,o,p,m},{g,h,j,k,l,s,d,e,w,q}}
Just as you suggested, Select and OddQ can do what you want, but on your table transposed. So we transpose first and then transpose back:
Transpose[Select[Transpose[mytable], OddQ[First[#]]& ]]
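For the mytable above, this evaluates to:

{{1, 3, 5, 7, 9}, {d, 4, g, w, p}, {g, j, l, d, w}}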
Another way:
Mathematica's functional command MapThread can work on parallel lists.
DeleteCases[MapThread[If[OddQ[#1], {##}] &, mytable], Null]
The inner function of MapThread receives all elements of what you call a 'row' as arguments (#1, #2, etc.). It tests the first column and outputs all columns, or Null if the test fails. The enclosing DeleteCases then removes the non-matching 'rows'.
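For the sample mytable, this returns the matching 'rows' in row orientation:

{{1, d, g}, {3, 4, j}, {5, g, l}, {7, w, d}, {9, p, w}}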