I have a problem with an Eloquent statement.
I have two tables:
Table 1 (ORDER_DETAILS)

FORM_ID  ORDER_ID  DETAIL_ID  VALUE
A1       X1        B1         Test
A1       X1        B3         10;20
A2       X2        B10        Test 2
A2       X2        B20        1;2
A2       X2        B30        A;B;C
A3       X3        B200       X;Y;Z
A3       X3        B300       Test 3
A4       X4        B1000      L;M;O
Table 2 (FORM_DETAIL)

FORM_ID  DETAIL_ID  MOD
A1       B1         Text
A1       B2         Input
A1       B3         Select
A2       B10        Input
A2       B20        Select
A2       B30        Select
A3       B100       Text
A3       B200       Select
A3       B300       Input
A4       B1000      Select
A4       B2000      Text
A4       B3000      Text
A4       B4000      Input
A4       B5000      Input
Now I would like to bring them together, for example when I query ORDER_ID X1:
FORM_ID  ORDER_ID  DETAIL_ID  VALUE
A1       X1        B1         Test
A1       X1        B2         null
A1       X1        B3         10;20
or for ORDER_ID X4:
FORM_ID  ORDER_ID  DETAIL_ID  VALUE
A4       X4        B1000      L;M;O
A4       X4        B2000      null
A4       X4        B3000      null
A4       X4        B4000      null
A4       X4        B5000      null
All rows from table 2 (FORM_DETAIL) should always be displayed, and for each one we
check whether a VALUE exists in ORDER_DETAILS for the requested ORDER_ID (otherwise
it is null). Note that FORM_ID and ORDER_ID always belong together: every row of the
result has the same FORM_ID/ORDER_ID pair.
Can you maybe help me?
Sorry, I am still a Laravel beginner :o)
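What is described here is essentially a left join from FORM_DETAIL to ORDER_DETAILS on FORM_ID and DETAIL_ID, with the ORDER_ID filter applied on the ORDER_DETAILS side (in Laravel this would map to the query builder's leftJoin). As a language-neutral sketch of the expected semantics, in plain Python with the sample rows for form A1 hard-coded (all names here are mine, not Laravel API):

```python
order_details = [                     # (FORM_ID, ORDER_ID, DETAIL_ID, VALUE)
    ("A1", "X1", "B1", "Test"),
    ("A1", "X1", "B3", "10;20"),
]
form_detail = [                       # (FORM_ID, DETAIL_ID, MOD)
    ("A1", "B1", "Text"),
    ("A1", "B2", "Input"),
    ("A1", "B3", "Select"),
]

def rows_for_order(order_id):
    # VALUE per (FORM_ID, DETAIL_ID) for this order
    values = {(f, d): v for f, o, d, v in order_details if o == order_id}
    # FORM_ID belonging to this order (the two IDs always come in pairs)
    form_id = next(f for f, o, d, v in order_details if o == order_id)
    # Left join: every FORM_DETAIL row of the form appears,
    # with VALUE = None where ORDER_DETAILS has no matching row
    return [(form_id, order_id, d, values.get((form_id, d)))
            for f, d, mod in form_detail if f == form_id]

print(rows_for_order("X1"))
```

The three printed rows match the expected X1 result above, with None standing in for null.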
I have a huge PySpark DataFrame with segments and their subsegments, like this:
SegmentId SubSegmentStart SubSegmentEnd
1 a1 a2
1 a2 a3
2 b1 b2
3 c1 c2
3 c3 c4
3 c2 c3
I need to group the records by SegmentId and add a new column, Index, to build the chain of subsegments using their start and end points. I need to do this for each segment.
So I need to get the following dataframe:
SegmentId SubSegmentStart SubSegmentEnd Index
1 a1 a2 0
1 a2 a3 1
2 b1 b2 0
3 c1 c2 0
3 c3 c4 2
3 c2 c3 1
How can I do this in PySpark?
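Since the rows of one segment form a chain (each row's SubSegmentStart equals the previous row's SubSegmentEnd), the index can be computed per group by starting at the point that never occurs as an end and then following the chain. A plain-Python sketch of that per-segment logic, which could then be applied per group in PySpark (e.g. via groupBy(...).applyInPandas); the helper name is mine:

```python
def chain_indices(rows):
    """rows: list of (start, end) pairs for one SegmentId.
    Returns {(start, end): index} following the chain of subsegments."""
    starts = {s for s, e in rows}
    ends = {e for s, e in rows}
    # The head of the chain begins at the point that never appears as an end
    current = next(iter(starts - ends))
    by_start = {s: (s, e) for s, e in rows}
    index = {}
    for i in range(len(rows)):
        pair = by_start[current]
        index[pair] = i
        current = pair[1]        # follow the chain: next start = this end
    return index

# Segment 3 from the question, in its original (unordered) row order
print(chain_indices([("c1", "c2"), ("c3", "c4"), ("c2", "c3")]))
```

This assumes each segment is a single unbroken chain, as in the sample data.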
How can I get an additional column of type string using functional select (?)?
I tried this:
t:([]c1:`a`b`c;c2:1 2 3)
?[t;();0b;`c1`c2`c3!(`c1;`c2;10)] / ok
?[t;();0b;`c1`c2`c3!(`c1;`c2;enlist(`abc))] / ok
?[t;();0b;`c1`c2`c3!(`c1;`c2;"10")] / 'length
?[t;();0b;`c1`c2`c3!(`c1;`c2;enlist("10"))] / 'length
but the string versions fail with a 'length error.
Your first case works because an atom will automatically expand to the required length. For a compound column you'll need to explicitly generate the correct length, as follows:
q)select c1,c2,c3:`abc,c4:10,c5:count[i]#enlist"abc" from t
c1 c2 c3 c4 c5
------------------
a 1 abc 10 "abc"
b 2 abc 10 "abc"
c 3 abc 10 "abc"
// in functional form
q)?[t;();0b;`c1`c2`c3!(`c1;`c2;(#;(count;`i);(enlist;"abc")))]
c1 c2 c3
-----------
a 1 "abc"
b 2 "abc"
c 3 "abc"
Jason
Input Data:
key,date,value
10,20180701,a10
11,20180702,a11
12,20180702,a12
13,20180702,a13
14,20180702,a14
15,20180702,a15
16,20180702,a16
17,20180702,a17
18,20180702,a18
19,20180702,a19
1 ,20180701,a1
2 ,20180701,a2
3 ,20180701,a3
4 ,20180701,a4
5 ,20180701,a5
6 ,20180701,a6
7 ,20180701,a7
8 ,20180701,a8
9 ,20180701,a9
Code:
val rawData = sc.textFile(.....)
val datadf: DataFrame = rawData.toDF()
After reading the data into a DataFrame with columns key, date, value:
datadf.coalesce(1).orderBy(desc("key")).drop(col("key")).write.mode("overwrite").partitionBy("date").text("hdfs://path/")
I am trying to order by the key column and drop that column before saving to HDFS (into a single file for each day).
I am not able to preserve the order in the output files.
If I don't use coalesce, the order is preserved, but multiple files are generated.
Output:
/20180701/part-xxxxxxx.txt
a1
a9
a6
a4
a5
a3
a7
a8
a2
a10
/20180702/part-xxxxxxx.txt
a18
a12
a13
a19
a15
a16
a17
a11
a14
Expected OP:
/20180701/part-xxxxxxx.txt
a1
a2
a3
a4
a5
a6
a7
a8
a9
a10
/20180702/part-xxxxxxx.txt
a11
a12
a13
a14
a15
a16
a17
a18
a19
The following code should get you started (this uses Spark 2.1):
import org.apache.spark.sql.types.StructType
import spark.implicits._  // for the $"col" syntax outside spark-shell

val schema = new StructType().add($"key".int).add($"date".string).add($"value".string)
val df = spark.read.schema(schema).option("header", "true").csv("source.txt")
df.coalesce(1).orderBy("key").drop("key").write.mode("overwrite").partitionBy("date").csv("hdfs://path/")
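One thing the explicit schema above changes: without it, key would be read as a string and sorted lexicographically rather than numerically, which alone scrambles keys like 9 vs 10. A quick plain-Python illustration of the difference:

```python
keys = ["10", "1", "9", "2"]
print(sorted(keys))           # lexicographic string order: ['1', '10', '2', '9']
print(sorted(keys, key=int))  # numeric order: ['1', '2', '9', '10']
```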
I have a report with 4 columns:
ColumnA|ColumnB|ColumnC|ColumnD
Row1 A1 B1 C1 D1
Row2 A1 B1 C1 D2
Row3 A1 B1 C1 D1
Row4 A1 B1 C1 D2
Row5 A1 B1 C1 D1
I tried grouping based on the 4 columns, but I got output with a blank space after every row.
What I would actually like in this report is the following output:
ColumnA|ColumnB|ColumnC|ColumnD
Row1 A1 B1 C1 D1
Row2 A1 B1 C1 D2
<-------------an empty space ----------->
Row3 A1 B1 C1 D1
Row4 A1 B1 C1 D2
<-------------an empty space ----------->
Row5 A1 B1 C1 D1
How can I achieve the above output?
A standard group by would sort the records like this:
ColumnA|ColumnB|ColumnC|ColumnD
Row1 A1 B1 C1 D1
Row3 A1 B1 C1 D1
Row5 A1 B1 C1 D1
Row2 A1 B1 C1 D2
Row4 A1 B1 C1 D2
Since you don't have a standard grouping, another approach may work. You basically want a blank line after the D2 value. This will only work if you always have D2 values at the end of a group.
Create a new blank detail section under the main section
Detail one
A1 B1 C1 D1
Detail two
<blank>
Then put a conditional suppress expression on detail two
ColumnD <> "D2"
Then whenever D2 is present the blank detail section will be displayed.
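The suppression logic above amounts to: print every row, and print a blank line only after rows where ColumnD = "D2". A plain-Python sketch of just that logic (not Crystal syntax), using the sample rows from the question:

```python
rows = [
    ("A1", "B1", "C1", "D1"),
    ("A1", "B1", "C1", "D2"),
    ("A1", "B1", "C1", "D1"),
    ("A1", "B1", "C1", "D2"),
    ("A1", "B1", "C1", "D1"),
]

lines = []
for a, b, c, d in rows:
    lines.append(f"{a} {b} {c} {d}")
    if d == "D2":          # detail two is suppressed unless ColumnD = "D2"
        lines.append("")   # the blank detail section
print("\n".join(lines))
```

As the answer notes, this only produces the desired grouping if a D2 row always ends a group.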
You can use a formula instead of a field value for grouping:

Select {ColumnD}
Case "D1" : "Group1"
Case "D2" : "Group2"
Case "D3" : "Group3"
Case "D4" : "Group3"
Case "D5" : "Group3"
Case "D6" : "Group4"
Default : "Group5"
Is that your problem? The blank lines can then be generated as a Group Footer.