How can I read from port IO in Simics?

I can see from help-search that there's a "<port_space>.read", but I don't know how to find the <port_space> name for the CPU.

The port_space attribute of a processor core points to the memory space object that is used for port accesses. You can access the memory space object directly using get/set (for non-architectural accesses) or read/write (for architectural accesses), for example:
simics> board.mb.cpu0.core[0][0]->port_space
"board.mb.cpu0.ports_proxy[0][0]"
simics> board.mb.cpu0.ports_proxy[0][0].write 0xcf9 0xff size=1
[board.mb.nb.pci_bus info] sending hot_reset
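For the read in the original question, the corresponding read command on the same memory space object should work the same way (a sketch that simply reuses the address from the write example; the value returned depends on your target):
simics> board.mb.cpu0.ports_proxy[0][0].read 0xcf9 size=1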
You can also traverse the memory space to find out what exactly is mapped there:
simics> board.mb.cpu0.ports_proxy[0][0].map
+---------+-------------------+--+------+------+------+----+-----+----+
| Base|Object |Fn|Offset|Length|Target|Prio|Align|Swap|
+---------+-------------------+--+------+------+------+----+-----+----+
|-default-|board.mb.port_mem_m| | 0x0| | | | | |
+---------+-------------------+--+------+------+------+----+-----+----+
simics> board.mb.port_mem_m.map
+----+-----------------+--+------+-------+------+----+-----+----+
|Base|Object |Fn|Offset| Length|Target|Prio|Align|Swap|
+----+-----------------+--+------+-------+------+----+-----+----+
| 0x0|board.mb.port_mem| | 0x0|0x10000| | 0| | |
+----+-----------------+--+------+-------+------+----+-----+----+
simics> board.mb.port_mem.map
+---------+-------------------------------+--+------+------+------+----+-----+----+
| Base|Object |Fn|Offset|Length|Target|Prio|Align|Swap|
+---------+-------------------------------+--+------+------+------+----+-----+----+
| 0x402|board.mb.conf | | 0x0| 0x1| | 0| | |
| 0x510|board.mb.conf | 3| 0x0| 0x2| | 0| | |
| 0x511|board.mb.conf | 4| 0x0| 0x1| | 0| | |
| 0xcf8|board.mb.nb.bridge.bank.io_regs| | 0xcf8| 0x4| | 0| | |
| 0xcf9|board.mb.sb.cf9 | | 0x0| 0x1| | 0| | |
| 0xcfc|board.mb.nb.bridge.bank.io_regs| | 0xcfc| 0x4| | 0| | |
| 0xcfd|board.mb.nb.bridge.bank.io_regs| | 0xcfd| 0x2| | 0| | |
| 0xcfe|board.mb.nb.bridge.bank.io_regs| | 0xcfe| 0x2| | 0| | |
| 0xcff|board.mb.nb.bridge.bank.io_regs| | 0xcff| 0x1| | 0| | |
| 0xfff0|board.mb.conf | | 0x0| 0x1| | 0| | |
| 0xfff1|board.mb.conf | 1| 0x0| 0x1| | 0| | |
| 0xfff2|board.mb.conf | 2| 0x0| 0x2| | 0| | |
| 0xfff4|board.mb.shadow | | 0x0| 0x1| | 0| | |
| 0xfff5|board.mb.shadow | | 0x1| 0x1| | 0| | |
|-default-|board.mb.nb.pci_bus.io_space | | 0x0| | | | | |
+---------+-------------------------------+--+------+------+------+----+-----+----+

To find a port_space object, you can try the following command in the CLI (command-line interface) window:
simics> list-objects iface = port_space -all

Related

PySpark Windows Function with Conditional Reset

I have a dataframe like this
| user_id | activity_date |
| -------- | ------------ |
| 49630701 | 1/1/2019 |
| 49630701 | 1/10/2019 |
| 49630701 | 1/28/2019 |
| 49630701 | 2/5/2019 |
| 49630701 | 3/10/2019 |
| 49630701 | 3/21/2019 |
| 49630701 | 5/25/2019 |
| 49630701 | 5/28/2019 |
| 49630701 | 9/10/2019 |
| 49630701 | 1/1/2020 |
| 49630701 | 1/10/2020 |
| 49630701 | 1/28/2020 |
| 49630701 | 2/10/2020 |
| 49630701 | 3/10/2020 |
What I need to create is the "Group" column. The logic: for every user, keep the same group number while the cumulative date difference is at most 30 days; whenever the cumulative date difference exceeds 30 days, increment the group number and reset the cumulative date difference to zero.
| user_id | activity_date | Group |
| -------- | ------------ | ----- |
| 49630701 | 1/1/2019 | 1 |
| 49630701 | 1/10/2019 | 1 |
| 49630701 | 1/28/2019 | 1 |
| 49630701 | 2/5/2019 | 2 | <- Cumulative date diff till here is 35, which is greater than 30, so increment the Group by 1 and reset the cumulative diff to 0
| 49630701 | 3/10/2019 | 3 |
| 49630701 | 3/21/2019 | 3 |
| 49630701 | 5/25/2019 | 4 |
| 49630701 | 5/28/2019 | 4 |
| 49630701 | 9/10/2019 | 5 |
| 49630701 | 1/1/2020 | 6 |
| 49630701 | 1/10/2020 | 6 |
| 49630701 | 1/28/2020 | 6 |
| 49630701 | 2/10/2020 | 7 |
| 49630701 | 3/10/2020 | 7 |
I tried the loop-based code below, but it is not efficient and runs for hours. Is there a better way to achieve this? Any help would be really appreciated.
df = spark.read.table('excel_file')
df1 = df.select(col("user_id"), col("activity_date")).distinct()
partitionWindow = Window.partitionBy("user_id").orderBy(col("activity_date").asc())
lagTest = lag(col("activity_date"), 1, "0000-00-00 00:00:00").over(partitionWindow)
df1 = df1.select(col("*"), (datediff(col("activity_date"),lagTest)).cast("int").alias("diff_val_with_previous"))
df1 = df1.withColumn('diff_val_with_previous', when(col('diff_val_with_previous').isNull(), lit(0)).otherwise(col('diff_val_with_previous')))
distinctUser = [i['user_id'] for i in df1.select(col("user_id")).distinct().collect()]
rankTest = rank().over(partitionWindow)
df2 = df1.select(col("*"), rankTest.alias("rank"))
interimSessionThreshold = 30
totalSessionTimeThreshold = 30
rowList = []
for x in distinctUser:
    tempDf = df2.filter(col("user_id") == x).orderBy(col('activity_date'))
    cumulDiff = 0
    group = 1
    startBatch = True
    len_df = tempDf.count()
    dp = 0
    for i in range(1, len_df+1):
        r = tempDf.filter(col("rank") == i)
        dp = r.select("diff_val_with_previous").first()[0]
        cumulDiff += dp
        if ((dp <= interimSessionThreshold) & (cumulDiff <= totalSessionTimeThreshold)):
            startBatch = False
            rowList.append([r.select("user_id").first()[0], r.select("activity_date").first()[0], group])
        else:
            group += 1
            cumulDiff = 0
            startBatch = True
            dp = 0
            rowList.append([r.select("user_id").first()[0], r.select("activity_date").first()[0], group])
ddf = spark.createDataFrame(rowList, ['user_id', 'activity_date', 'group'])
I can think of two solutions, but neither of them matches exactly what you want:
from pyspark.sql import functions as F, Window
df.withColumn(
"idx", F.monotonically_increasing_id()
).withColumn(
"date_as_num", F.unix_timestamp("activity_date")
).withColumn(
"group", F.min("idx").over(Window.partitionBy('user_id').orderBy("date_as_num").rangeBetween(- 60 * 60 * 24 * 30, 0))
).withColumn(
"group", F.dense_rank().over(Window.partitionBy("user_id").orderBy("group"))
).show()
+--------+-------------+----------+-----------+-----+
| user_id|activity_date| idx|date_as_num|group|
+--------+-------------+----------+-----------+-----+
|49630701| 2019-01-01| 0| 1546300800| 1|
|49630701| 2019-01-10| 1| 1547078400| 1|
|49630701| 2019-01-28| 2| 1548633600| 1|
|49630701| 2019-02-05| 3| 1549324800| 2|
|49630701| 2019-03-10| 4| 1552176000| 3|
|49630701| 2019-03-21| 5| 1553126400| 3|
|49630701| 2019-05-25| 6| 1558742400| 4|
|49630701| 2019-05-28|8589934592| 1559001600| 4|
|49630701| 2019-09-10|8589934593| 1568073600| 5|
|49630701| 2020-01-01|8589934594| 1577836800| 6|
|49630701| 2020-01-10|8589934595| 1578614400| 6|
|49630701| 2020-01-28|8589934596| 1580169600| 6|
|49630701| 2020-02-10|8589934597| 1581292800| 7|
|49630701| 2020-03-10|8589934598| 1583798400| 8|
+--------+-------------+----------+-----------+-----+
or
df.withColumn(
"group",
F.datediff(
F.col("activity_date"),
F.lag("activity_date").over(
Window.partitionBy("user_id").orderBy("activity_date")
),
),
).withColumn(
"group", F.sum("group").over(Window.partitionBy("user_id").orderBy("activity_date"))
).withColumn(
"group", F.floor(F.coalesce(F.col("group"), F.lit(0)) / 30)
).withColumn(
"group", F.dense_rank().over(Window.partitionBy("user_id").orderBy("group"))
).show()
+--------+-------------+-----+
| user_id|activity_date|group|
+--------+-------------+-----+
|49630701| 2019-01-01| 1|
|49630701| 2019-01-10| 1|
|49630701| 2019-01-28| 1|
|49630701| 2019-02-05| 2|
|49630701| 2019-03-10| 3|
|49630701| 2019-03-21| 3|
|49630701| 2019-05-25| 4|
|49630701| 2019-05-28| 4|
|49630701| 2019-09-10| 5|
|49630701| 2020-01-01| 6|
|49630701| 2020-01-10| 6|
|49630701| 2020-01-28| 7|
|49630701| 2020-02-10| 7|
|49630701| 2020-03-10| 8|
+--------+-------------+-----+
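If you need the exact cumulative-reset semantics, one way to avoid the row-by-row driver loop is to push the per-user sequential scan into a grouped pandas UDF. This is only a sketch: it assumes Spark 3.0+ with pandas available on the workers, that activity_date is already a date column, that df1 is the distinct user_id/activity_date frame from the question, and the output schema should be adjusted to your actual column types.
from pyspark.sql import types as T
import pandas as pd

out_schema = T.StructType([
    T.StructField("user_id", T.LongType()),
    T.StructField("activity_date", T.DateType()),
    T.StructField("group", T.IntegerType()),
])

def assign_groups(pdf: pd.DataFrame) -> pd.DataFrame:
    # Sequential scan per user: accumulate day differences and bump the group
    # number (resetting the accumulator) whenever the total exceeds 30 days.
    pdf = pdf.sort_values("activity_date").reset_index(drop=True)
    groups, cumul, group, prev = [], 0, 1, None
    for d in pdf["activity_date"]:
        if prev is not None:
            cumul += (d - prev).days
        if cumul > 30:
            group += 1
            cumul = 0
        groups.append(group)
        prev = d
    pdf["group"] = groups
    return pdf[["user_id", "activity_date", "group"]]

result = df1.groupBy("user_id").applyInPandas(assign_groups, schema=out_schema)
This reproduces the expected grouping above (group 2 at 2/5/2019, group 7 at 2/10/2020), since the per-user scan stays inside a single pandas batch instead of issuing one Spark job per row.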

pyspark check whether each name has 3 data

In pyspark, I have a DataFrame as follows. I want to check whether each name has data for all 3 actions (0, 1, 2). If any are missing, add a new row with the score column set to 0 and the other columns unchanged (e.g. str1, str2, str3).
+-----+--------+--------+--------+-------+-------+
| name| str1 | str2 | str3 | action| score |
+-----+--------+--------+--------+-------+-------+
| A | str_A1 | str_A2 | str_A3 | 0| 2|
| A | str_A1 | str_A2 | str_A3 | 1| 6|
| A | str_A1 | str_A2 | str_A3 | 2| 74|
| B | str_B1 | str_B2 | str_B3 | 0| 59|
| B | str_B1 | str_B2 | str_B3 | 1| 18|
| C | str_C1 | str_C2 | str_C3 | 0| 3|
| C | str_C1 | str_C2 | str_C3 | 1| 33|
| C | str_C1 | str_C2 | str_C3 | 2| 3|
+-----+--------+--------+--------+-------+-------+
For example, name B has no action 2, so add a new row as follows:
+-----+--------+--------+--------+-------+-------+
| name| str1 | str2 | str3 | action| score |
+-----+--------+--------+--------+-------+-------+
| A | str_A1 | str_A2 | str_A3 | 0| 2|
| A | str_A1 | str_A2 | str_A3 | 1| 6|
| A | str_A1 | str_A2 | str_A3 | 2| 74|
| B | str_B1 | str_B2 | str_B3 | 0| 59|
| B | str_B1 | str_B2 | str_B3 | 1| 18|
| B | str_B1 | str_B2 | str_B3 | 2| 0|<---- new row data
| C | str_C1 | str_C2 | str_C3 | 0| 3|
| C | str_C1 | str_C2 | str_C3 | 1| 33|
| C | str_C1 | str_C2 | str_C3 | 2| 3|
+-----+--------+--------+--------+-------+-------+
It is also possible that there is only one row for a name, in which case two new rows need to be added.
+-----+--------+--------+--------+-------+-------+
| name| str1 | str2 | str3 | action| score |
+-----+--------+--------+--------+-------+-------+
| A | str_A1 | str_A2 | str_A3 | 0| 2|
| A | str_A1 | str_A2 | str_A3 | 1| 6|
| A | str_A1 | str_A2 | str_A3 | 2| 74|
| B | str_B1 | str_B2 | str_B3 | 0| 59|
| B | str_B1 | str_B2 | str_B3 | 1| 18|
| B | str_B1 | str_B2 | str_B3 | 2| 0|
| C | str_C1 | str_C2 | str_C3 | 0| 3|
| C | str_C1 | str_C2 | str_C3 | 1| 33|
| C | str_C1 | str_C2 | str_C3 | 2| 3|
| D | str_D1 | str_D2 | str_D3 | 0| 45|
+-----+--------+--------+--------+-------+-------+
+-----+--------+--------+--------+-------+-------+
| name| str1 | str2 | str3 | action| score |
+-----+--------+--------+--------+-------+-------+
| A | str_A1 | str_A2 | str_A3 | 0| 2|
| A | str_A1 | str_A2 | str_A3 | 1| 6|
| A | str_A1 | str_A2 | str_A3 | 2| 74|
| B | str_B1 | str_B2 | str_B3 | 0| 59|
| B | str_B1 | str_B2 | str_B3 | 1| 18|
| B | str_B1 | str_B2 | str_B3 | 2| 0|
| C | str_C1 | str_C2 | str_C3 | 0| 3|
| C | str_C1 | str_C2 | str_C3 | 1| 33|
| C | str_C1 | str_C2 | str_C3 | 2| 3|
| D | str_D1 | str_D2 | str_D3 | 0| 45|
| D | str_D1 | str_D2 | str_D3 | 1| 0|<---- new row data
| D | str_D1 | str_D2 | str_D3 | 2| 0|<---- new row data
+-----+--------+--------+--------+-------+-------+
I am new to pyspark and don't know how to do this operation.
Thank you for your help.
Solution with a UDF
from pyspark.sql import functions as F, types as T
@F.udf(T.MapType(T.StringType(), T.IntegerType()))
def add_missing_values(values):
    return {i: values.get(i, 0) for i in range(3)}
df = (
df.groupBy("name", "str1", "str2", "str3")
.agg(
F.map_from_entries(F.collect_list(F.struct("action", "score"))).alias("values")
)
.withColumn("values", add_missing_values(F.col("values")))
.select(
"name", "str1", "str2", "str3", F.explode("values").alias("action", "score")
)
)
df.show()
+----+------+------+------+------+-----+
|name| str1| str2| str3|action|score|
+----+------+------+------+------+-----+
| A|str_A1|str_A2|str_A3| 0| 2|
| A|str_A1|str_A2|str_A3| 1| 6|
| A|str_A1|str_A2|str_A3| 2| 74|
| B|str_B1|str_B2|str_B3| 0| 59|
| B|str_B1|str_B2|str_B3| 1| 18|
| B|str_B1|str_B2|str_B3| 2| 0|<---- new row data
| C|str_C1|str_C2|str_C3| 0| 3|
| C|str_C1|str_C2|str_C3| 1| 33|
| C|str_C1|str_C2|str_C3| 2| 3|
| D|str_D1|str_D2|str_D3| 0| 45|
| D|str_D1|str_D2|str_D3| 1| 0|<---- new row data
| D|str_D1|str_D2|str_D3| 2| 0|<---- new row data
+----+------+------+------+------+-----+
Full Spark solution:
df = (
df.groupBy("name", "str1", "str2", "str3")
.agg(
F.map_from_entries(F.collect_list(F.struct("action", "score"))).alias("values")
)
.withColumn(
"values",
F.map_from_arrays(
F.array([F.lit(i) for i in range(3)]),
F.array(
[F.coalesce(F.col("values").getItem(i), F.lit(0)) for i in range(3)]
),
),
)
.select(
"name", "str1", "str2", "str3", F.explode("values").alias("action", "score")
)
)
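A pure-DataFrame alternative is to build the full (name, action) grid with a cross join and left-join the original scores back, filling the missing combinations with 0. This is a sketch, assuming an active spark session and the df from the question:
from pyspark.sql import functions as F

# One row per expected action value (0, 1, 2).
actions = spark.range(3).select(F.col("id").cast("int").alias("action"))

# Every (name, str1, str2, str3) paired with every action, then the original
# scores joined back; combinations that were missing get score 0.
result = (
    df.select("name", "str1", "str2", "str3").distinct()
      .crossJoin(actions)
      .join(df, ["name", "str1", "str2", "str3", "action"], "left")
      .withColumn("score", F.coalesce(F.col("score"), F.lit(0)))
)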

filter on data which are numeric

Hi, I have a dataframe with a column CODEARTICLE. Here is the dataframe:
|CODEARTICLE| STRUCTURE| DES|TYPEMARK|TYP|IMPLOC|MARQUE|GAMME|TAR|
+-----------+-------------+--------------------+--------+---+------+------+-----+---+
| GENCFFRIST|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFMARC|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFESCO|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFTNA|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFEMBA|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| 789600010|9999999999998|xxxxxxxxxxxxxxxxx...| 7| 1| Local| | | |
| 799700040|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 799701000|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 899980490|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 9| Local| | | |
| 429600010|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 559970040|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| 679500010|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 679500040|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 679500060|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
+-----------+-------------+--------------------+--------+---+------+------+-----+---+
I would like to keep only the rows having a numeric CODEARTICLE.
//connect to table TMP_STRUCTURE oracle
val spark = sparkSession.sqlContext
val articles_Gold = spark.load("jdbc",
Map("url" -> "jdbc:oracle:thin:System/maher#//localhost:1521/XE",
"dbtable" -> "IPTECH.TMP_ARTICLE")).select("CODEARTICLE", "STRUCTURE", "DES", "TYPEMARK", "TYP", "IMPLOC", "MARQUE", "GAMME", "TAR")
val filteredData =articles_Gold.withColumn("test",'CODEARTICLE.cast(IntegerType)).filter($"test"!==null)
thank you a lot
Use na.drop:
articles_Gold.withColumn("test",'CODEARTICLE.cast(IntegerType)).na.drop(Seq("test"))
you can use .isNotNull function on the column in your filter function. You don't even need to create another column for your logic. You can simply do the following
val filteredData = articles_Gold.withColumn("CODEARTICLE",'CODEARTICLE.cast(IntegerType)).filter('CODEARTICLE.isNotNull)
I hope the answer is helpful

Spark groupby filter sorting with top 3 read articles each city

I have table data like the following:
+-----------+--------+-------------+
| City Name | URL | Read Count |
+-----------+--------+-------------+
| Gurgaon | URL1 | 3 |
| Gurgaon | URL3 | 6 |
| Gurgaon | URL6 | 5 |
| Gurgaon | URL4 | 1 |
| Gurgaon | URL5 | 5 |
| Delhi | URL3 | 4 |
| Delhi | URL7 | 2 |
| Delhi | URL5 | 1 |
| Delhi | URL6 | 6 |
| Punjab | URL6 | 5 |
| Punjab | URL4 | 1 |
| Mumbai | URL5 | 5 |
+-----------+--------+-------------+
I would like to see something like -> the top 3 read articles (if they exist) for each city:
+-----------+--------+--------+
| City Name | URL | Count |
+-----------+--------+--------+
| Gurgaon | URL3 | 6 |
| Gurgaon | URL6 | 5 |
| Gurgaon | URL5 | 5 |
| Delhi | URL6 | 6 |
| Delhi | URL3 | 4 |
| Delhi | URL1 | 3 |
| Punjab | URL6 | 5 |
| Punjab | URL4 | 1 |
| Mumbai | URL5 | 5 |
+-----------+--------+--------+
I am working on Spark 2.0.2, Scala 2.11.8
You can use a window function to get the output.
import org.apache.spark.sql.expressions.Window
val df = sc.parallelize(Seq(
("Gurgaon","URL1",3), ("Gurgaon","URL3",6), ("Gurgaon","URL6",5), ("Gurgaon","URL4",1),("Gurgaon","URL5",5)
("DELHI","URL3",4), ("DELHI","URL7",2), ("DELHI","URL5",1), ("DELHI","URL6",6),("Mumbai","URL5",5)
("Punjab","URL6",6), ("Punjab","URL4",1))).toDF("City", "URL", "Count")
df.show()
+-------+----+-----+
| City| URL|Count|
+-------+----+-----+
|Gurgaon|URL1| 3|
|Gurgaon|URL3| 6|
|Gurgaon|URL6| 5|
|Gurgaon|URL4| 1|
|Gurgaon|URL5| 5|
| DELHI|URL3| 4|
| DELHI|URL7| 2|
| DELHI|URL5| 1|
| DELHI|URL6| 6|
| Mumbai|URL5| 5|
| Punjab|URL6| 6|
| Punjab|URL4| 1|
+-------+----+-----+
val w = Window.partitionBy($"City").orderBy($"Count".desc)
val dfTop = df.withColumn("row", rowNumber.over(w)).where($"row" <= 3).drop("row")
dfTop.show
+-------+----+-----+
| City| URL|Count|
+-------+----+-----+
|Gurgaon|URL3| 6|
|Gurgaon|URL6| 5|
|Gurgaon|URL5| 5|
| Mumbai|URL5| 5|
| DELHI|URL6| 6|
| DELHI|URL3| 4|
| DELHI|URL7| 2|
| Punjab|URL6| 6|
| Punjab|URL4| 1|
+-------+----+-----+
Output tested on Spark 1.6.2
Window functions are probably the way to go, and there is a built-in function for this purpose:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, desc}
val window = Window.partitionBy($"City").orderBy(desc("Count"))
val dfTop = df.withColumn("rank", rank.over(window)).where($"rank" <= 3)

how to output multiple (key,value) in spark map function

The format of the input data is like below:
+--------------------+-------------+--------------------+
| StudentID| Right | Wrong |
+--------------------+-------------+--------------------+
| studentNo01 | a,b,c | x,y,z |
+--------------------+-------------+--------------------+
| studentNo02 | c,d | v,w |
+--------------------+-------------+--------------------+
And the format of the output is like below:
+--------------------+---------+
| key | value|
+--------------------+---------+
| studentNo01,a | 1 |
+--------------------+---------+
| studentNo01,b | 1 |
+--------------------+---------+
| studentNo01,c | 1 |
+--------------------+---------+
| studentNo01,x | 0 |
+--------------------+---------+
| studentNo01,y | 0 |
+--------------------+---------+
| studentNo01,z | 0 |
+--------------------+---------+
| studentNo02,c | 1 |
+--------------------+---------+
| studentNo02,d | 1 |
+--------------------+---------+
| studentNo02,v | 0 |
+--------------------+---------+
| studentNo02,w | 0 |
+--------------------+---------+
Right means 1, Wrong means 0.
I want to process this data using the Spark map function or a UDF, but I don't know how to deal with it. Can you help me, please? Thank you.
Use split and explode twice and do the union
val df = List(
("studentNo01","a,b,c","x,y,z"),
("studentNo02","c,d","v,w")
).toDF("StudenID","Right","Wrong")
+-----------+-----+-----+
| StudenID|Right|Wrong|
+-----------+-----+-----+
|studentNo01|a,b,c|x,y,z|
|studentNo02| c,d| v,w|
+-----------+-----+-----+
val pair = (
df.select('StudenID,explode(split('Right,",")))
.select(concat_ws(",",'StudenID,'col).as("key"))
.withColumn("value",lit(1))
).unionAll(
df.select('StudenID,explode(split('Wrong,",")))
.select(concat_ws(",",'StudenID,'col).as("key"))
.withColumn("value",lit(0))
)
+-------------+-----+
| key|value|
+-------------+-----+
|studentNo01,a| 1|
|studentNo01,b| 1|
|studentNo01,c| 1|
|studentNo02,c| 1|
|studentNo02,d| 1|
|studentNo01,x| 0|
|studentNo01,y| 0|
|studentNo01,z| 0|
|studentNo02,v| 0|
|studentNo02,w| 0|
+-------------+-----+
You can convert to RDD as follows
val rdd = pair.rdd.map(r => (r.getString(0), r.getInt(1)))
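For completeness, roughly the same split/explode/union approach in PySpark might look like the sketch below. It assumes an active spark session; unionByName needs Spark 2.3+ (use union on older versions), and explode_answers is just a hypothetical helper name.
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [("studentNo01", "a,b,c", "x,y,z"), ("studentNo02", "c,d", "v,w")],
    ["StudentID", "Right", "Wrong"],
)

def explode_answers(col_name, value):
    # One row per comma-separated answer, keyed as "StudentID,answer"
    # and tagged with 1 (Right) or 0 (Wrong).
    return (
        df.select("StudentID", F.explode(F.split(col_name, ",")).alias("ans"))
          .select(F.concat_ws(",", "StudentID", "ans").alias("key"))
          .withColumn("value", F.lit(value))
    )

pair = explode_answers("Right", 1).unionByName(explode_answers("Wrong", 0))
pair.show()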