How do I configure the column names in a Scala Table?

I am writing a Scala program to manage a database, and have drawn all of the data into a 2-dimensional ArrayBuffer where row 0 is the column names, and the subsequent rows contain the info for each entry in the table.
When trying to put this into a Table, how do I go about assigning the column headers?
Syntax suggestions would be greatly appreciated.
Pseudocode:
val data = ArrayBuffer(
  Array("Name", "Birthday", "ID"),     // row 0: the column headers
  Array("Bob", "07/19/1986", "2354"),
  Array("Sue", "05/07/1980", "2355"),
  Array("Joe", "08/12/1992", "2356"),
  Array("Jim", "11/20/1983", "2357")
)
I want to put this into a Table where data(0) describes the column headers and the subsequent rows describe the rows in the table, but I can't figure out how to set the column headers.

The easiest way to put data in a Table is to use its constructor:
new Table (rowData: Array[Array[Any]], columnNames: Seq[_])
The slightly tricky thing here is that arrays are not covariant (see Why doesn't the example compile, aka how does (co-, contra-, and in-) variance work?), which means that an Array[String] is not a subtype of Array[Any]. So you need some way of turning one into the other: a map does the job.
Also, for the column names to show, you need to put the table in a ScrollPane.
import swing._
import collection.mutable.ArrayBuffer

object Demo extends SimpleSwingApplication {
  val data = ArrayBuffer(
    Array("Name", "Birthday", "ID"),
    Array("Bob", "07/19/1986", "2354"),
    Array("Sue", "05/07/1980", "2355")
  )

  def top = new MainFrame {
    contents = new ScrollPane {
      contents = new Table(
        data.tail.toArray map (_.toArray[Any]),
        data.head
      )
    }
  }
}
This will give you a table with Name, Birthday, and ID as the column headers.
Edit: you can also use a cast: data.tail.toArray.asInstanceOf[Array[Array[Any]]], which is more efficient than mapping.
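In the example above, that means replacing the map with the cast (sound at runtime because JVM arrays are covariant, so no per-row copy is made):
new Table(
  data.tail.toArray.asInstanceOf[Array[Array[Any]]], // no element-by-element conversion
  data.head
)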

Assuming you are talking about Swing: if you put your table inside a ScrollPane and create your table model based on the ArrayBuffer shown, the first row will be taken as the column names by default.
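If you are on plain javax.swing rather than scala.swing, a minimal sketch that makes the header/data split explicit (assuming a JTable backed by a DefaultTableModel and the buffer layout from the question) could look like this:
import javax.swing.{JFrame, JScrollPane, JTable}
import javax.swing.table.DefaultTableModel
import scala.collection.mutable.ArrayBuffer

object PlainSwingDemo extends App {
  val data = ArrayBuffer(
    Array[AnyRef]("Name", "Birthday", "ID"),
    Array[AnyRef]("Bob", "07/19/1986", "2354"),
    Array[AnyRef]("Sue", "05/07/1980", "2355")
  )
  // Row 0 supplies the column names, the remaining rows supply the cell data.
  val model = new DefaultTableModel(data.tail.toArray, data.head)
  val frame = new JFrame("Demo")
  frame.setContentPane(new JScrollPane(new JTable(model))) // headers only show inside a scroll pane
  frame.pack()
  frame.setVisible(true)
}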

Related

PySpark and dataframe: Another way to show the type of one specific column?

I'm new to PySpark and I'm struggling when I select one column and want to show its type.
If I have a dataframe and want to show the types of all columns, this is what I do:
raw_df.printSchema()
If I want a specific column, I'm doing this, but I'm sure we can do it faster:
new_df = raw_df.select(raw_df.annee)
new_df.printSchema()
Do I have to use select, store my column in a new dataframe, and use printSchema()?
I tried something like this but it doesn't work:
raw_df.annee.printSchema()
Is there another way?
Do I have to use select, store my column in a new dataframe, and use printSchema()?
Not necessarily - take a look at this code:
raw_df = spark.createDataFrame([(1, 2)], "id: int, val: int")
print(dict(raw_df.dtypes)["val"])
int
The "val" is of course the column name you want to query.

pyspark add int column to a fixed date

I have a fixed date "2000/01/01" and a dataframe:
data1 = [{'index':1,'offset':50}]
data_p = sc.parallelize(data1)
df = spark.createDataFrame(data_p)
I want to create a new column by adding the offset column to this fixed date
I tried different methods but cannot pass the column into expr; I get an error like:
function is neither a registered temporary function nor a permanent function registered in the database 'default'
The only solution I can think of is
df = df.withColumn("zero",lit(datetime.strptime('2000/01/01', '%Y/%m/%d')))
df.withColumn("date_offset",expr("date_add(zero,offset)")).drop("zero")
Since I cannot use lit and datetime.strptime in the expr, I have to use this approach which creates a redundant column and redundant operations.
Any better way to do it?
Since you have tagged this as a PySpark question, in Python you can do the following:
df_a3.withColumn("date_offset",F.lit("2000-01-01").cast("date") + F.col("offset").cast("int")).show()
Edit: as per the comment below, let's assume there is an extra column named type; based on it, the code below can be used:
df_a3.withColumn("date_offset",F.expr("case when type ='month' then add_months(cast('2000-01-01' as date),offset) else date_add(cast('2000-01-01' as date),cast(offset as int)) end ")).show()

Is there a Scala collection that maintains the order of insert?

I have a List, hdtList, whose entries represent the columns of a Hive table:
forecast_id bigint,period_year bigint,period_num bigint,period_name string,drm_org string,ledger_id bigint,currency_code string,source_system_name string,source_record_type string,gl_source_name string,gl_source_system_name string,year string
I have a List, partition_columns, which contains two elements: source_system_name, period_year.
Using partition_columns, I am trying to match its elements and move the corresponding columns in hdtList to the end of the list, as below:
val (pc, notPc) = hdtList.partition(c => partition_columns.contains(c.takeWhile(x => x != ' ')))
But when I print them as: println(notPc.mkString(",") + "," + pc.mkString(","))
I see the output unordered as below:
forecast_id bigint,period_num bigint,period_name string,drm_org string,ledger_id bigint,currency_code string,source_record_type string,gl_source_name string,gl_source_system_name string,year string,period string,period_year bigint,source_system_name string
The column period_year comes first and source_system_name last. Is there any way I can arrange the data as below, so that the order of the columns in partition_columns is maintained?
forecast_id bigint,period_num bigint,period_name string,drm_org string,ledger_id bigint,currency_code string,source_record_type string,gl_source_name string,gl_source_system_name string,year string,period string,source_system_name string,period_year bigint
I know there is an option to reverse a List, but I'd like to learn whether I can use a collection that maintains the order of insertion.
It doesn't matter which collections you use; you only use partition_columns to call contains which doesn't depend on its order, so how could it be maintained?
But your code does maintain order: it's just hdtList's.
Something like
// get is ugly, but safe here
val pc1 = partition_columns.map(x => pc.find(y => y.startsWith(x)).get)
after your code will give you the desired order, though there's probably a more efficient way to do it.
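For concreteness, here is a minimal, self-contained sketch of the same idea on a shortened, made-up column list (using flatMap with find to avoid the .get):
val hdtList = List("forecast_id bigint", "period_year bigint", "source_system_name string", "year string")
val partition_columns = List("source_system_name", "period_year")

val (pc, notPc) = hdtList.partition(c => partition_columns.contains(c.takeWhile(_ != ' ')))
// Reorder the matched columns so they follow partition_columns' own order.
val pcOrdered = partition_columns.flatMap(p => pc.find(_.startsWith(p)))

println((notPc ++ pcOrdered).mkString(","))
// forecast_id bigint,year string,source_system_name string,period_year bigint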

change a dataframe row value with dynamic number of columns spark scala

I have a dataframe (containing 10 columns) for which I want to change the value of each row for the last column only. I have written the following code for this:
val newDF = spark.sqlContext.createDataFrame(WRADF.rdd.map(r => {
  Row(r.get(0), r.get(1),
      r.get(2), r.get(3),
      r.get(4), r.get(5),
      r.get(6), r.get(7),
      r.get(8), decrementCounter(r))
}), WRADF.schema)
I want to change the value of each row for the 10th column only (for which I wrote the decrementCounter() function). But the above code only works for dataframes with 10 columns. I don't know how to adapt this code so that it can run for a dataframe with a different number of columns. Any help will be appreciated.
Don't do something like this. Define a udf instead:
import org.apache.spark.sql.functions.udf
val decrementCounter = udf((x: T) => ...) // adjust types and content to your requirements
df.withColumn("someName", decrementCounter($"someColumn"))
I think a UDF will be a better choice because it can be applied using the column name itself.
For more on udf you can take a look here : https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html
For your code, just use this:
import org.apache.spark.sql.functions.udf
val decrementCounterUDF = udf(decrementCounter _)
df.withColumn("columnName", decrementCounterUDF($"columnName"))
What it does is apply the decrementCounter function to each and every value of the column columnName.
I hope this helps, cheers!
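Because the question is about the last of an arbitrary number of columns, here is a hedged sketch that picks that column by name (decrementCounter below is a hypothetical Int => Int stand-in; replace it with your real logic and types):
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical stand-in for the real decrementCounter logic.
val decrementCounter: Int => Int = _ - 1
val decrementCounterUDF = udf(decrementCounter)

// Pick the last column by name, whatever the total number of columns is.
val lastCol = WRADF.columns.last
val newDF = WRADF.withColumn(lastCol, decrementCounterUDF(col(lastCol)))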

Iterate through all rows returned from an Scala Anorm query

I have a small Anorm query which returns all the rows in the ServiceMessages table in my database. I would eventually like to turn each of these rows into JSON.
However, currently all I am doing is iterating through the elements of the first row with the .map function. How can I iterate through all the rows, so I can manipulate them and turn them into a JSON object?
val result = DB.withConnection("my-db") { implicit connection =>
  val messagesRaw = SQL("""
    SELECT *
    FROM ServiceMessages
  """).apply()
  messagesRaw.map(row =>
    println(row[String]("title"))
  )
}
Actually, what you do IS iterate over all the rows (not only the first one), taking the contents of the title column from each row.
In order to collect all the titles, you need the following trivial modification:
val titles = messagesRaw.map(row =>
  row[String]("title")
)
Converting them to JSON (an array) is simple as well:
import play.api.libs.json._
...
Ok(Json.toJson(titles))
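Going one step further, a hedged sketch that builds a JSON object per row rather than just the titles (this assumes the ServiceMessages table also has an id column; adjust names and types to your schema):
val messages = DB.withConnection("my-db") { implicit connection =>
  SQL("SELECT id, title FROM ServiceMessages").apply().map { row =>
    Json.obj(
      "id"    -> row[Long]("id"),
      "title" -> row[String]("title")
    )
  }.toList
}
Ok(Json.toJson(messages))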