How to make a with_column become the first column w/o selecting all columns - python-polars

Let's say I have a simple df:
import polars as pl

df = pl.DataFrame({
    "date": ["2022-01-01", "2022-01-02"],
    "hroff": [5, 2],
    "minoff": [1, 2],
}).with_column(pl.col("date").str.strptime(pl.Date, "%Y-%m-%d"))
If I want to add a column I can do df = df.with_column(pl.lit('abc').alias('newcolumn')), but if I want that new column to come first, is there a direct way to do it other than having to append .select(['newcolumn', 'date', 'hroff', 'minoff'])?

df.select([
    pl.lit('abc').alias('newcolumn'),
    pl.all()
])
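A quick check that the literal column really lands first (a minimal sketch; note that newer polars releases rename with_column to with_columns, while the question's version uses the older spelling):

print(
    df.select([
        pl.lit('abc').alias('newcolumn'),
        pl.all()
    ]).columns
)
# ['newcolumn', 'date', 'hroff', 'minoff']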

Related

PySpark and dataframe: Another way to show the type of one specific column?

I'm new to PySpark and I'm struggling when I select one column and want to show its type.
If I have a dataframe and want to show the types of all columns, this is what I do:
raw_df.printSchema()
If I want a specific column, I'm doing this, but I'm sure we can do it faster:
new_df = raw_df.select(raw_df.annee)
new_df.printSchema()
Do I have to use select, store my column in a new dataframe, and use printSchema()?
I tried something like this but it doesn't work:
raw_df.annee.printSchema()
Is there another way?
Do I have to use select, store my column in a new dataframe, and use printSchema()?
Not necessarily - take a look at this code:
raw_df = spark.createDataFrame([(1, 2)], "id: int, val: int")
print(dict(raw_df.dtypes)["val"])
# prints: int
"val" is, of course, the name of the column you want to query.

"null" columns while creating derived column using withColumns

In my dataframe, I have a column named parent_asset_xid.
I want to create a new column parent_asset_sk which will be md5(parent_asset_xid) or 00000000-0000-0000-0000-000000000000 if parent_asset_xid is null.
I am trying something like this, but I'm not sure how to integrate the md5 part:
mydf.withColumn(
  "parent_asset_sk",
  when($"parent_asset_xid".isNull, "00000000-0000-0000-0000-000000000000")
)
You had most of it covered; you just need to add an otherwise to your when:
val newDF = yourDF.withColumn(
  "parent_asset_sk",
  when(
    col("parent_asset_xid").isNotNull,
    md5(col("parent_asset_xid"))
  ).otherwise(lit("00000000-0000-0000-0000-000000000000"))
)
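For anyone doing the same thing from the Python API, the when/isNotNull/otherwise pattern carries over directly. A minimal, self-contained sketch; the two-row input frame here is made up for illustration:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
mydf = spark.createDataFrame([("abc",), (None,)], "parent_asset_xid: string")

new_df = mydf.withColumn(
    "parent_asset_sk",
    F.when(F.col("parent_asset_xid").isNotNull(),
           F.md5(F.col("parent_asset_xid")))
     .otherwise(F.lit("00000000-0000-0000-0000-000000000000")),
)
new_df.show(truncate=False)  # the null row gets the all-zero placeholder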

Dataframe column substring based on the value during join

I have a dataframe with a column whose values look like "COR//xxxxxx-xx-xxxx" or "xxxxxx-xx-xxxx".
I need to compare this column with a column in a different dataframe, based on the column value.
If the column value has the form "COR//xxxxx-xx-xxxx", I need to use substring($"column", 4, length($"column")).
If the column value has the form "xxxxx-xx-xxxx", I can compare directly without using substring.
For example:
val DF1 = DF2.join(DF3, upper(trim($"column1".substr(4, length($"column1")))) === upper(trim(DF3("column1"))))
I am not sure how to add the condition while joining. Could anyone please let me know how can we achieve this in Spark dataframe?
You can try adding a new column based on the conditions and joining on that new column. Something like this:
val data = List("COR//xxxxx-xx-xxxx", "xxxxx-xx-xxxx")

import spark.implicits._  // needed for .toDF
val DF2 = spark.sparkContext.parallelize(data).toDF("column1")  // spark = your SparkSession

val DF4 = DF2.withColumn("joinCol",
  when(col("column1").like("%COR%"),
    expr("substring(column1, 6, length(column1)-1)")
  ).otherwise(col("column1")))
DF4.show(false)
The new column will have values like this.
+------------------+-------------+
|column1 |joinCol |
+------------------+-------------+
|COR//xxxxx-xx-xxxx|xxxxx-xx-xxxx|
|xxxxx-xx-xxxx |xxxxx-xx-xxxx|
+------------------+-------------+
You can now join based on the new column added.
val DF1 = DF4.join(DF3, upper(trim(DF4("joinCol"))) === upper(trim(DF3("column1"))))
Hope this helps.
Simply create a new column to use in the join:
DF2.withColumn("column2",
when($"column1" rlike "COR//.*",
$"column1".substr(lit(4), length($"column1")).
otherwise($"column1"))
Then use column2 in the join. It is also possible to add the whole when clause directly in the join, but it would look very messy.
Note that to use a constant value in substr, you need to use lit. And if you want to remove the whole "COR//" prefix, use 6 instead of 4.
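The same derive-then-join approach in the Python API, for reference. A minimal sketch: DF3 and its column1 are assumed to exist as in the Scala example, and the regex/offset strip the full "COR//" prefix:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
DF2 = spark.createDataFrame([("COR//xxxxx-xx-xxxx",), ("xxxxx-xx-xxxx",)], "column1: string")

DF4 = DF2.withColumn(
    "joinCol",
    F.when(F.col("column1").rlike("^COR//"),
           F.expr("substring(column1, 6, length(column1))"))
     .otherwise(F.col("column1")),
)
# DF1 = DF4.join(DF3, F.upper(F.trim(DF4["joinCol"])) == F.upper(F.trim(DF3["column1"])))
DF4.show(truncate=False)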

How to select a row in kendogrid after filtering

I have a kendogrid. What I do is first filter it, which works fine.
After having it filtered, I want to select a specific row. I am sure I have the row in the result of the filter.
Example:
data = id=a1, id=a2, id=a3, id=a4, id=a5, id=a6
filter result:
id=a2, id=a4, id=a6
I would like to select the row a4.
First of all, get hold of the Grid's data that is currently displayed (i.e. the filtered view):
var arrayOfModels = $('#GridName').data().kendoGrid.dataSource.view();
Next, add the k-state-selected class to the row you want selected:
$('#GridName tbody [data-uid=' + model.uid + ']').addClass('k-state-selected');
where model is the record from the arrayOfModels above that you need (e.g. the one whose id is 'a4').

How do I configure the Column names in a Scala Table?

I am writing a Scala program to manage a database, and have drawn all of the data into a 2-dimensional ArrayBuffer where row 0 is the column names, and the subsequent rows contain the info for each entry in the table.
When trying to put this into a Table, how do I go about assigning the column headers?
Syntax suggestions would be greatly appreciated.
Pseudocode:
Data=ArrayBuffer()
Data(0)={"Name","Birthday","ID"}
Data(1)={"Bob", "07/19/1986", "2354"}
Data(2)={"Sue", "05/07/1980", "2355"}
Data(3)={"Joe", "08/12/1992", "2356"}
Data(4)={"Jim", "11/20/1983", "2357"}
I want to put this into a Table where Data(0) describes the column headers and the subsequent rows describe rows in the table, but I can't figure out how to set the column headers.
The easiest way to put data in a Table is to use its constructor:
new Table (rowData: Array[Array[Any]], columnNames: Seq[_])
The slightly tricky thing here is that arrays are not covariant (see Why doesn't the example compile, aka how does (co-, contra-, and in-) variance work?), which means that an Array[String] is not a subtype of Array[Any]. So you need some way of turning one into the other: a map does the job.
Also, for the column names to show, you need to put the table in a ScrollPane.
import swing._
import collection.mutable.ArrayBuffer

object Demo extends SimpleSwingApplication {
  val data = ArrayBuffer(
    Array("Name", "Birthday", "ID"),
    Array("Bob", "07/19/1986", "2354"),
    Array("Sue", "05/07/1980", "2355")
  )
  def top = new MainFrame {
    contents = new ScrollPane {
      contents = new Table(
        data.tail.toArray map (_.toArray[Any]),  // map each Array[String] up to Array[Any]
        data.head                                // first row supplies the column names
      )
    }
  }
}
This will give you a table with Name, Birthday and ID as the column headers.
Edit: you can also use a cast: data.tail.toArray.asInstanceOf[Array[Array[Any]]], which is more efficient than mapping.
Assuming you are talking about Swing: if you put your table inside a ScrollPane and build your table model from the ArrayBuffer shown, the first row will be taken as the column names by default.