I have a data set with one column that takes the value "BUYS" or "SELLS" and another column that holds the quantity (shown in absolute terms). I want to query this data so that the quantity is multiplied by -1 whenever the value is "SELLS".
thanks
You could try using a vector conditional:
https://code.kx.com/q4m3/10_Execution_Control/#1013-vector-conditional-evaluation
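To make the intended result concrete, here is a minimal Python sketch of the same element-wise conditional. It is only an illustration of the logic, not the q syntax from the linked page; the sample values and the names `sides`, `quantities`, and `signed_quantities` are made up for the example.

```python
# Hypothetical sample data: the side column and the absolute quantity column.
sides = ["BUYS", "SELLS", "BUYS", "SELLS"]
quantities = [100, 250, 75, 30]


def signed_quantities(sides, quantities):
    """Negate the quantity wherever the side is "SELLS"; keep it otherwise."""
    return [-q if s == "SELLS" else q for s, q in zip(sides, quantities)]


signed = signed_quantities(sides, quantities)
```

A vector conditional does the same thing in one pass: it evaluates the condition for every row and picks the negated or the original quantity accordingly.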
I have a data file with several columns. I am performing some mathematical computations on the values, so I want to map my non-integer columns to Int and then, after the operations, map the values back.
These are my column values:
atom_id,molecule_id,element,type,charge
d100_1,d100,c,22,-0.128
d100_10,d100,h,3,0.132
d100_11,d100,c,29,0.002
d100_12,d100,c,22,-0.128
d100_13,d100,c,22,-0.128
Suppose I want to map only two columns and then remap only those columns' values. I searched for methods and found StringIndexer, but it maps all of the columns of the DataFrame; I need to map only specific columns and then remap the values of those specific columns. Any help will be appreciated.
// Edited part
I have the following columns in my DataFrame:
ind1,inda,logp,lumo,mutagenic,element
1,0,4.23,-1.246,yes,c
1,0,4.62,-1.387,yes,b
0,0,2.68,-1.034,no,h
1,0,6.26,-1.598,yes,c
1,0,2.4,-3.172,yes,a
Basically, I am writing code for synthetic data generation based on the given input data, so I want to use the column values (ind1, inda, logp, lumo, mutagenic, element) a single row at a time. After applying some math functions to a row, I will get a new row of 6 values, each representing the corresponding column value.
The problem is that all column values are of type double except mutagenic and element. I want to map the mutagenic and element columns to double values, for example yes to 0 and no to 1, so that I can use them. When I receive the output row, I will then reverse-map the generated mutagenic value back to the corresponding string value using that mapping function.
I hope I am clear this time.
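The forward and reverse mapping described above can be sketched in plain Python, independent of any Spark API. Everything here is an assumption for illustration: the helper names (`build_mapping`, `encode`, `decode`), the choice to number values in order of first appearance, and the choice to snap a generated value to the nearest known code when decoding.

```python
def build_mapping(values):
    """Assign each distinct value a float code in order of first appearance."""
    forward = {}
    for v in values:
        if v not in forward:
            forward[v] = float(len(forward))
    reverse = {code: v for v, code in forward.items()}
    return forward, reverse


# Sample rows taken from the question.
rows = [
    {"ind1": 1.0, "inda": 0.0, "logp": 4.23, "lumo": -1.246, "mutagenic": "yes", "element": "c"},
    {"ind1": 1.0, "inda": 0.0, "logp": 4.62, "lumo": -1.387, "mutagenic": "yes", "element": "b"},
    {"ind1": 0.0, "inda": 0.0, "logp": 2.68, "lumo": -1.034, "mutagenic": "no", "element": "h"},
]

# Only these columns are mapped; all others pass through unchanged.
columns_to_map = ["mutagenic", "element"]
mappings = {col: build_mapping([r[col] for r in rows]) for col in columns_to_map}


def encode(row):
    """Replace the string columns with their float codes."""
    return {k: mappings[k][0][v] if k in mappings else v for k, v in row.items()}


def decode(row):
    """Map codes back to strings, snapping to the nearest known code (assumption)."""
    out = dict(row)
    for col, (_, reverse) in mappings.items():
        nearest = min(reverse, key=lambda code: abs(code - row[col]))
        out[col] = reverse[nearest]
    return out


encoded = encode(rows[2])
restored = decode(encoded)
```

Snapping to the nearest code matters because a generated row will rarely land exactly on 0.0 or 1.0; whether that is the right rule depends on how the synthetic values are produced.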
The AVG function in PostgreSQL ignores NULL values when it calculates the average. But what if I want to calculate the average value across multiple columns that contain many NULL values?
None of the commands below work:
AVG(col1,col2,col3)
AVG(col1)+AVG(col2)+AVG(col3) -> combining the per-column averages gives the wrong value because each column has a different number of NULLs
This question is similar to "Average of multiple columns", but is there a simple solution for the PostgreSQL-specific case?
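To show why combining per-column averages goes wrong, here is a small Python sketch with `None` standing in for NULL. It assumes the goal is one pooled average over every non-NULL value in all three columns; the sample rows and the function names are made up for illustration.

```python
# Hypothetical rows of (col1, col2, col3); None plays the role of NULL.
rows = [
    (1.0, None, 3.0),
    (None, None, 5.0),
    (2.0, 4.0, None),
]


def pooled_average(rows):
    """Average every non-NULL value across all columns; None if there are none."""
    values = [v for row in rows for v in row if v is not None]
    return sum(values) / len(values) if values else None


def mean_of_column_averages(rows):
    """Average each column separately (ignoring NULLs), then average those results."""
    averages = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        if present:
            averages.append(sum(present) / len(present))
    return sum(averages) / len(averages) if averages else None


pooled = pooled_average(rows)
naive = mean_of_column_averages(rows)
```

The two results differ because each column contributes a different number of non-NULL values, so the pooled version has to divide the total sum by the total count of non-NULL values rather than average the averages.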
I have a dataset C1.txt that has one column named features. All the rows are strings that represent x and y, the coordinates of a two-dimensional point. I want to change the type to double, but when I do that with this code:
from pyspark.sql.types import StructField, StringType, IntegerType, StructType, DoubleType
changedTypedf = df.withColumn("features", df["features"].cast(DoubleType()))
I receive null for all rows (before changing datatype).
I don't know what is wrong; please help me solve this problem.
Thanks
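A likely cause is that each string holds two numbers, so the whole value cannot be read as a single double and the cast produces null. The plain-Python sketch below shows the parsing step that is needed first. It assumes the coordinates are separated by a comma, which the question does not state; `parse_point` and its `sep` parameter are hypothetical names.

```python
def parse_point(text, sep=","):
    """Split "x<sep>y" into two floats; return None if the text does not fit."""
    parts = text.strip().split(sep)
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


# Hypothetical sample values in the assumed "x,y" format.
point = parse_point("1.5,2.0")
bad = parse_point("not a point")
```

In a DataFrame, the same idea would mean splitting the features column into two columns and casting each one separately, rather than casting the combined string.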
I have a Spark (Scala) DataFrame "Marketing" with approximately 17 columns, one of which is "Balance". The data type of this column is Int. I need to find the median Balance. I can get as far as sorting it in ascending order, but how do I proceed after that? I was given a hint that the percentile function can be used, but I don't know anything about it. Can anyone help?
The median is the same thing as the 50th percentile. If you do not mind using Hive functions, you can do one of the following:
marketingDF.selectExpr("percentile(CAST(Balance AS BIGINT), 0.5) AS median")
If you do not need an exact figure you can look into using percentile_approx() instead.
Both functions are documented in the Hive UDF reference.
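To see why the 50th percentile and the median coincide, here is a small plain-Python sketch. It uses linear interpolation between the two nearest sorted values, which is an assumption made for the illustration and not a statement about how Spark or Hive computes `percentile`; the sample balances are invented.

```python
import math
import statistics


def percentile(values, p):
    """Return the p-th quantile (0 <= p <= 1) using linear interpolation."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical Balance values.
balances = [40, 10, 30, 20]
median_via_percentile = percentile(balances, 0.5)
median_via_statistics = statistics.median(balances)
```

With an even number of rows the result falls halfway between the two middle values, which is exactly what sorting the column and picking the middle by hand would give.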