hive row wise count by the type in data

h1 h2 h3 h4 h5 h6 h7 h8
U U NULL U Y NULL Y X
U NULL U U Y Y X X
U U U NULL U NULL Y NULL
NULL NULL NULL NULL NULL NULL NULL NULL
X V U U Y NULL Z X
Y X NULL X Y Z U
X NULL U NULL NULL U Z Y
NULL NULL NULL NULL NULL NULL NULL NULL
The data set above has 8 columns: h1, h2, h3, ..., h8. If all the columns in a row are NULL, the count is 0. If at least one column has a value, the count is 1.
For example, the first row has a count of 6 (NULL values are not counted).
Here X = boy, V = girl, U = wife, Z = husband, Y = head. How can we find the count by type (e.g. boy, wife or head) for each row? That is, how many wives, husbands or girls are there in each row?
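The row-wise tally described above can be sketched in plain Python. This is only an illustration of the counting logic, not HiveQL. The names `TYPE_NAMES` and `count_types` are made up for this example, and `None` stands in for NULL. In Hive the same tally could be written as a sum of CASE expressions over h1..h8.

```python
from collections import Counter

# Code-to-type mapping taken from the question.
TYPE_NAMES = {"X": "boy", "V": "girl", "U": "wife", "Z": "husband", "Y": "head"}

def count_types(row):
    """Count the non-NULL values of one row, grouped by type name."""
    counts = Counter(TYPE_NAMES[v] for v in row if v is not None)
    return dict(counts)

# First row of the sample data; None stands in for NULL.
first_row = ["U", "U", None, "U", "Y", None, "Y", "X"]
first_counts = count_types(first_row)
total = sum(first_counts.values())
print(first_counts, total)
```

For the first row this gives 3 wives, 2 heads and 1 boy, which adds up to the count of 6 mentioned in the question. A row of all NULLs yields an empty result.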

Related

Using Same column for lag on which lag is performed

I have an Excel sheet with the logic below, and I want to convert that logic to PySpark.
col1: either Y or N
derivedvalue: if col1 = N then 1, else the previous row's value + 1
col1 derivedvalue
N 1
Y 2
Y 3
N 1
N 1
Y 2
Y 3
Y 4

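counter_var = 0

def cal_duplicate(input_flag):
    # Reset the running counter on 'N', otherwise increment it.
    global counter_var
    if input_flag == 'N':
        counter_var = 1
        return 1
    else:
        counter_var += 1
        return counter_var

# df is an existing pandas DataFrame with a 'flag' column (Y/N).
df['duplicate_counter'] = df.apply(lambda x: cal_duplicate(x['flag']), axis=1)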
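The same reset-on-N counter can be sketched without a global variable. This is a plain-Python illustration of the logic only. The function name `reset_counter` is made up for this example, and it assumes the rows are already in the intended order. A PySpark version would additionally need an explicit ordering column, since a DataFrame has no inherent row order.

```python
def reset_counter(flags):
    """Return 1 for every 'N', otherwise the previous value plus 1."""
    result = []
    current = 0
    for flag in flags:
        # 'N' restarts the run at 1; 'Y' extends the current run.
        current = 1 if flag == "N" else current + 1
        result.append(current)
    return result

# The col1 values from the question, in order.
flags = ["N", "Y", "Y", "N", "N", "Y", "Y", "Y"]
derived = reset_counter(flags)
print(derived)
```

The result reproduces the derivedvalue column from the question: 1, 2, 3, 1, 1, 2, 3, 4.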

MiniZinc - Implementing count constraint

I am working on this problem. I have two decision variables, x and y. For equal values in x and correspondingly equal values in y, I have to count the occurrences of this value in y.
I have tried to implement it, but I doubt that the functionality is as expected.
My code
set of int: x = 1..4;
set of int: y = 1..3;
set of int: a = 1..10;
set of int: b = 1..5;
%Decision variables
array[x, y] of var a: recordA;
array[x, y] of var b: recordB;
constraint forall(i in x ) (
alldifferent([recordB[i,j] | j in y])
);
constraint forall(i in x ) (
alldifferent([recordA[i,j] | j in y])
);
%constraint forall(i,k in x, j in y where i<k /\ recordB[i,j]=recordB[k,j]) (
% forall(i,k in x,j in y where recordA[i,j]=recordA[k,j])(
% count(recordA, recordA[i,j])
%);
OR
%Maybe something like this
%constraint forall(i in x, j in y, m in b)(
% count(col(recordA, j),count(col(recordB,j),m))
% );
The idea is, for every recordB and every recordA, to go through all the y (for all the x) and count how many values in recordA are the same at the positions where recordB is the same.
For example: for a value r1 in recordB, count the number of occurrences of a value r2 in recordA.
In this case, recordB has three occurrences of value 4, at recordB[1], recordB[6] and recordB[11].
Correspondingly, recordA has two occurrences of value 5, at recordA[1] and recordA[6], but at recordA[11] the value is 1.
I want to count these occurrences of the same values in recordA corresponding with recordB.
Each row of recordA is unique.
Each row of recordB is unique.
Thanks for suggestions
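The intended count can be illustrated on fixed values with a short Python sketch. It works on known data rather than decision variables, so it only shows what the constraint is supposed to compute. It uses just the three positions named in the question, and the helper `pair_counts` is a made-up name.

```python
from collections import Counter

# Only the three positions mentioned in the question; all other entries are unknown.
record_b = {1: 4, 6: 4, 11: 4}
record_a = {1: 5, 6: 5, 11: 1}

def pair_counts(rec_a, rec_b):
    """Count how often each (recordB value, recordA value) pair occurs at the same position."""
    return Counter((rec_b[pos], rec_a[pos]) for pos in rec_b if pos in rec_a)

counts = pair_counts(record_a, record_b)
print(counts[(4, 5)], counts[(4, 1)])
```

Here the pair (4, 5) occurs twice and the pair (4, 1) once, matching the example in the question.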

PostgreSQL query over several very large tables with the same columns: how to optimize it and its code

I am running the following "simple query" over tables a1, a2, ..., a20. Each table has millions of rows, and all of them have the same columns X, Y, Z.
CREATE TABLE A_bis as
SELECT
X, Y, Z
FROM a1
WHERE
Y= 3
UNION
SELECT
X, Y, Z
FROM a2
WHERE
Y= 3
UNION
SELECT
X, Y, Z
FROM a3
WHERE
Y= 3
UNION
...
SELECT
X, Y, Z
FROM a20
WHERE
Y= 3
I get the table A_bis, but it takes at least 20 minutes.
I'd like to:
a) optimize the query so it is faster.
b) improve the code (with a loop?) so I don't have to write 7 lines for each of the tables a1, ..., a20, which comes to about 130 lines of code.

The comments answered your question A (basically: add an index on each aX table).
For question B, you can use PostgreSQL inheritance:
CREATE TABLE aParent (x INT, y INT, z INT);
ALTER TABLE a1 INHERIT aParent;
ALTER TABLE a2 INHERIT aParent;
...
ALTER TABLE a20 INHERIT aParent;
Then you can do
SELECT X, Y, Z FROM aParent WHERE Y = 3;
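If changing the table definitions is not an option, another way to avoid writing each SELECT by hand is to generate the statement. Below is a minimal Python sketch of that idea. The function name `build_union_query` is made up for this example, and it assumes the tables are really named a1 to a20 as in the question.

```python
def build_union_query(n_tables, y_value, target="A_bis"):
    """Build the CREATE TABLE ... UNION statement for tables a1..a<n_tables>."""
    selects = [
        # int() keeps the filter value numeric before it is placed in the SQL text.
        f"SELECT X, Y, Z FROM a{i} WHERE Y = {int(y_value)}"
        for i in range(1, n_tables + 1)
    ]
    return f"CREATE TABLE {target} AS\n" + "\nUNION\n".join(selects) + ";"

query = build_union_query(20, 3)
print(query)
```

The generated text uses UNION, as in the question, which removes duplicate rows. If duplicates are acceptable, UNION ALL skips that deduplication step.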

HIVE : Not in clause

Is there any way to execute the following SQL query in HiveQL?
select * from my_table
where (a,b,c) not in (x,y,z)
Here a, b, c correspond respectively to x, y, z.
Thanks:)

You'll have to break these down to separate conditions:
SELECT *
FROM my_table
WHERE a != x AND b != y AND c != z

Is this what you intend?
where a <> x or b <> y or c <> z
Or this?
where a not in (x, y, z) and
b not in (x, y, z) and
c not in (x, y, z)
Or some other variation?
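The readings of the condition discussed above are not equivalent. The short Python sketch below compares them on one made-up row. The function names and sample values are hypothetical, and SQL NULL semantics are ignored.

```python
def tuple_differs(a, b, c, x, y, z):
    """NOT (a = x AND b = y AND c = z): at least one component differs."""
    return a != x or b != y or c != z

def all_components_differ(a, b, c, x, y, z):
    """a != x AND b != y AND c != z: every component differs."""
    return a != x and b != y and c != z

def no_value_in_list(a, b, c, x, y, z):
    """None of a, b, c appears anywhere among x, y, z."""
    return all(v not in (x, y, z) for v in (a, b, c))

row = (1, 2, 3)     # hypothetical (a, b, c)
target = (1, 9, 9)  # hypothetical (x, y, z)
results = (
    tuple_differs(*row, *target),
    all_components_differ(*row, *target),
    no_value_in_list(*row, *target),
)
print(results)
```

For this row only the first reading keeps the row: the tuples differ as a whole, but a equals x, so the other two conditions reject it.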

Did I apply pumping lemma correctly?

L = { w | w in {0,1}* and w has equal number of 0s and 1s }
Let n be the number of the pumping lemma.
I pick s = 0^n 1^n and y = 0^t where 1 <= t <= n.
Which gives xyz = 0^(n-t) 0^t 1^n = 0^n 1^n, which is in L.
But xz = 0^(n-t) 1^n is not in L. Contradiction.
Did I apply it correctly?

Hmm... You were almost there! Only the last statement does not show the general pumping of the string w = xyz at y (xz is just the case i = 0).
We start by assuming that L = { w | w in {0,1}* and w has equal number of 0s and 1s } is regular, and then show that for any i != 1 the pumped string xy^iz does not contain an equal number of 0s and 1s. That is a contradiction, so the language is not regular.
Take the string w = 0^n 1^n, which is in L.
If y = 0^t, then w = 0^(n-t) 0^t 1^n.
After pumping y i times, for i >= 0, we get
xy^iz = 0^(n-t) 0^(it) 1^n
-> xy^iz = 0^(n+(i-1)t) 1^n
Since t >= 1, n+(i-1)t is not equal to n for any i != 1, so xy^iz does not belong to L. This contradicts the assumption that L is regular.
NOTE: Unless you use the condition |xy| <= n, which forces y to consist only of 0s, you also need to consider other cases, such as a y that contains both 0s and 1s or y = 1^t, and show that they also lead to a contradiction.
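The pumping argument above can be checked on a small instance with a Python sketch. The values n = 5 and t = 2 are arbitrary choices for illustration, and the function names are made up for this example.

```python
def pumped(n, t, i):
    """Build x y^i z for s = 0^n 1^n with x = 0^(n-t), y = 0^t, z = 1^n."""
    x, y, z = "0" * (n - t), "0" * t, "1" * n
    return x + y * i + z

def in_language(w):
    """Membership test for L: equal number of 0s and 1s."""
    return w.count("0") == w.count("1")

n, t = 5, 2  # hypothetical values with 1 <= t <= n
membership = {i: in_language(pumped(n, t, i)) for i in range(4)}
print(membership)
```

Only i = 1 gives a string in L. Every other i changes the number of 0s while leaving the number of 1s at n.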
