I have a table where a record may contain null values in one or more columns. I want to delete every record that contains a null value. Is there a suggested way to do that in DolphinDB?
Try the DolphinDB function rowAnd to combine the per-column conditions.
The following script is for your reference. It keeps a row only when every column meets the condition, i.e., it drops any record that contains a NULL:
sym = take(`a`b`c, 110)
id = 1..100 join take(int(), 10)    // last 10 elements are NULL
id2 = take(int(), 10) join 1..100   // first 10 elements are NULL
t = table(sym, id, id2)
// isValid flags the non-NULL cells of each column; rowAnd keeps rows where all columns are valid
t[each(isValid, t.values()).rowAnd()]
The output keeps only the 90 rows where both id and id2 are non-NULL (the first 10 rows are dropped for a NULL id2, the last 10 for a NULL id).
I have a field where I'd like to count the number of times the field equals the maximum for that column. For example, if the max value for a given column is 20, I want to know how many 20's are in that column. I've tried the following formula, but I get the error "Cannot mix aggregate and non-aggregate arguments with this function."
IF [Field1] = MAX([Field1])
THEN 1
ELSE 0
END
Try
IF ATTR([Field1]) = MAX([Field1])
THEN 1
ELSE 0
END
ATTR() is an aggregation, which allows you to compare aggregate and non-aggregate values. It returns the underlying value when every row in the partition shares that value (and * otherwise), so as long as the field you wrap in ATTR() is unique per partition, this won't have an impact on your data.
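For intuition, here is the same computation expressed in plain Python, outside Tableau (a sketch with made-up values):

values = [3, 20, 7, 20, 20]
peak = max(values)                                  # max for the column
count_of_max = sum(1 for v in values if v == peak)  # rows equal to the max
print(count_of_max)                                 # 3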
I'm using the following query to sum entire columns. The TO_REMOVEALLPRIV column contains both integer and null values.
I want to sum across both the null and integer values and print the total.
Here is my query, which prints the sum value as null:
select
sum(URT.PRODSYS) as URT_SUM_PRODSYS,
sum(URT.Users) as URT_SUM_USERS,
sum(URT.total_orphaned) as URT_SUM_TOTAL_ORPHANED,
sum(URT.Bp_errors) as URT_SUM_BP_ERRORS,
sum(URT.Ma_errors) as URT_SUM_MA_ERRORS,
sum(URT.Pp_errors) as URT_SUM_PP_ERRORS,
sum(URT.REQUIREURTCBN) as URT_SUM_CBNREQ,
sum(URT.REQUIREURTQEV) as URT_SUM_QEVREQ,
sum(URT.REQUIREURTPRIV) as URT_SUM_PRIVREQ,
sum(URT.cbnperf) as URT_SUM_CBNPERF,
sum(URT.qevperf) as URT_SUM_QEVPERF,
sum(URT.privperf) as URT_SUM_PRIVPERF,
sum(URT.TO_REMOVEALLPRIV) as TO_REMOVEALLPRIV_SUM
from
URTCUSTSTATUS URT
inner join CUSTOMER C on URT.customer_id=C.customer_id;
In the output, TO_REMOVEALLPRIV_SUM comes back as null. Instead of null, I need it to print the sum of whichever rows have integers.
The SUM function automatically handles that for you. You said the column has a mix of NULL and numbers; SUM automatically ignores the NULL values and correctly returns the sum of the numbers. You can read it in the IBM Knowledge Center:
The function is applied to the set of values derived from the argument values by the elimination of null values.
Note: all aggregate functions ignore NULL values except COUNT(*). Example: if you have two records with values 5 and NULL, the SUM and AVG functions both return 5 and COUNT(column) returns 1, but COUNT(*) returns 2.
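To see these semantics concretely, here is a minimal sketch using SQLite from Python as a stand-in (the NULL-handling rules shown are the same in DB2 and most other SQL engines):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(5,), (None,)])
# SUM and AVG skip the NULL; COUNT(x) skips it too, COUNT(*) counts both rows
print(con.execute("SELECT SUM(x), AVG(x), COUNT(x), COUNT(*) FROM t").fetchone())
# -> (5, 5.0, 1, 2)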
However, it seems that you misunderstood why you're getting NULL as a result. It's not because the column merely contains null values; it's because there are no non-null input values at all, either because no records were selected or because every selected value was NULL. That's the only case in which the SUM function returns NULL. If you want to return zero in this case, you can use the COALESCE or IFNULL function. Both behave the same in this scenario:
COALESCE(sum(URT.TO_REMOVEALLPRIV), 0) as TO_REMOVEALLPRIV_SUM
or
IFNULL(sum(URT.TO_REMOVEALLPRIV), 0) as TO_REMOVEALLPRIV_SUM
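And a companion sketch for the empty case (again SQLite from Python as a stand-in, with a hypothetical filter that deliberately matches no rows):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.execute("INSERT INTO t VALUES (5)")
# SUM over no input values yields NULL (None in Python); COALESCE maps it to 0
print(con.execute("SELECT SUM(x) FROM t WHERE x > 100").fetchone())               # (None,)
print(con.execute("SELECT COALESCE(SUM(x), 0) FROM t WHERE x > 100").fetchone())  # (0,)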
I'm guessing that you want to do the same for all the other columns in your query, so I'm not sure why you only mentioned the TO_REMOVEALLPRIV column.
What you're looking for is the COALESCE function, applied per row inside SUM (unlike the answer above, which applies it to the finished aggregate):
select
sum(URT.PRODSYS) as URT_SUM_PRODSYS,
sum(URT.Users) as URT_SUM_USERS,
sum(URT.total_orphaned) as URT_SUM_TOTAL_ORPHANED,
sum(URT.Bp_errors) as URT_SUM_BP_ERRORS,
sum(URT.Ma_errors) as URT_SUM_MA_ERRORS,
sum(URT.Pp_errors) as URT_SUM_PP_ERRORS,
sum(URT.REQUIREURTCBN) as URT_SUM_CBNREQ,
sum(URT.REQUIREURTQEV) as URT_SUM_QEVREQ,
sum(URT.REQUIREURTPRIV) as URT_SUM_PRIVREQ,
sum(URT.cbnperf) as URT_SUM_CBNPERF,
sum(URT.qevperf) as URT_SUM_QEVPERF,
sum(URT.privperf) as URT_SUM_PRIVPERF,
sum(COALESCE(URT.TO_REMOVEALLPRIV,0)) as TO_REMOVEALLPRIV_SUM
from
URTCUSTSTATUS URT
inner join CUSTOMER C on URT.customer_id=C.customer_id;
I have a field Value in the table finStatementTrans, which is an array.
How should I write a select statement with group by and a sum over this field?
while select finStatementTable
    join DataClassParagraph, sum(Value) from finStatementTrans
    group by finStatementTrans.DataClassParagraph
    where finStatementTable.RecId == finStatementTrans.FinStatementTable_FK
       && finStatementTable.FinStatementTableParent_FK == 5637569094
{
    info(strFmt("%1, %2", finStatementTrans.DataClassParagraph, finStatementTrans.Value[1]));
}
Is this correct?
sum(Value[1])
With this, the code doesn't compile.
As Aliaksandr Maksimau mentioned in his comment, aggregating array fields is not possible. Aggregations are only supported for integer and real data type fields. If you need the totals, select the rows without the aggregate and accumulate Value[1] yourself inside the loop.
See also X++ data selection and manipulation, paragraph "select statements", last sentence.
I have a PostgreSQL database, and one particular table has many rows. One column in this table, called data, is a float array (REAL[]) and gets filled with an array of ~4500 elements per row. I want to access this table through a query via SQLAlchemy and the ORM.
How do I select all rows in the table where a subset of this column satisfies some condition, e.g. contains a range of values? For example, I want to select all rows where data contains values >= 10, or values between >= 10 and <= 20.
Can I do this with a straight session query like
rows = session.query(Table).filter(Table.data.(some conditional)).all()
where my conditional is something like "VALUES >= 10 and VALUES <= 20"?
Or do I need to define some special methods or setup when defining my SQLAlchemy table class? For example, I have my table set up as
class Table(Base):
    __tablename__ = 'table'
    __table_args__ = {'autoload': True, 'schema': 'testdb', 'extend_existing': True}
    data = deferred(Column(ARRAY(Float)))

    def __repr__(self):
        return '<Table (pk={0})>'.format(self.pk)
Ideally I'd like to set it up so I can just do simple filtering in my session.query calls. Is this possible? I'm not super familiar with the ORM, so maybe it is?
I've had a look at the ARRAY Comparator sqlalchemy docs but those only seem to work on exact values. My data is precise to 6 sigfigs, and I don't know the exact values ahead of time.
What's the best way to do this? Thanks.
EDIT:
Based on the comment below, here is the code I'm using to attempt to select all rows (out of 1000) whose data column has a value >= 1.0. There should be 537 such rows.
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).all()
This gives the correct subset number: len(rows) = 537. However, I don't understand the logic of this operator: to select data >= 1.0, why do I use the le operator? Also, along the same lines, there should be 234 rows with data between >= 1.0 and <= 1.2, but this statement fails to give the correct subset:
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).filter(datadb.Table.data.any(1.2,operator=operators.ge)).all()
EDIT 2:
Here's an example of my table with a few rows; pk is an integer and data is a real[]. The database is datadb and the table is Table.
pk  data
0   [0.0,0.0,0.5,0.3,1.3,1.9,0.3,0.0,0.0]
1   [0.1,0.0,1.0,0.7,1.1,1.5,1.2,0.3,1.4]
2   [0.0,0.6,0.4,0.3,1.6,1.7,0.4,1.3,0.0]
3   [0.0,0.1,0.2,0.4,1.0,1.1,1.2,0.9,0.0]
4   [0.0,0.0,0.5,0.3,0.2,0.1,0.7,0.3,0.1]
I have 5 rows; 4 of them have data with values >= 1.0, while just 2 have values in the range >= 1.0 and <= 1.2. In the first case, the query I would use to grab the rows is
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).all()
This should return the 4 rows, at pk=0,1,2,3. This query does what I expect. The second case
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le)).filter(datadb.Table.data.any(1.2,operator=operators.ge)).all()
and should return the 2 rows at pk=1,3. However, this query just returns the same 4 rows as the first query. For the second query, I also tried
rows = session.query(datadb.Table).filter(datadb.Table.data.any(1.0,operator=operators.le),datadb.Table.data.any(1.2,operator=operators.ge)).all()
which also didn't work.
Please read the documentation on ARRAY.Comparator; according to it, you should be able to do the following. Note that any(value, operator=op) renders as value op ANY (data), with the scalar on the left-hand side, which is why operators.le selects rows where at least one element is >= the value:
rows = (session.query(Table)
        .filter(Table.data.any(10, operator=operators.le))
        .filter(Table.data.any(20, operator=operators.ge))
        .all())
EDIT:
# the combined filter does not work: each ANY predicate can be satisfied
# by a different element of the array, so a row with one element below the
# range and another above it still matches; applying one filter or the
# other is still useful, though, as it reduces the result set
q = (session.query(MyTable)
     .filter(MyTable.data.any(1.0, operator=operators.le))
     # .filter(MyTable.data.any(1.2, operator=operators.ge))
     )

# filter in memory
items = [_row for _row in q.all()
         if any(1.0 <= item <= 1.2 for item in _row.data)]
for item in items:
    print(item)
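If you want the range test to stay on the server, one alternative is to push the per-element check into raw SQL with unnest(), so that both bounds apply to the same array element. This is a sketch, assuming PostgreSQL and the datadb.Table mapping from the question:

from sqlalchemy import text

# hypothetical server-side filter: keep rows where a single element
# of data lies between 1.0 and 1.2
rows = (session.query(datadb.Table)
        .filter(text("EXISTS (SELECT 1 FROM unnest(data) AS elem "
                     "WHERE elem BETWEEN 1.0 AND 1.2)"))
        .all())

On the example table above, this would return only pk=1 and pk=3.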