Here is my Table:
CREATE TABLE mytable
(
id uuid,
day text,
mytime timestamp,
value text,
status int,
PRIMARY KEY ((id, day), mytime )
)
WITH CLUSTERING ORDER BY (mytime desc)
;
Here is the Index:
CREATE INDEX IF NOT EXISTS idx_status ON mytable (status);
When I run this select statement, I get the expected results:
select * from mytable
where id = 38403e1e-44b0-11e4-bd3d-005056a93afd
AND day = '2014-10-29'
;
62 rows are returned as a result of this query.
If I extend this query to include the indexed column:
select * from mytable
where id = 38403e1e-44b0-11e4-bd3d-005056a93afd
AND day = '2014-10-29'
AND status = 5
;
Zero rows are returned (and there are several records with status = 5).
If I query the table looking ONLY for a specific indexed value:
select * from mytable
where status = 5
;
Zero rows are also returned.
I'm at a loss. I don't understand what exactly is taking place.
I am on a 3-node cluster with replication factor 3, running Cassandra 2.1.3.
Could this be a configuration issue in cassandra.yaml?
Or is there an issue with my select statement?
Appreciate the assistance, thanks.
UPDATE:
I am seeing this in the system.log file. Any ideas?
ERROR [CompactionExecutor:1266] 2015-03-24 15:20:26,596 CassandraDaemon.java:167 - Exception in thread Thread[CompactionExecutor:1266,1,main]
java.lang.AssertionError: /cdata/cassandra/data/my_table-c5f756b5318532afb494483fa1828675/my_table.idx_status-ka-32-Data.db
at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:235) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:153) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:76) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:240) ~[apache-cassandra-2.1.3.jar:2.1.3]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_51]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_51]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_51]
I ran your steps above and was able to query rows by status = 5 just fine. One thing I can suggest is to try rebuilding your index. Try this from a command prompt:
nodetool rebuild_index mykeyspace mytable idx_status
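If the rebuild does not help, dropping and recreating the index also forces a full rebuild (a sketch, reusing the index definition from your schema):
DROP INDEX IF EXISTS idx_status;
CREATE INDEX IF NOT EXISTS idx_status ON mytable (status);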
Otherwise, IMO the best way to solve this is not with a secondary index. If you know that you're going to have to support a query by status (especially with a large dataset), then I would seriously consider building a specific, additional "query table" for it.
CREATE TABLE mytablebystatus (
    id uuid,
    day text,
    mytime timestamp,
    value text,
    status int,
    PRIMARY KEY ((status), day, mytime, id)
);
This would support queries only by status, or by status and day sorted by mytime, as in the sketch below. In summary, I would experiment with a few different PRIMARY KEY definitions and see which better suits your query patterns. That way, you can avoid having to use ill-performing secondary indexes altogether.
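For example (a sketch against the mytablebystatus definition above), both of these are served straight from the primary key, with no secondary index involved:
SELECT * FROM mytablebystatus WHERE status = 5;
SELECT * FROM mytablebystatus WHERE status = 5 AND day = '2014-10-29';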
I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values
(A, B, C, (
select C - cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 1
), now())
on conflict do nothing
Quite often (with a probability of about 98%) a row cannot be inserted into cusers because it violates the primary key constraint, so the expensive SELECT query would not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the SELECT subquery for the dyn column and only then rejects the row because of the (uid, lid, cnt) violation.
What is the best way to insert rows quickly in such a situation?
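For illustration, a sketch (assuming A, B, C stand for the literal values being inserted) that gates the insert so the scalar subquery is never evaluated for rows that already exist:
insert into cusers (uid, lid, cnt, dyn, ts)
select A, B, C,
       (select C - cnt
        from cusers
        where uid = A and lid = B
        order by ts desc
        limit 1),
       now()
where not exists (
    select 1 from cusers where uid = A and lid = B and cnt = C
)
on conflict do nothing;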
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2, and 5 - 2 = 3). Each time I insert a new row, it should:
avoid duplicate rows (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (so I take the last inserted row for the user according to the timestamp and subtract the values).
Another solution I've found is to use returning uid, lid, ts on the inserts to get the user ids that were really inserted; this is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
select max(cnt) - min(cnt)
from (
select cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 2) t
)
where uid = A and lid = B and ts = TS
But it is not a fast approach either, as it scans the ts column to find the two last inserted rows for each user. I need a fast insert query, as I insert millions of rows at a time (but I do not write duplicates).
What could the solution be? Maybe I need a new index for this? Thanks in advance.
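For reference, a sketch of the kind of index that would make the "last rows per user" lookups above cheap (the index name is hypothetical):
create index cusers_uid_lid_ts_idx on cusers (uid, lid, ts desc);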
I have a table ErrorCase in a Postgres database. This table has one field case_id with datatype text. Its value is generated in the format yymmdd_xxxx, where yymmdd is the date the record is inserted into the DB and xxxx is the sequence number of the record on that date.
For example, the third error case on 2019/08/01 will have case_id = 190801_0003. On 08/04, if there is one more case, its case_id will be 190804_0001, and so on.
I am already using a trigger in the database to generate the value for this field:
DECLARE
    total integer;
BEGIN
    -- count today's rows to derive the next sequence number
    SELECT (COUNT(*) + 1) INTO total FROM public.ErrorCase WHERE create_at = current_date;
    IF (NEW.case_id IS NULL) THEN
        NEW.case_id = to_char(current_timestamp, 'YYMMDD_') || trim(to_char(total, '0000'));
    END IF;
    RETURN NEW;
END
And in the Spring project, I configured the application properties for JPA/Hibernate:
datasource:
  type: com.zaxxer.hikari.HikariDataSource
  url: jdbc:postgresql://localhost:5432/table_name
  username: postgres
  password: postgres
  hikari:
    poolName: Hikari
    auto-commit: false
jpa:
  database-platform: io.github.jhipster.domain.util.FixedPostgreSQL82Dialect
  database: POSTGRESQL
  show-sql: true
  properties:
    hibernate.id.new_generator_mappings: true
    hibernate.connection.provider_disables_autocommit: true
    hibernate.cache.use_second_level_cache: true
    hibernate.cache.use_query_cache: false
    hibernate.generate_statistics: true
Currently, it generates the case_id correctly.
However, when many records are inserted at nearly the same time, it generates the same case_id for two records. I guess the reason is the isolation level: while the first transaction has not yet committed, the second transaction runs the SELECT query to build its case_id, so the result of that SELECT does not include the record from the first transaction (because it has not been committed yet). Therefore, the second case_id comes out the same as the first.
Please suggest a solution for this problem. Which isolation level is good for this case?
"yymmdd is the date when the record insert to DB, xxxx is the number of record in that date" - no offense but that is a horrible design.
You should have two separate columns: one date column and one integer column. If you want to increment the counter during an insert, make that date column the primary key and use insert on conflict. You can get rid of that horribly inefficient trigger and, more importantly, this will be safe for concurrent modifications even with read committed.
Something like:
create table error_case
(
error_date date not null primary key,
counter integer not null default 1
);
Then use the following to insert rows:
insert into error_case (error_date)
values (date '2019-08-01')
on conflict (error_date) do update
set counter = counter + 1;
No trigger needed and safe for concurrent inserts.
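If the application needs the generated value back, the same statement can return it directly (a sketch; FM in the to_char mask suppresses the leading padding space):
insert into error_case (error_date)
values (current_date)
on conflict (error_date) do update
    set counter = error_case.counter + 1
returning to_char(error_date, 'yymmdd') || '_' || to_char(counter, 'FM0000') as case_id;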
If you really need a text column as a "case ID", create a view that returns that format:
create view v_error_case
as
select concat(to_char(error_date, 'yymmdd'), '_', to_char(counter, 'FM0000')) as case_id,
... other columns
from error_case;
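A usage sketch (assuming error_date is among the other columns the view exposes):
select case_id from v_error_case where error_date = date '2019-08-01';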
I have two tables, tire_life and transaction, in my database.
The query works fine when the subquery returns a value, but when no row is returned it does not update the tire_life table.
This is the query I'm running:
UPDATE tire_life as life SET covered_distance = initial_durability + transactions.durability
FROM (SELECT tire_life_id, SUM(tire_covered_distance) as durability
FROM transaction WHERE tire_life_id = 24 AND deleted_at IS NULL
GROUP BY (tire_life_id)) as transactions
WHERE life.id = 24
I have tried to use the COALESCE() function with no success at all.
You can use coalesce() around a subquery only if it returns a single column. As you do not use tire_life_id outside the subquery, you can skip it:
UPDATE tire_life as life
SET covered_distance = initial_durability + transactions.durability
FROM (
SELECT coalesce(
(
SELECT SUM(tire_covered_distance)
FROM transaction
WHERE tire_life_id = 24 AND deleted_at IS NULL
GROUP BY (tire_life_id)
), 0) as durability
) as transactions
WHERE life.id = 24;
I guess you want to get 0 as durability if the subquery returns no rows.
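Equivalently (a sketch), since the aggregate yields a single scalar value, the coalesce can be inlined in SET and the FROM clause dropped:
UPDATE tire_life
SET covered_distance = initial_durability + coalesce(
        (SELECT SUM(tire_covered_distance)
         FROM transaction
         WHERE tire_life_id = 24 AND deleted_at IS NULL), 0)
WHERE id = 24;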
I have a problem. Here is part of a sample query:
select * from tablename WHERE (id = "some4294-0643-4eaa-a262-7479c1859860" OR code = "some4294-0643-4eaa-a262-7479c1859860")
and deleted is null and blablablabla...
And there are 2 indexes on this table: id and code.
If I'm querying tablename this way, the indexes are not used.
In the other query:
select * from (select * from tablename WHERE (id = "some4294-0643-4eaa-a262-7479c1859860" OR code = "some4294-0643-4eaa-a262-7479c1859860"))
where deleted is null and blablablabla...
the indexes are used.
The problem is that my real query is even more complex, and I don't really want to deal with a select inside a select, but I really want the indexes to be used.
Is there any way to build an index so the first statement uses it?
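For context, a common rewrite for an OR across two differently indexed columns (a sketch, using standard single-quoted string literals) is a UNION, so each branch can use its own index:
select * from tablename
where id = 'some4294-0643-4eaa-a262-7479c1859860' and deleted is null
union
select * from tablename
where code = 'some4294-0643-4eaa-a262-7479c1859860' and deleted is null;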
I am trying to create a table (CTAS) from Hive and want to write the file in BSON format, in order to import it into MongoDB.
Here is my query:
create table if not exists rank_locn
ROW FORMAT SERDE "com.mongodb.hadoop.hive.BSONSerde"
STORED AS INPUTFORMAT "com.mongodb.hadoop.BSONFileInputFormat"
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
as
select RGN_OVRHD_NBR,DM_OVRHD_NBR,LOCN_NBR,Derived,
rank() OVER (ORDER BY DERIVED DESC) as NationalRnk,
rank() OVER (PARTITION BY RGN_OVRHD_NBR ORDER BY DERIVED DESC) as RegionRnk,
rank() OVER (PARTITION BY DM_OVRHD_NBR ORDER BY DERIVED DESC) as DistrictRnk
from Locn_Dim_Values
where Derived between -999999 and 999999;
Three jobs are launched. The last reduce job fails with the following error log:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":78133,"reducesinkkey1":143.82632293080053},"value":{"_col0":1,"_col1":12,"_col2":79233,"_col3":78133,"_col4":1634,"_col5":143.82632293080053},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:274)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":78133,"reducesinkkey1":143.82632293080053},"value":{"_col0":1,"_col1":12,"_col2":79233,"_col3":78133,"_col4":1634,"_col5":143.82632293080053},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:262)
... 7 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.PTFOperator.executeWindowExprs(PTFOperator.java:341)
at org.apache.hadoop.hive.ql.exec.PTFOperator.processInputPartition(PTFOperator.java:198)
at org.apache.hadoop.hive.ql.exec.PTFOperator.processOp(PTFOperator.java:130)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Please help me resolve the issue.