I have log data in MySQL:
id | value | date
1 | 10.2 | 2017-07-20 18:00:00
2 | 10.5 | 2017-07-20 18:00:01
3 | 10.3 | 2017-07-20 18:00:03
I then transformed it into hashes and a sorted set in Redis.
These are my hashes:
hmset mylog:1 id 1 value 10.2 date 1388534400
hmset mylog:2 id 2 value 10.5 date 1388534401
hmset mylog:3 id 3 value 10.3 date 1388534402
and this is the sorted set:
zadd log_date 1388534400 1
zadd log_date 1388534401 2
zadd log_date 1388534402 3
I want to perform a query like WHERE date BETWEEN .... AND ....
Is there any way to get data from the hashes based on a date range in the sorted set?
Thanks!
There are two possible ways.
Keep the data in hashes and the dates as Unix timestamps in a sorted set, query the sorted set by score with ZRANGEBYSCORE (or ZRANGE ... BYSCORE on Redis 6.2+) to get the ids, then query the hashes with those ids.
Another approach, which I would recommend if your MySQL row data is simple (i.e. 2-3 columns with primitive values), is to store the data itself as the member of a sorted set, with the date as the score.
zadd log_date 1388534400 1_10.2
The position of the elements is fixed when you split the member on the underscore, so index [0] gives you the id and index [1] gives you the value.
This way all your data lives in the sorted set, and you can query it with ZRANGEBYSCORE (with the WITHSCORES option) to fetch all the entries, along with their dates, that fall between the two Unix timestamps you provide. This approach is memory efficient, and it also saves you from keeping two structures in sync, where you would have to add or delete data in the sorted set as well as in the hash. Here only the sorted set is required.
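As a minimal sketch of the splitting step described above, here is how the (member, score) pairs could be turned back into rows in Python. The function name parse_log_entries is hypothetical, and the input shape assumes a client that returns (member, score) tuples, as redis-py does for zrangebyscore(..., withscores=True).

```python
def parse_log_entries(pairs):
    """Turn (member, score) pairs into dicts.

    Assumes each member has the form "<id>_<value>" and the score is
    the Unix timestamp, as in the zadd example above.
    """
    entries = []
    for member, score in pairs:
        # Without decode_responses=True a Redis client returns bytes.
        if isinstance(member, bytes):
            member = member.decode("utf-8")
        log_id, value = member.split("_", 1)
        entries.append({"id": int(log_id), "value": float(value), "date": int(score)})
    return entries


# Sample input in the assumed (member, score) shape.
sample = [("1_10.2", 1388534400.0), ("2_10.5", 1388534401.0)]
print(parse_log_entries(sample))
```

Note that this only works while the value itself never contains the separator character.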
To do that, first perform the query on the Sorted Set to obtain the members in the date range, and then fetch the relevant Hashes.
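The two-step lookup can be sketched in Python roughly as follows. The function name fetch_logs_between and its default key names are illustrative assumptions; client is assumed to be a redis-py connection (for example redis.Redis(decode_responses=True)), of which only zrangebyscore and hgetall are used.

```python
def fetch_logs_between(client, start_ts, end_ts,
                       index_key="log_date", hash_prefix="mylog:"):
    """Return the hashes whose score in the sorted set lies in [start_ts, end_ts].

    `client` is assumed to expose zrangebyscore(name, min, max) and
    hgetall(name), as a redis-py connection does.
    """
    # Step 1: range query on the sorted set gives the ids in date order.
    ids = client.zrangebyscore(index_key, start_ts, end_ts)

    rows = []
    for log_id in ids:
        # Ids arrive as bytes unless the client decodes responses.
        if isinstance(log_id, bytes):
            log_id = log_id.decode("utf-8")
        # Step 2: fetch the hash that belongs to each id.
        rows.append(client.hgetall(f"{hash_prefix}{log_id}"))
    return rows
```

With the data from the question, fetch_logs_between(client, 1388534400, 1388534401) would return the hashes mylog:1 and mylog:2. For large ranges it may be worth batching the hgetall calls in a pipeline to cut round trips.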
I'm a novice SPSS user and am working on a data set with two columns, customer ID and order date. I want to create a third variable holding the integer number of inactive months since each customer ID's last order date. This is what the data looks like:
This will create some sample data to demonstrate on:
data list list/ID (f3) OrderDate (adate10).
begin data
1 09/18/2016
1 03/02/2017
1 05/12/2017
2 06/06/2016
2 09/09/2017
end data.
Now you can run the following syntax to create a variable that contains the number of complete months between the date in the present row and the date in the previous row:
sort cases by ID OrderDate.
if ID=lag(ID) MonthSince=DATEDIFF(OrderDate, lag(OrderDate), "months").
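For readers who want to check the logic outside SPSS, the same "complete months since the previous order of the same customer" calculation can be sketched in Python. The function names are hypothetical, and the month-counting rule (subtract one month when the later day of month is smaller) is an assumption that may differ from SPSS in edge cases such as month ends.

```python
from datetime import date


def complete_months_between(earlier, later):
    """Number of whole months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # The last month only counts once the day of month has been reached.
    if later.day < earlier.day:
        months -= 1
    return months


def months_since_previous(rows):
    """rows: iterable of (customer_id, order_date).

    Returns (customer_id, order_date, months) tuples sorted by id and date;
    months is None for each customer's first order.
    """
    result = []
    prev_id, prev_date = None, None
    for cust_id, order_date in sorted(rows):
        months = None
        if cust_id == prev_id:
            months = complete_months_between(prev_date, order_date)
        result.append((cust_id, order_date, months))
        prev_id, prev_date = cust_id, order_date
    return result


# Same sample data as in the SPSS syntax above.
orders = [
    (1, date(2016, 9, 18)), (1, date(2017, 3, 2)), (1, date(2017, 5, 12)),
    (2, date(2016, 6, 6)), (2, date(2017, 9, 9)),
]
for row in months_since_previous(orders):
    print(row)
```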
How would I write a statement that groups rows based on the monthly difference between their dates? Example:
org_group | date | second_group_by
A 30.10.2013 1
A 29.11.2013 1
A 31.12.2013 1
A 30.01.2015 2
A 27.02.2015 2
A 31.03.2015 2
A 30.04.2015 2
As long as the difference between consecutive dates is not more than one month, the rows should be in the same second_group_by. I hope it's clear enough to understand. The column second_group_by should be generated by the query; it doesn't exist in the table.
The date diff between which rows, though?
If you just want to separate years (or months or weeks) use
GROUP BY DATEPART(....)
That's Sybase or SQL Server, but other SQL dialects will have an equivalent.
If you have specific data ranges, get them into a table with start and end date-time and a monotonically increasing integer, join to that with a BETWEEN and GROUP BY the integer.
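To make the grouping rule from the question concrete, here is a small Python sketch of it, not a SQL solution. It assumes the gap is measured between consecutive rows in date order, counted in calendar months, and that the rows all belong to one org_group; the name assign_groups is hypothetical.

```python
from datetime import date


def assign_groups(dates, max_gap_months=1):
    """Give consecutive dates the same group number until the calendar-month
    gap to the previous date exceeds max_gap_months."""
    groups = []
    group_id = 0
    prev_index = None
    for d in sorted(dates):
        # Months since year 0, so differences count calendar months.
        month_index = d.year * 12 + d.month
        if prev_index is None or month_index - prev_index > max_gap_months:
            group_id += 1
        groups.append((d, group_id))
        prev_index = month_index
    return groups


# Dates for org_group A from the question.
dates_a = [
    date(2013, 10, 30), date(2013, 11, 29), date(2013, 12, 31),
    date(2015, 1, 30), date(2015, 2, 27), date(2015, 3, 31), date(2015, 4, 30),
]
for d, g in assign_groups(dates_a):
    print(d.strftime("%d.%m.%Y"), g)
```

The resulting pairs of first and last date per group could then be loaded into the range table described above and joined with BETWEEN.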
So this is a new one, I am thinking. We have an Access query with 2 date fields fdate1 and fdate2. The fdate1 is always the first date, and fdate2 is always the second. The two are a range. What we need to do is query the table to find all the records where the record is at any point in the year 2010. So for instance, here is some pretend data:
Fname fdate1 fdate2
John 2/18/2008 5/08/2014
Mary 1/6/2010 6/21/2010
Jane 9/25/2010 4/13/2012
We need to know any records that involve the date range of 1/1/2010 - 12/31/2010. As you can see, the above records all match, but because they are 2 separate fields, I am not sure how to find that those 2 columns represent a date range and that date range does or does not overlap with the date range criteria. Make sense?
Any help is appreciated.
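One approach would be to place the criteria >=DateSerial(2010,1,1) on fdate2, and <DateSerial(2011,1,1) on fdate1. A record's range touches 2010 exactly when it ends on or after January 1, 2010 and starts before January 1, 2011.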
select * from tablename
Where (fdate1 between '1/1/2010' and '12/31/2010') OR (fdate2 between '1/1/2010' and '12/31/2010') OR (fdate1 < '1/1/2010' and fdate2 > '12/31/2010')
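The overlap condition behind these answers can be checked with a short Python sketch using the sample rows from the question. The function name overlaps is hypothetical; the rule is that two ranges overlap when each one starts no later than the other ends.

```python
from datetime import date


def overlaps(start, end, range_start, range_end):
    """True if [start, end] shares at least one day with [range_start, range_end]."""
    return start <= range_end and end >= range_start


year_start, year_end = date(2010, 1, 1), date(2010, 12, 31)

# Sample rows from the question: (Fname, fdate1, fdate2).
records = [
    ("John", date(2008, 2, 18), date(2014, 5, 8)),
    ("Mary", date(2010, 1, 6), date(2010, 6, 21)),
    ("Jane", date(2010, 9, 25), date(2012, 4, 13)),
]

matches = [name for name, d1, d2 in records
           if overlaps(d1, d2, year_start, year_end)]
print(matches)
```

John's row is the case to watch: neither of his dates falls inside 2010, yet his range covers the whole year.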
I have a Cassandra column family where I am storing a large number (hundreds of thousands) of events per month with timestamp (“Ymdhisu”) as the row key. It has multiple columns capturing some data for each event. I tried retrieving events data for a specific time range. For example for the month of Jan, I used the following CQL query:
a) Query between range Jan 1- Jan 15, 2013
select count(*) from Test where Key > 20130101070100000000 and Key < 20130115070100000000 limit 100000;
Bad Request: Start key's md5 sorts after end key's md5. This is not allowed; you probably should not specify end key at all, under RandomPartitioner
b) Query between range Jan 1- Jan 10, 2013
select count(*) from Test where Key > 20130101070100000000 and Key < 20130110070100000000 limit 100000;
count - 73264
c) Query between range Jan 1- Jan 2, 2013
select count(*) from Test where Key > 20130101070100000000 and Key < 20130102070100000000 limit 100000;
count - 78328
It appears as though the range search simply is not working! The schema of my Columnfamily is:
Create column family Test with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type AND compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
To extract data, what are the suggestions? Do I need to redefine my schema with key validation class as TimeUUID type? Is there any other way to query efficiently without changing the schema?
I am dealing with at least 100-200K rows of data monthly in this column family. If this schema does not work for this purpose, what would be an appropriate Cassandra schema to store and retrieve the kind of data described here?
You can create secondary indexes such as "Date" and "Month", and store each event's Date and Month in those columns along with other data. When querying data, you can fetch all rows for specified months or days.
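As a small sketch of how the values for such "Date" and "Month" columns could be derived at write time, the Python helper below slices them out of the row key. The helper name is hypothetical, and it assumes the "Ymdhisu" key always starts with a 4-digit year, 2-digit month and 2-digit day, as in the keys shown in the question.

```python
def date_and_month(row_key):
    """Return (date, month) strings taken from a 'Ymdhisu' row key.

    Assumes the key begins with YYYYMMDD.
    """
    key = str(row_key)
    if len(key) < 8 or not key[:8].isdigit():
        raise ValueError(f"unexpected row key format: {row_key!r}")
    return key[:8], key[:6]


event_date, event_month = date_and_month("20130101070100000000")
print(event_date, event_month)
```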
I don't think a range query on keys will work with this setup. Perhaps it would if you changed your partitioner from RandomPartitioner to ByteOrderedPartitioner?
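The error message in the question hints at why: under RandomPartitioner rows are ordered by a hash of the key, not by the key itself. The Python sketch below is only a rough illustration of that idea. Treating the token as the absolute value of the MD5 digest read as a signed big-endian integer is an assumption about Cassandra's internals, and md5_token is a hypothetical name.

```python
import hashlib


def md5_token(key):
    """Approximate a RandomPartitioner-style token for a string key.

    Assumption: token = abs(MD5 digest as a signed big-endian integer).
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


keys = [
    "20130101070100000000",
    "20130102070100000000",
    "20130110070100000000",
    "20130115070100000000",
]

# Listing the keys in token order shows whether it follows timestamp order.
for key in sorted(keys, key=md5_token):
    print(key, md5_token(key))
```

Because the token order is unrelated to the chronological order of the keys, a key range such as "Jan 1 to Jan 15" does not correspond to a contiguous token range.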
I have a rating table. It boils down to:
rating_value created
+2 april 3rd
-5 april 20th
So, every time someone gets rated, I track that rating event in the database.
I want to generate a rating history/time graph where the rating is the sum of all ratings up to that point in time on a graph.
For example, a person's rating on April 5th might be: select sum(rating_value) from ratings where created <= april 5th
The only problem with this approach is I have to run this day by day across the interval I'm interested in. Is there some trick to generating a running total using this sort of data?
Otherwise, I'm thinking the best approach is to create a denormalized "rating history" table alongside the individual ratings.
If you have postgresql 8.4, you can use a window-aggregate function to calculate a running sum:
select rating_value, created,
sum(rating_value) over(order by created)
from rating;
rating_value | created | sum
--------------+------------+-----
2 | 2010-04-03 | 2
-5 | 2010-04-20 | -3
(2 rows)
See http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
Try adding a GROUP BY clause. That gives you the summed rating value for each day (e.g. in an array). When you output the rating over time, you can then add up the preceding array elements to get the running total.
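A minimal Python sketch of that application-side approach: sum the ratings per day, then accumulate the daily sums in date order. The function name running_totals is hypothetical, and the input is assumed to be (rating_value, created) pairs as they would come back from the query.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def running_totals(ratings):
    """ratings: iterable of (rating_value, created_date).

    Returns a list of (day, cumulative_rating) sorted by day.
    """
    # Equivalent of GROUP BY created with SUM(rating_value).
    per_day = defaultdict(int)
    for value, created in ratings:
        per_day[created] += value

    days = sorted(per_day)
    # Running sum over the daily totals.
    totals = accumulate(per_day[day] for day in days)
    return list(zip(days, totals))


# Rows from the question, with 2010 assumed as the year.
ratings = [(2, date(2010, 4, 3)), (-5, date(2010, 4, 20))]
for day, total in running_totals(ratings):
    print(day, total)
```

Days without any rating do not appear in the result; a graph would carry the previous total forward across them.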