I am trying to update only one record on a specific date where there are multiple records on that date... It doesn't matter which one. Below is the output I am looking for:
id date charge
01 1/1/2014 75
02 1/1/2014 0
03 1/1/2014 0
04 1/2/2014 75
05 1/3/2014 75
06 1/3/2014 0
It doesn't matter which id is updated; I just need one updated on that date. Any suggestions, or a pointer in the right direction, would be greatly appreciated.
When using pushdown predicate with AWS Glue Dynamic frame, how does it iterate through a list?
For example, the following list was created to be used as a pushdown predicate:
day = list(p_day.select('day').toPandas()['day'])
month = list(p_month.select('month').na.drop().toPandas()['month'])
year = list(p_year.select('year').toPandas()['year'])

quote = lambda s: "'" + str(s) + "'"
predicate = "day in (%s) and month in (%s) and year in (%s)" % (
    ",".join(map(quote, day)),
    ",".join(map(quote, month)),
    ",".join(map(quote, year)))
Let's say it returns this:
"day in ('07','15') and month in ('11','09','08') and year in ('2021')"
How would the push down predicate read this combination/list?
Is it:

day  month  year
07   11     2021
15   11     2021
07   09     2021
15   09     2021
07   08     2021
15   08     2021

-OR-

day  month  year
07   11     2021
15   11     2021
15   08     2021
15   09     2021
I have a feeling that this list is read like the first table rather than the second... But it's the latter that I would like to pass through as a pushdown predicate. Does creating the lists essentially cause a permutation? It's as if the true day, month, and year combinations are lost in the lists; they should be 11/7/2021, 11/15/2021, 08/15/2021, and 09/15/2021.
This has nothing to do with Glue itself, since the Partition Predicate is just basic Spark SQL. You will receive the first list and not the second. You would have to restructure the boolean expression to receive the second list.
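Since the predicate is plain Spark SQL, the restructuring amounts to OR-ing together one exact conjunction per real partition instead of three independent `in` lists. A minimal Python sketch (the `combos` list of tuples is an assumed input; in practice you would select the three columns together rather than separately):

```python
# Each tuple is one real (day, month, year) partition to load.
combos = [("07", "11", "2021"), ("15", "11", "2021"),
          ("15", "08", "2021"), ("15", "09", "2021")]

# OR together one exact conjunction per combination, instead of three
# independent "in" lists (which would match the full cross-product).
predicate = " or ".join(
    "(day = '%s' and month = '%s' and year = '%s')" % (d, m, y)
    for d, m, y in combos
)
```

This yields a predicate that matches only the four true combinations; it begins `(day = '07' and month = '11' and year = '2021') or ...`.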
I am having difficulty achieving this functionality in SPSS. The data set is formatted like this (apologies for the Excel format).
In this example, the AGGREGATE function was used to combine cases with the same value of a variable; in other words, CITY (Tampa in the example) is the break variable.
Unfortunately, each entry for Tampa gives 10 unique temperatures, one for each day. So the first entry for Tampa covers days 0-10 and the second covers days 10-20; both provide useful information. I can't figure out how to use the aggregate function to create new variables so these days aren't lost. I want to do this so that I can run tests on the mean temperature in Tampa over days 0-20, relative to days 0-20 in other cities.
My current syntax is:
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=CITY
/Temp=Max(Temp).
But this doesn't create the variables, and I'm not sure where to start on that end. I checked the SPSS manual and didn't see this as an option within AGGREGATE. Any idea what function might allow this?
If I understand right, you are trying to reorganize all the CITY information onto one line per city, not to aggregate it. So what you are looking for is the restructure command casestovars.
First we'll create some fake data to demonstrate on:
data list list/City (a10) temp1 to temp10 (10f6).
begin data
Tampa 10 11 12 13 14 15 16 17 18 19
Boston 20 21 22 23 24 25 26 27 28 29
Tampa 30 31 32 33 34 35 36 37 38 39
NY 40 41 42 43 44 45 46 47 48 49
Boston 50 51 52 53 54 55 56 57 58 59
End data.
casestovars needs an index variable (e.g. the row number within each city). In your example your data doesn't have an index, so the following commands will create one:
sort cases by CITY.
if $casenum=1 or city<>lag(city) IndVar=1.
if city=lag(city) IndVar=lag(IndVar)+1.
format IndVar(f2).
Now we can restructure:
sort cases by CITY IndVar.
casestovars /id=CITY /index=IndVar/separator="_"/groupby=index.
This will also work if you have more rows per city.
Important note: my artificial index (IndVar) doesn't necessarily reflect the original order of rows in your file. If your file really doesn't contain an index and isn't ordered so that the first row represents the first measurements etc., the restructured file will accordingly not be ordered either: the earlier measurements might appear to the left or to the right of the later ones, according to their order in the original file. To avoid this, you should try to define a real index and use it in casestovars.
Run EXECUTE or Transform > Run Pending Transformations to see the results of the AGGREGATE command.
I have a question about timestamps; I hope you can help me.
I'm reading one timestamp column from Excel into MATLAB using:
[temp, timestamps] = xlsread('2012_15min.xls', 'JAN', 'A25:A2999');
This column has dates like this:
01-01-2012 00:00
01-01-2012 00:15
01-01-2012 00:30
01-01-2012 00:45
01-01-2012 01:00
(it goes on until the end of January in periods of 15 minutes)
Now I want to get a new column in MATLAB that keeps only year, month, day and hour. These fields must be separated, and I don't want to keep repeated dates (e.g. I don't want four rows with 01 01 2012 0, only one).
So I want to get:
01 01 2012 0
01 01 2012 1
01 01 2012 2
It must go until the end of January with periods of 1 hour.
If you know that there is data for every hour you could construct this directly, but if you have possible missing data and you therefore need to convert from your timestamps, then some combination of datestr/datenum/datevec is usually the best bet.
First, convert timestamps with datevec:
times = datevec(timestamps); % sometimes need to also use format string
Then, take only the year/month/day/hour, removing repetitions:
[times_hours,m,n] = unique(times(:,1:4), 'rows');
You can use the indices in m to extract the matching data for those times.
If you want this converted back to some sort of string you can use datestr and specify format:
timesout = datestr(times_hours,'dd mm yyyy hh');
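For comparison outside MATLAB, the same idea (truncate each timestamp to the hour, then keep the first occurrence of each hour) can be sketched in plain Python; the sample strings mirror the question's format:

```python
from datetime import datetime

# Sample 15-minute timestamps in the question's dd-mm-yyyy hh:mm format.
timestamps = ["01-01-2012 00:00", "01-01-2012 00:15",
              "01-01-2012 00:30", "01-01-2012 00:45",
              "01-01-2012 01:00"]

seen = set()
hours = []
for s in timestamps:
    t = datetime.strptime(s, "%d-%m-%Y %H:%M")
    key = (t.year, t.month, t.day, t.hour)  # drop the minutes
    if key not in seen:                     # keep one row per hour
        seen.add(key)
        hours.append(key)
# hours is now [(2012, 1, 1, 0), (2012, 1, 1, 1)]
```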
I have been researching through the posted material on the site and can't figure out whether I need GROUP BY or PARTITION BY, and I can't seem to get the answer I want.
What I have is an enormous amount of TAG data, and what I need is to retrieve a history of the locations it was seen at, i.e. show me the last time it was seen at a particular location.
I have added a small subset of the data.
What I want to see is ordered by date, newest first: ROW 001 as the first entry for location ...004, then ROW 209 as the latest entry for ...050, then ROW 216 for ...004 again; in other words, one row every time there is a change.
Can this be done with a SQL statement?
Thanks for any help you can offer.
ROW TAG DATE
001 004 2012-10-19 10:20
002 004 2012-10-19 10:10
003 004 2012-10-19 10:00
209 050 2012-10-19 08:50
210 050 2012-10-19 08:40
211 050 2012-10-19 08:30
216 004 2012-10-19 07:30
217 004 2012-10-19 02:20
Here is how it can be done:
with last_records_tagged as (
    select
        row,
        tag,
        date,
        case when lead(tag) over (order by date asc) is null
               or lead(tag) over (order by date asc) <> tag
             then 1
             else 0
        end as last
    from your_table
)
select * from last_records_tagged where last = 1
Note that the lead is not partitioned by tag: it has to look across tags so that a change of tag marks the end of a run.
(I ordered ascending by date because it makes more sense to me).
Basically, any time the next record, ordered by date, has a different tag, you know you are at the last record of that group.
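That logic can be demonstrated end to end with SQLite (window functions need SQLite 3.25+; the table and column names here are invented, with ROW and DATE renamed since they are reserved words in some dialects):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table sightings (row_id text, tag text, seen_at text)")
con.executemany("insert into sightings values (?, ?, ?)", [
    ("001", "004", "2012-10-19 10:20"), ("002", "004", "2012-10-19 10:10"),
    ("003", "004", "2012-10-19 10:00"), ("209", "050", "2012-10-19 08:50"),
    ("210", "050", "2012-10-19 08:40"), ("211", "050", "2012-10-19 08:30"),
    ("216", "004", "2012-10-19 07:30"), ("217", "004", "2012-10-19 02:20"),
])

# A row ends a run of sightings when the next row by time has another tag.
rows = con.execute("""
    with flagged as (
        select row_id, tag, seen_at,
               lead(tag) over (order by seen_at) as next_tag
        from sightings
    )
    select row_id, tag, seen_at
    from flagged
    where next_tag is null or next_tag <> tag
    order by seen_at desc
""").fetchall()
# rows 001, 209 and 216 come back, newest first, as the question asks.
```

Partitioning the lead by tag would hide the changes: within a partition the next tag is always the same, so only the overall last row per tag would be flagged.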
I have an SSAS cube with the fact table containing:
FactID
Status
StartDate
EndDate
The dates are linked to a date dimension (and status to the status dimension).
I'm trying to get a report that shows the number of facts at each status on each day over a two-week period, e.g.:

          01 May 2011   02 May 2011   03 May 2011   ...
status1   300           310           320           ...
status2   250           240           265           ...
status3   125           546           123           ...
I can obtain the data for a single day using the following:
select
{
[TOTAL NUMBER FACT]
} on 0
,{
descendants([DIM STATUS].[STATUS DESCRIPTION])
} on 1
from [DW_CUBE]
WHERE
([DIM HISTORY START DATE].[YEAR MONTH DAY].FirstMember:[DIM HISTORY START DATE].[YEAR MONTH DAY].&[20110501],
[DIM HISTORY END DATE].[YEAR MONTH DAY].&[20110501]:[DIM HISTORY END DATE].[YEAR MONTH DAY].LastMember)
but how do I get this working for more than a single day?
Many many thanks
Have a look at the following links:
http://www.bp-msbi.com/2010/10/avoiding-multiple-role-playing-date-dimensions/
http://cwebbbi.wordpress.com/2011/01/21/solving-the-events-in-progress-problem-in-mdx-part-1/
In brief, you can use MDX to do this with LinkMember, or, if you are counting events in progress, by counting from the beginning of time until now and subtracting one measure from another.
You can also solve the problem with modelling: in my post by pivoting, and in Chris's follow-up with role-playing measure groups.