NTP Audit - failed adjtimex syscall?

As part of a new PCI-DSS server deployment I am in the process of configuring a fully auditable NTP time change history. All is working as expected however I am now seeing audit logs written every single second relating to time change operations. After a lot of searching I'm still no closer to understanding what is going on. The issue shows itself in /var/log/messages where an audit message is being written continuously.
My research suggests that the syscall "exit=5" message means that the clock was not properly synchronised:
adjtimex() syscall response "#define TIME_BAD 5 /* clock not synchronized */".
So, in summary it appears that the clock is synced correctly (as far as my understanding takes me) however it is constantly changing - unexpected behaviour with the polling interval set at the default 64s.
Is anyone able to offer suggestions? I've included as much detail as I can muster below:
Audit time rules:
[09:31] callum pci-fram-ipa1 ~ $ sudo cat /etc/audit/rules.d/audit_time_rules.rules
-a always,exit -F arch=b64 -S adjtimex -S settimeofday -k time-change
-a always,exit -F arch=b32 -S adjtimex -S settimeofday -S stime -k time-change
-a always,exit -F arch=b64 -S clock_settime -k time-change
-a always,exit -F arch=b32 -S clock_settime -k time-change
-w /etc/localtime -p wa -k time-change
System time vs clock time:
[09:14] callum pci-fram-ipa1 ~ $ sudo clock;date
Thu 05 Jan 2017 09:14:01 GMT -0.500708 seconds
Thu 5 Jan 09:14:01 GMT 2017
Example audit output:
[09:15] callum pci-fram-ipa1 ~ $ sudo tail -f /var/log/messages|grep time
Jan 5 09:15:25 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607725.390:2328215): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:26 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607726.390:2328216): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:27 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607727.390:2328217): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:28 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607728.390:2328218): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:29 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607729.390:2328219): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:30 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607730.390:2328220): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:31 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607731.390:2328221): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:32 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607732.390:2328222): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:33 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607733.390:2328223): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Jan 5 09:15:34 pci-fram-ipa1 audispd: node=pci-fram-ipa1.x.net type=SYSCALL msg=audit(1483607734.390:2328224): arch=c000003e syscall=159 success=yes exit=5 a0=7ffe85ddc320 a1=7ffe85ddc410 a2=861 a3=2 items=0 ppid=1 pid=11479 auid=4294967295 uid=38 gid=38 euid=38 suid=38 fsuid=38 egid=38 sgid=38 fsgid=38 tty=(none) ses=4294967295 comm="ntpd" exe="/usr/sbin/ntpd" subj=system_u:system_r:ntpd_t:s0 key="time-change"
Sync stats:
[09:15] callum pci-fram-ipa1 ~ $ sudo ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*neon.trippett.o 131.188.3.221 2 u 112 256 377 17.924 -0.704 0.252
+uno.alvm.me 193.79.237.14 2 u 196 256 377 19.737 0.505 0.436
+greenore.zeip.e 140.203.204.77 2 u 165 256 377 19.616 0.019 0.252
+devrandom.pl 87.124.126.49 3 u 124 256 377 19.675 0.371 0.572
Additional info:
[09:17] callum pci-fram-ipa1 ~ $ ntpdc -c sysinfo
system peer: neon.trippett.org
system peer mode: client
leap indicator: 00
stratum: 3
precision: -23
root distance: 0.03258 s
root dispersion: 0.04211 s
reference ID: [178.62.6.103]
reference time: dc188cec.d9ea15c5 Thu, Jan 5 2017 9:14:20.851
system flags: auth ntp stats
jitter: 0.000320 s
stability: 0.000 ppm
broadcastdelay: 0.000000 s
authdelay: 0.000000 s

This could well be expected behavior, given how often ntpd adjusts the clock.
From NTP documentation:
5.1.3.2. How frequently will the System Clock be updated?
As time should be a continuous and steady stream, ntpd updates the clock in small quantities. However, to keep up with clock errors, such corrections have to be applied frequently. If adjtime() is used, ntpd will update the system clock every second (I know this is not adjtimex, but adjtimex can function just like adjtime in the ADJ_OFFSET_SINGLESHOT mode: see the adjtimex man page). If ntp_adjtime() is available, the operating system can compensate clock errors automatically, requiring only infrequent updates. See also Section 5.2 and Q: 5.1.6.1.
The polling interval has nothing to do with this, though. It is instead how often the upstream (lower-stratum) time server is queried for reference.
If the problem is that you're seeing the audit entries and you don't wish to see them for the ntp user - and you only want to see nefarious time skews, then follow the advice from this link, and exclude the ntp uid/auid.
Also, from the adjtimex man page, it seems that the TIME_BAD error you see may not mean that the time was never correctly slewed:
TIME_ERROR  The system clock is not synchronized to a reliable server. This value is returned when any of the following holds true:
  * Either STA_UNSYNC or STA_CLOCKERR is set.
  * STA_PPSSIGNAL is clear and either STA_PPSFREQ or STA_PPSTIME is set.
  * STA_PPSTIME and STA_PPSJITTER are both set.
  * STA_PPSFREQ is set and either STA_PPSWANDER or STA_PPSJITTER is set.
The symbolic name TIME_BAD is a synonym for TIME_ERROR, provided for backward compatibility.
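To make the audit records in the question easier to read, here is a minimal Python sketch that pulls the syscall number and exit value out of one record and names the adjtimex() return state. The state values are the ones listed in the adjtimex man page; the function name decode_adjtimex and the shortened sample record are made up for illustration, and 159 is the adjtimex syscall number on x86_64 (arch=c000003e) only.

```python
import re

# adjtimex() return states, as listed in the adjtimex(2) man page.
TIME_STATES = {
    0: "TIME_OK",
    1: "TIME_INS",
    2: "TIME_DEL",
    3: "TIME_OOP",
    4: "TIME_WAIT",
    5: "TIME_ERROR",  # TIME_BAD is a synonym
}

ADJTIMEX_X86_64 = 159  # syscall number on x86_64 only


def decode_adjtimex(record):
    """Return (success, state_name) for an x86_64 adjtimex audit record, else None."""
    # Audit records are space-separated key=value pairs; values may be quoted.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', record))
    if fields.get("arch") != "c000003e" or fields.get("syscall") != str(ADJTIMEX_X86_64):
        return None
    exit_code = int(fields["exit"])
    return fields.get("success") == "yes", TIME_STATES.get(exit_code, "unknown")


# Shortened, hypothetical copy of one record from the question.
sample = ('type=SYSCALL msg=audit(1483607725.390:2328215): arch=c000003e '
          'syscall=159 success=yes exit=5 comm="ntpd" exe="/usr/sbin/ntpd"')
print(decode_adjtimex(sample))
```

Note that success=yes together with exit=5 means the call itself succeeded: 5 is the clock state that adjtimex() returned, not an error number.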

Related

How to extract the whole hours from a time range in Postgresql and get the duration of each extracted hour

I'm new to databases (and even newer to Postgres), so I'd appreciate some help. I have a table something like this:
id_interaction  start_time           end_time
0001            2022-06-03 12:40:10  2022-06-03 12:45:16
0002            2022-06-04 10:50:40  2022-06-04 11:10:12
0003            2022-06-04 16:30:00  2022-06-04 18:20:00
0004            2022-06-05 23:00:00  2022-06-06 10:30:12
Basically, I need a query that splits each interaction by clock hour and returns the duration within each hour, for example:
id_interaction  start_time           end_time             hour      duration
0001            2022-06-03 12:40:10  2022-06-03 12:45:16  12:00:00  00:05:06
0002            2022-06-04 10:50:40  2022-06-04 11:10:12  10:00:00  00:09:20
0002            2022-06-04 10:50:40  2022-06-04 11:10:12  11:00:00  00:10:12
0003            2022-06-04 16:30:00  2022-06-04 18:20:00  16:00:00  00:30:00
0003            2022-06-04 16:30:00  2022-06-04 18:20:00  17:00:00  01:00:00
0003            2022-06-04 16:30:00  2022-06-04 18:20:00  18:00:00  00:20:00
0004            2022-06-05 23:00:00  2022-06-06 03:30:12  23:00:00  01:00:00
0004            2022-06-05 23:00:00  2022-06-06 03:30:12  24:00:00  01:00:00
0004            2022-06-05 23:00:00  2022-06-06 03:30:12  01:00:00  01:00:00
0004            2022-06-05 23:00:00  2022-06-06 03:30:12  02:00:00  01:00:00
0004            2022-06-05 23:00:00  2022-06-06 03:30:12  03:00:00  00:30:12
I need all the hours from start to finish. For example: if an id starts at 17:10 and ends at 19:00, I need the duration of 17:00, 18:00 and 19:00.

If you're trying to get the duration in each whole hour interval overlapped by your data, this can be achieved by rounding timestamps using date_trunc(), using generate_series() to move around the intervals and casting between time, interval and timestamp:
create or replace function hours_crossed(starts timestamp, ends timestamp)
returns integer
language sql as '
    select case
        when date_trunc(''hour'',starts)=date_trunc(''hour'',ends)
            then 0
        when date_trunc(''hour'',starts)=starts
            then floor(extract(''epoch'' from ends-starts)::numeric/60.0/60.0)
        else floor(extract(''epoch'' from ends-starts)::numeric/60.0/60.0) +1
    end';
select * from (
    select
        id_interacao,
        tempo_inicial,
        tempo_final,
        to_char(hora, 'HH24:00')::time as hora,
        least(tempo_final, hora + '1 hour'::interval)
            - greatest(tempo_inicial, hora)
            as duracao
    from (select
              *,
              date_trunc('hour',tempo_inicial)
                  + (generate_series(0, hours_crossed(tempo_inicial,tempo_final))::text||' hours')::interval
                  as hora
          from test_times
    ) a
) a
-- keep only positive overlaps: this drops zero-length rows and any
-- generated hour that starts after tempo_final (negative duration)
where duracao > '0'::interval;
This also fixes your first entry that lasts 5 minutes but shows as 45.
You'll need to decide how you want to handle zero-length intervals and ones that end on an exact hour - I added a condition to skip them. Here's a working example.
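As a cross-check of the logic outside the database, the same hour-by-hour split can be sketched in plain Python. This is only an illustration of the idea (truncate to the hour, step one hour at a time, clip each slice to the interval); the function name split_by_hour is an assumption, and empty slices are skipped in the same spirit as the query above.

```python
from datetime import datetime, timedelta


def split_by_hour(start, end):
    """Yield (hour_start, duration) for each clock hour overlapped by [start, end)."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # Clip the clock hour to the part actually covered by the interval.
        duration = min(end, next_hour) - max(start, hour)
        if duration > timedelta(0):  # skip empty slices
            yield hour, duration
        hour = next_hour


fmt = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2022-06-04 10:50:40", fmt)
end = datetime.strptime("2022-06-04 11:10:12", fmt)
for hour, duration in split_by_hour(start, end):
    print(hour.strftime("%H:%M:%S"), duration)
```

For the second sample row (10:50:40 to 11:10:12) this yields one slice of 9 minutes 20 seconds in the 10:00 hour and one of 10 minutes 12 seconds in the 11:00 hour.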

Date range by week number Golang

With this simple function, I can get the week number. Now, with the number of the week, how can I get the date range, started on Sunday?
package main

import (
	"fmt"
	"strconv"
	"time"
)

func main() {
	fmt.Println(Week(time.Now().UTC()))
}

// Week returns the ISO week number of now, prefixed with "S".
func Week(now time.Time) string {
	_, thisWeek := now.ISOWeek()
	return "S" + strconv.Itoa(thisWeek)
}

Foreword: Time.ISOWeek() returns you the week number that starts on Monday, so I will answer your question that also handles weeks starting on Monday. Alter it to your needs if you want it to work with weeks starting on Sunday.
I released this utility in github.com/icza/gox, see timex.WeekStart().
The standard library does not provide a function that would return you the date range of a given week (year+week number). So we have to construct one ourselves.
And it's not that hard. We can start from the middle of the year, align to the first day of the week (Monday), get the week number of that time value, and correct it: add as many days as the week difference multiplied by 7.
This is how it could look:
func WeekStart(year, week int) time.Time {
	// Start from the middle of the year:
	t := time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC)
	// Roll back to Monday:
	if wd := t.Weekday(); wd == time.Sunday {
		t = t.AddDate(0, 0, -6)
	} else {
		t = t.AddDate(0, 0, -int(wd)+1)
	}
	// Difference in weeks:
	_, w := t.ISOWeek()
	t = t.AddDate(0, 0, (week-w)*7)
	return t
}
Testing it:
fmt.Println(WeekStart(2018, 1))
fmt.Println(WeekStart(2018, 2))
fmt.Println(WeekStart(2019, 1))
fmt.Println(WeekStart(2019, 2))
Output (try it on the Go Playground):
2018-01-01 00:00:00 +0000 UTC
2018-01-08 00:00:00 +0000 UTC
2018-12-31 00:00:00 +0000 UTC
2019-01-07 00:00:00 +0000 UTC
One nice property of this WeekStart() implementation is that it handles out-of-range weeks nicely. That is, if you pass 0 for the week, it will be interpreted as the last week of the previous year. If you pass -1 for the week, it will designate the second to last week of the previous year. Similarly, if you pass max week of the year plus 1, it will be interpreted as the first week of the next year etc.
The above WeekStart() function only returns the given week's first day (Monday), because the last day of the week is always its first day + 6 days.
If we also need the last day:
func WeekRange(year, week int) (start, end time.Time) {
	start = WeekStart(year, week)
	end = start.AddDate(0, 0, 6)
	return
}
Testing it:
fmt.Println(WeekRange(2018, 1))
fmt.Println(WeekRange(2018, 2))
fmt.Println(WeekRange(2019, 1))
fmt.Println(WeekRange(2019, 2))
Output (try it on the Go Playground):
2018-01-01 00:00:00 +0000 UTC 2018-01-07 00:00:00 +0000 UTC
2018-01-08 00:00:00 +0000 UTC 2018-01-14 00:00:00 +0000 UTC
2018-12-31 00:00:00 +0000 UTC 2019-01-06 00:00:00 +0000 UTC
2019-01-07 00:00:00 +0000 UTC 2019-01-13 00:00:00 +0000 UTC
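For comparison, here is a minimal Python sketch of the same week-number-to-date-range idea. It relies on date.fromisocalendar (Python 3.8+), which is a different technique from the mid-year alignment used in WeekStart(); the function name week_range and the sunday_start flag are illustrative assumptions, and the Sunday variant simply treats the day before the ISO Monday as the first day of the week.

```python
from datetime import date, timedelta


def week_range(year, week, sunday_start=False):
    """Return (first_day, last_day) of the given ISO week."""
    start = date.fromisocalendar(year, week, 1)  # Monday of the ISO week
    if sunday_start:
        start -= timedelta(days=1)  # assumption: week begins the Sunday before
    return start, start + timedelta(days=6)


print(week_range(2018, 1))
print(week_range(2019, 1))
print(week_range(2019, 1, sunday_start=True))
```

Unlike WeekStart(), this sketch does not accept out-of-range week numbers: date.fromisocalendar raises ValueError for them.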

The following finds the first day of the week for me, although from a time value rather than from a week number. If you replace the hard-coded time.Monday with an extra parameter, the week can start on any day, e.g. Sunday.
func weekStartDate(date time.Time) time.Time {
	offset := (int(time.Monday) - int(date.Weekday()) - 7) % 7
	result := date.Add(time.Duration(offset*24) * time.Hour)
	return result
}
Test:
func TestWeekStartDate(t *testing.T) {
	date := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	for i := 0; i < 2000; i++ {
		weekStart := weekStartDate(date)
		log.Printf("%s %s", date.Format("2006-01-02 Mon"), weekStart.Format("2006-01-02 Mon"))
		assert.NotNil(t, weekStart)
		assert.Equal(t, time.Monday, weekStart.Weekday())
		date = date.Add(24 * time.Hour)
	}
}
Output:
...
2021/01/17 08:50:03 2020-12-15 Tue 2020-12-14 Mon
2021/01/17 08:50:03 2020-12-16 Wed 2020-12-14 Mon
2021/01/17 08:50:03 2020-12-17 Thu 2020-12-14 Mon
2021/01/17 08:50:03 2020-12-18 Fri 2020-12-14 Mon
2021/01/17 08:50:03 2020-12-19 Sat 2020-12-14 Mon
2021/01/17 08:50:03 2020-12-20 Sun 2020-12-14 Mon
2021/01/17 08:50:03 2020-12-21 Mon 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-22 Tue 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-23 Wed 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-24 Thu 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-25 Fri 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-26 Sat 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-27 Sun 2020-12-21 Mon
2021/01/17 08:50:03 2020-12-28 Mon 2020-12-28 Mon
2021/01/17 08:50:03 2020-12-29 Tue 2020-12-28 Mon
2021/01/17 08:50:03 2020-12-30 Wed 2020-12-28 Mon
2021/01/17 08:50:03 2020-12-31 Thu 2020-12-28 Mon
2021/01/17 08:50:03 2021-01-01 Fri 2020-12-28 Mon
2021/01/17 08:50:03 2021-01-02 Sat 2020-12-28 Mon
2021/01/17 08:50:03 2021-01-03 Sun 2020-12-28 Mon
2021/01/17 08:50:03 2021-01-04 Mon 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-05 Tue 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-06 Wed 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-07 Thu 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-08 Fri 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-09 Sat 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-10 Sun 2021-01-04 Mon
2021/01/17 08:50:03 2021-01-11 Mon 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-12 Tue 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-13 Wed 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-14 Thu 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-15 Fri 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-16 Sat 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-17 Sun 2021-01-11 Mon
2021/01/17 08:50:03 2021-01-18 Mon 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-19 Tue 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-20 Wed 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-21 Thu 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-22 Fri 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-23 Sat 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-24 Sun 2021-01-18 Mon
2021/01/17 08:50:03 2021-01-25 Mon 2021-01-25 Mon
2021/01/17 08:50:03 2021-01-26 Tue 2021-01-25 Mon
2021/01/17 08:50:03 2021-01-27 Wed 2021-01-25 Mon
2021/01/17 08:50:03 2021-01-28 Thu 2021-01-25 Mon
2021/01/17 08:50:03 2021-01-29 Fri 2021-01-25 Mon
2021/01/17 08:50:03 2021-01-30 Sat 2021-01-25 Mon
2021/01/17 08:50:03 2021-01-31 Sun 2021-01-25 Mon
2021/01/17 08:50:03 2021-02-01 Mon 2021-02-01 Mon
2021/01/17 08:50:03 2021-02-02 Tue 2021-02-01 Mon
2021/01/17 08:50:03 2021-02-03 Wed 2021-02-01 Mon
2021/01/17 08:50:03 2021-02-04 Thu 2021-02-01 Mon
2021/01/17 08:50:03 2021-02-05 Fri 2021-02-01 Mon
...
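The "extra parameter" variant mentioned in this answer can be sketched in Python as follows. The names week_start, MONDAY and SUNDAY are assumptions made for the example. The Go code subtracts 7 before taking the remainder because Go's % keeps the sign of the dividend; Python's % with a positive modulus is never negative, so the offset can be computed directly.

```python
from datetime import date, timedelta

MONDAY, SUNDAY = 0, 6  # values used by date.weekday()


def week_start(day, first_weekday=MONDAY):
    """Return the most recent first_weekday on or before day."""
    offset = (day.weekday() - first_weekday) % 7  # always 0..6 in Python
    return day - timedelta(days=offset)


print(week_start(date(2021, 1, 1)))          # 2021-01-01 is a Friday
print(week_start(date(2021, 1, 1), SUNDAY))
```

With the default Monday start, 2021-01-01 maps to 2020-12-28, matching the test output above; with a Sunday start it maps to 2020-12-27.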

Thanks to #prajwal Singh, I've found a more generic way to find the first and last day of a week, given the week, month, and year:
func DateRange(week, month, year int) (startDate, endDate time.Time) {
	timeBenchmark := time.Date(year, time.Month(month), 1, 0, 0, 0, 0, time.UTC)
	weekStartBenchmark := timeBenchmark.AddDate(0, 0, -(int(timeBenchmark.Weekday())+6)%7)
	startDate = weekStartBenchmark.AddDate(0, 0, (week-1)*7)
	endDate = startDate.AddDate(0, 0, 6)
	return startDate, endDate
}

Thanks to #icza for the solution; I found a way to simplify the logic even further:
func DateRange(week, year int) (startDate, endDate time.Time) {
	timeBenchmark := time.Date(year, 7, 1, 0, 0, 0, 0, time.UTC)
	weekStartBenchmark := timeBenchmark.AddDate(0, 0, -(int(timeBenchmark.Weekday())+6)%7)
	_, weekBenchmark := weekStartBenchmark.ISOWeek()
	startDate = weekStartBenchmark.AddDate(0, 0, (week-weekBenchmark)*7)
	endDate = startDate.AddDate(0, 0, 6)
	return startDate, endDate
}
Works fine as well.

How to get the lag of a column in a Spark streaming dataframe?

I have data streaming into my Spark Scala application in this format
id mark1 mark2 mark3 time
uuid1 100 200 300 Tue Aug 8 14:06:02 PDT 2017
uuid1 100 200 300 Tue Aug 8 14:06:22 PDT 2017
uuid2 150 250 350 Tue Aug 8 14:06:32 PDT 2017
uuid2 150 250 350 Tue Aug 8 14:06:52 PDT 2017
uuid2 150 250 350 Tue Aug 8 14:06:58 PDT 2017
I have it read into columns id, mark1, mark2, mark3 and time. The time is converted to datetime format as well.
I want to get this grouped by id and get the lag for mark1 which gives the previous row's mark1 value.
Something like this:
id mark1 mark2 mark3 prev_mark time
uuid1 100 200 300 null Tue Aug 8 14:06:02 PDT 2017
uuid1 100 200 300 100 Tue Aug 8 14:06:22 PDT 2017
uuid2 150 250 350 null Tue Aug 8 14:06:32 PDT 2017
uuid2 150 250 350 150 Tue Aug 8 14:06:52 PDT 2017
uuid2 150 250 350 150 Tue Aug 8 14:06:58 PDT 2017
Consider the dataframe to be markDF. I have tried:
val window = Window.partitionBy("uuid").orderBy("timestamp")
val newerDF = newDF.withColumn("prev_mark", lag("mark1", 1, null).over(window))
which fails with an error saying that non-time-based windows cannot be applied to streaming DataFrames/Datasets.
I have also tried:
val window = Window.partitionBy("uuid").orderBy("timestamp").rowsBetween(-10, 10)
val newerDF = newDF.withColumn("prev_mark", lag("mark1", 1, null).over(window))
I hoped that would give me a window over just a few rows, but it did not work either. A streaming window such as:
window("timestamp", "10 minutes")
cannot be used with lag. I am quite confused about how to do this. Any help would be appreciated!

I would advise you to change the time column into String as
+-----+-----+-----+-----+----------------------------+
|id |mark1|mark2|mark3|time |
+-----+-----+-----+-----+----------------------------+
|uuid1|100 |200 |300 |Tue Aug 8 14:06:02 PDT 2017|
|uuid1|100 |200 |300 |Tue Aug 8 14:06:22 PDT 2017|
|uuid2|150 |250 |350 |Tue Aug 8 14:06:32 PDT 2017|
|uuid2|150 |250 |350 |Tue Aug 8 14:06:52 PDT 2017|
|uuid2|150 |250 |350 |Tue Aug 8 14:06:58 PDT 2017|
+-----+-----+-----+-----+----------------------------+
root
|-- id: string (nullable = true)
|-- mark1: integer (nullable = false)
|-- mark2: integer (nullable = false)
|-- mark3: integer (nullable = false)
|-- time: string (nullable = true)
After that doing the following should work
df.withColumn("prev_mark", lag("mark1", 1).over(Window.partitionBy("id").orderBy("time")))
Which will give you output as
+-----+-----+-----+-----+----------------------------+---------+
|id |mark1|mark2|mark3|time |prev_mark|
+-----+-----+-----+-----+----------------------------+---------+
|uuid1|100 |200 |300 |Tue Aug 8 14:06:02 PDT 2017|null |
|uuid1|100 |200 |300 |Tue Aug 8 14:06:22 PDT 2017|100 |
|uuid2|150 |250 |350 |Tue Aug 8 14:06:32 PDT 2017|null |
|uuid2|150 |250 |350 |Tue Aug 8 14:06:52 PDT 2017|150 |
|uuid2|150 |250 |350 |Tue Aug 8 14:06:58 PDT 2017|150 |
+-----+-----+-----+-----+----------------------------+---------+
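For readers unfamiliar with lag, the following plain-Python sketch (not Spark code) shows the result the window expression is meant to produce: for each id, the previous row's mark1, or nothing for the first row. The function name with_prev_mark and the tuple layout are illustrative assumptions, and the rows are assumed to arrive already ordered by time.

```python
def with_prev_mark(rows):
    """Yield (id, mark1, prev_mark); prev_mark is the previous mark1 seen for that id."""
    last_seen = {}  # id -> most recent mark1
    for row_id, mark1 in rows:
        yield row_id, mark1, last_seen.get(row_id)  # None for the first row of an id
        last_seen[row_id] = mark1


rows = [("uuid1", 100), ("uuid1", 100), ("uuid2", 150), ("uuid2", 150), ("uuid2", 150)]
for out in with_prev_mark(rows):
    print(out)
```

The first row of each id gets None, which corresponds to the null entries in the prev_mark column above.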

How to resample time vector data matlab

I have to resample the following cell array:
dateS =
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'
following an irregular spacing, e.g. between the 1st and 2nd rows there are 5 readings, while between the 2nd and 3rd there are 10. The numbers of intermediate 'readings' are stored in a vector 'v'. So, what I need is a new vector with all the intermediate dates/times in the same format as dateS.
EDIT:
There's 1h30min = 90min between the first 2 readings in the list. Five intervals b/w them amounts to 90 mins / 5 = 18 mins. Now insert five 'readings' between (1) and (2), each separated by 18mins. I need to do that for all the dateS.
Any ideas? Thanks!

You can interpolate the serial dates with interp1():
% Inputs
dates = [
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'];
v = [5 4 3 2 4 5 3];
% Serial dates
serdates = datenum(dates,'yyyy-mm-dd HH:MM:SS');
% Interpolate
x = cumsum([1 v]);
resampled = interp1(x, serdates, x(1):x(end))';
The result:
datestr(resampled)
ans =
02-Sep-2004 06:00:00
02-Sep-2004 06:18:00
02-Sep-2004 06:36:00
02-Sep-2004 06:54:00
02-Sep-2004 07:12:00
02-Sep-2004 07:30:00
02-Sep-2004 08:37:30
02-Sep-2004 09:45:00
02-Sep-2004 10:52:30
02-Sep-2004 12:00:00
02-Sep-2004 14:00:00
02-Sep-2004 16:00:00
02-Sep-2004 18:00:00
02-Sep-2004 18:45:00
02-Sep-2004 19:30:00
02-Sep-2004 20:37:30
02-Sep-2004 21:45:00
02-Sep-2004 22:52:30
03-Sep-2004 00:00:00
03-Sep-2004 01:06:00
03-Sep-2004 02:12:00
03-Sep-2004 03:18:00
03-Sep-2004 04:24:00
03-Sep-2004 05:30:00
03-Sep-2004 05:40:00
03-Sep-2004 05:50:00
03-Sep-2004 06:00:00
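The same piecewise-linear resampling can be sketched without MATLAB. The Python version below is only an illustration of the idea, not a translation of interp1(): each gap between consecutive dates is split into v[i] equal steps, and the function name resample is an assumption.

```python
from datetime import datetime


def resample(dates, v, fmt="%Y-%m-%d %H:%M:%S"):
    """Split each gap between consecutive dates into v[i] equal steps."""
    times = [datetime.strptime(d, fmt) for d in dates]
    out = []
    for start, end, steps in zip(times, times[1:], v):
        step = (end - start) / steps
        # Emit the start of the gap and the intermediate points; the end
        # point is emitted as the start of the next gap.
        out.extend(start + k * step for k in range(steps))
    out.append(times[-1])  # close the series with the final reading
    return [t.strftime(fmt) for t in out]


dates = ["2004-09-02 06:00:00", "2004-09-02 07:30:00", "2004-09-02 12:00:00"]
for line in resample(dates, [5, 4]):
    print(line)
```

With the first three dates and v = [5 4], this gives 18-minute steps up to 07:30:00 and 67.5-minute steps up to 12:00:00, in line with the first ten rows of the MATLAB result.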

The following code does what you want (I picked arbitrary values for v - as long as the number of elements in vector v is one less than the number of entries in dateS this should work):
dateS = [
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'];
% "stations":
v = [6 5 4 3 5 6 4];
dn = datenum(dateS);
df = diff(dn)'./v;
newDates = [];
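for ii = 1:numel(v)
    % 0:v(ii)-1 leaves out each segment's end point, which the next
    % segment supplies as its start (so the very last date is not emitted)
    newDates = [newDates dn(ii) + (0:v(ii)-1)*df(ii)];
end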
newStrings = datestr(newDates, 'yyyy-mm-dd HH:MM:SS');
The array newStrings ends up containing the following. For example, you can see that the interval between the first and second times has been split into six 15-minute segments:
2004-09-02 06:00:00
2004-09-02 06:15:00
2004-09-02 06:30:00
2004-09-02 06:45:00
2004-09-02 07:00:00
2004-09-02 07:15:00
2004-09-02 07:30:00
2004-09-02 08:24:00
2004-09-02 09:18:00
2004-09-02 10:12:00
2004-09-02 11:06:00
2004-09-02 12:00:00
2004-09-02 13:30:00
2004-09-02 15:00:00
2004-09-02 16:30:00
2004-09-02 18:00:00
2004-09-02 18:30:00
2004-09-02 19:00:00
2004-09-02 19:30:00
2004-09-02 20:24:00
2004-09-02 21:18:00
2004-09-02 22:12:00
2004-09-02 23:06:00
2004-09-03 00:00:00
2004-09-03 00:55:00
2004-09-03 01:50:00
2004-09-03 02:45:00
2004-09-03 03:40:00
2004-09-03 04:35:00
2004-09-03 05:30:00
2004-09-03 05:37:30
2004-09-03 05:45:00
2004-09-03 05:52:30
The code relies on a few concepts:
* A date can be represented as a string or a datenum. I use built-in functions to go between them.
* Once you have the date/time as a number, it is easy to interpolate.
* I use the diff function to find the difference between successive times.
* I don't attempt to "vectorize" the code - you were not asking for efficient code, and for an example like this the clarity of a for loop trumps everything.

Mongodb replica set polluted logs and arbiter in "initial startup"

I'm running a replica set with MongoDB v2.0.3; here is the latest status:
+-----------------------------------------------------------------------------------------------------------------------+
| Member |id|Up| cctime |Last heartbeat|Votes|Priority| State | Messages | optime |skew|
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27002 |0 |1 |13 hrs |2 secs ago |1 |1 |PRIMARY | |4f619079:2|1 |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27003 |1 |1 |12 hrs |1 sec ago |1 |1 |SECONDARY | |4f619079:2|1 |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27001 |2 |1 |2.5e+02 hrs|1 sec ago |1 |0 |SECONDARY (hidden)| |4f619079:2|-1 |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27000 (me)|3 |1 |2.5e+02 hrs| |1 |1 |ARBITER |initial startup|0:0 | |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27004 |4 |1 |9.5 hrs |2 secs ago |1 |1 |SECONDARY | |4f619079:2|-1 |
+-----------------------------------------------------------------------------------------------------------------------+
I'm puzzled by the following:
1) The arbiter always reports the same message, "initial startup", and an "optime" of 0:0.
What does "initial startup" mean, and is it normal for this message not to change?
Why is the "optime" always "0:0"?
2) What information does the skew column convey?
I've set up my replicas according to MongoDB's documentation and data seems to replicate across the set nicely, so no problem with that.
Another thing is that logs across all MongoDB hosts are polluted with such entries:
Thu Mar 15 03:25:29 [initandlisten] connection accepted from 127.0.0.1:38599 #112781
Thu Mar 15 03:25:29 [conn112781] authenticate: { authenticate: 1, nonce: "99e2a4a5124541b9", user: "__system", key: "417d42d26643b2c2d014b89900700263" }
Thu Mar 15 03:25:32 [clientcursormon] mem (MB) res:12 virt:244 mapped:32
Thu Mar 15 03:25:34 [conn112779] end connection 127.0.0.1:38586
Thu Mar 15 03:25:34 [initandlisten] connection accepted from 127.0.0.1:38602 #112782
Thu Mar 15 03:25:34 [conn112782] authenticate: { authenticate: 1, nonce: "a021e521ac9e19bc", user: "__system", key: "14507310174c89cdab3b82decb52b47c" }
Thu Mar 15 03:25:36 [conn112778] end connection 127.0.0.1:38585
Thu Mar 15 03:25:36 [initandlisten] connection accepted from 127.0.0.1:38604 #112783
Thu Mar 15 03:25:37 [conn112783] authenticate: { authenticate: 1, nonce: "58bcf511e040b760", user: "__system", key: "24c5b20886f6d390d1ea8ea1c61fd109" }
Thu Mar 15 03:26:00 [conn112781] end connection 127.0.0.1:38599
Thu Mar 15 03:26:00 [initandlisten] connection accepted from 127.0.0.1:38615 #112784
Thu Mar 15 03:26:00 [conn112784] authenticate: { authenticate: 1, nonce: "8a8f24fe012a03fe", user: "__system", key: "9b0be0c7fc790021b25aeb4511d85848" }
Thu Mar 15 03:26:01 [conn112780] end connection 127.0.0.1:38598
Thu Mar 15 03:26:01 [initandlisten] connection accepted from 127.0.0.1:38616 #112785
Thu Mar 15 03:26:01 [conn112785] authenticate: { authenticate: 1, nonce: "420808aa9a12947", user: "__system", key: "90e8654a2eb3981219c370208989e97a" }
Thu Mar 15 03:26:04 [conn112782] end connection 127.0.0.1:38602
Thu Mar 15 03:26:04 [initandlisten] connection accepted from 127.0.0.1:38617 #112786
Thu Mar 15 03:26:04 [conn112786] authenticate: { authenticate: 1, nonce: "b46ac4868db60973", user: "__system", key: "43cda53cc503bce942040ba8d3c6c3b1" }
Thu Mar 15 03:26:09 [conn112783] end connection 127.0.0.1:38604
Thu Mar 15 03:26:09 [initandlisten] connection accepted from 127.0.0.1:38621 #112787
Thu Mar 15 03:26:10 [conn112787] authenticate: { authenticate: 1, nonce: "20fae7ed47cd1780", user: "__system", key: "f7b81c2d53ad48343e917e2db9125470" }
Thu Mar 15 03:26:30 [conn112784] end connection 127.0.0.1:38615
Thu Mar 15 03:26:30 [initandlisten] connection accepted from 127.0.0.1:38632 #112788
Thu Mar 15 03:26:31 [conn112788] authenticate: { authenticate: 1, nonce: "38ee5b7b665d26be", user: "__system", key: "49c1f9f4e3b5cf2bf05bfcbb939ee422" }
Thu Mar 15 03:26:33 [conn112785] end connection 127.0.0.1:38616
It seems like many connections are established and dropped. Is that the replica set heartbeat?
Additional information
Arbiter config
dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
port = 27000
bind_ip = 127.0.1.1
rest = true
journal = true
replSet = myreplname
keyFile = /etc/mongodb/set.key
oplogSize = 8
quiet = true
Replica set member config
dbpath=/root/local/var/mongodb
logpath=/root/local/var/log/mongodb.log
logappend=true
port = 27002
bind_ip = 127.0.1.1
rest = true
journal = true
replSet = myreplname
keyFile = /root/local/etc/set.key
quiet = true
The MongoDB instances are running on different machines and connect to each other over SSH tunnels set up in a fully connected mesh.

The arbiter doesn't do anything besides participate in elections, so it has no further operations after startup. "Skew" is clock skew in seconds between this member and the others in the set. Yes, the connect / disconnect messages are heartbeats.
