MongoDB replica set: polluted logs and arbiter stuck in "initial startup"

I'm running a replica set with MongoDB v2.0.3; here is the latest status:
+-----------------------------------------------------------------------------------------------------------------------+
| Member |id|Up| cctime |Last heartbeat|Votes|Priority| State | Messages | optime |skew|
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27002 |0 |1 |13 hrs |2 secs ago |1 |1 |PRIMARY | |4f619079:2|1 |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27003 |1 |1 |12 hrs |1 sec ago |1 |1 |SECONDARY | |4f619079:2|1 |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27001 |2 |1 |2.5e+02 hrs|1 sec ago |1 |0 |SECONDARY (hidden)| |4f619079:2|-1 |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27000 (me)|3 |1 |2.5e+02 hrs| |1 |1 |ARBITER |initial startup|0:0 | |
|--------------------+--+--+-----------+--------------+-----+--------+------------------+---------------+----------+----|
|127.0.1.1:27004 |4 |1 |9.5 hrs |2 secs ago |1 |1 |SECONDARY | |4f619079:2|-1 |
+-----------------------------------------------------------------------------------------------------------------------+
I'm puzzled by the following:
1) The arbiter always reports the same message, "initial startup", and an "optime" of 0:0.
What does "initial startup" mean, and is it normal for this message not to change?
Why is the "optime" always "0:0"?
2) What information does the skew column convey?
I've set up my replicas according to MongoDB's documentation and data seems to replicate across the set nicely, so no problem with that.
Another thing is that the logs on all MongoDB hosts are polluted with entries like these:
Thu Mar 15 03:25:29 [initandlisten] connection accepted from 127.0.0.1:38599 #112781
Thu Mar 15 03:25:29 [conn112781] authenticate: { authenticate: 1, nonce: "99e2a4a5124541b9", user: "__system", key: "417d42d26643b2c2d014b89900700263" }
Thu Mar 15 03:25:32 [clientcursormon] mem (MB) res:12 virt:244 mapped:32
Thu Mar 15 03:25:34 [conn112779] end connection 127.0.0.1:38586
Thu Mar 15 03:25:34 [initandlisten] connection accepted from 127.0.0.1:38602 #112782
Thu Mar 15 03:25:34 [conn112782] authenticate: { authenticate: 1, nonce: "a021e521ac9e19bc", user: "__system", key: "14507310174c89cdab3b82decb52b47c" }
Thu Mar 15 03:25:36 [conn112778] end connection 127.0.0.1:38585
Thu Mar 15 03:25:36 [initandlisten] connection accepted from 127.0.0.1:38604 #112783
Thu Mar 15 03:25:37 [conn112783] authenticate: { authenticate: 1, nonce: "58bcf511e040b760", user: "__system", key: "24c5b20886f6d390d1ea8ea1c61fd109" }
Thu Mar 15 03:26:00 [conn112781] end connection 127.0.0.1:38599
Thu Mar 15 03:26:00 [initandlisten] connection accepted from 127.0.0.1:38615 #112784
Thu Mar 15 03:26:00 [conn112784] authenticate: { authenticate: 1, nonce: "8a8f24fe012a03fe", user: "__system", key: "9b0be0c7fc790021b25aeb4511d85848" }
Thu Mar 15 03:26:01 [conn112780] end connection 127.0.0.1:38598
Thu Mar 15 03:26:01 [initandlisten] connection accepted from 127.0.0.1:38616 #112785
Thu Mar 15 03:26:01 [conn112785] authenticate: { authenticate: 1, nonce: "420808aa9a12947", user: "__system", key: "90e8654a2eb3981219c370208989e97a" }
Thu Mar 15 03:26:04 [conn112782] end connection 127.0.0.1:38602
Thu Mar 15 03:26:04 [initandlisten] connection accepted from 127.0.0.1:38617 #112786
Thu Mar 15 03:26:04 [conn112786] authenticate: { authenticate: 1, nonce: "b46ac4868db60973", user: "__system", key: "43cda53cc503bce942040ba8d3c6c3b1" }
Thu Mar 15 03:26:09 [conn112783] end connection 127.0.0.1:38604
Thu Mar 15 03:26:09 [initandlisten] connection accepted from 127.0.0.1:38621 #112787
Thu Mar 15 03:26:10 [conn112787] authenticate: { authenticate: 1, nonce: "20fae7ed47cd1780", user: "__system", key: "f7b81c2d53ad48343e917e2db9125470" }
Thu Mar 15 03:26:30 [conn112784] end connection 127.0.0.1:38615
Thu Mar 15 03:26:30 [initandlisten] connection accepted from 127.0.0.1:38632 #112788
Thu Mar 15 03:26:31 [conn112788] authenticate: { authenticate: 1, nonce: "38ee5b7b665d26be", user: "__system", key: "49c1f9f4e3b5cf2bf05bfcbb939ee422" }
Thu Mar 15 03:26:33 [conn112785] end connection 127.0.0.1:38616
It seems like many connections are established and dropped. Is that the replica set heartbeat?
Additional information
Arbiter config
dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
port = 27000
bind_ip = 127.0.1.1
rest = true
journal = true
replSet = myreplname
keyFile = /etc/mongodb/set.key
oplogSize = 8
quiet = true
Replica set member config
dbpath=/root/local/var/mongodb
logpath=/root/local/var/log/mongodb.log
logappend=true
port = 27002
bind_ip = 127.0.1.1
rest = true
journal = true
replSet = myreplname
keyFile = /root/local/etc/set.key
quiet = true
The MongoDB instances run on different machines and connect to each other over SSH tunnels set up in a fully connected mesh.

The arbiter doesn't do anything besides participate in elections, so it has no further operations after startup: it holds no data, which is why its optime stays at 0:0 and its status message never moves past "initial startup". "Skew" is the clock skew in seconds between this member and the others in the set. And yes, the connect/disconnect messages in the logs are the replica set heartbeats.
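If you want to see the same information from the mongo shell instead of the REST page, here is a quick sketch (the exact field names are from memory, so treat them as an assumption rather than gospel):

// run against any member of the set
rs.status().members.forEach(function (m) {
    // stateStr: PRIMARY / SECONDARY / ARBITER; lastHeartbeat: when the last
    // heartbeat from that member arrived (absent for the member you are connected to)
    print(m.name + "  " + m.stateStr + "  " + m.lastHeartbeat);
});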


Fill the value down into the next empty rows (Talend)

Input
Here is an example of my input.
Number | Date                         | Motore
1      | Fri Jan 01 00:00:00 CET 2021 | Motore 1
2      |                              | Motore 2
3      |                              | Motore 3
4      |                              | Motore 4
5      | Fri Feb 01 00:00:00 CET 2021 | Motore 1
6      |                              | Motore 2
7      |                              | Motore 3
8      |                              | Motore 4
Expected Output
Number | Date                         | Motore
1      | Fri Jan 01 00:00:00 CET 2021 | Motore 1
2      | Fri Jan 01 00:00:00 CET 2021 | Motore 2
3      | Fri Jan 01 00:00:00 CET 2021 | Motore 3
4      | Fri Jan 01 00:00:00 CET 2021 | Motore 4
5      | Fri Feb 01 00:00:00 CET 2021 | Motore 1
6      | Fri Feb 01 00:00:00 CET 2021 | Motore 2
7      | Fri Feb 01 00:00:00 CET 2021 | Motore 3
8      | Fri Feb 01 00:00:00 CET 2021 | Motore 4
I tried to use the tMemorizeRows component but without any result: the second line is populated but the others are not. Could you kindly help me?
You can solve this with a simple tMap with two inner variables (using the "Var" section in the middle of the tMap).
Create two variables:
currentValue: assign it the value of your input date column (in my example, "row1.data").
updateValue: check whether currentValue is null or not. If it is null, do not modify the updateValue field; if it is not null, update the field with currentValue. This way "updateValue" always contains the last non-null value.
In the output, just use the "updateValue" variable. A sketch of the two expressions is below.
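A minimal sketch of the two Var expressions, assuming the input flow is named row1 and the date column is named data (adjust to your actual schema). Because the generated tMap variables persist from one row to the next, updateValue keeps the last non-null date:

// Var.currentValue (type Date): the incoming value, possibly null
row1.data

// Var.updateValue (type Date): keep the previous value when the current one is null
Var.currentValue == null ? Var.updateValue : Var.currentValue

In the output table, map the Date column to Var.updateValue.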

Search a Date field by year in JanusGraph

I have a 'Date' property on my 'Patent' node class that is formatted like this:
==>Sun Jan 28 00:08:00 UTC 2007
==>Tue Jan 27 00:10:00 UTC 1987
==>Wed Jan 10 00:04:00 UTC 2001
==>Sun Jan 17 00:08:00 UTC 2010
==>Tue Jan 05 00:10:00 UTC 2010
==>Thu Jan 28 00:09:00 UTC 2010
==>Wed Jan 04 00:09:00 UTC 2012
==>Wed Jan 09 00:12:00 UTC 2008
==>Wed Jan 24 00:04:00 UTC 2018
And is stored as class java.util.Date in the database.
Is there a way to search this field to return all the 'Patents' for a particular year?
I tried variations of g.V().has("Patent", "date", 2000).values(), but it returns neither results nor an error message.
Is there a way to search this property field by year or do I need to create a separate property that just contains year?
You do not need to create a separate property for the year. JanusGraph recognizes the Date data type and can filter by date values.
gremlin> dateOfBirth1 = new GregorianCalendar(2000, 5, 6).getTime()
==>Tue Jun 06 00:00:00 MDT 2000
gremlin> g.addV("person").property("name", "Person 1").property("dateOfBirth", dateOfBirth1)
==>v[4144]
gremlin> dateOfBirth2 = new GregorianCalendar(2001, 5, 6).getTime()
==>Wed Jun 06 00:00:00 MDT 2001
gremlin> g.addV("person").property("name", "Person 2").property("dateOfBirth", dateOfBirth2)
==>v[4328]
gremlin> dateOfBirthFrom = new GregorianCalendar(2000, 0, 1).getTime()
==>Sat Jan 01 00:00:00 MST 2000
gremlin> dateOfBirthTo = new GregorianCalendar(2001, 0, 1).getTime()
==>Mon Jan 01 00:00:00 MST 2001
gremlin> g.V().hasLabel("person").
......1> has("dateOfBirth", gte(dateOfBirthFrom)).
......2> has("dateOfBirth", lt(dateOfBirthTo)).
......3> values("name")
==>Person 1
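Applied to your case, a sketch along the same lines (assuming the label is "Patent" and the property key is "date", as in the traversal you tried):

gremlin> dateFrom = new GregorianCalendar(2000, 0, 1).getTime()
gremlin> dateTo = new GregorianCalendar(2001, 0, 1).getTime()
gremlin> g.V().has("Patent", "date", gte(dateFrom)).
......1>       has("date", lt(dateTo)).
......2>       valueMap("date")

If the property is backed by a mixed index, these range predicates can use the index instead of scanning every vertex.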

Why is only one data point displayed in the LineChart?

I got all the report stuff working: a PDF is generated and returned as the servlet response.
The report opens up and the data I pass is there (tested with text fields; the data is there). But when I try to display the data in a line chart, it shows just one data point. I have grouped the chart on the 'day' field, as I want a new chart for every day (which also works: there are 31 charts for the 31 days in my test data).
In addition, I have multiple lines to display, keyed by the 'type' field. So I want four lines on every diagram: the data itself (type=1), the min value (type=2), the average value (type=3) and the max value (type=4). But it just won't work. Where is my mistake?
Here's the code:
JasperReport:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Created with Jaspersoft Studio version 6.6.0.final using JasperReports Library version 6.6.0 -->
<jasperReport xmlns="http://jasperreports.sourceforge.net/jasperreports" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports http://jasperreports.sourceforge.net/xsd/jasperreport.xsd" name="jasper_report_template" pageWidth="595" pageHeight="842" whenNoDataType="NoPages" columnWidth="515" leftMargin="40" rightMargin="40" topMargin="50" bottomMargin="50" uuid="c5395651-075a-4d6a-a971-d283cab77f63">
<parameter name="ReportTitle" class="java.lang.String"/>
<parameter name="Author" class="java.lang.String"/>
<queryString>
<![CDATA[]]>
</queryString>
<field name="day" class="java.lang.Integer">
<fieldDescription><![CDATA[day]]></fieldDescription>
</field>
<field name="type" class="java.lang.Integer">
<fieldDescription><![CDATA[type]]></fieldDescription>
</field>
<field name="time" class="java.lang.String">
<fieldDescription><![CDATA[time]]></fieldDescription>
</field>
<field name="kwh" class="java.lang.Double">
<fieldDescription><![CDATA[kwh]]></fieldDescription>
</field>
<sortField name="day"/>
<sortField name="type"/>
<group name="DayGroup" minHeightToStartNewPage="60">
<groupExpression><![CDATA[$F{day}]]></groupExpression>
<groupHeader>
<band height="210" splitType="Stretch">
<lineChart>
<chart evaluationTime="Group" evaluationGroup="DayGroup">
<reportElement x="-20" y="10" width="560" height="200" uuid="9abe6b47-1c6f-4bd0-a684-abf9e91b4adc"/>
<chartTitle/>
<chartSubtitle/>
<chartLegend/>
</chart>
<categoryDataset>
<dataset resetType="Group" resetGroup="DayGroup" incrementType="Group" incrementGroup="DayGroup"/>
<categorySeries>
<seriesExpression><![CDATA[$F{type}]]></seriesExpression>
<categoryExpression><![CDATA[$F{time}]]></categoryExpression>
<valueExpression><![CDATA[$F{kwh}]]></valueExpression>
</categorySeries>
</categoryDataset>
<linePlot>
<plot/>
<categoryAxisFormat>
<axisFormat labelColor="#000000" tickLabelColor="#000000" axisLineColor="#000000"/>
</categoryAxisFormat>
<valueAxisFormat>
<axisFormat labelColor="#000000" tickLabelColor="#000000" axisLineColor="#000000"/>
</valueAxisFormat>
</linePlot>
</lineChart>
</band>
</groupHeader>
</group>
</jasperReport>
Servlet:
JRBeanCollectionDataSource jrbcds = new JRBeanCollectionDataSource(BasicDataHandler.getInstance().getThatBeanList());
String path = getServletContext().getRealPath("/reports/");
JasperReport jasReport = JasperCompileManager.compileReport(path+ "/template_all_month_line_graph.jrxml");
JasperPrint jasPrint;
jasPrint = JasperFillManager.fillReport(jasReport, null, jrbcds);
ServletOutputStream sos = response.getOutputStream();
response.setContentType("application/pdf");
JasperExportManager.exportReportToPdfStream(jasPrint, sos);
BasicDataHandler:
package sc2pdf.datahandlers;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import net.sf.jasperreports.engine.data.JRBeanCollectionDataSource;
import sc2pdf.datasets.DailyDataSet;
import sc2pdf.javabeans.DataObject;
public class BasicDataHandler {
private static BasicDataHandler myself=null;
private Map<Integer, DailyDataSet> data = null;
private Map<Integer, DailyDataSet> min_data = null;
private Map<Integer, DailyDataSet> avg_data = null;
private Map<Integer, DailyDataSet> max_data = null;
public BasicDataHandler() {
data = new HashMap<Integer, DailyDataSet>(); //holds all the real data values; key: DAY_OF_MONTH; value DailyDataSet for every day of month
min_data = new HashMap<Integer, DailyDataSet>(); //holds all the min data values; key: DAY_OF_WEEK; value DailyDataSet for every day of week
avg_data = new HashMap<Integer, DailyDataSet>(); //holds all the average data values; key: DAY_OF_WEEK; value DailyDataSet for every day of week
max_data = new HashMap<Integer, DailyDataSet>(); //holds all the max data values; key: DAY_OF_WEEK; value DailyDataSet for every day of week
}
public static BasicDataHandler getInstance() {
if (myself == null) {
myself = new BasicDataHandler();
}
return myself;
}
private void registerDailyDataSet(DailyDataSet ds) {
Calendar c = Calendar.getInstance();
c.setTime(ds.getDay());
Integer tmp = new Integer(c.get(Calendar.DAY_OF_MONTH));
data.put(tmp, ds);
}
private void registerCalcValueMinDailyDataSet(DailyDataSet ds) {
Calendar c = Calendar.getInstance();
c.setTime(ds.getDay());
Integer tmp = new Integer(c.get(Calendar.DAY_OF_WEEK));
min_data.put(tmp, ds);
}
private void registerCalcValueAvgDailyDataSet(DailyDataSet ds) {
Calendar c = Calendar.getInstance();
c.setTime(ds.getDay());
Integer tmp = new Integer(c.get(Calendar.DAY_OF_WEEK));
avg_data.put(tmp, ds);
}
private void registerCalcValueMaxDailyDataSet(DailyDataSet ds) {
Calendar c = Calendar.getInstance();
c.setTime(ds.getDay());
Integer tmp = new Integer(c.get(Calendar.DAY_OF_WEEK));
max_data.put(tmp, ds);
}
public void registerDataObject(DataObject dob) {
Calendar c = Calendar.getInstance();
c.setTime(dob.getTimeAsDate());
Integer key = c.get(Calendar.DAY_OF_MONTH);
DailyDataSet ds = data.get(key);
if(ds == null) {
registerDailyDataSet(new DailyDataSet(dob.getTimeAsDate()));
ds = data.get(key);
}
ds.addDataObject(dob);
Integer keyCalcValues = c.get(Calendar.DAY_OF_WEEK);
ds = min_data.get(keyCalcValues);
DataObject t;
if(ds == null) {
registerCalcValueMinDailyDataSet(new DailyDataSet(dob.getTimeAsDate()));
ds = min_data.get(keyCalcValues);
ds.addDataObject(new DataObject(dob.getTimeAsDate(),dob.getKwh(),DataObject.TYPE_MIN));
}else{
t = ds.getDataObjectByWeekday(dob.getTimeAsDate());
if(t==null) {
t=new DataObject(dob.getTimeAsDate(),dob.getKwh(),DataObject.TYPE_MIN);
ds.addDataObject(t);
}
if(t.getKwh()>dob.getKwh())
t.setKwh(dob.getKwh());
}
ds = avg_data.get(keyCalcValues);
if(ds == null) {
registerCalcValueAvgDailyDataSet(new DailyDataSet(dob.getTimeAsDate()));
ds = avg_data.get(keyCalcValues);
ds.addDataObject(new DataObject(dob.getTimeAsDate(),dob.getKwh(),DataObject.TYPE_AVG));
}else{
t = ds.getDataObjectByWeekday(dob.getTimeAsDate());
if(t==null) {
t=new DataObject(dob.getTimeAsDate(),dob.getKwh(),DataObject.TYPE_AVG);
ds.addDataObject(t);
}
if(t.getKwh()>dob.getKwh())
t.setKwh(dob.getKwh());
}
ds = max_data.get(keyCalcValues);
if(ds == null) {
registerCalcValueMaxDailyDataSet(new DailyDataSet(dob.getTimeAsDate()));
ds = max_data.get(keyCalcValues);
ds.addDataObject(new DataObject(dob.getTimeAsDate(),dob.getKwh(),DataObject.TYPE_MAX));
}else{
t = ds.getDataObjectByWeekday(dob.getTimeAsDate());
if(t==null) {
t=new DataObject(dob.getTimeAsDate(),dob.getKwh(),DataObject.TYPE_MAX);
ds.addDataObject(t);
}
if(t.getKwh()<dob.getKwh())
t.setKwh(dob.getKwh());
}
}
public List<DataObject> getThatBeanList() {
List<DataObject> rl = new ArrayList<DataObject>();
Iterator<Integer> iter = data.keySet().iterator();
Calendar c = Calendar.getInstance();
while (iter.hasNext()) {
Integer keyToDo = iter.next();
DailyDataSet dds = data.get(keyToDo);
rl.addAll(dds.getData());
c.setTime(dds.getDay());
Integer dayKey = c.get(Calendar.DAY_OF_WEEK);
DailyDataSet dsmin = min_data.get(dayKey);
rl.addAll(dsmin.getData());
DailyDataSet dsavg = avg_data.get(dayKey);
rl.addAll(dsavg.getData());
DailyDataSet dsmax = max_data.get(dayKey);
rl.addAll(dsmax.getData());
}
return rl;
}
}
DailyDataSet:
package sc2pdf.datasets;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collection;
import java.util.Date;
import java.util.Iterator;
import sc2pdf.javabeans.DataObject;
public class DailyDataSet {
private Date day;
private Collection<DataObject> data;
public DailyDataSet() {
this(new Date());
}
public DailyDataSet(Date d) {
this.day=d;
this.data = new ArrayList<DataObject>();
}
public void addDataObject(Date d, double k) {
this.data.add(new DataObject(d,k));
}
public void addDataObject(DataObject d) {
this.data.add(d);
}
public Date getDay() {
return day;
}
public Collection<DataObject> getData(){
return data;
}
public DataObject getDataObject(Date toSearch) {
Iterator<DataObject> iter = data.iterator();
Calendar c = Calendar.getInstance();
c.setTime(toSearch);
while (iter.hasNext()) {
DataObject dob = iter.next();
Calendar c2 = Calendar.getInstance();
c2.setTime(dob.getTimeAsDate());
if(c.equals(c2)) {
return dob;
}
}
return null;
}
public DataObject getDataObjectByWeekday(Date toSearch) {
Iterator<DataObject> iter = data.iterator();
Calendar c = Calendar.getInstance();
c.setTime(toSearch);
while (iter.hasNext()) {
DataObject dob = iter.next();
Calendar c2 = Calendar.getInstance();
c2.setTime(dob.getTimeAsDate());
if(c.get(Calendar.DAY_OF_WEEK) == c2.get(Calendar.DAY_OF_WEEK)) {
if(c.get(Calendar.HOUR_OF_DAY) == c2.get(Calendar.HOUR_OF_DAY)) {
if(c.get(Calendar.MINUTE) == c2.get(Calendar.MINUTE)) {
return dob;
}
}
}
}
return null;
}
}
DataObject:
package sc2pdf.javabeans;
import java.util.Calendar;
import java.util.Date;
public class DataObject {
private Date timeAsDate;
private Integer day;
private String timeAsString;
private double kwh;
private Integer type;
public static final Integer TYPE_DATA = 1;
public static final Integer TYPE_MIN = 2;
public static final Integer TYPE_AVG = 3;
public static final Integer TYPE_MAX = 4;
/**
* Constructor
*/
public DataObject() {
this(new Date(),0);
}
public DataObject(int day, int month, int year, int hour, int minute, double k) {
this(day,month,year,hour,minute,k,TYPE_DATA);
}
public DataObject(int day, int month, int year, int hour, int minute, double k, Integer typ) {
Calendar c = Calendar.getInstance();
c.set(year,month,day,hour,minute);
this.day=new Integer(c.get(Calendar.DAY_OF_MONTH));
this.timeAsDate=c.getTime();
this.kwh=k;
this.type=typ;
generateTimeAsString();
}
public DataObject(Date t, double p) {
this(t,p,TYPE_DATA);
}
public DataObject(Date t, double p, Integer ty) {
this.timeAsDate=t;
this.kwh=p;
this.type=ty;
Calendar c = Calendar.getInstance();
c.setTime(t);
this.day=new Integer(c.get(Calendar.DAY_OF_MONTH));
generateTimeAsString();
}
public void generateTimeAsString() {
Calendar c = Calendar.getInstance();
c.setTime(timeAsDate);
timeAsString = c.get(Calendar.HOUR_OF_DAY) +
":"+
c.get(Calendar.MINUTE);
}
public String getTime() {
return timeAsString != null?timeAsString:"WRONG STRING";
}
public void setTime(String s) {
this.timeAsString = s;
}
public Integer getDay() {
return day;
}
public void setDay(Integer day) {
this.day = day;
}
public double getKwh() {
return kwh;
}
public void setKwh(double kwh) {
this.kwh = kwh;
}
public Date getTimeAsDate() {
return timeAsDate;
}
public Integer getType() {
return type;
}
public void setType(Integer type) {
this.type = type;
}
public String toString() {
return (type.equals(DataObject.TYPE_DATA)?"DATA: ":(type.equals(DataObject.TYPE_MIN)?" MIN: ":(type.equals(DataObject.TYPE_AVG)?" AVG: ":" MAX: ")))+this.timeAsDate.toString()+": "+this.kwh;
}
}
The data as it gets delivered to the report (format: [type==1?"DATA: ":type==2?"MIN: ":type==3?"AVG: ":"MAX: "] + Date + kWh):
DATA: Tue Jan 01 00:00:07 CET 2019: 2.501
DATA: Tue Jan 01 00:15:07 CET 2019: 2.679
DATA: Tue Jan 01 00:30:07 CET 2019: 2.93
DATA: Tue Jan 01 00:45:07 CET 2019: 2.363
DATA: Tue Jan 01 01:00:07 CET 2019: 2.589
DATA: Tue Jan 01 01:15:07 CET 2019: 2.423
DATA: Tue Jan 01 01:30:07 CET 2019: 2.531
DATA: Tue Jan 01 01:45:07 CET 2019: 2.976
DATA: Tue Jan 01 02:00:07 CET 2019: 2.369
DATA: Tue Jan 01 02:15:07 CET 2019: 2.636
DATA: Tue Jan 01 02:30:07 CET 2019: 2.391
DATA: Tue Jan 01 02:45:07 CET 2019: 2.667
DATA: Tue Jan 01 03:00:07 CET 2019: 2.88
DATA: Tue Jan 01 03:15:07 CET 2019: 2.378
DATA: Tue Jan 01 03:30:07 CET 2019: 2.723
DATA: Tue Jan 01 03:45:07 CET 2019: 2.511
DATA: Tue Jan 01 04:00:07 CET 2019: 2.789
DATA: Tue Jan 01 04:15:07 CET 2019: 2.867
DATA: Tue Jan 01 04:30:07 CET 2019: 3.101
DATA: Tue Jan 01 04:45:07 CET 2019: 3.321
DATA: Tue Jan 01 05:00:07 CET 2019: 2.438
DATA: Tue Jan 01 05:15:07 CET 2019: 2.616
DATA: Tue Jan 01 05:30:07 CET 2019: 3.146
DATA: Tue Jan 01 05:45:07 CET 2019: 5.882
DATA: Tue Jan 01 06:00:07 CET 2019: 4.814
DATA: Tue Jan 01 06:15:07 CET 2019: 4.593
DATA: Tue Jan 01 06:30:07 CET 2019: 5.078
DATA: Tue Jan 01 06:45:07 CET 2019: 5.69
DATA: Tue Jan 01 07:00:07 CET 2019: 9.734
DATA: Tue Jan 01 07:15:07 CET 2019: 12.63
DATA: Tue Jan 01 07:30:07 CET 2019: 14.304
DATA: Tue Jan 01 07:45:07 CET 2019: 14.52
DATA: Tue Jan 01 08:00:07 CET 2019: 14.988
DATA: Tue Jan 01 08:15:07 CET 2019: 14.984
DATA: Tue Jan 01 08:30:07 CET 2019: 14.748
DATA: Tue Jan 01 08:45:07 CET 2019: 13.859
DATA: Tue Jan 01 09:00:07 CET 2019: 12.038
DATA: Tue Jan 01 09:15:07 CET 2019: 15.084
DATA: Tue Jan 01 09:30:07 CET 2019: 14.787
DATA: Tue Jan 01 09:45:07 CET 2019: 15.069
DATA: Tue Jan 01 10:00:07 CET 2019: 15.764
DATA: Tue Jan 01 10:15:07 CET 2019: 16.71
DATA: Tue Jan 01 10:30:07 CET 2019: 15.549
DATA: Tue Jan 01 10:45:07 CET 2019: 14.21
DATA: Tue Jan 01 11:00:07 CET 2019: 14.073
DATA: Tue Jan 01 11:15:07 CET 2019: 12.233
DATA: Tue Jan 01 11:30:07 CET 2019: 13.17
DATA: Tue Jan 01 11:45:07 CET 2019: 11.973
DATA: Tue Jan 01 12:00:07 CET 2019: 11.982
DATA: Tue Jan 01 12:15:07 CET 2019: 10.907
DATA: Tue Jan 01 12:30:07 CET 2019: 8.087
DATA: Tue Jan 01 12:45:07 CET 2019: 4.575
DATA: Tue Jan 01 13:00:07 CET 2019: 2.96
DATA: Tue Jan 01 13:15:07 CET 2019: 1.884
DATA: Tue Jan 01 13:30:07 CET 2019: 0.995
DATA: Tue Jan 01 13:45:07 CET 2019: 2.691
DATA: Tue Jan 01 14:00:07 CET 2019: 1.826
DATA: Tue Jan 01 14:15:07 CET 2019: 2.487
DATA: Tue Jan 01 14:30:07 CET 2019: 2.984
DATA: Tue Jan 01 14:45:07 CET 2019: 2.808
DATA: Tue Jan 01 15:00:07 CET 2019: 3.539
DATA: Tue Jan 01 15:15:07 CET 2019: 2.49
DATA: Tue Jan 01 15:30:07 CET 2019: 3.473
DATA: Tue Jan 01 15:45:07 CET 2019: 6.41
DATA: Tue Jan 01 16:00:07 CET 2019: 6.977
DATA: Tue Jan 01 16:15:07 CET 2019: 6.854
DATA: Tue Jan 01 16:30:07 CET 2019: 12.807
DATA: Tue Jan 01 16:45:07 CET 2019: 10.83
DATA: Tue Jan 01 17:00:07 CET 2019: 8.435
DATA: Tue Jan 01 17:15:07 CET 2019: 8.633
DATA: Tue Jan 01 17:30:07 CET 2019: 4.887
DATA: Tue Jan 01 17:45:07 CET 2019: 3.594
DATA: Tue Jan 01 18:00:07 CET 2019: 3.36
DATA: Tue Jan 01 18:15:07 CET 2019: 4.145
DATA: Tue Jan 01 18:30:07 CET 2019: 9.11
DATA: Tue Jan 01 18:45:07 CET 2019: 8.714
DATA: Tue Jan 01 19:00:07 CET 2019: 7.982
DATA: Tue Jan 01 19:15:07 CET 2019: 6.545
DATA: Tue Jan 01 19:30:07 CET 2019: 6.764
DATA: Tue Jan 01 19:45:07 CET 2019: 7.394
DATA: Tue Jan 01 20:00:07 CET 2019: 9.887
DATA: Tue Jan 01 20:15:07 CET 2019: 7.292
DATA: Tue Jan 01 20:30:07 CET 2019: 5.972
DATA: Tue Jan 01 20:45:07 CET 2019: 6.924
DATA: Tue Jan 01 21:00:07 CET 2019: 6.426
DATA: Tue Jan 01 21:15:07 CET 2019: 5.675
DATA: Tue Jan 01 21:30:07 CET 2019: 5.973
DATA: Tue Jan 01 21:45:07 CET 2019: 5.936
DATA: Tue Jan 01 22:00:07 CET 2019: 6.06
DATA: Tue Jan 01 22:15:07 CET 2019: 5.646
DATA: Tue Jan 01 22:30:07 CET 2019: 4.484
DATA: Tue Jan 01 22:45:07 CET 2019: 2.481
DATA: Tue Jan 01 23:00:07 CET 2019: 2.49
DATA: Tue Jan 01 23:15:07 CET 2019: 2.423
DATA: Tue Jan 01 23:30:07 CET 2019: 2.49
DATA: Tue Jan 01 23:45:07 CET 2019: 2.955
MIN: Tue Jan 01 00:00:07 CET 2019: 2.154
MIN: Tue Jan 01 00:15:07 CET 2019: 2.093
MIN: Tue Jan 01 00:30:07 CET 2019: 2.052
MIN: Tue Jan 01 00:45:07 CET 2019: 2.126
MIN: Tue Jan 01 01:00:07 CET 2019: 2.139
MIN: Tue Jan 01 01:15:07 CET 2019: 2.171
MIN: Tue Jan 01 01:30:07 CET 2019: 2.162
MIN: Tue Jan 01 01:45:07 CET 2019: 2.178
MIN: Tue Jan 01 02:00:07 CET 2019: 2.049
MIN: Tue Jan 01 02:15:07 CET 2019: 2.033
MIN: Tue Jan 01 02:30:07 CET 2019: 2.034
MIN: Tue Jan 01 02:45:07 CET 2019: 2.129
MIN: Tue Jan 01 03:00:07 CET 2019: 2.159
MIN: Tue Jan 01 03:15:07 CET 2019: 2.165
MIN: Tue Jan 01 03:30:07 CET 2019: 2.001
MIN: Tue Jan 01 03:45:07 CET 2019: 2.006
MIN: Tue Jan 01 04:00:07 CET 2019: 2.313
MIN: Tue Jan 01 04:15:07 CET 2019: 1.992
MIN: Tue Jan 01 04:30:07 CET 2019: 2.082
MIN: Tue Jan 01 04:45:07 CET 2019: 1.982
MIN: Tue Jan 01 05:00:07 CET 2019: 2.217
MIN: Tue Jan 01 05:15:07 CET 2019: 2.088
MIN: Tue Jan 01 05:30:07 CET 2019: 2.09
MIN: Tue Jan 01 05:45:07 CET 2019: 1.923
MIN: Tue Jan 01 06:00:07 CET 2019: 1.928
MIN: Tue Jan 01 06:15:07 CET 2019: 2.337
MIN: Tue Jan 01 06:30:07 CET 2019: 2.333
MIN: Tue Jan 01 06:45:07 CET 2019: 2.397
MIN: Tue Jan 01 07:00:07 CET 2019: 2.135
MIN: Tue Jan 01 07:15:07 CET 2019: 2.45
MIN: Tue Jan 01 07:30:07 CET 2019: 2.288
MIN: Tue Jan 01 07:45:07 CET 2019: 2.405
MIN: Tue Jan 01 08:00:07 CET 2019: 2.093
MIN: Tue Jan 01 08:15:07 CET 2019: 2.403
MIN: Tue Jan 01 08:30:07 CET 2019: 3.309
MIN: Tue Jan 01 08:45:07 CET 2019: 2.904
And a screenshot of my output: my PDF currently looks like this (each chart shows only a single data point).
Dummy CSV value set (separated by '|'):
day|type|time|kwh
1|1|00:15|0.9
1|1|00:30|1.2
1|2|00:15|0.8
1|2|00:30|1
1|3|00:15|0.8
1|3|00:30|1.1
1|4|00:15|1
1|4|00:30|1.5
2|1|00:15|0.7
2|1|00:30|1
2|2|00:15|0.6
2|2|00:30|0.9
2|3|00:15|0.7
2|3|00:30|1
2|4|00:15|0.8
2|4|00:30|1.1
Expected outcome: the red line would be the data of type '1', the green of type '2', the blue of type '3' and the yellow of type '4'. There should be as many charts as there are different days (which is already solved via the grouping, which works).
expected chart

How can I query a map field in KSQL?

I have created a stream from a topic in KSQL. The stream has the fields shown below. I can query the different fields, for example: select category from fake-data-119. I would like to know how I can get a single item from the map field, for example status?
The data that are coming from the source are:
ProducerRecord(topic=fake-data-119, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"deviceId": 16, "category": "visibility sensors", "timeStamp": "Tue Jun 19 10:11:10 CEST 2018", "deviceProperties": {"visibility": "72", "status": "true"}}, timestamp=null)
ProducerRecord(topic=fake-data-119, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"deviceId": 6, "category": "fans", "timeStamp": "Tue Jun 19 10:11:11 CEST 2018", "deviceProperties": {"temperature": "22", "rotationSense": "1", "status": "false", "frequency": "56"}}, timestamp=null)
ProducerRecord(topic=fake-data-119, partition=null, headers=RecordHeaders(headers = [], isReadOnly = true), key=null, value={"deviceId": 23, "category": "air quality monitors", "timeStamp": "Tue Jun 19 10:11:12 CEST 2018", "deviceProperties": {"coPpm": "136", "status": "false", "Co2Ppm": "450"}}, timestamp=null)
I am using the statement below to create the stream:
CREATE STREAM fakeData119 WITH (KAFKA_TOPIC='fake-data-119', VALUE_FORMAT='AVRO');
Field | Type
---------------------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
DEVICEID | INTEGER
CATEGORY | VARCHAR(STRING)
TIMESTAMP | VARCHAR(STRING)
DEVICEPROPERTIES | MAP[VARCHAR(STRING),VARCHAR(STRING)]
---------------------------------------------------------
ksql> select * from fakeData119;
1529394182864 | null | 6 | fans | Tue Jun 19 09:43:02 CEST 2018 | {temperature=36, rotationSense=1, status=false, frequency=72}
1529394183869 | null | 5 | fans | Tue Jun 19 09:43:03 CEST 2018 | {temperature=23, rotationSense=1, status=true, frequency=76}
1529394184872 | null | 16 | visibility sensors | Tue Jun 19 09:43:04 CEST 2018 | {visibility=14, status=true}
1529394185875 | null | 25 | air quality monitors | Tue Jun 19 09:43:05 CEST 2018 | {coPpm=280, status=false, Co2Ppm=170}
You can get items in the map in the following way:
select deviceproperties['status'] from fakedata119
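If you want the extracted value as a regular column in a derived stream, a sketch along these lines should work (the stream name fakeData119Status is just an example):

CREATE STREAM fakeData119Status AS
  SELECT deviceid,
         category,
         deviceproperties['status'] AS status
  FROM fakeData119;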

How to get the lag of a column in a Spark streaming dataframe?

I have data streaming into my Spark Scala application in this format
id mark1 mark2 mark3 time
uuid1 100 200 300 Tue Aug 8 14:06:02 PDT 2017
uuid1 100 200 300 Tue Aug 8 14:06:22 PDT 2017
uuid2 150 250 350 Tue Aug 8 14:06:32 PDT 2017
uuid2 150 250 350 Tue Aug 8 14:06:52 PDT 2017
uuid2 150 250 350 Tue Aug 8 14:06:58 PDT 2017
I have it read into columns id, mark1, mark2, mark3 and time. The time is converted to datetime format as well.
I want to get this grouped by id and get the lag for mark1 which gives the previous row's mark1 value.
Something like this:
id mark1 mark2 mark3 prev_mark time
uuid1 100 200 300 null Tue Aug 8 14:06:02 PDT 2017
uuid1 100 200 300 100 Tue Aug 8 14:06:22 PDT 2017
uuid2 150 250 350 null Tue Aug 8 14:06:32 PDT 2017
uuid2 150 250 350 150 Tue Aug 8 14:06:52 PDT 2017
uuid2 150 250 350 150 Tue Aug 8 14:06:58 PDT 2017
Consider the dataframe to be markDF. I have tried:
val window = Window.partitionBy("uuid").orderBy("timestamp")
val newerDF = newDF.withColumn("prev_mark", lag("mark1", 1, null).over(window))
which fails, saying that non-time windows cannot be applied on streaming/appending datasets/frames.
I have also tried:
val window = Window.partitionBy("uuid").orderBy("timestamp").rowsBetween(-10, 10)
val newerDF = newDF.withColumn("prev_mark", lag("mark1", 1, null).over(window))
to get a window over a few rows, but that did not work either. A streaming time window, something like:
window("timestamp", "10 minutes")
cannot be used to compute the lag either. I am quite confused about how to do this. Any help would be awesome!
I would advise you to change the time column into a String, as shown here:
+-----+-----+-----+-----+----------------------------+
|id |mark1|mark2|mark3|time |
+-----+-----+-----+-----+----------------------------+
|uuid1|100 |200 |300 |Tue Aug 8 14:06:02 PDT 2017|
|uuid1|100 |200 |300 |Tue Aug 8 14:06:22 PDT 2017|
|uuid2|150 |250 |350 |Tue Aug 8 14:06:32 PDT 2017|
|uuid2|150 |250 |350 |Tue Aug 8 14:06:52 PDT 2017|
|uuid2|150 |250 |350 |Tue Aug 8 14:06:58 PDT 2017|
+-----+-----+-----+-----+----------------------------+
root
|-- id: string (nullable = true)
|-- mark1: integer (nullable = false)
|-- mark2: integer (nullable = false)
|-- mark3: integer (nullable = false)
|-- time: string (nullable = true)
After that, doing the following should work:
df.withColumn("prev_mark", lag("mark1", 1).over(Window.partitionBy("id").orderBy("time")))
which will give you output like this:
+-----+-----+-----+-----+----------------------------+---------+
|id |mark1|mark2|mark3|time |prev_mark|
+-----+-----+-----+-----+----------------------------+---------+
|uuid1|100 |200 |300 |Tue Aug 8 14:06:02 PDT 2017|null |
|uuid1|100 |200 |300 |Tue Aug 8 14:06:22 PDT 2017|100 |
|uuid2|150 |250 |350 |Tue Aug 8 14:06:32 PDT 2017|null |
|uuid2|150 |250 |350 |Tue Aug 8 14:06:52 PDT 2017|150 |
|uuid2|150 |250 |350 |Tue Aug 8 14:06:58 PDT 2017|150 |
+-----+-----+-----+-----+----------------------------+---------+
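For reference, a self-contained sketch of that approach on a (non-streaming) DataFrame, with the required imports and the column names from the question. Note that ordering by the raw time string only happens to work here because each id's sample rows share the same date prefix; for real data you would parse the timestamp into a proper type first:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// markDF has columns id, mark1, mark2, mark3, time (time kept as a String)
val byId = Window.partitionBy("id").orderBy("time")
val withPrev = markDF.withColumn("prev_mark", lag("mark1", 1).over(byId))
withPrev.show(false)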