Logstash date filter add_field is not working

I"m connecting to postgres and writing a few rows to elastic via logstash.
same date read/write is working fine.
After I apply a date fileter, fetch a date field and assign it to newly created field, it's not working. Below is the filter
filter {
  date {
    locale    => "en"
    match     => ["old_date", "YYYY-MM-dd"]
    timezone  => "Asia/Kolkata"
    add_field => { "newdate" => "2022-10-06" }
    target    => "@newdate"
  }
}
I tried with mutate as well, but the new field is not created and there is no error.
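For what it's worth, one likely cause: on the date filter, add_field (like add_tag) is only applied when the match succeeds, so a failed parse skips it silently, and target should be the plain name of the field that receives the parsed date. A minimal sketch of a filter that parses old_date into newdate (note yyyy instead of YYYY, which is the week-based year in Joda patterns; the field names are just the ones from the question):
filter {
  date {
    locale    => "en"
    match     => ["old_date", "yyyy-MM-dd"]
    timezone  => "Asia/Kolkata"
    target    => "newdate"                     # parsed date is written here
    add_field => { "date_parsed" => "true" }   # only added on a successful match
  }
}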

Related

how to remove field in logstash output

I have set up an ELK stack. My Logstash instance has two outputs: Kafka and Elasticsearch.
For the Elasticsearch output I want to keep the @timestamp field, but for the Kafka output I want it removed. So I cannot simply remove @timestamp in the filter; I want it removed only for the Kafka output.
I have not found a solution for this.
Edit: I tried to use the clone plugin:
clone {
  clones => ["kafka"]
  id => ["kafka"]
  remove_field => ["@timestamp"]
}
output {
  if [type] != "kafka" {
    # elasticsearch output
  }
  if [type] == "kafka" {
    # kafka output
  }
}
It's strange that the Elasticsearch output works, but nothing gets output to Kafka. I have also tried branching on id, which still does not work.
Since you can only remove fields in the filter block, to have the same pipeline output two different versions of the same event you will need to clone your events, remove the field from the cloned event, and use conditionals in the output.
To clone your event and remove the @timestamp field, you will need something like this in your filter block:
filter {
  # your other filters
  clone {
    clones => ["kafka"]
  }
  if [type] == "kafka" {
    mutate {
      remove_field => ["@timestamp"]
    }
  }
}
This will clone the event; the cloned event will have the value kafka in the field type, and you then use this field in the conditionals in your output:
output {
  if [type] != "kafka" {
    # your elasticsearch output
  }
  if [type] == "kafka" {
    # your kafka output
  }
}

How to update data in Elasticsearch on a schedule?

I have a table in a PostgreSQL database and want to insert its data into an Elasticsearch index on a schedule: in other words, delete the old data and insert the new. I have the Logstash configuration file below, but it doesn't update the data in the index; it inserts the new data while I still see the old, so duplicates occur. What is the correct way to update data in Elasticsearch on a schedule?
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://host:port/postgres"
    jdbc_user => "postgres"
    jdbc_password => "postgres"
    jdbc_driver_library => "postgresql-42.2.9.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * FROM layers;"
    schedule => "0 0 * * MON"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "layers"
  }
}
Your index name doesn't change, so every time you add new records they are added to the same index.
Add a date postfix to the index:
index => "layers-%{+YYYY.MM.dd}"
That way there'll be a new index for each date.
Now for searching, create an alias so you can always use the same name in your application (for example, layers/_search), by adding an alias like below:
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "layers-2019.12.11",
        "alias": "layers"
      }
    }
  ]
}
The step above is via Kibana, or you can use an HTTP POST. However, I'd recommend using Curator for alias operations: that way, once the Logstash run completes, you can run Curator to remove the current index from the alias and add the newly created one.
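For illustration, a Curator action file along these lines could do the alias swap after each run (a hedged sketch, not a verified config; the prefix/age filters are assumptions for picking out the newest and previous layers-* indices):
actions:
  1:
    action: alias
    description: "Point the 'layers' alias at the newest layers-* index"
    options:
      name: layers
    add:
      filters:
        - filtertype: pattern
          kind: prefix
          value: layers-
        - filtertype: age
          source: creation_date
          direction: younger
          unit: days
          unit_count: 1
    remove:
      filters:
        - filtertype: pattern
          kind: prefix
          value: layers-
        - filtertype: age
          source: creation_date
          direction: older
          unit: days
          unit_count: 1
Run it with curator --config curator.yml action_file.yml once the scheduled Logstash run has finished.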

Two outputs in logstash. One for certain aggregations only

I'm trying to specify a second output for Logstash in order to save certain aggregated data only, but I have no clue how to achieve it at the moment; the documentation doesn't cover such a case.
At the moment I use a single input and a single output.
Input definition (logstash-udp.conf):
input {
  udp {
    port => 25000
    codec => json
    buffer_size => 5000
    workers => 2
  }
}
filter {
  grok {
    match => [ "message", "API call happened" ]
  }
  aggregate {
    task_id => "%{example_task}"
    code => "
      map['api_calls'] ||= 0
      map['api_calls'] += 1
      map['message'] ||= event.get('message')
      event.cancel()
    "
    timeout => 60
    push_previous_map_as_event => true
    timeout_code => "event.set('aggregated_calls', event.get('api_calls') > 0)"
    timeout_tags => ['_aggregation']
  }
}
Output definition (logstash-output.conf):
output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "%{[@metadata][udp]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
What do I want to achieve now? I need to add a second, different aggregation (different data and conditions): all the non-aggregated data should be saved to Elasticsearch as it is now, while the aggregated data from this new aggregation should be saved to Postgres. I'm pretty much stuck at the moment, and searching the web for docs/examples hasn't helped.
I'd suggest using multiple pipelines: https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
This way you can have one pipeline for the aggregation and a second one for the pure data.
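A rough sketch of that layout, assuming a Logstash version with pipeline-to-pipeline communication (the pipeline ids, file paths, and the agg_channel address are made-up names, and the Postgres side would need something like the community logstash-output-jdbc plugin):
# pipelines.yml
- pipeline.id: raw-data
  path.config: "/etc/logstash/conf.d/raw.conf"
- pipeline.id: aggregated
  path.config: "/etc/logstash/conf.d/aggregated.conf"

# In raw.conf: keep the udp input, ship the pure data to Elasticsearch,
# and forward a copy of each event to the aggregation pipeline
output {
  elasticsearch { hosts => ["localhost"] }
  pipeline { send_to => ["agg_channel"] }
}

# In aggregated.conf: read from the channel, run the second aggregate
# filter there, and write the aggregated events to Postgres
input {
  pipeline { address => "agg_channel" }
}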

Sequelize set timezone to query

I'm currently using Sequelize with Postgres in my project. I need to change the query so that it returns the created_at column with a timezone offset.
var sequelize = new Sequelize(connStr, {
dialectOptions: {
useUTC: false //for reading from database
},
timezone: '+08:00' //for writing to database
});
But this affects the entire database, and I need the timezone for select queries only. Does anyone know how to do that?
This is how I configured it:
dialectOptions: {
dateStrings: true,
typeCast: true,
},
timezone: 'America/Los_Angeles',
http://docs.sequelizejs.com/class/lib/sequelize.js~Sequelize.html
I suggest combining moment.js with one of the following methods. If you need to parameterize the timezone, you will probably want to add the offset to each individual query, or add a field to your table that indicates the timezone, since Sequelize does not seem to allow parameterized getters.
For example:
const moment = require('moment');

const YourModel = sequelize.define('your_model', {
  created_at: {
    type: Sequelize.DATE,
    allowNull: false,
    get() {
      return moment(this.getDataValue('created_at'))
        .utcOffset(this.getDataValue('offset'));
    },
  },
});
Another possibility would be to extend the model prototype's instance methods in a similar fashion, which allows you to specify the offset at the time that you require the created_at value:
const moment = require('moment');

YourModel.prototype.getCreatedAtWithOffset = function (offset) {
  return moment(this.created_at).utcOffset(offset);
};
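A quick usage sketch of the prototype method above (findByPk is the name in recent Sequelize versions, and the offset value is just an example):
const post = await YourModel.findByPk(1);
// a moment instance shifted to UTC+05:30
console.log(post.getCreatedAtWithOffset('+05:30').format());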
To use queries with a timezone correctly, first prepare your pg driver:
const pg = require('pg')
const { types } = pg
// we must store dates in UTC
pg.defaults.parseInputDatesAsUTC = true
// fix node-pg default transformation for date types
// https://github.com/brianc/node-pg-types
// https://github.com/brianc/node-pg-types/blob/master/lib/builtins.js
types.setTypeParser(types.builtins.DATE, str => str)
types.setTypeParser(types.builtins.TIMESTAMP, str => str)
types.setTypeParser(types.builtins.TIMESTAMPTZ, str => str)
This must be done before initializing your Sequelize instance.
You can then run a query indicating the timezone in which you want to get the date.
Suppose we make a SQL query that selects all of a User's fields, with "createdAt" in the timezone 'Europe/Kiev':
SELECT *, "createdAt"::timestamptz AT TIME ZONE 'Europe/Kiev' AS "createdAt" FROM users WHERE id = 1;
-- or with variables
SELECT *, "createdAt"::timestamptz AT TIME ZONE :timezone AS "createdAt" FROM users WHERE id = :id;
For Sequelize (on the User model) it will be something like this:
User.findOne({
  where: { id: 1 },
  attributes: {
    include: [
      [sequelize.literal(`"User"."createdAt"::timestamptz AT TIME ZONE 'Europe/Kiev'`), 'createdAt'],
      // you can also use variables; of course, remember about SQL injection:
      // [sequelize.literal(`"User"."updatedAt"::timestamptz AT TIME ZONE ${timeZoneVariable}`), 'updatedAt'],
    ]
  }
});

Order Posts by Most Votes (Overall, Last Month, etc.) with Laravel MongoDB

I am trying to understand more advanced functions of MongoDB and Laravel but am having trouble with this. Currently my schema is set up with users, posts, and posts_votes collections. posts_votes has user_id, post_id, and timestamp fields.
In a relational DB, I would just left join the posts_votes table, count, and order by that count, excluding dates when need be and all that.
In MongoDB I'm having difficulty because there's no left-join equivalent, so I'd like to learn how to accomplish my goal in a more document-oriented way.
On my Post model in Laravel, I reference the votes this way, so looking at an individual post I can get the vote count, see if the current user voted for it, etc.:
public function votes()
{
    return $this->hasMany(PostVote::class, 'post_id');
}
And my current working query looks like this:
$posts = Post::forCategoryType($type)
    ->with('votes', 'author', 'businessType')
    ->where('approved', true)
    ->paginate(25);
The forCategoryType method is just an extended scope I added. Here it is on the Post model/document class:
public function scopeForCategoryType($builder, $catType)
{
    if ($catType->exists) {
        return $builder->where('cat_id', $catType->id);
    }
    return $builder;
}
So when I look at posts like this one, it's close to what I want to accomplish, but I am not applying it properly. For instance, I changed my main query to look like this:
$posts = Post::forBusinessType($type)
    ->with('votes', 'author', 'businessType')
    ->where('approved', true)
    ->sortByVotes()
    ->paginate(25);
And created this new method on the Post model:
public function scopeSortByVotes($builder, $dir = 'desc')
{
    return $builder->raw(function($collection) {
        return $collection->aggregate([
            ['$group' => [
                '_id' => ['post_id' => 'votes.$post_id', 'user_id' => 'votes.$user_id']
              ],
              'vote_count' => ['$sum' => 1]
            ],
            ['$sort' => ['vote_count' => -1]]
        ]);
    });
}
This returns the error exception: A pipeline stage specification object must contain exactly one field.
Not sure how to fix that (still looking), so then I tried:
return $collection->aggregate([
    ['$unwind' => '$votes'],
    ['$group' => [
        '_id' => ['post_id' => ['$votes.post_id', 'user_id' => '$votes.user_id']],
        'count' => ['$sum' => 1]
    ]]
]);
That returned an empty ArrayIterator, so then I tried:
public function scopeSortByVotes($builder, $dir = 'desc')
{
    return $builder->raw(function($collection) {
        return $collection->aggregate([
            '$lookup' => [
                'from' => 'community_posts_votes',
                'localField' => 'post_id',
                'foreignField' => '_id',
                'as' => 'vote_count'
            ]
        ]);
    });
}
But with this setup I just get the list of posts, unsorted. I'm on version 3.2.8 of the library.
By default everything loads by most recent, but ultimately I want to be able to pull these posts based on how many votes they got lifetime, and also query for the posts that got the most votes in the last week, month, etc.
The example I shared keeps the grand total on the Post model along with an array of all the user ids that voted on it. With my setup, where a separate collection holds the user_id, post_id, and timestamp of each vote, can I still accomplish the same goal?
Note: using this laravel mongodb library.
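For what it's worth, a sketch of the $lookup-based approach the last attempt was reaching for: each aggregation stage must be its own single-key array (hence the "exactly one field" error), the local/foreign fields are reversed relative to the attempt above, and a $filter on the vote timestamp handles the last-week/last-month windows. The collection name posts_votes and MongoDB 3.4+ (for $addFields) are assumptions:
public function scopeSortByVotes($builder, $since = null, $dir = 'desc')
{
    return $builder->raw(function ($collection) use ($since, $dir) {
        $pipeline = [
            // join votes onto each post; _id on posts matches post_id on votes
            ['$lookup' => [
                'from' => 'posts_votes',
                'localField' => '_id',
                'foreignField' => 'post_id',
                'as' => 'votes',
            ]],
        ];
        if ($since !== null) {
            // keep only votes cast after $since
            // ($since should be a MongoDB\BSON\UTCDateTime, e.g. one week ago)
            $pipeline[] = ['$addFields' => [
                'votes' => ['$filter' => [
                    'input' => '$votes',
                    'as' => 'v',
                    'cond' => ['$gte' => ['$$v.timestamp', $since]],
                ]],
            ]];
        }
        // count the (remaining) votes and sort by that count
        $pipeline[] = ['$addFields' => ['vote_count' => ['$size' => '$votes']]];
        $pipeline[] = ['$sort' => ['vote_count' => $dir === 'desc' ? -1 : 1]];
        return $collection->aggregate($pipeline);
    });
}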