How to install MongoDB 4.4 on Alpine 3.11

I want to install MongoDB 4.4 on Alpine 3.11, but it appears that Alpine has dropped the MongoDB package because of the SSPL license change.
So I have to build the image myself, but I am running into some errors.
First, I clone the git repository:
git clone --branch v4.4 --single-branch --depth 1 https://github.com/mongodb/mongo.git /tmp/mongo
Then I install some packages:
apk add --no-cache --virtual build-pack \
boost-build=1.69.0-r1 \
boost-filesystem=1.71.0-r1 \
boost-iostreams=1.71.0-r1 \
boost-program_options=1.71.0-r1 \
boost-python3=1.71.0-r1 \
boost-system=1.71.0-r1 \
build-base=0.5-r1 \
busybox=1.31.1-r9 \
curl-dev=7.67.0-r0 \
cmake=3.15.5-r0 \
db=5.3.28-r1 \
isl=0.18-r0 \
libbz2=1.0.8-r1 \
libc-dev=0.7.2-r0 \
libgcc=9.2.0-r4 \
libpcrecpp=8.43-r0 \
libsasl=2.1.27-r5 \
libssl1.1=1.1.1d-r3 \
libstdc++=9.2.0-r4 \
linux-headers=4.19.36-r0 \
g++=9.2.0-r4 \
gcc=9.2.0-r4 \
gmp=6.1.2-r1 \
jsoncpp=1.9.2-r0 \
jsoncpp-dev=1.9.2-r0 \
mpc1=1.1.0-r1 \
mpfr4=4.0.2-r1 \
musl=1.1.24-r2 \
musl-dev=1.1.24-r2 \
openssl-dev=1.1.1d-r3 \
pcre=8.43-r0 \
pkgconf=1.6.3-r0 \
python3=3.8.2-r0 \
py3-cheetah=3.2.4-r1 \
py3-crypto=2.6.1-r5 \
py3-openssl=19.1.0-r0 \
py3-psutil=5.6.7-r0 \
py3-yaml=5.3.1-r0 \
scons=3.1.1-r0 \
snappy=1.1.7-r1 \
xz-libs=5.2.4-r0 \
yaml-cpp=0.6.3-r0 \
yaml-cpp-dev=0.6.3-r0 \
zlib=1.2.11-r3
I get this error when I run python3 buildscripts/scons.py MONGO_VERSION=4.4 --prefix=/opt/mongo mongod --disable-warnings-as-errors:
src/mongo/util/processinfo_linux.cpp:50:10: fatal error: gnu/libc-version.h: No such file or directory
...
scons: building terminated because of errors.
build/opt/mongo/util/processinfo_linux.o failed: Error 1
Any idea?
Thanks.
EDIT: I have tried version 4.2.5, and I get this error message:
In file included from src/third_party/mozjs-60/platform/x86_64/linux/build/Unified_cpp_js_src29.cpp:11:
src/third_party/mozjs-60/extract/js/src/threading/posix/Thread.cpp: In function 'void js::ThisThread::GetName(char*, size_t)':
src/third_party/mozjs-60/extract/js/src/threading/posix/Thread.cpp:210:8: error: 'pthread_getname_np' was not declared in this scope; did you mean 'pthread_setname_np'?
210 | rv = pthread_getname_np(pthread_self(), nameBuffer, len);
| ^~~~~~~~~~~~~~~~~~
| pthread_setname_np
scons: *** [build/opt/third_party/mozjs-60/platform/x86_64/linux/build/Unified_cpp_js_src29.o] Error 1
scons: building terminated because of errors.
build/opt/third_party/mozjs-60/platform/x86_64/linux/build/Unified_cpp_js_src29.o failed: Error 1
With these packages:
apk add --no-cache --virtual build-pack \
build-base=0.5-r1 \
cmake=3.15.5-r0 \
curl-dev=7.67.0-r0 \
libgcc=9.2.0-r4 \
libssl1.1=1.1.1d-r3 \
libstdc++=9.2.0-r4 \
linux-headers=4.19.36-r0 \
g++=9.2.0-r4 \
gcc=9.2.0-r4 \
openssl-dev=1.1.1d-r3 \
musl=1.1.24-r2 \
python3=3.8.2-r0 \
py3-cheetah=3.2.4-r1 \
py3-crypto=2.6.1-r5 \
py3-openssl=19.1.0-r0 \
py3-psutil=5.6.7-r0 \
py3-yaml=5.3.1-r0 \
scons=3.1.1-r0 \
libc-dev=0.7.2-r0

Just run:
apk update && apk add openrc

Related

Spark Shuffle Read and Shuffle Write increasing in Structured Streaming

I have been running Spark Structured Streaming with Kafka for the last 23 hours. I could see Shuffle Read and Shuffle Write increasing drastically, and finally the driver stopped due to "out of memory".
Data pushed to Kafka is 3 JSON messages per second, and the Spark streaming trigger is processingTime='30 seconds'.
spark = SparkSession \
.builder \
.master("spark://spark-master:7077") \
.appName("demo") \
.config("spark.executor.cores", 1) \
.config("spark.cores.max", "4") \
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
.config("spark.sql.warehouse.dir", "hdfs://172.30.7.36:9000/user/hive/warehouse") \
.config("spark.streaming.stopGracefullyOnShutdown", "true") \
.config("spark.executor.memory", '1g') \
.config("spark.scheduler.mode", "FAIR") \
.config("spark.driver.memory", '2g') \
.config("spark.sql.caseSensitive", "true") \
.config("spark.sql.shuffle.partitions", 8) \
.enableHiveSupport() \
.getOrCreate()
CustDf \
.writeStream \
.queryName("customerdatatest") \
.format("delta") \
.outputMode("append") \
.trigger(processingTime='30 seconds') \
.option("mergeSchema", "true") \
.option("checkpointLocation", "/checkpoint/bronze_customer/") \
.toTable("bronze.customer")
I am expecting this streaming job to run continuously for at least 1 month.
Spark is transforming the JSON (flattening it) and inserting it into the Delta table.
Please help me with this. Have I missed any configuration?
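For reference, the read-and-flatten step described above usually looks something like the sketch below (that code is not shown in the question, so the broker address, topic name, and custSchema are placeholder assumptions):
from pyspark.sql.functions import col, from_json
# Read the raw Kafka stream; the broker address and topic below are placeholders.
raw = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "kafka:9092") \
    .option("subscribe", "customer") \
    .load()
# Parse the JSON value with an explicit schema (custSchema, a StructType defined elsewhere),
# then promote the nested fields to top-level columns (the "flattening" step).
CustDf = raw \
    .select(from_json(col("value").cast("string"), custSchema).alias("data")) \
    .select("data.*")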

BigQuery to Postgres execution failed on Dataflow workflow timestamp

I have this issue where I am unsure how to get a proper start date for my query. I get the following error and am unsure how to go about fixing it. Can I get help with the timestamp conversion format, please?
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Workflow failed. Causes: S01:QueryTableStdSQL+Writing to DB/ParDo(_WriteToRelationalDBFn) failed., BigQuery execution failed., Error:
Message: No matching signature for operator >= for argument types: TIMESTAMP, INT64. Supported signature: ANY >= ANY at [1:1241]
HTTP Code: 400
My script main query looks like:
with beam.Pipeline(options=options) as p:
    rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True,
        query = 'SELECT \
            billing_account_id, \
            service.id as service_id, \
            service.description as service_description, \
            sku.id as sku_id, \
            sku.description as sku_description, \
            usage_start_time, \
            usage_end_time, \
            project.id as project_id, \
            project.name as project_description, \
            TO_JSON_STRING(project.labels) \
            as project_labels, \
            project.ancestry_numbers \
            as project_ancestry_numbers, \
            TO_JSON_STRING(labels) as labels, \
            TO_JSON_STRING(system_labels) as system_labels, \
            location.location as location_location, \
            location.country as location_country, \
            location.region as location_region, \
            location.zone as location_zone, \
            export_time, \
            cost, \
            currency, \
            currency_conversion_rate, \
            usage.amount as usage_amount, \
            usage.unit as usage_unit, \
            usage.amount_in_pricing_units as \
            usage_amount_in_pricing_units, \
            usage.pricing_unit as usage_pricing_unit, \
            TO_JSON_STRING(credits) as credits, \
            invoice.month as invoice_month, \
            cost_type, \
            FROM `pprodjectID.bill_usage.gcp_billing_export_v1_xxxxxxxx` \
            WHERE export_time >= 2020-01-01'))
    source_config = relational_db.SourceConfiguration(
The timestamp format shown in the BigQuery console:
export_time
2018-01-25 01:18:55.637 UTC
usage_start_time
2018-01-24 21:23:10.643 UTC
You forgot to pass the timestamp as a string:
WHERE export_time >= 2020-01-01
Unquoted, 2020-01-01 is evaluated as integer arithmetic (2020 - 01 - 01 = 2018), so BigQuery ends up comparing a TIMESTAMP with an INT64, which has no matching signature. You should have:
WHERE export_time >= "2020-01-01"

AttributeError: 'Namespace' object has no attribute 'project'

I am trying to reuse code that I copied from https://www.opsguru.io/post/solution-walkthrough-visualizing-daily-cloud-spend-on-gcp-using-gke-dataflow-bigquery-and-grafana. I am not too familiar with Python, so I am seeking help here. I am trying to copy GCP BigQuery data into Postgres.
I have made some modifications to the code and am getting an error, due either to my mistake or to the code.
Here is what I have:
import uuid
import argparse
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions, GoogleCloudOptions, WorkerOptions
from beam_nuggets.io import relational_db
from apache_beam.io.gcp import bigquery
parser = argparse.ArgumentParser()
args = parser.parse_args()
project = args.project("project", help="Enter Project ID")
job_name = args.job_name + str(uuid.uuid4())
bigquery_source = args.bigquery_source
postgresql_user = args.postgresql_user
postgresql_password = args.postgresql_password
postgresql_host = args.postgresql_host
postgresql_port = args.postgresql_port
postgresql_db = args.postgresql_db
postgresql_table = args.postgresql_table
staging_location = args.staging_location
temp_location = args.temp_location
subnetwork = args.subnetwork
options = PipelineOptions(
flags=["--requirements_file", "/opt/python/requirements.txt"])
# For Cloud execution, set the Cloud Platform project, job_name,
# staging location, temp_location and specify DataflowRunner.
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = project
google_cloud_options.job_name = job_name
google_cloud_options.staging_location = staging_location
google_cloud_options.temp_location = temp_location
google_cloud_options.region = "europe-west4"
worker_options = options.view_as(WorkerOptions)
worker_options.zone = "europe-west4-a"
worker_options.subnetwork = subnetwork
worker_options.max_num_workers = 20
options.view_as(StandardOptions).runner = 'DataflowRunner'
start_date = define_start_date()
with beam.Pipeline(options=options) as p:
    rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(
        query = 'SELECT \
            billing_account_id, \
            service.id as service_id, \
            service.description as service_description, \
            sku.id as sku_id, \
            sku.description as sku_description, \
            usage_start_time, \
            usage_end_time, \
            project.id as project_id, \
            project.name as project_description, \
            TO_JSON_STRING(project.labels) \
            as project_labels, \
            project.ancestry_numbers \
            as project_ancestry_numbers, \
            TO_JSON_STRING(labels) as labels, \
            TO_JSON_STRING(system_labels) as system_labels, \
            location.location as location_location, \
            location.country as location_country, \
            location.region as location_region, \
            location.zone as location_zone, \
            export_time, \
            cost, \
            currency, \
            currency_conversion_rate, \
            usage.amount as usage_amount, \
            usage.unit as usage_unit, \
            usage.amount_in_pricing_units as \
            usage_amount_in_pricing_units, \
            usage.pricing_unit as usage_pricing_unit, \
            TO_JSON_STRING(credits) as credits, \
            invoice.month as invoice_month, cost_type \
            FROM `' + project + '.' + bigquery_source + '` \
            WHERE export_time >= "' + start_date + '"', use_standard_sql=True))
    source_config = relational_db.SourceConfiguration(
        drivername='postgresql+pg8000',
        host=postgresql_host,
        port=postgresql_port,
        username=postgresql_user,
        password=postgresql_password,
        database=postgresql_db,
        create_if_missing=True,
    )
    table_config = relational_db.TableConfiguration(
        name=postgresql_table,
        create_if_missing=True
    )
    rows | 'Writing to DB' >> relational_db.Write(
        source_config=source_config,
        table_config=table_config
    )
When I run the program, I am getting the following error:
bq-to-sql.py: error: unrecognized arguments: --project xxxxx --job_name bq-to-sql-job --bigquery_source xxxxxxxx
--postgresql_user xxxxx --postgresql_password xxxxx --postgresql_host xx.xx.xx.xx --postgresql_port 5432 --postgresql_db xxxx --postgresql_table xxxx --staging_location g
s://xxxxx-staging --temp_location gs://xxxxx-temp --subnetwork regions/europe-west4/subnetworks/xxxx
argparse needs to be configured. It works like magic, but it does need configuration. The following lines are needed between parser = argparse.ArgumentParser() and args = parser.parse_args():
parser.add_argument("--project")
parser.add_argument("--job_name")
parser.add_argument("--bigquery_source")
parser.add_argument("--postgresql_user")
parser.add_argument("--postgresql_password")
parser.add_argument("--postgresql_host")
parser.add_argument("--postgresql_port")
parser.add_argument("--postgresql_db")
parser.add_argument("--postgresql_table")
parser.add_argument("--staging_location")
parser.add_argument("--temp_location")
parser.add_argument("--subnetwork")
argparse is a useful library. I recommend adding more options to these add_argument calls, such as help text, types, defaults, and required flags; see the sketch below.
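For example (a sketch only; the help text, types, and defaults below are illustrative, not taken from the original script):
parser.add_argument("--project", required=True, help="GCP project ID")
parser.add_argument("--postgresql_port", type=int, default=5432, help="PostgreSQL port")
parser.add_argument("--staging_location", help="GCS staging path, e.g. gs://my-bucket/staging")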

Splunk REST API - How to add a webhook action?

I want to create an alert and add a webhook action to it. However, the Splunk documentation doesn't seem to say how to do this.
Here is my current request:
curl -s -k -u admin:password https://splunk.rf:8089/servicesNS/admin/search/saved/searches > /dev/null \
-d name=bruteforcetest \
--data-urlencode output_mode='json' \
--data-urlencode alert.digest_mode='0' \
--data-urlencode alert.expires='24h' \
--data-urlencode alert.managedBy='' \
--data-urlencode alert.severity='3' \
--data-urlencode alert.suppress='1' \
--data-urlencode alert.suppress.fields='source_ip' \
--data-urlencode alert.suppress.period='2m' \
--data-urlencode alert_comparator='greater than' \
--data-urlencode alert_condition='' \
--data-urlencode alert_threshold='20' \
--data-urlencode alert_type='number of events' \
--data-urlencode alert.track='1' \
--data-urlencode cron_schedule='* * * * *' \
--data-urlencode description='' \
--data-urlencode disabled='0' \
--data-urlencode displayview='' \
--data-urlencode is_scheduled='1' \
--data-urlencode is_visible='1' \
--data-urlencode max_concurrent='1' \
--data-urlencode realtime_schedule='1' \
--data-urlencode restart_on_searchpeer_add='1' \
--data-urlencode run_n_times='0' \
--data-urlencode run_on_startup='0' \
--data-urlencode schedule_priority='default' \
--data-urlencode schedule_window='0' \
--data-urlencode dispatch.earliest_time='rt-2m' \
--data-urlencode dispatch.latest_time='rt-0m' \
--data-urlencode display.events.fields='["host","source","sourcetype", "source_ip"]' \
--data-urlencode search='"error: invalid login credentials for user"'
How can I modify this request to add a webhook action? The webhook should send its request to http://firewall.mycompany/ban.
There are two parameters you need to specify to create a webhook action:
actions='webhook'
action.webhook.param.url='http://firewall.mycompany/ban'
Here is your request, modified to include a webhook action:
curl -s -k -u admin:password https://splunk.rf:8089/servicesNS/admin/search/saved/searches > /dev/null \
-d name=bruteforcetest \
--data-urlencode actions='webhook' \
--data-urlencode action.webhook.param.url='http://firewall.mycompany/ban' \
--data-urlencode output_mode='json' \
--data-urlencode alert.digest_mode='0' \
--data-urlencode alert.expires='24h' \
--data-urlencode alert.managedBy='' \
--data-urlencode alert.severity='3' \
--data-urlencode alert.suppress='1' \
--data-urlencode alert.suppress.fields='source_ip' \
--data-urlencode alert.suppress.period='2m' \
--data-urlencode alert_comparator='greater than' \
--data-urlencode alert_condition='' \
--data-urlencode alert_threshold='20' \
--data-urlencode alert_type='number of events' \
--data-urlencode alert.track='1' \
--data-urlencode cron_schedule='* * * * *' \
--data-urlencode description='' \
--data-urlencode disabled='0' \
--data-urlencode displayview='' \
--data-urlencode is_scheduled='1' \
--data-urlencode is_visible='1' \
--data-urlencode max_concurrent='1' \
--data-urlencode realtime_schedule='1' \
--data-urlencode restart_on_searchpeer_add='1' \
--data-urlencode run_n_times='0' \
--data-urlencode run_on_startup='0' \
--data-urlencode schedule_priority='default' \
--data-urlencode schedule_window='0' \
--data-urlencode dispatch.earliest_time='rt-2m' \
--data-urlencode dispatch.latest_time='rt-0m' \
--data-urlencode display.events.fields='["host","source","sourcetype", "source_ip"]' \
--data-urlencode search='"error: invalid login credentials for user"'
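If you want to confirm the action was attached, you can read the saved search back with a GET on the same endpoint (a sketch reusing the host, credentials, and search name from above):
curl -s -k -u admin:password \
  "https://splunk.rf:8089/servicesNS/admin/search/saved/searches/bruteforcetest?output_mode=json"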

Splunk REST API: How to set "Send to triggered alerts" action when creating an alert?

I wish to create a Splunk alert using the REST API. However, I can't find the action "Send to triggered alerts" in the actions list. How can I add that action?
The parameter you are looking for in the Splunk documentation is alert.track. You must set alert.track to 1 in your request.
Here is an example of such an alert:
curl -k -u admin:password https://some.address:8089/servicesNS/admin/search/saved/searches \
-d name=test4 \
--data-urlencode output_mode='json' \
--data-urlencode actions='' \
--data-urlencode alert.digest_mode='1' \
--data-urlencode alert.expires='24h' \
--data-urlencode alert.managedBy='' \
--data-urlencode alert.severity='3' \
--data-urlencode alert.suppress='0' \
--data-urlencode alert.suppress.fields='' \
--data-urlencode alert.suppress.period='' \
--data-urlencode alert.track='1' \
--data-urlencode alert_comparator='equal to' \
--data-urlencode alert_condition='' \
--data-urlencode alert_threshold='0' \
--data-urlencode alert_type='number of events' \
--data-urlencode allow_skew='0' \
--data-urlencode cron_schedule='*/2 * * * *' \
--data-urlencode description='' \
--data-urlencode disabled='0' \
--data-urlencode displayview='' \
--data-urlencode is_scheduled='1' \
--data-urlencode is_visible='1' \
--data-urlencode max_concurrent='1' \
--data-urlencode realtime_schedule='1' \
--data-urlencode restart_on_searchpeer_add='1' \
--data-urlencode run_n_times='0' \
--data-urlencode run_on_startup='0' \
--data-urlencode schedule_priority='default' \
--data-urlencode schedule_window='0' \
--data-urlencode search='sourcetype="auth" failed'