hadoop fs -copyFromLocal localfile.txt cos://remotefile.txt => Failed to create /disk2/s3a - ibm-cloud

I'm trying to upload a file to cloud object storage from IBM Analytics Engine:
$ hadoop fs -copyFromLocal LICENSE-2.0.txt \
cos://xxxxx/LICENSE-2.0.txt
However, I'm receiving warnings about failing to create the buffer directories:
18/01/26 17:47:47 WARN fs.LocalDirAllocator$AllocatorPerContext: Failed to create /disk1/s3a
18/01/26 17:47:47 WARN fs.LocalDirAllocator$AllocatorPerContext: Failed to create /disk2/s3a
Note that even though I receive this warning, the file is still uploaded:
$ hadoop fs -ls cos://xxxxx/LICENSE-2.0.txt
-rw-rw-rw- 1 clsadmin clsadmin 11358 2018-01-26 17:49 cos://xxxxx/LICENSE-2.0.txt
The problem seems to be:
$ grep -B2 -C1 'disk' /etc/hadoop/conf/core-site.xml
<property>
<name>fs.s3a.buffer.dir</name>
<value>/disk1/s3a,/disk2/s3a,/tmp/s3a</value>
</property>
$ ls -lh /disk1 /disk2
ls: cannot access /disk1: No such file or directory
ls: cannot access /disk2: No such file or directory
What are the implications of these warnings? The /tmp/s3a folder does exist, so can we ignore the warnings about these other folders?

The Hadoop property fs.s3a.buffer.dir accepts a comma-separated list of local paths. When a listed path is missing, these warnings appear, but they are harmless and can be safely ignored. If the same command had been run from a data node, the warnings would not show up. Regardless of the warnings, the file is still copied to Cloud Object Store, so there is no other impact.
The reason fs.s3a.buffer.dir lists multiple values ('/disk1/s3a,/disk2/s3a,/tmp/s3a') is that when Hadoop jobs run against Cloud Object Store, the map-reduce tasks are scheduled on data nodes, which have additional disks (/disk1 and /disk2) with more capacity than the management nodes.
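If the warnings are noisy for ad-hoc copies from a management node, one sketch of a workaround is to point the buffer directory at a path that does exist for that one command, using the generic -D option (this assumes the cos:// connector honours the same fs.s3a.buffer.dir property shown in the grep above, and is not a recommendation to change the cluster-wide core-site.xml):
$ hadoop fs -D fs.s3a.buffer.dir=/tmp/s3a \
    -copyFromLocal LICENSE-2.0.txt cos://xxxxx/LICENSE-2.0.txt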

Related

AWS Elastic Beanstalk failed to install psycopg2 using requirements.txt Git Pip

I am trying to deploy an app using Elastic Beanstalk with Python 3.8. I am using the following requirements.txt:
click==8.0.1
Flask==1.1.2
Flask-SQLAlchemy==2.5.1
greenlet==1.1.0
itsdangerous==2.0.1
Jinja2==3.0.1
MarkupSafe==2.0.1
marshmallow==3.12.1
marshmallow-sqlalchemy==0.25.0
SQLAlchemy==1.4.15
Werkzeug==2.0.1
celery[redis]
psycopg2==2.9.3
Flask-JWT-Extended==4.3.1
Flask-RESTful==0.3.9
python-decouple==3.6
When I run the command eb create, I get the following error:
2022-04-05 22:03:00 INFO Created security group named: sg-00b14485064e5e8ca
2022-04-05 22:03:16 INFO Created security group named: awseb-e-ekd3bw2bvf-stack-AWSEBSecurityGroup-1O3NAVBIRRK30
2022-04-05 22:03:31 INFO Created Auto Scaling launch configuration named: awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingLaunchConfiguration-HKjIVsa84E3U
2022-04-05 22:04:49 INFO Created Auto Scaling group named: awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingGroup-5FQOAWMGCR3W
2022-04-05 22:04:49 INFO Waiting for EC2 instances to launch. This may take a few minutes.
2022-04-05 22:04:49 INFO Created Auto Scaling group policy named: arn:aws:autoscaling:us-east-1:208357543212:scalingPolicy:ecfbbff0-4151-492f-a474-ba01535ad348:autoScalingGroupName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingGroup-5FQOAWMGCR3W:policyName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingScaleDownPolicy-CI2UIP6X023P
2022-04-05 22:04:49 INFO Created Auto Scaling group policy named: arn:aws:autoscaling:us-east-1:208357543212:scalingPolicy:d534189a-45e3-48f1-a206-720f202b4469:autoScalingGroupName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingGroup-5FQOAWMGCR3W:policyName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingScaleUpPolicy-1F0WVTUXXPFKF
2022-04-05 22:05:04 INFO Created CloudWatch alarm named: awseb-e-ekd3bw2bvf-stack-AWSEBCloudwatchAlarmLow-W8URMJEYBO3C
2022-04-05 22:05:04 INFO Created CloudWatch alarm named: awseb-e-ekd3bw2bvf-stack-AWSEBCloudwatchAlarmHigh-13J8QHI51MEBM
2022-04-05 22:06:09 INFO Created load balancer named: arn:aws:elasticloadbalancing:us-east-1:208357543212:loadbalancer/app/awseb-AWSEB-IXOR2Z0K0OJV/1fba4c6ff6122c55
2022-04-05 22:06:24 INFO Created Load Balancer listener named: arn:aws:elasticloadbalancing:us-east-1:208357543212:listener/app/awseb-AWSEB-IXOR2Z0K0OJV/1fba4c6ff6122c55/734b0cf960b6b8c4
2022-04-05 22:06:42 ERROR Instance deployment failed to install application dependencies. The deployment failed.
2022-04-05 22:06:42 ERROR Instance deployment failed. For details, see 'eb-engine.log'.
2022-04-05 22:06:44 ERROR [Instance: i-0368a7ba2157241f4] Command failed on instance. Return code: 1 Output: Engine execution has encountered an error..
2022-04-05 22:06:45 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2022-04-05 22:07:48 ERROR Create environment operation is complete, but with errors. For more information, see troubleshooting documentation.
I looked at the corresponding logs and found the following error:
Collecting Werkzeug==2.0.1
Downloading Werkzeug-2.0.1-py3-none-any.whl (288 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 KB 35.6 MB/s eta 0:00:00
Collecting celery[redis]
Downloading celery-5.2.6-py3-none-any.whl (405 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 405.6/405.6 KB 54.7 MB/s eta 0:00:00
Collecting psycopg2==2.9.3
Downloading psycopg2-2.9.3.tar.gz (380 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 380.6/380.6 KB 52.2 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
2022/04/05 22:06:42.952376 [INFO] error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
running egg_info
creating /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info
writing /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/dependency_links.txt
writing top-level names to /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/SOURCES.txt'
Error: pg_config executable not found.
pg_config is required to build psycopg2 from source. Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
For further information please check the 'doc/src/install.rst' file (also at
<https://www.psycopg.org/docs/install.html>).
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I am not too familiar with AWS's requirements, but I can run the app locally without any problem. I just wonder what the right configuration for the requirements.txt file would be in order to avoid this error.
Thanks in advance.
You have to install postgresql-devel before you can use psycopg2. You can add the installation instructions to your .ebextensions:
packages:
  yum:
    postgresql-devel: []
or
commands:
  command1:
    command: yum install -y postgresql-devel
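Either snippet goes in a YAML config file under the .ebextensions directory at the root of your source bundle; the file name below is only illustrative (any *.config file is picked up):
# .ebextensions/01_packages.config
packages:
  yum:
    postgresql-devel: []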
I was able to solve the error. I had to replace psycopg2 with psycopg2-binary, as suggested by the AWS logs:
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
This issue seems to come down to the particular configuration of the libraries and the specific Linux machines used by AWS.
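For reference, the change in requirements.txt amounts to swapping one line; the pinned version here is only an assumption and should match whatever release you need:
# requirements.txt
# psycopg2==2.9.3        <- remove: builds from source and needs pg_config
psycopg2-binary==2.9.3   # <- add: prebuilt wheel, bundles its own libpq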

mounting bucket with fstab not working NEWBIE

I'm new to GCP and Linux, and I'm trying to mount a bucket on my CentOS instance using gcsfuse.
I tried with a script running at boot, but it was not working, so I tried fstab instead (people told me it is much better).
But I get this error when I try to ls my mount point:
ls: reading directory .: Input/output error
Here is my fstab file:
#
# /etc/fstab
# Created by anaconda on Tue Mar 26 23:07:36 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=de2d3dce-cce3-47a8-a0fa-5bfe54e611ab / xfs defaults 0 0
mybucket /mount/to/point gcsfuse rw,allow_other,uid=1001,gid=1001
According to: https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/mounting.md
Thanks for your time.
Okay, so I just had to wait 2 minutes for Google auth to grant my key. Basically, it works now.
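For anyone hitting the same thing, a quick way to re-test the fstab entry without rebooting (assuming the mount point directory already exists) is:
sudo umount /mount/to/point 2>/dev/null   # ignore errors if nothing is mounted
sudo mount /mount/to/point                # re-reads the entry from /etc/fstab
ls /mount/to/point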

rsyslog 5.8 imfile outside /var/log not picking up log files

I would like to pick up logs of different types from various locations other than /var/log and send them to a central location.
Using RHEL 6.6 and rsyslog 5.8, the configuration works fine when using a path within /var/log. If I use another path, like /opt/appname/log/file.log, the rsyslog client does not pick up the log. I do not see any error or message when running rsyslogd in debug mode.
Example:
Client:
...
$InputFileName /opt/appname/test.log
$InputFileTag APPNAME1
$InputFileStateFile stat-APPNAME1
$InputFileSeverity info
$InputFilePersistStateInterval 200
$InputFileFacility local3 # also tried with other local facilities
$InputRunFileMonitor
...
Server:
...
$template HostAudit, "/opt/logs/%HOSTNAME%/test.log" # tried different paths
$template auditFormat, "%msg%\n"
local3.* ?HostAudit;auditFormat
...
Any recommendations? I appreciate your help!
Bill
I would first try these:
Verify that the state file names are unique
Verify that every $InputFileName points to an existing regular file
Remove some of the files that you want to be monitored from the configuration. It could be that there is a problem with only one of the monitored files. That would make rsyslog ignore the rest of the files.
I had this with "$InputFileStateFile tomcat-log" for each of the individual Tomcat logs. Each of the state file names needs to be unique. For me it worked by changing it to instances of:
"$InputFileStateFile tomcat-manager"
"$InputFileStateFile tomcat-localhost"
etc...
Another option is to just add numbers to the end of the state file name.
"$InputFileStateFile tomcat-log1"
"$InputFileStateFile tomcat-log2"

Snorby not displaying alerts on main page

Building a Snort / Barnyard2 / Snorby setup.
Having trouble getting Snorby to see events.
Snort and barnyard2 are both running at boot.
Here is my config relevant to the problem.
Snort:
output unified2: filename snort.u2, limit 128
Barnyard2:
config reference_file: /usr/local/snort/etc/reference.config
config classification_file: /usr/local/snort/etc/classification.config
config gen_file: /usr/local/snort/etc/gen-msg.map
config sid_file: /usr/local/snort/etc/sid-msg.map
config hostname:localhost
config interface: eth1
input unified2
output database: log, mysql, user=snort password=snorbypass dbname=snorby host=localhost
Snorby:
snorby: &snorby
  adapter: mysql
  username: snort
  password: "snorbypass"
  host: localhost
rc.local:
ifconfig eth1 up
/usr/local/snort/bin/snort -D -u snort -g snort \
-c /usr/local/snort/etc/snort.conf -i eth1
/usr/local/bin/barnyard2 -c /usr/local/snort/etc/barnyard2.conf \
-d /var/log/snort \
-f snort.u2 \
-w /var/log/snort/barnyard2.waldo \
-D
Current status:
Right now, when the system boots, I can see both the snort and barnyard2 processes running as issued in the rc.local.
When browsing to localhost in a browser, I can login to Snorby and change password, etc...
I can also see the sensor listed under sensors.
When looking at the workers, there is one running. I have also deleted this from within the web UI and recreated it without any issues.
When looking in the database for snorby, I can "SELECT * from signature" and see a lot of signatures listed there.
Also, I can see the size of the most recent /var/log/snort/snort.u2.1398021580 being constantly updated.
My barnyard2.waldo is also in this directory, and I can see that it contains data and is no longer a text file but a binary one. This can be reproduced by deleting the file, creating a new barnyard2.waldo text file, and restarting barnyard2. By doing so, the file is turned into a binary file with a size of 2056.
The file ownership is snort:snort, and the permissions on the directory /var/log/snort are 666.
Possible problem:
The only thing I can see that is not functioning correctly is when I stop barnyard2 and start it without -D to watch the startup output.
I receive a repeating error:
--== Initialization Complete ==--
Using waldo file '/var/log/snort/barnyard2.waldo':
spool directory = /var/log/snort
spool filebase = snort.u2
time_stamp = 1398023768
record_idx = 0
Opened spool file '/var/log/snort/snort.u2.1398023768'
WARNING: No function defined to read header.
WARNING: No function defined to read header.
Closing spool file '/var/log/snort/snort.u2.1398023768'. Read 0 records
Opened spool file '/var/log/snort/snort.u2.1398024174'
WARNING: No function defined to read header.
Waiting for new data
WARNING: No function defined to read header.
WARNING: No function defined to read header.
WARNING: No function defined to read header.
WARNING: No function defined to read header.
WARNING: No function defined to read header.
WARNING: No function defined to read header.
There is very little on this error when I looked through Google, but I believe that barnyard2 is having trouble reading the snort.u2 file. You can see here that it seems to load it okay, but that is about it. Regardless, when looking in the Snorby UI, there are 0 events on the listed sensor.
Any ideas would be greatly appreciated.
Once I ran PulledPork to download Snort rules, I received a new sid-msg.map file for the version of Snort currently running. The repeating error was resolved.
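For reference, the rule update that regenerated the map file was roughly the following (the config path is illustrative, not a fixed location):
perl pulledpork.pl -c /usr/local/etc/pulledpork/pulledpork.conf
After that, make sure barnyard2's config sid_file line points at the freshly generated sid-msg.map and restart barnyard2.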

Chef Deployment with irrelevant default symlinks

I am trying to deploy my application code with Chef, which is working for one node and failing on another. I cannot determine why it works for one node and not another when they have the exact same config, but I can at least try to debug the problem on the node that fails.
deploy_revision app_config['deploy_dir'] do
  scm_provider Chef::Provider::Git
  repo app_config['deploy_repo']
  revision app_config['deploy_branch']
  if secrets["deploy_key"]
    git_ssh_wrapper "#{app_config['deploy_dir']}/git-ssh-wrapper" # For private Git repos
  end
  enable_submodules true
  shallow_clone false
  symlink_before_migrate({})      # Symlinks to add before running db migrations
  purge_before_symlink []         # Directories to delete before adding symlinks
  create_dirs_before_symlink []   # Directories to create before adding symlinks
  # symlinks()
  action :deploy
  restart_command do
    service "apache2" do action :restart; end
  end
end
This is my recipe for deploying the code. Notice that I have tried disabling symlinking entirely, as Chef always jams its own defaults in. Even with this I get the error:
================================================================================
Error executing action `deploy` on resource 'deploy_revision[/var/www]'
================================================================================
Chef::Exceptions::FileNotFound
------------------------------
Cannot symlink /var/www/shared/config/database.yml to /var/www/releases/7404041cf8859a35de90ae72091bea1628391075/config/database.yml before migrate: No such file or directory - /var/www/shared/config/database.yml or /var/www/releases/7404041cf8859a35de90ae72091bea1628391075/config/database.yml
Resource Declaration:
---------------------
# In /var/chef/cache/cookbooks/kapture/recipes/api.rb
68:
69: deploy_revision app_config['deploy_dir'] do
70: scm_provider Chef::Provider::Git
71: repo app_config['deploy_repo']
72: revision app_config['deploy_branch']
73: if secrets["deploy_key"]
74: git_ssh_wrapper "#{app_config['deploy_dir']}/git-ssh-wrapper" # For private Git repos
75: end
76: enable_submodules true
Compiled Resource:
------------------
# Declared in /var/chef/cache/cookbooks/kapture/recipes/api.rb:69:in `from_file'
deploy_revision("/var/www") do
destination "/var/www/shared/cached-copy"
symlink_before_migrate {"config/database.yml"=>"config/database.yml"}
updated_by_last_action true
restart_command #<Proc:0x00007f40f366e5a0#/var/chef/cache/cookbooks/kapture/recipes/api.rb:82>
repository_cache "cached-copy"
retries 0
keep_releases 5
create_dirs_before_symlink ["tmp", "public", "config"]
updated true
provider Chef::Provider::Deploy::Revision
enable_submodules true
deploy_to "/var/www"
current_path "/var/www/current"
recipe_name "api"
revision "HEAD"
scm_provider Chef::Provider::Git
purge_before_symlink ["log", "tmp/pids", "public/system"]
git_ssh_wrapper "/var/www/git-ssh-wrapper"
remote "origin"
shared_path "/var/www/shared"
cookbook_name "kapture"
symlinks {"log"=>"log", "system"=>"public/system", "pids"=>"tmp/pids"}
action [:deploy]
repo "git#github.com:kapture/api.git"
retry_delay 2
end
[2012-09-24T15:42:07+00:00] ERROR: Running exception handlers
[2012-09-24T15:42:07+00:00] FATAL: Saving node information to /var/chef/cache/failed-run-data.json
[2012-09-24T15:42:07+00:00] ERROR: Exception handlers complete
[2012-09-24T15:42:07+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[2012-09-24T15:42:07+00:00] FATAL: Chef::Exceptions::FileNotFound: deploy_revision[/var/www] (kapture::api line 69) had an error: Chef::Exceptions::FileNotFound: Cannot symlink /var/www/shared/config/database.yml to /var/www/releases/7404041cf8859a35de90ae72091bea1628391075/config/database.yml before migrate: No such file or directory - /var/www/shared/config/database.yml or /var/www/releases/7404041cf8859a35de90ae72091bea1628391075/config/database.yml
Here you can see it mentions the database.yml, tmp/, system/, and pids folders, all of which are defaults that Chef likes to set (see the related bug).
Question 1
What are these symlinks for, and how do I know whether I even need any? What sort of things am I symlinking? I will be using migrations, so if they are useful for the migration, I'll need them working.
I have read the documentation many times and it just doesn't explain this in plain English - at least not anywhere I have found.
Question 2
If I do not require them, how can I disable symlinking entirely? Following the examples in that bug report has not helped.
Clear out all the symlink attributes.
deploy_revision("/var/www") do
  # ...
  symlink_before_migrate.clear
  create_dirs_before_symlink.clear
  purge_before_symlink.clear
  symlinks.clear
end
Make sure the deployment directory in shared has the proper directory structure (/var/www/shared/[log,pids,system,config]) and that all config files necessary for your application are in the config directory.
Your application cookbook's recipe should have an array of directory names to create (recursively) so that you won't run into this error again; see the sketch below.
The symlinks are there so that, while your application code continues to evolve, you can share the log, pids, and system folders by symlinking shared/log to current/log, and so on.
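A minimal sketch of bootstrapping that shared structure from a recipe, assuming /var/www/shared and a www-data owner (both illustrative, adjust to your app):
# Create the shared directories the deploy resource expects to symlink from.
%w[config log pids system].each do |dir|
  directory "/var/www/shared/#{dir}" do
    owner 'www-data'
    group 'www-data'
    recursive true
    action :create
  end
end

# Hypothetical template: ship a database.yml into shared/config so the default
# symlink_before_migrate mapping has something to point at.
template '/var/www/shared/config/database.yml' do
  source 'database.yml.erb'
  owner 'www-data'
  group 'www-data'
end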
Chef happens to cache the directory structure - somehow, I haven't dug into that - with this troll application cookbook. It's something in the deploy resource, I believe - I never use that - but you can fix it by deleting the directory structure it creates in /var/derp or whatever. Also ensure your tmp directory is set up.
A couple reasons this may be an issue:
File permissions are incorrect
The currently running chef user cannot access the file
You are using the application cookbook in a configuration for rails deploys and your application does not have the same directory structure.
It's definitely caused by Chef caching the state of the deploy somewhere, then reading that state back out of its cache - wherever that is - and reusing it. I would look at the application cookbook for any persistence, and if you fail to find it there, look in the deploy resource from Chef itself.