AWS Lambda Function Can't Find Package - Psycopg2 - postgresql

Here's my issue:
START RequestId: 3ef6bbb9-62da-11e8-82ba-81e0afb0b224 Version: $LATEST
Unable to import module 'lambda_insertEmailAddress': No module named psycopg2
END RequestId: 3ef6bbb9-62da-11e8-82ba-81e0afb0b224
REPORT RequestId: 3ef6bbb9-62da-11e8-82ba-81e0afb0b224 Duration: 0.44 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 19 MB
My zip file (file name: lambdaInsertEmail.zip) has the following structure:
total 98784
drwxrwxrwx 20 chauncey staff 680B May 27 13:22 psycopg2
drwxrwxrwx 22 chauncey staff 748B May 27 12:55 postgresql-9.4.3
-rwxrwxrwx 1 chauncey staff 1.8K Apr 30 15:41 lambda_insertEmailAddress.py
-rw-r--r-- 1 chauncey staff 48M May 30 12:09 lambdaInsertEmail.zip
In case you want to know, my setup.cfg file has the following changes:
pg_config=/Users/chauncey/Desktop/portfolio/aws_lambda_files/lambda_insertEmailAddress/postgresql-9.4.3/src/bin/pg_config/pg_config
static_libpq=1
I'm trying to get this lambda function working.
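For anyone reproducing this, a quick way to see what actually ended up inside the archive is unzip -l; for Lambda, the handler file and the psycopg2 directory have to sit at the root of the zip:
# List the archive contents; lambda_insertEmailAddress.py and psycopg2/ must appear at the zip root
unzip -l lambdaInsertEmail.zip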

The problem is caused by psycopg2 needing to be built and compiled against statically linked libraries for Linux. See Using psycopg2 with Lambda to Update Redshift (Python) for more details on this issue, and another reference on the problems of compiling psycopg2 on OS X.
One solution is to compile the library on an Amazon Linux AMI. Once you have connected to the instance over SSH:
Set up AWS credentials using aws configure
sudo su -
pip install psycopg2 -t /path/to/project-dir
Zip the contents of the directory so the handler sits at the root of the archive: cd /path/to/project-dir && zip -r ../project-dir.zip .
Upload the zip to Lambda using the AWS CLI (the full sequence is sketched below)
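Putting those steps together, a rough sketch of the whole sequence; the function name my-function is a placeholder, not something taken from the question:
# On an Amazon Linux instance, so the compiled C extension matches Lambda's runtime
sudo su -                                      # the steps above run pip as root
aws configure                                  # credentials for the upload at the end
pip install psycopg2 -t /path/to/project-dir   # vendor the library into the project dir
# Zip the *contents* of the directory: the handler must sit at the archive root
cd /path/to/project-dir
zip -r ../project-dir.zip .
cd ..
# Push the package to an existing function; my-function is a placeholder name
aws lambda update-function-code --function-name my-function --zip-file fileb://project-dir.zip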
Hopefully this helps you understand the issue and a possible solution. There are additional solutions in the references above, such as using boto3 or this repo https://github.com/jkehler/awslambda-psycopg2, but this is the one that worked for me.

pip install psycopg2-binary
fixed it for me
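If you take this route for a Lambda deployment package specifically, the prebuilt wheel still has to be vendored into the zip; a minimal sketch, assuming the same placeholder project directory as above:
# Run on Linux (or in a Linux container) so the bundled wheel matches Lambda's runtime
pip install psycopg2-binary -t /path/to/project-dir
cd /path/to/project-dir && zip -r ../project-dir.zip .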

Related

AWS Elastic Beanstalk failed to install psycopg2 using requirements.txt Git Pip

I am trying to deploy an app using Elastic Beanstalk with Python 3.8. I am using the following requirements.txt:
click==8.0.1
Flask==1.1.2
Flask-SQLAlchemy==2.5.1
greenlet==1.1.0
itsdangerous==2.0.1
Jinja2==3.0.1
MarkupSafe==2.0.1
marshmallow==3.12.1
marshmallow-sqlalchemy==0.25.0
SQLAlchemy==1.4.15
Werkzeug==2.0.1
celery[redis]
psycopg2==2.9.3
Flask-JWT-Extended==4.3.1
Flask-RESTful==0.3.9
python-decouple==3.6
When I run the command eb create, I get the following error
2022-04-05 22:03:00 INFO Created security group named: sg-00b14485064e5e8ca
2022-04-05 22:03:16 INFO Created security group named: awseb-e-ekd3bw2bvf-stack-AWSEBSecurityGroup-1O3NAVBIRRK30
2022-04-05 22:03:31 INFO Created Auto Scaling launch configuration named: awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingLaunchConfiguration-HKjIVsa84E3U
2022-04-05 22:04:49 INFO Created Auto Scaling group named: awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingGroup-5FQOAWMGCR3W
2022-04-05 22:04:49 INFO Waiting for EC2 instances to launch. This may take a few minutes.
2022-04-05 22:04:49 INFO Created Auto Scaling group policy named: arn:aws:autoscaling:us-east-1:208357543212:scalingPolicy:ecfbbff0-4151-492f-a474-ba01535ad348:autoScalingGroupName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingGroup-5FQOAWMGCR3W:policyName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingScaleDownPolicy-CI2UIP6X023P
2022-04-05 22:04:49 INFO Created Auto Scaling group policy named: arn:aws:autoscaling:us-east-1:208357543212:scalingPolicy:d534189a-45e3-48f1-a206-720f202b4469:autoScalingGroupName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingGroup-5FQOAWMGCR3W:policyName/awseb-e-ekd3bw2bvf-stack-AWSEBAutoScalingScaleUpPolicy-1F0WVTUXXPFKF
2022-04-05 22:05:04 INFO Created CloudWatch alarm named: awseb-e-ekd3bw2bvf-stack-AWSEBCloudwatchAlarmLow-W8URMJEYBO3C
2022-04-05 22:05:04 INFO Created CloudWatch alarm named: awseb-e-ekd3bw2bvf-stack-AWSEBCloudwatchAlarmHigh-13J8QHI51MEBM
2022-04-05 22:06:09 INFO Created load balancer named: arn:aws:elasticloadbalancing:us-east-1:208357543212:loadbalancer/app/awseb-AWSEB-IXOR2Z0K0OJV/1fba4c6ff6122c55
2022-04-05 22:06:24 INFO Created Load Balancer listener named: arn:aws:elasticloadbalancing:us-east-1:208357543212:listener/app/awseb-AWSEB-IXOR2Z0K0OJV/1fba4c6ff6122c55/734b0cf960b6b8c4
2022-04-05 22:06:42 ERROR Instance deployment failed to install application dependencies. The deployment failed.
2022-04-05 22:06:42 ERROR Instance deployment failed. For details, see 'eb-engine.log'.
2022-04-05 22:06:44 ERROR [Instance: i-0368a7ba2157241f4] Command failed on instance. Return code: 1 Output: Engine execution has encountered an error..
2022-04-05 22:06:45 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2022-04-05 22:07:48 ERROR Create environment operation is complete, but with errors. For more information, see troubleshooting documentation.
I looked at the corresponding logs and found the following error:
Collecting Werkzeug==2.0.1
Downloading Werkzeug-2.0.1-py3-none-any.whl (288 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 KB 35.6 MB/s eta 0:00:00
Collecting celery[redis]
Downloading celery-5.2.6-py3-none-any.whl (405 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 405.6/405.6 KB 54.7 MB/s eta 0:00:00
Collecting psycopg2==2.9.3
Downloading psycopg2-2.9.3.tar.gz (380 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 380.6/380.6 KB 52.2 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
2022/04/05 22:06:42.952376 [INFO] error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
running egg_info
creating /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info
writing /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/dependency_links.txt
writing top-level names to /tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-v0aygozt/psycopg2.egg-info/SOURCES.txt'
Error: pg_config executable not found.
pg_config is required to build psycopg2 from source. Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
For further information please check the 'doc/src/install.rst' file (also at
<https://www.psycopg.org/docs/install.html>).
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I am not very familiar with the requirements of AWS, but I can run the app locally without any problem. I just wonder what the right configuration for the requirements.txt file would be in order to avoid this error.
Thanks in advance.
You have to install postgresql-devel before you can build psycopg2. You can add the installation instructions to your .ebextensions:
packages:
  yum:
    postgresql-devel: []
or
commands:
  command1:
    command: yum install -y postgresql-devel
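For orientation, a sketch of where such a snippet lives; the file names 01_packages.config and application.py are arbitrary placeholders, since Elastic Beanstalk applies any *.config file found under .ebextensions/ at the project root:
# Hypothetical project layout; one of the YAML snippets above goes into the .config file
$ ls -a
.  ..  .ebextensions  application.py  requirements.txt
$ ls .ebextensions
01_packages.config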
I was able to solve the error. I had to change psycopg2 to psycopg2-binary, as suggested by the AWS logs:
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
This issue seems to have to do with the particular configuration of the libraries on the specific Linux machines used by AWS.
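Concretely, the only change to requirements.txt is swapping the source distribution for the prebuilt wheel, keeping the same version pin:
# requirements.txt (excerpt)
# psycopg2==2.9.3          <- removed: needs pg_config / postgresql-devel to compile
psycopg2-binary==2.9.3     # prebuilt wheel, nothing to compile on the instance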

How do I import classes from one or more local .jar files into a Spark/Scala Notebook?

I am struggling to load classes from JARs into my Scala-Spark kernel Jupyter notebook. I have jars at this location:
/home/hadoop/src/main/scala/com/linkedin/relevance/isolationforest/
with contents listed as follows:
-rwx------ 1 hadoop hadoop 7170 Sep 11 20:54 BaggedPoint.scala
-rw-rw-r-- 1 hadoop hadoop 186719 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1.jar
-rw-rw-r-- 1 hadoop hadoop 1482 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-javadoc.jar
-rw-rw-r-- 1 hadoop hadoop 20252 Sep 11 21:36 isolation-forest_2.3.0_2.11-1.0.1-sources.jar
-rwx------ 1 hadoop hadoop 16133 Sep 11 20:54 IsolationForestModelReadWrite.scala
-rwx------ 1 hadoop hadoop 5740 Sep 11 20:54 IsolationForestModel.scala
-rwx------ 1 hadoop hadoop 4057 Sep 11 20:54 IsolationForestParams.scala
-rwx------ 1 hadoop hadoop 11301 Sep 11 20:54 IsolationForest.scala
-rwx------ 1 hadoop hadoop 7990 Sep 11 20:54 IsolationTree.scala
drwxrwxr-x 2 hadoop hadoop 157 Sep 11 21:35 libs
-rwx------ 1 hadoop hadoop 1731 Sep 11 20:54 Nodes.scala
-rwx------ 1 hadoop hadoop 854 Sep 11 20:54 Utils.scala
When I attempt to load the IsolationForest class like so:
import com.linkedin.relevance.isolationforest.IsolationForest
I get the following error in my notebook:
<console>:33: error: object linkedin is not a member of package com
import com.linkedin.relevance.isolationforest.IsolationForest
I've been Googling for several hours now to get to this point but am unable to progress further. What is the next step?
By the way, I am attempting to use this package: https://github.com/linkedin/isolation-forest
Thank you.
For Scala:
if you're using spylon-kernel, then you can specify additional jars in the %%init_spark section, as described in the docs (the first line is for a jar file, the second for a package):
%%init_spark
launcher.jars = ["/some/local/path/to/a/file.jar"]
launcher.packages = ["com.acme:super:1.0.1"]
For Python:
in the first cells of the Jupyter notebook, before initializing the SparkSession, do the following:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars <full_path_to>/isolation-forest_2.3.0_2.11-1.0.1.jar pyspark-shell'
This will add the jars into the PySpark context. But it's better to use --packages instead of --jars, because it will also fetch all necessary dependencies and put everything into the internal cache. For example:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.linkedin.isolation-forest:isolation-forest_2.3.0_2.11:1.0.0 pyspark-shell'
You only need to select a version that matches your PySpark and Scala versions (Spark 2.3.x and 2.4 use Scala 2.11, Spark 3.0 uses Scala 2.12), as listed in the Git repo.
I got the following to work with pure Scala, JupyterLab, and Almond (which uses Ammonite), with no Spark or any other heavy overlay involved:
interp.load.cp(os.pwd/"yourfile.jar")
The above, added as a statement in the notebook directly, loads yourfile.jar from the current directory. After this you can import from the jar, for instance import yourfile._ if yourfile is the name of the top-level package. One caveat I observed is that you should wait a bit, until the kernel has started properly, before attempting to load; if the first statement runs too fast (for instance with restart-and-run-all), the whole thing hangs. This seems to be an unrelated issue.
You can, of course, construct another path (look here for the available API). Under the Ammonite magic-imports link above you will also find info on how to load a package from Ivy or how to load a Scala script. The trick is to use the interp object and the LoadJar trait that you can access from it. LoadJar has the following API:
trait LoadJar {
  /**
   * Load a `.jar` file or directory into your JVM classpath
   */
  def cp(jar: os.Path): Unit
  /**
   * Load a `.jar` from a URL into your JVM classpath
   */
  def cp(jar: java.net.URL): Unit
  /**
   * Load one or more `.jar` files or directories into your JVM classpath
   */
  def cp(jars: Seq[os.Path]): Unit
  /**
   * Load a library from its maven/ivy coordinates
   */
  def ivy(coordinates: Dependency*): Unit
}

Uploading/Downloading file from Artifactory OSS while preserving file permissions

I have a binary file named node_exporter, which has executable file permissions:
-rwxr-xr-x. 1 root root 16834973 Jul 29 08:35 node_exporter
I use the Artifactory CLI to upload the file:
./jfrog rt u node_exporter {repo}/node_exporter.
And then to download the file:
./jfrog rt dl {repo}/node_exporter.
Once downloaded, the file loses executable permissions.
-rw-r--r--. 1 root root 16834973 Jul 29 08:44 node_exporter
I know that cURL doesn't deal with metadata, but I'm not sure why the official CLI doesn't either. Is it possible at all to upload/download while preserving file permissions? This is used in an integration with Jenkins, so I would rather not have to set the file permissions after downloading the binary for each build.
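For reference, the round trip described above with the manual fix spelled out; the chmod at the end is exactly the extra step the question hopes to avoid ({repo} is the question's own placeholder):
./jfrog rt u node_exporter {repo}/node_exporter    # upload the binary
./jfrog rt dl {repo}/node_exporter                 # download; comes back as -rw-r--r-- per the listing above
chmod +x node_exporter                             # manually restore the executable bit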

Mounting a bucket with fstab not working (newbie)

I'm new to GCP and to Linux, and I'm trying to mount a bucket on my CentOS instance using gcsfuse.
I tried with a script running at boot, but it was not working, so I tried with fstab (people told me it is much better).
But I got this error when I tried to ls my mount point:
ls: reading directory .: Input/output error
Here is my fstab file:
#
# /etc/fstab
# Created by anaconda on Tue Mar 26 23:07:36 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=de2d3dce-cce3-47a8-a0fa-5bfe54e611ab / xfs defaults 0 0
mybucket /mount/to/point gcsfuse rw,allow_other,uid=1001,gid=1001
According to: https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/mounting.md
Thanks for your time.
Okay, so I just had to wait about 2 minutes for Google auth to grant my key. Basically, it works now.
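Once the credentials have propagated, the fstab entry can be re-tested without rebooting; a small sketch, using the mount point from the question:
sudo umount /mount/to/point 2>/dev/null   # clear any stale, half-working mount
sudo mount /mount/to/point                # re-mount using the options from /etc/fstab
ls /mount/to/point                        # should now list the bucket contents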

Fabric takes a long time with ssh

I am running fabric to automate deployment. It is painfully slow.
My local environment:
(somenv)bob#sh ~/code/somenv/somenv/fabfile $ > uname -a
Darwin sh.local 12.4.0 Darwin Kernel Version 12.4.0: Wed May 1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64
My fab file:
#!/usr/bin/env python
import logging
import paramiko as ssh
from fabric.api import env, run
env.hosts = [ 'examplesite']
env.use_ssh_config = True
#env.forward_agent = True
logging.basicConfig(level=logging.INFO)
ssh.util.log_to_file('/tmp/paramiko.log')
def uptime():
    run('uptime')
Here is the relevant portion of the debug logs:
(somenv)bob#sh ~/code/somenv/somenv/fabfile $ > date;fab -f /Users/bob/code/somenv/somenv/fabfile/pefabfile.py uptime
Sun Aug 11 22:25:03 EDT 2013
[examplesite] Executing task 'uptime'
[examplesite] run: uptime
DEB [20130811-22:25:23.610] thr=1 paramiko.transport: starting thread (client mode): 0x13e4650L
INF [20130811-22:25:23.630] thr=1 paramiko.transport: Connected (version 2.0, client OpenSSH_5.9p1)
DEB [20130811-22:25:23.641] thr=1 paramiko.transport: kex algos:['ecdh-sha2-nistp256', 'ecdh-sha2-nistp384', 'ecdh-sha2-nistp521', 'diffie-hellman-grou
It takes 20 seconds before paramiko even starts the thread. Surely, executing the 'uptime' task does not take that long. I can manually log in through ssh, type uptime, and exit in 5-6 seconds. I'd appreciate any help on how to extract more debug information. I made the changes mentioned here, but no difference.
Try:
env.disable_known_hosts = True
See:
https://github.com/paramiko/paramiko/pull/192
&
Slow public key authentication with paramiko
Maybe it is a problem with DNS resolution and/or IPv6.
A few things you can try (a quick timing check is sketched after this list):
replacing the server name by its IP address in env.hosts
disabling IPv6
use another DNS server (e.g. OpenDNS)
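A quick way to check whether name resolution, rather than SSH itself, is eating the time; 203.0.113.10 is a documentation placeholder, substitute the server's real address:
time ssh examplesite true                       # full connect + auth against the host name
time ssh 203.0.113.10 true                      # same thing by IP; a big gap points at DNS
ssh -vvv examplesite true 2>&1 | tail -n 40     # verbose output shows where the connection stalls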
For anyone looking at this post-2014: paramiko, which was the slow component when checking known hosts, introduced a fix in March 2014 (v1.13); it was allowed as a requirement by Fabric in v1.9.0 and backported to v1.8.4 and v1.7.4.
So, upgrade!