Installing spark-avro - pyspark

I'm trying to read Avro files in pyspark.
I found out from "How to read Avro file in PySpark" that spark-avro is the best way to do that, but I can't figure out how to install it from their GitHub repo. There's no downloadable jar; do I build it myself? How?
It's Spark 1.6 (pyspark) running on a cluster. I didn't set it up, so I don't know much about the configs, but I have sudo access, so I guess I should be able to install stuff. However, the machine doesn't have direct internet access, so I need to manually copy and install things on it.
Thank you.

You can add spark-avro as a package when running pyspark or spark-submit (https://github.com/databricks/spark-avro#with-spark-shell-or-spark-submit), but this requires internet access on the driver (the driver will then distribute all files to the executors).
If you have no internet access on the driver, you will need to build spark-avro yourself into a fat jar:
git clone https://github.com/databricks/spark-avro.git
cd spark-avro
# If you are using a Spark version other than the newest,
# check out the appropriate tag based on the table in the spark-avro README,
# for example for Spark 1.6:
# git checkout v2.0.1
./build/sbt assembly
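Then test it using the pyspark shell (adjust the jar path to whatever your build produced; the session below is from Spark 2.x, and on Spark 1.6 you would use sqlContext in place of spark):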
./bin/pyspark --jars ~/git/spark-avro/target/scala-2.11/spark-avro-assembly-3.1.0-SNAPSHOT.jar
>>> spark.range(10).write.format("com.databricks.spark.avro").save("/tmp/output")
>>> spark.read.format("com.databricks.spark.avro").load("/tmp/output").show()
+---+
| id|
+---+
| 7|
| 8|
| 9|
| 2|
| 3|
| 4|
| 0|
| 1|
| 5|
| 6|
+---+

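If you can download the jar from Maven Central (directly, or on another machine and then copy it over), it should be possible with the commands below. Note that the org.apache.spark spark-avro artifact is only published for Spark 2.4 and later:
wget "https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.11/${SPARK_VERSION}/spark-avro_2.11-${SPARK_VERSION}.jar" -P "$SPARK_HOME/jars/"
echo "spark.executor.extraClassPath $SPARK_HOME/jars/spark-avro_2.11-$SPARK_VERSION.jar" >> "$SPARK_HOME/conf/spark-defaults.conf"
echo "spark.driver.extraClassPath $SPARK_HOME/jars/spark-avro_2.11-$SPARK_VERSION.jar" >> "$SPARK_HOME/conf/spark-defaults.conf"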

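Since the machine has no internet access, the jar has to be fetched elsewhere and copied over. Here is a minimal Python sketch of how the download URL is put together, assuming the standard Maven repository layout. The helper name maven_jar_url and the example version numbers are illustrative, not from the answers above. Whether a given coordinate is actually published (in particular the com.databricks one for Scala 2.10) should be verified before relying on it.

```python
def maven_jar_url(group_id, artifact_id, version,
                  repo="https://repo1.maven.org/maven2"):
    """Build the jar URL for a Maven coordinate (standard repository layout)."""
    # Dots in the group id become path separators in the repository.
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


# Apache module (Spark 2.4+); 2.4.0 is only an example version.
apache_url = maven_jar_url("org.apache.spark", "spark-avro_2.11", "2.4.0")

# Databricks package for older Spark; 2.0.1 is the tag named above for
# Spark 1.6, and the Scala 2.10 suffix is an assumption.
databricks_url = maven_jar_url("com.databricks", "spark-avro_2.10", "2.0.1")

print(apache_url)
print(databricks_url)
```

The printed URL can be passed to wget on a machine that has internet access, and the resulting jar copied to the cluster and supplied with --jars.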
Remote VS Code without scp

I have a question about installing the VS Code server on a remote machine without scp. I have no root privileges on the remote machine, and scp is not available there, so the connection freezes on the message "Setting up SSH Host $hostname: Copying VS Code Server to host with scp".
I tried to transfer .vscode-server from another remote machine (one that connects without problems) to the target remote machine, but in this case the message is
Acquiring lock on /home/username/.vscode-server/bin/5235c6bb189b60b01b1f4906
Maybe the problem is in the commit ID 5235c6bb189b60b01b1f4906?
Is there some way to install the server on a remote machine properly, so that it does not lead to the "Acquiring lock" problem?
It's probably not specific to the commit, but rooted in the fact that VS Code still can't use SCP to create lock files (or any files, for that matter).
So, fix that; it probably takes an email to an admin. Disabling SCP and SFTP has zero security advantage, because an attacker can do the very same thing with a raw remote shell; only normal users have a harder time, as you've noticed. Your VS Code depends on it.
Worst case, you can copy over your own SSH server with a matching configuration that runs as your regular user, uses a non-privileged port (instead of 22) and has SCP enabled. Start it manually via the regular SSH connection, and use your remote machine's "official" SSH server only as a jump host to get to your "private" SSH server.
In short, something like
/usr/sbin/sshd -h ~/some_key_you_generated -f /dev/null -e -D -p 9999
# /usr/sbin/sshd
#     Need to run OpenSSH's sshd with its full path, else it refuses to run.
# -h ~/some_key_you_generated
#     Specify your own host key. You'll have to generate a valid SSH keypair,
#     but it's the same `ssh-keygen` invocation as for generating user keys.
# -f /dev/null
#     Specify your own config file. Was too lazy to write a config; use defaults.
# -e
#     Error messages go to standard error instead of the system log
#     (which we can't write to, anyway).
# -D
#     Don't fork into the background.
# -p 9999
#     Use port 9999 (needs to be 1024 or higher for unprivileged users).
might do

pg_repack version mismatch after maintenance on cloud sql (GCP)

I have a Cloud SQL Postgres 11 instance on GCP and use a pg_repack cron job to clean my database. I've noticed that since the last maintenance (7 March 2021) I cannot perform a repack.
When I tried to run a repack manually, I got this error message:
ERROR: pg_repack failed with error: program 'pg_repack 1.4.4' does not match database library 'pg_repack 1.4.6'
I did the following checks:
Which version of pg_repack is loaded:
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+--------------------------------------------------------------
pg_repack | 1.4.4 | public | Reorganize tables in PostgreSQL databases with minimal locks
pg_stat_statements | 1.6 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(3 rows)
Which version of pg_repack is available:
name | version | installed | superuser | relocatable | schema | requires | comment
-----------+---------+-----------+-----------+-------------+--------+----------+--------------------------------------------------------------
pg_repack | 1.4.4 | t | t | f | | | Reorganize tables in PostgreSQL databases with minimal locks
(1 row)
I upgraded pg_repack to version 1.4.6 and it did not help. I also tried dropping and re-creating the extension and restarting the SQL instance, with no luck. :-(
I wonder whether anyone has encountered this issue. If so, is there a solution?
I got this working on Debian 10 with a very jank workaround. Basically I built a copy of 1.4.6 with the version checks commented out, and successfully ran it with the -k flag:
sudo apt install build-essential postgresql-server-dev-13 libssl-dev zlib1g-dev libreadline-dev
git clone https://github.com/yunyu/pg_repack.git # My fork with the version checks commented out
cd pg_repack
make && sudo make install
./bin/pg_repack <flags>
It seemed to work and I haven't run into any issues. Obviously run this on a VM that can access the Postgres instance, since you need shell access to even execute pg_repack.
Upgrade the extension:
ALTER EXTENSION pg_repack UPDATE;
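To see which side is out of date, it helps to read the two versions out of the error message: the "program" is the pg_repack client binary you run from the shell, and the "database library" is the shared library loaded by the server. A minimal Python sketch that extracts both follows; the function name parse_mismatch and the dictionary keys are illustrative only.

```python
import re

ERROR = ("ERROR: pg_repack failed with error: program 'pg_repack 1.4.4' "
         "does not match database library 'pg_repack 1.4.6'")


def parse_mismatch(message):
    """Return the client and server-library versions named in the error, or None."""
    match = re.search(
        r"program 'pg_repack ([\d.]+)' does not match "
        r"database library 'pg_repack ([\d.]+)'",
        message,
    )
    if match is None:
        return None
    return {"client": match.group(1), "server_library": match.group(2)}


versions = parse_mismatch(ERROR)
print(versions)
```

For the message in the question this reports a 1.4.4 client against a 1.4.6 server library, which suggests that it is the client binary on the machine running the cron job that needs to match 1.4.6.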

Installing pg_profile for postgreSql

I am trying to use the pg_profile extension in order to get AWR-like reports in PostgreSQL.
The README on GitHub says:
https://github.com/zubkov-andrei/pg_profile
# cp pg_profile* `pg_config --sharedir`/extension
and then:
postgres=# CREATE EXTENSION pg_profile;
But this fails with:
ERROR: could not open extension control file "/usr/share/postgresql/11/extension/pg_profile.control": No such file or directory
and indeed I can only find pg_profile.control.tpl in this directory (and in the zip file I downloaded from GitHub).
If I instead follow the tutorial instructions, which say:
https://dbtut.com/index.php/2019/03/29/postgresql-awr/
After downloading the file to the server, go the directory you
downloaded and follow the steps below. We’re running the Make command.
make install
Then I get this error:
# make install
Makefile:16: ../../src/Makefile.global: No such file or directory
Makefile:17: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target '/contrib/contrib-global.mk'. Stop.
Any idea how to solve these?
I was able to run CREATE EXTENSION in a PG 12.3 database, but to do that:
I removed the line PG_CONFIG = /usr/local/pgsql/bin/pg_config from the Makefile.
Then I ran as root:
$ make USE_PGXS=y install
/usr/bin/mkdir -p '/usr/pgsql-12/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-12/share/extension'
/usr/bin/install -c -m 644 .//pg_profile.control '/usr/pgsql-12/share/extension/'
/usr/bin/install -c -m 644 pg_profile--0.1.1.sql pg_profile.control '/usr/pgsql-12/share/extension/'
And after that I could run without error:
postgres=# CREATE EXTENSION dblink;
postgres=# CREATE EXTENSION pg_stat_statements;
postgres=# CREATE EXTENSION pg_profile;
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+--------------------------------------------------------------
dblink | 1.2 | public | connect to other PostgreSQL databases from within a database
pg_profile | 0.1.1 | public | PostgreSQL load profile repository and report builder
pg_stat_statements | 1.7 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgres_fdw | 1.0 | public | foreign-data wrapper for remote PostgreSQL servers
(5 rows)

Executing Linux Command in Scala-Shell

I'm working on a project where I need to execute some Linux commands (sqoop commands) in my Scala application. Below is a sample command I tried executing against MySQL on my VM.
import sys.process._
"sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'".!
I got the following error:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/06/24 15:25:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/06/24 15:25:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure.
Consider using -P instead.
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Error parsing arguments for eval:
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: *
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: from
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: categories
I used this command as well and got the same error message:
"sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'".!<
Can someone help me figure out the cause of the error? I've tried using single quotes and double quotes, all to no avail. I searched all over SO but could not find a solution, which is why I'm posting here.
NOTE: The same command executes successfully in pyspark, as seen below:
>>> import os
>>> import sys
>>> query = "sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'"
>>> os.system(query)
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/06/24 15:28:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/06/24 15:28:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure.
Consider using -P instead.
20/06/24 15:28:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------
| category_id | category_department_id | category_name |
----------------------------------------------------
| 1 | 2 | Football |
| 2 | 2 | Soccer |
| 3 | 2 | Baseball & Softball |
| 4 | 2 | Basketball |
| 5 | 2 | Lacrosse |
| 6 | 2 | Tennis & Racquet |
It looks like sqoop doesn't recognize *, from, and categories as individual arguments. The reason it works when invoked from the command line is that the shell interprets the quote marks and presents them as a single select * from categories argument. In other words, the shell does some pre-processing before handing everything off to the sqoop program.
The .! method (i.e. the Scala ProcessBuilder) launches processes directly, which means that the command elements are not passed to a shell for pre-processing. There are two ways to get around this problem.
You can invoke the shell directly and pass the command-line to it as a single argument, or
you can do most of the obvious pre-processing yourself.
Here's an example of the 2nd option.
import sys.process._

// Each element is passed to sqoop as exactly one argument.
Seq("sqoop"
,"eval"
,"--connect"
,"jdbc:mysql://localhost:3306/retail_db"
,"--username"
,"root"
,"--password"
,"cloudera"
,"--query"
,"select * from categories").!
As you can see, all the individual arguments are presented as individual arguments, including the last one.
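The difference between the two kinds of splitting can be reproduced in Python, the language the question used for comparison. The sketch below is an illustration, not part of the original answer. It treats a plain whitespace split as a simplified model of launching the process without a shell, and uses shlex.split to approximate how a POSIX shell would tokenize the same string.

```python
import shlex

command = ("sqoop eval --connect jdbc:mysql://localhost:3306/retail_db "
           "--username root --password cloudera "
           "--query 'select * from categories'")

# Plain whitespace split: the quoted query falls apart into four tokens,
# and the quote characters stay attached to the first and last of them.
naive = command.split()

# Shell-style tokenizing: the quotes group the query into one argument
# and are then removed.
shell_like = shlex.split(command)

print(naive[-4:])       # the broken-up query
print(shell_like[-2:])  # "--query" followed by the query as one argument
```

The list produced by shlex.split has the same shape as the Seq(...) in the answer: one element per argument, with the whole query as the last element.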

Fabric Mac: Not able to generate Fastlane

I recently updated my Mac to macOS High Sierra. Since the OS update, the Fabric app has not been working properly: I am not able to generate the "Fastlane beta".
OS Details: Mac OS X: macOS High Sierra (10.13.2)
Fabric Details: App Version: 2.6.17 (1288) Fabric Version: 2.78.0
Environment:
<details><summary>✅ fastlane environment ✅</summary>
### Stack
| Key | Value |
| --------------------------- | ----------------------------------------------- |
| OS | 10.13.2 |
| Ruby | 2.2.4 |
| Bundler? | false |
| Git | git version 2.11.0 (Apple Git-81) |
| Installation Source | /usr/local/lib/fastlane_lib/bundle/bin/fastlane |
| Host | Mac OS X 10.13.2 (17C88) |
| Ruby Lib Dir | /usr/local/lib/fastlane_lib/bundle/lib |
| OpenSSL Version | OpenSSL 1.0.2g 1 Mar 2016 |
| Is contained | false |
| Is homebrew | false |
| Is installed via Fabric.app | true |
| Xcode Path | /Applications/Xcode 8.3.app/Contents/Developer/ |
| Xcode Version | 8.3.3 |
### System Locale
| Variable | Value | |
| -------- | ----------- | - |
| LANG | en_US.UTF-8 | ✅ |
| LC_ALL | en_US.UTF-8 | ✅ |
| LANGUAGE | en_US.UTF-8 | ✅ |
### fastlane files:
**No Fastfile found**
**No Appfile found**
### fastlane gems
| Gem | Version | Update-Status |
| -------- | ------- | -------------- |
| fastlane | 2.78.0 | 💥 Check failed |
### Loaded fastlane plugins:
**No plugins Loaded**
<details><summary><b>Loaded gems</b></summary>
| Gem | Version |
| ------------------------- | ------------ |
| slack-notifier | 2.3.2 |
| CFPropertyList | 2.3.6 |
| claide | 1.0.2 |
| colored2 | 3.1.2 |
| nanaimo | 0.2.3 |
| xcodeproj | 1.5.4 |
| rouge | 2.0.7 |
| xcpretty | 0.2.8 |
| terminal-notifier | 1.8.0 |
| unicode-display_width | 1.3.0 |
| terminal-table | 1.8.0 |
| plist | 3.4.0 |
| public_suffix | 2.0.5 |
| addressable | 2.5.2 |
| multipart-post | 2.0.0 |
| word_wrap | 1.0.0 |
| tty-screen | 0.6.4 |
| tty-cursor | 0.5.0 |
| tty-spinner | 0.8.0 |
| babosa | 1.0.2 |
| colored | 1.2 |
| highline | 1.7.10 |
| commander-fastlane | 4.4.5 |
| excon | 0.60.0 |
| faraday | 0.14.0 |
| unf_ext | 0.0.7.4 |
| unf | 0.1.4 |
| domain_name | 0.5.20170404 |
| http-cookie | 1.0.3 |
| faraday-cookie_jar | 0.0.6 |
| fastimage | 2.1.1 |
| gh_inspector | 1.0.3 |
| json | 1.8.1 |
| mini_magick | 4.5.1 |
| multi_json | 1.13.1 |
| multi_xml | 0.6.0 |
| rubyzip | 1.2.1 |
| security | 0.1.3 |
| xcpretty-travis-formatter | 1.0.0 |
| dotenv | 2.2.1 |
| bundler | 1.16.1 |
| faraday_middleware | 0.12.2 |
| uber | 0.1.0 |
| declarative | 0.0.10 |
| declarative-option | 0.1.0 |
| representable | 3.0.4 |
| retriable | 3.1.1 |
| mime-types-data | 3.2016.0521 |
| mime-types | 3.1 |
| little-plugger | 1.1.4 |
| logging | 2.2.2 |
| jwt | 2.1.0 |
| memoist | 0.16.0 |
| os | 0.9.6 |
| signet | 0.8.1 |
| googleauth | 0.6.2 |
| httpclient | 2.8.3 |
| google-api-client | 0.13.6 |
| libxml-ruby | 3.0.0 |
</details>
*generated on:* **2018-02-01**
</details>
Can anyone please let me know what the issue is with this configuration? Thanks in advance.
EDIT:
I tried the solution given by @Mike, but it didn't resolve the issue. Here is the Terminal output:
Last login: Mon Feb 5 16:45:31 on qqvm915
Yuva-M:~ Yuva$ rm -rf ~/.fastlane/bin
Yuva-M:~ Yuva$ rm -rf /usr/local/lib/fastlane_lib
Yuva-M:~ Yuva$ cd /Users/Yuva/Documents/iOS\ Applications/TestApp/TestApp\ App/TestApp\ App\ Dev
Yuva-M:TestApp Dev Yuva$ touch Gemfile
Yuva-M:TestApp Dev Yuva$ bundle update
Fetching source index from https://rubygems.org/
Retrying fetcher due to error (2/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Retrying fetcher due to error (3/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Retrying fetcher due to error (4/4): Bundler::HTTPError Could not fetch specs from https://rubygems.org/
Resolving dependencies............................
Using CFPropertyList 2.3.6
Using public_suffix 2.0.5
Using addressable 2.5.2
Fetching atomos 0.1.2
Your user account isn't allowed to install to the system RubyGems.
You can cancel this installation and run:
bundle install --path vendor/bundle
to install the gems into ./vendor/bundle/, or you can enter your
password and install the bundled gems to RubyGems using sudo.
Password:
Your user account isn't allowed to install to the system RubyGems.
You can cancel this installation and run:
bundle install --path vendor/bundle
to install the gems into ./vendor/bundle/, or you can enter your
password and install the bundled gems to RubyGems using sudo.
Password:
Installing atomos 0.1.2
Using babosa 1.0.2
Using bundler 1.16.1
Using claide 1.0.2
Using colored 1.2
Using colored2 3.1.2
Using highline 1.7.10
Using commander-fastlane 4.4.5
Using declarative 0.0.10
Using declarative-option 0.1.0
Using unf_ext 0.0.7.4
Using unf 0.1.4
Using domain_name 0.5.20170404
Using dotenv 2.2.1
Using excon 0.60.0
Using multipart-post 2.0.0
Using faraday 0.14.0
Using http-cookie 1.0.3
Using faraday-cookie_jar 0.0.6
Using faraday_middleware 0.12.2
Using fastimage 2.1.1
Fetching gh_inspector 1.1.1
Installing gh_inspector 1.1.1
Using jwt 2.1.0
Using little-plugger 1.1.4
Using multi_json 1.13.1
Using logging 2.2.2
Using memoist 0.16.0
Using os 0.9.6
Using signet 0.8.1
Using googleauth 0.6.2
Using httpclient 2.8.3
Using mime-types-data 3.2016.0521
Using mime-types 3.1
Using uber 0.1.0
Using representable 3.0.4
Using retriable 3.1.1
Using google-api-client 0.13.6
Fetching json 2.1.0
Installing json 2.1.0 with native extensions
Using mini_magick 4.5.1
Using multi_xml 0.6.0
Using plist 3.4.0
Using rubyzip 1.2.1
Using security 0.1.3
Using slack-notifier 2.3.2
Using terminal-notifier 1.8.0
Using unicode-display_width 1.3.0
Using terminal-table 1.8.0
Using tty-screen 0.6.4
Using tty-cursor 0.5.0
Using tty-spinner 0.8.0
Using word_wrap 1.0.0
Using nanaimo 0.2.3
Fetching xcodeproj 1.5.6
Installing xcodeproj 1.5.6
Using rouge 2.0.7
Using xcpretty 0.2.8
Using xcpretty-travis-formatter 1.0.0
Using fastlane 2.80.0
Bundle updated!
Yuva-M:TestApp Dev Yuva$ bundle exec fastlane beta
[✔] 🚀
[17:10:27]: Could not find fastlane in current directory. Make sure to have your fastlane configuration files inside a folder called "fastlane". Would you like to set fastlane up? (y/n) y
[✔] Looking for iOS and Android projects in current directory...
[17:10:34]: Created new folder './fastlane'.
[17:10:34]: Detected an iOS/macOS project in the current directory: 'TestApp.xcworkspace'
[17:10:34]: -----------------------------
[17:10:34]: --- Welcome to fastlane 🚀 ---
[17:10:34]: -----------------------------
[17:10:34]: fastlane can help you with all kinds of automation for your mobile app
[17:10:34]: We recommend automating one task first, and then gradually automating more over time
[17:10:34]: What would you like to use fastlane for?
1. 📸 Automate screenshots
2. 👩‍✈️ Automate beta distribution to TestFlight
3. 🚀 Automate App Store distribution
4. 🛠 Manual setup - manually setup your project to automate your tasks ?
The fastlane folder was generated in the project directory, but there were no files in it. I then tried to generate a build from the Fabric Mac app and I am still getting the error (see the screenshot). Any guidance? Thanks.
Mike from Fabric. I'd recommend switching from fastlane in Fabric to using it independently. Here's how to do it, referenced from Felix Krause.
In terminal, run:
rm -rf ~/.fastlane/bin
-- This removes the current fastlane installation from your home directory.
rm -rf /usr/local/lib/fastlane_lib
-- This removes the current fastlane installation from your system.
After, edit your ~/.bashrc or ~/.bash_profile and remove any reference to fastlane. We're clearing everything out so that you can set up a fresh install.
Now, navigate to your project directory and run touch Gemfile. Add the following content to that file:
source "https://rubygems.org"
gem "fastlane"
Save the file and run bundle update. Commit both the Gemfile and the Gemfile.lock to version control and from now on, every time you run fastlane, prefix your command using bundle exec:
bundle exec fastlane beta
Finally, restart the Fabric macOS app and you'll be all set.