Installing Apache Spark from source

1. Introduction

I will show how to install Apache Spark 3.0.1 from source. The commands assume a yum-based Linux distribution such as CentOS 7.

2. Install procedure

(1) Install Java 8

sudo yum install java-1.8.0-openjdk
sudo yum install java-1.8.0-openjdk-devel
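
Both packages are needed because the -devel package provides the compiler (javac) used by the build. As a minimal sketch, the snippet below checks that both commands are on the PATH. The helper name check_cmd is hypothetical and not part of any tool used in this post.

```shell
# Hypothetical helper: print "ok: <name>" if the command is on PATH,
# otherwise print "missing: <name>".
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_cmd java
check_cmd javac
```

If either line says "missing", repeat the yum commands above before continuing.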

(2) Install Maven 3.6.3, a Java build tool

wget https://ftp.yz.yamagata-u.ac.jp/pub/network/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
sudo tar xf ./apache-maven-3.6.3-bin.tar.gz -C /opt
sudo ln -s /opt/apache-maven-3.6.3 /opt/maven
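
Because the archive comes from a mirror, you may want to compare its SHA-512 digest with the one published by the Apache project before extracting it. The following is only a sketch: the helper name verify_sha512 is hypothetical, and it assumes you have saved the published digest in a file next to the archive (the file name apache-maven-3.6.3-bin.tar.gz.sha512 in the usage comment is an assumption).

```shell
# Hypothetical helper: succeed only if the SHA-512 digest of file $1
# equals the hex digest given as $2.
verify_sha512() {
  actual=$(sha512sum "$1" | awk '{print $1}')
  [ "$actual" = "$2" ]
}

# Usage, assuming the published digest was saved next to the archive:
#   verify_sha512 ./apache-maven-3.6.3-bin.tar.gz \
#     "$(awk '{print $1}' ./apache-maven-3.6.3-bin.tar.gz.sha512)" && echo "checksum OK"
```

The same helper can be reused for the Scala and Spark archives in the later steps.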

(3) Set environment variables for Maven

sudo vi /etc/profile.d/maven.sh
–add the following lines
export JAVA_HOME=/usr/lib/jvm/jre-openjdk
export M2_HOME=/opt/maven
export MAVEN_HOME=/opt/maven
export PATH=${M2_HOME}/bin:${PATH}

sudo chmod +x /etc/profile.d/maven.sh
source /etc/profile.d/maven.sh
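
To confirm that the profile script took effect, you can check that the Maven bin directory is now on the PATH. This is a small sketch. The function name path_contains is hypothetical, and /opt/maven is used as a fallback only because it is the symlink created in step (2).

```shell
# Hypothetical helper: return 0 if directory $1 is an entry in the
# colon-separated list $2, and 1 otherwise.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "${M2_HOME:-/opt/maven}/bin" "$PATH"; then
  echo "Maven bin directory is on PATH"
else
  echo "Maven bin directory is NOT on PATH"
fi
```

Running mvn -version afterwards shows which Java installation Maven picked up.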

(4) Install Scala 2.12

wget https://downloads.lightbend.com/scala/2.12.13/scala-2.12.13.tgz
sudo tar xf ./scala-2.12.13.tgz -C /usr/local
sudo ln -s /usr/local/scala-2.12.13 /usr/local/scala
vi ~/.bashrc
–add the following lines
export SCALA_HOME=/usr/local/scala
export PATH=${SCALA_HOME}/bin:${PATH}

source ~/.bashrc

(5) Install Spark 3.0.1

wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1.tgz
tar xf ./spark-3.0.1.tgz
cd ./spark-3.0.1
./build/mvn -DskipTests clean package
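
The build takes a while and needs a fair amount of memory. The Building Spark guide [2] describes giving the Maven JVM more memory through MAVEN_OPTS. The sketch below uses the values I believe that guide lists for Spark 3.0, so treat them as an assumption and check them against the guide. The helper name build_spark is hypothetical.

```shell
# Assumed memory settings for the Maven JVM. A value already set in the
# environment is kept. Adjust to your machine.
export MAVEN_OPTS="${MAVEN_OPTS:--Xmx2g -XX:ReservedCodeCacheSize=1g}"

# Hypothetical helper: run the same build command as above inside the
# Spark source directory given as $1.
build_spark() {
  (cd "$1" && ./build/mvn -DskipTests clean package)
}

# Usage, from the directory that contains spark-3.0.1:
#   build_spark ./spark-3.0.1
```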

(6) Verify that the installation succeeded

./bin/run-example SparkPi 10
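
The example prints many log lines, and the result appears on a line that starts with "Pi is roughly". As a small sketch, the filter below keeps only that line. The helper name pi_line is hypothetical.

```shell
# Hypothetical helper: read SparkPi output on stdin and print only the
# result line. The exit status is non-zero if no result line is found.
pi_line() {
  grep 'Pi is roughly'
}

# Usage, run from the spark-3.0.1 directory:
#   ./bin/run-example SparkPi 10 2>&1 | pi_line
```

If nothing is printed, the example did not finish, and the full output is the place to look for the error.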

3. References

[1] Spark Standalone Mode
https://spark.apache.org/docs/latest/spark-standalone.html

[2] Building Spark
https://spark.apache.org/docs/latest/building-spark.html

[3] Installing Apache Maven on CentOS 7
https://cloudwafer.com/blog/installing-apache-maven-on-centos-7/

Published by ktke109

I love open source database management systems.