Fig. Apache Spark Logo

Apache Spark is a powerful, open-source distributed computing system that is widely used for big data processing and analytics. If you want to run Apache Spark on Ubuntu using UTM on a Mac, this guide will walk you through the process step-by-step.


Prerequisites

  1. A Mac with UTM installed.
  2. An Ubuntu virtual machine running in UTM.
  3. Internet access within the Ubuntu VM (a quick check is shown below).
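
Before starting, you can confirm the VM actually has internet access from the Ubuntu terminal; archive.apache.org is used here only because Step 4 downloads from it, and any reachable host would do:

ping -c 3 archive.apache.org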

Step 1: Update and Upgrade Ubuntu

Start by updating the package index and upgrading existing packages:

sudo apt update && sudo apt upgrade
Fig. Updating and upgrading Ubuntu packages.

Step 2: Install Java

Apache Spark requires Java to run. Install OpenJDK:

sudo apt install default-jdk -y
Fig. Installing the Java JDK.

Verify the installation:

java -version
Fig. Checking the Java version after installation.

Step 3: Install Scala (Optional)

While Spark itself runs on the Java Virtual Machine (JVM), it is often used with Scala, so you may want to install it:

sudo apt install scala -y
Fig. Installing Scala.

Verify Scala installation:

scala -version

Fig. Checking the Scala version.

Step 4: Download Apache Spark

Visit the official Apache Spark download page to get the latest version of Spark. Alternatively, use the following command to download the latest stable release (replace the URL with the appropriate version):

wget https://downloads.apache.org/spark/spark-<version>/spark-<version>-bin-hadoop3.tgz

or 

curl -O https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz
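
Apache also publishes a SHA-512 checksum next to each release archive, so you can optionally verify the download before extracting it. A minimal sketch for the 3.2.0 archive above (adjust the file name for your version; if the checksum file is not in the format sha512sum -c expects, run sha512sum on the archive and compare the hashes by eye):

curl -O https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz.sha512
sha512sum -c spark-3.2.0-bin-hadoop3.2.tgz.sha512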

Extract the downloaded file:

tar -xvzf spark-<version>-bin-hadoop3.tgz

or

sudo tar xvf spark-3.2.0-bin-hadoop3.2.tgz

Move the extracted folder to /opt:

sudo mv spark-<version>-bin-hadoop3 /opt/spark

or 
run the following commands sequentially:

$ sudo mkdir /opt/spark
$ sudo mv spark-3.2.0-bin-hadoop3.2/* /opt/spark
$ sudo chmod -R 777 /opt/spark
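
Before moving on, it is worth confirming the files landed where expected; /opt/spark should now contain directories such as bin, sbin, and conf:

ls /opt/spark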

Step 5: Set Environment Variables

Edit the ~/.bashrc file to set environment variables for Spark:

vim ~/.bashrc

Add the following lines at the end of the file:

# Spark Environment Variables
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::")

or 

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save the file to bring the changes into effect, then exit vim:

:wq

Reload the environment:

source ~/.bashrc
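
To confirm the new variables took effect, you can echo SPARK_HOME and print the Spark version; spark-submit --version only reports version information, so it is a cheap sanity check:

echo $SPARK_HOME
spark-submit --version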

Follow Step 6 if you want to install Python and run PySpark.

Step 6: Install Python and pip (Optional)

If you plan to use PySpark, ensure Python and pip are installed:

sudo apt install python3 python3-pip -y

Verify the installation:

python3 --version
pip3 --version

Install PySpark:

pip3 install pyspark
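
To confirm PySpark is importable from Python, you can print its version with a one-liner (note that the pip package bundles its own copy of the Spark libraries, separate from /opt/spark):

python3 -c "import pyspark; print(pyspark.__version__)"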

Step 7: Verify the Installation

Run the Spark shell to confirm the installation:

For Scala:

spark-shell


For PySpark:

pyspark

If the Spark shell or PySpark starts successfully, your installation is complete.
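
If you would rather not type into the interactive shell, you can pipe a one-line job in as a quick smoke test; spark-shell reads commands from standard input, runs them, and exits. This is a convenience check, not an official workflow:

echo 'println(spark.range(100).count())' | spark-shell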

Start the standalone master server:

$ start-master.sh

Start the Apache Spark worker process and register it with the master. The master URL (spark://<hostname>:7077) is printed in the master's log and shown at the top of the master web UI. Note that start-slave.sh has been renamed to start-worker.sh in recent Spark releases:

$ start-worker.sh spark://<hostname>:7077
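
With both daemons started, you can confirm the two JVM processes are running with jps, which ships with the JDK installed in Step 2; the output should include Master and Worker entries:

$ jps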

Spark Web UI

Browse the Spark web UI to see the worker nodes, running applications, and cluster resources:

http://localhost:8080 (standalone master UI)
http://localhost:4040 (application UI, available while a shell or job is running)
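
In a terminal-only session you can also check that the master UI is responding without a browser; any HTML output from port 8080 means the master is up:

curl -s http://localhost:8080 | head -n 5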

Step 8: Run a Sample Application

Create a test script (e.g., test.py) to validate the Spark setup:

from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.appName("TestSpark").getOrCreate()

# Build a small test DataFrame and print it
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()

# Stop the session to release resources
spark.stop()

Run the script:

python3 test.py

You should see the test data output displayed in the terminal.
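
The same script can also be launched through Spark's own submission tool, which is how you would typically run an application against a cluster:

spark-submit test.py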


Conclusion

You have successfully installed Apache Spark on Ubuntu in a UTM virtual machine on your Mac, and you can now start using Spark for your big data projects. If you encounter any issues, consult the Apache Spark documentation, or leave a comment below and I will try to resolve it as soon as possible.
