
Create Spark docker images for version 3.5.3
Closed, Resolved · Public

Description

We need to build a set of Docker images for Spark version 3.5.3.

We need them for two reasons:

  • They are used by the Spark operator
  • The YARN shuffler JAR is built as part of this process, and we subsequently need it when installing the new shuffler version

Event Timeline

Oh, I have hit a hurdle building Spark 3.5.3 with support for Hadoop version 2.10.2:

2024-11-15 16:12:56 [docker-pkg-build] INFO - Downloading from central: https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client-api/2.10.2/hadoop-client-api-2.10.2.jar (drivers.py:106)
2024-11-15 16:12:56 [docker-pkg-build] INFO - Downloading from central: https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client-runtime/2.10.2/hadoop-client-runtime-2.10.2.jar (drivers.py:106)
2024-11-15 16:12:56 [docker-pkg-build] INFO - [INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.5.3:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [02:25 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 31.608 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  3.632 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 15.260 s]
[INFO] Spark Project Common Utils ......................... SUCCESS [ 18.302 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.160 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  5.112 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  4.527 s]
[INFO] Spark Project Launcher ............................. FAILURE [  0.994 s]
[INFO] Spark Project Core ................................. SKIPPED

2024-11-15 16:12:56 [docker-pkg-build] INFO - [ERROR] Failed to execute goal on project spark-launcher_2.12: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.12:jar:3.5.3: The following artifacts could not be resolved: org.apache.hadoop:hadoop-client-api:jar:2.10.2 (absent), org.apache.hadoop:hadoop-client-runtime:jar:2.10.2 (absent): Could not find artifact org.apache.hadoop:hadoop-client-api:jar:2.10.2 in gcs-maven-central-mirror (https://maven-central.storage-download.googleapis.com/maven2/) -> [Help 1]

There are some guidelines about specifying the Hadoop version here: https://spark.apache.org/docs/3.5.3/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn, but Maven Central doesn't host any hadoop-client jars for versions below 3.0.0:
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client-api/
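
A quick way to confirm which hadoop-client-api versions Maven Central actually serves is to probe the repository URLs directly (a sketch; the URL pattern is taken from the download attempts in the log above):

# Probe Maven Central for hadoop-client-api jars of a few versions (sketch)
for v in 2.10.2 3.0.0 3.3.6; do
  url="https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client-api/${v}/hadoop-client-api-${v}.jar"
  printf '%s: HTTP %s\n' "$v" "$(curl -sI -o /dev/null -w '%{http_code}' "$url")"
done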

Thinking about what to do now.

The hadoop2 profile was removed from Spark 3.5.0 (https://issues.apache.org/jira/browse/SPARK-42452), but this change didn't make it into the release notes: https://spark.apache.org/releases/spark-release-3-5-0.html
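
One way to confirm which Hadoop-related profiles a given Spark source tree still defines is to ask Maven directly (a sketch; run from the Spark source checkout):

# List all Maven profiles defined in the Spark build and filter for Hadoop (sketch)
./build/mvn help:all-profiles 2>/dev/null | grep -i hadoop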

I don't think this should be a problem, as the client jars should be backwards compatible.

As an example, the Jupyter notebooks that I run with Spark 3.3.2 constantly remind me on init that they are using hadoop-client-api-3.3.2.jar.

So in practice it should work, even if Spark is compiled against Hadoop >= 3.0, because the APIs are stable and wire-compatible.

...
The requested conda environment has already been packed.
If you want it to be repacked, set force=True in conda_pack_kwargs.
Shipping conda-env.tgz to remote Spark executors.
SPARK_HOME: /home/xcollazo/.conda/envs/spark33/lib/python3.10/site-packages/pyspark
Using Hadoop client lib jars at hadoop-client-api-3.3.2.jar                                 <<<<<<<<<<<
hadoop-client-runtime-3.3.2.jar, provided by Spark.
PYSPARK_PYTHON=conda-env/bin/python3
:: loading settings :: file = /etc/maven/ivysettings.xml
:: loading settings :: url = jar:file:/srv/home/xcollazo/.conda/envs/spark33/lib/python3.10/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
....
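
For a quick sanity check on any given installation, you can compare the client jars that Spark ships against the Hadoop version the cluster is actually running (a sketch; assumes SPARK_HOME is set as in the log above and the hadoop CLI is on the PATH):

# Which hadoop-client jars does this Spark install ship? (sketch)
ls "${SPARK_HOME}/jars/" | grep hadoop-client
# Which Hadoop version is the cluster side actually running?
hadoop version | head -n 1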

OK, great. Thanks Xabriel. So I'll try again and target Hadoop version 3.3.6, which is what we will likely be using with Bigtop 3.3.0 in {T379385} (unless Bigtop gets another release out before we upgrade).
It's only the Spark shuffler for YARN that is critical to get working from this container, for now. Hopefully that will also be compatible.
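
For reference, getting the shuffle service from this build onto a YARN NodeManager typically amounts to dropping the jar on the NodeManager classpath and enabling the aux-service, roughly like this (a sketch based on the upstream Spark-on-YARN docs; the destination path here is a placeholder, not our actual puppetized location):

# Copy the shuffle service jar produced by this build onto the NodeManager classpath (sketch)
cp spark-3.5.3-yarn-shuffle.jar /usr/lib/hadoop-yarn/lib/
# yarn-site.xml on the NodeManager then needs the standard upstream settings:
#   yarn.nodemanager.aux-services = spark_shuffle
#   yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService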

Change #1092194 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/docker-images/production-images@master] Add spark version 3.5.3 to production images

https://gerrit.wikimedia.org/r/1092194

Change #1092194 merged by Btullis:

[operations/docker-images/production-images@master] Add spark version 3.5.3 to production images

https://gerrit.wikimedia.org/r/1092194

I've set off a build with:

root@build2001:/srv/images/production-images# /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/production-images/config.yaml build images/ --select '*spark3.5*'

Oh, the build failed on build2001 with the following error:

2024-11-18 13:26:01 [docker-pkg-build] INFO - ++ /usr/src/spark/build/mvn help:evaluate -Dexpression=project.version -Phive -Phive-thriftserver -Pyarn -Pkubernetes -Dhadoop.version=3.3.6
 (drivers.py:106)
2024-11-18 13:26:01 [docker-pkg-build] INFO - ++ grep -v WARNING
 (drivers.py:106)
2024-11-18 13:26:01 [docker-pkg-build] INFO - ++ tail -n 1
++ grep -v INFO
 (drivers.py:106)
2024-11-18 13:26:01 [docker-pkg-build] INFO - exec: curl --silent --show-error -L https://downloads.lightbend.com/scala/2.12.18/scala-2.12.18.tgz
 (drivers.py:106)
2024-11-18 13:26:02 [docker-pkg-build] INFO - exec: curl --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz?action=download
 (drivers.py:106)
2024-11-18 13:26:02 [docker-pkg-build] INFO - exec: curl --silent --show-error -L https://archive.apache.org/dist/maven/maven-3/3.9.6/binaries/apache-maven-3.9.6-bin.tar.gz.sha512
 (drivers.py:106)
2024-11-18 13:26:03 [docker-pkg-build] INFO - Verifying checksum from /usr/src/spark/build/apache-maven-3.9.6-bin.tar.gz.sha512
 (drivers.py:106)
2024-11-18 13:26:03 [docker-pkg-build] INFO - Using `mvn` from path: /usr/src/spark/build/apache-maven-3.9.6/bin/mvn
 (drivers.py:106)
2024-11-18 13:27:05 [docker-pkg-build] INFO - + VERSION='[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException'
 (drivers.py:106)
2024-11-18 13:27:06 [docker-pkg-build] ERROR - Build command failed with exit code 1: The command '/bin/sh -c ./dev/make-distribution.sh --name wmf --pip --r     -Phive     -Phive-thriftserver     -Pyarn     -Pkubernetes     -Dhadoop.version=3.3.6' returned a non-zero code: 1 (drivers.py:97)
2024-11-18 13:27:06 [docker-pkg-build] ERROR - Building image docker-registry.discovery.wmnet/spark3.5-build:3.5.3-1 failed - check your Dockerfile: Building image docker-registry.discovery.wmnet/spark3.5-build:3.5.3-1 failed (image.py:210)

But it works on my workstation, so it looks like build2001 cannot reach Maven Central directly and needs to go via the webproxy.

Change #1092754 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/docker-images/production-images@master] spark3.5/build: define a maven settings file to make it use webproxy to connect to central

https://gerrit.wikimedia.org/r/1092754
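
For context, pointing Maven at an HTTP proxy for Central is done with a proxies stanza in a settings file, along these lines (a sketch; the host, port and file location below are placeholders, not the values from the actual patch):

# Write a minimal Maven settings file that routes Central traffic through a proxy (sketch)
mkdir -p ~/.m2
cat > ~/.m2/settings.xml <<'EOF'
<settings>
  <proxies>
    <proxy>
      <id>webproxy</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>webproxy.example.wmnet</host>
      <port>8080</port>
    </proxy>
  </proxies>
</settings>
EOF
# Maven picks this up from ~/.m2/settings.xml by default; it can also be passed explicitly with `mvn -s <file>`.
# Depending on the Maven version, an equivalent <proxy> entry with <protocol>https</protocol> may also be needed.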

Change #1092754 merged by Brouberol:

[operations/docker-images/production-images@master] spark3.5/build: define a maven settings file to make it use webproxy to connect to central

https://gerrit.wikimedia.org/r/1092754

brouberol subscribed.

== Step 2: publishing ==
Successfully published image docker-registry.discovery.wmnet/spark3.5-build:3.5.3-1
Successfully published image docker-registry.discovery.wmnet/spark3.5:3.5.3-1
Successfully published image docker-registry.discovery.wmnet/spark3.5-operator:1.0.2-3.5.3-1
== Build done! ==
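
A possible sanity check on the published images (a sketch; the in-image paths, entrypoint and jar location are assumptions, not verified against the actual Dockerfiles):

# Pull the freshly published image and look for the YARN shuffle service jar (sketch)
docker pull docker-registry.discovery.wmnet/spark3.5:3.5.3-1
docker run --rm --entrypoint /bin/sh docker-registry.discovery.wmnet/spark3.5:3.5.3-1 \
  -c 'find / -name "spark-*-yarn-shuffle.jar" 2>/dev/null'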