Data Engineering On call 4

Amit Singh Rathore
Published in Dev Genius · 3 min read · Sep 9, 2023

Another day, new learnings.

Other parts of the series:

DE On call 1 | DE On call 2 | DE On call 3 | DE On call 4 | DE On call 5

Issue 1

After an update of a Spark job, the following exception occurred:

java.lang.ClassCastException: cannot assign instance of scala.None$ to field 
org.apache.spark.scheduler.Task.appAttemptId of type scala.Option in instance of
org.apache.spark.scheduler.ResultTask
.
.
.
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:466)

This is caused by a Scala version mismatch between Spark and the application. I looked at the Spark environment (the Environment tab in the UI) and found the Scala version used on the classpath.

Java Home — /usr/mware/jdk8u352/jre
Java Version — 1.8.0_352
Scala Version — version 2.12.15

Next, I checked the Scala version bundled in the application jar. A transitive dependency was pulling a different Scala version onto the classpath. After removing it (using an exclusion in the pom file), the issue was fixed.
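
To double-check which Scala runtime actually ends up on the classpath, a small probe job can print the version seen by the driver and by the executors. This is a minimal sketch; the object and app name are only illustrative:

import org.apache.spark.sql.SparkSession
import scala.util.Properties

// Minimal probe: print the Scala version loaded by the driver and by each executor.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scala-version-check").getOrCreate()
    println(s"Driver Scala: ${Properties.versionString}")

    // Run a tiny job so the executors report the Scala version on their classpath too.
    val executorVersions = spark.sparkContext
      .parallelize(1 to 4, 4)
      .map(_ => scala.util.Properties.versionString)
      .distinct()
      .collect()
    println(s"Executor Scala: ${executorVersions.mkString(", ")}")
    spark.stop()
  }
}

If the reported versions differ from each other, or from the 2.12.15 shown in the Environment tab, the application jar is shipping its own Scala.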

Issue 2

A restart of the Spark History Server (spark2) gave the following error.

ERROR Utils: Uncaught exception in thread 
java.util.NoSuchElementException
at org.apache.spark.util.kvstore.InMemoryStore.read(InMemoryStore.java:85)
at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkAndCleanLog$3(FsHistoryProvider.scala:927)

The service check for the SHS failed because it does a regex match on the response body. This was an intermittent issue, and it resolved itself once the SHS had finished parsing all the event logs.
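
For context, the service check is essentially an HTTP probe that regex-matches the History Server's response body. A minimal sketch of that kind of check (the host, port, and pattern below are assumptions, not the actual check) might look like:

import scala.io.Source

// Hypothetical health check: fetch the SHS UI and look for an expected marker in the body.
object ShsHealthCheck {
  def main(args: Array[String]): Unit = {
    val url = "http://shs-host:18080/"    // assumed host; 18080 is the default SHS port
    val marker = "History Server".r       // assumed regex the check looks for
    val src = Source.fromURL(url)
    val body = try src.mkString finally src.close()
    if (marker.findFirstIn(body).isDefined) println("SHS check passed")
    else {
      println("SHS check failed: marker not found in response body")
      sys.exit(1)
    }
  }
}

While the SHS is still replaying event logs, the UI can return a page that does not yet match the expected pattern, which is consistent with the check passing again once parsing finished.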

Issue 3

For a job, we got the following error:

java.lang.IllegalArgumentException: Cannot grow BufferHolder by size 95969
because the size after growing exceeds size limitation 2147483632

As we already know, BufferHolder (and hence a partition's row buffer) has a maximum size of 2147483632 bytes (just under 2 GB). If a partition grows beyond this and needs to be shuffled or buffered, we get the above error. I asked the user to repartition the data on two keys instead of one, which solved the problem.
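
For illustration, repartitioning on two columns instead of one spreads the oversized partitions out. The column names and paths below are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("repartition-example").getOrCreate()
val df = spark.read.parquet("/data/input")   // assumed input path

// Before: partitioning on a single skewed key produced partitions near the ~2 GB buffer limit.
// val out = df.repartition(col("customer_id"))

// After: adding a second key splits the hot partitions into smaller ones.
val out = df.repartition(col("customer_id"), col("event_date"))
out.write.parquet("/data/output")            // assumed output path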

Issue 4

Another job failed with the following error:

java.lang.StackOverflowError at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$5466/589672638.get$Lambda(Unknown Source)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:777)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:427)

After looking at the physical plan of the job, we found that it was very large, with many repeated sub-trees. The user code had many withColumn transformations, and the deeply nested plan was overflowing the JVM stack.

I asked the user to increase the driver's stack size. The typical default is 1024 KB; it can be increased to 4 MB by setting spark.driver.extraJavaOptions to -Xss4M.
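
To illustrate how such a plan arises (the column names and paths are hypothetical), each withColumn wraps the plan in another projection node, so a loop like this builds a very deep tree that Catalyst then traverses recursively:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("deep-plan-example").getOrCreate()
val df = spark.read.parquet("/data/input")   // assumed input path

// Hundreds of chained withColumn calls build a deeply nested plan,
// which is what exhausts the driver's JVM stack during plan traversal.
val wide = (1 to 500).foldLeft(df) { (acc, i) =>
  acc.withColumn(s"feature_$i", col("value") * i)
}

// The workaround above is applied at submit time, since driver JVM options
// cannot be changed once the driver is already running:
//   --conf "spark.driver.extraJavaOptions=-Xss4M"
wide.explain()   // printing the plan shows how large it has become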

Issue 5

A user's job failed with an OOM (Java heap space) error, visible under the Stages tab.

After enabling the additional metrics and sorting on failed tasks, I noticed that execution memory was under pressure and was causing the failure.

I asked the user to change the following parameters, and the job succeeded.

spark.memory.fraction 0.8
spark.memory.storageFraction 0.4

The above config was suggested because the executor was already sized at 28 GB and we don't allow executors larger than 32 GB, so giving execution a bigger share of the existing heap was the practical option.
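
As a sketch, the suggested settings can be applied when building the session (the 28 GB executor size and the two fractions come from the text; the defaults noted in comments are Spark's documented defaults):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("memory-tuned-job")
  .config("spark.executor.memory", "28g")          // executor size mentioned above
  .config("spark.memory.fraction", "0.8")          // default 0.6: larger unified memory region
  .config("spark.memory.storageFraction", "0.4")   // default 0.5: smaller share reserved for storage
  .getOrCreate()

Roughly, with a 28 GB heap the unified region is (heap minus ~300 MB reserved) times spark.memory.fraction, so raising the fraction from 0.6 to 0.8 grows it from about 16.6 GB to about 22 GB, and lowering storageFraction shifts more of that region toward execution.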

Thanks !!
