{"id":100118,"date":"2025-05-15T10:00:00","date_gmt":"2025-05-15T17:00:00","guid":{"rendered":"https:\/\/developer.nvidia.com\/blog\/?p=100118"},"modified":"2025-05-29T12:04:59","modified_gmt":"2025-05-29T19:04:59","slug":"predicting-performance-on-apache-spark-with-gpus","status":"publish","type":"post","link":"https:\/\/developer.nvidia.com\/blog\/predicting-performance-on-apache-spark-with-gpus\/","title":{"rendered":"Predicting Performance on Apache Spark with GPUs"},"content":{"rendered":"\n<p>The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform for scale-out analytics, handling massive datasets for ETL, machine learning, and deep learning workloads. While traditionally CPU-based, the advent of GPU acceleration offers a compelling promise: significant speedups for data processing tasks.<\/p>\n\n\n\n<p>However, migrating Spark workloads from CPUs to GPUs isn&#8217;t a straightforward endeavor.\u00a0GPU acceleration, while powerful for certain operations, doesn&#8217;t necessarily improve performance in every scenario. Factors like small datasets, large amounts of data movement, and using user-defined functions (UDFs) can sometimes negatively impact GPU performance. Conversely, workloads involving high-cardinality data, such as joins, aggregates, sort, window operations, and transcoding tasks (like encoding\/compressing Apache Parquet or Apache ORC or parsing CSV) are typically positive indicators for GPU acceleration.<\/p>\n\n\n\n<p>This presents a crucial problem for organizations looking to leverage GPUs: How do you know if your specific Spark workload would truly benefit from GPU acceleration before investing time and resources in migration?\u00a0The specific environment that a Spark workload is running in on CPU vs. GPU can vary greatly. 
In addition, the network setup, disk bandwidth, and even the GPU type can play a factor in performance on GPU, and these variables can be hard to capture from Spark logs.\u00a0\u00a0<\/p>\n\n\n\n<h2 id=\"the_qualification_tool\"  class=\"wp-block-heading\">The Qualification Tool<a href=\"#the_qualification_tool\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>The proposed solution to this problem is the Spark RAPIDS Qualification Tool. This tool is designed to analyze your existing CPU-based Spark applications and predict which ones are good candidates for migration to a GPU cluster. It aims to project the performance of the Spark application on GPUs with a machine learning estimation model trained on industry benchmarks and historical results from many real-world examples. The tool is available as a command-line interface via a <a href=\"https:\/\/pypi.org\/project\/spark-rapids-user-tools\/\">pip package<\/a> and can be used in various environments, including cloud service providers (CSPs) like AWS EMR, Google Dataproc, Databricks (AWS\/Azure), as well as on-premise environments.\u00a0There are quick-start notebooks available specifically for the <a href=\"https:\/\/github.com\/NVIDIA\/spark-rapids-examples\/blob\/main\/tools\/emr\/%5BRAPIDS%20Accelerator%20for%20Apache%20Spark%5D%20Qualification%20Tool%20Notebook%20Template.ipynb\">AWS EMR<\/a> and <a href=\"https:\/\/github.com\/NVIDIA\/spark-rapids-examples\/blob\/main\/tools\/databricks\/%5BRAPIDS%20Accelerator%20for%20Apache%20Spark%5D%20Qualification%20Tool%20Notebook%20Template.ipynb\">Databricks<\/a> environments.<\/p>\n\n\n\n<p>The tool works by taking the <a href=\"https:\/\/spark.apache.org\/docs\/latest\/monitoring.html\">Spark event logs<\/a> generated from your CPU-based Spark applications as its primary input. These event logs contain valuable information about the application, its executors, and the expressions used, along with relevant operating metrics. 
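<\/p>\n\n\n\n<p>For illustration, a typical install-and-run sequence looks like the following. The event log path and platform value are placeholders; substitute the ones for your environment:<\/p>\n\n\n\n

```shell
# Install the qualification CLI from the pip package, then point it
# at a directory of CPU event logs. Path and platform are placeholders.
pip install spark-rapids-user-tools
spark_rapids qualification \
  --eventlogs /path/to/cpu_eventlogs \
  --platform onprem
```

\n\n\n\n<p>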
The tool supports event logs from both Spark 2.x and Spark 3.x jobs.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efb759a3adc&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1584\" height=\"422\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1.png\" alt=\"The qualification tool processes event logs to generate application recommendations for migration to GPU along with Spark config recommendations.\" class=\"wp-image-100124\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1.png 1584w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-300x80.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-625x167.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-179x48.png 179w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-768x205.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-1536x409.png 1536w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-645x172.png 645w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-500x133.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-160x43.png 160w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-362x96.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-413x110.png 413w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-1024x273.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-1.-High-level-flow-for-the-qualification-tool-1-960x256.png 960w\" sizes=\"auto, (max-width: 1584px) 100vw, 1584px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 1. 
High-level flow for the qualification tool<\/em><\/figcaption><\/figure><\/div>\n\n\n<p>As output, the qualification tool provides several key pieces of information to aid in the migration process:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A qualified workload list indicating which applications are candidates for GPU migration.<\/li>\n\n\n\n<li>Recommended Spark configurations for GPU, which are calculated based on cluster information (like memory and cores) and data from the Spark event logs that could impact the performance of Spark applications on GPU.<\/li>\n\n\n\n<li>For CSP environments, a recommended GPU cluster shape, including instance type and count, along with GPU information.<\/li>\n<\/ul>\n\n\n\n<p>The output provides a starting point, but note that the tool does not guarantee that the recommended applications will be accelerated the most. The tool provides a predictive estimate; the methodology is explained in the next section.\u00a0The tool reports its findings by examining the amount of time spent in tasks of SQL DataFrame operations.<\/p>\n\n\n\n<p>You can run the tool from the command line using a CLI command: <code>spark_rapids qualification --eventlogs &lt;file-path> --platform &lt;platform><\/code>.<\/p>\n\n\n\n<h2 id=\"how_qualification_works\"  class=\"wp-block-heading\">How Qualification Works<a href=\"#how_qualification_works\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>So how does the tool work internally to provide these predictions and recommendations? The core of the qualification tool lies in its ability to analyze the input event logs and extract various metrics, which are then used as features. The tool parses the raw event log and generates intermediate CSV files containing raw features for each SQL execution ID (sqlID). 
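<\/p>\n\n\n\n<p>As a minimal, hypothetical sketch (not the tool&#8217;s actual implementation), the snippet below pulls one such raw feature, disk bytes spilled, out of a Spark event log, which is a file of newline-delimited JSON events:<\/p>\n\n\n\n

```python
import json

# Spark event logs are newline-delimited JSON, one event per line.
# A tiny, hypothetical two-event sample; real logs have many event types.
SAMPLE_LOG = """\
{"Event": "SparkListenerTaskEnd", "Task Metrics": {"Disk Bytes Spilled": 1048576}}
{"Event": "SparkListenerTaskEnd", "Task Metrics": {"Disk Bytes Spilled": 524288}}
"""

def total_disk_bytes_spilled(log_text):
    """Sum 'Disk Bytes Spilled' over all task-end events in the log."""
    total = 0
    for line in log_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("Event") == "SparkListenerTaskEnd":
            total += event.get("Task Metrics", {}).get("Disk Bytes Spilled", 0)
    return total

print(total_disk_bytes_spilled(SAMPLE_LOG))  # 1572864
```

\n\n\n\n<p>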
These features are derived from information within the event logs, such as disk bytes spilled, maximum heap memory used, estimated scan bandwidth, details about individual operators in the query plan, and data size.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efb759a4ab6&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1589\" height=\"621\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1.png\" alt=\"The estimation model for the qualification tool extracts features from the event log such as bytes spilled, scan bandwidth, data size, and more.\" class=\"wp-image-100123\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1.png 1589w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-300x117.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-625x244.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-179x70.png 179w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-768x300.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-1536x600.png 1536w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-645x252.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-500x195.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-160x63.png 160w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-362x141.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-281x110.png 281w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-1024x400.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-2.-The-qualification-tool-estimation-model-1-960x375.png 960w\" sizes=\"auto, (max-width: 1589px) 100vw, 1589px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 2. 
The qualification tool estimation model<\/em><\/figcaption><\/figure><\/div>\n\n\n<p>These extracted features serve as input for a Machine Learning estimation model.\u00a0 This model has been trained on historical data from matching CPU and GPU runs of various Spark applications. By leveraging this training data, the model learns to predict the speedup an application might achieve when run on a GPU.\u00a0 The tool uses data from these historical benchmarks to estimate speed-up at the individual operator level. This estimation is then combined with other relevant heuristics to determine the overall qualification of a workload for GPU migration.\u00a0The tool ships with pre-trained estimation models tailored for various environments, primarily trained on <a href=\"https:\/\/github.com\/NVIDIA\/spark-rapids-benchmarks\/tree\/dev\/nds\">NDS benchmark<\/a> workloads.<\/p>\n\n\n\n<h2 id=\"building_a_custom_qualification_model\"  class=\"wp-block-heading\">Building A Custom Qualification Model<a href=\"#building_a_custom_qualification_model\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>While the pre-trained models work well for many scenarios, you might encounter situations where the out-of-the-box predictions aren&#8217;t accurate for your specific needs. This is particularly true if your workloads don&#8217;t resemble the NDS benchmarks the models were primarily trained on, if your Spark environment (hardware, network, etc.) is significantly different from the pre-trained environments, or if you have already benchmarked numerous workloads on both CPU and GPU in your environment and observe discrepancies with the predictions.<\/p>\n\n\n\n<p>In these cases, the Spark RAPIDS Qualification Tool offers the capability to build a custom qualification estimation model. 
This allows you to train an estimation model specifically on your own data and environment, potentially leading to more accurate predictions.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efb759a5d92&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1548\" height=\"720\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model.png\" alt=\"The process to build a custom estimation model starts with running CPU and GPU workloads to collect event logs, then preprocessing the logs, and then training the model before feature importance and evaluation.\" class=\"wp-image-100125\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model.png 1548w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-300x140.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-625x291.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-179x83.png 179w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-768x357.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-1536x714.png 1536w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-645x300.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-500x233.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-160x74.png 160w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-362x168.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-237x110.png 237w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-1024x476.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-3.-Process-to-build-a-custom-estimation-model-960x447.png 960w\" sizes=\"auto, (max-width: 1548px) 100vw, 1548px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 3. 
Process to build a custom estimation model<\/em><\/figcaption><\/figure><\/div>\n\n\n<h3 id=\"run_cpu_and_gpu_workloads_and_collect_event_logs\"  class=\"wp-block-heading\">Run CPU and GPU Workloads and Collect Event Logs<a href=\"#run_cpu_and_gpu_workloads_and_collect_event_logs\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>To train a model that accurately predicts GPU performance in your environment, you need training data that includes both CPU and GPU runs for the same workloads. The process involves running the target Spark applications on both CPU and GPU clusters and collecting the resulting Spark event logs. It&#8217;s crucial to collect CPU and GPU event log pairs for each workload. CPU event logs are used to derive the features for the model, while GPU event logs are used to compute the actual speedup achieved, which serves as the label for training.<\/p>\n\n\n\n<h3 id=\"preprocess_the_event_logs\"  class=\"wp-block-heading\">Preprocess the Event Logs<a href=\"#preprocess_the_event_logs\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>Before training, the collected event logs need to be processed to extract the features required by the model. The preprocessing step uses the Profiler tool to parse the raw event logs and generate CSV files containing &#8220;raw features&#8221; per sqlID. This process can take some time depending on the volume and size of the event logs. To optimize subsequent runs, the <code>$QUALX_CACHE_DIR<\/code> environment variable can be set to cache these intermediate Profiler CSV files. 
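<\/p>\n\n\n\n<p>The training label described above is simply the ratio of matched CPU and GPU runtimes for each sqlID; a sketch with invented durations:<\/p>\n\n\n\n

```python
# Illustrative only: deriving a per-sqlID speedup label from matched
# CPU and GPU runs of the same workload (durations in seconds, made up).
cpu_sql_durations = {1: 120.0, 2: 45.0, 3: 300.0}   # sqlID -> CPU duration
gpu_sql_durations = {1: 30.0, 2: 50.0, 3: 60.0}     # sqlID -> GPU duration

# Label = CPU time / GPU time; values > 1 mean the GPU run was faster.
labels = {
    sql_id: cpu_sql_durations[sql_id] / gpu_sql_durations[sql_id]
    for sql_id in sorted(cpu_sql_durations.keys() & gpu_sql_durations.keys())
}
print(labels)  # {1: 4.0, 2: 0.9, 3: 5.0}
```

\n\n\n\n<p>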
The preprocessing step can be executed using the CLI command <code>qualx preprocess --dataset datasets<\/code>.<\/p>\n\n\n\n<h3 id=\"train_the_xgboost_model\"  class=\"wp-block-heading\">Train the XGBoost Model<a href=\"#train_the_xgboost_model\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>Once the features are extracted through preprocessing, you can train your custom XGBoost model. The training process can be initiated using the <code>spark_rapids train<\/code> CLI command. You need to provide the path to the directory containing your dataset JSON files, the path where you want to save the trained model, and an output folder for the generated CSV files. For example, you might run <code>spark_rapids train --dataset datasets --model custom_onprem.json --output_folder train_output<\/code>. The training process leverages <a href=\"https:\/\/optuna.org\/\">Optuna<\/a> for hyperparameter optimization, and you can configure the number of trials for the hyperparameter search. The model is trained at the SQL execution ID level (sqlID). As a rule of thumb, around 100 sqlIDs are recommended for an &#8220;initial&#8221; model, and around 1000 sqlIDs for a &#8220;good&#8221; model.<\/p>\n\n\n\n<h3 id=\"evaluate_feature_importance_and_model_performance\"  class=\"wp-block-heading\">Evaluate Feature Importance and Model Performance<a href=\"#evaluate_feature_importance_and_model_performance\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>After training, it&#8217;s beneficial to evaluate the importance of the features used by the model. While the estimation model has built-in feature importance metrics (gain, cover, frequency), Shapley (SHAP) values are also available; these provide a game-theoretic allocation of importance and are additive, summing to the final prediction. 
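<\/p>\n\n\n\n<p>That additivity property can be stated concretely: a SHAP explanation is a base (expected) value plus one contribution per feature, and these sum to the model&#8217;s prediction. A sketch with invented numbers:<\/p>\n\n\n\n

```python
# Hypothetical SHAP explanation for one sqlID's predicted speedup.
base_value = 2.0            # model's average prediction over the training data
shap_contributions = {      # invented per-feature contributions
    "scan_bandwidth": 1.0,
    "disk_bytes_spilled": -0.5,
    "sql_duration": 0.5,
}
# Additivity: the base value plus all contributions equals the prediction.
prediction = base_value + sum(shap_contributions.values())
print(prediction)  # 3.0
```

\n\n\n\n<p>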
Typical important features include durations, compute, and network I\/O.&nbsp;<\/p>\n\n\n\n<p>You also can evaluate the performance of your trained model by comparing the predicted speedups against the actual observed speedups from your training data. An ideal prediction would fall on the identity line (predicted speedup equals actual speedup). You can choose an evaluation metric suitable for your use case, such as Mean Absolute Percentage Error (MAPE), precision, or recall.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efb759a6eba&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1508\" height=\"1061\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model.png\" alt=\"The evaluation graph shows actual speedup vs predicted speedup for various Spark queries.  
That allows the custom model to be evaluated for true positives, false positives, true negatives, and false negatives based on evaluation criteria.\" class=\"wp-image-100126\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model.png 1508w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-300x211.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-625x440.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-163x115.png 163w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-768x540.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-645x454.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-426x300.png 426w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-128x90.png 128w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-362x255.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-156x110.png 156w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Figure-4.-Example-evaluation-graph-of-a-custom-model-1024x720.png 1024w\" sizes=\"auto, (max-width: 1508px) 100vw, 1508px\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 4. Example evaluation graph of a custom model<\/em><\/figcaption><\/figure><\/div>\n\n\n<h3 id=\"use_the_custom_model_for_prediction\"  class=\"wp-block-heading\">Use the Custom Model for Prediction<a href=\"#use_the_custom_model_for_prediction\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h3>\n\n\n\n<p>Once you&#8217;re satisfied with the performance of your custom-trained model, you can use it with the qualification tool for predicting speedups on new, unseen Spark applications. When running the <code>spark_rapids prediction<\/code> command, simply supply the path to your trained model file (e.g., custom_onprem.json) using the <code>--custom_model_file<\/code> argument. The tool will then use your custom model instead of the default pre-trained model to analyze the event logs and provide speedup predictions and recommendations. 
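<\/p>\n\n\n\n<p>When comparing a custom model&#8217;s predictions against measured results, an evaluation metric such as MAPE (mentioned earlier) is straightforward to compute; the speedup values below are invented for illustration:<\/p>\n\n\n\n

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    assert len(actual) == len(predicted) and actual
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Invented per-query speedups: measured (actual) vs. model-predicted.
actual = [4.0, 2.0, 1.0, 5.0]
predicted = [3.0, 2.5, 1.0, 4.0]
print(round(mape(actual, predicted), 2))  # 17.5
```

\n\n\n\n<p>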
The output will include per-application and per-SQL speedup predictions, feature values used for prediction, and feature importance values.<\/p>\n\n\n\n<p>Building a custom qualification model empowers you to tailor the prediction process to your specific environment and workloads, increasing the accuracy of the recommendations and ultimately helping you more effectively leverage GPUs for your Spark applications.<\/p>\n\n\n\n<h2 id=\"getting_started_with_apache_spark_on_gpus\"  class=\"wp-block-heading\">Getting started with Apache Spark on GPUs<a href=\"#getting_started_with_apache_spark_on_gpus\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>Enterprises can take advantage of the <a href=\"https:\/\/www.nvidia.com\/en-us\/deep-learning-ai\/solutions\/data-science\/apache-spark-3\/\">RAPIDS Accelerator for Apache Spark<\/a> to seamlessly migrate Apache Spark workloads to NVIDIA GPUs. RAPIDS Accelerator for Apache Spark leverages GPUs to accelerate processing by combining the power of the RAPIDS cuDF library and the scale of the Spark distributed computing framework. Run existing Apache Spark applications on GPUs with no code changes by launching Spark with the RAPIDS Accelerator for Apache Spark plugin JAR file.&nbsp;<\/p>\n\n\n\n<p>The qualification tool is also part of <a href=\"https:\/\/blogs.nvidia.com\/blog\/project-aether-accelerates-apache-spark\/\">Project Aether<\/a>, which is a collection of tools and processes that automatically qualify, test, configure, and optimize Spark workloads for GPU acceleration at scale. 
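<\/p>\n\n\n\n<p>As a sketch of the no-code-change launch described above, a <code>spark-submit<\/code> invocation might look like the following; the JAR name and version are placeholders, so consult the RAPIDS Accelerator documentation for your environment:<\/p>\n\n\n\n

```shell
# Hypothetical launch of an existing PySpark app with the RAPIDS Accelerator.
# The plugin JAR filename/version is a placeholder.
spark-submit \
  --jars rapids-4-spark_2.12-<version>.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  your_spark_app.py
```

\n\n\n\n<p>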
Organizations that are interested in using Project Aether to assist with their Spark migrations can <a href=\"https:\/\/www.nvidia.com\/en-us\/deep-learning-ai\/solutions\/data-science\/apache-spark-3\/#project-aether\">apply to be considered<\/a> for this free service.<\/p>\n\n\n\n<p>For additional information about the qualification tool, please check out the <a href=\"https:\/\/docs.nvidia.com\/spark-rapids\/user-guide\/latest\/qualification\/overview.html\">Spark RAPIDS user guide<\/a>.&nbsp;For a more detailed technical view of this topic, you can watch the <a href=\"https:\/\/www.nvidia.com\/en-us\/on-demand\/session\/gtc25-dlit71528\/\">GTC 2025 on-demand session<\/a> focused on Spark RAPIDS tools.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform for scale-out analytics, handling massive datasets for ETL, machine learning, and deep learning workloads. 
While traditionally CPU-based, the advent of GPU acceleration offers a compelling promise: significant speedups for data processing &hellip; <a href=\"https:\/\/developer.nvidia.com\/blog\/predicting-performance-on-apache-spark-with-gpus\/\">Continued<\/a><\/p>\n","protected":false},"author":2565,"featured_media":100119,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"publish_to_discourse":"","publish_post_category":"318","wpdc_auto_publish_overridden":"1","wpdc_topic_tags":"","wpdc_pin_topic":"","wpdc_pin_until":"","discourse_post_id":"1620559","discourse_permalink":"https:\/\/forums.developer.nvidia.com\/t\/predicting-performance-on-apache-spark-with-gpus\/333334","wpdc_publishing_response":"success","wpdc_publishing_error":"","nv_subtitle":"","ai_post_summary":"<ul><li>The Spark RAPIDS Qualification Tool analyzes existing CPU-based Spark applications and predicts which ones are good candidates for migration to a GPU cluster using a machine learning estimation model trained on industry benchmarks.<\/li><li>The tool takes Spark event logs as input and provides a qualified workload list, recommended Spark configurations for GPU, and a recommended GPU cluster shape for cloud service providers.<\/li><li>To improve prediction accuracy for specific environments, users can build a custom qualification estimation model by running CPU and GPU workloads, collecting event logs, preprocessing the logs, training an XGBoost model, and evaluating feature importance and model 
performance.<\/li><\/ul>","footnotes":"","_links_to":"","_links_to_target":""},"categories":[696,4146],"tags":[278,3273,453],"coauthors":[4322,3393,4573],"class_list":["post-100118","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-development","tag-apache-spark","tag-accelerated-data-analytics","tag-featured","tagify_workload-data-center-cloud","tagify_workload-data-science"],"acf":{"post_industry":["General"],"post_products":["RAPIDS"],"post_learning_levels":["Intermediate Technical"],"post_content_types":["Deep dive"],"post_collections":""},"jetpack_featured_media_url":"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2025\/05\/Predicting-Performance-on-Apache-Spark-with-GPUs.png","primary_category":{"category":"Data Science","link":"https:\/\/developer.nvidia.com\/blog\/category\/data-science\/","id":696,"data_source":""},"nv_translations":[{"language":"zh_CN","title":"\u4f7f\u7528 GPU \u9884\u6d4b Apache Spark 
\u7684\u6027\u80fd","post_id":13975}],"jetpack_shortlink":"https:\/\/wp.me\/pcCQAL-q2O","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/100118","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/users\/2565"}],"replies":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/comments?post=100118"}],"version-history":[{"count":1,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/100118\/revisions"}],"predecessor-version":[{"id":100127,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/100118\/revisions\/100127"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media\/100119"}],"wp:attachment":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media?parent=100118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/categories?post=100118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/tags?post=100118"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/coauthors?post=100118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}