{"id":113316,"date":"2026-03-05T09:00:00","date_gmt":"2026-03-05T17:00:00","guid":{"rendered":"https:\/\/developer.nvidia.com\/blog\/?p=113316"},"modified":"2026-03-05T11:19:41","modified_gmt":"2026-03-05T19:19:41","slug":"controlling-floating-point-determinism-in-nvidia-cccl","status":"publish","type":"post","link":"https:\/\/developer.nvidia.com\/blog\/controlling-floating-point-determinism-in-nvidia-cccl\/","title":{"rendered":"Controlling Floating-Point Determinism in NVIDIA CCCL"},"content":{"rendered":"\n<p>A computation is considered deterministic if multiple runs with the same input data produce the same bitwise result. While this may seem like a simple property to guarantee, it can be difficult to achieve in practice, especially in parallel programming and floating-point arithmetic. This is because floating-point addition and multiplication aren&#8217;t strictly associative\u2014that is, (a + b) + c may not equal a + (b + c)\u2014due to rounding that occurs when intermediate results are stored with <a href=\"https:\/\/docs.nvidia.com\/cuda\/cuda-programming-guide\/05-appendices\/mathematical-functions.html#associativity\">finite precision<\/a>.<\/p>\n\n\n\n<p>With <a href=\"https:\/\/github.com\/NVIDIA\/cccl\">NVIDIA CUDA Core Compute Libraries<\/a> (CCCL) 3.1, <a href=\"https:\/\/nvidia.github.io\/cccl\/cub\/index.html\">CUB\u2014a low-level CUDA library for speed-of-light parallel device algorithms<\/a>\u2014added a new single-phase API that accepts an execution environment, enabling users to customize algorithm behavior. We can use this environment to configure the <code>reduce<\/code> algorithm\u2019s determinism property. 
This can only be done through the new single-phase API, since the two-phase API doesn&#8217;t accept an execution environment.<\/p>\n\n\n\n<p>The following code shows how to specify the determinism level in CUB (find the complete example online using <a href=\"https:\/\/godbolt.org\/z\/PdrxPrvev\">compiler explorer<\/a>).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\nauto input  = thrust::device_vector&lt;float&gt;{0.0f, 1.0f, 2.0f, 3.0f};\nauto output = thrust::device_vector&lt;float&gt;(1);\n\n\n\/\/ can be not_guaranteed, run_to_run (default), or gpu_to_gpu\nauto env = cuda::execution::require(cuda::execution::determinism::not_guaranteed);\n\n\nauto error = cub::DeviceReduce::Sum(input.begin(), output.begin(), input.size(), env);\nif (error != cudaSuccess)\n{\n  std::cerr &lt;&lt; &quot;cub::DeviceReduce::Sum failed with status: &quot; &lt;&lt; error &lt;&lt; std::endl;\n}\n\n\nassert(output&#x5B;0] == 6.0f);\n<\/pre><\/div>\n\n\n<p>We begin by specifying the input and output vectors. We then use <code>cuda::execution::require<\/code>() to construct a <code>cuda::std::execution::env<\/code> object, setting the determinism level to <code>not_guaranteed<\/code>.<\/p>\n\n\n\n<p>Three determinism levels are available for reduction:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>not_guaranteed<\/code><\/li>\n\n\n\n<li><code>run_to_run<\/code><\/li>\n\n\n\n<li><code>gpu_to_gpu<\/code><\/li>\n<\/ul>\n\n\n\n<h2 id=\"determinism_not_guaranteed\"  class=\"wp-block-heading\">Determinism not guaranteed<a href=\"#determinism_not_guaranteed\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>In floating-point reductions, the result can depend on the order in which elements are combined. If two runs apply the reduction operator in different orders, the final values may differ slightly. 
In many applications, these minor differences are acceptable. By relaxing the requirement for strict determinism, the reduction implementation can rearrange the operations in any order, which can improve runtime performance.<\/p>\n\n\n\n<p>In CUB, <code>not_guaranteed<\/code> relaxes the determinism level. This enables atomic operations\u2014whose unordered execution across threads results in a different order of operations between runs\u2014to compute both the block-level partial aggregates and the final reduction value. The entire reduction can also be performed in a single kernel launch, since the atomic operations combine the block-level partial aggregates into the result.<\/p>\n\n\n\n<p>The nondeterministic reduce variant is typically faster than the run-to-run deterministic version\u2014particularly for smaller input arrays, where performing the reduction in a single kernel reduces latency from multiple kernel launches, minimizes extra data movement, and avoids additional synchronization. The tradeoff is that repeated runs may yield slightly different results due to the lack of deterministic behavior.<\/p>\n\n\n\n<h2 id=\"run-to-run_determinism\"  class=\"wp-block-heading\">Run-to-run determinism<a href=\"#run-to-run_determinism\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>While nondeterministic reductions offer potential performance gains, CUB also provides a mode that guarantees consistent results across runs. By default, <code>cub::DeviceReduce<\/code> is run-to-run deterministic, which corresponds to setting the determinism level to <code>run_to_run<\/code> in the single-phase API. In this mode, multiple invocations with the same input, kernel launch configuration, and GPU will produce identical outputs.<\/p>\n\n\n\n<p>This determinism is achieved by structuring the reduction as a fixed, hierarchical tree rather than relying on atomics, whose update order can vary across runs. 
At each stage of the reduction, elements are first combined within individual threads. The intermediate results are then reduced across threads within a warp using shuffle instructions, followed by a block-wide reduction using shared memory. Finally, a second kernel aggregates the per-block results to produce the final output. Because this sequence is predetermined and independent of the relative timing of thread execution, the same inputs, kernel configuration, and GPU yield the same bitwise result.<\/p>\n\n\n\n<h2 id=\"gpu-to-gpu_determinism\"  class=\"wp-block-heading\">GPU-to-GPU determinism<a href=\"#gpu-to-gpu_determinism\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>For applications that require the highest level of reproducibility, CUB also provides GPU-to-GPU determinism, which guarantees identical results across multiple runs with the same input on different GPUs. This mode corresponds to setting the determinism level to <code>gpu_to_gpu<\/code>.<\/p>\n\n\n\n<p>To achieve this level of determinism, CUB uses a <a href=\"https:\/\/people.eecs.berkeley.edu\/~demmel\/ma221_Fall23\/J115_Efficient_Reproducible_Summation_TOMS_2020.pdf\">Reproducible Floating-point Accumulator (RFA)<\/a>, a solution based on the NVIDIA GTC 2024 session, <a href=\"https:\/\/www.nvidia.com\/en-us\/on-demand\/session\/gtc24-s62405\/\"><em>Restoring the Scientific Method to HPC: High Performance Reproducible Parallel Reductions<\/em><\/a><em>.<\/em> The RFA counters floating-point non-associativity\u2014which arises when adding numbers with different exponents\u2014by grouping all input values into a fixed number of exponent ranges (the default is three bins). 
This fixed, structured accumulation order ensures the final result is independent of GPU architecture.<\/p>\n\n\n\n<p>The accuracy of the final result depends on the number of bins: more bins provide greater accuracy, but also increase the number of intermediate summations, which can reduce performance. The current implementation uses three bins by default, which balances performance and accuracy. Notably, this configuration is not only strictly deterministic but also more numerically accurate, with tighter error bounds than the standard pairwise summation traditionally used in parallel reductions.<\/p>\n\n\n\n<h2 id=\"how_results_vary\"  class=\"wp-block-heading\">How results vary based on the determinism levels<a href=\"#how_results_vary\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>The three determinism levels differ in the amount of variation they produce across multiple runs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Not-guaranteed determinism<\/strong> produces slightly different summation values on each invocation.<\/li>\n\n\n\n<li><strong>Run-to-run determinism<\/strong> ensures the same value for every invocation on a single GPU, but the result may vary if a different GPU is used.<\/li>\n\n\n\n<li><strong>GPU-to-GPU determinism<\/strong> guarantees that the summation value is identical for every invocation, regardless of which GPU executes the reduction.<\/li>\n<\/ul>\n\n\n\n<p>This is shown in Figure 1, with the summation of an array for each determinism level\u2014represented by green, blue, and red circles\u2014plotted against the run number. 
A flat horizontal line shows that the reduction produces the same result.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efb7649e02e&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"789\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior.webp\" alt=\"Charts showing how the GPU-to-GPU and run-to-run algorithms produce identical results, but the Not Guaranteed algorithm results vary slightly.\" class=\"wp-image-113324\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior.webp 1999w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-179x71.png 179w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-300x118.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-768x303.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-625x247.png 625w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-1536x606.png 1536w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-645x255.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-500x197.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-160x63.png 160w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-362x143.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-279x110.png 279w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-1024x404.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Behavior-960x379.png 960w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 1. Summation value compared to run\u00a0<\/em><\/figcaption><\/figure><\/div>\n\n\n<h2 id=\"determinism_performance_comparison\"  class=\"wp-block-heading\">Determinism performance comparison<a href=\"#determinism_performance_comparison\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>The level of determinism selected affects the performance of <code>cub::DeviceReduce<\/code>. Not-guaranteed determinism, with its relaxed requirements, provides the highest performance. The default run-to-run determinism delivers good performance but is slightly slower than not-guaranteed determinism. 
GPU-to-GPU determinism, which enforces the strictest reproducibility across different GPUs, can significantly reduce performance, increasing execution time by 20% to 30% for large problem sizes.<\/p>\n\n\n\n<p>Figure 2 compares the performance of the different determinism requirements for <code>float32<\/code> and <code>float64<\/code> inputs on an NVIDIA H200 GPU (lower is better). The results clearly show how the choice of determinism level impacts execution time across data types.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69efb7649f91b&quot;}\" data-wp-interactive=\"core\/image\" class=\"aligncenter size-full wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1999\" height=\"940\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance.webp\" alt=\"Bar graph showing elapsed time compared to number of elements where not guaranteed is always the best performance, followed closely by run-to-run.\u00a0 GPU-to-GPU is significantly less performant than the other two\" class=\"wp-image-113325\" srcset=\"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance.webp 1999w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-179x84.png 179w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-300x141.png 300w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-768x361.png 768w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-625x294.png 625w, 
https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-1536x722.png 1536w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-645x303.png 645w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-500x235.png 500w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-160x75.png 160w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-362x170.png 362w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-234x110.png 234w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-1024x482.png 1024w, https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Determinism-Performance-960x451.png 960w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on-async--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><em>Figure 2. 
Elapsed time compared to the number of elements<\/em><\/figcaption><\/figure><\/div>\n\n\n<h2 id=\"conclusion\"  class=\"wp-block-heading\">Conclusion<a href=\"#conclusion\" class=\"heading-anchor-link\"><i class=\"fas fa-link\"><\/i><\/a><\/h2>\n\n\n\n<p>With the introduction of the single-phase API and explicit determinism levels, CUB provides an enhanced toolbox for controlling both the behavior and performance of reduction algorithms. Users can choose the level of determinism that best suits their needs: from the high-performance, flexible not-guaranteed mode, to the reliable run-to-run default, to the strictest GPU-to-GPU reproducibility.<\/p>\n\n\n\n<p>Determinism in CUB isn&#8217;t limited to reductions. We plan to extend these capabilities to additional algorithms so developers can control reproducibility across a wider range of parallel CUDA primitives. For updates and discussion, see the ongoing <a href=\"https:\/\/github.com\/NVIDIA\/cccl\/issues\/5550\">GitHub issue<\/a> on expanded determinism support to follow our roadmap and provide feedback on the algorithms you&#8217;d like to see deterministic versions of.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A computation is considered deterministic if multiple runs with the same input data produce the same bitwise result. While this may seem like a simple property to guarantee, it can be difficult to achieve in practice, especially in parallel programming and floating-point arithmetic. 
This is because floating-point addition and multiplication aren&#8217;t strictly associative\u2014that is, (a &hellip; <a href=\"https:\/\/developer.nvidia.com\/blog\/controlling-floating-point-determinism-in-nvidia-cccl\/\">Continued<\/a><\/p>\n","protected":false},"author":3179,"featured_media":113323,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"publish_to_discourse":"","publish_post_category":"318","wpdc_auto_publish_overridden":"1","wpdc_topic_tags":"","wpdc_pin_topic":"","wpdc_pin_until":"","discourse_post_id":"1769046","discourse_permalink":"https:\/\/forums.developer.nvidia.com\/t\/controlling-floating-point-determinism-in-nvidia-cccl\/362564","wpdc_publishing_response":"success","wpdc_publishing_error":"","nv_subtitle":"","ai_post_summary":"<ul><li>CUB in NVIDIA CUDA Core Compute Libraries 3.1 introduced a single-phase API that enables explicit control over reduction determinism, allowing users to choose between not_guaranteed, run_to_run, and gpu_to_gpu determinism levels via the execution environment.<\/li><li>Not_guaranteed determinism leverages atomic operations and single kernel launches to maximize reduction performance at the cost of reproducibility, while run_to_run determinism uses fixed hierarchical reduction trees to guarantee identical results across runs on the same GPU.<\/li><li>GPU-to-GPU determinism employs the Reproducible Floating-point Accumulator (RFA), as described in the NVIDIA GTC 2024 session, to ensure bitwise-identical results across different GPUs by grouping inputs into exponent bins, with a tradeoff of increased execution time (20-30% slower for large datasets) for strict reproducibility and tighter error 
bounds.<\/li><\/ul>","footnotes":"","_links_to":"","_links_to_target":""},"categories":[696,4146,503],"tags":[453],"coauthors":[5010,4225],"class_list":["post-113316","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-development","category-simulation-modeling-design","tag-featured","tagify_workload-data-science"],"acf":{"post_industry":["General"],"post_products":["CUDA"],"post_learning_levels":["Intermediate Technical"],"post_content_types":["Tutorial"],"post_collections":""},"jetpack_featured_media_url":"https:\/\/developer-blogs.nvidia.com\/wp-content\/uploads\/2026\/03\/Floating-Point-CUB.webp","primary_category":{"category":"Developer Tools &amp; Techniques","link":"https:\/\/developer.nvidia.com\/blog\/category\/development\/","id":4146,"data_source":""},"nv_translations":[{"language":"zh_CN","title":"\u63a7\u5236 NVIDIA CCCL \u4e2d\u7684\u6d6e\u70b9\u786e\u5b9a\u6027","post_id":16853},{"language":"ko_KR","title":"NVIDIA CCCL\uc744 \ud65c\uc6a9\ud55c \ubd80\ub3d9 \uc18c\uc218\uc810 \uacb0\uc815\ub860 \uc81c\uc5b4 
\uae30\ubc95","post_id":4854}],"jetpack_shortlink":"https:\/\/wp.me\/pcCQAL-ttG","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/113316","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/users\/3179"}],"replies":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/comments?post=113316"}],"version-history":[{"count":7,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/113316\/revisions"}],"predecessor-version":[{"id":113417,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/posts\/113316\/revisions\/113417"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media\/113323"}],"wp:attachment":[{"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/media?parent=113316"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/categories?post=113316"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/tags?post=113316"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/developer-blogs.nvidia.com\/wp-json\/wp\/v2\/coauthors?post=113316"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}