[EPIC]: GPU-agnostic Reproducible floating-point reductions #1558

@jrhemstad

Is this a duplicate?

Area

CUB

Is your feature request related to a problem? Please describe.

I would like reproducible reductions for floating-point values.
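For context, a minimal sketch of why the order of operations matters: floating-point addition is not associative, so summing the same values in a different order can give a different result. The helper name `sum_in_order` is made up for illustration and is not part of CUB.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a CUB API): sums the values strictly left to right,
// so the result depends on the order in which the values are stored.
inline float sum_in_order(const std::vector<float>& values) {
    float acc = 0.0f;
    for (float x : values) {
        acc += x;  // each addition rounds to the nearest representable float
    }
    return acc;
}

// The same three values in two different orders:
//   {1e20f, -1e20f, 1.0f}  -> the large terms cancel first, then 1.0f is added
//   {1e20f, 1.0f, -1e20f}  -> 1.0f is absorbed by 1e20f before the cancellation
```

A parallel reduction generally does not fix the order in which partial sums are combined, which is where the run-to-run differences come from.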

Describe the solution you'd like

@maddyscientist has a proof-of-concept implementation here: https://github.com/maddyscientist/reproducible_floating_sums/tree/feature/cuda

MVP:

  • Extend DeviceReduce::Sum with a requirements API for opt-in reproducibility
  • Only supports input iterators whose value_type is float or double
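To make the opt-in idea concrete, here is a rough host-side sketch. Everything in it is an assumption for illustration: the tag type `reproducible_t` and the `sum` overloads are hypothetical and are not the CUB requirements API, and the technique (rounding every input onto a common fixed-point grid and accumulating in an integer) is a simpler stand-in, not the binned algorithm from the proof-of-concept. It only shows that an order-independent sum is possible once every addition is exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct reproducible_t {};  // hypothetical opt-in tag, not a CUB type

// Default path: plain left-to-right sum, result depends on element order.
inline double sum(const std::vector<double>& v) {
    double acc = 0.0;
    for (double x : v) acc += x;
    return acc;
}

// Opt-in path (illustrative stand-in technique): scale every element onto a
// shared power-of-two grid, round to an integer, and add the integers.
// Integer addition is associative, so any element order gives the same bits.
inline double sum(const std::vector<double>& v, reproducible_t) {
    double max_abs = 0.0;
    for (double x : v) {
        assert(std::isfinite(x));  // sketch assumes finite inputs
        max_abs = std::fmax(max_abs, std::fabs(x));
    }
    if (max_abs == 0.0) return 0.0;

    int exp_max = 0;  // max_abs < 2^exp_max
    int exp_n = 0;    // v.size() < 2^exp_n
    std::frexp(max_abs, &exp_max);
    std::frexp(static_cast<double>(v.size()), &exp_n);

    // With this shift, n * max_abs * 2^shift < 2^62, so the int64 cannot overflow.
    const int shift = 62 - exp_max - exp_n;

    std::int64_t acc = 0;
    for (double x : v) {
        acc += std::llrint(std::ldexp(x, shift));  // per-element rounding only
    }
    return std::ldexp(static_cast<double>(acc), -shift);
}
```

The trade-off in this sketch is an extra pass to find the maximum and lost low-order bits on small elements, which a real implementation would want to avoid.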

Future work:

  • Support half/bfloat by upconverting to float/double
  • Support custom types containing floating-point values by using something similar to the decomposer approach we used for radix sort, decomposing a custom type into something like tuple<float, double, ...>
  • Extend BlockReduce with a reproducible algorithm
  • Extend to Scan
    • For scan, we'd ideally like to fit the aggregate state in 128 bits. That is tricky: with k=2 (from @maddyscientist's algorithm) the aggregate alone already needs 128 bits. We could potentially reserve 2 of those bits in the aggregator type for the decoupled look-back signaling (see [FEA]: Intrusive Decoupled Look-Back #220)
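As a rough illustration of the last point, the sketch below steals the two low bits of a 128-bit word for a look-back status flag. All names (`tile_status`, `packed_aggregate`, `pack`, and the accessors) are hypothetical, and the payload is treated as opaque bits: how the k=2 accumulator would actually give up 2 bits is not specified here, and the 128-bit atomic load/store that a real look-back would need is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile states for look-back signaling; names are illustrative only.
enum class tile_status : std::uint64_t { invalid = 0, partial = 1, inclusive = 2 };

// One 128-bit word: 126 payload bits plus 2 status bits.
struct packed_aggregate {
    std::uint64_t hi;  // upper 64 payload bits
    std::uint64_t lo;  // bits [63:2]: lower 62 payload bits, bits [1:0]: status
};
static_assert(sizeof(packed_aggregate) == 16, "state must fit in 128 bits");

// Packs an opaque payload and a status flag into a single 128-bit word.
inline packed_aggregate pack(std::uint64_t payload_hi,
                             std::uint64_t payload_lo62,
                             tile_status status) {
    assert((payload_lo62 >> 62) == 0);  // caller must leave the top 2 bits clear
    return {payload_hi, (payload_lo62 << 2) | static_cast<std::uint64_t>(status)};
}

// Reads the status flag without touching the payload.
inline tile_status status_of(const packed_aggregate& p) {
    return static_cast<tile_status>(p.lo & 0x3u);
}

// Recovers the lower 62 payload bits.
inline std::uint64_t payload_lo62_of(const packed_aggregate& p) {
    return p.lo >> 2;
}
```

Keeping the flag and the aggregate in one word means a consumer that observes the status also observes the matching payload, assuming the word is read in one access.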

Describe alternatives you've considered

No response

Additional context

No response

Status: Done
