[EPIC]: GPU-agnostic Reproducible floating-point reductions

### Is this a duplicate?

- [X] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/cccl/issues) for this request and that I agree to the [Code of Conduct](CODE_OF_CONDUCT.md)

### Area

CUB

### Is your feature request related to a problem? Please describe.

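I would like reproducible reductions for floating-point values: the same input should produce a bitwise-identical result regardless of the GPU it runs on. Floating-point addition is not associative, so the result of a parallel sum can change with the order in which partial sums are combined.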
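For context, the root cause is that floating-point addition is not associative, so a parallel reduction that combines partial sums in a different order can return a different result. A minimal host-side sketch of the effect follows; the function names are illustrative only and not part of CUB.

```cpp
#include <cassert>
#include <vector>

// Sum the values left to right, in exactly the order given.
inline float sum_in_order(const std::vector<float>& values)
{
  float total = 0.0f;
  for (float v : values)
  {
    total += v;
  }
  return total;
}

// The two large terms cancel first, so the small term survives.
inline float large_terms_first()
{
  return sum_in_order({1.0e8f, -1.0e8f, 1.0f});
}

// Near 1e8 adjacent floats are 8 apart, so 1e8f + 1.0f rounds back to 1e8f
// and the small term is lost before the large terms cancel.
inline float small_term_in_the_middle()
{
  return sum_in_order({1.0e8f, 1.0f, -1.0e8f});
}

// Same three values, two orderings, two different sums.
inline void check_order_dependence()
{
  assert(large_terms_first() == 1.0f);
  assert(small_term_in_the_middle() == 0.0f);
}
```

On a GPU the combination order depends on grid size, block size, and scheduling, which is why the same input can yield different sums on different devices.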

### Describe the solution you'd like

@maddyscientist has a proof-of-concept implementation here: https://github.com/maddyscientist/reproducible_floating_sums/tree/feature/cuda

MVP:
- Extend `DeviceReduce::Sum` with a requirements API for opt-in reproducibility
- Support only input iterators whose `value_type` is `float` or `double`
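To illustrate what "opt-in" could look like from the caller's side, here is a host-side sketch using tag dispatch. Everything in it is an assumption made for illustration: the tags `default_order` and `reproducible` and the `sum` overloads are hypothetical and are not CUB's requirements API. The reproducible path uses fixed-point integer accumulation as a simple stand-in, not the algorithm from the proof of concept, and it only works for inputs within the bounds stated in the comments.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical requirement tags; these names are NOT part of CUB.
struct default_order {};
struct reproducible {};

// Default path: plain left-to-right accumulation (result depends on order).
inline double sum(const std::vector<double>& v, default_order)
{
  double total = 0.0;
  for (double x : v)
  {
    total += x;
  }
  return total;
}

// Opt-in path: round each value to a fixed-point integer, then add integers.
// Integer addition is associative, so any ordering gives the same result.
// Assumes |x| < 2^20 and fewer than 2^11 elements, so the 64-bit
// accumulator cannot overflow with 32 fractional bits.
inline double sum(const std::vector<double>& v, reproducible)
{
  constexpr double scale = 4294967296.0; // 2^32
  std::int64_t acc = 0;
  for (double x : v)
  {
    acc += static_cast<std::int64_t>(std::llround(x * scale));
  }
  return static_cast<double>(acc) / scale;
}
```

A caller would write `sum(values, reproducible{})` to opt in and `sum(values, default_order{})` otherwise, so existing call sites keep their current behavior and cost.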

Future work:
- Support half/bfloat by upconverting to float/double
- Support custom types containing floating-point values, using something similar to the decomposer approach we used for radix sort to decompose a custom type into something like `tuple<float, double, ...>`
- Extend `BlockReduce` with the reproducible algorithm
- Extend to Scan
    - For scan, we'd ideally like to fit the aggregate state in 128 bits. That is tricky: for k=2 (from @maddyscientist's algorithm) the state alone already needs 128 bits, but we could potentially reserve 2 of the bits in the aggregator type for the decoupled lookback signaling (see https://github.com/NVIDIA/cccl/issues/220)
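The bit-reservation idea for scan can be sketched as follows. This is only an illustration of packing a 2-bit status flag next to a 126-bit payload in one 128-bit word; the type and function names (`tile_status`, `packed_aggregate`, `pack`, `status_of`, `payload_hi`) are hypothetical, and how the k=2 accumulator state would be squeezed into 126 bits is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lookback states; two bits can encode up to four values.
enum class tile_status : std::uint64_t { invalid = 0, partial = 1, inclusive = 2 };

// One 128-bit word: 126 payload bits plus 2 status bits at the top of `hi`.
struct packed_aggregate
{
  std::uint64_t lo;
  std::uint64_t hi;
};

constexpr int status_shift = 62;
constexpr std::uint64_t payload_hi_mask = (std::uint64_t{1} << status_shift) - 1;

// Combine a payload (64 low bits + 62 high bits) with a status flag.
inline packed_aggregate pack(std::uint64_t lo, std::uint64_t hi62, tile_status s)
{
  assert((hi62 & ~payload_hi_mask) == 0); // payload must leave the top 2 bits free
  return {lo, hi62 | (static_cast<std::uint64_t>(s) << status_shift)};
}

// Read back the status flag from the top 2 bits.
inline tile_status status_of(const packed_aggregate& p)
{
  return static_cast<tile_status>(p.hi >> status_shift);
}

// Read back the high payload bits with the status flag masked off.
inline std::uint64_t payload_hi(const packed_aggregate& p)
{
  return p.hi & payload_hi_mask;
}
```

Keeping the flag and the aggregate in one word means a single 128-bit load or store moves both together, so a reader never sees a status that does not match its payload.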


### Describe alternatives you've considered

_No response_

### Additional context

_No response_

### Tasks
- [ ] https://github.com/NVIDIA/cccl/issues/2119
- [ ] https://github.com/NVIDIA/cccl/issues/2112
- [ ] https://github.com/NVIDIA/cccl/issues/2120
- [ ] https://github.com/NVIDIA/cccl/issues/2121
- [ ] https://github.com/NVIDIA/cccl/issues/2122
- [ ] https://github.com/NVIDIA/cccl/issues/2125
- [ ] https://github.com/NVIDIA/cccl/issues/2124
- [ ] https://github.com/NVIDIA/cccl/issues/2123
- [ ] https://github.com/NVIDIA/cccl/issues/2126


[EPIC]: GPU-agnostic Reproducible floating-point reductions #1558
