
VFWMACCBF16.VV

RISC-V VFWMACCBF16.VV Instruction Details

Format: R-type (OP-V vector arithmetic layout)

BF16 vector-vector widening fused multiply-accumulate into FP32 vd.

Instruction Syntax

vfwmaccbf16.vv vd, vs1, vs2, vm

Operand Breakdown

vd: destination vector register group holding the FP32 accumulators (EEW=32); it is read as input and then overwritten with the result.

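vs1, vs2: BF16 vector source register groups (EEW=16). The .vv suffix selects this vector-vector form; the companion .vf form takes a scalar BF16 operand from an f register instead of vs1.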

vm: the mask bit of the encoding. vm=0 (written as v0.t in assembly) uses v0 as the execution mask, and vm=1 (mask operand omitted) is unmasked.

Extension: Zvfbfwma. Category: Vector Operations.


Instruction Behavior

VFWMACCBF16.VV requires SEW=16. It reads BF16 elements from vs1 and vs2 and the FP32 accumulator from vd. It multiplies the BF16 sources and adds the exact (unrounded) product to the corresponding FP32 accumulator. It then rounds once according to frm and writes the FP32 result back to vd. The instruction belongs to the Zvfbfwma extension.
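The per-lane rule can be checked with a minimal Python sketch. It assumes finite inputs and frm = RNE.

- BF16 sources are passed as 16-bit patterns. They are widened exactly by placing them in the upper half of a binary32 word.
- The product and sum are then formed exactly with `fractions.Fraction`. The result is rounded once to binary32, including the subnormal range and overflow to infinity.

The helper names are hypothetical. Signed zeros, NaNs, infinities and exception flags are not modeled.

```python
import math
import struct
from fractions import Fraction


def bf16_to_f32(bits: int) -> float:
    """Widen a BF16 bit pattern to FP32 exactly: BF16 is the top half of binary32."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def f32_bits(x: float) -> int:
    """Bit pattern of a value that is already exactly representable in binary32."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def round_to_f32(q: Fraction) -> float:
    """Round an exact rational to binary32 with round-to-nearest-even (frm = RNE)."""
    if q == 0:
        return 0.0  # sketch: the sign of an exact zero result is not modeled
    a = abs(q)
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1                          # now 2**e <= a < 2**(e + 1)
    e = max(e, -126)                    # subnormal range: the quantum stays 2**-149
    quantum = Fraction(2) ** (e - 23)   # binary32 keeps 24 significant bits
    r = round(a / quantum) * quantum    # Fraction rounding is half-to-even
    mag = math.inf if r >= Fraction(2) ** 128 else float(r)
    return mag if q > 0 else -mag


def vfwmaccbf16_lane(acc: float, vs1_bf16: int, vs2_bf16: int) -> float:
    """One active lane: vd[i] = round_f32(vd[i] + f32(vs1[i]) * f32(vs2[i]))."""
    a = bf16_to_f32(vs1_bf16)
    b = bf16_to_f32(vs2_bf16)
    exact = Fraction(acc) + Fraction(a) * Fraction(b)  # no intermediate rounding
    return round_to_f32(exact)


# Lane i=0 of the demo: 0.5 + 1.5 * 0.5 (BF16 1.5 = 0x3FC0, BF16 0.5 = 0x3F00).
result = vfwmaccbf16_lane(0.5, 0x3FC0, 0x3F00)
print(result, hex(f32_bits(result)))
```

Using exact rationals keeps the single-rounding property explicit. An intermediate binary64 result would add a second rounding step.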

VFWMACCBF16.VV Decode And Execute Animation

Decode the OP-V encoding and execute BF16 widening fused multiply-accumulate: each active lane multiplies vs1[i] and vs2[i] as BF16 sources and accumulates into FP32 vd[i].


Read OP-V encoding fields

V-extension FP instructions use the OP-V major opcode. Their fields are funct6 (bits 31..26), vm (bit 25), vs2 (bits 24..20), vs1 (bits 19..15), funct3 (bits 14..12), vd (bits 11..7), and opcode (bits 6..0).
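The hypothetical helper below is a sketch of that layout. It packs the fields of `vfwmaccbf16.vv vd, vs1, vs2, vm` into a 32-bit word, using the funct6 (111011), funct3 (001, OPFVV) and opcode (1010111) values given on this page. Bit 25 holds vm, so vm=0 selects masking by v0.t.

```python
FUNCT6_VFWMACCBF16 = 0b111011
FUNCT3_OPFVV = 0b001
OPCODE_OP_V = 0b1010111


def encode_vfwmaccbf16_vv(vd: int, vs1: int, vs2: int, vm: int) -> int:
    """Pack vfwmaccbf16.vv vd, vs1, vs2, vm; vm=0 selects masking by v0.t."""
    for name, reg in (("vd", vd), ("vs1", vs1), ("vs2", vs2)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name} must be v0-v31, got v{reg}")
    if vm not in (0, 1):
        raise ValueError("vm must be 0 or 1")
    return (
        (FUNCT6_VFWMACCBF16 << 26)
        | (vm << 25)
        | (vs2 << 20)        # vs2 occupies the higher register field
        | (vs1 << 15)
        | (FUNCT3_OPFVV << 12)
        | (vd << 7)
        | OPCODE_OP_V
    )


# vfwmaccbf16.vv v4, v12, v8, v0.t
print(hex(encode_vfwmaccbf16_vv(vd=4, vs1=12, vs2=8, vm=0)))
```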

Instruction input: vfwmaccbf16.vv. The registers vd, vs1 and vs2 can each be chosen from v0-v31, and vm controls execution masking.

Execution context

VL: 8 (matching the eight demo lanes); SEW: 16
frm: RNE (demo dynamic rounding mode)
acc EEW: 32 (the vd accumulator is FP32)
vta/vma: ta, ma (tail/inactive policy)
opcode: 1010111 (OP-V major opcode)

Demo input fields:
vd accumulator: comma-separated finite values. The count must equal VL, and each value must be representable as a finite binary32.
vs2 elements: comma-separated finite values; the count must equal VL. 4-digit 0x bit patterns are also allowed.
vs1 elements: comma-separated finite values; the count must equal VL. 4-digit 0x bit patterns are also allowed.
v0 mask bits: a 0/1 string whose length must equal VL.

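Encoding fields for the example word 0xec861257:

Bits 31..26, funct6: 111011
Bit 25, vm: 0 (masked, v0.t)
Bits 24..20, vs2: 01000 (v8)
Bits 19..15, vs1: 01100 (v12)
Bits 14..12, funct3: 001 (OPFVV)
Bits 11..7, vd: 00100 (v4)
Bits 6..0, opcode: 1010111 (OP-V)

This word encodes vfwmaccbf16.vv v4, v12, v8, v0.t.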
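Decoding works the same way in reverse. The hypothetical decoder below slices the example word 0xec861257 into its OP-V fields, including the vm bit at position 25. It prints the result in the assembly operand order vd, vs1, vs2. The vs1 field (v12) comes before the vs2 field (v8) in the text even though vs2 sits in the higher bits.

```python
def decode_op_v(word: int) -> dict:
    """Split a 32-bit OP-V arithmetic instruction into its fields."""
    return {
        "funct6": (word >> 26) & 0x3F,
        "vm": (word >> 25) & 0x1,
        "vs2": (word >> 20) & 0x1F,
        "vs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "vd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


def disassemble_vfwmaccbf16_vv(word: int) -> str:
    f = decode_op_v(word)
    if (f["opcode"], f["funct3"], f["funct6"]) != (0b1010111, 0b001, 0b111011):
        raise ValueError("not vfwmaccbf16.vv")
    mask = ", v0.t" if f["vm"] == 0 else ""
    # Assembly order is vd, vs1, vs2 even though vs2 sits in the higher bit field.
    return f"vfwmaccbf16.vv v{f['vd']}, v{f['vs1']}, v{f['vs2']}{mask}"


print(disassemble_vfwmaccbf16_vv(0xEC861257))
```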

Lane results (accumulator + product operands = FP32 result / bit pattern):

i=0, active: 0.5 + 1.5 * 0.5 = 1.25 / 0x3fa00000
i=1, active: -1 + (-2.25) * 1 = -3.25 / 0xc0500000
i=2, skipped: v0.t=0, not executed
i=3, active: -2.5 + (-6) * 2 = -14.5 / 0xc1680000
i=4, active: 4 + 8 * 2.5 = 24 / 0x41c00000
i=5, active: -4.5 + (-10) * 3 = -34.5 / 0xc20a0000
i=6, skipped: v0.t=0, not executed
i=7, active: -8.5 + (-20) * 4 = -88.5 / 0xc2b10000
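The lane table can be reproduced with a short masked loop. This sketch makes several assumptions:

- VL=8 and the mask is 1,1,0,1,1,1,0,1 for i=0..7.
- The two product operands follow the order shown in each lane. The page does not say which is vs1 and which is vs2; the product is commutative, so this does not matter.
- Lanes 2 and 6 use hypothetical placeholder values, since the instruction does not compute them. They are left undisturbed, which is one of the behaviors ma permits.
- Every demo value is a small dyadic number, so binary64 evaluates acc + a*b exactly. One rounding to binary32 through `struct` therefore matches the fused result here. This shortcut is not a general FMA model.

```python
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32 (round-to-nearest-even) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Demo inputs for i = 0..7; lanes 2 and 6 hold hypothetical placeholders.
vd = [0.5, -1.0, 7.0, -2.5, 4.0, -4.5, 7.0, -8.5]
src_a = [1.5, -2.25, 0.0, -6.0, 8.0, -10.0, 0.0, -20.0]
src_b = [0.5, 1.0, 0.0, 2.0, 2.5, 3.0, 0.0, 4.0]
mask = [1, 1, 0, 1, 1, 1, 0, 1]

for i, active in enumerate(mask):
    if active:
        # Exact in binary64 for these inputs, so this is a single FP32 rounding.
        vd[i] = to_f32(vd[i] + src_a[i] * src_b[i])
    # Inactive lanes are left undisturbed in this sketch (allowed under ma).

for i, (value, active) in enumerate(zip(vd, mask)):
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    print(f"i={i} {'active' if active else 'skip'} {value} / {bits:#010x}")
```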

With SEW=16, the BF16 sources come from vs2 and vs1, and vd is both the FP32 accumulator input and the FP32 result. The animation first widens the BF16 sources to FP32, which is an exact conversion. It then shows the equivalent FP32 fused multiply-accumulate path. Exception flags follow the official BF16 rules.

Quick Understanding Notes

This is the BF16 vector-vector widening FMA: the syntax is vd, vs1, vs2, vm; vd is first read as an FP32 accumulator and then written with the FP32 result.

OP-V encoding: funct6=111011, funct3=001 (OPFVV), opcode=1010111. The vs1 field names a genuine BF16 vector source operand rather than serving as an extra opcode field.

Reserved encodings: per the official specification, executing this instruction with any SEW other than 16 is reserved.

vs1/vs2 sources have EEW=16 BF16, while vd input and output have EEW=32 FP32.

The official equivalent sequence widens the BF16 sources to FP32 and then performs vfmacc.vv. There is a single rounding, controlled by frm. Neither the product nor the sum may be modeled as ordinary, separately rounded BF16 arithmetic.
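A small exact-arithmetic sketch shows why the split model is wrong. It uses a hypothetical helper, `round_to_precision`, that rounds to p significant bits with ties-to-even and ignores exponent range, which is adequate for the values used.

- The fused path rounds once to FP32 precision (24 bits).
- The incorrect split path rounds the product to BF16 precision (8 bits) and then rounds the sum again.

With both sources equal to 1 + 2^-7 (exactly representable in BF16) and a zero accumulator, the two paths already disagree.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits, ties-to-even; exponent range is ignored."""
    if x == 0:
        return Fraction(0)
    a = abs(x)
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1                          # now 2**e <= a < 2**(e + 1)
    quantum = Fraction(2) ** (e - p + 1)
    r = round(a / quantum) * quantum    # half-to-even on the scaled significand
    return r if x > 0 else -r


FP32_BITS, BF16_BITS = 24, 8            # significand precision incl. hidden bit

acc = Fraction(0)
a = b = Fraction(1) + Fraction(1, 2**7)  # 1 + 2**-7, exact in BF16

fused = round_to_precision(acc + a * b, FP32_BITS)                 # one rounding
split = round_to_precision(acc + round_to_precision(a * b, BF16_BITS), BF16_BITS)

print(float(fused), float(split))
```

The exact product is 1 + 2^-6 + 2^-14. FP32 keeps it, while rounding to BF16 drops the 2^-14 term.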

Only active elements within vl execute; with vm=0, v0.t controls activity.

References: RISC-V Unprivileged ISA Manual, BF16 extensions; RISC-V Unprivileged ISA Manual, V standard vector extension.

Vector Execution Context

When reading VFWMACCBF16.VV, do not stop at the mnemonic. The official V-extension semantics also depend on the current vl, vtype, and mask state. In the .vv form, both sources are vector register groups consumed element by element.

Check vl first

The current vl determines the number of body elements. Typical code executes vsetvli, vsetivli, or vsetvl before this instruction.
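The sketch below shows how vl follows from the requested application vector length (AVL).

- It uses VLMAX = LMUL × VLEN / SEW, with SEW=16 for this instruction.
- It picks vl = min(AVL, VLMAX). The V specification also permits other vl choices when AVL lies strictly between VLMAX and 2 × VLMAX, so that line is an assumption about the implementation.
- VLEN=128 is only an example value.

```python
import math
from fractions import Fraction


def vlmax(vlen: int, sew: int, lmul: Fraction) -> int:
    """VLMAX = LMUL * VLEN / SEW, the element capacity of one register group."""
    return math.floor(lmul * vlen / sew)


def chosen_vl(avl: int, vlen: int, lmul: Fraction, sew: int = 16) -> int:
    """Assumed policy: vl = min(AVL, VLMAX); SEW defaults to 16 as required here."""
    return min(avl, vlmax(vlen, sew, lmul))


# Example: VLEN=128, SEW=16, LMUL=1 gives VLMAX=8 BF16 elements.
for avl in (5, 8, 20):
    print(avl, chosen_vl(avl, vlen=128, lmul=Fraction(1)))
```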

Then check vtype

The current vtype supplies SEW, LMUL, tail policy, and mask policy; these affect element width, register-group size, and inactive/tail destination elements.
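The vtype-related constraints can be sketched as a hypothetical checker, `check_vfwmaccbf16_vv`. It assumes the usual widening-instruction convention:

- SEW must be 16.
- The BF16 sources use EMUL = LMUL, and the FP32 destination uses EMUL = 2 × LMUL, so LMUL=8 is not usable.
- Each register number must be aligned to its group size.
- A masked form may not place its destination group on v0.

Source/destination overlap rules and fractional-LMUL support limits are not checked.

```python
import math
from fractions import Fraction


def check_vfwmaccbf16_vv(sew: int, lmul: Fraction, vd: int, vs1: int, vs2: int, vm: int) -> list:
    """Return a list of problems found by these (partial) checks; empty means none."""
    problems = []
    if sew != 16:
        problems.append("SEW must be 16; other SEW values are reserved")
    src_emul = lmul
    dst_emul = 2 * lmul                  # FP32 destination is twice as wide
    if dst_emul > 8:
        problems.append(f"destination EMUL={dst_emul} exceeds 8 (LMUL too large)")
    for name, reg, emul in (("vd", vd, dst_emul), ("vs1", vs1, src_emul), ("vs2", vs2, src_emul)):
        group = math.floor(emul) if emul >= 1 else 1   # fractional EMUL uses one register
        if reg % group != 0:
            problems.append(f"{name}=v{reg} is not aligned to a group of {group} registers")
    if vm == 0 and vd == 0:
        problems.append("masked form cannot write its destination group over v0")
    return problems


print(check_vfwmaccbf16_vv(16, Fraction(1), vd=4, vs1=8, vs2=12, vm=1))
print(check_vfwmaccbf16_vv(16, Fraction(1), vd=5, vs1=8, vs2=12, vm=1))
```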

Then check vm/v0

For ordinary vector instructions with vm, vm=0 uses v0 as the execution mask and vm=1 is unmasked. A few forms such as VMERGE use v0 as data-selection input.
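Mask element i lives in bit i of v0. The hypothetical helpers below turn a demo-style 0/1 string into that bit layout, assuming the first character is element 0, and list the active lanes. With a mask that skips lanes 2 and 6, the low byte of v0 is 0xBB.

```python
def mask_from_string(bits: str) -> int:
    """Pack a 0/1 string (element 0 first) into an integer; bit i = mask element i."""
    value = 0
    for i, ch in enumerate(bits):
        if ch not in "01":
            raise ValueError(f"invalid mask character {ch!r}")
        value |= int(ch) << i
    return value


def active_lanes(v0: int, vl: int, vm: int) -> list:
    """Element i < vl is active if vm=1 (unmasked) or bit i of v0 is set."""
    return [i for i in range(vl) if vm == 1 or (v0 >> i) & 1]


v0 = mask_from_string("11011101")
print(hex(v0), active_lanes(v0, vl=8, vm=0))
```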

Official source: RISC-V V Standard Extension for Vector Operations

Common Usage Scenarios

Vector Operations

Example: «vfwmaccbf16.vv v4, v8, v12 # v4[fp32] += bf16(v8) * bf16(v12)» updates each active FP32 element of v4 with the product of the matching BF16 elements of v8 and v12.
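Over an array, the instruction is typically used with strip-mining: each pass sets vl, then updates that many FP32 accumulators. The sketch below models such a loop over plain Python lists.

- It assumes vl = min(remaining, VLMAX) with a hypothetical VLMAX of 8.
- The inputs are small integers, so binary64 arithmetic followed by one `struct`-based binary32 rounding is exact here. It is not a general FMA model.

```python
import struct


def to_f32(x: float) -> float:
    """Round to binary32 (round-to-nearest-even) via struct."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fwmacc_strip_mined(acc: list, a: list, b: list, vlmax: int = 8) -> list:
    """acc[i] += a[i] * b[i] for every i, processed in chunks of at most vlmax."""
    out = list(acc)
    start = 0
    while start < len(out):
        vl = min(len(out) - start, vlmax)        # assumed vsetvli result for this pass
        for i in range(start, start + vl):       # models one vfwmaccbf16.vv per pass
            out[i] = to_f32(out[i] + a[i] * b[i])
        start += vl
    return out


n = 20
print(fwmacc_strip_mined([0.0] * n, list(range(n)), [2.0] * n))
```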

Machine Learning

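Example: the same instruction accumulates BF16 products into FP32 partial sums. This is the usual mixed-precision pattern in BF16 dot-product and matrix-multiply inner loops: «vfwmaccbf16.vv v4, v8, v12 # v4[fp32] += bf16(v8) * bf16(v12)».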
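The value of an FP32 accumulator here can be shown numerically. This illustration does not model any particular kernel. It sums 300 products of 1.0 × 1.0 twice: once rounding the running sum to FP32 precision (24 bits), and once to BF16 precision (8 bits). It uses a hypothetical round-to-p-bits helper, `round_sig`, that ignores exponent range. The BF16 sum stalls at 256 because 257 ties to its even neighbor 256, while the FP32 sum reaches 300.

```python
import math


def round_sig(x: float, p: int) -> float:
    """Round x to p significant bits with ties-to-even (exponent range ignored)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                    # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2**p), e - p)


def dot(a: list, b: list, acc_bits: int) -> float:
    acc = 0.0
    for x, y in zip(a, b):
        # Small integers: x * y and acc + x * y are exact in binary64,
        # so this is one rounding per step to the accumulator precision.
        acc = round_sig(acc + x * y, acc_bits)
    return acc


ones = [1.0] * 300
print(dot(ones, ones, acc_bits=24), dot(ones, ones, acc_bits=8))
```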

Pre-Use Checklist

Syntax Check

Confirm the encoding uses the OP-V vector arithmetic layout (listed on this page as R-type).
Confirm the operand order is vd, vs1, vs2, with v0.t appended only for the masked form.

Semantic Check

Ensure the destination register usage is compatible with the calling convention.
Confirm this is not the lower-level form of a pseudo-instruction expansion.

Pitfalls / Common Confusions

SEW must be 16; other SEW encodings are reserved.

vs1 and vs2 are BF16 sources, while vd is both the FP32 accumulator input and FP32 output.

This is fused multiply-accumulate; do not model it as separate BF16 multiply then BF16 add.

Zvfbfwma depends on Zvfbfmin and Zfbfmin.

The possible floating-point exception flags are Overflow, Underflow, Inexact, and Invalid, accrued in fflags. The animation demonstrates only finite, checkable examples.

FAQ

Do these vector BF16 instructions support any SEW?

No. The official vector BF16 instructions reserve their encodings when SEW is not 16; this page's animation fixes SEW=16.

Is BF16 the same as IEEE binary16?

No. BF16 has 1 sign bit, 8 exponent bits, and 7 fraction bits, whereas IEEE binary16 has 1 sign bit, 5 exponent bits, and 10 fraction bits. BF16 therefore keeps the binary32 exponent range at the cost of precision.
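The sketch below makes the field widths concrete by encoding 1.5 in both formats with the standard `struct` module.

- BF16 is taken as the upper 16 bits of the binary32 pattern. That is valid here only because 1.5 is exact in BF16; a general conversion would need rounding.
- binary16 uses struct's `'e'` format.

The helper name `fields` is hypothetical.

```python
import struct


def fields(bits: int, exp_bits: int, frac_bits: int) -> tuple:
    """Split a 16-bit pattern into (sign, biased exponent, fraction)."""
    frac = bits & ((1 << frac_bits) - 1)
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (exp_bits + frac_bits)
    return sign, exp, frac


x = 1.5
f32 = struct.unpack(">I", struct.pack(">f", x))[0]
assert (f32 & 0xFFFF) == 0            # truncation is exact only when the low half is zero
bf16 = f32 >> 16
fp16 = struct.unpack(">H", struct.pack(">e", x))[0]

print(hex(bf16), fields(bf16, exp_bits=8, frac_bits=7))   # exponent bias 127
print(hex(fp16), fields(fp16, exp_bits=5, frac_bits=10))  # exponent bias 15
```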

Is VFWMACCBF16 the same as ordinary vfmacc?

Not exactly. The official description gives an equivalent sequence that widens BF16 sources to FP32 and then uses vfmacc, but the source format, SEW=16 restriction, and Zvfbfwma requirement are different.