
VFWMACCBF16.VF

RISC-V VFWMACCBF16.VF Instruction Details

Instruction Manual

R-type

BF16 vector-scalar widening fused multiply-accumulate into FP32 vd.

Instruction Syntax

vfwmaccbf16.vf vd, rs1, vs2, vm

Operand Breakdown

vd: destination vector register group.

vs2 and rs1: for the .vf form, vs2 is the BF16 vector source and rs1 names the floating-point register that holds the BF16 scalar.

vm: when present, vm=0 uses v0 as the execution mask and vm=1 is unmasked.

Zvfbfwma

Vector Operations


Instruction Behavior

VFWMACCBF16.VF requires SEW=16. For each active element it reads the scalar BF16 value from f[rs1], the BF16 element vs2[i], and the FP32 accumulator vd[i]. It multiplies the two BF16 sources, adds the unrounded product to the accumulator, rounds the sum once according to frm, and writes the FP32 result back to vd[i]. The instruction belongs to the Zvfbfwma extension.
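As a rough illustration of the behavior described above, the following Python sketch models a single active lane on raw bit patterns. The helper names are hypothetical. The sketch assumes finite, in-range values and computes in double precision before rounding to binary32 with the host's round-to-nearest-even, so it ignores frm, exception flags, NaN handling, and rare double-rounding cases; it is not a bit-exact reference model.

```python
import struct

def bf16_to_float(bits: int) -> float:
    """Widen a 16-bit BF16 pattern; exact, because BF16 is the upper half of binary32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def f32_bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def float_to_f32_bits(value: float) -> int:
    """Round a Python float to binary32 and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def vfwmaccbf16_lane(acc_f32_bits: int, vs2_bf16_bits: int, rs1_bf16_bits: int) -> int:
    """Hypothetical model of one active lane: vd[i] = f[rs1] * vs2[i] + vd[i]."""
    # The product of two BF16 values fits exactly in a double.
    product = bf16_to_float(rs1_bf16_bits) * bf16_to_float(vs2_bf16_bits)
    total = product + f32_bits_to_float(acc_f32_bits)
    return float_to_f32_bits(total)

# 0x3FC0 is BF16 1.5 and 0x3F000000 is FP32 0.5, so the lane computes 0.5 + 1.5 * 1.5.
result = vfwmaccbf16_lane(0x3F000000, 0x3FC0, 0x3FC0)
print(hex(result))  # 0x40300000, the FP32 pattern for 2.75
```

The widening step is a plain 16-bit left shift, which is why the sources never lose precision before the multiply.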

VFWMACCBF16.VF Decode And Execute Animation

Decode the OP-V encoding and execute BF16 widening fused multiply-accumulate: each active lane multiplies f[rs1] and vs2[i] as BF16 sources and accumulates into FP32 vd[i].


Read OP-V encoding fields

V-extension FP instructions use the OP-V major opcode, with funct6, source registers, vm, funct3, vd, and opcode fields.

Instruction input

vfwmaccbf16.vf

vd: use v0-v31

rs1: FP register f0-f31 or its ABI name

vs2: use v0-v31

vm: execution mask control

Execution context

VL, SEW: SEW is fixed at 16 in this demo

frm: RNE (demo dynamic rounding mode)

acc EEW: vd accumulator is FP32

vta/vma: ta, ma (tail/inactive policy)

opcode: 1010111 (OP-V major opcode)

vd accumulator: comma-separated finite values; the count must equal VL, and each value must be representable as a finite binary32.

vs2 elements: comma-separated finite values; the count must equal VL; 4-digit 0x bit patterns are also allowed.

f[rs1]: finite FP demo value; a 4-digit 0x bit pattern is also allowed.

v0 mask bits: a string of 0/1 characters whose length must equal VL.

Encoding fields

Example word: 0xec855257

Bits     Field    Value
31..26   funct6   111011
25       vm       0 (masked by v0.t)
24..20   vs2      01000 (v8)
19..15   rs1      01010 (f10)
14..12   funct3   101
11..7    vd       00100 (v4)
6..0     opcode   1010111 (OP-V)
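The field split above can be checked mechanically. The sketch below extracts each field from the example word with shifts and masks; `decode_opv` is a hypothetical helper name, and the bit positions are the ones listed on this page plus the vm bit at position 25.

```python
def decode_opv(word: int) -> dict:
    """Split a 32-bit OP-V instruction word into its fields (hypothetical helper)."""
    return {
        "funct6": (word >> 26) & 0x3F,  # bits 31..26
        "vm":     (word >> 25) & 0x1,   # bit 25: 0 = masked by v0, 1 = unmasked
        "vs2":    (word >> 20) & 0x1F,  # bits 24..20
        "rs1":    (word >> 15) & 0x1F,  # bits 19..15
        "funct3": (word >> 12) & 0x7,   # bits 14..12
        "vd":     (word >> 7) & 0x1F,   # bits 11..7
        "opcode": word & 0x7F,          # bits 6..0
    }

fields = decode_opv(0xEC855257)
for name, value in fields.items():
    print(f"{name:7s} {value:#b}")
```

For this word the decode gives vd=4, rs1=10, vs2=8, and vm=0, which corresponds to a masked form such as vfwmaccbf16.vf v4, f10, v8, v0.t.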

Lane results


i=0 (active): 0.5 + 1.5 * 1.5 = 2.75 (0x40300000)

i=1 (active): -1 + -2.25 * 1.5 = -4.375 (0xc08c0000)

i=2 (skipped): v0.t=0, not executed

i=3 (active): -2.5 + -6 * 1.5 = -11.5 (0xc1380000)

i=4 (active): 4 + 8 * 1.5 = 16 (0x41800000)

i=5 (active): -4.5 + -10 * 1.5 = -19.5 (0xc19c0000)

i=6 (skipped): v0.t=0, not executed

i=7 (active): -8.5 + -20 * 1.5 = -38.5 (0xc21a0000)

With SEW=16, the BF16 sources come from vs2 and f[rs1], and vd is both the FP32 accumulator input and the FP32 result. The animation first widens the BF16 sources to FP32, which is exact, and then shows the equivalent FP32 fused multiply-accumulate path; exception flags follow the official BF16 rules.
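The lane results above can be reproduced with a short Python sketch of the masked loop. Several things here are assumptions: the function name is hypothetical, the values in the two skipped lanes are placeholders (the page does not show them), the mask string is taken to list element 0 first, and skipped lanes are left unchanged, which is one of the outcomes a mask-agnostic policy permits. Arithmetic is done in double precision and then rounded to binary32, which is exact for these inputs.

```python
import struct

def to_f32(value: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def f32_hex(value: float) -> str:
    """Format a value as its binary32 bit pattern."""
    return "0x%08x" % struct.unpack("<I", struct.pack("<f", value))[0]

def vfwmaccbf16_vf(vd, scalar, vs2, mask):
    """Hypothetical masked model: vd[i] += scalar * vs2[i] where mask[i] == '1'."""
    out = list(vd)
    for i, bit in enumerate(mask):
        if bit == "1":
            out[i] = to_f32(vd[i] + scalar * vs2[i])
    return out

# Lanes 2 and 6 are masked off; their 0.0 entries are placeholders, not page data.
vd = [0.5, -1.0, 0.0, -2.5, 4.0, -4.5, 0.0, -8.5]
vs2 = [1.5, -2.25, 0.0, -6.0, 8.0, -10.0, 0.0, -20.0]
mask = "11011101"

result = vfwmaccbf16_vf(vd, 1.5, vs2, mask)
for i, value in enumerate(result):
    state = "active" if mask[i] == "1" else "skip"
    print(i, state, value, f32_hex(value))
```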

Quick Understanding & Search Notes

This is the BF16 vector-scalar widening FMA: the syntax is vd, rs1, vs2, vm; rs1 is a BF16 scalar in an FP register, and vd is the FP32 accumulator and result.

OP-V encoding: funct6=111011, funct3=101 (OPFVF), opcode=1010111; the rs1 field selects the BF16 scalar source in an FP register.

Official reserved encoding: any SEW value other than 16 is reserved.

rs1/vs2 sources have EEW=16 BF16, while vd input and output have EEW=32 FP32.

The official equivalent sequence widens the BF16 scalar and vector source to FP32 and then performs vfmacc.vf.

Only active elements within vl execute; with vm=0, v0.t controls activity.

RISC-V Unprivileged ISA Manual: BF16 extensions

RISC-V Unprivileged ISA Manual: V standard vector extension

Vector Execution Context

When reading VFWMACCBF16.VF, do not stop at the mnemonic: official V-extension semantics also depend on the current vl, vtype, and mask state. The .vf suffix means that one vector source and one floating-point scalar source participate.

Check vl first

The current vl determines the number of body elements. Typical code executes vsetvli, vsetivli, or vsetvl before this instruction.

Then check vtype

The current vtype supplies SEW, LMUL, tail policy, and mask policy; these affect element width, register-group size, and inactive/tail destination elements.
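To make the vtype check concrete, here is a small Python sketch that decodes the low bits of a vtype value and tests the SEW=16 requirement. It assumes the V-extension field layout (vlmul in bits 2..0, vsew in bits 5..3, vta in bit 6, vma in bit 7); the function names are hypothetical and the vill bit is not modeled.

```python
from fractions import Fraction

# vlmul encodings; 0b100 is reserved and therefore absent.
LMUL = {
    0b000: Fraction(1), 0b001: Fraction(2), 0b010: Fraction(4), 0b011: Fraction(8),
    0b101: Fraction(1, 8), 0b110: Fraction(1, 4), 0b111: Fraction(1, 2),
}

def decode_vtype(vtype: int) -> dict:
    """Decode SEW, LMUL, and the tail/mask policy bits (hypothetical helper)."""
    vsew = (vtype >> 3) & 0x7
    return {
        "sew": 8 << vsew,               # meaningful for vsew 0..3 (SEW 8..64)
        "lmul": LMUL.get(vtype & 0x7),  # None for the reserved encoding
        "vta": (vtype >> 6) & 1,        # 1 = tail agnostic
        "vma": (vtype >> 7) & 1,        # 1 = mask agnostic
    }

def sew_ok_for_vfwmaccbf16(vtype: int) -> bool:
    """The instruction is reserved unless SEW is 16."""
    return decode_vtype(vtype)["sew"] == 16

print(decode_vtype(0xC8))            # e16, m1, ta, ma
print(sew_ok_for_vfwmaccbf16(0xC8))  # True
print(sew_ok_for_vfwmaccbf16(0xD0))  # False: this value selects SEW=32
```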

Then check vm/v0

For ordinary vector instructions with vm, vm=0 uses v0 as the execution mask and vm=1 is unmasked. A few forms such as VMERGE use v0 as data-selection input.

Official source: RISC-V V Standard Extension for Vector Operations

Common Usage Scenarios

Vector Operations

A typical use accumulates a BF16 scalar times a BF16 vector into an FP32 vector: «vfwmaccbf16.vf v4, f0, v8 # v4[i] (FP32) += bf16(f0) * bf16(v8[i])».

Machine Learning

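Mixed-precision kernels that keep inputs in BF16 and accumulate in FP32 can use the same form in a dot-product or matrix-multiply inner loop: «vfwmaccbf16.vf v4, f0, v8 # v4[i] (FP32) += bf16(f0) * bf16(v8[i])».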

Pre-Use Checklist

Syntax Check

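Confirm the instruction uses the register-only OP-V layout (funct6, vm, vs2, rs1, funct3, vd, opcode), which this page classes as R-type.
Confirm the operand order is vd, rs1, vs2, vm: the scalar register comes before the vector source.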

Semantic Check

Ensure the destination register usage is compatible with the calling convention.
Confirm this is not the lower-level form of a pseudo-instruction expansion.

Pitfalls / Common Confusions

SEW must be 16; other SEW encodings are reserved.

rs1 names a scalar BF16 source in a floating-point register, not an integer-register source.

vs2 is the BF16 vector source, while vd is both the FP32 accumulator input and FP32 output.

Zvfbfwma depends on Zvfbfmin and Zfbfmin.

Do not model this as separate BF16 multiply then BF16 add; the official semantics are widening fused multiply-accumulate.

FAQ

Do these vector BF16 instructions support any SEW?

No. The official vector BF16 instructions reserve encodings when SEW is not 16; this page's animation fixes SEW=16.

Is BF16 the same as IEEE binary16?

No. BF16 has 1 sign bit, 8 exponent bits, and 7 fraction bits, whereas half-precision binary16 has 1 sign bit, 5 exponent bits, and 10 fraction bits. BF16 therefore keeps the exponent range of FP32 at lower precision.
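The difference in field widths means the same 16-bit pattern decodes to different values in the two formats. The sketch below illustrates this; the helper names are hypothetical, and the value function handles normal numbers only (no zeros, subnormals, infinities, or NaNs).

```python
def decode(bits: int, exp_bits: int, frac_bits: int):
    """Split a 16-bit pattern into (sign, exponent, fraction) fields."""
    sign = (bits >> 15) & 1
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

def normal_value(bits: int, exp_bits: int, frac_bits: int) -> float:
    """Value of a normal number: (1 + fraction) * 2^(exponent - bias), with sign."""
    sign, exponent, fraction = decode(bits, exp_bits, frac_bits)
    bias = (1 << (exp_bits - 1)) - 1
    magnitude = (1 + fraction / (1 << frac_bits)) * 2.0 ** (exponent - bias)
    return -magnitude if sign else magnitude

pattern = 0x3FC0
print(decode(pattern, 8, 7), normal_value(pattern, 8, 7))    # read as BF16
print(decode(pattern, 5, 10), normal_value(pattern, 5, 10))  # read as binary16
```

Read as BF16, 0x3FC0 is 1.5; read as binary16, the same bits are 1.9375.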

Is VFWMACCBF16 the same as ordinary vfmacc?

Not exactly. The official description gives an equivalent sequence that widens BF16 sources to FP32 and then uses vfmacc, but the source format, SEW=16 restriction, and Zvfbfwma requirement are different.