MLIR n-D vector types are currently represented as (n-1)-D arrays of 1-D vectors when lowered to LLVM


The implication of HW constraints on the programming model is that one cannot index dynamically across hardware registers: a register file can generally not be indexed dynamically. This is because the register number is fixed and one either needs to unroll explicitly to obtain fixed register numbers or go through memory. This is a constraint familiar to CUDA programmers: declaring a private float array and subsequently indexing it with a dynamic value results in so-called local memory usage (i.e. roundtripping to memory).
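The CUDA behavior can be sketched as follows (kernel names are illustrative, not from the original text):

```cuda
// Static indices: the compiler can keep `a` entirely in registers.
__global__ void static_idx(float *out) {
  float a[4] = {0.f, 1.f, 2.f, 3.f};
  out[threadIdx.x] = a[0] + a[3];
}

// Dynamic index `i`: registers are not dynamically addressable, so the
// compiler typically demotes `a` to "local" memory, i.e. a roundtrip
// through the memory hierarchy.
__global__ void dynamic_idx(float *out, int i) {
  float a[4] = {0.f, 1.f, 2.f, 3.f};
  out[threadIdx.x] = a[i & 3];
}
```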

Implication on codegen ¶

MLIR n-D vector types are currently represented as (n-1)-D arrays of 1-D vectors when lowered to LLVM. This introduces the consequences on static vs dynamic indexing discussed previously: extractelement , insertelement and shufflevector on n-D vectors in MLIR only support static indices. Dynamic indices are only supported on the most minor 1-D vector but not the outer (n-1)-D . For other cases, explicit load / stores are required. The consequences are:
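In the MLIR vector dialect, the distinction looks roughly like this (a sketch; the exact op syntax varies across MLIR versions):

```mlir
func.func @indexing(%v2d: vector<4x8xf32>, %v1d: vector<8xf32>, %i: index) {
  // Static position into an n-D vector: supported.
  %a = vector.extract %v2d[2, 5] : f32 from vector<4x8xf32>
  // A dynamic index is only supported on the most minor 1-D vector.
  %b = vector.extractelement %v1d[%i : index] : vector<8xf32>
  return
}
```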

  1. Loops around vector values are indirect addressing of vector values; they must operate on explicit load / store operations over n-D vector types.
  2. Once an n-D vector type is loaded into an SSA value (that may or may not live in n registers, with or without spilling, when eventually lowered), it may be unrolled to smaller k-D vector types and operations that correspond to the HW. This level of MLIR codegen is related to register allocation and spilling that occur much later in the LLVM pipeline.
  3. HW may support >1-D vectors with intrinsics for indirect addressing within these vectors. These can be targeted thanks to explicit vector_cast operations from MLIR k-D vector types and operations to LLVM 1-D vectors + intrinsics.
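Steps 1. and 2. can be sketched in the vector dialect as follows (names and exact op syntax are illustrative assumptions, not prescribed by this document):

```mlir
func.func @unroll(%m: memref<4x8xf32>) -> vector<8xf32> {
  %c0 = arith.constant 0 : index
  %f0 = arith.constant 0.0 : f32
  // Step 1: materialize an n-D vector SSA value via an explicit load.
  %v = vector.transfer_read %m[%c0, %c0], %f0
      : memref<4x8xf32>, vector<4x8xf32>
  // Step 2: unroll into smaller 1-D vectors that correspond to HW registers.
  %row0 = vector.extract %v[0] : vector<8xf32> from vector<4x8xf32>
  %row1 = vector.extract %v[1] : vector<8xf32> from vector<4x8xf32>
  %sum = arith.addf %row0, %row1 : vector<8xf32>
  return %sum : vector<8xf32>
}
```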

Alternatively, we argue that directly lowering to a linearized abstraction hides away the codegen complexities related to memory accesses by giving a false impression of magical dynamic indexing across registers. Instead we prefer to make those very explicit in MLIR and allow codegen to explore tradeoffs. Different HW will require different tradeoffs in the sizes involved in steps 1., 2. and 3.

Decisions made at the MLIR level will have implications at a much later stage in LLVM (after register allocation). We do not envision to expose concerns related to modeling of register allocation and spilling to MLIR explicitly. Instead, each target will expose a set of “good” target operations and n-D vector types, associated with costs that PatternRewriters at the MLIR level will be able to target. Such costs at the MLIR level will be abstract and used for ranking, not for accurate performance modeling. In the future such costs will be learned.

Implication on Lowering to Accelerators ¶

To target accelerators that support higher dimensional vectors natively, we can start from either 1-D or n-D vectors in MLIR and use vector.cast to flatten the most minor dimensions to 1-D vector<Kxf32> where K is an appropriate constant. Then, the existing lowering to LLVM-IR immediately applies, with extensions for accelerator-specific intrinsics.
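A flattening cast can be sketched as follows (in recent MLIR the op is spelled vector.shape_cast; this document uses the name vector.cast):

```mlir
// Flatten the most minor dimensions of a 2-D vector to 1-D,
// here with K = 4 * 8 = 32.
%flat = vector.shape_cast %v : vector<4x8xf32> to vector<32xf32>
```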

It is the role of an Accelerator-specific vector dialect (see codegen flow in the figure above) to lower the vector.cast . Accelerator -> LLVM lowering would then consist of a bunch of Accelerator -> Accelerator rewrites to perform the casts composed with Accelerator -> LLVM conversions + intrinsics that operate on 1-D vector<Kxf32> .

Some of those rewrites may need extra handling, especially if a reduction is involved. For example, vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32> when K != K1 * … * Kn and some arbitrary irregular vector.cast %0: vector<4x4x17xf32> to vector<Kxf32> may introduce masking and intra-vector shuffling that may not be worthwhile or even feasible, i.e. infinite cost.

However vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32> when K = K1 * … * Kn should be close to a noop.
