RISC-V asm instruction name madness

dingosity · on March 14, 2022

All those comments about "interesting" assembly mnemonics for various architectures and not a single person mentioned the 6809's SEX (Sign EXtend) instruction.

zozbot234 · on March 14, 2022

Interestingly, VM instruction sets (JVM, CLR, WASM etc.) tend to have the opposite problem in that the chosen names are pointlessly long, even though the insn sets are so small that more concise mnemonics are definitely possible.

mananaysiempre · on March 14, 2022

Not only VMs — SSE/AVX on x86 has been getting more and more ridiculous over the last decade (PCLMULHQLQDQ anyone?), although it isn’t clear that the names are at fault and not simply the haphazard instruction set.

int_19h · on March 14, 2022

VM instruction sets are generally not written by people, but they're occasionally read by people (who might have to do so for debugging). And terse mnemonics are not good for readability.

zozbot234 · on March 14, 2022

> And terse mnemonics are not good for readability.

I disagree there, though it would be nice to see some actual research on these issues. I for one see pointlessly long mnemonics as kind of a worst case for surveyability, maybe even worse than pretty-printed English-like phrases as seen in BASIC or COBOL.

(Of course, seemingly "long" mnemonics often result from orthogonality in the insn set, e.g. as in PCLMULHQLQDQ mentioned in a sibling comment. In that case, having a "long" mnemonic is indeed better than compacting it and losing track of its internal structure.)
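To make the "internal structure" point concrete, here is a rough sketch of how a name like PCLMULHQLQDQ decomposes. It assumes the usual convention that these are assembler pseudo-ops for PCLMULQDQ, where bit 0 of the immediate selects the low or high quadword of the first operand and bit 4 does the same for the second. The function name `pclmul_imm8` is made up for illustration.

```python
import re

# Pseudo-op shape: PCLMUL + (L|H) + Q + (L|H) + QDQ
# i.e. "carry-less multiply <low/high> quadword by <low/high> quadword".
_PSEUDO_OP = re.compile(r"^PCLMUL([LH])Q([LH])QDQ$")


def pclmul_imm8(mnemonic: str) -> int:
    """Return the imm8 selector implied by a PCLMUL?Q?QDQ pseudo-op name."""
    match = _PSEUDO_OP.match(mnemonic.upper())
    if match is None:
        raise ValueError(f"not a PCLMULQDQ pseudo-op: {mnemonic!r}")
    first, second = match.groups()
    # Assumed encoding: bit 0 = quadword of first operand, bit 4 = second.
    return int(first == "H") | (int(second == "H") << 4)


if __name__ == "__main__":
    for name in ("PCLMULLQLQDQ", "PCLMULHQLQDQ", "PCLMULLQHQDQ", "PCLMULHQHQDQ"):
        print(f"{name}: imm8 = {pclmul_imm8(name):#04x}")
```

Read this way, the long name is just two one-letter fields glued onto a base mnemonic, which is why compacting it further would throw information away.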

int_19h · on March 14, 2022

I dunno; e.g. IL doesn't look pointlessly long to me, and it's certainly nowhere near COBOL level of verbosity:

https://en.wikipedia.org/wiki/List_of_CIL_instructions

Mostly it uses well-established terse mnemonics - e.g. for branching or arithmetic - but stuff related to the CLI object model and other unusual things is spelled out more fully.

LeFantome · on March 14, 2022

What are the pointlessly long names in CIL (Common Intermediate Language, i.e. CLR assembly)? Most of them are ADD, MUL, DUP, BGT, CALL, LDC.I4, and the like.

There are a few like localloc, tailcall, volatile, and such, but you don’t use them much.

Things like LDC.I4 (load 32-bit integer) have companions like LDC.I8 and LDC.R8 that add a lot of symmetry and clarity without adding too much bulk (in my view).

nynx · on March 14, 2022

WASM went ahead and renamed all the instructions at some point to make them more consistent, e.g. `get_local` was renamed to `local.get`. Maybe RISC-V should do the same.
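As a rough illustration of how mechanical such a rename is, here is a minimal sketch that rewrites old-style names in WebAssembly text. The table covers only the local/global accessors and is not the complete rename list. `RENAMES` and `modernize` are names invented for this example.

```python
import re

# Partial old -> new table (local/global accessors only; not exhaustive).
RENAMES = {
    "get_local": "local.get",
    "set_local": "local.set",
    "tee_local": "local.tee",
    "get_global": "global.get",
    "set_global": "global.set",
}

# Match any old name as a whole word.
_OLD_NAME = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def modernize(text: str) -> str:
    """Replace old-style instruction names with their renamed forms."""
    return _OLD_NAME.sub(lambda m: RENAMES[m.group(1)], text)


if __name__ == "__main__":
    print(modernize("(i32.add (get_local $a) (get_local $b))"))
```

This is a naive textual match: it would also rewrite an identifier such as `$get_local`, so a real tool would work on parsed tokens instead.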

bhouston · on March 14, 2022

The main proposal here for the true problem is to normalize longer assembly instruction names. That seems reasonable.

qayxc · on March 14, 2022

Unfortunately it seems as if there's too much ad-hoc naming going on as opposed to a reasonable set of conventions.

charcircuit · on March 14, 2022

Is it possible to view this whole thread at once instead of having to look at each person's message 1 by 1?

psychoslave · on March 14, 2022

A bit convoluted, and far from a perfect rendering, but you can try this:

  elinks -dump 'https://www.realworldtech.com/forum/?threadid=205288' | grep curpostid | sed -e 's/.*https/https/' | while read -r url; do elinks -dump "$url" | awk '/By:/, /Previous/' | sed '$d'; done | less

tmp_cond_trav · on March 14, 2022

Thanks! I pasted the output of this, which makes the thread easy to read, at https://pastebin.com/JeVxYzLM

znwu · on March 15, 2022

> In the V-extension, they have an instruction named "vrgather".

> But that instruction is not a gather load instruction. This instruction is just a shuffle instruction.

On the contrary, I think x86's permute/shuffle/gather naming is a horrible legacy mess (plus broadcast and other stuff). The channel width, the source selection, the IMM encoding: all basically nonsense and horrifically non-orthogonal.

`vrgather` is literally "gather from register, vector". In RISC-V, you can gather from 8 consecutive vector registers (the maximum size of a vector register group). That means 1024 bits on a bare-minimum Zvl128b machine. This channel width blows away the 256-bit/128-bit channel nonsense even in AVX-512 and deserves a name closer to memory gather.
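For anyone who wants the semantics spelled out, here is a minimal sketch of what a register gather does, modelling vector registers as plain Python lists. It ignores masking, `vl`, and tail handling, and it assumes the behaviour that an out-of-range index yields zero. `vrgather_vv` here is just a Python function named after the instruction, not any real API.

```python
def vrgather_vv(vs2, vs1, vlmax=None):
    """Model of a register gather: result[i] = vs2[vs1[i]].

    vs2   -- source elements (the register group being indexed)
    vs1   -- per-element indices into vs2
    vlmax -- number of addressable source elements (defaults to len(vs2))
    """
    if vlmax is None:
        vlmax = len(vs2)
    # Assumption: indices outside [0, vlmax) produce 0 instead of trapping.
    return [vs2[idx] if 0 <= idx < vlmax else 0 for idx in vs1]


if __name__ == "__main__":
    source = [10, 20, 30, 40]
    indices = [3, 3, 0, 9]  # 9 is out of range
    print(vrgather_vv(source, indices))

    # Size of the indexable source with 8 registers of 128 bits each.
    print(8 * 128, "bits")
```

Nothing here touches memory, which is the point of contention: it is a full any-to-any permute over the whole register group, indexed the way a memory gather would be.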

wyldfire · on March 14, 2022

> In the V-extension, they have an instruction named "vrgather".

> But that instruction is not a gather load instruction. This instruction is just a shuffle instruction.

I don't know whether that's the case here, but if there's a more general instruction that can achieve the same effect as another one with implicit operands, you can preserve opcode space and create assembler-mapped instructions. So for this case you could make a "vshuffle" mnemonic that encodes a specific vrgather with an implied permutation of the inputs. I'm speaking very generally and broadly here - I have no idea about the specifics of the RISC-V vector extension.
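A rough sketch of the assembler-mapped idea, using a few scalar RISC-V pseudo-instructions (`nop`, `mv`, `not`, `neg`) that expand to base instructions with fixed operands. The `PSEUDO` table and `expand` function are invented for this example, and no "vshuffle" pseudo-instruction is implied to exist.

```python
# mnemonic -> (operand count, expansion template)
PSEUDO = {
    "nop": (0, "addi x0, x0, 0"),
    "mv": (2, "addi {0}, {1}, 0"),
    "not": (2, "xori {0}, {1}, -1"),
    "neg": (2, "sub {0}, x0, {1}"),
}


def expand(line: str) -> str:
    """Expand one pseudo-instruction; pass other instructions through."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    entry = PSEUDO.get(mnemonic)
    if entry is None:
        return line.strip()  # not a pseudo-instruction in this table
    arity, template = entry
    if len(operands) != arity:
        raise ValueError(f"{mnemonic} expects {arity} operand(s)")
    return template.format(*operands)


if __name__ == "__main__":
    for source_line in ("mv a0, a1", "nop", "neg t0, t1", "add a0, a1, a2"):
        print(f"{source_line:16} -> {expand(source_line)}")
```

The friendlier name costs no opcode space because it exists only in the assembler; the encoding is still that of the general instruction.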

snvzz · on March 14, 2022

IMHO it is actually a good thing.

It promotes looking up what an opcode does, rather than assuming what it does from its name.

What an opcode precisely does is actually critical, in assembly.

Once the engineer has done so and is intimately familiar with the opcode, the short mnemonic is a non-issue.

limoce · on March 14, 2022

Can we have namespaces for mnemonics? Something like vec.gather

wyldfire · on March 14, 2022

You can have an arbitrarily complex assembly language when you define a new architecture. You'll have an easier time fitting into existing assemblers if you are able to accept gas-like directives at least.

So yeah - just create mnemonics for your new architecture that happen to be prefixed by the instruction class/domain.

Pet_Ant · on March 14, 2022

Sorry, but isn’t it trivial to rename all the instructions and then just have a transpiler? There you go.

wyldfire · on March 14, 2022

Sure, you can always solve problems by adding indirection. But sometimes you want to debate an effective design of this layer - in this case the assembly language.

contravariant · on March 14, 2022

Famously you can solve all problems that way, except the problem of having too many layers of indirection.

LeFantome · on March 15, 2022

Well, you could mostly solve it by having a compiler target whatever level of indirection is required. No need to chain through them all.