Vertex shader ops investigation

Posted on 11-06-14 06:01 PM Link | #13
Test operation is the following:

[op] d1A, d01, d25 (0x6)

d01 holds a texcoord.
d25 is a constant set to: 0, 0, 0.5, 1
d1A is a temporary register that is later mov'd to the texcoord output registers.

Opdesc 0x6 is declared as follows:
.opdesc xyzw, xyzw, zzzz ; 0x6

So, if I got all that junk right, it should really be, in pseudo-GLSL:

d1A.xyzw = d01.xyzw [op] d25.zzzz;
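To make the intended semantics concrete, here is a minimal Python sketch of what that line would compute if [op] were a component-wise multiply. The helper names and the d01 value are made up for illustration. Only the d25 constant and the swizzle patterns come from the post.

```python
# Components are indexed in x, y, z, w order.
COMPONENTS = "xyzw"


def swizzle(reg, pattern):
    """Return the components of reg selected by a pattern such as 'zzzz'."""
    return tuple(reg[COMPONENTS.index(c)] for c in pattern)


def mul_with_opdesc(src1, src2, src1_swz, src2_swz):
    """Component-wise multiply after applying each source swizzle.

    The destination mask is assumed to be xyzw, so all four results are kept.
    """
    a = swizzle(src1, src1_swz)
    b = swizzle(src2, src2_swz)
    return tuple(x * y for x, y in zip(a, b))


d01 = (0.25, 0.75, 0.0, 1.0)  # hypothetical texcoord, not from the thread
d25 = (0.0, 0.0, 0.5, 1.0)    # constant given in the post

# Opdesc 0x6: dst xyzw, src1 xyzw, src2 zzzz
d1A = mul_with_opdesc(d01, d25, "xyzw", "zzzz")
print(d1A)  # (0.125, 0.375, 0.0, 0.5)
```

With this opdesc every component of d01 would be scaled by d25.z, that is, by 0.5.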

In practice I must have done something wrong. None of the known opcodes behaved as expected.

They either had no effect on the texcoords or turned them into all the same value.

blargSNES -- SNES emu for 3DS
More cool stuff

Posted on 11-06-14 09:54 PM Link | #14

mul d1A, d25, d01 (0x6) <- okay
mul d1A, d01, d25 (0x6) <- doesn't work

Note the order of operands (each line was tested with a properly adjusted opdesc).


Posted on 11-15-14 11:40 AM Link | #23
I mentioned this on IRC already, but for reference: I think talking about this in a shader language just adds a second level of confusion, so I suggest providing the underlying bytecode in hex alongside each assembly line. That's probably a bit more effort, but at least we can then be sure that if something "odd" is happening, it is because of the GPU rather than some misdesign of the shader assembler.

For example, I think one issue with what you described is that you try to assign a floating point register to the second source argument of the mul instruction. However, as far as I am aware, src2 only has 5 bits available for addressing registers (indices 0x00 through 0x1F), which is why 0x25 cannot be indexed by it (unlike src1, which has 7 bits). Obviously, this issue only shows up in aemstro's shading language: had you worked with bytecode directly, you wouldn't even have thought of trying to assign 0x25 to src2.
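A quick way to check the arithmetic behind this, as a minimal Python sketch. The field widths are the ones stated above ("as far as I am aware"), not verified against hardware, and the helper name is made up.

```python
SRC1_BITS = 7  # width stated in the post for src1
SRC2_BITS = 5  # width stated in the post for src2


def fits(reg_index, bits):
    """True if reg_index can be encoded in an unsigned field of the given width."""
    return 0 <= reg_index < (1 << bits)


for name, index in (("d01", 0x01), ("d25", 0x25)):
    print(name, "src1:", fits(index, SRC1_BITS), "src2:", fits(index, SRC2_BITS))
# d01 src1: True src2: True
# d25 src1: True src2: False
```

Under these widths, 0x25 is only encodable as src1, which would be consistent with `mul d1A, d25, d01` working and `mul d1A, d01, d25` failing.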

By the way, nihstro is aware of this limitation and hence swaps the values of src1 and src2 if necessary.
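As a rough illustration of that kind of swap (not nihstro's actual code; the function, the operand representation, and the set of commutative ops are all assumptions made for this sketch), an assembler could do something like the following. The swizzles travel with their registers, which is why the opdesc has to be adjusted when the order changes.

```python
SRC1_BITS = 7  # width stated in the post for src1
SRC2_BITS = 5  # width stated in the post for src2

# Illustrative set of ops where operand order does not change the result.
COMMUTATIVE = {"mul", "add"}


def fits(reg_index, bits):
    """True if reg_index can be encoded in an unsigned field of the given width."""
    return 0 <= reg_index < (1 << bits)


def place_operands(op, a, b):
    """Pick the src1/src2 order for two (register_index, swizzle) operands.

    Returns (src1, src2). Swaps the operands only when the op is commutative
    and the swapped order is the one that fits the field widths.
    """
    if fits(a[0], SRC1_BITS) and fits(b[0], SRC2_BITS):
        return a, b
    if op in COMMUTATIVE and fits(b[0], SRC1_BITS) and fits(a[0], SRC2_BITS):
        return b, a
    raise ValueError(f"cannot encode operands for {op}: {a[0]:#x}, {b[0]:#x}")


# mul d1A, d01, d25 with swizzles xyzw / zzzz, as written in the thread
src1, src2 = place_operands("mul", (0x01, "xyzw"), (0x25, "zzzz"))
print(hex(src1[0]), src1[1], hex(src2[0]), src2[1])  # 0x25 zzzz 0x1 xyzw
```

If both operands need more than 5 bits, or the op is not commutative, the sketch simply raises an error. A real assembler might instead copy one operand into a temporary register first.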
