Main - Reverse-engineering - Vertex shader ops investigation
StapleButter |
Test operation is the following:
    [op] d1A, d01, d25 (0x6)

d01 holds a texcoord. d25 is a constant set to: 0, 0, 0.5, 1. d1A is a temporary register that is later mov'd to the texcoord output registers.

Opdesc 0x6 is declared as follows:

    .opdesc xyzw, xyzw, zzzz ; 0x6

So, if I got all that junk right, it should really be, in pseudo-GLSL:

    d1A.xyzw = d01.xyzw [op] d25.zzzz;

In practice I must have done something wrong. None of the known opcodes behaved as expected. They either had no effect on the texcoords or turned them all into the same value.
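For reference, here is a minimal Python sketch of what the pseudo-GLSL line above would be expected to compute if [op] were a component-wise multiply. The helper names (`swizzle`, `apply_op`) and the texcoord value in d01 are made up for illustration; only the d25 constant and the swizzle patterns come from the post. It models the intended semantics, not what the GPU actually did.

```python
def swizzle(reg, pattern):
    """Select components of a 4-vector according to a pattern such as 'zzzz'."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(reg[index[c]] for c in pattern)


def apply_op(op, src1, swz1, src2, swz2):
    """Apply a two-operand op component-wise after swizzling both sources."""
    a = swizzle(src1, swz1)
    b = swizzle(src2, swz2)
    return tuple(op(x, y) for x, y in zip(a, b))


d25 = (0.0, 0.0, 0.5, 1.0)    # constant from the post
d01 = (0.25, 0.75, 0.0, 1.0)  # hypothetical texcoord, for illustration only

# d1A.xyzw = d01.xyzw * d25.zzzz, assuming [op] is mul
d1A = apply_op(lambda x, y: x * y, d01, "xyzw", d25, "zzzz")
```

With the zzzz swizzle, every component of d25 reads as 0.5, so the expected result is simply the texcoord scaled by one half.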
StapleButter |
Okay.
    mul d1A, d25, d01 (0x6) <- okay
    mul d1A, d01, d25 (0x6) <- doesn't work

Note the order of operands (each with a properly adjusted opdesc).
neobrain |
I mentioned this on IRC already, but for reference: I think talking about this in a shader language just adds a second level of confusion, hence I suggest providing the underlying bytecode in hex alongside each assembly line. That's probably a bit more effort, but at least we can then be sure that if something "odd" is happening, it is because of the GPU rather than some misdesign of the shader assembler.
For example, I think one issue with what you described is that you try to assign a floating point register to the second source argument of the mul instruction. However, as far as I am aware, src2 only has 5 bits available for addressing registers, which is why 0x25 cannot be indexed by it (contrary to src1, which has 7 bits). Obviously, this issue only manifests itself in aemstro's shading language: had you worked with bytecode directly, you wouldn't even have thought of trying to assign 0x25 to src2. By the way, nihstro is aware of this limitation and hence swaps the values of src1 and src2 if necessary.
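To make the limitation concrete, here is a rough Python sketch of the kind of check an assembler could perform. It is not nihstro's or aemstro's actual code: the function names and the set of ops treated as swappable are assumptions for illustration, and only the field widths (7 bits for src1, 5 bits for src2) are taken from the explanation above. Register names such as d25 are treated as the raw index 0x25.

```python
SRC1_BITS = 7  # per the post: src1 has 7 bits for addressing registers
SRC2_BITS = 5  # per the post: src2 only has 5 bits

# Assumption: ops whose operands may be exchanged without changing the result.
COMMUTATIVE = {"mul", "add"}


def fits(reg_index, bits):
    """Return True if reg_index is representable in an unsigned field of `bits` bits."""
    return 0 <= reg_index < (1 << bits)


def place_operands(op, a, b):
    """Decide which operand goes into src1 and which into src2.

    a and b are (register_index, swizzle) pairs. The swizzle travels with its
    register, which corresponds to the "properly adjusted opdesc" after a swap.
    """
    if fits(a[0], SRC1_BITS) and fits(b[0], SRC2_BITS):
        return a, b
    if op in COMMUTATIVE and fits(b[0], SRC1_BITS) and fits(a[0], SRC2_BITS):
        return b, a  # swap so the wide index lands in src1
    raise ValueError(f"cannot encode {op}: operands do not fit src1/src2")


# mul d1A, d01, d25 as written: 0x25 does not fit in 5 bits, so swap.
src1, src2 = place_operands("mul", (0x01, "xyzw"), (0x25, "zzzz"))
```

Here `src1` ends up as `(0x25, "zzzz")` and `src2` as `(0x01, "xyzw")`, which matches the operand order that worked in the earlier post.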