The AMD Mantle Thread

selni · Nov 29, 2013

psoomah said:
● This conversation originated from my "Basically it means Mantle can address, and flexibly distribute the workload between, an APU and a GCN dGPU at the same time." statement, so how is Mantle allowing direct memory control "another issue entirely"?
● I don't get your 'huge difference in power' thing. We're not talking about a dual-core Sempron here; Kaveri is an extremely powerful high-performance processor in its own right, and 512 SPs aren't chopped liver. Even without Mantle, in that leaked hotel room video Kaveri was smoothly running BF4 with everything set to 'high'. Programmers can assign pretty much anything they want to it.

One is about allowing developers to do pretty much whatever they want with memory on the GPU, as opposed to DX. This is a Very Good Thing, and not many will argue otherwise. This lets you do stuff you couldn't do before. That's also obviously a prerequisite for the second part, which is a unified address space that lets the APU and dGPU essentially share pointers. While really neat in the APU/console case, it's much less interesting in the dGPU case - what advantage is there here as opposed to having an APU address space and dGPU address space?

What I mean by the difference in power is that the simplest way to do multi-GPU is AFR (alternate frame rendering). Obviously you can't do that here when one GPU is 6-7x faster than the other, so you need a new approach, and whatever you're running on the iGPU can't be memory-bandwidth heavy, etc. (because unified address space or not, it's going to be across PCIe).

Basically, CrossFiring an APU and a high-end dGPU still looks too hard to be worthwhile, Mantle or not.
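The AFR point can be checked with an idealized model. This is a minimal sketch, assuming perfectly pipelined alternate frame rendering with in-order presentation and ignoring CPU limits, PCIe transfers and frame pacing. The frame times (10 ms for the dGPU, 65 ms for an iGPU about 6.5x slower) are illustrative, not measurements. It also computes a best-case "perfect split" ceiling where each frame's work divides freely between the two GPUs.

```python
def afr_fps(frame_times_ms):
    """Idealized AFR: with n GPUs, GPU k renders frames k, k+n, k+2n, ...

    Frames are presented in order, so the slowest GPU sets the pace:
    the group delivers n frames per slowest-GPU frame time.
    """
    n = len(frame_times_ms)
    return n * 1000.0 / max(frame_times_ms)


def ideal_split_fps(frame_times_ms):
    """Best case if every frame could be split in proportion to GPU speed
    with zero transfer or synchronization cost: the GPUs' rates simply add."""
    return sum(1000.0 / t for t in frame_times_ms)


dgpu_ms, igpu_ms = 10.0, 65.0  # illustrative: the iGPU is ~6.5x slower

print(f"dGPU alone:      {1000.0 / dgpu_ms:6.1f} fps")
print(f"AFR dGPU + iGPU: {afr_fps([dgpu_ms, igpu_ms]):6.1f} fps")
print(f"Ideal split:     {ideal_split_fps([dgpu_ms, igpu_ms]):6.1f} fps")
```

In this model AFR drops to roughly 31 fps, far below the dGPU alone, and even a perfect split adds only about 15% before any PCIe traffic is paid for.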

selni · Nov 29, 2013

Abwx said:
Sorry, I confused the two issues. Right, it's a different matter once a dGPU is added.

Yeah, I think unified memory for an APU is a huge development - it's going to be very interesting to see how console (and possibly PC!) devs use it.

psoomah · Nov 30, 2013

selni said:
Again that discussion is about a dGPU + APU - obviously the APU can have truly unified memory, but what happens when the dGPU tries to access APU memory and vice versa?

If it's an R9 290 card - "A. 290 and 290X support system unified addressing."

Kaveri being a year late and having Steamroller 'B' cores = pulling selected system unification Excavator capabilities forward into Kaveri 2.0.

I suspect there will be no true 'Excavator' processors, just optimized 20nm Kaveri 2.0 processors with DDR4 memory capability, 6-8 core variants and maybe a 768 SP iGPU.

psoomah · Nov 30, 2013

selni said:
Yeah, I think unified memory for an APU is a huge development - it's going to be very interesting to see how console (and possibly PC!) devs use it.

If it's DICE, it's probably going to be PS4 HSA/hUMA optimizations developed hand in hand with Mantle and directly translatable to PC HSA capable APUs.

blastingcap · Nov 30, 2013

psoomah said:
Yep.

Looks to me like Microsoft is (was) planning to split DX, with their new and improved streamlined Xbox version reserved for their closed store ecosystem apps and games and the older version left pretty much as is for those Windows 7 and 8 freeloaders too cheap to pay for proper apps from their store. Like providing no SP2 for Windows 7 to drive people to Windows 8 so they acclimate to the 'metro' interface and go to Microsoft's store to buy apps. I just did a Windows 7 re-install with SP1 and the initial update was still 135 items. Yeah, frack u Microsoft.

Welcome to the bad timing awards. You're making your slow, ponderous, continuous face-planting drive to switch to an Apple-style closed ecosystem just as all heck breaks loose in the fast and agile viable alternative OS/platform universe.

Now you're too late. DICE has said flat out it's concentrating on Mantle and the PS4 for its future Frostbite R&D and optimizations. Good luck keeping up with the PS4 on EA games. To DICE/EA your proprietary DX programming requirement represents a drag on progress. In 5 years, when EA is doing write-once, fast and easy to-the-metal Mantle ports to every other platform in sight, they'll still have to muck around with your 'DX only' game programming requirement. What DICE and EA already know, the rest of the developers will soon realize. Mantle and PS4 optimizations are the future of much higher quality gaming and a substantially better profit picture at the same time.

And there in your rearview mirror is a menacing black van with a Mantle-wearing skull and bones motif driven by a crazily grinning Gabe Newell.

Sony doesn't like it either, but there was a price to pay for going with such a stock AMD hardware and middleware solution - being in no position to impede Mantle in any way other than a statement of 'no support'. But hey, all the future optimizations will benefit the PS4 over the Xbox One, so there is that.

Mantle is not to the metal!

krumme · Nov 30, 2013

psoomah said:
700% of the performance. But it's not that much, the 290X has 2,816 stream processors and the top Kaveri will have 512 stream processors, that's 550% based on that spec. Kaveri also has the full HSA/hUMA advantage, so the performance delta will be somewhat less than 550%.

Then there's the power savings when not high end gaming. There will be extensive power gating throughout, so the 290x won't spin up until it is needed. Playing Plants vs. Zombies: Garden Warfare for instance, which will be optimized for APUs, might not even spin up the 290x, or just spin up part of it. That saves a lot of heat and power usage.

It doesn't have to be 'really easy' to implement in the engine, it just needs to make a fair bit more money over time than it costs to implement.

For all developers, the bigger strategic picture provides substantial long-lasting cost savings and increased profits if Mantle becomes the programming model across OSes, hardware and platforms over time, which is exactly Johan Andersson's vision.

Well, the 290X runs at 1GHz and has plenty of memory bandwidth, while Kaveri will end up at 600-700MHz and be memory-bandwidth limited, so the 700% is about right.
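The clock argument can be put into numbers. This is a back-of-the-envelope sketch, assuming peak shader throughput scales with stream processors x clock and using only the figures quoted in this exchange (2,816 SPs at 1GHz for the 290X, 512 SPs at 600-700MHz for Kaveri). Memory bandwidth is not modeled.

```python
def peak_ratio(sp_a, mhz_a, sp_b, mhz_b):
    """Ratio of peak shader throughput, modeled as stream processors x clock.

    Memory bandwidth is deliberately ignored; if Kaveri is bandwidth
    limited, that would only widen the gap further.
    """
    return (sp_a * mhz_a) / (sp_b * mhz_b)


sp_only = 2816 / 512                       # the 550% spec figure
at_700 = peak_ratio(2816, 1000, 512, 700)  # 290X at 1GHz vs Kaveri at 700MHz
at_600 = peak_ratio(2816, 1000, 512, 600)  # 290X at 1GHz vs Kaveri at 600MHz

print(f"SPs only: {sp_only:.2f}x, with clocks: {at_700:.2f}x to {at_600:.2f}x")
```

Clocks alone move the 5.5x spec ratio to roughly 7.9-9.2x, so a gap of 700% or more is plausible before bandwidth is even considered.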

It's perhaps nitpicking, but it goes to show that the asymmetric use of APU and dGPU is not worth it.

It's typical old AMD style to promote all the technical possibilities without prioritizing what matters most for the business.

Fortunately this Mantle launch is a big exception to that. Asymmetric is just 0.05% of the new possibilities for the devs, and it's not worth discussing further.

Take, e.g., this perspective:
Kaveri at 25-35W will probably be able to play BF4 on medium with Mantle, on a 768p screen, in intense 64-player battles. That kind of redefines what a low-end gaming machine can do, and rocks the market situation imho.

Kaveri with Mantle could be a killer for NV's profitable low-end/midrange mobile graphics market.

Actually I think AMD is lucky they could not launch Kaveri earlier. I hope they insist on reviews using Mantle in BF4, as it shows what is possible with the APU.

blastingcap · Nov 30, 2013

Would it be technically feasible for the iGPU/APU to do something like, say, TressFX or physics calculations while the dGPU does the remainder?

krumme said:
Well, the 290X runs at 1GHz and has plenty of memory bandwidth, while Kaveri will end up at 600-700MHz and be memory-bandwidth limited, so the 700% is about right.

It's perhaps nitpicking, but it goes to show that the asymmetric use of APU and dGPU is not worth it.

It's typical old AMD style to promote all the technical possibilities without prioritizing what matters most for the business.

Fortunately this Mantle launch is a big exception to that. Asymmetric is just 0.05% of the new possibilities for the devs, and it's not worth discussing further.

Take, e.g., this perspective:
Kaveri at 25-35W will probably be able to play BF4 on medium with Mantle, on a 768p screen, in intense 64-player battles. That kind of redefines what a low-end gaming machine can do, and rocks the market situation imho.

Kaveri with Mantle could be a killer for NV's profitable low-end/midrange mobile graphics market.

Actually I think AMD is lucky they could not launch Kaveri earlier. I hope they insist on reviews using Mantle in BF4, as it shows what is possible with the APU.

krumme · Nov 30, 2013

blastingcap said:
Mantle is not to the metal!

But it's apparently close enough. It looks like they hit the right balance in how thin the driver is. I am pretty sure the drivers for PS4, Xbox and Mantle are very similar, and that Sony and MS don't do much more programming directly to the metal.
Sony can probably do some more with their memory model and what seems to me to be a more aggressive HSA-like implementation.

krumme · Nov 30, 2013

Here is my guess of a scenario for the future:

1. The devs need to follow DICE, because otherwise DICE has a competitive advantage in selling premium games to a larger base on the PC, but also because the consoles are mantalites. The result is that Mantle is going to be THE API for high-performance games. At that point DX rules for everything else.

2. Sony will take the lead on the consoles, and will fight for its share vs. the PC market by improving the PS4 gradually, so it can play e.g. BF4 at 1080p and higher quality while staying backwards compatible. PS4 -> PS4.1 in 2 years: faster-IPC Jaguar-style cores, more cgn 1.2. At the same time HSA will be fully integrated.

3. AMD uses the position to establish Mantle as the standard. As they control where the API is thin and where it's thick, they can continuously keep an advantage over e.g. NV, even if NV were to support Mantle. The same goes for Intel. It's exactly the dynamics (floating) of the level between driver and API that will ensure AMD keeps the edge, as they can adapt the level to their hardware.

4. At some point the PS4 needs to go to PS5. Here there could be a huge change, as ARM is probably in the running. Mantle is established - also on phones - and perhaps abandoning x86 can transform the entire market. At that point the consoles and the phones are probably the window to the internet, games and entertainment in many households. It's not the PC any more. At that time both Windows and x86 die on the consumer side - it's not only a tax, it's simply not efficient enough and is out of touch with consumer behavior.

Of course this scenario is 100% certain

blastingcap · Nov 30, 2013

krumme said:
Here is my guess of a scenario for the future:

1. The devs need to follow DICE, because otherwise DICE has a competitive advantage in selling premium games to a larger base on the PC, but also because the consoles are mantalites. The result is that Mantle is going to be THE API for high-performance games. At that point DX rules for everything else.

2. Sony will take the lead on the consoles, and will fight for its share vs. the PC market by improving the PS4 gradually, so it can play e.g. BF4 at 1080p and higher quality while staying backwards compatible. PS4 -> PS4.1 in 2 years: faster-IPC Jaguar-style cores, more cgn 1.2. At the same time HSA will be fully integrated.

3. AMD uses the position to establish Mantle as the standard. As they control where the API is thin and where it's thick, they can continuously keep an advantage over e.g. NV, even if NV were to support Mantle. The same goes for Intel. It's exactly the dynamics (floating) of the level between driver and API that will ensure AMD keeps the edge, as they can adapt the level to their hardware.

4. At some point the PS4 needs to go to PS5. Here there could be a huge change, as ARM is probably in the running. Mantle is established - also on phones - and perhaps abandoning x86 can transform the entire market. At that point the consoles and the phones are probably the window to the internet, games and entertainment in many households. It's not the PC any more. At that time both Windows and x86 die on the consumer side - it's not only a tax, it's simply not efficient enough and is out of touch with consumer behavior.

Of course this scenario is 100% certain

WOW. Just... wow. I wish I could put your entire post in my signature line.

Btw, you keep calling GCN "cgn" which isn't right. GCN is short for "Graphics Core Next." "cgn" is short for "come get nekkid."

krumme · Nov 30, 2013

blastingcap said:
Would it be technically feasible for the iGPU/APU to do something like, say, TressFX or physics calculations while the dGPU does the remainder?

Yes it is. It's up to the dev to decide. Look at the execution model: the devs decide which queue the feature should go into. Now imagine the execution model with the iGPU and dGPU as two layers. Actually there could be many devices, not only 2. Still, the dev is in control of where the feature should go.
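To picture the multi-device case, here is a toy model. It is not the Mantle API: `Device`, `submit` and the queue names are all hypothetical. It only shows where the placement decision lives when queues are explicit; synchronization and moving results across PCIe, which selni raised earlier, are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy stand-in for a GPU exposing several command queues (hypothetical)."""
    name: str
    queues: dict = field(default_factory=lambda: defaultdict(list))

    def submit(self, queue_kind: str, work: str) -> None:
        # The application chooses the device and the queue; nothing here
        # reroutes work behind its back, which is the point of the model.
        self.queues[queue_kind].append(work)


dgpu = Device("dGPU")
igpu = Device("iGPU")

# The developer decides placement explicitly.
dgpu.submit("graphics", "main scene")
dgpu.submit("graphics", "shadow maps")
igpu.submit("compute", "TressFX hair simulation")
igpu.submit("compute", "physics step")

for dev in (dgpu, igpu):
    for kind, items in dev.queues.items():
        print(f"{dev.name}/{kind}: {items}")
```

Adding a third device is just another `Device` instance; the decision of what runs where stays in application code.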

(Btw, I think your question and my answer could fit in your signature, as the execution model is the most radical innovation in Mantle and shows the potential.)

blastingcap · Nov 30, 2013

krumme said:
(Btw, I think your question and my answer could fit in your signature, as the execution model is the most radical innovation in Mantle and shows the potential.)

I want to put your answer in my signature line for religious purposes.

krumme · Nov 30, 2013

I don't know if any of you remember the Q&A where one of the guys (I think it was someone from Oxide) said that with DX the driver was a black box?

If we look at Mantle and the execution model it's easy to see why. When programming for DX, DX decides where the feature should go. But the problem for the programmer is simply that he doesn't know where DX lines the feature up!

For low-performance games that is not a significant dealbreaker, but for programming a high-end, high-performance engine it's simply extremely difficult, costly and frustrating. Why?

Because optimizing for that black box to get the needed performance in DX means lots of expensive trial and error. And you are still left with a bad solution.

And as was said about DX, often it was not possible for the programmer to know if the problem was in the driver or in the programming. That's a very frustrating working methodology, and it adds a lot of cost. With Mantle they said that mostly they found out the problem was in the programming. And of course they preferred that! They are in control and can focus.

The benefit of the execution model is therefore not only the straight benefit of performance but also that it motivates the programmer.

3DVagabond · Nov 30, 2013

krumme said:
I don't know if any of you remember the Q&A where one of the guys (I think it was someone from Oxide) said that with DX the driver was a black box?

If we look at Mantle and the execution model it's easy to see why. When programming for DX, DX decides where the feature should go. But the problem for the programmer is simply that he doesn't know where DX lines the feature up!

For low-performance games that is not a significant dealbreaker, but for programming a high-end, high-performance engine it's simply extremely difficult, costly and frustrating. Why?

Because optimizing for that black box to get the needed performance in DX means lots of expensive trial and error. And you are still left with a bad solution.

And as was said about DX, often it was not possible for the programmer to know if the problem was in the driver or in the programming. That's a very frustrating working methodology, and it adds a lot of cost. With Mantle they said that mostly they found out the problem was in the programming. And of course they preferred that! They are in control and can focus.

The benefit of the execution model is therefore not only the straight benefit of performance but also that it motivates the programmer.

Yeah, programming in DX is like trying to hit a moving target while blindfolded.

redhotiron2004 · Dec 2, 2013

My question is whether Mantle, or even part of its improvements, would work on older generations such as the ATI 4000, 5000 and 6000 series graphics cards as well?

I have read that GCN is important for Mantle to work. So it means that NVIDIA can never implement it, because they would need to change their architecture, which could never happen. So, in spite of AMD claiming that anyone can use it, it actually means anyone having a GCN architecture.

3DVagabond · Dec 2, 2013

redhotiron2004 said:
My question is whether Mantle, or even part of its improvements, would work on older generations such as the ATI 4000, 5000 and 6000 series graphics cards as well?

Whether it'll work or not, it doesn't appear that's an option. Likely not though, no matter.

Noctifer616 · Dec 2, 2013

redhotiron2004 said:
My question is whether Mantle, or even part of its improvements, would work on older generations such as the ATI 4000, 5000 and 6000 series graphics cards as well?

I have read that GCN is important for Mantle to work. So it means that NVIDIA can never implement it, because they would need to change their architecture, which could never happen. So, in spite of AMD claiming that anyone can use it, it actually means anyone having a GCN architecture.

Mantle isn't a pure low level API like you can have on consoles, it still has an abstraction layer. They did this so that Mantle will work with future generations of AMD graphics architecture.

As long as a graphics architecture supports all Mantle features, Mantle can work with that architecture.

And AMD has publicly stated that Mantle isn't vendor locked and can work on other graphics architectures.

Just watch the videos from the AMD summit.

itsmydamnation · Dec 2, 2013

redhotiron2004 said:
My question is whether Mantle, or even part of its improvements, would work on older generations such as the ATI 4000, 5000 and 6000 series graphics cards as well?

An AMD driver dev on B3D said that bindless textures are basically a minimum requirement for Mantle compatibility. That rules out anything before GCN on the AMD side.

krumme · Dec 2, 2013

Tech Report, part 3 (don't think it was posted here)
http://techreport.com/review/25683/delving-deeper-into-amd-mantle-api/3

Some old quotes, but here goes:

---------

Nixxes' Katsman: "'very early figures from Thief' (which is 'not fully running on Mantle yet') showed a big reduction in draw call overhead. 'Before, we would often see about 40% of the CPU time stuck in the driver, in D3D, or in various threads,' he said. 'The early measurements we did, right now we have that down to about a fifth of that.'"
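What "about a fifth" could mean for frame rate can be estimated with Amdahl's law. This is a sketch assuming the frame is entirely CPU-bound, the rest of the game's CPU work is unchanged, and the 40% sits on the critical path of a single frame. In practice that time is spread across threads, so the real gain may differ.

```python
def cpu_bound_speedup(driver_share, shrink_factor):
    """Amdahl's law applied to the driver/API slice of a CPU-bound frame.

    driver_share:  fraction of CPU frame time spent in the driver/D3D
    shrink_factor: how many times smaller that slice becomes
    Everything else the game does on the CPU is assumed unchanged.
    """
    new_frame_fraction = (1.0 - driver_share) + driver_share / shrink_factor
    return 1.0 / new_frame_fraction


# Katsman's numbers: ~40% in the driver, cut to "about a fifth of that" (8%).
print(f"{cpu_bound_speedup(0.40, 5):.2f}x")
```

The frame shrinks from 100% to 68% of its old CPU time, i.e. roughly a 1.47x higher CPU-limited frame rate under these assumptions.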

--------

"DICE's Andersson extrapolated upon that same notion in his keynote, saying that, with Mantle, the CPU "should never really be a bottleneck for the GPU anymore." In a separate demonstration, Oxide showed their Mantle-enabled space game suffering no frame rate hit when the FX-8350 processor on which it ran was underclocked to 2GHz, or half its base speed. (Graphics processing in that demo was handled by a Radeon R9 290X.)"

----

"The reduction in draw call overhead also means more draw calls can be issued per frame. Riguer said Mantle raises the draw call limit by an order of magnitude to "at least" 100,000 draw calls per frame "at reasonable frame rates." This isn't just theoretical—Oxide showed their space game demo actually hitting 100,000 draw calls per frame. Andersson, who was in the audience for that presentation, was impressed enough to tweet about the demo."

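To see why per-call overhead matters at that scale, here is a rough per-draw-call CPU budget. It assumes a single submission thread; the 30 and 60 fps targets are illustrative, since "reasonable frame rates" isn't quantified.

```python
def per_call_budget_ns(draw_calls_per_frame, fps, submit_share=1.0):
    """Average CPU time available per draw call, in nanoseconds.

    submit_share is the fraction of one core's frame time spent issuing
    draws; 1.0 is an upper bound where the thread does nothing else.
    """
    frame_ns = 1e9 / fps
    return frame_ns * submit_share / draw_calls_per_frame


for fps in (30, 60):
    print(f"{fps} fps: {per_call_budget_ns(100_000, fps):.0f} ns per draw call")
```

Even if one core did nothing but issue draws, 100,000 calls per frame leaves only a few hundred nanoseconds per call.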
-----

Katsman: "The APIs we have right now, they just allow us to queue synchronous workloads. We say, "draw some triangles," and then, "do some compute," and the driver can try to be a little smart, and maybe it'll overlap some of that. But for the most part, it's serial, and where we're doing one thing, it's not doing other things.
With Mantle . . . we can schedule compute work in parallel with the normal graphics work. That allows for some really interesting optimizations that will really help your overall frame rate and how . . . with less power, you can achieve higher frame rates.

What we'd see, for example—say we're rendering shadow maps. There's really not much compute going on. . . . Compute units are basically sitting there being idle. If, at the same time, we are able to do post-processing effects—say maybe even the post-processing from a previous frame, or what we could do in Tomb Raider, [where] we have TressFX hair simulations, which can be quite expensive—we can do that in parallel, in compute, with these other graphics tasks, and effectively, they can become close to zero cost.

If we guessed that maybe only 50% of that compute power was utilized, the theoretical number—and we won't reach that, but in theory, we might be able to get up to 50% better GPU performance from overlapping compute work, if you would be able to find enough compute work to really fill it up.

The 50% figure is a theoretical best-case scenario, but Katsman added, "It seems quite realistic that you would get maybe 20% additional GPU performance out of optimizations like that.""
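The 50% ceiling follows from a simple model. This is a sketch assuming the compute work is independent of the graphics work and can use idle compute units perfectly, with no contention for memory bandwidth or caches; the 10 ms graphics pass and the compute amounts are illustrative.

```python
def serial_ms(graphics_ms, compute_ms):
    """Graphics then compute, one after the other (the synchronous model)."""
    return graphics_ms + compute_ms


def overlapped_ms(graphics_ms, alu_utilization, compute_ms):
    """Compute fills idle ALU capacity while graphics runs; any excess runs after.

    graphics_ms:     duration of the graphics work
    alu_utilization: fraction of compute units the graphics work keeps busy
    compute_ms:      compute work, as the time it would take alone on the full GPU
    """
    idle_capacity_ms = graphics_ms * (1.0 - alu_utilization)
    leftover_ms = max(0.0, compute_ms - idle_capacity_ms)
    return graphics_ms + leftover_ms


# Katsman's guess: graphics keeps about 50% of the compute units busy.
for compute_ms in (2.0, 5.0, 8.0):
    s = serial_ms(10.0, compute_ms)
    o = overlapped_ms(10.0, 0.5, compute_ms)
    print(f"compute {compute_ms:4.1f} ms: serial {s:4.1f} ms, "
          f"overlapped {o:4.1f} ms, speedup {s / o:.2f}x")
```

The 1.5x case only happens when there is exactly enough independent compute work to fill the idle half. With less compute the gain is smaller, and with more the extra work spills past the graphics pass, which is why the realistic figure is lower than the theoretical one.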

krumme · Dec 2, 2013

itsmydamnation said:
An AMD driver dev on B3D said that bindless textures are basically a minimum requirement for Mantle compatibility. That rules out anything before GCN on the AMD side.

That excludes Fermi too, but not Kepler, as I understand it.

moonbogg · Dec 2, 2013

Not being vendor locked means nothing. Nvidia won't even comment on Mantle, let alone use it, ever.

PPB · Dec 2, 2013

moonbogg said:
Not being vendor locked means nothing. Nvidia won't even comment on Mantle, let alone use it, ever.

That's their loss, really. In any CPU-bound scenario with DX, they will be behind a comparable Mantle solution.

blastingcap · Dec 2, 2013

Edit: never mind, someone else answered my question

blackened23 · Dec 2, 2013

This seems worrisome:

Finally, adding Mantle support to current game engines, as Nixxes did with the version of Unreal Engine 3 used by Thief, can be a challenge. "Native D3D ports will not magically get much higher performance," explained Katsman. "If you emulate the same system on top of Mantle, you will not get much better performance." Fully optimizing an existing engine for Mantle seems to involve breaking and rewriting some chunks of that engine to take advantage of the new development model. But here again, Katsman believes the performance improvements make the effort worthwhile.

At the end of http://techreport.com/review/25683/delving-deeper-into-amd-mantle-api/3

blastingcap · Dec 2, 2013

If Mantle gets into the actual big game engines, not just Frostbite but Unity, CryEngine, id Tech, and especially Unreal...
