| Amiga community support ideas This forum is for the open discussion of new thoughts and ideas intended to help the Amiga community. What do we need? What do we want? |
#31 | ||||||||
|
Sockologist
|
ARM is very much the present of computing, let alone the future. I wouldn't write off the x86 though. The current generation of these processors is a far cry from the clunky old components. The modern 64-bit implementations are actually quite nice and extremely high performance.
__________________
OCA This isn't SCSI... This is SATA!!! I have CDO. It's like OCD except all the letters are in ascending order. The way they should be. Core2 Quad Q9450 2.66GHz / X48T / 4GB DDR3 / nVidia GTX275 / Linux x64, AROS, Win64 A1XE 800MHz / 512MB / Radeon 9200 / OS4.1 A1200T BPPC 240MHz / 256MB / Permedia 2 / OS 3.1 - OS3.9, OS4 A1200T Apollo 1240 28MHz / 32MB / Mediator1200 / Voodoo 3000 / OS3.9 A1200D Apollo 1240 25MHz (ejector seat ROM edition) / 32MB |
||||||||
|
|
|
|
|
#32 | |||||||||
|
Technoid
Join Date: Sep 2011
Location: Edinburgh, Scotland
Posts: 342
|
Quote:
Still. Check this out: http://www.youtube.com/watch?v=oLte5f34ya8 This is the sort of trick a new Amiga ought to aim for. Forget GPUs; massive parallelism is the way to go. Maybe a single standard supervisor CPU with a whole load of barrel co-processors similar to the UltraSPARC T1. The throughput of those things is incredible, given the right workloads. They threw away such complexities as out-of-order execution in exchange for fine-grained hardware multithreading, thus all but hiding cache-miss latencies. This strategy would be perfect for highly parallelisable workloads such as ray tracing. We could call them Juggler Chips!
__________________
Signature intentionally left blank |
|||||||||
|
|
|
|
|
#33 | |||||||||
|
Kindred of Babble-on
Join Date: Dec 2003
Location: Serbia
Posts: 2,532
|
Quote:
__________________
You`re here, Noŷs. |
|||||||||
|
|
|
|
|
#34 | ||||||||
|
Cult Member
Join Date: Oct 2009
Posts: 553
|
While everyone is talking about 'new' hardware, I have to ask: What about the software?
|
||||||||
|
|
|
|
|
#35 | |||||||||
|
Sockologist
|
Quote:
Full ray tracing is a tough one because threads tend to diverge in their flow of execution, but it is far from impossible with modest GPUs today. Then there is ray marching, which is the poor man's next best thing, and GPUs can do that entirely in real time. In your browser, even, if you happen to have a WebGL-capable one and supported hardware.
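For anyone who hasn't met ray marching, here is a minimal CPU-side sketch in Python of the sphere-tracing variant, assuming the scene is described by a signed distance function. The names `sphere_sdf` and `ray_march` and the scene itself (one unit sphere three units down the z axis) are made up for illustration; a WebGL demo would run the same loop per pixel in a fragment shader.

```python
import math

def sphere_sdf(p, centre=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to the surface of a (hypothetical) sphere."""
    return math.dist(p, centre) - radius

def ray_march(origin, direction, sdf, max_steps=64, eps=1e-4, max_dist=100.0):
    """March along a ray, stepping by the SDF value each time.

    `direction` is assumed to be unit length. Returns the distance
    travelled when the ray gets within `eps` of a surface, or None
    if it runs out of steps or travels past `max_dist`.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)
        if d < eps:
            return t          # close enough to the surface: call it a hit
        t += d                # safe step: nothing is nearer than d
        if t > max_dist:
            break
    return None

hit = ray_march((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = ray_march((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
```

Every ray runs the same short loop with no scene traversal structure, which is part of why it sits more comfortably on a GPU than full ray tracing does.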
|||||||||
|
|
|
|
|
#36 | |||||||||
|
Technoid
Join Date: Sep 2011
Location: Edinburgh, Scotland
Posts: 342
|
Quote:
"However, adaptive ray-casting upon the projection plane and adaptive sampling along each individual ray do not map well to the SIMD architecture of modern GPU; therefore, it is a common perception that this technique is very slow and not suitable for interactive rendering. Multi-core CPUs, however, are a perfect fit for this technique and may benefit marvelously from an adaptive ray-casting strategy, making it suitable for interactive ultra-high quality volumetric rendering." http://en.wikipedia.org/wiki/Volume_ray_casting
Here Intel are doing real-time ray tracing to show off their Nehalem core: http://www.youtube.com/watch?v=ianMNs12ITc Obviously that is an expensive top-of-the-range CPU there (or rather, four of them). It makes me wonder what could be done with a big bunch of ARM chips. GPUs can be made to do this but you'd not be using them optimally. Likewise, even a general-purpose chip like the Nehalem is a lot more complex than necessary. To sum it up, I think GPUs are designed for a task too specific, while mainstream CPUs are designed for tasks too general. I wonder if this goes some way to explain AMD's strategy with their Bulldozer chips, which seems to have confused a lot of people.
|||||||||
|
|
|
|
|
#37 | ||||||||
|
Cult Member
Join Date: Oct 2007
Posts: 800
|
@Mrs Beanbag
I do agree that FPGAs represent a big opportunity to change how flexible computing architecture can be, but the line I quoted above doesn't make sense. The very reason GPGPU is a growing field is the massively parallel nature of modern GPUs. GPU computing and FPGA computing are not identical, but they are clearly related. I've got good news for you: your Juggler chips already exist. http://www.eetimes.com/electronics-p...into-its-FPGAs
__________________
"OS5 is so fast that only Chuck Norris can use it." AeroMan |
||||||||
|
|
|
|
|
#38 | ||||||||||
|
Technoid
Join Date: Sep 2011
Location: Edinburgh, Scotland
Posts: 342
|
Quote:
GPUs are of course massively parallel, but they are optimised for a specific sort of workload, although they are becoming more general purpose lately. Quote:
||||||||||
|
|
|
|
|
#39 | |||||||||
|
Sockologist
|
Quote:
|||||||||
|
|
|
|
|
#40 | |||||||||
|
Technoid
Join Date: Sep 2011
Location: Edinburgh, Scotland
Posts: 342
|
Quote:
I'm not saying it can't be done, or even done well; I'm only saying it's not optimal, because the chips are designed for something else and to make them do it you have to work around their limitations. In other words, if they are so good at doing ray tracing already, imagine if they were actually designed for ray tracing instead of rasterisation... it seems to me that the complexity of graphics is getting to the point these days where ray tracing could actually be faster! But a mainstream CPU is also far more complex than it needs to be, having been optimised for single-threaded performance, which is the opposite of what we want. I mean, look at this: http://www.youtube.com/watch?v=x5aXxJGefxU 100% CPU work, and "Running in an E2140 1.6GHZ", that's not a lot of CPU, doesn't even have hyperthreading. Now if you had 16 such cores instead of only two, each with 8-way hyperthreading instead of superscalar execution... this is where CPU and GPU would meet in the middle. The compromises made for streaming processors no longer seem appropriate.
|||||||||
|
|
|
|
|
#41 | ||||||||
|
Sockologist
|
Shader engine is an obsolete term. Modern GPUs are massively parallel stream processors that are Turing complete. You can use them to perform any inherently parallel task you like, provided you know how to code it. If you program them to ray trace, that is exactly what they do. Or you could program them to perform all-pairs n-body particle interaction, or brute force md5 sums. They are nothing whatsoever like the fixed-function, discrete shader unit graphics chips of a few years ago, any more than a modern multicore x64 is like a 286. Their main application is graphics processing because that is the sort of inherently parallel task they excel at, whether it is simple rasterization or complex per-pixel shading. However, you need to look at this in the abstract. It can be any algorithm operating on a set of data using thread-per-unit-data parallelism. There is no shader; the shader is merely a software construct running on a truly general purpose (algorithmically speaking) stream processor. And it crushes CPUs for this.
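To make "thread per unit data" concrete, here is a rough Python sketch of the all-pairs n-body case, assuming 2D positions, units where G = 1, and a small softening term to avoid division by zero. The function names are invented for the example. On a GPU each call to `acceleration` would be one thread; here a plain list comprehension stands in for the launch.

```python
def acceleration(i, positions, masses, softening=1e-3):
    """Work for one 'thread': acceleration on body i from every other body."""
    xi, yi = positions[i]
    ax = ay = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue                      # a body exerts no force on itself
        dx, dy = xj - xi, yj - yi
        r2 = dx * dx + dy * dy + softening * softening
        inv_r3 = r2 ** -1.5               # 1 / r^3, softened
        ax += masses[j] * dx * inv_r3
        ay += masses[j] * dy * inv_r3
    return ax, ay

def all_pairs(positions, masses):
    """Stand-in for a kernel launch: one independent work item per body."""
    return [acceleration(i, positions, masses) for i in range(len(positions))]

# Two equal masses one unit apart pull on each other equally and oppositely.
acc = all_pairs([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0])
```

Each work item reads the whole position array but writes only its own result, so the items never need to talk to each other.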
||||||||
|
|
|
|
|
#42 | |||||||||
|
Cult Member
Join Date: Oct 2007
Posts: 800
|
Quote:
In the meantime, here's another couple of links about massively parallel chips that you may be interested in following up on: http://www.greenarraychips.com/ http://www.tilera.com/
|||||||||
|
|
|
|
|
#43 | ||||||||
|
Technoid
Join Date: Sep 2011
Location: Edinburgh, Scotland
Posts: 342
|
Well, if that is the case then a modern GPU *is* a CPU, the only difference being the way it is connected to the memory. But I still don't think that is quite the case. As I understand it, a GPU is given a "kernel", which is a small program that is run for every piece of data that comes in on the stream. They don't run a "full program" like a CPU does, but continually apply the same function over and over on the incoming data. Which is very useful. But its "Turing completeness" is limited to the bounds of the kernel; that is, you can branch and loop as much as you like within a kernel, but you can't arbitrarily call one kernel from another. Also the data goes in one end and out the other, which is very useful if you can split your dataset up into loads of small independent chunks. If you're doing rasterisation this is very easy because every triangle can be done independently. Maybe there's some cunning trick to it but I don't know how ray tracing would work in that scheme, because you want to do blocks of pixels in parallel rather than triangles or objects, so every pipeline needs access to the complete scene structure.
But theory aside, I've been putting "real time ray tracing" into Youtube and I get a lot of stuff on CPUs and GPUs, and a lot of it is very impressive, but I don't see that GPUs actually have any obvious advantage over CPUs so far.
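One possible arrangement for the scheme above, sketched in Python rather than real GPU code: make the pixel the unit of data and pass the whole scene to every kernel invocation as shared read-only input. Everything here is an assumption for illustration: the `SCENE` list of spheres, the pinhole camera at the origin, and the names `hit_sphere`, `kernel` and `render`. The kernel only reports which sphere each pixel sees.

```python
import math

# Hypothetical scene: (centre, radius) spheres, shared read-only by every pixel.
SCENE = [((0.0, 0.0, 3.0), 1.0), ((2.0, 0.0, 5.0), 1.0)]

def hit_sphere(origin, direction, centre, radius):
    """Distance along a unit-length ray to the nearer intersection,
    or None if the ray misses or that intersection is behind the origin."""
    oc = tuple(o - c for o, c in zip(origin, centre))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None

def kernel(pixel, width, height, scene):
    """Runs once per pixel: reads the whole scene, writes one value
    (index of the nearest sphere hit, or -1 for background)."""
    x, y = pixel
    # Map the pixel centre onto an image plane at z = 1.
    u = (x + 0.5) / width * 2.0 - 1.0
    v = 1.0 - (y + 0.5) / height * 2.0
    length = math.sqrt(u * u + v * v + 1.0)
    direction = (u / length, v / length, 1.0 / length)
    nearest = None
    for index, (centre, radius) in enumerate(scene):
        t = hit_sphere((0.0, 0.0, 0.0), direction, centre, radius)
        if t is not None and (nearest is None or t < nearest[0]):
            nearest = (t, index)
    return -1 if nearest is None else nearest[1]

def render(width, height, scene):
    """Stand-in for the kernel launch: map the kernel over the pixel stream."""
    pixels = [(x, y) for y in range(height) for x in range(width)]
    return [kernel(p, width, height, scene) for p in pixels]

image = render(3, 3, SCENE)
```

The data stream in this sketch is just pixel coordinates; the scene is never split up, only read, so the pixels stay independent of one another.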
||||||||
|
|
|
|
|
#44 | ||||||||
|
Sockologist
|
The modern GPU is basically a very large collection of arithmetic/logic units. Think of these as very simple CPU cores where stuff like conditional branching is expensive but data processing is not. Then imagine them in clusters, each cluster running the same code but on different data. Not like a SIMD unit, but as an array of cores, able to branch independently but optimal when in step. Now imagine a set of work supervisors that oversee them, detecting when clusters are waiting for IO and able to switch the thread group they are executing for one that is ready to go. Finally, imagine these being served by multiple memory controllers on demand. That's your basic GPU today. Current GPUs can even execute multiple kernels concurrently, so if one cannot occupy all the stream units, you can run more.
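A toy cost model in Python for the "able to branch independently but optimal when in step" part. It assumes a cluster that hits an if/else runs each side once whenever at least one of its cores needs it, with the other cores idling through that side. That is a simplification for illustration, not a description of any particular vendor's hardware, and `cluster_cost` is a made-up name.

```python
def cluster_cost(conditions, then_cost, else_cost):
    """Cycles one cluster spends on an if/else under a simple masking model.

    `conditions` holds one boolean per core: True means that core
    takes the 'then' side. A side costs its full cycle count if any
    core in the cluster needs it.
    """
    cost = 0
    if any(conditions):
        cost += then_cost      # at least one core takes the 'then' side
    if not all(conditions):
        cost += else_cost      # at least one core takes the 'else' side
    return cost

in_step_then = cluster_cost([True] * 8, then_cost=10, else_cost=30)
in_step_else = cluster_cost([False] * 8, then_cost=10, else_cost=30)
diverged = cluster_cost([True] * 7 + [False], then_cost=10, else_cost=30)
```

With these numbers a cluster in step pays 10 or 30 cycles, while a single core going the other way pushes the whole cluster to 40.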
||||||||
|
|
|
|
|
#45 | |||||||||
|
Technoid
Join Date: Sep 2011
Location: Edinburgh, Scotland
Posts: 342
|
Quote:
UltraSPARC T1 took a more holistic approach. Knowing servers always run umpteen threads at once, there's really no point in all that extra complexity to get the most single-threaded performance. So they ditched it all and instead made a CPU core that could switch threads on every cycle. They only have to have a register file for each thread and rotate them round (hence the term "barrel processor"), and you can get rid of a whole load of complexity and go back to a very simple core that only does one instruction at once, which gives you room for loads more cores on a die, and cache misses can be made to vanish into the background. Single-thread performance is terrible, but if you can throw enough threads at it, it can keep up with CPUs that run at far faster clock speeds. The T1 typically ran at 1.2GHz and, given the right sort of workloads, could keep pace with 3GHz Xeons.
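Here is a small Python simulation of the barrel idea, to show how the misses "vanish into the background". It is a deliberately crude model and not the real T1 pipeline: one core issues at most one instruction per cycle, rotates round its hardware threads, and skips any thread still waiting on a miss. The miss pattern (every Nth instruction stalls its thread for a fixed number of cycles) and the name `run_barrel` are assumptions made up for the example.

```python
def run_barrel(num_threads, instructions_per_thread, miss_every, miss_latency):
    """Return the cycles one barrel core needs to finish every thread.

    Each cycle the core issues one instruction from the next thread
    (in rotation) that has work left and is not waiting on a miss.
    Every `miss_every`-th instruction of a thread is treated as a
    cache miss that stalls that thread for `miss_latency` cycles.
    """
    remaining = [instructions_per_thread] * num_threads
    ready_at = [0] * num_threads      # first cycle each thread may issue again
    issued = [0] * num_threads
    cycle = 0
    nxt = 0                           # where the rotation resumes
    while any(remaining):
        for offset in range(num_threads):
            t = (nxt + offset) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                issued[t] += 1
                if issued[t] % miss_every == 0:
                    ready_at[t] = cycle + 1 + miss_latency
                nxt = (t + 1) % num_threads
                break
        cycle += 1                    # an idle cycle if no thread was ready
    return cycle

one_thread = run_barrel(1, 4, miss_every=2, miss_latency=3)
four_threads = run_barrel(4, 4, miss_every=2, miss_latency=3)
```

In this model one thread takes 7 cycles for its 4 instructions because the core sits idle during the miss, while four threads get through 16 instructions in 16 cycles: each thread's miss is covered by the other three taking their turns.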
|||||||||
|
|
|