amiga.org
     
68k AGA AROS + UAE => winner!

Old 04-10-2004, 12:13 PM   #41
bloodline

@whoosh777

Quote:
> who is Crumb?

An A.orger who wants inline 68k emulation in x86 AROS :-)

Quote:
>> If you find a PC in the trash/skip/bin then dig it out and run AROS on it,
>> I have several junk machines found that run AROS fine.
>
> I am trying to decide which path to take, PC or A1?
>
> if I buy an A1 I think I can sell it back to Eyetech,
> this reduces the risk of making this purchase,
>
> re: reflashing ROMs, could you also run Mac's OS on Pegasos + A1
> as well as Morphos + OS4 + AROS on Macs?

I'm not sure Eyetech would want to buy your A1 back... not without a massive loss (more than the price of a PC!).

I suggest you talk to people and look around for the machine that best suits your needs for the lowest cost.
(click the link for BlackTroll for great prices on AROS PCs)

Yes you can run MacOS on both the A1 and the Pegasos, by using a special program called "Mac-on-Linux", but then you need to run Linux too.
MOS and OS4 don't run on the Mac. AROS should though.

Quote:
> Some AROS questions:
>
> 1. what compiler(s) are used for recompiling AROS programs?

The default compiler is gcc. One AROS dev works in SAS/C on a real Amiga though. But to compile for the x86 you need to use gcc. We also have an x86 assembler program, as well as BASIC, False and Python, all included with AROS.

Quote:
> 2. are commercial AROS programs allowed, eg could they sell an AROS version of IBrowse?

Of course you can run commercial apps on AROS. That fits with my personal computer paradigm:

1. You pay for the hardware.
2. You pay for the drivers.
3. You pay for the software.
4. But the OS is free and open source.

In fact the AROS licence even allows you to sell AROS, with certain conditions applying. This is how the MorphOS team are able to use the AROS sources (they bug fix the code they use).

I hope to see commercial software appearing for AROS once it gets better established.
Old 04-10-2004, 05:01 PM   #42
whoosh777


@bloodline,

from your reply I think I will go down the AROS path,

my line of action being:

1. decide on machine eg A1 or PC?
2. buy machine,
3. familiarise myself with machine,
if it's an A1, familiarise myself with U-Boot,
4. look into what AROS are doing:
either to contribute directly
or to write or port things to AROS,

so step 2 could be in 3 weeks and step 4 in 7 weeks time,


probably I will always be on the outermost periphery of AROS
and all other projects as I prefer to be standalone,


Getting AROS to boot directly on an A1 sounds like a very high priority
project, so if it hasn't been completed I may join that project,
it also sounds very interesting,


I understand AROS already runs above Linux on A1 so AROS is there
already but not the way many people want,


I am sure I can contribute things to AROS though it may be
several months before I can contribute something substantial,


ambitious projects are slow moving, this is why I am reacting slowly,


I can see that AROS only projects will help tip the balance towards AROS,
"portability" is a virtue but "porting" is strategy,


>>who is Crumb?

>An A.orger who wants inline 68k emulation in x86 AROS

if you compile AROS with big endian Intel gcc then you can have
seamless 68k + x86 AROS integration using some variant of my
suggestion,

read + execute exceptions would toggle between emulated and nonemulated
instructions,

Amithlon has a big endian Intel gcc on www.aminet.net,


If the bytes of RAM are $12 $34 $56 $78 ...........

a big endian CPU sees this the way I have written it and

sees the first int of memory as $12345678

a little endian CPU sees this as .......... $78 $56 $34 $12

and reads the first int (address 0) as $78563412 totally different from
what the big endian CPU sees,

thus the 68k emulator is in danger of clashing with the x86 CPU,

eg addresses will be mangled,

resolve all this via a big endian Intel gcc compile of AROS:

68k and x86 can then access identical OS data structures,
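
A minimal C sketch of the byte-order point above. It does not depend on the host CPU because both ways of reading are written out by hand; the function names are made up for illustration.

```c
#include <stdint.h>

/* Read 4 bytes the way a big endian CPU (68k) does:
   the byte at the lowest address is the most significant. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read 4 bytes the way a little endian CPU (x86) does:
   the byte at the lowest address is the least significant. */
uint32_t read_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

With RAM holding $12 $34 $56 $78, read_be32 returns 0x12345678 and read_le32 returns 0x78563412, which is the mismatch described above.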

>I'm not sure Eyetech would want to buy your A1 back...
>not with out a massive loss (more than the Price of a PC!).

they have a buy back policy with a depreciation formula,
not sure where I read this, but you could calculate how much
you lose,

I would only sell back the core machine, I would keep the peripherals,
the peripherals wouldn't be bought from Eyetech anyway


Would there be any point in creating your own AROS PPC platform?


you could then open up the marketing policy,


>I suggest you talk to people and look around for the machine that
>best suits your needs for the lowest cost.
>(click the link for BlackTroll for great prices on AROS PCs)

>Yes you can run MacOS on both the A1 and the Pegasos,
>by using a special program called "Mac-on-Linux",
>but then you need to run Linux too.

I was thinking of doing it directly by reflashing the ROM,
but Mac being Mac, there is probably some proprietary obstacle to prevent this,

>MOS and OS4 don't run on the Mac. AROS should though.

>The Default Compiler is gcc.

this is the deciding factor,

which versions?

I hope you have gcc 2.95.3-4 even though it's not the most current,

is it a specifically AROS gcc or do you reuse generic ones?

Have you got 68k hosted cross compiler gcc's (PPC , Intel) for AROS?

(preferably gcc2.95.3-4),


>One AROS dev works in SAS/C on a real Amiga though.

SAS/C 650 compiles considerably faster (seconds) than gcc (minutes)
and produces slightly better code unoptimised than gcc -O2 optimisation,

gcc has its own strengths,

if I can use either I always use SAS/C, sometimes though
the only way to do something is with gcc,


eg ISTR that a really huge static array such as 1 million entries
(eg automatically generated look up table such as
int x[]={ 1 , 2 , 3 , ..., 1000000 } ; )
will crash the Amiga linker but I think gcc will be ok,


>But to compile for the x86 you need to use gcc.
>We also have an x86 Assember program, as well as BASIC, False and Python all
>included with AROS.

you realise that gcc is also an assembler, the moment a platform has gcc
it automatically has an assembler:

gcc -c xyz.s -o xyz.o

will assemble xyz.s, gcc uses nonstandard cross-platform assembler syntax though
eg for 68k gcc:

.text
.even

.globl _function

_function:

move.l #0,d0 /* this is a gcc assembler comment */

rts

to compile function()


it doesn't understand xref and xdef: .globl is xdef, xref is implicit,
so it won't assemble traditional 68k progs, they need fixing for gcc


I think it uses c style #define's for its macros, so it wont understand
Metacomco's macros,


on x86 it would also be .text, .even, .globl, /* comments */,
which reduces the learning curve,

to compile eg specific 68040 instructions you would type:

gcc -m68040 -c xyz.s -o xyz.o

(the default is 68000 + no FPU),

gcc -m68881 -m68020 -c xyz.s -o xyz.o

for 68020 + FPU code,

>Of course you can run commercial apps on AROS.
>That fits with my personal computer paradigm:
>
>1. You pay for the harware.
>2. You pay for the drivers.
>3. You pay for the software.
>4. But the OS is free and open source.

this is a deciding factor for me,

so eg commercial AROS IBrowse can be closed source?

(I think they wont do it open source)


>In fact the AROS licence even allows you to sell AROS,
>with certain conditions applying, this is how the MorphOS team
>are able to use the AROS sources (they bug fix the code they use).
>
>I hope to see comercial software appearing for AROS once it gets better established.
>


there is an opportunity immediately available for you here:


iospirit announced they have abandoned OS4 development,
there was a link to this from AmigaWorld.net at the time of the
KMOS takeover,


ask iospirit if they will recompile + sell IBrowse to AROS,


they have nothing to lose by doing this,
they already have an up and running website for selling IBrowse,


IBrowse is currently the flagship commercial program for the Amiga,


this would be a major coup,


tell them that you are working towards directly booting AROS on the A1,


AWeb is also now open source, so recompile that and you get that as well,
(possibly it may need some work to compile it on gcc)

Old 04-10-2004, 06:05 PM   #43
Karlos

Quote:
> Amithlon has a big endian Intel gcc on www.aminet.net,

They have a 680x0 hosted gcc that produces x86 code for amithlon.

AFAIK, there is no such thing as big endian x86.

Quote:
> resolve all this via a big endian Intel gcc compile of AROS:
>
> 68k and x86 can then access identical OS data structures,

Again, I have no idea what you mean by "big endian intel".

x86 is little endian. 680x0 is big endian. Each has its advantages and disadvantages.

Fundamentally the only difference is that on a big endian system, the lowest byte address of a multibyte object is the most significant, whereas for a little endian system, it's the least significant.

Thus 680x0 is big endian by definition but it's a bit odd if you consider register-only byte/word/long operations - the effect is little endian. That is, a byte/word operation always affects the LSB/LSW of the register. Do the same thing on a 32-bit memory operand and you find the MSB/MSW is affected:

Imagine you have the value 0x01000001 in a memory address pointed to by (a0)

move.l (a0), d0
add.b #1, d0 ; least sig byte is affected
move.l d0, (a0) ; store the result back to memory

gives 0x01000002

is completely different behaviour to

add.b #1, (a0) ; most sig byte is affected

which gives

0x02000001

On little endian systems like x86, the equivalent code for each of the above fragments generates the same result, affecting the LSB in both cases.
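
The register versus memory difference can be modelled in a few lines of C. This is only a sketch of the behaviour described above, not an emulator: the function names are invented, memory is a plain byte array laid out big endian, and condition codes are ignored.

```c
#include <stdint.h>

/* Read a long from big endian memory, as move.l (a0),d0 would. */
uint32_t read_long_be(const uint8_t *a0)
{
    return ((uint32_t)a0[0] << 24) | ((uint32_t)a0[1] << 16) |
           ((uint32_t)a0[2] << 8)  |  (uint32_t)a0[3];
}

/* add.b #n,d0: only the low 8 bits of the register change,
   the carry out of bit 7 does not reach the upper 24 bits. */
uint32_t add_b_register(uint32_t d0, uint8_t n)
{
    uint8_t low = (uint8_t)(d0 + n);
    return (d0 & 0xFFFFFF00u) | low;
}

/* add.b #n,(a0): only the byte AT address a0 changes. In big endian
   memory that byte is the most significant byte of the long. */
void add_b_memory(uint8_t *a0, uint8_t n)
{
    a0[0] = (uint8_t)(a0[0] + n);
}
```

Starting from the bytes 01 00 00 01, the register route yields 0x01000002 and the memory route leaves 0x02000001 behind, matching the two results quoted above.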

Of the common CPUs kicking around, only load and store architecture (as typified by RISC) tend to support endian swapping modes since all operations they perform are on registers. Never operating directly on memory operands and providing both big and little endian load/store instructions is how the PPC does it, for example.
Old 04-11-2004, 04:00 PM   #44
whoosh777


by Karlos on 2004/4/11 1:05:26

Quote:

>>Amithlon has a big endian Intel gcc on www.aminet.net,

>They have a 680x0 hosted gcc that produces x86 code for amithlon.

>AFAIK, there is no such thing as big endian x86.

not totally sure, I have this installed,

I created ram:xyz.c

extern int x ;

void f()
{
x = 0x12345678 ;
}

and now compiled it:

bin/i686be-amithlon-gcc -O2 -S ram:xyz.c -o ram:xyz.s

this generated x86 assembler:

.file "xyz.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl f
.type f,@function
f:
bswap %ebp
pushl %ebp
bswap %ebp
movl %esp,%ebp
movl $2018915346,x
movl %ebp,%esp
popl %ebp
bswap %ebp
ret
.Lfe1:
.size f,.Lfe1-f
.ident "GCC: (GNU) 2.95.3 20010315 (release/lcs-2002-02-08)"

not sure what it's doing, it looks like neither endianness,
anyone understand x86 assembler?

$12345678 is 0001 0010 0011 0100 0101 0110 0111 1000 in binary,

the above code has

$2018915346 which is 0010 0000 0001 1000 1001 0001 0101 0011 0100 110

which looks totally different,

not sure whats going on,

bswap looks like some kind of byte reversal but what is
it reversing?

(BTW it looks like its using some 1-address code
which is a good move IMO)



I was under the impression that it generates big endian code,
I think its also available for other platforms such as PPC,

when I say big endian, what I mean is that
for 2-byte word and 4-byte long accesses it will reverse the
byte order before accessing ram:

say we have:

int *x;

*x = $12345678 ;


on big endian we would get:

byte[ x ] = $12 ; byte[ x + 1 ] = $34 ; byte[ x+2] = $56 ; byte[ x+3]= $78 ;

(*A*)

on little endian (read carefully!):

byte[ x+3 ] = $12 ; byte[ x+2 ] = $34 ; byte[ x + 1 ] = $56 ; byte[ x ] = $78 ;

ie

byte[ x ] = $78 ; byte[ x + 1 ] = $56 ; byte[ x + 2 ] = $34 ; byte[ x + 3 ] = $12 ;

so to implement big endian on little endian:


*x = byte_reverse( $12345678 ) ; /* some CPUs may have an assembler instruction for this */

which would do

*x = $78563412 ;

ie

byte[ x ] = $12 ; byte[ x+1]=$34 ; byte[ x + 2 ] = $56 ; byte[ x + 3 ] = $78 ;

identical to what big endian would do see (*A*) above,

so as long as all memory accesses are intercepted with a byte reversal
by the compiler then Intel will behave as if it were big endian,
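
The argument above can be checked with a small C sketch. The little endian store is written out byte by byte so the result does not depend on the machine running it; byte_reverse, le_store32 and be_store32_on_le are names chosen for this example only.

```c
#include <stdint.h>

/* Reverse the 4 bytes of a long (what x86 bswap does to a register). */
uint32_t byte_reverse(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* What a plain little endian CPU does on a 32-bit store:
   least significant byte goes to the lowest address. */
void le_store32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Big endian layout produced on a little endian CPU:
   reverse first, then do the normal store. */
void be_store32_on_le(uint8_t *p, uint32_t v)
{
    le_store32(p, byte_reverse(v));
}
```

Storing $12345678 through be_store32_on_le leaves the bytes $12 $34 $56 $78 in ascending addresses, the same layout as (*A*).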


>>resolve all this via a big endian Intel gcc compile of AROS:

>>68k and x86 can then access identical OS data structures,




>Again, I have no idea what you mean by "big endian intel".

rewording will help:

((big-endian-ram)-emulation) gcc for Intel,

usual Intel CPU but a compiler that
intercepts all memory reads + writes by byte reversal:

for reads:

1. read memory
2. byte reverse it
3. use it,

for writes:

1. have data
2. byte reverse it
3. write it

>Thus 680x0 is big endian by definition but its a bit odd if you consider
>register-only byte/word/long operations - the effect is litte endian.
>That is, a byte/word operation always affects the LSB/LSW of the register.
>Do the same thing on a 32-bit memory operand and you find the MSB/MSW is affected:
>

>Imagine you have the value 0x01000001 in a memory address pointed to by (a0)
>
>move.l (a0), d0
>add.b #1, d0 ; least sig byte is affected
>move.l d0, (a0)
>
>gives 0x01000002
>
>is completely different behaviour to
>
>add.b #1, (a0) ; most sig byte is affected
>
>which gives
>
>0x02000001

I wasn't aware of this subtlety, usually inconsistencies such as this are
a sign of bad design,

with good design everything should be logically harmonious,

what they should have done is that the most significant register byte should
be affected and that eg

move.b (a0),d0

would load the number to the most significant byte of register d0,

I think you have located another design flaw of the 68k architecture,

maybe we should reimplement and improve 68k! eg:

fix this problem,
remove all the silly addressing modes,
remove pointless opcodes,
make all registers general purpose: where it currently says
effective address=mode xxx register xxx
we replace this by
effective address=mode xx register xxxx
(x = binary digit),
make exception frames store their size,
create huge caches,


the 68030 MMU OTOH is very well designed,
much better than the PPC MMU,


You know that PPC has a CPU flag that allows you to select whether
it's big or little endian,


BTW regarding things which look fast but aren't:

I was studying CPU registers + stack in supervisor mode
and other system things when I observed
that my A1200 supervisor stack pointer is at the top of chip ram ie
$200000,

so I thought I would speed it up by moving it to fast ram, I did this,
and then timed some graphics + disk intensive stuff and found it
made no difference at all!

Old 04-11-2004, 06:22 PM   #45
Karlos

Dude, your posts are huge :-D

Let's see.

So, there is a compiler for amithlon that generates x86 code that does automatic byte reversal for operands during load/save to memory, thus giving a "big endian" data model that the 680x0 model is compatible with.

This makes some sense. However, this also means that you have totally killed the benefits of a CPU capable of memory operands for the majority of normal code.
This is because you have to (for anything bigger than a byte) load the operand from memory to a register, swap it from its "big" endian representation to little endian, perform an operation on it, reverse it again and pump it back out to memory.
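
To make the cost concrete, here is a hedged C sketch of one "add to a variable in memory" under that scheme. The native little endian load and store are spelled out by hand so the example behaves the same on any host; all names are invented for illustration and this is not claimed to be what the Amithlon compiler actually emits.

```c
#include <stdint.h>

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Native x86-style accesses: lowest address holds the low byte. */
static uint32_t native_le_load(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void native_le_store(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* counter += n, where counter lives in RAM in big endian layout.
   A single add-to-memory instruction becomes five steps. */
void add_to_be_counter(uint8_t *counter, uint32_t n)
{
    uint32_t reg = native_le_load(counter); /* 1. load            */
    reg = swap32(reg);                      /* 2. swap to native  */
    reg += n;                               /* 3. the actual work */
    reg = swap32(reg);                      /* 4. swap back       */
    native_le_store(counter, reg);          /* 5. store           */
}
```

Only step 3 is useful work; the other four exist purely to keep the data in 680x0-compatible order.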

Sound familiar? It should. You turned your "complex addressing mode" x86 into a load/store architecture.

Unfortunately, x86 doesn't have the register count to make load/store based code effective. In fact, it is especially bad at it since it was (like the 680x0) designed to be able to have memory operands for most instructions.

Conversely, load/store code is the domain of CPUs like the PPC, where all operations are on registers and to compensate for lack of memory based operands, you have plenty of registers.

Basically, if what you say is true, the compiler generates code that is "big endian" data format compatible at the expense of a large amount of code required to wrap fairly simple operations. In other words, speed is secondary to compatibility.

Can you imagine what any half complicated C expression compiles to where variables are loaded from ram each time?
Instead of being able to perform arithmetic on a register using a memory operand, you have to load the memory operand to another register, byte swap it etc.

You also now see why the 1-address code model ain't really so hot at all (sorry, but it's true. x86 architecture comes straight from pocket calculator era, hence the "accumulator" register ;-) )

Back onto 68K

I disagree that the 680x0 architecture is flawed by its "odd" register behaviour, but I threw it in purely to show that endianness is even more peculiar than many people think. Things are the way they are for a reason. It would be a flaw if it wasn't well documented that the registers behave that way and you just assumed they behaved the same way as memory ;-)

The 68K has one of the nicest architectures IMHO. Too bad Motorola canned it. The memory indirect addressing modes are a bit pointless, but other than that it is a fine design.

GPUs are just ace for this type of oddness. I know ones which have little endian 16-bit words but big endian 32-bit words etc.

Yeah I know the PPC allows big/little endian operation. That's easy for load/store based architectures and it's not alone in being able to do it. There are ARM cores with the same ability.


As for the supervisor stack thing:

1) are you sure that the SSP wasn't already remapped somewhere else in fast ram by your 030?

2) the system doesn't exactly spend long in supervisor mode anyway. Most OS functions do the bulk of their work in user mode.


-edit-

PS: In your x86 asm output, $2018915346 is a decimal literal (in this assembler syntax the $ marks an immediate value, not hex). It is 0x78563412 in hex.

That's your 0x12345678 with the bytes reversed. If your gcc compiler does create "big endian" data, it makes sense that integer literals would get statically converted in this fashion - the variable "x" now contains 0x78563412 as far as the x86 is concerned, but a 680x0 would see it as 0x12345678, which is what you are intending.
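
A quick way to confirm the decimal/hex reading in C. The constant is copied from the compiler output earlier in the thread; stored_byte is a helper invented here that returns the byte a little endian store would place at a given offset.

```c
#include <stdint.h>

/* The immediate exactly as printed in the assembler output (decimal). */
static const uint32_t ASM_IMMEDIATE = 2018915346u;

/* Byte that a little endian 32-bit store of v puts at address x + offset. */
uint8_t stored_byte(uint32_t v, int offset)
{
    return (uint8_t)(v >> (8 * offset));
}
```

ASM_IMMEDIATE equals 0x78563412, and the bytes it leaves at x, x+1, x+2, x+3 are $12 $34 $56 $78, which a 680x0 reads back as 0x12345678.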
Old 04-11-2004, 07:47 PM   #46
Karlos

Quote:
> maybe we should reimplement and improve 68k! eg:
>
> fix this problem,
> remove all the silly addressing modes,
> remove pointless opcodes,
> make all registers general purpose: where it currently says
> effective address=mode xxx register xxx
> we replace this by
> effective address=mode xx register xxxx
> (x = binary digit),
> make exception frames store their size,
> create huge caches,

Fine, as long as you recognise it won't be object code compatible with any existing 680x0 :-D

If we are dreaming about the perfect 680x0...

As for effective addresses, as long as you have

operation <ea>, register
operation register, <ea>

for all the common instructions I'd be happy. For example, there is no

eor <ea>, dN

which is a bit of a bummer.

64-bit data registers and a new size type reflecting them would be nice. Operations involving them would naturally require extended opcodes since the existing 16-bit opcode word format has no ability to handle 8-byte elements.

However, you'd never see this from the assembler perspective and hence have no idea that

add.q #1, d0

has a different opcode layout than

add.b/w/l #1, d0

Also, make sure you add some new muls/mulu operations that can do the long arithmetic and use a single 64 bit register result.

Another nice feature I'd like to see is an extended number of registers. Now, I know you can't actually do this blithely and have d9, d10 etc., because the existing opcode format doesn't allow it.

Instead, have a register "swap space" that allows you to have say 32 data registers in total and the ability to select which 8 of them are currently mapped to d0-d7.
The mapping can be done in such a way that it's impossible to have any of d0-d7 mapped to the same register. The simplest way to do this is to divide the full register space into "banks" (4 for a 32-register version), and you can set which bank dX is in.

For example, you might have a bank set opcode

setreg d0-d7, 0 ; // d0-d7 are using bank 0

setreg d2, 1 ; // d2 now mapped to bank 1

Changing the mapping would not destroy the original value. Eg

setreg d0-d7, 0

move.l #20, d0
setreg d0, 1
move.l #15, d0
setreg d0, 0 ; // d0 = 20
setreg d0, 1 ; // d0 = 15

To augment this, a "swap" operation, swapping the contents of the current dX with the equivalent dX register in another bank

swapreg d0, 3 ; // exchange current d0's value with bank #3 d0's value

Whilst not as flexible as a genuine 32 register file, it would remain object code compatible with older 680x0 and open interesting optimisations
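
The banking idea is purely hypothetical (no such 680x0 exists), but its semantics are easy to model in C. In this sketch setreg and swapreg mirror the imagined opcodes above, while struct banked_regs, regs_reset and reg are names invented for the model; 4 banks of 8 registers is the 32-register variant mentioned above.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BANKS     4
#define REGS_PER_BANK 8

struct banked_regs {
    uint32_t file[NUM_BANKS][REGS_PER_BANK]; /* 32 physical data registers */
    uint8_t  bank_of[REGS_PER_BANK];         /* bank currently backing dN  */
};

/* Clear everything; d0-d7 all start in bank 0. */
void regs_reset(struct banked_regs *r)
{
    memset(r, 0, sizeof *r);
}

/* setreg dN, bank: remap dN without touching any stored value. */
void setreg(struct banked_regs *r, int n, int bank)
{
    r->bank_of[n] = (uint8_t)bank;
}

/* Access the physical register that dN currently names. Since dN can
   only ever map to slot N of some bank, two of d0-d7 can never alias. */
uint32_t *reg(struct banked_regs *r, int n)
{
    return &r->file[r->bank_of[n]][n];
}

/* swapreg dN, bank: exchange current dN with slot N of another bank. */
void swapreg(struct banked_regs *r, int n, int bank)
{
    uint32_t *cur = reg(r, n);
    uint32_t tmp = *cur;
    *cur = r->file[bank][n];
    r->file[bank][n] = tmp;
}
```

Replaying the example above (20 into d0 in bank 0, 15 into d0 in bank 1) gives back 20 and 15 as the mapping is flipped between the two banks.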

I wish the 680x0 could be continued as a proper CPU (coldfire is interesting, but a more direct modern replacement for 68060 would be nice).

/dream off


Quote:
> the 68030 MMU OTOH is very well designed,
> much better than the PPC MMU,

Why do you say that? The only gripe I have is that the 603 MMU needs some software support to handle table lookup, but the 604 and higher chips don't (AFAIK).
Old 04-12-2004, 05:38 PM   #47
whoosh777


by Karlos on 2004/4/12 1:22:32

>Dude, your posts are huge

apologies in advance then!

>Let's see.

>So, there is a compiler for amithlon that generates x86 code that does automatic
>byte reversal for operands during load/save to memory, thus giving a "big endian"
>data model that the 680x0 model is compatible with.

ok, so it is big endian, I wasn't totally sure but from what you say later it is,

>This makes some sense. However, this also means that you have totally killed
>the benefits of a CPU capable of memory operands for the majority of normal code.
>This is because you have to (for anything bigger than a byte) load the operand
>from memory to a register, swap it from its "big" endian representation to
>little endian, perform an operation on it, reverse it again and pump it back
>out to memory.

exactly,

but Intels have huge caches (I think), so it's not actually pumping it out
to memory but to the cache,

thus it will be a lot faster than you think,

I believe Amithlon does exactly this, and Amithlon users are
quite happy with its speed: it's up to the mark of the more discerning
users whereas PPC isn't,

Note that the above compiler is for Amithlon, which proves I think that
Amithlon uses big endian emulation of RAM,

What does Sysinfo on Amithlon say the speed is??

>Sound familiar? It should. You turned your "complex addressing mode" x86 into a
>load/store architecture.
>
>Unfortunately, x86 doesn't have the register count to make load/store based code
>effective. In fact, it is especially bad at it since it was (like the 680x0)
>designed to be able to have memory operands for most instructions.
>

if it has 8 general purpose registers that is quite sufficient,
you may be underestimating compilers, see later,

>Conversely, load/store code is the domain of CPUs like the PPC, where all
>operations are on registers and to compensate for lack of memory based operands,
>you have plenty of registers.

PPC has toooo many registers, the people who designed it are clueless about
what compilers can do,

you need to look at the assembler output of SAS C which is very high quality
via omd or the output of gcc -O2 -S something.c -o something.s
before jumping to any conclusions,


to compute an expression with M terms you only need approx log_2(M) registers,

so to compute an expression with 64 terms eg

(x1 + x2/(x3 - x4*x5) /(x6+x7+ (x8-(x9/x10))) ................ x64

you only need approx 6 registers regardless of expression complexity,

you dont need 64 registers thats for sure,
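
The log_2(M) figure corresponds to the classic Sethi-Ullman labelling of an expression tree. The sketch below is one way to compute it in C, assuming a load/store machine where every variable must first be loaded into a register and nothing is spilled; struct expr and the helper names are invented for this example, and the nodes are never freed, which is acceptable only for a throwaway sketch.

```c
#include <stdlib.h>

struct expr {
    struct expr *left;   /* NULL for a leaf (a variable) */
    struct expr *right;
};

static struct expr *make(struct expr *l, struct expr *r)
{
    struct expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->left = l;
    e->right = r;
    return e;
}

/* Perfectly balanced tree with 2^depth terms: the worst case. */
struct expr *balanced(int depth)
{
    if (depth == 0)
        return make(NULL, NULL);
    struct expr *l = balanced(depth - 1);
    struct expr *r = balanced(depth - 1);
    return make(l, r);
}

/* Left-leaning chain such as x1 + x2 + ... + xn. */
struct expr *chain(int terms)
{
    struct expr *acc = make(NULL, NULL);
    for (int i = 1; i < terms; i++)
        acc = make(acc, make(NULL, NULL));
    return acc;
}

/* Sethi-Ullman label: registers needed to evaluate e without spilling. */
int regs_needed(const struct expr *e)
{
    if (e->left == NULL)
        return 1;                 /* a leaf is loaded into one register */
    int l = regs_needed(e->left);
    int r = regs_needed(e->right);
    if (l == r)
        return l + 1;             /* equal halves need one extra register */
    return l > r ? l : r;         /* do the heavier side first */
}
```

Under these assumptions a balanced 64-term expression needs 7 registers (log_2(64) + 1) and a 1024-term chain of additions needs 2; with memory operands, as on 68k or x86, each count drops by roughly one.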

IMO CPUs should be designed by s/w people, with h/w engineers to implement,

if you put engineers in charge you end up with interesting and slow CPUs,

IBM dont know how to design CPUs otherwise why did they originally use Intel
and now use Motorolas PPC specification

(I dont know how they did their Mainframes)

>Basically, if what you say is true, the compiler generates code that is
>"big endian" data format compatible at the expense of a large amount of code
>required to wrap fairly simple operations. In other words, speed is secondary
>to compatibility.

never make assumptions about speed, code that looks fast is often slow
and vice versa,

always remember that Amithlon does exactly this,
hands up anyone who thinks Amithlon is slow? (IYSWIM),

Note that in the assembler output the byte reversal happened at compile time,
so there was no run time overhead,


>Can you imagine what any half complicated C expression compiles to where variables
>are loaded from ram each time?
>Instead of being able to perform arithmetic on a register using a memory operand,
> you have to load the memory operand to another register, byte swap it etc.

your description of what happens is correct, but, I need to put you in the
picture of what compilers do:

variables are usually local variables, (unless its badly written code),

lets say you have a C function with 30 local variables,

a naive C compiler would implement that via 30 stack cells, not necessarily a
bad thing as the stack is almost guaranteed to be in the cache,
the byte reversal problem would happen,

However a sophisticated C compiler would implement those 30 local variables
by eg 4 registers so no byte reversal directly necessary,

a given variable say x would be implemented via several registers eg d0 d1 and d2,
conversely a given register say d2 would implement maybe 10 variables,

eg:

int f( int x )
{
int y,z, t ;

y = 5 ; z = y + x ; t = z + 2 ;
return( z ) ;
}

in this the return is z = y + x = 5 + x ,

say x is a register d1

a good compiler would implement this as:

move.l d1,d0
add.l #5,d0
rts

2 registers implementing 4 variables,

the variable count is actually irrelevant to the compiler,
it is only interested in the data processing complexity which in
real programs is very low,

real progs tend to do things like:

if( strcmp( x->name , y->name )==0 ){ x->count++ ; thing_free( y ) ; return ; }

:data movement,

here y is absorbed by x and removed,
the maths is trivial, namely increase x->count by 1,


as explained earlier for an arbitrary complexity expression with 1024 terms
a compiler only needs approx 10 registers (it may be 9 or 11 registers),

many 1024 term expressions can be done with much fewer registers eg:

x1 + x2 + x3 +....+x1024 only needs 2 registers:

move x1,d0
move x2,d1
add d1,d0
move x3,d1
add d1,d0
move x4,d1
add d1,d0
.....
move x1024,d1
add d1,d0

(**A**)

(on 68k you may only need 1 register:
move x1,d0
add x2,d0
add x3,d0
add x4,d0
.......)

In real code from real programs a lot of the time only 4 registers are necessary,

math expressions in real programs are usually very simple, most progs are
about data movement rather than maths,

with 32 registers you could do expressions with 4 billion terms which is
an impractical ability,

every register requires circuitry eg 64 flip flops and it requires "wires"
from these flip flops to the ALU (arithmetic and logic unit),
also the instruction decode circuitry becomes bigger: to decipher
register 30 = 11110 requires circuitry x4 & x3 & x2 & x1 & ~(x0)
4 AND-gates and 1 NOT-gate, so you will need 32 * 4 = 128 AND-gates,
and probably 32 * 4 / 2 = 64 NOT-gates, 192 components to decode 1 register,

PPC 3 address-code has *3* registers => 3 * 192 = 576 logic gates just
for instruction decode,

now add in the 64 * 32 flip flops for the registers = 2048 flip flops,

so that is already 2624 components,
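
For anyone wanting to check the arithmetic, the estimate above can be reproduced in C. The per-field gate counts are the poster's own rough figures, taken as given rather than derived from any real PPC implementation; the identifiers are made up for this sketch.

```c
/* Rough component count for a 32 x 64-bit register file with a
   3-address instruction format, using the figures quoted above. */
enum {
    REGISTERS           = 32,
    BITS_PER_REGISTER   = 64,
    OPERAND_FIELDS      = 3,                  /* 3-address code        */
    AND_GATES_PER_FIELD = REGISTERS * 4,      /* 128                   */
    NOT_GATES_PER_FIELD = REGISTERS * 4 / 2   /* 64, "probably"        */
};

/* Gates needed to decode all three register fields. */
int decode_gates(void)
{
    return OPERAND_FIELDS * (AND_GATES_PER_FIELD + NOT_GATES_PER_FIELD);
}

/* One flip flop per register bit. */
int register_flip_flops(void)
{
    return BITS_PER_REGISTER * REGISTERS;
}

int total_components(void)
{
    return decode_gates() + register_flip_flops();
}
```

This gives 576 decode gates plus 2048 flip flops, 2624 in total, agreeing with the numbers above.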

are you still sure you want 32 registers? I am *certain* I dont!


space is very precious and limited,


this leads to a bigger CPU ie longer wires => higher latency => slower,

it also eats up instruction bits: 32 registers => 5 bits per register,
which doubles instruction size for 3-address PPC or MIPS,


>You also now see why the 1-address code model ain't really so hot at all (sorry,
>but it's true. x86 architecture comes straight from pocket calculator era,
>hence the "accumulator" register )

real compiler code uses exactly the accumulator concept see the code fragment
above: there is no alternative,
computing a mathematical expression is entirely an accumulator-process:

x = (y+z)*t ; load y, add z, mul t, store x,

also its always like this,

this is why accumulator CPUs will be very fast,


>Back onto 68K
>
>I disagree that the 680x0 architecture is flawed by its "odd" register behaviour,
>but I threw it in purely to show that endianness is even more peculiar than
>many people think. Things are the way they are for a reason. It would be a flaw
>if it wasn't well
> documented that the registers behave that way and you just assumed they behaved
> the same way as memory
>
> The 68K has one of the nicest architectures IMHO. Too bad motorola canned it.
>The memory indirect addressing modes are a bit pointless, but other than that it
>is a fine design.

a lot of 68k is very well designed,


> GPUs are just ace for this type of oddness. I know ones which have little
>endian 16-bit words but big endian 32-bit words etc.
>
> Yeah I know the PPC allows big/little endian operation. That's easy for
>load/store based architectures and it's not alone in being able to do it.
>There are ARM cores with the same ability.

MIPS also,


> As for the supervisor stack thing:
>
> 1) are you sure that the SSP wasnt already remapped somewhere else in fast ram
>by your 030?

yes, translation control register TC=$03ffffff,
bit 31 needs to be 1 to enable the MMU, so the MMU is off,

if I run
CPU fastrom

then TC=$80f17540, bit 31 is now 1 => MMU on,
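
A quick way to check the two TC values quoted above, as a minimal sketch (the helper name is made up; bit 31 is the enable bit of the 68030 translation control register as described in the post):

```python
TC_ENABLE_BIT = 31  # E bit of the 68030 translation control register


def mmu_enabled(tc):
    """Return True when bit 31 of the TC value is set."""
    return bool((tc >> TC_ENABLE_BIT) & 1)


print(mmu_enabled(0x03FFFFFF))  # False: MMU off
print(mmu_enabled(0x80F17540))  # True: MMU on after "CPU fastrom"
```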


> 2) the system doesnt exactly spend long in supervisor mode anyway.
>Most OS functions do the bulk of their work in user mode.

difficult to know this except by trying it and timing it,
all h/w interrupts and exceptions will be in supervisor mode,

anyway I timed it and that tells me your comment 2) is true,

> PS: Your asm code for the x86 $2018915346 is a decimal literal that is
>0x78563412 in hex.
>
> Thats your 0x12345678 with the bytes reversed. If your gcc compiler does create
>"big endian" data, it makes sense that integer literals would get statically
>converted in this fashion - the variable "x" now contains 0x78563412 as far as
>the x86 is concerned
>, but a 680x0 would see it as 0x12345678, which is what you are intending.
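
The decimal literal can be checked directly. A small sketch using Python's `struct` module to show how the same four bytes read from each side:

```python
import struct

literal = 2018915346
print(hex(literal))                      # 0x78563412

raw = struct.pack("<I", literal)         # the bytes as an x86 stores them
print(raw.hex())                         # 12345678

as_68k = struct.unpack(">I", raw)[0]     # the same bytes read big endian
print(hex(as_68k))                       # 0x12345678
```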

great!

you've solved this riddle, I couldnt see it,

when I see $ I immediately think that signifies hexadecimal,

have you figured out what the bswap statements are for?

Old 04-12-2004, 06:22 PM   #48
Karlos
Sockologist
Join Date: Nov 2002
Location: Barishabaad, Sardistan
Posts: 16,670
Blog Entries: 18
Default Re: 68k AGA AROS + UAE => winner!

Hi,

I'm sure I don't have the energy to reply to that point by point :-D

When I said the "big endian" data emulation model was slow, I meant compared to normal x86 code - not that it will be slow compared to a real 680x0 :-D

That is to say, compile 2 programs, one running in the normal x86 little endian memory access fashion, and one compiled to support a big endian data model. The little endian version will have much less work to do at runtime (eg swapping data when loading/saving registers) and also the code will be optimal for its "memory based operand" architecture.

As for RISC style load/store chips, lots of registers aren't bad at all. Your main point is that you can't use them effectively and instructions increase in size etc.

On the usage front, I beg to differ - just study the PPC open ABI. Lots of important stuff (useful for the system) can stay in registers and you have lots of volatile registers for passing data to functions directly etc.

Stack based passing is only used there when there are more than 8 parameters to pass to a function and that's not often.

Also, the 3 operand instructions allow for lots of optimisations with clever compilers. Don't assume gcc is the smartest. I can quite assure you it isn't.

Ultimately, it comes down to a system where memory accesses for data are rare, since a lot of stuff is created in registers, passed around in registers and ultimately never ends up in memory unless that's its final destination.

This means memory/caches are hit less, stay valid for longer etc. and most memory accesses end up as bursts.

Quote:
IBM dont know how to design CPUs otherwise why did they originally use Intel and now use Motorolas PPC specification
They don't? As I recall, Motorola used IBM's existing POWER (Performance Optimised With Enhanced RISC) architecture to create a desktop CPU, the PowerPC (with Apple as the chief customer). It was basically a partnership in which IBM provided the architecture, Motorola provided the fabrication processes and Apple sold them :-)

IBM currently make by far the best implementations of the PowerPC architecture, surpassing motorola by quite a margin.

On the design complexity front, the large register count is not a big deal. You do realise that modern x86 cores (since as far back as the Pentium2) use dozens of registers in shadow register files? Internally, the architecture is totally different from the code you see in your assembler. Incoming x86 code is decomposed into smaller RISC like operations. The core executes these operations and makes use of a very large number of registers in the process.

Hence the objections you raise aren't really valid because x86 and PPC both use multi-operand instructions with lots of registers. It's just that you see it directly on the PPC and don't on the x86 ;-)

Finally, I dunno what bswap are for, but I expect they probably aren't endian conversion operations. I'm not too hot on x86 assembler :-)

-edit-

Anyhow, to summarise, architectural comparisons are getting a bit off topic. Alas, if you believe you can compile AROS with this "big endian" data format GCC with the intention of adding a 680x0 emulation, I say go for it - it would be an interesting thing to see :-)

The trouble is, stimulating conversation though it has been, I genuinely don't feel I'm helping you get any closer to that goal, so I'll settle for wishing you luck :-D
__________________
OCA
This isn't SCSI... This is SATA!!!
I have CDO. It's like OCD except all the letters are in ascending order. The way they should be.
Core2 Quad Q9450 2.66GHz / X48T / 4GB DDR3 / nVidia GTX275 / Linux x64, AROS, Win64
A1XE 800MHz / 512MB / Radeon 9200 / OS4.1
A1200T BPPC 240MHz / 256MB / Permedia 2 / OS 3.1 - OS3.9, OS4
A1200T Apollo 1240 28MHz / 32MB / Mediator1200 / Voodoo 3000 / OS3.9
A1200D Apollo 1240 25MHz (ejector seat ROM edition) / 32MB
Old 04-12-2004, 10:13 PM   #49
Megol
Merely Curious
Join Date: Nov 2003
Posts: 1
Default Re: 68k AGA AROS + UAE => winner!

Quote:
exactly, but Intels have huge caches (I think), so its not actually pumping it out
to memory but to the cache, thus it will be a lot faster than you think,
The problem is not with memory accesses (x86 generally have the most efficient caches in the computing world), the problem is that code goes from (stupid example):
mov eax, [ebx+myOffset] ; dependency on ebx
add eax, ecx ; dependency on the above load+ecx
to:
mov eax, [ebx+myOffset] ; dependency on ebx
bswap eax ; dependency on above load
add eax, ecx ; dependency on bswap+ecx

For a modern OOO processor the most important compiler optimization is to make the dependency chains as short as possible; bswaps lengthen the dependency chains and so make the code generally slower.
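
What the extra `bswap` in the second listing does can be sketched as follows. This only models the data flow, not the timing; `bswap32` is an illustrative helper that reverses the four bytes of a 32 bit value the way the x86 `bswap` instruction does.

```python
import struct


def bswap32(value):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


memory = bytes.fromhex("12345678")         # big endian 0x12345678 sitting in RAM
loaded = struct.unpack("<I", memory)[0]    # what a plain little endian load returns
fixed = bswap32(loaded)                    # the extra step in the dependency chain

print(hex(loaded))  # 0x78563412
print(hex(fixed))   # 0x12345678
```

Every load of big endian data pays for that second step before the `add` can start, which is the lengthened chain described above.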

Quote:
if it has 8 general purpose registers that is quite sufficient,
you may be underestimating compilers, see later,
No you are. Without enough "native" registers the compiler can't really do a number of powerful optimizations such as loop unrolling, software pipelining, loop fusion and more. That is (as with all things in life) not completely true, x86 compilers can use those optimizations for some cases but not for more interesting problems.

Quote:
PPC has toooo many registers, the people who designed it are clueless about
what compilers can do, ...
What?!? RISC was designed to make it possible to get a more effective hardware-software interaction and one of the things that makes it better is more registers! You are really making yourself sound clueless...

Quote:
<snip>
to compute an expression with M terms you only need approx log_2(M) registers,
No you don't need any (programmer visible) registers at all. LIFO-4-life.

Quote:
<more incoherent rantings snipped>
IBM dont know how to design CPUs otherwise why did they originally use Intel
and now use Motorolas PPC specification

(I dont know how they did their Mainframes)
Really... It is now clear that you have absolutely no f*c*ing clue about this area, and still you want to expose your ignorance?

IBM invented many things that are now used in processors all over the world, but they have no clue right?

Their research was what made RISC a possibility, their Power line of processors provided the base for the PPC architecture which clearly shows that they are clueless.

How they did their mainframes? They designed and implemented an architecture that inspired most other computer manufacturers and is still after many many years top of the line in its field. But of course they don't know how to design CPUs...

Quote:
<snippyli snap>
a good compiler would implement this as:

move.l d1,d0
add.l #5,d0
rts

2 registers implementing 4 variables,
No a good compiler would inline that code...

Quote:
<snip>
many 1024 term expressions can be done with much fewer registers eg:

x1 + x2 + x3 +....+x1024 only needs 2 registers:

move x1,d0
move x2,d1
add d1,d0
move x3,d1
add d1,d0
move x4,d1
add d1,d0
.....
move x1024,d1
add d1,d0
And suddenly your processor is serialized! The last add will be dependent on the preceding 2047 instructions, a better compiler would make use of the superscalar nature of the processor and allow the hardware to parallelize it.
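
The difference between the serial listing and a parallel friendly ordering can be put in numbers. A rough sketch, counting only how many adds must wait on each other (function names are illustrative, and an ideal machine with unlimited adders is assumed):

```python
def serial_depth(terms):
    """Longest chain of dependent adds when summing left to right."""
    return terms - 1


def tree_depth(terms):
    """Longest chain of dependent adds when summing pairwise."""
    depth = 0
    while terms > 1:
        terms = (terms + 1) // 2
        depth += 1
    return depth


def tree_sum(values):
    """Pairwise summation; every add within one round is independent."""
    values = list(values)
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])
        values = paired
    return values[0]


print(serial_depth(1024))         # 1023
print(tree_depth(1024))           # 10
print(tree_sum(range(1, 1025)))   # 524800
```

The pairwise order needs more live registers than the two register listing, so the two sides of this argument are trading the same resource: registers against dependency chain length.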

Quote:
<snip>
In real code from real programs a lot of the time only 4 registers are necessary,
Yes completely true.

Quote:
<snip++>
with 32 registers you could do expressions with 4 billion terms which is
an impractical ability,
And yet again your ignorance shows. The reason for having many registers is that one can generate more efficient code, I have sometimes been forced to use the ESP (the x86 stack pointer register) in some inner loops to get satisfactory results as the 7 "free" registers were not enough. 16 registers really simplifies the code generation (for both ASM-programmers and compilers) and 32 is even better.

Quote:
<things about register file circuitry snipped>
Let's see... The year is 2004, most manufacturers are now beginning to target 90nm processes. This fact combines with the fact that register files are compact (==short wires). When we then add the fact that modern processors already have >80 registers (for renaming), and that the decode and scheduling parts of the processors are much more complex and larger than a tiny register file, it really seems ridiculous to complain about it. And with more registers we can optimize the code better and thus get faster execution, do you still complain?

Quote:
real compiler code uses exactly the accumulator concept see the code fragment
above: there is no alternative,
computing a mathematical expression is entirely an accumulator-process:

x = (y+z)*t ; load y, add z, mul t, store x,

also its always like this,

this is why accumulator CPUs will be very fast,
LOL!
Old 04-13-2004, 04:17 AM   #50
that_punk_guy
Energizer Bunny of Babble
Join Date: Aug 2002
Posts: 4,526
Default Re: 68k AGA AROS + UAE => winner!

. . . . .
Old 04-13-2004, 04:19 AM   #51
bloodline
Master Sock Abuser
Join Date: Mar 2002
Location: London, UK
Posts: 11,977
Blog Entries: 3
Default Re: 68k AGA AROS + UAE => winner!

Quote:
Getting AROS to boot directly on an A1 sounds a very high priority project, so if it hasnt been completed I may join that project, it also sounds very interesting,


I understand AROS already runs above Linux on A1 so AROS is there already but not the way many people want,
It would certainly be a great thing to have AROS running natively on the A1.

The PPC Linux hosted version of AROS is coming on rather quickly thanks to Markus, who is resolving some stack issues, and attempting to get the Graphics drivers to work.

Quote:
if you compile AROS with big endian Intel gcc then you can have seamless 68k + x86 AROS integration using some variant of my suggestion,

read + execute exceptions would toggle between emulated and nonemulated instructions,
That's true, if we treated all memory access in AROS as Big Endian we could have the same 68k emulation method as OS4 and MorphOS use. But it has been decided that the performance penalty of running a little Endian CPU with Big Endian data was too significant (something like 30% penalty) and that it was not worth it.
Besides if we use an integrated UAE we also get Hardware compatibility and improved stability so it's a benefit all round.

Quote:
Would there be any point in creating your own AROS PPC platform?
AROS IS the platform, what hardware you choose to run it on is up to you. :-)
Be that a Mac, a PC, a Pegasos, an A1 or a washing machine... it's up to you.



Quote:
>The Default Compiler is gcc.

this is the deciding factor,

which versions?

I hope you have gcc2.95.3-4 even though its not the most current,

is it a specifically AROS gcc or do you reuse generic ones?

Have you got 68k hosted cross compiler gcc's (PPC , Intel) for AROS?
To compile AROS you have to use the latest gcc (3.x.x). The AROS native version of gcc is the latest.

gcc supports the x86, the PPC and the 68k. this was a deciding factor in choosing it. Since AROS can be compiled for all of those CPUs.

Quote:
you realise that gcc is also an assembler, the moment a platform has gcc
it automatically has an assembler:
Of course, otherwise gcc wouldn't be able to output executable code. But the AROS distribution also includes the x86 assembler NASM, for those that want to just play with x86 asm in AROS.
It should be noted however, that due to the cross platform nature of AROS, the use of ASM is actively discouraged. One should use C at all times.
The only exception is in some low level systems (notably the exec.library) which need to access CPU specific features.

Quote:
so eg commercial AROS IBrowse can be closed source?
Yes of course software can be closed source. But it would then be up to the software vendor to provide support for the different CPU versions of their software (ie 68k, PPC and x86). This can lead to the situation where x86 AROS users get a program that PPC AROS users don't get, and Vice Versa.

Quote:
iospirit announced they have abandoned OS4 development,
there was a link to this from AmigaWorld.net at the time of the
KMOS takeover,


ask iospirit if they will recompile + sell IBrowse to AROS,


they have nothing to lose by doing this,
they already have an up and running website for selling IBrowse,
If there is demand for it, it will happen. Remember that AROS does not at this time have a fully functional TCP/IP stack. So general networking software is not a priority.
__________________
My iPhone Game: Puny Humans -
http://itunes.apple.com/gb/app/puny-...362230281?mt=8
Old 04-13-2004, 09:14 PM   #52
whoosh777
Too much caffeine
Join Date: Jun 2003
Posts: 114
Default Re: 68k AGA AROS + UAE => winner!


@Megol,

-----------

<BOLLOX>

<SNIP>

-----------

in the time it would take to reply to your time-wasting I could
write some significant code, so I wont even give you the time of day,

buy your own watch,

you are yet another person who confuses h/w architecture with s/w,

oh well I suppose if you throw enough buzz words into one sentence
maybe it will get you somewhere, anything is possible,

sorry! byee!


@Karlos

>When I said the "big endian" data emulation model was slow,
>I meant compared to normal x86 code - not that it will be slow compared
>to a real 680x0

fair enough,

it will be trickier for 68k emulation to share datastructures with the
host AROS,

:you could run it in its own screenspace without problems,


>becomes slow though because the byte swapping will have to
>
>That is to say, compile 2 programs, one running in the normal x86 little
>endian memory access fashion, and once compiled to support a big endian data
>model. The little endian version will have much less work to do at runtime
>(eg swapping data when loading/saving registers)
>and also the code will be optimal for its
>"memory based operand" architecture.

probably true, but you need to actually generate both versions and time it
before concluding things,

so many times I have done analysis comparable to yours and then spent
an afternoon "speeding" up the code: but when I run it the original "slow" code
is faster,

nowadays I always backup code fully before attempting "speed ups",

the big endian will certainly be slower, but will it be significantly slower?
eg in the gcc generated x86 asm fragment there was no overhead as the
reversal happened at compile time,

>As for RISC style load/store chips, lots of registers arent bad at all.
>Your main point is that you cant use them effectively and instructions
>increase in size etc.
>
>On the usage front, I beg to differ - just study the PPC open ABI.
> Lots of important stuff (useful for the system) can stay in registers and you
>have lots of volatile registers for passing data to functions directly etc.

you only need 1 register to store the entire system stuff,

register r9 = pointer to struct { void *this, *that, *theother, *execbase, ... } ;

you want execbase in r3?:

move.l 12(r9),r3

1 asm instruction, nothing really,
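
The offset 12 in that instruction follows from the struct layout with 4 byte pointers. A minimal sketch using `ctypes`; the class name is made up, and 32 bit integer fields stand in for the pointers so the result does not depend on the pointer size of the machine running the script:

```python
import ctypes


class SystemBlock(ctypes.Structure):
    # Four 32-bit slots standing in for the pointers of a 32-bit CPU
    _fields_ = [
        ("this", ctypes.c_uint32),
        ("that", ctypes.c_uint32),
        ("theother", ctypes.c_uint32),
        ("execbase", ctypes.c_uint32),
    ]


print(SystemBlock.execbase.offset)  # 12
```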

(I am not talking here about CPU registers eg SRP and TC on 68030 which are
not general purpose registers),

on AmigaOS its quite practical to have no system stuff at all in registers,

(ok the library base goes into a6, but thats it),

all other system stuff can be obtained in one or 2 asm statements
on the rare occasions its needed,

give me a list of specific system things you want in individual registers?

>Stack based passing is only used there when there are more than
>8 parameters to pass to a function and that's not often.

register arguments dont necessarily speed things up:
the registers need to be backed up to the stack and later restored
(==2 extra stack references), so its only useful for heavily used registers,

but if a register is heavily used the overhead for backing up the original
contents becomes irrelevant

so actually stack based arguments are more practical and totally portable,

>Also, the 3 operand instructions allow for lots of optimisations
>with clever compilers. Don't assume gcc is the smartest.
>I can quite assure you it isnt.

SAS C is better than gcc,

3 operand optimisations are a false economy:

to do d0 + d1 + d2 + d3 via 3 address code and lots of registers:

add.l d0,d1,d4
add.l d2,d3,d5
add.l d4,d5,d6

is 12 bytes of instructions and has a register cost of 3 (d4,d5,d6),

load d0
add d1
add d2
add d3

is a mere 4 bytes of instruction and has 0 register cost,
(register cost == 1 if you count the accumulator register),

that is an efficiency improvement of 3 and 3 respectively,
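
Both sequences can be run through toy interpreters to confirm they compute the same sum. This is a sketch only: the instruction tuples are an invented notation (destination first), and the sizes of 4 bytes per 3-address instruction and 1 byte per accumulator instruction are the figures assumed in this post, not measured encodings.

```python
def run_three_address(program, regs):
    """Each instruction is (dest, src_a, src_b): dest = src_a + src_b."""
    regs = dict(regs)
    for dest, a, b in program:
        regs[dest] = regs[a] + regs[b]
    return regs


def run_accumulator(program, regs):
    """Each instruction is (op, src) acting on one implicit accumulator."""
    acc = 0
    for op, src in program:
        if op == "load":
            acc = regs[src]
        elif op == "add":
            acc += regs[src]
    return acc


regs = {"d0": 1, "d1": 2, "d2": 3, "d3": 4}
three = [("d4", "d0", "d1"), ("d5", "d2", "d3"), ("d6", "d4", "d5")]
accum = [("load", "d0"), ("add", "d1"), ("add", "d2"), ("add", "d3")]

print(run_three_address(three, regs)["d6"])  # 10
print(run_accumulator(accum, regs))          # 10
print(len(three) * 4, len(accum) * 1)        # 12 4
```

Note that the first two 3-address adds do not depend on each other, while every accumulator step depends on the one before it.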


I do know what I am talking about despite Megols confused attempt,

I think I will argue my point till I am hoarse,


so my time is better spent coding which I am doing right now:
I am currently "improving" my 68k AmigaOS,


after todays posts I will be spending several hours coding to
modernise 68k AmigaOS, this is why I dont have the 1/2 hour to reply to Megol:
priorities,


your comments are constructive so I give it the 1/2 hour,


>Ultimately, it comes down to a system where memory accesses for
>data are rare, since a lot of stuff is created in registers,
>passed arounf in registers and ultimately never ends up in memory unless
>that's its final destination.

well written code will be implemented with minimal memory references,

often the major overhead is h/w i/o eg disks,

>This means memory/caches are hit less, stay valid for longer etc. and
>most memory accesses end up as bursts.

>>IBM dont know how to design CPUs otherwise why did they
>>originally use Intel and now use Motorolas PPC specification

>They don't? As I recall, Motorola used IBM's existing POWER
>(Performance Optimised With Enhanced RISC) architecture to create a desktop CPU,
>the PowerPC (with Apple as the chief customer).
>It was basically a partnership in which IBM provided the architecture,
>Motorola provided the fabrication processes and Apple sold them

you are right, Power is IBM's, for some reason I thought it was Motorola,

my point is valid though because Intel is faster,

"Performance Optimised With Enhanced RISC And Nowhere Near As Quick As Intel"

POWERANNAQAI

IBM good Intel bad Motorola hopeless,

>IBM currently make by far the best implementations of the PowerPC architecture,
>surpassing motorola by quite a margin.

true, and Intel is faster,

>On the design complexity front, the large register count is not a big deal.
>You do realise that modern x86 cores (since as far back as the Pentium2)
>use dozens of registers in shadow register files?
>Internally, the architecture is totally different from the
>code you see in your assembler. Incoming x86 code is decomposed into smaller
>RISC like operations. The core executes these operations and makes use of a
>very large number of registers in the process.

but hidden registers are better than visible ones,

I would prefer to have an ultra rapid stack cache rather than lots of registers,

have 32 * 64 bytes of stack cache,

>Hence the objections you raise aren't really valid because x86 and PPC
>both use multi-operand instructions with lots of registers.
>It's just that you see it directly on the PPC and don't on the x86

but then your objections to x86 also arent valid because your criticism
was that x86 has to act on ram operands and now you tell me it doesnt,

>Finally, I dunno what bswap are for, but I expect they probably aren't
>endian conversion operations. I'm not too hot on x86 assembler

ok, I wont ask Megol because he's trouble,

>The trouble is, stimulating conversation though it has been,
>I genuinely don't feel I'm helping you get any closer that goal,
>so I'll settle for wishing you luck

fine,

Old 04-13-2004, 09:56 PM   #53
Karlos
Sockologist
Join Date: Nov 2002
Location: Barishabaad, Sardistan
Posts: 16,670
Blog Entries: 18
Default Re: 68k AGA AROS + UAE => winner!

Hi,

Quote:
but then your objections to x86 also arent valid because your criticism was that x86 has to act on ram operands and now you tell me it doesnt
No, that's not quite what I said. I meant the design paradigm of x86, from the programmer perspective, is that it can use memory direct operands for x86 level instructions, just like 680x0 does. So you do an add operation of a memory operand onto a register or vice versa etc.

It doesn't *have* to use memory operands (adding register to register, for example, is fine), but it doesn't have a great deal of registers to spare, so it makes sense to use memory operands.

Up to the Pentium, this is how the core worked, but it was badly falling behind newer RISC architectures in terms of performance. The chip designers could see the advantage of load/store based superscalar CPUs, but couldn't throw away their existing object code compatibility.

So they did pretty much what motorola also did with the 68060, and later cores simply used a lower level RISC style "micro-op" code that incoming x86 instructions are decomposed into.

So, from your perspective, as a programmer, the x86 does use memory based operands, and you can add a memory operand to your accumulator register.

Now, in reality, that add instruction, during execution gets decomposed into a micro-op load operation to fetch the ram operand, an add, a register file write and so on.

If you look at what x86 code is translated into at micro-op level, you see something very much more like your PPC style code, with lots of register to register operations, many registers and load/store for memory access.

However, as a programmer, you are not exposed to this level, but it does exist. This is partly why most chip designers simply shrug off RISC v CISC comparisions these days, since deep down most modern CPUs are virtually the same.

Regarding 32 registers on PPC, I wrote plenty of code that makes use of it. You are incorrect about the implicit register backup required on the stack for register based calls because the PPC open ABI defines those registers as volatile. Just like you never need to worry about saving d0/d1/a0/a1 for most 680x0 stuff (and similarly cannot assume they survive a function call), roughly half the registers of the PPC ABI are volatile and can be used for argument passing and temporary local variables.

I wrote various matrix multiplication code for transforms on ppc (some of the only hand asm I wrote) that is extremely fast since the entire transformation matrix' terms remain in registers for any number of vertices processed. 12 floats defining the business end of the matrix are held in volatile registers and you even get single cycle 4-operand multiply-add (register*register+register -> register) operations to help when evaluating the matrix * vector stuff.

Literally, throughout the loop the only stuff accessing memory was the instruction stream, incoming vertices and outgoing transformed vertices. And the loop, of course, pretty much stays in the cache :-D
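
The shape of that loop can be sketched in a few lines. This is not the original PPC asm, just an illustration of the idea: the 12 matrix terms are unpacked once into locals (standing in for registers) and only the vertices stream through. A row-major 3x4 layout with the translation in the last column is an assumption here.

```python
def transform(matrix, vertices):
    """Apply a 3x4 affine matrix (12 floats, row-major) to (x, y, z) vertices."""
    (m00, m01, m02, m03,
     m10, m11, m12, m13,
     m20, m21, m22, m23) = matrix          # loaded once, reused for every vertex
    out = []
    for x, y, z in vertices:
        out.append((
            m00 * x + m01 * y + m02 * z + m03,   # each term is a multiply-add
            m10 * x + m11 * y + m12 * z + m13,
            m20 * x + m21 * y + m22 * z + m23,
        ))
    return out


# Identity rotation plus a translation of (1, 2, 3)
translate = (1.0, 0.0, 0.0, 1.0,
             0.0, 1.0, 0.0, 2.0,
             0.0, 0.0, 1.0, 3.0)
print(transform(translate, [(1.0, 1.0, 1.0)]))  # [(2.0, 3.0, 4.0)]
```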

There is no way I could get say the 680x0 to get anywhere near the code efficiency of it.

I know you don't see large register files as at all useful, but if you look at the way present x86 processors actually work inside and the way they are going with the 64-bit designs (IIRC, IA64 has 256 registers) I think you might be in a minority.
Old 04-13-2004, 11:54 PM   #54
whoosh777
Too much caffeine
Join Date: Jun 2003
Posts: 114
Default Re: 68k AGA AROS + UAE => winner!


@bloodline


>>Getting AROS to boot directly on an A1 sounds a very high priority project, so if it hasnt been completed I may join that project, it also sounds very interesting,


>>I understand AROS already runs above Linux on A1 so AROS is there already but not the way many people want,




>>It would certainly be a great thing to have AROS running natively on the A1.

>The PPC Linux hosted version of AROS is coming on rather quickly thanks to Markus,
>who is resolving some stack issues, and attempting to get the Graphics drivers
>to work.

for me the word Linux is underlined here,

can this Linux work be reused in a directly A1 booting AROS?

IMO the future of the Amiga will be entirely powered by 3rd party developers,


I still havent decided between A1 vs PC, I have decided on AROS though!

I feel if I buy a PC I am a turncoat or traitor, however if AROS directly
boots ie no Windows and its not Intel then maybe thats better than
using IBM PPC on the A1?

Its strange that IBM are now "good" and Intel are "bad",

Having read the book "Big Blue" IMO IBM are anything but good,

and there is no basis for thinking Intel are "bad": Intel never
did anything "bad",

MS OTOH IMO are bad,


Now PPC AROS has a very clean boot environment: portable + standard
via Openfirmware and eventually UBoot too,

How clean is PC AROS boot?

(the cleanness of the PPC AROS boot appeals to me),


Re PC AROS if I have understood you:

1. I buy a PC,
2. I download AROS,
3. I directly boot AROS?
4. I run UAE above AROS for full 68k compatibility?

Is this correct?



In the UK have you any tips about buying a new PC?

Are the places like PCWorld, Comet, Staples, Dixons a good place to try
or should I go to specialist shops eg from computer mag adverts?



>>If you compile AROS with big endian Intel gcc then you can have seamless 68k + x86
>> AROS integration using some variant of my suggestion,

>>read + execute exceptions would toggle between emulated and nonemulated instructions,




>That's true, if we treated all memory access in AROS as Big Endian
>we could have the same 68k emulation method as OS4 and MorphOS use.
>But it has been decided that the performance penalty of running
>a little Endian CPU with Big Endian data was too significant
>(something like 30% penalty) and that it was not worth it.

30% is nothing,

if a car goes by at 70mph and 10 minutes later another car goes by at 100mph
would you know the difference (I am talking about perceptions here),
(70mph being 30% slower than 100mph)

can you go both ways: ie have Big endian PC AROS and Little endian PC AROS,


>Besides if we use an integrated UAE we also get Hardware compatibility
>and improved stability so it's a benefit all round.

can you integrate UAE at the RAM level with little endian RAM?

if a 68k program accesses OS data structures ints and words at the byte level
or bytes at the word level the OS will get mangled

most programs wont do this so maybe you dont lose too much,
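
The mangling described above can be shown with one 32 bit field. A small sketch: the same value is laid out the way a 68k program expects and the way a little endian AROS would store it, then read at the byte and word level.

```python
import struct

value = 0x12345678
big_ram = struct.pack(">I", value)      # layout a 68k program expects
little_ram = struct.pack("<I", value)   # layout in a little endian AROS

# A 68k program peeking at the top byte of the long at offset 0:
print(hex(big_ram[0]))      # 0x12
print(hex(little_ram[0]))   # 0x78

# A 68k program reading the upper word of the long at offset 0:
print(hex(struct.unpack(">H", big_ram[:2])[0]))     # 0x1234
print(hex(struct.unpack(">H", little_ram[:2])[0]))  # 0x7856
```

Swapping on whole long accesses fixes the first case only; partial accesses need to know the size of the field they land in.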

>>Would there be any point in creating your own AROS PPC platform?


>AROS IS the platform, what hardware you choose to run it on is up to you.
>Be that a Mac, a PC, a Pegasos, an A1 or a washing machine... it's up to you.

!

ok, the answer is no!

everyone says its not a big deal that Eyetech created the A1,
I wondered whether they could prove this by doing their own one,

>>>The Default Compiler is gcc.

>>this is the deciding factor,

>>which versions?

>>I hope you have gcc2.95.3-4 even though its not the most current,

>>is it a specifically AROS gcc or do you reuse generic ones?

>>Have you got 68k hosted cross compiler gcc's (PPC , Intel) for AROS?


>To compile AROS you have to use the latest gcc (3.x.x).
>The AROS native version of gcc is the latest.

do you keep the earlier ones?

>gcc supports the x86, the PPC and the 68k.
>this was a deciding factor in choosing it.
>Since AROS can be compiled for all of those CPUs.

gcc supports all modern CPUs in existence more or less,

certainly all CPUs that run Unix and Linux: there are many,

if you dont have gcc you have to have a very good excuse,
no other programs are compulsory though,


>>you realise that gcc is also an assembler, the moment a platform has gcc
>>it automatically has an assembler:


>Of course, other wise gcc wouldn't be able to output executable code.

not what I meant,
some compilers only have an internal assembler,

some Modula 3 compilers dont even have an internal assembler:
they convert Modula 3 progs to c which is then fed to gcc,

A Basic interpreter such as AmigaBasic I think doesnt have an assembler,
you can write + run progs with it though,

IMO 68k hosted cross compilers are very important,
imagine you have say:

big endian PC AROS, little endian PC AROS, Openfirmware AROS, UBoot AROS,
68k AROS,

ie 5 variants of AROS,

that would require 25 cross compilers to get code from any one AROS to any
other,


however if you have 68k hosted cross compilers then you only need 5
68k-hosted cross compilers,


this would greatly reduce the compiler maintenance overhead,
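
The counting argument above can be checked with a small C sketch (function names are illustrative only). It counts one compiler per (host, target) pair, so the 25 includes the 5 native compilers, and with a single agreed host the 5 includes the host's own native compiler.

```c
#include <assert.h>

/* Compilers needed so that every variant can build for every variant:
   one per (host, target) pair, native compilers included. */
static int compilers_any_to_any(int variants)
{
    assert(variants >= 0);
    return variants * variants;
}

/* Of those, the true cross compilers exclude the host == target pairs. */
static int cross_only_any_to_any(int variants)
{
    assert(variants >= 0);
    return variants * (variants - 1);
}

/* With one agreed host (here 68k), one compiler per target is enough. */
static int compilers_single_host(int variants)
{
    assert(variants >= 0);
    return variants;
}
```

For 5 variants this gives 25 host/target pairs (20 of them genuine cross compilers) against 5 compilers when everything is hosted on 68k.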

I think you need 2 68k hosted cross compiler gcc's
for PPC and PC AROS,

people on all Amiga variants could then start generating AROS native progs,




>But the AROS distribution also includes the x86 assembler NASM,
>for those that want to just play with x86 asm in AROS.

a real assembler is useful because it will conform with
3rd party assembler textbooks
:gcc assembler *doesnt* conform, it is totally nonstandard,
for me it has become my default 68k assembler though,

>It should be noted however, that due to the cross platform nature of AROS,
>the use of ASM is actively discouraged. One should use C at all times.
>The only exception is in some low level systems (notably the exec.library)
>which need to access CPU specific features.

I agree with the policy, however some things can only be done via assembler,


the AROS developers themselves probably need to use asm
to implement low level things,


also to really understand a machine you need to at least understand assembler,

there are a lot of programmers who will only code in assembler,
give them an assembler and they will write fantastic programs,

and they will *never* use C or any other high level language,

you were talking about demo coders, you may need to provide full docs on
coding entirely in assembler on specific platforms if you want
fantastic demos written,


These demos will be unportable but fantastic, the demo coders soon move on
to writing games,


there is a real buzz from directly controlling hardware from asm statements,
switching supervisor stack to fastram was really satisfying even though it
just amounts to:

movea.l new_stackpointer,a7

but done in supervisor mode, (its more complicated because you have to
also copy the exception stack frame over, I used 11 asm statements to do it
ending in rte,)

you cannot get this buzz from C. You are the controller with asm, with C
the controller is the compiler and the OS,

when you control via asm you start to develop a total
understanding of the system,



With the a500 and a600 there were literally *millions* of users who *never*
even used Workbench, they booted games directly,


this may be why games consoles are so popular,


Now you can also write C demos but its a totally different mindset,

C coding tends to lead to system programs and apps,

I am not into apps myself (except possibly developer apps) as
apps are either too entertaining (paint and songs) or too office-ish,
I want serious fun, not paint-cans and doh-ray-mee and accounts,
(company statistician speaking, pencil behind ear,),


assembler coding leads to games: creatures + music + graphics,


3D games probably needs C, though maybe just the engine needs to be done in C?


2D games is an inexhaustible genre,
my favourites are Lemmings, boulderdash, tetris,
(all of which could've been done in C),

good games are not actually about impressive graphics but about
engaging your mind in an interesting way,


IMO the best games are 2D, 3D games look impressive as an onlooker
but to actually play I find them a total disappointment: why not just do
something in the real world if you want 3D eg play basketball as a hobby,


learn to drive a real car, driving an arcade car is so sad, please grow up,
if you are going to be sad be sad properly!


computer 3D always sucks because you can literally see the computer slow down
and wince whenever something computationally complex happens,


:the real world never slows down (except in the circus in that film "Big Fish"),


if I could slow down the real world, I would smash a whole column of plates
and utilise the slowdown somehow,



>>so eg commercial AROS IBrowse can be closed source?

>Yes of course software can be closed source.

good, if it wasnt I would say "very interesting but no thank you",


industry is all about closed source design,

closed source => competition + money,
open source => bloat,

open source can be lean and mean but then it gets raided,


if you spend 1 month generating some nightmare cutting edge algorithms for
some API you may not want 3rd party developers to raid your work,


lets say you spent 1 year creating a cutting edge 3D graphics engine,
you could make some serious money from this, not just games:
you could sell it to film companies and make 7 digits of money,


what happens on gnu is developers cover their tracks by coating the
work with impenetrable layers of bloat,

probably same is true of Linux,

Amiga.org is closed source isnt it?


ISTR Mike Bouma paid a fat cheque for AmigaWorld.net, quite right too,
he had the money they had the IP, they made an exchange,
now he has the site, they have the money, everyones happy,


Mike really wanted amiga.org, had amiga.org been open source he would
have just downloaded it and created his own site to compete with amiga.org,

(this is public knowledge based on public discussions between Mike and Wayne),

I think Mike wasnt prepared to pay for Waynes asking price,

maybe Wayne should have rented it out!
(or rented out some forums),


>But it would then be up to the software vendor to provide support for
>the different CPU versions of their software (ie 68k, PPC and x86).
>This can lead to the situation where x86 AROS users get a program that PPC
> AROS users don't get, and Vice Versa.

but this is better than no program at all!

Shelves of stuff will only appear if its closed source,

a lot of www.aminet is closed source,

Note that if a 68k version is also done then it reaches all platforms via UAE,
so its just the native compile that would be lacking,

:this is a good reason for having a fully implemented 68k AROS,


>>iospirit announced they have abandoned OS4 development,
>>there was a link to this from AmigaWorld.net at the time of the
>> KMOS takeover,
>>
>>
>>ask iospirit if they will recompile + sell IBrowse to AROS,
>>
>>
>> they have nothing to lose by doing this,
>> they already have an up and running website for selling IBrowse,

>If there is demand for it, it will happen.
>Remember that AROS does not at this time have a fully functional TCP/IP stack.
>So general networking software is not a priority.


If AROS has fully integrated 68k compatibility you could use a
3rd party 68k TCP/IP stack until you have your own open source one written,


:you mustnt use Amiga co. OS binaries, but you are free to use 3rd party
68k OS binaries + libraries eg from www.aminet,

can you get the 3rd party TCP/IP stack people to recompile their code for you?
presumably its in OS3.1 c??

IBrowse and other commercial developers may not be aware that
closed source is permitted,


I didnt know and assumed it wasnt permitted except via 68k and UAE,
in fact I was dreading your reply to the question, luckily the answer is
exactly what I want,


you may need to target some publicity at potential commercial developers
about AROS allowing closed source + commercial programs,


AROS publicity tends to tell us that AROS is an open source reimplementation
of OS3.1, the phrase "closed source" is never mentioned,

to this day I dont even know if closed source commercial binaries
are allowed on Linux,

Old 04-14-2004, 04:05 AM   #55
bloodline
Master Sock Abuser
Join Date: Mar 2002
Location: London, UK
Posts: 11,977
Blog Entries: 3
Default Re: 68k AGA AROS + UAE => winner!

Quote:
>>It would certainly be a great thing to have AROS running Nativly on the A1.

>The PPC Linux hosted version of AROS is coming on rather quickly thanks to Markus,
>who is resolving some stack issues, and attempting to get the Graphics drivers
>to work.

for me the word Linux is underlined here,

can this Linux work be reused in a directly A1 booting AROS?
Yes, when you think of AROS Hosted, think of Linux as a Hardware abstraction layer. Running AROS on Linux can be thought of as the same as running AOS3.1 in UAE.

While AROS is running on Linux one can work out all the bugs and issues. Then you can add the Firmware boot code and boot AROS on its own.

It should be noted that 99.9% of the AROS source code is cross platform, it's just the CPU specific/ASM stuff that needs reworking.

Quote:
I feel if I buy a PC I am a turncoat or traitor, however if AROS directly
boots ie no Windows and its not Intel then maybe thats better than
using IBM PPC on the A1?

Its strange that IBM are now "good" and Intel are "bad",

Having read the book "Big Blue" IMO IBM are anything but good,

and there is no basis for thinking Intel are "bad": Intel never
did anything "bad",

MS OTOH IMO are bad,
To be honest, the "good"/"bad" labels are a redundant conceptual model. Nothing is good or bad, things fall into two categories, "Useful" and "Useless" with respect to your requirements.
Windows for example is "Useless" if I want to use an OS that looks and feels the way I want an OS to look and feel, but it is "Useful" if I want to run a certain piece of software.
When choosing hardware you must first consider what your requirements are, then choose what is useful and at the cheapest price. Ignore religious/political issues like "brand" and "make", these things are unimportant when it comes to technology.



Quote:
How clean is PC AROS boot?

(the cleanness of the PPC AROS boot appeals to me),


Re PC AROS if I have understood you:

1. I buy a PC,
2. I download AROS,
3. I directly boot AROS?
4. I run UAE above AROS for full 68k compatibilty?

Is this correct?
Yes, just download the AROS CD image, burn that to a CD-ROM, then put that in the CD drive, turn the PC on... AROS will boot and run by itself.
You will be presented with an early startup menu allowing you to select certain hardware options (good news if you have an Nvidia gfx card), or you can ignore them and it will boot after 5 seconds.

Quote:
In the UK have you any tips about buying a new PC?

Are the places like PCWorld, Comet, Staples, Dixons a good place to try
or should I go to specialist shops eg from computer mag adverts?
I would build the machine myself. There are plenty of shops that sell PC parts for good prices. http://www.dabs.com is a great UK website selling top quality parts for a low price.
Don't forget that Black Troll sell complete PCs with AROS already installed for around 160 or so (depending upon the exchange rate).
High Street stores will rip you off.

Quote:
>That's true, if we treated all memory access in AROS as Big Endian
>we could have the same 68k emulation method as OS4 and MorphOS use.
>But it has been decided that the performance penalty of running
>a Little Endian CPU with Big Endian data was so significant
>(something like 30% penalty) that it was not worth it.

30% is nothing,

if a car goes by at 70mph and 10 minutes later another car goes by at 100mph
would you know the difference (I am talking about perceptions here),
(70mph being 30% slower than 100mph)

can you go both ways: ie have Big endian PC AROS and Little endian PC AROS,
30% is far too much. When running AROS on a 3.066GHz CPU, are you really happy to write off nearly a whole 1GHz (919.8MHz) of performance?

There is no point in crippling a CPU. AROS runs using the native byte order of the CPU, thus it is big endian on 68k and PPC and little endian on the x86.

You could build a big endian AROS for the x86 but that would be incompatible with the faster little endian one.

Quote:
>Besides if we use an integrated UAE we also get Hardware compatibility
>and improved stability so it's a benefit all round.

can you integrate UAE at the RAM level with little endian RAM?

if a 68k program accesses OS data structures ints and words at the byte level
or bytes at the word level the OS will get mangled

most programs wont do this so maybe you dont lose too much,
The idea for the integrated UAE is that the 68k and the x86 do not share data structures, but instead the two systems synchronise their data. This will allow 68k programs to run in the same environment as the x86 programs. The only downside is that 68k programs will not be able to call x86 functions and vice versa. This could be possible, but is probably not worth it. The UAE emulator will be running a 68k version of AROS (specially designed to synchronise with the x86 version).

Quote:
everyone says its not a big deal that Eyetech created the A1, I wondered whether they could prove this by doing their own one,
Anyone is able to sell Terons. I could put a little sticker on it if you like and sell it to you.

Quote:
people on all Amiga variants could then start generating AROS native progs,
Since AROS is source code compatible with AmigaOS, it is easy to write your program on your A1200 in C... and then take that source code to an AROS machine and recompile for whatever CPU it is running.

gcc has a cross compiler, there is no problem generating code for any CPU from any CPU, providing you have the includes of course.

Quote:
computer 3D always sucks because you can literally see the computer slow down
and wince whenever something computationally complex happens,
I guess you've not used a new 3D card then. I have yet to write a program that causes my Radeon 9000 to slow down even with over 10000 objects (using the DX7 interface).

Quote:
Amiga.org is closed source isnt it?
No. Both Amiga.org and Amigaworld.net use xoops which is a great example of opensource software. Aros-Exec.org also uses xoops.

Quote:
Note that if a 68k version is also done then it reaches all platforms via UAE,
so its just the native compile that would be lacking,

:this is a good reason for having a fully implemented 68k AROS,
We need the 68k AROS for the integrated UAE Emulator idea. I also want to run AROS on my A1200. We do have a working 68k AROS, but it needs to be adapted to boot on Amiga hardware. At the moment it only boots on the Palm PDA.

Quote:
If AROS has fully integrated 68k compatibility you could use a
3rd party 68k TCP/IP stack until you have your own open source one written,
That would require a big endian AROS, something that we have already decided is a bad idea on the x86.

AROS will not use a 3rd party TCP/IP stack. When AROS gets a TCP/IP stack it will be fully integrated and designed as part of AROS, rather than an add-on.

Quote:
you may need to target some publicity at potential commercial developers
about AROS allowing closed source + commercial programs,


AROS publicity tends to tell us that AROS is an open source reimplementation
of OS3.1, the phrase "closed source" is never mentioned,
AROS is a word of mouth effort, there's no budget for promotion :-)

Well AROS is an Open source reimplementation of OS3.1 :-)

Quote:
to this day I dont even know if closed source commercial binaries are allowed on Linux,
It depends. If you link to a GPL library or use any GPL code, then your software automatically becomes GPL.

If you link to an LGPL library then your program is not GPL or LGPL.

If you use any BSD code then that does not cause your code to be BSD.

Simply check the licence of any software you are using to find out what you can and can't do.

AROS is covered by the APL, which is similar to LGPL. If you use AROS source only the code that you use must remain Opensource. The rest of your program is yours.

Licence issues are very complex.
__________________
My iPhone Game: Puny Humans -
http://itunes.apple.com/gb/app/puny-...362230281?mt=8
Old 04-14-2004, 04:40 AM   #56
Crumb
Defender of the Faith
Join Date: Mar 2002
Posts: 1,764
Default Re: 68k AGA AROS + UAE => winner!

@Bloodline:

hi!
"An A.orger who wants inline 68k emulation in x86 AROS "

that's me! :-)

Bernd Meyer replied to some of my posts on ANN, and he thinks that using the memory as I described (using the memory in reverse order for the 68k memory allocated areas like in the Mac emu Executor) may work to a certain degree but may cause problems like reversed screens etc... so a better approach should be found
__________________
The only spanish amiga news web page/club: Club de Usuarios de Amiga de Zaragoza (CUAZ)
Old 04-14-2004, 04:48 AM   #57
bloodline
Master Sock Abuser
Join Date: Mar 2002
Location: London, UK
Posts: 11,977
Blog Entries: 3
Default Re: 68k AGA AROS + UAE => winner!

Quote:
Crumb wrote:
@Bloodline:

hi!
"An A.orger who wants inline 68k emulation in x86 AROS "

that's me! :-)

Bernd Meyer replied to some of my posts on ANN, and he thinks that using the memory as I described (using the memory in reverse order for the 68k memory allocated areas like in the Mac emu Executor) may work to a certain degree but may cause problems like reversed screens etc... so a better approach should be found
Then have a go and get a test program up and running, and we can test it and see if we can work out any bugs.
Old 04-14-2004, 07:03 AM   #58
Karlos
Sockologist
Join Date: Nov 2002
Location: Barishabaad, Sardistan
Posts: 16,670
Blog Entries: 18
Default Re: 68k AGA AROS + UAE => winner!

Quote:
Crumb wrote:

Bernd Meyer replied to some of my posts on ANN, and he thinks that using the memory as I described (using the memory in reverse order for the 68k memory allocated areas like in the Mac emu Executor) may work to a certain degree but may cause problems like reversed screens etc... so a better approach should be found
Just thinking about this.

Suppose your Task structure is a little different for any thread currently emulating 68K code, allowing the OS to see that this thread expects big endian memory.

Libraries can then detect if the caller is expecting big endian data or not. So in theory you could allow calls to the OS from the 680x0 side because they can check for it and respond accordingly.

Alternatively, you can have stub libraries for "big endian data access" to normal libraries and whenever an emulated 680x0 thread opens a library, it really gets the stub.
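
A rough C sketch of the caller-check idea, purely as an assumption about how it might look: `SketchTask`, its `expects_big_endian` flag and `lib_result_for_caller()` are hypothetical names, not the real exec Task layout or any AROS function. The library works in host byte order and only swaps the result when the calling thread expects the other order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task flag; NOT the real exec Task structure. */
struct SketchTask {
    int expects_big_endian;   /* non-zero for a thread emulating 68k code */
};

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Hypothetical library exit path: the value is computed in host byte
   order and swapped only if the caller expects the other order. */
static uint32_t lib_result_for_caller(const struct SketchTask *caller,
                                      uint32_t native_value)
{
    assert(caller != NULL);
    int host_big = !host_is_little_endian();
    int caller_big = caller->expects_big_endian != 0;
    return (host_big == caller_big) ? native_value : swap32(native_value);
}
```

A stub library would do the same thing, except the swap lives in a wrapper that the emulated thread gets from its open call, so the native library itself never has to test the flag.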

As for that screen issue.

Now, your 680x0 emulation opens a screen by ultimately calling the x86 native OS code. The data format returned should be in whatever format makes most sense for the x86, since most of the rendering will be done by the OS anyway.
If the user (under 680x0 emulation) wants to render into the screen directly, via LockBitMapTags() or whatever, the data format returned for the locked bitmap will be eg RGB16PC or BGRA or whatever other absolute format. Most likely it will always be a "little endian" format.

Since locking bitmaps for direct access *must* care for any format returned anyway, I don't see a problem for the 680x0, other than the fact it will be performing an unneeded byteswap in 680x0 code (which is then byteswapped again in the emulation :-D ) so the relative performance will be down a little.

If you think about it, lots of existing Amiga gfx cards use little endian data modes that any code doing direct access has to take care of; I don't see this as being any different.
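
As a sketch of what "taking care of the format" means for direct access, here is some illustrative C. The `SketchPixFmt` tags are invented for this example (real graphics APIs define their own pixel format constants); the point is that writing the pixel byte by byte according to the reported format gives the same result whatever the byte order of the CPU running the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format tags for this sketch only. */
enum SketchPixFmt {
    SKETCH_RGB16_BE,   /* 5-6-5, high byte first */
    SKETCH_RGB16_LE    /* 5-6-5, low byte first ("PC" layout) */
};

/* Pack 8-bit channels into a 16-bit 5-6-5 value. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Write one pixel byte by byte, so the stored layout depends only on
   the bitmap's format and never on the host CPU's byte order. */
static void put_pixel16(uint8_t *dst, enum SketchPixFmt fmt,
                        uint8_t r, uint8_t g, uint8_t b)
{
    assert(dst != NULL);
    uint16_t px = pack_rgb565(r, g, b);
    if (fmt == SKETCH_RGB16_LE) {
        dst[0] = (uint8_t)(px & 0xFF);
        dst[1] = (uint8_t)(px >> 8);
    } else {
        dst[0] = (uint8_t)(px >> 8);
        dst[1] = (uint8_t)(px & 0xFF);
    }
}
```

Pure red packs to 0xF800, which lands in memory as 00 F8 for the little endian layout and F8 00 for the big endian one.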
__________________
OCA
This isn't SCSI... This is SATA!!!
I have CDO. It's like OCD except all the letters are in ascending order. The way they should be.
Core2 Quad Q9450 2.66GHz / X48T / 4GB DDR3 / nVidia GTX275 / Linux x64, AROS, Win64
A1XE 800MHz / 512MB / Radeon 9200 / OS4.1
A1200T BPPC 240MHz / 256MB / Permedia 2 / OS 3.1 - OS3.9, OS4
A1200T Apollo 1240 28MHz / 32MB / Mediator1200 / Voodoo 3000 / OS3.9
A1200D Apollo 1240 25MHz (ejector seat ROM edition) / 32MB
Old 04-15-2004, 03:44 AM   #59
whoosh777
Too much caffeine
Join Date: Jun 2003
Posts: 114
Default Re: 68k AGA AROS + UAE => winner!


by Karlos on 2004/4/14 4:56:20

Hi,

Quote:


>>but then your objections to x86 also arent valid because your criticism was that x86 has to act on ram operands and now you tell me it doesnt


/*
No, that's not quite what I said. I meant the design paradigm of x86,
from the programmer perspective, is that it can use memory direct operands for
x86 level instructions, just like 680x0 does. So you do an add operation of a
memory operand onto a register or vice versa etc.

It doesn't *have* to use memory operands (adding register to register,
for example is fine), but it doesn't have a great deal of registers to spare,
so it makes sense to use memory operands.

Up to the Pentium, this is how the core worked, but it was badly falling
behind newer RISC architectures in terms of performance.
The chip designers could see the advantage of load/store based superscalar CPUs,
but couldn't throw away their existing object code compatibility.

So they did pretty much what motorola also did with the 68060, and later
cores simply used a lower level RISC style "micro-op" code that incoming x86
instructions are disassembled into.

So, from your perspective, as a programmer, the x86 does use memory based
operands, and you can add a memory operand to your accumulator register.
*/

effectively you are saying it has an internal RISC emulator of the CISC code,

/*
Now, in reality, that add instruction, during execution gets decomposed into a
micro-op load operation to fetch the ram operand, an add, a register file
write and so on.

If you look at what x86 code is translated into at micro-op level, you see
something very much more like your PPC style code, with lots of register to
register operations, many registers and load/store for memory access.
*/
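
A toy C model of that decomposition, offered only as an illustration: real micro-op encodings are internal to each CPU, so the `Uop` structure, the hidden `TMP_REG` and both functions below are invented for this sketch. It splits an "add register, [memory]" instruction into a load micro-op followed by a register-to-register add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum UopKind { UOP_LOAD, UOP_ADD };

/* One invented micro-op: either "dst = mem[addr]" or "dst += src". */
struct Uop {
    enum UopKind kind;
    int dst;        /* destination register index */
    int src;        /* source register index (UOP_ADD only) */
    size_t addr;    /* memory word index (UOP_LOAD only) */
};

#define TMP_REG 7   /* hidden temporary, invisible to the programmer */

/* Split "add reg, [addr]" into a load followed by a register add.
   Returns the number of micro-ops written to out. */
static int decode_add_mem(int reg, size_t addr, struct Uop out[2])
{
    assert(reg >= 0 && reg < TMP_REG);
    out[0] = (struct Uop){ UOP_LOAD, TMP_REG, 0, addr };
    out[1] = (struct Uop){ UOP_ADD, reg, TMP_REG, 0 };
    return 2;
}

/* Execute micro-ops against 8 registers and a read-only memory. */
static void run_uops(const struct Uop *uops, int count,
                     uint32_t regs[8], const uint32_t *mem)
{
    assert(uops != NULL && regs != NULL && mem != NULL);
    for (int i = 0; i < count; i++) {
        if (uops[i].kind == UOP_LOAD)
            regs[uops[i].dst] = mem[uops[i].addr];
        else
            regs[uops[i].dst] += regs[uops[i].src];
    }
}
```

The programmer-visible instruction names one register and one memory operand; the inner core only ever sees a load and an add between registers.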

ok, but this may be exactly why Intel is faster than PPC,

you are regarding this mechanism as worse than PPC,
but I think it may be better than PPC,

I have to talk in terms of 68k which I understand, so pretend we have a
68k version of Intels mechanism ie a PPC inner core interpreting 68k binary,

the advantage is that the binary is typically twice as small which means we
double the instruction access speed,

that inner RISC core when it emulates will be using some inner ultra fast RAM,
ie not external real RAM, thus we combine the small code advantage of CISC
with the fast code execution of RISC,

if a 68k + PPC hybrid were done ie 68k binaries with inner PPC interpretation
of the 68k as with Intel then I think this could be considerably faster
than PPC,

/*
However, as a programmer, you are not exposed to this level, but it does exist.
This is partly why most chip designers simply shrug off RISC v CISC comparisons
these days, since deep down most modern CPUs are virtually the same.
*/

outer RISC may be inferior to inner RISC + outer CISC,

/*
Regarding 32 registers on PPC, I wrote plenty of code that makes use of it.
You are incorrect about the implicit register backup required on the stack for
register based calls because the PPC open ABI defines those registers as volatile.
Just like you never need to worry about saving d0/d1/a0/a1 for most 680x0 stuff (and similarly
cannot assume they survive a function call), roughly half the registers of the
PPC ABI are volatile and can be used for argument passing and temporary local
variables.
*/

but say there are 16 such registers then they will not survive across function
calls,

if those 16 registers contain info needed beyond the calls then you have to
back them up to another 16 registers or the stack,

I think the function call convention is actually quite a difficult problem,

the best answer is probably somewhere in between what I am saying and what
you are saying, eg maybe:

1. small number of volatile scratch registers,
2. the first function arguments in non-scratch registers: if function arguments
are in scratch registers then those registers are no longer scratch!:

scratch registers are short term breathing space,

3. further arguments to the stack,

eg on 68k:

a0,a1,d0,d1 as scratch, and function arguments f(d2,a2,d3,a3)

if you have too many function arguments in registers you
run out of breathing space: the called function has to start using the
stack to free up registers for internal use,
also the calling function may already be using those registers for
something else so it has to back them up somewhere,


if you have too many scratch registers its wasteful,


So I concede that you need some register arguments but I differ with
68k AmigaOS in that I think they should be non scratch and not too many,
not too few either!

I also think you should have uniform register argument usage:

eg argument 1 is always d2,
in AmigaOS the usage is totally erratic eg: Open(d1,d2), FindTask(a1),
AllocSignal(d0), Enqueue(a0,a1)


I also differ in thinking that you only need 1 system variables register if any,
which would point to a structure containing the OS global variables,



/*
I wrote various matrix multiplication code for transforms on ppc (some of the
only hand asm I wrote) that is extremely fast since the entire transformation
matrix's terms remain in registers for any number of vertices processed. 12 floats
defining the business end of the matrix are held in volatile registers and you
even get single cycle 4-operand multiply-add
(register*register+register -> register) operations
to help when evaluating the matrix * vector stuff.

Literally, throughout the loop the only stuff accessing memory was the instruction
stream, incoming vertices and outgoing transformed vertices. And the loop, of
course, pretty much stays in the cache.

There is no way I could get say the 680x0 to get anywhere near the code efficiency
of it.
*/
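
For readers who want to picture that loop, here is a C sketch of the same shape. It is not Karlos's code: the function name and the row-major 3x4 layout are assumptions. The 12 matrix terms are copied into locals once so a compiler with enough registers can keep them there, and each output is a chain of multiply-adds that a PPC compiler may turn into fused multiply-add instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Transform `count` vertices (x, y, z triples) by a 3x4 matrix `m`
   stored row-major as 12 floats; the 4th column is the translation. */
static void transform_vertices(const float m[12], const float *in,
                               float *out, size_t count)
{
    assert(m != NULL && in != NULL && out != NULL);

    /* Load the matrix once; inside the loop only vertices touch memory. */
    const float m00 = m[0], m01 = m[1], m02 = m[2],  m03 = m[3];
    const float m10 = m[4], m11 = m[5], m12 = m[6],  m13 = m[7];
    const float m20 = m[8], m21 = m[9], m22 = m[10], m23 = m[11];

    for (size_t i = 0; i < count; i++) {
        const float x = in[3 * i];
        const float y = in[3 * i + 1];
        const float z = in[3 * i + 2];
        out[3 * i]     = m00 * x + m01 * y + m02 * z + m03;
        out[3 * i + 1] = m10 * x + m11 * y + m12 * z + m13;
        out[3 * i + 2] = m20 * x + m21 * y + m22 * z + m23;
    }
}
```

On a CPU with only a handful of float registers the 12 terms would spill back to memory, which is the point being argued.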

I can believe that you generated optimal code relative to the PPC architecture,

if the CPU has lots of registers then you should make use of them if it makes
sense,


matrices would be very rapid with lots of registers,


to some extent optimal CPU design may depend on what you are using the
CPU for,


I think the design of an optimal FPU may be totally different from the
design of an optimal CPU,

FPU's are about real maths, whereas CPU's tend to be about
non maths stuff: addresses + flags + counts and such like,
flags are represented as numbers but they are not numbers but
booleans,


the maths involved in most of an OS is generally quite simple,
especially outside of graphics, most of an OS doesnt even require
floating point numbers,


the only use of floating points probably is for things like rendering
rescalable fonts,


/*
I know you don't see large register files as at all useful, but if you look at the
way present x86 processors actually work inside and the way they are going with
the 64-bit designs (IIRC, IA64 has 256 registers) I think you might be in a
minority.
*/

its survival of the fittest, just because some company or even all companies
makes some decision doesnt mean its correct,


most cars use petrol, but alcohol is sustainable and has no pollution:
alcohol turns into carbon dioxide and water, certainly no lead or fumes,
that carbon dioxide came from the atmosphere in the first place via
photosynthesis so its a clean cycle,


somewhere in South America they use alcohol powered cars,
and its quite viable: all you need is prairies filled with sugar cane
which you ferment, but there is probably 12 digit money politics
that makes sure we all use petrol,


I was talking about the fixed problem of what the PPC CPU does,
that a different architecture with the same technology could even be
4 x as fast,


FPUs are a totally different problem, eg vector processing in hardware
could make a huge difference:

have a very wide data bus and very wide vector registers,
then you can load a vector in one read cycle,

also 3D graphics is probably totally different from generic FPU use,
because in theory you could load an entire matrix in one read cycle,

in which case you only need 2 registers 1 for the matrix and one for
the vector,

if you have 32 bit 3 address instructions then you probably need
at least 16 opcodes == 4 bits which leaves 28 bits meaning that
512 registers is the maximum possible,
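
The bit budget behind that 512 figure, as a small C sketch (the function names are illustrative): 32 bits minus a 4-bit opcode leaves 28, which divides into three 9-bit register fields with 1 bit spare, and 9 bits address 512 registers.

```c
#include <assert.h>

/* Bits available for each register field once the opcode is taken out. */
static int register_field_bits(int instr_bits, int opcode_bits, int operands)
{
    assert(operands > 0);
    assert(opcode_bits >= 0 && instr_bits >= opcode_bits);
    return (instr_bits - opcode_bits) / operands;
}

/* Largest register file that such a field can address. */
static unsigned long max_registers(int instr_bits, int opcode_bits,
                                   int operands)
{
    int bits = register_field_bits(instr_bits, opcode_bits, operands);
    assert(bits < 32);   /* unsigned long is only guaranteed 32 bits wide */
    return 1UL << bits;
}
```

A wider opcode field shrinks the limit quickly: with 6 opcode bits the same 32-bit, 3-address format only has room for 8-bit fields, i.e. 256 registers.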


remember that vector maths is a very narrow slice of what computers are
used for thats why they dont use GPUs as CPUs!


I think we are starting to argue in circles,

Old 04-15-2004, 05:34 AM   #60
whoosh777
Too much caffeine
Join Date: Jun 2003
Posts: 114
Default Re: 68k AGA AROS + UAE => winner!


@bloodline,

I spent several hours yesterday looking at Comet, Staples and PC World,

Comet have an entry-level Intel based HP system for 270,
but it uses a shared memory Intel graphics card,

12.95 delivery charge,

128Meg ATI-Radeon 9xxx looks like its 100 at PC World,

Comet also do a 470 mid-level PC base unit also HP,
with 128 Meg ATI-Radeon 9xxx SE,
it has various things not in the 270 one eg writable DVD whereas
the 270 is read only for DVDs but writable for CDs,

all the systems in Staples seem expensive,

If I make a system from the parts via PCWorld its:
52.86 for CPU, mobo 47.59, 128Meg RAM 22.33, 128 Meg ATI-Radeon 100,
case 35,

which adds up to 257.78 without s/w or OS,

unbundled MS OS is 160 I was told,

Fast SCSI-2 interface for external drives is 29.95 from PCWorld,

looking at real machines made me realise what an achievement it is
to get AROS to boot directly on PCs,

>>>>It would certainly be a great thing to have AROS running Nativly on the A1.

>>>The PPC Linux hosted version of AROS is coming on rather quickly thanks to Markus,
>>>who is resolving some stack issues, and attempting to get the Graphics drivers
>>>to work.

>for me the word Linux is underlined here,

>can this Linux work be reused in a directly A1 booting AROS?


/*
Yes, when you think of AROS Hosted, think of Linux as a Hardware abstraction layer.
Running AROS on Linux can be thought of as the same as running AOS3.1 in UAE.

While AROS is running on Linux one can work out all the bugs and issues.
Then you can add the Firmware boot code and boot AROS on its own.
*/

ok thats good, it means indirectly they are working on a direct boot,


/*
It should be noted that 99.9% of the AROS source code is cross platform,
it's just the CPU specific/ASM stuff that needs reworking.
*/

as it should be!

/*/*
I feel if I buy a PC I am a turncoat or traitor, however if AROS directly
boots ie no Windows and its not Intel then maybe thats better than
using IBM PPC on the A1?

Its strange that IBM are now "good" and Intel are "bad",

Having read the book "Big Blue" IMO IBM are anything but good,

and there is no basis for thinking Intel are "bad": Intel never
did anything "bad",

MS OTOH IMO are bad,
*/*/


/*
To be honest, the "good"/"bad" labels are a redundant conceptual model.
Nothing is good or bad, things fall into two categories, "Useful" and "Useless"
with respect to your requirements.
Windows for example is "Useless" if I want to use an OS that looks and feels
the way I want an OS to look and feel, but it is "Useful" if I want to run a
certain piece of software.
*/

an interesting viewpoint,

certainly I don't like the Windows interface at all,
to me it says "go away", whereas the Amiga interface says "FUN-time"!


The bundles provide a full set of s/w, but I am really intent on
bypassing MS.

It's quite insidious how they force themselves on you the customer,
the default line of action is you end up with Windows and all the MS s/w,

that would be ok if they manufactured the h/w but they don't,
so it's not right,


looking at machines I don't get what the big deal is about MS,

they have created a drab looking OS and the only interesting thing about it
I can see is that all h/w is geared to run on it,

Really h/w manufacturers should be forced to release C drivers for their
products so any OS can use any driver,


it is truly sinful that 3rd party h/w can be produced to only run on Windows,


/*
When choosing hardware you must first consider what your requirements are,
then choose what is useful and at the cheapest price. Ignore religious/political
issues like "brand" and "make", these things are unimportant when it comes to
technology.
*/

I am upset though that something so horrible as MS has a stranglehold,
why can't we have a quality monopoly?

/*/*
How clean is PC AROS boot?

(the cleanness of the PPC AROS boot appeals to me),


Re PC AROS if I have understood you:

1. I buy a PC,
2. I download AROS,
3. I directly boot AROS?
4. I run UAE above AROS for full 68k compatibility?

Is this correct?

*/*/

/*
Yes, just download the AROS CD image, burn that to a CD-ROM,
then put that in the CD drive, turn the PC on...
AROS will boot and run by itself.
*/

sounds painless, so in theory I don't need MS?

now could I do this via the PC's hard disk instead of via CD-ROM?

ie if I download to the hard disk could I boot from that instead?

or if I download to an external SCSI hard drive on the Amiga and
then connect this to the PC via the interface I mentioned,

or is it only possible via CD?




/*
You will be presented with an early startup menu allowing you to select
certain hardware options (good news if you have an Nvidia gfx card),
or you can ignore them and it will boot after 5 seconds.
*/

looks like I will get an ATI Radeon if I get a PC or use the
onboard shared graphics Intel card,


/*/*
In the UK have you any tips about buying a new PC?

Are the places like PCWorld, Comet, Staples, Dixons a good place to try
or should I go to specialist shops eg from computer mag adverts?
*/*/


/*
I would build the machine myself. There are plenty of shops that sell PC parts
for good prices. http://www.dabs.com is a great UK website selling top
quality parts for a low price.
Don't forget that Black Troll sell complete PCs with AROS already
installed for around 160 or so (depending upon the exchange rate).
High Street stores will rip you off.
*/

I will look into these then,

I read this after making yesterday's visit to those shops,
160 is almost half the price of what I saw,


will the 160 PC come with MS OS or MS s/w or is MS completely absent?

what sort of graphics card?

do you think the shop bundles are there to catch ignorant first timers??

/*/*
>That's true, if we treated all memory access in AROS as Big Endian
>we could have the same 68k emulation method as OS4 and MorphOS use.
>But it has been decided that the performance penalty of running
>a Little Endian CPU with Big Endian data was so significant
>(something like a 30% penalty) that it was not worth it.

30% is nothing,

if a car goes by at 70mph and 10 minutes later another car goes by at 100mph
would you know the difference (I am talking about perceptions here),
(70mph being 30% slower than 100mph)

can you go both ways: ie have Big endian PC AROS and Little endian PC AROS,
*/*/

/*
30% is far too much. When running AROS on a 3.066GHz CPU, are you
really happy to write off nearly a whole 1GHz (919.8MHz) of performance?
*/
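
For reference, the arithmetic behind that figure can be sketched in C. This is only a back-of-envelope model which assumes the penalty scales linearly with clock speed; the helper name lost_mhz is made up for illustration.

```c
/* Back-of-envelope only: assumes the byte-order penalty scales
   linearly with clock speed. lost_mhz is a made-up helper name. */
double lost_mhz(double clock_mhz, double penalty_fraction)
{
    return clock_mhz * penalty_fraction;
}
```

Calling lost_mhz(3066.0, 0.30) gives roughly 919.8, ie the 919.8MHz quoted above.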

it may depend on your upbringing, I was trained to never put speed as the
top priority, ie robustness + portability + compatibility etc are
higher up the ladder than speed

eg a Formula 1 car is very fast but I don't see many people driving them
on the roads!

/*
There is no point in crippling a CPU. AROS runs using the native byte order
of the CPU, thus it is big endian on 68k and PPC and little endian on the x86.
*/

I don't mind though, and many people are happy with WinUAE and Amithlon which
are big endian on x86,

the be686-amithlon-gcc shows that people are even prepared to produce
big endian compilers for x86,
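
Roughly what a big endian setup on x86 has to do on each memory access, as a minimal sketch in C. It assumes the data is handled by swapping bytes on every load; swap32 and sum_foreign_words are names made up for illustration, not anything taken from AROS, Amithlon or that compiler, and real code may well do it differently (x86 has a BSWAP instruction for exactly this).

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using shifts and masks. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Sum an array of words stored in the opposite byte order to the host.
   Every single element needs a swap before it can be used. */
uint32_t sum_foreign_words(const uint32_t *words, size_t count)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += swap32(words[i]);
    return total;
}
```

The extra swap on every load and store is where any slowdown would come from; how big it is in practice would have to be measured.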

/*
You could build a Big Endian AROS for the x86 but that would be incompatible
with the faster Little Endian one.
*/

effectively it would be AROS on a different platform,

it would be very interesting to see what exactly the slowdown is,
you believe it's 30% but as it hasn't been done we don't know for sure

/*/*
>Besides if we use an integrated UAE we also get Hardware compatibility
>and improved stability so it's a benefit all round.

can you integrate UAE at the RAM level with little endian RAM?

if a 68k program accesses ints and words in OS data structures at the byte level
or bytes at the word level the OS will get mangled

most programs won't do this so maybe you don't lose too much,
*/*/
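
A small sketch in C of the mangling I mean. The function names are made up for illustration; the only assumption is the usual one that a 68k stores the high byte of a long first and an x86 stores the low byte first.

```c
#include <stdint.h>

/* Store a 32-bit value the way a 68k lays it out: high byte first. */
void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Store the same value the way an x86 lays it out: low byte first. */
void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* What 68k code does when it peeks at the low byte of a long:
   it reads the byte at offset 3. */
unsigned char low_byte_68k_style(const unsigned char *p)
{
    return p[3];
}
```

With 0x12345678 stored 68k style the peek returns 0x78 as the program expects, but with the same value stored x86 style it returns 0x12, so any 68k code doing this on little endian OS structures reads the wrong byte.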


/*
The idea for the integrated UAE is that the 68k and the x86 do not share
data structures.

Instead, the two systems synchronise their data.
*/

ok, you've gone down that path,
it's going to be much more work,

I suppose if you redirect each 68k jump vector to also call the
corresponding x86 jump vector or something,

will each x86 API call have to synchronise the 68k data structures?
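
To make the question concrete, here is a rough sketch in C of what synchronising one structure might involve. Everything in it is hypothetical: the structure, its 6 byte big endian image and the function names are made up for illustration and are not taken from AROS or UAE, and I don't know how the real design would do it.

```c
#include <stdint.h>

/* Hypothetical structure, NOT a real AROS or AmigaOS one. */
struct demo_native {
    uint32_t count;
    uint16_t flags;
};

/* Size of the 68k side image: 4 bytes count + 2 bytes flags. */
#define DEMO_68K_SIZE 6

/* Copy the native structure into the 68k side's big endian image. */
void demo_sync_to_68k(const struct demo_native *src, unsigned char *dst)
{
    dst[0] = (unsigned char)(src->count >> 24);
    dst[1] = (unsigned char)(src->count >> 16);
    dst[2] = (unsigned char)(src->count >> 8);
    dst[3] = (unsigned char)src->count;
    dst[4] = (unsigned char)(src->flags >> 8);
    dst[5] = (unsigned char)src->flags;
}

/* Copy changes made by 68k code back into the native structure. */
void demo_sync_from_68k(const unsigned char *src, struct demo_native *dst)
{
    dst->count = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
                 ((uint32_t)src[2] << 8) | (uint32_t)src[3];
    dst->flags = (uint16_t)(((unsigned)src[4] << 8) | src[5]);
}
```

Each side keeps its own copy in its own byte order, and something has to call the two functions whenever either copy changes, which is where the extra work comes in.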

/*
This will allow 68k programs to run in the same environment as the x86 programs.
The only downside is that 68k programs will not be able to call x86
functions and vice versa. This could be possible, but probably not worth it.
The UAE Emulator will be running a 68k version of AROS (specially designed to
synchronise with the x86 version).
*/

I suppose the open nature of AROS means someone else could
create their own variant of AROS some other way, (I'm not volunteering just yet!)

whereas with AmigaOS we would all be stuck with the company's decision,

/*/*
everyone says it's not a big deal that Eyetech created the A1,
I wondered whether they could prove this by doing their own one,
*/*/

/*
Anyone is able to sell Terons.
I could put a little sticker on it if you like and sell it to you.
*/

see you are telling me it's not a big deal,

how much would you sell it for?

/*/*
people on all Amiga variants could then start generating AROS native progs,
*/*/

/*
Since AROS is source code compatible with AmigaOS, it is easy to write your
program on your A1200 in C... and then take that source code to an AROS
machine and recompile for whatever CPU it is running.

gcc can be built as a cross compiler, so there is no problem generating code for any
CPU from any CPU, providing you have the includes of course.
*/

your gcc outputs to multiple CPUs?

68k gcc only appears to output 68k series code,
but it sounds like your one outputs PPC and Intel code,

in which case there isn't an issue except for 68k people as it appears
you haven't completed 68k Amiga AROS,
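
As a small illustration of one source file being recompiled for whichever CPU, here is a sketch in C that reports the target it was built for. It relies on macros that gcc predefines per target; the function name demo_target_cpu is made up, and other compilers may not define the same macros.

```c
/* Report which CPU family this file was compiled for, using macros
   that gcc predefines for each target. The function name is made up. */
const char *demo_target_cpu(void)
{
#if defined(__m68k__) || defined(__mc68000__)
    return "68k";
#elif defined(__powerpc__) || defined(__PPC__)
    return "PPC";
#elif defined(__i386__) || defined(__x86_64__)
    return "x86";
#else
    return "unknown";
#endif
}
```

The same file gives a different answer depending on which gcc target compiled it, without any change to the source.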

/*/*
computer 3D always sucks because you can literally see the computer slow down
and wince whenever something computationally complex happens,
*/*/

/*
I guess you've not used a new 3D card then. I have yet to write a program
that causes my Radeon 9000 to slow down even with over 10000 objects
(using the DX7 interface).
*/

but as you throw more and more objects it must eventually slow down?

ie

for( i = 1 ; ; i++ ){ introduce_1000_objects() ; }

must eventually catch up with your CPU power,

how about 10000 explosions?


/*/*
Amiga.org is closed source isn't it?
*/*/


/*
No. Both Amiga.org and Amigaworld.net use xoops which is a great example of
opensource software. Aros-Exec.org also uses xoops.
*/

xoops is opensource but presumably the specific configuration is closed source??

/*/*
Note that if a 68k version is also done then it reaches all platforms via UAE,
so it's just the native compile that would be lacking,

:this is a good reason for having a fully implemented 68k AROS,
*/*/

/*
We need the 68k AROS for the integrated UAE Emulator idea.
I also want to run AROS on my A1200. We do have a working 68k AROS,
but it needs to be adapted to boot the Amiga hardware.
At the moment it only boots the Palm PDA.
*/

reimplementing AmigaOS via the same custom chips!

you may need to study UAE to understand the custom chips!

(a bit like studying AROS to understand AmigaOS),

I hope you fix some of the existing bugs eg
AGA SetRGB32CM doesn't set the lower 4 bits of the blue component,

also blitter OS calls can fail horribly on bitmaps exceeding
1024 pixels width even though the h/w can cope with huge bitmaps,

according to the autodocs someone forgot to set an Agnus big blit flag,
they knew that in 1992 and haven't yet fixed the bug!


/*/*
to this day I don't even know if closed source commercial binaries are
allowed on Linux,
*/*/


/*
It depends. If you link to a GPL library or use any GPL code,
then your software automatically becomes GPL.

If you link to an LGPL library then your program is not GPL or LGPL.

If you use any BSD code then that does not cause your code to be BSD.

Simply check the licence of any software you are using to find out what you
can and can't do.
*/

so it can be done,


/*
AROS is covered by the APL, which is similar to LGPL. If you use AROS source, only
the code that you use must remain open source. The rest of your program is yours.

Licence issues are very complex.
*/

sounds like a good license, some licenses are quite tight-fisted!

