[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Byte-misaligned memcpy problems





On Thu, 11 Jun 1998, Robin Kirkham wrote:

> The -mcpu32 flag to egcs, and its associated preprocessor symbol __mcpu32__,
> are supposed to mean the compiler is generating code suitable for EITHER a
> CPU32 or CPU32+ processors.

Right.

> The GCC manual (post egcs 1.0.2, at least) makes this clear.

We are probably all guilty of not having read that. :)

> I don't think the egcs folks took -mcpu32 to mean "CPU32+ only" at all;
> I certainly never did. Actually I think the egcs folks are far more interested
> in x86, ppc, alpha, and so on, and probably don't have a view on it; and
> that's probably fair enough. 

The option correctly generates code for either CPU32 or CPU32+.  This is
good.  If there is a distinction between cpu32 and cpu32+ which gcc can
take advantage of, then it probably should.  The downside is that it adds
another multilib variety.

I got a fairly quick response on this from the egcs maintainers.  The m68k
is far from dead.  It is popular in embedded systems which pay Cygnus'
pills and is used in a number of other systems for which gcc is the best
compiler available.  sun3, next, and m68k linux get regular test reports.

> The memcpy() in question routine incorrectly assumed it meant CPU32+ only,
> and on that basis inserted some *assembler* code to speed up memory copies,
> which produced address errors when run on an "ordinary" CPU32. This memcpy()
> is inserted into newlib by the RTEMS patches, as far as I can tell.

Correct on all counts. 

> egcs isn't really at fault here, and I'd advise against patching or hacking
> it to solve this problem. I prefer Eric's solution
> 
> > 1) Move memcpy from newlib to RTEMS for the 68k family.
> > 2) Change the memcpy source to include m68k.h
> > 3) Change the #ifdef in the memcpy source to
> > 	#ifdef (M68K_HAS_MISALIGNED == 0)
> > 		code for byte-by-byte copy
> 
> and M68K_HAS_MISALIGNED would be defined using the -D compiler flag in the
> make/custom files for the gen360 BSP (only the 68360 and 68349 processors
> have a CPU32+). At the moment, there is no way RTEMS m68k.h can work it out
> from the __mxxx__ symbols.

I don't think the -D path is necessary.  All it takes is properly setting
RTEMS_CPU_MODEL.  We can distinguish between CPU32 and CPU32+ ourselves.
targopts.h always has the symbol for RTEMS_CPU_MODEL defined so m68k.h
could read:

#elif defined(__mcpu32__)

#if defined(mcpu32plus)
#define CPU_MODEL_NAME          "mcpu32+"
#define M68K_HAS_MISALIGNED      1
#else
#define CPU_MODEL_NAME          "mcpu32"
#define M68K_HAS_MISALIGNED      0
#endif

...


> Alternatively, just put the CPU32+-only assembler version of memcpy() in 
> the gen360 BSP (I don't think there is a BSP for a 68349) and take it out of
> newlib. Then, the 68360 people get the jazzed-up non-aligned version,
> and everyone else uses the safe-but-sure generic newlib one.

I don't like adding generic code like this to a BSP.  

> A better alternative may be to write a C memcpy()/strcpy() that does aligned
> longword copies wherever possible, with 0, 1, 2 or 3 byte copies at each end
> if necessary.  There will be a slight overhead for small buffer copies, but
> the speed should improve considerably with buffer size. This is a common
> and I would think portable way of writing memcpy(), and I'm a bit suprised
> the newlib generic one don't do that.

The old GNU/960 one went further and aligned to 16-byte boundaries, then
copied 16-byte structures at a chunk.  The i960 has ldq/stq instructions
which can load/store 16-bytes into 4 registers at a time.

> There are a number of issues associated with adding a -mcpu32+ flag to egcs:
> 
>     o the CPU32 target macro in the m68k code generator code is a bit of
>       a hack at present (its the 68020 macro with a bit to stop it emitting
>       bitfield instructions). If there are to be CPU32 variants then this
>       probably needs to be straigtened out first.

I was afraid of this.  I was thinking overnight that the m68k machine
description is quite complex with all the variants (including now the
coldfire).

>     o this aspect of gcc/egcs is very fiddly and IMHO not too well thought
>       out. The target macros, -m flag table, specs strings, and MULTILIB 
>       variables (as well as the default configurations) all need to be
>       adjusted in concert, or the result is a mess.

It would be extremely easy to break something. All you end up with is
another multilib type with possibily very little run-time benefit.

>     o If this new flag -mcpu32+ does not alter the code generated, there is
>       no point having it. If it *does* alter the code (and I don't think
>       altering STRICT_ALIGNMENT will make a very large difference for most
>       code, although I'm prepared to be corrected) then you will need to
>       make a separate multilib for the CPU32+ (after all, this issue involves
>       a library function). There are heaps of multilibs for the 68k already.

There are more multilibs for the powerpc and (I think) mips than the m68k
but adding one should not be taken lightly.

I also doubt that it would have much impact on code generation.

>     o There are probably a number of other fine-grain differences between
>       processors that have the same gcc/egcs -m flag. These matter for things
>       like RTEMS. At least, Joel keeps telling me this ... anyway, we can't
>       keep adding gcc/egcs flags and multilibs for all of them, there would be 
>       dozens and I think it's just not practical.

Unfortunately, I seem to be right on this.  Just when it seems that gcc
makes a fine enough distinction between CPU models for RTEMS' purposes, we
find a case where we need a finer distinction than the compiler.  The m68k
was the only family gcc can close to distinguishing enough on to satisfy
the RTEMS distinctions.

>From a purely academic perspecitive, cpu32 and cpu32+ are different cores
so should be recognized and honored as such by gcc.  Whether it is worth
it from a practical perspective is doubtful.

I have been bludgoened into submission on this one. :)  So I guess the
remaining questions are:

 + where this stuff should go in the tree.  Suggestions include:
    - score/cpu
    - lib/libc
    - lib/libcpu  (my preference)
    - BSP
 + do we remove the m68k specific memcpy we added to newlib or fix it?
 + do we disable newlib from building a memcpy?

--joel