Problem report: Struct aliasing problem causes Thread_Ready_Chain corruption in 4.6.99.3

Till Straumann strauman at slac.stanford.edu
Thu Dec 7 19:13:42 CST 2006


Joel Sherrill wrote:
> Till Straumann wrote:
>> Ralf Corsepius wrote:
>>  
>>> On Thu, 2006-12-07 at 09:47 +0100, Wolfram Wadepohl wrote:
>>>      
>>>> Till Straumann schrieb:
>>>>
>>>>          
>>>>> I agree with Linus' sentiment, too. The problem, however,
>>>>> is (repeating mantra) that this is not just some weirdo
>>>>> gcc optimization that can be switched off. It is the C99 *standard*.
>>>>> Even if you can switch this off for gcc today, there is no
>>>>> guarantee that you will be able to in the future or on other
>>>>> compilers. If we want to produce C99 compliant code then
>>>>> we must comply with the alias rule. period.
>>>>>
>>>>> Steven Johnson wrote:
>>>>>              
>>>>>> I have been quietly following this thread, but I find the whole 
>>>>>> -fstrict-aliasing/-fno-strict-aliasing issue to be very 
>>>>>> disturbing.  Luckily my program isn't built with -O2 or I 
>>>>>> probably would have been tracking untold numbers of strange bugs 
>>>>>> in known working code.  For the C language to change so that a 
>>>>>> pointer (regardless of the pointer type used to reference that 
>>>>>> memory) no longer points to a known piece of memory, in a 
>>>>>> predictable way is whacked.
>>>>>>
>>>>>> I for one do not look forward to adding __attribute__ 
>>>>>> ((may_alias)) to the hundreds of places where I change the way I 
>>>>>> address memory using pointers.  It is a monumental waste of time, 
>>>>>> prone to error and in my opinion putting in declarations to fix a 
>>>>>> broken compiler optimisation.  When a compiler optimisation 
>>>>>> breaks a fundamental aspect of C that has existed since the 
>>>>>> beginnings of the language, then I consider that optimisation to 
>>>>>> be broken, and not the code itself.  I will be adding 
>>>>>> -fno-strict-aliasing to all of my builds in future, and I will be 
>>>>>> making sure RTEMS (and all of the other Open Source libraries I 
>>>>>> use) builds with -fno-strict-aliasing, regardless of what is 
>>>>>> ultimately decided here.  I just don't want the headache.  In my 
>>>>>> opinion you wouldn't be fixing RTEMS by adding these declarations 
>>>>>> or changing the code, you would be working around a broken 
>>>>>> compiler.  The other OS's that use -fno-strict-aliasing are (in 
>>>>>> my opinion) doing the right thing.  I also fail to see how the 
>>>>>> option could yield any tangible benefits on performance that 
>>>>>> would warrant the pain and difficulty it causes.
>>>>>>
>>>>>> But that is my 2c.
>>>>>>                   
>>>> Hi all,
>>>>
>>>> I've followed the discussion on the list. As a user of RTEMS 
>>>> building commercial *embedded* applications with high availability 
>>>> the only short term solution is to use -fno-strict-aliasing for the 
>>>> whole program including all RTEMS parts.
>>>>
>>>> Till is right in telling us the gcc optimization (weird or not) 
>>>> *is* C99 standard.           
>>> I agree with Till and you.
>>>
>>>      
>>>> Following this argumentation and expecting that 
>>>> -fno-strict-aliasing will be dropped eventually and also 
>>>> considering that we write embedded code dealing with real hardware 
>>>> I ask the question if C is the right language to implement these 
>>>> applications in the future. A programming language forcing me to 
>>>> consider what machine code the compiler will eventually produce is 
>>>> not worth using for *embedded* programming.
>>>>           
>>> Well, let me put it this way: There are people having tried to (ab-)use C
>>> as macro assembler. C99 (probably under the influence of C++) has voided
>>> this aspect to a large extent and shifted to a different level of
>>> programming languages (more into Pascal's direction).
>>>
>>> Most high-level applications don't use such "assembler-like"
>>> features, so they aren't really affected. And those high-level
>>> applications which do (esp. some GUI toolkits) are facing similar issues
>>> as we are.
>>>
>>>      
>>>> In general and from an academic point of view Ralf is totally 
>>>> right; the standard is clear and the code should be fixed. A proper 
>>>> data model, well designed from the ground up, will hopefully not produce 
>>>> aliasing problems. But is this always possible and adequate in real 
>>>> work? It should be for basic technology like an RTOS or general 
>>>> libraries like newlib!
>>>>           
>>> I think so, but ... as we currently all are experiencing, the "C as
>>> macro assembler times" seem to be over.
>>>
>>>      
>>>> In fact, the current RTEMS code is not in a shape where aliasing can 
>>>> be dismissed as a problem. It has grown over more than a decade.
>>>> Is this the time for a complete rewrite?
>>>>           
>>> Frankly speaking, I think at least some very basic parts/types of RTEMS
>>> are in need of a redesign/rewrite. IMO, introducing "type strictness"
>>> and, related to it, "properly-typed" APIs is direly needed.
>>>
>>> RTEMS definitely has weaknesses related to these areas. Therefore, I
>>> would expect a large amount of the issues related to "strict aliasing"
>>> and "strict alignment" to collapse, once they would be addressed.
>>>
>>>      
>>>>  Can we fix it?
>>>>           
>>> I hope so, but do not expect this to happen any time soon.
>>>
>>>      
>>>>  What piece of work can I do, as a user with limited knowledge of 
>>>> kernel functionality?
>>>>           
>>> Good question.
>>>
>>> ATM, from my point of view, people familiar with certain flavors of asm
>>> who could identify where aliasing actually affects RTEMS code would be
>>> helpful. I have been trying to identify files affected by
>>> strict-aliasing and meanwhile have a list of ca. 20000 object files
>>> (Note: *.o not *.c!) out of ca. 70000 from RTEMS-4.8 which are
>>> affected by aliasing.
>>>
>>> Now, identifying those which really are broken by aliasing would be
>>> necessary. So far, apart from Peer's/Thomas's case [1], I haven't found
>>> any :)
>>>       
>> Problem with your current approach is that you don't really
>> find alias rule violations but only the subset of them that
>> cause problems with current gcc's optimization implementation.
>>   
> Agreed.  That's why I think it is important to make sure that we find
> a procedure that is good enough to run for future gcc versions.
> Do you think counting load/store instructions as a second level check
> for differences in strict aliasing will reduce the false positive cases?
Not sure. I wouldn't go that route at all but rather write a parser
(or better: find an existing one) that could find all pointer casts
in the source. These would then have to be inspected 'manually'.
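
As a rough idea of what such a scan could look like, here is a minimal
sketch in C. It is only a line-based heuristic, not a parser: the function
name count_pointer_casts is made up for illustration, it also matches
unnamed pointer parameters in prototypes, and it knows nothing about
comments, strings or macros. Every hit would still need manual inspection.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Count parenthesised groups in `line` that look like a pointer cast:
 * "(" + identifier characters/blanks + one or more '*' + ")".
 * Hypothetical helper, heuristic only.
 */
size_t count_pointer_casts(const char *line)
{
    size_t hits = 0;

    for (const char *p = line; *p != '\0'; p++) {
        if (*p != '(')
            continue;

        const char *q = p + 1;
        int saw_name = 0;   /* at least one identifier character seen */
        int stars = 0;      /* number of '*' seen after the type name */

        /* type name part: identifier characters and blanks */
        while (isalnum((unsigned char)*q) || *q == '_' || *q == ' ') {
            if (*q != ' ')
                saw_name = 1;
            q++;
        }
        /* pointer part: stars, possibly separated by blanks */
        while (*q == '*' || *q == ' ') {
            if (*q == '*')
                stars++;
            q++;
        }
        /* a cast candidate must close right after the stars */
        if (saw_name && stars > 0 && *q == ')')
            hits++;
    }
    return hits;
}
```

Feeding each source line through this and printing the lines with a
non-zero count would give the list to review by hand.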

I don't think the approach of comparing disassembled code
is a good one. The only thing you know after doing this (and
checking all the many false positives) is that gcc didn't perform
any 'unexpected' optimization. Change one line of the source and
go back to square one.

Unless you can 'prove' that RTEMS is in compliance with the
standard (by carefully revising the source) we should build
it with -fno-strict-aliasing.
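
To make the rule concrete, a small sketch (function names are hypothetical
and not taken from RTEMS). Under strict aliasing the compiler may assume
that an int * and a float * never refer to the same object, so in
store_both() it is allowed to return 1 without re-reading *i. A caller
that passes two pointers to the same memory gets undefined behaviour.
The second function shows the usual standard-conforming way to
reinterpret bytes, assuming float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/*
 * If i and f point to the same memory, the store through f changes an
 * object whose type is not float. With strict aliasing enabled the
 * compiler may assume that cannot happen and return the constant 1.
 */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}

/*
 * Reinterpreting bytes via memcpy() does not violate the alias rule.
 * Assumes sizeof(float) == sizeof(uint32_t).
 */
uint32_t float_bits_copy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Building with -fno-strict-aliasing tells gcc not to make the assumption
described in store_both(); it does not make the code standard-conforming.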

I'm pretty sure you're *never* going to make the current
networking stack compliant with the rule (e.g., mtod() violates it).
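
A sketch of why mtod() is a problem, using simplified stand-ins (the
struct and macro names below are made up for illustration; the real BSD
mbuf has many more fields). The macro has the same shape as the BSD one:
it casts the data pointer to whatever type the caller asks for. Whether a
given use is a violation depends on what was actually stored in the
buffer, which is exactly what is hard to prove for a whole stack. Copying
the header out with memcpy() avoids the question, at the cost of touching
every call site.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an mbuf (hypothetical). */
struct mbuf_sketch {
    char *m_data;               /* start of the packet data */
};

/* Same shape as BSD mtod(): cast the data pointer to any type t. */
#define MTOD_SKETCH(m, t) ((t)((m)->m_data))

/* Hypothetical 4-byte protocol header, for illustration only. */
struct hdr_sketch {
    uint16_t type;
    uint16_t len;
};

/*
 * Cast-based access: reads the buffer through a struct lvalue. This is
 * only valid if the bytes really hold a struct hdr_sketch object and the
 * pointer is suitably aligned.
 */
uint16_t hdr_len_cast(const struct mbuf_sketch *m)
{
    return MTOD_SKETCH(m, const struct hdr_sketch *)->len;
}

/* Copy-based access: valid whatever the buffer was written as. */
uint16_t hdr_len_copy(const struct mbuf_sketch *m)
{
    struct hdr_sketch h;
    memcpy(&h, m->m_data, sizeof h);
    return h.len;
}
```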

Personally, something in my gut tells me that this rule is broken
and that we should just go with Linux and BSD and -fno-strict-aliasing.

The day gcc declares this option obsolete and Linux and BSD are
going alias-safe - then could be a good time to go ahead...

T.
>
> --joel
>
>> T.
>>  
>>> Ralf
>>>
>>> [1] Which meanwhile is supposed to be worked-around.
>>>
>>>
>>>       
>>
>
