RTEMS 220.127.116.11 Available -- PowerPC/virtex feedback - IT WAS THE STACK CHECKER!!! RESOLVED
Robert S. Grimes
rsg at alum.mit.edu
Tue Aug 14 11:23:18 CDT 2007
Hi Greg et al.,
It was the stack checker!
At some point last week, I added the stack checker - figured it was a
"good idea". No problems showed up immediately after doing so, so I
promptly forgot about it, and started adding some new code. Some of
this new code was getting a bit aggressive, if you know what I mean - it
exercised networking, a user communications interrupt, and some C++
features, so I was not too surprised when things went haywire. But in
this case, with the older RTEMS Head, the problem only showed up when a
message was sent to my application, _not during startup_! This was the
subject of my "isync, exceptions" email from Thursday. After attempting
to figure this out using debugger sleuthing, removing the new code, etc,
it seemed to be pretty clearly my new code that was causing the
problem. Or so I thought...
Anyway, I decided to send that "isync" message, and while waiting for
insight from you all, I decided to heed Joel's call to test 18.104.22.168.
So I cut out all the questionable code, installed 22.214.171.124, and rebuilt
my application - that's when I ran into all those "exception7" problems.
So, it turns out the stack checker was the culprit. See another email
for more details. Right now, both the Head and 126.96.36.199 seem to work
just fine, even with my new code!
Thanks for all the help!
Robert S. Grimes wrote:
> gregory.menke at gsfc.nasa.gov wrote:
>> I don't have any machines up with the RTEMS ppc build, so I am going
>> from memory. I looked at the printf suite since that stuff comes with
>> newlib and is independent of whatever flags RTEMS and user software are
>> compiled with and issues due to newlib will arise no matter what you do
>> with your makefiles.
> Understood - thanks!
>> I use "powerpc-rtems-objdump -dst xxxxx > tmpfile" (where xxxxx is the
>> rtems library that contains the newlib stuff) then search tmpfile for
>> the printf library code & examine it. The problem is not necessarily
>> floating point opcodes but the use of floating point registers (FPR0 and
>> similar) in any of the general cpu instructions.
> I just did a case-insensitive search for fpr[0-9], and found no matches,
> so I think those issues (floating-point registers or floating-point
> instructions) are not the problem.
>> OTOH your toolchain results suggest it is likely not a fpu related
>> problem; newlib comes from the gcc build, not RTEMS so this could be a
>> bsp related issue. When we were running into startup problems because
>> of the use of FPU registers, we couldn't even finish the executive
>> startup before being caught in an infinite exception loop.
> Seems I'm there now, except there are no FPU registers or instructions
> to be found. Seems I'm left with "illegal instructions" or a BSP
> interrupt enabling problem. No idea where to look for that...
> On the positive, because 188.8.131.52 fails much more predictably (as far as
> I can tell so far, in exactly the same place), I have both a good place
> to start looking, and (I guess) I'm helping with the 4.8 release
> candidate testing! ;)
>> Robert S. Grimes writes:
>> > Chris and Greg, you seem to be right - see below.
>> > gregory.menke at gsfc.nasa.gov wrote:
>> > > I ended up using the newlib printf suite as the test- if fp registers
>> > > are shown in the instructions, then it meant I had a build problem w/
>> > > newlib. The assembly is dramatically different when the fp registers
>> > > are excluded.
>> > >
>> > Don't know what you are talking about ("newlib printf suite"), so I
>> > checked my own code.
>> > > Chris Caudle writes:
>> > > > Doesn't that imply that the tools for PPC405 should already be using soft
>> > > > float?
>> > > > Everyone is scurrying around figuring out how to rebuild tools with soft
>> > > > float, but it looks like that should be the default already, and so far no
>> > > > one has presented any code which confirms that the exception was actually
>> > > > caused by using a floating point register.
>> > > >
>> > > > Shouldn't looking at the instructions to see if that is a feasible
>> > > > explanation be the first step?
>> > >
>> > >
>> > I ran this command:
>> > $ powerpc-rtems-objdump -d --architecture=powerpc:403 myprog.exe > myprog.exe.disasm.txt
>> > and examined the results. Now, I'm not 100% sure what I'm looking for,
>> > but I figured the floating point store instructions (e.g. stfd, stfx,
>> > etc.) were good candidates. In fact, searching for "stf" turned up
>> > nothing. I also couldn't find any floating point load, nor any of the
>> > operations that I looked for. So I think, Chris, your comment from your
>> > first email on this is likely closer to the truth:
>> > "More accurately it is "program exception," which is also triggered
>> > by illegal opcode."
>> > You had followed this up by asking if the tool chain had changed between
>> > head from a month ago and 184.108.40.206, and the answer is: No.
>> > So I'm a bit stumped right now. On the one hand I have an application
>> > that works with the older RTEMS, but fails on startup with 220.127.116.11 -
>> > sounds bad for 18.104.22.168! On the other hand, its failure sounds a bit
>> > like that of my earlier post regarding "isync, exceptions, et al." from
>> > last Thursday - similar, but worse.
>> > Any ideas?
>> > Does it seem correct to rule out (at least temporarily) the tool chain
>> > as the culprit?
>> > Thanks!
>> > -Bob
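
For anyone repeating the check discussed above (disassemble with objdump, then look for floating-point usage), here is a minimal sketch of how the search could be automated. It is an illustration, not what anyone in this thread actually ran. The function names (find_fp_instructions, scan_file) and the mnemonic list are assumptions made for this example, and the list is not exhaustive. The sketch matches on instruction mnemonics rather than register names, because register operands may be printed as bare numbers depending on the disassembler options, so a search for "fpr[0-9]" alone may not be conclusive. It also assumes the usual tab-separated "address: bytes mnemonic operands" layout of "objdump -d" output.

```python
import re

# Assumed, non-exhaustive pattern for PowerPC floating-point mnemonics.
FP_PATTERN = re.compile(
    r"(?:l|st)f[sd]u?x?"                                   # lfs, lfd, lfdu, stfd, stfsx, stfdux, ...
    r"|stfiwx"
    r"|f(?:add|sub|mul|div|sqrt|madd|msub|nmadd|nmsub)s?\.?"  # arithmetic, single/double
    r"|f(?:mr|neg|abs|nabs|rsp|ctiw|ctiwz|sel|res|rsqrte)\.?"  # moves, conversions, estimates
    r"|fcmp[uo]"                                           # compares
    r"|mffs\.?|mtfsfi?\.?|mtfsb[01]\.?"                    # FPSCR access
)


def find_fp_instructions(lines):
    """Return (address, mnemonic) pairs for lines that look like FP instructions.

    Expects lines in the tab-separated form produced by "objdump -d":
    "<address>:\t<hex bytes>\t<mnemonic> <operands>".
    """
    hits = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip labels, section headers, and blank lines.
        if len(parts) < 3 or not parts[2].strip():
            continue
        address = parts[0].strip().rstrip(":")
        mnemonic = parts[2].split()[0]
        if FP_PATTERN.fullmatch(mnemonic):
            hits.append((address, mnemonic))
    return hits


def scan_file(path):
    """Scan a saved disassembly file (e.g. myprog.exe.disasm.txt)."""
    with open(path, "r", errors="replace") as handle:
        return find_fp_instructions(handle)
```

Calling scan_file("myprog.exe.disasm.txt") on the file produced by the objdump command above returns a list of suspect instructions. An empty list is consistent with the conclusion reached in this thread that floating point was not the cause, though it only proves that none of the listed mnemonics appear.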