Stack checker broken on PowerPC/virtex BSP - 22.214.171.124 or later
joel.sherrill at oarcorp.com
Thu Aug 16 10:14:08 CDT 2007
Robert S. Grimes wrote:
> See below
> Joel Sherrill wrote:
>>> I've narrowed it down to the invocation of this macro:
>>> #define Stack_check_Dope_stack(_stack) \
>>> memset((_stack)->area, BYTE_PATTERN, (_stack)->size)
>>> In the suspect code, this translates to this:
>>> memset(0xe75c0, 0xa5, 0x2808);
>>> Thus, it is attempting to set the area from 0xE75C0 to 0xE9DC8
>>> This is the relevant output of the application build process - the .num
>>> map file:
>>> 000e0000 A stack.start
>>> 000e8000 A IntrStack_start
>>> 000e8000 A stack.end
>>> 000ec000 A intrStack
>>> 00100000 A _endloader
>>> 00800000 A _HeapSize
>>> A little fishy? Yeah, but I don't know why... Anyway, here is the
>>> exception output:
>>> Exception handling initialization done
>>> opb_intc_init: mask = 0x7
>>> exception handler called for exception 7
>>> Next PC or Address of fault = A5A5A5A4
>>> Saved MSR = 0
>>> R0 = A5A5A5A5
>>> R1 = E7EB4
>>> R2 = C56D8
>>> R3 = 1
>>> R4 = A5
>>> R5 = 0
>>> R6 = FEFFFFFF
>>> R7 = D0000
>>> R8 = D327C
>>> R9 = E9DC8
>>> R10 = 1
>>> R11 = E9DC8
>>> R12 = 0
>>> R13 = FFFEA680
>>> R14 = FFFFFFFF
>>> R15 = FFFFFFFF
>>> R16 = FFFFFFFF
>>> R17 = FFFFFFFF
>>> R18 = FFFFFFFF
>>> R19 = FFFFFFFF
>>> R20 = FFFFFFFF
>>> R21 = FFFFFFFF
>>> R22 = FFFE0000
>>> R23 = FFFE0000
>>> R24 = 0
>>> R25 = D3154
>>> R26 = 1
>>> R27 = 0
>>> R28 = D0000
>>> R29 = D33FC
>>> R30 = E0F38
>>> R31 = A5A5A5A5
>>> CR = 39000033
>>> CTR = 0
>>> XER = E000007F
>>> LR = A5A5A5A5
>>> MSR = 0
>>> DAR = 0
>>> Stack Trace:
>>> IP: 0xA5A5A5A4, LR: 0xA5A5A5A5
>>> --^ 0x00000000
>>> So it is clearly trying to execute code in the just-doped stack, though
>>> I don't know why...
>>> Anything else I should try?
>> I really suspect that the RTEMS workspace (and possibly the
>> C Program Heap) is overlapping the initial stack and possibly
>> the interrupt stack
> Yeah, something like that is wrong...
>> Break at RTEMS_Malloc_Initialize and look at the first
>> two arguments (start and length).
> start = 0x1f0000, length = 0x800000
>> While there look at the first two entries in what
>> _Configuration_Table points to (workspace start and size).
> work_space_start = 0xe0000
> work_space_size = 0x100000
>> Draw a memory map showing the range of your program's
>> text, data, bss, interrupt stack, starting stack, C Program
>> Heap, and RTEMS Workspace.
> Text 00010000 - 000c72bc (symbols text.start and text.end)
> Data 000c72c0 - 000cb520 (symbols data.start and data.end)
> BSS 000cb520 - 000d4418 (symbols bss.start and bss.end)
> Init Stack 000e0000 - 000e8000 (symbols stack.start and stack.end)
> Intr Stack 000e8000 - 000ec000 (symbols IntrStack_start and intrStack)
> Workspace 000e0000 - 001e0000 (from values above)
> C Heap 001f0000 - 009f0000 (from RTEMS_Malloc_Initialize)
>> Given the symptoms and the addresses you posted above,
>> it is really looking like a memory map issue.
> The only overlap I see is the initial and interrupt stacks overlap with
> the beginning of the work space, which may be normal? If not, why is
> this wrong? Is there some configuration setting wrong here?
No configuration mistake. This is a BSP bug.
The BSP initialization and rtems_initialize_execute_XXX run on
the starting stack. Object control blocks, task stacks, internal
data structures, etc. are allocated from the workspace. While
initializing all that, the memory for the Init stack is going to get
trashed. You are seeing this happen when the first task stack is doped,
but that is just the most brutal overwrite of the memory.
The interrupt stack overlaps the workspace too and is also getting
clobbered, but this doesn't matter until you get an interrupt that uses
stack memory that RTEMS thinks is a task stack or a data structure
and clobbers that.
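
To make the overlap concrete, here is a minimal sketch that checks the
ranges from the memory map quoted above. The type and function names
(region_t, regions_overlap, region_contains) are made up for this
illustration and are not part of RTEMS; the addresses are the ones
posted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical helper type. */
typedef struct {
    uint32_t start;
    uint32_t end;
} region_t;

/* Two half-open ranges overlap when each starts before the other ends. */
static bool regions_overlap(region_t a, region_t b)
{
    return a.start < b.end && b.start < a.end;
}

/* True when addr lies inside the half-open range r. */
static bool region_contains(region_t r, uint32_t addr)
{
    return addr >= r.start && addr < r.end;
}

/* Values copied from the memory map and register dump in this thread. */
static const region_t init_stack = { 0x000e0000u, 0x000e8000u };
static const region_t intr_stack = { 0x000e8000u, 0x000ec000u };
static const region_t workspace  = { 0x000e0000u, 0x001e0000u };
static const region_t c_heap     = { 0x001f0000u, 0x009f0000u };

/* Range doped by memset(0xe75c0, 0xa5, 0x2808). */
static const region_t doped      = { 0x000e75c0u, 0x000e75c0u + 0x2808u };
```

With these values, regions_overlap(workspace, init_stack) and
regions_overlap(workspace, intr_stack) are both true, while the C heap
does not overlap the workspace. region_contains(doped, 0x000e7eb4)
is also true, i.e. the R1 value in the exception dump sits inside the
area that was just filled with 0xA5.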
The BSP needs to be fixed so the workspace memory doesn't conflict
with the interrupt or initial stacks. Reserve them.
Make sure the BSP is starting the workspace after the high address
symbol for the interrupt stack.
Look for this in bspstart.c:
/* round _end up to next 64k boundary for start of workspace */
BSP_Configuration.work_space_start = (void *)((((uint32_t)&_end) +
0x18000) & 0xffff0000);
It looks like _end in linkcmds should be defined after the definition
of intrStackPtr, not before those stacks.
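
As a worked check of that arithmetic, the sketch below repeats the
bspstart.c expression as a plain function. The function name is
hypothetical, and it assumes _end currently sits at the end of BSS
(0xd4418), which is not stated in the thread but does reproduce the
reported work_space_start.

```c
#include <stdint.h>

/*
 * Same arithmetic as the bspstart.c fragment: add 0x18000 and mask
 * down to a 64K boundary. The function name is hypothetical.
 */
static uint32_t workspace_start_from_end(uint32_t end_addr)
{
    return (end_addr + 0x18000u) & 0xffff0000u;
}
```

Under that assumption, workspace_start_from_end(0x000d4418) gives
0x000e0000, which is exactly stack.start. If _end were instead placed
at the top of the interrupt stack (0x000ec000), the same expression
gives 0x00100000, which is above both stacks.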
Then the BSP had better make sure that the heap starts
BSP_Configuration.work_space_size bytes after the workspace start and
that the rest of memory goes to the Heap.
But right now, linkcmds says
_HeapSize defaults to 8M and the code appears to ignore it.
Look at libbsp/shared/m68k/m68kpretaskinghook.c for a way to default
to all of memory (_HeapSize == 0) or let the user specify a maximum
size by defining _HeapSize on the command line.
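
A rough sketch of that policy follows. It is not the code from
m68kpretaskinghook.c; the names heap_layout_t, place_heap and ram_end
are invented here, and clamping an oversized request to the memory
actually available is an assumption on my part.

```c
#include <stdint.h>

/* Hypothetical result type: where the C Program Heap goes. */
typedef struct {
    uint32_t start;
    uint32_t size;
} heap_layout_t;

/*
 * Place the heap directly after the workspace.
 * heap_size_limit == 0 means "use all remaining memory"; otherwise it
 * is treated as a maximum and clamped to what is left before ram_end.
 */
static heap_layout_t place_heap(uint32_t work_space_start,
                                uint32_t work_space_size,
                                uint32_t ram_end,
                                uint32_t heap_size_limit)
{
    heap_layout_t h;
    uint32_t available;

    h.start = work_space_start + work_space_size;

    /* No room left after the workspace: return an empty heap. */
    if (ram_end <= h.start) {
        h.size = 0;
        return h;
    }

    available = ram_end - h.start;
    if (heap_size_limit == 0 || heap_size_limit > available) {
        h.size = available;
    } else {
        h.size = heap_size_limit;
    }
    return h;
}
```

For example, with a workspace of 0x100000 bytes starting at 0x00100000
and a made-up ram_end of 0x01000000, a limit of 0 yields a heap at
0x00200000 of 0x00e00000 bytes, and a limit of 0x00800000 yields an
8M heap at the same start address.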
> Here are my CONFIGURE_ settings:
> #define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
> #define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
> #define CONFIGURE_RTEMS_INIT_TASKS_TABLE
> #define CONFIGURE_LIBIO_MAXIMUM_FILE_DESCRIPTORS 20
> #define CONFIGURE_USE_IMFS_AS_BASE_FILESYSTEM
> //#define CONFIGURE_EXECUTIVE_RAM_SIZE (512*1024)
> #define CONFIGURE_EXECUTIVE_RAM_SIZE (1024*1024)
> #define CONFIGURE_MAXIMUM_SEMAPHORES 20
> #define CONFIGURE_MAXIMUM_TASKS 20
> #define CONFIGURE_MAXIMUM_EVENTS 20
> #define CONFIGURE_MAXIMUM_MESSAGE_QUEUES 4
> #define CONFIGURE_MICROSECONDS_PER_TICK 10000
> #define CONFIGURE_TICKS_PER_TIMESLICE 50
> #define CONFIGURE_INIT_TASK_STACK_SIZE (10*1024)
> #define CONFIGURE_INIT_TASK_PRIORITY 120
> #define CONFIGURE_INIT_TASK_INITIAL_MODES (RTEMS_PREEMPT |
> RTEMS_NO_TIMESLICE | \
> RTEMS_NO_ASR |
> #define CONFIGURE_MAXIMUM_USER_EXTENSIONS 8
> #define STACK_CHECKER_ON
> // These will increase over time, of course!
> #define CONFIGURE_MAXIMUM_POSIX_THREADS 4
> #define CONFIGURE_MAXIMUM_POSIX_MUTEXES 4
> #define CONFIGURE_MAXIMUM_POSIX_CONDITION_VARIABLES 4
> #define CONFIGURE_INIT
> #define CONFIGURE_INIT_TASK_ENTRY_POINT MasterInit
> rtems_task MasterInit (rtems_task_argument argument);
> Still confused, after all these years...
I know this doesn't help you but I cover this specific
problem in the BSP part of the RTEMS Class. I plan
to pull up the mailing list archives next week and show
everyone how weird the results can be from not getting
the memory map right. This is such a simple mistake
that results in weird errors.