Re: Cogent 637 board
Jiri Gaisler wrote:
I had a new look at this issue, and enabled the
ARM patches also for the SPARC port. I could
run without ipalign in the network driver (after
also fixing ip_checksum_hdr).
However, performance dropped from about 30 Mbit/s
to ~ 20 Mbit/s for my specific driver. Running
hardware profiling (possible on the LEON3 cpu),
I could see where the CPU spends its time:
With ARM patches
function                            samples  ratio(%)
_CPU_Thread_Idle_body                   703     40.77
memcpy                                  443     25.69
in_cksum                                119      6.90
tcp_input                                36      2.08
_Workspace_Handler_initialization        29      1.68
syscall                                  29      1.68
c_rtems_main                             25      1.45
_Thread_Dispatch                         24      1.39
memset                                   20      1.16
ip_input                                 19      1.10
_ISR_Handler                             16      0.92
With ipalign in the driver
function                            samples  ratio(%)
_CPU_Thread_Idle_body                   656     44.68
ipalign                                 198     13.48
memcpy                                  158     10.76
in_cksum                                121      8.24
tcp_input                                34      2.31
_Workspace_Handler_initialization        29      1.97
soreceive                                22      1.49
memset                                   20      1.36
c_rtems_main                             19      1.29
ip_input                                 18      1.22
syscall                                  18      1.22
_ISR_Handler                             14      0.95
The CPU is loaded to the same degree in both cases (~ 60%),
but the time spent in memcpy more than doubles with the ARM
patches (25.69% of samples versus 10.76%).
The reason for this is that doing ipalign in the driver
aligns the packet on a word address, and memcpy can use
32-bit accesses to move the packets in the stack. Without
ipalign, memcpy resorts to byte-wise copying (!). The
memcpy implementation in newlib is thus not very efficient,
at least not for the SPARC port. Implementing a
modified memcpy that uses 16-bit transfers for
16-bit aligned data brought performance back to 30 Mbit/s.
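
To illustrate the kind of memcpy change described above, here is a minimal
sketch in C. It is not the routine that was actually measured; the name
copy_halfword_aligned is hypothetical. It assumes that both pointers being
2-byte aligned is the case worth optimizing, and that the target tolerates
reading the buffers through uint16_t pointers (a production version for
newlib would more likely be written in assembly).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, using 16-bit transfers when both
 * pointers are 2-byte aligned, otherwise deferring to the library memcpy. */
void *copy_halfword_aligned(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* If either address is odd, 16-bit accesses would trap on SPARC. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 1u) != 0)
        return memcpy(dst, src, n);

    uint16_t *d = dst;
    const uint16_t *s = src;
    size_t halfwords = n / 2;

    /* Main loop: one 16-bit load and one 16-bit store per iteration. */
    while (halfwords-- > 0)
        *d++ = *s++;

    /* Copy the trailing byte when n is odd. */
    if ((n & 1u) != 0)
        *(unsigned char *)d = *(const unsigned char *)s;

    return dst;
}
```

A fuller version would presumably also keep the 32-bit path for buffers
that happen to be word aligned; the sketch only shows the 16-bit case that
byte-wise copying misses.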
The question is what to do next. The ARM patches could be
modified so that they could be enabled by all targets that
need it, but that would require a better memcpy implementation
in newlib. The alternative is to stick with ipalign in the driver
and live with the CPU overhead. Comments anyone ...?
For CPUs with RTEMS ports, newlib has optimized assembly language
implementations of memcpy for only the h8300, sh, and i386. RTEMS
itself has an optimized version for the m68k which knows some CPU
specific model characteristics beyond the multilibs.
I can't find an appropriately licensed SPARC optimized memcpy
implementation to compare. NetBSD doesn't even have one and the
ARM version they have has an unacceptable advertising clause.
We want the optimized code merged into newlib so it has to have
the correct licensing terms.
Optimizing memcpy and the cksum code are the only CPU specific
optimizations that can be made to the network stack.
Jay Monkman wrote:
Jiri Gaisler wrote:
This issue is also a problem for the SPARC port, which does
not allow misaligned access. In my opinion, the network stack
has a bug here as it uses a pointer to structure without checking
the pointer alignment. The whole problem would be solved by
splitting the access to the IP address in the IP header into
two 16-bit reads, rather than a 32-bit read. This would require
modifying the IP stack, but all these issues would be solved
once and for all. Maybe the access could be done through a
#define which would be empty on targets supporting unaligned
Another solution (which is used in eCos) is to implement an
unaligned access trap handler. The trap handler emulates the
access using two 16-bit reads. The overhead should not be
that large as it is only the IP addresses in the IP header
which are misaligned.
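
As a rough illustration of the two 16-bit reads suggested above, hidden
behind a #define, here is a sketch in C. The names IP_ADDR_LOAD,
HAVE_UNALIGNED_ACCESS and load32_via_halfwords are hypothetical and do not
come from the RTEMS or FreeBSD sources. It assumes the address field is
2-byte aligned, which is the usual case when a 14-byte Ethernet header
precedes the IP header in a word-aligned buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 32-bit field from a 2-byte-aligned address
 * with two 16-bit loads. The result has the same in-memory byte order as
 * the field itself, as a direct 32-bit load would give. */
static inline uint32_t load32_via_halfwords(const void *p)
{
    assert(((uintptr_t)p & 1u) == 0);   /* must be at least 16-bit aligned */

    const uint16_t *h = p;
    uint16_t halves[2];
    uint32_t v;

    halves[0] = h[0];                   /* first 16-bit read  */
    halves[1] = h[1];                   /* second 16-bit read */
    memcpy(&v, halves, sizeof v);       /* reassemble, endian-neutral */
    return v;
}

/* HAVE_UNALIGNED_ACCESS is an assumed, port-supplied macro. Targets that
 * tolerate misaligned loads would define it and keep the plain access. */
#ifdef HAVE_UNALIGNED_ACCESS
#define IP_ADDR_LOAD(p) (*(const uint32_t *)(p))
#else
#define IP_ADDR_LOAD(p) load32_via_halfwords(p)
#endif
```

The stack would then read the source and destination addresses through
IP_ADDR_LOAD instead of dereferencing the structure member directly.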
This is also a problem on MIPS.
The network stack is from an older version of FreeBSD, from a time
when it must have only supported x86 and m68k, which can deal with
misaligned data. It actually goes out of its way to make the problem
worse. In at least one place it uses two 16-bit fields to store a
32-bit value, where the first 16-bit field is not 4-byte aligned.
Something like this:
| Byte 0  | Byte 1  | Byte 2  | Byte 3  |
|                long 1                 |
|    ?    |    ?    |      short 1      |
|      short 2      |    ?    |    ?    |
So, the 32-bit value is stored in short 1:short 2. So no matter what,
something is misaligned.
If you search the networking code for __arm__, you'll find the places
we had to change to fix the misaligned accesses for ARM.
I think the real solution would be to update the network stack to a
version of NetBSD. (FreeBSD would probably be fine, but I think NetBSD
is safer since it runs on more platforms.) Unfortunately, it's hard to
find time to
Joel Sherrill, Ph.D. Director of Research & Development
joel@OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985