RTEMS Coverage Analysis

From RTEMSWiki
Revision as of 08:36, 23 July 2011 by Panzon (Talk | contribs)

RTEMS is used in many critical systems. It is important that the RTEMS Project ensure that the RTEMS product is tested as thoroughly as possible. With this goal in mind, we have set out to expand the RTEMS test suite so that 100% of the RTEMS executive is tested.

There are numerous industry- and country-specific safety standards, including FAA DO-178B for flight software in the United States. There are similar aviation standards in other countries, as well as standards in domains such as medical devices, trains, and military applications. As a free software project, the RTEMS Project may never have a complete set of certification paperwork available for download. But we would like to ensure that RTEMS meets the technical requirements that are shared across these safety and quality oriented standards.

We encourage members of the community to help out. If you are in a domain where a safety or certification standard applies, work with us to understand that standard and guide us toward providing a polished RTEMS product that helps meet its criteria. Consider providing funding to augment tests, test procedures or documentation that would aid you in using RTEMS in your domain. Once the artifact is merged into the project, it becomes a community asset that will be easier to maintain. In addition, the increased level of testing helps ensure that submissions to RTEMS do not negatively impact you.

Be active and help us meet your application domain requirements while improving the product for all!

Applying Coverage Analysis to RTEMS

In order to achieve the 100% tested goal, it is important to define what constitutes 100% tested. A lot of information exists about how to completely test a software application. In general, the term Code Coverage is used to refer to the analysis that is performed to determine what portions of the software are tested by the test suite and what portions are not tested. It should be noted that Code Coverage does not prove correctness, only that all code has been exercised by the tests. For some background information on Code Coverage Analysis, see Coverage Analysis Theory.

Traditionally, Code Coverage Analysis has been performed by instrumenting the source code or object code or by using special hardware to monitor the instructions executed. The guidelines for the RTEMS code coverage effort were to use existing tools and to avoid altering the code to be analyzed. This was accomplished by using a processor simulator that provides coverage analysis information. The information was processed to determine which instructions are executed. We called this Object Code Coverage and we defined 100% tested to be 100% Object Code Coverage.

In addition to defining the method for determining 100% tested, it is also important to define what is actually being tested. We accomplished this by defining a set of Coverage Profiles that allowed us to specify the feature set, configuration options and compiler options used when performing the analysis. This was important for two reasons. First, it allowed us to simplify the problem space (uncovered code) so that the goal was attainable. Second, we wanted to recognize that not all RTEMS users configure RTEMS in the same manner and we wanted 100% tested to be applicable to as many user configurations as possible. The first profile that we defined encompassed the RTEMS executive and was called the Baseline/-Os/POSIX Enabled Profile. The RTEMS executive is a large body of code that is generally defined to contain the score, sapi, rtems, and posix directories in the cpukit directory. This represents the full tasking and synchronization feature set. More details about Coverage Profiles are discussed below. Initially, we set out to achieve 100% Object Code Coverage of the Baseline/-Os/POSIX Enabled Profile.

An issue that had to be addressed from the very beginning was the different coverage map formats. Each source of a coverage map (e.g. simulator, hardware debugger, etc.) may produce a coverage map in a different format. The covmerge tool is implemented using C++ classes and allows new Coverage Reader and Writer classes to be derived for specific coverage map formats. This allows different formats to be converted to the internal representation used by covmerge and its replacement covoar. The covoar program currently supports the formats produced by the TSIM, Skyeye, and Qemu simulators.
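The reader-class idea can be sketched as follows. This is a minimal illustration in Python, not the C++ code of covmerge or covoar. The two formats shown ("bits" and "list") are hypothetical and are not the TSIM, Skyeye, or Qemu formats. The names CoverageReader, BitPerByteReader, AddressListReader, and load_coverage are assumptions made for this example. The internal representation is assumed to be a set of executed addresses.

```python
from abc import ABC, abstractmethod


class CoverageReader(ABC):
    """Converts one coverage map format into a set of executed addresses."""

    @abstractmethod
    def read(self, data, base_address):
        """Return the set of executed addresses described by data."""


class BitPerByteReader(CoverageReader):
    """Hypothetical format: one bit per address, least significant bit first."""

    def read(self, data, base_address):
        executed = set()
        for byte_index, byte in enumerate(data):
            for bit in range(8):
                if byte & (1 << bit):
                    executed.add(base_address + byte_index * 8 + bit)
        return executed


class AddressListReader(CoverageReader):
    """Hypothetical format: ASCII text, one executed hex address per line."""

    def read(self, data, base_address):
        # Addresses are absolute in this format, so base_address is unused.
        return {int(line, 16) for line in data.decode("ascii").split()}


# Supporting a new format only requires registering another reader here.
READERS = {"bits": BitPerByteReader(), "list": AddressListReader()}


def load_coverage(fmt, data, base_address=0):
    """Convert data in the named format to the common representation."""
    return READERS[fmt].read(data, base_address)
```

Both hypothetical formats yield the same internal representation, so everything downstream of load_coverage is independent of where the coverage map came from.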

How it is Done Now

The RTEMS Code Coverage Analysis process is designed to be as automated as possible. The coverage testing is performed using a set of processor simulators in conjunction with a set of RTEMS Coverage Scripts. The simulators are configured to save trace or coverage information which is analysed by covoar once the run is complete. covoar merges coverage information for a set of methods of interest. The setup for running the coverage procedure is as follows:

  • make a "base directory" to work in
  • cd "base directory"
  • check out rtems-testing from CVS
  • cd rtems-testing
  • make
  • Edit rtems-testing/rtems-coverage/VERSIONS-COVERAGE to account for your local setup
  • cd ..
  • check out rtems from CVS
  • cd rtems/testsuites
  • run ..../rtems-testing/rtems-coverage/remove_managers_not_wanted
    • This step is necessary to ensure that every symbol has a unique implementation across the entire set of executables. The "managers not wanted" code places stubs in the executables.
  • cd ..
  • ./bootstrap
    • watch for errors in case the remove_managers_not_wanted script failed.
  • make the directory "tarballs" in the base directory. Output is saved here

Once this is done, actually running coverage is simple. You have three scripts:

  • do_coverage - lowest level script. This one takes a number of arguments and knows the standard RTEMS configurations we test. Use do_coverage -? for details. It is common to use this manually when doing partial runs or when you are interested in a single configuration.
  • run_coverage - one logical level higher. Has more advanced commands:
    • update - updates the source and bootstraps
    • BSP_O[sS2][pP][dD] - run a specific configuration for BSP. The codes match those of the standard runs.
    • BSP_baseline - run all 8 standard configurations for BSP. This actually builds the BSP 4 times.
  • coverage_cron - runs update followed by baseline for all supported BSPs.

When you run one, be sure about the following:

  • The RTEMS toolset must be at the head of your PATH just like a normal build
  • Add rtems-testing/bin to your PATH

The output produced by covoar is actually a set of HTML and simple ASCII files that give a developer the necessary information to quickly determine the current status of the Code Coverage and enough information to determine the location of the uncovered code. See http://www.rtems.org/ftp/pub/rtems/people/joel/coverage/ and drill down to a single run to see the current output.

There is one other script of interest. If you collect the output tarballs into a single directory, then you can use generate_coverage_html to generate the HTML which you see before you drill down.

Historical: How it was Done Initially

The RTEMS Code Coverage Analysis process is designed to be as automated as possible. The coverage testing is performed using a processor simulator in conjunction with a set of RTEMS Coverage Scripts. The code to be analyzed is linked together as a single relocatable with special start (start_coverage) and end (end_coverage) symbols. The relocatable is then linked to the same address in every test from the test suite. Each test is then executed on a processor simulator that gathers information about which instructions were executed and produces a coverage map for the test. After all tests have finished, the program covmerge is used to merge all coverage maps into a unified coverage map for the entire test suite and to produce reports that identify the uncovered code. The picture shown provides the general flow of the process.
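The merge step can be sketched as follows. This is a minimal illustration in Python, not the covmerge implementation. It assumes each coverage map has already been converted to a list of booleans with one entry per address in the analyzed range. Because the relocatable is linked to the same address in every test, all maps have the same length and the same index always refers to the same instruction. The function name merge_coverage_maps is an assumption made for this example.

```python
def merge_coverage_maps(maps):
    """Merge per-test coverage maps into one unified coverage map.

    Each map is a list of booleans, one entry per address in the analyzed
    range. An address is covered in the unified map if any test executed it.
    """
    if not maps:
        raise ValueError("at least one coverage map is required")
    size = len(maps[0])
    if any(len(coverage_map) != size for coverage_map in maps):
        raise ValueError("all coverage maps must span the same address range")
    return [any(column) for column in zip(*maps)]
```

For example, merging [True, True, False, False] with [True, False, True, False] gives [True, True, True, False]. Only the last address remains uncovered and would appear in the reports.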

One issue that had to be addressed was the different coverage map formats. Each source of a coverage map (e.g. simulator, hardware debugger, etc.) may produce a coverage map in a different format. The covmerge tool is implemented using C++ classes and provides for inheriting new Coverage Reader and Writer classes for specific coverage map formats. This allows different formats to be converted to the internal representation used by covmerge. The covmerge program currently supports the formats produced by the TSIM and Skyeye simulators.

The output produced by covmerge is actually a set of simple ASCII files that give a developer the necessary information to quickly determine the current status of the Code Coverage and enough information to determine the location of the uncovered code. The following set of files is produced by covmerge.

File Name                  Purpose of File
configuration.txt          Details the settings for the coverage run
summary.txt                Provides a summary of the results of the coverage run
sizes.txt                  Provides a list identifying the file name and source line number of each uncovered range along with its size in bytes
report.txt                 Provides the details of each uncovered range
Explanations.txt.NotFound  Contains the Explanations that were not found for this coverage run (see RTEMS Code Coverage How To for more information about how and why to use Explanations)
annotated.dmp              Provides the disassembled listing of hello.exe with indications of the object code that was not executed
hello.num                  The symbol table of hello.exe

You may wonder why the annotated disassembly (annotated.dmp) and symbol table (hello.num) are from hello.exe. Because the set of object code to analyse is the same in all tests and linked to the same address range, the disassembly and symbol table for the analyzable portion of all executables is the same.

What was Discovered

When we began the RTEMS Code Coverage effort, we performed coverage analysis on the development head of RTEMS 4.8 using the Baseline/-Os/POSIX Enabled Profile. Some of our initial observations were interesting. First, we were a little surprised at the incompleteness of the test suite. We knew that there were some areas of the RTEMS code that were not tested at all, but we also found that areas we thought were tested were only partially tested. We also observed some interesting things about the code we were analyzing. We noticed that the use of inlining sometimes caused significant branch explosion. This generated a lot of uncovered ranges that really mapped back to the same source code. We also found that some defensive coding habits and coding style idioms could generate unreachable object code. Also, the use of a case statement that includes all values of an enumerated type instead of an if statement sometimes led to unreachable code.

Other observations were related to the performance of the covmerge tool. Of particular interest was the handling of NOP instructions. Compilers can use NOP instructions to force alignment between functions or to fill delay-slots required by the processor. Of course the alignment NOP instructions are not executed and thus had a negative impact on the coverage. The first attempt at dealing with NOP instructions was to mark them all as EXECUTED. This was correct for the NOPs used for function alignment, but not for NOPs used for delay-slots. Marking delay-slot NOPs as EXECUTED produced an unwanted side effect of occasionally splitting an uncovered range into two ranges. We finally settled on an improved method for dealing with NOPs where NOPs were marked as EXECUTED unless they were between two NOT EXECUTED instructions. An example is shown below:

2003ee8:  80 a6 20 00 	cmp  %i0, 0                            <== NOT EXECUTED
2003eec:  02 80 00 06 	be  2003f04 <IMFS_fifo_write+0x60>     <== NOT EXECUTED
2003ef0:  01 00 00 00 	nop                                    <== NOT EXECUTED
2003ef4:  40 00 78 fb 	call  20222e0 <__errno>                <== NOT EXECUTED
2003ef8:  b0 20 00 18 	neg  %i0                               <== NOT EXECUTED

This solution to the NOP problem was important because NOPs were falsely increasing the number of uncovered ranges. This created an unnecessary explosion of the reports and increased the uncovered ranges to examine.
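The final NOP rule can be sketched as follows. This is a minimal illustration in Python, not the covmerge code. Two details are assumptions made for this example, since the text does not specify them: a run of consecutive NOPs is judged by the nearest non-NOP instruction on each side, and a NOP at the edge of the listing (with no neighbor on one side) is marked EXECUTED. The names mark_nops and nearest_non_nop are also hypothetical.

```python
def nearest_non_nop(instructions, index, step):
    """Return the executed flag of the nearest non-NOP neighbor, or None."""
    i = index + step
    while 0 <= i < len(instructions):
        mnemonic, executed = instructions[i]
        if mnemonic != "nop":
            return executed
        i += step
    return None


def mark_nops(instructions):
    """Apply the NOP rule to a list of (mnemonic, executed) pairs.

    Instructions are in address order. A NOP is marked EXECUTED unless the
    nearest non-NOP instructions on both sides are NOT EXECUTED. Flags of
    all other instructions are returned unchanged.
    """
    flags = [executed for _, executed in instructions]
    for i, (mnemonic, _) in enumerate(instructions):
        if mnemonic != "nop":
            continue
        before = nearest_non_nop(instructions, i, -1)
        after = nearest_non_nop(instructions, i, +1)
        flags[i] = not (before is False and after is False)
    return flags
```

Applied to the listing above (cmp, be, nop, call, neg, all NOT EXECUTED), the NOP stays NOT EXECUTED, so the five instructions remain one uncovered range instead of being split in two. An alignment NOP that follows an executed return instruction is marked EXECUTED.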

Resolving Uncovered Code

The output files produced by covmerge are intended to provide both a quick-look at the status of a coverage run and the details needed to resolve the uncovered ranges. As we worked through the resolution of the uncovered ranges, we noticed that the uncovered ranges usually fit into one of the following categories:

  • A new test case is needed.
  • Code unreachable in selected RTEMS configuration. For example, the SuperCore could have a feature only exercised by a POSIX API object. It should be disabled when POSIX is not configured.
  • Debug or sanity checking code which should be placed inside an RTEMS_DEBUG conditional.
  • Unreachable paths generated for switch statements. If the switch is based upon an enumerated type and the switch includes cases for all values, then it must be possible to actually generate all values at this point in the code. You can restructure the switch to only include possible values and thus avoid unreachable object code. This is sometimes best done by rewriting the switch into a series of if/else statements.
  • Critical sections which are synchronizing actions with ISRs. Most of these are very hard to hit and may require very specific support from a simulator environment. OAR has used tsim to exercise these paths but this is not reproducible in a BSP independent manner. Worse, there is often no external way to know the case in question has been hit and no way to do it in a one shot test. The spintrcriticalXX and psxintrcriticalXX tests attempt to reproduce these cases.

In general, it is interesting to note that the resolution of uncovered code does not simply translate into additions to the test suite. Often the resolution points to improvements or changes to the analyzed code. This can lead to more intelligent factoring of the code or a code re-design that produces a simpler solution. There is also the notion that just because the analyzed code is "good" the way it is does not mean that it should not be rewritten to improve its testability. Code that is completely tested is always better.

Measuring Progress

As mentioned above, the covmerge program produces reports that contain several metrics that can be used to measure progress. The first is the number of uncovered object code ranges. The second is the amount of untested object code as a percentage of the total object code size under analysis. Together these metrics provide useful information about the status or progress of the Object Code Coverage.
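Both metrics can be sketched as follows. This is a minimal illustration in Python, not the covmerge reporting code. It assumes the unified coverage map is a list of booleans with one entry per byte, which is a simplification made for this example. The function names are hypothetical.

```python
def uncovered_ranges(executed):
    """Return (start, end) index pairs, end exclusive, for each uncovered run."""
    ranges = []
    start = None
    for i, flag in enumerate(executed):
        if not flag and start is None:
            start = i                      # an uncovered range begins here
        elif flag and start is not None:
            ranges.append((start, i))      # the range ended at the previous entry
            start = None
    if start is not None:
        ranges.append((start, len(executed)))
    return ranges


def percent_uncovered(uncovered_bytes, total_bytes):
    """Uncovered object code as a percentage of the code under analysis."""
    return 100.0 * uncovered_bytes / total_bytes
```

For the map [True, False, False, True, True, False, True, True], uncovered_ranges returns [(1, 3), (5, 6)], which is 2 uncovered ranges and 3 uncovered bytes out of 8, or 37.5%. Using the 4.9 figures reported in the results table, 2532 uncovered bytes out of 70564 is about 3.59% uncovered, which corresponds to the 96.41% covered figure.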

When we started the RTEMS Code Coverage effort, we did not immediately capture results to measure progress. This actually ended up being the correct thing to do since the covmerge tool was in development and often produced results that were not directly comparable. Now that the development of covmerge has largely settled, we can perform coverage runs on several RTEMS release points and see the progress of the coverage effort. The results shown below were of the Baseline/-Os/POSIX Enabled Profile run on the SPARC/ERC32.

Release                 Covered %  Uncovered Ranges  Uncovered Bytes  Total Bytes
4.7                     77.51      454               17508            77840
4.8                     76.37      538               21772            92140
4.9                     96.41      167               2532             70564
4.10 (head 09/09/2009)  100        0                 0                70480

Several interesting facts can be seen from the data in the table. There was no organized effort to perform coverage analysis prior to the 4.8 release. This is evident in that there was no measurable improvement in coverage between 4.7 and 4.8. The unassisted developer is just not going to recognize the need for more test cases in the test suite. The coverage analysis began prior to the 4.9 release. Not surprisingly, the progress was significant between 4.8 and 4.9. At that time we addressed large uncovered ranges by doing simple things like adding test cases to the test suite and disabling code that was not used by the chosen configuration. The last 3.6% of uncovered code was much harder to address, but the development head has now achieved 100% coverage.

Now that we have achieved 100% Code Coverage using the Baseline/-Os/POSIX Enabled Profile, we would like to keep it 100% covered. We have set up a periodic run of the coverage analysis against the development head. The results are captured (http://www.rtems.org/ftp/pub/rtems/people/joel/coverage/) and can be monitored to ensure that future modifications to the analyzed code base do not produce uncovered code.

Coverage Profiles

RTEMS contains a lot of source code and although the primary focus of coverage analysis is to achieve 100% coverage of well-defined code subsets, we would also like to increase the amount of source code analyzed. In order to manage the increase in a systematic manner, we defined two basic groups of source code. The first group is called Baseline and the second group is called Developmental. The Baseline group contains the source code that has achieved (or nearly achieved) 100% Object Code Coverage. The Developmental group contains the source code for which there are very few test cases and therefore very poor coverage.

Initially, the Baseline group included source code from the cpukit. Specifically the following cpukit directories were included: score, sapi, rtems and posix. This group represents a full tasking and synchronization feature set. What was not in the Baseline group was placed in the Developmental group. The Developmental group included: libcsupport, libfs/imfs, libmisc/stackchk, libmisc/cpuuse, libmisc/bspcmdline, libmisc/dmpbuf and libmisc/devnull.

Within the two groups, we recognized the need to use different compiler optimization levels and to analyze each group with POSIX threads enabled and POSIX threads disabled. Applying these options produced eight sub-groups that we called profiles. The eight profiles are:

  • Baseline/-Os/POSIX Enabled
  • Baseline/-O2/POSIX Enabled
  • Baseline/-Os/POSIX Disabled
  • Baseline/-O2/POSIX Disabled
  • Developmental/-Os/POSIX Enabled
  • Developmental/-O2/POSIX Enabled
  • Developmental/-Os/POSIX Disabled
  • Developmental/-O2/POSIX Disabled
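The eight profiles are the cross product of two groups, two optimization levels, and two POSIX settings. A minimal Python sketch of that enumeration follows. The constant and function names are assumptions made for this example and are not taken from the RTEMS Coverage Scripts.

```python
from itertools import product

GROUPS = ("Baseline", "Developmental")
POSIX_SETTINGS = ("POSIX Enabled", "POSIX Disabled")
OPTIMIZATION_LEVELS = ("-Os", "-O2")


def coverage_profiles():
    """Return the profile names in the same order as the list above."""
    return [
        f"{group}/{level}/{posix}"
        for group, posix, level in product(GROUPS, POSIX_SETTINGS, OPTIMIZATION_LEVELS)
    ]
```

Adding a third optimization level or another source group to the tuples would grow the profile list automatically, which is why the cross-product view is convenient when planning coverage runs.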

Over time it is desirable to migrate code from the Developmental group to the Baseline group. As support libraries in cpukit become nearly 100% covered, they will be moved from the Developmental group to the Baseline group. Eventually, the Baseline group should contain all of the RTEMS code and the Developmental group should contain nothing.

Compilation and Configuration Options

The compilation level and POSIX configuration options are passed as command line arguments to the RTEMS Coverage Scripts. RTEMS Code Coverage How To provides details concerning how to run the RTEMS Coverage Scripts. When we started the RTEMS Code Coverage effort, the code analyzed was compiled with optimization level -Os. This optimizes for size without making the object code too difficult to follow. Following the object code is important when trying to determine how to resolve the uncovered code. Once the analyzed code approaches 100% covered, it is desirable to change the optimization level to -O2. This is the most often used optimization level.

Enabling or disabling POSIX allows us to analyze the RTEMS code in its two most commonly used threading configurations. When POSIX is enabled, RTEMS is configured to use POSIX threads and the POSIX tests are built and executed as part of the test suite. When POSIX is disabled, RTEMS is configured to use Classic RTEMS threads and the POSIX tests are not included in the test suite.

Internal Compilation and Configuration Options

There are several compilation and configuration options that are built into the RTEMS Coverage Scripts and are not selectable from the command line. These options affect the RTEMS build and are used to simplify the code to aid analysis. Ideally, we would like the coverage build to match the default build for RTEMS. Over time, we will work to eliminate the need for the internal options. The current options being used are:

  • NDEBUG=1 - Disables asserts. We will probably keep this option.
  • RTEMS_DO_NOT_INLINE_THREAD_ENABLE_DISPATCH=1 - Inlining resulted in branch explosion. Over 200 new test cases will be needed to eliminate this option.
  • RTEMS_DO_NOT_INLINE_CORE_MUTEX_SEIZE=1 - Inlining resulted in very difficult code to analyze. This option should be able to be eliminated.
  • RTEMS_DO_NOT_UNROLL_THREADQ_ENQUEUE_PRIORITY=1 - Unrolling loop resulted in multiple interrupt critical sections. This option should be able to be eliminated.

Beyond Object Code Coverage

At this point, the RTEMS Code Coverage effort has been focused on Object Code Coverage. But we would like to go beyond Object Code Coverage and address other traditional coverage criteria (see Coverage Analysis Theory). We would also like to remain true to our original guidelines of using existing tools and performing the analysis without modifying the code to analyze.

Achieving Statement Coverage

Achieving Statement Coverage requires knowing which source files are involved (which covoar does) and which lines in those files can produce assembly code (which covoar probably cannot determine). If we assume that any dead source code in RTEMS is detected by the combination of gcc and Coverity Scan, then all remaining source code in RTEMS should be represented in the generated executables.

The current object coverage utility covoar reports on which source lines were covered. It could easily be modified to generate a report indicating which source lines were covered, not covered, or only partially covered.

covoar could also generate a bitmap per source file where the bit index indicates if a source line in that file was executed or not. If we can generate a similar bit map from the source code which marks comments and other non-executable source lines as covered, then the union of the two bitmaps can be used to generate a report showing which source lines are not covered or represented in the object code. This may indicate dead code or weaknesses in the tests.
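The bitmap union can be sketched as follows. This is a minimal illustration in Python of the idea, not an existing covoar feature. Bitmaps are modeled as lists of booleans indexed by source line. The classifier for non-executable lines is deliberately crude: it only recognizes blank lines and lines consisting solely of a // comment. That is an assumption made for this example, and a real tool would need a proper C lexer. The function names are hypothetical.

```python
def non_executable_lines(source_lines):
    """Mark blank lines and //-only comment lines as non-executable (crude)."""
    return [
        line.strip() == "" or line.strip().startswith("//")
        for line in source_lines
    ]


def unrepresented_lines(executed, non_executable):
    """Return 1-based numbers of lines set in neither bitmap.

    The union of the two bitmaps marks every line that was either executed
    or cannot produce object code. Lines missing from the union are not
    covered or not represented in the object code.
    """
    return [
        number
        for number, (ran, skip) in enumerate(zip(executed, non_executable), start=1)
        if not (ran or skip)
    ]
```

Given the five source lines "// clamp to zero", "if (x < 0)", "  x = 0;", "" and "return x;" with an executed bitmap of [False, True, False, False, True], the only line reported is line 3. That line either needs a test case with a negative x or is dead code.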

Adding a statement coverage report to covoar is an open project.

Achieving Condition/Decision Coverage

Achieving Condition/Decision Coverage requires knowing whether each branch has been both taken and not taken. Currently QEMU and tsim can be used to gather this information.

tsim produces bitmaps indicating instruction executed, branch taken, and branch not taken.

All versions of QEMU produce a debug log of the instructions executed when an executable is run. The trace information is analysed to identify branch instructions and to determine whether the branch was taken and/or not taken. Some versions of QEMU may also be able to produce a trace log which is denser but contains the same information.

skyeye does not produce branch taken/not taken information.

covoar produces reports on which branch instructions are taken and not taken. Our goal is to ensure that each branch instruction is taken and not taken.
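The trace analysis described for QEMU can be sketched as follows. This is a minimal illustration in Python and does not parse the actual QEMU log format. It assumes the trace has already been reduced to the ordered list of executed instruction addresses, and that the branch instructions are known in advance together with their target and fall-through addresses. It also assumes an architecture without delay slots, which would complicate the analysis on SPARC. The function name branch_outcomes is hypothetical.

```python
def branch_outcomes(trace, branches):
    """Determine which directions each branch instruction went.

    trace is the list of executed addresses in execution order.
    branches maps a branch address to (target, fallthrough).
    Returns a dict mapping each branch address to a set containing
    "taken", "not taken", both, or neither.
    """
    outcomes = {address: set() for address in branches}
    for current, following in zip(trace, trace[1:]):
        if current not in branches:
            continue
        target, fallthrough = branches[current]
        if following == target:
            outcomes[current].add("taken")
        elif following == fallthrough:
            outcomes[current].add("not taken")
    return outcomes
```

With branches = {0x104: (0x110, 0x108)} and the trace [0x100, 0x104, 0x108, 0x10c, 0x100, 0x104, 0x110], the branch at 0x104 is seen both taken and not taken. A branch whose set has fewer than two entries is a candidate for a new test case.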

GCC does not include debug information which indicates that a sequence of compare and branch instructions are part of a single logical condition. This hinders our ability to augment covoar to make direct claims regarding Decision Coverage (DC) and Modified condition/decision coverage (MC/DC).

We believe that for single condition if statements such as if (cond) action or if (cond) action1 else action2, we are achieving full DC and MC/DC coverage because all logical paths are exercised.

Similarly, given a dual OR condition if statement (in C) such as one of the following:

Case OR1: if (cond1 or cond2)
            action
Case OR2: if (cond1 or cond2)
            action1
          else
            action2

We aim for the following cases given our branch coverage requirements:

  • cond1 branch taken, cond2 short-circuited
  • cond1 branch not taken, cond2 taken
  • cond1 branch not taken, cond2 not taken

As the above set of test cases represents the entire set of possible execution paths, we have achieved DC and MC/DC level coverage.

Case AND1: if (cond1 and cond2)
             action
Case AND2: if (cond1 and cond2)
             action1
           else
             action2

We aim for the following cases given our branch coverage requirements:

  • cond1 branch taken, cond2 taken
  • cond1 branch taken, cond2 not taken
  • cond1 branch not taken, cond2 short-circuited

Again, because the above set of test cases represents the entire set of possible execution paths, we have achieved DC and MC/DC level coverage.
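The claim that three cases cover every execution path can be checked by enumeration. The following is a minimal Python sketch that models short-circuit evaluation at the level of condition truth values. It does not model how a compiler maps conditions to branch instructions, so the "branch taken" wording of the lists above does not appear here. The function names are hypothetical.

```python
from itertools import product


def evaluation_path(operator, cond1, cond2):
    """Return the tuple of condition values that are actually evaluated."""
    if operator == "or":
        # cond2 is short-circuited when cond1 is already true.
        return (cond1,) if cond1 else (cond1, cond2)
    if operator == "and":
        # cond2 is short-circuited when cond1 is already false.
        return (cond1, cond2) if cond1 else (cond1,)
    raise ValueError(f"unknown operator: {operator}")


def distinct_paths(operator):
    """Collect the distinct paths over all four input combinations."""
    return {
        evaluation_path(operator, cond1, cond2)
        for cond1, cond2 in product((False, True), repeat=2)
    }
```

For both operators the four input combinations collapse to exactly three distinct paths, one of which short-circuits cond2. These correspond to the three cases listed for the OR and AND forms.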

Open projects in this area include:

  • proving our branch coverage testing policy meets decision coverage (DC) requirements in a more general sense.
  • extending GCC to provide the debug information required to let covoar evaluate DC and MC/DC in C programs.
    • IDEA: If GCC reliably reports that all conditions within a single if condition have the same line number, then we can use that information as the basis for the analysis. Did we execute the proper set of cases for all branch instructions associated with a single debug line number?

Current Status

The Code Coverage Status section lists the RTEMS BSPs on which we are performing (or would like to perform) Object Code Coverage. We would like to continue to grow this list. If you know of a simulator that includes coverage analysis, please let us know.

With the instruction level coverage of the core of RTEMS (e.g. score, rtems, posix, and sapi directories) near 100%, we have expanded our attention to include other non-networking portions of the cpukit. The best way to find out which portions of the cpukit are not currently being included in coverage analysis is to look at the commented out lines calling filter_nm() in the method generate_symbols() in rtems-testing/rtems-coverage/do_coverage.

If you are interested in writing some simple parameter check error cases, then take a look at the branch taken/not taken coverage reports for the "core configuration". Some of these are a simple matter of adding missing test cases for bad parameter paths. Other cases are more difficult. So if you run into trouble with the analysis, ask or skip it. A common pattern is this:

if (arg1 bad)
  return EINVAL;
if (arg2 bad)
  return EINVAL;

GCC is smart enough to optimize the returns into one block of code. Thus we could have a test for arg1 or arg2 bad and obtain 100% instruction coverage, but we would not get 100% branch coverage.
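The difference between the two coverage levels can be sketched with a small model. This is a minimal Python illustration, not real object code. The six-instruction layout below is a hypothetical picture of what the compiler might emit once both returns share one block, and all names are assumptions made for this example.

```python
import errno

# Hypothetical object code layout after both EINVAL returns are merged.
CHECK_ARG1, BRANCH1, CHECK_ARG2, BRANCH2, BODY, SHARED_RETURN = range(6)


def run(arg1_bad, arg2_bad, executed, taken):
    """Simulate one call, recording executed instructions and taken branches."""
    executed.update({CHECK_ARG1, BRANCH1})
    if arg1_bad:
        taken.add(BRANCH1)
        executed.add(SHARED_RETURN)
        return errno.EINVAL
    executed.update({CHECK_ARG2, BRANCH2})
    if arg2_bad:
        taken.add(BRANCH2)
        executed.add(SHARED_RETURN)
        return errno.EINVAL
    executed.add(BODY)
    return 0


def coverage(tests):
    """Run every (arg1_bad, arg2_bad) test and return (executed, taken)."""
    executed, taken = set(), set()
    for arg1_bad, arg2_bad in tests:
        run(arg1_bad, arg2_bad, executed, taken)
    return executed, taken
```

A suite with one good-path test and one test where only arg1 is bad, coverage([(False, False), (True, False)]), executes all six modeled instructions but records only BRANCH1 as taken. Adding a test with arg2 bad, (False, True), is what makes BRANCH2 taken as well.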

Initial analysis has been done at -Os, which instructs gcc to generate smaller object code. At -O2, which optimizes for speed, more code is generated, and it is often clear from the -O2 reports that test cases are needed which are not required at -Os.

References

General Coverage Testing

Standards and Certifications

  • FAA DO-178B - United States Aviation Standard