On this website we provide the material referenced in the paper
“OpenJDK’s java.utils.Collection.sort() is broken: The good, the bad and the worst case”
in particular the test generator, which breaks TimSort, as well as the source and proof of the fixed and verified version.
The paper has been submitted to CAV’15 and is currently under review.
Update: The paper has now been accepted. The final version can be found here.
Aftermath
We reported the bug to Oracle/OpenJDK Developers after submission of the paper. We suggested two possible solutions:
-
Adjust length of runLen based on worst-case analysis, which in our opinion is only a work-around that also decreases performance due to larger runLen bounds or
-
The real fix: adjust mergeCollapse so that the invariant is satisfied (i.e. the version that was successfully mechanically verified)
As a result the in our opinion sub-optimal first suggestion was chosen.
Testgenerator
To generate a collection that causes TimSort to crash you can use the program provided below:
Usage:
- Compile “javac TestTimSort.java”
- Run “java TestTimSort <nr>” with nr >= 67108864
Source and Proofs
We provide here the KeY version used for the verification as well as a zipped archive containing the project setup and all proofs.
KeY Version:
- Webstart: To start KeY simply click on the link. (4GB minimum required)
- Binary Installation (please note you might need to change the startscript to allow KeY to use 8GB of memory to be able to load all proofs)
Requirements: Java 8, Memory: 4GB
Source & Proofs:
- SourceAndProofs:
- The proofs can be found in directory “proofs/finished”. They need to be unzipped before loading.
Troubleshooting
- Proof loading error: “Proof could only be loaded partially …. rule impRight not available or not applicable in this context”
- Sometimes options stored in the proof file are ignored. Either restart KeY and load again or change the integer semantics settings in “Options > Taclet Options > Integer semantics” to
- Java Semantics for binarySort, ensuresCapacity, gallopLeft and gallopRight
- Checking Overflow for all others
- Start a fresh instance of KeY
- Sometimes options stored in the proof file are ignored. Either restart KeY and load again or change the integer semantics settings in “Options > Taclet Options > Integer semantics” to
Other Utilities
- PrintWorstCase: prints worst case stack sizes for broken (current)
version of TimSort, in terms of the minimal needed input array length.
This corresponds to the worst case analysis presented in Section 4 of
the paper.
Usage: java PrintWorstCase.
Files: PrintWorstCase.java. - CalcStackDepth: calculates the worst case stack size for corrected
version of TimSort (with adapted mergeCollapse).
This corresponds to the calculation of the bounds assuming the invariant
holds, as presented in Section 3.1 of the paper.
Usage: java CalcStackDepth
Files: CalcStackDepth.java - Benchmark: run the OpenJDK benchmark (with additional WORST_CASE
strategy) for both the current and the fixed version of TimSort, and
compute speedup, as described in Section 5.1.
Usage: java SortPerf
Files: ArrayBuilder.java, Sorter.java, SortPerf.java,
Dependencies: TimSort.java (the current OpenJDK implementation),
StijnSort.java (the fixed implementation of TimSort). - Results of the benchmark presented in Section 5.1.
Files: Benchmark_Result_Sys1, Benchmark_Result_Sys2, Benchmark_Result_Sys3
One thought on “OpenJDK’s java.utils.Collection.sort() is broken: The good, the bad and the worst case”