Abstract
To evaluate code clone detection tools, benchmarks are created to measure their precision and recall. Benchmarks in previous research suffer from either of two issues: first, they depend on the benchmark creators' own definitions of code clones; second, their references are not code clones that occurred in an actual development process. To address both issues, we propose a methodology that creates code clone references from clones that occurred during development, without any human judgment. More concretely, we use multiple revisions stored in the source code repository of the target software to identify methods that were merged during past development, and we regard these merged methods as real code clones. The resulting benchmark makes it possible to evaluate the detection accuracy of code clone detection tools more objectively.