From: Date: Fri, Nov 1, 2013 at 11:40 AM Subject: [NDSS 2014] Paper #156 "ROPecker: A Generic and Practical..." To: Zongwei Zhou Cc: Lujo Bauer Dear Zongwei Zhou, The NDSS program committee is happy to inform you that your paper #156 has been *conditionally* accepted to appear at NDSS 2014. To appear at the conference, the paper must be revised by Fri Nov 29 to address specific concerns raised during the review process. Title: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks Authors: Yueqiang Cheng (Singapore Management University) Zongwei Zhou (Carnegie Mellon University) Miao Yu (Carnegie Mellon University) Xuhua Ding (Singapore Management University) Robert H. Deng (Singapore Management University) Virgil D. Gligor (Carnegie Mellon University) Paper site: https://ndss2014.ece.cmu.edu/crp/ndss2014/paper/156 The program committee has assigned a shepherd for your paper. The timeline for paper revision is as follows: -- By Nov 5, your shepherd will send you a list of required changes or specific concerns that will have to be addressed for the paper to be accepted. -- By Nov 15, you will provide the shepherd with (1) the revised version of your manuscript, (2) a diff file showing changes from the initial submission, and (3) an explanatory note, explaining how the paper has been modified to address concerns pointed out by the shepherd. -- The shepherd will consult with the program committee, and, by Nov 22, the shepherd will notify you if the paper is accepted or more revisions are necessary. In the latter case, the outstanding concerns will need to be addressed by Nov 29. -- If the revised version is acceptable, then a camera-ready version will need to be submitted through the submission system by Dec 6. The program committee may reject your paper if the committee's concerns have not been addressed by Nov 29. After receiving the shepherd's first instructions, the onus is on the authors to communicate with the shepherd in a timely and clear fashion. Feel free to clarify with your shepherd any questions about reviewers' comments or requirements for revision. Your shepherd may also amend this timeline to accommodate constraints on either side. Reviews (and comments, if any) are appended to this email. Please take reviewer comments into account as you prepare the final version of the paper. Best regards, -Lujo Bauer NDSS 2014 PC chair =========================================================================== NDSS 2014 Review #156A Updated 15 Oct 2013 4:59:25pm EDT --------------------------------------------------------------------------- Paper #156: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks --------------------------------------------------------------------------- Overall merit: 5. Weak accept Reviewer expertise: 3. Knowledgeable ===== Paper summary ===== ROPecker detects ROP attacks by identifying gadgets in the execution flow. The detection relies on LBR (last branch record) information available during runtime and an offline gadget database. A sliding window mechanism is in charge of invoking the detection when needed. The LBR is used by "Past Payload Detection" to detect how many gadgets have been executed. The gadgets information is extracted from the application offline, which can improve the performance and help the simulation of ESP for "Future Payload Detection". ===== Main strengths ===== - Very low overhead - No need to modify source code or binary ===== Main weaknesses ===== - It's unclear to me how you extract the gadgets. The effectiveness of ROPeck largely relies on a complete gadgets database, however, it is difficult to achieve that in reality. What if very long gadgets or obscure gadgets exist and are used by attackers? Since the alert is threshold-based, it's possible to bypass ROPecker by inserting unidentified gadgets in the gadget chain. - In some cases, one or two gadgets are enough to launch meaningful ROP, which would go unnoticed by ROPecker. The same goes to very long ROP gadgets, which can overflow the sliding window and then bypass the detection. - The ESP simulation part in "Future Payload Detection" may cause a lot of performance overhead. ===== Other comments to authors ===== To me this paper looks quite similar to kBouncer [Usenix security'13]. There are only some minor differences, e.g., how to trigger the detection. =========================================================================== NDSS 2014 Review #156B Updated 17 Sep 2013 4:33:34pm EDT --------------------------------------------------------------------------- Paper #156: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks --------------------------------------------------------------------------- Overall merit: 5. Weak accept Reviewer expertise: 3. Knowledgeable ===== Paper summary ===== In this paper the authors describe a new system to protect userspace applications against ROP attacks by looking at sufficiently long chains of gadgets in the program execution. This is achieved by analyzing the Last Branch Registers (LBRs) using a sliding windows mechanism. The code of the application outside the window is protected from execution, so that every attempt to execute code outside the window can be intercepted to trigger the ROPecker routine. ===== Main strengths ===== * I like a lot the general idea of using a sliding window to trigger the detection algorithm. Since gadgets are likely distributed in large sections of code, the window is a very clever idea to improve the performance without reducing the tool effectiveness ===== Main weaknesses ===== * The main limitation of ROPEcker is the fact that it completely relies on a perfect disassembly to identify all instructions in the code. We all know that this can be a challenging task (especially by using a linear sweep disassembler as used in this paper), so I found quite annoying the fact that the authors did not even discuss this as a possible limitation. I'd like to see this problem discussed and measure to which extent it can introduce false positive (misunderstanding between aligned and not aligned instructions). * The authors should measure what is the impact of choosing a maximum gadget length of six instructions. What would change with 7? or 8? * I'd like to know what is the average memory overhead for a system (e.g. a server, or a common desktop) running Ropecker. * The authors say that their approach only add an *insignificant* overhead in their tests. Well, 2.6% average on cpu and up to almost 50% in certain network-bound tests are absolutely NOT insignificant. "Often acceptable" may be more a more fair way to say it. * I'm not convinced about the comparison with previous approaches. For instance, some are dismissed because they may generate false positives (also this approach can, if a binary contains a sufficiently long sequence of gadgets) or because they may have problems in particular versions of Windows (the difference in length between gadgets chains was only evaluated on Linux and maybe it wont work in Windows at all). Also the limitations mentioned for Gfree are not correct (they authors should instead say that it requires the application source code). * I didnt understand the new tool developed to test the chains generated by Q. =========================================================================== NDSS 2014 Review #156C Updated 18 Oct 2013 10:20:01am EDT --------------------------------------------------------------------------- Paper #156: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks --------------------------------------------------------------------------- Overall merit: 5. Weak accept Reviewer expertise: 2. Some familiarity ===== Paper summary ===== In this paper, the authors present a new approach for effectively and efficiently defending against return-oriented programming exploitation. The proposal combines the dynamic online detection with offline static processing. In particular, ROPecker detects an ROP attack at runtime by checking the presence of sufficiently long chain of gadgets in the past and future execution flow. It employs some hardware and kernel features, such as Last Branch Record register and kernel-assisted sliding windows to improve the efficiency. Another advantage, which is different from several existing work, is that ROPecker does not rely on any other side information, such as source code, compiler support and binary rewriting. ===== Main strengths ===== Overall I think the idea in the paper is interesting. It is nice to use LBR register and sliding window to achieve runtime checking. Since we can skip the instruction-by-instruction checking inside the sliding window, we could reduce the overhead. Meanwhile, the idea of checking the presence of a sufficiently long chain of gadgets is also interesting. ===== Main weaknesses ===== However, since the topic is a well-studied one, I expect the authors to show enough evidence that their approach can outperform existing work but the authors did not convince me. For example, I am not so convinced of the motivation. Why we need runtime check and why we need an approach without compiler or binary rewriting support? Where can we deploy ROPecker? At the analysis side or the client side? If at the client side, will it cause trouble if we enable such defense in the kernel against all programs? It seems that the authors do not provide strong evidence to answer these questions. From my viewpoint, since ROPecker also conducts offline static analysis on the binary, why not just use the existing binary rewriting scheme, such as Gadget Smashing [32]. Such rewriting scheme is also an efficient solution. Unfortunately, the authors do not provide a performance comparison between any existing work and ROPecker in the evaluation. It will be very helpful to find some undisco vered RO P attacks using ROPecker. Moreover, ROPecker also has the similar limitation as the previous approach. For example, it requires enabling emulation to handle indirect jump, which may consume a lot of resources without improving much defense effectiveness. Another limitation is the judgment based on the empirical study of the chain length. One suggestion about the figures is that the authors should adjust the scale, so we can see clearly about the difference. *******After rebuttal****** Thank you for addressing some of my concerns. I now increase my score accordingly. =========================================================================== NDSS 2014 Review #156D Updated 28 Oct 2013 9:20:15am EDT --------------------------------------------------------------------------- Paper #156: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks --------------------------------------------------------------------------- Overall merit: 3. Weak reject Reviewer expertise: 3. Knowledgeable ===== Paper summary ===== ROP defense ===== Main strengths ===== I really liked the idea behind this paper. ROPecker gets impressive performance and has some clever ideas. However, the security story is not convincing. The positive side is that the paper has a very clever idea that might stimulate more research. The negative side is that publishing this paper might give the community an overly optimistic view of the effectiveness of this idea. ===== Main weaknesses ===== It appears that ROPecker is not secure: as the paper acknowledges, there are attacks that can defeat it (see Section VIII). The paper suggests that it might be possible to stop those attacks, but this is speculative and has not been implemented or evaluated. And even if the authors are right that those attacks can be stopped, I still find the security evaluation unconvincing and my guess would be that ROPecker can be defeated through other methods not considered in the paper. I'm OK with speculative ideas where we're not sure whether they can be defeated, but the paper needs to do a better job of evaluating all plausible attacks. I don't think this paper does a good enough job of that. The paper needs to do a good enough job that, if someone breaks the scheme, then the break will likely be "interesting" or involve non-obvious ideas. If I had to bet, I'd bet that ROPecker can be broken without requiring non-obvious new ideas, and the paper doesn't persuade me otherwise. ===== Other comments to authors ===== I suspect that ROPecker can be defeated simply by using longer gadgets (longer than 6 instructions). I didn't see a convincing evaluation of how effective ROPecker will be at detecting attacks. I would prefer to see an evaluation where you run some programs under ROPecker, look at all of the sliding windows that ever occur during normal execution, analyze the amount of code that's executable during each such sliding window using Q to see for which time periods a ROP attack would be possible (because there are enough gadgets), and then compute a time-average of the fraction of time instants at which a ROP attack would be possible. I couldn't see anything like that. I had a hard time understanding what the evaluation methodology in VII.A.2 was, or what class of attacks were considered. The paper misrepresents the findings of the Q paper. The findings are not that you need 20KB of code for a ROP attack to be successful. The findings are that, for 80% of all programs containing at least 20KB of code, a ROP attack will be possible. (Over all programs containing at least 10KB of code, the corresponding success probability is about 60%.) Also, for comparison, Roglia et al. (ACSAC 2009) estimate that 96% of programs containing at least 20K of code can be attacked with a ROP attack. Make sure you read the erratum to the Q paper: http://users.ece.cmu.edu/~ejschwar/papers/usenix11-update.pdf The consequences of this for ROPecker are not good. ROPecker seems to assume that if less than 20KB of code is available, a ROP attack will surely fail. However, the Q paper emphatically does not say this. My best guess is that this is not correct: if there's a bit less than 20KB of code available, my best guess is that a ROP attack might still often be possible (maybe even with probability greater than 50%). This is problematic for ROPecker, because it means that the attacker might have a high success probability even staying within the sliding window. Actually, the implications are even worse than that, because the attacker has some control over the time when the the ROP attack is initiated. A smart attacker might be able to initiate the attack at a time when he knows that there will be enough gadgets within the sliding window. There's a second problem with using the Q paper's results for your purposes. The Q paper looks at the universe of all programs with >= 20 KB of code, and computes the fraction of them that contain enough gadgets for a ROP attack to succeed. This is not the same as looking at the set of programs with exactly 20KB of code. There many programs that have much more than 20KB of code, and intuitively, we'd expect those to be much more likely to have a large gadget set -- so the success probability for programs of size 20KB might be smaller than 80%. This submission doesn't seem to be aware of this distinction or its implications for the security of ROPecker. (If you asked me to guess, I suspect 20KB is still about the right size: where 20KB programs are more likely than not to allow a ROP attack, and smaller ones have less than a 50% chance of a ROP attack. However, I shouldn't have to guess. The paper needs to do this analysis itself. You could download the Q tool and measure the numbers that you need yourself.) ROPecker seems very fragile. If the attacker finds even one gadget that your analysis doesn't, then I suspect the attacker wins: the attacker can construct a ROP attack that ROPecker won't flag. The attacker just needs to invoke this undetected gadget at least once per every other 10 gadget invocations (or more frequently), and ROPecker will think there is no long chain of gadgets. Will the implementation of ROPecker be made available? I hope so. When publishing a speculative defense like this, we need to make sure there's a way for other researchers who want to attack the system to have a clear target for attacks. If they want to evaluate a potential attack, they need a way to show their attack is (or isn't) effective. With this kind of paper, that sort of evaluation will be harder to do without the code (if they reimplement your idea, it would be too easy to dismiss their results by accusing them of doing a poor job of reimplementation). One of the arguments for accepting a paper like this, where it's possible that the scheme might get broken, is that even if the scheme gets broken, we've learned something. However that argument is premised on the assumption that we can tell whether an alleged attack is indeed effective -- so I think that making the code of ROPecker available would be helpful to the scientific community. I find Section VIII.A confusing. What's the bottom line? Can an attacker evade ROPecker with a stack pivot? I think the bottom line answer is "no" (but the first sentence says yes, which is confusing). If the answer is "no", then ROPecker is a very weak defense, as many/most ROP attacks use a stack pivot (and it would be easy to adjust most ROP attacks to use a stack pivot if necessary). The security analysis in Section VIII.B seems overly optimistic. For instance, saying that longer gadgets have never been seen is irrelevant (even if true): attackers have to date not had any incentive to use longer badgets, but if ROPecker is deployed, they will. Therefore, past experience is not a useful predictor of attacker behavior if ROPecker is deployed and not a convincing reason to believe that it is hard to build ROP attacks using long gadgets. Similarly, saying that this attack can "in theory" defeat your scheme comes off as an attempt to downplaying this weakness. Either it can defeat your scheme, or it can't. I suspect it can, and not just in theory, but in practice, too. Section VIII.B makes many unsubstantiated claims. For instance, it claims that few useful long gadgets can be found. Well, where's your evidence? My guess is that this is simply false. What's more, the attacker doesn't even need lots of long gadgets; it would suffice to use a long gadget for every 10th gadget. Basically, Section VIII is handwavy and unconvincing. I would be more positive about this paper if the paper described a scheme but admitted forthrightly that the scheme can probably be broken and described a number of promising attacks that will likely work (without trying to downplay those attacks or pretend that ROPecker will be secure in its current form). Such a paper might stimulate future research by suggesting a starting point that other researchers could work from, while making clear that ROPecker in its current form is not sufficient and additional ideas will be needed. I wonder if I've really understood how ROPecker works. At any point in time, there are only 5 pages that are executable? (5 pages * 4KB/page = 20KB) And all the rest of the pages in the address space are non-executable? So that's a total of 5 pages, out of the program's text segment, all libraries, GOT/PLT entries, etc? The paper wasn't super-explicit on this. It seems very surprising that this is enough for good performance -- can 5 pages really be enough for performance? Are you sure that kBouncer requires binary rewriting? That wasn't my recollection (though I admit I haven't gone back to check the kBouncer paper carefully). Table II's caption is confusing. What's "unintended" and "intended"? I think you mean "unaligned" and "aligned". The discussion at the end of Section V.D needs more elaboration. Do you mean that 62-90% of indirect branches jump to an unaligned address? What's your definition of an unaligned address? I would assume that any address that marks the start of an instruction that could be executed during any normal program execution would count as aligned. Are you perhaps looking only at instructions that can be found statically by a recursive disassembler, and treating their start addresses as aligned and everything else as unaligned? If so, that's flawed, because every address that can only be reached through an indirect jump will be missed by the disassembler and thus will wrongly be counted as "unaligned". The paper needs careful proofreading. e.g., sever, may hybrid, Date Execution, "In addition, The", never by executed, e.t., "has to... or analyzes". =========================================================================== Authors' Response by Yueqiang Cheng Paper #156: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks --------------------------------------------------------------------------- Thanks for the reviewers¡¯ valuable comments. 1. Comparison with existing work. KBouncer: a) KBouncer ONLY monitors execution flows on selected critical paths, e.g., system APIs, and inevitably misses the ROP attacks that do not use those paths. ROPecker checks attacks whenever the execution flow jumps out of the sliding window. b) KBouncer identifies (PAST) ROP gadget chain in LBR records. Unfortunately, LBR is not fully reliable, because it can be overflowed or polluted by context switch. We use offline analysis and online detection to identify PAST and FUTURE ROP gadget chain. c) KBouncer needs on-the-fly binary rewriting to install checkpoints, while we require NO modification on the binary code (completely transparent to protected applications). Runtime check v.s. gadget elimination (with binary rewriting and compiler support). It is relatively easy to identify gadgets within a binary application, but it is infeasible to remove all of them. As shown in the ¡°just-in-time code reuse¡± paper published at Oakland¡¯13, the gadget elimination procedure may introduce new gadgets. Thus, our scheme chooses to only identify gadgets (for generating the gadget database) but not to eliminate them. In fact, our scheme is complementary to gadget elimination and other methods, such as enforcing control flow integrity or randomizing address space. Our scheme provides another line to defend against ROP attacks. 2. Detection accuracy and efficiency. (1) Reliance on disassembler: ROPecker identifies the gadget chain in execution flow, not by a single gadget. This relaxes the requirement for a perfect assembler to correctly identify all gadgets. In our implementation, all code sequences ended with indirect branches are as potential gadgets. In addition, we can verify the disassembling results using more advanced tools and techniques (Section VI-A). (2) Handling with long gadgets: As the reviewers pointed out, long gadgets may introduce false negatives. Thus, we propose an aggregation threshold extension to mitigate such attacks (Section VIII-C). We also argue that constructing a meaningful gadget chain with lots of consecutive long gadgets is very difficult. (3) Emulation overhead: Although instruction emulation is relatively inefficient, we deal with this by minimize its invocation. Our offline analysis database is carefully designed to cover most usual cases, so that most of the time, ROPecker only need to access the table to decide the position of next possible gadget. The invocation of emulation is rare. As is proved in the experiment (Section VII-B), our runtime overhead is reasonably low. 3. Limitations and experimental results. (1) ROPecker cannot detect the ROP attacks with only one or two gadgets, where the power of the adversary is also limited. (2) As for the memory overhead of our system, in Section VI-A, we measured the storage space of database for all shared libraries (about 548MB). Note that this database can be shared by almost all applications. Protecting one specific application requires much smaller database, e.g., as in Section VII B 1), htediter¡¯s database is smaller than 1MB. =========================================================================== Comment Paper #156: ROPecker: A Generic and Practical Approach For Defending Against ROP Attacks --------------------------------------------------------------------------- I understand this paper will be shepherded. I feel strongly that the paper needs to state very clearly and explicitly that the scheme as presented in the paper, with the specific parameters that were proposed and evaluated, is not secure and can be evaded by ROP attacks -- and that while changing the parameters or adjusting the scheme might address those evasion attacks, we do not know how doing so might affect performance and/or the false positive rate. This is critical for the future literature, so we don't give readers the impression that the problem has been solved.