From: <lbauer@cmu.edu>
Date: Fri, Nov 1, 2013 at 11:40 AM
Subject: [NDSS 2014] Paper #156 "ROPecker: A Generic and Practical..."
To: Zongwei Zhou <stephenzhou@cmu.edu>
Cc: Lujo Bauer <lbauer@cmu.edu>


Dear Zongwei Zhou,

The NDSS program committee is happy to inform you that your paper #156 has
been *conditionally* accepted to appear at NDSS 2014.  To appear at the
conference, the paper must be revised by Fri Nov 29 to address specific
concerns raised during the review process.

       Title: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
     Authors: Yueqiang Cheng (Singapore Management University)
              Zongwei Zhou (Carnegie Mellon University)
              Miao Yu (Carnegie Mellon University)
              Xuhua Ding (Singapore Management University)
              Robert H. Deng (Singapore Management University)
              Virgil D. Gligor (Carnegie Mellon University)
  Paper site: https://ndss2014.ece.cmu.edu/crp/ndss2014/paper/156

The program committee has assigned a shepherd for your paper. The timeline
for paper revision is as follows:

-- By Nov 5, your shepherd will send you a list of required changes or
specific concerns that will have to be addressed for the paper to be
accepted.

-- By Nov 15, you will provide the shepherd with
(1) the revised version of your manuscript,
(2) a diff file showing changes from the initial submission, and
(3) an explanatory note, explaining how the paper has been modified to
address concerns pointed out by the shepherd.

-- The shepherd will consult with the program committee, and, by Nov 22,
the shepherd will notify you if the paper is accepted or more revisions are
necessary. In the latter case, the outstanding concerns will need to be
addressed by Nov 29.

-- If the revised version is acceptable, then a camera-ready version will
need to be submitted through the submission system by Dec 6. The program
committee may reject your paper if the committee's concerns have not been
addressed by Nov 29.

After receiving the shepherd's first instructions, the onus is on the
authors to communicate with the shepherd in a timely and clear fashion.
Feel free to clarify with your shepherd any questions about reviewers'
comments or requirements for revision. Your shepherd may also amend this
timeline to accommodate constraints on either side.

Reviews (and comments, if any) are appended to this email. Please take
reviewer comments into account as you prepare the final version of the
paper.

Best regards,

-Lujo Bauer
NDSS 2014 PC chair

===========================================================================
                          NDSS 2014 Review #156A
                     Updated 15 Oct 2013 4:59:25pm EDT
---------------------------------------------------------------------------
  Paper #156: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
---------------------------------------------------------------------------

                      Overall merit: 5. Weak accept
                 Reviewer expertise: 3. Knowledgeable

                         ===== Paper summary =====

ROPecker detects ROP attacks by identifying gadgets in the execution
flow. The detection relies on LBR (last branch record) information
available during runtime and an offline gadget database. A sliding
window mechanism is in charge of invoking the detection when needed. The
LBR is used by "Past Payload Detection" to detect how many gadgets have
been executed. The gadgets information is extracted from the application
offline, which can improve the performance and help the simulation of
ESP for "Future Payload Detection".

                        ===== Main strengths =====

- Very low overhead
- No need to modify source code or binary

                        ===== Main weaknesses =====

- It's unclear to me how you extract the gadgets. The effectiveness of
ROPeck largely relies on a complete gadgets database, however, it is
difficult to achieve that in reality. What if very long gadgets or
obscure gadgets exist and are used by attackers? Since the alert is
threshold-based, it's possible to bypass ROPecker by inserting
unidentified gadgets in the gadget chain.

- In some cases, one or two gadgets are enough to launch meaningful ROP,
which would go unnoticed by ROPecker. The same goes to very long ROP
gadgets, which can overflow the sliding window and then bypass the
detection.

- The ESP simulation part in "Future Payload Detection" may cause a lot
of performance overhead.

                   ===== Other comments to authors =====

To me this paper looks quite similar to kBouncer [Usenix security'13]. There are only some minor differences, e.g., how to trigger the detection.

===========================================================================
                          NDSS 2014 Review #156B
                     Updated 17 Sep 2013 4:33:34pm EDT
---------------------------------------------------------------------------
  Paper #156: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
---------------------------------------------------------------------------

                      Overall merit: 5. Weak accept
                 Reviewer expertise: 3. Knowledgeable

                         ===== Paper summary =====

In this paper the authors describe a new system to protect userspace
applications against ROP attacks by looking at
sufficiently long chains of gadgets in the program execution.
This is achieved by analyzing the Last Branch Registers (LBRs)
using a sliding windows mechanism. The code of the application
outside the window is protected from execution,
so that every attempt to execute code outside the window can be intercepted
to trigger the ROPecker routine.

                        ===== Main strengths =====

* I like a lot the general idea of using a sliding window to trigger the detection algorithm.
   Since gadgets are likely distributed in large sections of code, the window is a very clever
   idea to improve the performance without reducing the tool effectiveness

                        ===== Main weaknesses =====

* The main limitation of ROPEcker is the fact that it completely relies on
  a perfect disassembly to identify all instructions in the code.
  We all know that this can be a challenging task (especially by using a
  linear sweep disassembler as used in this paper), so I found quite annoying
  the fact that the authors did not even discuss this as a possible limitation.
  I'd like to see this problem discussed and measure to which extent
  it can introduce false positive (misunderstanding between aligned and not
  aligned instructions).

* The authors should measure what is the impact of choosing a maximum gadget
  length of six instructions. What would change with 7? or 8?

* I'd like to know what is the average memory overhead for a system (e.g. a server,
   or a common desktop) running Ropecker.

* The authors say that their approach only add an *insignificant* overhead in
  their tests. Well, 2.6% average on cpu and up to almost 50% in certain
  network-bound tests are absolutely NOT insignificant.
  "Often acceptable" may be more a more fair way to say it.

* I'm not convinced about the comparison with previous approaches. For
  instance, some are dismissed because they may generate false positives
  (also this approach can, if a binary contains a sufficiently long sequence
  of gadgets) or because they may have problems in particular versions of
  Windows (the difference in length between gadgets chains was only evaluated
  on Linux and maybe it wont work in Windows at all).
  Also the limitations mentioned for Gfree are not correct
  (they authors should instead say that it requires the application source code).

* I didnt understand the new tool developed to test the chains generated by Q.

===========================================================================
                          NDSS 2014 Review #156C
                    Updated 18 Oct 2013 10:20:01am EDT
---------------------------------------------------------------------------
  Paper #156: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
---------------------------------------------------------------------------

                      Overall merit: 5. Weak accept
                 Reviewer expertise: 2. Some familiarity

                         ===== Paper summary =====

In this paper, the authors present a new approach for effectively and efficiently defending against return-oriented programming exploitation. The proposal combines the dynamic online detection with offline static processing. In particular, ROPecker detects an ROP attack at runtime by checking the presence of sufficiently long chain of gadgets in the past and future execution flow. It employs some hardware and kernel features, such as Last Branch Record register and kernel-assisted sliding windows to improve the efficiency. Another advantage, which is different from several existing work, is that ROPecker does not rely on any other side information, such as source code, compiler support and binary rewriting.

                        ===== Main strengths =====

Overall I think the idea in the paper is interesting. It is nice to use LBR register and sliding window to achieve runtime checking. Since we can skip the instruction-by-instruction checking inside the sliding window, we could reduce the overhead. Meanwhile, the idea of checking the presence of a sufficiently long chain of gadgets is also interesting.

                        ===== Main weaknesses =====

However, since the topic is a well-studied one, I expect the authors to show enough evidence that their approach can outperform existing work but the authors did not convince me. For example, I am not so convinced of the motivation. Why we need runtime check and why we need an approach without compiler or binary rewriting support? Where can we deploy ROPecker? At the analysis side or the client side? If at the client side, will it cause trouble if we enable such defense in the kernel against all programs?  It seems that the authors do not provide strong evidence to answer these questions. From my viewpoint, since ROPecker also conducts offline static analysis on the binary, why not just use the existing binary rewriting scheme, such as Gadget Smashing [32]. Such rewriting scheme is also an efficient solution. Unfortunately, the authors do not provide a performance comparison between any existing work and ROPecker in the evaluation. It will be very helpful to find some undisco
 vered RO
 P attacks using ROPecker.

Moreover, ROPecker also has the similar limitation as the previous approach. For example, it requires enabling emulation to handle indirect jump, which may consume a lot of resources without improving much defense effectiveness. Another limitation is the judgment based on the empirical study of the chain length.

One suggestion about the figures is that the authors should adjust the scale, so we can see clearly about the difference.

*******After rebuttal******
Thank you for addressing some of my concerns. I now increase my score accordingly.

===========================================================================
                          NDSS 2014 Review #156D
                     Updated 28 Oct 2013 9:20:15am EDT
---------------------------------------------------------------------------
  Paper #156: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
---------------------------------------------------------------------------

                      Overall merit: 3. Weak reject
                 Reviewer expertise: 3. Knowledgeable

                         ===== Paper summary =====

ROP defense

                        ===== Main strengths =====

I really liked the idea behind this paper.
ROPecker gets impressive performance and
has some clever ideas.  However, the security
story is not convincing.

The positive side is that the paper has a
very clever idea that might stimulate more
research.  The negative side is that publishing
this paper might give the community an overly
optimistic view of the effectiveness of this
idea.

                        ===== Main weaknesses =====

It appears that ROPecker is not secure: as
the paper acknowledges, there are attacks that
can defeat it (see Section VIII).  The paper
suggests that it might be possible to stop
those attacks, but this is speculative and
has not been implemented or evaluated.  And
even if the authors are right that those
attacks can be stopped, I still find the
security evaluation unconvincing and my
guess would be that ROPecker can be defeated
through other methods not considered in the
paper.

I'm OK with speculative ideas where we're not
sure whether they can be defeated, but the paper
needs to do a better job of evaluating all
plausible attacks.  I don't think this paper
does a good enough job of that.  The paper needs
to do a good enough job that, if someone breaks
the scheme, then the break will likely be
"interesting" or involve non-obvious ideas.
If I had to bet, I'd bet that ROPecker can
be broken without requiring non-obvious new
ideas, and the paper doesn't persuade me
otherwise.

                   ===== Other comments to authors =====

I suspect that ROPecker can be defeated
simply by using longer gadgets (longer than
6 instructions).


I didn't see a convincing evaluation of how
effective ROPecker will be at detecting attacks.
I would prefer to see an evaluation where you
run some programs under ROPecker, look at all
of the sliding windows that ever occur during
normal execution, analyze the amount of code
that's executable during each such sliding
window using Q to see for which time periods
a ROP attack would be possible (because there
are enough gadgets), and then compute a
time-average of the fraction of time instants
at which a ROP attack would be possible.  I
couldn't see anything like that.  I had a hard
time understanding what the evaluation methodology
in VII.A.2 was, or what class of attacks were
considered.

The paper misrepresents the findings of the
Q paper.  The findings are not that you need
20KB of code for a ROP attack to be successful.  The
findings are that, for 80% of all programs containing
at least 20KB of code, a ROP attack will be possible.
(Over all programs containing at least 10KB of code,
the corresponding success probability is about 60%.)
Also, for comparison, Roglia et al. (ACSAC 2009)
estimate that 96% of programs containing at least
20K of code can be attacked with a ROP attack.
Make sure you read the erratum to the Q paper:

http://users.ece.cmu.edu/~ejschwar/papers/usenix11-update.pdf

The consequences of this for ROPecker are not good.
ROPecker seems to assume that if less than 20KB of
code is available, a ROP attack will surely fail.
However, the Q paper emphatically does not say this.
My best guess is that this is not correct: if there's
a bit less than 20KB of code available, my best guess
is that a ROP attack might still often be possible
(maybe even with probability greater than 50%).

This is problematic for ROPecker, because it means
that the attacker might have a high success probability
even staying within the sliding window.  Actually,
the implications are even worse than that, because
the attacker has some control over the time when
the the ROP attack is initiated.  A smart attacker
might be able to initiate the attack at a time when
he knows that there will be enough gadgets within
the sliding window.

There's a second problem with using the Q paper's
results for your purposes.  The Q paper looks at
the universe of all programs with >= 20 KB of code,
and computes the fraction of them that contain enough
gadgets for a ROP attack to succeed.  This is not
the same as looking at the set of programs with
exactly 20KB of code.  There many programs that
have much more than 20KB of code, and intuitively,
we'd expect those to be much more likely to have a
large gadget set -- so the success probability for
programs of size 20KB might be smaller than 80%.
This submission doesn't seem to be aware of this
distinction or its implications for the security
of ROPecker.  (If you asked me to guess, I suspect
20KB is still about the right size: where 20KB
programs are more likely than not to allow a ROP
attack, and smaller ones have less than a 50%
chance of a ROP attack.  However, I shouldn't
have to guess.  The paper needs to do this analysis
itself.  You could download the Q tool and measure
the numbers that you need yourself.)


ROPecker seems very fragile.  If the attacker
finds even one gadget that your analysis doesn't,
then I suspect the attacker wins: the attacker
can construct a ROP attack that ROPecker won't
flag.  The attacker just needs to invoke this
undetected gadget at least once per every other
10 gadget invocations (or more frequently),
and ROPecker will think there is no long chain
of gadgets.


Will the implementation of ROPecker be made
available?  I hope so.  When publishing a
speculative defense like this, we need to
make sure there's a way for other researchers
who want to attack the system to have a clear
target for attacks.  If they want to evaluate
a potential attack, they need a way to show
their attack is (or isn't) effective.  With this
kind of paper, that sort of evaluation will
be harder to do without the code (if they
reimplement your idea, it would be too easy
to dismiss their results by accusing them
of doing a poor job of reimplementation).
One of the arguments for accepting a paper
like this, where it's possible that the scheme
might get broken, is that even if the scheme
gets broken, we've learned something.
However that argument is premised on the
assumption that we can tell whether an
alleged attack is indeed effective -- so
I think that making the code of ROPecker
available would be helpful to the scientific
community.


I find Section VIII.A confusing.  What's
the bottom line?  Can an attacker evade
ROPecker with a stack pivot?  I think the
bottom line answer is "no" (but the first
sentence says yes, which is confusing).
If the answer is "no", then ROPecker is
a very weak defense, as many/most ROP attacks
use a stack pivot (and it would be easy to
adjust most ROP attacks to use a stack pivot
if necessary).

The security analysis in Section VIII.B
seems overly optimistic.  For instance,
saying that longer gadgets have never
been seen is irrelevant (even if true):
attackers have to date not had any
incentive to use longer badgets, but if
ROPecker is deployed, they will.
Therefore, past experience is not a
useful predictor of attacker behavior if
ROPecker is deployed and not a
convincing reason to believe that it is
hard to build ROP attacks using long
gadgets.  Similarly, saying that this
attack can "in theory" defeat your
scheme comes off as an attempt to
downplaying this weakness.  Either it
can defeat your scheme, or it can't.  I
suspect it can, and not just in theory,
but in practice, too.

Section VIII.B makes many unsubstantiated
claims.  For instance, it claims that few
useful long gadgets can be found.  Well, where's
your evidence?  My guess is that this is
simply false.  What's more, the attacker
doesn't even need lots of long gadgets; it
would suffice to use a long gadget for
every 10th gadget.

Basically, Section VIII is handwavy and
unconvincing.  I would be more positive
about this paper if the paper described a
scheme but admitted forthrightly that the
scheme can probably be broken and described
a number of promising attacks that will
likely work (without trying to downplay
those attacks or pretend that ROPecker
will be secure in its current form).
Such a paper might stimulate future research
by suggesting a starting point that other
researchers could work from, while making
clear that ROPecker in its current form is
not sufficient and additional ideas will
be needed.


I wonder if I've really understood how
ROPecker works.  At any point in time, there
are only 5 pages that are executable?
(5 pages * 4KB/page = 20KB)  And all the
rest of the pages in the address space are
non-executable?  So that's a total of 5
pages, out of the program's text segment,
all libraries, GOT/PLT entries, etc?
The paper wasn't super-explicit on this.
It seems very surprising that this is
enough for good performance -- can 5 pages
really be enough for performance?


Are you sure that kBouncer requires binary rewriting?
That wasn't my recollection (though I admit I haven't gone
back to check the kBouncer paper carefully).

Table II's caption is confusing.  What's
"unintended" and "intended"?  I think you mean
"unaligned" and "aligned".  The discussion at
the end of Section V.D needs more
elaboration.  Do you mean that 62-90% of indirect
branches jump to an unaligned address?  What's
your definition of an unaligned address?  I would
assume that any address that marks the start of
an instruction that could be executed during
any normal program execution would count as
aligned.  Are you perhaps looking only at
instructions that can be found statically by
a recursive disassembler, and treating their
start addresses as aligned and everything else
as unaligned?  If so, that's flawed, because
every address that can only be reached through
an indirect jump will be missed by the
disassembler and thus will wrongly be
counted as "unaligned".

The paper needs careful proofreading.  e.g., sever,
may hybrid, Date Execution, "In addition, The",
never by executed, e.t., "has to... or analyzes".


===========================================================================
       Authors' Response by Yueqiang Cheng <yqcheng.2008@smu.edu.sg>
  Paper #156: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
---------------------------------------------------------------------------
Thanks for the reviewers�� valuable comments.
1.  Comparison with existing work.
KBouncer: a) KBouncer ONLY monitors execution flows on selected critical paths, e.g., system APIs, and inevitably misses the ROP attacks that do not use those paths. ROPecker checks attacks whenever the execution flow jumps out of the sliding window. b) KBouncer identifies (PAST) ROP gadget chain in LBR records. Unfortunately, LBR is not fully reliable, because it can be overflowed or polluted by context switch. We use offline analysis and online detection to identify PAST and FUTURE ROP gadget chain. c) KBouncer needs on-the-fly binary rewriting to install checkpoints, while we require NO modification on the binary code (completely transparent to protected applications).

Runtime check v.s. gadget elimination (with binary rewriting and compiler support).
It is relatively easy to identify gadgets within a binary application, but it is infeasible to remove all of them. As shown in the ��just-in-time code reuse�� paper published at Oakland��13, the gadget elimination procedure may introduce new gadgets. Thus, our scheme chooses to only identify gadgets (for generating the gadget database) but not to eliminate them. In fact, our scheme is complementary to gadget elimination and other methods, such as enforcing control flow integrity or randomizing address space. Our scheme provides another line to defend against ROP attacks.

2.  Detection accuracy and efficiency.
(1) Reliance on disassembler: ROPecker identifies the gadget chain in execution flow, not by a single gadget. This relaxes the requirement for a perfect assembler to correctly identify all gadgets. In our implementation, all code sequences ended with indirect branches are as potential gadgets. In addition, we can verify the disassembling results using more advanced tools and techniques (Section VI-A).
(2) Handling with long gadgets: As the reviewers pointed out, long gadgets may introduce false negatives. Thus, we propose an aggregation threshold extension to mitigate such attacks (Section VIII-C). We also argue that constructing a meaningful gadget chain with lots of consecutive long gadgets is very difficult.
(3) Emulation overhead: Although instruction emulation is relatively inefficient, we deal with this by minimize its invocation. Our offline analysis database is carefully designed to cover most usual cases, so that most of the time, ROPecker only need to access the table to decide the position of next possible gadget. The invocation of emulation is rare. As is proved in the experiment (Section VII-B), our runtime overhead is reasonably low.

3. Limitations and experimental results.
(1) ROPecker cannot detect the ROP attacks with only one or two gadgets, where the power of the adversary is also limited. (2) As for the memory overhead of our system, in Section VI-A, we measured the storage space of database for all shared libraries (about 548MB). Note that this database can be shared by almost all applications. Protecting one specific application requires much smaller database, e.g., as in Section VII B 1), htediter��s database is smaller than 1MB.


===========================================================================
                                  Comment
  Paper #156: ROPecker: A Generic and Practical Approach For Defending
              Against ROP Attacks
---------------------------------------------------------------------------
I understand this paper will be shepherded.  I feel strongly that the paper needs to state very clearly and explicitly that the scheme as presented in the paper, with the specific parameters that were proposed and evaluated, is not secure and can be evaded by ROP attacks -- and that while changing the parameters or adjusting the scheme might address those evasion attacks, we do not know how doing so might affect performance and/or the false positive rate.  This is critical for the future literature, so we don't give readers the impression that the problem has been solved.