\documentclass[11pt]{article}
\usepackage{amssymb,amsmath,amsthm,url}
\usepackage{graphicx}
%uncomment to get hyperlinks
%\usepackage{hyperref}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Some macros (you can ignore everything until "end of macros")
\def\class{0}
\topmargin 0pt \advance \topmargin by -\headheight \advance
\topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt \evensidemargin \oddsidemargin \marginparwidth
0.5in
\textwidth 6.5in
%%%%%%
\newcommand{\getsr}{\gets_{\mbox{\tiny R}}}
\newcommand{\bits}{\{0,1\}}
\newcommand{\Ex}{\mathbb{E}}
\newcommand{\To}{\rightarrow}
\newcommand{\e}{\epsilon}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\maxpr}{\text{\rm max-pr}}
\newenvironment{summary}{\begin{quote}\textbf{Summary.}}{\end{quote}}
\newtheorem{theorem}{Theorem}
\newtheorem{axiom}{Axiom}
\newtheorem{lemma}{Lemma}
\newtheorem{claim}{Claim}[theorem]
\theoremstyle{definition}
\newtheorem{exercise}{Exercise}
\newtheorem{definition}{Definition}
\newcommand{\sstart}{\triangleright}
\newcommand{\send}{\triangleleft}
\newcommand{\cclass}[1]{\mathbf{#1}}
\renewcommand{\P}{\cclass{P}}
\newcommand{\NP}{\cclass{NP}}
\newcommand{\Time}{\cclass{Time}}
\newcommand{\BPP}{\cclass{BPP}}
\newcommand{\Size}{\cclass{Size}}
\newcommand{\Ppoly}{\cclass{P_{/poly}}}
\newcommand{\CSAT}{\ensuremath{\mathsf{CSAT}}}
\newcommand{\SAT}{\ensuremath{\mathsf{3SAT}}}
\newcommand{\IS}{\mathsf{INDSET}}
\newcommand{\poly}{\mathrm{poly}}
\newcommand{\inp}{\mathsf{in}}
\newcommand{\outp}{\mathsf{out}}
\newcommand{\Adv}{\mathsf{Adv}}
\newcommand{\Supp}{\mathsf{Supp}}
\newcommand{\dist}{\Delta}
\newcommand{\indist}{\approx}
\newcommand{\PRG}{\mathsf{G}}
\newcommand{\Enc}{\mathsf{E}}
\newcommand{\Dec}{\mathsf{D}}
\newcommand{\Fcal}{\mathcal{F}}
\newcommand{\Sign}{\mathsf{Sign}}
\newcommand{\Ver}{\mathsf{Ver}}
\newcommand{\angles}[1]{\langle #1 \rangle}
\newcommand{\eqdef}{\stackrel{\vartriangle}{=}}
% end of macros
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{Lecture 9 - Message Authentication Codes}
\author{Boaz Barak}
\begin{document}
\maketitle
{ \ifnum\class=1 \fontsize{14pt}{16pt} \selectfont \fi
\begin{description}
\item[Reading:] Boneh-Shoup chapter 6, Sections 9.1--9.3.
\item[Data integrity] Until now we've only been interested in
protecting \emph{secrecy} of data. However, in many cases what we
care about is \emph{integrity}.
Maintaining integrity is about preventing an adversary from
tampering with the data that was sent or stored by the legitimate
users. For example, often people are not worried so much about
secrecy of their email, but they definitely want to be assured that
the email they received was indeed the one being sent.
Another important example is over-the-air software patches: you want to make sure that the software patch you
are installing is the genuine one from the software company and not one planted by some hacker, but there's nothing secret about
the patch.
In general, integrity is more basic than secrecy, in the sense that there are many situations where one cares about
integrity and not secrecy, but not so many of the reverse. (In fact, as we saw last week, without integrity it's
often possible to violate secrecy as well.)
\item[Encryption and integrity] Does encryption guarantee integrity?
It might seem at first that yes: if an attacker can't read the
message, how can she change it?
However, this is not the case. For example, suppose that we encrypt
the message $x$ with the PRF-based CPA-secure scheme to
$\angles{r,f_s(r) \oplus x}$. The attacker can flip the last bit of
$f_s(r) \oplus x$ causing the receiver to believe the sent message
was $x_1,\ldots,x_{n-1},\overline{x_n}$.
More generally, while encryption is supposed to be the digital analog of a sealed envelope, which provides both
secrecy and integrity, one should not be misled by this metaphor. (Indeed, the closest thing to a digital analog
of a sealed envelope is a CCA-secure encryption, which provides some measure of integrity as well.)
\item[Checksums etc.] A common device used for detecting errors is adding redundancy or checksums. A simple
example is adding to $x$ as a last bit the \emph{parity} of $x$, that is $\sum_i {x_i}
\pmod{2}$.\footnote{Sometimes this is generalized to more bits, say, parity mod $2^{32}$.} When receiving a
message, the receiver checks the parity, and if the check fails, considers the message corrupted (and if
appropriate asks the sender to resend it). This works against \emph{random} errors but not against \emph{malicious}
errors: the attacker can also change the parity check bit. In fact, as we saw above, the attacker can do this even
if the message (including the parity check bit) is encrypted.
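To see this concretely, here is a sketch using the PRF-based scheme above (the names $p$, $c$ and $e_i$ are introduced only for this illustration, and we assume the output of $f_s$ is one bit longer than $x$). Let $p = \sum_i x_i \pmod{2}$ and suppose the sender transmits $\angles{r,c}$ with $c = f_s(r) \oplus (x \circ p)$. Let $e_i$ be the string of length $|x|$ with a single $1$ in position $i$. The attacker replaces $c$ with $c \oplus (e_i \circ 1)$, and the receiver computes
\[
f_s(r) \oplus c \oplus (e_i \circ 1) = (x \oplus e_i) \circ (p \oplus 1) .
\]
Since flipping one bit of $x$ flips its parity, $p \oplus 1$ is exactly the parity of $x \oplus e_i$. The check therefore passes, although the attacker never learned $x$ and the message was changed.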
\item[Message Authentication Codes (MAC)] The cryptographic primitive that we use for this is a \emph{message
authentication code} (MAC). A MAC consists of two algorithms $(\Sign,\Ver)$ (for
signing and verifying). There is a shared key $k$ between the signer and the verifier. The sender of a message
$x$ computes $s = \Sign_k(x)$; $s$ is often called a \emph{signature} or a \emph{tag}. The sender then sends $(x,s)$
to the receiver. The receiver accepts the pair $(x,s)$ as valid \emph{only} if $\Ver_k(x,s)=1$.
\item[Security for MACs] We define a MAC to be secure if it withstands a \emph{chosen message attack}. (Notation: $n$ is the
key length, $m$ the message length, $t$ the tag length.)
\begin{definition}[CMA secure MAC] A pair of algorithms
$(\Sign,\Ver)$ (with $\Sign:\bits^n\times\bits^m\To\bits^t$,
$\Ver:\bits^n\times\bits^m\times\bits^t\To\bits$) is a
$(T,\e)$-CMA-secure MAC if:
\begin{description}
\item[Validity] For every $x,k$, $\Ver_k(x,\Sign_k(x))=1$.
\item[Security] For every $T$-time $\Adv$, consider the following
experiment:
\begin{itemize}
\item Choose $k \getsr \bits^n$
\item Give the adversary access to black boxes for $\Sign_k(\cdot)$ and
$\Ver_k(\cdot,\cdot)$.
\item Adversary \emph{wins} if it comes up with a pair
$\angles{x',s'}$ such that \textbf{(a)} $x'$ is \emph{not} one of
the messages that the adversary gave to the black box
$\Sign_k(\cdot)$ and \textbf{(b)} $\Ver_k(x',s')=1$.
\end{itemize}
Then the probability $\Adv$ wins is at most $\e$.
\end{description}
\end{definition}
Naturally, we define $(\Sign,\Ver)$ to be CMA-secure if for every
$n$ it is $(T(n),\e(n))$-CMA-secure for some super-polynomial $T$ and negligible $\e$. In
other words, there is no polynomial-time $\Adv$ that succeeds in breaking it with
inverse-polynomial probability.
\item[Example] As discussed above, the following are \emph{not}
MACs:
\begin{itemize}
\item A CPA-secure encryption scheme.
\item A cyclic redundancy code (CRC)
\end{itemize}
\item[Construction for a message authentication code.] We prove the
following theorem:
\begin{theorem} Let $\{ f_k \}$ be a PRF. Then the following is a
MAC:
\begin{itemize}
\item $\Sign_k(x) = f_k(x)$.
\item $\Ver_k(x,s) = 1$ iff $f_k(x)=s$.
\end{itemize}
\end{theorem}
\begin{proof} We prove this in the typical way we prove
constructions using PRFs are secure: we define an \emph{ideal} MAC
scheme that uses a \emph{truly random} function, prove it secure,
and then derive security for our real scheme.
\paragraph{Proof of security for ideal scheme.} Let $A$ be an
adversary running a chosen-message attack against the ideal scheme.
At the end of the attack it outputs a string $x'$ that was
\emph{not} asked by it before from the signing oracle and some
supposed tag $t'$. Since this is a random function, we can think of
the oracle at this point choosing the tag $t$ for $x'$ at random and
we have that $\Pr[ t=t' ] = 2^{-n}$.
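\paragraph{From the ideal scheme to the real one.} The remaining step is the usual PRF reduction. The following is only a sketch; the distinguisher $D$, the oracle $O$ and the probability $\delta$ are names introduced for this illustration. Suppose some $T$-time $A$ forges against the real scheme with probability $\delta$. Let $D$ be an algorithm with oracle access to a function $O$ that runs $A$, answers each signing query $x$ with $O(x)$ and each verification query $(x,s)$ by checking whether $O(x)=s$, and, when $A$ outputs $\angles{x',t'}$, outputs $1$ iff $x'$ was never a signing query and $O(x')=t'$. Then
\[
\Pr_k[ D^{f_k} = 1 ] = \delta
\qquad \text{and} \qquad
\Pr_H[ D^{H} = 1 ] = \Pr[ A \text{ forges against the ideal scheme} ],
\]
where $H$ is a truly random function. The second probability is negligible by the argument above, and $D$ runs in time roughly $T$, so if $\delta$ were non-negligible then $D$ would distinguish $f_k$ from a random function, contradicting the assumption that $\{ f_k \}$ is a PRF.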
\end{proof}
Note that this MAC has the property that both signing and verification are \emph{deterministic}, and moreover for
every message $x$ there is a \emph{unique tag} that the verification accepts. We call this the \emph{unique tag}
property. Most, if not all, of the MACs we'll consider have this technical property, and it's sometimes useful.
\item[Using Authentication to get CCA security] As we saw last time, CPA secure encryption is not always strong
enough. For this purpose we defined CCA security as follows:
\begin{definition}[CCA security] An encryption
$(\Enc,\Dec)$ is said to be ($T,\e$)-\emph{CCA secure} if it's valid
($\Dec_k(\Enc_k(x))=x$) and for every $T$-time $A$ if we consider
the following game:
\begin{itemize}
\item Sender and receiver choose shared $k \getsr \bits^n$.
\item $A$ gets access to black boxes for $\Enc_k(\cdot)$ and
$\Dec_k(\cdot)$.
\item $A$ chooses $x_1,x_2$.
\item Sender chooses $i \getsr \{1,2\}$ and gives $A$
$y=\Enc_k(x_i)$.
\item $A$ gets more access to black boxes for $\Enc_k(\cdot)$ and
$\Dec_k(\cdot)$ but is restricted not to ask $y$ to the decryption
box. More formally, $A$ gets access to the following function
$\Dec'_k(\cdot)$ instead of $\Dec_k(\cdot)$:
\[
\Dec'_k(y') = \begin{cases}
\Dec_k(y') & y' \neq y \\
\bot & y' = y
\end{cases}
\]
($\bot$ is a symbol that signifies ``failure'' or ``invalid input'')
\item $A$ outputs $j\in \{1,2\}$.
\end{itemize}
$A$ is successful if $j=i$. The scheme is $(T,\e)$-CCA secure if the
probability that $A$ is successful is at most $\tfrac{1}{2}+\e$.
\end{definition}
\item[Order of Encryption and Authentication]
A natural approach to get CCA security is to add authentication. There are three natural
constructions:
\begin{itemize}
\item Encrypt and then Authenticate (EtA): Compute $y = \Enc_k(x)$ and $t_y =
\Sign_{k'}(y)$ and send $(y,t_y)$. (IPSec-style)
\item Authenticate and then Encrypt (AtE): Compute $t_x=
\Sign_{k'}(x)$ and then send $\Enc_k(x,t_x)$. (SSL style)
\item Encrypt and Authenticate (E\& A): Compute $y=\Enc_k(x)$ and
$t_x=\Sign_{k'}(x)$ and send $(y,t_x)$. (SSH style)
\end{itemize}
(Using only a CRC for authentication is WEP-style.) Note that in all these methods we use independent keys for
encryption and authentication.
It turns out that generically there is only one right choice.
\begin{theorem}~\\
\begin{enumerate}
\item If $(\Enc,\Dec)$ is CPA-secure and $(\Sign,\Ver)$ is CMA-secure with the unique-tags property then the
EtA protocol gives a CCA-secure encryption scheme.
\item There is a CPA-secure encryption such that for every CMA-secure MAC
the AtE protocol is not a CCA secure encryption scheme.
\item There is a CMA-secure MAC (with unique tags) such that for every CPA-secure encryption, the E\&A
protocol is not even a CPA-secure encryption scheme.
\end{enumerate}
\end{theorem}
\noindent\textbf{Note:} This does not by itself mean that, say, SSL
is not secure. But it does mean that it is not \emph{generically
secure}. That is, the SSL protocol relies on specific (and not
explicitly stated) properties of the encryption scheme used.
This theorem and its proof can be found in Hugo Krawczyk's CRYPTO 2001 paper ``The order of encryption and
authentication for protecting communications (Or: how secure is SSL?)'', see \url{http://eprint.iacr.org/2001/045}.
We now sketch the proof:
\item[Item 1: EtA is CCA secure] This is basically the proof we saw last time, where we used a PRF to convert a CPA
secure encryption into a CCA secure encryption. By examining the proof, one can see that all we really used is
the fact that a PRF is a MAC to ensure that the decryption box is useless for the adversary. (We also used the
unique-tags property, but EtA will give a meaningful notion close to CCA security, namely authenticated
encryption, even if the MAC doesn't have the unique-tags property.)
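For concreteness, the EtA scheme can be written out as follows. This is a sketch in the notation above; the names $\Enc^{\mathrm{EtA}}$ and $\Dec^{\mathrm{EtA}}$ are introduced only for this illustration.
\[
\Enc^{\mathrm{EtA}}_{k,k'}(x) = \angles{y,\Sign_{k'}(y)} \text{ where } y = \Enc_k(x),
\qquad
\Dec^{\mathrm{EtA}}_{k,k'}(y,t) = \begin{cases}
\Dec_k(y) & \Ver_{k'}(y,t)=1 \\
\bot & \text{otherwise}
\end{cases}
\]
The point is that the receiver never applies $\Dec_k$ to a ciphertext whose tag fails to verify. A decryption query on a pair that the sender did not produce therefore returns $\bot$, unless the adversary has managed to forge a tag.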
\item[Item 3: E\&A is not generically secure] The idea is that a MAC does not have to preserve secrecy of the
message, so the tag $t_x$, which is sent in the clear, may leak information about $x$.
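One way to make this concrete is the following sketch (the names $\Sign'$, $\Ver'$ and $b_x$ are introduced only for this illustration). Given any CMA-secure MAC $(\Sign,\Ver)$ with unique tags, let $b_x$ denote the first bit of $x$ and define
\[
\Sign'_{k'}(x) = b_x \circ \Sign_{k'}(x),
\qquad
\Ver'_{k'}(x, b \circ s) = 1 \text{ iff } b = b_x \text{ and } \Ver_{k'}(x,s)=1 .
\]
A forgery for $\Sign'$ immediately yields a forgery for $\Sign$, so the new MAC is still CMA-secure, and it still has unique tags. But in the E\&A protocol the tag $\Sign'_{k'}(x)$ is sent in the clear, so an attacker that chooses two challenge messages differing in their first bit can read off which one was encrypted from the first bit of the tag, without making a single query.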
\item[Item 2: AtE is not generically secure] We'll use ``Sushant's cryptosystem''. Take any CPA-secure encryption
$(E,D)$ for one-bit messages. Then you can construct from it a CPA-secure encryption for $m$-bit messages by
letting $E'_k(x_1\cdots x_m) = E_k(x_1)E_k(x_2)\cdots E_k(x_m)$ (exercise). Now we can assume that the input
is encoded so that every string ends with ``$0$'' (so the encoding of an $m$-bit string has $m+1$ blocks, the
last of which encrypts $0$). This means that given an encryption $E'(x)$, we can, by
replacing the $i^{th}$ block with a copy of the $(m+1)^{th}$ block, convert it to an encryption of $x_1\cdots
x_{i-1}0x_{i+1}\cdots x_m$. (We can also assume the string ends with $01$ and so also change the $i^{th}$ bit
to $1$. Moreover, for this proof the attacker can choose to use messages that only end with $0$ or $01$, so we
don't even need this assumption.)
Now suppose we use $E'$ in the AtE setting and so we get an encryption of the form $E'(x,\Sign(x))$. If we have
access to a decryption box, we change the $i^{th}$ bit of $x$ to $0$, and see if the MAC still passes
verification. If it does, then we know that the original bit was $0$, otherwise we know that it was $1$. This
allows us to launch a successful CCA attack.
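Spelled out, the attack on a challenge ciphertext can be sketched as follows (the names $c_j$, $c_{\mathrm{zero}}$ and $y'$ are introduced only for this illustration). Write the challenge $y$ as a sequence of one-bit ciphertexts, let $c_j$ be the block that encrypts the $j^{th}$ bit of $x$, and let $c_{\mathrm{zero}}$ be the final block, which is known to encrypt $0$. The attacker forms $y'$ from $y$ by the single substitution
\[
c_i \;\mapsto\; c_{\mathrm{zero}} ,
\]
leaving all other blocks (including those that encrypt the tag $\Sign(x)$) untouched, and submits $y'$ to the decryption box. Since a CPA-secure $E$ cannot be deterministic, we expect $y' \neq y$ except with small probability, so the query is allowed. The box decrypts $y'$ to the pair consisting of $x$ with its $i^{th}$ bit set to $0$ and the \emph{old} tag $\Sign(x)$. If $x_i = 0$ the message is unchanged and the tag verifies, while if $x_i = 1$ the message has changed and, by the security of the MAC, the old tag is rejected except with negligible probability. So the box answers $\bot$ essentially iff $x_i = 1$, and an attacker whose two challenge messages differ in the $i^{th}$ bit learns which one was encrypted.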
(In fact, there was a successful attack against SSL of a similar nature, using knowledge of whether or not the
MAC failed.)
\iffalse
\item[Construction of a CCA secure scheme.] We'll now show how to construct
a CCA-secure encryption scheme. That is, we prove the following
theorem:
\begin{theorem} Assuming Axiom 1, there exists a CCA secure
(private key) encryption scheme.
\end{theorem}
\begin{proof} The proof is actually to use the EtA construction,
assuming some extra condition on the MAC (which is satisfied by the
PRF-based construction). We say that a MAC has \emph{unique
signatures} if for every $x$ there's at most one tag $t$ such that
$\Ver_k(x,t)=1$. This is equivalent to saying that $\Ver_k(x,t)$
outputs $1$ if and only if $t=\Sign_k(x)$ (note that this is how
$\Ver$ worked in the PRF-based construction). Let $(\Sign,\Ver)$ be
such a MAC and let $(\Enc',\Dec')$ be a CPA-secure scheme. Our
CCA-secure scheme $(\Enc,\Dec)$ will be the following:
\begin{itemize}
\item Key: $\angles{k,k'}$ with $k,k' \getsr \bits^n$.
\item Encryption: To encrypt $x$ compute $y = \Enc'_k(x)$, $t =
\Sign_{k'}(y)$ and send $\angles{y,t}$.
\item Decryption: To decrypt $\angles{y,t}$ first verify that
$\Ver_{k'}(y,t)=1$, otherwise abort (i.e., output $\bot$). If check
passes, compute $\Dec'_k(y)$.
\end{itemize}
\noindent\textbf{Security:} Suppose that $A$ is a $T$-time algorithm
attacking the encryption scheme $(\Enc,\Dec)$. We'll convert $A$ to
an algorithm $A'$ that breaks the CPA-secure scheme $(\Enc',\Dec')$.
First, we need to remember what does it mean to have a CPA attack
against $(\Enc',\Dec')$. The algorithm $A'$ gets black-box access to
$\Enc'_{k}(\cdot)$ but \emph{not} to $\Dec'_k(\cdot)$. The algorithm
$A'$ will do the following:
\begin{itemize}
\item Choose $k' \getsr \bits^n$.
\item Run $A$ in ``its belly''
\item Whenever $A$ asks for an encryption of $x$, pass the request
to the encryption box $\Enc'_k$ to obtain $y=\Enc'_k(x)$, compute
$t=\Sign_{k'}(y)$ and give $\angles{y,t}$ to $A$. Also record this
query in a table.
\item If $A$ asks for a decryption of $\angles{y,t}$ which was
previously returned to it as an encryption of $x$ then return $x$ to
$A$.
\item If $A$ asks for a decryption of $\angles{y,t}$ which was
\emph{not} previously returned to it from the encryption oracle, then
check if $\Ver_{k'}(y,t)=1$. If check fails then return $\bot$ to
$A$. If check succeeds then abort the computation. In this case we
say that $A'$ failed to simulate $A$.
\item When $A$ sends the challenge $x_1,x_2$ pass it on to the
sender to obtain $y = \Enc'_k(x_i)$ and give $\angles{y,t}$ to $A$,
where $t = \Sign_{k'}(y)$.
\item When $A$ outputs a guess $j$, output the same guess $j$.
\end{itemize}
We see that the only case that $A'$ fails to simulate the CCA attack
of $A$ is when $A$ manages to produce a pair $\angles{y,t}$ such
that
\begin{enumerate}
\item $\angles{y,t}$ was \emph{not} obtained as a previous response
to a query $x$ of the encryption oracle.
\item $\angles{y,t}$ is \emph{not} the encryption of the challenge.
\item $\Ver_{k'}(y,t)=1$
\end{enumerate}
However, if $A$ does that then he breaks the MAC. Indeed, because of
the unique signatures property of the MAC, Properties~1 and~2 imply
that $y$ was not previously signed by the MAC, and hence it should
not be possible for $A$ to find a $t$ such that $\Ver_{k'}(y,t)=1$.
\end{proof}
Note that the unique signatures property is indeed crucial. Suppose
that the MAC had the property that it had an ``extra unused bit''.
That is, the tag is of the form $t\circ b$ where $b$ is a single
bit, but verification only looks at the tag $t$. Thus,
$\Ver_{k'}(y,t0)=1$ if and only if $\Ver_{k'}(y,t1)=1$.
In this case the encryption scheme will be clearly \emph{not} CCA
secure. (If the adversary gets the challenge $\angles{y,t0}$ it will
give $\angles{y,t1}$ to the decryption oracle.) This sensitivity of
CCA security to extra unused bits is one reason why some people feel
CCA security is a bit too strong, but the research community has yet
to find a clean definition that is still sufficient for all the
applications of CCA security.
\fi
\item[Input length extension] We showed how to construct a PRF from every pseudorandom generator, but practical
constructions of PRFs come from block ciphers or similar functions, that have a fixed and small block size, say
$128$ bits. On the other hand, the messages we want to sign --- say programs --- are often very large
(megabytes or even gigabytes). So, given a pseudorandom function $f_k:\bits^n\To\bits^n$, (e.g. with $n=128$)
we'd want to transform it into a PRF $g_k:\bits^*\To\bits^n$ that can take as inputs strings of arbitrary
length. Some desired properties for such a transformation are:
\begin{enumerate}
\item Security: obviously we want $\{ g_k \}$ to be a PRF if $\{ f_k \}$ was. In fact for practical
applications we'd want as tight a reduction as possible relating the security of $\{g_k \}$ to the
security of $\{ f_k \}$.
\item Efficiency: ideally the transformation should be very efficient. One goal is to minimize the number
of invocations of $f_k$. For starters, we might want to ensure that we use roughly $|x|/n$ invocations
to evaluate $g_k$ on $x$. (In fact, you might even be able to get away with \emph{one} invocation, as
we'll see next week.)
\item Secret key length: you want the secret key of $g$ to be not much longer than the secret key of $f$.
\item Streaming: for very long messages, you often get them one block at a time, and you might not even
know the length of the message until you are done. So you should be able to compute $g_k(x)$ even if
you get only streaming access to $x$.
\item Parallelism: sometimes you might want to use hardware parallelism to compute the MAC on a large
message, so you want to be able to ensure that if you have $\ell$ CPUs working on the MAC on $x$, you
can actually compute it $\ell$ times faster.
\item Incremental: sometimes you might want the property that if you already computed a MAC $t$ of a long
message $x$, and then you modify only one block of $x$ to get $x'$, you don't have to do a long
computation to compute the MAC on $x'$.
\end{enumerate}
(There could be other requirements depending on the application.)
\item[Achieving input length extension] There is a general approach of converting a PRF $f_k:\bits^n\To\bits^n$
into a PRF mapping $\bits^*$ to $\bits^n$ in two stages:
\begin{enumerate}
\item Blockwise function: first transform $\{f_k\}$ into a PRF that works only on inputs whose length is an
integer multiple of $n$.
\item General function: use padding to get rid of the requirement that the input length is an integer
multiple of $n$.
\end{enumerate}
The Boneh-Shoup book describes several different approaches used to achieve these goals. We'll focus on one
elegant solution that combines two steps: PMAC to solve the first (and main) step, and CMAC to solve the
second step.
We remark that often there is an intermediate step, in which one constructs a function that is blockwise and
also \emph{prefix free}. That is, security is only guaranteed if the adversary never makes two queries such
that one is a prefix of the other. The CMAC trick can be used to get rid of that condition as well.
\item[PMAC] Here's the PMAC construction (we'll actually work with a simplified variant very close to PMAC$_0$
described in Boneh-Shoup, but PMAC is basically just an optimized variant of PMAC$_0$). For simplicity we
think that $f_k$ maps $\Z_p = \{0,\ldots,p-1\}$ to $\Z_p$, where $p$ is some prime. (We can think of $p$ as
being very close to $2^n$, say $2^n-n$, and so we can embed $\Z_p$ in $\bits^n$ without much loss in
efficiency.\footnote{One subtle point is how do we ensure the \emph{output} of the PRF is in $\Z_p$, but
since in a random function the probability that a random $n$ bit string, interpreted as a number, is larger
than $2^n-n$ is negligible, we can assume this almost never happens for the PRF as well and treat this case
arbitrarily.}) For $x = x_1,\ldots,x_{\ell}$ where $x_i \in \Z_p$, we define
\[
g_{k,k',r}(x) = f_{k'}\Bigl(\sum_{i=1}^{\ell}f_k(x_i+ir)\Bigr)
\]
where $k,k'$ are random keys for the PRF $f_k$ and $r$ is random in $\Z_p$.
We remark that all we'll use about $\Z_p$ is that it's a field that has multiplication and addition, and
everything works the same in the finite field $GF(2^n)$ of $2^n$ elements. The latter field is more
convenient for current computer architecture, and in fact in that field addition corresponds to XOR. The
main difference between PMAC and PMAC$_0$ is that PMAC uses $GF(2^n)$.
\begin{theorem} If $\{f_k\}$ is a PRF then $\{g_{k,k',r}\}$ is a PRF.
\end{theorem}
The heart of the proof is obtained by showing that the adversary has only negligible probability to succeed
in finding two distinct inputs $x=x_1\cdots x_{\ell}$ and $x'=x'_1\cdots x'_{\ell'}$ such that
\[
\sum_{i=1}^{\ell}f_k(x_i+ir) = \sum_{i=1}^{\ell'}f_k(x'_i+ir) .
\]
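To see what the offsets $ir$ buy us, consider the variant that drops them (this variant, denoted $\tilde{g}$ here, is introduced only for illustration). Since addition is commutative, for any two blocks $x_1 \neq x_2$ we get
\[
\tilde{g}_{k,k'}(x_1 x_2) = f_{k'}\bigl(f_k(x_1)+f_k(x_2)\bigr) = f_{k'}\bigl(f_k(x_2)+f_k(x_1)\bigr) = \tilde{g}_{k,k'}(x_2 x_1),
\]
so an adversary who sees the tag of $x_1 x_2$ can immediately output the same tag as a forgery for $x_2 x_1$. With the offsets, the block $x_1$ is fed to $f_k$ as $x_1 + r$ when it sits in the first position and as $x_1 + 2r$ when it sits in the second, and since $r$ is secret the adversary cannot arrange for the two inner sums to agree. The same formula also suggests why the construction parallelizes well: the $\ell$ evaluations of $f_k$ are independent of one another, and only the final sum and the single call to $f_{k'}$ need to wait for all of them.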
\item[CMAC] To get a bitwise MAC we need some padding to bring the message up to an integer multiple of
$n$ bits. The simplest padding is just to pad a message whose length is not an integer multiple of $n$ with
zeroes. Although this is sometimes used, it is insecure (can you see why?). A secure padding is to add
to every message a bit $1$ at the end, and then pad it with zeroes. But this means that if you have a
message that is exactly one block long, you'll need two blocks to encode it, a 100\% overhead. CMAC
is a clever trick to get rid of this problem by using the following \emph{randomized} padding scheme (this
is again a simplified variant of CMAC):
If the length of $x$ is an integer multiple of $n$ then do nothing. Otherwise, append a $1$ to $x$, pad it with
zeroes, and xor a random secret $r\in \bits^n$ into the last block. One can show that the probability that an
adversary finds two messages whose padding is the same is negligible.
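As a small illustration of the deterministic ``append $1$, then zeroes'' padding, take a toy block length $n=8$ (chosen here only for readability):
\[
101 \;\mapsto\; 10110000,
\qquad
10110000 \;\mapsto\; 10110000\;10000000 .
\]
The original message can always be recovered by stripping the trailing zeroes and the final $1$, so two distinct messages never pad to the same string, but the message that already fills a block costs an extra block. In the randomized variant above, the full-block message $10110000$ is left as is, while $101$ becomes $10110000 \oplus r$.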
\end{description}
}
\end{document}