Introduction
In this lab, we will learn how to perform control hijacking attacks by exploiting buffer overflow bugs. Control hijacking refers to an attack that diverts the control flow of a program away from normal execution in order to run arbitrary attack code (typically, dropping into a root shell). Each of the provided programs contains a security vulnerability that you must exploit to hijack the program's control flow and satisfy the requirements of each part.
A Statement on Ethics and Responsibility
In this lab, you will learn about and perform control flow hijacking attacks on controlled sample code. Performing the same attacks outside of the controlled environment that we provide is against the law and Rose-Hulman policies, and may result in fines, expulsion, and even jail time. You SHALL NOT attack anyone else’s machine unless you have prior authorization and are working in a controlled environment.
Learning Objectives
At the end of this lab, you should be able to:
- Define control flow hijacking attacks.
- Implement different types of exploits to cause unintended consequences when running vulnerable code.
- Identify defenses against control flow hijacking attacks, and then subvert them.
- Write better code and avoid buffer overflow bugs.
Getting the Source Code
We will do this lab in the main branch of your labs repository. To check which branch you are currently on, use:
$ git branch
The branch you are currently on will be highlighted for you (with a * next to its name).
If you are working on the main or master branch, then follow these instructions:
$ git fetch upstream
$ git pull upstream main
At this stage, you should have the latest copy of the code, and you are good to get started. The starter code is contained under the stack_smashing/ directory.
If you are currently on a different branch (say you are still on clab_solution from a previous lab), then you need to switch to main or master (depending on your default branch’s name).
First, add, commit, and push your changes to the clab_solution branch to make sure you do not lose any progress you made on the last lab. To check the status of your current branch, you can use:
$ git status
This will show you all the files you have modified and have not yet committed and pushed. Make sure you add those files, then commit your changes, and push them.
If git push complains about not knowing where to push, tell it to push the branch you are currently on. For example, if I am working on clab_solution, then I’d run git push origin clab_solution.
Now, you are ready to switch back to main (or master).
$ git checkout main
Then, grab the latest changes using:
$ git fetch upstream
$ git pull upstream main
At this stage, you should have the latest copy of the code, and you are good to get started. The starter code is contained under the stack_smashing/ directory.
Installing Needed Software
To make things a bit simpler, we will run 32-bit versions of the code that we provide you with. This requires some extra software to cross-compile 32-bit applications on a 64-bit machine. To install it, run
sudo apt install -y gdb gdb-multiarch gcc-multilib python2
sudo apt install -y gcc-riscv64-linux-gnu
(The reason we run both steps is because xv6 needs gcc-riscv. If you run these instructions in this order, both this lab and xv6 will work.)
Also, install gef on top of gdb; it will make your life a lot easier.
bash -c "$(curl -fsSL https://gef.blah.cat/sh)"
A Note on Python
Our scripts in this lab run on python2 (some day we will upgrade those to python3, but not today). The commands above install python2 on your machine.
You can use your favorite scripting language to write these solutions. For me, python2 was the most comfortable since it makes it easy to print weird characters (we’re going to need that a lot in this lab).
A Note About WSL2
It seems that WSL1 on Windows is not able to run 32-bit applications. To be able to do this lab, you must be running WSL2. To upgrade from WSL1 to WSL2, follow the instructions here and here.
If running the wsl command from Powershell does not work, then you are running an older version of Windows, and you need to manually upgrade WSL by following the instructions here.
Note that you can still run Linux virtual machines on WSL1 if you require them. To do so, use the command wsl --set-version <vm-name> 1 from Powershell, where <vm-name> is the name of the distribution that you would like to run on WSL1.
Prelude
PLEASE READ THE NOTE BELOW: your code will not work unless you complete the two steps below.
Generating Your Cookie
Before you start this assignment, you must run the cookie generation script to generate your own unique ID. We use this ID in our code to make sure that your solution will only work for you, and for none of your classmates.
To generate your cookie, run the setcookie.py script and pass it your Rose email ID (without the @rose-hulman.edu). For example, running the script for myself would be:
$ ./setcookie.py noureddi
This will generate a file named cookie in your lab directory. That file must be present at all times while you are solving this lab.
Please make sure that you use your username as input to the script. That is what we will be using when grading the lab. If you use the wrong value, all your solutions will break and you will receive no credit for any of the parts.
Disabling Kernel Protection
Every time you want to work on this lab, you must turn off the Linux kernel’s virtual memory protection scheme, called Address Space Layout Randomization (ASLR). To do so, run the following script:
$ ./disable_aslr.sh
Once you are done with this lab, please reenable ASLR using the script:
$ ./enable_aslr.sh
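If you are ever unsure which state ASLR is in, a small check like the one below may help. It is only a sketch: it assumes a Linux kernel that exposes the setting through /proc/sys/kernel/randomize_va_space, and it assumes (without having inspected them) that the two provided scripts change that same setting. The helper names are made up for this example, and the code is written to run under both python2 and python3.

```python
# Sketch: report the kernel's current ASLR setting (Linux only).
import os

# Kernel-provided file holding the ASLR mode as a single digit.
ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

MEANINGS = {
    "0": "disabled",
    "1": "partial randomization (stack, mmap base, vDSO)",
    "2": "full randomization (also the heap)",
}


def describe_aslr(value):
    """Translate the raw setting into a human-readable description."""
    return MEANINGS.get(value.strip(), "unknown value %r" % value)


def read_aslr(path=ASLR_PATH):
    """Return the raw setting, or None if the file is not available."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    raw = read_aslr()
    if raw is None:
        print("Could not find %s (is this Linux?)" % ASLR_PATH)
    else:
        print("ASLR is %s" % describe_aslr(raw))
```

For this lab you would expect to see "disabled" while working, and one of the other two values after re-enabling ASLR.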
Some gef commands
Here are a bunch of useful gef and gdb commands:
- next or n: execute the current line and move to the next line. Note that this jumps over function calls.
- step or s: step into a function. Use this if you want to move execution into a function to debug further. I do not recommend using step for libc functions for this lab.
- nexti or ni: execute the current instruction and move to the next one.
- stepi or si: step into the first instruction of a function call.
- context: redisplay the current gef context.
- context stack: dump the stack on the screen.
- gef config context.grow_stack_down True: make the stack grow downward instead of upward. I recommend using this setting as it is easier to visualize.
- For a list of useful gef commands, see the documentation.
Part 1 🌶️
Note that we did this part together in class, so you can simply just copy the solution we did together. This is to reward you for attending class.
In this part, we will start off easy by overwriting a variable on the stack.
Take a look at the source code in part1.c, compile it using make, and run the program. The program will wait for you to enter your name and then will print the following:
$ ./part1
mohammad
Hi mohammad you are a wonderful person!.
Your job is to pass an input to this program such that it will change the output of the program. We will create our input using a python script, which we will then redirect to the standard input of this program. When you successfully complete this part, you should see something that looks like the following:
$ python2 part1.py | ./part1
Hi mohammad you are pwnd!.
Note that mohammad should be replaced with your own name!
Here are some suggested steps that you might take:
- Examine part1.c, where is the buffer overflow?
- Start part1 in gdb and disassemble the _main function. Identify the function calls and their arguments and local variables.
- Draw a picture of the stack, where is the variable that you will overflow and where is your target? How are they stored relative to each other?
- How can your input affect the values of other variables in the code? Test your thought process by trying different inputs to the program.
Submission
Submit a python program called part1.py that prints the line that you must pass to your program to overwrite the intended variable. To test your program, you can write
$ python2 part1.py | ./part1
Hint: To write a string that contains non-printable ASCII characters in python, you can supply your code with the hexadecimal value of the byte you intend to print. For example, to print byte 0xf6, you can use "\xf6" as part of your python string.
Hint: To repeat a character “X” n times in python, you can use "X"*n.
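The two hints above can be combined as in the sketch below. The filler length (8) and the trailing byte (0xf6) are purely illustrative and are not the values this part needs; build_input and emit are hypothetical helper names. The b"..." prefix and the emit helper are there so the same file behaves identically under python2 and python3: in python3, print("\xf6") would emit the two-byte UTF-8 encoding of that character rather than the single byte 0xf6.

```python
# Sketch: build an input from repeated filler plus raw bytes.
import sys


def build_input(filler_len, tail):
    """Return filler_len copies of 'A' followed by the raw bytes in tail."""
    return b"A" * filler_len + tail


def emit(data):
    """Write raw bytes plus a newline to stdout (python2 and python3)."""
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(data + b"\n")


if __name__ == "__main__":
    # ILLUSTRATIVE values only: 8 filler bytes, then the single byte 0xf6.
    demo = build_input(8, b"\xf6")
    # Show a readable form; call emit(demo) instead to produce raw bytes.
    print(repr(demo))
```

In an actual solution script you would call emit(...) on your input so that the raw bytes, not their printable representation, reach the program's standard input.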
Part 2 🌶️
Note that we did this part together in class, so you can simply just copy the solution we did together. This is to reward you for attending class.
In this part, instead of overwriting a variable, we will overwrite the return address of a function so that it goes and executes a piece of code that we select (which the program originally was not intended to execute). Take a look at part2.c and try it out:
$ ./part2
mohammad
Have a nice day =)
Your job is to overwrite the return address of a function that you must identify, in order to cause the program to execute the function print_bad_outcome instead of executing print_good_outcome. When you successfully complete this part, you should see something like the following:
$ python2 part2.py | ./part2
You are pwnd!
Here’s a suggested way to approach this problem
- Examine part2.c, where is the buffer overflow?
- Start part2 in gdb and find the starting address of print_bad_outcome. To do so, you can use (gdb) info address print_bad_outcome. Record that address in your notes.
- Set a breakpoint at the vulnerable function. To set a breakpoint at a specific machine address, you can use (gdb) b *(0xDEADBEEF) where 0xDEADBEEF is the address you want to break at.
  - Run the program using run to reach the breakpoint.
- Disassemble the vulnerable function and draw its stack. Where is the vulnerable buffer stored? gef will prove to be very useful here.
- What should the value of the return address of the vulnerable function be (i.e., where should this function return to)?
- Examine the return address of the vulnerable function.
- What should your input be to overwrite the return address?
Submission
Submit a python program called part2.py that prints the line that you must pass to your program to overwrite the return address. To test your program, use
$ python2 part2.py | ./part2
Hint: When developing your solution, you might find it useful to examine the hex content of your input string. To do so, use:
python2 part2.py | hd
Hint: As mentioned in class, you might find the struct package in python very useful.
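As a reminder of what struct does here: x86 is little-endian, so a 32-bit address is stored in memory with its least significant byte first. The sketch below packs an address that way. The address used is hypothetical (it is not the address of print_bad_outcome in your binary), and pack_addr is a name made up for this example.

```python
# Sketch: turn a 32-bit address into the 4 bytes x86 expects in memory.
import binascii
import struct


def pack_addr(addr):
    """Pack addr as an unsigned 32-bit little-endian value ("<I")."""
    return struct.pack("<I", addr)


if __name__ == "__main__":
    # HYPOTHETICAL address; use the one you recorded from gdb instead.
    target = 0x08049d85
    # hexlify shows the byte order that would land in memory.
    print(binascii.hexlify(pack_addr(target)))
```

Comparing this output with what python2 part2.py | hd shows at the position of the return address is a quick way to spot byte-order mistakes.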
Part 3 🌶️
In this part, we will start the actual fun of redirecting the user’s program to execute a system call that opens a shell for us to use as we please. This program takes its input as a command line argument, rather than through the standard input.
To help you out, we have provided you with shellcode.py, which contains machine instructions that, when executed, will open a shell. Therefore, placing this piece of code in memory and redirecting the execution to the start of these instructions will cause the program to open a shell.
Your job is to exploit the part3 binary to open a root shell when executed. Here is an example output:
$ sudo ./part3 "$(python2 part3.py)"
# whoami
root
#
Note that the quotes "" around the python2 command are necessary to handle the case where what you are writing contains a space character.
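Since the input now travels through the shell as a command line argument, a few byte values deserve a second look before you start debugging. The sketch below flags them; find_problem_bytes is a hypothetical helper, not part of the lab's starter code. Spaces, tabs, and newlines only cause word splitting when the quotes are omitted (and trailing newlines are stripped by $(...) either way), whereas a NUL byte can never be part of a command line argument because arguments are NUL-terminated C strings.

```python
# Sketch: flag bytes that tend to cause trouble in a command line argument.

# NUL, tab, newline, and space.
SUSPICIOUS = {0x00, 0x09, 0x0a, 0x20}


def find_problem_bytes(payload):
    """Return (offset, byte value) pairs for every suspicious byte."""
    hits = []
    # bytearray yields integers under both python2 and python3.
    for offset, value in enumerate(bytearray(payload)):
        if value in SUSPICIOUS:
            hits.append((offset, value))
    return hits


if __name__ == "__main__":
    # ILLUSTRATIVE payload containing a space and a NUL byte.
    for offset, value in find_problem_bytes(b"AB CD\x00"):
        print("offset %d: byte 0x%02x" % (offset, value))
```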
Here is a suggested approach:
- Examine part3.c, where is the buffer overflow?
- Write a python program, called part3.py, and make it print the shellcode as follows:
    from shellcode import shellcode
    print shellcode
- Start up gdb with the program arguments as follows:
    $ gdb --args ./part3 $(python2 part3.py)
- Set a breakpoint at the vulnerable function and start the program.
- Disassemble the vulnerable function, where is the starting address of the buffer?
- Identify the instruction that follows the call to strcpy and set a breakpoint there, then continue execution:
    (gdb) b *(<address of instruction>)
    (gdb) c
- Examine the content of the buffer, does it now contain the shell code? To do so, you can use the following gdb instruction: (gdb) x/32bx 0x<address>.
- Disassemble the shellcode using (gdb) disassemble/r 0x<address>,+32. What is it doing? Note the call to int and check what it is doing.
- Modify part3.py so that the input overflows the buffer and causes the program to run the shellcode.
Submission
Submit a python program called part3.py that prints the line that you must pass as an argument to your program to create the exploit. To test your code, use
$ sudo ./part3 "$(python2 part3.py)"
#
If you are successful, you should see a command prompt starting with the character #. Typing whoami should print root as shown in the example above.
Part 4 🌶️🌶️
In this part, the developer has realized the error of their ways, and decided to use a safer function called strncpy (check the manpage for strncpy if you are not sure what the difference between it and strcpy is). Therefore, the exploit technique we used in the previous part no longer works.
Lucky for us, the developer miscalculated the size of the buffer. Hopefully this will help you figure out another way to redirect the program’s control flow to execute our shell code.
Submission
Create a python program, called part4.py, that prints the line you must pass as an argument to your part4 to cause the creation of a root shell. To test your code, use the following:
$ sudo ./part4 "$(python2 part4.py)"
#
Part 5 🌶️🌶️
In this part, the developers are really on our tail: they have enabled data execution prevention (DEP) in the compiler so that no code on the stack can be executed.
This part resembles part 3, except that now the program will not execute any code from the stack, which renders our previous solution to part 3 obsolete. You can still overflow a buffer and overwrite the return address, but you can’t execute any code on the stack or heap (so maybe there’s another place to execute from?).
Your task is to find a way to hijack the control flow of the program in a way that causes it to start a shell. Check out the manpage for the system() function call, and try to pass /bin/sh to system and see what happens.
Submission
Create a python program, called part5.py, that prints the line you must pass as an argument to your part5 to cause the creation of a root shell. To test your code and avoid issues with special characters, use the following:
$ sudo ./part5 "$(python2 part5.py)"
#
Note that you will lose some points if your exploit generates error messages. We strive to have clean exploits that are not easily detected.
Hint
Some binaries contain a lot of constant strings that you did not even create. To find such strings, you can use the strings Unix utility (surprise!) as follows:
$ strings part5
For example, if you want to find the string /bin/sh in your binary, you can use:
$ strings part5 | grep /bin/sh
Furthermore, you can ask strings to print the offset (in hexadecimal) of the starting byte of this string as follows:
$ strings -t x part5 | grep /bin/sh
6d871 /bin/sh
This means that the start of my /bin/sh string is 0x6d871 bytes away from where the binary is loaded in my process’s memory space. (Note that the offset in your case might be different from mine.)
To find the actual address of /bin/sh in memory, we will have to use gdb. First load the program using gdb --args part5 abc.
Then, start the program using:
gef> start
After that, check the starting address of our memory address space:
gef> info proc mappings
Start Addr End Addr Size Offset objfile
0x8048000 0x8049000 0x1000 0x0 ...
0x8049000 0x80b5000 0x6c000 0x1000 ...
0x80b5000 0x80e4000 0x2f000 0x6d000 ...
0x80e5000 0x80e7000 0x2000 0x9c000 ...
0x80e7000 0x80e9000 0x2000 0x9e000 ...
0x80e9000 0x810c000 0x23000 0x0 [heap]
0xf7ff9000 0xf7ffc000 0x3000 0x0 [vvar]
0xf7ffc000 0xf7ffe000 0x2000 0x0 [vdso]
0xfffdd000 0xffffe000 0x21000 0x0 [stack]
You can see that the starting address in my case is 0x8048000. Next, we can add the offset we obtained above and find the address of /bin/sh as follows:
gef> x/x 0x8048000 + 0x6d871
0x80b5871: 0x6e69622f
To see the string in memory, you can change the print format as follows:
gef> x/s 0x8048000 + 0x6d871
0x80b5871: "/bin/sh"
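The same arithmetic can be done in your script, which is handy when the address needs to end up inside your input. The sketch below reuses the base (0x8048000) and offset (0x6d871) from my example output above; yours may differ. It relies on the first mapping having file offset 0x0, as it does in the listing above, and runtime_address is a name made up for this example.

```python
# Sketch: compute the in-memory address of a string from its file offset.
import binascii
import struct


def runtime_address(base, offset):
    """Add the offset reported by strings -t x to the load address."""
    return base + offset


if __name__ == "__main__":
    # Values from the example output in this hint; yours may differ.
    base = 0x8048000
    offset = 0x6d871
    addr = runtime_address(base, offset)
    print(hex(addr))  # prints 0x80b5871, matching the gdb output above
    # Little-endian bytes of the address, as they would appear in memory.
    print(binascii.hexlify(struct.pack("<I", addr)))
```

It is still worth confirming the result with x/s in gdb, as shown above, before relying on it.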
Part 6
Ignore Part 6. Skip to Part 8.
Part 7
Ignore Part 7. Skip to Part 8.
Part 8 🌶️🌶️🌶️
At the start of this lab, we asked you to turn off ASLR to make your exploits feasible. If enabled, ASLR will make exploiting a buffer overflow really hard since the position of the stack is changed on each execution. To give you an idea of what that looks like, we will simulate ASLR, but not really turn it on.
This part is very similar to part 3. However, the stack position is randomly changed every time you run this code. Try it out in gdb and check out the start of the vulnerable_fn’s stack; it should change on every run. The caveat is that, unlike ASLR, we are only modifying the stack position by a bounded random offset.
Your job is to write an exploit that can open a root shell every time you run this program, despite the presence of our little randomization.
Submission
Create a python program, called part8.py, that prints the line you must pass as an argument to your part8 to cause the creation of a root shell. To test your code, use the following:
$ sudo ./part8 "$(python2 part8.py)"
#
Some Hints
In x86, the NOP instruction (opcode 0x90) can be used to simply advance the program counter without really doing anything else. You might find the NOP instructions very useful in this part. Maybe spraying some NOPs here and there would be a useful idea?
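To make the idea concrete: if a run of NOPs sits directly in front of some code, then execution that lands anywhere inside that run slides forward into the code. The sketch below only shows how to build such a run in python. The total length and the 4-byte placeholder (0xcc bytes standing in for real shellcode) are made up for illustration, and nop_sled is a hypothetical helper name; choosing the real sizes and the address to aim at is the actual exercise.

```python
# Sketch: pad a payload with leading NOPs up to a fixed total length.
import binascii

# Single-byte x86 NOP instruction.
NOP = b"\x90"


def nop_sled(length, payload):
    """Return NOPs followed by payload, exactly `length` bytes in total."""
    if len(payload) > length:
        raise ValueError("payload is longer than the requested length")
    return NOP * (length - len(payload)) + payload


if __name__ == "__main__":
    # HYPOTHETICAL stand-in for the real shellcode from shellcode.py.
    placeholder = b"\xcc" * 4
    # ILLUSTRATIVE total length of 16 bytes.
    print(binascii.hexlify(nop_sled(16, placeholder)))
```

The longer the run of NOPs, the more tolerant the exploit is to the bounded random shift in the stack position.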
Submission
Submit all of your python (or any other scripting language) scripts to Gradescope. Do not submit your cookie file or any of the binaries.
Rubric
Part | Points |
---|---|
Part 1 | 10 |
Part 2 | 10 |
Part 3 | 15 |
Part 4 | 20 |
Part 5 | 20 |
Part 6 or Part 7 or Part 8 | 25 |