Deep Dive Into The Threads In Windows
Windows - Thread Internals
In this article, we will look at what threads really are and at some interesting techniques that use them for code injection. Thread internals may seem like a very complicated topic, but we will try to summarize it here.
Table Of Contents
1. What Is Thread?
1.1 Components Of Threads
2. Thread Environment Block (TEB)
3. Thread Local Storage (TLS)
4. Internal Structures Of Threads
5. How CreateThread Works?
5.1 Example Usage Of CreateThread
5.2 Internals Of CreateThread
6. Applications
6.1 Thread Hijacking
6.2 Thread Stack Spoofing
7. References
What Is Thread?
A thread is an entity within a process that Windows schedules for execution. When a process starts, Windows creates its main thread. Threads make it possible to do more than one job at a time within the same process.
Components Of Threads
Threads consist of several components:
- CPU registers representing the state of the processor
- Two stacks (user-mode stack & kernel-mode stack)
- A private storage area called Thread Local Storage (TLS)
- A unique identifier called Thread ID (we will see this as TID)
In addition, threads may have their own security context, or token. We will mention this in the following sections.
Thread Environment Block (TEB)
The Thread Environment Block (TEB) is a structure that stores per-thread context information used by user-mode components such as the image loader and various Windows DLLs (a thread that never runs in user mode, such as a system thread, has no TEB). Because these components run in user mode, they need a data structure writable from user mode. That’s why this structure exists in the process address space instead of in the system space, where it would be writable only from kernel mode. The general purpose of this structure is to make thread-related information available to the thread itself, along with a pointer to the Process Environment Block (PEB), which holds process-wide information and the list of loaded DLLs. Its first member, the NT_TIB structure, is the Thread Information Block (TIB), and the two names are often used interchangeably.
typedef struct _TEB {
NT_TIB Tib;
PVOID EnvironmentPointer;
CLIENT_ID Cid;
PVOID ActiveRpcInfo;
PVOID ThreadLocalStoragePointer;
PPEB Peb;
ULONG LastErrorValue;
ULONG CountOfOwnedCriticalSections;
PVOID CsrClientThread;
PVOID Win32ThreadInfo;
ULONG Win32ClientInfo[0x1F];
PVOID WOW32Reserved;
ULONG CurrentLocale;
ULONG FpSoftwareStatusRegister;
PVOID SystemReserved1[0x36];
PVOID Spare1;
ULONG ExceptionCode;
ULONG SpareBytes1[0x28];
PVOID SystemReserved2[0xA];
ULONG GdiRgn;
ULONG GdiPen;
ULONG GdiBrush;
CLIENT_ID RealClientId;
PVOID GdiCachedProcessHandle;
ULONG GdiClientPID;
ULONG GdiClientTID;
PVOID GdiThreadLocaleInfo;
PVOID UserReserved[5];
PVOID GlDispatchTable[0x118];
ULONG GlReserved1[0x1A];
PVOID GlReserved2;
PVOID GlSectionInfo;
PVOID GlSection;
PVOID GlTable;
PVOID GlCurrentRC;
PVOID GlContext;
NTSTATUS LastStatusValue;
UNICODE_STRING StaticUnicodeString;
WCHAR StaticUnicodeBuffer[0x105];
PVOID DeallocationStack;
PVOID TlsSlots[0x40];
LIST_ENTRY TlsLinks;
PVOID Vdm;
PVOID ReservedForNtRpc;
PVOID DbgSsReserved[0x2];
ULONG HardErrorDisabled;
PVOID Instrumentation[0x10];
PVOID WinSockData;
ULONG GdiBatchCount;
ULONG Spare2;
ULONG Spare3;
ULONG Spare4;
PVOID ReservedForOle;
ULONG WaitingOnLoaderLock;
PVOID StackCommit;
PVOID StackCommitMax;
PVOID StackReserved;
} TEB, *PTEB;
The TEB address can be obtained through a segment register: FS on x86 and GS on x64 (these registers have no processor-defined purpose; the operating system running on the processor gives them one). For example:
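#include <intrin.h>

// Returns the address of the current thread's TEB (the NT_TIB.Self field).
void *getTIB() {
#if defined(_M_IX86)
    return (void *)__readfsdword(0x18); // FS:[0x18] on x86
#elif defined(_M_AMD64)
    return (void *)__readgsqword(0x30); // GS:[0x30] on x64
#else
#error unsupported architecture
#endif
}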
Thread Local Storage (TLS)
Thread Local Storage (TLS) is a mechanism that gives each thread of a multithreaded program its own private storage for variables, so that accessing them requires no synchronization mechanism such as a lock or mutex. Imagine a multithreaded process: all of its threads share the same virtual address space, which means that any global variable can be accessed, and overwritten, by any thread. TLS solves this problem by creating a private storage slot for each thread. After allocating a slot index with the TlsAlloc function, a thread can store a value with the TlsSetValue function and read it back with the TlsGetValue function.
Internal Structures of Threads
Each Windows thread is represented by an executive thread object. This object, the ETHREAD structure, consists of many members, including the KTHREAD structure. Here are some important members of the ETHREAD structure:
0:000> dt nt!_ethread
ntdll!_ETHREAD
+0x000 Tcb : _KTHREAD
+0x480 CreateTime : _LARGE_INTEGER
+0x488 ExitTime : _LARGE_INTEGER
+0x488 KeyedWaitChain : _LIST_ENTRY
+0x498 PostBlockList : _LIST_ENTRY
+0x498 ForwardLinkShadow : Ptr64 Void
+0x4a0 StartAddress : Ptr64 Void
+0x4a8 TerminationPort : Ptr64 _TERMINATION_PORT
+0x4a8 ReaperLink : Ptr64 _ETHREAD
+0x4a8 KeyedWaitValue : Ptr64 Void
+0x4b0 ActiveTimerListLock : Uint8B
+0x4b8 ActiveTimerListHead : _LIST_ENTRY
+0x4c8 Cid : _CLIENT_ID
+0x4d8 KeyedWaitSemaphore : _KSEMAPHORE
+0x4d8 AlpcWaitSemaphore : _KSEMAPHORE
+0x4f8 ClientSecurity : _PS_CLIENT_SECURITY_CONTEXT
+0x500 IrpList : _LIST_ENTRY
+0x510 TopLevelIrp : Uint8B
+0x518 DeviceToVerify : Ptr64 _DEVICE_OBJECT
+0x520 Win32StartAddress : Ptr64 Void
.....
+0x538 ThreadListEntry : _LIST_ENTRY
+0x548 RundownProtect : _EX_RUNDOWN_REF
+0x550 ThreadLock : _EX_PUSH_LOCK
+0x558 ReadClusterSize : Uint4B
+0x55c MmLockOrdering : Int4B
+0x560 CrossThreadFlags : Uint4B
+0x560 Terminated : Pos 0, 1 Bit
+0x560 ThreadInserted : Pos 1, 1 Bit
+0x560 HideFromDebugger : Pos 2, 1 Bit
+0x560 ActiveImpersonationInfo : Pos 3, 1 Bit
+0x560 HardErrorsAreDisabled : Pos 4, 1 Bit
+0x560 BreakOnTermination : Pos 5, 1 Bit
+0x560 SkipCreationMsg : Pos 6, 1 Bit
+0x560 SkipTerminationMsg : Pos 7, 1 Bit
.....
The first member of ETHREAD is the Thread Control Block (Tcb), which is a structure of type KTHREAD. It is followed by information about the current thread, security information in the form of a pointer to the access token and impersonation information, fields relating to Asynchronous Local Procedure Call (ALPC) messages, pending I/O requests (IRPs), and CPU sets. The KTHREAD (kernel thread) structure looks like this:
0:000> dt nt!_kthread
ntdll!_KTHREAD
+0x000 Header : _DISPATCHER_HEADER
+0x018 SListFaultAddress : Ptr64 Void
+0x020 QuantumTarget : Uint8B
+0x028 InitialStack : Ptr64 Void
+0x030 StackLimit : Ptr64 Void
+0x038 StackBase : Ptr64 Void
+0x040 ThreadLock : Uint8B
+0x048 CycleTime : Uint8B
+0x050 CurrentRunTime : Uint4B
+0x054 ExpectedRunTime : Uint4B
+0x058 KernelStack : Ptr64 Void
....
+0x078 ThreadFlags : Int4B
+0x07c Tag : UChar
+0x07d SystemHeteroCpuPolicy : UChar
+0x07e UserHeteroCpuPolicy : Pos 0, 7 Bits
+0x07e ExplicitSystemHeteroCpuPolicy : Pos 7, 1 Bit
+0x07f Spare0 : UChar
+0x080 SystemCallNumber : Uint4B
+0x084 ReadyTime : Uint4B
+0x088 FirstArgument : Ptr64 Void
+0x090 TrapFrame : Ptr64 _KTRAP_FRAME
+0x098 ApcState : _KAPC_STATE
+0x098 ApcStateFill : [43] UChar
+0x0c3 Priority : Char
+0x0c4 UserIdealProcessor : Uint4B
+0x0c8 WaitStatus : Int8B
+0x0d0 WaitBlockList : Ptr64 _KWAIT_BLOCK
+0x0d8 WaitListEntry : _LIST_ENTRY
+0x0d8 SwapListEntry : _SINGLE_LIST_ENTRY
+0x0e8 Queue : Ptr64 _DISPATCHER_HEADER
+0x0f0 Teb : Ptr64 Void
+0x0f8 RelativeTimerBias : Uint8B
+0x100 Timer : _KTIMER
+0x140 WaitBlock : [4] _KWAIT_BLOCK
+0x140 WaitBlockFill4 : [20] UChar
+0x154 ContextSwitches : Uint4B
+0x140 WaitBlockFill5 : [68] UChar
+0x184 State : UChar
....
+0x21c QueuePriority : Int4B
+0x220 Process : Ptr64 _KPROCESS
....
How CreateThread Works?
Example Usage Of CreateThread
Having discussed the internal structures of threads, let's take a look at the CreateThread function, which is used to create a thread in the current process from user mode. I will create a project in Visual Studio so that I can analyze it better.
HANDLE CreateThread(
[in, optional] LPSECURITY_ATTRIBUTES lpThreadAttributes,
[in] SIZE_T dwStackSize,
[in] LPTHREAD_START_ROUTINE lpStartAddress,
[in, optional] __drv_aliasesMem LPVOID lpParameter,
[in] DWORD dwCreationFlags,
[out, optional] LPDWORD lpThreadId
);
#include <stdio.h>
#include <windows.h>
DWORD WINAPI ThreadFunc1(LPVOID lpParam);
DWORD WINAPI ThreadFunc2(LPVOID lpParam);
int main()
{
HANDLE hThread1, hThread2;
DWORD threadID1, threadID2;
hThread1 = CreateThread(NULL, 0, ThreadFunc1, NULL, 0, &threadID1);
printf("%lu -- 1\n", threadID1);
hThread2 = CreateThread(NULL, 0, ThreadFunc2, NULL, 0, &threadID2);
printf("%lu -- 2\n", threadID2);
WaitForSingleObject(hThread1, INFINITE);
WaitForSingleObject(hThread2, INFINITE);
getchar();
CloseHandle(hThread1);
CloseHandle(hThread2);
return 0;
}
DWORD WINAPI ThreadFunc1(LPVOID lpParam)
{
printf("Thread 1 is working\n");
return 0;
}
DWORD WINAPI ThreadFunc2(LPVOID lpParam)
{
printf("Thread 2 is working\n");
return 0;
}
This small program creates two threads, each of which prints a message saying that it is working. The threads finish almost immediately, so they will not be visible in Process Hacker. To keep them visible, you can add SuspendThread(hThreadX) right after creating each thread:
hThread1 = CreateThread(NULL, 0, ThreadFunc1, NULL, 0, &threadID1);
printf("%lu -- 1\n", threadID1);
hThread2 = CreateThread(NULL, 0, ThreadFunc2, NULL, 0, &threadID2);
printf("%lu -- 2\n", threadID2);
SuspendThread(hThread1);
SuspendThread(hThread2);
Internals Of CreateThread
What happens when CreateThread is called? In this section, we will answer this question.
Here is the disassembly of the CreateThread function:
mov r11, rsp
sub rsp, 48h
mov r10d, [rsp+48h+dwCreationFlags]
mov rax, [rsp+48h+lpThreadId]
and r10d, 10004h
mov [r11-10h], rax
and qword ptr [r11-18h], 0
mov [r11-20h], r10d
mov [r11-28h], r9
mov r9, r8 ; lpStartAddress
mov r8, rdx ; dwStackSize
mov rdx, rcx ; lpThreadAttributes
or rcx, 0FFFFFFFFFFFFFFFFh ; hProcess
call cs:CreateRemoteThreadEx_0
nop dword ptr [rax+rax+00h]
add rsp, 48h
retn
and the decompiled pseudocode:
HANDLE __stdcall CreateThread(
LPSECURITY_ATTRIBUTES lpThreadAttributes,
SIZE_T dwStackSize,
LPTHREAD_START_ROUTINE lpStartAddress,
LPVOID lpParameter,
DWORD dwCreationFlags,
LPDWORD lpThreadId)
{
return CreateRemoteThreadEx_0(
(HANDLE)0xFFFFFFFFFFFFFFFFi64,
lpThreadAttributes,
dwStackSize,
lpStartAddress,
lpParameter,
dwCreationFlags & 0x10004,
0i64,
lpThreadId);
}
When we open the CreateRemoteThreadEx function from KernelBase.dll in IDA, we can see a lot of calls to native (Nt*) functions. I will mention the important ones.
The process manager allocates space for a thread object and calls the kernel to initialize the KTHREAD structure. CreateThread eventually calls CreateRemoteThreadEx, which performs the following steps:
- It converts the Windows API parameters to native flags.
- It builds an attribute list with two entries: client ID and TEB address.
- Since the CreateRemoteThreadEx function is also used by CreateRemoteThread, it has to determine from the process handle which process the thread is created in. If the handle is equal to the value returned from GetCurrentProcess (-1), it is the same process. If the process handle is not equal to -1, it obtains information about the process using NtQueryInformationProcess.
- To make the transition to the executive in kernel mode, it calls the NtCreateThreadEx function in ntdll, which takes the same parameters. NtCreateThreadEx (inside the executive) creates and initializes the user-mode thread context and then calls PspCreateThread to create a suspended executive thread object. Then the function returns, eventually ending back in user mode at CreateRemoteThreadEx.
- PspCreateThread: In fact, until this stage, the thread object has been created on the kernel-mode side, but there is no thread created on the user-mode side. The PspCreateThread function is responsible for creating the thread, together with two helper functions (PspAllocateThread and PspInsertThread). PspAllocateThread creates the object, then PspInsertThread sets the security attributes and makes the thread object ready for execution by calling KeStartThread.
- Unless the caller created the thread with the CREATE_SUSPENDED flag set, the thread is now resumed so that it can be scheduled for execution. When the thread starts running, it executes a few steps, described in the next item, before calling the actual user-specified start address.
- The newly created thread comes to life by running KiStartUserThread. After lowering the IRQL, it calls the PspUserThreadStartup function and passes it the user-specified start routine as a parameter.
- The thread handle and the thread ID are returned to the caller.
Applications
Thread Hijacking
This method redirects the instruction pointer (the RIP register) of an existing thread to the start address of the injected shellcode.
PoC and Technical Details:
- Identify the target: once the target process is identified, the CreateToolhelp32Snapshot function is used to take a snapshot, from which the target thread is identified.
- The thread is opened with OpenThread and suspended with SuspendThread. The current state of the thread is obtained with GetThreadContext.
- The shellcode to be used is injected into the target process using the VirtualAllocEx and WriteProcessMemory functions.
- The instruction pointer is set to the shellcode address using the SetThreadContext function, and the thread is resumed with ResumeThread.
Thread Stack Spoofing
The purpose of this method is to hide references to malicious code in the thread call stack. Because of the way the technique works, it can be interpreted as an abuse of stack unwinding. After hooking a function called by the shellcode, the hook overwrites the return address on the thread call stack with 0; after the real function has done its work, the hook restores the original return address and returns. The reason for setting the return address to 0 is to end the call stack at that frame. The example from the repository:
void WINAPI MySleep(DWORD dwMilliseconds)
{
// Locate this stack frame's return address.
auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
const auto origReturnAddress = *overwrite;
// By overwriting the return address with 0 we're basically telling the call stack unwinding algorithm
// to stop unwinding the call stack any further, as there are no further frames. This way we can hide our remaining stack frames
// referencing shellcode memory allocation from residing on a call stack.
*overwrite = 0;
// Perform sleep emulating originally hooked functionality.
::SleepEx(dwMilliseconds, false);
// Restore original thread's call stack.
*overwrite = origReturnAddress;
}