Deep Dive Into The Threads In Windows

Windows - Thread Internals

In this article, we will look at what threads really are and at some interesting techniques that use them for code injection. Thread internals can seem very complicated, but this article tries to summarize the essentials.

Table Of Contents

1. What Is Thread?
  1.1 Components Of Threads
2. Thread Environment Block (TEB)
3. Thread Local Storage (TLS)
4. Internal Structures Of Threads
5. How CreateThread Works?
  5.1 Example Usage Of CreateThread
  5.2 Internals Of CreateThread
6. Applications
  6.1 Thread Hijacking
  6.2 Thread Stack Spoofing
7. References

What Is Thread?

A thread is the entity within a process that Windows schedules for execution. When a process starts, Windows creates its main thread. Threads allow a process to do more than one job at the same time.

Components Of Threads

Threads consist of several components:

  • CPU registers representing the state of the processor
  • Two stacks (user-mode stack & kernel-mode stack)
  • A private storage area called Thread Local Storage (TLS)
  • A unique identifier called Thread ID (we will see this as TID)

In addition, a thread may have its own security context, or token. We will mention this in the following sections.

Thread Environment Block (TEB)

The Thread Environment Block (TEB) is a structure that stores per-thread context information for user-mode components such as the image loader and various Windows DLLs (a thread that never runs in user mode has no TEB). Because these components run in user mode, they need a data structure that is writable from user mode. That is why the structure lives in the process address space rather than in system space, where it would be writable only from kernel mode. Its general purpose is to keep thread-related information in one place and, through its pointer to the Process Environment Block (PEB), to give the thread access to process information and the list of loaded DLLs. Its first member, NT_TIB, is the Thread Information Block (TIB), and the TEB as a whole is often referred to by that name as well.

typedef struct _TEB {
  NT_TIB          Tib;
  PVOID           EnvironmentPointer;
  CLIENT_ID       Cid;
  PVOID           ActiveRpcInfo;
  PVOID           ThreadLocalStoragePointer;
  PPEB            Peb;
  ULONG           LastErrorValue;
  ULONG           CountOfOwnedCriticalSections;
  PVOID           CsrClientThread;
  PVOID           Win32ThreadInfo;
  ULONG           Win32ClientInfo[0x1F];
  PVOID           WOW32Reserved;
  ULONG           CurrentLocale;
  ULONG           FpSoftwareStatusRegister;
  PVOID           SystemReserved1[0x36];
  PVOID           Spare1;
  ULONG           ExceptionCode;
  ULONG           SpareBytes1[0x28];
  PVOID           SystemReserved2[0xA];
  ULONG           GdiRgn;
  ULONG           GdiPen;
  ULONG           GdiBrush;
  CLIENT_ID       RealClientId;
  PVOID           GdiCachedProcessHandle;
  ULONG           GdiClientPID;
  ULONG           GdiClientTID;
  PVOID           GdiThreadLocaleInfo;
  PVOID           UserReserved[5];
  PVOID           GlDispatchTable[0x118];
  ULONG           GlReserved1[0x1A];
  PVOID           GlReserved2;
  PVOID           GlSectionInfo;
  PVOID           GlSection;
  PVOID           GlTable;
  PVOID           GlCurrentRC;
  PVOID           GlContext;
  NTSTATUS        LastStatusValue;
  UNICODE_STRING  StaticUnicodeString;
  WCHAR           StaticUnicodeBuffer[0x105];
  PVOID           DeallocationStack;
  PVOID           TlsSlots[0x40];
  LIST_ENTRY      TlsLinks;
  PVOID           Vdm;
  PVOID           ReservedForNtRpc;
  PVOID           DbgSsReserved[0x2];
  ULONG           HardErrorDisabled;
  PVOID           Instrumentation[0x10];
  PVOID           WinSockData;
  ULONG           GdiBatchCount;
  ULONG           Spare2;
  ULONG           Spare3;
  ULONG           Spare4;
  PVOID           ReservedForOle;
  ULONG           WaitingOnLoaderLock;
  PVOID           StackCommit;
  PVOID           StackCommitMax;
  PVOID           StackReserved;
} TEB, *PTEB;

The TEB address can be obtained through the FS segment register on x86 and the GS segment register on x64 (these registers have no processor-defined purpose; the operating system running on the processor gives them one). For example:

#include <intrin.h>

void *getTIB(void) {
#if defined(_M_IX86)
    return (void *)__readfsdword(0x18);  // NT_TIB.Self on x86
#elif defined(_M_AMD64)
    return (void *)__readgsqword(0x30);  // NT_TIB.Self on x64
#else
#error unsupported architecture
#endif
}

Thread Local Storage (TLS)

Thread Local Storage (TLS) is a mechanism that gives each thread of a multithreaded program its own private storage for variables, so access to them needs no synchronization mechanisms such as locks or mutexes. Imagine a multithreaded process: all of its threads share the same virtual address space, which means that any variable can be accessed by any thread. TLS solves this problem by creating a private storage slot for each thread. After allocating a TLS index with the TlsAlloc function, a thread can store a value in its slot with the TlsSetValue function and read it back with the TlsGetValue function.

Internal Structures of Threads

Each Windows thread is represented by an executive thread object, the ETHREAD structure. It consists of many fields, including an embedded KTHREAD structure. Here are some important fields of the ETHREAD structure:

0:000> dt nt!_ethread
   +0x000 Tcb              : _KTHREAD
   +0x480 CreateTime       : _LARGE_INTEGER
   +0x488 ExitTime         : _LARGE_INTEGER
   +0x488 KeyedWaitChain   : _LIST_ENTRY
   +0x498 PostBlockList    : _LIST_ENTRY
   +0x498 ForwardLinkShadow : Ptr64 Void
   +0x4a0 StartAddress     : Ptr64 Void
   +0x4a8 TerminationPort  : Ptr64 _TERMINATION_PORT
   +0x4a8 ReaperLink       : Ptr64 _ETHREAD
   +0x4a8 KeyedWaitValue   : Ptr64 Void
   +0x4b0 ActiveTimerListLock : Uint8B
   +0x4b8 ActiveTimerListHead : _LIST_ENTRY
   +0x4c8 Cid              : _CLIENT_ID
   +0x4d8 KeyedWaitSemaphore : _KSEMAPHORE
   +0x4d8 AlpcWaitSemaphore : _KSEMAPHORE
   +0x4f8 ClientSecurity   : _PS_CLIENT_SECURITY_CONTEXT
   +0x500 IrpList          : _LIST_ENTRY
   +0x510 TopLevelIrp      : Uint8B
   +0x518 DeviceToVerify   : Ptr64 _DEVICE_OBJECT
   +0x520 Win32StartAddress : Ptr64 Void
   +0x538 ThreadListEntry  : _LIST_ENTRY
   +0x548 RundownProtect   : _EX_RUNDOWN_REF
   +0x550 ThreadLock       : _EX_PUSH_LOCK
   +0x558 ReadClusterSize  : Uint4B
   +0x55c MmLockOrdering   : Int4B
   +0x560 CrossThreadFlags : Uint4B
   +0x560 Terminated       : Pos 0, 1 Bit
   +0x560 ThreadInserted   : Pos 1, 1 Bit
   +0x560 HideFromDebugger : Pos 2, 1 Bit
   +0x560 ActiveImpersonationInfo : Pos 3, 1 Bit
   +0x560 HardErrorsAreDisabled : Pos 4, 1 Bit
   +0x560 BreakOnTermination : Pos 5, 1 Bit
   +0x560 SkipCreationMsg  : Pos 6, 1 Bit
   +0x560 SkipTerminationMsg : Pos 7, 1 Bit

The first member of ETHREAD is the Thread Control Block (Tcb), which is a structure of type KTHREAD. It is followed by identification information for the thread, security information in the form of a pointer to the access token and impersonation information, fields relating to Asynchronous Local Procedure Call (ALPC) messages, pending I/O requests (IRPs), and CPU sets.

KTHREAD (Kernel Thread Structure):

0:000> dt nt!_kthread
   +0x000 Header           : _DISPATCHER_HEADER
   +0x018 SListFaultAddress : Ptr64 Void
   +0x020 QuantumTarget    : Uint8B
   +0x028 InitialStack     : Ptr64 Void
   +0x030 StackLimit       : Ptr64 Void
   +0x038 StackBase        : Ptr64 Void
   +0x040 ThreadLock       : Uint8B
   +0x048 CycleTime        : Uint8B
   +0x050 CurrentRunTime   : Uint4B
   +0x054 ExpectedRunTime  : Uint4B
   +0x058 KernelStack      : Ptr64 Void
   +0x078 ThreadFlags      : Int4B
   +0x07c Tag              : UChar
   +0x07d SystemHeteroCpuPolicy : UChar
   +0x07e UserHeteroCpuPolicy : Pos 0, 7 Bits
   +0x07e ExplicitSystemHeteroCpuPolicy : Pos 7, 1 Bit
   +0x07f Spare0           : UChar
   +0x080 SystemCallNumber : Uint4B
   +0x084 ReadyTime        : Uint4B
   +0x088 FirstArgument    : Ptr64 Void
   +0x090 TrapFrame        : Ptr64 _KTRAP_FRAME
   +0x098 ApcState         : _KAPC_STATE
   +0x098 ApcStateFill     : [43] UChar
   +0x0c3 Priority         : Char
   +0x0c4 UserIdealProcessor : Uint4B
   +0x0c8 WaitStatus       : Int8B
   +0x0d0 WaitBlockList    : Ptr64 _KWAIT_BLOCK
   +0x0d8 WaitListEntry    : _LIST_ENTRY
   +0x0d8 SwapListEntry    : _SINGLE_LIST_ENTRY
   +0x0e8 Queue            : Ptr64 _DISPATCHER_HEADER
   +0x0f0 Teb              : Ptr64 Void
   +0x0f8 RelativeTimerBias : Uint8B
   +0x100 Timer            : _KTIMER
   +0x140 WaitBlock        : [4] _KWAIT_BLOCK
   +0x140 WaitBlockFill4   : [20] UChar
   +0x154 ContextSwitches  : Uint4B
   +0x140 WaitBlockFill5   : [68] UChar
   +0x184 State            : UChar
   +0x21c QueuePriority    : Int4B
   +0x220 Process          : Ptr64 _KPROCESS

How CreateThread Works?

Example Usage Of CreateThread

After discussing the internal structures of threads, let's take a look at the CreateThread function, which creates a thread in the calling process from user mode. I will build a small project in Visual Studio so that we can analyze it more easily.

HANDLE CreateThread(
  [in, optional]  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  [in]            SIZE_T                  dwStackSize,
  [in]            LPTHREAD_START_ROUTINE  lpStartAddress,
  [in, optional]  __drv_aliasesMem LPVOID lpParameter,
  [in]            DWORD                   dwCreationFlags,
  [out, optional] LPDWORD                 lpThreadId
);


#include <stdio.h>
#include <windows.h>

DWORD WINAPI ThreadFunc1(LPVOID lpParam);
DWORD WINAPI ThreadFunc2(LPVOID lpParam);

int main(void)
{
    HANDLE hThread1, hThread2;
    DWORD threadID1, threadID2;

    // Create both threads; with dwCreationFlags = 0 they start running immediately.
    hThread1 = CreateThread(NULL, 0, ThreadFunc1, NULL, 0, &threadID1);
    printf("%lu -- 1\n", threadID1);

    hThread2 = CreateThread(NULL, 0, ThreadFunc2, NULL, 0, &threadID2);
    printf("%lu -- 2\n", threadID2);

    // Wait for both threads to finish before the process exits.
    WaitForSingleObject(hThread1, INFINITE);
    WaitForSingleObject(hThread2, INFINITE);

    CloseHandle(hThread1);
    CloseHandle(hThread2);

    return 0;
}

DWORD WINAPI ThreadFunc1(LPVOID lpParam)
{
    printf("Thread 1 is working\n");
    return 0;
}

DWORD WINAPI ThreadFunc2(LPVOID lpParam)
{
    printf("Thread 2 is working\n");
    return 0;
}

This small program creates two threads, each of which prints a message saying that it is working. Because the threads exit almost immediately, they will not be visible in Process Hacker; you can add a SuspendThread(hThreadX) call to keep a thread around long enough to inspect it.

hThread1 = CreateThread(NULL, 0, ThreadFunc1, NULL, 0, &threadID1);
printf("%lu -- 1\n", threadID1);
hThread2 = CreateThread(NULL, 0, ThreadFunc2, NULL, 0, &threadID2);
printf("%lu -- 2\n", threadID2);

Internals Of CreateThread

What happens when CreateThread is called? This section answers that question.

Here is the disassembly of the CreateThread function:

mov r11, rsp
sub rsp, 48h
mov r10d, [rsp+48h+dwCreationFlags]
mov rax, [rsp+48h+lpThreadId]
and r10d, 10004h
mov [r11-10h], rax
and qword ptr [r11-18h], 0
mov [r11-20h], r10d
mov [r11-28h], r9
mov r9, r8          ; lpStartAddress
mov r8, rdx         ; dwStackSize
mov rdx, rcx        ; lpThreadAttributes
or  rcx, 0FFFFFFFFFFFFFFFFh ; hProcess
call cs:CreateRemoteThreadEx_0
nop dword ptr [rax+rax+00h]
add rsp, 48h
retn

and the decompiled pseudo-code:

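HANDLE __stdcall CreateThread(
        LPSECURITY_ATTRIBUTES lpThreadAttributes,
        SIZE_T dwStackSize,
        LPTHREAD_START_ROUTINE lpStartAddress,
        LPVOID lpParameter,
        DWORD dwCreationFlags,
        LPDWORD lpThreadId)
{
  // Argument list reconstructed from the disassembly above.
  return CreateRemoteThreadEx_0(
           (HANDLE)-1,                 // hProcess: pseudo-handle of the current process
           lpThreadAttributes, dwStackSize, lpStartAddress, lpParameter,
           dwCreationFlags & 0x10004,  // only bits 0x4 and 0x10000 are passed on
           NULL,                       // lpAttributeList
           lpThreadId);
}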

When we open the CreateRemoteThreadEx function from KernelBase.dll in IDA, we can see many calls to native (Nt*) functions. I will go through them.

When a thread is created, the process manager allocates space for a thread object and calls the kernel to initialize the KTHREAD. As the listing above shows, CreateThread eventually calls CreateRemoteThreadEx, which performs the following steps:

  1. It converts the Windows API parameters to native flags.
  2. It builds an attribute list with two entries: client ID and TEB address.
  3. Because CreateRemoteThreadEx is also used by CreateRemoteThread, it determines the target process from the handle it receives. If the handle equals the pseudo-handle returned by GetCurrentProcess (-1), the target is the calling process. Otherwise, it obtains the process information with NtQueryInformationProcess.
  4. To make the transition to the executive in kernel mode, it calls the NtCreateThreadEx function (in ntdll), which takes the same parameters. NtCreateThreadEx (inside the executive) creates and initializes the user-mode thread context and then calls PspCreateThread to create a suspended executive thread object. The function then returns, eventually ending up back in user mode in CreateRemoteThreadEx.
     • PspCreateThread: Up to this stage, the thread object has been created on the kernel-mode side, but no thread has been created on the user-mode side yet. The PspCreateThread function is responsible for creating the thread, together with two helper functions (PspAllocateThread and PspInsertThread). PspAllocateThread creates the object, then PspInsertThread sets the security attributes and makes the thread object schedulable by calling KeStartThread.
  5. Unless the caller created the thread with the CREATE_SUSPENDED flag set, the thread is now resumed so that it can be scheduled for execution. When the thread starts running, it executes a few steps, described below, before calling the start address specified by the user.
     • The newly created thread comes to life by running KiStartUserThread. After adjusting the IRQL, it calls the PspUserThreadStartup function, passing the user-specified start routine as a parameter.
  6. The thread handle and the thread ID are returned to the caller.


Applications

Thread Hijacking

This method redirects the instruction pointer (the RIP register) of an existing thread to the start address of injected shellcode.

PoC and Technical Details:

  1. Identify the target: once the target process is identified, the CreateToolhelp32Snapshot function is used to take a snapshot, from which the target thread is identified.
  2. The thread is opened with OpenThread and suspended with SuspendThread. Its current state is obtained with GetThreadContext.
  3. The shellcode is written into the target process using the VirtualAllocEx and WriteProcessMemory functions.
  4. The instruction pointer is set to the shellcode address with SetThreadContext, and the thread is resumed with ResumeThread.

Thread Stack Spoofing

GitHub Repo

The purpose of this method is to hide references to malicious code in the thread's call stack. Given the way the technique works, it can be seen as an abuse of stack unwinding. A function called by the shellcode is hooked; the hook overwrites the return address on the thread's call stack with 0, and once the real function has done its work, it restores the original return address and returns. The return address is set to 0 in order to terminate the call stack. The example from the repo:

#include <windows.h>
#include <intrin.h>

void WINAPI MySleep(DWORD dwMilliseconds)
{
    // Locate this stack frame's return address.
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
    const auto origReturnAddress = *overwrite;

    // By overwriting the return address with 0 we're basically telling the call stack unwinding algorithm
    // to stop unwinding the call stack any further, as there are no further frames. This way we can hide our
    // remaining stack frames referencing the shellcode memory allocation from residing on the call stack.
    *overwrite = 0;

    // Perform sleep emulating originally hooked functionality.
    ::SleepEx(dwMilliseconds, false);

    // Restore original thread's call stack.
    *overwrite = origReturnAddress;
}