Simplify TLS and reduce startup latency

This change simplifies the thread-local storage support code. On Windows
and Mac OS X the startup latency of __enable_tls() has been reduced from
30ms to 1ms. On Windows, TLS memory accesses are now much faster thanks
to improved self-modifying code that avoids a function call and loads
our thread information block pointer in a single instruction.
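
For readers unfamiliar with the mechanism, here is a minimal sketch of
what "returning the current thread's descriptor through TLS" means. It is
not code from this commit: it uses portable C11 `_Thread_local` and POSIX
threads rather than the self-modifying code path described above, and the
names `my_thread`, `my_thread_self`, and `descriptors_are_per_thread` are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-thread descriptor; NOT the cthread_t used by this commit. */
struct my_thread {
  int id;
};

/* C11 thread-local storage: every thread gets its own copy of this object. */
static _Thread_local struct my_thread my_descriptor;

/* Returns the calling thread's descriptor (hypothetical cthread_self() analogue). */
static struct my_thread *my_thread_self(void) {
  return &my_descriptor;
}

/* Runs on the worker thread; arg is the main thread's descriptor address.
   Returns nonzero if this thread's descriptor is a different object. */
static void *worker(void *arg) {
  struct my_thread *mine = my_thread_self();
  mine->id = 2; /* writes only to the worker's copy */
  return (void *)(intptr_t)(mine != (struct my_thread *)arg);
}

/* Returns 1 if two threads observe distinct, independent descriptors. */
int descriptors_are_per_thread(void) {
  pthread_t th;
  void *distinct = 0;
  int rc;
  my_thread_self()->id = 1;
  rc = pthread_create(&th, 0, worker, my_thread_self());
  assert(rc == 0);
  rc = pthread_join(th, &distinct);
  assert(rc == 0);
  (void)rc;
  /* The worker saw a different object, and our own copy is untouched. */
  return (intptr_t)distinct == 1 && my_thread_self()->id == 1;
}
```

Calling `descriptors_are_per_thread()` from a program's `main` should
return 1 on a conforming toolchain. The point of the sketch is only that
the descriptor lookup is a plain address computation relative to
per-thread state, which is why shaving a function call off that path
matters.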
Justine Tunney 2022-07-18 03:33:32 -07:00
parent 38c3fa63fe
commit b1d9d11be1
15 changed files with 136 additions and 312 deletions

@@ -26,5 +26,5 @@ STATIC_YOINK("_main_thread_ctor");
  * Returns thread descriptor of the current thread.
  */
 cthread_t(cthread_self)(void) {
-  return (cthread_t)__get_tls_inline();
+  return (cthread_t)__get_tls();
 }