Multithreading support in C++11
After the turn of the millennium, mainstream chip manufacturers began shipping multi-core processors, so parallel programming became increasingly important. C++98 has no multithreading library of its own at all; programs typically used the mutexes of the POSIX pthread library to do multithreaded programming.
First, let's pin down a concept: an atomic operation is "the smallest, non-divisible operation" in a multithreaded program. Informally, if an operation on a resource is atomic, then at any moment only one thread's operation can act on that resource. With pthreads, we generally use a mutex to obtain coarse-grained atomicity.
```cpp
#include <pthread.h>
#include <iostream>
using namespace std;

static long long total = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; // mutex

void *func(void *) {
    for (long long i = 0; i < 100000000LL; i++) {
        pthread_mutex_lock(&m);
        total += i;
        pthread_mutex_unlock(&m);
    }
    return nullptr;
}

int main() {
    pthread_t thread1, thread2;
    if (pthread_create(&thread1, nullptr, &func, nullptr)) {
        throw;
    }
    if (pthread_create(&thread2, nullptr, &func, nullptr)) {
        throw;
    }
    pthread_join(thread1, nullptr);
    pthread_join(thread2, nullptr);
    cout << total << endl; // 9999999900000000
}
```
Thanks to the mutex, `total += i;` in the code above behaves as an atomic operation: the lock/unlock pair turns it into a critical section that only one thread can execute at a time.
1. Atomic types in C++11
The pthread mutex must be declared explicitly, and we have to lock and unlock it ourselves. To simplify such code, C++11 defines atomic types. Each atomic type is a class whose member interfaces are atomic operations. For example:
```cpp
#include <atomic>
#include <thread>
#include <iostream>
using namespace std;

atomic_llong total{0}; // atomic data type

void func(int) {
    for (long long i = 0; i < 100000000LL; i++) {
        total += i;
    }
}

int main() {
    thread t1(func, 0);
    thread t2(func, 0);
    t1.join();
    t2.join();
    cout << total << endl; // 9999999900000000
}
```
In the code above, total is an object of an atomic class, and its interfaces, such as the overloaded operator+=() used here, are atomic operations, so no explicit mutex is needed.
How many atomic types are there? C++11's approach is to provide an atomic class template, through which we can define whatever atomic type we need:
```cpp
using atomic_llong = atomic<long long>;
```
So to make a type atomic, we only need to pass a different template argument.
In short, atomic operations in C++11 are member functions of the atomic class template.
1.1 Interfaces of atomic types
We know that the interfaces of atomic types are atomic operations, but what interfaces are there, concretely?
An atomic object represents a single shared resource: all threads must operate on that one object, never on copies of it. Therefore, atomic types in C++11 support neither copy semantics nor move semantics; they have no copy constructor, copy assignment, move constructor, or move assignment.
```cpp
atomic<float> af{1.2f};  // correct
atomic<float> af1{af};   // error: atomic types do not support copy semantics
float f = af;            // correct: an interface of the atomic type is called
af = 0.0;                // correct: an interface of the atomic type is called
```
Among the interfaces of atomic types, load() performs a read. For example:
```cpp
atomic<int> a(2);
int b = a;
b = a.load();
```
In the code above, b = a is equivalent to b = a.load(). In fact, atomic<int> has an operator int() conversion, implemented as:
```cpp
operator __int_type() const noexcept
{ return load(); }
```
The store() interface is used to write data:
```cpp
atomic<int> a;
a = 1;
a.store(1);
```
In the code above, a = 1 is equivalent to a.store(1); atomic<int> has an operator=(int), implemented as follows:
```cpp
__int_type operator=(__int_type __i) noexcept
{
    store(__i);
    return __i;
}
```
There are further operations: exchange() swaps values, and compare_exchange_weak()/compare_exchange_strong() perform compare-and-swap (the CAS operation). Their implementations are more involved, and there are additional operator overloads, which we will not cover here; Chapters 5 and 7 of *C++ Concurrency in Action* introduce this part in detail.
It is worth noting the special atomic type atomic_flag, which is guaranteed to be lock-free: threads access this type of data without any lock. It provides neither load() nor store(); its only operations are test_and_set() and clear(), and multiple threads can operate on the same flag concurrently. We can use it to implement a spin lock.
1.2 Implementation of a simple spin lock
A mutex works like this: when a thread enters the critical section it closes a lock, and it opens the lock when leaving. When another thread wants to enter the critical section, it checks the lock; if the lock is closed, the thread blocks itself so the core can run other work, which causes a context switch, so a mutex is relatively expensive. The is_lock_free() member of an atomic type reports whether accesses to that atomic type are implemented without such a lock.
The counterpart of the mutex is the spin lock. The difference is that when another thread wants to enter the critical section and finds the lock closed, it does not block itself; instead it keeps checking (spinning) until the lock opens. This avoids the context switch, at the cost of keeping the CPU busy while waiting.
We can implement a spin lock with the atomic_flag type: because atomic_flag is itself lock-free, multiple threads can access it simultaneously, which is exactly what concurrent access to a spin lock requires:
```cpp
#include <thread>
#include <atomic>
#include <iostream>
#include <unistd.h>
using namespace std;

// named lock_flag to avoid clashing with std::lock
atomic_flag lock_flag = ATOMIC_FLAG_INIT; // the spin lock

void f(int n) {
    while (lock_flag.test_and_set()) { // try to acquire the lock
        cout << "Waiting from thread " << n << endl;
    }
    cout << "Thread " << n << " starts working" << endl;
}

void g(int n) {
    cout << "Thread " << n << " is going to start" << endl;
    lock_flag.clear(); // open the lock
    cout << "Thread " << n << " starts working" << endl;
}

int main() {
    lock_flag.test_and_set(); // close the lock
    thread t1(f, 1);
    thread t2(g, 2);
    t1.join();
    usleep(100000);
    t2.join();
}
```
Here test_and_set() is an atomic operation: it writes true and returns the old value. In main() we first write true into the lock variable, closing the lock. Thread t1 then keeps trying to acquire the spin lock, while in thread t2 the clear() call sets the flag back to false, opening the spin lock so that thread t1 can run the rest of its code.
We can simply wrap these operations into lock functions:
```cpp
void Lock(atomic_flag &lock) {
    while (lock.test_and_set());
}

void Unlock(atomic_flag &lock) {
    lock.clear();
}
```
With the code above we have effectively built a lock that can be used for mutually exclusive access to a critical section. Note, however, the difference from pthread_mutex_lock() and pthread_mutex_unlock() in POSIX: those two implement a mutex, while the code above implements a spin lock.
2. Increasing the degree of parallelism
```cpp
#include <thread>
#include <atomic>
using namespace std;

atomic<int> a;
atomic<int> b;

void threadHandle() {
    int t = 1;
    a = t;
    b = 2; // the assignment to b does not depend on a
}
```
In the code above, the two assignments to a and b could be performed in either order. If the compiler or the hardware were allowed to reorder them, or to execute them concurrently, the degree of parallelism would increase.
In a single-threaded program we do not care about their execution order at all, since the result is the same either way. Multithreaded programs are different: a different execution order can produce a different result.
```cpp
#include <thread>
#include <atomic>
#include <iostream>
using namespace std;

atomic<int> a{0};
atomic<int> b{0};

void ValueSet(int) {
    int t = 1;
    a = t;
    b = 2; // the assignment to b does not depend on a
}

void Observer(int) {
    cout << "(" << a << "," << b << ")" << endl;
}

int main() {
    thread t1(ValueSet, 0);
    thread t2(Observer, 0);
    t1.join();
    t2.join();
    cout << "Final: (" << a << "," << b << ")" << endl;
}
```
In the code above, what Observer() prints depends on how far the assignments to a and b have progressed: (0,0), (1,0), and (1,2) are all possible, and if reordering were permitted, even (0,2) could appear. This shows that in a multithreaded program, a different instruction execution order yields a different result.
Two key factors affect the degree of parallelism: whether the compiler is permitted to reorder instructions, and whether the hardware is permitted to reorder the generated machine instructions.
In C++11, we can explicitly tell the compiler and the hardware what they are permitted to do, thereby increasing concurrency. Put simply, if we want the highest degree of parallelism, we grant the compiler and the hardware the right to reorder instructions.
2.1 The memory_order parameter
Most member functions of atomic types accept a parameter of type memory_order, which tells the compiler and the hardware what reordering is allowed.
```cpp
typedef enum memory_order {
    memory_order_relaxed, // no guarantees about order of execution
    memory_order_acquire, // in this thread, all subsequent reads must happen after this atomic operation completes
    memory_order_release, // in this thread, this atomic operation may only happen after all previous writes complete
    memory_order_acq_rel, // combines memory_order_acquire and memory_order_release
    memory_order_consume, // in this thread, all subsequent operations that depend on this atomic value must happen after this atomic operation completes
    memory_order_seq_cst  // all accesses are sequentially consistent
} memory_order;
```
In C++11, the default value of the memory_order parameter is memory_order_seq_cst, under which neither the compiler nor the hardware may reorder the atomic operations. Under this default, the output in Observer() above can never be (0,2), because the assignment to a is sequenced before the assignment to b. This is sequential consistency: within one thread, the order of the atomic operations matches the order of the code.
And if we change the code:
```cpp
#include <thread>
#include <atomic>
#include <iostream>
using namespace std;

atomic<int> a{0};
atomic<int> b{0};

void ValueSet(int) {
    int t = 1;
    a.store(t, memory_order_relaxed);
    b.store(2, memory_order_relaxed); // the assignment to b does not depend on a
}

void Observer(int) {
    cout << "(" << a << "," << b << ")" << endl;
}

int main() {
    thread t1(ValueSet, 0);
    thread t2(Observer, 0);
    t1.join();
    t2.join();
    cout << "Final: (" << a << "," << b << ")" << endl;
}
```
Now the output of Observer() may be (0,2), because memory_order_relaxed imposes no ordering on the two stores: b may be assigned first, while a has not yet been assigned.
Therefore, when we relax the ordering to further exploit the parallelism of atomic operations, the goal is to keep the program both fast and correct.
2.2 release-acquire memory order
```cpp
#include <thread>
#include <atomic>
#include <iostream>
using namespace std;

atomic<int> a;
atomic<int> b;

void Thread1(int) {
    int t = 1;
    a.store(t, memory_order_relaxed);
    b.store(2, memory_order_release); // all writes before this store must complete first,
                                      // so the store to a happens before the store to b
}

void Thread2(int) {
    while (b.load(memory_order_acquire) != 2)
        ; // the code below runs only after the acquire load observes the store
    cout << a.load(memory_order_relaxed) << endl; // 1
}

int main() {
    thread t1(Thread1, 0);
    thread t2(Thread2, 0);
    t2.join();
    t1.join();
}
```
The code above in effect spins like a lock: we guarantee that the store to a happens before the store to b, and that the load of b happens before the load of a. The store and load of b thus form a release-acquire ordering, so once Thread2 observes b == 2, the store to a is guaranteed to be visible and the output is 1.
2.3 release-consume memory order
```cpp
#include <thread>
#include <atomic>
#include <cassert>
#include <string>
using namespace std;

atomic<string*> ptr;
atomic<int> data;

void Producer() {
    string *p = new string("hello");
    data.store(42, memory_order_relaxed);
    ptr.store(p, memory_order_release); // the store to data happens before the store to ptr
}

void Consumer() {
    string *p2;
    while (!(p2 = ptr.load(memory_order_consume)))
        ;
    assert(*p2 == "hello"); // always holds: p2 carries a dependency from ptr
    assert(data.load(memory_order_relaxed) == 42); // may fail: data carries no dependency from ptr,
                                                   // so this load may be executed earlier
}

int main() {
    thread t1(Producer);
    thread t2(Consumer);
    t1.join();
    t2.join();
}
```
The above memory order is also called producer-consumer order.
In total there are four memory models: sequential consistency, relaxed, release-acquire, and release-consume.
2.4 Summary
In fact, for parallel programming what matters most is the parallel algorithm itself, not squeezing out the memory model at the hardware level. If tuning memory orders feels troublesome, simply using the sequentially consistent memory model everywhere usually does not hurt parallel efficiency very much.
3. Thread local storage
Each thread has its own stack, but the heap and the static data area (the data segment, the bss segment, and global/static variables) are shared. Sharing static data between threads is often exactly what we want, but sometimes a thread also needs its own "global" variables.
```cpp
#include <pthread.h>
#include <iostream>
using namespace std;

thread_local int errorCode = 0;

void *MaySetErr(void *input) {
    if (*(int*)input == 1)
        errorCode = 1;
    else if (*(int*)input == 2)
        errorCode = 2;
    else
        errorCode = 0;
    cout << errorCode << endl;
    return nullptr;
}

int main() {
    int input_a = 1;
    int input_b = 2;
    pthread_t thread1, thread2;
    pthread_create(&thread1, nullptr, &MaySetErr, &input_a);
    pthread_create(&thread2, nullptr, &MaySetErr, &input_b);
    pthread_join(thread1, nullptr);
    pthread_join(thread2, nullptr);
    cout << errorCode << endl; // 0
}
```
errorCode in the code above is a thread_local variable, in effect a global variable private to each thread: it is initialized when the thread starts and destroyed when the thread ends. The two worker threads each have their own errorCode, and the main thread has its own as well, which is why main still prints 0.
4. Quick Exit
C++98 already has three termination functions: terminate, abort, and exit. C++11 adds the quick_exit termination function, which is mainly useful in multithreaded programs.
- The terminate function is tied to C++'s exception mechanism: terminate is usually called when an exception is not caught.
- The abort function is the lower-level termination function; terminate itself calls abort to end the process. When abort is called no destructors run, which would leak resources within the program; however, on POSIX-compliant operating systems abort raises a signal, and by default terminating the process releases all of its resources, so a lasting leak is avoided.
- The exit function is a normal exit and runs destructors, but sometimes those destructors are very involved, so we might as well call abort directly and leave the release of resources to the operating system.
In a multithreaded program we generally exit with exit, but it can easily hang: when the threads are complicated, the orderly shutdown of exit is too conservative, while the shutdown of abort is too radical. Hence the new termination function: quick_exit.
- quick_exit
This function terminates the program without running destructors, but unlike abort, which usually counts as an abnormal exit, quick_exit is a normal exit. Handlers registered with at_quick_exit() are still called.
```cpp
#include <cstdlib>
#include <iostream>
using namespace std;

struct A {
    ~A() { cout << "Destruct A." << endl; }
};

void closeDevice() { cout << "device is closed." << endl; }

int main() {
    A a;
    at_quick_exit(closeDevice);
    quick_exit(0); // prints "device is closed." but never "Destruct A."
}
```