Lock free single producer/single consumer circular buffer












I have been looking at a lock-free single producer/single consumer circular buffer on this website, and I couldn't figure out why a specific memory barrier was needed.
I have carefully read the standard's rules about memory order a hundred times, but I don't understand what I'm missing.



With this implementation, only one thread can call the push() function and only one other thread can call the pop() function.



Here is the Producer code:



bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed); //(1)
    const auto next_tail = increment(current_tail);

    if(next_tail != _head.load(std::memory_order_acquire))           //(2)
    {
        _array[current_tail] = item;                                  //(3)
        _tail.store(next_tail, std::memory_order_release);            //(4)
        return true;
    }
    return false; // full queue
}


Here is the Consumer code:



bool pop(Element& item)
{
    const auto current_head = _head.load(std::memory_order_relaxed);  //(1)
    if(current_head == _tail.load(std::memory_order_acquire))         //(2)
        return false; // empty queue

    item = _array[current_head];                                       //(3)
    _head.store(increment(current_head), std::memory_order_release);  //(4)
    return true;
}
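
For reference, here is a minimal sketch of the class layout these snippets assume. Only _head, _tail, _array and increment() appear above; the capacity parameter, the member initializers and the body of increment() are assumptions, not part of the original code.

#include <atomic>
#include <cstddef>

template <typename Element, std::size_t Capacity>
class CircularFifo
{
public:
    bool push(const Element& item); // producer thread only (body as shown above)
    bool pop(Element& item);        // consumer thread only (body as shown above)

private:
    // Advance an index, wrapping around the fixed-size storage (assumed helper).
    std::size_t increment(std::size_t idx) const
    {
        return (idx + 1) % Capacity;
    }

    std::atomic<std::size_t> _head{0}; // written by consumer, read by producer
    std::atomic<std::size_t> _tail{0}; // written by producer, read by consumer
    Element _array[Capacity];
};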


I understand why the Producer (4) and the Consumer (2) statements are absolutely needed: we have to make sure that all writes that happened before the Producer's release store (4) become visible side effects once the Consumer sees the stored value.



I also understand why the Consumer (4) statement is needed: it makes sure that the Consumer (3) load is performed before the Consumer (4) store.



The question




  • Why does the Producer (2) load need to be performed with acquire semantics (instead of relaxed)? Is it to prevent Producer (3) or (4) from being reordered before the condition (at compile time or at runtime)?










c++ multithreading c++11 atomic lock-free

asked Jan 19 at 14:47 by Yoo, edited Jan 20 at 17:10
  • One question per question please.

    – πάντα ῥεῖ
    Jan 19 at 14:50











  • @πάνταῥεῖ done ;)

    – Yoo
    Jan 19 at 14:52
















2 Answers
We need to prove that

_array[current_tail] = item; // push(3)

is executed only after

item = _array[current_head]; // pop(3)

has completed, for the case where the slot being written is the one the consumer is reading (current_head == current_tail): we may overwrite a cell only after its data has already been copied into item.

The

_head.load(std::memory_order_acquire) // push(2)

synchronizes with

_head.store(increment(current_head), std::memory_order_release);   //pop(4)

via release-acquire ordering:

all memory writes ( pop(3) ) that happened-before the atomic store-release ( pop(4) ) on _head become visible side-effects once the atomic load-acquire ( push(2) ) is completed on _head.

So once push(2) has completed, the Producer is guaranteed to see the result of pop(3). This means the data from _array[current_head] has been copied into item and the result of that operation is visible to the Producer after push(2), so _array[current_head] is already free.

From the other side, per the description of a memory_order_acquire load, no reads or writes ( push(3) ) in the current thread can be reordered before this load. So push(3) executes only after the push(2) load completes, and at that point pop(3) has already completed:



item = _array[current_head];                                        //pop(3)
_head.store(increment(current_head), std::memory_order_release); //pop(4)
-----
_head.load(std::memory_order_acquire); //push(2)
_array[current_tail] = item; //push(3)
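
As a standalone illustration of the release/acquire guarantee this answer relies on, here is a minimal toy example, separate from the queue (all names below are invented for the demo): once the acquire load observes the value written by the release store, the plain write made before that store is guaranteed to be visible.

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // plays the role of _head in pop(4)/push(2)

void release_side()              // corresponds to the Consumer in the queue
{
    payload = 123;                                 // plain write (like pop(3))
    ready.store(true, std::memory_order_release);  // release store (like pop(4))
}

void acquire_side()              // corresponds to the Producer in the queue
{
    // Acquire load (like push(2)): once it observes true, the write to
    // payload made before the release store is guaranteed to be visible.
    while (!ready.load(std::memory_order_acquire)) { }
    assert(payload == 123); // cannot fire
}

int main()
{
    std::thread a(release_side), b(acquire_side);
    a.join();
    b.join();
}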





answered Jan 19 at 19:40 by RbMm
  • Thanks for the detailed answer. Something still bothers me about the barrier in the condition of the Producer code. Because of possible speculative execution, the line if(next_tail != _head.load(std::memory_order_acquire)) could be compiled as const auto head = _head.load(std::memory_order_acquire); //then here perform the stores if(next_tail == head) { //restore the memory as it was }. As you can see, the condition has been reversed due to the speculative store, and it could harm the memory if the pop function were called. To me the acquire barrier should be in the condition.

    – Yoo
    Jan 20 at 15:37













  • @Yoo - I don't understand what you are trying to say. When we do a load with memory_order_acquire, no reads or writes in the current thread can be reordered before this load.

    – RbMm
    Jan 20 at 15:46











  • I mean that the load-acquire could be executed before the condition test, so the store could still be executed before the condition (but still after the barrier). Is that clearer? Just because you write if(next_tail != _head.load(std::memory_order_acquire)) doesn't mean it can't be interpreted as first a load, then the condition.

    – Yoo
    Jan 20 at 15:50













  • @Yoo - no, you are wrong: the _head.load(std::memory_order_acquire) will be executed before _array[current_tail] = item;

    – RbMm
    Jan 20 at 15:52











  • @Yoo the order of the two loads from memory (_head.load(std::memory_order_acquire) and the load used to compute next_tail) plays no role, because next_tail is a local, thread-private variable. But the store to memory, _array[current_tail] = item, cannot begin until the acquire load has fully completed.

    – RbMm
    Jan 20 at 15:55



















The memory barriers prevent the CPU from reordering accesses to the Element object, which are not using interlocks, across access to the queue structure (here implemented using indexes, but pointers are equally viable).



Using your numbering, it is essential that (3) is performed between (2) and (4), and a memory barrier provides that.



The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full (the proposed site overlaps with valid data). Without the barrier, even though when the condition failed the original data would be restored from the perspective of the Producer thread, intermediate values might be briefly visible to the Consumer.






answered Jan 19 at 16:32 by Ben Voigt, edited Jan 19 at 16:37
  • "The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full." So you mean that the producer could speculatively perform a store before evaluating the condition, and in case of test failure restore the original value? For the condition ( if(next_tail != _head.load(std::memory_order_acquire)) ), there is still a chance that the CPU first executes the load + the barrier and then checks the condition, so the (3) operation could still be reordered between the barrier and the condition, which is what we are trying to avoid. Where am I wrong?

    – Yoo
    Jan 19 at 18:05











  • @Yoo: You're not wrong, but since the condition won't stall the pipeline, there's zero advantage to performing speculation. In contrast, waiting for a memory fetch can easily stall, so out-of-order execution makes sense. Having zero payoff doesn't equate to formally forbidding speculation; to be theoretically correct, I think the memory barrier should be inside the braces of the if, not in the controlling expression.

    – Ben Voigt
    Jan 19 at 18:15













  • Thanks for your detailed answers!

    – Yoo
    Jan 20 at 17:09
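
Below is a sketch of the variant Ben Voigt describes in the comment above: a relaxed load in the controlling expression and an explicit acquire fence inside the if body. This is an illustrative rewrite of the question's push(), not code from the original answer; it reuses the members and increment() helper from the question.

// Illustrative rewrite of the question's push(); members and increment() as in
// the original class. Requires <atomic> for std::atomic_thread_fence.
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed);
    const auto next_tail = increment(current_tail);

    if(next_tail != _head.load(std::memory_order_relaxed))
    {
        // The acquire fence synchronizes with the consumer's release store on
        // _head (which the relaxed load above observed), and no later read or
        // write can be reordered before it; placed inside the braces, it only
        // takes effect on the path where the slot is known to be free.
        std::atomic_thread_fence(std::memory_order_acquire);
        _array[current_tail] = item;
        _tail.store(next_tail, std::memory_order_release);
        return true;
    }
    return false; // full queue
}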










