Lock-free single-producer/single-consumer circular buffer
I have been looking at a lock-free single-producer/single-consumer circular buffer on this website when I couldn't figure out why a specific memory barrier was needed. I have carefully read the standard rules about memory ordering a hundred times, but I don't understand what I'm missing.

With this implementation, there is only a single thread that can call the push() function and another single thread that can call the pop() function.

Here is the Producer code:
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed); // (1)
    const auto next_tail = increment(current_tail);
    if (next_tail != _head.load(std::memory_order_acquire))          // (2)
    {
        _array[current_tail] = item;                                  // (3)
        _tail.store(next_tail, std::memory_order_release);            // (4)
        return true;
    }
    return false; // full queue
}
Here is the Consumer code:
bool pop(Element& item)
{
    const auto current_head = _head.load(std::memory_order_relaxed); // (1)
    if (current_head == _tail.load(std::memory_order_acquire))       // (2)
        return false; // empty queue
    item = _array[current_head];                                      // (3)
    _head.store(increment(current_head), std::memory_order_release);  // (4)
    return true;
}
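For context, the members used by push() and pop() are not declared in the snippets above. Here is a minimal sketch of the surrounding class with assumed declarations (the class name, the Capacity parameter, the index type, and the body of increment() are my guesses, not the code from the website):

#include <atomic>
#include <cstddef>

// Minimal sketch; member names match the snippets above, but the exact
// declarations (Capacity, index type, increment()) are assumptions.
template <typename Element, std::size_t Capacity>
class CircularBuffer
{
public:
    bool push(const Element& item);  // producer thread only, as shown above
    bool pop(Element& item);         // consumer thread only, as shown above

private:
    std::size_t increment(std::size_t idx) const
    {
        return (idx + 1) % Capacity; // one slot stays empty to distinguish full from empty
    }

    Element _array[Capacity];
    std::atomic<std::size_t> _head{0}; // next slot to pop, written by the consumer
    std::atomic<std::size_t> _tail{0}; // next slot to push, written by the producer
};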
I understand why the Producer (4) and the Consumer (2) statements are absolutely needed: we have to make sure that all writes that happened before the release store (4) by the Producer become visible side effects once the Consumer sees the stored value.

I also understand why the Consumer (4) statement is needed: it makes sure that the Consumer (3) load is performed before the Consumer (4) store.
The question: why does the Producer (2) load need to be performed with acquire semantics (instead of relaxed)? Is it to prevent Producer (3) or (4) from being reordered before the condition (at compile time or at runtime)?
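To make the question concrete, this is a sketch of the relaxed variant being asked about (my illustration, not code from the website), with a comment marking why it would be unsafe:

// Hypothetical push() with the head load at (2) relaxed instead of acquire.
bool push_relaxed(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed);  // (1)
    const auto next_tail = increment(current_tail);
    if (next_tail != _head.load(std::memory_order_relaxed))           // (2) relaxed
    {
        // Even if this relaxed load observes the consumer's latest _head value,
        // it creates no happens-before edge with pop(3)/pop(4), so this store and
        // the consumer's earlier read of the same cell (once the buffer wraps
        // around) are unordered: a data race.
        _array[current_tail] = item;                                   // (3)
        _tail.store(next_tail, std::memory_order_release);             // (4)
        return true;
    }
    return false; // full queue
}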
c++ multithreading c++11 atomic lock-free
One question per question please.
– πάντα ῥεῖ
Jan 19 at 14:50
@πάνταῥεῖ done ;)
– Yoo
Jan 19 at 14:52
2 Answers
We need to prove that

_array[current_tail] = item; // push(3)

is executed, in the conflicting case (current_head == current_tail), only after

item = _array[current_head]; // pop(3)

has completed: we may overwrite a cell only after its data has already been copied to item.

The

_head.load(std::memory_order_acquire) // push(2)

is synchronized with

_head.store(increment(current_head), std::memory_order_release); // pop(4)

via release-acquire ordering: all memory writes (pop(3)) that happened before the atomic store-release (pop(4)) on _head become visible side effects once the atomic load-acquire (push(2)) on _head is completed.

So once push(2) is completed, the Producer code is guaranteed to see the result of pop(3). This means the data from _array[current_head] has been copied to item, the result of that operation is visible to the Producer code after push(2), and _array[current_head] is already free.

From the other side, from the memory_order_acquire load description, no reads or writes (push(3)) in the current thread can be reordered before this load. So push(3) will execute only after the push(2) load is complete, and at that point pop(3) has already completed:

item = _array[current_head];                                      // pop(3)
_head.store(increment(current_head), std::memory_order_release);  // pop(4)
-----
_head.load(std::memory_order_acquire);                            // push(2)
_array[current_tail] = item;                                      // push(3)
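As a self-contained illustration of this synchronizes-with pair (my sketch, reduced to a single slot and a single handoff; not code from this answer): once the acquire load observes the release store, the consumer's copy is guaranteed to be visible and the slot may safely be overwritten.

// Standalone demo of the pop(4)/push(2) pairing described above.
#include <atomic>
#include <cassert>
#include <thread>

int main()
{
    int slot = 1;                 // stands in for _array[current_head]
    int item = 0;                 // the consumer's local copy
    std::atomic<int> head{0};     // stands in for _head

    std::thread consumer([&] {
        item = slot;                               // pop(3)
        head.store(1, std::memory_order_release);  // pop(4)
    });

    std::thread producer([&] {
        if (head.load(std::memory_order_acquire) == 1)  // push(2)
        {
            assert(item == 1);  // pop(3) happened-before this point, so it is visible
            slot = 2;           // push(3): safe, the slot has already been copied out
        }
    });

    consumer.join();
    producer.join();
}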
Thanks for the detailed answer. Something still bothers me about the barrier in the condition of the Producer code. Because of possible speculative execution, the line if(next_tail != _head.load(std::memory_order_acquire)) could be compiled as const auto head = _head.load(std::memory_order_acquire); /* then here perform the stores */ if(next_tail == head) { /* restore the memory as it was */ }. As you can see, the condition has been reversed due to the speculative store, and it could harm the memory if the pop function were called. To me the acquire barrier should be in the condition.
– Yoo
Jan 20 at 15:37
@Yoo - I don't understand what you are trying to say. When we do a load with memory_order_acquire, no reads or writes in the current thread can be reordered before this load.
– RbMm
Jan 20 at 15:46
I mean that the load-acquire could be executed before the condition test, so the store could still be executed before the condition (but still after the barrier). Is that clearer? It is not because you write if(next_tail != _head.load(std::memory_order_acquire)) that it can't be interpreted as first a load, then the condition.
– Yoo
Jan 20 at 15:50
@Yoo - no, you are wrong. The _head.load(std::memory_order_acquire) will be executed before _array[current_tail] = item;
– RbMm
Jan 20 at 15:52
@Yoo the order of the two loads - _head.load(std::memory_order_acquire) and next_tail - plays no role, because next_tail is a local, thread-private variable. But the store to memory _array[current_tail] = item cannot begin until the acquire load has fully completed.
– RbMm
Jan 20 at 15:55
The memory barriers prevent the CPU from reordering accesses to the Element object, which are not using interlocks, across access to the queue structure (here implemented using indexes, but pointers are equally viable).
Using your numbering, it is essential that (3) is performed between (2) and (4), and a memory barrier provides that.
The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full (the proposed site overlaps with valid data). Without the barrier, even though when the condition failed the original data would be restored from the perspective of the Producer thread, intermediate values might be briefly visible to the Consumer.
"The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full." So you mean that the producer could speculatively perform a store before evaluating the condition, and in case of test failure restore the original value? For the condition if(next_tail != _head.load(std::memory_order_acquire)), there is still a chance that the CPU first executes the load + the barrier, then checks the condition, so the (3) operation could still be reordered between the barrier and the condition, which is what we are trying to avoid. Where am I wrong?
– Yoo
Jan 19 at 18:05
@Yoo: You're not wrong, but since the condition won't stall the pipeline, there's zero advantage to performing speculation. In contrast, waiting for a memory fetch can easily stall, so out-of-order execution makes sense. Having zero payoff doesn't equate to formally forbidding speculation, though; to be theoretically correct, I think the memory barrier should be inside the braces of the if, not in the controlling expression.
– Ben Voigt
Jan 19 at 18:15
Thanks for your detailed answers!
– Yoo
Jan 20 at 17:09
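A sketch of what Ben Voigt's suggestion in the comment above could look like, with the acquire ordering expressed as a standalone fence inside the braces of the if rather than on the load itself (my illustration, not code from either answer; assumes <atomic> is included):

// push() variant: a relaxed load decides whether there is room, and an acquire
// fence inside the branch orders the element store after that load on the path
// that actually writes. A relaxed load that reads the value written by the
// consumer's release store, followed by an acquire fence, still establishes the
// synchronizes-with edge ([atomics.fences]).
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed);  // (1)
    const auto next_tail = increment(current_tail);
    if (next_tail != _head.load(std::memory_order_relaxed))           // (2) relaxed load
    {
        std::atomic_thread_fence(std::memory_order_acquire);          // barrier inside the if
        _array[current_tail] = item;                                   // (3)
        _tail.store(next_tail, std::memory_order_release);             // (4)
        return true;
    }
    return false; // full queue
}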