Lock free single producer/single consumer circular buffer












I have been looking at a lock free single producer/single consumer circular buffer on this website, and I can't figure out why a specific memory barrier is needed.
I have carefully read the standard's rules about memory order a hundred times, but I don't understand what I'm missing.



With this implementation, only a single, unique thread may call the push() function, and only one other, unique thread may call the pop() function.



Here is the Producer code:



bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed);  // (1)
    const auto next_tail = increment(current_tail);

    if(next_tail != _head.load(std::memory_order_acquire))            // (2)
    {
        _array[current_tail] = item;                                   // (3)
        _tail.store(next_tail, std::memory_order_release);             // (4)
        return true;
    }
    return false; // full queue
}


Here is the Consumer code:



bool pop(Element& item)
{
    const auto current_head = _head.load(std::memory_order_relaxed);  // (1)
    if(current_head == _tail.load(std::memory_order_acquire))         // (2)
        return false; // empty queue

    item = _array[current_head];                                       // (3)
    _head.store(increment(current_head), std::memory_order_release);   // (4)
    return true;
}
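
For reference, here is a minimal sketch of the class these two member functions could belong to. The member names and the increment() helper follow the snippets above; the template parameters, the extra slot used to tell full from empty, and the initial values are my assumptions, not code taken from the linked site:

#include <atomic>
#include <cstddef>

template <typename Element, std::size_t Size>
class CircularFifo
{
public:
    bool push(const Element& item);   // called only by the producer thread; body as above
    bool pop(Element& item);          // called only by the consumer thread; body as above

private:
    // One extra slot distinguishes a full queue from an empty one.
    static constexpr std::size_t Capacity = Size + 1;

    static std::size_t increment(std::size_t idx)
    {
        return (idx + 1) % Capacity;
    }

    std::atomic<std::size_t> _tail{0};   // written by the producer, read by the consumer
    std::atomic<std::size_t> _head{0};   // written by the consumer, read by the producer
    Element _array[Capacity];
};

A producer thread would then call push() in a loop while a single consumer thread calls pop(); no other thread may touch the queue.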


I understand why the Producer (4) and the Consumer (2) statements are absolutely needed: we have to make sure that all writes that happened before the Producer's release store at (4) become visible side effects once the Consumer sees the stored value.



I also understand why the Consumer (4) statement is needed: it makes sure that the Consumer's (3) load is performed before the Consumer's (4) store.
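
To make that "(3) before (4)" constraint explicit, the same consumer-side ordering could also be written with a standalone release fence followed by a relaxed store. This is only a sketch of an equivalent formulation using the members from the snippets above, not the code from the linked site:

bool pop(Element& item)
{
    const auto current_head = _head.load(std::memory_order_relaxed);   // (1)
    if(current_head == _tail.load(std::memory_order_acquire))          // (2)
        return false; // empty queue

    item = _array[current_head];                                        // (3)

    // The release fence keeps (3) from sinking below the store, and it lets the
    // producer's acquire load of _head (push(2)) synchronize with this thread
    // once that load observes the value stored at (4).
    std::atomic_thread_fence(std::memory_order_release);
    _head.store(increment(current_head), std::memory_order_relaxed);    // (4)
    return true;
}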



The question




  • Why does the Producer (2) load need to be performed with acquire semantics (instead of relaxed)? Is it to prevent Producer (3) or (4) from being reordered before the condition (at compile time or at runtime)?










c++ multithreading c++11 atomic lock-free

asked Jan 19 at 14:47 by Yoo, edited Jan 20 at 17:10

























  • One question per question please.

    – πάντα ῥεῖ
    Jan 19 at 14:50











  • @πάνταῥεῖ done ;)

    – Yoo
    Jan 19 at 14:52
















2 Answers














We need to prove that



_array[current_tail] = item; // push(3)


is executed, in the case where (current_head == current_tail), only after



item = _array[current_head]; // pop(3)


has completed. We can overwrite a cell only after the data in it has already been copied to item.



The



_head.load(std::memory_order_acquire) // push(2)


synchronizes with



_head.store(increment(current_head), std::memory_order_release);   //pop(4)


via Release-Acquire ordering:



All memory writes ( pop(3) ) that happened-before the atomic store-release ( pop(4) ) on _head become visible side effects once the atomic load-acquire ( push(2) ) on _head has completed.



So the Producer code, after push(2) has completed, is guaranteed to see the result of pop(3). This means the data from _array[current_head] has been copied to item, that copy is visible to the Producer after push(2), and therefore _array[current_head] is already free.



From the other side, per the description of a memory_order_acquire load, no reads or writes ( push(3) ) in the current thread can be reordered before this load. So push(3) executes only after the push(2) load has completed, and by that point pop(3) has already completed:



item = _array[current_head];                                       // pop(3)
_head.store(increment(current_head), std::memory_order_release);   // pop(4)
-----
_head.load(std::memory_order_acquire);                             // push(2)
_array[current_tail] = item;                                       // push(3)
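
Conversely, here is a sketch (an illustrative broken variant, not code from the question) of what the acquire at push(2) rules out. If that load is demoted to relaxed, reading the consumer's updated _head no longer synchronizes with pop(4), so nothing guarantees that pop(3) is complete before the slot is overwritten:

// BROKEN, for illustration only: push(2) demoted to relaxed.
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed);   // (1)
    const auto next_tail = increment(current_tail);

    if(next_tail != _head.load(std::memory_order_relaxed))             // (2) relaxed!
    {
        // Even if the relaxed load above observed the value written by the
        // consumer's release store pop(4), a relaxed read does not synchronize
        // with it, so the consumer's copy out of this slot (pop(3)) is not
        // guaranteed to happen-before the write below: the two accesses to
        // _array[current_tail] may race and the consumer can see a torn Element.
        _array[current_tail] = item;                                    // (3)
        _tail.store(next_tail, std::memory_order_release);              // (4)
        return true;
    }
    return false; // full queue
}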





answered Jan 19 at 19:40 by RbMm
























  • Thanks for the detailed answer. Something still bothers me about the barrier in the condition of the Producer code. Because of possible speculative execution, the line if(next_tail != _head.load(std::memory_order_acquire)) could be compiled as const auto head = _head.load(std::memory_order_acquire); // then perform the stores here ... if(next_tail == head) { // restore the memory as it was }. As you can see, the condition has been reversed due to the speculative store, and that could corrupt the memory if the pop function were called. To me, the acquire barrier should be in the condition.

    – Yoo
    Jan 20 at 15:37













  • @Yoo - I don't understand what you are trying to say. When we do a load with memory_order_acquire, no reads or writes in the current thread can be reordered before this load.

    – RbMm
    Jan 20 at 15:46











  • I want to say that the load-acquire could be executed before the condition test, so the store could still be executed before the condition (but still after the barrier). Is that clearer? Just because you write if(next_tail != _head.load(std::memory_order_acquire)) doesn't mean it can't be interpreted as first a load, then the condition.

    – Yoo
    Jan 20 at 15:50













  • @Yoo - no, you are wrong. The _head.load(std::memory_order_acquire) will be executed before _array[current_tail] = item;

    – RbMm
    Jan 20 at 15:52











  • @Yoo the order in which the two loads from memory happen - _head.load(std::memory_order_acquire) and the one that produces next_tail - plays no role, because next_tail is a local variable, private to the thread. But the store to memory, _array[current_tail] = item, cannot begin until the load-acquire has fully completed.

    – RbMm
    Jan 20 at 15:55

































The memory barriers prevent the CPU from reordering accesses to the Element object, which are not using interlocks, across access to the queue structure (here implemented using indexes, but pointers are equally viable).



Using your numbering, it is essential that (3) is performed between (2) and (4), and a memory barrier provides that.



The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full (the proposed site overlaps with valid data). Without the barrier, even though when the condition failed the original data would be restored from the perspective of the Producer thread, intermediate values might be briefly visible to the Consumer.






answered Jan 19 at 16:32 by Ben Voigt, edited Jan 19 at 16:37


























  • "The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full." So you mean that the producer could speculatively perform a store before evaluating the condition, and in case of test failure restore the original value? For the condition ( if(next_tail != _head.load(std::memory_order_acquire)) ), there is still a chance that the CPU first executes the load + the barrier and then checks the condition, so the (3) operation could still be reordered between the barrier and the condition, which is what we are trying to avoid. Where am I wrong?

    – Yoo
    Jan 19 at 18:05











  • @Yoo: You're not wrong, but since the condition won't stall the pipeline, there's zero advantage to performing speculation. In contrast, waiting for a memory fetch can easily stall, so out-of-order execution makes sense there. Having zero payoff doesn't equate to formally forbidding speculation, though; to be theoretically correct, I think the memory barrier should be inside the braces of the if, not in the controlling expression (see the sketch after these comments).

    – Ben Voigt
    Jan 19 at 18:15













  • Thanks for your detailed answers !

    – Yoo
    Jan 20 at 17:09
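
One way to get the placement Ben Voigt suggests in the comment above - the ordering constraint sitting inside the braces rather than in the controlling expression - is a relaxed load of _head paired with an explicit acquire fence inside the branch. This is a sketch of a possible variant, not the code from the linked site:

bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed);   // (1)
    const auto next_tail = increment(current_tail);

    if(next_tail != _head.load(std::memory_order_relaxed))             // (2) now relaxed
    {
        // The acquire fence synchronizes with the consumer's release store on
        // _head (pop(4)), because the relaxed load above read the value that
        // store wrote, and it forbids the write below from moving above this
        // point - so pop(3) is complete before the slot is overwritten.
        std::atomic_thread_fence(std::memory_order_acquire);
        _array[current_tail] = item;                                    // (3)
        _tail.store(next_tail, std::memory_order_release);              // (4)
        return true;
    }
    return false; // full queue
}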










