Is this an Intel pipelining instruction?
As far as I know, Intel 8086 pipelining is the technique of
fetching the next instruction while the current instruction is being executed.
This article says that one advantage of pipelining is that it
eliminates the waiting time of the EU (execution unit) and speeds up processing.
I think instructions like lea 0x7(%eax), %ecx can be split into several instructions,
like add $0x7, %eax; lea %eax, %ecx.
My thinking)
So, by definition,
I think the above example matches the definition of Intel 8086 pipelining,
because it executes several instructions in one time slot,
and an operation like this speeds up processing.
Question)
I'm curious whether the instructions below can be an example of pipelining.
main:
mov $0x2, %eax
mov $0x3, %esi
lea (%eax), %ecx # result: 2. Pipelining?
lea 0x7(%eax), %ecx # result: 9. Pipelining?
lea 0x7(%eax,%esi,), %ecx # result: 12. Pipelining?
lea 0x7(,%esi,4), %ecx # result: 19. Pipelining?
lea 0x7(%eax,%esi,4), %ecx # result: 21. Pipelining?
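The results in the comments follow from the effective-address formula that lea evaluates: displacement + base + index * scale. Below is a minimal Python sketch of that arithmetic. The function name `lea` and its keyword parameters are illustrative assumptions for this example, not part of any assembler or library API.

```python
def lea(disp=0, base=0, index=0, scale=1):
    """Model the x86 effective-address calculation: disp + base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4 and 8")
    # Wrap to 32 bits, as the result would in a 32-bit register.
    return (disp + base + index * scale) & 0xFFFFFFFF


eax, esi = 2, 3
results = [
    lea(base=eax),                              # lea (%eax), %ecx
    lea(disp=7, base=eax),                      # lea 0x7(%eax), %ecx
    lea(disp=7, base=eax, index=esi),           # lea 0x7(%eax,%esi,), %ecx
    lea(disp=7, index=esi, scale=4),            # lea 0x7(,%esi,4), %ecx
    lea(disp=7, base=eax, index=esi, scale=4),  # lea 0x7(%eax,%esi,4), %ecx
]
print(results)  # [2, 9, 12, 19, 21]
```

Note that none of these forms modifies %eax or %esi: lea only writes its destination register.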
assembly intel x86-16 disassembly pipelining
Pipelining is a feature of the CPU; it's not applicable to single instructions.
– Jester
Jan 19 at 12:13
And it is not specific to Intel or x86; it is widely used.
– old_timer
Jan 19 at 13:29
Although I don't think the 8086 used it; it had a small prefetch buffer...
– old_timer
Jan 19 at 13:29
asked Jan 19 at 11:48
Jiwon
1 Answer
The very first computers were not pipelined. They fetched an instruction, executed all the cycles it required, and then moved on to the next instruction. On average, an instruction required 5-6 cycles.
This was true of most microprocessors up to the mid-80s, including the 8086 (introduced in 1978).
In the late seventies, designers found that pipelining was an efficient way to improve performance. An early example was the IBM 801, but pipelining's success came with Sun SPARC, Berkeley RISC and MIPS in the mid-eighties.
The idea is to split all instructions into similar stages and to associate these stages with independent hardware resources, so that you can start a new instruction without waiting for the previous one to complete, ideally starting one new instruction per cycle. Because of interactions between instructions (hazards), the real rate is closer to one instruction every ~1.5 cycles, but the gain over the previous generation is huge (roughly 3x the performance).
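The cycle counts in the paragraph above can be checked with a small calculation. This is only a sketch under stated assumptions: a hypothetical 5-stage pipeline, and a stall budget of one stall cycle per two instructions, chosen here purely to reproduce the "~1.5 cycles per instruction" figure. The function names are illustrative.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages, stall_cycles=0):
    """The first instruction takes n_stages cycles to drain through the pipeline;
    every later one completes one cycle after its predecessor, plus any stalls."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


n, k = 1000, 5                                   # assumed workload and pipeline depth
base = cycles_unpipelined(n, k)                  # 5000 cycles
ideal = cycles_pipelined(n, k)                   # 1004 cycles
with_stalls = cycles_pipelined(n, k, n // 2)     # 1504 cycles, about 1.5 per instruction
print(base, ideal, with_stalls)
print(round(base / with_stalls, 2))              # about 3.32x speedup
```

With these assumed numbers the speedup comes out a little above 3x, in line with the estimate in the text; a deeper pipeline or fewer stalls would change the ratio.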
The problem is that pipelining works best with specific instruction sets, based on simple data movement (called RISC instruction sets). New computers were based on this scheme, but older processors' instruction sets (including x86) were not designed for it.
To benefit from pipelining while keeping backward compatibility, Intel chose a microarchitecture made of two parts: the first fetches x86 instructions and translates them into pipelinable ones (called μOps), and the second is a pipeline that executes these μOps. This was first introduced with the Pentium Pro (1995) and is present in its successors.
The code that you give
lea 0x7(%eax), %ecx
translated to
add $0x7, %eax;
lea %eax, %ecx
illustrates the idea of μOps translation. Translation is particularly useful to convert ALU operations with an in-memory operand into simpler operations that each perform a single task (a memory transfer or an ALU operation).
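As a rough illustration of this kind of translation, the sketch below splits a two-operand instruction in AT&T operand order into toy micro-operations. Everything here is an assumption made for the example: the `Uop` type, the `decode` function and the textual μOp format are hypothetical, and real μOp counts differ between microarchitectures (a store may, for instance, be split further).

```python
from typing import List, NamedTuple


class Uop(NamedTuple):
    kind: str   # "load", "alu" or "store"
    text: str   # human-readable description of the toy micro-op


def decode(mnemonic: str, src: str, dst: str) -> List[Uop]:
    """Split `mnemonic src, dst` (AT&T order) into single-task toy micro-ops."""
    def is_mem(operand: str) -> bool:
        return "(" in operand

    if mnemonic == "lea":
        # lea only computes an address; it never accesses memory.
        return [Uop("alu", f"{dst} <- address_of {src}")]
    if is_mem(src) and is_mem(dst):
        raise ValueError("x86 ALU instructions take at most one memory operand")
    if is_mem(dst):
        # Memory destination: read, modify, write back.
        return [Uop("load", f"tmp <- [{dst}]"),
                Uop("alu", f"tmp <- tmp {mnemonic} {src}"),
                Uop("store", f"[{dst}] <- tmp")]
    if is_mem(src):
        return [Uop("load", f"tmp <- [{src}]"),
                Uop("alu", f"{dst} <- {dst} {mnemonic} tmp")]
    return [Uop("alu", f"{dst} <- {dst} {mnemonic} {src}")]


for uop in decode("add", "%eax", "(%ebx)"):
    print(uop.kind, ":", uop.text)
```

In this toy model a memory-destination add becomes load + add + store, while lea stays a single ALU operation because it only performs address arithmetic.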
Today practically all processors are pipelined (except some simple microcontrollers or some processors embedded in FPGAs).
Whatever sequence of instructions you give will be executed in a pipeline, including of course the ones in your question. The only limitation is that, depending on how the instructions interact, there can be hazards that slow down (stall) the pipeline.
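A minimal sketch of how such a stall arises, assuming a toy in-order pipeline that issues at most one instruction per cycle and delays an instruction until its source registers are ready. The `schedule` function, the `(dest, sources)` encoding and the `latency` values are hypothetical choices for illustration, not measurements of any real CPU.

```python
def schedule(instructions, latency=1):
    """Return the issue cycle of each instruction.

    instructions: list of (dest_register, [source_registers]).
    latency: cycles between issuing an instruction and its result being usable.
    """
    ready = {}            # register -> first cycle a consumer may issue
    issue_cycles = []
    next_slot = 0         # in-order: one instruction per cycle at most
    for dest, sources in instructions:
        cycle = max([next_slot] + [ready.get(reg, 0) for reg in sources])
        issue_cycles.append(cycle)
        ready[dest] = cycle + latency
        next_slot = cycle + 1
    return issue_cycles


program = [
    ("eax", []),               # mov $0x2, %eax
    ("esi", []),               # mov $0x3, %esi
    ("ecx", ["eax"]),          # lea (%eax), %ecx
    ("ecx", ["eax", "esi"]),   # lea 0x7(%eax,%esi,4), %ecx
]
print(schedule(program, latency=1))  # [0, 1, 2, 3]: no stall
print(schedule(program, latency=3))  # [0, 1, 3, 4]: one stall cycle before the first lea
```

With a result latency of one cycle the dependent instructions flow back to back; with an assumed latency of three cycles the first lea has to wait for %eax, which is the kind of slowdown a hazard causes.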
I think above example matches the definition of intel 8086 pipelining
Pipelining is a characteristic of a microarchitecture, not of an instruction set. So the 8086 microarchitecture was NOT pipelined, but subsequent implementations of its instruction set (called x86 or IA-32) are.
because it executes several instructions at one time slot
You are actually right that several instructions can be started in one time slot, but this is another technique, built on top of pipelining, that allows parallel execution at the instruction level and is called superscalar execution.
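To illustrate the difference, here is a minimal sketch of in-order superscalar issue: instructions are packed greedily into groups of up to `width` that start in the same cycle, and a new group begins when an instruction reads or writes a register already written in the current group. The `issue_groups` function and the pairing rule are simplifying assumptions for this example, not the rules of any specific processor.

```python
def issue_groups(instructions, width=2):
    """Pack in-order (dest, sources) instructions into same-cycle issue groups."""
    groups, current, written = [], [], set()
    for dest, sources in instructions:
        conflict = dest in written or any(reg in written for reg in sources)
        if current and (len(current) == width or conflict):
            groups.append(current)          # close the group, start a new cycle
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


independent = [("eax", []), ("esi", []), ("ecx", ["eax"]), ("edx", ["esi"])]
chain = [("eax", []), ("ecx", ["eax"]), ("edx", ["ecx"])]

print(len(issue_groups(independent, width=2)))  # 2 cycles: two pairs
print(len(issue_groups(independent, width=1)))  # 4 cycles: scalar issue
print(len(issue_groups(chain, width=2)))        # 3 cycles: dependencies prevent pairing
```

The dependent chain gains nothing from the wider issue, which is why superscalar execution depends on the instruction-level parallelism available in the code.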
The original 8086 pipelines instruction prefetch, with a 6-byte buffer. This is really easy to implement compared to pipelining decode / execute / etc., but it's still pipelining that some even earlier CPUs didn't have.
– Peter Cordes
Jan 19 at 18:33
PPro (first-gen P6 family) was the first to decode x86 instructions into RISC-like uops. 486 and Pentium just pipelined x86 instructions when possible, making it more efficient to use more but simpler x86 instructions (e.g. avoiding memory-destination add [mem], reg, because even though it can pair in either pipe, it takes 3 cycles before another (pair of) instruction(s) can start to execute.) Also avoiding push/pop was sometimes good on P5 Pentium. But mostly that was good for P6 before they added a stack engine to get rid of the stack-pointer update uops, while P5 could pair push/pop.
– Peter Cordes
Jan 19 at 18:39
Splitting LEA into ADD and LEA (I guess actually MOV?) is hardly a good example. It's that sort of idea, but that decoding doesn't even make sense (it modifies the original EAX), and shift+add is not hard so LEA is always a single uop on all CPUs that decode complex instructions to multiple uops. A better example is a memory-destination add, which is load + add + store.
– Peter Cordes
Jan 19 at 18:41
Yes, pipelined vs. non-pipelined is an implementation decision. But some ISAs explicitly expose some of the pipelining, e.g. branch-delay slots on MIPS and some other RISCs, and especially early MIPS load delay slots (using the result of a load in the next instruction gives unpredictable results; the first-gen MIPS CPU hardware doesn't interlock for the dependency, so the ISA leaves that undefined). Also, the Mill has delayed loads that only write their destination register after some number of other instructions. (So you can expose ILP to an in-order pipeline.)
– Peter Cordes
Jan 19 at 18:45
See Agner Fog's microarch pdf (agner.org/optimize) for details on the pipeline in P5 Pentium vs. P6, and vs. in-order Atom. (In-order Atom (not Silvermont) does decode to uops, unlike in-order P5.)
– Peter Cordes
Jan 19 at 18:47
edited Jan 27 at 12:47
answered Jan 19 at 18:11
Alain Merigot
1
Original 8086 pipelines instruction prefetch, with a 6 byte buffer. This is really easy to implement compared to pipelining decode / execute / etc, but it's still pipelining that some even earlier CPUs didn't have.
– Peter Cordes
Jan 19 at 18:33
1
PPro (first-gen P6 family) was the first to decode x86 instructions into RISC-like uops. 486 and Pentium just pipelined x86 instructions when possible, making it more efficient to use more but simpler x86 instructions (e.g. avoiding memory-destinationadd [mem], reg, because even though it can pair in either pipe, it takes 3 cycles before another (pair of) instruction(s) can start to execute.) Also avoiding push/pop was sometimes good on P5 Pentium. But mostly that was good for P6 before they added a stack engine to get rid of the stack-pointer update uops, while P5 could pair push/pop.
– Peter Cordes
Jan 19 at 18:39
1
Splitting LEA into ADD and LEA (I guess actually MOV?) is hardly a good example. It's that sort of idea, but that decoding doesn't even make sense (it modifies the original EAX), and shift+add is not hard so LEA is always a single uop on all CPUs that decode complex instructions to multiple uops. A better example is a memory-destination add, which is load + add + store.
– Peter Cordes
Jan 19 at 18:41
Yes, pipelined vs. non-pipelined is an implementation decision. But some ISAs explicitly expose some of the pipelining. e.g. branch-delay slots on MIPS and some other RISCs, and especially early MIPS load delay slots (using the result of a load in the next instruction gives unpredictable results; the first-gen MIPS CPU hardware doesn't interlock for the dependency, so the ISA leaves that undefined). Also, the Mill has delayed loads only write their destination register after some number of other instructions. (So you can expose ILP to an in-order pipeline.)
– Peter Cordes
Jan 19 at 18:45
See Agner Fog's microarch pdf (agner.org/optimize) for details on the pipeline in P5 Pentium vs. P6, and vs. in-order Atom. (In-order Atom (not Silvermont) does decode to uops, unlike in-order P5.)
– Peter Cordes
Jan 19 at 18:47
|
show 1 more comment
1
Original 8086 pipelines instruction prefetch, with a 6 byte buffer. This is really easy to implement compared to pipelining decode / execute / etc, but it's still pipelining that some even earlier CPUs didn't have.
– Peter Cordes
Jan 19 at 18:33
1
PPro (first-gen P6 family) was the first to decode x86 instructions into RISC-like uops. 486 and Pentium just pipelined x86 instructions when possible, making it more efficient to use more but simpler x86 instructions (e.g. avoiding memory-destinationadd [mem], reg, because even though it can pair in either pipe, it takes 3 cycles before another (pair of) instruction(s) can start to execute.) Also avoiding push/pop was sometimes good on P5 Pentium. But mostly that was good for P6 before they added a stack engine to get rid of the stack-pointer update uops, while P5 could pair push/pop.
– Peter Cordes
Jan 19 at 18:39
1
Splitting LEA into ADD and LEA (I guess actually MOV?) is hardly a good example. It's that sort of idea, but that decoding doesn't even make sense (it modifies the original EAX), and shift+add is not hard so LEA is always a single uop on all CPUs that decode complex instructions to multiple uops. A better example is a memory-destination add, which is load + add + store.
– Peter Cordes
Jan 19 at 18:41
Yes, pipelined vs. non-pipelined is an implementation decision. But some ISAs explicitly expose some of the pipelining. e.g. branch-delay slots on MIPS and some other RISCs, and especially early MIPS load delay slots (using the result of a load in the next instruction gives unpredictable results; the first-gen MIPS CPU hardware doesn't interlock for the dependency, so the ISA leaves that undefined). Also, the Mill has delayed loads only write their destination register after some number of other instructions. (So you can expose ILP to an in-order pipeline.)
– Peter Cordes
Jan 19 at 18:45
4
Pipelining is a feature of the CPU; it's not applicable to single instructions.
– Jester
Jan 19 at 12:13
And it is not specific to Intel or x86; it is widely used.
– old_timer
Jan 19 at 13:29
Although I don't think the 8086 used it, it had a small prefetch buffer...
– old_timer
Jan 19 at 13:29