Get byte representation of ASM instruction within C code
Is there a way, within C code, to go from a textual representation of an ASM instruction (like cmpwi r3, 0x20) to its binary representation (0x2c030020)?
I am writing code that will be embedded into another application at runtime. That code is supposed to alter the behaviour / the code of the running program. That means, there is a bunch of code lines like this:
*((volatile int *)(0x80001234)) = 0x2c030020;
That code writes the ASM instruction cmpwi r3, 0x20 to 0x80001234, overwriting the current instruction at that address. Now, having the constant "0x2c030020" in my C code without knowing what that does is bad for maintaining the code. Thus, I'd usually add comments to code like the one above, stating the ASM instruction: // 2c 03 00 20 = cmpwi r3, 0x20
However, from time to time these get out of sync. I might make a quick change to the integer value and forget to update the comment, or I might just make a typo in the comment, causing confusion.
Is there some way I could do something like this instead? (pseudo-code) *((volatile int *)(0x80001234)) = asm("cmpwi r3, 0x20"); which would then result in 0x2c030020 being written to 80001234? Or would I need a hacky solution with a custom preprocessor running over my C source files, replacing ASM instructions with their byte code?
I know there is the C syntax for inline assembler code using the asm() function, but that would execute the given ASM instructions, not give me their binary representation.
c assembly powerpc
The simplest solution may be a map (const char*) -> (unsigned long): create a function, say, unsigned long asm(const char* asm);, whose body would be a huge if statement matching all known strings and returning the corresponding bytecode. Basically, you'll have to hard-code this mapping. Otherwise, you can call an assembler and get its output somehow, but it'll be much more time-consuming.
– ForceBru
Jan 19 at 13:55
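For illustration, the hard-coded mapping described in this comment could look roughly like the sketch below. It uses a small table rather than a chain of if statements; the names insn_entry, insn_table and lookup_insn are hypothetical, and the three encodings are the ones quoted elsewhere in this thread. Note that the lookup happens at run time, so the whole table ends up in the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One table entry: the instruction text and its 32-bit encoding. */
struct insn_entry {
    const char *text;
    uint32_t    word;
};

/* Encodings taken from this thread; the table has to be extended by hand. */
static const struct insn_entry insn_table[] = {
    { "cmpwi r3, 0x20", 0x2c030020u },
    { "addi r3, r3, 0", 0x38630000u },
    { "blr",            0x4e800020u },
};

/* Hypothetical helper: returns 1 and stores the encoding in *out when the
 * text is known, 0 otherwise. It is not called asm because asm is a
 * keyword in GNU C. The match is an exact string comparison. */
static int lookup_insn(const char *text, uint32_t *out)
{
    assert(text != NULL && out != NULL);
    for (size_t i = 0; i < sizeof insn_table / sizeof insn_table[0]; i++) {
        if (strcmp(insn_table[i].text, text) == 0) {
            *out = insn_table[i].word;
            return 1;
        }
    }
    return 0;
}
```

A caller would check the return value before writing the word anywhere, since an unknown string leaves *out untouched.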
The software that converts assembly source code into machine instructions is called an assembler. You can embed one in your code or invoke one in the system. Either way, this is a Bad Idea. There is almost no reason to modify instructions at run time. On many processors, doing so requires invalidating the instruction cache. In many operating systems, it also requires modifying the access permissions of the pages containing instructions to make them writable. The proper way to alter program code during execution is to use if statements, other control structures, and function pointers.
– Eric Postpischil
Jan 19 at 13:57
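To make the instruction cache point concrete, here is a minimal sketch of a patch helper, assuming a GCC or Clang toolchain. The name patch_insn is hypothetical. __builtin___clear_cache is the compilers' generic builtin for this; whether it is sufficient on a particular embedded PowerPC target, or whether explicit cache instructions are needed there, is an assumption to verify against the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: overwrite one 32-bit instruction word at addr and
 * ask the compiler runtime to make the change visible to instruction fetch. */
static void patch_insn(volatile uint32_t *addr, uint32_t insn)
{
    /* PowerPC instructions are 4 bytes long and word-aligned. */
    assert(((uintptr_t)addr & 3u) == 0);

    *addr = insn;

    /* GCC/Clang builtin covering the byte range [addr, addr + 1 word). */
    __builtin___clear_cache((char *)addr, (char *)(addr + 1));
}
```

With such a helper, the store from the question would become patch_insn((volatile uint32_t *)0x80001234, 0x2c030020u);.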
I know that the proper way to alter execution is if or other control structures. However, I have an existing binary for that embedded ppc system, and have no source code for that. I can only mod that by injecting custom code. I thought about a mapping table from string to byte representation, but that would increase the code size drastically, so I am looking for a compile-time solution, not one that works out the value at runtime.
– Florian Bach
Jan 19 at 13:58
If you only need assembly at compile time, not run time, then just use a custom preprocessor pass, or put the assembly in a separate source file and refer to it from the C source code. This is still a bad idea. If you must patch an old executable, it would be better to patch it once, in a static way, to call some dynamic library routine, and then you could link in whatever routine you wanted. (The loader of a dynamic library is in fact code to modify a running program. But it is designed for that and results in supported code [aside from the patch] instead of a kludge.)
– Eric Postpischil
Jan 19 at 14:17
asked Jan 19 at 13:50
Florian Bach
1 Answer
This sounds like a mad thing to do, but I assume you have a good reason for it. Life's no fun without a little bit of madness.
One approach is to run an assembler during your build to generate compile-time constants.
The first step is to make a file that has every assembly instruction you will use, one per line.
For example:
cmpwi 3,0x20
addi 3,3,0
blr
Name that file input.def. Then, use this shell script:
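#!/usr/bin/env bash

# Wrap the instructions from input.def in a minimal assembly file.
(cat << HEADER
.global main
.text
main:
HEADER
cat input.def) > asm.s

powerpc-linux-gnu-as asm.s -o asm.o

# Disassemble, drop everything up to the <main> label, pair each
# disassembled line with the line typed in input.def, and emit one
# #define per instruction into asm.h.
powerpc-linux-gnu-objdump -d asm.o |
    sed '1,/<main>/ d' |
    paste -d'\t' - input.def |
    awk -F'\t' '{
        bytes=$2
        asm=$4
        disasm=$3
        gsub(/ /, "", bytes);
        gsub(/[, ]+/, "_", asm);
        printf("#define ASM_%-20s 0x%s // disassembly: %s\n", asm, bytes, disasm)
    }' > asm.h

# Clean temporaries
rm asm.s asm.o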
(I am using GNU assembler and objdump here. You might need to change this part if you don't use those tools. objdump is being used as a glorified hexdump utility here.)
This shell script:
- Creates an assembly file.
- Assembles it.
- Disassembles the result and puts it side by side with input.def, so that the script can see what assembly you typed.
- Reformats the hex so it is a legal C constant and the asm so it is a legal C identifier, then writes a #define that maps the instruction name to the constant.
- Puts all of this in asm.h.
This is a lot of work, but you can do all of it at compile time.
This produces a header file named asm.h:
#define ASM_cmpwi_3_0x20 0x2c030020 // disassembly: cmpwi r3,32
#define ASM_addi_3_3_0 0x38630000 // disassembly: addi r3,r3,0
#define ASM_blr 0x4e800020 // disassembly: blr
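As a cross-check on these constants, the same values can be derived by hand from the instruction layout. The sketch below is a different technique from the generated header, not part of the script: the macro names PPC_DFORM, PPC_CMPWI, PPC_ADDI and PPC_BLR are hypothetical, and it covers only the three instructions shown. It assumes the standard D-form layout (6-bit primary opcode, two 5-bit fields, 16-bit immediate), with cmpwi as opcode 11 and addi as opcode 14.

```c
#include <assert.h>
#include <stdint.h>

/* D-form: opcode in the top 6 bits, then two 5-bit fields, then a
 * 16-bit immediate in the low half of the word. */
#define PPC_DFORM(op, f1, f2, imm)                      \
    (((uint32_t)(op) << 26) | ((uint32_t)(f1) << 21) |  \
     ((uint32_t)(f2) << 16) | ((uint32_t)(imm) & 0xffffu))

/* cmpwi rA,SIMM: opcode 11, first field 0 (cr0, 32-bit compare). */
#define PPC_CMPWI(ra, simm)    PPC_DFORM(11, 0, (ra), (simm))
/* addi rD,rA,SIMM: opcode 14. */
#define PPC_ADDI(rd, ra, simm) PPC_DFORM(14, (rd), (ra), (simm))
/* blr takes no operands, so its encoding is a fixed word. */
#define PPC_BLR                0x4e800020u

/* Compile-time checks against the values in the generated header. */
static_assert(PPC_CMPWI(3, 0x20) == 0x2c030020u, "cmpwi r3,0x20");
static_assert(PPC_ADDI(3, 3, 0)  == 0x38630000u, "addi r3,r3,0");
```

For cmpwi r3,0x20 the arithmetic is 11 << 26 = 0x2c000000, 3 << 16 = 0x00030000, plus the immediate 0x20, which gives 0x2c030020.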
You use the asm.h file like this:
#include "asm.h"
*((volatile int *)(0x80001234)) = ASM_cmpwi_3_0x20;
If you need a new asm constant, edit input.def and re-run the shell script.
Seems to me it would be a lot easier to get those bytes into your object file by putting an asm(".globl machine_code; machine_code: ;" "cmpwi 3,0x20\n\t" ...); statement at global scope in a C source file. (Maybe use .pushsection .rodata in front of it.) Then declare extern uint32_t machine_code; in C so you can just access the array to copy from it. Your way has the advantage of giving you the machine code as compile-time constants the compiler can see and use as immediates, though.
– Peter Cordes
Jan 19 at 18:52
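A rough sketch of what this comment describes, assuming GCC or Clang with the GNU assembler on an ELF PowerPC target. The array declaration, the MACHINE_CODE_WORDS constant and the copy_code helper are hypothetical additions, and the #else branch exists only so the sketch also compiles on a non-PowerPC host, where the words are written out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
/* Let the assembler encode the instructions into read-only data, so they
 * are stored in the object file but never executed from here. */
__asm__(".pushsection .rodata\n\t"
        ".balign 4\n\t"
        ".globl machine_code\n"
        "machine_code:\n\t"
        "cmpwi 3,0x20\n\t"
        "addi 3,3,0\n\t"
        "blr\n\t"
        ".popsection");
extern const uint32_t machine_code[];
#else
/* Host fallback (assumption for illustration): the same words by hand. */
static const uint32_t machine_code[] = {
    0x2c030020u, 0x38630000u, 0x4e800020u
};
#endif

/* Number of instructions assembled above; must be kept in sync by hand. */
enum { MACHINE_CODE_WORDS = 3 };

/* Hypothetical helper: copy the assembled words to a destination. */
static void copy_code(volatile uint32_t *dst)
{
    assert(dst != NULL);
    for (size_t i = 0; i < MACHINE_CODE_WORDS; i++)
        dst[i] = machine_code[i];
}
```

As the comment notes, the words here are data loaded at run time, so the compiler cannot fold them into immediates the way it can with the #define constants.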
@PeterCordes That sounds like a good approach - you should write that as an answer.
– Nick ODell
Jan 19 at 18:55
answered Jan 19 at 18:45
Nick ODell
2,95511440
2,95511440
1
Seems to me it would be a lot easier to get those bytes into your object file by putting anasm(".globl machine_code; machine_code: ;""cmpwi 3,0x20nt"...);statement at global scope in a C source file. (Maybe use.pushsection .rodatain front of it). Then declareextern uint32_t machine_code;in C so you can just access the array to copy from it. Your way has the advantage of giving you the machine code as compile-time constants the compiler can see and use as immediates though.
– Peter Cordes
Jan 19 at 18:52
@PeterCordes That sounds like a good approach - you should write that as an answer.
– Nick ODell
Jan 19 at 18:55
The simplest solution may be a map (const char*) -> (unsigned long): create a function, say, unsigned long asm(const char* asm);, whose body would be a huge if statement matching all known strings and returning the corresponding bytecode. Basically, you'll have to hard-code this mapping. Otherwise, you can call an assembler and get its output somehow, but it'll be much more time-consuming.
– ForceBru
Jan 19 at 13:55
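A rough sketch of the hard-coded mapping described in the comment above. Because asm is a keyword in GNU C dialects, the sketch uses the hypothetical name encode_insn, and it uses a table in place of a long if chain. The first entry is the pair from the question; the nop entry is added only as a second illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One hard-coded pair: instruction text and its 32-bit encoding. */
struct insn_entry {
    const char *text;
    uint32_t    word;
};

/* Hypothetical table; every supported instruction must be listed by hand. */
static const struct insn_entry insn_table[] = {
    { "cmpwi r3, 0x20", 0x2c030020u },  /* pair taken from the question */
    { "nop",            0x60000000u },  /* ori r0, r0, 0 */
};

/* Look up text by exact string match.
 * Returns 1 and stores the encoding in *word_out on success, 0 otherwise. */
int encode_insn(const char *text, uint32_t *word_out)
{
    size_t count = sizeof insn_table / sizeof insn_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(text, insn_table[i].text) == 0) {
            *word_out = insn_table[i].word;
            return 1;
        }
    }
    return 0;
}
```

The lookup happens at run time and the strings are stored in the binary, so the table grows with every instruction that needs to be supported.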
The software that converts assembly source code into machine instructions is called an assembler. You can embed one in your code or invoke one in the system. Either way, this is a Bad Idea. There is almost no reason to modify instructions at run time. On many processors, doing so requires invalidating the instruction cache. In many operating systems, it also requires modifying the access permissions of the pages containing instructions to make them writable. The proper way to alter program code during execution is to use if statements, other control structures, and function pointers.
– Eric Postpischil
Jan 19 at 13:57
I know that the proper way to alter execution is if or other control structures. However, I have an existing binary for that embedded ppc system, and have no source code for that. I can only mod that by injecting custom code. I thought about a mapping table from string to byte representation, but that would increase the code size drastically, so I am looking for a compile-time solution, not one that works out the value at runtime.
– Florian Bach
Jan 19 at 13:58
If you only need assembly at compile time, not run time, then just use a custom preprocessor pass, or put the assembly in a separate source file and refer to it from the C source code. This is still a bad idea. If you must patch an old executable, it would be better to patch it once, in a static way, to call some dynamic library routine, and then you could link in whatever routine you wanted. (The loader of a dynamic library is in fact code to modify a running program. But it is designed for that and results in supported code [aside from the patch] instead of a kludge.)
– Eric Postpischil
Jan 19 at 14:17