Get byte representation of ASM instruction within C code












Is there a way, within C code, to go from a textual representation of an ASM instruction (like cmpwi r3, 0x20) to its binary representation (0x2c030020)?



I am writing code that will be embedded into another application at runtime. That code is supposed to alter the behaviour / the code of the running program. That means, there is a bunch of code lines like this:



*((volatile int *)(0x80001234)) = 0x2c030020;



That code writes the ASM instruction cmpwi r3, 0x20 to 0x80001234, overwriting the current instruction at that address. A bare constant like 0x2c030020 in my C code, with no indication of what it does, is bad for maintainability. So I usually add a comment to lines like the one above, stating the ASM instruction: // 2c 03 00 20 = cmpwi r3, 0x20



However, from time to time these get out of sync. I might make a quick change to the integer value and forget to update the comment, or I might just make a typo in the comment, causing confusion.



Is there some way I could do something like this instead (pseudo-code)? *((volatile int *)(0x80001234)) = asm("cmpwi r3, 0x20"); This would then result in 0x2c030020 being written to 0x80001234. Or would I need a hacky solution with a custom preprocessor running over my C source files, replacing ASM instructions with their byte code?



I know C compilers offer inline assembly via the asm() construct, but that would execute the given ASM instructions, not give me their binary representation.










  • The simplest solution may be a map (const char*) -> (unsigned long): create a function, say, unsigned long asm(const char* asm);, whose body would be a huge if statement matching all known strings and returning the corresponding bytecode. Basically, you'll have to hard-code this mapping. Otherwise, you can call an assembler and get its output somehow, but it'll be much more time-consuming.

    – ForceBru
    Jan 19 at 13:55






  • The software that converts assembly source code into machine instructions is called an assembler. You can embed one in your code or invoke one in the system. Either way, this is a Bad Idea. There is almost no reason to modify instructions at run time. On many processors, doing so requires invalidating instruction cache. In many operating systems, it also requires modifying the access permissions of the pages containing instructions to make them writable. The proper way to alter program code during execution is to use if statements, other control structures, and function pointers.

    – Eric Postpischil
    Jan 19 at 13:57











  • I know that the proper way to alter execution is if or other control structures. However, I have an existing binary for that embedded ppc system, and have no source code for that. I can only mod that by injecting custom code. I thought about a mapping table from string to byte representation, but that would increase the code size drastically, so I am looking for a compile-time solution, not one that works out the value at runtime.

    – Florian Bach
    Jan 19 at 13:58













  • If you only need assembly at compile time, not run time, then just use a custom preprocessor pass, or put the assembly in a separate source file and refer to it from the C source code. This is still a bad idea. If you must patch an old executable, it would be better to patch it once, in a static way, to call some dynamic library routine, and then you could link in whatever routine you wanted. (The loader of a dynamic library is in fact code to modify a running program. But it is designed for that and results in supported code [aside from the patch] instead of a kludge.)

    – Eric Postpischil
    Jan 19 at 14:17


















c assembly powerpc
















asked Jan 19 at 13:50









Florian Bach



























1 Answer














This sounds like a mad thing to do, but I assume you have a good reason for it. Life's no fun without a little bit of madness.



One approach is to run an assembler during your build to generate compile-time constants.



The first step is to make a file that has every assembly instruction you will use, one per line.



For example:



cmpwi   3,0x20
addi 3,3,0
blr


Name that file input.def. Then, use this shell script:



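#!/usr/bin/env bash

# Wrap the instructions from input.def in a minimal assembly file
(cat << HEADER
.global main
.text
main:
HEADER
cat input.def) > asm.s

powerpc-linux-gnu-as asm.s -o asm.o

# Disassemble, drop everything up to the <main> label, then pair each
# disassembly line with the matching line of input.def (tab-separated)
powerpc-linux-gnu-objdump -d asm.o |
sed '1,/<main>/ d' |
paste -d'\t' - input.def |
awk -F'\t' '{
bytes=$2
asm=$4
disasm=$3
gsub(/ /, "", bytes);
gsub(/[, ]+/, "_", asm);
printf("#define ASM_%-20s 0x%s // disassembly: %s\n", asm, bytes, disasm)
}' > asm.h

# Clean temporaries
rm asm.s asm.o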


(I am using GNU assembler and objdump here. You might need to change this part if you don't use those tools. objdump is being used as a glorified hexdump utility here.)



This shell script:




  1. Creates an assembly file (asm.s) from input.def.

  2. Assembles it.

  3. Disassembles the object file and puts the output side by side with input.def. (This is so the script can see what assembly you typed.)

  4. Reformats the hex so it is a legal C constant and the asm so it is a legal C identifier, then writes a #define mapping the instruction name to the constant.

  5. Puts all of this in asm.h.


This is a lot of work, but you can do all of it at compile time.



This produces a header file named asm.h:



#define ASM_cmpwi_3_0x20         0x2c030020 // disassembly: cmpwi   r3,32
#define ASM_addi_3_3_0           0x38630000 // disassembly: addi r3,r3,0
#define ASM_blr                  0x4e800020 // disassembly: blr
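One way to sanity-check generated values like these is to rebuild the same words from the PowerPC D-form instruction fields (6-bit primary opcode, two 5-bit register fields, 16-bit immediate). The sketch below is only an illustration: the macro names PPC_DFORM, PPC_CMPWI and PPC_ADDI are hypothetical and not part of any toolchain, and it covers just the two D-form instructions from the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assemble a D-form word from its fields.
   Layout: opcode (6 bits) | rt (5 bits) | ra (5 bits) | immediate (16 bits). */
#define PPC_DFORM(op, rt, ra, imm) \
    ((uint32_t)(((uint32_t)(op) << 26) | ((uint32_t)(rt) << 21) | \
                ((uint32_t)(ra) << 16) | ((uint32_t)(imm) & 0xFFFFu)))

/* cmpwi rA,SIMM is cmpi with crfD = 0 and L = 0; primary opcode 11. */
#define PPC_CMPWI(ra, simm)     PPC_DFORM(11, 0, (ra), (simm))
/* addi rD,rA,SIMM; primary opcode 14. */
#define PPC_ADDI(rd, ra, simm)  PPC_DFORM(14, (rd), (ra), (simm))

/* Compile-time cross-check against the values in the generated header. */
static_assert(PPC_CMPWI(3, 0x20) == 0x2c030020u, "cmpwi r3,0x20");
static_assert(PPC_ADDI(3, 3, 0) == 0x38630000u, "addi r3,r3,0");
```

Macros like these stay constant expressions, so the compiler can still fold them into immediates, but every instruction form needs its own hand-written encoder, which is what the assembler-based script avoids.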


You use the asm.h file like this:



#include "asm.h"
*((volatile int *)(0x80001234)) = ASM_cmpwi_3_0x20;


If you need a new asm constant, edit input.def and re-run the shell script.
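As the comments on the question point out, many processors need the instruction cache invalidated after code is modified. A minimal sketch of a patch helper follows, assuming a GCC-compatible compiler. The name patch_insn is hypothetical, and the cache-maintenance sequence is the commonly used PowerPC one, which may need adjusting for your particular core and memory setup.

```c
#include <stdint.h>

/* Hypothetical helper: store one 32-bit instruction word and, on PowerPC,
   make the new word visible to instruction fetch. */
static inline void patch_insn(volatile uint32_t *addr, uint32_t insn)
{
    *addr = insn;
#if defined(__powerpc__) || defined(__PPC__)
    /* Assumption: the usual sequence for modified code on PowerPC. */
    __asm__ volatile("dcbst 0,%0\n\t"  /* write the data cache block back to memory */
                     "sync\n\t"        /* wait for the write-back to complete */
                     "icbi 0,%0\n\t"   /* invalidate the stale instruction cache block */
                     "sync\n\t"
                     "isync"           /* discard already-fetched instructions */
                     :
                     : "r"(addr)
                     : "memory");
#endif
}
```

With the generated header, a patch site would then read patch_insn((volatile uint32_t *)0x80001234, ASM_cmpwi_3_0x20);. On other architectures the helper compiles down to the plain store.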






  • Seems to me it would be a lot easier to get those bytes into your object file by putting an asm(".globl machine_code; machine_code: ;" "cmpwi 3,0x20\n\t" ... ); statement at global scope in a C source file. (Maybe use .pushsection .rodata in front of it). Then declare extern uint32_t machine_code; in C so you can just access the array to copy from it. Your way has the advantage of giving you the machine code as compile-time constants the compiler can see and use as immediates though.

    – Peter Cordes
    Jan 19 at 18:52













  • @PeterCordes That sounds like a good approach - you should write that as an answer.

    – Nick ODell
    Jan 19 at 18:55
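The global-scope asm idea from the comment above could be sketched roughly as follows, assuming a GNU toolchain and an ELF target. The label machine_code comes from the comment; the INSN_* macros and copy_insns are hypothetical names. On a non-PowerPC host the mnemonics are replaced by .long directives with the same values, purely so the sketch still builds there.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: on a PowerPC toolchain the assembler encodes the mnemonics.
   On any other host the same words are emitted with .long as a stand-in. */
#if defined(__powerpc__) || defined(__PPC__)
#define INSN_CMPWI_R3_0X20 "cmpwi 3,0x20\n\t"
#define INSN_BLR           "blr\n\t"
#else
#define INSN_CMPWI_R3_0X20 ".long 0x2c030020\n\t"
#define INSN_BLR           ".long 0x4e800020\n\t"
#endif

/* Global-scope asm: place the instruction words in .rodata under a label. */
__asm__(".pushsection .rodata\n\t"
        ".balign 4\n\t"
        ".globl machine_code\n"
        "machine_code:\n\t"
        INSN_CMPWI_R3_0X20
        INSN_BLR
        ".popsection\n");

/* The label is visible to C as an array of 32-bit instruction words. */
extern const uint32_t machine_code[];

/* Copy n instruction words, starting at index first, to dst. */
static inline void copy_insns(volatile uint32_t *dst, size_t first, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = machine_code[first + i];
}
```

The trade-off is the one the comment names: the words live in memory and are copied at run time, so the compiler cannot use them as immediates the way it can with the #define constants.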





















answered Jan 19 at 18:45









Nick ODell









  • 1





    Seems to me it would be a lot easier to get those bytes into your object file by putting an asm(".globl machine_code; machine_code: ;" "cmpwi 3,0x20\n\t" ... ); statement at global scope in a C source file. (Maybe use .pushsection .rodata in front of it). Then declare extern uint32_t machine_code; in C so you can just access the array to copy from it. Your way has the advantage of giving you the machine code as compile-time constants the compiler can see and use as immediates though.

    – Peter Cordes
    Jan 19 at 18:52













  • @PeterCordes That sounds like a good approach - you should write that as an answer.

    – Nick ODell
    Jan 19 at 18:55































