Project Stage 2.B
Project Stage 2 – Part B: Comparing GIMPLE of Cloned Functions
In the previous post, I wrote about how I detected cloned functions in the compiled program. I used naming conventions (like .constprop, .variant, and .resolver) to group clones by their original function name.
Now that I’ve found these cloned functions, it’s time to look inside them—into their GIMPLE representation—and figure out if they’re actually doing the same thing. This blog is all about how I accessed, compared, and analyzed the internal structure of these functions.
Wait, What Is GIMPLE Again?
GIMPLE is a simplified version of C/C++ code that GCC uses for optimization. It’s kind of like a low-level, easy-to-analyze form of your code, broken down into really basic steps (like x = y + 1).
Each function is made up of:
Basic Blocks: These are sequences of instructions with no jumps except at the end.
GIMPLE Statements: Individual operations or assignments.
In this part, I extracted GIMPLE from each clone and its base function to compare:
Number of basic blocks
Number of GIMPLE statements
(Later on) Whether those statements are similar
How I Got GIMPLE from Functions
First, I looped through every function—same as in Part A. But this time, for each function, I stored their basic block and GIMPLE counts in a std::map using the function name as the key.
Here’s the structure I used:
struct FunctionInfo {
int basic_block_count;
int gimple_statement_count;
};
std::map<std::string, FunctionInfo> function_data;
Then, while visiting each function:
FunctionInfo info = {0, 0};
FOR_EACH_BB_FN(bb, fun) {
info.basic_block_count++;
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
info.gimple_statement_count++;
}
}
function_data[full_name] = info;
This stored the structure of every function I saw. So now, I could finally compare each clone with its original!
Comparing Clones to Their Originals
Once I had data for all the functions, I looped through them again and compared each clone to its base function (from Part A, we saved base names by stripping off suffixes like .constprop.0, etc).
Here’s how I did the comparison:
if (is_clone) {
std::string clone_name(full_name);
std::string base_name(base_name_str);
if (function_data.count(base_name)) {
FunctionInfo clone_info = function_data[clone_name];
FunctionInfo base_info = function_data[base_name];
bool same_bb_count = (clone_info.basic_block_count == base_info.basic_block_count);
bool same_stmt_count = (clone_info.gimple_statement_count == base_info.gimple_statement_count);
if (same_bb_count && same_stmt_count) {
fprintf(stderr, "CLONE MATCH: %s is similar to base %s\n", full_name, base_name_str);
} else {
fprintf(stderr, "CLONE DIFFERENT: %s differs from base %s\n", full_name, base_name_str);
}
}
}
So in short:
If the clone has the same number of basic blocks and GIMPLE statements as the base → mark it as similar.
Otherwise → it might have meaningful differences, so flag it as different.
This isn’t a deep semantic comparison (yet!), but it’s a good first filter before deciding whether to prune.
Reflection: GIMPLE Ain’t So Scary After All
Honestly, when I first saw the term “GIMPLE,” it felt super low-level and intimidating. But once I started working with it, I realized you don’t need to understand every detail of the IR to start making useful analyses. Just counting basic blocks and statements gives you a surprisingly good signal.
One tricky part was dealing with architecture-specific clones. On x86_64, you might not get the same suffixes or the same number of clones as on aarch64. I had to run my pass on both and compare outputs to make sure it was behaving consistently.
The best part? Watching the logs show:
CLONE MATCH: scale_samples.constprop.0 is similar to base scale_samples
It meant my analysis was working, my pass was finding similarities all on its own.
What’s Next?
In the final part (Part C), I’ll use this comparison data to make pruning recommendations. That means:
If two functions are “substantially the same,” I’ll suggest keeping one and pruning the rest.
I’ll also log these decisions clearly so other passes or users can act on them.
Stay tuned for the last part of the Clone-Pruning journey!
Comments
Post a Comment