Project Stage 2.A
Project Stage 2 – Part A: Finding Cloned Functions
For Stage 2 of my SPO600 project, I’m taking the basic GCC analysis pass I built in Stage 1 and evolving it into something
a bit more interesting—analyzing cloned functions. The end goal? Build a Clone-Pruning Analysis Pass that checks if functions
in a compiled program are just duplicates and, if they are, recommends pruning them to save space and avoid redundancy.
But before we can make any pruning decisions, we first need to find these clones. That’s what this blog is all about: detecting cloned functions in the compiled code.
What Is a Cloned Function?
When GCC optimizes code, it often makes slightly modified copies of functions—these are called clones. You’ll see them with names like:
scale_samples.constprop.0
scale_samples.Mrng
scale_samples.resolver
These cloned functions usually come from optimizations like constant propagation, inlining, or architecture-specific tuning. For example, if the compiler notices that a function is always called with the same constant, it might create a version of that function optimized for that case.
That’s cool… but sometimes these clones end up being nearly identical. If we can detect that, we might be able to prune them.
Goal for This Part
My main objective in Part A was to:
Go through every function in the compiled program
Check if it’s a clone
Extract the base name (the original function it was cloned from)
Keep track of those names for later comparison
Reusing Code from Stage 1
The foundation came from my Stage 1 pass, which walked through every function and counted its basic blocks and GIMPLE statements. That gave me a way to see all functions and their structures—perfect for extending into clone detection.
Here’s a snippet from that pass, just to recap:
FOR_EACH_BB_FN(bb, fun) {
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
gimple_count++;
}
bb_count++;
}
fprintf(stderr, "Function %s: %d BBs, %d GIMPLE stmts\n", function_name(fun), bb_count, gimple_count);
Detecting Clones
To detect clones, I used a pretty simple trick—look for a dot (.) in the function name. In GCC, clones often include suffixes like .constprop, .variant, or .resolver, which makes it easy to spot them.
Here’s the logic I used:
const char *full_name = function_name(fun);
char base_name[256] = "";
bool is_clone = false;
const char *dot = strchr(full_name, '.');
if (dot != nullptr) {
is_clone = true;
size_t len = dot - full_name;
strncpy(base_name, full_name, len);
base_name[len] = '\0';
fprintf(stderr, "Detected clone for function: %s (full name: %s)\n", base_name, full_name);
} else {
strncpy(base_name, full_name, sizeof(base_name) - 1);
fprintf(stderr, "Processing original function: %s\n", full_name);
}
So with this code in place, every time GCC processes a function, my pass checks if it looks like a clone and logs it. This gives me the info I need to compare functions later.
Why This Step Matters
Before you can decide whether two functions are “substantially the same,” you need to know which ones to compare. Detecting and grouping cloned functions by their base name is the foundation of the whole analysis.
It may seem simple, but getting this step right makes the rest of the process (especially the comparisons in Part B) a lot easier and more accurate.
Reflection: Getting Comfortable with Compiler Internals
This part of the project helped me get more comfortable with how GCC structures its internal representation of functions—especially through names and passes. I learned how clones are named, why they’re created, and how to parse those names to pull out meaningful data.
I also started noticing how compiler behavior can vary between architectures. For example, suffixes like .Msve2 or .Mrng appear on different platforms like ARM (aarch64) versus x86. I’ll have to take that into account in later parts of the project when I start comparing functions.One of the most satisfying moments was finally seeing the Detected clone for function: messages show up in my diagnostic output. That was the first sign my pass was doing its job.
What’s Next?
In the next post (Part B), I’ll dive into the actual comparison of these cloned functions. I’ll look at their GIMPLE, basic block counts, and more to see if they’re “substantially the same.”
Why This Step Matters
Comments
Post a Comment