Project Stage 2.C
Project Stage 2 – Part C: Clone-Pruning in Action
This is it: the final part of my Clone-Pruning Analysis Pass series!
So far, I’ve:
Found cloned functions using name patterns and suffixes (Part A)
Extracted and compared their GIMPLE to check for similarity (Part B)
Now, in this post, I’ll show the analysis in action: how my pass performs on the two test cases provided in the course, clone-test-prune and clone-test-noprune.
This is where all the logic from earlier pays off!
Clone-Prune Test Files
These test files contain three functions:
scale_samples.default
scale_samples.Mrng
scale_samples.resolver
My goal here: identify whether scale_samples.Mrng is identical to scale_samples.default—and if so, recommend pruning.
Let’s look at them.
scale_samples.default
// Simplified view of instructions...
vscale = 2;
for (...) {
result[i] = input[i] * vscale;
}
scale_samples.Mrng
// Basically the same logic, just cloned for target Mrng
vscale = 2;
for (...) {
result[i] = input[i] * vscale;
}
Aside from memory addresses (which always vary), these functions are identical in logic, control flow, and GIMPLE structure. My pass detected:
Same number of basic blocks
Same number of GIMPLE statements
Matching control flow
Result:
PRUNE: scale_samples
This is a successful match! My pass recognized that scale_samples.Mrng is a clone of scale_samples.default and recommended pruning it to avoid redundancy.
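To make the three checks above concrete, here is a minimal sketch of what such a comparison could look like. It is standalone C++ rather than code written against GCC's internal API, and the FunctionShape struct and same_shape function are hypothetical names used only for illustration. It also assumes the control-flow edges of both functions are recorded in the same order.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical summary of one function's shape. In a real pass these values
// would be collected while walking the function's basic blocks.
struct FunctionShape {
    std::size_t basic_blocks;
    std::size_t gimple_statements;
    // Control-flow edges as (source block index, destination block index).
    std::vector<std::pair<int, int>> edges;
};

// True when both functions have the same block count, the same statement
// count, and the same edge list (compared element by element, in order).
bool same_shape(const FunctionShape& a, const FunctionShape& b) {
    return a.basic_blocks == b.basic_blocks &&
           a.gimple_statements == b.gimple_statements &&
           a.edges == b.edges;
}
```

A check like this is cheap because it never looks inside a statement, which is also its limit: two functions can share a shape and still compute different things.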
Clone-NoPrune Test Files
Next up, the clone-test-noprune case. This one contains:
scale_samples.default
scale_samples.Msve2
scale_samples.resolver
Let’s see why these should not be pruned.
scale_samples.default
vscale = 2;
for (...) {
result[i] = input[i] * vscale;
}
scale_samples.Msve2
vscale = get_vector_length();
for (...) {
result[i] = input[i] * vscale;
}
Okay, now we’re seeing meaningful differences:
The multiplication factor (vscale) is calculated differently.
Even though the loop structure looks similar, the GIMPLE is different because the logic behind vscale changes.
scale_samples.resolver
This one is completely different. It resolves variants at runtime, and its structure and logic don’t resemble the other two at all.
Result:
NOPRUNE: scale_samples
Correct behavior again! My pass left these functions alone.
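The two outcomes can be sketched as a single decision over a clone group. This is a simplified, standalone C++ illustration, not the pass itself: the Variant struct and decide function are hypothetical, statements are modelled as plain strings, and skipping the resolver during comparison is an assumption based on the results shown above.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one variant in a clone group.
struct Variant {
    std::string suffix;                   // e.g. "default", "Mrng", "resolver"
    std::vector<std::string> statements;  // simplified statement text
};

// Builds the report line for one group: "PRUNE: <base>" when every
// non-resolver clone matches the default variant, "NOPRUNE: <base>" otherwise.
std::string decide(const std::string& base, const std::vector<Variant>& group) {
    const Variant* def = nullptr;
    for (const Variant& v : group)
        if (v.suffix == "default") def = &v;
    if (def == nullptr) return "NOPRUNE: " + base;  // nothing to compare against

    bool compared = false;
    for (const Variant& v : group) {
        // Assumption: the resolver only dispatches, so it is not compared.
        if (v.suffix == "resolver" || &v == def) continue;
        if (v.statements != def->statements) return "NOPRUNE: " + base;
        compared = true;
    }
    // Only recommend pruning if at least one clone was actually compared.
    return (compared ? "PRUNE: " : "NOPRUNE: ") + base;
}
```

With the simplified bodies from this post, the Mrng group would come out as a match, while the Msve2 group would differ on the statement that sets vscale.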
Reflection: Wrapping It All Up
What I Learned
GCC Internals Make Sense Now
GIMPLE, basic blocks, function structures—they were all a mystery at first. But building this pass forced me to work inside GCC, and that gave me a much deeper understanding of how compilers analyze and optimize code.
Detecting Clones Isn’t Always Easy
Clones can look deceptively similar but behave very differently. I started out thinking, “I'll just compare statements!” but realized that things like SSA name versions, labels, or compiler-generated temporaries can make identical logic look different. I had to learn how to look past those surface-level differences and focus on meaningful structure.
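One way to look past SSA numbering is to rename each SSA-style name by the order in which it first appears, then compare the renamed text. The sketch below is a rough standalone illustration that works on statement text rather than on GCC's internal statement objects. The function names are hypothetical, and it assumes SSA names look like `_7` or `i_12`, which means an ordinary identifier that happens to end in `_<digits>` would be renamed too.

```cpp
#include <cctype>
#include <map>
#include <string>

// Returns true for tokens shaped like SSA names in a dump: "_7" or "i_12".
static bool looks_like_ssa_name(const std::string& tok) {
    std::size_t us = tok.rfind('_');
    if (us == std::string::npos || us + 1 == tok.size()) return false;
    for (std::size_t i = us + 1; i < tok.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return false;
    return true;
}

// Rewrites each distinct SSA-like name to "vN", numbered by first appearance,
// so text that differs only in version numbers normalizes to the same string.
std::string normalize_ssa(const std::string& stmt_text) {
    std::map<std::string, int> ids;
    std::string out, tok;
    auto flush = [&]() {
        if (tok.empty()) return;
        if (looks_like_ssa_name(tok)) {
            auto it = ids.find(tok);
            if (it == ids.end())
                it = ids.emplace(tok, static_cast<int>(ids.size())).first;
            out += "v" + std::to_string(it->second);
        } else {
            out += tok;
        }
        tok.clear();
    };
    for (char c : stmt_text) {
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            tok += c;       // still inside an identifier or number
        } else {
            flush();        // token ended; emit it, then the separator
            out += c;
        }
    }
    flush();
    return out;
}
```

Under these assumptions, `_3 = i_7 * 2;` and `_10 = i_4 * 2;` both normalize to `v0 = v1 * 2;`, so they compare equal even though the raw text differs.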
Architecture Differences Are Real
It surprised me how the same code compiled on x86_64 and aarch64 could end up with very different clone suffixes or even different clone strategies. I had to make my code more flexible just to account for that.
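One way to stay flexible about suffixes is to avoid hard-coding them and instead split each function name at its first dot, grouping by whatever comes before it. The following is a minimal standalone C++ sketch of that idea. The helper names split_clone_name and group_clones are hypothetical, and splitting at the first dot is an assumption about how clone names are formed.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Splits "scale_samples.Mrng" into ("scale_samples", "Mrng").
// A name without a dot gets an empty suffix.
std::pair<std::string, std::string> split_clone_name(const std::string& name) {
    std::size_t dot = name.find('.');
    if (dot == std::string::npos) return std::make_pair(name, std::string());
    return std::make_pair(name.substr(0, dot), name.substr(dot + 1));
}

// Groups function names by base name, keeping each variant's suffix.
std::map<std::string, std::vector<std::string>>
group_clones(const std::vector<std::string>& names) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const std::string& n : names) {
        std::pair<std::string, std::string> parts = split_clone_name(n);
        groups[parts.first].push_back(parts.second);
    }
    return groups;
}
```

Because nothing here depends on a particular suffix, the same grouping works whether the clone is called scale_samples.Mrng on one architecture or scale_samples.Msve2 on another.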
Challenges Faced
Suffix Matching Logic
Parsing clone suffixes like .constprop, .variant, and .resolver across platforms was a challenge. There was no universal pattern, so I built a name normalization system to group clones properly.
Deciding "Substantially the Same"
This was the core problem: how do you tell if two functions are “basically the same”? I had to balance strict matching (to avoid false positives) with leniency (to not miss obvious clones). I settled on comparing basic block counts and GIMPLE statement counts as a first filter, since that check is fast and handled both test cases correctly.
Final Thoughts
Working on this project was honestly kind of fun. It was tough at times, especially with GCC’s steep learning curve, but seeing my pass correctly identify clones and avoid bad pruning felt really rewarding.
Here’s what I ended up with:
A pass that finds and groups cloned functions
A simple but effective comparison mechanism using GIMPLE
Clear pruning decisions based on analysis
Support for different architectures
This project taught me so much, not just about compilers, but about how deep you can go into even a simple concept like “function duplication.”
I’m proud of how far this pass has come, and I’m excited to see where I can take it next.
Stage 2 of this project is officially wrapped.
Stay Tuned for Project Stage 3!