Project Stage 2.C


Project Stage 2 – Part C: Clone-Pruning in Action

This is it: the final part of my Clone-Pruning Analysis Pass series!

So far, I’ve:

  • Found cloned functions using name patterns and suffixes (Part A)

  • Extracted and compared their GIMPLE to check for similarity (Part B)

Now, in this post, I’ll show the analysis in action: specifically, how my pass performs on the two test cases provided in the course, clone-test-prune and clone-test-noprune.

This is where all the logic from earlier pays off!

Clone-Prune Test Files

These test files contain three functions:

  • scale_samples.default

  • scale_samples.Mrng

  • scale_samples.resolver

My goal here is to identify whether scale_samples.Mrng is identical to scale_samples.default and, if so, to recommend pruning.
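Before any comparison, the three names have to be recognized as one clone group. Here's a rough sketch of that grouping step in standalone C++, not the pass itself: it works on plain name strings rather than GCC's function objects, and `CloneRole`, `CloneInfo`, `split_clone_name`, and `group_clones` are hypothetical names. It assumes the base name is everything before the first '.', and treats the rest as the default, the resolver, or a variant.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical role a function plays within a clone group.
enum class CloneRole { Default, Resolver, Variant };

struct CloneInfo {
    std::string full_name;  // e.g. "scale_samples.Mrng"
    CloneRole role;
};

// Split "scale_samples.Mrng" into {"scale_samples", "Mrng"}.
// Assumption: the base name is everything before the FIRST '.', so
// multi-part suffixes such as "constprop.0" stay together.
std::pair<std::string, std::string> split_clone_name(const std::string& name) {
    const std::size_t dot = name.find('.');
    if (dot == std::string::npos) return {name, ""};
    return {name.substr(0, dot), name.substr(dot + 1)};
}

// Group suffixed functions by base name; names without a suffix are skipped.
std::map<std::string, std::vector<CloneInfo>> group_clones(
        const std::vector<std::string>& names) {
    std::map<std::string, std::vector<CloneInfo>> groups;
    for (const auto& name : names) {
        const auto [base, suffix] = split_clone_name(name);
        if (suffix.empty()) continue;  // not a clone
        CloneRole role = CloneRole::Variant;
        if (suffix == "default") role = CloneRole::Default;
        else if (suffix == "resolver") role = CloneRole::Resolver;
        groups[base].push_back({name, role});
    }
    return groups;
}
```

For the prune test, all three names land in a single `scale_samples` group, so in this sketch only the variant needs to be compared against the default; the resolver can be skipped because it dispatches at runtime rather than duplicating the work.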

Let’s look at them.

scale_samples.default

// Simplified view of instructions...
vscale = 2;
for (...) {
    result[i] = input[i] * vscale;
}


scale_samples.Mrng

// Basically the same logic, just cloned for target Mrng
vscale = 2;
for (...) {
    result[i] = input[i] * vscale;
}


Aside from memory addresses (which always vary), these functions are identical in logic, control flow, and GIMPLE structure. My pass detected:

  • Same number of basic blocks

  • Same number of GIMPLE statements

  • Matching control flow
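A minimal sketch of how those three checks and the final verdict could fit together, in standalone C++. The `FunctionSummary` struct, `structurally_equal`, and `prune_decision` are hypothetical: a real pass would fill the summary by walking each function's CFG and GIMPLE statements, and comparing successor lists block by block assumes both functions number their blocks in the same order.

```cpp
#include <string>
#include <vector>

// Hypothetical per-function summary gathered by the pass.
struct FunctionSummary {
    int bb_count = 0;
    int stmt_count = 0;
    // successors[i] lists the indices of the blocks that block i can jump to.
    std::vector<std::vector<int>> successors;
};

// Same block count, same statement count, and the same successor lists
// for every block (a simple stand-in for "matching control flow").
bool structurally_equal(const FunctionSummary& a, const FunctionSummary& b) {
    return a.bb_count == b.bb_count &&
           a.stmt_count == b.stmt_count &&
           a.successors == b.successors;
}

// Recommend pruning only if every variant matches the default version.
std::string prune_decision(const std::string& base,
                           const FunctionSummary& def,
                           const std::vector<FunctionSummary>& variants) {
    for (const auto& v : variants)
        if (!structurally_equal(def, v)) return "NOPRUNE: " + base;
    return "PRUNE: " + base;
}
```

Requiring every variant to match keeps the decision conservative: a single mismatch is enough to print NOPRUNE.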

Result:

PRUNE: scale_samples

This is a successful match! My pass recognized that scale_samples.Mrng is a clone of scale_samples.default and recommended pruning it to avoid redundancy.

Clone-NoPrune Test Files

Next up, the clone-test-noprune case. This one contains:

  • scale_samples.default

  • scale_samples.Msve2

  • scale_samples.resolver

Let’s see why these should not be pruned.

scale_samples.default

vscale = 2;
for (...) {
    result[i] = input[i] * vscale;
}


scale_samples.Msve2

vscale = get_vector_length();
for (...) {
    result[i] = input[i] * vscale;
}


Okay, now we’re seeing meaningful differences:

  • The multiplication factor (vscale) is calculated differently.

  • Even though the loop structure looks similar, the GIMPLE is different because the logic behind vscale changes.
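Counts alone could miss a change like this if both versions happened to have the same number of statements. Here's a minimal sketch, in standalone C++, of a stricter follow-up check: each body is modelled as a list of statement kinds, and the function reports where the two lists first diverge. `StmtKind` is a simplified, hypothetical stand-in for GCC's GIMPLE statement codes, and the example assumes each line of the simplified bodies above maps to one statement, so `vscale = get_vector_length()` shows up as a call where the default has a plain assignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified statement kinds (not GCC's real enum).
enum class StmtKind { Assign, Call, Cond, Return };

// Returns the position of the first statement whose kind differs, or
// std::nullopt if both sequences have the same kinds in the same order.
std::optional<std::size_t> first_difference(const std::vector<StmtKind>& a,
                                            const std::vector<StmtKind>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return i;
    if (a.size() != b.size()) return n;  // one body has extra statements
    return std::nullopt;
}
```

Reporting the position, rather than just true or false, also makes it easier to see why a group ended up as NOPRUNE when debugging.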

scale_samples.resolver

This one is completely different. It resolves variants at runtime, and its structure and logic don’t resemble the other two at all.

Result:

NOPRUNE: scale_samples

Correct behavior again! My pass avoided pruning these functions, since the Msve2 variant really does differ from the default.

Reflection: Wrapping It All Up

What I Learned

GCC Internals Make Sense Now

GIMPLE, basic blocks, and function structures were all a mystery at first. But building this pass forced me to work inside GCC, and that gave me a much deeper understanding of how compilers analyze and optimize code.

Detecting Clones Isn’t Always Easy

Clones can look deceptively similar but behave very differently. I started out thinking, “I'll just compare statements!” but realized that things like SSA names, labels, and temporary variable names can make identical logic look different. I had to learn how to look past those surface-level differences and focus on meaningful structure.
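One common way to look past SSA names is to rename them consistently before comparing. The sketch below is a hypothetical, text-based version in standalone C++: it works on dumped statement strings rather than GCC's internal structures, and it assumes any identifier ending in `_` plus digits (such as `_4` or `input_7`) is an SSA name. Each one is renamed `SSA0`, `SSA1`, and so on in order of first appearance, so two bodies that differ only in SSA numbering produce identical text.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical heuristic: a token is SSA-like if it ends in '_' followed
// by one or more digits, e.g. "_5" or "i_12" (but not "scale_samples").
bool looks_like_ssa_name(const std::string& tok) {
    const std::size_t pos = tok.rfind('_');
    if (pos == std::string::npos || pos + 1 == tok.size()) return false;
    for (std::size_t i = pos + 1; i < tok.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return false;
    return true;
}

// Rewrite one dumped statement, replacing SSA-like tokens with SSA0, SSA1, ...
// Pass the same map for every line of a function so names stay consistent.
std::string canonicalize(const std::string& line,
                         std::map<std::string, std::string>& names) {
    static const std::regex ident("[A-Za-z_][A-Za-z0-9_]*");
    std::string out;
    std::size_t last = 0;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), ident);
         it != std::sregex_iterator(); ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position());
        const std::string tok = it->str();
        out += line.substr(last, start - last);  // text between identifiers
        if (looks_like_ssa_name(tok)) {
            auto found = names.find(tok);
            if (found == names.end())
                found = names.emplace(tok, "SSA" + std::to_string(names.size())).first;
            out += found->second;
        } else {
            out += tok;  // keep ordinary identifiers unchanged
        }
        last = start + static_cast<std::size_t>(it->length());
    }
    out += line.substr(last);
    return out;
}
```

After canonicalization, `_4 = *input_7(D);` and `_9 = *input_3(D);` both become `SSA0 = *SSA1(D);`, so a plain string comparison can take over.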

Architecture Differences Are Real

It surprised me how the same code compiled on x86_64 and aarch64 could end up with very different clone suffixes or even different clone strategies. I had to make my code more flexible just to account for that.

Challenges Faced

  • Suffix Matching Logic
    Parsing clone suffixes like .constprop, .variant, and .resolver across platforms was a challenge. There was no universal pattern, so I built a name normalization system to group clones properly.

  • Deciding "Substantially the Same"
    This was the core problem: how do you tell if two functions are “basically the same”? I had to balance strict matching (to avoid false positives) with leniency (to not miss obvious clones). I settled on comparing basic block and GIMPLE statement counts as a first filter, since it's fast and effective.

Final Thoughts

Working on this project was honestly kind of fun. It was tough at times, especially with GCC’s steep learning curve, but seeing my pass correctly identify clones and avoid bad pruning felt really rewarding.

Here’s what I ended up with:

  • A pass that finds and groups cloned functions

  • A simple but effective comparison mechanism using GIMPLE

  • Clear pruning decisions based on analysis

  • Support for different architectures

This project taught me so much, not just about compilers, but about how deep you can go into even a simple concept like “function duplication.”

I’m proud of how far this pass has come, and I’m excited to see where I can take it next.

Stage 2 of this project is officially wrapped.

Stay Tuned for Project Stage 3!

