Project Stage 2.A

Project Stage 2 – Part A: Finding Cloned Functions

For Stage 2 of my SPO600 project, I’m taking the basic GCC analysis pass I built in Stage 1 and evolving it into something

a bit more interesting—analyzing cloned functions. The end goal? Build a Clone-Pruning Analysis Pass that checks if functions

in a compiled program are just duplicates and, if they are, recommends pruning them to save space and avoid redundancy.

But before we can make any pruning decisions, we first need to find these clones. That’s what this blog is all about: detecting cloned functions in the compiled code.

What Is a Cloned Function?

When GCC optimizes code, it often makes slightly modified copies of functions—these are called clones. You’ll see them with names like:

  • scale_samples.constprop.0

  • scale_samples.Mrng

  • scale_samples.resolver

These cloned functions usually come from optimizations like constant propagation, inlining, or architecture-specific tuning. For example, if the compiler notices that a function is always called with the same constant, it might create a version of that function optimized for that case.

That’s cool… but sometimes these clones end up being nearly identical. If we can detect that, we might be able to prune them.

Goal for This Part

My main objective in Part A was to:

  • Go through every function in the compiled program

  • Check if it’s a clone

  • Extract the base name (the original function it was cloned from)

  • Keep track of those names for later comparison

Reusing Code from Stage 1

The foundation came from my Stage 1 pass, which walked through every function and counted its basic blocks and GIMPLE statements. That gave me a way to see all functions and their structures—perfect for extending into clone detection.

Here’s a snippet from that pass, just to recap:

FOR_EACH_BB_FN(bb, fun) {

    for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {

        gimple_count++;

    }

    bb_count++;

}

fprintf(stderr, "Function %s: %d BBs, %d GIMPLE stmts\n", function_name(fun), bb_count, gimple_count);

Detecting Clones

To detect clones, I used a pretty simple trick—look for a dot (.) in the function name. In GCC, clones often include suffixes like .constprop, .variant, or .resolver, which makes it easy to spot them.

Here’s the logic I used:

const char *full_name = function_name(fun);

char base_name[256] = "";

bool is_clone = false;


const char *dot = strchr(full_name, '.');


if (dot != nullptr) {

    is_clone = true;

    size_t len = dot - full_name;

    strncpy(base_name, full_name, len);

    base_name[len] = '\0';

    fprintf(stderr, "Detected clone for function: %s (full name: %s)\n", base_name, full_name);

} else {

    strncpy(base_name, full_name, sizeof(base_name) - 1);

    fprintf(stderr, "Processing original function: %s\n", full_name);

}


So with this code in place, every time GCC processes a function, my pass checks if it looks like a clone and logs it. This gives me the info I need to compare functions later.


Why This Step Matters

Before you can decide whether two functions are “substantially the same,” you need to know which ones to compare. Detecting and grouping cloned functions by their base name is the foundation of the whole analysis.

It may seem simple, but getting this step right makes the rest of the process (especially the comparisons in Part B) a lot easier and more accurate.

Reflection: Getting Comfortable with Compiler Internals

This part of the project helped me get more comfortable with how GCC structures its internal representation of functions—especially through names and passes. I learned how clones are named, why they’re created, and how to parse those names to pull out meaningful data.

I also started noticing how compiler behavior can vary between architectures. For example, suffixes like .Msve2 or .Mrng appear on different platforms like ARM (aarch64) versus x86. I’ll have to take that into account in later parts of the project when I start comparing functions.One of the most satisfying moments was finally seeing the Detected clone for function: messages show up in my diagnostic output. That was the first sign my pass was doing its job.

What’s Next?

In the next post (Part B), I’ll dive into the actual comparison of these cloned functions. I’ll look at their GIMPLE, basic block counts, and more to see if they’re “substantially the same.”


Comments

Popular posts from this blog

Project Stage 3

Project Stage 2.C

Welcome to My SPO600 Journey

Lab1 6502 Assembly Language

Stage 1 Completed

Project Stage 2.B

Lab 4 Building GCC on My Linux VM

Lab 2 Adventures in 6502 Assembly

Project Stage 1 Diving In!