How do You Know if Code is Duplicated or Not

To detect code duplication, you need to think in terms of concepts instead of individual statements.
Code Duplication

Most languages have tools that analyze a code base for code duplication. These tools work by looking for duplicated lines of code, so they don't necessarily detect duplication in the sense this article covers. Code duplication isn't just duplicated statements; it is code that is missing a concept.

Thinking in Concepts

Everything in programming can be boiled down to either a statement or a concept. A statement is a line of code that does something. For example:

// Java code
System.out.println("Hello World!");

A concept is one or more statements given a name so that they can be used in more than one place in the code. That is also the test: if you can use it in more than one place, you have created a concept rather than a statement. Concepts can be classes, constants, enumerations, functions, methods, etc.

The following example takes the previous statement and introduces the concept of saying hello.

// Java code
public void sayHello() {
    System.out.println("Hello World!");
}

In this case, the concept takes the form of a method. It can be called from anywhere its access modifier allows.
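Methods are not the only form a concept can take. As a rough sketch, the same greeting could also be expressed with a constant and an enumeration. The names Greetings, HELLO, Audience, and greet are hypothetical and chosen only for illustration.

```java
// Hypothetical names for illustration; none of these come from the article.
final class Greetings {

    // The greeting text is a concept expressed as a constant.
    static final String HELLO = "Hello World!";

    // The possible audiences are a concept expressed as an enumeration.
    enum Audience { WORLD, TEAM }

    private Greetings() {
        // Utility class; not meant to be instantiated.
    }

    // The act of greeting is a concept expressed as a method.
    static String greet(final Audience audience) {
        switch (audience) {
            case TEAM:
                return "Hello Team!";
            case WORLD:
            default:
                return HELLO;
        }
    }
}
```

Any caller that can see Greetings can reuse the text, the audiences, or the greeting itself without repeating the literal string.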

When designing code, instead of thinking, "I need to create a class for a user", think, "I need to create the concept of a user, using a class."

How Do You Know if Code is Duplicated?

The same statement in more than one place is not necessarily code duplication. This is usually true for variable declarations. For example:

// Java code
public void method1() {
    final String firstName;
    final String lastName;

    //...
}

public void method2() {
    final String firstName;
    final String lastName;

    //...
}

None of this is duplication. To tell whether you have code duplication, first check whether the code is the same. If it is, ask two questions: is the code missing a concept that names what it is doing, and does that code need to be called in multiple places?

The following example shows code duplication. The queries are omitted for simplicity.

// Java code
public class UserDao {

    public List<User> findAdmins() {
        final List<User> result = connection.sql("...");

        // Duplication
        result.forEach(
            user -> {
                final List<Role> roles = connection.sql("...");
                user.setRoles(roles);
            }
        );

        return result;
    }

    public User findById(final String id) {
        final User result = connection.sql("...");
        
        // Duplication
        final List<Role> roles = connection.sql("...");
        result.setRoles(roles);

        return result;
    }
}

This code contains duplication because there is no concept of loading roles. Both the role-loading code and its query, omitted in the example, are repeated. Introducing the concept of loading roles eliminates the duplication.

// Java code
public class UserDao {

    public List<User> findAdmins() {
        final List<User> result = connection.sql("...");

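        result.forEach(
            // Assumes User exposes its id through a getId() accessor
            user -> user.setRoles(loadRoles(user.getId()))
        );

        return result;
    }

    public User findById(final String id) {
        final User result = connection.sql("...");

        result.setRoles(loadRoles(id));

        return result;
    }

    // The single place that knows how to load a user's roles
    private List<Role> loadRoles(final String userId) {
        return connection.sql("...");
    }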
}
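
To try the refactoring end to end, here is a minimal sketch that swaps the omitted queries for an in-memory map. InMemoryUserDao, addRole, the fields on User and Role, and the "ADMIN" role name are all assumptions made for illustration; they are not part of the original DAO.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-ins for the article's types; their fields are assumptions.
final class Role {
    final String name;

    Role(final String name) {
        this.name = name;
    }
}

final class User {
    final String id;
    private List<Role> roles = new ArrayList<>();

    User(final String id) {
        this.id = id;
    }

    void setRoles(final List<Role> roles) {
        this.roles = roles;
    }

    List<Role> getRoles() {
        return roles;
    }
}

// In-memory stand-in for UserDao: a map replaces the omitted SQL queries.
final class InMemoryUserDao {
    private final Map<String, List<Role>> rolesByUserId = new HashMap<>();

    // Hypothetical helper for seeding data; the article has no equivalent.
    void addRole(final String userId, final String roleName) {
        rolesByUserId.computeIfAbsent(userId, key -> new ArrayList<>()).add(new Role(roleName));
    }

    List<User> findAdmins() {
        final List<User> result = new ArrayList<>();
        for (final Map.Entry<String, List<Role>> entry : rolesByUserId.entrySet()) {
            final boolean isAdmin = entry.getValue().stream()
                .anyMatch(role -> role.name.equals("ADMIN"));
            if (isAdmin) {
                result.add(new User(entry.getKey()));
            }
        }

        result.forEach(user -> user.setRoles(loadRoles(user.id)));

        return result;
    }

    User findById(final String id) {
        final User result = new User(id);

        result.setRoles(loadRoles(id));

        return result;
    }

    // The single home of the "load roles" concept; both finders call it.
    private List<Role> loadRoles(final String userId) {
        return new ArrayList<>(rolesByUserId.getOrDefault(userId, new ArrayList<>()));
    }
}
```

After seeding a few roles with addRole, both findAdmins and findById return users whose roles come from the same loadRoles method, so a change to how roles are loaded happens in one place.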

Conclusion

Tools can be very helpful for finding duplicated statements. To know whether code is really duplicated, though, think in terms of concepts.