You’ve inherited a long-running Alfresco development project. Years of code changes have left an accretion of unused classes: bulk additions of reused packages, code changes that left bits and pieces behind, and so on. You’re pretty sure that you’re spending a lot of time grepping and sifting through obsolete classes. It would be nice to get rid of some of the junk without spending immense amounts of time. You could buy a sophisticated tool that analyzes the code base and tells you where to start working, but those are pretty expensive and you are pretty low on budget. That is where the Poor Man’s Cruft Removal can be useful.
This is static analysis to help identify dead code only; it won’t help you clean up your source or improve test coverage. It works well on Spring-based projects like Alfresco customizations, where there are explicit bean configurations in XML. The pattern is to create one list representing the universe of classes in the project and another of classes that we are pretty sure could be in use. Differencing the lists produces a relatively small output list for manual evaluation. It is not a push-button process; eyes on the source are required before removing anything.
My project contained a pretty hefty set of Alfresco customizations, a couple of external servlet apps, and a long history of changes and Alfresco upgrades.
I used the following rules to identify source that was a candidate for removal. The driving lists were created by grep. I then used Python collections for de-duping and differencing lists.
- Any class that is duplicated in source is a good candidate for removing one of its definitions.
By listing the filenames where package + class name were duplicated and then diffing the source files identified, I found a few duplicated Java class definitions. Where the files were true duplicates by name and content, I was able to remove the .class file from one output jar or another. Where the file contents were different, I often found that the classes were ending up in different application .war files and left them alone.
- Any class that appears in source but does not show up in the compiled classes list is a good candidate.
I created a difference of the list of all .java files and the list of all .class files. In my project, uncompiled source turned out to be unmaintained unit tests that nobody knew how to run, and the like.
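As a rough illustration of the two class-level rules above, here is a minimal Python sketch. It is not the script used on the project. The function names are hypothetical, and it assumes that each .java file declares a top-level class named after the file, that the package can be read from a plain `package ...;` line, and that compiled output sits under a single classes directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the package is declared on its own line as "package a.b.c;".
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)


def qualified_name(java_file: Path) -> str:
    """Return package + class name, taking the class name from the file name."""
    text = java_file.read_text(encoding="utf-8", errors="replace")
    match = PACKAGE_RE.search(text)
    package = match.group(1) + "." if match else ""
    return package + java_file.stem


def duplicated_classes(source_root: Path) -> dict:
    """Map each package + class name defined more than once to its files."""
    seen = defaultdict(list)
    for java_file in sorted(source_root.rglob("*.java")):
        seen[qualified_name(java_file)].append(java_file)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


def uncompiled_sources(source_root: Path, classes_root: Path) -> set:
    """Return classes that exist as .java source but have no .class file."""
    compiled = set()
    for class_file in classes_root.rglob("*.class"):
        if "$" in class_file.name:
            continue  # skip inner and anonymous classes
        relative = class_file.relative_to(classes_root).with_suffix("")
        compiled.add(".".join(relative.parts))
    sources = {qualified_name(f) for f in source_root.rglob("*.java")}
    return sources - compiled
```

Each result is only a worklist. The duplicated files still need to be diffed by hand, and every uncompiled source file still needs a look before anything is deleted.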
- Any XML file with no active child nodes relevant to the application should be removed.
I ran through a listing of all .xml files, loading and parsing each one. The output is a list of files with fewer than some low number of child levels below the root; I used three levels. If there wasn’t anything important in the contents, I removed the file. This caught some commented-out workflow definitions and context files where all <bean> elements had been commented out over time.
- Any class that is compiled but is neither Spring configured nor a dependency of a Spring configured class is a good candidate.
This is the most involved list to generate, and it yielded the most obsolete classes. I created a list of classes from the class attributes of <bean> elements in *context.xml files. These were classes that I would not investigate as candidate obsolete classes. I ran dependencies for each class in that list using the open source DependencyFinder-1.2.1-beta4 and added them to the list of non-candidates. I then differenced the non-candidate list against a list of all compiled classes in my project. That difference was my candidate obsolete class list.
To make life easier, I grepped the project source for the class name of each candidate and produced a spreadsheet listing hits in .java, .xml, .properties, .*. I added a column containing the sum of the nonzero values in the previous columns. I could then sort the spreadsheet by file reference count ascending and work the easiest pickings, those with zero total references, first. I quickly saw diminishing returns at two references and stopped working the list after that.
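The two XML-driven rules above can be sketched in Python along these lines. This is an illustration under assumptions, not the original script: the function names are hypothetical, levels are counted with the root's direct children as level one (the original counting convention is not stated), and the dependency list is taken as an already extracted set of class names rather than produced by calling DependencyFinder. The sketch relies on `xml.etree.ElementTree` dropping comments by default, so commented-out elements do not count as content.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}bean' down to 'bean'."""
    return tag.rsplit("}", 1)[-1]


def depth_below_root(element) -> int:
    """Count element levels below the given element (0 if it has no children)."""
    children = list(element)
    if not children:
        return 0
    return 1 + max(depth_below_root(child) for child in children)


def shallow_xml_files(root_dir: Path, min_levels: int = 3) -> list:
    """List .xml files with fewer than min_levels levels below the root."""
    shallow = []
    for xml_file in sorted(root_dir.rglob("*.xml")):
        try:
            root = ET.parse(xml_file).getroot()
        except ET.ParseError:
            continue  # unparseable files need a manual look anyway
        if depth_below_root(root) < min_levels:
            shallow.append(xml_file)
    return shallow


def bean_classes(root_dir: Path) -> set:
    """Collect the class attribute of every <bean> in *context.xml files."""
    classes = set()
    for xml_file in root_dir.rglob("*context.xml"):
        try:
            root = ET.parse(xml_file).getroot()
        except ET.ParseError:
            continue
        for element in root.iter():
            if local_name(element.tag) == "bean" and element.get("class"):
                classes.add(element.get("class"))
    return classes


def candidate_obsolete(compiled, configured, dependencies=frozenset()) -> set:
    """Compiled classes that are neither Spring configured nor a dependency."""
    return set(compiled) - set(configured) - set(dependencies)
```

The threshold is a parameter because a sensible cutoff depends on how the levels are counted and on how deep the project's real configuration files tend to be. Either way, the output is a list for manual review, not a delete list.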
POOR MAN’S CRUFT REMOVAL RESULTS
Project started with:
1365 classes defined in .java
785 context and workflow .xml
Project ended with:
914 classes defined in .java (451 fewer, about 33%)
718 context and workflow .xml (67 fewer, about 9%)