The first measures we took highlighted such a huge kludge that we took the time to better the comparison algorithms before taking any further action. Through the use of google's Guava and some rethinking, we improved the comparison time on big model comparison a great deal, dividing the total comparison time in half!
Here is a sample of the time it takes now to compare your models with the latest builds of EMF Compare :
Structure of the sample models used in these tests ("fragments" are the number of fragmented files, the rest are UML model elements contained by the samples) :
Small | Nominal | Large | ||
Fragments | 99 | 399 | 947 | |
Packages | 97 | 389 | 880 | |
Classes | 140 | 578 | 2169 | |
Primitive Types | 581 | 5370 | 17152 | |
Data Types | 599 | 5781 | 18637 | |
State Machines | 55 | 209 | 1311 | |
States | 202 | 765 | 10156 | |
Dependencies | 235 | 2522 | 8681 | |
Transitions | 798 | 3106 | 49805 | |
Operations | 1183 | 5903 | 46029 |
Time and memory used required to compare each of the model sizes (the model is copied, randomly modified, then the copy is compared with its original) :
Small | Nominal | Large | ||
Time (seconds) | 6 | 22 | 125 | |
maximum Heap (Mo) | 555 | 1019 | 2100 |
We initially thought that the time was mostly spent matching the elements together. It turns out that the real bottleneck of a comparison is the differencing phase (we know the two elements "match", now we need to know if, and how they differ). Just goes to show you that a profiler is mandatory in order to really know why your product might be slow ;).
Future development
These profiling sessions made clear the problems of EMF Compare on large models. We can properly handle "medium" models : EMF Compare is fast and its GUI can react at reasonable speed. However we cannot scale this behavior to large models : memory management is somewhat inefficient (more than 2Go of heap space to compare two models that both "weigh" 50Mo on disk...), the GUI is sluggish... We did isolate a number of potential improvements though. Stay tuned!
Note : the report is online and can be retrieved from here for those interested.