Thursday, October 27, 2011

EMF Compare scalability

During the sponsored work on EMF Compare, we took some time to measure its performance and think about possible axes of improvement. We meant to profile both the time needed to compare two models and the overall memory footprint of a comparison; this was done with the YourKit Java profiler.

The first measurements we took highlighted such a huge kludge that we took the time to improve the comparison algorithms before taking any further action. Through the use of Google's Guava and some rethinking, we greatly improved the comparison of big models, cutting the total comparison time in half!

Here is a sample of the time it takes now to compare your models with the latest builds of EMF Compare :

Structure of the sample models used in these tests ("fragments" are the number of fragmented files, the rest are UML model elements contained by the samples) :

Element           Small   Nominal    Large
Fragments            99       399      947
Packages             97       389      880
Classes             140       578     2169
Primitive Types     581      5370    17152
Data Types          599      5781    18637
State Machines       55       209     1311
States              202       765    10156
Dependencies        235      2522     8681
Transitions         798      3106    49805
Operations         1183      5903    46029

Time and memory required to compare each of the model sizes (the model is copied, randomly modified, then the copy is compared with its original) :

Measure             Small   Nominal    Large
Time (seconds)          6        22      125
Maximum heap (MB)     555      1019     2100
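For readers without a profiler at hand, here is a minimal sketch of a much coarser probe. It is not how YourKit measures anything: `ComparisonProbe`, `Measurement` and the workload are hypothetical names for illustration, and the heap figure is the heap in use once the task ends, not the peak heap reported above.

```java
final class ComparisonProbe {

    /** Outcome of one measured run. */
    static final class Measurement {
        final long elapsedMillis;
        final long usedHeapBytes;

        Measurement(long elapsedMillis, long usedHeapBytes) {
            this.elapsedMillis = elapsedMillis;
            this.usedHeapBytes = usedHeapBytes;
        }
    }

    /** Runs the task once and reports wall-clock time plus heap in use afterwards. */
    static Measurement measure(Runnable task) {
        Runtime runtime = Runtime.getRuntime();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        long usedHeapBytes = runtime.totalMemory() - runtime.freeMemory();
        return new Measurement(elapsedMillis, usedHeapBytes);
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for "copy, modify, compare".
        Measurement m = measure(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("time (ms): " + m.elapsedMillis);
        System.out.println("heap in use (MB): " + m.usedHeapBytes / (1024 * 1024));
    }
}
```

A probe like this only says how long the whole run took; it cannot say which phase the time went into, which is what a real profiler is for.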

We initially thought that the time was mostly spent matching the elements together. It turns out that the real bottleneck of a comparison is the differencing phase (we know the two elements "match", now we need to know if, and how they differ). Just goes to show you that a profiler is mandatory in order to really know why your product might be slow ;).
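To make the two phases concrete, here is a deliberately naive sketch. It assumes each model is flattened to a map from element identifier to attribute values, which is not how EMF Compare represents models; `TwoPhaseCompare`, `match` and `diff` are hypothetical names, not the EMF Compare API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

final class TwoPhaseCompare {

    /** Phase 1 (matching): keep the identifiers present on both sides. */
    static List<String> match(Map<String, Map<String, String>> left,
                              Map<String, Map<String, String>> right) {
        List<String> matched = new ArrayList<>();
        for (String id : new TreeSet<>(left.keySet())) {
            if (right.containsKey(id)) {
                matched.add(id);
            }
        }
        return matched;
    }

    /** Phase 2 (differencing): for each matched pair, list the attributes that differ. */
    static List<String> diff(List<String> matched,
                             Map<String, Map<String, String>> left,
                             Map<String, Map<String, String>> right) {
        List<String> diffs = new ArrayList<>();
        for (String id : matched) {
            Map<String, String> l = left.get(id);
            Map<String, String> r = right.get(id);
            TreeSet<String> names = new TreeSet<>(l.keySet());
            names.addAll(r.keySet());
            for (String name : names) {
                if (!Objects.equals(l.get(name), r.get(name))) {
                    diffs.add(id + "." + name + ": " + l.get(name) + " -> " + r.get(name));
                }
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> left = new HashMap<>();
        Map<String, Map<String, String>> right = new HashMap<>();
        Map<String, String> before = new HashMap<>();
        before.put("name", "Book");
        Map<String, String> after = new HashMap<>();
        after.put("name", "Novel");
        left.put("c1", before);
        right.put("c1", after);
        System.out.println(diff(match(left, right), left, right));
    }
}
```

Even in this toy version, matching is one lookup per element while differencing visits every attribute of every matched pair, which hints at why the second phase can dominate.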

Future development

These profiling sessions made clear the problems of EMF Compare on large models. We can properly handle "medium" models : EMF Compare is fast and its GUI reacts at a reasonable speed. However, we cannot scale this behavior to large models : memory management is somewhat inefficient (more than 2 GB of heap space to compare two models that both "weigh" 50 MB on disk...), the GUI is sluggish... We did isolate a number of potential improvements though. Stay tuned!

Note : the report is online and can be retrieved from here for those interested.

Wednesday, September 28, 2011

The dynamic interpreter, your code generation companion

The latest stable release of Acceleo, version 3.1.1 included in Indigo SR1, is fresh from the oven; but development on the next version is already well underway. The upcoming version 3.2 already includes major improvements of the tooling performance (compilation time, completion proposal computation, memory management...), making the everyday use of the Acceleo editor much more appealing; and some of the planned features for this version are already in their finalization stage.

The one feature I'll linger on here is the dynamic interpreter for Acceleo and OCL expressions that we implemented for this version. Typically, you develop your code generators within the Acceleo editor, making use of its advanced editing features such as completion proposals, live compilation and syntax error reporting, syntax highlighting...

However the editor itself cannot help you determine what a given expression will return as its result. When you are in the middle of a complex OCL expression, it is sometimes difficult to say "this expression will output that classifier". And here is where the live interpreter comes into play!

The image above displays the "interpreter" view and the use we can make of it : we opened the UML metamodel and selected "Class" in it. This makes "Class" the context of any evaluation entered in the interpreter (that would work with anything that can be adapted to an EMF object : selection in a model editor, selection in a graphical editor (GMF, Obeo Designer...), even the "variable" view when debugging Acceleo generations!). We entered an expression... and the view did the rest, evaluating that expression on the fly and displaying its result (here, all of the super-types of the UML "Class").

Using the Interpreter during debug

The Interpreter can be used as is, defining your own variables as you need them and entering the expression manually... but it can also be used in conjunction with the debugger. For example, let's say I want to know what happens within the 'UML 2 Java' example that is provided with Acceleo.
  • I open the "classBody" module of that example, and set a breakpoint somewhere within the "generateClassBody" template :
  • Then, run the example in debug mode until the breakpoint is hit :
  • And from there, I can either copy/paste whole expressions in the interpreter :
  • Or "link" the interpreter with the current module context...
  • And directly call its templates :
  • Of course, the result of this last action is longer than a single line; double-clicking the result will open a popup in which it is more readable :

The view offers a number of possibilities : creating and assigning variables to be used in the expression, real-time evaluation, linking with an Acceleo editor's context, saving the expression as a new query or template in a given Acceleo module... The view itself can be used without Acceleo, and can accept other languages through extension points. I can't detail all of its features and extension possibilities here; see the wiki page for more (still a work in progress, yet it does describe the view in a little more detail than here) :).

Tuesday, August 9, 2011

EMF Compare - giant steps towards a working merger

EMF Compare, or how to provide meaningful comparison algorithms and visualization for models. Comparing the models' XMI serializations as text is a chore that we should never have to go back to, and EMF Compare has always strived to do the best possible job towards that end.

Who would rather compare their models textually :

Than compare them logically (yes, these are the same models as above) :

However, EMF Compare still had issues with most of its algorithms : small errors when matching elements together, differences that weren't detected ... and worst of all : merging all diffs from one side to the other rarely gave the expected result of "two identical models". In other words, while the comparison process itself was working fairly well, we had a huge number of failures when merging.

We've decided to tackle this problem the way we should have from the very start : list all potential differences that can be observed between two models ... and create the corresponding unit tests!

The results are already showing. True, we are still missing a number of unit tests in order to test all potential use cases we can list ... but we've already fixed an incredible number of bugs, be they known (half of our open bugs, forty-ish out of eighty or so) or bugs we had never detected before. The number of unit tests speaks for itself :

That's 4000 unit tests and as many different combinations of differences added and fixed in the past weeks.

I won't lie, some of these tests are redundant because of the way we decided to work : we are not trying to cover code, but rather cover use cases; which means some of these tests overlap with others. This does mean, however, that we are fairly sure we'll never have regressions on things that are tested :).

And now, back to the last few tests we are still missing!

Thursday, June 16, 2011

What's next in EMF Compare

Indigo is now out and about; another great release shipped on time. I am always amazed that the Eclipse community manages to release each year on schedule. Congratulations to all that help make it happen!

EMF Compare 1.2.0 was part of the train again this year; I recently blogged about the new and noteworthy features of that release. Now that the train has arrived, we can think of what to do next... and it looks like a busy year is coming for EMF Compare! Further integration with the Team framework, graphical comparison...

Here is a quick peek of what EMF Compare could look like for Eclipse Juno next year :

Logical Resources support

Up until now, we've considered that EMF models were tied to physical files, one EMF Resource being one File on the disk. That is true in some cases ... but in many others, a "model" is not really a "file". One file holding an EMF Resource (let's call it "library") can reference multiple other EMF Resources ("books" and "writers"), and it can even be split into multiple physical files itself.

If we call EMF models "logical resources" and their files on disk "physical resources", we thus distinguish two cases for EMF Compare : 1) one logical resource mapped to one physical resource (no reference, no fragments) and 2) one logical resource mapped to multiple physical resources (an EMF model that references others, and/or is fragmented). Only the first of these two cases is properly handled for now; we've only scratched the surface of the second and handled the most common cases.

Eclipse provides us with a framework to work on logical resources and resource mappings. We've decided to take advantage of these APIs for the case of collaborative work on EMF models. The idea is to make sure that the user never ends up with a corrupt logical model by preventing him from doing anything on a single part of the physical resources' set (or warning him when he does so).

For example, if an EMF model "library.genmodel" references another EMF model "library.ecore", and there are changes in both (the name of an EClass changed, and so did the corresponding GenClass in order to react to that name change), the user should never be allowed to commit only one of those two files : they are part of the same logical model, and committing only one of the two physical files may corrupt the logical model (which is the case in this example). The same applies to comparing either one of the physical files with the repository : as they are part of a set, the whole set should be used when comparing.
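The rule described above can be sketched as a small check, under loud assumptions : this is not the Eclipse Team resource mapping API, file names are plain strings, and the hypothetical `LogicalModelCheck` only follows outgoing references from the file being committed (a real implementation would also need the inverse references, for instance when "library.ecore" is the file selected).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LogicalModelCheck {

    /** Collects every file reachable from 'root' through references : the logical model. */
    static Set<String> logicalModel(String root, Map<String, List<String>> references) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String file = todo.pop();
            if (seen.add(file)) {
                List<String> targets = references.get(file);
                if (targets != null) {
                    for (String target : targets) {
                        todo.push(target);
                    }
                }
            }
        }
        return seen;
    }

    /** Changed files of the logical model that the proposed commit leaves out. */
    static Set<String> missingFromCommit(String root, Map<String, List<String>> references,
                                         Set<String> changed, Set<String> commit) {
        Set<String> missing = new TreeSet<>(logicalModel(root, references));
        missing.retainAll(changed);
        missing.removeAll(commit);
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references =
                Collections.singletonMap("library.genmodel", Arrays.asList("library.ecore"));
        Set<String> changed = new HashSet<>(Arrays.asList("library.genmodel", "library.ecore"));
        Set<String> commit = Collections.singleton("library.genmodel");
        // A non-empty result means the commit should be blocked or extended.
        System.out.println(missingFromCommit("library.genmodel", references, changed, commit));
    }
}
```

With the example above, the check reports "library.ecore" as missing, so the commit of "library.genmodel" alone would be refused or widened to the whole set.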

This support is already implemented and functional with CVS; the EGit plugin does not yet use the necessary APIs, but support is on its way and should be gradually implemented on the way to Juno.

As a result, here is what you would get with the current EMF Compare when trying to commit such linked files to CVS HEAD :

In other words, you are allowed to commit the genmodel file alone... which will prevent anyone retrieving that version from opening it on their side, as it references a Class that does not exist without the change to "library.ecore". By contrast, here is the same flow of action with the logical resources support :

Here, even though we tried to commit "library.genmodel", Eclipse forces us to commit "library.ecore" along with it in order not to corrupt the logical model underneath.

Diagram comparison

EMF Compare supports comparison and merging of any kind of EMF model. It then displays the differences in the form of two trees side by side as illustrated below :

The tree form, however, is not the best-suited representation for all models. When comparing GMF diagrams for example, it would be interesting to display the detected differences directly on the diagram itself :

Support for this graphical representation of the differences is planned for the end of the year. More information regarding this feature (and specifications of what we intend to implement) can be found on the EMF Compare wiki.

UML Compare

Since the first version of EMF Compare, we have faced a number of problems when comparing UML files; problems that just could not be properly handled by the generic approach we take for the comparison.

Most of these problems come from the fact that a "semantic" change in UML is often reflected by a number of "physical" changes in the model. For example, adding an association between two classes results (with Papyrus) in two actual changes in the model :
  • a new element is added (the Association)
  • a new Property is added to the target class
With the Indigo release, we integrated the first version of a UML-specific comparison engine in order to properly detect (and display) the semantic level differences instead of the previous 'physical level' differences.

This first implementation will be greatly improved so as to handle the most common UML diagrams (class, package, use case, sequence... the full list can be found on the wiki page), for all potential changes that can be applied in them.

For example, the Indigo version of the UML comparison engine does not specifically handle message additions, which thus result in seven detected differences :

While the current implementation (which will be available for the Indigo SR1 release later this year) properly handles them and displays a single change :

Indigo is out, time to move on to these exciting new features... Well, almost :). For now, 'tis time to relax a little bit. Sun, beach and beer seem like a perfect combination to that end :).

What's new in EMF Compare

Here we are again, it is almost time for the annual Eclipse release; this year for the sixth consecutive named simultaneous release. EMF Compare has been there since Callisto, the first of those six; and EMF Compare 1.2.0 is now on its way out. So, what has been added to this new release?

Among the usual bug fixes, we have added some noteworthy features :

Difference grouping

EMF Compare can detect a number of differences between models; additions, deletions, moves... It is now possible to "group" these changes by type in order to read the comparison with a little more ease.

New Grouping strategies can also easily be contributed by client plugins; more information can be found on the dedicated wiki page.
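As a rough idea of what grouping by type amounts to, here is a minimal sketch. The `Diff` class and `Kind` enum are hypothetical stand-ins, not EMF Compare's own difference model, and the actual grouping strategies are contributed through the extension described on the wiki rather than written like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class DiffGrouping {

    enum Kind { ADDITION, DELETION, MOVE }

    /** Hypothetical difference : a kind plus the name of the element it concerns. */
    static final class Diff {
        final Kind kind;
        final String element;

        Diff(Kind kind, String element) {
            this.kind = kind;
            this.element = element;
        }
    }

    /** Groups element names by difference kind, keeping the original order inside each group. */
    static Map<Kind, List<String>> groupByKind(List<Diff> diffs) {
        Map<Kind, List<String>> groups = new EnumMap<>(Kind.class);
        for (Diff diff : diffs) {
            groups.computeIfAbsent(diff.kind, kind -> new ArrayList<>()).add(diff.element);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Diff> diffs = Arrays.asList(
                new Diff(Kind.ADDITION, "Book"),
                new Diff(Kind.MOVE, "Shelf"),
                new Diff(Kind.ADDITION, "Writer"));
        System.out.println(groupByKind(diffs));
    }
}
```

Kinds with no difference simply do not appear in the result, which matches the idea of only showing groups that have something in them.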

Difference filtering

It is now possible to filter out some of the differences from the EMF Compare UI. For example, here is the same comparison as above without the "added" elements :

As for grouping, new filtering strategies can be contributed by client plugins. Detailed information on this feature can be found on the wiki.

Textual comparison of attribute values

If a change in an attribute value has been detected by EMF Compare, it will display, by default, a message such as "attribute : in class has changed from to ". This message is hardly readable for long "string" values of attributes. EMF Compare now allows you to get a dialog displaying the textual comparison of those values :

As usual, detailed information can be found on the corresponding wiki page.

UML-specific comparison engine

EMF Compare provides a "generic" comparison engine that works on any EMF model you can find. The disadvantage of a generic engine, though, is that it cannot know of specific needs. UML models, for example, have a number of features that EMF Compare cannot properly handle in a generic manner.

Along with EMF Compare 1.2.0, we now provide a UML-specific comparison engine to take care of that metamodel. For example, here is what the generic engine detects when applying a stereotype to a Class :

And here is what will be detected and displayed by the UML-specific engine :

This UML comparison engine is still in its early stages and will be improved in the months to come, but it is already provided for early adopters. Further technical information about this extension is available on the Eclipse wiki.

Most of these features have been implemented as part of the Modeling Platform working group.

Tuesday, June 7, 2011

Acceleo query cache

Some of our users have been bitten by the fact that Acceleo caches the result of Query invocations, returning the very same result each time a given query is called. These users often came with the same two questions : "Why?" and "Can this cache be disabled in my case?".

The answer to the first has always been and will remain the same : Acceleo is an implementation of the OMG's MOFM2T 1.0 specification, and this specification tells us that
  • A query is required to produce the same result each time it is invoked with the same arguments.
That is the reason we decided to cache the result : in order to return that same result each time the query is called, without re-evaluating it.
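The caching idea can be sketched as plain memoization keyed on the query name and its arguments. This is only an illustration of the principle, not Acceleo's implementation : `QueryCache`, `invoke` and `evaluations` are hypothetical names, and arguments are assumed to have sensible `equals`/`hashCode`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class QueryCache {

    private final Map<List<Object>, Object> results = new HashMap<>();
    private int evaluations = 0;

    /** Evaluates the query body at most once per (name, arguments) combination. */
    Object invoke(String queryName, Function<Object[], Object> body, Object... args) {
        List<Object> key = new ArrayList<>();
        key.add(queryName);
        key.addAll(Arrays.asList(args));
        if (!results.containsKey(key)) {
            evaluations++;
            results.put(key, body.apply(args));
        }
        return results.get(key);
    }

    /** Number of times a query body was actually evaluated. */
    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        QueryCache cache = new QueryCache();
        Function<Object[], Object> upper = a -> ((String) a[0]).toUpperCase();
        System.out.println(cache.invoke("upper", upper, "book"));
        System.out.println(cache.invoke("upper", upper, "book"));
        System.out.println("evaluations: " + cache.evaluations());
    }
}
```

The second call never runs the body again, which is exactly what makes the cache surprising when the body reads state that is not part of its arguments.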

The answer to the second, however, changes with the 3.1.0 release of Acceleo. Even though we strive to be as close as possible to the specification, there are times when caching the query return values is not desirable : it can be really costly memory-wise, the query could be a call to a Java method that has random results, or it could be a query whose return value changes according to variable states that are not passed as parameters... Disabling the cache was possible through programmatic calls in 3.0.2, but we've decided to make that preference available through the UI in 3.1.0 :

The next step is to have the specification evolve in order to be able to disable the query cache "per-query" instead of globally but... that's another story :).

Friday, April 22, 2011

Acceleo syntax coloring

I announced in my latest message that the syntax highlighting colors of Acceleo would soon be configurable. Well, the preference page has now made its appearance and will definitely be available in the 3.1.0M7 build of Acceleo.

If anyone would like to contribute, we're still missing import/export capabilities to make these color themes easier to share between Eclipses and colleagues; we'd gladly accept such contributions :).

Monday, March 28, 2011

Acceleo over the rainbow

We've often had complaints about the default choice of colors we made for the Acceleo template editor :

For example, people who like high contrast themes, or who'd rather develop with a white/green font on a black/gray background, complained that these colors were unusable with such backgrounds. And since we inherit the background color from the default "text editor background" color, this could give awful results :

There were also those who simply didn't like to have "red" text in their code, since this color usually reflects an error of some kind :). As hinted in Stéphane's recent post, we've decided to make these colors customizable in Acceleo 3.1. And even though his "rainbow" theme was a little extreme (just in time for April Fools' Day), it did show a number of the areas that are now individually customizable.

We hope to include import/export facilities for these color schemes, and the rainbow scheme will most likely be included along with the default :).

I know my editor will more or less look like this (but I kind of find blue-ish themes easy on the eye) :

What will yours look like? Look forward to the 3.1M7 build for these options :).