Monday, June 3, 2013

EMF Compare - GMF strikes back

Here we are, EMF Compare 2.1.0 is just around the corner. This was a very exciting, yet very busy year. EMF Compare 2.0 was released last summer, laying out the foundations for all of the cool stuff we wanted to (but could not) include. Now, 2.1.0 comes to fill the gap between the 1.* and 2.* streams, most notably through the re-introduction of support for graphical comparisons.

So what can you really expect from Compare 2.1? Let's skip the project-life details and focus on the cool features that can now be used.

Graphical Comparison

We introduced "preview" support for the graphical display of differences in the project's fifth milestone (M5) back in February. This was one of the biggest features we wanted to polish for the release, and polished it has been :).

This support is quite generic, and it should be able to handle most GMF-based modelers without too much work on their side (a few extensions are needed to tell EMF Compare how to open their diagrams). For now, our primary target has been Papyrus.

Though the differences themselves were all pretty well detected, we were not happy with how they were displayed in this first draft. The decorators we use to tell the users "what changed in there?" were thus our primary focus. Here are a few examples of the differences that can be detected in Papyrus UML diagrams... and how we display them in said diagrams.
  • Adding a Class
    Both "new" and "old" models are displayed, on the left and right sides respectively. The Class "A" has been added : it is highlighted in the left (new) diagram, and a transparent placeholder for its shape is shown over the right (old) diagram.
  • Removing a feature
    Within lists, the placeholder is shown as a single line:
  • And if we need some context to understand the change?
    Some changes cannot be understood if shown alone. For example, cascading differences (delete a package and all of its content) need some context to understand the differences related to said "content". Likewise, if we delete both sides of an association along with the association itself, we'll need contextual information to understand the association difference. This is handled through lighter-colored placeholders.

  • What about conflicts?
    In case of conflicts, we try and highlight all related information on all three sides of the comparison:
Comparing models without identifiers

One of the hardest parts of comparing EMF models is that we need to "match" the elements contained in these models with one another. When we are given two instances of a given model in two different versions, we need a way to tell that some package "library" in the first version is the same element as the package "library" in the second version. Likewise, we need to be able to tell that the attribute "pages" of a class "Book" in the first version is the same as the attribute "length" of the class "Book" in the second version before we can even tell that there is a difference on that attribute (it has been renamed).

When we have identifiers in our model, this is an easy matter: we assume that the identifier of a given object has not changed between the two versions. However, this is not always the case (rarely, in fact). EMF Compare 2.1.0 re-introduces support for such models, computing matches between objects based on their similarity. For example, here is the result of comparing two Ecore files together:
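For adopters of the standalone API, here is a minimal sketch of launching such a comparison while forcing the similarity-based matcher. The class and package names follow the snippets documented on the wiki for the 2.x stream, so treat them as assumptions and check them against the version you actually use:

    import org.eclipse.emf.common.notify.Notifier;
    import org.eclipse.emf.compare.Comparison;
    import org.eclipse.emf.compare.EMFCompare;
    import org.eclipse.emf.compare.match.DefaultComparisonFactory;
    import org.eclipse.emf.compare.match.DefaultEqualityHelperFactory;
    import org.eclipse.emf.compare.match.DefaultMatchEngine;
    import org.eclipse.emf.compare.match.IComparisonFactory;
    import org.eclipse.emf.compare.match.IMatchEngine;
    import org.eclipse.emf.compare.match.eobject.IEObjectMatcher;
    import org.eclipse.emf.compare.match.impl.MatchEngineFactoryImpl;
    import org.eclipse.emf.compare.match.impl.MatchEngineFactoryRegistryImpl;
    import org.eclipse.emf.compare.scope.DefaultComparisonScope;
    import org.eclipse.emf.compare.scope.IComparisonScope;
    import org.eclipse.emf.compare.utils.UseIdentifiers;

    public class CompareWithoutIdentifiers {

        /** Compares "left" and "right" (and optionally "origin") without relying on XMI IDs. */
        public static Comparison compare(Notifier left, Notifier right, Notifier origin) {
            // Tell the default match engine to ignore identifiers and use content similarity.
            IEObjectMatcher matcher = DefaultMatchEngine.createDefaultEObjectMatcher(UseIdentifiers.NEVER);
            IComparisonFactory comparisonFactory =
                    new DefaultComparisonFactory(new DefaultEqualityHelperFactory());

            IMatchEngine.Factory matchEngineFactory = new MatchEngineFactoryImpl(matcher, comparisonFactory);
            matchEngineFactory.setRanking(20); // take precedence over the default factory

            IMatchEngine.Factory.Registry registry = MatchEngineFactoryRegistryImpl.createStandaloneInstance();
            registry.add(matchEngineFactory);

            // "origin" may be null, in which case this is a two-way comparison.
            IComparisonScope scope = new DefaultComparisonScope(left, right, origin);
            return EMFCompare.builder().setMatchEngineFactoryRegistry(registry).build().compare(scope);
        }
    }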
Enhanced user experience

The amount of data we compute is quite large, reflecting the accuracy we aim for in the comparison, and the number of differences between two versions of the same model can be daunting. We strove to improve the comparison UI in order to provide much more precise and intelligible information. We've used two means to that end, both of which can be extended by clients of the API.
  • Grouping differences together
    By default, EMF Compare does not group differences, and simply displays them as they've been detected:
    One of the options we provide by default lets you group these differences according to their originating side (in the case of three-way comparisons, e.g. when comparing with a remote version under version control), along with a special group for the conflicts, if any (a small sketch of this bucketing logic follows the list):
  • Filtering differences out of the view
    A second option (and of course, both can be combined) to limit the amount of visible information is to filter out differences that could be considered "noise". For example, EMF Compare detects all differences within the containment tree: if the Class "Book" has been removed, then of course its attribute "pages" has been removed, and in turn the "type" of that attribute has been unset. That is three differences, two of which merely result from the first. By default, EMF Compare will not display these "resulting" differences, focusing on the "root" one only:
    However, they are still computed, and they are still there in the comparison. We call these "cascading" differences, and users can choose to have them displayed by unticking the associated filter:
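To give an idea of what the "by side" grouping amounts to, here is a small sketch of the bucketing logic it performs. This is not the UI extension itself, merely an illustration written against the core Diff accessors (getSource(), getConflict()):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import org.eclipse.emf.compare.Comparison;
    import org.eclipse.emf.compare.Diff;
    import org.eclipse.emf.compare.DifferenceSource;

    public class GroupBySide {

        /** Buckets all detected differences by originating side, with conflicts set apart. */
        public static Map<String, List<Diff>> group(Comparison comparison) {
            Map<String, List<Diff>> groups = new LinkedHashMap<String, List<Diff>>();
            groups.put("Conflicts", new ArrayList<Diff>());
            groups.put("Left side (local)", new ArrayList<Diff>());
            groups.put("Right side (remote)", new ArrayList<Diff>());

            for (Diff diff : comparison.getDifferences()) {
                if (diff.getConflict() != null) {
                    groups.get("Conflicts").add(diff);
                } else if (diff.getSource() == DifferenceSource.LEFT) {
                    groups.get("Left side (local)").add(diff);
                } else {
                    groups.get("Right side (remote)").add(diff);
                }
            }
            return groups;
        }
    }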

This has already become too long of a post (kudos if you read all the way down here ;)). Anyone interested in the full list of enhancements and highlights of this release can find it here on the project's wiki, with a little more detail.

Thursday, February 14, 2013

EMF and Graphical comparison - what's in the pipeline?

A lot of things are being worked on for EMF Compare, and while we're polishing the core and refining the user interface, a few new features have been included in the latest milestones. The biggest focus is on the graphical (GMF) comparison and the display of differences:

Graphical Comparison

It's been a while since I mentioned that graphical comparison with EMF Compare 2 was coming. Well, the feature has been included as a preview in the M5 milestone of EMF Compare (which can be installed from this update site). We can now properly detect and merge graphical differences in GMF models.

However, we have now initiated a more thorough reflection on how to display these graphical differences. Our preliminary implementation is not satisfactory:
Should we use a custom color code or reuse the "team" colors (those that show on the icons' overlays)? Should we draw a rectangle around the differences or decorate the existing figures' borders? How can we best show that an element has been deleted?

We are trying to determine how these differences would be best displayed... and your opinion matters :). If you think you can help with this reflection, or wish to share any thoughts on the subject, the specification of what we expect the graphical display to look like can be found on the wiki. We've also initiated a thread on the compare forum for the discussion to take place, for anyone interested.

Two special integration features for Papyrus and Ecoretools are also contributed, though these only include means to detect "label" differences and may be temporary: labels are computed when displayed, potentially from many distinct other features... thus detecting and merging them is very costly. For the technically inclined: we have to create off-screen edit parts and compare the labels textually, and merging requires calls to the direct edit tool when there is one. These integration features may not be kept in the final release.

User Interface

The second most visible change coming with this M5 milestone of EMF Compare is a deep modification of how differences are displayed in the structural view. In short, we previously had a very long sentence that tried to describe the changes in detail:
We have reduced this to the bare minimum of useful information in this new version:
We expect that the simple label, along with the change icon (remote, local, conflict...), will be enough to understand what happened in the model. The name of the feature that actually changed and the type of difference we detected are shown as additional information.

The content viewers (the two/three panes displayed in the bottom half of the comparison editor) are also expected to change before Kepler is live. Namely, we are changing the way we show the differences in their context. Currently, the only context we offer is the list of siblings of the changed element:
For containment changes, this is quite disturbing, and we are changing that to display the whole tree instead (along with the other containment changes detected during this comparison):


Extensibility

EMF Compare is designed and implemented as a framework, and we are striving to provide all the extensibility points that could be necessary to tweak, customize or replace the comparison and merging processes. I won't go into much detail here; more information on each possibility will be added to the wiki, and questions can be asked on the forum.
  • Customize the comparison process: Most steps of the comparison process can be modified, be it the matching, differencing, detection of equivalences, detection of conflicts, resolution of the logical model...
  • Custom mergers: We now provide an extensible merging framework so that extenders (or users) can alter the default behavior or contribute their own merging policy for either default or custom differences (a small registration sketch follows this list).
  • Filtering or grouping differences: Differences displayed in the structural view can be filtered and/or grouped together. A number of default options are provided, but new ones can be added seamlessly through extension points.
  • Customized user interface: There are a number of entry points to customize the user interface of EMF Compare. For example, the graphical comparison we were discussing above is entirely contributed to the EMF Compare UI as an extension. Clients can also tweak the labels and icons of the differences, contribute new toolbar actions, and more. This is a part that still lacks good documentation; feel free to get in touch through the forum if you need more details.
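To give an idea of the merging framework mentioned above, here is a minimal sketch of registering a custom merger in a standalone registry and running a batch merge with it. The "name" attribute merger is purely hypothetical, and the registry and batch merger class names follow the 2.x merge APIs as documented on the wiki, so double-check them against your version:

    import org.eclipse.emf.common.util.BasicMonitor;
    import org.eclipse.emf.compare.AttributeChange;
    import org.eclipse.emf.compare.Comparison;
    import org.eclipse.emf.compare.Diff;
    import org.eclipse.emf.compare.merge.AttributeChangeMerger;
    import org.eclipse.emf.compare.merge.BatchMerger;
    import org.eclipse.emf.compare.merge.IBatchMerger;
    import org.eclipse.emf.compare.merge.IMerger;

    public class CustomMergerSketch {

        /** Hypothetical merger that only handles changes to a "name" attribute. */
        static class NameChangeMerger extends AttributeChangeMerger {
            @Override
            public boolean isMergerFor(Diff target) {
                return target instanceof AttributeChange
                        && "name".equals(((AttributeChange) target).getAttribute().getName());
            }
            // A real merger would also customize the actual copy behavior here.
        }

        public static void mergeAllLeftToRight(Comparison comparison) {
            IMerger.Registry registry = IMerger.RegistryImpl.createStandaloneInstance();
            NameChangeMerger nameMerger = new NameChangeMerger();
            nameMerger.setRanking(20); // outrank the default attribute merger for "name" changes
            registry.add(nameMerger);

            IBatchMerger batchMerger = new BatchMerger(registry);
            batchMerger.copyAllLeftToRight(comparison.getDifferences(), new BasicMonitor());
        }
    }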

Friday, November 30, 2012

Diagram Comparison with EMF Compare 2

We have had a number of questions regarding the diagram comparison support in EMF Compare 2. This is something that was (somewhat) provided with EMF Compare 1.3, but that we have yet to port to the new version of the APIs.

Well, the work is ongoing and preliminary support for diagram comparison is available in the current integration builds. The port was done in the context of the modeling platform working group and mainly targets Papyrus, though ecoretools diagrams (and plain GMF diagrams) should also be supported.

We have not fully settled on how to represent deletions on a diagram, but this otherwise begins to take shape:
Do note that this is still very rough, and merging some of these differences (mainly "label changes") graphically is still problematic, but we expect this to be fine-tuned for Kepler's release :).

Stay tuned!

Wednesday, October 17, 2012

EMF Compare scalability - take two

Exactly one year ago, I published here the results of measuring and enhancing EMF Compare 1.3's scalability, in which I detailed how we greatly improved comparison times and how we had observed that EMF Compare was a glutton in terms of memory usage. At the time, we isolated a number of axes for improvement... and they were part of what pushed us toward a new major version, EMF Compare 2.0.

Now that 2.0 is out of the oven and we are rid of the 1.* stream's limitations, it was time to move on to the next step and make sure we could scale to millions! We continued the work started last year as part of the modeling platform working group on the support for logical resources in EMF Compare.

The main problem was that EMF Compare needed to load the whole logical model two (or three, for comparisons with a VCS) times in order to compare it. For large models, that quickly added up to a huge amount of required heap space. The idea was to determine beforehand which of the fragments really needed to be loaded: if a model is composed of 1,000 fragments and only 3 out of that thousand have changed, it is a huge waste of time and memory to load the 997 others.

To this end, with EMF Compare 1.3 we tried to make use of Eclipse Team's "logical model" APIs. This was promising (as shown in last year's post about our performance at the time), but limited, since the good behavior of this API depends on the VCS providers properly using it. In EMF Compare 2, we decided to circumvent this limitation with our own implementation of logical model support (see the wiki for more on that).

As a quick explanation, an EMF model can be seen as a set of elements referencing each other. This is what we call a "logical model":
However, these elements do not always live in memory. When they are serialized on disk, they may be saved into a single physical file, or split across multiple files, as long as the references between these files can be resolved:
If we stay at the file level, this logical model can be seen as a set of 7 files:
Now let's suppose that we are trying to compare a modified copy (left) of this model with a remote version (right). This requires a three-way comparison of left, right and their common ancestor, origin. That would mean loading all 21 files into memory in order to recompose the logical model. However, suppose that only some of these files really did change (blue-colored hereafter):
In such a case, what we need to load is actually a small part of the 21 total files. For each of the three sides of the comparison, we only need to load the files that can potentially show differences:
This means that we only need to load 9 out of the 21 files, or roughly 57% less memory than what EMF Compare 1 needed (if we consider all files to be of equal "weight"). Just imagine if the whole logical model were composed of a thousand fragments instead of 7: we would load only 9 files out of the total 3,000!
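The gist of it, expressed over plain EMF APIs rather than EMF Compare's actual implementation (the file names and the origin of the "changed" set are hypothetical; the real mechanism is described on the wiki), is simply to demand-load the changed fragments and leave the rest on disk:

    import java.util.LinkedHashSet;
    import java.util.Set;

    import org.eclipse.emf.common.util.URI;
    import org.eclipse.emf.ecore.resource.Resource;
    import org.eclipse.emf.ecore.resource.ResourceSet;
    import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
    import org.eclipse.emf.ecore.xmi.impl.XMIResourceFactoryImpl;

    public class PartialModelLoading {

        /**
         * Loads only the fragments known to have changed on one side of the comparison.
         * Cross-references towards untouched fragments remain unresolved proxies until
         * navigated, which is precisely what keeps the memory footprint low.
         */
        public static ResourceSet loadChangedFragments(Set<URI> changedUris) {
            ResourceSet resourceSet = new ResourceSetImpl();
            // Standalone setup: let any extension be handled by the XMI factory.
            resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
                    .put(Resource.Factory.Registry.DEFAULT_EXTENSION, new XMIResourceFactoryImpl());
            for (URI uri : changedUris) {
                resourceSet.getResource(uri, true); // demand-load this fragment only
            }
            return resourceSet;
        }

        public static void main(String[] args) {
            // Hypothetical example: only 3 of this side's 7 fragments have changed.
            Set<URI> changed = new LinkedHashSet<URI>();
            changed.add(URI.createFileURI("/models/library.xmi"));
            changed.add(URI.createFileURI("/models/fragment2.xmi"));
            changed.add(URI.createFileURI("/models/fragment5.xmi"));

            ResourceSet leftSide = loadChangedFragments(changed);
            System.out.println("Loaded " + leftSide.getResources().size() + " of 7 fragments");
        }
    }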

With these enhancements, we decided to run the very same tests as last year with EMF Compare 1.3. Please take a look at my old post for the three sample models' structure and the test modus operandi; I will only paste the summary tables here as a reminder:

The models' structure:

                        Small    Nominal      Large
Fragments                  99        399        947
Size on disk (MB)        1.37       8.56       49.9
Packages                   97        389        880
Classes                   140        578      2,169
Primitive Types           581      5,370     17,152
Data Types                599      5,781     18,637
State Machines             55        209      1,311
States                    202        765     10,156
Dependencies              235      2,522      8,681
Transitions               798      3,106     49,805
Operations              1,183      5,903     46,029
Total element count     3,890     24,623    154,820

And the time and memory required to compare each of the model sizes with EMF Compare 1.3:

                        Small    Nominal      Large
Time (seconds)              6         22        125
Maximum heap (MB)         555      1,019      2,100

And without further ado, here is how these figures look with EMF Compare 2.1:

                        Small    Nominal      Large
Time (seconds)              5         13         48
Maximum heap (MB)         262        318        422

In short, EMF Compare now scales much more smoothly than it did with the 1.* stream, and it requires a lot less memory than it previously did. Furthermore, the one remaining time sink is the I/O and model parsing activity; the comparison itself is reduced to only a few seconds.

For those interested, you can find a much more detailed description of this work and the decomposition of the comparison process on the wiki.

Friday, September 14, 2012

EMF Compare 2 is available

On behalf of the team, I am very proud to announce that the 2.0 release of EMF Compare is now available and can be downloaded from its update site. This release is compatible with Eclipse 3.5, 3.6, 3.7, 3.8 and 4.2.

This new version is an overhaul of the project and as such contains too many enhancements to really list. You can refer to the project plan for an overview of the major themes addressed by this new release, or to my previous post for more information on why we decided to rewrite EMF Compare.

API compatibility between the 1.3 and 2.0 releases is not assured. The new APIs provided with 2.0 were developed to be simpler and more intuitive. A migration guide is not yet available, but we will provide all the help we can to bridge the gap between the two streams' APIs and ease the migration to the new and improved Compare.

Do not hesitate to drop by on the forum for any question regarding this release!

Thursday, August 2, 2012

EMF Compare 2.0 on the launching pad!

This year has been a really important one for us at Obeo, a year spent thinking about our current products and how to improve them. Eclipse Juno was a big investment for us, with a lot of modeling projects in the release train: Acceleo, EMF Compare, EEF, ATL, Mylyn Intent...

EMF Compare is the one that got us thinking the most, and we ended up developing two versions in parallel during the development phase of Juno. The first is the version we actually included in the release train, 1.3.1, which is now the maintenance stream.

This version includes a number of new developments and enhancements. Most notably, the detection and merging of differences on values ordering has been greatly improved, and the merging in general was one of our major endeavors this year, fixing thousands of corner cases. Please refer to the release highlights for the noteworthy changes of this version.

However, the 6 years of development spent on EMF Compare allowed us to identify a number of limitations in the architecture and overall design of the project, most of which could not be worked around or redesigned without breaking the core of the comparison engine. We needed a complete overhaul of the project, and thus the second version we've been developing over the past year is 2.0.0.

Version 2 was started as part of the work sponsored in the context of the Modeling Platform Working Group. Its two main objectives are to:
  • lift the architectural and technical limitations of EMF Compare so that we can reduce its memory footprint while improving its overall performance, and
  • simplify the API we offer to our adopters so that launching comparisons or exploiting the difference model can be done more effectively.
We're also taking this opportunity to further our integration with the "standard" Eclipse Compare UI, making sure to reuse it everywhere we can. This means that the icon decorators now show exactly as they would for text comparisons, the behavior when switching viewers on selection is now closer to what Eclipse Compare does, and so on.

Version 2.0.0 of EMF Compare will be out in the first half of August. It focuses on the comparison of models presenting IDs of some sort (XMI ID, technical ID, ...). The design and architecture have both been reworked with all the experience accumulated on the 1.* stream, greatly improving the user experience, performance, memory footprint and reusability of the tool. We've submitted a talk for EclipseCon Europe 2012 focusing on the intended improvements of this version. Do not hesitate to comment there if you're interested in it!

Version 2.1.0 is scheduled shortly after that, sometime during September. That new version will leverage the new architecture to provide the scalability improvements mandatory for very large model comparisons. In short, EMF Compare will be able to run with time and memory footprints that depend on the number of actual differences instead of on the input models' size.

Friday, January 20, 2012

Traceability test case : UML to Java generation

The next version of Acceleo will introduce a number of improvements to its existing features; one of the most important, the automatic builder, is being entirely rewritten in order to get rid of legacy code and improve the user experience. This also comes with a much better experience for users who need to build their Acceleo generators standalone, through Maven or Tycho. More news on this is available on the bugzilla and in Stephane's latest post.

One of the least visible features, yet one I find among the most interesting aspects of the Acceleo code generator, is the traceability between the generated code, the model used as input to the generation, and the generator itself. Basically, you always know where that esoteric line of code came from: which part of the generation template generated it, and which model element triggered its generation.

This feature has seen a series of improvements as we used it intensively with the UML to Java generator. We are now confident that we can record and display accurate information even for some of the most complex use cases. This generator isn't the most complex one we could write with Acceleo, but it is quite complete nonetheless. Here are some examples of what traceability can provide to architects developing their code generators:

Determine which model element triggered the generation of a line of code


On the left-hand side, a file that has been generated. On the right-hand side, an editor opened as a result of using the "open input" action: the model is opened and the exact element that was used to generate that part of the code is selected.

Find the part of a generator that created a given part of the code

On the left-hand side, the same generated file as above. On the right-hand side, the result of using the "open generator" action: the Acceleo generator that produced the selected part of the code is opened, with the exact source expression selected.

Real-time Synchronization



These are but a few of the features that can be derived from the synchronization between code, model, and generators. Some more examples include: previewing the result of a re-generation, incremental generation, round-tripping (updating the input model according to manual changes in the generated code)...

Most of these features are best seen in video to get a real idea. If you want to see some of them in action, more Flash videos of what traceability can do for you are available on the Obeo network (though a free registration is required).