
The Design of Optional

In my last post I promoted using Java 8’s new type Optional nearly everywhere as a replacement for null.

As it turns out, this puts me at odds with the expert group which introduced the type. This made me curious, so I read up on its creation and decided to share my findings here.

Update

This is part of a series of posts about Optional.

Overview

The main part of this post tries to give a summary of the process which led to the introduction of Optional in Java 8. I will let the experts speak for themselves by quoting them wherever possible. I hope to properly convey the discourse, but as the discussions were frequent, lengthy and sometimes controversial and heated, this is no trivial task.

At the end I will contrast the expert group’s reasoning with my own.

JSR 335

The Java Specification Request 335 dealt with Lambda Expressions for the Java™ Programming Language. Its goal was:

Extend the Java language to support compact lambda expressions (closures), as well as related language and library features to enable the Java SE APIs to use lambda expressions effectively.

It was this context which led to the inclusion of Optional in Java 8.

Among the members of the expert group for JSR 335 who were strongly involved in the multiple discussions about Optional were Brian Goetz, Doug Lea and Rémi Forax. Chiming in were known experts like Joshua Bloch, Tim Peierls and others.

The archive of the mailing list lambda-libs-spec-experts is the source for this post. It can be found here. As it is plain text, all layout details like bold face or links were added by me.

The Road to Optional

(Image: “How to create Optional?” – published by Raymond Bryson under CC-BY 2.0)

First Blood

The prize for first mentioning Optional (back in September 2012) seems to go to Rémi Forax, although it was Doug Lea who CC’ed the group’s mailing list. In that mail he gave a quick overview of the reasoning behind the possible need for a new type:

[…] There has been a lot of discussion about [Optional] here and there over the years. I think they mainly amount to two technical problems, plus at least one style/usage issue:

  1. Some collections allow null elements, which means that you cannot unambiguously use null in its otherwise only reasonable sense of “there’s nothing there”.
  2. If/when some of these APIs are extended to primitives, there is no value to return in the case of nothing there. The alternative to Optional is to return boxed types, which some people would prefer not to do.
  3. Some people like the idea of using Optional to allow more fluent APIs.
    As in
    x = s.findFirst().or(valueIfEmpty)

    vs
    if ((x = s.findFirst()) == null) x = valueIfEmpty;

    Some people are happy to create an object for the sake of being able to do this. Although sometimes less happy when they realize that Optionalism then starts propagating through their designs, leading to Set<Optional<T>>’s and so on.

It’s hard to win here.

Doug Lea – Sep 14 2012

(By the way, if there were a motto for the discussions about Optional, the last sentence would be it.)

Note that Optional is solely described as a return type for queries to a collection, which was discussed in the context of streams. More precisely, it was needed for those terminal operations which cannot return a value if the stream is empty. (Currently those are reduce, min, max, findFirst and findAny.)

It is hard to say whether that shaped the future discourse or just reflected the opinions already held. But it came to be the sole context in which Optional was discussed: as a type for return values.

Endless Discussions

From then on Optional created long exchanges and opposing sides every time it was mentioned. And not just two sides, either:

Boy, it seems that one can’t discuss Optional in any context without it generating hundreds of messages.

Whatever we do here is a compromise between several poles whose proponents hold very strong opinions. There are those that really want Elvis instead; there are others who feel that a box-like class for Optional is a hack when it really should be part of the type system. Neither group is going to get what they want here; we can compromise and make everyone a little unhappy, or we can do nothing and make everyone unhappy (except that they will still hold out vain hope for their pet feature in the future.)

Brian Goetz – Jun 5 2013

An often voiced opinion not mentioned in that particular quote was to forgo Optional completely.

In fact, we don’t need Optional at all, because we don’t need to return a value that can represent a value or no value, the idea is that methods like findFirst should take a lambda as parameter letting the user to decide what value should be returned by findFirst if there is a value and if there is no value.

Remi Forax – Mar 6 2013

So terminal operations which cannot return a value if the stream is empty should in that case return a user-provided value. So instead of this:

Optional<T> findFirst();

it would be one (or both) of these:

// return a fixed default value if necessary
T findFirst(T defaultValue);

// create a default value if necessary
T findFirst(Supplier<T> defaultValue);

Some agreed…

I am for removing [Optional] […] if it doesn’t have nearly the same functionality as the Scala Option. The way Optional is written right now I would tell people not to use it anyway and it would just be a wart on this API.

Sam Pullara, Mar 6 2013

… some didn’t …

[Returning the user provided default value] prevents people from distinguishing between a stream that is empty and a stream containing only the “orElse” value. Just like Map.get() prevents distinguishing between “not there” and “mapped to null.”

Brian Goetz, Mar 6 2013

The last sentence hints at an often cited case: the fact that Map.get(Object key) can return null, which can either mean that the map contains the pair (key, null) or that it does not contain the key. The two cases cannot easily be distinguished by the caller. Everyone on the list agreed that this was a serious shortcoming of the Map API. Most noted that they would have liked all collections to forbid null as a value (like many Guava collections do) so returning null could always signal “nothing there”.
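To make the ambiguity concrete, here is a contrived snippet (names made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class MapGetAmbiguity {

	public static void main(String[] args) {
		Map<String, String> map = new HashMap<>();
		map.put("presentKey", null);

		// both calls return null although only one key is in the map
		String mappedToNull = map.get("presentKey");
		String notThere = map.get("absentKey");

		// telling the cases apart requires a second lookup
		System.out.println(map.containsKey("presentKey")); // true
		System.out.println(map.containsKey("absentKey")); // false
	}
}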

Another opinion in the debate was to offer both variants. The users would then be able to decide whether they want to use Optional or not.

People wanting to avoid Optional can then get all of the derived versions (allMatch, plain findAny, etc) easily enough.
Surprisingly enough, that’s the only missing feature that would otherwise enable a completely Optional-free usage style of the Stream API.

Doug Lea, Mar 6 2013

But not everyone agreed:

[…] the foremost reason I see for not allowing an Optional-free usage style is that people will adopt it rather than use Optional. They will see it as a license to put null everywhere, and they’ll get NPEs way downstream and blame it on Java.

Tim Peierls, Mar 6 2013

A survey about whether the not-Optional-bearing-variants should be added came to a tie of 3 in favor, 3 opposed and 1 abstained. But it seemed that some voters had the misconception that they could still get rid of Optional which made the result unreliable. Strangely enough, the survey was neither mentioned again nor repeated (or did I overlook something?).

Convergence

But the discussion slowly converged: Optional would be the return value of those stream operations which needed it (and there would be no Optional-free variant). It would contain some methods for fluent usage at the tail end of stream operations (like ifPresent, orElse, filter and map) but not much more. For example, it would not be embedded into the collections framework (by implementing Iterable) like Scala’s Option.
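A sketch of that fluent usage at the tail end of a stream pipeline (example made up):

import java.util.Arrays;
import java.util.List;

public class FluentOptional {

	public static void main(String[] args) {
		List<String> words = Arrays.asList("lambda", "stream", "optional");

		// 'findFirst' returns an 'Optional', whose methods are used
		// fluently to transform the value and provide a fallback
		String result = words.stream()
				.filter(word -> word.startsWith("opt"))
				.findFirst()
				.map(String::toUpperCase)
				.orElse("NO MATCH");

		System.out.println(result); // prints "OPTIONAL"
	}
}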

Simplicity

The reason for not adding more functionality was a broad consensus that Optional should be kept simple and not support too many different use cases. Especially its use in collections should be discouraged:

Optional should be (and currently is) a very limited abstraction, one that is only good for holding a potential result, testing for its presence, retrieving it if it is present, and providing an alternative if not. We should resist the temptation to make it into something more or make it into a knock-off of the similar Scala type.

Tim Peierls, Mar 6 2013

Others feared that any discouragement would be ignored:

I don’t like it; I think it’s going to result in things like:

Map<String,Optional<List<Optional<String>>>>

David M. Lloyd, Sep 14 2012

Which was answered:

Only if you really work hard at obfuscating your code. I’ve been using a version of Optional for about a year, and the only time I had reason to use Optional as a type parameter was Callable<Optional<Result>>, which conveys exactly what I mean: “Might have a result when it returns.”

Tim Peierls, Sep 14 2012

Even equals/hashCode were only added to prevent user rage:

We talked to Kevin [Kevin Bourrillion from Google – member of the expert group] about their experiences with Guava’s Optional. His response was that they felt reasonable hashCode/equals methods were obligatory and without them users would, if not immediately then eventually, curse us for not providing them. The implementations are added with grudging reluctance.

Mike Duigou, Mar 8 2013

Value Type

Besides the goal to limit Optional’s use, there was another reason to keep the class simple:

Here’s another reason to stay lean: The more limited Optional is, the easier it will be some day to optimize away the extra object. Make it a first class participant and you can kiss those optimizations goodbye.

Tim Peierls, Feb 26 2013

What Tim Peierls is referring to is the concept of value types, which will very likely be introduced in some future version of Java. The gross simplification of that idea is that the user can define a new kind of type, different from classes and interfaces. Their central characteristic is that they will not be handled by reference (like classes) but by value (like primitives). Or, as Brian Goetz puts it in his introductory article State of the Values:

Codes like a class, works like an int!

That Java would likely evolve that way led Doug Lea to write this:

Note that Optional is itself a value-like class, without a public constructor, just factory methods.

The factory methods do not even guarantee to return unique objects. For all that the spec does and should say, every call to Optional.of could return the same Optional object. (This would require a magical implementation, but still not disallowed, and variants that sometimes return the same one are very much possible.)

This means that there are no object-identity-related guarantees for Optionals.

myOptional1 == myOptional2 tells you nothing, and synchronized(myOptional) has unpredictable effects — it might block forever.

Doug Lea – Oct 19 2013

This led to another lengthy discussion about how to inform the user about that. In the end, Optional’s (and other classes’) Javadoc contained a small remark that it is a value-based class, which includes a link to the term’s definition. That definition contains this warning:

A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism. Use of such identity-sensitive operations on instances of value-based classes may have unpredictable effects and should be avoided.

This defines a small battery of things which must not be done on those classes. They would most likely work for now, but might break in future versions.

[The users] are more likely to behave, but the special pleading has two motivations […]:

  • discourage users from doing wrong things
  • provide cover so that when we break code that does wrong things, they were adequately warned

Brian Goetz – Oct 23 2013

And with that outlook the discussions about Optional ended. At least for the JSR; judging from the opinions out there, I’d say it just broke free from that mailing list…

Reflection

In short, the expert group clearly wishes us to only use Optional as a type for return values, whereas I recommend using it in other situations as well.

But I think we share some common ground. I only compared Optional to null and deliberately ignored the design decisions which led to “something not being there” even having a representation (either null or an empty Optional). In many cases the necessity to represent such a thing can be avoided with a different, often clearer design. A path which should definitely be taken! And I think this is what the expert group is trying to accomplish: have the programmer look for a better solution than sprinkling Optional everywhere.

There might be situations, though, where such a design is not feasible for whatever reason. In those and only those, I recommend using Optional instead of null.

Following this principle will lead to Optionals being mostly created as return values. But I see no reason to reflexively and immediately extract the actual value (or use the default value). Especially not if the absence of a value might change the logical flow at some point in the future. The Optional box should then be handed over as is (again: if no other way exists). Another reason would be that the final use of the value does allow null (e.g. an argument to a library call). In that case the Optional should be handed around until the very last moment to avoid dealing with null.

But I share the expert group’s opinion about collections over Optionals: don’t do it! Extract the values and deal with missing ones separately (sure you can’t just ignore them?). Google has a quick guide on how to handle null in specific collections and the same concepts apply here.

So be careful with Optional, don’t let it be your Bolivian Tree Lizard, but use it if you must!

Want to join the discussion about Optional? Comment below or answer with a post and ping back.

Sigh, why does everything related to Optional have to take 300 messages?

Brian Goetz – Oct 23 2013



Why Isn’t Optional Serializable?

Java 8’s new type Optional was met with different reactions and one point of criticism is that it isn’t serializable. Let’s have a look at the reasons for that. A future post will then show how to overcome that fact if it is really necessary.

Update

This is part of a series of posts about Optional.

Overview

The post will examine the reasons for not making Optional serializable.

Shouldn’t Optional Be Serializable?

(Image: “Why isn't Optional serializable?” – published by Horia Varlan under CC-BY 2.0)

This question was asked back in September 2013 on the jdk8-dev mailing list. In July 2014 a similar question was posted on StackOverflow.

That Optional is not serializable is also noted as a disadvantage of the new type here (especially in the comments) and here.

To establish the facts: Optional does not implement Serializable. And it is final, which prevents users from creating a serializable subclass.

So why isn’t Optional serializable?

Return Type

As I described in my summary of the process which introduced Optional into Java, it was designed as a return type for methods. The caller of such a method is expected to immediately check the returned instance. If the value is present, it should be retrieved; if it is not, a default value should be used or an exception should be thrown.

Used like that, instances of Optional have an extremely short life expectancy. Somewhat simplified: they are created at the end of some method’s call and discarded a couple of lines later in the calling method. Serializing them seems to offer little in this scenario.
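A sketch of that short-lived usage (example made up):

import java.util.stream.Stream;

public class OptionalReturnIdiom {

	public static void main(String[] args) {
		// the 'Optional' only lives for the length of this statement:
		// returned by 'findFirst' and immediately unwrapped with 'orElse'
		String name = Stream.of("Doug", "Brian", "Rémi")
				.filter(n -> n.startsWith("B"))
				.findFirst()
				.orElse("nobody");

		System.out.println(name); // prints "Brian"
	}
}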

The discussion which followed the above question on the mailing list contains a number of answers by those involved in creating Optional. And indeed, their replies follow this argumentation:

There is a good reason to not allow Optional to implement Serializable, it promotes a bad way to use Optional […]

Remi Forax

Optional is nice from the API point of view, but not if you store it in a field.
If it’s not something that should be stored in field, there is no point to make it serializable.

Remi Forax

Using Optional as a field type doesn’t seem to offer much.
[…]
Concern that Optional would be misused in other use cases threatened to derail it’s inclusion in Java entirely! Optional is being added for the value it offers in “fluent” sequences of statements. In this context use of Optional as a visible type or for serialization isn’t relevant.

Mike Duigou

The JSR-335 EG felt fairly strongly that Optional should not be on any more than needed to support the optional-return idiom only. (Someone suggested maybe even renaming it to OptionalReturn to beat users over the head with this design orientation; perhaps we should have taken that suggestion.) I get that lots of people want Optional to be something else. But, its not simply the case that the EG “forgot” to make it serializable; they explicitly chose not to.

Brian Goetz

(Sidenote: The proposal to change the name was made by Stephen Colebourne on the Open JDK mailing list.)

These opinions are often reproduced when others answer questions about Optional. A good example for that is the answer given on StackOverflow by Stuart Marks. It’s an excellent summary of the expert group’s intentions about the use of Optional.

But I’m not sure whether that’s the whole picture. First of all, the same arguments apply to equals and hashCode. Arguably those methods are even worse, because they make it possible to effectively use Optional in collections, something which the EG wanted to avoid. Still, they were added to Optional without much ado.

It also stands to reason that not supporting serialization does not help very much in preventing misuse (as seen by the EG):

[…] Keeping Optional non-serializable doesn’t do much to prevent that from happening. In the vast majority of cases, Optional will be used in a non-serialized context. So, as preventative measures go, this isn’t a very effective one.

Joseph Unruh

Finally, I couldn’t find any discussion on whether Optional should be serializable on the lambda-libs-spec-experts mailing list. Something you would expect if it were decided for preventive purposes.

Lock-In by Serialization

There exists a general argument against serialization from the JDK-developers’ point of view:

Making something in the JDK serializable makes a dramatic increase in our maintenance costs, because it means that the representation is frozen for all time. This constrains our ability to evolve implementations in the future, and the number of cases where we are unable to easily fix a bug or provide an enhancement, which would otherwise be simple, is enormous. So, while it may look like a simple matter of “implements Serializable” to you, it is more than that. The amount of effort consumed by working around an earlier choice to make something serializable is staggering.

Brian Goetz

This certainly makes sense. Joshua Bloch’s excellent book Effective Java (2nd Edition) contains a whole chapter about serialization. Therein he describes the commitment a developer makes when she declares a class serializable. To make a long story short: it’s a big one!

I have no overview of the percentage of serializable classes in the JDK and how that share changed with Java 8. But, to pick an example, it looks like most of the classes from the new date/time API are serializable.

Value Types

The next big change to the language casts its shadow (and it can already be seen in Java 8): value types. To repeat what little I already wrote about them:

The gross simplification of that idea is that the user can define a new kind of type, different from classes and interfaces. Their central characteristic is that they will not be handled by reference (like classes) but by value (like primitives). Or, as Brian Goetz puts it in his introductory article State of the Values:

Codes like a class, works like an int!

As described above, value types are not handled by reference, which means they have no identity. This implies that no identity-based mechanism can be applied to them. Some such mechanisms are locking (the lock has to be acquired and released on the same instance), identity comparison (with ==, checking whether the references point to the same address) and – as Brian Goetz and Marko Topolnik were kind enough to explain to me – serialization.

In Java 8 value types are preceded by value-based classes. Their precise relation in the future is unclear but it could be similar to that of boxed and unboxed primitives (e.g. Integer and int). Additionally, the compiler will likely be free to silently switch between the two to improve performance. Exactly that switching back and forth, i.e. removing and later recreating a reference, also forbids identity-based mechanisms to be applied to value-based classes. (Imagine locking on a reference which will be removed by the compiler. This might either make the lock meaningless or lead to a deadlock.)

To allow that change in the future, value-based classes already have similar limitations to those of value types. As their documentation says:

A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism.

You see serialization in there? Now, guess what! Optional is a value-based class!

And it’s a good thing, too, because it might lead to the compiler being allowed to optimize code using Optional to a degree that makes its impact negligible even in high performance areas. This also explains the implementation of equals and hashCode. Both are central to the definition of value-based classes and are implemented according to that definition.

(I assume that the limitation about serialization is a safety net. The current concept of value types already alludes to a way to serialize them. Something which is clearly necessary, as not being able to do so would be equivalent to not being able to serialize an int, which is just crazy.)

So this might be the final nail that made Optional unserializable. And even though I like to think that it is The Real Reason™ for that decision, this theory has some holes:

  • Why are some other value-based classes, like LocalDate and LocalTime, serializable? (That’s actually a good question regardless of Optional. I’ll follow up on that.)
  • The timing is not perfect. The above discussion on the mailing list (where the EG was adamant in not making it serializable) happened in September 2013, the discussion about Optional’s special status in the face of future language changes started in October 2013.
  • Why wouldn’t the EG come out and say it?

I guess it’s up to you to decide whether those holes sink the theory.

Reflection

We saw that there are different reasons to not make Optional serializable: The design goal as a type for return values, the technical lock-in produced by having to support serialized forms forever and the limitations of value-based classes which might allow future optimizations.

It is hard to say whether any single one is already a show stopper but together they weigh heavily against serialization. But for those undeterred, I will explore how to serialize Optional in another post in the next couple of days. To stay up to date, subscribe via RSS or Newsletter!


Concepts of Serialization

With all this talk about why Optional isn’t serializable and what to do about it (coming up soon), let’s have a closer look at serialization.

Overview

This post presents some key concepts of serialization. It tries to do so succinctly without going into great detail, which includes keeping advice to a minimum. It has no narrative and is more akin to a wiki article. The main source is Joshua Bloch’s excellent book Effective Java, which has several items covering serialization (1st edition: 54-57; 2nd edition: 74-78). Way more information can be found in the official serialization specification.

Definition

With serialization, instances can be encoded as a byte stream (called serializing) and such a byte stream can be turned back into an instance (called deserializing).

The key feature is that both processes do not have to be executed by the same JVM. This makes serialization a mechanism for storing objects on disk between system runs or transferring them between different systems for remote communication.

Extralinguistic Character

Serialization is a somewhat strange mechanism. It converts instances into a stream of bytes and vice versa with only little visible interaction with the class. Neither does it call accessors to get to the values nor does it use a constructor to create instances. And for that to happen all the developer of the class is required to do is implement an interface with no methods.

Bloch describes this as an extralinguistic character and it is the root for many of the issues with serialization.

Methods

The serialization process can be customized by implementing some of the following methods. They can be private and the JVM will find them based on their signature. The descriptions are taken from the class comment on Serializable.

  • private void writeObject(java.io.ObjectOutputStream out) throws IOException

    Is responsible for writing the state of the object for its particular class so that the corresponding readObject method can restore it.
  • private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException

    Is responsible for reading from the stream and restoring the classes fields.
  • private void readObjectNoData() throws ObjectStreamException

    Is responsible for initializing the state of the object for its particular class in the event that the serialization stream does not list the given class as a superclass of the object being deserialized.
  • ANY-ACCESS-MODIFIER Object writeReplace() throws ObjectStreamException

    Designates an alternative object to be used when writing an object of this class to the stream.
  • ANY-ACCESS-MODIFIER Object readResolve() throws ObjectStreamException;

    Designates a replacement object when an instance of this class is read from the stream.

A good way to deal with the extralinguistic character of deserialization is to see all involved methods as an additional constructor of that class.

The object streams involved in (de)serializing provide these helpful default (de)serialization methods:

  • java.io.ObjectOutputStream.defaultWriteObject() throws IOException

    Writes the non-static and non-transient fields of the current class to this stream.
  • java.io.ObjectInputStream.defaultReadObject() throws IOException, ClassNotFoundException

    Reads the non-static and non-transient fields of the current class from this stream.

Invariants

One effect of not using a constructor to create instances is that a class’s invariants are not automatically established on deserialization. So while a class does usually check all constructor arguments for validity, this mechanism is not automatically applied to the deserialized values of fields.

Implementing such a check for deserialization is an extra effort which easily leads to code duplication and all the problems that typically ensue. If it is forgotten or done carelessly, the class is open for bugs or security holes.
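A minimal sketch of such a check, treating readObject as an additional constructor (class made up):

import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class Percentage implements Serializable {

	private int value;

	public Percentage(int value) {
		if (value < 0 || value > 100)
			throw new IllegalArgumentException("Value must be in [0; 100].");
		this.value = value;
	}

	// deserialization bypasses the constructor,
	// so the invariant has to be checked again
	private void readObject(ObjectInputStream in)
			throws IOException, ClassNotFoundException {
		in.defaultReadObject();
		if (value < 0 || value > 100)
			throw new InvalidObjectException("Value must be in [0; 100].");
	}
}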

Serialized Form

(Image: “Serialization!” – published by infocux Technologies under CC-BY-NC 2.0)

The structure of a serializable class’s byte stream encoding is called its serialized form. It is mainly defined by the names and types of the class’s fields.

The serialized form has some properties that are not immediately obvious. While some of the problematic ones can be mitigated by carefully defining the form, they will usually still be a burden on future development of a class.

Public API

The most important property of the serialized form is:

It is part of the class’s public API!

From the moment a serializable class is deployed, it has to be assumed that serialized instances exist. And it is usually expected of a system to support the deserialization of instances which were created with older versions of the same system. Users of a class rely on its serialized form as much as on its documented behavior.

Reduced Information Hiding

The concept of information hiding allows a class to maintain its documented behavior while changing its way of implementing it. This expressly includes the representation of its state, which is usually hidden and can be adapted as needed. Since the serialized form, which captures that representation of the state, becomes part of the public API, so does the representation itself.

A serializable class only effectively hides the implementation of its behavior while exposing the definition of that behavior and the state it uses to implement it.

Reduced Flexibility

Hence, just as changing a class’s API (e.g. by changing or removing methods or altering their documented behavior) can break code using it, so can changing the serialized form. It is easy to see that improving a class becomes vastly more difficult if its fields are fixed. This greatly reduces the flexibility to change such a class if the need arises.

Making something in the JDK serializable makes a dramatic increase in our maintenance costs, because it means that the representation is frozen for all time. This constrains our ability to evolve implementations in the future, and the number of cases where we are unable to easily fix a bug or provide an enhancement, which would otherwise be simple, is enormous. So, while it may look like a simple matter of “implements Serializable” to you, it is more than that. The amount of effort consumed by working around an earlier choice to make something serializable is staggering.

Brian Goetz

Increased Testing Effort

If a serializable class is changed, it is necessary to test whether serialization and deserialization works across different versions of the system. This is no trivial task and will create measurable costs.

Class representations

The serialized form represents a class, but not all representations are equal.

Physical

If a class defines fields with reference types (i.e. non-primitives), its instances contain pointers to instances of those types. Those instances, in turn, can point to other ones and so on. This defines a directed graph of interlinked instances. The physical representation of an instance is the graph of all instances reachable from it.

As an example, consider a doubly linked list. Each element of the list is contained in a node and each node knows the previous and the next one. This is basically already the list’s physical representation. A list with a dozen elements would be a graph of 13 nodes. The list instance points to the first and last list node and starting from there one can traverse the ten nodes in between in both directions.

One way to serialize an instance of a class is to simply traverse the graph and serialize each instance. This effectively writes the physical representation to the byte stream, which is the default serialization mechanism.

While the physical representation of a class is usually an implementation detail, this way to serialize it exposes this otherwise hidden information. Serializing the physical representation effectively binds the class to it which makes it extremely hard to change it in the future. There are other disadvantages, which are described in Effective Java (p. 297 in 2nd edition).

Logical

The logical representation of a class’s state is often more abstract. It is usually more removed from the implementation details and contains less information. When trying to formulate this representation, it is advisable to push both aspects as far as possible. It should be as implementation independent as possible and should be minimal in the sense that leaving out any bit of information makes it impossible to recreate an instance from it.

To continue the example of the linked list, consider what it actually represents: just some elements in a certain order. Whether these are contained in nodes or not and how those hypothetical nodes might be linked is irrelevant. A minimal, logical representation would hence only consist of those elements. (In order to properly recreate an instance from the stream it is necessary to add the number of elements. While this is redundant information it doesn’t seem to hurt much.)

So a good logical representation only captures the state’s abstract structure and not the concrete fields representing it. This implies that while changing the former is still problematic the latter can be evolved freely. Compared to serializing the physical representation this restores a big part of the flexibility for further development of the class.

Serialization Patterns

There are at least three ways to serialize a class. Calling all of them patterns is a little overboard so the term is used loosely.

Default Serialized Form

This is as simple as adding implements Serializable to the declaration. The serialization mechanism will then write all non-transient fields to the stream and, on deserialization, assign all the values present in a stream to their matching fields.

This is the most straightforward way to serialize a class. It is also the one where all the sharp edges of serialization are unblunted and waiting for their turn to really hurt you. The serialized form captures the physical representation and there is absolutely no checking of invariants.
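A minimal sketch (class and fields made up):

import java.io.Serializable;

public class Point implements Serializable {

	// both fields are written to the stream; their names and types
	// become part of the serialized form and thus of the public API
	private int x;
	private int y;

	// transient fields are excluded from the default serialized form
	private transient int cachedHashCode;
}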

Custom Serialized Form

By implementing writeObject a class can define what gets written to the byte stream. A matching readObject must read such a stream and use the information to assign values to fields.

This approach allows more flexibility than the default form and can be used to serialize the class’s logical representation. There are some details to consider and I can only recommend to read the respective item in Effective Java (item 55 in 1st edition; item 75 in 2nd edition).
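As a sketch of the idea, here is a simplified (singly linked) variant of the list from above, which serializes only its logical representation (all details made up):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StringList implements Serializable {

	// the nodes are the physical representation and stay out of the stream
	private transient Node first;
	private transient Node last;
	private transient int size;

	// writes the logical representation: the size and the elements in order
	private void writeObject(ObjectOutputStream out) throws IOException {
		out.defaultWriteObject();
		out.writeInt(size);
		for (Node node = first; node != null; node = node.next)
			out.writeObject(node.element);
	}

	// reads the logical representation and recreates the physical one
	private void readObject(ObjectInputStream in)
			throws IOException, ClassNotFoundException {
		in.defaultReadObject();
		int elementCount = in.readInt();
		for (int i = 0; i < elementCount; i++)
			add((String) in.readObject());
	}

	private void add(String element) {
		Node node = new Node();
		node.element = element;
		if (first == null)
			first = node;
		else
			last.next = node;
		last = node;
		size++;
	}

	private static class Node {
		String element;
		Node next;
	}
}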

Serialization Proxy Pattern

In this case the instance to serialize is replaced by a proxy. This proxy is written to and read from the byte stream instead of the original instance. This is achieved by implementing the methods writeReplace and readResolve.

In most cases this is by far the best approach to serialization. It deserves its own post and it will get it soon (stay tuned).
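Until then, here is a minimal sketch of the pattern (class made up):

import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

public class Range implements Serializable {

	private final int lower;
	private final int upper;

	public Range(int lower, int upper) {
		if (lower > upper)
			throw new IllegalArgumentException("lower must not exceed upper.");
		this.lower = lower;
		this.upper = upper;
	}

	// instead of a 'Range', a proxy is written to the stream
	private Object writeReplace() {
		return new SerializationProxy(this);
	}

	// makes sure an attacker cannot fabricate a byte stream
	// which encodes a 'Range' directly
	private void readObject(ObjectInputStream in) throws InvalidObjectException {
		throw new InvalidObjectException("Proxy required.");
	}

	private static class SerializationProxy implements Serializable {

		private final int lower;
		private final int upper;

		SerializationProxy(Range range) {
			this.lower = range.lower;
			this.upper = range.upper;
		}

		// on deserialization the proxy is replaced by a 'Range', which is
		// created via the invariant-checking constructor
		private Object readResolve() throws ObjectStreamException {
			return new Range(lower, upper);
		}
	}
}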

Misc

Some other details about serialization.

Artificial Byte Stream

The happy path of deserialization assumes a byte stream which was created by serializing an instance of the same class. While that assumption is alright in most situations, it must be avoided in security critical code. This includes any publicly reachable service which uses serialization for remote communication.

Instead the assumption must be that an attacker carefully handcrafted the stream to violate the class’s invariants. If this is not countered, the result can be an unstable system which might crash, corrupt data or be open for attacks.

Documentation

Javadoc has special tags to document the serialized form of a class. For these it creates a special page in the docs where it lists the following information:

  • The tag @serialData can annotate methods and the following comment is supposed to document the data written to the byte stream. The method signature and the comment are shown under Serialization Methods.
  • The tag @serial can annotate fields and the following comment is supposed to describe the field. The field’s type and name and the comment are then listed under Serialized Fields.

A good example is the documentation for the LinkedList.
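A sketch of how these tags might be used in a serializable class (comments made up):

/**
 * @serialData The size of the list (an 'int') is emitted, followed by all
 *             of its elements (each a 'String') in the proper order.
 */
private void writeObject(ObjectOutputStream out) throws IOException {
	// write the logical representation as documented above
}

/**
 * @serial The number of elements in the list.
 */
private int size;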


Serialize Optional

In a recent post I explained why we can’t serialize Optional. But what if, after all that, we still really, really want to? Let’s see how to come as close as possible.

Update

This is part of a series of posts about Optional.

Overview

This post first takes a look at possible scenarios in which we want to serialize Optional and then presents a serializable wrapper for it. Finally, bringing both together, it shows solutions for the different scenarios. The post relies on the concepts of serialization and especially the serialization proxy pattern, which I described recently.

I created a demo project at GitHub which contains the complete code of the involved classes and a demonstration on how to use them. Check it out for more details. The next release of LibFX will also contain the serializable wrapper from that project.

When To Serialize Optional

To restate the facts: Optional does not implement Serializable. And it is final, which prevents us from creating a serializable subclass.

(Image: “Why And How to Serialize Optional?” – published by Ethan Lofton under CC-BY 2.0)

There are at least two scenarios which require serializing an Optional:

  • It could be an argument or return type of a method which is sent over the wire with a serialization-based RPC framework, like RMI.
  • It could be a field in a serializable class.

Let’s have a closer look at both cases.

Serializing an Optional Argument or Return Value

In this case the argument or return type has to be serializable. This can only be achieved by changing the method’s signature and using a class which implements that interface. Several approaches exist.

It would be possible to simply create a class which duplicates all the functionality of Optional but is serializable. It can then be used as a full replacement. This will likely lead to this class permeating the code base instead of the official one. In the face of future changes to the language, which might optimize performance if Optional follows a defined and rigid structure, and for various other reasons I don’t think this is a good idea.

Instead, a simple class could be created which just allows wrapping and unwrapping an Optional. It would be serializable by writing/reading the object contained in that Optional. I implemented such a class in my demo project. It is called SerializableOptional and explained further below.

But let’s first look at the other reason to serialize Optional.

Disclaimer

I do not have much architectural experience with remote calls. On the OpenJDK mailing list Remi Forax describes downsides of having the client act upon an empty Optional. For the exchange regarding that see Remi’s initial statement, this request for clarification and Remi’s explanation.

Serializing an Optional Field

If a class wants to serialize a field of type Optional, it has to customize its serialization mechanism. This is actually a good idea for any serializable class – see chapter 11 in Joshua Bloch’s excellent Effective Java (2nd Edition). It can either define a custom serialized form or implement the serialization proxy pattern.

As it is the recommended approach in most cases, I will only cover the proxy pattern.

Disclaimer

In most cases there is no need to have a nullable/optional field in a class. A better design can often be created and should be actively looked for! (Apparently Optional isn’t serializable in order to convey that fact.)

If the class must have an optional field, it should be carefully decided whether it is part of the class’s logical representation. The fact that it is nullable/optional makes it likely that it is transient and can be recreated after deserialization. Only if that is not the case, does it make sense to serialize the field.

Serializable Optional

The SerializableOptional<T> (link) only exists to wrap and unwrap an Optional and offers little of its features. In the case of arguments or return values, it can (and in most cases should) be used without even declaring a variable of type SerializableOptional.

Wrapping

The class has two methods to wrap and unwrap an Optional (where T always extends Serializable – left out for brevity):

/**
 * Creates a serializable optional from the specified 'Optional'.
 */
public static <T> SerializableOptional<T> fromOptional(Optional<T> optional);

/**
 * Returns the 'Optional' instance with which this instance was created.
 */
public Optional<T> asOptional();

To make construction a little less verbose if no Optional exists, it has these equivalents of Optional’s methods with the same name:

/**
 * Usability method which creates a serializable optional which wraps
 * an empty Optional. Equivalent to 'Optional.empty()'.
 */
public static <T> SerializableOptional<T> empty();

/**
 * Usability method which creates a serializable optional for the specified
 * value by wrapping it in an 'Optional.' The value must be non-null.
 * Equivalent to 'Optional.of(Object)'.
 */
public static <T> SerializableOptional<T> of(T value);

/**
 * Usability method which creates a serializable optional for the specified
 * value by wrapping it in an 'Optional'. The value can be null.
 * Equivalent to 'Optional.ofNullable(Object)'.
 */
public static <T> SerializableOptional<T> ofNullable(T value);

Serialization

SerializableOptional uses the serialization proxy pattern. Its logical representation only consists of the value contained in the wrapped Optional (or null if it is empty).

Serialization then works as usual. See the demo for different use cases.

Serialize Optional

Let’s see how to approach the situations in which we would like to serialize an argument, return value or field of type Optional.

Methods With Optional Arguments Or Return Value

A method which would like to have an argument or return value of type Optional but needs it to be serializable can use SerializableOptional instead. Using it of course adds another layer of indirection, which leads to additional calls:

// these methods require all of their argument and return types
// to be serializable (e.g. for RMI)
public SerializableOptional<String> search(int id);
public void log(int id, SerializableOptional<String> item);

// shows how to quickly wrap and unwrap an 'Optional';
// note that no local variable of type 'SerializableOptional' is needed
private void callMethods() {
	for (int id = 0; id < 7; id++) {
		// unwrap the returned optional using 'asOptional'
		Optional<String> searchResult = search(id).asOptional();
		// wrap the optional using 'fromOptional'
		// (if used often, this could be a static import)
		log(id, SerializableOptional.fromOptional(searchResult));
	}
}

Fields Of Type Optional

The recommended approach to serialization is to use the serialization proxy pattern. In that case, there are two possibilities to serialize Optional.

Serializing The Extracted Value

The proxy can simply have a field of the type which is wrapped by the Optional. In its constructor it then assigns the value contained in the Optional (or null if it is empty) to that field.

This is done by the ClassUsingOptionalCorrectly (link):

private static class SerializationProxy<T> implements Serializable {

	private final T optionalValue;

	public SerializationProxy(
			ClassUsingOptionalCorrectly<T> classUsingOptional) {

		optionalValue = classUsingOptional.optional.orElse(null);
	}
}

Using SerializableOptional

Alternatively, the serialization proxy can have an instance of SerializableOptional. This is done by the class TransformForSerializationProxy (link):

private static class SerializationProxy<T> implements Serializable {

	private final SerializableOptional<T> optional;

	public SerializationProxy(TransformForSerializationProxy<T> transform) {
		optional = SerializableOptional.fromOptional(transform.optional);
	}
	
}

The main difference to extracting the value is readability. It makes it clearer that the logical representation contains an optional field. The costs are an increased size of the byte representation and more time to write it. I didn’t benchmark this, so I can’t tell whether that can be important. I guess that, as usual, it depends.

Reflection

We have seen the two main (and only?) reasons to serialize an Optional and what to do about it: If a method’s arguments or return value needs to be serialized, use the SerializableOptional and immediately wrap/unwrap it when the method is called. If a class has an optional field which it wants to serialize, its serialization proxy can either extract the Optional’s value or write the SerializableOptional to the byte stream.

The helper class SerializableOptional from my demo project is public domain and can be used without any legal limitations.


Impulse: “Lambdas In Java: A Peek Under The Hood”

Java 8 brought us lambda expressions and we’re all very happy using them. But how do they work? What happens behind the scenes and how well do they perform? A talk by Brian Goetz, specification lead for the Java specification request which introduced lambda expressions, answers these questions.

Overview

This post is going to outline the talk “Lambdas in Java: A peek under the hood”, which Brian Goetz held in October 2013 at the goto; conference in Aarhus. As usual for this series, some details are left out for brevity, so if you’re interested, make sure to check out the video.

The Talk

The talk can be watched online; the slides can be found here.

The Gist

After a short introduction of lambdas, Brian Goetz presents some of the ideas the expert group for JSR 335 considered for their implementation. He talks about their type and their runtime representation, the advantages of the chosen approach and gives some numbers regarding its performance. He finishes the talk with an outlook on possible future optimizations.

Lambdas in Java 8

On a single slide, Goetz explains what lambda expressions are and how they look in Java 8. Now, 13 months later, everybody knows them, so there is no reason to repeat that introduction. He then goes on to quickly answer why lambdas were included in the language.

An important reason is parallelization. When it comes to multi-threaded processing of collections, it is best to have the collection provide a customized parallel iteration mechanism instead of letting the user implement one. This is called internal iteration and is the opposite of external iteration, e.g. with for loops. If the collection does the iteration itself, it needs to delegate the processing of the individual elements to a function provided by the user. This makes it necessary to have a concise format in which the user can specify such functions. While anonymous classes are a possibility, their syntax is too lengthy for frequent use.

The importance of lambda expressions can be judged by the fact that Java was the last mainstream OO language without them.

Type

The first big question the expert group had to answer was what type the new lambda expressions were going to have.

Why Not Just Add Function Types?

A seemingly straightforward way to implement lambdas would be to define a new type. So in addition to primitives, arrays, classes and interfaces there would also be a function type. This immediately entails the question of how this could be implemented in the virtual machine.

One approach would be to represent all functions as a single type and use generics to distinguish them. This would unavoidably bring the “pain of generics” to functions. Type erasure would make it impossible to overload the same method with different function arguments (say with one from string to integer and with another from string to boolean). Generics would also require boxing all primitives, which would be detrimental to performance.
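The overload clash can already be demonstrated with the existing generic interface Function; a single generic function type would suffer from the same problem. This snippet is deliberately rejected by the compiler:

import java.util.function.Function;

public class ErasureClash {

	// compile error: both methods have the same erasure
	// 'process(Function)', so the overload is ambiguous
	void process(Function<String, Integer> toInteger) { }
	void process(Function<String, Boolean> toBoolean) { }
}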

Goetz then states that due to the many changes to the virtual machine “teaching [it] about ‘real’ function types would be a huge effort”. It would introduce complexity and corner cases and many interoperability challenges between libraries using functions and those not using them.

Functional Interfaces

A quick historical detour shows that Java already models functions in many places. It uses interfaces with only one method to do so. Prime examples are Runnable, Callable and Comparator.

So instead of adding a new type for functions, the expert group decided to formalize the existing pattern. They named the concept of an interface with a single abstract method functional interface.

The compiler would then be able to interpret all interfaces with one method as a function. (E.g. Comparator as a generic function from a pair of instances of some type T to int.) When a lambda expression is used in a place where such an interface is expected, the compiler can transform the lambda expression to an instance of that interface.

An important bonus of that decision is that old libraries are forward compatible with lambdas! Code that was written before Java 8, which might use interfaces with just one method, can now be used with lambdas. This considerably reduces the amount of necessary rework of existing code.
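A small example of that forward compatibility: Comparator predates Java 8 by many years, yet a lambda expression can be used wherever one is expected:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ForwardCompatibility {

	public static void main(String[] args) {
		List<String> words = Arrays.asList("stream", "of", "lambdas");

		// 'Comparator' has a single abstract method, so the pre-Java-8
		// 'Collections.sort' accepts a lambda expression as comparator
		Collections.sort(words, (a, b) -> a.length() - b.length());

		System.out.println(words); // prints "[of, stream, lambdas]"
	}
}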

Runtime Representation

Another question is how to represent lambda expressions at runtime, i.e. in byte code. It is important to notice that whatever representation is chosen will be fixed forever.

Why Not Just Use Inner Classes?

An obvious approach to the representation of lambdas would be to silently create a matching inner class. This is exactly what anonymous classes do. This has the advantage of being comparatively simple and straight forward. It introduces no new concepts and seamlessly integrates with existing mechanisms.

But it also brings the disadvantages of anonymous classes to lambda expressions. While most of them are largely invisible to the average programmer, they also include suboptimal performance. Or in Goetz’s words:

Well, inner classes suck!

Why Not Just Use Method Handles?

Goetz then covers method handles. This is a lower level language construct which I am not familiar with. So instead of relating wrong information, I’m not going to cover it.

What it comes down to is that method handles also turned out not to be a good representation of lambdas. He describes the root cause as follows:

It takes an implementation technique and it conflates our binary representation with that choice of implementation technique.

More Indirection!

With the most obvious possibilities failing to properly represent lambdas, the expert group looked for a more indirect approach. One which does not bind the representation to a specific implementation and does not compromise on performance.

So instead of choosing a byte code representation which imperatively creates an interface from a lambda, the byte code merely gives a declarative recipe. It is then up to the runtime to execute that recipe in the best and most performant way.

A Recipe For Lambdas

But what could a byte code representation of such a deferred creation look like? It turned out that a tool introduced in Java 7 provides much of the needed functionality.

invokedynamic

Prior to Java 7 there were four byte codes for method invocations. These codes are close representations of use cases needed by the Java language:

  • calling a static method (invokestatic)
  • calling a class method (invokevirtual)
  • calling an interface method (invokeinterface)
  • everything else (invokespecial, e.g. for constructors)

With the rise of dynamic JVM-based languages, a new invocation byte code was needed for cases where an instance’s type is not known at compile time. It was eventually introduced in Java 7: invokedynamic. It allows languages to influence the calling behavior of the JVM at runtime. This is done by providing their own specific logic.

It is implemented such that the virtual machine calls back to the language logic when encountering an invokedynamic call site for the first time. The language logic then returns how to resolve that call.

This bootstrap method is a comparatively expensive operation. To avoid calling it every time, the language logic also returns the conditions under which the decision can be reused. The JVM can then optimize invokedynamic call sites as any other invocation byte code which guarantees a comparable performance.

Lambda Factory

With invokedynamic at hand, it is fairly straight forward to represent a lambda capture site (i.e. the place where a lambda is used).

First, a method is created which is equivalent to the lambda expression (this is called “desugaring”). Depending on the captured context (e.g. method calls or accessed fields) this can either be an instance or a static method of the class where the expression occurred. A handle to that method is a central ingredient in the recipe for that lambda expression. Other ingredients are the target interface, some metadata (e.g. for serialization) and the values captured by the lambda expression. Returning an interface implementation for such a recipe is called transformation and different strategies exist (e.g. inner classes as described above).

The capture site itself then becomes a factory which takes the recipe for the lambda and returns an implementation of the corresponding functional interface. The factory is represented by a call to invokedynamic with the recipe as an argument.
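As a rough illustration (not actual compiler output; names made up), a capturing lambda and the kind of synthetic method it is desugared into might look like this:

import java.util.function.Predicate;

public class Desugaring {

	public static Predicate<String> longerThan(int threshold) {
		// the compiler desugars the lambda body into a synthetic method
		// (like the one below) and emits an 'invokedynamic' factory call
		// whose recipe names that method, the target interface
		// 'Predicate' and the captured value of 'threshold'
		return s -> s.length() > threshold;
	}

	// roughly what the desugared method looks like; the captured
	// variable becomes an additional parameter
	private static boolean desugaredLambdaBody(int threshold, String s) {
		return s.length() > threshold;
	}
}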

Lambda Metafactory

The bootstrapping process of such an invokedynamic call is realized by the lambda metafactory. Its task is to transform a given recipe to an interface implementation. To this end, it can choose whatever strategy it deems best.

Evaluation

Using invokedynamic to represent a lambda capture site as a lambda factory is described by Goetz as “the ultimate procrastination aid”. It defers choosing a transformation strategy (from lambda expression to interface implementation) to runtime and makes it a pure implementation detail.

Advantages

One advantage is that the runtime has more information about the running program and the underlying system than the compiler. It can thus make a more informed decision about what the best strategy is. It is even possible for different VMs on different systems to provide different transformation strategies, which are optimized to use system specific features.

Another advantage is that changes to the lambda metafactory can happen at any time and all existing code would automatically profit from that without having to be recompiled. So new strategies could be implemented and the mechanism which picks a strategy for a given call site can be improved as well. All this can be done in any minor Java update as it happens behind the scenes.

This mechanism also brings concrete performance advantages with it:

  • There is no need for additional fields or static initialization. So there is no increase in memory or runtime footprint of a class.
  • Lambdas which capture no variables only need to be instantiated once.
  • Initialization cost is deferred to a lambda’s first use, which implies no such costs when one is not used at all.

Last but not least, the implemented mechanism can be used by all JVM based languages. This means that they will also benefit from future optimizations.

Performance

But does the indirection have a performance price? Goetz breaks the costs down into three components: linkage, capture and invocation cost. Linkage happens once for each lambda expression and provides the VM with the means to process the expression. Capturing means providing an instance of the functional interface which the lambda expression is implementing. Finally invocation means actually calling the method.

If inner classes were used for lambda expressions, these costs would be the following: linkage means going to the file system to load the byte code for the class. For capture, the loaded class has to be instantiated. The invocation is a regular method call.

For the chosen implementation the costs are as follows: Linkage is the call to the lambda metafactory. It returns a class and for lambdas which use variables from the surrounding scope a new instance has to be created for each capture of the expression. If, on the other hand, an expression uses only variables provided to it as arguments (which is fairly common), the same instance can be reused so there is no capture cost. Finally the invocation is also a regular method call.

To compare these costs, Goetz presents some measurements. The very short and overly simplified summary is this:

  • Linkage: Lambdas are between 8% and 24% faster.
  • Capture: Very similar for inner classes and lambdas which capture variables. Non-capturing lambdas, where the same instance can be reused, are somewhat faster on a single thread but really excel in a multi-threaded scenario.

Goetz stresses the fact that this is just “the dumb strategy” and that future optimizations can improve performance even more. He goes on to name some VM improvements which could speed up lambda processing further.

Summary

After a quick dive into the additional requirements for serialization, Goetz summarizes his talk.

Noteworthy is his notion of “obvious-but-wrong” ideas: approaches which seem to be a perfect match but reveal serious problems on closer inspection. The rejected approaches described above are examples. He stresses that one should always be on the lookout for such ideas.

Reflection

We saw why Java 8 has no function type but reuses regular interfaces which match a certain condition. We then went on to see how invokedynamic is used to defer the transformation of a lambda expression to an instance of that interface from compile time to runtime. An overview of the advantages and performance properties of the chosen approach justifies that decision.

The post Impulse: “Lambdas In Java: A Peek Under The Hood” appeared first on blog@CodeFX.

Instances of Non-Capturing Lambdas


Roughly a month ago, I summarized Brian Goetz’ peek under the hood of lambda expressions in Java 8. Currently I’m researching for a post about default methods and to my mild surprise came back to how Java handles lambda expressions. The intersection of these two features can have a subtle but surprising effect, which I want to discuss.

Overview

To make this more interesting I’ll start the post with an example, which will culminate in my personal WTF?! moment. The full example can be found in a dedicated GitHub project.

We will then see the explanation for this somewhat unexpected behavior and finally draw some conclusions to prevent bugs.

Example

Here goes the example… It’s not as trivial or abstract as it could be because I wanted it to show the relevance of this scenario. But it is still an example in the sense that it only alludes to code which might actually do something useful.

A Functional Interface

Assume we need a specialization of the interface Future for a scenario where the result already exists during construction.

We decide to implement this by creating an interface ImmediateFuture which implements all functionality except get() with default methods. This results in a functional interface.

You can see the source here.
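(If you don’t want to click through: a minimal sketch of how such an interface could look; the actual source may differ in details.)

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

@FunctionalInterface
public interface ImmediateFuture<V> extends Future<V> {

	// 'get()' stays abstract and is the single abstract method

	@Override
	default boolean cancel(boolean mayInterruptIfRunning) {
		// the result already exists, so there is nothing to cancel
		return false;
	}

	@Override
	default boolean isCancelled() {
		return false;
	}

	@Override
	default boolean isDone() {
		return true;
	}

	@Override
	default V get(long timeout, TimeUnit unit)
			throws InterruptedException, ExecutionException {
		// the result is immediately available, so the timeout is irrelevant
		return get();
	}
}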

A Factory

Next, we implement a FutureFactory. It might create all kinds of Futures but it definitely creates our new subtype. It does so like this:

/**
 * Creates a new future with the default result.
 */
public static Future<Integer> createWithDefaultResult() {
	ImmediateFuture<Integer> immediateFuture = () -> 0;
	return immediateFuture;
}

/**
 * Creates a new future with the specified result.
 */
public static Future<Integer> createWithResult(Integer result) {
	ImmediateFuture<Integer> immediateFuture = () -> result;
	return immediateFuture;
}

Creating The Futures

Finally we use the factory to create some futures and gather them in a set:

public static void main(String[] args) {
	Set<Future<?>> futures = new HashSet<>();

	futures.add(FutureFactory.createWithDefaultResult());
	futures.add(FutureFactory.createWithDefaultResult());
	futures.add(FutureFactory.createWithResult(42));
	futures.add(FutureFactory.createWithResult(63));

	System.out.println(futures.size());
}

WTF?!

Run the program. The console will say…

4?

Nope. 3.

WTF?!

Evaluation of Lambda Expressions

So what’s going on here? Well, with some background knowledge about the evaluation of lambda expressions it’s actually not that surprising. If you’re not too familiar with how Java does this, now is a good time to catch up. One way to do so is to watch Brian Goetz’ talk “Lambdas in Java: A peek under the hood” or read my summary of it.


Instances of Lambda Expressions

The key point to understanding this behavior is the fact that the JRE makes no promise about how it turns a lambda expression into an instance of the respective interface. Let’s look at what the Java Language Specification has to say about the matter:

15.27.4. Run-time Evaluation of Lambda Expressions

[…]

Either a new instance of a class with the properties below is allocated and initialized, or an existing instance of a class with the properties below is referenced.

[… properties of the class – nothing surprising here …]

These rules are meant to offer flexibility to implementations of the Java programming language, in that:

  • A new object need not be allocated on every evaluation.
  • Objects produced by different lambda expressions need not belong to different classes (if the bodies are identical, for example).
  • Every object produced by evaluation need not belong to the same class (captured local variables might be inlined, for example).
  • If an “existing instance” is available, it need not have been created at a previous lambda evaluation (it might have been allocated during the enclosing class’s initialization, for example).

[…]

JLS, Java SE 8 Edition, §15.27.4

Amongst other optimizations, this clearly enables the JRE to return the same instance for repeated evaluations of a lambda expression.

Instances of Non-Capturing Lambda Expressions

Note that in the example above the expression does not capture any variables. It can hence never change from evaluation to evaluation. And since lambdas are not designed to have state, different evaluations can also not “drift apart” during their lifetime. So in general, there is no good reason to create several instances of non-capturing lambdas as they would all be exactly the same over their whole lifetime. This enables the optimization to always return the same instance.

(Contrast this with a lambda expression which captures some variables. A straightforward evaluation of such an expression is to create a class which has the captured variables as fields. Each single evaluation must then create a new instance which stores the captured variables in its fields. These instances are obviously not generally equal.)

So that’s exactly what happens in the code above: () -> 0 is a non-capturing lambda expression, so each evaluation returns the same instance. Hence the same is true for each call to createWithDefaultResult().

Remember, though, that this might only be true for the JRE version currently installed on my machine (Oracle 1.8.0_25-b18 for Win 64). Yours can differ and so can the next gal’s and so on.
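A minimal demonstration of the effect (the printed values are what I observe on the JRE mentioned above; the JLS guarantees neither):

import java.util.function.Supplier;

public class LambdaIdentityDemo {

	static Supplier<Integer> nonCapturing() {
		return () -> 0; // captures no variables
	}

	static Supplier<Integer> capturing(Integer result) {
		return () -> result; // captures 'result'
	}

	public static void main(String[] args) {
		// 'true' here: repeated evaluations yield the same instance
		System.out.println(nonCapturing() == nonCapturing());
		// 'false' here: each evaluation creates a new instance
		System.out.println(capturing(42) == capturing(42));
	}
}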

Lessons Learned

So we saw why this happens. And while it makes sense, I’d still say that this behavior is not obvious and will hence not be expected by every developer. This is the breeding ground for bugs so let’s try to analyze the situation and learn something from it.

Subtyping with Default Methods

Arguably the root cause of the unexpected behavior was the decision of how to refine Future. We did this by extending it with another interface and implementing parts of its functionality with default methods. With just one remaining unimplemented method, ImmediateFuture became a functional interface, which enables lambda expressions.

Alternatively, ImmediateFuture could have been an abstract class. This would have prevented the factory from accidentally returning the same instance because it could not have used lambda expressions.

The discussion of abstract classes vs. default methods is not easily resolved so I’m not trying to do it here. But I’ll soon publish a post about default methods and I plan to come back to this. Suffice it to say that the case presented here should be considered when making the decision.

Lambdas in Factories

Because of the unpredictability of a lambda’s reference equality, factory methods should carefully consider using lambda expressions to create instances. Unless the method’s contract clearly allows for different calls to return the same instance, lambdas should be avoided altogether.

I recommend including capturing lambdas in this ban. It is not at all clear (to me) under which circumstances the same instance could or will be reused in future JRE versions. One possible scenario would be that the JIT discovers that a tight loop creates suppliers which always (or at least often) return the same instance. By the logic used for non-capturing lambdas, reusing the same supplier instance would be a valid optimization.

Anonymous Classes vs Lambda Expressions

Note the different semantics of an anonymous class and a lambda expression. The former guarantees the creation of new instances while the latter does not. To continue the example, the following implementation of createWithDefaultResult() would lead to the futures set having a size of four:

public static Future<Integer> createWithDefaultResult() {
	ImmediateFuture<Integer> immediateFuture = new ImmediateFuture<Integer>() {
		@Override
		public Integer get() throws InterruptedException, ExecutionException {
			return 0;
		}
	};
	return immediateFuture;
}

This is especially unsettling because many IDEs allow the automatic conversion from anonymous interface implementations to lambda expressions and vice versa. With the differences between the two, this seemingly purely syntactic conversion can introduce subtle behavior changes. (Something I was not initially aware of.)

In case you end up in a situation where this becomes relevant and choose to use an anonymous class, make sure to visibly document your decision! Unfortunately there seems to be no way to keep Eclipse from converting it anyway (e.g. if conversion is enabled as a save action), which also removes any comment inside the anonymous class.

The ultimate alternative seems to be a (static) nested class. No IDE I know would dare to transform it into a lambda expression so it’s the safest way. Still, it needs to be documented to prevent the next Java-8-fanboy (like yours truly) from coming along and screwing up your careful consideration.

Functional Interface Identity

Be careful when you rely on the identity of functional interfaces. Always consider the possibility that wherever you’re getting those instances might repeatedly hand you the same one.

But this is of course pretty vague and of little concrete consequence. First, all other interfaces can be reduced to a functional one. This is actually the reason why I picked Future – I wanted to have an example which does not immediately scream CRAZY LAMBDA SHIT GOING ON! Second, this can make you paranoid pretty quickly.

So don’t overthink it – just keep it in mind.

Guaranteed Behavior

Last but not least (and this is always true but deserves being repeated here):

Do not rely on undocumented behavior!

The JLS does not guarantee that each lambda evaluation returns a new instance (as the code above demonstrates). But neither does it guarantee the observed behavior, i.e. that non-capturing lambdas are always represented by the same instance. Hence don’t write code which depends on either.

I have to admit, though, that this is a tough one. Seriously, who looks at the JLS of some feature before using it? I surely don’t.

Reflection

We have seen that Java does not make any guarantees about the identity of evaluated lambda expressions. While this is a valid optimization, it can have surprising effects. To prevent this from introducing subtle bugs, we derived guidelines:

  • Be careful when partly implementing an interface with default methods.
  • Do not use lambda expressions in factory methods.
  • Use anonymous or, better yet, inner classes when identity matters.
  • Be careful when relying on the identity of functional interfaces.
  • Finally, do not rely on undocumented behavior!

The post Instances of Non-Capturing Lambdas appeared first on blog@CodeFX.

New Javadoc Tags @apiNote, @implSpec and @implNote


If you’re already using Java 8, you might have seen some new Javadoc tags: @apiNote, @implSpec and @implNote. What’s up with them? And what do you have to do if you want to use them?

Overview

This post will have a quick view at the tags’ origin and current status. It will then explain their meaning and detail how they can be used with IDEs, the Javadoc tool and via Maven’s Javadoc plugin.

I created a demo project on GitHub to show some examples and the necessary additions to Maven’s pom.xml. To make things easier for the Maven-averse, it already contains the generated javadoc.

Context

Origin

The new Javadoc tags are a byproduct of JSR-335, which introduced lambda expressions. They came up in the context of default methods because these required a more standardized and fine grained documentation.

In January 2013 Brian Goetz gave a motivation and made a proposal for these new tags. After a short discussion it turned into a feature request three weeks later. By April the JDK Javadoc maker was updated and the mailing list informed that they were ready to use.

Current Status

It is important to note that the new tags are not officially documented (they are missing in the official list of Javadoc tags) and thus subject to change. Furthermore, the implementer Mike Duigou wrote:

There are no plans to attempt to popularize these particular tags outside of use by JDK documentation.

So while it is surely beneficial to understand their meaning, teams should carefully consider whether using them is worth the risk which comes from relying on undocumented behavior. Personally, I think it is, as I deem the considerable investment already made in the JDK too high to be reversed. It would also be easy to remove or search/replace their occurrences in a code base if that became necessary.

@apiNote, @implSpec and @implNote

Not the new Javadoc tags

Published by the Brooklyn Museum under CC-BY 3.0.

Let’s cut to the heart of things. What is the meaning of these new tags? And where and how are they used?

Meaning

The new Javadoc tags are explained pretty well in the feature request’s description (I changed the layout a little):

There are lots of things we might want to document about a method in an API. Historically we’ve framed them as either being “specification” (e.g., necessary postconditions) or “implementation notes” (e.g., hints that give the user an idea what’s going on under the hood.) But really, there are four boxes (and we’ve been cramming them into two, or really 1.5):

 { API, implementation } x { specification, notes }

(We sometimes use the terms normative/informative to describe the difference between specification/notes.) Here are some descriptions of what belongs in each box.

1. API specification.
This is the one we know and love; a description that applies equally to all valid implementations of the method, including preconditions, postconditions, etc.

2. API notes.
Commentary, rationale, or examples pertaining to the API.

3. Implementation specification.
This is where we say what it means to be a valid default implementation (or an overrideable implementation in a class), such as “throws UOE.” Similarly this is where we’d describe what the default for putIfAbsent does. It is from this box that the would-be-implementer gets enough information to make a sensible decision as to whether or not to override.

4. Implementation notes.
Informative notes about the implementation, such as performance characteristics that are specific to the implementation in this class in this JDK in this version, and might change. These things are allowed to vary across platforms, vendors and versions.

The proposal: add three new Javadoc tags, @apiNote, @implSpec, and @implNote. (The remaining box, API Spec, needs no new tag, since that’s how Javadoc is used already.) @impl{spec,note} can apply equally well to a concrete method in a class or a default method in an interface.

So the new Javadoc tags are meant to categorize the information given in a comment. They distinguish between the specification of the method’s, class’s, … behavior (which is relevant for all users of the API – this is the “regular” comment and would be @apiSpec if it existed) and other, more ephemeral or less universally useful documentation. More concretely, an API user can not rely on anything written in @implSpec or @implNote, because these tags are concerned with this implementation of the method, saying nothing about overriding implementations.

This shows that using these tags will mainly benefit API designers. But even Joe Developer, working on a large project, can be considered a designer in this context as his code is surely consumed and/or changed by his colleagues at some point in the future. In that case, it helps if the comment clearly describes the different aspects of the API. E.g. is “runs in linear time” part of the method’s specification (and should hence not be degraded) or a detail of the current implementation (so it could be changed).

Examples

Let’s see some examples! First from the demo project to show some rationale behind how to use the tags and then from the JDK to see them in production.

The Lottery

The project contains an interface Lottery from some fictitious library. The interface was first included in version 1.0 of the library but a new method has to be added for version 1.1. To keep backwards compatibility this is a default method but the plan is to make it abstract in version 2.0 (giving customers some time to update their code).
With the new tags the method’s documentation clearly distinguishes the meanings of its documentation:

/**
 * Picks the winners from the specified set of players.
 * <p>
 * The returned list defines the order of the winners, where the first
 * prize goes to the player at position 0. The list will not be null but
 * can be empty.
 *
 * @apiNote This method was added after the interface was released in
 *          version 1.0. It is defined as a default method for compatibility
 *          reasons. From version 2.0 on, the method will be abstract and
 *          all implementations of this interface have to provide their own
 *          implementation of the method.
 * @implSpec The default implementation will consider each player a winner
 *           and return them in an unspecified order.
 * @implNote This implementation has linear runtime and does not filter out
 *           null players.
 * @param players
 *            the players from which the winners will be selected
 * @return the (ordered) list of the players who won; the list will not
 *         contain duplicates
 * @since 1.1
 */
default List<String> pickWinners(Set<String> players) {
	return new ArrayList<>(players);
}

JDK

The JDK widely uses the new tags. Some examples:

  • ConcurrentMap:
    • Several @implSpecs defining the behavior of the default implementations, e.g. on replaceAll.
    • Interesting @implNotes on getOrDefault and forEach.
    • Repeated @implNotes on abstract methods which have default implementations in Map, documenting that “This implementation intentionally re-abstracts the inappropriate default provided in Map.”, e.g. replace.
  • Objects uses @apiNote to explain why the seemingly useless methods isNull and nonNull were added.
  • The abstract class Clock uses @implSpec and @implNote in its class comment to distinguish what implementations must beware of and how the existing methods are implemented.

Inheritance

When an overriding method has no comment or inherits its comment via {@inheritDoc}, the new tags are not included. This is a good thing, since they will not generally apply. To inherit specific tags, just add the snippet @tag {@inheritDoc} to the comment.
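An implementation from the demo project could hence inherit, say, the API note like this (a sketch; the class name is mine):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AllWinnersLottery implements Lottery {

	/**
	 * {@inheritDoc}
	 *
	 * @apiNote {@inheritDoc}
	 */
	@Override
	public List<String> pickWinners(Set<String> players) {
		return new ArrayList<>(players);
	}
}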

The implementing classes in the demo project examine the different possibilities. The README gives an overview.

Tool Support

IDEs

You will likely want to see the improved documentation (the JDK’s and maybe your own) in your IDE. So how do the most popular ones currently handle them?

Eclipse displays the tags and their content but provides no special rendering, like ordering or prettifying the tag headers. There is a feature request to resolve this.

IntelliJ IDEA’s current Community Edition 14.0.2 displays neither the tags nor their content. This was apparently solved on Christmas Eve (see this ticket) so I guess the next version will not have this problem anymore. I cannot say anything regarding the rendering, though.

NetBeans also shows neither tags nor content and I could find no ticket asking to fix this.

All in all not a pretty picture but understandable considering the fact that this is no official Javadoc feature.

Generating Javadoc

If you start using those tags in your own code, you will soon realize that generating Javadoc fails because of the unknown tags. That is easy to fix: you just have to tell the tool how to handle them.

Command Line

This can be done via the command line argument -tag. The following arguments allow those tags everywhere (i.e. on packages, types, methods, …) and give them the headers currently used by the JDK:

-tag "apiNote:a:API Note:"
-tag "implSpec:a:Implementation Requirements:"
-tag "implNote:a:Implementation Note:"

(I read the official documentation as if those arguments should be -tag apiNote:a:"API Note:" [note the quotation marks] but that doesn’t work for me. If you want to limit the use of the new tags or not include them at all, the documentation of -tag tells you how to do that.)

By default all new tags are added to the end of the generated doc, which puts them below, e.g., @param and @return. To change this, all tags have to be listed in the desired order, so you have to add the known tags to the list below the three above:

-tag "param"
-tag "return"
-tag "throws"
-tag "since"
-tag "version"
-tag "serialData"
-tag "see"

Maven

Maven’s Javadoc plugin has a configuration setting tag which is used to verbosely create the same command line arguments. The demo project on GitHub shows what this looks like in the pom.
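For reference, the configuration looks roughly like this (a sketch of the relevant part of the pom; check the demo project for the authoritative version):

<plugin>
	<groupId>org.apache.maven.plugins</groupId>
	<artifactId>maven-javadoc-plugin</artifactId>
	<configuration>
		<tags>
			<tag>
				<name>apiNote</name>
				<placement>a</placement>
				<head>API Note:</head>
			</tag>
			<tag>
				<name>implSpec</name>
				<placement>a</placement>
				<head>Implementation Requirements:</head>
			</tag>
			<tag>
				<name>implNote</name>
				<placement>a</placement>
				<head>Implementation Note:</head>
			</tag>
			<!-- list the known tags to fix the order -->
			<tag><name>param</name></tag>
			<tag><name>return</name></tag>
			<tag><name>throws</name></tag>
			<tag><name>since</name></tag>
			<tag><name>version</name></tag>
			<tag><name>serialData</name></tag>
			<tag><name>see</name></tag>
		</tags>
	</configuration>
</plugin>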

Reflection

We have seen that the new Javadoc tags @apiNote, @implSpec and @implNote were added to allow the division of documentation into parts with different semantics. Understanding them is helpful to every Java developer. API designers might choose to employ them in their own code but must keep in mind that they are still undocumented and thus subject to change.

We finally took a look at some of the involved tools and saw that IDE support needs to improve but the Javadoc tool and the Maven plugin can be parameterized to make full use of them.

The post New Javadoc Tags @apiNote, @implSpec and @implNote appeared first on blog@CodeFX.

Everything You Need To Know About Default Methods


So, default methods… yesterday’s news, right? Yes, but after a year of use a lot of facts have accumulated and I wanted to gather them in one place for those developers who are just starting to use them. And maybe even the experienced ones can find a detail or two they didn’t know about yet.

I will extend this post in the future if new shit comes to light. So I am asking my readers (yes, both of you!) to provide me with each little fact regarding default methods which you can’t find here. If you’ve got something, please tweet, mail or leave a comment.

Overview

I guess I failed in giving this post a meaningful narrative. The reason is that, in its heart, it’s a wiki article. It covers different concepts and details of default methods and while these are naturally related, they do not lend themselves to a continuous narration.

But this has an upside, too! You can easily skip and jump around the post without degrading your reading experience much. Check the table of contents for a complete overview over what’s covered and go where your curiosity leads you.

Default Methods

By now most developers will already have used, read and maybe even implemented default methods, so I’m going to spare everyone a detailed introduction of the syntax. I’ll spend some more time on its nooks and crannies before covering broader concepts.

Syntax

What the new language feature of default methods comes down to is that interfaces can now declare non-abstract methods, i.e. ones with a body.

The following example is a modified version of Comparator.thenComparing(Comparator) (link) from JDK 8:

default Comparator<T> thenComparing(Comparator<? super T> other) {
	return (o1, o2) -> {
		int res = this.compare(o1, o2);
		return (res != 0) ? res : other.compare(o1, o2);
	};
}

This looks just like a “regular” method declaration except for the keyword default. This is necessary to add such a method to an interface without a compile error and hints at the method call resolution strategy.

Every class which implements Comparator will now contain the public method thenComparing(Comparator) without having to implement it itself – it comes for free, so to speak.

Explicit Calls to Default Methods

Further below, we will see some reasons why one might want to explicitly call a default implementation of a method from some specific superinterface. If the need arises, this is how it’s done:

class StringComparator implements Comparator<String> {

	// ...

	@Override
	public Comparator<String> thenComparing(Comparator<? super String> other) {
		log("Call to 'thenComparing'.");
		return Comparator.super.thenComparing(other);
	}
}

Note how the name of the interface is used to specify the following super, which would otherwise refer to the superclass (in this case Object). This is syntactically similar to how the reference to the outer class can be accessed from a nested class.

Resolution Strategy

So let’s consider an instance of a type which implements an interface with default methods. What happens if a method is called for which a default implementation exists? (Note that a method is identified by its signature, which consists of the name and the parameter types.)

Rule #1:
Classes win over interfaces. If a class in the superclass chain has a declaration for the method (concrete or abstract), you’re done, and defaults are irrelevant.
Rule #2:
More specific interfaces win over less specific ones (where specificity means “subtyping”). A default from List wins over a default from Collection, regardless of where or how or how many times List and Collection enter the inheritance graph.
Rule #3:
There’s no Rule #3. If there is not a unique winner according to the above rules, concrete classes must disambiguate manually.

Brian Goetz – Mar 3 2013 (formatting mine)

First of all, this clarifies why these methods are called default methods and why they must be started off with the keyword default:

Such an implementation is a backup in case neither the class nor any of its superclasses even consider the method, i.e. provide no implementation and do not declare it as abstract (see Rule #1). Equivalently, a default method of interface X is only used when the class does not also implement an interface Y which extends X and declares the same method (either as default or abstract; see Rule #2).

While these rules are simple, they do not prevent developers from creating complex situations. This post gives an example where the resolution is not trivial to predict and argues that this feature should be used with care.

The resolution strategy implies several interesting details…

Conflict Resolution

Rule #3, or rather its absence, means that concrete classes must implement each method for which competing default implementations exist. Otherwise the compiler throws an error. If one of the competing implementations is appropriate, the method body can just explicitly call that method.
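As a quick illustration (the interface and class names are mine):

interface FirstGreeter {
	default String greet() { return "hi"; }
}

interface SecondGreeter {
	default String greet() { return "hello"; }
}

class Greeter implements FirstGreeter, SecondGreeter {

	// without this override the compiler reports an error because
	// Greeter inherits unrelated defaults for greet()
	@Override
	public String greet() {
		// explicitly pick the competing implementation we deem appropriate
		return FirstGreeter.super.greet();
	}
}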

This also implies that adding default implementations to an interface can lead to compile errors. If a class A implements the unrelated interfaces X and Y, and a default method which is already present in X is added to Y, class A will not compile anymore.

What happens if A, X and Y are not compiled together and the JVM stumbles upon this situation? Interesting question, to which the answer seems somewhat unclear. Looks like the JVM will throw an IncompatibleClassChangeError.

Re-Abstracting Methods

If an abstract class or interface A declares a method as abstract for which a default implementation exists in some superinterface X, the default implementation of X is overridden. Hence all concrete classes which subtype A must implement the method. This can be used as an effective tool to enforce the reimplementation of inappropriate default implementations.
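A brief sketch of the technique (the names are mine):

interface Device {
	default void selfDestruct() {
		System.out.println("Boom!");
	}
}

// re-abstracts 'selfDestruct': every concrete class subtyping
// SafeDevice must now provide its own implementation
interface SafeDevice extends Device {
	@Override
	void selfDestruct();
}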

This technique is used throughout the JDK, e.g. on ConcurrentMap (link), which re-abstracts a number of methods for which Map (link) provides default implementations because these are not thread-safe (search for the term “inappropriate default”).

Note that concrete classes can still choose to explicitly call the overridden default implementation.

Overriding Methods on ‘Object’

It is not possible for an interface to provide default implementations for the methods in Object. Trying to do so will result in a compile error. Why?

Well first of all, it would be useless. Since every class inherits from Object, Rule #1 clearly implies that those methods would never be called.

But that rule is no law of nature and the expert group could have made an exception. In the mail which also contains the rules, Brian Goetz gives many reasons why they didn’t. The one I like best (formatting mine):

At root, the methods from Object — such as toString, equals, and hashCode — are all about the object’s state. But interfaces do not have state; classes have state. These methods belong with the code that owns the object’s state — the class.

Modifiers

Note that there are a lot of modifiers you can not use on default methods:

  • the visibility is fixed to public (as on other interface methods)
  • the keyword synchronized is forbidden (as on abstract methods)
  • the keyword final is forbidden (as on abstract methods)

Of course these features were requested and comprehensive explanations for their absence exist (e.g. for final and synchronized). The arguments are always similar: This is not what default methods were intended for and introducing those features will result in more complex and error prone language rules and/or code.

You can use static, though, which will reduce the need for plural-form utility classes.
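For example, a factory which would traditionally have lived in a plural-form utility class can now sit on the interface itself (a made-up sketch):

@FunctionalInterface
public interface Distance {

	double toMeters();

	// hypothetical example: before Java 8 a factory like this
	// would have lived in a 'Distances' utility class
	static Distance ofKilometers(double kilometers) {
		return () -> kilometers * 1_000;
	}
}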

A Little Context

Now that we know all about how to use default methods let’s put that knowledge into context.

Default methods could've been called vanilla methods

Published by F_A under CC-BY 2.0.

Interface Evolution

The expert group which introduced default methods can often be found stating that their goal was to allow “interface evolution”:

The purpose of default methods […] is to enable interfaces to be evolved in a compatible manner after their initial publication.

Brian Goetz – Sep 2013

Before default methods it was practically impossible (excluding some organizational patterns; see this nice overview) to add methods to interfaces without breaking all implementations. While this is irrelevant for the vast majority of software developers who also control those implementations, it is a crucial problem for API designers. Java always stayed on the safe side and never changed interfaces after they were released.

But with the introduction of lambda expressions, this became unbearable. Imagine the collective pain of always writing Stream.of(myList).forEach(...) because forEach could not be added to List.

So the expert group which introduced lambdas decided to find a way to enable interface evolution without breaking any existing implementations. Their focus on this goal explains the characteristics of default methods.

Where the group deemed it possible without degrading usability of this primary use case, they also enabled the use of default methods to create traits — or rather something close to them. Still, they were frequently attacked for not going “all the way” to mixins and traits, to which the often repeated answer was: “Yes, because that is/was not our goal.”

Ousting Utility Classes

The JDK and especially common auxiliary libraries like Guava and Apache Commons are full of utility classes. Their name is usually the plural form of the interface they are providing their methods for, e.g. Collections or Sets. The primary reason for their existence is that those utility methods could not be added to the original interface after its release. With default methods this becomes possible.

All those static methods which take an instance of the interface as an argument can now be transformed into a default method on the interface. As an example, look at the static Collections.sort(List) (link), which as of Java 8 simply delegates to the new instance default method List.sort(Comparator) (link). Another example is given in my post on how to use default methods to improve the decorator pattern. Other utility methods which take no arguments (usually builders) can now become static methods on the interface.

While removing all interface-related utility classes in a code base is possible, it might not be advisable. The usability and cohesiveness of the interface should remain the main priority — not stuffing every imaginable feature in there. My guess is that it only makes sense to move the most general of those methods to the interface while more obscure operations could remain in one (or more?) utility classes. (Or remove them entirely, if you’re into that.)

Classification

In his argument for new Javadoc tags, Brian Goetz weakly classifies the default methods which were introduced into the JDK so far (formatting mine):

1. Optional methods:
This is when the default implementation is barely conformant, such as the following from Iterator:
default void remove() {
	throw new UnsupportedOperationException("remove");
}

It adheres to its contract, because the contract is explicitly weak, but any class that cares about removal will definitely want to override it.
2. Methods with reasonable defaults but which might well be overridden by implementations that care enough:
For example, again from Iterator:
default void forEach(Consumer<? super E> consumer) {
	while (hasNext())
		consumer.accept(next());
}

This implementation is perfectly fine for most implementations, but some classes (e.g., ArrayList) might have the chance to do better, if their maintainers are sufficiently motivated to do so. The new methods on Map (e.g., putIfAbsent) are also in this bucket.
3. Methods where its pretty unlikely anyone will ever override them:
Such as this method from Predicate:
default Predicate<T> and(Predicate<? super T> p) {
	Objects.requireNonNull(p);
	return (T t) -> test(t) && p.test(t);
}

Brian Goetz – Jan 31 2013

I call this classification “weak” because it naturally lacks hard rules about where to place a method. That does not make it useless, though. Quite the opposite, I consider it a great help in communicating about them and a good thing to keep in mind while reading or writing default methods.

Documentation

Note that default methods were the primary reason to introduce the new (unofficial) Javadoc tags @apiNote, @implSpec and @implNote. The JDK makes frequent use of them, so it is important to understand their meaning. A good way to learn about them is to read my last post (smooth, right?), which covers them in all detail.

Inheritance and Class-Building

Different aspects of inheritance and how it is used to build classes often come up in discussions about default methods. Let’s take a closer look at them and see how they relate to the new language feature.

Multiple Inheritance — Of What?

With inheritance a type can assume characteristics of another type. Three kinds of characteristics exist:

  • type, i.e. by subtyping a type is another type
  • behavior, i.e. a type inherits methods and thus behaves the same way as another type
  • state, i.e. a type inherits the variables defining the state of another type

Since classes subtype their superclass and inherit all methods and variables, class inheritance clearly covers all three of those characteristics. At the same time, a class can only extend one other class so this is limited to single inheritance.

Interfaces are different: A type can inherit from many interfaces and becomes a subtype of each. So Java has been supporting this kind of multiple inheritance from day 1.

But before Java 8 an implementing class only inherited the interface’s type. Yes, it also inherited the contract but not its actual implementation so it had to provide its own behavior. With default methods this changes so from version 8 on Java supports multiple inheritance of behavior as well.

Java still provides no explicit way to inherit the state of multiple types. Something similar can be achieved with default methods, though, either with an evil hack or the virtual field pattern. The former is dangerous and should never be used, the latter also has some drawbacks (especially regarding encapsulation) and should be used with great care.

Default Methods vs Mixins and Traits

When discussing default methods, they are sometimes compared to mixins and traits. This article can not cover those in detail but will give a rough idea how they differ from interfaces with default methods. (A helpful comparison of mixins and traits can be found on StackOverflow.)

Mixins

Mixins allow inheriting their type, behavior and state. A type can inherit from several mixins, thus providing multiple inheritance of all three characteristics. Depending on the language one might also be able to add mixins to single instances at runtime.

As interfaces with default methods allow no inheritance of state, they are clearly no mixins.

Traits

Similar to mixins, traits allow types (and instances) to inherit from multiple traits. They also inherit their type and behavior but unlike mixins, conventional traits do not define their own state.

This makes traits similar to interfaces with default methods. The concepts are still different, but those differences are not entirely trivial. I might come back to this in the future and write a more detailed comparison but until then, I will leave you with some ideas:

  • As we’ve seen, method call resolution is not always trivial which can quickly make the interaction of different interfaces with default methods a complexity burden. Traits typically alleviate this problem one way or another.
  • Traits allow certain operations which Java does not fully support. See the bullet point list after “selection of operations” in the Wikipedia article about traits.
  • The paper “Trait-oriented Programming in Java 8” explores a trait-oriented programming style with default methods and encounters some problems.

So while interfaces with default methods are no traits, the similarities allow to use them in a limited fashion like they were. This is in line with the expert group’s design goal which tried to accommodate this use-case wherever it did not conflict with their original goal, namely interface evolution and ease of use.

Default Methods vs Abstract Classes

Now that interfaces can provide behavior they inch into the territory of abstract classes and soon the question arises, which to use in a given situation.

Language Differences

Let’s first state some of the differences on the language level:

While interfaces allow multiple inheritance they fall short on basically every other aspect of class-building. Default methods are never final, can not be synchronized and can not override Object’s methods. They are always public, which severely limits the ability to write short and reusable methods. Furthermore, an interface can still not define fields so every state change has to be done via the public API. Changes made to an API to accommodate that use case will often break encapsulation.

Still, there are some use cases left, in which those differences do not matter and both approaches are technically feasible.

Conceptual Differences

Then there are the conceptual differences. Classes define what something is, while interfaces usually define what something can do.

And abstract classes are something special altogether. Effective Java’s item 18 comprehensively explains why interfaces are superior to abstract classes for defining types with multiple subtypes. (And this does not even take default methods into account.) The gist is: Abstract classes are valid for skeletal (i.e. partial) implementations of interfaces but should not exist without a matching interface.

So when abstract classes are effectively reduced to be low-visibility, skeletal implementations of interfaces, can default methods take this away as well? Decidedly: No! Implementing interfaces almost always requires some or all of those class-building tools which default methods lack. And if some interface doesn’t, it is clearly a special case, which should not lead you astray. (See this earlier post about what can happen when an interface is implemented with default methods.)

More Links

I wrote some other posts about default methods but I want to explicitly recommend one which presents precise steps on how to use default methods for their intended goal:

And the internet is of course full of articles about the topic:

Reflection

This article should have covered everything one needs to know about default methods. If you disagree, tweet, mail or leave a comment. Approval and +1’s are also acceptable.

The post Everything You Need To Know About Default Methods appeared first on blog@CodeFX.


Value-Based Classes


In Java 8 some classes got a small note in Javadoc stating they are value-based classes. This includes a link to a short explanation and some limitations about what not to do with them. This is easily overlooked and if you do that, it will likely break your code in subtle ways in future Java releases. To prevent that I wanted to cover value-based classes in their own post – even though I already mentioned the most important bits in other articles.

Overview

This post will first look at why value-based classes exist and why their use is limited before detailing those limitations (if you’re impatient, jump here). It will close with a note on FindBugs, which will soon be able to help you out.

Background

Let’s have a quick look at why value-based classes were introduced and which exist in the JDK.

Why Do They Exist?

A future version of Java will most likely contain value types. I will write about them in the coming weeks (so stay tuned) and will present them in some detail. And while they definitely have benefits, these are not covered in the present post, which might make the limitations seem pointless. Believe me, they aren’t! Or don’t believe me and see for yourself.

For now let’s see what little I already wrote about value types:

The gross simplification of that idea is that the user can define a new kind of type, different from classes and interfaces. Their central characteristic is that they will not be handled by reference (like classes) but by value (like primitives). Or, as Brian Goetz puts it in his introductory article State of the Values:

Codes like a class, works like an int!

It is important to add that value types will be immutable – as primitive types are today.

In Java 8 value types are preceded by value-based classes. Their precise relation in the future is unclear but it could be similar to that of boxed and unboxed primitives (e.g. Integer and int).

The relationship of existing types with future value types became apparent when Optional was designed. This was also when the limitations of value-based classes were specified and documented.

What Value-Based Classes Exist?

These are all the classes I found in the JDK to be marked as value-based:

java.util
Optional, OptionalDouble, OptionalLong, OptionalInt
java.time
Duration, Instant, LocalDate, LocalDateTime, LocalTime, MonthDay, OffsetDateTime, OffsetTime, Period, Year, YearMonth, ZonedDateTime, ZoneId, ZoneOffset
java.time.chrono
HijrahDate, JapaneseDate, MinguoDate, ThaiBuddhistDate

I can not guarantee that this list is complete as I found no official source listing them all.

Is this value-based?

Published by Jeremy Schultz under CC-BY 2.0.

In addition there are non-JDK classes which should be considered value-based but do not say so. An example is Guava’s Optional. It is also safe to assume that most code bases will contain classes which are meant to be value-based.

It is interesting to note that the existing boxing classes like Integer, Double and the like are not marked as being value-based. While it sounds desirable to do so – after all they are the prototypes for this kind of classes – this would break backwards compatibility because it would retroactively invalidate all uses which contravene the new limitations.

Optional is new, and the disclaimers arrived on day 1. Integer, on the other hand, is probably hopelessly polluted, and I am sure that it would break gobs of important code if Integer ceased to be lockable (despite what we may think of such a practice.)

Brian Goetz – Jan 6 2015 (formatting mine)

Still, they are very similar so let’s call them “value-ish”.

Characteristics

At this point, it is unclear how value types will be implemented, what their exact properties will be and how they will interact with value-based classes. Hence the limitations imposed on the latter are not based on existing requirements but derived from some desired characteristics of value types. It is by no means clear whether these limitations suffice to establish a relationship with value types in the future.

That being said, let’s continue with the quote from above:

In Java 8 value types are preceded by value-based classes. Their precise relation in the future is unclear but it could be similar to that of boxed and unboxed primitives (e.g. Integer and int). Additionally, the compiler will likely be free to silently switch between the two to improve performance. Exactly that switching back and forth, i.e. removing and later recreating a reference, also forbids identity-based mechanisms to be applied to value-based classes.

Implemented like this the JVM is freed from tracking the identity of value-based instances, which can lead to substantial performance improvements and other benefits.

Identity

The term identity is important in this context, so let’s have a closer look. Consider a mutable object which constantly changes its state (like a list being modified). Even though the object always “looks” different we would still say it’s the same object. So we distinguish between an object’s state and its identity. In Java, state equality is determined with equals (if appropriately implemented) and identity equality by comparing references. In other words, an object’s identity is defined by its reference.

Now assume the JVM will treat value types and value-based classes as described above. In that case, neither will have a meaningful identity. Value types won’t have one to begin with, just like an int doesn’t. And the corresponding value-based classes are merely boxes for value types, which the JVM is free to destroy and recreate at will. So while there are of course references to individual boxes, there is no guarantee at all about how the boxes will exist.

This means that even though a programmer might look at the code and follow an instance of a value-based class being passed here and there, the JVM might behave differently. It might remove the reference (thus destroying the object’s identity) and pass it as a value type. In case of an identity sensitive operation, it might then recreate a new reference.

With regard to identity it is best to think of value-based classes like integers: talking about different instances of “3” (the int) makes no sense and neither does talking about different instances of “11:42 pm” (the LocalTime).

State

If instances of value-based classes have no identity, their equality can only be determined by comparing their state (which is done by implementing equals). This has the important implication that two instances with equal state must be fully interchangeable, meaning replacing one such instance with another must not have any discernible effect.

This indirectly determines what should be considered part of a value-based instance’s state. All fields whose type is a primitive or another value-based class can be part of it because they are also fully interchangeable (all “3”s and “11:42 pm”s behave the same). Regular classes are trickier. As operations might depend on their identity, a value-based instance can not generally be exchanged for another if they both refer to equal but non-identical instances.

As an example, consider locking on a String which is then wrapped in an Optional. At some other point another String is created with the same character sequence and also wrapped. Then these two Optionals are not interchangeable because even though both wrap equal character sequences, those String instances are not identical and one functions as a lock while the other one doesn’t.
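In code, the situation looks roughly like this (a deliberately contrived snippet mirroring the example):

import java.util.Optional;

public class OptionalLockDemo {

	public static void main(String[] args) {
		String lock = new String("duke");
		String copy = new String("duke"); // equal but not identical

		Optional<String> first = Optional.of(lock);
		Optional<String> second = Optional.of(copy);

		System.out.println(first.equals(second)); // true: equal state

		synchronized (lock) {
			// only 'first' wraps the instance which functions as a lock,
			// so the two optionals are not interchangeable here
		}
	}
}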

Strictly interpreted this means that instead of including the state of a reference field in its own state, a value-based class must only consider the reference itself. In the example above, the Optionals should only be considered equal if they actually point to the same string.

This may be overly strict, though, as the given as well as other problematic examples are necessarily somewhat contrived. And it is very counterintuitive to force value-based classes to ignore the state of “value-ish” classes like String and Integer.

Value Type Boxes

Being planned as boxes for value types adds some more requirements. These are difficult to explain without going deeper into value types so I’m not going to do that now.

Limitations

First, it is important to note that in Java 8 all the limitations are purely artificial. The JVM does not know the first thing about this kind of classes and you can ignore all of the rules without anything going wrong – for now. But this might change dramatically when value types are introduced.

As we have seen above, instances of value-based classes have no guaranteed identity, less leniency in defining equality and should fit the expected requirements of boxes for value types. This has two implications:

  • The class must be built accordingly.
  • Instances of the class must not be used for identity-based operations.

This is the ground for the limitations stated in the Javadoc and they can hence be separated into limitations for the declaration of the class and the use of its instances.

Declaration Site

Straight from the documentation (numbering and formatting mine):

Instances of a value-based class:

  1. are final and immutable (though may contain references to mutable objects);
  2. have implementations of equals, hashCode, and toString which are computed solely from the instance’s state and not from its identity or the state of any other object or variable;
  3. make no use of identity-sensitive operations such as reference equality (==) between instances, identity hash code of instances, or synchronization on an instance’s intrinsic lock;
  4. are considered equal solely based on equals(), not based on reference equality (==);
  5. do not have accessible constructors, but are instead instantiated through factory methods which make no commitment as to the identity of returned instances;
  6. are freely substitutable when equal, meaning that interchanging any two instances x and y that are equal according to equals() in any computation or method invocation should produce no visible change in behavior.

With what was discussed above most of these rules are obvious.

Rule 1 is motivated by value-based classes being boxes for value types. For technical and design reasons those must be final and immutable and these requirements are transferred to their boxes.

Rule 2 murkily addresses the concerns about how to define the state of a value-based class. The rule’s precise effect depends on the interpretation of “the instance’s state” and “any other variable”. One way to read it is to include “value-ish” classes in the state and regard typical reference types as other variables.

Numbers 3 through 6 regard the missing identity.

It is interesting to note that Optional breaks rule 2 because it calls equals on the wrapped value. Similarly, all value-based classes from java.time and java.time.chrono break rule 3 by being serializable (which is an identity-based operation – see below; this thread on the Valhalla mailing list talks about this).

Use Site

Again from the documentation:

A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism.

Considering the missing identity it is straight forward that references should not be distinguished. There is no explanation, though, why the listed examples are violating that rule, so let’s have a closer look. I made a list of all violations I could come up with and included a short explanation and concrete cases for each (vbi stands for instance of value-based class):

Reference Comparison
This obviously distinguishes instances based on their identity.
Serialization of vbi
It is desirable to make value types serializable and a meaningful definition for that seems straight-forward. But as it is today, serialization makes promises about object identity which conflict with the notion of identity-less value-based classes. In its current implementation, serialization also uses object identity when traversing the object graph. So for now, it must be regarded as an identity-based operation which should be avoided.
Cases:
  • non-transient field in serializable class
  • direct serialization via
    ObjectOutputStream.writeObject
Locking on a vbi
Uses the object header to access the instance’s monitor – headers of value-based classes are free to be removed and recreated and primitive/value types have no headers.
Cases:
  • use in synchronized block
  • calls to Object.wait, Object.notify or Object.notifyAll
Identity Hash Code
This hash code is required to be constant over an instance’s lifetime. With instances of value-based classes being free to be removed and recreated constancy can not be guaranteed in a sense which is meaningful to developers.
Cases:
  • argument to
    System.identityHashCode
  • key in an
    IdentityHashMap

Comments highlighting other violations or improving upon the explanations are greatly appreciated!

FindBugs

Of course it is good to know all this but this doesn’t mean a tool which keeps you from overstepping the rules wouldn’t be really helpful. Being a heavy user of FindBugs I decided to ask the project to implement this and created a feature request. This ticket covers the use-site limitations and will help you uphold them for the JDK’s as well as your own value-based classes (marked with an annotation).

Being curious about FindBugs and wanting to contribute I decided to set out and try to implement it myself. So if you’re asking why it takes so long to get that feature ready, now you know: It’s my fault. But talk is cheap so why don’t you join me and help out? I put a FindBugs clone up on GitHub and you can see the progress in this pull request.

As soon as that is done I plan to implement the declaration-site rules as well, so you can be sure your value-based classes are properly written and ready when value types finally roll around.

Reflection

We have seen that value-based classes are the precursor of value types. With the changes coming to Java these instances will have no meaningful identity and limited possibilities to define their state which creates limitations both for their declaration and their use. These limitations were discussed in detail.

The post Value-Based Classes appeared first on blog@CodeFX.

Roll Your Own Pirate-Elvis Operator


So, Java doesn’t have an Elvis operator (or, as it is more formally known, null coalescing operator or null-safe member selection) … While I personally don’t much care about it, some people seem to really like it. And when a colleague needed one a couple of days back I sat down and explored our options.

And what do you know! You can get pretty close with method references.

Overview

We will first have a look at what the Elvis operator is and why pirates are involved. I will then show how to implement it with a utility method.

The implementation, a demo and most of the examples from this post can be found in a dedicated GitHub project. The code is Public Domain so you can use it without limitations.

Elvis? Isn’t he Dead?

I thought so, too, but apparently not. And much like rumors about The King being alive, people wishing for the Elvis operator also never quite die out. So let’s see what they want.

elvis-operator

Published by That Hartford Guy under CC-BY-SA.

(If you want to read one discussion about it for yourself, see this thread on the OpenJDK mailing list, where Stephen Colebourne proposed these operators for Java 7.)

The Elvis Operator

In its simplest form Elvis is a binary operator which selects the non-null operand, preferring the left one. So instead of …

private String getStreetName() {
	return streetName == null ? "Unknown Street" : streetName;
//	or like this?
//	return streetName != null ? streetName : "Unknown Street";
}

… you can write …

private String getStreetName() {
	return streetName ?: "Unknown Street";
}

I’d be OK with getting this one in Java. It’s a nice shortcut for a frequently used pattern and would keep me from wasting time on deciding how to order the operands of the ternary “? :” (because I always wonder whether I want to put the regular case first or want to avoid the double negative).

Emulating this with a static utility function is of course trivial but, I’d say, also borderline pointless. The effort of statically importing that method and having all readers of the code look up what it means outweighs the little benefit it provides.
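
For completeness, such a utility could look like this (a minimal sketch; the method name is mine):

public static <T> T orElse(T value, T fallback) {
	return value != null ? value : fallback;
}

The example above would then boil down to return orElse(streetName, "Unknown Street");.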

So I’m not talking about this Elvis. Btw, it’s called that because ?: looks like a smiley with a pompadour. And who could that be if not Elvis… And yes, this is how we in the industry pick names all the time! More formally it is also known as the null coalescing operator.

The Pirate-Elvis Operator

Then there is this other thing which doesn’t seem to have its own name and this is what I want to talk about. It’s sometimes also called Elvis, but other times it gets handy names like “null-safe member selection operator”. At least, that explains pretty well what it does: It short-circuits a member selection if the instance on which the member is called is null so that the whole call returns null.

This comes in handy when you want to chain method calls but some of them might return null. Of course you’d have to check for this or you’ll run into a NullPointerException. This can lead to fairly ugly code. Instead of …

private String getStreetName(Order order) {
	return order.getCustomer().getAddress().getStreetName();
}

… you’d have to write …

private String getStreetName(Order order) {
	Customer customer = order == null ? null : order.getCustomer();
	Address address = customer == null ? null : customer.getAddress();
	return address == null ? null : address.getStreetName();
}

That is clearly terrible. But with the “null-safe member selection operator”:

private String getStreetName(Order order) {
	return order?.getCustomer()?.getAddress()?.getStreetName();
}

Looks better, right? Yes. And it lets you forget about all those pesky nulls, mh? Yes. So that’s why I think it’s a bad idea.

Fields being frequently null reeks of bad design. And with Java 8, you can instead avoid null by using Optional. So there should really be little reason to make throwing nulls around even easier. That said, sometimes you still want to, so let’s see how to get close.

By the way, since there seems to be no official term for this variant yet, I name ?. the Pirate-Elvis operator (note the missing eye). Remember, you read it here first! ;)

Implementing The Pirate-Elvis Operator

So now that we know what we’re talking about, let’s go implement it. We can use Optional for this or write some dedicated methods.

With Optional

Just wrap the first instance in an Optional and apply the chained functions as maps:

private String getStreetName(Order order) {
	return Optional.ofNullable(order)
			.map(Order::getCustomer)
			.map(Customer::getAddress)
			.map(Address::getStreetName)
			.orElse(null);
}

This requires a lot of boilerplate but already contains the critical aspects: Specify the methods to call with method references and if something is null (which in this case leads to an empty Optional), don’t call those methods.

I still like this solution because it clearly documents the optionality of those calls. It is also easy (and actually makes the code shorter) to do the right thing and return the street name as an Optional<String>.
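
That variant would simply drop the final orElse and change the return type (a sketch derived from the example above):

private Optional<String> getStreetName(Order order) {
	return Optional.ofNullable(order)
			.map(Order::getCustomer)
			.map(Customer::getAddress)
			.map(Address::getStreetName);
}
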

With Dedicated Utility Methods

Starting from the solution with Optional, finding a shorter way for this special case is pretty straightforward: Just hand the instance and the method references over to a dedicated method and let it sort out when the first value is null.

public static <T1, T2> T2 applyNullCoalescing(T1 target,
		Function<T1, T2> f) {
	return target == null ? null : f.apply(target);
}

public static <T1, T2, T3> T3 applyNullCoalescing(T1 target,
		Function<T1, T2> f1, Function<T2, T3> f2) {
	return applyNullCoalescing(applyNullCoalescing(target, f1), f2);
}

public static <T1, T2, T3, T4> T4 applyNullCoalescing(T1 target,
		Function<T1, T2> f1, Function<T2, T3> f2,
		Function<T3, T4> f3) {
	return applyNullCoalescing(applyNullCoalescing(target, f1, f2), f3);
}

public static <T1, T2, T3, T4, T5> T5 applyNullCoalescing(T1 target,
		Function<T1, T2> f1, Function<T2, T3> f2,
		Function<T3, T4> f3, Function<T4, T5> f4) {
	return applyNullCoalescing(applyNullCoalescing(target, f1, f2, f3), f4);
}

(This implementation is optimized for succinctness. If each method were implemented explicitly, the performance could be improved.)
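
For illustration, the explicit two-function variant hinted at above could look like this (a sketch; it avoids building the nested delegation chain):

public static <T1, T2, T3> T3 applyNullCoalescing(T1 target,
		Function<T1, T2> f1, Function<T2, T3> f2) {
	// check each step directly instead of delegating to the
	// single-function overload
	if (target == null)
		return null;
	T2 intermediate = f1.apply(target);
	return intermediate == null ? null : f2.apply(intermediate);
}
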

Using method references these methods can be called in a very readable fashion:

private String getStreetName(Order order) {
	return applyNullCoalescing(order,
			Order::getCustomer, Customer::getAddress, Address::getStreetName);
}

Still no order?.getCustomer()?.getAddress()?.getStreetName(); but close.

Reflection

We have seen what the null coalescing operator (?:) and the null-safe member selection operator (?.) are. Even though the latter might encourage bad habits (passing nulls around) we have then gone and implemented it with a utility method which can be called with method references.

Any code you like is free to use.

The post Roll Your Own Pirate-Elvis Operator appeared first on blog@CodeFX.

How Java 9 And Project Jigsaw May Break Your Code


Java 9 looms on the horizon and it will come with a completed Project Jigsaw. I didn’t pay much attention to it until I learned from a recent discussion on the OpenJFX mailing list that it may break existing code. This is very unusual for Java so it piqued my interest.

I went reading the project’s JEPs and some related articles and came to the conclusion that, yes, this will break existing code. It depends on your project whether you will be affected but you might be and it might hurt.

Series

This post is part of an ongoing series about Project Jigsaw. In the recommended order (which is different from their publication order) these are:

The corresponding tag lists more articles about the topic.

Overview

After a cursory introduction to what Project Jigsaw is about, I will describe the potentially breaking changes.

I compiled that list of changes from the available documents. There is of course no guarantee that I caught everything and since I am unfamiliar with some of the concepts, I might have gotten some facts wrong. Caveat emptor.

If you find a mistake or think something could be made clearer or more precise, leave a comment and I will be happy to include your input.

Project Jigsaw

I might write a more detailed description of Project Jigsaw at some point, but for now I’ll be lazy and simply quote:

The primary goals of this Project are to:

  • Make the Java SE Platform, and the JDK, more easily scalable down to small computing devices;
  • Improve the security and maintainability of Java SE Platform Implementations in general, and the JDK in particular;
  • Enable improved application performance; and
  • Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.

To achieve these goals we propose to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. The module system should be powerful enough to modularize the JDK and other large legacy code bases, yet still be approachable by all developers.

Jigsaw Project Site – Feb 11 2015

If you want to know more about the project, check out its site and especially the list of goals and requirements (current version is draft 3 from July 2014).

The main thing to take away here is the module system. From version 9 on Java code can be (and the JRE/JDK will be) organized in modules instead of JAR files.

putting together project jigsaw

Published by Yoel Ben-Avraham under CC-BY-ND 2.0.

Breaking Code

This sounds like an internal refactoring so why would it break existing code? Well, it doesn’t do that necessarily and compatibility is even one of the project’s central requirements (as usual for Java):

An application that uses only standard Java SE APIs, and possibly also JDK-specific APIs, must work the same way […] as it does today.

Project Jigsaw: Goals & Requirements – DRAFT 3

The important part is the qualification “only standard APIs”. There are plenty of ways to create applications which for some critical detail rely on unspecified or deprecated properties like non-standard APIs, undocumented folder structures and internal organizations of JAR files.

So let’s see the potentially breaking changes. For more details, make sure to check the project’s site, especially JEP 220, which contains a more precise description of most of what follows.

Internal APIs Become Unavailable

With JAR files any public class is visible anywhere in the JVM. This severely limits the ability of JDK-implementations to keep internal APIs private. Instead many are accessible and they are often used for a variety of reasons (e.g. to improve performance or work around [former] bugs in the Java runtime; the Java FAQ explains why that may be a bad idea).

This changes with modules. Every module will be able to explicitly declare which types are made available as part of its API. The JDK will use this feature to properly encapsulate all internal APIs which will hence become unavailable.

This may turn out to be the biggest source of incompatibilities with Java 9. It surely is the least subtle one as it causes compile errors.

To prepare for Java 9 you could check your code for dependencies upon internal APIs. Everything you find must be replaced one way or another. Some workarounds might have become unnecessary. Other classes might find their way into the public API. To find out whether this is the case, you will have to do some research and maybe resort to asking on the OpenJDK mailing list for the functionality you are interested in.

Internal APIs

So what are internal APIs? Definitely everything that lives in a sun.*-package. I could not confirm whether everything in com.sun.* is private as well – surely some parts are but maybe not all of them?
Update (5th of May, 2015)
This got cleared up in a comment by Stuart Marks as follows:

Unfortunately, com.sun is a mixture of internal and publicly supported (“exported”) APIs. An annotation @jdk.Exported distinguishes the latter from internal APIs. Note also that com.sun.* packages are only part of the Oracle (formerly Sun) JDK, and they are not part of Java SE.

So if a class lives in com.sun.*, it won’t exist on any non-Oracle JDK. And if it belongs to one of those packages and is not annotated with @jdk.Exported, it will be inaccessible from Java 9 on.

Two examples, which might prove especially problematic, are sun.misc.Unsafe and everything in com.sun.javafx.*. Apparently the former is used in quite a number of projects for mission- and performance-critical code. From personal experience I can say that the latter is a crucial ingredient to properly building JavaFX controls (e.g. all of ControlsFX depends on these packages). It is also needed to work around a number of bugs.

Both of these special cases are considered for being turned into public API (see for Unsafe and for JavaFX – although some people would rather see Unsafe die in a fire).

Tool Support

Fortunately you don’t have to find these dependencies by hand. Since Java 8 the JDK contains the Java Dependency Analysis Tool jdeps (introduction with some internal packages, official documentation for windows and unix), which can list all packages upon which a project depends.

If you run it with the parameter -jdkinternals, it will output all internal APIs your project uses – exactly the ones which you will have to deal with before Java 9 rolls around.
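
A hypothetical invocation could look like this (my-app.jar stands in for your own artifact):

    jdeps -jdkinternals my-app.jar
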

Update (15th of May, 2015)
JDeps does not yet recognize all packages which will be unavailable in Java 9. This affects at least those which belong to JavaFX as can be seen in JDK-8077349. I could not find other issues regarding missing functionality (using this search).
Update (11th of May, 2015)
I created a Maven plugin which uses JDeps to discover problematic dependencies and breaks the build if it finds any. See the release post for details.

Merge Of JDK And JRE

The main goal of Project Jigsaw is the modularization of the Java Platform to allow the flexible creation of runtime images. As such the JDK and JRE lose their distinct character and become just two possible points in a spectrum of module combinations.

This implies that both artifacts will have the same structure. This includes the folder structure, so any code which relies on it (e.g. by utilizing the fact that a JDK folder contains a subfolder jre) will stop working correctly.

Internal JARs Become Unavailable

Internal JARs like lib/rt.jar and lib/tools.jar will no longer be accessible. Their content will be stored in implementation-specific files with a deliberately unspecified and possibly changing format.

Code which assumes the existence of these files will stop working correctly. This might also lead to some transitional pains in IDEs or similar tools as they heavily rely on these files.

New URL Schema For Runtime Image Content

Some APIs return URLs to class and resource files in the runtime (e.g. ClassLoader.getSystemResource). Before Java 9 these are jar URLs and they have the following form:

    jar:file:<path-to-jar>!<path-to-file-in-jar>

Project Jigsaw will use modules as a container for code files and the individual JARs will no longer be available. This requires a new format so such APIs will instead return jrt URLs:

    jrt:/<module-name>/<path-to-file-in-module>

Code that uses the instances returned by such APIs to access the file (e.g. with URL.getContent) will continue to work as today. But if it depends on the structure of jar URLs (e.g. by constructing them manually or parsing them), it will fail.
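
A made-up example of the kind of code that will break (it parses the URL manually):

static String pathInJar(URL resource) {
	String url = resource.toString();
	// works for "jar:file:<path-to-jar>!<path-to-file-in-jar>", but a
	// jrt URL contains no '!', so indexOf returns -1 and the result is
	// the entire URL instead of the path inside the container
	return url.substring(url.indexOf('!') + 1);
}
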

Removal Of The Endorsed Standards Override Mechanism

Some parts of the Java API are considered Standalone Technologies and created outside of the Java Community Process (e.g. JAXB). It might be desirable to update them independently of the JDK or use alternative implementations. The endorsed standards override mechanism makes it possible to install alternative versions of these standards into a JDK.

This mechanism is deprecated in Java 8 and will be removed in Java 9. It will be replaced by upgradeable modules.

If you’ve never heard about this, you’re probably not using it. Otherwise you might want to verify whether the implementation you are using will be made into an upgradeable module.

Removal Of The Extension Mechanism

With the extension mechanism custom APIs can be made available to all applications running on the JDK without having to name them on the class path.

This mechanism is deprecated in Java 8 and will be removed in Java 9. Some features which are useful on their own will be retained.

If you’ve never heard about this, you’re probably not using it. Otherwise you might want to check JEP 220 for details.

Preparations For Java 9

Together these changes impose a risk for any large project’s transition to Java 9. One way to assess and reduce it could be an “update spike”: Use jdeps to identify dependencies on internal APIs. After fixing these, invest some time to build and run your project with one of the Java 9 early access builds. Thoroughly test relevant parts of the system to get a picture of possible problems.

Information gathered this way can be returned to the project, e.g. by posting it on the Jigsaw-Dev mailing list. To quote the (almost) final words of JEP 220:

It is impossible to determine the full impact of these changes in the abstract. We must therefore rely upon extensive internal and—especially—external testing. […] If some of these changes prove to be insurmountable hurdles for developers, deployers, or end users then we will investigate ways to mitigate their impact.

Reflection & Lookout

We have seen that Project Jigsaw will modularize the Java runtime. Internal APIs (packages sun.* and maybe com.sun.*) will be made unavailable and the internal structure of the JRE/JDK will change, which includes folders and JARs. Following their deprecation in Java 8, the endorsed standards override mechanism and the extension mechanism will be removed in Java 9.

If you want to help your friends and followers to prepare for Java 9, make sure to share this post.


So far we focused on the problematic aspects of Project Jigsaw. But that should not divert from the exciting and – I think – very positive nature of the planned changes. After reading the documents, I am impressed with the scope and potential of this upcoming Java release. While it is likely not as groundbreaking for individual developers as Java 8, it is even more so for everyone involved in building and deploying – especially of large monolithic projects.

As such, I will surely write about Project Jigsaw again – and then with a focus on the good sides. Stay tuned if you want to read about it.


The post How Java 9 And Project Jigsaw May Break Your Code appeared first on blog@CodeFX.

JavaFX, Project Jigsaw and JEP 253


So Java 9 may break your code …

This is particularly likely if your project uses JavaFX because many customizations and home-made controls require the use of internal APIs. With Project Jigsaw these will be inaccessible in Java 9. Fortunately, Oracle announced JEP 253 a couple of days ago. Its goal:

Define public APIs for the JavaFX UI controls and CSS functionality that is presently only available via internal APIs and will hence become inaccessible due to modularization.

JEP 253 – May 14 2015

Let’s have a look at how JavaFX, Project Jigsaw and JEP 253 interact.

Overview

To better understand the role internal APIs play in JavaFX, it is helpful to know its control architecture, so we will start with that. We will then look at why internal APIs are frequently used when working with JavaFX. This will help put the new JEP in context.

Because I am familiar with it I will often refer to ControlsFX as an example. I assume that similar libraries (e.g. JFXtras) as well as other projects which customize JavaFX are in the same situation.

JavaFX Control Architecture

Model-View-Controller

JavaFX controls are implemented according to model-view-controller. Without going into too much detail, let’s have a quick look at how this is done. (A great and more detailed explanation can be found at the GuiGarage.)

All official controls extend the abstract class Control. This is MVC’s model.

The control defines a skinProperty, which contains a Skin implementation. It visualizes the control’s current state, i.e. it is MVC’s view. By default, it is also in charge of capturing and executing user interaction, which in MVC is the controller’s task.

The skin is most often implemented by extending BehaviorSkinBase. It creates an implementation of BehaviorBase to which it delegates all user interaction and which updates the model accordingly. So here we have MVC’s controller.

Key Bindings

It is also noteworthy how controls resolve user input. In order to link an action to an input (e.g. “open new tab in background” for “CTRL + mouse click”), they create a list of KeyBindings. Input events are then compared to all created bindings and the correct action is called.

JavaFX, Project Jigsaw and JEP 253

Published by Flosweb under CC-BY-SA – jigsaw effect added by me.

Internal APIs in JavaFX

When working with JavaFX, it is common to rely on internal API. This is done to create new controls, tweak existing ones or to fix bugs.

Creating New Controls

While Control, Skin and even SkinBase are all public API, the frequently used BehaviorSkinBase and BehaviorBase are not. With Project Jigsaw, they will be inaccessible.

This API is heavily used, though. ControlsFX contains about two dozen controls and roughly half of them require implementations of either of these classes.

Similarly, KeyBindings are not published, so creating them to manage user interaction adds another problematic dependency.

Tweaking Existing Controls

Customizing an existing control usually happens to either change the visualization or to tweak the behavior for certain user interactions.

For the former it is often easiest to simply extend and modify the existing Skin. Unfortunately all skins of existing controls live in com.sun.javafx.scene.control.skin. When they become inaccessible, many customized controls will no longer compile.

To change a control’s reaction to user interaction it is necessary to interfere with the behavior defined in BehaviorBase. This is analogous to creating a new control as it is often done by extending BehaviorSkinBase and BehaviorBase and creating new KeyBindings.

Making Controls Styleable Via CSS

In JavaFX controls can be implemented so that they are styleable via CSS. All official controls come with this feature and some of those provided by other projects as well.

A central step in styling a control is to convert the attributes’ textual representations from the CSS file to instances of Number, Paint, an enum, … so they can be assigned to properties. To ensure uniform, high-quality conversion JavaFX provides an API for this. Unfortunately it lives in com.sun.javafx.css.converters.

Update (11th of June 2015)
As pointed out by Michael, it is not necessary to create the converters directly. Instead the static factory methods on the published StyleConverter should be used. This makes the above paragraph moot.

Advanced styling requirements must be implemented with help of the StyleManager, which, you guessed it, is also not published.

Working Around Bugs

JavaFX is comparatively young and still contains some bugs which are not too hard to come in contact with. Often the only work around is to hack into a control’s inner workings and thus use private APIs. (Examples for such cases can be found on the OpenJFX mailing list, e.g. in these mails by Robert Krüger, Stefan Fuchs and Tom Schindl.)

Such workarounds will fail in Java 9. Since it seems unlikely that they become unnecessary because all bugs are fixed, concerns like the following are understandable:

Of course, in theory, if all of [those bugs] get fixed in [Java] 9 I am fine, but if there is a period of time where half of them are fixed in 9 and the other half can only be worked around on 8, what do I do with my product?

Robert Krüger – April 9 2015

JEP 253

We have seen why the use of internal APIs is ubiquitous when working with JavaFX. So how is JEP 253 going to solve this?

(Unless otherwise noted all quotes in this section are taken from the JEP.)

Goals, Non-Goals and Success Metrics

The proposal addresses precisely the problem described up to this point. And it recognizes that “[i]n many cases, to achieve a desired result, developers have no choice but to use these internal APIs”. So “[t]he goal of this JEP is to define public APIs for the functionality presently offered by the internal APIs”.

(Note that this still entails compile errors while developers move their code from the internal and now inaccessible API to the new public one.)

At the same time this JEP plans neither breaking changes nor enhancements to existing, published code: “All other existing APIs that are not impacted by modularization will remain the same.”

Two success metrics are defined:

  • “Projects that depend on JavaFX internal APIs, in particular Scene Builder, ControlsFX, and JFXtras, continue to work after updating to the new API with no loss of functionality.”
  • “Ultimately, if all works to plan, third-party controls should be buildable without any dependency upon internal APIs.”

Three Projects

The JEP is split into three projects:

Project One: Make UI control skins into public APIs
Skins of existing controls will be moved from com.sun.javafx.scene.control.skin to javafx.scene.control.skin. This will make them published API. (Note that this does not include the behavior classes.)
Project Two: Improve support for input mapping
Behavior will be definable by input mapping. This allows altering a control’s behavior at runtime without extending any specific (and unpublished) classes.
Project Three: Review and make public relevant CSS APIs
CSS API which is currently available in com.sun.* packages will be reviewed and published.

The proposal goes into more detail and describes the current state of each project as well as some risks and assumptions.

The projects address three out of the four use cases described above. It is reasonable to assume that these can be fulfilled and that in Java 9 it will be possible to properly create, tweak and skin controls even though internal APIs are inaccessible.

What about working around bugs? At least some of them seem to be solvable with the same tools (e.g. extending an existing skin). But I can not say whether this is true for all of them and how crucial the ones which are left without a workaround are.

Schedule

If you want to try out the new APIs, you’ll have to be patient for a while. In a tweet Jonathan Giles, Oracle tech lead in the JavaFX UI controls team and owner of JEP 253, states that he “probably won’t merge into the repo for a few months yet…”.

On the other hand, since feature completeness for Java 9 is scheduled for December, it must be available within the next seven months.

Reflection

We have seen that working with JavaFX often entails the use of private API. This happens in four largely distinct areas:

  • Creating new controls according to the control architecture (MVC).
  • Tweaking existing controls by extending their skin or altering key bindings.
  • Making controls styleable via CSS.
  • Working around bugs.

JEP 253 is split into three projects which address the first three areas. Whether they will suffice to enable working around bugs with only public API is unclear (to me).

The post JavaFX, Project Jigsaw and JEP 253 appeared first on blog@CodeFX.

Motivation And Goals Of Project Jigsaw


A couple of weeks ago I wrote about how Project Jigsaw may break existing code. So what do we get in return? Let’s look at the pain points the project addresses and its goals for how to solve them in Java 9.

Series

This post is part of an ongoing series about Project Jigsaw. In the recommended order (which is different from their publication order) these are:

The corresponding tag lists more articles about the topic.

Overview

We will first cover the pain points which motivated the creation of Project Jigsaw before looking at the project’s goals.

The main sources are JSR 376 and the talk Java 9, And Beyond, given by Mark Reinhold (chief architect of the Java Platform Group at Oracle) at EclipseCon 2015.

Pain Points

There are a couple of pain points Project Jigsaw is aimed to solve.

JAR/Classpath Hell

Lots of people have written about classpath hell and JAR hell and there is no need to repeat it all.

This problem shows itself when the runtime resolves dependencies differently from how the developer assumed it would. This can lead to, e.g., running the wrong version of a library. Finding what caused this can be extremely unpleasant (hence the upbeat term).

This happens because of the way the Java runtime loads classes. The mechanism is fragile (e.g. depends on order), possibly complex (e.g. with multiple nested class loaders) and hence easy to get wrong. Additionally, the runtime has no way to analyze which classes are needed so unfulfilled dependencies will only be discovered at runtime.

It is also not generally possible to fulfill dependencies on different versions of the same library.

Weak Encapsulation Across Packages

Java’s visibility modifiers are great for implementing encapsulation between classes in the same package. But across package boundaries there is only one visibility: public.

Since a class loader folds all loaded packages into one big ball of mud, all public classes are visible to all other classes. There is hence no way to create functionality which is visible throughout a whole JAR but not outside of it.

This makes it very hard to properly modularize a system. If some functionality is required by different parts of a module (e.g. a library or a sub-project of your system) but should not be visible outside of it, the only way to achieve this is to put them all into one package (so package visibility can be used). This effectively removes any structure the code might have had before.
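
A toy illustration (package, class, and method names are mine):

package com.example.library.internal;

// meant to be used only by the library's own packages, but "public"
// makes it just as visible to every other class on the class path
public class InternalHelper {
	public static String implementationDetail() {
		return "anyone can call this";
	}
}
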

Manual Security

An immediate consequence of weak encapsulation across package boundaries is that security relevant functionality will be exposed to all code running in the same environment. This means that malicious code can access critical functionality which may allow it to circumvent security measures.

Since Java 1.1 this was prevented by a hack: java.lang.SecurityManager.checkPackageAccess is invoked on every code path into security relevant code and checks whether the access is allowed. Or more precisely: it should be invoked on every such path. Forgetting these calls led to some of the vulnerabilities which plagued Java in the past.

Startup Performance

It currently takes a while before the Java runtime has loaded all required classes and just-in-time compiled the often used ones.

One reason is that class loading executes a linear scan of all JARs on the class path. Similarly, identifying all occurrences of a specific annotation requires inspecting all classes on the class path.

Rigid Java Runtime

Before Java 8 there was no way to install a subset of the JRE. All Java installations had support for, e.g., XML, SQL and Swing which many use cases do not require at all.

While this may be of little relevance for medium sized computing devices (e.g. desktop PCs or laptops) it is obviously important for the smallest devices like routers, TV-boxes, cars and all the other nooks and crannies where Java is used. With the current trend of containerization it may also gain relevance on servers, where reducing an image’s footprint will reduce costs.

Java 8 brought compact profiles, which define three subsets of Java SE. They alleviate the problem but do not solve it. Compact profiles are fixed and hence unable to cover all current and future needs for partial JREs.

project-jigsaw-2-goals

Published by Riccardo Cuppini under CC-BY-NC-ND 2.0.

Goals Of Project Jigsaw

Project Jigsaw aims to solve the problems discussed above by introducing a language level mechanism to modularize large systems. This mechanism will be used on the JDK itself and is also available to developers to use on their own projects. (More details on the planned features in the next post.)

It is important to note that not all goals are equally important to the JDK and to us developers. Many are more relevant for the JDK and most will not have a huge impact on day to day coding (unlike, e.g., lambda expressions or default methods). They will still change the way big projects are developed and deployed.

Reliable Configuration

The individual modules will declare their dependencies on other modules. The module system will be able to analyze these dependencies at compile-time, build-time and launch-time and can thus fail fast for missing or conflicting dependencies.

Strong Encapsulation

One of the key goals of Project Jigsaw is to enable modules to only export specific packages. All other packages are private to the module.

A class that is private to a module should be private in exactly the same way that a private field is private to a class. In other words, module boundaries should determine not just the visibility of classes and interfaces but also their accessibility.

Mark Reinhold – Project Jigsaw: Bringing the big picture into focus

Dependencies of modules on libraries or other modules can also be kept private. It is hence possible for two modules to use different versions of the same library, each keeping its dependency on that code to itself. The runtime will then keep the versions separate and thus prevent conflicts.

Improved Security And Maintainability

The strong encapsulation of module internal APIs can greatly improve security and maintainability.

It will help with security because critical code is now effectively hidden from code which does not require to use it. It makes maintenance easier as a module’s public API can more easily be kept small.

Casual use of APIs that are internal to Java SE Platform implementations is both a security risk and a maintenance burden. The strong encapsulation provided by the proposed specification will allow components that implement the Java SE Platform to prevent access to their internal APIs.

JSR 376

Improved Performance

With clearer bounds of where code is used, existing optimization techniques can be used more effectively.

Many ahead-of-time, whole-program optimization techniques can be more effective when it is known that a class can refer only to classes in a few other specific components rather than to any class loaded at run time.

JSR 376

It might also be possible to index code with regards to the existing annotations so that such classes can be found without a full class path scan.

Scalable Platform

With the JDK being modularized, users will have the possibility to cherry pick the functionality they need and create their own JRE consisting of only the required modules. This will maintain Java’s position as a key player for small devices as well as for containers.

The proposed specification will allow the Java SE Platform, and its implementations, to be decomposed into a set of components which can be assembled by developers into custom configurations that contain only the functionality actually required by an application.

JSR 376

Reflection

We have seen that Java suffers from some problems with the way classes are loaded, encapsulation in the large and an ever growing, rigid runtime. Project Jigsaw aims to solve this by introducing a modularization mechanism which will be applied to the JDK and will also be available to users.

It promises reliable configuration and strong encapsulation which can make JAR/classpath hell a thing of the past. It can be used to improve security, maintainability and performance. Last but not least, this will allow users to create a Java runtime specific for their own needs.

The next post in this series will discuss the features Project Jigsaw will bring to Java 9. Stay tuned!


If you like what I’m writing about, why don’t you follow me?


Got any questions or comments about this post or Project Jigsaw in general? Feel free to leave a comment or ping me wherever you find me.

The post Motivation And Goals Of Project Jigsaw appeared first on blog@CodeFX.

The Features Project Jigsaw Brings To Java 9


So, Project Jigsaw… We already know quite a bit about it but have not yet seen the details of how it plans to deliver on its promises. This post will do precisely that and present the project’s core concepts and features.

Series

This post is part of an ongoing series about Project Jigsaw. In the recommended order (which is different from their publication order) these are:

The corresponding tag lists more articles about the topic.

Overview

The first part will cover the core concepts of Project Jigsaw, namely the modules. We will then see which features they will have and how they are planned to interact with existing code and tools.

Main sources for this article are the requirements of Project Jigsaw and of JSR 376. While these documents are based on a thorough exploratory phase and are hence very mature, they are still subject to change. Nothing of what follows is set in stone.

The Core Concept

With Project Jigsaw the Java language will be extended to have a concept of modules.

[Modules] are named, self-describing program components consisting of code and data. A module must be able to contain Java classes and interfaces, as organized into packages, and also native code, in the form of dynamically-loadable libraries. A module’s data must be able to contain static resource files and user-editable configuration files.

Java Platform Module System: Requirements (DRAFT 2)

To get a feeling for modules you can think of well-known libraries like each of the Apache Commons (e.g. Collections or IO), Google Guava or (cough) LibFX as a module. Well, depending on how granular their authors want to split them, each might actually consist of several modules.

The same is true for an application. It might be a single monolithic module but it might also be separated into more. I’d say a project’s size and cohesion will be the main determining factors for the number of modules into which it could be carved up. Whether its actual architecture and implementation allows that is another story of course.

The plan is that modules will become a regular tool in a developer’s box to organize her code.

Developers already think about standard kinds of program components such as classes and interfaces in terms of the language. Modules should be just another kind of program component, and like classes and interfaces they should have meaning in all phases of a program’s development.

Mark Reinhold – Project Jigsaw: Bringing the big picture into focus

Modules can then be combined into a variety of configurations in all phases of development, i.e. at compile time, build time, install time or run time. They will be available to Java users like us (in that case sometimes called developer modules) but they will also be used to dissect the Java runtime itself (then often called platform modules).

In fact, this is the current plan for how the JDK will be modularized:

JDK Modularization

Features

So how do modules work? Looking at the planned features will help us get a feeling for them.

Note that even though the following sections will present a lot of features, they are neither discussed in all available detail nor is the list complete. If you’re interested to learn more, you can start by following the bracketed links or check out the complete requirements of Project Jigsaw and of JSR 376 straight away.

Dependency Management

In order to solve JAR/classpath hell one of the core features Project Jigsaw implements is dependency management.

Declaration And Resolution

A module will declare which other modules it requires to compile and run [dependencies]. This will be used by the module system to transitively identify all the modules required to compile or run the initial one [resolution].

It will also be possible to depend not on specific modules but on a set of interfaces. The module system will then try to find modules which implement these interfaces and thus satisfy the dependency [services, binding].

Versioning

There will be support for versioning modules [versioning]. They will be able to indicate their own version (in pretty much any format as long as it is totally ordered) as well as constraints for their dependencies. It will be possible to override both of these pieces of information in any phase. The module system will enforce during each phase that a configuration satisfies all constraints.

Project Jigsaw will not necessarily support multiple versions of a module within a single configuration [multiple versions]. But wait, then how does this solve JAR hell? Good question.

The module system might also not implement version selection. So when I wrote above that “the module system [will] identify all the modules required to compile or run” another module, this was based on the assumption that there is only one version of each. If there are several, an upstream step (e.g. the developer or, more likely, the build tool he uses) must make a selection and the system will only validate that it satisfies all constraints [version-selection].

Encapsulation

All public classes and interfaces in a JAR are automatically available to all other code which was loaded from the same class path. This will be different for modules, where the system will enforce a stronger encapsulation in all phases (regardless of whether a security manager is present or not).

A module will declare specific packages and only the types contained in them will be exported. This means that only they will be visible and accessible to other modules. Even stricter, the types will only be exported to those modules which explicitly depend on the module containing them [export, encapsulation].

To help developers (especially those modularizing the JDK) in keeping exported API surfaces small, an additional publication mechanism will exist. This one will allow a module to specify additional packages to be exported but only to an also specified set of modules. So whereas with the “regular” mechanism the exporting module won’t know (nor care) who accesses the packages, this one will allow it to limit the set of possible dependants [qualified exports].

It will also be possible for a module to re-export the API (or parts thereof) of a module it depends upon. This will allow to split and merge modules without breaking dependencies because the original ones can continue to exist. They will export the exact same packages as before even though they might not contain all the code [refactoring]. In the extreme case so-called aggregator modules could contain no code at all and act as a single abstraction of a set of modules. In fact, the compact profiles from Java 8 will be exactly that.

Different modules will be able to contain packages with the same name, they will even be allowed to export them [export, non-interference].

Oracle will use this opportunity to make all internal APIs unavailable. This will be the biggest impediment for adoption of Java 9 but is definitely setting the right course. First and foremost, it will greatly improve security as critical code is now hidden from attackers. It will also make the JDK considerably more maintainable, which will pay off in the long run.

Configuration, Phases, And Fidelity

As mentioned earlier, modules can be combined into a variety of configurations in all phases of development. This is true for the platform modules, which can be used to create images identical to the full JRE or JDK, the compact profiles introduced in Java 8, or any custom configuration which contains only a specified set of modules (and their transitive dependencies) [JEP 200; Goals]. Likewise, developers can use the mechanism to compose different variants of their own modularized applications.

At compile time, the code being compiled will only see types which are exported by a configured set of modules [compile-time configuration]. At build-time, a new tool (presumably called JLink) will allow the creation of binary run-time images which contain specific modules and their dependencies [build-time configuration]. At launch time, an image can be made to appear as if it only contains a subset of its modules [launch-time configuration].

It will also be possible to replace modules which implement an endorsed standard or a standalone technology with a newer version in each of the phases [upgradeable modules]. This will replace the deprecated endorsed standards override mechanism and the extension mechanism.

All aspects of the module system (like dependency management, encapsulation and so forth) will work in the same manner in all phases unless this is not possible for specific reasons [fidelity].

All module-specific information (like versions, dependencies and package exports) will be expressed in code files, independent of IDEs and build tools.
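
As a rough sketch of what such a file might look like (module and package names are mine, and the declaration syntax was only a strawman at the time of writing):

module com.example.library {
	// fail fast at compile, build, or launch time if this is missing
	requires java.sql;
	// only this package is visible to other modules
	exports com.example.library.api;
}
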

Performance

Whole-Program Optimization Techniques

Within a module system with strong encapsulation it is much easier to automatically reason about all the places where a specific piece of code will be used. This makes certain program analysis and optimization techniques more feasible:

Fast lookup of both JDK and application classes; early bytecode verification; aggressive inlining of, e.g., lambda expressions, and other standard compiler optimizations; construction of JVM-specific memory images that can be loaded more efficiently than class files; ahead-of-time compilation of method bodies to native code; and the removal of unused fields, methods, and classes.

Project Jigsaw: Goals & Requirements (DRAFT 3)

These are labeled whole-program optimization techniques and at least two such techniques will be implemented in Java 9. It will also contain a tool which analyzes a given set of modules and applies these optimizations to create a more performant binary image.

Annotations

Auto discovery of annotated classes (like e.g. Spring allows) currently requires scanning all classes in some specified packages. This is usually done during a program’s start and can slow it down considerably.

Modules will have an API allowing callers to identify all classes with a given annotation. One envisioned approach is to create an index of such classes that will be created when the module is compiled [annotation-detection].

Features Project Jigsaw

Published by droetker0912 under CC-BY-NC-SA 2.0.

Integration With Existing Concepts And Tools

Diagnostic tools (e.g. stack traces) will be upgraded to convey information about modules. Furthermore, modules will be fully integrated into the reflection API, which can be used to manipulate them in the same manner as classes [reflection, debugging and tools]. This will include the version information which can be reflected on and overridden at runtime [version strings in reflective APIs, overridable version information].

The modules’ design will allow build tools to be used for them “with a minimum of fuss” [build tools]. The compiled form of a module will be usable on the class path or as a module so that library developers are not forced to create multiple artifacts for class-path and module-based applications [multi-mode artifacts].

Interoperability with other module systems, most notably OSGi, is also planned [interoperation].

Even though modules can hide packages from other modules it will be possible to test the contained classes and interfaces [white-box testing].

OS-Specific Packaging

The module system is designed with package manager file formats “such as RPM, Debian, and Solaris IPS” in mind. Not only will developers be able to use existing tools to create OS-specific packages from a set of modules. Such modules will also be able to call other modules that were installed with the same mechanism [module packaging].

Developers will also be able to package a set of modules which make up an application into an OS-specific package “which can be installed and invoked by an end user in the manner that is customary for the target system”. Building on the above, only those modules which are not present on the target system have to be packaged [application packaging].

Dynamic Configuration

Running applications will have the possibility to create, run, and release multiple isolated module configurations [dynamic configuration]. These configurations can contain developer and platform modules.

This will be useful for container architectures like IDEs, application servers, or the Java EE platform.

Reflection

We have seen most of the features Project Jigsaw will bring to Java 9. They all revolve around the new core language concept of modules.

Maybe most important in day-to-day programming will be dependency management, encapsulation, and configuration across the different phases. Improved performance is always a nice take-away. And then there is the work invested into cooperation with existing tools and concepts, like reflection, diagnostics, build tools and OS-specific packaging.

Can’t wait to try it out? Neither can I! But we’ll have to wait until JSR 376 has come along further before the early access releases of JDK 9 or JDK 9 with Project Jigsaw actually contain the module system. When they finally do, you’ll read about it here.

The post The Features Project Jigsaw Brings To Java 9 appeared first on blog@CodeFX.

Casting In Java 8 (And Beyond?)


Casting an instance to a type reeks of bad design. Still, there are situations where there is no other choice. The ability to do this has hence been part of Java since day one.

I think Java 8 created a need to slightly improve this ancient technique.

Static Casting

The most common way to cast in Java is as follows:

Object obj; // may be an integer
if (obj instanceof Integer) {
	Integer objAsInt = (Integer) obj;
	// do something with 'objAsInt'
}

This uses the instanceof and cast operators, which are baked into the language. The type to which the instance is cast, in this case Integer, must be statically known at compile time, so let’s call this static casting.

If obj is no Integer, the above test would fail. If we try to cast it anyway, we’d get a ClassCastException. If obj is null, it fails the instanceof test but could be cast because null can be a reference of any type.

Dynamic Casting

A technique I encounter less often uses the methods on Class that correspond to the operators:

Object obj; // may be an integer
if (Integer.class.isInstance(obj)) {
	Integer objAsInt = Integer.class.cast(obj);
	// do something with 'objAsInt'
}

Note that while in this example the class to cast to is also known at compile time, this is not necessarily so:

Object obj; // may be an integer
Class<T> type; // may be Integer.class
if (type.isInstance(obj)) {
	T objAsType = type.cast(obj);
	// do something with 'objAsType'
}

Because the type is unknown at compile time, we’ll call this dynamic casting.

The outcomes of tests and casts for instances of the wrong type and null references are exactly as for static casting.

casting-java-8-and-beyond

Published by vankarsten under CC-BY-NC 2.0.

Casting In Streams And Optionals

The Present

Casting the value of an Optional or the elements of a Stream is a two-step process: First we have to filter out instances of the wrong type, then we can cast to the desired one.

With the methods on Class, we do this with method references. Using the example of Optional:

Optional<?> obj; // may contain an Integer
Optional<Integer> objAsInt = obj
		.filter(Integer.class::isInstance)
		.map(Integer.class::cast);
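
The same two steps applied to a Stream (my analogue of the example above):

Stream<?> stream; // may contain Integers
Stream<Integer> streamOfInts = stream
		.filter(Integer.class::isInstance)
		.map(Integer.class::cast);
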

That we need two steps to do this is no big deal but I feel like it is somewhat awkward and more verbose than necessary.

The Future (Maybe)

I propose to implement casting methods on Class which return an Optional or a Stream. If the passed instance is of the correct type, an Optional or a singleton Stream containing the cast instance would be returned. Otherwise both would be empty.

Implementing these methods is trivial:

public Optional<T> castIntoOptional(Object obj) {
	if (isInstance(obj))
		return Optional.of((T) obj);
	else
		return Optional.empty();
}

public Stream<T> castIntoStream(Object obj) {
	if (isInstance(obj))
		return Stream.of((T) obj);
	else
		return Stream.empty();
}

This lets us use flatMap to filter and cast in one step:

Stream<?> stream; // may contain integers
Stream<Integer> streamOfInts = stream.
		flatMap(Integer.class::castIntoStream);

Instances of the wrong type or null references would fail the instance test and would lead to an empty Optional or Stream. There would never be a ClassCastException.

Costs And Benefits

What is left to be determined is whether these methods would pull their own weight:

  • How much code could actually use them?
  • Will they improve readability for the average developer?
  • Is saving one line worth it?
  • What are the costs to implement and maintain them?

I’d answer these questions with not much, a little, yes, low. So it’s close to a zero-sum game but I am convinced that there is a small but non-negligible benefit.

What do you think? Do you see yourself using these methods?

The post Casting In Java 8 (And Beyond?) appeared first on blog@CodeFX.


All About Project Jigsaw On InfoQ


Over the last few weeks I had the opportunity to polish my posts about Project Jigsaw for InfoQ. The result was published today:

Project Jigsaw is Really Coming in Java 9

Besides refining what I wrote about motivation, goals, the core concept, features and risks, it casts light on the history and current structure, and presents the strawman syntax from JEP 200.

And thanks to Victor Grazi for his excellent editing and perseverance. It improved and tightened the text noticeably.

Next Steps

Working on the article took a lot of time. (Which is why I count this lousy announcement here as one of the three posts I want to publish each month.) But now that I am done with it, the road is clear to experimenting with the module system itself!

This will be possible as soon as the current work on JSR 376 is merged into the OpenJDK repositories, which can happen any day now.

Stay tuned!


The post All About Project Jigsaw On InfoQ appeared first on blog@CodeFX.

Impulse: “Adventures On The Road to Valhalla”


With all this talk about Java 9 and Project Jigsaw we should not lose sight of another big change coming to Java. Hopefully in version 10 or 11 Project Valhalla will come to fruition and introduce value types and specialization.

So what is this about, how far along is the project and what challenges does it face? A couple of days ago Brian Goetz, Java Language Architect at Oracle and project lead for Valhalla, answered these questions in a talk at the JVM Language Summit 2015.

Let’s have a look.

Overview

This post is going to present three out of four parts of Goetz’s talk “Adventures On The Road to Valhalla”.

He begins with a prologue, which I padded with a couple of additional explanations for those who do not yet know about Project Valhalla. Goetz continues to present the two prototypes, of which the first was made publicly available last year and the second only two weeks ago. I will not cover his last part about future experiments as the post is already long enough. If you find this topic interesting, you should definitely watch the whole talk!

All quotes throughout the text are either taken from the slides or verbatim.

The Talk

Here’s the talk:

(Btw, big kudos to the JVMLS team for getting all the talks online within a couple of hours!)

If you can spare the 50 minutes, go watch it! No need to read this post, then.

The Gist

Prologue

The two major topics addressed by Project Valhalla are value types and generic specialization.

Value Types

The former will allow users to define “int-like” types with the same properties (like immutability, equality instead of identity) and the performance advantages emerging from that. They are preceded by Java 8’s value-based classes.

(Unless otherwise noted, when the rest of this post talks about primitives, value types are included.)

Generic Specialization

With everybody declaring their own primitive-ish types, the problems caused by the fact that generics do not work over them (i.e. no ArrayList<int>) become insufferable. While having to box primitives is ok from a conceptual point of view, it has notable performance costs.

First of all, storing objects instead of primitives costs extra memory (e.g. for object headers). Then, and this is worse, boxing destroys cache locality. When the CPU caches an Integer-array, it only gets pointers to the actual values. Fetching those is an additional random memory access. This extra level of indirection costs dearly and potentially cripples parallelization when the CPUs are mostly waiting for cache misses.

So another goal of Project Valhalla is to expand the scope of parametric polymorphism to enable generics over primitives. To be successful the JVM should use primitives instead of boxes for generic fields, arguments and return values in a generic class.

Because of the way it will likely be implemented, this is called generic specialization.

So generics need to play nicely with value types and primitives can come along for the ride.

Current State Of Generics

Due to erasure, type variables are erased to their bound, i.e. ArrayList<Integer> effectively becomes ArrayList<Object> (or rather just ArrayList). Such a bound must be the supertype of all possible instantiations. But Java has no type above primitives and reference types.
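This can be observed directly in today’s Java:

List<Integer> ints = new ArrayList<>();
List<String> strings = new ArrayList<>();
// erasure leaves a single runtime class for all instantiations
System.out.println(ints.getClass() == strings.getClass()); // prints "true"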

Additionally, JVM bytecode instructions are typically orthogonal, i.e. split along the same lines. An aload or astore can only move references. Specialized variants have to be used for primitives, e.g. iload or istore for int. There is no bytecode that can move both a reference and an int.
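To illustrate, here is a trivial method with the instructions javac roughly emits for it in the comments (a sketch; take the exact local variable slots with a grain of salt):

static int copy(Object ref, int prim) {
	Object o = ref; // aload_0 / astore_2 – moves references only
	int i = prim;   // iload_1 / istore_3 – moves ints only
	return i;       // iload_3 / ireturn
}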

So neither the type system nor the bytecode instruction set is up to the task of generifying over primitives. This was well understood when generics were developed over ten years ago and, as a compromise, the decision was to simply not allow it.

Today’s problems come from yesterday’s solutions…

Compatibility!

Everything that happens under Project Valhalla has of course to be backwards compatible. This takes several forms:

Binary Compatibility
Existing bytecode, i.e. compiled class files, must continue to mean the same thing. This ensures that dependencies continue to work without having to be recompiled.
Source Compatibility
Source files must continue to mean exactly the same thing, so recompiling them must not change anything “just because the language has changed”.
Migration Compatibility
Compiled classes from different Java versions must work together to allow migrating one dependency at a time.

An additional requirement is to not make the JVM mimic the Java language in too many details. Doing so would force other JVM languages to deal with semantics of the Java language.

Prototype Model 1: Making It Work

About a year ago Goetz and his colleagues presented the first experimental implementation of specialization.

The Idea

In this prototype the compiler continues to produce erased classfiles but augments them with additional type information.

This information is ignored by the VM but will be used by the specializer, which is a new part of the class loader. The class loader recognizes when a class with a primitive type parameter is required and lets the specializer generate it on the fly from the erased but augmented classfile.

With erasure, all generic instantiations of a class use the same classfile. In contrast, creating a new classfile for each primitive type is called specialization.

The Details

In this prototype specialized classes are described with a “name-mangling technique”. The class name is appended with a string that denotes which type argument is specialized to which primitive. E.g. ArrayList${0=I} means “ArrayList instantiated with first type variable int”.
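To make this concrete, here is what using the prototype might look like (hypothetical code in the prototype’s experimental syntax; details may have changed since the talk):

// the source the developer writes:
ArrayList<int> ints = new ArrayList<>();
ints.add(42); // no boxing

// what the specializer loads behind the scenes:
// ArrayList${0=I}, generated on the fly from the
// erased but augmented classfile of ArrayList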

During specialization the signatures and the bytecode have to be changed. To do this correctly the specializer needs to know which of the occurrences of Object (to which all generic types were erased) have to be specialized to which type. The required signature information was already mostly present in the classfile and the prototype annotates the bytecode with the additional type metadata.

From 8:44 on Goetz gives a couple of examples of how this plays out. He also uses them to point to some of the details that such an implementation would have to be aware of, like the topic of generic methods.

I know that was a lot of fast hand-waving. The point is, this is straight-forward enough but there is lots of fiddly little bits of complexity.

The Summary

This experiment shows that on-the-fly specialization based on classfile metadata works without changes to the VM. These are important achievements but there are prohibitive disadvantages.

First, it requires the implementation of a complicated set of details.

Second and maybe most importantly, it has problematic type system characteristics. Without changes to the VM there is still no common supertype of int and String and hence no common supertype of ArrayList<int> and ArrayList<String>. This means there is no way to declare “any instantiation of ArrayList”.

Third, this has terrible code sharing properties. Even though much of the code of ArrayList<int> and ArrayList<String> is identical, it would be duplicated in ArrayList${0=I} and ArrayList.

Death by 1000 cuts.

Prototype Model 2: Rescuing Wildcards

The second, and very new prototype addresses the problematic type system characteristics.

The Problem

Currently, unbounded wildcards express “any instantiation of a class”, e.g. ArrayList<?> means “any ArrayList”. They are heavily used, especially by library developers. In a system where ArrayList<int> and ArrayList<String> are different classes, wildcards may be even more important as they bridge the gap between them “and express the basic ArrayList-ness”.

But if we assume ArrayList<?> were a supertype to ArrayList<int>, we’d end up in situations where we require multiple inheritance of classes. The reason is that ArrayList<T> extends AbstractList<T> so we’d also want ArrayList<int> to extend AbstractList<int>. Now ArrayList<int> would extend both ArrayList<?> and AbstractList<int> (which have no inheritance relationship).

[Figure: the multiple inheritance problem – ArrayList<int> would have to extend both ArrayList<?> and AbstractList<int>]

(Note the difference to the current generics with erasure. In the VM, ArrayList<Integer> and ArrayList<?> are the same class ArrayList, which is free to extend AbstractList.)

The root cause is that while ArrayList<?> might look like it means “any ArrayList” it actually means ArrayList<? extends Object>, i.e. “any ArrayList over reference types”.

The Idea

The prototype introduces a new hierarchy of wildcards with ref, val, and any:

  • ref comprises all reference types and replaces ?
  • val comprises all primitives and value types (this is not currently supported by the prototype and not mentioned in the talk but was announced on the Valhalla mailing list)
  • any contains both ref and val

The multiple inheritance of specialized classes will be solved by representing the any-types with synthetic interfaces. ArrayList<int> will thus extend AbstractList<int> and implement ArrayList<any>.
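A declaration and use site might look like this (a sketch based on the talk and the Valhalla mailing list; the syntax is experimental and may change):

class Box<any T> {
	private T value;

	public void set(T value) { this.value = value; }
	public T get() { return value; }
}

Box<int> ints = new Box<>();       // specialized class – the field is a real int
Box<String> strings = new Box<>(); // erased reference instantiation
Box<any> some = ints;              // works via the synthetic interface Box$any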

The Details

Hierarchy

ArrayList<ref>, which is ArrayList<?>, will continue to be the erased type.

To represent ArrayList<any> the compiler will create an interface ArrayList$any. It will be implemented by all classes generated from ArrayList (e.g. ArrayList<int> and the erased ArrayList) and will extend all the synthetic interfaces that correspond to the superclasses, e.g. AbstractList$any for AbstractList<any>.

[Figure: the synthetic interface ArrayList$any, implemented by all classes generated from ArrayList]

The interface will contain declarations for all of the class’s methods and accessors for its fields. Because there is still no common supertype to objects and primitives, their generic parameter and return types would have to be boxed.

But this detour would only have to be taken if the class is accessed as ArrayList<any> whereas the access is direct for, e.g., ArrayList<int>. So the performance cost of boxing is only borne by those developers using wildcards, while code using primitive specializations directly gets the improved performance it expects.

It works pretty cleanly.

You shouldn’t believe me, it gets complicated. But it’s a good story. We’ll keep going.

From 26:33 on Goetz starts giving examples to explain some details.

Accessibility

Accessibility is an area where the VM needs to change. Up to now, interfaces can not have private or package visible methods. (In Java 9 private default methods will be possible but that doesn’t help here because they need to have an implementation.)

A connected but much older problem is that an outer class and its inner classes can access each other’s private members even though the VM does not allow that because to it these are all unrelated classes. This is currently solved by generating bridge methods, i.e. methods with a higher visibility that will then be called instead of the inaccessible private members.

Creating even more bridge methods for specialized classes would be possible but unwieldy. Instead a possible change is to create the notion of a nest of classes. It would contain all specialized and inner classes and the VM would allow access to private members inside a nest.

This would align the interpretation of the language, which sees a class with all its specializations and inner classes as one unit, and of the VM, which up to now only sees a bunch of unrelated classes.

Arrays

Generic methods might also take or return arrays. But while specialization can box an int to an Object, an int[] is no Object[] and boxing each individual int is a terrible idea.

Arrays 2.0 might come to the rescue here. Because the discussion requires a basic familiarity with the proposal I will not go into details. In summary, it looks like they will solve the problem.

The Summary

The changes to the language are conceptually simple. In the absence of any nothing changes. Type variables can be decorated with any and if such an instance needs to be assigned to a wildcarded type, the wildcard has to use any as well.

With the common supertype to generic classes across primitive and reference types, e.g. ArrayList<any>, the resulting programming model is way more reasonable. Talking about his team’s experience with porting the Stream API to this prototype, Goetz says:

It’s just really smooth. It’s exactly what you want. About 70% of the code just evaporates because all of the hand-specialized primitive stuff just goes away and then a lot of the complex machinery to support the hand-specialization, that goes away, and it becomes this simple library a third year student could write. So we consider that a pretty successful experiment.

There is also excellent compatibility with existing code.

Unfortunately, the bad code sharing properties of the first prototype remain. ArrayList<int> and ArrayList<String> are still different classes that are very similar but share no code. The next part, which I will not cover in this post, addresses that and presents possible approaches to solving this problem.

Reflection

The talk is very dense and covers a lot of ground. We have seen that the introduction of value types and desired performance improvements require generic specialization so boxing can be reduced or even prevented.

The first prototype achieves this without JVM changes by specializing classes when they are loaded. But it has the problem that there is no common supertype to all instantiations of a class because primitive and reference type parameters yield entirely unrelated classes. The second prototype introduces the wildcards ref, val, and any and uses synthetic interfaces to denote any-types.

This is all very exciting and I can’t wait to try it out! Unfortunately, I’m going on a holiday so I can’t for a while. Stupid real life… Don’t wreck things while I’m gone!

The post Impulse: “Adventures On The Road to Valhalla” appeared first on blog@CodeFX.

Java 8 SE Optional, a strict approach


About two weeks ago Stephen Colebourne presented his pragmatic approach to using Optional. If you read it, you might have guessed from my previous recommendations that I don’t agree.

Overview

I have to start with a disclaimer but then I’ll jump right in and explain why I think his approach is less than ideal.

All quotes that are not attributed to somebody else are taken from Stephen’s post. While not strictly necessary, I recommend reading it first. But don’t forget to come back!

I created three gists, which I present throughout the post: the same example in Stephen’s version, my basic version, and my extended version.

Disclaimer

Stephen Colebourne is a Java legend. Quoting Markus Eisele’s Heroes of Java post about him:

Stephen Colebourne is a Member of Technical Staff at OpenGamma. He is widely known for his work in open source and his blog. He created Joda-Time which is now being further developed as JSR-310/ThreeTen. He contributes to debates on the future of Java, including proposals for the diamond operator for generics and FCM closures, both of which are close to the adopted changes in Java 7 and 8. Stephen is a frequent conference speaker, JavaOne Rock Star and Java Champion.

I had the pleasure to contribute to Stephen’s Property Alliance and this reinforced my opinion of him as an extremely competent developer and a very deliberate person.

All of which goes to say that if in doubt, trust him over me.

Then there is the fact that his approach is rooted in the axiom that Optional should solely be used as a return type. This is absolutely in line with the recommendations of those who introduced the class in the first place. Quoting Brian Goetz:

Of course, people will do what they want. But we did have a clear intention when adding this feature, and it was not to be a general purpose Maybe or Some type, as much as many people would have liked us to do so. Our intention was to provide a limited mechanism for library method return types where there needed to be a clear way to represent “no result”, and using null for such was overwhelmingly likely to cause errors.

[…] You should almost never use it as a field of something or a method parameter.

So if in doubt, trust his opinion over mine.


Published by JD Hancock under CC-BY 2.0.

Juxtaposition

Of course, even better than to just trust anyone is to make up your own mind. So here are my arguments in contrast to Stephen’s.

Basic Points

These are Stephen’s five basic points:

  1. Do not declare any instance variable of type Optional.
  2. Use null to indicate optional data within the private scope of a class.
  3. Use Optional for getters that access the optional field.
  4. Do not use Optional in setters or constructors.
  5. Use Optional as a return type for any other business logic methods that have an optional result.

Here are mine:

  1. Design your code to avoid optionality wherever feasibly possible.
  2. In all remaining cases, prefer Optional over null.

Examples

Let’s compare examples. His is:
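(The embedded gist did not survive, so here is a sketch along the lines of Stephen’s example – an address with an optional postcode; the names are mine:)

public class Address {

	private final String addressLine; // never null
	private final String city;        // never null
	private final String postcode;    // optional, may be null

	public Address(String addressLine, String city, String postcode) {
		this.addressLine = Objects.requireNonNull(addressLine);
		this.city = Objects.requireNonNull(city);
		this.postcode = postcode;
	}

	// ... getters for addressLine and city ...

	// Optional is only created on the way out
	public Optional<String> getPostcode() {
		return Optional.ofNullable(postcode);
	}
}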

I like that no consumer of this class can receive null. I dislike how you still have to deal with it – within the class but also without.

This would be my (basic) version:
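(Again a sketch in place of the lost gist:)

public class Address {

	private final String addressLine;        // never null
	private final String city;               // never null
	private final Optional<String> postcode; // never null either

	public Address(String addressLine, String city, Optional<String> postcode) {
		this.addressLine = Objects.requireNonNull(addressLine);
		this.city = Objects.requireNonNull(city);
		this.postcode = Objects.requireNonNull(postcode);
	}

	// ... getters for addressLine and city ...

	public Optional<String> getPostcode() {
		return postcode;
	}
}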

There are simply no nulls here.

Differences

A Constrained Problem

Within the object, the developer is still forced to think about null and manage it using != null checks. This is reasonable, as the problem of null is constrained. The code will all be written and tested as a unit (you do write tests don’t you?), so nulls will not cause many issues.

Do you see how his constructor allows one of the arguments to be null? And the only way to find out which one requires you to leave what you are doing and look at some other class’ code. This is no big thing but unnecessary nonetheless.

Even leaving this aside, the problem is not as constrained as it should be. Assuming that everybody hates comments, we have to assume they are not there, which leaves the constructor internals and the getter’s return type to tell you that the field is nullable. Not the best places for this information to jump out at you.

His argument for tests might get crushed by numbers. If all tests include all fields, each optional field would double the number of tests as each should be run for the null and the non-null case. I’d prefer having the type system as a first line of defense here.

On the other hand, this pain might convince the developer to maybe find a solution with less optionality within a single class.

Performance

Stephen correctly points out that an instance created for a method return value that is then quickly discarded (which is typical for uses of Optional) has little to no costs. Unlike an Optional field, which exists for the entire lifetime of the containing object and adds an additional layer of indirection from that object to the Optional’s payload.

For him this is a reason to prefer null.

While it is easy to claim this is “premature optimization”, as engineers it is our responsibility to know the limits and capabilities of the system we work with and to choose carefully the point where it should be stressed.

I agree. But to me part of choosing carefully means to profile first. And if someone shows me convincing arguments that in his concrete case replacing some Optional fields with nullable fields causes a noticeable performance gain, I’d rip them stupid boxes right out. But in all other cases I stick with the code I consider more maintainable.

By the way, the same argument could be made for using arrays instead of ArrayLists or char-arrays instead of strings. I’m sure nobody would follow that advice without considerable performance gains.

This recurring topic in the discussion deserves some attention, though. I will try to find some time to profile some use cases that I think would be interesting.

Serializability

While it is a minor point, it should be noted that the class could be Serializable, something that is not possible if any field is Optional (as Optional does not implement Serializable).

I consider this to be solved. Causes a little extra work, though.

Convenience

[I]t is my experience that having Optional on a setter or constructor is annoying for the caller, as they typically have the actual object. Forcing the caller to wrap the parameter in Optional is an annoyance I’d prefer not to inflict on users. (ie. convenience trumps strictness on input)

While writing annoying code can be fun I see his point. So don’t force users, overload your methods:
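(Another lost gist; the overloads could look like this:)

public Address(String addressLine, String city) {
	this(addressLine, city, Optional.empty());
}

public Address(String addressLine, String city, String postcode) {
	this(addressLine, city, Optional.of(postcode));
}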

Of course this doesn’t scale well with many optional fields. In that case, the builder pattern will help.

Then there is the fact that if our nullable postcode has a setter, the developer working on some other code must again stop and come looking at this class to determine whether she can pass null. And since she can never be sure, she has to check for other getters, too. Talking about annoying code…

With a field of type Optional the setter could look like this:
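(A sketch, assuming the field is not final in this variant:)

public void setPostcode(Optional<String> postcode) {
	this.postcode = Objects.requireNonNull(postcode);
}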

Again, all null values are immediately answered with an exception.

Beans

On the downside, this approach results in objects that are not beans.

Yep. Having a field of type Optional doesn’t suffer from that.

Commonalities

It should not be overlooked that we’re discussing details here. Our goal is the same and we’re proposing similar ways of getting there.

If adopted widely in an application, the problem of null tends to disappear without a big fight. Since each domain object refuses to return null, the application tends to never have null passed about. In my experience, adopting this approach tends to result in code where null is never used outside the private scope of a class. And importantly, this happens naturally, without it being a painful transition. Over time, you start to write less defensive code, because you are more confident that no variable will actually contain null.

This is a great goal to achieve! And following Stephen’s advice will get you most of the way there. So don’t take my disagreement as a reason to not use Optional at least that much.

All I’m saying is that I see little reason to stop short of banning null even more!

Reflection

I addressed and hopefully refuted a number of arguments against using Optional whenever something is nullable. I hope to have shown that my stricter approach goes further in exorcising null. This should free up your mind to think about more relevant problems.

The price to pay might be a shred of performance. If someone proves that it is more, we can still return to null for those specific cases. Or throw hardware at the problem. Or wait for value types.

What do you think?

The post Java 8 SE Optional, a strict approach appeared first on blog@CodeFX.

Stream Performance


When I read Angelika Langer’s Java performance tutorial – How fast are the Java 8 streams? – I couldn’t believe that for a specific operation they took about 15 times longer than for loops. Could stream performance really be that bad? I had to find out!

Coincidentally, I recently watched a cool talk about microbenchmarking Java code and I decided to put to work what I learned there. So let’s see whether streams really are that slow.

Overview

As usual I will start with a dull prologue. This one will explain why you should be very careful with what I present here, how I produced the numbers, and how you can easily repeat and tweak the benchmark. If you don’t care about any of this, jump right to Stream Performance.

But first, two quick pointers: All benchmark code is up on GitHub and this Google spreadsheet contains the resulting data.

Prologue

Disclaimer

This post contains a lot of numbers and numbers are deceitful. They seem all scientific and precise and stuff, and they lure us into focusing on their interrelation and interpretation. But we should always pay equal attention to how they came to be!

The numbers I’ll present below were produced on my system with very specific test cases. It is easy to over-generalize them! I should also add that I have only two days’ worth of experience with non-trivial benchmarking techniques (i.e. ones that are not based on looping and manual System.currentTimeMillis()).

Be very careful with incorporating the insights you gained here into your mental performance model. The devil hiding in the details is the JVM itself and it is a deceitful beast. It is entirely possible that my benchmarks fell victim to optimizations that skewed the numbers.

System

  • CPU: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
  • RAM: Samsung DDR3 16GB @ 1.60GHz (the tests ran entirely in RAM)
  • OS: Ubuntu 15.04. Kernel version 3.19.0-26-generic
  • Java: 1.8.0_60
  • JMH: 1.10.5

Benchmark

JMH

The benchmarks were created using the wonderful Java Microbenchmark Harness (JMH), which is developed and used by the JVM performance team itself. It’s thoroughly documented, easy to set up and use, and the explanation via samples is awesome!

If you prefer a casual introduction, you might like Aleksey Shipilev’s talk from Devoxx UK 2013.

Setup

To create somewhat reliable results, benchmarks are run individually and repeatedly. There is a separate run for each benchmark method that is made up of several forks, each running a number of warmup iterations before the actual measurement iterations.

I ran separate benchmarks with 50’000, 500’000, 5’000’000, 10’000’000, and 50’000’000 elements. Except for the last one, all had two forks, both consisting of five warmup and five measurement iterations, where each iteration was three seconds long. Parts of the last one were run in one fork, with two warmup and three measurement iterations, each 30 seconds long.
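In JMH terms that setup corresponds to annotations roughly like the following (a sketch – the exact values live in the repository’s AbstractIterationBenchmark):

@Fork(2)
@Warmup(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public abstract class AbstractIterationBenchmark { /* ... */ }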

Langer’s article states that their arrays are populated with random integers. I compared this to the more pleasant case where each int in the array equals its position therein. The deviation between the two scenarios averaged 1.2% with the largest difference being 5.4%.

Since creating millions of randomized integers takes considerable time, I opted to execute the majority of the benchmarks on the ordered sequences only, so unless otherwise noted numbers pertain to this scenario.

Code

The benchmark code itself is available on GitHub. To run it, simply go to the command line, build the project, and execute the resulting jar:

mvn clean install
java -jar target/benchmarks.jar

Some easy tweaks:

  • adding a regular expression at the end of the execution call will only benchmark methods whose fully-qualified name matches that expression; e.g. to only run ControlStructuresBenchmark: java -jar target/benchmarks.jar Control
  • the annotations on AbstractIterationBenchmark govern how often and how long each benchmark is executed
  • the constant NUMBER_OF_ELEMENTS defines the length of the array/list that is being iterated over
  • tweak CREATE_ELEMENTS_RANDOMLY to switch between an array of ordered or of random numbers


Published by Bart under CC-BY-NC-ND 2.0.

Stream Performance

Repeating The Experiment

Let’s start with the case that triggered me to write this post: Finding the maximum value in an array of 500’000 random elements.

int m = Integer.MIN_VALUE;
for (int i = 0; i < intArray.length; i++)
	if (intArray[i] > m)
		m = intArray[i];

First thing I noticed: My laptop performs much better than the machine used for the JAX article. This was to be expected as it was described as “outdated hardware (dual core, no dynamic overclocking)” but it made me happy nevertheless since I paid enough for the damn thing. Instead of 0.36 ms it only took 0.130 ms to loop through the array.

More interesting are the results for using a stream to find the maximum:

// article uses 'reduce' to which 'max' delegates
Arrays.stream(intArray).max();

Langer reports a runtime of 5.35 ms for this, which compared to the loop’s 0.36 ms yields the reported slowdown by x15. I consistently measured about 0.560 ms, so I end up with a slowdown of “only” x4.5. Still a lot, though.

Next, the article compares iterating over lists against streaming them.

// for better comparability with looping over the array
// I do not use a "for each" loop (unlike Langer's article);
// measurements show that this makes things a little faster
int m = Integer.MIN_VALUE;
for (int i = 0; i < intList.size(); i++)
	if (intList.get(i) > m)
		m = intList.get(i);

intList.stream().max(Math::max);

The results are 6.55 ms for the for loop and 8.33 ms for the stream. My measurements are 0.700 ms and 3.272 ms. While this changes their relative performance considerably, it creates the same order:

                  Angelika Langer        Me
operation         time (ms)   slower     time (ms)   slower
array_max_for     0.36                   0.123
array_max_stream  5.35        14’861%    0.599       487%
list_max_for      6.55        22%        0.700       17%
list_max_stream   8.33        27%        3.272       467%

I ascribe the marked difference between iterations over arrays and lists to boxing; or rather to the resulting indirection. The primitive array is packed with the values we need but the list is backed by an array of Integers, i.e. references to the desired values which we must first resolve.

The considerable difference between Langer’s and my series of relative changes (+14’861% +22% +27% vs +487% +17% +467%) underlines her statement that “the performance model of streams is not a trivial one”.

Bringing this part to a close, her article makes the following observation:

We just compare two integers, which after JIT compilation is barely more than one assembly instruction. For this reason, our benchmarks illustrate the cost of element access – which need not necessarily be a typical situation. The performance figures change substantially if the functionality applied to each element in the sequence is cpu intensive. You will find that there is no measurable difference any more between for-loop and sequential stream if the functionality is heavily cpu bound.

So let’s have a look at something else than just integer comparison.

Comparing Operations

I compared the following operations:

max
Finding the maximum value.
sum
Computing the sum of all values; aggregated in an
int
ignoring overflows.
arithmetic
To model a less simple numeric operation I combined the values with a a handful of bit shifts and multiplications.
string
To model a complex operation that creates new objects I converted the elements to strings and xor’ed them character by character.
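The exact implementations are in the repository; hypothetical sketches of the two more complex operations could look like this:

// a handful of bit shifts and multiplications (sketch)
int arithmeticOperation(int result, int next) {
	return 3 * (result << 1) + (next >> 2);
}

// convert to a string and xor character by character (sketch)
int stringOperation(int result, int next) {
	for (char c : String.valueOf(next).toCharArray())
		result ^= c;
	return result;
}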

These were the results (for 500’000 ordered elements; in milliseconds):

        max              sum              arithmetic       string
        array   list     array   list     array   list     array    list
for     0.123   0.700    0.186   0.714    4.405   4.099    49.533   49.943
stream  0.559   3.272    1.394   3.584    4.100   7.776    52.236   64.989

This underlines how cheap comparison really is, even addition takes a whopping 50% longer. We can also see how more complex operations bring looping and streaming closer together. The difference drops from almost 400% to 25%. Similarly, the difference between arrays and lists is reduced considerably. Apparently the arithmetic and string operations are CPU bound so that resolving the references had no negative impact.

(Don’t ask me why for the arithmetic operation streaming the array’s elements is faster than looping over them. I have been banging my head against that wall for a while.)

So let’s fix the operation and have a look at the iteration mechanism.

Comparing Iteration Mechanisms

There are at least two important variables in assessing the performance of an iteration mechanism: its overhead and whether it causes boxing, which will hurt performance for memory bound operations. I decided to try and bypass boxing by executing a CPU bound operation. As we have seen above, the arithmetic operation fulfills this on my machine.

Iteration was implemented with straightforward for and for-each loops. For streams I made some additional experiments:

@Benchmark
public int array_stream() {
	// implicitly unboxed
	return Arrays
			.stream(intArray)
			.reduce(0, this::arithmeticOperation);
}

@Benchmark
public int array_stream_boxed() {
	// explicitly boxed
	return Arrays
			.stream(intArray)
			.boxed()
			.reduce(0, this::arithmeticOperation);
}

@Benchmark
public int list_stream_unbox() {
	// naively unboxed
	return intList
			.stream()
			.mapToInt(Integer::intValue)
			.reduce(0, this::arithmeticOperation);
}

@Benchmark
public int list_stream() {
	// implicitly boxed
	return intList
			.stream()
			.reduce(0, this::arithmeticOperation);
}

Here, boxing and unboxing does not relate to how the data is stored (it’s unboxed in the array and boxed in the list) but how the values are processed by the stream.

Note that boxed converts the IntStream, a specialized implementation of Stream that only deals with primitive ints, to a Stream<Integer>, a stream over objects. This should have a negative impact on performance but the extent depends on how well escape analysis works.

Since the list is generic (i.e. no specialized IntArrayList), it returns a Stream<Integer>. The last benchmark method calls mapToInt, which returns an IntStream. This is a naive attempt to unbox the stream elements.

                  arithmetic
                  array    list
for               4.405    4.099
forEach           4.434    4.707
stream (unboxed)  4.100    4.518
stream (boxed)    7.694    7.776

Well, look at that! Apparently the naive unboxing does work (in this case). I have some vague notions why that might be the case but nothing I am able to express succinctly (or correctly). Ideas, anyone?

(Btw, all this talk about boxing/unboxing and specialized implementations makes me ever more happy that Project Valhalla is advancing so well.)

The more concrete consequence of these tests is that for CPU bound operations, streaming seems to have no considerable performance costs. After fearing a sizeable disadvantage, this is good to hear.

Comparing Number Of Elements

In general the results are pretty stable across runs with a varying sequence length (from 50’000 to 50’000’000). For this comparison I examined the normalized performance per 1’000’000 elements across those runs.

But I was pretty astonished that performance does not automatically improve with longer sequences. My simple mind assumed that this would give the JVM the opportunity to apply more optimizations. Instead there are some notable cases where performance actually dropped:

From 500’000 to 50’000’000 Elements
method         time
array_max_for  + 44.3%
array_sum_for  + 13.4%
list_max_for   + 12.8%

Interesting that these are the simplest iteration mechanisms and operations.

Winners are more complex iteration mechanisms over simple operations:

From 500’000 to 50’000’000 Elements
method            time
array_sum_stream  – 84.9%
list_max_stream   – 13.5%
list_sum_stream   – 7.0%

This means that the table we have seen above for 500’000 elements looks a little different for 50’000’000 (normalized to 1’000’000 elements; in milliseconds):

                      max              sum              arithmetic       string
                      array   list     array   list     array   list     array     list
500’000 elements
for                   0.246   1.400    0.372   1.428    8.810   8.199    99.066    98.650
stream                1.118   6.544    2.788   7.168    8.200   15.552   104.472   129.978
50’000’000 elements
for                   0.355   1.579    0.422   1.522    8.884   8.313    93.949    97.900
stream                1.203   3.954    0.421   6.710    8.408   15.723   96.550    117.690

We can see that there is almost no change for the arithmetic and string operations. But things change for the simpler max and sum operations, where more elements brought the field closer together.

Reflection

All in all I’d say that there were no big revelations. We have seen that palpable differences between loops and streams exist only with the simplest operations. It was a bit surprising, though, that the gap is closing when we come into the millions of elements. So there is little need to fear a considerable slowdown when using streams.

There are still some open questions, though. The most notable: What about parallel streams? Then I am curious to find out at which operation complexity I can see the change from iteration dependent (like sum and max) to iteration independent (like arithmetic) performance. I also wonder about the impact of hardware. Sure, it will change the numbers, but will there be qualitative differences as well?

Another takeaway for me is that microbenchmarking is not so hard. Or so I think until someone points out all my errors…

The post Stream Performance appeared first on blog@CodeFX.

Stream Performance – Your Ideas


Last week I presented some benchmark results regarding the performance of streams in Java 8. You guys and gals were interested enough to leave some ideas what else could be profiled.

So that’s what I did and here are the results.

Overview

The last post’s prologue applies here as well. Read it to find out why all numbers lie, how I came up with them, and how you can reproduce them.

I added a new class CommentOperationsBenchmark to the code on GitHub that includes precisely the benchmarks discussed in this post. I also updated the Google spreadsheet to include the new numbers.

Impact Of Comparisons

Nice. Been saying for a long time writing java to being Ansi C like is faster (arrays not lists).

The next step down the rabbit hole is…

try { for(int i = 0;;) do stuff; } catch (Exception ex) { blah blah; }

Don’t check for the loop at all and just catch the exception, nice for HD pixel processing.

Chaoslab

WAT? People are doing that?

public int array_max_forWithException() {
	int m = Integer.MIN_VALUE;
	try {
		for (int i = 0; ; i++)
			if (intArray[i] > m)
				m = intArray[i];
	} catch (ArrayIndexOutOfBoundsException ex) {
		return m;
	}
}

Maybe they should stop because it looks like it doesn’t improve performance:

runtime in ms normalized to 1’000’000 elements
                            50’000   500’000   1’000’000   5’000’000   10’000’000   50’000’000
array_max_for               0.261    0.261     0.277       0.362       0.347        0.380
array_max_forWithException  0.265    0.265     0.273       0.358       0.347        0.386

Looks like the mechanism used to break the loop has no measurable impact. This makes sense as loop unrolling can avoid most of the comparisons and the cost of throwing an exception is in the area of a handful of microseconds and thus orders of magnitude smaller than what happens here.

And this assumes that the compiler does not have even more tricks up its sleeve. Maybe it understands loops on a much more profound level and JIT compiles both methods to the same instructions.

On a side note: See how array_max_forWithException does not have a return statement after the loop?

Turns out that the Java compiler recognizes simple infinite loops. Wow! So it knows that every code path with a finite computation returns and doesn’t care about the infinite ones.

Boiled down, this compiles:

public int infiniteLoop() {
	for(;;);
}

You never cease to learn…

Impact Of Assignments

[F]or the “max” tests I expect there’s some drag from updating the local variable on every iteration. I’m curious whether finding the minimum value runs in a comparable amount of time.

b0b0b0b

This refers to the fact that all tests were run on arrays or lists whose elements equaled the index within the structure, i.e. [0, 1, 2, ..., n-1]. So finding the maximum indeed requires n assignments.

What about finding the minimum instead, which only takes one assignment?

runtime in ms normalized to 1’000’000 elements
               50’000   500’000   1’000’000   5’000’000   10’000’000   50’000’000
array_max_for  0.261    0.261     0.277       0.362       0.347        0.380
array_min_for  0.264    0.260     0.280       0.353       0.348        0.359

Nope, no difference. My guess is that due to pipelining, the assignment is effectively free.


Published by Khalid Albaih under CC-BY 2.0 – field of view changed by me.

Impact Of Boxing

There were two comments regarding boxing.

It would also be nice to see the Integer[] implementation, to confirm the suspicion about boxing.

ickysticky

Ok, let’s do that. The following numbers show a for loop and a for-each loop over an int[], an Integer[], and a List<Integer>:

runtime in ms normalized to 1’000’000 elements
                        50’000   500’000   1’000’000   5’000’000   10’000’000   50’000’000
array_max_for           0.261    0.261     0.277       0.362       0.347        0.380
array_max_forEach       0.269    0.262     0.271       0.349       0.349        0.356
boxedArray_max_for      0.804    1.180     1.355       1.387       1.306        1.476
boxedArray_max_forEach  0.805    1.195     1.338       1.405       1.292        1.421
list_max_for            0.921    1.306     1.436       1.644       1.509        1.604
list_max_forEach        1.042    1.472     1.579       1.704       1.561        1.629

We can see clearly that the dominating indicator for the runtime is whether the data structure contains primitives or Objects. But wrapping the Integer array into a list causes an additional slowdown.

Yann Le Tallec also commented on boxing:

intList.stream().max(Math::max); incurs more unboxing than is necessary.
intList.stream().mapToInt(x -> x).max(); is about twice as fast and close to the array version.

Yann Le Tallec

This claim is in line with what we deduced in the last post: Unboxing a stream as soon as possible may improve performance.

Just to check again:

runtime in ms normalized to 1’000’000 elements (error in %)
                             50’000       500’000      1’000’000    5’000’000    10’000’000   50’000’000
boxedArray_max_stream        4.231 (43%)  5.715 (3%)   5.004 (27%)  5.461 (53%)  5.307 (56%)  5.507 (54%)
boxedArray_max_stream_unbox  3.367 (<1%)  3.515 (<1%)  3.548 (2%)   3.632 (1%)   3.547 (1%)   3.600 (2%)
list_max_stream              7.230 (7%)   6.492 (<1%)  5.595 (36%)  5.619 (48%)  5.852 (45%)  5.631 (51%)
list_max_stream_unbox        3.370 (<1%)  3.515 (1%)   3.527 (<1%)  3.668 (3%)   3.807 (2%)   3.702 (5%)

This seems to verify the claim. But the results look very suspicious because the errors are huge. Running these benchmarks over and over with different settings revealed a pattern:

  • Two performance levels exist, one at ~3.8 ns/op and one at ~7.5 ns/op.
  • Unboxed streams exclusively perform at the better one.
  • Individual iterations of boxed streams usually run on either of these two levels but rarely clock in at another time.
  • Most often the behavior only changes from fork to fork (i.e. from one set of iterations to the next).

This all smells suspiciously of problems with my test setup. It would be very interesting to hear from someone with any idea what is going on.

Update
Yann indeed had an idea and pointed to this interesting question and great answer on StackOverflow. Now my best guess is that boxed streams can perform on the level of unboxed ones but might fall prey to accidental deoptimizations.

Impact Of Hardware

Redditor robi2106 ran the suite for 500’000 elements on his “i5-4310 @2Ghz w 8GB DDR2”. I added the results to the spreadsheet.

It’s hard to draw conclusions from the data. Robi noted “I didn’t stop using my system for these 2.5hrs either”, which might explain the massive error bounds. At the median they are 23 times, and on average 168 times, larger than mine. (On the other hand, I continued to use my system as well but with pretty low load.)

If you squint hard enough, you could deduce that the i5-4310 is slightly faster on simple computations but lags behind on more complex ones. Parallel performance is generally as you would expect considering that the i7-4800 has twice as many cores.

Impact of Language

It would be interesting how this compares to Scala (with @specialized).

cryptos6

I still didn’t try Scala and don’t feel like working my way into it for a single benchmark. Maybe someone more experienced or less squeamish can give it a try?

Reflection

When interpreting these numbers, remember that the iterations executed an extremely cheap operation. Last time we found out that already simple arithmetic operations cause enough CPU load to almost completely offset the difference in iteration mechanisms. So, as usual, don’t optimize prematurely!

All in all I’d say: No new discoveries. But I enjoyed playing around with your ideas and if you have more, leave a comment. Or even better, try it out yourself and post the results.

The post Stream Performance – Your Ideas appeared first on blog@CodeFX.
