
JAR Hell


What is JAR hell? (Or is it classpath hell? Or dependency hell?) And which aspects are still relevant when considering modern development tools like Maven or OSGi?

Interestingly enough there seems to be no structured answer to these questions – a web search turns up no promising headlines, even on the second page of results. This post is supposed to fill that gap.

Overview

We’ll start with a list of problems that make up JAR hell, momentarily ignoring build tools and component systems. We will come back to them for the second part when we assess the current state of affairs.

JAR Hell

JAR Hell is an endearing term referring to the problems that arise from the characteristics of Java’s class loading mechanism. Some of them build on one another; others are independent.

Unexpressed Dependencies

A JAR cannot express which other JARs it depends on in a way that the JVM will understand. An external entity is required to identify and fulfill the dependencies. Developers would have to do this manually by reading the documentation, finding the correct projects, downloading the JARs and adding them to the project. Optional dependencies, where a JAR might only require another JAR if the developer wants to use certain features, further complicate the process.

The runtime will not detect unfulfilled dependencies until it needs to access them. This will lead to a NoClassDefFoundError crashing the running application.

Transitive Dependencies

For an application to work it might only need a handful of libraries. Each of those in turn might need a handful of other libraries, and so on. This compounds the problem of unexpressed dependencies, making their resolution exponentially more labor-intensive and error-prone.

Shadowing

Sometimes different JARs on the classpath contain classes with the same fully-qualified name. This can happen for different reasons, e.g. when there are two different versions of the same library, when a fat JAR contains dependencies that are also pulled in as standalone JARs, or when a library is renamed and unknowingly added to the classpath twice.

Since classes will be loaded from the first JAR on the classpath to contain them, that variant will “shadow” all others and make them unavailable.

If the variants differ semantically, this can lead to anything from too-subtle-to-notice misbehavior to havoc-wreaking errors. Even worse, the form in which this problem manifests itself can seem non-deterministic: it depends on the order in which the JARs are searched. This may well differ across different environments, for example between a developer’s IDE and the production machine where the code will eventually run.
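When shadowing is suspected, a quick diagnostic is to ask the JVM where a class was actually loaded from. Here is a minimal sketch (the class name is passed as an argument; note that getCodeSource() can return null, e.g. for classes loaded by the bootstrap loader):

import java.net.URL;

public class ShadowingDiagnostic {

    public static void main(String[] args) throws ClassNotFoundException {
        // prints the JAR (or directory) from which the suspicious class was
        // loaded - with duplicates on the class path this reveals which copy "won"
        Class<?> suspect = Class.forName(args[0]);
        URL location = suspect.getProtectionDomain()
                .getCodeSource()
                .getLocation();
        System.out.println(location);
    }
}

Running this in the different environments, e.g. in the IDE and on the production machine, makes a differing classpath order visible.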

Version Conflicts

This problem arises when two required libraries depend on different, non-compatible versions of a third library.

If both versions are present on the classpath, the behavior will be unpredictable. First, because of shadowing, classes that exist in both versions will only be loaded from one of them. Worse, if a class that exists in one but not the other is accessed, that class will be loaded as well. Code calling into the library might hence find a mix of both versions.

Since non-compatible versions are required, the program will most likely not function correctly if one of them is missing. Again, this can manifest itself as unexpected behavior or as NoClassDefFoundErrors.

Complex Class Loading

By default all application classes are loaded by the same class loader but developers are free to add additional class loaders.

This is typically done by containers like component systems and web servers. Ideally this implicit use is completely hidden from application developers but, as we know, all abstractions are leaky. In some circumstances developers might explicitly add class loaders to implement features, for example to allow their users to extend the application by loading new classes, or to be able to use conflicting versions of the same dependency.

Regardless of how multiple class loaders enter the picture, they can quickly lead to a complex mechanism that shows unexpected and hard-to-understand behavior.
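To make the explicit case less abstract, here is a minimal sketch of an application loading a plugin class from a JAR at runtime with its own class loader – the path and class name are invented for the example:

import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

public class PluginLoading {

    public static void main(String[] args) throws Exception {
        // a second class loader that delegates to the application's loader;
        // "plugins/extension.jar" and "com.example.Plugin" are placeholders
        URL pluginJar = Paths.get("plugins/extension.jar").toUri().toURL();
        try (URLClassLoader pluginLoader = new URLClassLoader(
                new URL[] { pluginJar }, PluginLoading.class.getClassLoader())) {
            Class<?> plugin = pluginLoader.loadClass("com.example.Plugin");
            System.out.println("Loaded " + plugin + " with " + plugin.getClassLoader());
        }
    }
}

Every such loader adds a node to the delegation hierarchy, which is where the complexity comes from.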

Classpath Hell and Dependency Hell

Classpath hell and JAR hell are essentially the same thing, although the latter seems to focus a little more on the problems arising from complex class loader hierarchies. Both terms are specific to Java and the JVM.

Dependency hell, on the other hand, is a more widely used term. It describes general problems with software packages and their dependencies and applies to operating systems as well as to individual development ecosystems. Given its universality it does not cover problems specific to single systems.

From the list above it includes transitive and maybe unexpressed dependencies as well as version conflicts. Class loading and shadowing are Java-specific mechanics, which would not be covered by dependency hell.

(Image published by the Wellcome Library under CC-BY 4.0.)

State Of Affairs

Build Tools

Looking over the list of problems we see how build tools help with some of them. They excel in making dependencies explicit so that they can hunt down each required JAR along the myriad edges of the transitive dependency tree. This largely solves the problems of unexpressed and transitive dependencies.

But Maven et al. can do little about shadowing. While they generally work towards reducing duplicate classes, they can not prevent them. Nor do build tools help with version conflicts, except to point them out. And since class loading is a runtime construct, they do not touch on it either.

Component Systems

I’ve never used a component system like OSGi or Wildfly so I can not testify to how well they work. From what they claim they seem to be able to solve most of the problems of JAR hell.

This comes with additional complexity, though, and often requires the developer to take a deeper dive into class loader mechanics – which is, ironically, also a point on the list above.

But regardless of whether or not component systems indeed considerably ease the pain of JAR hell, I am under the impression that a vast majority of projects does not employ them. Under this assumption said vast majority still suffers from classpath-related problems.

Where does this leave us?

Because they are not widely used, component systems leave the big picture untouched. But the ubiquity of build tools considerably changed the severity of the different circles of JAR hell.

No build-tool-supported project I have worked on or heard of spent a mentionable amount of time dealing with problems from unexpressed or transitive dependencies. Shadowing rears its ugly head every now and then and requires a varying amount of time to be solved – but it always eventually is.

Version conflicts are the single most problematic aspect of JAR hell.

But every project sooner or later fought with dependencies on conflicting versions and had to make some hard decisions to work these problems out. Usually some desired update had to be postponed because it would force other updates that could currently not be performed.

I’d venture to say that for most applications, services, and libraries of decent size, version conflicts are one of the main deciding factors for when and how dependencies are updated. I find this intolerable.

I have too little experience with non-trivial class loader hierarchies to assess how much of a recurring problem they are. But given the fact that none of the projects I have worked on so far required them, I’d venture to say that they are not commonplace. Searching the net for reasons to use them often turns up what we already discussed: dependencies resulting in conflicting versions.

So based on my experience I’d say that conflicting versions are the single most problematic aspect of JAR hell.

Reflection

We have discussed the constituents of JAR hell:

  • unexpressed dependencies
  • transitive dependencies
  • shadowing
  • version conflicts
  • complex class loading

Based on what build tools and component systems bring to the game and how widely they are used we concluded that unexpressed and transitive dependencies are largely solved, shadowing at least eased and complex class loading not commonplace.

This leaves version conflicts as the most problematic aspect of JAR hell, influencing everyday update decisions in most projects.

In my next post I will discuss how Jigsaw addresses these issues.



Will There Be Module Hell?


Project Jigsaw has ambitious objectives; one of them is “to escape the ‘JAR hell’ of the brittle and error-prone class-path mechanism”. But while it will achieve many of its goals it looks like it may fall short on this one.

So will there be module hell instead?

Overview

To know what we are talking about we’ll start with a quick recap of JAR hell. We will then discuss what aspects Jigsaw touches on and how that might not change the big picture. Lastly we will have a look at the official stance on the topic and formulate a proposal to prevent looming module hell.

JAR Hell

I discussed JAR hell in detail in my last post, which you might want to read if you haven’t already. It ends with this list of the different circles of JAR hell:

  • unexpressed and transitive dependencies
  • shadowing
  • version conflicts
  • complex class loading

Based on what build tools and component systems (called containers by the JDK developers) bring to the game and how widely they are used it concludes that unexpressed and transitive dependencies are largely solved, shadowing at least eased and complex class loading not commonplace.

This leaves version conflicts as the most problematic aspect of JAR hell, influencing everyday update decisions in many, many projects.

What Will Change With Jigsaw?

I have already written about all the features Project Jigsaw was planned to bring to Java 9 but this post will take a different angle. First, it is influenced by experiments with the current early access build and, second, it only looks at the aspects pertaining to JAR/module hell.

The core concept Jigsaw brings to Java is the module. Overly simplified, a module is like a JAR with some additional information and features. Among those pieces of information are a module’s name and the names of other modules it depends on.

Dependencies

The information is interpreted by the compiler and the JVM when they process a module. Tasked to compile or launch one, they will transitively resolve all dependencies within a universe of modules specified via the module path. Roughly speaking, this is analogous to a class path scan, but now we are looking for entire modules instead of individual classes and, in the case of the JVM, we are doing it at launch-time, not at runtime.
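As an illustration, such a declaration might look as follows – the module and package names are invented for this example:

// module-info.java
module com.example.app {
    // the compiler and JVM refuse to proceed if these modules are missing
    requires com.example.lib;
    requires java.sql;
    // only this package is accessible to other modules
    exports com.example.app.api;
}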

Jigsaw solves the problem of unexpressed and endlessly transitive dependencies.

Resolving the transitive dependencies of a module fails with an error if not all modules are found on the module path. This clearly solves the problem of unexpressed and endlessly transitive dependencies.

I see it as a material benefit that the Java language now officially knows about dependencies and that all the tools, starting with the compiler and JVM, understand and work with them! The importance of this should not be underestimated.

But I assume it will have little effect on the typical developer’s everyday work since this is already sufficiently addressed by existing infrastructure, i.e. the build tool.

This becomes even clearer when we consider where the module information will come from. It already exists as part of the build information, e.g. in the pom.xml. It would be redundant to additionally specify names and dependencies for the module system and it is hence assumed that the build tool will use its information to automatically generate the module information. (I am sure Mark Reinhold or Alan Bateman repeatedly stated this but can’t find a quote right now. Store this as hearsay for now.)

Shadowing

Jigsaw eliminates the problem of shadowing:

The module system ensures that every dependence is fulfilled by precisely one other module, […] that every module reads at most one module defining a given package, and that modules defining identically-named packages do not interfere with each other.

State Of The Module System – Readability (Sep 2015)

To be more precise, the module system quits and reports an error as soon as it encounters ambiguous situations, e.g. two modules exporting the same package to the same module.

Version Conflicts

We identified conflicting versions of third party libraries as the most daunting aspect of JAR hell. The most straightforward solution would be a module system able to load different versions of the same module. It would have to prove that these versions can not interact but given the strong promises regarding encapsulation and readability it looks like it should be able to do that.

Now, here is the problem:

It is not necessary to support more than one version of a module within a single configuration.

Java Platform Module System: Requirements – Multiple Versions (Apr 2015)

Indeed the current build neither creates nor understands module version information.

For some time it looked like there would be workarounds. The ugliest but most promising one renames the conflicting artifacts so that they are no longer two different versions of the same module but appear as two different modules, coincidentally exporting the same packages.

Jigsaw does nothing to help with the problem of conflicting versions.

But this approach fails. Apparently ensuring “that modules defining identically-named packages do not interfere with each other” is solved by roundly rejecting any launch configuration where two modules export the same packages. Even if no module would read them both!

So apparently Jigsaw does nothing to help with the problem of conflicting versions unless one resorts to component-system-like behavior at runtime. What a disappointment!

Complex Class Loading

Discussing how modules and class loaders interact and how that might change the complexity of class loading deserves its own post. Preferably by someone more experienced with class loaders.

Let’s just have a look at the basics.

The module system, in fact, places few restrictions on the relationships between modules and class loaders. A class loader can load types from one module or from many modules, so long as the modules do not interfere with each other and the types in any particular module are loaded by just one loader.

State Of The Module System – Class Loaders (Sep 2015)

So there will be a 1:n-relationship of class loaders to modules.

Then there is the new notion of layers, which component systems can use to structure class loader hierarchies.

A layer encapsulates a module graph and a mapping from each module in that graph to a class loader. The boot layer is created by the Java virtual machine at startup by resolving the application’s initial module against the observable modules built-in to the run-time environment and also against those found on the module path.

[…]

Layers can be stacked: A new layer can be built on top of the boot layer, and then another layer can be built on top of that. As a result of the normal resolution process the modules in a given layer can read modules in that layer or in any lower layer.

State Of The Module System – Layers (Sep 2015)

So while the class loader system gets more elements, the mechanics and best practices might improve, possibly resulting in less complexity in well-designed systems. At the same time the new fail-fast properties regarding dependencies and shadowing will make problems more obvious and troubleshooting easier.

So all in all it looks like this problem does not go away but becomes less vexing.

Module Hell?

With dependencies and shadowing solved and class loading improved why would I talk about module hell? Just because of version conflicts? Short answer: Yes!

If Jigsaw wants to solve JAR hell, it has to address version conflicts.

Long answer: Take a look at the search results for JAR hell – the topic of conflicting versions is by far the most common motivator for discussing this. Of all the aspects we talked about so far it is the only one that commonly plagues the majority of projects (at least by my conjecture).

So if Jigsaw wants to solve JAR hell, it has to address version conflicts! Otherwise not much might change for many projects. They will still struggle with it and they will continue to get themselves into custom built class loader nightmares. Sounds like module hell to me.

(Image published by the Wellcome Library under CC-BY 4.0 – flipped by me. Yes, it looks just like JAR hell – that’s because module hell will be so similar.)

Official Stance On Versions

So what is the official stance on the topic of versions?

Multiple Versions

It is not necessary to support more than one version of a module within a single configuration.

Most applications are not containers and, since they currently rely upon the class path, do not require the ability to load multiple versions of a module.

Java Platform Module System: Requirements – Multiple Versions (Apr 2015)

I strongly disagree with this assessment! As I said before, I am convinced that this is a problem for pretty much any project. In fact, I believe that the quoted rationale reverses cause and effect.

In my opinion it’s more like this:

Most applications decide against the complexity of running a container and, since they are consequently stuck with the class path, are not able to load multiple versions of a module.

Version Information

And why does the current early access build go even further and completely abandon version information?

A module’s declaration does not include a version string, nor constraints upon the version strings of the modules upon which it depends. This is intentional: It is not a goal of the module system to solve the version-selection problem, which is best left to build tools and container applications.

State Of The Module System – Module Declarations (Sep 2015)

It is easy to agree with the premise. Many tools have tackled the non-trivial problem of version selection and there is no need to bake one of those solutions into the VM.

But I fail to see what this has to do with completely ignoring version information. Nor does it exclude letting an external tool select the versions and pass its solution to the launching VM.

Conflicting Versions

Summarized, the official stance regarding conflicting versions is this:

The module system isn’t suggesting any solutions, it is instead leaving this problem to the build tools and containers.

Alan Bateman on Jigsaw-Dev (Oct 2015)

Which sounds great except that the module system does currently not provide any new mechanisms for build tools to solve this longstanding and fundamental problem.

Proposal

Given only an initial module and a universe of modules to resolve dependencies within, the current JVM refuses to launch if any ambiguities, e.g. two versions of the same module, are encountered. This is very reasonable behavior and I would not change it.

My proposal is to enable developers and build tools to pass additional information that solve ambiguous situations. (While I thought through the proposal Ali Ebrahimi independently made the same one.)

How

The two common ways to pass such information are the command line and configuration files.

Command line arguments would have to be repeated on every launch. Depending on how comprehensive the information and how large the project is, this could be tedious.

A configuration file could be created by the build tool and later specified via command line. This looks like the best approach to me.

What

Currently, the initial module and all transitive dependencies are resolved as a single configuration, which is used to create a single layer. But it is already straightforward to load multiple versions of the same module into different layers at runtime. (This is what component systems might do in the future.)
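For illustration, here is a minimal sketch of such runtime layer creation. The prototype’s exact class names are still in flux, so this sketch uses the names under which the API eventually shipped in Java 9 (ModuleLayer and friends); the module name and directory are invented:

import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.nio.file.Paths;
import java.util.Set;

public class LayerDemo {

    public static void main(String[] args) throws Exception {
        // find modules in a directory containing the "other" version
        ModuleFinder finder = ModuleFinder.of(Paths.get("mods-v2"));
        ModuleLayer boot = ModuleLayer.boot();
        // resolve the conflicting module against the boot configuration
        Configuration cfg = boot.configuration()
                .resolve(finder, ModuleFinder.of(), Set.of("com.example.lib"));
        // a new layer on top of the boot layer, with its own class loader
        ModuleLayer layer = boot.defineModulesWithOneLoader(
                cfg, ClassLoader.getSystemClassLoader());
        Class<?> type = layer.findLoader("com.example.lib")
                .loadClass("com.example.lib.SomeType");
        System.out.println(type + " loaded in " + layer);
    }
}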

All that is needed are explicit configurations with multiple layers.

So all that is needed is to allow users to explicitly specify configurations with multiple layers. The JVM would parse this when it launches and create the layers accordingly.

Looking at the current goals, requirements and capabilities this fits in quite nicely. Especially since it does not implement version selection and does not require new module system capabilities. And it would be a nice feature to enable complex configurations at launch-time regardless of version conflicts. I am sure there are other use cases.

As an add-on, it might be interesting to think about partial configurations. They would only specify those parts of the module graph that are of special interest, e.g. because of conflicting versions. Everything else could be resolved relative to them.

Demarcation

This is not meant to replace existing component systems! Users of OSGi, Wildfly, … most likely have more reasons to use them than just version conflicts. Instead it would be an entry-level mechanism usable by every project out there without much additional complexity.

Reflection

In the first part we have assessed how Project Jigsaw addresses JAR hell:

  • unexpressed and transitive dependencies: solved
  • shadowing: solved
  • version conflicts: untouched
  • complex class loading: remains to be seen

Since version conflicts are the most relevant aspect of JAR hell today, we concluded that they will give rise to module hell tomorrow.

To prevent that, a proposal was made that requires no notable changes to the module system and utilizes already existing features:

Enable explicitly specified configurations with multiple layers.


If you care about the topic, you might want to watch or participate in the ongoing discussions on the Jigsaw-Dev and JPMS-Spec mailing lists.


JavaOne 2015: Prepare For JDK 9


JavaOne 2015 saw a series of talks by the Project Jigsaw team about modularity in Java 9. They are all very interesting and full of valuable information and I urge every Java developer to watch them.

Beyond that I want to give the community a way to search and reference them, so I summarize them here:

  • Prepare For JDK 9
  • Introduction To Modular Development (upcoming)
  • Advanced Modular Development (upcoming)
  • Under the Hood Of Project Jigsaw (upcoming)

I made an effort to link to as many external resources as possible to keep the individual posts short. The play icons will take you straight to the corresponding point in the ten hour long video streams that Oracle put online for each room and day. (Great format, guys!) Not only did they (so far) fumble the cutting, they also seem to have resorted to low-volume mono sound so make sure to crank up the volume.

Let’s start with preparations for JDK 9!

Overview

  • Content: What to expect when moving from JDK 8 to JDK 9
  • Speaker: Alan Bateman
  • Links: Video and Slides

Background

Alan Bateman begins the talk by giving some background information.

JDK 9 And Project Jigsaw Goals

A quick recap of Jigsaw’s goals. For more details, see my post about them.

Modularity Landscape

A short overview of the multitude of Java Specification Requests (JSRs) and JDK Enhancement Proposals (JEPs) that cover Project Jigsaw’s efforts.

Compatibility

Bateman categorizes the kinds of APIs exposed by the JDK:

  • Supported and intended for external use:
    • JCP standard: java.*, javax.*
    • JDK-specific API: some com.sun.*, some jdk.*
  • Not intended for external use: sun.*, the rest of com.sun.*, the rest of jdk.*

He points out that if an application uses only supported APIs and works on Java N, it should also work on Java N+1. Java 9 will make use of this and change/remove APIs that have been internal or deprecated in Java 8.

He then goes into managing (in)compatibilities and mentions a post by Joseph Darcy, Kinds of Compatibility: Source, Binary, and Behavioral, that he recommends to read. It sheds some light on the different aspects of compatibility and hence, by extension, the complexity of evolving Java.

Incompatible Changes In JDK 9

The bulk of this talk covers the different incompatibilities Java 9 will incur. This is largely covered by my post about how Java 9 may break your code.

Encapsulating JDK-Internal APIs

Bateman starts by presenting some data on uses of internal APIs. Details can be found on slide 16 but the gist is that only a couple of APIs are frequently used.

APIs that are not used in the wild or are only used for convenience are non-critical. By default, these will be encapsulated in Java 9. Those in actual use for which it would be hard or impossible to create implementations outside of the JDK are deemed critical. If alternatives exist, they will also be encapsulated.

The critical APIs without alternative will be deprecated in Java 9, with the plan to remove them in 10. JEP 260 proposes the following APIs for this:

  • sun.misc.Unsafe
  • sun.misc.{Signal,SignalHandler}
  • sun.misc.Cleaner
  • sun.reflect.Reflection::getCallerClass
  • sun.reflect.ReflectionFactory

If you miss something on the list, contact the Jigsaw team and argue your case (and bring data to support it).

He then goes into how jdeps can be used to find the uses of internal APIs. This part also contains some examples of what will happen if problematic code is run on JDK 9 (start here) and how to solve such issues (start here).
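For reference, a typical invocation looks like this – app.jar stands in for the artifact to analyze:

# list class-level uses of JDK-internal APIs, including suggested replacements
jdeps -jdkinternals app.jar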

Removing API

This is quick. The following six methods will not be present in Java 9:

  • java.util.logging.LogManager::addPropertyChangeListener
  • java.util.logging.LogManager::removePropertyChangeListener
  • java.util.jar.Pack200.Packer::addPropertyChangeListener
  • java.util.jar.Pack200.Packer::removePropertyChangeListener
  • java.util.jar.Pack200.Unpacker::addPropertyChangeListener
  • java.util.jar.Pack200.Unpacker::removePropertyChangeListener

Change Of JDK/JRE Binary Structure

By merging JDK and JRE into a common structure, several existing practices will stop working.

Bateman describes some of the problems with the old run-time image directory layout and presents how the new one will look. Slides 29 and 30 juxtapose both layouts:

(Slide: comparison of the old and new run-time image layouts. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Since Java 7 there is an API with which tools can interact with these files regardless of the physical layout. This also means that version N can access version N+1 files.

Removed Mechanisms

As I described earlier, the endorsed standards override mechanism and the extension mechanism will be removed. They will be replaced by upgradeable modules.

Other Changes

See JEP 261 (section Risks And Assumptions) for a full list of changes. Bateman names a few:

  • Application and extension class loaders are no longer instances of java.net.URLClassLoader.
  • The command line arguments -Xbootclasspath and -Xbootclasspath/p are removed.
  • The system property sun.boot.class.path is removed.

Non-Jigsaw Incompatibilities in Java 9

Bateman also shortly addresses two issues that are not connected to Project Jigsaw but will show up in Java 9 and might break some code:

  • The version-string schema changes. For details see JEP 223 – it also has a nice comparison of current and future version strings.
  • Underscore is no longer allowed as a one-character identifier.

(Image published by Ricardo Villar under CC-BY-NC 2.0.)

What Can You Do To Prepare For Java 9?

There are a couple of preparatory steps you can take:

  • Check code for usages of JDK-internal APIs with jdeps.
  • Check code that might be sensitive to the version-string schema change.
  • Check code for uses of underscore as an identifier.
  • If you develop tools, check code for a dependency on rt.jar, tools.jar, or the runtime-image layout in general.
  • Test the JDK 9 EA builds and Project Jigsaw EA builds.

Make sure to report any unexpected or overly problematic findings back to the Jigsaw mailing list.

Questions

There were a couple of questions, of which I picked the two most interesting ones.

How Can Libraries Target Java 8 and Java 9?

JEP 238 will introduce multi-release JARs, i.e. JARs that can contain specialized code for specific Java releases.

When Does Support For Java 8 End?

Nobody on stage knew the exact answer so they pointed to the documentation of Oracle’s update policy on oracle.com. The current answer is: Not before September 2017.


JavaOne 2015: Introduction to Modular Development



After preparing for JDK 9 let’s continue with an introduction to modular development!

Overview

  • Content: Introduction to the module system and the concept of modules
  • Speaker: Alan Bateman
  • Links: Video and Slides

What Is A Module?

Alan Bateman starts by explaining the basic concept of modules as named, self-describing collections of code and data. This part is more than covered by The State Of The Module System (SOTMS).

Platform Modules

Platform modules are the ones that make up the JDK – SOTMS explains them as well. Their dependency graph is shown on Slide 19:

(Slide: the platform module dependency graph. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

A very similar graph that includes the OpenJDK-specific modules can be found in JEP 200.

Bateman also mentions java -listmods, which will list all the available platform modules. (Note that there are discussions on the mailing list to rename the flag.)

Command Line

After explaining the module path Bateman gives an introduction to the various new command line options. The Jigsaw quick-start guide has us covered here.

An option the guide does not mention is java -Xdiag:resolver, which outputs additional information regarding module dependency resolution.

Packaging As Modular JAR

Modules can be packaged into so-called modular JARs, which the quick-start guide covers as well.

Bateman stresses that such JARs work both on the module path in Java 9 as well as on the class path in Java 8 (as long as they target 1.8). He also quickly shows how the module path and class path can be mixed to launch a program.

Linking

Linking makes it possible to bundle some modules and all of their transitive dependencies into a run-time image. If the initial modules are platform modules, the result will essentially be a variant of the JDK. This is in fact how the current Jigsaw builds are being created.

This is done with the new tool jlink and the quick-start guide shows how to do it.
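The invocation looks roughly like this – module path entries, module name, and output directory are placeholders, and the flag spellings have varied across builds (these are the ones that eventually shipped):

# bundle a module and its transitive dependencies, including the
# required platform modules, into a self-contained run-time image
jlink --module-path $JDK9/jmods:mods --add-modules com.example.app --output app-image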

(Image published by Christian Arballo under CC-BY-NC 2.0.)

Questions

There were a couple of interesting questions.

Is There Any Solution For Optional Dependencies?

In earlier Jigsaw prototypes there was a notion of optional dependencies. Working out the precise semantics turned out to be hard so the feature was not implemented. Research showed that optional dependencies can typically be refactored to services that might or might not be present at runtime.

Services are covered by the quick-start guide.

Can jlink Cross-Compile? Can It Create A Self-Executing File?

“Yes” and “Not directly but other tools will be improved so that will be doable in the future”.

Can Modules Be Versioned?

Long story short: “Versions are hard, we don’t want to replicate build tool functionality, so ‘No'”. For more, listen to Mark Reinhold’s full answer.

Can jlink Use Cross-Module Optimizations?

Yes.

How Does JavaDoc Handle Modules?

JavaDoc will be upgraded so that it understands what modules are. It will display them along with packages and classes. And by default it will not generate documentation for types in non-exported packages.


JavaOne 2015: Advanced Modular Development



Let’s build on the introduction with some advanced modular development and migration advice!

Overview

  • Content: How to migrate applications and libraries to the module system
  • Speakers: Mark Reinhold, Alex Buckley, Alan Bateman
  • Links: Video and Slides

Introductory Remarks

Mark Reinhold begins by emphasizing that the current prototype is still a work in progress, a proposal with some rough edges and missing parts. The reason the Jigsaw team is spending so much time talking about it is to spread the word and gather feedback.

So try it out and give feedback!

Application Migration

(Slide: the migration scenario. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

In the talk’s first part, Alex Buckley covers how to migrate an application to the module system. He discusses this under the assumption that the application’s dependencies are not yet published as modules. (Because if they were, this would be fairly simple and straightforward.)

Top-Down Migration

Whenever a JAR is turned into a module, two questions have to be answered:

  • What does the module require?
  • What does the module export?

The first question can be answered with the help of jdeps. The second requires the module’s authors to consciously decide which packages contain its public API.

Both answers must then be poured into the module-info.java, as explained in the introduction to modular development and the quick-start guide.
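Poured into code, the answers could take this shape – a hypothetical declaration for an application module, where dependencies that are not yet modularized would be satisfied by automatic modules (see the next section):

// module-info.java of the application (names are invented)
module com.example.app {
    requires com.example.app.core;
    // satisfied by an automatic module derived from guava.jar
    requires guava;
    // nothing is exported - no other module depends on the application
}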

Automatic Modules

Buckley now addresses the intrinsic problem of his example: What to do with the application’s dependencies that were not yet published as modules? The solution is automatic modules.

(Slide: automatic modules. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Simply by placing a JAR on the module path instead of the class path it becomes an automatic module. This is a full-fledged module but requires no changes to the JAR itself. Its name is derived from the JAR name and it exports all its packages. It can read all modules on the module path (by implicitly requiring them all) and all classes on the class path.

This provides the maximum compatibility surface for migrating JAR files.

System Structure

Even with the slightly exceptional automatic modules, which add a lot of edges to the module graph, the situation is better than it was on the class path. There everything could access everything else and the JVM simply erased any system structure envisioned by the developers.

Compiling And Running The Example

The example is compiled and run with the commands covered by the quick-start guide.

Buckley also demonstrates the javac flag -modulesourcepath to enable multi-module compilation. It requires a single directory and expects it to contain one subdirectory per module. Each module directory can then contain source files and other resources required to build the module. This corresponds to the new directory schema proposed by JEP 201.
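Assuming the sources follow that schema, a multi-module compilation could look like this sketch (directory and module names are invented; the flag spelling matches the current prototype):

# src contains one subdirectory per module, e.g.
#   src/com.example.app/module-info.java
#   src/com.example.app/com/example/app/Main.java
javac -d mods -modulesourcepath src $(find src -name '*.java')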

Summary

For top-down migration the application’s JARs are transformed into modules by creating module-info.java files for them. The dependencies are turned into automatic modules by putting them on the module path instead of the class path.

Library Migration

Alan Bateman approaches the same scene but from a different perspective. He shows how to convert libraries to modules without requiring the applications using them to do the same.

Bottom-Up Migration

For libraries the same questions need to be answered as for application modules:

  • What does the module require?
  • What does the module export?

Again, jdeps is brought out to answer the first. But here Bateman also demonstrates how the flag -genmoduleinfo can be used to generate a first draft of the module-info.java files. In this mode jdeps derives the module name from the JAR name, requires the correct dependencies and simply exports all packages. The module authors should then decide which exports to take out.

Bateman then compiles and packages the modules as described above and in the quick-start guide.

Mixing Class Path And Module Path

The application is not yet converted to modules, which has two implications:

(Slide: library modules. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

  • Both the class path and the module path are required to run it.
  • The application can not express which modules it depends on.

Mixing class and module path on the command line is verbose but straightforward. On top of that the flag -addmods must be used to specify the root modules against which the module system has to resolve the module path. In the running example, this would be the freshly converted library modules.
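Put together, a launch command might look like this sketch – JAR, module, and class names are invented, and the flag spellings match the current prototype:

# application code on the class path, converted libraries on the module path;
# -addmods names the root modules to resolve, since the class path
# can not express dependencies on them
java -cp app.jar -mp mods -addmods com.example.lib com.example.app.Main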

Advanced Migration Challenges

In the presented example one of the newly created library modules uses reflection to access the application’s code. This is problematic because modules can only access code from modules on which they depend and clearly libraries can not depend on the applications using them.

The solution is addReads on the new class java.lang.Module. It can be used to allow the module calling the method to read a specified module. To get a module, call Class.getModule().
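In code that could look along these lines – a minimal sketch with invented type names, called from inside the library module:

public class ReflectiveAccess {

    // called by the library before reflecting over an application type;
    // ReflectiveAccess stands in for any class in the library's own module,
    // since addReads only works for the module the calling code lives in
    static void ensureReadable(Class<?> applicationType) {
        Module applicationModule = applicationType.getModule();
        ReflectiveAccess.class.getModule().addReads(applicationModule);
    }
}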

Putting It All Together

(Slide: the fully migrated application. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Putting both approaches together results in a nice dependency graph and super short command to launch the application.

Bateman then goes on to package the resulting application in a minimal self-contained run time image with jlink as described in the introduction to modular development.

Summary

In summary, the two approaches show how application and library maintainers can modularize their projects independently and at their own pace. But note that some code changes may be required.

Go forth and modularize!

(Image published by Joe Parks under CC-BY-NC 2.0.)

Questions

The vast majority of questions were interesting so here we go.

Can Someone Override Your Security Packages?

The Jigsaw team is prototyping an optional verification step. At build time, it would compute a module’s strong hash and bake that into the modules depending on it. It would then validate the hash at launch time.

Is It Possible To Access Non-Exported Types?

Not from code. If certain types must be available in this way (e.g. for a dependency injection framework), they have to be exported. There is intentionally no way to break module encapsulation with reflection.

But it is possible with the command line flag -XaddExports, as explained in JEP 261 under section Breaking Encapsulation.
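Such an invocation might look like this – module, package, and class names are invented, and the flag syntax follows the prototype described in JEP 261:

# grant module com.example.di access to the non-exported package
# com.example.app.internal of module com.example.app
java -XaddExports:com.example.app/com.example.app.internal=com.example.di \
    -mp mods -m com.example.app/com.example.app.Main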

Is Jigsaw Compatible With OSGi?

No, but OSGi will run on top of it.

What About JNI? Can Modules Contain DLLs, SOs?

JNI works exactly as before and modules can contain all kinds of resources including OS-specific libraries.

Why Is The Main Class Not Specified in module-info.java?

Because it’s not an essential information for the compiler and the JVM. In fact, it isn’t even an essential property of the program as it might change for different deployments of the same project version.

How To Express Dependencies On Unmodularized JARs?

The library can require its dependencies as shown above. If those were not yet modularized, the documentation should mention that they have to be added to the module path (as opposed to the class path) nonetheless. They would then be turned into automatic modules, which makes them available to the library. Of course the class path remains an exit hatch and the library can always be put there and everything works as before.

Alternatively, Buckley suggests to use reflection if the collaboration between the projects is limited. The library would then not have to require its dependency and instead start reading it at runtime regardless of whether it is placed on the class or the module path.

What About Tools Like Maven?

The Jigsaw team hopes to work with all tool vendors to enable support but there are no plans at the moment because it is still fairly early.

Buckley tries to manage expectations by describing the incorporation of the module system into tools as a distributed problem. The Java 9 release should not be seen as the point at which everything must cooperate perfectly but as the start to getting everything cooperating.

What About (Context-) Class Loaders?

The module system is almost orthogonal to class loaders and there should be no problematic interaction. Loaders are described as a low-level mechanism while modules are a higher abstraction.

For more details wait for the upcoming summary of a peek under the hood of Project Jigsaw.

Is It Possible To Package Multiple Modules Into A Single JAR?

Or in other words, will it be possible to build a fat/uber JAR containing several modules, typically all of its dependencies?

For now there is no support but creating an image might be a solution for some of the use cases. Reinhold promises to think about it as this question has come up repeatedly.


JavaOne 2015: Under The Hood Of Project Jigsaw



Let’s top the series off with a peek under the hood of the Java Platform Module System!

Overview

  • Content: A technical investigation of the module system’s mechanisms
  • Speaker: Alex Buckley
  • Links: Video and Slides

Accessibility & Readability

Alex Buckley reiterates how public no longer means “publicly accessible to everyone”. By default, public classes will be inaccessible outside of their module. Only by exporting the containing package do they become accessible. If the export is qualified, then this is only true for the specifically mentioned modules.
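In a module declaration the two forms look like this (names invented for the example):

module com.example.lib {
    // accessible to every module that reads com.example.lib
    exports com.example.lib.api;
    // qualified export: accessible only to the module named after 'to'
    exports com.example.lib.spi to com.example.lib.extension;
}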

Accessibility and Class Loaders

Class loaders can be used to prevent classes in one package from seeing classes in another. Lacking a better mechanism this is currently used to simulate limited accessibility and strong encapsulation.

But looked at more closely, it becomes apparent that this mechanism can not fulfill those promises. As soon as a piece of code gets hold of a Class object, it can use it to create more instances via reflection. It is also not a feasible solution to use inside the JVM, as spinning up a complex web of class loaders is a compatibility nightmare.

Strong encapsulation is about being able to prevent access even if the accessing class and the target class are in the same class loader and even if someone is using core reflection to manipulate class objects.

The Role Of Readability

The essential concepts of readability and accessibility, as defined in The State Of The Module System (SOTMS), are independent of class loaders. That means that accessibility works at compile time, when there aren’t any class loaders, and that one can reason about it solely based on the static information in module-info.java.
(Slide: implied readability. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Buckley goes on to discuss implied readability and how it can be used to refactor modules. This feature allows a module to split off some of its functionality into a separate module without its clients noticing. While this allows what Buckley calls “downwards decomposability” a module can not be deleted and have its role filled by another module. So modules can not be merged without breaking clients.

Core Reflection

Strong encapsulation is upheld by reflection. It is not possible to access a member of a type in a non-exported package. Not even with setAccessible(true), which in fact performs the same checks as compiler and VM to check accessibility.

Different Kinds Of Modules

Buckley presents the three kinds of modules:

Named Modules
Contain a module-info.java and are loaded from the module path. Only their exported packages are accessible and they can only require and thus read other named modules (which excludes the unnamed module).

Unnamed Module
The unnamed module consists of all the classes from the class path. All its packages are exported and it reads all modules.

Automatic Modules
Automatic, or automatically named, modules are JARs without a module-info.java that were loaded from the module path. Their name is derived from the JAR file name, they export every package and read any module, including other automatic ones and the unnamed module.

So as described in application migration, automatic modules are the only way for modules to read unmodularized JARs. This apparent detour was chosen to prevent the well-formed module graph from depending on the arbitrary contents of the class path. Interestingly enough, a named module can require public an automatic module, which means it passes the automatic module’s types on to its own dependents, as discussed in implied readability.

Buckley describes automatic modules as a necessary evil akin to raw types in generics. They are necessary because they enable migration and evil because of their hidden complexity, which hopefully doesn’t leak to users.

(Image published by BKL under CC-BY-NC 2.0.)

Loaders And Layers

Class Loading

The talk’s third part starts with a clear message: Class loading doesn’t change! In fact the module system operates beneath the class loading mechanism and the known three loaders continue to work as they do now except for some implementation details.

(Slide: layers and class loaders. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Most JDK modules will be loaded by the boot loader, a few by the extension loader, and a handful of tool-related modules by the application/system loader. Since the boot loader runs with all permissions, security can be improved by moving modules out of the boot loader. This deprivileging work will continue throughout JDK 9 and 10.

Layers

Buckley goes on to describe layers, which SOTMS mentions only briefly. A layer is created from a module graph and a mapping from modules to class loaders – see its documentation for details. Creating a layer informs the JVM about the modules and their contained packages so that it knows where to load classes from when they are required.

The module system enforces the following constraints on the module graph and the mapping from modules to class loaders.

Well-Formed Graphs

Module graphs are directed graphs and they must be acyclic. Additionally, a module can not read two or more modules that export the same package.

Well-Formed Maps

Because a class loader can not load two classes with the same fully qualified name, the broad decision was made that any two modules with the same package (exported or concealed) can not be mapped to the same class loader. Trying this fails with an exception.

(Slide: loader delegation. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Furthermore at runtime class loader delegation must respect module readability.

Since the module graph is acyclic one might think that class loader delegation is, too. Buckley gives an example from the JDK where this assumption is wrong. It boils down to having three modules, each reading the next, but a single class loader for the first and last, thus creating a cycle between the two loaders.

Next, Buckley discusses the problem of split packages, which occurs when multiple loaders define classes for the same package. Since class loader delegation will respect module readability, a loader can not delegate to two different loaders for the same package.

He presents the example of JSR 305 and Xerces in detail, which is worth watching.

Layers Of Layers

Layers can, surprise, be layered. Each layer, except the boot layer, has a parent so a layer tree emerges. Modules in one layer can read modules in their ancestor layers.

This gives frameworks the freedom to organize modules at runtime without upsetting their traditional uses of class loaders. It can also be used to allow multiple versions of the same module, partly addressing a recent pet peeve of mine. Because multiple versions only work out when controlled by a dedicated system, like an application server, this can not be achieved via command line.

Just as modules wrap up coherent sets of packages and interact with the VM’s accessibility mechanism, layers wrap up coherent sets of modules and interact with the class loader’s visibility mechanism. It will be up to frameworks to make use of layers in the next 20 years just as they have made use of class loaders in the first 20 years.

Summary Of Summaries

  • In Java 9 there is strong encapsulation of modules by the compiler, VM, reflection.
  • Unnamed and automatic modules help with migration.
  • The system is safe by construction – no cycles or split packages.

(Slide image. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.)

Questions

Are Resources Also Strongly Encapsulated?

Yes, but there are discussions going on about that very topic. Follow the mailing list for details.

Can Non-Exported Instances Be Accessed Through An Exported Interface?

Yes! (Buckley was visibly excited to get a chance to answer that question.)

So a module that exports some type and returns instances of it from a method can instead return instances of any non-exported subtype. The caller can interact with it as long as she does not try to cast it to the encapsulated subtype or use reflection to, e.g., create a new instance from it.
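A sketch of that pattern with invented names – only the api package is exported, yet callers receive instances of the hidden implementation type:

// in module com.example.lib, which only 'exports com.example.lib.api'

// com/example/lib/api/Greeter.java (exported)
package com.example.lib.api;

public interface Greeter {

    String greet(String name);

    static Greeter create() {
        // hands out an instance of the encapsulated implementation
        return new com.example.lib.internal.FriendlyGreeter();
    }
}

// com/example/lib/internal/FriendlyGreeter.java (not exported)
package com.example.lib.internal;

public class FriendlyGreeter implements com.example.lib.api.Greeter {

    @Override
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}

Callers in other modules can use Greeter.create() and greet(...) freely but can neither reference FriendlyGreeter in code nor reflect over it.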

What Are The Performance Implications?

There is so much to say about this that Buckley can not go into it here. Regarding accessibility, the JVM creates a nice lookup table and the checks are, performance-wise, basically a no-op.

What About Monkey Patching?

Assume there is a known bug in a library not under one’s control. Before Jigsaw, it was possible to fix the bug locally, put the new class on the class path and have it shadow the original one. Will that still work from Java 9 on?

Yes, as long as a module does not export the package containing the monkey patched class.

When Are The Access Checks Performed For Reflection?

The question was actually somewhat different but Buckley answered this one instead.

The general answer is, “as late/lazily as possible”. So getClass will always return an instance (even if the class is not accessible) and only when one uses it to access fields, methods or constructors that are not accessible are the checks performed and possible exceptions thrown.

So Many More…

There are a lot of other questions being asked. If you are interested in this topic, make sure to check them out.


Six-Month Delay Of Java 9 Release


Yesterday evening Mark Reinhold, Chief Architect of the Java Platform Group at Oracle and Specification Lead of JSR 376, for which Project Jigsaw is the current prototype, proposed a six-month extension of the schedules for the JSR and for the Java 9 release.

Proposal

The interleaved schedule proposals for modularity and Java 9 look as follows:

  • 2016-01 — JSR 376: Early Draft Review
  • 2016-05 — JDK 9: Feature Complete
  • 2016-06 — JSR 376: Public Review
  • 2016-08 — JDK 9: Rampdown Start
  • 2016-10 — JDK 9: Zero Bug Bounce
  • 2016-12 — JSR 376: Proposed Final Draft
  • 2016-12 — JDK 9: Rampdown Phase 2
  • 2017-01 — JDK 9: Final Release Candidate
  • 2017-03 — JSR 376: Final Release
  • 2017-03 — JDK 9: General Availability

The definitions for the JDK 9 milestones are the same as for JDK 8 and worth a read. Especially feature complete, which means that “[a]ll features have been implemented and integrated into the master forest, together with unit tests.” It does not mean that development stops. Instead improvements are still possible, at least until the rampdown phases start, in which “increasing levels of scrutiny are applied to incoming changes.”

The proposals are up for debate until December 8th but I’d be very surprised to not see them become the new schedule.

(Image published by Lee Jordan under CC-BY-SA 2.0.)

Reason

The reasons for this delay clearly are JSR 376 and Jigsaw:

In the current JDK 9 schedule [7] the Feature Complete milestone is set for 10 December, less than two weeks from today, but Jigsaw needs more time. The JSR 376 EG has not yet published an Early Draft Review specification, the volume of interest and the high quality of the feedback received over the last two months suggests that there will be much more to come, and we want to ensure that the maintainers of the essential build tools and IDEs have adequate time to design and implement good support for modular development.

Mark Reinhold – 1 Dec 2015

This makes a lot of sense. And while I think highly of the current prototype there is still lots of work to do. The additional six months will give the engineers more time to address the various problems and improve migration compatibility.

As with previous schedule changes, the intent here is not to open the gates to a flood of new features unrelated to Jigsaw, nor to permit the scope of existing features to grow without bound. It would be best to use the additional time to stabilize, polish, and fine-tune the features that we already have rather than add a bunch of new ones. The later FC milestone does apply to all features, however, so reasonable proposals to target additional JEPs to JDK 9 will be considered so long as they do not add undue risk to the overall release.

Mark Reinhold – 1 Dec 2015

The additional time might help in convincing some of the more critical members of the community. As it currently stands there are even members of the JSR 376 expert group who are openly opposing the path Jigsaw took. The common counter proposal is solely based on class loaders as used by, e.g., OSGi implementations.


Jigsaw Hands-On Guide


I originally wrote this post for the Java Advent Calendar, where it was published on December 10th.

Project Jigsaw will bring modularization to the Java platform and according to the original plan it was going to be feature complete on the 10th of December. So here we are but where is Jigsaw?

Surely a lot happened in the last six months: The prototype came out, the looming removal of internal APIs caused quite a ruckus, the mailing list is full of critical discussions about the project’s design decisions, and JavaOne saw a series of great introductory talks by the Jigsaw team. And then Java 9 got delayed for half a year due to Jigsaw.

But let’s ignore all of that for now and just focus on the code. In this post we’ll take an existing demo application and modularize it with Java 9. If you want to follow along, head over to GitHub, where all of the code can be found. The setup instructions are important to get the scripts running with Java 9. For brevity, I removed the prefix org.codefx.demo from all package, module, and folder names in this article.

The Application Before Jigsaw

Even though I do my best to ignore the whole Christmas kerfuffle, it seemed prudent to have the demo uphold the spirit of the season. So it models an advent calendar:

  • There is a calendar, which has 24 calendar sheets.
  • Each sheet knows its day of the month and contains a surprise.
  • The death march towards Christmas is symbolized by printing the sheets (and thus the surprises) to the console.

Of course the calendar needs to be created first. It can do that by itself but it needs a way to create surprises. To this end it gets handed a list of surprise factories. This is what the main method looks like:

public static void main(String[] args) {
	List<SurpriseFactory> surpriseFactories = Arrays.asList(
			new ChocolateFactory(),
			new QuoteFactory()
	);
	Calendar calendar =
		Calendar.createWithSurprises(surpriseFactories);
	System.out.println(calendar.asText());
}

The initial state of the project is by no means the best of what is possible before Jigsaw. Quite the contrary, it is a simplistic starting point. It consists of a single module (in the abstract sense, not the Jigsaw interpretation) that contains all required types:

  • “Surprise API” – Surprise and SurpriseFactory (both are interfaces)
  • “Calendar API” – Calendar and CalendarSheet to create the calendar
  • Surprises – a couple of Surprise and SurpriseFactory implementations
  • Main – to wire up and run the whole thing.

Compiling and running is straightforward (commands for Java 8):

# compile
javac -d classes/advent ${source files}
# package
jar -cfm jars/advent.jar ${manifest and compiled class files}
# run
java -jar jars/advent.jar

Entering Jigsaw Land

The next step is small but important. It changes nothing about the code or its organization but moves it into a Jigsaw module.

Modules

So what’s a module? To quote the highly recommended State of the Module System:

A module is a named, self-describing collection of code and data. Its code is organized as a set of packages containing types, i.e., Java classes and interfaces; its data includes resources and other kinds of static information.

To control how its code refers to types in other modules, a module declares which other modules it requires in order to be compiled and run. To control how code in other modules refers to types in its packages, a module declares which of those packages it exports.

So compared to a JAR, a module has a name that is recognized by the JVM, declares which other modules it depends on, and defines which packages are part of its public API.

Name

A module’s name can be arbitrary. But to ensure uniqueness it is recommended to stick with the inverse-URL naming schema of packages. So while this is not necessary it will often mean that the module name is a prefix of the packages it contains.
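For example, with its full prefix restored (the one I trimmed for this article), the demo application’s module would be declared like this:

module org.codefx.demo.advent {
	// the module name mirrors the prefix
	// of the packages it contains
}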

Dependencies

A module lists the other modules it depends on to compile and run. This is true for application and library modules but also for modules in the JDK itself, which was split up into about 80 of them (have a look at them with java -listmods).

Again from the design overview:

When one module depends directly upon another in the module graph then code in the first module will be able to refer to types in the second module. We therefore say that the first module reads the second or, equivalently, that the second module is readable by the first.

[…]

The module system ensures that every dependence is fulfilled by precisely one other module, that no two modules read each other, that every module reads at most one module defining a given package, and that modules defining identically-named packages do not interfere with each other.

When any of the properties is violated, the module system refuses to compile or launch the code. This is an immense improvement over the brittle classpath, where e.g. missing JARs would only be discovered at runtime, crashing the application.

It is also worth pointing out that a module is only able to access another’s types if it directly depends on it. So if A depends on B, which depends on C, then A is unable to access C unless it requires it explicitly.
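As a sketch with hypothetical module names, the following declarations would not allow code in a to use types from c:

module a {
	requires b;
	// no access to c's types:
	// a would need its own 'requires c'
}

module b {
	requires c;
}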

Exports

A module lists the packages it exports. Only public types in these packages are accessible from outside the module.

This means that public is no longer really public. A public type in a non-exported package is as hidden from the outside world as a non-public type in an exported package. Which is even more hidden than package-private types are today because the module system does not even allow reflective access to them. As Jigsaw is currently implemented, command line flags are the only way around this.

Implementation

To be able to create a module, the project needs a module-info.java in its root source directory:

module advent {
    // no imports or exports
}

Wait, didn’t I say that we have to declare dependencies on JDK modules as well? So why didn’t we mention anything here? All Java code requires Object and that class, as well as the few others the demo uses, are part of the module java.base. So literally every Java module depends on java.base, which led the Jigsaw team to the decision to automatically require it. So we do not have to mention it explicitly.
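For illustration, spelling the implicit dependency out should be redundant but harmless (a sketch, assuming the prototype accepts an explicit requires on java.base):

module advent {
	// redundant: java.base is required implicitly anyway
	requires java.base;
}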

The biggest change is the script to compile and run (commands for Java 9):

# compile (include module-info.java)
javac -d classes/advent ${source files}
# package (add module-info.class and specify main class)
jar -c \
	--file=mods/advent.jar \
	--main-class=advent.Main \
	${compiled class files}
# run (specify the module path and simply name the module to run)
java -mp mods -m advent

We can see that compilation is almost the same – we only need to include the new module-info.java in the list of source files.

The jar command will create a so-called modular JAR, i.e. a JAR that contains a module. Unlike before we need no manifest anymore but can specify the main class directly. Note how the JAR is created in the directory mods.

Utterly different is the way the application is started. The idea is to tell Java where to find the application modules (with -mp mods, this is called the module path) and which module we would like to launch (with -m advent).

[Image: jigsaw-hands-on-guide-advent, published by Tina D under CC-BY 2.0]

Splitting Into Modules

Now it’s time to really get to know Jigsaw and split that monolith up into separate modules.

Made-up Rationale

The “surprise API”, i.e. Surprise and SurpriseFactory, is a great success and we want to separate it from the monolith.

The factories that create the surprises turn out to be very dynamic. A lot of work is being done here, they change frequently and which factories are used differs from release to release. So we want to isolate them.

At the same time we plan to create a large Christmas application of which the calendar is only one part. So we’d like to have a separate module for that as well.

We end up with these modules:

  • surprise – Surprise and SurpriseFactory
  • calendar – the calendar, which uses the surprise API
  • factories – the SurpriseFactory implementations
  • main – the original application, now hollowed out to the class Main

Looking at their dependencies we see that surprise depends on no other module. Both calendar and factories make use of its types so they must depend on it. Finally, main uses the factories to create the calendar so it depends on both.

[Diagram: jigsaw-hands-on-splitting-into-modules]

Implementation

The first step is to reorganize the source code. We’ll stick with the directory structure as proposed by the official quick start guide and have all of our modules in their own folders below src:

src
  - advent.calendar: the "calendar" module
      - org ...
      module-info.java
  - advent.factories: the "factories" module
      - org ...
      module-info.java
  - advent.surprise: the "surprise" module
      - org ...
      module-info.java
  - advent: the "main" module
      - org ...
      module-info.java
.gitignore
compileAndRun.sh
LICENSE
README

To keep this readable I truncated the folders below org. What’s missing are the packages and eventually the source files for each module. See it on GitHub in its full glory.

Let’s now see what those module infos have to contain and how we can compile and run the application.

surprise

There are no requires clauses as surprise has no dependencies. (Except for java.base, which is always implicitly required.) It exports the package advent.surprise because that contains the two classes Surprise and SurpriseFactory.

So the module-info.java looks as follows:

module advent.surprise {
	// requires no other modules
	// publicly accessible packages
	exports advent.surprise;
}

Compiling and packaging is very similar to the previous section. It is in fact even easier because surprise contains no main class:

# compile
javac -d classes/advent.surprise ${source files}
# package
jar -c --file=mods/advent.surprise.jar ${compiled class files}

calendar

The calendar uses types from the surprise API so the module must depend on surprise. Adding requires advent.surprise to the module declaration achieves this.

The module’s API consists of the class Calendar. For it to be publicly accessible the containing package advent.calendar must be exported. Note that CalendarSheet, private to the same package, will not be visible outside the module.

But there is an additional twist: We just made Calendar.createWithSurprises(List<SurpriseFactory>) publicly available, which exposes types from the surprise module. So unless modules reading calendar also require surprise, Jigsaw will prevent them from accessing these types, which would lead to compile and runtime errors.

Marking the requires clause as public fixes this. With it any module that depends on calendar also reads surprise. This is called implied readability.

The final module-info looks as follows:

module advent.calendar {
	// required modules
	requires public advent.surprise;
	// publicly accessible packages
	exports advent.calendar;
}

Compilation is almost like before but the dependency on surprise must of course be reflected here. For that it suffices to point the compiler to the directory mods as it contains the required module:

# compile (point to folder with required modules)
javac -mp mods \
	-d classes/advent.calendar \
	${source files}
# package
jar -c \
	--file=mods/advent.calendar.jar \
	${compiled class files}

factories

The factories implement SurpriseFactory so this module must depend on surprise. And since they return instances of Surprise from published methods, the same line of thought as above leads to a requires public clause.

The factories can be found in the package advent.factories so that must be exported. Note that the public class AbstractSurpriseFactory, which is found in another package, is not accessible outside this module.

So we get:

module advent.factories {
	// required modules
	requires public advent.surprise;
	// publicly accessible packages
	exports advent.factories;
}

Compilation and packaging are analogous to calendar.

main

Our application requires the two modules calendar and factories to compile and run. It still has no API to export.

module advent {
	// required modules
	requires advent.calendar;
	requires advent.factories;
	// no exports
}

Compiling and packaging is like with last section’s single module except that the compiler needs to know where to look for the required modules:

# compile
javac -mp mods \
	-d classes/advent \
	${source files}
# package
jar -c \
	--file=mods/advent.jar \
	--main-class=advent.Main \
	${compiled class files}
# run
java -mp mods -m advent

Services

Jigsaw enables loose coupling by implementing the service locator pattern, where the module system itself acts as the locator. Let’s see how that goes.

Made-up Rationale

Somebody recently read a blog post about how cool loose coupling is. Then she looked at our code from above and complained about the tight relationship between main and factories. Why would main even know factories?

Because…

public static void main(String[] args) {
	List<SurpriseFactory> surpriseFactories = Arrays.asList(
			new ChocolateFactory(),
			new QuoteFactory()
	);
	Calendar calendar =
		Calendar.createWithSurprises(surpriseFactories);
	System.out.println(calendar.asText());
}

Really? Just to instantiate some implementations of a perfectly fine abstraction (the SurpriseFactory)?

And we know she’s right. Having someone else provide us with the implementations would remove the direct dependency. Even better, if said middleman would be able to find all implementations on the module path, the calendar’s surprises could easily be configured by adding or removing modules before launching.

This is indeed possible with Jigsaw. We can have a module specify that it provides implementations of an interface. Another module can express that it uses said interface and find all implementations with the ServiceLoader.

We use this opportunity to split factories into chocolate and quote and end up with these modules and dependencies:

  • surprise – Surprise and SurpriseFactory
  • calendar – the calendar, which uses the surprise API
  • chocolate – the ChocolateFactory as a service
  • quote – the QuoteFactory as a service
  • main – the application; no longer requires individual factories

[Diagram: jigsaw-hands-on-services]

Implementation

The first step is to reorganize the source code. The only change from before is that src/advent.factories is replaced by src/advent.factory.chocolate and src/advent.factory.quote.

Let’s look at the individual modules.

surprise and calendar

Both are unchanged.

chocolate and quote

Both modules are identical except for some names. Let’s look at chocolate because it’s more yummy.

As before with factories, the module requires public the surprise module.

More interesting are its exports. It provides an implementation of SurpriseFactory, namely ChocolateFactory, which is specified as follows:

provides advent.surprise.SurpriseFactory
	with advent.factory.chocolate.ChocolateFactory;

Since this class is the entirety of its public API it does not need to export anything else. Hence no other export clause is necessary.

We end up with:

module advent.factory.chocolate {
	// list the required modules
	requires public advent.surprise;
	// specify which class provides which service
	provides advent.surprise.SurpriseFactory
		with advent.factory.chocolate.ChocolateFactory;
}

Compilation and packaging is straightforward:

javac -mp mods \
	-d classes/advent.factory.chocolate \
	${source files}
jar -c \
	--file mods/advent.factory.chocolate.jar \
	${compiled class files}

main

The most interesting part about main is how it uses the ServiceLoader to find implementations of SurpriseFactory. From its main method:

List<SurpriseFactory> surpriseFactories = new ArrayList<>();
ServiceLoader.load(SurpriseFactory.class)
	.forEach(surpriseFactories::add);

Our application now only requires calendar but must specify that it uses SurpriseFactory. It has no API to export.

module advent {
	// list the required modules
	requires advent.calendar;
	// list the used services
	uses advent.surprise.SurpriseFactory;
	// exports no functionality
}

Compilation and execution are like before.

And we can indeed change the surprises the calendar will eventually contain by simply removing one of the factory modules from the module path. Neat!

Summary

So that’s it. We have seen how to move a monolithic application into a single module and how we can split it up into several. We even used a service locator to decouple our application from concrete implementations of services. All of this is on GitHub so check it out to see more code!

But there is lots more to talk about! Jigsaw brings a couple of incompatibilities but also the means to solve many of them. And we haven’t talked about how reflection interacts with the module system and how to migrate external dependencies.

If these topics interest you, watch this tag as I will surely write about them over the coming months.

The post Jigsaw Hands-On Guide appeared first on blog@CodeFX.


Beware Of findFirst() And findAny()


After filtering a Java 8 Stream it is common to use findFirst() or findAny() to get the element that survived the filter. But that might not do what you really meant and subtle bugs can ensue.

So What’s Wrong With findFirst() And findAny()?

As we can see from their Javadoc (here and here) both methods return an arbitrary element from the stream – unless the stream has an encounter order, in which case findFirst() returns the first element. Easy.

A simple example looks like this:

public Optional<Customer> findCustomer(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.findFirst();
}

Of course this is just the fancy version of the good old for-each-loop:

public Optional<Customer> findCustomer(String customerId) {
	for (Customer customer : customers)
		if (customer.getId().equals(customerId))
			return Optional.of(customer);
	return Optional.empty();
}

But both variants contain the same potential bug: they are built on the implicit assumption that there can only be one customer with any given ID.

Now, this might be a very reasonable assumption. Maybe this is a known invariant, guarded by dedicated parts of the system, relied upon by others. In that case this is totally fine.


But in many cases I see out in the wild, it is not. Maybe the customers were just loaded from an external source that makes no guarantees about the uniqueness of their IDs. Maybe an existing bug allowed two books with the same ISBN. Maybe the search term allows surprisingly many unforeseen matches (did anyone say regular expressions?).

Often the code’s correctness relies on the assumption that there is a unique element matching the criteria but it does nothing to enforce or assert this.

Worse, the misbehavior is entirely data-driven, which might hide it during testing. Unless we have this scenario in mind, we might simply overlook it until it manifests in production.

Even worse, it fails silently! If the assumption that there is only one such element proves to be wrong, we won’t notice this directly. Instead the system will misbehave subtly for a while before the effects are observed and the cause can be identified.

So of course there is nothing inherently wrong with findFirst() and findAny(). But it is easy to use them in a way that leads to bugs within the modeled domain logic.

[Image: stream-findfirst-findany-reduce, published by Steven Depolo under CC-BY 2.0]

Failing Fast

So let’s fix this! Say we’re pretty sure that there’s at most one matching element and we would like the code to fail fast if there isn’t. With a loop we have to manage some ugly state and it would look as follows:

public Optional<Customer> findOnlyCustomer(String customerId) {
	boolean foundCustomer = false;
	Customer resultCustomer = null;
	for (Customer customer : customers)
		if (customer.getId().equals(customerId))
			if (!foundCustomer) {
				foundCustomer = true;
				resultCustomer = customer;
			} else {
				throw new DuplicateCustomerException();
			}

	return foundCustomer
			? Optional.of(resultCustomer)
			: Optional.empty();
}

Now, streams give us a much nicer way. We can use the often neglected reduce, about which the documentation says:

Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an Optional describing the reduced value, if any. This is equivalent to:

boolean foundAny = false;
T result = null;
for (T element : this stream) {
    if (!foundAny) {
        foundAny = true;
        result = element;
    }
    else
        result = accumulator.apply(result, element);
}
return foundAny ? Optional.of(result) : Optional.empty();

but is not constrained to execute sequentially.

Doesn’t that look similar to our loop above?! Crazy coincidence…

So all we need is an accumulator that throws the desired exception as soon as it is called:

public Optional<Customer> findOnlyCustomerWithId_manualException(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.reduce((element, otherElement) -> {
				throw new DuplicateCustomerException();
			});
}

This looks a little strange but it does what we want. To make it more readable, we should put it into a Stream utility class and give it a nice name:

public static <T> BinaryOperator<T> toOnlyElement() {
	return toOnlyElementThrowing(IllegalArgumentException::new);
}

public static <T, E extends RuntimeException> BinaryOperator<T>
toOnlyElementThrowing(Supplier<E> exception) {
	return (element, otherElement) -> {
		throw exception.get();
	};
}

Now we can call it as follows:

// if a generic exception is fine
public Optional<Customer> findOnlyCustomer(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.reduce(toOnlyElement());
}

// if we want a specific exception
public Optional<Customer> findOnlyCustomer(String customerId) {
	return customers.stream()
			.filter(customer -> customer.getId().equals(customerId))
			.reduce(toOnlyElementThrowing(DuplicateCustomerException::new));
}

How is that for intention revealing code?


It should be noted that, unlike findFirst() and findAny(), this is of course no short-circuiting operation and will materialize the entire stream – that is, if there is indeed only one matching element. The processing of course stops as soon as a second element is encountered.

Reflection

We have seen how findFirst() and findAny() do not suffice to express the assumption that there is at most one element left in the stream. If we want to express that assumption and make sure the code fails fast if it is violated, we need to reduce(toOnlyElement()).

You can find the code on GitHub and use it as you like – it is in the public domain.

Thanks to Boris Terzic for making me aware of this intention mismatch in the first place.

The post Beware Of findFirst() And findAny() appeared first on blog@CodeFX.

Implied Readability


The hands-on guide to Jigsaw brushes past a feature I would like to discuss in more detail: implied readability. With it, a module can reexport another module’s API to its own dependents.

Overview

This post is based on a section of an article I’ve recently written for InfoQ. If you are interested in a Jigsaw walkthrough, you should read the entire piece.

All non-attributed quotes are from the excellent State Of The Module System.

Definition Of (Implied) Readability

A module’s dependency on another module can take two forms.

Recap: Readability

First, there are dependencies that are consumed internally without the outside world having any knowledge of them. In that case, the dependent module depends upon another but this relationship is invisible to other modules.

Take, for example, Guava, where the code depending on a module does not care at all whether it internally uses immutable lists or not.

[Diagram: implied-readability-requires]

This is the most common case and it is covered by the concept of readability:

When one module depends directly upon another […] then code in the first module will be able to refer to types in the second module. We therefore say that the first module reads the second or, equivalently, that the second module is readable by the first.

Here, a module can only access another module’s API if it declares its dependency on it. So if a module depends on Guava, other modules are left in the dark about that and would not have access to Guava without declaring their own explicit dependencies on it.

Implied Readability

But there is another use case where the dependency is not entirely encapsulated, but lives on the boundary between modules. In that scenario one module depends on another, and exposes types from the depended-upon module in its own public API.

In the example of Guava a module’s exposed methods might expect or return immutable lists.

[Diagram: implied-readability-requires-public]

So code that wants to call the dependent module might have to use types from the depended-upon module. But it can’t do that if it does not also read the second module. Hence for the dependent module to be at all usable, client modules would all have to explicitly depend on that second module as well. Identifying and manually resolving such hidden dependencies would be a tedious and error-prone task.

This is where implied readability comes in:

[We] extend module declarations so that one module can grant readability to additional modules, upon which it depends, to any module that depends upon it. Such implied readability is expressed by including the public modifier in a requires clause.

In the example of a module’s public API using immutable lists, the module would publicly require Guava, thus granting readability to Guava to all other modules depending on it. This way, its API is immediately usable.

Examples

From The JDK

Let’s look at the java.sql module. It exposes the interface Driver, which returns a Logger via its public method getParentLogger(). Logger belongs to java.logging. Because of that java.sql publicly requires java.logging, so any module using Java’s SQL features can also access the logging API.

So the module descriptor of java.sql might look as follows:

module java.sql {
	requires public java.logging;
	requires java.xml;
	// exports ...
}

From The Jigsaw Advent Calendar

The calendar contains a module advent.calendar, which holds a list of 24 surprises, presenting one on each day. Surprises are part of the advent.surprise module. So far this looks like an open and shut case for a regular requires clause.

But in order to create a calendar we need to pass factories for the different kinds of surprises to the calendar’s static factory method, which is part of the module’s public API. So we used implied readability to ensure that modules using the calendar would not have to explicitly require the surprise module.

module org.codefx.demo.advent.calendar {
	requires public org.codefx.demo.advent.surprise;
	// exports ...
}

[Image: implied-readability, published by Peter Hopper under CC-BY-NC 2.0]

Beyond Module Boundaries

The State Of The Module System recommends when to use implied readability:

In general, if one module exports a package containing a type whose signature refers to a package in a second module then the declaration of the first module should include a requires public dependence upon the second. This will ensure that other modules that depend upon the first module will automatically be able to read the second module and, hence, access all the types in that module’s exported packages.

But how far should we take this?

Looking back on the example of java.sql, should a module using it require java.logging as well? Technically such a declaration is not needed and might seem redundant.

To answer this question we have to look at how exactly our fictitious module uses java.logging. It might only need to read it so we are able to call Driver.getParentLogger(), change the logger’s log level and be done with it. In this case our code’s interaction with java.logging happens in the immediate vicinity of its interaction with Driver from java.sql. Above we called this the boundary between two modules.

Alternatively our module might actually use logging throughout its own code. Then, types from java.logging appear in many places independent of Driver and can no longer be considered to be limited to the boundary of our module and java.sql.

A similar juxtaposition can be created for our advent calendar: Does the main module advent, which requires advent.calendar, only use advent.surprise for the surprise factories that it needs to create the calendar? Or does it have a use for the surprise module independently of its interaction with the calendar?


With Jigsaw being cutting edge, the community still has time to discuss such topics and agree on recommended practices. My take is that if a module is used on more than just the boundary to another module, it should be explicitly required. This approach clarifies the system’s structure and also future-proofs the module declaration for various refactorings.
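Applied to the java.sql example, that recommendation would translate into a declaration like this sketch (the application module name is made up):

module com.example.app {
	requires java.sql;
	// explicit even though java.sql implies readability of
	// java.logging, because we use logging throughout the module
	requires java.logging;
}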

Aggregation And Decomposition

Implied readability enables some interesting techniques. They rely on the fact that with it a client can consume various modules’ APIs without explicitly depending on them if it instead depends on a module that publicly requires the used ones.


One technique is the creation of so-called aggregator modules, which contain no code on their own but aggregate a number of other APIs for easier consumption. This is already being employed by the Jigsaw JDK, which models compact profiles as modules that simply expose the very modules whose packages are part of the profile.
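A minimal sketch of such an aggregator, reusing the advent modules from the hands-on guide (the module name advent.api is made up):

module advent.api {
	// an aggregator contains no code of its own;
	// it merely reexports the bundled modules
	requires public advent.surprise;
	requires public advent.calendar;
}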

Another is what Alex Buckley calls downward decomposability: A module can be decomposed into more specialized modules without compatibility implications if it turns into an aggregator for the new modules.

But creating aggregator modules brings clients into the situation where they internally use APIs of modules on which they don’t explicitly depend. This can be seen as conflicting with what we said above, i.e. that implied readability should only be used on the boundary to other modules. But I think the situation is subtly different here.

Aggregator modules have a specific responsibility: to bundle the functionality of related modules into a single unit. Modifying the bundle’s content is a pivotal change. “Regular” implied readability, on the other hand, will often manifest between not immediately related modules (as with java.sql and java.logging), where the implied module is used more incidentally.

This is somewhat similar to the distinction between composition and aggregation but (a) it’s different and (b), lamentably, aggregator modules would be more on the side of composition. I’m happy to hear ideas on how to precisely express the difference.

Reflection

We have seen how implied readability can be used to make a module’s public API immediately usable, even if it contains types from another module. It enables aggregator modules and downwards decomposability.

We discussed how far we should take implied readability and I opined that a module should only lean on implied readability if it merely uses the implied module’s API on the boundary to a module it explicitly depends on. This does not touch on aggregator modules as they use the mechanism for a different purpose.

The post Implied Readability appeared first on blog@CodeFX.

How To Implement equals Correctly


I wrote this article for SitePoint's Java channel, where you can find a lot of interesting articles about our favorite programming language. Check it out!

A fundamental aspect of any Java class is its definition of equality. It is determined by a class’s equals method and there are a couple of things to be considered for a correct implementation. Let’s check ’em out so we get it right!

Note that implementing equals always means that hashCode has to be implemented as well! We’ll cover that in a separate article so make sure to read it after this one.

Identity Versus Equality

Have a look at this piece of code:

String some = "some string";
String other = "other string";

We have two strings and they are obviously different.

What about these two?

String some = "some string";
String other = some;
boolean identical = some == other;

Here we have only one String instance and some and other both reference it. In Java we say some and other are identical and, accordingly, identical is true.

What about this one?

String some = "some string";
String other = "some string";
boolean identical = some == other;

Now, some and other point to different instances and are no longer identical, so identical is false. (We’ll ignore String interning in this article; if this bugs you, assume every string literal were wrapped in a new String(...).)

But they do have some relationship as they both “have the same value”. In Java terms, they are equal, which is checked with equals:

String some = "some string";
String other = "some string";
boolean equal = some.equals(other);

Here, equals is true.

A variable’s Identity (also called Reference Equality) is defined by the reference it holds. If two variables hold the same reference they are identical. This is checked with ==.

A variable’s Equality is defined by the value it references. If two variables reference the same value, they are equal. This is checked with equals.

But what does “the same value” mean? It is, in fact, the implementation of equals that determines “sameness”. The equals method is defined in Object and since all classes inherit from it, all have that method.

The implementation in Object checks identity (note that identical variables are equal as well), but many classes override it with something more suitable. For strings, for example, it compares the character sequence and for dates it makes sure that both point to the same day.

Many data structures, most notably Java’s own collection framework, use equals to check whether they contain an element.

For example:

List<String> list = Arrays.asList("a", "b", "c");
boolean contains = list.contains("b");

The variable contains is true because, while the instances of “b” are not identical, they are equal.

(This is also the point where hashCode comes into play.)

Thoughts on Equality

Any implementation of equals must adhere to a specific contract or the class’s equality is ill-defined and all kinds of unexpected things happen. We will look at the formal definition in a moment but let’s first discuss some properties of equality.

It might help to think about it as we encounter it in our daily lives. Let’s say we compare laptops and consider them equal if they have the same hardware specifications.

  1. One property is so trivial that it is hardly worth mentioning: Each thing is equal to itself. Duh.
  2. There is another, which is not much more inspiring: If one thing is equal to another, the other is also equal to the first. Clearly if my laptop is equal to yours, yours is equal to mine.
  3. This one is more interesting: If we have three things and the first and second are equal and the second and third are equal, then the first and third are also equal. Again, this is obvious in our laptop example.

That was an exercise in futility, right? Not so! We just worked through some basic algebraic properties of equivalence relations. No wait, don’t leave! That’s already all we need. Because any relation that has the three properties above can be called an equality.

Yes, any way we can make up that compares things and has the three properties above, could be how we determine whether those things are equal. Conversely, if we leave anything out, we no longer have a meaningful equality.

The equals Contract

The equals contract is little more than a formalization of what we saw above. To quote the source:

The equals method implements an equivalence relation on non-null object references:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  • It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  • For any non-null reference value x, x.equals(null) should return false.

By now, the first three should be very familiar. The other points are more of a technicality: Without consistency data structures behave erratically and being equal to null not only makes no sense but would complicate many implementations.

Implementing equals

For a class Person with string fields firstName and lastName, this would be a common variant to implement equals:

@Override
public boolean equals(Object o) {
    // self check
    if (this == o)
        return true;
    // null check
    if (o == null)
        return false;
    // type check and cast
    if (getClass() != o.getClass())
        return false;
    Person person = (Person) o;
    // field comparison
    return Objects.equals(firstName, person.firstName)
            && Objects.equals(lastName, person.lastName);
}

Let’s go through it one by one.

Signature

It is very important that equals takes an Object! Otherwise, unexpected behavior occurs.

For example, assume that we would implement equals(Person) like so:

public boolean equals(Person person) {
    return Objects.equals(firstName, person.firstName)
            && Objects.equals(lastName, person.lastName);
}

What happens in a simple example?

Person elliot = new Person("Elliot", "Alderson");
Person mrRobot = new Person("Elliot", "Alderson");
boolean equal = elliot.equals(mrRobot);

Then equal is true. What about now?

Person elliot = new Person("Elliot", "Alderson");
Object mrRobot = new Person("Elliot", "Alderson");
boolean equal = elliot.equals(mrRobot);

Now it’s false. Wat?! Maybe not quite what we expected.

The reason is that Java called Person.equals(Object) (as inherited from Object, which checks identity). Why?

Java’s strategy for choosing which overloaded method to call is not based on the parameter’s runtime type but on its declared type. (Which is a good thing because otherwise static code analysis, like call hierarchies, would not work.) So if mrRobot is declared as an Object, Java calls Person.equals(Object) instead of our Person.equals(Person).

Note that most code, for example all collections, handles our persons as objects and thus always calls equals(Object). So we better make sure we provide an implementation with that signature! We can of course create a specialized equals implementation and call it from our more general one if we like that better.

Self Check

Equality is a fundamental property of any class and it might end up being called very often, for example in tight loops querying a collection. Thus, its performance matters! And the self check at the beginning of our implementation is just that: a performance optimization.

if (this == o)
    return true;

It might look like it should implement reflexivity but the checks further down would be very strange if they did not also do that.

Null Check

No instance should be equal to null, so here we go making sure of that. At the same time, it guards the code from NullPointerExceptions.

if (o == null)
    return false;

It can actually be included in the following check, like so:

if (o == null || getClass() != o.getClass())
    return false;

Type Check and Cast

Next thing, we have to make sure that the instance we’re looking at is actually a person. This is another tricky detail.

if (getClass() != o.getClass())
    return false;
Person person = (Person) o;

Our implementation uses getClass, which returns the classes to which this and o belong. It requires them to be identical! This means that if we had a class Employee extends Person, then Person.equals(Employee) would never return true – not even if both had the same names.

This might be unexpected.

That an extending class with new fields does not compare well may be reasonable, but if that extension only adds behavior (maybe logging or other non-functional details), it should be able to equal instances of its supertype. This becomes especially relevant if a framework spins new subtypes at runtime (e.g. Hibernate or Spring), which could then never be equal to instances we created.

An alternative is the instanceof operator:

if (!(o instanceof Person))
    return false;

Instances of subtypes of Person pass that check. Hence they continue to the field comparison (see below) and may turn out to be equal. This solves the problems we mentioned above but opens a new can of worms.

Say Employee extends Person and adds an additional field. If it overrides the equals implementation it inherits from Person and includes the extra field, then person.equals(employee) can be true (because of instanceof) but employee.equals(person) can’t (because person misses that field). This clearly violates the symmetry requirement.
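To make the asymmetry tangible, here is a minimal sketch. It assumes Person uses the instanceof-based check from above and has a matching constructor; the department field is made up:

class Employee extends Person {

	private final String department;

	Employee(String firstName, String lastName, String department) {
		super(firstName, lastName);
		this.department = department;
	}

	@Override
	public boolean equals(Object o) {
		if (!(o instanceof Employee))
			return false;
		Employee employee = (Employee) o;
		// includes the extra field in the comparison
		return super.equals(o)
				&& Objects.equals(department, employee.department);
	}
}

Person person = new Person("Elliot", "Alderson");
Employee employee = new Employee("Elliot", "Alderson", "E Corp");
boolean oneWay = person.equals(employee); // true: employee is a Person with equal names
boolean otherWay = employee.equals(person); // false: person is no Employee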

There seems to be a way out of this: Employee.equals could check whether it compares to an instance with that field and use it only then (this is occasionally called slice comparison).

But this doesn’t work either because it breaks transitivity:

Person foo = new Person("Mr", "Foo");
Employee fu = new Employee("Mr", "Foo", "Marketing");
Employee fuu = new Employee("Mr", "Foo", "Engineering");

Obviously all three instances share the same name, so foo.equals(fu) and foo.equals(fuu) are true. By transitivity fu.equals(fuu) should also be true but it isn’t if the third field, apparently the department, is included in the comparison.

There is really no way to make slice comparison work without violating reflexivity or, and this is trickier to analyze, transitivity. (If you think you found one, check again. Then let your coworkers check. If you are still sure, ping me. ;) )

So we end with two alternatives:

  • Use getClass and be aware that instances of the type and its subtypes can never be equal.
  • Use instanceof but make equals final because there is no way to override it correctly.

Which one makes more sense really depends on the situation. Personally, I prefer instanceof because its problems (can not include new fields in inherited classes) occur at declaration site, not at use site.

Field Comparison

Wow, that was a lot of work! And all we did was solve some corner cases! So let’s finally get to the test’s core: comparing fields.

This is pretty simple, though. In the vast majority of cases, all there is to do is to pick the fields that should define a class’s equality and then compare them. Use == for primitives and equals for objects.

If any of the fields could be null, the extra checks considerably reduce the code’s readability:

return (firstName == person.firstName
        || firstName != null && firstName.equals(person.firstName))
    && (lastName == person.lastName
            || lastName != null && lastName.equals(person.lastName));

And this already uses the non-obvious fact that null == null is true.

It is much better to use Java’s utility method Objects.equals (or, if you’re not yet on Java 7, Guava’s Objects.equal):

return Objects.equals(firstName, person.firstName)
        && Objects.equals(lastName, person.lastName);

It does exactly the same checks but is much more readable.

Summary

We have discussed the difference between identity (must be the same reference; checked with ==) and equality (can be different references to “the same value”; checked with equals) and went on to take a close look at how to implement equals.

Let’s put those pieces back together:

  • Make sure to override equals(Object) so our method is always called.
  • Include a self and null check for an early return in simple edge cases.
  • Use getClass to allow subtypes their own implementation (but no comparison across subtypes) or use instanceof and make equals final (and subtypes can be equal).
  • Compare the desired fields using Objects.equals.

Or let your IDE generate it all for you and edit where needed.

Final Words

We have seen how to properly implement equals (and will soon look at hashCode). But what if we are using classes that we have no control over? What if their implementations of these methods do not suit our needs or are plain wrong?

LibFX to the rescue! It contains transforming collections and one of their features is to allow the user to specify the equals and hashCode methods she needs.

The post How To Implement equals Correctly appeared first on blog@CodeFX.

How To Implement hashCode Correctly


I wrote this article for SitePoint's Java channel, where you can find a lot of interesting articles about our favorite programming language. Check it out!

So you’ve decided that identity isn’t enough for you and wrote a nice equals implementation? Great! But now you have to implement hashCode as well.

Let’s see why and how to do it correctly.

Equality and Hash Code

While equality makes sense from a general perspective, hash codes are much more technical. If we were being a little hard on them, we could say that they are just an implementation detail to improve performance.

Most data structures use equals to check whether they contain an element. For example:

List<String> list = Arrays.asList("a", "b", "c");
boolean contains = list.contains("b");

The variable contains is true because, while instances of “b” are not identical (again, ignoring String interning), they are equal.

Comparing every element with the instance given to contains is wasteful, though, and a whole class of data structures uses a more performant approach. Instead of comparing the requested instance with each element they contain, they use a shortcut that reduces the number of potentially equal instances and then only compare those.

This shortcut is the hash code, which can be seen as an object’s equality boiled down to an integer value. Instances with the same hash code are not necessarily equal but equal instances have the same hash code. (Or should have; we will discuss this shortly.) Such data structures are often named after this technique, recognizable by the Hash in their name, with HashMap the most notable representative.

This is how they generally work:

  • When an element is added, its hash code is used to compute the index in an internal array (called a bucket).
  • If other, non-equal elements have the same hash code, they end up in the same bucket and must be bundled together, e.g. by adding them to a list.
  • When an instance is given to contains, its hash code is used to compute the bucket. Only elements therein are compared to the instance.

This way, very few, ideally no equals comparisons are required to implement contains.

Like equals, hashCode is defined on Object.

Thoughts on Hashing

If hashCode is used as a shortcut to determine equality, then there is really only one thing we should care about: Equal objects should have the same hash code.

This is also why, if we override equals, we must create a matching hashCode implementation! Otherwise things that are equal according to our implementation would likely not have the same hash code because they use Object’s implementation.

The hashCode Contract

Quoting the source:

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The first bullet mirrors the consistency property of equals and the second is the requirement we came up with above. The third states an important detail that we will discuss in a moment.

Implementing hashCode

A very easy implementation of Person.hashCode is the following:

@Override
public int hashCode() {
    return Objects.hash(firstName, lastName);
}

The person’s hash code is computed by computing the hash codes for the relevant fields and combining them. Both are left to Objects’ utility function hash.

Selecting Fields

But which fields are relevant? The requirements help answer this: If equal objects must have the same hash code, then hash code computation should not include any field that is not used for equality checks. (Otherwise two objects that only differ in those fields would be equal but have different hash codes.)

So the set of fields used for hashing should be a subset of the fields used for equality. By default both will use the same fields but there are a couple of details to consider.

Consistency

For one, there is the consistency requirement. It should be interpreted rather strictly. While it allows the hash code to change if some fields change (which is often unavoidable with mutable classes), hashing data structures are not prepared for this scenario.

As we have seen above the hash code is used to determine an element’s bucket. But if the hash-relevant fields change, the hash is not recomputed and the internal array is not updated.

This means that a later query with an equal object or even with the very same instance fails! The data structure computes the current hash code, different from the one used to store the instance, and goes looking in the wrong bucket.
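A sketch of that failure mode, assuming a mutable variant of Person with a hypothetical setLastName method and a hashCode that includes lastName:

Set<Person> people = new HashSet<>();
Person person = new Person("Elliot", "Alderson");
people.add(person);

// changing a hash-relevant field leaves the instance in its old bucket
person.setLastName("Robot");

// the set computes the new hash code and looks in the wrong bucket
boolean contains = people.contains(person); // false!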

Conclusion: Better not use mutable fields for hash code computation!

Performance

Hash codes might end up being computed about as often as equals is called. This can very well happen in performance critical parts of the code so it makes sense to think about performance. And unlike equals there is a little more wiggle room to optimize it.

Unless sophisticated algorithms are used or many, many fields are involved, the arithmetic cost of combining their hash codes is as negligible as it is unavoidable. But it should be considered whether all fields need to be included in the computation! Particularly collections should be viewed with suspicion. Lists and sets, for example, will compute the hash for each of their elements. Whether calling them is necessary should be considered on a case-by-case basis.

If performance is critical, using Objects.hash might not be the best choice either because it requires the creation of an array for its varargs.

But the general rule about optimization holds: Don’t do it prematurely! Use a common hash code algorithm, maybe forego including the collections, and only optimize after profiling showed potential for improvement.

Collisions

Going all-in on performance, what about this implementation?

@Override
public int hashCode() {
    return 0;
}

It’s fast, that’s for sure. And equal objects will have the same hash code so we’re good on that, too. As a bonus, no mutable fields are involved!

But remember what we said about buckets? This way all instances will end up in the same one! This will typically result in a linked list holding all the elements, which is terrible for performance. Each contains, for example, triggers a linear scan of the list.

So what we want is as few items in the same bucket as possible! An algorithm that returns wildly varying hash codes, even for very similar objects, is a good start.

How to get there partly depends on the selected fields. The more details we include in the computation, the more likely it is for the hash codes to differ. Note how this is completely opposite to our thoughts about performance. So, interestingly enough, using too many or too few fields can result in bad performance.

The other part to preventing collisions is the algorithm that is used to actually compute the hash.

Computing The Hash

The easiest way to compute a field’s hash code is to just call hashCode on it. Combining them could be done manually. A common algorithm is to start with some arbitrary number and to repeatedly multiply it with another (often a small prime) before adding a field’s hash:

int prime = 31;
int result = 1;
result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
return result;

This might result in overflows, which is not particularly problematic because they cause no exceptions in Java.

Note that even great hashing algorithms might result in uncharacteristically frequent collisions if the input data has specific patterns. As a simple example assume we would compute the hash of points by adding their x- and y-coordinates. May not sound too bad until we realize that we often deal with points on the line f(x) = -x, which means x + y == 0 for all of them. Collisions galore!
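A quick sketch of that collision pattern (the Point class is made up for illustration):

class Point {

	final int x, y;

	Point(int x, int y) {
		this.x = x;
		this.y = y;
	}

	@Override
	public int hashCode() {
		// fine at first glance, but collides for all
		// points on the line f(x) = -x
		return x + y;
	}
}

// all of these produce hash code 0 and land in the same bucket
new Point(1, -1).hashCode();
new Point(2, -2).hashCode();
new Point(42, -42).hashCode();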

But again: Use a common algorithm and don’t worry until profiling shows that something isn’t right.

Summary

We have seen that computing hash codes is something like compressing equality to an integer value: Equal objects must have the same hash code and for performance reasons it is best if as few non-equal objects as possible share the same hash.

This means that hashCode must always be overridden if equals is.

When implementing hashCode:

  • Use the same fields that are used in equals (or a subset thereof).
  • Better not include mutable fields.
  • Consider not calling hashCode on collections.
  • Use a common algorithm unless patterns in input data counteract them.

Remember that hashCode is about performance, so don’t waste too much energy unless profiling indicates necessity.

The post How To Implement hashCode Correctly appeared first on blog@CodeFX.

Java 9 Additions To Stream


Java 9 is coming! And it is more than just Project Jigsaw. (I was surprised, too.) It is bringing a lot of small and not-so-small changes to the platform and I’d like to look at them one by one. I’ll tag all these posts and you can find them here.

Let’s start with …

Streams

Streams learned two new tricks. The first deals with prefixes, which streams now understand. We can use a predicate to test a stream’s elements and, starting at the beginning, either take or drop them until the first fails a test.

Stream::takeWhile

Let’s look at takeWhile first:

Stream<T> takeWhile(Predicate<? super T> predicate);

Called on an ordered stream it will return a new one that consists of those elements that passed the predicate until the first one failed. It’s a little like filter but it cuts the stream off as soon as the first element fails the predicate. In its parlance, it takes elements from the stream while the predicate holds and stops as soon as it no longer does.

Let’s see an example:

Stream.of("a", "b", "c", "", "e")
	.takeWhile(s -> !s.isEmpty())
	.forEach(System.out::print);

Console: abc

Easy, right? Note how e is not part of the returned stream, even though it would pass the predicate. It is never tested, though, because takeWhile is done after the empty string.

Prefixes

Just to make sure we’re understanding the documentation, let’s get to know the terminology. A subsequence of an ordered stream that begins with the stream’s first element is called a prefix.

Stream<String> stream = Stream.of("a", "b", "c", "d", "e");
Stream<String> prefix = Stream.of("a", "b", "c");
Stream<String> subsequenceButNoPrefix = Stream.of("b", "c", "d");
Stream<String> subsetButNoPrefix = Stream.of("a", "c", "b");

The takeWhile operation will return the longest prefix that contains only elements that pass the predicate.

Prefixes can be empty so if the first element fails the predicate, it will return the empty stream. Conversely, the prefix can be the entire stream and the operation will return it if all elements pass the predicate.

Order

Talking of prefixes only makes sense for ordered streams. So what happens for unordered ones? As so often with streams, the behavior is deliberately unspecified to enable performant implementations.

Taking from an unordered stream will return an arbitrary subset of those elements that pass the predicate. Except if all of them do, then it always returns the entire stream.

Concurrency

Taking from an ordered parallel stream is not the best idea. The different threads have to cooperate to ensure that the longest prefix is returned. This overhead can degrade performance to the point where it makes more sense to make the stream sequential.

Stream::dropWhile

Next is dropWhile:

Stream<T> dropWhile(Predicate<? super T> predicate);

It does just the opposite of takeWhile: Called on an ordered stream it will return a new one that consists of the first element that failed the predicate and all following ones. Or, closer to its name, it drops elements while the predicate holds and returns the rest.

Time for an example:

Stream.of("a", "b", "c", "de", "f")
	.dropWhile(s -> s.length() <= 1)
	.forEach(System.out::print);

Console: def

Note that the stream contains f even though it would pass the predicate. As before, the operation stops testing after the first string fails the predicate, in this case de.

Called on an unordered stream the operation will drop a subset of those elements that pass the predicate. Except if all of them do, in which case it will always return an empty stream. Everything else we said above about terminology and concurrency applies here as well.

Stream::ofNullable

That one’s really trivial. Instead of talking about it, let’s see it in action:

long one = Stream.ofNullable("42").count();
long zero = Stream.ofNullable(null).count();

You got it, right? It creates a stream with the given element unless it is null, in which case the stream is empty. Yawn!

It has its use cases, though. Before, if some evil API gave you an instance that could be null, it was circuitous to start operating on a stream that instance could provide:

// findCustomer can return null
Customer customer = findCustomer(customerId);

Stream<Order> orders = customer == null
	? Stream.empty()
	: customer.streamOrders();
// do something with stream of orders ...

// alternatively, for the Optional lovers
Optional.ofNullable(customer)
	.map(Customer::streamOrders)
	.orElse(Stream.empty())
	. // do something with stream of orders

This gets much better now:

// findCustomer can return null
Customer customer = findCustomer(customerId);

Stream.ofNullable(customer)
	.flatMap(Customer::streamOrders)
	. // do something with stream of orders

Reflection

We’ve seen how takeWhile will return elements that pass the predicate and cut the stream off when the first element fails it. Conversely, dropWhile will also cut the stream when the first element fails the predicate but will return that one and all after it.

As a farewell, let’s see a final example, in which we stream all lines from an HTML file’s meta element:

Files.lines(htmlFile)
	.dropWhile(line -> !line.contains("<meta>"))
	.skip(1)
	.takeWhile(line -> !line.contains("</meta>"));

We also learned about ofNullable. I wonder why it seems so familiar? Ah yes, Optional of course! Coincidentally I will cover that next. :)

Stay tuned!

The post Java 9 Additions To Stream appeared first on blog@CodeFX.

Java 9 Additions To Optional


Wow, people were really interested in Java 9’s additions to the Stream API. Want some more? Let’s look at …

Optional

Optional::stream

This one requires no explanation:

Stream<T> stream();

Finally we get from Optional to Stream!

The first word that comes to mind is: finally! Finally we can easily get from a stream of optionals to a stream of present values!

Given a method Optional<Customer> findCustomer(String customerId) we had to do something like this:

public Stream<Customer> findCustomers(Collection<String> customerIds) {
	return customerIds.stream()
		.map(this::findCustomer)
		// now we have a Stream<Optional<Customer>>
		.filter(Optional::isPresent)
		.map(Optional::get);
}

Or this:

public Stream<Customer> findCustomers(Collection<String> customerIds) {
	return customerIds.stream()
		.map(this::findCustomer)
		.flatMap(customer -> customer.isPresent()
			? Stream.of(customer.get())
			: Stream.empty());
}

We could of course push that into a utility method (which I hope you did) but it was still not optimal.

Now, it would’ve been interesting to have Optional actually implement Stream but

  1. it doesn’t look like it has been considered when Optional was designed, and
  2. that ship has sailed since streams are lazy and Optional is not.

So the only option left was to add a method that returns a stream of either zero or one element(s). With that we again have two options to achieve the desired outcome:

public Stream<Customer> findCustomers(Collection<String> customerIds) {
	return customerIds.stream()
		.map(this::findCustomer)
		.flatMap(Optional::stream);
}

public Stream<Customer> findCustomers(Collection<String> customerIds) {
	return customerIds.stream()
		.flatMap(id -> findCustomer(id).stream());
}

It’s hard to say which I like better – both have upsides and downsides – but that’s a discussion for another post. Both look better than what we had to do before.

We can now operate lazily on Optional.

Another small detail: If we want to, we can now more easily move from eager operations on Optional to lazy operations on Stream.

public List<Order> findOrdersForCustomer(String customerId) {
	return findCustomer(customerId)
		// 'List<Order> getOrders(Customer)' is expensive;
		// this is 'Optional::map', which is eager
		.map(this::getOrders)
		.orElse(new ArrayList<>());
}

public Stream<Order> findOrdersForCustomer(String customerId) {
	return findCustomer(customerId)
		.stream()
		// this is 'Stream::map', which is lazy
		.map(this::getOrders)
		.flatMap(List::stream);
}

I think I didn’t have a use case for that yet but it’s good to keep in mind.

Optional::or

Another addition that makes me think finally! How often have you had an Optional and wanted to express “use this one; unless it is empty, in which case I want to use this other one”? Soon we can do just that:

Optional<T> or(Supplier<Optional<T>> supplier);

Say we need some customer’s data, which we usually get from a remote service. But because accessing it is expensive and we’re very clever, we have a local cache instead. Two actually, one on memory and one on disk. (I can see you cringe. Relax, it’s just an example.)

This is our local API for that:

public interface Customers {

	Optional<Customer> findInMemory(String customerId);

	Optional<Customer> findOnDisk(String customerId);

	Optional<Customer> findRemotely(String customerId);

}

Chaining those calls in Java 8 is verbose, as the sketch shows. But with Optional::or it becomes a piece of cake:

public Optional<Customer> findCustomer(String customerId) {
	return customers.findInMemory(customerId)
		.or(() -> customers.findOnDisk(customerId))
		.or(() -> customers.findRemotely(customerId));
}

Isn’t that cool?! How did we even live without it? Barely, I can tell you. Just barely.

Optional::ifPresentOrElse

This last one, I am less happy with:

void ifPresentOrElse(Consumer<? super T> action, Runnable emptyAction);

You can use it to cover both branches of an isPresent-if:

public void logLogin(String customerId) {
	findCustomer(customerId)
		.ifPresentOrElse(
			this::logLogin,
			() -> logUnknownLogin(customerId)
		);
}

Where logLogin is overloaded and also takes a customer, whose login is then logged. Similarly logUnknownLogin logs the ID of the unknown customer.

Now, why wouldn’t I like it? Because it forces me to do both at once and keeps me from chaining any further. I would have preferred this by a large margin:

Optional<T> ifPresent(Consumer<? super T> action);

Optional<T> ifEmpty(Runnable action);

The case above would look similar but better:

public void logLogin(String customerId) {
	findCustomer(customerId)
		.ifPresent(this::logLogin)
		.ifEmpty(() -> logUnknownLogin(customerId));
}

First of all, I find that more readable. Secondly it allows me to just have the ifEmpty branch if I wish to (without cluttering my code with empty lambdas). Lastly, it allows me to chain these calls further. To continue the example from above:

public Optional<Customer> findCustomer(String customerId) {
	return customers.findInMemory(customerId)
		.ifEmpty(() -> logCustomerNotInMemory(customerId))
		.or(() -> customers.findOnDisk(customerId))
		.ifEmpty(() -> logCustomerNotOnDisk(customerId))
		.or(() -> customers.findRemotely(customerId))
		.ifEmpty(() -> logCustomerNotOnRemote(customerId))
		.ifPresent(ignored -> logFoundCustomer(customerId));
}

The question that remains is the following: Is adding a return type to a method (in this case to Optional::ifPresent) an incompatible change? Not obviously but I’m currently too lazy to investigate. Do you know?

Reflection

To sum it up:

  • Use Optional::stream to map an Optional to a Stream.
  • Use Optional::or to replace an empty Optional with the result of a call returning another Optional.
  • With Optional::ifPresentOrElse you can cover both branches of an isPresent-if.

Very cool!

What do you think? I’m sure someone out there still misses his favorite operation. Tell me about it!

The post Java 9 Additions To Optional appeared first on blog@CodeFX.

Oh No, I Forgot Stream::iterate!


There I go talking about the new things that Java 9 will bring to the stream API and then I forget one: a new overload for iterate. D’oh! I updated that post but to make sure you don’t miss it, I also put it into this one.

Stream::iterate

Stream already has a method iterate. It’s a static factory method that takes a seed element of type T and a function from T to T. Together they are used to create a Stream<T> by starting with the seed element and iteratively applying the function to get the next element:

Stream.iterate(1, i -> i + 1)
	.forEach(System.out::println);
// output: 1 2 3 4 5 ...

Great! But how can you make it stop? Well, you can’t, the stream is infinite.

Or rather you couldn’t because this is where the new overload comes in. It has an extra argument in the middle: a predicate that is used to assess each element before it is put into the stream. As soon as the first element fails the test, the stream ends:

Stream.iterate(1, i -> i <= 3, i -> i + 1)
	.forEach(System.out::println);
// output: 1 2 3

As it is used above, it looks more like a traditional for loop than the more succinct but somewhat alien IntStream.rangeClosed(1, 3) (which I still prefer but YMMV). It can also come in handy to turn “iterator-like” data structures into streams, like the ancient Enumeration:

Enumeration<Integer> en = // ...
if (en.hasMoreElements()) {
	Stream.iterate(
			en.nextElement(),
			// the predicate is checked before an element is emitted,
			// so testing 'hasMoreElements' here would drop the last one;
			// a null sentinel avoids that (assuming no null elements)
			el -> el != null,
			el -> en.hasMoreElements() ? en.nextElement() : null)
		.forEach(System.out::println);
}

You could also use it to manipulate a data structure while you stream over it, like popping elements off a stack. This is not generally advisable, though, because the source may end up in a surprising state – you might want to discard it afterwards.

The post Oh No, I Forgot Stream::iterate! appeared first on blog@CodeFX.


Rebutting 5 Common Stream Tropes


I’ve just finished reading “1 Exception To The Power of JDK 8 Collectors” and I have to say that I am pretty disappointed. Simon Ritter, Java champion, former Java evangelist at Oracle, and now Deputy CTO at Azul Systems (the guys with the cool JVM), wrote it so I expected some interesting insights into streams. Instead the post comes down to:

  • use streams to reduce line count
  • you can do fancy stuff with collectors
  • exceptions in streams suck

Not only is this superficial, the article also employs a handful of substandard development practices. Now, Simon writes that this is just for a small demo project, so I guess he didn’t pour all his expertise into it. Still, it is sloppy and – and this is worse – many people out there make the same mistakes and repeat the same tropes.

Seeing them recited in many different places (even if the respective authors might not defend these points when pressed) is surely not helping developers to get a good impression of how to use streams. So I decided to take this occasion and write a rebuttal – not only to this post but to all that repeat any of the five tropes I found in it.

(Always pointing out that something is my opinion is redundant [it’s my blog, after all] and tiresome, so I won’t do it. Keep it in mind, though, because I say some things like they were facts even though they’re only my point of view.)

The Problem

There’s a lot of explanations of what’s going on and why but in the end, it comes down to this: We have a query string from an HTTP POST request and want to parse the parameters into a more convenient data structure. (For example, given a string a=foo&b=bar&a=fu we want to get something like a~>{foo,fu} b~>{bar}.)

We also have some code we found online that already does this:

private void parseQuery(String query, Map parameters)
		throws UnsupportedEncodingException {

	if (query != null) {
		String pairs[] = query.split("[&]");

		for (String pair : pairs) {
			String param[] = pair.split("[=]");
			String key = null;
			String value = null;

			if (param.length > 0) {
				key = URLDecoder.decode(param[0],
					System.getProperty("file.encoding"));
			}

			if (param.length > 1) {
				value = URLDecoder.decode(param[1],
					System.getProperty("file.encoding"));
			}

			if (parameters.containsKey(key)) {
				Object obj = parameters.get(key);

				if(obj instanceof List) {
					List values = (List)obj;
					values.add(value);
				} else if(obj instanceof String) {
					List values = new ArrayList();
					values.add((String)obj);
					values.add(value);
					parameters.put(key, values);
				}
			} else {
				parameters.put(key, value);
			}
		}
	}
}

I assume it is kindness that the author’s name is not mentioned because this snippet is wrong on so many levels that we will not even discuss it.

My Beef

From here on, the article explains how to refactor towards streams. And this is where I start to disagree.

Streams For Succinctness

This is how the refactoring is motivated:

Having looked through this I thought I could […] use streams to make it a bit more succinct.

I hate it when people put that down as the first motivation to use streams! Seriously, we’re Java developers, we are used to writing a little extra code if it improves readability.

Streams are not about succinctness

So streams are not about succinctness. On the contrary, we’re so used to loops that we’re often cramming a bunch of operations into the single body line of a loop. When refactoring towards streams I often split the operations up, thus leading to more lines.

Instead, the magic of streams is how they support mental pattern matching. Because they use only a handful of concepts (mainly map/flatMap, filter, reduce/collect/find), I can quickly see what’s going on and focus on the operations, preferably one by one.

for (Customer customer : customers) {
	if (customer.getAccount().isOverdrawn()) {
		WarningMail mail = WarningMail.createFor(customer.getAccount());
		// do something with mail
	}
}

customers.stream()
	.map(Customer::getAccount)
	.filter(Account::isOverdrawn)
	.map(WarningMail::createFor)
	.forEach(/* do something with mail */ );

In code, it is much easier to follow the generic “customers map to accounts filter overdrawn ones map to warning mails” than the convoluted “create a warning mail for an account that you got from a customer but only if it is overdrawn”.

But why would this be a reason to complain? Everybody has his or her own preferences, right? Yes, but focusing on succinctness incentivizes bad decision making.

For example, I often decide to summarize one or more operations (like successive maps) by creating a method for them and using a method reference. This can have different benefits like keeping all of the operations in my stream pipeline on the same level of abstraction or simply naming operations that would otherwise be harder to understand (you know, intention revealing names and stuff). If I focus on succinctness I might not do this.

Aiming for fewer lines of code can also lead to combining several operations into a single lambda just to save a couple of maps or filters. Again, this defeats the purpose behind streams!

So, when you see some code and think about refactoring it to streams, don’t count lines to determine your success!

Using Ugly Mechanics

The first thing the loop does is also the way to start off the stream: We split the query string along ampersands and operate on the resulting key-value-pairs. The article does it as follows:

Arrays.stream(query.split("[&]"))

Looking good? Honestly, no. I know that this is the best way to create the stream but just because we have to do it this way does not mean we have to look at it. And what we’re doing here (splitting a string along a regex) seems pretty general, too. So why not push it into a utility function?

public static Stream<String> splitIntoStream(String s, String regex) {
	return Arrays.stream(s.split(regex));
}

Then we start the stream with splitIntoStream(query, "[&]"). A simple “extract method”-refactoring but so much better.

Suboptimal Data Structures

Remember what we wanted to do? Parse something like a=foo&b=bar&a=fu to a~>{foo,fu} b~>{bar}. Now, how could we possibly represent the result? It looks like we’re mapping single strings to many strings, so maybe we should try a Map<String, List<String>>?

That is definitely a good first guess… But it is by no means the best we can do! First of all, why is it a list? Is order really important here? Do we need duplicated values? I’d guess no on both counts, so maybe we should try a set?

Anyways, if you ever created a map where values are collections, you know that this is somewhat unpleasant. There is always this edge case of “is this the first element?” to consider. Although Java 8 made that a little less cumbersome…

public void addPair(String key, String value) {
	// `map` is a `Map<String, Set<String>>`
	map.computeIfAbsent(key, k -> new HashSet<>())
			.add(value);
}

… from an API perspective it is still far from perfect. For example, iterating or streaming over all values is a two-step process:

private <T> Stream<T> streamValues() {
	// `map` could be a `Map<?, Collection<T>>`
	return map
			.values().stream()
			.flatMap(Collection::stream);
}

Bleh!

Long story short, we’re shoehorning what we need (a map from keys to many values) into the first thing we came up with (a map from keys to single values). That’s not good design!

Especially since there’s a perfect match for our needs: Guava’s Multimap. Maybe there’s a good reason not to use it but in that case it should at least be mentioned. After all, the article’s quest is to find a good way to process and represent the input, so it should do a good job in picking a data structure for the output.

(While this is a recurring theme when it comes to design in general, it is not very stream specific. I didn’t count it into the 5 common tropes but still wanted to mention it because it makes the final result much better.)

Corny Illustrations

Speaking of common tropes… One is to use a corny photo of a stream to give the post some color. With this, I am happy to oblige!

Anemic Pipelines

Did you ever see a pipeline that does almost nothing but then suddenly crams all functionality into a single operation? The article’s solution to our little parsing problem is a perfect example (I removed some null handling to improve readability):

private Map<String, List<String>> parseQuery(String query) {
	return Arrays.stream(query.split("[&]"))
		.collect(groupingBy(s -> (s.split("[=]"))[0],
				mapping(s -> (s.split("[=]"))[1], toList())));
}

Here’s my thought process when reading this: “Ok, so we split the query string by ampersands and then, JESUS ON A FUCKING STICK, what’s that?!” Then I calm down and realize that there’s an abstraction hiding here – it is common not to pursue it but let’s be bold and do just that.

In this case we split a request parameter a=foo into [a, foo] and process both parts separately. So shouldn’t there be a step in the pipeline where the stream contains this pair?

But this is a rarer case. Far more often the stream’s elements are of some type and I want to enrich it with other information. Maybe I have a stream of customers and want to pair it with the city they live in. Note that I do not want to replace the customers with cities – that’s a simple map – but need both, for example to map cities to the customers living therein.

Properly representing intermediate results is a boon to readability.

What have both cases in common? They need to represent a pair. Why don’t they? Because Java has no idiomatic way to do it. Sure, you can use an array (works well for our request parameters), a Map.Entry, some library’s tuple class, or even something domain specific. But few people do, which makes code that does do it stand out by being a little surprising.

Still, I absolutely prefer to do it that way! Properly representing intermediate results is a boon to readability. Using Entry it looks like this:

private Map<String, List<String>> parseQuery(String query) {
	return splitIntoStream(query, "[&]")
			.map(this::parseParameter)
			.collect(groupingBy(Entry::getKey,
					mapping(Entry::getValue, toList())));
}

private Entry<String, String> parseParameter(String parameterString) {
	String[] split = parameterString.split("[=]");
	// add all kinds of verifications here
	return new SimpleImmutableEntry<>(split[0], split[1]);
}

We still have that magic collector to deal with but at least a little less is happening there.

Collector Magic

Java 8 ships with some crazy collectors (particularly those that forward to downstream collectors) and we already saw how they can be misused to create unreadable code. As I see it, they mostly exist because without tuples, there is no way to prepare complex reductions. So here’s what I do:

  • I try to make the collector as simple as possible by properly preparing the stream’s elements (if necessary, I use tuples or domain specific data types for that).
  • If I still have to do something complicated, I stick it into a utility method.

Eating my own dog food, what about this?

private Map<String, List<String>> parseQuery(String query) {
	return splitIntoStream(query, "[&]")
			.map(this::parseParameter)
			.collect(toListMap(Entry::getKey, Entry::getValue));
}

/** Beautiful JavaDoc comment explaining what the collector does. */
public static <T, K, V> Collector<T, ?, Map<K, List<V>>> toListMap(
		Function<T, K> keyMapper, Function<T, V> valueMapper) {
	return groupingBy(keyMapper, mapping(valueMapper, toList()));
}

It’s still hideous – although less so – but at least I don’t have to look at it all the time. And if I do, the return type and the contract comment will make it much easier to understand what’s going on.

Or, if we decided to use the Multimap, we shop around for a matching collector:

private Multimap<String, String> parseQuery(String query) {
	return splitIntoStream(query, "[&]")
			.map(this::parseParameter)
			.collect(toMultimap(Entry::getKey, Entry::getValue));
}

In both cases we could even go one step further and make a special case for streams of entries. I’ll leave that as an exercise to you. :)

Exception Handling

The article culminates in the biggest challenge when working with streams: exception handling. It says:

Unfortunately, if you go back and look at the original code you will see that I’ve conveniently left out one step: using URLDecoder to convert the parameter strings to their original form.

The problem is that URLDecoder::decode throws the checked UnsupportedEncodingException, so it is not possible to simply add it to the code. So which approach to this relevant problem does the article take? The ostrich one:

In the end, I decided to keep my first super-slim approach. Since my web front end wasn’t encoding anything in this case my code would still work.

Eh… Doesn’t the article’s title mention exceptions? So shouldn’t it spend a little more thought on this?

Anyways, error handling is always tough and streams add some constraints and complexity. Discussing the different approaches takes time and, ironically, I’m not keen on squeezing it into a post’s final sections. So let’s defer a detailed discussion about how to use runtime exceptions, trickery, or monads to address the problem and instead look at one possible solution.

The simplest thing for an operation to do is to sift out the elements that cause trouble. So instead of mapping each element to a new one, the operation would map from a single element to either zero or one element. In our case:

private static Stream<Entry<String, String>> parseParameter(
		String parameterString) {
	try {
		return Stream.of(parseValidParameter(parameterString));
	} catch (IllegalArgumentException | UnsupportedEncodingException ex) {
		// we should probably log the exception here
		return Stream.empty();
	}
}

private static Entry<String, String> parseValidParameter(
		String parameterString)
		throws UnsupportedEncodingException {
	String[] split = parameterString.split("[=]");
	if (split.length != 2) {
		throw new IllegalArgumentException(/* explain what's going on */);
	}
	return new SimpleImmutableEntry<>(
			URLDecoder.decode(split[0], ENCODING),
			URLDecoder.decode(split[1], ENCODING));
}

We then use parseParameter in a flatMap instead of a map and get a stream of those entries that could be split and decoded (and a bunch of log messages telling us in which cases things went wrong).

Showdown

Here’s the article’s final version:

private Map<String, List> parseQuery(String query) {
	return (query == null) ? null : Arrays.stream(query.split("[&]"))
		.collect(groupingBy(s -> (s.split("[=]"))[0],
				mapping(s -> (s.split("[=]"))[1], toList())));
}

The summary says:

The takeaway from this is that using streams and the flexibility of collectors it is possible to greatly reduce the amount of code required for complex processing. The drawback is this doesn’t work quite so well when those pesky exceptions rear their ugly head.

Here’s mine:

private Multimap<String, String> parseQuery(String query) {
	if (query == null)
		return ArrayListMultimap.create();
	return splitIntoStream(query, "[&]")
			.flatMap(this::parseParameter)
			.collect(toMultimap(Entry::getKey, Entry::getValue));
}

// plus `parseParameter` and `parseValidParameter` as above

// plus the reusable methods `splitIntoStream` and `toMultimap`

More lines, yes, but the stream pipeline has much less technical mumbo-jumbo, a full feature-set by URL-decoding the parameters, acceptable (or at least existing) exception handling, proper intermediate results, a sensible collector, and a good result type. And it comes with two universal utility functions that help other devs. I think the few extra lines are worth all that.

So my takeaway is a little different: Use streams to make your code reveal its intentions by using streams’ building blocks in a simple and predictable manner. Take the chance to look for reusable operations (particularly those that create or collect streams) and don’t be shy about calling small methods to keep the pipeline readable. Last but not least: ignore line count.

What do you think? Am I way off? A nitpicking asshole? Or right on target? Leave a comment and tell me. If you accidentally agree, you might want to share this post with your friends and followers.


And if you like what I’m writing about, why don’t you follow me?


Post Scriptum

By the way, with Java 9’s enhancements to the stream API, we don’t have to special-case a null query string:

private Multimap<String, String> parseQuery(String query) {
	return Stream.ofNullable(query)
			.flatMap(q -> splitIntoStream(q, "[&]"))
			.flatMap(this::parseParameter)
			.collect(toMultimap(Entry::getKey, Entry::getValue));
}

Can’t wait!

The post Rebutting 5 Common Stream Tropes appeared first on blog@CodeFX.

The Ultimate Guide To Java 9


Today was the grand opening of SitePoint’s Java channel and we kicked it off with the ultimate guide to Java 9. We left out Project Jigsaw because so much has already been written about it and focused on everything else – and there’s a lot of it!

  • Language Changes
    • Private Interface (Default) Methods
    • Try-With-Resources on Effectively Final Variables
    • Diamond Operator for Anonymous Classes
    • SafeVarargs on Private Methods
    • No More Deprecation Warnings for Imports
  • APIs
    • OS Processes
    • Multi-Resolution Images
    • Stack Walking
    • Redirected Platform Logging
    • Reactive Streams
    • Collection Factory Methods
    • Native Desktop Integration
    • Deserialization Filter
    • Networking
      • HTTP/2
      • Datagram Transport Layer Security (DTLS)
      • TLS Application-Layer Protocol Negotiation Extension (TLS ALPN)
      • OCSP Stapling for TLS
    • XML
      • OASIS XML Catalogs Standard
      • Xerces 2.11.0
    • Extensions to Existing APIs, e.g.
      • Optional, Stream, and Collectors
      • DateTime API
      • Matcher
      • Atomic…
      • Array utilities
  • Low Level APIs
    • Variable Handles Aka VarHandles
    • Enhanced Method Handles
    • Dynalink
    • Nashorn Parser API
    • Spin-Wait Hints
  • Deprecations
    • Applet API
    • Corba
    • Observer, Observable
    • SHA-1
  • Removals

Now, don’t tell me, you’re not curious! Go check it out:

The Ultimate Guide to Java 9

The post The Ultimate Guide To Java 9 appeared first on blog@CodeFX.

What Future Java Might Look Like


I wrote this article for SitePoint's Java channel, where you can find a lot of interesting articles about our favorite programming language. Check it out!

During the second week of November was Devoxx Belgium, Europe’s biggest Java conference, and as every year the community’s who’s-who showed up. One of them was Brian Goetz, Java Language Architect at Oracle, and he gave what I would consider the conference’s most thrilling talk: “Java Language and Platform Futures: A Sneak Peek”. In it he presented ideas that the JDK team is currently kicking around. And boy, is the pipeline full of great stuff! Java won’t look the same once it’s all out in the wild.

When will that be? Nobody knows. And that’s not nobody as in nobody outside of Oracle, that’s nobody as in nobody knows whether happy endings exist for arbitrary n. Brian went to great lengths to stress how very, very speculative all of the following is and how much things might evolve or simply get dropped. He went so far as to let everyone in the audience sign an acknowledgment thereof (just mentally but still) and explicitly forbade any sensationalist tweets.

Well… first of all, this is no tweet and second of all, I wasn’t in that audience. So here we go! (Seriously though, take this as what it is: a glimpse into one of many, many possible futures.)

Crash Course

Before we go through the ideas one by one, let’s jump right in and have a look at what code might look like that uses all of the envisaged features. The following class is a simple linked list that uses two types of nodes:

  • InnerNodes that contain a value and link to the next node
  • EndNodes that only contain a value

One particularly interesting operation is reduce, which accepts a seed value and a BinaryOperator and applies it to the seed and all of the nodes’ values. This is what that might one day look like:

public class LinkedList<any T> {

    private Optional<Node<T>> head;

    // [constructors]
    // [list mutation by replacing nodes with new ones]

    public T reduce(T seed, BinaryOperator<T> operator) {
        var currentValue = seed;
        var currentNode = head;

        while (currentNode.isPresent()) {
            currentValue = operator
                    .apply(currentValue, currentNode.get().getValue());
            currentNode = switch (currentNode.get()) {
                case InnerNode(_, var nextNode) -> Optional.of(nextNode);
                case EndNode(_) -> Optional.empty();
                default -> throw new IllegalArgumentException();
            };
        }

        return currentValue;
    }

    private interface Node<any T> {
        T getValue();
    }

    private static class InnerNode<any T>(T value, Node<T> next)
            implements Node<T> { }

    private static class EndNode<any T>(T value)
            implements Node<T> { }

}

Wow! Hardly Java anymore, right?! Besides the omitted constructors there’s only code that actually does something – I mean, where’s all the boilerplate? And what if I told you that on top of that performance would be much better than today? Sounds like a free lunch, heck, like an entire free all-you-can-eat buffet!

Here’s what’s new:

  • The generic type argument is marked with any – what’s up with that?
  • Where is the type information for currentValue and currentNode in reduce?
  • That switch is almost unrecognizable.
  • The classes InnerNode and EndNode look, err, empty.

Let’s look at all the ideas that went into this example.

Data Objects

When was the last time you created a domain object that was essentially a dumb data holder, maybe with one or two non-trivial methods, that still required a hundred lines for constructors, static factory methods, accessors, equals, hashCode, and toString? (Right now, you say? Don’t worry, I don’t judge.) And while IDEs happily generate all of that, making typing it unnecessary even today, it is still code that needs to be understood (does the constructor do any validation?) and maintained (better not forget to add that new field to equals).

In an aggressive move to reduce boilerplate, the compiler might generate all of that stuff on the fly without us having to bend a finger!

Here’s what a user might look like:

public class User(String firstName, String lastName, DateTime birthday) { }

We can get everything else I mentioned above for free and only need to actually implement what’s non-standard (maybe users have an ID that alone determines equality, so we’d want an according equals implementation). Getting rid of all that code would be a great boost for maintainability!

Looking at the linked list example we can see that InnerNode and EndNode depend on this feature.

Value Types

When Java was created an arithmetic operation and a load from main memory took about the same number of cycles (speaking in magnitudes here). This changed considerably over the last 20 and more years to the point where memory access is about three magnitudes slower.

That all abstract Java types are objects, linked to each other via references, requires pointer hunting and makes the problem even worse. The benefits are that such types have identity, allow mutability, inheritance, and a couple of other things… which we don’t actually always need. This is very unsatisfactory and something needs to be done!

In comes Project Valhalla, as part of which value types are being developed as we speak. They can be summarized as self-defined primitives. Here’s a simple example:

value class ComplexNumber {

    double real;
    double imaginary;

    // constructors, getters, setters, equals, hashCode, toString
}

Looks like a regular class – the only difference is the keyword value in there.

Like primitives, value types incur neither memory overhead nor indirection. A self-defined ComplexNumber, like the one above with two double fields real and imaginary, will be inlined wherever it is used. Like primitives, such numbers have no identity – while there can be two different Double objects with value 5.0, there can’t be two different doubles 5.0. This precludes some of the things we like to do to objects: setting them to null, inheriting, mutating, and locking. In turn, it will only require the memory needed for those two doubles and an array of complex numbers will essentially be an array of real/imaginary pairs.

Like classes, value types can have methods and fields, encapsulate internals, use generics, and implement interfaces (but not extend other classes). Thus the slogan: “Codes like a class, works like an int.” This will allow us to no longer weigh an abstraction we would prefer against the performance (we imagine) we need.

Talking about performance, the advantages are considerable and can speed up just about any code. In a HashMap, for example, the nodes could become value types, speeding up one of Java’s most ubiquitous data structures. But this is not a low-level feature only hardcore library developers will want to use! It allows all of us to choose the right abstraction and inform the compiler as well as our colleagues that some of our objects in fact aren’t objects but values.

By the way, my personal guess is that the compiler would be just as helpful as with data objects and chip in constructors, getters, setters, etc.:

value class ComplexNumber(double real, double imaginary) { }

In case this wasn’t perfectly obvious: This is a deep change and interacts with basically everything:

  • the language (generics, wildcards, raw types, …)
  • the core libraries (collections, streams)
  • the JVM (type signatures, bytecodes, …)

So… where exactly in the linked list example do value types come in? Admittedly, they don’t play a big role. If I were clever enough to write a persistent data structure, the nodes could be value types (remember, they have to be immutable), which could be pretty interesting.

But there’s one possible value type in there: Optional. In Java 8 it is already marked as a value-based class, something that might one day become a value type or a wrapper thereof. This makes it flat and eliminates the memory indirection and possible cache miss it currently imposes.

Specialized Generics

With everybody and their dog creating primitive-like value types it becomes necessary to look at how they interact with parametric polymorphism. As you know, generics do not work for primitives – there can’t be an ArrayList<int>. This is already painful with eight primitives (see the primitive specializations of Stream or libraries like Trove) but becomes unbearable when developers can define more. If value types would have to be boxed to interact with generics (like primitives are today), their use would be fairly limited and they would be a non-starter.

So we want to be able to use generics with value types – and primitives can come along for the ride. In the end we not only want to instantiate an ArrayList<int> or ArrayList<ComplexNumber>, we also want it to be backed by an int[] or ComplexNumber[], respectively. This is called specialization and opens a whole new can of worms. (To take a good look at those worms, watch the talk “Adventures in Parametric Polymorphism”, which Brian gave at JVMLS 2016. That article also contains a list of talks you can watch if you want to get deeper.)

Code that wants to generify not only over reference types but also over value types must mark the respective type parameters with any. You can see that LinkedList, Node, and its implementations do exactly that. This means that in a LinkedList<int> the nodes would actually have int fields as opposed to the Object fields holding boxed Integers as would be the case with a LinkedList<Integer> nowadays.

More Type Inference

Java has done type inference since Java 5 (for type witnesses in generic methods) and the mechanism was extended in Java 7 (diamond operator), 8 (lambda parameter types), and 9 (diamond on anonymous classes). In Java X it might very well cover variable declarations. Brian’s example is this one:

// now
URL url = new URL("...");
URLConnection conn = url.openConnection();
Reader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));

// maybe in the future
var url = new URL("...");
var conn = url.openConnection();
var reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));

Here, the types of url, conn, and reader are perfectly obvious. As a consequence the compiler can infer them, making it unnecessary for us to specify them. In general, type inference can reduce boilerplate but also hide essential information. If you consider variable names to be more important than their types, you’ll like this as it aligns the names perfectly while throwing out redundant information.

Note that type inference is not dynamic typing – it’s still strong typing just with less typing (Brian’s pun – presumably intended). The type information will still end up in the bytecode and IDEs will also be able to show them – it’s just that we don’t have to write it out anymore.

An automatic process deducing types implies that code changes will change the outcome of that computation. While it is generally ok for a local variable to change its type (e.g. to its supertype), the same is not true for fields, method parameters or return values, etc. On the contrary, any change here could cause binary incompatibilities, which would lead to code compiled against an old version failing to link at runtime. Not good and hence forbidden.

So that only local variables’ types are inferred is more about protecting the ecosystem from unstable code than protecting developers from unreadable code.

Pattern Matching

Java’s current ‘switch’ statement is pretty weak. You can use it for primitives, enums and strings but that’s it. If you want to do anything more complex, you either resort to if-else-if chains or, if you can’t get the Gang of Four book out of your head, the visitor pattern.

But think about it, there’s not really an intrinsic reason for these limitations. On a higher level a switch can be described to be using a variable to evaluate some conditions and choosing a matching branch, evaluating what it finds there – why should the variable’s type be so limited and the conditions only check equality? Come to think of it, why should the switch only do something as opposed to become something? Following this trail we end up with pattern matching, which has none of these limitations.

First of all, all kinds of variables could be allowed. Secondly, conditions could be much broader. They could, for example, check types or even deconstruct entire data objects. And last but not least, the whole switch should be an expression, evaluated to the expression in the branch of the matching condition.

Here are Brian’s examples:

// matching types
String formatted;
switch (constant) {
    case Integer i: formatted = String.format("int %d", i); break;
    case Byte b: //...
    case Long l: // ...
    // ...
    default: formatted = "unknown";
}

// used as an expression
String formatted = switch (constant) {
    case Integer i -> String.format("int %d", i);
    case Byte b -> // ...
    case Long l -> // ...
    // ...
    default -> "unknown";
};

// deconstructing objects
int eval(ExprNode node) {
    return switch (node) {
        case ConstantNode(var i) -> i;
        case NegNode(var node) -> -eval(node);
        case PlusNode(var left, var right) -> eval(left) + eval(right);
        case MulNode(var left, var right) -> eval(left) * eval(right);
        // ...
    };
}

For the linked list I also used it as an expression and to deconstruct the nodes:

currentNode = switch (currentNode.get()) {
    case InnerNode(_, var nextNode) -> Optional.of(nextNode);
    case EndNode(_) -> Optional.empty();
    default -> throw new IllegalArgumentException();
};

Much nicer than what it would have to look like now:

if (currentNode.get() instanceof InnerNode) {
    currentNode = Optional.of(((InnerNode) currentNode.get()).getNext());
} else if (currentNode.get() instanceof EndNode) {
    currentNode = Optional.empty();
} else {
    throw new IllegalArgumentException();
}

(Yes, I know, this particular example could be solved with polymorphism.)

Summary

Again, wow! Data objects, value types, generic specialization, more type inference, and pattern matching – that’s a set of huge features the JDK team is working on. I can’t wait for them to come out! (By the way, while I presented all the features here, Brian provides so much more interesting background – you should definitely check out the entire talk.)

What do you think? Would you like to code in that Java?

The post What Future Java Might Look Like appeared first on blog@CodeFX.

Reflection vs Encapsulation


I wrote this article for SitePoint's Java channel, where you can find a lot of interesting articles about our favorite programming language. Check it out!

Historically reflection could be used to break into any code that ran in the same JVM. With Java 9 this is going to change. One of the two main goals of the new module system is strong encapsulation; giving modules a safe space into which no code can intrude. These two techniques are clearly at odds so how can this stand off be resolved? After considerable discussions it looks like the recent proposal of open modules would show a way out.

If you’re all down with the module system and what reflection does, you can skip the following back story and jump right into the stand off.

Setting the Scene

Let me set the scene of how the module system implements strong encapsulation and how that clashes with reflection.

Module Crash Course

The Java Platform Module Saloon (JPMS) is introducing the concept of modules, which in the end are just regular JARs with a module descriptor. The descriptor is compiled from a module-info.java file that defines a module’s name, its dependencies on other modules, and the packages it makes available:

module some.module {

    requires some.other.module;
    requires yet.another.module;

    exports some.module.package;
    exports some.module.other.package;

}

In the context of encapsulation there are two points to take note of:

  • Only public types, methods, and fields in exported packages are accessible.
  • They are only accessible to modules that require the exporting module.

Here, “being accessible” means that code can be compiled against such elements and that the JVM will allow accessing them at run time. So if code in a module user depends on code in a module owner, all we need to do to make that work is have user require owner and have owner export the packages containing the required types:

module user {
    requires owner;
}

module owner {
    exports owner.api.package;
}

This is the common case and apart from making the dependencies and API explicit and known to the module system all works as we’re used to.

So far everybody’s having fun! Then, in comes Reflection… conversations halt mid-sentence, the piano player stops his tune.

Reflection

Before Java 9, reflection was allowed to break into any code. Aside from some pesky calls to setAccessible every type, method, or field in any class could be made available, could be called, could be changed – hell, even final fields were not safe!

Integer two = 2;

Field value = Integer.class.getDeclaredField("value");
value.setAccessible(true);
value.set(two, 3);

if (1 + 1 != two)
    System.out.println("Math died!");

This power drives all kinds of frameworks – starting with JPA providers like Hibernate, coming by testing libraries like JUnit and TestNG, to dependency injectors like Guice, and ending with obsessed class path scanners like Spring – which reflect over our application or test code to work their magic. On the other side we have libraries that need something from the JDK that it would rather not expose (did anybody say sun.misc.Unsafe?). Here as well, reflection was the answer.

So this guy, being used to getting what he wants, now walks into the Module Saloon and the bartender has to tell him no, not this time.

The Stand Off

Inside the module system (let’s drop the saloon, I think you got the joke) reflection could only ever access code in exported packages. Packages internal to a module were off limits and this already caused quite a ruckus. But it still allowed using reflection to access everything else in an exported package, like package-visible classes or private fields and methods – this was called deep reflection. In September rules got even stricter! Now deep reflection was forbidden as well and reflection was no more powerful than the statically typed code we would write otherwise: Only public types, methods, and fields in exported packages were accessible.

All of this caused a lot of discussions of course – some heated, some amicable but all with the sense of utter importance.

Some (myself included) favor strong encapsulation, arguing that modules need a safe space in which they can organize their internals without the risk of other code easily depending on it. Examples I like to give are JUnit 4, where one big reason for the rewrite was that tools depended on implementation details, reflecting down to the level of private fields; and Unsafe, whose pending removal put a lot of pressure on a lot of libraries.

Others argue that the flexibility provided by reflection not only enables great usability for the many frameworks relying on it, where annotating some entities and dropping hibernate.jar onto the class path suffices to make things work. It also gives freedom to library users, who can use their dependencies the way they want, which might not always be the way the maintainers intended to. Here, Unsafe comes in as an example for the other side: Many libraries and frameworks that are now critical to the Java ecosystem were only feasible exactly because some hacks were possible without the JDK team’s approval.

Even though I tend towards encapsulation, I see the other arguments’ validity as well. So what to do? What choices do developers have besides encapsulating their internals and giving up on reflection?

Choice of Weapons

So let’s say we are in a position where we need to make a module’s internals available via reflection. Maybe to expose it to a library or framework module or maybe because we are that other module and want to break into the first one. In the rest of the article we’ll explore all available choices, looking for answers to these questions:

  • What privileges do we need to employ that approach?
  • Who can access the internals?
  • Can they be accessed at compile time as well?

For this exploration we will create two modules. One is called owner and contains a single class Owner (in the package owner) with one method per visibility that does nothing. The other, intruder, contains a class Intruder that has no compile time dependency on Owner but tries to call its methods via reflection. Its code comes down to this:

Class<?> owner = Class.forName("owner.Owner");
Method owned = owner.getDeclaredMethod(methodName);
owned.setAccessible(true);
owned.invoke(null);

The call to setAccessible is the critical part here; it succeeds or fails depending on how we decide to create and execute our modules. In the end we get output as follows:

public: ✓   protected: ✗   default: ✗   private: ✗

(Here only the public method could be accessed.)

All the code I’m using here can be found in a GitHub repository, including Linux scripts that run it for you.

Regular Exports

This is the vanilla approach to expose an API: The module owner simply exports the package owner. To do this we of course need to be able to change the owning module’s descriptor.

module owner {
    exports owner;
}

With this we get the following result:

public: ✓   protected: ✗   default: ✗   private: ✗

So far, err… not so good. First of all, we only reached part of our goal because the intruding module can only access public elements. And if we do it this way all modules that depend on owner can compile code against it and all modules can reflect over its internals. Actually, they are no longer internals at all since we properly exported them – the package’s public types are now baked into the module’s API.

Qualified Exports

If exports are vanilla, this is cranberry vanilla – a default choice with an interesting twist. The owning module can export a package to a specific module with what is called a qualified export:

module owner {
    exports owner to intruder;
}

But the result is the same as with regular exports – the intruding module can only access public elements:

public: ✓   protected: ✗   default: ✗   private: ✗

Again, we reached only part of our goal, and again, we exposed the elements at compile time as well as at run time. The situation improved in the sense that only the named module, intruder in this case, is granted access but for that we accepted the necessity to actually know the module’s name at compile time.

Knowing the intruding module might be acceptable in the case of frameworks like Guice but as soon as the implementation hides behind an API (what the JDK team calls an abstract reflective framework; think JPA and Hibernate) this approach fails. Independently of whether it works or not, explicitly naming the intruding module in the owning module’s descriptor can be seen as iffy. On the other hand, chances are the owning module already depends on the intruding one anyways because it needs some annotations or something, in which case we’re not making things much worse.

Open Packages

Now it gets interesting. A pretty recent addition to the module system is the ability for modules to open up packages at run time only.

module owner {
    opens owner;
}

Yielding:

public: ✓   protected: ✓   default: ✓   private: ✓

Neat! We killed two birds with one stone:

  • The intruding module gained deep access to the whole package, allowing it to use even private elements.
  • This exposure exists at run time only, so code can not be compiled against the package’s content.

There is one downside, though: All modules can reflect over the opened package, now. Still, all in all much better than exports.

Qualified Open Packages

As with exports and qualified exports, there exists a qualified variant of open packages as well:

module owner {
    opens owner to intruder;
}

Running the program we get the same result as before but now only intruder can achieve them:

public: ✓   protected: ✓   default: ✓   private: ✓

This presents us with the same trade-off as between exports and qualified exports and also doesn’t work for a separation between API and implementation. But there’s hope!

In November Mark Reinhold proposed a mechanism that would allow code in the module to which a package was opened up to transfer that access to a third module. Coming back to JPA and Hibernate this solves that problem exactly. Assume the following module descriptor for owner:

module owner {
    // the JPA module is called java.persistence
    opens owner to java.persistence;
}

In this case the mechanism could be employed as follows (quoted almost verbatim from the proposal):

A JPA entity manager is created via one of the Persistence::createEntityManagerFactory methods, which locate and initialize a suitable persistence provider, say Hibernate. As part of that process they can use the addOpens method on the client module owner to open the owner package to the Hibernate module. This will work since the owner module opens that package to the java.persistence module.

There is also a variant for containers to open packages to implementations. In the current EA build (b146) this feature does not seem to be implemented yet, though, so I couldn’t try it out. But it definitely looks promising!

Open Modules

If open packages were a scalpel, open modules are a cleaver. With it a module relinquishes any control over who accesses what at run time and opens up all packages to everybody as if there were an opens clause for each of them.

open module owner { }

This results in the same access as individually opened packages:

public: ✓   protected: ✓   default: ✓   private: ✓

Open modules can be considered an intermediate step on the migration path from JARs on the class path to full-blown, strongly encapsulating modules.

Class Path Trickery

Now we’re entering less modular ground. As you might know, java and javac require modules to be on the module path, which is like the class path but for modules. But the class path is not going away and neither are JARs. There are two tricks we can employ if we have access to the launching command line and can push the artifact around (so this won’t work for JDK modules).

Unnamed Module

First, we can drop the owning module onto the class path.

How does the module system react to that? Since everything needs to be a module, the module system simply creates one, the unnamed module, and puts everything it finds on the class path into it. Inside the unnamed module everything is much like it is today and JAR hell continues to exist. Because the unnamed module is synthetic, the JPMS has no idea what it might export, so it simply exports everything – at compile and at run time.

If any JAR on the class path should accidentally contain a module descriptor, this mechanism will simply ignore it. Hence, the owning module gets demoted to a regular JAR and its code ends up in a module that exports everything:

public: ✓   protected: ✓   default: ✓   private: ✓

Ta-da! And without touching the owning module, so we can do this to modules we have no control over. Small caveat: We cannot require the unnamed module, so there is no good way to compile against the code in the owning module from other modules. Well, maybe the caveat is not so small after all…
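
For concreteness, the launch command could look something like this. It is a sketch that assumes the owning module’s artifact is called owner.jar and that intruder accesses owner’s classes purely reflectively (after all, it cannot require them):

java \
    --module-path mods \
    --class-path owner.jar \
    --module intruder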

Automatic Module

The second approach is to strip the owning module of its descriptor and still put it on the module path. For each regular JAR on the module path the JPMS creates a new module, names it automatically based on the file name, and exports all its contents. Since all is exported, we get the same result as with the unnamed module:

public: ✓   protected: ✓   default: ✓   private: ✓

Nice. The central advantage of automatic modules over the unnamed module is that other modules can require them, so the rest of the application can still depend on and compile against the owning module, while the intruder can use reflection to access its internals.
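
As a small sketch, a module descriptor that depends on the automatic module could look as follows, assuming the artifact is named owner.jar (from which the JPMS derives the automatic module name owner) and a made-up module name some.app:

module some.app {
    // `owner.jar` on the module path becomes the automatic module `owner`
    requires owner;
}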

One downside is that the module’s internals become available at run time to every other module in the system. Unfortunately, the same is true at compile time unless we manage to compile against the proper owning module and then rip out its descriptor on the way to the launch pad. This is iffy, tricky, and error-prone.

Command Line Escape Hatches

Since we’re fiddling with the command line anyway, there is a cleaner approach (maybe I should’ve told you about it earlier): Both javac and java come with a new flag, --add-opens, which opens additional packages.

java \
    --module-path mods \
    --add-modules owner \
    --add-opens owner/owner=intruder \
    --module intruder

This works without changing the owning module and applies to JDK modules as well. So yeah, much better than the unnamed and automatic module hacks.

Summary

Ookey, still remember everything we did? No? Executive summary table to the rescue!

mechanism              | configured via            | compile access             | reflection access           | comments
-----------------------|---------------------------|----------------------------|-----------------------------|---------
export                 | descriptor                | all code > public          | all code > public           | makes API public
qualified export       | descriptor                | specified modules > public | specified modules > public  | need to know intruding modules
open package           | descriptor                | none                       | all code > private          |
qualified open package | descriptor                | none                       | specified modules > private | can be transferred to implementation modules
open module            | descriptor                | none                       | all code > private          | one keyword to open all packages
unnamed module         | command line              | all non-modules > public   | all code > private          |
automatic module       | command line and artifact | all code > public          | all code > private          | requires fiddling with the artifact
command line flag      | command line              | none                       | all code > private          |

Wow, we really went through quite a number of options! But now you know what to do if you’re faced with the task of breaking into a module with reflection. In summary, I think the vast majority of use cases can be covered by answering one question:

Is it your own module?

  • Yes ⇝ Open packages (maybe qualified) or, if there are too many, the entire module.
  • No ⇝ Use the command line flag --add-opens.

The post Reflection vs Encapsulation appeared first on blog@CodeFX.

Why Elvis Should Not Visit Java

I was recently involved in quite a long Twitter discussion regarding Java’s Optional, type systems that distinguish nullable and non-nullable types, and the Elvis operator, which allows null-safe member selection. The latter was peddled as a killer feature for succinct null handling, which I strongly disagree with.

My opinion on the matter is that without a type system that allows making every type non-nullable (something that is not going to happen in Java any time soon) the Elvis operator would be detrimental to correctness and readability.

Let me explain why.

The Crux With Null

The issue with null is that it says nothing about why a value is missing

I already wrote about this before. The issue with null is not that it causes exceptions – that’s just a symptom. The problem with null is that it says nothing about why the value is missing. Was something tried and failed (like connecting to the database) but for some reason the execution continued? Is there a number of values (maybe a pair?) where only one could ever be present? Is the value just optional, like non-mandatory user input? Or, finally, is it an actual implementation error and the value should really never have been missing?

Bad code maps all of these cases to the same thing: null. So when a NullPointerException or other undesired behavior that relates to missing values (“Why is this field empty?”, “Why does the search not find that thing?”) pops up, what is the first step in fixing it? Finding out why the value is missing and whether that is ok or an implementation error. In fact, answering that question is usually 90% of the solution!

It can be very hard to do that, though, because null can hide in any reference type and, unless rigorous checks are in place (like using Objects::requireNonNull on constructor and method parameters), it readily proliferates throughout a code base. So before answering why null showed up in the place where it caused trouble, it is necessary to track it to its source, which can take quite some time in a sufficiently complex system.
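
As a sketch of such a check at the boundary (Order and Customer are made-up types):

import java.util.Objects;

class Customer { }

class Order {

    private final Customer customer;

    Order(Customer customer) {
        // fail fast at the source instead of letting a null
        // `customer` proliferate and blow up somewhere far away
        this.customer = Objects.requireNonNull(customer, "customer");
    }

}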

So the underlying problem with null is not the misbehavior it causes but the conflation of various different concerns into a single, particularly sneaky and error-prone concept.

Elvis Enters The Building

I’ve recently played around with Kotlin and was as amazed by the null-handling as I assumed I would be from reading about it. It is not the only language which does it this way but it’s one I actually worked with, so I picked it as an example. But it is just that: an example. This is no “Kotlin is better than Java” argument, it’s a “look how other type systems handle this” elaboration.

(I highly recommend this thorough introduction to Kotlin’s type system if you want to learn more about it.)

Anyway, in such type systems references are non-nullable by default and the compiler makes sure that no accidents happen. A String is always a string and not “either a string or null”.

// declare a variable of non-nullable type `User`
val user : User = ...
// call properties (if you don't know the syntax,
// just assume these were public fields)
val userStreet : String = user.address.street
// if neither `address` nor `street` returns a nullable type,
// `userStreet` can never be null;
// if they did, the code would not compile because `userStreet`
// is of the non-nullable type `String`

Of course things can go missing and every type can be made nullable by appending ? to it. From this point on, member access (e.g. calling methods) is at the risk of failing due to null references. The awesome part is that the compiler is aware of the risks and forces you to handle them correctly (or be an ass about it and override the complaints). What’s one way to do that? The Elvis operator!

Elvis, written as ?., distinguishes whether the reference on which the member is called is null or not. If it is null, the member is not called and the entire expression evaluates to null. If it is present, the member is called as expected.

// declare a variable of the nullable type `User`
val user : User? = ...
// use Elvis to navigate properties null-safely
val userStreet : String? = user?.address?.street
// if `user` is null, so is `userStreet`;
// `address` and `street` might return nullable types

In type systems that understand nullability Elvis is a wonderful mechanism! With it, you can express that you are aware values might be missing and accept that as an outcome for the call.

At the same time, the compiler will force you to use it on potentially null references, thus preventing accidental exceptions. Furthermore, it will forcefully propagate that ugly nullability property to the variables you assign the result to. This forces you to carry the complexity of possibly null values with you and gives you an incentive to get rid of it sooner rather than later.

Why Shouldn’t This Work In Java?

So if I like Elvis so much in Kotlin, why wouldn’t I want to see it in Java? Because Elvis only works with a type system that distinguishes nullable from non-nullable types! Otherwise it does exactly the opposite of what it is supposed to do and makes nulls much more problematic.

Elvis only works with non-nullable types

Think about it: You get an NPE from calling a member on null. What is the easiest thing to do? Squeeze that question mark in there and be done with it!

Is that correct? Null tells you nothing about whether a value is allowed to be missing, so who knows? Does it affect the calling or the called code negatively? Well, the compiler can’t tell you whether that code can handle null, so, again, who knows?

Type systems like Kotlin’s can answer both of these questions, Java’s leaves you guessing. The right choice is to investigate, which requires effort. The wrong choice is to just proliferate null. What do you think will happen if the second choice gets even easier than it is today? Do you expect to see more or less problems with absent values? Do you expect the paths from the source of a null reference to where it causes problems to become longer or shorter?

Elvis makes the wrong choice easier

Good languages and good APIs make the correct choice the easy one. Well-designed types in a good static type system rule out what should not happen at run time. Elvis in Java would fail on both accounts.

Instead of demanding an easier way to handle null, we would do better to eradicate it from our code base or at least each type’s public API.
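
To sketch what that might look like with the tools Java already offers (User and Address are made-up types), Optional can make the possible absence explicit in the API instead of hiding it behind an operator:

import java.util.Optional;

class Address { String street; String getStreet() { return street; } }
class User { Address address; Address getAddress() { return address; } }

class NullSafeNavigation {

    static Optional<String> streetOf(User user) {
        // each `map` short-circuits to an empty Optional
        // as soon as a step yields null
        return Optional.ofNullable(user)
                .map(User::getAddress)
                .map(Address::getStreet);
    }

}

The absence now shows up in the method’s return type, where callers cannot overlook it.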

A Word On Optional

Most of the Twitter discussion actually revolved around Optional but I’m not going to repeat it here because that’s a different post (one I already wrote twice, actually). Instead I want to highlight a specific argument and put it into the context of Elvis.

It was repeatedly remarked as a weakness of Optional that it is easy to mishandle and that imprudent use is a likely or even common scenario. Personally, I haven’t had that problem yet but it sounds reasonable. I would argue that handling Optional can be taught with moderate effort (surely more easily than proper null handling) but unless that happens I can see how misusing it could make a code base suck.

What the hell makes you think Elvis would not be so much worse?

But to those who feel that way, I want to pose the question: What the hell makes you think that this would not be so much worse with Elvis? As I pointed out above, it makes a terrible choice damnably easy! Arguably more so than Optional ever could.

Summary

Absent values necessary evil. Encoding as null bad. Proliferation terrible.

If Java had a type system that helped handle null and incentivized moving away from it, Elvis would be great. Alas, it doesn’t. So making it even easier to spread null around the code base, instead of creating a proper design for missing values, moves the needle in the wrong direction.

To end on a bellicose note: If you’ve read all this with the thought that you still want Elvis because it would make your life so much easier, chances are your APIs are badly designed because they overuse null. In that case your desire to get your hands on Elvis is precisely the reason why I think Java should not have it.

The post Why Elvis Should Not Visit Java appeared first on blog@CodeFX.
