Many people, including Stephen Colebourne, have complained about some of the methods in the Scala collections API. A common example is the ++ operator:
// There are many variants but this is what Stephen Colebourne used as an example def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That
Here’s an example of its use:
val first = List(1, 2, 3) val second = List(4, 5, 6) println(first ++ second) // List(1, 2, 3, 4, 5, 6)
In this blog post I’ll try to show that the perceived complexity of the definition is a necessary thing arising from several design goals. I’ll try to show code examples in Java so that the overall syntax is familiar for most people. Basically I’m trying to say that by following the following three goals you end up with the same kind of definition that Scala has (except that it cannot be represented with Java without violating any of the goals). If you feel that your grasp of Java generics is lacking or just want the TLDR version, you can skip right away to the monstrosity.
Design goals
1. Immutability
- The collection classes should be immutable.
2. Type safety
- The API should be as type safe as possible (explicit instanceofs or casts should not be needed)
3. Conformity with OOP
- Common methods should also be abstracted to interfaces if possible
- The collection classes should work with element subtypes/supertypes correctly
Common code
First we have a naive ImmutableList implementation that just wraps a standard Java list:
// ImmutableList.java
public class ImmutableList<A> {
private final List<A> inner;
private ImmutableList(List<A> inner) {
this.inner = inner;
}
public static ImmutableList<A> of(A value1, A value2) {
List<A> inner = new ArrayList<A>();
inner.add(value1);
inner.add(value2);
return new ImmutableList<A>(inner);
}
}
We’ll also use some data types to demonstrate design goal 3:
public interface Animal {
String getSound();
}
public class Cat implements Animal {
public String getSound() {
return "meow";
}
}
public class Dog implements Animal {
public String getSound() {
return "woof";
}
}
Attempt 1
// ImmutableList.java
public ImmutableList<A> concat(ImmutableList<A> other) {
List<A> newInner = new ArrayList<A>();
newInner.addAll(this.inner);
newInner.addAll(other.inner);
return new ImmutableList<A>(newInner);
}
Well that was easy, right?
Nope, the implementation doesn’t support variances (return element type can be super type of T, argument element type can be sub type of T).
ImmutableList<Cat> cats = ImmutableList.of(new Cat(), new Cat()); ImmutableList<Dog> dogs = ImmutableList.of(new Dog(), new Dog()); ImmutableList<Animal> animals = cats.concat(dogs); // Will NOT compile
Attempt 2
As far as I know, it’s not possible to make the above code compile with Java unless you drop generics (= violate goal 2). It’s possible to play with wildcards if you make it a static method instead:
// ImmutableList.java
public static <S> ImmutableList<S> concat(ImmutableList<? extends S> first, ImmutableList<? extends S> second) {
List<S> newInner = new ArrayList<S>();
newInner.addAll(first.inner);
newInner.addAll(second.inner);
return new ImmutableList<S>(newInner);
}
But a static method is a completely different beast, so let’s just ignore this problem for a moment and forget the support for variances. We are violating goal 3, but let’s just go on.
Attempt 3
Now let’s imagine we extend our immutable collection library with another type: ImmutableSet. We want to be able to concat sets too, so we create a concat method (still violating goal 3 regarding variances):
// ImmutableSet.java public ImmutableSet<A> concat(ImmutableSet<A> other);
Now, based on goal 3 we must abstract this concat method to an interface called ImmutableCollection:
// ImmutableCollection.java
public interface ImmutableCollection<A> {
ImmutableCollection<A> concat(ImmutableCollection<A> other);
}
Looks good, right?
Nope, the concat method now returns a generic collection, and the actual type is lost. It’s very reasonable to expect this to work:
ImmutableList<Dog> dogs = ImmutableList.of(new Dog(), new Dog()); ImmutableList<Dog> moreDogs = ImmutableList.of(new Dog(), new Dog()); ImmutableList<Dog> fourDogs = dogs.concat(moreDogs); // Won't compile because concat returns ImmutableCollection
The point here is that the returned object is an ImmutableList, but at compile time we cannot abstract the method in ImmutableCollection AND preserve the return type. We can of course cast the return value but we’ll violate goal 3.
Attempt 4
We need a way to describe the implementation type in the interface:
// ImmutableCollection.java
<THAT extends ImmutableCollection<A>> THAT concat(ImmutableCollection<A> other);
// ImmutableList.java
public <THAT extends ImmutableCollection<A>> THAT concat(ImmutableCollection<A> other) {
// ...
return new ImmutableList<A>(newInner); // Won't compile!!
}
Now the interface compiles and the dog example works, but the implementation of concat will not compile anymore. The problem is that the implementation of concat must support all ImmutableCollections while the implementation is providing only a subtype (= ImmutableList).
Attempt 5
We must somehow abstract away the construction of the new collection:
// ImmutableBuilder.java
public interface ImmutableBuilder<THAT> {
THAT build(Collection elements);
}
I’ve violated goal 2 and dropped generics from elements in advance because I know they won’t work. An ImmutableBuilder is just a factory that knows how to create objects of type THAT from a collection of elements. Now we’ll change the signatures of both the interface and the implementation:
// ImmutableCollection.java
<THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder);
// ImmutableList.java
public <THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder) {
List<A> inner = // ...
return builder.build(inner);
}
Everything compiles now, so now we only need an implementation of a builder:
// ImmutableList.java
public static <T> ImmutableBuilder<ImmutableList<T>> builder() {
return new ImmutableBuilder<ImmutableList<T>>() {
public ImmutableList<T> build(Collection elements) {
return new ImmutableList<T>(new ArrayList<T>(elements));
}
};
}
Overview of the final Java version
We now have everything in place for our immutable collection API. On our journey we have violated goals 2 and 3 (due to language limitations) and added complexity to our implementation. However, the complexity cannot be avoided without violating the design goals. Our API is also pretty clunky to use:
// ImmutableCollection.java <THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder);
ImmutableBuilder<ImmutableList<Dog>> listBuilder = ImmutableList.builder(); ImmutableList<Dog> dogs = ImmutableList.of(new Dog(), new Dog()); ImmutableList<Dog> moreDogs = ImmutableList.of(new Dog(), new Dog()); ImmutableList<Dog> fourDogs = dogs.concat(moreDogs, listBuilder);
We don’t support variances and we have to pass around a builder for most of the operations. Due to goal 1, most interesting operations on the collections will return new ones, and thus will need a builder instance!
Stephen Colebourne wrote about ++ in his blog post:
“In fact this is the equivalent to Java’s addAll() on a list”
Reading this made me completely lose my respect to the guy. This is absolutely wrong and misses completely the crucial difference: immutability (= goal 1). Java’s addAll() doesn’t return a new collection, so it does not need ImmutableBuilder and variances cause less problems because there is no return type. However, Java collections are generally not immutable and thus violate goal 1. It is a completely different thing if you want mutable collections, but then you would have completely different requirements and you are not talking about Scala’s ++ operator anymore. If you want OOP-compatible, immutable, type-safe collections, you will end up with something that Scala has. That is why I consider complexity arising from goal 1 to be necessary. It’s also worth nothing that you don’t always have to understand the exact details of a method signature, and in the collections API the CanBuildFrom stuff is pretty much same everywhere.
Comparison with the Scala version
Java:
<THAT> THAT concat(ImmutableCollection<A> other, ImmutableBuilder<THAT> builder);
Scala:
def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That
First thing you should note is that they are not actually equivalent. The Scala version does not violate any goal, but the Java version violates goals 2 and 3. Let’s look at all the type parameters we have here:
- A
- Element type of the first collection
- That/THAT
- The type that we expect to get out of the operation
- B
- Element type of the second collection. The definition essentially says that type B can be a supertype of A.
The next thing you should note is CanBuildFrom in the Scala version. CanBuildFrom[List[A], B, That] is a factory object that can create objects with type That, element type B from Lists of element type A. Sounds complex, but it’s not that difficult to understand. Our Java version is simpler (no A/B type parameters), but it violates our design goals. The presence of parameters B and That in the Scala version makes my previous dog/cat example possible and if you run that with Scala you will get a list of animals.
The CanBuildFrom parameter in the Scala version is implicit, meaning that it will automatically be added by the compiler to invocations based on the import scoping if possible. So 99% of the time you will not need to actually handle the parameter yourself. In the Java version you have to include the builder parameter all the time. I really have trouble understanding the fear of implicits. Many things that would be implicits in good Scala code are typically pushed to static variables or ThreadLocals in Java (this is of course not the case with the collections API).
About the syntax
Finally I must also mention that many people complain the use of operators instead of named methods. Personally I have no problem with this as long at it is not overused. The ++ operator is probably based on Haskell where list concat is done with that operator.
I’d also like to ask why don’t the complainers complain about other operators? The trivial answer is of course that most operators like +, – are based on mathematics and as such are “common knowledge”. But what about operators like modulo (% in some languages, mod in others)? What about the power operator (^, ^^ or **)? I don’t see many Python programmers crying about ** even though the syntax is not based on mathematics directly.
On the other hand, good syntax is always a matter of taste. I like the fact that for example scalaz has both named and operator versions of most important methods.
Nice explanation, thanks!
“In fact this is the equivalent to Java’s addAll() on a list”
Since the default collections in Scala are immutable, the method in question is the default way to add a list to a list. Java’s default way to add a list to a list is addAll(). Put those two things together and you get the phrase I used. (I could have explained it properly, but that would have required a lot more text on a point that was tangential to the blog I was writing). I think you, and others, have taken that line way too literally – if you use Scala daily the phrase will seem odd, but if you don’t use Scala it makes perfect sense.
More broadly, you’ve done good work above in showing how to port Scala to Java at the technical level. But, no sane Java developer would code like your final Java example. And yet lots of real work gets done in Java – how can that be? A super-strong type system is a trade-off in language design – I simply argue its not a trade-off I agree with, especially as my position is that the type system should be primarily for documentation and communication, with error checking being less important (a super-strong type system results in less clear documentation/communication, not more).
I can see your point and I agree that the default way of doing things is what gets the attention of developers (especially newcomers). But I think it’s bad if people cannot see why something is the default way.
If someone complained that Joda Time is an awful library because DateTime doesn’t have a setYear() method, wouldn’t that feel like an odd argument? After all, DateTime is an immutable class, so it’s a design choice. You would then point that person to MutableDateTime. But that doesn’t change the fact that the default way of Joda Time and most of the tutorials and documentation in the internet is to use immutable DateTime and withYear(). All this happens because you made the design decision to make many of the types immutable. (by the way, Joda Time is an awesome library, thanks for your work!)
So, in the end I strongly think that the issue here is mutable vs immutable collections, not directly Java syntax vs Scala syntax. You understand this, but the readers of your blog will probably just think “Scala is bad because list concat operation looks like that”.
If you can show me one implementation of a Java-based immutable collection API that conforms to my three reasonable design goals (Guava and Functional Java both fail goal 3), you can immediately prove my point false. But I am almost sure you cannot do that, because the design choice of immutability requires certain flexibility from the language, and Java does NOT provide that flexibility.
Java, immutability, fluent API. Choose two.
Well written. Good work, man.
Why can’t we do this?
IC<SELF extends IC> {
SELF concat(IC other);
}
IL extends IC {
IL concat(IC other) {
// …
}
}
Markup corrupted
Here: http://pastebin.com/f6HcRPg2
EDIT: I realized later I was wrong.
It is possible to use self types like that, but it makes the syntax very ugly for users because users will have to use the type parameter SELF.It also does not fix any of the variance issues (= you cannot concat list of dogs and list of cats to get list of animals)If I remember correctly, Scala actually uses self types in the collection API. But Scala supports so called abstract type members, which don’t end up in the type signature from the user’s point of view.
I don’t get your point on why the static version is bad.
Look at guava for a correct implementation of ImmutableList.
http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/ImmutableList.java
Yes, the Guava version is very good (I use it in my Java projects). But you’ll notice that there is no concat method
The static version works and is a decent solution, BUT:
1. It violates goal 3 because you would need separate concat methods for all possible collection types (ImmutableList.concat, ImmutableSet.concat, …)
2. Method calls cannot be chained, they must be nested
Compare:
ImmutableList.of(1,2,3).concat(ImmutableList.of(4,5,6)).concat(ImmutableLIst.of(7,8,9));
to:
ImmutableList.concat(ImmutableList.concat(ImmutableList.of(1,2,3), ImmutableList.of(4,5,6)), ImmutableList.concat(7,8,9));
(by the way, this is the exactly same reason why Guava
s Collections2.transform/Collections2.filter become painful to use if there are more than a few invocations)
As i said i don’t know scala so perhaps my vision is limited but …
To me you definitly need, in one way or another, to specify the type of the collection you want in return. Either using a builder or using different implementations for different return type.
which type will result of a set ++ list ? a set ? a list ?
i guess scala will return a set if you affect into a set or a set if you affect into a set (like Set myset = aset ++ alist) but what if you’re using interface ? (like Collection mycollection = aset ++ alist)
Sure you type less code in scala but then lot of magic happen behind your back and not all informations is available to the reader if he doesn’t flully *know* all the magic which happen behind is back.
This is my main complain about Scala and all write-few do-many-things languages (ruby …)
Your guesses are pretty much correct
In case you don’t specify a type yourself, the operation result has usually the same type as the first collection.
val result = List(1, 2) ++ Set(1, 2)
// result: List[Int] = List(1, 2, 3, 4)
val result = Set(1, 2) ++ List(3, 4)
// result: Set[Int] = Set(1, 2, 3, 4)
You can also specify a type yourself, as long as it is compatible with the first collection type (= either the same type or a supertype):
val result: List[Int] = Set(1, 2) ++ List(3, 4)
// COMPILE ERROR because Set is the result type
val result: List[Int] = List(1, 2) ++ Set(3, 4)
// result: List[Int] = List(1, 2, 3, 4)
val result: Seq[Int] = List(1, 2) ++ Set(3, 4)
// result: Seq[Int] = List(1, 2, 3, 4)
val result: Traversable[Int] = List(1, 2) ++ Set(3, 4)
// result: Traversable[Int] = List(1, 2, 3, 4)
Also, the inner type of the collection is always inferred in a sane way (basically it’s the first common ancestor in type hierarchy):
val result: List[Cat] = List(Cat(), Cat()) ++ Set(Cat(), Cat())
// result: List[Cat] = List(Cat, Cat, Cat, Cat)
val result: List[Animal] = List(Cat(), Cat()) ++ List(Dog(), Dog())
// result: List[Animal] = List(Cat(), Cat(), Dog(), Dog())
// Combination of both inner and collection types:
val result: Traversable[Animal] = List(Cat(), Cat()) ++ Set(Dog()) ++ Seq(Bird())
// result: Traversable[Animal] = List(Cat(), Cat(), Dog(), Bird())
Personally I find the above very intuitive. So basically:
1. If you care about the result type, specify it
2. If you don’t care about the result type, the compiler will 99% of the time choose the obvious and correct one
I am not a native speaker, but what is the meaning of “should” in goal 1? Does it mean maybe, maybe not? If so, adAll() fits perfectly…
In this context “should” could be replaced with “must”
Nice post. Stephen’s blog is a feeble attempt to hold back the Scala tide.
Pingback: This week in #Scala (02/12/2011) « Cake Solutions Team Blog
Nice explanation but you loose respect for a guy because he says something which is technically or subjectively wrong? I do not know much of Scala but I like to see views from both sides. One side correcting the other. I like to see debate on concepts not on people, in the same way Stephen did not pass any comments on Martin Odersky, although he did not like Scala.
Mainly I felt disappointed in that even though he clearly understands immutability vs mutability, he mentioned the ++ operator in a context that did not state the design differences clearly. That kind of writing style definitely spreads FUD instead of actually giving readers valuable insight on things. For example, I’ve seen so many “Hibernate is bad” posts that make unfair comparisons without actually mentioning the design and requirement differences.
That is also why I almost completely focused on the concept differences, not that much on subjective things or persons.
Self types are more difficult than they appear to be to get right. There are two intrinsic problems. First, where does the self type “stop”? Say class IC is abstract, and it is implemented through private classes IC1 and IC2, which are built from a factory. This is a common pattern and, in fact, Java collections use it. Absent some mechanism to control what “self” means, it would leak the types IC1 and IC2, which are very likely to be undesirable.
And then, there’s anonymous classes, but this is simpler to exclude with an automatic rule. On the other hand, it may be wrong to exclude anonymous classes from the “self” rule.
The other main problem is contra-variance. Sometimes, types go the other way, as see in Java with . The intersection of contra-variance and self types is very difficult to handle.
Nice analysis, and nice example. And finally it says the point – if you look for a language where (somewhat) sophisticated concepts can be expressed in something that doesn’t read as line noise to the reader on the 1st look, both Scala and Java fail, and such a programming language is yet to be invented …