story

I initially became interested in this problem while talking with a friend. I don’t remember how we got on this topic.

The number of trees with \(L\) leaves is ill-defined; since, without imposing additional constraints, there are an infinite number of such trees. Two potentially useful additional constraints come to mind: limiting the total number of nodes and limiting the height. This post was rushed a bit because I wanted to test this new blogging platform. As a result, this post will only focus on the first constraint and impose the additional constraint that the tree be binary.

problem

How many unique binary trees are there with \(N\) nodes and \(L\) leaves?

solution

The solution to this problem is most easily described by a recurrence relation.

\[ C_{N,L} = \sum_{0 \le k \le N-1} \sum_{0 \le b \le L} C_{k,b}C_{N-k-1, L-b} + \delta_{N,0}\delta_{L,0} + \delta_{N,1}\delta_{L,1} \]

for \(N \gt L \gt 0\), \(N=L=0\) and \(N=L=1\) and \(C_{N,L} = 0\) otherwise.

In the above equation, \(\delta\) is the Kronecker delta. The \(\delta_{N,0}\delta_{L,0}\) term represents the empty tree and \(\delta_{N,1}\delta_{L,1}\) term represents the tree with one node.

I do not have a proof of correctness beyond what is implicit in the formulation above, but there are other ways to gain confidence in this answer.

Firstly, it is fortunate that the number of binary trees with \(N\) nodes (without the constraint on number of leaves) has a well known solution. Namely, it is given by the Catalan numbers which is defined using a similar recurrence relationship.

The Catalan numbers, \(C_N\) are given by the following formula: \[ C_N = \frac{1}{N+1} \binom{2N}{N} \]

It is obvious that the following relationship should hold, \[ C_N = \sum_{0 \le b \le N} C_{N,b} \]

This has been verified for up to \(N=100\). There is probably a simple inductive proof that can be made here.

It is also easy to verify that it is correct for \(L=1\) producing the expected \(C_{N,1} = 2^{N-1}\).

discussion

The Catalan numbers have many applications in combinatorics, this extention to the Catalan numbers may also be useful in some of those counting problems (by mapping problems for which the number of leaves has a meaningful image).

This recurrence relationship can be easily extended to a general n-ary tree using something similar to

\[ C_{N,L} = \sum_{p_i \in P} \sum_{b_i \in B} \prod_{i=0}^{I} C_{p_i,b_i} + \delta_{N,0}\delta_{L,0} + \delta_{N,1} \delta_{L,1} \]

Where \(P\) and \(B\) are the set of weak compositions of length \(I\) of \(N\) and \(L\) respectively. Where \(I\) determines the arity of the tree.

It would be interesting to develop an explicit formula for \(C_{N,L}\) perhaps by making use of generating functions.

I limited the scope of this problem since this post was rushed. I’m interested in tackling a few of the questions mentioned above in the future.

notes

  • all terms up to \(N=100\) can be found here