Iterating through relations – batches

Scenario

The application retrieves all books for an author. It iterates through all chapters and counts the number of characters. An alternative scenario could go through orders of a customer and check if one of the order position can already be delivered.

The query for the books:

List<Book> books = session.createQuery(
   "from Book b where b.name like ?").setString(0, "Java%").list();

The following code printing the books will create one SQL query per book to initialize the chapters. We get 1+n queries in total. One for the books and n for the chapters, if we have n books.

for (Book book : books) {
   int totalLength = 0;
   for (Chapter chapter : book.getChapters()) {
      totalLength += (chapter.getContent() != null ?
      chapter.getContent().length() : 0);
   }
   log.info("Length of all chapters: " + totalLength);
}

Solution

One way to improve this is to define that Hibernate loads the chapters in batches. Here is the mapping:

Annotations. 

@OneToMany(cascade = CascadeType.ALL)
@JoinColumn(nullable = false)
@BatchSize(size = 4)
private Set<Chapter> chapters = new HashSet<Chapter>();

XML. 

<set name="chapters" batch-size="4">
  <key column="book_id"></key>
  <one-to-many/>
</set>

When iterating through 10 books, Hibernate will load the chapters for the first four, the next four and the last two books together in a single query. This is possible because the java.util.List returned by Hibernate is bewitched. Take the sample code of this chapter and play around with the batchsize in the method efficientBatchSizeForRelation in the class PerformanceTest.

The best size of the batch size is the number of entries you print normally on the screen. If you print a book and print only the first 10 chapters, then this could be your batch size.

It is possible to set a default for all relations in the Hibernate configuration.

<property name="default_batch_fetch_size">4</property>

Use this property with care. If you print on most screens the first 5 entries from a collection, a batch size of 100 is pretty useless. The default should be very low. Keep in mind that a size of 2 reduces the queries already by 50 % and 4 by 75 %.