Zhenyu’s Blog

陌上发花,可以缓缓醉矣, 忍把浮名,换了浅酌低唱。

Hibernate N+1 问题

| Comments

考虑下面的一对多关系:一个Department包含多个Employee:

public class Employee
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual string IdentityNumber { get; set; }
    public virtual Department Department { get; set; }
}

public class Department
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }     
    public virtual IList<Employee> Employees { get; set; }
}

数据库中已有数据如下:

nhibernate_n_1_db_data

默认情况下,Department.Employees是Lazy Load的,也就是说,当查询Department时,hibernate并不会将对应的employee也从数据库中加载进来, 而是在第一次调用Department.Employees时加载。

N+1问题是如何出现的

当应用程序需要遍历多个Department及其Employee时,问题就出现了,看测试代码:

[Test]
public void show_n_plus_1_problem()
{
     var modelMapper = new ModelMapper();
     modelMapper.Class<Department>(cm =>
     {
          cm.Table("Department");
          cm.Id(x => x.Id, map => map.Generator(Generators.Identity));
          cm.Property(x => x.Name);
          cm.Bag(x => x.Employees, map => map.Key(k => k.Column("DepartmentId")), rel => rel.OneToMany());
     });
     MapEmployee(modelMapper);

     using (var sessionFactory = BuildSessionFactory(modelMapper))
     using(var session = sessionFactory.OpenSession())
     {
          // 1 query for retrieve all departments, got n departments
          // n query to retrieve all employees
          var departments = session.QueryOver<Department>().List<Department>();
          Assert.AreEqual(3, departments.Count);
          departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name)));
     }
}

  为了能在测试中动态改变nhibernate映射,这里使用了nHibernate 3.2新增的map by code特性。

private void MapEmployee(ModelMapper modelMapper)
{
     modelMapper.Class<Employee>(cm =>
         {
             cm.Table("Employee");
              cm.Id(x => x.Id, map => map.Generator(Generators.Identity));
             cm.Property(x => x.Name);
             cm.Property(x => x.IdentityNumber);
              cm.ManyToOne(x => x.Department, map => map.Column("DepartmentId"));
     });
}

private ISessionFactory BuildSessionFactory(ModelMapper modelMapper)
{
     var configuration = new Configuration();
     configuration.Configure();

     HbmMapping hbmMapping = modelMapper.CompileMappingForAllExplicitlyAddedEntities();
     Console.WriteLine(hbmMapping.AsString());

     configuration.AddDeserializedMapping(hbmMapping, "");

     return configuration.BuildSessionFactory();
}

运行测试可以发现,hibernate实际执行了如下sql查询:

SELECT Id, Name FROM Department
SELECT Id, Name, IdentityNumber, DepartmentId FROM Employee WHERE DepartmentId=1
SELECT Id, Name, IdentityNumber, DepartmentId FROM Employee WHERE DepartmentId=2
SELECT Id, Name, IdentityNumber, DepartmentId FROM Employee WHERE DepartmentId=3

当执行session.QueryOver<Department>().List<Department>()时, nhibernate执行了1次查询SELECT Id, Name FROM Department, 此时department.Employees中并没有加载出对应的 Employee,而每次调用department.Employees时,nhibernate都会从Employee表中加载相应的Employee。

因此,对N个Department,实际会执行N+1次查询。这将对性能造成很大影响。

如何解决N+1问题

(1)采用Batch Load批量加载

[Test]
public void resolve_n_plus_1_problem_by_batch_load()
{
     var modelMapper = new ModelMapper();
     modelMapper.Class<Department>(cm =>
     {
          cm.Table("Department");
          cm.Id(x => x.Id, map => map.Generator(Generators.Identity));
          cm.Property(x => x.Name);
          cm.Bag(x => x.Employees, map =>
          {
               map.Key(k => k.Column("DepartmentId"));
               map.BatchSize(2); // add batch size to batch load employees
          }, rel => rel.OneToMany());
     });
     MapEmployee(modelMapper);

     using (var sessionFactory = BuildSessionFactory(modelMapper))
     using (var session = sessionFactory.OpenSession())
     {
          // 1 query for retrieve all departments, got n departments
          // ceil(n/BatchSize) query to retrieve all employees
          var departments = session.QueryOver<Department>().List<Department>();
          Assert.AreEqual(3, departments.Count);
          departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name)));
     }
}

nHibernate实际执行查询:

SELECT this_.Id as Id0_0_, this_.Name as Name0_0_ FROM Department this_
SELECT ... FROM Employee employees0_ WHERE employees0_.DepartmentId in (1, 2);
SELECT ... FROM Employee employees0_ WHERE employees0_.DepartmentId = 3

  这种方式并没有完全解决N+1查询问题,但显著减少了查询次数,实际查询为ceil(n/BatchSize) + 1次,BatchSize足够大时,查询次数为2次。

(2)使用Join Fetch一次性加载所有数据

[Test]
public void resolve_n_plus_1_problem_by_join_fetch()
{
     var modelMapper = new ModelMapper();
     modelMapper.Class<Department>(cm =>
     {
          cm.Table("Department");
          cm.Id(x => x.Id, map => map.Generator(Generators.Identity));
          cm.Property(x => x.Name);
          cm.Bag(x => x.Employees, map =>
          {
               map.Key(k => k.Column("DepartmentId"));
               map.Fetch(CollectionFetchMode.Join); // eager load employees
          }, rel => rel.OneToMany());
     });
     MapEmployee(modelMapper);

     using (var sessionFactory = BuildSessionFactory(modelMapper))
     using (var session = sessionFactory.OpenSession())
     {
          // 1 query for retrieve all departments and employees
          var departments = new HashSet<Department>(session.QueryOver<Department>().List<Department>()).ToList();
          Assert.AreEqual(3, departments.Count);
          departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name)));
     }
}

  nHibernate实际执行查询:

SELECT ... FROM Department this_ left outer join Employee employees2_ on this_.Id=employees2_.DepartmentId

  需要注意的是,这种join fetch会导致重复记录,比如本例中,因为Jim和Rechard同属于Finance部门,所以departments实际查询出4条记录, 还需用Set过一遍以保证departments集合元素的唯一性。

另外,这种在映射中直接配置join fetch的做法也不够灵活,在某些场景中,可能并不关心Department.Employees,join fetch没有必要。

(3)Department.Employees映射仍然是Lazy的,仅在需要时通过join fetch加载

[Test]
public void resolve_n_plus_1_problem_by_join_fetch_when_necessary()
{
     var modelMapper = new ModelMapper();
     modelMapper.Class<Department>(cm =>
     {
          cm.Table("Department");
          cm.Id(x => x.Id, map => map.Generator(Generators.Identity));
          cm.Property(x => x.Name);
          cm.Bag(x => x.Employees, map => map.Key(k => k.Column("DepartmentId")), rel => rel.OneToMany());
     });
     MapEmployee(modelMapper);

     using (var sessionFactory = BuildSessionFactory(modelMapper))
     using (var session = sessionFactory.OpenSession())
     {
          // 1 query for retrieve all departments and all employees
          // Note: the retrieved departments will be duplicated if excluding the 'distinct' keyword
          var departments = session.CreateQuery("select distinct d from Department d left join fetch d.Employees").List<Department>();
          Assert.AreEqual(3, departments.Count);
          departments.ForEach(department => department.Employees.ForEach(employee => Console.WriteLine(employee.Name)));
     }
}

nHibernate实际执行查询:

select distinct d... from Department department0_ left outer join Employee employees1_ on department0_.Id=employees1_.DepartmentId

  实际应用中,往往有些场景只关心department的数据,而不关心department.Employees; 而有些场景恰恰相反,关心department.Employees。 上面的例子就很有价值了,默认department.Employees为lazy load,在需要遍历department.Employee的场景使用另外的方法加载department, 例如:

public class DepartmentRepository 
{
    ....
    public IList<Department> FindAll()
    {
        return session.CreateQuery("select distinct d from Department d left join fetch d.Employees").List<Department>();
    }
}

Comments