Extracting buildings from aerial scene images is an important task with many applications. However, this task is highly difficult to automate due to extremely large variations of building appearances, and still heavily relies on manual work. To attack this problem, we design a deep convolutional network with a simple structure that integrates activation from multiple layers for pixel-wise prediction, and introduce the signed distance function of building boundaries to represent output, which has an enhanced representation power. To train the network, we leverage abundant building footprint data from geographic information systems (GIS) to generate large amounts of labeled data. The trained model achieves a superior performance on datasets that are significantly larger and more complex than those used in prior work, demonstrating that the proposed method provides a promising and scalable solution for automating this labor-intensive task.