The rapid development of remote sensing (RS) imaging technology has produced images that are larger, higher in resolution, and more structurally complex, which puts them beyond the reach of classical matching methods based on hand-crafted features. In this paper, we propose a feature learning approach based on two-branch networks that transforms the image matching task into a two-class classification problem. To match two key points, the two image patches centered at those key points are fed into the proposed network. The network aims to learn discriminative feature representations for patch matching, so that more matching pairs can be obtained while maintaining high subpixel matching accuracy. The proposed network adopts a two-stage training mode to deal with the complex characteristics of RS images. An adaptive sample selection strategy is proposed that determines the size of each patch from the scale of its central key point, so that each patch preserves the texture structure around its key point instead of all patches sharing a predetermined size. In the matching prediction stage, two strategies, namely a superpixel-based sample graded strategy and superpixel-based ordered spatial matching, are designed to improve matching efficiency and matching accuracy, respectively. Experimental results and theoretical analysis demonstrate the feasibility, robustness, and effectiveness of the proposed method.
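To make the adaptive sample selection idea concrete, the following is a minimal sketch of how a patch size could be derived from a key point's scale. It is not the method's actual implementation: the function names, the scale multiplier `k`, the clamping bounds, the border handling, and the fixed output size of 32 are all hypothetical choices made for illustration only.

```python
def patch_half_size(scale, k=6.0, min_half=8, max_half=64):
    """Half-width of the patch, proportional to the key point scale.

    k, min_half and max_half are illustrative assumptions.
    """
    half = int(round(k * scale))
    return max(min_half, min(max_half, half))


def crop_patch(image, x, y, half):
    """Crop a (2*half) x (2*half) patch centered at (x, y).

    `image` is a 2-D grayscale image given as a list of rows.
    Coordinates outside the image are clamped to the nearest
    border pixel (an assumed border policy).
    """
    h, w = len(image), len(image[0])
    cx, cy = int(round(x)), int(round(y))
    return [
        [image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]
         for c in range(cx - half, cx + half)]
        for r in range(cy - half, cy + half)
    ]


def resample(patch, out_size=32):
    """Nearest-neighbour resampling to a fixed square size, so that
    patches of different extents can share one network input shape
    (an assumption about how variable-size patches are consumed)."""
    n = len(patch)
    idx = [i * n // out_size for i in range(out_size)]
    return [[patch[r][c] for c in idx] for r in idx]


# Usage: a synthetic 100 x 100 image and a key point of scale 2.0.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
half = patch_half_size(2.0)
patch = crop_patch(image, 50, 50, half)
fixed = resample(patch)
```

Under these assumptions, a key point detected at a larger scale yields a patch that covers more of the surrounding texture, while every patch is still brought to the same size before being passed to a network branch.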