One-shot action recognition is a particularly challenging task because very few training samples are available. In one-shot video action recognition, selecting frames at random from cluttered frame features can lead to poor performance. To use the most valuable frames in a better feature space, this paper proposes Hierarchical Temporal Memory Enhanced One-shot Distance Learning (HED). First, we introduce temporal triplets built from different frames, so that the intra-class distance decreases while the inter-class distance increases. Second, Hierarchical Temporal Memory (HTM), a biologically plausible unsupervised model for sequence prediction, is employed to enhance one-shot action recognition by finding the most valuable frames in a video sequence. Finally, the selected frames, together with the model trained on temporal triplets, are used to predict the corresponding category label. Extensive experiments on three benchmark datasets (i.e., UCF11, UCF50, and HMDB51) demonstrate that the proposed method achieves significant improvements over state-of-the-art methods.
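
The distance-learning idea summarized above can be illustrated with a minimal sketch. It assumes a standard margin-based triplet loss on frame embeddings and a nearest-neighbor rule for assigning the one-shot label; the abstract does not give the exact loss or decision rule used by HED, so the function names (`triplet_loss`, `one_shot_label`), the Euclidean distance, and the `margin` parameter are all illustrative assumptions rather than the paper's formulation. HTM-based frame selection is not modeled here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic margin-based triplet loss (assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive (same class)
    than to the negative (different class) by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def one_shot_label(query, support):
    """Assign the label of the nearest support embedding (hypothetical decision rule).

    `support` maps each class label to the embedding of its single example.
    """
    return min(support, key=lambda label: euclidean(query, support[label]))


# Toy 2-D embeddings standing in for frame features.
support = {"run": [1.0, 0.0], "jump": [0.0, 1.0]}
predicted = one_shot_label([0.9, 0.1], support)
```

Minimizing a loss of this kind pulls embeddings of frames from the same class together and pushes embeddings from different classes apart, which is the intra-class and inter-class behavior the abstract describes.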