[Problem Analysis] Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model

May 23, 2023, 6:08 PM • Artificial Intelligence

Background

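Search Baidu or Google for the error "Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model" and you will find plenty of write-ups about similar problems. Almost all of them, however, treat it as a problem with the file path of the pretrained checkpoint being loaded, and overlook the last word of the message, model: "Unsuccessful … Failed to find any matching files for model".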

This article uses a sample program that reproduces the problem, together with the TensorFlow r1.15 source code, to analyze it in depth.

Sample program

import tensorflow as tf

# Checkpoint file of the pretrained VGG16 model
pretrained_model = '/your/path/to/vgg_16.ckpt'

# Weight variable for the fc6 conv layer
fc6_conv = tf.get_variable("fc6_conv", [7, 7, 512, 4096], trainable=False)
# Build a Saver that restores "vgg_16/fc6/weights" from the checkpoint into fc6_conv
restorer_fc = tf.compat.v1.train.Saver({"vgg_16/fc6/weights": fc6_conv})

with tf.Session() as sess:
  graph = tf.get_default_graph()
  fetch_list = []
  # Add the outputs of every op whose name contains "save/Assign" to the fetch list
  for op in graph.get_operations():
    if "save/Assign" in op.name:
      for tensor_o in op.outputs:
        fetch_list.append(tensor_o)

  # Run the Saver's restore ops: read the weights and assign them to fc6_conv - works
  restorer_fc.restore(sess, pretrained_model)

  # Run the collected "save/Assign" ops directly
  # - fails with "... Failed to find any matching files for model"
  sess.run(fetch_list)
  # The with-block closes the session, so no explicit sess.close() is needed

The sample program and its comments are shown above. When it reaches sess.run(fetch_list), it fails with an error similar to the following and the program terminates abnormally:

tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.

  (0) Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model
     [[node save/RestoreV2 (defined at /home/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[save/RestoreV2/_7]]
  (1) Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model
     [[node save/RestoreV2 (defined at /home/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.

0 derived errors ignored.

Problem analysis

Figure: the computation graph of the sample program

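The figure above shows the sample program's computation graph. At first glance it does not match the graph the sample program expresses logically: TF adds many op nodes on its own (the Saver constructor, for example, adds a whole save/ subgraph, and graph construction, partitioning and optimization can add more), which is a major reason static TF graphs are hard to debug. For instance, the op whose execution triggers the error, the save/Assign op in the red box, is an Assign-type op that the sample program never creates explicitly through any API.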

Figure: the input tensors of the save/Assign op

The inputs of the save/Assign op are the outputs of the save/RestoreV2 op and the fc6_conv op, shown in blue boxes above; the former is a RestoreV2-type op and the latter a VariableV2-type op.

//tensorflow/core/kernels/assign_op.h

  void Compute(OpKernelContext* context) override {
    const Tensor& rhs = context->input(1);

    // We always return the input ref.

    context->forward_ref_input_to_ref_output(0, 0);
...

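The Assign kernel source above shows what the Assign op computes: it writes the tensor at input 1 into the variable referenced by input 0 and forwards that reference as its output. In the graph above, input 1 of save/Assign is the output of save/RestoreV2 and input 0 is the output of fc6_conv. In other words, the value save/RestoreV2 reads from the pretrained checkpoint is assigned to the fc6_conv variable, which is exactly how the weights get restored. So far so good: nothing here explains the error.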
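To make this data flow concrete, here is a minimal pure-Python model of what the Assign kernel does. VariableRef and assign_compute are hypothetical names, not TensorFlow API, and the real kernel also validates shapes and may reuse buffers, which this sketch omits.

```python
class VariableRef:
    """Hypothetical stand-in for the ref output of a VariableV2 op."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value


def assign_compute(ref, rhs):
    """Model of Assign: write input 1 (rhs) into the variable behind
    input 0 (ref) and forward that same ref as output 0, like
    context->forward_ref_input_to_ref_output(0, 0)."""
    ref.value = rhs
    return ref


# In the sample graph, input 0 is fc6_conv and input 1 is save/RestoreV2's output.
fc6_conv = VariableRef("fc6_conv")
restored = [[0.1, 0.2], [0.3, 0.4]]   # made-up values standing in for checkpoint weights
out = assign_compute(fc6_conv, restored)
print(out is fc6_conv, out.value)     # True [[0.1, 0.2], [0.3, 0.4]]
```

Since Assign only copies a value it has already been handed, it cannot raise a "file not found" error by itself; the failure has to come from computing input 1, i.e. from save/RestoreV2.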

//tensorflow/core/kernels/save_restore_v2_ops.cc

class RestoreV2 : public OpKernel {
...

  void Compute(OpKernelContext* context) override {
    // Checkpoint path of the pretrained model (may be a path pattern with wildcards); a wrong value here causes the error discussed in this article
    const Tensor& prefix = context->input(0);
...

        // Read the tensor values from the checkpoint file
        RestoreTensor(context, &checkpoint::OpenTableTensorSliceReader,
                      /* preferred_shard */ -1, /* restore_slice */ true,
                      /* restore_index */ i);

//tensorflow/core/kernels/save_restore_tensor.cc

void RestoreTensor(OpKernelContext* context,
                   checkpoint::TensorSliceReader::OpenTableFunction open_func,
                   int preferred_shard, bool restore_slice, int restore_index) {
  // Checkpoint path of the pretrained model (may be a path pattern with wildcards)
  const string& file_pattern = file_pattern_t.flat<string>()(0);
...

  if (!reader) {
    // Construct allocated_reader to read the checkpoint file
    allocated_reader.reset(new checkpoint::TensorSliceReader(
        file_pattern, open_func, preferred_shard));
...

//tensorflow/core/util/tensor_slice_reader.cc

TensorSliceReader::TensorSliceReader(const string& filepattern,
 ...

  Status s = Env::Default()->GetMatchingPaths(filepattern, &fnames_);
 ...

  // Expand the checkpoint path pattern into matching file paths; if no checkpoint file matches, this produces the error discussed in this article
  if (fnames_.empty()) {
    status_ = errors::NotFound(
        "Unsuccessful TensorSliceReader constructor: "
        "Failed to find any matching files for ",
        filepattern);
    return;
  }

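The key parts of the save/RestoreV2 kernel implementation are shown above, with comments added. If the path pattern in save/RestoreV2's input 0 tensor does not match any checkpoint file, constructing allocated_reader fails with the "Unsuccessful TensorSliceReader constructor: Failed to find any matching files for " error analyzed here. (In the full source, RestoreV2 first checks whether the prefix names a V2 checkpoint, i.e. whether a matching "<prefix>.index" file exists, and only otherwise falls back to this V1 reader path; that is why a missing checkpoint surfaces as a TensorSliceReader error.) Note one detail: the last word of the message, model, comes from the variable filepattern. In most reports online, filepattern at this point is an actual checkpoint path rather than "model", so the real question is why the sample ends up with the value model.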
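The lookup can be approximated in a few lines of Python. This is a sketch under assumptions: Python's glob stands in for Env::GetMatchingPaths, the V2/V1 decision is reduced to checking for "<prefix>.index", and NotFoundError is a local stand-in class rather than TensorFlow's exception.

```python
import glob
import os
import tempfile


class NotFoundError(Exception):
    """Local stand-in for TensorFlow's NotFoundError (not the real class)."""


def open_tensor_slice_reader(filepattern):
    """Approximate TensorSliceReader's constructor check.

    glob stands in for Env::GetMatchingPaths; if nothing matches, raise
    the same message the C++ code builds from `filepattern`.
    """
    fnames = sorted(glob.glob(filepattern))
    if not fnames:
        raise NotFoundError(
            "Unsuccessful TensorSliceReader constructor: "
            "Failed to find any matching files for " + filepattern)
    return fnames


def restore_v2_lookup(prefix):
    """Simplified RestoreV2 path selection (assumption: a V2 checkpoint is
    detected by "<prefix>.index"; otherwise the V1 reader is used)."""
    if glob.glob(prefix + ".index"):
        return "v2", [prefix]
    return "v1", open_tensor_slice_reader(prefix)


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "vgg_16.ckpt")
    open(ckpt, "wb").close()   # empty file standing in for a single-file checkpoint
    print(restore_v2_lookup(ckpt)[0])   # v1
    try:
        # A prefix with no files behind it, like the "model" default in the sample
        restore_v2_lookup(os.path.join(d, "model"))
    except NotFoundError as e:
        print(e)   # the message ends with the pattern that failed to match
```

The last word of the error is always whatever string reached filepattern, which is why the message itself tells you what prefix RestoreV2 actually received.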

Figure: where the value of the RestoreV2 op's input 0 tensor comes from

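From the code analysis above, input 0 of the RestoreV2 op carries the all-important checkpoint path pattern. Tracing the tensor edges in the graph shows that this input is the output of the save/Const op, whose input is the output of the save/filename op, whose own input is in turn the constant string "model". (In r1.15's saver.py, save/filename and save/Const are both created with placeholder_with_default, so "model" is only a default that takes effect when nothing is fed.) This fully explains what happens when the sample executes sess.run(fetch_list):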

sess.run(fetch_list) amounts to running the save/Assign op.
Running save/Assign requires running save/RestoreV2.
Running save/RestoreV2 requires running save/Const to obtain the checkpoint path pattern used as input 0 of the save/RestoreV2 kernel.
Running save/Const requires running save/filename, whose output is the constant string "model". The path pattern at input 0 of the save/RestoreV2 kernel is therefore "model", and constructing allocated_reader fails with "Unsuccessful … Failed to find any matching files for model".

Saver.restore API analysis

//tensorflow/python/training/saver.py

class Saver:
  ...

  def restore(self, sess, save_path):
    """Restores previously saved variables.

    This method runs the ops added by the constructor for restoring variables.

    It requires a session in which the graph was launched.  The variables to
    restore do not have to have been initialized, as restoring is itself a way
    to initialize variables.

    The save_path argument is typically a value previously returned from a
    save() call, or a call to latest_checkpoint().

    Args:
      sess: A Session to use to restore the parameters. None in eager mode.

      save_path: Path where parameters were previously saved.

    Raises:
      ValueError: If save_path is None or not a valid checkpoint.

    """
    if self._is_empty:
      return
    if save_path is None:
      raise ValueError("Can't load save_path when it is None.")

    checkpoint_prefix = compat.as_text(save_path)
...

        # The caller's checkpoint path (save_path) is fed into the filename tensor
        sess.run(self.saver_def.restore_op_name,
                 {self.saver_def.filename_tensor_name: save_path})

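Attentive readers will wonder: if running the RestoreV2 op in this graph raises an error, why doesn't restorer_fc.restore(sess, pretrained_model) in the sample fail, given that it also runs RestoreV2? As the Saver.restore source above shows, the key is that the sample passes the checkpoint path when calling restore. Inside TF, sess.run then feeds that path into the graph as the value of the filename tensor (saver_def.filename_tensor_name). Input 0 of the save/RestoreV2 kernel is therefore no longer the wrong default "model" but the checkpoint path supplied by the sample, so the checkpoint file is found and read correctly.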
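The difference between the two sess.run calls comes down to the feed dict. The sketch below models it with a hypothetical FakeSession that only reports which prefix save/RestoreV2 would receive; "save/Const:0" and "save/restore_all" are assumed to be the typical default names of a TF 1.x Saver (check restorer_fc.saver_def in practice). By the same reasoning, if the collected Assign ops really must be re-run, feeding the filename tensor, e.g. sess.run(fetch_list, {restorer_fc.saver_def.filename_tensor_name: pretrained_model}), should behave like restore(); this is untested here.

```python
# Pure-Python model, not TensorFlow. Names are assumed TF 1.x Saver defaults.
FILENAME_TENSOR_NAME = "save/Const:0"
RESTORE_OP_NAME = "save/restore_all"
DEFAULT_FILENAME = "model"   # default behind save/filename


class FakeSession:
    """Hypothetical session reporting the prefix save/RestoreV2 would get."""

    def run(self, fetches, feed_dict=None):
        feed_dict = feed_dict or {}
        # A fed filename tensor overrides the default chain; otherwise "model".
        prefix = feed_dict.get(FILENAME_TENSOR_NAME, DEFAULT_FILENAME)
        return {"fetches": fetches, "restore_prefix": prefix}


def restore(sess, save_path):
    """Mirrors the part of Saver.restore shown above."""
    if save_path is None:
        raise ValueError("Can't load save_path when it is None.")
    return sess.run(RESTORE_OP_NAME, {FILENAME_TENSOR_NAME: save_path})


sess = FakeSession()
path = "/your/path/to/vgg_16.ckpt"
print(restore(sess, path)["restore_prefix"])          # /your/path/to/vgg_16.ckpt
print(sess.run(["save/Assign:0"])["restore_prefix"])  # model
print(sess.run(["save/Assign:0"], {FILENAME_TENSOR_NAME: path})["restore_prefix"])
# -> /your/path/to/vgg_16.ckpt
```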

Original: https://blog.csdn.net/HaoBBNuanMM/article/details/123735318
Author: HaoBBNuanMM
Title: 【问题分析】Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model

Original content is protected by copyright. When reposting, please credit the source: https://www.johngo689.com/497103/

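Reposted articles remain protected by the original author's copyright. When reposting, please credit the original author and source!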
