Tensors in TensorFlow

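As the name suggests, TensorFlow is made up of tensor + flow. This article looks at what a Tensor is and how it is implemented.

Abstractly, a tensor in TensorFlow can be thought of as the following struct: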

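struct Tensor {
    std::vector<int64_t> shape; // size of each dimension of the array, e.g. a 3-D array: shape={2,3,4}
    int dtype;              // element type; determines which array type data is cast to
    void *data;             // contiguous memory holding every element of the array
};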

The shape can be changed: a 2×3 array can also be viewed as 3×2, as long as the number of elements stays the same.
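The reshape rule can be sketched in a few lines. This is only an illustration: SimpleTensor, NumElements and Reshape are hypothetical names that mirror the abstract struct above, not TensorFlow's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the abstract struct above (int elements only).
struct SimpleTensor {
  std::vector<int64_t> shape;  // size of each dimension
  std::vector<int> data;       // contiguous elements
};

// Product of all dimension sizes.
inline int64_t NumElements(const std::vector<int64_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Changes the shape in place. Returns false and leaves the tensor untouched
// if the new shape would change the number of elements.
inline bool Reshape(SimpleTensor& t, const std::vector<int64_t>& new_shape) {
  if (NumElements(new_shape) != NumElements(t.shape)) return false;
  t.shape = new_shape;  // the data buffer is not touched
  return true;
}
```

Only the shape vector changes; the element buffer stays exactly where it is, which is why a reshape is cheap.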

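data is a contiguous block of memory, just like the C++ array T[2][3]. If dtype is an integer type, it is effectively int data[2][3].

data is a pointer. Once it is cast to int *data, the elements sit at data, data+1, data+2, …, data+5.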
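As a small sketch of that pointer arithmetic, assuming row-major order and int elements (FlatOffset and At are illustrative helpers, not TensorFlow functions):

```cpp
#include <cassert>
#include <cstddef>

// Row-major offset of element (i, j) in an array with `cols` columns.
inline std::size_t FlatOffset(std::size_t i, std::size_t j, std::size_t cols) {
  return i * cols + j;
}

// Reads element (i, j) from an untyped buffer that holds int elements.
inline int At(const void* data, std::size_t i, std::size_t j,
              std::size_t cols) {
  const int* p = static_cast<const int*>(data);  // the "cast to int*" step
  return *(p + FlatOffset(i, j, cols));
}
```

For a 2×3 array, element (1, 2) is the last one, at offset 1*3 + 2 = 5.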

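A tensor can also be sliced along its first dimension. For the 2×3 array, taking index 1 of the first dimension gives data[1]; data[0] and data[1] each hold 3 elements. The slice still refers to the original tensor's memory, and because that memory is reference counted, the slice stays usable even after the original tensor has been released.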
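The sharing behaviour can be imitated with std::shared_ptr. This is a sketch under stated assumptions: View, MakeView and SliceRows are hypothetical, and TensorFlow uses its own reference-counted TensorBuffer rather than shared_ptr.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical 2-D view over shared, reference-counted storage.
struct View {
  std::shared_ptr<std::vector<int>> buf;  // storage shared between views
  int64_t offset;                         // index of the first element in buf
  int64_t rows, cols;

  int& at(int64_t r, int64_t c) const { return (*buf)[offset + r * cols + c]; }
};

// Allocates zero-initialized storage for a rows x cols view.
inline View MakeView(int64_t rows, int64_t cols) {
  return View{std::make_shared<std::vector<int>>(rows * cols, 0), 0, rows,
              cols};
}

// Rows [start, limit) of v. No elements are copied; the slice only bumps
// the reference count of the shared storage.
inline View SliceRows(const View& v, int64_t start, int64_t limit) {
  return View{v.buf, v.offset + start * v.cols, limit - start, v.cols};
}
```

Writing through the slice is visible through the original view, and dropping the original's reference leaves the slice valid because the storage is freed only when the last reference goes away.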

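Tensor implementation

A real implementation has more to consider. data must be aligned, for example to 8 bytes (the Allocator interface shown later aligns to a 64-byte boundary). The memory behind data can be allocated in different ways, so TensorFlow defines an Allocator interface that different allocation strategies implement. And because model parameters and checkpoints have to be saved, tensors must support serialization, which TensorFlow does with protobuf.

The Tensor implementation is in:

Basic operations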

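  TensorShape shape_; // shape
  TensorBuffer* buf_; // data
  • Default constructor: the result is not a scalar; it is a 1-D tensor with shape {0} and NumElements() == 0.
  • Constructor from type + shape, which allocates memory: Tensor(DataType type, const TensorShape& shape); uses the CPU allocator by default.
  • Constructor from allocator + type + shape: Tensor(Allocator* a, DataType type, const TensorShape& shape);
  • Constructor from an existing buffer: Tensor(DataType type, const TensorShape& shape, TensorBuffer* buf);
  • Constructors from a scalar constant, with many overloads, e.g. explicit Tensor(float scalar_value)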

Slicing along the first dimension does not copy data, and the result is not guaranteed to be aligned (check with IsAligned):

Tensor Slice(int64_t dim0_start, int64_t dim0_limit) const;
Tensor SubSlice(int64_t index) const;
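To see why a slice can lose alignment, here is a minimal sketch. IsAlignedTo and RowStart are hypothetical helpers, and the 64-byte boundary is borrowed from kAllocatorAlignment shown later; the boundary that TensorFlow's IsAligned() actually checks may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if ptr is a multiple of `alignment`, which must be a power of two.
inline bool IsAlignedTo(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Address where row `row` starts in a row-major float buffer with `cols`
// columns. This is the pointer a first-dimension slice would begin at.
inline const float* RowStart(const float* base, std::size_t row,
                             std::size_t cols) {
  return base + row * cols;
}
```

With a 64-byte aligned 2×3 float buffer, row 1 starts 12 bytes past the base, so it is aligned to 4 bytes but not to 64.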
  bool FromProto(const TensorProto& other) TF_MUST_USE_RESULT;
  bool FromProto(Allocator* a, const TensorProto& other) TF_MUST_USE_RESULT;

  /// \brief Fills in proto with *this tensor's content.
  ///
  /// AsProtoField() fills in the repeated field for proto.dtype(), while
  /// AsProtoTensorContent() encodes the content in proto.tensor_content()
  /// in a compact form.
  void AsProtoField(TensorProto* proto) const;
  void AsProtoTensorContent(TensorProto* proto) const;
/// Returns the data type.
DataType dtype() const { return shape_.data_type(); }

/// Returns the shape of the tensor.
const TensorShape& shape() const { return shape_; }

/// \brief Convenience accessor for the tensor shape.
///
/// For all shape accessors, see comments for relevant methods of
/// TensorShape in tensor_shape.h.
int dims() const { return shape().dims(); }

/// Convenience accessor for the tensor shape.
int64_t dim_size(int d) const { return shape().dim_size(d); }

/// Convenience accessor for the tensor shape.
int64_t NumElements() const { return shape().num_elements(); }

size_t AllocatedBytes() const;
bool IsAligned() const;
bool CopyFrom(const Tensor& other,
              const TensorShape& shape);
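// Assumes t holds float data of the matching rank.
Tensor t;
auto s = t.scalar<float>();  // access as a scalar: s()
auto v = t.vec<float>();     // access as a 1-D array: v(0)

auto m = t.matrix<float>();  // access as a matrix: m(2, 3)

// access element by element
auto flat = t.flat<float>();
float* d = flat.data();
for (int64_t i = 0; i < t.NumElements(); i++) d[i];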
  template <typename T>
  typename TTypes<T>::Flat flat() {
    return shaped<T, 1>({NumElements()});
  }

  template <typename T>
  typename TTypes<T>::UnalignedFlat unaligned_flat() {
    return unaligned_shaped<T, 1>({NumElements()});
  }
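The idea behind these accessors is that one untyped buffer can be viewed as a flat array or as a matrix without copying. A rough sketch, assuming hypothetical FlatView, MatrixView and RawTensor types (TensorFlow returns Eigen TensorMap types instead, and also checks the dtype):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical typed 1-D view over a buffer.
template <typename T>
struct FlatView {
  T* ptr;
  std::size_t size;
  T& operator()(std::size_t i) const { return ptr[i]; }
  T* data() const { return ptr; }
};

// Hypothetical typed 2-D row-major view over the same kind of buffer.
template <typename T>
struct MatrixView {
  T* ptr;
  std::size_t rows, cols;
  T& operator()(std::size_t r, std::size_t c) const {
    return ptr[r * cols + c];
  }
};

// Untyped storage; the caller must pick the T that matches the contents.
struct RawTensor {
  void* data;
  std::size_t num_elements;

  template <typename T>
  FlatView<T> flat() const {
    return {static_cast<T*>(data), num_elements};
  }

  template <typename T>
  MatrixView<T> matrix(std::size_t rows, std::size_t cols) const {
    assert(rows * cols == num_elements);  // a view must cover every element
    return {static_cast<T*>(data), rows, cols};
  }
};
```

Both views alias the same memory, so a write through the matrix view is visible through the flat view.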

// used for memcpy
/// REQUIRES: DataTypeCanUseMemcpy(dtype()).

StringPiece tensor_data() const;
void* data() const;

  std::string SummarizeValue(int64_t max_entries, bool print_v2 = false) const;
  std::string DebugString(int num_values) const;
  std::string DebugString() const { return DebugString(3); }
  std::string DeviceSafeDebugString() const;
  void FillDescription(TensorDescription* description) const;

Tensor, shape, and type are implemented in the files listed below.

tensor.h: TensorBuffer manages the memory that holds data.

$ ls tensorflow/core/framework/tensor*
tensorflow/core/framework/tensor.cc                 tensorflow/core/framework/tensor_shape.proto    tensorflow/core/framework/tensor_testutil.h
tensorflow/core/framework/tensor.h                  tensorflow/core/framework/tensor_shape_test.cc  tensorflow/core/framework/tensor_testutil_test.cc
tensorflow/core/framework/tensor.proto              tensorflow/core/framework/tensor_slice.cc       tensorflow/core/framework/tensor_types.h
tensorflow/core/framework/tensor_description.proto  tensorflow/core/framework/tensor_slice.h        tensorflow/core/framework/tensor_util.cc
tensorflow/core/framework/tensor_key.h              tensorflow/core/framework/tensor_slice.proto    tensorflow/core/framework/tensor_util.h
tensorflow/core/framework/tensor_reference.h        tensorflow/core/framework/tensor_slice_test.cc  tensorflow/core/framework/tensor_util_test.cc
tensorflow/core/framework/tensor_shape.cc           tensorflow/core/framework/tensor_test.cc
tensorflow/core/framework/tensor_shape.h            tensorflow/core/framework/tensor_testutil.cc

$ ls tensorflow/core/framework/shape*
tensorflow/core/framework/shape_inference.cc  tensorflow/core/framework/shape_inference_test.cc      tensorflow/core/framework/shape_inference_testutil.h
tensorflow/core/framework/shape_inference.h   tensorflow/core/framework/shape_inference_testutil.cc  tensorflow/core/framework/shape_inference_testutil_test.cc

$ ls tensorflow/core/framework/type*
tensorflow/core/framework/type_index.h   tensorflow/core/framework/typed_allocator.cc  tensorflow/core/framework/types.cc  tensorflow/core/framework/types.proto
tensorflow/core/framework/type_traits.h  tensorflow/core/framework/typed_allocator.h   tensorflow/core/framework/types.h   tensorflow/core/framework/types_test.cc

Data types supported by Tensor

Defined in tensorflow/core/framework/types.proto:

enum DataType {
  // Not a legal value for DataType.  Used to indicate a DataType field
  // has not been set.

  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support.

  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.

  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
  DT_UINT32 = 22;
  DT_UINT64 = 23;
}
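Since dtype decides how the raw buffer is interpreted, it also decides how many bytes each element takes. A minimal sketch for a handful of the types above; ElementSize and BufferBytes are hypothetical helpers, and the enum is a hand-copied subset that reuses the numeric tags listed in types.proto.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Subset of the DataType values above, with the same numeric tags.
enum DataType : int {
  DT_INVALID = 0,
  DT_FLOAT = 1,
  DT_DOUBLE = 2,
  DT_INT32 = 3,
  DT_UINT8 = 4,
  DT_INT64 = 9,
  DT_BOOL = 10,
};

// Bytes per element for the types handled here; 0 for anything else.
inline std::size_t ElementSize(DataType dt) {
  switch (dt) {
    case DT_FLOAT:  return sizeof(float);
    case DT_DOUBLE: return sizeof(double);
    case DT_INT32:  return sizeof(std::int32_t);
    case DT_UINT8:  return sizeof(std::uint8_t);
    case DT_INT64:  return sizeof(std::int64_t);
    case DT_BOOL:   return sizeof(bool);
    default:        return 0;
  }
}

// Total buffer size needed for num_elements elements of type dt.
inline std::size_t BufferBytes(DataType dt, std::int64_t num_elements) {
  return ElementSize(dt) * static_cast<std::size_t>(num_elements);
}
```

Types such as DT_STRING or DT_VARIANT do not have a fixed element size, so a lookup like this only works for plain numeric types.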

Serialization: tensor.proto


// Protocol buffer representing a tensor.

message TensorProto {
  DataType dtype = 1;

  // Shape of the tensor.  TODO(touts): sort out the 0-rank issues.

  TensorShapeProto tensor_shape = 2;

  // Only one of the representations below is set, one of "tensor_contents" and
  // the "xxx_val" attributes.  We are not using oneof because as oneofs cannot
  // contain repeated fields it would require another extra set of messages.

  // Version number.

  //
  // In version 0, if the "repeated xxx" representations contain only one
  // element, that element is repeated to fill the shape.  This makes it easy
  // to represent a constant Tensor with a single value.

  int32 version_number = 3;

  // Serialized raw tensor content from either Tensor::AsProtoTensorContent or
  // memcpy in tensorflow::grpc::EncodeTensorToByteBuffer. This representation
  // can be used for all tensor types. The purpose of this representation is to
  // reduce serialization overhead during RPC call by avoiding serialization of
  // many repeated small items.

  bytes tensor_content = 4;

  // Type specific representations that make it easy to create tensor protos in
  // all languages.  Only the representation corresponding to "dtype" can
  // be set.  The values hold the flattened representation of the tensor in
  // row major order.

  // DT_HALF, DT_BFLOAT16. Note that since protobuf has no int16 type, we'll
  // have some pointless zero padding for each value here.

  repeated int32 half_val = 13 [packed = true];

  // DT_FLOAT.

  repeated float float_val = 5 [packed = true];

  // DT_DOUBLE.

  repeated double double_val = 6 [packed = true];

  // DT_INT32, DT_INT16, DT_UINT16, DT_INT8, DT_UINT8.

  repeated int32 int_val = 7 [packed = true];

  // DT_STRING
  repeated bytes string_val = 8;

  // DT_COMPLEX64. scomplex_val(2*i) and scomplex_val(2*i+1) are real
  // and imaginary parts of i-th single precision complex.

  repeated float scomplex_val = 9 [packed = true];

  // DT_INT64
  repeated int64 int64_val = 10 [packed = true];

  // DT_BOOL
  repeated bool bool_val = 11 [packed = true];

  // DT_COMPLEX128. dcomplex_val(2*i) and dcomplex_val(2*i+1) are real
  // and imaginary parts of i-th double precision complex.

  repeated double dcomplex_val = 12 [packed = true];

  // DT_RESOURCE
  repeated ResourceHandleProto resource_handle_val = 14;

  // DT_VARIANT
  repeated VariantTensorDataProto variant_val = 15;

  // DT_UINT32
  repeated uint32 uint32_val = 16 [packed = true];

  // DT_UINT64
  repeated uint64 uint64_val = 17 [packed = true];
}

// Protocol buffer representing the serialization format of DT_VARIANT tensors.

message VariantTensorDataProto {
  // Name of the type of objects being serialized.

  string type_name = 1;
  // Portions of the object that are not Tensors.

  bytes metadata = 2;
  // Tensors contained within objects being serialized.

  repeated TensorProto tensors = 3;
}
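The version 0 rule in the comments above (a single repeated value fills the whole shape) can be sketched as follows. FloatProto and Decode are hypothetical stand-ins for the float_val case only; TensorFlow's actual FromProto may handle partially filled fields differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shape and float_val parts of a TensorProto.
struct FloatProto {
  std::vector<int64_t> dims;
  std::vector<float> float_val;  // flattened values in row-major order
};

// Expands the proto into a dense row-major buffer.
// Returns false if the number of values does not fit the shape.
inline bool Decode(const FloatProto& p, std::vector<float>* out) {
  int64_t n = 1;
  for (int64_t d : p.dims) n *= d;
  if (p.float_val.size() == 1) {
    out->assign(n, p.float_val[0]);  // one value is repeated to fill the shape
    return true;
  }
  if (static_cast<int64_t>(p.float_val.size()) != n) return false;
  *out = p.float_val;
  return true;
}
```

This is what makes a constant tensor cheap to serialize: a 2×3 tensor of 1.5 needs only one float_val entry.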

The following functionality is provided:

  • Deep copy of a tensor
  • Deep copy of a slice
  • Concat: concatenation
  • Split: splitting
  • ConcatSplitStrings: concatenating strings
  • CreatesStringTensorProto: deserializes a tensor with dtype=DT_STRING from a protobuf in a file
  • CreatesInt32TensorProto
  • CreatesInt64TensorProto
  • CreatesUInt32TensorProto
  • CreatesUInt64TensorProto
  • … every type has a counterpart for deserializing from a file
  • CompressTensorProtoInPlaceTooSmall: one of several TensorProto compression cases
  • CompressTensorProtoInPlaceAllEqual
  • CompressTensorProtoConstantTail
  • CompressTensorProtoNegatizeZero

tensorflow/core/framework/allocator.h


// Allocator is an abstract interface for allocating and deallocating
// device memory.

class Allocator {
 public:
  // Align to 64 byte boundary.

  static constexpr size_t kAllocatorAlignment = 64;

  virtual ~Allocator();

  // Return a string identifying this allocator
  virtual std::string Name() = 0;

  // Return an uninitialized block of memory that is "num_bytes" bytes
  // in size.  The returned pointer is guaranteed to be aligned to a
  // multiple of "alignment" bytes.

  // REQUIRES: "alignment" is a power of 2.

  virtual void* AllocateRaw(size_t alignment, size_t num_bytes) = 0;

  // Return an uninitialized block of memory that is "num_bytes" bytes
  // in size with specified allocation attributes.  The returned pointer is
  // guaranteed to be aligned to a multiple of "alignment" bytes.

  // REQUIRES: "alignment" is a power of 2.

  virtual void* AllocateRaw(size_t alignment, size_t num_bytes,
                            const AllocationAttributes& allocation_attr) {
    // The default behavior is to use the implementation without any allocation
    // attributes.

    return AllocateRaw(alignment, num_bytes);
  }

  // Deallocate a block of memory pointer to by "ptr"
  // REQUIRES: "ptr" was previously returned by a call to AllocateRaw
  virtual void DeallocateRaw(void* ptr) = 0;

  // Returns true if this allocator tracks the sizes of allocations.

  // RequestedSize and AllocatedSize must be overridden if
  // TracksAllocationSizes is overridden to return true.

  virtual bool TracksAllocationSizes() const { return false; }

  // Returns true if this allocator allocates an opaque handle rather than the
  // requested number of bytes.

  //
  // This method returns false for most allocators, but may be used by
  // special-case allocators that track tensor usage. If this method returns
  // true, AllocateRaw() should be invoked for all values of num_bytes,
  // including 0.

  //
  // NOTE: It is the caller's responsibility to track whether an allocated
  // object is a buffer or an opaque handle. In particular, when this method
  // returns true, users of this allocator must not run any constructors or
  // destructors for complex objects, since there is no backing store for the
  // tensor in which to place their outputs.

  virtual bool AllocatesOpaqueHandle() const { return false; }

  // Returns the user-requested size of the data allocated at
  // 'ptr'.  Note that the actual buffer allocated might be larger
  // than requested, but this function returns the size requested by
  // the user.

  //
  // REQUIRES: TracksAllocationSizes() is true.

  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.

  virtual size_t RequestedSize(const void* ptr) const {
    CHECK(false) << "allocator doesn't track sizes";
    return size_t(0);
  }

  // Returns the allocated size of the buffer at 'ptr' if known,
  // otherwise returns RequestedSize(ptr). AllocatedSize(ptr) is
  // guaranteed to be >= RequestedSize(ptr).

  //
  // REQUIRES: TracksAllocationSizes() is true.

  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.

  virtual size_t AllocatedSize(const void* ptr) const {
    return RequestedSize(ptr);
  }

  // Returns either 0 or an identifier assigned to the buffer at 'ptr'
  // when the buffer was returned by AllocateRaw. If non-zero, the
  // identifier differs from every other ID assigned by this
  // allocator.

  //
  // REQUIRES: TracksAllocationSizes() is true.

  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.

  virtual int64_t AllocationId(const void* ptr) const { return 0; }

  // Returns the allocated size of the buffer at 'ptr' if known,
  // otherwise returns 0. This method can be called when
  // TracksAllocationSizes() is false, but can be extremely slow.

  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.

  virtual size_t AllocatedSizeSlow(const void* ptr) const {
    if (TracksAllocationSizes()) {
      return AllocatedSize(ptr);
    }
    return 0;
  }

  virtual absl::optional<AllocatorStats> GetStats() { return absl::nullopt; }

  virtual bool ClearStats() TF_MUST_USE_RESULT { return false; }

  virtual void SetSafeFrontier(uint64 count) {}

  // Returns the type of the memory allocated by this allocator.

  virtual AllocatorMemoryType GetMemoryType() const {
    return AllocatorMemoryType::kUnknown;
  }
};

You can subclass Allocator to implement your own allocator.
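As a rough, TensorFlow-free sketch of what such a subclass involves: MiniAllocator below is a hypothetical interface trimmed down to three of the methods shown above, and CountingAllocator is an illustrative implementation that aligns by over-allocating with malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical, trimmed-down interface modeled on the Allocator class above.
class MiniAllocator {
 public:
  virtual ~MiniAllocator() = default;
  virtual std::string Name() = 0;
  // REQUIRES: alignment is a power of 2.
  virtual void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) = 0;
  virtual void DeallocateRaw(void* ptr) = 0;
};

// Illustrative allocator that also counts live allocations.
class CountingAllocator : public MiniAllocator {
 public:
  std::string Name() override { return "counting_cpu"; }

  void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) override {
    if (alignment < alignof(void*)) alignment = alignof(void*);
    // Over-allocate: room to round up plus one slot for the malloc pointer.
    void* raw = std::malloc(num_bytes + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;  // remember what to free
    ++live_allocations_;
    return reinterpret_cast<void*>(aligned);
  }

  void DeallocateRaw(void* ptr) override {
    if (ptr == nullptr) return;
    std::free(static_cast<void**>(ptr)[-1]);  // free the original pointer
    --live_allocations_;
  }

  int live_allocations() const { return live_allocations_; }

 private:
  int live_allocations_ = 0;
};
```

The sketch is not thread safe. The CPUAllocator shown next guards its statistics with a mutex for that reason.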

tensorflow/core/framework/cpu_allocator_impl.h


class CPUAllocator : public Allocator {
 public:
  CPUAllocator()
      : single_allocation_warning_count_(0),
        total_allocation_warning_count_(0) {}

  ~CPUAllocator() override {}

  string Name() override { return "cpu"; }

  void* AllocateRaw(size_t alignment, size_t num_bytes) override {
    if (num_bytes > static_cast<size_t>(LargeAllocationWarningBytes()) &&
        single_allocation_warning_count_ < kMaxSingleAllocationWarnings) {
      ++single_allocation_warning_count_;
      LOG(WARNING) << "Allocation of " << num_bytes << " exceeds "
                   << 100 * kLargeAllocationWarningThreshold
                   << "% of free system memory.";
    }

    void* p = port::AlignedMalloc(num_bytes, alignment);
    if (cpu_allocator_collect_stats) {
      const std::size_t alloc_size = port::MallocExtension_GetAllocatedSize(p);
      mutex_lock l(mu_);
      ++stats_.num_allocs;
      stats_.bytes_in_use += alloc_size;
      stats_.peak_bytes_in_use =
          std::max(stats_.peak_bytes_in_use, stats_.bytes_in_use);
      stats_.largest_alloc_size =
          std::max(stats_.largest_alloc_size, alloc_size);

      if (stats_.bytes_in_use > TotalAllocationWarningBytes() &&
          total_allocation_warning_count_ < kMaxTotalAllocationWarnings) {
        ++total_allocation_warning_count_;
        LOG(WARNING) << "Total allocated memory " << stats_.bytes_in_use
                     << "exceeds " << 100 * kTotalAllocationWarningThreshold
                     << "% of free system memory";
      }
      if (p != nullptr) {
        AddTraceMe("MemoryAllocation", p, num_bytes, alloc_size);
      }
    }
    return p;
  }

  void DeallocateRaw(void* ptr) override {
    if (cpu_allocator_collect_stats) {
      const std::size_t alloc_size =
          port::MallocExtension_GetAllocatedSize(ptr);
      mutex_lock l(mu_);
      stats_.bytes_in_use -= alloc_size;
      AddTraceMe("MemoryDeallocation", ptr, 0, alloc_size);
    }
    port::AlignedFree(ptr);
  }

  void AddTraceMe(absl::string_view traceme_name, const void* chunk_ptr,
                  std::size_t req_bytes, std::size_t alloc_bytes) {
    tensorflow::profiler::TraceMe::InstantActivity(
        [this, traceme_name, chunk_ptr, req_bytes,
         alloc_bytes]() TF_NO_THREAD_SAFETY_ANALYSIS {
          const auto& annotation =
              profiler::ScopedMemoryDebugAnnotation::CurrentAnnotation();
          return tensorflow::profiler::TraceMeEncode(
              traceme_name, {{"allocator_name", Name()},
                             {"bytes_reserved", stats_.bytes_reserved},
                             {"bytes_allocated", stats_.bytes_in_use},
                             {"peak_bytes_in_use", stats_.peak_bytes_in_use},
                             {"requested_bytes", req_bytes},
                             {"allocation_bytes", alloc_bytes},
                             {"addr", reinterpret_cast<uint64>(chunk_ptr)},
                             {"tf_op", annotation.pending_op_name},
                             {"id", annotation.pending_step_id},
                             {"region_type", annotation.pending_region_type},
                             {"data_type", annotation.pending_data_type},
                             {"shape", annotation.pending_shape_func()}});
        },
        /*level=*/profiler::TraceMeLevel::kInfo);
  }

  absl::optional<AllocatorStats> GetStats() override {
    if (!cpu_allocator_collect_stats) return absl::nullopt;
    mutex_lock l(mu_);
    return stats_;
  }

  bool ClearStats() override {
    if (!cpu_allocator_collect_stats) return false;
    mutex_lock l(mu_);
    stats_.num_allocs = 0;
    stats_.peak_bytes_in_use = stats_.bytes_in_use;
    stats_.largest_alloc_size = 0;
    return true;
  }

  size_t AllocatedSizeSlow(const void* ptr) const override {
    return port::MallocExtension_GetAllocatedSize(ptr);
  }

  AllocatorMemoryType GetMemoryType() const override {
    return AllocatorMemoryType::kHostPageable;
  }

 private:
  mutex mu_;
  AllocatorStats stats_ TF_GUARDED_BY(mu_);

  // Use <atomic> for single allocations to avoid mutex contention when
  // statistics are disabled.
  std::atomic<int> single_allocation_warning_count_;
  int total_allocation_warning_count_ TF_GUARDED_BY(mu_);

  TF_DISALLOW_COPY_AND_ASSIGN(CPUAllocator);
};

// Register the CPU allocator
REGISTER_MEM_ALLOCATOR("DefaultCPUAllocator", 100, CPUAllocatorFactory);

Allocator registration

tensorflow/core/framework/allocator_registry.h


class AllocatorFactoryRegistry {
 public:
  AllocatorFactoryRegistry() {}
  ~AllocatorFactoryRegistry() {}

  void Register(const char* source_file, int source_line, const string& name,
                int priority, AllocatorFactory* factory);

  // Returns 'best fit' Allocator.  Find the factory with the highest priority
  // and return an allocator constructed by it.  If multiple factories have
  // been registered with the same priority, picks one by unspecified criteria.

  Allocator* GetAllocator();

  // Returns 'best fit' SubAllocator.  First look for the highest priority
  // factory that is NUMA-enabled.  If none is registered, fall back to the
  // highest priority non-NUMA-enabled factory.  If NUMA-enabled, return a
  // SubAllocator specific to numa_node, otherwise return a NUMA-insensitive
  // SubAllocator.

  SubAllocator* GetSubAllocator(int numa_node);

  // Returns the singleton value.

  static AllocatorFactoryRegistry* singleton();

  ProcessStateInterface* process_state() const { return process_state_; }

 protected:
  friend class ProcessState;
  ProcessStateInterface* process_state_ = nullptr;

 private:
  mutex mu_;
  bool first_alloc_made_ = false;
  struct FactoryEntry {
    const char* source_file;
    int source_line;
    string name;
    int priority;
    std::unique_ptr<AllocatorFactory> factory;
    std::unique_ptr<Allocator> allocator;
    // Index 0 corresponds to kNUMANoAffinity, other indices are (numa_node +
    // 1).
    std::vector<std::unique_ptr<SubAllocator>> sub_allocators;
  };
  std::vector<FactoryEntry> factories_ TF_GUARDED_BY(mu_);

  // Returns any FactoryEntry registered under 'name' and 'priority',
  // or 'nullptr' if none found.

  const FactoryEntry* FindEntry(const string& name, int priority) const
      TF_EXCLUSIVE_LOCKS_REQUIRED(mu_);

  TF_DISALLOW_COPY_AND_ASSIGN(AllocatorFactoryRegistry);
};
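The "highest priority wins" lookup described in the GetAllocator() comment can be sketched like this. NamedAllocator and MiniRegistry are hypothetical; the real registry also handles NUMA sub-allocators, locking, and caching of the constructed allocator.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an allocator; only carries a name.
struct NamedAllocator {
  std::string name;
};

class MiniRegistry {
 public:
  using Factory = std::function<NamedAllocator()>;

  void Register(const std::string& name, int priority, Factory factory) {
    entries_.push_back({name, priority, std::move(factory)});
  }

  // Builds an allocator from the highest-priority factory. On a tie this
  // sketch keeps the earliest registration; the real registry leaves the
  // choice unspecified. Returns false if nothing has been registered.
  bool GetAllocator(NamedAllocator* out) const {
    const Entry* best = nullptr;
    for (const Entry& e : entries_) {
      if (best == nullptr || e.priority > best->priority) best = &e;
    }
    if (best == nullptr) return false;
    *out = best->factory();
    return true;
  }

 private:
  struct Entry {
    std::string name;
    int priority;
    Factory factory;
  };
  std::vector<Entry> entries_;
};
```

Under this scheme, a factory registered with a priority above the 100 used for DefaultCPUAllocator above would be chosen instead of it.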

Original: https://blog.csdn.net/wyg_031113/article/details/124511745
Author: wyg_031113
Title: tensorflow之tensor
