pt-datasets: usage and fixing installation errors

Project description

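pt-datasets is a collection of PyTorch datasets. I struggled to install it with pip while deploying the ag-news-ae-clustering project, so I am recording the process here.

That project describes itself as follows. An autoencoder is a neural network that aims to reconstruct its input. It learns to do so by learning the most salient features of the data. These salient features are encoded in a latent space, a feature representation with lower dimensionality than the original feature space. The latent code of an autoencoder can be used for downstream tasks such as classification, regression, and clustering. In this simple example, the latent code representation of text is clustered with the k-Means algorithm. The goal of the work is not to reach state-of-the-art performance, but to show that latent codes from an autoencoder can be used for downstream tasks, just like features from principal component analysis, linear discriminant analysis, and locally linear embedding.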

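Overview

This repository is meant to give you easier and faster access to commonly used benchmark datasets. With it, you can load a dataset into a PyTorch model in a ready-to-use form. It can also load low-dimensional features of those datasets, encoded with PCA, t-SNE, or UMAP.

Datasets

Usage

It is recommended to use a virtual environment to isolate project dependencies.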

$ virtualenv env --python=python3  # we use Python 3
$ pip install pt-datasets  # install the package
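Before running pip install, it can help to confirm that the active interpreter really is the one inside the virtual environment. The following is a minimal sketch using only the standard library; the function name `in_virtualenv` is made up for illustration and is not part of pt-datasets.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the current interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no virtual environment is active, the package would be installed into the global interpreter instead.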

We can then use the package to load ready-made data loaders:

from pt_datasets import load_dataset, create_dataloader

# load the training and test data
train_data, test_data = load_dataset(name="cifar10")

# create a data loader for the training data
train_loader = create_dataloader(
    dataset=train_data, batch_size=64, shuffle=True, num_workers=1
)

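# train a model using the data loader
# (`model` is a placeholder for your own model object; it is not defined in this snippet)
model.fit(train_loader, epochs=10)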
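To make the `batch_size` and `shuffle` arguments above more concrete, here is a pure-Python sketch of what batching means. It is only an illustration, not how PyTorch or pt-datasets implement their data loaders, and the helper name `iter_batches` is hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence


def iter_batches(
    items: Sequence,
    batch_size: int,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List]:
    """Yield consecutive batches of at most `batch_size` items."""
    indices = list(range(len(items)))
    if shuffle:
        # Shuffle the visiting order, not the data itself.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [items[i] for i in indices[start:start + batch_size]]


if __name__ == "__main__":
    for batch in iter_batches(list(range(5)), batch_size=2):
        print(batch)
```

The last batch may be smaller than `batch_size` when the number of items is not a multiple of it.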
We can also encode the dataset features into a lower-dimensional space:

import seaborn as sns
import matplotlib.pyplot as plt
from pt_datasets import load_dataset, encode_features

# load the training and test data
train_data, test_data = load_dataset(name="fashion_mnist")

# get the numpy array of the features
# the encoders can only accept np.ndarray types
train_features = train_data.data.numpy()

# flatten the tensors
train_features = train_features.reshape(
    train_features.shape[0], -1
)

# get the labels
train_labels = train_data.targets.numpy()

# get the class names
classes = train_data.classes

# encode training features using t-SNE
encoded_train_features = encode_features(
    features=train_features,
    seed=1024,
    encoder="tsne"
)

# use seaborn styling
sns.set_style("darkgrid")

# scatter plot each feature w.r.t class
for index in range(len(classes)):
    plt.scatter(
        encoded_train_features[train_labels == index, 0],
        encoded_train_features[train_labels == index, 1],
        label=classes[index],
        edgecolors="black"
    )
plt.legend(loc="upper center", title="Fashion-MNIST classes", ncol=5)
plt.show()

Running pip install pt-datasets fails with: error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [47 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-37
      creating build\lib.win-amd64-cpython-37\MulticoreTSNE
      copying MulticoreTSNE\__init__.py -> build\lib.win-amd64-cpython-37\MulticoreTSNE
      creating build\lib.win-amd64-cpython-37\MulticoreTSNE\tests
      copying MulticoreTSNE\tests\test_base.py -> build\lib.win-amd64-cpython-37\MulticoreTSNE\tests
      copying MulticoreTSNE\tests\__init__.py -> build\lib.win-amd64-cpython-37\MulticoreTSNE\tests
      running egg_info
      writing MulticoreTSNE.egg-info\PKG-INFO
      writing dependency_links to MulticoreTSNE.egg-info\dependency_links.txt
      writing requirements to MulticoreTSNE.egg-info\requires.txt
      writing top-level names to MulticoreTSNE.egg-info\top_level.txt
      reading manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE.txt'
      writing manifest file 'MulticoreTSNE.egg-info\SOURCES.txt'
      running build_ext
      cmake version 3.18.4
     
      CMake suite maintained and supported by Kitware (kitware.com/cmake).
      -- Building for: NMake Makefiles
      -- The CXX compiler identification is unknown
      CMake Error at CMakeLists.txt:1 (PROJECT):
        The CMAKE_CXX_COMPILER:
     
          cl
     
        is not a full path and was not found in the PATH.
     
        To use the NMake generator with Visual C++, cmake must be run from a shell
        that can use the compiler cl from the command line.  This environment is
        unable to invoke the cl compiler.  To fix this problem, run cmake from the
        Visual Studio Command Prompt (vcvarsall.bat).
     
        Tell CMake where to find the compiler by setting either the environment
        variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
        to the compiler, or to the compiler name if it is in the PATH.
     
     
      -- Configuring incomplete, errors occurred!
      See also "C:/Users/Administrator/AppData/Local/Temp/pip-install-1wqz9ixa/multicoretsne_e640b7e6100b44f487a8475a7e691c2e/build/temp.win-amd64-cpython-37/Release/CMakeFiles/CMakeOutput.log".
      See also "C:/Users/Administrator/AppData/Local/Temp/pip-install-1wqz9ixa/multicoretsne_e640b7e6100b44f487a8475a7e691c2e/build/temp.win-amd64-cpython-37/Release/CMakeFiles/CMakeError.log".
     
      ERROR: Cannot generate Makefile. See above errors.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for MulticoreTSNE
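The long log boils down to its last line: the wheel that failed to build is MulticoreTSNE, a dependency, not pt-datasets itself. As a small sketch, the failing package names can be pulled out of saved pip output with a regular expression. The helper name `failed_wheels` is hypothetical, and the pattern only assumes the "Failed building wheel for ..." wording seen in the log above.

```python
import re
from typing import List

# Matches the wording pip printed in the log above.
_FAILED_WHEEL = re.compile(r"Failed building wheel for (\S+)")


def failed_wheels(pip_output: str) -> List[str]:
    """Return the package names whose wheel build failed, in order, without duplicates."""
    names: List[str] = []
    for match in _FAILED_WHEEL.finditer(pip_output):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = "  ERROR: Failed building wheel for MulticoreTSNE\n"
    print(failed_wheels(sample))
```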

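### Solution:
This kind of problem is common on Windows when installing Python packages that need C++ compilation. The step-by-step solution follows.


Cause of the problem

The error shows that MulticoreTSNE needs a C++ compiler (cl.exe from Visual Studio), but your environment is not configured correctly. The package depends on CMake and the Visual Studio build tools.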


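Solution

1. Install the Visual Studio build tools

2. Use the Visual Studio Command Prompt (vcvarsall.bat), so that the cl compiler can be invoked from the command line, as the CMake output above suggests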
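Whether the two steps above took effect can be checked from the same shell before retrying pip. The following is a minimal sketch, assuming that the tools the build needs are `cl` and `cmake`, as the log suggests; the function name `missing_build_tools` is made up for illustration.

```python
import shutil
from typing import Iterable, List


def missing_build_tools(tools: Iterable[str] = ("cl", "cmake")) -> List[str]:
    """Return the tools from `tools` that cannot be found on the PATH."""
    # shutil.which mirrors the shell's lookup, including .exe resolution on Windows.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("cl and cmake are both reachable from this shell.")
```

If `cl` is reported as missing in a regular terminal but not in the Visual Studio Command Prompt, that matches the CMake error shown earlier.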

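Error: ERROR: Could not build wheels for MulticoreTSNE, which is required to install

Solution:

Go to the official PyPI website.

Search for MulticoreTSNE.

Download the package and extract it.

Put the extracted files into your virtual environment. At this point no error was reported, so check whether it actually runs properly.

If running it produces: Cannot find/open tsne_multicore shared library, then this method has failed.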
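Copying the sources into the environment makes the package importable, but it does not compile anything, which would explain the missing shared library. One way to check this is to look for a compiled library inside the package directory. This is only a sketch: I am assuming the package looks for the library next to its own files, and the file patterns below are guesses based on the library name in the error message. The helper name `find_compiled_libs` is hypothetical.

```python
import glob
import importlib.util
import os
from typing import List, Optional, Sequence


def find_compiled_libs(
    package: str,
    patterns: Sequence[str] = ("libtsne*.so", "*tsne*.dll"),  # assumed file names
) -> Optional[List[str]]:
    """Return compiled library files found in `package`, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return None
    found: List[str] = []
    # submodule_search_locations lists the package directories (None for plain modules).
    for directory in spec.submodule_search_locations or []:
        for pattern in patterns:
            found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found)


if __name__ == "__main__":
    print(find_compiled_libs("MulticoreTSNE"))
```

A result of `None` means the package is not importable at all, while an empty list means the Python files are present but no matching compiled library was found.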

Reference: https://blog.csdn.net/m0_45924886/article/details/133122422