PointNet++ on S3DIS: Semantic Segmentation Data Preprocessing in Practice

2026-05-06

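Preface

Before you can train PointNet++ for semantic segmentation, there is one hurdle to clear: S3DIS data preprocessing. The repository's collect_indoor3d_data.py was written against a particular directory layout and crashed outright on my Aligned_Version setup. This post records the whole process, from the first error to successfully generating the 272 .npy files used for training.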

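1. Symptom: the stock script fails on every room

Running the official collect_indoor3d_data.py floods the terminal with:

/home/hw/.../Area_1/conferenceRoom_1/Annotations ERROR!!
/home/hw/.../Area_1/copyRoom_1/Annotations ERROR!!
...
ValueError: need at least one array to concatenate

What you see: every room reports an error, and not a single one succeeds.
What is hidden: the try-except swallows the actual exception, so there is no way to tell which step failed.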
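
To make a swallowed exception visible, the handler can report the full traceback instead of a bare ERROR line. The following is a minimal sketch, not the repository's code: `run_and_report` and `load_missing_file` are hypothetical names, and the path is a placeholder chosen so that the call fails.

```python
import traceback


def run_and_report(func, *args):
    """Call func(*args); on failure return (None, traceback text) instead of hiding it."""
    try:
        return func(*args), None
    except Exception:
        # format_exc() returns the same text Python would print for an uncaught error
        return None, traceback.format_exc()


def load_missing_file(path):
    # Hypothetical stand-in for the per-room collection step
    with open(path) as fh:
        return fh.read()


result, err = run_and_report(load_missing_file, "/nonexistent/Annotations/chair_1.txt")
if err is not None:
    print(err)  # shows the exception type, message, and the line that raised it
```

With the traceback printed, a wrong path shows up as a file-not-found error on a specific line rather than as an anonymous ERROR!!.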

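2. Diagnosis: what the data actually looks like

Opening the Aligned_Version folder in Windows Explorer shows this structure:

Stanford3dDataset_v1.2_Aligned_Version/
├── Area_1/
│   ├── conferenceRoom_1/
│   │   ├── Annotations/              ← per-object .txt files
│   │   │   ├── beam_1.txt
│   │   │   ├── board_1.txt
│   │   │   ├── chair_1.txt
│   │   │   ├── ceiling_1.txt
│   │   │   └── ...
│   │   └── conferenceRoom_1.txt      ← room-level merged file (35 MB)

Key finding: the Annotations/ directory is not empty. It is full of per-object files such as chair_1.txt and beam_1.txt. In principle the stock script should be able to read them, but in practice a problem in its path joining or class filtering left points_list empty, and np.concatenate then crashed.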
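
The same inspection can be done from Python rather than by eye. This is a rough sketch under the assumption that each file is whitespace-separated text with one point per line; `summarize_annotations` is a hypothetical helper, not part of the repository.

```python
import os
from collections import Counter


def summarize_annotations(anno_dir):
    """Count object files per class prefix and per column count (first line only)."""
    prefixes, columns = Counter(), Counter()
    for name in sorted(os.listdir(anno_dir)):
        if not name.endswith(".txt"):
            continue  # ignore stray non-data files
        prefixes[name.split("_")[0]] += 1
        with open(os.path.join(anno_dir, name)) as fh:
            first_line = fh.readline().split()
        columns[len(first_line)] += 1
    return prefixes, columns
```

Calling it on one room's Annotations/ directory shows at a glance which class prefixes are present and whether the files carry six columns (XYZRGB) or seven.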

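3. Root cause: three hidden pitfalls in the stock script

| Pitfall | Description |
| --- | --- |
| Hard-coded paths | Paths are built by joining `meta/anno_paths.txt` entries onto `DATA_PATH`. If the directory depth is off (for example, an extra `Aligned_Version` level), the script looks in the wrong place |
| Silent skipping | The `try-except` does not print a `traceback`, so every file-read failure is swallowed and only an empty list is left for `concatenate` |
| Missing label column | Each `.txt` under `Annotations/` holds only `XYZRGB` (6 columns), with no 7th label column. The stock script may expect files that already carry labels, or it may have label-filling logic that was never reached because the path was wrong |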
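
The hard-coded-path pitfall can be checked before running anything heavy. The sketch below assumes that the path list contains entries relative to the data root, such as `Area_1/conferenceRoom_1/Annotations`; `check_anno_paths` is a hypothetical helper.

```python
import os


def check_anno_paths(data_root, relative_paths):
    """Split annotation paths into those that exist under data_root and those that do not."""
    found, missing = [], []
    for rel in relative_paths:
        full = os.path.join(data_root, rel)
        (found if os.path.isdir(full) else missing).append(full)
    return found, missing
```

If `missing` contains every entry, the data root is almost certainly one directory level too high or too low, which matches the symptom of every room failing at once.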

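4. Fix: rewrite the preprocessing script

Rather than fight the stock script, I wrote a new one tailored to Aligned_Version. The core strategy:

Two-tier reading: read the per-object files under Annotations/ first, and fall back to the room-level merged .txt if that fails. (The listing below implements only the Annotations/ path; the room-level file has no label column, so a fallback built on it would yield unlabeled points.)
Class from file name: parse chair out of chair_1.txt, map it to the standard class ID, and append it as the 7th label column.
Independent per-room processing: a failure in one room does not affect the others, and the detailed reason is printed.

Full script (collect_s3dis_aligned.py):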

import os
import numpy as np

# The 13 standard S3DIS semantic classes
CLASS_NAMES = [
    'ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door',
    'table', 'chair', 'sofa', 'bookcase', 'board', 'clutter'
]

def get_class_id(filename):
    """Parse 'chair_1.txt' -> 'chair' -> ID 8."""
    prefix = filename.split('_')[0].lower()
    if prefix in CLASS_NAMES:
        return CLASS_NAMES.index(prefix)
    return 12  # unknown classes are mapped to clutter

def collect_from_annotations(room_path):
    """
    Read the per-object files under Annotations/.
    Each .txt is (N, 6) XYZRGB, so a 7th label column has to be appended.
    Returns (array or None, status message).
    """
    anno_dir = os.path.join(room_path, 'Annotations')
    if not os.path.isdir(anno_dir):
        return None, "Annotations directory not found"

    txt_files = sorted([f for f in os.listdir(anno_dir) if f.endswith('.txt')])
    if len(txt_files) == 0:
        return None, "no .txt files under Annotations"

    points_list = []
    for f in txt_files:
        fpath = os.path.join(anno_dir, f)
        try:
            pts = np.loadtxt(fpath)  # (N, 6)
            if pts.ndim == 1:
                # a file with a single point loads as a 1-D array
                pts = pts.reshape(1, -1)

            # Key step: append the semantic label derived from the file name
            cls_id = get_class_id(f)
            labels = np.full((pts.shape[0], 1), cls_id, dtype=np.float32)
            pts_labeled = np.concatenate([pts, labels], axis=1)  # (N, 7) XYZRGBL

            points_list.append(pts_labeled)
        except Exception as e:
            print(f"    skipped {f}: {e}")

    if len(points_list) == 0:
        return None, "all files failed to load"

    return np.concatenate(points_list, axis=0), f"merged {len(points_list)} objects"


def main():
    # Dataset root (adjust to match the level your symlink points at)
    root = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
        'data/s3dis'  # use 'data/s3dis' if the symlink points directly at Aligned_Version
    )

    # Output directory: same as the stock script
    output = os.path.join(os.path.dirname(root), 'stanford_indoor3d')
    os.makedirs(output, exist_ok=True)

    print(f"Scanning: {root}\n")
    success = 0
    failed = []

    for area in sorted(os.listdir(root)):
        if not area.startswith('Area_'):
            continue

        area_path = os.path.join(root, area)
        rooms = sorted([d for d in os.listdir(area_path)
                       if os.path.isdir(os.path.join(area_path, d))])

        for room in rooms:
            room_path = os.path.join(area_path, room)
            out_name = f"{area}_{room}.npy"
            out_path = os.path.join(output, out_name)

            data, info = collect_from_annotations(room_path)

            if data is None:
                print(f"❌ {area}/{room}: {info}")
                failed.append(f"{area}/{room}")
                continue

            np.save(out_path, data.astype(np.float32))
            print(f"✅ {area}/{room} | {info} | shape: {data.shape}")
            success += 1

    # Count only .npy files so unrelated files in the folder do not inflate the total
    n_npy = len([f for f in os.listdir(output) if f.endswith('.npy')])
    print(f"\n{'='*60}")
    print(f"Done: {success} succeeded, {len(failed)} failed")
    print(f"Output: {output} ({n_npy} .npy files)")


if __name__ == '__main__':
    main()

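5. The symlink pitfall: path levels must line up

When accessing data on the Windows D: drive from WSL2, it is easy to create the symlink at the wrong level.

A typical case where the levels get confused:

# Suppose the actual path on the D: drive is /mnt/d/Download/S3DIS/Stanford3dDataset_v1.2_Aligned_Version
# If you create the link directly like this:
ln -s /mnt/d/Download/S3DIS/Stanford3dDataset_v1.2_Aligned_Version data/s3dis
# then data/s3dis is itself the dataset root and contains Area_1 through Area_6 directly

The root path in the script must match the level the symlink points at:

If data/s3dis points directly at Aligned_Version → root = 'data/s3dis'
If data/s3dis points at S3DIS/ (one level higher) → root = 'data/s3dis/Stanford3dDataset_v1.2_Aligned_Version'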
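
Instead of editing root by hand for each layout, the script could probe for the level that actually contains the Area_* folders. This is a sketch of that idea only; `find_s3dis_root` and its `max_depth` parameter are hypothetical and not part of the script above.

```python
import os


def find_s3dis_root(candidate, max_depth=2):
    """Return the first directory at or below candidate that contains Area_* folders, else None."""

    def has_areas(path):
        return any(
            name.startswith("Area_") and os.path.isdir(os.path.join(path, name))
            for name in os.listdir(path)
        )

    level = [candidate]
    for _ in range(max_depth + 1):
        next_level = []
        for path in level:
            if not os.path.isdir(path):  # isdir follows symlinks
                continue
            if has_areas(path):
                return path
            next_level.extend(os.path.join(path, n) for n in sorted(os.listdir(path)))
        level = next_level
    return None
```

A None result means no Area_* folder was found within `max_depth` levels, which usually indicates a broken link rather than a depth mismatch.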

Verify that the symlink works:

ls data/s3dis/Area_1/
# should list conferenceRoom_1, hallway_1, office_1...

6. Running it and the results

cd ~/projects/pointcloud/Pointnet_Pointnet2_pytorch/data_utils
python3 collect_s3dis_aligned.py

Expected output:

Scanning: /home/hw/.../data/s3dis

✅ Area_1/conferenceRoom_1 | merged 42 objects | shape: (1258327, 7)
✅ Area_1/copyRoom_1 | merged 15 objects | shape: (523456, 7)
...
Done: 272 succeeded, 0 failed
Output: .../data/stanford_indoor3d (272 .npy files)

About the output files:

Each .npy has shape (N, 7), where N is the total number of points in the room (hundreds of thousands to over a million)
First 6 columns: x y z r g b
Column 7: label (semantic class ID, 0 to 12)
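
A quick sanity check on a generated file catches a missing label column or out-of-range IDs before training starts. The sketch below assumes the (N, 7) layout described above; `check_room_array` is a hypothetical helper.

```python
import numpy as np

NUM_CLASSES = 13  # S3DIS semantic classes, IDs 0..12


def check_room_array(data):
    """Validate an (N, 7) XYZRGBL array and return the point count per class."""
    if data.ndim != 2 or data.shape[1] != 7:
        raise ValueError(f"expected shape (N, 7), got {data.shape}")
    labels = data[:, 6].astype(np.int64)
    if labels.min() < 0 or labels.max() >= NUM_CLASSES:
        raise ValueError("label outside the range 0..12")
    # minlength keeps one slot per class even if a class is absent from the room
    return np.bincount(labels, minlength=NUM_CLASSES)
```

For a real room, pass in the result of `np.load` on one of the generated files, for example data/stanford_indoor3d/Area_1_conferenceRoom_1.npy. A room whose counts are zero for ceiling, floor, and wall would be worth a second look.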

7. Next step: train directly

Once preprocessing is done there is no need to pack the data into HDF5. train_semseg.py reads the .npy files under data/stanford_indoor3d/ directly:

cd ~/projects/pointcloud/Pointnet_Pointnet2_pytorch

python3 train_semseg.py \
    --model pointnet2_sem_seg \
    --log_dir pointnet2_sem_seg \
    --batch_size 16 \
    --epoch 32 \
    --test_area 5

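8. Summary

| Problem | Cause | Fix |
| --- | --- | --- |
| `ValueError: need at least one array to concatenate` | The stock script used wrong paths or failed to read the files, leaving `points_list` empty | Rewrite the script, process each room independently, and print the specific reason on failure |
| No way to tell whether the data contained any files | The `try-except` swallowed the errors | Drop the bare `except`; use `traceback.format_exc()` or check file by file |
| `.txt` files under `Annotations` have only 6 columns | The per-object files in Aligned Version are `XYZRGB`; the label has to be parsed from the file name | Use `get_class_id()` to map the file-name prefix to a class ID and append it as the 7th column |
| Confusing symlink levels | It was unclear which level `ln -s` pointed at | Verify with `ls data/s3dis/Area_1/` and keep the script's `root` strictly in line with the symlink |

S3DIS preprocessing is the first hurdle in PointNet++ semantic segmentation. Once it is cleared, the training command is almost the same as for the classification task; only the evaluation metric changes, from accuracy to mIoU.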
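
For readers new to the metric, mIoU averages the per-class intersection-over-union of predicted and ground-truth labels. The sketch below is a generic illustration, not the repository's evaluation code: `mean_iou` and `overall_accuracy` are hypothetical names, and skipping classes that appear in neither array is an assumption (conventions differ on that point).

```python
import numpy as np


def overall_accuracy(pred, target):
    """Fraction of points whose predicted label equals the ground truth."""
    pred, target = np.asarray(pred).ravel(), np.asarray(target).ravel()
    return float(np.mean(pred == target))


def mean_iou(pred, target, num_classes=13):
    """Mean of per-class IoU over classes present in pred or target."""
    pred, target = np.asarray(pred).ravel(), np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:  # assumption: classes absent from both arrays are skipped
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")
```

Because each class contributes equally to the mean, a rare class such as beam or column affects mIoU far more than it affects overall accuracy.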

Source: https://fredsblog-2dc.pages.dev/posts/note-pointnet-senseg-s3dis/
Author: Fredzhe
License: CC BY-NC-SA 4.0
