python-pptx does not extract text from a pptx file in the order of the text boxes

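For example, on one slide the content of the topmost text box is extracted last. I searched English-language sites and pieced together the code below, but the extracted text still comes out in the wrong order. I can't tell what is wrong; could someone point me in the right direction?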

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def iter_textframed_shapes(shapes):
    """Generate shape objects in *shapes* that can contain text.

    Shape objects are generated in document order (z-order), bottom to top.
    Group shapes are searched recursively, so nested groups are covered too.
    """
    for shape in shapes:
        # ---recurse on group shapes, however deeply they are nested---
        if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
            yield from iter_textframed_shapes(shape.shapes)
            continue

        # ---otherwise, treat shape as a "leaf" shape---
        if shape.has_text_frame:
            yield shape

# path_to_my_prs must be set to the path of the .pptx file to read
prs = Presentation(path_to_my_prs)

for slide in prs.slides:
    textable_shapes = list(iter_textframed_shapes(slide.shapes))
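    # top/left can be None for a shape with no explicit position; treat None
    # as 0 so that sorted() does not fail comparing None with int
    ordered_textable_shapes = sorted(
        textable_shapes, key=lambda shape: (shape.top or 0, shape.left or 0)
    )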

    for shape in ordered_textable_shapes:
        print(shape.text)

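This happens because the order in which text boxes appear on the slide is not necessarily the order in which they were added or inserted. For example, if the content you added first sits in the bottom-right corner and the content you added later sits in the top-left, the order you get back differs from the visual order.
You can work out the order from the shapes' coordinates instead, for example by sorting on the vertical coordinate first and then on the horizontal coordinate.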
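
One thing that can still go wrong with a plain (top, left) sort: two boxes that look like they sit on the same row rarely have exactly the same top value, so a box on the right that is a hair higher sorts before the box on its left. Below is a minimal sketch of one possible workaround, not a python-pptx feature: group shapes whose top values are within a tolerance into one row, then sort each row by left. The names reading_order, row_tolerance and Box are made up for illustration; Box only stands in for a python-pptx shape so the example runs without a .pptx file, and the quarter-inch tolerance is an arbitrary guess that you would tune for your slides.

```python
from collections import namedtuple

EMU_PER_INCH = 914400  # python-pptx reports positions in English Metric Units


def reading_order(shapes, row_tolerance=EMU_PER_INCH // 4):
    """Return shapes top-to-bottom, and left-to-right within each row.

    Works on any objects with .top and .left attributes (hypothetical helper).
    """
    # A missing position (None) is treated as 0 so the sort cannot fail.
    by_top = sorted(shapes, key=lambda s: s.top or 0)

    rows = []  # list of (top of the row's first shape, shapes in that row)
    for shape in by_top:
        top = shape.top or 0
        if rows and top - rows[-1][0] <= row_tolerance:
            rows[-1][1].append(shape)      # close enough: same visual row
        else:
            rows.append((top, [shape]))    # too far down: start a new row

    ordered = []
    for _, row in rows:
        ordered.extend(sorted(row, key=lambda s: s.left or 0))
    return ordered


# Stand-in for python-pptx shapes, for demonstration only.
Box = namedtuple("Box", "text top left")
boxes = [
    Box("right", 100000, 5000000),   # slightly higher, but on the right
    Box("footer", 6000000, 0),
    Box("left", 150000, 500000),     # slightly lower, but on the left
]
print([b.text for b in reading_order(boxes)])
```

With real slides you would pass the list of shapes collected for a slide instead of boxes. Note that this only addresses near-equal top values; it does not account for shapes inside a group, whose coordinates are expressed in the group's own coordinate space.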