At line 405 of `vision_process.py`, at the start of `fetch_video`, the side length of a single merged block is computed as `image_factor = image_patch_size * SPATIAL_MERGE_SIZE`.
Then, at lines 425-426, each video frame is processed by passing `image_factor` into `fetch_image`:
```python
executor.submit(fetch_image, {"image": video_element, **process_info}, image_factor)
for video_element in ele["video"]
```
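However, at line 100, inside `fetch_image`, that value arrives as the `image_patch_size` argument and is multiplied by `SPATIAL_MERGE_SIZE` a second time: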
```python
image_obj = None
patch_factor = int(image_patch_size * SPATIAL_MERGE_SIZE)
if isinstance(image, Image.Image):
    image_obj = image
```
Suppose `image_patch_size=14` and `SPATIAL_MERGE_SIZE=2`. Then the frames in `fetch_video` are resized with `patch_factor = 14 * 2 * 2 = 56` instead of 28. Isn't this a problem?
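To make the effect concrete, here is a minimal, hypothetical sketch of the call chain described above. It is not the actual `qwen_vl_utils` code: `fetch_video_sketch`, `fetch_image_sketch`, `round_by_factor`, and the `pass_factor` switch are illustrative names, and the real resizing also enforces pixel limits that this sketch ignores. It only reproduces the factor arithmetic: passing `image_factor` where `image_patch_size` is expected makes frames snap to multiples of 56 rather than 28.

```python
# Hypothetical, simplified reconstruction of the factor arithmetic only;
# this is NOT the real qwen_vl_utils implementation.
SPATIAL_MERGE_SIZE = 2


def round_by_factor(number: int, factor: int) -> int:
    """Round `number` to the nearest multiple of `factor`."""
    return round(number / factor) * factor


def fetch_image_sketch(height: int, width: int, image_patch_size: int = 14) -> tuple[int, int]:
    # Mirrors line 100: the incoming argument is multiplied by SPATIAL_MERGE_SIZE.
    patch_factor = int(image_patch_size * SPATIAL_MERGE_SIZE)
    return round_by_factor(height, patch_factor), round_by_factor(width, patch_factor)


def fetch_video_sketch(height: int, width: int, image_patch_size: int = 14,
                       pass_factor: bool = True) -> tuple[int, int]:
    # Mirrors line 405: image_factor already includes SPATIAL_MERGE_SIZE.
    image_factor = image_patch_size * SPATIAL_MERGE_SIZE
    # Mirrors lines 425-426 when pass_factor=True: image_factor is passed
    # where fetch_image expects the raw patch size, so the merge size is applied twice.
    arg = image_factor if pass_factor else image_patch_size
    return fetch_image_sketch(height, width, arg)


if __name__ == "__main__":
    # A 360x640 frame, as described in the issue (factor applied twice).
    print(fetch_video_sketch(360, 640))                     # (336, 616): multiples of 56
    # Passing the raw patch size instead gives the expected multiples of 28.
    print(fetch_video_sketch(360, 640, pass_factor=False))  # (364, 644): multiples of 28
```

Note that 364 and 644 are multiples of 28 but not of 56, so the two paths produce different frame sizes for the same input.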