Author: Rahul Varma | Compiled by: VK | Source: Towards Data Science

The most important step in training and testing an effective machine learning model is collecting a large amount of data and using it efficiently for training. Mini-batches help with this by training on a small subset (a batch) of the data in each iteration.

However, since many machine learning tasks are performed on video datasets, efficiently batching videos of unequal length becomes a problem. Most methods crop the videos to equal lengths so that the same number of frames is extracted in each iteration. But this is not useful in scenarios where we need information from every frame to make a prediction, especially in self-driving cars and motion recognition.
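As a minimal illustration of that trade-off (not code from the article), the sketch below models each video as a plain list of frame labels and compares cropping a batch to its shortest video with padding every video to its longest one. The helper names crop_to_shortest and pad_to_longest are hypothetical.

```python
def crop_to_shortest(videos):
    """Truncate every video to the length of the shortest one (frames are lost)."""
    n = min(len(v) for v in videos)
    return [v[:n] for v in videos]


def pad_to_longest(videos, blank=None):
    """Pad every video with blank frames up to the length of the longest one."""
    n = max(len(v) for v in videos)
    return [v + [blank] * (n - len(v)) for v in videos]


videos = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
cropped = crop_to_shortest(videos)
padded = pad_to_longest(videos)
# number of frames thrown away by cropping
dropped = sum(len(v) for v in videos) - sum(len(v) for v in cropped)
print(cropped)  # [['a1'], ['b1'], ['c1']]
print(padded)   # [['a1', 'a2', 'a3'], ['b1', None, None], ['c1', 'c2', None]]
print(dropped)  # 3
```

Cropping discards half the frames in this toy batch, while padding keeps every frame at the cost of blank entries. The loader described below goes a step further than plain padding: it refills a finished slot with the next video in the list and only falls back to blank frames once the list is exhausted.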

Instead, we can create a batching method that handles videos of different lengths.

In Glenn Jocher’s Yolov3 (github.com/ultralytics…

Class initialization

import os
import time
from threading import Thread

import cv2
import magic  # python-magic: detects video files by MIME type
import numpy as np


def __init__(self, sources='streams.txt', img_size=416, batch_size=2, subdir_search=False):
    self.mode = 'images'
    self.img_size = img_size
    self.def_img_size = None  # frame shape used for blank frames, set in update()

    # Collect video paths from a directory (optionally recursive) or a text file
    videos = []
    if os.path.isdir(sources):
        if subdir_search:
            for subdir, dirs, files in os.walk(sources):
                for file in files:
                    path = os.path.join(subdir, file)
                    if 'video' in magic.from_file(path, mime=True):
                        videos.append(path)
        else:
            for elements in os.listdir(sources):
                path = os.path.join(sources, elements)  # check the joined path, not the bare name
                if not os.path.isdir(path) and 'video' in magic.from_file(path, mime=True):
                    videos.append(path)
    else:
        with open(sources, 'r') as f:
            videos = [x.strip() for x in f.read().splitlines() if len(x.strip())]

    n = len(videos)
    curr_batch = 0  # number of batch slots filled so far
    self.data = [None] * batch_size
    self.cap = [None] * batch_size
    self.sources = videos
    self.n = n
    self.cur_pos = 0  # index of the next unread video in self.sources

    # Open up to batch_size videos and start a thread that reads frames from each
    for s in videos:
        if curr_batch == batch_size:
            break
        print('%g/%g: %s... ' % (self.cur_pos + 1, n, s), end=' ')
        self.cap[curr_batch] = cv2.VideoCapture(s)
        if not self.cap[curr_batch].isOpened():
            print('Failed to open %s' % s)
            self.cur_pos += 1
            continue
        w = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = self.cap[curr_batch].get(cv2.CAP_PROP_FPS) % 100
        frames = int(self.cap[curr_batch].get(cv2.CAP_PROP_FRAME_COUNT))
        # Index by batch slot (curr_batch), not by list position: a skipped video
        # would otherwise leave a gap or overflow self.data
        _, self.data[curr_batch] = self.cap[curr_batch].read()  # guarantee first frame
        thread = Thread(target=self.update, args=(curr_batch, self.cap[curr_batch], self.cur_pos + 1), daemon=True)
        print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
        curr_batch += 1
        self.cur_pos += 1
        thread.start()
        print(' ')  # new line

    if all(v is None for v in self.data):
        return
    # Check for common shapes (ignoring slots that no video could fill)
    s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.data if x is not None], 0)  # inference shapes
    self.rect = np.unique(s, axis=0).shape[0] == 1  # rectangular inference if all shapes match
    if not self.rect:
        print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')

The *__init__* function accepts four arguments. img_size is the same as in the original; the other three parameters are defined as follows:

  • sources: a directory path or a text file listing video paths.
  • batch_size: the required batch size.
  • subdir_search: when a directory is passed as sources, set this to True to also search all of its subdirectories for video files.

I first check whether the sources argument is a directory or a text file. If it is a directory, I collect every video file in it (including subdirectories if subdir_search is True); otherwise, I read the video paths from the text file, one per line. The paths are stored in a list, and cur_pos tracks the current position in that list.
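The source-collection step can be sketched without OpenCV or python-magic. This version is a stand-in built on stated assumptions: it detects videos by file extension (the hypothetical VIDEO_EXTS set) instead of the MIME check used in the class, and it sorts the results for a deterministic order, which the original does not do.

```python
import os

# Assumption: detect videos by extension instead of python-magic's MIME type
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def is_video(path):
    return os.path.isfile(path) and os.path.splitext(path)[1].lower() in VIDEO_EXTS


def collect_sources(sources, subdir_search=False):
    """Return video paths from a directory or from a text file of paths."""
    if os.path.isdir(sources):
        if subdir_search:
            # walk the whole tree, including subdirectories
            return sorted(
                os.path.join(root, f)
                for root, _, files in os.walk(sources)
                for f in files
                if is_video(os.path.join(root, f))
            )
        # only the top level of the directory
        return sorted(
            os.path.join(sources, f)
            for f in os.listdir(sources)
            if is_video(os.path.join(sources, f))
        )
    # otherwise treat `sources` as a text file with one path per line
    with open(sources) as f:
        return [line.strip() for line in f if line.strip()]
```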

The list is iterated until batch_size videos have been opened, skipping videos that are corrupt or missing. The first frame of each opened video is passed to the letterbox function to be resized. This is unchanged from the original version, except for the early return when all videos are faulty or unavailable.
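The batch-filling loop can be modeled the same way. In this hypothetical fill_batch helper, the can_open callback stands in for cv2.VideoCapture(s).isOpened(). It illustrates that opened videos are placed by a fill counter rather than by their position in the list, so a skipped video neither leaves a gap nor pushes later videos past batch_size, and cur_pos ends at the next unread video.

```python
def fill_batch(sources, batch_size, can_open):
    """Fill up to batch_size slots from `sources`, skipping videos that fail to open.

    Returns (slots, cur_pos): the opened sources (None for unfilled slots)
    and the index of the next unread source in the list.
    """
    slots = [None] * batch_size
    filled = 0
    cur_pos = 0
    for s in sources:
        if filled == batch_size:
            break
        cur_pos += 1          # this source is consumed whether or not it opens
        if not can_open(s):   # stand-in for cv2.VideoCapture(s).isOpened()
            continue
        slots[filled] = s     # index by filled slot, not by list position
        filled += 1
    return slots, cur_pos


sources = ["a.mp4", "broken.mp4", "b.mp4", "c.mp4"]
slots, cur_pos = fill_batch(sources, 2, lambda s: "broken" not in s)
print(slots, cur_pos)  # ['a.mp4', 'b.mp4'] 3
```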

import cv2
import numpy as np


def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, never up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # (width, height)
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # width, height padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])  # cv2.resize expects (width, height)
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding between the two sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
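To see what letterbox does to frame sizes, the hypothetical helper below repeats its scale-and-pad arithmetic on shapes alone, without OpenCV. It assumes a square integer new_shape and the same modulo-64 padding as letterbox, and returns the (height, width) of the output.

```python
def letterbox_shape(h, w, new_shape=416, stride=64, auto=True):
    """Return the (height, width) that letterbox() would produce for an h x w frame."""
    r = min(new_shape / h, new_shape / w)           # scale ratio (new / old)
    new_w, new_h = int(round(w * r)), int(round(h * r))
    dw, dh = new_shape - new_w, new_shape - new_h   # padding needed for a full square
    if auto:                                        # minimum rectangle
        dw, dh = dw % stride, dh % stride
    # split the padding between the two sides, rounded as in letterbox()
    top, bottom = int(round(dh / 2 - 0.1)), int(round(dh / 2 + 0.1))
    left, right = int(round(dw / 2 - 0.1)), int(round(dw / 2 + 0.1))
    return new_h + top + bottom, new_w + left + right


print(letterbox_shape(720, 1280))              # (288, 416)
print(letterbox_shape(720, 1280, auto=False))  # (416, 416)
```

For a 1280x720 stream, r = min(416/720, 416/1280) = 0.325, so the frame is scaled to 416x234, and the 182 rows of padding shrink to 182 mod 64 = 54, giving a 416x288 output instead of a full 416x416 square. The shape check in *__init__* relies on this: when every stream letterboxes to the same shape, np.unique returns a single row, and the rectangular (auto) mode is used.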

Grabbing frames at a fixed interval

The update function has one small change: it also stores a default image size. All videos are extracted for processing, but because of their unequal lengths one video can finish before another, and the finished slot then needs a blank frame of the right shape. This will become clearer when I explain the next part of the code, the *__next__* function.

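def update(self, index, cap, cur_pos):
    # Read frames in a daemon thread, keeping every 4th frame
    n = 0
    while cap.isOpened():
        n += 1
        if not cap.grab():
            # End of video (or read error): a None frame tells __next__ to
            # replace this slot with the next video in the list
            self.data[index] = None
            cap.release()
            break
        if n == 4:  # read every 4th frame
            _, self.data[index] = cap.retrieve()
            if self.def_img_size is None:
                # remember the frame shape for blank frames in finished slots
                self.def_img_size = self.data[index].shape
            n = 0
        time.sleep(0.01)  # wait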
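The fixed-interval logic in update can be exercised without a real video. FakeCapture below is a hypothetical stand-in that only mimics the isOpened/grab/retrieve methods of cv2.VideoCapture. In OpenCV, grab() advances to the next frame and retrieve() decodes the grabbed frame, which is why skipping frames with grab() alone is typically cheaper than calling read() on every frame. The sketch drops the thread and the sleep, and stops on a failed grab() because FakeCapture, like a typical file capture, keeps reporting isOpened() as True after the last frame.

```python
class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture over a list of frames."""

    def __init__(self, frames):
        self.frames, self.pos, self.current = list(frames), 0, None

    def isOpened(self):
        return True  # stays "open" after the last frame, like a file capture

    def grab(self):
        if self.pos >= len(self.frames):
            return False  # end of stream
        self.current = self.frames[self.pos]
        self.pos += 1
        return True

    def retrieve(self):
        return self.current is not None, self.current


def sample_every(cap, step=4):
    """Grab every frame but keep only every `step`-th one, as update() does."""
    kept, n = [], 0
    while cap.isOpened():
        n += 1
        if not cap.grab():  # a failed grab ends the loop
            break
        if n == step:
            _, frame = cap.retrieve()
            kept.append(frame)
            n = 0
    return kept


print(sample_every(FakeCapture(range(10))))  # [3, 7]
```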

The iterator

If the frame exists, it is passed to the letterbox function as usual. If the frame is None, meaning that video has been fully processed, we check whether all the videos in the list have been processed. If there are more videos to process, the cur_pos pointer gives the position of the next available video.

If no videos remain in the list but some videos in the batch are still being processed, a blank frame of the stored default image size is sent in place of each finished video. The batch therefore keeps its size, and iteration continues until the remaining videos in the other slots run out of frames.

def __next__(self):
    self.count += 1
    img0 = self.data.copy()
    img = []

    for i, x in enumerate(img0):
        if x is not None:
            # Frame available: letterbox it as usual
            img.append(letterbox(x, new_shape=self.img_size, auto=self.rect)[0])
        else:
            if self.cur_pos == self.n:
                # No videos left in the list
                if all(v is None for v in img0):
                    cv2.destroyAllWindows()
                    raise StopIteration
                else:
                    # Other slots are still playing: send a blank uint8 frame for this one
                    img0[i] = np.zeros(self.def_img_size, dtype=np.uint8)
                    img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
            else:
                # This slot's video has ended: open the next video in the list
                print('%g/%g: %s... ' % (self.cur_pos + 1, self.n, self.sources[self.cur_pos]), end=' ')
                self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                fldr_end_flg = 0
                while not self.cap[i].isOpened():
                    print('Failed to open %s' % self.sources[self.cur_pos])
                    self.cur_pos += 1
                    if self.cur_pos == self.n:
                        # Reached the end of the list: pad this slot with a blank frame
                        img0[i] = np.zeros(self.def_img_size, dtype=np.uint8)
                        img.append(letterbox(img0[i], new_shape=self.img_size, auto=self.rect)[0])
                        fldr_end_flg = 1
                        break
                    self.cap[i] = cv2.VideoCapture(self.sources[self.cur_pos])
                if fldr_end_flg:
                    continue
                w = int(self.cap[i].get(cv2.CAP_PROP_FRAME_WIDTH))
                h = int(self.cap[i].get(cv2.CAP_PROP_FRAME_HEIGHT))
                fps = self.cap[i].get(cv2.CAP_PROP_FPS) % 100
                frames = int(self.cap[i].get(cv2.CAP_PROP_FRAME_COUNT))
                _, self.data[i] = self.cap[i].read()  # guarantee first frame
                img0[i] = self.data[i]
                img.append(letterbox(self.data[i], new_shape=self.img_size, auto=self.rect)[0])
                thread = Thread(target=self.update, args=(i, self.cap[i], self.cur_pos + 1), daemon=True)
                print(' success (%gx%g at %.2f FPS having %g frames).' % (w, h, fps, frames))
                self.cur_pos += 1
                thread.start()
                print(' ')  # new line

    # Stack
    img = np.stack(img, 0)

    # Convert
    img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to bsx3x416x416
    img = np.ascontiguousarray(img)

    return self.sources, img, img0, None
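The slot-refilling behavior of *__next__* can be simulated with frame counts alone. The hypothetical simulate_batches helper below ignores threading, frame skipping, and unopenable files. Each slot holds [video index, frames left]; a finished slot takes the next video from the list, and once the list is empty it contributes a blank frame (None here) until every slot is blank, which is the point where *__next__* raises StopIteration.

```python
def simulate_batches(lengths, batch_size):
    """Return, per iteration, the video index shown in each slot (None = blank frame)."""
    queue = list(enumerate(lengths))  # (video index, frame count) in list order
    slots = [None] * batch_size       # [video index, frames left] or None
    batches = []
    while True:
        for i in range(batch_size):
            if slots[i] is None or slots[i][1] == 0:
                # refill a finished slot from the list, or leave it blank
                slots[i] = list(queue.pop(0)) if queue else None
        if all(s is None for s in slots):
            return batches  # corresponds to StopIteration in __next__
        batches.append([s[0] if s else None for s in slots])
        for s in slots:
            if s:
                s[1] -= 1  # each active slot consumes one frame


print(simulate_batches([4, 1, 1], 2))  # [[0, 1], [0, 2], [0, None], [0, None]]
```

With frame counts [4, 1, 1] and a batch size of 2, the second slot plays videos 1 and 2 back to back and is then padded with blanks while video 0 finishes, so no frames are cropped and the batch size never changes.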

Conclusion

Since so much time is spent on data collection and preprocessing, I believe this approach reduces the time needed to fit the videos to the model, so we can focus on fitting the model to the data.

I’ve attached the full source code here. Hope this helps!

Original article: towardsdatascience.com/variable-si…
