How to use 16bit float, TensorRT, network rewriting and multi-threading to dramatically speed up deep learning model prediction — Vision transformer such as UNET, SwinUNETR are state-of-the-art in computer vision tasks, such as semantic segmentation. But it takes a lot of time for such models to make a prediction. This article shows how to speed up such model’s prediction by 9 times. …