My pure guess, and I stress pure guess, is that, because of the way motion and sound are recorded, some scenes require less data to describe than others. So over the course of a programme, some parts will need less data stored than others.
I do know that one way of storing video is to assume the next frame is identical to the last and store only the differences. This cuts the amount of data to be stored a lot. If one programme showed the same scene with just two folk talking, the resulting file would be relatively small. However, if another programme is a pop music show, with the scene changing completely every few seconds and lights flashing on and off, the number of differences will be large and a lot of data will need storing to capture the programme.
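To illustrate the "only store the differences" idea, here's a toy sketch in Python. It's my own simplification, not how any real broadcast codec actually works: frames are just flat lists of pixel numbers, and the names `encode`, `decode` and `stored_size` are made up for the example.

```python
from typing import List, Tuple

Frame = List[int]              # a frame as a flat list of pixel values (toy model)
Delta = List[Tuple[int, int]]  # (pixel index, new value) for each pixel that changed


def encode(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Store the first frame in full, then only the changed pixels of each later frame."""
    first = list(frames[0])
    deltas: List[Delta] = []
    prev = first
    for frame in frames[1:]:
        # Record only the pixels that differ from the previous frame.
        delta = [(i, v) for i, (p, v) in enumerate(zip(prev, frame)) if p != v]
        deltas.append(delta)
        prev = frame
    return first, deltas


def decode(first: Frame, deltas: List[Delta]) -> List[Frame]:
    """Rebuild each frame by assuming it matches the last one, then applying the changes."""
    frames = [list(first)]
    for delta in deltas:
        frame = list(frames[-1])  # copy of the previous frame
        for i, v in delta:
            frame[i] = v
        frames.append(frame)
    return frames


def stored_size(first: Frame, deltas: List[Delta]) -> int:
    """Rough count of numbers stored: the full first frame plus two numbers per change."""
    return len(first) + sum(2 * len(d) for d in deltas)


# "Two folk talking": ten 100-pixel frames where only pixel 0 (a mouth, say) changes.
talking = [[50] * 100 for _ in range(10)]
for n, f in enumerate(talking):
    f[0] = n % 2

# "Pop music show": every pixel changes on every frame.
pop = [[(n * 10) % 256] * 100 for n in range(10)]

print(stored_size(*encode(talking)))  # 118
print(stored_size(*encode(pop)))      # 1900
```

Both programmes have the same number of frames, but the talking one needs about 118 numbers stored against 1,900 for the busy one. In the busy case the differences actually cost more than storing all ten frames outright (1,000 numbers), which I believe is why real schemes store a complete frame every so often rather than always relying on differences.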
But I could be completely wrong.