It should be noted, of course, that each device creates these zeros and ones in a different way.
So your camera, for example, divides a photo up into very small squares called pixels. There are millions of pixels in a single photo.
When you take a photo, the camera measures, for each pixel, the strength of each of three colours: Red, Green, and Blue. Each strength is recorded as a number between 0 and 255.
So a single pixel may be red - 20, blue - 70, green - 187. When these colour strengths are merged together they make the right colour for that single pixel.
One photograph can consist of millions of pixels (a camera's resolution is measured in megapixels, and one megapixel is a million pixels), and when you take a photo the camera makes a note of the strength of red/green/blue for each pixel.
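To get a feel for the amount of data involved, here is a rough sketch of the arithmetic. The 12-megapixel figure is just an assumed example, not something from a particular camera, and it counts the raw numbers only. Real cameras usually compress the data before saving, so the actual file is normally much smaller.

```python
# Assumed example camera: 12 megapixels (a hypothetical figure).
megapixels = 12
pixels = megapixels * 1_000_000

values_per_pixel = 3   # one number each for red, green and blue
bits_per_value = 8     # any number from 0 to 255 fits in 8 binary digits

total_values = pixels * values_per_pixel
total_bits = total_values * bits_per_value

print(total_values)  # 36000000 colour strengths
print(total_bits)    # 288000000 zeros and ones, before any compression
```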
So the camera is not actually saving the picture as a picture, but as a string of numbers. The first pixel might be stored as 20, 70, 187, the second pixel as 200, 150, 12, and so on.
When it comes to saving the "photo" onto the SD card, it converts those DECIMAL numbers (20, 70, 187, etc.) into BINARY numbers (made up of only 0 and 1).
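The decimal-to-binary step can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular camera does it. The function name `to_bits` is made up for this example, and it assumes each number is written out as exactly 8 binary digits.

```python
# The two example pixels from the text, three numbers per pixel.
pixels = [(20, 70, 187), (200, 150, 12)]

def to_bits(pixels):
    """Write every number as 8 binary digits and join them into one string."""
    return "".join(format(value, "08b") for pixel in pixels for value in pixel)

bits = to_bits(pixels)
print(bits)
# 000101000100011010111011110010001001011000001100
```

The first 8 digits, 00010100, are the number 20; the next 8, 01000110, are 70; and so on.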
So your digital photo is now saved on your SD card as millions of zeros and ones. There will be a set of zeros and ones at the start to identify it as a digital photo and not a video file or audio track.
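As an illustration of that identifying set at the start, two common picture formats begin with fixed, well-known bytes: JPEG files start with FF D8 FF, and PNG files start with 89 50 4E 47 0D 0A 1A 0A (written here in hexadecimal). The sketch below checks for them. The function name `guess_format` is hypothetical, and a real program would recognise many more formats.

```python
# Well-known opening bytes ("signatures") of two picture formats.
SIGNATURES = {
    "JPEG": bytes([0xFF, 0xD8, 0xFF]),
    "PNG": bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]),
}

def guess_format(data):
    """Return the format whose signature the data starts with, or 'unknown'."""
    for name, signature in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"

# Made-up file contents: a JPEG-style start followed by filler bytes.
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])
print(guess_format(sample))  # JPEG
```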
When your camera or computer wants to display the picture it uses a reverse process to convert these zeros and ones back into a picture.
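The reverse process can be sketched the same way. This assumes, purely for illustration, that every number was saved as exactly 8 binary digits and that there are three numbers per pixel; the function name `from_bits` is made up for this example.

```python
def from_bits(bits):
    """Turn a string of zeros and ones back into a list of pixels."""
    # Read 8 digits at a time and convert each group back to a decimal number.
    values = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # Group the numbers three at a time, one group per pixel.
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]

saved = "000101000100011010111011" + "110010001001011000001100"
print(from_bits(saved))  # [(20, 70, 187), (200, 150, 12)]
```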