and a method to find edges in noisy pictures. The chapter concludes with the description of a new 
technique for the automatic discrimination of text images.  
 
 
Publications 
Part of the work presented in this thesis has been published in international journals or 
conference proceedings. More precisely: 
•  The LAZA algorithm presented in section 2.3 has been presented at the Signal Processing 
and Communications Conference SPC2000 [11], Marbella (Spain); a more advanced 
version of LAZA has been published in the Elsevier Image and Vision Computing 
Journal [10] in 2002; 
•  The algorithms proposed in chapter 2 and the experimental results presented in chapter 3 will 
be presented at the SPIE Electronic Imaging: Sensors, Cameras, and Applications for 
Digital Photography Conference [9], in January 2003, San Jose (CA, USA); 
•  The re-indexing algorithm presented in section 4.1 has been presented at the IEEE Spring 
Conference on Computer Graphics SCCG2001 [7], Bratislava (Slovak Republic); a 
more detailed version has been submitted to IEEE Transactions on Image Processing 
[8]; 
•  The edge finding algorithm [20] has been presented at the Spring Conference on 
Computer Graphics SCCG2000, Bratislava (Slovak Republic); 
•  The automatic discrimination methods for text images [5] will be presented at the SPIE 
Electronic Imaging - Sensors, Cameras, and Applications for Digital Photography 
Conference in January 2003, San Jose (CA, USA). 
Chapter 1: Image Acquisition Devices 
 
1.1 Introduction 
Before focusing, in the next chapter, on zooming algorithms, it is useful to review the main 
hardware and technological details of today’s image acquisition devices. This review helps to 
assess the relevance of the problems that have prompted some parts of the research reported in 
this dissertation and demonstrates the usefulness of some of the proposed approaches. 
All the algorithms studied in this thesis are applied to digital images. It is hence important 
to briefly discuss the process of creation of this kind of image. There are two commonly 
available methods for creating a digital image: 
•  Take a picture using a film emulsion, process it chemically, print it onto photographic 
paper and then use a digital scanner to sample the print.  
•  Use a device that samples the light bounced off the subject to create a digital image 
directly (digital camera, mobile phone, etc.).  
The reduction in price and the increase in quality of digital cameras have increased the 
popularity of the second method. Some market predictions suggest that digital cameras will 
become as popular as film-based cameras by 2005.  
The main difference between a digital camera and a film-based camera is that the digital 
camera has no film. Instead, it has a sensor that converts light into electrical charges. The image 
sensor employed by the largest share of digital cameras is a charge-coupled device (CCD). 
Some low-end cameras (like the popular webcams) use complementary metal oxide 
semiconductor (CMOS) technology. The differences between these sensors are discussed in 
section 1.4. 
The output of a digital camera is stored in a removable device (floppy disk, flash memory 
card, etc.). As with a film camera, it is possible to replace the storage device when it is full and 
continue storing pictures on another medium. The difference is that digital pictures do not need 
to be developed: they can be downloaded directly to the computer and are then ready to be 
used. 
With many cameras, it is possible to review the images stored in memory on an LCD (Liquid 
Crystal Display) built into the camera. The same LCD is often used as a viewfinder. 
Most of today’s cameras store their images in JPEG format (JPEG, the Joint Photographic 
Experts Group standard, is a lossy compression method standardised by ISO); it is possible to 
select between a “fine detail” mode and a “normal” mode. Higher-end cameras may also support the 
TIFF (Tagged Image File Format) format. While JPEG compresses the image, TIFF does not, 
so TIFF images take up a lot of memory space. The advantage of TIFF storage is that no data is 
lost to a compression process (the image is stored losslessly). 
 
 
1.2 The structure of a digital still camera 
A common Digital Still Camera has a lens system through which the light coming from the 
scene passes. This light is directed to an eyepiece by a mirror and a prism (see Figure 1-1). 
 
Figure 1-1: A digital still camera structure. 
 
When a picture is being taken, the mirror is pivoted up so as to allow light to strike the 
recording medium. The optical signal passing through the lens traverses, in order, an optical 
low-pass filter (LPF) and a color filter array (CFA), and is transformed into an electric signal by 
the charge-coupled device (CCD). The sensor (CCD) outputs an analog electric signal, which 
passes through a correlated double sampling (CDS) stage that reduces thermal noise, and its gain 
is then adjusted by the automatic gain control (AGC). The output of the AGC is γ-compensated, and 
then converted into digital signals by the analog-to-digital converter (ADC). The luminance 
(Y) and chrominance (C) components coming from the digital signal are produced by the 
digital camera signal processor (DCP). These signals are employed to generate the output 
JPEG or TIFF file. If it is necessary to obtain a TV signal, the digital Y and C signals are 
transformed into the corresponding analog signals by the digital-to-analog converter (DAC), 
and then mixed to form composite TV signals [65], [66]. 
 
Figure 1-2: Digital still camera pipeline. 
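
As an illustration of the order of these operations only, the following Python sketch mimics the 
analog-domain part of the chain numerically. It is a crude model, not camera code: the offset 
subtraction standing in for CDS, the fixed gain standing in for AGC, and all function names are 
assumptions of this example.

    import numpy as np

    def gamma_compensate(x, gamma=2.2):
        # Gamma compensation of a normalized signal in [0, 1].
        return np.power(np.clip(x, 0.0, 1.0), 1.0 / gamma)

    def analog_to_digital(x, bits=8):
        # Uniform quantization of the compensated signal (the ADC step).
        levels = 2 ** bits - 1
        return np.round(np.clip(x, 0.0, 1.0) * levels).astype(np.uint8)

    def simplified_analog_chain(sensor_voltages, gain=1.5):
        # sensor_voltages: 2-D array of raw readings in [0, 1].
        offset = sensor_voltages.min()               # crude stand-in for CDS
        signal = (sensor_voltages - offset) * gain   # AGC modeled as a fixed gain
        signal = gamma_compensate(signal)            # gamma compensation
        return analog_to_digital(signal)             # ADC output (8 bits per sample)

    # Example on a synthetic 4x4 "sensor" readout.
    raw = np.random.default_rng(0).uniform(0.1, 0.6, size=(4, 4))
    print(simplified_analog_chain(raw))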
 
The image quality of a single-CCD color camera or camcorder is mainly determined by the 
characteristics of the DCP. The main signal processing in the DCP is divided into three parts. The 
detection module (DM) performs auto-exposure (AE), auto-focus (AF), auto-white balance 
(AWB), CDS, AGC, etc. The CDS and the AGC are performed digitally in some camcorders 
and in analog form in others. After the DM, the signal is converted into RGB and Y 
components in the CPM. Finally, the encoding module (EM) produces the standard Y signal and the 
digitally modulated C signal. 
 
Figure 1-3: The DCP structure. 
 
1.3 Characteristic parameters 
During the pre-capture phase the sensor is read continuously and the output is analyzed in 
order to determine three parameters that determine the quality of the final picture [12] 
(minimal sketches of all three are given after this list):  
•  Auto-white Balancing (AWB) automatically compensates for the dominant “color” 
of the scene. The human eye is able to compensate colors automatically through a 
characteristic known as Color Constancy, by which the color white is always perceived as 
white independently of the spectral characteristics of the light source illuminating the 
scene. When a scene is captured in a picture, the illuminating context is lost, color 
constancy does not hold anymore, and white balancing is required to compensate the colors. 
AWB relies on the analysis of the picture in order to match the white with a reference 
white point. White balance adjustment attempts to reproduce colors naturally, so that images 
are not affected by the surrounding light. To do that, classical techniques either use a simple 
global measure of the energy of the scene, analyzing the relative distribution of the various 
chromatic channels, or try to adapt the white color to the particular light condition (sunset, 
cloudy, …). Auto-white balancing is sufficient for most conditions, but if there is no near-
white color in the picture, colors that are not originally white may appear white in the 
image and the white balance of the image may not be correct. Also, auto-white balancing 
may not have the expected effect when shooting under white fluorescent or other 
fluorescent lights. In such cases, some cameras give the possibility of using a white surface 
and a quick-reference white balance to achieve the correct white balance, or of using a preset 
white balance to select a color temperature for the incident light. Alternatively, it is 
possible to use preset white balancing to reproduce more red in a picture of a sunset, or to 
capture a warmer artistic effect under artificial lighting [59].  
•  Auto Exposure determines the amount of light hitting the sensor and, 
differently from traditional cameras, the sensor itself is used for light metering. The 
exposure - the amount of light that reaches the image sensor - determines how light or 
dark the resulting photograph will be. When the shutter opens, light strikes the image 
sensor inside the camera. If too much light strikes it, the photograph will be overexposed: 
washed out and faded. Too little light produces an underexposed photograph: dark and 
lacking in detail, especially in shadow areas. To measure the light reflected from the 
scene, a camera uses a built-in light meter. The part of the scene it measures makes a 
great difference. Most cameras read the entire image area but give more emphasis to the 
bottom part of the scene, because this reduces the possibility that a bright sky will cause 
the picture to be underexposed. They also emphasize the center of the image area, based 
on the assumption that the major subject is placed there. This is called a center-weighted 
system. Some systems allow the user to select a small area of the scene and meter it 
directly using a spot meter. In this mode, only the part of the scene in the center of the 
viewfinder is metered [4].  
•  Auto-Focus techniques are more proprietary and vary from one manufacturer to 
another. The Auto-Focus algorithm directly affects picture sharpness. Essentially, it 
consists of extracting a measure of the high-frequency content of the picture and 
changing the focus setting until this measure reaches a maximum [54].  
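
The three measures just described can be sketched compactly. The Python fragment below is 
illustrative only: the gray-world assumption for AWB, the center-weighted average for exposure 
metering and the Laplacian-variance focus measure are classical textbook choices, not the 
proprietary algorithms cited above, and all function names are invented for this example.

    import numpy as np

    def gray_world_awb(rgb):
        # Gray-world white balance: scale each channel so that its mean
        # matches the global mean (a simple global-energy measure).
        means = rgb.reshape(-1, 3).mean(axis=0)
        gains = means.mean() / np.maximum(means, 1e-6)
        return np.clip(rgb * gains, 0, 255)

    def center_weighted_exposure(luma, center_weight=3.0):
        # Center-weighted metering: average luminance, counting the central
        # region of the frame more heavily than the borders.
        h, w = luma.shape
        weights = np.ones((h, w))
        weights[h // 4: 3 * h // 4, w // 4: 3 * w // 4] = center_weight
        return float((luma * weights).sum() / weights.sum())

    def focus_measure(luma):
        # Contrast-based auto-focus measure: variance of a Laplacian-like
        # response, i.e. a rough estimate of the high-frequency content.
        luma = luma.astype(float)
        lap = (-4 * luma[1:-1, 1:-1]
               + luma[:-2, 1:-1] + luma[2:, 1:-1]
               + luma[1:-1, :-2] + luma[1:-1, 2:])
        return float(lap.var())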
 
Once the picture is taken, a number of different techniques such as Defect Correction, 
Noise Reduction and Color Correction are applied to compensate and enhance the sensor output 
data [12]. 
•  Defect Correction manages pixel defects related to the sensor and/or to the 
memory storing the picture. When system-on-a-chip solutions for DSCs are considered, 
both sensor and memory can be part of a more complex device. By exploiting the 
redundancy of image data, these defects can be corrected in a way that is completely 
transparent for the DSC manufacturer.  
•  Noise Reduction is performed to limit the visible effects of electronic errors 
(or interference) in the final image from a digital camera. The amount of visible noise 
depends on how prone the sensor (CCD/CMOS) is to these errors and on how well the 
digital signal processing systems inside the digital camera can cope with or remove them.  
•  Color Correction simply adjusts the RGB components of a color separation by 
mathematical operations and creates a new RGB output based on the relative values of 
the input components. It is also called color matrixing or color mixing (a minimal 
matrixing sketch is given after this list).  
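
A minimal sketch of such a color matrixing step is shown below; the 3x3 matrix values are made 
up for illustration and do not come from any real camera.

    import numpy as np

    # Illustrative color correction matrix: each row sums to 1 so that gray
    # values are preserved. The coefficients are invented for this example.
    CCM = np.array([[ 1.20, -0.15, -0.05],
                    [-0.10,  1.25, -0.15],
                    [-0.05, -0.20,  1.25]])

    def color_correct(rgb, matrix=CCM):
        # Each output RGB triplet is a linear combination of the input RGB
        # components (out = matrix @ in), applied pixel by pixel.
        h, w, _ = rgb.shape
        out = rgb.reshape(-1, 3) @ matrix.T
        return np.clip(out, 0, 255).reshape(h, w, 3)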
 
1.4 Difference between CCD and CMOS 
Both CCD and CMOS image sensors have to convert light into electrons at the photosites. 
A simplified way to think about the sensor used in a digital camera is to think of it as having a 
2-D array of thousands or millions of tiny solar cells, each of which transforms the light from 
one small portion of the image into electrons. Both CCD and CMOS devices perform this task 
using a variety of technologies [67]. 
The next step is to read the value (accumulated charge) of each cell in the image. In a 
CCD device, the charge is actually transported across the chip and read at one corner of the 
array. An analog-to-digital converter turns each pixel’s value into a digital value. In most 
CMOS devices, there are several transistors at each pixel, which amplify and move the charge 
using more traditional wires. The CMOS approach is more flexible than CCD because each 
pixel can be read individually. CCDs use a special manufacturing process to create the ability to 
transport charges across the chip without distortion. This process leads to very high-quality 
sensors in terms of fidelity and light sensitivity. CMOS chips, on the other hand, are created 
using standard manufacturing processes. Because of these manufacturing 
differences, there are several noticeable differences between CCD and CMOS sensors: 
•  CCD sensors create high-quality, low-noise images. CMOS sensors, traditionally, are 
more susceptible to noise; 
•  Because each pixel on a CMOS sensor has several transistors located next to it, the 
light sensitivity of a CMOS chip is lower. Many of the photons hitting the chip hit the 
transistors instead of the photodiode; 
•  CMOS sensors traditionally consume little power. Implementing a sensor in 
CMOS yields a low-power sensor. CCDs, on the other hand, use a special process that 
consumes a lot of power; a CCD can consume as much as 100 times more power than an 
equivalent CMOS sensor; 
•  CMOS chips can be built on just about any standard silicon production line, so they 
tend to be extremely inexpensive compared to CCD sensors; 
•  CCD sensors have been mass-produced for a longer period of time, so they are more 
mature. They tend to have higher quality pixels, and more of them. 
Based on these differences, it can be seen that CCDs tend to be used in cameras aimed at 
producing high-quality images with lots of pixels and excellent light sensitivity. CMOS 
sensors usually have lower quality, lower resolution and lower sensitivity. However, CMOS 
cameras are much less expensive and have longer battery life. Over time, CMOS sensors are 
expected to improve to the point where they reach near parity with CCD devices in most 
applications, although they will probably not reach the same quality for several years [12]. 
1.5 Resolution 
Resolution is perhaps a confusing term in describing the characteristics of a visual image, 
since it has a large number of competing terms and definitions. Researchers in optics define 
resolution in terms of the modulation transfer function (MTF), computed as the modulus or 
magnitude of the optical transfer function (OTF). The MTF is not only used to give a resolution 
limit at a single point, but also to characterize the response of the optical system to an arbitrary 
input. On the other hand, researchers in digital image processing and computer vision use the 
term resolution in three other ways [24]:  
•  Spatial resolution refers to the spacing of pixels in an image and is measured in pixels 
per inch (ppi). The higher the spatial resolution, the greater the number of pixels in 
the image and, correspondingly, the smaller the size of the individual pixels. This 
allows for more detailed and subtle color transitions in an image. 
•  Brightness resolution refers to the number of brightness levels that can be recorded at 
any given pixel. It relates to the quantization of the light energy collected in a 
photo-receptor element; a more appropriate term for this process is indeed quantization. The 
brightness resolution for monochrome images is usually 256, implying that each level is 
represented by 8 bits. For full color images, at least 24 bits are used per pixel, 
i.e., 8 bits per color plane (red, green, blue). 
•  Temporal resolution refers to the number of frames captured per second and is also 
commonly known as the frame rate. It is related to the amount of perceptible motion 
between the frames. A higher frame rate results in less smearing due to movements in 
the scene. The lower limit on the temporal resolution is directly related to the 
expected motion between two subsequent frames. The typical frame rate suitable for a 
pleasing view is about 25 frames per second or above. 
In this thesis the term resolution univocally refers to the spatial resolution. In reality, the 
number of pixels and the maximum available resolution are different. For example, a camera 
may claim to be a 2.1-megapixel camera and yet be capable of producing images with a resolution of 
1600x1200 (i.e. 1,920,000 pixels). This is not an error, but a real discrepancy between 
these two numbers. If a camera nominally has 2.1 megapixels, this means that in reality there 
are approximately 2,100,000 photosites on the CCD. What happens is that some of the 
photosites are not used for imaging. This is because the CCD is an analog device: it is 
necessary to provide some circuitry to the photosites so that the ADC can measure the amount 
of charge. This circuitry is dyed black so that it cannot absorb any light and distort the image 
[67]. 
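
Restating the example above as a short calculation (the difference of roughly 180,000 photosites 
follows directly from the two figures quoted in the text):

    # 2.1 "nominal" megapixels vs. the pixels actually delivered in the image.
    nominal_photosites = 2_100_000
    image_width, image_height = 1600, 1200
    effective_pixels = image_width * image_height        # 1,920,000
    print(effective_pixels, nominal_photosites - effective_pixels)
    # -> 1920000 180000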
1.6 Color filter array 
Due to cost and packaging considerations, in most DSCs only a single electronic sensor 
element per pixel is used to capture a color image, instead of three sensors capturing the three 
primary colors. This is usually achieved by covering the surface of the CCD with a filter mosaic 
called a color filter array (CFA). Each filter in the CFA covers a single pixel in the sensor plane 
and passes only a specific spectral band, in order to capture a specific color component at that 
pixel location. A typical, widely used CFA pattern, proposed by Bryce Bayer, is known as the 
Bayer pattern [16]. Row 1 starts with G and alternates with R; row 2 starts with B and 
alternates with G; the subsequent rows repeat this GRG…/BGB… alternation. Notice 
that the number of G elements is double the number of R elements and double the number of B 
elements: half of all pixels are green, versus a quarter blue and a quarter red. This particular 
arrangement relies on the higher sensitivity of our eyes to the green color.  
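
As a toy illustration of the sampling performed by a Bayer CFA (the GR/BG layout of Figure 1-4; 
an R, G, B channel order is assumed for the input array, and the function name is invented for 
this example), the sketch below keeps only one color component per pixel:

    import numpy as np

    def bayer_mosaic(rgb):
        # Keep a single color component per pixel according to the Bayer
        # layout: rows GRGR... alternating with rows BGBG...
        h, w, _ = rgb.shape
        mosaic = np.zeros((h, w), dtype=rgb.dtype)
        mosaic[0::2, 0::2] = rgb[0::2, 0::2, 1]   # even rows, even columns: G
        mosaic[0::2, 1::2] = rgb[0::2, 1::2, 0]   # even rows, odd columns:  R
        mosaic[1::2, 0::2] = rgb[1::2, 0::2, 2]   # odd rows,  even columns: B
        mosaic[1::2, 1::2] = rgb[1::2, 1::2, 1]   # odd rows,  odd columns:  G
        return mosaic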
 
 
Figure 1-4: Bayer Pattern. 
 
 
Figure 1-5: Stripes Pattern. 
 
Another example of a CFA pattern is known as the “stripes” pattern [1]. Column 1 is all G; 
column 2 is all R; and column 3 is all B. The columns then keep repeating G, R, and B. 
Some cameras allow exporting the data in RAW format. In this case, the data is 
formatted in proprietary ways and describes the picture in the Bayer checkerboard pattern 
mentioned above. Such a feature can be used by a professional photographer working with the 
original input data in order to apply his own enhancement techniques.  
If the DSC pipeline is not interrupted to obtain a RAW image, as mentioned above, the 
two missing colors at each pixel location are recovered in the CPM. Usually, they are estimated 
using the color information of the neighboring pixels. The methodology used to recover these 
missing colors at every pixel location from the sub-sampled image is popularly known as “color 
interpolation”. A good color interpolation algorithm improves the quality of the final 
image without a high associated computational complexity. 
In the following paragraphs two classical color interpolation algorithms are reported: 
replication and bilinear. 
 
 
Figure 1-6: A simplified Digital Still Camera pipeline with reference to CFA. 
1.6.1 Nearest neighbor interpolation - replication 
Each interpolated output pixel is assigned the value of the nearest pixel in the input 
image. The nearest neighbor can be one of the upper, lower, left or right pixels. An example is 
illustrated below. 
 
 
Figure 1-7: Replication. 
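
A direct, intentionally naive rendering of this idea for the GR/BG Bayer layout sketched in 
section 1.6 (assuming even image dimensions; the function name is invented for this example) is 
the following:

    import numpy as np

    def demosaic_replication(mosaic):
        # Replication over each 2x2 Bayer cell (G R / B G): every missing
        # component is copied from the single pixel of that color in the cell.
        h, w = mosaic.shape
        rgb = np.zeros((h, w, 3), dtype=mosaic.dtype)
        for y in range(0, h, 2):
            for x in range(0, w, 2):
                g, r = mosaic[y, x], mosaic[y, x + 1]
                b = mosaic[y + 1, x]
                rgb[y:y + 2, x:x + 2, 0] = r
                rgb[y:y + 2, x:x + 2, 1] = g                 # nearest green replicated
                rgb[y:y + 2, x:x + 2, 2] = b
                rgb[y + 1, x + 1, 1] = mosaic[y + 1, x + 1]  # keep the measured G
        return rgb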
 
1.6.2 Bilinear interpolation 
The average of the upper, lower, left and right pixel values is assigned as the G value of 
the interpolated pixel. For example: G8=(G3+G7+G9+G13)/4. 
 
 
Figure 1-8: Bayer Pattern. 
 
Interpolation of red/blue pixels at a green position: the average of the two adjacent pixel 
values of the corresponding color is assigned to the interpolated pixel. For example: 
B7=(B6+B8)/2; R7=(R2+R12)/2. 
Interpolation of a red/blue pixel at a blue/red position: the average of four adjacent 
diagonal pixel values is assigned to the interpolated pixel. For example: 
R8=(R2+R4+R12+R14)/4; B12=(B6+B8+B16+B18)/4. 
More generally, at a green position the green value is present and the red and blue values must 
be calculated; at a red position the green and blue values are missing; at a blue position the 
green and red values are missing. 
 
 
Figure 1-9: Bilinear color interpolation at red, green, and blue positions. 
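
Under the same GR/BG Bayer layout and with the numbering convention of Figure 1-8, the averaging 
rules above can be written compactly with small convolution kernels. This is a sketch under the 
assumption that SciPy is available; the function name and kernel formulation are this example’s, 
not a camera implementation.

    import numpy as np
    from scipy.signal import convolve2d

    def demosaic_bilinear(mosaic):
        # Bilinear demosaicing: each missing component is the average of its
        # available neighbors, e.g. G8 = (G3 + G7 + G9 + G13) / 4.
        h, w = mosaic.shape
        r_mask = np.zeros((h, w))
        r_mask[0::2, 1::2] = 1.0
        b_mask = np.zeros((h, w))
        b_mask[1::2, 0::2] = 1.0
        g_mask = 1.0 - r_mask - b_mask
        k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
        k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
        channels = []
        for mask, kernel in ((r_mask, k_rb), (g_mask, k_g), (b_mask, k_rb)):
            sampled = mosaic * mask
            num = convolve2d(sampled, kernel, mode='same')
            den = convolve2d(mask, kernel, mode='same')
            # Dividing by the convolved mask also handles the image borders.
            channels.append(num / np.maximum(den, 1e-6))
        return np.dstack(channels)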
 
1.6.3 Some considerations 
Replication and bilinear are the simplest color interpolation techniques. Despite 
the simplicity of the idea, their performance is not the best: replication gives images with 
“staircase” artifacts, while bilinear interpolation returns pictures with strong smoothing effects. 
Commercialized devices use more sophisticated techniques. These color interpolation algorithms 
are edge-sensing and perform color correction and error reduction ([1], [23], [26], [52]). 
Figure 1-10 reports the results obtained using different methods. The best visual 
results are obtained with the more sophisticated algorithms, which preserve the high frequencies 
related to the edges and reduce the interpolation error. 
 
 
Figure 1-10: (a) ideal image; (b) Bayer pattern; (c) replication; (d) bilinear; (e) edge sensing 
interpolation; (f) interpolation with color correction. 
1.7 Conclusions 
In this chapter the main features of today’s digital acquisition devices have been 
reviewed. The acquisition format for digital images and color interpolation have been discussed 
in great detail because these elements are relevant for the results proposed in the next chapters.