Unit 2.2 Data Compression, Images
This lab performs alterations on images, manipulates RGB values, and reduces the number of pixels. College Board requires you to learn about lossy and lossless compression.
- Data Compression Blog
- 2.2 Compressing Data Video Notes
- Compiled Questions from this Notebook
- Incorporate into my project
- My Use of PIL
- Enumerate "Data" Big Idea from College Board
- Image Files and Size
- Displaying images in Python Jupyter notebook
- Reading and Encoding Images (2 implementations follow)
- Data Structures and OOP
- Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...
- Hacks
Data Compression Blog
2.2 Compressing Data Video Notes
- Data compression is a reduction in the number of bits needed to represent data
- Data compression is used to save transmission time and storage space
- When data is compressed, repeated patterns and predictability (redundancy) in the data are what make the reduction possible
- compression can be lossy or lossless...
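As a rough illustration of how repeated patterns can be exploited, here is a minimal run-length encoding sketch. It is not the method used by any particular image format, and the `row` of pixels is made up for the example.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)

row = "BBBBBBBBWWWWBBBB"           # hypothetical row of Blue/White pixels
encoded = rle_encode(row)
print(encoded)                      # [('B', 8), ('W', 4), ('B', 4)]
print(rle_decode(encoded) == row)   # True, nothing was lost
```

Sixteen characters shrink to three pairs because the row is highly repetitive. A row with no repeats would not shrink at all.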
Lossy vs Lossless
- Lossy data compression: reduces the number of bits stored or transmitted but only allows reconstruction of an approximation of the original data (approach used when removal of some data has little or no effect on the representation of the content). Ex. graphics, audio, video, images
- Lossless data compression: reduces the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data (approach used where the loss of data would change the information). Ex. text, spreadsheets
This could have lossy compression because it is a solid color, so losing specific parts of the image would not affect its overall look or function as a blue square:
This must have lossless compression because it is a detailed picture. With lossy, important parts of the picture may be left out which can change the overall function of the picture:
- In short, if an image is simple it is more likely to use lossy compression than a complicated image since no critical data will be lost in a simple picture
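The difference between the two kinds of compression can be sketched with the standard library. This is only an illustration: the byte values are made up, `zlib` stands in for a lossless method, and rounding values down to multiples of 16 stands in for a lossy one. Real image codecs are far more sophisticated.

```python
import zlib

# Made-up "pixel" values that hover around 200
original = bytes([200, 201, 199, 200, 202, 198, 200, 201] * 100)

# Lossless: zlib (DEFLATE) decompresses to exactly the same bytes
packed = zlib.compress(original)
restored = zlib.decompress(packed)

# Lossy (illustrative): round every value down to a multiple of 16,
# which throws away the small differences for good
quantized = bytes((value // 16) * 16 for value in original)
packed_lossy = zlib.compress(quantized)

print(len(original), len(packed), len(packed_lossy))
print(restored == original)    # True: complete reconstruction
print(quantized == original)   # False: only an approximation remains
```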
Compiled Questions from this Notebook
What are commands you use in terminal to access files?
You can use cd, which changes directory; ls, which lists the contents of a directory; and rm, which deletes a file.
What are the commands you use in Windows terminal to access files?
You can use cd to change directory, dir to list the contents of a directory, and del to remove a file
What are some of the major differences?
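A Mac or Linux terminal uses a Unix-style shell and file system, meaning many commands that work there will not work in the Windows terminal. For example, the commands for deleting a file and listing the contents of a directory are different in each. File paths are also written differently: Unix uses forward slashes (/), while Windows uses backslashes (\) and drive letters.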
Provide what you observed, struggled with, or learned while playing with this code.
Originally I was having a lot of problems with the images. In the beginning the images would not load due to a format issue: the name suffix of the file did not match the file type itself. Then I had issues with the image path. With some research I was able to correct my mistake and realize that my path was calling a nonexistent file, since my image was not in the right images folder. Path is very important when working with images because if the program can't find the image, it can't run your manipulations.
Why is path a big deal when working with images?
The path is important when working with images because the path is the location of the image file in your system. When you work with images you have to specify the image's path in order to display it or manipulate it using compression methods. If the path is incorrect, you cannot work with the image.
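One way to catch a wrong path early is to check it before opening the image. The sketch below is an assumption about how this could be done, not part of the lab code: `find_image` is a hypothetical helper, and the `images` folder name is taken from the lab's default.

```python
from pathlib import Path

def find_image(file_name, folder=Path("images")):
    """Return the full path to an image, or raise a clear error if it is missing."""
    candidate = folder / file_name
    if not candidate.is_file():
        # resolve() shows the absolute location that was actually searched
        raise FileNotFoundError(f"No image at {candidate.resolve()}")
    return candidate

try:
    print(find_image("tulip.png"))
except FileNotFoundError as error:
    print(error)
```

Printing the resolved path makes it easier to see when the notebook is running from a different directory than expected.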
How does the meta data source and label relate to Unit 5 topics?
Metadata is data that provides background information about other data. It describes the characteristics of data. In Unit 5's context, understanding metadata is important for analyzing data and getting insight from it. By understanding metadata, we can learn how to manipulate and extract important data and draw conclusions about it, which is a major focus of Unit 5.
Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images?
IPython provides a more flexible and interactive environment for working with pandas and images in Jupyter notebooks. IPython's tab completion lets you quickly view the functions available to manipulate data, and its display module renders images and HTML directly in a notebook cell.
Does this code seem like a series of steps are being performed?
Yes, this code (the grey scale code) looks like a series of steps. Functions such as image_management must run before the functions that depend on them, and they are all called at the very end of the code.
Describe Grey Scale algorithm in English or Pseudo code?
First, the function receives an image dictionary and reads the PIL image object 'img' from it. It then reads the pixel data using img.getdata() and stores it in img_data, which is converted into a NumPy array and stored as image['data']. image['gray_data'] is a list created to store the grey scale values of the image. The function iterates through every pixel in image['data'], calculates the average of its RGB values, and stores it in a variable called 'average'. If the length of the pixel is greater than 3, the image is assumed to be a PNG and its grey scale values are appended to the list along with the transparency (alpha channel) value. Otherwise only the grey scale values are appended. The img.putdata method is used to update the image with the new grey scale values. The function then generates HTML for the grey scale image and stores it so that it can be displayed.
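The averaging step described above can be sketched without PIL or NumPy by working on a plain list of pixel tuples. `to_grey` and the sample pixels are made up for illustration. The lab code applies the same arithmetic to data read from a real image.

```python
def to_grey(pixels):
    """Average R, G and B for each pixel; keep alpha (the 4th value) if present."""
    grey = []
    for pixel in pixels:
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # integer division
        if len(pixel) > 3:
            grey.append((average, average, average, pixel[3]))  # RGBA pixel
        else:
            grey.append((average, average, average))            # RGB pixel
    return grey

sample = [(255, 0, 0), (10, 20, 30, 128)]  # one RGB and one RGBA pixel
print(to_grey(sample))  # [(85, 85, 85), (20, 20, 20, 128)]
```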
Describe scale image? What is before and after on pixels in three images?
In a scale image, colors are used to represent different data values. These colors can be manipulated to create 'filters' such as grey scale and red, green, or blue scale. The three images go through the grey scale process, and the outcome is a black and white version of the original photo.
Is scale image a type of compression? If so, line it up with College Board terms described?
Scale image is not a type of compression. It can be used as a precursor step to image compression but they are not the same. Image compression techniques can be used to compress color scale images though.
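To see what scaling does to the pixel count, the arithmetic used by the lab's scale_image function can be sketched on its own. The `scaled_size` helper and the 1280 x 960 starting size are assumptions for illustration.

```python
def scaled_size(width, height, base_width=320):
    """Shrink to base_width while keeping the width-to-height ratio."""
    scale = base_width / width
    return base_width, int(height * scale)

original = (1280, 960)               # hypothetical original size
new = scaled_size(*original)
print(new)                           # (320, 240)

# How many times fewer pixels the scaled image has
print((original[0] * original[1]) // (new[0] * new[1]))  # 16
```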
Incorporate into my project
I could not do this since my project does not incorporate images with actual function.
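from IPython.display import HTML, Image, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts with IPython's Image
from PIL import ImageFilter
from io import BytesIO
import base64
import numpy as np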

# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "World Flower database", 'label': "Tulip", 'file': "tulip.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

def image_display(images):
    for image in images:
        display(Image(filename=image['filename']))

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image): # image is one dictionary from image_data()
    # Image open return PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])

# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Work on a copy so the scaled original in image['pil'] stays unchanged for the other filters
    img = image['pil'].copy()
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['gray_data'] = [] # list for data converted to gray scale

    # 'data' is a list of RGB data, the list is traversed and the gray value of each pixel is calculated
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3 # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3])) # PNG format, keep alpha
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels

    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

# Red Scale Base64 rep. of the image
def image_management_add_html_red(image):
    # Work on a copy so the scaled original in image['pil'] stays unchanged for the other filters
    img = image['pil'].copy()
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['data_red'] = [] # data_red is a new list which will contain the converted red-scale pixels

    # 'data' is a list of RGB data, the list is traversed and the red-scale value of each pixel is built
    for pixel in image['data']: # this will loop through every pixel in the image data
        if len(pixel) > 3:
            image['data_red'].append((pixel[0], 0, 0, pixel[3])) # the zeroes set green and blue to 0, alpha is kept
        else:
            image['data_red'].append((pixel[0], 0, 0))

    img.putdata(image['data_red'])
    image['html_red'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

def image_management_add_html_blur(image):
    img = image['pil'].filter(ImageFilter.BLUR) # filter() returns a new, blurred image
    # store the blurred image under its own key
    image["pil_blur"] = img
    image['html_blur'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image["pil_blur"], image['format'])
    # each image['html_*'] entry stores the manipulated (here blurred) image as a base64 string inside an HTML img tag

# We can repeat this exact process for blue and green. Instead I'm going to adjust some of the colors manually for a surprise filter.
def image_management_add_html_surprise(image):
    # Work on a copy so the scaled original in image['pil'] stays unchanged for the other filters
    img = image['pil'].copy()
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['data_surprise'] = [] # data_surprise is a new list which will contain the converted surprise-filter pixels

    # 'data' is a list of RGB data, the list is traversed and the surprise value of each pixel is built
    for pixel in image['data']: # this will loop through every pixel in the image data
        if len(pixel) > 3:
            image['data_surprise'].append((pixel[0], 20, 102, pixel[3])) # keep red and alpha, set green and blue to fixed values
        else:
            image['data_surprise'].append((pixel[0], 102, 20)) # keep red, set green and blue to fixed values

    img.putdata(image['data_surprise'])
    image['html_surprise'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # display default images from image_data()
    print("The Original Image:")
    default_images = image_data()
    image_display(default_images)

    images = image_data()
    # Display the grey scale, blurred, red scale, and surprise versions of each image
    for image in images:
        image_management(image)

        print("--Grey Scale!--")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey']))

        print("--Blurry!--")
        image_management_add_html_blur(image)
        display(HTML(image['html_blur']))

        print("--Red Scale!--")
        image_management_add_html_red(image)
        display(HTML(image['html_red']))

        print("--Surprise Scale!--")
        image_management_add_html_surprise(image)
        display(HTML(image['html_surprise']))
    print()

I think the surprise scale is definitely my favorite one!!
Enumerate "Data" Big Idea from College Board
Some of the big ideas and vocab that you observe, talk about it with a partner ...
- "Data compression is the reduction of the number of bits needed to represent data"
- "Data compression is used to save transmission time and storage space."
- "lossy data can reduce data but the original data is not recovered"
- "lossless data lets you restore and recover"
The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with Images provides many opportunities for compression and analyzing size.
Image Files and Size
Here are some image files. Download these files and load them into the images directory under _notebooks in your Blog.
Describe some of the meta data and considerations when managing Image files. Describe how these relate to Data Compression ...
- File Type, PNG and JPG are two types used in this lab
- Size, height and width, number of pixels
- Visual perception, lossy compression
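The link between image size and compression can be sketched with a little arithmetic. The 320 x 240 RGB size is an assumption, and `zlib` is used only as a stand-in for a lossless compressor (PNG uses the same DEFLATE method internally). The exact compressed sizes will vary.

```python
import os
import zlib

width, height, channels = 320, 240, 3        # assumed RGB image, 1 byte per channel
raw_bytes = width * height * channels
print(raw_bytes)                             # 230400 bytes before any compression

solid = bytes([0, 0, 255]) * (width * height)  # every pixel is the same blue
noisy = os.urandom(raw_bytes)                  # random bytes, no patterns at all

print(len(zlib.compress(solid)))   # very small: one pattern repeated
print(len(zlib.compress(noisy)))   # about as large as the raw data
```

The same number of pixels can therefore produce very different file sizes, depending on how much repetition the content has.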
Displaying images in Python Jupyter notebook
Python Libraries and Concepts used for Jupyter and Files/Directories
IPython
Supports visualization of data in Jupyter notebooks. Visualization is specific to the View; for the web, a visualization needs to be converted to HTML.
pathlib
File paths are different on Windows versus Mac and Linux. This can cause problems in a project as you work and deploy on different Operating Systems (OS's). pathlib is a solution to this problem.
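A short sketch of how pathlib handles this: the `/` operator joins path parts, and the separator is chosen for you. `PurePosixPath` and `PureWindowsPath` are used here only to show both styles on any machine. The lab code itself just uses `Path`.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path picks the right flavour for the OS the code is running on
image_path = Path("images") / "clouds-impression.png"
print(image_path.name)     # clouds-impression.png
print(image_path.suffix)   # .png

# The same join, shown explicitly for each OS style
print(PurePosixPath("images") / "smiley.png")     # images/smiley.png
print(PureWindowsPath("images") / "smiley.png")   # images\smiley.png
```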
from IPython.display import Image, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
            {'source': "Smiley", 'label': "Smiley", 'file': "smiley.png"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

def image_display(images):
    for image in images:
        display(Image(filename=image['filename']))

# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # print parameter supplied image
    green_square = image_data(images=[{'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"}])
    image_display(green_square)

    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)

Reading and Encoding Images (2 implementations follow)
PIL (Python Image Library)
Pillow or PIL provides the ability to work with images in Python. Geeks for Geeks shows some ideas on working with images.
base64
Image formats (JPG, PNG) are often called binary file formats, and it is difficult to pass these over HTTP as text. Base64 converts binary data (8-bit bytes) into a text encoding: each group of 24 bits (3 bytes) is written as four 6-bit Base64 digits. Thus base64 is used to transport and embed binary images into textual assets such as HTML and CSS.
- How is Base64 similar or different to Binary and Hexadecimal?
- Translate first 3 letters of your name to Base64.
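As a worked example of the questions above, the sketch below shows the same three bytes in binary, hexadecimal, and Base64. The text "Man" is a made-up stand-in for the first three letters of a name.

```python
import base64

text = "Man"                     # hypothetical first three letters of a name
raw = text.encode("ascii")       # 3 bytes = 24 bits

bits = "".join(f"{byte:08b}" for byte in raw)
binary = " ".join(f"{byte:08b}" for byte in raw)
hexadecimal = raw.hex()
six_bit_groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
encoded = base64.b64encode(raw).decode("ascii")

print(binary)           # 01001101 01100001 01101110  (base 2, 8 digits per byte)
print(hexadecimal)      # 4d616e                      (base 16, 2 digits per byte)
print(six_bit_groups)   # ['010011', '010110', '000101', '101110']
print(encoded)          # TWFu                        (base 64, 4 digits per 3 bytes)
```

All three are just different ways of writing the same bits. Base64 is the most compact of the three because each character carries 6 bits.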
numpy
Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a Numpy array is created from the image data in order to simplify access and change to the RGB values of the pixels, converting pixels to grey scale.
io, BytesIO
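Input and Output (I/O) is a fundamental of all Computer Programming. Input/output (I/O) buffering is a technique used to optimize I/O operations: data is collected in a temporary holding area (the buffer) so it can be handled in larger pieces. With large quantities of data, the buffer is the input that is currently queued and waiting to be processed. BytesIO is a buffer held in memory instead of in a file. In this example, there is a very large picture that lags.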
- Where have you been a consumer of buffering?
- From your consumer experience, what effects have you experienced from buffering?
- How do these effects apply to images?
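A minimal sketch of an in-memory buffer: BytesIO collects bytes as they are written, and the complete contents are read back in one step. The chunks here are made up. In the lab the bytes come from saving a PIL image into the buffer.

```python
from io import BytesIO

chunks = [b"first ", b"second ", b"third"]   # made-up pieces of data

with BytesIO() as buffer:
    for chunk in chunks:
        buffer.write(chunk)          # each write is appended to the buffer
    contents = buffer.getvalue()     # read everything back at once

print(contents)        # b'first second third'
print(len(contents))   # 18
```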
from IPython.display import HTML, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np

# prepares a series of images
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file'] # file with path
    return images

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image): # image is one dictionary from image_data()
    # Image open return PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])

# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Reuse the scaled PIL image object stored by image_management
    img = image['pil']
    format = image['format']

    img_data = img.getdata() # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['gray_data'] = [] # list for data converted to gray scale

    # 'data' is a list of RGB data, the list is traversed and the gray value of each pixel is calculated
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3 # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3])) # PNG format
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels

    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the default list of images
    images = image_data()

    # Display meta data, scaled view, and grey scale for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])

        print("-- original image --")
        display(HTML(image['html']))

        print("--- grey image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey']))
    print()

Data Structures and OOP
Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a College Course, OOP will be talked about often. Functionality in the remainder of this Blog is the same as the prior implementation. Highlight some of the key differences you see between imperative and OOP styles.
- Read imperative and object-oriented programming on Wikipedia
- Consider how data is organized in two examples, in relations to procedures
- Look at Parameters in Imperative and Self in OOP
Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...
- PIL
- numpy
- base64
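To make the comparison of parameters and self concrete, here is a small side-by-side sketch. `describe` and `Picture` are hypothetical names invented for this illustration and are not part of the lab code.

```python
# Imperative style: the data lives in a dictionary and is passed in as a parameter
def describe(image):
    return f"{image['label']} from {image['source']}"

# OOP style: data and behaviour live together, and self replaces the parameter
class Picture:
    def __init__(self, label, source):
        self._label = label      # attributes with the self prefix belong to the object
        self._source = source

    @property
    def label(self):             # read-only access, like the getters in Image_Data
        return self._label

    def describe(self):
        return f"{self._label} from {self._source}"

smiley_dict = {'label': "Smiley", 'source': "Internet"}
smiley_obj = Picture("Smiley", "Internet")

print(describe(smiley_dict))     # Smiley from Internet
print(smiley_obj.describe())     # Smiley from Internet
```

Both produce the same result. The difference is where the data is kept and who is responsible for it.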
from IPython.display import HTML, display
from pathlib import Path # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np


class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source # variables with self prefix become part of the object
        self._label = label
        self._file = file
        self._filename = path / file # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()

    @property
    def source(self):
        return self._source

    @property
    def label(self):
        return self._label

    @property
    def file(self):
        return self._file

    @property
    def filename(self):
        return self._filename

    @property
    def img(self):
        return self._img

    @property
    def format(self):
        return self._format

    @property
    def mode(self):
        return self._mode

    @property
    def originalSize(self):
        return self._originalSize

    @property
    def size(self):
        return self._img.size

    @property
    def html(self):
        return self._html

    @property
    def html_grey(self):
        return self._html_grey

    # Large image scaled to baseWidth of 320
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)

    # PIL image converted to base64
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            return '<img src="data:image/png;base64,%s">' % base64.b64encode(buffer.getvalue()).decode()

    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy() # copy so the scaled colour image in self._img is not overwritten
        data = np.array(self._img.getdata()) # PIL image to numpy array

        grey_data = [] # list for data converted to grey scale
        # 'data' is a list of RGB data, the list is traversed and the grey value of each pixel is calculated
        for pixel in data:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3 # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average, average, average, pixel[3])) # PNG format
            else:
                grey_data.append((average, average, average))
            # end for loop for pixels

        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)


# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None): # path of static images is defaulted
    if images is None: # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    return path, images

# turns data into objects
def image_objects():
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'],
                                     label=image['label'],
                                     file=image['file'],
                                     path=path,
                                     ))
    return id_Objects

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects(): # ido is an Image Data Object

        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)

        print("-- scaled image --")
        display(HTML(ido.html))

        print("--- grey image ---")
        display(HTML(ido.html_grey))

    print()

Hacks
Early Seed award
- Add this Blog to your own Blogging site.
- In the Blog add a Happy Face image.
- Have Happy Face Image open when Tech Talk starts, running on localhost. Don't tell anyone. Show to Teacher.
AP Prep
- In the Blog add notes and observations on each code cell that request an answer.
- In blog add College Board practice problems for 2.3
- Choose 2 images, one that will more likely result in lossy data compression and one that is more likely to result in lossless data compression. Explain.
Project Addition
- If your project has images in it, try to implement an image change that has a purpose. (Ex. An item that has been sold out could become gray scale)
Pick a programming paradigm and solve some of the following ...
- Numpy, manipulating pixels. As opposed to Grey Scale treatment, pick a couple of other types like red scale, green scale, or blue scale. We want you to be manipulating pixels in the image.
- Binary and Hexadecimal reports. Convert and produce pixels in binary and Hexadecimal and display.
- Compression and Sizing of images. Look for insights into compression Lossy and Lossless. Look at PIL library and see if there are other things that can be done.
- There are many effects you can do as well with PIL. Blur the image or write Meta Data on screen, aka Title, Author and Image size.
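For the binary and hexadecimal report hack, one possible starting point is sketched below. `pixel_report` is a hypothetical helper and the sample pixel is made up. In the lab the pixels would come from the NumPy array built from the image data.

```python
def pixel_report(pixel):
    """Format the R, G and B values of one pixel as hex and as binary."""
    red, green, blue = (int(value) for value in pixel[:3])  # ignore alpha if present
    return {
        'hex': f"#{red:02x}{green:02x}{blue:02x}",
        'binary': f"{red:08b} {green:08b} {blue:08b}",
    }

print(pixel_report((255, 102, 20)))
# {'hex': '#ff6614', 'binary': '11111111 01100110 00010100'}
```

Converting each value with int() first means the same function also accepts NumPy integers.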