Administrivia

                Signup sheet!!

                2 problem sets, 1 oral paper presentation, final research project

                Scribe system (but not today)

                Overall goal: to prepare students to do research in vision or medical imaging

What is computer vision?

                Working definition: extracting useful information from images

                In particular, information about image content

                Certain formal problems [high-dimensional inverse problems with spatial constraints]

                Elements of psychology, engineering, mathematics

                From a technical point of view, interplay of statistics and geometry

                From a very engineering point of view, getting info out of (typically) 512-by-512 arrays of 8-bit (grayscale) or 24-bit (color) pixel values
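To make the "arrays" view concrete, here is a minimal sketch (assuming NumPy; the image contents are made up):

```python
import numpy as np

# A grayscale image: a 512x512 array of 8-bit intensities (0 = black, 255 = white).
gray = np.zeros((512, 512), dtype=np.uint8)
gray[100:200, 100:200] = 255  # a bright square

# A color image: the same grid, but 3 bytes (24 bits) per pixel: R, G, B.
color = np.zeros((512, 512, 3), dtype=np.uint8)
color[:, :, 0] = gray  # put the square in the red channel

print(gray.shape, gray.dtype)  # (512, 512) uint8
print(color.nbytes)            # 512*512*3 = 786432 bytes
```

Everything in the course starts from arrays like these; "extracting information" means computing something meaningful from these raw numbers.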

 

Why is vision hard?

                It certainly looks easy (any child can see…), but about 1/3 of your brain is doing it

                Ill-posed and ill-defined problems

                Inverse problems are ill-posed in the sense of Hadamard (a solution may fail to exist, to be unique, or to depend continuously on the data)

                Perception problems are always ill-posed, since the goal is to recover info about the world

                Vision is going from the (2D) image to the (3D) scene; graphics is the opposite

                Worse still, the problems in vision are ill-defined without a task

                There is no formal specification for nearly any vision problem, and few doable tasks

In terms of engineering, to build a vision system requires hooking together unreliable components whose lies cannot be checked

There are many ugly engineering problems, due to e.g. bad cameras, slow computers and buses, etc. (these are slowly being solved).

664’s take on vision

This is not an exhaustive course – there are many areas of vision you will never hear about in 664

This is mostly by choice – the Cornell vision group (= RDZ) has a fairly strong bias

Pro-algorithms, i.e. computational

Discrete math (much of vision is continuous)

Task-oriented

Try to make minimal assumptions, and reasonable ones (this is partly an AI legacy).

A bit about the field itself

About 30 years old (MIT Summer Vision Project from ~1966)

Draws primarily from EE, then CS, then psych (but not in 664!)

About 1 major conference per year w/ 500 people, 2 major journals (see links page)

Originated largely in AI, but is now totally distinct (and proud of it…)

Somewhat related to image processing, 2D signal processing

                There is no formal definition, here is an RDZ intuition

                Sometimes the fields really do cross, e.g. MPEG-4

Tasks have evolved over time

Classic tasks:

                Recognition of tanks

                Recognition of characters (OCR)

                Industrial inspection

                Robotics?

In the last 5-10 years lots of interesting new tasks:

                Search engines for images/image databases

                Video surveillance

                Security applications: identifying people via faces, irises, fingerprints (listed in decreasing order of how much vision is involved)

                HCI (Bill Gates’ favorite examples)

                Multimedia apps, e.g. compression

                Graphics applications!

A new emphasis this year on medical imaging applications

                From a technical point of view, a huge amount of overlap with non-medical vision

                But also some specific quirks

You can do a final project in any area of vision

What we will cover

                There is a list of topics in the 1st day handout. 

                Order of topics is roughly “low level/early” to “high level/late”

                Distinction: low level involves direct operations on the pixels; high level involves intermediate representations

                Most of computer vision is low level/early; high-level vision is perhaps premature as a field

A very brief overview of the 1st half of the course:

                There is a classic vision problem (pixel labeling) that is extremely important

                It’s vital for almost any application

                It provides a nice intro to some of vision’s mathematical tools and techniques

        So, we’ll be talking about it in detail, starting today.

 

Image Formation (digital x-ray, conventional camera)

                Consider taking a point X-ray (photon) source, an object to be x-rayed, and a detector.

                Detector counts photons per unit time, which is what we measure

                A pixel value tells us the average density (to X-rays) of the material within a solid angle
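A sketch of this formation model for a single detector pixel, under the standard Beer-Lambert attenuation assumption (the photon count, attenuation coefficients, and path lengths here are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# X-ray formation along one ray: the expected photon count at the detector is
# N0 * exp(-integral of attenuation along the ray) (Beer-Lambert law).
N0 = 10000.0                          # photons emitted toward this detector pixel
mu = np.array([0.0, 0.5, 0.5, 0.0])  # attenuation per unit length (0.5 ~ "bone")
dx = 1.0                              # path length through each sample

expected = N0 * np.exp(-np.sum(mu) * dx)

# The detector counts individual photons per unit time, so the actual
# measurement is a Poisson random variable around the expected count.
measured = rng.poisson(expected)
print(expected, measured)
```

Note that the pixel value depends only on the total attenuation along the ray, which is why it reports an average density over that solid angle.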

 

                Now consider a pinhole camera looking at a scene

                The geometry is a little more complex, but basically similar

                What we are measuring is the brightness of a patch of the world (scene element)
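The pinhole geometry can be sketched in a few lines: a scene point (X, Y, Z) projects to image coordinates (fX/Z, fY/Z), where f is the distance from the pinhole to the image plane (the coordinates below are made up):

```python
# Pinhole (perspective) projection: a 3D scene point maps to a 2D image point.
def project(X, Y, Z, f=1.0):
    assert Z > 0, "point must be in front of the camera"
    return (f * X / Z, f * Y / Z)

# Two points on the same ray through the pinhole land on the same image point,
# which is exactly why depth is lost going from the 3D scene to the 2D image.
print(project(1.0, 2.0, 4.0))  # (0.25, 0.5)
print(project(2.0, 4.0, 8.0))  # (0.25, 0.5)
```

This is the sense in which vision (image to scene) inverts graphics (scene to image), and why that inverse has no unique solution.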

 

                Back to the X-ray.  Suppose that bone is very bright and soft tissue/air is very dark.

                The picture we get should, perhaps, be 200’s (bone) and 50’s (other).

                Ideally we would see only these values.  BUT a wide variety of processes, which we lump together as “noise”, gives us slightly (and randomly) different values.

Suppose we really want to know which pixels are bone and which are not.

 

(Why do we care?  A good example comes from angiography, where you are looking at an artery into which you’ve injected some radio opaque dye.  To find a stenosis, or to measure its seriousness, you’d really like to know for the individual pixels if they are blood or vessel.)

 

So here is our problem, usually known as “image restoration”, sometimes called “denoising”.

There is a “true” value at each pixel, which we are trying to figure out.  What we get as input is the true value plus some noise.
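A minimal sketch of this setup, using the bone/soft-tissue values from above (the noise is assumed Gaussian with a made-up sigma, and the restoration shown is the naive per-pixel one, which ignores the spatial structure that the rest of the course will exploit):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" image: bone pixels near 200, soft tissue near 50.
true = np.full((64, 64), 50.0)
true[20:40, 20:40] = 200.0

# What we observe: the true value plus noise (assumed Gaussian here).
observed = true + rng.normal(0.0, 30.0, size=true.shape)

# Naive restoration: independently snap each pixel to the nearer true value.
restored = np.where(observed > 125.0, 200.0, 50.0)

errors = np.mean(restored != true)
print("fraction of mislabeled pixels:", errors)
```

Per-pixel thresholding like this makes mistakes wherever the noise is large; doing better requires using the fact that neighboring pixels tend to have the same label, which is where the pixel-labeling machinery comes in.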