Monday, February 25, 2008

[Paper Review] Image Retrieval: Ideas, Influences, and Trends of the New Age

This paper is quite long, and I have to admit that I didn't spend enough time on it. Perhaps my lack of background knowledge made it hard to get through every sentence. I tried to apply the first pass described in "How to Read a Paper?", but it didn't help much... I will talk about the parts I did read below anyway.

This comprehensive survey covers the progress made in CBIR (Content-Based Image Retrieval) over the last decade. It discusses 1) the three design aspects from both the user's and the system's perspective (skipped here), 2) the key techniques developed during this period, 3) problems derived from CBIR, and 4) evaluation metrics for CBIR systems. One idea from the paper that I would like to repeat is that text is man's creation while images are there by themselves. It is no wonder we find images much harder to deal with than text.

The core obstacles in CBIR revolve around two main problems: how to represent an image in a more useful form, and how to establish similarity relationships between images. In fact, one could say that every scientific research field faces the first problem and every abstract domain is subject to the second. Researchers in CBIR tackle the first problem by extracting features at different levels, and the authors observe a trend from global features to local ones. Global features capture the overall characteristics of the entire image, but they are often too clumsy to cope with the local transformations that general images exhibit. Thus, local descriptors have received more and more attention over the decade. Beyond local features comes summarization or clustering, which aims to produce image signatures that help the later similarity assessment; these can be viewed as "spaces" of images as a whole.
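
To make the global-versus-local point concrete for myself, here is a minimal Python sketch. It is my own toy illustration, not a method taken from the survey: the function names (`global_histogram`, `block_histograms`), the 4-bin intensity histogram, and the two tiny 4x4 "images" are all assumptions chosen for simplicity. Real local descriptors are far more elaborate than per-tile histograms.

```python
def global_histogram(image, bins=4, max_value=256):
    """Normalized intensity histogram over the whole image (a global feature)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
            total += 1
    return [c / total for c in counts]


def block_histograms(image, block=2, bins=4, max_value=256):
    """One histogram per non-overlapping block x block tile (a crude local feature)."""
    height, width = len(image), len(image[0])
    signatures = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [row[left:left + block] for row in image[top:top + block]]
            signatures.append(global_histogram(tile, bins, max_value))
    return signatures


# Two hypothetical grayscale images: same pixel values, different layout.
image_a = [[0, 0, 255, 255],
           [0, 0, 255, 255],
           [0, 0, 255, 255],
           [0, 0, 255, 255]]
image_b = [[0, 255, 0, 255],
           [255, 0, 255, 0],
           [0, 255, 0, 255],
           [255, 0, 255, 0]]

# The global signature cannot tell the two images apart...
print(global_histogram(image_a) == global_histogram(image_b))
# ...while the per-tile signatures can.
print(block_histograms(image_a) == block_histograms(image_b))
```

The half-dark, half-bright image and the checkerboard share the same global histogram, so only the tile-level signatures separate them. That is roughly the intuition behind moving from global to local features, at least as I understand it.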

Similarity estimation looks to me much like machine learning. Here, researchers try to find reasonable energy functions that, once optimized with respect to ground-truth data, can be used to compute a similarity measure between image signatures. At this point one can resort to pre-processing methods (clustering, categorization) or user relevance feedback to ease the burden of similarity measurement and to speed things up. Among these, relevance feedback stands out, not only because it is the most direct communication with users, but also because it requires a careful design to "catch" the user.
Finally, fusing information from other media for better content retrieval is a novel direction.
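
As a rough picture of how similarity measurement and relevance feedback could fit together, here is a small Python sketch. Everything in it is an assumption of mine rather than something the survey prescribes: the weighted Euclidean distance, the inverse-variance reweighting heuristic, the helper names (`weighted_distance`, `rank`, `reweight`), and the made-up two-dimensional signatures.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two image signatures."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def rank(query, database, weights):
    """Image names sorted from most to least similar to the query."""
    return sorted(database,
                  key=lambda name: weighted_distance(query, database[name], weights))


def reweight(relevant, eps=1e-6):
    """Give more weight to dimensions that vary little among relevant images.

    Assumed heuristic: weight = 1 / (variance + eps), normalized to sum to 1.
    """
    dims = len(relevant[0])
    weights = []
    for d in range(dims):
        values = [signature[d] for signature in relevant]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        weights.append(1.0 / (variance + eps))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-dimensional signatures for three database images.
database = {"a": [0.5, 0.9], "b": [0.8, 0.5], "c": [0.0, 0.6]}
query = [0.5, 0.5]

# Initial ranking with equal weights on both dimensions.
print(rank(query, database, [0.5, 0.5]))

# Suppose the user marks image "a" (and the query itself) as relevant:
# the first dimension is stable across them, so it gets almost all the weight.
feedback_weights = reweight([database["a"], query])
print(rank(query, database, feedback_weights))
```

With equal weights, image "b" comes first; after the feedback step the first dimension dominates and image "a" moves to the top. A real system would of course learn from many more examples and much richer signatures.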

Over its long history, CBIR has also spawned many branch fields and research topics. These new problems include image annotation, which aims to give conceptual textual descriptions of images; pictures for a story, which finds the pictures that best depict a concept (the reverse of annotation); aesthetics measurement, which measures the aesthetic perception of images; image security, which exploits the limitations of CBIR systems to help with person identification; and many more. The evaluation of CBIR systems is also briefly mentioned at the end of the paper, which shows how little research has been done on it.

From the authors' perspective, we should pay more attention to application-oriented aspects, since they directly influence the success of any system and deserve to be "considered equally important". The growth of image sizes also poses a hard barrier for the future of CBIR. In any case, we can anticipate continuing progress in this field.

[Paper Review] How to Give a Good Research Talk?

This paper aims to help people give good talks about their research, especially in computer science. I found many of its points valuable, and they are worth keeping in mind every time we prepare a presentation. Below I summarize those points.

One thing so important that I have to repeat it here is the pair of questions at the beginning:
  • Who is my primary audience?
  • If someone remembers only one thing from my talk, what would I like it to be?
We should ask ourselves both questions every time before starting on the slides, especially the second one. In my experience, we quite often treat a presentation as an artwork or a performance and forget that the most vital thing is to build the intended understanding of our ideas in the audience. To this end, we should evaluate everything on our slides: does it help make things clear?

The other three points proposed in Section 2, in order: using examples, pruning, and being honest, are all important for a presentation too. In the past I tended to conceal problems in my presentations. This is certainly one thing I should avoid in the future.

Section 3 is somewhat out of date given the recent popularity of PowerPoint. Nevertheless, we can still find many pitfalls that apply. For example, people tend to put too many things on a single slide, or to type out exactly the words they are going to say. The most important thing here, I think, is not to start writing slides too early. This effectively prevents you from putting in too much content...

Finally, I found two points in the last section important: being careful about visual tricks, and timing. Beginning presenters usually use many animations on their slides, since they are so easy to add in PowerPoint. Most of these "visual aids" are in fact annoying and distracting. Overrunning is also something we do quite often. As the paper says, it is selfish and rude, and we should all try to finish on time.

Thursday, February 21, 2008

[Paper Review] How to Read a Paper?

In this paper, the author mainly talks about his "3-pass" guidelines for efficient paper reading. His target audience is mainly, though not only, graduate students who have just started doing research. I will summarize the three passes here along with my own opinions:

The first pass is to skim the whole paper, partly as a layperson without background knowledge would. In this pass, one looks at the abstract, introduction, figures, and tables to get a rough understanding of the paper. The author suggests that this quick pass is a good indicator for further "action": either going one step further or putting the paper aside. One important thing I learned from this section is that one should put more emphasis on those parts when writing a paper; otherwise they will keep readers away from your clever ideas.

In the second pass, the author expects us to read the paper "as we usually do". This includes going through the whole paper and thinking over the main ideas it proposes. A mistake I often make is to read every paper I find to this depth, which is very time-consuming if the paper turns out not to be useful. Nevertheless, I think a careful reading is still necessary (and usually proves to be) for many of the papers we decide to click "download" on.

If we find something really beneficial to our work and the second pass just doesn't feel like enough, the third pass is the perfect way to finish. The author recommends virtually re-implementing the paper in your mind, so that you can re-create the work under the same assumptions. I totally agree with him on this point. Until we reach for pen and paper or code, we seldom truly understand the deeper meaning hidden in a paper.

Lastly, the author also describes a general workflow for surveying an unfamiliar field. One thing I would like to highlight is browsing key researchers' websites to get the whole picture of where the field is going. This is often much easier than scanning proceedings' websites...

Tuesday, February 19, 2008

First shot!

Cheers to the opening of my first blog!

It is amusing that I started blogging purely for academic reasons...