
US20140169680A1 - Image Object Recognition Based on a Feature Vector with Context Information - Google Patents

Image Object Recognition Based on a Feature Vector with Context Information

Info

Publication number
US20140169680A1
Authority
US
United States
Prior art keywords
feature vector
area
image
feature
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/717,706
Other versions
US9165220B2 (en)
Inventor
Hao Tang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to US13/717,706
Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. (assignment of assignors interest; see document for details). Assignors: TANG, HAO
Publication of US20140169680A1
Application granted
Publication of US9165220B2
Status: Expired - Fee Related

Classifications

    • G06K9/48
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/768Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns

Abstract

Examples disclosed herein relate to image object recognition based on a feature vector with context information. A processor may create an expanded feature vector related to a first area of an image including context information related to the first area. The processor may determine the presence of an object in the image based on the feature vector and output information about the determined object.

Description

    BACKGROUND
  • Object recognition may involve determining the presence of an object in an image based on a statistical comparison of the features of the image to features representative of the object. A processor may create feature vectors where each feature vector includes information about the local features of the image in a particular area of the image. The processor may analyze a group of feature vectors to determine the likelihood of a particular type of object appearing in the image.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The drawings describe example embodiments. The following detailed description references the drawings, wherein:
  • FIG. 1 is a block diagram illustrating one example of an apparatus to recognize an object in an image based on a context expanded feature vector.
  • FIG. 2 is a flow chart illustrating one example of a method to recognize an object in an image based on a context expanded feature vector.
  • FIG. 3 is a diagram illustrating one example of recognizing an object in an image based on a feature vector expanded with context information from adjacent areas of the image.
  • FIG. 4 is a diagram illustrating one example of recognizing an object in an image based on a feature vector expanded with context derivative information.
  • FIG. 5 is a diagram illustrating one example of recognizing an object in an image based on a feature vector expanded with context derivative information and adjacent area information.
  • FIG. 6 is a diagram illustrating one example of a flow chart of a method to recognize an object in an image based on a context expanded feature vector.
  • DETAILED DESCRIPTION
  • Automatic object recognition may be used to determine the content of images, for example, to organize or edit images. Feature vectors may be vectors that represent information about the local features in a particular area of the image. For example, each of the vector values may represent information about a particular local feature in the area represented by the feature vector. The feature vector, such as a Scale-Invariant Feature Transform (SIFT) vector, may assume the independence of the local features and limit itself to the features in the particular area covered by the feature vector. Object recognition methods, such as bag-of-words classifiers and support vector machine classifiers, may be used to analyze feature vectors of different areas of the image to determine the probability of a particular type of object being present in the image.
  • In one implementation, a feature vector is expanded to include both information about the features of a particular area of the image represented by the feature vector and information about context related to the features of the particular area of the image. The context information may include, for example, information about features of adjacent areas of the image and/or information about a rate of change of features from the area of the image. The expanded feature vector may then be analyzed using object recognition methods to determine the likelihood of the presence of a particular type of object.
  • Analyzing feature vectors that include context information may improve the accuracy of object recognition methods. Feature vectors that do not include context information may incorrectly assume that local image features are independent. Expanding feature vectors to include context information allows local, low-level context to be taken into account when performing object recognition.
  • FIG. 1 is a block diagram illustrating one example of an apparatus 100 to recognize an object in an image based on a context expanded feature vector. The apparatus 100 may create an expanded feature vector that includes information about the features of the area represented by the feature vector as well as context information related to the area. The context information may be related to local image features that provide context to the image features covered by the feature vector. The apparatus 100 may analyze the feature vector to recognize an object within the image. The apparatus 100 may include a processor 101 and a machine-readable storage medium 102.
  • The processor 101 may be a central processing unit (CPU), a semiconductor-based microprocessor, or any other device suitable for retrieval and execution of instructions. As an alternative or in addition to fetching, decoding, and executing instructions, the processor 101 may include one or more integrated circuits (ICs) or other electronic circuits that comprise a plurality of electronic components for performing the functionality described below. The functionality described below may be performed by multiple processors.
  • The processor 101 may communicate with the machine-readable storage medium 102. The machine-readable storage medium 102 may be any suitable machine readable medium, such as an electronic, magnetic, optical, or other physical storage device that stores executable instructions or other data (e.g., a hard disk drive, random access memory, flash memory, etc.). The machine-readable storage medium 102 may be, for example, a computer readable non-transitory medium.
  • The machine-readable storage medium 102 may include instructions executable by the processor 101. For example, the machine-readable storage medium 102 may include expanded feature vector creation instructions 103, object presence determination instructions 104, and object information output instructions 105.
  • The expanded feature vector creation instructions 103 may include instructions to create a feature vector that includes information about the area covered by the feature vector in addition to context information related to the area covered by the feature vector. The expanded feature vector may indicate the local features of the particular area as well as the interaction with other areas. The expanded feature vector may be created in any suitable manner. The information may be aggregated or summarized. In one implementation, the expanded vector is created by stacking the feature vector for the area with a context information feature vector.
  • An image may be divided into sections, such as in a grid pattern, and a feature vector may be associated with each individual section. An expanded feature vector may include information about the section covered by the feature vector as well as context information about the local features nearby the section covered by the feature vector.
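  • As a rough illustration of this grid-based division (a sketch, not part of the patent), the following Python snippet computes one small feature vector per grid section; the function name grid_features and the choice of mean intensity plus mean gradient magnitudes as the per-section features are illustrative assumptions.

```python
import numpy as np

def grid_features(image, rows=3, cols=3):
    """Divide a 2-D grayscale image into a rows x cols grid and compute one
    small feature vector per section: mean intensity plus the mean absolute
    horizontal and vertical gradients (crude edge/texture cues)."""
    h, w = image.shape
    feats = np.zeros((rows, cols, 3))
    for i in range(rows):
        for j in range(cols):
            cell = image[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols].astype(float)
            gy, gx = np.gradient(cell)  # finite-difference gradients
            feats[i, j] = (cell.mean(), np.abs(gx).mean(), np.abs(gy).mean())
    return feats  # shape (rows, cols, 3): one 3-value vector per section
```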
  • The context information may include, for example, information about the local features of areas of the image nearby the particular area. For example, the processor may determine a window around the area and include feature information about grid positions within the window. The feature information for the other grid positions may be determined in the same manner as the feature information for the particular area covered by the feature vector.
  • In one implementation, the context information includes comparison information related to areas of the image in a window surrounding the area. For example, a derivative of the features of the particular area may be determined and included in the expanded feature vector. The derivative may be any order, and any number of derivatives in any suitable directions may be used. The derivative context information may provide information related to the dynamic features in the area of the image.
  • The object presence determination instructions 104 may include instructions to analyze the expanded feature vector to recognize an object within the image. Any suitable method may be used. In one implementation, a bag-of-words classifier method is used to recognize an object within the image based on an expanded feature vector. In some cases, the same methods or similar methods to those used to analyze a feature vector may be applied to an expanded feature vector.
  • Any number of feature vectors related to the image may include context information. For example, the entire set of feature vectors or a subset of the feature vectors may be expanded feature vectors with context information. The processor 101 may analyze any number of the feature vectors related to the image to determine the likelihood of the presence of a particular type of object within the image. As an example, the feature vectors may be analyzed to determine the presence of a human face in the image, and the location of the face may be determined based on the location of the feature vectors indicating a face.
  • The object information output instructions 105 may include instructions to output information about the object determined to be in the image. Information about the object may be stored, displayed, or transmitted. The information may be any suitable information, such as information about the object and the probability of the object being present in the image. In some implementations, the machine-readable storage medium 102 may include additional instructions to process the image based on the recognition of a particular type of object within the image. The processor 101 may output additional information about the detected object, such as location or other characteristics.
  • FIG. 2 is a flow chart illustrating one example of a method to recognize an object in an image based on a context expanded feature vector. For example, object recognition methods, such as a bag-of-words classifier method, may be applied to feature vectors that include values indicating features of the particular area represented by the feature vector. An expanded feature vector may be created where the feature vector includes both values indicating features of the particular area and values providing context information related to the features of the particular area. The context information may indicate information about the local features nearby the particular area of the image. An object recognition method may then be applied to the expanded feature vector such that individual areas and their context within the image are taken into account when determining the probability of the presence of a particular type of object within the image. The method may be implemented, for example, by the processor 101 in FIG. 1.
  • Beginning at 200, a processor creates an expanded feature vector related to a first area of an image including context information related to the first area. The image may be, for example, retrieved from storage or received from a remote source via a network. The image may be any image content in any suitable format, such as a JPEG image. The area of the image may be determined in any suitable manner. For example, the image may be divided into a grid pattern, and a feature vector may be created for each grid position to represent the local features of each grid position. Each vector value may represent a different feature of the grid position. The features may be related to, for example, color, texture, edges, or intensity. The features may also be color or intensity gradient based features, such as Scale-Invariant Feature Transform (SIFT) or Histogram of Oriented Gradients (HOG) features.
  • The context information may be, for example, information related to the area around the area covered by the feature vector. For example, the feature vector may include feature information about the features in grid positions adjacent to or near the grid position of the feature vector. In some cases, the selected grid positions are not adjacent, such as where the information most useful for object recognition lies outside the adjacent grid positions. In one implementation, the processor includes feature information of the eight grid positions adjacent to the grid position. The processor may determine a window centered around the grid position of the feature vector. The window may be, for example, a square or circular window, and the feature vectors of the grid positions included within the window may be included in the expanded feature vector. In one implementation, the processor analyzes the features of the nearby areas to determine which areas' information to include within the expanded feature vector.
  • The feature vector may include the context information by combining the feature information with the context information. In one implementation, the feature includes values representative of the grid position related to the feature vector and additional values are concatenated in the same vector. For example, the data may be stacked in the vector such that the feature vector is stacked with additional feature vectors from nearby locations in the image. In some cases, the information may be combined such that it is aggregated or summarized.
  • In one implementation, the feature vector includes comparison information of the area covered by the feature vector to nearby areas in the image. The context information may be dynamic feature information describing changes occurring within the image from the originating grid position across the spatial extent of the image. The processor may determine a derivative related to the feature vector. For example, the processor may determine a velocity or acceleration of a feature along an x or y axis of the image. Any number and order of derivatives may be used, and any number of axes may be used. In one implementation, a first order velocity derivative is obtained by taking the finite difference derivatives of the features of the feature vector over the x and y directions using a finite length window centered at the grid position associated with the feature vector, and a second order acceleration derivative is obtained by taking the finite difference derivatives of the velocity features over the x and y directions using a finite length window centered at the grid position associated with the feature vector. The finite length window for the derivative may be of any size and position.
  • As an example, the velocity along the x direction at grid position (i, j) may be determined as follows, where k indexes offsets within a finite window of half-width K centered at the grid position:

$$\Delta f^{(x)}_{i,j} = \frac{\sum_{k=1}^{K} k\left(f_{i+k,j} - f_{i-k,j}\right)}{2\sum_{k=1}^{K} k^{2}}$$

  • The velocity along the y direction at grid position (i, j) may be determined as:

$$\Delta f^{(y)}_{i,j} = \frac{\sum_{k=1}^{K} k\left(f_{i,j+k} - f_{i,j-k}\right)}{2\sum_{k=1}^{K} k^{2}}$$

  • The acceleration along the x direction at grid position (i, j) may be determined by applying the same operator to the velocity features:

$$\Delta\Delta f^{(x)}_{i,j} = \frac{\sum_{k=1}^{K} k\left(\Delta f_{i+k,j} - \Delta f_{i-k,j}\right)}{2\sum_{k=1}^{K} k^{2}}$$

  • The acceleration along the y direction at grid position (i, j) may be determined as:

$$\Delta\Delta f^{(y)}_{i,j} = \frac{\sum_{k=1}^{K} k\left(\Delta f_{i,j+k} - \Delta f_{i,j-k}\right)}{2\sum_{k=1}^{K} k^{2}}$$
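  • A minimal numerical sketch of these finite-difference formulas follows (assuming NumPy; the wrap-around boundary handling via np.roll is an assumption, since the patent leaves border behavior open). The operator computes the velocity features for every grid position at once, and acceleration follows by applying it to the velocities.

```python
import numpy as np

def delta(feats, axis, K=2):
    """Finite-difference 'velocity' of per-section features along one grid
    axis, per Delta f = sum_{k=1..K} k * (f_{+k} - f_{-k}) / (2 * sum k^2).
    feats: array of shape (rows, cols, d). np.roll wraps at the border;
    clamping or zero-padding would be equally valid choices."""
    norm = 2.0 * sum(k * k for k in range(1, K + 1))
    out = np.zeros_like(feats, dtype=float)
    for k in range(1, K + 1):
        out += k * (np.roll(feats, -k, axis=axis) - np.roll(feats, k, axis=axis))
    return out / norm

# Velocity along x (the row index i) and the matching acceleration:
# vx = delta(feats, axis=0); ax = delta(vx, axis=0)
```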
  • A feature vector may be created that includes original, velocity, and acceleration features. In one implementation, the features are stacked such that the individual vectors are concatenated. In another implementation, the features are aggregated and at least some vector positions include summary information.
  • Continuing to 201, a processor determines the presence of an object in the image based on an analysis of the expanded feature vector. Any number of feature vectors may be used to determine the presence of an object in the image. For example, feature vectors from each grid position in an image or a subset of grid positions may be analyzed. Any number of vectors used in the method may include context information. For example, the entire set of vectors or a subset of the vectors may include context information. In some implementations, vectors in particular positions or with particular features are expanded to include context information. Any suitable method for determining the likelihood of the presence of an object within an image based on a feature vector may be used. For example, spatial pyramid matching (SPM), Gaussian mixture models (GMM), locality constrained coding (LCC), or Fisher vectors (FV) may be used.
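  • As one concrete sketch of such an analysis (hedged: this assumes scikit-learn is available, and train_vectors / train_labels are assumed inputs, not names from the patent), a bag-of-visual-words classifier can be built on top of the expanded feature vectors as follows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bow_histogram(vectors, codebook):
    """Quantize each section's expanded vector to its nearest codeword and
    return the image's normalized histogram of codeword counts."""
    words = codebook.predict(vectors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# train_vectors: one (n_sections, d) array of expanded vectors per image;
# train_labels: the object label of each training image.
codebook = KMeans(n_clusters=256, n_init=4).fit(np.vstack(train_vectors))
X = np.array([bow_histogram(v, codebook) for v in train_vectors])
clf = LinearSVC().fit(X, train_labels)  # decision_function scores new images
```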
  • Moving to 202, a processor, such as the processor 101 from FIG. 1, outputs information about the determined object. The processor may display, store, or transmit the information about the determined object. For example, the information may be displayed to a user. The information may be used to determine additional processing within the image. For example, if a human face is detected, additional processing may match the face to stored images. In one implementation, the processor determines a likelihood of the presence of different objects and outputs the objects with the highest likelihood along with the probabilities. The processor may also output information about the location of the object within the image.
  • FIG. 3 is a diagram illustrating one example of recognizing an object in an image based on a feature vector expanded with context information from adjacent areas of the image. Image 300 shows an image divided into sections. Each section may be represented by a feature vector, F1, F2, F3, F4, F5, F6, F7, F8, and F9. The sections may be any suitable size, such as a single pixel or a group of pixels. The individual feature vectors may have any suitable dimensions and number of items. As an example, each feature vector of image 300 has three vector positions, as shown in block 301 by the F1 feature vector including vector items F1,1, F1,2, and F1,3, such that the first subscript identifies the feature vector and the second subscript identifies the position within that feature vector. Each value in the vector may indicate a characteristic of the area of the image covered by the F1 feature vector. The other feature vectors may each include three positions. In some cases, the different feature vectors may contain different numbers of values.
  • A processor may create an expanded feature vector that includes information from the feature vector of the particular area represented by the feature vector (F1) as well as information about surrounding areas (other grid positions surrounding F1). As an example, a feature vector for a position may include the feature vector for that position concatenated with feature vectors of image positions adjacent to the position. Block 302 shows a context information expanded feature vector for F1. The expanded feature vector 302 includes the three positions from F1 as well as the feature vectors of the surrounding areas represented by F2, F3, F4, F5, F6, F7, F8, and F9. For example, the expanded feature vector F1 includes the three vector positions for each of the eight grid positions adjacent to the position for F1. Other implementations are also possible, such as using a grid position near F1 that is not adjacent to it or using fewer than the eight grid positions. The expanded feature vector F1 may then be analyzed using object recognition feature vector analysis methods.
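  • A minimal sketch of this FIG. 3 style expansion (reusing the grid of per-section vectors from the earlier sketch; the zero-filling of out-of-image neighbors is an assumed convention):

```python
import numpy as np

def expand_with_neighbors(feats, i, j):
    """Concatenate the vector at grid position (i, j) with the vectors of its
    eight adjacent positions, giving 9 * d values (27 when d = 3, as in
    FIG. 3). Neighbors falling outside the image are zero-filled."""
    rows, cols, d = feats.shape
    parts = [feats[i, j]]                     # the section's own features
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue                      # skip the center section
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                parts.append(feats[ni, nj])   # adjacent section's features
            else:
                parts.append(np.zeros(d))     # zero-fill outside the image
    return np.concatenate(parts)
```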
  • FIG. 4 is a diagram illustrating one example of recognizing an object in an image based on a feature vector expanded with context derivative information. Image 400 shows an image divided into sections. For example, a grid may be overlaid on the image, and each grid section may be represented by a feature vector. In image 400, the grid sections are each represented by one of the feature vectors F1, F2, F3, F4, F5, F6, F7, F8, and F9. The sections may be any suitable size, such as a single pixel or a group of pixels. The feature vectors representing each section may have any suitable dimensions and number of items. As an example, each feature vector of image 400 has three vector positions, as shown in block 401 by the F1 feature vector including vector items F1,1, F1,2, and F1,3, such that the first subscript identifies the feature vector and the second subscript identifies the position within that feature vector. The other feature vectors may each include three positions.
  • The F1 expanded feature vector 402 includes the three positions from the F1 feature vector as well as velocity and acceleration information related to each of the three positions. For example, there are three velocity values and three acceleration values, one related to each of the three features represented in the feature vector F1. The velocity and acceleration may describe how each feature changes across the space of the image, originating at F1, within a window surrounding F1. The derivative information may thus be used to show how the features at F1 change across the image. The derivative information may include any number and order of derivatives. For example, a feature vector may include the original features and velocity features, or the original features and acceleration features. The derivative information may include multiple directions, such as velocities in both the X and Y directions.
  • FIG. 5 is a diagram illustrating one example of recognizing an object in an image based on a feature vector expanded with context derivative information and adjacent area information. For example, the types of expanded feature vector from FIGS. 3 and 4 may be combined.
  • Image 500 shows an image divided into sections with each section represented by a feature vector, F1, F2, F3, F4, F5, F6, F7, F8, and F9. The individual feature vectors may have any suitable dimensions and number of items. As an example, each feature vector of image 500 has three vector positions, as shown in block 501 by the F1 feature vector including vector items F1,1, F1,2, and F1,3, such that the first subscript identifies the feature vector and the second subscript identifies the position within that feature vector. The other feature vectors may each include three positions.
  • The context information expanded feature vector 502 includes feature vectors from areas of the grid adjacent to F1 and includes derivative information related to both F1 and the adjacent grid position feature vectors. For example, the expanded feature vector 502 includes the feature vectors for F1, F2, F3, F4, F5, F6, F7, F8, and F9, the velocity feature vectors of F1 through F9, and the acceleration feature vectors of F1 through F9. Other combinations are also possible, such as different grid positions of the selected feature vectors and a different number and/or order of derivatives. In some cases, the number and order of derivatives may vary by feature location, such as where derivative information is included for some feature vectors but not for others. The context information expanded feature vector 502 includes additional information that may allow for better object recognition. For example, the absolute local features of nearby areas are included, as well as derivative comparison information relative to the local features of nearby areas.
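  • Continuing the sketches above (same assumptions), a FIG. 5 style vector can be assembled by applying the neighborhood expansion to the original, velocity, and acceleration feature grids, giving 9 sections x 3 values x 3 feature types = 81 values per position:

```python
# Reuses delta() and expand_with_neighbors() from the earlier sketches;
# feats is the (rows, cols, 3) grid of per-section feature vectors.
vx = delta(feats, axis=0)   # velocity of every section's features
ax = delta(vx, axis=0)      # acceleration of every section's features

def expand_full(i, j):
    """FIG. 5 style expansion: the 3x3 neighborhood of original, velocity,
    and acceleration features, concatenated into one 81-value vector."""
    return np.concatenate([expand_with_neighbors(f, i, j)
                           for f in (feats, vx, ax)])
```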
  • FIG. 6 is a diagram illustrating one example of a flow chart of a method to recognize an object based on a context expanded feature vector. In some cases, additional processing may be performed on an expanded feature vector prior to performing object recognition methods, for example, due to the size and complexity of the expanded feature vector. Starting at 600, a processor creates an expanded feature vector with context information. Methods to reduce the size of the expanded vector may be used to make the larger vector a more manageable size. Moving to 601, the processor may apply principal component analysis (PCA) to the expanded feature vector. PCA may be used to reduce the size of the vector while maintaining most of its energy, and may be particularly useful given the greater length of the expanded feature vector. Continuing to 602, the processor applies linear discriminant analysis (LDA) to the expanded feature vector. LDA may increase the discriminative power of the expanded feature vector. The processor may apply PCA, LDA, or both, and may evaluate the vector to determine which to apply. Other dimensionality reduction methods may be applied instead of or in addition to PCA and LDA. Proceeding to 603, the processor performs object recognition methods on the altered expanded feature vector. Including context information at a local level within a feature vector allows for more accurate object recognition.
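  • As a brief sketch of this reduction step (assuming scikit-learn; X, a matrix with one expanded feature vector per row, and the label vector y are assumed inputs):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

pca = PCA(n_components=0.95)        # keep components covering ~95% of energy
X_pca = pca.fit_transform(X)
lda = LinearDiscriminantAnalysis()  # projects to at most n_classes - 1 dims
X_reduced = lda.fit_transform(X_pca, y)
# X_reduced then replaces the raw expanded vectors in the recognition step.
```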

Claims (15)

1. A computing system, comprising:
a processor to:
create an expanded feature vector related to a first area of an image including context information related to the first area;
determine the presence of an object in the image based on the expanded feature vector; and
output information about the determined object.
2. The computing system of claim 1, wherein the processor is further to apply a dimensionality reduction technique to the expanded feature vector.
3. The computing system of claim 1, wherein creating the expanded feature vector comprises combining a feature vector of the first area with a feature vector of a second area.
4. The computing system of claim 1, wherein creating the feature vector comprises combining a feature vector of the first area and a feature vector of the derivative of the feature vector of the first area.
5. The computing system of claim 1, wherein creating the feature vector comprises combining a feature vector of the first area, a feature vector of a derivative of the feature vector of the first area, a feature vector of a second area, and a feature vector of a derivative of the feature vector of the second area.
6. A method, comprising:
creating, by a processor, a feature vector related to an area of an image, wherein the feature vector includes information about the area of the image and about areas adjacent to the area of the image;
performing a statistical object recognition method based on the feature vector to determine the presence of an object; and
outputting information about the object.
7. The method of claim 6, wherein creating the feature vector comprises concatenating a feature vector related to the area with feature vectors related to each of eight grid points adjacent to the area.
8. The method of claim 6, further comprising applying a dimensionality reduction method to the feature vector.
9. The method of claim 6, wherein creating the feature vector comprises:
determining a window surrounding the area;
concatenating feature vectors for areas within the window to the feature vector.
10. The method of claim 6, wherein the window comprises a circular or rectangular window around the area.
11. A machine-readable non-transitory storage medium comprising instructions executable by a processor to:
determine a first feature vector related to an area of an image;
determine a second feature vector including a derivative of the feature vector centered at the area of the image;
stack the first and second feature vectors;
perform a statistical object recognition method based on the stacked feature vectors to determine the presence of an object; and
output information about the object.
12. The machine-readable non-transitory storage medium of claim 11, wherein the second feature vector comprises velocity information related to the first feature vector.
13. The machine-readable non-transitory storage medium of claim 12, further comprising:
determining a third feature vector comprising an acceleration of the first feature vector, and
stacking the third feature vector to the first and second feature vector.
14. The machine-readable non-transitory storage medium of claim 11, further comprising instructions to apply a dimensionality reduction method to the stacked feature vector.
15. The machine-readable non-transitory storage medium of claim 11, further comprising instructions to:
determine a third feature vector including information about a second area of the image and derivative information related to the second area of the image; and
stack the third feature vector onto the first and second feature vectors.
US13/717,706 2012-12-18 2012-12-18 Image object recognition based on a feature vector with context information Expired - Fee Related US9165220B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/717,706 US9165220B2 (en) 2012-12-18 2012-12-18 Image object recognition based on a feature vector with context information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/717,706 US9165220B2 (en) 2012-12-18 2012-12-18 Image object recognition based on a feature vector with context information

Publications (2)

Publication Number Publication Date
US20140169680A1 (en) 2014-06-19
US9165220B2 US9165220B2 (en) 2015-10-20

Family

ID=50930944

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/717,706 Expired - Fee Related US9165220B2 (en) 2012-12-18 2012-12-18 Image object recognition based on a feature vector with context information

Country Status (1)

Country Link
US (1) US9165220B2 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140301650A1 * 2013-04-08 2014-10-09 Omron Corporation Image processing device, image processing method, and recording medium
US9239963B2 * 2013-04-08 2016-01-19 Omron Corporation Image processing device and method for comparing feature quantities of an object in images
US20150146924A1 * 2013-11-27 2015-05-28 Yuka Kihara Image analyzing device, image analyzing method, and recording medium storing image analyzing program
US9613272B2 * 2013-11-27 2017-04-04 Ricoh Company, Ltd. Image analyzing device, image analyzing method, and recording medium storing image analyzing program
CN104134058A * 2014-07-21 2014-11-05 成都万维图新信息技术有限公司 Face image processing method
US20160224864A1 * 2015-01-29 2016-08-04 Electronics And Telecommunications Research Institute Object detecting method and apparatus based on frame image and motion vector
US11080316B1 * 2017-05-26 2021-08-03 Amazon Technologies, Inc. Context-inclusive face clustering
CN113366486A * 2018-12-21 2021-09-07 伟摩有限责任公司 Object classification using out-of-region context
CN111461247A * 2020-04-09 2020-07-28 浙江国贸云商控股有限公司 Feature data processing method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070183629A1 (en) * 2006-02-09 2007-08-09 Porikli Fatih M Method for tracking objects in videos using covariance matrices
US20080199055A1 (en) * 2007-02-15 2008-08-21 Samsung Electronics Co., Ltd. Method and apparatus for extracting facial features from image containing face
US20100074530A1 (en) * 2008-09-25 2010-03-25 Canon Kabushiki Kaisha Image processing apparatus, image processing method and program
US20120099790A1 (en) * 2010-10-20 2012-04-26 Electronics And Telecommunications Research Institute Object detection device and system
US20120288167A1 (en) * 2011-05-13 2012-11-15 Microsoft Corporation Pose-robust recognition

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6307964B1 (en) 1999-06-04 2001-10-23 Mitsubishi Electric Research Laboratories, Inc. Method for ordering image spaces to represent object shapes
US7809722B2 (en) 2005-05-09 2010-10-05 Like.Com System and method for enabling search and retrieval from image files based on recognized information

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070183629A1 (en) * 2006-02-09 2007-08-09 Porikli Fatih M Method for tracking objects in videos using covariance matrices
US20080199055A1 (en) * 2007-02-15 2008-08-21 Samsung Electronics Co., Ltd. Method and apparatus for extracting facial features from image containing face
US20100074530A1 (en) * 2008-09-25 2010-03-25 Canon Kabushiki Kaisha Image processing apparatus, image processing method and program
US20120099790A1 (en) * 2010-10-20 2012-04-26 Electronics And Telecommunications Research Institute Object detection device and system
US20120288167A1 (en) * 2011-05-13 2012-11-15 Microsoft Corporation Pose-robust recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision 60(2), 91-110, 2004, copyright 2004 Kluwer Academic Publishers. *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140301650A1 (en) * 2013-04-08 2014-10-09 Omron Corporation Image processing device, image processing method, and recording medium
US9239963B2 (en) * 2013-04-08 2016-01-19 Omron Corporation Image processing device and method for comparing feature quantities of an object in images
US20150146924A1 (en) * 2013-11-27 2015-05-28 Yuka Kihara Image analyzing device, image analyzing method, and recording medium storing image analyzing program
US9613272B2 (en) * 2013-11-27 2017-04-04 Ricoh Company, Ltd. Image analyzing device, image analyzing method, and recording medium storing image analyzing program
CN104134058A (en) * 2014-07-21 2014-11-05 成都万维图新信息技术有限公司 Face image processing method
US20160224864A1 (en) * 2015-01-29 2016-08-04 Electronics And Telecommunications Research Institute Object detecting method and apparatus based on frame image and motion vector
US11080316B1 (en) * 2017-05-26 2021-08-03 Amazon Technologies, Inc. Context-inclusive face clustering
CN113366486A (en) * 2018-12-21 2021-09-07 伟摩有限责任公司 Object classification using out-of-region context
US11783568B2 (en) 2018-12-21 2023-10-10 Waymo Llc Object classification using extra-regional context
CN111461247A (en) * 2020-04-09 2020-07-28 浙江国贸云商控股有限公司 Feature data processing method and related device

Also Published As

Publication number Publication date
US9165220B2 (en) 2015-10-20

Similar Documents

Publication Publication Date Title
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
US9165220B2 (en) Image object recognition based on a feature vector with context information
US10936911B2 (en) Logo detection
CN109960742B (en) Local information searching method and device
US20190318195A1 (en) Robust feature identification for image-based object recognition
US9396546B2 (en) Labeling objects in image scenes
US9349076B1 (en) Template-based target object detection in an image
CN106408037B (en) Image recognition method and device
CN108734185B (en) Image verification method and device
JP2014232533A (en) System and method for ocr output verification
US9934577B2 (en) Digital image edge detection
US8442327B2 (en) Application of classifiers to sub-sampled integral images for detecting faces in images
US20180352213A1 (en) Learning-based matching for active stereo systems
JP5936561B2 (en) Object classification based on appearance and context in images
CN103460705A (en) Real-time depth extraction using stereo correspondence
US9830530B2 (en) High speed searching method for large-scale image databases
KR102221152B1 (en) Apparatus for providing a display effect based on posture of object, method thereof and computer readable medium having computer program recorded therefor
CN103295026A (en) Spatial local clustering description vector based image classification method
KR20110087620A (en) Layout based page recognition method for printed medium
CN103679174A (en) Shape descriptor generating method and device
CN112651351B (en) Data processing method and device
JP4570995B2 (en) MATCHING METHOD, MATCHING DEVICE, AND PROGRAM
Sanin et al. K-tangent spaces on Riemannian manifolds for improved pedestrian detection
US9536144B2 (en) Automatic image classification
Carvajal et al. Comparative evaluation of action recognition methods via Riemannian manifolds, Fisher vectors and GMMs: Ideal and challenging conditions

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TANG, HAO;REEL/FRAME:029663/0142

Effective date: 20121217

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20231020