
WO2015198284A1 - Reality description system and method - Google Patents

Reality description system and method

Info

Publication number
WO2015198284A1
WO2015198284A1 (PCT application PCT/IB2015/054830)
Authority
WO
WIPO (PCT)
Prior art keywords
environment
unit
user
objects
data
Prior art date
Application number
PCT/IB2015/054830
Other languages
French (fr)
Inventor
Alessio Maria D'AMICO
Fabrizio Loppini
Original Assignee
D Amico Alessio Maria
Fabrizio Loppini
Priority date
Filing date
Publication date
Application filed by D'Amico Alessio Maria and Fabrizio Loppini
Publication of WO2015198284A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/20: Scenes; Scene-specific elements in augmented reality scenes

Definitions

  • In addition to the input channels described in the following, further data can be acquired from other sources, which can be defined as external sources 105.
  • Such sources are devices or networks existing in the environment and operating to carry out different tasks, such as the supervision of sites like traffic crossings, roads, buildings or similar, and which have Tx/Rx hardware and software allowing them to be connected for communicating with the present assistive system.
  • Alternative sources falling within the said definition can also be servers of public institutions, which make available environmental information, such as news on certain events, planimetry of buildings, or locations. Further sources can be providers specialized in generating and making available information to assistive devices.
  • Connection and communication with such devices can be established automatically, triggered by availability events such as, for example, the automatic detection of the presence of such a source, which prompts the assistive system to connect to the corresponding server.
  • Information can be downloaded automatically, or on choice and demand of the user, and is subject to the filters set as will be described in the following.
  • A further input channel consists of on-board sensors 106, such as for example accelerometers, gravimeters, altimeters, temperature sensors, GPS sensors, network sensors and sensors for detecting chemical substances, such as smoke detectors, and in general every kind of sensor available for determining the condition of the environment and of the user in the said environment.
  • Data collected by the input channels are sent to the processing unit 500, which processes the data and compares the data collected by the different channels and the related technologies in order to mutually confirm conditions and to generate the different outputs, as illustrated in the sketch below.
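Hedged sketch of this cross-channel confirmation: detections coming from different input channels are merged, and an object reported by more than one channel is treated as mutually confirmed. The Detection class, channel names, tolerance and ordering rule are illustrative assumptions introduced here, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "door", "chair"
    distance_m: float   # estimated distance from the user, in metres
    confidence: float   # 0..1, as reported by the originating channel
    channel: str        # e.g. "optical_103", "tag_104", "acoustic_101"

def fuse(detections, same_object_tolerance_m=0.5):
    """Group detections referring to the same label at a similar distance and
    rank first the objects confirmed by several channels."""
    fused = []
    for det in detections:
        for obj in fused:
            if (obj["label"] == det.label
                    and abs(obj["distance_m"] - det.distance_m) <= same_object_tolerance_m):
                obj["channels"].add(det.channel)          # mutual confirmation
                obj["confidence"] = max(obj["confidence"], det.confidence)
                break
        else:
            fused.append({"label": det.label, "distance_m": det.distance_m,
                          "confidence": det.confidence, "channels": {det.channel}})
    return sorted(fused, key=lambda o: (-len(o["channels"]), -o["confidence"]))

print(fuse([Detection("door", 3.1, 0.7, "optical_103"),
            Detection("door", 3.0, 0.9, "tag_104"),
            Detection("chair", 1.2, 0.6, "optical_103")])[0]["label"])   # -> door
```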
  • The output channels 300 may be of different kinds, for example visual 301, acoustic 302 and tactile 303.
  • The activation of the different output channels depends on the settings of the user.
  • The assistive system is not directed only to differently impaired persons, so different outputs can be used depending on the skills of the user.
  • Visual output 301 can for example be directed to provide specific information on an environmental condition by showing the condition visually.
  • Acoustic output 302 may be of any kind and particularly it can consist of vocal indications or descriptions using natural language. This could mean describing the environment and the presence of recognized objects, or simply providing a translation into another language of the information on a sign or the like.
  • Acoustic output can be provided to the user by different means, such as for example by transmitting acoustic energy through the bones.
  • Tactile information can also be generated by means of gloves or other tactile excitation devices which provide for tactile sensation and temperature.
  • One of the central features of the hybrid system according to the present invention is the provision of a filter unit 200 which allows the quantity of data to be collected, processed and transmitted to the user to be limited.
  • The first filtering criterion which can be used for determining which data are to be collected and/or analyzed and/or transmitted to the user is the user command 201.
  • A user command may indicate to the system which information the user wants to receive; for example, the user indicates specific objects he/she is looking for. In the case of a visually impaired person, for example, the person could give the system a command like "find door" and the system will ignore data not pertinent to the door. It is immediately clear that not all input channels will furnish information on this object, so that from the very beginning the number of channels, and thus the data quantity, can be reduced.
  • A further criterion, which can be provided in combination with or alternatively to the user command 201, is the indication of user tasks or targets 202, such as for example: "going out", "sitting down", "find telephone", and similar.
  • By analyzing the data for reconstructing the virtual model, the system is able to recognize relevant objects and leave out the non-relevant ones. Relevance can be determined by means of criteria relating to the physical effects and the physical features of the objects in relation to the command or task.
  • Automatic user-oriented criteria for filtering data, processing and output channels consist for example in the distance of the objects from the user, as indicated at 203.
  • Further criteria may be determined automatically by the system and triggered for example by the automatic detection of danger or risk situations.
  • In this case the system may immediately stop every other activity and inform the user of the danger and of the action to be taken.
  • In this situation the output channels will be driven as navigation systems which process data in order to maximize the probability of reaching a certain safe condition in the shortest time, also by dynamically considering moving objects and changing the suggested routes and actions. A sketch of how these criteria could be combined is given below.
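Illustrative sketch (not the patent's actual implementation) of how a filter unit such as unit 200 could combine a user command 201, a maximum distance criterion 203 and an automatic danger override. The function name, field names, the 5 m default and the override rule are assumptions.

```python
def filter_detections(detections, command=None, max_distance_m=5.0, danger=None):
    if danger is not None:
        # danger overrides every other criterion: report only the danger itself
        # and the items needed to reach a safe condition
        return [d for d in detections if d["label"] in (danger, "exit", "door")]
    kept = []
    for d in detections:
        if command and command not in d["label"]:
            continue                       # ignore data not pertinent to the command
        if d["distance_m"] > max_distance_m:
            continue                       # ignore objects too far from the user
        kept.append(d)
    return kept

objects = [{"label": "door", "distance_m": 3.0},
           {"label": "chair", "distance_m": 1.2},
           {"label": "table", "distance_m": 7.5}]
print(filter_detections(objects, command="door"))   # only the door is reported
```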
  • the following description is directed to a particular example of a visual input and a vocal output channel provided in the above described system.
  • Figure 1 shows a block diagram for illustrating the system for virtual environment description and reconstruction of the present invention.
  • the system comprises an acquisition system 1 for acquiring one or more images of the environment, at least one processing unit 2 for processing the data acquired by the acquisition unit and at least one output interface 3 which is designed to receive output data from the processing unit 2 for describing the environment to at least one user.
  • the acquisition unit 1 is preferably a stereo camera comprising at least two optical sensors 11 and 12.
  • A right optical sensor 11 and a left optical sensor 12 are provided. This is because, as explained below, reconstruction of the reality around the user is typically based on binocular vision and on algorithms for information extraction from images.
  • The acquisition step is carried out by a calibration process, consisting of aligning and synchronizing each image detected by the optical sensors 11 and 12.
  • A calibration process is carried out, which consists of aligning and synchronizing the right and left frames, hence obtaining two images of the same scene, taken at the same distance but laterally offset by the binocular distance, which may be used for calculating the depth and distance of a given object.
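For reference, the standard pinhole-stereo relation linking this lateral offset (disparity) to depth is sketched below; the patent only states that the offset may be used for calculating depth and distance, and does not prescribe this exact formula or these example values.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        return float("inf")   # no measurable offset: object unmatched or very far
    return focal_length_px * baseline_m / disparity_px

# e.g. a 700 px focal length, a 6 cm baseline and a 20 px disparity give 2.1 m
print(depth_from_disparity(20, 700, 0.06))
```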
  • the processing unit 2 comprises a filtering unit 21 for detecting the edges of the shapes acquired by the acquisition unit 1.
  • the step of processing comprises filtering of the acquired data, for detecting the edges of the acquired shapes.
  • The filtering unit 21 carries out the step of detecting edges, which comprises the substeps of:
  • b1) blurring the acquired image: this filters out detail information not associated with any actor or object of interest and reduces the computational noise that might arise from the later application of the algorithms,
  • b2) removing color from the image, maximizing contrast and then binarizing the image: step b2 preferably applies the filter known as Sobel for color removal, contrast maximization and binarization; a binarized image has no gray scale, showing significant white lines and dots on a black background,
  • b3) extracting image edges: all the edges that compose the binarized image are extracted and introduced into a vector. In more detail, each edge in such vector corresponds to another vector of all the Cartesian coordinates of the dots that compose the edge.
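A minimal OpenCV-based sketch of substeps b1) to b3) follows. The kernel sizes, the threshold value and the input file name are assumptions: the text names the operations (blurring, Sobel filtering, binarization, edge extraction) but not their settings.

```python
import cv2

def detect_edges(image_bgr):
    # b1) blur, to suppress detail not associated with any object of interest
    blurred = cv2.GaussianBlur(image_bgr, (5, 5), 0)
    # b2) remove colour, maximise contrast with a Sobel filter, then binarise
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = cv2.convertScaleAbs(cv2.magnitude(sobel_x, sobel_y))
    _, binary = cv2.threshold(magnitude, 60, 255, cv2.THRESH_BINARY)
    # b3) extract every edge as a vector of the Cartesian coordinates of its dots
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    return [c.reshape(-1, 2) for c in contours]   # one (x, y) array per edge

edges = detect_edges(cv2.imread("scene_left.png"))   # hypothetical input frame
print(len(edges), "edges found")
```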
  • Figures 2a to 2c show the above described method steps .
  • the filtering unit 21 receives at its input the images acquired by the optical sensors 11 and 12 and first processes them into a two- dimensional reconstruction, according to the above described method steps.
  • This two-dimensional reconstruction of the scene may be used to determine the number of actors and objects therein.
  • the three-dimensional reconstruction of the scene may be used to extrapolate the depth position of such actors and objects.
  • The filtering unit 21 applies the following method steps: c1) rectifying right and left images: each frame acquired by the right and left optical sensors 11 and 12 undergoes alignment and spatial synchronization,
  • step c3) interpolating the image obtained in step c2) with the original image to obtain a three-dimensional image .
  • a further binarization step may be carried out to reduce the amount of output data from the filtering unit 21.
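Hedged sketch of this three-dimensional branch of the filtering unit 21. Step c2) is not spelled out in this excerpt, so a conventional semi-global block-matching disparity computation is assumed here, and the frames are taken as already rectified by the calibration process (step c1).

```python
import cv2

def disparity_map(left_gray, right_gray):
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
    disparity = matcher.compute(left_gray, right_gray).astype("float32") / 16.0
    return disparity   # larger disparity = closer to the acquisition unit 1

left = cv2.imread("scene_left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical frames
right = cv2.imread("scene_right.png", cv2.IMREAD_GRAYSCALE)
print(disparity_map(left, right).shape)
```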
  • The filtering unit 21 is functionally connected to an object reconstruction unit 22, such object reconstruction unit having an input for receiving output data from the filtering unit 21.
  • The association unit 23 carries out an association step, for detecting shapes that meet predetermined requirements in the environment.
  • the association unit 23 detects, in the environment, special types of objects of relevance for processing purposes.
  • the association unit 23 has a module 231 for recognizing one or more ID devices: in this case, ID devices are designed to be placed on one or more objects in the environment.
  • the module 231 recognizes the different ID devices, interrogates them and obtains the information it needs therefrom.
  • The association unit, in combination with the module 231, includes a storage unit 233 which is adapted to store shapes, particularly shapes previously acquired or preloaded into the storage unit 233.
  • By means of the storage unit 233, the association unit can detect, within the scene, special objects previously defined and classified therein.
  • association unit 23 comprises a face detection module 232 for detecting faces in the environment .
  • the module 232 can distinguish people from objects in the same scene and is a necessary instrument for reconstruction of a plan free of any non-pertinent information .
  • the object reconstruction unit 22 has an input for receiving output data from the filtering unit 21 and the association unit 23.
  • the reconstruction unit 22 is designed to position the acquired images in the virtual reconstruction model .
  • this unit 22 extracts objects for which no specific information is available from the scene .
  • an additional verification step is provided in the reconstruction unit 22, which comprises checking that each object belongs to one or more of the depth levels.
  • The level that is closer to the acquisition unit 1 is set to be higher than the level that is farther from the acquisition unit 1: for each object detected in each of the three grids, a check is made to determine whether it occupies or extends over multiple adjacent levels. If it does, it is associated to a higher-level object. In other words, an object is detected and marked in the higher-level grid in which it "starts" and is signaled in the lower levels in which it "expands".
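As a simple illustration of this verification, the sketch below attributes each object to the highest (closest) depth level it occupies and merely signals it in the lower levels; the data structure is an assumption made for the example.

```python
def assign_levels(objects):
    """objects: dict mapping an object name to the set of depth levels
    (0 = closest to the acquisition unit) that it occupies."""
    marked, signalled = {}, {}
    for name, levels in objects.items():
        start = min(levels)                              # level where the object "starts"
        marked[name] = start
        signalled[name] = sorted(l for l in levels if l != start)   # where it "expands"
    return marked, signalled

marked, signalled = assign_levels({"table": {0, 1}, "wall": {2}})
print(marked)      # {'table': 0, 'wall': 2}
print(signalled)   # {'table': [1], 'wall': []}
```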
  • the output results for the reconstruction unit 22 are sent to a reconstruction module 24, which is designed to reconstruct the scene directly before the user .
  • Figure 3 shows a possible reconstruction provided by this module 24.
  • the scene depth has been set over three aggregation levels.
  • Level 0 is closest to the acquisition unit 1 and level 2 is farthest therefrom.
  • a level is deemed to be higher than another if it is closer to the acquisition unit 1. For example: level 0 is deemed to be higher than level 1 and level 1 is deemed to be higher than level 2.
  • Each object is identified by the indication O(level, number).
  • The objects marked with an R are the objects that "start" in the current level.
  • The objects marked with a V are the objects that "start" at a higher level and continue in the current one; in this case the indication O(level, number) contains both the level number and the number of the object of the higher level.
  • the detected tags are indicated with a B.
  • the distance from the acquisition unit 1 is indicated, in cm.
  • the reconstruction module 24 is functionally connected to the output interface 3 which receives the data of the scene directly before the user and generates signals to be transmitted to the user for describing the scene before him/her.
  • the output interface 3 comprises a voice message generating unit. Therefore, the output interface 3 communicates with the user by describing the surrounding environment and may transmit messages either automatically or upon request by the user.
  • these messages may be automatically generated by the output interface, e.g. if the user comes too close to one or more objects.
  • Sensors, e.g. proximity sensors, may be provided for generating acoustic or mechanical alarms warning of excessive proximity.
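Hypothetical sketch of the voice/alarm side of the output interface 3: a short phrase per reconstructed object, plus an automatic warning when the user gets too close. The phrasing and the 0.8 m threshold are illustrative assumptions, not values from the patent.

```python
def describe(objects, proximity_threshold_m=0.8):
    messages = []
    for obj in objects:
        if obj["distance_m"] < proximity_threshold_m:
            messages.append(f"Warning: {obj['label']} very close")       # automatic alarm
        else:
            messages.append(f"{obj['label']} at {obj['distance_m']:.1f} metres, "
                            f"{obj['direction']}")                       # short description
    return messages

print(describe([{"label": "chair", "distance_m": 0.6, "direction": "ahead"},
                {"label": "door", "distance_m": 3.2, "direction": "to the right"}]))
```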
  • the processing unit 2 comprises a user locating unit and an object locating unit for locating the objects reconstructed by the reconstruction unit, these locating units being generally referenced 25.
  • The unit 25 identifies the location of the user and the position of each of the previously detected objects in the environment, and estimates the distance of each object from the user.
  • the user locating unit tracks the individual, i.e. his/her movements, in the room, whereas the object locating unit "reconstructs" the whole scene, i.e. places each of the previously found objects in the room and estimates its distance from the user.
  • The object locating process is carried out through the steps b101) to b105) listed further below.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A reality description system, comprising: at least one acquisition unit (1), for acquiring one or more images of an environment, at least one processing unit (2) for processing the data acquired by said acquisition unit (1) and at least one output interface (3), which is designed to receive output data from the processing unit (2) for describing the environment to at least one user. The processing unit (2) comprises a filtering unit (21) for detecting the edges of the shapes acquired by the acquisition unit (1). The present invention also relates to a method of virtual environment reconstruction.

Description

REALITY DESCRIPTION SYSTEM AND METHOD
The present invention relates to a reality description system.
Particularly, the present invention deals with a computer vision based system for assisting people and especially for assisting visually impaired persons.
The system comprises at least one environment scanning unit, at least one processing unit for processing the data collected by the environment scanning unit and generating at least one acoustic description of the environment based on the processed data collected by the scanning unit.
Currently, many different kinds of electronic assistive devices have been developed and are known, using different techniques for recognizing objects in the environment.
Among the different technologies used, an emerging one is computer vision. Many different kinds of computer vision techniques are also known.
Document "computer vision-based object recognition for the visually impaired in an indoors environment: a survey", Rabia Jafri et al , Vis Computer (2014) pages 1197-1222, Springer Verlag, Published on line 15.10.2013 is a survey on the different techniques used in the said electronic assistive devices.
Typical visual approaches consist of systems based on computer vision which are provided with at least one acquisition system for acquiring one or more images of the environment, at least one processing unit for processing the data acquired by the acquisition unit and at least one output interface which is designed to receive output data from the processing unit for describing the environment to at least one user.
Preferably but without limitation the invention relates to virtual reality description and reconstruction systems as aids for blind people.
Briefly said, these systems are designed to capture video or images of the reality or environment surrounding a user, especially but not exclusively a blind user, and to return feedback that allows the said user to more easily move in the environment around him/her or to identify searched items which may be present in that environment.
Such systems are developed particularly but not exclusively as a support for blind users in carrying out everyday activities.
A blind user should be as autonomous as possible as he/she moves in unfamiliar places, or be able, for example, to sit and act autonomously at a set up table, and behave like an unimpaired user.
It will be understood from the above that these systems are required to ensure accurate detection of objects and actors in the environment and provide responses and feedbacks to the blind user as quickly as possible, in real-time mode. These two aspects are contradictory, as accurate detection involves computational complexity in the components of the systems, which will slow down the virtual reality processing and reconstruction process.
In particular, high accuracy in recognizing objects in an environment, particularly if it is an outdoor environment, would result in a huge number of objects to be recognized and vocally described, together with the positions of the said objects in space, particularly in a 3D space.
This huge number of items and their positions in space, and, for moving objects, the variation of their position in time, in addition to requiring high computational power, also generate a high level of noise in the representation of the environment, which would contain every kind of identified item regardless of whether these items are useful or not.
Furthermore each technique used for scanning the environment and identifying and recognizing objects and their position in space has advantages and drawbacks since each technique will represent only a part of the real environment which would be known by the user through natural senses and particularly its sight.
In a first development step, the present invention provides for a system according to the above description which has: a multichannel scanning unit able to scan or acquire data about the environment and the objects present therein, as well as data about their position in the said environment; a multichannel processing unit comprising also a data fusion unit for comparing or correlating the data about the environment and the objects present therein obtained by the different scanning techniques, and for generating a virtual representation of the environment, of the objects present therein and of their positions by combining the data obtained by each of the different scanning techniques; and a multichannel output unit, among whose channels a voice unit is present, each channel generating a differentiated output, which outputs can be combined.
The above features allow the system to generate a more precise virtual representation of the environment and to generate from it a more precise descriptive output to the user, particularly by means of a voice output.
Each different acquisition or scanning technique, as also apparent from the above cited publication by Rabia Jafri et al., provides information about different aspects of the environment. Combining this information provides a more precise and useful virtual reconstruction of the environment and a better choice of the elements which have to be described using different senses, such as voice or similar, either by using only one kind of output or by combining the said outputs.
It is particularly relevant to notice that the present invention is not a navigation system indicating a route or how to proceed by moving in an environment; its task is to give useful information for helping impaired persons, or also fully unimpaired people, to carry out actions in the said environment.
Although the present system is not a navigation system, it may provide as an output channel a visual and/or acoustic output which is the typical output of a navigation system.
According to a further improvement, which can be implemented thanks to the multichannel scanning techniques and to the multichannel output of different techniques describing the virtual environment generated by the data obtained from the different scanning and acquisition units, a filtering unit is provided which filters away part of the information to be output to the user in accordance with preset criteria and/or with dynamically variable criteria, in an automatic way or by input of commands.
The filtering action can be carried out by ignoring certain output channels or by ignoring or not transmitting the information about certain items or objects which are present in the environment as a function of their position relatively to other objects and/or relatively to the instant position of the user.
In the latter case a tracking unit of the position of the user may be provided as a further input channel.
The filtering criteria can be used to simply ignore objects or items which are not within a certain distance from the user or which are not along the path of the user in direction of a certain target, such as a door or the like.
In this case the criteria for considering and transmitting the information to the user or not could be chosen to be a function of the distance of the objects from an expected path of the user from a certain starting point at which a first scan of the environment has been carried out to a certain target.
Acquisitions and scans of the environment according to one or more or all of the different techniques, or a selection thereof, can be carried out during the displacement of the user along the said path, and the changing topography of the environment, considering also the objects present therein, can be reported by the one or more output channels and the related output techniques.
The set of criteria or the functions may depend on the kind of environment and can be preset for a certain number of different predefined environments.
It is also possible that voice commands representing the actions which the user will carry out, are used as the triggers for changing the filtering or the channel activation criteria, by changing the kind of information to be transmitted to the user in different situations, such as sitting down, working at the desk, moving around, cooking something etc. etc.
The basis of the present invention is the fact that, particularly in the case of visually impaired persons, the vocal information has to be understood and memorized by the user; too much information would not efficiently help the user, but rather hinder his/her decisions on how to carry out the desired tasks. Furthermore, rapid decisions are needed, particularly when moving in an outdoor environment.
In an embodiment the present invention provides for two input channels, one for visual information about non-tagged objects and the other for visual information about tagged objects. Combining these two techniques and the corresponding criteria, tagged objects can be more easily recognized while the real shape is scanned and recognized by the tag-free technique. Furthermore, one input channel consists of information which can be collected actively or is sent to the processing unit, for example via a Wi-Fi connection to servers of official sites of institutions, of companies or of service providers. Information can relate to the condition of the traffic, to maps, planimetry, and/or preprocessed visual or acoustic information of outdoor environments, such as visual information acquired by cameras at traffic lights or in official buildings.
Active collection of the information means that the user connects the assistive device by command to an external server by means of the available connection, which may be detected automatically by the assistive device.
Passive collection or automatic collection means that the assistive device automatically scans for available connection to servers, survey systems and other networks or devices which are able to provide information in the environment.
As disclosed above, different input channels are used which are disclosed for example in the publication "Computer vision-based object recognition for the visually impaired in an indoors environment: a survey", already cited above. In the following one example of input channel is disclosed in which the processing unit comprises a unit for detecting the edges of the shapes acquired by the acquisition unit.
Edge detection can filter out information that might slow down processing, without losing the information required for proper environment reconstruction. Simple edge detection allows quick identification of the number of objects and actors in the environment.
Such information may be obviously integrated with other data, for more complete virtual environment reconstruction.
For example, according to a first embodiment, the processing unit comprises an association unit for detecting shapes that meet predetermined requirements in the environment.
This avoids the need for a slow analysis for recognition of such shapes, as the system automatically recognizes previously acquired shapes.
For example, such association unit may comprise a module for recognition of an ID device, such as a tag or the like.
In this case, the system of the present invention will comprise one or more ID devices, which will be placed on the object in the environment and will provide information about the associated object.
These ID devices may be interrogated, according to one or more prior art methods, for receiving information that will supplement edge detection data for virtual environment reconstruction.
The above described configuration shows that the system of the invention shall not be necessarily used by blind users.
Indeed, ID devices may also contain useful information for unimpaired individuals, i.e. provide a somewhat augmented reality.
In a possible embodiment, the association unit may comprise a storage unit for storing shapes. A shape library is thus created, for the system to quickly recognize and detect known and acquired objects and actors in the environment.
Advantageously, the association unit comprises a face detection module for detecting faces in the environment .
In a preferred embodiment, the processing unit comprises an object reconstruction unit, said object reconstruction unit having an input for receiving output data from the association unit and from the acquisition unit.
Thus, the reconstruction unit integrates information from the filtering unit and the association unit to reconstruct the acquired objects.
Advantageously, all the information available to the system is further integrated with user position data.
As clearly shown by the method of the present invention, the system provides real-time updating of virtual environment reconstruction, as it is required to inform the user about proper location and dimensions of the objects in the room based on user position changes .
Therefore, in a variant embodiment of the inventive system, the processing unit comprises a user locating unit and an object locating unit for locating the objects reconstructed by the reconstruction unit.
Preferably, the processing unit comprises a reconstruction module, which is designed to reconstruct the scene directly before the user.
In a preferred embodiment, the acquisition unit consists of a stereo camera, comprising at least two optical sensors. This aspect is particularly advantageous, as explained below in the description of the method of the present invention, for determining the depth of the acquired objects.
As mentioned above, the output interface is adapted to provide an environment description feedback to the user .
This output interface may be provided in any form, but preferably comprises a voice message generation unit.
Voice messages will be adapted to describe the environment and will preferably consist of short and readily understandable phrases, for the user to receive as quickly as possible information allowing him/her to nimbly move within the environment.
As a supplement to these voice messages, the system of the present invention may provide other signals, such as alarms or the like, e.g. to inform a user that he/she is getting close to an object or a person.
As a result, the system preferably comprises sensors for detecting the distance of the user from objects in the environment.
These sensors may be proximity sensors and may return a sound or vibration signal to the user to warn him/her about proximity of a given object.
It will be appreciated from the above that at least some of the above described components of the system of the present invention may be integrated in a portable device, such as a smart phone, a tablet or the like. The portability advantages of the invention are self-evident and benefit both blind and unimpaired users .
The present invention also relates to a method for carrying out electronic assistance by means of the above disclosed device, the said method comprising the steps of :
a) acquiring one or more images of the environment according to different techniques for scanning and acquiring information of the environment,
b) processing the acquired data by combining the information data obtained by one or more or all of the said scanning and acquisition techniques,
c) reconstructing a virtual model of the environment using the said data of the said one or more or all scanning and acquisition techniques;
d) generating an output of the said data which makes use of all or part of the senses of the user and particularly generating acoustic and/or vocal output to transfer at least a part of the information contained in the said virtual model of the environment.
According to the method of the present invention, the step of processing comprises filtering out, suppressing, not transmitting or not processing information considered non-pertinent, according to predefined criteria for setting the said filter.
The filtering criteria could be different and vary with the kind of environment and with the action which the user decides to carry out in the environment, and can be preset settings or dynamically variable settings according to the condition of the environment and of the user in the said environment. A primary criterion for filtering is a command of the user relating to receiving certain information, or relating to the indication of a certain action the user wants to carry out or a target which the user wants to achieve.
Filtering out of information or channels can be carried out automatically by recognizing the objects and considering which relevance they have for the task or target chosen by the user or for the command given by the user, which relevance is calculated by considering the morphological information about the recognized objects and/or the physical effects on movements and/or the topographical conditions of the environment.
One of the further criteria is the distance of the object or objects from the current or from the estimated position of the user in the environment.
Further criteria can be obtained or received from external information providers, such as information on events changing the environment condition, information relating to the existence of a condition of danger or risk .
Further criteria consist in the automatic detection of risk or danger and in the signaling of the risk and danger condition and the activation of a navigation output for helping the user to reach a safety condition in the shortest and easiest way, by considering automatically the distance from the position consisting in the safety condition and the number of obstacles in the path to the said position.
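By way of example only, such a criterion could rank candidate safe positions by combining their distance from the user with the number of obstacles along the path, as sketched below; the weighting factor is an assumption.

```python
def choose_safe_target(candidates, obstacle_penalty_m=2.0):
    """candidates: list of dicts with 'name', 'distance_m' and 'obstacles'."""
    def cost(c):
        return c["distance_m"] + obstacle_penalty_m * c["obstacles"]
    return min(candidates, key=cost)

exits = [{"name": "main door", "distance_m": 12.0, "obstacles": 3},
         {"name": "emergency exit", "distance_m": 15.0, "obstacles": 0}]
print(choose_safe_target(exits)["name"])   # -> emergency exit
```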
In one embodiment of the method of the present invention, relating to the visual channel, a step of detecting edges of the objects in the environment on a visual frame further comprises the sub-steps of:
b1) blurring the acquired image,
b2) removing color from the image, maximizing contrast and then binarizing the image,
b3) extracting image edges.
As mentioned above, a reduced data burden is obtained, but the information covered by the scope of the system and method of the present invention is not lost.
According to an improvement, an association unit may be provided, which has various modules, and is designed to integrate the information acquired by the acquisition unit.
Therefore, similar to what has been described above, according to an embodiment of the method of the present invention the processing step comprises an association step for detecting shapes that meet predetermined requirements in the environment.
For example, the association step may comprise a step of recognizing at least one ID device, at least one ID device being placed on at least one object in the environment, and a module for ID device recognition being provided in the association unit.
Instead of or in addition to the above, the association step may comprise a step of storing acquired shapes in a storage unit.
Finally, the association step may comprise a step of detecting faces in the environment.
According to a further example, a step of detecting faces obviously affords identification of persons and animals in the environment. Based on the above, persons and animals may be also detected by the filtering unit and the object reconstruction unit.
Irrespective of the method used for identification, the system and method of the invention can apparently detect moving persons, animals and things, i.e. dynamic objects, and reconstruct them step-by-step as they move, to provide a real-time feedback to the user about the environment around him/her.
Advantageously, the step of processing comprises a step of reconstructing objects, which is designed to position the acquired images in the virtual reconstruction module, said step of reconstructing objects comprising the steps of:
b4) dividing the environment into at least two different depth levels,
b5) grouping each edge detected in the environment into as many classes as there are depth levels, according to the distance of each edge from the acquisition unit,
b6) setting up a grid for each depth level and introducing the detected edges in the corresponding grid,
b7) aggregating the edges for each grid and forming objects in the environment,
b8) calculating the distance of each object from the acquisition unit and thus the distance from the user.
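An illustrative sketch of steps b4) to b8) follows. The level boundaries and the overlap-based aggregation rule are assumptions made for the example, since the text does not fix them.

```python
LEVEL_BOUNDS_M = [1.5, 3.0, 6.0]                 # b4) three depth levels

def level_of(distance_m):
    return next((lvl for lvl, b in enumerate(LEVEL_BOUNDS_M) if distance_m <= b),
                len(LEVEL_BOUNDS_M) - 1)

def aggregate(edges):
    """edges: list of dicts {'x_min', 'x_max', 'distance_m'} from edge detection."""
    grids = {lvl: [] for lvl in range(len(LEVEL_BOUNDS_M))}
    for e in edges:
        grids[level_of(e["distance_m"])].append(e)           # b5)/b6) group per level
    objects = []
    for lvl, content in grids.items():
        for e in sorted(content, key=lambda e: e["x_min"]):   # b7) merge overlapping edges
            if objects and objects[-1]["level"] == lvl and e["x_min"] <= objects[-1]["x_max"]:
                objects[-1]["x_max"] = max(objects[-1]["x_max"], e["x_max"])
                objects[-1]["distance_m"] = min(objects[-1]["distance_m"], e["distance_m"])
            else:
                objects.append({"level": lvl, "x_min": e["x_min"],
                                "x_max": e["x_max"], "distance_m": e["distance_m"]})
    return objects                                            # b8) distance per object

print(aggregate([{"x_min": 10, "x_max": 60, "distance_m": 1.2},
                 {"x_min": 50, "x_max": 90, "distance_m": 1.4},
                 {"x_min": 200, "x_max": 240, "distance_m": 5.0}]))
```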
In one embodiment of the above described method steps, an additional verification step is provided, which comprises checking that each object belongs to one or more of the depth levels. As described above concerning the system of the present invention, the step of processing may comprise a step of locating the user and a step of placing each previously found object in the environment, and estimating its distance from the user.
Advantageously, the step of placing each previously found object comprises the substeps of:
b101) determining reference points to be associated with the detected objects,
b102) estimating the distance of the operator from the reference points,
b103) updating the operator position,
b104) updating the list of reference points,
b105) creating the virtual model of the environment.
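The sketch below illustrates substeps b102) and b103) under the assumption that the operator position is re-estimated by a plain least-squares multilateration from the distances to the reference points; the patent does not name a specific localization method, and the refresh of the reference-point list (b104) and the creation of the virtual model (b105) are not shown.

```python
import numpy as np

def update_operator_position(ref_points, distances):
    """ref_points: (N, 2) array of known reference coordinates, N >= 3.
    distances: length-N array of estimated distances from the operator."""
    p, d = np.asarray(ref_points, float), np.asarray(distances, float)
    A = 2.0 * (p[1:] - p[0])                                  # linearised system
    b = (np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
         - d[1:] ** 2 + d[0] ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)          # b103) updated position
    return position

refs = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]                    # b101) reference points
print(update_operator_position(refs, [2.5, 2.5, 2.5]))         # -> approx. [2.0, 1.5]
```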
Once the virtual model of the environment has been created, in order to provide a proper description thereof to the user, particularly to a blind user, the method of the present invention provides the possibility of generating voice messages.
Finally, the acquisition unit advantageously comprises a stereo camera with at least two optical sensors, the acquisition step being carried out by a calibration process, consisting in aligning and synchronizing each image detected by the optical sensors .
This will provide information about the depth of each object and its position in the environment.
In addition to the above advantages, the system and method of the present invention are apparently useful not only for providing to a blind user a description of the reality around him/her, but also for suggesting to the user a possible way of movement, e.g. a possible way of escape in case of danger.
This aspect provides advantageous characteristics also for unimpaired users in poor visibility conditions, such as during a fire.
Furthermore, this aspect is also advantageous, for instance, when an unimpaired user has difficulties in moving within an environment due to poor knowledge of the environment and its conditions.
A typical case may be a tourist who does not speak the language of the place in which he/she is and cannot recognize any of the signs around him/her.
These and other features and advantages of the present invention will appear more clearly from the following description of a few embodiments, illustrated in the annexed drawings, in which:
Fig. 1 is a block diagram of a possible embodiment of the system of the present invention;
Figs. 2a to 2c show an image processed by one of the units in the processing unit of the system of the present invention.
Fig. 3 is a possible output screen of the system of the present invention;
Figure 4 is a block diagram disclosing the structure of the system according to the present invention.
It shall be noted that while the figures show an embodiment of the system and method of the present invention, this embodiment shall not be intended to limit the inventive principle of the present patent application.
This embodiment has illustrative purposes, and is used to better explain the advantageous features and aspects of the system and method of the present invention .
Referring to Figure 4, an electronic assistive system comprises one or more input channels 100, each one corresponding to a certain technique for scanning or acquiring information about a certain environment, and a processing unit 500, which processes the inputs from the various channels and transforms the data into information to be transmitted to the user by means of one or more different output channels corresponding to different techniques addressing different human senses.
Processing may occur according to different techniques, such as different image processing techniques, different acoustic image/object reconstruction techniques, and others. Output channels 300 may be principally visual, acoustic, tactile or other, depending on the skills of the user, and can be activated by choice or by presets, alternatively or in parallel.
The processing unit 500 cooperates with a filter unit 200 which sets the criteria for determining which data has to be collected and/or processed and/or transmitted to the user.
A preset unit 400 cooperating with the processing unit allows custom and personalized presets to be saved.
The assistive system and the assistive device according to the present invention use well-known hardware and firmware in order to carry out the functions. The novelty lies in the fact that it is a hybrid assistive system or device which provides selective data fusion of several input channels using different techniques, for optimizing task- or target-oriented outputs, reducing the computational burden and enhancing the effectiveness of the information transmitted to the user.
The different input channels can be categorized as: acoustic inputs 101, such as microphone arrays for collecting acoustic signals and for recognizing and localizing acoustic sources by means of evaluation software; tactile inputs 102, such as gloves; optical inputs 103, such as cameras or other optical sensors able to generate images which can then be subjected to image processing by means of different image processing software capable of extracting features, such as edge detection, segmentation, object recognition and similar; and tags 104, which are placed on objects and are provided with a memory in which data on the object are saved and with communication hardware and software for connecting to the assistive system.
Further data can be acquired from other sources, which can be defined as external sources 105. These sources are devices or networks existing in the environment and operating to carry out different tasks, such as the supervision of sites, e.g. traffic crossings, roads, buildings or similar, and which have Tx/Rx hardware and software allowing them to be connected for communicating with the present assistive system.
Alternative sources falling within the said definition can also be servers of public institutions, which make available environmental information, such as news on certain events, planimetry of buildings, or locations. Further sources can be providers specialized in generating information and making it available to assistive devices.
Connection and communication with such devices can be established automatically, being triggered by availability events, such as for example the automatic detection of the presence of such sources, which triggers the assistive system to connect to the server. Information can be downloaded automatically or at the choice and demand of the user, and is subject to the filters set as will be described in the following.
A further input channel consists in on-board sensors 106, such as for example accelerometers, gravimeters, altimeters, temperature sensors, GPS sensors, network sensors, sensors for detecting chemical substances such as smoke detectors, and every kind of sensor available for determining the condition of the environment and of the user in the said environment.
Data collected by the input channels are sent to the processing unit 500, which provides for processing the data and comparing the data collected by the different channels and the related technologies, in order to mutually confirm conditions and to generate the different outputs.
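By way of purely illustrative example, a minimal sketch of such selective data fusion is given below in Python; the channel names, confidence values and the confirmation rule are assumptions introduced only for illustration and are not part of the disclosed system.

```python
# Minimal data-fusion sketch (illustrative only): each input channel reports
# candidate detections as (label, estimated distance in m, confidence 0..1).
# Detections reported by several channels for the same label are merged and
# their confidence reinforced; single-channel, low-confidence ones are dropped.
from collections import defaultdict

def fuse_channels(channel_reports, min_confidence=0.5):
    merged = defaultdict(list)
    for channel_name, detections in channel_reports.items():
        for label, distance, confidence in detections:
            merged[label].append((channel_name, distance, confidence))
    fused = []
    for label, hits in merged.items():
        best_conf = max(c for _, _, c in hits)
        # Mutual confirmation: boost confidence when more than one channel agrees.
        confirmed = len({ch for ch, _, _ in hits}) > 1
        confidence = min(1.0, best_conf + (0.2 if confirmed else 0.0))
        if confidence >= min_confidence:
            avg_distance = sum(d for _, d, _ in hits) / len(hits)
            fused.append({"label": label,
                          "distance_m": round(avg_distance, 2),
                          "confidence": round(confidence, 2)})
    return fused

reports = {
    "camera":     [("door", 3.2, 0.7), ("chair", 1.1, 0.6)],
    "rfid_tag":   [("door", 3.0, 0.9)],
    "microphone": [("person", 2.5, 0.4)],
}
print(fuse_channels(reports))
```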
The output channels 300 may be of different kinds, for example visual 301, acoustic 302 and tactile 303.
The activation of the different output channels depends on the settings of the user. In principle, the assistive system is not directed only to differently impaired persons, so that different outputs can be used depending on the skills of the user. The visual output 301 can, for example, be directed to provide specific information on an environmental condition by showing the condition visually.
The acoustic output 302 may be of any kind and particularly it can consist in vocal indications or descriptions using natural language. This could mean describing the environment and the presence of recognized objects, or simply providing a translation into another language of the information on a sign or the like.
The acoustic output can be provided to the user by different means, such as for example by transmitting acoustic energy through the bones (bone conduction).
Tactile information can also be generated by means of gloves or other tactile excitation devices which provide for tactile sensation and temperature.
The output channels are not limited to the ones disclosed above.
One of the central features of the hybrid system according to the present invention is the provision of a filter unit 200 which allows the quantity of data to be collected, processed and transmitted to the user to be limited.
The first filtering criterion which can be used for determining which data are to be collected and/or analyzed and/or transmitted to the user is a user command 201.
A user command may indicate to the system which information the user wants to receive; for example, the user indicates specific objects he/she is looking for. In the case of a visually impaired person, for example, the person could give the system a command like "find door", and the system will ignore data not pertinent to the door. It is immediately clear that not all input channels will furnish information on this object, so that from the very beginning the number of channels, and thus the data quantity, can be reduced.
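A minimal sketch of such command-driven filtering is given below; the mapping of targets to channels and all identifiers are hypothetical and serve only to illustrate how whole channels and non-pertinent data may be ignored.

```python
# Illustrative filter sketch (names and the channel/target map are assumptions):
# a user command such as "find door" restricts both the input channels that are
# consulted and the detections passed on to the output channels.
CHANNELS_FOR_TARGET = {
    "door":   {"camera", "rfid_tag"},
    "person": {"camera", "microphone"},
}

def apply_user_command(command, channel_reports):
    target = command.removeprefix("find").strip()      # "find door" -> "door"
    allowed = CHANNELS_FOR_TARGET.get(target, set(channel_reports))
    filtered = {}
    for channel, detections in channel_reports.items():
        if channel not in allowed:
            continue                                    # whole channel ignored
        kept = [d for d in detections if d[0] == target]
        if kept:
            filtered[channel] = kept                    # only pertinent data kept
    return filtered

reports = {
    "camera":     [("door", 3.2, 0.7), ("chair", 1.1, 0.6)],
    "rfid_tag":   [("door", 3.0, 0.9)],
    "microphone": [("person", 2.5, 0.4)],
}
print(apply_user_command("find door", reports))
```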
Similarly other commands can be set or provided as preset commands which will lead to custom settings 401.
A further criterion which can be provided in combination with or as an alternative to the user command 201 is the indication of a user task or target 202, such as for example: "going out", "sitting down", "find telephone" and similar. In this case the system, by analyzing the data for reconstructing the virtual model, is able to recognize the relevant objects and leave out the non-relevant ones. Relevance can be determined by means of criteria relating to the physical effects and the physical features of the objects in relation to the command or task.
Automatic user-oriented criteria for filtering data, processing and output channels consist, for example, in the distance of the objects from the user, as indicated at 203.
Obviously, other criteria may be added or changed depending also on the capabilities of the system in terms of artificial intelligence skills.
Further criteria may be determined automatically by the system and triggered, for example, by the automatic detection of danger or risk situations.
Considering for example the event of a fire, the system may immediately stop every activity and inform the user of the danger and of the action to be taken. The output channels will then be driven as a navigation system which processes data in order to maximize the probability of reaching a certain safe condition in the shortest time, also by dynamically considering moving objects and changing the suggested routes and actions.
Many other criteria may be considered, which in any case fall within the above definitions and which are disclosed therewith.
The following description is directed to a particular example of a visual input and a vocal output channel provided in the above described system.
Figure 1 shows a block diagram for illustrating the system for virtual environment description and reconstruction of the present invention.
The system comprises an acquisition system 1 for acquiring one or more images of the environment, at least one processing unit 2 for processing the data acquired by the acquisition unit and at least one output interface 3 which is designed to receive output data from the processing unit 2 for describing the environment to at least one user.
Still referring to Figure 1, the method of the present invention will be also described, i.e. a method for virtual environment reconstruction, which comprises the steps of :
a) acquiring one or more images of the environment by the acquisition unit 1,
b) processing the acquired data by the processing unit 2,
c) reconstructing a virtual model of the environment .
The acquisition unit 1 is preferably a stereo camera comprising at least two optical sensors 11 and 12.
Particularly, a right optical sensor 11 and a left optical sensor 12 are provided. This is because, as clearly explained below, reconstruction of the reality around the user is typically based on bifocal vision and on algorithms for information extraction from images.
Therefore, the acquisition step is carried out by a calibration process, consisting in aligning and synchronizing each image detected by the optical sensors 11 and 12.
Once frames have been acquired by both sensors, a calibration process is carried out, which consists in aligning and synchronizing the right and left frames, and hence obtaining two images of the same scene, taken at the same distance but laterally offset by the binocular distance, which may be used for calculating the depth and distance of a given object.
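A minimal sketch of how depth may be derived from such a laterally offset pair is given below, using the OpenCV library as one possible off-the-shelf implementation; the block-matching approach, the file names, the focal length and the binocular distance are placeholders and assumptions, not the specific implementation of the embodiment.

```python
# Minimal stereo-depth sketch: disparity between the aligned left/right frames
# is converted to a metric depth via Z = focal_length_px * baseline_m / disparity_px.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder file names,
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # assumed already calibrated

# Block matcher producing a disparity map from the aligned/synchronized pair.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

focal_length_px = 700.0   # placeholder intrinsic parameter
baseline_m = 0.06         # placeholder binocular distance
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_length_px * baseline_m / disparity[valid]
print("closest point at %.2f m" % depth[valid].min())
```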
The processing unit 2 comprises a filtering unit 21 for detecting the edges of the shapes acquired by the acquisition unit 1.
Thus, the step of processing comprises filtering of the acquired data, for detecting the edges of the acquired shapes.
Advantageously, the filtering unit 21 carries out the step of detecting edges, which comprises the substeps of:
b1) blurring the acquired image.
The step b1 blurs the image, to filter out detail information not associated with any actor or object of interest to be detected. It actually affords prior reduction of the computational noise that might arise from the later application of the algorithms.
b2) removing color from the image, maximizing contrast and then binarizing the image.
The step b2 preferably applies the filter known as SOBEL for color removal, contrast maximization and binarization of the image. A binarized image has no gray scale: on a black background, only significant white lines and dots remain.
b3) extracting image edges.
The step b3 extracts all the edges that compose the binarized image and introduces them into a vector. In more detail, each edge in such vector corresponds to another vector of all the Cartesian coordinates of the dots that compose the edge.
Figures 2a to 2c show the above described method steps .
It will be understood that these method steps may be carried out using one or more prior art edge detection algorithms.
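The following sketch shows one possible realization of steps b1) to b3) with standard OpenCV primitives (Gaussian blur, Sobel gradient, Otsu binarization and contour extraction); it is an illustration under these assumptions, not the specific algorithm of the embodiment, and the file name is a placeholder.

```python
# Sketch of steps b1-b3 using common OpenCV building blocks.
import cv2

image = cv2.imread("frame.png")                        # placeholder file name

# b1) blur, to suppress detail not belonging to actors/objects of interest
blurred = cv2.GaussianBlur(image, (5, 5), 0)

# b2) remove colour, maximize contrast (Sobel gradient), then binarize
gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)
grad_x = cv2.Sobel(gray, cv2.CV_16S, 1, 0)
grad_y = cv2.Sobel(gray, cv2.CV_16S, 0, 1)
gradient = cv2.addWeighted(cv2.convertScaleAbs(grad_x), 0.5,
                           cv2.convertScaleAbs(grad_y), 0.5, 0)
_, binary = cv2.threshold(gradient, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# b3) extract edges as vectors of Cartesian coordinates (one contour per edge)
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
edges = [c.reshape(-1, 2).tolist() for c in contours]  # list of (x, y) dots per edge
print("%d edges extracted" % len(edges))
```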
Advantageously, the filtering unit 21 receives at its input the images acquired by the optical sensors 11 and 12 and first processes them into a two-dimensional reconstruction, according to the above described method steps.
This two-dimensional reconstruction of the scene may be used to determine the number of actors and objects therein.
Then, it uses the two-dimensional reconstruction data to obtain a three-dimensional reconstruction.
The three-dimensional reconstruction of the scene may be used to extrapolate the depth position of such actors and objects.
For three-dimensional reconstruction of the scene, the filtering unit 21 applies the following method steps: c1) rectifying right and left images: each frame acquired by right and left optical sensors 11 and 12 undergoes alignment and spatial synchronization,
c2) generating a two-dimensional gray scale image, in which closer objects are white and farther objects are black,
c3) interpolating the image obtained in step c2) with the original image to obtain a three-dimensional image .
According to a possible embodiment, a further binarization step may be carried out to reduce the amount of output data from the filtering unit 21.
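A minimal sketch of steps c1) to c3), together with the optional binarization, is given below; a semi-global block matcher from OpenCV is used here only as a stand-in for whatever disparity computation the filtering unit 21 may employ, and all file names and parameters are placeholders.

```python
# Sketch of steps c1-c3: the disparity map is rescaled into a gray-scale image
# (near = white, far = black) and blended with the original frame; an optional
# binarization reduces the amount of data handed to the next unit.
import cv2
import numpy as np

left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)    # c1) assumed already
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)  #     rectified/aligned

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# c2) two-dimensional gray-scale depth image: larger disparity (closer) -> white
depth_gray = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# c3) interpolate with the original image to obtain a combined "3D" view
overlay = cv2.addWeighted(left, 0.5, depth_gray, 0.5, 0)

# optional further binarization to reduce the output data of the filtering unit
_, depth_binary = cv2.threshold(depth_gray, 128, 255, cv2.THRESH_BINARY)
cv2.imwrite("depth_overlay.png", overlay)
```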
Particularly referring to Figure 1, the filtering unit 21 is functionally connected to an object reconstruction unit 22, such object reconstruction unit having an input for receiving output data from the filtering unit 21.
Before describing the characteristics of the reconstruction unit 22, it should be noted that such unit integrates the data from the filtering unit 21 with the information from an association unit 23.
The association unit 23 carries out an association step, for detecting shapes that meet predetermined requirements in the environment.
Particularly, the association unit 23 detects, in the environment, special types of objects of relevance for processing purposes.
In a first embodiment, the association unit 23 has a module 231 for recognizing one or more ID devices: in this case, ID devices are designed to be placed on one or more objects in the environment. The module 231 recognizes the different ID devices, interrogates them and obtains the information it needs therefrom.
Particularly referring to Figure 1, in combination with the module 231, the association unit includes a storage unit 233 which is adapted to store shapes, particularly shapes previously acquired or preloaded into the storage unit 233.
The module 233 can detect special objects defined and classified therein within the scene.
Finally, the association unit 23 comprises a face detection module 232 for detecting faces in the environment .
The module 232 can distinguish people from objects in the same scene and is a necessary instrument for reconstruction of a plan free of any non-pertinent information .
Still referring to Figure 1, the object reconstruction unit 22 has an input for receiving output data from the filtering unit 21 and the association unit 23.
The reconstruction unit 22 is designed to position the acquired images in the virtual reconstruction model .
Particularly, this unit 22 extracts from the scene the objects for which no specific information is available.
Advantageously, reconstruction occurs through the steps of :
b4) dividing the environment into at least two, preferably three different depth levels,
b5) grouping each edge detected in the environment into as many classes as there are depth levels, according to the distance of each edge from the acquisition unit 1,
b6) setting up a grid for each depth level and introducing the detected edges in the corresponding grid,
b7) aggregating the edges for each grid and forming objects in the environment,
b8) calculating the distance of each object from the acquisition unit.
In a possible embodiment, an additional verification step is provided in the reconstruction unit 22, which comprises checking that each object belongs to one or more of the depth levels.
Preferably, the level that is closer to the acquisition unit 1 is set to be higher than the level that is farther from the acquisition unit 1: for each object detected in each of the three grids a check is made to determine whether it occupies or extends over multiple adjacent levels. If it does, it is associated with a higher-level object. In other words, an object is detected and marked in the higher-level grid in which it "starts" and is signaled in the lower levels into which it "expands".
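A simplified sketch of the grouping of edges into depth levels and per-level grids (steps b4 to b8) follows; the level thresholds, grid size and data layout are assumptions made only for illustration.

```python
# Illustrative grouping of detected edges into depth levels and per-level grids.
import numpy as np

LEVEL_BOUNDS_M = [1.5, 3.0, 6.0]   # assumed: level 0 < 1.5 m, level 1 < 3 m, level 2 < 6 m

def level_of(distance_m):
    for level, bound in enumerate(LEVEL_BOUNDS_M):
        if distance_m < bound:
            return level
    return len(LEVEL_BOUNDS_M) - 1

def build_grids(edges, grid_shape=(8, 8), frame_shape=(480, 640)):
    """edges: list of dicts {'points': [(x, y), ...], 'distance_m': float}."""
    grids = [np.zeros(grid_shape, dtype=int) for _ in LEVEL_BOUNDS_M]
    cell_h = frame_shape[0] / grid_shape[0]
    cell_w = frame_shape[1] / grid_shape[1]
    for edge in edges:
        level = level_of(edge["distance_m"])           # b5) class = depth level
        for x, y in edge["points"]:                    # b6) put the edge into its grid
            grids[level][int(y // cell_h), int(x // cell_w)] += 1
    return grids

# b7/b8 would then aggregate occupied neighbouring cells of each grid into objects
# and compute their distance; an object also occupying cells of a lower (farther)
# grid would be marked as "starting" in the higher (closer) level.
edges = [{"points": [(100, 200), (110, 210)], "distance_m": 1.2},
         {"points": [(400, 300)], "distance_m": 4.5}]
for lvl, grid in enumerate(build_grids(edges)):
    print("level", lvl, "occupied cells:", int((grid > 0).sum()))
```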
The output results for the reconstruction unit 22 are sent to a reconstruction module 24, which is designed to reconstruct the scene directly before the user .
Figure 3 shows a possible reconstruction provided by this module 24.
The scene depth has been set over three aggregation levels.
Three level screens are provided in the image: level 0, level 1 and level 2. Level 0 is closest to the acquisition unit 1 and level 2 is farthest therefrom.
As mentioned above, a level is deemed to be higher than another if it is closer to the acquisition unit 1. For example: level 0 is deemed to be higher than level 1 and level 1 is deemed to be higher than level 2.
Since an object may extend over multiple levels, it can be said to start in the higher level and continue in another.
At each level, objects are marked O, and the level closest to the acquisition unit 1 at which the object starts and the number of the object on that level are indicated between parentheses: O (level, number).
The objects marked with an R are the objects that "start" at the current level.
The objects marked with a V are the objects that "start" at a higher level and continue in the current one; in this case the indication O (level, number) contains both the level number and the number of the object of the higher level.
The detected tags are indicated with a B.
For each object and tag, the distance from the acquisition unit 1 is indicated, in cm.
For each tag the angle with respect to the user is also indicated.
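The textual labelling of Figure 3 may, for example, be produced as sketched below; the object and tag records are hypothetical and merely illustrate the O (level, number), R/V and B notation, with distances in cm and tag angles.

```python
# Sketch of the textual labelling used in Fig. 3: O(level, number) for objects,
# R/V depending on where the object "starts", B for detected tags.
def describe(objects, tags):
    lines = []
    for obj in objects:
        prefix = "R" if obj["start_level"] == obj["level"] else "V"
        lines.append("%s O(%d,%d) %d cm" % (prefix, obj["start_level"],
                                            obj["number"], obj["distance_cm"]))
    for tag in tags:
        lines.append("B %s %d cm %d deg" % (tag["id"], tag["distance_cm"],
                                            tag["angle_deg"]))
    return "\n".join(lines)

objects = [{"level": 0, "start_level": 0, "number": 1, "distance_cm": 80},
           {"level": 1, "start_level": 0, "number": 1, "distance_cm": 80}]
tags = [{"id": "door", "distance_cm": 250, "angle_deg": -15}]
print(describe(objects, tags))
```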
According to Figure 1, the reconstruction module 24 is functionally connected to the output interface 3 which receives the data of the scene directly before the user and generates signals to be transmitted to the user for describing the scene before him/her.
In a first embodiment, the output interface 3 comprises a voice message generating unit. Therefore, the output interface 3 communicates with the user by describing the surrounding environment and may transmit messages either automatically or upon request by the user.
This is allowed by integrating an input module in the output interface, for the user to enter controls for message generation.
Instead of or in addition to the above, these messages may be automatically generated by the output interface, e.g. if the user comes too close to one or more objects.
Here, sensors, e.g. proximity sensors, may be provided for generating acoustic or mechanical alarms warning of excessive proximity.
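A minimal sketch of such automatically generated proximity warnings on the voice channel is given below; the distance threshold is an assumption and the pyttsx3 text-to-speech engine is only one possible off-the-shelf substitute for the voice message generating unit.

```python
# Sketch of automatic proximity warnings spoken on the voice output channel.
import pyttsx3

WARNING_DISTANCE_CM = 60   # assumed threshold for "too close"

def proximity_messages(objects):
    msgs = []
    for obj in objects:
        if obj["distance_cm"] <= WARNING_DISTANCE_CM:
            msgs.append("Warning: %s at %d centimetres ahead."
                        % (obj["label"], obj["distance_cm"]))
    return msgs

engine = pyttsx3.init()
for msg in proximity_messages([{"label": "chair", "distance_cm": 45},
                               {"label": "table", "distance_cm": 120}]):
    engine.say(msg)
engine.runAndWait()
```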
In a possible embodiment, the processing unit 2 comprises a user locating unit and an object locating unit for locating the objects reconstructed by the reconstruction unit, these locating units being generally referenced 25.
The unit 25 identifies the location of the user and the position of each of the previously detected objects in the environment, and estimates the distance of each object from the user.
Particularly, the user locating unit tracks the individual, i.e. his/her movements, in the room, whereas the object locating unit "reconstructs" the whole scene, i.e. places each of the previously found objects in the room and estimates its distance from the user.
Furthermore, the object locating process is carried out through the following steps (a minimal illustrative sketch is given after the list):
b101) determining reference points to be associated with the detected objects,
b102) estimating the distance of the operator from the reference points,
b103) updating the user position,
b104) updating the list of reference points,
b105) creating the virtual module of the environment.
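The sketch below illustrates steps b101) to b105) under strongly simplifying assumptions: a small map of reference points, a crude position update from measured distances, and the rebuilding of the virtual model entries; the coordinates, names and estimation rule are all hypothetical and do not reproduce the actual locating algorithm.

```python
# Minimal locating sketch: reference points (b101), distance measurements (b102),
# user position update (b103), and rebuilding of the virtual model (b104/b105).
import math

reference_points = {"door": (4.0, 0.0), "table": (2.0, 1.5)}   # b101) landmark map (m)

def estimate_user_position(measured, ref_points):
    # b102/b103) crude substitute for trilateration: each reference point is
    # shifted toward the origin (previous user position) by its measured
    # distance, and the shifted points are averaged.
    xs, ys = [], []
    for name, dist in measured.items():
        rx, ry = ref_points[name]
        norm = math.hypot(rx, ry) or 1.0
        xs.append(rx - dist * rx / norm)
        ys.append(ry - dist * ry / norm)
    return sum(xs) / len(xs), sum(ys) / len(ys)

measured = {"door": 3.2, "table": 1.0}
user = estimate_user_position(measured, reference_points)

# b104/b105) update the reference list and rebuild the virtual model entries
virtual_model = [{"object": name, "distance_m": round(math.dist(user, pos), 2)}
                 for name, pos in reference_points.items()]
print("user at", tuple(round(c, 2) for c in user), virtual_model)
```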

Claims

1. Reality description system, which system comprises at least one environment scanning unit, at least one processing unit for processing the data collected by the environment scanning unit and generating at least one acoustic description of the environment based on the processed data collected by the scanning unit, and which system is a hybrid system having a multichannel scanning unit being able to scan or acquire data about the environment and the objects present therein as well as data about their position in the said environment, a multichannel processing unit comprising also a data fusion unit for comparing or correlating the data about the environment and the objects present therein obtained by different scanning techniques and for generating a virtual representation of the environment and of the objects and their positions which are present in the said environment by combining the data obtained by each of, or by one or more of, the different scanning techniques, and
a multichannel output unit, among which a voice unit is present for generating a differentiated output by each channel which outputs can be combined.
2. System according to claim 1, in which a filtering unit is provided which filters away part of the information to be output to the user in accordance with preset criteria and/or with dynamically variable criteria, in an automatic way or by input of commands.
3. System according to claim 2, in which the filtering action consists in ignoring certain output channels or in ignoring or not transmitting the information about certain items or objects which are present in the environment as a function of their position relative to other objects and/or relative to the instant position of the user.
4. System according to claims 2 or 3, in which the filtering criteria are determined by one or more of the said options: user commands, user targets or tasks, automatic determination of the relative position of user and objects, automatic determination of danger or risk conditions.
5. System according to one or more of the preceding claims, in which one input channel consists in information which can be collected actively or is sent to the processing unit, for example via wi-fi connection to servers of official sites of institutions or of companies or of service providers. Information can be related to the condition of the traffic, to maps, planimetry, and/or preprocessed visual or acoustic information of outdoor environments, as for example visual information acquired by cameras at traffic lights or in official buildings, and where active collection of the information means that the user connects the assistive device by command to an external server by means of the available connection, which may be detected automatically by the assistive device, and passive or automatic collection means that the assistive device automatically scans for available connections to servers, survey systems and other networks or devices which are able to provide information in the environment.
6. A reality description system according to one or more of the preceding claims, in which one input channel comprises
at least one acquisition unit (1) , for acquiring one or more images of an environment, at least one processing unit (2) for processing the data acquired by said acquisition unit (1)
and at least one output interface (3) , which is designed to receive output data from the processing unit (2) for describing the environment to at least one user,
the processing unit (2) comprises a filtering unit (21) for detecting the edges of the shapes acquired by the acquisition unit (1) .
7. A system as claimed in claim 6, wherein the processing unit (2) comprises an association unit (23) for detecting shapes that meet predetermined requirements in the environment.
8. A system as claimed in claim 7, comprising at least one ID device being placed on at least one object in the environment,
a module (231) being provided in the association unit (23) for recognizing said ID device.
9. A system as claimed in claim 7, wherein said association unit (23) comprises a storage unit (232) for storing shapes.
10. A system as claimed in claim 7, wherein the association unit (23) comprises a face detection module (233) for detecting faces in the environment.
11. A system as claimed in claim 7, wherein the processing unit (2) comprises an object reconstruction unit (22) , said object reconstruction unit (22) having an input for receiving output data from the association unit (23) and from the acquisition unit (1) .
12. A system as claimed in one or more of the preceding claims wherein the processing unit (2) comprises a user locating unit and an object locating unit for locating the objects reconstructed by the reconstruction unit.
13. A system as claimed in one or more of the preceding claims, wherein the processing unit (2) comprises a reconstruction module (24) , which is designed to reconstruct the scene directly before the user .
14. A system as claimed in claim 6, wherein said output interface (3) comprises a voice message generating unit.
15. A system as claimed in claim 6, wherein said acquisition unit (1) comprises a stereo camera comprising at least two optical sensors.
16. A system as claimed in claim 6, wherein sensors are provided for detecting the distance of the user from the objects in the environment.
17. A method for carrying out electronic assistance by means of the above disclosed device, the said method of virtual environment reconstruction comprising the steps of:
a) acquiring one or more images of the environment according to different techniques for scanning and acquiring information of the environment by an acquisition unit,
b) processing the acquired data by a processing unit, combining the information data obtained by one or more or all of the said scanning and acquisition techniques,
c) reconstructing a virtual model of the environment using the said data of the said one or more or all scanning and acquisition techniques;
d) generating an output of the said data which makes use of all or part of the senses of the user and particularly generating acoustic and/or vocal output to transfer at least a part of the information contained in the said virtual model of the environment.
18. A reality description method according to claim 17, comprising the steps of:
a) acquiring one or more images of an environment by an acquisition unit,
b) processing the acquired data by a processing unit,
c) reconstructing a virtual model of the environment ,
wherein the step of processing comprises filtering of the acquired data, for detecting the edges of the acquired shapes.
19. A method as claimed in claim 18, wherein the step of detecting edges comprises the substeps of: b1) blurring the acquired image,
b2) removing color from the image, maximizing contrast and then binarizing the image,
b3) extracting image edges.
20. A method as claimed in claim 18, wherein the step of processing comprises an association step for detecting shapes that meet predetermined requirements in the environment.
21. A method as claimed in claim 19, wherein the association step comprises a step of recognizing at least one ID device,
at least one ID device being provided, which is adapted to be placed on at least one object in the environment,
a module being provided in the association unit for recognizing said ID device.
22. A method as claimed in claim 19, wherein the association step comprises a step of storing acquired shapes in a storage unit.
23. A method as claimed in claim 19, wherein the association step comprises a step of detecting faces in the environment.
24. A method as claimed in claim 17, wherein the step of processing comprises a step of reconstructing objects, which is designed to position the acquired images in the virtual reconstruction module, said step of reconstructing objects comprising the steps of: b4) dividing the environment into at least two different depth levels,
b5) grouping each edge detected in the environment into as many classes as there are depth levels, according to the distance of each edge from the acquisition unit,
b6) setting up a grid for each depth level and introducing the detected edges in the corresponding grid,
b7) aggregating the edges for each grid and forming objects in the environment,
b8) calculating the distance of each object from the acquisition unit.
25. A method as claimed in claim 24, wherein an additional verification step is provided, which comprises checking that each object belongs to one or more of the depth levels.
26. A method as claimed in claim 17, wherein the step of processing comprises a step b9) of locating the user and a step b10) of placing each previously found object in the environment, and estimating its distance from the user.
27. A method as claimed in claim 26, wherein the step b10) comprises the substeps of:
b101) determining reference points to be associated with the detected objects,
b102) estimating the distance of the operator from the reference points,
b103) updating the user position,
b104) updating the list of reference points,
b105) creating the virtual module of the environment.
28. A method as claimed in claim 17, wherein voice messages are generated to describe the environment to the user.
29. A method as claimed in claim 17, wherein the acquisition unit comprises a stereo camera with at least two optical sensors, the acquisition step being carried out by a calibration process, consisting in aligning and synchronizing each image detected by the optical sensors.