Multisensory Data Compression
This invention relates to the compression of multisensory data. Humans perceive the world with their major senses, including: sight, sound, smell, taste and feel (where "feel" includes motion, touch, heat and so forth). Cross-modal effects, i.e. the interaction of these senses, can have a major influence on how environments are perceived, even to the extent that large amounts of detail of one sense may be ignored when in the presence of other more dominant senses.
The interaction of the senses, the so-called cross-modal effects, influences a person's perception of an environment. A classic example of how our senses can trick each other is the ventriloquism effect, in which the viewer is fooled into thinking that the sound emanates from a visual cue. Even the sense of taste can be affected by other dominant sensory stimuli. For example, it has been established that stale potato crisps taste fresher when they are accompanied by an electronically generated "crispy" sound.
Most research in sensory perception to date has been uni-sensory, focussing on the functional properties of only one sense. It is only recently that researchers have recognized that, in the real world (as opposed to a laboratory environment), the human brain sorts through all sensory input to couple signals that relate to a common event. This is done concurrently with processing the separate sensory inputs [Calvert, G.; Spence, C.; Stein, B.; 2004. The Multisensory Handbook. MIT Press]. In 1993, Stein and Meredith [Stein, B. and Meredith, M.; The Merging of the Senses. MIT Press] proposed three principles describing when multi-sensory integration is likely to be strongest:
The Spatial Rule: when the contributing uni-sensory stimuli originate from approximately the same location;
The Temporal Rule: when the contributing uni-sensory stimuli originate at approximately the same time; and,
Inverse Effect: when the contributing uni-sensory stimuli are relatively weak when considered one at a time.
To deliver a perceptually equivalent "real world experience" to a user in a digital environment, it is necessary to deliver the appropriate level of sensory stimulation for each sense. If it were possible to capture accurately all the sensory stimuli in a real world scene, or to compute all the physics associated with presenting the stimuli, then, as in the real world, the user would simply process those parts of the stimuli which are necessary for the perception of the environment at that time. However, the full amount of sensory stimuli in the real world is huge, beyond any device's ability to capture in full detail, and even if all the physics were fully understood, simulating it with such accuracy will be beyond computing capabilities for many years to come. A key to increasing our ability to capture or model, store, transmit, and deliver full multi-sensory stimuli on current technology is to understand the limitations of the human brain, which is simply not able to process all the sensory input it receives every moment of the day. The human brain selectively processes these sensory inputs to build up a useful, but not necessarily accurate, perception of the environment [Chalmers A.G.; Howard D.; Moir C.; Real Virtuality: A step change from Virtual Reality. SCCG'09: Spring Conference on Computer Graphics, pp. 15-22, ACM SIGGRAPH Press, 2009].
The present invention is concerned with the compression of multisensory data in a manner that can take into account the interactions between the different types of sensory data.
According to the invention, there is provided a method of compressing a block of multisensory data comprising at least one frame comprising a plurality of different types of data, wherein there are stored quality settings for each type of data which include parameters determining the level of compression for that type of data in that block; each type of data in that block is subjected to compression in accordance with the quality settings to produce a compressed block of multisensory data; and a package of data is created which includes the compressed block and a header which includes the respective parameters used for compressing each type of data.
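By way of illustration only, the method set out above might be sketched as follows. The package layout (a 4-byte header length, a JSON header, then the concatenated compressed data), the `QualitySetting` class and the `compress_block` function are all assumptions made for this sketch and are not taken from the specification. The lossless zlib compression level merely stands in for whatever quality parameter a real, possibly lossy, codec for each type of data would accept.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class QualitySetting:
    """Hypothetical per-type quality setting."""
    method: str  # only "zlib" is implemented in this sketch
    level: int   # zlib level 0-9, standing in for a quality parameter


def compress_block(block: dict, settings: dict) -> bytes:
    """Compress each type of data with its own setting and prepend a header.

    block maps a data-type name to raw bytes; settings maps the same
    names to QualitySetting objects.
    """
    header = {}
    payload = bytearray()
    for name, raw in block.items():
        q = settings[name]
        if q.method != "zlib":
            raise ValueError(f"unsupported method: {q.method}")
        packed = zlib.compress(raw, q.level)
        # Record the parameters actually used, plus the compressed size
        # so that a decoder can locate each type of data in the payload.
        header[name] = {"method": q.method, "level": q.level, "size": len(packed)}
        payload += packed
    header_bytes = json.dumps(header).encode("utf-8")
    # Assumed layout: 4-byte big-endian header length, header, payload.
    return len(header_bytes).to_bytes(4, "big") + header_bytes + bytes(payload)


block = {"video": bytes(1000), "audio": b"ab" * 500}
settings = {"video": QualitySetting("zlib", 9), "audio": QualitySetting("zlib", 1)}
package = compress_block(block, settings)
```

Storing the parameters in the header, rather than fixing them in the decoder, is what allows the settings to differ from one block to the next.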
In addition to compression of multisensory data in accordance with the quality settings, there may be temporal compression of the multisensory data. In that case, temporal compression may be carried out at the same stage as another type of compression, or may be carried out after the other type of compression.
Alternatively, data that is compressed in accordance with the quality settings may be compressed only using a temporal technique.
Temporal compression can be carried out by any known method such as MPEG encoding. Temporal compression works across time, comparing one frame of data with another frame of data, and achieving compression by saving differences between frames. If temporal compression is the only form of compression for a particular type of data, the quality settings will relate to the type or extent of temporal compression. Compression in accordance with the quality settings, if not in the form of temporal compression, can be carried out by any known method for compressing a single frame of data and there may be different methods of compression for different types of data. Decoding of the package of data will involve decoding the compressed block of data in accordance with the parameters stored in the header. There may also be an additional stage of decoding separate temporal compression. The quality of the uncompressed types of data will depend on the quality settings used for the respective types of data for this block, as well as any parameters for separate temporal compression.
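The principle of saving differences between frames can be illustrated with a deliberately simple sketch. It assumes that each frame is a list of integers of equal length, and the function names are hypothetical. It is far simpler than MPEG encoding, which relies on motion-compensated prediction and transform coding, and is offered only to show the idea of storing differences rather than whole frames.

```python
def delta_encode(frames: list) -> list:
    """Keep the first frame whole; store each later frame as its
    element-wise difference from the preceding frame."""
    if not frames:
        return []
    encoded = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])
    return encoded


def delta_decode(encoded: list) -> list:
    """Rebuild the frames by adding each stored difference to the
    previously reconstructed frame."""
    frames = []
    for i, item in enumerate(encoded):
        if i == 0:
            frames.append(list(item))
        else:
            frames.append([p + d for p, d in zip(frames[-1], item)])
    return frames


frames = [[10, 20, 30], [10, 21, 30], [11, 21, 30]]
encoded = delta_encode(frames)
```

Where successive frames are similar, most of the stored differences are zero, and a subsequent entropy coder can represent them compactly.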
For each package of data, the header will contain data identifying the type of compression used for each different type of multisensory data, and the parameters used. If there is an additional stage of temporal compression, the header will also contain information about that.
In some embodiments, there is an additional step of inter-package compression. For a given series of packages of data, the different types of compression used for the different types of multisensory data may be the same. In that case, the headers of the packages in the series differ only in the parameters used for compression of the different types of multisensory data. It is therefore possible to combine these packages in a final package, with a main header that contains data identifying the types of compression used for the different types of multisensory data. In that case, the individual packages contained in the final package can have a more compact structure, as they only need to identify the parameters for the compression of the specific types of data rather than having to identify also the types of compression concerned. If there is also temporal compression which is applicable to all of the individual packages within the final package, the main header in the final package can also contain data relating to that temporal compression. In some embodiments, the individual packages could be created from the start with these compact headers, with additional data such as that identifying the type of compression being stored and then included in a main header of a final package that brings together the individual packages.
For any particular block, the quality settings can determine the relative qualities of the different types of data, taking into account how they will interact, including any cross-modal effects. In a simple example, for a given scene there may be one or more types of data which are of a foreground type and whose quality will be maintained at the highest level, and one or more types of data which are of a background type and whose quality will be allowed to drop as a result of the first stage of compression. In a more complex arrangement there will be several different quality levels that can be applied to each type of data.
The types of data may be obtained by capturing live data from suitable sensors to detect, for example, audio, visuals, feel, smell, taste and any other types of stimuli, and/or may be artificially created data in respect of these stimuli. It will be appreciated that the expression "type of data" is not restricted to just a particular type of stimulus. The expression "multisensory" does not necessarily imply that different sensory organs of a user will be involved. For example, one type of data could be sound data captured by a microphone in a room, another type of data could be sound data captured by a microphone outside the room, and another type of data could be artificially created background music for a scene. In any scenario, a person creating a multisensory presentation can use as many channels of data as desired, and for a given block of multisensory data there can be different quality settings applied to each channel.
The quality settings applied to the different types of data can be chosen to minimise the overall perceptual loss compared to reality when the compressed block of data is uncompressed and the multisensory data reproduced. The invention can be used when creating a film or video so that in different scenes, different senses are given different quality settings. The invention can also be used when creating an interactive video game, or when creating a virtual environment.
It will be appreciated that the quality settings are used in the context of the compression of data. This is distinct from adjusting other parameters of the different types of data. For example, in the case of multi-channel audio streams an editor may adjust the volume of the various streams to obtain the sound balance that he or she wishes. The present invention will adjust the quality of, for example, compressed audio signals as compared to compressed video signals, depending on the relative importance of the signals in any particular context.
In order to obtain a high quality compressed multisensory block stream, in some embodiments the quality settings are determined with respect to the quality settings previously used for previous blocks.
In embodiments of the invention, a stream of multisensory data blocks is fed to an encoding system which will compress the individual blocks as described above, and combine the final blocks into a compressed output stream. A decoder will receive the stream of final blocks, will uncompress individual final blocks and the individual data types, and will create a stream of uncompressed multisensory blocks. The quality of the data types in the uncompressed stream will depend on the quality settings for the data types, which may vary from block to block, or be maintained for groups of blocks.
Viewed from another aspect, the invention provides a method of decoding a package of data comprising a header and a compressed block of data including a number of different data types which have been subject to individual levels of compression in accordance with respective quality settings for the data types; the header containing information about the compression of the individual data types; wherein the method comprises uncompressing the individual data types in accordance with data stored in the header so as to provide at least one frame comprising a plurality of different types of data.
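A decoder of this kind might be sketched as below. The sketch assumes, purely for illustration, a package laid out as a 4-byte big-endian header length, a JSON header mapping each data type to its compression method, level and compressed size, and then the compressed data for each type in header order. Neither this layout nor the name `decode_package` comes from the specification, and zlib stands in for whichever codec was actually used for each type of data.

```python
import json
import zlib


def decode_package(package: bytes) -> dict:
    """Return a mapping from data-type name to uncompressed bytes,
    guided entirely by what the header says about each type."""
    header_len = int.from_bytes(package[:4], "big")
    header = json.loads(package[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    block = {}
    for name, info in header.items():
        if info["method"] != "zlib":
            raise ValueError(f"unsupported method: {info['method']}")
        chunk = package[offset:offset + info["size"]]
        block[name] = zlib.decompress(chunk)
        offset += info["size"]
    return block


# Build a small package by hand in the assumed layout, then decode it.
raw = {"audio": b"abc" * 50, "smell": b"\x01\x02" * 10}
parts = {name: zlib.compress(data, 6) for name, data in raw.items()}
hdr = json.dumps(
    {name: {"method": "zlib", "level": 6, "size": len(p)} for name, p in parts.items()}
).encode("utf-8")
pkg = len(hdr).to_bytes(4, "big") + hdr + b"".join(parts.values())
decoded = decode_package(pkg)
```

Because the decoder reads the method and size of each data type from the header, it needs no prior knowledge of the quality settings chosen by the encoder.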
Viewed from another aspect, the invention provides data processing apparatus for compressing a block of multisensory data comprising at least one frame comprising a plurality of different types of data, the apparatus having a microprocessor and memory, and the apparatus being configured to carry out the following steps: to store quality settings for each type of data which include parameters determining the level of compression for that type of data in that block; to subject each type of data in that block to compression in accordance with the quality settings to produce a compressed block of multisensory data; and to create a package of data which includes the compressed block and a header which includes the respective parameters used for compressing each type of data.
Viewed from another aspect, the invention provides data processing apparatus for decoding a package of data comprising a header and a compressed block of data including a number of different data types which have been subject to individual levels of compression in accordance with respective quality settings for the data types; the header containing information about the compression of the individual data types; the apparatus having a microprocessor and memory, and the apparatus being configured to uncompress the individual data types in accordance with data stored in the header, so as to provide at least one frame comprising a plurality of different types of data.
Viewed from another aspect, the invention provides a computer software product for configuring data processing apparatus having a microprocessor and memory to compress a block of multisensory data comprising at least one frame comprising a plurality of different types of data, by carrying out the following steps: to store quality settings for each type of data which include parameters determining the level of compression for that type of data in that block; to subject each type of data in that block to compression in accordance with the quality settings to produce a compressed block of multisensory data; and to create a package of data which includes the temporally compressed block and a header including the respective parameters used for compressing each type of data in the first stage of compression.
Viewed from another aspect, the invention provides a computer software product for configuring data processing apparatus having a microprocessor and memory to decode a package of data comprising a header and a temporally compressed block of data including a number of different data types which have been subject to individual levels of compression in accordance with respective quality settings for the data types; the header containing information about the compression of the individual data types; the apparatus having a microprocessor and memory, by carrying out the following step: to uncompress the individual data types in accordance with data stored in the header, so as to provide at least one frame comprising a plurality of different types of data.
In some embodiments in accordance with these aspects, there is also temporal compression, so that encoding data requires a step of temporal compression, and decoding data requires a step of uncompressing temporally compressed data. In some embodiments in accordance with these aspects, there is an additional step of combining packages into a final package with a main header that includes information concerning the different types of compression used, the individual packages having more compact headers that include the parameters for compression of the different types of data for the respective packages. Thus encoding requires the step of creating the final package, and decoding requires the step of deconstructing the final package.
In accordance with the invention, multisensory data can comprise data of various types and there will be various parameters. For example, if the sense concerned is smell, there may be one or more canisters of odours. Each canister can contain a different odour, which can be released into the air so that a person can smell it. Parameters may include data identifying the canister; the time when the odour is to be released; how long the odour is to be released for; the speed at which a stream of odour is released into the air; the position and direction at which the odour is released; the amount of dilution of an odour-carrying liquid and so forth. If there is a virtual reality scene in which a person experiences, for example, walking along a road with restaurants, different smells could be encountered. By way of another example, the sense could be touch and the sensation required that of wind. Thus, there could be one or more fans positioned to direct a flow of air at a person and/or an object such as the sail of a yacht. The parameters could be used to simulate the wind on a person's face and could include parameters controlling the injection of water into the flow of air to simulate the spray on a person's face whilst sailing the yacht.
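As a rough illustration of how the odour parameters listed above might be represented as a channel of data, the following sketch defines a record for a single odour release and packs it into a fixed-size binary form. The field names, the units and the 42-byte layout are all assumptions made for this example; the specification does not prescribe any particular representation.

```python
import struct
from dataclasses import dataclass

# Assumed layout: little-endian uint16 canister id followed by ten float32 values.
_FMT = "<H10f"


@dataclass
class OdourRelease:
    """Hypothetical record describing one release of an odour."""
    canister_id: int
    start_s: float        # time at which the release begins, in seconds
    duration_s: float     # how long the odour is released for, in seconds
    flow_speed: float     # speed of the odour stream (units assumed)
    position: tuple       # (x, y, z) of the release point
    direction: tuple      # (x, y, z) direction of release
    dilution: float       # fraction of carrier liquid, 0.0 to 1.0 (assumed)

    def pack(self) -> bytes:
        return struct.pack(
            _FMT, self.canister_id, self.start_s, self.duration_s,
            self.flow_speed, *self.position, *self.direction, self.dilution,
        )

    @classmethod
    def unpack(cls, data: bytes) -> "OdourRelease":
        v = struct.unpack(_FMT, data)
        return cls(v[0], v[1], v[2], v[3], tuple(v[4:7]), tuple(v[7:10]), v[10])


event = OdourRelease(3, 1.5, 2.0, 0.25, (0.0, 1.0, 2.0), (1.0, 0.0, 0.0), 0.5)
packed = event.pack()
restored = OdourRelease.unpack(packed)
```

A sequence of such records over time would form one channel of data, to which a quality setting could then be applied, for example by coarsening the timing or position values.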
In another example, the sense could be taste and the parameters could control the presentation of a flavoured tab for a person to insert in their mouth. There will be many possible channels of data controlling a range of actions that determine the experiences sensed by a person. These actions will be controlled over time by a computer and the present invention allows the data to be stored and/or transmitted in a compressed form.
An embodiment of the invention will now be described by way of example only and with reference to the accompanying drawing, which shows a compression scheme in accordance with the present invention.
According to Fig. 1, an output stream 1 of multisensory data blocks is provided. A single multisensory block 2 consists of one or more multisensory frames. These multisensory frames consist of multiple multisensory stimuli 3. These multisensory stimuli are digital representations of any real world stimuli, which may include visuals, audio, smell, feel, taste and others. The real world multisensory stimuli may be captured 4 by means of sensing devices and/or artificially created 5 by means of simulation. The single multisensory block is input to a multisensory quality decision method 6 which may be implemented in either software or hardware.
Quality settings 7 are also input into the multisensory quality decision method 6. These quality settings may be specific to an individual multisensory block, or may apply to a sequence of multisensory blocks. The quality settings are determined by situation requirements 8. The multisensory quality decision method 6 determines the level of compression to be used for each of the multisensory stimuli in a compression step 9. The compression step 9 includes not only specific compression of the individual types of data in accordance with the quality settings, but also temporal compression that can be applied to the individual types of data and/or to the complete block. Temporal compression algorithms will be used in accordance with temporal compression schemes that are known per se in the art, such as MPEG encoding and so forth. The parameters used for the various compression techniques, and data identifying the techniques, are stored.
The result is a multisensory package 10, which contains a compressed multisensory block 11 and a header 12 containing the various compression parameters and identifying data.
A series of multisensory packages are then subjected to inter-package compression at step 13, to create a final package 14. This combines the individual packages 10, but the headers have been replaced by mini headers 15 which now contain only the parameters for the individual compression of the data types. The final package 14 has a main header 16, which contains temporal compression parameters, as well as data identifying the types of compression applied to the different types of data of the individual packages in the final package. The main header may contain any compression parameters or information, for any data type, that is the same for all packages in the final package.
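The inter-package step might be sketched as follows, on the assumption that each package is represented as a dictionary with a `header` (mapping each data type to a compression `method` and a `level` parameter) and a `data` field. These structures and the function names are hypothetical. For brevity, this sketch moves only the identification of the compression type into the main header, although, as noted above, any parameter common to all packages could be moved in the same way.

```python
def combine_packages(packages: list) -> dict:
    """Build a final package: one main header naming the compression
    type per data type, and a mini header per package holding only
    the parameters."""
    if not packages:
        raise ValueError("no packages to combine")
    methods = {name: info["method"] for name, info in packages[0]["header"].items()}
    for pkg in packages:
        current = {name: info["method"] for name, info in pkg["header"].items()}
        if current != methods:
            raise ValueError("packages use different compression types")
    minis = [
        {
            "mini_header": {name: info["level"] for name, info in pkg["header"].items()},
            "data": pkg["data"],
        }
        for pkg in packages
    ]
    return {"main_header": {"methods": methods}, "packages": minis}


def split_final_package(final: dict) -> list:
    """Reverse the combination, restoring a full header to each package."""
    methods = final["main_header"]["methods"]
    return [
        {
            "header": {
                name: {"method": methods[name], "level": level}
                for name, level in item["mini_header"].items()
            },
            "data": item["data"],
        }
        for item in final["packages"]
    ]


packages = [
    {"header": {"video": {"method": "zlib", "level": 9},
                "audio": {"method": "zlib", "level": 2}}, "data": b"block-1"},
    {"header": {"video": {"method": "zlib", "level": 4},
                "audio": {"method": "zlib", "level": 7}}, "data": b"block-2"},
]
final = combine_packages(packages)
```

The saving grows with the number of packages in the series, since the information in the main header is stored once rather than once per package.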
In order to obtain a high quality compressed multisensory block stream, in some embodiments the quality settings are determined with respect to the quality settings previously used for previous blocks. Thus, the quality settings for a given multisensory block are fed to a compression properties stage 17 which stores the properties of a number of quality settings obtained for previous blocks of the same environment. This compression properties stage can also store temporal compression parameters. It should be noted that an environment change identifier stage may be provided to reset the compression properties stage on any change of scene (not shown).
The compression stage 9, on receiving a quality setting as initially determined for a multisensory block 2, determines whether the change in the quality setting compared to prior quality settings is large. It will be apparent to the skilled person that any such deviation can be weighted so that particular parts of the quality setting have a higher weight than others. The same applies to temporal compression parameters.
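One possible form of such a weighted comparison is sketched below. It assumes that a quality setting is a mapping from data type to a numeric level, that prior settings are summarised by their mean, and that "large" means exceeding a chosen threshold. None of these choices, nor the function names, is specified in the description, and the sketch deliberately stops at detecting a large change without saying what is then done about it.

```python
def mean_settings(history: list) -> dict:
    """Average each data type's level over the stored prior settings."""
    return {k: sum(h[k] for h in history) / len(history) for k in history[0]}


def weighted_deviation(current: dict, reference: dict, weights: dict) -> float:
    """Sum of absolute per-type differences, each scaled by its weight
    (types without an explicit weight count with weight 1.0)."""
    return sum(weights.get(k, 1.0) * abs(current[k] - reference[k]) for k in current)


def is_large_change(current: dict, history: list, weights: dict, threshold: float) -> bool:
    """True if the new setting deviates from prior settings by more
    than the threshold; with no history there is nothing to compare."""
    if not history:
        return False
    return weighted_deviation(current, mean_settings(history), weights) > threshold


history = [{"video": 8, "audio": 4}, {"video": 8, "audio": 6}]
weights = {"audio": 2.0}  # audio changes count double in this example
deviation = weighted_deviation({"video": 8, "audio": 9}, mean_settings(history), weights)
```

Clearing `history` corresponds to the reset of the compression properties stage on a change of scene.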
Two examples of situation-aware quality settings are given below. In each of these, the quality settings are dependent on the application and on the user who is experiencing the multisensory data.
The first example concerns walking down a street. The simple task of walking down a street may require different quality settings for the sensory stimuli, depending on what the user is doing while walking down the street. In a virtual embodiment of this task:
If the user is looking for a particular street sign, the visual stimuli, particularly in the area of a scene where the user will expect to see a street sign, need to be at the highest precision. If the user is instead searching for a coffee shop, then he/she will be using his/her nose to try and locate the smell of coffee, so, in this case, the smell stimuli need to be delivered at high precision. If, on the other hand, the user is in fact a soldier on patrol down a street in a hostile environment, then he/she is likely to be listening hard for, for example, the sound of a weapon being cocked, so it is the sound stimuli that need to be delivered to the virtual environment at the highest precision.
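The street example can be expressed as a simple lookup from the user's task to a set of quality settings. The task names, the list of senses and the numeric levels on a 1 to 10 scale are invented for this sketch; in practice the situation requirements could be far richer, for instance identifying the region of the scene where a street sign is expected.

```python
SENSES = ("visual", "audio", "smell", "feel", "taste")

HIGH_LEVEL = 10       # hypothetical level for the attended sense
BACKGROUND_LEVEL = 3  # hypothetical level for unattended senses

# Which senses each task depends on (an assumption based on the example above).
TASK_FOCUS = {
    "find_street_sign": ("visual",),
    "find_coffee_shop": ("smell",),
    "patrol": ("audio",),
}


def quality_for_task(task: str) -> dict:
    """Give the attended senses the high level and all others the
    background level; an unknown task gets background quality throughout."""
    focus = TASK_FOCUS.get(task, ())
    return {s: HIGH_LEVEL if s in focus else BACKGROUND_LEVEL for s in SENSES}


coffee_settings = quality_for_task("find_coffee_shop")
```

The resulting dictionary plays the role of the quality settings supplied for a block or for a sequence of blocks.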
A second example concerns yachting. Yachting is a popular recreational sport undertaken by all manner of people, from beginners to highly skilled professionals. Yachting is a multisensory experience including the sight of the straining sails, the feel of the wind in the face, the taste of water in the mouth, and the sound of the wind whistling past. Recreating an authentic virtual yachting experience requires these multiple senses to be captured or artificially created and delivered to a user. The required quality of the individual senses that should be delivered to the user of the yachting scenario differs from user to user.
If the user is highly skilled, then he or she will be listening carefully to various sounds, in particular the sound of the water over the hull as an indication of speed. The sound comes from increased disturbance of the water passing under the boat. As the boat speeds up it pushes more and more water away from the hull. This is what creates the change in the note of the sound of water as the boat accelerates through it. On some boats the sensation of sound linked to increased speed will be added to by increased noise in the rigging (usually the shrouds, but the noise could also be from a topping lift). Vibration in the windward shroud increases and as it does so can create a humming sound.
The level of quality delivered to this user needs to ensure a high precision for the sound of the water across the hull and the rigging. While the user is attempting to ascertain the yacht's speed and thus focussing on this, other sounds and indeed other sensory stimuli will not be attended to and can thus be delivered at a lower quality, without the user being aware of this quality difference.
If the user is less skilled, he/she will be depending on other sensory stimuli to try and determine the yacht's speed, such as visual stimuli in the increase in the size of the bow wave and feel stimuli in the amount of spray coming over the boat and the feel of the wind on his/her face. In this case, it is the visual and feel sensory stimuli that need to be delivered at high precision, while the sound of the water across the hull can be delivered at a much lower precision compared to the skilled user case.
Embodiments of the invention allow accurate simulations of equipment and environments to be established. The environment can be varied and there will be a life-like effect on the equipment. For example, when testing the noise in an aircraft cabin, different channels representing external noise sources, internal noise sources and motion can be set up, with appropriate quality settings for take off, cruising, and landing, and the effects of varying sound deadening measures can be simulated accurately without the expense of testing the measures in a live aircraft undertaking these manoeuvres.
A computer software product for use in accordance with the invention will contain program instructions to be executed by a microprocessor. These instructions may be stored on tangible media such as a DVD or solid state memory, or may be provided in non-tangible form as signals, for example over the Internet.
It will be appreciated that in embodiments of the invention, the output stream 1 will comprise a final package 14 or a series of final packages. The output stream could comprise a series of packages 10, if there is to be no inter-package compression stage. In that case the compression properties stage 17 would receive compression data directly from the packages 10.
In some embodiments of the invention, a video and / or audio stream could be provided, with which additional sensory data is combined.