WO2021113687A1 - System and method for in-video product placement and in-video purchasing capability using augmented reality
- Publication number
- WO2021113687A1 (PCT/US2020/063380)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- video
- user
- smartphone
- user authentication
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroids
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/87—Regeneration of colour television signals
- H04N9/8715—Regeneration of colour television signals involving the mixing of the reproduced video signal with a non-recorded signal, e.g. a text signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/06—Authentication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/60—Context-dependent security
- H04W12/63—Location-dependent; Proximity-dependent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/60—Context-dependent security
- H04W12/68—Gesture-dependent or behaviour-dependent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/38—Services specially adapted for particular environments, situations or purposes for collecting sensor information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- This invention relates generally to the field of object recognition in computer vision technology and augmented reality technologies. More specifically, this invention relates to a system and method for product placement and marketing using object recognition and augmented reality. Further, this invention relates to automated continuous user authentication on a smartphone.
- Techniques are provided by which the digital delivery of a viewer-requested video along with the best chosen advertisement for the viewer is improved. These techniques may be particularly suited for the short video industry.
- An innovative video analytics mechanism and user-behavioral analytics mechanism are provided, with which the best match of an exact product on the video for the particular viewer is advertised on the video, while the viewer is viewing the video. Further, techniques are provided that enable the viewer to purchase the product while still in the video, not having to leave the video or the site to complete the purchase. Further, techniques are provided by which the users and their smartphones are automatically continuously being authenticated.
- FIG. 1 depicts a screenshot of a video frame in which an interactive video object, an interactive product logo, and an interactive product purchase window in a portion of the video are each presented, in accordance with an embodiment
- FIG. 2 depicts a screenshot of a video frame in which an interactive video object, an interactive product logo, and an interactive product purchase window in a portion of the video are each presented, in accordance with an embodiment
- FIG. 3 depicts a screenshot of a video frame in which an interactive video object, an interactive product logo, and an interactive product purchase window in a portion of the video are each presented, in accordance with an embodiment
- FIG. 4 is a schematic diagram of a high-level architecture of the network environment, in accordance with an embodiment
- FIG. 5 is a schematic diagram of the API Ad matching process and best Ad file delivery process, in accordance with an embodiment
- FIG. 6 depicts a process for analyzing and storing video data, including identifying objects within the video and identifying products within the video, in accordance with an embodiment
- FIG. 7 depicts a process for analyzing user-behavioral data, including identifying objects viewed on videos, in accordance with an embodiment
- FIG. 8 depicts a network environment for when a viewer views the video, in accordance with an embodiment
- FIG. 9 depicts a network environment for during each video upload, in accordance with an embodiment
- FIG. 10 depicts a network environment for API overall workflow, in accordance with an embodiment
- FIG. 11 is a block schematic diagram of a system in the exemplary form of a computer system according to an embodiment
- FIG. 12 is a screenshot 1200 of an e-commerce website on a smartphone, the screenshot showing a list of four video options, shown as circled, that are selectable by the user to activate the corresponding video, in accordance with an embodiment
- FIG. 13 is a screenshot of a cooked dish with an advertiser’s company logo and link to a discount super-imposed over the original display of the cooked dish, in accordance with an embodiment
- FIG. 14A is a screenshot of a short video in which a link, shown as an icon of the vendor and an arrow to the available product-item and pointed at with a hand GUI, for purchasing the product-item that is displayed in the video, is integrated into the video, in accordance with an embodiment;
- FIGS. 14B-G is a series of screenshots showing how the innovative platform enables the user to view the video and purchase the product shown in the video without ever leaving the video, in accordance with an embodiment
- FIG. 15 is a schematic diagram of the system architecture for automatic continuous user authentication on a smartphone, in accordance with an embodiment
- FIG. 16 is an accelerometer x-axis plot, depicting a graph of the data from the accelerometer’s reading of a typical user’s handling of their smartphone, in accordance with an embodiment
- FIG. 17 is a schematic diagram of the innovative user authentication CNN model architecture, according to an embodiment.
- Conventionally, when a viewer desires an object in the video and would like to purchase that object, the viewer is required to perform many additional steps on the Internet, such as executing one or more searches for the item, going to one or more vendors’ websites to study the attributes of the product, and opening the webpage to buy the desired item.
- The innovation described herein solves this problem by improving e-commerce technology, both in processing time and in processing cost.
- The techniques described improve digital user-consumption technology and user convenience.
- Instead of the viewer imposing the many costs on the Internet channel described above, the viewer can check out and purchase a product within the video itself.
- The innovation is a system, method, and/or platform (“system,” “process,” or “platform”) that recognizes the objects in a video, automatically places an augmented reality-based (AR) advertisement (ad), and enables the viewer to purchase the product in the advertisement from the video, while still viewing the video.
- The innovation detects all the objects in the video and automatically places the AR advertisement.
- For example, the innovation detects the guitar 102 and automatically places the vendor’s brand indicator, e.g., logo 104, and the guitar’s product information, such that the viewer can purchase the same guitar while viewing the video.
- The innovation involves the features of detection, segmentation of the viewers, and placement of the AR advertisement.
- The innovation parses the video into frames and detects the objects in the frames of the video. The system then automatically examines each pixel in a frame and tags those pixels with one of the detected objects. As a result, there is a great deal of data about what is going on in the video file. At one second there may be seven trees and four birds in this part of the frame; at two seconds, another set of objects; at three seconds, yet another; and so on. For a one-hour video, indexed second by second, there is a whole database of the objects and the coordinates at which the objects appear within the file. Artificial intelligence (AI) techniques are used to describe what is going on in predetermined portions of the video, such as in each corner.
- For example, the AI may indicate that on the left side there is a cup, on the right side there is a white plate, in another corner there is a white cup, and so on.
- The AI describes what is happening in each frame and writes the information to a file, such as a JSON-formatted file. This file may be consulted at any time a request arrives. That is, the process is a one-time analytics pass in which the AI completely describes the video. The AI also identifies the name of any brand that is present in the video, for example, by accessing a dictionary of products stored in the Ad library.
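- The disclosure does not fix a schema for this JSON analytics file. As a hedged illustration, with hypothetical field names, one per-second record might look like the following Python sketch:

```python
import json

# Hypothetical per-second analytics record (field names are illustrative):
# every detected object is logged with its class, its pixel coordinates,
# and, where a match exists in the Ad library, the exact brand/product.
record = {
    "video_id": "v-001",
    "timestamp_sec": 2,
    "objects": [
        {"label": "guitar", "class": "musical_instrument",
         "bbox_xyxy": [120, 80, 420, 560],
         "brand": "ExampleBrand", "product_no": "12345"},
        {"label": "cup", "class": "kitchenware",
         "bbox_xyxy": [10, 400, 90, 470],
         "brand": None, "product_no": None},
    ],
}

# Append the record to the video's JSON analytics file so it can be looked
# up whenever a viewer later requests this video.
with open("video_analytics_v-001.json", "a") as f:
    f.write(json.dumps(record) + "\n")
```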
- The viewer may tap each object (which has been tagged pixel by pixel) and then see an appropriate, corresponding, or matching augmented reality-based advertisement.
- For example, the system had automatically detected the whole guitar, so that when someone taps any part of the guitar, an AR-based advertisement about the guitar comes up.
- Another embodiment is illustrated in FIG. 2.
- Here, the system had previously detected and tagged the shoes 202 and, upon the user clicking or moving the mouse over the shoes, automatically displayed in real-time the AR advertisement 204 with the vendor’s logo 206 to the viewer.
- The system continually monitors and analyzes the viewer’s linking to sites and, based on that information, creates a list of objects that the system determines the viewer wants to see as advertisements. More specifically, the process proceeds as follows. In video 1 there are 10 objects; in video 2 there are 5 objects; in video 3 there are 2 objects; and in video 4 there is the one object with the augmented reality-based advertisement. According to the process, in video 1 the system recognizes the 10 objects, matches the viewer’s demographics and interests with the objects, performs statistical analysis and, subsequently, serves a video with 5 objects that fall within the viewer’s demographics and interests, based on the first video and the viewer’s previous history.
- The process repeats until the system detects the one object that the system determines the viewer is looking to buy, and an appropriate advertisement is placed. That is, the system repeats the process until it finds the object that has the highest probability of the viewer liking it, and an advertisement is placed or overlaid; a minimal sketch of this narrowing follows below. For example, in FIG. 3, the system had recognized the identity of the artist 302 and placed the augmented reality-based animation 304 and a logo 306 about his music.
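- A minimal sketch of this narrowing loop follows; the scoring logic and profile fields are hypothetical, since the disclosure specifies the funnel (10, then 5, then 2, then 1 objects) but not a concrete algorithm:

```python
# Each round keeps only the objects that best match the viewer's
# demographics and interests, until one object remains and its AR ad
# is placed. All names and weights here are illustrative.
def interest_score(obj, viewer):
    """Overlap between object tags and viewer interests, weighted by fit."""
    overlap = len(set(obj["tags"]) & set(viewer["interests"]))
    demo_fit = 1.0 if obj["demographic"] == viewer["demographic"] else 0.5
    return overlap * demo_fit

def narrow_objects(objects, viewer, keep_fraction=0.5):
    """Keep the best-matching half of the objects each round."""
    ranked = sorted(objects, key=lambda o: interest_score(o, viewer), reverse=True)
    return ranked[:max(1, int(len(ranked) * keep_fraction))]

viewer = {"interests": ["music", "sports"], "demographic": "young_man"}
objects = [
    {"name": "guitar", "tags": ["music"], "demographic": "young_man"},
    {"name": "shirt", "tags": ["fashion"], "demographic": "woman"},
    {"name": "sneakers", "tags": ["sports"], "demographic": "young_man"},
    {"name": "plate", "tags": ["kitchen"], "demographic": "woman"},
]
while len(objects) > 1:
    objects = narrow_objects(objects, viewer)
print(objects[0]["name"])  # the single object whose AR ad gets placed
```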
- An embodiment can be understood with reference to FIG. 4, a schematic diagram of a high-level architecture of the network environment 400.
- A creator uploads a video to a site so that the video may be watched later by others.
- For example, creator 420 uploads their video to the database 421 of an enterprise or a social network site, such as, for example, TikTok, YouTube, or Facebook.
- The video analytics component 412 processes the video.
- The video analytics component 412 may include a processor and one or more application programming interfaces (APIs) for processing data in accordance with one or more prescribed algorithms.
- The video analytics component 412 analyzes the video in four parts: it analyzes basic or standard video information; recognizes and identifies objects within the video; determines or establishes the class of each object, according to a predetermined classification system of definitions; and identifies the exact product that each object is, e.g., by referencing a preloaded Ad server or Ad library.
- The video analytics component 412 logs the video file to a logfile or register file. Also, the video analytics component 412 records the data resulting from the analysis to a JSON-formatted file and stores the JSON file in the processed video database 414 for future retrieval.
- The process imports the uploaded video, a video analytics file, and the generated video. These files may or may not be filled with information. These files and their data, including the videos, are stored in the database 414.
- The process extracts the exact date and time of the import, referred to as the import datetime.
- The process imports the detected objects from the database, along with the unique class set and dictionary of the objects.
- For example, the detected objects in the video may be pants, a shirt, and shoes.
- The unique class set is imported.
- For example, the shirt is for a woman, the pants are for a young man, and the shoes are for a young man.
- The unique dictionary is imported.
- The dictionary identifies the exact product for each object; for example, the shoes are Nike, Product No. 12345.
- The exact product was determined by performing a match operation in the Ad server or library (e.g., Ad Server 408), in which vendors have previously provided their product information.
- The informational data is written to a record or file (e.g., a JSON file) for future retrieval.
- The process finds information from each frame of the video, extracting frame-to-frame information from the video.
- The process also extracts the indices from the tuples.
- Here, tuple means that when the video has been divided into thousands of different frames, every frame is assigned a number (1, 2, etc.).
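- As an illustrative sketch, assuming OpenCV (cv2) is available, this frame/tuple bookkeeping might be implemented as follows; the function name frame_tuples is hypothetical:

```python
import cv2

def frame_tuples(path):
    """Divide the video into frames, pairing each frame with its number."""
    cap = cv2.VideoCapture(path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield (index, frame)  # tuple: frame number, pixel data
        index += 1
    cap.release()

# Extracting the indices from the tuples:
indices = [i for i, _ in frame_tuples("uploaded_video.mp4")]
```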
- The system obtains the basic user information and login number.
- The vendor 418 transmits product description data (e.g., product number, color, size, etc.) for potential viewer purchase to Ad server-database 408. This process may be ongoing: when a vendor is ready to sell its products, the vendor may provide the required information to Ad server 408. In an embodiment, the vendor provides the advertisement data by transmitting the data from the vendor’s site 418 to Ad server-database 408 via communication network 419. In another embodiment, the vendor’s site 418 is communicably connected to Ad server 408 directly.
- The viewer requests to view a video.
- For example, viewer 410 makes a request to the frontend 406 component to view a video.
- For instance, the viewer may want to watch a specific video on TikTok.
- The frontend 406 receives the request and performs the following operations: obtains user-behavior analytics (which are continuously being performed) about the user from the user-behavior analytics component 415; and forwards the video request information along with the user-behavior analytics data to the backend 404/API Ad Matching Component 413.
- An embodiment may be understood with reference to Table B, pseudocode of an exemplary user-behavior analytics algorithm, consistent with embodiments herein.
- Video analytics processor 412 sends video information, such as objects recognized in the video, to the API Ad Matching component 413.
- API Ad Matching 413 takes these two inputs (the input from the user-behavioral analytics component and the input from the video analytics component), compares the data from the two inputs, determines matching objects, and assigns their respective scores.
- For example, the backend may indicate that there are 10 objects (shirt, pants, hat, etc.) and the frontend may indicate that there are two objects (shirt and shoes) that the person is interested in.
- API Ad Matching 413 may determine that the shirt gets a score of 8 points (for example, because there is a match of a shirt object from the video analytics input and a shirt from the user-behavioral analytics input) and the shoes get a score of 1 point (e.g., the API Ad Matching component 413 may be instructed to provide a back-up or default-type object, or the shoes might get a score of 1 depending on other factors such as past input objects).
- Based on the scores, the API Ad Matching component 413 may determine that just the Shirt Ad, and not the Pant Ad, is requested (e.g., the component may be instructed to submit only one object, e.g., the object with the highest score, and its attributes to the Ad Server 408); a minimal sketch of this matching step follows below.
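- A hedged sketch of this matching-and-scoring step, using the shirt/shoes example above (the function and the fixed scores are illustrative, not the disclosed algorithm):

```python
# Objects recognized in the video (backend input) are compared with
# objects the viewer is interested in (frontend input); matches are
# scored, a back-up object may get a minimal score, and only the top
# object goes to the Ad Server.
def match_and_score(video_objects, interest_objects, default_obj=None):
    scores = {obj: 8 for obj in video_objects if obj in interest_objects}
    if default_obj is not None and default_obj not in scores:
        scores[default_obj] = 1  # back-up or default-type object
    return scores

video_objs = ["shirt", "pants", "hat"]
interest_objs = ["shirt", "shoes"]
scores = match_and_score(video_objs, interest_objs, default_obj="shoes")
best = max(scores, key=scores.get)  # only the highest score is submitted
print(best, scores[best])           # shirt 8
```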
- Ad Server 408 receives the score and attributes, finds/determines the corresponding (e.g., best) ad for that video, and sends the Ad file corresponding to the product back to the backend 404/API Ad Matching 413 component.
- The Ad Server 408 had been prepopulated by the vendors with product-related data, supported by appropriate storage capabilities, such as dictionaries, look-up tables, databases and related support, and the like.
- The vendor loads the product-related data to the Ad Server component 408. This process of the vendor uploading products for selling or promoting may occur on a continual basis, by the same vendor or other vendors.
- The API Ad Matching component 413 receives the best Ad file and performs a double-checking operation for accuracy. For instance, it checks whether the attributes of the object in the received Ad file match those of the product (e.g., is the product received a green, women’s shirt?).
- The backend component 404 delivers the requested video file from the processed video database, together with the best Ad file from the API Ad Matching component 413, to the frontend component 406.
- The content layering component 416 receives the Ad file and the video and layers the content of the Ad over the requested video to create a video plus Ad package. It should be appreciated that such content layering process may occur in real-time, while the viewer is already watching the video. Or, the content layering process may occur before the viewer begins to watch the video, when the video is presented to the viewer. For instance, the ad content may have been layered on specific frames and is not presented until the viewer’s streaming process arrives at those frames.
- The advertisement content may be layered, superimposed on, or injected into the video according to one or more of the following processes: content layering; content injection; content repurposing; content re-forging; and GAN-enabled content editing.
- The term content layering may be used herein to mean any of the specific processes outlined above and is not meant to be limiting.
- Content-layer, as used herein, may mean in-video product placement. Examples of such in-video product placement are illustrated in FIG. 1 (104 and 106), FIG. 2 (204 and 206), and FIG. 3 (304 and 306). A minimal layering sketch follows below.
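- A minimal content-layering sketch, assuming OpenCV and that the ad image fits inside the frame at the given coordinates; the alpha value is illustrative:

```python
import cv2

def layer_ad(frame, ad_img, x, y, alpha=0.8):
    """Alpha-blend the ad image over the product's region in one frame,
    producing part of the "video plus Ad" package."""
    h, w = ad_img.shape[:2]
    roi = frame[y:y + h, x:x + w]
    frame[y:y + h, x:x + w] = cv2.addWeighted(ad_img, alpha, roi, 1 - alpha, 0)
    return frame
```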
- The frontend 406 delivers the video plus Ad package to the viewer 410 for viewing.
- The viewer decides to purchase the product within the video.
- The viewer 410 interacts with the video to purchase the presented product by clicking or otherwise selecting the Ad to buy the product within the video, selecting the overlaid product-detail window, and entering required viewer informational data (e.g., address, payment information (e.g., credit card number), and attributes of the product).
- The product placement GUI (e.g., 106 of FIG. 1 or 204 of FIG. 2) may have an embedded link such that when the viewer clicks on the GUI, another GUI (e.g., a dialog window) may pop up into which the viewer may insert the purchase information discussed above.
- The purchase product 417 component may perform the following operations:
- The link may take the viewer to an Ad for the product, from which the viewer may continue to purchase the product.
- Alternatively, the link may take the viewer to the vendor’s site 418, from which the viewer may continue to purchase the product.
- The frontend 406/purchase product 417 component takes the three pieces of viewer informational data described above and sends such information to the vendor’s site 418 to complete the purchase.
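- As an illustration, the three pieces of viewer informational data might be assembled into a payload like the following (field names are hypothetical):

```python
# Illustrative purchase payload gathered from the overlaid product-detail
# window, as transmitted to the vendor's site (e.g., 418 of FIG. 4):
purchase_request = {
    "delivery_address": "123 Main St, Springfield",                 # where to deliver
    "payment": {"type": "credit_card", "number": "XXXX-XXXX-XXXX-1234"},  # how to pay
    "product": {"vendor": "Nike", "product_no": "12345",
                "color": "green", "size": "M"},                     # what to buy
}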
- Communication network 419 is illustrated as a generic communication system.
- In an embodiment, the communication network 419 comprises the Internet.
- In an embodiment, the communication network 419 comprises the API Gateway 402, a logical hub for the APIs of the frontend 406, the backend 404, and the Ad Server 408 to connect and communicate.
- The API Gateway 402 monitors and organizes API requests from the interested parties.
- The API Gateway 402 may perform other auxiliary operations, such as authentication, rate limiting, and so on. Accordingly, an interface may be a modem or other type of Internet communication device.
- Alternatively, the communication network 419 may be a telephony system, a radio frequency (RF) wireless system, a microwave communication system, a fiber optics system, an intranet system, a local access network (LAN) system, an Ethernet system, a cable system, a cellular system, an infrared system, a satellite system, or a hybrid system comprised of multiple types of communication media.
- Interfaces are configured to establish a communication link or the like with the communication network 419 on an as-needed basis, and are configured to communicate over the particular type of communication network 419 to which they are coupled.
- The frontend 406 is a viewer- or end-user-facing component such as those provided by enterprises.
- Enterprises hosting the frontend 406 component may include social networking sites (e.g., Facebook, Instagram, etc.) or dedicated short video viewing sites (e.g., TikTok).
- The viewer device, which may be represented as viewer 410, may be directly connected with the frontend component 406 as an in-app connection.
- An embodiment can be understood with reference to FIG. 5, a schematic diagram of the API Ad matching process and best Ad file delivery process 500.
- The video analytics processor with API (e.g., 412 of FIG. 4) sends video analytics data to the API Ad Matching processor (e.g., 413 of FIG. 4).
- The user-behavior analytics processor and API component (e.g., 415 of FIG. 4) sends user-behavioral data of the viewer also to the API Ad Matching processor (e.g., 413 of FIG. 4).
- The API Ad Matching processor (e.g., 413 of FIG. 4) compares the data from the two inputs to generate objects deemed to be common to both inputs and their respective scores, as described above.
- The API Ad Matching processor (e.g., 413 of FIG. 4) sends the score(s) and corresponding object attributes to the Ad database or library (e.g., 408 of FIG. 4) to obtain a product match for each object.
- In an embodiment, the API Ad Matching processor (e.g., 413 of FIG. 4) may send only the highest score and corresponding attributes to the Ad database or library (e.g., 408 of FIG. 4) to obtain a matching product.
- In some cases, no product may be returned.
- Alternatively, a product that is the best-fit match may be returned.
- The Ad database or library sends the Ad file to the API Ad Matching processor (e.g., 413 of FIG. 4). Such processor may verify that the item in the Ad file is correct, within a predetermined tolerance. Subsequently, the Ad file is sent to the frontend component 406. In an embodiment, the Ad file is sent back to the user-behavior analytics processor and API component (e.g., 415 of FIG. 4), which then sends it to the appropriate processing component within the frontend 406. Alternatively, the Ad file is sent back to the content layering/operations component (e.g., 416 of FIG. 4) to be layered or otherwise ingested into the video.
- An embodiment of a video analytics process can be understood with reference to FIG. 6, a flow diagram for analyzing and storing video data, including identifying objects within the video and identifying products within the video. It should be appreciated that this process may be performed by the video analytics component 412 executed by a processor; a condensed sketch of the pipeline follows the steps below.
- The process includes parsing the video file frame-by-frame and performing the following operations.
- The process identifies basic informational data about the video, such as the title and length.
- The process identifies one or more content objects in the video, such as, for example, shirts, a guitar, shoes, and a skateboard.
- The process identifies a classification of each of the identified one or more content objects of the video.
- For example, the shirt may be classified as a men’s shirt.
- As another example, the skateboard may be classified as a young person’s toy.
- The process identifies a purchasable product for each of the identified one or more content objects of the video.
- For example, video analytics component 412 may access or obtain from Ad server-database 408 exact product data that corresponds to the objects in the video.
- For instance, video analytics component 412 may obtain from Ad server-database 408 that the type of shirt is a men’s Nike sportswear shirt in the color green, with product number 23456.
- The process logs the video file in a locally stored registry so that the system knows that it is there and can access the video file information at a subsequent time.
- For example, the log entry may include the title of the video, the identification number of the video, and other attributes such as the size of the video and when it was obtained.
- The process generates the video analytics file, in which the information necessary for a viewer to request the video and have the best Ad displayed thereon is stored.
- In an embodiment, such file is a JSON-formatted file.
- The process stores the video analytics file for future retrievals. That is, each time a viewer requests to view the video, the full analytics of the video are present in the file.
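- The condensed pipeline sketch referenced above; the detection and classification models are stubbed out, and all helper names are hypothetical:

```python
import json

def analyze_video(video_path, ad_library):
    """Condensed sketch of the FIG. 6 pipeline; models are stubbed below."""
    info = {"title": video_path, "length_sec": 0}       # basic informational data
    objects = detect_objects(video_path)                # content objects
    for obj in objects:
        obj["class"] = classify(obj)                    # classification step
        obj["product"] = ad_library.get(obj["label"])   # exact-product lookup
    registry_log(info)                                  # locally stored registry
    analytics = {"info": info, "objects": objects}
    with open(video_path + ".analytics.json", "w") as f:
        json.dump(analytics, f)                         # stored for future retrieval
    return analytics

def detect_objects(video_path):
    # Stand-in for a real per-frame object detector.
    return [{"label": "shirt"}, {"label": "skateboard"}]

def classify(obj):
    # Stand-in for the class-assignment model.
    classes = {"shirt": "men's shirt", "skateboard": "young person's toy"}
    return classes.get(obj["label"])

def registry_log(info):
    with open("video_registry.log", "a") as f:
        f.write(json.dumps(info) + "\n")

analytics = analyze_video("skate_video.mp4",
                          {"shirt": "Nike men's sportswear shirt, green, No. 23456"})
```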
- The Ad that is displayed to the viewer is dependent on the input data from the user-behavioral analytics component (e.g., user-behavioral analytics processor and API 415).
- That is, the video and the Ad information are delivered to the viewer, and it is the input from the user-behavioral analytics component that determines which Ad is presented.
- Thus, the video and the best Ad are delivered to the viewer.
- An embodiment can be understood with reference to FIG. 7, a flow diagram for analyzing user-behavioral data 700, including identifying objects viewed on videos.
- An exemplary user-behavioral analytics processor with API may be 415 of FIG. 4.
- The process determines what objects in the video the viewer likes best, based on rating-ranking-page optimization-content aware recommendations. For example, the process may determine that the viewer has a young adult son who has a birthday approaching and that the viewer has been determined to like sports clothing for young men.
- The process generates and continually updates a user vector corresponding to the viewer.
- For example, the user vector may contain information identifying the viewer as being a late middle-aged woman of short stature, having a post-graduate degree, and having a household income in a specific range.
- The process refines what objects in the video the viewer likes best, based on recommendations using the user vector. For example, the video may contain an object with a designer shirt for men. The process may determine that the viewer may like this item.
- The process further refines objects in the video the viewer likes best, based on other-user data, using content-based and collaborative behaviors of users and recommendation systems. For example, the process may discover that the viewer has friends on social networking applications who like traveling to resorts.
- Accordingly, the process may determine that among the already selected objects, the viewer likes the objects which can be used in hot weather and for a vacation at a resort-type location. For instance, the process may identify a young man’s trending Nike cap, along with a tasteful pair of sunglasses, as being of high interest to the viewer.
- The process further refines the objects in the video the viewer likes best, based on data that is not required, using collaborative denoising auto-encoders for top-N recommender systems. For example, the process may eliminate as irrelevant data indicating that the viewer was viewing links and videos on baby items for one day only, as opposed to numerous days looking at vacation-related videos and links.
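- A hedged sketch of the user-vector refinement; the vector's fields, weights, and update rule are illustrative, since the disclosure names the techniques but not a concrete schema:

```python
# Hypothetical user vector: demographics plus weighted interests.
user_vector = {
    "demographics": {"age_band": "55-64", "household_income": "mid-high"},
    "interest_weights": {"sports_clothing": 0.8, "resort_travel": 0.6},
}

def update_interest(user_vector, obj_tag, signal, lr=0.1):
    """Nudge an interest weight up/down from a viewing signal in [-1, 1],
    clamped to [0, 1]."""
    w = user_vector["interest_weights"].get(obj_tag, 0.0)
    user_vector["interest_weights"][obj_tag] = min(1.0, max(0.0, w + lr * signal))

# e.g., the viewer lingered on a young man's trending Nike cap:
update_interest(user_vector, "sports_clothing", signal=1.0)
# ...and one-day-only baby-item browsing is treated as noise and down-weighted:
update_interest(user_vector, "baby_items", signal=-0.5)
print(user_vector["interest_weights"])
```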
- The platform includes one API set in the backend server 404 and one API set in the frontend server 406.
- Such APIs connect three things:
- the backend database, where the videos are stored (e.g., database 414); the frontend viewing experience (frontend 406, accessible to viewer 410), where the end-user is going to view the video; and the ad servers (e.g., Ad server-database 408).
- The backend API (e.g., video analytics processor with API 412) does a few things, including looking at every single frame, analyzing what is going on there, and keeping a log file of the information.
- The frontend 406 (e.g., via user-behavioral analytics processor and API 415) is constantly reacting to and/or trying to figure out what the viewer or end-user is interested in.
- On the viewer side, suppose the viewer is watching a video of someone dancing. For that video, the viewer has requested a video file, and the backend server had sent the video file to the viewer. In the backend, there is one API (the backend API) that knew that the viewer is now ready to be served an ad. Thus, the backend API is prepared to communicate with the frontend, indicating: show this small file to this user. At the same time, while on the way, the backend picks up a small ad from the Ad Library and brings it with the video for delivery to the frontend.
- The video file is stored on the backend server, and there is an API that has all the analytics, as discussed above.
- The backend API knows that the viewer is ready to see an Ad.
- The frontend requests, e.g., by sending objects to the backend, that the viewer see even more interesting guitar-related ads.
- The backend (e.g., the API Ad Matching Processor 413) accesses the Ad Server 408 and gets a Guitar Ad from the Ad Library 408.
- The frontend API picks up that Guitar Ad and layers that Ad in front of the viewer, in real-time or as a different video is being delivered.
- An embodiment can be understood with reference to FIG. 8, a network environment for when a viewer views the video. The server (e.g., frontend 406) receives a request 802 for a video from the client device 804.
- An authorization processor 808 determines whether the client is authorized to obtain or view the video. For example, the server checks whether it is a private video or a public video. If it is a public video, then the server checks whether this client has access ability. If so, the process continues along the “yes” path. If not, an error code 818 is returned to or displayed for the client 804. If yes, control goes to an API Endpoint processor 812. Such API Endpoint processor 812 checks whether the request, itself, is acceptable 810.
- If not, the appropriate error handling 822 is performed. If yes, control goes to a sanitization processor 816. Such processor strips the request of potentially dangerous protocols at step 814. After doing so, at step 824, the video is uploaded at the client, along with the user history being updated and the appropriate Ads from the Ad library being sent with the video. At step 826, the process is updated and goes to the backend server 820, to the Ad library, to generate an Ad ID. At step 828, the server 820 updates the Ad library or provides user history and generates the most promising Ad(s) ID(s). The most promising Ad(s) ID(s) are used to obtain the most promising Ads. Such most promising Ads are transmitted by the server 820 to the client 804 for display 830. A condensed sketch of this request path follows below.
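- The condensed sketch referenced above; the predicates, error codes, and helper are illustrative stand-ins for processors 808, 812, 816, and server 820:

```python
def handle_video_request(request):
    """Sketch of the FIG. 8 path: authorize -> validate endpoint ->
    sanitize -> fetch promising ads -> return video plus ads."""
    if not request.get("authorized"):       # authorization processor 808
        return {"error": 818}               # error code shown to the client
    if not request.get("endpoint_ok"):      # API Endpoint processor 812
        return {"error": 822}               # appropriate error handling
    # Sanitization 814/816: strip a potentially dangerous protocol.
    request["url"] = request["url"].replace("javascript:", "")
    ad_ids = most_promising_ads(request["user_history"])  # backend 820/826/828
    return {"video": request["url"], "ads": ad_ids}       # display 830

def most_promising_ads(history):
    # Stand-in for Ad-library matching: derive ad IDs from recent interests.
    return ["ad-" + h for h in history[:3]]

print(handle_video_request({
    "authorized": True, "endpoint_ok": True,
    "url": "/videos/dance.mp4", "user_history": ["guitar", "shoes"],
}))
```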
- A video has a lot of personal information in terms of its labels.
- In an embodiment, the labels are stripped and only the video files are sent to the client.
- For example, suppose a creator uploads a video on a company’s (e.g., TikTok) video site.
- The site records details about the video; for instance, that John, of such-and-such an age and with other personal details, is the one who sent this video.
- This is very private information. The viewer only sees the video content, e.g., that it is a dance video. The sensitive information is stripped.
- In an embodiment, step 802 may be executed by the viewer 410; step 824 (upload video) may be executed by creator 420; step 826 (process updated Ad library and generate Ad ID) may be executed by Ad server-database 408; and step 830 (display suitable Ads) may be executed by frontend server 406.
- An embodiment can be understood with reference to FIG. 9, a network environment for during each video upload.
- The server (e.g., frontend 406) receives a request 902 for a video from the client device 904.
- An authorization processor 908 determines whether the client is authorized to obtain or view the video. For example, the server checks whether it is a private video or a public video. If it is a public video, then the server checks whether this client has access ability. If so, the process continues along the “yes” path. If not, the authorization process 908 may determine that the API key is invalid 906. In this case, an error code 918 is returned to or displayed for the client 904. If yes, control goes to an API Endpoint processor 912.
- Such API Endpoint processor 912 checks whether the request, itself, is acceptable 910. If no, then the appropriate error handling 922 is performed. If yes, control goes to a sanitization processor 916. Such processor strips the request of potentially dangerous protocols at step 914. After doing so, at step 924, the video is uploaded at the client. At step 926, the server 920 generates video analytics information about the video. At step 928, using the video analytics information, the server 920 generates the injected video. At step 930, the server 920 returns the video ID to the client 904.
- In an embodiment, step 902 may be executed by the viewer 410; step 924 (upload video) may be executed by creator 420; and step 926 may be executed by the video analytics component 412.
- An embodiment can be understood with reference to FIG. 10, a network environment for the API overall workflow. A request processor 1004 receives the request 1002.
- The authorization processor 1008 determines whether the request was authorized 1006. If not, it is found that the API is not valid 1014.
- In that case, an error handling code is activated 1018. If yes, the API Endpoint processor 1012 checks whether the request is acceptable 1010. If not, the error handling code is activated 1018. If yes, the sanitization processor 1016 performs endpoint-specific sanitizing 1022, consistent with embodiments described above. Then the database server 1020 determines whether the request has been cached 1024. If not, the request is stored in cache 1028.
- If cached, the database server 1020 determines whether the date and time of the request is within a freshness lifetime threshold 1026. If not, the database server 1020 determines whether the request has been revalidated 1030. If not, the request is stored in cache 1028. If yes, the request is a success and a success code 1032 is associated with the request 1004. If the database server 1020 determines that, yes, the date and time of the request is within the freshness lifetime threshold 1026, then the request is a success and a success code 1032 is associated with the request 1004. At step 1034, the request is complete. A sketch of this cache-freshness path follows below.
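- The cache-freshness sketch referenced above; the lifetime value and the fetch() stand-in are illustrative:

```python
import time

FRESHNESS_LIFETIME = 300.0  # seconds; illustrative threshold

cache = {}  # request key -> (response, stored_at, revalidated)

def fetch(request_key):
    return "response-for-" + request_key  # stand-in for the database server

def serve(request_key):
    """Sketch of the FIG. 10 path: cached? -> within freshness lifetime?
    -> revalidated? -> success."""
    entry = cache.get(request_key)
    if entry is None:                                   # not cached (1024)
        cache[request_key] = (fetch(request_key), time.time(), False)
        return cache[request_key][0]                    # success code 1032
    response, stored_at, revalidated = entry
    if time.time() - stored_at <= FRESHNESS_LIFETIME:   # fresh (1026)
        return response                                 # success code 1032
    if revalidated:                                     # revalidated? (1030)
        return response                                 # success code 1032
    cache[request_key] = (fetch(request_key), time.time(), False)  # re-store (1028)
    return cache[request_key][0]

print(serve("video-123"))  # first call stores the response in the cache
print(serve("video-123"))  # second call is served from the fresh cache
```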
- In an embodiment, step 1002 may be executed by the viewer 410; steps 1024 (cached?), 1026 (within freshness lifetime?), 1028 (store in cache), and 1030 (revalidated?) may be executed by backend server 404; and step 1034 (request complete) may be executed by the frontend server 406.
- A method for providing in-video product placement and in-video purchasing capability using augmented reality is disclosed.
- The method includes: receiving a request for a video, the request initiated by a viewer; in response to receiving the request, retrieving, from a video database, a video analytics data file corresponding to the requested video, the video analytics data file comprising analytical data about the video; receiving a user-behavioral data file, the user-behavioral data file comprising information about the viewer’s current viewing habits and viewing preferences; transmitting the video analytics data file and the user-behavioral data file to an ad matching component executed by a processor; comparing, by the ad matching component, the video analytics data file and the user-behavioral data file and determining one or more similar objects from each of the files that are considered to be matching; generating, by the ad matching component, a corresponding one or more scores for the one or more matching objects; transmitting, to an ad server component executed by a processor, a highest score corresponding to one of the matching objects and attributes of the one of the matching objects; and receiving, from the ad server component, an ad file.
- In an embodiment, the video analytics file was generated by a video analytics component executed by a processor, the video analytics component parsing the video file frame-by-frame and performing the operations of: identifying basic informational data about the video; identifying one or more content objects in the video; identifying a classification of each of the identified one or more content objects of the video; identifying a purchasable product for each of the identified one or more content objects of the video; logging the video file in a locally stored registry; generating the video analytics file; and storing the video analytics file for future retrievals.
- In an embodiment, the video analytics file is a JSON-formatted file.
- In an embodiment, the method further comprises: receiving, by the ad server component on an ongoing basis from one or more vendors, product description data corresponding to products that are offered by the one or more vendors; and storing, by the ad server component in a local dictionary, the product description data about the product for future retrieval; wherein the product description data is sufficient for the vendor to identify the product for a purchase request.
- The product description data comprises: the price of the product; a product identifier; and product attributes.
- In an embodiment, the method comprises receiving, by a purchase product component executed by a processor and from an overlaid product-detail window in the video file as the video file is being streamed by the viewer, a request to purchase a product corresponding to the product in the overlaid product-detail window, wherein the request comprises required viewer informational data for purchasing the product.
- The required viewer informational data comprises three pieces of information: the delivery address for where the product is to be delivered; credit card information, debit card information, or other currency information for actuating the purchase of the product; and informational data about the product for identifying the product.
- The method comprises transmitting, by the purchase product component, the three pieces of information to the vendor, causing the completion of the purchase.
- In an embodiment, the user-behavioral data file was generated by a user-behavioral component executed by a processor, the user-behavioral component performing operations as the viewer views each video of one or more videos, the operations comprising: determining what objects in the video the viewer likes best, based on rating-ranking-page optimization-content aware recommendations; generating and continually updating a user vector corresponding to the viewer; refining what objects in the video the viewer likes best, based on recommendations using the user vector; further refining what objects in the video the viewer likes best, based on other-user data, using content-based and collaborative behaviors of users and recommendation systems; and further refining what objects in the video the viewer likes best, based on data that is not required, using collaborative denoising auto-encoders for top-N recommender systems.
- In embodiments, the recommendation operations use techniques such as: regression models, comprising logistic, linear, and elastic nets;
- tree-based methods, comprising gradient-boosted trees and random forests;
- matrix factorizations, comprising factorization machines; restricted Boltzmann machines; Markov chains and other graphical models;
- clustering, from k-means to HDP; deep learning and neural nets; linear discriminant analysis; and association rules.
- In an embodiment, the method further comprises generating, monitoring, and updating the following parameters on a continual basis: titles watched or abandoned in the recent past by the viewer; members with similar tastes and/or user data; titles with similar attributes; the propensity of the viewer to re-watch a video; preference for what the viewer is watching; ratings of the videos being watched by the viewer; the time of day of the viewing session by the viewer; the voracity of video consumption by the viewer; and the use of list searches by the viewer.
- FIG. 11 is a block schematic diagram of a system in the exemplary form of a computer system 1100 within which a set of instructions for causing the system to perform any one of the foregoing methodologies may be executed.
- the system may comprise a network router, a network switch, a network bridge, personal digital assistant (PDA), a cellular telephone, a Web appliance or any system capable of executing a sequence of instructions that specify actions to be taken by that system.
- PDA personal digital assistant
- the computer system 1100 includes a processor 1102, a main memory 1104 and a static memory 1106, which communicate with each other via a bus 1108.
- the computer system 1100 may further include a display unit 1110, for example, a liquid crystal display (LCD) or a cathode ray tube (CRT).
- The computer system 1100 also includes an alphanumeric input device 1112, for example, a keyboard; a cursor control device 1114, for example, a mouse; a disk drive unit 1116; a signal generation device 1118, for example, a speaker; and a network interface device 1128.
- the disk drive unit 1116 includes a machine-readable medium 1124 on which is stored a set of executable instructions, i.e. software, 1126 embodying any one, or all, of the methodologies described herein below.
- the software 1126 is also shown to reside, completely or at least partially, within the main memory 1104 and/or within the processor 1102.
- the software 1126 may further be transmitted or received over a network 1130 by means of a network interface device 1128.
- A different embodiment uses logic circuitry instead of computer-executed instructions to implement processing entities. Depending upon the particular requirements of the application in the areas of speed, expense, tooling costs, and the like, this logic may be implemented by constructing an application-specific integrated circuit (ASIC) having thousands of tiny integrated transistors.
- Such an ASIC may be implemented with CMOS (complementary metal oxide semiconductor), TTL (transistor-transistor logic), VLSI (very large scale integration), or another suitable construction.
- Other alternatives include a digital signal processing chip (DSP), discrete circuitry (such as resistors, capacitors, diodes, inductors, and transistors), a field programmable gate array (FPGA), a programmable logic array (PLA), a programmable logic device (PLD), and the like.
- A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine, e.g., a computer.
- For example, a machine-readable medium includes read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical, or other forms of propagated signals (for example, infrared signals, digital signals, etc.); or any other type of media suitable for storing or transmitting information.
- Embodiments may include performing operations and using storage with cloud computing.
- For example, cloud computing may mean executing algorithms on any network that is accessible by internet-enabled or network-enabled devices, servers, or clients and that does not require complex hardware configurations, e.g., cabling, or complex software configurations, e.g., requiring a consultant to install.
- Embodiments may provide one or more cloud computing solutions that enable users, e.g., users on the go, to purchase a product within the video on such internet-enabled or other network-enabled devices, servers, or clients.
- One or more cloud computing embodiments include purchasing within the video using mobile devices, tablets, and the like, as such devices are becoming standard consumer devices.
- The innovative platform is configured to open up a new branch to its existing automated video ads.
- APIs are provided that transform any website in the world with the capability to have stories or video summaries on the user’s homepage.
- An embodiment can be understood with reference to FIG. 12, a screenshot 1200 of an e-commerce website on a smartphone, the screenshot showing a list of four video options 1202, shown as circled, that are selectable by the user to activate the corresponding video.
- This home page, as depicted by the highlighted Home icon 1204, is dynamic due to the innovation’s APIs that create and make accessible such user-targeted video summaries or stories, as described in detail below.
- The platform embeds the user-targeted video summaries or stories into the home page of the particular vendor or provider website.
- The innovation may be referred to as a system, method, mobile application (“app”), and/or platform, and refers to any of such interchangeably herein (e.g., “system,” “process,” “platform,” or innovation).
- The platform summarizes the videos that are available of the different products and creates a personalized feed for every single user.
- The platform’s APIs, described below, are configured to work with any mobile application (sometimes referred to herein as “app”) or website to create this functionality.
- An example of such an app or website is the ebay website as shown in FIG. 12. It should be appreciated that other websites are contemplated, such as, but not limited to, Macy’s, Target, short form video media, live video, game streaming, over-the-top video platforms, telecommunications, social media, multichannel video programming distribution, and streaming companies.
- The platform is configured to bring video technology into any commerce website or any app with a personalized feed of products, video, or any service that the specific website or app is offering.
- This innovation changes the plain homepage of apps into a vibrant service which excites its users with a feed designed for them.
- The platform processes its behavioral pairing and product-user data matching.
- In an embodiment, the platform uses the user behavioral analytics processor and API component 415 of FIG. 4 and the video analytics processor with API component 412 of FIG. 4.
- The platform is configured to bring a multitude, e.g., 10-20, of products-in-video or services-in-video.
- The platform is configured to extend the service or functionality described above to real-time personalized offers.
- For example, a service company and/or app with an Internet presence can transform traditional email-based offers into a personalized feed that the platform’s API created by matching the behavior of the user and the content, plus ultra-targeted video offers such as illustrated in FIG. 13 and FIG. 14A.
- FIG. 13 is a screenshot 1300 of a cooked dish with an advertiser’s company logo 1302 and link 1304 to a discount super-imposed over the original display of the cooked dish.
- The platform optimizes the accessibility of online purchasing of a product by presenting or integrating, on the webpage or app, the link to the specific page where the user can purchase the product.
- The platform is configured to provide, through the APIs, an out-of-app personalization.
- The platform creates a personalized feed of products or ads for the user, which is referred to as an Off-App optimization of the APIs.
- The innovation with its APIs enables a third-party app, such as TikTok or another short video app, to bring customers to a vendor’s, such as ebay’s, website and keep them there by using and incorporating the innovation’s personalized feed.
- The platform’s APIs serve some underlying parts.
- For Apps/websites sellers (e.g., vendors such as ebay, Hulu, Verizon, Twitter, and Roku), the innovation provides a video portal where such sellers can connect their video, such as illustrated in FIG. 12.
- This process can be skipped if the app and/or website already has the video capability where its sellers and/or content publishers post the video. Examples include: (1) when a user gives his/her phone to someone else, the innovative platform will be able to understand the user and not show them the ads, to make sure there is the highest impact; (2) when a user sees an e-commerce ad within the video, they can just tap one time and the product gets checked out, as the platform is configured to authenticate the user and complete the e-commerce; and (3) the system is configured to put the ads in live streams based on the users’ interest.
- The platform’s API goes to the database (e.g., Creator Video Database 421) and extracts the core features from the video, as described in detail above.
- An example of an extracted core feature is the top the dancer is wearing in FIG. 14A. That feature is matched (e.g., by API Ad Matching Processor 413) with the current homepage (e.g., as shown in FIG. 12) and with the specific user’s interests (e.g., as obtained by User Behavioral Analytics Processor and API 415). Then the platform’s API brings those videos onto the home page as the summary of the homepage or as a personalized user story.
- The system collects accelerometer and gyroscope data capturing the pattern of users as they use their phone in the usual way while performing activities such as typing, scrolling, calling, chatting, walking, and holding, and sends the GPS location (latitude, longitude).
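- As a hedged sketch of how such sensor readings might be windowed for the user authentication CNN of FIGS. 15-17 (the window size, channel layout, and Keras-style model call are assumptions, not the disclosed architecture):

```python
import numpy as np

WINDOW = 128  # samples per window, e.g., ~2.5 s at 50 Hz (illustrative)

def to_windows(samples):
    """samples: (N, 6) array of [ax, ay, az, gx, gy, gz] readings,
    reshaped into fixed-length windows for the CNN."""
    n = (len(samples) // WINDOW) * WINDOW
    return samples[:n].reshape(-1, WINDOW, 6)

def is_owner(window, model, threshold=0.9):
    """Accept the session if the model's owner probability clears the
    threshold; model.predict is a hypothetical Keras-style call."""
    prob = model.predict(window[None, ...])[0]
    return prob >= threshold

readings = np.random.randn(1024, 6)  # stand-in for live sensor data
windows = to_windows(readings)
print(windows.shape)  # (8, 128, 6)
```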
- The API enables videos to feature offers that are available, from discounts to dynamic pricing, for example, as shown in FIGS. 12-14A.
- The platform’s APIs enable these videos to be clickable.
- An embodiment can be understood with reference to FIGS. 14B-G, a series of screenshots showing how the innovative platform enables the user to view the video and purchase the product shown in the video without ever leaving the video.
- FIG. 14B is a screenshot of a young man talking, in which he is shown wearing an attractive hoodie. That specific hoodie is available on Amazon and, further, its product information has been processed by the platform, as discussed in embodiments above. Also, as discussed in embodiments above, the platform displays the vendor’s logo (here, the Amazon logo) in the vicinity of the hoodie (e.g., overlaid on the hoodie).
- FIG. 14C depicts a screenshot of the video about a second further along.
- Here, the user is interested in purchasing that very same hoodie, or is at least interested in learning more about the hoodie product. This situation is depicted by the hand pointer pointing at the Amazon logo to make the selection (e.g., click).
- FIG. 14D depicts a screenshot of the video after the Amazon logo has been selected (shown here at second 4).
- a checkout-type box is displayed. The box lists data that is pertinent about the hoodie and that was either previously stored in the platform’s storage or, in another embodiment, is provided in real time by the vendor. In this example, the name of the item and the price are shown in the box.
- the box displays a payment option, which the platform is configured to determine is the user's preferred payment option, and a checkout button.
- the hand cursor is over the checkout button, indicating that the user wants to purchase the hoodie (i.e., checkout).
- FIG. 14E is a screenshot of the video after the user has clicked the checkout button (shown here still at second 4).
- the checkout button, immediately or after a predetermined amount of time, changes to show the text "Processing," to indicate that the purchase is in progress but not yet completed.
- FIG. 14F is a screenshot of the video after the purchase has been completed (shown here at second 6).
- the platform indicates that the purchase process is complete by changing the text on the button to display "Thank you!"
- FIG. 14G is a screenshot of the video after the purchase has been completed, and in which the video continues to stream (shown here at second 7).
- the innovative platform is configured to enable the user to view the video and purchase the product shown in the video without ever leaving the video.
- Sellers (e.g., clothing companies) upload their videos (e.g., to Creator Video Database 421).
- These videos are processed by the platform's API (e.g., by Video Analytics Processor with API 412) and stored in Processed Video Database 414.
- the platform determines the top number (e.g., 20) of videos that match (e.g., by API Ad Matching Processor 413) the specific consumer, and such videos are presented at the top of the page (e.g., 1202 of FIG. 12).
- the user will have the option of viewing each of the presented videos and, from the video, they can see the price and offers of each product through a short (e.g., 5-10 second) video. On clicking one of the videos, they will be directed, by the platform, to the specific page where they can buy the product.
- the innovative APIs transform a website/app from a static page into a dynamic, video- and consumer-behavior-driven experience.
- the innovative APIs close the loop; they enable an end-to-end solution for the short video industry, including product discovery through the innovative in-video ads and the e-commerce integration with them.
- the platform processes continuous authentication of the user on the user’s smartphone based on human behavioral patterns using mobile sensors.
- the innovation eliminates or reduces the requirement of receiving user input, such as typing or biometric input, frequently for user authentication, thereby providing a hassle-free experience by sparing the user from having to type the password frequently. It has been found that secure private data, even passwords, can leak to intruders.
- the innovation is a system, method, and/or platform and refers to any of such interchangeably herein (e.g., “system,” “process,” “platform,” or innovation).
- the innovation requires continuous authentication to solve the following problems.
- the intruder might be lucky (e.g., via a side channel attack) and obtain the password of the smartphone; the intruder might then try to pull, gather, or obtain private information from the smartphone.
- the innovative platform is configured to automatically detect the intruder by using pattern analysis and to restrict him/her from accessing data on the smartphone.
- Remote users can access the smartphone with the user's username and password (using techniques such as snooping, phishing, or brute force) if the authentication credentials are in digital or text format; however, it is almost impossible to mimic a user's behavioral patterns.
- the innovative platform processes pattern recognition of user behavior using mobile phone sensor technology for automatic continuous user authentication.
- Login-based authentication checks a user's identity only once, at the start of a login session, whereas automatic continuous authentication recognizes the correct user for the duration of ongoing work.
- Secure passwords are often not considered appropriate because they are lengthy and contain alphanumeric symbols, which requires users to spend their precious time inputting passwords.
- the innovative platform eliminates or reduces requiring the existence of or the use of such secure passwords.
- the user’s fingertips often leave a distinguishing trace on the screen, which can indicate the pattern that was used to access the device.
- the innovative platform eliminates or reduces such distinguishing trace on the screen.
- Cyber security specialists suggest not to have a similar password for different accounts, thus it is a tedious task for users to memorize different passwords for each account. The innovation helps reduce this tedious task for users.
- the platform handles discrete data and, thus, samples continuous data.
- the system collects data when it detects motion and at every predetermined number of seconds (e.g., 12 seconds). It has been found that human activity often has a frequency of less than 20 Hz. That is, for measurements of data from the gyroscope and accelerometer sensors, any data over 20 Hz will mean motion.
- the platform is configured to choose a sampling frequency equal to a predetermined amount (e.g., 50 Hz) to avoid aliasing (e.g., in accordance with the Nyquist criterion).
- the system collects data at over 1 Hz to capture micro-movements in the data for higher-confidence authentication.
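As a minimal sketch of this collection cadence, assuming hypothetical read_accel()/read_gyro() sensor reads and an assumed two-second burst length (the source fixes only the 50 Hz rate and the 12-second interval):

```python
import time

SAMPLE_HZ = 50        # above the ~40 Hz Nyquist rate for <20 Hz human activity
BURST_PERIOD_S = 12   # a burst is collected at every predetermined interval
BURST_LEN_S = 2       # assumed burst length (not specified in the source)

def read_accel():     # hypothetical stand-in for the accelerometer API
    return (0.0, 0.0, 9.81)

def read_gyro():      # hypothetical stand-in for the gyroscope API
    return (0.0, 0.0, 0.0)

def collect_burst():
    rows = []
    for _ in range(SAMPLE_HZ * BURST_LEN_S):
        rows.append(read_accel() + read_gyro())  # one 6-channel row per tick
        time.sleep(1.0 / SAMPLE_HZ)
    return rows

print(len(collect_burst()), "rows of 6-channel data")  # 100 rows in 2 s
```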
- the smartphone has a number of sensors, such as touchscreen, proximity, magnetometer, accelerometer, gyroscope, GPS, etc.
- the platform implements the accelerometer, gyroscope, and GPS sensors as components for authentication, since data gathered from such sensors are independent of environmental factors (e.g., in accordance with the Fisher scoring system).
- data can go directly to the cloud, or authentication can happen in any part of the system. That is, user data is collected to authenticate the user.
- the platform is configured with an innovative sensor collecting app that collects activity data from different users.
- the innovative app automatically collects accelerometer, gyroscope and GPS sensor data.
- the collected data is temporarily stored in the smartphone storage.
- the app automatically uploads such data to the server.
- data is pushed 5 MB (megabytes) at a time.
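A minimal sketch of the 5 MB push, assuming a hypothetical upload_to_server() endpoint and a plain local buffer file (the source fixes only the 5 MB threshold):

```python
import os

BUFFER_PATH = "sensor_buffer.csv"
UPLOAD_THRESHOLD = 5 * 1024 * 1024  # 5 MB

def upload_to_server(path):
    # Hypothetical endpoint; the real transport is not specified in the source.
    print(f"uploading {os.path.getsize(path)} bytes")

def append_sample(row):
    with open(BUFFER_PATH, "a") as f:
        f.write(row + "\n")
    if os.path.getsize(BUFFER_PATH) >= UPLOAD_THRESHOLD:
        upload_to_server(BUFFER_PATH)   # push one 5 MB batch
        open(BUFFER_PATH, "w").close()  # reset the local buffer
```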
- the app was distributed to different users, and each user was distinguished using an operating system ID (e.g., an Android Device ID).
- smartphone devices such as the Apple iPhone can also be used, because the system looks at the device ID; thus, it can work with any operating system.
- the platform collected data from 15 different users. On average, each user used their phone for about 5 hours a day.
- one aim was to find the pattern of the users using sensor data.
- the innovative app only required data from when users interacted with their phone.
- the innovative app ran in the background and collected the data only when the user was active.
- GPS data was used.
- An end user 1502 manipulates a smartphone 1504 that is communicably connected with the platform’s server 1506.
- end user 1502 is viewer 410 of FIG.4.
- smartphone 1504 and its components reside on frontend 406 of FIG. 4.
- the components of server 1506 reside on backend 404 of FIG. 4.
- the system contains 4 main components: Server 1506, Smartphone 1504, Rest filter model (1516 and 1534), and innovative user authentication Convolutional Neural Network (CNN) model 1518.
- Initially, the model is new to the user, so it behaves randomly; therefore, it must be trained with user data.
- the system continuously monitors and collects sensor (accelerometer and gyroscope) data from the smartphones.
- One main aim is to find the pattern of the users when they use their phone in the usual way while performing activities such as typing, scrolling, calling, chatting, walking, holding, etc.
- the system collects the data primarily when the user is active.
- the collected data is temporarily stored in local storage until the data is sufficient for training the model.
- the size of the data was restricted to 5 MB. When stored data reaches 5 MB, it is automatically uploaded to the server.
- the smartphone also sends GPS location (latitude, longitude) information during uploading to track the user’s location.
- At the server, when the raw data is uploaded, it is retrieved from the database and pre-processed by the rest filter model.
- the system considers the phone resting on the table as “rest data.” Rest data is invalid for training because such data have no features to distinguish legitimate users from intruders and thus the system removes such rest data.
- the training dataset is prepared by merging the legitimate user's data with other participating legitimate users' data at an equal ratio.
- the innovative user authentication CNN model is trained with the prepared dataset. After training, the models are saved to the database and downloaded to the smartphone.
- the smartphone is ready for continuous authentication.
- the sensors’ data are continuously accumulated at 50Hz.
- time-series data from the different channels are segmented with a window of size 200×6 (200 samples across 6 channels).
- Features are extracted from segmented data and are fed to the rest filter model. Based on the features extracted, the rest filter model identifies the invalid/rest data. Only valid data is further processed.
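The 200×6 windowing can be sketched as follows; non-overlapping windows are an assumption, as the stride is not stated in the source:

```python
import numpy as np

def segment_windows(stream, window=200):
    """Split an (N, 6) sensor stream into non-overlapping (window, 6) segments."""
    n_windows = stream.shape[0] // window
    return stream[: n_windows * window].reshape(n_windows, window, stream.shape[1])

stream = np.random.randn(1000, 6)     # acc_x..gyro_z accumulated at 50 Hz
print(segment_windows(stream).shape)  # (5, 200, 6)
```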
- the innovative user authentication CNN model classifies whether the input data is legitimate or intruder.
- the platform classifies the users as legitimate or intruder when the model consistently outputs the same class (legitimate or intruder) a predetermined number of times (e.g., thrice); classified output is considered to be valid by the configured system when the prediction probability is above a predetermined threshold (e.g., 0.8).
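A minimal sketch of this decision rule; reading "consistently outputs the same class thrice" as three identical consecutive outputs, each above the 0.8 probability threshold, is an assumption, and all names are illustrative:

```python
from collections import deque

PROB_THRESHOLD = 0.8  # prediction probability required for a valid output
CONSISTENCY_N = 3     # same class must be produced this many times

recent = deque(maxlen=CONSISTENCY_N)

def classify_user(pred_class, pred_prob):
    """Return 'legitimate'/'intruder' once outputs are consistent, else None."""
    if pred_prob < PROB_THRESHOLD:
        return None  # low-confidence outputs are not considered valid
    recent.append(pred_class)
    if len(recent) == CONSISTENCY_N and len(set(recent)) == 1:
        return recent[0]
    return None

for out in [("legitimate", 0.9), ("legitimate", 0.85), ("legitimate", 0.92)]:
    decision = classify_user(*out)
print(decision)  # 'legitimate' after three consistent, confident outputs
```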
- the system explicitly uses the advantage of GPS location to track the legitimate user: during training, a circle of predetermined radius (e.g., 1 km) is created, centered at the user's recorded GPS location (e.g., measured in latitude and longitude); during testing, if the user's current GPS location lies within the circle, the location is determined valid and no penalty is applied to the predicted probability; otherwise, a penalty of 0.85 is assigned, so that to overcome the penalty the prediction output must be greater than a predetermined value (e.g., 0.95).
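The location check can be made concrete with a short sketch; the haversine distance and the multiplicative application of the 0.85 penalty are assumptions, though they are consistent with the stated numbers, since 0.95 × 0.85 ≈ 0.81, which just clears the 0.8 validity threshold:

```python
import math

RADIUS_KM = 1.0        # example threshold radius around the training location
PENALTY = 0.85         # assumed to be applied multiplicatively
REQUIRED_PROB = 0.95   # output needed to overcome the penalty

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def adjusted_probability(prob, train_loc, current_loc):
    """Apply the location penalty when the user is outside the trusted circle."""
    if haversine_km(*train_loc, *current_loc) <= RADIUS_KM:
        return prob            # inside the circle: no penalty
    return prob * PENALTY      # outside: prediction must exceed ~0.95 to pass

print(adjusted_probability(0.96, (40.7128, -74.0060), (40.7130, -74.0050)))
```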
- the innovation’s unique user authentication CNN model is based on a supervised machine learning approach. For training the models, a large number of legitimate and intruder user’s data are required.
- the platform provides a newly built sensor collecting app to collect the data.
- Smartphones have a number of sensors such as touchscreen, proximity, magnetometer, accelerometer, gyroscope, GPS, etc.
- the platform implements the accelerometer, gyroscope, and GPS sensors as components for authentication, since data gathered from such sensors are independent of environmental factors (e.g., in accordance with the Fisher scoring system).
- FIG. 16 is an accelerometer x-axis plot, in accordance with an embodiment. Such plot depicts a graph of the data from the accelerometer's reading of a typical user's handling of their smartphone.

Filtering the rest data:
- the innovation groups the raw data into windows of size 200, and different time-domain as well as frequency-domain features were extracted using the TSFEL library.
- the innovation uses the dominant features (listed below under Feature Extraction) for the dataset based on the value of mutual information.
- the corresponding labels were computed by taking the mode of the values within the same window of 200 rows.
- the missing values and invalid numbers were replaced by the mean of the corresponding feature.
- highly correlated features, with a Pearson correlation coefficient greater than 0.95, were removed, keeping only one feature from each correlated group. Features with zero variance were also removed. After selecting the required features, the data were normalized to zero mean and unit variance so that all features are on the same scale.
- the system fed the data to the training model.
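A sketch of this preprocessing pipeline using pandas and scikit-learn; the number of features retained and the exact order of steps are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

def select_and_scale(features: pd.DataFrame, labels: np.ndarray, top_k: int = 20):
    # Replace missing/invalid values with each feature's mean.
    features = features.replace([np.inf, -np.inf], np.nan)
    features = features.fillna(features.mean())
    # Drop zero-variance features.
    features = features.loc[:, features.var() > 0]
    # Drop one of each highly correlated pair (Pearson r > 0.95).
    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
    features = features.drop(columns=drop)
    # Keep the features with the highest mutual information with the labels.
    mi = mutual_info_classif(features, labels, random_state=0)
    keep = features.columns[np.argsort(mi)[::-1][:top_k]]
    # Normalize to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(features[keep])
    return pd.DataFrame(scaled, columns=keep)

X = pd.DataFrame(np.random.randn(400, 30), columns=[f"f{i}" for i in range(30)])
y = np.random.randint(0, 2, 400)
print(select_and_scale(X, y, top_k=10).shape)
```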
- the system used the innovative motion detection model, which is a decision tree-based ensemble learning method.
- the batch of data is first filtered with the innovative motion detection model/classifier algorithm (e.g., through the ASMI Motion Detection model).
- the model detects whether the data is variable and can impact the user authentication. For example, if one's phone is lying on a table, the innovative motion detection model has no use for that data; but when the model detects movement, the model wants to know whether you are "you" or an intruder. Thus, the innovation obtains data that has variability, as filtered by the unique motion detection model.
- the platform filters out data that has no use; for example, if the phone is charging, the platform has no use for that data, and if the user falls asleep with the phone in their lap, that data likewise has no use.
- the platform used the innovative motion detection model to separate valid and invalid data.
- the innovative motion detection model was separately trained with rest (e.g., smartphone on table) and motion (e.g., from users' typical manipulation of their smartphones) data, and an accuracy of 99.99% was achieved. That is, the system trained both with rest data and then without rest data and found that the ultimate identification of the user was within 99.99% accuracy.
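The source characterizes the motion detection model only as a decision tree-based ensemble; the sketch below substitutes scikit-learn's RandomForestClassifier and synthetic rest/motion feature windows purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-ins: rest windows have near-zero variance, motion windows do not.
rng = np.random.default_rng(0)
rest = rng.normal(0, 0.01, size=(500, 20))    # phone resting on a table
motion = rng.normal(0, 1.0, size=(500, 20))   # typical handling of the phone
X = np.vstack([rest, motion])
y = np.array([0] * 500 + [1] * 500)           # 0 = rest, 1 = motion

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("rest-vs-motion accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```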
- raw data are collected by the accelerometer and gyroscope sensors (1512) and the GPS sensor (1513). Such data are fed to component 1516, to filter the rest data using the innovative motion detection model.
- a feature extraction component 1514 performs feature extraction on the data.
- the TSFEL (time series feature extraction library) for the Python programming language is used to extract the required features; comparable time-series feature extraction libraries can be used for other languages.
- the rest data is filtered out using the innovative motion detection model by component 1516.
- the resulting data are input into the innovative user authentication CNN model 1518, specifically, into the prediction algorithm 1520. This is the test that happens within the user device.
- the platform predicts if the user is intruder or legitimate.
- the platform causes the model component 1524 to load into the innovative user authentication CNN model 1518 (described in detail below).
- the model runs in a 90 second time interval. It should be appreciated that FIG. 15 depicts storage 1522 in three different locations for logical purposes. It further should be appreciated that local storage 1522 could represent three different storages, as well.
- when the innovative user authentication CNN model is done, the result is that the user is either an intruder or a legitimate user.
- if the outcome is an intruder, the system runs the model again to confirm; if the user is classified as an intruder more than three times, then another test happens on the server and the user is marked illegitimate. If the user is legitimate, then the user is enabled to perform the above-mentioned features (e.g., ASMI features) such as in-video ads, e-commerce, and user behavior tracking and targeting.
- the system tests in a timely manner, e.g., every 90 seconds.
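The periodic on-device test could be orchestrated as in the following sketch; run_model and collect_window are hypothetical helpers, and only the 90-second interval and the more-than-three-strikes escalation come from the description:

```python
import time

TEST_INTERVAL_S = 90  # the model runs in a 90 second time interval

def continuous_authentication_loop(run_model, collect_window, max_strikes=3):
    strikes = 0
    while True:
        verdict = run_model(collect_window())  # 'legitimate' or 'intruder'
        if verdict == "intruder":
            strikes += 1
            if strikes > max_strikes:          # confirmed more than three times
                return "escalate_to_server"    # server-side test; mark illegitimate
        else:
            strikes = 0                        # legitimate: features stay enabled
        time.sleep(TEST_INTERVAL_S)
```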
- the system is constantly updated, with updates occurring on a timely basis.
- such data are stored in local storage 1522 and then uploaded to the server 1506 into the storage of the server 1524.
- Such storage also stores the trained model, as shown by arrow 1526.
- raw data that had been collected by the accelerometer, gyroscope, and GPS components are fed to a prepare-training-dataset component 1528.
- a raw data component 1530 receives the accelerometer and gyroscope data, a GPS component 1531 receives the GPS data, and both feed a feature extraction component 1532.
- the feature extraction component 1532 performs feature extraction on the data in the same manner as described above for component 1514.
- the same list of possible types of feature extracted data shown above apply here, as well, and are consistent with embodiments herein.
- Such data are fed to component 1534, to process the filter rest data using the innovative motion detection model.
- the rest data are filtered out using the innovative motion detection model by component 1534.
- the data are fed to a component 1536 to merge legitimate and intruder data.
- This process mixes data from verified users with unverified data to see whether the system can draw a pattern in the unverified data. By analogy, if a user wants to check whether a dollar-bill verifying machine is working, they would mix verified bills with unverified ones and run the whole bundle through, to see whether the machine works when both types of notes are present.
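A minimal sketch of the equal-ratio merge described above; the array shapes and the 1/0 labeling are illustrative:

```python
import numpy as np

def prepare_training_set(legit, others, seed=0):
    """Merge the legitimate user's windows with an equal number of windows
    from other participating users, labeled as intruder examples."""
    rng = np.random.default_rng(seed)
    n = min(len(legit), len(others))  # equal ratio, per the description
    legit = legit[rng.permutation(len(legit))[:n]]
    others = others[rng.permutation(len(others))[:n]]
    X = np.concatenate([legit, others])
    y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = legitimate, 0 = intruder
    shuffle = rng.permutation(len(X))
    return X[shuffle], y[shuffle]

X, y = prepare_training_set(np.random.randn(300, 200, 6), np.random.randn(500, 200, 6))
print(X.shape, y.mean())  # balanced: half legitimate, half intruder
```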
- the resulting data of component 1536 is then fed into the training component 1538 to train the innovative user authentication CNN model.
- the innovative module can understand the various data coming from the user and discern the pattern in it, based on the user's device and how he or she is interacting with it.
- the innovation centers on how the system is able to get the data and understand the pattern.
- the smartphone periodically applies the model to the sensor data (from components 1512 and 1513) and determines either that the end user is authenticated (as depicted by icon 1540) or is considered to be an intruder (as depicted by icon 1542).
- manipulating means using the phone.
- End user 1502 will use the phone and the data goes to the system; component 1520 predicts whether the user is a legitimate user (1540) or an intruder (1542).
User authentication Convolutional Neural Network (CNN) model:
- An embodiment can be understood with reference to FIG. 17, an innovative user authentication CNN model architecture.
- the platform uses 1D convolution over the time-series data from the different channels (acc_x, acc_y, acc_z, gyro_x, gyro_y, gyro_z).
- the model consists of three convolutional blocks, a global max-pooling layer, dropout, and fully connected layers.
- Each convolutional block consists of 1d convolutional operation, batch normalization, and ReLU activation function.
- the innovation has replaced the max-pooling layer with atrous convolution. A gridding effect exists if all the blocks use the same dilation rate. Thus, to remove the gridding effect, the innovation uses dilation rates of 1, 2, and 3, respectively, for each convolution block.
- the three convolutional blocks extract abstract representations of the input time-series data.
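A sketch of this architecture in PyTorch. The dilation rates (1, 2, 3), batch normalization, ReLU, global max pooling, dropout, and fully connected head follow the description; the channel widths, kernel size, and dropout rate are assumptions chosen for illustration:

```python
import torch
import torch.nn as nn

class UserAuthCNN(nn.Module):
    """Three dilated 1D conv blocks, global max pooling, dropout, dense head."""
    def __init__(self, in_channels=6, n_classes=2):
        super().__init__()
        blocks, channels = [], [in_channels, 32, 64, 128]
        for i, dilation in enumerate((1, 2, 3)):  # avoids the gridding effect
            blocks += [
                nn.Conv1d(channels[i], channels[i + 1], kernel_size=3,
                          dilation=dilation, padding=dilation),  # atrous conv
                nn.BatchNorm1d(channels[i + 1]),
                nn.ReLU(),
            ]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.Dropout(0.5), nn.Linear(128, n_classes))

    def forward(self, x):           # x: (batch, 6 channels, 200 samples)
        h = self.features(x)        # (batch, 128, 200)
        h = torch.amax(h, dim=2)    # global max pooling over time
        return self.head(h)         # logits: legitimate vs intruder

logits = UserAuthCNN()(torch.randn(4, 6, 200))
print(logits.shape)  # torch.Size([4, 2])
```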
- two models were used.
- One is the innovative motion detection model for filtering out rest data, and the other is a convolutional neural network-based model used for authentication.
- the innovation compared the results of both machine learning as well as deep learning based methods. It was found that machine learning models such as the innovative motion detection model, Random Forest, and SVM performed better than deep learning based methods when the training dataset was small. However, as the size of the training dataset increased, the innovative user authentication model based on a Convolutional Neural Network (CNN) outperformed the machine learning based methods. Thus, embodiments consistent herewith use the innovative user authentication CNN model for authentication, which obtained an accuracy of 94% on the dataset.
- Table D shows the classification accuracy of the innovative motion detection model (rest filter model) classifier, in accordance with embodiments herein.
- a system for providing in-video product placement, in-video purchasing capability using augmented reality, and performing continuous user authentication for the purchase includes a smartphone, wherein the smartphone includes an accelerometer sensor, a gyroscope sensor, and a GPS sensor, wherein each sensor collects raw data from manipulations of the smartphone for purposes of user authentication; a rest filter model configured to receive the raw data and configured to process the raw data using a motion detection module to identify valid data and invalid data therefrom, wherein valid data is data reflecting motion of the smartphone and invalid data is data reflecting the smartphone at rest; a user authentication convolutional neural network (“CNN”) model configured to receive the identified valid and invalid data and generate a prediction that the manipulations of the smartphone are from a legitimate user or from an intruder; and a server configured to train and store the user authentication CNN model and further configured to download the trained user authentication CNN model to the smartphone on a predetermined time basis.
- the system is configured to process a testing phase and a training phase, and when the user authentication CNN model is trained, the smartphone is configured to perform continuous user authentication.
- the raw data reflect typical activities including typing, scrolling, calling, chatting, walking, holding, and location.
- the user authentication CNN model determines one or more patterns of typical behavior for a user based on the raw data reflecting those typical activities.
- the system is a component of a network-accessible platform that enables in-video product purchasing, and the system is configured to provide automatic user authentication in response to a request for purchasing an in-video product.
- the user authentication CNN model is trained on an on-going basis.
- the smartphone is configured to collect the raw data in local storage up to 5 MB and, when over 5 MB, the smartphone automatically uploads the raw data to the server for training the user authentication CNN model; the smartphone is configured to send the GPS location data to the server along with the raw data.
- the sensors are configured to, at the testing phase, continuously accumulate data at 50 Hz.
- the smartphone further includes a feature extraction component that is configured to extract features from the data and feed such extracted features to the rest filter model and wherein the rest filter model is further configured to receive the extracted features and use the features for identifying the invalid data and further configured to send the valid data to the user authentication CNN model.
- the features include: absolute energy; area under the curve; interquartile range; kurtosis; maximum value; minimum value; mean; median absolute deviation; standard deviation; maximum frequency; entropy; negative turning points; and ECDF (empirical cumulative distribution function) features.
- the user authentication CNN model is configured to classify a user as legitimate or intruder when the model consistently outputs the same class (legitimate or intruder) thrice and the classified output is considered to be valid when the prediction probability is above 0.8.
- the system is further configured to use GPS to track a legitimate user by creating a circle of predetermined radius centered at the user's GPS location recorded during training, wherein the radius is the threshold that is considered; during testing, if the user's current GPS location lies within the circle, the identification of the location is determined valid and no penalty is given to its predicted probability; otherwise, the application assigns a penalty of 0.85, wherein enforcing the penalty improves the robustness of the innovative user authentication CNN model because, to overcome the penalty, the user is required to have a prediction output greater than a predetermined value.
- the smartphone is configured to push data to the server 5 MB at a time, and the server is configured so that once the uploaded data achieves an accuracy rate that is flattening out, the data uploading stops.
- the smartphone is configured to collect data when the smartphone detects motion and every 12 seconds.
- the user authentication CNN model runs in a 90 second time interval.
- the user authentication CNN model is configured to constantly learn from user data: once a measurement of confidence is above a par value of 80%, the model is considered confident and is marked trained.
- when a user is watching a plurality of videos and likes an object in a particular video, the user is enabled to click and buy the object, based on the user authentication CNN model producing an output that indicates that the user is legitimate.
- the system is further configured to run the user authentication CNN model again, to confirm, if the user is classified as an intruder; if the user is classified as an intruder more than three times, then another test is performed on the server and the user is marked illegitimate.
- the smartphone is marked as idle and the data are not further processed.
- a method for providing in-video product placement, in-video purchasing capability using augmented reality, and performing continuous user authentication for the purchase includes collecting raw data from manipulations of a smartphone for purposes of user authentication, wherein the smartphone comprises: an accelerometer sensor, a gyroscope sensor, and a GPS sensor, for collecting the data; receiving, at a rest filter model, the raw data and processing the raw data using a motion detection module to identify valid data and invalid data therefrom, wherein valid data is data reflecting motion of the smartphone and invalid data is data reflecting the smartphone at rest; receiving, by a user authentication convolutional neural network (“CNN”) model, the identified valid and invalid data and generating therefrom a prediction that the manipulations of the smartphone are from a legitimate user or from an intruder; and training and storing, by a server, the user authentication CNN model and downloading the trained user authentication CNN model to the smartphone on a predetermined time basis.
- a non-transitory computer readable medium having stored thereon instructions which, when executed by a processor, perform the steps for providing in-video product placement, in-video purchasing capability using augmented reality, and performing continuous user authentication for the purchase is disclosed.
- the performed steps include collecting raw data from manipulations of a smartphone for purposes of user authentication, wherein the smartphone comprises: an accelerometer sensor, a gyroscope sensor, and a GPS sensor, for collecting the data; receiving, at a rest filter model, the raw data and processing the raw data using a motion detection module to identify valid data and invalid data therefrom, wherein valid data is data reflecting motion of the smartphone and invalid data is data reflecting the smartphone at rest; receiving, by a user authentication convolutional neural network (“CNN”) model, the identified valid and invalid data and generating therefrom a prediction that the manipulations of the smartphone are from a legitimate user or from an intruder; and training and storing, by a server, the user authentication CNN model and downloading the trained user authentication CNN model to the smartphone on a predetermined time basis.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Finance (AREA)
- Multimedia (AREA)
- Accounting & Taxation (AREA)
- Signal Processing (AREA)
- Strategic Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Development Economics (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Social Psychology (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Medical Informatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Game Theory and Decision Science (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Psychiatry (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Animal Behavior & Ethology (AREA)
Abstract
Techniques are disclosed for improving the digital delivery of a video requested by a viewer together with the best advertisement chosen for that viewer. These techniques may be particularly suitable for the short video industry. Also disclosed are an innovative video analytics mechanism and an innovative user behavioral analytics mechanism, which make it possible to advertise on the video the best match of a specific product for the particular viewer while the viewer is watching the video. Further, techniques are disclosed that enable the viewer to purchase the product while remaining within the video, without having to leave the video or the site to complete the purchase. In addition, techniques are disclosed for automatically authenticating users and their smartphones.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962944300P | 2019-12-05 | 2019-12-05 | |
US62/944,300 | 2019-12-05 | ||
US17/111,394 US20210176519A1 (en) | 2019-06-06 | 2020-12-03 | System and method for in-video product placement and in-video purchasing capability using augmented reality with automatic continuous user authentication |
US17/111,394 | 2020-12-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
- WO2021113687A1 (fr) | 2021-06-10 |
Family
ID=76222316
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2020/063380 WO2021113687A1 (fr) | 2019-12-05 | 2020-12-04 | Système et procédé permettant un placement de produit dans la vidéo et une capacité d'achat dans la vidéo faisant appel à la réalité augmentée |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021113687A1 (fr) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100332454A1 (en) * | 2009-06-30 | 2010-12-30 | Anand Prahlad | Performing data storage operations with a cloud environment, including containerized deduplication, data pruning, and data transfer |
US20170109514A1 (en) * | 2014-12-18 | 2017-04-20 | Sri International | Continuous authentication of mobile device users |
US20170227995A1 (en) * | 2016-02-09 | 2017-08-10 | The Trustees Of Princeton University | Method and system for implicit authentication |
US20180012005A1 (en) * | 2016-07-11 | 2018-01-11 | Richard James Hallock | System, Method, and Apparatus for Personal Identification |
US20180032997A1 (en) * | 2012-10-09 | 2018-02-01 | George A. Gordon | System, method, and computer program product for determining whether to prompt an action by a platform in connection with a mobile device |
US20190324527A1 (en) * | 2018-04-20 | 2019-10-24 | Facebook Technologies, Llc | Auto-completion for Gesture-input in Assistant Systems |
- 2020-12-04: WO PCT/US2020/063380 patent/WO2021113687A1/fr, active, Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210168442A1 (en) | Computerized system and method for automatically detecting and rendering highlights from streaming videos | |
US10867221B2 (en) | Computerized method and system for automated determination of high quality digital content | |
US10827214B1 (en) | System and method for in-video product placement and in-video purchasing capability using augmented reality | |
- KR102292193B1 (ko) | Apparatus and method for processing multimedia commerce services | |
US20190278814A1 (en) | URL Normalization | |
JP2019531547A (ja) | 視覚検索クエリによるオブジェクト検出 | |
US10841651B1 (en) | Systems and methods for determining television consumption behavior | |
US20110238503A1 (en) | System and method for personalized dynamic web content based on photographic data | |
- KR102191486B1 (ko) | Automatic advertising agency server, method for automatically generating campaign information for an advertising medium and executing advertisements on its behalf, and computer program for executing the method | |
US20140207559A1 (en) | System and method for utilizing captured eye data from mobile devices | |
US10425687B1 (en) | Systems and methods for determining television consumption behavior | |
- CN111699487A (zh) | System for fast and secure content provision | |
- KR20190093755A (ko) | Big-data-based image text recognition and customized food-ingredient recommendation method and system | |
- CN110199275A (zh) | Apparatus for broadcasting personalized content in a communication network | |
US12056928B2 (en) | Computerized system and method for fine-grained event detection and content hosting therefrom | |
US20210176519A1 (en) | System and method for in-video product placement and in-video purchasing capability using augmented reality with automatic continuous user authentication | |
WO2021113687A1 (fr) | Système et procédé permettant un placement de produit dans la vidéo et une capacité d'achat dans la vidéo faisant appel à la réalité augmentée | |
- KR101585920B1 (ko) | Method for analyzing a user's online activity, user terminal, and computer-readable recording medium | |
US20230078712A1 (en) | System and method for product placement and embedded marketing | |
- KR20230050656A (ko) | Adaptive item consumption platform, system, and operating method thereof | |
- KR20190117830A (ko) | Product recommendation system based on big-data analysis of user media usage information | |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20895636; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20895636; Country of ref document: EP; Kind code of ref document: A1 |