
WO2016169193A1 - Method and apparatus for detecting click cheating - Google Patents

Method and apparatus for detecting click cheating

Info

Publication number
WO2016169193A1
Authority
WO
WIPO (PCT)
Prior art keywords
suspicious
content
click
cheating
group
Prior art date
Application number
PCT/CN2015/089545
Other languages
English (en)
French (fr)
Inventor
庄馨
田天
朱军
夏粉
张潼
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2016169193A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of network technologies, and in particular, to a method and apparatus for detecting click cheating.
  • Existing methods for detecting crowdsourced click cheating on advertisements include the following. (1) Discovering suspicious clickers by establishing rules that describe the characteristics of individual clickers, and then judging whether cheating has occurred. The drawback of this technique is that, because crowdsourced cheating clicks come from real users rather than machines, the behavior is highly random and is difficult to judge with rules built around individual users or advertisers.
  • (2) Determining whether an advertiser is the target of cheating by observing that advertiser's click traffic. The drawback of this technique is that, since crowdsourced cheating comes from real users, they can quickly adjust their behavior once they sense that cheating has been detected, which invalidates the previous rules.
  • the present application provides a method and apparatus for detecting click cheating, which solves the technical problem of low detection efficiency and low detection precision for cheating clicks in the prior art.
  • The present application provides a method for detecting click cheating, the method comprising: determining suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period; determining, based on the suspicious clicks, at least one suspicious user group suspected of cheating; determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and excluding the non-cheating user group from the suspicious user groups to determine a cheating user group.
  • Determining the suspicious clicks based on the number of times the predetermined content is clicked by users within the predetermined time period comprises: obtaining the number of times each predetermined content is clicked by users within the predetermined time period; determining whether the number of times each predetermined content is clicked satisfies a predetermined condition; and determining the clicks corresponding to predetermined content whose number of clicks satisfies the predetermined condition as suspicious clicks.
  • Determining whether the number of clicks satisfies the predetermined condition comprises: determining whether the number of clicks is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold; and if so, determining that the number of clicks satisfies the predetermined condition.
  • Determining, based on the suspicious clicks, at least one suspicious user group suspected of cheating comprises: obtaining information related to the suspicious clicks; and determining at least one suspicious user group based on the related information, wherein the users in each suspicious user group click on the same set of content during the same time period.
  • The information related to a suspicious click includes at least one of the following: identification information of the user corresponding to the suspicious click; identification information of the content corresponding to the suspicious click; and the time corresponding to the suspicious click.
  • Determining the at least one suspicious user group based on the related information comprises: clustering the suspicious clicks based on the related information, so that the user group corresponding to each cluster center clicks on the same set of content in the same time period; and determining the user group corresponding to each cluster center as one suspicious user group.
  • Determining the non-cheating user group to be excluded according to the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period comprises: acquiring the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; determining, based on the keywords, whether the suspicious content is content of the same category; and if so, determining the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
  • Determining whether the suspicious content is content of the same category based on the keywords comprises: determining whether the proportion of keywords of the same category among the keywords is greater than or equal to a predetermined ratio; and if so, determining that the suspicious content is content of the same category.
  • The present application provides an apparatus for detecting click cheating, the apparatus comprising: a first determining unit, configured to determine suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period; a second determining unit, configured to determine, based on the suspicious clicks, at least one suspicious user group suspected of cheating; a third determining unit, configured to determine a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and a fourth determining unit, configured to exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.
  • The first determining unit includes: an obtaining subunit, configured to acquire the number of times each predetermined content is clicked by users within the predetermined time period; a judging subunit, configured to determine whether the number of times each predetermined content is clicked satisfies a predetermined condition; and a determining subunit, configured to determine the clicks corresponding to predetermined content whose number of clicks satisfies the predetermined condition as suspicious clicks.
  • The judging subunit is configured to: determine whether the number of clicks is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold; and if so, determine that the number of clicks satisfies the predetermined condition.
  • The second determining unit includes: an information acquiring subunit, configured to acquire information related to the suspicious clicks; and a user group determining subunit, configured to determine at least one suspicious user group based on the related information, wherein the users in each suspicious user group click on the same set of content during the same time period.
  • The information related to a suspicious click includes at least one of the following: identification information of the user corresponding to the suspicious click; identification information of the content corresponding to the suspicious click; and the time corresponding to the suspicious click.
  • The user group determining subunit is configured to: cluster the suspicious clicks based on the related information, so that the user group corresponding to each cluster center clicks on the same set of content in the same time period; and determine the user group corresponding to each cluster center as one suspicious user group.
  • The third determining unit includes: a keyword obtaining subunit, configured to acquire the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; a category determining subunit, configured to determine, based on the keywords, whether the suspicious content is content of the same category; and a to-be-excluded group determining subunit, configured to determine, in response to the suspicious content being content of the same category, the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
  • The category determining subunit is configured to: determine whether the proportion of keywords of the same category among the keywords is greater than or equal to a predetermined ratio; and if so, determine that the suspicious content is content of the same category.
  • The method and apparatus for detecting click cheating provided by the present application determine suspicious user groups suspected of cheating by narrowing the detection range, and exclude the non-cheating user groups from the suspicious user groups according to the keywords of the suspicious content clicked by those groups. This enables monitoring of clicks on predetermined content, improves the efficiency and precision of detecting cheating clicks, and reduces the waste of time and resources.
  • FIG. 1 is a flowchart of an embodiment of a method for detecting click cheating provided by an embodiment of the present application
  • FIG. 2 is a flowchart of an embodiment of a method for determining a suspicious click provided by an embodiment of the present application
  • FIG. 3 is a flowchart of an embodiment of a method for determining at least one suspicious user group suspected of cheating according to a suspicious click according to an embodiment of the present application
  • FIG. 4 is a flowchart of an embodiment of a method for determining a non-cheating user group to be excluded according to keywords of suspicious content clicked by each group of suspicious user groups in the predetermined time period according to an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for detecting click cheating provided by an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present application.
  • Referring to FIG. 1, a flow 100 of one embodiment of a method for detecting click cheating is shown.
  • In step 101, suspicious clicks are determined based on the number of times predetermined content is clicked by users within a predetermined time period.
  • The predetermined content is content that may be involved in cheating clicks, such as content that can benefit from clicks (for example, advertisements, votes, or social networking sites): the more times the content is clicked, the greater the benefit to the individuals or organizations associated with it.
  • A beneficiary associated with the predetermined content publishes, through a task publisher, a task of clicking the predetermined content (such as an advertisement) on a crowdsourcing platform. The task publisher then organizes a large number of netizens to take on the task, and the netizens complete the task by clicking on the predetermined content, thereby obtaining a reward for completing it.
  • The predetermined time period may be selected as a period of time during which the predetermined content is clicked.
  • For example, the distribution over the time axis of the clicks corresponding to the predetermined content may be acquired, and a period of time in which the distribution density is greater than a predetermined threshold is taken as the predetermined time period.
  • Alternatively, the amount of clicks on the predetermined content at each moment may be acquired, and a period of time in which the amount of clicks at each moment is greater than a predetermined threshold is taken as the predetermined time period. It can be understood that there may be other ways of selecting the predetermined time period, which are not limited in this application.
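  • As a minimal sketch of the second selection rule, assuming click timestamps are integers in seconds and that "each moment" is approximated by fixed-width time slots: the function name `dense_periods` and the parameters `slot` and `min_clicks` are hypothetical and do not come from the application.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dense_periods(click_times: Iterable[int], slot: int, min_clicks: int) -> List[Tuple[int, int]]:
    """Return merged [start, end) periods whose per-slot click count exceeds min_clicks.

    Assumption: timestamps are bucketed into slots of `slot` seconds, and a
    slot qualifies when its click count is strictly greater than `min_clicks`.
    """
    # Count clicks per time slot.
    counts = Counter(t // slot for t in click_times)
    busy_slots = sorted(b for b, n in counts.items() if n > min_clicks)

    periods: List[Tuple[int, int]] = []
    for b in busy_slots:
        start, end = b * slot, (b + 1) * slot
        if periods and periods[-1][1] == start:
            # Adjacent busy slots are merged into one continuous period.
            periods[-1] = (periods[-1][0], end)
        else:
            periods.append((start, end))
    return periods
```

  • Any of the returned periods could then serve as the predetermined time period; how to choose among several candidates is left open here, as it is in the text.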
  • the suspicious click may be determined based on the number of times the predetermined content is clicked by the user within the predetermined time period.
  • In step 102, at least one suspicious user group suspected of cheating is determined according to the suspicious clicks.
  • A cheating task publisher may publish a click task containing multiple predetermined contents to a crowdsourcing platform at a time, for example, a click task containing 10 advertisements.
  • After receiving a cheating click task, a cheating clicker usually clicks on the set of predetermined content in the task within a certain period of time. Therefore, the user group performing the task (that is, clicking on the set of predetermined content corresponding to the task) can be determined by analyzing information related to the suspicious clicks (such as click time and clicked content). A group of users who click on the same set of predetermined content is identified as a suspicious user group suspected of cheating.
  • the above determined suspicious clicks may include multiple sets of predetermined content corresponding to different cheating click tasks.
  • the predetermined content corresponding to the same cheating click task is a set of predetermined content.
  • each set of scheduled content can also correspond to a group of suspicious users suspected of cheating. Therefore, the suspicious click determined above corresponds to at least one group of suspicious users suspected of cheating.
  • In step 103, the non-cheating user group to be excluded is determined according to the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period.
  • Some predetermined content may be especially popular during a certain period of time and attract a large number of user clicks, or some users may be interested in a related batch of predetermined content within a certain period of time. For example, if a flu outbreak occurs in a certain area at a certain time, a large number of people living in the area may search online for drugs that treat or prevent the flu, and the advertisements or websites that these residents click on may overlap and all relate to the flu. As another example, if a certain season is well suited to traveling to a certain area, a large number of users may in that season search online and click on advertisements or websites related to travel to the area. The content clicked by these users may also partially overlap, and all of it relates to tourism in the area.
  • Such predetermined content does not involve cheating clicks, and the users who click on it are not cheating user groups, yet they may be identified as suspicious user groups suspected of cheating. Therefore, these non-cheating user groups need to be found and excluded from the suspicious user groups.
  • the suspicious content is a predetermined content that the suspicious user group clicks within the predetermined time period.
  • the non-cheating user group to be excluded may be determined according to the keywords of the suspicious content clicked by each group of suspicious user groups. If the keyword proximity of all suspicious content clicked by a certain group of suspicious users is high, it can be determined that the user group is a non-cheating user group.
  • In step 104, the non-cheating user groups among the suspicious user groups are excluded to determine the cheating user groups.
  • The non-cheating user groups among the suspicious user groups are excluded, and the remaining suspicious user groups are determined as the cheating user groups.
  • The method provided by the foregoing embodiment of the present application determines suspicious user groups suspected of cheating by narrowing the detection range, and excludes the non-cheating user groups from the suspicious user groups according to the keywords of the suspicious content clicked by those groups, thereby realizing monitoring of clicks on the predetermined content, improving the efficiency and precision of detecting cheating clicks, and reducing the waste of time and resources.
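  • The final exclusion step of flow 100 can be sketched as a simple filter. This is only an illustration under assumptions: suspicious groups are represented as a mapping from a group identifier to a set of user identifiers, and the non-cheating judgment is passed in as a callable; the names `cheating_groups` and `is_non_cheating` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Set


def cheating_groups(
    suspicious_groups: Dict[Hashable, Set[str]],
    is_non_cheating: Callable[[Hashable], bool],
) -> Dict[Hashable, Set[str]]:
    """Keep only the suspicious groups that were not judged to be non-cheating.

    Assumption: `is_non_cheating` wraps whatever keyword-based judgment was
    made for the group in the previous step.
    """
    return {
        group_id: users
        for group_id, users in suspicious_groups.items()
        if not is_non_cheating(group_id)
    }
```

  • Passing the judgment in as a callable keeps this step independent of how the keyword comparison is carried out.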
  • Referring to FIG. 2, a flow 200 of one embodiment of a method of determining suspicious clicks is illustrated.
  • In step 201, the number of times each predetermined content is clicked by users within the predetermined time period is acquired.
  • The number of times each predetermined content is clicked by users within the predetermined time period can be obtained from the click log of the predetermined content. It can be understood that this number can also be obtained by other means; the present application does not limit this aspect.
  • In step 202, it is determined whether the number of times each predetermined content is clicked satisfies a predetermined condition.
  • If the number of times a predetermined content is clicked is too small (for example, less than a predetermined threshold a), the likelihood that the predetermined content involves cheating is small. Because the purpose of cheating is to increase the click volume, the number of clicks on predetermined content that involves cheating cannot be too small.
  • Likewise, if the number of times a predetermined content is clicked is too large, the probability that it involves cheating is also small. Although cheating can increase the click volume, organized cheating is usually limited in scale and cannot reach an excessive level. For example, suppose cheating clicks can add 1,000 clicks; if a predetermined content is clicked 10,000 times, it can be judged that the predetermined content does not involve cheating. Even if it did, its normal click volume would be close to an order of magnitude larger than the clicks that cheating could add, so increasing the click volume by cheating would not be meaningful.
  • The predetermined condition is the condition that the number of clicks on predetermined content that may involve cheating satisfies. First, it is determined whether the number of times the predetermined content is clicked within the predetermined time period is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold. If so, it is determined that the number of times satisfies the predetermined condition.
  • In step 203, the clicks corresponding to predetermined content whose number of clicks satisfies the predetermined condition are determined as suspicious clicks.
  • If the number of times a predetermined content is clicked satisfies the predetermined condition, the predetermined content is likely to involve cheating, and the corresponding clicks are determined as suspicious clicks. It should be noted that suspicious clicks are not necessarily cheating clicks, because even content that involves cheating is also clicked normally by non-cheating users.
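  • Steps 201 to 203 can be sketched as a count-and-filter pass over a click log. The record layout (`Click` with `user_id`, `content_id`, `timestamp`) and the parameter names `low` and `high` for the first and second predetermined thresholds are assumptions made for illustration; the application does not specify a log format.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Click(NamedTuple):
    user_id: str
    content_id: str
    timestamp: int  # seconds; the unit is an assumption


def find_suspicious_clicks(
    clicks: Iterable[Click], start: int, end: int, low: int, high: int
) -> List[Click]:
    """Return clicks on content whose click count in [start, end) lies in [low, high]."""
    # Step 201: count clicks per content inside the predetermined time period.
    in_period = [c for c in clicks if start <= c.timestamp < end]
    counts = Counter(c.content_id for c in in_period)
    # Step 202: apply the two-sided threshold condition.
    flagged = {cid for cid, n in counts.items() if low <= n <= high}
    # Step 203: every click on flagged content is a suspicious click.
    return [c for c in in_period if c.content_id in flagged]
```

  • Both bounds are inclusive, matching the "greater than or equal to" and "less than or equal to" wording of the predetermined condition.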
  • Referring to FIG. 3, a flow 300 of one embodiment of a method of determining at least one suspicious user group suspected of cheating based on suspicious clicks is illustrated.
  • In step 301, information related to the suspicious clicks is obtained.
  • The information related to a suspicious click may include at least one of the following: the identification information of the user corresponding to the suspicious click; the identification information of the content corresponding to the suspicious click; and the time corresponding to the suspicious click.
  • The identification information of the user corresponding to the suspicious click may be the MAC address or IP address of the user who performs the suspicious click, or the serial number of the terminal device (such as a mobile phone or a computer). The present application does not limit the specific content and form of the user identification information corresponding to the suspicious click.
  • The identification information of the content corresponding to the suspicious click may be the name of the clicked content, or information such as a number used to identify or distinguish the content. The present application does not limit the specific content and form of the content identification information corresponding to the suspicious click.
  • the time corresponding to the suspicious click may be the time corresponding to the user performing the suspicious click described above.
  • information related to suspicious clicks can be obtained from the click log. It can be understood that the related information of the suspicious click can also be obtained by other means, and the manner in which the present application obtains the relevant information of the suspicious click is not limited.
  • In step 302, at least one suspicious user group is determined based on the related information, wherein the users in each suspicious user group click on the same set of content within the same time period.
  • The suspicious user groups may be determined based on the foregoing related information; there may be one or more suspicious user groups, and the users in each suspicious user group click on the same set of content in the same time period.
  • For example, a non-parametric clustering algorithm may be used to determine the suspicious user groups. Specifically, all suspicious clicks are first clustered based on the above related information, so that the user group corresponding to each cluster center clicks on the same set of content in the same time period. The user group corresponding to each cluster center is then determined as one suspicious user group.
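  • The application does not name a specific clustering algorithm. As a deliberately simplified stand-in, the sketch below uses exact-match grouping instead of non-parametric clustering: users fall into the same group only when they clicked exactly the same set of content within the same fixed time window. The record layout, the `window` width, and the `min_users` cut-off are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

ClickRecord = Tuple[str, str, int]  # (user_id, content_id, timestamp in seconds)
GroupKey = Tuple[int, FrozenSet[str]]  # (time window index, set of content ids)


def group_suspicious_users(
    clicks: Iterable[ClickRecord], window: int, min_users: int = 2
) -> Dict[GroupKey, Set[str]]:
    """Group users who clicked the same content set in the same time window.

    Simplification: a real clustering algorithm would tolerate partial
    overlap; this sketch requires the content sets to match exactly.
    """
    # Collect the set of content each user clicked in each time window.
    per_user: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for user_id, content_id, ts in clicks:
        per_user[(user_id, ts // window)].add(content_id)

    # Users sharing a (window, content set) key form one candidate group.
    groups: Dict[GroupKey, Set[str]] = defaultdict(set)
    for (user_id, bucket), contents in per_user.items():
        groups[(bucket, frozenset(contents))].add(user_id)

    # A single user is not a group; keep keys shared by enough users.
    return {key: users for key, users in groups.items() if len(users) >= min_users}
```

  • Each returned key plays the role of a cluster center (a time window plus a content set), and its value is the corresponding suspicious user group.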
  • Referring to FIG. 4, a flow 400 of one embodiment of a method for determining a non-cheating user group to be excluded based on keywords of the suspicious content clicked by each suspicious user group during the predetermined time period is shown.
  • In step 401, the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period are acquired.
  • The suspicious content is the predetermined content that a suspicious user group clicks within the predetermined time period. It should be noted that users in a suspicious user group may also click, within the predetermined time period, other content that does not involve cheating; such content is unrelated to the suspicious clicks and therefore is not judged to be suspicious content.
  • The keywords of the suspicious content are the words that best reflect the various characteristics of the suspicious content.
  • For example, for an advertisement for a drug, the keywords may be the category of the advertised product (drug), the name of the disease that the drug treats, the name of the pharmaceutical manufacturer that produces the drug, the name of the drug's most important chemical component, and so on.
  • the content of the suspicious content may be parsed to obtain related keywords.
  • the related keywords may also be obtained from the name or identification information of the above suspicious content. It can be understood that there may be other ways of obtaining keywords related to suspicious content, and the method for obtaining keywords related to suspicious content is not limited in the present application.
  • In step 402, it is determined, based on the keywords, whether the suspicious content is content of the same category.
  • Whether different suspicious content belongs to the same category may be determined according to the keywords corresponding to the different suspicious content. Specifically, it is first determined whether, among the keywords of the set of suspicious content clicked by each suspicious user group within the predetermined time period, the proportion of keywords of the same category is greater than or equal to a predetermined ratio. If so, it is determined that the suspicious content is content of the same category.
  • In step 403, if the suspicious content is content of the same category, the suspicious user group corresponding to the suspicious content is determined as a non-cheating user group to be excluded.
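  • The proportion test of step 402 can be sketched as follows. The application does not say how keywords are assigned to categories, so the `category_of` mapping is a hypothetical input, and taking the share of the single most common category as "the proportion of keywords of the same category" is one possible reading rather than the documented rule.

```python
from collections import Counter
from typing import Iterable, Mapping


def is_same_category(
    keywords: Iterable[str], category_of: Mapping[str, str], min_ratio: float
) -> bool:
    """Return True when the most common keyword category reaches min_ratio.

    Assumption: a keyword missing from `category_of` is treated as its own
    category, so unknown keywords never inflate the dominant share.
    """
    categories = [category_of.get(k, k) for k in keywords]
    if not categories:
        return False
    # Size of the largest category among the group's keywords.
    top_count = Counter(categories).most_common(1)[0][1]
    return top_count / len(categories) >= min_ratio
```

  • In the flu example from the text, most keywords of the clicked advertisements would map to one health-related category, so the group would pass this test and be marked as a non-cheating user group to be excluded.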
  • Referring to FIG. 5, a block diagram of one embodiment of an apparatus for detecting click cheating in accordance with the present application is shown.
  • The apparatus 500 of this embodiment includes: a first determining unit 501, a second determining unit 502, a third determining unit 503, and a fourth determining unit 504.
  • The first determining unit 501 is configured to determine suspicious clicks based on the number of times the predetermined content is clicked by users within a predetermined time period.
  • The second determining unit 502 is configured to determine, according to the suspicious clicks, at least one suspicious user group suspected of cheating.
  • The third determining unit 503 is configured to determine the non-cheating user group to be excluded according to the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period.
  • The fourth determining unit 504 is configured to exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.
  • The first determining unit 501 includes an obtaining subunit, a judging subunit, and a determining subunit (not shown).
  • the obtaining subunit is configured to acquire the number of times each predetermined content is clicked by the user in the predetermined time period.
  • the judging subunit is configured to judge whether the number of times each of the predetermined contents is clicked satisfies a predetermined condition.
  • the determining subunit is configured to determine a click corresponding to the predetermined content whose number of clicks meets the predetermined condition as a suspicious click.
  • The judging subunit is configured to: determine whether the number of clicks is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold. If so, it is determined that the number of clicks satisfies the predetermined condition.
  • the second determining unit 502 includes an information acquisition subunit and a user population determination subunit (not shown).
  • the information obtaining subunit is configured to acquire related information of the suspicious click.
  • the user group determining subunit is configured to determine at least one set of suspicious user groups based on the related information, wherein each group of suspicious user groups clicks the same set of content within the same time period.
  • The information related to a suspicious click includes at least one of the following: identification information of the user corresponding to the suspicious click; identification information of the content corresponding to the suspicious click; and the time corresponding to the suspicious click.
  • The user group determining subunit is configured to: cluster the suspicious clicks based on the related information, so that the user group corresponding to each cluster center clicks on the same set of content in the same time period; and determine the user group corresponding to each cluster center as one suspicious user group.
  • The third determining unit 503 includes a keyword acquisition subunit, a category determining subunit, and a to-be-excluded group determining subunit (not shown).
  • The keyword acquisition subunit is configured to acquire the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period.
  • The category determining subunit is configured to determine, based on the keywords, whether the suspicious content is content of the same category.
  • The to-be-excluded group determining subunit is configured to determine, in response to the suspicious content being content of the same category, the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
  • The category determining subunit is configured to: determine whether the proportion of keywords of the same category among the keywords is greater than or equal to a predetermined ratio; and if so, determine that the suspicious content is content of the same category.
  • the device 500 may be preset in the server, or may be loaded into the server by downloading or the like. Corresponding units in device 500 can cooperate with units in the server to implement a scheme for detecting click cheating.
  • Referring to FIG. 6, a block diagram of a computer system 600 suitable for implementing a terminal device or server of an embodiment of the present application is shown.
  • The computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • In the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also coupled to the I/O interface 605 as needed.
  • A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • An embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • Each block of the flowchart or block diagrams can represent a module, a program segment, or a portion of code that includes one or more executable instructions for implementing the specified logical functions.
  • The functions noted in the blocks can also occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the unit modules described in the embodiments of the present application may be implemented by software or by hardware.
  • the described unit modules may also be provided in the processor, for example, which may be described as a processor comprising an operation first determining unit, a second determining unit, a third determining unit and a fourth determining unit.
  • the names of the unit modules do not constitute a limitation on the unit modules themselves in some cases.
  • the first determining unit may also be described as “for determining the number of times the predetermined content is clicked by the user within a predetermined time period. Unit of suspicious clicks.”
  • the present application further provides a computer readable storage medium, which may be a computer readable storage medium included in the apparatus described in the foregoing embodiment, or may exist separately, not A computer readable storage medium that is assembled into a terminal.
  • the computer readable storage medium stores one or more programs that are used by one or more processors to perform the methods described herein for detecting click cheating.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

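A method and apparatus for detecting click cheating. The method comprises: determining suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period (101); determining, according to the suspicious clicks, at least one suspicious user group suspected of cheating (102); determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period (103); and excluding the non-cheating user group from the suspicious user groups to determine a cheating user group (104). Clicks on the predetermined content are thereby monitored, the efficiency and precision of detecting cheating clicks are improved, and the waste of time and resources is reduced.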

Description

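Method and apparatus for detecting click cheating
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201510202474.9, filed on April 24, 2015 and entitled "Method and apparatus for detecting click cheating", the entire contents of which are incorporated into this application by reference.
Technical field
The present application relates to the field of network technologies, and in particular to a method and apparatus for detecting click cheating.
Background
With the rapid development of mobile-Internet crowdsourcing technology, a form of cheating has emerged in which tasks are posted on crowdsourcing websites and Internet users are organized and paid to click specific advertisements manually. Because these clicks are triggered by real people, the cheating is well concealed and hard to detect with traditional methods.
Existing methods for detecting crowdsourced advertising cheating include the following. ① Establishing rules that describe the behavior of individual clickers in order to find suspicious clickers and then identify cheating. The drawback is that crowdsourced cheating comes from real user clicks rather than machines, so the behavior is highly random and hard to judge with rules built around a single user or advertiser. ② Observing an advertiser's click traffic to judge whether it is being targeted by cheating. The drawback is that, because crowdsourced cheating comes from real users, they can quickly change their behavior once they sense that cheating has been detected, which invalidates the earlier rules. ③ Finding a batch of clickers in the click log whose behavior is consistent in order to identify cheating. The drawback is that this method is of little practical use for detecting manual crowdsourced cheating.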
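Summary
To solve the above problems, the present application provides a method and apparatus for detecting click cheating, which address the technical problems of low efficiency and low precision in detecting cheating clicks in the prior art.
In a first aspect, the present application provides a method for detecting click cheating, the method comprising: determining suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period; determining, according to the suspicious clicks, at least one suspicious user group suspected of cheating; determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and excluding the non-cheating user group from the suspicious user groups to determine a cheating user group.
In some embodiments, determining suspicious clicks based on the number of times predetermined content is clicked by users within the predetermined time period comprises: obtaining the number of times each item of predetermined content is clicked by users within the predetermined time period; judging whether the click count of each item of predetermined content satisfies a predetermined condition; and determining the clicks corresponding to the predetermined content whose click count satisfies the predetermined condition as suspicious clicks.
In some embodiments, judging whether the click count satisfies the predetermined condition comprises: judging whether the click count is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold; and if so, determining that the click count satisfies the predetermined condition.
In some embodiments, determining, according to the suspicious clicks, at least one suspicious user group suspected of cheating comprises: obtaining relevant information of the suspicious clicks; and determining at least one suspicious user group based on the relevant information, wherein each suspicious user group clicks the same set of content within the same time period.
In some embodiments, the relevant information of the suspicious clicks includes at least one of the following: identification information of the user corresponding to a suspicious click; identification information of the content corresponding to a suspicious click; and the time corresponding to a suspicious click.
In some embodiments, determining at least one suspicious user group based on the relevant information comprises: clustering the suspicious clicks based on the relevant information, such that the user group corresponding to each cluster center clicks the same set of content within the same time period; and determining the user group corresponding to each cluster center as one suspicious user group.
In some embodiments, determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period comprises: obtaining the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; judging, based on the keywords, whether the suspicious content is content of the same kind; and if so, determining the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
In some embodiments, judging, based on the keywords, whether the suspicious content is content of the same kind comprises: judging whether the proportion of keywords of the same kind among the keywords is greater than or equal to a predetermined proportion; and if so, determining that the suspicious content is content of the same kind.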
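In a second aspect, the present application provides an apparatus for detecting click cheating, the apparatus comprising: a first determining unit, configured to determine suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period; a second determining unit, configured to determine, according to the suspicious clicks, at least one suspicious user group suspected of cheating; a third determining unit, configured to determine a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and a fourth determining unit, configured to exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.
In some embodiments, the first determining unit comprises: an obtaining subunit, configured to obtain the number of times each item of predetermined content is clicked by users within the predetermined time period; a judging subunit, configured to judge whether the click count of each item of predetermined content satisfies a predetermined condition; and a determining subunit, configured to determine the clicks corresponding to the predetermined content whose click count satisfies the predetermined condition as suspicious clicks.
In some embodiments, the judging subunit is configured to: judge whether the click count is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold; and if so, determine that the click count satisfies the predetermined condition.
In some embodiments, the second determining unit comprises: an information obtaining subunit, configured to obtain relevant information of the suspicious clicks; and a user group determining subunit, configured to determine at least one suspicious user group based on the relevant information, wherein each suspicious user group clicks the same set of content within the same time period.
In some embodiments, the relevant information of the suspicious clicks includes at least one of the following: identification information of the user corresponding to a suspicious click; identification information of the content corresponding to a suspicious click; and the time corresponding to a suspicious click.
In some embodiments, the user group determining subunit is configured to: cluster the suspicious clicks based on the relevant information, such that the user group corresponding to each cluster center clicks the same set of content within the same time period; and determine the user group corresponding to each cluster center as one suspicious user group.
In some embodiments, the third determining unit comprises: a keyword obtaining subunit, configured to obtain the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; a category judging subunit, configured to judge, based on the keywords, whether the suspicious content is content of the same kind; and a to-be-excluded group determining subunit, configured to determine, in response to the suspicious content being content of the same kind, the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
In some embodiments, the category judging subunit is configured to: judge whether the proportion of keywords of the same kind among the keywords is greater than or equal to a predetermined proportion; and if so, determine that the suspicious content is content of the same kind.
The method and apparatus for detecting click cheating provided by the present application narrow the scope of detection, determine suspicious user groups suspected of cheating, and exclude the non-cheating user groups from among them according to the keywords of the suspicious content they clicked. Clicks on predetermined content are thereby monitored, the efficiency and precision of detecting cheating clicks are improved, and the waste of time and resources is reduced.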
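Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is a flowchart of one embodiment of the method for detecting click cheating provided by an embodiment of the present application;
FIG. 2 is a flowchart of one embodiment of the method for determining suspicious clicks provided by an embodiment of the present application;
FIG. 3 is a flowchart of one embodiment of the method, provided by an embodiment of the present application, for determining at least one suspicious user group suspected of cheating according to the suspicious clicks;
FIG. 4 is a flowchart of one embodiment of the method, provided by an embodiment of the present application, for determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period;
FIG. 5 is a schematic structural diagram of one embodiment of the apparatus for detecting click cheating provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present application.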
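Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Referring to FIG. 1, a flow 100 of one embodiment of the method for detecting click cheating is shown.
As shown in FIG. 1, in step 101, suspicious clicks are determined based on the number of times predetermined content is clicked by users within a predetermined time period.
In this embodiment, predetermined content is content that may be subject to cheating clicks, such as content that earns a benefit from its click volume (for example, advertisements, polls, and social networking sites). The more clicks such content receives, the greater the gain for the associated beneficiary, whether an individual or an organization. Typically, cheating through a crowdsourcing platform works as follows: a beneficiary associated with the predetermined content engages a task publisher, who posts on the crowdsourcing platform a task of clicking the predetermined content (such as an advertisement). The task publisher then organizes a large number of Internet users to take up the task, and those users complete it by clicking the predetermined content and receive a reward for doing so.
It follows that cheating clicks made through a crowdsourcing platform are generally concentrated in time, usually occurring within a certain period after the click task is posted. Time periods in which clicks on the predetermined content are not sufficiently concentrated can therefore largely be ruled out as containing cheating clicks. Accordingly, the predetermined time period may be chosen as a period in which clicks on the predetermined content are relatively concentrated. Specifically, in one implementation, the distribution over the time axis of the clicks on the predetermined content may be obtained, and a period in which the distribution density exceeds a predetermined threshold taken as the predetermined time period. In another implementation, the click volume on the predetermined content at each moment may be obtained, and a period of consecutive moments in which the click volume at every moment exceeds a predetermined threshold taken as the predetermined time period. It will be understood that the predetermined time period may be selected in other ways as well; the present application is not limited in this respect.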
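The second way of choosing the predetermined time period (consecutive moments whose click volume all exceed a threshold) might be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `pick_time_window`, the bucketing of timestamps into fixed-width "moments", and the choice of the longest qualifying run are hypothetical and are not specified by the application.

```python
from collections import Counter


def pick_time_window(click_times, bucket_seconds, min_clicks_per_bucket):
    """Return (start, end) in seconds of the longest run of consecutive
    time buckets whose click count exceeds the threshold, or None.

    All names and the bucketing scheme are assumptions for illustration.
    """
    # Count clicks per fixed-width bucket (one bucket = one "moment").
    counts = Counter(int(t // bucket_seconds) for t in click_times)

    best = None        # (first_bucket, last_bucket) of the longest run so far
    run_start = None   # first bucket of the run currently being extended
    prev = None        # last qualifying bucket seen in the current run
    for bucket in sorted(counts):
        if counts[bucket] > min_clicks_per_bucket:
            # Start a new run if there is no current run or a gap occurred.
            if run_start is None or bucket != prev + 1:
                run_start = bucket
            prev = bucket
            if best is None or bucket - run_start > best[1] - best[0]:
                best = (run_start, bucket)
        else:
            run_start = None
            prev = None

    if best is None:
        return None
    return best[0] * bucket_seconds, (best[1] + 1) * bucket_seconds
```

Using fixed-width buckets keeps the sketch simple; a density estimate over the time axis, as in the first implementation mentioned above, would be an equally valid choice.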
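In this embodiment, to improve the efficiency of detecting cheating clicks and reduce the waste of time and resources, clicks that may be related to cheating are first identified as suspicious clicks and clicks that are unlikely to be cheating are excluded, which narrows the scope of detection. Subsequent detection is performed only within the set of suspicious clicks. Specifically, suspicious clicks may be determined based on the number of times predetermined content is clicked by users within the predetermined time period.
Next, in step 102, at least one suspicious user group suspected of cheating is determined according to the suspicious clicks.
Generally, a cheating task publisher may post to the crowdsourcing platform a single click task that covers multiple items of predetermined content, for example a click task containing 10 advertisements. After taking up the cheating click task, a cheating user will usually click every item in the task's set of predetermined content within a certain period. The user group carrying out the task (that is, clicking the set of predetermined content corresponding to the task) can therefore be determined by analyzing the relevant information of the suspicious clicks, such as the click time and the content clicked. A user group that clicks the same set of predetermined content is determined to be a suspicious user group suspected of cheating.
Because several task publishers may post click tasks within a given period, the suspicious clicks determined above may involve multiple sets of predetermined content corresponding to different cheating click tasks. The predetermined content corresponding to one cheating click task forms one set of predetermined content, and each set of predetermined content may in turn correspond to one suspicious user group suspected of cheating. The suspicious clicks determined above therefore correspond to at least one suspicious user group suspected of cheating.
Then, in step 103, a non-cheating user group to be excluded is determined according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period.
Generally, some predetermined content may be popular for a time and attract a large number of user clicks, or certain users may be interested in a batch of related predetermined content during a certain period. For example, if influenza breaks out in a region at some time, many residents of that region may search online for drugs that treat or prevent influenza; the advertisements or websites these residents click may partly overlap and all relate to influenza. As another example, if a certain season is well suited to traveling to a certain area, many users may search for and click advertisements or websites about traveling to that area during that season; the content these users click may also partly overlap and all relate to travel in that area.
Such predetermined content does not involve cheating clicks, and the users who click it are not a cheating user group, yet they may be identified as a suspicious user group suspected of cheating. These non-cheating user groups therefore need to be found among the suspicious user groups and excluded.
In this embodiment, suspicious content is the predetermined content clicked by a suspicious user group within the predetermined time period. The non-cheating user group to be excluded may be determined according to the keywords of the suspicious content clicked by each suspicious user group. If the keywords of all the suspicious content clicked by a suspicious user group are highly similar, that user group may be determined to be a non-cheating user group.
Finally, in step 104, the non-cheating user groups are excluded from the suspicious user groups to determine the cheating user group.
In this embodiment, the non-cheating user groups are removed from the suspicious user groups, and the remaining suspicious user groups are determined to be cheating user groups.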
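Step 104 amounts to a set difference over groups. A minimal sketch, assuming each group is represented simply as a set of user identifiers (a representation chosen here for illustration, together with the hypothetical function name `cheating_groups`):

```python
def cheating_groups(suspicious_groups, non_cheating_groups):
    """Return the suspicious groups that were not marked as non-cheating.

    Each group is assumed to be a set of user identifiers.
    """
    # frozenset makes each group hashable so membership tests are cheap.
    excluded = {frozenset(group) for group in non_cheating_groups}
    return [group for group in suspicious_groups
            if frozenset(group) not in excluded]
```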
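The method provided by the above embodiment of the present application narrows the scope of detection, determines suspicious user groups suspected of cheating, and excludes the non-cheating user groups from among them according to the keywords of the suspicious content they clicked. Clicks on predetermined content are thereby monitored, the efficiency and precision of detecting cheating clicks are improved, and the waste of time and resources is reduced.
With further reference to FIG. 2, a flow 200 of one embodiment of the method for determining suspicious clicks is shown.
As shown in FIG. 2, in step 201, the number of times each item of predetermined content is clicked by users within the predetermined time period is obtained.
In this embodiment, the number of times each item of predetermined content is clicked by users within the predetermined time period may be obtained from the click log of the predetermined content. It will be understood that this number may also be obtained in other ways; the present application is not limited in this respect.
Next, in step 202, it is judged whether the click count of each item of predetermined content satisfies a predetermined condition.
Generally, if predetermined content is clicked too few times within a period, for example fewer than a predetermined threshold a, the content is unlikely to involve cheating. The purpose of cheating is to increase click volume, so content that does involve cheating will certainly not have a very small click count.
Conversely, if predetermined content is clicked too many times within a period, for example more than a predetermined threshold b, the content is also unlikely to involve cheating. Although cheating increases click volume, the scale on which cheating can be organized is usually limited and cannot reach a very high order of magnitude. For example, suppose cheating clicks can add 1,000 clicks; if an item of predetermined content is clicked 10,000 times, it can be judged that the content does not involve cheating. Even if it did, its normal click volume would be close to an order of magnitude larger than the clicks that cheating could add, so there would be little point in increasing its click volume by cheating.
In this embodiment, the predetermined condition is the condition satisfied by the click count of predetermined content that may involve cheating. It is first judged whether the number of times the predetermined content is clicked within the predetermined time period is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold. If so, the click count is determined to satisfy the predetermined condition.
Finally, in step 203, the clicks corresponding to the predetermined content whose click count satisfies the predetermined condition are determined as suspicious clicks.
In this embodiment, if the click count of an item of predetermined content satisfies the predetermined condition, the content is likely to involve cheating, and the clicks corresponding to it are determined to be suspicious clicks. Note that a suspicious click is not necessarily a cheating click, because content that involves cheating is also clicked normally by non-cheating users.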
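Flow 200 can be illustrated with a short sketch. It assumes, purely for illustration, that each click is a `(user_id, content_id, timestamp)` tuple already restricted to the predetermined time period; the function and parameter names are hypothetical.

```python
from collections import Counter


def suspicious_clicks(clicks, first_threshold, second_threshold):
    """Keep the clicks on content whose click count lies in
    [first_threshold, second_threshold] (both bounds inclusive).

    clicks: iterable of (user_id, content_id, timestamp) tuples, an
    assumed record layout covering only the predetermined time period.
    """
    clicks = list(clicks)
    # Step 201: number of clicks per item of content.
    per_content = Counter(content_id for _, content_id, _ in clicks)
    # Steps 202 and 203: keep clicks on content inside the threshold band.
    return [
        click for click in clicks
        if first_threshold <= per_content[click[1]] <= second_threshold
    ]
```

Content clicked fewer times than the first threshold or more times than the second is dropped, matching the two arguments above about counts that are too small or too large.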
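With further reference to FIG. 3, a flow 300 of one embodiment of the method for determining, according to the suspicious clicks, at least one suspicious user group suspected of cheating is shown.
As shown in FIG. 3, in step 301, relevant information of the suspicious clicks is obtained.
In this embodiment, the relevant information of the suspicious clicks may include at least one of the following: identification information of the user corresponding to a suspicious click; identification information of the content corresponding to a suspicious click; and the time corresponding to a suspicious click.
Specifically, the identification information of the user corresponding to a suspicious click may be the MAC address or IP address of the user who made the click, or the serial number of the terminal device (such as a mobile phone or computer); the present application does not limit the specific content or form of the user identification information. The identification information of the content corresponding to a suspicious click may be the name or number of the clicked content, or other information used to identify or distinguish content; the present application does not limit the specific content or form of the content identification information. The time corresponding to a suspicious click may be the moment at which the user made the click. In this embodiment, the relevant information of the suspicious clicks may be obtained from the click log. It will be understood that this information may also be obtained in other ways; the present application does not limit the manner in which it is obtained.
Next, in step 302, at least one suspicious user group is determined based on the relevant information, wherein each suspicious user group clicks the same set of content within the same time period.
In this embodiment, the suspicious user groups may be determined based on the above relevant information. There may be one or more suspicious user groups, and each suspicious user group clicks the same set of content within the same time period.
In one implementation of this embodiment, a non-parametric clustering algorithm may be used to determine the suspicious user groups. Specifically, cluster analysis is first performed on all the suspicious clicks based on the above relevant information, such that the user group corresponding to each cluster center clicks the same set of content within the same time period. The user group corresponding to each cluster center is then determined to be one suspicious user group.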
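The application does not name a specific clustering algorithm. As a deliberately simplified stand-in, and not the non-parametric clustering itself, the sketch below groups users who clicked exactly the same set of content within the same fixed-width time bucket. The record layout, the bucketing, the exact-match criterion and all names are assumptions made for illustration.

```python
from collections import defaultdict


def suspicious_user_groups(clicks, bucket_seconds, min_group_size=2):
    """Group users who clicked an identical content set in the same bucket.

    clicks: iterable of (user_id, content_id, timestamp) tuples (assumed
    layout). Exact matching is a simplification of real clustering.
    """
    # (user, time bucket) -> set of content the user clicked in that bucket.
    clicked = defaultdict(set)
    for user_id, content_id, timestamp in clicks:
        clicked[(user_id, int(timestamp // bucket_seconds))].add(content_id)

    # (time bucket, content set) -> users sharing that exact signature.
    groups = defaultdict(set)
    for (user_id, bucket), contents in clicked.items():
        groups[(bucket, frozenset(contents))].add(user_id)

    return [
        {"users": users, "contents": set(contents), "bucket": bucket}
        for (bucket, contents), users in groups.items()
        if len(users) >= min_group_size
    ]
```

A real clustering step would tolerate partial overlap in content and time; exact matching is used here only to keep the example short.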
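Referring to FIG. 4, a flow 400 of one embodiment of the method for determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period is shown.
As shown in FIG. 4, in step 401, the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period are obtained.
In this embodiment, suspicious content is the predetermined content clicked by a suspicious user group within the predetermined time period. Note that users in a suspicious user group may also click other content that does not involve cheating within the predetermined time period, but such content is unrelated to the suspicious clicks and is therefore not treated as suspicious content.
In this embodiment, the keywords of suspicious content are the words that best reflect its various characteristics. For example, for an advertisement for a drug, the keywords may be the category of the advertised product (drug), the name of the disease the drug treats, the name of the pharmaceutical manufacturer that produces it, the name of its most important chemical ingredient, and so on.
In one implementation of this embodiment, the suspicious content may be parsed to obtain its keywords. In another implementation, the keywords may be obtained from the name or identification information of the suspicious content. It will be understood that the keywords of suspicious content may also be obtained in other ways; the present application does not limit the manner in which they are obtained.
Next, in step 402, it is judged, based on the keywords, whether the suspicious content is content of the same kind.
In this embodiment, whether different items of suspicious content are of the same kind may be determined from their keywords. Specifically, it is first judged whether, among the keywords of the set of suspicious content clicked by each suspicious user group within the predetermined time period, the proportion of keywords of the same kind is greater than or equal to a predetermined proportion. If so, the suspicious content is determined to be content of the same kind.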
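The proportion check in step 402 might look like the sketch below. It assumes each keyword has already been mapped to a category label (how that mapping is done is not specified in the application) and treats the most frequent category as the "same kind"; both choices and the function name are hypothetical.

```python
from collections import Counter


def is_same_kind(keyword_categories, min_ratio):
    """Return True if the most common keyword category accounts for at
    least min_ratio of all keywords of a group's suspicious content.

    keyword_categories: list of category labels, one per keyword
    (an assumed pre-processing result).
    """
    if not keyword_categories:
        return False
    # Count of the single most frequent category.
    top_count = Counter(keyword_categories).most_common(1)[0][1]
    return top_count / len(keyword_categories) >= min_ratio
```

For instance, a group whose clicked content yields three influenza-related keywords out of four would pass a 0.75 threshold and be marked as a non-cheating group to exclude.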
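Finally, in step 403, the suspicious user group corresponding to the suspicious content is determined to be a non-cheating user group to be excluded.
It should be noted that although the drawings depict the operations of the method of the invention in a particular order, this neither requires nor implies that the operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. On the contrary, the steps depicted in the flowcharts may be executed in a different order. Additionally or alternatively, some steps may be omitted, several steps may be combined into one, and/or one step may be split into several.
With further reference to FIG. 5, a schematic structural diagram of one embodiment of the apparatus for detecting click cheating according to the present application is shown.
As shown in FIG. 5, the apparatus 500 of this embodiment includes a first determining unit 501, a second determining unit 502, a third determining unit 503 and a fourth determining unit 504. The first determining unit 501 is configured to determine suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period. The second determining unit 502 is configured to determine, according to the suspicious clicks, at least one suspicious user group suspected of cheating. The third determining unit 503 is configured to determine a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period. The fourth determining unit 504 is configured to exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.
In some optional embodiments, the first determining unit 501 includes an obtaining subunit, a judging subunit and a determining subunit (not shown). The obtaining subunit is configured to obtain the number of times each item of predetermined content is clicked by users within the predetermined time period. The judging subunit is configured to judge whether the click count of each item of predetermined content satisfies a predetermined condition. The determining subunit is configured to determine the clicks corresponding to the predetermined content whose click count satisfies the predetermined condition as suspicious clicks.
In some optional embodiments, the judging subunit is configured to judge whether the click count is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold and, if so, to determine that the click count satisfies the predetermined condition.
In some optional embodiments, the second determining unit 502 includes an information obtaining subunit and a user group determining subunit (not shown). The information obtaining subunit is configured to obtain relevant information of the suspicious clicks. The user group determining subunit is configured to determine at least one suspicious user group based on the relevant information, wherein each suspicious user group clicks the same set of content within the same time period.
In some optional embodiments, the relevant information of the suspicious clicks includes at least one of the following: identification information of the user corresponding to a suspicious click; identification information of the content corresponding to a suspicious click; and the time corresponding to a suspicious click.
In some optional embodiments, the user group determining subunit is configured to: cluster the suspicious clicks based on the relevant information, such that the user group corresponding to each cluster center clicks the same set of content within the same time period; and determine the user group corresponding to each cluster center as one suspicious user group.
In some optional embodiments, the third determining unit 503 includes a keyword obtaining subunit, a category judging subunit and a to-be-excluded group determining subunit (not shown). The keyword obtaining subunit is configured to obtain the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period. The category judging subunit is configured to judge, based on the keywords, whether the suspicious content is content of the same kind. The to-be-excluded group determining subunit is configured to determine, in response to the suspicious content being content of the same kind, the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
In some optional embodiments, the category judging subunit is configured to: judge whether the proportion of keywords of the same kind among the keywords is greater than or equal to a predetermined proportion; and if so, determine that the suspicious content is content of the same kind.
It should be understood that the units or modules recited in the apparatus 500 correspond to the steps of the method described with reference to FIGS. 1-4. The operations and features described above for the method therefore apply equally to the apparatus 500 and the units it contains, and are not repeated here. The apparatus 500 may be preinstalled in a server or loaded onto a server, for example by downloading. The units of the apparatus 500 may cooperate with units in the server to implement the scheme for detecting click cheating.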
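The correspondence between the four units of the apparatus 500 and steps 101 to 104 can be pictured as a small composition of callables. The class name `ClickCheatDetector`, its field names and the use of plain callables are hypothetical; the application only describes the units functionally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClickCheatDetector:
    """Hypothetical wiring of the four determining units (501 to 504)."""
    first_unit: Callable    # clicks -> suspicious clicks (step 101)
    second_unit: Callable   # suspicious clicks -> suspicious groups (102)
    third_unit: Callable    # suspicious groups -> non-cheating groups (103)
    fourth_unit: Callable   # (groups, non-cheating) -> cheating groups (104)

    def detect(self, clicks):
        suspicious = self.first_unit(clicks)
        groups = self.second_unit(suspicious)
        non_cheating = self.third_unit(groups)
        return self.fourth_unit(groups, non_cheating)
```

Keeping each unit as an injected callable mirrors the statement that the units may be implemented in software or hardware and may cooperate with other units in the server.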
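Referring now to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing a terminal device or server of an embodiment of the present application is shown.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 608 including a hard disk or the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
The unit modules described in the embodiments of the present application may be implemented in software or in hardware. The described unit modules may also be provided in a processor; for example, a processor may be described as comprising a first determining unit, a second determining unit, a third determining unit and a fourth determining unit. The names of these unit modules do not, in some cases, limit the unit modules themselves; for example, the first determining unit may also be described as "a unit for determining suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period".
In another aspect, the present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the above embodiments, or may be a standalone computer-readable storage medium that is not assembled into a terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the method for detecting click cheating described in the present application.
The above describes only the preferred embodiments of the present application and the technical principles applied. Those skilled in the art should understand that the scope of the invention covered by the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present application.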

Claims (18)

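  1. A method for detecting click cheating, characterized in that the method comprises:
    determining suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period;
    determining, according to the suspicious clicks, at least one suspicious user group suspected of cheating;
    determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and
    excluding the non-cheating user group from the suspicious user groups to determine a cheating user group.
  2. The method according to claim 1, characterized in that determining suspicious clicks based on the number of times predetermined content is clicked by users within the predetermined time period comprises:
    obtaining the number of times each item of predetermined content is clicked by users within the predetermined time period;
    judging whether the click count of each item of predetermined content satisfies a predetermined condition;
    determining the clicks corresponding to the predetermined content whose click count satisfies the predetermined condition as suspicious clicks.
  3. The method according to claim 2, characterized in that judging whether the click count satisfies the predetermined condition comprises:
    judging whether the click count is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold;
    if so, determining that the click count satisfies the predetermined condition.
  4. The method according to any one of claims 1-3, characterized in that determining, according to the suspicious clicks, at least one suspicious user group suspected of cheating comprises:
    obtaining relevant information of the suspicious clicks;
    determining at least one suspicious user group based on the relevant information, wherein each suspicious user group clicks the same set of content within the same time period.
  5. The method according to claim 4, characterized in that the relevant information of the suspicious clicks includes at least one of the following:
    identification information of the user corresponding to a suspicious click;
    identification information of the content corresponding to a suspicious click; and
    the time corresponding to a suspicious click.
  6. The method according to any one of claims 4-5, characterized in that determining at least one suspicious user group based on the relevant information comprises:
    clustering the suspicious clicks based on the relevant information, such that the user group corresponding to each cluster center clicks the same set of content within the same time period;
    determining the user group corresponding to each cluster center as one suspicious user group.
  7. The method according to any one of claims 1-6, characterized in that determining a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period comprises:
    obtaining the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period;
    judging, based on the keywords, whether the suspicious content is content of the same kind;
    if so, determining the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
  8. The method according to claim 7, characterized in that judging, based on the keywords, whether the suspicious content is content of the same kind comprises:
    judging whether the proportion of keywords of the same kind among the keywords is greater than or equal to a predetermined proportion;
    if so, determining that the suspicious content is content of the same kind.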
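  9. An apparatus for detecting click cheating, characterized in that the apparatus comprises:
    a first determining unit, configured to determine suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period;
    a second determining unit, configured to determine, according to the suspicious clicks, at least one suspicious user group suspected of cheating;
    a third determining unit, configured to determine a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and
    a fourth determining unit, configured to exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.
  10. The apparatus according to claim 9, characterized in that the first determining unit comprises:
    an obtaining subunit, configured to obtain the number of times each item of predetermined content is clicked by users within the predetermined time period;
    a judging subunit, configured to judge whether the click count of each item of predetermined content satisfies a predetermined condition;
    a determining subunit, configured to determine the clicks corresponding to the predetermined content whose click count satisfies the predetermined condition as suspicious clicks.
  11. The apparatus according to claim 10, characterized in that the judging subunit is configured to:
    judge whether the click count is greater than or equal to a first predetermined threshold and less than or equal to a second predetermined threshold;
    if so, determine that the click count satisfies the predetermined condition.
  12. The apparatus according to any one of claims 9-11, characterized in that the second determining unit comprises:
    an information obtaining subunit, configured to obtain relevant information of the suspicious clicks;
    a user group determining subunit, configured to determine at least one suspicious user group based on the relevant information, wherein each suspicious user group clicks the same set of content within the same time period.
  13. The apparatus according to claim 12, characterized in that the relevant information of the suspicious clicks includes at least one of the following:
    identification information of the user corresponding to a suspicious click;
    identification information of the content corresponding to a suspicious click; and
    the time corresponding to a suspicious click.
  14. The apparatus according to any one of claims 12-13, characterized in that the user group determining subunit is configured to:
    cluster the suspicious clicks based on the relevant information, such that the user group corresponding to each cluster center clicks the same set of content within the same time period;
    determine the user group corresponding to each cluster center as one suspicious user group.
  15. The apparatus according to any one of claims 9-14, characterized in that the third determining unit comprises:
    a keyword obtaining subunit, configured to obtain the keywords of the suspicious content clicked by each suspicious user group within the predetermined time period;
    a category judging subunit, configured to judge, based on the keywords, whether the suspicious content is content of the same kind;
    a to-be-excluded group determining subunit, configured to determine, in response to the suspicious content being content of the same kind, the suspicious user group corresponding to the suspicious content as a non-cheating user group to be excluded.
  16. The apparatus according to claim 15, characterized in that the category judging subunit is configured to:
    judge whether the proportion of keywords of the same kind among the keywords is greater than or equal to a predetermined proportion;
    if so, determine that the suspicious content is content of the same kind.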
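  17. A device, characterized by comprising:
    one or more processors;
    a memory;
    one or more programs stored in the memory which, when executed by the one or more processors:
    determine suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period;
    determine, according to the suspicious clicks, at least one suspicious user group suspected of cheating;
    determine a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and
    exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.
  18. A non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to:
    determine suspicious clicks based on the number of times predetermined content is clicked by users within a predetermined time period;
    determine, according to the suspicious clicks, at least one suspicious user group suspected of cheating;
    determine a non-cheating user group to be excluded according to keywords of the suspicious content clicked by each suspicious user group within the predetermined time period; and
    exclude the non-cheating user group from the suspicious user groups to determine a cheating user group.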
PCT/CN2015/089545 2015-04-24 2015-09-14 用于检测点击作弊的方法及装置 WO2016169193A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510202474.9A CN104765874B (zh) 2015-04-24 2015-04-24 用于检测点击作弊的方法及装置
CN201510202474.9 2015-04-24

Publications (1)

Publication Number Publication Date
WO2016169193A1 true WO2016169193A1 (zh) 2016-10-27

Family

ID=53647701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/089545 WO2016169193A1 (zh) 2015-04-24 2015-09-14 用于检测点击作弊的方法及装置

Country Status (2)

Country Link
CN (1) CN104765874B (zh)
WO (1) WO2016169193A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034867A (zh) * 2018-06-21 2018-12-18 腾讯科技(深圳)有限公司 点击流量检测方法、装置及存储介质
CN110210886A (zh) * 2018-05-31 2019-09-06 腾讯科技(深圳)有限公司 识别虚假操作方法、装置、服务器、可读存储介质、系统
CN110827094A (zh) * 2019-11-15 2020-02-21 湖南快乐阳光互动娱乐传媒有限公司 广告投放的反作弊方法及系统
CN113592036A (zh) * 2021-08-25 2021-11-02 北京沃东天骏信息技术有限公司 流量作弊行为识别方法、装置及存储介质和电子设备
CN114926221A (zh) * 2022-05-31 2022-08-19 北京奇艺世纪科技有限公司 作弊用户识别方法、装置、电子设备及存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765874B (zh) * 2015-04-24 2019-03-26 百度在线网络技术(北京)有限公司 用于检测点击作弊的方法及装置
CN106445796B (zh) * 2015-08-04 2021-01-19 腾讯科技(深圳)有限公司 作弊渠道的自动检测方法及装置
CN105354721B (zh) * 2015-09-29 2019-09-06 北京金山安全软件有限公司 一种识别机器操作行为的方法及装置
CN106998336B (zh) * 2016-01-22 2020-07-31 腾讯科技(深圳)有限公司 渠道中的用户检测方法和装置
CN106649527B (zh) * 2016-10-20 2021-02-09 重庆邮电大学 基于Spark Streaming的广告点击异常检测系统及检测方法
CN107168854B (zh) * 2017-06-01 2020-06-30 北京京东尚科信息技术有限公司 互联网广告异常点击检测方法、装置、设备及可读存储介质
CN107229557B (zh) * 2017-06-26 2020-10-20 微鲸科技有限公司 异常点击检测方法及装置、点击量统计方法及装置
CN107566897B (zh) * 2017-07-19 2019-10-15 北京奇艺世纪科技有限公司 一种视频刷量的鉴别方法、装置及电子设备
CN107529093B (zh) * 2017-09-05 2020-05-22 北京奇艺世纪科技有限公司 一种视频文件播放量的检测方法及系统
CN110046910B (zh) * 2018-12-13 2023-04-14 蚂蚁金服(杭州)网络技术有限公司 判断客户通过电子支付平台所进行交易合法性的方法和设备
CN109842619B (zh) * 2019-01-08 2022-07-08 北京百度网讯科技有限公司 用户账号拦截方法和装置
CN110069691B (zh) * 2019-04-29 2021-05-28 百度在线网络技术(北京)有限公司 用于处理点击行为数据的方法和装置
CN112579907B (zh) * 2020-12-25 2023-08-11 北京百度网讯科技有限公司 一种异常任务检测方法、装置、电子设备和存储介质
CN112508630B (zh) * 2021-01-29 2021-05-25 腾讯科技(深圳)有限公司 异常会话群的检测方法、装置、计算机设备和存储介质
CN113179358B (zh) * 2021-04-09 2022-08-09 作业帮教育科技(北京)有限公司 一种题目解答的防作弊方法、装置及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093510A (zh) * 2007-07-25 2007-12-26 北京搜狗科技发展有限公司 一种针对网页作弊的反作弊方法及系统
US20090299967A1 (en) * 2008-06-02 2009-12-03 Microsoft Corporation User advertisement click behavior modeling
CN103390027A (zh) * 2013-06-25 2013-11-13 亿赞普(北京)科技有限公司 一种互联网广告反作弊方法和系统
CN104765874A (zh) * 2015-04-24 2015-07-08 百度在线网络技术(北京)有限公司 用于检测点击作弊的方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289756A (zh) * 2010-06-18 2011-12-21 百度在线网络技术(北京)有限公司 点击有效性的判断方法及其系统
CN103853839A (zh) * 2014-03-18 2014-06-11 北京博雅立方科技有限公司 一种评测广告页面恶意点击疑似度的方法及装置
CN103870572B (zh) * 2014-03-18 2017-07-04 北京博雅立方科技有限公司 一种防御恶意点击广告页面的方法及装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093510A (zh) * 2007-07-25 2007-12-26 北京搜狗科技发展有限公司 一种针对网页作弊的反作弊方法及系统
US20090299967A1 (en) * 2008-06-02 2009-12-03 Microsoft Corporation User advertisement click behavior modeling
CN103390027A (zh) * 2013-06-25 2013-11-13 亿赞普(北京)科技有限公司 一种互联网广告反作弊方法和系统
CN104765874A (zh) * 2015-04-24 2015-07-08 百度在线网络技术(北京)有限公司 用于检测点击作弊的方法及装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210886A (zh) * 2018-05-31 2019-09-06 腾讯科技(深圳)有限公司 识别虚假操作方法、装置、服务器、可读存储介质、系统
CN110210886B (zh) * 2018-05-31 2023-08-22 腾讯科技(深圳)有限公司 识别虚假操作方法、装置、服务器、可读存储介质、系统
CN109034867A (zh) * 2018-06-21 2018-12-18 腾讯科技(深圳)有限公司 点击流量检测方法、装置及存储介质
CN109034867B (zh) * 2018-06-21 2022-10-25 腾讯科技(深圳)有限公司 点击流量检测方法、装置及存储介质
CN110827094A (zh) * 2019-11-15 2020-02-21 湖南快乐阳光互动娱乐传媒有限公司 广告投放的反作弊方法及系统
CN110827094B (zh) * 2019-11-15 2023-05-23 湖南快乐阳光互动娱乐传媒有限公司 广告投放的反作弊方法及系统
CN113592036A (zh) * 2021-08-25 2021-11-02 北京沃东天骏信息技术有限公司 流量作弊行为识别方法、装置及存储介质和电子设备
CN114926221A (zh) * 2022-05-31 2022-08-19 北京奇艺世纪科技有限公司 作弊用户识别方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN104765874B (zh) 2019-03-26
CN104765874A (zh) 2015-07-08

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15889680; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 15889680; Country of ref document: EP; Kind code of ref document: A1)