Merge remote-tracking branch 'origin/master'
Qianlitp committed Dec 6, 2022
2 parents 28f31d8 + 9d6f751 commit f96cbf1
Showing 4 changed files with 6 additions and 5 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -182,7 +182,6 @@ https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-te
* `--form-keyword-values, -fkv` Customize the value of the form fill, set by keyword fuzzy match. The keyword matches the four attribute values of `id`, `name`, `class`, `type` of the input box label. For example, fuzzy match the pass keyword to fill 123456 and the user keyword to fill admin, `-fkv user=admin -fkv pass=123456`. (Default: Crawlergo)
### Advanced settings for the crawling process
* `--incognito-context, -i` Browser start incognito mode. (Default: true)
* `--max-tab-count Number, -t Number` The maximum number of tabs the crawler can open at the same time. (Default: 8)
* `--tab-run-timeout Timeout` Maximum runtime for a single tab page. (Default: 20s)
* `--wait-dom-content-loaded-timeout Timeout` The maximum timeout to wait for the page to finish loading. (Default: 5s)
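
The `--form-keyword-values` description in this hunk can be made concrete with a small sketch. It is not crawlergo's implementation: it assumes "fuzzy match" means a case-insensitive substring test against the four listed attributes, and `inputAttrs`, `keywordValue`, `parseKeywordValue`, and `fillValue` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// inputAttrs holds the four attributes the README says are matched.
// Hypothetical type, not taken from crawlergo.
type inputAttrs struct {
	ID, Name, Class, Type string
}

// keywordValue is one parsed "-fkv keyword=value" pair (hypothetical type).
type keywordValue struct {
	Keyword, Value string
}

// parseKeywordValue splits an argument such as "user=admin" at the first "=".
func parseKeywordValue(arg string) (keywordValue, error) {
	k, v, ok := strings.Cut(arg, "=")
	if !ok || k == "" {
		return keywordValue{}, fmt.Errorf("invalid keyword=value pair: %q", arg)
	}
	return keywordValue{Keyword: k, Value: v}, nil
}

// fillValue returns the value of the first rule whose keyword occurs,
// ignoring case, in any of the four attributes. If no rule matches it
// returns def. Substring matching is an assumption of this sketch.
func fillValue(in inputAttrs, rules []keywordValue, def string) string {
	for _, r := range rules {
		kw := strings.ToLower(r.Keyword)
		for _, attr := range []string{in.ID, in.Name, in.Class, in.Type} {
			if strings.Contains(strings.ToLower(attr), kw) {
				return r.Value
			}
		}
	}
	return def
}

func main() {
	var rules []keywordValue
	for _, arg := range []string{"user=admin", "pass=123456"} {
		kv, err := parseKeywordValue(arg)
		if err != nil {
			fmt.Println(err)
			return
		}
		rules = append(rules, kv)
	}
	fmt.Println(fillValue(inputAttrs{Name: "username"}, rules, "fallback"))
	fmt.Println(fillValue(inputAttrs{Type: "password"}, rules, "fallback"))
}
```

Rules are tried in the order given, so an earlier `-fkv` pair wins when two keywords match the same input in this sketch.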
1 change: 0 additions & 1 deletion README_zh-cn.md
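@@ -114,7 +114,6 @@ crawlergo has flexible parameter configuration; the options are described in detail below:
* `--filter-mode Mode, -f Mode` Filter mode. Simple: filters only static resources and exact duplicate requests. Smart: can also filter pseudo-static requests. Strict: applies stricter pseudo-static filtering rules.
* `--output-mode value, -o value` Result output mode. `console`: prints the results for the current domain. `json`: prints all results as a JSON-serialized string that can be deserialized directly. `none`: prints nothing.
* `--output-json filepath` Serializes the crawl results to JSON and writes them to a JSON file.
* `--incognito-context, -i` Starts the browser in incognito mode.
* `--max-tab-count Number, -t Number` The maximum number of tabs the crawler opens at once, i.e. the number of pages crawled concurrently.
* `--fuzz-path` Fuzzes the target with common paths to discover more entry points.
* `--fuzz-path-dict` Customizes the fuzz directories through a dictionary file. Pass the dictionary file path, e.g. `/home/user/fuzz_dir.txt`; each line of the file is one directory to fuzz.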
1 change: 0 additions & 1 deletion cmd/crawlergo/flag.go
@@ -40,7 +40,6 @@ func SetChromePath() *cli.PathFlag {
Name: "chromium-path",
Aliases: []string{"c"},
Usage: "`Path` of chromium executable. Such as \"/home/test/chrome-linux/chrome\"",
Required: true,
Destination: &taskConfig.ChromiumPath,
EnvVars: []string{"CRAWLERGO_CHROMIUM_PATH"},
}
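
Read together with the `len(chromiumPath) > 0` guard in `pkg/engine/browser.go`, this hunk appears to make the Chromium path optional: it may come from the flag, from `CRAWLERGO_CHROMIUM_PATH`, or be left empty. The sketch below shows that precedence using only the standard library. `resolveChromiumPath` is a hypothetical helper, and the flag-before-environment order is an assumption about how the CLI library resolves values, not something this diff states.

```go
package main

import (
	"fmt"
	"os"
)

// chromiumPathEnv is the environment variable named in the flag definition.
const chromiumPathEnv = "CRAWLERGO_CHROMIUM_PATH"

// resolveChromiumPath returns the flag value if set, otherwise the
// environment value, otherwise "" to signal that no explicit path was given.
// lookupEnv is injected so the function can be exercised without touching
// the real environment. Hypothetical helper for illustration.
func resolveChromiumPath(flagValue string, lookupEnv func(string) (string, bool)) string {
	if flagValue != "" {
		return flagValue
	}
	if v, ok := lookupEnv(chromiumPathEnv); ok && v != "" {
		return v
	}
	return ""
}

func main() {
	path := resolveChromiumPath("", os.LookupEnv)
	if path == "" {
		fmt.Println("no explicit Chromium path; leaving discovery to the browser launcher")
		return
	}
	fmt.Println("using Chromium at", path)
}
```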
8 changes: 6 additions & 2 deletions pkg/engine/browser.go
@@ -25,8 +25,6 @@ func InitBrowser(chromiumPath string, extraHeaders map[string]interface{}, proxy
var bro Browser
opts := append(chromedp.DefaultExecAllocatorOptions[:],

// Executable path
chromedp.ExecPath(chromiumPath),
// Headless mode
chromedp.Flag("headless", !noHeadless),
// https://github.com/chromedp/chromedp/issues/997#issuecomment-1030596050
@@ -59,6 +57,12 @@ func InitBrowser(chromiumPath string, extraHeaders map[string]interface{}, proxy
opts = append(opts, chromedp.ProxyServer(proxy))
}

if len(chromiumPath) > 0 {

// Specify the executable path
opts = append(opts, chromedp.ExecPath(chromiumPath))
}

allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
bctx, _ := chromedp.NewContext(allocCtx,
chromedp.WithLogf(log.Printf),
