Commit Graph

56 Commits

Author SHA1 Message Date
f17ee64676 Use ScrapingBee for some hosts 2022-07-16 14:09:45 -07:00
11f20ab64a Revert "close browser when request finished"
This reverts commit 7e68ad5237.
2022-07-15 15:35:07 +08:00
7e68ad5237 close browser when request finished 2022-07-15 15:23:02 +08:00
b2238ce7f2 revert no-sandbox 2022-07-15 14:43:17 +08:00
ed09d78980 remove no-sandbox 2022-07-15 14:32:02 +08:00
d9bb664fc0 remove puppeteer dependency in docker 2022-07-15 14:15:31 +08:00
9191f5710c remove single-process arg 2022-07-15 14:04:41 +08:00
4929bae81b close context if an error is encountered 2022-07-15 11:49:36 +08:00
610c790a7e do not use puppeteer-extra plugin 2022-07-15 11:09:57 +08:00
bb7ea78e8f Bump puppeteer-core from 13.7.0 to 15.3.2
Bumps [puppeteer-core](https://github.com/puppeteer/puppeteer) from 13.7.0 to 15.3.2.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/v13.7.0...v15.3.2)

---
updated-dependencies:
- dependency-name: puppeteer-core
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-11 21:35:51 +00:00
c94d5db259 Merge pull request #889 from omnivore-app/dependabot/npm_and_yarn/axios-0.27.2
Bump axios from 0.26.0 to 0.27.2
2022-07-08 13:48:27 -07:00
01353add63 Shorten the timeout requesting pages
I believe our process is sometimes being terminated before this
timeout is hit, which means we then don't have time to fetch
with a fallback.
2022-07-05 11:16:11 -07:00
9554f8f6ba Create a ScrapingBee URL when using the fallback
JavaScript hoists variable declarations to the top of their scope, so `url` here
refers to the `url` variable declared lower in the block.
2022-07-05 08:41:34 -07:00
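
The hoisting behaviour described in the commit above can be reproduced with a minimal sketch. The function names and URLs here are hypothetical and are not taken from the repository; the commit does not say whether the original code used `var`, `let`, or `const`, so both outcomes are shown.

```javascript
// Illustrative only: names and URLs are hypothetical, not from the repository.
function hoistingDemo() {
  const url = 'https://example.com/original';

  function withVar() {
    // `var url` below is hoisted to the top of this function, so this line
    // reads the inner (still undefined) binding, not the outer `url`.
    const seen = url;
    var url = 'https://example.com/fallback';
    return seen;
  }

  function withLet() {
    try {
      // `let url` is also bound for the whole block, but reading it before
      // the declaration throws instead of yielding undefined.
      const seen = url;
      let url = 'https://example.com/fallback';
      return seen;
    } catch (err) {
      return err.name;
    }
  }

  return { withVar: withVar(), withLet: withLet() };
}
```

Calling `hoistingDemo()` returns `{ withVar: undefined, withLet: 'ReferenceError' }`: in neither case does the early read reach the outer `url`.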
3a79710dbf Always fall back to scrapingbee if there is an exception 2022-07-05 21:48:58 +08:00
37075f076e Remove userDataDir 2022-06-29 22:56:14 +08:00
e91f25e58c Bump axios from 0.26.0 to 0.27.2
Bumps [axios](https://github.com/axios/axios) from 0.26.0 to 0.27.2.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v0.27.2/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v0.26.0...v0.27.2)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-27 21:31:14 +00:00
feb197c731 Fix a crash when parsing content fetches that are blocked 2022-06-22 14:55:37 -07:00
a9b3a5c925 Merge pull request #805 from omnivore-app/fix/duplicate-content
Remove duplicate content
2022-06-19 10:56:10 +08:00
1d99bfaa10 Use a different Dockerfile for content-fetch with App Engine and docker-compose 2022-06-17 17:12:33 -07:00
ddaac82653 Fix content-fetch on docker compose 2022-06-17 14:59:42 -07:00
58814e1854 Run the content-fetch service in docker compose 2022-06-17 14:19:06 -07:00
71f8834477 Fix detection of medium subdomains 2022-06-17 09:25:42 -07:00
486f22a594 remove redundant async 2022-06-15 22:31:55 +08:00
2aafd39650 add fastcompany.com to the non-script hosts list 2022-06-15 22:27:25 +08:00
e028e2e440 generate test page for fast company 2022-06-15 21:22:25 +08:00
486f3c930b Remove PROXY_URL from content-fetch 2022-06-14 20:30:02 -07:00
ec5bbb8350 Return URL as a string 2022-06-14 16:23:07 -07:00
be2801477b Add some extra debugging 2022-06-14 16:13:43 -07:00
159a7f8950 Fall back to ScrapingBee if a page can't fetch content 2022-06-14 16:06:01 -07:00
814f6098a3 Log proxy url 2022-06-14 14:27:19 -07:00
a4ad78652a Allow specifying a proxy url when launching puppeteer 2022-06-14 13:30:32 -07:00
b94215f1fc Allow selectively disabling javascript on some hosts
Some hosts' readability is improved by disabling JavaScript
2022-06-10 13:25:14 -07:00
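
A per-host JavaScript switch like the one described in the commit above might look roughly like this. It is a sketch under assumptions: the constant `NON_SCRIPT_HOSTS` and the function `shouldDisableJavaScript` are hypothetical names, and the only host listed is fastcompany.com, which a later commit in this log adds to the non-script hosts list.

```javascript
// Hypothetical host list; the real list in the repository is not shown in this log.
const NON_SCRIPT_HOSTS = ['fastcompany.com'];

// Returns true when the URL's host is a listed host or one of its subdomains.
function shouldDisableJavaScript(pageUrl) {
  const { hostname } = new URL(pageUrl);
  return NON_SCRIPT_HOSTS.some(
    (host) => hostname === host || hostname.endsWith(`.${host}`)
  );
}

// With puppeteer, the result would typically be applied before navigation:
//   await page.setJavaScriptEnabled(!shouldDisableJavaScript(pageUrl));
```

Matching on the exact host or a dot-prefixed suffix avoids false positives on unrelated domains that merely end with the same characters.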
cb98a9cf86 Make clients opt into creating a page when uploading a file 2022-05-26 21:40:40 -07:00
0cc7e84a82 Fix content not being parsed properly by linkedom without an <html> tag by replacing innerHTML with outerHTML 2022-05-18 15:52:16 +08:00
8f0447ed3f Stop blocking images and CSS files 2022-05-18 15:50:52 +08:00
629aa54c58 Fix youtube handler 2022-05-18 11:28:33 +08:00
ca662964e6 Fix not getting youtube video id from url 2022-05-17 21:51:03 +08:00
745f55a843 Set headless=true 2022-05-14 10:47:15 +08:00
80c14cd6ca Remove single-process from chromium args 2022-05-14 10:37:06 +08:00
7bfb8cfee4 Merge pull request #597 from omnivore-app/remove-chrome-aws-lambda
Optimize puppeteer and remove chrome-aws-lambda dependencies
2022-05-13 16:12:24 -07:00
87b11277d1 Add token verification to the content-fetch service 2022-05-13 15:14:09 -07:00
6f09a4b31a Fix missing variable name in medium handler 2022-05-13 17:47:21 +08:00
f5003c1370 Stop blocking script 2022-05-13 12:17:19 +08:00
37e55add98 Stop blocking stylesheet and media 2022-05-13 12:09:05 +08:00
ad99f933e5 Fix tests cont 2022-05-12 17:53:28 +08:00
60bbbb6cf3 Block requests to 'font', 'image', 'stylesheet', 'script', 'media' in puppeteer 2022-05-12 17:10:38 +08:00
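
Blocking by resource type, as in the commit above, is commonly done with puppeteer request interception. The following is a minimal sketch, not the repository's code: `BLOCKED_RESOURCE_TYPES` and `shouldBlock` are hypothetical names, and the puppeteer wiring is left as a comment so the block runs without a browser.

```javascript
// Resource types named in the commit message above.
const BLOCKED_RESOURCE_TYPES = new Set([
  'font',
  'image',
  'stylesheet',
  'script',
  'media',
]);

// Hypothetical helper: decides whether a request of the given type is dropped.
function shouldBlock(resourceType) {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Possible puppeteer wiring (assumes an existing `page` object):
//   await page.setRequestInterception(true);
//   page.on('request', (request) => {
//     if (shouldBlock(request.resourceType())) {
//       request.abort();
//     } else {
//       request.continue();
//     }
//   });
```

Later commits in this log stop blocking scripts, stylesheets, media, and images, which in a sketch like this would amount to removing those entries from the set.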
b766e17189 Remove jsdom in content-fetch 2022-05-12 16:48:59 +08:00
9606cd6b28 Remove chrome-aws-lambda dependencies 2022-05-12 16:32:22 +08:00
e1e0ddf7fc Merge pull request #582 from omnivore-app/optimize-parsing
Optimize parsing
2022-05-12 11:07:52 +08:00
0984dca183 Remove adblocker and block resources by url and also block mathJax script 2022-05-11 22:04:47 +08:00