Commit Graph

81 Commits

SHA1 Message Date
6cd6994aff Fix docker image 2022-12-28 15:28:27 +08:00
a5f5e6fbdb Fix docker build 2022-12-28 11:51:11 +08:00
e866541ae1 Fix puppeteer launch in head mode 2022-11-17 11:28:46 +08:00
e75e49a7b4 Remove logging dependencies in puppeteer-parse 2022-11-17 11:28:26 +08:00
db8c9cf97d Add function-framework dependency 2022-11-16 10:52:07 +08:00
d6e687d5d1 Update env example 2022-11-16 10:15:49 +08:00
b18af10e75 Import puppeteer-parse in content-fetch 2022-11-16 10:15:49 +08:00
00fed8a0fb Remove content-fetch-gcf and create a Dockerfile for the cloud function 2022-11-16 10:15:49 +08:00
623bb8780c Call puppeteer module from content-fetch 2022-11-16 10:15:49 +08:00
cb858484c6 Make puppeteer-parse a module 2022-11-16 10:15:49 +08:00
b5926ccf1c Get old tweet thread with puppeteer and new tweet with twitter api 2022-10-26 20:41:51 +08:00
bc9b50c3cb Remove dockerfile-local 2022-10-06 12:57:30 +08:00
d6e465d482 Add Dockerfile for pdfHandler 2022-10-04 15:28:12 +08:00
53d6afe25f Fix tests 2022-10-04 10:47:58 +08:00
9cae703666 Fix Dockerfile 2022-10-04 10:20:13 +08:00
4b01fccad8 Fix content-fetch dockerfile 2022-10-03 14:21:31 +08:00
a9607adfd3 Import content-handler as local dependency 2022-10-03 11:11:24 +08:00
99956539a0 Handle newsletter in content-handlers 2022-09-30 12:51:22 +08:00
206d795c54 Import content-handler in puppeteer 2022-09-30 12:51:22 +08:00
8c61832c77 Import content-handler in content-fetch 2022-09-30 12:51:22 +08:00
cb609d893e Escape HTML entities in puppeteer-parse 2022-09-23 16:40:32 +08:00
aef83ee958 Escape HTML entities in Twitter title and description 2022-09-23 16:33:57 +08:00
7656b37e1b Escape youtube title and author name 2022-09-23 16:16:25 +08:00
e52013ccb1 It seems to have some issue with disabling puppeteer timeout by setting it to 0.
So I set the timeout to 2 minutes which should be enough and it works in my local env
2022-08-26 11:54:33 +08:00
d12f3642e6 Bump puppeteer-core from 15.3.2 to 16.1.0
Bumps [puppeteer-core](https://github.com/puppeteer/puppeteer) from 15.3.2 to 16.1.0.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/v15.3.2...v16.1.0)

---
updated-dependencies:
- dependency-name: puppeteer-core
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-08-11 02:15:16 +00:00
f17ee64676 Use ScrapingBee for some hosts 2022-07-16 14:09:45 -07:00
11f20ab64a Revert "close browser when request finished"
This reverts commit 7e68ad5237.
2022-07-15 15:35:07 +08:00
7e68ad5237 close browser when request finished 2022-07-15 15:23:02 +08:00
b2238ce7f2 revert no-sandbox 2022-07-15 14:43:17 +08:00
ed09d78980 remove no-sandbox 2022-07-15 14:32:02 +08:00
d9bb664fc0 remove puppeteer dependency in docker 2022-07-15 14:15:31 +08:00
9191f5710c remove single-process arg 2022-07-15 14:04:41 +08:00
4929bae81b close context if encounter error 2022-07-15 11:49:36 +08:00
610c790a7e do not use puppeteer-extra plugin 2022-07-15 11:09:57 +08:00
bb7ea78e8f Bump puppeteer-core from 13.7.0 to 15.3.2
Bumps [puppeteer-core](https://github.com/puppeteer/puppeteer) from 13.7.0 to 15.3.2.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/v13.7.0...v15.3.2)

---
updated-dependencies:
- dependency-name: puppeteer-core
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-11 21:35:51 +00:00
c94d5db259 Merge pull request #889 from omnivore-app/dependabot/npm_and_yarn/axios-0.27.2
Bump axios from 0.26.0 to 0.27.2
2022-07-08 13:48:27 -07:00
01353add63 Shorten the timeout requesting pages
I believe our process is sometimes being terminated before this
timeout is hit, which means we then don't have time to fetch
with a fallback.
2022-07-05 11:16:11 -07:00
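Commits 01353add63 and 3a79710dbf describe a time budget: the primary page fetch has to give up early enough that a fallback can still run before the process is terminated, and any exception should trigger the fallback. Below is a minimal sketch of that pattern. It assumes the primary (for example, Puppeteer) and fallback (for example, ScrapingBee) fetchers are passed in as async functions. `withTimeout` and `fetchWithFallback` are hypothetical helpers, and no timeout values are taken from the repository.

```typescript
// Hypothetical helpers; not code from this repository.

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try the primary fetcher under a short timeout. On any error, including the
// timeout, use the fallback. The primary timeout should leave enough of the
// overall request deadline for the fallback to finish.
async function fetchWithFallback(
  url: string,
  primary: (url: string) => Promise<string>,
  fallback: (url: string) => Promise<string>,
  primaryTimeoutMs: number,
): Promise<string> {
  try {
    return await withTimeout(primary(url), primaryTimeoutMs);
  } catch {
    return fallback(url);
  }
}

// Usage with stub fetchers: the primary needs 500 ms but only gets 50 ms.
const slowPrimary = (u: string) =>
  new Promise<string>((resolve) => setTimeout(() => resolve(`primary:${u}`), 500));
const quickFallback = async (u: string) => `fallback:${u}`;

fetchWithFallback("https://example.com", slowPrimary, quickFallback, 50).then(console.log);
// fallback:https://example.com
```

The sketch does not cancel the primary work when it times out. A real Puppeteer fetch would also need its page or browser context closed so it does not keep consuming time and memory after the fallback has been used.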
9554f8f6ba Create a scrapingbee url when using the fallback
Javascript hoists variables to the top of scope, so `url` here
refers to the `url` variable defined lower in the block.
2022-07-05 08:41:34 -07:00
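The hoisting note in commit 9554f8f6ba can be illustrated with a small sketch. The commit does not say whether `url` was declared with `var`, `let`, or `const`. This example assumes a block-scoped `const`. `shadowingDemo` and `buildScrapingBeeUrl` are hypothetical names. The ScrapingBee endpoint and its `api_key`/`url` query parameters are assumptions based on ScrapingBee's public HTTP API and are not taken from this repository.

```typescript
// Hypothetical illustration; not code from this repository.
// A `const`/`let` declaration is hoisted to the top of its block, so every
// reference to `url` inside the block resolves to the inner declaration,
// even references written above it. Read directly before the declaration,
// it would throw a ReferenceError (temporal dead zone). Read from a closure
// that runs later, it silently yields the inner value.
function shadowingDemo(): string {
  const url = "https://example.com/article";
  {
    const describeFetch = () => `fetching ${url}`; // refers to the inner `url` below
    const url = "https://app.scrapingbee.com/api/v1/";
    return describeFetch();
  }
}

// One way to avoid the shadowing: give the fallback URL its own name and
// build it from the page URL explicitly.
// Assumption: ScrapingBee's endpoint and `api_key`/`url` query parameters.
function buildScrapingBeeUrl(pageUrl: string, apiKey: string): string {
  const params = new URLSearchParams({ api_key: apiKey, url: pageUrl });
  return `https://app.scrapingbee.com/api/v1/?${params.toString()}`;
}

console.log(shadowingDemo());
// fetching https://app.scrapingbee.com/api/v1/
console.log(buildScrapingBeeUrl("https://example.com/a?b=1", "KEY"));
// https://app.scrapingbee.com/api/v1/?api_key=KEY&url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1
```

`URLSearchParams` percent-encodes the page URL, so query strings in the original page URL survive as a single `url` parameter.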
3a79710dbf Always fall back to scrapingbee if there is an exception 2022-07-05 21:48:58 +08:00
37075f076e Remove userDataDir 2022-06-29 22:56:14 +08:00
e91f25e58c Bump axios from 0.26.0 to 0.27.2
Bumps [axios](https://github.com/axios/axios) from 0.26.0 to 0.27.2.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/v0.27.2/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v0.26.0...v0.27.2)

---
updated-dependencies:
- dependency-name: axios
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-06-27 21:31:14 +00:00
feb197c731 Fix a crash when parsing content fetches that are blocked 2022-06-22 14:55:37 -07:00
a9b3a5c925 Merge pull request #805 from omnivore-app/fix/duplicate-content
Remove duplicate content
2022-06-19 10:56:10 +08:00
1d99bfaa10 Use a different Dockerfile for content-fetch with App Engine and docker-compose 2022-06-17 17:12:33 -07:00
ddaac82653 Fix content-fetch on docker compose 2022-06-17 14:59:42 -07:00
58814e1854 Run the content-fetch service in docker compose 2022-06-17 14:19:06 -07:00
71f8834477 Fix detection of medium subdomains 2022-06-17 09:25:42 -07:00
486f22a594 remove redundant async 2022-06-15 22:31:55 +08:00
2aafd39650 add fastcompany.com to the non-script hosts list 2022-06-15 22:27:25 +08:00
e028e2e440 generate test page for fast company 2022-06-15 21:22:25 +08:00