Commit Graph

166 Commits

SHA1 Message Date
066883a84d remove unused dependencies 2024-07-24 12:51:25 +08:00
29a5b20d2c remove scrapingbee from content-fetch 2024-07-24 12:17:13 +08:00
dfc0ce0e54 create a context for each new page 2024-07-10 17:20:04 +08:00
e61da03ed4 wait until body is fetched 2024-07-10 16:01:39 +08:00
431c4cc098 use old headless mode for better performance 2024-07-10 15:13:16 +08:00
75338f5927 bypass cloudflare captcha 2024-07-10 14:43:47 +08:00
73e180f43d add more dependencies to docker container 2024-07-09 19:16:21 +08:00
cd83199eb3 remove hardcode gpu vendor as it does not work 2024-07-09 18:18:11 +08:00
c0f45e2411 upgrade puppeteer-core 2024-07-09 18:06:42 +08:00
0899b8fc8f use swAngle 2024-07-09 14:31:51 +08:00
c75cbb39d6 injecting webgl fingerprint 2024-07-09 14:11:31 +08:00
2c15c21bf1 remove user-agent 2024-07-08 18:59:53 +08:00
728059c6f8 do not cache some urls 2024-07-05 19:05:36 +08:00
81fbaf9807 inject fingerprint 2024-07-05 18:52:20 +08:00
a6653414e8 fix: use software graphic rendering instead of gpu and reduce browser launch timeout to 10 seconds 2024-07-05 12:13:00 +08:00
1eb1d25960 remove specific user-data-dir 2024-07-04 19:34:23 +08:00
38a3e03780 improve args 2024-07-04 19:32:11 +08:00
b38b28c75e create a browser singleton instance and checks browser existence before creating context 2024-07-04 19:12:42 +08:00
dde9f16396 put error message in the analytic event 2024-05-17 16:16:44 +08:00
f43c48e376 reduce chromium launch timeout to 30 seconds 2024-05-17 14:27:59 +08:00
293ed87100 remove redundant response from return value 2024-05-16 12:21:11 +08:00
cd315fa6c6 remove redundant assignment 2024-05-14 13:07:42 +08:00
484676750e reconnect/restart browser if it crashed/lost connections 2024-05-14 13:04:17 +08:00
a924c8448b capture content-fetch success and error events 2024-05-13 14:55:48 +08:00
d886c3b7d0 catch puppeteer page error 2024-05-13 14:35:47 +08:00
d23bccf459 upgrade puppeteer-core to prevent ProtocolTimeout and adding more debug logs 2024-05-13 14:28:26 +08:00
0ac5299c32 do not pass browser instance to content-handler 2024-05-13 13:10:02 +08:00
86e637febd add more logs to debug browser context 2024-05-13 12:56:20 +08:00
475c636c1a print browser log 2024-05-13 12:55:15 +08:00
88a7e8d85b fix tests 2024-04-04 12:17:15 +08:00
0e46dc2302 save dir in the database 2024-03-04 12:28:51 +08:00
5e239d2568 run readability in save-page instead of puppeteer 2024-01-25 16:30:59 +08:00
94dd4be659 fix: page content not saved when title is empty but content is not 2024-01-23 16:47:42 +08:00
1411cf074e fix: finalUrl defaults to the url of the page saved 2024-01-23 14:14:54 +08:00
a03eee5ef7 fix dependecies 2024-01-18 18:48:46 +08:00
d9feb740cb convert content-fetch to typescript 2024-01-18 18:48:46 +08:00
cd3402b98a rewrite puppeteer in typescript 2024-01-18 18:48:46 +08:00
51e586ed3d separate content-fetch in puppeteer packages from saving page content 2024-01-18 18:48:46 +08:00
ad63c75e63 fix typo 2023-12-08 11:29:03 +08:00
3759e10615 fix feed url in pdf file not saved 2023-12-08 11:29:02 +08:00
d09ec51136 Merge pull request #3182 from omnivore-app/fix/importer-notification 2023-11-28 14:59:52 +08:00
b10b704da3 fix importer metrics not updated when failed to catch invalid url in the list 2023-11-28 12:14:27 +08:00
fd781644f1 feat: fetch content for rss feed items in following folder 2023-11-23 18:03:25 +08:00
c4773dc904 Landing page improvements and various supporting improvements 2023-10-24 09:43:39 +01:00
1b1cce7485 disable javascript for the host 2023-10-20 18:59:22 +08:00
d746510358 cont 2023-10-19 21:50:16 +08:00
f750648824 fix importer triggers thumbnailer unexpectedly 2023-10-19 21:46:43 +08:00
0fcc7096aa docs: fix typo in packages/puppeteer-parse/README.md 2023-10-18 17:33:22 +05:45
00bd183287 do not retry importer job if user account is deleted 2023-10-16 16:33:22 +08:00
5f6be169bd add savedAt and publishedAt to saveUrl api 2023-08-14 17:10:34 +08:00