Commit Graph

22 Commits

SHA1 Message Date
f17ee64676 Use ScrapingBee for some hosts 2022-07-16 14:09:45 -07:00
2660262c69 Use puppeteer-core 2022-07-15 11:43:55 -07:00
2447bd658e Use chrome-aws-lambda in GCF 2022-07-15 10:58:58 -07:00
d404cd7c4c fix comment 2022-07-15 21:41:06 +08:00
1f1698ea81 sync changes to content-fetch-gcf 2022-07-15 15:11:41 +08:00
0cc7e84a82 Fix content not getting parsed by linkedom properly without <html> tag by replacing innerHtml with outerHtml 2022-05-18 15:52:16 +08:00
8f0447ed3f Stop blocking images and css file 2022-05-18 15:50:52 +08:00
0e31a40331 Use chrome-aws-lambda in the puppeteer GCF 2022-05-13 16:48:51 -07:00
f5003c1370 Stop blocking script 2022-05-13 12:17:19 +08:00
37e55add98 Stop blocking stylesheet and media 2022-05-13 12:09:05 +08:00
60bbbb6cf3 Block requests to 'font', 'image', 'stylesheet', 'script', 'media' in puppeteer 2022-05-12 17:10:38 +08:00
9606cd6b28 Remove chrome-aws-lambda dependencies 2022-05-12 16:32:22 +08:00
0984dca183 Remove adblocker and block resources by url and also block mathJax script 2022-05-11 22:04:47 +08:00
0b11c31317 Add linkedom dependency in packages/api 2022-05-10 18:31:25 +08:00
4c7f6d0281 Update comments 2022-05-09 13:45:45 +08:00
4571f1f51c Add metrics 2022-05-09 13:45:45 +08:00
21799b7b6d Add puppeteer-stealth and puppeteer-ad-block plugin and a user-data-dir to reduce processing time 2022-05-09 13:45:45 +08:00
6f29f18743 Parse image and save it in an <img> element 2022-05-05 12:13:08 +08:00
b679451548 Fix parsing articles from www.derstandard.at (#459) 2022-04-22 10:53:28 +08:00
  * Fix parsing articles from www.derstandard.at
  * slim cookies down
46b526961a Dockerize the puppeteer-parse service and add to docker-compose 2022-02-12 13:14:00 -08:00
42836b6b38 Simplify startup of the puppeteer service 2022-02-11 14:44:32 -08:00
  - Run on port 9090 so we don't conflict with other services
  - Route the docker-compose requests to the host network
  - Don't require preview bucket information on startup
84f32935f5 Open source omnivore 2022-02-11 09:24:33 -08:00