Commit Graph

68 Commits

SHA1 Message Date
7bfd36e308 Fix readability not run in puppeteer-parse 2023-01-12 18:33:47 +08:00
7b749f974d Convert HTML to markdown if specified 2023-01-05 12:12:48 +08:00
7c39db207b Replace createArticle with savePage in puppeteer-parse service 2022-12-28 10:15:05 +08:00
e9b31e375f Update savePage API to accept parsed content in the param 2022-12-27 21:59:31 +08:00
080a1982b2 Remove unused import 2022-11-21 22:39:10 +08:00
9741f6b12d Add highlights to newly saved pages if they contain highlight markers 2022-11-21 22:39:10 +08:00
a7e92addb0 Create 128 * 128 proxy image for the site icon 2022-10-18 15:32:30 +08:00
49ed8e279b Calls preParseContent function in content-handler package before parsing content 2022-10-07 16:56:38 +08:00
679977805f Remove custom content-handler in packages/api 2022-10-07 16:56:38 +08:00
8942398092 Change GET to HEAD 2022-09-30 12:52:09 +08:00
9841ce7f8e Remove parsing newsletter emails from forwarded emails 2022-09-30 12:51:23 +08:00
2f6b26d21d Use GET request as some hosts do not allow HEAD requests 2022-09-28 20:18:40 +08:00
a55ad135fd Set 5s for the HEAD request timeout 2022-09-28 19:26:42 +08:00
a6795b380a Fetch favicon from url 2022-09-28 19:02:29 +08:00
56e3ccadf0 Add test case for Axios AM newsletter 2022-09-21 12:16:13 +08:00
b03f7ebeb8 Add handler to pre-process morning brew newsletters 2022-09-15 16:21:09 +08:00
0cf689ce21 Fix not getting revue newsletter url by checking all the hyperlinks in the table tr td 2022-08-19 17:32:29 +08:00
dee94f7c93 Fix a bug for converting text to speech for articles with less than 5000 characters 2022-08-18 19:24:38 +08:00
447e413605 Add function to parse HTML to SSML 2022-08-18 19:24:38 +08:00
565da42b46 When parsing newsletters with no url don't fetch the generated url 2022-07-28 11:05:17 -07:00
3a120b8f47 Return empty name if name not found in from header 2022-07-28 10:37:28 +08:00
c99c1db57e Add support to the case when from address is in Name <address> format 2022-07-28 10:26:15 +08:00
6f11ccacb1 Save article from forwarding emails 2022-07-27 12:15:28 +08:00
d184ca8d04 Add function isProbablyArticle to test if a forwarded email contains an article to save 2022-07-27 12:15:26 +08:00
65f28fe708 add revue.email to the revue newsletter attribute selectors 2022-07-18 10:44:39 +08:00
345c020dea fix not saving Logseq weekly newsletter due to no url 2022-07-17 12:36:52 +08:00
c4a599d2ba support newsletters hosted on convertkit.com 2022-07-12 11:47:38 +08:00
9599edb9fb Add support for newsletters hosted on getrevue.co 2022-07-06 22:06:51 +08:00
2e77241044 Check and get Revue newsletter url from email 2022-07-06 22:06:21 +08:00
404805e0c0 Make async calls to parse() 2022-05-31 22:51:00 +08:00
7d4d1d7b67 Parse language in readability 2022-05-26 10:55:21 +08:00
2ee95a1c14 Fix cannot convert null to object error message 2022-05-20 20:38:30 +08:00
eadeccce81 Linting fixes 2022-05-18 11:31:42 -07:00
004c766588 If parsing fails, attempt adding <html> wrappers to a document 2022-05-18 10:55:31 -07:00
    LinkedDom seems less forgiving and expects the outerHTML of a
    document, however older extension versions still send innerHTML.
d68549bcb7 Remove unused code 2022-05-18 15:52:31 +08:00
e76fb02f43 Fix window is not defined for parsing code blocks 2022-05-17 11:01:55 +08:00
1b8850ed33 Fix tests 2022-05-12 17:41:11 +08:00
602d141dec Rename doc to dom 2022-05-12 11:00:32 +08:00
a78a6c6ba4 Replace DomWindow with Document in handlers 2022-05-10 17:01:23 +08:00
6a57281e74 Remove DomWindow usage 2022-05-10 17:00:56 +08:00
acc7654a2f Replace jsdom with linkedom 2022-05-10 16:59:09 +08:00
5698790288 Pass url to readability 2022-05-10 16:53:45 +08:00
59a2639b7d Reduce http call to get jsonld data if title or content or sitename or byline exists 2022-05-09 13:45:45 +08:00
a457c9d128 Update article content only when code blocks exist 2022-05-09 13:45:45 +08:00
eaad96acdd Return parsed dom back to backend 2022-05-06 12:29:08 +08:00
5f5076e864 Highlight code element without reinitialize jsdom 2022-05-06 12:20:54 +08:00
7c6b810522 Remove redundant JSDOM 2022-05-06 10:53:36 +08:00
6d405432af add site_name and site_icon to page model and return in resolver (#341) 2022-03-30 10:43:10 +08:00
    * add site_name and site_icon to page model and return in resolver
    * fix tests
ff1200f3a1 Use html decoding when getting values from fetched oembed 2022-03-16 15:29:42 -07:00
    If we fetch oembed data from an external source, instead of
    handling it in readabilityjs we need to html decode it.
2184c2a8d3 Parse online URLs for beehiiv newsletters 2022-03-07 15:49:44 -08:00