Commit Graph

62 Commits

SHA1 Message Date
303165d47c Add icon to the ignored matchlist 2022-10-27 12:06:38 +08:00
990759da73 Save base64 encoded image site icon in page 2022-10-18 15:32:30 +08:00
3f82c0af01 Handle cases where tweet items don't have parents 2022-10-18 13:46:37 +08:00
57676d381c Better handle <a> elements with no parent when looking for tweet placeholders 2022-10-18 12:13:54 +08:00
57575c75aa Remove the subscriber dialog that was part of the header before 2022-10-12 14:31:34 +08:00
0116304284 Don't exclude substacks in article headers
Substack uses this class on header elements like <h1> to give
them anchors in the documents, but we mark content with `header`
in its class name as unlikely.
2022-10-12 14:27:31 +08:00
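The commit above describes content being marked as unlikely when its class name contains `header`, which also catches heading elements such as <h1>. A minimal sketch of one way to exempt headings, assuming a hypothetical `UNLIKELY_CLASS` pattern and a hypothetical `isUnlikelyCandidate` helper (neither is taken from the repository, and the actual fix may instead match a specific Substack class):

```typescript
// Hypothetical pattern of class-name fragments treated as unlikely content.
const UNLIKELY_CLASS = /header|footer|sidebar|comment/i

// Heading tags that should survive even if their class name mentions "header".
const HEADING_TAGS = new Set(['H1', 'H2', 'H3', 'H4', 'H5', 'H6'])

// Returns true when an element should be treated as an unlikely content candidate.
// Assumption: exempting by tag name is enough for the case described above.
function isUnlikelyCandidate(tagName: string, className: string): boolean {
  if (HEADING_TAGS.has(tagName.toUpperCase())) {
    return false
  }
  return UNLIKELY_CLASS.test(className)
}
```

With this sketch, a <div> whose class contains `header` is still flagged, while an <h1> with the same class is kept.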
7187326d90 Use li-date and post-tag selectors instead of post-meta as that usually has useful data 2022-10-04 15:06:54 +08:00
8a4777011f Improve readability of dev.to pages 2022-10-04 13:40:26 +08:00
3adf0fe428 Update dock removal for NYT 2022-09-28 18:21:49 +08:00
e8213eff97 Revert typo 2022-09-28 16:51:00 +08:00
9d223bb4e2 Better handling of NYT podcast transcripts 2022-09-28 16:31:35 +08:00
8ff356d3bc keep the market-info class in morning brew newsletters 2022-09-15 16:22:12 +08:00
fa731acd74 Create new newsletter if old one is deleted 2022-09-14 19:14:58 +08:00
6435a20246 Remove social links 2022-09-14 16:59:21 +08:00
1563ef9131 replace tables whose role is not presentation with divs for newsletters 2022-09-14 16:59:21 +08:00
1be0dd50ef Remove anchors whose class names contain "tw-text-substack-secondary", as they are used for Substack subscription 2022-09-08 22:19:27 +08:00
f0f7aa5a6d rename jsdom to linkedom 2022-07-14 10:59:44 +08:00
b317a0877b Remove foot and footnote from negative score list 2022-07-07 10:48:04 +08:00
7957cd2126 Do not remove content in .menu-opacity element 2022-06-27 22:27:38 +08:00
bcebe738ff Update Readability-readerable 2022-06-27 11:56:21 +08:00
a3e5d6d817 Remove ad 2022-06-27 11:40:18 +08:00
2504348936 Fix getting nodes from null document element 2022-06-27 11:09:38 +08:00
324c4ee6e7 Give .container-banners higher weight 2022-06-23 21:58:37 +08:00
86a9383b53 Do not remove content in .container-banners class 2022-06-23 21:11:23 +08:00
8535534709 Fix classname = null exception by checking element parent node nullability before checking classname 2022-06-16 18:09:23 +08:00
ff87371d21 Make the tweet a single child in the parent class=tweet element even if there is more than one twitter link in it 2022-06-13 17:36:15 +08:00
f8dd405c3f ignores link density for the links inside the .post-body div (the main content) 2022-06-09 16:37:03 +08:00
704726dc6a Improve parsing of channel news asia 2022-06-03 14:03:14 -07:00
0b0edd3e69 Make fetching tweet url async 2022-05-31 22:50:14 +08:00
417ed0a4eb Fetch tweet id from url 2022-05-31 20:02:54 +08:00
b6fef171be If we have a node with only one child element which has the placeholder class, keep it 2022-05-31 14:18:39 +08:00
22f5e1cc32 Fix embedded tweets getting deleted when simplifying nested elements 2022-05-31 13:42:16 +08:00
a34806a782 Fix tests 2022-05-26 10:55:21 +08:00
7d4d1d7b67 Parse language in readability 2022-05-26 10:55:21 +08:00
bdfa76d716 Remove listnav elements from articles 2022-05-12 15:58:24 -07:00
2755da16a9 Fix not getting iframe src 2022-05-11 19:25:12 +08:00
2152a9e466 Fix getting embedded class lists bug 2022-05-10 16:57:38 +08:00
76d47f7dc5 Fix updating live collections 2022-05-10 16:57:02 +08:00
79a941a2b6 Use options.url by default if it exists 2022-05-10 16:56:09 +08:00
eaad96acdd Return parsed dom back to backend 2022-05-06 12:29:08 +08:00
5f5076e864 Highlight code element without reinitializing jsdom 2022-05-06 12:20:54 +08:00
384c5dbf9f Improve rendering of the Financial Times 2022-05-05 09:14:51 -07:00
24373018af Return non-text elements if no text content found after parsing in readability 2022-05-05 19:26:39 +08:00
8386aebaf8 Remove ads from fiercepharma page 2022-05-04 14:22:08 -07:00
a24b976546 Remove lazy loaded srcset elements
Some tools, like Jetpack (https://jetpack.com/support/lazy-images/),
use a temporary srcset attribute set to a data image when lazy
loading; these are later removed by JS. We test whether there is
a valid src attribute and whether the srcset attribute is a data
embed, and remove the srcset if so.
2022-04-29 10:05:33 -07:00
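The check described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `ImgLike`, `hasValidSrc`, `isDataSrcset`, and `stripLazySrcset` are hypothetical names, and "valid src" is assumed to mean a non-empty value that is not itself a data URI.

```typescript
// Minimal structural type so the sketch works with a DOM element or a test stub.
interface ImgLike {
  getAttribute(name: string): string | null
  removeAttribute(name: string): void
}

// Assumption: a "valid" src is non-empty and is not a data: URI placeholder.
function hasValidSrc(src: string | null): boolean {
  if (src === null) return false
  const value = src.trim().toLowerCase()
  return value !== '' && !value.startsWith('data:')
}

// Assumption: a "data embed" srcset is one whose value starts with the data: scheme.
function isDataSrcset(srcset: string | null): boolean {
  return srcset !== null && srcset.trim().toLowerCase().startsWith('data:')
}

// Removes the placeholder srcset so the real src is used.
// Returns true if the attribute was removed.
function stripLazySrcset(img: ImgLike): boolean {
  if (hasValidSrc(img.getAttribute('src')) && isDataSrcset(img.getAttribute('srcset'))) {
    img.removeAttribute('srcset')
    return true
  }
  return false
}
```

Images whose srcset holds real candidate URLs, or whose src is missing, are left untouched by this sketch.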
d2bb359f5c Handle srcset image density specifications 2022-04-26 15:18:23 -07:00
72a231c97e Do not proxy image data uri (#421)
* do not proxy image data uri

* rename data uri

* rename data uri in test
2022-04-14 12:59:12 +08:00
4d01f689b2 replace tables of article content with divs for newsletters 2022-04-11 20:00:11 +08:00
da28998130 Pull ul list out of newsletter blurb
next/react doesn't want child elements of the paragraphs

Improve formatting

Improve wording

Use buttons in the subscribe directly blocks

Simplify docs on setting up forwarding rules

Add extra padding on bottom of help docs

Remove unused style

Add emails help page

Improve formatting

Prefetch page content on iOS

Reduce the reader overly length now that items are precached

Add invalidation when highlights are added to items

fix missing index_settings.json file in api dockerfile for creating elastic index (#363)

Handle full email address objects in the to param from sendgrid

These come in a format like:

"jacksonh-dfdf@inbox.omnivore.app" <jacksonh-dfdf@inbox-demo.omnivore.app>

New IDs for short highlights because they don't cascade delete now

Testing CI issues

Simplify test

CI test

Use promises for async tests

Temporarily remove test to debug CI

Re-enable

re-enable test, return error

Specify a userId when looking up saved email pages

create a unique url for newsletters without a URL

Use 500ms on page test timeouts

Increase timeout

Don't use deep equal to match newsletter label

Run just the labels API

Run against just the newsletter emails

Run without the page tests

Fix

Set the allow uncaught flag

Remove highlight tests

Remove newsletters tests

more resolver tests

Remove newsletter tests

Comment out resolver tests

Use nock for external requests in tests

Specify puppeteer url for tests

Comment out more tests

uncomment tests

re-enable

re-enable email test

Re-disable

Re-enable one pdf attachment test

Re-disable pdf attachment test

Use promises on setTimeout tests

rm label tests

mv label tests into a context

Comment out pdf tests

Comment out pdf tests

Async test

Async wrappers

Delay when creating test pages

More debugging

Unique short ids

Remove potentially problematic test

Fetch page before returning for test

handler in before block

more debugging

More debugging

Move errors checks into contexts

Use a context when saving newsletters to force index refresh

Prettier fix

Fix newsletter label check, remove setTimeout

Re-enable test

timeout on pdf router handler

Fix method call

comment out PDF test

Unique fake username

Comment out PDF test

Debugging signed urls

Re-enable

New email

pdf test

PDF tests

Comment out pdf test

Add nock stubs for email URLs

Use full address for PDF test

Remove debug

Use full email addresses
2022-04-02 16:56:24 -07:00
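The squashed commit above mentions handling full email address objects in the `to` param, in the form `"display name" <address>`. A minimal sketch of extracting the bare address from such a value, assuming a hypothetical helper named `extractEmailAddress` (not taken from the repository) and deliberately ignoring the full RFC 5322 grammar:

```typescript
// Returns the address inside a trailing <...> pair if present,
// otherwise the trimmed input unchanged.
function extractEmailAddress(to: string): string {
  const match = /<([^<>]+)>\s*$/.exec(to)
  const inner = match?.[1]
  return (inner ?? to).trim()
}

// Usage with the format quoted in the commit message:
const address = extractEmailAddress(
  '"jacksonh-dfdf@inbox.omnivore.app" <jacksonh-dfdf@inbox-demo.omnivore.app>'
)
// address is 'jacksonh-dfdf@inbox-demo.omnivore.app'
```

A plain address with no angle brackets passes through unchanged, so callers need not know which form the sender used.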
6d405432af add site_name and site_icon to page model and return in resolver (#341)
* add site_name and site_icon to page model and return in resolver

* fix tests
2022-03-30 10:43:10 +08:00