Skip to main content

Database schema

All schemas need to ensure that we support required operations.

articles

AttributeData Type / PropertiesConstraints / Indexing
item_idBIGINT, Primary KeyNOT NULL, AUTO INCREMENT, UNIQUE
user_idBIGINT, Foreign Key → users.user_idIndexed, NOT NULL
statusSMALLINT
favoriteBOOLEANDEFAULT false
given_urlTEXTNOT NULL
given_titleTEXT
resolved_titleTEXT
resolved_urlTEXT
domain_idBIGINT, Foreign Key -> domains.domain_idDomain of given_url
origin_domain_idBIGINT, Foreign Key -> domains.domain_idDomain of resolved_url
excerptTEXT
is_articleBOOLEANDEFAULT false
is_indexBOOLEANDEFAULT false
has_videoSMALLINTDEFAULT false
has_imageSMALLINTDEFAULT false
word_countINTEGER
time_addedTIMESTAMPIndexed, DEFAULT now()
time_updatedTIMESTAMP
time_favoritedTIMESTAMP
top_image_urlTEXT
author_idBIGINT, Foreign Key → authors.author_idIndexed

Description of the fields

  1. item_id: Unique identifier for the saved item
  2. user_id: id of the user to which this article/item belongs
  3. status: 0 = active, 1 = archived, 2 = deleted
  4. favorite: weather article is favorited by the user. 1=favorited, 0=not favorited
  5. resolved_title: title, found by the parser/backend
  6. resolved_url: final URL resolved to by the parser/backend. For example, by following redirects
  7. excerpt: excerpt of the article found by the parser/backend
  8. is_article: weather the item is an article
  9. is_index: true if parser/backedn thinks this is an index page
    • 1=parser thinks this item is an index page
    • 0=otherwise
  10. has_video:
    • 0=no video;
    • 1=has a video in the body of the article;
    • 2=is a video
  11. has_image:
    • 0=no image;
    • 1=has an image in the body of the article;
    • 2=is an image
  12. word_count: number of words in the article
  13. time_added: timestamp (with timezone info), at which the article was added
  14. time_updated: timestamp (with timezone info), at which the article was last updated (like last read/opened)
  15. time_favorited: time at which the article was favorited
  16. top_image_url: URL of the first found image in html page
  17. author_id: id of the author of the article

Details:

  • this is the main and most important table, which stores information about the stored links.
  • we anticipate queries of the form
-- find all articles that belong to user with user_id -> 123
SELECT * FROM articles WHERE user_id = 123 ORDER BY time_added
  • or we might need articles that were added after "1 january 2024";
SELECT * FROM articles WHERE user_id = 123 AND time_added > '2024-01-01';
  • we may also need to filter by status and favorite

users

AttributeData Type / PropertiesConstraints / Indexing
providerTEXTNOT NULL
user_idSERIAL, Primary KeyNOT NULL, UNIQUE
emailTEXTUNIQUE, Indexed
hashed_passwordTEXT
nameTEXTNOT NULL

details:

login and signup

We should be able to support both traditional email-password based logins as well as o-auth based logins/signups.

  1. traditional login and signup

When the user signs up, firstly we ask for the following information:

name | email | password 

After, we have this information, then we use something like:

-- insert the newly created user into the system
INSERT INTO users (provider, email, hashed_password, name)
VALUES ('open-pocket', 'xyz@mail.com, '12d8287g1', 'xyz');

Note that open-pocket is the provider here, since, the user is registering through our platform. And user_id will be generated autormatically.

Password hashing should be done in the application using bcrypt.

Now when the user tries to login.

We ask for the following information:

email | password

Then we do something like:

-- find the hashed_password for user with email=email
SELECT hashed_password FROM users WHERE email=[supplied_email]

If this query executes successfully, but returns no rows. Then, the user is not present in the system (not registered). Otherwise, the hashed_password must match hash_of_supplied_password for successfull login.

  1. O-auth login and signup

O-auth logins will be supported in the future, once we are done with email-password logins. The provider field will store the O-auth provider's name (e.g. google, github etc.). And additional fields will be added as required.


tags

AttributeData Type / PropertiesConstraints / Indexing
tag_idSERIAL, Primary KeyNOT NULL, UNIQUE
user_idINTEGER, Foreign Key → Users.user_idNOT NULL
item_idBIG INTEGER, Foreign Key → Articles.item_idNOT NULL
tag_nameTEXTIndexed, NOT NULL

Further Indexes and Constraints:

  • Index on (user_id, item_id)
  • Index on (user_id, tag_name)
  • (tag_name, user_id, item_id) are UNIQUE combined. Therefore, no duplicate tag is allowed

Notes:

  • this table basically contains: an article with tag=tag_name on article with item_id, belonging to user of user_id
  • the fields: (tag_name, user_id, item_id) combined must be unique
  • we will be required to handle queries of the form:
-- get all tags on an article with `item_id` and which belongs to
-- user with `user_id`
SELECT tag_name FROM tags WHERE item_id=123 AND user_id=123;
  • therefore, it is beneficial to create an index on (item_id, user_id)

  • here's another use case:

find all articles belonging to user with user_id and with given tag_name:

SELECT item_id FROM tags WHERE user_id=123 AND tag_name="xyz";
  • therefore it might be beneficial to make an index on (user_id, tag_name) columns

Images

AttributeData Type / PropertiesConstraints / Indexing
image_idBIGINT, Primary KeyNOT NULL, AUTO INCREMENT, UNIQUE
article_idBIGINT, Foreign Key → Articles.item_idIndexed, NOT NULL
srcTEXTNOT NULL
widthINTEGER
heightINTEGER
creditTEXT
captionTEXT

Videos

AttributeData Type / PropertiesConstraints / Indexing
video_idBIGINT, Primary KeyNOT NULL, AUTO INCREMENT, UNIQUE
article_idBIGINT, Foreign Key → Articles.item_idIndexed, NOT NULL
srcTEXTNOT NULL
widthINTEGER
heightINTEGER
typeVARCHAR(50)
vidVARCHAR(100)

authors

AttributeData Type / PropertiesConstraints / Indexing
idBIGINT, Primary KeyNOT NULL, AUTO INCREMENT, UNIQUE
nameVARCHAR(255)NOT NULL
urlTEXT

domains

AttributeData Type / PropertiesConstraints / Indexing
domain_idBIGINT, Primary KeyNOT NULL, AUTO INCREMENT, UNIQUE
nameTEXTNOT NULL

This parameter is require, since /add returns it.