Database schema

All schemas need to ensure that we support required operations.

articles

Attribute	Data Type / Properties	Constraints / Indexing
item_id	BIGINT, Primary Key	NOT NULL, AUTO INCREMENT, UNIQUE
user_id	BIGINT, Foreign Key → users.user_id	Indexed, NOT NULL
status	SMALLINT
favorite	BOOLEAN	DEFAULT false
given_url	TEXT	NOT NULL
given_title	TEXT
resolved_title	TEXT
resolved_url	TEXT
domain_id	BIGINT, Foreign Key -> domains.domain_id	Domain of given_url
origin_domain_id	BIGINT, Foreign Key -> domains.domain_id	Domain of resolved_url
excerpt	TEXT
is_article	BOOLEAN	DEFAULT false
is_index	BOOLEAN	DEFAULT false
has_video	SMALLINT	DEFAULT false
has_image	SMALLINT	DEFAULT false
word_count	INTEGER
time_added	TIMESTAMP	Indexed, DEFAULT now()
time_updated	TIMESTAMP
time_favorited	TIMESTAMP
top_image_url	TEXT
author_id	BIGINT, Foreign Key → authors.author_id	Indexed

Description of the fields

item_id: Unique identifier for the saved item
user_id: id of the user to which this article/item belongs
status: 0 = active, 1 = archived, 2 = deleted
favorite: weather article is favorited by the user. 1=favorited, 0=not favorited
resolved_title: title, found by the parser/backend
resolved_url: final URL resolved to by the parser/backend. For example, by following redirects
excerpt: excerpt of the article found by the parser/backend
is_article: weather the item is an article
is_index: true if parser/backedn thinks this is an index page
- 1=parser thinks this item is an index page
- 0=otherwise
has_video:
- 0=no video;
- 1=has a video in the body of the article;
- 2=is a video
has_image:
- 0=no image;
- 1=has an image in the body of the article;
- 2=is an image
word_count: number of words in the article
time_added: timestamp (with timezone info), at which the article was added
time_updated: timestamp (with timezone info), at which the article was last updated (like last read/opened)
time_favorited: time at which the article was favorited
top_image_url: URL of the first found image in html page
author_id: id of the author of the article

Details:

this is the main and most important table, which stores information about the stored links.
we anticipate queries of the form

-- find all articles that belong to user with user_id -> 123
SELECT * FROM articles WHERE user_id = 123 ORDER BY time_added

or we might need articles that were added after "1 january 2024";

SELECT * FROM articles WHERE user_id = 123 AND time_added > '2024-01-01';

we may also need to filter by status and favorite

users

Attribute	Data Type / Properties	Constraints / Indexing
provider	TEXT	NOT NULL
user_id	SERIAL, Primary Key	NOT NULL, UNIQUE
email	TEXT	UNIQUE, Indexed
hashed_password	TEXT
name	TEXT	NOT NULL

details:

We should be able to support both traditional email-password based logins as well as o-auth based logins/signups.

traditional login and signup

When the user signs up, firstly we ask for the following information:

name | email | password

After, we have this information, then we use something like:

-- insert the newly created user into the system
INSERT INTO users (provider, email, hashed_password, name) 
VALUES ('open-pocket', 'xyz@mail.com, '12d8287g1', 'xyz');

Note that open-pocket is the provider here, since, the user is registering through our platform. And user_id will be generated autormatically.

Password hashing should be done in the application using bcrypt.

Now when the user tries to login.

We ask for the following information:

email | password

Then we do something like:

-- find the hashed_password for user with email=email
SELECT hashed_password FROM users WHERE email=[supplied_email]

If this query executes successfully, but returns no rows. Then, the user is not present in the system (not registered). Otherwise, the hashed_password must match hash_of_supplied_password for successfull login.

O-auth login and signup

O-auth logins will be supported in the future, once we are done with email-password logins. The provider field will store the O-auth provider's name (e.g. google, github etc.). And additional fields will be added as required.

Attribute	Data Type / Properties	Constraints / Indexing
tag_id	SERIAL, Primary Key	NOT NULL, UNIQUE
user_id	INTEGER, Foreign Key → Users.user_id	NOT NULL
item_id	BIG INTEGER, Foreign Key → Articles.item_id	NOT NULL
tag_name	TEXT	Indexed, NOT NULL

Images

Attribute	Data Type / Properties	Constraints / Indexing
image_id	BIGINT, Primary Key	NOT NULL, AUTO INCREMENT, UNIQUE
article_id	BIGINT, Foreign Key → Articles.item_id	Indexed, NOT NULL
src	TEXT	NOT NULL
width	INTEGER
height	INTEGER
credit	TEXT
caption	TEXT

Videos

Attribute	Data Type / Properties	Constraints / Indexing
video_id	BIGINT, Primary Key	NOT NULL, AUTO INCREMENT, UNIQUE
article_id	BIGINT, Foreign Key → Articles.item_id	Indexed, NOT NULL
src	TEXT	NOT NULL
width	INTEGER
height	INTEGER
type	VARCHAR(50)
vid	VARCHAR(100)

authors

Attribute	Data Type / Properties	Constraints / Indexing
id	BIGINT, Primary Key	NOT NULL, AUTO INCREMENT, UNIQUE
name	VARCHAR(255)	NOT NULL
url	TEXT

domains

Attribute	Data Type / Properties	Constraints / Indexing
domain_id	BIGINT, Primary Key	NOT NULL, AUTO INCREMENT, UNIQUE
name	TEXT	NOT NULL

This parameter is require, since /add returns it.

articles​

Description of the fields​

Details:​

users​

details:​

login and signup​

tags​

Images​

Videos​

authors​

domains​