Database schema
All schemas need to ensure that we support required operations.
articles
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| item_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
| user_id | BIGINT, Foreign Key → users.user_id | Indexed, NOT NULL |
| status | SMALLINT | |
| favorite | BOOLEAN | DEFAULT false |
| given_url | TEXT | NOT NULL |
| given_title | TEXT | |
| resolved_title | TEXT | |
| resolved_url | TEXT | |
| domain_id | BIGINT, Foreign Key -> domains.domain_id | Domain of given_url |
| origin_domain_id | BIGINT, Foreign Key -> domains.domain_id | Domain of resolved_url |
| excerpt | TEXT | |
| is_article | BOOLEAN | DEFAULT false |
| is_index | BOOLEAN | DEFAULT false |
| has_video | SMALLINT | DEFAULT false |
| has_image | SMALLINT | DEFAULT false |
| word_count | INTEGER | |
| time_added | TIMESTAMP | Indexed, DEFAULT now() |
| time_updated | TIMESTAMP | |
| time_favorited | TIMESTAMP | |
| top_image_url | TEXT | |
| author_id | BIGINT, Foreign Key → authors.author_id | Indexed |
Description of the fields
item_id: Unique identifier for the saved itemuser_id: id of the user to which this article/item belongsstatus: 0 = active, 1 = archived, 2 = deletedfavorite: weather article is favorited by the user. 1=favorited, 0=not favoritedresolved_title: title, found by the parser/backendresolved_url: final URL resolved to by the parser/backend. For example, by following redirectsexcerpt: excerpt of the article found by the parser/backendis_article: weather the item is an articleis_index: true if parser/backedn thinks this is an index page- 1=parser thinks this item is an index page
- 0=otherwise
has_video:- 0=no video;
- 1=has a video in the body of the article;
- 2=is a video
has_image:- 0=no image;
- 1=has an image in the body of the article;
- 2=is an image
word_count: number of words in the articletime_added: timestamp (with timezone info), at which the article was addedtime_updated: timestamp (with timezone info), at which the article was last updated (like last read/opened)time_favorited: time at which the article was favoritedtop_image_url: URL of the first found image in html pageauthor_id: id of the author of the article
Details:
- this is the main and most important table, which stores information about the stored links.
- we anticipate queries of the form
-- find all articles that belong to user with user_id -> 123
SELECT * FROM articles WHERE user_id = 123 ORDER BY time_added
- or we might need articles that were added after "1 january 2024";
SELECT * FROM articles WHERE user_id = 123 AND time_added > '2024-01-01';
- we may also need to filter by
statusandfavorite
users
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| provider | TEXT | NOT NULL |
| user_id | SERIAL, Primary Key | NOT NULL, UNIQUE |
| TEXT | UNIQUE, Indexed | |
| hashed_password | TEXT | |
| name | TEXT | NOT NULL |
details:
login and signup
We should be able to support both traditional email-password based
logins as well as o-auth based logins/signups.
- traditional login and signup
When the user signs up, firstly we ask for the following information:
name | email | password
After, we have this information, then we use something like:
-- insert the newly created user into the system
INSERT INTO users (provider, email, hashed_password, name)
VALUES ('open-pocket', 'xyz@mail.com, '12d8287g1', 'xyz');
Note that open-pocket is the provider here, since, the user is
registering through our platform. And user_id will be generated
autormatically.
Password hashing should be done in the application using
bcrypt.
Now when the user tries to login.
We ask for the following information:
email | password
Then we do something like:
-- find the hashed_password for user with email=email
SELECT hashed_password FROM users WHERE email=[supplied_email]
If this query executes successfully, but returns no rows. Then, the user
is not present in the system (not registered). Otherwise, the
hashed_password must match hash_of_supplied_password for successfull
login.
- O-auth login and signup
O-auth logins will be supported in the future, once we are done with
email-password logins. The provider field will store the O-auth
provider's name (e.g. google, github etc.). And additional fields will
be added as required.
tags
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| tag_id | SERIAL, Primary Key | NOT NULL, UNIQUE |
| user_id | INTEGER, Foreign Key → Users.user_id | NOT NULL |
| item_id | BIG INTEGER, Foreign Key → Articles.item_id | NOT NULL |
| tag_name | TEXT | Indexed, NOT NULL |
Further Indexes and Constraints:
- Index on
(user_id, item_id) - Index on
(user_id, tag_name) (tag_name, user_id, item_id)are UNIQUE combined. Therefore, no duplicate tag is allowed
Notes:
- this table basically contains: an article with
tag=tag_nameon article with item_id, belonging to user of user_id - the fields: (tag_name, user_id, item_id) combined must be unique
- we will be required to handle queries of the form:
-- get all tags on an article with `item_id` and which belongs to
-- user with `user_id`
SELECT tag_name FROM tags WHERE item_id=123 AND user_id=123;
-
therefore, it is beneficial to create an index on (item_id, user_id)
-
here's another use case:
find all articles belonging to user with user_id and with given
tag_name:
SELECT item_id FROM tags WHERE user_id=123 AND tag_name="xyz";
- therefore it might be beneficial to make an index on (user_id, tag_name) columns
Images
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| image_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
| article_id | BIGINT, Foreign Key → Articles.item_id | Indexed, NOT NULL |
| src | TEXT | NOT NULL |
| width | INTEGER | |
| height | INTEGER | |
| credit | TEXT | |
| caption | TEXT |
Videos
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| video_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
| article_id | BIGINT, Foreign Key → Articles.item_id | Indexed, NOT NULL |
| src | TEXT | NOT NULL |
| width | INTEGER | |
| height | INTEGER | |
| type | VARCHAR(50) | |
| vid | VARCHAR(100) |
authors
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
| name | VARCHAR(255) | NOT NULL |
| url | TEXT |
domains
| Attribute | Data Type / Properties | Constraints / Indexing |
|---|---|---|
| domain_id | BIGINT, Primary Key | NOT NULL, AUTO INCREMENT, UNIQUE |
| name | TEXT | NOT NULL |
This parameter is require, since
/add
returns it.