Graced by Your Presence


Friday Fun Session Participants

Those of us who participate(d) our weekly learning and discussion session:

  1. Bala Krishnan
  2. Tang Biao
  3. Vignesh Shankar
  4. Chia Wei Woo
  5. Mahadevan Hariharan
  6. Ramakrishnan Kalyanaraman
  7. William Lim
  8. Srila Das Bhattacharya
  9. Sravani Vanukuru
  10. Kristipi Valledor
  11. Jeffrey Quiatchon
  12. Jothi Kiruthika
  13. Sayed Neda Fatima
  14. Sreenivasulu Gotla
  15. Vishal Gupta
  16. French Jean Palma Jumawan
  17. Gopi Krishna Pasupuleti
  18. Htet Aung Nay
  19. Aquib Javed Momin
  20. Pravinkarthy Ravi
  21. Rishabh Mangal
  22. Sunil Koli
  23. Vikas Pai
  24. Sandip Dangat
  25. Hui Ling Chong
  26. Srinivasa Puchakayala Reddy
  27. Manikandan Chandran
  28. Sharon Wong
  29. Uma Maheswary Ganesan
  30. Ishwarya Sridharan
  31. Aristotle Tiru
  32. Balamurugan Chennarayaperumal
  33. Aarti Piskala Dhanabalan
  34. Karthik Kumar
  35. Sunil Khamkar
  36. Handy Toh Torres
  37. Daniel Vo
  38. Srinivasan Badri Prasad
  39. Parthasarathi Murugaiyan
  40. Hieu Nguyen Van
  41. Manikandan Panneerselvam
  42. Jayamaran Ayilu
  43. Muukta Kedar
  44. Gaurav Singh
  45. Vikas Kitawat
  46. Tanveer Shaikh
  47. Vishal Jain
  48. Dipti Saurabh Shindhe
  49. Samir Tank
  50. Bhushan Patil
  51. Munendra Tomar
  52. Prabakaran Boopathi
  53. Vikraman Sridharan
  54. Srikanth Rokkam
  55. Santhosh Kumar Janakiraman
  56. Christabel Merline
  57. Ankit Jain
  58. Neethila Arasi
  59. Hari Gopal Raman
  60. Sheryl Teo
  61. Jocelyn Pacson Maranan
  62. Kannan Palanisamy
  63. Chitra Muthu
  64. Chaitanya Joshi
  65. Billie Santiago
  66. Thandar Win
  67. Julius Pedroso
  68. Chinta Nagendra Babu
  69. Lourdu Michael Sam
  70. Gayan Gunarathne
  71. Amit Gupta
  72. Akshatha Davasam
  73. Parimi Chowdary
  74. Haribabu Gattipati
  75. Gopal Chandra Das

Problems in JLTi Code Jam

JLTi Code Jam Problems

Year 2018

Mar (13th) – Currency Arbitrage with Increasing Rate

Feb (12th) – Currency Arbitrage with Decreasing Rate

Jan (11th) – Currency Arbitrage

Year 2017

Dec (10th) – Sprint Completion Time

Nov (9th) – RC Election Result

Oct (8th) – Choosing Oranges

Sep (7th) – Team Lunch

Aug (6th) – FaaS

Jul (5th) – Scoring Weight Loss

Jun (4th) – Manipulating Money Exchange

May (3rd) – Making Money at Stock Exchange

Apr (2nd) – Company Tour to Noland

Mar (1st) – No Two Team Member Next to Each Other

Index

Other Posts

Topics in Friday Fun Session

Friday Fun Session Topics

Year 2019

Feb

14th Feb 2019 (91st) – Build BST in an Efficient Way to Count Height of Each Node (Contd.)

7th Feb 2019 (90th) – Build BST in an Efficient Way to Count Height of Each Node (Contd.)

Jan

31st Jan 2019 (89th) – Build BST in an Efficient Way to Count Height of Each Node (Contd.)

25th Jan 2019 (88th) – Build BST in an Efficient Way to Count Height of Each Node

17th Jan 2019 (87th) – Common First Ancestor of Two Nodes in a Binary Tree with All Unique Values

10th Jan 2019 (86th) – Check if a Binary Tree is Balanced

3rd Jan 2019 (85th) – Number of Paths with a Certain Sum in a Binary Tree (Top-down)

Year 2018

Dec

27th Dec 2018 (84th) – Number of Subsequences Making a Certain Sum In an Array

20th Dec 2018 (83rd) – Number of Paths with a Certain Sum in a Binary Tree (Bottom-up)

13th Dec 2018 (82nd) – Binary Tree to Doubly Linked List

5th Dec 2018 (81st) – Given a BST, Find All Input Sets That Can Build it

Nov

28th Nov 2018 (80th) – Number of ways a BST can be built with n distinct keys

21st Nov 2018 (79th) – Merge Two Lists in All Possible Ways Preserving Relative Order of Elements Within Each List

14th Nov 2018 (78th) – Two-way/Bidirectional Search BFS

7th Nov 2018 (77th) – Detecting Cycle in a Directed Graph

Oct

31st Oct 2018 (76th) – Drawing with HTML5 <canvas> and JavaScript, Rotation of Axes, Arrow Drawing

24th Oct 2018 (75th) – CA, ICA, Chain of Trust

17th Oct 2018 (74th) – How does SSL/TLS work

10th Oct 2018 (73rd) – Self-signed SAN Certificate for localhost Using OpenSSL

3rd Oct 2018 (72nd) – Gradient Descent

Sep

26th Sep 2018 (71th) – Simple Linear Regression Using Gradient Descent

19th Sep 2018 (70th) – Simple Linear Regression Using Linear Least Squares

12th Sep 2018 (69th) – Multiple Linear Regression Demo Using R

5th Sep 2018 (68th) – Cycle Detection Using Union by Rank and Path Compression in an Undirected Graph

Aug

29th Aug 2018 (67th) – Union by Rank and Path Compression

22nd Aug 2018 – Missed, Public Holiday

15th Aug 2018 (66th) – Stable Roommates Problem (continued)

8st Aug 2018 (65th) – Stable Roommates Problem (continued)

1st Aug 2018 (64th) – Stable Roommates Problem

Jul

25th Jul 2018 (63rd) – 2-d Array Printing in Spiral order

18th Jul 2018 (62nd) – Stable Marriage Problem (continued)

11th Jul 2018 (61st) – Stable Marriage Problem

4th Jul 2018 (60th) – DFS

Jun

27th Jun 2018 (59th) – BFS

20th Jun 2018 – Cancelled

13th Jun 2018 (58th) – Ford-Fulkerson Method (Max-flow)

6th Jun 2018 – Missed, On Leave

May

30th May 2018 – Missed, On Leave

23rd May 2018 (57th) – Karger’s Algorithm (Minimum cut)

16th May 2018 (56th) – Solution – Currency Arbitrage with Increasing Rate

11th May 2018 – Cancelled

4th May 2018 – Cancelled

Apr

27th Apr 2018 – Cancelled

20th Apr 2018 – Cancelled

13rd Apr 2018 – Cancelled

6th Apr 2018 – Cancelled

Mar

30th Mar 2018 – Missed, Public Holiday

23rd Mar 2018 (55th) – Bitcoin – Simple Payment Verification (SPV)

16th Mar 2018 (54th) – Cryptographic Hash Function – Properties

9th Mar 2018 (53rd) – Collation in MS SQL Server

2nd Mar 2018 (52nd) – Solution – Currency Arbitrage with Decreasing Rate

Feb

23rd Feb 2018 (51st) – Merkle Tree

16th Feb 2018 – Missed, Chinese New Year

9th Feb 2018 (50th) – RSA

2nd Feb 2018 (49th) – Solution – Currency Arbitrage

 Jan

26th Jan 2018 (48th) – Overview of Bitcoin and Blockchain

19th Jan 2018 (47th) – Johnson’s Algorithm

12th Jan 2018 (46th) – Dijkstra’s Problem with Negative Edge

5th Jan 2018 (45th) – Solution – Sprint Completion Time

Year 2017

Dec

29th Dec 2017 – Missed, On Leave

22nd Dec 2017 – Missed, On Leave

15th Dec 2017 (44th) – Rod Cutting Problem

8th Dec 2017 (43rd) – Task Scheduling – Unlimited Server

1st Dec 2017 (42nd) – Solution – RC Election Result

Nov

24th Nov 2017 (41st) – Traveling Salesman Problem (Brute force and Bellman–Held–Karp)

17th Nov 2017 (40th) – Hamiltonian Path

10th Nov 2017 (39th) – Coin Exchange – Min Number of Coins

3rd Nov 2017 (38th) – Solution – Choosing Oranges

Oct

27th Oct 2017 – Missed, On Leave

20th Oct 2017 (37th) – Coin Exchange – Number of Ways

13th Oct 2017 – Missed, JLT D&D

6th Oct 2017 (36th) – Solution – Team Lunch

Sep

29th Sep 2017 (35th) – Floyd-Warshall Algorithm

22nd Sep 2017 (34th) – Executing SP Using EF; Transaction in Nested SP

15th Sep 2017 (33rd) – Solution – FaaS; Pseudo-polynomial Complexity

8th Sep 2017 – Missed, JLT Family Day

1st Sep 2017 – Missed, Hari Raya

Aug

25th Aug 2017 (32nd) – Multithreaded Programming

18th Aug 2017 (31st) – Knapsack Problem

11th Aug 2017 (30th) – Vertex Coloring

4th Aug 2017 (29th) – Solution – Scoring Weight Loss

Jul

28th Jul 2017 (28th) – Minimum Spanning Tree – Kruskal and Prim

21st Jul 2017 (27th) – Pseudorandom Number Generator

14th Jul 2017 (26th) – Rete Algorithm

7th Jul 2017 (25th) –  Solution – Manipulating Money Exchange

Jun

30th Jun 2017 (24th) –  Rules Engine

23rd Jun 2017 (23rd) –  Inducting Classification Tree

16th Jun 2017 (22nd) –  Incision into Isolation Level; Interpreting IIS Internals; Synchronizing Web System

9th Jun 2017 (21st) –  Maximum Subarray Problem

2nd Jun 2017 (20th) –  Solution – Making Money at Stock Market

May

26th May 2017 (19th) –  Understanding Correlation Coefficient; k-NN Using R

19th May 2017 (18th) –  k-d Tree and Nearest Neighbour Search

12th May 2017 (17th) –  Bellman Ford Algorithm

5th May 2017 (16th) –  Solution – Company Tour 2017 to Noland

Apr

28th Apr 2017 (15th) –  Models in Machine Learning; k-Nearest Neighbors (k-NN)

21st Apr 2017 (14th) – Edit/Levenshtein Distance

14th Apr 2017 – Missed, Good Friday

7th Apr 2017 (13th) – Solution – No Two Team Member Next to Each Other

Mar

31st Mar 2017 (12th) – N-queens

24th Mar 2017 (11th) – Longest Common Subsequence (LCS)

17th Mar 2017 (10th) – Dijkstra’s Algorithm

10th Mar 2017 (9th) – Infix, Prefix (Polish), Postfix (Reverse Polish)

3rd Mar 2017 (8th) – Order 2-D Array in all Directions & Find all Triplets with Sum Zero in an Array

Feb

24th Feb 2017 (7th) – Trailing Zeros in a Factorial

17th Feb 2017 (6th) – Is this Tree a BST?

10th Feb 2017 (5th) – Given a Number, Find the Smallest Next Palindrome

3rd Feb 2017 (4th) – Merge n Sorted Lists, Each Having m Numbers, Into a Sorted List

Jan

27th Jan 2017 – Missed, Chinese New Year Eve

20th Jan 2017 (3rd) – Shortest Exit from Maze

13rd Jan 2017 (2nd) – Finding Fibonacci – Exponential vs. Linear

6th Jan 2017 (1st) – Gmail API with OAuth 2.0

Other Posts

Problems in JLTi Code Jam

Executing SP using EF

34th Friday Fun Session (Part 1) – 22nd Sep 2017

Many a times, we use Entity Framework (EF), Microsoft’s recommended data access technology for an application, to execute (MS SQL Server) Stored Procedure (SP), and retrieve the results emitted by them. Here we discuss different kinds of output that a SP can produce and how we can retrieve them using EF.

SP output

A SP typically provides the following kinds of results:

  1. Return code (single integer)
  2. Output parameters (one or more, any data type)
  3. Result set
    1. Single result set
    2. Multiple result set
      1. All having the same schema
      2. Having different schema

Return code

SP can return a single integer return code. Return statement without any value (null) would automatically return 0. It is mostly used to exit execution of a SP when certain condition is met.

CREATE Procedure SpReturnCode
AS
BEGIN
  RETURN 1;
END

Using T-SQL we can execute SP like below.

DECLARE @Success AS INT
EXEC @Success = [SpReturnCode];
PRINT @Success

Output parameter

SP can return one or more values, each having its own data type.

CREATE Procedure SpOutputParameter (@InputValue INT, @OutputValue INT OUTPUT)
AS
BEGIN
  SET @OutputValue = @InputValue + 1;
  RETURN 0;
END

Using T-SQL we can execute SP like below.

DECLARE @ReturnValue INT;
EXECUTE [SpOutputParameter] 2, @ReturnValue OUTPUT;
PRINT @ReturnValue

Single result set

Returns a result set having 0 or more rows of a certain schema. The following SP returns a result set with a single column named, Success.

CREATE Procedure SpSingleResultSet
AS
BEGIN
  SELECT 3 AS Success
  RETURN 0;
END

Multiple result set, same schema

The following SP returns the result set for Employee schema twice.

CREATE Procedure SpMultipleResultSetSameSchema
AS
BEGIN
  SELECT * FROM [Employee]
  SELECT * FROM [Employee] SELECT [EmployeeId] > 10

  RETURN 0;
END

The following SP returns a result set that is not associated with any database entity.

CREATE Procedure SpMultipleResultSetNonDbContextEntity
AS
BEGIN
  DECLARE @Loop AS INT
  SET @Loop = 0

  WHILE @Loop < 10
  BEGIN
    EXEC SpSingleResultSet
    SET @Loop = @Loop + 1
  END

  RETURN 0;
END

Multiple result set, multiple schema

The following SP returns two different result sets: one for Company and another for Employee.

CREATE Procedure SpMultipleResultSetMultipleSchema
AS
BEGIN
  SELECT * FROM [Company]
  SELECT * FROM [Employee]

  RETURN 0;
END

Executing SP using EF

We will use the following different ways in EF to read different kinds of SP output as described earlier:

  1. ExecuteSqlCommand
  2. SqlQuery
  3. CreateCommand/ExecuteReader

ExecuteSqlCommand

This executes a given DDL/DML command. This can be executed when no result set needs to be returned.

Return code

Return code does not require any explicit output parameter to be used in the SP. However, while calling from EF, it should be treated as an output parameter by specifying the direction for the parameter.

SqlParameter returnCode = new SqlParameter("@ReturnCode", SqlDbType.Int);
returnCode.Direction = ParameterDirection.Output;

Db.Database.ExecuteSqlCommand("exec @ReturnCode = [SpReturnCode] ", returnCode);
var returnCodeValue = (int)returnCode.Value;

Output Parameter

For each of the output parameters, we need to declare an output parameter in EF matching the appropriate data type.

SqlParameter inputParam = new SqlParameter("@InputValue", SqlDbType.Int);
inputParam.Direction = ParameterDirection.Input;

SqlParameter outputParam = new SqlParameter("@OutputValue ", SqlDbType.Int);
outputParam.Direction = ParameterDirection.Output;

Db.Database.ExecuteSqlCommand("[SpOutputParameter] @InputValue, @OutputValue OUT", inputParam, outputParam);
var returnValue = (int)outputParam.Value;

SqlQuery

SqlQuery is usually used when SP returns a single result set. However, it can return any data type including primitive types – not necessarily only entity type. If the SP returns multiple result sets, it will only get the first one. However, the complete execution of the entire SP does happen.

public class SuccessSet
{
  public int Success { get; set; }
}

var result = Db.Database.SqlQuery("[SpSingleResultSet]").ToList();

CreateCommand/ExecuteReader

When multiple result sets to be returned this method can be used. We need to use IObjectContextAdapter interface that makes use explicit interface implementation.

Db.Database.Initialize(force: false);
var cmd = Db.Database.Connection.CreateCommand();
cmd.CommandText = "[SpMultipleResultSetSameSchema]";

Db.Database.Connection.Open();

var reader = cmd.ExecuteReader();
var employees =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
    .ObjectContext
    .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

foreach (var employee in employees)
  Console.WriteLine(employee. Name);

While(!reader.NextResult())
{
  employees =
    ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
    .ObjectContext
    .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

  foreach (var employee in employees)
    Console.WriteLine(employee. Name);
}

Db.GetDatabase().Connection.Close();

When result set is not in DbContext, slight modification is required.

var successSets =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<SuccessSet>(reader);

When different schema are used we can still use the same, we just need to use the right object while translating.

var companies =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<Company>(reader, "Company", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

reader.NextResult();

var employees =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

Index

Pseudo-polynomial Complexity

33rd Friday Fun Session (Part 2) – 15th Sep 2017

The complexity for FaaS solution is O(n), where n is the largest day number. It looks like polynomial. However, it is actually pseudo-polynomial.

Size of input

Complexity is measured in terms of the size of input, say, in bits. Suppose, there are b bits in n. Then O(n) = O(2b) and hence, it is exponential.

Let’s assume n increases from 10 to 1125899906842624. More specifically, lunch schedule, as used in the previous example, changes from 1, 3, 4, 5, 6, 7, 10 to 1, 3, 4, 5, 6, 7, 1125899906842624. We still have the same 7 days to go for lunch. Yet, we are running 1,125,899,906,842,624 loops. In our layman understanding, the problem is still the same and should have taken the same amount of time to execute, and yet, for the latter, the algorithm takes way too long!

Spot a pseudo-polynomial

This is how we spot a pseudo-polynomial time algorithm. Ideally, we would like to express the complexity using the number of inputs; here, it should have been 7. But the above algorithm works in a way, where the complexity has been expressed in one of the numeric values of the input, the maximum value of the input – 1125899906842624, to be precise. This is where we are tricked into believing it to be a polynomial time algorithm, linear (polynomial) in the (max) numeric value of the input. But if we apply the definition of complexity that takes into consideration the size/length of the input, then it is actually exponential.

To be more specific, if we look at the input size, 4 bits are required to represent 10, while 50 bits are required to represent 1,125,899,906,842,624. Complexity has gone from O(24) = 10 loops to O(250) = 1,125,899,906,842,624 loops.

That is essentially exponential in the number of bits, meaning exponential in the size of the input but polynomial in the numeric value of the input. Algorithm with this kind of running time is called pseudo-polynomial.

Truly polynomial

At this point, you might wonder what is a truly polynomial time algorithm. For example, when we add n numbers using a loop running n times, we say, the complexity of it to be O(n). But here this n can also be written as 2b. So, shall we also say, adding n numbers is a pseudo-polynomial time algorithm?

Well, when we say, adding n numbers, we implicitly say, we want to find the sum of n 32 bit numbers/integers. Then the size of n numbers is 32 * n. Once again, the formal definition of complexity is defined in terms of input size, in bits. What is the input size here? The size here is 32n. The complexity is O(32n) and removing the constant terms it is O(n), a truly polynomial time algorithm.

Index

Solution – FaaS

33rd Friday Fun Session – 15th Sep 2017

Given a lunch schedule – a sequence of days when lunch is planned, and three price plans – daily, weekly and monthly, we want to get the cheapest lunch price.

This is the solution to JLTi Code Jam – Aug 2017 problem.

Let us walk through an example

Let us take an example as mentioned here: 1, 2, 4, 5, 17, 18. Since first day is 1 and last day is 18, it can be put under a month that consists of 20 consecutive days (not calendar month). We can use a monthly plan. But it would be too expensive (S$ 99.99) for just 6 days.

The days: 1, 2, 4 and 5 fall within a week that requires consecutive 5 days (not a calendar week). We have an option to buy a weekly plan for these 4 days that would cost S$ 27.99. However, that would be higher than had we bought day-wise for 4 days at a price of S$24.

Dynamic Programming

In general, at any given day, we have three options:

  1. We buy lunch for this day alone, using daily price S$ 6. Add that to the best price found for the previous day.
  2. We treat this as the last day of a week, if applicable, and buy a weekly plan at a cost of S$ 27.99. Add that to the best price for the day immediately prior to the first day of this week.
  3. We treat this as the last day of a month, if applicable, and buy a monthly plan at a cost of S$ 99.99. Add that to the best price for the day immediately prior to the first day of this month.

This is an optimization problem that can be solved with dynamic programming where we use the result of already solved sub-problems.

Bottom-up

We have two options: top-down and bottom-up. We realize that, at the end, all the sub-problems (for each of the days) have to be solved. We also find that it is easy to visualize the problem bottom-up. And if we do use bottom-up then the required space would be limited by the last day number.

Hence, we will solve it using bottom-up dynamic programming.

Blue colored days are when lunch is scheduled.

DP table1.png

On day 1:

Cost S$ 6.

On day 2:

Daily basis: S$ 6 + price at day 1 = S$ 12

Weekly basis: S$ 27.99

Monthly basis: S$ 99.99

Best price: S$ 12

On day 3:

No lunch schedule, cost of previous day S$ 12 is its cost.

On day 4:

Daily basis: S$ 6 + price at day 3 = S$ 18

Weekly basis: S$ 27.99

Monthly basis: S$ 99.99

Best price: S$ 18

On day 5:

Daily basis: S$ 6 + price at day 4 = S$ 24

Weekly basis: S$ 27.99

Monthly basis: S$ 99.99

Best price: S$ 24

From day 6 to day 16:

No lunch schedule, cost of previous day will be carried forward: S$ 24.

On day 17:

Daily basis: S$ 6 + price at day 16 = S$ 30

Weekly basis: S$ 27.99 + price at day 12 = S$ 51.99

Monthly basis: S$ 99.99

Best price: S$ 30

On day 18:

Daily basis: S$ 6 + price at day 17 = S$ 36

Weekly basis: S$ 27.99 + price at day 13 = S$ 51.99

Monthly basis: S$ 99.99

Best price: S$ 36

Finally, the best price is S$ 36.

Another example

Let us work with another example: 1, 3, 4, 5, 6, 7, 10.

DP table2

On day 7:

Daily basis: S$ 6 + price at day 6 = S$ 36

Weekly basis: S$ 27.99 + price at day 2 = S$ 33.99

Monthly basis: S$ 99.99

Best price: S$ 33.99

Finally, the best price at the end is S$ 39.99.

Complexity

The complexity is O(n), where n is the largest day number. It is a pseudo-polynomial time algorithm.

GitHub: FaaS

Index

Gmail API with OAuth 2.0

1st Friday Fun Session – 6th Jan 2017

IMAP and POP3 are the two most prevalent standard protocols for email retrieval that work over a TCP/IP connection. However, Gmail also introduced Gmail API that gives RESTful access to user’s mailbox under OAuth 2.0 authorization. To get a feel of this, we will write a small desktop application that uses Gmail API to connect to a specific mailbox and performs some operations.

Application description

Our small C# console application connects to a specific mailbox, retrieves items filtered by a time period and puts back the same in the same folder (Inbox, Sent etc.), from where it was originally read, in an encrypted form.

We will connect the mailbox using RESTful access through Gmail API. That also means, authorization would be done using OAuth 2.0 protocol.

Input to the application

Input to the application would be: mailbox name, mailbox password, start of the retrieval period, end of the retrieval period, a password, and a salt – the  last two are used for generating symmetric keys for encryption.

The input is read from a file named input.txt, placed in the application path.

Each of the five input fields is a key-value pair, separated by a single space, occupying a single line. Sample input.txt file content is shown below:

  1. mailbox some.email.id@gmail.com
  2. start_time 2016/12/14
  3. end_time 2016/12/15
  4. input_key BA994A37-901D-4777-8054-6C5D87500AB7
  5. salt 65DA56D0-9285-41A0-975E-323420D0602B

The key mailbox denotes gmail id. Both dates use yyyy/M/d format, end_time being no earlier than start_time. Time could be included/used. Salt must be at least 8 bytes long.

The steps

We need to do the following steps:

  1. Authentication/Authorization
  2. Getting emails from Gmail
  3. Encryption
  4. Insert emails back

Authentication/Authorization

Google APIs (Gmail is one of them) use OAuth 2.0 protocol for authentication and authorization. All applications have to follow a basic pattern for accessing a Google API using OAuth 2.0.

Get credentials

At first, we must create a Google API Console project. Once created, we get credentials for it. Credentials are essentially composed of the following two values that are known to both Google and the application.

  1. Client ID
  2. Client Secret

Google console pages look like below:

Google Console

Application Secret

Get access token

The second step is to get an access token from Google Authorization Server. During access token request, a scope parameter is also sent. Scope indicates all the accesses requested for.

Depending on the kind of applications (web server, installed, client side) the authorization sequences might vary slightly. Since ours is an installed application, we would get authorization code first. We then exchange this code for a token. Below image that is taken from Google, depicts the process.

webflow

Get authorization code

To get authorization code from Google, we first open an HttpListener on local  loopback address, with a dynamically found available port. However, this requires elevated privilege. Hence, the application needs to run at elevated privilege. This could be avoided by doing URL reservation. However, that itself requires elevation.

We then open an authorization request in browser from the application. Authorization request would require:

  1. Authorization endpoint: https://accounts.google.com/o/oauth2/v2/auth
  2. Scope: https://www.googleapis.com/auth/gmail.modify
  3. Redirect URI: local loopback address, as explained earlier
  4. Client Id: Client Id credential, as retrieved for the Google API Console project
  5. State etc.: some additional information like state etc. to verify that response from server is due to a request made by this application. Verification happens once the response from Google server is back.

User’s consent screen looks like below.

Authentication picture

Scope can vary. Modify as scope, as shown here is sufficient to call all the three APIs (get, list and insert) that we need for our application. However, instead of asking for all the authorization upfront, scope should be increased incrementally.

If all is well, we will get an authorization code. However, to make any API call, we need access token. So now we have to exchange this code for a token.

We make an HttpWebRequest POST using

  1. Token endpoint: https://www.googleapis.com/oauth2/v4/token
  2. Code: as found earlier
  3. Client ID: credential as explained earlier
  4. Client secret: credential as explained earlier

All Gmail API endpoints are https, meaning all API calls have to be made using encrypted SSL/TLS channel. In .NET, we can easily use HttpWebRequest class. However, in other languages, if such a library is not available, we might use GnuTLS, OpenSSL etc. to create a TLS/SSL channel to enable us making https calls.

Make API call using access token

Now we can make Gmail API call, using the token in an authorization header.

Refresh Access Token

Access token has limited lifetime. We might require refreshing the access token, if necessary.

Getting emails from Gmail

To read emails, we will use list and get APIs, using the access token that we already received.

list

For list API call, we are making an HttpWebRequest GET request with endpoint: https://www.googleapis.com/gmail/v1/users/userId/messages.

Parameter userId is the gmail id. Optional query parameter q can be used, to specify after and before date to filter data. It should be mentioned that there is a max limit (I found it to be 500) as to how many emails (essentially, emaid id and thread id combination) we can read. We are content with as many results as found in the first call. No attempt is made to make further calls to read more emails.

get

For each listed email found in the above call we are now calling the get API using GET https://www.googleapis.com/gmail/v1/users/userId/messages/id.

Parameter userId is gmail id and id is the email id that is found from the previous call. Optional query parameter, format is used with value set to full. For each email, we are retrieving the subject and labels. If the input has a time component, then at this stage, we can further filter based on it, using internalDate property of the Users.Message resource.

Encryption

Encrypt with Rijndael

For encryption, Rijndael is used. Password and salt are taken as input. A 256 bit AES key is generated. We are using RijndaelManaged with default 128 bit block size; hence, it is AES compatible. Since this is symmetric key, the same key is used for both encryption and decryption. And the same key is used for all the emails.

get with format raw

For each email, we are making another call to get API, this time with format set to raw. Here we are duplicating the read API call. However, this way, we can directly get the raw property of Message resource, necessary to make insert API call. We don’t need to construct the raw property on our own.

At first, we are decoding the raw as found above. Then we are searching for the subject (that, we collected earlier, using get API with format full) in that. Starting from first character of the subject, we are encrypting the rest of raw. That will encrypt subject and body parts, along with some more data.

Regarding encryption, we could do that for the whole of raw or only for parts like subject, body and so on. As mentioned above, for simplicity, we have encrypted everything starting from subject. We have also written functions and tested that we can correctly decrypt the encrypted data.

While inserting the email into gmail folder, it would have been better if we could somehow indicate that the email is encrypted. That way, we could stop encrypting an email more than once, or identify that a certain email is in encrypted form.

So, in short, data in field raw of the Message resource is decoded, converted to a string and then encryption is applied on the plain text. It is then reverted back to base64url encoded form for insert.

Insert emails back

insert

The encoded raw along with labelIds collected earlier is now used to make an insert API call using POST https://www.googleapis.com/gmail/v1/users/userId/messages.

Once again, userId is the gmail id. Only raw and labelIds are used (in the request body), the latter (labelIds) is to make sure that the inserted email ends up in the same folder. Thread Id parameter could be used easily, however, chose not to include.

We have not used the upload URI. This is because we have only dealt with metadata and not (email) attachments, once again for simplicity.

Output

When the application runs, it asks user to press a key. Once the key is pressed, it starts working in asynchronous mode. Pressing a key further would stop it.

The application first opens Google server in browser for user to authenticate/authorize. Once that is done, it prints out how many emails it is going to read/encrypt/insert, based on the input parameters. It might take a while before it starts printing a line for each of the inserted emails.

This is how it looks in the Gmail folder.

Output.png

GMail API client libraries could be good starting point for writing such an application.

GitHub: GMail API

Index