Executing SP using EF

34th Friday Fun Session (Part 1) – 22nd Sep 2017

Many a times, we use Entity Framework (EF), Microsoft’s recommended data access technology for an application, to execute (MS SQL Server) Stored Procedure (SP), and retrieve the results emitted by them. Here we discuss different kinds of output that a SP can produce and how we can retrieve them using EF.

SP output

A SP typically provides the following kinds of results:

  1. Return code (single integer)
  2. Output parameters (one or more, any data type)
  3. Result set
    1. Single result set
    2. Multiple result set
      1. All having the same schema
      2. Having different schema

Return code

SP can return a single integer return code. Return statement without any value (null) would automatically return 0. It is mostly used to exit execution of a SP when certain condition is met.

CREATE Procedure SpReturnCode
AS
BEGIN
  RETURN 1;
END

Using T-SQL we can execute SP like below.

DECLARE @Success AS INT
EXEC @Success = [SpReturnCode];
PRINT @Success

Output parameter

SP can return one or more values, each having its own data type.

CREATE Procedure SpOutputParameter (@InputValue INT, @OutputValue INT OUTPUT)
AS
BEGIN
  SET @OutputValue = @InputValue + 1;
  RETURN 0;
END

Using T-SQL we can execute SP like below.

DECLARE @ReturnValue INT;
EXECUTE [SpOutputParameter] 2, @ReturnValue OUTPUT;
PRINT @ReturnValue

Single result set

Returns a result set having 0 or more rows of a certain schema. The following SP returns a result set with a single column named, Success.

CREATE Procedure SpSingleResultSet
AS
BEGIN
  SELECT 3 AS Success
  RETURN 0;
END

Multiple result set, same schema

The following SP returns the result set for Employee schema twice.

CREATE Procedure SpMultipleResultSetSameSchema
AS
BEGIN
  SELECT * FROM [Employee]
  SELECT * FROM [Employee] SELECT [EmployeeId] > 10

  RETURN 0;
END

The following SP returns a result set that is not associated with any database entity.

CREATE Procedure SpMultipleResultSetNonDbContextEntity
AS
BEGIN
  DECLARE @Loop AS INT
  SET @Loop = 0

  WHILE @Loop < 10
  BEGIN
    EXEC SpSingleResultSet
    SET @Loop = @Loop + 1
  END

  RETURN 0;
END

Multiple result set, multiple schema

The following SP returns two different result sets: one for Company and another for Employee.

CREATE Procedure SpMultipleResultSetMultipleSchema
AS
BEGIN
  SELECT * FROM [Company]
  SELECT * FROM [Employee]

  RETURN 0;
END

Executing SP using EF

We will use the following different ways in EF to read different kinds of SP output as described earlier:

  1. ExecuteSqlCommand
  2. SqlQuery
  3. CreateCommand/ExecuteReader

ExecuteSqlCommand

This executes a given DDL/DML command. This can be executed when no result set needs to be returned.

Return code

Return code does not require any explicit output parameter to be used in the SP. However, while calling from EF, it should be treated as an output parameter by specifying the direction for the parameter.

SqlParameter returnCode = new SqlParameter("@ReturnCode", SqlDbType.Int);
returnCode.Direction = ParameterDirection.Output;

Db.Database.ExecuteSqlCommand("exec @ReturnCode = [SpReturnCode] ", returnCode);
var returnCodeValue = (int)returnCode.Value;

Output Parameter

For each of the output parameters, we need to declare an output parameter in EF matching the appropriate data type.

SqlParameter inputParam = new SqlParameter("@InputValue", SqlDbType.Int);
inputParam.Direction = ParameterDirection.Input;

SqlParameter outputParam = new SqlParameter("@OutputValue ", SqlDbType.Int);
outputParam.Direction = ParameterDirection.Output;

Db.Database.ExecuteSqlCommand("[SpOutputParameter] @InputValue, @OutputValue OUT", inputParam, outputParam);
var returnValue = (int)outputParam.Value;

SqlQuery

SqlQuery is usually used when SP returns a single result set. However, it can return any data type including primitive types – not necessarily only entity type. If the SP returns multiple result sets, it will only get the first one. However, the complete execution of the entire SP does happen.

public class SuccessSet
{
  public int Success { get; set; }
}

var result = Db.Database.SqlQuery("[SpSingleResultSet]").ToList();

CreateCommand/ExecuteReader

When multiple result sets to be returned this method can be used. We need to use IObjectContextAdapter interface that makes use explicit interface implementation.

Db.Database.Initialize(force: false);
var cmd = Db.Database.Connection.CreateCommand();
cmd.CommandText = "[SpMultipleResultSetSameSchema]";

Db.Database.Connection.Open();

var reader = cmd.ExecuteReader();
var employees =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
    .ObjectContext
    .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

foreach (var employee in employees)
  Console.WriteLine(employee. Name);

While(!reader.NextResult())
{
  employees =
    ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
    .ObjectContext
    .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

  foreach (var employee in employees)
    Console.WriteLine(employee. Name);
}

Db.GetDatabase().Connection.Close();

When result set is not in DbContext, slight modification is required.

var successSets =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<SuccessSet>(reader);

When different schema are used we can still use the same, we just need to use the right object while translating.

var companies =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<Company>(reader, "Company", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

reader.NextResult();

var employees =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

Index

Pseudo-polynomial Complexity

33rd Friday Fun Session (Part 2) – 15th Sep 2017

The complexity for FaaS solution is O(n), where n is the largest day number. It looks like polynomial. However, it is actually pseudo-polynomial.

Size of input

Complexity is measured in terms of the size of input, say, in bits. Suppose, there are b bits in n. Then O(n) = O(2b) and hence, it is exponential.

Let’s assume n increases from 10 to 1125899906842624. More specifically, lunch schedule, as used in the previous example, changes from 1, 3, 4, 5, 6, 7, 10 to 1, 3, 4, 5, 6, 7, 1125899906842624. We still have the same 7 days to go for lunch. Yet, we are running 1,125,899,906,842,624 loops. In our layman understanding, the problem is still the same and should have taken the same amount of time to execute, and yet, for the latter, the algorithm takes way too long!

Spot a pseudo-polynomial

This is how we spot a pseudo-polynomial time algorithm. Ideally, we would like to express the complexity using the number of inputs; here, it should have been 7. But the above algorithm works in a way, where the complexity has been expressed in one of the numeric values of the input, the maximum value of the input – 1125899906842624, to be precise. This is where we are tricked into believing it to be a polynomial time algorithm, linear (polynomial) in the (max) numeric value of the input. But if we apply the definition of complexity that takes into consideration the size/length of the input, then it is actually exponential.

To be more specific, if we look at the input size, 4 bits are required to represent 10, while 50 bits are required to represent 1,125,899,906,842,624. Complexity has gone from O(24) = 10 loops to O(250) = 1,125,899,906,842,624 loops.

That is essentially exponential in the number of bits, meaning exponential in the size of the input but polynomial in the numeric value of the input. Algorithm with this kind of running time is called pseudo-polynomial.

Truly polynomial

At this point, you might wonder what is a truly polynomial time algorithm. For example, when we add n numbers using a loop running n times, we say, the complexity of it to be O(n). But here this n can also be written as 2b. So, shall we also say, adding n numbers is a pseudo-polynomial time algorithm?

Well, when we say, adding n numbers, we implicitly say, we want to find the sum of n 32 bit numbers/integers. Then the size of n numbers is 32 * n. Once again, the formal definition of complexity is defined in terms of input size, in bits. What is the input size here? The size here is 32n. The complexity is O(32n) and removing the constant terms it is O(n), a truly polynomial time algorithm.

Index

Solution – FaaS

33rd Friday Fun Session – 15th Sep 2017

Given a lunch schedule – a sequence of days when lunch is planned, and three price plans – daily, weekly and monthly, we want to get the cheapest lunch price.

This is the solution to JLTi Code Jam – Aug 2017 problem.

Let us walk through an example

Let us take an example as mentioned here: 1, 2, 4, 5, 17, 18. Since first day is 1 and last day is 18, it can be put under a month that consists of 20 consecutive days (not calendar month). We can use a monthly plan. But it would be too expensive (S$ 99.99) for just 6 days.

The days: 1, 2, 4 and 5 fall within a week that requires consecutive 5 days (not a calendar week). We have an option to buy a weekly plan for these 4 days that would cost S$ 27.99. However, that would be higher than had we bought day-wise for 4 days at a price of S$24.

Dynamic Programming

In general, at any given day, we have three options:

  1. We buy lunch for this day alone, using daily price S$ 6. Add that to the best price found for the previous day.
  2. We treat this as the last day of a week, if applicable, and buy a weekly plan at a cost of S$ 27.99. Add that to the best price for the day immediately prior to the first day of this week.
  3. We treat this as the last day of a month, if applicable, and buy a monthly plan at a cost of S$ 99.99. Add that to the best price for the day immediately prior to the first day of this month.

This is an optimization problem that can be solved with dynamic programming where we use the result of already solved sub-problems.

Bottom-up

We have two options: top-down and bottom-up. We realize that, at the end, all the sub-problems (for each of the days) have to be solved. We also find that it is easy to visualize the problem bottom-up. And if we do use bottom-up then the required space would be limited by the last day number.

Hence, we will solve it using bottom-up dynamic programming.

Blue colored days are when lunch is scheduled.

DP table1.png

On day 1:

Cost S$ 6.

On day 2:

Daily basis: S$ 6 + price at day 1 = S$ 12

Weekly basis: S$ 27.99

Monthly basis: S$ 99.99

Best price: S$ 12

On day 3:

No lunch schedule, cost of previous day S$ 12 is its cost.

On day 4:

Daily basis: S$ 6 + price at day 3 = S$ 18

Weekly basis: S$ 27.99

Monthly basis: S$ 99.99

Best price: S$ 18

On day 5:

Daily basis: S$ 6 + price at day 4 = S$ 24

Weekly basis: S$ 27.99

Monthly basis: S$ 99.99

Best price: S$ 24

From day 6 to day 16:

No lunch schedule, cost of previous day will be carried forward: S$ 24.

On day 17:

Daily basis: S$ 6 + price at day 16 = S$ 30

Weekly basis: S$ 27.99 + price at day 12 = S$ 51.99

Monthly basis: S$ 99.99

Best price: S$ 30

On day 18:

Daily basis: S$ 6 + price at day 17 = S$ 36

Weekly basis: S$ 27.99 + price at day 13 = S$ 51.99

Monthly basis: S$ 99.99

Best price: S$ 36

Finally, the best price is S$ 36.

Another example

Let us work with another example: 1, 3, 4, 5, 6, 7, 10.

DP table2

On day 7:

Daily basis: S$ 6 + price at day 6 = S$ 36

Weekly basis: S$ 27.99 + price at day 2 = S$ 33.99

Monthly basis: S$ 99.99

Best price: S$ 33.99

Finally, the best price at the end is S$ 39.99.

Complexity

The complexity is O(n), where n is the largest day number. It is a pseudo-polynomial time algorithm.

GitHub: FaaS

Index

Team Lunch

7th JLTi Code Jam – Sep 2017

Since our JLTi Mumbai colleagues started vising our Singapore office, we are having more team/project lunches. Usually, a number of them come together, and after a short while they also leave together. It is only few days before they leave that we start organizing team lunches. Suppose, there are three colleagues belonging to three different teams, then there would be three team lunches, one for each team.

However, not all members work for an exclusive team. For example, I create an impression as if I work for more than one team, and due to the good grace of those teams, I also get invited in their team lunches.

However, due to the rush of deliverables, that is the norm here, the team lunches are squeezed in the last few days, and at times, multiple team lunches fall on the same day, typically on the last day.

That is all fine and good for most. However, I have a big problem. If two team lunches fall on the same day, and I belong to both, I miss one for obvious reason. I skip lunch does not necessarily mean I skip free lunches.

Hence, I decided to write a small program that will take the team composition in certain way and output the minimum days required to schedule the lunches so that people working on multiple teams don’t miss out any.

Yes, I am not the only person but there are some other colleagues who also work across more than one team. Let us also assume that, for our 7 or 8 teams, it might be easy to calculate it manually. But when the number of team exceeds, say 100, then a program is a must.

Input:

1 2

Output: 2

Explanation:

Input 1 2 (1 and 2 separated by a space) means there are one or more members who belong to both team 1 and team 2.

Output 2 means, at least 2 days are required to arrange lunches for the teams. On day 1, one of the two teams can go for lunch. On the second day, the other can go.

How many teams are there? Well, there are at least 2. There can be more, but that is irrelevant. Suppose there are 4 more teams – team 3, team 4, team 5 and team 6, they can go either on first day or on second day. This is because, no member working in those 4 teams work for a second team. After all, the input says, only team 1 and team 2 have some common members.

Input:

1 2

2 3

Output: 2

We have some members common to team 1 and team 2. And there are some members, who are common to team 2 and team 3, as shown in the second line.

Each line in the input would indicate the presence of common members between two teams, where the two team numbers are separated by a space. There would be at least one line of input, meaning somebody would run this program only if there exist at least one member working for more than one team.

For the above input, we would still require at least two days to avoid any conflict. On one day team 1 and team 3 can go. Team 2 must go on a separate day.

Input:

1 2

2 3

1 3

Output: 3

Now we need 3 separate days. Team 1 cannot go on the same day as team 2 or team 3. This is because team 1 has members working for both team 2 and team 3. Similarly, team 2 cannot go for lunch on the same day as team 3 as they have common members. Hence, team 1, team 2 and team 3 – all need exclusive lunch days.

Task: Given a list of team pairs (like 1 2 is a team pair, as shown in input) sharing common members, we need to write a program, that would output the minimum number of days required to set aside for team lunches, so that nobody who work across multiple teams misses his/her share of team lunches.

Index

Gmail API with OAuth 2.0

1st Friday Fun Session – 6th Jan 2017

IMAP and POP3 are the two most prevalent standard protocols for email retrieval that work over a TCP/IP connection. However, Gmail also introduced Gmail API that gives RESTful access to user’s mailbox under OAuth 2.0 authorization. To get a feel of this, we will write a small desktop application that uses Gmail API to connect to a specific mailbox and performs some operations.

Application description

Our small C# console application connects to a specific mailbox, retrieves items filtered by a time period and puts back the same in the same folder (Inbox, Sent etc.), from where it was originally read, in an encrypted form.

We will connect the mailbox using RESTful access through Gmail API. That also means, authorization would be done using OAuth 2.0 protocol.

Input to the application

Input to the application would be: mailbox name, mailbox password, start of the retrieval period, end of the retrieval period, a password, and a salt – the  last two are used for generating symmetric keys for encryption.

The input is read from a file named input.txt, placed in the application path.

Each of the five input fields is a key-value pair, separated by a single space, occupying a single line. Sample input.txt file content is shown below:

  1. mailbox some.email.id@gmail.com
  2. start_time 2016/12/14
  3. end_time 2016/12/15
  4. input_key BA994A37-901D-4777-8054-6C5D87500AB7
  5. salt 65DA56D0-9285-41A0-975E-323420D0602B

The key mailbox denotes gmail id. Both dates use yyyy/M/d format, end_time being no earlier than start_time. Time could be included/used. Salt must be at least 8 bytes long.

The steps

We need to do the following steps:

  1. Authentication/Authorization
  2. Getting emails from Gmail
  3. Encryption
  4. Insert emails back

Authentication/Authorization

Google APIs (Gmail is one of them) use OAuth 2.0 protocol for authentication and authorization. All applications have to follow a basic pattern for accessing a Google API using OAuth 2.0.

Get credentials

At first, we must create a Google API Console project. Once created, we get credentials for it. Credentials are essentially composed of the following two values that are known to both Google and the application.

  1. Client ID
  2. Client Secret

Google console pages look like below:

Google Console

Application Secret

Get access token

The second step is to get an access token from Google Authorization Server. During access token request, a scope parameter is also sent. Scope indicates all the accesses requested for.

Depending on the kind of applications (web server, installed, client side) the authorization sequences might vary slightly. Since ours is an installed application, we would get authorization code first. We then exchange this code for a token. Below image that is taken from Google, depicts the process.

webflow

Get authorization code

To get authorization code from Google, we first open an HttpListener on local  loopback address, with a dynamically found available port. However, this requires elevated privilege. Hence, the application needs to run at elevated privilege. This could be avoided by doing URL reservation. However, that itself requires elevation.

We then open an authorization request in browser from the application. Authorization request would require:

  1. Authorization endpoint: https://accounts.google.com/o/oauth2/v2/auth
  2. Scope: https://www.googleapis.com/auth/gmail.modify
  3. Redirect URI: local loopback address, as explained earlier
  4. Client Id: Client Id credential, as retrieved for the Google API Console project
  5. State etc.: some additional information like state etc. to verify that response from server is due to a request made by this application. Verification happens once the response from Google server is back.

User’s consent screen looks like below.

Authentication picture

Scope can vary. Modify as scope, as shown here is sufficient to call all the three APIs (get, list and insert) that we need for our application. However, instead of asking for all the authorization upfront, scope should be increased incrementally.

If all is well, we will get an authorization code. However, to make any API call, we need access token. So now we have to exchange this code for a token.

We make an HttpWebRequest POST using

  1. Token endpoint: https://www.googleapis.com/oauth2/v4/token
  2. Code: as found earlier
  3. Client ID: credential as explained earlier
  4. Client secret: credential as explained earlier

All Gmail API endpoints are https, meaning all API calls have to be made using encrypted SSL/TLS channel. In .NET, we can easily use HttpWebRequest class. However, in other languages, if such a library is not available, we might use GnuTLS, OpenSSL etc. to create a TLS/SSL channel to enable us making https calls.

Make API call using access token

Now we can make Gmail API call, using the token in an authorization header.

Refresh Access Token

Access token has limited lifetime. We might require refreshing the access token, if necessary.

Getting emails from Gmail

To read emails, we will use list and get APIs, using the access token that we already received.

list

For list API call, we are making an HttpWebRequest GET request with endpoint: https://www.googleapis.com/gmail/v1/users/userId/messages.

Parameter userId is the gmail id. Optional query parameter q can be used, to specify after and before date to filter data. It should be mentioned that there is a max limit (I found it to be 500) as to how many emails (essentially, emaid id and thread id combination) we can read. We are content with as many results as found in the first call. No attempt is made to make further calls to read more emails.

get

For each listed email found in the above call we are now calling the get API using GET https://www.googleapis.com/gmail/v1/users/userId/messages/id.

Parameter userId is gmail id and id is the email id that is found from the previous call. Optional query parameter, format is used with value set to full. For each email, we are retrieving the subject and labels. If the input has a time component, then at this stage, we can further filter based on it, using internalDate property of the Users.Message resource.

Encryption

Encrypt with Rijndael

For encryption, Rijndael is used. Password and salt are taken as input. A 256 bit AES key is generated. We are using RijndaelManaged with default 128 bit block size; hence, it is AES compatible. Since this is symmetric key, the same key is used for both encryption and decryption. And the same key is used for all the emails.

get with format raw

For each email, we are making another call to get API, this time with format set to raw. Here we are duplicating the read API call. However, this way, we can directly get the raw property of Message resource, necessary to make insert API call. We don’t need to construct the raw property on our own.

At first, we are decoding the raw as found above. Then we are searching for the subject (that, we collected earlier, using get API with format full) in that. Starting from first character of the subject, we are encrypting the rest of raw. That will encrypt subject and body parts, along with some more data.

Regarding encryption, we could do that for the whole of raw or only for parts like subject, body and so on. As mentioned above, for simplicity, we have encrypted everything starting from subject. We have also written functions and tested that we can correctly decrypt the encrypted data.

While inserting the email into gmail folder, it would have been better if we could somehow indicate that the email is encrypted. That way, we could stop encrypting an email more than once, or identify that a certain email is in encrypted form.

So, in short, data in field raw of the Message resource is decoded, converted to a string and then encryption is applied on the plain text. It is then reverted back to base64url encoded form for insert.

Insert emails back

insert

The encoded raw along with labelIds collected earlier is now used to make an insert API call using POST https://www.googleapis.com/gmail/v1/users/userId/messages.

Once again, userId is the gmail id. Only raw and labelIds are used (in the request body), the latter (labelIds) is to make sure that the inserted email ends up in the same folder. Thread Id parameter could be used easily, however, chose not to include.

We have not used the upload URI. This is because we have only dealt with metadata and not (email) attachments, once again for simplicity.

Output

When the application runs, it asks user to press a key. Once the key is pressed, it starts working in asynchronous mode. Pressing a key further would stop it.

The application first opens Google server in browser for user to authenticate/authorize. Once that is done, it prints out how many emails it is going to read/encrypt/insert, based on the input parameters. It might take a while before it starts printing a line for each of the inserted emails.

This is how it looks in the Gmail folder.

Output.png

GMail API client libraries could be good starting point for writing such an application.

GitHub: GMail API

Index

Multithreaded Programming

32nd Friday Fun Session – 25th Aug 2017

Multithreading is a very important aspect of programming. Here we discuss some fundamental concepts and issues around it.

Process vs. Thread

A thread is a sequence of execution, a feature of operating system (OS). OS allocates processor time to a thread. Every process must have at least one thread. After all, it is a thread that executes the instructions.

Operating system gives each process some memory and it makes sure memory of one process is not accessible by other processes. However, all threads of a process would share its memory space, as the threads are parts of it.

Why do we need more than one thread?

As stated earlier, every process has at least one thread. Why do we need more than one thread? There are several reasons for that:

  1. Better UI experience: we can leave an exclusive thread to get user input, giving user a smooth interactive experience while having other threads to do other works.
  2. Do more things: with more than one processor available, using more than one thread, we can get more work done simultaneously.
  3. Use more things: different parts of software might be using difference resources of the computer. More than one thread can be used to simultaneously work with those resources.
  4. Doing slow things: sometimes we need to do some time consuming or slow thing, like writing to disk. We can create a separate thread to do that.

Concepts and Issues

We will focus on the following using VC++ concepts/terminologies/semantics:

  1. Types of threads
  2. Creating threads
  3. Memory issues
  4. Synchronizing threads
  5. Coordinating threads
  6. Exiting threads

Types of threads

There are two kinds of threads: user-interface (UI) thread and worker thread.

UI thread

UI thread has a message pump/loop that is nothing but an infinite loop that keeps on reading a message directed to the thread. Usually, PostThreadMessage is used to post (there is no corresponding send function) message to it. All these messages are queued in message queue, retrieved one by one and processed.

Since UI thread is in an infinite loop, it does not exit on its own. It has to be explicitly exited using PostQuitMessage.

UI thread can create UI components like showing a message box or other UI etc.

Worker thread

Worker thread does not have any message pump. That also means, it cannot receive any message. It cannot show any UI. It executes a set of instructions that is part of it and exits once execution finishes, meaning the thread does not exist any more after that.

Creating threads

AfxBeginThread function can be used to create both kinds of threads. However, a thread can be created in two modes: one that starts executing immediately; another that gets created in suspended mode and can be resumed later.

Each thread’s stack takes a 1 megabyte. 64 bit applications would fare better but keeping the number of threads low should be a priority.

Memory Issues

The thread can work on its own, allocating some (heap) memory that it will exclusively use. However, many a times, main process allocates some (heap) memory and passes it to a thread. This memory is usually freed up (to avoid memory leaks) by the process itself, as it allocated the memory at the first place. However, the process has to be careful not to free up memory when it is still being used by the thread. If the memory is deleted / freed and the thread is still accessing it, application/process/thread would crash (thread is part of the process).

That is one of the reasons we need to track the thread as to when it exits.

When we know that all threads that are using a certain memory have exited, we can free the memory. One way of doing this is again sharing/passing certain process (memory) flag to thread that it can use/set to indicate when it exits.

Synchronizing threads

In last section, we saw that process shares some memory with threads. At times, more than one thread might be using the same piece of memory. If multiple threads, running concurrently, access/modify common memory simultaneously, there would be data inconsistency and access violation crash. To deal with this we need to make those code segments as critical section. Access to critical section is to be controlled by synchronization techniques. There are a number of them. One light weight (compare to mutex etc.) intra-process technique (processor-specific test and set instruction) would be critical section (InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection, DeleteCriticalSection).

However, common memory object is not the only thing that we might need to synchronize.

Coordinating threads

One of the reasons, we create threads is to do some additional work. At times, one thread (main thread or a created one), let’s call it waiter, is waiting somewhere for another thread (main thread or a created one), let’s call it doer, to finish certain work. How would that be achieved?

We can use event based signalling mechanism. We can create an event (CreateEvent). A waiter can wait on it, usually using a blocking call (WaitForMultipleObjectsEx). Once doer is done, it can signal (SetEvent) using the event handle that it is done. Waiter would come out of the wait (blocking call) and it would know that the work for which it was waiting for is completed.

Waiter thread can again put the event to nonsignaled state (ResetEvent) and start waiting for the doer to signal it, and the cycle can go on and on.

An event can be either named or unnamed.

An event can also be created in two modes: manual-reset that requires an explicit (ResetEvent) call to put that event into nonsignaled state; another is auto-reset that creates the event with nonsignaled state.

Once we are done with an event, it should be closed (CloseHandle) else it would cause resource (handle) leak.

Exiting threads

When a process needs to exit, it should make sure that all the created threads are gracefully shut down first.

UI thread can be instructed to exit using PostQuitMessage function. However, it might take a while as there might be other messages in the queue. In that case, application should wait for the sake of a graceful shutdown. Otherwise, depending on what this thread is designed to do, the system might go to an unstable state, say, due to an inconsistent state of an inter-process synchronization object.

Since worker thread cannot be posted a message, we can use TerminateThread. However, once again this is a dangerous thing to do, for the same reason (can result in unstable state) as mentioned earlier.

So we need to be patient during shutdown. We can use the shared variable approach as discussed earlier, or an event based mechanism so that the process knows when all threads have exited.

Index

Solution – Scoring Weight Loss

29th Friday Fun Session – 4th Aug 2017

Given a sequence of weights (decimal numbers), we want to find the longest decreasing subsequence. And the length of that subsequence is what we are calling weight loss score. This is essentially the standard longest increasing subsequence (LIS) problem, just the other way.

This is the solution to JLTi Code Jam – Jul 2017 problem.

Let us walk through an example

Let us take the example as mentioned here: 95, 94, 97, 89, 99, 100, 101, 102, 103, 104, 105, 100, 95, 90. The subsequence can start at any value, and a value in a subsequence must be strictly lower than the previous value. Any value in the input can be skipped. The soul goal is to find the longest subsequence of decreasing values. Here one of the longest decreasing subsequences could be:  105, 100, 95, 90 and the length would be 4.

Even though, in our weight loss example, we have to find the length of longest decreasing subsequence, the standard problem is called longest increasing subsequence. Essentially the problems are the same. We can have a LIS solution and can pass it the negative of the input values. Alternatively, in the algorithm, we can alter the small to large, greater than to smaller than etc. We chose the former.

We will use two approaches to solve this problem: one is a dynamic programming based solution having O(n2) complexity, another is, let’s call it Skyline solution having O(n log n) complexity.

Dynamic Programming Solution

Let’s work with this example: 95, 96, 93, 101, 91, 90, 95, 100 – to see how LIS would work.

When the first value, 95 comes, we know it alone can make a subsequence of length 1. Well, each value can make a subsequence on its own of length 1.

When the second value 96 comes, we know it is greater than 95. Since 95 already made a subsequence of length 1, 96 can sit next to it and make a subsequence of length 2. And it would be longer than a subsequence of its own of length 1.

When the value 93 comes, it sees it cannot sit next to any value that appeared prior to it (95 and 96). Hence, it has to make a subsequence of its own.

When the value 101 comes, it knows that it can sit next to any prior values (95, 96 and 93). After all, it is bigger than each of them. It then computes the score it would make if it sits next to each of them, separately. The scores would be 2, 3, and 2, if it sits next to 95, 96 and 93 respectively. Of course, it would choose 96. The subsequence is 95, 96, 101 and the score is 3.

So we see, we can go from left to right of the input, and then for each of the previous values, it sees whether it can be placed after it. If yes, it computes the possible score. Finally, it chooses the one that gives it the highest score as its predecessor.

So we are using the solutions already found for existing overlapping sub-problems (the scores already computed for its preceding input values, that we can reuse) and can easily compute its own best score from them. Hence, it is called a dynamic programming solution.

The following table summarizes it.

DP table.png

There are two longest subsequences each with length 3. For a certain value, if we need to know the preceding value, we can backtrace and find from which earlier value its score is computed. That way, we can complete the full subsequence ending with this value.

Since for each of the input values we are looping all the preceding values, the complexity is O(n2).

Skyline Solution

In this approach, we would retain all incompatible and hence promising subsequences since any of them could lead to the construction of one of the final longest subsequences. Only at the end of the input we would know which one is the longest. Since we are retaining all incompatible subsequences I am calling it Skyline, inspired by Skyline operator.

It is obvious but let me state here, all these solutions are standard, already found and used. However, Skyline is a name I am using as I find it an appropriate term to describe this method.

If there are two apples: one big and another small, and if you are asked to choose the better one, you would choose the big one. However, if you are given an apple and an orange, you cannot, as they are incomparable. Hence you need to retain both.

When a value comes it can be one of the below three types:

Smallest value (case 1)

  1. It won’t fit at the end of any existing subsequences. Because the value is smaller than all the end values for all existing subsequences.
  2. There is no other way but to create a new subsequence with this value.
  3. We can safely discard all single value subsequences existed so far. After all, the new subsequence with the smallest value can be compared with each of them and it is clearly superior to them (score for each such subsequence is 1 and the end (and only) value for the new one is the smallest – hence it can accept more future input values than the rests).
  4. In the list of subsequences we can retain the single value subsequence at first. Meaning, every time the new smallest value comes, we simply replace the existing smallest value listed as the first subsequence.

Biggest value (case 2)

  1. The opposite of the previous case is: the new value is bigger than the end values of each of the existing subsequences.
  2. So it can fit at the end of all existing subsequences. So which one to choose?
  3. Suppose, it is the end of the input. In that case, we would like it to go at the end of the longest subsequence found so far and make it longer by one more.
  4. However, if it is not the end of the input and suppose there are some future input values coming that are bigger than the end value of the present longest subsequence and smaller than the present input value. By placing the present input value at the end of the present longest subsequence we will jeopardize a more promising possibility in future.
  5. So we should rather copy the longest subsequence found so far and add this new value at the end of it, making it the new longest.
  6. At the same time, we retain the previous longest subsequence as it is, that by now is the second longest subsequence.
  7. We will add this new and longest subsequence at the end of the list.

Middle value (case 3)

  1. We have a third case where the input value can fit the end of some subsequences and cannot fit at the end of the rest subsequences.
  2. This is because this new value is bigger than the end values of some sun-sequences and smaller than the same for the rests.
  3. So which one to choose? Of course, we have to choose one where it can fit, meaning from those whose end values are smaller than the input value.
  4. And we would like to choose one with the largest end element (yet it is smaller than the input value).
  5. However, we cannot just over-write it for the same reason as stated earlier (case 2, promising reasoning). Rather we copy it, add the new value at the end of it and add it to the list.
  6. Where – at the end of the list?
  7. No, we would insert in next to the subsequence from which we copied and extended it.
  8. And we can safely discard all other subsequences with the same length as this newly created  subsequence. After all, the length is the same and it’s end element is smaller than the end elements of the rests having equal length of it.
  9. Shall we run a loop over the list to find those to be deleted? No, we just need to find the next subsequence and if its length is the same as the newly created subsequence we delete it. No more checking is required.
  10. Why so? Please read the second point as stated below.

So we have handled all possible input values. The list of subsequences that we have created would have some nice properties:

  1. As we go from the first subsequence to the last in the list of subsequences, the length will gradually increase.
  2. There would be a maximum of one subsequence with a certain length.
  3. To find whether the input value is a case 1 or case 2 or case 3 type, we can easily run a binary search with O(log n) complexity over the end elements of the subsequences in the list. Since we would like to do so for each of the n input values, the complexity of this approach would be O(n log n).
  4. For doing the above we can use the list, just that we need to look at the end elements. Then why are we retaining the complete list?
  5. The answer is: to output the longest subsequence as well.
  6. Could we do it without saving the complete subsequence?
  7. We leave it for another day.

Walking through an example

Let’s go through the same example as used earlier: 95, 96, 93, 101, 91, 90, 95, 100.

95 (case 1)

95

96 (case 2)

95

95, 96

93 (case 1)

93

95, 96

101 (case 2)

93

95, 96

95, 96, 101

91 (case 1)

91

95, 96

95, 96, 101

90 (case 1)

90

95, 96

95, 96, 101

95 (case 3)

90

90 95

95, 96 (deleted)

95, 96, 101

100 (case 3)

90

90 95

90 95 100

95, 96, 101 (deleted)

Once all the input values are treated, the last subsequence would be the longest one.

GitHub: Scoring Weight Loss

Index

FaaS

6th JLTi Code Jam – Aug 2017

Threatened by the JLTi Weight Loss Competition where the participants are lining up in front of Salad shops, and the likes of me, who have entirely given up lunch (hopefully I can continue forever), food court shops who are selling oily, low-fibre and various other kinds of unhealthy food have come up with a novel idea.

Inspired from the software world, and more importantly, to attract the software people who sit in their chairs for long hours and are the primary victims of eating these junk, those food shops have chosen a name for this scheme – Food as a Service (FaaS), borrowed from the likes of SaaS, PaaS, IaaS – whatever that means, if that means anything at all.

Instead of paying on a daily basis, they are asking people to subscribe for food.

For example, without subscription, a set lunch would cost S$ 6, as usual, if you want to pay as you eat, just like as you are doing now. No strings attached.

However, if you subscribe for a week (5 meals, one meal one day, 5 consecutive days, not calendar week, can start at any day), instead of paying S$ 30, you can pay S$ 27.99 for five meals. Of course you have to eat from the same (chain of) shop.

And if you subscribe for a month (20 meals, one meal one day, 20 consecutive days, not calendar month, can start at any day) that they are vying for, you pay only S$ 99.99.

Input: 1, 2, 4, 5, 17, 18

Output: 36

Explanation: Input is a list of day numbers when you want to have a meal. The number can start at 1, and go up to any number.

A certain day number, say, 4, would not come more than once in the input, if it comes at all, assuming one can have only one lunch meal a day.

The above input says – you eat for 6 days. It makes no sense for you to go for a monthly subscription. Well, it also does not make sense to go for a weekly subscription. Paying daily basis for 6 days would be the best cost effective decision for you. You pay: S$ 36.

Input: 3, 4, 5, 6, 7, 17, 18

Output: 39.99

You subscribe for one week (first 5 days) and pay individually for the last 2 days. Your best decision cost you S$ 39.99.

Input: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 24

Output: 105.99

Here, a monthly subscription and S$ 6 for the last day would be the best deal for you.

Task: Given lunch calendar for some days (it can be 3 days, 10 days, 121 days or any number of days) as input, as explained above, I am planning to write a program that would output me the best price. Well, if I can find the best price, I also know what subscription plans etc. are. However, put that aside. Let’s find the best price, as shown and explained above.

Index

Solution – Manipulating Money Exchange

25th Friday Fun Session – 7th Jul 2017

Given a set of currencies and some exchange rates among them, we want to find if there exists an arbitrage; meaning, if it is possible to exploit the discrepancies in the exchange rates and transform one unit of a certain currency to more than one unit of the same, thus making a profit.

This is the solution to JLTi Code Jam – Jun 2017 problem.

Let us walk through an example

Let us take the example as mentioned here. We can start with 1 USD, convert that to SGD (1.380 SGD), then convert that to MYR (1.380 * 3.080 MYR), then convert to INR (1.380 * 3.080 * 15.120 INR), then convert to GBP (1.380 * 3.080 * 15.120 * 0.012 GBP), then convert that back to USD (1.380 * 3.080 * 15.120 * 0.012 * 1.30 = 1.0025503488 USD).

We end up with more than 1 USD. That means, we have an arbitrage in this set of exchange rates. The profit making cycle here is: USD -> SGD -> MYR -> INR -> GBP. And since it is a cycle, we can start from any currency within it. For example, SGD -> MYR -> INR -> GBP -> USD also represents the same cycle.

The transformation

In general, if we have to make a profit, the respective rates in the cycle, when multiplied, should give more than 1, as we have seen in the above example.

Formula

Negative cycle in Bellman-Ford

After some simple transformation of the profit making condition, we see, if we take negative of log rate, and use that as the edge cost/distance, then finding profit making cycle is equivalent to finding negative cycle in the corresponding graph. And we can do so using Bellman-Ford algorithm.

To be precise, each of the currencies would be considered as a vertex. If there exists an exchange rate r between two currencies then there would be a directed edge between the corresponding vertices, and –log r would be the associated cost/distance of that edge.

Source of Bellman-Ford

The next question comes: using which vertex as source shall we run the Bellman-Ford? Let us see the below graph.

Souce.png

Suppose, we have a single profit making cycle here: GBP-> AUD -> CAD. In that case, if we start with USD as source vertex, we will never detect this cycle.

Add extra currency as source

To solve this problem, we need to add an extra currency, and then create edges from it to all the existing currencies with cost 0. Now using this extra vertex (EXT) as source we have to run Bellman-Ford and that would ensure that we can detect a cycle, if there exist one.

Extra Souce

GitHub: Manipulating Money Exchange

Index

Scoring Weight Loss

5th JLTi Code Jam – Jul 2017

Now that weight loss competition is back, some people are more than excited about it. And why not? After all, only by running 10 km in the last 3 days, they can effortlessly shed 15 kg!

On the other hand, slim people, by any global standard, like me have to starve an entire month and still have to win this competition only in dream, in some rainy days.

Since the enthusiastic participants approached me to participate, I am thinking of a new scoring system that would remove this inherent bias in the existing scoring system – deducting final weight from first day weight.

So here I propose a new scoring system that would otherwise value the sustained effort and success of the participants, ignoring the total/final/absolute loss.

Input: 73, 72.9, 72.8, 72.8, 72.9, 72.7, 72.0, 71.6, 73, 72.5, 72.4, 71.3, 73.5, 74

Output: 7

Explanation: The above is an estimation of my performance, if the competition runs for 14 days. You can clearly see I start with 73 and end up with 74. In the old standard, I gain weight and penalized. In the new scoring system I score 7. How? Well, it computes how long I keep on decreasing weight, without seeing how much. In the above example, the longest stretch where I continue to lose weight (a value in a sequence must be smaller than its immediate predecessor) is shown below.

73, 72.9, 72.8, 72.7, 72.0, 71.6, 71.3

Let us also find the approximate score of the last time winner. A possible set of weights of him might look like the below:

Input: 95, 94, 97, 89, 99, 100, 101, 102, 103, 104, 105, 100, 95, 90

Output: 4

His success story lies in the last 3 days of blitzkrieg (the first weight in the input does not necessarily need to be the first value in the sequence): 105, 100, 95, 90

Let us also talk about a hypothetical participant who misunderstands this to be a weight gain competition and eats cheese all along.

Input: 53, 53.1, 53.2, 53.4, 53.5, 53.6, 53.9, 54, 54.1, 54.2, 54.2, 54.7, 55.8, 56

Output: 1

The scorer takes note of 53 and it never goes towards the right direction.

Task: A good scoring system indeed – nobody gets zero or below. And I am sure all of you would agree with this. Now let us quickly write a small program that takes an array of weights and computes the score.

Index