How to use Orca to develop quantitative strategies?

This paper describes an example of momentum strategy. Momentum strategy is one of the most famous quantitative long-term and short-term stock strategies. Since Jegadeesh and Titman(1993) first proposed this concept, it has widely appeared in academic research and sales works. In the momentum strategy, investors believe that in individual stocks, past winners will surpass past losers.

The most commonly used momentum factor is the stock's return in the past 12 months, excluding the most recent month. In academic publications, the momentum strategy is usually adjusted once a month and the holding period is also one month. In this example, we rebalance 1 / 21 of our portfolio every day and hold new shares for 21 days. For simplicity, we do not consider transaction costs.

Step 1: load stock trading data, clean and filter the data, and then build a momentum signal for each company's stock in the past 12 months, ignoring the momentum signal of the last month.

def load_price_data(df):
    USstocks = df[, 4), df.PRC.notnull(), df.VOL.notnull()][
                   ['PERMNO', 'date', 'PRC', 'VOL', 'RET', 'SHROUT']
               ].sort_values(by=['PERMNO', 'date'])
    USstocks['PRC'] = USstocks.PRC.abs()
    USstocks['MV'] = USstocks.SHROUT * USstocks.PRC
    USstocks['cumretIndex'] = (USstocks + 1)['RET'].groupby('PERMNO', lazy=True).cumprod()
    USstocks['signal'] = (USstocks.shift(21) / USstocks.shift(252) - 1).groupby(
                            'PERMNO', lazy=True)['cumretIndex'].transform()
    return USstocks

df = orca.read_csv('C:/DolphinDB/Orca/databases/USstocks.csv')
price_data = load_price_data(df)

Note: the above code uses two extensions of Orca.

  1. Orca supports the use of commas instead of &, which is more efficient in some scenarios. See course.
  2. Orca's groupby function provides the lazy parameter, which can be used in conjunction with the transform function to realize the context by function of dolphin dB. See course.

Step 2: generate portfolio for momentum strategy.

First, select the tradable shares that meet the following conditions: no missing momentum signal value, positive trading volume on the same day, market value of more than US $100 million, and price of more than US $5 per share.

def gen_trade_tables(df):
    USstocks = df[(df.PRC > 5), (df.MV > 100000), (df.VOL > 0), (df.signal.notnull())]
    USstocks = USstocks[['date', 'PERMNO', 'MV', 'signal']].sort_values(by='date')
    return USstocks

tradables = gen_trade_tables(price_data)

Secondly, 10 groups of tradable stocks are formulated according to the momentum signal. Only the two most extreme groups (winners and losers) are retained. Suppose we always want to be long $1 and short $1 every day in 21 days, so we are long $1 / 21 in the winner group and short $1 / 21 in the loser group every day. In each group, we can use equal weight or value weight to calculate the weight of each stock on the portfolio formation date.

def form_portfolio(start_date, end_date, tradables, holding_days, groups, wt_scheme):
    ports = tradables[, end_date)].groupby('date').filter('count(PERMNO) >= 100')
    ports['rank'] = ports.groupby('date')['signal'].transform('rank{{,true,{groups}}}'.format(groups=groups))
    ports['wt'] = 0.0
    ports_rank_eq_0 = (ports['rank'] == 0)
    ports_rank_eq_groups_sub_1 = (ports['rank'] == groups-1)
    if wt_scheme == 1:
        ports.loc[ports_rank_eq_0, 'wt'] = \
        ports.loc[ports_rank_eq_groups_sub_1, 'wt'] = \
    elif wt_scheme == 2:
        ports.loc[ports_rank_eq_0, 'wt'] = \
        ports.loc[ports_rank_eq_groups_sub_1, 'wt'] = \
    ports = ports.loc[ports.wt != 0, ['PERMNO', 'date', 'wt']].sort_values(by=['PERMNO', 'date'])
    ports.rename(columns={'date': 'tranche'}, inplace=True)
    return ports

start_date, end_date = orca.Timestamp("1996.01.01"), orca.Timestamp("2017.01.01")
holding_days = 5
groups = 10
ports = form_portfolio(start_date, end_date, tradables, holding_days, groups, 2)
daily_rtn = price_data.loc[, end_date), ['date', 'PERMNO', 'RET']]

Note: the above code uses Orca's extended function, that is, it allows higher-order functions such as filter and transform to accept a string representing a dolphin DB function or script. See course.

Step 3: calculate the profit / loss of each stock in our portfolio for the next 21 days. Close stocks 21 days after the formation date of the portfolio.

def calc_stock_pnl(ports, daily_rtn, holding_days, end_date, last_days):
    dates = ports[['tranche']].drop_duplicates().sort_values(by='tranche')

    dates_after_ages = orca.DataFrame()
    for age in range(1, holding_days+1):
        dates_after_age_i = dates.copy()
        dates_after_age_i['age'] = age
        dates_after_age_i['date_after_age'] = dates_after_age_i['tranche'].shift(-age)
        dates_after_ages.append(dates_after_age_i, inplace=True)

    pos = ports.merge(dates_after_ages, on='tranche')
    pos = pos.join(last_days, on='PERMNO')
    pos = pos.loc[(pos.date_after_age.notnull() & (pos.date_after_age <= pos.last_day.clip(upper=end_date))),
                  ['date_after_age', 'PERMNO', 'tranche', 'age', 'wt']]
    pos = pos.compute()
    pos.rename(columns={'date_after_age': 'date', 'wt': 'expr'}, inplace=True)
    pos['ret'] = 0.0
    pos['pnl'] = 0.0

    # use set_index to make it easy to equal join two Frames
    daily_rtn.set_index(['date', 'PERMNO'], inplace=True)
    pos.set_index(['date', 'PERMNO'], inplace=True)
    pos['ret'] = daily_rtn['RET']
    pos['expr'] = (pos.expr * (1 + pos.ret).cumprod()).groupby(
                    ['PERMNO', 'tranche'], lazy=True).transform()
    pos['pnl'] = pos.expr * pos.ret / (1 + pos.ret)

    return pos

last_days = price_data.groupby('PERMNO')['date'].max()
last_days.rename("last_day", inplace=True)
stock_pnl = calc_stock_pnl(ports, daily_rtn, holding_days, end_date, last_days)

Note: the above code has a pos.compute() statement, which directly calculates the result of an intermediate expression (DataFrame with conditional filtering). Because we only need to assign values to the filtered results.

In addition, the above code calls set on two dataframes_ Index function, and then assign the column of one DataFrame to another. This is similar to left join two dataframes according to the index column, and then assign the corresponding column in the result. If the script POS ['ret '] = pos.merge (daily_rtn, on = ['date','permno '] ['ret']] is directly executed, a join will be performed during calculation and another join will be performed during assignment, resulting in unnecessary calculation.

Step 4: calculate the profit / loss of the portfolio and plot the cumulative return of the momentum strategy over time.

port_pnl = stock_pnl.groupby('date')['pnl'].sum()
cumulative_return = port_pnl.cumsum()

Note: the plot function will download the dolphin DB table corresponding to the entire DataFrame to the client, and then align the drawing. When using, we should pay attention to the amount of data to avoid the performance problems caused by a large number of network transmission. See course.


Click here View the complete code.

Posted by emdee on Sat, 16 Apr 2022 14:22:10 +0930