Load Sina Futures Bars Into Backtrader

Sina Finance publishes intraday bars for futures listed on the Chinese futures exchanges, and they can be fetched with either Java or Python.

#Loads rb2110 60m bars from Sina's MinLine
import datetime
import json
import re

import backtrader as bt
import pandas as pd
import requests

class SinaMinLine():
    def __init__(self):
        pass
    
    def load(self,symbol="RB2110",type="60"): 
        url_template="https://stock2.finance.sina.com.cn/futures/api/jsonp.php/var%20_{symbol}_{type}_{ts}=/InnerFuturesNewService.getFewMinLine?symbol={symbol}&type={type}"
        ts=datetime.datetime.now().strftime('%Y%m%d%H%M%S')
        url=url_template.format(symbol=symbol,type=type,ts=ts)
        print("url={}".format(url))
        res=requests.get(url, verify=False)
        tokens=re.split("[(]|[)]",res.text)
        bars=json.loads(tokens[1])
        print("scraped symbol={},count={}".format(symbol,len(bars)))
        #collect all bars into dataframe
        rows_list = []
        for bar in bars:
            item_dict={}
            item_dict["date"]=datetime.datetime.strptime(bar['d'],'%Y-%m-%d %H:%M:%S') #2021-05-25 11:15:00
            item_dict["open"]=float(bar['o'])
            item_dict["high"]=float(bar['h'])
            item_dict["low"]=float(bar['l'])
            item_dict["close"]=float(bar['c'])
            item_dict["volume"]=float(bar['v'])
            rows_list.append(item_dict)
        df = pd.DataFrame(rows_list)
        return df

If you have a Tushare Pro license that covers futures bars, the workflow is much the same. I don’t, so the following code loads daily bars for a stock instead, purely for illustration.

#Loads bars from Tushare
import tushare as ts

class TushareProApi():
    def __init__(self):
        #initialize tushare api
        ts.set_token('please_use_your_token')
        self.pro=ts.pro_api()
        
    def histx(self,code='600519.SH', start_date='20220101',end_date='20220601'):
        df=self.pro.daily(ts_code=code,start_date=start_date,end_date=end_date)
        return df

#remember to reshape your data into the layout backtrader expects, for example:
#     pro=TushareProApi()
#     df=pro.histx('600519.SH','20220101','20220606')
#     df['date']=pd.to_datetime(df['trade_date'],format='%Y%m%d')
#     df=df.set_index('date')
#     df=df[['open','high','low','close','vol']]
#     print(df.tail(20))
#     data=bt.feeds.PandasData(...)
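
The commented steps above can be tried without a Tushare token. The sketch below runs them on a hand-made frame that only mimics the column names used above (`trade_date`, `open`, `high`, `low`, `close`, `vol`); the prices are made up and `to_backtrader_frame` is a hypothetical helper name. It additionally renames `vol` to `volume` and sorts the rows oldest first, on the assumption that the feed matches columns by name and wants ascending dates.

```python
import pandas as pd


def to_backtrader_frame(df):
    """Reshape a Tushare-style daily frame into a date-indexed OHLCV frame."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["trade_date"], format="%Y%m%d")
    # index by date and put the oldest bar first
    out = out.set_index("date").sort_index()
    # use the column name 'volume' instead of Tushare's 'vol'
    out = out.rename(columns={"vol": "volume"})
    return out[["open", "high", "low", "close", "volume"]]


# made-up rows, newest first, standing in for a real pro.daily() result
raw = pd.DataFrame({
    "trade_date": ["20220602", "20220601"],
    "open": [1790.0, 1780.0],
    "high": [1805.0, 1795.0],
    "low": [1785.0, 1770.0],
    "close": [1800.0, 1790.0],
    "vol": [25000.0, 23000.0],
})

frame = to_backtrader_frame(raw)
print(frame)
```

The resulting frame can then be handed to `bt.feeds.PandasData(dataname=frame)`.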

Given a dataframe from SinaMinLine, we can reshape it into a format that backtrader recognizes. To avoid downloading the same data from Sina repeatedly, we can save the dataframe to CSV (or pickle) and reuse the persisted copy while coding or backtesting.
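
A minimal sketch of the CSV round trip, using a made-up two-row frame in place of a real download. The file name `rb2110_60m.csv` and the helper names `save_bars` and `load_bars` are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd


def save_bars(df, path):
    """Persist the bars; the date column is written as plain text."""
    df.to_csv(path, index=False)


def load_bars(path):
    """Reload the bars and parse the date column back into datetimes."""
    return pd.read_csv(path, parse_dates=["date"])


# made-up bars with the same columns that SinaMinLine.load() builds
bars = pd.DataFrame({
    "date": pd.to_datetime(["2021-05-25 10:00:00", "2021-05-25 11:15:00"]),
    "open": [5100.0, 5110.0],
    "high": [5120.0, 5130.0],
    "low": [5090.0, 5105.0],
    "close": [5110.0, 5125.0],
    "volume": [1200.0, 900.0],
})

path = os.path.join(tempfile.mkdtemp(), "rb2110_60m.csv")
save_bars(bars, path)
cached = load_bars(path)
print(cached)
```

In a real script, one would check `os.path.exists(path)` first and only call Sina when the cached file is missing.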

The following snippet initializes backtrader, loads some 60m data, feeds it to backtrader, and finally runs the engine. This completes the demo of loading 60m bars into backtrader.

if __name__ == '__main__':
    #initialize backtest engine
    cerebro = bt.Cerebro()

    #load data from sina MinLine
    sina=SinaMinLine()
    df=sina.load()
    df.index=df['date']

    #inject dataframe to backtrader via PandasData, declared as 60-minute bars
    data=bt.feeds.PandasData(dataname=df,timeframe=bt.TimeFrame.Minutes,compression=60)
    
    #add the data feed to cerebro
    cerebro.adddata(data)

    #run
    cerebro.run()

Python Redbook

Here are some tricks I picked up while working with PythonGo (a trading terminal that supports algo trading for the Chinese futures market), GoldMiner (a backtest engine for CTA strategies), GTJA Quant (the platform where I run stock momentum strategies), and others.

1. Resample: convert hourly bars to a daily bar series
2. Talib for Python
3. Read configuration file
4. Python Read and Write Json
5. Check whether an object is a datetime
6. Serializing DataFrame With Pickle

Build Daily Bars From Hourly Bars

GoldMiner can conveniently load both hourly and daily bars through its subscription methods. However, I prefer to build daily close prices from hourly bars, in order to improve backtesting performance.

The idea is to load 1h bars into a dataframe, then discard all night-session data to avoid conversion issues.

For most instruments, the Chinese futures market starts trading at 21:00 on T-1 and ends at 15:00 on T. To fold the complete price information into one daily bar, the code would have to shift every timestamp in parallel (by +2 hours) so that each trading day is aligned to midnight. The problem is that the night session on Friday (or on the night before a public holiday) gets shifted into Saturday, which produces a bar on Saturday while Monday’s bar is missing its night-session data.

In my case the scenario can be simplified, as I only need the daily close (so O/H/L/V can be inaccurate) for indicators such as SMA, BOLL, etc. Hence, I simply filter out night-session prices.

symbol = bar.symbol
histh = context.data(symbol,'3600s', count=context.period)

if histh.shape[0]<context.period:
    return

#convert for daily close prices from hourly bars via resample
histh.index=pd.DatetimeIndex(histh.eob)

#discard night session data, so that we keep day-session prices only.
#between_time includes both endpoints by default; the include_start/include_end
#keywords were removed in pandas 2.0.
histx = histh.between_time('09:00','15:00')
histx.index=histx.index + pd.DateOffset(hours=2)

#resample and aggregation
histd=pd.DataFrame()
histd['open']=histx.open.resample('D').first()
histd['high']=histx.high.resample('D').max()
histd['low']=histx.low.resample('D').min()
histd['close']=histx.close.resample('D').last()
histd['volume']=histx.volume.resample('D').sum()
histd = histd.dropna()
priced=histd.close.values

ddbalancer = talib.SMA(priced, timeperiod=context.periodbalancer)
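
The same filter-then-resample steps can be run on a small synthetic frame, without GoldMiner. In this sketch the timestamps and prices are made up, `daily_from_hourly` is a hypothetical helper name, and each bar is assumed to be stamped with its end time, as the `eob` column suggests.

```python
import pandas as pd


def daily_from_hourly(hourly):
    """Build day-session daily bars from hourly bars indexed by bar end time."""
    # keep the day session only; both endpoints are included by default
    day = hourly.between_time("09:00", "15:00")
    daily = day.resample("D").agg({
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "volume": "sum",
    })
    # calendar days without any day-session bar have no close; drop them
    return daily.dropna(subset=["close"])


index = pd.DatetimeIndex([
    "2021-05-21 10:00", "2021-05-21 11:00", "2021-05-21 15:00",  # Friday, day session
    "2021-05-21 22:00", "2021-05-21 23:00",                      # Friday, night session
    "2021-05-24 10:00", "2021-05-24 15:00",                      # Monday, day session
])
hourly = pd.DataFrame({
    "open":   [99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
    "high":   [100.5, 101.5, 102.5, 103.5, 104.5, 105.5, 106.5],
    "low":    [98.5, 99.5, 100.5, 101.5, 102.5, 103.5, 104.5],
    "close":  [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0],
    "volume": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
}, index=index)

daily = daily_from_hourly(hourly)
print(daily)
```

Because the night bars are dropped before resampling, no bar appears on the weekend and Monday's close is taken from Monday's own day session.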

Logging In Python

As code gets more complicated, “print()” becomes an inefficient aid for debugging and analysis, and logging is a good replacement. Java has it, and Python isn’t any different.
The overall idea of logging is the same in Python as in Java. A logger has log levels and handlers, and each handler sends records to a different destination: console, file, database, or others. I’m only interested in collecting logs via console and file.
Some recommend putting the logger configuration at the top of each Python file, right after the “import” statements.
The following works for me. Note that the log file is created in Python’s working directory.

import datetime
import json
import requests
import re
import redis
from xpanda.futures.sina import mdes
from vtObject import VtBarData
import csv
import time
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

#log to file
fh = logging.FileHandler('logfile.log')
fh.setLevel(logging.INFO)
formatter= logging.Formatter('%(asctime)s : %(levelname)s : %(name)s : %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)

#log to console
stream = logging.StreamHandler()
stream.setLevel(logging.DEBUG)
sFormatter = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
stream.setFormatter(sFormatter)
logger.addHandler(stream)

class MDES(object):
    def __init__(self):
        pass #class body omitted
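
To see how the two handler levels interact, here is a small self-contained sketch of the same setup. The helper name `build_logger`, the logger name `demo` and the messages are made up for illustration, and the log file is written to a temporary directory rather than the working directory.

```python
import logging
import os
import tempfile


def build_logger(name, logfile):
    """Create a logger that writes INFO+ to a file and DEBUG+ to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(asctime)s : %(levelname)s : %(name)s : %(message)s")

    # file handler: INFO and above
    fh = logging.FileHandler(logfile)
    fh.setLevel(logging.INFO)
    fh.setFormatter(formatter)
    logger.addHandler(fh)

    # console handler: DEBUG and above
    stream = logging.StreamHandler()
    stream.setLevel(logging.DEBUG)
    stream.setFormatter(formatter)
    logger.addHandler(stream)
    return logger


logfile = os.path.join(tempfile.mkdtemp(), "logfile.log")
log = build_logger("demo", logfile)
log.debug("bar received")     # reaches the console only
log.info("order submitted")   # reaches the console and the file

with open(logfile) as f:
    contents = f.read()
print(contents)
```

The logger itself is set to DEBUG so that each handler can apply its own, stricter threshold; a logger left at the default level would drop DEBUG records before any handler sees them.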

Reference:
1. Logging To File: https://www.machinelearningplus.com/python/python-logging-guide/
2. Logging To Handlers: https://pythonhowtoprogram.com/logging-in-python-3-how-to-output-logs-to-file-and-console/

CZCE Futures Contract Symbols

The other exchanges list contracts in the format Code+YYMM, e.g. RB2110, while CZCE uses Code+YMM, e.g. MA109.

Sina Finance exposes a uniform market data API in which CZCE contracts follow the same Code+YYMM convention as the others, such as MA2109. We therefore need conversion logic for CZCE symbols.

There are a couple of Python tricks involved:
1. regular expression
PythonGo uses the format CODEYMM, where CODE is the contract code and YMM is three digits: the last digit of the year followed by the two-digit month. Hence, “(\w+)(\d{3})” is good enough to split the symbols.
2. integer to string
Python does not convert an int to a string implicitly, so the conversion requires an explicit “str()”.
3. epoch time
“datetime.timestamp()” returns the epoch time in seconds as a float; converting the datetime to UTC first is optional.
4. convert from epoch to datetime
rebuild=datetime.datetime.fromtimestamp(now.timestamp())
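
The round trip in item 4 can be checked on a fixed instant. This sketch uses a timezone-aware UTC datetime so that the result does not depend on the machine's local timezone; the sample date is arbitrary.

```python
import datetime

utc = datetime.timezone.utc
bar_time = datetime.datetime(2021, 5, 25, 11, 15, 0, tzinfo=utc)

# datetime -> epoch: float seconds since 1970-01-01 00:00:00 UTC
epoch = bar_time.timestamp()

# epoch -> datetime: pass tz explicitly, otherwise local time is used
rebuilt = datetime.datetime.fromtimestamp(epoch, tz=utc)

# integer milliseconds, turned into text with an explicit str()
epoch_ms = int(epoch * 1000)
label = "ts=" + str(epoch_ms)

print(epoch, rebuilt, label)
```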

        #strategy vtSymbol for CZCE has 3 digit month, while sina has 4 digits.
        #scenario as following: 
        #PythonGo requires code : CZ109
        #SinaFinance has code   : CZ2109
        if exchange=='CZCE':
            tokens=re.split(r"(\w+)(\d{3})",symbol)
            symbol_code=tokens[1]
            #prepend the decade digit of the current year, e.g. 2 + 109 -> 2109
            symbol_month=str((datetime.datetime.now().year-2000)//10)+tokens[2]
            symbol="{}{}".format(symbol_code,symbol_month)

        bartype = "60" #bartype=60 means 1h.
        timestamp=int(datetime.datetime.now().timestamp()*1000) #integer epoch in milliseconds
        url_template="https://.../futures/api/var%20_{symbol}_{type}_{timestamp}=/InnerFuturesNewService.getFewMinLine?symbol={symbol}&type={type}"
        url=url_template.format(symbol=symbol,type=bartype,timestamp=timestamp)
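
The conversion can also be isolated into a small function that is easy to test. This is a sketch only: `czce_to_sina` is a hypothetical name, and the year is passed in explicitly instead of being read from the clock. Like the snippet above, it takes the decade from the reference year, so it would pick the wrong decade for a contract that expires after the decade rolls over (for example a `001` contract looked up during 2029).

```python
import re


def czce_to_sina(symbol, reference_year):
    """Convert a CZCE-style Code+YMM symbol to Code+YYMM."""
    match = re.fullmatch(r"([A-Za-z]+)(\d{3})", symbol)
    if match is None:
        # not in Code+YMM form (e.g. already Code+YYMM): leave unchanged
        return symbol
    code, ymm = match.groups()
    # decade digit of the reference year, e.g. 2021 -> 2
    decade = (reference_year % 100) // 10
    return "{}{}{}".format(code, str(decade), ymm)


print(czce_to_sina("MA109", 2021))
print(czce_to_sina("RB2110", 2021))
```

Using letters only for the code group, instead of `\w+`, keeps the split from depending on regex backtracking, and `fullmatch` leaves four-digit symbols untouched.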

        

Reference
1. Integer To String
https://pythonexamples.org/python-convert-int-to-string/
2. Timestamp
https://www.delftstack.com/howto/python/python-convert-epoch-to-datetime/

Extract 60m Bar From Sina Finance

The PythonGo framework has a library that initializes historical data for 1m and other time intervals. The strategy I’m running depends on 1H bars, which can be built from the 1m bars of the past 30 days. The issue with the default library (or the default market data source) is that the 1H bars are missing for certain dates.

Sina Finance provides historical intraday prices for futures. This entry describes two methods to download 60m bars (or bars of other intervals) from Sina Finance.

Approach 1: Mini KLine

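import requests
future_code='rb2110'
#note: this endpoint name says DailyKLine; Reference 1 lists IndexService.getInnerFuturesMiniKLine60m for 60m bars
url_template="http://stock2.finance.sina.com.cn/futures/api/json.php/IndexService.getInnerFuturesDailyKLine?symbol={}"
url=url_template.format(future_code)
r=requests.get(url)
items=r.json()
#print the fields of each item, comma-separated
for item in items:
    for v in item:
        print(v+',', end='')
    print('\n')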

Approach 2: Data API via FewMinLine
Parameters
symbol=RB2110
type=60 (bar interval in minutes: 30 for 30m, 60 for 1h, etc.)

import requests
import re
import json
symbol="RB2110"
type="60"
url_template="https://stock2.finance.sina.com.cn/futures/api/jsonp.php/var%20_{symbol}_{type}_1618643138380=/InnerFuturesNewService.getFewMinLine?symbol={symbol}&type={type}"
url=url_template.format(symbol=symbol,type=type)
res=requests.get(url)
tokens=re.split("[(]|[)]",res.text)
bars=json.loads(tokens[1])
for bar in bars:
    print("date={},close={}".format(bar['d'],bar['c']))
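
The parsing step can be tried offline. The sketch below uses a made-up response string that only mimics the `var name=(...)` wrapper and the key names used above (`d`, `o`, `h`, `l`, `c`, `v`); `parse_jsonp` is a hypothetical helper. Instead of splitting on every parenthesis, it slices between the first `(` and the last `)`, which still works if a value inside the payload ever contains a parenthesis.

```python
import json


def parse_jsonp(text):
    """Return the JSON payload wrapped in the outermost parentheses."""
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])


# made-up sample, not a real Sina response
sample = (
    'var _RB2110_60_1618643138380=(['
    '{"d":"2021-05-25 10:00:00","o":"5100","h":"5120","l":"5090","c":"5110","v":"1200"},'
    '{"d":"2021-05-25 11:15:00","o":"5110","h":"5130","l":"5105","c":"5125","v":"900"}'
    ']);'
)

bars = parse_jsonp(sample)
closes = [float(bar["c"]) for bar in bars]
for bar in bars:
    print("date={},close={}".format(bar["d"], bar["c"]))
```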

Reference
1. data api via mini kline: https://blog.csdn.net/tcy23456/article/details/80946838
http://stock2.finance.sina.com.cn/futures/api/json.php/IndexService.getInnerFuturesMiniKLine60m?symbol={}
2. data api via FewMinLine: https://stock2.finance.sina.com.cn/futures/api/jsonp.php/var%20_RB2110_30_1618643138380=/InnerFuturesNewService.getFewMinLine?symbol={}&type={}
3. RegExp in Python: https://stackabuse.com/introduction-to-regular-expressions-in-python/
4. String Split in Python: https://docs.python.org/3/library/stdtypes.html#str.split