One New Year's resolution I made last year was to understand where my money goes. From previous experiments I know that expense tracking has to be as simple as possible. My approach is to:
- Use my cash card as often as possible. This automatically tracks the date and some information on the vendor.
- Use Twitter to track my cash expenses. This supplements the bank account statement data.
- Edit, enrich, merge and visualise the two data sources with R. Because it is fun playing with R!
Now, after more than one year of expense tracking, I can analyse the results. The first result, however, was disappointing: my cash tracking with Twitter was not as complete as I thought it was. Below is a figure that displays the sum tracked with Twitter divided by the sum withdrawn from my bank account, for each month of 2011.
If I had tracked my cash expenses completely, the ratio would be around 100 percent (the gray dashed line). However, it is systematically below that. For September there is an explanation: I was on holidays and intentionally did not track my expenses. But even taking that into account, 18 percent of my cash spending remains unexplained!
More analysis results will follow. If you are interested in technical aspects of the expense tracking, such as importing the tweets and bank statements, read on. However, there is no R code today, since there is no example data.
There are some things to consider when handling the cash tweets and bank statements.
- Both consist of several files. Apart from the raw tweets there is a "categorizer" which expands each hashtag into a spending category and a place. The same goes for the bank statements.
- The data is not public and it gets updated. This means it is not an option to put the tweets and statements as a dataset in pft's data folder.
- The data will grow. Sooner or later I will not want to import all the data, but only the last six months or so. So, instead of a parameterless data() call, a parametrization with start and end date seems useful.
- Although the data will be merged, it comes in different formats. The tweets are updated by querying the Twitter website; the bank statements are imported from a set of files in the SWIFT MT940 format. The formats might change in the future, e.g. email instead of Twitter, or CSV instead of MT940. So it seems sensible to hide the import/update process from the further analysis.
These considerations led me (the hard way, by trial-and-error) to the idea of a generic data class. Currently, a data class has
- a constructor for passing all details needed to make the data source work (e.g. the Twitter username, the local data folder)
- a query method for actually pulling information from the data source. In the example of cash data, a resource parameter must specify the type of information (raw tweets / categorizer / enriched data / unknown hashtags). Optionally, from and to parameters can be provided. A minimal sketch of such a class follows after this list.
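To make this more concrete, here is a minimal sketch of such a data class in S4 style. It is not the actual pft implementation; the class name CashSource, its slots and the resource labels are illustrative assumptions only.

## Illustrative sketch of a generic data class, not the actual pft code.
setClass("CashSource", slots = c(
  twitter_user = "character",   # account whose tweets hold the cash expenses
  data_folder  = "character"    # local folder with cached tweets/categorizer
))

## constructor: all details needed to make the data source work
CashSource <- function(twitter_user, data_folder) {
  new("CashSource", twitter_user = twitter_user, data_folder = data_folder)
}

## generic query method: 'resource' selects the kind of information,
## 'from'/'to' optionally restrict the date range
setGeneric("query", function(x, resource, from = NULL, to = NULL, ...)
  standardGeneric("query"))

setMethod("query", "CashSource",
  function(x, resource, from = NULL, to = NULL, ...) {
    resource <- match.arg(resource,
      c("raw tweets", "categorizer", "enriched data", "unknown hashtags"))
    ## a real method would read the cached files in x@data_folder,
    ## filter by 'from'/'to' and return the result; placeholder:
    data.frame()
  })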
In general, other generic methods are conceivable, for instance a resources method for querying all provided resource names, or a submit method for altering the data. There are similar constructions in other programming languages or frameworks, such as the CountryData and CityData functions in Mathematica, or the locker project.
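Continuing the sketch above (again, only an illustration, not pft's real interface), usage could look like this:

## create the data source object once ...
cash <- CashSource(twitter_user = "myaccount", data_folder = "~/pft-data")

## ... then pull the enriched cash data for the first half of 2011
expenses <- query(cash, resource = "enriched data",
                  from = as.Date("2011-01-01"), to = as.Date("2011-06-30"))

## a 'resources' method would just list the available resource names,
## e.g. resources(cash); a 'submit' method would write changes back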
For the pft package, I implemented three data classes: twttr as a general Twitter API, cash and bank. For importing MT940 files, I ported a parser from Python to R and named it read.mt940.
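For readers who have never seen MT940: the files are plain text, and the booked transactions sit in lines tagged :61:. The following is only a rough illustration of how such lines can be picked apart; it is not the read.mt940 function from pft.

## Rough illustration of parsing MT940 statement lines, not pft's read.mt940.
## A :61: line starts with the value date (YYMMDD), an optional entry date
## (MMDD), a debit/credit mark and the amount (decimal comma).
read_mt940_sketch <- function(file) {
  lines <- readLines(file, warn = FALSE)
  stmt  <- grep("^:61:", lines, value = TRUE)
  m <- regmatches(stmt,
    regexec("^:61:(\\d{6})(\\d{4})?(R?[DC])([A-Z]?)(\\d+,\\d*)", stmt))
  data.frame(
    date   = as.Date(sapply(m, `[`, 2), format = "%y%m%d"),
    mark   = sapply(m, `[`, 4),      # "D" = debit, "C" = credit
    amount = as.numeric(sub(",", ".", sapply(m, `[`, 6), fixed = TRUE))
  )
}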
You can find these functions in the pft package. It is work in progress; the current version is experimental and I did not even try to submit it to CRAN. You can inspect it by downloading it from my personal repository:
> install.packages(
+ "pft",
+ repos=c(getOption("repos"),
+ "http://userpage.fu-berlin.de/~kweinert/R"),
+ dependencies=c("Depends", "Suggests")
+ )
If you are not a Windows user, you may have to add the type="source" argument.
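That is, the same call as above with one extra argument:

> install.packages(
+ "pft",
+ repos=c(getOption("repos"),
+ "http://userpage.fu-berlin.de/~kweinert/R"),
+ dependencies=c("Depends", "Suggests"),
+ type="source"
+ )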
3 comments:
Hello! In your article, did you use information from any research, or are these only your own ideas? Looking forward to hearing from you.
Hello funky, why are you asking?
This is great! I've been looking to do something like this, so it's cool to see there's others before me who have made it happen... Even if it was almost a decade ago 😅
Couple questions:
Are you still using this system? If not, how long did you run with it and what did you end up switching to? Did you use Twitter the whole time or switch to something else for tracking cash?
I've been throwing around the idea of using a Termux script/form for capturing expenses, and having them auto-log in my system. I like that because, in theory, I could have it committed to a git repo that I then pull from when processing later on my computer. Then I can import it into R and we're away to the races. The effort on my side is 1) tap script shortcut, 2) type expense and detail, which seems to be easy enough that I'll actually do it.
Do you have any more wisdom that you've gleaned since you did this in 2011? Be great to hear an update.