Quick thank you to Don Syme of Microsoft Research for including a demonstration of F# connecting to TimeScape running on the Windows Azure cloud at the F# in Finance event this week in London. F# is a functional language that is developing a large following in finance due to its applicability to mathematical problems, its ease of development and its performance. You can find some testimonials on the language here.
Don has implemented a proof-of-concept F# type provider for TimeScape. If that doesn't mean much to you, then the practical example below should help, showing how the financial instrument data in TimeScape is exposed into the F# programming environment. I guess the key point is just how easy it looks to code with data, since effectively you get guided through what is (and is not!) available as you are coding (sorry if I sound impressed, I spent a reasonable amount of time writing mathematical C code using vi in the mid-90s, so any young uber-geeks reading this, please make allowances as I am getting old(er)...). Example steps are shown below:
Referencing the Xenomorph TimeScape type provider and creating a data context:
Connecting to a TimeScape database:
Looking at categories (classes) of financial instrument available:
Choosing an item (instrument) in a category by name:
Looking at the properties associated with an item:
The IntelliSense-like behaviour above is similar to what TimeScape's Query Explorer offers, and it is great to see this implemented in an external run-time programming language such as F#. Don also made the point that each instrument displays only the data it individually has available, making it easy to understand what data you have to work with. This functionality is based on F#'s ability to make each item uniquely nameable, and optionally to assign each item (instrument) a unique type, where all the category properties (defined at the category schema level) that are not available for the item are hidden.
The next F# in Finance event will take place in New York on Wednesday 11th December 2013, so hope to see you there. We are currently working on a beta program for this functionality, to be available early in the New Year, so please get in touch via email@example.com if this is of interest.
There are (occasionally!) some good questions and conversations going on within some of the LinkedIn groups. One recent discussion was about the use cases for unstructured data within banking and finance, and I found this comment from Tom Deutsch of IBM on the main types of unstructured data analysis to be quite insightful and elegant (at least better than I could have written it...):
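The per-item typing Don described can be imitated, very loosely, outside F#. Here is a minimal Python sketch, assuming a made-up category schema (`SCHEMAS`) and a hypothetical `make_item` helper: it builds a one-off dataclass for each item so that only the properties the item actually has data for become attributes. This is not how the F# type provider or TimeScape works internally; an F# type provider supplies its types to the compiler and editor, whereas this sketch only creates them at runtime.

```python
from dataclasses import make_dataclass, fields

# Hypothetical category schema: every property an item in the category *could* have.
SCHEMAS = {
    "Bond": ["Coupon", "Maturity", "Issuer", "Rating", "CallSchedule"],
}

def make_item(category, item_name, available):
    """Create a one-off type for this item exposing only the schema
    properties it actually has data for, then instantiate it."""
    visible = [p for p in SCHEMAS[category] if p in available]  # hide the rest
    item_type = make_dataclass(f"{category}_{item_name}", visible)
    return item_type(**{p: available[p] for p in visible})

# Hypothetical item: no rating or call schedule loaded for this bond.
bond = make_item("Bond", "XYZ_5Y",
                 {"Coupon": 4.5, "Maturity": "2018-06-30", "Issuer": "XYZ Corp"})
print([f.name for f in fields(bond)])  # ['Coupon', 'Maturity', 'Issuer']
print(hasattr(bond, "Rating"))         # False
```

The design choice mirrors the point Don made: rather than one wide "Bond" type with lots of empty fields, each item's type tells you up front what data it really has.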
Listening for the first time is really just making use of what you already probably capture to hear what is being said (or navigated)
Listening better is making sure you are actually both hearing and understanding what is being said. This is sometimes non-trivial as it involves accuracy issues and true (not marketing hype) NLP technologies and integrating multiple sources of information
Adding context is when you either add structured data to the above or add the above to structured data, usually to round out or more fully inform models (or sometimes just build new ones).
I went over to NYU Poly in Brooklyn on Friday of last week for their Big Data Finance Conference. To get a slightly negative point out of the way early, I guess I would have to pose the question "When is a big data conference not a big data conference?". Answer: "When it is a time series analysis conference" (sorry if you were expecting a funny answer... but as you can see, what I occupy my time with professionally doesn't naturally lend itself to too much comedy). As I like time series analysis, this was OK, but it certainly wasn't fully "as advertised" in my view, though I guess other people are experiencing this problem too.
Maybe this slightly skewed agenda was due to the relative newness of the topic, the newness of the event and the temptation for time series database vendors to jump on the "Big Data" marketing bandwagon (what? I hear you say, we vendors jumping on a buzzword marketing bandwagon, never!...). Many of the talks were about statistical time series analysis of market behaviour and less about what I was hoping for, which was new ways in which empirical or data-based approaches to financial problems might be addressed through big data technologies (as an aside, here is a post on a previous PRMIA event on big data in risk management as some additional background). There were some good attempts at getting a cross-discipline fertilization of ideas going at the conference, but given the topic, representatives from the mobile and social media industries were very obviously missing in my view.
So, as a complete counterexample to the two paragraphs above, the first speaker at the event (Kevin Atteson of Morgan Stanley) was very much on theme with the application of big data technologies to the mortgage market. Apparently Morgan Stanley had started their "big data" analysis of the mortgage market in 2008 as part of a project to assess and understand more about the potential losses that Fannie Mae and Freddie Mac faced due to the financial crisis.
Echoing some earlier background I had heard on mortgages, one of the biggest problems in trying to understand the market, according to Kevin, was data, or rather the lack of it. He compared mortgage data analysis to "peeling an onion", and said that going back to the time of the crisis, mortgage data at an individual loan level was either not available or of such poor quality as to be virtually useless (e.g. it was hard to get accurate ZIP code data for each loan). Kevin described the mortgage data set as "wide" (lots of loans with lots of fields for each loan) rather than "deep" (lots of history), with one of the main data problems being the matching of nearest-neighbour loans. He mentioned that only post-crisis have Fannie and Freddie been ordered to make individual loan data available, and that there is still no readily available linkage data between individual loans and mortgage pools (some presentations from a recent PRMIA event on mortgage analytics are at the bottom of the page here for interested readers).
Kevin said that Morgan Stanley had rejected the use of Hadoop, primarily due to its write throughput capabilities, which Kevin indicated were a limiting factor in many big data technologies. He indicated that for his problem type he still believed their infrastructure to be superior to even the latest incarnations of Hadoop. He also mentioned the technique of having 2x or more redundancy on the data/jobs being processed, aimed not just at failover but also at using whichever instance of a job finished first. Interestingly, he also added that Morgan Stanley's infrastructure engineers have a policy of rebooting servers in the grid even during the day while in use, so fault tolerance was needed for both unexpected and entirely deliberate hardware node unavailability.
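Kevin did not go into how the nearest-neighbour matching was done, so purely as an illustration, here is a minimal Python sketch of matching a loan to its closest counterpart on a few numeric fields. The field names, the scaling factors in `SCALES` and the sample loans are all hypothetical; the point it shows is that fields need rescaling before distances mean anything, otherwise the loan balance in dollars swamps the interest rate and credit score.

```python
import math

# Hypothetical scaling: roughly "how big a difference counts as one unit" per field.
SCALES = {"balance": 50_000.0, "rate": 1.0, "fico": 50.0}

def distance(a, b, scales=SCALES):
    """Scaled Euclidean distance between two loans over the fields in `scales`."""
    return math.sqrt(sum(((a[f] - b[f]) / s) ** 2 for f, s in scales.items()))

def nearest_loan(target, candidates, scales=SCALES):
    """Return the candidate loan closest to `target`."""
    return min(candidates, key=lambda c: distance(target, c, scales))

# Made-up loans for illustration only.
target = {"id": "T", "balance": 200_000, "rate": 5.0, "fico": 700}
candidates = [
    {"id": "A", "balance": 205_000, "rate": 6.5, "fico": 640},
    {"id": "B", "balance": 190_000, "rate": 5.1, "fico": 705},
]
print(nearest_loan(target, candidates)["id"])  # B
```

With all scales set to 1.0, loan A would win purely because its balance is closer in dollars, which is exactly the kind of misleading match that poor field preparation produces.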
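The "run it twice, keep whichever finishes first" idea is easy to sketch. Here is a minimal Python version using the standard `concurrent.futures` module on a single machine, assuming threads stand in for grid nodes; `run_redundant` and the simulated `flaky_job` are hypothetical names, and a real grid scheduler would of course distribute the copies across servers rather than threads.

```python
import concurrent.futures as cf
import threading
import time

def run_redundant(job, replicas=2):
    """Run `replicas` identical copies of `job`; return the first successful result."""
    pool = cf.ThreadPoolExecutor(max_workers=replicas)
    pending = {pool.submit(job) for _ in range(replicas)}
    try:
        while pending:
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                if fut.exception() is None:  # a failed replica is simply skipped
                    return fut.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Don't wait for slower replicas. Python threads can't be killed, so
        # stragglers finish in the background and their results are ignored.
        pool.shutdown(wait=False)

# Simulated job: the first copy to start "loses its node" (e.g. a planned reboot).
_lock = threading.Lock()
_calls = []

def flaky_job():
    with _lock:
        _calls.append(1)
        attempt = len(_calls)
    if attempt == 1:
        raise ConnectionError("node rebooted mid-job")
    time.sleep(0.01)
    return "priced"

result = run_redundant(flaky_job)
print(result)  # priced
```

The same mechanism covers both of Kevin's cases: a copy that dies (failover) and a copy that is merely slow (latency), since the caller only ever needs one good answer.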
Other highlights from the day:
One of the most interesting talks was by Johan Walden of the Haas Business School, on the subject of "Investor Networks in the Stock Market". Johan explained how they had used big data to construct a network model of all of the participants in the Turkish stock exchange (both institutional and retail), and in particular how "interconnected" each participant was with other members. His findings seemed to support the hypothesis that the more "interconnected" the investor (at the centre of many information flows rather than at the edges), the more likely that investor was to demonstrate superior returns relative to the average. I guess this is a kind of classic transfer of some of the research done in social networking, but it was very interesting to see it applied pragmatically to financial markets, and I would guess it is an area where a much greater understanding of investor behaviour could be gleaned. Maybe Johan could do with a little geographic location data to add to his analysis of how information flows.
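As a rough illustration of what "interconnected" can mean in network terms, here is a minimal Python sketch of degree centrality, one of the simplest network measures: the share of all other participants an investor is directly linked to. It is not necessarily the measure used in the research, and the investors and links below are made up.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other participants each participant is directly linked to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(links) / (n - 1) for node, links in neighbours.items()}

# Hypothetical links, e.g. "traded with / shared information with".
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "D")]
centrality = degree_centrality(links)
print(max(centrality, key=centrality.get))  # A (at the centre)
print(centrality["E"])                      # 0.25 (at the edge)
```

Richer measures (betweenness, eigenvector centrality) weigh indirect paths too, which is closer to the idea of sitting "at the centre of many information flows".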
So overall a good day with some interesting talks - the statistical presentations were challenging to listen to at 4pm on a Friday afternoon but the wine afterwards compensated. I would also recommend taking a read through a paper by Charles S. Tapiero on "The Future of Financial Engineering" for one of the best discussions I have so far read about how big data has the potential to change and improve upon some of the assumptions and models that underpin modern financial theory. Coming back to my starting point in this post on the content of the talks, I liked the description that Charles gives of traditional "statistical" versus "data analytics" approaches, and some of the points he makes about data immediately inferring relationships without the traditional "hypothesize, measure, test and confirm-or-not" were interesting, both in favour of data analytics and in cautioning against unquestioning belief in the findings from data (feels like this post from October 2008 is a timely reminder here). With all of the hype and the hope around the benefits of big data, maybe we would all be wise to remember this quote by a certain well-known physicist: "No amount of experimentation can ever prove me right; a single experiment can prove me wrong."
A little late, but here are some notes from the PRMIA event on Big Data in Risk Management that I helped to organize last month at the Harmonie Club in New York. A big thank you to my PRMIA colleagues for taking the notes and for helping me pull this write-up together, plus thanks to Microsoft and all who helped out on the night.
Introduction: Navin Sharma (of Western Asset Management and Co-Regional Director of PRMIA NYC) introduced the event and began by thanking Microsoft for its support in sponsoring the evening. Navin outlined how he thought the advent of “Big Data” technologies was very exciting for risk management, opening up opportunities to address risk and regulatory problems that previously might have been considered out of reach.
Navin defined Big Data as structured or unstructured data received at high volumes and requiring very large data storage. Its characteristics include a high velocity of record creation, extreme volumes, a wide variety of data formats, variable latencies, and complexity of data types. Additionally, he noted that, relative to other industries, financial services has in the past created perhaps the largest historical sets of data and continually creates enormous amounts of data on a daily or moment-by-moment basis. Examples include options data, high frequency trading, and unstructured data such as social media. Its usage provides potential competitive advantages in trading and investment management. Also, by using Big Data it is possible to have faster and more accurate recognition of potential risks via seemingly disparate data, leading to timelier and more complete risk management of investments and firms’ assets. Finally, the use of Big Data technologies is in part being driven by regulatory pressures from Dodd-Frank, Basel III, Solvency II, the Markets in Financial Instruments Directives (MiFID I & II) and the Markets in Financial Instruments Regulation (MiFIR).
Navin also noted that we will seek to answer questions such as:
Presentation 1: Big Data: What Is It and Where Did It Come From?: The first presentation was given by Michael Di Stefano (of Blinksis Technologies), and was titled “Big Data. What is it and where did it come from?”. You can find a copy of Michael’s presentation here. In summary, Michael started by saying that there are many definitions of Big Data, mainly defining it as technology that deals with data problems that are either too large, too fast or too complex for conventional database technology. Michael briefly touched upon the many different technologies within Big Data, such as Hadoop, MapReduce and databases such as Cassandra and MongoDB. He described some of the origins of Big Data technology in internet search, social networks and other fields. Michael described the “4 V’s” of Big Data: Volume, Velocity, Variety and Value, with a key point from Michael being “time to value” in terms of what you are using Big Data for. Michael concluded his talk with some business examples around the use of sentiment analysis in financial markets and the application of Big Data to real-time trading surveillance.
Presentation 2: Big Data Strategies for Risk Management: The second presentation “Big Data Strategies for Risk Management” was introduced by Colleen Healy of Microsoft (presentation here). Colleen started by saying expectations of risk management are rising, and that prior to 2008 not many institutions had a good handle on the risks they were taking. Risk analysis needs to be done across multiple asset types, more frequently and at ever greater granularity. Pressure is coming from everywhere including company boards, regulators, shareholders, customers, counterparties and society in general. Colleen used to head investor relations at Microsoft and put forward a number of points:
Colleen explained some of the reasons why good risk management remains a work in progress, and that data is a key foundation for better risk management. However, data has been hard to access, analyze, visualize and understand, and she used this to link to the next part of the presentation by Denny Yu of Numerix.
Denny explained that new regulations involving measures such as Potential Future Exposure (PFE) and Credit Value Adjustment (CVA) were moving the number of calculations needed in risk management to a level well above that required by methodologies such as Value at Risk (VaR). Denny illustrated how a typical VaR calculation on a reasonably sized portfolio might need 2,500,000 instrument valuations and how PFE might require as many as 2,000,000,000. He then explained more of the architecture he would see as optimal for such a process and illustrated some of the analysis he had done using Excel spreadsheets linked to Microsoft’s high performance computing technology.
Presentation 3: Big Data in Practice: Unintentional Portfolio Risk: Kevin Chen of Opera Solutions gave the third presentation, titled “Unintentional Risk via Large-Scale Risk Clustering”. You can find a copy of the presentation here. In summary, the presentation was quite visual and illustrated how large-scale empirical analysis of portfolio data could produce some interesting insights into portfolio risk and how risks become “clustered”. In many ways the analysis was reminiscent of an empirical form of principal component analysis, i.e. one where you can see and understand more about your portfolio’s risk without actually being able to relate the main factors directly to any traditional factor analysis.
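Denny's figures didn't come with a breakdown, but to give a feel for where an 800x jump can come from, here is a back-of-the-envelope Python sketch. The portfolio size, scenario count, path count and number of time steps are all assumptions, chosen only so the products land on the two quoted totals: VaR revalues each instrument once per scenario, while PFE revalues it on every simulated path at every future time step.

```python
# Assumed sizes, chosen only so the products match the two totals quoted in the talk.
instruments = 10_000
var_scenarios = 250      # e.g. roughly one year of daily historical scenarios
pfe_paths = 2_000        # Monte Carlo paths
pfe_time_steps = 100     # future exposure dates revalued on each path

var_valuations = instruments * var_scenarios
pfe_valuations = instruments * pfe_paths * pfe_time_steps

print(f"{var_valuations:,}")               # 2,500,000
print(f"{pfe_valuations:,}")               # 2,000,000,000
print(pfe_valuations // var_valuations)    # 800
```

The extra time dimension is what does the damage: even with a similar number of scenarios, multiplying by tens or hundreds of future dates moves the problem from one server to a grid.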
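For readers who haven't met principal component analysis, here is a minimal pure-Python sketch of extracting the first component from a portfolio's return history by power iteration. It is a generic textbook technique, not Opera's clustering method, and the three-position return history is made up so that two positions move together while the third is mostly noise.

```python
def covariance(returns):
    """Sample covariance matrix; `returns` is rows (days) x columns (positions)."""
    n, k = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in returns) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_component(cov, iterations=200):
    """Dominant eigenvector of `cov` by power iteration, plus its share of total variance."""
    k = len(cov)
    v = [1.0] * k  # starting guess; assumed not orthogonal to the dominant eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k)) for i in range(k))
    return v, eigenvalue / sum(cov[i][i] for i in range(k))

# Made-up daily returns for positions A, B, C: A and B co-move, C is mostly noise.
returns = [
    [0.010, 0.011, 0.000],
    [-0.020, -0.019, 0.001],
    [0.015, 0.014, -0.001],
    [-0.005, -0.006, 0.000],
    [0.000, 0.001, 0.001],
]
loadings, share = first_component(covariance(returns))
print([round(x, 3) for x in loadings], round(share, 3))
```

The loadings show which positions share a common driver, without saying what that driver is, which is the "see the risk without a traditional factor label" point made above.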
Panel Discussion: Brian Sentance of Xenomorph and the PRMIA NYC Steering Committee then moderated a panel discussion. The first question was directed at Michael: “Is the relational database dead?” Michael replied that in his view relational databases were not dead, and that for problems well-suited to relational representation they were, and would continue to be, very good. Michael said that NoSQL/Big Data technologies were complementary to relational databases, dealing with new types of data and new sizes of problem that relational databases are not well designed for. Brian asked Michael whether the advent of these new database technologies would drive the relational database vendors to extend the capabilities and performance of their offerings. Michael replied that he thought this was highly likely, but only time would tell whether this approach would be successful given the innovation in the market at the moment. Colleen Healy added that the advent of Big Data did not mean throwing out established technology, but rather integrating established technology with the new, such as Microsoft SQL Server working with the Hadoop framework.
Brian asked the panel whether they thought visualization would make a big impact within Big Data. Ken Akoundi said that the front-end applications used to make the data/analysis more useful would evolve very quickly. Brian asked whether this would be reminiscent of the days when VaR first appeared, when a single number arguably became a false proxy for risk measurement and management. Ken replied that the size of the data problem had increased massively since VaR was first used in 1994, and that visualization and other automated techniques were very much needed if the headache of capturing, cleansing and understanding data was to be addressed.
Brian asked whether Big Data would address the data integration issue of siloed trading systems. Colleen replied that Big Data needs to work across all the silos found in many financial organizations, or it isn’t “Big Data”. There was general consensus from the panel that legacy systems and people politics were also behind some of the issues found in addressing the data silo issue.
Brian asked if the panel thought the skills needed in risk management would change due to Big Data. Colleen replied that effective Big Data solutions require all kinds of people, with skills across a broad range of specific disciplines such as visualization. Generally the panel thought that data and data analysis would play an increasingly important part in risk management. Ken put forward his view that all Big Data problems should start with a business problem, not just a technology focus. For example, are there any better ways to predict stock market movements based on the consumption of larger and more diverse sources of information? In terms of risk management skills, Denny said that risk management of 15 years ago was based on relatively simple econometrics. Fast forward to today, and risk calculations such as CVA are statistically and computationally very heavy, and trading is increasingly automated across all asset classes. As a result, Denny suggested that even the PRMIA PRM syllabus should change to focus more on data and data technology given the importance of data to risk management.
Asked how Big Data should best be applied, Denny echoed Ken in saying that understanding the business problem first was vital, but that obviously Big Data opened up the capability to aggregate and work with larger datasets than ever before. Brian then asked what advice the panel would give to risk managers faced with an IT department about to embark upon using Big Data technologies. Assuming that the business problem is well understood, Michael said that the business needed some familiarity with the broad concepts of Big Data, what it can and cannot do and how it fits with more mainstream technologies. Colleen said that there are some problems that only Big Data can solve, so understanding the technical need is a first checkpoint. Obviously IT people like working with new technologies and this needs to be monitored, but so long as the business problem is defined and valid for Big Data, people should be encouraged to learn new technologies and new skills. Kevin also took a very positive view that IT departments should be encouraged to experiment with these new technologies and understand what is possible, but that projects should have well-defined assessment/cut-off points, as with any good project management, to decide whether the project is progressing well. Ken put forward that many IT staff were new to the scale of the problems being addressed with Big Data, and that his own company, Opera Solutions, had an advantage in its deep expertise in large-scale data integration, allowing it to deliver more quickly on project timelines.
Audience Questions: There then followed a number of audience questions. The first few related to other ideas/kinds of problems that could be analyzed using the kind of modeling that Opera had demonstrated. Ken said that there were obvious extensions that Opera had not got around to doing just yet. One audience member asked how well all the Big Data analysis could be aggregated/presented to make it understandable and usable to humans. Denny suggested that it was vital that such analysis was made accessible to the user, and there was general consensus across the panel that man vs. machine was an interesting issue to develop in considering what is possible with Big Data. The next audience question was around whether all of this data analysis was affordable from a practical point of view. Brian pointed out that there was a lot of waste in current practices in the industry, with wasteful duplication of ticker plants and other data types across many financial institutions, large and small. This duplication is driven primarily by the perceived need to implement each institution’s proprietary analysis techniques, and this kind of customization was not yet available from the major data vendors, but will become more possible as cloud technology such as Microsoft’s Azure develops further. There was a lot of audience interest in whether Big Data could lead to better understanding of causal relationships in markets rather than simply correlations. The panel responded that causal relationships were harder to understand, particularly in a dynamic market with dynamic relationships, but that insight into correlation was at the very least useful and could lead to better understanding of the drivers as more datasets are analyzed.
In relation to the Microsoft/PRMIA event that Brian moderated last night in New York, I recently spotted this article that tries to map out all the different databases that are now commercially available in some form, from SQL to NoSQL and all the various incarnations and flavours in between:
As Brian suggested in his recent post, it's amazing to see how much the landscape has evolved from the domination (mantra?) that there was the relational way, or no way. Obviously times have moved on (er, I guess the Internet happened for one thing...) and people are now far more accepting of the need for different approaches to different types and sizes of business problems. That said, I agree with the article and the comments suggesting that there seem to be far too many options available now; there has to be some consolidation coming, otherwise it will become increasingly difficult to know where to start. Choice is a wonderful thing, but only in moderation!
Good breakfast event from SAP and A-Team last Thursday morning. SAP have been getting (and I guess paying for) a lot of good air-time for their SAP Hana in-memory database technology of late. Domenic Iannaccone of SAP started the briefing with an introduction to big data in finance and how their SAP/Sybase offerings knitted together. He started his presentation with a few quotes, one being "Intellectual property is the oil of the 21st century" by Mark Getty (he of Getty images, but also of the Getty oil family) and "Data is the new oil" by both Clive Humby and Gerd Leonhard (not sure why two people quoted saying the same thing but anyway).
For those of you with some familiarity with the Sybase IQ architecture of a year or two back, SAP Hana seems to have replaced the in-memory ASE database that worked in tandem with Sybase IQ for historical storage (I am yet to confirm this, but hope to find out more in the new year). When challenged on how Hana differs from other in-memory database products, Domenic seemed keen to emphasise its analytical capabilities and not just the database aspects. I guess the big data angle of bringing the "data closer to the calculations" was his main differentiator on this, but with more time I think a little more explanation would have been good.
Pete Harris of the A-Team walked us through some of the key findings of what I think is the best survey I have read so far on the usage of big data in financial markets (free sign-up needed I think, but you can get a copy of the report here). Some key findings from a survey of staff at ten major financial institutions included:
There were a few audience questions. Pete clarified that there was a more varied application of big data amongst sell-side firms, and that on the buy-side it was being applied more to KYC and related areas. One audience member made the point that he thought a real challenge beyond the insight gained from big data analysis was how to translate it into value from an operational point of view. There seemed to be a fair amount of recognition that regulators and auditors want a full audit trail of what has gone on across the whole firm, so audit was seen as a key area for big data. Another audience member suggested that the lack of a rigid data model in some big data technologies enabled greater flexibility in the scope of questions/analysis that could be undertaken.
Coming back to the key findings of the survey, one question I asked Pete was whether or not big data is a silver bullet for data integration. My motivation was that the survey, and much of the press you read, talks about how big data can pull all the systems, data and calculations together for better risk management. While I can understand how massively scalable data and calculation capabilities are extremely useful, I wondered how exactly all the data was to be pulled together from the range of siloed systems and databases where it currently resides. Pete suggested that this was still a problematic area where Enterprise Application Integration (EAI) tools were needed. Another audience member added that politics within different departments was not making data integration any easier, regardless of the technologies used.
Overall a good event, with audience interaction unsurprisingly being the most interesting and useful part.
NoSQL is an unfortunate name in my view for the loose family of non-relational database technologies associated with "Big Data". NotRelational might be a better description (catchy eh? thought not...), but either way I don't like the negatives in both of these titles, partly for aesthetic reasons and partly because they could be taken to imply that these technologies are critical of SQL and the relational technology that we have all been using for years. For those of you who are relatively new to NoSQL (which is most of us), this link contains a great introduction. Also, if you can put up with a slightly annoying reporter, the Cloudera CEO is worth a listen on YouTube.
In my view NoSQL databases are complementary to relational technology, and as many have said, relational tech and tabular data are not going away any time soon. Ironically, some of the NoSQL technologies need more standardised query languages to gain wider acceptance, and there will be no prizes for guessing which existing query language will be used for ideas in putting these new languages together (at this point, as an example, I will now say SPARQL, not that this should be taken to mean that I know a lot about it, but that has never stopped me before...)
Going back into the distant history of Xenomorph and our XDB database technology, when we started in 1995 the fact that we used a proprietary database technology was sometimes a mixed blessing for sales. The XDB database technology we had at the time was based around answering a specific question, which was "give me all of the history for this attribute of this instrument as quickly as possible".
The risk managers and traders loved the performance aspects of our object/time series database. I remember one client with a historical VaR calc that we got running in around 30 minutes on a laptop PC, which was taking 12 hours in an RDBMS on a (then quite meaty) Sun SPARC box. It was a great example of how database technology designed for specific problems could offer performance that was not possible from more generic relational technology. The use of our database for these problems was never intended as a replacement for relational databases dealing with relational-type "set-based" problems though; it was complementary technology designed for very specific problem sets.
The technologists were much more reserved. Some were more accepting and knew of products such as FAME back then, but some were sceptical about the use of non-standard DBMS tech. Looking back, I think this attitude was partly due to a desire to build their own vector/time series store, but also because they were understandably (but incorrectly) concerned that our proprietary database would require specialist database admin skills. Not that the mainstream RDBMS systems were inexpensive or non-specialist to maintain then (Oracle DBA anyone?), but many proprietary database systems with proprietary languages can require expensive and ongoing specialist consultant support even today.
Feedback from our clients and sales prospects, that they liked our database performance but that the proprietary database admin aspects were sometimes a sales objection, caused us to take a look at hosting some of our vector database structures in Microsoft SQL Server. A long time back we had already implemented a layer within our analytics and data management system where we could replace our XDB database with other databases, most notably FAME. You can see a simple overview of the architecture in the diagram below, where other non-XDB databases (and datafeeds) can be "plugged in" to our TimeScape system without affecting the APIs or indeed the object data model being used by the client:
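To show why a store built around that one question can be so quick, here is a minimal Python sketch of an object/time-series layout: one date-ordered vector per (instrument, attribute) key. `VectorStore` is a hypothetical toy, not XDB's actual design, and the instrument and values are made up.

```python
from bisect import insort
from collections import defaultdict

class VectorStore:
    """Toy object/time-series store: one date-ordered vector per (instrument, attribute)."""

    def __init__(self):
        self._series = defaultdict(list)  # (instrument, attribute) -> [(date, value), ...]

    def put(self, instrument, attribute, date, value):
        # Keep each vector sorted on insert so reads need no sorting.
        insort(self._series[(instrument, attribute)], (date, value))

    def history(self, instrument, attribute):
        # One key lookup returns the whole history: no joins, no scanning other series.
        return list(self._series.get((instrument, attribute), []))

store = VectorStore()
store.put("XYZ", "Close", "2012-01-04", 10.2)
store.put("XYZ", "Close", "2012-01-03", 10.0)
store.put("XYZ", "Volume", "2012-01-03", 5_000)
print(store.history("XYZ", "Close"))  # [('2012-01-03', 10.0), ('2012-01-04', 10.2)]
```

A relational layout would typically hold every instrument, attribute and date in one big table and reassemble the series with a filtered, sorted query; the vector layout trades that generality for a single direct read.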
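For context on what such a calculation involves, here is a minimal Python sketch of a historical simulation VaR: apply each historical day's returns to today's positions, then read off a loss quantile. It assumes simple linear P&L (position value times return) rather than full revaluation, and takes the empirical quantile without interpolation; real implementations differ on both counts, and `scenario_pnl` and `historical_var` are hypothetical names.

```python
import math

def scenario_pnl(positions, historical_returns):
    """P&L of today's positions under each historical day's returns (linear approximation)."""
    return [sum(value * day[name] for name, value in positions.items())
            for day in historical_returns]

def historical_var(pnl, confidence=0.99):
    """Loss not exceeded at the given confidence: empirical quantile, no interpolation."""
    losses = sorted(-p for p in pnl)  # smallest loss first
    index = math.ceil(confidence * len(losses)) - 1
    return losses[index]

# Made-up example: 100 scenarios, two of which produce a loss of 10.
pnl = [-10.0, -10.0] + [1.0] * 98
print(historical_var(pnl, 0.99))  # 10.0
```

The work is dominated by reading a long history for every risk factor and revaluing positions under each scenario, which is exactly the "all the history for this attribute of this instrument" access pattern described above.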
Data Unification Layer
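The plug-in idea behind the data unification layer can be sketched in a few lines of Python: the client always calls the same `history` method, while the layer routes each request to whichever backend has been plugged in for that attribute. All of the class names here are hypothetical and only illustrate the pattern; they are not TimeScape's real API.

```python
from abc import ABC, abstractmethod

class SeriesSource(ABC):
    """Hypothetical backend interface: a database, a datafeed, etc."""

    @abstractmethod
    def history(self, instrument, attribute):
        """Return a list of (date, value) pairs."""

class DictSource(SeriesSource):
    """Stand-in backend holding its data in a dictionary."""

    def __init__(self, data):
        self._data = data

    def history(self, instrument, attribute):
        return list(self._data.get((instrument, attribute), []))

class UnificationLayer:
    """Client-facing API; backends can be swapped without changing client calls."""

    def __init__(self):
        self._routes = {}  # attribute -> backend

    def plug_in(self, source, attributes):
        for attribute in attributes:
            self._routes[attribute] = source

    def history(self, instrument, attribute):
        try:
            source = self._routes[attribute]
        except KeyError:
            raise LookupError(f"no source plugged in for {attribute!r}") from None
        return source.history(instrument, attribute)

layer = UnificationLayer()
layer.plug_in(DictSource({("XYZ", "Close"): [("2012-01-03", 10.0)]}), ["Close"])
layer.plug_in(DictSource({("XYZ", "Rating"): [("2012-01-03", "A")]}), ["Rating"])
print(layer.history("XYZ", "Rating"))  # [('2012-01-03', 'A')]
```

Because clients depend only on the layer, replacing a `DictSource` with, say, a SQL Server-backed source would not change a single client call, which is the property the diagram is meant to convey.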
Using this layer, we then worked with the Microsoft UK SQL team to implement/host some of our vector database structures inside Microsoft SQL Server. As a result, we ended up with a database engine that maintained the performance aspects of our proprietary database, but offered clients a standards-based DBMS for maintaining and managing the database. This is going back a few years, but we tested this at Microsoft with a 12TB database (since this was then the largest disk they had available), and it still contained 500 billion tick data records, which even today could be considered "Big" (if indeed I fully understand "Big" these days?). So you can see some of the technical effort we put into making non-mainstream database technology more acceptable to an audience adopting a "SQL is everything" mantra.
Fast forward to 2012, and the explosion of interest in "Big Data" (I guess I should drop the quotes soon?) and in NoSQL databases. It finally seems that, due to the usage of these technologies on internet data problems that no relational database could address, the technology community is much more willing to accept non-RDBMS technology where the problem being addressed warrants it. I guess for me and Xenomorph it has been a long (and mostly enjoyable) journey from 1995 to 2012, and it is great to see a more open-minded approach being taken towards database technology and the recognition of the benefits of specific databases for (some) specific problems. Hopefully some good news on TimeScape and NoSQL technologies will follow in coming months; this is an exciting time to be involved in analytics and data management in financial markets, and this tech couldn't come a moment too soon given the new reporting requirements being requested by regulators.
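A quick sanity check on those numbers shows how compact the storage had to be. This Python arithmetic assumes decimal terabytes (10^12 bytes) and that the records filled the disk, so the result is an upper bound on the average space per record, including any index and page overhead.

```python
disk_bytes = 12 * 10**12   # 12 TB, assuming decimal terabytes
records = 500 * 10**9      # 500 billion tick records

bytes_per_record = disk_bytes / records
print(bytes_per_record)                    # 24.0
print(round(12 * 2**40 / records, 1))      # 26.4 if "12TB" meant binary terabytes
```

Either way, roughly two dozen bytes per tick (timestamp, price, size and overhead) only works with tightly packed vector storage rather than one wide row per tick.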
I went along to "Demystifying Financial Services Semantics" on Tuesday, a one day conference put together by the EDMCouncil and the Object Management Group. Firstly, what are semantics? Good question, to which the general answer is that semantics are the "study of meaning". Secondly, were semantics demystified during the day? - sadly for me I would say that they weren't, but ironically I would put that down mainly to poor presentations rather than a lack of substance, but more of that later.
Quoting from Euzenat (no expert me, just search for Semantics in Wikipedia), semantics "provides the rules for interpreting the syntax which do not provide the meaning directly but constrains the possible interpretations of what is declared." John Bottega (now of BofA) gave an illustration of this in his welcoming speech at the conference by introducing himself and the day in PigLatin, where all of the information he wanted to convey was contained in what he said, but only a small minority of the audience who knew the rules of Pig Latin understood what he was saying. The rest of us were "upidstay"...
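John's trick works because Pig Latin is a purely syntactic rule: the meaning is all there, but only listeners who know the rule can recover it. Here is a minimal Python sketch of one common variant of the rule (move the leading consonant cluster to the end and add "ay"; vowel-initial words get "way"); other variants exist, and the function names are hypothetical.

```python
VOWELS = set("aeiou")

def to_pig_latin(word):
    """One common Pig Latin variant: move the leading consonant cluster to the end
    and add "ay"; vowel-initial words just get "way"."""
    w = word.lower()
    for i, ch in enumerate(w):
        if ch in VOWELS:
            return w + "way" if i == 0 else w[i:] + w[:i] + "ay"
    return w + "ay"  # no vowels at all

def to_pig_latin_sentence(text):
    return " ".join(to_pig_latin(word) for word in text.split())

print(to_pig_latin("stupid"))                               # upidstay
print(to_pig_latin_sentence("welcome to the conference"))   # elcomeway otay ethay onferencecay
```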
Putting this more in the context of financial markets technology and data management, the main use of semantics and semantic data models seems to be as a conceptual data modelling technique that abstracts away from any particular data model or database implementation. To humour the many disciples of the "Church of Semantics", such a conceptual data model would also be self-describing, so that you would not need a separate metadata model to understand it. For an example, take a look at the equity example from what Mike Aitkin and the EDM Council have put together so far in their "Semantics Repository".
Abstraction and self-description are not new techniques (OO/SOA design anyone?) but I guess even the semantic experts are not claiming that all is new with semantics. So what are they saying? The main themes from the day seem to be that Semantics:
Certainly the issue of business and technology not understanding each other (enough) has been a constant theme of most of my time working in financial services (and indeed is one of the gaps we bridge here at Xenomorph). For example, on one project I heard of a few years back, an IT department had just delivered a tick database, only for the business users to find that it did not cope with stock splits and for their purposes was unusable for data analysis. The business people had assumed that IT would know about the need for stock split adjustments, and so had never felt the need to specify the requirement explicitly. The IT people obviously did not know the business domain well enough to catch this gap in the specification.
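To make the stock split point concrete, here is a minimal sketch of the kind of back-adjustment the business users took for granted. It is illustrative only. The names `Split` and `split_adjust` are hypothetical, and the sketch assumes that splits arrive as a separate corporate-actions list and that multiple split ratios compound. It also assumes that only prices are adjusted. A real tick database would also need to adjust volumes and handle other corporate actions such as dividends and rights issues.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Split:
    """Hypothetical corporate-action record for a stock split."""
    effective: date  # first trading day quoted on the post-split basis
    ratio: float     # e.g. 2.0 for a 2-for-1 split


def split_adjust(prices, splits):
    """Back-adjust (date, price) pairs so the whole history is on the latest
    share basis: each price observed before a split's effective date is divided
    by that split's ratio (ratios compound if there are several splits)."""
    adjusted = []
    for day, price in prices:
        factor = 1.0
        for s in splits:
            if day < s.effective:
                factor *= s.ratio
        adjusted.append((day, price / factor))
    return adjusted
```

Without this step, a 2-for-1 split shows up in the raw data as a 50% one-day "crash". That artefact is exactly what makes an unadjusted series useless for return or volatility analysis.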
I think there is a need to involve business people in the design of systems, particularly at the data level (whilst not quite a "semantic" data model, the data model in TimeScape presents business objects and business data types to the end user, so both business people and technologists can use it without seeing any detail of an underlying table or physical data structure). You can see a lot of this around, with the likes of CADIS pushing its "you don't need a fixed data model" ETL/no-datawarehouse approach against the more rigid (and to some, more complete) data models/datawarehouses of the likes of Asset Control and GoldenSource. You also get the likes of Polarlake pushing its own semantic web and big data approach to data management as the next stage on from relational data models (however I get a bit worried when "semantic web" and "big data" are used together; it sounds like we are heading into marketing hype overdrive, warp factor 11...)
So if Semantics is to become prevalent and deliver some of these benefits in bringing greater understanding between business staff and technologists, the first thing that has to be addressed is that Semantics is a techy topic at the moment, one that would cause drooping eyelids in even the most technically enthused members of the business. Ontology, OWL, RDF and CLIF are all great if you are already in the know, but are guaranteed to turn off a non-technical audience trying to understand (demystify?) Semantics in financial markets technology.
Looking at the business benefits, many of the presenters (particularly vendors) put forward slides where "BAM! Look at what semantics delivered here!" was the mantra, whereas I was left with a huge gap in seeing how what they had explained had actually translated into the benefits they were shouting about. These presentations needed a much more practical focus, rather than semantic "magic" delivering a 50% reduction in cost with no supporting detail of just how this was achieved. Some of the "magic" seemed to be that no relational data model had to be unravelled to add new attributes and meanings to the semantic model. However, abstracting away from the relational representation has always been a good thing if you want to avoid collapsing under the weight of database upgrades, so I would suggest there is nothing too new there, though it may be a new approach for some.
So in summary I was a little disappointed by the day, especially given the "Demystifying" title, although there were a few highlights, with Mike Bennett's talk on FIBO (Financial Instruments Business Ontology) being interesting (sorry to use the "O" word). The discussion of the XBRL success story was also good, especially how regulators mandating this standard had enforced its adoption, and how many end consumers were now doing more with the data, driving adoption further still. In fact the XBRL story seemed to be a model for how regulators could improve the world of data in financial markets: by providing and enforcing the data semantics to be used with each new reporting requirement as it is mandated. Overall, a mixed day, and one in which I learned that the technical fog that surrounds semantics in financial markets technology is only just beginning to clear.
Corporate (and social) America does lots of things very well - positiveness, enthusiasm and lack of (English?) cynicism being some of the best attributes in my view - but other things are not so good, such as long "townhall" conference calls with 30 people on the call and only 3 people taking part, and the seeming need to keep talking when it is already evident to you and to many listening that you don't know what you are talking about. With these things in mind, I think the article "The Rise of the New Groupthink" in the New York Times is worth a read, as it challenges some of the mainstream practices on corporate collaboration and teaming, and comes out in quiet praise of the creative power of introverts. Seems like Dilbert's cubicle still has its merits in these days of open-plan offices and desk sharing.
My colleagues Joanna Tydeman and Matthew Skinner attended the A-Team Group's Data Management for Risk, Analytics and Valuations event today in London. Here are some of Joanna's notes from the day:
Andrew Delaney, Amir Halton (Oracle)
Drivers of the data management problem – regulation and performance.
Key challenges that are faced – the complexity of the instruments is growing, managing data across different geographies, an increase in M&As because of volatile markets, broader distribution of data and analytics required etc. It’s a work in progress but there is appetite for change. A lot of emphasis is now on OTC derivatives (this was echoed at a CityIQ event earlier this month as well).
Having an LEI is becoming standard, but has its problems (e.g. China has already said it wants its own LEI which defeats the object). This was picked up as one of the main topics by a number of people in discussions after the event, seeming to justify some of the journalistic over-exposure to LEI as the "silver bullet" to solve everyone's counterparty risk problems.
Expressed the need for real-time data warehousing and integrated analytics (a familiar topic for Xenomorph!) – analytics now need to reflect reality and to be updated as the data is running, coined as ‘analytics at the speed of thought’ by Amir. Hadoop was mentioned quite a lot during the conference, as was NoSQL, which is unsurprising from Oracle given their recent move into this tech (see post - a very interesting move given Oracle's relational foundations and history).
Impact of regulations on Enterprise Data Management requirements
Virginie O’Shea, Selwyn Blair-Ford (FRS Global), Matthew Cox (BNY Mellon), Irving Henry (BBA), Chris Johnson (HSBC SS)
Discussed the new regulations, how there is now a need to change practice as regulators want to see your positions immediately. Pricing accuracy was mentioned as very important so that valuations are accurate.
Again, said how important it is to establish which areas need to be worked on and make the changes. Firms are still working on a micro level, need a macro level. It was discussed that good reasons are required to persuade management to allocate a budget for infrastructure change. This takes preparation and involving the right people.
Items that panellists considered should be on the priority list for next year were:
· Reporting – needs to be reliable and meaningful
· Long term forecasts – organisations should look ahead and anticipate where future problems could crop up.
· Engage more closely with Europe (I guess we all want the sovereign crisis behind us!)
· Commitment of firm to put enough resource into data access and reporting including on an ad hoc basis (the need for ad hoc was mentioned in another session as well).
Technology challenges of building an enterprise management infrastructure
Virginie O’Shea, Colin Gibson (RBS), Sally Hinds (Reuters), Chris Thompson (Mizuho), Victoria Stahley (RBC)
Coverage and reporting were mentioned as the biggest challenges.
Front office used to be more real time and back office used to handle the reference data; now the two must meet. There is a real requirement for consistency: front office and risk need the same data so that they arrive at the same conclusions.
Money needs to be spent in the right way and firms need to build for the future. There is real pressure for cost efficiency and for doing more for less. Discussed that timelines should perhaps be longer so that a good job can be done, but with shorter milestones to keep the business happy.
Panellists described the next pain points/challenges that firms are likely to face as:
· Consistency of data including transaction data.
· Data coverage.
· Bringing together data silos, knowing where data is from and how to fix it.
· Getting someone to manage the project and uncover problems (which may be a bit scary, but problems are required in order to get funding).
· Don’t underestimate the challenges of using new systems.
Better business agility through data-driven analytics
Stuart Grant, Sybase
Discussed Event Stream Processing, that now analytics need to be carried out whilst data is running, not when it is standing still. This was also mentioned during other sessions, so seems to be a hot topic.
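To illustrate what "analytics whilst the data is running" means in practice, here is a minimal sketch of an incremental calculation. It keeps a volume-weighted average price (VWAP) over a sliding time window and updates it on every trade event, rather than re-querying stored data. The class name and the choice of VWAP are my own illustrative assumptions. This is not how Sybase's or any other CEP engine is implemented.

```python
from collections import deque


class RollingVWAP:
    """Hypothetical streaming analytic: VWAP over the last `window` seconds,
    updated incrementally as each trade event arrives."""

    def __init__(self, window):
        self.window = window
        self.trades = deque()  # (timestamp, price, volume) still inside the window
        self.pv = 0.0          # running sum of price * volume
        self.volume = 0.0      # running sum of volume

    def on_trade(self, ts, price, volume):
        # Add the new event to the running sums...
        self.trades.append((ts, price, volume))
        self.pv += price * volume
        self.volume += volume
        # ...and expire events that have fallen out of the time window.
        while self.trades and self.trades[0][0] <= ts - self.window:
            _, old_price, old_volume = self.trades.popleft()
            self.pv -= old_price * old_volume
            self.volume -= old_volume
        return self.pv / self.volume if self.volume else None
```

Each event costs constant amortised work, so the analytic stays current as the stream flows instead of waiting for the data to "stand still" in a database.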
Mentioned that the buy side’s challenge is that their core competency is not IT. Now with cloud computing they are more easily able to outsource. He mentioned that buy side shouldn’t necessarily build in order to come up with a different, original solution.
Data collection, normalisation and orchestration for risk management
Andrew Delaney, Valerie Bannert-Thurner (FTEN), Michael Coleman (Hyper Rig), David Priestley (CubeLogic), Simon Tweddle (Mizuho)
Complexity of the problem is the main hindrance. When problems are small, it is hard for them to get budget so they have to wait for problems to get big – which is obviously not the best place to start from.
There is now a change in behaviour of senior front office management – now they want reports, they want a global view. Front office do in fact care about risk because they don’t want to lose money. Now we need an open dialogue between front office and risk as to what is required.
Integrating data for high compute enterprise analytics
Andrew Delaney, Stuart Grant (Sybase), Paul Johnstone (independent), Colin Rickard (DataFlux)
The need for granularity and transparency are only just being recognised by regulators. The amount of data is an overwhelming problem for regulators, not just financial institutions.
Discussed how OTCs should be treated more like exchange-traded instruments – need to look at them as structured data.
Sitting by the sea, you have just finished your MATLAB reading and now are wondering what to read next?
We have just published our "TimeScape Data Unification" white paper. Not a pocket edition I am afraid, but some of you may find it interesting.
It describes how - post-crisis - a key business and technical challenge for many large financial institutions is to knit together their many disparate data sources, databases and systems into one consistent framework that can meet the ongoing demands of the business, its clients and regulators. It then analyses the approaches that financial institutions have adopted to respond to this issue, such as implementing an ETL-type infrastructure or a traditional golden copy data management solution.
Building on an assessment of their effectiveness and constraints, it then shows why companies looking to satisfy the need for business-user access to data across multiple systems should consider a "distributed golden copy" approach. This federated approach deals with disparate and distributed sources of data, and should also provide easy end-user interactivity whilst maintaining data quality and auditability.
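As a rough illustration of the federated idea (and emphatically not a description of how TimeScape implements it), here is a small sketch in which data stays in the underlying systems. A value is resolved on request from the highest-priority source that holds it, and every resolution is logged so you can later explain where a number came from. The class name, the source names and the priority rule are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GoldenCopyResolver:
    """Hypothetical federated lookup: sources are (name, store) pairs ordered
    highest priority first; each store is any dict-like keyed by
    (instrument, attribute). Data is not copied into a central warehouse."""
    sources: list
    audit_log: list = field(default_factory=list)

    def get(self, instrument, attribute):
        for name, store in self.sources:
            value = store.get((instrument, attribute))
            if value is not None:
                # Record which system supplied the value, and when, for auditability.
                self.audit_log.append({
                    "instrument": instrument,
                    "attribute": attribute,
                    "value": value,
                    "source": name,
                    "resolved_at": datetime.now(timezone.utc),
                })
                return value
        return None  # no source holds this attribute
```

A real implementation would add validation rules, overrides and exception workflows on top of this. The sketch only shows how priority-based resolution and an audit trail can coexist without a single physical copy of the data.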
The white paper is available here if you want to take a look and if you have any feedback or questions, drop us a line!
For those who are wondering what summer reading to take on holiday, we have just published our white paper "TimeScape and MATLAB", a pocket edition which outlines how TimeScape and MATLAB can be combined to provide enhanced data analysis and visualisation tools to financial organisations.
Whilst swimming in the blue ocean, walking in the countryside or enjoying a new country, take a break and find out how TimeScape's best of breed data capture and storage can be combined with the analytical capabilities of MATLAB to produce compelling solutions to real-world problems encountered within financial services.
Ok, ok, kidding here. Just go on holiday and enjoy your time off from complex financial problems!
But when you are back or if you are very interested (or sadly not going on holiday soon), please take a look at our white paper. It details how:
and much more.
Also feel free to suggest this summer reading to your friends (or enemies!).
I almost forgot to mention that I went along to the SIFMA event (previously known as the SIA show) last week to take a look around. For those of you not familiar with the SIFMA/SIA event, it was the biggest financial services technology event I had ever attended or exhibited at, taking up 5 massive floors at the Hilton NY. Everyone used to go there, and indeed that was one of the reasons (the only reason?) to go along. Now the event seems to be dying a slow death, something I was going to write about but Adam Honore of Aite and Melanie Rodier beat me to it.
Risk management and data control remain at the top of the agenda at many financial institutions. Many have said that the recent crisis highlighted the need for more consistent, transparent, high-quality data management, which I totally agree with (but working for Xenomorph, I would, I guess!). Although the crisis started in 2007, it would seem that many organisations still do not have the data management infrastructure in place to achieve better risk management.
I moved apartment last week and had to face the terrifying prospect of visiting IKEA to buy some new furniture. On walking through the endless corridors of furniture ideas I wondered whether the people at major financial institutions feel as I did: I knew I needed two wardrobes, I knew the dimensions of the rooms, I knew how many drawers I wanted. Then I got to the wardrobes showroom, sat in front of the “Create your own wardrobe” IKEA software and the nightmare started. How many solutions are there to solve your problems? And how many solutions, once you get to know of their existence, make you aware of a problem you didn’t know you had? That’s how I spent 2 days at IKEA choosing my furniture and still I wonder whether in the end I got the right solution for my needs.
Coming back to risk management, I imagine the same dilemma may be faced by financial institutions looking to implement a data management solution. How many software providers are out there? What data model do they use? Are they flexible enough to satisfy evolving requirements? How can we achieve an integrated data management approach? Will they support all kinds of asset classes, even the most complex?
In these times of new regulations where time goes fast and budget is tight, selection processes have become more scrupulous.
As often happens in life, when we need a plumber for example, or a new dentist, we look for positive recommendations, people willing to endorse the efficiency and reliability of the service. So, with this in mind, please take a look at the case study we put together with Rabobank International, who have been using our TimeScape analytics and data management system at their risk department since 2002 for consolidated data management. More client stories are also available on our website here: www.xenomorph.com/casestudies.
I hope that many of you will benefit from reading the case study and for any questions (on IKEA wardrobes too!), please get in touch...
Xenomorph has released its white paper 'Rates, Curves and Surfaces – Golden Copy Management of Complex Datasets'. The white paper describes how, despite the increasing interest in risk management and tighter regulations following the crisis, the management of complex datasets – such as prices, rates, curves and surfaces - remains an underrated issue in the industry. One that can undermine the effectiveness of an enterprise-wide data management strategy.
In the wake of the crisis, siloed data management, poor data quality, lack of audit trail and transparency have become some of the most talked about topics in financial markets. People have started looking at new approaches to tackle the data quality issue that found many companies unprepared after Lehman Brothers' collapse. Regulators – both nationally and internationally – strive hard to dictate parameters and guidelines.
In light of this, there seems to be a general consensus on the need for financial institutions to implement data management projects that are able to integrate both market and reference data. However, whilst having a good data management strategy in place is vital, the industry also needs to recognize the importance of model and derived data management.
Rates, curves and derived data management is too often a neglected function within financial institutions. What is the point of having an excellent data management infrastructure for reference and market data if ultimately instrument valuations and risk reports are run off spreadsheets using ad-hoc sources of data?
In this evolving environment, financial institutions are becoming aware of the implications of a poor risk management strategy but are still finding it difficult to overcome the political resistance across departments to implementing centralised standard datasets for valuations and risk.
The principles of data quality, consistency and auditability found in traditional data management functions need to be applied to the management of model and derived data too. If financial institutions do not address this issue, how will they be able to deal with the ever-increasing requests from regulators, auditors and clients to explain how a value or risk report was arrived at?
For those who are interested, the white paper is available here.
Last Thursday, I went along to an event organized by the Club Finance Innovation on the topic of “Independent valuations for the buy-side: expectations, challenges and solutions”.
The event was held at the Palais Brongniart in Paris, which, for those who don’t know (like me until Thursday), was built between 1807 and 1826 by the architect Brongniart by order of Napoleon Bonaparte, who wanted the building to be the permanent home of the Paris stock exchange.
Speakers at the roundtable were:
The event focussed on the role of the buy-side in financial markets, looking in particular at the concept of independent valuations and how it has taken on an important role since the financial downturn. However, all the speakers agreed that there remains a large gap between the sell-side and buy-side in terms of competence and expertise in the field of independent valuations. The buy-side lacks the systems for a better understanding of financial products and should align itself with the best practices of the sell-side and the bigger hedge funds.
The roundtable was started by Francis Cornut of DeriveXperts, who gave the audience a definition of independent valuation. Whilst valuation could be defined as the “set of data and models used to explain the result of a valuation”, Cornut highlighted how the difficulty is in saying what independent means; there is in fact a general confusion on what this concept represents: internal confusion, for example between the front office and risk control department of an institution, but also external confusion, when valuations are done by third-parties.
Cornut provided three criteria that an independent valuation should respect:
Independent valuations are the way forward for a better understanding of complex, structured financial products. Cornut advocated the need for financial parties (clients, regulators, users and providers) to invest more and understand the importance of independent valuations, which will ultimately improve risk management.
Jean-Marc Eber, President of LexiFi, agreed that the ultimate objective of independent valuations is to allow financial institutions to better understand the market. To accomplish this, Eber pointed out that when we speak about services to clients, we should first think about what their real needs are. The bigger umbrella of “buy-side” in fact covers different needs, and there is often a contradiction in what regulators want: on one side, independent valuations provided by independent third parties; on the other side, independent valuations that really mean internal users and staff understand what underlies the products the company holds. In the same way, we don’t just need to value products but also to measure their risk and periodically re-value them. It is important to have the whole picture of the product being evaluated in order to make the buy-side more competitive.
Another point on which the speakers agreed is traceability: as Eber said, financial products don’t just exist as they are, but undergo transformation and change several times. Therefore, the market needs to follow each product across its life cycle until maturity, and this poses a technology challenge in providing scenario analysis for compliance and in keeping track of the audit trail.
Asked ‘what has the crisis changed?’, the panellists answered:
Eber: the crisis showed the need to be more competent and technical to avoid risk. He highlighted the need to understand the product and its underlying. Many speak of having a central repository for OTCs, obligations, etc but this needs more thinking from the regulators and the financial markets. Moreover, the markets should focus more on quality data and transparency.
Eric Benhamou, CEO of Pricing Partners, sees an evolution of the market, as the crisis exposed underestimated risks which are now being taken into consideration.
Claude Martini, CEO Zeliade, advocated the need for financial markets to implement best practices for product valuations: buy-side should apply the same practices already adopted by the sell-side and verify the hypotheses, price and risk related to a financial product.
Cornut admitted things have changed since 2005, when they launched DeriveXperts and nobody seemed to be interested in independent valuations. People would ask what value they would get from an investment in independent valuations: yes, regulators are happy, but what’s the benefit for me?
This is changing now that financial institutions know that a deeper understanding of financial products increases their ability to sell the products to their clients.
The speech I enjoyed the most was from Patrick Hénaff, associate professor at the University of Bretagne and formerly Global Head of Quantitative Analysis - Commodities at Merrill Lynch / Bank of America.
He took a more academic approach and challenged the assumption that having two prices to compare reduces the uncertainty about a product, pointing out that this is not always the case. I found interesting his idea of quoting a product price with a confidence interval, or with a ‘toxic index’ representing the uncertainty about the product and the model risk that may arise from it.
We speak too often about the risk associated with complex products, but Hénaff explained how the risk exists even for simpler products, for example in calculating the VaR of a given stock position. A stock can be extremely volatile and we can’t know its trend, so providing a confidence interval is crucial. What is new is the interest that many are showing in putting a price on a given risk, whereas before, model risk was considered merely an operational risk arising from the calculation process. Today, a good valuation of the risk associated with a product can result in less regulatory capital being needed to cover that risk, and as such it is attracting much more interest from the market.
Hénaff described two approaches currently taken in academic research on valuations:
1) Adopting statistical simulation in order to identify the risk arising from incorrect calibration of the model. This consists of taking historical data and testing the model, through simulations and scenarios, in order to measure the risk associated with choosing one model instead of another;
2) Having better quality data. A lack of quality data means that the models chosen are inaccurate, as it is difficult to identify exactly which model we should be using to price a product.
Model risk, which as said above was previously considered an operational risk, now becomes extremely important as it can free up capital. Hénaff suggested that it is key to find the model risk equivalent of VaR for market risk: a normalised measure. He also spoke about the concept of a “Model validation protocol”, giving the example of what happens in the pharmaceutical and biological sectors: before a new pill is launched onto the market, it is tested several times.
Whilst in finance, products are presented only with their final valuation, the pharmaceutical sector provides a “protocol” which describes the calculations, analysis and processes used to get to the final value, and its systems are organised to provide a report showing all the deeper detail. To reduce risk, valuations should be a pre-trade process and not a post-trade one.
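As a rough illustration of Hénaff's point that even the VaR of a simple stock position deserves a confidence interval, here is a minimal sketch. It computes one-day historical-simulation VaR and then bootstraps the return history to see how much that single number could move. The bootstrap is my own choice of technique, not necessarily what Hénaff proposed. The empirical quantile convention and all parameter defaults are assumptions made for illustration.

```python
import random


def historical_var(returns, position, level=0.99):
    """One-day historical-simulation VaR: the loss that was exceeded on only
    about (1 - level) of the historical days, for a position of given value.
    Uses a simple empirical-quantile convention (an assumption)."""
    losses = sorted(-r * position for r in returns)
    index = min(int(level * len(losses)), len(losses) - 1)
    return losses[index]


def bootstrap_var_interval(returns, position, level=0.99,
                           n_boot=2000, ci=0.90, seed=42):
    """Resample the return history with replacement, recompute VaR each time,
    and report the central `ci` range of those estimates as a rough
    confidence interval around the single VaR figure."""
    rng = random.Random(seed)
    estimates = sorted(
        historical_var(rng.choices(returns, k=len(returns)), position, level)
        for _ in range(n_boot)
    )
    lo = estimates[round((1 - ci) / 2 * n_boot)]
    hi = estimates[round((1 + ci) / 2 * n_boot) - 1]
    return lo, hi
```

Quoting the interval alongside the point estimate makes the uncertainty visible. A short or unrepresentative history produces a wide interval, which is in the spirit of the ‘toxic index’ idea above.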
This week, the A-Team Group published a valuations benchmarking study which shows how buy-side institutions are turning more and more to third-party valuations, driven mainly by risk management, regulation and client needs. Many of the institutions interviewed also said that they will increase their technology spending to automate and improve the pricing process, as well as data source integration and workflow.
This is in line with what was said at the event I attended, and was confirmed by the technology representatives speaking at the roundtable.
I would like to end with what Hénaff said: there can’t be a truly independent valuation without transparency of the protocols used to get to that value.
Well, Rome wasn’t built in a day (and as it is my city we’re speaking about, I can say there is still much to build, but let’s not get into this!) but there is a great debate going on, meaning that financial institutions are aware of the necessity to take a step forward. Much is being said about the need for more transparency and a better understanding of complex, structured financial products and still there is a lot to debate. Easier said than done I guess but, as Napoleon would say, victory belongs to the most persevering!
Just caught up with this article that appeared on the A-Team website - Bloomberg is facing pressure from the industry over users' concerns about its initiative to make its codes freely available (see previous post Truly "Open" Bloomberg?). In the article, Max Woolfenden, managing director of FOW Tradedata, recognises the potential of the BSYM website but argues that more progress is needed to improve the completeness of the data offered and, in particular, to clarify what exactly 'open' means.
According to A-Team, Bloomberg is also facing pressure with regards to a possible introduction of a new licensing structure for Service Provider Agreement (SPA) contracts for fund administration clients. Under the new system, fund administrators would be required 'to pay per security in each individual client portfolio', effectively changing the status of the fund manager to that of data re-distributor with all the cost increases that implies. It will be interesting to see where this heads - will the administrators simply pass the data costs through to their clients, absorb some costs as a competitive play or simply move away from using Bloomberg data?
Sybase have acquired Aleri, according to Finextra. It was less than a year ago that the complex event processing (“CEP”) vendors Aleri and Coral8 announced their merger (see press release); there was also a big buzz when Sybase announced a CEP capability based on Coral8 and Streambase decided to offer an Amnesty Program for Aleri-Coral8 customers (see earlier post 'Merging in public is difficult...'). And only a few months later, Microsoft announced that their CEP engine Orinoco (now integrated with SQL Server 2008 as StreamInsight) was heading to market (see post 'Microsoft CEP surfaces as "Orinoco"').
Another sign that CEP is moving more mainstream and that real-time everything is becoming more important? Or a good market for acquisitions?
...I am very concerned that I have previously missed an important requirement for data management solutions - a heavyweight one, judging by this great discussion on one of the Microsoft forums.
Seems like Microsoft have now gone public on the Microsoft TechEd site that they have a Complex Event Processing (CEP) engine that will be coming to market shortly (see MagmaSystems blog post ). One of my colleagues Mark Woodgate attended a briefing event at Microsoft for this technology back in February this year - here's an extract from some internal notes that Mark made back then:
"Microsoft CEP is very similar to StreamBase conceptually (and not surprisingly), in the sense that there are adapters and streams, and the way you merge and split them via some kind of query language is the same. However, StreamBase uses StreamSQL, which as we have seen is SQL-like in syntax, whereas Microsoft CEP uses LINQ and .NET; although conceptually it is doing the same thing, it does not look the same. StreamBase’s argument was that you can be an SQL programmer to use it and don’t need a lower-level environment like .NET; however, it’s not really SQL as it has all these ‘extensions’ you have to learn, so using .NET might look trickier but in fact makes sense. They don’t have a sexy GUI yet for designing CEP applications like StreamBase, but it will be done in Visual Studio 2008.
Currently, you build various assemblies (I/O adapters, queries and functions) and then bolt them all together, a step called ‘binding’, using a command-line tool. You then deploy the application onto one or more machines using another tool, so it’s a manual process right now. They are aware this needs to be made easier and more visual. They are allowing other libraries to be bolted in via the various SDKs, so it’s pretty open and flexible. It works well with HPC and clusters/grids (or so they say) and of course can be used with SQL Server. The CEP engine also has a web interface based on SOAP, so at least non-Windows based systems can talk to it"
The release of this technology will be an interesting addition to the CEP market and to the Microsoft technology stack in general. Assuming performance is at credible levels (i.e. not necessarily leading but not appalling either), it will certainly bring both technical and commercial pressure to bear on existing CEP vendors (see earlier post on Aleri/Coral8) and has the potential to broaden the usage of CEP. Obviously Linux-Lovers (sorry, I didn't mean to be personal...) will not agree with this, but Microsoft is putting together an interesting stack of technology when you see this CEP engine, Microsoft HPC and Microsoft Velocity coming together under .NET.
It sounds like Aleri and Coral8 in the CEP (Complex Event Processing) market are not doing the best job they could of managing the publicity surrounding their recent merger, which has not been helped by the announcement of a CEP capability from Sybase based on Coral8 source code.
This is explained further in a post on the Magmasystems Blog, and made more entertaining by StreamBase's aggressive marketing response to the merger: a software trade-in offer for clients of Aleri and Coral8 (see press release).
Yesterday I attended a good set of presentations on high performance computing (HPC) from one of our implementation partners, D-Fine. One of their hedge fund clients presented some work they had done valuing derivatives on a cluster; however, the most interesting part of the event was the discussion of Graphics Processing Units (GPUs) being used within HPC grids and clusters.
It seems that GPUs such as those made by Nvidia offer performance levels that easily exceed those possible with traditional CPU-based HPC solutions. The key seems to be that GPUs are designed specifically for vector and floating-point operations, with several hundred processing cores available on one chip, whereas CPUs are understandably designed for more general processing requirements. So if the problem is well suited to GPUs (not always the case), one of the technologists from QuantCatalyst said that, with optimisation, performance could be improved by up to 10,000 times relative to CPU-based cluster solutions.
As with Chris's article on Solid State Drives, GPU usage is not exactly a free lunch: for the moment, specific tools and compilers are needed until a more generic, industry-accepted abstraction comes along. At 10,000 times quicker it might be worth the effort though; maybe Hank Paulson should take a look, given the CDOs he will need to be valuing any day soon...
An interesting article on Microsoft's plans for a completely new OS (i.e. not a new version of Windows), currently called Midori, motivated by the move to distributed computing and by the commercial and technical need not to be held back by the legacy of the Windows OS code base.
Xenomorph is the leading provider of analytics and data management solutions to the financial markets. Risk, trading, quant research and IT staff use Xenomorph’s TimeScape analytics and data management solution at investment banks, hedge funds and asset management institutions across the world’s main financial centres.