26 posts categorized "Big Data"

30 April 2015

Enterprise Data Management - The Next Generation

Hi folks, there is a recording up of a webinar I took part in yesterday with the A-Team. The webinar was called Enterprise Data Management - The Next Generation and involved discussion about what's needed and what's next in EDM. You can get access to it via this link.

10 March 2015

TabbFORUM Video: The Interdependence of Data, Analytics, and Visualization

Quick plug for an interview I did recently with Paul Rowady on the Tabb Forum, you can get access to the video here and a brief summary of what we talked about is below. As ever, Paul has his humourous angle on things and this time my green socks got the "Umpa Lumpa" treatment (unfortunately you have to watch to the end to catch that one!). Last time it was my likeness to the lead singer of an Australian band. And for the record, we did also have a good conversation on data management and BI/visualization.

"As firms increasingly apply analytics to massive volumes of raw data, the amount of derived data is growing exponentially, and the need to apply strict governance to this derived data is more important than ever. To satisfy regulatory demands, the full data trail – including models and calculations – needs to be auditable, remarks Brian Sentance, CEO, Xenomorph. Unfortunately, there often is a disconnect between the validation of the raw data and the governance of the middle tier of derived data or analytics, he notes. Sentance and TABB Group’s Paul Rowady, principal and director of data and analytics research, examine the breakdown of data governance best practices, the risks involved, and the role of visualization tools in identifying data quality and data management shortfalls."

05 November 2014

Data Management Summit NYC from the A-Team

The A-Team put on another good event at DMS New York yesterday. Lots of good topics were discussed and here are a few takeaways that I remember, after a photo of Ludwig D'Angelo of JPMorgan:

[Photo: Ludwig D'Angelo of JPMorgan]

  • Data Utilities - One of the presenters said that "Data Utility" was a really overused term, second only to "Big Data". My comment would be that a lot of the managed services folks seem to want to talk about "Data Utilities", preferring that term to a description of what they actually are. Maybe because they perceive it as better marketing, and/or maybe because they hope to be anointed/appointed (how, I don't know) as an industry "Data Utility". Either way, for me they fail to address the issue of client-specific data and its management very well, much to the detriment of their argument imho - although SmartStream did say that client data can be mixed into the data services they offer.
  • Andrew Gets Literaturally Physical - Andrew Delaney of the A-Team expressed a preference for "physical" books when talking about why the A-Team also prints the Regulatory Data Handbook 2 as well as making it available online. I have to agree that holding a book still beats my Kindle experience, but maybe I am just getting old. Andrew should check out this YouTube video on how the book was first introduced...
  • FIBO - The Financial Industry Business Ontology (FIBO) was discussed in the context of trying to establish industry standards for data. As ever, I suspect the usage of words like "Ontology" leaves a lot of business folks looking for the nearest double shot of espresso, but that aside, it seems like the EDM Council are making some progress on developing this standard. The main point from the event was that industry adoption is key. I found some of the comments during the day a bit contradictory, in that some said that the regulators should not mandate standards (i.e. leave it to industry adoption and principles) but then in the next breath discussed the benefits (or otherwise) of the LEI (ok, not mandated but specific and coming from the regulators). Certainly the industry needs "help" (is that a strong enough word?) to get standards in place.
  • Data Quality - Lots on data quality, with assessing the business value of data quality initiatives being a key point. On the same subject, Predrag of element-22 said that the EDM Council will soon announce adoption of the Data Quality Index, which could be used to correlate data quality with operational KPIs for the business.
  • Regulation (doh!) - It wouldn't be a data management event without lots of discussion on regulation - a key point being that even those regulations that are not directly/explicitly about data still imply that data management is key (take CVA calcs for example) - and on a related note it was suggested that BCBS239 should be considered as a more general data management template for any business objective.
  • Entity Hierarchies/LEI - Ludwig D'Angelo of JPMorgan gave a great talk and said that vendors were missing a massive opportunity in delivering good hierarchy datasets to clients, and that the effort expended on this at firms was enormous. Ludwig said that the lack of hierarchies in the Legal Entity Identifier (LEI) is a gap that the private sector could and should fill. Ludwig also seemed initially to be thrown when one of the audience suggested that there were multiple "golden copies" of hierarchies needed, since definitions of ownership can differ depending on which department you are in (the old battle of risk and finance departments again). There was a good discussion later of how regulation was driving all systems to be much more entity-centric rather than portfolio-centric, emphasising the importance of getting entity hierarchies right.
  • DCAM - John Bottega did a great presentation on the Data Management Capability Model (DCAM). John asked Predrag of element-22 to speak about DCAM, and he said that unlike previous models (such as the DMM), this framework would not only assess where you are in data management but also show you where you need to go. DCAM covers data management strategy / operations / quality / business case / data architecture / tech architecture / governance / program. From what I could see it looked like a great framework - it appeared to be common sense and obvious, but that is in itself difficult to achieve, so good effort I think. Element-22 will offer an online service around DCAM that will also allow anonymous benchmarking of data management capabilities as more institutions get involved (update: the service is called pellustro).
  • BCBS239 - Big thanks to John M. Fleming of BNY Mellon and Srikant Ganesan of Risk Focus for taking part in the panel with me. There was less focus on spreadsheet use and abuse on this panel, unlike the London panel from last month. John had some very practical ideas, such as the use of wikis to publish/gather data dictionary information, and that with a large legacy infrastructure you are better off documenting differences in definitions across systems rather than trying to change the world from day one. Echoing some of the points from DMS London, it was thought that making the use of internal data standards part of project sign-off was very pragmatic data governance, but also that some systems should be marked/assessed as obsolete/declining and hence blocked from any additional usage in new project work. There was a bit of a plug for some of our recent work on data validation and exception management, but the panel said that BCBS239 needs to encompass audit/lineage on calculations/derived data/rules in addition to just the raw data.

You can get more on the day by taking a look at my feed via @TheLongSentance and at what others posted via #DMSNYC.

 

16 October 2014

TabbForum MarketTech 2014: Game of Smarts

A great afternoon event put on by TabbFORUM in New York yesterday with a number of panels and one-on-one interviews (see agenda). You can see some of what went on at the event via the hashtag #TabbTech or via the @XenomorphNews feed.

[Photo: "Death of Legacy" panel discussion]

11 September 2014

A-Team DMS Awards 2014 - Xenomorph on the Cloud

A-Team’s DMS Data Management Awards close on the 26th of September so if you haven't already, please vote for Xenomorph!

Xenomorph on the Cloud - First of a few lookbacks at what we have been doing over the past year, starting with a short animation about one of our major initiatives this year: cloud provision of data management and a new venture into cloud-based data publishing with the TimeScape MarketPlace.

So it would be fantastic if you could support Xenomorph by voting here.

Thank you!

14 July 2014

NoSQL Document Database - Manhattan MarkLogic

Bit late in posting this up, but given I did something about RainStor I thought I should write up my attendance at a MarkLogic event day in downtown Manhattan from several weeks back - for some context, their NoSQL database is used to serve up content on the BBC web site. They are unusual in the NoSQL "movement" in that they are a proprietary vendor in a space dominated by open source databases and the companies that offer support for them. The database they seem to compete with most in the NoSQL space is MongoDB, since both have origins as "document databases" (managing millions of documents is one of the most popular uses for big data technology at the moment, though not so much publicized as more fashionable things like swallowing a Twitter feed for sentiment analysis, for example).

In order to cope with the workloads needing to be applied to data, MarkLogic argue that data has escaped from the data center, with separate data warehouses and ETL processes aligned with each silo of the business. They put forward the marketing message that MarkLogic allows the data to come back into the data center, given it can be a single platform where all data lives and all workloads are applied to it. As such it is easy to apply proper data governance if the data is in one place rather than distributed across different databases, systems and tools.

Apparently MarkLogic started out with the aims of offering enterprise search of corporate data content but has evolved much beyond just document management. Gary Bloom, their CEO, described the MarkLogic platform as the combination of:

• Database
• Search Engine
• Application Services

He said that the platform is not just the database but particularly search and database together, aligned with the aim of not just storing data and documents but with the aim of getting insights out of the data. Gary also mentioned the increasing importance of elastic compute and MarkLogic has been designed to offer this capability to spin up and down with usage, integrating with and using the latest in cloud, Hadoop and Intel processors.

Apparently one of the large European investment banks is trying to integrate all of their systems for post-trade analysis and regulatory reporting. The bank apparently tried doing this by adopting a standard relational data model but faced two problems: 1) the relational databases were not standard and 2) it was difficult to get to and manage an overarching relational schema. On the schema side of things, the main problem they were alluding to seemed to be one schema changing and having to propagate that change through the whole architecture. The bank seems to be having more success now that they have switched to MarkLogic for doing this post-trade analysis - from a later presentation it seems that things like trades are taken directly from the Enterprise Service Bus, saving the data in the message as-is (schema-less).

One thing that came up time and time again was their pitch that MarkLogic is "the only Enterprise NoSQL database", with high availability, transactional support (ACID) and security built in. Gary criticized other NoSQL databases for offering "eventual consistency" and said that they aspire to something better than that (to put it mildly). I thought it was interesting that over a lunch chat one of the MarkLogic guys said that "MongoDB does a lot of great pre-sales for MarkLogic", meaning I guess that MongoDB is the marketing "poster child" of NoSQL document databases so they get the early leads, but as the client widens the search they find that only MarkLogic is "enterprise" capable. You can bet that the MongoDB team disagree (and indeed they do...).

On the consistency side, Gary talked about "ObamaCare" aka HealthCare.gov, which MarkLogic were involved in. First came some performance figures: they were handling 50,000 transactions/sec with 4-5ms response times for 150,000 concurrent users. This project suffered from a lot of technical problems which really came down to running the system on fragile infrastructure with weaknesses in network, servers and storage. Gary said that the government technologists were expecting data consistency problems when things like the network went down, but the MarkLogic database is ACID and all that was needed was to restart the servers once the infrastructure was ready. Gary also mentioned that he spent 14 years working at Oracle (as a lot of the MarkLogic folks seem to have), but that it was not really until Oracle 7 that they could say they offered data consistency.

On security, again there was more criticism of other NoSQL databases for offering access to either all of the data or none of it. The analogy used was one of going to an ATM and being offered access to everyone's money, having to trust each client to only take their own. Continuing the NoSQL criticism, Gary said that he did not like the premise put around that "NoSQL is defined by Open Source" - his argument being that MarkLogic generates more revenue than all the other NoSQL databases on the market. Gary mentioned one client who hosted a "lake of data" in Hadoop, saying that Hadoop is a great distributed file system but still needs a database to go with it.

Gary then talked about some of the features of MarkLogic 7, their current release. In particular, MarkLogic 7 offers scale-out elasticity but with full ACID support (achieving one is usually thought to preclude the other), high performance and a flexible schema-less architecture. Gary implied that the marketing emphasis had changed recently from the "big data" pitch of a few years back to covering both unstructured and structured data within one platform, so dealing with heterogeneous data, which is a core capability of MarkLogic. Other features mentioned were support for XML and JSON, access through a REST API, usage of MarkLogic as a semantic database (a triple store) and support for the semantic query language SPARQL. Gary mentioned that semantic technology was a big area of growth for them. He also mentioned support for tiered storage on HDFS.
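For anyone unfamiliar with triple stores, the sketch below shows the general idea using the open-source rdflib library in Python (illustrative only - this is not MarkLogic's own API, and the entities are made up): facts are held as subject-predicate-object triples and SPARQL queries pattern-match across them.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

g = Graph()
# A few subject-predicate-object triples about a hypothetical bond and its issuer.
g.add((EX.bond123, EX.issuedBy, EX.acmeCorp))
g.add((EX.bond123, EX.couponRate, Literal(5.25)))
g.add((EX.acmeCorp, EX.domiciledIn, Literal("US")))

# SPARQL pattern-matches across the triples: find each bond's issuer and coupon.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?bond ?issuer ?coupon
    WHERE {
        ?bond ex:issuedBy ?issuer .
        ?bond ex:couponRate ?coupon .
    }
"""
for bond, issuer, coupon in g.query(query):
    print(bond, issuer, coupon)
```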

The conversation then moved on to what's next with version 8 of MarkLogic. The main theme for the next release is "Ease of Use", with the following features:

• MarkLogic Developer – freely downloadable version
• MarkLogic Essential Enterprise – try it for 99c/hour on AWS
• MarkLogic Global Enterprise – 33% less (decided to spend less time on the sales cycle)
• Training for free – all classes sold out – instructor led online

Along this ease of use theme, MarkLogic acknowledged that using their systems needs to be easier and that in addition to XML/XQuery programming they will be adding native support for JavaScript, greatly expanding the number of people who could program with MarkLogic. In terms of storage formats, in addition to XML they will be adding full JSON support. On the semantics side they will offer full support for RDF, SPARQL 1.1 and inferencing. Bi-temporal support will also be added with a view to answering the kind of regulatory-driven questions such as "what did they know and when did they know it?".

Joe Pasqua, SVP of Product Strategy, then took over from Gary for a more technical introduction to the MarkLogic platform. He started by saying that MarkLogic is a schema-less database with a hierarchical, very document-centric data model, and can be used for both structured and unstructured data. Data is stored as compressed trees within the system. Joe then explained how the system is indexed, describing the "Universal Index" which, as in most good search engines, records where to find the following kinds of data (a minimal sketch of this style of index follows the list below):

• Words
• Phrases
• Stemmed words and phrasing
• Structure (this is indexed too as new documents come in)
• Words and phrases in the context of structure
• Values
• Collections
• Security Permissions
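Purely to make the indexing idea concrete (this is my own toy illustration in Python, not MarkLogic's actual implementation, and the document fields are invented), the sketch below shows the kind of inverted index a search-oriented database maintains: as each document is ingested, its words and element paths are mapped back to the documents containing them, so a search becomes a fast set intersection rather than a scan.

```python
from collections import defaultdict

class UniversalIndexSketch:
    """Toy inverted index: maps words and element paths back to document ids."""

    def __init__(self):
        self.word_index = defaultdict(set)       # word -> set of doc ids
        self.structure_index = defaultdict(set)  # element path -> set of doc ids

    def ingest(self, doc_id, doc, path=""):
        """Index a nested dict 'document' as it arrives (structure and words together)."""
        for key, value in doc.items():
            element_path = f"{path}/{key}"
            self.structure_index[element_path].add(doc_id)
            if isinstance(value, dict):
                self.ingest(doc_id, value, element_path)
            else:
                for word in str(value).lower().split():
                    self.word_index[word].add(doc_id)

    def search(self, *words):
        """Documents containing all the given words (set intersection over postings)."""
        postings = [self.word_index[w.lower()] for w in words]
        return set.intersection(*postings) if postings else set()

# Example usage with two hypothetical trade documents
idx = UniversalIndexSketch()
idx.ingest("doc1", {"trade": {"counterparty": "Acme Bank", "product": "interest rate swap"}})
idx.ingest("doc2", {"trade": {"counterparty": "Acme Corp", "product": "equity option"}})
print(idx.search("acme", "swap"))                   # only doc1 matches both words
print(idx.structure_index["/trade/counterparty"])   # both documents have this element
```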

Joe also mentioned that a "range index" is used to speed up comparisons, apparently in a similar way to a column store. Geospatial indices are like 2D range indices for how near things are to a point. The system also supports semantic indices, indexing on triples of subject-predicate-object.

He showed how the system has failover replication within a database cluster for high availability, but also full replication for disaster recovery purposes. There were continual side references to Oracle as a "legacy database".

On database consistency and the ACID capability, Joe talked about MVCC (Multi-Version Concurrency Control). Each "document" record in MarkLogic seems to have a start and end time for how current it is, and these values are used when updating data to avoid any reduction in read availability. When a document is updated a copy of it is taken but kept hidden until ready - the existing document remains available until the update is ready, and then the "end time" on the old record is marked and the "start time" marked on the new record. So effectively the system is always appending in serial form rather than seeking on disk, and the start and end times on each record enable bitemporal functionality to be implemented. Whilst the new record is being created it is already being indexed, so there is zero-latency searching once the new document is live.
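My reading of the mechanism Joe described, sketched as a toy Python class rather than anything MarkLogic-specific: every version of a document carries a start and end timestamp, an update appends a new version and closes off the old one, and a reader simply picks the version whose interval covers the time being asked about - which is also what makes "as-of" style (bitemporal) queries cheap.

```python
import itertools

class AppendOnlyStore:
    """Toy MVCC store: updates append new versions, old versions stay readable."""

    def __init__(self):
        self.versions = []               # (doc_id, start, end, content); end=None means "current"
        self.clock = itertools.count(1)  # logical timestamps instead of wall-clock time

    def put(self, doc_id, content):
        now = next(self.clock)
        for i, (d, start, end, c) in enumerate(self.versions):
            if d == doc_id and end is None:
                self.versions[i] = (d, start, now, c)        # mark the end time on the old record
        self.versions.append((doc_id, now, None, content))   # append the new record, start time = now

    def get(self, doc_id, as_of=None):
        """Return the version current now, or the version visible at logical time 'as_of'."""
        t = float("inf") if as_of is None else as_of
        for d, start, end, content in reversed(self.versions):
            if d == doc_id and start <= t and (end is None or t < end):
                return content
        return None

store = AppendOnlyStore()
store.put("trade42", {"notional": 10_000_000})    # version written at t=1
store.put("trade42", {"notional": 12_000_000})    # amendment written at t=2
print(store.get("trade42"))                       # latest view: notional 12,000,000
print(store.get("trade42", as_of=1))              # as known at t=1: notional 10,000,000
```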

One of the index types mentioned by Joe was a "Reverse Index", where the queries themselves are indexed and each new document coming in is passed over these queries (sounds like the same story from the complex event processing folks), triggering alerts based on which documents fit each query.
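Again just to illustrate the concept rather than MarkLogic's internals: a reverse index stores the queries and matches each arriving document against them, which is what makes standing alerts cheap. A minimal sketch with made-up query names:

```python
# Toy "reverse index": saved queries are the indexed objects; each new document is matched against them.
saved_queries = {
    "big_fx_trades": lambda doc: doc.get("asset_class") == "FX" and doc.get("notional", 0) > 5_000_000,
    "acme_mentions": lambda doc: "acme" in doc.get("counterparty", "").lower(),
}

def on_new_document(doc):
    """Return the names of the saved queries (i.e. alerts) that this document triggers."""
    return [name for name, predicate in saved_queries.items() if predicate(doc)]

print(on_new_document({"asset_class": "FX", "notional": 7_500_000, "counterparty": "Acme Bank"}))
# -> ['big_fx_trades', 'acme_mentions']
```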

In summary, it was a good event. MarkLogic looks like interesting technology and there seem to be a variety of folks using it in financial markets, with the post-trade analysis example (a bit like RainStor I think though, as an archive) and others using it more in the reference data space. I am not sure how real-time capable MarkLogic is - there seems to be a lot of emphasis on post-trade. The event also brought home to me the importance of having search and database together, which seems to be a big strength of their technology.

01 July 2014

Cloud, data and analytics in London - thanks for coming along!

We had over 60 folks along to our event at the Merchant Taylors' Hall in London last week. Thanks to all who attended, thanks to all who helped with the organization of the event, and sorry to miss those of you that couldn't come along this time.

Some photos from the event are below starting with Brad Sevenko of Microsoft (Director, Capital Markets Technology Strategy) in the foreground with a few of the speakers doing some last minute adjustments at the front of the room before the guests arrived:

[Photo: Brad Sevenko of Microsoft and speakers preparing before the guests arrived]

 

Rupesh Khendry of Microsoft (Head of World-Wide Capital Markets Solutions) started off the presentations at the event, introducing Microsoft's capital markets technology strategy to a packed audience:

[Photo: Rupesh Khendry of Microsoft]

 

After a presentation by Virginie O'Shea of Aite Group on Cloud adoption in capital markets, Antonio Zurlo (below) of Microsoft (Senior Program Manager) gave a quick introduction to the services available through the Microsoft Azure cloud and then moved on to more detail around Microsoft Power BI:

[Photo: Antonio Zurlo of Microsoft]

 

After Antonio, yours truly (Brian Sentance, CEO, Xenomorph) gave a presentation on what we have been building with Microsoft over the past 18 months, the TimeScape MarketPlace. At this point in the presentation I was giving some introductory background on the challenges of regulatory compliance and the pros and cons between point solutions and having a more general data framework in place:

[Photo: Brian Sentance of Xenomorph presenting the TimeScape MarketPlace]

 

The event ended with some networking and further discussions. Big thanks to those who came forward to speak with me afterwards, great to get some early feedback.

[Photo: networking and further discussions after the event]

 

24 June 2014

Cloud, data and analytics in London. Tomorrow Wednesday 25th June.

One day to go until our TimeScape MarketPlace breakfast briefing "Financial Markets Data and Analytics. Everywhere You Need Them" at Merchant Taylor's Hall tomorrow, Wednesday June 25th. With over ninety people registered so far it should be a great event, so if you can make it please register and come along, it would be great to see you there.

19 June 2014

Cloud, data and analytics in London. Next Wednesday June 25th.

Less than one week to go until our TimeScape MarketPlace breakfast briefing "Financial Markets Data and Analytics. Everywhere You Need Them" at Merchant Taylor's Hall on Wednesday June 25th. 

Come and join Xenomorph, Aite Group and Microsoft for breakfast and hear Virginie O'Shea of the analyst firm Aite Group offering some great insights from financial institutions into their adoption of cloud technology, applying it to address risk management, data management and regulatory reporting challenges.

Microsoft will be showing how their new Power BI can radically change and accelerate the integration of data for business and IT staff alike, regardless of what kind of data it is, what format it is stored in or where it is located.

And Xenomorph will be demonstrating the TimeScape MarketPlace, our new cloud-based data mashup service for publishing and consuming financial markets data and analytics. 

In the meantime, please take a look at the event and register if you can come along, it would be great to see you there.

11 June 2014

Financial Markets Data and Analytics. Everywhere London Needs Them.

Pleased to announce that our TimeScape MarketPlace event "Financial Markets Data and Analytics. Everywhere You Need Them" is coming to London, at Merchant Taylor's Hall on Wednesday June 25th. 

Come and join Xenomorph, Aite Group and Microsoft for breakfast and hear Virginie O'Shea of the analyst firm Aite Group offering some great insights from financial institutions into their adoption of cloud technology, applying it to address risk management, data management and regulatory reporting challenges.

Microsoft will be showing how their new Power BI can radically change and accelerate the integration of data for business and IT staff alike, regardless of what kind of data it is, what format it is stored in or where it is located.

And Xenomorph will be demonstrating the TimeScape MarketPlace, our new cloud-based data mashup service for publishing and consuming financial markets data and analytics. 

In the meantime, please take a look at the event and register if you can come along, it would be great to see you there.

14 May 2014

Clients and Partners. Everywhere You Need Them.

Quick thank you to the clients and partners who took some time out of their working day to attend our breakfast briefing, "Financial Markets Data and Analytics. Everywhere You Need Them." at Microsoft's Times Square offices last Friday morning. Not particularly great weather here in Manhattan, so it was good to see around 60 folks turn up...

[Photo: attendees at Microsoft's Times Square offices]

 
Rupesh Khendry of Microsoft (Head of World-Wide Capital Markets Solutions) started the event and set out the agenda for the morning. Rupesh described the expense of data within financial markets, and the difficulties experienced by risk managers in pulling together all the data and analytics they need...

[Photo: Rupesh Khendry of Microsoft]
 
...and following Rupesh was Antonio Zurlo (below) of Microsoft (Senior Program Manager), who explained the fundamentals of Microsoft Azure and what services and infrastructure it offers, including public cloud, virtual private cloud and hybrid cloud architectures. Antonio also described a key usage pattern for HPC/grid on Azure, being used to "burst to the cloud" when on-premise infrastructure needs to be extended for end-of-day/intra-day risk calcs...
[Photo: Antonio Zurlo of Microsoft]
 
Sang Lee (below) of Aite Group (Managing Partner) then delivered his presentation "Floating in the Capital Markets Cloud: Moving Beyond Data Storage". Sang's main findings from a survey of 20 financial institutions were that concerns about security and SLAs relating to cloud usage remain, but even those that were concerned also said they were planning to start a cloud project within the next 24 months. Cloud technology seems to be becoming more acceptable of late, and Sang said this seems to be due to regulation, cost pressures and the desire to offer better services to clients. Sang confirmed that HPC/grid with "burst to the cloud" is a common usage pattern and that "Data as a Service" is becoming more popular...
[Photo: Sang Lee of Aite Group]
 
Fred Veasley (below) of Microsoft (Tech Solutions Professional) was next up, introducing Microsoft Power BI and Office 365. Fred explained how Power BI extends the capabilities of Excel with data search (finding and retrieving published data sources both within an organization and over the web), and its integration capabilities with standard databases, NoSQL databases, data standards such as OData and new APIs/sources of data such as Facebook. Once downloaded, the data can be shaped and merged with other datasets (for instance combining data from positions databases/systems with analytics and data from the cloud), and kept up to date automatically. In addition to Power BI, Power View enables great visualizations and interactive dashboards to be created, and once finalized these can be deployed centrally via web pages down to end users...
[Photo: Fred Veasley of Microsoft]
 
After Fred, Brian Sentance (below), CEO of Xenomorph, explained the origins of the TimeScape MarketPlace. Based on some discussions with Microsoft about 18 months back, the idea was effectively firstly to get TimeScape running in the Microsoft Azure cloud, secondly to turn the data management capabilities of TimeScape "upside-down" by using it as a means to upload and publish data to the cloud, and thirdly to provide one-to-many access to multiple sources of data via web interfaces and key delivery tools such as Microsoft Power BI. Put another way, without any local software or hardware infrastructure both business users and IT staff can access multiple data sources in the same format and using the same data model wherever the data is needed. In addition to .NET and Java interfaces to the TimeScape MarketPlace via OData, web API delivery into F#, Python, R and MATLAB are all in development...
[Photo: Brian Sentance of Xenomorph]
 
...and in addition to downloading data via Power BI, Brian also demonstrated how you could build on the data using Power View to create powerful analytical dashboard functionality that could be built and tested in Excel, then deployed centrally within a browser for access by users outside of Excel. He added that partners were one of the key aspects of the platform, and introduced the TimeScape MarketPlace Partner Program to get data, analytics and model vendors, plus software and service vendors, involved and building on the platform. Andrew Tognela (below) of Microsoft (Worldwide Managing Director) closed the presentations...
[Photo: Andrew Tognela of Microsoft]
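For anyone curious what consuming an OData feed actually looks like from code, here is a minimal Python sketch of the general pattern. The endpoint, entity set and field names below are entirely made up for illustration - they are not the real TimeScape MarketPlace API - but the query options ($filter, $select, $orderby) are standard OData.

```python
import requests

# Hypothetical OData v4 endpoint and entity set - illustrative only.
SERVICE_ROOT = "https://example-marketplace.invalid/odata"

def get_curve_points(curve_id, as_of_date):
    """Fetch rows from a hypothetical 'CurvePoints' entity set using standard OData query options."""
    params = {
        "$filter": f"CurveId eq '{curve_id}' and AsOfDate eq {as_of_date}",
        "$select": "Tenor,Rate",
        "$orderby": "Tenor",
        "$format": "json",
    }
    response = requests.get(f"{SERVICE_ROOT}/CurvePoints", params=params, timeout=30)
    response.raise_for_status()
    return response.json()["value"]   # OData v4-style responses wrap results in a 'value' array

if __name__ == "__main__":
    for point in get_curve_points("USD-LIBOR", "2014-05-09"):
        print(point["Tenor"], point["Rate"])
```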

02 May 2014

7 days to go - Financial Markets Data and Analytics. Everywhere You Need Them.

Quick reminder that there are just 7 days left to register for Xenomorph's breakfast briefing event at Microsoft's Times Square offices on Friday May 9th, "Financial Markets Data and Analytics. Everywhere You Need Them."

With 90 registrants so far it looks to be a great event with presentations from Sang Lee of Aite Group on the adoption of cloud technology in financial markets, Microsoft showing the self-service (aka easy!) data integration capabilities of Microsoft Power BI for Excel, and introducing the TimeScape MarketPlace, Xenomorph's new cloud-based data mashup service for publishing and consuming financial markets data and analytics.

Hope to see you there and have a great weekend!

 

17 April 2014

Regulatory, Compliance, and Risk Data Technology Challenges - PRMIA

The New York Chapter of PRMIA hosted "Regulatory, Compliance, and Risk Data Technology Challenges" at Credit Suisse's offices in New York last Thursday, 10th April. Abraham Thomas introduced the panelists, and Don Wesnofske started off by setting the scene for the evening's event.

Don outlined how in reaction to the 2008 Crisis the regulators now require data retention for up to 10 years or more. Don cited one particular example where data must be reconstructed within 24 to 48 hours for any date up to 7 years back, and said that this kind of "forensic" investigation capability was an important consideration for many financial institutions. He took us through a good presentation slide of his view on data management/risk architecture, and outlined how operational risk is comprised of people, process, technology and events. Don ended his presentation by taking us through Wikipedia's definition of "Big Data", and in particular talked about how data has a life cycle going through:

  • Production
  • Retention
  • Archive
  • Purged

Don then handed over to Luigi Mercone, Director of Engineering Strategy & Architecture at Credit Suisse. Luigi started by saying that to the business at CS, he is technical support, which involves asking "What is on fire today? And what's going to be on fire tomorrow?" Luigi described how some time back CS had a regulatory enquiry around their equities business which required them to reconstruct data from 2 years back.

The project to do this took around 4-5 months of database administrators' time to reconstruct the world as at that point in time (I guess because tape storage was being used, and this needed restoring to disk/database). This was for an equity order management system that had doubled in size every year for the past 17 years, and at that point CS was only retaining data going back 2 years. Luigi said that it was then realised that new regulations requiring the ability to produce forensic evidence at any point in time would potentially swamp CS's resources unless this was addressed head-on and strategically.

Luigi described the original architecture they were using as being based on an in-memory database for intraday workloads, then standard Sybase (probably ASE I guess) and then Sybase IQ for longer-term archiving, taking advantage of the column-store capabilities of Sybase IQ and the resulting data compression possible. He added that the data storage requirements of the system had grown from 150TB to 1.2PB in 4 years.

Luigi then offered a comparison of this original architecture with what he found by implementing RainStor: in the original architecture the Sybase IQ database compressed data down to 160TB, whereas this was improved by a further factor of 10, down to 14TB, using RainStor. He said that RainStor was self-service, providing a standard SQL interface, eliminated the need for tape storage, reduced the system "footprint" by 90% at CS, was 1/5 of the cost, and the performance was good. (I would caveat here that I know nothing of the original architecture other than the summary Luigi provided, and as such it is hard to judge whether the original architecture was optimal for the data growth experienced, and hence whether this was an objective comparison of Sybase IQ's capabilities with RainStor.) Luigi closed by saying that whilst RainStor was a great archive database, its origins were in in-memory databases and he would encourage RainStor to re-enter that market too, given his experience so far.

John Bantleman, CEO of RainStor, took over and described how RainStor had been designed specifically for the needs of data archiving (I guess talking more about what it does now rather than its origins outlined by Luigi above). He said that RainStor offers a 20-40x storage footprint reduction over traditional database technology and operates efficiently even at the PetaByte (PB) scale, based around RainStor's proprietary database technology, which makes use of columnar storage and is capable of storing data both in relational-style tabular format and in more "document"-style XML and JSON formats with key-value access. John mentioned that not only could RainStor retrieve data at a point in time, but it could also retrieve the schema being used at that point in time, for a more complete view of the state of the world at that point. This echoes a couple of past articles that I have penned, one for IRD and one for Wilmott Magazine, on bitemporal regulatory requirements.

John said that regulation was driving the need for data archiving capabilities, with 1400 regulations added since 2008 (not sure of the source, but believable) and the comment from a Chief Data Officer (CDO) at one financial markets client that if a project isn't driven by regulatory compliance then the project isn't going to get done (certainly sounds like regulatory overload). John's opening remarks were really around how regulatory cost, complexity and compliance were driving forces behind the growth of RainStor in financial services technology, and whilst regulation is the driver, firms should look at archiving of data as an opportunity too, in order to create value from corporate memory and to be proactive in addressing future reporting and analysis needs.

John illustrated the regulatory need for data archiving through the Consolidated Audit Trail (CAT) regulation, where data retention over 7 years will generate 100PB of data. He also mentioned SEC Rule 17a-4 for broker-dealers as another example of "data retention" regulation, with particular reference to storage of records in non-rewriteable, non-erasable format. John termed this WORM storage, meaning Write Once, Read Many. John seemed to imply that both the software (RainStor) and the hardware it runs on (e.g. EMC or Teradata etc) need to be WORM compliant. One of the audience members asked John about BCBS 239, to which John said that he didn't know that particular regulation (fair enough in my opinion: RainStor's tech is general about "data" and is applicable across many industries, whereas BCBS 239 is obviously about banks specifically and is more about data aggregation and reporting than data retention/archiving to my understanding, and this seems to be confirmed with a quick doc scan for "archive" or "retention").

To finish off the main part of the event (before the drinks and food began) there was a panel discussion. Luigi said that it was best to "prepare for all time, not just specifics" with respect to data retention and that there were dangers in rolling up data (effectively aggregating and losing granularity to reduce storage needs). John added that his definition of "Big Data" was "All information, for ever". Luigi added that implementing RainStor had allowed CS to spend more time on interesting questions rather than on database restoration. John proposed that version 1 of Big Data involved the retention of web data, and as such losing a data point here and there didn't matter. Version 2 of Big Data is concerned more with enterprise data, where all data has value and needs to be retained, i.e. lots of high-value data. He added that this was an opportunity for risk and compliance to become an asset.

[Photo: Abraham (second from left), Don (center) and John (second from right)]

Overall it was a good event which I found very interesting (but I have to admit to a certain geeky interest in this kind of tech). The event would perhaps have benefitted from another competitive or complementary technology vendor being involved, plus maybe an academic to give a different slant on data retention and on what the regulators hope to gain from this kind of mandated data retention. Not that the regulators have been that good at managing data themselves recently.

[Photo: networking afterwards, courtesy of Credit Suisse and RainStor]

15 April 2014

Financial Markets Data and Analytics. Everywhere You Need Them.

Very pleased to announce that Xenomorph will be hosting an event, "Financial Markets Data and Analytics. Everywhere You Need Them.", at Microsoft's Times Square New York offices on May 9th.

This breakfast briefing includes Sang Lee of the analyst firm Aite Group offering some great insights from financial institutions into their adoption of cloud technology, applying it to address risk management, data management and regulatory reporting challenges.

Microsoft will be showing how their new Power BI can radically change and accelerate the integration of data for business and IT staff alike, regardless of what kind of data it is, what format it is stored in or where it is located.

And Xenomorph will be introducing the TimeScape MarketPlace, our new cloud-based data mashup service for publishing and consuming financial markets data and analytics. More background and updates on MarketPlace in coming weeks.

In the meantime, please take a look at the event and register if you can come along, it would be great to see you there.

07 April 2014

When Big Data is not Big Understanding

Good article from Tim Harford (he of the enjoyable "Undercover Economist" books) in the FT last week called "Big data: are we making a big mistake?". Tim injects some healthy realism into the hype of Big Data without dismissing its importance and potential benefits. The article talks about the four claims often made when talking about Big Data:

  1. Data analysis often produces uncannily accurate results
  2. Make statistical sampling obsolete by capturing all the data
  3. Statistical correlation is all you need - no need to understand causation
  4. Enough data means that scientific or statistical models aren't needed

Now models can have their own problems, but I can see where he is coming from; for instance, 3 and 4 above seem to be in direct contradiction. I particularly like the comment later in the article that "causality won't be discarded, but it is being knocked off its pedestal as the primary fountain of meaning."

I also liked the definition from one of the academics mentioned of a big data set being one where "N = All", and the point that assuming you have "all" the data is an incorrect assumption behind some Big Data analysis. Large data sets can mean that sampling error is low, but sampling bias is still a potentially big problem - for example, everyone on Twitter is probably not representative of the population of the human race in general.
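To make the sampling-error versus sampling-bias point concrete, here is a small self-contained Python simulation (my own illustration, not from Harford's article, with made-up numbers): the estimate from a biased "Twitter-only" sample stays tight but wrong however large N gets, while the random-sample estimate converges to the truth.

```python
import random

random.seed(42)

# True proportion holding some opinion is 30%, but the 20% of people who are
# "active on Twitter" hold it 60% of the time (so a Twitter-only sample is biased).
def person():
    on_twitter = random.random() < 0.20
    holds_opinion = random.random() < (0.60 if on_twitter else 0.225)  # 0.2*0.6 + 0.8*0.225 = 0.30
    return on_twitter, holds_opinion

for n in (1_000, 100_000, 1_000_000):
    population = [person() for _ in range(n)]
    random_sample_estimate = sum(h for _, h in population) / n
    twitter_only = [h for on_t, h in population if on_t]
    biased_estimate = sum(twitter_only) / len(twitter_only)
    print(f"N={n:>9,}  random sample={random_sample_estimate:.3f}  Twitter-only={biased_estimate:.3f}")

# The random-sample estimate approaches the true 0.30 as N grows;
# the Twitter-only estimate stays near 0.60 no matter how big N gets - more data, same bias.
```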

So I will now press save on this blog post, publish it on Twitter and help reinforce the impression that Big Data is a hot topic... which it is, but not for everyone I guess is the point.


20 March 2014

#DMS London - Building a Flexible Enterprise Architecture

You can find A-Team's view on "Building a Flexible Enterprise Architecture" here. Some additional notes/thoughts:

  • I thought Neil van Lint of GoldenSource's comment about "putting lipstick on a pig" with reference to legacy architectures was pretty funny and apt.
  • The old Irish joke about asking for directions and receiving the response "Well I wouldn't start from here" is also amusing but too true with our industry and most large organisations.
  • "Schema on read, not on write" is getting my award for phrase of the month from NoSQL proponents (quote Amir from Mark Logic).
  • Agree that ETL is problematic/a big resource drain but unless starting from a greenfield site it is currently unavoidable.
  • I like the idea of FIBO (and decoupling data meaning from data structure) but still left unsure what it actually (practically) covers so far and how much it is used, despite the references to it by Peter of Nordea. I guess it is all a matter of semantics.
  • I knew little of TOGAF mentioned by Rupert but maybe that is because I am a techie no more (if I ever was).
  • Rupert came back to his "where are we?" and data map questions and asked the audience how many of them had a good handle on where data was used in which systems - unsurprisingly not many, with a Morgan Stanley guy saying that their monitoring systems were linked to the operational systems for a full inventory of data.
  • I agree that the regulators need to push standards directly on the industry - Amir ended the panel suggesting the regulators need to say things like "Thou shalt use FIBO".

[Photo: the first panel]

12 March 2014

S&P Capital IQ Risk Event #2 - Enterprise or Risk Data Strategy?

Christian Nilsson of S&P CIQ followed up Richard Burtsal's talk with a presentation on data management for risk, containing many interesting questions for those considering data for risk management needs. Christian started his talk by taking a time machine back to 2006, and asking what were the issues then in Enterprise Data Management:

  1. There is no current crisis - we have other priorities (we now know what happened there)
  2. The business case is still too fuzzy (regulation took care of this issue)
  3. Dealing with the politics of implementation (silos are still around, but cost and regulation are weakening politics as a defence?)
  4. Understanding data dependencies (understanding this throughout the value chain, but still not clear today?)
  5. The risk of doing it wrong (there are risks you will do data management wrong given all the external parties and sources involved, but what is the risk of not doing it?)

Christian then moved on to say the current regulatory focus is on clearer roadmaps for financial institutions, citing Basel II/III, Dodd-Frank/Volcker Rule in the US, challenges in valuation from IASB and IFRS, fund management challenges with UCITS, AIFMD, EMIR, MiFID and MiFIR, and Solvency II in the insurance industry. He coined the phrase "Regulation Goes Hollywood", with multiple versions of regulation like UCITS I, II, III, IV, V, VII having more versions than a set of Rocky movies.

He then touched upon some of the main motivations behind the BCBS 239 document and said that regulation had three main themes at the moment:

  1. Higher Capital and Liquidity Ratios
  2. Restrictions on Trading Activities
  3. Structural Changes ("ring fence" retail, global operations move to being capitalized local subsidiaries)

Some further observations were on what the implications will be of the effective "loss" of globalization within financial markets, and also on what can now be considered a risk-free asset (do such things now exist?). Christian then gave some stats on risk as a driver of data and technology spend, with $20-50B expected to be spent over the next 2-3 years (seems a wide range, nothing like a consensus from analysts I guess!).

The talk then moved on to what role data and data management plays within regulatory compliance, with for example:

  • LEI - Legal Entity Identifiers play out throughout most regulation, as a means to enable automated processing and as a way to understand and aggregate exposures.
  • Dodd-Frank - Data management plays a part within OTC processing and STP in general.
  • Solvency II - This regulation for insurers places emphasis on data quality/data lineage within capital reserve requirements.
  • Basel III - Risk aggregation and counterparty credit risk are two areas of key focus.

Christian outlined the small budget of the regulators relative to the biggest banks (a topic discussed in previous posts: how society wants stronger, more effective regulation but then isn't prepared to pay for it directly - although I would add that we all pay for it indirectly, but that is another story, in part illustrated in the document this post talks about).

In addition to the well-known term "regulatory arbitrage", dealing with different regulations in different jurisdictions, Christian also mentioned the increasingly used term "substituted compliance", where a global company tries to optimise which jurisdictions it and its subsidiaries comply within, with the aim of avoiding compliance in more difficult regimes through compliance within others.

I think Christian outlined the "data management dichotomy" within financial markets very well:

  1. Regulation requires data that is complete, accurate and appropriate
  2. Industry standards of data management and data are poorly regulated, and there is weak industry leadership in this area.

(not sure if it was quite at this point, but certainly some of the audience questions were about whether the data vendors themselves should be regulated which was entertaining).

He also outlined the opportunity from regulation in that it could be used as a catalyst for efficiency, STP and cost base reduction.

Obviously "Big Data" (I keep telling myself to drop the quotes, but old habits die hard) is hard to avoid, and Christian mentioned that IBM say that 90% of the world's data has been created in the last 2 years. He described the opportunities of the "3 V's" of Volume, Variety, Velocity and "Dark Data" (exploiting underused data with new technology - "Dark" and "Deep" are getting more and more use of late). No mention directly in his presentation but throughout there was the implied extension of the "3 V's" to "5 V's" with Veracity (aka quality) and Value (aka we could do this, but is it worth it?). Related to the "Value" point Christian brought out the debate about what data do you capture, analyse, store but also what do you deliberately discard which is point worth more consideration that it gets (e.g. one major data vendor I know did not store its real-time tick data and now buys its tick data history from an institution who thought it would be a good idea to store the data long before the data vendor thought of it).

I will close this post taking a couple of summary lists directly from his presentation, the first being the top areas of focus for risk managers:

  • Counterparty Risk
  • Integrating risk into the Pre-trade process
  • Risk Aggregation across the firm
  • Risk Transparency
  • Cross Asset Risk Reporting
  • Cost Management/displacement

The second list outlines the main challenges:

  • Getting complete view of risk from multiple systems
  • Lack of front to back integration of systems
  • Data Mapping
  • Data availability of history
  • Lack of Instrument coverage
  • Inability to source from single vendor
  • Growing volumes of data

Christian's presentation then put forward a lot of practical ideas about how best to meet these challenges (I particularly liked the risk data warehouse parts, but I am unsurprisingly biased). In summary, if you get the chance then see or take a read of Christian's presentation; I thought it was a very thoughtful document with some interesting ideas and advice put forward.

03 March 2014

See you at the A-Team Data Management Summit this week!

Xenomorph is sponsoring the networking reception at the A-Team DMS event in London this week, and if you are attending then I wanted to extend a cordial invite to you to attend the drinks and networking reception at the end of day at 5:30pm on Thursday.

In preparation for Thursday's agenda, the blog links below are a quick reminder of some of the main highlights from last September's DMS:

I will also be speaking on the 2pm panel "Reporting for the C-Suite: Data Management for Enterprise & Risk Analytics". So if you like what you have heard during the day, come along to the drinks and firm up your understanding with further discussion with like-minded individuals. Alternatively, if you find your brain is so full by then of enterprise data architecture, managed services, analytics, risk and regulation that you can hardly speak, come along and allow your cerebellum to relax and make sense of it all with your favourite beverage in hand. Either way you will leave the event more informed than when you went in... well, that's my excuse and I am sticking with it!

Hope to see you there!

06 December 2013

F# in Finance New York Style

Quick plug for the New York version of F# in Finance event taking place next Wednesday December 11th, following on from the recent event in London. Don Syme of Microsoft Research will be demonstrating access to market data using F# and TimeScape. Hope to see you there!

04 November 2013

Risk Data Aggregation and Risk Reporting from PRMIA

Another good event from PRMIA at the Harmonie Club here in NYC last week, entitled Risk Data Aggregation and Risk Reporting - Progress and Challenges for Risk Management. Abraham Thomas of Citi and PRMIA introduced the evening, setting the scene by referring to the BCBS document Principles for effective risk data aggregation and risk reporting, with its 14 principles to be implemented by January 2016 for G-SIBs (Globally Systemically Important Banks) and December 2016 for D-SIBs (Domestically Systemically Important Banks).

The event was sponsored by SAP and they were represented by Dr Michael Adam on the panel, who gave a presentation around risk data management and the problems of having data siloed across many different systems. Maybe unsurprisingly, Michael's presentation had a distinct "in-memory" focus to it, with Michael emphasizing the data analysis speed that is now possible using technologies such as SAP's in-memory database offering "Hana".

Following the presentation, the panel discussion started with a debate involving Dilip Krishna of Deloitte and Stephanie Losi of the Federal Reserve Bank of New York. They discussed whether the BCBS document and compliance with it should become a project in itself or part of existing initiatives to comply with data intensive regulations such as CCAR and CVA etc. Stephanie is on the board of the BCBS committee for risk data aggregation and she said that the document should be a guide and not a check list. There seemed to be general agreement on the panel that data architectures should be put together not with a view to compliance with one specific regulation but more as a framework to deal with all regulation to come, a more generalized approach.

Dilip said that whilst technology and data integration are issues, people are the biggest issue in getting a solid data architecture in place. There was an audience question about how different departments need different views of risk and how were these to be reconciled/facilitated. Stephanie said that data security and control of who can see what is an issue, and Dilip agreed and added that enterprise risk views need to be seen by many which was a security issue to be resolved. 

Don Wesnofske of PRMIA and Dell said that data quality was another key issue in risk. Dilip agreed and added that the front office needs to be involved in this (data management projects are not just for the back office in isolation) and that data quality was one of a number of needs that compete for resources/budget at many banks at the moment. Coming back to his people theme, Dilip also said that data quality needed intuition to be carried out successfully.

An audience question from Dan Rodriguez (of PRMIA and Credit Suisse) asked whether regulation was granting an advantage to "Too Big To Fail" organisations, in that only they have the resources to cope with the ever-increasing demands of the regulators, to the detriment of the smaller financial institutions. The panel did not completely agree with Dan's premise, arguing that smaller organizations were more agile and did not have the legacy and complexity of the larger institutions, so there was probably a sweet spot between large and small from a regulatory compliance perspective (I guess it was interesting that the panel did not deny that regulation was at least affecting the size of financial institutions in some way...)

Again focussing on where resources should be deployed, the panel debated trade-offs such as those between accuracy and consistency. The Legal Entity Identifier (LEI) initiative was thought of as a great start in establishing standards for data aggregation, and the panel encouraged regulators to look at doing more. One audience question was around the different and inconsistent treatment of gross notional and trade accounts. Dilip said that yes this was an issue, but came back to Stephanie's point that what is needed is a single risk data platform that is flexible enough to be used across multiple business and compliance projects.  Don said that he suggests four "views" on risk:

  • Risk Taking
  • Risk Management
  • Risk Measurement
  • Risk Regulation

Stephanie added that organisations should focus on the measures that are most appropriate to their business activity.

The next audience question asked whether the panel thought that the projects driven by regulation had a negative return. Dilip said that his experience was yes, they do have negative returns but this was simply a cost of being in business. Unsurprisingly maybe, Stephanie took a different view advocating the benefits side coming out of some of the regulatory projects that drove improvements in data management.

The final audience question was whether the panel thought it was possible to reconcile all of the regulatory initiatives like Dodd-Frank, Basel III, EMIR etc with operational risk. Don took a data angle to this question, talking about the benefits of big data technologies applied across all relevant data sets, and that any data was now potentially valuable and could be retained. Dilip thought that the costs of data retention were continually going down as data volumes go up, but that there were costs in capturing the data needed for operational risk and other applications. Dilip said that when compared globally across many industries, financial markets were way behind the data capabilities of many sectors, and that finance was more "Tiny Data" than "Big Data", and again he came back to the point that people were getting in the way of better data management. Michael said that many banks and market data vendors are dealing with data in the tens of TeraBytes range, whereas the amount of data in the world is around 8-900 PetaBytes (I thought we were already just over into ZettaBytes, but what are a few hundred PetaBytes between friends...).

Abraham closed off the evening, firstly by asking the audience if they thought the 2016 deadline would be achieved by their organisation. Only 3 people out of around 50+ said yes. Not sure if this was simply people's reticence to put their hand up, but when Abraham asked, one key concern for many was that the target would change by then - my guess is that we are probably back into the territory of the banks not implementing a regulation because it is too vague, and the regulators not being too prescriptive because they want feedback too. So a big game of chicken results, with the banks weighing up the costs/fines of non-compliance against the costs of implementing something big that they can't be sure will be acceptable to the regulators. Abraham then asked the panel for closing remarks: Don said that data architecture was key; Stephanie suggested getting the strategic aims in place but implementing iteratively towards those aims; Dilip said that deciding your goal first was vital; and Michael advised building a roadmap for data in risk.

21 October 2013

Credit Risk: Default and Loss Given Default from PRMIA

Great event from PRMIA on Tuesday evening of last week, entitled Credit Risk: The link between Loss Given Default and Default. The event was kicked off by Melissa Sexton of PRMIA, who introduced Jon Frye of the Federal Reserve Bank of Chicago. Jon seems to be an acknowledged expert in the field of Loss Given Default (LGD) and credit risk modelling. I am sure that the slides will be up on the PRMIA event page above soon, but much of Jon's presentation seems to be based around the following working paper. So take a look at the paper (which is good in my view), but I will stick to an overview and in particular any anecdotal comments made by Jon and the other panelists.

Jon is an excellent speaker, relaxed in manner, very knowledgeable about his subject, humourous but also sensibly reserved in coming up with immediate answers to audience questions. He started by saying that his talk was not going to be long on philosophy, but very pragmatic in nature. Before going into detail, he outlined that the area of credit risk can and will be improved, but that this improvement becomes easier as more data is collected, and inevitably that this data collection process may need to run for many years and decades yet before the data becomes statistically significant. 

Which Formula is Simpler? Jon showed two formulas for estimating LGD, one a relatively complex-looking formula (the Vasicek distribution mentioned in his working paper) and the other a simple linear model of the form a + b·x. Jon said that looking at the two formulas, many would hope that the second formula might work best given its simplicity, but he wanted to convince us that the first formula was in fact simpler than the second. He said that the second formula would need to be regressed on all loans to estimate its parameters, whereas the first formula depended on two parameters that most banks should have a fairly good handle on: Default Rate (DR) and Expected Loss (EL). The fact that these parameters are relatively well understood seemed to be the basis for saying the first formula was simpler, despite its relative mathematical complexity. This prompted an audience question on the difference between Probability of Default (PD) and Default Rate (DR). It turns out PD is the expected probability of default before default happens (so ex-ante) and DR is the realised rate of default (so ex-post).

Default and LGD over Time. Jon showed a graph (by an academic called Altman) of DR and LGD over time. When the DR was high (lots of companies failing, in a likely economic downturn) the LGD was also perhaps understandably high (so a high number of companies failing, against an economic background that is both part of the cause of the failures and not helping the loss recovery process). When DR is low, then there is a disconnect between LGD and DR. Put another way, when the number of companies failing is low, the losses incurred by those companies that do default can be high or low, with no discernible pattern. I guess I am not sure whether this disconnect is due to the smaller number of companies failing meaning the sample space is much smaller and hence the outcomes are more volatile (no averaging effect), or more likely that in healthy economic times the loss given a default is much more of a random variable, dependent on the defaulting company specifics rather than on the general economic background.

Conclusions Beware: Data is Sparse. Jon emphasised from the graph that the Altman data went back 28 years, of which 23 years were periods of low default, with 5 years of high default levels spread across only 3 separate recessions. From a statistical point of view this is very little data, which makes drawing any firm statistical conclusions about default and levels of loss given default very difficult and error-prone. 

The Inherent Risk of LGD. Jon here seemed to be focussed not on the probability of default, but rather on the conditional risk that once a default has occurred then how does LGD behave and what is the risk inherent from the different losses faced. He described how LGD affects i) Economic Capital - if LGD is more variable, then you need stronger capital reserves, ii) Risk and Reward - if a loan has more LGD risk, then the lender wants more reward, and iii) Pricing/Valuation - even if the expected LGD of two loans is equal, then different loans can still default under different conditions having different LGD levels.

Models of LGD

Jon showed a chart with LGD plotted against DR for six models (two of which I think he was involved in). All six models were dependent on three parameters (PD, EL and correlation), and all six seemed to produce almost identical results when plotted on the chart. Jon mentioned that one of his models had been validated (successfully I think, but with a lot of noise in the data) against Moody's loan data taken over the past 14 years. He added that he was surprised that all six models produced almost the same results, implying either that all the models were converging around the correct solution or, in total contrast, that all six were potentially subject to "group think" and were systematically wrong in the way the problem is looked at.

Jon then took one of his LGD models and compared it against the simple linear model, using simulated data. He showed a graph of data points for what he called a "lucky bank" with the two models superimposed over the top. The lucky bit came in because this bank's data points for DR against LGD showed lower DR than expected for a given LGD, and lower LGD for a given DR. In this specific case, Jon said that the simple linear model fits better than his non-linear one, but when repeated over many data sets his LGD model fitted better overall since it seemed to be less affected by random data.
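
Below is my rough attempt to reproduce the spirit of that experiment: simulate short, noisy (DR, LGD) histories from a "true" non-linear relationship, fit the linear model to each simulated bank, estimate the structural model's PD and EL from the same history, and see how often the straight line wins out of sample. The functional form, parameters, noise level and sample sizes are illustrative guesses of mine, not Jon's actual setup.

```python
# Illustrative simulation only - the functional form, parameters and noise are assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
pd_, el, rho = 0.02, 0.01, 0.15
k = (norm.ppf(pd_) - norm.ppf(el)) / np.sqrt(1 - rho)
true_lgd = lambda dr: norm.cdf(norm.ppf(dr) - k) / dr        # the "true" relationship

linear_wins = 0
for _ in range(1000):
    # each simulated bank has only 15 annual (DR, LGD) observations
    dr = rng.uniform(0.005, 0.08, size=15)
    lgd = np.clip(true_lgd(dr) + rng.normal(0, 0.1, size=15), 0.01, 0.99)
    # linear model: regress LGD on DR from the bank's own short history
    b, a = np.polyfit(dr, lgd, 1)
    # structural model: estimate PD and EL from the same history, keep rho fixed
    k_hat = (norm.ppf(dr.mean()) - norm.ppf((dr * lgd).mean())) / np.sqrt(1 - rho)
    # compare both against the true relationship on fresh default rates
    dr_new = rng.uniform(0.005, 0.08, size=500)
    err_lin = np.mean((a + b * dr_new - true_lgd(dr_new)) ** 2)
    err_struct = np.mean((norm.cdf(norm.ppf(dr_new) - k_hat) / dr_new - true_lgd(dr_new)) ** 2)
    linear_wins += err_lin < err_struct

print(f"the linear model wins for {linear_wins / 10:.0f}% of simulated banks")
```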

There were then a few audience questions as Jon closed his talk, one leading Jon to remind everyone of the scarcity of data in LGD modelling. In another Jon seemed to imply that he would favor using his model (maybe understandably) in the Dodd-Frank Annual Stress Tests for banks, emphasising that models should be kept simple unless a more complex model can be justified statistically. 

Steve Bennet and the Data Scarcity Issue 

Following Jon's talk, Steve Bennet of PECDC picked up on Jon's issue of scarce data within LGD modelling. Steve is based in the US, working for PECDC, a cross-border initiative to collect LGD and EAD (exposure at default) data. The basic premise seems to be that in dealing with the scarce data problem, we do not have 100 years of data yet, so in the meantime let's pool data across member banks and hence build up a more statistically significant data set - put another way: let's increase the width of the dataset if we can't control the depth. 

PECDC is a consortium of around 50 organisations that pool data relating to credit events. Steve said that they capture data fields per default at four "snapshot" times: origination, one year prior to default, at default and at resolution. He said that every bank that had joined the organisation had managed to improve its datasets. Following an audience question, he clarified that PECDC does not predict LGD with any of its own models, but rather provides the pooled data to enable the banks to model LGD better. 
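
Purely as an illustration of what pooling per-default data at those four snapshot times implies, a record might look something like the sketch below - the field names are my own guesses, not PECDC's actual data dictionary.

```python
# Hypothetical sketch of a pooled per-default record - field names are illustrative only.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DefaultObservation:
    facility_id: str                 # anonymised loan/facility reference
    sector: str                      # e.g. "SME", "large corporate", "shipping"
    region: str
    exposure_at_default: float       # EAD
    # the four snapshots: origination, one year prior to default, default, resolution
    origination_date: date
    one_year_prior_date: date
    default_date: date
    resolution_date: Optional[date]            # None until the workout is complete
    recovered_amount: Optional[float] = None

    @property
    def lgd(self) -> Optional[float]:
        """Realised loss given default, only known once the workout resolves."""
        if self.recovered_amount is None or self.exposure_at_default == 0:
            return None
        return 1 - self.recovered_amount / self.exposure_at_default
```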

Steve said that LGD turns out to be very different for different sectors of the market, particularly between SMEs and large corporations (levels of LGD for large corporations being more stable globally and less subject to regional variations). But also there is great LGD variation across specialist sectors such as aircraft finance, shipping and project finance. 

Steve ended by saying that PECDC was originally formed in Europe and is now attempting to get more US banks involved, with 3 US banks already participating and 7 waiting to join. There was an audience question on whether regulators allow pooled data to be used under Basel IRB - apparently Nordic regulators allow this due to the need for more data in a smaller market, European banks use the pooled data to validate their own data in IRB, but US banks must use their own data for the moment.

Til Schuermann

Following Steve, Til Schuermann added his thoughts on LGD. He said that LGD has a time variation and is not random, being worse in recession when DR is high. His stylized argument to support this was that in recession there are lots of defaults, leading to lots of distressed assets and that following the laws of supply and demand, then assets used in recovery would be subject to lower prices. Til mentioned that there was a large effect in the timing of recovery, with recovery following default between 1 and 10 quarters later. He offered words of warning that not all defaults and not all collateral are created equal, emphasising that debt structures and industry stress matter. 

Summary

The evening closed with a few audience questions and a general summation by the panelists of the main issues of their talks, primarily around models and modelling, the scarcity of data and how to be pragmatic in the application of this kind of credit analysis. 

07 October 2013

#DMSLondon - Big Data, Cloud, In-Memory

Andrew Delaney introduced the second panel of the day, with the long title of "The Industry Response: High Performance Technologies for Data Management - Big Data, Cloud, In-Memory, Meta Data & Big Meta Data". The panel included Rupert Brown of UBS, John Glendenning of Datastax, Stuart Grant of SAP and Pavlo Paska of Falconsoft. Andrew started the panel by asking what technology challenges the industry faced:

  • Stuart said that risk data on-demand was a key challenge, and that there was a related need to collapse the legacy silos of data.
  • Pavlo backed up Stuart by suggesting that accuracy and consistency were needed for all live data.
  • Rupert suggested that there has been a big focus on low latency and fast data, but raised a smile from the audience when he said that he was a bit frustrated by the "format fetishes" in the industry. He then brought the conversation back to some fundamentals from his viewpoint, talking about wholeness of data and namespaces/data dictionaries - Rupert said that naming data had been too stuck in the functional area and not considered more in isolation from the technology.
  • John said that he thought there were too many technologies around at the moment, particularly in the area of Not Only SQL (NoSQL) databases. John seemed keen to push NoSQL, and in particular Apache Cassandra, as post-relational databases. He put forward that these technologies, developed originally by the likes of Google and Yahoo, were the way forward, and that in-memory databases from traditional database vendors were "papering over the cracks" of relational database weaknesses.
  • Stuart countered John by saying that properly designed in-memory databases had their place but that some in-memory databases had indeed been designed to paper over the cracks, and this was the wrong approach, sometimes exacerbating the problem.
  • Responding to Andrew's questions around whether cloud usage was more accepted by the industry than it had been, Rupert said he thought it was although concerns remain over privacy and regulatory blockers to cloud usage, plus there was a real need for effective cloud data management. Rupert also asked the audience if we knew of any good release management tools for databases (controlling/managing schema versioning etc) because he and his group were yet to find one. 
  • Rupert expressed that Hadoop 2 was of more interest to him at UBS than Hadoop, and as a side note mentioned that map reduce was becoming more prevalent across NoSQL, not just within the Hadoop domain. Maybe controversially, he said that UBS was using less data than it used to and as such was not the "big data" organisation people might think it to be. 
  • As one example of the difficulties of dealing with silos, Stuart said that at one client it required the integration of data from 18 different systems to get an overall view of the risk exposure to one counterparty. Stuart advocated bringing the analytics closer to the data, enabling more than one job to be done on one system.
  • Rupert thought that Goldman Sachs and Morgan Stanley seem to do the right thing for their firms, laying out a long-term vision for data management. He said that a rethink was needed at many organisations since fundamentally a bank is a data flow.
  • Stuart picked up on this and said that there will be those organisations that view data as an asset and those that view data as an annoyance.
  • Rupert mentioned that in his view accountants and lawyers are getting in the way of better data usage in the industry.
  • Rupert added that data in Excel needed to be passed by reference and not passed by value (a toy sketch of the idea follows this list). This "copy confluence" was wasting disk space and was a source of operational problems for many organisations (a few past posts here and here on this topic).
  • Moving on to describe some of the benefits of semantic data and triple stores, Rupert proposed that the statistical world needed to be added to the semantic world to produce "Analytical Semantics" (see past post relating to the idea of "analytics management").
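
On Rupert's pass-by-reference point, here is a toy illustration (my example, not his) of why copying values out of a central store creates stale copies and reconciliation work that holding a reference avoids.

```python
# Toy illustration only - "curve_store" is a stand-in for a central golden source.
curve_store = {"USD.SWAP": [0.012, 0.015, 0.021]}   # hypothetical central store

# pass by value: the numbers are frozen into the sheet at copy time
sheet_by_value = {"curve": list(curve_store["USD.SWAP"])}

# pass by reference: the sheet keeps only the key and resolves it on demand
sheet_by_reference = {"curve_ref": "USD.SWAP"}
def resolve(sheet):
    return curve_store[sheet["curve_ref"]]

curve_store["USD.SWAP"][0] = 0.013                  # central data is corrected
print(sheet_by_value["curve"][0])                   # still 0.012 - a stale copy to reconcile
print(resolve(sheet_by_reference)[0])               # 0.013 - follows the golden source
```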

Great panel, lots of great insight with particularly good contributions from Rupert Brown.

#DMSLondon - Data Architecture: Sticks or Carrots?

Great day on Thursday at the A-Team Data Management Summit in London (personally not least because Xenomorph won the Best Risk Data Management/Analytics Platform Award but more of that later!). The event kicked off with a brief intro from Andrew Delaney of the A-Team talking through some of the drivers behind the current activity in data management, with Andrew saying that risk and regulation were to the fore. Andrew then introduced Colin Gibson, Head of Data Architecture, Markets Division at Royal Bank of Scotland.

Data Architecture - Sticks or Carrots? Colin began by looking at the definition of "data architecture" showing how the definition on Wikipedia (now obviously the definitive source of all knowledge...) was not particularly clear in his view. He suggested himself that data architecture is composed of two related frameworks:

  • Orderly Arrangement of Parts
  • Discipline 

He said that the orderly arrangement of parts is focussed on business needs and aims, covering how data is sourced, stored, referenced, accessed, moved and managed. On the discipline side, he said that this covered topics such as rules, governance, guides, best practice, modelling and tools.

Colin then put some numbers around the benefits of data management, saying that every dollar spent on centralising data saves 20 dollars, and mentioning a resulting 80% reduction in operational costs. Related to this, he said that every dollar spent on not replicating data saved a dollar on reconciliation tools and a further dollar on the use of those tools (not sure how the two overlap, but these are obviously some of the "carrots" from the title of the talk). 

Despite these incentives, Colin added that getting people to actually use centralised reference data remains a big problem in most organisations. He said he thought that people find it too difficult to understand and consume what is there, and faced with a choice they do their own thing as an easier alternative. Colin then talked about a program within RBS called "GoldRush" whereby there is a standard data management library available to all new projects in RBS which contains:

  • messaging standards
  • standard schema
  • update mechanisms

The benefit is that if a project conforms to the above standards then it has little work to do to manage reference data, since the work is done once and centrally. Colin also mentioned that there needs to be feedback from the projects back to the central data management team around what is missing or needs to be improved in the library (personally I would take it one step further so that end-users, and not just IT projects, have easy discovery of and access to centralised reference data). The lessons he took from this were that we all need to "learn to love" enterprise messaging if we are to get to the top-down publish once/consume often nirvana, where consuming systems can pick up new data and functionality without significant (if any) changes (might be worth a view of this post on this topic). He also mentioned the role of metadata in automating reconciliation where that needed to occur.
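
To make the idea concrete, a standards-conformant reference data message might look something like the hypothetical sketch below - the field names, versioning and actions are mine, not RBS's GoldRush library.

```python
# Purely hypothetical sketch of a "standard schema plus update mechanism" message.
import json

security_update = {
    "schema": "refdata.security",
    "schema_version": "1.3",          # consumers can ignore versions they don't know
    "action": "upsert",               # the update mechanism: upsert/correct/delete
    "as_of": "2013-10-03",
    "payload": {
        "identifier": {"scheme": "ISIN", "value": "XS0000000000"},   # placeholder ISIN
        "issuer_lei": "EXAMPLELEI0000000000",                        # placeholder LEI
        "currency": "GBP",
        "maturity": "2023-10-03",
    },
}

print(json.dumps(security_update, indent=2))
```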

Colin then mentioned that allocation of costs of reference data to consumers is still a hot topic, one where reference data lags behind the market data permissioning/metering insisted upon by exchanges. Related to this Colin thought that the role of the Chief Data Officer to enforce policies was important, and the need for the role was being driven by regulation. He said that the true costs of a tactical, non-standard approach need to be identifiable (quantifying the size of the stick I guess) but that he had found it difficult to eliminate the tactical use of pricing data sourced for the front office. He ended by mentioning that there needs to be a coming together of market data and reference data since operations staff are not doing quantitative valuations (e.g. does the theoretical price of this new bond look ok?) and this needs to be done to ensure better data quality and increased efficiency (couldn't agree more, have a look at this article and this post for a few of my thoughts on the matter). Overall very good speaker with interesting, practical examples to back up the key points he was trying to get across. 

21 June 2013

SIFMA - the antidote surfaces

Anyone who has followed this blog for a while will know that I (and others) have charted the decline over recent years of the SIFMA Tech exhibition that takes place each June at the Hilton on 6th Avenue in New York. Take a look at this post from 2011, and then this one from 2012. I must admit that I was shocked to see the size of the exhibition this year, with two relatively small areas in direct contrast to the five soccer pitches of previous years filled with vendor stands, exhibits, lounges and bars. 

Given this background, it is with some surprise that I can say Xenomorph has had a really good SIFMA in terms of getting to speak to clients, potential clients and partners. It helped that people seemed very interested in our TimeScape on the Windows Azure Cloud demos (more of which below), but I am under no delusion that the large number of Microsoft Surface RT tablets Microsoft had to give away to clients and partners was also a strong driver of attendance in our part of the exhibition hall. So it seems that it takes a lot more to persuade people to come to a fintech exhibition in these days of social media and online video. (As a long-time iPad fan, I was quite impressed by the Surface; the GUI is better than iOS but it still has a few flakey things that need addressing, not least of which is that I am apparently not allowed to use my corporate ID with Skype but only my personal email ID - I just love these user policy decisions from on high...)

Xenomorph was on the Microsoft booth, demoing TimeScape running on the Windows Azure Cloud, containing market and reference data from Interactive Data and Numerix pricing analytics, and using the "visual landscapes" from our new partner Aqumin. There was a lot of interest shown in our example demos on Azure of performance attribution, correlation matrix calculation, spread curve analysis, and option instrument and portfolio pricing analytics - I think the penny was beginning to drop for a number of people that none of the (relatively) complex analytics was going on locally and that they could access the analysis from anywhere on any device that had an internet connection, i.e. without any software to install. I also didn't hear so many people raise security concerns around cloud computing - maybe the pressure on operational costs in the market is driving some re-assessment of cloud computing? We also had a good panel discussion at the event with Microsoft and some of the above partners - as I was speaking I wasn't able to take notes but broadly the Numerix event from last week will give you a feel for what was said. 
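
For context on the sort of analytics being demoed, something like a correlation matrix over a few hundred instruments is trivial to express but grows quadratically with the universe - exactly the kind of work you would rather run server-side than on the device in your hand. The sketch below is plain pandas on random data, purely illustrative and not the TimeScape API.

```python
# Illustrative only: a 500 x 500 correlation matrix from a year of simulated daily prices.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 500)), axis=0)),
    columns=[f"ASSET_{i}" for i in range(500)],
)
returns = prices.pct_change().dropna()
corr = returns.corr()          # the heavy lifting happens wherever this code runs
print(corr.shape)              # (500, 500)
```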

Final thoughts go out to the Microsoft staff whose email addresses appeared in the SIFMA Tech literature - seeing some of the emails sent to them by people who wanted a free Surface but didn't get one (because, for example, they couldn't be bothered to actually come to the Microsoft booth...) is greatly revealing about human nature. There are still a lot of pushy people out there!

17 June 2013

Unifying Risk with Numerix, Tabb and Microsoft

Numerix ran a great event on Thursday morning over at Microsoft's offices here in New York. "The Road to Achieving a Unified View of Risk" was introduced by Paul Rowady of the TABB Group. As at our holiday event last December, Paul is a great speaker, and trying to get him to stop talking is the main (positive) problem of working with him (his typical ebullience was also heightened by his appearance in the Wall Street Journal on Thursday, apparently involving nothing illegal, he assured me, and about which his mother even phoned him during his presentation...). Paul started by saying that in the end-of-year review with his colleagues Larry Tabb and Adam Sussman, he had suggested that Tabb Group needed to put more into developing risk management thought leadership, which had led to today's introduction and the work Tabb Group have been doing with Numerix.

Having been involved in financial markets in Chicago, Paul is very bullish about the risk management capabilities of the funds and prop trading shops of the exchange-traded options markets from days of old, and said that these risk management capabilities are now needed and indeed coming to the mainstream financial markets. Put another way, post-crisis the need for a holistic view of risk has never been stronger. Considering bilateral OTC derivatives and the move towards central clearing, Paul said that he had been thinking that calculations such as CVA would eventually become as extinct as the dodo. However, on using some data from the DTCC trade repository, he found that there is still some $65 trillion notional of uncleared bilateral trades in the market, and that these will take a further 30 years to expire. Looking at swaptions alone the notional uncleared was $6 trillion, and so his point was that bilateral OTC trades and their associated risks will be around for some time yet.

Paul put forward some slides showing back, middle and front offices along different siloed business lines, and explained that back in the day when margins were fat and times were good, each unit could be run independently, with no overall view of risk possible given the range of siloed systems and data. In passing, Paul also mentioned that one bank he had spoken to had 6,000 separate systems to support on just the banking side, let alone capital markets. Obviously post-crisis this has changed, with pressure to reduce operational costs being a key driver at many institutions, and currently only valuation/reference data (+2.4%) and risk management (+1.2%) having increased budget spend across the market in 2013. Given operational costs and regulation such as CVA, risk management is having to move from being an end-of-day, post-trade process to being pre- and post-trade at intraday frequency. Paul said that not only must consistent approaches to data and analytics be taken across back, middle and front office in each business unit, but now an integrated view of risk across business units must be taken (echoes of an earlier event with Numerix and PRMIA). Considering consistent analytics, Paul mentioned his paper "The Risk Analytics Library" but suggested that "libraries" of everything were needed: not just analytics, but libraries of data (data management anyone?), metadata, risk models etc.

Paul asked Ricardo Martinez of Deloitte for an update on the regulatory landscape, and Ricardo responded by focusing on the derivatives aspects of Dodd-Frank. He first pointed out that, even after a number of years, the regulation is not yet finalized around collateral and clearing. A good point he made was that whilst the focus in the market at the moment is on compliance, he feels that the consequences of the regulation will ripple on over the next 5 years in terms of margining and analytics.

Some panel members disagreed with Paul over the premise that bilateral exotic trades will eventually disappear. Their point was that the needs of pension funds and other clients are very specific and there will always be a need for structured products, despite the capital cost incentives to move everything onto exchanges/clearing. Paul countered by saying that he didn't disagree with this, but that the reason for suggesting the exotics industry may die is the difficulty of finding institutions that can warehouse the risk of the trade. 

Satyam Kancharla of Numerix spoke next. Satyam said that two main changes struck him in the market at the moment. One was the adjustment to a mandated market structure, with clearing, liquidity and capital changes coming through from the regulators. The other was increased operating efficiency for investment banks. Whilst it is probable that no investment bank would ever get to the operational efficiency of a retail business like Walmart, this was nevertheless the direction of travel, with banks looking at how to optimize collateral, optimize trading venues etc.

Satyam put forward that computing power is still adhering to Moore's law, and that as a result some things are possible now that were not before, and that a centralized architecture built on this compute power is needed - but just because it is centralized does not mean that it is too inflexible to deal with each business unit's needs. Coming back to earlier comments made by the panel, he put forward that a lot of quants are involved in simply re-inventing the wheel, to which Paul added that quants were very experienced in using words like "orthogonal" to confuse mere mortals like him and justify the repetition of business functionality available already (from Numerix obviously, but more of that later). Satyam said that some areas of model development were more mature than others, and that quants should not engage in innovation for innovation's sake. Satyam also made a passing reference to the continuing use of Excel and VBA as the main tools of choice in the front office, suggesting that we still have some way to go in terms of IT maturity (hobby-horse topic of mine, for example see post). 

Prompted by an audience question around data and analytics, Ricardo said that the major challenge to sharing data was not technical but cultural. Against a background where maybe 50% of investment in technology is regulation-related, he said that there was no shortage of business ideas for P&L in the emerging "mandated" markets of the future, but many of these ideas required wholesale shifts in attitudes at the banks in terms of co-operation across departments and from front to back office. 

Satyam said that he thought of data and analytics as two sides of the same coin (could not agree more, but then again I would say that) in that analytics generate derived data which needs just as much management as the raw data. He said that it should be possible to have systems and architectures that manage the duality of data and analytics well, and these architectures did not have to imply rigidity and inflexibility in meeting individual business needs. 
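
A minimal sketch of that duality point, assuming nothing about any particular vendor's schema: a derived number stored with enough lineage (inputs, model, version, timestamp) that it can be governed and audited just like raw data.

```python
# Illustrative sketch only - names and fields are mine, not from any vendor's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DerivedValue:
    name: str                       # e.g. "CVA", "PV", "VaR99"
    value: float
    model: str                      # which analytic produced it
    model_version: str
    inputs: dict                    # identifiers of the raw data used
    computed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

cva = DerivedValue(
    name="CVA",
    value=125_000.0,
    model="hypothetical_cva_model",
    model_version="2.1.0",
    inputs={"curve": "USD.OIS/2013-06-13", "cds": "CPTY123.CDS/2013-06-13"},
)
print(cva)
```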

There was then some debate about trade repositories for derivatives, where the panel discussed the potential conflict between the US regulators wanting competition in this area and, as Paul suggested, competition between DTCC, ICE, Bloomberg, LCH Clearnet etc also leading to fragmentation. As such, Paul put it that the regulators would need to "boil the ocean" to understand the exposures in the market. Ricardo also mentioned some of the current controversy over who owns the data in a trade repository. One of the panelists suggested that we should also keep an eye on China and not get totally tied up in what is happening in "our" markets. The main point was that a huge economy such as China's could not survive without a sophisticated capital market to support it, and that China is not asleep in this regard.

A good audience question came from Don Wesnofske, who asked how best to cope with the situation where an institution is selling derivatives based on one set of models and the client is using another set of models to value the same trade. The selling institution then decides to buy/build a model similar to the client's too, and Don wondered how a single analytics library practically helps this situation where he could price on one model and report P&L using another. One panelist responded that it is mostly the assumptions behind each model that determine differences in price, and that heterogeneous models and hence prices are needed for a market to function correctly. Another concurred and suggested there needed to be an "officially blessed" model within an institution against which valuations are compared. Amusingly for the audience, Steve O'Hanlon (CEO of Numerix) piped up that the problem was easy to resolve in that everyone should use Numerix's models. 

Mike Opal of Microsoft closed the event with his presentation on data, analytics and cloud computing. Mike started by illustrating that the number of internet-enabled devices passed the human population of the world in 2008 and that by 2020 the number of devices will be 50 billion. He showed that the amount of data in the world was 0.8ZB (zettabytes) in 2009, and is projected to reach 8ZB by 2015 and 35ZB by 2020, driven primarily by the growth in internet-enabled devices. Mike also said that the Prism project so in the news of late involved the construction of a server farm near Salt Lake City of 5ZB in size, so what the industry (in this case the NSA) is trying to do would have been unimaginable only a few years ago. He said that Microsoft itself is utterly committed to cloud computing, with 8 datacenters globally and 20 more in construction, at a cost of $500 million per center (I recently saw a datacentre in Redmond, totally unlike what I expected, with racks pre-housed in lorry containers and the containers just unloaded within a gigantic hangar and plugged in - the person showing me around asked me who the busiest person at a Microsoft data center was, and the answer was the truck drivers...)

Talking of "Big Data", he first gave the now-standard disclaimer (as I have I acknowledge) that he disliked the phrase. I thought he made a good point in the Big Data is really about "Small Data", in that a lot of it is about having the capacity to analyze at tiny granular level within huge datasets (maybe journalists will rename it? No, don't think so). He gave a couple of good client case studies, one for Westpac and one for Phoenix on uses of HPC and cloud computing in financial services. He also mentioned the Target retailing story about Big Data, which if you haven't caught it is worth a read. One audience question asked him again how committed Microsoft was to cloud computing given competition from Amazon, Apple and Google. Mike responded that he had only joined Microsoft a year or two back, and in part this was because he believed Microsoft had to succeed and "win" the cloud computing market given that cloud was not the only way to go for these competitors, whereas Microsoft (being a software company) had to succeed at cloud (so far Microsoft have been very helpful to us in relation to Azure, but I guess Amazon and others have other plans.)

In summary a great event from Numerix with good discussions and audience interaction - helped for me by the fact that much of what was said (centralization with flexibility, duality of data and analytics, libraries of everything etc) fits with what Xenomorph and partners like Numerix are delivering for clients. 
