Subject:
Re:Achja
Author: Eidolon
Date: 10.10.24 21:24
In reply to: Re:Achja by Alopex

>Interesting, tell me more. :-)

Diving five years into the past really is interesting.
I have pseudonymized my project report from back then.
Here you can now read my project from that time in full:
[http://www.denkeviel.de/maniac/CapstoneReportEidolon.pdf]

Of course I would do many things differently today, with five more years of experience and the new possibilities of generative AI, which however also brings the hallucination problem with it. But I bet the fundamental result would not change.

I did not look at individual stocks but at a market index (the DJIA, of course).

The task I set myself was the following:

Problem Statement
ETFs (Exchange Traded Funds) are on the rise in stock market trading. This investment class replicates the companies represented in a stock market index such as the DJIA, either directly (by holding the stocks of the DJIA companies) or "synthetically" (meaning part of the ETF is invested outside DJIA companies but still replicates DJIA performance). Bottom line: by investing in a DJIA-based ETF, one invests in the performance of the DJIA index. Each day, investors holding DJIA-based ETF shares face the question of whether to hold, buy or sell their assets, based on their expectation of the short-, medium- and long-term development of the DJIA.

Core problem statement:
A machine learning model shall predict the future development direction of the DJIA
index based on historical data to support the ETF investor in decision-making.

The machine learning model shall use historical data of the DJIA index performance, and data about the "sentiment" (mood) of news and market forecasts. The stock index prediction should address various timeframes (e.g. "tomorrow", "1 week ahead") so as to give the ETF investor a baseline for decision-making over various trading timeframes. The development direction of the stock index is expressed categorically only ("small increase", "huge decrease" etc.). The rationale for choosing categorization (instead of keeping it a regression problem) is that the prediction accuracy is expected upfront not to be very high. Therefore, the result of the model should not lead the investor to believe that it is a highly accurate forecast. Using categorical forecast values (i.e. an ML classification problem) instead of numerical forecast values (i.e. an ML regression problem) achieves this goal.
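
A minimal sketch of how such categorical targets could be derived from closing prices. The thresholds, the intermediate label names and the helper names below are hypothetical; the post does not give the report's actual bins. Eight categories are assumed only because that would match the 12.5% guessing benchmark mentioned in the conclusion.

```python
from bisect import bisect_right

# Hypothetical bin edges in percent; the report's real thresholds are not stated.
EDGES = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

# Eight labels, one per interval defined by the seven edges above.
# Only "small increase" and "huge decrease" appear in the post; the rest are assumed.
LABELS = [
    "huge decrease", "large decrease", "moderate decrease", "small decrease",
    "small increase", "moderate increase", "large increase", "huge increase",
]


def forward_return_pct(closes, horizon):
    """Percent change from each day to `horizon` trading days later."""
    return [
        (closes[i + horizon] / closes[i] - 1.0) * 100.0
        for i in range(len(closes) - horizon)
    ]


def categorize(return_pct):
    """Map a percent return to its category label (a return of exactly 0 counts as 'small increase')."""
    return LABELS[bisect_right(EDGES, return_pct)]


def targets(closes, horizon):
    """Category labels for a given horizon, e.g. 1 for 'tomorrow', 5 for '1 week ahead'."""
    return [categorize(r) for r in forward_return_pct(closes, horizon)]
```

Calling `targets(closes, 1)` and `targets(closes, 5)` on the same price series would give one label column per timeframe, so each horizon can be trained as its own classification problem.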

Metrics
The appropriate evaluation metric is accuracy, defined as the share of predictions in which the predicted DJIA performance category matches the actual DJIA performance category.
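
As an illustration, accuracy over category labels can be computed as below. This is a plain-Python sketch with a hypothetical function name, not the code used in the report.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted category equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one prediction")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

For example, two correct labels out of four predictions give an accuracy of 0.5, i.e. 50%.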

Ethics
The datasets used to tackle the problem are uncritical from an ethical point of view. They do not contain any personal data; only publicly available, non-personal data is used. However, if it were really possible to build a stock market prediction model with consistently high accuracy (>>50%) that enables investors to consistently "outwit the market", serious ethical implications would arise and would need further discussion, especially regarding the distribution and publication of the model. Is it ethical if one investor, or a small group of investors, exploits the market? What would happen if the model were published and everyone started using it? What kind of stock market would such a "global-scale feedback loop" create? Philosophical questions would also arise, such as "is it really that easy to understand the psyche of humankind with regard to stock market participation?"


I arrived at the following result:

Conclusion
It is interesting to note how important it is to pick the right machine learning algorithm and to tune the hyperparameters correctly to optimize the algorithm's performance. The accuracy ranges from 8.1% (worst performing, not optimized) to 27.4% (best performing, optimized), with an average of 19.2%.
The most important quality of my project is that I could indeed outperform the "guessing" benchmark of 12.5%, meaning my resulting model is roughly twice as good as just guessing the evolution of the DJIA. But even this "important" quality is useless in reality. Most significant is the fact that a "simple" machine learning pipeline like this one has far worse prediction accuracy than expert market analysts ('Expert' benchmark at 48%), and clearly fails to raise any ethical red flags ('Ethics' benchmark at 51%).
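
The arithmetic behind the benchmark comparison can be checked with a short sketch. It assumes the 12.5% "guessing" benchmark means uniform random guessing among eight equally likely categories (1/8 = 12.5%); the post does not state the number of categories explicitly, and the function names are hypothetical.

```python
def uniform_guess_accuracy(n_classes):
    """Expected accuracy of guessing uniformly among equally likely classes."""
    return 1.0 / n_classes


def lift(model_accuracy, baseline_accuracy):
    """How many times better (or worse) a model is than a baseline."""
    return model_accuracy / baseline_accuracy


baseline_pct = uniform_guess_accuracy(8) * 100.0  # 12.5 under the eight-class assumption

# Accuracies in percent, as quoted in the conclusion.
best_vs_guessing = lift(27.4, baseline_pct)   # about 2.19
worst_vs_guessing = lift(8.1, baseline_pct)   # about 0.65, i.e. worse than guessing
best_vs_expert = lift(27.4, 48.0)             # about 0.57
```

Under that assumption the best model beats guessing by a factor of about 2.2, while the worst untuned model falls below the guessing baseline.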


>It doesn't work.

Even today, five years later, I would still bet my data scientist hat on that any day.



EDIT: I don't want to withhold the closing remark of my project conclusion from you either; I couldn't even remember it. :-)

The model should not be used to support investment decisions.

Then again, this outcome of my project fits my expectation for the problem: an "unsolvable" one. If there really were a model that consistently achieved >>50% accuracy, the ethical problems outlined in the introductory chapter would arise.
Come to think of it: if I had really solved this problem (i.e. achieved a prediction accuracy of >>50%), maybe I would not have written this project report, but instead kept the model to myself and gotten rich with it? Or would I have reported the model to the authorities? Who knows... ;-)

