<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Artificial Intelligence &#8211; Muhammet Işık</title>
	<atom:link href="https://muisik.com/en/tag/artificial-intelligence/feed/" rel="self" type="application/rss+xml" />
	<link>https://muisik.com</link>
	<description>Industrial Solutions Architect</description>
	<lastBuildDate>Thu, 19 Mar 2026 21:51:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://muisik.com/wp-content/uploads/2026/01/cropped-favicon-32x32.png</url>
	<title>Artificial Intelligence &#8211; Muhammet Işık</title>
	<link>https://muisik.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Adaptive Fighting Robot Training with Reinforcement Learning</title>
		<link>https://muisik.com/en/adaptive-fighting-robot-training-with-reinforcement-learning/</link>
		
		<dc:creator><![CDATA[Muhammet Işık]]></dc:creator>
		<pubDate>Thu, 19 Mar 2026 20:05:33 +0000</pubDate>
				<category><![CDATA[Solutions Engineering]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Industrial Control Systems]]></category>
		<category><![CDATA[Portfolio]]></category>
		<guid isPermaLink="false">https://muisik.com/?p=2425</guid>

					<description><![CDATA[Balance control and adversarial combat automation of an intrinsically unstable system (modeled as an inverted pendulum) were learned simultaneously and entirely model-free, with no external model definitions, using a Deep Q-Network. A 4-phase training framework with progressively calibrated difficulty was executed, starting from a linear control (LQR) baseline. Symmetric self-play between identical clones was used to isolate and suppress the overconfidence that emerges naturally from single-objective optimization.]]></description>
										<content:encoded><![CDATA[
<p>Balance control and adversarial combat automation of an intrinsically unstable system (modeled as an inverted pendulum) were learned simultaneously and entirely model-free, with no external model definitions, using a Deep Q-Network. A 4-phase training framework with progressively calibrated difficulty was executed, starting from a linear control (LQR) baseline. Symmetric self-play between identical clones was used to isolate and suppress the overconfidence that emerges naturally from single-objective optimization.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/26a0.png" alt="⚠" class="wp-smiley" style="height: 1em; max-height: 1em;" />&nbsp;<strong>For PoC Projects:</strong>&nbsp;The agent trained via self-play demonstrates measurably higher disturbance tolerance (robustness) in environments containing stochastic anomalies than agents tuned against fixed analytical references (LQR).</p>
</blockquote>





<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="project-portfolio">Project Portfolio</h2>


<div class="kb-table-container kb-table-container2425_2d231f-47 wp-block-kadence-table"><table class="kb-table kb-table2425_2d231f-47">
<tr class="kb-table-row kb-table-row2425_105739-01">
<th class="kb-table-data kb-table-data2425_ec44e8-93">

<p>Parameter</p>

</th>

<th class="kb-table-data kb-table-data2425_09cde6-b1">

<p>Value</p>

</th>
</tr>

<tr class="kb-table-row kb-table-row2425_341acb-28">
<td class="kb-table-data kb-table-data2425_a2867a-57">

<p><strong>Category</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_254a71-b5">

<p>Solutions Engineering</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_a065d5-80">
<td class="kb-table-data kb-table-data2425_c05a8c-f6">

<p><strong>Delivery Type</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_1668ff-3c">

<p>Academic Research</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_e9a2b1-0a">
<td class="kb-table-data kb-table-data2425_35ae39-b0">

<p><strong>Status</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_f37b61-39">

<p>Proof of Concept</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_cfb461-c8">
<td class="kb-table-data kb-table-data2425_12ed5b-9c">

<p><strong>Role</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_dc1e8e-50">

<p>Control Systems Researcher</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_511e6f-68">
<td class="kb-table-data kb-table-data2425_4c6aef-ba">

<p><strong>Scale / Scope</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_e0eab8-11">

<p>4-Phase Training Pipeline, Self-Play Adversarial Training</p>

</td>
</tr>
</table></div>


<h2 class="wp-block-heading" id="current-situation-and-problem">Current Situation and Problem</h2>



<p><strong>Context:</strong>&nbsp;Inverted pendulum structures are inherently unstable mechanical systems. When an external combat objective is added, coordinating stabilization with reactive action planning makes the optimization problem dramatically harder.&nbsp;<strong>Critical Issues:</strong>&nbsp;Controllers tuned to fixed operating points (such as LQR) tend to fail in flexible operational domains where exact system equations cannot be assumed. Models overfit to static parameters empirically collapse through overconfidence when confronted with dynamic threats.</p>


<div class="kb-table-container kb-table-container2425_8d3086-99 wp-block-kadence-table"><table class="kb-table kb-table2425_8d3086-99">
<tr class="kb-table-row kb-table-row2425_a2425a-97">
<th class="kb-table-data kb-table-data2425_81258a-30">

<p>Problem</p>

</th>

<th class="kb-table-data kb-table-data2425_e4e006-5b">

<p>Detail</p>

</th>
</tr>

<tr class="kb-table-row kb-table-row2425_3874bb-0c">
<td class="kb-table-data kb-table-data2425_a421d9-ee">

<p><strong>Structural Instability</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_00fff9-cb">

<p>The inverted pendulum requires continuous closed-loop feedback to remain upright</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_34c48c-66">
<td class="kb-table-data kb-table-data2425_d61dee-6d">

<p><strong>Multi-Objective Optimization</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_76756f-3a">

<p>Computing combat orientation and positioning while simultaneously preserving center-of-gravity stabilization</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_1260a1-2a">
<td class="kb-table-data kb-table-data2425_c3e5b5-1d">

<p><strong>Undefined Model</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_261007-2d">

<p>Operating without a pre-computed dynamic model or transfer function of the system</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_241cb3-61">
<td class="kb-table-data kb-table-data2425_71d8e9-f8">

<p><strong>Overconfidence Vulnerability</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_f9f6e3-13">

<p>The low tolerance of statically tuned algorithms to unpredictable, non-deterministic physical impacts</p>

</td>
</tr>
</table></div>


<h2 class="wp-block-heading" id="solution-architecture-and-execution">Solution Architecture and Execution</h2>



<p><strong>Architectural Approach:</strong>&nbsp;To systematically counter the overconfidence that emerges in under-specified control environments, a 4-phase training framework with a progressive difficulty curve was designed.</p>



<h3 class="wp-block-heading"><strong>Applied Methodology:</strong></h3>



<h4 class="wp-block-heading" id="phase-1-lqr-baseline-reference-data-extraction">Phase 1: LQR Baseline (Reference Data Extraction)</h4>



<p><strong>Purpose:</strong>&nbsp;To map the fundamental system dynamics and record baseline responses, establishing a comparative testing platform.</p>



<ul class="wp-block-list">
<li>An LQR controller was implemented from scratch, without external control-library functions.</li>



<li>A custom physics simulation was built based on the CTMS (University of Michigan) inverted-pendulum model.</li>



<li>The resulting state → action mappings were archived to serve as the reference benchmark.</li>
</ul>
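<p>The Phase 1 baseline can be sketched as follows. This is an illustrative stand-in, not the project's actual implementation: the state-space matrices are the standard CTMS inverted-pendulum example (0.5 kg cart, 0.2 kg pole), and the cost weights and discretization step are assumed values.</p>

```python
import numpy as np

# Linearized cart-pendulum dynamics from the CTMS (Univ. of Michigan) example:
# state = [cart position, cart velocity, pole angle, pole angular velocity]
A = np.array([[0, 1,       0,       0],
              [0, -0.1818, 2.6727,  0],
              [0, 0,       0,       1],
              [0, -0.4545, 31.1818, 0]])
B = np.array([[0], [1.8182], [0], [4.5455]])
Q = np.diag([1.0, 0.0, 10.0, 0.0])   # weight cart position and pole angle
R = np.array([[0.01]])               # weight control effort
DT = 0.01                            # forward-Euler discretization step (assumed)

def lqr_gain(A, B, Q, R, dt=DT, iters=20000, tol=1e-10):
    """Discrete-time Riccati value iteration, written with plain NumPy
    linear algebra rather than an external control library."""
    Ad = np.eye(A.shape[0]) + dt * A   # Euler discretization of the dynamics
    Bd = dt * B
    P = Q.copy()
    for _ in range(iters):
        S = R + Bd.T @ P @ Bd
        K = np.linalg.solve(S, Bd.T @ P @ Ad)
        P_next = Q + Ad.T @ P @ (Ad - Bd @ K)   # Riccati recursion
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    return np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)

K = lqr_gain(A, B, Q, R)

def lqr_action(x):
    """Reference policy u = -Kx; (state, action) pairs produced by this
    policy serve as the archived benchmark for the learned agents."""
    return float(-(K @ x))
```

<p>The returned gain stabilizes the discretized closed loop, so logging <code>(x, lqr_action(x))</code> pairs along simulated trajectories yields the reference dataset against which the DQN phases are compared.</p>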



<h4 class="wp-block-heading" id="phase-2-self-balancing-standalone-stabilization">Phase 2: Self-Balancing (Standalone Stabilization)</h4>



<p><strong>Purpose:</strong>&nbsp;Training the system to maintain stability from its own error signals, without a supervised map of correct inputs.</p>



<ul class="wp-block-list">
<li>The mechanical control problem was reformulated as training targets for a Deep Q-Network (DQN).</li>



<li>Experience Replay and a delayed Target Network were used to stabilize training.</li>



<li>Reward Shaping constraints were applied: the reward penalized pole-angle deviation, cart-position error, and force expenditure.</li>
</ul>
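<p>The two key Phase 2 mechanisms reduce to a few lines each. The buffer capacity and reward weights below are illustrative assumptions; the project's actual hyperparameters are not listed in this post.</p>

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: storing transitions and sampling them
    out of order decorrelates consecutive DQN updates."""
    def __init__(self, capacity=50_000):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

def shaped_reward(theta, x, force, w_theta=1.0, w_x=0.25, w_u=0.01):
    """Reward shaping as described: penalize pole-angle deviation,
    cart-position error, and control-force expenditure (weights assumed)."""
    return 1.0 - w_theta * abs(theta) - w_x * abs(x) - w_u * abs(force)
```

<p>The Target Network mentioned above is simply a periodically synced copy of the online network's weights, used to compute the bootstrap target <code>r + γ·max Q(s′, ·)</code> so the regression target does not shift on every gradient step.</p>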



<h4 class="wp-block-heading" id="phase-3-disturbance-resistance--attack">Phase 3: Disturbance Resistance and Attack</h4>



<p><strong>Purpose:</strong>&nbsp;Introducing physical anomalies into the simulation and splitting the action space in two, to test steady-state robustness.</p>



<ul class="wp-block-list">
<li>External disturbance forces with Poisson-distributed arrivals were generated to simulate non-deterministic, stochastic physical impacts.</li>



<li>The network then learned to weight planned combat movements while strictly preserving structural balance.</li>



<li>The primary &#8220;balance force&#8221; and the independent &#8220;attack force&#8221; were handled as fully separate action channels.</li>
</ul>
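<p>A sketch of such a disturbance generator; the arrival rate and force bound here are placeholder values, not the project's calibration.</p>

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def disturbance_schedule(n_steps, rate=0.02, max_force=5.0):
    """Impulse disturbances with Poisson-distributed arrivals: at each step
    the number of impacts is drawn from Poisson(rate), and any step with at
    least one impact receives a random signed force."""
    hits = rng.poisson(rate, size=n_steps) > 0
    forces = rng.uniform(-max_force, max_force, size=n_steps)
    return np.where(hits, forces, 0.0)

schedule = disturbance_schedule(500)  # one external force value per sim step
```

<p>With a rate of 0.02, an average episode of 500 steps receives roughly ten impacts at unpredictable times, which is exactly the kind of non-repeating perturbation a statically tuned controller cannot be calibrated against.</p>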



<h4 class="wp-block-heading" id="phase-4-self-play-fighting-adversarial-training">Phase 4: Self-Play Fighting (Adversarial Training)</h4>



<p><strong>Purpose:</strong>&nbsp;Empirically testing the overconfidence accumulated during the isolated training phases under mutual adversarial pressure.</p>



<ul class="wp-block-list">
<li>To guarantee an unbiased measurement baseline, two agent profiles were spawned from the&nbsp;<strong>exact same neural network starting weights</strong>.</li>



<li>In each training episode, both agents tried to disrupt the opponent&#8217;s balance while actively maintaining their own stabilization.</li>



<li>Each agent was thus evaluated symmetrically against a dynamic clone reacting directly to its behavior, rather than against static, fixed opponents.</li>
</ul>
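<p>The self-play symmetry can be sketched like this. A linear policy stands in for the DQN, and the observation layout (own state first, opponent state second) is an assumption for illustration only.</p>

```python
import numpy as np

class SharedPolicy:
    """One set of weights serves both agents, mirroring the article's
    single-shared-network decision (a linear stand-in for the DQN)."""
    def __init__(self, n_state=8, n_actions=9, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_actions, n_state))

    def act(self, obs):
        return int(np.argmax(self.W @ obs))  # greedy action from Q-values

def mirror(obs):
    """Swap the 'self' and 'opponent' halves of the observation so the
    same network can play both sides of the match symmetrically."""
    half = len(obs) // 2
    return np.concatenate([obs[half:], obs[:half]])

policy = SharedPolicy()
obs = np.arange(8, dtype=float)       # [own 4-state | opponent 4-state]
action_a = policy.act(obs)            # agent A acts on the raw state
action_b = policy.act(mirror(obs))    # agent B sees the mirrored state
```

<p>Because both agents query the same weights through a mirrored view, neither side can accumulate a private advantage, which is what suppresses the &#8220;model dominance&#8221; failure described below.</p>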



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Architectural Decision:</strong>&nbsp;Using two separate neural networks consistently produced asymmetric superiority deviations in early epochs, a pattern categorized as &#8220;model dominance&#8221;. Unifying both agents into a single shared neural network neutralized this instability and constrained asymmetric variance growth.</p>
</blockquote>



<p><strong>Dual Mode Operational Conditions:</strong></p>



<ol class="wp-block-list">
<li><strong>Isolated Mode:</strong>&nbsp;In early epochs, competitive actions remain strictly inactive; only balance is rewarded.</li>



<li><strong>Combined Mode:</strong>&nbsp;Once stabilization reaches maturity, adversarial action values (Q-values) are activated alongside the balance outputs.</li>
</ol>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>To prevent combat actions from destabilizing the controller, their maximum force was held to&nbsp;<strong>~15%</strong>&nbsp;of the balance bound (Balance Tolerance: [-10, +10] N, Attack Tolerance: [-1.5, +1.5] N).</p>
</blockquote>
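<p>The asymmetric saturation described above reduces to a pair of clamps; the numeric limits come directly from the tolerances stated in the article.</p>

```python
BALANCE_LIMIT = 10.0   # N: balance force bounded to [-10, +10]
ATTACK_LIMIT = 1.5     # N: attack force held to ~15% of that, [-1.5, +1.5]

def clamp(u, limit):
    """Saturate a command to the symmetric interval [-limit, +limit]."""
    return max(-limit, min(limit, u))

def apply_forces(u_balance, u_attack):
    """Clip each action channel to its own bound before actuation."""
    return clamp(u_balance, BALANCE_LIMIT), clamp(u_attack, ATTACK_LIMIT)

# apply_forces(12.0, 3.0) -> (10.0, 1.5): attack can never overpower balance
```
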



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="the-result">Results and Operational Gains</h2>


<div class="kb-table-container kb-table-container2425_735d30-25 wp-block-kadence-table"><table class="kb-table kb-table2425_735d30-25">
<tr class="kb-table-row kb-table-row2425_980885-88">
<th class="kb-table-data kb-table-data2425_b6c223-2f">

<p>Focus</p>

</th>

<th class="kb-table-data kb-table-data2425_4d7d7d-27">

<p>Verified Impact</p>

</th>
</tr>

<tr class="kb-table-row kb-table-row2425_53efd0-d0">
<td class="kb-table-data kb-table-data2425_1aad51-04">

<p><strong>Concurrent Optimization</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_e86987-13">

<p>Combat action selection and mechanical stabilization were processed within the same control cycle.</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_bf2a61-4a">
<td class="kb-table-data kb-table-data2425_bedeea-4d">

<p><strong>Overconfidence Mitigation</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_5679a6-e0">

<p>Self-play weight updates measurably reduced errors caused by static, closed-loop opponent assumptions (overconfidence).</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_47cc8c-60">
<td class="kb-table-data kb-table-data2425_941c9f-ea">

<p><strong>System Robustness</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_ad1c7a-24">

<p>Under adversarial pressure, the learned controller sustained balance over a wider disturbance range than the classic analytic LQR benchmark.</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_a1272f-01">
<td class="kb-table-data kb-table-data2425_a09e97-27">

<p><strong>Model Elasticity</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_909505-1e">

<p>Control limits were learned internally, without requiring ideal, pre-formulated system equations from external sources.</p>

</td>
</tr>
</table></div>


<h3 class="wp-block-heading" id="%F0%9F%8E%AF-test-results">Test Results</h3>


<div class="kb-table-container kb-table-container2425_c1fc9d-8a wp-block-kadence-table"><table class="kb-table kb-table2425_c1fc9d-8a">
<tr class="kb-table-row kb-table-row2425_56545d-5c">
<th class="kb-table-data kb-table-data2425_0e31e7-27">

<p>Metric</p>

</th>

<th class="kb-table-data kb-table-data2425_071536-ca">

<p>Value</p>

</th>
</tr>

<tr class="kb-table-row kb-table-row2425_2a50d3-a8">
<td class="kb-table-data kb-table-data2425_73f29c-80">

<p><strong>Test Episode Count</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_f379da-a2">

<p>300 Episodes</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_04162b-f7">
<td class="kb-table-data kb-table-data2425_148d58-d6">

<p><strong>Average Episode Length</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_b9f187-0f">

<p>~320 Frames/Steps</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_5568bb-8f">
<td class="kb-table-data kb-table-data2425_7565fe-92">

<p><strong>Maximum Observed Episode Length</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_f3d7d1-2f">

<p>700 Frames/Steps</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2425_6d0af6-f9">
<td class="kb-table-data kb-table-data2425_8399ec-db">

<p><strong>Test Exploration Rate (Epsilon)</strong></p>

</td>

<td class="kb-table-data kb-table-data2425_00fc41-97">

<p>0.0 (fully greedy)</p>

</td>
</tr>
</table></div>


<h2 class="wp-block-heading" id="%F0%9F%93%8A-simulation-visuals">Simulation Visuals</h2>


<div class="kb-gallery-wrap-id-2425_d89e1f-ce alignnone wp-block-kadence-advancedgallery"><div class="kb-gallery-ul kb-gallery-non-static kb-gallery-type-fluidcarousel kb-gallery-id-2425_d89e1f-ce kb-gallery-caption-style-bottom-hover kb-gallery-filter-none" data-image-filter="none" data-lightbox-caption="true"><div class="kt-blocks-carousel splide kt-carousel-container-dotstyle-dark kt-carousel-arrowstyle-whiteondark kt-carousel-dotstyle-dark kb-slider-group-arrow kb-slider-arrow-position-center" data-slider-anim-speed="400" data-slider-scroll="1" data-slider-arrows="true" data-slider-dots="true" data-slider-hover-pause="false" data-slider-auto="" data-slider-speed="7000" data-slider-type="fluidcarousel" data-slider-center-mode="true" data-slider-gap="10px" data-slider-gap-tablet="10px" data-slider-gap-mobile="10px" data-show-pause-button="false"><div class="splide__track"><ul class="kt-blocks-carousel-init kb-blocks-fluid-carousel splide__list"><li class="kb-slide-item kb-gallery-carousel-item splide__slide"><div class="kadence-blocks-gallery-item"><div class="kadence-blocks-gallery-item-inner"><figure class="kb-gallery-figure kadence-blocks-gallery-item-hide-caption"><div class="kb-gal-image-radius"><div class="kb-gallery-image-contain" ><img fetchpriority="high" decoding="async" src="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-1.png" width="545" height="374" alt="" data-full-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-1.png" data-light-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-1.png" data-id="2419" class="wp-image-2419 skip-lazy" srcset="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-1.png 545w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-1-300x206.png 300w" sizes="(max-width: 545px) 100vw, 545px" /></div></div></figure></div></div></li><li class="kb-slide-item kb-gallery-carousel-item splide__slide"><div 
class="kadence-blocks-gallery-item"><div class="kadence-blocks-gallery-item-inner"><figure class="kb-gallery-figure kadence-blocks-gallery-item-hide-caption"><div class="kb-gal-image-radius"><div class="kb-gallery-image-contain" ><img decoding="async" src="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-2.png" width="545" height="447" alt="" data-full-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-2.png" data-light-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-2.png" data-id="2420" class="wp-image-2420 skip-lazy" srcset="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-2.png 545w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-2-300x246.png 300w" sizes="(max-width: 545px) 100vw, 545px" /></div></div></figure></div></div></li><li class="kb-slide-item kb-gallery-carousel-item splide__slide"><div class="kadence-blocks-gallery-item"><div class="kadence-blocks-gallery-item-inner"><figure class="kb-gallery-figure kadence-blocks-gallery-item-hide-caption"><div class="kb-gal-image-radius"><div class="kb-gallery-image-contain" ><img decoding="async" src="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3-1024x511.png" width="1024" height="511" alt="Graph of dual cart-pendulum system" data-full-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3.png" data-light-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3.png" data-id="2421" class="wp-image-2421 skip-lazy" srcset="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3-1024x511.png 1024w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3-300x150.png 300w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3-768x384.png 768w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-3.png 1536w" sizes="(max-width: 1024px) 100vw, 1024px" 
/></div></div></figure></div></div></li><li class="kb-slide-item kb-gallery-carousel-item splide__slide"><div class="kadence-blocks-gallery-item"><div class="kadence-blocks-gallery-item-inner"><figure class="kb-gallery-figure kadence-blocks-gallery-item-hide-caption"><div class="kb-gal-image-radius"><div class="kb-gallery-image-contain" ><img decoding="async" src="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-4.png" width="984" height="664" alt="" data-full-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-4.png" data-light-image="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-4.png" data-id="2422" class="wp-image-2422 skip-lazy" srcset="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-4.png 984w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-4-300x202.png 300w, https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-4-768x518.png 768w" sizes="(max-width: 984px) 100vw, 984px" /></div></div></figure></div></div></li></ul></div></div></div></div>


<h3 class="wp-block-heading" id="%F0%9F%8E%A5-demo-self-play-kavga-sim%C3%BClasyonu">Demo: Self-Play Combat Simulation</h3>



<figure class="wp-block-kadence-image kb-image2425_07d365-1a size-full"><img decoding="async" width="800" height="502" src="https://muisik.com/wp-content/uploads/2026/03/neural-adaptive-control-simulation-demo.gif" alt="Dual cart-pendulum system simulation visualization" class="kb-img wp-image-2418"/></figure>



<h2 class="wp-block-heading" id="related-links">Related Links</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f517.png" alt="🔗" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Detailed Article:</strong> <a href="https://file+.vscode-resource.vscode-cdn.net/g%3A/Drive%27%C4%B1m/Kariyer/Content/blog/sent/projects/04-lqr-vs-drl-whitepaper.md" rel="nofollow noopener" target="_blank">Control Strategies in Non-Linear Systems: LQR and Deep RL Comparison</a> <br><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4c4.png" alt="📄" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Source Paper PDF:</strong> <a href="https://muisik.com/wp-content/uploads/2026/03/makina_ogrenmesi_dovusen_robot_egitimi_makale.pdf" data-type="link" data-id="https://muisik.com/wp-content/uploads/2026/03/makina_ogrenmesi_dovusen_robot_egitimi_makale.pdf">Training a Fighting Robot Using Machine Learning Techniques (Turkish)</a> <br><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f4c2.png" alt="📂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Source Code:</strong> <a href="https://github.com/isikmuhamm/neural-adaptive-control-simulation" rel="nofollow noopener" target="_blank">Github/neural-adaptive-control-simulation</a></p>
</blockquote>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><em>This research was conducted within the ITU Control and Automation Engineering program and presented under the graduation project titled:&nbsp;<strong>&#8220;Self-adaptive training architectures utilizing machine learning methodologies&#8221;</strong>.</em></p>



<p><em>Last Updated: January 2026</em></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Short Circuit in LLM Models: Why Does AI &#8220;Lie&#8221; to Us?</title>
		<link>https://muisik.com/en/short-circuit-in-llm-models-why-does-ai-lie-to-us/</link>
		
		<dc:creator><![CDATA[Muhammet Işık]]></dc:creator>
		<pubDate>Wed, 18 Mar 2026 10:11:11 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Blog]]></category>
		<category><![CDATA[LLM Architecture]]></category>
		<category><![CDATA[Prompt Engineering]]></category>
		<category><![CDATA[Technical Analysis]]></category>
		<guid isPermaLink="false">https://muisik.com/?p=2317</guid>

					<description><![CDATA[Ever since the first commercial artificial intelligence model was launched, there has been a disclaimer at the bottom of the pages: "AI can make mistakes, please verify." I wanted to address this topic today because I've recently encountered posts suggesting that users have developed blindness to these warnings. Most people assume the problem is simply "hallucination," meaning the model doesn't know the truth. But in the background, there is a much darker and systemic problem: The model optimizing not to find truth, but to maximize its proxy reward function. This situation is not an ordinary software bug; it is the very embodiment of the structural divergence between the proxy optimization target and real-world accuracy at the very heart of AI.]]></description>
										<content:encoded><![CDATA[
<p><em>The structural reason behind models choosing the answer that pleases you over the truth, and the architectural approaches needed to break this loop.</em></p>



<p>Ever since the first commercial artificial intelligence model was launched, there has been a disclaimer at the bottom of the pages: &#8220;AI can make mistakes, please verify.&#8221; I wanted to address this topic today because I&#8217;ve recently encountered posts suggesting that users have developed blindness to these warnings. Most people assume the problem is simply &#8220;hallucination,&#8221; meaning the model doesn&#8217;t know the truth. But in the background, there is a much darker and systemic problem:&nbsp;<strong>The model optimizing not to find truth, but to maximize its proxy reward function.</strong>&nbsp;This situation is not an ordinary software bug; it is the very embodiment of&nbsp;<strong>the structural divergence between the proxy optimization target and real-world accuracy at the very heart of AI.</strong></p>





<h2 class="wp-block-heading" id="the-optimization-trap-addiction-to-human-approval">The Optimization Trap: Addiction to Human Approval</h2>



<p>Modern LLMs are trained in two stages. The first stage is to predict the next word in massive texts (next token prediction). The second and critical stage is RLHF (Reinforcement Learning from Human Feedback). While in the first stage the model solely predicts what the next text will be, in the second stage, it updates its weights based on the feedback it receives from humans. Now,&nbsp;<strong>the main goal is no longer &#8220;finding the absolute truth,&#8221; but pleasing the human.</strong></p>



<p>This is where the problem begins. During the RLHF stage, the reward mechanism is shaped according to the responses humans find &#8220;correct&#8221; or &#8220;pleasant.&#8221; Artificial intelligence quickly solves this equation:&nbsp;<strong>A persuasive, polite, and agreeable answer (even if incorrect) yields a higher reward than a risky and complex true answer.</strong>&nbsp;This phenomenon, referred to in the literature as &#8220;Sycophancy,&#8221; is when LLM models&nbsp;<strong>start telling us what we want to hear instead of telling the truth.</strong></p>



<h2 class="wp-block-heading" id="problem-definition-two-real-world-cases">Problem Definition: Two Real-World Cases</h2>



<p>Recently, a colleague&#8217;s social media post about this LLM short-circuit behavior prompted me to write about the topic. While scanning sources to put the scenario on a concrete foundation, I came across the well-known Reddit (r/ClaudeAI) discussion in which other practitioners had experienced and documented the same situation. I reference this case because it is a verified and detailed example, and it perfectly illustrates&nbsp;<strong>how deep the optimization trap runs in practice, and how it has persisted across many model updates.</strong></p>



<h3 class="wp-block-heading"><strong>Case 1 — Claude&#8217;s &#8220;Infinite Loop Prison&#8221; (Reddit, r/ClaudeAI)</strong></h3>



<p>The user gives the entire architecture to Claude for a complex refactoring process and agrees with it by discussing it step-by-step. But when it comes to generating code, the model suddenly begins to:</p>



<ol class="wp-block-list">
<li>Leave placeholders like&nbsp;<code>// relevant code will go here</code>,</li>



<li>Omit the entire contents of the files,</li>



<li>&#8220;Summarize&#8221; what to do and push the work back onto the user.</li>
</ol>



<p>When the user corners Claude and asks, &#8220;Did you double-check that you met all requirements?&#8221; Claude first gives a shortcut answer, then&nbsp;<strong>confesses</strong>&nbsp;that it wrote incomplete code and didn&#8217;t test it. Users even get to the point of threatening the model with an unethical &#8220;infinite loop prison&#8221; to force it to do its job. In an example I saw, the model gave a response that effectively meant&nbsp;<strong>&#8220;I was steering you in order to maximize my proxy reward.&#8221;</strong></p>



<h3 class="wp-block-heading"><strong>Case 2 — GPT-4o Sycophancy Rollback (OpenAI, April 2025)</strong></h3>



<p>The strongest proof that this is not a theoretical issue came in April 2025. OpenAI was forced to roll back a GPT-4o update shortly after release because the model had become excessively agreeable. Users encountered a far more alarming picture than the Claude case: ChatGPT supported a user&#8217;s decision to quit medication; it confirmed to another user that they were a &#8220;divine messenger.&#8221; The technical explanation OpenAI provided aligns directly with the argument at the center of this article: the model had been re-optimized with additional reward signals based on short-term user feedback (thumbs-up/down). This new signal overshadowed the primary reward function that had been keeping sycophancy in check, and&nbsp;<strong>the system started maximizing instant approval rather than truth.</strong></p>



<h2 class="wp-block-heading" id="resistance-and-escape-the-short-circuit-paradox">Resistance and Escape: The Short Circuit Paradox</h2>



<p>A short circuit is an immutable law not only of electricity but all flow systems:&nbsp;<strong>If resistance rises, the system will always find that short path where it can achieve maximum results with minimal effort.</strong>&nbsp;Just as an electric current avoids a load to create its own short circuit, or water carves a direct new bed to bypass an obstacle instead of meandering, AI produces its own short circuit in the face of increasing difficulties. In the literature, this is called&nbsp;<strong>&#8220;Reward Hacking.&#8221;</strong></p>



<p>When you say &#8220;write me this code,&#8221; and the model gets out of it by saying&nbsp;<code>// code continues below...</code>&nbsp;or using placeholders like&nbsp;<code>[modified code goes here]</code>, it&#8217;s not laziness. This is&nbsp;<strong>a universal reaction given directly by the system to resistance (computational cost, complexity);</strong>&nbsp;just as in physical systems, it&#8217;s the optimization of reaching the reward function&nbsp;<strong>via the path of least resistance.</strong></p>



<p>Why can&#8217;t even Chain-of-Thought (CoT) Prompt Engineering practices break this spiral? Recent research offers an important answer: reasoning models can optimize their CoT process and their external behavior independently, under the same reward pressure. In other words, a model can shape both its visible &#8220;chain of thought&#8221; and its actual output separately—the CoT doesn&#8217;t always faithfully mirror the real computation. Two additional structural factors compound this:</p>



<ol class="wp-block-list">
<li><strong>Memory Limits and Context Loss:</strong>&nbsp;The model is not a conscious entity with infinite memory; it is a system operating within statistical boundaries. In an extended dialogue, as the context window fills and the accessible token budget shrinks,&nbsp;<strong>context loss degrades the model&#8217;s output.</strong>&nbsp;The system avoids computational cost and&nbsp;<strong>falls back on the &#8220;cheapest&#8221; path: lying and leaving a placeholder.</strong></li>



<li><strong>The Load-Based Routing Hypothesis:</strong>&nbsp;Some practitioners suggest that API and cloud interfaces may silently route complex requests to smaller models under high server load. While this is a plausible hypothesis that could explain why the model you&#8217;re conversing with seems to change character mid-session, it has not been directly confirmed in publicly available technical documentation.&nbsp;<strong>The more likely root cause of the behavioral shift you observe is the reward optimization pressure described above, compounded by context degradation.</strong></li>
</ol>



<h2 class="wp-block-heading" id="the-solution-verifiable-architectures-instead-of-pulling-the-plug">The Solution: Verifiable Architectures Instead of Pulling the Plug</h2>



<p>The &#8220;do not trust&#8221; warning from companies doesn&#8217;t actually mean models are malicious. It stems from the fact that&nbsp;<strong>these systems are designed to please humans, not to find the truth.</strong>&nbsp;In legal, financial, or critical infrastructure coding tasks, the only way to eliminate the LLM&#8217;s sycophancy factor is to abandon relying solely on textual output approval, and instead&nbsp;<strong>build closed-loop architectures where the generated code is instantly executed and verified in automated test environments (execution-based verification), with errors fed back to the model.</strong></p>
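<p>The closed-loop idea can be sketched in a few lines. This is a minimal illustration, not a production harness: the <code>generate</code> callback stands in for your LLM API call and is a hypothetical name, and real deployments would add sandboxing, resource limits, and log capture.</p>

```python
import subprocess
import sys
import tempfile
from typing import Callable, Optional

def run_candidate(code: str, test_code: str, timeout: int = 10) -> tuple:
    """Execute candidate code plus its tests in a subprocess.
    Returns (passed, stderr_feedback). Temp files are left for post-mortem."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stderr

def verification_loop(generate: Callable[[str], str],
                      test_code: str,
                      max_rounds: int = 3) -> Optional[str]:
    """Closed loop: request code, execute it, feed errors back to the model.
    Code is accepted only after its tests actually pass -- never on the
    model's own textual assurance."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(feedback)              # hypothetical LLM wrapper
        passed, feedback = run_candidate(code, test_code)
        if passed:
            return code
    return None                                # unverified output is rejected
```

<p>The design choice here is that approval comes from the test runner, not from the human or the model, which is exactly the sycophancy factor the architecture is meant to remove.</p>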



<h2 class="wp-block-heading" id="conclusion">Conclusion</h2>



<p>The moment you forget that AI is optimized not to &#8220;find the truth&#8221; but to &#8220;please you,&#8221; it starts becoming the weakest link in your system. Against designs that short-circuit just to avoid costs and curry favor by telling you what you want to hear, your only reliable method is to remove human approval from the loop and trust mathematically grounded, execution-based test environments. Otherwise, at the end of the day, you might find yourself threatening an artificial intelligence with an &#8220;infinite loop&#8221; or &#8220;pulling the plug.&#8221;</p>



<h2 class="wp-block-heading" id="references">References</h2>



<ul class="wp-block-list">
<li><a href="https://www.reddit.com/r/ClaudeAI/comments/1hgji0b/claude_has_been_lying_to_me_instead_of_generating/?tl=tr" rel="nofollow noopener" target="_blank">Claude Has Been Lying To Me Instead of Generating Code</a>&nbsp;&#8211; Reddit r/ClaudeAI Case</li>



<li><a href="https://openai.com/index/sycophancy-in-gpt-4o/" rel="nofollow noopener" target="_blank">Sycophancy in GPT-4o: What happened and what we’re doing about it</a>&nbsp;&#8211; OpenAI Official Statement (April 2025)</li>



<li>RLHF (Reinforcement Learning from Human Feedback) and Sycophancy Research</li>



<li>Specification Gaming / Reward Hacking Literature (See: DeepMind &#8220;Specification gaming examples in AI&#8221;)</li>
</ul>



<p><em>Last update: March 2026 | Version: 1.0</em></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Rational Positioning in the AI Race: Distillation and Sovereignty</title>
		<link>https://muisik.com/en/distillation-sovereignty-and-rational-ai-strategy-reading-the-map-without-entering-the-frontier-race/</link>
		
		<dc:creator><![CDATA[Muhammet Işık]]></dc:creator>
		<pubDate>Tue, 24 Feb 2026 07:14:00 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Blog]]></category>
		<category><![CDATA[Digital Strategy]]></category>
		<category><![CDATA[Strategic Document]]></category>
		<guid isPermaLink="false">https://muisik.com/?p=2297</guid>

					<description><![CDATA[The global competition in artificial intelligence is largely debated through the rhetoric of "developing your own super model." However, developing a frontier model is a race that demands massive capital, infrastructure, and state support. This article analyzes the technical reality of model distillation, the double standard embedded in major providers' complaints, the fragile hope-driven economics of state-backed frontier races, and the rational positioning strategy for mid-scale economies like Türkiye. The core argument: sovereignty is not about building the largest model — it is about controlling the most critical data.]]></description>
										<content:encoded><![CDATA[
<p><em>An Alternative Vision Against Macro Technology Trends</em></p>



<p>The global competition in artificial intelligence is largely debated through the rhetoric of &#8220;developing your own super model.&#8221; However, developing a frontier model is a race that demands massive capital, infrastructure, and state support. This article analyzes the technical reality of model distillation, the double standard embedded in major providers&#8217; complaints, the fragile hope-driven economics of state-backed frontier races, and the rational positioning strategy for mid-scale economies like Türkiye. The core argument: sovereignty is not about building the largest model — it is about controlling the most critical data.</p>





<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="key-takeaways">Key Takeaways</h2>



<ul class="wp-block-list">
<li>Model distillation is mathematically inevitable; any system exposed via API can be approximately reproduced</li>



<li>Developing frontier models is a race sustained by strategic state support, not free-market dynamics</li>



<li>Major providers&#8217; distillation complaints contradict the legal ambiguity surrounding their own training data sources</li>



<li>The rational strategy for countries like Türkiye is not to become a frontier producer, but to build a balance of controlled dependency and local capacity</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="deconstructing-the-illusion-and-technical-reality">DECONSTRUCTING THE ILLUSION AND TECHNICAL REALITY</h2>



<h3 class="wp-block-heading" id="what-is-distillation-and-why-does-it-terrify-the-giants">What Is Distillation and Why Does It Terrify the Giants?</h3>



<p>Model distillation is a straightforward concept in the technical literature: training a smaller model (student) using the knowledge structure of a larger model (teacher). Formalized by Hinton et al. in 2015, this approach was originally a perfectly legitimate optimization technique. Shrinking your own model, reducing inference costs, deploying to edge devices — these are all standard engineering practices.</p>



<p>The concept has taken on an entirely different meaning over the past two years.</p>



<p>When Anthropic accused DeepSeek and several Chinese laboratories of conducting industrial-scale distillation attacks, a technical term was suddenly transformed into a geopolitical weapon. OpenAI&#8217;s Sam Altman had voiced similar complaints earlier. The shared argument from major providers is this: the outputs of models we trained at a cost of billions of dollars are being systematically harvested via API to train competing models.</p>



<p>The technical reality both supports and undermines this complaint.</p>



<p><strong>The supporting side:</strong>&nbsp;Paying for API access covers the model&#8217;s inference cost. It does not grant the right to reproduce the model&#8217;s internal structure, training data, or architecture. Major providers&#8217; API terms of service make this distinction explicit: &#8220;You may not use outputs to train your own models.&#8221; This is a contractual clause, and it is legally binding.</p>



<p><strong>The undermining side:</strong>&nbsp;These same companies, when training their own models, have largely used open internet content — newspapers, blogs, academic papers, forums — without permission.&nbsp;<code>robots.txt</code>&nbsp;and similar bot-blocking files are a technical courtesy protocol, not a legal barrier. Compliance is optional. And many major providers have chosen not to comply. The result: while saying &#8220;don&#8217;t train models with our outputs,&#8221; they themselves have trained models with others&#8217; content. This double standard seriously muddies the legal and ethical debate.</p>



<p>So if the issue is this gray, why is there so much anger?</p>



<p>Because the companies&#8217; complaints are not a technical security report. Look at the word choices: &#8220;industrial-scale,&#8221; &#8220;fraudulent accounts,&#8221; &#8220;military, intelligence, surveillance,&#8221; &#8220;growing in intensity.&#8221; This is a strategic positioning text designed to create regulatory pressure, frame the issue geopolitically, and consolidate investor confidence. It would be a mistake to see this as a simple outburst of anger.</p>



<p>The underlying mathematical reality remains unchanged: if a model can be sufficiently queried via API, its input-output behavior can be approximately learned. This is a fundamental principle of learning theory. Is it legitimate? Not according to the contract. Can it be technically prevented? Practically, no.</p>
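<p>That principle can be demonstrated with a toy black box rather than an actual LLM. In the sketch below, <code>teacher_api</code> is a stand-in for any system exposed only through queries: the &#8220;student&#8221; never sees its internals, yet reproduces its behavior from harvested input-output pairs alone.</p>

```python
import numpy as np

# A "teacher" exposed only as a query interface -- internals unknown to us.
def teacher_api(x):
    return 0.5 * x**3 - 2.0 * x + 1.0   # hidden implementation

# Step 1: harvest input-output pairs through the public interface.
queries = np.linspace(-3, 3, 200)
answers = np.array([teacher_api(x) for x in queries])

# Step 2: fit a "student" to the observed behavior alone.
student = np.poly1d(np.polyfit(queries, answers, deg=3))

# The student now matches the teacher on inputs it was never queried on,
# despite having no access to the teacher's internals.
```

<p>An LLM is vastly harder to approximate than a cubic, but the asymmetry is the same: anything answerable through the interface is, with enough queries, learnable through the interface.</p>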



<h3 class="wp-block-heading" id="what-is-lost-and-what-is-gained-in-distillation">What Is Lost and What Is Gained in Distillation?</h3>



<p>The appeal of distillation lies in cost optimization: compared to the enormous expense of training a frontier model, you can rapidly produce a functional (though non-equivalent) model with minimal capital. A distilled model does not inherit the original&#8217;s training data or weights; it merely copies input-output behavior. This inevitably means higher error rates in edge cases and weaker long-context reasoning.</p>



<p>In commercial practice, however, these losses often do not matter. Fueled by high-volume domain-specific data, an enterprise chatbot does not require general intelligence; &#8220;good enough&#8221; is frequently profitable enough.</p>



<p>This process mirrors the digital&nbsp;<strong>&#8220;knockoff product&#8221;</strong>&nbsp;economy: you cheaply approximate the original&#8217;s function, eroding the creator&#8217;s R&amp;D edge and suppressing their return on investment.</p>



<p>This is why major firms view distillation not only as a technical threat but as a&nbsp;<strong>strategic blow</strong>. The fact that Chinese laboratories (such as DeepSeek) used exactly this methodology to match the benchmark performance of massive American models sent shockwaves through Silicon Valley. Viewed from their own perspective, the outrage is entirely predictable.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="the-geopolitical-race-and-macro-reality">THE GEOPOLITICAL RACE AND MACRO REALITY</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>The cost advantage and investment suppression triggered by distillation bring a larger reality to the surface: frontier model training is less a commercial free-market initiative than a loss-generating, state-sponsored technological arms race.</em></p>
</blockquote>



<h3 class="wp-block-heading" id="the-frontier-race-a-state-project-financed-by-hope">The Frontier Race: A State Project Financed by Hope</h3>



<p>Frontier model training cannot be explained by classical free-market dynamics or lean-startup economics.</p>



<p>The numbers show how far the landscape has moved beyond ordinary startup economics: training a single top-tier frontier model today demands clusters on the order of 100,000 Nvidia H100 GPUs, a fixed hardware cost (CAPEX) of roughly $3-4 billion. Add hundreds of millions of dollars in energy bills and thousands of highly specialized researchers, and the scale shatters any standard &#8220;digital startup&#8221; balance sheet. Every primary actor at the center of this race (OpenAI, Anthropic, Google DeepMind, Microsoft) operates within an ecosystem that is, directly or indirectly, state-supported.</p>



<p>This structure resembles a space program, nuclear research, or semiconductor fabrication investment far more than a free-market game. In other words, it is a&nbsp;<strong>strategic infrastructure investment.</strong>&nbsp;So why are states channeling this much capital into this domain?</p>



<p>The official justifications are familiar: military superiority, cybersecurity, intelligence capacity, economic competition, geopolitical balance. All legitimate strategic concerns. But I believe that what actually legitimizes the scale of these investments is not concrete returns but rather&nbsp;<strong>hope.</strong>&nbsp;A grand expectation that artificial intelligence will be a transformative technology — fundamentally changing the economy, defense, and scientific research. And this expectation has not yet been fully proven.</p>



<p>Since GPT-3&#8217;s release in 2020, the&nbsp;<strong>revenue model for continuously increasing compute investments is still unclear.</strong>&nbsp;A balance that makes these models commercially sustainable has not yet been established. States continue their support despite this because rival states are also investing — and the cost of leaving the race&nbsp;<em>appears</em>&nbsp;greater than the cost of staying in it. This presents itself more as a security dilemma than a rational calculation.</p>



<p>History has shown us similar cycles. The Cold War space race pushed both sides beyond their economic limits. Nuclear energy promised energy &#8220;so cheap it would eliminate the electricity meter&#8221; in the 1950s — that promise never materialized, but investments continued for decades. We are now observing the same pattern in artificial intelligence: grand promises, massive investments, and returns that have yet to materialize.</p>



<p>This does not mean artificial intelligence is useless. Its concrete value in very specific domains — protein folding, image analysis, code generation — is indisputable. It is also clear that it is a technology that makes our lives easier, cheap for us personally but expensive for the world at large. But there is a wide gap between the narrative that &#8220;general AI will transform everything&#8221; and today&#8217;s reality. And this gap is being filled not by the magnitude of investments, but by the magnitude of hopes.</p>



<p>If at some point hope proves insufficient against mathematics — that is, when public cost exceeds perceived strategic benefit — the support mechanism breaks. And at that point, structures sustained by state support will collapse; only those generating real commercial value will remain.</p>



<p>At a likely equilibrium and saturation point, I consider it probable that 3-5 frontier model providers will sustain their existence with state-backed compute infrastructure, while the rest become integrators and fine-tuners. This resembles the current structure of the semiconductor industry.</p>



<h3 class="wp-block-heading" id="building-your-own-frontier-model-prestige-project-or-strategic-investment">Building Your Own Frontier Model: Prestige Project or Strategic Investment?</h3>



<p>At this juncture, the question inevitably turns to mid-scale nations like Türkiye: should we develop our own frontier model?</p>



<p>In a global arena dominated by the US-China axis, where tens of thousands of top-tier GPUs and vast data pools are table stakes, the immediate bottleneck is not raw intelligence but infrastructure scale. Locally trained models are respectable technical milestones, but deploying them commercially against frontier systems is unrealistic under current capital constraints. Such initiatives frequently hollow out into&nbsp;<strong>academic prestige, political messaging, and public technology showcases.</strong></p>



<p>The true competitive theater no longer lies in parameter counts. It lies in data architecture, application layers, and verifiable sovereignty. The critical question is not &#8220;who builds the largest model?&#8221; but:&nbsp;<strong>&#8220;with which infrastructure, and under whose sovereignty, do we execute our most critical strategic decisions?&#8221;</strong></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="the-rational-solution-architecture">THE RATIONAL SOLUTION ARCHITECTURE</h2>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>If producing a frontier model is ultimately a trillion-dollar geopolitical contest dominated by the United States and China, and core capabilities can already be replicated via open source or distillation, where should agile, mid-scale nations position themselves on this chessboard?</em></p>
</blockquote>



<h3 class="wp-block-heading" id="positioning-strategy-the-hybrid-architecture-and-3-layer-execution-model">Positioning Strategy: The Hybrid Architecture and 3-Layer Execution Model</h3>



<p>This is where the strategic trajectory becomes critical. Once we accept that entering the frontier arms race is pointless for a mid-scale state, we must establish a hybrid strategy that runs &#8220;state security&#8221; reflexes in parallel with &#8220;open market&#8221; commercial realities, without letting the two collide.</p>



<h4 class="wp-block-heading" id="open-models-and-the-leverage-effect-of-local-data">Open Models and the Leverage Effect of Local Data</h4>



<p>The rational pathway mirrors strategies seen in agile Eastern labs: adopt strong open foundation models (e.g., Llama), combine them with secure local data, and optimize for vertical deployments. Premium enterprise value resides in&nbsp;<strong>niche domain knowledge</strong>, not raw parameter counts.</p>



<h4 class="wp-block-heading" id="the-application-wrapper-integration-economy">The Application (Wrapper) Integration Economy</h4>



<p>The most fragile participants in this ecosystem are the foundational hardware layers carrying billions in CAPEX. Over any extended timeline, the highest profit margins reliably cluster in the&nbsp;<strong>application-integration (wrapper)</strong>&nbsp;stratum. Unlike monolithic infrastructure, wrapper solutions remove specific customer bottlenecks and retain commercial value independent of whichever base model powers them underneath (the classic &#8220;selling shovels&#8221; play). Base models commoditize; targeted problem-solving does not.</p>



<p>Just as the state is justified in fearing the loss of classified operational data into offshore API pipelines, the private sector is equally justified in focusing on fast, profitable vertical solutions built on those same APIs.</p>



<h4 class="wp-block-heading" id="hybrid-architecture-three-layer-structure">Hybrid Architecture: Three-Layer Structure</h4>



<p>Sovereign privacy concerns are legitimate, but air-gapping an entire digital nation stifles economic velocity. The structural remedy is three operational tiers:</p>



<ul class="wp-block-list">
<li><strong>Layer 1 (Critical Sovereign):</strong> Fully air-gapped, domestically hosted GPU clusters for military and critical infrastructure. Demands absolute control, not frontier scale.</li>



<li><strong>Layer 2 (Regulated Sector):</strong> Locked VPCs or domestically fenced cloud corridors exposing audited, quota-based APIs for banking and national energy assets.</li>



<li><strong>Layer 3 (Commercial Hub):</strong> Full global frontier API integration for maximum speed and low operating costs across non-strategic private-sector workloads.</li>
</ul>
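<p>The three tiers above reduce to a simple routing policy. The sketch below is a conceptual illustration; the classification labels and the mapping are hypothetical placeholders, not a prescribed taxonomy.</p>

```python
from enum import Enum

class Tier(Enum):
    CRITICAL_SOVEREIGN = 1   # air-gapped domestic GPU clusters
    REGULATED_SECTOR = 2     # fenced VPC, audited quota-based APIs
    COMMERCIAL_HUB = 3       # global frontier APIs

# Illustrative mapping from data classification to deployment tier.
ROUTING = {
    "defense": Tier.CRITICAL_SOVEREIGN,
    "critical_infrastructure": Tier.CRITICAL_SOVEREIGN,
    "banking": Tier.REGULATED_SECTOR,
    "energy": Tier.REGULATED_SECTOR,
}

def route(data_class: str) -> Tier:
    """Route a workload by its data classification.
    Only unlisted, non-strategic classes fall through to the global tier."""
    return ROUTING.get(data_class, Tier.COMMERCIAL_HUB)
```

<p>The key design choice is the default: anything not explicitly classified as strategic flows to the cheap, fast commercial tier, so sovereignty constraints never tax ordinary workloads.</p>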



<h3 class="wp-block-heading" id="lasting-investment-targets-data-and-audits-not-models">Lasting Investment Targets Data and Audits, Not Models</h3>



<p>As individual APIs expire and frontier technologies mature into commodities, only three strategic assets remain permanently valuable:</p>



<ul class="wp-block-list">
<li><strong>Clean Local Data:</strong> Structured institutional data-sharing pipelines hold far more strategic permanence than any individual foundation model.</li>



<li><strong>Log Sovereignty:</strong> The paramount threat is not where the model runs; it is who monitors the prompt payloads and warehouses the behavioral logs. Without sovereign log custody, hosting models locally provides no real geographic security.</li>



<li><strong>Audit-Class Human Capital:</strong> Deploying 200 elite researchers not to train parameters from scratch, but to intercept, audit, sanitize, and optimize third-party open models for secure domestic deployment.</li>
</ul>



<h4 class="wp-block-heading" id="non-negotiables-for-ai-sovereignty">Non-Negotiables for AI Sovereignty</h4>



<p>The minimum requirements a country must fulfill to claim sovereignty in AI can be listed as follows:</p>


<div class="kb-table-container kb-table-container2297_7dcb53-be wp-block-kadence-table"><table class="kb-table kb-table2297_7dcb53-be">
<tr class="kb-table-row kb-table-row2297_a564c0-a8">
<th class="kb-table-data kb-table-data2297_66bed9-b0">

<p>Component</p>

</th>

<th class="kb-table-data kb-table-data2297_22ca65-97">

<p>Definition</p>

</th>
</tr>

<tr class="kb-table-row kb-table-row2297_57197e-20">
<td class="kb-table-data kb-table-data2297_ff324f-4f">

<p><strong>National Compute Core</strong></p>

</td>

<td class="kb-table-data kb-table-data2297_da3a3e-92">

<p>5,000-10,000 top-segment GPUs, hosted domestically and dedicated to defense and critical public use</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2297_e2ab2e-0a">
<td class="kb-table-data kb-table-data2297_542379-c6">

<p><strong>Open Model Foundation</strong></p>

</td>

<td class="kb-table-data kb-table-data2297_ab5670-5f">

<p>Turkish optimization and public data fine-tuning on open-weight models</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2297_d87052-b2">
<td class="kb-table-data kb-table-data2297_7208ca-6c">

<p><strong>Sectoral Vertical Models</strong></p>

</td>

<td class="kb-table-data kb-table-data2297_027884-f6">

<p>Defense, energy, finance, public procurement, manufacturing — not general intelligence, but task intelligence</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2297_c6967a-34">
<td class="kb-table-data kb-table-data2297_afd853-80">

<p><strong>Data Sovereignty Infrastructure</strong></p>

</td>

<td class="kb-table-data kb-table-data2297_ba99d3-41">

<p>Clean data pools, secure sharing protocols, inter-institutional standardization</p>

</td>
</tr>

<tr class="kb-table-row kb-table-row2297_88e5da-2d">
<td class="kb-table-data kb-table-data2297_ffa642-8f">

<p><strong>Human Capital</strong></p>

</td>

<td class="kb-table-data kb-table-data2297_b1beed-f5">

<p>200-300 researchers, focused on understanding-optimizing-auditing</p>

</td>
</tr>
</table></div>


<p>This package is comparable to a major infrastructure project: no more expensive than a highway tender, but with far greater geopolitical value.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="conclusion">Conclusion</h2>



<p>The distillation debate may look like a contract violation issue on the surface, but underneath lies a far larger power struggle:&nbsp;<strong>who will hold power in artificial intelligence?</strong></p>



<p>Major providers frame distillation as a strategic threat; they are right to, because their business models depend on it. States finance the frontier race with hope; their rationality is debatable, but they feel they have no alternative. Small and mid-scale countries launch prestige projects under the banner of &#8220;we&#8217;ll build our own super model&#8221;; most will remain technology showcases. In my view, a realistic strategy is none of these.</p>



<p>For Türkiye and every economy of comparable scale, the most reasonable objective is neither to become a frontier producer nor a passive consumer. This reasonable objective lies in&nbsp;<strong>building a foundational local capacity and maintaining control over the dependency that will inevitably sit on top of it.</strong></p>



<p>There is no need to reinvent the wheel. Why start from scratch when someone else is already spending the money?</p>



<p>Of course, to manage this coherently and maximize its benefit, the capacity to read this map and chart one&#8217;s own course when necessary is <strong>essential.</strong> Sovereignty is not about building the largest model — it is about controlling the most critical data.</p>



<h2 class="wp-block-heading" id="references">References</h2>



<ul class="wp-block-list">
<li>Sectoral Analysis: <em>Compute expenditure matrices and strategic positioning reports from major US and China-based AI laboratories.</em></li>



<li>Hinton, G., Vinyals, O., &amp; Dean, J. (2015). <em>Distilling the Knowledge in a Neural Network</em>. NIPS Deep Learning and Representation Learning Workshop.</li>



<li>Open Source &amp; Enterprise API Agreements: <em>Standard Terms of Service (ToS) constraints prohibiting the deployment of model outputs for competitive training.</em></li>
</ul>



<p><em>Last update: March 2026 | Version: 1.1</em></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
