Author of the publication

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning.

X. Yu, C. Bai, H. Guo, C. Wang, and Z. Wang. CoRR, (2024)

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to a name to display some publications already assigned to that person.

Hongyi Wang

Hongyi Chu

Xin Guo

Baoliang Guo

Yubao Guo

Other publications of authors with the same name

Behavior Contrastive Learning for Unsupervised Skill Discovery. R. Yang, C. Bai, H. Guo, S. Li, B. Zhao, Z. Wang, P. Liu, and X. Li. ICML, volume 202 of Proceedings of Machine Learning Research, page 39183-39204. PMLR, (2023)

Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games. H. Guo, Z. Fu, Z. Yang, and Z. Wang. ICML, volume 139 of Proceedings of Machine Learning Research, page 3899-3909. PMLR, (2021)

Life Assistants for the Elderly Based on Mobile Devices. W. Diao, Z. Gao, R. Xu, Y. Xie, K. Yan, and H. Guo. DASC/PiCom/DataCom/CyberSciTech, page 537-542. IEEE, (2019)

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning. X. Yu, C. Bai, H. Guo, C. Wang, and Z. Wang. CoRR, (2024)

Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes. H. Guo, Q. Cai, Y. Zhang, Z. Yang, and Z. Wang. ICML, volume 162 of Proceedings of Machine Learning Research, page 8016-8038. PMLR, (2022)

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards. W. Shen, X. Zhang, Y. Yao, R. Zheng, H. Guo, and Y. Liu. CoRR, (2024)

Policy Learning Using Weak Supervision. J. Wang, H. Guo, Z. Zhu, and Y. Liu. NeurIPS, page 19960-19973. (2021)

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer. Z. Liu, M. Lu, S. Zhang, B. Liu, H. Guo, Y. Yang, J. Blanchet, and Z. Wang. CoRR, (2024)

Toward Optimal LLM Alignments Using Two-Player Games. R. Zheng, H. Guo, Z. Liu, X. Zhang, Y. Yao, X. Xu, Z. Wang, Z. Xi, T. Gui, Q. Zhang and 3 other author(s). CoRR, (2024)

Automatic Threshold Calculation Based Label Propagation Algorithm for Overlapping Community. G. Liu, K. Meng, H. Guo, L. Pan, and J. Li. DSC, page 382-387. IEEE Computer Society, (2016)

BibSonomy

Disambiguation of "Guo, Hongyi"
