@misc{zhang2024autocoderover,
title={AutoCodeRover: Autonomous Program Improvement},
author={Yuntong Zhang and Haifeng Ruan and Zhiyu Fan and Abhik Roychoudhury},
year={2024},
eprint={2404.05427},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
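As a minimal sketch of that behavior (SequenceDemo and concatAll are hypothetical names), the following Java code wraps an ordered list of strings as ByteArrayInputStreams and joins them with SequenceInputStream's Enumeration constructor. It then reads the result back: each stream is read to end of file before the next one starts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SequenceDemo {
    // Reads the given parts, in order, through a single SequenceInputStream.
    static String concatAll(List<String> parts) {
        List<InputStream> streams = new ArrayList<>();
        for (String part : parts) {
            streams.add(new ByteArrayInputStream(part.getBytes(StandardCharsets.UTF_8)));
        }
        // The Enumeration fixes the order in which the contained streams are read.
        try (InputStream joined = new SequenceInputStream(Collections.enumeration(streams))) {
            return new String(joined.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // In-memory streams do not fail, but the API declares IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, SequenceDemo.concatAll(List.of("Hello", ", ", "World")) returns "Hello, World". Closing the SequenceInputStream also closes any contained streams it has not finished reading.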
pgloader keeps a separate file of rejected data but continues trying to copy the good data into your database.
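The reject-and-continue idea can be sketched as a loader that sets aside every line that fails to parse and keeps going with the rest. This is a hypothetical illustration, not pgloader's implementation. The two-column id,name input format and the RejectingLoader, Row, and Result names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class RejectingLoader {
    // Hypothetical target row: an integer id and a name.
    record Row(int id, String name) {}

    // Rows that parsed, plus the raw lines that were rejected.
    record Result(List<Row> loaded, List<String> rejected) {}

    static Result load(List<String> lines) {
        List<Row> loaded = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            try {
                if (fields.length != 2) {
                    throw new IllegalArgumentException("expected 2 fields: " + line);
                }
                loaded.add(new Row(Integer.parseInt(fields[0].trim()), fields[1].trim()));
            } catch (IllegalArgumentException e) { // also catches NumberFormatException
                // Keep the bad line aside and move on instead of aborting the load.
                rejected.add(line);
            }
        }
        return new Result(loaded, rejected);
    }
}
```

With the input "1,alice", "x,bob", "2,carol", "3", two rows are loaded and the lines "x,bob" and "3" end up in the rejected list.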
pgloader also implements data reformatting. A typical example is transforming the MySQL datestamps 0000-00-00 and 0000-00-00 00:00:00 into PostgreSQL NULL values.
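Here is a minimal sketch of the zero-date conversion. It assumes the MySQL values arrive as text in the yyyy-MM-dd and yyyy-MM-dd HH:mm:ss forms. ZeroDates and its methods are hypothetical names, not part of pgloader. The special case matters because java.time rejects month 0 and day 0, so LocalDate.parse("0000-00-00") throws a DateTimeParseException instead of returning a value.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ZeroDates {
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Maps the MySQL zero date (and SQL NULL) to null, otherwise parses an ISO date.
    static LocalDate toDate(String mysqlDate) {
        if (mysqlDate == null || mysqlDate.equals("0000-00-00")) {
            return null;
        }
        return LocalDate.parse(mysqlDate);
    }

    // Same idea for the MySQL zero datetime.
    static LocalDateTime toDateTime(String mysqlDateTime) {
        if (mysqlDateTime == null || mysqlDateTime.equals("0000-00-00 00:00:00")) {
            return null;
        }
        return LocalDateTime.parse(mysqlDateTime, DATE_TIME);
    }
}
```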
A very common workflow is to index some data based on its embeddings and then, given a new query embedding, retrieve the most similar examples with k-Nearest Neighbor search. For example, you can imagine embedding a large collection of papers by their abstracts and then, given a new paper of interest, retrieving the papers most similar to it.
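A minimal brute-force sketch of that workflow follows. It assumes embeddings are plain double[] vectors and uses cosine similarity as the metric, which the text does not specify. Knn, cosine, and topK are hypothetical names. Large collections are usually served by an approximate nearest-neighbor index rather than scoring every item as done here.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

final class Knn {
    // Cosine similarity; assumes both vectors are non-zero and have the same length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every stored embedding against the query and returns the
    // indices of the k most similar ones, best match first.
    static int[] topK(double[][] index, double[] query, int k) {
        double[] scores = new double[index.length];
        for (int i = 0; i < index.length; i++) {
            scores[i] = cosine(index[i], query);
        }
        return IntStream.range(0, index.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> scores[i]).reversed())
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

For the stored vectors {1, 0}, {0, 1}, {0.9, 0.1}, {-1, 0} and the query {1, 0}, topK(..., 2) returns the indices 0 and 2.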
TL;DR: in my experience, it almost always works better to use an SVM instead of kNN, if you can afford the slight computational hit.
In the following examples, slug is an attribute of the Post entity.
Spring Data supports existsBy derived query methods, which we can declare in the PostRepository as follows:
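import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface PostRepository
        extends JpaRepository<Post, Long> {

    // Derived query: Spring Data generates an existence check on the slug attribute
    boolean existsBySlug(String slug);
}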
Another option for checking existence is a CASE WHEN EXISTS native SQL query:
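import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

@Repository
public interface PostRepository
        extends JpaRepository<Post, Long> {

    // Native query: EXISTS can stop at the first row that matches the slug
    @Query(value = """
        SELECT
            CASE WHEN EXISTS (
                SELECT 1
                FROM post
                WHERE slug = :slug
            )
            THEN 'true'
            ELSE 'false'
            END
        """,
        nativeQuery = true
    )
    boolean existsBySlugWithCase(@Param("slug") String slug);
}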
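import io.hypersistence.utils.spring.repository.BaseJpaRepository;
import jakarta.persistence.QueryHint;
import java.time.LocalDate;
import java.util.stream.Stream;
import org.hibernate.jpa.AvailableHints;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

@Repository
public interface PostRepository extends BaseJpaRepository<Post, Long> {

    // The returned Stream must be consumed inside a transaction and closed (try-with-resources)
    @Query("""
        select p
        from Post p
        where date(p.createdOn) >= :sinceDate
        """
    )
    @QueryHints(
        @QueryHint(name = AvailableHints.HINT_FETCH_SIZE, value = "25")
    )
    Stream<Post> streamByCreatedOnSince(@Param("sinceDate") LocalDate sinceDate);
}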
The FETCH_SIZE JPA query hint is necessary for PostgreSQL and MySQL to instruct the JDBC driver to prefetch at most 25 records at a time. Otherwise, the PostgreSQL and MySQL JDBC drivers would prefetch all the query results before the application starts traversing the underlying ResultSet.
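The effect of the fetch size can be made concrete with a small in-memory simulation. This is not how the PostgreSQL or MySQL drivers are implemented. The hypothetical FetchSizeCursor pulls rows through a hypothetical (offset, limit) page function in batches of at most fetchSize, and each batch counts as one database round trip. With 60 rows and a fetch size of 25, it makes three round trips (25, 25, then 10 rows) and never buffers more than 25 rows.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical simulation of a cursor that fetches rows in batches of fetchSize.
final class FetchSizeCursor<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage; // (offset, limit) -> rows
    private final int fetchSize;
    private List<T> buffer = List.of(); // rows of the current batch only
    private int position = 0;           // next index to read within buffer
    private int offset = 0;             // rows fetched so far
    private boolean exhausted = false;  // set once a short batch comes back
    private int roundTrips = 0;
    private int maxBuffered = 0;

    FetchSizeCursor(BiFunction<Integer, Integer, List<T>> fetchPage, int fetchSize) {
        if (fetchSize <= 0) throw new IllegalArgumentException("fetchSize must be positive");
        this.fetchPage = fetchPage;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        if (position < buffer.size()) return true;
        if (exhausted) return false;
        // Current batch consumed: fetch the next one (one simulated round trip).
        buffer = fetchPage.apply(offset, fetchSize);
        roundTrips++;
        position = 0;
        offset += buffer.size();
        maxBuffered = Math.max(maxBuffered, buffer.size());
        if (buffer.size() < fetchSize) exhausted = true;
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.get(position++);
    }

    int roundTrips() { return roundTrips; }

    int maxBuffered() { return maxBuffered; }
}
```

Without a fetch size, the equivalent would be a single call returning all 60 rows, which is the prefetch-everything behavior described above.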